
LEARNING IN NONATOMIC GAMES, PART I:
FINITE ACTION SPACES AND POPULATION GAMES

Saeed Hadikhanloo, Tweag I/O, Paris, France. [email protected]
Rida Laraki, CNRS (Lamsade, University of Dauphine-PSL) and University of Liverpool (Computer Science Department). [email protected]
Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France. [email protected]
Sylvain Sorin, Institut de Mathématiques Jussieu-PRG, Sorbonne Université, UPMC, CNRS UMR 7586. [email protected]
Abstract.

We examine the long-run behavior of a wide range of dynamics for learning in nonatomic games, in both discrete and continuous time. The class of dynamics under consideration includes fictitious play and its regularized variants, the best reply dynamics (again, possibly regularized), as well as the dynamics of dual averaging / “follow the regularized leader” (which themselves include as special cases the replicator dynamics and Friedman’s projection dynamics). Our analysis concerns both the actual trajectory of play and its time-average, and we cover potential and monotone games, as well as games with an evolutionarily stable state (global or otherwise). We focus exclusively on games with finite action spaces; nonatomic games with continuous action spaces are treated in detail in Part II.

Key words and phrases:
Learning; nonatomic games; fictitious play; dual averaging; evolutionary stability; Nash equilibrium; variational inequalities
2020 Mathematics Subject Classification:
Primary 91A22, 91A26; secondary 37N40, 68Q32.

1. Introduction

The study of evolutionary dynamics in population games has been the mainstay of evolutionary game theory ever since the inception of the field in the mid-1970’s. Such dynamics are typically derived from biological models or economic microfoundations that express the net growth rate of a type (or strategy) within a population as a function of the relative frequency of all other types. Thus, owing to the flexibility of this general framework, these considerations have generated an immense body of literature; for an introduction to the topic, we refer the reader to the masterful treatments of Hofbauer & Sigmund [20], Weibull [57], and the more recent, panoramic survey of Sandholm [45, 46].

Our paper takes a complementary approach and aims to examine the long-run behavior of learning procedures in nonatomic games. While the topic of learning in games with a finite number of players is fairly well-studied – see e.g., the classic works of Fudenberg & Levine [13] and Cesa-Bianchi & Lugosi [9] – the same cannot be said for learning in population games. On that account, our goal in this paper is to provide a synthetic treatment for a wide range of dynamics for learning in nonatomic games, unifying along the way a number of existing results in the literature.

We pursue this goal in two parts. In the current paper, we focus exclusively on games with shared, finite action spaces; games with continuous (and possibly player-dependent) action spaces are significantly more complicated, so their study is deferred to the companion paper [16].

Overview of the paper. 

To specify the general framework under consideration one has to:

  a) Describe the family of games to be studied: these are anonymous nonatomic games and two concrete classes thereof, potential and monotone games; the relevant details are provided in Section 2.

  b) Identify the solution concepts involved: we will mainly focus on Nash equilibria and variants of evolutionarily stable states, expressed throughout as solutions of an associated variational inequality; this is also described in Section 2.

  c) Define the dynamics under study: this is the content of Section 3, where we begin by recalling the standard fictitious play procedure of Brown [8] and Robinson [41], its regularized variants [18], and its continuous-time analogue, the best reply dynamics of Gilboa & Matsui [15]. Subsequently, we introduce a general class of dynamics known as dual averaging – or “follow the regularized leader” – which contains as special cases the replicator dynamics of Taylor & Jonker [55] and the projection dynamics of Friedman [12].

The dynamics we examine evolve in both discrete and continuous time: the continuous-time analysis is presented in Section 4 and comprises the template for the discrete-time analysis that follows in Section 5. In both cases, our analysis builds on previous work and results by Monderer & Shapley [37], Sandholm [44], Hofbauer & Sigmund [21], Hofbauer & Sandholm [18, 19], Hofbauer et al. [23] and Mertikopoulos & Sandholm [32, 33]. In this context, some of the new results that we present in our paper can be summarized as follows:

  a) We establish a precise link between (regularized) fictitious play and dual averaging: informally, fictitious play best-responds to the average population state whereas dual averaging best-responds to the average population payoff profile.

  b) We determine the equilibrium convergence properties of vanishingly regularized fictitious play in continuous time.

  c) We provide a range of conditions for the convergence of dual averaging trajectories to evolutionarily stable states, in both discrete and continuous time (and possibly with a variable learning rate).

Finally, in Section 6, we discuss several extensions of these results beyond single-population nonatomic games, and we provide a preview of the issues that arise in nonatomic games with continuous action spaces – the topic of interest of the follow-up paper [16].

2. Preliminaries

2.1. Nonatomic games

A (single-population) nonatomic game is defined by the following primitives:

  • A population of players. This is modeled by the unit interval \mathcal{I}=[0,1] endowed with the Lebesgue measure \mu. The nonatomic aspect means that the behavior of a set of players with zero measure has no impact on the game (see below).

  • A set of pure strategies \mathcal{A}=\{1,\dotsc,A\}. Following Schmeidler [47], the players' strategic behavior is defined as a measurable function \chi\colon\mathcal{I}\to\mathcal{A}, with \chi(i)\in\mathcal{A} denoting the pure strategy of player i. Given a pure strategy \alpha\in\mathcal{A}, the inverse image \chi^{-1}(\alpha)=\{i\in\mathcal{I}:\chi(i)=\alpha\} will be called the set of "\alpha-strategists", and the corresponding pushforward measure x\coloneqq\chi\sharp\mu\equiv\mu\circ\chi^{-1} on \mathcal{A} will be called the state of the population. This measure will be the main variable of interest; for convenience, we will view x as an element of the simplex \mathcal{X}\coloneqq\operatorname{\Delta}(\mathcal{A}) and we will write x_{\alpha}=\mu(\chi^{-1}(\alpha)) for the relative frequency of \alpha-strategists in the population.

  • A family of continuous payoff functions u_{\alpha}\colon\mathcal{X}\to\mathbb{R}, \alpha\in\mathcal{A}. Specifically, u_{\alpha}(x) denotes the payoff to \alpha-strategists when the state of the population is x\in\mathcal{X}. Collectively, we will also write F(x)=(u_{\alpha}(x))_{\alpha\in\mathcal{A}} for the population's payoff vector at state x.

The above definition specifies a population game \mathcal{G}\equiv\mathcal{G}(\mathcal{A},F), cf. Sandholm [45] and references therein. In this definition, players are anonymous in the sense that their payoffs factor through the state of the population and do not depend on specific players choosing specific strategies. For more intricate classes of nonatomic games (possibly involving idiosyncratic components), see Schmeidler [47] and Mas-Colell [27].
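For concreteness, the short Python sketch below (not part of the paper's formal development) encodes a toy population game of this type: two strategies ("routes") whose payoffs are the negatives of illustrative, hypothetical congestion costs, so that F(x) is a continuous map from the simplex to \mathbb{R}^{2}.

```python
import numpy as np

# A minimal sketch (illustrative assumptions, not from the paper) of a
# single-population game G(A, F) with A = {1, 2}: players choose between two
# routes, and each route's payoff is the negative of its congestion cost at
# the current population state x in the simplex.

def F(x):
    """Payoff vector F(x) = (u_1(x), u_2(x)) at population state x."""
    u1 = -1.0 * x[0]          # route 1: cost grows linearly with its load x_1
    u2 = -2.0 * x[1]          # route 2: twice as congestion-sensitive
    return np.array([u1, u2])

x = np.array([0.5, 0.5])      # half the population on each route
print(F(x))                   # payoff experienced by each type at state x
```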

2.2. Equilibria, stability, and variational inequalities

In population games, an equilibrium is defined as a population state x^{\ast}\in\mathcal{X} in which almost all players are satisfied with their choice of strategy. Formally, x^{\ast}\in\mathcal{X} is an equilibrium of \mathcal{G} if

u_{\alpha}(x^{\ast})\geq u_{\beta}(x^{\ast})\quad\text{for all $\alpha\in\operatorname{supp}(x^{\ast})$, $\beta\in\mathcal{A}$}, (Eq)

where \operatorname{supp}(x^{\ast}) denotes the support of x^{\ast}. Equivalently, this requirement for x^{\ast}\in\mathcal{X} can be stated as a Stampacchia variational inequality (SVI) of the form

\langle F(x^{\ast}),x-x^{\ast}\rangle\leq 0\quad\text{for all $x\in\mathcal{X}$}. (SVI)

Equilibria can also be seen as fixed points of the continuous endomorphism x\mapsto\operatorname{\Pi}_{\mathcal{X}}(x+F(x)), where \operatorname{\Pi}_{\mathcal{X}} denotes the Euclidean projector on \mathcal{X}. By standard fixed point arguments, it follows that the game's set of equilibria is nonempty; we will denote this set by \operatorname{Eq}(\mathcal{G}).

Remark.

In operator theory and optimization, the direction of (SVI) is typically reversed because optimization problems are usually formulated as minimization problems. The utility maximization viewpoint is more common in the literature on population games, so we will maintain this sign convention throughout.

An alternative characterization of equilibria is by means of the best-response correspondence \operatorname{\mathsf{BR}}\colon\mathcal{X}\rightrightarrows\mathcal{X}, defined here as

\operatorname{\mathsf{BR}}(x)\coloneqq\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\langle F(x),x^{\prime}\rangle. (1)

We then have the following string of equivalences:

x^{\ast}\in\operatorname{\mathsf{BR}}(x^{\ast})\iff\operatorname{supp}(x^{\ast})\subseteq\operatorname{\mathsf{BR}}(x^{\ast})\iff x^{\ast}\in\operatorname{Eq}(\mathcal{G}) (2)

i.e., the equilibria of the game correspond to the fixed points of \operatorname{\mathsf{BR}}. One can easily show that, under our assumptions, this correspondence has a closed graph and convex, compact, non-empty values. By Kakutani's fixed point theorem, \operatorname{Eq}(\mathcal{G}) – that is, the set of fixed points of \operatorname{\mathsf{BR}} – is nonempty and compact.
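As a purely numerical illustration of (Eq) – not part of the analysis above – the sketch below checks whether every strategy in the support of a candidate state earns the maximal payoff, using a standard rock-paper-scissors example with linear payoffs F(x)=Mx (a random-matching game of the kind discussed in Section 3).

```python
import numpy as np

# Hedged numerical check of (Eq): x* is an equilibrium iff every strategy in
# its support earns the maximal payoff u_alpha(x*).  The test game is an
# illustrative rock-paper-scissors matching game, not one from the paper.

M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

def is_equilibrium(x, tol=1e-9):
    payoffs = F(x)
    best = payoffs.max()
    support = x > tol
    return bool(np.all(payoffs[support] >= best - tol))

print(is_equilibrium(np.ones(3) / 3))          # True: the barycenter is an equilibrium
print(is_equilibrium(np.array([1., 0., 0.])))  # False: playing only "rock" is not
```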

We will also consider the associated Minty variational inequality (MVI) for x^{\ast}\in\mathcal{X}, namely

\langle F(x),x-x^{\ast}\rangle\leq 0\quad\text{for all $x\in\mathcal{X}$}. (MVI)

For concreteness, we will write \operatorname{SVI}(\mathcal{G}) and \operatorname{MVI}(\mathcal{G}) for the set of solutions of (SVI) and (MVI) respectively. In terms of structural properties, \operatorname{MVI}(\mathcal{G}) is convex and, under our continuity assumption for F, it is straightforward to verify that \operatorname{MVI}(\mathcal{G})\subseteq\operatorname{SVI}(\mathcal{G})\equiv\operatorname{Eq}(\mathcal{G}). In this regard, (MVI) represents an "equilibrium refinement" criterion for \mathcal{G} (see also Section 2.3 below).

In more detail, (MVI) is intimately related to the seminal notion of evolutionary stability that was introduced by Maynard Smith & Price [28] and formed the starting point of evolutionary game theory. To make this link precise, recall that a population state x^{\ast}\in\mathcal{X} is an evolutionarily stable state of \mathcal{G} [28, 20, 45] if

\langle F(\delta x+(1-\delta)x^{\ast}),x-x^{\ast}\rangle<0 (ESS)

for all sufficiently small \delta>0 and all states x\neq x^{\ast}. Analogously, x^{\ast}\in\mathcal{X} is a neutrally stable state (NSS) of \mathcal{G} if (ESS) holds as a weak inequality, i.e.,

\langle F(\delta x+(1-\delta)x^{\ast}),x-x^{\ast}\rangle\leq 0 (NSS)

for all sufficiently small \delta>0 and all x\in\mathcal{X}. Finally, x^{\ast} is said to be a globally evolutionarily stable state (resp. globally neutrally stable state) of \mathcal{G} if (ESS) [resp. (NSS)] holds for all \delta\in(0,1]. In obvious notation, we will write \operatorname{ESS}(\mathcal{G}), \operatorname{NSS}(\mathcal{G}), \operatorname{GESS}(\mathcal{G}) and \operatorname{GNSS}(\mathcal{G}) for the respective sets of states of \mathcal{G} (by definition, \operatorname{GESS}(\mathcal{G}) consists of at most a single point).

Characterizations and implications. 

A concise characterization of these notions of stability was derived by Taylor [54] and Hofbauer et al. [22], who showed that x^{\ast}\in\operatorname{ESS}(\mathcal{G}) if and only if there exists a neighborhood \mathcal{U} of x^{\ast} in \mathcal{X} such that

\langle F(x),x-x^{\ast}\rangle<0\quad\text{for all $x\in\mathcal{U}\setminus\{x^{\ast}\}$}, (3)

and, furthermore, if we can take \mathcal{U}=\mathcal{X} above, then \operatorname{GESS}(\mathcal{G})=\{x^{\ast}\}. For the corresponding neutral versions, we have x^{\ast}\in\operatorname{NSS}(\mathcal{G}) if and only if (3) holds on \mathcal{U} as a weak inequality, and x^{\ast}\in\operatorname{GNSS}(\mathcal{G}) if and only if we can take \mathcal{U}=\mathcal{X} in this case. This characterization of \operatorname{GNSS}(\mathcal{G}) corresponds precisely to the solution set of (MVI); in particular, we have the following string of inclusions:

\operatorname{GESS}\subseteq\operatorname{GNSS}\equiv\operatorname{MVI},\qquad \operatorname{GESS}\subseteq\operatorname{ESS}\subseteq\operatorname{NSS}\subseteq\operatorname{Eq}\equiv\operatorname{SVI},\qquad \operatorname{GNSS}\subseteq\operatorname{NSS}. (4)

In general, the above inclusions are all one-way; in the next section, we discuss two important cases where some of them become equivalences.

2.3. Classes of games

The following classes of games will play an important role in the sequel:

  (1) Potential games: these are games that admit a potential function, i.e., a function P\colon\mathcal{X}\to\mathbb{R} such that F=\nabla P on \mathcal{X}, or, more generally,

P^{\prime}(x;x^{\prime}-x)=\langle F(x),x^{\prime}-x\rangle\quad\text{for all $x,x^{\prime}\in\mathcal{X}$}, (Pot)

where P^{\prime}(x;z)\coloneqq\lim_{t\to 0^{+}}[P(x+tz)-P(x)]/t denotes the one-sided directional derivative of P at x along z. If \mathcal{G} is a potential game, any local maximum of P is an equilibrium of \mathcal{G}, and any strict local maximum of P is an evolutionarily stable state (ESS) of \mathcal{G}. For a detailed treatment of potential games in the population setting, extending the initial study of Monderer & Shapley [37], see Sandholm [44, 45].

  (2) Monotone games: these are games that satisfy the monotonicity condition

\langle F(x^{\prime})-F(x),x^{\prime}-x\rangle\leq 0\quad\text{for all $x,x^{\prime}\in\mathcal{X}$}. (Mon)

This condition implies that every equilibrium of \mathcal{G} is neutrally stable, so we get the following equivalences:

\operatorname{MVI}(\mathcal{G})\equiv\operatorname{GNSS}(\mathcal{G})=\operatorname{NSS}(\mathcal{G})=\operatorname{Eq}(\mathcal{G})\equiv\operatorname{SVI}(\mathcal{G}) (5)

In this class, the existence of equilibria relies only on the minmax theorem [36].

Going further, a game is called strictly monotone if (Mon) holds as a strict inequality whenever x^{\prime}\neq x. In this case, we have the stronger equivalence

\operatorname{GESS}(\mathcal{G})=\operatorname{ESS}(\mathcal{G})=\operatorname{Eq}(\mathcal{G}), (6)

i.e., all inclusions in (4) become two-way. Moreover, \operatorname{Eq}(\mathcal{G}) is a singleton which is also the unique globally evolutionarily stable state (GESS) of the game.

The intersection of potential and monotone (resp. strictly monotone) games occurs when \mathcal{G} admits a concave (resp. strictly concave) potential. The most important example of such games is the class of nonatomic congestion games that arise in routing and transportation theory, cf. Dafermos [11].
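As a rough numerical probe of (Mon) – illustrative only, and no substitute for the algebraic criterion that the symmetric part of the Jacobian of F be negative semidefinite on the tangent space of \mathcal{X} – one can sample pairs of states and test the defining inequality directly, as in the sketch below for a rock-paper-scissors matching game with F(x)=Mx and M antisymmetric (hence monotone).

```python
import numpy as np

# Sampling-based check of (Mon) for an illustrative matching game F(x) = M x.
rng = np.random.default_rng(0)
M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])          # RPS: antisymmetric, hence monotone

def random_state():
    x = rng.random(3)
    return x / x.sum()

violations = 0
for _ in range(10_000):
    x, xp = random_state(), random_state()
    if (M @ xp - M @ x) @ (xp - x) > 1e-12:
        violations += 1
print("violations of (Mon):", violations)   # expected: 0
```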

2.4. Regularization

In the sequel, we will often consider “regularized” best responses that are single-valued. These are defined as follows:

Definition 2.1.

A regularizer on \mathcal{X} is a function h\colon\mathbb{R}^{\mathcal{A}}\to\mathbb{R}\cup\{\infty\} such that:

  (1) h is supported on \mathcal{X}, i.e., \operatorname{dom}h=\{x\in\mathbb{R}^{\mathcal{A}}:h(x)<\infty\}=\mathcal{X}.

  (2) h is strictly convex and continuous on \mathcal{X}.

Given a regularizer on \mathcal{X}, we further define:

  a) The associated choice map Q_{h}\colon\mathbb{R}^{\mathcal{A}}\to\mathcal{X} given by

Q_{h}(v)\coloneqq\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\{\langle v,x^{\prime}\rangle-h(x^{\prime})\}. (7a)

  b) The associated regularized best-response map \operatorname{\mathsf{BR}}_{h}\colon\mathcal{X}\to\mathcal{X} given by

\operatorname{\mathsf{BR}}_{h}(x)\coloneqq\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\{\langle F(x),x^{\prime}\rangle-h(x^{\prime})\}=Q_{h}(F(x)). (7b)
Remark 2.1.

By our assumptions for h (continuity and strict convexity), Q_{h} and \operatorname{\mathsf{BR}}_{h} are both well-defined and single-valued as correspondences. Moreover, by the strict convexity of h, it follows that the convex conjugate h^{\ast}(y)=\max_{x\in\mathcal{X}}\{\langle y,x\rangle-h(x)\} of h is differentiable and satisfies

Q_{h}(v)=\nabla h^{\ast}(v)\quad\text{for all $v\in\mathbb{R}^{\mathcal{A}}$}. (8)
Remark 2.2.

By construction, \operatorname{\mathsf{BR}}_{h} (approximately) best-responds to strategies, while Q_{h} best-responds to payoff vectors. For this reason, we will sometimes refer to Q_{h} as a "payoff-based" regularized best response (as opposed to the "strategy-based" regularized best response \operatorname{\mathsf{BR}}_{h}).

Moving forward, for part of our analysis, we will also require one (or both) of the regularity conditions below:

  (1) h is K-strongly convex on \mathcal{X}, i.e.,

h(\lambda x+(1-\lambda)x^{\prime})\leq\lambda h(x)+(1-\lambda)h(x^{\prime})-\frac{K}{2}\lambda(1-\lambda)\lVert x^{\prime}-x\rVert^{2} (StrCvx)

for all x,x^{\prime}\in\mathcal{X} and all \lambda\in[0,1].

  (2) The subdifferential \partial h of h admits a continuous selection; specifically, writing \mathcal{X}_{h}\coloneqq\operatorname{dom}\partial h=\{x:\partial h(x)\neq\varnothing\} for the "prox-domain" of h, we posit that there exists a continuous map \widetilde{\nabla}h\colon\mathcal{X}_{h}\to\mathbb{R}^{\mathcal{A}} such that

\widetilde{\nabla}h(x)\in\partial h(x)\quad\text{for all $x\in\mathcal{X}_{h}$}. (Diff)

By replacing best responses with regularized best responses in the definition of equilibria, we also define the notion of a regularized equilibrium as follows:

Definition 2.2.

Given a regularizer h on \mathcal{X} and a regularization weight \varepsilon>0, we define an \varepsilon-regularized equilibrium of \mathcal{G} as a profile x^{\ast}\in\mathcal{X} such that x^{\ast}\in\operatorname{\mathsf{BR}}_{\varepsilon h}(x^{\ast}). The set of \varepsilon-regularized equilibria of \mathcal{G} will be denoted by \operatorname{Eq}_{\varepsilon h}(\mathcal{G}).

Under (Diff), x^{\ast}\in\mathcal{X} is an \varepsilon-regularized equilibrium if and only if

\langle F(x^{\ast})-\varepsilon\widetilde{\nabla}h(x^{\ast}),x^{\ast}-x\rangle\geq 0\quad\text{for all $x\in\mathcal{X}$}. (9)

In this regard, \varepsilon-regularized equilibria can be seen as Nash equilibria of the "\varepsilon-regularized" game \mathcal{G}_{\varepsilon}\equiv\mathcal{G}_{\varepsilon}(\mathcal{A},F_{\varepsilon}) with payoff profile F_{\varepsilon}(x)\coloneqq F(x)-\varepsilon\widetilde{\nabla}h(x); more concisely, \operatorname{Eq}_{\varepsilon}(\mathcal{G})=\operatorname{Eq}(\mathcal{G}_{\varepsilon}).

Equilibria and regularized equilibria are related by the next hemicontinuity property, itself a consequence of the maximum theorem of Berge [6].

Lemma 1.

Let x^{\ast}_{\varepsilon}\in\operatorname{Eq}_{\varepsilon}(\mathcal{G}), \varepsilon>0, be a family of regularized equilibria of \mathcal{G}. Then, in the limit \varepsilon\to 0, any accumulation point of x^{\ast}_{\varepsilon} is an equilibrium of \mathcal{G}.

We close this section with a concrete illustration of the above concepts:

Example 2.1 (Logit choice).

An important example of regularization corresponds to the entropic regularizer h(x)=\sum_{\alpha\in\mathcal{A}}x_{\alpha}\log x_{\alpha} which, by a standard calculation, leads to the choice map

\operatorname{\Lambda}(v)=\frac{(e^{v_{1}},\dotsc,e^{v_{A}})}{e^{v_{1}}+\dotsb+e^{v_{A}}}. (10)

This map is commonly referred to as the logit choice map [29], and the corresponding fixed points x^{\ast}=\operatorname{\Lambda}(F(x^{\ast})/\varepsilon) are known as logit equilibria; for a detailed discussion, see van Damme [56], McKelvey & Palfrey [30], Sandholm [45], and references therein. It is also well known that the entropic regularizer is 1-strongly convex relative to the L^{1}-norm, so \operatorname{\Lambda} is 1-Lipschitz continuous, cf. Shalev-Shwartz [48]. \lozenge
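For illustration, the short Python sketch below implements the logit choice map (10) and runs a naive fixed-point iteration for a logit equilibrium x^{\ast}=\operatorname{\Lambda}(F(x^{\ast})/\varepsilon); the 2x2 matching game and the choice \varepsilon=1 are illustrative assumptions, and plain fixed-point iteration need not converge in general.

```python
import numpy as np

def logit(v):
    """Logit choice map (10); v is shifted by max(v) for numerical stability."""
    z = np.exp(v - v.max())
    return z / z.sum()

M = np.array([[2., 0.],
              [3., 1.]])             # illustrative game: strategy 2 dominates strategy 1
F = lambda x: M @ x
eps = 1.0

x = np.array([0.5, 0.5])
for _ in range(100):
    x = logit(F(x) / eps)            # fixed-point iteration x <- Lambda(F(x)/eps)
print(x)                             # ~ [0.269, 0.731]: an interior logit equilibrium
```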

3. Dynamics

In this section, we introduce a wide range of dynamics for learning in nonatomic games, in both continuous and discrete time. The unifying principle of the dynamics under study is that agents play a best response – regularized or otherwise – to some version of the history of play: the empirical population frequencies, or the players’ average gains. For historical reasons, we begin with the discrete-time framework in Section 3.1, and we present the corresponding continuous-time dynamics in Section 3.2.

3.1. Discrete-time processes

In the discrete-time case, we will consider the evolution of the population state X_{n}\in\mathcal{X} over a series of epochs n=1,2,\dotsc (corresponding, for example, to generations in an evolutionary context). In turn, this population state is determined by a recursive update rule – or algorithm – that defines the dynamics in question. Depending on the manner of "best-responding" and the signal that the agents are best-responding to, we have the following dynamics.

Fictitious play. 

Dating back to Brown [8] and Robinson [41], fictitious play (FP) posits that agents best-respond to the empirical frequency (or "time average") \bar{X}_{n}=(1/n)\sum_{k=1}^{n}X_{k} of their population state. Concretely, this means that the population state evolves according to the process

X_{n+1}\in\operatorname{\mathsf{BR}}(\bar{X}_{n})=\operatorname*{arg\,max}\nolimits_{x\in\mathcal{X}}\langle F(\bar{X}_{n}),x\rangle. (FP)

In terms of empirical frequencies, we have the recursive dynamics

\bar{X}_{n+1}-\bar{X}_{n}\in\frac{1}{n+1}[\operatorname{\mathsf{BR}}(\bar{X}_{n})-\bar{X}_{n}]. (11)

In contrast to the sequence of actual population states X_{n}\in\mathcal{X}, n=1,2,\dotsc, the stationary points of the time-averaged process (11) are straightforward to characterize. Indeed, any fixed point x^{\ast} of (11) satisfies

0\in\operatorname{\mathsf{BR}}(x^{\ast})-x^{\ast}, (12)

i.e., x^{\ast} is stationary under (11) if and only if it is an equilibrium of \mathcal{G}.
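A discrete-time sketch of (FP), using the update (11), might look as follows; the rock-paper-scissors matching game is an illustrative assumption, and since this game is monotone the empirical frequencies should (if slowly) spiral toward the unique equilibrium.

```python
import numpy as np

# Fictitious play (FP): best-respond to the empirical frequency X_bar_n and
# update the running average as in (11).  Illustrative RPS matching game.

M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

def best_response(x):
    b = np.zeros(len(x))
    b[np.argmax(F(x))] = 1.0        # a vertex of BR(x); ties broken arbitrarily
    return b

x_bar = np.array([1.0, 0.0, 0.0])   # initial empirical state
for n in range(1, 100_000):
    X_next = best_response(x_bar)
    x_bar += (X_next - x_bar) / (n + 1)
print(x_bar)                        # should be close to (1/3, 1/3, 1/3)
```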

Regularized fictitious play. 

A variant of the original fictitious play algorithm is the "regularized" version in which best responses are replaced by regularized best responses. In particular, given an underlying regularizer h on \mathcal{X} and a corresponding regularization weight \varepsilon>0, the regularized fictitious play (RFP) process is defined as

X_{n+1}=\operatorname{\mathsf{BR}}_{\varepsilon h}(\bar{X}_{n})=\operatorname*{arg\,max}\nolimits_{x\in\mathcal{X}}V_{\varepsilon h}(\bar{X}_{n},x) (RFP)

with

V_{\varepsilon h}(x^{\prime},x)=\langle F(x^{\prime}),x\rangle-\varepsilon h(x). (13)

Then, as before, in terms of empirical frequencies, we have

\bar{X}_{n+1}-\bar{X}_{n}\in\frac{1}{n+1}[\operatorname{\mathsf{BR}}_{\varepsilon h}(\bar{X}_{n})-\bar{X}_{n}]. (14)

As in the case of (FP), the stationary points of the time-averaged dynamics (14) are given by the fixed point equation

0\in\operatorname{\mathsf{BR}}_{\varepsilon h}(x^{\ast})-x^{\ast}, (15)

i.e., they are precisely the \varepsilon-regularized equilibria of \mathcal{G} (cf. Definition 2.2).

Remark.

In the literature, this process is sometimes referred to as "smooth" or "perturbed" fictitious play, cf. Fudenberg & Levine [14], Hofbauer & Sandholm [18] and references therein. This difference in terminology is owed to a different set of assumptions for h, which gives rise to interior-valued choice maps Q\colon\mathbb{R}^{\mathcal{A}}\to\mathcal{X}.

Vanishingly regularized fictitious play. 

When the regularization weight in (RFP) decreases over time, we obtain the vanishingly regularized fictitious play (VRFP) dynamics of Benaïm & Faure [3], viz.

X_{n+1}=\operatorname{\mathsf{BR}}_{\varepsilon_{n}h}(\bar{X}_{n})=\operatorname*{arg\,max}\nolimits_{x\in\mathcal{X}}V_{\varepsilon_{n}h}(\bar{X}_{n},x) (VRFP)

with \varepsilon_{n}>0 for all n and \varepsilon_{n}\to 0 as n\to\infty. In terms of empirical frequencies, we then have

\bar{X}_{n+1}-\bar{X}_{n}\in\frac{1}{n+1}[\operatorname{\mathsf{BR}}_{\varepsilon_{n}h}(\bar{X}_{n})-\bar{X}_{n}]. (16)

Because this process is non-autonomous (\varepsilon_{n} depends explicitly on n), it is no longer meaningful to discuss its rest points.

Dual averaging. 

Dually to the above, instead of best-responding to the aggregate history of play (perfectly or approximately), we can also consider the case where agents play a regularized best response to their aggregate payoff. In our context, this gives rise to the dual averaging dynamics

X_{n+1}=Q_{h}\left(\eta_{n}\sum_{k=1}^{n}F(X_{k})\right), (17)

with Q_{h} defined in (7a). Then, in iterative form, we get the update rule

S_{n}=S_{n-1}+F(X_{n}),\qquad X_{n+1}=Q_{h}(\eta_{n}S_{n}), (DA)

where \eta_{n}>0 is a "learning rate" parameter in the spirit of Nesterov [40]. In the recursive formulation (DA) of the dynamics, the initialization S_{0}=0 is the most common one but, otherwise, it can be arbitrary.
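A sketch of (DA) with the entropic regularizer (so that Q_{h} is the logit map (10)) and the illustrative choice \eta_{n}=1/\sqrt{n} is given below; the game, the nonzero initialization of the score vector, and the learning-rate schedule are all assumptions made for the example.

```python
import numpy as np

# Dual averaging (DA): accumulate payoffs in a score vector S and play
# X_{n+1} = Q_h(eta_n * S_n), here with Q_h = logit choice.  Illustrative RPS game.

def logit(v):
    z = np.exp(v - v.max())
    return z / z.sum()

M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

S = np.array([1.0, 0.0, 0.0])        # arbitrary initial score (need not be 0)
avg = np.zeros(3)
for n in range(1, 20_000):
    X = logit(S / np.sqrt(n))        # X_n = Q_h(eta S), eta = 1/sqrt(n)
    S += F(X)                        # S_n = S_{n-1} + F(X_n)
    avg += (X - avg) / n             # running time average of play
print(avg)                           # should drift toward (1/3, 1/3, 1/3)
```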

Remark.

In the literature on online learning and online optimization, the process (DA) is known as "follow the regularized leader" (FTRL) [48, 49]. The terminology "dual averaging" is due to Nesterov [40] and is more common in the theory of variational inequalities and offline convex optimization. We adopt the latter to highlight the link of our work to variational inequalities.

Links between the dynamics: the case of random matching. 

To illustrate the relations between the various learning processes discussed above, it will be instructive to consider nonatomic games generated by random matching [57, 45]. In such games, players are randomly matched to play a symmetric two-player game with payoff matrix M\in\mathbb{R}^{\mathcal{A}\times\mathcal{A}}, so the population's mean payoff profile is F(x)=Mx for all x\in\mathcal{X}. As a result, the dynamics (DA) may be rewritten as

X_{n+1}=Q_{h}\left(\eta_{n}\sum_{k=1}^{n}F(X_{k})\right)=Q_{h}\left(n\eta_{n}F(\bar{X}_{n})\right)=Q_{h/(n\eta_{n})}(F(\bar{X}_{n})), (18)

which immediately allows us to recover the variants of fictitious play as follows:

  (1) For (VRFP): \eta_{n}=1/(n\varepsilon_{n}) for a sequence of weights \varepsilon_{n}>0 with \lim_{n\to\infty}\varepsilon_{n}=0.

  (2) For (RFP): \eta_{n}=1/(n\varepsilon) for some fixed \varepsilon>0.

  (3) For (FP): this corresponds to the limiting cases "n\eta_{n}=\infty" or "h=0". [Of course, these parameter choices are not formally allowed in (DA), so this limit is informal.]

Remark.

The above is meaningful only when F(x) is linear in x; however, the relation between \varepsilon and \eta will be seen to underlie a large part of the sequel.
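The identity (18) is easy to verify numerically when F is linear; the sketch below does so for a randomly drawn matching matrix and an arbitrary batch of states, using the entropic regularizer (for which Q_{ch}(v)=\operatorname{\Lambda}(v/c)). All names and parameters are illustrative.

```python
import numpy as np

# Numerical check of identity (18) under random matching (linear F):
# Q_h(eta_n * sum_k F(X_k)) equals Q_{h/(n eta_n)}(F(X_bar_n)).

def logit(v):
    z = np.exp(v - v.max())
    return z / z.sum()

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))                 # arbitrary matching matrix
F = lambda x: M @ x

X = rng.dirichlet(np.ones(4), size=50)          # 50 arbitrary population states
n, eta = len(X), 0.1

lhs = logit(eta * sum(F(x) for x in X))         # dual averaging viewpoint
rhs = logit(n * eta * F(X.mean(axis=0)))        # regularized fictitious play viewpoint
print(np.allclose(lhs, rhs))                    # True
```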

3.2. Continuous-time dynamics

We now proceed to define the corresponding continuous-time dynamics for each of the processes described above.

Best reply dynamics. 

The autonomous formulation (11) of fictitious play can be seen as an Euler discretization of the best reply dynamics (BRD) of Gilboa & Matsui [15], namely

\dot{\mu}_{t}\in\operatorname{\mathsf{BR}}(\mu_{t})-\mu_{t}. (BRD)

In the above, \mu_{t}\in\mathcal{X} is the continuous-time analogue of the empirical mean process \bar{X}_{n} in discrete time; we use the notation \mu instead of x to highlight this link. Clearly, the rest points of (BRD) are again described by the fixed point equation (12), i.e., \mu^{\ast} is stationary under (BRD) if and only if it is an equilibrium of \mathcal{G}.

Regularized best reply dynamics. 

Working as above, the regularized best reply dynamics (RBRD) are defined as

\dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon h}(\mu_{t})-\mu_{t}. (RBRD)

As in the case of (RFP), the rest points of (RBRD) are characterized by the fixed point equation (15), i.e., \mu^{\ast} is stationary under (RBRD) if and only if it is an \varepsilon-regularized equilibrium of \mathcal{G}.

Example 3.1 (The logit dynamics).

Building on the logit choice map discussed in Section 2 (cf. Example 2.1), the associated regularized best reply dynamics are known as logit dynamics, and are given by

\dot{\mu}_{t}=\operatorname{\Lambda}\left(F(\mu_{t})/\varepsilon\right)-\mu_{t}. (19)

Vanishing regularized best reply dynamics. 

If we allow the regularization weight \varepsilon to vary in (RBRD), we obtain the non-autonomous dynamics

\dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon_{t}h}(\mu_{t})-\mu_{t}, (VBRD)

where the "V" indicates again that \varepsilon_{t}\to 0 as t\to\infty.

The dual averaging dynamics. 

Finally, given a regularizer h on \mathcal{X} as above, the dynamics of dual averaging in continuous time can be written as

\dot{y}_{t}=F(x_{t}),\qquad x_{t}=Q_{h}(\eta_{t}y_{t}), (DAD)

where \eta_{t}\geq 0 is a time-varying learning parameter, which we assume throughout to be non-increasing. More compactly, the above can also be written as

\dot{y}_{t}=F(Q_{h}(\eta_{t}y_{t})) (20a)

or

x_{t}=Q_{h}\left(\eta_{t}\int_{0}^{t}F(x_{s})\,ds\right), (20b)

depending on whether we take a differential- or integral-based viewpoint. Specifically, we note that (20a) is an autonomous ordinary differential equation evolving in the dual, payoff space of the game; by contrast, (20b) is an integral equation stated directly in the primal, strategy space of the game. We also note that, unlike (BRD), (RBRD) and (VBRD), the dynamics (DAD) are stated in terms of the actual population state x_{t} at time t, not its empirical mean \mu_{t} (which is the driving variable of the previous dynamics).

Example 3.2.

If \eta_{t}\equiv 1 and h is the entropic regularizer on \mathcal{X} (cf. Example 2.1), the dynamics (DAD) yield the replicator dynamics of Taylor & Jonker [55], viz.

\dot{x}_{\alpha,t}=x_{\alpha,t}[u_{\alpha}(x_{t})-\langle F(x_{t}),x_{t}\rangle]. (RD)

In words, (RD) indicates that the per capita growth rate of the population of \alpha-strategists is proportional to the difference between the payoff that they experience and the mean population payoff. For a detailed presentation of this derivation in different contexts, see Rustichini [43], Sorin [50, 51], Mertikopoulos & Moustakas [31] and references therein. \lozenge
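A simple Euler integration of (RD) for an illustrative rock-paper-scissors matching game is sketched below; in this monotone (but not strictly monotone) game the trajectory keeps cycling around the interior equilibrium while its time average approaches it, in line with Proposition 4.1 below. The step size and horizon are arbitrary choices for the example.

```python
import numpy as np

# Euler discretization of the replicator dynamics (RD) for an RPS matching game.
M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

x = np.array([0.6, 0.3, 0.1])
dt, T = 1e-3, 100.0
steps = int(T / dt)
avg = np.zeros(3)
for _ in range(steps):
    x = x + dt * x * (F(x) - F(x) @ x)      # x_dot_a = x_a [u_a(x) - <F(x), x>]
    x = np.clip(x, 0, None); x /= x.sum()   # re-normalize to guard against drift
    avg += x / steps
print(x, avg)   # x keeps cycling (up to discretization error); avg ~ (1/3, 1/3, 1/3)
```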

Example 3.3.

If \eta_{t}\equiv 1 and h(x)=(1/2)\sum_{\alpha\in\mathcal{A}}x_{\alpha}^{2}, the integral form (20b) of (DAD) becomes

x_{t}=\operatorname{\Pi}\left(\int_{0}^{t}F(x_{s})\,ds\right) (21)

where \operatorname{\Pi} denotes the Euclidean projector on \mathcal{X}. The differential form of (21) is more delicate to describe because x_{t} may enter and exit different faces of \mathcal{X} in perpetuity. However, if we focus on an open interval \mathcal{T}\subseteq\mathbb{R} over which the support of x_{t} remains constant, it can be shown that (21) follows the projection dynamics of Friedman [12], viz. for all t\in\mathcal{T} we have

\dot{x}_{\alpha,t}=\begin{cases}u_{\alpha}(x_{t})-\frac{1}{\lvert\operatorname{supp}(x_{t})\rvert}\sum_{\beta\in\operatorname{supp}(x_{t})}u_{\beta}(x_{t})&\quad\text{if $\alpha\in\operatorname{supp}(x_{t})$},\\ 0&\quad\text{otherwise}.\end{cases} (PD)

For the details of this derivation, see Mertikopoulos & Sandholm [32]. \lozenge

Remark.

For posterity, we note that both regularizers used in the above examples satisfy (StrCvx), (Diff), and the technical condition (Rec) that we describe later in the paper.

Random matching in continuous time. 

Following Hofbauer et al. [23], the various dynamics presented so far can be linked as follows when F is linear and \mu_{t}=\frac{1}{t}\int_{0}^{t}x_{s}\,ds for t>0:

x_{t}=\operatorname*{arg\,max}_{x\in\mathcal{X}}\left\{\eta_{t}\int_{0}^{t}\langle F(x_{s}),x\rangle\,ds-h(x)\right\}=\operatorname*{arg\,max}_{x\in\mathcal{X}}\left\{\eta_{t}\,t\,\langle F(\mu_{t}),x\rangle-h(x)\right\}=\operatorname{\mathsf{BR}}_{[1/(t\eta_{t})]h}(\mu_{t}). (22)

It thus follows that the empirical distribution of play \mu_{t} under (DAD) follows the dynamics

\dot{\mu}_{t}=\frac{\operatorname{\mathsf{BR}}_{\varepsilon_{t}h}(\mu_{t})-\mu_{t}}{t} (23)

with \varepsilon_{t}=1/(t\eta_{t}). Hence, after the change of time t\leftarrow\log t, we get (VBRD) with a variable regularization parameter \varepsilon_{t}=1/(t\eta_{t}).

4. Analysis and results in continuous time

We now proceed to present our results for the class of dynamics under study. We begin with the continuous-time analysis, which provides a template for the discrete-time analysis in the next section. We prove that, under assumptions depending on the dynamical system and the class of games under consideration (potential, monotone, or strictly monotone), the dynamics converge to the set of equilibria – or regularized equilibria – of the game.[1]

[1] Recall that a family (x_{t})_{t\geq 0} (resp. a sequence (X_{n}), n=1,2,\dotsc) converges to a set B if the set of accumulation points \mathbf{A}[x_{t}] (resp. \mathbf{A}[X_{n}]), as t\to\infty (resp. n\to\infty), satisfies \mathbf{A}[x_{t}]\subset B (resp. \mathbf{A}[X_{n}]\subset B).

4.1. Best reply dynamics

Our first result concerns the best reply dynamics (BRD). Albeit slightly more general, the results of this subsection and the next essentially follow Hofbauer & Sandholm [18, 19].

Theorem 4.1.

Suppose that one of the following holds:

  (a) \mathcal{G} is potential.

  (b) \mathcal{G} is monotone and F is C^{1}-smooth.

Then every solution orbit \mu_{t} of (BRD) converges to \operatorname{Eq}(\mathcal{G}).

Proof.

Both cases rely on establishing a suitable Lyapunov function E(t) for (BRD):

  (1) In the potential case, we simply take E(t)=P(\mu_{t}).

  (2) In the monotone case, E(t)=\operatorname{Gap}(\mu_{t}), where \operatorname{Gap} denotes the function

\operatorname{Gap}(x)\coloneqq\max_{x^{\prime}\in\mathcal{X}}\langle F(x),x^{\prime}-x\rangle=\langle F(x),\operatorname{\mathsf{BR}}(x)-x\rangle. (24)

Observe that \operatorname{Gap}(x)\geq 0, with equality if and only if x\in\operatorname{Eq}(\mathcal{G}).

We now proceed on a case-by-case basis:

Potential case. 

From (Pot) we obtain that

\frac{d}{dt}P(\mu_{t})=\lim_{\delta\to 0^{+}}[P(\mu_{t}+\delta\dot{\mu}_{t})-P(\mu_{t})]/\delta=\langle F(\mu_{t}),\dot{\mu}_{t}\rangle. (25)

Since \dot{\mu}_{t}\in\operatorname{\mathsf{BR}}(\mu_{t})-\mu_{t}, it follows that \frac{d}{dt}P(\mu_{t})=\operatorname{Gap}(\mu_{t})\geq 0, with equality if and only if \dot{\mu}_{t}=0. Consequently, from Lyapunov's first method [see e.g., 20, Theorem 2.6.1] we deduce that \mu_{t} converges to the set of rest points of (BRD), which is \operatorname{Eq}(\mathcal{G}).

Monotone case. 

Let JF denote the Jacobian of F. By the envelope theorem we get:

\frac{d}{dt}\operatorname{Gap}(\mu_{t})=\langle F(\mu_{t}),-\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})(\operatorname{\mathsf{BR}}(\mu_{t})-\mu_{t})=-\operatorname{Gap}(\mu_{t})+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}. (26)

As \mathcal{G} is monotone, (x^{\prime}-x)JF(x)(x^{\prime}-x)\leq 0 for all x,x^{\prime}\in\mathcal{X}, so \dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq 0. Consequently:

\frac{d}{dt}\operatorname{Gap}(\mu_{t})\leq-\operatorname{Gap}(\mu_{t}). (27)

Hence \operatorname{Gap}(\mu_{t}) decreases exponentially to 0, so \mu_{t} converges to \operatorname{Eq}(\mathcal{G}). ∎
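The monotone case of this argument is easy to visualize numerically: along an Euler discretization of (BRD) in a monotone game, the gap function (24) should decay, in line with the differential inequality (27). The sketch below (an illustration only, with an RPS matching game and an arbitrary step size) prints the gap along such a discretized trajectory.

```python
import numpy as np

# Euler discretization of (BRD) and the gap function (24) in a monotone game.
M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

def best_response(x):
    b = np.zeros(len(x)); b[np.argmax(F(x))] = 1.0
    return b

def gap(x):
    return F(x).max() - F(x) @ x        # Gap(x) = <F(x), BR(x) - x>

mu = np.array([0.7, 0.2, 0.1])
dt = 1e-2
for k in range(2000):
    if k % 500 == 0:
        print(k, gap(mu))               # the printed gap values should decay
    mu = mu + dt * (best_response(mu) - mu)
print(gap(mu), mu)                      # mu should approach (1/3, 1/3, 1/3)
```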

4.2. Regularized best reply dynamics

In this section we will assume that the regularizer h satisfies (Diff). In a slight abuse of notation, we will suppress the dependence on h and write F_{\varepsilon} for F_{\varepsilon h}\coloneqq F-\varepsilon\widetilde{\nabla}h, see (9), with \widetilde{\nabla}h defined as in (Diff). We also write \operatorname{\mathsf{BR}}_{\varepsilon}(x) for \operatorname{\mathsf{BR}}_{\varepsilon h}(x), and so on. The reason is that the regularizer h will be fixed throughout, while the weight \varepsilon will be allowed to vary in the next subsection.

Lemma 2.

Under (Diff), we have \langle F_{\varepsilon}(x),\operatorname{\mathsf{BR}}_{\varepsilon}(x)-x\rangle\geq 0 for all x\in\mathcal{X}, and equality holds if and only if x=\operatorname{\mathsf{BR}}_{\varepsilon}(x).

Proof.

Let \phi(z)=\langle F(x),z\rangle-\varepsilon h(z). Then \phi is strictly concave, so \partial\phi defines a strictly monotone operator. Thus, x^{\prime}=\operatorname*{arg\,max}_{z\in\mathcal{X}}\phi(z) is a solution of the Minty variational inequality associated to \partial\phi: for all d\in\partial\phi(x), we have \langle d,x^{\prime}-x\rangle\geq 0, with equality if and only if x=x^{\prime}. As F_{\varepsilon}(x)\in\partial\phi(x) and x^{\prime}=\operatorname{\mathsf{BR}}_{\varepsilon}(x), the result follows. ∎

Lemma 3.

Suppose that (Diff) holds. If the game \mathcal{G} with payoff field F is potential (resp. monotone), then the game \mathcal{G}_{\varepsilon} with payoff field F_{\varepsilon} is potential (resp. strictly monotone).

Proof.

If \mathcal{G} has a potential P, then P-\varepsilon h is a potential function of \mathcal{G}_{\varepsilon}. If \mathcal{G} is a monotone game, the game \mathcal{G}_{\varepsilon} is strictly monotone because we have \langle\widetilde{\nabla}h(x)-\widetilde{\nabla}h(x^{\prime}),x-x^{\prime}\rangle\geq 0 for every x,x^{\prime}\in\mathcal{X}, with equality if and only if x=x^{\prime}. ∎

We now turn to the asymptotic properties of regularized best reply dynamics.

Theorem 4.2.

Suppose that (Diff) is satisfied and that one of the following holds:

  (a) \mathcal{G} is potential.

  (b) \mathcal{G} is monotone and F is C^{1}-smooth.

Then every solution orbit \mu_{t} of (RBRD) converges to \operatorname{Eq}_{\varepsilon}(\mathcal{G}).

Remark.

Observe that when \mathcal{G} is monotone, \mathcal{G}_{\varepsilon} is strictly monotone, so \operatorname{Eq}_{\varepsilon}(\mathcal{G}) is a singleton.

Proof.

As above, both cases rely on establishing a suitable Lyapunov function E(t) for (RBRD):

  (1) In the potential case: E(t)=P_{\varepsilon}(\mu_{t})\coloneqq P(\mu_{t})-\varepsilon h(\mu_{t}).

  (2) In the monotone case:

E(t)=\operatorname{Gap}_{\varepsilon}(\mu_{t})\coloneqq\max_{x^{\prime}\in\mathcal{X}}\bigl[\langle F(\mu_{t}),x^{\prime}-\mu_{t}\rangle-\varepsilon(h(x^{\prime})-h(\mu_{t}))\bigr]=\max_{x^{\prime}\in\mathcal{X}}V_{\varepsilon}(\mu_{t},x^{\prime})-V_{\varepsilon}(\mu_{t},\mu_{t}). (28)

Potential case. 

There exists z(t)\in\partial h(\mu_{t}) such that – similarly to (25) – we have:

\frac{d}{dt}P_{\varepsilon}(\mu_{t})=\lim_{\delta\to 0^{+}}[P_{\varepsilon}(\mu_{t}+\delta\dot{\mu}_{t})-P_{\varepsilon}(\mu_{t})]/\delta=\langle F(\mu_{t}),\dot{\mu}_{t}\rangle-\varepsilon\langle z(t),\dot{\mu}_{t}\rangle. (29)

As \dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon}(\mu_{t})-\mu_{t}, Lemma A.1 in the appendix gives \langle z(t),\dot{\mu}_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle. Consequently,

\frac{d}{dt}P_{\varepsilon}(\mu_{t})\geq\langle F(\mu_{t}),\dot{\mu}_{t}\rangle-\varepsilon\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle=\langle F_{\varepsilon}(\mu_{t}),\dot{\mu}_{t}\rangle. (30)

By Lemma 2 and the above, we deduce that \frac{d}{dt}P_{\varepsilon}(\mu_{t})\geq 0, with equality if and only if \dot{\mu}_{t}=0. Consequently, by a Lyapunov argument, \mu_{t} converges to \operatorname{Eq}_{\varepsilon}(\mathcal{G}).

Monotone case. 

Using the envelope theorem as in (26) and, just as above, there exists z(t)\in\partial h(\mu_{t}) such that:

\frac{d}{dt}\operatorname{Gap}_{\varepsilon}(\mu_{t})=-\langle F(\mu_{t})-\varepsilon z(t),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq-\langle F_{\varepsilon}(\mu_{t}),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}. (31)

The inequality is a consequence of Lemma A.1, which implies that \langle z(t),\dot{\mu}_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle. Since \mathcal{G} is monotone, \dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq 0. Consequently, as \dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon}(\mu_{t})-\mu_{t}, Lemma 2 yields \frac{d}{dt}\operatorname{Gap}_{\varepsilon}(\mu_{t})\leq 0, with equality if and only if \dot{\mu}_{t}=0. Hence \mu_{t} converges to \operatorname{Eq}_{\varepsilon}(\mathcal{G}). ∎

4.3. Vanishing regularized best reply dynamics

A similar technique can be used to obtain convergence to the set of Nash equilibria for the vanishing regularized best reply dynamics. The additional difficulty here comes from the fact that these dynamics are not autonomous, so the Lyapunov convergence argument must be established by hand. To the best of our knowledge, this is a new result.

Theorem 4.3.

Suppose that (Diff) holds and that (VBRD) is run with a smoothly-varying and strictly decreasing regularization weight \varepsilon_{t}\to 0. Assume further that h takes values in [0,1], and that one of the following holds:

  (a) \mathcal{G} is potential.

  (b) \mathcal{G} is monotone, F is C^{1}-smooth, and \widetilde{\nabla}h is bounded.

Then every solution orbit \mu_{t} of (VBRD) converges to \operatorname{Eq}(\mathcal{G}).

Remark.

The assumption that h takes values in [0,1] is made only for notational convenience and does not incur any loss of generality.

Proof.

As before, we treat the potential and monotone cases separately.

Potential case. 

Let P_{\varepsilon_{t}}(\mu_{t})=P(\mu_{t})-\varepsilon_{t}h(\mu_{t}). Then, a calculation similar to the above shows that there exists z(t)\in\partial h(\mu_{t}) such that

\frac{d}{dt}P_{\varepsilon_{t}}(\mu_{t})=\langle F(\mu_{t})-\varepsilon_{t}z(t),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle-\dot{\varepsilon}_{t}h(\mu_{t})\geq\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle-\dot{\varepsilon}_{t}h(\mu_{t}). (32)

The inequality is a consequence of Lemma A.1, which implies that \langle z(t),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle. By Lemma 2, \langle F_{\varepsilon_{t}}(x),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(x)-x\rangle\geq 0. Since -\dot{\varepsilon}_{t}h(\mu_{t})\geq 0, we deduce that \frac{d}{dt}P_{\varepsilon_{t}}(\mu_{t})\geq 0, and consequently P_{\varepsilon_{t}}(\mu_{t}) converges. As \varepsilon_{t} goes to zero and h(\cdot) is bounded, P(\mu_{t}) converges as well. We will deduce from this that \mu_{t} converges to \operatorname{Eq}(\mathcal{G}). By contradiction, if a limit point \mu_{0}=\lim_{k}\mu_{t_{k}} is not in \operatorname{Eq}(\mathcal{G}), then:

\operatorname{Gap}(\mu_{0})=\max_{x^{\prime}\in\mathcal{X}}\langle F(\mu_{0}),x^{\prime}-\mu_{0}\rangle=\langle F(\mu_{0}),\operatorname{\mathsf{BR}}(\mu_{0})-\mu_{0}\rangle=\alpha>0. (33)

Recall that, by definition, for all t and \mu:

\operatorname{Gap}_{\varepsilon_{t}}(\mu)=\max_{x^{\prime}\in\mathcal{X}}\bigl[\langle F(\mu),x^{\prime}-\mu\rangle-\varepsilon_{t}(h(x^{\prime})-h(\mu))\bigr]=\langle F(\mu),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu)-\mu\rangle-\varepsilon_{t}(h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu))-h(\mu))=\max_{x^{\prime}\in\mathcal{X}}V_{\varepsilon_{t}}(\mu,x^{\prime})-V_{\varepsilon_{t}}(\mu,\mu). (34)

Also, as h takes values in [0,1], \lvert V_{\varepsilon}(x,x^{\prime})-V_{\varepsilon}(x,x)-(V_{\varepsilon^{\prime}}(x,x^{\prime})-V_{\varepsilon^{\prime}}(x,x))\rvert\leq\lvert\varepsilon-\varepsilon^{\prime}\rvert. Thus:

\lvert\operatorname{Gap}_{\varepsilon_{t}}(\mu)-\operatorname{Gap}(\mu)\rvert\leq\varepsilon_{t}. (35)

As in (25), one has for all k and s:

\frac{d}{dt}P(\mu_{s+t_{k}})=\langle F(\mu_{s+t_{k}}),\operatorname{\mathsf{BR}}_{\varepsilon_{s+t_{k}}}(\mu_{s+t_{k}})-\mu_{s+t_{k}}\rangle. (36)

Thus, using (34) and then (35), we deduce that:

\frac{d}{dt}P(\mu_{s+t_{k}})\geq\operatorname{Gap}_{\varepsilon_{s+t_{k}}}(\mu_{t_{k}+s})-\varepsilon_{s+t_{k}}\geq\operatorname{Gap}(\mu_{t_{k}+s})-2\varepsilon_{s+t_{k}}. (37)

As the function \mu\mapsto\operatorname{Gap}(\mu) is uniformly continuous and t\mapsto\mu_{t} is Lipschitz, recalling that \operatorname{Gap}(\mu_{t_{k}})\to\alpha>0, we deduce that there exist \delta>0 and K_{1} such that for all k\geq K_{1} we have \operatorname{Gap}(\mu_{t_{k}})\geq 3\alpha/4 and, for all 0\leq s\leq\delta:

\operatorname{Gap}(\mu_{t_{k}+s})\geq\operatorname{Gap}(\mu_{t_{k}})-\alpha/4\geq\alpha/2. (38)

As \varepsilon_{t} decreases to 0, there is K_{2} such that \varepsilon_{s+t_{k}}\leq\alpha/8 for all k\geq K_{2}. From this, (37) and (38), we obtain that \frac{d}{dt}P(\mu_{s+t_{k}})\geq\alpha/4 for all 0\leq s\leq\delta and all k\geq\max(K_{1},K_{2}). Thus P(\mu_{\delta+t_{k}})-P(\mu_{t_{k}})=\int_{0}^{\delta}\frac{d}{dt}P(\mu_{s+t_{k}})\,ds\geq\delta\alpha/4, which contradicts the convergence of P(\mu_{t}).

Monotone case. 

Using the envelope theorem, a calculation similar to the above shows that there exists z(t)\in\partial h(\mu_{t}) such that

\frac{d}{dt}\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})=-\langle F(\mu_{t})-\varepsilon_{t}z(t),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))) (39a)
\leq-\langle F_{\varepsilon_{t}}(\mu_{t}),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))) (39b)
\leq-\langle F_{\varepsilon_{t}}(\mu_{t}),\dot{\mu}_{t}\rangle+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))) (39c)
=-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))). (39d)

The first inequality is a consequence of Lemma A.1, which implies that \langle z(t),\dot{\mu}_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle. The second inequality is a consequence of \dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq 0, as \mathcal{G} is monotone.

We now proceed to prove that \frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})<0.

  Case 1: If \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})=0, then \mu_{t}=\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}), i.e., \mu_{t}\in\operatorname{Eq}_{\varepsilon_{t}}(\mathcal{G}), and thus \dot{\mu}_{t}=0. Then (39a) implies that \frac{d}{dt}\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})=0, and thus \frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})<0.

  Case 2: If \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})>0, then, since h(\cdot)\in[0,1] and \dot{\varepsilon}_{t}<0, we have:

\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})))\leq-\dot{\varepsilon}_{t}, (40)

and from Lemma 2, (39d) and (40) we deduce that \frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})<0.

Consequently, \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t} converges to some \delta\geq 0, and since \varepsilon_{t} decreases to zero we conclude that \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})\to\delta. If \delta>0, define:

\Delta\coloneqq-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle=-\langle F(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle+\varepsilon_{t}\langle\widetilde{\nabla}h(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle. (41)

From (34) we further get:

\Delta=-\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})))+\varepsilon_{t}\langle\widetilde{\nabla}h(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle. (42)

As \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})\to\delta and, by assumption, h(\cdot) and \widetilde{\nabla}h are bounded while \varepsilon_{t} decreases to 0, we deduce that for t large enough:

\Delta=-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle\leq-\delta/2. (43)

Consequently, for t large enough, we deduce from (39d), (40) and the last inequality that

\frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})\leq-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle\leq-\delta/2, (44)

which is impossible because \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t}\geq 0. Hence \delta=0; by (35), \operatorname{Gap}(\mu_{t})\to 0 as well, and so \mu_{t} converges to \operatorname{Eq}(\mathcal{G}). ∎

4.4. Dual averaging dynamics with a constant learning rate

Before stating our main result, we establish some useful lemmas. Define the regret associated to a trajectory x_{t} and a reference point x\in\mathcal{X} as:

\operatorname{Reg}_{x}(t)=\int_{0}^{t}\langle F(x_{s}),x-x_{s}\rangle\,ds. (45)
Lemma 4.

Let x_{t} be a solution trajectory of (DAD) with \eta_{t}\equiv 1. Then, for all x\in\mathcal{X}:

\operatorname{Reg}_{x}(t)\leq\max h-\min h. (46)

This is a particular case of a more general bound due to Kwon & Mertikopoulos [26] (see Lemma 7 below). The more compact proof given here for this simpler bound follows Bravo & Mertikopoulos [7] and Sorin [52].

Proof.

Define, for x\in\mathcal{X} and y\in\mathbb{R}^{\mathcal{A}}, the Fenchel coupling

F_{h}(x,y)=h(x)+h^{\ast}(y)-\langle x,y\rangle. (47)

By the Fenchel-Young inequality, F_{h}(x,y)\geq 0. Recall from (8) and (20b) that x_{t}=\nabla h^{\ast}(y_{t})=Q_{h}(y_{t}) with y_{t}=\int_{0}^{t}F(x_{s})\,ds. Then, following Mertikopoulos & Zhou [35, Lemma A.2], a simple derivation gives, for all x\in\mathcal{X}:

\frac{d}{dt}F_{h}(x,y_{t})=\langle F(x_{t}),\nabla h^{\ast}(y_{t})-x\rangle=\langle F(x_{t}),x_{t}-x\rangle=-\frac{d}{dt}\operatorname{Reg}_{x}(t). (48)

Consequently, following the reasoning of Sorin [52], we get

\operatorname{Reg}_{x}(t)\leq F_{h}(x,y_{0})=F_{h}(x,0)=h(x)+h^{\ast}(0)\leq\max h+h^{\ast}(0)=\max h-\min h. ∎
Lemma 5.

If a solution trajectory x_{t}=x^{\ast} of (DAD) is stationary, then x^{\ast}\in\operatorname{Eq}(\mathcal{G}).

Proof.

From the previous lemma, for all t>0 and x\in\mathcal{X} one has

\langle F(x^{\ast}),x-x^{\ast}\rangle=\frac{\operatorname{Reg}_{x}(t)}{t}\leq\frac{\max h-\min h}{t}. (49)

Letting t\to\infty yields \langle F(x^{\ast}),x-x^{\ast}\rangle\leq 0 for all x\in\mathcal{X}, i.e., x^{\ast}\in\operatorname{Eq}(\mathcal{G}), as was to be shown. ∎

Lemma 6.

If h satisfies (StrCvx), then for all x\in\mathcal{X} and all sequences y_{n} in \mathbb{R}^{\mathcal{A}}:

F_{h}(x,y_{n})\to 0\implies Q_{h}(y_{n})\to x. (50)
Proof.

See Lemma A.2 in the appendix. ∎

We also need to assume the so-called "reciprocity" condition, which requires that for all x\in\mathcal{X} and all sequences y_{n} in \mathbb{R}^{\mathcal{A}}:

Q_{h}(y_{n})\to x\implies F_{h}(x,y_{n})\to 0. (Rec)

Taken together, (StrCvx) and (Rec) imply that the sequence x_{n}=Q_{h}(y_{n}) converges to x\in\mathcal{X} if and only if F_{h}(x,y_{n})\to 0. The terminology "reciprocity" is justified by the fact that, under (Rec), the topology induced by the level sets of the Fenchel coupling becomes equivalent to the ordinary topology on \mathcal{X}; for a precise statement, see Lemma A.2 in Appendix A. This is a subtle – but important – requirement, which is fortunately satisfied by all the standard regularization choices on \mathcal{X}, cf. [34, 35].
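For the entropic regularizer of Example 2.1, for instance, a standard computation identifies the Fenchel coupling (47) with the Kullback-Leibler divergence, F_{h}(x,y)=\operatorname{KL}(x\,\|\,\operatorname{\Lambda}(y)), which makes both the implication of Lemma 6 (via Pinsker's inequality) and (Rec) easy to check in this particular case. The short sketch below is only a numerical sanity check of this identity; the sampled points and names are illustrative.

```python
import numpy as np

# Sanity check: for h(x) = sum_a x_a log x_a, F_h(x, y) = KL(x || Lambda(y)).
def logit(v):
    z = np.exp(v - v.max())
    return z / z.sum()

def fenchel_coupling(x, y):
    h = np.sum(x * np.log(x))                 # h(x), with x assumed interior
    h_star = np.log(np.sum(np.exp(y)))        # h*(y) = log sum_a exp(y_a)
    return h + h_star - x @ y

def kl(x, q):
    return np.sum(x * np.log(x / q))

rng = np.random.default_rng(2)
x = rng.dirichlet(np.ones(5))
y = rng.standard_normal(5)
print(np.isclose(fenchel_coupling(x, y), kl(x, logit(y))))   # True
```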

We can now state our convergence results for (DAD) when the learning rate is constant.

Theorem 4.4.

Let x_{t} be a solution trajectory of (DAD) with constant learning rate \eta_{t}\equiv 1. Assume further that one of the following holds:

  (a) \mathcal{G} is potential and h^{\ast} is twice differentiable.

  (b) \mathcal{G} is strictly monotone, and h satisfies (StrCvx) and (Rec).

Then x_{t} converges to \operatorname{Eq}(\mathcal{G}).

Proof.

Again, we treat the potential and monotone cases separately.

Potential case. 

As yt=0tF(xs)𝑑sy_{t}=\int_{0}^{t}F(x_{s})\>ds and xt=Qh(yt)=h(yt)x_{t}=Q_{h}(y_{t})=\nabla h^{\ast}(y_{t}) we obtain that:

\dot{x}_{t}=\nabla^{2}h^{\ast}(y_{t})\,F(x_{t}). (51)

Since h^{\ast} is convex, \nabla^{2}h^{\ast} is positive semidefinite, and so:

ddtP(xt)=F(xt),x˙t=F(xt)2h(yt)F(xt)0,\frac{d}{dt}P(x_{t})=\langle F(x_{t}),\dot{x}_{t}\rangle=F(x_{t})\nabla^{2}h^{\ast}(y_{t})F(x_{t})\geq 0, (52)

the inequality being strict whenever \dot{x}_{t}\neq 0. Hence P is nondecreasing along the trajectory and, by standard Lyapunov arguments, x_{t} converges to the set of rest points of the dynamics; since every rest point of (3.2) belongs to \operatorname{Eq}(\mathcal{G}) by Lemma 5, x_{t} converges to \operatorname{Eq}(\mathcal{G}).

Monotone case. 

As \mathcal{G} is strictly monotone, \operatorname{Eq}(\mathcal{G}) is a singleton that coincides with the game's (necessarily unique) GESS x^{\ast}. The calculation in (48) gives:

ddtFh(x,yt)=F(xt),xtx0\frac{d}{dt}F_{h}(x^{\ast},y_{t})=\langle F(x_{t}),x_{t}-x^{\ast}\rangle\leq 0 (53)

Since x^{\ast} is a GESS, \frac{d}{dt}F_{h}(x^{\ast},y_{t})\leq 0, with equality if and only if x_{t}=Q_{h}(y_{t})=x^{\ast}. Suppose there were \delta>0 such that \langle F(x_{t}),x_{t}-x^{\ast}\rangle<-\delta<0 for all t large enough; then F_{h}(x^{\ast},y_{t}) would tend to -\infty, which is impossible because F_{h}(\cdot)\geq 0. Hence there is a sequence t_{k}\to\infty such that \langle F(x_{t_{k}}),x_{t_{k}}-x^{\ast}\rangle\to 0; since x^{\ast} is a GESS and \mathcal{X} is compact, this forces x_{t_{k}}=Q_{h}(y_{t_{k}})\to x^{\ast}. By the reciprocity condition (Rec), we deduce that F_{h}(x^{\ast},y_{t_{k}})\to 0. Since \frac{d}{dt}F_{h}(x^{\ast},y_{t})\leq 0 and F_{h}(\cdot)\geq 0, the function F_{h}(x^{\ast},y_{t}) converges, and by the previous argument its limit is zero. Using the strong convexity of h, Lemma 6 then implies that x_{t}=Q_{h}(y_{t}) converges to x^{\ast}. ∎

There are several things to note here. First, the proof of convergence to the unique equilibrium in strictly monotone games (part b of Theorem 4.4) extends to all games that admit a GESS. Second, the same proof implies that, in all games, ESSs (if they exist) are asymptotically stable under (3.2) with a constant learning rate (i.e., the dynamics converge to an ESS whenever they start sufficiently close to it). Third, Theorem 4.4 concerns the “day-to-day” evolution of (3.2), not the corresponding empirical mean. Last, the theorem does not apply to monotone games that are not strictly monotone.
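To make part (b) of Theorem 4.4 concrete, the following minimal numerical sketch integrates (3.2) with a forward-Euler scheme, a constant learning rate, and the entropic choice map, in a strictly monotone single-population game (a “good” rock–paper–scissors variant in which wins outweigh losses). The payoff matrix, step size, and horizon are illustrative assumptions and not part of the analysis; one should observe the distance to the unique equilibrium (and GESS) x^{\ast}=(1/3,1/3,1/3) shrinking toward zero.

# Illustrative sketch: forward-Euler discretization of the continuous-time
# dual averaging dynamics (constant learning rate, entropic choice map)
# in a strictly monotone "good" rock-paper-scissors game.
import numpy as np

A = np.array([[0., -1., 2.],
              [2., 0., -1.],
              [-1., 2., 0.]])      # wins (+2) outweigh losses (-1): strictly monotone

def Q(y):                          # entropic choice map (logit / softmax)
    z = np.exp(y - y.max())
    return z / z.sum()

x_star = np.ones(3) / 3            # unique equilibrium (and GESS) of this game
dt, T = 0.01, 200.0
y = np.zeros(3)                    # dual "score" variable, y_0 = 0
for _ in range(int(T / dt)):
    x = Q(y)
    y += dt * (A @ x)              # Euler step for dy/dt = F(x_t) = A x_t

print("final state:", Q(y))
print("distance to x*:", np.linalg.norm(Q(y) - x_star))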

To complete the picture, the next proposition shows the convergence of the empirical mean to the set of Nash equilibria in all monotone games.

Proposition 4.1.

Let xt=Qh(yt)x_{t}=Q_{h}(y_{t}) be a solution trajectory of (3.2) with constant learning rate ηt1\eta_{t}\equiv 1. If the underlying game is monotone, then μt=1t0txs𝑑s\mu_{t}=\frac{1}{t}\int_{0}^{t}x_{s}ds converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).

Proof.

By Lemma 4 and monotonicity of the game, for all x𝒳x\in\mathcal{X}:

maxhminhRegx(t)=0tF(xs),xxs𝑑s0tF(x),xxs𝑑s\max h-\min h\geq\operatorname{Reg}_{x}(t)=\int_{0}^{t}\langle F(x_{s}),x-x_{s}\rangle\>ds\geq\int_{0}^{t}\langle F(x),x-x_{s}\rangle\>ds (54)

and hence

maxhminhtF(x),xμt.\frac{\max h-\min h}{t}\geq\langle F(x),x-\mu_{t}\rangle. (55)

Consequently, μt\mu_{t} converges to MVI(𝒢)\operatorname{MVI}(\mathcal{G}) which coincides with Eq(𝒢)\operatorname{Eq}(\mathcal{G}) as the game is monotone. ∎
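To contrast Proposition 4.1 with Theorem 4.4, one can rerun the sketch above with the standard zero-sum rock–paper–scissors matrix, which is monotone but not strictly monotone: the trajectory itself then keeps cycling around the equilibrium, while its time average approaches it, in line with Proposition 4.1. Again, all numerical choices (initialization, step size, horizon) are purely illustrative.

# Illustrative sketch: standard (zero-sum) rock-paper-scissors is monotone but
# not strictly monotone; the trajectory of (3.2) cycles, while its running
# time average approaches the equilibrium (1/3, 1/3, 1/3).
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def Q(y):                          # entropic choice map (softmax)
    z = np.exp(y - y.max())
    return z / z.sum()

x_star = np.ones(3) / 3
dt, T = 0.002, 200.0
y = np.array([1.0, 0.0, -1.0])     # start away from the equilibrium
avg = np.zeros(3)
n_steps = int(T / dt)
for _ in range(n_steps):
    x = Q(y)
    avg += x
    y += dt * (A @ x)
avg /= n_steps

print("distance of the last state to x*:", np.linalg.norm(Q(y) - x_star))
print("distance of the time average to x*:", np.linalg.norm(avg - x_star))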

4.5. Dual averaging dynamics with a variable learning rate

This subsection extends the results of the previous section on monotone games in two directions: we consider GESSs instead of (strictly) monotone games, and we treat dynamics with a variable learning rate. Other extensions are also possible, but we do not treat them here.

We begin with a technical lemma on the regret incurred by (3.2):

Lemma 7.

If xtx_{t} is a solution trajectory of (3.2) then the regret is bounded as follows:

Regx(t)maxhminhηt.\operatorname{Reg}_{x}(t)\leq\frac{\max h-\min h}{\eta_{t}}. (56)

Consequently, if tηtt\eta_{t}\to\infty and xt=xx_{t}=x^{\ast} for all t0t\geq 0, then xEq(𝒢)x^{\ast}\in\operatorname{Eq}(\mathcal{G}).

Proof.

The bound on the regret follows from [26, Theorem 4.1], and the implication follows as in Lemma 5 above. ∎

Lemma 7 allows us to extend Proposition 4.1 to variable learning rates.

Proposition 4.2.

Let xt=Qh(yt)x_{t}=Q_{h}(y_{t}) be a solution trajectory of (3.2). If the underlying game is monotone and tηtt\eta_{t}\to\infty then μt=1t0txs𝑑s\mu_{t}=\frac{1}{t}\int_{0}^{t}x_{s}\>ds converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).
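For concreteness, learning rates of the form \eta_{t}=(1+t)^{-p} with p\in(0,1) satisfy all the requirements used in this subsection: they are non-increasing, t\eta_{t}=t/(1+t)^{p}\to\infty, and

\frac{\dot{\eta}_{t}}{\eta_{t}^{2}}=-p\,(1+t)^{p-1}\to 0\quad\text{as $t\to\infty$},

the last property being the one invoked in the proof of Theorem 4.5 below.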

The next result extends part (b) of Theorem 4.4 to variable learning rates; its proof is somewhat more involved.

Theorem 4.5.

Let xtx_{t} be a solution trajectory of (3.2) with a non-increasing learning rate. Suppose further that the underlying game 𝒢\mathcal{G} admits a GESS xx^{\ast}. If the regularizer satisfies (StrCvx) and (Rec), and the dynamics’ learning rate satisfies limttηt=\lim_{t\to\infty}t\eta_{t}=\infty, then xtx_{t} converges to xx^{\ast}.

Proof.

Kwon & Mertikopoulos [26, Eq. (29)] prove that

\frac{d}{dt}\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}}\leq\langle F(x_{t}),x_{t}-x^{\ast}\rangle-\frac{\dot{\eta}_{t}}{\eta_{t}^{2}}(\max h-\min h). (57)

Let ε={y𝒜:Fh(x,y)ε}\mathcal{F}_{\varepsilon}=\{y\in\mathbb{R}^{\mathcal{A}}:F_{h}(x^{\ast},y)\leq\varepsilon\} denote the “Fenchel zone” of magnitude ε\varepsilon with respect to xx^{\ast}. Then it suffices to show that

For all ε>0\varepsilon>0, there exists T(ε)>0T(\varepsilon)>0 such that ηtytε\eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} for all tT(ε)t\geq T(\varepsilon). (P)

Indeed, (P) means that F_{h}(x^{\ast},\eta_{t}y_{t})\to 0; by (StrCvx), Lemma 6 then shows that x_{t}=Q_{h}(\eta_{t}y_{t}) converges to x^{\ast}. The proof of (P) rests on the following two claims:

Claim 1. 

ηtytε\eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} infinitely often.

Proof.

Assume, to the contrary, that \eta_{t}y_{t}\notin\mathcal{F}_{\varepsilon} for all t large enough. Then, by (Rec), there exists a neighborhood \mathcal{U}(\varepsilon) of x^{\ast} such that x_{t}\notin\mathcal{U}(\varepsilon), and hence, by the definition of a GESS, there is c(\varepsilon)>0 such that \langle F(x_{t}),x_{t}-x^{\ast}\rangle<-2c(\varepsilon). Moreover, since \dot{\eta}_{t}/\eta_{t}^{2}\to 0, there is T_{1}(\varepsilon) such that \frac{\dot{\eta}_{t}}{\eta_{t}^{2}}(\max h-\min h)<c(\varepsilon) for all t\geq T_{1}(\varepsilon). Consequently, by (57), \frac{d}{dt}\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}}<-c(\varepsilon) for all t large enough, so \frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}} would tend to -\infty. Since F_{h}(x^{\ast},\eta_{t}y_{t})\geq 0, this is impossible. ∎

Claim 2. 

There exists T2(ε)T1(ε/2)T_{2}(\varepsilon)\geq T_{1}(\varepsilon/2) such that if tT2(ε)t\geq T_{2}(\varepsilon) and ηtytε\eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} then there exists τ>0\tau>0 such that ηsysε\eta_{s}y_{s}\in\mathcal{F}_{\varepsilon} for all s[t,t+τ]s\in[t,t+\tau].

Proof.

We need to consider two cases separately:

  1. Case a).

    If \eta_{t}y_{t}\in\mathcal{F}_{\varepsilon/2}, then, by continuity, there is \tau>0 such that \eta_{s}y_{s}\in\mathcal{F}_{\varepsilon} for all s\in[t,t+\tau].

  2. Case b).

    If \eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} but \eta_{t}y_{t}\notin\mathcal{F}_{\varepsilon/2}, then, arguing as in the proof of Claim 1 (with \varepsilon/2 in place of \varepsilon), we have \frac{d}{dt}\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}}<-c(\varepsilon/2)<0 for t\geq T_{2}(\varepsilon). Thus there exists \tau>0 such that \frac{F_{h}(x^{\ast},\eta_{s}y_{s})}{\eta_{s}}\leq\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}} for all s\in[t,t+\tau]. Since \eta_{t}y_{t}\in\mathcal{F}_{\varepsilon}, we have F_{h}(x^{\ast},\eta_{t}y_{t})\leq\varepsilon, and since \eta_{s}\leq\eta_{t}, we deduce that F_{h}(x^{\ast},\eta_{s}y_{s})\leq\varepsilon. Thus \eta_{s}y_{s}\in\mathcal{F}_{\varepsilon} for all s\in[t,t+\tau]. ∎

Combining Claims 1 and 2, there exists t\geq T_{2}(\varepsilon) such that \eta_{t}y_{t} belongs to \mathcal{F}_{\varepsilon} and remains in it for all subsequent times; this establishes (P) and completes the proof. ∎

5. Analysis and results in discrete time

In this section, we continue with the analysis of the discrete-time dynamics presented in Section 3.1, starting with fictitious play and its variants in Section 5.1 before moving on to the dynamics of dual averaging in Section 5.2.

5.1. Fictitious play and regularized fictitious play

The convergence of (3.1) and (3.1) can be obtained from the continuous-time analysis of Section 4 using the theory of stochastic approximation [2, 4, 5]. Informally, this theory links the asymptotic behavior of differential inclusions of the form

x˙tM(xt)\dot{x}_{t}\in M(x_{t}) (58)

to the limit sets of discrete-time processes satisfying the basic recursion

Xn+1Xnγn+1M(Xn),X_{n+1}-X_{n}\in\gamma_{n+1}M(X_{n}), (59)

where M:𝒳𝒜M\colon\mathcal{X}\rightrightarrows\mathbb{R}^{\mathcal{A}} is an upper semicontinuous, compact-convex-valued correspondence on 𝒳\mathcal{X}, and γn>0\gamma_{n}>0 is a vanishing step-size sequence.

In our specific context, the role of MM is to be played by the best-reply correspondence 𝖡𝖱\operatorname{\mathsf{BR}} for (3.1), and its regularized variant 𝖡𝖱εh\operatorname{\mathsf{BR}}_{\varepsilon h} for (3.1). Concretely, we have:

Proposition 5.1.

Let X¯n=(1/n)k=1nXk\bar{X}_{n}=(1/n)\sum_{k=1}^{n}X_{k}, n=1,2,n=1,2,\dotsc, be the empirical frequency of play under a sequence of population states Xn𝒳X_{n}\in\mathcal{X}. Then:

  1. (i)

    If X_{n} is generated by the fictitious play recursion (3.1), then \bar{X}_{n} is an asymptotic pseudotrajectory of the best-reply dynamics (3.2).

  2. (ii)

    If X_{n} is generated by the regularized recursion (3.1), then \bar{X}_{n} is an asymptotic pseudotrajectory of the regularized best-reply dynamics (3.2).

Informally, the notion of an asymptotic pseudotrajectory (APT) means that \bar{X}_{n} asymptotically tracks the pseudo-flow of (3.2)/(3.2) with arbitrary accuracy over windows of arbitrary length; for a precise definition, see Benaïm et al. [4, Definitions I and 4.1]. The proof of Proposition 5.1 follows immediately from the general theory of Benaïm et al. [4, Propositions 1.3, 1.4, and Theorem 4.2], so we do not present it here.

In view of this “asymptotic approximation” property, it is plausible to expect that X¯n\bar{X}_{n} exhibits the same asymptotic behavior as the corresponding continuous-time dynamics. Our next result makes this intuition precise in two classes of games:

  1. (1)

    Monotone games, i.e., games that satisfy (Mon).

  2. (2)

    Potential games for which the \varepsilon-regularized potential P_{\varepsilon}=P-\varepsilon h, \varepsilon\geq 0, satisfies the Sard-type condition:

    Pε(Eqε(𝒢)) has empty interior.\text{$P_{\varepsilon}(\operatorname{Eq}_{\varepsilon}(\mathcal{G}))$ has empty interior}. (Sε)

Our main results for (3.1) and (3.1) may then be stated as follows:

Theorem 5.1.

Suppose that 𝒢\mathcal{G} satisfies (Mon) or (Sε) for ε=0\varepsilon=0. Then the empirical frequency of play X¯n\bar{X}_{n} under (3.1) converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).

Theorem 5.2.

Suppose (Diff) holds and 𝒢\mathcal{G} satisfies (Mon) or (Sε) for some ε>0\varepsilon>0. Then the empirical frequency of play X¯n\bar{X}_{n} under (3.1) converges to Eqε(𝒢)\operatorname{Eq}_{\varepsilon}(\mathcal{G}).

Sketch of proof.

The proof of Theorems 5.1 and 5.2 follows a similar technical path, starting with the fact that \bar{X}_{n} is an APT of (3.2) / (3.2) respectively (cf. Proposition 5.1 above). In both cases, Theorem 4.3 of Benaïm et al. [4] shows that the accumulation points of an APT lie in an internally chain transitive set of the underlying continuous-time dynamics (recall that an internally chain transitive set is a compact invariant set that contains no proper attractors [10]). The claims of Theorem 5.1 may then be proved as follows:

  1. (1)

    For games satisfying (Mon), invoke Theorem 4.1 from Section 4 and Theorem 3.23 and Proposition 3.25 of Benaïm et al. [4].

  2. (2)

    For games satisfying (Sε) with ε=0\varepsilon=0, invoke Theorem 4.1 from Section 4 and Proposition 3.27 of Benaïm et al. [4].

Finally, for ε>0\varepsilon>0, the proof of Theorem 5.2 is entirely analogous, and simply invokes Theorem 4.2 in lieu of Theorem 4.1 in the above chain of implications. ∎
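As a purely numerical illustration of Theorem 5.1, the following sketch runs the fictitious play recursion X_{n+1}\in\operatorname{\mathsf{BR}}(\bar{X}_{n}), \bar{X}_{n+1}=\bar{X}_{n}+(X_{n+1}-\bar{X}_{n})/(n+1) in the strictly monotone rock–paper–scissors variant used in the sketch of Section 4.4; the payoff matrix and horizon are illustrative assumptions, and the empirical frequency of play should be seen to approach the game's unique equilibrium.

# Illustrative sketch: fictitious play in a strictly monotone RPS variant.
# Each period the population plays a pure best reply to the empirical
# frequency of past play; Theorem 5.1 predicts that the empirical
# frequencies converge to the unique equilibrium (1/3, 1/3, 1/3).
import numpy as np

A = np.array([[0., -1., 2.],
              [2., 0., -1.],
              [-1., 2., 0.]])

def best_reply(x):                 # a pure maximizer of <F(x), .> with F(x) = A x
    e = np.zeros(3)
    e[np.argmax(A @ x)] = 1.0
    return e

x_bar = np.ones(3) / 3             # initial empirical frequency of play
for n in range(1, 200001):
    X_next = best_reply(x_bar)
    x_bar += (X_next - x_bar) / (n + 1)

print("empirical frequency:", x_bar)
print("distance to equilibrium:", np.linalg.norm(x_bar - np.ones(3) / 3))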

5.2. Dual averaging

We now proceed to state and prove our main convergence result for the recursion (3.1). To do so, we will require the “reciprocity” condition (Rec) that was introduced for the study of the continuous-time dynamics (3.2), namely

Qh(yn)xFh(x,yn)0Q_{h}(y_{n})\to x\implies F_{h}(x,y_{n})\to 0 (Rec)

for all x𝒳x\in\mathcal{X} and all sequences yny_{n} in 𝒜\mathbb{R}^{\mathcal{A}}. We then have:

Theorem 5.3.

Suppose that (3.1) is run with a regularizer satisfying (StrCvx) and (Rec), and a vanishing non-increasing learning rate ηn0\eta_{n}\searrow 0 such that

limn(1ηn1ηn1)=0.\lim_{n\to\infty}\left(\frac{1}{\eta_{n}}-\frac{1}{\eta_{n-1}}\right)=0. (60)

Then:

  • If xx^{\ast} is a GESS of 𝒢\mathcal{G}, the sequence of population states XnX_{n} converges to xx^{\ast}.

  • If xx^{\ast} is an ESS of 𝒢\mathcal{G}, the same conclusion holds provided that max{ηn,ηn1ηn11}\max\{\eta_{n},\eta_{n}^{-1}-\eta_{n-1}^{-1}\} is small enough and (3.1) is initialized sufficiently close to xx^{\ast}.

Corollary 5.1.

Suppose that 𝒢\mathcal{G} is strictly monotone and (3.1) is run with a regularizer satisfying (StrCvx) and (Rec), and a learning rate of the form ηn1/np\eta_{n}\propto 1/n^{p} for p(0,1)p\in(0,1). Then the sequence of population states induced by (3.1) converges to the (necessarily unique) equilibrium of 𝒢\mathcal{G}.

Our proof relies on the use of a suitable “energy inequality”, provided here by a deflated variant of the Fenchel coupling:

Lemma 8.

Fix a population state x𝒳x\in\mathcal{X}, and let

En1ηnFh(x,ηnSn)=1ηn[h(x)h(Xn+1)+ηnSn,Xn+1x],E_{n}\coloneqq\frac{1}{\eta_{n}}F_{h}(x,\eta_{n}S_{n})=\frac{1}{\eta_{n}}[h(x)-h(X_{n+1})+\eta_{n}\langle S_{n},X_{n+1}-x\rangle], (61)

with SnS_{n} as in (3.1). Then, for all n=1,2,n=1,2,\dotsc, we have:

EnEn1+F(Xn),Xnx+[h(x)minh]δn+Fh(Xn,ηn1Sn)ηn1,E_{n}\leq E_{n-1}+\langle F(X_{n}),X_{n}-x\rangle+[h(x)-\min h]\,\delta_{n}+\frac{F_{h}(X_{n},\eta_{n-1}S_{n})}{\eta_{n-1}}, (62)

where δn=1/ηn1/ηn1\delta_{n}=1/\eta_{n}-1/\eta_{n-1}. If, in addition, hh satisfies (StrCvx), we have

EnEn1+F(Xn),Xnx+[h(x)minh]δn+ηn12KF(Xn)2.E_{n}\leq E_{n-1}+\langle F(X_{n}),X_{n}-x\rangle+[h(x)-\min h]\,\delta_{n}+\frac{\eta_{n-1}}{2K}\lVert F(X_{n})\rVert_{\ast}^{2}. (63)

Energy inequalities of the form (62) are standard in the analysis of dual averaging schemes, and they go back at least as far as Nesterov [39, 40] (who introduced the method). The specific formulation above follows Héliou et al. [17, Lemma C.1], so we omit the proof; for a range of similar computations, see Shalev-Shwartz [48, Lemma 2.20], Kwon [25, p. 42], and Sorin [52].

Now, to move forward with the proof of Theorem 5.3, we will require two complementary arguments. The first is a stability result: we show below that, if XnX_{n} is sufficiently close to an evolutionarily stable state xx^{\ast} and the method’s learning rate is “small enough”, then Xn+1X_{n+1} remains close to xx^{\ast}. To state this precisely, let ε={y𝒜:Fh(x,y)ε}\mathcal{F}_{\varepsilon}=\{y\in\mathbb{R}^{\mathcal{A}}:F_{h}(x^{\ast},y)\leq\varepsilon\} denote the “Fenchel zone” of magnitude ε\varepsilon with respect to xx^{\ast}, and let 𝒟ε=Qh(ε)={x=Qh(y):yε}\mathcal{D}_{\varepsilon}=Q_{h}(\mathcal{F}_{\varepsilon})=\{x=Q_{h}(y):y\in\mathcal{F}_{\varepsilon}\} denote the image of ε\mathcal{F}_{\varepsilon} under QhQ_{h}. We then have:

Lemma 9.

Let xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}) and assume that (StrCvx) and (Rec) hold. Then, for all sufficiently small ε>0\varepsilon>0, we have the following implication (valid for all n=1,2,n=1,2,\dotsc): if Xn𝒟εX_{n}\in\mathcal{D}_{\varepsilon} and max{ηn,δn}\max\{\eta_{n},\delta_{n}\} is small enough, we also have Xn+1𝒟εX_{n+1}\in\mathcal{D}_{\varepsilon}.

The second component of the proof of Theorem 5.3 is a subsequence extraction argument: if the iterates of (3.1) lie in a neighborhood of xx^{\ast} where (ESS) holds, there exists a subsequence of XnX_{n} that converges to xx^{\ast}.

Lemma 10.

Let xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}), let 𝒰\mathcal{U} be the neighborhood of validity of (ESS), and assume further that (StrCvx) and (60) hold. If Xn𝒰X_{n}\in\mathcal{U} for all sufficiently large nn, we have lim infnXnx=0\liminf_{n\to\infty}\lVert X_{n}-x^{\ast}\rVert=0.

We prove these two ancillary results below.

Proof of Lemma 9.

Let 𝒰\mathcal{U} be the neighborhood whose existence is guaranteed by (ESS), i.e., F(x),xx<0\langle F(x),x-x^{\ast}\rangle<0 for all x𝒰x\in\mathcal{U}, xxx\neq x^{\ast}. By the relation between the Fenchel coupling and the ordinary norm topology on 𝒳\mathcal{X} (cf. Lemma A.2 in Appendix A), it follows that 𝒟ε𝒰\mathcal{D}_{\varepsilon}\subseteq\mathcal{U} if ε\varepsilon is taken small enough; we will assume this to be the case throughout the rest of this proof. Moreover, by (Rec) and the continuity of uu, there exists some cc(ε)c\equiv c(\varepsilon) such that F(x),xxc<0\langle F(x),x-x^{\ast}\rangle\leq-c<0 for all x𝒟ε𝒟ε/2x\in\mathcal{D}_{\varepsilon}\setminus\mathcal{D}_{\varepsilon/2}.

With all this in hand, assume that Xn𝒟εX_{n}\in\mathcal{D}_{\varepsilon}; we will show that Xn+1𝒟εX_{n+1}\in\mathcal{D}_{\varepsilon} provided that max{ηn,δn}\max\{\eta_{n},\delta_{n}\} is small enough. To do so, let Fn=Fh(x,ηnSn)F_{n}=F_{h}(x^{\ast},\eta_{n}S_{n}); then, by substituting xxx\leftarrow x^{\ast} in the template inequality (63) of Lemma 8, we obtain:

  1. Case 1.

    If Xn𝒟ε𝒟ε/2X_{n}\in\mathcal{D}_{\varepsilon}\setminus\mathcal{D}_{\varepsilon/2}, then

    F_{n}\leq F_{n-1}-\eta_{n}\left[c-\delta_{n}H-\eta_{n-1}\frac{G^{2}}{2K}\right], (64)

    where H=\max h-\min h and G=\max_{x\in\mathcal{X}}\lVert F(x)\rVert_{\ast}. Hence, if \delta_{n}\leq c/(2H) and \eta_{n-1}\leq cK/G^{2}, we conclude that F_{n}\leq F_{n-1}.

  2. Case 2.

    If Xn𝒟ε/2X_{n}\in\mathcal{D}_{\varepsilon/2}, then

    F_{n}\leq\frac{\varepsilon}{2}+\delta_{n}H+\eta_{n-1}\frac{G^{2}}{2K}, (65)

    so F_{n}\leq\varepsilon provided that \delta_{n}<\varepsilon/(4H) and \eta_{n-1}\leq\varepsilon K/(2G^{2}).

In both cases, we conclude that Xn+1=Qh(ηnSn)𝒟εX_{n+1}=Q_{h}(\eta_{n}S_{n})\in\mathcal{D}_{\varepsilon}, as claimed. ∎

Proof of Lemma 10.

Assume to the contrary that lim infnXnx>0\liminf_{n\to\infty}\lVert X_{n}-x^{\ast}\rVert>0. Then, by assumption, there exist ε>0\varepsilon>0 and n0n0(ε)n_{0}\equiv n_{0}(\varepsilon) such that Xn𝒰X_{n}\in\mathcal{U} and Xnxε\lVert X_{n}-x^{\ast}\rVert\geq\varepsilon for all nn0n\geq n_{0}. Since xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}), there exists a positive constant c>0c>0 such that F(Xn),Xnxc<0\langle F(X_{n}),X_{n}-x^{\ast}\rangle\leq-c<0 for all nn0n\geq n_{0}. With this in mind, substituting xxx\leftarrow x^{\ast} in the template inequality (63) and telescoping yields

E_{n} \leq E_{n_{0}}-c(n-n_{0})+\left(\frac{1}{\eta_{n}}-\frac{1}{\eta_{n_{0}}}\right)H+\frac{G^{2}}{2K}\sum_{k=n_{0}}^{n}\eta_{k}
\leq E_{n_{0}}+cn_{0}-\frac{H}{\eta_{n_{0}}}-n\left[c-\frac{H}{n\eta_{n}}-\frac{G^{2}}{2K}\cdot\frac{1}{n}\sum_{k=n_{0}}^{n}\eta_{k}\right]. (66)

To proceed, note that (60) gives

\lim_{n\to\infty}\frac{1/\eta_{n}-1/\eta_{n-1}}{n-(n-1)}=\lim_{n\to\infty}\left(\frac{1}{\eta_{n}}-\frac{1}{\eta_{n-1}}\right)=\lim_{n\to\infty}\delta_{n}=0, (67)

so \lim_{n\to\infty}1/(n\eta_{n})=0 by the Stolz–Cesàro theorem. Moreover, since \eta_{n}\to 0, the Cesàro averages (1/n)\sum_{k=n_{0}}^{n}\eta_{k} also vanish as n\to\infty. Hence, by letting n\to\infty in (66), we get \lim_{n\to\infty}E_{n}=-\infty, a contradiction which completes our proof. ∎

The proof of our main convergence results follows by a tandem application of the above.

Proof of Theorem 5.3.

We consider the two cases separately.

Case 1: xGESS(𝒢)x^{\ast}\in\operatorname{GESS}(\mathcal{G}). 

Let ε>0\varepsilon>0 be arbitrary. By (60), it follows that limnηn=limnδn=0\lim_{n\to\infty}\eta_{n}=\lim_{n\to\infty}\delta_{n}=0, and, by Lemma 10, there exists a subsequence XnkX_{n_{k}} of XnX_{n} converging to xx^{\ast}. This means that, in the long run, ηn\eta_{n} and δn\delta_{n} become arbitrarily small, and XnX_{n} comes arbitrarily close to xx^{\ast}. Hence, by applying Lemma 9 inductively, we conclude that Xn𝒟εX_{n}\in\mathcal{D}_{\varepsilon} for all sufficiently large nn. Since ε>0\varepsilon>0 has been chosen arbitrarily, this implies that XnX_{n} converges to xx^{\ast}, as claimed.

Case 2: xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}). 

Let 𝒰\mathcal{U} be the neighborhood whose existence is guaranteed by the evolutionary stability condition (3) for xx^{\ast}. Then, by Lemma 9, if (3.1) is initialized sufficiently close to xx^{\ast} and max{ηn,δn}\max\{\eta_{n},\delta_{n}\} is sufficiently small, we will have Xn𝒰X_{n}\in\mathcal{U} for all nn. Hence, by Lemma 10, we conclude that lim infnXnx=0\liminf_{n\to\infty}\lVert X_{n}-x^{\ast}\rVert=0. Our proof then follows by the same inductive application of Lemma 9 as in the case of a GESS above. ∎
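As a purely numerical illustration of Theorem 5.3 and Corollary 5.1, the sketch below runs the dual averaging recursion X_{n+1}=Q_{h}(\eta_{n}S_{n}) with the entropic choice map, cumulative scores S_{n}=S_{n-1}+F(X_{n}), and \eta_{n}=1/\sqrt{n}, in the strictly monotone rock–paper–scissors variant used earlier; the matrix and horizon are illustrative assumptions, and the population state should be seen to approach the unique equilibrium.

# Illustrative sketch: discrete dual averaging (entropic choice map,
# learning rate eta_n = n^{-1/2}) in a strictly monotone RPS variant.
# Corollary 5.1 predicts convergence of X_n to the unique equilibrium.
import numpy as np

A = np.array([[0., -1., 2.],
              [2., 0., -1.],
              [-1., 2., 0.]])

def Q(y):                          # entropic choice map (softmax)
    z = np.exp(y - y.max())
    return z / z.sum()

S = np.zeros(3)                    # cumulative payoff ("score") vector
X = np.ones(3) / 3                 # initial population state
for n in range(1, 100001):
    S += A @ X                     # S_n = S_{n-1} + F(X_n) with F(X) = A X
    X = Q(S / np.sqrt(n))          # X_{n+1} = Q_h(eta_n S_n), eta_n = n^{-1/2}

print("final state:", X)
print("distance to equilibrium:", np.linalg.norm(X - np.ones(3) / 3))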

Time averages under dual averaging. 

We close this section with a result on the empirical frequency of play under (3.1). Indeed, telescoping (63) readily yields the bound:

k=1nF(Xk),xXkmaxhminhηn+12Kk=1nηk1F(Xk)2.\sum_{k=1}^{n}\langle F(X_{k}),x-X_{k}\rangle\leq\frac{\max h-\min h}{\eta_{n}}+\frac{1}{2K}\sum_{k=1}^{n}\eta_{k-1}\lVert F(X_{k})\rVert_{\ast}^{2}. (68)

This regret-type bound echoes standard results in the literature – see e.g., Nesterov [40, Theorem 1] – and leads to the following time-average convergence result:

Proposition 5.2.

Suppose that (3.1) is run with a regularizer satisfying (StrCvx) and a learning rate of the form ηn1/n\eta_{n}\propto 1/\sqrt{n}. If 𝒢\mathcal{G} is monotone, the empirical frequency of play X¯n\bar{X}_{n} under (3.1) converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).

The proof of Proposition 5.2 mimics that of Proposition 4.1 so we omit it. We only note that, in contrast to Theorem 5.3, Proposition 5.2 does not require strict monotonicity or reciprocity; however, it concerns the cruder, average frequency of play, so it provides a considerably weaker guarantee in this regard.
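To see why the learning rate \eta_{n}\propto 1/\sqrt{n} is the natural choice in Proposition 5.2, take \eta_{n}=1/\sqrt{n} for n\geq 1 (and \eta_{0}=1) and write H=\max h-\min h, G=\max_{x\in\mathcal{X}}\lVert F(x)\rVert_{\ast}. Then the right-hand side of (68) is bounded by

H\sqrt{n}+\frac{G^{2}}{2K}\Big(1+\sum_{k=1}^{n-1}k^{-1/2}\Big)\leq H\sqrt{n}+\frac{G^{2}}{2K}\big(2\sqrt{n}+1\big)=O(\sqrt{n}),

so the average regret (1/n)\sum_{k=1}^{n}\langle F(X_{k}),x-X_{k}\rangle is O(n^{-1/2}) and vanishes as n\to\infty; the argument of Proposition 4.1 then applies, mutatis mutandis, to the empirical mean \bar{X}_{n}.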

6. Extensions and concluding remarks

In this concluding section, we provide a series of extensions and remarks that would have otherwise disrupted the flow of the rest of our paper.

6.1. Positive correlation

Sandholm [44] introduced the so-called “positive correlation” condition

x˙t0F(xt),x˙t>0.\dot{x}_{t}\neq 0\implies\langle F(x_{t}),\dot{x}_{t}\rangle>0. (PC)

In the case of potential games, (PC) implies that

ddtP(xt)=F(xt),x˙t>0\frac{d}{dt}P(x_{t})=\langle F(x_{t}),\dot{x}_{t}\rangle>0 (69)

whenever x˙t0\dot{x}_{t}\neq 0. This in turn implies the convergence of the trajectories to the set of rest points of the dynamics [45, Theorem 7.1.2]. Some of our results in Section 4 for potential games can also be obtained by establishing (PC) for the dynamics under study.

6.2. No-regret dynamics

Our second remark concerns continuous- or discrete-time processes for which the average regret vanishes, i.e., for all x𝒳x\in\mathcal{X} we have:

lim supt1t0tF(xs),xxs𝑑s\displaystyle\limsup_{t\to\infty}\frac{1}{t}\int_{0}^{t}\langle F(x_{s}),x-x_{s}\rangle\>ds =0\displaystyle=0 in continuous time (70a)
lim supn1nk=1nF(Xk),xXk\displaystyle\limsup_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\langle F(X_{k}),x-X_{k}\rangle =0\displaystyle=0 in discrete time (70b)

As we saw in Propositions 4.1 and 5.2, the dynamics of dual averaging have the no-regret property (70) in both continuous and discrete time. Looking more closely at the proofs of Propositions 4.1 and 5.2, we conclude that, in monotone games, the empirical frequency of play under any dynamical process that satisfies (70) converges to the game’s set of equilibria. This property forms the basis of many algorithms and dynamics designed to solve variational inequalities and equilibrium problems [38, 40, 24]; the above provides a game-theoretic interpretation of this property in the context of population games.

6.3. Extensions

We also note that our results can be extended to the following settings:

  1. (1)

    If \mathcal{X} is a convex compact subset of \mathbb{R}^{\mathcal{A}} and F\colon\mathcal{X}\to\mathbb{R}^{\mathcal{A}} is a vector field on \mathcal{X}, the same convergence results to solutions of (SVI) and (MVI) continue to hold; however, the population game interpretation is lost.

  2. (2)

    This interpretation is recovered in multi-population games: if there are several player populations indexed by i=1,,Ni=1,\dotsc,N, and if 𝒜i\mathcal{A}_{i} denotes the set of pure strategies of the ii-th population and uiα:i=1NΔ(𝒜i)u_{i\alpha}\colon\prod_{i=1}^{N}\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i})\to\mathbb{R} is the payoff function of α\alpha-strategists in the ii-th population, it suffices to set 𝒳=iΔ(𝒜i)\mathcal{X}=\prod_{i}\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i}) and F(x)=(uiαi(x))αi𝒜i,iF(x)=(u_{i\alpha_{i}}(x))_{\alpha_{i}\in\mathcal{A}_{i},i\in\mathcal{I}}.

  3. (3)

    Our theory can also be extended to normal form games with a finite number of players and smooth concave payoff functions gig_{i}, i=1,,Ni=1,\dotsc,N. In this case, the components of the corresponding vector field are Giα(x)=gixiαG_{i\alpha}(x)=\frac{\partial g_{i}}{\partial x_{i\alpha}}, and the notion of evolutionary stability boils down to that of variational stability; for a detailed treatment, see Sorin & Wan [53] and Mertikopoulos & Zhou [35].

6.4. Links with variational inequalities and further extensions

Taking a more general point of view that focuses on the vector field FF, we can also consider the following framework that brings to the forefront the associated variational inequalities. Let 𝒳i\mathcal{X}_{i}, i=1,,Ni=1,\dotsc,N be a convex compact subset of a topological vector space 𝒱i\mathcal{V}_{i} with dual 𝒱i\mathcal{V}^{\ast}_{i}, write 𝒱=i𝒱i\mathcal{V}=\prod_{i}\mathcal{V}_{i} and 𝒳=i𝒳i\mathcal{X}=\prod_{i}\mathcal{X}_{i}, and let Fi:𝒳𝒱iF_{i}\colon\mathcal{X}\to\mathcal{V}^{\ast}_{i} be a collection of continuous maps, i=1,,Ni=1,\dotsc,N. Finally, set F(x)=(F1(x),,FN(x))F(x)=(F_{1}(x),\dotsc,F_{N}(x)) and, as usual, write

F(x),z=i=1NFi(x),zi\langle F(x),z\rangle=\sum_{i=1}^{N}\langle F_{i}(x),z_{i}\rangle (71)

for the dual pairing between F(x)𝒱=i𝒱iF(x)\in\mathcal{V}^{\ast}=\prod_{i}\mathcal{V}^{\ast}_{i} and z=(z1,,zN)𝒱i𝒱iz=(z_{1},\dotsc,z_{N})\in\mathcal{V}\coloneqq\prod_{i}\mathcal{V}_{i}.

We may then define SVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F) as the set of x𝒳x^{\ast}\in\mathcal{X} such that

F(x),xx0,for all x𝒳\langle F(x^{\ast}),x-x^{\ast}\rangle\leq 0,\quad\text{for all $x\in\mathcal{X}$} (72)

and MVI(𝒳,F)\operatorname{MVI}(\mathcal{X},F) as the set of x𝒳x^{\ast}\in\mathcal{X} such that

F(x),xx0for all x𝒳.\langle F(x),x-x^{\ast}\rangle\leq 0\quad\text{for all $x\in\mathcal{X}$}. (73)

The counterparts for ESS\operatorname{ESS} and GESS\operatorname{GESS} may also be defined similarly (though the link with population games is no longer present).

Example 6.1.

Consider a strategic game with NN players, compact metric strategy spaces 𝒜id\mathcal{A}_{i}\subseteq\mathbb{R}^{d} and continuous payoff functions ui:𝒜u_{i}\colon\mathcal{A}\to\mathbb{R}, i=1,,Ni=1,\dotsc,N, with 𝒜j𝒜j\mathcal{A}\coloneqq\prod_{j}\mathcal{A}_{j}. Introduce the mixed extension of the game as usual: for a probability distribution (mixed strategy profile) χ=(χ1,,χN)𝒳\chi=(\chi_{1},\dotsc,\chi_{N})\in\mathcal{X} with χi𝒳iΔ(𝒜i)\chi_{i}\in\mathcal{X}_{i}\coloneqq\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i}), let uiu_{i} be extended as

ui(χ)=𝒜ui(α1,,αN)𝑑χ1(α1)𝑑χN(αN).u_{i}(\chi)=\int_{\mathcal{A}}u_{i}(\alpha_{1},\dotsc,\alpha_{N})\,\>d\chi_{1}(\alpha_{1})\dotsm\>d\chi_{N}(\alpha_{N}). (74)

Then, introduce F_{i}(\chi)=u_{i}(\cdot,\chi_{-i}) so that

\langle F_{i}(\chi),\chi^{\prime}_{i}\rangle=u_{i}(\chi^{\prime}_{i};\chi_{-i}) (75)

for all χi𝒳i=Δ(𝒜i)\chi^{\prime}_{i}\in\mathcal{X}_{i}=\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i}). In this way, the equilibrium condition for χ𝒳\chi^{\ast}\in\mathcal{X} becomes

F(χ),χχ=i=1NFi(χ),χiχi0for all χ𝒳,\langle F(\chi^{\ast}),\chi-\chi^{\ast}\rangle=\sum_{i=1}^{N}\langle F_{i}(\chi^{\ast}),\chi_{i}-\chi^{\ast}_{i}\rangle\leq 0\quad\text{for all $\chi\in\mathcal{X}$}, (76)

which is again an instance of SVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F). \lozenge

In this setting, FF is monotone (dissipative) if

F(x)F(x),xx0for all x,x𝒳\langle F(x)-F(x^{\prime}),x-x^{\prime}\rangle\leq 0\quad\text{for all $x,x^{\prime}\in\mathcal{X}$} (77)

As in the case of Section 2, we have SVI(𝒳,F)=MVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F)=\operatorname{MVI}(\mathcal{X},F) if FF is monotone; furthermore, mutatis mutandis, all the properties presented in Section 2.3 extend to this general setup.

The best-response map associated to FF is defined again as

𝖡𝖱(x)=argmaxx𝒳F(x),x.\operatorname{\mathsf{BR}}(x)=\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\langle F(x),x^{\prime}\rangle. (78)

In addition to monotonicity, we will also now make the “unique best response” assumption

𝖡𝖱(x)\operatorname{\mathsf{BR}}(x) is a singleton for all x𝒳x\in\mathcal{X}. (U)
Proposition 6.1.

If FF is monotone and (U) holds, the set EMVI(𝒳,F)=SVI(𝒳,F)E\coloneqq\operatorname{MVI}(\mathcal{X},F)=\operatorname{SVI}(\mathcal{X},F) is a singleton.

Proof.

If x,x^{\prime}\in E, we have \langle F(x),x^{\prime}-x\rangle\leq 0 and \langle F(x^{\prime}),x-x^{\prime}\rangle\leq 0, so

\langle F(x^{\prime})-F(x),x^{\prime}-x\rangle\geq 0. (79)

By monotonicity, this forces \langle F(x^{\prime})-F(x),x^{\prime}-x\rangle=0, and hence, since both \langle F(x),x^{\prime}-x\rangle and \langle F(x^{\prime}),x-x^{\prime}\rangle are nonpositive and sum to zero, \langle F(x^{\prime}),x-x^{\prime}\rangle=0. Thus x also maximizes \langle F(x^{\prime}),\cdot\rangle over \mathcal{X}, i.e., x\in\operatorname{\mathsf{BR}}(x^{\prime}); since x^{\prime}\in\operatorname{\mathsf{BR}}(x^{\prime}) as well, (U) yields x=x^{\prime}. ∎

The regularized best response map is likewise defined as

𝖡𝖱εh(x)=argmaxx𝒳{F(x),xεh(x)}\operatorname{\mathsf{BR}}_{\varepsilon h}(x)=\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\{\langle F(x),x^{\prime}\rangle-\varepsilon h(x^{\prime})\} (80)

where h\colon\mathcal{X}\to\mathbb{R} satisfies the same axioms as in Definition 2.1. Note that \operatorname{\mathsf{BR}}_{\varepsilon h} satisfies (U) by construction.

Now, using these best-response maps, we may extend the definition of the dynamics (3.1), (3.1) and (3.1) to the more general setup considered here in the obvious way. We then have:

Proposition 6.2.

Suppose that (U) holds and let XnX_{n}, n=1,2,n=1,2,\dotsc, be a sequence of states generated by (3.1). If the accumulation points of X¯n\bar{X}_{n} lie in SVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F), the same holds for the accumulation points of XnX_{n}.

Proof.

Let xx^{\ast} be an accumulation point of Xn+1=𝖡𝖱(X¯n)X_{n+1}=\operatorname{\mathsf{BR}}(\bar{X}_{n}), and let Xnk+1X_{n_{k}+1} be a subsequence converging to xx^{\ast}. By descending to a further subsequence if necessary, we may assume without loss of generality that X¯nk\bar{X}_{n_{k}} converges to some limit point x¯\bar{x}. Furthermore, by the definition of (3.1), we have:

F(X¯nk),Xnk+1x=F(X¯nk),𝖡𝖱(X¯nk)x0for all x𝒳.\langle F(\bar{X}_{n_{k}}),X_{n_{k}+1}-x\rangle=\langle F(\bar{X}_{n_{k}}),\operatorname{\mathsf{BR}}(\bar{X}_{n_{k}})-x\rangle\geq 0\quad\text{for all $x\in\mathcal{X}$}. (81)

Hence, taking the limit kk\to\infty, we get

F(x¯),xx0for all x𝒳,\langle F(\bar{x}),x^{\ast}-x\rangle\geq 0\quad\text{for all $x\in\mathcal{X}$}, (82)

so x𝖡𝖱(x¯)x^{\ast}\in\operatorname{\mathsf{BR}}(\bar{x}). Since x¯SVI(𝒳,F)\bar{x}\in\operatorname{SVI}(\mathcal{X},F) by assumption, we deduce further that x¯𝖡𝖱(x¯)\bar{x}\in\operatorname{\mathsf{BR}}(\bar{x}). Hence, by (U), we conclude that x¯=x\bar{x}=x^{\ast}, and our assertion follows. ∎

Obviously, under (U), a similar property holds for regularized fictitious play.

6.5. Preview for Part II

Part II of our paper [16] deals with games with continuous action spaces, in the spirit of Example 6.1 (but with a continuous population of nonatomic players). The first step is to define the Stampacchia variational inequality when 𝒜\mathcal{A} is a compact metric space, Θ\Theta is an appropriate compact convex subset of Δ(𝒜)\Delta(\mathcal{A}), FF is a continuous map θFθ()\theta\mapsto F_{\theta}(\cdot) from Θ\Theta to the space of continuous functions on 𝒜\mathcal{A}, and the duality mapping is given by

Fθ,ν=𝒜Fθ(α)𝑑ν(α)for all θ,νΘ.\langle F_{\theta},\nu\rangle=\int_{\mathcal{A}}F_{\theta}(\alpha)\>d\nu(\alpha)\quad\text{for all $\theta,\nu\in\Theta$}. (83)

In this framework, we introduce potential and monotone maps, and we prove the convergence of the discrete-time dynamics of fictitious play.

We show that this study covers the case of nonatomic games with (player-dependent) compact action spaces. Finally, we introduce a class of first-order mean-field games and prove that solutions of the Stampacchia variational inequality correspond to the equilibria of the game in normal form where 𝒜\mathcal{A} is the set of paths generated by the players.

Appendix A Properties of choice maps and the Fenchel coupling

Our aim in this appendix is to provide some useful background and properties for the regularized choice map hh and the Fenchel coupling.

To recall the basic setup, we assume below that hh is a regularizer on 𝒳\mathcal{X} in the sense of Definition 2.1. The convex conjugate h:𝒜h^{\ast}\colon\mathbb{R}^{\mathcal{A}}\to\mathbb{R} of hh is then defined as

h(y)=maxx𝒳{y,xh(x)}.h^{\ast}(y)=\max_{x\in\mathcal{X}}\{\langle y,x\rangle-h(x)\}. (A.1)

Since 𝒳\mathcal{X} is compact and hh is strictly convex and continuous on 𝒳\mathcal{X}, the maximum in (A.1) is attained at a unique point in 𝒳\mathcal{X}, so h(y)h^{\ast}(y) is finite for all y𝒜y\in\mathbb{R}^{\mathcal{A}} [1]. Moreover, by standard results in convex analysis [42, Chap. 26], hh^{\ast} is differentiable on 𝒜\mathbb{R}^{\mathcal{A}} and its gradient satisfies the identity

h(y)=argmaxx𝒳{y,xh(x)}=Q(y)\nabla h^{\ast}(y)=\operatorname*{arg\,max}_{x\in\mathcal{X}}\{\langle y,x\rangle-h(x)\}=Q(y) (A.2)

where Q:𝒜𝒳Q\colon\mathbb{R}^{\mathcal{A}}\to\mathcal{X} is the choice map induced by hh (cf. Definition 2.1).

The lemma below establishes the inverse differentiability property for hh, namely that h=Q1\partial h=Q^{-1}:

Lemma A.1.

Let hh be a regularizer on 𝒳\mathcal{X}. Then, for all x𝒳x\in\mathcal{X} and all y𝒜y\in\mathbb{R}^{\mathcal{A}}, we have:

x=Q(y)yh(x).x=Q(y)\;\iff\;y\in\partial h(x). (A.3)

If, in addition, hh satisfies (Diff), we also have

h(x),xpy,xp\langle\nabla h(x),x-p\rangle\leq\langle y,x-p\rangle (A.4)

for all p𝒳p\in\mathcal{X} and all yh(x)y\in\partial h(x), xdomhx\in\operatorname{dom}\partial h.

Proof of Lemma A.1.

To prove (A.3), note that xx solves (A.2) if and only if yh(x)0y-\partial h(x)\ni 0, i.e., if and only if yh(x)y\in\partial h(x).

For the inequality (A.4), it suffices to show it holds for all pdomhp\in\operatorname{dom}\partial h (by continuity). To do so, let

\phi(t)=h(x+t(p-x))-\left[h(x)+t\langle y,p-x\rangle\right]. (A.5)

Since hh is strictly convex and yh(x)y\in\partial h(x) by (A.3), it follows that ϕ(t)0\phi(t)\geq 0 with equality if and only if t=0t=0. Moreover, note that ψ(t)=h(x+t(px))y,px\psi(t)=\langle\nabla h(x+t(p-x))-y,p-x\rangle is a continuous subgradient selection of ϕ\phi. Given that ϕ\phi and ψ\psi are both continuous on [0,1][0,1], it follows a fortiori that ϕ\phi is continuously differentiable and ϕ=ψ\phi^{\prime}=\psi on [0,1][0,1]. Thus, with ϕ\phi convex and ϕ(t)0=ϕ(0)\phi(t)\geq 0=\phi(0) for all t[0,1]t\in[0,1], we conclude that ϕ(0)=h(x)y,px0\phi^{\prime}(0)=\langle\nabla h(x)-y,p-x\rangle\geq 0, from which our claim follows. ∎
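As a purely illustrative check of (A.3), take the entropic regularizer h(x)=\sum_{\alpha}x_{\alpha}\log x_{\alpha} on \mathcal{X}=\operatorname{\Delta}(\mathcal{A}) and a point x in the relative interior of \mathcal{X}. On the one hand, Q(y)_{\alpha}=e^{y_{\alpha}}\big/\sum_{\beta}e^{y_{\beta}}, so Q(y)=x precisely when y_{\alpha}=\log x_{\alpha}+c for some c\in\mathbb{R}; on the other hand, the subgradients of h at x are obtained from the gradient of h along \mathcal{X} by adding normal directions to the simplex, i.e., \partial h(x)=\{(1+\log x_{\alpha})_{\alpha}+c\mathbb{1}:c\in\mathbb{R}\}. The two sets of dual vectors coincide, in accordance with (A.3).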

Lemma A.2.

Let hh be a regularizer on 𝒳\mathcal{X}. Then, for all p𝒳p\in\mathcal{X} and all y𝒜y\in\mathbb{R}^{\mathcal{A}}, we have

Fh(p,y)0with equality if and only if p=Q(y).F_{h}(p,y)\geq 0\quad\text{with equality if and only if $p=Q(y)$}. (A.6)

If, in addition, if hh satisfies (StrCvx), we also have:

Fh(p,y)K2Q(y)p2.F_{h}(p,y)\geq\tfrac{K}{2}\lVert Q(y)-p\rVert^{2}. (A.7)
Proof.

Our first claim is a trivial consequence of the Fenchel-Young inequality and the strict convexity of hh. For our second claim, let x=Q(y)x=Q(y). Then, by the definition (A.1) of the convex conjugate of hh, we have:

Fh(p,y)\displaystyle F_{h}(p,y) =h(p)+h(y)y,p\displaystyle=h(p)+h^{\ast}(y)-\langle y,p\rangle
=h(p)+y,Q(y)h(Q(y))y,p\displaystyle=h(p)+\langle y,Q(y)\rangle-h(Q(y))-\langle y,p\rangle
=h(p)h(x)y,px\displaystyle=h(p)-h(x)-\langle y,p-x\rangle (A.8)

Now, since hh is KK-strongly convex, we also have:

h((1λ)x+λp)+λ(1λ)(K/2)xp2(1λ)h(x)+λh(p)h((1-\lambda)x+\lambda p)+\lambda(1-\lambda)(K/2)\lVert x-p\rVert^{2}\leq(1-\lambda)h(x)+\lambda h(p) (A.9)

and, with x=Q(y)x=Q(y), we get:

y,xh(x)y,(1λ)x+λph((1λ)x+λp).\langle y,x\rangle-h(x)\geq\langle y,(1-\lambda)x+\lambda p\rangle-h((1-\lambda)x+\lambda p). (A.10)

Thus, after rearranging, dividing by λ\lambda, and letting λ0\lambda\to 0, we obtain

h(p)h(x)y,px(K/2)xp2h(p)-h(x)-\langle y,p-x\rangle\geq(K/2)\lVert x-p\rVert^{2} (A.11)

and our assertion follows from (A.8). ∎

By virtue of this lemma, we obtain the following convergence criterion in terms of the Fenchel coupling:

Corollary A.1.

Let hh be a regularizer on 𝒳\mathcal{X} satisfying (StrCvx), fix a base point p𝒳p\in\mathcal{X}, and let yny_{n} be a sequence in 𝒜\mathbb{R}^{\mathcal{A}}. Then, if Fh(p,yn)0F_{h}(p,y_{n})\to 0, we also have Q(yn)pQ(y_{n})\to p.

Proof.

By Lemma A.2, we have Q(yn)p(2/K)Fh(p,yn)\lVert Q(y_{n})-p\rVert\leq\sqrt{(2/K)F_{h}(p,y_{n})}, so our claim follows. ∎

Acknowledgments

The authors acknowledge support by the COST Action CA16228 “European Network for Game Theory” (GAMENET), the FMJH Program PGMO under grant Damper, and the French National Research Agency (ANR) in the framework of the “Investissements d’avenir” program (ANR-15-IDEX-02), the LabEx PERSYVAL (ANR-11-LABX-0025-01), MIAI@Grenoble Alpes (ANR-19-P3IA-0003), and the grants ORACLESS (ANR-16-CE33-0004) and ALIAS (ANR-19-CE48-0018-01).

References

  • Bauschke & Combettes [2017] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, NY, USA, 2 edition, 2017.
  • Benaïm [1999] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Jacques Azéma, Michel Émery, Michel Ledoux, and Marc Yor (eds.), Séminaire de Probabilités XXXIII, volume 1709 of Lecture Notes in Mathematics, pp.  1–68. Springer Berlin Heidelberg, 1999.
  • Benaïm & Faure [2013] Michel Benaïm and Mathieu Faure. Consistency of vanishingly smooth fictitious play. Mathematics of Operations Research, 38(3):437–450, 2013.
  • Benaïm et al. [2005] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44(1):328–348, 2005.
  • Benaïm et al. [2006] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Stochastic approximations and differential inclusions, part II: Applications. Mathematics of Operations Research, 31(4):673–695, 2006.
  • Berge [1997] Claude Berge. Topological Spaces. Dover, New York, 1997.
  • Bravo & Mertikopoulos [2017] Mario Bravo and Panayotis Mertikopoulos. On the robustness of learning in games with stochastically perturbed payoff observations. Games and Economic Behavior, 103, John Nash Memorial issue:41–66, May 2017.
  • Brown [1951] George W. Brown. Iterative solutions of games by fictitious play. In T. C. Koopmans (ed.), Activity Analysis of Production and Allocation, pp. 374–376. Wiley, 1951.
  • Cesa-Bianchi & Lugosi [2006] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
  • Conley [1978] Charles Cameron Conley. Isolated Invariant Sets and the Morse Index. American Mathematical Society, Providence, RI, 1978.
  • Dafermos [1980] Stella Dafermos. Traffic equilibrium and variational inequalities. Transportation Science, 14(1):42–54, February 1980.
  • Friedman [1991] Daniel Friedman. Evolutionary games in economics. Econometrica, 59(3):637–666, 1991.
  • Fudenberg & Levine [1998] Drew Fudenberg and David K. Levine. The Theory of Learning in Games, volume 2 of Economic learning and social evolution. MIT Press, Cambridge, MA, 1998.
  • Fudenberg & Levine [1999] Drew Fudenberg and David K. Levine. Conditional universal consistency. Games and Economic Behavior, 29(1):104–130, 1999.
  • Gilboa & Matsui [1991] Itzhak Gilboa and Akihiko Matsui. Social stability and equilibrium. Econometrica, 59(3):859–867, May 1991.
  • Hadikhanloo et al. [2021] Saeed Hadikhanloo, Rida Laraki, Panayotis Mertikopoulos, and Sylvain Sorin. Learning in nonatomic games, Part II: Continuous action spaces and mean-field games. Mimeo, 2021.
  • Héliou et al. [2020] Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non-convex optimization with imperfect feedback. In NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020.
  • Hofbauer & Sandholm [2002] Josef Hofbauer and William H. Sandholm. On the global convergence of stochastic fictitious play. Econometrica, 70(6):2265–2294, November 2002.
  • Hofbauer & Sandholm [2009] Josef Hofbauer and William H. Sandholm. Stable games and their dynamics. Journal of Economic Theory, 144(4):1665–1693, July 2009.
  • Hofbauer & Sigmund [1998] Josef Hofbauer and Karl Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, UK, 1998.
  • Hofbauer & Sigmund [2003] Josef Hofbauer and Karl Sigmund. Evolutionary game dynamics. Bulletin of the American Mathematical Society, 40(4):479–519, July 2003.
  • Hofbauer et al. [1979] Josef Hofbauer, Peter Schuster, and Karl Sigmund. A note on evolutionarily stable strategies and game dynamics. Journal of Theoretical Biology, 81(3):609–612, 1979.
  • Hofbauer et al. [2009] Josef Hofbauer, Sylvain Sorin, and Yannick Viossat. Time average replicator and best reply dynamics. Mathematics of Operations Research, 34(2):263–269, May 2009.
  • Juditsky et al. [2011] Anatoli Juditsky, Arkadi Semen Nemirovski, and Claire Tauvel. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17–58, 2011.
  • Kwon [2016] Joon Kwon. Stratégies de descente miroir pour la minimisation du regret et l’approchabilité. PhD thesis, Université Pierre-et-Marie-Curie, 2016.
  • Kwon & Mertikopoulos [2017] Joon Kwon and Panayotis Mertikopoulos. A continuous-time approach to online optimization. Journal of Dynamics and Games, 4(2):125–148, April 2017.
  • Mas-Colell [1984] Andreu Mas-Colell. On a theorem of Schmeidler. Journal of Mathematical Economics, 13:201–206, 1984.
  • Maynard Smith & Price [1973] John Maynard Smith and George R. Price. The logic of animal conflict. Nature, 246:15–18, November 1973.
  • McFadden [1974] Daniel L. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka (ed.), Frontiers in Econometrics, pp. 105–142. Academic Press, New York, NY, 1974.
  • McKelvey & Palfrey [1995] Richard D. McKelvey and Thomas R. Palfrey. Quantal response equilibria for normal form games. Games and Economic Behavior, 10(6):6–38, 1995.
  • Mertikopoulos & Moustakas [2010] Panayotis Mertikopoulos and Aris L. Moustakas. The emergence of rational behavior in the presence of stochastic perturbations. The Annals of Applied Probability, 20(4):1359–1388, July 2010.
  • Mertikopoulos & Sandholm [2016] Panayotis Mertikopoulos and William H. Sandholm. Learning in games via reinforcement and regularization. Mathematics of Operations Research, 41(4):1297–1324, November 2016.
  • Mertikopoulos & Sandholm [2018] Panayotis Mertikopoulos and William H. Sandholm. Riemannian game dynamics. Journal of Economic Theory, 177:315–364, September 2018.
  • Mertikopoulos & Staudigl [2018] Panayotis Mertikopoulos and Mathias Staudigl. On the convergence of gradient-like flows with noisy gradient input. SIAM Journal on Optimization, 28(1):163–197, January 2018.
  • Mertikopoulos & Zhou [2019] Panayotis Mertikopoulos and Zhengyuan Zhou. Learning in games with continuous action sets and unknown payoff functions. Mathematical Programming, 173(1-2):465–507, January 2019.
  • Minty [1967] George J. Minty. On the generalization of a direct method of the calculus of variations. Bulletin of the American Mathematical Society, 73(3):315–321, May 1967.
  • Monderer & Shapley [1996] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124 – 143, 1996.
  • Nemirovski [2004] Arkadi Semen Nemirovski. Prox-method with rate of convergence O(1/t){O}(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
  • Nesterov [2007] Yurii Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2):319–344, 2007.
  • Nesterov [2009] Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.
  • Robinson [1951] Julia Robinson. An iterative method for solving a game. Annals of Mathematics, 54:296–301, 1951.
  • Rockafellar [1970] Ralph Tyrrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
  • Rustichini [1999] Aldo Rustichini. Optimal properties of stimulus-response learning models. Games and Economic Behavior, 29(1-2):244–273, 1999.
  • Sandholm [2001] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97:81–108, 2001.
  • Sandholm [2010] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, Cambridge, MA, 2010.
  • Sandholm [2015] William H. Sandholm. Population games and deterministic evolutionary dynamics. In H. Peyton Young and Shmuel Zamir (eds.), Handbook of Game Theory IV, pp.  703–778. Elsevier, 2015.
  • Schmeidler [1973] David Schmeidler. Equilibrium points of nonatomic games. Journal of Statistical Physics, 7(4):295–300, April 1973.
  • Shalev-Shwartz [2011] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011.
  • Shalev-Shwartz & Singer [2006] Shai Shalev-Shwartz and Yoram Singer. Convex repeated games and Fenchel duality. In NIPS’ 06: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp.  1265–1272. MIT Press, 2006.
  • Sorin [2009] Sylvain Sorin. Exponential weight algorithm in continuous time. Mathematical Programming, 116(1):513–528, 2009.
  • Sorin [2020] Sylvain Sorin. Replicator dynamics: Old and new. Journal of Dynamics and Games, 7(4):365–386, 2020.
  • Sorin [2021] Sylvain Sorin. No-regret algorithms in online learning, games and convex optimization. Mimeo, 2021.
  • Sorin & Wan [2016] Sylvain Sorin and Cheng Wan. Finite composite games: Equilibria and dynamics. Journal of Dynamics and Games, 3(1):101–120, January 2016.
  • Taylor [1979] Peter D. Taylor. Evolutionarily stable strategies with two types of player. Journal of Applied Probability, 16(1):76–83, March 1979.
  • Taylor & Jonker [1978] Peter D. Taylor and Leo B. Jonker. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40(1-2):145–156, 1978.
  • van Damme [1987] Eric van Damme. Stability and perfection of Nash equilibria. Springer-Verlag, Berlin, 1987.
  • Weibull [1995] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, Cambridge, MA, 1995.