Online Optimization in Games via Control Theory:
Connecting Regret, Passivity and Poincaré Recurrence
Abstract
We present a novel control-theoretic understanding of online optimization and learning in games, via the notion of passivity. Passivity is a fundamental concept in control theory, which abstracts energy conservation and dissipation in physical systems. It has become a standard tool in the analysis of general feedback systems, to which game dynamics belong. Our starting point is to show that all continuous-time Follow-the-Regularized-Leader (FTRL) dynamics, which include the well-known Replicator Dynamic, are lossless, i.e. passive with no energy dissipation. Interestingly, we prove that passivity implies bounded regret, connecting two fundamental primitives of control theory and online optimization.
The observation of energy conservation in FTRL inspires us to present a family of lossless learning dynamics, each of which has an underlying energy function with a simple gradient structure. This family is closed under convex combination; as an immediate corollary, any convex combination of FTRL dynamics is lossless and thus has bounded regret. This allows us to extend the framework of Fox and Shamma [FS13] to prove not just global asymptotic stability results for game dynamics, but Poincaré recurrence results as well. Intuitively, when a lossless game (e.g. graphical constant-sum game) is coupled with lossless learning dynamics, their feedback interconnection is also lossless, which results in a pendulum-like energy-preserving recurrent behavior, generalizing [PS14, MPP18].
Figure 1: Analogy between lossless learning dynamics and physical systems.

| | Lossless Learning Dynamic | Physical System |
| --- | --- | --- |
| examples | Replicator Dynamic, Follow-the-Regularized-Leader (FTRL) dynamics | gravity |
| state | cumulative payoffs | vertical height |
| energy | storage function | (negative) potential energy |
| gradient of energy | mixed strategies of agents | gravitational force |
| invariant property | convex combination (CC): any CC of storage functions produces a lossless learning dynamic that yields the same CC of mixed strategies | linear combination (LC): any LC of potential energy functions is a potential energy function that yields the same LC of gravitational forces |
| another analogue (change of energy value) | instantaneous payoffs | velocity |
1 Introduction
Online optimization aims at designing algorithms that can maximize performance in unpredictable and even adversarially evolving environments. The standard benchmark for success in these environments is minimizing regret, which is defined as the difference between the accumulated performance of the algorithm and that of the best action in hindsight. One of the most important achievements of the field has been to establish that such regret-minimizing algorithms exist [CBL06, SS12]. Amongst the most well-known such algorithms is the class of Follow-the-Regularized-Leader (FTRL) algorithms, which include as special cases ubiquitous meta-algorithms such as Multiplicative Weights Update [FS99, AHK05] and Gradient Descent [SS12]. It is well known that such algorithms can achieve $O(\sqrt{T})$ regret by employing slowly decreasing step-sizes, and that this bound is effectively optimal for arbitrary payoff sequences. When applying such algorithms in games, as the sequence of payoffs becomes more predictable, stronger regret guarantees are possible [RS13, FLST16, SALS15, BP19]. In the continuous-time model, FTRL dynamics are once again optimal, achieving bounded regret in general settings [KM17, MPP18, BGP20]. Hence, both from the perspective of optimization as well as game theory, FTRL dynamics constitute an effectively optimal choice.
Control theory, on the other hand, is motivated by a seemingly unrelated set of questions. It aims to develop methodologies for stabilizing complex processes and machines. Due to its intrinsic connections to real-world systems, control theory revolves around concepts with a strong grounding in physical systems. A fundamental property of numerous physical systems is passivity, which is typically defined in terms of energy dissipation, conservation and transformation [Wil72a, Wil72b, OPNSR13]. Passivity is an “input-output” property of a system, and expresses that a system which is supplied with bounded energy can only output bounded energy. Passive systems come equipped with a storage function that accumulates the supplied energy, but perhaps with some loss (cf. energy loss due to friction in mechanical systems). Overall, passivity encodes a useful notion of stability, since such a system cannot explode into unpredictable, out-of-control motion, as that would correspond to unbounded energy output.
Although the fields of online optimization and control theory are both well developed, with long and distinct histories, their interconnection is still rather nascent. Online algorithms can be abstractly thought of as input-output operators, where the input is a stream of payoffs and the output is a stream of behavioral outcomes. Regret and passivity are both properties of such input-output algorithms/operators, and each encodes a notion of predictability and stability around a reference frame: in regret, the reference frame is given by the cumulative payoff of past actions; in passivity, by energy level sets. This raises our first set of questions:
Are there formal connections between regret and passivity? Moreover, can we interpret the optimal regret of FTRL dynamics from a passivity perspective? Are there similarly optimal learning dynamics / input-output operators?
Any formal connection across the two fields is clearly valuable, as it allows for a fusion of ideas and methodologies between two well-developed fields and expedites progress on areas of interest that are common to both, such as game theory [FL98, CBL06, MS15, MS18]. For related issues on the intersection of learning, control theory and games, see [Sha20] and the references therein.
Notably, Fox and Shamma [FS13] proposed a control-theoretic framework for analyzing learning in games. One of their key contributions is to identify a modular approach, where an analysis can be performed by studying a learning operator (which converts payoff input to strategy output) and a game operator (which converts strategy input to payoff output) independently, while the whole game dynamic is a feedback interconnection system of the two operators (see Figure 2). By coupling strictly passive learning heuristics with passive games (e.g. zero-sum games), they showed that the resulting strictly passive systems converge to equilibria, generalizing and unifying numerous prior results, e.g. [HS09].
The modular approach has allowed numerous works which study learning or game operators separately [MS16, PSM18, Mab18, GP19]. Despite this progress, settings of critical importance for AI, such as understanding the perfectly recurrent non-equilibrating behaviors of Gradient Descent (and other FTRL dynamics) in zero-sum games, have so far remained outside the reach of these techniques [PS14, MPP18, BRM+18, VGFP19, PML+20]. This raises our second question:
Can the pendulum-like cyclic behavior of FTRL dynamics in zero-sum games be understood and generalized via passivity?

Our Contributions.
We provide affirmative answers to both questions raised above. We show that any finitely passive learning dynamic guarantees constant regret (Theorem 8). (In the control theory literature, a passive operator usually has a storage function with a finite lower bound. In this paper, storage functions may or may not have a finite lower bound, depending on the type of operator; we emphasize those operators which have a storage function with a finite lower bound as being finitely passive. For the definition of a finitely passive learning dynamic, see Definition 5.) By using the notion of convex conjugate from convex analysis, we show that any continuous-time FTRL dynamic is finitely passive and lossless (Theorem 7); the same holds for certain escort replicator dynamics [Har11] (Appendix D). These results generalize [Mab18], which showed that Replicator Dynamic is finitely lossless. Combining the two theorems above immediately recovers the result in [MPP18] that any FTRL dynamic guarantees constant regret. We note that in the analysis of Mabrok [Mab18], the state space (i.e. the domain of the storage function) is the space of mixed strategies, while we turn to a new state space of cumulative payoffs for FTRL. This choice is crucial for the generalization to FTRL dynamics, and it permits a cleaner proof via the tools established in convex analysis.
A key observation that enables the above results is that every FTRL dynamic admits a storage function with a simple gradient structure, which will be described formally in Section 5. This motivates us to study a new family of lossless learning dynamics, which is in one-to-one correspondence with the family of storage functions possessing that gradient structure. By observing that such storage functions are closed under convex combination, we discover that any convex combination of FTRL dynamics is finitely passive and lossless, and thus guarantees constant regret (Theorem 16). “Convex combination of FTRL dynamics” means the following: suppose there are $m$ FTRL dynamics, indexed from $1$ to $m$. When we use the $k$-th one, it converts the cumulative payoff vector $\mathbf{p}(t)$ at time $t$ to a mixed strategy $\mathbf{x}_k(t)$. A convex combination of these FTRL dynamics converts the cumulative payoffs at time $t$ to the mixed strategy $\sum_{k=1}^m \alpha_k \mathbf{x}_k(t)$, where the $\alpha_k$'s are positive constants satisfying $\sum_{k=1}^m \alpha_k = 1$.
Convex combinations of lossless dynamics are directly analogous to linear combinations of conservative vector fields in analyzing physical dynamics (see Figure 1). This technique is also of practical relevance, since we might want to mix-and-match different dynamics to elicit their advantages. For instance, different learning dynamics may lean toward either exploitation or exploration. By combining them via convex combination with our own choice of the $\alpha_k$'s, we can control our desired balance between exploitation and exploration (see Example 14).
We also show that every graphical constant-sum game (e.g. a two-person zero-sum game) that admits a fully-mixed Nash equilibrium corresponds to a finitely lossless game operator (Proposition 17). Thus, the game dynamic of any convex combination of FTRL dynamics in such a game corresponds to a finitely lossless operator. We use this observation to show that the game dynamic is almost perfectly recurrent, via the notion of Poincaré recurrence (Theorem 19). This distinguishes our work from [FS13] and its subsequent works on learning in games, as they mostly concern stability/convergence, while we study recurrence.
Roadmap.
In Sections 2 and 3, we present the necessary background for this work, including the definitions of the different operators, the notions of (lossless) passivity and storage function, and some basic results about passivity. In Section 4, we show the desirable properties (finite losslessness, constant regret) of FTRL dynamics. In Section 5, we present a characterization of lossless learning dynamics via the above-mentioned gradient structure of storage functions, and we discuss some properties of convex combinations of such learning dynamics. The results about Poincaré recurrence of learning in graphical constant-sum games are presented in Section 6. All missing proofs can be found in the appendix.
2 Preliminaries
In this section, we define the operators depicted in Figure 2. We first define a learning dynamic and its corresponding learning operator. When there are multiple agents and each agent is using one learning dynamic, their learning operators can be concatenated naturally to form a merged learning operator. Then we define the game operator, whose interconnection with a merged learning operator in the manner of Figure 2 is called a dynamical game system.
We use a bold lower case letter to denote a vector variable. Let $\mathbb{R}_+$ denote the set of non-negative real numbers. In this paper, every function from $\mathbb{R}_+$ to $\mathbb{R}^n$ is assumed to be square integrable, and we call it a function-of-time. An (input-output) operator is a mapping whose input and output are both functions-of-time. Let $\Delta^n := \{\mathbf{x} \in \mathbb{R}^n : x_j \ge 0 \text{ for all } j, \ \sum_{j=1}^n x_j = 1\}$ denote the probability simplex over $n$ actions. $\langle \mathbf{a}, \mathbf{b}\rangle$ denotes the inner product of two vectors of the same dimension $n$, i.e. $\langle \mathbf{a}, \mathbf{b}\rangle = \sum_{j=1}^n a_j b_j$.
Learning Dynamic and Learning Operator.
We focus on the following type of continuous-time learning dynamics. An agent has an action set $A$ with $n := |A|$ actions. The process starts at time $0$. For any time $t \ge 0$ and for each action $j \in A$, the agent computes the cumulative payoff she would have received had she chosen action $j$ throughout the time interval $[0, t]$, denoted by $p_j(t)$. Formally, for each action $j$, let $u_j(t)$ denote the (instantaneous) payoff to action $j$ at time $t$. Then
$$p_j(t) \,=\, p_j(0) + \int_0^t u_j(\tau)\, d\tau,$$
where $p_j(0)$ is a constant chosen at the beginning of the process. Let $\mathbf{p}(t) := (p_1(t), \ldots, p_n(t))$ and $\mathbf{u}(t) := (u_1(t), \ldots, u_n(t))$. For any $t \ge 0$, the agent uses a conversion function $f: \mathbb{R}^n \to \Delta^n$ which takes $\mathbf{p}(t)$ as input, and outputs a mixed strategy $\mathbf{x}(t) = f(\mathbf{p}(t))$ over the actions. The process can be expressed compactly as an ordinary differential equation (ODE) system:
$$\mathbf{p}(0) \,=\, \mathbf{p}^0 \qquad \text{(Initial condition)}$$
$$\dot{\mathbf{p}}(t) \,=\, \mathbf{u}(t) \qquad \text{(Cumulative payoff/state update)} \qquad (1)$$
$$\mathbf{x}(t) \,=\, f(\mathbf{p}(t)) \qquad \text{(Behavioral/strategy output)}$$
When $\mathbf{p}^0$ is already given, the conversion function $f$ specifies the learning dynamic. The learning dynamic can be viewed as an operator which takes the function-of-time $\mathbf{u}$ as input, and whose output is another function-of-time $\mathbf{x}$.
Regret.
For any $T \ge 0$, the regret of a learning dynamic at time $T$ is
$$\mathrm{Reg}(T) \,:=\, \max_{j \in A} \int_0^T u_j(t)\, dt \,-\, \int_0^T \big\langle \mathbf{x}(t), \mathbf{u}(t)\big\rangle\, dt.$$
We say a learning dynamic guarantees constant regret if for any $T \ge 0$ and any payoff function-of-time $\mathbf{u}$, the regret at time $T$ is bounded from above by a constant that depends on $\mathbf{p}^0$ only.
Replicator Dynamic and FTRL Learning Dynamics.
When the conversion function is the logit choice map
$$f(\mathbf{p}) \,=\, \frac{\big(\exp(p_j)\big)_{j \in A}}{\sum_{k \in A} \exp(p_k)}, \qquad (2)$$
the learning dynamic (1) is equivalent to the well-known Replicator Dynamic [HS98, San10], which is the continuous-time analogue of Multiplicative Weights Update.
An FTRL learning dynamic is specified by a strictly convex regularizer function $\psi: \Delta^n \to \mathbb{R}$, which determines the conversion function as below:
$$f(\mathbf{p}) \,=\, \operatorname*{arg\,max}_{\mathbf{x} \in \Delta^n} \big\{ \langle \mathbf{p}, \mathbf{x}\rangle - \psi(\mathbf{x}) \big\}. \qquad (3)$$
It is known that Replicator Dynamic is a special case of FTRL, obtained by setting $\psi(\mathbf{x}) = \sum_{j \in A} x_j \ln x_j$. Online Gradient Descent (OGD) is another commonly studied learning dynamic that is also a special case of FTRL, with Euclidean regularization, i.e. $\psi(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2$ [H+16]; in this case $f(\mathbf{p})$ is the Euclidean projection of $\mathbf{p}$ onto $\Delta^n$. When $n = 2$, the conversion function of OGD is
$$f(\mathbf{p}) \,=\, \big( \Pi(\mathbf{p}),\ 1 - \Pi(\mathbf{p}) \big), \quad \text{where } \Pi(\mathbf{p}) \,:=\, \min\Big\{1,\ \max\Big\{0,\ \tfrac{p_1 - p_2 + 1}{2}\Big\}\Big\}. \qquad (4)$$
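To make the two conversion maps concrete, the following is a small numerical sketch (our own, not from the paper): `logit_choice` implements the logit choice map (2), and `project_simplex` computes the Euclidean projection onto the probability simplex, i.e. the OGD conversion map for general $n$. The sorting-based projection routine is one standard way to compute this projection; the function names and the test vector are illustrative choices.

```python
import numpy as np

def logit_choice(p):
    """Logit choice map (2): softmax of the cumulative payoff vector p."""
    z = np.exp(p - p.max())              # subtract max for numerical stability
    return z / z.sum()

def project_simplex(p):
    """Euclidean projection of p onto the probability simplex
    (the OGD conversion map for general n), via the sorting method."""
    u = np.sort(p)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(p) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(p - theta, 0.0)

p = np.array([2.0, 0.0])                 # hypothetical cumulative payoffs
print(logit_choice(p))                   # RD keeps some mass on action 2
print(project_simplex(p))                # OGD puts all mass on action 1
```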
Merged Learning Operator.
When a system has $N$ agents, and each agent uses a learning dynamic, we concatenate the corresponding learning operators together to form a merged learning operator (MLO). Precisely, the input to the MLO is $\hat{\mathbf{u}} = (\mathbf{u}_1, \ldots, \mathbf{u}_N)$, and its output is $\hat{\mathbf{x}} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$, where $\mathbf{x}_i$ is the output of the learning operator of agent $i$ when its input is $\mathbf{u}_i$. In this paper, we use hat notation (e.g. $\hat{\mathbf{u}}, \hat{\mathbf{x}}$) to denote variables formed by such concatenations of the variables of individual agents.
Game Operator and Dynamical Game System.
Here, we provide a general definition of game operators (as in Figure 2), and leave the discussion about graphical constant-sum games, which appear in our Poincaré recurrence results, to Section 6.
A game has $N$ agents. Each agent $i$ has $n_i$ actions. After each agent $i$ chooses a mixed strategy $\mathbf{x}_i \in \Delta^{n_i}$ over her own actions, the game determines a payoff vector $\mathbf{u}_i \in \mathbb{R}^{n_i}$ for each agent $i$, where $(u_i)_j$ is the payoff to action $j$ of agent $i$. Let $\hat{\mathbf{x}} := (\mathbf{x}_1, \ldots, \mathbf{x}_N)$ and $\hat{\mathbf{u}} := (\mathbf{u}_1, \ldots, \mathbf{u}_N)$. We can think of the game as a function $G$ that maps $\hat{\mathbf{x}}$ to $\hat{\mathbf{u}}$. Its game operator takes a function-of-time $\hat{\mathbf{x}}$ as input, and it outputs a function-of-time $\hat{\mathbf{u}}$, where $\hat{\mathbf{u}}(t) = G(\hat{\mathbf{x}}(t))$ for all $t \ge 0$.
A dynamical game system (DGS) comprises $N$ agents. The game operator has input $\hat{\mathbf{x}}$ and output $\hat{\mathbf{u}}$. Each agent uses a learning dynamic of the form (1). The agents' learning operators are concatenated to form an MLO, which has input $\hat{\mathbf{u}}$ and output $\hat{\mathbf{x}}$. The MLO is interconnected with the game operator in the manner of Figure 2.
3 Passivity
To motivate the notions of passivity and energy, consider an electrical network connected to a power source, where the voltage and current across the network at time $t$ are respectively $v(t)$ and $i(t)$. Let $E(t)$ denote the energy stored in the network at time $t$. We have $E(T) - E(0) \le \int_0^T v(t)\, i(t)\, dt$. The reason for the inequality (rather than an exact equality) is that energy might dissipate from the network. In this setting, the function-of-time $v$ is the input, while the function-of-time $i$ is the output, so the network is indeed an operator; and as we shall see, this operator is passive.
Passivity of State Space System.
To generalize the above idea to passivity of an ODE system, we need several mathematical notations. Let $L_2^n$ denote the Hilbert space of square integrable functions mapping $\mathbb{R}_+$ to $\mathbb{R}^n$, with inner product $\langle \mathbf{f}, \mathbf{g}\rangle_{L_2} := \int_0^\infty \langle \mathbf{f}(t), \mathbf{g}(t)\rangle\, dt$. An (input-output) operator is simply a mapping $O: L_2^n \to L_2^m$ for some $n, m$.
We consider the following type of operators, which can be represented by an ODE system, called a state space system (SSS), of the following general form:
$$\dot{\mathbf{s}}(t) \,=\, g\big(\mathbf{s}(t), \mathbf{u}(t)\big), \qquad \mathbf{s}(0) \,=\, \mathbf{s}^0 \in \mathcal{S}, \qquad \mathbf{y}(t) \,=\, h\big(\mathbf{s}(t), \mathbf{u}(t)\big), \qquad (5)$$
where $\mathcal{S} \subseteq \mathbb{R}^d$, $g: \mathcal{S} \times \mathbb{R}^n \to \mathbb{R}^d$, and $h: \mathcal{S} \times \mathbb{R}^n \to \mathbb{R}^m$. The set $\mathcal{S}$ is called the set of states. As the notation suggests, $\mathbf{u}$ and $\mathbf{y}$ are the input and output of this operator respectively. When $\mathbf{u}$ is fed into the operator, the first two equalities define a well-posed ODE system, so a unique solution of $\mathbf{s}$ exists under mild conditions. Then $\mathbf{y}$ is the output, determined by a function of the unique solution $\mathbf{s}$ and the input $\mathbf{u}$. The learning dynamic (1) is such an operator, by viewing $\mathbf{p}, \mathbf{u}, \mathbf{x}$ in (1) as $\mathbf{s}, \mathbf{u}, \mathbf{y}$ in (5) respectively. We are ready to present the definition of passivity for such operators.
Definition 1.
An SSS is passive if there exists a storage function $S: \mathcal{S} \to \mathbb{R}$ such that for all $T \ge 0$, all starting states $\mathbf{s}^0 \in \mathcal{S}$, and all input-output pairs $(\mathbf{u}, \mathbf{y})$, we have
$$S(\mathbf{s}(T)) - S(\mathbf{s}(0)) \,\le\, \int_0^T \big\langle \mathbf{u}(t), \mathbf{y}(t) \big\rangle\, dt. \qquad (6)$$
If the equality always holds, then we say the SSS is lossless passive, or simply lossless. If an SSS is passive (resp. lossless) via a storage function that has a finite lower bound, we say it is finitely passive (resp. finitely lossless).
We note that inequality (6) is reminiscent of the inequality in the motivating example of the electrical network.
When an SSS is finitely passive/lossless, we may assume, without loss of generality, that the infimum of its storage function is zero. We make this assumption for all finitely passive/lossless SSS in the rest of this paper.

Feedback Interconnection System.
In Figure 2, we presented a feedback interconnection (FIC) system. While Figure 2 gives an intuitive first understanding of the concept from control theory, for our analysis it is more convenient to use Figure 3. The FIC system in Figure 3 consists of two SSS $O_1$ and $O_2$, with an external input source $\hat{\mathbf{r}}$, while its output is $\hat{\mathbf{y}}_1$, the output of $O_1$; note that the FIC system is an operator by definition. The variables are related via negative feedback: the input to $O_1$ is $\hat{\mathbf{r}} - \hat{\mathbf{y}}_2$, and the input to $O_2$ is $\hat{\mathbf{y}}_1$, where $\hat{\mathbf{y}}_1, \hat{\mathbf{y}}_2$ denote the outputs of $O_1, O_2$ respectively.
An important property of passive operators is that passivity is composable, i.e. the composition of two passive operators results in a passive system. Intuitively, if no operator in the total system is able to produce energy, then the system as a whole cannot produce energy either. The following theorem formally captures this intuition for FIC systems.
Theorem 2 ([FS13] Theorem 3.2).
Consider the FIC system in Figure 3. Suppose that for $i = 1, 2$, $O_i$ is passive via storage function $S_i$. Then the FIC system is a passive operator via the storage function $S_1 + S_2$. Precisely, for any $T \ge 0$, any starting states $\mathbf{s}_1^0, \mathbf{s}_2^0$, and any external input $\hat{\mathbf{r}}$,
$$\big[S_1(\mathbf{s}_1(T)) + S_2(\mathbf{s}_2(T))\big] - \big[S_1(\mathbf{s}_1(0)) + S_2(\mathbf{s}_2(0))\big] \,\le\, \int_0^T \big\langle \hat{\mathbf{r}}(t), \hat{\mathbf{y}}_1(t) \big\rangle\, dt.$$
If $O_1, O_2$ are lossless, then the FIC system is lossless via the storage function $S_1 + S_2$, i.e. the inequality above becomes an equality.
Note that in the above theorem, if the FIC system is lossless and $\hat{\mathbf{r}}$ is the zero function, then we have $S_1(\mathbf{s}_1(T)) + S_2(\mathbf{s}_2(T)) = S_1(\mathbf{s}_1(0)) + S_2(\mathbf{s}_2(0))$ for all $T \ge 0$, i.e. the value of $S_1 + S_2$ does not change over time. The underlying dynamic is said to admit a constant-of-motion.
DGS as a FIC System.
By viewing a DGS as a FIC system (with the MLO playing the role of $O_1$ and the game operator the role of $O_2$), we are interested in MLOs and game operators which possess good properties like passivity. The key advantage of this approach is that it permits a DGS to be decoupled into two distinct operators, which can be analysed separately.
Basic Results about Passivity of Learning Operators.
Suppose $O_1$ is an MLO. First, we show that it is passive if all the learning operators possessed by the agents are passive. This allows us to turn our focus to analyzing whether each individual learning operator is passive/lossless or not.
Proposition 3.
Suppose that for each agent $i$, her learning operator with shift $\mathbf{q}_i$ is passive (resp. lossless) via a storage function $S_i$. Then the MLO with shift $\hat{\mathbf{q}} = (\mathbf{q}_1, \ldots, \mathbf{q}_N)$ is passive (resp. lossless) via the storage function $\sum_{i=1}^N S_i$.
By the next proposition, the choice of shift does not affect passivity of the learning operator. Thus, we say a learning dynamic is passive when its learning operator with any shift is passive.
Proposition 4.
Let $O, O'$ be two learning operators of the same learning dynamic, with shifts $\mathbf{q}, \mathbf{q}'$ respectively. Then $O$ is passive (resp. lossless) via storage function $S(\mathbf{p})$ if and only if $O'$ is passive (resp. lossless) via storage function $S(\mathbf{p}) - \langle \mathbf{q}' - \mathbf{q}, \mathbf{p}\rangle + c$, where $c$ is any constant.
The above proposition holds even when the shifts are not mixed strategies, e.g. when the shift is the zero vector. We use $S^{\mathbf{0}}$ to denote a storage function of a learning operator with zero shift, and $S^{\mathbf{q}}$ to denote a storage function of a learning operator with a shift of a mixed strategy $\mathbf{q}$.
While a shift does not affect passivity, it does affect whether the learning operator is finitely passive or not. In order to prove that certain learning dynamics guarantee constant regret, we need their learning operators with some specific shifts to be finitely passive. This motivates the following definition of finitely passive learning dynamics. Let $\mathbf{e}_j$ denote the vector with the $j$-th entry equal to one and all other entries zero.
Definition 5.
A learning dynamic is finitely passive (resp. finitely lossless) if for every action $j \in A$, its learning operator with shift $\mathbf{e}_j$ is finitely passive (resp. finitely lossless).
Proposition 6.
If a learning dynamic is finitely passive (resp. finitely lossless), then for any mixed strategy $\mathbf{q} \in \Delta^n$, the learning operator with shift $\mathbf{q}$ is finitely passive (resp. finitely lossless).
4 Passivity of Learning Operators
4.1 FTRL Dynamics are Finitely Lossless
We start the analysis by establishing a strong connection between FTRL dynamics and passivity. Specifically, FTRL dynamics are finitely lossless.
Theorem 7.
Given any FTRL dynamic over $n$ actions with regularizer function $\psi$, let the convex conjugate of $\psi$ be
$$\psi^*(\mathbf{p}) \,:=\, \max_{\mathbf{x} \in \Delta^n} \big\{ \langle \mathbf{p}, \mathbf{x}\rangle - \psi(\mathbf{x}) \big\}. \qquad (7)$$
Then for any $\mathbf{q} \in \Delta^n$, the learning operator with shift $\mathbf{q}$ is finitely lossless via the storage function $S^{\mathbf{q}}$ given below:
$$S^{\mathbf{q}}(\mathbf{p}) \,:=\, \psi^*(\mathbf{p}) - \langle \mathbf{q}, \mathbf{p}\rangle + \psi(\mathbf{q}). \qquad (8)$$
In particular, for any action $j$, the learning operator with shift $\mathbf{e}_j$ is finitely lossless, and hence any FTRL dynamic is finitely lossless by Definition 5.
Proof.
By the “maximizing argument” identity on p. 149 of [SS12], we have $\nabla \psi^*(\mathbf{p}) = \operatorname{arg\,max}_{\mathbf{x}\in\Delta^n}\{\langle \mathbf{p},\mathbf{x}\rangle - \psi(\mathbf{x})\} = f(\mathbf{p})$. Hence, $\nabla S^{\mathbf{q}}(\mathbf{p}) = f(\mathbf{p}) - \mathbf{q}$. By the chain rule, $\frac{d}{dt} S^{\mathbf{q}}(\mathbf{p}(t)) = \langle \nabla S^{\mathbf{q}}(\mathbf{p}(t)), \dot{\mathbf{p}}(t)\rangle = \langle \mathbf{x}(t) - \mathbf{q}, \mathbf{u}(t)\rangle$, and hence $S^{\mathbf{q}}(\mathbf{p}(T)) - S^{\mathbf{q}}(\mathbf{p}(0)) = \int_0^T \langle \mathbf{x}(t) - \mathbf{q}, \mathbf{u}(t)\rangle\, dt$, verifying that the operator is lossless. Moreover, $S^{\mathbf{q}}$ is bounded below by zero, since by the definition of $\psi^*$, for any $\mathbf{p}$, $\psi^*(\mathbf{p}) \ge \langle \mathbf{p}, \mathbf{q}\rangle - \psi(\mathbf{q})$. ∎
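As a quick numerical sanity check of the identity $\frac{d}{dt}\psi^*(\mathbf{p}(t)) = \langle \mathbf{x}(t), \mathbf{u}(t)\rangle$ used in this proof, the following sketch (our own, not from the paper) uses the entropy regularizer, for which $\psi^*$ is the log-sum-exp function and $\nabla \psi^*$ is the logit choice map; the random test vectors are arbitrary.

```python
import numpy as np

def psi_star(p):
    """Convex conjugate of the entropy regularizer: log-sum-exp."""
    return np.log(np.sum(np.exp(p)))

def softmax(p):
    """Gradient of psi_star, i.e. the logit choice map / conversion function."""
    z = np.exp(p - p.max())
    return z / z.sum()

rng = np.random.default_rng(0)
p = rng.standard_normal(4)               # hypothetical cumulative-payoff state
u = rng.standard_normal(4)               # hypothetical instantaneous payoff, dp/dt = u
eps = 1e-6

# d/dt psi*(p(t)) along dp/dt = u, estimated by a central finite difference
lhs = (psi_star(p + eps * u) - psi_star(p - eps * u)) / (2 * eps)
rhs = softmax(p) @ u                     # <x, u> with x = grad psi*(p)
print(lhs, rhs)                          # the two values agree up to O(eps^2)
```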
We summarize the properties of the operators of FTRL dynamics with various shifts in Figure 4.
| | FTRL with zero shift | FTRL with shift of a mixed strategy $\mathbf{q}$ |
| --- | --- | --- |
| input | $\mathbf{u}$ (payoff) | $\mathbf{u}$ (payoff) |
| output | $\mathbf{x}$ (mixed strategy) | $\mathbf{x} - \mathbf{q}$ |
| state | $\mathbf{p}$ (cumulative payoff) | $\mathbf{p}$ (cumulative payoff) |
| storage function | $\psi^*(\mathbf{p})$ (see (7)) | $\psi^*(\mathbf{p}) - \langle \mathbf{q}, \mathbf{p}\rangle + \psi(\mathbf{q})$ |
| infimum | $-\infty$ | $0$ |
| property | lossless | finitely lossless |
4.2 Relationship to Regret
Theorem 8.
Any finitely passive learning dynamic guarantees constant regret.
Proof.
Fix any action $j \in A$, and let $S^{\mathbf{e}_j}$ denote the storage function of the learning operator with shift $\mathbf{e}_j$. Since the learning dynamic is finitely passive, we can assume the infimum of $S^{\mathbf{e}_j}$ is zero. By the definition of passivity,
$$S^{\mathbf{e}_j}(\mathbf{p}(T)) - S^{\mathbf{e}_j}(\mathbf{p}(0)) \,\le\, \int_0^T \big\langle \mathbf{x}(t) - \mathbf{e}_j,\, \mathbf{u}(t)\big\rangle\, dt.$$
Hence, the regret w.r.t. action $j$ at time $T$ satisfies:
$$\int_0^T \big\langle \mathbf{e}_j - \mathbf{x}(t),\, \mathbf{u}(t)\big\rangle\, dt \,\le\, S^{\mathbf{e}_j}(\mathbf{p}(0)) - S^{\mathbf{e}_j}(\mathbf{p}(T)) \,\le\, S^{\mathbf{e}_j}(\mathbf{p}(0)).$$
Thus, the regret up to time $T$ is bounded by $\max_{j \in A} S^{\mathbf{e}_j}(\mathbf{p}(0))$, which is a constant that depends only on the initial state $\mathbf{p}^0$. ∎
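The following simulation sketch (our own illustration, not from the paper) Euler-discretizes Replicator Dynamic, i.e. FTRL with the entropy regularizer, against an arbitrary bounded payoff stream and reports the regret; for $\mathbf{p}^0 = \mathbf{0}$ the bound above works out to $\ln n$, and the simulated regret stays near or below that value up to a small discretization error. The payoff stream and step size are arbitrary choices.

```python
import numpy as np

def softmax(p):
    z = np.exp(p - p.max())
    return z / z.sum()

rng = np.random.default_rng(1)
n, dt, T = 3, 0.01, 20.0
p = np.zeros(n)                          # cumulative payoffs, p(0) = 0
gained = 0.0                             # accumulates the integral of <x(t), u(t)> dt
for _ in range(int(T / dt)):
    u = rng.uniform(-1.0, 1.0, size=n)   # arbitrary (even adversarial) payoff vector
    x = softmax(p)                       # Replicator Dynamic's conversion function
    gained += (x @ u) * dt
    p += u * dt                          # dp/dt = u

regret = np.max(p) - gained              # max_j p_j(T) minus the integral of <x, u>
print(regret, np.log(n))                 # regret stays near or below ln(n) ~ 1.10
```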
Corollary 9 ([MPP18]).
Every FTRL dynamic guarantees constant regret.
Theorem 8 states that finite passivity implies constant regret. The following proposition states that the converse (constant regret implies finite passivity) is also true, if we restrict to lossless learning dynamics.
Proposition 10.
Suppose that a learning dynamic is lossless. Then the learning dynamic guarantees constant regret if and only if it is finitely lossless.
5 A Characterization of Lossless Learning Dynamics, and Their Convex Combinations
In Section 4.1, we showed that every FTRL dynamic is lossless. An FTRL dynamic is specified by a convex regularizer function $\psi$, while the storage function has an implicit form in terms of $\psi$ (via the convex conjugate $\psi^*$). Here, we ask how we can specify the storage function directly to generate a lossless learning dynamic. Not all storage functions work, so we first seek some necessary conditions on them. These conditions are stated using the storage function $S^{\mathbf{0}}$ of the learning operator with zero shift. By Proposition 4, the lossless storage function of the learning operator with shift $\mathbf{q}$ is then $S^{\mathbf{0}}(\mathbf{p}) - \langle \mathbf{q}, \mathbf{p}\rangle + c$ for some constant $c$.
Suppose there is a lossless learning dynamic in the form of (1); recall that when $\mathbf{p}^0$ is already given, the learning dynamic is specified by its conversion function $f$. Let $S^{\mathbf{0}}$ be the storage function of its learning operator with zero shift. We present two necessary conditions on $S^{\mathbf{0}}$ and $f$.
Necessary Condition 1 (NC1). (i) For any $\mathbf{p} \in \mathbb{R}^n$, every entry of $\nabla S^{\mathbf{0}}(\mathbf{p})$ is non-negative. (ii) For any $\mathbf{p} \in \mathbb{R}^n$, we have $f(\mathbf{p}) = \nabla S^{\mathbf{0}}(\mathbf{p})$.
Reason: Since the learning dynamic is lossless, $S^{\mathbf{0}}(\mathbf{p}(t)) - S^{\mathbf{0}}(\mathbf{p}(0)) = \int_0^t \langle \mathbf{x}(\tau), \mathbf{u}(\tau)\rangle\, d\tau$. Taking the time-derivative on both sides yields
$$\frac{d}{dt}\, S^{\mathbf{0}}(\mathbf{p}(t)) \,=\, \langle \mathbf{x}(t), \mathbf{u}(t)\rangle \,=\, \langle f(\mathbf{p}(t)), \mathbf{u}(t)\rangle.$$
On the other hand, by the chain rule, we have
$$\frac{d}{dt}\, S^{\mathbf{0}}(\mathbf{p}(t)) \,=\, \langle \nabla S^{\mathbf{0}}(\mathbf{p}(t)), \dot{\mathbf{p}}(t)\rangle \,=\, \langle \nabla S^{\mathbf{0}}(\mathbf{p}(t)), \mathbf{u}(t)\rangle.$$
Thus, $\langle f(\mathbf{p}), \mathbf{u}\rangle = \langle \nabla S^{\mathbf{0}}(\mathbf{p}), \mathbf{u}\rangle$ for any $\mathbf{p}$ and $\mathbf{u}$. This readily implies condition (ii) holds. (To formally argue this, for any $\mathbf{p}$ and for each action $j$, we construct a continuous input $\mathbf{u}$ such that at some time the state equals $\mathbf{p}$ and the instantaneous payoff equals $\mathbf{e}_j$; this implies that the $j$-th component of $f(\mathbf{p})$ is the same as the $j$-th component of $\nabla S^{\mathbf{0}}(\mathbf{p})$. The construction is easy to make.) Since $f(\mathbf{p}) \in \Delta^n$, condition (i) holds.
Necessary Condition 2 (NC2). For any real number $c$ and any $\mathbf{p} \in \mathbb{R}^n$, $S^{\mathbf{0}}(\mathbf{p} + c\mathbf{1}) = S^{\mathbf{0}}(\mathbf{p}) + c$, where $\mathbf{1}$ is the all-ones vector.
Reason: By NC1, $\nabla S^{\mathbf{0}}(\mathbf{p}) = f(\mathbf{p})$, which is in $\Delta^n$. Thus, the directional derivative of $S^{\mathbf{0}}$ along the direction $\mathbf{1}$ is $\langle \nabla S^{\mathbf{0}}(\mathbf{p}), \mathbf{1}\rangle = 1$, and integrating along this direction yields NC2.
Indeed, it is easy to verify that the above two necessary conditions are also sufficient.
Proposition 11.
Any smooth function $S: \mathbb{R}^n \to \mathbb{R}$ which satisfies NC1(i) and NC2 is a lossless storage function of a learning operator with zero shift. The conversion function satisfies $f = \nabla S$.
Proof.
Due to NC2, the directional derivative of $S$ along $\mathbf{1}$ is $1$, thus $\sum_{j \in A} \partial S(\mathbf{p})/\partial p_j = 1$. This equality and NC1(i) imply $\nabla S(\mathbf{p}) \in \Delta^n$ for every $\mathbf{p}$. Now, consider a learning dynamic with conversion function $f = \nabla S$. Then for any function-of-time $\mathbf{u}$ and any $t \ge 0$, we have $\frac{d}{dt} S(\mathbf{p}(t)) = \langle \nabla S(\mathbf{p}(t)), \mathbf{u}(t)\rangle = \langle \mathbf{x}(t), \mathbf{u}(t)\rangle$. Integrating both sides w.r.t. $t$ shows that the learning dynamic is lossless via the storage function $S$. ∎
Example 12.
Consider the learning dynamic whose conversion function always outputs the uniform mixed strategy, i.e. for all $\mathbf{p}$, $f(\mathbf{p}) = (1/n, \ldots, 1/n)$. By Proposition 11, it is a lossless learning dynamic via $S(\mathbf{p}) = \frac{1}{n}\sum_{j \in A} p_j$, a storage function that clearly satisfies NC1(i) and NC2. Of course, there is not really “learning” with this conversion function, and it is easy to construct examples to demonstrate that its regret can be unbounded. By Proposition 10, this learning dynamic is not finitely lossless. This shows the family of finitely lossless learning dynamics is a proper subset of the family of lossless learning dynamics.
The family of smooth functions satisfying NC1(i) and NC2 can be represented compactly as
$$\mathcal{F} \,:=\, \big\{ S: \mathbb{R}^n \to \mathbb{R} \ \big|\ S \text{ is smooth and } \nabla S(\mathbf{p}) \in \Delta^n \text{ for all } \mathbf{p} \in \mathbb{R}^n \big\}.$$
An interesting observation is that this family is closed under convex combination, i.e. if $S_1, \ldots, S_m \in \mathcal{F}$, then for any non-negative real numbers $\alpha_1, \ldots, \alpha_m$ such that $\sum_{k=1}^m \alpha_k = 1$, we have $\sum_{k=1}^m \alpha_k S_k \in \mathcal{F}$. By Proposition 11, $S_1, \ldots, S_m$ are lossless storage functions of some learning operators with zero shift; suppose their conversion functions are $f_1 = \nabla S_1, \ldots, f_m = \nabla S_m$ respectively. Then by Proposition 11 again, $\sum_{k=1}^m \alpha_k S_k$ is a lossless storage function of a learning operator with zero shift, with conversion function $\sum_{k=1}^m \alpha_k f_k$. This motivates the following definition of convex combination of learning dynamics.
Definition 13.
Given $m$ learning dynamics, each over $n$ actions, let $f_k$ denote the conversion function of the $k$-th learning dynamic. Given any non-negative constants $\alpha_1, \ldots, \alpha_m$ where $\sum_{k=1}^m \alpha_k = 1$, the convex combination of the $m$ learning dynamics with parameters $(\alpha_1, \ldots, \alpha_m)$ is the learning dynamic with conversion function $\sum_{k=1}^m \alpha_k f_k$.
Example 14.
Suppose we are using the half-half convex combination of Replicator Dynamic (RD) and Online Gradient Descent (OGD), with $n = 2$ actions. Recall the conversion functions of RD and OGD in (2) and (4); denote them by $f_{\mathrm{RD}}$ and $f_{\mathrm{OGD}}$ respectively. Suppose that at some time $t$, the cumulative payoff vector is $\mathbf{p}(t)$. Then the mixed strategy at time $t$ under the half-half convex combination of RD and OGD is
$$\mathbf{x}(t) \,=\, \tfrac{1}{2}\, f_{\mathrm{RD}}(\mathbf{p}(t)) \,+\, \tfrac{1}{2}\, f_{\mathrm{OGD}}(\mathbf{p}(t)).$$
By (4), $f_{\mathrm{OGD}}$ outputs a strategy with zero probability of choosing action 2 whenever $p_1(t) - p_2(t) \ge 1$. In contrast, $f_{\mathrm{RD}}$ maintains a tiny but positive probability of choosing action 2 even when $p_1(t)$ is much larger than $p_2(t)$. We may say OGD leans toward exploitation while RD leans toward exploration. By combining the two learning dynamics via a convex combination with our own choice of the $\alpha_k$'s, we obtain a new lossless learning dynamic with our desired balance between exploitation and exploration.
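A self-contained numerical sketch of this example (our own; the closed-form two-action projection used for OGD is our reading of (4)):

```python
import numpy as np

def rd_map(p):
    """Logit choice map (2), the RD conversion function."""
    z = np.exp(p - p.max())
    return z / z.sum()

def ogd_map(p):
    """OGD conversion function (4) for n = 2: projection onto the simplex."""
    x1 = np.clip((p[0] - p[1] + 1.0) / 2.0, 0.0, 1.0)
    return np.array([x1, 1.0 - x1])

def half_half(p):
    """Half-half convex combination of the RD and OGD conversion maps."""
    return 0.5 * rd_map(p) + 0.5 * ogd_map(p)

p = np.array([2.0, 0.0])                 # hypothetical cumulative payoffs
print(ogd_map(p))                        # [1, 0]: OGD drops action 2 entirely
print(half_half(p))                      # the mixture keeps some mass on action 2
```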
Convex combination preserves not only losslessness, but also finite losslessness. Suppose there are several finitely lossless learning dynamics. By Definition 5, for every action $j$, the storage functions of their learning operators with shift $\mathbf{e}_j$ have finite lower bounds. It is easy to verify that for a convex combination of these lossless learning dynamics, its learning operator with shift $\mathbf{e}_j$ is lossless via the same convex combination of the storage functions mentioned above. Since those storage functions have finite lower bounds, their convex combination has a finite lower bound too.
Theorem 15.
Given any lossless (resp. finitely lossless) learning dynamics, any convex combination of them is a lossless (resp. finitely lossless) learning dynamic.
Theorem 16.
Any convex combination of any finitely lossless learning dynamics is a learning dynamic that guarantees constant regret. In particular, any convex combination of any FTRL learning dynamics is a learning dynamic that guarantees constant regret.
Remark. Note that the family of smooth storage functions that generate finitely lossless learning dynamics is
$$\mathcal{F}_{\mathrm{fin}} \,:=\, \Big\{ S \in \mathcal{F} \ \Big|\ \text{for every action } j \in A,\ \inf_{\mathbf{p} \in \mathbb{R}^n}\, \big[ S(\mathbf{p}) - p_j \big] > -\infty \Big\}.$$
Again, this family is closed under convex combination. By Theorem 7, the family of FTRL learning dynamics is a subset of the family of finitely lossless learning dynamics. It is not clear if the two families are equal; we believe not. For instance, for the half-half convex combination of Replicator Dynamic and Online Gradient Descent, we cannot find any regularizer function that certifies it is an FTRL dynamic.
6 Lossless DGS and Poincaré Recurrence
In the last section, we presented a potentially broad family of (finitely) lossless learning dynamics. In this section, our focus is on DGS in which agents use such learning dynamics. We first prove that for certain graphical constant-sum games, their game operators are finitely lossless. Thus, any DGS comprising such a game and finitely lossless learning dynamics admits a constant-of-motion by Theorem 2 when the external input $\hat{\mathbf{r}}$ is the zero function, i.e. the sum of the two storage function values is a constant for all $t \ge 0$. Then we use this constant-of-motion and follow a principled approach proposed by Mertikopoulos et al. [MPP18] to show our main result here: the DGS is Poincaré recurrent. In the rest of this section, we first define graphical constant-sum games and Poincaré recurrence, then we apply the principled approach to prove our main result.
Graphical Constant-sum Game.
There are $N$ agents. Each agent $i$ has $n_i$ actions. In a graphical game [KLS01], there is a normal-form (matrix) game between every pair of agents, which we call an edge-game. The edge-game between agents $i, \ell$ is specified by two matrices, $\mathbf{A}^{i\ell} \in \mathbb{R}^{n_i \times n_\ell}$ and $\mathbf{A}^{\ell i} \in \mathbb{R}^{n_\ell \times n_i}$. When each agent $\ell$ chooses a mixed strategy $\mathbf{x}_\ell \in \Delta^{n_\ell}$, the payoff vector of agent $i$, which contains the payoff to each action of agent $i$, is the sum of the payoff vectors in all her edge-games. Precisely, the payoff vector of agent $i$ is
$$\mathbf{u}_i \,=\, \sum_{\ell \ne i} \mathbf{A}^{i\ell}\, \mathbf{x}_\ell. \qquad (9)$$
A graphical constant-sum game [DP09] is a graphical game such that for every pair of agents $i, \ell$, there exists a constant $c_{i\ell}$ satisfying $A^{i\ell}_{jk} + A^{\ell i}_{kj} = c_{i\ell}$ for any action $j$ of agent $i$ and action $k$ of agent $\ell$.
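For concreteness, here is a small sketch (our own, with hypothetical edge-game matrices) of the payoff computation (9) for a two-agent graphical game whose single edge-game is zero-sum, so that $c_{12} = 0$:

```python
import numpy as np

def payoff_vector(i, A, x):
    """Payoff vector (9) of agent i: sum over other agents l of A[(i, l)] @ x[l]."""
    return sum(A[(i, l)] @ x[l] for l in x if (i, l) in A)

# Edge-game matrices of a zero-sum edge-game: A[(1, 0)] = -A[(0, 1)]^T, so that
# A[(0,1)][j,k] + A[(1,0)][k,j] = 0 = c_12 for every pure action pair (j, k).
A01 = np.array([[1.0, -1.0], [-1.0, 1.0]])
A = {(0, 1): A01, (1, 0): -A01.T}
x = {0: np.array([0.6, 0.4]), 1: np.array([0.5, 0.5])}   # hypothetical strategies

u0, u1 = payoff_vector(0, A, x), payoff_vector(1, A, x)
print(u0, u1)
print(x[0] @ u0 + x[1] @ u1)             # total payoff equals c_12 = 0
```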
Game Operator.
Recall that a game operator takes as input the mixed strategies of the different agents, with shift $\hat{\mathbf{q}}$, while its output is the negative of the payoff vectors to the different actions. We point out one useful fact, which follows quite readily from [PS14].
Proposition 17.
If $\hat{\mathbf{q}} = (\mathbf{q}_1, \ldots, \mathbf{q}_N)$ is a Nash equilibrium of a graphical constant-sum game, then the game operator with shift $\hat{\mathbf{q}}$ is passive; the storage function is the zero function. Moreover, if $\hat{\mathbf{q}}$ is fully-mixed (i.e. every entry in $\hat{\mathbf{q}}$ is positive), then the game operator is lossless via the zero storage function.
When the game operator is lossless, by the composability of passive operators (Theorem 2), coupling it with passive (resp. lossless) learning dynamics yields a passive (resp. lossless) DGS.
Poincaré Recurrence.
In a DGS, $\hat{\mathbf{u}}$ is a function of $\hat{\mathbf{x}}$ via the game operator, while $\hat{\mathbf{x}}$ is a function of $\hat{\mathbf{p}}$ via the conversion functions. Thus, $\hat{\mathbf{u}}$ is a function of $\hat{\mathbf{p}}$. We say the ODE system (1) is divergence-free if $\sum_{i} \sum_{j} \partial (u_i)_j / \partial (p_i)_j$ is zero everywhere. When $\hat{\mathbf{u}}$ is derived using a graphical game via (9), $(u_i)_j$ does not depend on $(p_i)_j$, since for every agent $i$, $\mathbf{u}_i$ is a function of the strategies $(\mathbf{x}_\ell)_{\ell \ne i}$ only. Thus, the game dynamic is divergence-free.
Intuitively, an ODE system with domain $\mathbb{R}^d$ is Poincaré recurrent if almost all trajectories return arbitrarily close to their initial position infinitely often. In order to work formally with the notion of Poincaré recurrence we need to define a measure on $\mathbb{R}^d$. We use the standard Lebesgue measure on $\mathbb{R}^d$. Liouville's formula states that divergence-free ODE systems preserve volume [Wei95]; see Appendix A for more discussion of volume and Liouville's formula. Thus, the following Poincaré Recurrence Theorem is applicable if its bounded-orbit requirement is satisfied.
Theorem 18 ([Poi90, Bar06]).
If a transformation preserves volume and has bounded orbits, then it is Poincaré recurrent, i.e. for each open set, almost every orbit that starts in the set returns to it infinitely often.
Given any $\varepsilon > 0$, we can cover $\mathbb{R}^d$ by countably many balls of radius $\varepsilon$, and apply the theorem to each ball. We conclude that almost every point returns to within an $\varepsilon$-neighbourhood of itself. Since $\varepsilon$ is arbitrary, we conclude that almost every initial point is almost recurrent.
Poincaré Recurrence in DGS.
Recall that in each learning operator of a DGS, the state is represented by a vector $\mathbf{p}_i$, which represents the cumulative payoffs of agent $i$. Clearly, it can be unbounded as time goes on, even in a graphical constant-sum game (for instance, this happens in a two-person zero-sum game in which every payoff entry to agent 1 is strictly positive), thus prohibiting us from using Theorem 18. Instead, as in [MPP18], we consider a transformation that maps $\mathbf{p}_i$ to
$$\mathbf{z}_i \,:=\, \big( (p_i)_1 - (p_i)_{n_i},\ (p_i)_2 - (p_i)_{n_i},\ \ldots,\ (p_i)_{n_i - 1} - (p_i)_{n_i} \big) \,\in\, \mathbb{R}^{n_i - 1}.$$
It is well-known that any FTRL dynamic with starting point $\mathbf{p}_i^0$ and conversion function $f_i$ is equivalent to the following dynamic with state variable $\mathbf{z}_i$, where $(\mathbf{z}_i, 0)$ denotes the vector $\mathbf{z}_i$ with a zero appended as its $n_i$-th entry:
$$\mathbf{z}_i(0) = \mathbf{z}_i^0, \qquad \dot{(z_i)}_j(t) = (u_i)_j(t) - (u_i)_{n_i}(t) \ \text{ for } j = 1, \ldots, n_i - 1, \qquad \mathbf{x}_i(t) = f_i\big((\mathbf{z}_i(t), 0)\big). \qquad (10)$$
Given a mixed strategy $\mathbf{q}_i$, if we cast the output of the above dynamic to be $\mathbf{x}_i(t) - \mathbf{q}_i$, then it is easy to verify that the learning operator is finitely lossless via the storage function $\mathbf{z}_i \mapsto S^{\mathbf{q}_i}\big((\mathbf{z}_i, 0)\big)$, where $S^{\mathbf{q}_i}$ is the storage function of the original learning operator with shift $\mathbf{q}_i$.
Theorem 19.
Poincaré recurrence occurs in the strategy space for any dynamical game system where (1) each agent employs a learning dynamic which is a convex combination of FTRL; and (2) the underlying game is a graphical constant-sum game with a fully-mixed Nash equilibrium.
See Figure 5 for an illustration of Poincaré recurrence under conditions (1) and (2). To prove the theorem, we first show that Poincaré recurrence occurs in the space that contains $\hat{\mathbf{z}}$ (where $\hat{\mathbf{z}}$ is the concatenation of the $\mathbf{z}_i$ of all agents). This comprises two major steps:
- The dynamic of $\hat{\mathbf{z}}$ preserves volume, since the dynamic is divergence-free.
- For any starting point $\hat{\mathbf{z}}(0)$, the trajectory of $\hat{\mathbf{z}}$ remains bounded. To show this, we use the fact that the sum of the storage functions is a constant-of-motion of the game dynamic, so for any $t \ge 0$, $\hat{\mathbf{z}}(t)$ must stay within a level set of this constant-of-motion, which is bounded (see Appendix C).
Poincaré recurrence in the space that contains $\hat{\mathbf{z}}$ implies Poincaré recurrence in the strategy space $\Delta^{n_1} \times \cdots \times \Delta^{n_N}$, since the conversion function in (10) is continuous.
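The following simulation sketch (our own, not from the paper) illustrates the argument for the simplest instance: both agents run Replicator Dynamic, an FTRL dynamic, in Matching Pennies, a two-person zero-sum game whose unique Nash equilibrium $(1/2, 1/2)$ is fully mixed. The sum of the agents' storage values, which for Replicator Dynamic coincides with the sum of KL divergences from the equilibrium to the current strategies, stays (nearly) constant under Euler integration; this is the constant-of-motion that confines the trajectory to a bounded level set and drives the recurrence.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])     # agent 1's payoff matrix; agent 2's is -A

def softmax(p):
    z = np.exp(p - p.max())
    return z / z.sum()

q = np.array([0.5, 0.5])                      # the fully-mixed Nash equilibrium
p1, p2 = np.array([0.5, 0.0]), np.array([0.0, 0.3])   # hypothetical starting states
dt = 1e-3
for step in range(int(50.0 / dt)):
    x1, x2 = softmax(p1), softmax(p2)         # Replicator Dynamic strategies
    if step % int(10.0 / dt) == 0:
        energy = np.sum(q * np.log(q / x1)) + np.sum(q * np.log(q / x2))
        print(round(energy, 6))               # (nearly) constant over time
    u1, u2 = A @ x2, -A.T @ x1                # instantaneous payoff vectors of the game
    p1, p2 = p1 + dt * u1, p2 + dt * u2       # dp/dt = u, as in (1)
```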

7 Conclusion
We present a control-theoretic perspective on understanding popular learning dynamics like FTRL and escort replicator dynamics. At the heart of it is the use of storage (energy) functions to govern how a dynamic turns the history of payoffs into strategic choices. This mirrors the study of physical dynamics, e.g. electrical networks [Kha15]. Analysis via storage functions permits us to prove optimal regret bounds and inspires interesting generalizations of FTRL via convex combinations.
An important benefit of these control-theoretic tools is, as pointed out by Fox and Shamma [FS13], that they allow decoupling of game dynamics into learning operators and game operators. This provides a framework to understand learning-in-games via a modular approach, by analyzing these operators separately. This technique can liberate us from analyzing each individual learning-in-games system in an ad-hoc manner.
In our work, we initiate the study of connections between online optimization and control theory with continuous-time learning dynamics. An interesting problem is how such a connection can be generalized to discrete-time learning algorithms, e.g. Multiplicative Weights Update and its optimistic variant [BP18, Che18, DP19, CP20]. There does exist theory that generalizes passivity to discrete-time settings, e.g. Section VI.7 in [DV75]. We hope our work inspires further studies in this direction.
Lastly, we believe that this control-theoretic perspective is also useful for understanding learning dynamics/algorithms which are not always passive. The perspective can help us spot the situations in which the learning dynamics/algorithms create or dissipate energy. By avoiding situations where energy is created, it is possible that we can achieve stable outcomes in learning processes.
Acknowledgements
This research/project is supported in part by NRF2019-NRF-ANR095 ALIAS grant, grant PIE-SGP-AI-2018-01, NRF 2018 Fellowship NRF-NRFF2018-07, AME Programmatic Fund (Grant No. A20H6b0151) from the Agency for Science, Technology and Research (A*STAR) and the National Research Foundation, Singapore under its AI Singapore Program (AISG Award No: AISG2-RP-2020-016).
References
- [AHK05] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta algorithm and applications. Technical report, 2005.
- [Bar06] Luis Barreira. Poincare recurrence: old and new. In XIVth International Congress on Mathematical Physics. World Scientific., pages 415–422, 2006.
- [BGP20] James P. Bailey, Gauthier Gidel, and Georgios Piliouras. Finite regret and cycles with fixed step-size via alternating gradient descent-ascent. In Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 391–407. PMLR, 2020.
- [BP18] James P. Bailey and Georgios Piliouras. Multiplicative weights update in zero-sum games. In Proceedings of the 2018 ACM Conference on Economics and Computation, Ithaca, NY, USA, June 18-22, 2018, pages 321–338. ACM, 2018.
- [BP19] James Bailey and Georgios Piliouras. Fast and furious learning in zero-sum games: vanishing regret with non-vanishing step sizes. In Advances in Neural Information Processing Systems, pages 12977–12987, 2019.
- [BRM+18] David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, and Thore Graepel. The mechanics of n-player differentiable games. In ICML, 2018.
- [CBL06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
- [Che18] Yun Kuen Cheung. Multiplicative weights updates with constant step-size in graphical constant-sum games. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 3532–3542, 2018.
- [CP19] Yun Kuen Cheung and Georgios Piliouras. Vortices instead of equilibria in minmax optimization: Chaos and butterfly effects of online learning in zero-sum games. In Conference on Learning Theory, COLT 2019, 25-28 June 2019, Phoenix, AZ, USA, volume 99 of Proceedings of Machine Learning Research, pages 807–834. PMLR, 2019.
- [CP20] Yun Kuen Cheung and Georgios Piliouras. Chaos, extremism and optimism: Volume analysis of learning in games. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
- [DP09] Constantinos Daskalakis and Christos H. Papadimitriou. On a network generalization of the minmax theorem. In ICALP 2009: Proceedings of the 2009 International Colloquium on Automata, Languages, and Programming, 2009.
- [DP19] Constantinos Daskalakis and Ioannis Panageas. Last-iterate convergence: Zero-sum games and constrained min-max optimization. In 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, volume 124 of LIPIcs, pages 27:1–27:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
- [DV75] C.A. Desoer and M. Vidyasagar. Feedback Systems: Input-output Properties. Electrical science series. Academic Press, 1975.
- [FL98] Drew Fudenberg and David K. Levine. The Theory of Learning in Games. MIT Press Books. The MIT Press, 1998.
- [FLST16] Dylan J Foster, Thodoris Lykouris, Karthik Sridharan, and Eva Tardos. Learning in games: Robustness of fast convergence. In Advances in Neural Information Processing Systems, pages 4727–4735, 2016.
- [FS99] Yoav Freund and Robert E Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1-2):79–103, 1999.
- [FS13] Michael J Fox and Jeff S Shamma. Population games, stable games, and passivity. Games, 4(4):561–583, 2013.
- [GP19] D. Gadjov and L. Pavel. A passivity-based approach to nash equilibrium seeking over networks. IEEE Transactions on Automatic Control, 64(3):1077–1092, 2019.
- [H+16] Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
- [Har11] Marc Harper. Escort evolutionary game theory. Physica D: Nonlinear Phenomena, 240(18):1411 – 1415, 2011.
- [HS98] Josef Hofbauer and Karl Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, UK, 1998.
- [HS09] Josef Hofbauer and William H Sandholm. Stable games and their dynamics. Journal of Economic theory, 144(4):1665–1693, 2009.
- [Kha15] Hassan K. Khalil. Nonlinear Control. Pearson Education, 2015.
- [KLS01] M. J. Kearns, M. L. Littman, and S. P. Singh. Graphical models for game theory. In UAI, 2001.
- [KM17] Joon Kwon and Panayotis Mertikopoulos. A continuous-time approach to online optimization. Journal of Dynamics and Games, 4(2):125–148, April 2017.
- [Mab18] M. A. Mabrok. Passivity Analysis of Replicator Dynamics and its Variations. arXiv e-prints, page arXiv:1812.07164, Dec 2018.
- [MPP18] Panayotis Mertikopoulos, Christos Papadimitriou, and Georgios Piliouras. Cycles in adversarial regularized learning. In ACM-SIAM Symposium on Discrete Algorithms, 2018.
- [MS15] Jason R Marden and Jeff S Shamma. Game theory and distributed control. In Handbook of game theory with economic applications, volume 4, pages 861–899. Elsevier, 2015.
- [MS16] M. A. Mabrok and J. S. Shamma. Passivity analysis of higher order evolutionary dynamics and population games. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 6129–6134, 2016.
- [MS18] Jason R Marden and Jeff S Shamma. Game theory and control. Annual Review of Control, Robotics, and Autonomous Systems, 1:105–134, 2018.
- [OPNSR13] Romeo Ortega, Julio Antonio Loría Perez, Per Johan Nicklasson, and Hebertt J Sira-Ramirez. Passivity-based control of Euler-Lagrange systems: mechanical, electrical and electromechanical applications. Springer Science & Business Media, 2013.
- [PML+20] Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, et al. From poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization. arXiv preprint arXiv:2002.08456, 2020.
- [Poi90] Henri Poincaré. Sur le problème des trois corps et les équations de la dynamique. Acta mathematica, 13(1):A3–A270, 1890.
- [PS14] Georgios Piliouras and Jeff S Shamma. Optimization despite chaos: Convex relaxations to complex limit sets via poincaré recurrence. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 861–873. SIAM, 2014.
- [PSM18] S. Park, J. S. Shamma, and N. C. Martins. Passivity and evolutionary game dynamics. In 2018 IEEE Conference on Decision and Control (CDC), pages 3553–3560, 2018.
- [RS13] Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In COLT, 2013.
- [SALS15] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, pages 2989–2997, Cambridge, MA, USA, 2015. MIT Press.
- [San10] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, 2010.
- [Sha20] Jeff S. Shamma. Feedback control perspectives on learning, 2020.
- [SS12] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2012.
- [VGFP19] Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, and Georgios Piliouras. Poincaré recurrence, cycles and spurious equilibria in gradient-descent-ascent for non-convex non-concave zero-sum games. In Advances in Neural Information Processing Systems, pages 10450–10461, 2019.
- [Wei95] J. W. Weibull. Evolutionary Game Theory. MIT Press; Cambridge, MA: Cambridge University Press., 1995.
- [Wil72a] Jan C. Willems. Dissipative dynamical systems part i: General theory. In Archive for Rational Mechanics and Analysis, volume 45(5), pages 321–351. 1972.
- [Wil72b] Jan C. Willems. Dissipative dynamical systems part ii: Linear systems with quadratic supply rates. In Archive for Rational Mechanics and Analysis, volume 45(5), pages 352–393. 1972.
Appendix
Appendix A Volume and Liouville’s Formula
Let $g: \mathbb{R}^d \to \mathbb{R}^d$ be a function, and consider the ODE system $\dot{\mathbf{s}} = g(\mathbf{s})$, with the flexibility to choose the starting point. Let $\Phi(\mathbf{s}^0, t)$ be the solution of the ODE system at time $t$ with starting point $\mathbf{s}^0$. Given any set $C \subseteq \mathbb{R}^d$, let $C(t) := \{\Phi(\mathbf{s}^0, t) : \mathbf{s}^0 \in C\}$. When $C$ is measurable, under mild conditions on the ODE system, $C(t)$ is measurable and its volume is $\mathrm{vol}(C(t)) = \int_{C(t)} d\mathbf{s}$. Liouville's formula states that the time derivative of the volume is equal to the integral of the divergence of the ODE system over $C(t)$:
$$\frac{d}{dt}\, \mathrm{vol}(C(t)) \,=\, \int_{C(t)} \mathrm{trace}\big(J(\mathbf{s})\big)\, d\mathbf{s},$$
where $J$ is the Jacobian of the ODE system. Note that $\mathrm{trace}(J(\mathbf{s})) = \sum_{k=1}^d \partial g_k(\mathbf{s})/\partial s_k$ is the divergence, where $g_k$ is the $k$-th component of the function $g$. This immediately implies volume preservation for divergence-free systems.
Appendix B Missing Proofs
Proof of Proposition 4.
It suffices to prove the forward (only if) direction, as the other direction is symmetric. Suppose $O$, the learning operator with shift $\mathbf{q}$, is passive via storage function $S$. By the definition of passivity, for all $T \ge 0$,
$$S(\mathbf{p}(T)) - S(\mathbf{p}(0)) \,\le\, \int_0^T \big\langle \mathbf{x}(t) - \mathbf{q},\, \mathbf{u}(t)\big\rangle\, dt.$$
This implies
$$S(\mathbf{p}(T)) - S(\mathbf{p}(0)) - \int_0^T \big\langle \mathbf{q}' - \mathbf{q},\, \mathbf{u}(t)\big\rangle\, dt \,\le\, \int_0^T \big\langle \mathbf{x}(t) - \mathbf{q}',\, \mathbf{u}(t)\big\rangle\, dt.$$
Since $\int_0^T \langle \mathbf{q}' - \mathbf{q}, \mathbf{u}(t)\rangle\, dt = \langle \mathbf{q}' - \mathbf{q}, \mathbf{p}(T) - \mathbf{p}(0)\rangle$, by setting $S'(\mathbf{p}) := S(\mathbf{p}) - \langle \mathbf{q}' - \mathbf{q}, \mathbf{p}\rangle + c$, we have
$$S'(\mathbf{p}(T)) - S'(\mathbf{p}(0)) \,\le\, \int_0^T \big\langle \mathbf{x}(t) - \mathbf{q}',\, \mathbf{u}(t)\big\rangle\, dt,$$
certifying passivity of the operator $O'$. The lossless case is identical, with equalities throughout. ∎
Proof of Proposition 6.
Suppose that for each action $j$, the learning operator with shift $\mathbf{e}_j$ is finitely passive via storage function $S^{\mathbf{e}_j}$. We claim that the storage function $\sum_{j \in A} q_j S^{\mathbf{e}_j}$ certifies finite passivity of the learning operator with shift of the mixed strategy $\mathbf{q}$. To see why, note that for each action $j$, we have
$$S^{\mathbf{e}_j}(\mathbf{p}(T)) - S^{\mathbf{e}_j}(\mathbf{p}(0)) \,\le\, \int_0^T \big\langle \mathbf{x}(t) - \mathbf{e}_j,\, \mathbf{u}(t)\big\rangle\, dt.$$
We are done by multiplying both sides of the inequality by $q_j$, then summing up over all $j$'s, since $\sum_{j \in A} q_j (\mathbf{x}(t) - \mathbf{e}_j) = \mathbf{x}(t) - \mathbf{q}$. ∎
Proof of Proposition 10.
The "if" direction is done by Theorem 8.
For the "only if" direction, suppose the contrary, i.e. the learning dynamic is lossless, but there exists an action $j$ such that the learning operator with shift $\mathbf{e}_j$ is not finitely lossless. Thus, it has a storage function $S^{\mathbf{e}_j}$ which is not bounded from below, and
$$S^{\mathbf{e}_j}(\mathbf{p}(T)) - S^{\mathbf{e}_j}(\mathbf{p}(0)) \,=\, \int_0^T \big\langle \mathbf{x}(t) - \mathbf{e}_j,\, \mathbf{u}(t)\big\rangle\, dt.$$
Following the calculation in the proof of Theorem 8, the regret w.r.t. action $j$ at time $T$ is exactly equal to $S^{\mathbf{e}_j}(\mathbf{p}(0)) - S^{\mathbf{e}_j}(\mathbf{p}(T))$.
Since $S^{\mathbf{e}_j}$ is not bounded from below, for any $M > 0$, there exists $\mathbf{p}'$ such that $S^{\mathbf{e}_j}(\mathbf{p}') \le S^{\mathbf{e}_j}(\mathbf{p}(0)) - M$. It is easy to construct a payoff stream $\mathbf{u}$ such that $\mathbf{p}(1) = \mathbf{p}'$; for instance, set $\mathbf{u}(t) = \mathbf{p}' - \mathbf{p}(0)$ for all $t \in [0, 1]$. For this choice of $\mathbf{u}$, the regret at time $1$ is at least $M$. Since $M$ can be chosen arbitrarily large, the learning dynamic cannot guarantee constant regret, a contradiction. ∎
Proof of Proposition 17.
To show that the game operator is passive, according to Definition 1 and the input-output choice of $O_2$ (see Figure 3), it suffices to show that for all $t \ge 0$,
$$\sum_{i=1}^N \big\langle \mathbf{x}_i(t) - \mathbf{q}_i,\ \mathbf{u}_i(t) \big\rangle \,\le\, 0.$$
Recall the definition of $\mathbf{u}_i$ in a graphical constant-sum game. Since $\sum_{i=1}^N \langle \mathbf{x}_i, \mathbf{u}_i\rangle$ is simply the total payoff to all agents, it equals the sum of the constants of all edge-games, i.e. $\sum_{i=1}^N \langle \mathbf{x}_i, \mathbf{u}_i\rangle = \sum_{i < \ell} c_{i\ell}$. We denote this double summation by $c$. It remains to show that $\sum_{i=1}^N \langle \mathbf{q}_i, \mathbf{u}_i\rangle \ge c$ always if we want to show the game operator is passive, and to show that $\sum_{i=1}^N \langle \mathbf{q}_i, \mathbf{u}_i\rangle = c$ always if we want to show the game operator is lossless.
Let the action set of agent $i$ be $A_i$. We first expand $\sum_{i=1}^N \langle \mathbf{q}_i, \mathbf{u}_i\rangle$ as follows:
$$\sum_{i=1}^N \langle \mathbf{q}_i, \mathbf{u}_i\rangle \,=\, \sum_{i=1}^N \sum_{\ell \ne i} \mathbf{q}_i^\top \mathbf{A}^{i\ell}\, \mathbf{x}_\ell \,=\, \sum_{i=1}^N \sum_{\ell \ne i} \mathbf{x}_\ell^\top \big(\mathbf{A}^{i\ell}\big)^\top \mathbf{q}_i\,;$$
the last equality holds since each summand is a scalar, which equals its own transpose. Then we rewrite the double summation on the RHS as follows: by the constant-sum property, $\mathbf{x}_\ell^\top \big(\mathbf{A}^{i\ell}\big)^\top \mathbf{q}_i = c_{i\ell} - \mathbf{x}_\ell^\top \mathbf{A}^{\ell i}\, \mathbf{q}_i$, and hence
$$\sum_{i=1}^N \langle \mathbf{q}_i, \mathbf{u}_i\rangle \,=\, 2c \,-\, \sum_{\ell=1}^N \mathbf{x}_\ell^\top \Big( \sum_{i \ne \ell} \mathbf{A}^{\ell i}\, \mathbf{q}_i \Big).$$
It remains to bound the term $\sum_{\ell=1}^N \mathbf{x}_\ell^\top \big( \sum_{i \ne \ell} \mathbf{A}^{\ell i} \mathbf{q}_i \big)$. Observe that for each agent $\ell$, $\mathbf{x}_\ell^\top \big( \sum_{i \ne \ell} \mathbf{A}^{\ell i} \mathbf{q}_i \big)$ is the payoff to agent $\ell$ when she chooses the mixed strategy $\mathbf{x}_\ell$, while every other agent $i$ chooses the mixed strategy $\mathbf{q}_i$. Since the mixed strategies $\mathbf{q}_i$ come from a Nash equilibrium (NE), by the definition of NE, $\mathbf{x}_\ell^\top \big( \sum_{i \ne \ell} \mathbf{A}^{\ell i} \mathbf{q}_i \big) \le v_\ell$, where $v_\ell$ is the payoff to agent $\ell$ at the NE. Thus, $\sum_{\ell=1}^N \mathbf{x}_\ell^\top \big( \sum_{i \ne \ell} \mathbf{A}^{\ell i} \mathbf{q}_i \big) \le \sum_{\ell=1}^N v_\ell$, where the RHS is the total payoff to all agents at the NE. Since the game is constant-sum, we have $\sum_{\ell=1}^N v_\ell = c$. Hence, $\sum_{i=1}^N \langle \mathbf{q}_i, \mathbf{u}_i\rangle \ge 2c - c = c$.
When the NE is fully-mixed, we have the following extra property: at the NE, for each agent $\ell$, her payoff from each of her actions is the same and equals $v_\ell$, i.e. every entry of $\sum_{i \ne \ell} \mathbf{A}^{\ell i} \mathbf{q}_i$ equals $v_\ell$. Thus, $\mathbf{x}_\ell^\top \big( \sum_{i \ne \ell} \mathbf{A}^{\ell i} \mathbf{q}_i \big)$ exactly equals $v_\ell$, so $\sum_{i=1}^N \langle \mathbf{q}_i, \mathbf{u}_i\rangle = 2c - c = c$. ∎
Appendix C Poincaré Recurrence
Corollary 20.
The FIC system which corresponds to a dynamical game system, in which $O_1$ is any finitely lossless MLO and $O_2$ is any game operator which corresponds to a graphical constant-sum game with a fully-mixed Nash equilibrium, is finitely lossless. The storage function that demonstrates finite losslessness of the FIC system is the same as the storage function of $O_1$ (since the storage function of $O_2$ is the zero function). When the external input $\hat{\mathbf{r}}$ is the zero function, the storage function becomes a constant-of-motion.
To complete the proof of Theorem 19, we need to show the second property required by the principled approach of [MPP18]. It relies crucially on the following lemma. Recall that we have defined the following in the main paper, which converts the storage function $S^{\mathbf{q}_i}$ of the original learning operator to the storage function of the new learning operator of (10):
$$\tilde{S}^{\mathbf{q}_i}(\mathbf{z}_i) \,:=\, S^{\mathbf{q}_i}\big((\mathbf{z}_i, 0)\big). \qquad (11)$$
Lemma 21 (Adapted from [MPP18], Appendix D).
Recall the definition of FTRL and Theorem 7. For each agent $i$, suppose she uses a convex combination, with parameters $(\alpha_{i1}, \ldots, \alpha_{im_i})$, of FTRL dynamics indexed by $1, \ldots, m_i$. Let the storage functions of these FTRL dynamics, with shift $\mathbf{q}_i$, be $S_{i1}^{\mathbf{q}_i}, \ldots, S_{im_i}^{\mathbf{q}_i}$. Also, let $\mathbf{z}_i$ denote a vector in $\mathbb{R}^{n_i - 1}$ for agent $i$. Then the storage function of the whole dynamical game system is
$$\tilde{S}(\hat{\mathbf{z}}) \,=\, \sum_{i=1}^N \sum_{k=1}^{m_i} \alpha_{ik}\, S_{ik}^{\mathbf{q}_i}\big((\mathbf{z}_i, 0)\big);$$
moreover, for each agent $i$ and each index $k$, since $\mathbf{q}_i$ is fully-mixed, $S_{ik}^{\mathbf{q}_i}\big((\mathbf{z}_i, 0)\big) \to \infty$ as $\|\mathbf{z}_i\| \to \infty$.
Due to Corollary 20, this storage function is a constant-of-motion when $\hat{\mathbf{r}}$ is the zero function, and thus is bounded by a certain constant once the starting point is given. Since every $S_{ik}^{\mathbf{q}_i}$ has infimum zero, each term $\alpha_{ik}\, S_{ik}^{\mathbf{q}_i}\big((\mathbf{z}_i(t), 0)\big)$ must also remain bounded for all $t \ge 0$, and hence so does $S_{ik}^{\mathbf{q}_i}\big((\mathbf{z}_i(t), 0)\big)$. Then by Lemma 21, for each agent $i$, $\mathbf{z}_i(t)$ remains bounded for all $t \ge 0$, and thus the overall vector $\hat{\mathbf{z}}(t)$ also remains bounded for all $t \ge 0$.
Appendix D Escort Learning Dynamics
An escort learning dynamic [Har11] is a system of differential equations on the variable $\mathbf{x} \in \Delta^n$: for each $j \in A$,
$$\dot{x}_j \,=\, \phi_j(x_j) \left( u_j \,-\, \frac{\sum_{k \in A} \phi_k(x_k)\, u_k}{\sum_{k \in A} \phi_k(x_k)} \right),$$
where each $\phi_j$ is a positive function on the domain $(0, 1]$. Note that when $\phi_j(x_j) = x_j$ for every $j$, this is Replicator Dynamic.
Proposition 22.
Suppose a learning dynamic has the following property: if it starts at a point in the interior of $\Delta^n$, then it stays in the interior forever. We have: the learning dynamic is FTRL via a separable strictly convex regularizer function $\psi(\mathbf{x}) = \sum_{j \in A} \psi_j(x_j)$ if and only if it is an escort replicator dynamic.
Proof.
If the specified learning dynamic is FTRL, recall that the conversion function takes $\mathbf{p}$ as input and outputs the mixed strategy
$$\operatorname*{arg\,max}_{\mathbf{x} \in \Delta^n} \Big\{ \langle \mathbf{p}, \mathbf{x}\rangle - \sum_{j \in A} \psi_j(x_j) \Big\},$$
which we denote by $\mathbf{x}$ in this proof. Let $G_j := 1/\psi_j''(x_j)$ and $G := \sum_{k \in A} G_k$. When $\mathbf{x}$ is in the interior of $\Delta^n$ for some $\mathbf{p}$, by Appendix D of [CP19], we have
$$\frac{\partial x_j}{\partial p_k} \,=\, \begin{cases} G_j - G_j^2/G, & \text{if } k = j,\\ -\, G_j G_k / G, & \text{if } k \ne j.\end{cases}$$
By the chain rule,
$$\dot{x}_j \,=\, \sum_{k \in A} \frac{\partial x_j}{\partial p_k}\, \dot{p}_k \,=\, G_j \left( u_j \,-\, \frac{\sum_{k \in A} G_k u_k}{G} \right).$$
By recognizing $G_j$ as $\phi_j(x_j)$, the FTRL dynamic is an escort replicator dynamic. Precisely, we set $\phi_j(x_j) := 1/\psi_j''(x_j)$. Since $\psi_j$ is strictly convex, $\psi_j''$ is a positive function, hence $\phi_j$ is a positive function too.
Conversely, if the specified algorithm is an escort learning dynamic with escort function $\phi_j$ for each $j \in A$, to show that it is an FTRL dynamic with some strictly convex regularizer $\psi$, we set $\psi$ to be separable, i.e. $\psi(\mathbf{x}) = \sum_{j \in A} \psi_j(x_j)$, and for each $j$, $\psi_j'' = 1/\phi_j$. Thus, it suffices to set $\psi_j$ to be any double anti-derivative of $1/\phi_j$. Since $1/\phi_j > 0$, each $\psi_j$ is strictly convex, and hence $\psi$ is strictly convex. ∎
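As an illustration of the escort form displayed at the start of this appendix, the following sketch (our own) evaluates the right-hand side of an escort replicator dynamic with the power escort $\phi_j(x_j) = x_j^{\,a}$; taking $a = 1$ recovers Replicator Dynamic. The exponent, strategy and payoff vector are hypothetical.

```python
import numpy as np

def escort_replicator_rhs(x, u, a=1.5):
    """Right-hand side of the escort replicator dynamic with escort phi(x) = x**a:
    xdot_j = phi(x_j) * (u_j - escort-weighted average payoff)."""
    phi = x ** a
    avg = (phi @ u) / phi.sum()              # escort-weighted average payoff
    return phi * (u - avg)

x = np.array([0.5, 0.3, 0.2])                # hypothetical interior mixed strategy
u = np.array([1.0, 0.0, -1.0])               # hypothetical payoff vector
xdot = escort_replicator_rhs(x, u)
print(xdot, xdot.sum())                      # components sum to zero: x stays on the simplex
```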
Appendix E More Plots Illuminating Poincaré Recurrences
Finally, we present more plots that illustrate Poincaré recurrence of learning in games in the next two pages.




