
No-Regret Learning in Games is Turing Complete

Gabriel P. Andrade1    Rafael Frongillo1    Georgios Piliouras2
1 Department of Computer Science, University of Colorado Boulder
{gabriel.andrade ; raf}@colorado.edu
2 Engineering Systems and Design, Singapore University of Technology and Design
[email protected]
Abstract

Games are natural models for multi-agent machine learning settings, such as generative adversarial networks (GANs). The desirable outcomes from algorithmic interactions in these games are encoded as game theoretic equilibrium concepts, e.g. Nash and coarse correlated equilibria. As directly computing an equilibrium is typically impractical, one often aims to design learning algorithms that iteratively converge to equilibria. A growing body of negative results casts doubt on this goal, from non-convergence to chaotic and even arbitrary behaviour. In this paper we add a strong negative result to this list: learning in games is Turing complete. Specifically, we prove Turing completeness of the replicator dynamic on matrix games, one of the simplest possible settings. Our results imply the undecidability of reachability problems for learning algorithms in games, a special case of which is determining equilibrium convergence.

1.  Introduction

Many multi-agent machine learning settings can be modeled as games, from social or economic systems with algorithmic decision-makers to popular learning architectures such as generative adversarial networks (GANs). Desired outcomes in these settings are often encoded as equilibrium concepts, and therefore a primary goal is identifying machine learning algorithms with provable convergence to these equilibria.

While there has been progress in deriving strong time-average convergence guarantees for popular online learning algorithms, the per-iteration behaviour of learning in games remains elusive. Recent results attempt to formalize how elusive these dynamics can be, from non-convergence results to establishing chaotic, or even essentially arbitrary, behaviour (Andrade et al., 2021; Benaïm et al., 2012; Flokas et al., 2020; Giannou et al., 2021; Letcher, 2021). Experiments confirm that chaos can actually be typical behaviour (Sanders et al., 2018).

In this work, we add an even more sobering negative result to this list: learning in games is Turing complete. Specifically, we show that replicator dynamics in matrix games, one of the simplest possible settings, can simulate an arbitrary Turing machine (Theorem 1). Here simulation is defined in terms of reachability, a natural decision problem for dynamical systems that asks whether a given system and initial condition eventually intersects (reaches) a certain set; a dynamical system simulates a Turing machine if the corresponding halting problem reduces to the reachability problem. Our proof combines two recent results, on the Turing completeness of fluid dynamics (Cardona et al., 2021a), and on the approximate universality of learning in games (Andrade et al., 2021).

We believe our results have far-reaching implications for the literature on learning in games. Most immediate is the fact that the reachability problem is undecidable for no-regret learning in general (Corollary 3). This result calls into question the feasibility of equilibration as a goal, since even deciding whether a learning algorithm gets close to an equilibrium is undecidable. More broadly, these results establish the computational power of learning dynamics in games, and accordingly their inherent complexity as formalized by computability theory.

Beyond the continuous-time setting, we borrow tools from numerical analysis to show that the multiplicative weights algorithm can simulate any bounded Turing machine (Theorem 2). Extending this analysis to arbitrary Turing machines, and thus establishing Turing completeness for the discrete-time setting, may not be possible with the techniques we consider. Establishing (or refuting) the Turing completeness of multiplicative weights is therefore left as an important open question, and one that will likely require entirely new techniques.

2.  Preliminaries

2.1.  Matrix Games

A finite $n$-player normal form game consists of $n$ agents $[n]=\{1,\dots,n\}$, where each agent $i\in[n]$ can choose actions from a finite action set $S_i$. Actions are chosen by agent $i$ according to a mixed strategy, a distribution $\textbf{x}_i$ in the probability simplex $\Delta^{|S_i|}=\{\textbf{x}_i\in\mathbb{R}^{|S_i|}_{+}:\sum_{s\in S_i}x_{is}=1\}$. In normal form games, agents receive payoffs from pairwise interactions according to payoff matrices $A_{i,j}\in\mathbb{R}^{|S_i|\times|S_j|}$, where $i,j\in[n]$ and $i\neq j$. Given that mixed strategies $\textbf{x}_i\in\Delta^{|S_i|}$ and $\textbf{x}_j\in\Delta^{|S_j|}$ are chosen, agent $i$ receives payoff $\textbf{x}_i^\intercal A_{i,j}\textbf{x}_j$. These payoffs yield a natural optimization problem for each agent, where agents act strategically and independently to maximize their expected payoff over the other agents' mixed strategies, i.e.

$$\max_{\textbf{x}_i\in\Delta^{|S_i|}}\;\sum_{j\in[n],\,j\neq i}\textbf{x}_i^\intercal A_{i,j}\textbf{x}_j,\qquad i\in[n]. \qquad (1)$$

Throughout the paper we'll restrict our attention to the case known as matrix games, when $n=2$.
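As a concrete illustration, the expected payoff $\textbf{x}_i^\intercal A_{i,j}\textbf{x}_j$ in eq. (1) is a simple bilinear form. The following sketch computes it, together with a pure best response, for an illustrative Matching Pennies payoff matrix of our own choosing (not one used in the paper):

```python
import numpy as np

# Illustrative 2x2 matrix game: Matching Pennies payoffs for agent 1.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

x = np.array([0.5, 0.5])  # agent 1's mixed strategy in the 2-simplex
y = np.array([0.3, 0.7])  # agent 2's mixed strategy

payoff = x @ A @ y                     # expected payoff x^T A y from eq. (1)
best_response = int(np.argmax(A @ y))  # pure strategy maximizing (A y)_s
```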

2.2.  Follow-the-Regularized-Leader (FTRL) Learning and Replicator Dynamics

In many game settings, the optimization in eq. (1) is a moving target since the opponent adaptively updates their strategy and the payoff matrix may be unknown. In such settings, arguably the most well known class of algorithms is Follow-the-Regularized-Leader (FTRL). The continuous-time version of an FTRL algorithm is as follows. Given initial payoff vector $\textbf{y}_i(0)$, an agent $i$ that plays against agent $j$ in a matrix game $A_{i,j}$ updates their strategy at time $t$ according to

$$\textbf{y}_i(t)=\textbf{y}_i(0)+\int_0^t A_{i,j}\textbf{x}_j(s)\,ds \qquad (2)$$
$$\textbf{x}_i(t)=\operatorname*{arg\,max}_{\textbf{x}_i\in\Delta^{|S_i|}}\bigl\{\langle\textbf{x}_i,\textbf{y}_i(t)\rangle-h_i(\textbf{x}_i)\bigr\}$$

where $h_i$ is strongly convex and continuously differentiable. FTRL effectively performs a balancing act between exploration and exploitation. The cumulative payoff vector $\textbf{y}_i(t)$ records the total payouts up to time $t$: if agent $i$ had played strategy $s_i\in S_i$ continuously from $t=0$ until time $t$, agent $i$ would receive a total reward of $y_{is_i}(t)$. The two most well-known instantiations of FTRL dynamics are the online gradient descent algorithm, when $h_i(\textbf{x}_i)=\|\textbf{x}_i\|_2^2$, and replicator dynamics (the continuous-time analogue of Multiplicative Weights Update (Arora et al., 2012)), when $h_i(\textbf{x}_i)=\sum_{s_i\in S_i}x_{is_i}\ln x_{is_i}$. FTRL dynamics in continuous time has bounded regret in arbitrary games (Mertikopoulos et al., 2018). For more information on FTRL dynamics and online optimization, see Shalev-Shwartz (2012).
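Since replicator dynamics is the continuous-time analogue of Multiplicative Weights Update, it is worth keeping the discrete-time update in mind. The sketch below runs a standard MWU step in self-play on an illustrative zero-sum game; the game, step size, and horizon are our choices, not the paper's:

```python
import numpy as np

def mwu_step(x, payoff_vec, eta=0.1):
    """One Multiplicative Weights Update: exponentially reweight strategies
    by their payoffs and renormalize (FTRL with the entropic regularizer)."""
    w = x * np.exp(eta * payoff_vec)
    return w / w.sum()

# Illustrative zero-sum matrix game (rock-paper-scissors).
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

x = np.array([0.5, 0.3, 0.2])  # initial mixed strategy
for _ in range(1000):
    x = mwu_step(x, A @ x)     # self-play: the payoff vector is A x
```

The update keeps the strategy on the interior of the simplex by construction, which is why positivity and normalization hold exactly at every step.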

In this paper, we will focus on replicator dynamics (RD) as the learning process generating game dynamics. In addition to its role in optimization, replicator dynamics is the prototypical dynamic studied in evolutionary game theory (Weibull, 1995; Sandholm, 2010) and is one of the key mathematical models of evolution and biological competition (Schuster and Sigmund, 1983; Taylor and Jonker, 1978). In this context, replicator dynamics can be thought of as a normalized form of ecological population models, and is studied given a single payoff matrix $A$ and a single probability distribution $\textbf{x}$, which can be thought of abstractly as capturing the proportions of different species/strategies in the current population. Species/strategies get randomly paired up and the resulting payoff determines which strategies will increase/decrease over time.

Formally, the dynamics are as follows. Let $A\in\mathbb{R}^{m\times m}$ be a matrix game and $\textbf{x}\in\Delta^m$ be the mixed strategy played. RD on $A$ is given by:

$$\dot{x}_i=\frac{dx_i}{dt}=x_i\bigl((A\textbf{x})_i-\textbf{x}^\intercal A\textbf{x}\bigr),\qquad i\in[m] \qquad (3)$$

Under symmetry of the payoff matrices, $A_{i,j}=A_{j,i}$, and of the initial conditions (i.e. $\textbf{x}_i=\textbf{x}_j$), it is immediate to see that the solutions $\textbf{x}_i,\textbf{x}_j$ of eq. (2) are identical to each other and to the solution of eq. (3) with $A=A_{i,j}=A_{j,i}$. For our purposes, it will suffice to focus on exactly this setting of matrix games defined by a single payoff matrix $A$ and a single probability distribution $\textbf{x}$, which is the standard setting within evolutionary game theory.
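For intuition, eq. (3) can be integrated numerically. The sketch below uses crude forward-Euler integration with an illustrative step size and an antisymmetric (zero-sum) game of our own choosing; note that the right-hand side of eq. (3) sums to zero, so the simplex is preserved:

```python
import numpy as np

def replicator_rhs(x, A):
    """Right-hand side of eq. (3): x_i * ((Ax)_i - x^T A x)."""
    Ax = A @ x
    return x * (Ax - x @ Ax)

# Illustrative antisymmetric (zero-sum) matrix game.
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

x = np.array([0.2, 0.3, 0.5])
dt = 0.01
for _ in range(10_000):          # forward-Euler: x <- x + dt * x_dot
    x = x + dt * replicator_rhs(x, A)
```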

2.3.  Dynamical Systems Theory

A dynamical system is a mathematical model of a time-evolving process. The object undergoing change in a dynamical system is called its state, often denoted by $\textbf{x}\in\mathbb{X}$, where $\mathbb{X}$ is a topological space called the state space. For most of this paper we will be focusing on continuous-time systems, but in §5 we will consider discrete-time systems derived from numerical approximations of their continuous counterparts. To distinguish between continuous and discrete time, we will use $\textbf{x}(t)$ to describe the state as a function of continuous time $t\in\mathbb{R}$ and $\textbf{x}^t$ to describe the state as a function of discrete time $t\in\mathbb{Z}$.

Change between states in a continuous-time dynamical system is described by a flow $\Phi:\mathbb{X}\times\mathbb{R}\to\mathbb{X}$ satisfying two properties:

  1. (i) For each $t\in\mathbb{R}$, $\Phi(\cdot,t):\mathbb{X}\to\mathbb{X}$ is bijective, continuous, and has a continuous inverse.

  2. (ii) For every $s,t\in\mathbb{R}$ and $\textbf{x}\in\mathbb{X}$, $\Phi(\textbf{x},s+t)=\Phi(\Phi(\textbf{x},t),s)$.

Intuitively, flows describe the evolution of states in the dynamical system. Given a time $t\in\mathbb{R}$, the flow gives us the relative movement of every point $\textbf{x}\in\mathbb{X}$; we will denote this by the map $\Phi^t:\mathbb{X}\to\mathbb{X}$. Similarly, given a point $\textbf{x}\in\mathbb{X}$, the flow captures the trajectory of $\textbf{x}$ as a function of time; in an abuse of notation, we will denote this by $\textbf{x}(t)$ where $t$ is changing.

Continuous-time dynamical systems are often given as systems of ordinary differential equations (ODEs). A system of ODEs describes a vector field $V:\mathbb{X}\to T\mathbb{X}$ which assigns to each $\textbf{x}\in\mathbb{X}$ a vector in the tangent space of $\mathbb{X}$ at $\textbf{x}$. The unit sphere $\mathbb{S}^n=\{\textbf{x}\in\mathbb{R}^{n+1}:\|\textbf{x}\|_2^2=1\}$ will play a special role in proving Theorem 1, in which case the tangent space at each $\textbf{x}\in\mathbb{S}^n$ is $T_{\textbf{x}}\mathbb{S}^n=\{\textbf{y}\in\mathbb{R}^{n+1}:\textbf{x}\cdot\textbf{y}=0\}$. Intuitively, the tangent space defines bundles of vectors that ensure the system's states remain well defined on the state space as time progresses. A system of ODEs is said to generate (or give) a flow $\Phi$ if $\Phi$ describes a solution of the ODEs at each point $\textbf{x}\in\mathbb{X}$. Throughout this paper we assume that all dynamical systems discussed can be given by a system of ODEs. For this reason, we will use the term dynamical system to refer to the system of ODEs, the associated vector field, and a generated flow interchangeably. A well-known result in dynamical systems theory states that, for Lipschitz-continuous systems of ODEs, the generated flow is unique (see Perko (1991); Meiss (2007)), so using these terms interchangeably is well defined.

An important notion for proving Theorem 1, and for dynamical systems in general, is that of a globally attracting set of the dynamical system. Let $\Phi$ be a flow generated by some dynamical system on $\mathbb{X}$. We say $\mathbb{Y}\subset\mathbb{X}$ is forward invariant for the flow $\Phi$ if $\Phi^t(\textbf{y})\in\mathbb{Y}$ for every $t\geq 0$ and $\textbf{y}\in\mathbb{Y}$. We say $\mathbb{Y}\subset\mathbb{X}$ is globally attracting for the flow $\Phi$ if $\mathbb{Y}$ is nonempty, forward invariant, and

$$\mathbb{Y}\supseteq\bigcap_{t>0}\{\Phi^t(\textbf{x}):\textbf{x}\in\mathbb{X}\}. \qquad (4)$$

Stated informally, if $\mathbb{Y}$ is globally attracting then it will eventually capture the dynamics of $\Phi$ starting from any point in $\mathbb{X}$, after some transitory period of time.

Now let $\mathbb{X}$ and $\mathbb{Y}$ be two topological spaces. We say that a function $f:\mathbb{X}\to\mathbb{Y}$ is a homeomorphism if (i) $f$ is bijective, (ii) $f$ is continuous, and (iii) $f$ has a continuous inverse. Furthermore, two flows $\Phi:\mathbb{X}\times\mathbb{R}\to\mathbb{X}$ and $\Psi:\mathbb{Y}\times\mathbb{R}\to\mathbb{Y}$ are homeomorphic if there exists a homeomorphism $g:\mathbb{X}\to\mathbb{Y}$ such that for each $\textbf{x}\in\mathbb{X}$ and $t\in\mathbb{R}$ we have $g(\Phi(\textbf{x},t))=\Psi(g(\textbf{x}),t)$. If $g$ is also $C^1$ and has a $C^1$ inverse, then we say $g$ is a diffeomorphism and that the flows $\Phi$ and $\Psi$ are diffeomorphic. Observe that every diffeomorphism is also a homeomorphism, and thus every pair of diffeomorphic flows is also homeomorphic. Homeomorphic (resp. diffeomorphic) flows satisfy a strong, and typical, notion of equivalence between dynamical systems. Intuitively, two dynamical systems are homeomorphic if their trajectories can be mapped to one another by stretching and bending space.

2.4.  Turing Machines

Throughout this paper we rely crucially on the notion of a Turing complete dynamical system, i.e. a dynamical system able to simulate any Turing machine. We will briefly recall the Turing machine model and formalize its relationship with dynamical systems.

A Turing machine is given by a tuple $T=(Q,\Sigma,\delta,q_0,q_{\text{halt}})$ where

  • $Q$ is a finite set of states, including an initial state $q_0$ and a halting state $q_{\text{halt}}$;

  • $\Sigma$ is an alphabet with cardinality at least two;

  • $\delta:Q\times\Sigma\to Q\times\Sigma\times\{-1,0,1\}$ is a transition function.

For a given Turing machine $T$ and an input tape $s=(s_i)_{i\in\mathbb{Z}}\in\Sigma^{\mathbb{Z}}$, the Turing machine's computation is carried out according to the following process:

  • [0] Initialize the current state $q$ to $q_0$, and the current tape $w=(w_i)_{i\in\mathbb{Z}}$ to be $s$.

  • [1] If $q=q_{\text{halt}}$ then halt the algorithm and return $w$ as output. Otherwise compute $\delta(q,w_0)=(q',w'_0,\sigma)$, where $\sigma\in\{-1,0,1\}$.

  • [2] Update the current state and tape by setting $q=q'$ and the $0^{\text{th}}$ position of $w$ to $w_0=w'_0$.

  • [3] Update $w$ with the $\sigma$-shifted tape $(w_{i+\sigma})_{i\in\mathbb{Z}}$, then return to [1].

Without loss of generality, we will assume that Turing machines adhere to standard simplifying conventions (cf. Sipser (1996)). Specifically, we assume that the alphabet is $\Sigma=\{0,\dots,9\}$ and any given tape of the Turing machine has only a finite number of symbols different from $0$, where $0$ represents the special "blank symbol". Under these assumptions it follows that there exists a finite (possibly large) integer $k_0>0$ such that any tape $w$ satisfies

$$w=(w_i)_{i\in\mathbb{Z}}=\dots 000\,w_{-k_0}\dots w_{k_0}\,000\dots \qquad (5)$$

with each $w_i\in\Sigma$. Equivalently, at any given step in the Turing machine's evolution, these assumptions ensure there can be at most $2k_0+1$ non-blank symbols on the tape. In particular, we get that the space of configurations of a Turing machine $T$ is $Q\times A\subset Q\times\Sigma^{\mathbb{Z}}$, where $A$ is the subset of strings taking the form (5).
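The update rules [0]–[3] above translate directly into code. The following sketch simulates a toy two-rule machine of our own devising (not one from the paper); it stores the tape as a sparse map with blank symbol 0 and moves the head rather than shifting the tape, which is equivalent:

```python
from collections import defaultdict

def run_turing_machine(delta, tape, q0="q0", q_halt="halt", max_steps=10_000):
    """Simulate the machine of Sec. 2.4: delta maps (state, symbol) to
    (state', symbol', shift) with shift in {-1, 0, 1}. The tape is a dict
    from integer positions to symbols, with 0 as the blank symbol."""
    w = defaultdict(int, tape)
    q, head = q0, 0
    for _ in range(max_steps):
        if q == q_halt:                        # rule [1]: halt and return w
            return q, dict(w)
        q, w[head], shift = delta[(q, w[head])]  # rules [1]-[2]: transition
        head += shift                          # rule [3]: move head (= shift tape)
    return q, dict(w)                          # step budget exhausted

# Toy machine: write 1, move right, write 1 again, then halt (illustrative).
delta = {("q0", 0): ("q1", 1, 1),
         ("q1", 0): ("halt", 1, 0)}
state, tape = run_turing_machine(delta, {})
```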

The construction of dynamical systems that simulate Turing machines is at the heart of our results, and has been studied for various problems in physics (Reif et al., 1994; Freedman, 1998; Cardona et al., 2021b). Although equivalent definitions exist, our analyses will adopt the formalisms used by recent work on fluid dynamics (Cardona et al., 2021a; Tao, 2017). An analogous definition can be given for flows on a manifold.

Definition 1.

A vector field $X$ on a manifold $M$ simulates a Turing machine $T$ if there exists an explicitly constructible open set $U_{w_{-k},\dots,w_k}\subset M$ corresponding to each finite string $w_{-k},\dots,w_k\in\Sigma$, and an explicitly constructible point $p_s\in M$ corresponding to each $s\in\Sigma^{\mathbb{Z}}$, such that: $T$ with input tape $s$ halts with an output tape having values $w_{-k},\dots,w_k$ in positions $-k,\dots,k$ respectively if and only if the trajectory of $X$ through $p_s$ intersects $U_{w_{-k},\dots,w_k}$.

Intuitively, a dynamical system simulates a Turing machine if there is a correspondence between trajectories reaching certain sets and computations halting with certain configurations. In particular, constructing the point $p_s$ depends only on the Turing machine $T$ and input tape $s$, while constructing the set $U_{w_{-k},\dots,w_k}$ depends only on the specified halting configuration of $T$. Both here and throughout the paper, we say a mathematical object (e.g. a point, set, or matrix) is constructible if it can be computed in finite time. Constructibility is not explicitly used in our arguments, but it is important for nuanced technical reasons, since it disallows pathological scenarios such as all information about a machine's computations being encoded in an initial condition.

Definition 1 leads to a natural notion of Turing completeness for dynamical systems.

Definition 2.

A dynamical system is Turing complete if it can simulate a universal Turing machine $T$.

3.  Turing Complete Dynamics on Matrix Games

Our goal in this section is to establish the Turing completeness of replicator dynamics; in §3.1 we provide all precursory results required to prove the main result in §3.2.

3.1.  Turing Complete Vector Fields and Approximation-Free Game-Embeddings

Our construction of Turing complete game dynamics relies crucially on the notion of generalized Lotka-Volterra (GLV) vector fields. In particular, two properties of GLV vector fields will play a key role in the proof: (i) polynomial vector fields on $\mathbb{R}^n_{++}$ are a special case of GLV vector fields, and (ii) GLV vector fields can be embedded into RD on a matrix game without approximation.

Formally, a GLV vector field is a vector field on $\mathbb{R}^n_{++}$ given by the system of ODEs

$$\dot{x}_i=\frac{dx_i}{dt}=x_i\Bigl(\lambda_i+\sum_{j\in[m]}A_{ij}\prod_{k\in[n]}x_k^{B_{jk}}\Bigr),\qquad i\in[n] \qquad (6)$$

where $m$ is some positive integer, $\lambda\in\mathbb{R}^n$, $A\in\mathbb{R}^{n\times m}$, and $B\in\mathbb{R}^{m\times n}$. Since the exponents given by $B$ can be any real numbers, the terms in the parentheses are multivariate generalized polynomials. In the special case where the ODEs are standard multivariate polynomials, GLV vector fields equate to polynomial vector fields; this follows straightforwardly by noting that any polynomial vector field $P=\{P_i\}_{i\in[n]}$ on $\mathbb{R}^n_{++}$ is equivalent to the GLV vector field $\tilde{P}=\{x_i(\tfrac{1}{x_i}P_i)\}_{i\in[n]}$.
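Eq. (6) and the embedding $\tilde{P}=\{x_i(\tfrac{1}{x_i}P_i)\}_{i\in[n]}$ are easy to check numerically. Below, the simple polynomial field $P(\textbf{x})=(x_2,-x_1)$ on $\mathbb{R}^2_{++}$ (an illustrative example of ours, not one from the paper) is written in GLV form and evaluated:

```python
import numpy as np

def glv_rhs(x, lam, A, B):
    """Right-hand side of eq. (6): x_i * (lambda_i + sum_j A_ij prod_k x_k^B_jk)."""
    monomials = np.prod(x ** B, axis=1)  # one generalized monomial per row of B
    return x * (lam + A @ monomials)

# P(x) = (x2, -x1) in GLV form x_i * (1/x_i) * P_i(x):
lam = np.zeros(2)
A = np.array([[1.0, 0.0],
              [0.0, -1.0]])
B = np.array([[-1.0, 1.0],    # monomial x1^{-1} x2   (for dx1/dt)
              [1.0, -1.0]])   # monomial x1 x2^{-1}   (for dx2/dt)

x = np.array([2.0, 3.0])      # a point in R^2_{++}
```

At this point, `glv_rhs(x, lam, A, B)` should agree with the polynomial field $P(\textbf{x})=(x_2,-x_1)=(3,-2)$.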

Polynomial and GLV vector fields play an integral role by allowing us to invoke recent results by Cardona et al. (2021a) and Andrade et al. (2021). The starting point of our construction can be stated as follows:

Proposition 1 (Theorem 4.1 of Cardona et al. (2021a)).

There exists a constructible polynomial vector field $X$ of degree $58$ on $\mathbb{S}^{17}$ which is Turing complete and bounded.

In Appendix A we provide a proof sketch of this result; we refer the reader to Cardona et al. (2021a) for the full proof. In §3.2 we will extend the Turing completeness from Proposition 1 to replicator dynamics in matrix games by leveraging recent work by Andrade et al. (2021). In essence, Andrade et al. (2021) showed that GLV vector fields can approximate essentially any dynamical system, and that any GLV vector field can be embedded into the dynamics of RD on some matrix game. In this paper we only rely on the latter result, since polynomial vector fields are already a special case of GLV vector fields and thus do not need to be approximated.

Proposition 2 (Theorem 3 of Andrade et al. (2021)).

Let $\tilde{P}$ be a GLV vector field on $\mathbb{R}^n_{++}$ and $\Phi$ be the flow generated by $\tilde{P}$. For $m\geq n$, there exists a flow $\Theta$ on $\operatorname{relint}(\Delta^m)$ and a constructible diffeomorphism $f:\mathbb{R}^n_{++}\to\mathbb{P}\subseteq\operatorname{relint}(\Delta^m)$ such that:

  1. (i) The flow $\Theta$ on $\operatorname{relint}(\Delta^m)$ is given by RD on a matrix game with payoff matrix $A\in\mathbb{R}^{m\times m}$.

  2. (ii) $\Theta|_{\mathbb{P}}=f(\Phi)$ and $\Phi=f^{-1}(\Theta|_{\mathbb{P}})$, where $\Theta|_{\mathbb{P}}$ is the flow given by $\Theta$ restricted to $\mathbb{P}$.

  3. (iii) The integer $m-1$ is at least the number of unique monomials in $\tilde{P}$.

At a high level, proving Proposition 2 boils down to composing an embedding trick introduced by Brenig and Goriely (1989) with Theorem 7.5.1 of Hofbauer and Sigmund (1998). The relationship highlighted here between $m-1$ and the number of monomials was not included in the original statement by Andrade et al. (2021); however, it is shown as part of an important step in their proof and is required for Corollary 1.

3.2.  Replicator Dynamics on Matrix Games is Turing Complete

To prove the main result of this section, Theorem 1, we will apply Proposition 2 on a diffeomorphism of the Turing complete vector field constructed in Proposition 1.

Theorem 1.

There exists $m\geq 0$ and a constructible matrix game $A\in\mathbb{R}^{m\times m}$ such that replicator dynamics on $A$ is Turing complete.

Proof.

Let $X$ be the Turing complete polynomial vector field on $\mathbb{S}^{17}$ given by Proposition 1. We begin by embedding $X$ into a polynomial vector field $\overline{X}$ on $\mathbb{R}^{18}$ for which $\mathbb{S}^{17}$ is globally attracting. Since trajectories of $\overline{X}$ are globally attracted to $\mathbb{S}^{17}$, a standard change of coordinates via translation yields a polynomial vector field that is well defined on $\mathbb{R}^{18}_{++}$. Therefore, as polynomial vector fields on $\mathbb{R}^{18}_{++}$ are a special case of GLV vector fields, we will conclude the proof by applying Proposition 2 from §3.1.

Let $\{\phi_i\}_{i\in[18]}$ be the set of polynomials given by $X$. Define $\pi(\textbf{x})=1-\|\textbf{x}\|_2^2$ for $\textbf{x}\in\mathbb{R}^{18}$. Now define $\overline{X}$ as the vector field on $\mathbb{R}^{18}$ given by the system

$$\dot{x}_i=x_i\Bigl(\pi(\textbf{x})+\frac{1}{x_i}\phi_i(\textbf{x})\Bigr)=x_i\pi(\textbf{x})+\phi_i(\textbf{x}),$$

for each $i\in[18]$. By construction $\mathbb{S}^{17}$ is forward invariant under $\overline{X}$, as $\pi(\textbf{x})=0$ on $\mathbb{S}^{17}$ and $X$ leaves $\mathbb{S}^{17}$ forward invariant. Furthermore, observe that for $\textbf{x}=\textbf{x}(t)\in\mathbb{R}^{18}$ the solutions of $\overline{X}$ satisfy

$$\frac{d}{dt}\|\textbf{x}\|_2^2=2\sum_{i\in[18]}x_i\dot{x}_i=2\pi(\textbf{x})\sum_{i\in[18]}x_i^2+2\sum_{i\in[18]}x_i\phi_i(\textbf{x})=2\pi(\textbf{x})\|\textbf{x}\|_2^2=2\|\textbf{x}\|_2^2\bigl(1-\|\textbf{x}\|_2^2\bigr),$$

since, by definition of $T\mathbb{S}^{17}$, the constraint $\|\textbf{x}\|_2^2=1$ ensures $X$ satisfies

$$2\sum_{i\in[18]}x_i\phi_i(\textbf{x})=0.$$

The term $2\|\textbf{x}\|_2^2\bigl(1-\|\textbf{x}\|_2^2\bigr)$ is a logistic equation in $\|\textbf{x}\|_2^2$. Thus, for every $\textbf{x}\in\mathbb{R}^{18}$ with $\textbf{x}\neq 0$, we know $\|\textbf{x}\|_2^2\to 1$ as $t\to\infty$. It follows that $\mathbb{S}^{17}$ is globally attracting for the trajectories generated by $\overline{X}$.

Denote a standard translation of axes by $\sigma\in\mathbb{R}$ as $F_\sigma:\mathbb{R}^{18}\to\mathbb{R}^{18}$, $\textbf{x}\mapsto\textbf{x}+\sigma\mathbbm{1}$, where $\mathbbm{1}$ is the all-ones vector. Since solutions of $\overline{X}$ are attracted to $\mathbb{S}^{17}$, and Proposition 1 ensures $\{\phi_i\}_{i\in[18]}$ is bounded due to the reparametrization done in eq. (4.2) of Cardona et al. (2021a), there exist suitable values of $\sigma$ such that composing $F_\sigma$ with $\overline{X}$ yields a polynomial vector field that is forward invariant on $\mathbb{R}^{18}_{++}$. Formally, let $B>0$ be the bound on $\{\phi_i\}_{i\in[18]}$ given in Proposition 1, i.e. for all $i\in[18]$ and $\textbf{x}\in\mathbb{R}^{18}$ the vector field $X$ satisfies $|\phi_i(\textbf{x})|\leq B$. To ensure the translated vector field is forward invariant on $\mathbb{R}^{18}_{++}$, it suffices to find $\sigma$ such that $Y=F_\sigma\circ\overline{X}$ is strictly positive on the boundary, i.e. when $\textbf{y}$ has $y_i=0$ for some $i\in[18]$. By definition, $Y$ at any point $\textbf{y}$ is identical to $\overline{X}$ at $\textbf{x}=\textbf{y}-\sigma\mathbbm{1}$: the system of equations $\{\dot{y}_i\}_{i\in[18]}$ is given by the system of equations $\{\dot{x}_i\}_{i\in[18]}$ under the substitution $\textbf{x}=\textbf{y}-\sigma\mathbbm{1}$. Therefore we find that, for $\textbf{y}$ with $y_i=0$ for some $i\in[18]$,

$$\dot{y}_i=(y_i-\sigma)\pi(\textbf{y}-\sigma\mathbbm{1})+\phi_i(\textbf{y}-\sigma\mathbbm{1})\geq(-\sigma)\bigl(1-\|\textbf{y}-\sigma\mathbbm{1}\|_2^2\bigr)-B=\sigma\|\textbf{y}-\sigma\mathbbm{1}\|_2^2-\sigma-B\geq\sigma^3-\sigma-B,$$

which implies $\dot{y}_i>0$ whenever $B<\sigma^3-\sigma$. Thus, for values of $\sigma$ satisfying $B<\sigma^3-\sigma$, the vector field $Y=F_\sigma\circ\overline{X}$ is well defined on $\mathbb{R}^{18}_{++}$ for all initial conditions in $\mathbb{R}^{18}_{++}$.

By definition of $Y$, as a translated copy of $\overline{X}$, the set $F_\sigma(\mathbb{S}^{17})$ is globally attracting for $Y$, and $Y|_{F_\sigma(\mathbb{S}^{17})}$ is a Turing complete polynomial vector field. It follows that we have constructed a polynomial vector field on $\mathbb{R}^{18}_{++}$ that inherits the Turing complete dynamics of $X$. Since polynomial vector fields on $\mathbb{R}^{18}_{++}$ are a special case of GLV vector fields on $\mathbb{R}^{18}_{++}$, by Proposition 2 there exists a diffeomorphism $f:\mathbb{R}^{18}_{++}\to\mathbb{P}\subseteq\operatorname{relint}(\Delta^m)$ mapping trajectories of $Y$ onto trajectories of an invariant submanifold of replicator dynamics on a matrix game $A\in\mathbb{R}^{m\times m}$.

We conclude by showing how the Turing completeness of $X$ corresponds to Turing completeness for replicator dynamics on $A$. Suppose we are given a Turing machine $T$, an input tape $s$, and some finite string $\omega$. By Proposition 1 there exist a point $p_s$ and an open set $U_\omega$ such that trajectories of $X$ through $p_s$ intersect $U_\omega$ if and only if $T$ halts on input $s$ with output matching $\omega$ about the machine's head. Our analysis above shows that $\overline{X}|_{\mathbb{S}^{17}}=X$, so trajectories of $\overline{X}$ through $p_s$ intersect $U_\omega$ if and only if $T$ halts on input $s$ with output matching $\omega$. Therefore, after translating $\overline{X}$, we know trajectories of $Y$ through $F_\sigma(p_s)$ intersect $F_\sigma(U_\omega)$ if and only if $T$ halts on input $s$ with output matching $\omega$. Finally, since diffeomorphisms are closed under composition, we conclude that trajectories of replicator dynamics on $A$ through the point $f(F_\sigma(p_s))$ intersect the set $f(F_\sigma(U_\omega))$ if and only if $T$ halts on input $s$ with output matching $\omega$, where $f$ is the diffeomorphism above. Thus, on an invariant submanifold of $\operatorname{relint}(\Delta^m)$, replicator dynamics on $A$ simulates $T$. Taking $T$ to be a universal Turing machine completes the proof. ∎
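As a numerical sanity check of the embedding step above (illustrative only, not part of the formal argument), the augmented system $\dot{x}_i=x_i\pi(\textbf{x})+\phi_i(\textbf{x})$ can be integrated with a toy tangent field $\phi$ on $\mathbb{S}^1$ standing in for the degree-58 field on $\mathbb{S}^{17}$; the squared norm then follows the logistic equation and approaches 1:

```python
import numpy as np

def augmented_rhs(x):
    """The proof's embedding: x_i * pi(x) + phi_i(x) with pi(x) = 1 - |x|^2.
    Here phi(x) = (-x2, x1) is a toy rotation field tangent to the circle,
    so x . phi(x) = 0 as required."""
    phi = np.array([-x[1], x[0]])
    return x * (1.0 - x @ x) + phi

x = np.array([0.1, 0.2])   # nonzero initial condition off the sphere
dt = 0.01
for _ in range(5_000):     # forward-Euler integration to t = 50
    x = x + dt * augmented_rhs(x)
# |x|^2 should now be close to 1, the stable fixed point of the logistic ODE.
```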

An interesting corollary of Theorem 1 is a bound on the number of actions needed to define games where learning dynamics can be Turing complete. The bound is likely loose for several reasons. First, the polynomial vector field from Proposition 1 is not known to have minimal degree or dimension. Second, the combinatorial argument in Appendix B makes no attempt at a nuanced count of the number of unique monomials in the polynomials given by these vector fields. Deriving a tight bound is an interesting open question not only for game dynamics, but also for recent work in fluid dynamics (Cardona et al., 2021a; de Lizaur, 2021) and analog computing (Hainry, 2009).

Corollary 1.

For some $m\leq\binom{76}{18}+1$, there exists a matrix game $A\in\mathbb{R}^{m\times m}$ such that replicator dynamics on $A$ is Turing complete.

4.  Undecidable Phenomena in No-Regret Learning Dynamics

The Turing completeness of replicator dynamics (i.e. Theorem 1) has deep implications for machine learning and, more generally, learning in strategic environments. Specifically, if a dynamical system simulates a Turing machine, Definition 1 gives a reduction from the halting problem for Turing machines to the reachability problem for dynamical systems; we use this reduction, alongside the Turing completeness established in Theorem 1, to uncover the existence of undecidable reachability problems. As will be discussed in §4.2, the existence of undecidable problems makes it increasingly important that we understand computability in instances of reachability arising from fundamental solution concepts in game theory and machine learning.

4.1.  The Halting and Reachability Problems

The halting problem is a prototypical decision problem for Turing machines and is arguably the most famous undecidable problem in computer science. Given a Turing machine $T$ and an input tape, the halting problem for $T$ asks whether or not $T$ will halt. By contrast, the reachability problem is canonical for dynamical systems and has been studied in various control settings; given a dynamical system $X$ and a set of initial conditions, the reachability problem for $X$ asks whether or not $X$'s trajectory will intersect a predetermined set. Although the halting problem is generally well understood for Turing machines, the computability of the reachability problem has not traditionally been studied in the context of game dynamics. However, from the strong equivalence between halting and reachability required by Definition 1, we immediately get a reduction between these classic decision problems.

Proposition 3.

If a dynamical system $X$ simulates a Turing machine $T$, then the halting problem for $T$ reduces to the reachability problem for $X$.

The proof of this proposition follows directly from Definition 1: by construction, the Turing machine halts if and only if the dynamical system's trajectory reaches the corresponding set. From Theorem 1 we know that replicator dynamics on a matrix game can simulate a universal Turing machine. Therefore, by the undecidability of the halting problem, the reachability problem can be undecidable for replicator dynamics on matrix games.
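To illustrate the shape of this reduction (a toy sketch only: the state space here is finite, so reachability is trivially decidable, unlike the continuous setting of Proposition 3), one can decide halting by querying a reachability oracle for a system that simulates the machine:

```python
def step(config):
    """One step of a toy machine T: count down toward the halting state 0."""
    return max(config - 1, 0)

def reaches(x0, target, max_steps):
    """Reachability oracle for the toy dynamics. Decidable here only because
    the state space is finite; Proposition 3 shows the continuous analogue
    inherits the undecidability of halting."""
    x = x0
    for _ in range(max_steps):
        if x == target:
            return True
        x = step(x)
    return x == target

# T halts from configuration 5 iff the orbit reaches the halting state 0.
print(reaches(5, 0, max_steps=10))  # True
```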

Corollary 2.

There exist matrix games where reachability is undecidable for replicator dynamics.

The corollary follows immediately from Proposition 3 and Theorem 1: the undecidability of the halting problem for universal Turing machines implies the undecidability of the reachability problem for replicator dynamics on matrix games.

4.2.  Implications for No-Regret Learning in Games

Games are primarily understood and studied via equilibrium concepts, e.g. Nash equilibria, evolutionary stable strategies, and coarse correlated equilibria. It is therefore unsurprising that the goal of learning in games is often to converge on some set of equilibria. Yet, beyond certain special cases (e.g. potential games), learning behaviours remain largely enigmatic and there has been limited progress towards resolving non-convergence in general settings. The results in this paper may explain why: determining convergence to a set of equilibria is a special case of reachability, and identifying learning algorithms that provably converge on such a set can be an undecidable problem even in very simple classes of games. The goal of this section is to formalize this intuition.

In Corollary 2 we found that reachability can be undecidable for replicator dynamics on matrix games. Taken as a negative result, Corollary 2 therefore implies that undecidable trajectories exist in any larger class of game dynamics containing replicator dynamics on matrix games as a special case. Notably, replicator dynamics is a special case of FTRL dynamics, and of no-regret learning dynamics more generally (Mertikopoulos et al., 2018), so these popular learning dynamics inherit the negative result on any class of games containing matrix games. Similarly, matrix games are highly restricted and a special case of many popular classes of games, e.g. normal form and smooth games. As an example of how broadly these results generalize, matrix games in the FTRL framework correspond to quadratic objective functions, and thus undecidable trajectories exist for optimization-driven learning over quadratic objectives. Thus, since Corollary 2 holds for replicator dynamics on matrix games, the reachability problem is undecidable in general for rich classes of game dynamics studied in the literature and used in practice.

Corollary 3.

There exist games where reachability for no-regret learning dynamics is undecidable.

In light of Corollary 2, the claim follows from our discussion above. As determining convergence to sets of game theoretic solution concepts is a special case of the reachability problem, Corollary 3 reveals that determining whether game dynamics converge to fundamental solution concepts is undecidable in general. It is important to note that this undecidability may not hold for specific games or learning dynamics; the primary takeaway is that undecidability is possible, and this possibility has strong implications for how we should approach these important questions.

5.  Discrete Learning Dynamics and Turing Machine Simulations

Thus far we have focused on the continuous-time replicator learning dynamics, but in practice discrete-time learning dynamics are typically used. A folk result in the study of game dynamics states that the multiplicative weights update (MWU) algorithm is essentially an Euler discretization of replicator dynamics. It is therefore natural to ask whether MWU, the discrete analogue of replicator dynamics, is also Turing complete. Unfortunately, as will be shown in this section, standard numerical error analyses are likely insufficient for proving Turing completeness in discrete time; intuitively, this is because discretizations of a continuous-time process yield error bounds that grow as a function of time. We formalize these error bounds in §5.1 and use them in §5.2 to begin untangling the computational power of MWU. Discussions of related open questions are left for §6.

5.1.  Discretization Error of Multiplicative Weights Updates

The fact that MWU is a discretization of replicator dynamics is well known in the field of game dynamics, but a precise derivation of this relationship is often omitted. For clarity in our analysis of discretization errors, in Appendix C we highlight one possible discretization that reveals MWU as a discrete analogue of replicator dynamics. The discretization we arrive at is used to bound the cumulative error of MWU relative to replicator dynamics, which is crucial for the analyses and discussion to follow.

Lemma 1.

Let $\Phi$ be the flow generated by replicator dynamics and $\mathbf{x}^t$ be the mixed strategy found on the $t^{\text{th}}$ iterate of MWU. The error accrued by a single iteration of MWU with step-size $\eta > 0$ is

$$\|\mathbf{x}^{t+1} - \Phi(\eta, \mathbf{x}^t)\|_\infty \leq 1 - e^{-2\eta}.$$

The proof of Lemma 1 consists of relatively straightforward calculations, but requires carefully handling nonlinearities introduced by MWU; a full proof is included in Appendix D.
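As a numerical sanity check of Lemma 1's bound (a sketch, not part of the paper's proof: we approximate the exact flow $\Phi(\eta,\cdot)$ by a fine Euler integration in payoff space, and normalize a random payoff matrix so expected payoffs lie in $[-1,1]$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.uniform(-1, 1, size=(n, n))
A /= np.abs(A).max()              # normalize so expected payoffs lie in [-1, 1]

def softmax(y):
    """Logit map delta: cumulative payoffs -> mixed strategy."""
    z = np.exp(y - y.max())
    return z / z.sum()

eta = 0.1
y0 = np.zeros(n)                  # uniform initial mixed strategy

# One MWU step: y <- y + eta * A x with x = delta(y), then re-apply the logit map.
x_mwu = softmax(y0 + eta * A @ softmax(y0))

# Stand-in for the exact flow Phi(eta, x0): integrate payoffs with many tiny
# Euler steps (an approximation, not the closed-form flow).
K = 20000
y = y0.copy()
for _ in range(K):
    y += (eta / K) * A @ softmax(y)
x_flow = softmax(y)

err = np.abs(x_mwu - x_flow).max()
bound = 1 - np.exp(-2 * eta)      # Lemma 1's bound, ~0.181 for eta = 0.1
print(err <= bound)               # True: the one-step error is O(eta^2)
```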

Using Lemma 1 as a basis, we can bound the error accrued over multiple iterations of MWU. (In the language of numerical analysis, Lemma 1 gives the local error used to find the global error in Lemma 2.)

Lemma 2.

Let $\Phi$ be the flow generated by replicator dynamics and $\mathbf{x}^t$ be the mixed strategy found on the $t^{\text{th}}$ iterate of MWU. The error accrued after $t+1$ iterations of MWU with step-size $\eta > 0$ is

$$\|\mathbf{x}^{t+1} - \Phi(t\eta, \mathbf{x}^0)\| \leq \mathcal{O}\left(e^t\right).$$

A full derivation of Lemma 2 is found in Appendix E, and follows from using Lemma 1 alongside standard techniques for bounding error in iterated numerical methods.

5.2.  Simulating Bounded Turing Machines with Multiplicative Weights Update

The result in Lemma 2 shows that, relative to replicator dynamics, the error accrued by MWU grows with the number of iterations. Error growing as a function of time is problematic when simulating a Turing machine by using MWU as a discretization of replicator dynamics.

Recall that in Theorem 1 we showed that replicator dynamics can simulate a universal Turing machine by embedding a dynamical system that simulates one, ensuring that the Turing machine's halting remains equivalent to the dynamics' trajectories reaching a certain set. However, in general, determining whether such a Turing machine halts, or how many steps it requires to halt, is undecidable. Therefore, without an a priori bound on the maximum amount of time needed to determine whether the machine halts, we cannot choose step sizes for MWU that guarantee the discretization remains sufficiently close to replicator dynamics when intersecting the relevant set.

Theorem 2.

Let $k > 0$ be a finite integer and $\mathbb{T}_k$ be the set of Turing machines that can be determined to halt or not after $k$ steps of computation. For any $k$, there exists a step size $\eta > 0$ such that MWU with step-size $\eta$ can simulate any Turing machine in $\mathbb{T}_k$.

The result follows from the construction of the open sets used in Proposition 1 and the fact that we can ensure MWU’s discretization error stays sufficiently small over any finite window of time due to Lemma 2. Resolving the limitations of Theorem 2, and uncovering the true computational power of discrete algorithms such as MWU, will likely require new technical approaches for bounding errors or simulating Turing machines.

6.  Conclusion

We have shown that replicator dynamics in matrix games can simulate universal Turing machines. In continuous time, this observation was extended to provide deeper insight into the complexities of game theoretic learning. In fact, as highlighted in §4, many of the negative results on game dynamics can be understood as a natural byproduct of Theorem 1. Given that the present paper uses replicator dynamics specifically and matrix games broadly, complementing the results given here with analyses of other learning dynamics and classes of games could help guide future research toward settings where designing well-behaved game dynamics is tractable. As was done for Turing machines in computational complexity theory, and as is natural given the techniques used in our analyses, compartmentalizing the complexity of learning in games via traditional complexity classes suggests a promising line of investigation for finding tractable settings for learning in games.

Refer to caption
Figure 1: A comparison of replicator dynamics and MWU on a matrix game derived by Andrade et al. (2021) to simulate a chaotic dynamical system. On the left is replicator dynamics with the chaotic dynamics embedded into its behaviour, whereas on the right we have 10000 iterations of MWU with a relatively large step size. Although not identical, it is clear that MWU retains the intricate and complex behaviours of replicator dynamics.

In discrete time, the Turing completeness of replicator dynamics was used to show that MWU can simulate bounded Turing machines. However, our approach does not rule out the possibility that MWU is Turing complete as well; using MWU's relationship to replicator dynamics simply carries inherent numerical limitations, since the discretization error grows with time. Since discrete-time learning is more common in practice, it remains an important open question whether MWU and other discrete learning algorithms are Turing complete. That being said, the smoothness constraints on continuous-time learning often lead to better behaved dynamics than their discrete analogues, and thus continuous dynamics generally serve as a restricted special case of what is possible in discrete time. As evidence of this claim, complex dynamic phenomena are prevalent in low-dimensional discrete systems where they are impossible in continuous systems (e.g. chaos (Chotibut et al., 2020)), and Figure 1 demonstrates the robustness of MWU by showing it can follow replicator dynamics on a matrix game derived by Andrade et al. (2021) to emulate the iconic Lorenz strange attractor. In future work, instead of using continuous learning dynamics as a proxy, directly simulating Turing machines with discrete dynamics may provide powerful tools for learning in games. Research on Turing machine simulations using physical systems has a rich history and encompasses far more than what is discussed in this paper; various techniques have been used to directly simulate Turing machines using discrete dynamics (Moore, 1990; Siegelmann and Sontag, 1992), and this prior work may hold valuable insights for learning in games.

Acknowledgments

We thank Joshua Grochow for their insights, discussions, and references. This research project is supported in part by the National Research Foundation, Singapore under NRF 2018 Fellowship NRF-NRFF2018-07, AI Singapore Program (AISG Award No: AISG2-RP-2020-016), NRF2019-NRF-ANR095 ALIAS grant, AME Programmatic Fund (Grant No. A20H6b0151) from the Agency for Science, Technology and Research (A*STAR), grant PIE-SGP-AI-2018-01 and Provost’s Chair Professorship grant RGEPPV2101. This material is based upon work supported by the National Science Foundation under Grant No. IIS-2045347.

References

  • Andrade et al. [2021] Gabriel P. Andrade, Rafael Frongillo, and Georgios Piliouras. Learning in matrix games can be arbitrarily complex. In Mikhail Belkin and Samory Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 159–185. PMLR, 15–19 Aug 2021. URL https://proceedings.mlr.press/v134/andrade21a.html.
  • Arora et al. [2012] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.
  • Benaïm et al. [2012] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Perturbations of set-valued dynamical systems, with applications to game theory. Dynamic Games and Applications, 2:195–205, 2012.
  • Brenig and Goriely [1989] L. Brenig and A. Goriely. Universal canonical forms for time-continuous dynamical systems. Phys. Rev. A, 40:4119–4122, Oct 1989. doi: 10.1103/PhysRevA.40.4119. URL https://link.aps.org/doi/10.1103/PhysRevA.40.4119.
  • Cardona et al. [2021a] Robert Cardona, Eva Miranda, and Daniel Peralta-Salas. Turing universality of the incompressible Euler equations and a conjecture of Moore. CoRR, abs/2104.04356, 2021a. URL https://arxiv.org/abs/2104.04356.
  • Cardona et al. [2021b] Robert Cardona, Eva Miranda, Daniel Peralta-Salas, and Francisco Presas. Constructing Turing complete Euler flows in dimension 3. Proceedings of the National Academy of Sciences, 118(19), 2021b. ISSN 0027-8424. doi: 10.1073/pnas.2026818118. URL https://www.pnas.org/content/118/19/e2026818118.
  • Chotibut et al. [2020] Thiparat Chotibut, Fryderyk Falniowski, Michal Misiurewicz, and Georgios Piliouras. Family of chaotic maps from game theory. Dynamical Systems, 2020. doi: 10.1080/14689367.2020.1795624. https://doi.org/10.1080/14689367.2020.1795624.
  • de Lizaur [2021] Francisco Torres de Lizaur. Chaos in the incompressible Euler equation on manifolds of high dimension. arXiv preprint arXiv:2104.00647, 2021.
  • Flokas et al. [2020] Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Thanasis Lianeas, Panayotis Mertikopoulos, and Georgios Piliouras. No-regret learning and mixed Nash equilibria: They do not mix. In NeurIPS, 2020.
  • Freedman [1998] Michael H. Freedman. P/NP, and the quantum field computer. Proceedings of the National Academy of Sciences, 95(1):98–101, 1998. ISSN 0027-8424. doi: 10.1073/pnas.95.1.98. URL https://www.pnas.org/content/95/1/98.
  • Giannou et al. [2021] Angeliki Giannou, Emmanouil Vasileios Vlatakis-Gkaragkounis, and Panayotis Mertikopoulos. Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information. In Mikhail Belkin and Samory Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 2147–2148. PMLR, 15–19 Aug 2021. URL https://proceedings.mlr.press/v134/giannou21a.html.
  • Graça et al. [2008] Daniel S. Graça, Manuel L. Campagnolo, and Jorge Buescu. Computability with polynomial differential equations. Advances in Applied Mathematics, 40(3):330–349, 2008. ISSN 0196-8858. doi: https://doi.org/10.1016/j.aam.2007.02.003. URL https://www.sciencedirect.com/science/article/pii/S019688580700067X.
  • Hainry [2009] Emmanuel Hainry. Decidability and undecidability in dynamical systems. 2009.
  • Hofbauer and Sigmund [1998] Josef Hofbauer and Karl Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, 1998. doi: 10.1017/CBO9781139173179.
  • Hofbauer et al. [2009] Josef Hofbauer, Sylvain Sorin, and Yannick Viossat. Time average replicator and best-reply dynamics. Mathematics of Operations Research, 34(2):263–269, 2009. ISSN 0364765X, 15265471. URL http://www.jstor.org/stable/40538381.
  • Letcher [2021] Alistair Letcher. On the impossibility of global convergence in multi-loss optimization, 2021.
  • Meiss [2007] James D. Meiss. Differential Dynamical Systems (Monographs on Mathematical Modeling and Computation). Society for Industrial and Applied Mathematics, USA, 2007. ISBN 0898716357.
  • Mertikopoulos et al. [2018] Panayotis Mertikopoulos, Christos Papadimitriou, and Georgios Piliouras. Cycles in adversarial regularized learning. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’18, page 2703–2717, USA, 2018. Society for Industrial and Applied Mathematics. ISBN 9781611975031.
  • Moore [1990] Cristopher Moore. Unpredictability and undecidability in dynamical systems. Phys. Rev. Lett., 64:2354–2357, May 1990. doi: 10.1103/PhysRevLett.64.2354. URL https://link.aps.org/doi/10.1103/PhysRevLett.64.2354.
  • Perko [1991] Lawrence Perko. Differential Equations and Dynamical Systems. Springer-Verlag, Berlin, Heidelberg, 1991. ISBN 0387974431.
  • Reif et al. [1994] John H. Reif, J. Doug Tygar, and A. Yoshida. Computability and complexity of ray tracing. Discrete & Computational Geometry, 11:265–288, 1994.
  • Sanders et al. [2018] James B. T. Sanders, J. Doyne Farmer, and Tobias Galla. The prevalence of chaotic dynamics in games with many players. Scientific reports, 8(1):1–13, 2018.
  • Sandholm [2010] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, 2010.
  • Schuster and Sigmund [1983] Peter Schuster and Karl Sigmund. Replicator dynamics. Journal of Theoretical Biology, 100(3):533 – 538, 1983. ISSN 0022-5193. doi: http://dx.doi.org/10.1016/0022-5193(83)90445-9. URL http://www.sciencedirect.com/science/article/pii/0022519383904459.
  • Shalev-Shwartz [2012] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2012. ISSN 1935-8237. doi: 10.1561/2200000018. URL http://dx.doi.org/10.1561/2200000018.
  • Siegelmann and Sontag [1992] Hava T. Siegelmann and Eduardo D. Sontag. On the computational power of neural nets. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92, page 440–449, New York, NY, USA, 1992. Association for Computing Machinery. ISBN 089791497X. doi: 10.1145/130385.130432. URL https://doi.org/10.1145/130385.130432.
  • Sipser [1996] Michael Sipser. Introduction to the Theory of Computation. International Thomson Publishing, 1st edition, 1996. ISBN 053494728X.
  • Tao [2017] Terence Tao. On the universality of potential well dynamics. Dynamics of Partial Differential Equations, 14(3):219–238, 2017.
  • Taylor and Jonker [1978] Peter D. Taylor and Leo B. Jonker. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40(1):145 – 156, 1978. ISSN 0025-5564. doi: https://doi.org/10.1016/0025-5564(78)90077-9. URL http://www.sciencedirect.com/science/article/pii/0025556478900779.
  • Weibull [1995] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press; Cambridge, MA: Cambridge University Press., 1995.

Appendix A Turing Complete Polynomial Flows on $\mathbb{S}^{17}$

We will briefly sketch the construction by Cardona et al. [2021a] of the Turing complete polynomial vector field in Proposition 1; for a complete treatment we refer the reader to Cardona et al. [2021a]. To simplify notation, throughout this section we will represent a step in a Turing machine $T$'s evolution (i.e. an iteration of Steps 1–3 in §2.4) by the global transition function

$$G_T : Q \times A \to Q \times A,$$

where we set $G_T(q_{\text{halt}}, w) \coloneqq (q_{\text{halt}}, w)$ for any tape $w$.

Let $T = (Q, \Sigma, \delta, q_0, q_{\text{halt}})$ be a Turing machine. We begin by encoding each configuration of $T$ as a constructible point in $\mathbb{R}^3$. Let $r = |Q|$ be the cardinality of the set of states $Q$; we represent the elements of $Q$ by $[r] = \{1, \dots, r\} \subset \mathbb{N}$. Since we know tapes satisfy eq. (5), we can encode any such $w = (w_i)_{i \in \mathbb{Z}}$ as the pair of points in $\mathbb{N}^2$ given by

$$y_1 = w_0 + w_1 10 + \dots + w_{k_0} 10^{k_0}$$
$$y_2 = w_{-1} + w_{-2} 10 + \dots + w_{-k_0} 10^{k_0 - 1}.$$

Taken together, we have an encoding of every $(q, w) \in Q \times A$ as $(y_1, y_2, q) \in \mathbb{N}^3 \subset \mathbb{R}^3$. Define $\zeta : Q \times A \to \mathbb{N}^3$ as the map assigning each configuration in $Q \times A$ its associated point in $\mathbb{N}^3$. The global transition function $G_T$ can now be reinterpreted as a map from $\zeta(Q \times A) \subset \mathbb{N}^3$ to itself. By extending said map to be the identity on points in $\mathbb{N}^3 \setminus \zeta(Q \times A)$, we arrive at a map from the whole of $\mathbb{N}^3$ to itself; for simplicity, we denote this extended map by $G_{\zeta(T)} : \mathbb{N}^3 \to \mathbb{N}^3$.
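A minimal sketch of this tape encoding (assuming, for illustration, tapes given as a dict from position to digit symbol, zero outside $[-k_0, k_0]$):

```python
def encode_tape(w, k0):
    """Encode a tape w (dict: position -> symbol, zero outside [-k0, k0])
    as the pair (y1, y2) of natural numbers, base 10."""
    y1 = sum(w.get(i, 0) * 10**i for i in range(0, k0 + 1))
    y2 = sum(w.get(-i, 0) * 10**(i - 1) for i in range(1, k0 + 1))
    return y1, y2

# Tape with w_{-1} = 3, w_0 = 1, w_1 = 2 (zero elsewhere):
# y1 = 1 + 2*10 = 21 and y2 = 3.
print(encode_tape({-1: 3, 0: 1, 1: 2}, k0=2))  # (21, 3)
```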

Using this encoding, the next step in the construction is to simulate $T$ using a polynomial vector field $P$ on $\mathbb{R}^{n+3}$. To this end, a modification of a construction by Graça et al. [2008] is given. Specifically, Graça et al. [2008] construct a non-autonomous polynomial vector field that simulates $T$, and this vector field is made autonomous via the standard trick of introducing a proxy variable in place of the explicit dependence on time. Let $P$ on $\mathbb{R}^{n+3}$ be the autonomous polynomial vector field derived via this modification. The construction by Graça et al. [2008] also shows how, given an input tape $s \in A$, a point $p_s = (\zeta(q_0, s), \tilde{y}_0) \in \mathbb{R}^{n+3}$ is constructed so that the trajectory of $P$ starting from $p_s$ simulates $G_{\zeta(T)}$. The term $\zeta(q_0, s) \in \mathbb{R}^3$ is defined above and the term $\tilde{y}_0 \in \mathbb{R}^n$ comes from a composition of polynomials depending only on $T$ and $s$; neither point is affected by the modification and both can be taken as is. The group property of flows ensures that any trajectory passing through $p_s$ is equivalent to a trajectory ending at and then "restarting" from $p_s$, so we can assume $p_s$ is an initial condition in Definition 2 without loss of generality. Suppose we have a finite string $w^* = (w^*_{-k}, \dots, w^*_k)$ of symbols in $\Sigma$; we now construct the set $U_{w^*}$ in Definition 2. (For brevity we brush over the construction of this set on the component corresponding to the proxy variable for time. Technically this component should be a union of small open intervals for each $i \in \mathbb{N}$, which intuitively associates a rough length of time in the dynamical system with each step of the Turing machine. However, formally introducing this portion of the construction is not particularly insightful, since its relevance to the proof is rather tautological: the proxy variable monotonically increases at the same constant rate as time.)

Let $\omega = \{w \in \Sigma^{\mathbb{Z}} \mid w_i = w^*_i \ \forall i \in [-k, k]\}$, let $\epsilon > 0$ be a small positive constant, and let $\mathbb{R}^3|_{\zeta(q_{\text{halt}}, \omega)}$ be the set of points in $\mathbb{R}^3$ corresponding to configurations of $T$ of the form $(q_{\text{halt}}, w \in \omega)$. Defining $U_{w^*}^{\epsilon} \subset \mathbb{R}^3$ as an $\epsilon$-neighborhood of $\mathbb{R}^3|_{\zeta(q_{\text{halt}}, \omega)}$ gives the open set

$$U_{w^*} \coloneqq U_{w^*}^{\epsilon} \times \mathbb{R}^n.$$

Showing $P$ satisfies Definition 2 with this choice of $p_s$ and $U_{w^*}$ follows from a relatively straightforward argument using properties inherited from the construction by Graça et al. [2008]. Finally, the polynomial vector field $X$ in Proposition 1 is constructed by using the pullback of inverse stereographic projection on a suitable reparametrization of $P$ and taking $T$ to be a universal Turing machine. The pullback of inverse stereographic projection ensures that $X$ is a polynomial vector field tangent to the sphere, and the reparametrization ensures the vector field is bounded. (Technically $X = \overline{X}|_{\mathbb{S}^{n+4}}$, where $\overline{X}$ is a polynomial vector field on $\mathbb{R}^{n+5}$ tangent to $\mathbb{S}^{n+4}$. Similarly, as discussed in the proof of Theorem 1.3 by Cardona et al. [2021a], the reparametrization ensures the vector field is global because it is bounded.) The fact that $X$ is well-defined on $\mathbb{S}^{17}$ and has degree 58 follows from an analysis by Hainry [2009] of the construction by Graça et al. [2008].

Appendix B Proof of Corollary 1

Corollary 4.

For some $m \leq \binom{76}{18} + 1$, there exists a matrix game $A \in \mathbb{R}^{m \times m}$ such that replicator dynamics on $A$ is Turing complete.

Proof.

Let $X$, $\overline{X}$, and $Y$ be the vector fields defined in the proof of Theorem 1. Similarly, let $A \in \mathbb{R}^{m \times m}$ be the matrix game we arrived at by applying Proposition 2 to $Y$. From Proposition 2 we know that $m - 1$ is at least the number of unique monomials in the generalized polynomials in $Y$, so the proof follows by bounding the number of unique monomials from above.

From Proposition 1 we know that $X$ is a polynomial vector field of degree 58. As mentioned in Appendix A, the specific degree of 58 was derived from follow-up work by Hainry [2009] analyzing the construction by Graça et al. [2008]. However, although the vector field is technically constructible, actually constructing $X$ to simulate a universal Turing machine is non-trivial in practice. With this complication in mind, a crude upper bound on the number of unique monomials in $X$ is simply the number of unique monomials of degree at most 58 in 18 variables. Therefore, a standard combinatorial argument tells us that the number of unique monomials in the polynomials of $X$ is at most $\binom{58+18}{18} = \binom{76}{18}$. The construction of $\overline{X}$ cannot increase the number of monomials counted by this combinatorial argument since it can only introduce unique monomials via the term $1 - \|\mathbf{x}\|_2^2$, which is already counted in the bound $\binom{76}{18}$. Similarly, we construct $Y$ by translating $\overline{X}$ by a constant, which can only introduce the constant monomial (i.e. terms with all variables having zero exponents), which is already being counted. Thus we have found that $m - 1 \leq \binom{76}{18}$, which implies $m \leq \binom{76}{18} + 1$. ∎

Appendix C Deriving MWU as discrete-analogue of Replicator Dynamics

Let $\delta : \mathbb{R}^n \to \Delta^n$ be the logit map defined as

$$\delta_i(\mathbf{y}) = \frac{\exp(\mathbf{y}_i)}{\sum_{j \in [n]} \exp(\mathbf{y}_j)}, \qquad \mathbf{y} \in \mathbb{R}^n, \ i \in [n].$$

Hofbauer et al. [2009] showed that the flow generated by replicator dynamics can be written as

$$\mathbf{x}_i(t) = \delta_i(\mathbf{y}(t)) = \frac{\exp(\mathbf{y}_i(t))}{\sum_{j \in [n]} \exp(\mathbf{y}_j(t))}, \qquad \mathbf{y}(0) \in \mathbb{R}^n, \ i \in [n], \ t \in \mathbb{R}, \tag{7}$$

where $\mathbf{x}$ and $\mathbf{y}$ are the mixed strategy and cumulative payoff vectors given in eq. (2). Rewriting eq. (7) in the form of eq. (2) gives an explicit representation of replicator dynamics' trajectories as functions of cumulative payoffs,

$$\mathbf{y}_i(t) = \mathbf{y}_i(0) + \int_0^t \sum_{j \in [n]} A_{i,j}\, \delta_j(\mathbf{y}(s))\, ds \tag{8}$$
$$\mathbf{x}_i(t) = \delta_i(\mathbf{y}(t)).$$

By applying a standard Euler discretization with step size $\eta$ to the payoffs $\mathbf{y}$ in eq. (8), we find

$$\mathbf{y}_i(t + \eta) \approx \mathbf{y}_i(t) + \eta\, \dot{\mathbf{y}}_i(t) = \mathbf{y}_i(t) + \eta \sum_{j \in [n]} A_{i,j}\, \delta_j(\mathbf{y}(t)).$$

Finally, iteratively applying this Euler discretization of the cumulative payoffs and applying the logit map gives the well-known MWU algorithm. Formally, denoting the discretization's $t^{\text{th}}$ iterate by $\mathbf{y}^t$, we write MWU as

$$\mathbf{y}_i^{t+1} = \mathbf{y}_i^t + \eta \sum_{j \in [n]} A_{i,j}\, \delta_j(\mathbf{y}^t) = \mathbf{y}_i^0 + \eta \sum_{\tau=0}^{t} \sum_{j \in [n]} A_{i,j}\, \delta_j(\mathbf{y}^\tau) \tag{9}$$
$$\mathbf{x}_i^{t+1} = \delta_i(\mathbf{y}^{t+1}) = \delta_i\Big(\mathbf{y}^0 + \eta \sum_{\tau=0}^{t} \sum_{j \in [n]} A_{i,j}\, \delta_j(\mathbf{y}^\tau)\Big).$$

As the form of MWU in eq. (9) was found via an Euler discretization, a standard result in numerical analysis tells us that the error accrued by a single iteration of MWU starting from the same initial conditions as replicator dynamics satisfies

$$\|\mathbf{y}_i^1 - \mathbf{y}_i(\eta)\| \leq \mathcal{O}(\eta^2).$$

However, since we are simulating Turing machines in the space of mixed strategies, we need error bounds on the probability simplex itself and not in the space of cumulative payoffs.
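The derivation above can be implemented directly; a minimal sketch of eq. (9), with the payoff matrix and step size chosen purely for illustration:

```python
import numpy as np

def logit(y):
    """The logit map delta: R^n -> Delta^n (softmax)."""
    z = np.exp(y - y.max())       # shift for numerical stability
    return z / z.sum()

def mwu(A, y0, eta, T):
    """Iterate eq. (9): cumulative payoffs y <- y + eta * A x with x = delta(y).

    Returns the sequence of mixed-strategy iterates x^0, ..., x^T."""
    y = np.array(y0, dtype=float)
    xs = [logit(y)]
    for _ in range(T):
        y = y + eta * A @ logit(y)
        xs.append(logit(y))
    return np.array(xs)

# Illustration on Matching Pennies (row player's payoff matrix).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
xs = mwu(A, y0=[0.3, 0.0], eta=0.1, T=50)
print(np.allclose(xs.sum(axis=1), 1.0))  # True: every iterate stays on the simplex
```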

Appendix D Proof of Lemma 1

Lemma 3.

Let $\Phi$ be the flow generated by replicator dynamics and $\mathbf{x}^t$ be the mixed strategy found on the $t^{\text{th}}$ iterate of MWU. The error accrued by a single iteration of MWU with step-size $\eta > 0$ is

$$\|\mathbf{x}^{t+1} - \Phi(\eta, \mathbf{x}^t)\|_\infty \leq 1 - e^{-2\eta}.$$
Proof.

Suppose without loss of generality that for any action $i$ the expected payoff is bounded in $[-1, 1]$, i.e. $\sum_{j \in [n]} A_{i,j} \delta_j(\mathbf{y}) \in [-1, 1]$. (This assumption does not affect the learning dynamics, since we can always normalize the payoff matrix by its largest element.) Let $W(t) = \sum_{j \in [n]} \exp(\mathbf{y}_j(t))$ and $W_i(t) = \exp(\mathbf{y}_i(t)) = \mathbf{x}_i(t) W(t)$. Then continuous-time replicator dynamics (RD) becomes

$$\mathbf{x}_i(t) = \frac{W_i(t)}{W(t)}.$$

Similarly, define $\hat{W}^t = \sum_{j \in [n]} \exp(\mathbf{y}_j^{t-1})$ and $\hat{W}_i^t = \exp(\mathbf{y}_i^{t-1}) = \mathbf{x}_i^t \hat{W}^t$. Then MWU becomes

$$\mathbf{x}_i^t = \frac{\hat{W}_i^t}{\hat{W}^t}.$$

We are interested in bounding the local error of MWU as a discretization of RD, i.e. the error introduced by a single step of MWU relative to RD when both start from the same point. Thus, without loss of generality, we focus on the first iterate of MWU and on RD after $t = \eta$ amount of time. Since expected payoffs are bounded in $[-1, 1]$, we deduce from the analysis in Appendix C that

$$\hat{W}_i^1 e^{-\eta} \leq W_i(\eta) \leq \hat{W}_i^1 e^{\eta},$$

which implies

$$\hat{W}^1 e^{-\eta} \leq W(\eta) \leq \hat{W}^1 e^{\eta}.$$

Hence

$$\mathbf{x}_i^1 e^{-2\eta} \leq \mathbf{x}_i(\eta) \leq \mathbf{x}_i^1 e^{2\eta}$$

whenever RD and MWU start from the same initial condition.

We have thus found that the local error introduced by a single time step is

$$\|\mathbf{x}^1 - \mathbf{x}(\eta)\| \leq \|\mathbf{x}^1 - \mathbf{x}^1 e^{-2\eta}\| \leq |1 - e^{-2\eta}|.$$

Observing that $\eta>0$ gives the result. ∎
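To make the single-step comparison concrete, here is a minimal numerical sketch (Python; the rock-paper-scissors payoff matrix, initial condition, and step-size are illustrative assumptions, not choices from the paper) that checks the sandwich bound $\textbf{x}_{i}^{1}\exp(-2\eta)\leq\textbf{x}_{i}(\eta)\leq\textbf{x}_{i}^{1}\exp(2\eta)$ by comparing one MWU step against a finely discretized Euler approximation of the replicator flow.

```python
import math

# Rock-paper-scissors payoffs; every expected payoff lies in [-1, 1].
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def payoff(x):
    # Expected payoff (Ax)_i of each pure action against the mixed strategy x.
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def mwu_step(x, eta):
    # One MWU iterate: scale each weight by exp(eta * payoff), renormalize.
    w = [xi * math.exp(eta * ui) for xi, ui in zip(x, payoff(x))]
    s = sum(w)
    return [wi / s for wi in w]

def rd_flow(x, T, steps=20000):
    # Approximate the replicator flow x_i' = x_i((Ax)_i - <x, Ax>)
    # for time T via forward Euler with a very small internal step.
    dt = T / steps
    x = list(x)
    for _ in range(steps):
        u = payoff(x)
        avg = sum(xi * ui for xi, ui in zip(x, u))
        x = [xi + dt * xi * (ui - avg) for xi, ui in zip(x, u)]
    return x

eta = 0.1
x0 = [0.5, 0.3, 0.2]
x1 = mwu_step(x0, eta)       # first MWU iterate
x_eta = rd_flow(x0, eta)     # RD after t = eta units of time

# The sandwich bound holds coordinate-wise.
ok = all(a * math.exp(-2 * eta) <= b <= a * math.exp(2 * eta)
         for a, b in zip(x1, x_eta))
```

For $\eta=0.1$ the two points agree far more tightly than the $\exp(\pm 2\eta)$ window requires, consistent with the bound being a worst-case estimate.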

Appendix E Proof of Lemma 2

Lemma 4.

Let $\Phi$ be the flow generated by replicator dynamics and $\textbf{x}^{t}$ be the mixed strategy at the $t^{\text{th}}$ iterate of MWU. The error accrued after $t+1$ iterations of MWU with step-size $\eta>0$ is

\|\textbf{x}^{t+1}-\Phi((t+1)\eta,\textbf{x}^{0})\|\leq\mathcal{O}\left(e^{t\eta L}\right)\,,
where $L$ is the Lipschitz constant of the replicator vector field.
Proof.

The flow $\Phi$ is $C^{1}$ and $\Delta^{n}$ is compact, so the replicator vector field is Lipschitz continuous. Let $L$ denote its Lipschitz constant with respect to $\|\cdot\|_{\infty}$; by Grönwall's inequality, $\|\Phi(\eta,\textbf{x})-\Phi(\eta,\textbf{y})\|\leq e^{\eta L}\|\textbf{x}-\textbf{y}\|$. It follows that for every initial condition $\textbf{x}^{0}\in\Delta^{n}$,

\begin{align*}
E^{t+1}=\|\textbf{x}^{t+1}-\Phi((t+1)\eta,\textbf{x}^{0})\| &=\|\textbf{x}^{t+1}-\Phi(\eta,\Phi(t\eta,\textbf{x}^{0}))\|\\
&=\|\textbf{x}^{t+1}-\Phi(\eta,\textbf{x}^{t})+\Phi(\eta,\textbf{x}^{t})-\Phi(\eta,\Phi(t\eta,\textbf{x}^{0}))\|\\
&\leq\|\textbf{x}^{t+1}-\Phi(\eta,\textbf{x}^{t})\|+\|\Phi(\eta,\textbf{x}^{t})-\Phi(\eta,\Phi(t\eta,\textbf{x}^{0}))\|\\
&\leq|1-\exp(-2\eta)|+e^{\eta L}\|\textbf{x}^{t}-\Phi(t\eta,\textbf{x}^{0})\|\\
&=|1-\exp(-2\eta)|+e^{\eta L}E^{t}\,.
\end{align*}

To conclude our proof, we require a special case of the discrete Gronwall lemma. This powerful tool for numerical error analysis tells us that if, for some constants $a$ and $b$ with $a>0$, a positive sequence $\{z^{\tau}\}_{\tau=0}^{t}$ satisfies

z^{\tau+1}\leq b+az^{\tau}\,,\qquad\forall\tau\in[t-1]\,,

then for $a\neq 1$

z^{\tau}\leq a^{\tau}z^{0}+b\,\frac{a^{\tau}-1}{a-1}\,,\qquad\forall\tau\in[t]\,,

and for $a=1$

z^{\tau}\leq z^{0}+\tau b\,,\qquad\forall\tau\in[t]\,.
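As a quick sanity check of the $a\neq 1$ case, the following sketch (Python; the constants are arbitrary illustrative choices) compares the closed-form bound against the recursion taken with equality, its worst case.

```python
def gronwall_bound(z0, a, b, t):
    # Closed-form bound from the a != 1 case of the discrete Gronwall lemma:
    #   z^tau <= a^tau * z0 + b * (a^tau - 1) / (a - 1).
    return [a ** tau * z0 + b * (a ** tau - 1) / (a - 1)
            for tau in range(t + 1)]

# Worst case: the recursion holds with equality, z^{tau+1} = b + a * z^tau.
a, b, z0, t = 1.05, 0.02, 0.0, 50
bound = gronwall_bound(z0, a, b, t)
exact = [z0]
for _ in range(t):
    exact.append(b + a * exact[-1])
```

Since the recursion is tight here, the iterates match the closed form up to floating-point error, confirming the bound cannot be improved in general.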

Recall that both $\eta>0$ and $L>0$ by definition, so $a=e^{\eta L}>1$ and the $a\neq 1$ case applies. Let $z^{t}=E^{t}$, $a=e^{\eta L}$, and $b=|1-\exp(-2\eta)|$. Applying the discrete Gronwall lemma yields

E^{t+1}\leq e^{(t+1)\eta L}E^{0}+|1-\exp(-2\eta)|\frac{e^{(t+1)\eta L}-1}{e^{\eta L}-1}\,.

Clearly $E^{0}=0$ since MWU and replicator dynamics have the same initial conditions. Thus we have shown

E^{t+1}\leq|1-\exp(-2\eta)|\frac{e^{(t+1)\eta L}-1}{e^{\eta L}-1}\,,

which concludes the proof. ∎
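As a rough end-to-end illustration of this bound, the sketch below (Python; the game, initial condition, step-size, and the conservatively chosen Lipschitz constant `L_CONST` are all illustrative assumptions, not constants derived in the paper) iterates MWU alongside a finely discretized replicator flow and checks that the accrued sup-norm error stays below the right-hand side above.

```python
import math

A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # payoffs bounded in [-1, 1]

def payoff(x):
    # Expected payoff (Ax)_i of each pure action against x.
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def mwu_step(x, eta):
    # One MWU iterate with step-size eta.
    w = [xi * math.exp(eta * ui) for xi, ui in zip(x, payoff(x))]
    s = sum(w)
    return [wi / s for wi in w]

def rd_advance(x, T, steps=10000):
    # Advance the replicator flow by time T with small forward-Euler steps.
    dt = T / steps
    x = list(x)
    for _ in range(steps):
        u = payoff(x)
        avg = sum(xi * ui for xi, ui in zip(x, u))
        x = [xi + dt * xi * (ui - avg) for xi, ui in zip(x, u)]
    return x

def error_bound(t, eta, L):
    # Right-hand side of the lemma:
    #   |1 - e^{-2 eta}| * (e^{(t+1) eta L} - 1) / (e^{eta L} - 1).
    return (abs(1 - math.exp(-2 * eta))
            * (math.exp((t + 1) * eta * L) - 1) / (math.exp(eta * L) - 1))

eta = 0.1
L_CONST = 3.0  # assumed (conservative) Lipschitz constant, illustrative only
x_mwu = [0.5, 0.3, 0.2]
x_rd = list(x_mwu)
errors_ok = True
for t in range(20):
    x_mwu = mwu_step(x_mwu, eta)       # x^{t+1}
    x_rd = rd_advance(x_rd, eta)       # Phi((t+1) eta, x^0), approximately
    err = max(abs(a - b) for a, b in zip(x_mwu, x_rd))
    errors_ok = errors_ok and (err <= error_bound(t, eta, L_CONST))
```

In practice the observed error sits far below the bound, which grows exponentially in $t$ exactly as the lemma predicts.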