Identifying Bridges and Catalysts for Persistent Cooperation Using Network-Based Approach

1st Xingru Chen
School of Science
Beijing University of Posts and Telecommunications
Beijing 100876, China
[email protected]

2nd Feng Fu
Department of Mathematics
Department of Biomedical Data Science
Dartmouth College
Hanover, NH 03755, USA
[email protected]

Abstract

The framework of iterated Prisoner’s Dilemma (IPD) is commonly used to study direct reciprocity and cooperation, with a focus on the assessment of the generosity and reciprocal fairness of an IPD strategy in one-on-one settings. In order to understand the persistence and resilience of reciprocal cooperation, here we study long-term population dynamics of IPD strategies using the Moran process where stochastic dynamics of strategy competition can lead to the rise and fall of cooperation. Although prior work has included a handful of typical IPD strategies in the consideration, it remains largely unclear which type of IPD strategies is pivotal in steering the population away from defection and providing an escape hatch for establishing cooperation. We use a network-based approach to analyze and characterize networks of evolutionary pathways that bridge transient episodes of evolution dominated by depressing defection and ultimately catalyze the evolution of reciprocal cooperation in the long run. We group IPD strategies into three types according to their stationary cooperativity with an unconditional cooperator: the good (fully cooperative), the bad (fully exploitive), and the ugly (in between the former two types). We consider the mutation-selection equilibrium with rare mutations and quantify the impact of the presence versus absence of any given IPD strategy on the resulting population equilibrium. We identify catalysts (certain IPD strategies) as well as bridges (particular evolutionary pathways) that are most crucial for boosting the abundance of good types and suppressing that of bad types or having the highest betweenness centrality. Our work has practical implications and broad applicability to real-world cooperation problems, such as conceiving protocols of steering control and strategy intervention by leveraging catalysts and bridges that are capable of strengthening persistence and resilience.

Index Terms:

evolutionary game dynamics, resilience, pairwise invasion, network theory, pivotal node

I Introduction

Understanding how cooperation evolves and is sustained is a prominent problem of broad interest and primary significance [1]. Among other mechanisms, direct reciprocity has been extensively studied using the Iterated Prisoner’s Dilemma (IPD) games, where individual behavior is categorized into different strategies [2, 3]. In particular, the so-called Zero-Determinant (ZD) strategies are a set of rather simple memory-one strategies that can unilaterally set a linear relation between their own payoff and that of their opponent [4, 5]. The finding of such powerful control over payoffs has spurred new waves of work from diverse fields including network science, computer science, and social science, aiming to shed light on the robustness and resilience of cooperation by means of the natural selection of IPD strategies [6, 7, 8, 9].

Prior work on IPD strategies focuses on their ability to foster fairness and cooperation in pairwise interactions. Because of the uncertainty in opponent types, IPD strategies can be optimized in terms of their tolerance, retaliation, and reconciliation of defective moves and also their level of self-recognition and discerning co-players. For example, Grim Trigger (also known as Grudger) retaliates against the opponent’s defection by turning to defection forever [10], and Tit for Tat (TFT) always replicates the opponent’s previous move [11]. In contrast, Tit for Two Tats (TF2T) can tolerate the co-player’s defection not more than twice in a row before taking revenge [12]. Contrite TFT can also reconcile errors and mistakes in moves with cooperation onwards [13]. Suspicious TFT uses defection as an initial trial of the opponent’s type because of the low trust of others [6]. Collective strategies further take a particular sequence of initial moves to distinguish “us vs them” and only cooperate with themselves [14]. Even though this field has been extensively studied with more than 100 common IPD strategies discovered, the framework of IPD games remains, and is increasingly becoming, an important testbed for combining ideas from artificial intelligence and game theory [15, 16, 17, 18, 19, 20].

A most striking discovery of IPD strategies is that ZD strategies are able to enforce a unilateral linear relationship between their own payoff and their co-player’s [4, 5]. By prescribing their particular move conditional on each outcome, ZD players can control the payoff results and even demand an unfair share of the payoffs accrued from interactions. Inspired by this fact, previous studies classify IPD strategies using a dichotomy by their intention of cooperation and ability to reciprocate: partner vs rival strategies [21]. These strategies themselves are powerful and form a refined subset of IPD strategies. In this work, we extend prior studies and focus on the morality of strategies based on a simple yet intuitive definition: good strategies that are fully cooperative with an unconditional cooperator (ALLC), bad strategies that are fully defective with an unconditional cooperator, and ugly ones that fall in between. The present classification can lead to a mesoscopic description of cyclic population dynamics in a manner similar to Rock-Paper-Scissors games [22].

Although prior work has included a handful of strategies in consideration, it remains largely unclear which types of strategies are pivotal in steering the population away from defection and providing an escape hatch for establishing cooperation. To characterize such transitions between strategies, we consider a network of strategies where evolutionary pathways between them can be evaluated as a directed graph, depending on their ability to disfavor defection and foster the evolution of reciprocal cooperation over the long term.

In our model, individuals play against each other with prescribed IPD strategies and obtain their respective average payoffs from the interactions. Their payoffs, subsequently, will determine their reproductive fitness. To study evolutionary competition, we use a Moran process in a population of finite size and with a significantly low mutation rate [23]. The long-term equilibrium of the corresponding evolutionary dynamics can be analytically derived using the approximation method of an embedded Markov chain [24, 25]. Moreover, we can define a directed network where (i) the nodes are the strategies, (ii) the direction of an edge indicates the fixation of the “target strategy” in a homogeneous population of the “source strategy”, and (iii) the weight of an edge is the ratio of the two fixation probabilities between the pair of strategies. The creation and manipulation of this network allow us to incorporate standard graph measures and algorithms to analyze the functionality of any strategy in the population.

The network-based method helps describe and compare competing IPD strategies. Above all, it can be used to identify essential IPD strategies that act as catalysts, such as Win-Shift, Lose-Stay [26] (the reverse of Win-Stay, Lose-Shift (WSLS) [27]), and transitions between strategies that act as bridges for recovering cooperation from defection, such as the one from ALLD to Win-Shift, Lose-Stay. Their presence plays an important role in the robustness and persistence of cooperation. Our findings share strong similarities with previous studies on the ecological stability and resilience of food webs [28, 29, 30].

II Methods and Model

II-A Payoff structure

The Prisoner’s Dilemma (PD) game is a symmetric game involving two players X and Y, and two actions: to cooperate or to defect. In a one-shot PD game, the four possible outcomes correspond to different payoffs from the focal player’s perspective: if both are cooperators, one gets the reward $R$; if a cooperator is against a defector, the sucker’s payoff $S$; if a defector is against a cooperator, the temptation $T$; and if both are defectors, the punishment $P$. The game is considered a paradigm for understanding the conflict between self-interest and collective interest as the payoff structure satisfies $T>R>P>S$.

The iterated Prisoner’s Dilemma (IPD) games further assume repeated encounters between the same two individuals and shed light on the idea of direct reciprocity [2]. For any pair of IPD strategies, denoted by A and B without loss of generality, we can compute the average payoff matrix for their game interactions, where each entry is the row strategy’s payoff against the column strategy:

\begin{array}{c|cc} & \text{A} & \text{B}\\ \hline \text{A} & a & b\\ \text{B} & c & d \end{array}

(1)

The pairwise competition dynamics between A and B typically fall into four types [31]:

(a) dominance, if $a>c$ and $b>d$, or $a<c$ and $b<d$;
(b) bistability, if $a>c$ and $b<d$;
(c) coexistence, if $a<c$ and $b>d$;
(d) neutrality, if $a=c$ and $b=d$.
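The four cases above can be checked mechanically from the payoff matrix in (1). The following is a minimal sketch, not the authors' code; the function name `classify_pairwise`, the tolerance `tol`, and the extra label `"other"` (for pairs with exactly one equality, which none of the four cases covers) are assumptions made for illustration.

```python
def classify_pairwise(a, b, c, d, tol=1e-12):
    """Classify the A-vs-B competition from the payoff matrix [[a, b], [c, d]]."""
    da = a - c  # advantage of A over B when the co-player is A
    db = b - d  # advantage of A over B when the co-player is B
    sa = 0 if abs(da) <= tol else (1 if da > 0 else -1)
    sb = 0 if abs(db) <= tol else (1 if db > 0 else -1)
    if sa == 0 and sb == 0:
        return "neutrality"
    if sa * sb > 0:
        return "dominance"      # one strategy is better against both co-players
    if sa > 0 and sb < 0:
        return "bistability"    # each strategy is the best reply to itself
    if sa < 0 and sb > 0:
        return "coexistence"    # each strategy is the best reply to the other
    return "other"              # exactly one equality, e.g. a == c but b != d


# ALLC (A) vs ALLD (B) with the conventional values (R, S, T, P) = (3, 0, 5, 1)
print(classify_pairwise(3, 0, 5, 1))  # dominance
```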

II-B Classification of IPD strategies

Although hundreds of strategies for the IPD games have been published in the literature, this set is almost negligible compared with the inexhaustible strategy space. To further evaluate these strategies, it is natural to ask: how can they be divided into groups?

The strategies can be labeled by their memory lengths, which indicate the amount of information about past rounds they retain. There are memory-zero strategies whose current actions do not depend on the history of the match. Next come memory-one strategies, which remember only the outcome of the previous round, and so on. A strategy can even have infinite memory, for instance, a Looker-up strategy which always remembers the result of the very first round [18].

Although the classification by memory length is plain and simple, it cannot reflect the competitiveness and dominance of strategies in general. Later on, another taxonomy was put forward, further exploring the evolutionary relevance of strategies [21].

Definition 1 (Nice Strategies).

A strategy is nice if it is never the first to defect $\Leftrightarrow$ if it always cooperates with a cooperator $\Leftrightarrow$ if it always cooperates with itself.

Definition 2 (Cautious Strategies).

A strategy is cautious if it is never the first to cooperate $\Leftrightarrow$ if it always defects against a defector $\Leftrightarrow$ if it always defects against itself.

The equivalence relations are straightforward to show and we omit the proof here.

These two groups of strategies have no intersections. A nice strategy aims for the payoff $R$ while a cautious one refuses to be extorted by others. It is worth noticing that both nice and cautious strategies take up only a measure-zero and hence a negligible portion of the strategy space. There exist other strategies that do not belong to the two groups, for example, the extortionate ZD strategies.

Remark.

A finer classification based on Definitions 1 and 2 gives partners and rivals as subsets of nice and cautious strategies, respectively. If a partner’s payoff is less than $R$, then the opponent’s payoff should also be less than $R$:

\pi_{X}<R\Rightarrow\pi_{Y}<R.

(2)

On the other hand, a rival’s payoff is always greater than or equal to the opponent’s payoff:

\pi_{X}\geq\pi_{Y}.

(3)

The “opposite” of partners and rivals are referred to as requiting strategies and submissive strategies.

This classification touches on the performance of IPD strategies in the process of evolution. We now introduce our own taxonomy where strategies are divided into three different classes based on the expected payoff of a cooperator as their opponent in the IPD games (see Figure 1a). The method is simple and works as well as, if not better than, the existing one for identifying the competitiveness of strategies.

Definition 3 (Good Strategies).

A strategy is good if a cooperator gets an expected payoff $\pi_{Y}=R$ as its opponent.

Definition 4 (Bad Strategies).

A strategy is bad if a cooperator gets an expected payoff $\pi_{Y}=S$ as its opponent.

Definition 5 (Ugly Strategies).

A strategy is ugly if a cooperator gets an expected payoff $S<\pi_{Y}<R$ as its opponent.

Intuitively, good strategies tend to be friendly to an innocent opponent while bad strategies are determined to exploit the same opponent. It is straightforward to compare the groups of strategies arising from the nice-cautious and good-bad-ugly classifications. For instance, a nice strategy, and hence any partner, is always a good strategy. According to [21], successful strategies from the perspective of evolution are either partners or rivals. We can draw a similar conclusion given Definitions 3, 4, and 5: successful individuals are either good or bad.
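For memory-one strategies, Definitions 3 to 5 reduce to a two-state calculation, because against ALLC only the outcomes CC and DC can occur. The sketch below is an illustration under stated assumptions rather than the authors' procedure: a strategy is encoded by a hypothetical triple (`p0`, `p_cc`, `p_dc`), namely its initial cooperation probability and its cooperation probabilities after CC and DC (focal move first); the payoff is taken as the long-run average; and the default values $R=3$, $S=0$ are the conventional ones, not values stated in the text.

```python
def cooperation_rate_vs_allc(p0, p_cc, p_dc):
    """Long-run cooperation frequency of a memory-one strategy facing ALLC."""
    denom = 1.0 - p_cc + p_dc
    if abs(denom) < 1e-15:
        # p_cc == 1 and p_dc == 0: the first move is repeated forever,
        # so the expected frequency equals the initial cooperation probability.
        return p0
    # Stationary solution of x = x * p_cc + (1 - x) * p_dc
    return p_dc / denom


def classify_morality(p0, p_cc, p_dc, R=3.0, S=0.0, tol=1e-9):
    """Return 'good', 'bad', or 'ugly' from ALLC's expected payoff pi_Y."""
    x = cooperation_rate_vs_allc(p0, p_cc, p_dc)
    pi_y = x * R + (1.0 - x) * S  # ALLC earns R when met with C, S when met with D
    if abs(pi_y - R) <= tol:
        return "good"
    if abs(pi_y - S) <= tol:
        return "bad"
    return "ugly"


print(classify_morality(1, 1, 1))  # TFT: cooperates after CC and DC
print(classify_morality(0, 0, 0))  # ALLD
print(classify_morality(1, 0, 1))  # assumed encoding of Win-Shift, Lose-Stay
```

Under this encoding a strategy that alternates against ALLC cooperates half of the time, so ALLC earns $(R+S)/2$ and the strategy is classified as ugly.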

Figure 1: The good, the bad, and the ugly. (a) We classify IPD strategies according to their inclination of exploitation against an unconditional cooperator (ALLC): good ones are those who never exploit, bad ones are those who always exploit, and ugly ones are those in between. Unlike prior studies based on pairwise payoff comparison of self vs opponent from the perspective of fairness, our approach offers an intuitive classification that is built on the intrinsic cooperativity of the strategies themselves. (b) and (c) There exist cyclic population dynamics that resemble the classic Rock-Paper-Scissors games in the competition among the good, the bad, and the ugly strategies. A specific cycle of length 3 is given in (b) and a cycle of length 6 is given in (c). The direction of the edges in (b) and (c) indicates a greater fixation probability than in the opposite direction.

II-C Evolutionary dynamics

We use a network-based approach to analyze and characterize networks of evolutionary pathways that bridge transient episodes of evolution dominated by depressing defection and ultimately catalyze the evolution of reciprocal cooperation in the long run. In our model, individuals play against each other using their prescribed IPD strategies (there are $m$ of them) and obtain an average payoff $\pi_{i}$ from their interactions. Their payoffs further determine their reproductive fitness, $f_{i}=\exp[\beta\pi_{i}]$, where $\beta$ is the selection strength.

To study evolutionary competition, we use a Moran process in a population of finite size $N$ : an individual is chosen to reproduce an offspring with probability proportional to their fitness. With probability $\mu$ , a mutation occurs and the offspring will randomly choose one of the available $m$ strategies. With probability $1-\mu$ , the offspring is identical to the parent. This newly produced offspring will replace another individual randomly chosen from the entire population.
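One update of the process just described can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name `moran_step`, the representation of the population as a list of counts, and the exclusion of self-interaction when averaging payoffs are assumptions (the last one is chosen so that the update is consistent with the $-a+d$ correction in the fixation-probability ratio given in the text).

```python
import math
import random


def moran_step(pop, payoff, beta, mu, rng):
    """One birth-death update; pop[k] is the number of players using strategy k.

    payoff[i][j] is the average payoff of strategy i against strategy j.
    Assumes a population of at least two individuals.
    """
    m, N = len(pop), sum(pop)
    fitness = []
    for i in range(m):
        if pop[i] == 0:
            fitness.append(0.0)  # absent strategies cannot reproduce
            continue
        # Average payoff over the N - 1 co-players (self-interaction excluded).
        pi = sum(payoff[i][j] * (pop[j] - (1 if i == j else 0)) for j in range(m)) / (N - 1)
        fitness.append(math.exp(beta * pi))
    # Birth: the parent is chosen with probability proportional to fitness.
    weights = [pop[i] * fitness[i] for i in range(m)]
    parent = rng.choices(range(m), weights=weights)[0]
    # Mutation: with probability mu the offspring picks one of the m strategies at random.
    child = rng.randrange(m) if rng.random() < mu else parent
    # Death: a uniformly random individual from the whole population is replaced.
    dead = rng.choices(range(m), weights=pop)[0]
    new_pop = list(pop)
    new_pop[dead] -= 1
    new_pop[child] += 1
    return new_pop


rng = random.Random(0)
payoff = [[3, 0], [5, 1]]  # ALLC vs ALLD with conventional payoff values
pop = [5, 5]
for _ in range(100):
    pop = moran_step(pop, payoff, beta=0.1, mu=0.01, rng=rng)
print(pop)
```

Repeating this step with a very small `mu` and recording how long the population stays in each homogeneous state gives an agent-based estimate of the strategy abundances.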

In the limit of rare mutations where $\mu\to 0$, the fate of a mutant is determined, either reaching fixation or going extinct, before the next new mutant arises. Under this assumption, the evolutionary dynamics in finite populations will dwell on a homogeneous population state most of the time, followed by stochastic transitions from one state to another. The transition between any two population states is determined by the pairwise invasion dynamics (Figure 2). The transition rate can be calculated using the fixation probability $\rho_{ij}$, which is the probability that a population of strategy $i$-players is invaded and taken over by a single strategy $j$-player (Figure 2b). Assuming the payoff matrix for $i$ vs $j$ as in (1), the ratio of fixation probabilities is given by

\frac{\rho_{ji}}{\rho_{ij}}=\exp\{\beta[\frac{N}{2}(a+b-c-d)-a+d]\}.

(4)
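Equation (4) can be checked numerically against the standard birth-death expression for fixation probabilities. The sketch below is illustrative only: the function names are hypothetical, strategy A plays the role of $i$ and B the role of $j$, and payoffs are averaged over the $N-1$ co-players (self-interaction excluded), which is an assumption consistent with the $-a+d$ term in (4).

```python
import math


def fixation_probability(a, b, c, d, N, beta):
    """Probability that a single A-mutant takes over a population of N - 1 B-players."""
    total, prod = 1.0, 1.0
    for k in range(1, N):  # k = current number of A-players
        pi_a = (a * (k - 1) + b * (N - k)) / (N - 1)
        pi_b = (c * k + d * (N - k - 1)) / (N - 1)
        prod *= math.exp(beta * (pi_b - pi_a))  # fitness ratio f_B / f_A at state k
        total += prod
    return 1.0 / total


def ratio_closed_form(a, b, c, d, N, beta):
    """Right-hand side of (4): rho_ji / rho_ij with i = A and j = B."""
    return math.exp(beta * (N / 2 * (a + b - c - d) - a + d))


a, b, c, d, N, beta = 3, 0, 5, 1, 10, 0.1
rho_ji = fixation_probability(a, b, c, d, N, beta)  # single A invades a B-population
rho_ij = fixation_probability(d, c, b, a, N, beta)  # single B invades an A-population
print(rho_ji / rho_ij, ratio_closed_form(a, b, c, d, N, beta))
```

The second call swaps the roles of the two strategies by relabeling the payoff matrix, so the same routine yields both fixation probabilities; the two printed numbers should agree up to floating-point error.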

Moreover, the long-term equilibrium of the full evolutionary dynamics for $m$ multiple strategies can be analytically studied using the approximation method of an embedded Markov chain.

Theorem 1 (Imitation Processes with Small Mutations).

The $m\times m$ Markov matrix of the $m$ homogeneous states is determined by the transition rates between pairs of strategies. In general, the $ij$ th component is

m_{ij}=\begin{dcases}\mu\rho_{ij},&i\neq j,\\ 1-\displaystyle\sum_{k\neq i}\mu\rho_{ik},&i=j.\end{dcases}

(5)

In addition, writing $M=(m_{ij})$, there exists a unique vector $\bm{v}=[v_{1},v_{2},\cdots,v_{m}]$ satisfying

\bm{v}M=\bm{v},\qquad\displaystyle\sum_{i=1}^{m}v_{i}=1,\qquad v_{i}\geq 0,\qquad\forall i,

(6)

where the $i$ th component $v_{i}$ is the abundance of strategy $i$ after the system becomes stable.

We use Theorem 1 to compute the stationary distribution of IPD strategies under the limit of a small mutation rate as shown in the Results section.
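The stationary vector of the embedded Markov chain can be obtained by solving a small linear system. The sketch below is one possible implementation, not the authors' code: it uses plain Gaussian elimination to stay dependency-free, the function name `stationary_distribution` is hypothetical, and it assumes the chain is irreducible (all fixation probabilities positive). Because the off-diagonal entries of the Markov matrix are all proportional to $\mu$, the mutation rate cancels out of the stationarity condition and does not appear in the code.

```python
def stationary_distribution(rho):
    """Solve v Q = 0 with sum(v) = 1, where Q[i][j] = rho[i][j] for i != j."""
    m = len(rho)
    # Rate matrix whose rows sum to zero (equal to (M - I) / mu).
    Q = [[rho[i][j] if i != j else 0.0 for j in range(m)] for i in range(m)]
    for i in range(m):
        Q[i][i] = -sum(Q[i])
    # System A v = b: transpose of Q, last equation replaced by the normalization.
    A = [[Q[j][i] for j in range(m)] for i in range(m)]
    A[-1] = [1.0] * m
    aug = [A[i] + [1.0 if i == m - 1 else 0.0] for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for cc in range(col, m + 1):
                aug[r][cc] -= factor * aug[col][cc]
    # Back substitution.
    v = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][cc] * v[cc] for cc in range(r + 1, m))
        v[r] = (aug[r][m] - tail) / aug[r][r]
    return v


# Two strategies: strategy 1 invades strategy 0 twice as easily as the reverse.
print(stationary_distribution([[0.0, 0.2], [0.1, 0.0]]))
```

In the two-strategy example the abundances are in the ratio $\rho_{10}:\rho_{01}$, so the strategy that is harder to invade and better at invading holds the larger share.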

We can now consider the $m$ strategies as network nodes and define a directed edge from strategy $i$ to strategy $j$ if the latter has a greater fixation probability, that is, $\rho_{ij}>\rho_{ji}$, with the weight given by $\rho_{ij}/\rho_{ji}$, and vice versa. In this way, we obtain a weighted network that consists of different IPD strategies for further identification of catalysts and bridges. We exhaustively search for ugly strategies, and transitions from or to ugly strategies, that boost the abundance of good strategies and suppress that of bad strategies. There exist cycles in this directed network, that is, closed directed loops of different IPD strategies, which can give rise to cyclic population dynamics of persistent cooperation in a Rock-Paper-Scissors manner.
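The network construction can be sketched directly from a matrix of fixation probabilities. The code below is a minimal illustration with hypothetical names (`build_network`, `three_cycles`); it treats pairs with equal fixation probabilities as bidirectional edges of weight one, and the enumeration of length-3 cycles is only one simple way to look for Rock-Paper-Scissors structure, not necessarily the search used by the authors.

```python
import math


def build_network(rho):
    """Return {(i, j): weight} with an edge i -> j when rho[i][j] > rho[j][i]."""
    m = len(rho)
    edges = {}
    for i in range(m):
        for j in range(i + 1, m):
            if math.isclose(rho[i][j], rho[j][i], rel_tol=1e-9):
                edges[(i, j)] = edges[(j, i)] = 1.0  # neutral pair: bidirectional
            elif rho[i][j] > rho[j][i]:
                edges[(i, j)] = rho[i][j] / rho[j][i]  # j is favored over i
            else:
                edges[(j, i)] = rho[j][i] / rho[i][j]  # i is favored over j
    return edges


def three_cycles(edges, m):
    """List directed cycles i -> j -> k -> i made of strictly favored transitions."""
    strict = {e for e, w in edges.items() if w > 1.0}
    cycles = []
    for i in range(m):
        for j in range(i + 1, m):
            for k in range(i + 1, m):
                # i is the smallest index, so each cycle is reported once.
                if j != k and {(i, j), (j, k), (k, i)} <= strict:
                    cycles.append((i, j, k))
    return cycles


# Toy example (made-up numbers): 0 is invaded by 1, 1 by 2, and 2 by 0.
rho = [[0.0, 0.3, 0.1], [0.1, 0.0, 0.3], [0.3, 0.1, 0.0]]
net = build_network(rho)
print(sorted(net))            # [(0, 1), (1, 2), (2, 0)]
print(three_cycles(net, 3))   # [(0, 1, 2)]
```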

III Results

Due to the symmetry in which good strategies and ALLC each get an average payoff $R$, whether a good strategy can be favored over ALLC in the pairwise competition dynamics is determined by their self-cooperation levels. As such, the presence of good strategies is pivotal to determining the fate of ALLC. More broadly, good strategies can be viewed as allies of ALLC, bad strategies can be seen as suppressors of ALLC, and ugly strategies can be considered in between the two extremes.

To characterize and quantify the long-term collective success of good strategies as compared to their counterparts of bad and ugly ones, we study the evolutionary dynamics with a pool of prescribed IPD strategies with rare mutations. In this limit, the population spends most of the time in homogeneous states and the fate of any new mutant, either fixation or extinction, is determined before the next mutant arises in the population. To gain analytical insights, we first identify the type of game interactions between each pair of IPD strategies as shown in Figure 2a. We distinguish four types: dominance ($43\%$ of all pairs), bistability ($23\%$), coexistence ($9\%$), and neutrality ($24\%$), together with a few pairs that do not fall into these four types. Remarkably, neutrality takes up a substantial proportion. The pairwise fixation probabilities are shown in Figure 2b (noting the dependence on selection strength $\beta$ and population size $N$). Under rare mutations, we are able to analytically calculate the stationary distribution for any given set of IPD strategies under consideration (see Methods and Model section). We confirm a good agreement between analytical results and agent-based simulations.

We use a directed weighted network to capture all possible evolutionary pathways between any pair of IPD strategies. Each edge can be distinguished by the type of game interactions as shown in Figure 3, and their weight is further given by the ratio of fixation probabilities. The neutrality games, especially between good IPD strategies (Figure 3d), provide an escape hatch for sustaining cooperation and also increasing resilience against perturbations such as invasion attempts by bad strategies.

Our network-based approach works for scenarios involving arbitrarily many IPD strategies. In order to get a clear-cut view with an intuitive understanding, we focus on a small sample of 25 IPD strategies, which are either memory-zero (3) or memory-one (22) yet contain the most prominent ones such as TFT, WSLS, and ZD strategies. We use pairwise fixation probabilities (Figure 4a) to compute the stationary abundances of strategies (Figure 4b), and the aggregate abundances of good, bad, and ugly strategies are given in the inset (Figure 4b). We see that good strategies are collectively more successful than the other two types. The network containing all possible pairwise evolutionary pathways is visualized in Figure 4c. Each edge points from the disfavored strategy to the favored one, and its weight is given by the ratio of their fixation probabilities. In the case of two strategies that have equal fixation probabilities, the associated edge is bidirectional with weight one. This network will help us to further identify catalysts and bridges pivotal to the evolutionary success of good strategies.

In Figure 5a, we find that Win-Shift, Lose-Stay, an ugly strategy, is the most important catalyst for the evolution of good strategies for varying selection strengths and across different measures. The extortionate ZD strategy (with an extortion factor $\chi=2$ ) is the second most important catalyst, followed by the Alternator (an IPD strategy alternating between cooperating and defecting). The most crucial evolutionary pathway, a particular edge in the network, depends also on the specific selection strength $\beta$ and the measure used. For most scenarios (Figure 5b), the directed edge from ALLC to Win-Shift, Lose-Stay is the critical bridge, the presence of which can significantly boost the abundance of good strategies and suppress that of bad strategies while having the highest betweenness centrality among ugly strategies.
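The presence-versus-absence comparison behind the search for catalysts can be sketched as a leave-one-out computation. The code below is an illustration under stated assumptions, not the authors' procedure or their exact measure: the names `stationary` and `leave_one_out` are hypothetical, the impact is measured simply as the change in the summed abundance of one class of strategies, and the stationary vector is obtained by power iteration, which converges slowly when fixation probabilities differ by many orders of magnitude (a direct linear solve would then be preferable).

```python
def stationary(rho, iters=5000):
    """Stationary abundances by power iteration on a rescaled embedded chain."""
    m = len(rho)
    out = [sum(rho[i][j] for j in range(m) if j != i) for i in range(m)]
    scale = 0.5 / max(out)  # plays the role of mu; keeps every diagonal entry >= 1/2
    v = [1.0 / m] * m
    for _ in range(iters):
        v = [v[j] * (1.0 - scale * out[j])
             + sum(v[i] * scale * rho[i][j] for i in range(m) if i != j)
             for j in range(m)]
    return v


def leave_one_out(rho, labels, target="good"):
    """Change in the summed abundance of `target` strategies due to each strategy.

    impact[k] > 0 means the target class is more abundant with strategy k
    present than with strategy k removed from the pool.
    """
    m = len(rho)

    def share(r, lab):
        return sum(x for x, l in zip(stationary(r), lab) if l == target)

    full = share(rho, labels)
    impact = {}
    for k in range(m):
        keep = [i for i in range(m) if i != k]
        sub = [[rho[i][j] for j in keep] for i in keep]
        impact[k] = full - share(sub, [labels[i] for i in keep])
    return impact


# Toy pool with made-up fixation probabilities: one good, one bad, one ugly strategy.
rho = [[0.0, 0.3, 0.1], [0.05, 0.0, 0.2], [0.4, 0.1, 0.0]]
print(leave_one_out(rho, ["good", "bad", "ugly"]))
```

Ranking the ugly strategies by `impact` would give a candidate list of catalysts; an analogous computation that deletes one edge at a time (for example by making that pair neutral) could be used to rank bridges, which is again an assumption about how such a measure might be set up.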

IV Discussion and Conclusion

Our results demonstrate the importance of the exact composition of the strategy set when studying natural selection (including but not limited to evolutionary stability and resilience) of particular IPD strategies in population dynamics. Our network-based approach can be used to evaluate and assess the impact of including or excluding an individual IPD strategy on the evolution of good strategies and overall cooperation level. Besides, our method can be applied to steer population dynamics to desired states by adding one or more additional catalysts (and forming bridges that act as evolutionary “ramps”) that are amenable to external controls. This potential extension is an important insight arising from the present study.

In this work, we consider error-free IPD games where players perfectly implement their intended moves and perfectly observe others’ moves. However, noisy games where trembling hands or fuzzy minds are at play are worthy of further investigation [32]. It remains an open problem how intelligent players discern intentional deception from innocent errors and mistakes. In particular, the mechanism by which they find common ground notwithstanding different goals and perceptions of fairness is an important and promising area for future work [33].

The present study focuses on fixation probabilities in stochastic dynamics. We emphasize that the evolutionary time scale is another important quantity that should be taken into account. Some evolutionary pathways from one type to another are only possible in theory, that is, the probability of taking over (fixation probability) is non-zero but the expected conditional time for such fixation to occur is exponentially long under certain circumstances. For example, the fixation time can tend to infinity when the interaction is a snowdrift game as a particular type of coexistence and the selection strength $\beta$ is non-weak [34]. In light of this, the identification of bridges and catalysts needs to take into account the reasonable requirement of fixation time scales as well. The prior observation of stochastic tunneling in which a third type swiftly takes over a protracted mixture of two competing types could be quite useful in refining the search for the optimal presence of catalysts [35].

In the well-studied evolutionary dynamics between ALLC and ALLD where ALLC can never be favored, adding a third type such as TFT or Loners can fundamentally alter the underlying evolutionary dynamics [36], possibly leading natural selection to favor ALLC over ALLD. Our work generalizes this prior insight to multiple IPD strategies and offers a novel perspective of intervention and control. A targeted suppression and/or promotion of certain subgroups of strategies can be achieved by adding or removing certain types of strategies from the pool. Our work highlights that the dependence of the natural selection of strategies on the presence or absence of certain strategies is nontrivial and has applications to steering control problems in a broader context [37].

In conclusion, we have characterized and compared competing IPD strategies using a network-based approach. Our method can be used to identify essential IPD strategies (e.g. Win-Shift, Lose-Stay) and transitions between strategies (e.g. ALLD to Win-Shift, Lose-Stay) that act as catalysts and bridges for recovering cooperation from defection, and thus their presence plays an important role in the robustness and persistence of cooperation. Our findings share some interesting similarities with previous studies on the ecological stability and resilience of food webs [28, 29, 30].

Acknowledgment

X.C. gratefully acknowledges the generous faculty startup fund provided by BUPT (No. 505022023).

References

[1] Robert Axelrod and William D Hamilton. The evolution of cooperation. Science, 211(4489):1390–1396, 1981.
[2] Robert L Trivers. The evolution of reciprocal altruism. The Quarterly Review of Biology, 46(1):35–57, 1971.
[3] Martin A Nowak. Five rules for the evolution of cooperation. Science, 314(5805):1560–1563, 2006.
[4] William H Press and Freeman J Dyson. Iterated prisoner’s dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences, 109(26):10409–10413, 2012.
[5] Alexander J Stewart and Joshua B Plotkin. From extortion to generosity, evolution in the iterated prisoner’s dilemma. Proceedings of the National Academy of Sciences, 110(38):15348–15353, 2013.
[6] Christian Hilbe, Martin A Nowak, and Arne Traulsen. Adaptive dynamics of extortion and compliance. PloS one, 8(11):e77886, 2013.
[7] Fang Chen, Te Wu, and Long Wang. Evolutionary dynamics of zero-determinant strategies in repeated multiplayer games. Journal of Theoretical Biology, 549:111209, 2022.
[8] Xingru Chen, Long Wang, and Feng Fu. The intricate geometry of zero-determinant strategies underlying evolutionary adaptation from extortion to generosity. New Journal of Physics, 24(10):103001, 2022.
[9] Xingru Chen and Feng Fu. Outlearning extortioners: unbending strategies can foster reciprocal fairness and cooperation. PNAS nexus, 2(6):pgad176, 2023.
[10] Jeffrey S Banks and Rangarajan K Sundaram. Repeated games, finite automata, and complexity. Games and Economic Behavior, 2(2):97–117, 1990.
[11] Robert Axelrod. Effective choice in the prisoner’s dilemma. Journal of conflict resolution, 24(1):3–25, 1980.
[12] Robert Axelrod. More effective choice in the prisoner’s dilemma. Journal of conflict resolution, 24(3):379–403, 1980.
[13] Jianzhong Wu and Robert Axelrod. How to cope with noise in the iterated prisoner’s dilemma. Journal of Conflict resolution, 39(1):183–189, 1995.
[14] Jiawei Li and Graham Kendall. A strategy with novel evolutionary features for the iterated prisoner’s dilemma. Evolutionary Computation, 17(2):257–274, 2009.
[15] Tuomas W Sandholm and Robert H Crites. Multiagent reinforcement learning in the iterated prisoner’s dilemma. Biosystems, 37(1-2):147–166, 1996.
[16] Daan Bloembergen, Karl Tuyls, Daniel Hennes, and Michael Kaisers. Evolutionary dynamics of multi-agent learning: A survey. Journal of Artificial Intelligence Research, 53:659–697, 2015.
[17] Joel Z Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. arXiv preprint arXiv:1702.03037, 2017.
[18] Marc Harper, Vincent Knight, Martin Jones, Georgios Koutsovoulos, Nikoleta E Glynatsi, and Owen Campbell. Reinforcement learning produces dominant strategies for the iterated prisoner’s dilemma. PloS one, 12(12):e0188046, 2017.
[19] Jakob Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 122–130, 2018.
[20] Baihan Lin, Djallel Bouneffouf, and Guillermo Cecchi. Online learning in iterated prisoner’s dilemma to mimic human behavior. In PRICAI 2022: Trends in Artificial Intelligence: 19th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2022, Shanghai, China, November 10–13, 2022, Proceedings, Part III, pages 134–147. Springer, 2022.
[21] Christian Hilbe, Krishnendu Chatterjee, and Martin A Nowak. Partners and rivals in direct reciprocity. Nature human behaviour, 2(7):469–477, 2018.
[22] Benjamin Kerr, Margaret A Riley, Marcus W Feldman, and Brendan JM Bohannan. Local dispersal promotes biodiversity in a real-life game of rock–paper–scissors. Nature, 418(6894):171–174, 2002.
[23] Patrick Alfred Pierce Moran. Random processes in genetics. In Mathematical proceedings of the cambridge philosophical society, volume 54, pages 60–71. Cambridge University Press, 1958.
[24] Drew Fudenberg and Lorens A Imhof. Imitation processes with small mutations. Journal of Economic Theory, 131(1):251–262, 2006.
[25] Tibor Antal, Arne Traulsen, Hisashi Ohtsuki, Corina E Tarnita, and Martin A Nowak. Mutation-selection equilibrium in games with multiple strategies. Journal of theoretical biology, 258(4):614–622, 2009.
[26] Jiawei Li, Philip Hingston, and Graham Kendall. Engineering design of strategies for winning iterated prisoner’s dilemma competitions. IEEE Transactions on Computational Intelligence and AI in Games, 3(4):348–360, 2011.
[27] Martin Nowak and Karl Sigmund. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature, 364(6432):56–58, 1993.
[28] Robert W Rutledge, Bennett L Basore, and Robert J Mulholland. Ecological stability: an information theory viewpoint. Journal of Theoretical Biology, 57(2):355–371, 1976.
[29] Robert T Paine. Food webs: linkage, interaction strength and community infrastructure. Journal of animal ecology, 49(3):667–685, 1980.
[30] Stuart L Pimm, John H Lawton, and Joel E Cohen. Food web patterns and their consequences. Nature, 350(6320):669–674, 1991.
[31] Martin A Nowak. Evolutionary dynamics: exploring the equations of life. Harvard university press, 2006.
[32] Robert Axelrod. Launching “the evolution of cooperation”. Journal of theoretical biology, 299:21–24, 2012.
[33] Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R McKee, Joel Z Leibo, Kate Larson, and Thore Graepel. Open problems in cooperative ai. arXiv preprint arXiv:2012.08630, 2020.
[34] Tibor Antal and Istvan Scheuring. Fixation of strategies for an evolutionary game in finite populations. Bulletin of mathematical biology, 68(8):1923–1944, 2006.
[35] Yoh Iwasa, Franziska Michor, and Martin A Nowak. Stochastic tunnels in evolutionary dynamics. Genetics, 166(3):1571–1579, 2004.
[36] Christoph Hauert, Silvia De Monte, Josef Hofbauer, and Karl Sigmund. Volunteering as red queen mechanism for cooperation in public goods games. Science, 296(5570):1129–1132, 2002.
[37] Xin Wang, Zhiming Zheng, and Feng Fu. Steering eco-evolutionary game dynamics with manifold control. Proceedings of the Royal Society A, 476(2233):20190643, 2020.