
Perturbation theory for evolution of cooperation on networks

Lingqi Meng ([email protected]) · Naoki Masuda^{1,2}

1. Department of Mathematics, University at Buffalo, State University of New York, Buffalo, NY 14260-2900, USA

2. Computational and Data-Enabled Science and Engineering Program, University at Buffalo, State University of New York, Buffalo, NY 14260-5030, USA
Abstract

Network structure is a mechanism for promoting cooperation in social dilemma games. In the present study, we explore graph surgery, i.e., slightly perturbing a given network, towards a network that better fosters cooperation. To this end, we develop a perturbation theory to assess the change in the propensity of cooperation when we add or remove a single edge to/from the given network. Our perturbation theory builds on a previously proposed random-walk-based theory that provides the threshold benefit-to-cost ratio, (b/c)^{*}, which is the value of the benefit-to-cost ratio in the donation game above which the cooperator is more likely to fixate than in a control case, for any finite network. We find that (b/c)^{*} decreases when we remove a single edge in a majority of cases and that our perturbation theory captures with reasonable accuracy which edge removal makes (b/c)^{*} small to facilitate cooperation. In contrast, (b/c)^{*} tends to increase when we add an edge, and the perturbation theory is not good at predicting the edge addition that changes (b/c)^{*} by a large amount. Our perturbation theory significantly reduces the computational complexity of evaluating the outcome of graph surgery.

keywords:
evolutionary game; prisoner’s dilemma; network reciprocity; fixation; stochastic dynamics

1 Introduction

Since Darwin’s time, explaining cooperative behavior in groups of self-interested individuals has been a challenge [1, 2, 3, 4, 5, 6, 7, 8]. Game theory, including evolutionary game theory, has shown that a population of self-interested individuals playing a social dilemma game of the prisoner’s dilemma type does not sustain cooperation without an additional mechanism. To explain cooperation in social dilemma situations in nature, including in biological populations, and to promote cooperation in human society, various mathematical mechanisms supporting cooperation have been proposed. Population structure, as represented by contact networks of individuals, is one such mechanism. The structure of contact networks constrains who can interact with whom and promotes the emergence and endurance of clusters of cooperative players, both in local regions of spatial lattices [2, 9, 10, 11] and in adjacent pairs of nodes in general networks [12, 13, 14, 11, 15].

A major indicator of the success of a mutant trait in evolutionary dynamics is the fixation probability. It is defined as the probability that the mutant type will spread and eventually occupy the entire population as a result of evolutionary dynamics, given an initial distribution of mutants [16, 17, 18, 5]. When each individual is in either of the two types (i.e., wild and mutant) at any given time and the population structure is described by a network on N nodes, the state of the network is specified by an N-dimensional binary vector whose ith entry encodes the type of the ith node. In the absence of mutation, the fixation probability of the mutant starting from the state in which all the nodes are of the wild type is equal to 0. The fixation probability of the mutant is equal to 1 if all the nodes are initially mutant. For general initial conditions, the exact calculation of the fixation probability requires solving a linear system of 2^{N}-2 equations [18, 13]. Therefore, it is difficult to exactly compute the fixation probability except for small networks, highly symmetric networks, or networks with other mathematically convenient properties.

We focus on social dilemma situations, in particular the prisoner’s dilemma game, in the present paper. In the prisoner’s dilemma, the wild and mutant types correspond to cooperator and defector, respectively, or vice versa. The calculation of the fixation probability for the prisoner’s dilemma game on networks, potentially with some additional assumptions, is usually more involved than in the case of constant selection, in which the fitness of the wild and mutant types is fixed throughout the evolutionary dynamics. In games, the fitness of an individual generally depends on how other individuals behave, which makes setting up the linear system of 2^{N}-2 equations and efficiently solving it, particularly the latter, a difficult task. Under this circumstance, weak selection is an assumption that often facilitates analytical evaluation of the fixation probability of the mutant type, including in social dilemma games [16]. Let us write down each individual’s fitness as a sum of a constant term, called the baseline fitness, and the payoff that the individual receives by playing the game. By definition, weak selection means that the payoff is small compared to the baseline fitness. Under weak selection, Ohtsuki et al. developed a pair approximation theory that enables us to analytically derive the conditions under which cooperation fixates with a larger probability than a baseline on random regular graphs, i.e., random graphs in which all nodes have the same number of neighbors [13]. Furthermore, Allen et al. extended this result to the case of arbitrary networks using coalescence times from random walk theory [15]. With these methods, one can avoid dealing with a set of 2^{N}-2 linear equations and calculate the leading term of the fixation probability in polynomial time in terms of N.

In Ref. [15], the authors derived a key indicator to quantify the ease of cooperation in networks, i.e., the threshold benefit-to-cost ratio above which selection favors cooperation, denoted by (b/c)^{*}. In fact, substantial changes in (b/c)^{*} may occur when one only slightly perturbs the network structure, an operation referred to as graph surgery [15]. A carefully designed graph surgery may enhance cooperation by reducing (b/c)^{*} by a larger amount than a random graph surgery does. For example, a small mean degree (i.e., number of neighbors that a node has) of the network tends to induce cooperation [13, 15]. Therefore, decreasing the weight of an edge or removing an edge is expected to enhance cooperation. However, this may not be an optimal choice. Which particular edge should we perturb or remove to efficiently enhance cooperation? One can answer this question by removing just one edge from the original network, calculating (b/c)^{*} for the perturbed network, and repeating the same procedure for each different perturbation of the original network. However, this procedure may be computationally costly. Note that the method to calculate the fixation probability for cooperation in arbitrary networks, developed in Ref. [15], is still computationally costly although its computational complexity is polynomial in N.

In the current study, we develop a perturbation theory with the aim of predicting the direction and amount of the change in (b/c)^{*} when one slightly perturbs the weight of an arbitrary single edge. We find that, for most networks, the actual change in (b/c)^{*} when we remove an edge and the change predicted by our perturbation theory are strongly correlated, which makes it possible to propose a single edge to be removed for efficiently enhancing cooperation. However, the correlation between the result of direct numerical simulations and the perturbation theory is considerably weaker when one adds a new edge to the existing network. Therefore, our perturbation theory is not practically useful when one adds new edges. Compared to the direct numerical simulations, our perturbation theory is much faster, which allows us to compute the fixation probability under graph surgery in larger networks.

2 Fixation of cooperation on networks under weak selection

We assume that the graph G is connected and undirected. We denote the set of nodes by V=\{1,\ldots,N\}, where N is the number of nodes. For each pair of nodes i,j\in V, we denote the edge weight by w_{ij}\geq 0. If there is no edge between i and j, we set w_{ij}=0. We allow self-loops, i.e., positive values of w_{ii} [15]. The weighted degree of node i, denoted by s_{i}=\sum_{j=1}^{N}w_{ij} and also called the node strength, is the sum of the weights of the edges connected to the node.

A discrete-time random walker is said to be simple if the walker located at node i moves to one of its neighbors, denoted by j, in a single time step with probability proportional to w_{ij}, i.e., with probability p_{ij}=w_{ij}/s_{i}. Let W=(w_{ij}) be the N\times N weighted adjacency matrix. The transition probability matrix P=(p_{ij}) of the simple random walk is given by P=D^{-1}W, where D=\mathrm{diag}(s_{1},\ldots,s_{N}) is the diagonal matrix whose diagonal entries are s_{1},s_{2},\ldots,s_{N}. Let \boldsymbol{\pi}=(\pi_{1},\ldots,\pi_{N}) be the stationary probability vector of the random walk with transition probability matrix P, i.e., the solution of \boldsymbol{\pi}P=\boldsymbol{\pi}. It holds true that [19, 20]

\pi_{i}=\frac{s_{i}}{\sum_{\ell=1}^{N}s_{\ell}},\quad i\in\{1,\ldots,N\}. (1)
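As a sanity check on Eq. (1), one can verify \boldsymbol{\pi}P=\boldsymbol{\pi} numerically. The small weighted graph below is an arbitrary illustration, not one of the networks studied in this paper.

```python
import numpy as np

# Weighted adjacency matrix of a small connected, undirected graph
# (an illustrative choice; any symmetric W with nonnegative entries works)
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

s = W.sum(axis=1)        # weighted degrees (node strengths) s_i
P = W / s[:, None]       # transition matrix P = D^{-1} W

# Stationary distribution from Eq. (1): pi_i = s_i / sum_l s_l
pi = s / s.sum()

# Check that pi P = pi
assert np.allclose(pi @ P, pi)
```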

We use the donation game, which is a special case of the prisoner’s dilemma game. In the donation game, which is a two-player game, one player, called the donor, decides whether or not to pay a cost c (>0). If the donor pays c, which we refer to as cooperation, then the other player, called the recipient, receives benefit b (>c). If the donor does not pay c, which we refer to as defection, then the donor does not lose anything, and the recipient does not gain anything. Therefore, the payoff matrix of the donation game for a pair of players is given by

\bordermatrix{&\mathrm{C}&\mathrm{D}\cr\mathrm{C}&b-c&-c\cr\mathrm{D}&b&0}, (2)

where C and D represent cooperation and defection, respectively, and the payoff values are those for the row player. We assume that each player on a node participates in the game as the donor half of the time and as the recipient the other half.

We assign 0 and 1 to the defector and cooperator, respectively. Then, we can represent a state of the entire network by a binary vector \boldsymbol{x}=(x_{1},\ldots,x_{N})\in\{0,1\}^{N}. With this notation, the payoff of node i averaged over all its neighbors is given by

f_{i}(\boldsymbol{x})=-cx_{i}+b\sum_{j=1}^{N}p_{ij}x_{j}. (3)
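Equation (3) is straightforward to evaluate numerically. The following sketch computes the averaged payoffs on an unweighted triangle; the values of b, c, and the state vector are illustrative choices.

```python
import numpy as np

# Payoffs f_i(x) = -c x_i + b sum_j p_ij x_j, Eq. (3), on a triangle
b, c = 3.0, 1.0
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
P = W / W.sum(axis=1, keepdims=True)

x = np.array([1, 1, 0])      # nodes 0 and 1 cooperate; node 2 defects
f = -c * x + b * (P @ x)

# Node 2 pays no cost and both of its neighbors cooperate, so f_2 = b
assert np.isclose(f[2], b)
```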

The reproductive rate of node i in state \boldsymbol{x} is given by

R_{i}(\boldsymbol{x})=1+\eta f_{i}(\boldsymbol{x}), (4)

where \eta represents the strength of the selection. If \eta=0, the reproductive rate does not depend on the payoff matrix or the action (i.e., cooperation or defection) of any node. This case is equivalent to the so-called voter model. If \eta\to 0, the payoff weakly impacts the selection, and this limit is called the weak selection regime. The idea behind weak selection is that, in reality, many different factors may contribute to the overall fitness of an individual, and the game under consideration is just one such factor [13, 15].

We drive evolutionary dynamics by the death-birth process with selection on birth on an arbitrary network composed of cooperators and defectors [13, 15]. Specifically, we first select a node to be updated, denoted by i, uniformly at random. Second, we select one of i’s neighbors, denoted by j, for reproduction with probability proportional to w_{ij}R_{j}(\boldsymbol{x}). Third, node i, as the offspring of j, inherits the type of j. This completes a single round of the evolutionary dynamics, which we schematically show in Fig. 1.
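The three steps of a single round can be sketched as follows. The network, parameter values, and function name are illustrative choices, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def death_birth_round(x, W, b, c, eta):
    """One round of death-birth updating with selection on birth.
    x is the binary state vector (1 = cooperator, 0 = defector)."""
    N = len(x)
    P = W / W.sum(axis=1, keepdims=True)
    f = -c * x + b * (P @ x)          # payoffs, Eq. (3)
    R = 1.0 + eta * f                 # reproductive rates, Eq. (4)
    i = rng.integers(N)               # node to be updated, uniformly at random
    weights = W[i] * R                # neighbor j chosen prop. to w_ij R_j
    j = rng.choice(N, p=weights / weights.sum())
    x_new = x.copy()
    x_new[i] = x[j]                   # node i inherits the type of j
    return x_new

# One round on an unweighted triangle with a single initial cooperator
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
x = np.array([1, 0, 0])
x = death_birth_round(x, W, b=3.0, c=1.0, eta=0.01)
```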

Refer to caption
Figure 1: Death-birth process with selection on birth on an unweighted network. (a) Each individual obtains a payoff by interacting with all its neighbors. C and D represent cooperator and defector, respectively. (b) We select a node whose type is to be replaced uniformly at random, shown in gray. Then, one of the three neighbors of this node, whose payoff values are indicated, will replace the gray node. We select each of the cooperating neighbors for reproduction with probability [1+\eta(b/2-c)]/[1+\eta(b/2-c)+1+\eta(b/2-c)+1+\eta(2b/3)]=[6+3\eta(b-2c)]/[18+2\eta(5b-6c)] and the defecting neighbor with probability [1+\eta(2b/3)]/[1+\eta(b/2-c)+1+\eta(b/2-c)+1+\eta(2b/3)]=(3+2\eta b)/[9+\eta(5b-6c)]. (c) In this example, we select the cooperating neighbor to the left, which replaces the gray node.

The death-birth process in any finite population without mutation will eventually reach the state in which all individuals are cooperators or all are defectors and halt. In other words, cooperation or defection fixates in finite time with probability 1. Consider the initial conditions in which one node is a cooperator and the other N-1 nodes are defectors. There are N such initial conditions depending on which node is the cooperator. We consider the initial probability distribution over all possible states that assigns probability 1/N to each of the states with exactly one cooperator and probability zero to all the other states. We denote by \rho_{\mathrm{C}} the probability that cooperation fixates under this distribution of the initial state. If \rho_{\mathrm{C}}>1/N, natural selection favors cooperation [16, 13, 5, 15]. In Ref. [15], Allen et al. showed that

\rho_{\mathrm{C}}=\frac{1}{N}+\frac{\eta}{2N}\left[-c\tau_{2}+b(\tau_{3}-\tau_{1})\right]+O(\eta^{2}), (5)

where

\tau_{k}=\sum_{i=1}^{N}\sum_{j=1}^{N}\pi_{i}p_{ij}^{(k)}t_{ij}, (6)

p_{ij}^{(k)} is the (i,j)th entry of matrix P^{k}, which implies that p_{ij}^{(1)}=p_{ij}, and

t_{ij}=\begin{cases}0&\mathrm{if}\ i=j,\\ 1+\frac{1}{2}\sum_{k=1}^{N}(p_{ik}t_{jk}+p_{jk}t_{ik})&\mathrm{otherwise}.\end{cases} (7)

Equation (7) implies that t_{ij}=t_{ji} is the mean coalescence time of two random walkers when one walker is initially located at node i and the other at node j. Note that p_{ij}^{(k)} is the k-step transition probability of the random walk from node i to node j. Therefore, \tau_{k} is the expected value of t_{ij} when i and j are the two ends of a k-step random walk trajectory on G under the stationary distribution [15]. Equation (5) implies that the threshold value of the benefit-to-cost ratio above which natural selection favors cooperation (i.e., \rho_{\mathrm{C}}>1/N) is given by

\left(\frac{b}{c}\right)^{*}=\frac{\tau_{2}}{\tau_{3}-\tau_{1}}. (8)

Natural selection favors cooperation if b/c>(b/c)^{*}.
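Equations (6)-(8) can be implemented directly by solving the linear system (7) with a dense solver. This is a didactic O(N^{6}) sketch rather than the optimized method of Ref. [15], and the helper names are our own.

```python
import numpy as np

def coalescence_times(P):
    """Solve Eq. (7) for the mean coalescence times t_ij, using one
    equation per ordered pair (i, j); the row for (i, i) enforces t_ii = 0."""
    N = P.shape[0]
    A = np.eye(N * N)
    d = np.zeros(N * N)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            r = i * N + j
            d[r] = 1.0
            for k in range(N):
                A[r, j * N + k] -= P[i, k] / 2.0   # -(1/2) p_ik t_jk
                A[r, i * N + k] -= P[j, k] / 2.0   # -(1/2) p_jk t_ik
    return np.linalg.solve(A, d).reshape(N, N)

def bc_threshold(W):
    """(b/c)^* = tau_2 / (tau_3 - tau_1), from Eqs. (6) and (8)."""
    s = W.sum(axis=1)
    P = W / s[:, None]
    pi = s / s.sum()
    T = coalescence_times(P)
    tau = [pi @ ((np.linalg.matrix_power(P, k) * T).sum(axis=1))
           for k in (1, 2, 3)]
    return tau[1] / (tau[2] - tau[0])

# Cycle on six nodes, a 2-regular graph
N = 6
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
print(bc_threshold(W))   # 4.0, consistent with Eqs. (9)-(11): (N-2)/(N/k-2)
```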

For example, if the underlying network is regular with degree k, we have

\tau_{1}=N-1, (9)
\tau_{2}=N-2, (10)

and

\tau_{3}=N+\frac{N}{k}-3, (11)

such that

\left(\frac{b}{c}\right)^{*}=k (12)

as N\to\infty [15]. Note that the right-hand side of Eq. (8) only depends on the adjacency matrix of the network. In other words, the structure of the contact network determines whether and how much natural selection favors cooperation.
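For a k-regular graph, Eqs. (9)-(11) give the closed form (b/c)^{*}=(N-2)/(N/k-2), which is positive for N>2k and tends to k as N\to\infty. A minimal sketch of the convergence:

```python
# (b/c)^* for a k-regular graph from Eqs. (9)-(11):
# tau_2 / (tau_3 - tau_1) = (N - 2) / (N/k - 2), valid for N > 2k
def bc_star_regular(N, k):
    return (N - 2) / (N / k - 2)

for N in (10, 100, 1000, 10000):
    print(N, bc_star_regular(N, k=3))   # approaches k = 3 as N grows
```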

3 Perturbation theory for graph surgery

In this section, we develop a perturbation theory to determine the change in (b/c)^{*} when one perturbs the weight of a single edge. To this end, we start by rewriting Eq. (6) in terms of matrices and vectors. Let \boldsymbol{1}=(1,\ldots,1)^{\top}, where \top represents the transposition. Let T=(t_{ij}) be the N\times N matrix of the mean coalescence times. Using these notations, we rewrite Eq. (6) as

\tau_{k}=\boldsymbol{\pi}\left(P^{k}\circ T\right)\boldsymbol{1}, (13)

where k=1,2,3, and \circ represents the Hadamard product.

If one changes the weight of an edge (i_{0},j_{0}) by \varepsilon, where |\varepsilon|\ll 1, including the case in which we create a new edge with weight \varepsilon (>0), the perturbed network remains connected and undirected. Therefore, one can still use Eq. (8) to compute (b/c)^{*}. Equation (8) uses Eq. (6), which requires \boldsymbol{\pi}, P, and T. We denote these variables after the perturbation by \boldsymbol{\pi}(\varepsilon), P(\varepsilon), and T(\varepsilon). To distinguish the quantities before and after the perturbation, we denote these variables before the perturbation by \boldsymbol{\pi}(0), P(0), and T(0).

To write down \boldsymbol{\pi}(\varepsilon), we denote by

S=\sum_{i=1}^{N}s_{i}=\sum_{i=1}^{N}\sum_{j=1}^{N}w_{ij} (14)

the sum of the weighted degrees over all the nodes. Under a small perturbation, we carry out a Taylor expansion of Eq. (1) to obtain

\boldsymbol{\pi}(\varepsilon)=\boldsymbol{\pi}(0)+\varepsilon\Delta\boldsymbol{\pi}+o(\varepsilon), (15)

where \Delta\boldsymbol{\pi}=(\Delta\pi_{1},\ldots,\Delta\pi_{N}). We obtain

\Delta\pi_{i}=\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{S}-\frac{2\pi_{i}(0)}{S}, (16)

where \delta_{ij} is the Kronecker delta. We present the derivation of Eq. (16) in Appendix A.
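Equation (16) can be checked against a finite-difference computation of the stationary distribution. The graph, the perturbed pair (i_{0},j_{0}) (here a newly created edge), and \varepsilon below are illustrative choices.

```python
import numpy as np

def stationary(W):
    s = W.sum(axis=1)
    return s / s.sum()

# Small connected undirected graph (illustrative)
W = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
i0, j0, eps = 0, 3, 1e-6
S = W.sum()               # S of Eq. (14); counts both directions of each edge
pi0 = stationary(W)

# Predicted first-order change, Eq. (16)
delta = np.zeros(len(W))
delta[[i0, j0]] += 1.0 / S
delta -= 2.0 * pi0 / S

# Direct perturbation: add eps to w_{i0 j0} in both directions (undirected)
Wp = W.copy()
Wp[i0, j0] += eps
Wp[j0, i0] += eps
fd = (stationary(Wp) - pi0) / eps

assert np.allclose(fd, delta, atol=1e-6)
```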

To calculate P(\varepsilon), we define a symmetric indicator function, denoted by \chi_{i_{0}j_{0}}, by

\chi_{i_{0}j_{0}}(i,j)=\begin{cases}1&\mathrm{if\ }(i,j)=(i_{0},j_{0})\ \mathrm{or}\ (i,j)=(j_{0},i_{0}),\\ 0&\mathrm{otherwise}.\end{cases} (17)

We obtain

P(\varepsilon)=P(0)+\varepsilon\Theta^{(1)}+o(\varepsilon), (18)
P^{2}(\varepsilon)=P^{2}(0)+\varepsilon\left[\Theta^{(1)}P(0)+P(0)\Theta^{(1)}\right]+o(\varepsilon)
:=P^{2}(0)+\varepsilon\Theta^{(2)}+o(\varepsilon), (19)
P^{3}(\varepsilon)=P^{3}(0)+\varepsilon\left[\Theta^{(1)}P^{2}(0)+P(0)\Theta^{(1)}P(0)+P^{2}(0)\Theta^{(1)}\right]+o(\varepsilon)
:=P^{3}(0)+\varepsilon\Theta^{(3)}+o(\varepsilon), (20)

where \Theta^{(1)}=(\theta^{(1)}_{ij}), \Theta^{(2)}=(\theta^{(2)}_{ij}), and \Theta^{(3)}=(\theta^{(3)}_{ij}) are N\times N matrices whose entries are given by

\theta^{(1)}_{ij}=\frac{\chi_{i_{0}j_{0}}(i,j)}{s_{i}}-p_{ij}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}, (21)
\theta^{(2)}_{ij}=\frac{\delta_{ii_{0}}p_{j_{0}j}(0)}{s_{i}}+\frac{\delta_{ij_{0}}p_{i_{0}j}(0)}{s_{i}}-p_{ij}^{(2)}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}
+\frac{\delta_{jj_{0}}p_{ii_{0}}(0)}{s_{i_{0}}}+\frac{\delta_{ji_{0}}p_{ij_{0}}(0)}{s_{j_{0}}}-p_{ii_{0}}(0)p_{i_{0}j}(0)\frac{1}{s_{i_{0}}}-p_{ij_{0}}(0)p_{j_{0}j}(0)\frac{1}{s_{j_{0}}}, (22)

and

\theta^{(3)}_{ij}=\frac{\delta_{ii_{0}}}{s_{i_{0}}}p^{(2)}_{j_{0}j}(0)+\frac{\delta_{ij_{0}}}{s_{j_{0}}}p^{(2)}_{i_{0}j}(0)-p^{(3)}_{ij}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}
+\frac{p_{ii_{0}}(0)p_{j_{0}j}(0)}{s_{i_{0}}}+\frac{p_{ij_{0}}(0)p_{i_{0}j}(0)}{s_{j_{0}}}-\frac{p_{ii_{0}}(0)p_{i_{0}j}^{(2)}(0)}{s_{i_{0}}}-\frac{p_{ij_{0}}(0)p_{j_{0}j}^{(2)}(0)}{s_{j_{0}}}
+\frac{\delta_{jj_{0}}}{s_{i_{0}}}p^{(2)}_{ii_{0}}(0)+\frac{\delta_{ji_{0}}}{s_{j_{0}}}p^{(2)}_{ij_{0}}(0)-p^{(2)}_{ii_{0}}(0)p_{i_{0}j}(0)\frac{1}{s_{i_{0}}}-p^{(2)}_{ij_{0}}(0)p_{j_{0}j}(0)\frac{1}{s_{j_{0}}}. (23)

We show the derivation of Eqs. (21), (22), and (23) in Appendix B.

We next calculate T(\varepsilon). Matrix T(0)=(t_{ij}(0)) satisfies

t_{ij}(0)=\begin{cases}0&\mathrm{if}\ i=j,\\ 1+\frac{1}{2}\left[\sum_{k=1}^{j-1}p_{ik}(0)t_{kj}(0)+\sum_{k=j+1}^{N}p_{ik}(0)t_{jk}(0)+\sum_{k=1}^{i-1}p_{jk}(0)t_{ki}(0)+\sum_{k=i+1}^{N}p_{jk}(0)t_{ik}(0)\right]&\mathrm{if}\ i<j,\\ t_{ji}(0)&\mathrm{if}\ i>j,\end{cases} (24)

which we obtain by applying t_{ij}(0)=t_{ji}(0) to Eq. (7). Note that \{p_{11}(0),p_{12}(0),\ldots,p_{NN}(0)\} are known from the network structure and that \{t_{11}(0),t_{12}(0),\ldots,t_{NN}(0)\} are unknowns. We stack Eq. (24) for the different i and j values in lexicographic order of (i,j) on the left-hand side. In other words, the first equation is t_{11}(0)=0, the second equation is t_{12}(0)-\frac{1}{2}p_{11}(0)t_{12}(0)-\frac{1}{2}\sum_{k=3}^{N}p_{1k}(0)t_{2k}(0)-\frac{1}{2}\sum_{k=2}^{N}p_{2k}(0)t_{1k}(0)=1, the third equation is t_{13}(0)-\frac{1}{2}p_{11}(0)t_{13}(0)-\frac{1}{2}p_{12}(0)t_{23}(0)-\frac{1}{2}\sum_{k=4}^{N}p_{1k}(0)t_{3k}(0)-\frac{1}{2}\sum_{k=2}^{N}p_{3k}(0)t_{1k}(0)=1, and so on. Denote by \mathrm{vec}(T(0)) the thus obtained vectorization of matrix T(0), i.e.,

\mathrm{vec}(T(0))=(t_{11}(0),\ldots,t_{1N}(0);t_{21}(0),\ldots,t_{2N}(0);\ldots;t_{N1}(0),\ldots,t_{NN}(0))^{\top}. (25)

Equation (25) is a redundant expression because T(0) is a symmetric matrix and its diagonal elements are equal to 0. However, we use Eq. (25) in the following text because it makes the theoretical derivations and computational implementation easier than the most compact vector form of T(0), which would be N(N-1)/2-dimensional. Using Eq. (25), we rewrite Eq. (24) as

M(0)\mathrm{vec}(T(0))=\boldsymbol{d}, (26)

where M(0) is the N^{2}\times N^{2} matrix whose entries are determined by Eq. (24), and \boldsymbol{d} is the N^{2}-dimensional column vector whose ((k-1)N+k)th entry is equal to 0 for all k\in\{1,\ldots,N\} and whose other entries are equal to 1. Because it also holds true that M(\varepsilon)\mathrm{vec}(T(\varepsilon))=\boldsymbol{d}, the calculation of T(\varepsilon) requires M(\varepsilon), i.e., the matrix under the perturbation, defined similarly to M(0). We obtain the entries of M(\varepsilon) from those of M(0) with each p_{ij}(0) (with i,j\in\{1,\ldots,N\}) replaced by p_{ij}(\varepsilon). We write the Taylor expansion of M(\varepsilon) as
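For concreteness, the system (26) can be assembled explicitly. In this sketch we fill the rows for all ordered pairs directly from the unsymmetrized recurrence (7) instead of the symmetrized rows of Eq. (24); both formulations have the same solution \mathrm{vec}(T(0)), and the function name is our own.

```python
import numpy as np

def build_system(P):
    """Assemble an N^2 x N^2 system M vec(T) = d with the same solution as
    Eq. (26): one row per ordered pair (i, j), and rows for (i, i) enforce
    t_ii = 0."""
    N = P.shape[0]
    M = np.eye(N * N)
    d = np.zeros(N * N)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            r = i * N + j
            d[r] = 1.0
            for k in range(N):
                M[r, j * N + k] -= P[i, k] / 2.0
                M[r, i * N + k] -= P[j, k] / 2.0
    return M, d

# Path graph 0 - 1 - 2 as a toy example
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
P = W / W.sum(axis=1, keepdims=True)
M, d = build_system(P)
T = np.linalg.solve(M, d).reshape(3, 3)

assert np.allclose(T, T.T)             # coalescence times are symmetric
assert np.allclose(np.diag(T), 0.0)    # and vanish on the diagonal
```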

M(\varepsilon)=M(0)+\varepsilon\Delta M+o(\varepsilon) (27)

and calculate \Delta M as follows.

We write \Delta M as a block matrix

\Delta M=\begin{pmatrix}\Delta_{11}&\Delta_{12}&\cdots&\Delta_{1N}\\ \Delta_{21}&\Delta_{22}&\cdots&\Delta_{2N}\\ \vdots&\vdots&\ddots&\vdots\\ \Delta_{N1}&\Delta_{N2}&\cdots&\Delta_{NN}\end{pmatrix}, (28)

where each \Delta_{ij} is an N\times N matrix. We show the derivation of each \Delta_{ij} in Appendix C. We point out that the number of nonzero rows of \Delta M is equal to 2\times(N-2)+(N-1)\times 2=4N-6, which is much smaller than N^{2} for large N (see Appendix C).

To derive the first-order term of T(\varepsilon) from \Delta M, we use Eq. (27) to obtain

\mathrm{vec}(T(\varepsilon))=M(\varepsilon)^{-1}\boldsymbol{d}
=\left[M(0)+\varepsilon\Delta M+o(\varepsilon)\right]^{-1}\boldsymbol{d}
=\left[M(0)\left(I+\varepsilon M(0)^{-1}\Delta M+o(\varepsilon)\right)\right]^{-1}\boldsymbol{d}
=\left[I-\varepsilon M(0)^{-1}\Delta M+o(\varepsilon)\right]M(0)^{-1}\boldsymbol{d}
=\left(I-\varepsilon M(0)^{-1}\Delta M\right)\mathrm{vec}(T(0))+o(\varepsilon)
=\mathrm{vec}(T(0))-\varepsilon M(0)^{-1}\Delta M\,\mathrm{vec}(T(0))+o(\varepsilon). (29)

Therefore, we obtain

T(\varepsilon):=T(0)+\varepsilon\Delta T+o(\varepsilon), (30)

where \Delta T is the N\times N matrix satisfying

\mathrm{vec}(\Delta T)=-M(0)^{-1}\Delta M\,\mathrm{vec}(T(0)). (31)

Finally, using Eq. (13), we derive the perturbed \tau_{k}(\varepsilon) as follows:

\tau_{k}(\varepsilon)=\tau_{k}(0)+\varepsilon\Gamma_{k}+o(\varepsilon), (32)

where

\Gamma_{k}=\Delta\boldsymbol{\pi}\left(P^{k}(0)\circ T(0)\right)\boldsymbol{1}+\boldsymbol{\pi}(0)\left(\Theta^{(k)}\circ T(0)\right)\boldsymbol{1}+\boldsymbol{\pi}(0)\left(P^{k}(0)\circ\Delta T\right)\boldsymbol{1}. (33)

By substituting Eq. (32) in Eq. (8), we obtain

\left(\frac{b}{c}\right)^{*}(\varepsilon):=\left(\frac{b}{c}\right)^{*}(0)+\varepsilon\Delta\left(\frac{b}{c}\right)^{*}+o(\varepsilon), (34)

where

\Delta\left(\frac{b}{c}\right)^{*}=\frac{(\tau_{3}(0)-\tau_{1}(0))\Gamma_{2}-\tau_{2}(0)(\Gamma_{3}-\Gamma_{1})}{(\tau_{3}(0)-\tau_{1}(0))^{2}}. (35)

4 Time complexity

To calculate (b/c)^{*} for a network with N nodes, the original algorithm requires calculating the mean coalescence times by solving a linear system of N(N-1)/2 variables, i.e., t_{ij} (with i,j\in\{1,\ldots,N\} and i<j), which has a time complexity of O(N^{6}). With the Coppersmith-Winograd algorithm [21], the time complexity is reduced to O(N^{4.75}) [15]. To determine the single edge whose removal decreases (b/c)^{*} by the largest amount, for example, one needs to repeat this procedure for each edge. Therefore, the entire procedure with an ordinary algorithm and with the Coppersmith-Winograd algorithm requires O(N^{6}|E|) and O(N^{4.75}|E|) time, respectively, where |E| is the number of edges. For a sparse network, for which |E|=O(N), the time complexity is O(N^{7}) and O(N^{5.75}), respectively.
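The per-edge procedure described above (recompute (b/c)^{*} once for each candidate edge removal) can be sketched as follows. This is the costly baseline that the perturbation theory replaces; the function names and the connectivity test are our own illustrative choices, and the threshold computation from section 2 is repeated so that the sketch is self-contained.

```python
import itertools

import numpy as np

def bc_threshold(W):
    """(b/c)^* = tau_2 / (tau_3 - tau_1), computed directly from Eqs. (6)-(8)."""
    N = W.shape[0]
    s = W.sum(axis=1)
    P = W / s[:, None]
    pi = s / s.sum()
    A = np.eye(N * N)
    d = np.zeros(N * N)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            r = i * N + j
            d[r] = 1.0
            for k in range(N):
                A[r, j * N + k] -= P[i, k] / 2.0
                A[r, i * N + k] -= P[j, k] / 2.0
    T = np.linalg.solve(A, d).reshape(N, N)
    tau = [pi @ ((np.linalg.matrix_power(P, m) * T).sum(axis=1)) for m in (1, 2, 3)]
    return tau[1] / (tau[2] - tau[0])

def best_edge_removal(W):
    """Try removing each edge in turn and return the edge minimizing (b/c)^*.
    Removals that disconnect the network are skipped."""
    N = W.shape[0]
    best = (None, np.inf)
    for i, j in itertools.combinations(range(N), 2):
        if W[i, j] == 0:
            continue
        Wp = W.copy()
        Wp[i, j] = Wp[j, i] = 0.0
        # connectivity check: (I + A)^N has no zero entry iff connected
        reach = np.linalg.matrix_power(np.eye(N) + (Wp > 0), N) > 0
        if not reach.all():
            continue
        val = bc_threshold(Wp)
        if val < best[1]:
            best = ((i, j), val)
    return best

# Example: the 6-cycle (all removals are equivalent by symmetry)
N = 6
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
edge, val = best_edge_removal(W)
```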

The matrix \Delta M defined by Eq. (28) is sparse and has a special pattern. If the ith row of \Delta M is a zero row, then the ith element of vector \Delta M\mathrm{vec}(T(0)) is zero, and we do not need to calculate it. Therefore, to calculate \Delta M\mathrm{vec}(T(0)), we only need to focus on its ((i_{0}-1)N+k)th entries, where k\in\{1,\ldots,N\}\setminus\{i_{0}\}, its ((j_{0}-1)N+k)th entries, where k\in\{1,\ldots,N\}\setminus\{j_{0}\}, and its ((k-1)N+i_{0})th and ((k-1)N+j_{0})th entries, where k\in\{1,\ldots,N\}\setminus\{i_{0},j_{0}\}. All the other entries of \Delta M\mathrm{vec}(T(0)) are equal to 0. We show a pseudo algorithm to calculate \Delta T in Algorithm 1.

Input: Matrices \Theta^{(1)} and M(0)^{-1}; vector (\boldsymbol{v}_{1},\boldsymbol{v}_{2},\ldots,\boldsymbol{v}_{N})^{\top}; edge (i_{0},j_{0})
Output: Matrix \Delta T

/* Compute \Delta M\mathrm{vec}(T(0)) */
Initialize the N^{2}-dimensional vector u=0
while k\in\{1,\ldots,N\}\setminus\{i_{0},j_{0}\} do
      u_{(i_{0}-1)N+k}\leftarrow\Theta_{i_{0}}\cdot\boldsymbol{v}_{k}
      u_{(j_{0}-1)N+k}\leftarrow\Theta_{j_{0}}\cdot\boldsymbol{v}_{k}
      u_{(k-1)N+i_{0}}\leftarrow\Theta_{i_{0}}\cdot\boldsymbol{v}_{k}
      u_{(k-1)N+j_{0}}\leftarrow\Theta_{j_{0}}\cdot\boldsymbol{v}_{k}
end while
u_{(i_{0}-1)N+j_{0}}\leftarrow\Theta_{i_{0}}\cdot\boldsymbol{v}_{j_{0}}+\Theta_{j_{0}}\cdot\boldsymbol{v}_{i_{0}}
u_{(j_{0}-1)N+i_{0}}\leftarrow\Theta_{i_{0}}\cdot\boldsymbol{v}_{j_{0}}+\Theta_{j_{0}}\cdot\boldsymbol{v}_{i_{0}}
/* u=(u_{1},\ldots,u_{N^{2}})^{\top} is now equal to \Delta M\mathrm{vec}(T(0)) */

/* Compute \mathrm{vec}(\Delta T) */
/* Multiply M(0)^{-1} and the already calculated \Delta M\mathrm{vec}(T(0)) */
Initialize the N^{2}-dimensional vector \mathrm{vec}(\Delta T)=0
while k\in\{1,\ldots,N\}\setminus\{i_{0},j_{0}\} do
      \mathrm{vec}(\Delta T)\leftarrow\mathrm{vec}(\Delta T)+u_{(i_{0}-1)N+k}\left(M(0)^{-1}\right)_{(i_{0}-1)N+k}
      \mathrm{vec}(\Delta T)\leftarrow\mathrm{vec}(\Delta T)+u_{(j_{0}-1)N+k}\left(M(0)^{-1}\right)_{(j_{0}-1)N+k}
      \mathrm{vec}(\Delta T)\leftarrow\mathrm{vec}(\Delta T)+u_{(k-1)N+i_{0}}\left(M(0)^{-1}\right)_{(k-1)N+i_{0}}
      \mathrm{vec}(\Delta T)\leftarrow\mathrm{vec}(\Delta T)+u_{(k-1)N+j_{0}}\left(M(0)^{-1}\right)_{(k-1)N+j_{0}}
end while
\mathrm{vec}(\Delta T)\leftarrow\mathrm{vec}(\Delta T)+u_{(i_{0}-1)N+j_{0}}\left(M(0)^{-1}\right)_{(i_{0}-1)N+j_{0}}
\mathrm{vec}(\Delta T)\leftarrow\mathrm{vec}(\Delta T)+u_{(j_{0}-1)N+i_{0}}\left(M(0)^{-1}\right)_{(j_{0}-1)N+i_{0}}
Return \Delta T
Algorithm 1: Pseudoalgorithm to compute \Delta T. Let \left(M(0)^{-1}\right)_{i} be the ith column of M(0)^{-1}. Let \Theta_{i} be the ith row of \Theta^{(1)}, where \Theta^{(1)} is defined by Eq. (21). Let \boldsymbol{v}_{i} be the ith row of T(0) such that \mathrm{vec}(T(0))=(\boldsymbol{v}_{1},\boldsymbol{v}_{2},\ldots,\boldsymbol{v}_{N})^{\top}.

We now discuss the computational complexity of our perturbation method. Because the inner product of N-dimensional vectors has a time complexity of O(N), the first while loop in Algorithm 1 has a complexity of O(N^{2}). The second while loop computes \mathrm{vec}(\Delta T). Because the scalar multiplication of an N^{2}-dimensional vector requires O(N^{2}) time, this entire while loop has a time complexity of O(N^{3}). Therefore, for a single perturbation experiment, one can carry out the entire algorithm in O(N^{3}) time to obtain the perturbed \{t_{ij}\}, and hence (b/c)^{*}. This is considerably smaller than the O(N^{4.75}) and O(N^{6}) of the Coppersmith-Winograd algorithm and the standard algorithm, respectively. The entire procedure to determine the single edge to be removed to maximally promote cooperation with the perturbation theory requires O(N^{3}|E|) time in general networks and O(N^{4}) time for sparse networks.

5 Data

We use the following four synthetic networks and seven empirical networks in our numerical analysis in section 6. We show the numbers of nodes and edges of each network in Table 1 and visualize the networks in Fig. 2. All the networks are connected and have no self-loops.

Refer to caption
Figure 2: Visualization of the networks used in the numerical analysis. (a) ER random graph. (b) BA model. (c) Planted 2-partition model. (d) LFR model. (e) Karate club. (f) Weaver. (g) Sparrow. (h) Lizard. (i) Dolphin. (j) Email. (k) Bird. All the networks are undirected. The linear size of the node is proportional to its degree. We have ignored the weight of the edge in this figure and our analysis.

We use a network generated by the Erdős–Rényi (ER) random graph model with N=100N=100 nodes. We connect 300300 pairs of nodes, selected uniformly at random out of the N(N1)/2=4950N(N-1)/2=4950 possible pairs. The average degree is k=6\langle k\rangle=6.

With the Barabási–Albert (BA) model, we sequentially add new nodes, each with m=3m=3 edges that connect to existing nodes according to the linear preferential attachment rule [22]. We start the growth process from the star graph with four nodes. In the limit of NN\to\infty, the degree distribution approximately obeys p(k)k3p(k)\propto k^{-3}, where p(k)p(k) is the probability that a node has degree kk, and \propto represents “proportional to”. We set N=100N=100 and m=3m=3, which yields 291291 edges, implying k=5.826\langle k\rangle=5.82\approx 6.

The planted \ell-partition model, also called the random partition (RP) graph, partitions the set of NN nodes into \ell groups, each of which has N/N/\ell nodes [23]. Any pair of nodes in the same group is adjacent with probability pinp_{\mathrm{in}}. Any pair of nodes belonging to different groups is adjacent with probability poutp_{\mathrm{out}}. If pin>poutp_{\mathrm{in}}>p_{\mathrm{out}}, the intra-cluster edge density exceeds the inter-cluster edge density such that the network has community structure. We set N=100N=100, =2\ell=2, pin=0.11p_{\mathrm{in}}=0.11, and pout=0.01p_{\mathrm{out}}=0.01 such that the mean degree k=pin(N/1)+poutN(1)/=5.89\langle k\rangle=p_{\mathrm{in}}(N/\ell-1)+p_{\mathrm{out}}N(\ell-1)/\ell=5.89 in theory. We use a network generated by this model having k=6.12\langle k\rangle=6.12.

The Lancichinetti–Fortunato–Radicchi (LFR) model generates networks with community structure [24]. The model generates a power-law degree distribution with power-law exponent γ\gamma, and a power-law distribution of the size of the community with power-law exponent κ\kappa. The model also requires the maximal degree kmaxk_{\mathrm{max}} and mean degree k\langle k\rangle as input. The mixing parameter μ¯(0,1)\overline{\mu}\in(0,1) specifies the fraction of edges that connect different communities. A small value of μ¯\overline{\mu} leads to strong community structure. We set N=100N=100, γ=3\gamma=3, κ=2\kappa=2, k=6\langle k\rangle=6, kmax=100k_{\mathrm{max}}=100, and μ¯=0.1\overline{\mu}=0.1. A network generated by this model that we use has k=6.08\langle k\rangle=6.08.
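For reference, the first three synthetic models can be generated with NetworkX, which the paper already uses for community detection; the generator names below follow the NetworkX API and the seeds are arbitrary. Note that nx.barabasi_albert_graph starts its growth from an m-node seed graph rather than the four-node star used here, and the LFR model is available as nx.LFR_benchmark_graph (not generated below, since its rejection-sampling construction can fail for some parameter sets):

```python
import networkx as nx

# ER: G(N, M) variant with N = 100 nodes and exactly 300 edges, so <k> = 6
er = nx.gnm_random_graph(100, 300, seed=1)

# BA: N = 100 nodes, m = 3 edges per new node, giving (100 - 3) * 3 = 291 edges
ba = nx.barabasi_albert_graph(100, 3, seed=1)

# Planted 2-partition: l = 2 groups of 50 nodes, p_in = 0.11, p_out = 0.01
rp = nx.planted_partition_graph(2, 50, 0.11, 0.01, seed=1)
```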

We consider the following seven empirical networks. The karate club network consists of 34 nodes and 78 edges [25]. Each node represents a member of a karate club in a university in the United States, who were observed between 1970 and 1972. The edges represent interaction outside the activities of the club.

The weaver network has 42 nodes and 151 edges [26]. Each node represents a sociable weaver (Philetairus socius) observed in Benfontein Game Farm, Kimberley, South Africa. The observation lasted for 10 months in total: September–December 2010 and 2011, and January–February 2013. Two nodes are adjacent to each other if the two weavers used the same nest chambers either for roosting or nest-building within a series of observations in the same year.

The sparrow network has 52 nodes and 516 edges [27]. A node represents a golden-crowned sparrow (Zonotrichia atricapilla) observed at the University of California, Santa Cruz Arboretum. The data was recorded between January and March 2010 [27]. Although the original network is weighted, we regard this network as an unweighted network.

The lizard network has 60 nodes and 318 edges [28]. Each node represents a lizard (Tiliqua rugosa) observed in a chenopod shrubland near Bundey Bore Station in South Australia. A data logger unit was attached to the dorsal surface of each lizard’s tail, which recorded synchronized GPS locations every 10 minutes. Two lizards are adjacent to each other if they were within 2 meters of each other in any GPS record.

The dolphin network has 62 nodes and 159 edges [29]. Each node represents a bottlenose dolphin (Tursiops). An edge represents a frequent association between two dolphins.

The email network has 167 nodes and 3251 edges [30]. Each node represents an employee of a mid-sized manufacturing company in Poland. An edge between two nodes (i.e., employees) indicates that there exists at least one email correspondence between the two individuals. We do not distinguish between senders and recipients and treat the network as undirected.

The bird network has 202 nodes and 11900 edges [31]. In the experiment, the researchers placed nest boxes in Wytham Woods, Oxford, UK, for six days to record individuals that landed on the entrance hole while prospecting for breeding territories. Each node represents a wild bird, which is either a great tit (Parus major), blue tit (Cyanistes caeruleus), marsh tit (Poecile palustris), coal tit (Periparus ater), or Eurasian nuthatch (Sitta europaea). An edge connects two birds whose nest-box exploration patterns overlapped on the same day.

6 Numerical results

6.1 Addition or removal of a single edge

We examine the accuracy at which our perturbation theory describes the change in (b/c)(b/c)^{*} when we add or remove an edge in the given unweighted network. Before the perturbation, wij=wji=1w_{ij}=w_{ji}=1 if there exists an edge between the iith and jjth nodes, and wij=wji=0w_{ij}=w_{ji}=0 otherwise. In the case of edge addition, we add an edge with weight ε\varepsilon between a pair of nodes (i0,j0)(i_{0},j_{0}) without an edge in the original network unless we state otherwise, where 0<ε10<\varepsilon\leq 1. Therefore, wi0j0(=wj0i0)w_{i_{0}j_{0}}(=w_{j_{0}i_{0}}) changes from 0 to ε\varepsilon, and all the other wij{0,1}w_{ij}\in\{0,1\} values remain unchanged. The addition of an unweighted edge corresponds to ε=1\varepsilon=1. In the case of edge removal, we reduce the weight of an edge (i0,j0)(i_{0},j_{0}) in the original network by ε-\varepsilon, where 1ε<0-1\leq\varepsilon<0. Therefore, wi0j0(=wj0i0)w_{i_{0}j_{0}}(=w_{j_{0}i_{0}}) changes from 11 to 1+ε1+\varepsilon, and all the other wijw_{ij} values remain unchanged. The complete removal of an unweighted edge corresponds to ε=1\varepsilon=-1.

We are interested in whether the linear approximation to (b/c)(ε)(b/c)^{*}(\varepsilon) given by Eq. (34), i.e., Δ(b/c)\Delta(b/c)^{*}, which we call the slope, predicts the change in (b/c)(b/c)^{*} in response to the addition of a single edge, i.e., (b/c)(1)(b/c)(0)(b/c)^{*}(1)-(b/c)^{*}(0), or the removal of a single edge, i.e., (b/c)(1)(b/c)(0)(b/c)^{*}(-1)-(b/c)^{*}(0). We start by directly computing the change in (b/c)(b/c)^{*}, i.e., (b/c)(ε)(b/c)(0)(b/c)^{*}(\varepsilon)-(b/c)^{*}(0), in response to adding a new edge of weight ε\varepsilon (>0)(>0) or reducing the weight of an existing edge by changing the edge weight to 1+ε1+\varepsilon (<1)(<1) for various values of ε\varepsilon for relatively small networks. The outcome of our perturbation theory, i.e., Δ(b/c)\Delta(b/c)^{*} is equal to limε0[(b/c)(ε)(b/c)(0)]/ε\lim_{\varepsilon\to 0}\left[(b/c)^{*}(\varepsilon)-(b/c)^{*}(0)\right]/\varepsilon, where (b/c)(0)(b/c)^{*}(0) and (b/c)(ε)(b/c)^{*}(\varepsilon) are the values obtained by the direct numerical simulations.
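The slope Δ(b/c)* is thus a one-sided derivative at ε = 0. As a generic sketch, with f standing in for the map ε ↦ (b/c)*(ε) (which in practice requires solving the full linear system and is not reproduced here), this limit can be approximated by a forward difference:

```python
def slope_at_zero(f, h=1e-6):
    """Forward-difference estimate of lim_{eps -> 0} [f(eps) - f(0)] / eps."""
    return (f(h) - f(0.0)) / h

# Check with a known function: f(eps) = (1 + eps)^3 has slope 3 at eps = 0.
assert abs(slope_at_zero(lambda e: (1.0 + e) ** 3) - 3.0) < 1e-4
```

The perturbation theory computes this slope analytically, without evaluating f at any ε > 0.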

We show the relationship between (b/c)(ε)(b/c)(0)(b/c)^{*}(\varepsilon)-(b/c)^{*}(0) and ε\varepsilon when we reduce the weight of a single edge in a BA network with N=100N=100 nodes in Fig. 3(a). Each line in the figure corresponds to an edge whose weight is gradually reduced. Note that ε=0\varepsilon=0 corresponds to the original network. Figure 3(a) indicates that (b/c)(b/c)^{*} roughly monotonically decreases as we gradually decrease the edge weight (i.e., decrease ε\varepsilon from 0 to negative values) except near ε=0\varepsilon=0. For this network, the removal of any single edge (i.e., ε=1\varepsilon=-1) leads to a decrease in (b/c)(b/c)^{*}, implying that the edge removal promotes cooperation. However, we note that a small decrease in the weight of an edge in the original network (e.g., ε=0.3\varepsilon=-0.3) increases (b/c)(b/c)^{*} for some edges, making cooperation more difficult than in the original network. Figure 3(a) implies that the perturbation theory is not accurate at describing the amount of the change in (b/c)(b/c)^{*} upon the edge removal because most of the curves shown in the figures, corresponding to the different edges in the original network, are far from being linear. However, we observe that the curves with the largest values of the slope of the curve at ε=0\varepsilon=0 tend to yield the smallest values of (b/c)(b/c)^{*} at ε=1\varepsilon=-1. Therefore, the perturbation theory, which produces the slope value, is expected to be efficient at detecting the edges whose removal yields the largest decrease in (b/c)(b/c)^{*}.

We show in Fig. 3(b) the change in (b/c)(b/c)^{*} plotted against ε\varepsilon when we add a new edge with weight ε\varepsilon. Each line corresponds to a pair of nodes between which there is initially no edge. Note that ε=1\varepsilon=1 corresponds to the addition of an unweighted edge. We find that the addition of any unweighted edge increases (b/c)(b/c)^{*}, making cooperation difficult. However, in contrast to the case of edge removal, the addition of an unweighted edge (i.e., with edge weight ε=1\varepsilon=1) does not necessarily yield the largest change in (b/c)(b/c)^{*} among edges of different weights ε(0,1]\varepsilon\in(0,1]. Specifically, for many node pairs that are initially not adjacent to each other, adding an edge with an intermediate edge weight (e.g., ε0.7\varepsilon\approx 0.7) maximizes the increase in (b/c)(b/c)^{*} (see Fig. 3(b)). Another observation is that the slope of the curve at ε=0\varepsilon=0, corresponding to the perturbation theory, is apparently less predictive of the effect of adding an unweighted edge (i.e., ε=1\varepsilon=1). Specifically, Fig. 3(b) indicates that, even if the slope at ε=0\varepsilon=0 is large, (b/c)(b/c)^{*} at ε=1\varepsilon=1 can be relatively small because (b/c)(b/c)^{*} decreases as ε\varepsilon increases when ε\varepsilon is close to 1. Furthermore, the curves with the largest slopes at ε=0\varepsilon=0 do not yield the largest changes in the (b/c)(b/c)^{*} value at ε=1\varepsilon=1, which implies that the perturbation theory is expected to be inefficient at predicting the edge addition that makes the cooperation most difficult.

We find similar results for the planted 22-partition model for the gradual removal of a single edge (see Fig. 3(c)). A notable difference from the case of the BA model is that there exists one edge whose complete removal increases (b/c)(b/c)^{*}, making cooperation difficult. The two nodes forming this edge have degrees 22 and 99, which are not outstanding. Furthermore, we have confirmed by running a deterministic approximate modularity maximization algorithm [32], using function greedy_modularity_communities in NetworkX, that these two nodes belong to the same community among the four communities detected. Therefore, this particular edge appears to be an ordinary edge.

We show in Fig. 3(d) the dependence of (b/c)(b/c)^{*} on ε\varepsilon when we gradually increase the weight of an edge that is initially absent in the planted 22-partition network. The slope of the (b/c)(b/c)^{*} curve at ε=0\varepsilon=0 is apparently not strongly related to the change in (b/c)(b/c)^{*} at ε=1\varepsilon=1.

We show the results of edge removal in the dolphin network in Fig. 3(e). There are two edges, out of the 150 edges, whose removal (i.e., ε=1\varepsilon=-1) increases (b/c)(b/c)^{*}, making cooperation difficult. These two edges are formed by two nodes with degrees 2 and 5 and two other nodes with degrees 2 and 7. These degree values are not outstanding in the entire network. The four nodes belong to the same community among the four communities detected by the same approximate modularity maximization algorithm [32]. These results suggest that the two edges are apparently ordinary ones. The removal of any other edge decreases (b/c)(b/c)^{*}, enhancing cooperation. Similar to the BA model, the curves with the largest slopes at ε=0\varepsilon=0 yield the largest decreases in (b/c)(b/c)^{*} at ε=1\varepsilon=-1.

We show in Fig. 3(f) the dependence of (b/c)(b/c)^{*} on ε\varepsilon when we gradually increase the weight of an edge that is initially absent in the dolphin network. The results are similar to those for the planted 22-partition model shown in Fig. 3(d). Many curves yield a decrease in (b/c)(b/c)^{*} at ε=1\varepsilon=1, implying that the edge addition can promote cooperation, whereas the converse is the case for many other curves. The slope of the (b/c)(b/c)^{*} curve at ε=0\varepsilon=0 is apparently not strongly related to the change in (b/c)(b/c)^{*} at ε=1\varepsilon=1.

Table 1: Pearson correlation coefficient, rr, between the shift in (b/c)(b/c)^{*} obtained by direct numerical simulations and that predicted by the perturbation theory. Recall that NN is the number of nodes and |E||E| is the number of edges. A large positive value of rr upon edge addition or enhancement implies that the perturbation theory is good at predicting the outcome of adding or enhancing an edge. A large negative value of rr upon edge removal implies that the perturbation theory is good at predicting the outcome of removing an edge.
Network   N     |E|    r, edge addition   r, edge removal   r, edge enhancement
ER        100   300    −0.55              −0.87              0.99
BA        100   291    −0.36              −0.86              0.99
RP        100   306    −0.39              −0.80              0.98
LFR       100   304     0.27              −0.84              0.98
Karate     34    78     0.35              −0.88              0.99
Weaver     42   152     0.94              −0.93              0.99
Sparrow    52   516    −0.01              −0.95              1.00
Lizard     60   318     0.72              −0.93              0.99
Dolphin    62   159     0.56              −0.72              0.96

The nonlinearity in the curves shown in Fig. 3 indicates that our perturbation theory is not accurate at predicting the amount of change in (b/c)(b/c)^{*} when we completely remove or add an edge in most cases. Therefore, we turn to ask whether the slope obtained from the perturbation theory is useful at determining the edge whose removal or addition changes (b/c)(b/c)^{*} by a large amount, representing strong promotion or suppression of cooperation in networks. We show in Fig. 4(a) the relationship between the change in (b/c)(b/c)^{*} when we remove an edge from the BA network and the slope Δ(b/c)\Delta(b/c)^{*} obtained from Eq. (34). The two quantities are strongly negatively correlated (Pearson correlation coefficient r=0.86r=-0.86, sample size n=291,p<0.01n=291,p<0.01). This result indicates that the perturbation theory, which is theoretically accurate only in the vicinity of ε=0\varepsilon=0, is good at predicting the outcome of removing an edge. We show in Fig. 4(b) the change in (b/c)(b/c)^{*} when we add a new edge to the same BA network as a function of the slope, Δ(b/c)\Delta(b/c)^{*}. The change in (b/c)(b/c)^{*} is not strongly positively correlated with Δ(b/c)\Delta(b/c)^{*}, suggesting that the perturbation theory is not good at predicting the outcome of adding an edge, whereas the correlation coefficient is significant due to a large sample size (r=0.36,n=4659,p<0.01r=-0.36,n=4659,p<0.01). Note that a large positive correlation coefficient when we add an edge would imply that the perturbation theory is good at predicting the outcome of adding an edge.
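The correlation coefficients reported here and in Table 1 are ordinary Pearson correlations between the simulated change in (b/c)* and the predicted slope. A minimal self-contained implementation (equivalent to scipy.stats.pearsonr without the p-value) is:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm = x - x.mean()
    ym = y - y.mean()
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))
```

A value near −1 for edge removal means that a large positive slope Δ(b/c)* reliably identifies the edges whose actual removal lowers (b/c)* the most.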

We show in Figs. 4(c) and 4(d) the results for the same correlation analysis for the planted 22-partition model network. When one removes an existing edge, the change in (b/c)(b/c)^{*} and slope Δ(b/c)\Delta(b/c)^{*} are strongly negatively correlated (r=0.80,n=306,p<0.01r=-0.80,n=306,p<0.01; see Fig. 4(c)), which is similar to the result for the BA model shown in Fig. 4(a), suggesting that the perturbation theory is good at predicting the outcome of removing an edge. When one adds a new edge, the change in (b/c)(b/c)^{*} and slope Δ(b/c)\Delta(b/c)^{*} are weakly correlated for this network (r=0.39,n=4644,p<0.01r=-0.39,n=4644,p<0.01; see Fig. 4(d)), which is similar to the result for the BA model shown in Fig. 4(b).

We show the corresponding results for the dolphin network in Figs. 4(e) and 4(f). The change in (b/c)(b/c)^{*} and slope Δ(b/c)\Delta(b/c)^{*} are strongly negatively correlated when one removes an edge (r=0.72,n=150,p<0.01r=-0.72,n=150,p<0.01; see Fig. 4(e)) and less strongly correlated when one adds a new edge (r=0.56,n=1732,p<0.01r=0.56,n=1732,p<0.01; see Fig. 4(f)). A strongly negative correlation for the edge removal (i.e., r=0.72r=-0.72) is similar to the result for the BA model. A positive correlation for the edge addition (i.e., r=0.56r=0.56) implies that the perturbation theory is to some extent good at predicting the outcome of adding an edge.

We show in Table 1 the same relationships for the other networks. For all synthetic and empirical networks, the slope Δ(b/c)\Delta(b/c)^{*} obtained from perturbation theory is strongly negatively correlated with the change in (b/c)(b/c)^{*} when we remove an existing edge (r0.72r\leq-0.72). Therefore, the perturbation theory is effective at predicting the outcome of removing an edge across different networks. However, the correlation is strongly positive only for a small fraction of networks (i.e., r0.5r\geq 0.5 for three out of the nine networks) when we add a new edge to the network.

Refer to caption
Figure 3: Change in (b/c)(b/c)^{*} as a function of the change in the edge weight, ε\varepsilon. (a) BA model, removal of an existing edge. (b) BA model, addition of a new edge. (c) Planted 2-partition model, removal of an existing edge. (d) Planted 2-partition model, addition of a new edge. (e) Dolphin network, removal of an existing edge. (f) Dolphin network, addition of a new edge. In (a), (c), and (e), each line represents an edge in the original network. In (b), (d), and (f), each line represents a pair of nodes that is not adjacent in the original network. The line colors are only a guide to the eye.
Refer to caption
Figure 4: Change in (b/c)(b/c)^{*} when we remove or add an unweighted edge as a function of the slope Δ(b/c)\Delta(b/c)^{*} of the curves shown in Fig. 3 at ε=0\varepsilon=0. (a) BA model, removal of an existing edge. (b) BA model, addition of a new edge. (c) Planted 2-partition model, removal of an existing edge. (d) Planted 2-partition model, addition of a new edge. (e) Dolphin network, removal of an existing edge. (f) Dolphin network, addition of a new edge. Each circle in (a), (c), and (e) represents an edge in the original network. Each circle in (b), (d), and (f) represents a pair of nodes that is not adjacent to each other in the original network.

6.2 Enhancement of the weight of an existing edge

In this section, we allow weighted networks and consider an increase or decrease in the weight of an existing edge of the network. Because we effectively analyzed the case of the decrease in the edge weight in section 6.1 (i.e., by setting 1<ε<0-1<\varepsilon<0), here we only consider enhancement of the weight of an existing edge by ε\varepsilon.

We enhanced the weight of an existing edge by 0<ε10<\varepsilon\leq 1, making the edge weight 1+ε1+\varepsilon, and numerically examined (b/c)(b/c)^{*} in the altered weighted networks. We plot in Figs. 5(a), 5(c), and 5(e) the change in (b/c)(b/c)^{*} relative to the original network against ε\varepsilon for the three networks used in Figs. 3 and 4. For the BA model, increasing the weight of 74 out of the 291 existing edges from 11 to 22 (i.e., ε=1\varepsilon=1) led to an increase in (b/c)(b/c)^{*}, making cooperation more difficult, whereas the opposite is the case when one enhances the weight of any other edge (see Fig. 5(a)). This result contrasts with the case of adding a new edge to the same network, which always increases (b/c)(b/c)^{*} (see Fig. 3(b)). In the planted 2-partition model (Fig. 5(c)) and the dolphin network (Fig. 5(e)), enhancing the edge weight of 7 out of the 306 edges and 23 out of the 159 edges, respectively, led to an increase in (b/c)(b/c)^{*}. Therefore, in a majority of cases, cooperation becomes easier by enhancing the weight of a single edge, which contrasts with the results for adding a new edge to these networks (see Figs. 3(d) and 3(f)). These results altogether suggest that adding new edges and enhancing the weight of existing edges often lead to different results.

Figures 5(a), 5(c), and 5(e) also indicate that the change in (b/c)(b/c)^{*} is close to linear as a function of ε\varepsilon. Therefore, our perturbation theory should be accurate at estimating the change in (b/c)(b/c)^{*} with ε=1\varepsilon=1. To verify this prediction, we show in Figs. 5(b), 5(d), and 5(f) the relationship between the change in (b/c)(b/c)^{*} in response to changing the weight of a single edge from 11 to 22 and Δ(b/c)\Delta(b/c)^{*} obtained by the perturbation theory for the three networks. As expected, the accuracy of the perturbation theory is high. We have confirmed that a high accuracy also holds true for other networks (see the last column of Table 1). These high-accuracy results are in stark contrast to those in the case of adding a new edge, for which the accuracy of the perturbation theory is low.

Refer to caption
Figure 5: Change in (b/c)(b/c)^{*} when one enhances the weight of an existing edge. Panels (a), (c), (e): Change in (b/c)(b/c)^{*} as a function of the increase in the weight of an existing edge, ε\varepsilon. Each line represents an edge in the original network. Panels (b), (d), (f): Change in (b/c)(b/c)^{*} when we enhance an edge weight by ε=1\varepsilon=1, plotted against the slope Δ(b/c)\Delta(b/c)^{*} of the curves shown in panels (a), (c), and (e) at ε=0\varepsilon=0. Each circle represents an edge in the original network. (a) and (b): BA model. (c) and (d): Planted 2-partition model. (e) and (f): Dolphin network.

6.3 Sequential edge removal

The nonlinearity in the curves shown in Fig. 3, and the results shown in Fig. 4 and Table 1 indicate that our perturbation theory is not accurate at estimating the amount of change in (b/c)(b/c)^{*} upon an edge removal. Therefore, we turn to investigate whether our perturbation theory is good at finding edges to be sequentially removed to decrease (b/c)(b/c)^{*} by a large amount in larger networks. Denote by G0G_{0} an original network. We remove the edge with the largest Δ(b/c)\Delta(b/c)^{*}, resulting in network G1G_{1}. Then, we calculate Δ(b/c)\Delta(b/c)^{*} for each existing edge in G1G_{1} and remove the edge with the largest Δ(b/c)\Delta(b/c)^{*}, resulting in network G2G_{2}. We repeat this procedure another three times to eventually obtain network G5G_{5}, which has five fewer edges than G0G_{0}.

A simple rule of thumb to determine edges to be removed to enhance cooperation is to use the degree of nodes composing the edge. In particular, (b/c)(b/c)^{*} for the death-birth rule is small for random regular graphs with small degrees [13] and general networks with a small mean degree [15]. Therefore, we test the performance of our perturbation theory against a degree-based heuristic to remove an edge for enhancing cooperation, which we define as follows. Denote by (i,j)(i,j) the edge to be removed and by kik_{i} and kjk_{j} the degree of the iith and jjth nodes, respectively. Note that ki==1Nwi(==1Nwi)k_{i}=\sum_{\ell=1}^{N}w_{i\ell}(=\sum_{\ell=1}^{N}w_{\ell i}) for our networks, which are unweighted. For each network, we remove the edge whose ki+kjk_{i}+k_{j} is largest. After removing an edge according to this criterion, we select the edge with the largest ki+kjk_{i}+k_{j} in the reduced network and remove it. We repeat this procedure another three times to remove five edges in total. In our numerical experiments described below, we have verified that the selected edges are always the same if the score for the edge is defined by kikjk_{i}k_{j} instead of ki+kjk_{i}+k_{j}.
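Both sequential procedures share the same greedy loop and differ only in the score assigned to each edge. A sketch with NetworkX, instantiated with the degree-sum score (the perturbative variant would instead score each edge by Δ(b/c)*, whose computation is not reproduced here); the karate club network below is just an example input:

```python
import networkx as nx

def greedy_edge_removal(G, score, n_remove=5):
    """Repeatedly remove the edge maximizing score(H, i, j), recomputing
    scores on the reduced network after each removal."""
    H = G.copy()
    removed = []
    for _ in range(n_remove):
        i, j = max(H.edges(), key=lambda e: score(H, *e))
        H.remove_edge(i, j)
        removed.append((i, j))
    return H, removed

def degree_sum(H, i, j):
    """Degree-based heuristic score k_i + k_j."""
    return H.degree(i) + H.degree(j)

G = nx.karate_club_graph()
H, removed = greedy_edge_removal(G, degree_sum)
```

Note that max breaks ties arbitrarily, whereas the experiments in the text enumerate all tied choices separately.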

We carry out sequential edge removal experiments on three synthetic networks and three empirical networks. Note that the six networks are mostly larger than those used in the previous numerical simulations. For these networks, it is computationally difficult to exactly calculate (b/c)(b/c)^{*} for all possible networks with, for example, one edge being removed from the original network.

We show the change in (b/c)(b/c)^{*} relative to the original network as we sequentially remove five edges using our perturbation theory by the red lines in Fig. 6. As expected, (b/c)(b/c)^{*} decreases, corresponding to negative Δ(b/c)\Delta(b/c)^{*} values, as we remove edges one by one. We also show the result of the sequential edge removal based on the degree sum ki+kjk_{i}+k_{j} by the blue lines in the same figure. For all networks, there are multiple edges that have the same value of ki+kjk_{i}+k_{j} at least in one of the five steps to remove a single edge. In this case, we calculated Δ(b/c)\Delta(b/c)^{*} for all the possible scenarios of removing one of the edges that maximize ki+kjk_{i}+k_{j} in each step of edge removal. This is why we have obtained multiple blue lines in the figure. In all cases, (b/c)(b/c)^{*} decreases as we sequentially remove edges with the largest ki+kjk_{i}+k_{j} value. Figure 6 indicates that the edge removal based on our perturbation theory results in a larger decrease in (b/c)(b/c)^{*} than that based on ki+kjk_{i}+k_{j} for all the networks. To be quantitative, we measured the decrease in (b/c)(b/c)^{*} after the removal of five edges compared to the original network with the perturbation theory and with the degree sum. The former was larger than the average of the latter (i.e., average of the blue lines in Fig. 6) by a factor of 1.021.02, 1.011.01, 1.021.02, 1.051.05, 1.021.02, and 1.021.02 for the ER random graph (Fig. 6(a)), BA model (Fig. 6(b)), planted 2-partition network (Fig. 6(c)), lizard network (Fig. 6(d)), email network (Fig. 6(e)), and bird network (Fig. 6(f)), respectively.

Refer to caption
Figure 6: Changes in (b/c)(b/c)^{*} upon sequential removal of five edges. (a) ER random graph with 300 nodes and 900 edges. (b) BA model network with 300 nodes and 891 edges. (c) Planted 2-partition network with 300 nodes and 939 edges. (d) Lizard network with 60 nodes and 318 edges. (e) Email network with 167 nodes and 3251 edges. (f) Bird network with 202 nodes and 11900 edges. The red lines represent the edge removal according to the perturbation theory. The blue lines represent the edge removal according to the rank of the degree sum.

7 Conclusions

To determine (b/c)(b/c)^{*} for an arbitrary network, one needs to solve a system of N2N^{2} linear equations such that the time complexity is O(N6)O(N^{6}). With the Coppersmith-Winograd algorithm, the time complexity is reduced to O(N4.75)O(N^{4.75}), but this is still large (see section 4). In particular, it is computationally costly to carry out graph surgery with various possible edges to be added or removed to compare the results in terms of (b/c)(b/c)^{*}. Therefore, we have developed a perturbation theory for graph surgery with which we can evaluate the perturbed (b/c)(b/c)^{*} in O(N3)O(N^{3}) time. We have verified that the first-order term Δ(b/c)\Delta(b/c)^{*} obtained from our perturbation theory predicts the rank of the change in (b/c)(b/c)^{*} when one removes an edge from the network with a high accuracy. Specifically, we have numerically shown that the edge with the largest Δ(b/c)\Delta(b/c)^{*} value is the one whose actual removal decreases (b/c)(b/c)^{*} by the largest amount in two out of the three networks (see Figs. 4(a), 4(c), and 4(e)). Therefore, we conclude that our perturbation theory is useful for finding the edge whose removal efficiently enhances cooperation in the given network with a reduced computational cost.

We focused on the death-birth process because it tends to foster cooperation compared to other rules of strategy updating [13, 11]. However, it is straightforward to formulate similar perturbation methods in the case of other updating rules such as the birth-death process [18, 13] and the pairwise comparison rule [33, 34, 16, 35] as well as in the case of other payoff matrices. In particular, our theory should be applicable to the case of constant selection [18, 36], with which the payoff matrix is independent of the opponent’s action. The perturbation theory may be more accurate for other update rules or games than the combination of the death-birth rule and the prisoner’s dilemma game examined in the present study. Exploitation of our perturbation approach in these directions is left for future work.

Another direction of future work is interaction between the selection strength and network perturbation. In the present work, we have assumed the weak selection limit. However, one can retain a selection strength parameter (which is η\eta in this article) to be finite and write down a formal solution. Then, it may be interesting to consider the simultaneous limit of weak selection η0\eta\to 0 and weak network perturbation ε0\varepsilon\to 0 in such a way that η\eta and ε\varepsilon are interrelated. Apart from this research direction, assessing the validity of the present perturbation theory under strong selection is left for future work. To this end, we first need to understand the accuracy of the original theory of fixation of cooperation in networks [15], which our theory is based on, under strong selection.

We do not know why the perturbation theory is more accurate when one removes an edge than when one adds an edge. Furthermore, we have found that the perturbation theory is fairly accurate at predicting the result for adding a parallel edge where an edge already exists, whereas it is not accurate when adding an edge where an edge does not exist in the original network. In a related vein, we observed nonmonotonic behavior in the cooperativity in terms of (b/c)(b/c)^{*} especially when we gradually added a weighted edge (Figs. 3(b) and 3(f)). These results lead us to hypothesize that we can engineer networks that promote cooperation better by considering weighted networks than unweighted networks. These topics also warrant future work.

Data accessibility

The empirical network data sets are open resources and available at [37, 38, 39, 40, 41, 42, 43]. The Python codes and synthetic network data sets used in the present study are available on GitHub [44].

Acknowledgments

N.M. acknowledges support from AFOSR European Office (under Grant No. FA9550-19-1-7024), the Sumitomo Foundation, the Japan Science and Technology Agency (JST) Moonshot R&D (under Grant No. JPMJMS2021), and the National Science Foundation (under Grant Nos. 2052720 and 2204936).

Appendix A Derivation of Eq. (16)

We restate the stationary probability πi(0)\pi_{i}(0), given by Eq. (1), as

πi(0)=si(0)S(0),\displaystyle\pi_{i}(0)=\frac{s_{i}(0)}{S(0)}, (36)

where si(0)s_{i}(0) is the original weighted degree of the iith node, and S(0)S(0), given by Eq. (14), is the original total edge weight of the network. If one perturbs the weight of an edge (i0,j0)(i_{0},j_{0}) by ε\varepsilon, the weighted degree of the iith node, si(ε)s_{i}(\varepsilon), is equal to si(0)+δii0ε+δij0εs_{i}(0)+\delta_{ii_{0}}\varepsilon+\delta_{ij_{0}}\varepsilon. After the perturbation, the total edge weight of the network changes to S(ε)=S(0)+2εS(\varepsilon)=S(0)+2\varepsilon. Therefore, the stationary probability after the perturbation, πi(ε)\pi_{i}(\varepsilon), is given by

πi(ε)=si(0)+δii0ε+δij0εS(0)+2ε.\displaystyle\pi_{i}(\varepsilon)=\frac{s_{i}(0)+\delta_{ii_{0}}\varepsilon+\delta_{ij_{0}}\varepsilon}{S(0)+2\varepsilon}. (37)

The first-order derivative of πi(ε)\pi_{i}(\varepsilon) with respect to ε\varepsilon, denoted by Δπi\Delta\pi_{i}, is given by

Δπi(ε)=(δii0+δij0)[S(0)+2ε][si(0)+δii0ε+δij0ε]×2[S(0)+2ε]2.\displaystyle\Delta\pi_{i}(\varepsilon)=\frac{(\delta_{ii_{0}}+\delta_{ij_{0}})\left[S(0)+2\varepsilon\right]-\left[s_{i}(0)+\delta_{ii_{0}}\varepsilon+\delta_{ij_{0}}\varepsilon\right]\times 2}{\left[S(0)+2\varepsilon\right]^{2}}. (38)

By setting ε=0\varepsilon=0 in Eq. (38), we obtain

Δπi(0)\displaystyle\Delta\pi_{i}(0) =S(0)(δii0+δij0)2si(0)S(0)2\displaystyle=\frac{S(0)(\delta_{ii_{0}}+\delta_{ij_{0}})-2s_{i}(0)}{S(0)^{2}}
=δii0+δij0S(0)2πi(0)S(0).\displaystyle=\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{S(0)}-\frac{2\pi_{i}(0)}{S(0)}. (39)

We wrote SS instead of S(0)S(0) in Eq. (16) for brevity.
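As a sanity check on Eq. (39), one can compare the first-order prediction with a finite-difference estimate on a small network. The following sketch is illustrative only: the four-node weight matrix and the perturbed edge are arbitrary hypothetical choices (nodes are 0-indexed in the code), and S is computed as the sum of the weighted degrees, consistent with S(ε) = S(0) + 2ε.

```python
import numpy as np

# Hypothetical four-node weighted network (symmetric weight matrix W).
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.5, 0.5],
              [2.0, 1.5, 0.0, 1.0],
              [0.0, 0.5, 1.0, 0.0]])
N = W.shape[0]
i0, j0 = 1, 2        # perturbed edge (nodes are 0-indexed here)
eps = 1e-6           # small perturbation of the edge weight

s = W.sum(axis=1)    # weighted degrees s_i(0)
S = s.sum()          # S(0); note S(eps) = S(0) + 2 * eps
pi0 = s / S          # stationary probabilities pi_i(0), Eq. (36)

# Stationary probabilities after the perturbation, Eq. (37)
W_eps = W.copy()
W_eps[i0, j0] += eps
W_eps[j0, i0] += eps
s_eps = W_eps.sum(axis=1)
pi_eps = s_eps / s_eps.sum()

# First-order prediction, Eq. (39)
delta = np.zeros(N)
delta[[i0, j0]] = 1.0             # delta_{i i0} + delta_{i j0}
dpi = delta / S - 2.0 * pi0 / S

fd = (pi_eps - pi0) / eps         # finite-difference estimate of the derivative
print(np.max(np.abs(fd - dpi)))   # discrepancy is O(eps)
```

Because the πi(ε) sum to one for every ε, the entries of the derivative vector must sum to zero, which the formula in Eq. (39) reproduces.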

Appendix B Derivation of Eqs. (21), (22), and (23)

B.1 Derivation of Eq. (21)

Let pij(ε)p_{ij}(\varepsilon) be the transition probability of the random walk from node ii to node jj on the network after perturbing the weight of edge (i0,j0)(i_{0},j_{0}) by ε\varepsilon. Note that pij(0)p_{ij}(0) is the transition probability on the original network. By definition, we have

pij(ε)=wij(ε)si(ε),\displaystyle p_{ij}(\varepsilon)=\frac{w_{ij}(\varepsilon)}{s_{i}(\varepsilon)}, (40)

where wij(ε)w_{ij}(\varepsilon) represents the weight of edge (i,j)(i,j) after the same perturbation, and we recall that si(ε)s_{i}(\varepsilon) is the weighted degree of the iith node after the perturbation. In Appendix A, we showed that

si(ε)=si(0)+δii0ε+δij0ε.\displaystyle s_{i}(\varepsilon)=s_{i}(0)+\delta_{ii_{0}}\varepsilon+\delta_{ij_{0}}\varepsilon. (41)

We can also verify that

wij(ε)=wij(0)+εχi0j0(i,j),\displaystyle w_{ij}(\varepsilon)=w_{ij}(0)+\varepsilon\chi_{i_{0}j_{0}}(i,j), (42)

where χi0j0(i,j)\chi_{i_{0}j_{0}}(i,j), defined by Eq. (17), is the indicator function, which is equal to 11 if and only if edge (i,j)(i,j) is the perturbed edge (i0,j0)(i_{0},j_{0}); it is equal to 0 otherwise. By substituting Eqs. (41) and (42) in Eq. (40), we obtain

pij(ε)=wij(0)+εχi0j0(i,j)si(0)+δii0ε+δij0ε.\displaystyle p_{ij}(\varepsilon)=\frac{w_{ij}(0)+\varepsilon\chi_{i_{0}j_{0}}(i,j)}{s_{i}(0)+\delta_{ii_{0}}\varepsilon+\delta_{ij_{0}}\varepsilon}. (43)

By taking the first-order derivative of Eq. (43) with respect to ε\varepsilon, we obtain

pij(ε)=χi0j0(i,j)[si(0)+δii0ε+δij0ε][wij(0)+εχi0j0(i,j)](δii0+δij0)[si(0)+δii0ε+δij0ε]2,\displaystyle p_{ij}^{\prime}(\varepsilon)=\frac{\chi_{i_{0}j_{0}}(i,j)\left[s_{i}(0)+\delta_{ii_{0}}\varepsilon+\delta_{ij_{0}}\varepsilon\right]-\left[w_{ij}(0)+\varepsilon\chi_{i_{0}j_{0}}(i,j)\right](\delta_{ii_{0}}+\delta_{ij_{0}})}{\left[s_{i}(0)+\delta_{ii_{0}}\varepsilon+\delta_{ij_{0}}\varepsilon\right]^{2}}, (44)

which leads to

pij(0)\displaystyle p_{ij}^{\prime}(0) =χi0j0(i,j)si(0)wij(0)(δii0+δij0)[si(0)]2\displaystyle=\frac{\chi_{i_{0}j_{0}}(i,j)s_{i}(0)-w_{ij}(0)(\delta_{ii_{0}}+\delta_{ij_{0}})}{\left[s_{i}(0)\right]^{2}}
=χi0j0(i,j)sipij(0)δii0+δij0si.\displaystyle=\frac{\chi_{i_{0}j_{0}}(i,j)}{s_{i}}-p_{ij}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}. (45)

We omitted the argument of si(0)s_{i}(0) in the last line of Eq. (45) because there is no ambiguity. This completes the proof of Eq. (21). Note that pij(0)p_{ij}^{\prime}(0) is equivalent to θij(1)\theta^{(1)}_{ij} in Eq. (21). We chose a different notation Θ(1)\Theta^{(1)} in the main text to avoid confusion between the power of matrix PP and the derivative of PP.
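Equation (45) admits the same kind of numerical check: build Θ(1) entrywise from the formula and compare it with a finite-difference derivative of the transition matrix. The sketch below uses the same hypothetical four-node toy network (0-indexed nodes) purely for illustration.

```python
import numpy as np

# Hypothetical four-node weighted network; nodes are 0-indexed.
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.5, 0.5],
              [2.0, 1.5, 0.0, 1.0],
              [0.0, 0.5, 1.0, 0.0]])
N = W.shape[0]
i0, j0 = 1, 2
eps = 1e-6

s = W.sum(axis=1)                  # weighted degrees s_i
P0 = W / s[:, None]                # transition matrix P(0), Eq. (40)

# Indicator chi_{i0 j0}(i, j) and delta_{i i0} + delta_{i j0}
chi = np.zeros((N, N))
chi[i0, j0] = chi[j0, i0] = 1.0
delta = np.zeros(N)
delta[[i0, j0]] = 1.0

# Theta^(1) entrywise from Eq. (45)
Theta1 = chi / s[:, None] - P0 * (delta / s)[:, None]

# Finite-difference estimate of P'(0)
W_eps = W.copy()
W_eps[i0, j0] += eps
W_eps[j0, i0] += eps
P_eps = W_eps / W_eps.sum(axis=1, keepdims=True)
fd = (P_eps - P0) / eps
print(np.max(np.abs(fd - Theta1)))  # discrepancy is O(eps)
```

Because each row of P(ε) sums to one for every ε, each row of Θ(1) must sum to zero, which the entrywise formula reproduces.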

B.2 Derivation of Eq. (22)

We have shown that

P(ε)=P(0)+εΘ(1)+o(ε),\displaystyle P(\varepsilon)=P(0)+\varepsilon\Theta^{(1)}+o(\varepsilon), (46)

where Θ(1)=(θij(1))=(pij(0))\Theta^{(1)}=(\theta^{(1)}_{ij})=(p_{ij}^{\prime}(0)). By squaring Eq. (46), we obtain

P2(ε)=P2(0)+ε[Θ(1)P(0)+P(0)Θ(1)]+o(ε).\displaystyle P^{2}(\varepsilon)=P^{2}(0)+\varepsilon\left[\Theta^{(1)}P(0)+P(0)\Theta^{(1)}\right]+o(\varepsilon). (47)

Let us evaluate matrix Θ(1)P(0)\Theta^{(1)}P(0) first. We denote by aija_{ij} the (i,j)(i,j) entry of Θ(1)P(0)\Theta^{(1)}P(0). By substituting Eq. (21) in

aij=k=1Nθik(1)pkj(0),\displaystyle a_{ij}=\sum_{k=1}^{N}\theta^{(1)}_{ik}p_{kj}(0), (48)

we obtain

aij\displaystyle a_{ij} =k=1N[χi0j0(i,k)sipik(0)δii0+δij0si]pkj(0)\displaystyle=\sum_{k=1}^{N}\left[\frac{\chi_{i_{0}j_{0}}(i,k)}{s_{i}}-p_{ik}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}\right]p_{kj}(0)
=k=1Nχi0j0(i,k)pkj(0)sik=1Npik(0)pkj(0)δii0+δij0si\displaystyle=\sum_{k=1}^{N}\frac{\chi_{i_{0}j_{0}}(i,k)p_{kj}(0)}{s_{i}}-\sum_{k=1}^{N}p_{ik}(0)p_{kj}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}
=k=1Nχi0j0(i,k)pkj(0)sipij(2)(0)δii0+δij0si,\displaystyle=\sum_{k=1}^{N}\frac{\chi_{i_{0}j_{0}}(i,k)p_{kj}(0)}{s_{i}}-p_{ij}^{(2)}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}, (49)

where pij(2)(0)p_{ij}^{(2)}(0) is the transition probability of the random walk from node ii to node jj in two time steps. Note that χi0j0(i,k)=1\chi_{i_{0}j_{0}}(i,k)=1 if and only if (i,k)=(i0,j0)(i,k)=(i_{0},j_{0}) or (i,k)=(j0,i0)(i,k)=(j_{0},i_{0}). If i=i0i=i_{0}, then k=j0k=j_{0} must hold true for the term χi0j0(i,k)pkj(0)\chi_{i_{0}j_{0}}(i,k)p_{kj}(0) not to vanish. Similarly, if i=j0i=j_{0}, then k=i0k=i_{0} must hold true for χi0j0(i,k)pkj(0)\chi_{i_{0}j_{0}}(i,k)p_{kj}(0) not to vanish. If ii0,j0i\neq i_{0},j_{0}, then we obtain χi0j0(i,k)pkj(0)=0\chi_{i_{0}j_{0}}(i,k)p_{kj}(0)=0 k\forall k. Therefore, we can simplify Eq. (49) into

aij\displaystyle a_{ij} =δii0pj0j(0)si+δij0pi0j(0)sipij(2)(0)δii0+δij0si.\displaystyle=\frac{\delta_{ii_{0}}p_{j_{0}j}(0)}{s_{i}}+\frac{\delta_{ij_{0}}p_{i_{0}j}(0)}{s_{i}}-p_{ij}^{(2)}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}. (50)

Similarly, by substituting Eq. (21) in

bijk=1Npik(0)θkj(1),\displaystyle b_{ij}\equiv\sum_{k=1}^{N}p_{ik}(0)\theta^{(1)}_{kj}, (51)

we obtain

bij\displaystyle b_{ij} =k=1Npik(0)[χi0j0(k,j)skpkj(0)δki0+δkj0sk]\displaystyle=\sum_{k=1}^{N}p_{ik}(0)\left[\frac{\chi_{i_{0}j_{0}}(k,j)}{s_{k}}-p_{kj}(0)\frac{\delta_{ki_{0}}+\delta_{kj_{0}}}{s_{k}}\right]
=k=1Nχi0j0(k,j)pik(0)skk=1Npik(0)pkj(0)δki0+δkj0sk\displaystyle=\sum_{k=1}^{N}\frac{\chi_{i_{0}j_{0}}(k,j)p_{ik}(0)}{s_{k}}-\sum_{k=1}^{N}p_{ik}(0)p_{kj}(0)\frac{\delta_{ki_{0}}+\delta_{kj_{0}}}{s_{k}}
=δjj0pii0(0)si0+δji0pij0(0)sj0pii0(0)pi0j(0)1si0pij0(0)pj0j(0)1sj0.\displaystyle=\frac{\delta_{jj_{0}}p_{ii_{0}}(0)}{s_{i_{0}}}+\frac{\delta_{ji_{0}}p_{ij_{0}}(0)}{s_{j_{0}}}-p_{ii_{0}}(0)p_{i_{0}j}(0)\frac{1}{s_{i_{0}}}-p_{ij_{0}}(0)p_{j_{0}j}(0)\frac{1}{s_{j_{0}}}. (52)

Because θij(2)=aij+bij\theta^{(2)}_{ij}=a_{ij}+b_{ij}, we have proved Eq. (22).
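The closed forms in Eqs. (50) and (52) can likewise be verified numerically: the matrix with entries a_ij should equal Θ(1)P(0), the matrix with entries b_ij should equal P(0)Θ(1), and their sum should match a finite-difference derivative of P²(ε). A sketch under the same hypothetical toy-network assumptions (0-indexed nodes):

```python
import numpy as np

# Hypothetical four-node weighted network; nodes are 0-indexed.
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.5, 0.5],
              [2.0, 1.5, 0.0, 1.0],
              [0.0, 0.5, 1.0, 0.0]])
N = W.shape[0]
i0, j0 = 1, 2
eps = 1e-6

s = W.sum(axis=1)
P0 = W / s[:, None]
P2 = P0 @ P0                       # two-step transition probabilities p_ij^(2)(0)

chi = np.zeros((N, N))
chi[i0, j0] = chi[j0, i0] = 1.0
delta = np.zeros(N)
delta[[i0, j0]] = 1.0
Theta1 = chi / s[:, None] - P0 * (delta / s)[:, None]   # Eq. (45)

# a_ij from Eq. (50)
a = np.zeros((N, N))
a[i0] += P0[j0] / s[i0]
a[j0] += P0[i0] / s[j0]
a -= P2 * (delta / s)[:, None]

# b_ij from Eq. (52)
b = np.zeros((N, N))
b[:, j0] += P0[:, i0] / s[i0]
b[:, i0] += P0[:, j0] / s[j0]
b -= np.outer(P0[:, i0], P0[i0]) / s[i0]
b -= np.outer(P0[:, j0], P0[j0]) / s[j0]

Theta2 = a + b                     # Eq. (22)

# Finite-difference derivative of P^2(eps)
W_eps = W.copy()
W_eps[i0, j0] += eps
W_eps[j0, i0] += eps
P_eps = W_eps / W_eps.sum(axis=1, keepdims=True)
fd2 = (P_eps @ P_eps - P2) / eps
print(np.max(np.abs(fd2 - Theta2)))  # discrepancy is O(eps)
```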

B.3 Derivation of Eq. (23)

By matrix multiplication, we obtain

Θ(3)=Θ(1)P2(0)+P(0)Θ(1)P(0)+P2(0)Θ(1).\displaystyle\Theta^{(3)}=\Theta^{(1)}P^{2}(0)+P(0)\Theta^{(1)}P(0)+P^{2}(0)\Theta^{(1)}. (53)

Let cijc_{ij}, dijd_{ij}, and eije_{ij} be the (i,j)(i,j) entry of matrices Θ(1)P2(0)\Theta^{(1)}P^{2}(0), P(0)Θ(1)P(0)P(0)\Theta^{(1)}P(0), and P2(0)Θ(1)P^{2}(0)\Theta^{(1)}, respectively.

Using Eq. (21), we obtain

cij\displaystyle c_{ij} =k=1Nθik(1)pkj(2)(0)\displaystyle=\sum_{k=1}^{N}\theta^{(1)}_{ik}p^{(2)}_{kj}(0)
=k=1N[χi0j0(i,k)sipik(0)δii0+δij0si]pkj(2)(0)\displaystyle=\sum_{k=1}^{N}\left[\frac{\chi_{i_{0}j_{0}}(i,k)}{s_{i}}-p_{ik}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}\right]p^{(2)}_{kj}(0)
=k=1Nχi0j0(i,k)pkj(2)(0)sik=1Npik(0)pkj(2)(0)δii0+δij0si\displaystyle=\sum_{k=1}^{N}\frac{\chi_{i_{0}j_{0}}(i,k)p^{(2)}_{kj}(0)}{s_{i}}-\sum_{k=1}^{N}p_{ik}(0)p^{(2)}_{kj}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}
=δii0si0pj0j(2)(0)+δij0sj0pi0j(2)(0)pij(3)(0)δii0+δij0si.\displaystyle=\frac{\delta_{ii_{0}}}{s_{i_{0}}}p^{(2)}_{j_{0}j}(0)+\frac{\delta_{ij_{0}}}{s_{j_{0}}}p^{(2)}_{i_{0}j}(0)-p^{(3)}_{ij}(0)\frac{\delta_{ii_{0}}+\delta_{ij_{0}}}{s_{i}}. (54)

Similarly, we obtain

dij\displaystyle d_{ij} =k=1Nbikpkj(0)\displaystyle=\sum_{k=1}^{N}b_{ik}p_{kj}(0)
=k=1N[δkj0pii0(0)si0+δki0pij0(0)sj0pii0(0)pi0k(0)1si0pij0(0)pj0k(0)1sj0]pkj(0)\displaystyle=\sum_{k=1}^{N}\left[\frac{\delta_{kj_{0}}p_{ii_{0}}(0)}{s_{i_{0}}}+\frac{\delta_{ki_{0}}p_{ij_{0}}(0)}{s_{j_{0}}}-p_{ii_{0}}(0)p_{i_{0}k}(0)\frac{1}{s_{i_{0}}}-p_{ij_{0}}(0)p_{j_{0}k}(0)\frac{1}{s_{j_{0}}}\right]p_{kj}(0)
=pii0(0)pj0j(0)si0+pij0(0)pi0j(0)sj0pii0(0)pi0j(2)(0)si0pij0(0)pj0j(2)(0)sj0\displaystyle=\frac{p_{ii_{0}}(0)p_{j_{0}j}(0)}{s_{i_{0}}}+\frac{p_{ij_{0}}(0)p_{i_{0}j}(0)}{s_{j_{0}}}-\frac{p_{ii_{0}}(0)p_{i_{0}j}^{(2)}(0)}{s_{i_{0}}}-\frac{p_{ij_{0}}(0)p_{j_{0}j}^{(2)}(0)}{s_{j_{0}}} (55)

and

eij\displaystyle e_{ij} =k=1Npik(2)(0)θkj(1)\displaystyle=\sum_{k=1}^{N}p_{ik}^{(2)}(0)\theta_{kj}^{(1)}
=k=1Npik(2)(0)[χi0j0(k,j)skpkj(0)δki0+δkj0sk]\displaystyle=\sum_{k=1}^{N}p_{ik}^{(2)}(0)\left[\frac{\chi_{i_{0}j_{0}}(k,j)}{s_{k}}-p_{kj}(0)\frac{\delta_{ki_{0}}+\delta_{kj_{0}}}{s_{k}}\right]
=δjj0si0pii0(2)(0)+δji0sj0pij0(2)(0)pii0(2)(0)pi0j(0)1si0pij0(2)(0)pj0j(0)1sj0.\displaystyle=\frac{\delta_{jj_{0}}}{s_{i_{0}}}p^{(2)}_{ii_{0}}(0)+\frac{\delta_{ji_{0}}}{s_{j_{0}}}p^{(2)}_{ij_{0}}(0)-p^{(2)}_{ii_{0}}(0)p_{i_{0}j}(0)\frac{1}{s_{i_{0}}}-p^{(2)}_{ij_{0}}(0)p_{j_{0}j}(0)\frac{1}{s_{j_{0}}}. (56)

Because θij(3)=cij+dij+eij\theta^{(3)}_{ij}=c_{ij}+d_{ij}+e_{ij}, we have proved Eq. (23).
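The same check applies to Eqs. (54), (55), and (56): the matrices with entries c_ij, d_ij, and e_ij, built from the closed forms, should sum to Θ(3) = Θ(1)P²(0) + P(0)Θ(1)P(0) + P²(0)Θ(1) and match a finite-difference derivative of P³(ε). Again a sketch on a hypothetical toy network (0-indexed nodes):

```python
import numpy as np

# Hypothetical four-node weighted network; nodes are 0-indexed.
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.5, 0.5],
              [2.0, 1.5, 0.0, 1.0],
              [0.0, 0.5, 1.0, 0.0]])
N = W.shape[0]
i0, j0 = 1, 2
eps = 1e-6

s = W.sum(axis=1)
P0 = W / s[:, None]
P2 = P0 @ P0
P3 = P2 @ P0

chi = np.zeros((N, N))
chi[i0, j0] = chi[j0, i0] = 1.0
delta = np.zeros(N)
delta[[i0, j0]] = 1.0
Theta1 = chi / s[:, None] - P0 * (delta / s)[:, None]   # Eq. (45)

# c_ij from Eq. (54)
c = np.zeros((N, N))
c[i0] += P2[j0] / s[i0]
c[j0] += P2[i0] / s[j0]
c -= P3 * (delta / s)[:, None]

# d_ij from Eq. (55)
d = (np.outer(P0[:, i0], P0[j0]) - np.outer(P0[:, i0], P2[i0])) / s[i0] \
    + (np.outer(P0[:, j0], P0[i0]) - np.outer(P0[:, j0], P2[j0])) / s[j0]

# e_ij from Eq. (56)
e = np.zeros((N, N))
e[:, j0] += P2[:, i0] / s[i0]
e[:, i0] += P2[:, j0] / s[j0]
e -= np.outer(P2[:, i0], P0[i0]) / s[i0]
e -= np.outer(P2[:, j0], P0[j0]) / s[j0]

Theta3 = c + d + e                 # Eq. (23)

# Finite-difference derivative of P^3(eps)
W_eps = W.copy()
W_eps[i0, j0] += eps
W_eps[j0, i0] += eps
P_eps = W_eps / W_eps.sum(axis=1, keepdims=True)
fd3 = (P_eps @ P_eps @ P_eps - P3) / eps
print(np.max(np.abs(fd3 - Theta3)))  # discrepancy is O(eps)
```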

Appendix C Derivation of ΔM\Delta M

In this section, we derive ΔM\Delta M as a block matrix

ΔM=(Δ11Δ12Δ1NΔ21Δ22Δ2NΔN1ΔN2ΔNN),\displaystyle\Delta M=\begin{pmatrix}\Delta_{11}&\Delta_{12}&\cdots&\Delta_{1N}\\ \Delta_{21}&\Delta_{22}&\cdots&\Delta_{2N}\\ \vdots&\vdots&\ddots&\vdots\\ \Delta_{N1}&\Delta_{N2}&\cdots&\Delta_{NN}\end{pmatrix}, (57)

where each Δij\Delta_{ij} is an N×NN\times N matrix.

The iith row of the diagonal block Δii\Delta_{ii} is filled with zeros, and all the other rows are the same as those of matrix 12Θ(1)-\frac{1}{2}\Theta^{(1)}. For example, we obtain

Δ22=12(θ11(1)θ12(1)θ1N(1)000θ31(1)θ32(1)θ3N(1)θN1(1)θN2(1)θNN(1)).\displaystyle\Delta_{22}=-\frac{1}{2}\begin{pmatrix}\theta^{(1)}_{11}&\theta^{(1)}_{12}&\cdots&\theta^{(1)}_{1N}\\ 0&0&\cdots&0\\ \theta^{(1)}_{31}&\theta^{(1)}_{32}&\cdots&\theta^{(1)}_{3N}\\ \vdots&\vdots&\ddots&\vdots\\ \theta^{(1)}_{N1}&\theta^{(1)}_{N2}&\cdots&\theta^{(1)}_{NN}\end{pmatrix}. (58)

For iji\neq j, the jjth row of Δij\Delta_{ij} is equal to the iith row of 12Θ(1)-\frac{1}{2}\Theta^{(1)}, and all the other rows are filled with zeros. For example, we obtain

Δ21=12(θ21(1)θ22(1)θ2N(1)000000)\displaystyle\Delta_{21}=-\frac{1}{2}\begin{pmatrix}\theta^{(1)}_{21}&\theta^{(1)}_{22}&\cdots&\theta^{(1)}_{2N}\\ 0&0&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&0\\ \end{pmatrix} (59)

and

Δ23=12(000000θ21(1)θ22(1)θ2N(1)000000).\displaystyle\Delta_{23}=-\frac{1}{2}\begin{pmatrix}0&0&\cdots&0\\ 0&0&\cdots&0\\ \theta^{(1)}_{21}&\theta^{(1)}_{22}&\cdots&\theta^{(1)}_{2N}\\ 0&0&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&0\end{pmatrix}. (60)

Equation (21) implies that only the i0i_{0}th and j0j_{0}th rows of Θ(1)\Theta^{(1)} may be nonzero. Owing to this property, there are only 3N23N-2 nonzero matrix blocks Δij\Delta_{ij} out of the N2N^{2} blocks of ΔM\Delta M, which we show using an example as follows. Assume that we perturb edge (i0,j0)=(4,7)(i_{0},j_{0})=(4,7). Then, the fourth and seventh rows are the only nonzero rows of Θ(1)\Theta^{(1)}. Therefore, Δii\Delta_{ii}, where i{i0,j0}i\notin\{i_{0},j_{0}\}, has only the i0i_{0}th and j0j_{0}th rows nonzero. Any off-diagonal matrix block Δij\Delta_{ij}, where i{i0,j0}i\not\in\{i_{0},j_{0}\} and jij\neq i, is zero because it has (θi1(1),,θiN(1))/2-\left(\theta_{i1}^{(1)},\ldots,\theta_{iN}^{(1)}\right)/2 in the jjth row, this row is zero given that i{i0,j0}i\notin\{i_{0},j_{0}\}, and all the other rows are zero. We next consider Δi0j\Delta_{i_{0}j}, where j{1,,N}j\in\{1,\ldots,N\}. Diagonal block Δi0i0\Delta_{i_{0}i_{0}} has only the j0j_{0}th row nonzero, which is given by (θj01(1),,θj0N(1))/2-\left(\theta^{(1)}_{j_{0}1},\ldots,\theta^{(1)}_{j_{0}N}\right)/2. Off-diagonal block Δi0j\Delta_{i_{0}j}, where ji0j\neq i_{0}, has only the jjth row nonzero, which is equal to (θi01(1),,θi0N(1))/2-\left(\theta^{(1)}_{i_{0}1},\ldots,\theta^{(1)}_{i_{0}N}\right)/2. Therefore, all the blocks Δi0j\Delta_{i_{0}j} with j{1,,N}j\in\{1,\ldots,N\} are nonzero in general. Likewise, Δj0j0\Delta_{j_{0}j_{0}} has only the i0i_{0}th row nonzero, which is given by (θi01(1),,θi0N(1))/2-\left(\theta^{(1)}_{i_{0}1},\ldots,\theta^{(1)}_{i_{0}N}\right)/2. Off-diagonal block Δj0j\Delta_{j_{0}j}, where jj0j\neq j_{0}, has only the jjth row nonzero, which is equal to (θj01(1),,θj0N(1))/2-\left(\theta^{(1)}_{j_{0}1},\ldots,\theta^{(1)}_{j_{0}N}\right)/2. Therefore, all the blocks Δj0j\Delta_{j_{0}j} with j{1,,N}j\in\{1,\ldots,N\} are nonzero in general. This proves that there are (N2)+N+N=3N2(N-2)+N+N=3N-2 nonzero matrix blocks Δij\Delta_{ij}.

Furthermore, most rows of ΔM\Delta M are zero rows. Specifically, consider NN rows of ΔM\Delta M given in a block matrix form by (Δi1,,ΔiN)\left(\Delta_{i1},\ldots,\Delta_{iN}\right), where i{i0,j0}i\notin\{i_{0},j_{0}\}. As we have shown, the only non-zero block among Δi1\Delta_{i1}, \ldots, ΔiN\Delta_{iN} is Δii\Delta_{ii}, and the only nonzero rows of Δii\Delta_{ii} are the i0i_{0}th and j0j_{0}th rows. Therefore, the N2N-2 rows of (Δi1,,ΔiN)\left(\Delta_{i1},\ldots,\Delta_{iN}\right), i.e., jjth rows with j{i0,j0}j\notin\{i_{0},j_{0}\}, are zero rows. In addition, the i0i_{0}th row of (Δi01,,Δi0N)\left(\Delta_{i_{0}1},\ldots,\Delta_{i_{0}N}\right) and the j0j_{0}th row of (Δj01,,Δj0N)\left(\Delta_{j_{0}1},\ldots,\Delta_{j_{0}N}\right) are zero rows. Therefore, the number of nonzero rows of ΔM\Delta M is equal to 2×(N2)+(N1)×2=4N62\times(N-2)+(N-1)\times 2=4N-6.
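The block structure and the counts derived above (3N−2 nonzero blocks and 4N−6 nonzero rows) can be confirmed by assembling ΔM explicitly. The following sketch does so for a hypothetical four-node toy network (0-indexed nodes), for which 3N−2 = 4N−6 = 10:

```python
import numpy as np

# Hypothetical four-node weighted network; nodes are 0-indexed.
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.5, 0.5],
              [2.0, 1.5, 0.0, 1.0],
              [0.0, 0.5, 1.0, 0.0]])
N = W.shape[0]
i0, j0 = 1, 2                       # perturbed edge

s = W.sum(axis=1)
P0 = W / s[:, None]
chi = np.zeros((N, N))
chi[i0, j0] = chi[j0, i0] = 1.0
delta = np.zeros(N)
delta[[i0, j0]] = 1.0
Theta1 = chi / s[:, None] - P0 * (delta / s)[:, None]   # Eq. (45)

# Assemble Delta M block by block, following Eqs. (58)-(60)
DM = np.zeros((N * N, N * N))
for i in range(N):
    for j in range(N):
        block = np.zeros((N, N))
        if i == j:
            block = -0.5 * Theta1.copy()
            block[i, :] = 0.0       # the i-th row of Delta_ii is zero
        else:
            block[j, :] = -0.5 * Theta1[i, :]  # only the j-th row may be nonzero
        DM[i * N:(i + 1) * N, j * N:(j + 1) * N] = block

# Count nonzero blocks and nonzero rows
nonzero_blocks = sum(
    np.any(DM[i * N:(i + 1) * N, j * N:(j + 1) * N] != 0.0)
    for i in range(N) for j in range(N)
)
nonzero_rows = int(np.sum(np.any(DM != 0.0, axis=1)))
print(nonzero_blocks, nonzero_rows)   # 3N - 2 = 10 and 4N - 6 = 10 for N = 4
```

The counts rely on the fact that only the i0th and j0th rows of Θ(1) are nonzero, which holds by construction of Theta1 in the sketch.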

References

  • Trivers [1971] Trivers, R.L.: The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35–57 (1971)
  • Axelrod [1984] Axelrod, R.: The Evolution of Cooperation. Basic Books, New York (1984)
  • Hofbauer and Sigmund [1998] Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, UK (1998)
  • Nowak [2006a] Nowak, M.A.: Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006)
  • Nowak [2006b] Nowak, M.A.: Evolutionary Dynamics: Exploring the Equations of Life. Harvard University Press, Cambridge, MA (2006)
  • Henrich and Henrich [2007] Henrich, N., Henrich, J.P.: Why Humans Cooperate: A Cultural and Evolutionary Explanation. Oxford University Press, Oxford, UK (2007)
  • Sigmund [2010] Sigmund, K.: The Calculus of Selfishness. Princeton University Press, Princeton, NJ (2010)
  • Bowles and Gintis [2011] Bowles, S., Gintis, H.: A Cooperative Species. Princeton University Press, Princeton, NJ (2011)
  • Nowak and May [1992] Nowak, M.A., May, R.M.: Evolutionary games and spatial chaos. Nature 359, 826–829 (1992)
  • Nowak and May [1993] Nowak, M.A., May, R.M.: The spatial dilemmas of evolution. Int. J. Bifurcation Chaos 3, 35–78 (1993)
  • Szabó and Fath [2007] Szabó, G., Fath, G.: Evolutionary games on graphs. Phys. Rep. 446, 97–216 (2007)
  • Santos and Pacheco [2005] Santos, F.C., Pacheco, J.M.: Scale-free networks provide a unifying framework for the emergence of cooperation. Phys. Rev. Lett. 95, 098104 (2005)
  • Ohtsuki et al. [2006] Ohtsuki, H., Hauert, C., Lieberman, E., Nowak, M.A.: A simple rule for the evolution of cooperation on graphs and social networks. Nature 441, 502–505 (2006)
  • Santos et al. [2006] Santos, F.C., Pacheco, J.M., Lenaerts, T.: Evolutionary dynamics of social dilemmas in structured heterogeneous populations. Proc. Natl. Acad. Sci. U.S.A. 103, 3490–3494 (2006)
  • Allen et al. [2017] Allen, B., Lippner, G., Chen, Y.-T., Fotouhi, B., Momeni, N., Yau, S.-T., Nowak, M.A.: Evolutionary dynamics on any population structure. Nature 544, 227–230 (2017)
  • Nowak et al. [2004] Nowak, M.A., Sasaki, A., Taylor, C., Fudenberg, D.: Emergence of cooperation and evolutionary stability in finite populations. Nature 428, 646–650 (2004)
  • Ewens [2004] Ewens, W.J.: Mathematical Population Genetics: Theoretical Introduction vol. 1. Springer, New York, NY (2004)
  • Lieberman et al. [2005] Lieberman, E., Hauert, C., Nowak, M.A.: Evolutionary dynamics on graphs. Nature 433, 312–316 (2005)
  • [19] Aldous, D., Fill, J.A.: Reversible Markov chains and random walks on graphs. Unfinished monograph, recompiled 2014 (2002). Available at http://www.stat.berkeley.edu/~aldous/RWG/book.html. Accessed on August 6, 2022
  • Masuda et al. [2017] Masuda, N., Porter, M.A., Lambiotte, R.: Random walks and diffusion on networks. Phys. Rep. 716, 1–58 (2017)
  • Coppersmith and Winograd [1987] Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. In: Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, pp. 1–6 (1987)
  • Barabási and Albert [1999] Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
  • Fortunato [2010] Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)
  • Lancichinetti et al. [2008] Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110 (2008)
  • Zachary [1977] Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977)
  • van Dijk et al. [2014] van Dijk, R.E., Kaden, J.C., Argüelles-Ticó, A., Dawson, D.A., Burke, T., Hatchwell, B.J.: Cooperative investment in public goods is kin directed in communal nests of social birds. Ecol. Lett. 17, 1141–1148 (2014)
  • Arnberg et al. [2015] Arnberg, N.N., Shizuka, D., Chaine, A.S., Lyon, B.E.: Social network structure in wintering golden-crowned sparrows is not correlated with kinship. Mol. Ecol. 24, 5034–5044 (2015)
  • Bull et al. [2012] Bull, C.M., Godfrey, S., Gordon, D.M.: Social networks and the spread of salmonella in a sleepy lizard population. Mol. Ecol. 21, 4386–4392 (2012)
  • Lusseau et al. [2003] Lusseau, D., Schneider, K., Boisseau, O.J., Haase, P., Slooten, E., Dawson, S.M.: The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54, 396–405 (2003)
  • Michalski et al. [2011] Michalski, R., Palus, S., Kazienko, P.: Matching organizational structure and social network extracted from email communication. In: International Conference on Business Information Systems, pp. 197–206 (2011). Springer, Berlin, Germany
  • Firth and Sheldon [2015] Firth, J.A., Sheldon, B.C.: Experimental manipulation of avian social structure reveals segregation is carried over across contexts. Proc. R. Soc. B 282, 20142350 (2015)
  • Clauset et al. [2004] Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004)
  • Blume [1993] Blume, L.E.: The statistical mechanics of strategic interaction. Games Econ. Behav. 5, 387–424 (1993)
  • Szabó and Tőke [1998] Szabó, G., Tőke, C.: Evolutionary prisoner’s dilemma game on a square lattice. Phys. Rev. E 58, 69 (1998)
  • Traulsen et al. [2006] Traulsen, A., Nowak, M.A., Pacheco, J.M.: Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006)
  • Allen et al. [2021] Allen, B., Sample, C., Steinhagen, P., Shapiro, J., King, M., Hedspeth, T., Goncalves, M.: Fixation probabilities in graph-structured populations under weak selection. PLoS Comput. Biol. 17, e1008695 (2021)
  • [37] Karate network data set: https://networkrepository.com/karate.php organized by R. Rossi and N. Ahmed. Accessed on June 26, 2022
  • [38] Weaver network data set: https://networkrepository.com/aves-weaver-social-03.php organized by R. Rossi and N. Ahmed. Accessed on June 26, 2022
  • [39] Sparrow network data set: https://networkrepository.com/aves-sparrow-social.php organized by R. Rossi and N. Ahmed. Accessed on June 26, 2022
  • [40] Lizard network data set: https://networkrepository.com/reptilia-lizard-network-social.php organized by R. Rossi and N. Ahmed. Accessed on June 26, 2022
  • [41] Dolphin network data set: https://networkrepository.com/dolphins.php organized by R. Rossi and N. Ahmed. Accessed on June 26, 2022
  • [42] Email network data set: https://networkrepository.com/ia-radoslaw-email.php organized by R. Rossi and N. Ahmed. Accessed on June 26, 2022
  • [43] Bird network data set: https://networkrepository.com/aves-wildbird-network.php organized by R. Rossi and N. Ahmed. Accessed on June 26, 2022
  • [44] Meng, L.: Data sets and Python codes. Available at https://github.com/lingqime/Perturbation-theory-for-evolutionary-game-dynamics-on-networks. Accessed on August 13, 2022