Necessary and Sufficient Condition for the Existence of Zero-Determinant Strategies in Repeated Games

Masahiko Ueda¹ [email protected]

¹Graduate School of Sciences and Technology for Innovation, Yamaguchi University, Yamaguchi 753-8511, Japan

Abstract

Zero-determinant strategies are a class of memory-one strategies in repeated games which unilaterally enforce linear relationships between payoffs. It has long been unclear for which stage games zero-determinant strategies exist. We provide a necessary and sufficient condition for the existence of zero-determinant strategies. This condition can be interpreted as the existence of two different actions which unilaterally adjust the total value of a linear combination of payoffs. A relation between the class of stage games where zero-determinant strategies exist and other classes of stage games is also provided.

1 Introduction

Zero-determinant (ZD) strategies are a class of memory-one strategies (strategies which recall only one previous period) in repeated games which unilaterally enforce linear relationships between payoffs of players. ZD strategies were first discovered by two physicists, Press and Dyson, in the repeated prisoner’s dilemma game [1]. ZD strategies contain several counterintuitive examples, such as the equalizer strategy, which unilaterally sets the payoff of the opponent, and the extortionate strategy, which always obtains a payoff greater than or equal to that of the opponent. ZD strategies also contain the generous ZD strategy, which achieves a cooperative Nash equilibrium [2]. Since their discovery, ZD strategies have been extended mainly in two directions. The first direction extends the range of application of ZD strategies. Concretely, ZD strategies were extended to multi-player multi-action games [3, 4, 5, 6], games with a discount factor [7, 5, 8], games with imperfect monitoring [9, 10, 11, 12], and games with asynchronous update [13]. The second direction extends the ability of payoff control. The concept of ZD strategies was extended so as to control moments of payoffs [14], time correlation functions of payoffs [15], and conditional expectations of payoffs [16]. The mathematical framework of ZD strategies has also been used to classify memory-one strategies into classes such as partner strategies and rival strategies in social dilemma situations [2, 17, 7]. Furthermore, the relation between unbeatable imitation [18, 19] and ZD strategies has gradually been clarified in two-player symmetric games [20].

Although ZD strategies have been found in several stage games, such as the prisoner’s dilemma game [1], the public goods game [3, 4], the continuous donation game [5], a two-player two-action asymmetric game [21], and two-player symmetric potential games [20], a general condition for the existence of ZD strategies has not been clear. For example, it is known that ZD strategies do not exist in the rock-paper-scissors game [11]. It has been believed that the existence of ZD strategies is highly dependent on the structure of the stage game.

In this paper, we provide a necessary and sufficient condition for the existence of ZD strategies. This condition implies that, for ZD strategies to exist, the stage game must be easy to handle in some sense for players who want to use them. From another perspective, the condition defines a class of stage games in which ZD strategies exist. Such a classification of stage games may be useful in the same way as symmetric games [22], potential games [23], and generalized rock-paper-scissors games [18]. We also provide a relation between the class of stage games where ZD strategies exist and other classes of games, for the case of two-player symmetric games.

This paper is organized as follows. In section 2, we introduce repeated games and ZD strategies. In section 3, we provide our main theorem on the necessary and sufficient condition for the existence of ZD strategies. A relation between the class of stage games where ZD strategies exist and other classes of stage games is also provided in that section. Section 4 is devoted to concluding remarks.

2 Preliminaries

We consider a repeated game [24]. The set of players is described as $\mathcal{N}:=\left\{1,\cdots,N\right\}$ , where $N>1$ is the number of players. The action of player $j\in\mathcal{N}$ in the stage game is written as $\sigma_{j}\in A_{j}:=\left\{1,\cdots,M_{j}\right\}$ , where $M_{j}$ is a natural number describing the number of actions of player $j$ . We collectively write $\mathcal{A}:=\prod_{j=1}^{N}A_{j}$ and $\bm{\sigma}:=\left(\sigma_{1},\cdots,\sigma_{N}\right)\in\mathcal{A}$ . We call $\bm{\sigma}$ an action profile. The payoff of player $j$ when the action profile is $\bm{\sigma}$ is described as $s_{j}\left(\bm{\sigma}\right)$ . Therefore, the stage game is $G:=\left(\mathcal{N},\left\{A_{j}\right\}_{j\in\mathcal{N}},\left\{s_{j}\right\}_{j\in\mathcal{N}}\right)$ . We denote the probability simplex over $M$ elements by $\Delta_{M}$ . We also introduce the notation $\sigma_{-j}:=\left(\sigma_{1},\cdots,\sigma_{j-1},\sigma_{j+1},\cdots,\sigma_{N}\right)\in\prod_{k\neq j}A_{k}$ .

We repeat the stage game $G$ infinitely. We write an action of player $j$ at round $t\geq 1$ as $\sigma_{j}^{(t)}$ . The behavior strategy of player $j$ is described as $\mathcal{T}_{j}:=\left\{T^{(t)}_{j}\right\}_{t=1}^{\infty}$ , where $T^{(t)}_{j}:\mathcal{A}^{t-1}\to\Delta_{M_{j}}$ is the conditional probability at $t$ -th round. We write the expectation of the quantity $B$ with respect to strategies of all players by $\mathbb{E}[B]$ . We introduce a discounting factor $\delta$ satisfying $0\leq\delta\leq 1$ in order to discount future payoffs. The payoff of player $j$ in the infinitely repeated game is defined by

$$\mathcal{S}_{j}:=\begin{cases}(1-\delta)\sum_{t=1}^{\infty}\delta^{t-1}\mathbb{E}\left[s_{j}\left(\bm{\sigma}^{(t)}\right)\right]&\left(0\leq\delta<1\right)\\ \lim_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\left[s_{j}\left(\bm{\sigma}^{(t)}\right)\right]&\left(\delta=1\right).\end{cases} \tag{3}$$

In this paper, we consider only the case $\delta=1$ [1].

A time-independent memory-one strategy of player $j$ is defined as a strategy such that $T^{(t)}_{j}=T_{j}$ for all $t\geq 2$ and $\sigma_{j}^{(t)}$ is determined only by $\bm{\sigma}^{(t-1)}$ . For time-independent memory-one strategies $T_{j}$ of player $j$ , we introduce the Press-Dyson vectors [2, 11]

$$\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right):=T_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right)-\delta_{\sigma_{j},\sigma^{\prime}_{j}}\quad\left(\forall\sigma_{j},\forall\bm{\sigma}^{\prime}\right), \tag{4}$$

where $\delta_{\sigma,\sigma^{\prime}}$ is the Kronecker delta. The second term on the right-hand side of Eq. (4) can be regarded as a memory-one strategy (called “Repeat”) which repeats the player's own previous action, and therefore the Press-Dyson vectors are interpreted as the difference between the player's own strategy and “Repeat”. It should be noted that, due to the properties of the conditional probability $T_{j}$ , the Press-Dyson vectors satisfy several relations. First, they satisfy

$$\sum_{\sigma_{j}}\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right)=0\quad\left(\forall\bm{\sigma}^{\prime}\right) \tag{5}$$

due to the normalization condition of $T_{j}$ . Second, they satisfy

$$\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right)\begin{cases}\leq 0,&\left(\sigma_{j}=\sigma^{\prime}_{j}\right)\\ \geq 0,&\left(\sigma_{j}\neq\sigma^{\prime}_{j}\right)\end{cases} \tag{8}$$

for all $\sigma_{j}$ and all $\bm{\sigma}^{\prime}$ . Third, they satisfy

$$\left|\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right)\right|\leq 1\quad\left(\forall\sigma_{j},\forall\bm{\sigma}^{\prime}\right). \tag{9}$$

The last two properties come from the fact that $T_{j}$ takes values in $[0,1]$ .
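As a small illustration of Eq. (4) and the properties (5), (8), and (9), the following Python sketch computes the Press-Dyson vectors of a memory-one strategy and checks the three properties. The dictionary layout `T[(a, prev)]`, the 0-based action labels, the function names, and the tit-for-tat example are assumptions made only for this sketch and do not appear in the text.

```python
from itertools import product

def press_dyson(T, n_actions, j):
    """Press-Dyson vectors of player j, Eq. (4).

    T[(a, prev)] is the probability that player j plays action a after the
    previous action profile prev (a tuple of 0-based actions).
    """
    profiles = product(*(range(m) for m in n_actions))
    return {
        (a, prev): T[(a, prev)] - (1.0 if a == prev[j] else 0.0)
        for prev in profiles
        for a in range(n_actions[j])
    }

def satisfies_properties(T_hat, n_actions, j, tol=1e-12):
    """Check Eqs. (5), (8), and (9) for every previous action profile."""
    for prev in product(*(range(m) for m in n_actions)):
        column = [T_hat[(a, prev)] for a in range(n_actions[j])]
        if abs(sum(column)) > tol:                # Eq. (5): entries sum to zero
            return False
        for a, value in enumerate(column):
            if a == prev[j] and value > tol:      # Eq. (8): repeated action
                return False
            if a != prev[j] and value < -tol:     # Eq. (8): any other action
                return False
            if abs(value) > 1.0 + tol:            # Eq. (9): bounded by one
                return False
    return True

# Hypothetical example: tit-for-tat for player 0 in a two-player,
# two-action game (copy the opponent's previous action).
n_actions = (2, 2)
tit_for_tat = {
    (a, prev): 1.0 if a == prev[1] else 0.0
    for prev in product(range(2), repeat=2)
    for a in range(2)
}
tft_hat = press_dyson(tit_for_tat, n_actions, j=0)
```

Any valid conditional probability table should pass `satisfies_properties`; the check is only a numerical restatement of the three relations above.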

For simplicity, we introduce the notation $s_{0}\left(\bm{\sigma}\right)=1$ $(\forall\bm{\sigma})$ . By using the Press-Dyson vectors, we define the zero-determinant strategies.

Definition 1 ([1, 5])

A time-independent memory-one strategy of player $j$ is a zero-determinant (ZD) strategy when its Press-Dyson vectors can be written in the form

$$\sum_{\sigma_{j}}c_{\sigma_{j}}\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right)=\sum_{k=0}^{N}\alpha_{k}s_{k}\left(\bm{\sigma}^{\prime}\right)\quad\left(\forall\bm{\sigma}^{\prime}\right) \tag{10}$$

with some nontrivial coefficients $\left\{\alpha_{k}\right\}$ and $\left\{c_{\sigma_{j}}\right\}$ (that is, not $\alpha_{0}=\alpha_{1}=\cdots=\alpha_{N}=0$ , and not $c_{1}=\cdots=c_{M_{j}}=\mathrm{const.}$ ) and Eq. (10) is not zero for some $\bm{\sigma}^{\prime}$ .

In other words, in ZD strategies, a linear combination of the Press-Dyson vectors is described as a linear combination of payoff vectors and a vector of all ones. It has been known that a ZD strategy (10) unilaterally enforces a linear relation between expected payoffs [1, 2, 20]:

$$0=\sum_{k=0}^{N}\alpha_{k}\left\langle s_{k}\right\rangle^{*}, \tag{11}$$

where $\left\langle\cdots\right\rangle^{*}$ is the expectation with respect to the limit-of-means distribution

$$P^{*}\left(\bm{\sigma}\right):=\lim_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}P_{t}\left(\bm{\sigma}\right)\quad(\forall\bm{\sigma}), \tag{12}$$

and

$$P_{t}\left(\bm{\sigma}^{(t)}\right):=\sum_{\bm{\sigma}^{(t-1)}}\cdots\sum_{\bm{\sigma}^{(1)}}P\left(\bm{\sigma}^{(t)},\cdots,\bm{\sigma}^{(1)}\right) \tag{13}$$

is the marginal distribution obtained from the joint distribution of action profiles. Because $\mathcal{S}_{k}=\left\langle s_{k}\right\rangle^{*}$ $(\forall k)$ , the linear relation (11) can be interpreted as a linear relation between payoffs in the repeated game.
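For readers who want to see why a relation of the form (11) follows from (10), the standard argument can be sketched as below. This is only a sketch under the assumption that the limit in Eq. (12) exists; the marginal notation $P_{t}(\sigma_{j})$ for player $j$'s action at round $t$ is introduced here for illustration and is not used elsewhere in the text.

```latex
% Sketch only; assumes the limit in Eq. (12) exists.
% P_t(\sigma_j) denotes the marginal distribution of player j's action at round t.
\begin{align*}
P_{t+1}(\sigma_{j}) &= \sum_{\bm{\sigma}^{\prime}} T_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right) P_{t}\left(\bm{\sigma}^{\prime}\right),
\qquad
P_{t}(\sigma_{j}) = \sum_{\bm{\sigma}^{\prime}} \delta_{\sigma_{j},\sigma^{\prime}_{j}} P_{t}\left(\bm{\sigma}^{\prime}\right), \\
% Subtract the two identities and use Eq. (4):
\sum_{\bm{\sigma}^{\prime}} \hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right) P_{t}\left(\bm{\sigma}^{\prime}\right)
&= P_{t+1}(\sigma_{j}) - P_{t}(\sigma_{j}), \\
% Average over t = 1, ..., T; the right-hand side telescopes:
\frac{1}{T}\sum_{t=1}^{T}\sum_{\bm{\sigma}^{\prime}} \hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right) P_{t}\left(\bm{\sigma}^{\prime}\right)
&= \frac{P_{T+1}(\sigma_{j}) - P_{1}(\sigma_{j})}{T} \xrightarrow{T\to\infty} 0, \\
% Hence, in the limit-of-means distribution of Eq. (12):
\sum_{\bm{\sigma}^{\prime}} \hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right) P^{*}\left(\bm{\sigma}^{\prime}\right) &= 0 \quad (\forall \sigma_{j}), \\
% Multiply by c_{\sigma_j}, sum over \sigma_j, and insert Eq. (10):
0 = \sum_{\bm{\sigma}^{\prime}} \sum_{\sigma_{j}} c_{\sigma_{j}} \hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right) P^{*}\left(\bm{\sigma}^{\prime}\right)
&= \sum_{k=0}^{N} \alpha_{k} \sum_{\bm{\sigma}^{\prime}} s_{k}\left(\bm{\sigma}^{\prime}\right) P^{*}\left(\bm{\sigma}^{\prime}\right)
= \sum_{k=0}^{N} \alpha_{k} \left\langle s_{k} \right\rangle^{*}.
\end{align*}
```

The argument uses only the strategy of player $j$, which is why the relation is enforced regardless of the strategies of the other players.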

3 Results

3.1 Necessary and sufficient condition for the existence of ZD strategies

Although ZD strategies have been found in several stage games, such as the prisoner’s dilemma game [1], the public goods game [3, 4], the continuous donation game [5], a two-player two-action asymmetric game [21], and two-player symmetric potential games [20], a general condition for the existence of ZD strategies has not been clear. In this section, we provide a necessary and sufficient condition for the existence of ZD strategies.

Theorem 1

A ZD strategy of player $j$ exists if and only if there exist some nontrivial coefficients $\{\alpha_{k}\}_{k=0}^{N}$ and two different actions $\overline{\sigma}_{j},\underline{\sigma}_{j}\in A_{j}$ of player $j$ such that

$$\begin{aligned}\sum_{k=0}^{N}\alpha_{k}s_{k}\left(\overline{\sigma}_{j},\sigma_{-j}\right)&\leq 0\quad\left(\forall\sigma_{-j}\right)\\ \sum_{k=0}^{N}\alpha_{k}s_{k}\left(\underline{\sigma}_{j},\sigma_{-j}\right)&\geq 0\quad\left(\forall\sigma_{-j}\right),\end{aligned} \tag{14}$$

and $\sum_{k=0}^{N}\alpha_{k}s_{k}$ is not identically zero, for the stage game $G$ .
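The condition (14) is straightforward to test numerically once the coefficients $\{\alpha_{k}\}$ are fixed. The following Python sketch does this for a given choice of coefficients; it does not search over all coefficients, so a `None` result only says that this particular choice fails. The function names, the dictionary layout of payoffs, the 0-based action labels, and the numerical payoff values are assumptions made for illustration.

```python
from itertools import product

def linear_combination(alpha, payoffs, n_actions):
    """B(profile) = alpha[0] + sum_k alpha[k] * s_k(profile).

    payoffs[k - 1][profile] holds s_k for player k = 1, ..., N.
    """
    profiles = product(*(range(m) for m in n_actions))
    return {
        p: alpha[0] + sum(a * s[p] for a, s in zip(alpha[1:], payoffs))
        for p in profiles
    }

def find_action_pair(B, n_actions, j, tol=1e-12):
    """Return (sigma_bar, sigma_under) satisfying Eq. (14), or None."""
    if all(abs(v) <= tol for v in B.values()):
        return None  # B must not be identically zero
    actions = range(n_actions[j])
    # Actions of player j that keep B <= 0 (resp. >= 0) whatever the others do.
    nonpos = [a for a in actions
              if all(v <= tol for p, v in B.items() if p[j] == a)]
    nonneg = [a for a in actions
              if all(v >= -tol for p, v in B.items() if p[j] == a)]
    for sigma_bar in nonpos:
        for sigma_under in nonneg:
            if sigma_bar != sigma_under:
                return sigma_bar, sigma_under
    return None

# Hypothetical prisoner's dilemma payoffs (0 = cooperate, 1 = defect).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
pd_s1 = {(0, 0): R, (0, 1): S, (1, 0): T, (1, 1): P}
pd_s2 = {(0, 0): R, (0, 1): T, (1, 0): S, (1, 1): P}
B_pd = linear_combination((-2.0, 0.0, 1.0), (pd_s1, pd_s2), (2, 2))
pd_pair = find_action_pair(B_pd, (2, 2), j=0)

# Rock-paper-scissors with the single choice alpha = (0, 1, 0).
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rps_s1 = {(a, b): float(rps[a][b]) for a in range(3) for b in range(3)}
rps_s2 = {(a, b): float(rps[b][a]) for a in range(3) for b in range(3)}
B_rps = linear_combination((0.0, 1.0, 0.0), (rps_s1, rps_s2), (3, 3))
rps_pair = find_action_pair(B_rps, (3, 3), j=0)
```

With these example values, `pd_pair` identifies defection as $\overline{\sigma}_{1}$ and cooperation as $\underline{\sigma}_{1}$, while `rps_pair` is `None` for the chosen coefficients.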

Proof. (Necessity) If a ZD strategy of player $j$ exists, then the Press-Dyson vectors satisfy

$$\sum_{\sigma_{j}}c_{\sigma_{j}}\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right)=\sum_{k=0}^{N}\alpha_{k}s_{k}\left(\bm{\sigma}^{\prime}\right)\quad\left(\forall\bm{\sigma}^{\prime}\right) \tag{15}$$

with some nontrivial coefficients $\left\{\alpha_{k}\right\}$ and $\left\{c_{\sigma_{j}}\right\}$ and Eq. (15) is not identically zero. Below we write $B\left(\bm{\sigma}\right):=\sum_{k=0}^{N}\alpha_{k}s_{k}\left(\bm{\sigma}\right)$ $(\forall\bm{\sigma})$ . By using Eq. (5), this can be written as

$$B\left(\bm{\sigma}^{\prime}\right)=\sum_{\sigma_{j}}\left(c_{\sigma_{j}}-c_{\mathrm{max}}\right)\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right) \tag{16}$$

and

$$B\left(\bm{\sigma}^{\prime}\right)=\sum_{\sigma_{j}}\left(c_{\sigma_{j}}-c_{\mathrm{min}}\right)\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right), \tag{17}$$

where we have defined

$$c_{\mathrm{max}}:=\max_{\sigma_{j}}c_{\sigma_{j}} \tag{18}$$

$$c_{\mathrm{min}}:=\min_{\sigma_{j}}c_{\sigma_{j}}. \tag{19}$$

We also introduce

$$\sigma_{\mathrm{max}}:=\arg\max_{\sigma_{j}}c_{\sigma_{j}} \tag{20}$$

$$\sigma_{\mathrm{min}}:=\arg\min_{\sigma_{j}}c_{\sigma_{j}}, \tag{21}$$

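where ties may be broken arbitrarily. It should also be noted that $\sigma_{\mathrm{max}}\neq\sigma_{\mathrm{min}}$ : if $\sigma_{\mathrm{max}}=\sigma_{\mathrm{min}}$ , then $c_{\mathrm{max}}=c_{\mathrm{min}}$ , so all $c_{\sigma_{j}}$ are equal and, by Eq. (5), the left-hand side of Eq. (15) becomes $0$ , which contradicts the definition of ZD strategies. Then we evaluate Eqs. (16) and (17) at $\sigma_{j}^{\prime}=\sigma_{\mathrm{max}}$ and $\sigma_{j}^{\prime}=\sigma_{\mathrm{min}}$ , respectively. In each sum, the term with $\sigma_{j}=\sigma_{j}^{\prime}$ vanishes, and in every other term the Press-Dyson component is nonnegative by the property (8) while its coefficient has a definite sign. We therefore obtain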

$$\begin{aligned}B\left(\sigma_{\mathrm{max}},\sigma_{-j}^{\prime}\right)&=\sum_{\sigma_{j}}\left(c_{\sigma_{j}}-c_{\mathrm{max}}\right)\hat{T}_{j}\left(\sigma_{j}|\sigma_{\mathrm{max}},\sigma_{-j}^{\prime}\right)\\ &\leq 0\quad\left(\forall\sigma_{-j}^{\prime}\right)\end{aligned} \tag{22}$$

and

$$\begin{aligned}B\left(\sigma_{\mathrm{min}},\sigma_{-j}^{\prime}\right)&=\sum_{\sigma_{j}}\left(c_{\sigma_{j}}-c_{\mathrm{min}}\right)\hat{T}_{j}\left(\sigma_{j}|\sigma_{\mathrm{min}},\sigma_{-j}^{\prime}\right)\\ &\geq 0\quad\left(\forall\sigma_{-j}^{\prime}\right).\end{aligned} \tag{23}$$

Therefore, the ZD strategy satisfies the condition (14) with $\overline{\sigma}_{j}=\sigma_{\mathrm{max}}$ and $\underline{\sigma}_{j}=\sigma_{\mathrm{min}}$ .

(Sufficiency) If there exist some nontrivial coefficients $\{\alpha_{k}\}_{k=0}^{N}$ and two different actions $\overline{\sigma}_{j}$ and $\underline{\sigma}_{j}$ of player $j$ satisfying the condition (14), we can construct a ZD strategy as follows. We first introduce $M:=\prod_{k=1}^{N}M_{k}$ and a vector notation of an $M$ -component quantity $D(\bm{\sigma})\in\mathbb{R}$ by $\bm{D}:=\left(D(\bm{\sigma})\right)_{\bm{\sigma}\in\mathcal{A}}\in\mathbb{R}^{M}$ . We also introduce vectors obtained from $\bm{D}$

$$\left[\bm{D}\right]_{\sigma_{j},d}:=\left(D(\bm{\sigma}^{\prime})\mathbb{I}(dD(\bm{\sigma}^{\prime})>0)\mathbb{I}(\sigma_{j}^{\prime}=\sigma_{j})\right)_{\bm{\sigma}^{\prime}\in\mathcal{A}}\quad\left(\sigma_{j}\in A_{j},d\in\{+,-\}\right), \tag{24}$$

where $\mathbb{I}(\cdots)$ is an indicator function which returns $1$ if $\cdots$ holds, and $0$ otherwise. By definition, any $M$ -component vector $\bm{D}$ can be decomposed into linearly independent vectors

$$\bm{D}=\sum_{\sigma_{j}}\sum_{d=+,-}\left[\bm{D}\right]_{\sigma_{j},d}. \tag{25}$$

For the quantity $\bm{B}=\sum_{k=0}^{N}\alpha_{k}\bm{s}_{k}$ , our assumption (14) leads to

$$\bm{B}=\sum_{\sigma_{j}\neq\overline{\sigma}_{j},\underline{\sigma}_{j}}\sum_{d=+,-}\left[\bm{B}\right]_{\sigma_{j},d}+\left[\bm{B}\right]_{\overline{\sigma}_{j},-}+\left[\bm{B}\right]_{\underline{\sigma}_{j},+}. \tag{26}$$

We also collectively write the Press-Dyson vectors of player $j$ by $\hat{\bm{T}}_{j}\left(\sigma_{j}\right):=\left(\hat{T}_{j}\left(\sigma_{j}|\bm{\sigma}^{\prime}\right)\right)_{\bm{\sigma}^{\prime}\in\mathcal{A}}$ . Below we construct ZD strategies for the case $M_{j}>2$ and the case $M_{j}=2$ separately. (Because of the existence of two different actions $\overline{\sigma}_{j},\underline{\sigma}_{j}$ , $M_{j}$ must be greater than $1$ .)

(i) $M_{j}>2$

In this case, we set a strategy of player $j$ as

$$\begin{aligned}\hat{\bm{T}}_{j}\left(\overline{\sigma}_{j}\right)&=\frac{1}{W}\left(\sum_{\sigma_{j}\neq\overline{\sigma}_{j},\underline{\sigma}_{j}}\left[\bm{B}\right]_{\sigma_{j},+}+\left[\bm{B}\right]_{\overline{\sigma}_{j},-}\right)\\ \hat{\bm{T}}_{j}\left(\underline{\sigma}_{j}\right)&=-\frac{1}{W}\left(\sum_{\sigma_{j}\neq\overline{\sigma}_{j},\underline{\sigma}_{j}}\left[\bm{B}\right]_{\sigma_{j},-}+\left[\bm{B}\right]_{\underline{\sigma}_{j},+}\right)\\ \hat{\bm{T}}_{j}\left(\sigma_{j}\right)&=\frac{1}{W}\left(-\left[\bm{B}\right]_{\sigma_{j},+}+\left[\bm{B}\right]_{\sigma_{j},-}-\frac{1}{M_{j}-2}\left[\bm{B}\right]_{\overline{\sigma}_{j},-}+\frac{1}{M_{j}-2}\left[\bm{B}\right]_{\underline{\sigma}_{j},+}\right)\quad\left(\sigma_{j}\neq\overline{\sigma}_{j},\underline{\sigma}_{j}\right),\end{aligned} \tag{27}$$

where we have defined

$$W:=\max_{\bm{\sigma}\in\mathcal{A}}\left|B(\bm{\sigma})\right|\neq 0. \tag{28}$$

We can easily check that these vectors indeed satisfy the condition of strategies (8) and (9). In addition, the condition (5) is also satisfied because

$$\sum_{\sigma_{j}}\hat{\bm{T}}_{j}\left(\sigma_{j}\right)=\bm{0}. \tag{29}$$

Furthermore, the strategy (27) satisfies

$$\hat{\bm{T}}_{j}\left(\overline{\sigma}_{j}\right)-\hat{\bm{T}}_{j}\left(\underline{\sigma}_{j}\right)=\frac{1}{W}\bm{B}, \tag{30}$$

where we have used Eq. (26). Therefore, the strategy (27) is a ZD strategy.

(ii) $M_{j}=2$

In this case, the two actions of player $j$ are $\overline{\sigma}_{j}$ and $\underline{\sigma}_{j}$ . We set a strategy of player $j$ as

$$\begin{aligned}\hat{\bm{T}}_{j}\left(\overline{\sigma}_{j}\right)&=\frac{1}{W}\left(\left[\bm{B}\right]_{\overline{\sigma}_{j},-}+\left[\bm{B}\right]_{\underline{\sigma}_{j},+}\right)\\ \hat{\bm{T}}_{j}\left(\underline{\sigma}_{j}\right)&=-\hat{\bm{T}}_{j}\left(\overline{\sigma}_{j}\right),\end{aligned} \tag{31}$$

where $W$ is defined by Eq. (28). We can easily check that these vectors indeed satisfy the condition of strategies (5), (8), (9). In addition, the strategy (31) satisfies

$$\hat{\bm{T}}_{j}\left(\overline{\sigma}_{j}\right)=\frac{1}{W}\bm{B}, \tag{32}$$

where we have used Eq. (26). Therefore, the strategy (31) is a ZD strategy.

$\Box$
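The construction in the sufficiency part can be transcribed almost literally into code. The sketch below builds the transition probabilities $T_{j}=\hat{T}_{j}+\delta$ from Eqs. (27) and (31) for a given $B$ and a given pair $\overline{\sigma}_{j},\underline{\sigma}_{j}$ . The function name, the dictionary layout, the 0-based action labels, and the numerical values of the example `B` are assumptions chosen only for illustration; the caller is assumed to have already verified the condition (14).

```python
def construct_zd_strategy(B, n_actions, j, sigma_bar, sigma_under):
    """Transition probabilities T[(a, prev)] built from Eqs. (27) and (31).

    B[prev] is the linear combination of payoffs at profile prev; it is
    assumed to satisfy Eq. (14) for the pair (sigma_bar, sigma_under).
    """
    M_j = n_actions[j]
    W = max(abs(v) for v in B.values())          # Eq. (28)
    if W == 0.0:
        raise ValueError("B must not be identically zero")
    others = [a for a in range(M_j) if a not in (sigma_bar, sigma_under)]
    T = {}
    for prev, b in B.items():
        pos, neg = max(b, 0.0), min(b, 0.0)      # positive and negative parts
        own = prev[j]                            # player j's previous action
        T_hat = {a: 0.0 for a in range(M_j)}
        if M_j == 2:                             # Eq. (31)
            T_hat[sigma_bar] = b / W
            T_hat[sigma_under] = -b / W
        elif own == sigma_bar:                   # here B <= 0 by Eq. (14)
            T_hat[sigma_bar] = neg / W
            for a in others:
                T_hat[a] = -neg / (W * (M_j - 2))
        elif own == sigma_under:                 # here B >= 0 by Eq. (14)
            T_hat[sigma_under] = -pos / W
            for a in others:
                T_hat[a] = pos / (W * (M_j - 2))
        else:                                    # remaining actions, Eq. (27)
            T_hat[sigma_bar] = pos / W
            T_hat[sigma_under] = -neg / W
            T_hat[own] = (neg - pos) / W
        for a in range(M_j):
            # Undo Eq. (4): add back the "Repeat" strategy.
            T[(a, prev)] = T_hat[a] + (1.0 if a == own else 0.0)
    return T

# Hypothetical two-player, three-action example: action 0 keeps B <= 0,
# action 1 keeps B >= 0, and action 2 gives mixed signs.
n_actions = (3, 3)
B = {(0, 0): -1.0, (0, 1): 0.0, (0, 2): -2.0,
     (1, 0): 1.0, (1, 1): 2.0, (1, 2): 0.0,
     (2, 0): -1.0, (2, 1): 3.0, (2, 2): 0.5}
T = construct_zd_strategy(B, n_actions, j=0, sigma_bar=0, sigma_under=1)
```

For this example one can confirm that every column of `T` is a probability distribution and that the difference of the Press-Dyson vectors of actions 0 and 1 equals `B / W`, in line with Eq. (30).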

3.2 Example

In this subsection, we construct a ZD strategy for the case of the repeated prisoner’s dilemma game. The prisoner’s dilemma game is a two-player two-action symmetric game with the following payoffs:

$$\begin{aligned}\bm{s}_{1}&:=\left(s_{1}(1,1),s_{1}(1,2),s_{1}(2,1),s_{1}(2,2)\right)^{\mathsf{T}}=\left(R,S,T,P\right)^{\mathsf{T}}\\ \bm{s}_{2}&:=\left(s_{2}(1,1),s_{2}(1,2),s_{2}(2,1),s_{2}(2,2)\right)^{\mathsf{T}}=\left(R,T,S,P\right)^{\mathsf{T}},\end{aligned} \tag{33}$$

where $T>R>P>S$ and $2R>T+S$ . (The actions $1$ and $2$ correspond to cooperation and defection, respectively.) If we consider the quantity $\bm{B}=\sum_{k=0}^{2}\alpha_{k}\bm{s}_{k}$ with $\alpha_{1}=0$ and $\alpha_{2}=1$ ,

$$\bm{B}=\left(R+\alpha_{0},T+\alpha_{0},S+\alpha_{0},P+\alpha_{0}\right)^{\mathsf{T}}. \tag{34}$$

Then, if we choose $\alpha_{0}$ as $\alpha_{0}\in[-R,-P]$ , we find that the actions $1$ and $2$ of player $1$ satisfy the condition of Theorem 1 as $\overline{\sigma}_{1}=2$ and $\underline{\sigma}_{1}=1$ . Therefore, we conclude that the repeated prisoner’s dilemma game contains at least one ZD strategy, which is a well-known result. By using the construction method in the proof of Theorem 1, $\bm{B}$ is decomposed into

$$\begin{aligned}\left[\bm{B}\right]_{2,-}&=\left(0,0,S+\alpha_{0},P+\alpha_{0}\right)^{\mathsf{T}}\\ \left[\bm{B}\right]_{1,+}&=\left(R+\alpha_{0},T+\alpha_{0},0,0\right)^{\mathsf{T}},\end{aligned} \tag{35}$$

and the ZD strategy is

$$\begin{aligned}\hat{\bm{T}}_{1}\left(2\right)&=\frac{1}{W}\bm{B}\\ \hat{\bm{T}}_{1}\left(1\right)&=-\hat{\bm{T}}_{1}\left(2\right)\end{aligned} \tag{36}$$

with $W:=\max_{\bm{\sigma}\in\mathcal{A}}\left|B(\bm{\sigma})\right|$ . It should be noted that this ZD strategy is called the equalizer strategy and it unilaterally enforces $\left\langle s_{2}\right\rangle^{*}=-\alpha_{0}$ [1].
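The equalizer property can be checked numerically. The sketch below uses hypothetical payoff values $(R,S,T,P)=(3,0,5,1)$ , the choice $\alpha_{0}=-2\in[-R,-P]$ , and an arbitrary memory-one opponent, none of which come from the text. It assumes the resulting Markov chain over action profiles is ergodic, so that the limit-of-means distribution coincides with the stationary distribution, which is approximated here by repeated application of the transition rule.

```python
from itertools import product

# Hypothetical prisoner's dilemma values; 0 = cooperate, 1 = defect.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
alpha0 = -2.0                                   # any value in [-R, -P]
profiles = list(product((0, 1), repeat=2))      # (player 1, player 2)
s2 = {(0, 0): R, (0, 1): T, (1, 0): S, (1, 1): P}

# B = s2 + alpha0 as in Eq. (34), and W as in Eq. (28).
B = {p: s2[p] + alpha0 for p in profiles}
W = max(abs(v) for v in B.values())

# Eq. (36): the Press-Dyson vector of "defect" is B / W, so the defection
# probability is B / W plus 1 whenever player 1 defected in the last round.
p1_defect = {p: B[p] / W + (1.0 if p[0] == 1 else 0.0) for p in profiles}

# Arbitrary (assumed) memory-one opponent: defection probabilities.
p2_defect = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.3, (1, 1): 0.6}

def step(dist):
    """One round of the Markov chain over action profiles."""
    new = {p: 0.0 for p in profiles}
    for prev, weight in dist.items():
        for a1, a2 in profiles:
            q1 = p1_defect[prev] if a1 == 1 else 1.0 - p1_defect[prev]
            q2 = p2_defect[prev] if a2 == 1 else 1.0 - p2_defect[prev]
            new[(a1, a2)] += weight * q1 * q2
    return new

dist = {p: 0.25 for p in profiles}
for _ in range(2000):                           # iterate towards stationarity
    dist = step(dist)

avg_s2 = sum(dist[p] * s2[p] for p in profiles)
```

Under these assumptions `avg_s2` should agree with $-\alpha_{0}$ up to numerical error, and changing `p2_defect` to another interior memory-one strategy should not alter the value.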

3.3 Relation to other classes of stage games

Theorem 1 can be used to define a class of stage games where ZD strategies exist. In this paper, we call this class ZD games. A natural question is the relation between ZD games and other classes of stage games, such as potential games and totally symmetric games.

Here, we restrict our attention to two-player symmetric games. In other words, the payoffs satisfy $s_{2}(\sigma_{1},\sigma_{2})=s_{1}(\sigma_{2},\sigma_{1})$ $(\forall\bm{\sigma})$ . We also write the set of actions as $A_{1}=A_{2}=A:=\{1,\cdots,L\}$ , where $L$ is the common number of actions. We first introduce the concept of generalized rock-paper-scissors games.

Definition 2 ([25, 18])

A stage game is a generalized rock-paper-scissors (gRPS) game if it contains at least one subset of the action space $A^{\prime}\subseteq A$ such that for all $\sigma_{1}\in A^{\prime}$ there exists $\sigma_{2}\in A^{\prime}$ such that $s_{1}^{(\mathrm{A})}\left(\sigma_{1},\sigma_{2}\right)<0$ , where $s_{1}^{(\mathrm{A})}(\sigma_{1},\sigma_{2}):=\left[s_{1}(\sigma_{1},\sigma_{2})-s_{1}(\sigma_{2},\sigma_{1})\right]/2$ is an anti-symmetric part of $s_{1}$ .

We call two-player symmetric games that are not gRPS games non-gRPS games. We now prove the following theorem on the relation between non-gRPS games and ZD games.

Theorem 2

If a stage game is not a gRPS game, then it is a ZD game.

Proof. If a stage game is not a gRPS game, then, for all $A^{\prime}\subseteq A$ , there exists an action $\sigma_{1}\in A^{\prime}$ such that $s_{1}^{(\mathrm{A})}\left(\sigma_{1},\sigma_{2}\right)\geq 0$ $\left(\forall\sigma_{2}\in A^{\prime}\right)$ . Such an action $\sigma_{1}$ is an unbeatable action in $A^{\prime}$ . It should be noted that $A^{\prime}$ can be $A$ . We now construct a series of unbeatable actions as follows. First, $\sigma^{*(1)}$ is an unbeatable action when the action space is $A$ , that is,

$$s_{1}^{(\mathrm{A})}\left(\sigma^{*(1)},\sigma_{2}\right)\geq 0\quad\left(\forall\sigma_{2}\in A\right). \tag{37}$$

Then, for $2\leq l\leq L$ , we recursively define the set $A^{(l)}:=A\backslash\left\{\sigma^{*(1)},\cdots,\sigma^{*(l-1)}\right\}$ and an action $\sigma^{*(l)}\in A^{(l)}$ such that

$$s_{1}^{(\mathrm{A})}\left(\sigma^{*(l)},\sigma_{2}\right)\geq 0\quad\left(\forall\sigma_{2}\in A^{(l)}\right). \tag{38}$$

We also write $A^{(1)}:=A$ . We remark that such a series $\left\{\sigma^{*(l)}\right\}_{l=1}^{L}$ is well-defined due to the property of non-gRPS games.

Because $s_{1}^{(\mathrm{A})}(\sigma_{1},\sigma_{2})=\left[s_{1}(\sigma_{1},\sigma_{2})-s_{2}(\sigma_{1},\sigma_{2})\right]/2$ for two-player symmetric games, Eq. (37) shows that $\sigma^{*(1)}$ can serve as $\underline{\sigma}_{1}$ in the condition of Theorem 1:

$$s_{1}\left(\sigma^{*(1)},\sigma_{2}\right)-s_{2}\left(\sigma^{*(1)},\sigma_{2}\right)\geq 0\quad\left(\forall\sigma_{2}\in A\right). \tag{39}$$

Next, we prove that $\sigma^{*(L)}$ can serve as $\overline{\sigma}_{1}$ :

$$s_{1}\left(\sigma^{*(L)},\sigma_{2}\right)-s_{2}\left(\sigma^{*(L)},\sigma_{2}\right)\leq 0\quad\left(\forall\sigma_{2}\in A\right). \tag{40}$$

Assume to the contrary that

$$s_{1}^{(\mathrm{A})}\left(\sigma^{*(L)},\sigma_{2}\right)>0\quad(\exists\sigma_{2}\in A). \tag{41}$$

(Because $s_{1}^{(\mathrm{A})}$ is an anti-symmetric part, $\sigma_{2}\neq\sigma^{*(L)}$ .) This is rewritten as

$$s_{1}^{(\mathrm{A})}\left(\sigma_{2},\sigma^{*(L)}\right)<0\quad(\exists\sigma_{2}\in A). \tag{42}$$

However, $\sigma_{2}=\sigma^{*(l)}$ for some $1\leq l\leq L-1$ , and $\sigma^{*(L)}\in A^{(l)}$ for every $1\leq l\leq L$ , so Eq. (42) contradicts Eq. (38). Therefore, we conclude that Eq. (40) indeed holds. Eqs. (39) and (40) correspond to the condition for the existence of ZD strategies in Theorem 1 with $\alpha_{0}=0$ , $\alpha_{1}=1$ , and $\alpha_{2}=-1$ . We remark that the linear relation enforced by the ZD strategy is $\left\langle s_{1}\right\rangle^{*}=\left\langle s_{2}\right\rangle^{*}$ . $\Box$
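The recursive construction in the proof can be sketched as a short procedure: repeatedly pick an action that is unbeatable within the remaining action set and remove it. If no such action exists at some stage, the remaining set is a subset $A^{\prime}$ as in Definition 2, so the game is a gRPS game. The function name, the matrix layout `s1[a][b]`, the 0-based action labels, and the numerical prisoner’s dilemma payoffs are assumptions made for this illustration.

```python
def unbeatable_sequence(s1):
    """Return the series of unbeatable actions of Eq. (38), or None.

    s1[a][b] is the payoff of player 1 in a two-player symmetric game when
    player 1 plays a and player 2 plays b (0-based actions).
    """
    L = len(s1)
    # Anti-symmetric part of s1, as in Definition 2.
    anti = [[(s1[a][b] - s1[b][a]) / 2 for b in range(L)] for a in range(L)]
    remaining = list(range(L))
    order = []
    while remaining:
        # An action that is not beaten by any action still in the set.
        pick = next(
            (a for a in remaining
             if all(anti[a][b] >= 0 for b in remaining)),
            None,
        )
        if pick is None:
            return None          # the remaining set forms a gRPS subset
        order.append(pick)
        remaining.remove(pick)
    return order

# Hypothetical prisoner's dilemma (0 = cooperate, 1 = defect).
pd = [[3, 0], [5, 1]]
pd_order = unbeatable_sequence(pd)

# Standard rock-paper-scissors.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rps_order = unbeatable_sequence(rps)
```

When a list is returned, its first element plays the role of $\underline{\sigma}_{1}$ and its last element the role of $\overline{\sigma}_{1}$ in the proof; for the example prisoner’s dilemma these are defection and cooperation, respectively.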

It should be noted that an unbeatable imitation strategy exists if and only if a stage game is not a gRPS game [18]. Theorem 2 also constructs an unbeatable ZD strategy, which unilaterally enforces $\left\langle s_{1}\right\rangle^{*}=\left\langle s_{2}\right\rangle^{*}$ , for non-gRPS games. Both results imply that, in non-gRPS games, it is not easy for players to exploit the opponent. We also note that two-player symmetric potential games are a subset of non-gRPS games [18]. Therefore, our result directly leads to the existence of ZD strategies in two-player symmetric potential games [20].

We finally remark that the converse of Theorem 2 is not true. That is, ZD strategies can exist for some gRPS games. An example is the game in Table 1, which is a modified version of the RPS game.

Table 1: A gRPS game with a ZD strategy.

	$R$	$P$	$S$
$R$	0,0	-1,1	1,-1
$P$	1,-1	0,0	-1,1
$S$	-1,1	1,-1	0,0
$\overline{\sigma}$	0,0	0,0	0,0
$\underline{\sigma}$	0,0	0,0	0,0

Although this game contains a gRPS cycle when $A^{\prime}=\{R,P,S\}$ , this game is also a ZD game, since $\overline{\sigma}$ and $\underline{\sigma}$ are regarded as the two actions in Theorem 1.

4 Concluding Remarks

In this paper, we have provided the necessary and sufficient condition for the existence of ZD strategies in repeated games (Theorem 1). This condition means exactly that there exist two actions, one of which unilaterally keeps the total value of $\sum_{k=0}^{N}\alpha_{k}s_{k}$ nonnegative and the other of which unilaterally keeps it nonpositive. We have now found that such a property is necessary for unilateral control of payoffs by ZD strategies. In fact, we can easily check that the rock-paper-scissors game does not contain the two actions required in Theorem 1, which leads to the absence of ZD strategies [11]. From another point of view, stage games satisfying the condition of Theorem 1 can be regarded as a class allowing the existence of ZD strategies. We also provided the relation between this class of stage games (ZD games) and non-gRPS games for the case of two-player symmetric games. Further investigation of the relation between ZD games and other classes of stage games is needed.

We have investigated only the situation in which the discounting factor is $\delta=1$ and monitoring is perfect. In general, the set of possible ZD strategies shrinks as $\delta$ decreases and monitoring becomes imperfect [9, 8, 11, 26]. In particular, the existence of ZD strategies in games with imperfect monitoring will be highly dependent on the set of signals of each player. Investigation of the necessary and sufficient condition for the existence of ZD strategies in games with discounting and imperfect monitoring is an important subject of future work. It would also be interesting if our result could be applied to the existence of memory- $m$ ZD strategies with $m\geq 2$ [15].

Acknowledgment

This study was supported by JSPS KAKENHI Grant Number JP20K19884 and Inamori Research Grants.

References

[1] W. H. Press and F. J. Dyson: Proceedings of the National Academy of Sciences 109 (2012) 10409.
[2] E. Akin: Ergodic Theory, Advances in Dynamical Systems (2016) 77.
[3] C. Hilbe, B. Wu, A. Traulsen, and M. A. Nowak: Proceedings of the National Academy of Sciences 111 (2014) 16425.
[4] L. Pan, D. Hao, Z. Rong, and T. Zhou: Scientific Reports 5 (2015) 13096.
[5] A. McAvoy and C. Hauert: Proceedings of the National Academy of Sciences 113 (2016) 3573.
[6] X. He, H. Dai, P. Ning, and R. Dutta: IEEE Signal Processing Letters 23 (2016) 311.
[7] C. Hilbe, A. Traulsen, and K. Sigmund: Games and Economic Behavior 92 (2015) 41.
[8] G. Ichinose and N. Masuda: Journal of Theoretical Biology 438 (2018) 61.
[9] D. Hao, Z. Rong, and T. Zhou: Phys. Rev. E 91 (2015) 052803.
[10] A. Mamiya and G. Ichinose: Journal of Theoretical Biology 477 (2019) 63.
[11] M. Ueda and T. Tanaka: PLOS ONE 15 (2020) e0230973.
[12] A. Mamiya and G. Ichinose: Phys. Rev. E 102 (2020) 032115.
[13] A. McAvoy and C. Hauert: Theoretical Population Biology 113 (2017) 13.
[14] M. Ueda: Journal of the Physical Society of Japan 90 (2021) 025002.
[15] M. Ueda: Royal Society Open Science 8 (2021) 202186.
[16] M. Ueda: arXiv preprint arXiv:2012.10231 (2020).
[17] A. J. Stewart and J. B. Plotkin: Proceedings of the National Academy of Sciences 110 (2013) 15348.
[18] P. Duersch, J. Oechssler, and B. C. Schipper: Games and Economic Behavior 76 (2012) 88.
[19] P. Duersch, J. Oechssler, and B. C. Schipper: International Journal of Game Theory 43 (2014) 25.
[20] M. Ueda: Journal of the Physical Society of Japan 91 (2022) 054804.
[21] M. A. Taha and A. Ghoneim: Applied Mathematics and Computation 369 (2020) 124862.
[22] A. Plan: University of Arizona Economics Working Paper (2017) 17.
[23] D. Monderer and L. S. Shapley: Games and Economic Behavior 14 (1996) 124.
[24] G. J. Mailath and L. Samuelson: Repeated games and reputations: long-run relationships (Oxford University Press, 2006).
[25] P. Duersch, J. Oechssler, and B. C. Schipper: International Journal of Game Theory 41 (2012) 553.
[26] A. Mamiya, D. Miyagawa, and G. Ichinose: Journal of Theoretical Biology 526 (2021) 110810.