
Necessary and Sufficient Condition for the Existence of Zero-Determinant Strategies in Repeated Games

Masahiko Ueda, [email protected], Graduate School of Sciences and Technology for Innovation, Yamaguchi University, Yamaguchi 753-8511, Japan
Abstract

Zero-determinant strategies are a class of memory-one strategies in repeated games which unilaterally enforce linear relationships between payoffs. It has long been unclear for which stage games zero-determinant strategies exist. We provide a necessary and sufficient condition for the existence of zero-determinant strategies. This condition can be interpreted as the existence of two different actions which unilaterally adjust the total value of a linear combination of payoffs. A relation between the class of stage games in which zero-determinant strategies exist and other classes of stage games is also provided.

1 Introduction

Zero-determinant (ZD) strategies are a class of memory-one strategies (strategies which recall only the previous period) in repeated games which unilaterally enforce linear relationships between the payoffs of players. ZD strategies were first discovered by two physicists, Press and Dyson, in the repeated prisoner's dilemma game [1]. ZD strategies contain several counterintuitive examples, such as the equalizer strategy, which unilaterally sets the payoff of the opponent, and the extortionate strategy, which always obtains a payoff greater than or equal to that of the opponent. ZD strategies also contain the generous ZD strategy, which achieves a cooperative Nash equilibrium [2]. After their discovery, many extensions have been made, mainly in two directions. The first direction is extension of the range of application of ZD strategies. Concretely, ZD strategies were extended to multi-player multi-action games [3, 4, 5, 6], games with a discount factor [7, 5, 8], games with imperfect monitoring [9, 10, 11, 12], and games with asynchronous update [13]. The second direction is extension of the ability of payoff control. The concept of ZD strategies was extended so as to control moments of payoffs [14], time correlation functions of payoffs [15], and conditional expectations of payoffs [16]. The mathematical framework of ZD strategies has also been used to classify memory-one strategies in social dilemma situations into classes such as partner strategies and rival strategies [2, 17, 7]. Furthermore, the relation between unbeatable imitation [18, 19] and ZD strategies has gradually been clarified in two-player symmetric games [20].

Although ZD strategies have been found in several stage games, such as the prisoner's dilemma game [1], the public goods game [3, 4], the continuous donation game [5], a two-player two-action asymmetric game [21], and two-player symmetric potential games [20], a condition for the existence of ZD strategies has remained unclear. For example, it is known that ZD strategies do not exist in the rock-paper-scissors game [11]. It has therefore been believed that the existence of ZD strategies is highly dependent on the structure of the stage game.

In this paper, we provide a necessary and sufficient condition for the existence of ZD strategies. This condition implies that, for ZD strategies to exist, the stage game must in some sense be easy to handle for players who want to use them. From another perspective, we can introduce a class of stage games in which ZD strategies exist. Such a classification of stage games may be useful in the same way as those of symmetric games [22], potential games [23], and generalized rock-paper-scissors games [18]. We provide a relation between the class of stage games in which ZD strategies exist and another class of games for the case of two-player symmetric games.

This paper is organized as follows. In section 2, we introduce repeated games and ZD strategies. In section 3, we provide our main theorem on the necessary and sufficient condition for the existence of ZD strategies. A relation between the class of stage games in which ZD strategies exist and other classes of stage games is also provided in that section. Section 4 is devoted to concluding remarks.

2 Preliminaries

We consider a repeated game [24]. The set of players is described as $\mathcal{N}:=\{1,\cdots,N\}$, where $N>1$ is the number of players. The action of player $j\in\mathcal{N}$ in the stage game is written as $\sigma_j\in A_j:=\{1,\cdots,M_j\}$, where $M_j$ is a natural number describing the number of actions of player $j$. We collectively write $\mathcal{A}:=\prod_{j=1}^{N}A_j$ and $\bm{\sigma}:=\left(\sigma_1,\cdots,\sigma_N\right)\in\mathcal{A}$. We call $\bm{\sigma}$ an action profile. The payoff of player $j$ when the action profile is $\bm{\sigma}$ is described as $s_j\left(\bm{\sigma}\right)$. Therefore, the stage game is $G:=\left(\mathcal{N},\{A_j\}_{j\in\mathcal{N}},\{s_j\}_{j\in\mathcal{N}}\right)$. We write the probability simplex over $M$ elements as $\Delta_M$. We also introduce the notation $\sigma_{-j}:=\left(\sigma_1,\cdots,\sigma_{j-1},\sigma_{j+1},\cdots,\sigma_N\right)\in\prod_{k\neq j}A_k$.

We repeat the stage game $G$ infinitely. We write the action of player $j$ at round $t\geq 1$ as $\sigma_j^{(t)}$. The behavior strategy of player $j$ is described as $\mathcal{T}_j:=\left\{T_j^{(t)}\right\}_{t=1}^{\infty}$, where $T_j^{(t)}:\mathcal{A}^{t-1}\to\Delta_{M_j}$ is the conditional probability distribution of the action at the $t$-th round. We write the expectation of a quantity $B$ with respect to the strategies of all players as $\mathbb{E}[B]$. We introduce a discount factor $\delta$ satisfying $0\leq\delta\leq 1$ in order to discount future payoffs. The payoff of player $j$ in the infinitely repeated game is defined by

$$\mathcal{S}_j := \begin{cases} (1-\delta)\sum_{t=1}^{\infty}\delta^{t-1}\mathbb{E}\left[s_j\left(\bm{\sigma}^{(t)}\right)\right] & \left(0\leq\delta<1\right)\\ \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\left[s_j\left(\bm{\sigma}^{(t)}\right)\right] & \left(\delta=1\right). \end{cases} \qquad (3)$$

In this paper, we consider only the case $\delta=1$ [1].

A time-independent memory-one strategy of player $j$ is defined as a strategy such that $T_j^{(t)}=T_j$ for all $t\geq 2$ and $\sigma_j^{(t)}$ is determined only by $\bm{\sigma}^{(t-1)}$. For a time-independent memory-one strategy $T_j$ of player $j$, we introduce the Press-Dyson vectors [2, 11]

$$\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right) := T_j\left(\sigma_j|\bm{\sigma}'\right) - \delta_{\sigma_j,\sigma'_j} \quad \left(\forall\sigma_j,\forall\bm{\sigma}'\right), \qquad (4)$$

where $\delta_{\sigma,\sigma'}$ is the Kronecker delta. The second term on the right-hand side of Eq. (4) can be regarded as a memory-one strategy (called "Repeat") which repeats his/her own previous action, and therefore the Press-Dyson vectors are interpreted as the difference between his/her own strategy and "Repeat". It should be noted that, due to the properties of the conditional probability $T_j$, the Press-Dyson vectors satisfy several relations. First, they satisfy

$$\sum_{\sigma_j}\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right) = 0 \quad \left(\forall\bm{\sigma}'\right) \qquad (5)$$

due to the normalization condition of $T_j$. Second, they satisfy

$$\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right) \begin{cases} \leq 0 & \left(\sigma_j=\sigma'_j\right)\\ \geq 0 & \left(\sigma_j\neq\sigma'_j\right) \end{cases} \qquad (8)$$

for all $\sigma_j$ and all $\bm{\sigma}'$. Third, they satisfy

$$\left|\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right)\right| \leq 1 \quad \left(\forall\sigma_j,\forall\bm{\sigma}'\right). \qquad (9)$$

The last two properties come from the fact that $T_j$ takes values in $[0,1]$.
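As a concrete check, the following sketch (our own illustration, not from the paper; tit-for-tat is used as a hypothetical example strategy) constructs the Press-Dyson vectors of Eq. (4) for a two-player two-action game and verifies the three properties numerically.

```python
import numpy as np

# A minimal numerical sketch: Press-Dyson vectors for a memory-one
# strategy in a two-player, two-action game. States sigma' are ordered
# (1,1), (1,2), (2,1), (2,2); the first entry is player 1's own action.

# Hypothetical example strategy for player 1: tit-for-tat, given as the
# probability T_1(1|sigma') of playing action 1 after each state.
p_action1 = np.array([1.0, 0.0, 1.0, 0.0])
T = np.vstack([p_action1, 1.0 - p_action1])   # T_1(sigma_1|sigma'), 2 x 4

# The Kronecker-delta term in Eq. (4) is the "Repeat" strategy.
own_prev = np.array([1, 1, 2, 2])
repeat = np.vstack([(own_prev == 1).astype(float),
                    (own_prev == 2).astype(float)])

T_hat = T - repeat                            # Press-Dyson vectors, Eq. (4)

assert np.allclose(T_hat.sum(axis=0), 0.0)    # Eq. (5): columns sum to zero
for a in (1, 2):                              # Eq. (8): sign pattern
    assert np.all(T_hat[a - 1, own_prev == a] <= 0)
    assert np.all(T_hat[a - 1, own_prev != a] >= 0)
assert np.all(np.abs(T_hat) <= 1.0)           # Eq. (9): bounded by one
```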

For simplicity, we introduce the notation $s_0\left(\bm{\sigma}\right)=1$ $(\forall\bm{\sigma})$. By using the Press-Dyson vectors, we define zero-determinant strategies.

Definition 1 ([1, 5])

A time-independent memory-one strategy of player $j$ is a zero-determinant (ZD) strategy when its Press-Dyson vectors can be written in the form

$$\sum_{\sigma_j}c_{\sigma_j}\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right) = \sum_{k=0}^{N}\alpha_k s_k\left(\bm{\sigma}'\right) \quad \left(\forall\bm{\sigma}'\right) \qquad (10)$$

with some nontrivial coefficients $\{\alpha_k\}$ and $\{c_{\sigma_j}\}$ (that is, not $\alpha_0=\alpha_1=\cdots=\alpha_N=0$, and not $c_1=\cdots=c_{M_j}=\mathrm{const.}$), where Eq. (10) is nonzero for some $\bm{\sigma}'$.

In other words, for a ZD strategy, a linear combination of the Press-Dyson vectors can be described as a linear combination of the payoff vectors and the all-ones vector. It is known that a ZD strategy (10) unilaterally enforces a linear relation between expected payoffs [1, 2, 20]:

$$0 = \sum_{k=0}^{N}\alpha_k\left\langle s_k\right\rangle^{*}, \qquad (11)$$

where $\left\langle\cdots\right\rangle^{*}$ is the expectation with respect to the limit-of-means distribution

$$P^{*}\left(\bm{\sigma}\right) := \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}P_t\left(\bm{\sigma}\right) \quad (\forall\bm{\sigma}), \qquad (12)$$

and

$$P_t\left(\bm{\sigma}^{(t)}\right) := \sum_{\bm{\sigma}^{(t-1)}}\cdots\sum_{\bm{\sigma}^{(1)}}P\left(\bm{\sigma}^{(t)},\cdots,\bm{\sigma}^{(1)}\right) \qquad (13)$$

is the marginal distribution obtained from the joint distribution of action profiles. Because $\mathcal{S}_k=\left\langle s_k\right\rangle^{*}$ $(\forall k)$, the linear relation (11) can be interpreted as a linear relation between payoffs in the repeated game.
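Definition 1 can be tested numerically: a memory-one strategy is ZD exactly when some combination of its Press-Dyson vectors equals a nonzero linear combination of $\bm{s}_0,\cdots,\bm{s}_N$. The following sketch (our own naming and approach, assuming the vectors are given numerically; not a procedure from the paper) performs this test via a null-space computation.

```python
import numpy as np

# A linear-algebra sketch: the strategy is ZD iff the stacked system
# sum_sigma c_sigma T_hat[sigma] - sum_k alpha_k s_k = 0 has a null
# vector whose alpha-part yields a nonzero B = sum_k alpha_k s_k.
def is_zd(T_hat, payoffs, tol=1e-9):
    """T_hat: M_j x M matrix of Press-Dyson vectors; payoffs: list of the
    M-component vectors s_0, ..., s_N."""
    S = np.vstack(payoffs)                        # (N+1) x M
    A = np.vstack([T_hat, -S]).T                  # columns: T_hat rows, -s_k
    _, sv, Vt = np.linalg.svd(A)
    null_basis = Vt[np.sum(sv > tol):]            # basis of {x : A x = 0}
    Mj = T_hat.shape[0]
    # If some null vector gives a nonzero B, the c-part is automatically
    # nonconstant (a constant c would make the left-hand side vanish).
    return any(np.linalg.norm(S.T @ v[Mj:]) > tol for v in null_basis)

# Sanity check: the "Repeat" strategy (T_hat = 0) is never ZD.
s0 = np.ones(4)
s1 = np.array([3.0, 0.0, 5.0, 1.0])   # hypothetical PD payoffs (R,S,T,P)
s2 = np.array([3.0, 5.0, 0.0, 1.0])
print(is_zd(np.zeros((2, 4)), [s0, s1, s2]))   # False
```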

3 Results

3.1 Necessary and sufficient condition for the existence of ZD strategies

Although ZD strategies have been found in several stage games, such as the prisoner's dilemma game [1], the public goods game [3, 4], the continuous donation game [5], a two-player two-action asymmetric game [21], and two-player symmetric potential games [20], the condition for the existence of ZD strategies has remained unclear. In this section, we provide a necessary and sufficient condition for the existence of ZD strategies.

Theorem 1

A ZD strategy of player $j$ exists if and only if there exist some nontrivial coefficients $\{\alpha_k\}_{k=0}^{N}$ and two different actions $\overline{\sigma}_j,\underline{\sigma}_j\in A_j$ of player $j$ such that

$$\sum_{k=0}^{N}\alpha_k s_k\left(\overline{\sigma}_j,\sigma_{-j}\right) \leq 0 \quad \left(\forall\sigma_{-j}\right),$$
$$\sum_{k=0}^{N}\alpha_k s_k\left(\underline{\sigma}_j,\sigma_{-j}\right) \geq 0 \quad \left(\forall\sigma_{-j}\right), \qquad (14)$$

and $\sum_{k=0}^{N}\alpha_k s_k$ is not identically zero, for the stage game $G$.
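Before proving the theorem, we note that the condition (14) is easy to check mechanically for small games once candidate coefficients are fixed. The following sketch (the helper function and its interface are our own, for illustration) scans the rows of $\bm{B}=\sum_{k}\alpha_k\bm{s}_k$ in a two-player game.

```python
import numpy as np
from itertools import product

# Given candidate coefficients alpha = (alpha_0, alpha_1, alpha_2) for a
# two-player game, look for a pair of actions satisfying Eq. (14).
def zd_actions(s1, s2, alpha, player=1):
    """s1, s2: M_1 x M_2 payoff matrices. Returns (sigma_bar, sigma_under),
    0-indexed, or None if this alpha admits no such pair."""
    B = alpha[0] * np.ones_like(s1) + alpha[1] * s1 + alpha[2] * s2
    if player == 2:
        B = B.T                      # rows index the controlling player
    if np.allclose(B, 0):
        return None                  # B must not be identically zero
    upper = [a for a in range(B.shape[0]) if np.all(B[a] <= 0)]
    lower = [a for a in range(B.shape[0]) if np.all(B[a] >= 0)]
    return next(((u, l) for u, l in product(upper, lower) if u != l), None)

# Example: prisoner's dilemma with (R,S,T,P) = (3,0,5,1), alpha = (-2,0,1).
s1 = np.array([[3.0, 0.0], [5.0, 1.0]])
print(zd_actions(s1, s1.T, (-2.0, 0.0, 1.0)))   # (1, 0): defect, cooperate
```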

Proof. (Necessity) If a ZD strategy of player $j$ exists, then the Press-Dyson vectors satisfy

$$\sum_{\sigma_j}c_{\sigma_j}\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right) = \sum_{k=0}^{N}\alpha_k s_k\left(\bm{\sigma}'\right) \quad \left(\forall\bm{\sigma}'\right) \qquad (15)$$

with some nontrivial coefficients $\{\alpha_k\}$ and $\{c_{\sigma_j}\}$, and Eq. (15) is not identically zero. Below we write $B\left(\bm{\sigma}\right):=\sum_{k=0}^{N}\alpha_k s_k\left(\bm{\sigma}\right)$ $(\forall\bm{\sigma})$. By using Eq. (5), this can be written as

$$B\left(\bm{\sigma}'\right) = \sum_{\sigma_j}\left(c_{\sigma_j}-c_{\mathrm{max}}\right)\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right) \qquad (16)$$

and

$$B\left(\bm{\sigma}'\right) = \sum_{\sigma_j}\left(c_{\sigma_j}-c_{\mathrm{min}}\right)\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right), \qquad (17)$$

where we have defined

$$c_{\mathrm{max}} := \max_{\sigma_j}c_{\sigma_j}, \qquad (18)$$
$$c_{\mathrm{min}} := \min_{\sigma_j}c_{\sigma_j}. \qquad (19)$$

We also introduce

$$\sigma_{\mathrm{max}} := \arg\max_{\sigma_j}c_{\sigma_j}, \qquad (20)$$
$$\sigma_{\mathrm{min}} := \arg\min_{\sigma_j}c_{\sigma_j}, \qquad (21)$$

where ties may be broken arbitrarily. It should also be noted that $\sigma_{\mathrm{max}}\neq\sigma_{\mathrm{min}}$: if $\sigma_{\mathrm{max}}=\sigma_{\mathrm{min}}$, then $c_{\mathrm{max}}=c_{\mathrm{min}}$ and the left-hand side of Eq. (15) becomes $0$, which contradicts the definition of ZD strategies. Then, by using the property (8), we obtain

$$B\left(\sigma_{\mathrm{max}},\sigma_{-j}'\right) = \sum_{\sigma_j}\left(c_{\sigma_j}-c_{\mathrm{max}}\right)\hat{T}_j\left(\sigma_j|\sigma_{\mathrm{max}},\sigma_{-j}'\right) \leq 0 \quad \left(\forall\sigma_{-j}'\right) \qquad (22)$$

and

$$B\left(\sigma_{\mathrm{min}},\sigma_{-j}'\right) = \sum_{\sigma_j}\left(c_{\sigma_j}-c_{\mathrm{min}}\right)\hat{T}_j\left(\sigma_j|\sigma_{\mathrm{min}},\sigma_{-j}'\right) \geq 0 \quad \left(\forall\sigma_{-j}'\right). \qquad (23)$$

Therefore, the ZD strategy satisfies the condition (14) with $\overline{\sigma}_j=\sigma_{\mathrm{max}}$ and $\underline{\sigma}_j=\sigma_{\mathrm{min}}$.

(Sufficiency) If there exist some nontrivial coefficients $\{\alpha_k\}_{k=0}^{N}$ and two different actions $\overline{\sigma}_j$ and $\underline{\sigma}_j$ of player $j$ satisfying the condition (14), we can construct a ZD strategy as follows. We first introduce $M:=\prod_{k=1}^{N}M_k$ and, for an $M$-component quantity $D(\bm{\sigma})\in\mathbb{R}$, the vector notation $\bm{D}:=\left(D(\bm{\sigma})\right)_{\bm{\sigma}\in\mathcal{A}}\in\mathbb{R}^{M}$. We also introduce vectors obtained from $\bm{D}$:

$$\left[\bm{D}\right]_{\sigma_j,d} := \left(D(\bm{\sigma}')\,\mathbb{I}(dD(\bm{\sigma}')>0)\,\mathbb{I}(\sigma_j'=\sigma_j)\right)_{\bm{\sigma}'\in\mathcal{A}} \quad \left(\sigma_j\in A_j,\ d\in\{+,-\}\right), \qquad (24)$$

where $\mathbb{I}(\cdots)$ is an indicator function which returns $1$ if $\cdots$ holds, and $0$ otherwise. By definition, any $M$-component vector $\bm{D}$ can be decomposed into linearly independent vectors:

$$\bm{D} = \sum_{\sigma_j}\sum_{d=+,-}\left[\bm{D}\right]_{\sigma_j,d}. \qquad (25)$$

For the quantity $\bm{B}=\sum_{k=0}^{N}\alpha_k\bm{s}_k$, our assumption (14) leads to

$$\bm{B} = \sum_{\sigma_j\neq\overline{\sigma}_j,\underline{\sigma}_j}\sum_{d=+,-}\left[\bm{B}\right]_{\sigma_j,d} + \left[\bm{B}\right]_{\overline{\sigma}_j,-} + \left[\bm{B}\right]_{\underline{\sigma}_j,+}. \qquad (26)$$

We also collectively write the Press-Dyson vectors of player $j$ as $\hat{\bm{T}}_j\left(\sigma_j\right):=\left(\hat{T}_j\left(\sigma_j|\bm{\sigma}'\right)\right)_{\bm{\sigma}'\in\mathcal{A}}$. Below we construct ZD strategies for the case $M_j>2$ and the case $M_j=2$ separately; a numerical check of the construction is sketched after the proof. (Because of the existence of two different actions $\overline{\sigma}_j,\underline{\sigma}_j$, $M_j$ must be greater than $1$.)

  (i) $M_j>2$: In this case, we set the strategy of player $j$ as

    $$\hat{\bm{T}}_j\left(\overline{\sigma}_j\right) = \frac{1}{W}\left(\sum_{\sigma_j\neq\overline{\sigma}_j,\underline{\sigma}_j}\left[\bm{B}\right]_{\sigma_j,+}+\left[\bm{B}\right]_{\overline{\sigma}_j,-}\right),$$
    $$\hat{\bm{T}}_j\left(\underline{\sigma}_j\right) = -\frac{1}{W}\left(\sum_{\sigma_j\neq\overline{\sigma}_j,\underline{\sigma}_j}\left[\bm{B}\right]_{\sigma_j,-}+\left[\bm{B}\right]_{\underline{\sigma}_j,+}\right),$$
    $$\hat{\bm{T}}_j\left(\sigma_j\right) = \frac{1}{W}\left(-\left[\bm{B}\right]_{\sigma_j,+}+\left[\bm{B}\right]_{\sigma_j,-}-\frac{1}{M_j-2}\left[\bm{B}\right]_{\overline{\sigma}_j,-}+\frac{1}{M_j-2}\left[\bm{B}\right]_{\underline{\sigma}_j,+}\right) \quad \left(\sigma_j\neq\overline{\sigma}_j,\underline{\sigma}_j\right), \qquad (27)$$

    where we have defined

    $$W := \max_{\bm{\sigma}\in\mathcal{A}}\left|B(\bm{\sigma})\right| \neq 0. \qquad (28)$$

    We can easily check that these vectors indeed satisfy the conditions (8) and (9) for strategies. In addition, the condition (5) is also satisfied because

    $$\sum_{\sigma_j}\hat{\bm{T}}_j\left(\sigma_j\right) = \bm{0}. \qquad (29)$$

    Furthermore, the strategy (27) satisfies

    $$\hat{\bm{T}}_j\left(\overline{\sigma}_j\right)-\hat{\bm{T}}_j\left(\underline{\sigma}_j\right) = \frac{1}{W}\bm{B}, \qquad (30)$$

    where we have used Eq. (26). Therefore, the strategy (27) is a ZD strategy.

  (ii) $M_j=2$: In this case, the two actions of player $j$ are $\overline{\sigma}_j$ and $\underline{\sigma}_j$. We set the strategy of player $j$ as

    $$\hat{\bm{T}}_j\left(\overline{\sigma}_j\right) = \frac{1}{W}\left(\left[\bm{B}\right]_{\overline{\sigma}_j,-}+\left[\bm{B}\right]_{\underline{\sigma}_j,+}\right), \qquad \hat{\bm{T}}_j\left(\underline{\sigma}_j\right) = -\hat{\bm{T}}_j\left(\overline{\sigma}_j\right), \qquad (31)$$

    where $W$ is defined by Eq. (28). We can easily check that these vectors indeed satisfy the conditions (5), (8), and (9) for strategies. In addition, the strategy (31) satisfies

    $$\hat{\bm{T}}_j\left(\overline{\sigma}_j\right) = \frac{1}{W}\bm{B}, \qquad (32)$$

    where we have used Eq. (26). Therefore, the strategy (31) is a ZD strategy.

$\Box$
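To make the sufficiency construction concrete, the following numerical sketch (our own; it assumes player 1 controls, $M_1=3$, and a randomly generated $\bm{B}$ satisfying Eq. (14)) builds the vectors (27) and verifies Eqs. (5), (8), (9), and (30).

```python
import numpy as np

# Verify the construction (27) for player 1 with M_1 = 3 actions, using
# a random B that satisfies Eq. (14) with sigma_bar = 0, sigma_under = 1.
M1, M2 = 3, 3
rng = np.random.default_rng(0)
B = rng.uniform(-1.0, 1.0, (M1, M2))
B[0] = -np.abs(B[0])                  # row sigma_bar: nonpositive
B[1] = np.abs(B[1])                   # row sigma_under: nonnegative
W = np.abs(B).max()                   # Eq. (28)

def part(B, a, sign):
    """[B]_{a,sign} of Eq. (24): keep row a's entries of the given sign."""
    out = np.zeros_like(B)
    keep = (B[a] > 0) if sign == '+' else (B[a] < 0)
    out[a, keep] = B[a, keep]
    return out

others = range(2, M1)
T_hat = {0: (sum(part(B, a, '+') for a in others) + part(B, 0, '-')) / W,
         1: -(sum(part(B, a, '-') for a in others) + part(B, 1, '+')) / W}
for a in others:
    T_hat[a] = (-part(B, a, '+') + part(B, a, '-')
                - part(B, 0, '-') / (M1 - 2)
                + part(B, 1, '+') / (M1 - 2)) / W

assert np.allclose(sum(T_hat.values()), 0.0)       # Eq. (5) / Eq. (29)
assert np.allclose(T_hat[0] - T_hat[1], B / W)     # Eq. (30)
for a in range(M1):
    assert np.all(T_hat[a][a] <= 0)                # Eq. (8), own row
    assert np.all(np.delete(T_hat[a], a, axis=0) >= 0)
assert all(np.abs(v).max() <= 1.0 for v in T_hat.values())   # Eq. (9)
```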

3.2 Example

In this subsection, we construct a ZD strategy for the repeated prisoner's dilemma game. The prisoner's dilemma game is a two-player two-action symmetric game with the following payoffs:

$$\bm{s}_1 := \left(s_1(1,1),s_1(1,2),s_1(2,1),s_1(2,2)\right)^{\mathsf{T}} = \left(R,S,T,P\right)^{\mathsf{T}},$$
$$\bm{s}_2 := \left(s_2(1,1),s_2(1,2),s_2(2,1),s_2(2,2)\right)^{\mathsf{T}} = \left(R,T,S,P\right)^{\mathsf{T}}, \qquad (33)$$

where $T>R>P>S$ and $2R>T+S$. (The actions $1$ and $2$ correspond to cooperation and defection, respectively.) If we consider the quantity $\bm{B}=\sum_{k=0}^{2}\alpha_k\bm{s}_k$ with $\alpha_1=0$ and $\alpha_2=1$, we obtain

$$\bm{B} = \left(R+\alpha_0,T+\alpha_0,S+\alpha_0,P+\alpha_0\right)^{\mathsf{T}}. \qquad (34)$$

Then, if we choose $\alpha_0\in[-R,-P]$, we find that the actions $1$ and $2$ of player $1$ satisfy the condition of Theorem 1 with $\overline{\sigma}_1=2$ and $\underline{\sigma}_1=1$. Therefore, we conclude that the repeated prisoner's dilemma game contains at least one ZD strategy, which is a well-known result. By using the construction method in the proof of Theorem 1, $\bm{B}$ is decomposed into

$$\left[\bm{B}\right]_{2,-} = \left(0,0,S+\alpha_0,P+\alpha_0\right)^{\mathsf{T}},$$
$$\left[\bm{B}\right]_{1,+} = \left(R+\alpha_0,T+\alpha_0,0,0\right)^{\mathsf{T}}, \qquad (35)$$

and the ZD strategy is

$$\hat{\bm{T}}_1\left(2\right) = \frac{1}{W}\bm{B}, \qquad \hat{\bm{T}}_1\left(1\right) = -\hat{\bm{T}}_1\left(2\right) \qquad (36)$$

with $W:=\max_{\bm{\sigma}\in\mathcal{A}}\left|B(\bm{\sigma})\right|$. It should be noted that this ZD strategy is called the equalizer strategy, and it unilaterally enforces $\left\langle s_2\right\rangle^{*}=-\alpha_0$ [1].
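The enforced relation can also be observed in simulation. The following sketch (our own; the payoff values $(R,S,T,P)=(3,0,5,1)$, the choice $\alpha_0=-2$, and the opponent's memory-one strategy are hypothetical choices) plays the equalizer strategy (36) against an arbitrary opponent and confirms that player 2's mean payoff approaches $-\alpha_0$.

```python
import numpy as np

# Simulate the equalizer strategy (36) with hypothetical payoffs
# (R, S, T, P) = (3, 0, 5, 1) and alpha_0 = -2 in [-R, -P] = [-3, -1].
R, S, T, P = 3.0, 0.0, 5.0, 1.0
alpha0 = -2.0

# States are ordered (1,1), (1,2), (2,1), (2,2) as in Eq. (33).
B = np.array([R, T, S, P]) + alpha0           # Eq. (34): B = s_2 + alpha_0
W = np.abs(B).max()

# Eq. (36) gives T_hat_1(1|.) = -B/W, so player 1 plays action 1
# (cooperate) with probability 1{own previous action = 1} - B/W.
p_coop = np.array([1.0, 1.0, 0.0, 0.0]) - B / W
assert np.all((0.0 <= p_coop) & (p_coop <= 1.0))

s2_vec = np.array([R, T, S, P])               # player 2's payoff per state
q_coop = np.array([0.9, 0.1, 0.6, 0.3])       # arbitrary opponent strategy

rng = np.random.default_rng(1)
state, total, steps = 0, 0.0, 200_000
for _ in range(steps):
    a1 = rng.random() >= p_coop[state]        # False = cooperate
    a2 = rng.random() >= q_coop[state]
    state = 2 * a1 + a2
    total += s2_vec[state]
print(total / steps, "vs enforced value", -alpha0)   # both close to 2
```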

3.3 Relation to other classes of stage games

Theorem 1 can be used to define a class of stage games where ZD strategies exist. In this paper, we call this class ZD games. A natural question is the relation between ZD games and other classes of stage games, such as potential games and totally symmetric games.

Here, we restrict our attention to two-player symmetric games. In other words, the payoffs satisfy $s_2(\sigma_1,\sigma_2)=s_1(\sigma_2,\sigma_1)$ $(\forall\bm{\sigma})$. We also write the set of actions as $A_1=A_2=A:=\{1,\cdots,L\}$, where $L$ is the common number of actions. We first introduce the concept of generalized rock-paper-scissors games.

Definition 2 ([25, 18])

A stage game is a generalized rock-paper-scissors (gRPS) game if it contains at least one subset of the action space $A'\subseteq A$ such that for all $\sigma_1\in A'$ there exists $\sigma_2\in A'$ such that $s_1^{(\mathrm{A})}\left(\sigma_1,\sigma_2\right)<0$, where $s_1^{(\mathrm{A})}(\sigma_1,\sigma_2):=\left[s_1(\sigma_1,\sigma_2)-s_1(\sigma_2,\sigma_1)\right]/2$ is the anti-symmetric part of $s_1$.

We call the complement of the set of gRPS games within all two-player symmetric games the non-gRPS games; a brute-force test of the gRPS property is sketched below. We now prove the following theorem on the relation between non-gRPS games and ZD games.
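Whether a small game is a gRPS game can be tested by brute force over subsets. The following sketch (our own, exponential in $L$ and intended only for illustration) checks Definition 2 directly.

```python
import numpy as np
from itertools import combinations

# Test whether a two-player symmetric game with payoff matrix s1 is a
# gRPS game in the sense of Definition 2.
def is_grps(s1):
    s1 = np.asarray(s1, dtype=float)
    anti = (s1 - s1.T) / 2.0                  # anti-symmetric part s_1^(A)
    L = s1.shape[0]
    for r in range(2, L + 1):                 # singletons never qualify
        for subset in combinations(range(L), r):
            # A' qualifies if every action in it is beaten by some
            # action in it (negative anti-symmetric payoff).
            if all(any(anti[a, b] < 0 for b in subset) for a in subset):
                return True
    return False

rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]    # rock-paper-scissors
pd = [[3, 0], [5, 1]]                         # prisoner's dilemma
print(is_grps(rps), is_grps(pd))              # True False
```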

Theorem 2

If a stage game is not a gRPS game, then it is a ZD game.

Proof. If a stage game is not a gRPS game, then, for all $A'\subseteq A$, there exists an action $\sigma_1\in A'$ such that $s_1^{(\mathrm{A})}\left(\sigma_1,\sigma_2\right)\geq 0$ $\left(\forall\sigma_2\in A'\right)$. Such an action $\sigma_1$ is an unbeatable action in $A'$. It should be noted that $A'$ can be $A$ itself. We now construct a sequence of unbeatable actions as follows. First, $\sigma^{*(1)}$ is an unbeatable action when the action space is $A$, that is,

$$s_1^{(\mathrm{A})}\left(\sigma^{*(1)},\sigma_2\right) \geq 0 \quad \left(\forall\sigma_2\in A\right). \qquad (37)$$

Then, for $2\leq l\leq L$, we recursively define the set $A^{(l)}:=A\backslash\left\{\sigma^{*(1)},\cdots,\sigma^{*(l-1)}\right\}$ and an action $\sigma^{*(l)}\in A^{(l)}$ such that

$$s_1^{(\mathrm{A})}\left(\sigma^{*(l)},\sigma_2\right) \geq 0 \quad \left(\forall\sigma_2\in A^{(l)}\right). \qquad (38)$$

We also write $A^{(1)}:=A$. We remark that such a sequence $\left\{\sigma^{*(l)}\right\}_{l=1}^{L}$ is well-defined due to the defining property of non-gRPS games.

Because $s_1^{(\mathrm{A})}(\sigma_1,\sigma_2)=\left[s_1(\sigma_1,\sigma_2)-s_2(\sigma_1,\sigma_2)\right]/2$ for two-player symmetric games, we can see that $\underline{\sigma}_1=\sigma^{*(1)}$ satisfies the condition of Theorem 1:

$$s_1\left(\sigma^{*(1)},\sigma_2\right)-s_2\left(\sigma^{*(1)},\sigma_2\right) \geq 0 \quad \left(\forall\sigma_2\in A\right). \qquad (39)$$

Next, we prove that $\overline{\sigma}_1=\sigma^{*(L)}$ satisfies

$$s_1\left(\sigma^{*(L)},\sigma_2\right)-s_2\left(\sigma^{*(L)},\sigma_2\right) \leq 0 \quad \left(\forall\sigma_2\in A\right). \qquad (40)$$

Assume to the contrary that

$$s_1^{(\mathrm{A})}\left(\sigma^{*(L)},\sigma_2\right) > 0 \quad (\exists\sigma_2\in A). \qquad (41)$$

(Because $s_1^{(\mathrm{A})}$ is an anti-symmetric part, $\sigma_2\neq\sigma^{*(L)}$.) This is rewritten as

$$s_1^{(\mathrm{A})}\left(\sigma_2,\sigma^{*(L)}\right) < 0 \quad (\exists\sigma_2\in A). \qquad (42)$$

However, since $\sigma_2\in\left\{\sigma^{*(1)},\cdots,\sigma^{*(L-1)}\right\}$ and $\sigma^{*(L)}\in A^{(l)}$ for all $1\leq l\leq L$, this contradicts Eq. (38). Therefore, we conclude that Eq. (40) indeed holds. We now find that Eqs. (39) and (40) correspond to the condition (14) for the existence of ZD strategies in Theorem 1, with $\alpha_0=0$, $\alpha_1=1$, and $\alpha_2=-1$. We remark that the linear relation enforced by this ZD strategy is $\left\langle s_1\right\rangle^{*}=\left\langle s_2\right\rangle^{*}$. $\Box$
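The recursive construction in this proof is easy to make algorithmic. The following sketch (our own; the prisoner's dilemma payoffs $(R,S,T,P)=(3,0,5,1)$ serve as a hypothetical non-gRPS example) extracts the sequence of unbeatable actions $\sigma^{*(1)},\cdots,\sigma^{*(L)}$.

```python
import numpy as np

# Repeatedly pick an unbeatable action within the remaining action set,
# as in the proof of Theorem 2.
def unbeatable_sequence(s1):
    """Return sigma^{*(1)}, ..., sigma^{*(L)} (0-indexed) for a two-player
    symmetric game with payoff matrix s1; raises for gRPS games."""
    s1 = np.asarray(s1, dtype=float)
    anti = (s1 - s1.T) / 2.0                 # s_1^(A)
    remaining = list(range(s1.shape[0]))
    order = []
    while remaining:
        for a in remaining:
            if all(anti[a, b] >= 0 for b in remaining):
                order.append(a)
                remaining.remove(a)
                break
        else:
            raise ValueError("gRPS game: some subset has no unbeatable action")
    return order

# Prisoner's dilemma with (R,S,T,P) = (3,0,5,1): defection (index 1) is
# unbeatable first, so sigma_under = 1 and sigma_bar = 0 (cooperation).
print(unbeatable_sequence([[3, 0], [5, 1]]))   # [1, 0]
```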

It should be noted that an unbeatable imitation strategy exists if and only if a stage game is not a gRPS game [18]. Theorem 2 also constructs an unbeatable ZD strategy, which unilaterally enforces $\left\langle s_1\right\rangle^{*}=\left\langle s_2\right\rangle^{*}$, for non-gRPS games. Both results imply that, in non-gRPS games, it is not easy for players to exploit their opponents. We also note that two-player symmetric potential games are a subset of non-gRPS games [18]. Therefore, our result directly implies the existence of ZD strategies in two-player symmetric potential games [20].

We finally remark that the converse of Theorem 2 is not true. That is, ZD strategies can exist in some gRPS games. An example is the game in Table 1, which is a modified version of the RPS game.

Table 1: A gRPS game with a ZD strategy. Rows are player 1's actions, columns are player 2's actions, and each entry is (player 1's payoff, player 2's payoff).

                         $R$     $P$     $S$     $\overline{\sigma}$  $\underline{\sigma}$
$R$                      0,0    -1,1     1,-1    0,0    0,0
$P$                      1,-1    0,0    -1,1     0,0    0,0
$S$                     -1,1     1,-1    0,0     0,0    0,0
$\overline{\sigma}$      0,0     0,0     0,0     0,0    0,0
$\underline{\sigma}$     0,0     0,0     0,0     0,0    0,0

Although this game contains a gRPS cycle for $A'=\{R,P,S\}$, it is also a ZD game, since $\overline{\sigma}$ and $\underline{\sigma}$ can be taken as the two actions in Theorem 1.
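As a check (our own computation), the game in Table 1 indeed satisfies condition (14) with $\alpha=(0,1,-1)$, even though its restriction to $\{R,P,S\}$ is a gRPS cycle:

```python
import numpy as np

# Condition (14) for the Table 1 game with alpha = (0, 1, -1): the rows
# for sigma_bar and sigma_under (indices 3 and 4) make B = s_1 - s_2
# vanish, while B itself is not identically zero.
s1 = np.array([[ 0, -1,  1, 0, 0],
               [ 1,  0, -1, 0, 0],
               [-1,  1,  0, 0, 0],
               [ 0,  0,  0, 0, 0],
               [ 0,  0,  0, 0, 0]], dtype=float)
s2 = s1.T                                        # symmetric game
B = s1 - s2
assert np.all(B[3] <= 0) and np.all(B[4] >= 0)   # Eq. (14)
assert not np.allclose(B, 0)                     # not identically zero
print("the Table 1 game is a ZD game")
```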

4 Concluding Remarks

In this paper, we have provided a necessary and sufficient condition for the existence of ZD strategies in repeated games (Theorem 1). This condition exactly means the existence of two actions which unilaterally increase and decrease, respectively, the total value of $\sum_{k=0}^{N}\alpha_k s_k$. We have now found that such a property is necessary for the unilateral control of payoffs by ZD strategies. In fact, we can easily check that the rock-paper-scissors game does not contain two such actions as in Theorem 1, which leads to the absence of ZD strategies [11]. From another point of view, stage games satisfying the condition of Theorem 1 can be regarded as a class allowing the existence of ZD strategies. We also provided a relation between this class of stage games (ZD games) and non-gRPS games for the case of two-player symmetric games. Further investigation of the relation between ZD games and other classes of stage games is needed.

We have investigated only the situation in which the discount factor is $\delta=1$ and monitoring is perfect. In general, the set of possible ZD strategies shrinks as $\delta$ decreases and as monitoring becomes imperfect [9, 8, 11, 26]. In particular, the existence of ZD strategies in games with imperfect monitoring will depend strongly on the set of signals of each player. Investigation of a necessary and sufficient condition for the existence of ZD strategies in games with discounting and imperfect monitoring is an important subject of future work. It would also be interesting to ask whether our result can be extended to memory-$m$ ZD strategies with $m\geq 2$ [15].

Acknowledgments

This study was supported by JSPS KAKENHI Grant Number JP20K19884 and Inamori Research Grants.

References

  • [1] W. H. Press and F. J. Dyson: Proceedings of the National Academy of Sciences 109 (2012) 10409.
  • [2] E. Akin: Ergodic Theory, Advances in Dynamical Systems (2016) 77.
  • [3] C. Hilbe, B. Wu, A. Traulsen, and M. A. Nowak: Proceedings of the National Academy of Sciences 111 (2014) 16425.
  • [4] L. Pan, D. Hao, Z. Rong, and T. Zhou: Scientific Reports 5 (2015) 13096.
  • [5] A. McAvoy and C. Hauert: Proceedings of the National Academy of Sciences 113 (2016) 3573.
  • [6] X. He, H. Dai, P. Ning, and R. Dutta: IEEE Signal Processing Letters 23 (2016) 311.
  • [7] C. Hilbe, A. Traulsen, and K. Sigmund: Games and Economic Behavior 92 (2015) 41.
  • [8] G. Ichinose and N. Masuda: Journal of Theoretical Biology 438 (2018) 61.
  • [9] D. Hao, Z. Rong, and T. Zhou: Phys. Rev. E 91 (2015) 052803.
  • [10] A. Mamiya and G. Ichinose: Journal of Theoretical Biology 477 (2019) 63.
  • [11] M. Ueda and T. Tanaka: PLOS ONE 15 (2020) e0230973.
  • [12] A. Mamiya and G. Ichinose: Phys. Rev. E 102 (2020) 032115.
  • [13] A. McAvoy and C. Hauert: Theoretical Population Biology 113 (2017) 13.
  • [14] M. Ueda: Journal of the Physical Society of Japan 90 (2021) 025002.
  • [15] M. Ueda: Royal Society Open Science 8 (2021) 202186.
  • [16] M. Ueda: arXiv preprint arXiv:2012.10231 (2020).
  • [17] A. J. Stewart and J. B. Plotkin: Proceedings of the National Academy of Sciences 110 (2013) 15348.
  • [18] P. Duersch, J. Oechssler, and B. C. Schipper: Games and Economic Behavior 76 (2012) 88.
  • [19] P. Duersch, J. Oechssler, and B. C. Schipper: International Journal of Game Theory 43 (2014) 25.
  • [20] M. Ueda: Journal of the Physical Society of Japan 91 (2022) 054804.
  • [21] M. A. Taha and A. Ghoneim: Applied Mathematics and Computation 369 (2020) 124862.
  • [22] A. Plan: University of Arizona Economics Working Paper (2017) 17.
  • [23] D. Monderer and L. S. Shapley: Games and Economic Behavior 14 (1996) 124.
  • [24] G. J. Mailath and L. Samuelson: Repeated games and reputations: long-run relationships (Oxford University Press, 2006).
  • [25] P. Duersch, J. Oechssler, and B. C. Schipper: International Journal of Game Theory 41 (2012) 553.
  • [26] A. Mamiya, D. Miyagawa, and G. Ichinose: Journal of Theoretical Biology 526 (2021) 110810.