
a Department of Mathematics and Statistics & Centre for Quantum Topology and Its Applications (quanTA), University of Saskatchewan, Saskatoon, Saskatchewan S7N 5E6, Canada
b Department of Physics, Osaka University, Toyonaka, Osaka 560-0043, Japan

Infinitely Repeated Quantum Games and Strategic Efficiency

Kazuki Ikeda^a ([email protected]) and Shoto Aoki^b
Abstract

Repeated quantum game theory addresses long-term relations among players who choose quantum strategies. In conventional quantum game theory, single-round quantum games or at most finitely repeated games have been widely studied; however, less is known about infinitely repeated quantum games. Investigating infinitely repeated games is crucial since finitely repeated games do not differ much from single-round games. In this work we establish the concept of general repeated quantum games and show the Quantum Folk Theorem, which claims that by iterating a game one can find an equilibrium strategy of the game and receive a reward that is not obtained by a Nash equilibrium of the corresponding single-round quantum game. A significant difference between the repeated quantum prisoner's dilemma and the repeated classical prisoner's dilemma is that the classical Pareto optimal solution is not always an equilibrium of the repeated quantum game when entanglement is sufficiently strong. When entanglement is sufficiently strong and the reward is small, mutual cooperation cannot be an equilibrium of the repeated quantum game. In addition we present several concrete equilibrium strategies of the repeated quantum prisoner's dilemma.

1 Introduction and Summary

In our study of repeated quantum games, we aim to explore long-term relations among players and efficient quantum strategies that may maximize their payoffs and can become equilibria of the repeated game. The first work on an infinitely repeated quantum game was presented by one of the authors in 2020QuIP…19…25I , where infinitely repeated quantum games are explained in terms of quantum automata, and optimal transformations and equilibria of some strategies are provided. That work is a quantum extension of a classical prisoner's dilemma in which behavior is recognized by signals that can be monitored imperfectly or perfectly. To explore more of the quantum properties of games, we give a different formulation of infinitely repeated games and investigate a long-term relation between players. In this article we give a general definition of infinitely repeated quantum games. In particular, we address an infinitely repeated quantum prisoner's dilemma and show the folk theorem of the repeated quantum game. The main contributions of our work to quantum game theory are Theorem 3.7 and Theorem 3.8. In Theorem 3.7, we claim that mutual cooperation cannot be an equilibrium when players are maximally entangled and the reward for cooperating is not large enough. This fact distinguishes a repeated quantum game from a repeated classical game; note that mutual cooperation is an equilibrium of the classical repeated prisoner's dilemma. In Theorem 3.8, we present the quantum version of the Folk theorem, which assures the existence of equilibrium strategies for any degree of entanglement. Our model will also be useful for discussing complex dynamic and evolutionary processes in many economic or interdisciplinary fields from a game-theoretical perspective. In economics, the negotiation of a contract can be regarded as a repeated game if it continues for a long time, and recent Internet auctions are basically repeated games. Most recently, a quantum model of contracts, moral hazard, and incentive contracts was first formulated in Ikeda_cat2021 . It would be very interesting to consider such a model in terms of repeated games. Furthermore, considering the evolutionary nature of ecological systems will be a new challenge doi:10.1098/rspa.2021.0397 . For further motivation to consider repeated quantum games, please consult 2020QuIP…19…25I ; 2020arXiv201014098A for example.

In classical games, players try to maximize their rewards by choosing strategies based on opponents' signals obtained by measurement. In the single-stage prisoner's dilemma, mutual cooperation is not a Nash equilibrium Nash48 but is Pareto optimal; however, when repetition of the game is allowed, such a Pareto optimal strategy can be an equilibrium of the infinitely repeated game. A study of an infinitely repeated game aims at finding such a non-trivial equilibrium solution that can be established in a long-term relationship. From this viewpoint, repeated games play a fundamental role in a fairly large part of modern economics. Hence repeated quantum games will become important when quantum systems become prevalent in many parts of the near-future society.

There are various definitions of quantum games Eisert:1998vv ; 1999PhRvL..82.1052M , and establishing their foundation remains an open problem. Indeed, many authors work on quantum games with various motivations Li2013 ; Li13 . Recent advances in quantum games are reviewed in 2018arXiv180307919S . Most of the conventional works on quantum games focus on single-stage quantum games, where each player plays a quantum strategy only once. However, in a practical situation a game is more likely to be played repeatedly, hence it is more meaningful to address repeated quantum games. Repeated games are categorized into (1) finitely repeated games and (2) infinitely repeated games. However, equilibria of finitely repeated classical games are exactly the same as those of single-round classical games. This is also true for at most finitely repeated quantum games, and the Nash equilibrium of the finitely repeated quantum prisoner's dilemma is the same as in the single-stage case 2002PhLA..300..541I ; 2012JPhA…45h5307F .

On the other hand, in infinitely repeated (classical) games or games of unknown length, there is no fixed optimal strategy, hence they are very different from single-stage games 10.2307/1911077 ; 10.2307/1911307 . It is widely known that in the infinitely repeated (classical) prisoner's dilemma, mutual cooperation can be an equilibrium. So it is natural to investigate similar strategic behavior of quantum players, but little has been known about infinitely repeated quantum games. To the best of our knowledge, this is the first article that gives a formal definition of infinitely repeated quantum games and investigates the general Folk theorem of the infinitely repeated prisoner's dilemma.

This work is organized as follows. In Sec. 2, we address the single-round quantum prisoner's dilemma and investigate equilibria of the game. We first prepare the terminology and concepts used in this work. We present novel relations between payoff and entanglement in Sec. 2.2. In Sec. 3 we establish the concept of a generic repeated quantum game (Definition 3.1) and present some equilibrium strategies (Lemmas 3.3 and 3.5). Our main results (Theorems 3.7 and 3.8) are described in Sec. 3.3. This work is concluded in Sec. 4.

2 Single Round Quantum Games

2.1 Setup

We consider the quantum prisoner's dilemma Eisert:1998vv ; 1999PhRvL..82.1052M . Let $\ket{C},\ket{D}$ be two normalized orthogonal states that represent "Cooperative" and "Non-cooperative" states, respectively. Then a general state of the game is a complex linear combination spanned by the basis vectors $\ket{CC},\ket{CD},\ket{DC},\ket{DD}$. Each player $i$ chooses a unitary strategy $U_{i}$ and the game is in a state

\ket{\psi}=\mathcal{J}^{\dagger}(U_{A}\otimes U_{B})\mathcal{J}\ket{CC}, (1)

where $\mathcal{J}$ introduces entanglement. In what follows we use

\mathcal{J}=\exp\left[i\frac{\theta}{2}\,Y\otimes Y\right]=\cos\frac{\theta}{2}+i\sin\frac{\theta}{2}\,(Y\otimes Y), (2)

where $\theta$ parametrizes the entanglement; the two states become maximally entangled at $\theta=\frac{\pi}{2}$. We assume a quantum strategy of a player $i$ has the representation

U_{i}=\alpha_{i}I+i\left(\beta_{i}X+\gamma_{i}Y+\delta_{i}Z\right)\quad(\alpha_{i},\cdots,\delta_{i}\in\mathbb{R}), (3)

where the coefficients obey $\alpha^{2}_{i}+\cdots+\delta_{i}^{2}=1$.

         Bob: C   Bob: D
Alice: C  (r, r)   (s, t)
Alice: D  (t, s)   (p, p)
Table 1: Payoff matrix of the prisoner's dilemma. $p,r,s$ and $t$ satisfy $t>r>p>s\geq 0$.

Alice receives the payoff $\$_{A}=\bra{\psi}\hat{\$}_{A}\ket{\psi}$, where $\hat{\$}_{A}$ is her payoff operator

\hat{\$}_{A}=r\ket{CC}\bra{CC}+p\ket{DD}\bra{DD}+t\ket{DC}\bra{DC}+s\ket{CD}\bra{CD}. (4)

Without loss of generality, we can set $s=0$ if necessary. Taking into account $U_{A}$ of Eq. (3), the average payoff $\$_{A}=\$_{A}(x_{A},x_{B})$ of Alice is a function of $x_{i}=(\alpha_{i},\cdots,\delta_{i})$:

\begin{split}\$_{A}(x_{A},x_{B})&=r(\alpha_{A}\alpha_{B}-\delta_{A}\delta_{B}+\sin\theta\,(\beta_{A}\gamma_{B}+\gamma_{A}\beta_{B}))^{2}\\&\quad+p(\beta_{A}\beta_{B}-\gamma_{A}\gamma_{B}+\sin\theta\,(\alpha_{A}\delta_{B}+\delta_{A}\alpha_{B}))^{2}\\&\quad+t(\gamma_{A}\alpha_{B}+\beta_{A}\delta_{B}+\sin\theta\,(-\alpha_{A}\beta_{B}+\delta_{A}\gamma_{B}))^{2}\\&\quad+\cos^{2}\theta\left(r(\alpha_{A}\delta_{B}+\delta_{A}\alpha_{B})^{2}+p(\beta_{A}\gamma_{B}+\gamma_{A}\beta_{B})^{2}+t(\beta_{A}\alpha_{B}-\gamma_{A}\delta_{B})^{2}\right).\end{split} (5)

Since the game is symmetric between the two players, $\$_{A}(x_{A},x_{B})=\$_{B}(x_{B},x_{A})$ is satisfied.
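The payoffs in this section can be checked numerically. The following is a minimal sketch (the helper names are ours, not the paper's) that builds $\mathcal{J}$ of Eq. (2), a strategy of Eq. (3), and Alice's payoff of Eq. (4), reproducing Eq. (5) for the sample parameters $(r,t,s,p)=(3,5,0,1)$.

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def J(theta):
    """Entangling operator J = exp(i*(theta/2) Y (x) Y) of Eq. (2)."""
    return np.cos(theta / 2) * np.eye(4) + 1j * np.sin(theta / 2) * np.kron(Y, Y)

def strategy(alpha, beta, gamma, delta):
    """Pure quantum strategy of Eq. (3); (alpha,...,delta) must lie on S^3."""
    return alpha * I2 + 1j * (beta * X + gamma * Y + delta * Z)

def payoff_A(UA, UB, theta, r=3, t=5, s=0, p=1):
    """Alice's average payoff <psi|$_A|psi> for the state of Eq. (1)."""
    ket_CC = np.array([1, 0, 0, 0], dtype=complex)   # |CC>
    psi = J(theta).conj().T @ np.kron(UA, UB) @ J(theta) @ ket_CC
    probs = np.abs(psi) ** 2                         # |CC>, |CD>, |DC>, |DD>
    return r * probs[0] + s * probs[1] + t * probs[2] + p * probs[3]

print(payoff_A(I2, I2, theta=np.pi / 2))  # mutual cooperation pays r = 3.0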

Definition 2.1.

A quantum strategy $U$ is called pure if a player applies one fixed single-qubit unitary operator.

Definition 2.2.

A quantum strategy $U$ is called mixed if a player chooses one of $\{I,X,Y,Z\}$ with some probability $(p_{I},p_{X},p_{Y},p_{Z})$.¹

¹The most general definition of a single-qubit mixed strategy $U$ is $U=\int_{U(2)}p(g)\,g\,d\mu(g)$, (6) where $d\mu$ is a Haar measure on $U(2)$ and $p(g)$ is non-negative and satisfies $\int_{U(2)}p(g)d\mu(g)=1$. Throughout this article, we intend to explore mixed strategies based on Def. 2.2.

Although our mixed strategies are probabilistic mixtures of the four Pauli operators $\{I,X,Y,Z\}$, we show that the game is rich enough to yield stronger results than the classical prisoner's dilemma with mixed strategies.

The classical pure-strategy prisoner's dilemma allows players to choose one of $\{I,X\}$, and it becomes a mixed-strategy game if the state is written as

\ket{\psi}=\sum_{i\in\{CC,CD,DC,DD\}}p_{i}\ket{i}, (7)

where $p_{i}$ is non-negative and satisfies $\sum_{i}p_{i}=1$. Note that for a single-round quantum game, a generic state is represented as

\ket{\psi}=\sum_{i\in\{CC,CD,DC,DD\}}c_{i}\ket{i}, (8)

where $c_{i}$ is a complex number and a state $\ket{i}$ is measured with probability $|c_{i}|^{2}$. However, as long as one considers a single-round game, the quantum game resembles the classical mixed-strategy game, where a strategy is chosen stochastically with a given probability distribution.

Considering repeated quantum games is a way to make quantum games much more meaningful, since the transition of quantum states differs from that of classical states. To begin with, we first address the single-round quantum prisoner's dilemma and equilibria of the game.

2.2 Equilibria

2.2.1 Without Entanglement $\theta=0$

The case without entanglement corresponds to the classical prisoner’s dilemma. Table 2 presents the correspondence between operators and classical actions.

     I      X      Y      Z
I    (C,C)  (C,D)  (C,D)  (C,C)
X    (D,C)  (D,D)  (D,D)  (D,C)
Y    (D,C)  (D,D)  (D,D)  (D,C)
Z    (C,C)  (C,D)  (C,D)  (C,C)
Table 2: Operators and actions for the case without entanglement $\theta=0$. Rows are Alice's operators and columns are Bob's.

Then the strategy is simply a mixed strategy over the pure strategies $\{C,D\}$; therefore the average payoff of Alice is

\$_{A}=r(\alpha_{A}^{2}+\delta_{A}^{2})(\alpha_{B}^{2}+\delta_{B}^{2})+p(\beta_{A}^{2}+\gamma_{A}^{2})(\beta_{B}^{2}+\gamma_{B}^{2})+t(\beta_{A}^{2}+\gamma_{A}^{2})(\alpha_{B}^{2}+\delta_{B}^{2}). (9)

As a result, $U_{A}=i(\beta_{A}X+\gamma_{A}Y)$, $U_{B}=i(\beta_{B}X+\gamma_{B}Y)$ is the equilibrium.
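This classical limit is immediate to check with the numerical sketch of Sec. 2.1:

# Continuing the sketch above: at theta = 0 the game reduces to Table 2,
# and defection (any mixture of X and Y) is Alice's dominant reply.
print(payoff_A(X, I2, theta=0.0))   # defect vs. cooperate pays t = 5.0
print(payoff_A(I2, I2, theta=0.0))  # mutual cooperation pays  r = 3.0
print(payoff_A(X, X, theta=0.0))    # mutual defection pays    p = 1.0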

2.2.2 Maximally Entangled $\theta=\pi/2$

In contrast to the case without entanglement, the strategy part of the maximally entangled case becomes

\mathcal{J}^{\dagger}(I\otimes X)\mathcal{J}=(I\otimes X)\mathcal{J}^{2}=-Y\otimes Z, (10)
\mathcal{J}^{\dagger}(I\otimes Y)\mathcal{J}=I\otimes Y. (11)

Therefore the correspondence between the operators and actions becomes non-trivial as exhibited in Table 3.

     I      X      Y      Z
I    (C,C)  (D,C)  (C,D)  (D,D)
X    (C,D)  (D,D)  (C,C)  (D,C)
Y    (D,C)  (C,C)  (D,D)  (C,D)
Z    (D,D)  (C,D)  (D,C)  (C,C)
Table 3: Operators and actions for the maximally entangled case $\theta=\frac{\pi}{2}$. Rows are Alice's operators and columns are Bob's.
Pure Quantum Strategy

We first consider the pure strategy game. The average payoff of Alice is

\begin{split}\$_{A}(x_{A},x_{B})&=r(\alpha_{A}\alpha_{B}-\delta_{A}\delta_{B}+\beta_{A}\gamma_{B}+\gamma_{A}\beta_{B})^{2}\\&\quad+p(\beta_{A}\beta_{B}-\gamma_{A}\gamma_{B}+\alpha_{A}\delta_{B}+\delta_{A}\alpha_{B})^{2}\\&\quad+t(\gamma_{A}\alpha_{B}+\beta_{A}\delta_{B}-\alpha_{A}\beta_{B}+\delta_{A}\gamma_{B})^{2}.\end{split} (12)

This can be written as

\$_{A}=r(\alpha_{A}^{\prime})^{2}+p(\beta_{A}^{\prime})^{2}+t(\gamma_{A}^{\prime})^{2}, (13)

where

\begin{split}\alpha_{A}^{\prime}&=\alpha_{A}\alpha_{B}-\delta_{A}\delta_{B}+\beta_{A}\gamma_{B}+\gamma_{A}\beta_{B},\\ \beta_{A}^{\prime}&=\beta_{A}\beta_{B}-\gamma_{A}\gamma_{B}+\alpha_{A}\delta_{B}+\delta_{A}\alpha_{B},\\ \gamma_{A}^{\prime}&=\gamma_{A}\alpha_{B}+\beta_{A}\delta_{B}-\alpha_{A}\beta_{B}+\delta_{A}\gamma_{B},\\ \delta_{A}^{\prime}&=\gamma_{B}\alpha_{A}+\beta_{B}\delta_{A}-\alpha_{B}\beta_{A}+\delta_{B}\gamma_{A}.\end{split} (14)

Note that equation (14) is a coordinate transformation on $S^{3}$; hence for every $x_{B}\in S^{3}$ there is an $x_{A}\in S^{3}$ that makes $\gamma_{A}^{\prime}=1$. In other words, Alice can find a stronger strategy for any strategy of Bob, and vice versa. Therefore there is no equilibrium in this case. In this sense the game looks like "Rock-paper-scissors".
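The counter-strategy can be written down explicitly: from Eq. (14), the choice $x_{A}=(-\beta_{B},\delta_{B},\alpha_{B},\gamma_{B})$ gives $\gamma_{A}^{\prime}=\alpha_{B}^{2}+\beta_{B}^{2}+\gamma_{B}^{2}+\delta_{B}^{2}=1$, so Alice always extracts the temptation payoff $t$. A quick numerical check, reusing the sketch of Sec. 2.1:

# Against any Bob strategy x_B on S^3, Alice's reply (-beta_B, delta_B,
# alpha_B, gamma_B) makes gamma'_A = 1 in Eq. (14) and yields t at theta=pi/2.
rng = np.random.default_rng(0)
xB = rng.normal(size=4); xB /= np.linalg.norm(xB)   # random point on S^3
aB, bB, gB, dB = xB
xA = np.array([-bB, dB, aB, gB])                    # Alice's counter-move
print(payoff_A(strategy(*xA), strategy(*xB), theta=np.pi / 2))  # -> 5.0 = t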

Mixed Quantum Strategy

We can find equilibria when we allow Alice and Bob to play mixed quantum strategies. That means Alice chooses a strategy from $\{I,X,Y,Z\}$ with probability $p^{A}=(p^{A}_{I},p^{A}_{X},p^{A}_{Y},p^{A}_{Z})$. Alice can execute her strategy $U_{A}$ in such a way that

U_{A}=\sqrt{p_{I}^{A}}\,I\otimes I+\sqrt{p_{X}^{A}}\,iX\otimes I+\sqrt{p_{Y}^{A}}\,iY\otimes X+\sqrt{p_{Z}^{A}}\,iZ\otimes X. (15)

Then the average payoff of Alice is given by

\$_{A}=\begin{pmatrix}p^{A}_{I}&p^{A}_{X}&p^{A}_{Y}&p^{A}_{Z}\end{pmatrix}\begin{pmatrix}r&t&s&p\\ s&p&r&t\\ t&r&p&s\\ p&s&t&r\end{pmatrix}\begin{pmatrix}p^{B}_{I}\\ p^{B}_{X}\\ p^{B}_{Y}\\ p^{B}_{Z}\end{pmatrix}. (25)

We find that the above form can be written as

\$_{A}=rp^{B}_{I}+tp^{B}_{X}+sp^{B}_{Y}+pp^{B}_{Z}+\begin{pmatrix}p^{A}_{X}&p^{A}_{Y}&p^{A}_{Z}\end{pmatrix}\begin{pmatrix}(r-s)(-p^{B}_{I}+p^{B}_{Y})+(t-p)(-p^{B}_{X}+p^{B}_{Z})\\ (t-r)(p^{B}_{I}-p^{B}_{X})+(p-s)(p^{B}_{Y}-p^{B}_{Z})\\ (r-p)(-p^{B}_{I}+p^{B}_{Z})+(t-s)(p^{B}_{Y}-p^{B}_{X})\end{pmatrix}. (30)

If $p^{B}_{I}=p^{B}_{X}=p^{B}_{Y}=p^{B}_{Z}=\frac{1}{4}$, the average payoff of Alice does not depend on $p^{A}$ and becomes

\$_{A}=\frac{r+t+s+p}{4}. (31)

So one of the equilibria is

{p^{A}}^{\star}={p^{B}}^{\star}=\left(\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4}\right). (34)
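As a sanity check, the matrix game of Eq. (25) can be evaluated with the circuit sketch of Sec. 2.1; `mixed_payoff_A` is our own hypothetical helper, not a name from the paper.

# Average payoff over mixed Pauli strategies; with Bob uniform, Alice's
# payoff is (r+t+s+p)/4 = 2.25 no matter what distribution she picks.
paulis = [I2, X, Y, Z]

def mixed_payoff_A(pA, pB, theta, **kw):
    return sum(pa * pb * payoff_A(Ua, Ub, theta, **kw)
               for pa, Ua in zip(pA, paulis) for pb, Ub in zip(pB, paulis))

uniform = [0.25] * 4
print(mixed_payoff_A([1, 0, 0, 0], uniform, np.pi / 2))  # -> 2.25
print(mixed_payoff_A([0, 0, 1, 0], uniform, np.pi / 2))  # -> 2.25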

In addition, we have another equilibrium.

Proposition 2.3.

If $t+s>r+p$, then

{p^{A}}^{\star}=\left(0,\tfrac{1}{2},\tfrac{1}{2},0\right),\ {p^{B}}^{\star}=\left(\tfrac{1}{2},0,0,\tfrac{1}{2}\right)\quad\text{or}\quad{p^{A}}^{\star}=\left(\tfrac{1}{2},0,0,\tfrac{1}{2}\right),\ {p^{B}}^{\star}=\left(0,\tfrac{1}{2},\tfrac{1}{2},0\right) (39)

is an equilibrium. The average payoff at the equilibrium is $\frac{t+s}{2}$.

Proof.

It is enough to show one of (39). Using (30), we find that the average payoff of Alice satisfies

\$_{A}(p^{A},{p^{B}}^{\star})=\frac{r+p}{2}+(p^{A}_{X}+p^{A}_{Y})\frac{1}{2}(t-p-r+s) (40)
\leq\frac{t+s}{2}=\$_{A}({p^{A}}^{\star},{p^{B}}^{\star}). (41)

Similarly the average payoff of Bob satisfies

\$_{B}({p^{A}}^{\star},p^{B})=\frac{t+s}{2}-(p^{B}_{X}+p^{B}_{Y})\frac{1}{2}(t-p-r+s) (42)
\leq\frac{t+s}{2}=\$_{B}({p^{A}}^{\star},{p^{B}}^{\star}). (43)

This ends the proof. ∎
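For the sample parameters $(r,t,s,p)=(3,5,0,1)$ used in our sketches, $t+s=5>4=r+p$, and the equilibrium payoff $(t+s)/2=2.5$ can be confirmed directly:

# Continuing the sketch: the asymmetric equilibrium of Proposition 2.3.
pA_star, pB_star = [0, 0.5, 0.5, 0], [0.5, 0, 0, 0.5]
print(mixed_payoff_A(pA_star, pB_star, np.pi / 2))  # -> 2.5 = (t+s)/2
print(mixed_payoff_A(pB_star, pA_star, np.pi / 2))  # the mirrored pair, too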

Repeating the same discussion, we can show the following statement.

Proposition 2.4.

If $t+s<r+p$, then

{p^{A}}^{\star}={p^{B}}^{\star}=\left(0,\tfrac{1}{2},\tfrac{1}{2},0\right)\quad\text{or}\quad{p^{A}}^{\star}={p^{B}}^{\star}=\left(\tfrac{1}{2},0,0,\tfrac{1}{2}\right) (48)

is an equilibrium. The average payoff at the equilibrium is $\frac{r+p}{2}$.

2.2.3 In-between

Pure Quantum Strategy

We consider a general case with entanglement $\theta\in(0,\pi/2)$. Without entanglement ($\theta=0$), both $X$ and $Y$ are always stronger than $I$, but $I$ is sometimes stronger than $X$ when there is some entanglement, while $Y$ is always stronger than $I$. So in what follows we first assume Bob plays $U_{B}=Y$, which corresponds to $x_{B}=(0,0,1,0)$. In this case, the payoff of Alice is

\begin{split}\$_{A}(x_{A},x_{B})&=(r\sin^{2}\theta+p\cos^{2}\theta)\\&\quad-(p\cos^{2}\theta-(t-r)\sin^{2}\theta)\,\delta_{A}^{2}\\&\quad-(r-p)\sin^{2}\theta\,\gamma_{A}^{2}\\&\quad-(r\sin^{2}\theta+p\cos^{2}\theta)\,\alpha_{A}^{2}.\end{split}

Note that the first term is always positive and the third and fourth terms are always non-positive since $r>p>0$. The coefficient $p\cos^{2}\theta-(t-r)\sin^{2}\theta$ of the second term is positive if $\sin^{2}\theta<\frac{p}{t-r+p}$, in which case the second term is non-positive as well. Therefore, under this condition, Alice's best response to $U_{B}=Y$ is $x_{A}=(\alpha_{A},\beta_{A},\gamma_{A},\delta_{A})=(0,1,0,0)$ and the corresponding payoff is

\$_{A}(x_{A}^{\star},x_{B}^{\star})=r\sin^{2}\theta+p\cos^{2}\theta. (49)

Similarly one can find that Bob's best response to $x_{A}=(0,1,0,0)$ is $x_{B}=(0,0,1,0)$. So the pair $x^{\star}_{A}=(0,1,0,0)$ and $x^{\star}_{B}=(0,0,1,0)$ is an equilibrium since it satisfies

\begin{split}\$_{A}(x_{A}^{\star},x_{B}^{\star})&\geq\$_{A}(x_{A},x_{B}^{\star}),\\ \$_{B}(x_{A}^{\star},x_{B}^{\star})&\geq\$_{B}(x_{A}^{\star},x_{B}).\end{split} (50)

More generally one can find that

x_{A}^{\star}=(0,\cos\phi,\sin\phi,0),\quad x_{B}^{\star}=(0,\sin\phi,\cos\phi,0) (53)

is an equilibrium for all $\phi\in[0,2\pi]$, and the corresponding average payoff of Alice is

\$_{A}(x_{A}^{\star},x_{B}^{\star})=r\sin^{2}\theta+p\cos^{2}\theta. (54)

Equilibrium strategies do not exist if $\sin^{2}\theta>\frac{p}{t-r+p}$, which is consistent with our earlier observation that the maximally entangled case $\theta=\frac{\pi}{2}$ has no equilibrium as long as only pure quantum strategies are played.
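The equilibrium (53) and the threshold $\sin^{2}\theta=\frac{p}{t-r+p}$ can be probed numerically with the sketch of Sec. 2.1; for $(r,t,s,p)=(3,5,0,1)$ the threshold is $1/3$.

# theta = pi/6 gives sin^2(theta) = 0.25 < 1/3, so Eq. (53) is an equilibrium.
theta, phi = np.pi / 6, 0.7
xA = np.array([0, np.cos(phi), np.sin(phi), 0])
xB = np.array([0, np.sin(phi), np.cos(phi), 0])
base = payoff_A(strategy(*xA), strategy(*xB), theta)
print(base)  # -> 1.5 = r sin^2(theta) + p cos^2(theta)
rng = np.random.default_rng(1)
devs = rng.normal(size=(2000, 4))
best = max(payoff_A(strategy(*(d / np.linalg.norm(d))), strategy(*xB), theta)
           for d in devs)
print(best <= base + 1e-9)  # True: no profitable unilateral deviation found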

Mixed Quantum Strategy

In general the average payoff of Alice can be written as

\$_{A}=\begin{pmatrix}p^{A}_{I}&p^{A}_{X}&p^{A}_{Y}&p^{A}_{Z}\end{pmatrix}A\begin{pmatrix}p^{B}_{I}\\ p^{B}_{X}\\ p^{B}_{Y}\\ p^{B}_{Z}\end{pmatrix}, (60)
A=\begin{pmatrix}r&s\cos^{2}\theta+t\sin^{2}\theta&s&r\cos^{2}\theta+p\sin^{2}\theta\\ t\cos^{2}\theta+s\sin^{2}\theta&p&p\cos^{2}\theta+r\sin^{2}\theta&t\\ t&p\cos^{2}\theta+r\sin^{2}\theta&p&t\cos^{2}\theta+s\sin^{2}\theta\\ r\cos^{2}\theta+p\sin^{2}\theta&s&s\cos^{2}\theta+t\sin^{2}\theta&r\end{pmatrix}. (65)

We define

\bar{t}=t\cos^{2}\theta+s\sin^{2}\theta,\quad\bar{r}=r\cos^{2}\theta+p\sin^{2}\theta,\quad\bar{p}=p\cos^{2}\theta+r\sin^{2}\theta,\quad\bar{s}=s\cos^{2}\theta+t\sin^{2}\theta, (70)

and

p_{I}^{\star}=p_{Z}^{\star}=\frac{1}{2}\,\frac{s+\bar{s}-p-\bar{p}}{t+\bar{t}-r-\bar{r}-p-\bar{p}+s+\bar{s}},\qquad p_{X}^{\star}=p_{Y}^{\star}=\frac{1}{2}\,\frac{t+\bar{t}-r-\bar{r}}{t+\bar{t}-r-\bar{r}-p-\bar{p}+s+\bar{s}}. (71)
Proposition 2.5.

$p^{A}=p^{B}=p^{\star}$ is an equilibrium.

Proof.

The equation (65) can be decomposed into

\$_{A}=\begin{pmatrix}r&\bar{s}&s&\bar{r}\end{pmatrix}\begin{pmatrix}p^{B}_{I}\\ p^{B}_{X}\\ p^{B}_{Y}\\ p^{B}_{Z}\end{pmatrix}+\begin{pmatrix}p_{X}^{A}&p_{Y}^{A}&p_{Z}^{A}\end{pmatrix}\begin{pmatrix}\bar{t}-r&p-\bar{s}&\bar{p}-s&t-\bar{r}\\ t-r&\bar{p}-\bar{s}&p-s&\bar{t}-\bar{r}\\ \bar{r}-r&s-\bar{s}&\bar{s}-s&r-\bar{r}\end{pmatrix}\begin{pmatrix}p^{B}_{I}\\ p^{B}_{X}\\ p^{B}_{Y}\\ p^{B}_{Z}\end{pmatrix}. (86)

One can show that the second term vanishes at $p^{B}=p^{\star}$. Therefore $\$_{A}(p^{A},p^{\star})=\$_{A}(p^{\star},p^{\star})$ is satisfied for any $p^{A}$. Since the same argument holds for Bob's average payoff, $(p^{A},p^{B})=(p^{\star},p^{\star})$ is an equilibrium. ∎
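A numerical check of Proposition 2.5, continuing the sketch of Sec. 2.1 with the sample parameters $(r,t,s,p)=(3,5,0,1)$ and $\theta=\pi/3$:

# Build p_star from Eqs. (70)-(71) and verify Alice's payoff is flat in p^A.
theta = np.pi / 3
r, t, s, p = 3, 5, 0, 1
c2, s2 = np.cos(theta) ** 2, np.sin(theta) ** 2
tb, rb, pb, sb = t*c2 + s*s2, r*c2 + p*s2, p*c2 + r*s2, s*c2 + t*s2
den = t + tb - r - rb - p - pb + s + sb
pI = pZ = 0.5 * (s + sb - p - pb) / den
pX = pY = 0.5 * (t + tb - r - rb) / den
p_star = [pI, pX, pY, pZ]
for pA in ([1, 0, 0, 0], [0, 1, 0, 0], [0.25] * 4):
    print(mixed_payoff_A(pA, p_star, theta))  # all ~1.92: no deviation helps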

In addition, we can find other equilibria for $t+s>r+p$ and for $t+s<r+p$. We first address the case where $t+s>r+p$.

Proposition 2.6.

If $t+s>r+p$, the following pairs of $(p^{A},p^{B})$ are equilibria.

{p^{A}}^{\star}={p^{B}}^{\star}=\left(0,\tfrac{1}{2},\tfrac{1}{2},0\right)\quad\left(\sin^{2}\theta<\tfrac{2(p-s)}{t-r+p-s}\right), (89)
{p^{A}}^{\star}=\left(\tfrac{1}{2},0,0,\tfrac{1}{2}\right),\ {p^{B}}^{\star}=\left(0,\tfrac{1}{2},\tfrac{1}{2},0\right)\quad\left(\sin^{2}\theta>\tfrac{2(p-s)}{t-r+p-s}\right). (92)
Proof.

We first consider the case where $\sin^{2}\theta<\frac{2(p-s)}{t-r+p-s}$ holds. We write ${p^{A}}^{\star}={p^{B}}^{\star}=\left(0,\frac{1}{2},\frac{1}{2},0\right)$. Then Alice's average payoff is

\$_{A}(p^{A},{p^{B}}^{\star})=\frac{1}{2}\left(s\cos^{2}\theta+t\sin^{2}\theta+s\right)+\frac{1}{2}\left(-(t-r)\sin^{2}\theta+(p-s)(\cos^{2}\theta+1)\right)(p^{A}_{X}+p^{A}_{Y}). (93)

Note that the second term is positive for $\sin^{2}\theta<\frac{2(p-s)}{t-r+p-s}$. Therefore $\$_{A}({p^{A}}^{\star},{p^{B}}^{\star})\geq\$_{A}(p^{A},{p^{B}}^{\star})$ for all $p^{A}$. In the same way, the average payoff of Bob satisfies $\$_{B}({p^{A}}^{\star},{p^{B}}^{\star})\geq\$_{B}({p^{A}}^{\star},p^{B})$.

For $\sin^{2}\theta>\frac{2(p-s)}{t-r+p-s}$, we write ${p^{A}}^{\star}=\left(\frac{1}{2},0,0,\frac{1}{2}\right)$, ${p^{B}}^{\star}=\left(0,\frac{1}{2},\frac{1}{2},0\right)$. Then the average payoff of Alice is exactly the same as (93). Since $\sin^{2}\theta>\frac{2(p-s)}{t-r+p-s}$, the second term of (93) is negative. Hence it becomes maximal for $p^{A}_{X}=p^{A}_{Y}=0$, namely $\$_{A}({p^{A}}^{\star},{p^{B}}^{\star})\geq\$_{A}(p^{A},{p^{B}}^{\star})$ for all $p^{A}$. The average payoff of Bob is

\$_{B}({p^{A}}^{\star},p^{B})=\frac{1}{2}\left(r\cos^{2}\theta+p\sin^{2}\theta+r\right)+\frac{1}{2}\left(-(p-s)\sin^{2}\theta+(t-r)(\cos^{2}\theta+1)\right)(p^{B}_{X}+p^{B}_{Y}). (94)

Since $t+s>r+p$, the second term is always positive. Therefore $\$_{B}({p^{A}}^{\star},{p^{B}}^{\star})\geq\$_{B}({p^{A}}^{\star},p^{B})$ for all $p^{B}$. This ends the proof. ∎

Note that for $\sin^{2}\theta>\frac{2(p-s)}{t-r+p-s}$, at the equilibrium the average payoff of Alice is smaller than that of Bob: $\$_{A}({p^{A}}^{\star},{p^{B}}^{\star})<\$_{B}({p^{A}}^{\star},{p^{B}}^{\star})$.
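The threshold in Proposition 2.6 equals $2/3$ for our sample parameters $(r,t,s,p)=(3,5,0,1)$, and crossing it flips which candidate pair is an equilibrium; continuing the numerical sketch:

# Below the threshold, matching Bob's {X,Y} mixture is Alice's best reply;
# above it, switching to the {I,Z} mixture pays more (cf. Eq. (93)).
pXY, pIZ = [0, 0.5, 0.5, 0], [0.5, 0, 0, 0.5]
for theta in (0.6, 1.2):                  # sin^2 = 0.32 and 0.87 vs. 2/3
    a = mixed_payoff_A(pXY, pXY, theta)   # stay with {X,Y}
    b = mixed_payoff_A(pIZ, pXY, theta)   # deviate to {I,Z}
    print(round(theta, 1), a >= b)        # True below the threshold, False above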

In summary, the average payoff of Alice is exhibited in Fig.1.

Figure 1: The average payoff of Alice at the Nash equilibrium when $t+s>r+p$.

We next consider the case where t+s<r+pt+s<r+p. One can show the following statement as we did before.

Proposition 2.7.

If t+s<r+pt+s<r+p, the following pairs of (pA,pB)(p^{A},p^{B}) are equilibria.

{p^{A}}^{\star}={p^{B}}^{\star}=\left(0,\tfrac{1}{2},\tfrac{1}{2},0\right)\quad\left(0\leq\theta\leq\tfrac{\pi}{2}\right), (97)
{p^{A}}^{\star}={p^{B}}^{\star}=\left(\tfrac{1}{2},0,0,\tfrac{1}{2}\right)\quad\left(\sin^{2}\theta>\tfrac{2(t-r)}{t-r+p-s}\right). (100)

Since $1<\frac{2(p-s)}{t-r+p-s}$ always holds in this case, $\sin^{2}\theta<\frac{2(p-s)}{t-r+p-s}$ is satisfied for all $\theta$. Therefore (97) is an equilibrium for all $\theta$. In summary, the average payoff of Alice is exhibited in Figs. 2 and 3.

Figure 2: The average payoff of Alice at the Nash equilibrium when $t+s<r+p$ and $2(t-r)>p-s$.
Figure 3: The average payoff of Alice at the Nash equilibrium when $t+s<r+p$ and $2(t-r)<p-s$.

3 Repeated Quantum Games

3.1 Setup

A generic repeated quantum game is summarized as follows.

Definition 3.1 (Repeated Quantum Game).

Let $\{\tau_{i}\}_{i=1,2,\cdots}$ be a set of positive integers and $\{\ket{\psi^{t}}\}_{t=0,1,\cdots}$ be a family of quantum states. For each $i$, put $t_{i}=\sum_{k=1}^{i}\tau_{k}$. Let $\{\ket{i}\}_{i}$ be an orthonormal basis and $\{c_{i}\}_{i}$ be a set of complex numbers such that $\sum_{i}|c_{i}|^{2}=1$. With respect to each $\ket{i}$, suppose that the reward to a player $n$ is given by a certain operator $\hat{\$}_{n}$ that is diagonal in this basis: the reward associated with $\ket{i}$ is $\bra{i}\hat{\$}_{n}\ket{i}$, and $\bra{i}\hat{\$}_{n}\ket{j}=0$ for $i\neq j$. Then a general $N$-person repeated quantum game proceeds as follows:

1. At any $t$, the game is in a state
\ket{\psi^{t}}=\sum_{i}c_{i}\ket{i}. (101)

2. The time evolution of the game is given by unitary operators $U^{t}$:
\ket{\psi^{t+1}}=U^{t}\ket{\psi^{t}}. (102)

3. At time $t=t_{i}$, the reward is evaluated and given to each player $n$ by
\$^{t_{i}}_{n}=f(\tau_{i})\bra{\psi^{t_{i}}}\hat{\$}_{n}\ket{\psi^{t_{i}}}, (103)
where $f(\tau_{i})$ is a function such that $f(1)=1$. It is a collective expression of the interest and other costs incurred during the period $\tau_{i}$.

4. The total payoff of a player $n$ at time $t\in[t_{k},t_{k+1})$ is defined by
\$_{n}(t)=\sum_{i}^{k}\delta^{t_{i}}\$^{t_{i}}_{n}, (104)
and in the limit it is $\$_{n}=\lim_{t\to\infty}\$_{n}(t)$.

Now let us consider the repeated quantum prisoner's dilemma. Let $\ket{\psi^{t}}\in\mathcal{H}_{A}\otimes\mathcal{H}_{B}$ be the state at the $t$-th round. Let $\{\tau_{i}\}_{i=1,2,\cdots}$ be a set of positive integers. We assume measurement of the quantum state is done at the end of round $t_{i}=\sum_{k=1}^{i}\tau_{k}$ for each $i$. For example, $\tau_{i}=1$ for all $i$ means we measure the quantum state every round. Suppose $V^{t_{i-1}+1}_{A},\cdots,V^{t_{i}}_{A}$ are Alice's operators for the rounds in between; then her strategy is $U^{t_{i}}_{A}=V^{t_{i-1}+1}_{A}\otimes\cdots\otimes V^{t_{i}}_{A}$ and the game is in a state

\ket{\psi^{t_{i}}}=\mathcal{J}^{\dagger}(U^{t_{i}}_{A}\otimes U^{t_{i}}_{B})\mathcal{J}\ket{\psi^{t_{i-1}}}, (105)

which should be measured at the end of round $t_{i}$. The payoff could depend on $\tau_{i}$:

\$^{t_{i}}_{A}=f(\tau_{i})\bra{\psi^{t_{i}}}\hat{\$}_{A}\ket{\psi^{t_{i}}}, (106)

where $f(\tau_{i})$ is a certain function of $\tau_{i}$. In repeated games, the total payoff is written with a discount factor $\delta\in(0,1)$, and in our case it can be introduced in such a way that

\$_{A}=\sum_{i}\delta^{t_{i}}\$^{t_{i}}_{A}. (107)

An interpretation of the discount factor is that the importance of future payoffs decreases with time. The measurement periods $\{\tau_{i}\}$ can be defined randomly or in a predetermined way. When the game's state is measured every round, the probability of monitoring signals is classical (see (15)). In order to enjoy fully quantum games, a game should evolve by unitary operations.

Since this work is the first study of general infinitely repeated quantum games, we focus on the most fundamental case and investigate the role of entanglement; we will work on the generic case elsewhere. For this purpose, in what follows, we put $\tau_{i}=1$ for all $i$ and $f(1)=1$. Currently, we are considering only the prisoner's dilemma, but there are frequent cases where a game is repeated, for example in negotiation games. Negotiations usually involve thinking not only about the pie on the table today, but also about the pie after tomorrow. It is natural that interest may accrue and costs may change during negotiations. The function $f$ represents the change in cost that can occur during the period between the finalization of one state and the transition to the next state. Negotiation games are the basic repeated games in contract theory. The formulation of contract theory as a quantum game has recently been given in Ikeda_cat2021 . Further developing that model as a game theory is a very interesting problem.

Then the game evolves in such a way that

\ket{\psi^{t+1}}=\mathcal{J}^{\dagger}(U_{A}^{t}\otimes U_{B}^{t})\mathcal{J}\ket{\psi^{t}}, (108)

where $U_{A}^{t}$ is her quantum strategy for the $t$-th round. Then her average payoff is

\$^{t}_{A}=\bra{\psi^{t}}\hat{\$}_{A}\ket{\psi^{t}}. (109)
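A minimal sketch of this repeated dynamics with $\tau_{i}=1$ (measurement every round), reusing the single-round payoff function from Sec. 2.1; `repeated_payoff_A` and the geometric-sum check are our own illustration, not the paper's code.

# Discounted total payoff of Eq. (107) for a fixed history of moves.
def repeated_payoff_A(moves, theta, delta=0.9, **kw):
    """moves: list of (U_A^t, U_B^t) pairs, one per round, starting at t=0."""
    return sum(delta ** t * payoff_A(UA, UB, theta, **kw)
               for t, (UA, UB) in enumerate(moves))

# Mutual cooperation every round approaches r/(1-delta) = 30, cf. Eq. (110).
print(repeated_payoff_A([(I2, I2)] * 200, theta=np.pi / 4))  # ~ 30.0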

3.2 Equilibria

We define a trigger strategy as follows.

Definition 3.2 (Trigger 1).
1. Alice and Bob play $I$ at time $\tau$ if the cooperative relation has been maintained before time $\tau$.

2. If Alice (Bob) deviates from this cooperative strategy at $\tau$, then Bob (Alice) chooses either $X$ or $Y$ with equal probability at time $\tau+1$.

We show that Trigger 1 is an equilibrium of the repeated quantum prisoner’s dilemma.

Lemma 3.3.

Trigger 1 is an equilibrium of the repeated quantum prisoner's dilemma if either one of the following conditions is satisfied:

1. $r+p>t+s$;

2. $r+p<t+s$ and $\sin^{2}\theta<\frac{2(p-s)}{t-r+p-s}$.

Proof.

Suppose they play II for all tt. Then Alice’s total payoff is

V_{A}=r+\delta r+\delta^{2}r+\cdots=\frac{r}{1-\delta}. (110)

Now assume that they play $I$ before time $\tau$ and Alice plays $Y$ at $\tau$. Since Bob plays $I$ at $\tau$, her maximal payoff is

V^{\prime}_{A}=t+\delta\$_{A}(p^{A},{p^{B}}^{\star})+\delta^{2}\$_{A}(p^{A},{p^{B}}^{\star})+\cdots=t+\frac{\delta}{1-\delta}\$_{A}(p^{A},{p^{B}}^{\star}), (111)

where ${p^{B}}^{\star}=\left(0,\frac{1}{2},\frac{1}{2},0\right)$. Note that, according to Propositions 2.6 and 2.7, ${p^{B}}^{\star}$ is an equilibrium strategy of the game.

Choosing $Y$ cannot be her incentive if $V_{A}>V^{\prime}_{A}$, which is equivalent to

r-t+\frac{\delta}{1-\delta}\left(r-\$_{A}(p^{A},{p^{B}}^{\star})\right)>0. (112)

Since $r-\$_{A}(p^{A},{p^{B}}^{\star})$ is always positive, this inequality is satisfied for a sufficiently large $\delta$. The greatest lower bound $\delta_{\inf}$ of such $\delta$ is obtained as

\delta_{\inf}=\frac{t-r}{t-\frac{1}{2}\left(p+p\cos^{2}\theta+r\sin^{2}\theta\right)}. (113)

Since $\delta_{\inf}$ is smaller than 1, no deviation occurs for any $\delta\in(\delta_{\inf},1)$.

If Alice or Bob deviates from the cooperative strategy, they choose $X$ or $Y$ with equal probability according to Trigger 1. By Propositions 2.6 and 2.7, both players choosing $X$ or $Y$ with equal probability is an equilibrium when condition 1 or 2 is satisfied. So no deviation occurs, as long as Alice and Bob choose $X$ or $Y$ with equal probability.

Therefore, trigger 1 can be an equilibrium of the repeated game. ∎
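For the sample parameters $(r,t,s,p)=(3,5,0,1)$, the bound (113) is easy to evaluate, and the no-deviation condition (112) indeed flips sign there; a small check continuing our sketch:

# delta_inf of Eq. (113) and the deviation gain of Eq. (112) around it,
# with the punishment payoff (p + p cos^2 + r sin^2)/2 substituted in.
def delta_inf(theta, r=3, t=5, p=1):
    punish = 0.5 * (p + p * np.cos(theta) ** 2 + r * np.sin(theta) ** 2)
    return (t - r) / (t - punish)

theta = np.pi / 4
d = delta_inf(theta)                                   # ~ 0.571
punish = 0.5 * (1 + np.cos(theta) ** 2 + 3 * np.sin(theta) ** 2)
for delta in (d - 0.05, d + 0.05):
    gain = 3 - 5 + delta / (1 - delta) * (3 - punish)  # lhs of Eq. (112)
    print(round(delta, 3), gain > 0)   # False below delta_inf, True above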

Figs. 4 and 5 present the relation between $\delta_{\inf}$ and entanglement. The bigger $\theta$ becomes, the more profit the players can receive; therefore the discount factor $\delta$ should be sufficiently large in order to maintain the cooperative relation when their states are strongly entangled.

Figure 4: The greatest lower bound $\delta_{\inf}$ of the discount factor as a function of the entanglement parameter $\theta$ when $t+s>r+p$.
Figure 5: The greatest lower bound $\delta_{\inf}$ as a function of the entanglement parameter $\theta$ when $t+s<r+p$.
Definition 3.4 (Trigger 2).
1. Alice and Bob play $I$ at time $\tau$ if the cooperative relation has been maintained before time $\tau$.

2. If Bob (Alice) deviates from the cooperative strategy at time $\tau$, then Alice (Bob) chooses either $X$ or $Y$ with equal probability at time $\tau+1$.

3. If Alice and Bob play $X$ and $Y$ (or $Y$ and $X$) respectively at time $\tau$, then at time $\tau+1$ Alice (Bob) repeats the same strategy; otherwise she (he) chooses either $X$ or $Y$ with equal probability at time $\tau+1$.

4. If Alice and Bob repeat $X$ and $Y$ (or $Y$ and $X$) respectively before time $\tau$ and Bob (Alice) deviates from the repetition at time $\tau$, then Alice (Bob) chooses either $X$ or $Y$ with equal probability at time $\tau+1$.

Figure 6: Alice's strategy diagram against Bob's strategies.

Deviation 1 means that Alice (Bob) deviates from cooperation and chooses $X$ or $Y$. Deviation 2 means that Alice (Bob) chooses $I$ or $Z$ when they should choose $X$ or $Y$ with equal probability; according to Proposition 2.6, this can be profitable when $\sin^{2}\theta>\frac{2(p-s)}{t-r+p-s}$. Deviation 3 means that Alice (Bob) chooses $I$ or $Z$ when Alice should play $X$ ($Y$) and Bob should play $Y$ ($X$), respectively; it can be profitable at $\sin^{2}\theta>\frac{p-s}{t-r+p-s}$. This is because if an opponent chooses $Y$ ($X$), choosing $Z$ ($I$) instead of $X$ ($Y$) can yield more profit.

We can modify Trigger 1 and redefine it as Trigger 2 so that it can be a subgame perfect equilibrium for all $\theta$.

Lemma 3.5 (Trigger 2).

Trigger 2 is an equilibrium of the repeated game for $r>\frac{t+p}{2}$.

Proof.

We first address the case where Alice chooses deviation 1. Her total expected payoff when she plays $I$ is $V_{A}=\frac{r}{1-\delta}$. If Alice chooses deviation 1, Alice and Bob then choose $X$ or $Y$ with equal probability, so $(X,X),(X,Y),(Y,X),(Y,Y)$ occur with equal probability. In such a round, the expected payoff is $\frac{1}{2}\left(p+p\cos^{2}\theta+r\sin^{2}\theta\right)$. If $(X,X)$ or $(Y,Y)$ is played, they should choose $X$ or $Y$ again in the next round. If $(X,Y)$ or $(Y,X)$ is played, they should each choose the same quantum strategy as last time in the next round. Therefore her maximal expected payoff for choosing deviation 1 is

V^{\prime}_{A}=t+\sum_{t=1}^{+\infty}\delta^{t}\left(\frac{1}{2}\right)^{t}\left(\$_{A}(X,X)+\$_{A}(X,Y)\right)+\sum_{s=1}^{+\infty}\sum_{t=1}^{+\infty}\delta^{s+t}\left(\frac{1}{2}\right)^{s}\$_{A}(X,Y)=t+\frac{\frac{\delta}{2}}{1-\frac{\delta}{2}}\left(p+\frac{1}{1-\delta}\left(p\cos^{2}\theta+r\sin^{2}\theta\right)\right). (114)

She does not have an incentive to choose deviation 1 when $V_{A}-V^{\prime}_{A}>0$, which holds if $\delta$ is sufficiently large.

Regarding deviation 2, Alice's total expected payoff when she chooses either $X$ or $Y$ with equal probability is

V_{A}=\frac{\frac{1}{2}}{1-\frac{\delta}{2}}\left(p+\frac{1}{1-\delta}\left(p\cos^{2}\theta+r\sin^{2}\theta\right)\right), (115)

and her maximal expected payoff for choosing deviation 2 is

V^{\prime}_{A}=\frac{s\cos^{2}\theta+t\sin^{2}\theta}{2}+\delta V_{A}. (116)

She does not have an incentive to choose deviation 2 if $V_{A}-V^{\prime}_{A}>0$, which is satisfied for a sufficiently large $\delta$.

For deviation 3, Alice's total expected payoff for choosing $X$ ($Y$) while Bob chooses $Y$ ($X$) is

V_{A}=\frac{\$_{A}(X,Y)}{1-\delta}=\frac{p\cos^{2}\theta+r\sin^{2}\theta}{1-\delta}. (117)

Her maximal expected payoff for choosing deviation 3 is

V^{\prime}_{A}=s\cos^{2}\theta+t\sin^{2}\theta+\delta\,\frac{\frac{1}{2}}{1-\frac{\delta}{2}}\left(p+\frac{1}{1-\delta}\left(p\cos^{2}\theta+r\sin^{2}\theta\right)\right), (118)

since Alice and Bob choose $X$ or $Y$ with equal probability according to Trigger 2 afterwards. She does not have an incentive to choose deviation 3 if $V_{A}-V^{\prime}_{A}>0$, which is satisfied for a sufficiently large $\delta$.

For all cases, one can find $\delta<1$ that does not endow a player with an incentive to deviate. ∎

The curves in Fig. 7 present the relation between $\theta$ and the greatest lower bound $\delta_{\inf}$ of the $\delta$ that respects the condition $V_{A}-V^{\prime}_{A}>0$. Such $\delta$ lives in the colored region of the figure.

Figure 7: The region (gray) where Trigger 2 becomes an equilibrium when $r>\frac{t+p}{2}$.

3.3 Folk Theorem

3.3.1 Pure Quantum Strategy

We first consider the case where both players choose pure strategies. Let $S=U(2)\times U(2)$ be the set of strategy pairs of Alice and Bob. We write the pair of Alice's and Bob's profits as $\$(a)=\left(\$_{A}(a),\$_{B}(a)\right)$ for a strategy $a\in S$, and let $\nu$ be the profit per single round of the repeated game. Then it turns out that they can receive profit $\nu$ in $V$ (Figs. 8 and 9), where

V=\left\{\nu\in\mathbb{R}^{2}\,\middle|\,\nu=\int_{S}p(a)\,\$(a)\,d\mu(a),\ p(a)\geq 0,\ \int_{S}p(a)\,d\mu(a)=1\right\}, (119)

and $d\mu$ is a Haar measure on $U(2)\times U(2)$.

Figure 8: Domain of profit for $r>\frac{t+s}{2}$.
Figure 9: Domain of profit for $r<\frac{t+s}{2}$.

Alice’s mini-max value is defined as

\underline{\nu_{A}}=\inf_{U_{B}}\sup_{U_{A}}\$_{A}(U_{A},U_{B}), (120)

which is shown in Fig. 10. If $\underline{\nu_{A}}$ is smaller than $\max\left\{r,\frac{t+s}{2}\right\}$, then there exists a subgame perfect equilibrium. This can be summarized as follows.

Theorem 3.6.

While players choose pure quantum strategies, there exists a subgame perfect equilibrium of the repeated quantum prisoner’s dilemma if

\sin^{2}\theta<\frac{\max\left\{r,\frac{t+s}{2}\right\}-s}{t-s}. (121)
Figure 10: The mini-max value of Alice when both Alice and Bob choose only pure strategies.

3.3.2 Mixed Quantum Strategy

The mini-max value for the mixed strategy case is defined as

\underline{\nu_{A}}=\min_{p^{B}}\max_{p^{A}}\$_{A}(p^{A},p^{B}), (122)

which is shown in Figs. 11 and 12. There is a subgame perfect equilibrium since $\underline{\nu_{A}}<\max\left\{r,\frac{t+s}{2}\right\}$ is always satisfied. Note that, for every entanglement parameter $\theta$, there exists a non-empty region

V^{\star}=\left\{\nu\in V\,\middle|\,\nu_{i}>\underline{\nu_{i}},\ i=A,B\right\}, (123)

which we call the feasible and individually rational payoff set. If Alice and Bob are entangled, the mini-max value becomes larger and the area of $V^{\star}$ gets smaller. In particular, the area of $V^{\star}$ becomes smallest when the states are maximally entangled, $\theta=\frac{\pi}{2}$, and is exhibited as the dark parts of Figs. 13 and 14. It does not contain $(r,r)$ when $\theta=\frac{\pi}{2}$ and the reward $r$ is small; this yields a notable difference between the classical and quantum repeated prisoner's dilemma, stated in Theorem 3.7 below. A numerical sketch of the mini-max computation follows Figs. 11 and 12.

Figure 11: The mini-max value of Alice for $t+s>r+p$ when they play mixed strategies.
Figure 12: The mini-max value of Alice for $t+s<r+p$ when they play mixed strategies.
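At each $\theta$, the mini-max value (122) is the value of the $4\times 4$ matrix game (65), which can be obtained by linear programming. The following is a sketch under our sample parameters $(r,t,s,p)=(3,5,0,1)$; the helper names are ours.

import numpy as np
from scipy.optimize import linprog

def game_matrix(theta, r=3, t=5, s=0, p=1):
    """Alice's payoff matrix A(theta) of Eq. (65), rows/columns I, X, Y, Z."""
    c2, s2 = np.cos(theta) ** 2, np.sin(theta) ** 2
    tb, rb, pb, sb = t*c2 + s*s2, r*c2 + p*s2, p*c2 + r*s2, s*c2 + t*s2
    return np.array([[r, sb, s, rb],
                     [tb, p, pb, t],
                     [t, pb, p, tb],
                     [rb, s, sb, r]], dtype=float)

def minimax(A):
    """min over Bob's q of max over Alice's rows of (A q): LP in (q, v)."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), [1.0]])             # minimize v
    A_ub = np.hstack([A, -np.ones((m, 1))])              # A q <= v
    A_eq = np.concatenate([np.ones(n), [0.0]])[None, :]  # sum q = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.fun

for theta in (0.0, np.pi / 4, np.pi / 2):
    print(round(theta, 3), minimax(game_matrix(theta)))
# The mini-max value rises from p = 1 at theta = 0 as entanglement grows,
# so the individually rational region V* shrinks.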
Theorem 3.7 (Anti-Folk Theorem).

When Alice and Bob are maximally entangled, mutual cooperation cannot be realized unless $r\geq\frac{t+s}{2}$.

In the classical repeated prisoner's dilemma, mutual cooperation is the Pareto optimal solution and can be an equilibrium of the repeated game by the Folk theorem, whereas this is not true for the quantum repeated game. In our quantum game, one can always find a solution that makes a player's payoff larger than mutual cooperation. So in order to establish the cooperative relation, the players require a sufficiently large reward $r$ for cooperating.

Figure 13: The feasible and individually rational payoff set for $r>\frac{t+s}{2}$. Since the dark gray area contains $(r,r)$, in this case the solution in which both cooperate is realized as the quantum version of the equilibrium solution (Quantum Folk Theorem).
Figure 14: The feasible and individually rational payoff set for $r<\frac{t+s}{2}$. Since the dark gray area does not contain $(r,r)$, in this case the solution in which both cooperate is not realized as the quantum version of the equilibrium solution (Anti-Folk Theorem).

Based on a similar argument for the classical Folk theorem, we obtain its quantum version.

Theorem 3.8 (Quantum Folk Theorem).

For every entanglement parameter $\theta$, there exists an appropriate discount factor $\delta\in(0,1)$ such that any payoff in the feasible and individually rational payoff set $V^{\star}$ can be achieved by an equilibrium of the repeated game.

Proof.

Let $m_{A}$ be Alice's mixed strategy that holds Bob down to his mini-max value, and let $m_{B}$ be defined likewise. Then if the round-average payoff for playing $a^{\star}\in S$ is in $V^{\star}$, the following is an equilibrium.

1. Choose $a^{\star}\in S$ unless deviation occurs.

2. If deviation is observed, both players play $m=(m_{A},m_{B})$ for the next $T$ rounds, and then both play $a^{\star}$ again.

3. If $m$ is not chosen by someone during those $T$ rounds, both players again choose $m$ for the succeeding $T$ rounds.

Here $T$ is chosen such that

d_{i}<T\left(\$_{i}(a^{\star})-\$_{i}(m)\right),\quad(i=A,B) (124)

where $d_{i}=\max_{a_{i}}\{\$_{i}(a_{i},a_{-i}^{\star})-\$_{i}(a^{\star})\}$.

One can complete the proof as is done in 10.2307/1911307 . ∎

4 Conclusion and Future Directions

In this work we established the concept of a generic repeated quantum game and addressed the repeated quantum prisoner's dilemma. Based on Definition 3.1, we addressed the repeated quantum prisoner's dilemma for the case of $\tau_{i}=1$ for all $i$. Even though the game evolves in a classical manner, we showed that repetition of the quantum game is much different from the classical repeated prisoner's dilemma (Theorem 3.7), and we obtained a novel feasible and individually rational set $V^{\star}$ for the repeated quantum prisoner's dilemma (Theorem 3.8).

As stated at the beginning of this article, repeated games play a very important role in game theory. In many cases, human activities involve long-term strategic relationships. In particular, in order to form a sustainable market, it is necessary to consider the interests and strategies of market participants and develop rules from the perspective of repeated games. A market on a complex Hilbert space is not envisioned by conventional economic theory, but it may be realized in the near future with the realization of the quantum Internet. As this paper has shown, the equilibrium solution of a classical game is not necessarily a solution of the corresponding infinitely repeated quantum game. This clearly shows that infinitely repeated quantum games are not a simple extension of the classical theory, and it suggests that quantum economics on complex Hilbert spaces is quite different from conventional economic theory. From this point of view, it is clear that the economics of quantum money 2020arXiv201014098A , and market design and contract theory with quantum operations Ikeda_cat2021 , need to be seriously investigated, and that these are new and challenging issues in the quantum age.

There are many research directions for repeated quantum games. For example, it will be interesting to address cases where $\tau_{i}\neq 1$. In this work we focused on the $\tau_{i}=1$ case and, as a result, clarified the role of the entanglement parameter $\theta$. From the viewpoint of quantum computation, the complex probability amplitudes $c_{i}$ are as important as entanglement. Taking this into account, studying a more generic time evolution of the quantum game is crucial, and it will be necessary to introduce general review periods $\tau_{i}$ that the players do not know a priori.

Furthermore, it is important to address games with three or more persons from the viewpoint of repeated quantum games. As Morgenstern and von Neumann described morgenstern1953theory , games with three or more persons can be much different from two-person games, since players have a chance to form a coalition. In quantum games, there are various ways to introduce entanglement, which makes games more complicated. In the future quantum network, people will generally use entangled strategies for various purposes. Therefore it will be more practical to investigate quantum games where many strategies are entangled. More recently the authors investigated a quantum economy where three persons create, consume and exchange entangled quantum information, using entangled strategies 2020arXiv201014098A . According to it, quantum players can use stronger strategies than players using only classical resources, and their strategic behavior in a quantum market can be much more complicated in general. So, as emphasized in 2020arXiv201014098A , economics in the quantum era will become essentially different from today's. From this perspective, investigating long-term relations among quantum players will be significant.

Moreover, it is also interesting to apply repeated quantum games not only to economics but also to other fields, since game-theoretic perspectives are used in various disciplines such as business administration, psychology, biology, and computer science. Studying quantum effects of strategies and human behavior in a global quantum system will require advanced quantum technologies, but quantum games will also shed new light on studies completed in a local quantum system. For example, recent advances in applications of quantum algorithms are crucial for potential speedups of machine learning, which can be regarded as a repeated game. Generative adversarial networks (GANs) could be a typical example: GANs generally have multiple and complicated equilibria, and it will be interesting to explore QGANs PhysRevLett.121.040502 in terms of repeated quantum games.

We leave as an open question the investigation of Nash equilibria in general repeated quantum games. For single-stage non-cooperative games, the equilibria in pure quantum strategies are addressed in 2018arXiv180102053S and those in mixed quantum strategies are given in 1999PhRvL..82.1052M , using Glicksberg's fixed-point theorem.

Acknowledgement

This work was supported by a PIMS Postdoctoral Fellowship Award (K. I.).

References