This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Zero-Sum Games for Continuous-time Markov decision processes with risk-sensitive average cost criterion

Mrinal K. Ghosh Department of Mathematics
Indian Institute of Science
Bangalore-560012, India.
[email protected]
Subrata Golui Department of Mathematics
Indian Institute of Technology Guwahati
Guwahati, Assam, India
[email protected]
Chandan Pal Department of Mathematics
Indian Institute of Technology Guwahati
Guwahati, Assam, India
[email protected]
 and  Somnath Pradhan Department of Mathematics and Statistics
Queen’s University
Kingston, Ontario-K7L 3N6, Canada
[email protected]
Abstract.

We consider zero-sum stochastic games for continuous time Markov decision processes with risk-sensitive average cost criterion. Here the transition and cost rates may be unbounded. We prove the existence of the value of the game and a saddle-point equilibrium in the class of all stationary strategies under a Lyapunov stability condition. This is accomplished by establishing the existence of a principal eigenpair for the corresponding Hamilton-Jacobi-Isaacs (HJI) equation. This in turn is established by using the nonlinear version of Krein-Rutman theorem. We then obtain a characterization of the saddle-point equilibrium in terms of the corresponding HJI equation. Finally, we use a controlled population system to illustrate results.

Keywords: Zero-sum game; risk-sensitive average cost criterion; history dependent strategy; HJI equation; saddle point equilibrium.

1. INTRODUCTION

Markov decision processes (MDPs) are widely used for modeling control problems that arise naturally in many real-life problems, for example in queueing models, epidemiology models, birth-death models etc, see [5], [16], [31], [32]. When there is more than one controller (or player) the stochastic control problem is referred to as stochastic game problem. Stochastic dynamic game was first introduced in [33] and has been studied extensively in the literature due to its immense applications; see [3], [6], [10], [11], [15], [34], [37], [38] and the references therein. In this article we consider the risk-sensitive ergodic zero-sum game for continuous-time Markov decision processes (CTMDPs). In zero-sum game, one player is trying to minimize her/his cost and the other player is trying to maximize the same. In literature, the expected average cost criterion is a commonly used optimality criterion in the theory of CTMDPs and has been widely studied under the different sets of optimality conditions; for control problems see, [16], [39] and the references therein; for game problems see [13], [20], [35] and the references therein. In these papers the decision-makers are risk-neutral. However, risk preferences may vary from person to person in the real-world applications. In order to address this concern one of the approaches that is available in the literature is risk-sensitive criterion. In this criterion one investigates the expectation of an exponential of the random quantity. This takes into account the attitude of the controller with respect to risk. The performance of a pair of strategies is measured by risk-sensitive average cost criterion, which in our present case is defined by (2.4), below. The analysis of risk-sensitive control is technically more involved because of the exponential nature of the cost. The risk-sensitive average cost stochastic optimal control problems for CTMDPs were first considered in [9] and have been studied extensively in the literature due to its applications in finance and large deviation theory. Recently, there has been an extensive work on risk-sensitive average cost criterion problems for CTMDPs; see, for example [7], [14], [25], [26], [28] and the references therein. The risk-sensitive stochastic zero-sum games for MDPs have been studied in [[3], [6], [10], [11], [37]] and [[4], [24], [36]] consider the nonzero-sum games for MDPs. In [[3], [6]], the authors study zero-sum risk-sensitive stochastic games for discrete time MDPs with bounded cost. Both of the papers considered first the discounted cost and then ergodic cost. In [6], the authors extended the results of [3] to the general state space case. The zero-sum risk-sensitive average games have been studied in [10] and discounted risk-sensitive zero-sum games were studied in [29] for CTMDPs with bounded cost and transition rates. But this boundedness requirement restricts our domain of application, since in many real-life situations we see that the reward/cost and transition rates are unbounded as for example in queueing, telecommunication and population processes. In [11] and [37], the authors study finite horizon zero-sum risk-sensitive continuous-time stochastic games. In [11], unbounded costs and transition rates are considered while [37] considers unbounded transition but bounded cost. The discounted risk-sensitive zero-sum game for CTMDPs was studied in [12] with unbounded cost and transition rates.

Here we study zero-sum ergodic risk-sensitive stochastic games for CTMDPs having the following features: (a) transition and cost rates may be unbounded (b) state space is countable (c) at any state of the system the space of admissible actions is compact (d) the strategies may be history dependent. To the best of our knowledge, this is the first work which deals with infinite horizon continuous-time zero-sum risk-sensitive stochastic games for ergodic criterion on countable state space for unbounded transition and cost rates. Under a Lyapunov stability condition, we prove the existence of a saddle-point equilibrium in the class of stationary strategies. Using Krein-Rutman theorem, we first prove that the corresponding HJI equation has a unique solution for any finite subset of the state space. Then using the Lyapunov stability condition, we establish the existence of a unique solution for the corresponding HJI equation on the whole state space. Also we give a complete characterization of saddle point equilibrium in terms of the corresponding HJI equation.

The rest of this article is arranged as follows. Section 2 gives the description of the problem and assumptions. We also show in this section that the required risk-sensitive optimality equation (HJI equation) has a solution. In Section 3, we completely characterize all possible saddle point equilibria in the class of stationary Markov strategies. In Section 4, we present an illustrative example.

2. The game model

In this section we introduce the continuous-time zero-sum stochastic game model described by the following elements

{S,A,B,(A(i)A,B(i)B,iS),q(|i,a,b),c(i,a,b)},\{S,A,B,(A(i)\subset A,B(i)\subset B,i\in S),q(\cdot|i,a,b),c(i,a,b)\}, (2.1)

where

  • SS, called the state space, is the set of all nonnegative integers.

  • AA and BB are the action sets for players 1 and 2, respectively. The action spaces AA and BB are assumed to be Borel spaces with the Borel σ\sigma-algebras (A)\mathcal{B}(A) and (B)\mathcal{B}(B), respectively.

  • For each iSi\in S, A(i)(A)A(i)\in\mathcal{B}(A) and B(i)(B)B(i)\in\mathcal{B}(B) denote the sets of admissible actions for players 1 and 2 in state ii, respectively. Let K:={(i,a,b)|iS,aA(i),bB(i)}K:=\{(i,a,b)|i\in S,a\in A(i),b\in B(i)\}, which is a Borel subset of S×A×BS\times A\times B. Throughout this paper, we assume that the admissible action spaces A(i)(A)A(i)(\subset A) and B(i)(B)B(i)(\subset B) are compact for each ii.

  • Given any (i,a,b)K(i,a,b)\in K, the transition rate q(j|i,a,b)q(j|i,a,b) is a signed kernel on SS such that q(j|i,a,b)0q(j|i,a,b)\geq 0 for all j,iSj,i\in S with jij\neq i. Moreover, we assume that q(j|i,a,b)q(j|i,a,b) satisfies the following conservative and stable conditions: for any iS,i\in S,

    jSq(j|i,a,b)=0for all(a,b)A(i)×B(i)and\displaystyle\sum_{j\in S}q(j|i,a,b)=0~{}\text{for all}~{}(a,b)\in A(i)\times B(i)~{}~{}~{}\text{and}
    q(i):=sup(a,b)A(i)×B(i)q(i,a,b)<,\displaystyle~{}q^{*}(i):=\sup_{(a,b)\in A(i)\times B(i)}q(i,a,b)<\infty, (2.2)

    where q(i,a,b):=q(i|i,a,b)0.q(i,a,b):=-q(i|i,a,b)\geq 0.

  • Finally, the measurable function c:K+c:K\to\mathbb{R}_{+} denotes the cost rate (representing cost for player 2 and payoff for player 1).

The game evolves as follows. The players observe continuously the current state of the system. When the system is in state iSi\in S at time t0t\geq 0, the players independently choose actions atA(i)a_{t}\in A(i) and btB(i)b_{t}\in B(i) according to some strategies, respectively. As a consequence of this, the following happens:

  • player 2 pays an immediate cost at rate c(i,at,bt)c(i,a_{t},b_{t}) to player 1;

  • the system stays in state ii for a random time, with rate of leaving ii given by q(i,at,bt)q(i,a_{t},b_{t}), and then jumps to a new state jij\neq i with the probability determined by q(|i,at,bt)q(i,at,bt)\dfrac{q(\cdot|i,a_{t},b_{t})}{q(i,a_{t},b_{t})} (see Proposition in [[16], p. 205] for details).

When the state of the system transits to a new state jj, the above procedure is repeated.

The goal of player 2 is to minimize his/her accumulated cost, whereas player 1 tries to maximize the same with respect to some performance criterion J(,,,)J(\cdot,\cdot,\cdot,\cdot), which in our present case is defined by (2.4), below. Such a model is relevant in worst-case scenarios, e.g., in financial applications when a risk-averse investor is trying to maximize his long-term portfolio gain against the market which, by default, is the minimizer in this case.

To formalize what is described above, below we describe the construction of continuous time Markov decision processes (CTMDPs) under possibly admissible history-dependent strategies. To construct the underlying CTMDPs (as in [[19], [22], [30]]), we introduce some notations: let SΔ:=S{Δ}S_{\Delta}:=S\cup\{\Delta\} (with some ΔS\Delta\notin S), Ω0:=(S×(0,))\Omega^{0}:=(S\times(0,\infty))^{\infty}, Ω:=Ω0{(i0,θ1,i1,,θk,ik,,Δ,,Δ,)|ilS,θl(0,),\Omega:=\Omega^{0}\cup\{(i_{0},\theta_{1},i_{1},\cdots,\theta_{k},i_{k},\infty,\Delta,\infty,\Delta,\cdots)|i_{l}\in S,\theta_{l}\in(0,\infty), for each 0lk,k0}0\leq l\leq k,k\geq 0\}, and let \mathscr{F} be the Borel σ\sigma-algebra on Ω\Omega. Then we obtain the measurable space (Ω,)(\Omega,\mathscr{F}). For each k0k\geq 0, ω:=(i0,θ1,i1,,θk,ik,)Ω,\omega:=(i_{0},\theta_{1},i_{1},\cdots,\theta_{k},i_{k},\cdots)\in\Omega, define T0(ω):=0T_{0}(\omega):=0, Tk(ω):=Tk1(ω)+θkT_{k}(\omega):=T_{k-1}(\omega)+\theta_{k}, T(ω):=limkTk(ω)T_{\infty}(\omega):=\lim_{k\rightarrow\infty}T_{k}(\omega). Using {Tk}\{T_{k}\}, we define the state process {ξt}t0\{\xi_{t}\}_{t\geq 0} as

ξt(ω):=k0I{Tkt<Tk+1}ik+I{tT}Δ, for t0.\xi_{t}(\omega):=\sum_{k\geq 0}I_{\{T_{k}\leq t<T_{k+1}\}}i_{k}+I_{\{t\geq T_{\infty}\}}\Delta,\text{ for }t\geq 0. (2.3)

Here, IEI_{E} denotes the indicator function of a set EE, and we use the convention that 0+z=:z0+z=:z and 0z=:00\cdot z=:0 for all zSΔz\in S_{\Delta}. The process after TT_{\infty} is regarded to be absorbed in the state Δ\Delta. Thus, let q(|Δ,aΔ,bΔ):0q(\cdot|\Delta,a_{\Delta},b_{\Delta}):\equiv 0, AΔ:=A{aΔ}A_{\Delta}:=A\cup\{a_{\Delta}\}, BΔ:=B{bΔ}B_{\Delta}:=B\cup\{b_{\Delta}\}, A(Δ):={aΔ}A(\Delta):=\{a_{\Delta}\}, B(Δ):={bΔ}B(\Delta):=\{b_{\Delta}\}, c(Δ,a,b):0c(\Delta,a,b):\equiv 0 for all (a,b)AΔ×BΔ(a,b)\in A_{\Delta}\times B_{\Delta}, where aΔa_{\Delta}, bΔb_{\Delta} are isolated points. Moreover, let t:=σ({Tks,ξTkD}:D(S),0st,k0)\mathscr{F}_{t}:=\sigma(\{T_{k}\leq s,\xi_{T_{k}}\in D\}:D\in\mathcal{B}(S),0\leq s\leq t,k\geq 0) for all t0t\geq 0, s=:0t<st\mathscr{F}_{s-}=:\bigvee_{0\leq t<s}\mathscr{F}_{t}, and 𝒫:=σ({A×{0},A0}{B×(s,),Bs})\mathscr{P}:=\sigma(\{A\times\{0\},A\in\mathscr{F}_{0}\}\cup\{B\times(s,\infty),B\in\mathscr{F}_{s-}\}) which denotes the σ\sigma-algebra of predictable sets on Ω×[0,)\Omega\times[0,\infty) related to {t}t0\{\mathscr{F}_{t}\}_{t\geq 0}.

In order to define the risk sensitive cost criterion, we need to introduce the definition of strategy below.

Definition 2.1.

An admissible history-dependent strategy for player 1, denoted by π1\pi^{1}, is determined by a sequence {πk1,k0}\{\pi_{k}^{1},k\geq 0\} of stochastic kernel on AA such that

π1(da|ω,t)\displaystyle\pi^{1}(da|\omega,t) =I{t=0}(t)π01(da|i0,0)+k0I{Tk<tTk+1}πk1(da|i0,θ1,i1,,θk,ik,tTk)\displaystyle=I_{\{t=0\}}(t)\pi_{0}^{1}(da|i_{0},0)+\sum_{k\geq 0}I_{\{T_{k}<t\leq T_{k+1}\}}\pi^{1}_{k}(da|i_{0},\theta_{1},i_{1},\dots,\theta_{k},i_{k},t-T_{k})
+I{tT}δaΔ(da),\displaystyle+I_{\{t\geq T_{\infty}\}}\delta_{a_{\Delta}}(da),

where π01(da|i0,0)\pi_{0}^{1}(da|i_{0},0) is a stochastic kernel on AA given SS such that π01(A(i0)|i0,0)=1\pi_{0}^{1}(A(i_{0})|i_{0},0)=1, πk1(k1)\pi^{1}_{k}(k\geq 1) are stochastic kernels on AA given (S×(0,))k+1(S\times(0,\infty))^{k+1} such that πk1(A(ik)|i0,θ1,i1,,θk,ik,tTk)=1\pi_{k}^{1}(A(i_{k})|i_{0},\theta_{1},i_{1},\cdots,\theta_{k},i_{k},t-T_{k})=1, and δaΔ(da)\delta_{a_{\Delta}}(da) denotes the Dirac measure at the point aΔa_{\Delta}.

The set of all admissible history-dependent strategies for player 1 is denoted by Π1\Pi^{1}. A strategy π1Π1\pi^{1}\in\Pi^{1} for player 1, is called a Markov if π1(da|ω,t)=π1(da|ξt(w),t)\pi^{1}(da|\omega,t)=\pi^{1}(da|\xi_{t-}(w),t) for every wΩw\in\Omega and t0t\geq 0, where ξt(w):=limstξs(w)\xi_{t-}(w):=\lim_{s\uparrow t}\xi_{s}(w). A Markov stragegy π1\pi^{1} is called a stationary Markov strategy if π1\pi^{1} does not have explicit dependence on time. We denote by Π1m\Pi^{m}_{1} and Π1s\Pi^{s}_{1} the family of all Markov strategies and stationary Markov strategies, respectively, for player 1. The sets of all admissible history-dependent strategies Π2\Pi^{2}, all Markov strategies Π2m\Pi^{m}_{2} and all stationary strategies Π2s\Pi^{s}_{2} for player 2 are defined similarly.

For any compact metric space YY, let P(Y)P(Y) denote the space of probability measures on (Y)\mathcal{B}(Y) with Prohorov topology. Since for each iSi\in S, A(i)A(i) and B(i)B(i) are compact sets, P(A(i))P(A(i)) and P(B(i))P(B(i)) are compact metric spaces. For each i,jSi,j\in S, μP(A(i))\mu\in P(A(i)) and νP(B(i))\nu\in P(B(i)), the associated cost and transition rates are defined, respectively, as follows:

c(i,μ,ν):=B(i)A(i)c(i,a,b)μ(da)ν(db),c(i,\mu,\nu):=\int_{B(i)}\int_{A(i)}c(i,a,b)\mu(da)\nu(db),
q(j|i,μ,ν):=B(i)A(i)q(j|i,a,b)μ(da)ν(db).q(j|i,\mu,\nu):=\int_{B(i)}\int_{A(i)}q(j|i,a,b)\mu(da)\nu(db).

Note that π1Π1s\pi^{1}\in{\Pi_{1}^{s}} can be identified with a map π1:SP(A)\pi^{1}:S\to{P}(A) such that π1(|j)P(A(j))\pi^{1}(\cdot|j)\in{P}(A(j)) for each jSj\in S. Thus, we have Π1s=ΠiSP(A(i))\Pi_{1}^{s}=\displaystyle\Pi_{i\in S}{P}(A(i)) and Π2s=ΠiSP(B(i))\Pi_{2}^{s}=\displaystyle\Pi_{i\in S}{P}(B(i)). Therefore by Tychonoff theorem, the sets Π1s{\Pi_{1}^{s}} and Π2s{\Pi_{2}^{s}} are compact metric spaces. Also, note that under Assumption 2.1 (given below) for any initial state iSi\in S and any pair of strategies (π1,π2)Π1×Π2(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}, Theorem 4.27 in [23] yields the existence of a unique probability measure denoted by Piπ1,π2P^{\pi^{1},\pi^{2}}_{i} on (Ω,)(\Omega,\mathscr{F}). Let Eiπ1,π2E^{\pi^{1},\pi^{2}}_{i} be the expectation operator with respect to Piπ1,π2P^{\pi^{1},\pi^{2}}_{i}. Also, from [[16], pp.13-15], we know that {ξt}t0\{\xi_{t}\}_{t\geq 0} is a Markov process under any (π1,π2)Π1m×Π2m(\pi^{1},\pi^{2})\in\Pi^{m}_{1}\times\Pi^{m}_{2} (in fact, strong Markov). Now we give the definition of the risk-sensitive average cost criterion for zero-sum continuous-time games. Since the risk-sensitive parameter remains fixed throughout we assume without any loss of generality that the risk-sensitivity coefficient θ=1\theta=1. For each iSi\in S and any (π1,π2)Π1×Π2(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}, the risk-sensitive ergodic cost criterion is given by

J(i,c,π1,π2):=lim supT1TlnEiπ1,π2[e0TBAc(ξt,a,b)π1(da|ω,t)π2(db|ω,t)𝑑t].J(i,c,\pi^{1},\pi^{2}):=\limsup_{T\rightarrow\infty}\frac{1}{T}\ln E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}\int_{B}\int_{A}c(\xi_{t},a,b)\pi^{1}(da|\omega,t)\pi^{2}(db|\omega,t)dt}\biggr{]}. (2.4)

Player 1 tries to maximize the above over his/her admissible strategies whereas player 2 tries to minimize the same. Now we define the lower/upper value of the game. The functions on S defined by L(i):=supπ1Π1infπ2Π2J(i,c,π1,π2)L(i):=\displaystyle\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{1},\pi^{2}) and U(i):=infπ2Π2supπ1Π1J(i,c,π1,π2)U(i):=\displaystyle\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{2}) are called, respectively, the lower value and the upper value of the game. It is easy to see that

L(i)U(i)for alliS.L(i)\leq U(i)~{}\text{for all}~{}i\in S.
Definition 2.2.

If L(i)=U(i)L(i)=U(i) for all iSi\in S, then the common function is called the value of the game and is denoted by J(i)J^{*}(i).

Definition 2.3.

Suppose that the game admits a value JJ^{*}. Then a strategy π1\pi^{*1} in Π1\Pi^{1} is said to be optimal for player 1 if

infπ2Π2J(i,c,π1,π2)=J(i)for alliS.\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{*1},\pi^{2})=J^{*}(i)~{}\text{for all}~{}i\in S.

Similarly, π2Π2\pi^{*2}\in\Pi^{2} is optimal for player 2 if

supπ1Π1J(i,c,π1,π2)=J(i)for alliS.\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{*2})=J^{*}(i)~{}\text{for all}~{}i\in S.

If πkΠk\pi^{*k}\in\Pi^{k} is optimal for player k (k=1,2), then (π1,π2)(\pi^{*1},\pi^{*2}) is called a pair of optimal strategies and also called a saddle-point equilibrium.

Next we list the commonly used notations below:

  • For any finite set 𝒟S\mathcal{D}\subset S, we define 𝒟={f:Sf(i)=0i𝒟c}\mathcal{B}_{\mathcal{D}}=\{f:S\to\mathbb{R}\mid\,\,f(i)=0\,\,\,\forall\,\,i\in\mathcal{D}^{c}\} .

  • 𝒟+𝒟\mathcal{B}_{\mathcal{D}}^{+}\subset\mathcal{B}_{\mathcal{D}} denotes the cone of all nonnegative functions vanishing outside 𝒟.\mathscr{D}.

  • Given any real-valued function 𝒱1\mathcal{V}\geq 1 on SS, we define a Banach space (L𝒱,𝒱)(L^{\infty}_{\mathcal{V}},\|\cdot\|^{\infty}_{\mathcal{V}}) of 𝒱\mathcal{V}-weighted functions by

    L𝒱={u:Su𝒱:=supiS|u(i)|𝒱(i)<}.L^{\infty}_{\mathcal{V}}=\biggl{\{}u:S\rightarrow\mathbb{R}\mid\|u\|^{\infty}_{\mathcal{V}}:=\sup_{i\in S}\frac{|u(i)|}{\mathcal{V}(i)}<\infty\biggr{\}}.
  • c:=sup(i,a,b)Kc(i,a,b)\|c\|_{\infty}:=\displaystyle\sup_{(i,a,b)\in K}c(i,a,b).

  • For any function f𝒟f\in\mathcal{B}_{\mathcal{D}}, f𝒟=max{|f(i)|:i𝒟}\|f\|_{\mathcal{D}}=\max\{|f(i)|:i\in\mathcal{D}\}.

  • For any finite set S\mathscr{B}\subset S, τ~():=inf{t>0:ξt}\tilde{\tau}(\mathscr{B}):=\inf\{t>0:\xi_{t}\in\mathscr{B}\}.

Our main goal is to establish the existence of a saddle-point equilibrium among the class of admissible history-dependent strategies. To this end, following [3] and [7], we investigate the HJI equation given by

ρψ(i)\displaystyle\rho\psi(i) =supμP(A(i))infνP(B(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi(i)\bigg{]}
=infνP(B(i))supμP(A(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)].\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi(i)\bigg{]}. (2.5)

Here ρ\rho is a scalar and ψ\psi is an appropriate function. The above is clearly an eigenvalue problem related to a nonlinear operator on an appropriate space. By a nonlinear version of Krein-Rutman theorem, we first show that Dirichlet eigenvalue problem associated with the above equation admits a solution in the space of bounded functions. Then by using a suitable limiting argument we show that the above HJI equation admits a principal eigenpair in an appropriate space. Finally exploiting the HJI equation, we completely characterize all possible saddle-point equilibria in the space of stationary Markov strategies. This is a brief outline of our procedure of establishing a saddle point equilibrium. The details now follow.

Since the transition rates (i.e., q(j|i,a,b)q(j|i,a,b) ) may be unbounded, to avoid the explosion of the state process {ξt,t0}\{\xi_{t},t\geq 0\}, the following assumption is imposed on the transition rates, which had been widely used in CTMDPs; see, for instance, [[17], [18], [19]] and references therein.

Assumption 2.1.

There exist real-valued function V~1\tilde{V}\geq 1 on SS, constants b00b_{0}\neq 0 and b10b_{1}\geq 0, and b2>0b_{2}>0 such that :

  1. (i)

    jSV~(j)q(j|i,a,b)b0V~(i)+b1\sum_{j\in S}\tilde{V}(j)q(j|i,a,b)\leq b_{0}\tilde{V}(i)+b_{1} for all (i,a,b)K(i,a,b)\in K;

  2. (ii)

    q(i)b2V~(i)q^{*}(i)\leq b_{2}\tilde{V}(i) for all iSi\in S, where q(i)q^{*}(i) is as in (2.2).

Throughout the rest of this article we are going to assume that Assumption 2.1 holds. Note that if supiSq(i)<\sup_{i\in S}q^{*}(i)<\infty then Assumption 2.1 holds trivially. In this case we can choose V~\tilde{V} to be a suitable constant.

Since we are allowing our transition and cost rates to be unbounded, to guarantee the finiteness of J(i,c,π1,π2)J(i,c,\pi^{1},\pi^{2}), we need the following Assumption.

Assumption 2.2.

We assume that the CTMDP {ξt}t0\{\xi_{t}\}_{t\geq 0} is irreducible under every pair of stationary Markov strategies (π1,π2)Π1s×Π2s(\pi^{1},\pi^{2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}. Assume that the cost function cc is bounded below. Thus without loss of generality we assume that c0c\geq 0. Furthermore, suppose there exist a constant C>0C>0, a finite set 𝒦^\hat{\mathscr{K}} and a Lyapunov function V:S[1,)V:S\to[1,\infty) such that one of the following hold.

  • (a)

    When the running cost is bounded: For some positive constant γ^>c\hat{\gamma}>\|c\|_{\infty}, we have following blanket stability condition

    sup(a,b)A(i)×B(i)jSV(j)q(j|i,a,b)CI𝒦^(i)γ^V(i)iS.\displaystyle\sup_{(a,b)\in A(i)\times B(i)}\sum_{j\in S}V(j)q(j|i,a,b)\leq CI_{\hat{\mathscr{K}}}(i)-\hat{\gamma}V(i)~{}\forall i\in S. (2.6)
  • (b)

    When the running cost is unbounded: For some norm-like function ^:S+\hat{\ell}:S\rightarrow\mathbb{R}_{+}, the function ^()max(a,b)A()×B()c(,a,b)\hat{\ell}(\cdot)-\displaystyle\max_{(a,b)\in A(\cdot)\times B(\cdot)}c(\cdot,a,b)\; is norm-like and we have the following blanket stability condition

    sup(a,b)A(i)×B(i)jSV(j)q(j|i,a,b)CI𝒦^(i)^(i)V(i)iS.\displaystyle\sup_{(a,b)\in A(i)\times B(i)}\sum_{j\in S}V(j)q(j|i,a,b)\leq CI_{\hat{\mathscr{K}}}(i)-\hat{\ell}(i)V(i)~{}\forall i\in S. (2.7)

We wish to establish the existence of a saddle-point equilibrium in the class of all stationary strategies. In view of this we also need the following assumptions. Let i0Si_{0}\in S be a fixed point (a reference state).

Assumption 2.3.
  1. (i)

    For any fixed i,jSi,j\in S the functions q(j|i,a,b)q(j|i,a,b) and c(i,a,b){c}(i,a,b) are continuous in (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i) .

  2. (ii)

    The sum jSV(j)q(j|i,a,b)\displaystyle\sum_{j\in S}V(j)q(j|i,a,b) is continuous in (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i) for any given iSi\in S, where VV is as Assumption 2.2.

  3. (iii)

    There exists i0Si_{0}\in S such that any state can be reached from i0i_{0}, i.e., q(j|i0,a,b)>0q(j|i_{0},a,b)>0 for all ji0j\neq i_{0} and (a,b)A(i0)×B(i0)(a,b)\in A(i_{0})\times B(i_{0}).

We first construct an increasing sequence of finite subsets 𝒟^nS\hat{\mathscr{D}}_{n}\subset S such that i=0𝒟^n=S\displaystyle\cup_{i=0}^{\infty}\hat{\mathscr{D}}_{n}=S and i0𝒟^ni_{0}\in\hat{\mathscr{D}}_{n} for all nn\in\mathbb{N}. Define τn:=τ(𝒟^n):=inf{t0:ξt𝒟^n}\tau_{n}:=\tau(\hat{\mathscr{D}}_{n}):=\inf\{t\geq 0:\xi_{t}\notin\hat{\mathscr{D}}_{n}\}, first exit time from 𝒟^n\hat{\mathscr{D}}_{n}.

Proposition 2.1.

Suppose Assumption 2.3 holds. Let c~:K\tilde{c}:K\to\mathbb{R} be a function continuous in (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i) for each fixed iSi\in S. Suppose the cost function c~\tilde{c} satisfies the relation c~<δ\tilde{c}<-\delta in 𝒟^n\hat{\mathscr{D}}_{n} for some δ>0\delta>0 and nn\in\mathbb{N} . Then for any g𝒟^ng\in\mathcal{B}_{\hat{\mathscr{D}}_{n}} there exists a unique φ𝒟^n\varphi\in\mathcal{B}_{\hat{\mathscr{D}}_{n}} satisfying the following nonlinear equation

g(i)\displaystyle-g(i) =supμP(A(i))infνP(B(i))[jSφ(j)q(j|i,μ,ν)+c~(i,μ,ν)φ(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}
=infνP(B(i))supμP(A(i))[jSφ(j)q(j|i,μ,ν)+c~(i,μ,ν)φ(i)]i𝒟^n,\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n}, (2.8)

with φ(i)=0\varphi(i)=0 for all i𝒟^nci\in\hat{\mathscr{D}}_{n}^{c} . Moreover the unique solution of the above equation satisfies

φ(i)\displaystyle\varphi(i) =supπ1Π1infπ2Π2Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(s))𝑑sg(ξt)𝑑t]\displaystyle=\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}
=infπ2Π2supπ1Π1Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(s))𝑑sg(ξt)𝑑t]iS,\displaystyle=\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}~{}\forall i\in S, (2.9)

where as before τn=inf{t0:ξt𝒟^n}\tau_{n}=\inf\{t\geq 0:\xi_{t}\notin\hat{\mathscr{D}}_{n}\}.

Proof.

Let (yi)i𝒟^n(y_{i})_{i\in\hat{\mathscr{D}}_{n}} be a sequence in \mathbb{R}. Fix i𝒟^ni\in\hat{\mathscr{D}}_{n}. Let F:F:\mathbb{R}\rightarrow\mathbb{R} be defined by

xF(x)=supμP(A(i))infνP(B(i))[ij𝒟^nyjq(j|i,μ,ν)+(q(i|i,μ,ν)+c~(i,μ,ν))x],i𝒟^n.\displaystyle x\rightarrow F(x)=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}x\biggr{]},i\in\hat{\mathscr{D}}_{n}. (2.10)

Suppose x2>x1x_{2}>x_{1}. Let ε>0\varepsilon>0. Then there exists πε1Π1s\pi^{1}_{\varepsilon}\in\Pi^{s}_{1} for which the following holds

F(x1)F(x2)=\displaystyle F(x_{1})-F(x_{2})= supμP(A(i))infνP(B(i))[ij𝒟^nyjq(j|i,μ,ν)+(q(i|i,μ,ν)+c~(i,μ,ν))x1]\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}x_{1}\biggr{]}
supμP(A(i))infνP(B(i))[ij𝒟^nyjq(j|i,μ,ν)+(q(i|i,μ,ν)+c~(i,μ,ν))x2]\displaystyle-\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}x_{2}\biggr{]}
\displaystyle\geq infνP(B(i))[ij𝒟^nyjq(j|i,πε1(i),ν)+(q(i|i,πε1(i),ν)+c~(i,πε1(i),ν))x1]\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j|i,\pi^{1}_{\varepsilon}(i),\nu)+\biggl{(}q(i|i,\pi^{1}_{\varepsilon}(i),\nu)+\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)\biggr{)}x_{1}\biggr{]}
infνP(B(i))[ij𝒟^nyjq(j|i,πε1(i),ν)+(q(i|i,πε1(i),ν)+c~(i,πε1(i),ν))x2+ε]\displaystyle-\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j|i,\pi^{1}_{\varepsilon}(i),\nu)+\biggl{(}q(i|i,\pi^{1}_{\varepsilon}(i),\nu)+\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)\biggr{)}x_{2}+\varepsilon\biggr{]}
\displaystyle\geq infνP(B(i))[(q(i|i,πε1(i),ν)+c~(i,πε1(i),ν))(x1x2)]ε\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\biggl{(}q(i|i,\pi^{1}_{\varepsilon}(i),\nu)+\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)\biggr{)}(x_{1}-x_{2})\biggr{]}-\varepsilon
\displaystyle\geq infνP(B(i))[c~(i,πε1(i),ν)(x2x1)]ε\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}-\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)(x_{2}-x_{1})\biggr{]}-\varepsilon
>\displaystyle> δ(x2x1)ε.\displaystyle\,\,\delta(x_{2}-x_{1})-\varepsilon.

Since ε>0\varepsilon>0 is arbitrary we get F(x1)>F(x2)F(x_{1})>F(x_{2}). Also, we see that limx+F(x)=\lim_{x\rightarrow+\infty}F(x)=-\infty and limxF(x)=+\lim_{x\rightarrow-\infty}F(x)=+\infty. Since FF is continuous in xx, for every yy\in\mathbb{R}, there exists a unique xx satisfying F(x)=yF(x)=y. Now using the definition of FF, for fixed g𝒟^ng\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}, we can define a map T^:𝒟^n𝒟^n\hat{T}:\mathcal{B}_{\hat{\mathscr{D}}_{n}}\rightarrow\mathcal{B}_{\hat{\mathscr{D}}_{n}} satisfying

supμP(A(i))infνP(B(i))[ij𝒟^nϕ~(j)q(j|i,μ,ν)+(q(i|i,μ,ν)+c~(i,μ,ν))(T^ϕ~(i))]=g(i),i𝒟^n.\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}(j)q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}(\hat{T}\tilde{\phi}(i))\biggr{]}=-g(i),~{}i\in\hat{\mathscr{D}}_{n}. (2.11)

Let ϕ~1,ϕ~2𝒟^n\tilde{\phi}_{1},\tilde{\phi}_{2}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}. Also, let π~1\tilde{\pi}^{1} be an outer maximizing selector of

supμP(A(i))infνP(B(i))[ij𝒟^nϕ~2(j)q(j|i,μ,ν)+(q(i|i,μ,ν)+c~(i,μ,ν))T^ϕ~2(i)].\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}\,.

Assumption 2.3, ensures the existence of such a selector. It then follows that

0=\displaystyle 0= supμP(A(i))infνP(B(i))[ij𝒟^nϕ~1(j)q(j|i,μ,ν)+(q(i|i,μ,ν)+c~(i,μ,ν))T^ϕ~1(i)]\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{1}(j)q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{1}(i)\biggr{]}
supμP(A(i))infνP(B(i))[ij𝒟^nϕ~2(j)q(j|i,μ,ν)+(q(i|i,μ,ν)+c~(i,μ,ν))T^ϕ~2(i)]\displaystyle-\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}
\displaystyle\geq infνP(B(i))[ij𝒟^nϕ~1(j)q(j|i,π~1(i),ν)+(q(i|i,π~1(i),ν)+c~(i,π~1(i),ν))T^ϕ~1(i)]\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{1}(j)q(j|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}\hat{T}\tilde{\phi}_{1}(i)\biggr{]}
infνP(B(i))[ij𝒟^nϕ~2(j)q(j|i,π~1(i),ν)+(q(i|i,π~1(i),ν)+c~(i,π~1(i),ν))T^ϕ~2(i)]\displaystyle-\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}
\displaystyle\geq infνP(B(i))[ij𝒟^n(ϕ~1(j)ϕ~2(j))q(j|i,π~1(i),ν)+(q(i|i,π~1(i),ν)+c~(i,π~1(i),ν))(T^ϕ~1(i)T^ϕ~2(i))].\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}(\tilde{\phi}_{1}(j)-\tilde{\phi}_{2}(j))q(j|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}(\hat{T}\tilde{\phi}_{1}(i)-\hat{T}\tilde{\phi}_{2}(i))\biggr{]}\,.

Now let the infimum of the RHS (of the above) attain at π2\pi^{*2}. Then

ϕ~1ϕ~2𝒟^nq(i|i,π~1(i),π2(i))+(q(i|i,π~1(i),π2(i))+c~(i,π~1(i),π2(i)))(T^ϕ~1(i)T^ϕ~2(i))0.\|\tilde{\phi}_{1}-\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}q(i|i,\tilde{\pi}^{1}(i),\pi^{*2}(i))+\biggl{(}q(i|i,\tilde{\pi}^{1}(i),\pi^{*2}(i))+\tilde{c}(i,\tilde{\pi}^{1}(i),\pi^{*2}(i))\biggr{)}(\hat{T}\tilde{\phi}_{1}(i)-\hat{T}\tilde{\phi}_{2}(i))\leq 0.

Hence, we deduce that

(T^ϕ~2(i)T^ϕ~1(i))supμP(A(i))supνP(B(i))q(i|i,μ,ν)q(i|i,μ,ν)c~(i,μ,ν)ϕ~1ϕ~2𝒟^n.\displaystyle(\hat{T}\tilde{\phi}_{2}(i)-\hat{T}\tilde{\phi}_{1}(i))\leq\sup_{\mu\in P(A(i))}\sup_{\nu\in P(B(i))}\frac{-q(i|i,\mu,\nu)}{-q(i|i,\mu,\nu)-\tilde{c}(i,\mu,\nu)}\|\tilde{\phi}_{1}-\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}\,.

Now in the above calculation, interchanging ϕ~1,ϕ~2\tilde{\phi}_{1},\tilde{\phi}_{2}, it follows that

T^ϕ~1T^ϕ~2𝒟^nα1ϕ~1ϕ~2𝒟^n,\|\hat{T}\tilde{\phi}_{1}-\hat{T}\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}\leq\alpha_{1}\|\tilde{\phi}_{1}-\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}\,,

where α1\alpha_{1} is a positive constant less than 11 . This implies that T^\hat{T} is a contraction map. Thus, by Banach fixed point theorem, there exists a unique φ𝒟^n\varphi\in\mathcal{B}_{\hat{\mathscr{D}}_{n}} such that T^(φ)=φ\hat{T}(\varphi)=\varphi. Now by Fan’s minimax theorem, see [[8], Theorem 3], we have

supμP(A(i))infνP(B(i))[jSφ(j)q(j|i,μ,ν)+c~(i,μ,ν)φ(i)]\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}
=infνP(B(i))supμP(A(i))[jSφ(j)q(j|i,μ,ν)+c~(i,μ,ν)φ(i)].\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}.

This proves that (2.8) admits a unique solution. Now by using Dynkin formula as in [[16], Appendix C.3], for any (π1,π2)Π1×Π2(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2} and T>0T>0, we get

Eiπ1,π2[e0Tτnc~(ξs,π1(s),π2(s))𝑑sφ(ξTτn)]φ(i)\displaystyle E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T\wedge\tau_{n}}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}\varphi(\xi_{T\wedge\tau_{n}})\biggr{]}-\varphi(i)
=Eiπ1,π2[0Tτne0tc~(ξs,π1(s),π2(s))𝑑s(c~(ξt,π1(t),π2(t))φ(ξt)+jSφ(j)q(j|ξt,π1(t),π2(t)))𝑑t].\displaystyle=E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{T\wedge\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}\bigg{(}\tilde{c}(\xi_{t},\pi^{1}(t),\pi^{2}(t))\varphi(\xi_{t})+\sum_{j\in S}\varphi(j)q(j|\xi_{t},\pi^{1}(t),\pi^{2}(t))\bigg{)}dt\biggr{]}. (2.12)

Using the compactness of A(i),B(i)A(i),B(i) and the continuity of c~,q\tilde{c},q, there exists a pair of selectors (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2} (i.e., a mini-max selector) satisfying

g(i)\displaystyle-g(i) =infνP(B(i))[jSφ(j)q(j|i,π1(i),ν)+c~(i,π1(i),ν)φ(i)]\displaystyle=\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j|i,\pi^{*1}(i),\nu)+\tilde{c}(i,\pi^{*1}(i),\nu)\varphi(i)\biggr{]}
=supμP(A(i))[jSφ(j)q(j|i,μ,π2(i))+c~(i,μ,π2(i))φ(i)].\displaystyle=\sup_{\mu\in P(A(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j|i,\mu,\pi^{*2}(i))+\tilde{c}(i,\mu,\pi^{*2}(i))\varphi(i)\biggr{]}. (2.13)

Then, using (2.12) and (2.13), we obtain

Eiπ1,π2[0Tτne0tc~(ξs,π1(ξs),π2(s))𝑑sg(ξt)𝑑t]Eiπ1,π2[e0Tτnc~(ξs,π1(ξs),π2(s))𝑑sφ(ξTτn)]+φ(i).\displaystyle E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{T\wedge\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}\geq-E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T\wedge\tau_{n}}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}\varphi(\xi_{T\wedge\tau_{n}})\biggr{]}+\varphi(i).

Using the dominated convergence theorem, taking TT\rightarrow\infty in the above equation, we get

Eiπ1,π2[0τne0tc~(ξs,π1(ξs),π2(s))𝑑sg(ξt)𝑑t]\displaystyle E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]} Eiπ1,π2[e0τnc~(ξs,π1(ξs),π2(s))𝑑sφ(ξτn)]+φ(i)\displaystyle\geq-E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{\tau_{n}}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}\varphi(\xi_{\tau_{n}})\biggr{]}+\varphi(i)
=φ(i).\displaystyle=\varphi(i).

Hence

φ(i)\displaystyle\varphi(i) Eiπ1,π2[0τne0tc~(ξs,π1(ξs),π2(s))𝑑sg(ξt)𝑑t].\displaystyle\leq E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}.

Since π2Π2\pi^{2}\in\Pi^{2} is arbitrary,

φ(i)infπ2Π2Eiπ1,π2[0τne0tc~(ξs,π1(ξs),π2(s))𝑑sg(ξt)𝑑t].\displaystyle\varphi(i)\leq\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}. (2.14)

Similarly, using (2.12), (2.13), and Fatou’s Lemma, we get

φ(i)\displaystyle\varphi(i) supπ1Π1Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(ξs))𝑑sg(ξt)𝑑t].\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{*2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{*2}(\xi_{s}))ds}g(\xi_{t})dt\biggr{]}. (2.15)

Using (2.14) and (2.15), we obtain

φ(i)\displaystyle\varphi(i) =infπ2Π2supπ1Π1Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(s))𝑑sg(ξt)𝑑t]\displaystyle=\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}
=supπ1Π1infπ2Π2Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(s))𝑑sg(ξt)𝑑t]iS.\displaystyle=\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}~{}i\in S.

This completes the proof. ∎

We now recall a version of the nonlinear Krein-Rutman theorem from [[1], Section 3.1]. Let 𝒳^\hat{\mathscr{X}} be an ordered Banach space. In what follows \succeq denotes a partial ordering in 𝒳^\hat{\mathscr{X}} with respect to a positive cone 𝒞^\hat{\mathscr{C}} (𝒳^\subset\hat{\mathscr{X}}), that is xyxy𝒞^x\succeq y\Leftrightarrow x-y\in\hat{\mathscr{C}}. Also, recall that if a map T~:𝒳^𝒳^\tilde{T}:\hat{\mathscr{X}}\rightarrow\hat{\mathscr{X}} is continuous and compact, it is called completely continuous.

Theorem 2.1.

Let 𝒳^\hat{\mathscr{X}} be as above and 𝒞^𝒳^\hat{\mathscr{C}}\subset\hat{\mathscr{X}} a nonempty closed cone that satisfies 𝒞^𝒞^=𝒳^\hat{\mathscr{C}}-\hat{\mathscr{C}}=\hat{\mathscr{X}}. Let T~:𝒳^𝒳^\tilde{T}:\hat{\mathscr{X}}\rightarrow\hat{\mathscr{X}} be an order-preserving, completely continuous, 1-homogeneous map with the property that if for some nonzero ζ𝒞^\zeta\in\hat{\mathscr{C}} and N>0N>0, we have NT~(ζ)ζN\tilde{T}(\zeta)\succeq\zeta. Then there exist a nontrivial f𝒞^f\in\hat{\mathscr{C}} and λ~>0\tilde{\lambda}>0 satisfying T~f=λ~f\tilde{T}f=\tilde{\lambda}f.

Lemma 2.1.

Suppose Assumption 2.2 holds. Consider a finite subset \mathscr{B} of SS such that 𝒦^\hat{\mathscr{K}}\subset\mathscr{B}. Let τ~()=inf{t>0:ξt}\tilde{\tau}(\mathscr{B})=\inf\{t>0:\xi_{t}\in\mathscr{B}\}. Then for any pair of strategies (π1,π2)Π1×Π2(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}, the following results hold.

  1. (i)

    When Assumption 2.2 (a) holds:

    Eiπ1,π2[eγ^τ~()V(ξτ~())]V(i)ic.\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\hat{\gamma}\tilde{\tau}(\mathscr{B})}V(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}\leq V(i)~{}\forall~{}i\in\mathscr{B}^{c}. (2.16)
  2. (ii)

    When Assumption 2.2 (b) holds:

    Eiπ1,π2[e0τ~()^(ξs)𝑑sV(ξτ~())]V(i)ic.\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}V(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}\leq V(i)~{}\forall~{}i\in\mathscr{B}^{c}. (2.17)
Proof.

It is easy to see that the proof of (i) is analogous to that the proof of (ii) when we replace ^\hat{\ell} with γ^\hat{\gamma}. So, we prove only part (ii). Suppose Assumption 2.2 (b) holds. Let nn be large enough so that 𝒟^n\mathscr{B}\subset\hat{\mathscr{D}}_{n}. Applying Dynkin’s formula [[16], Appendix C.3], for ic𝒟^ni\in\mathscr{B}^{c}\cap\hat{\mathscr{D}}_{n} we have

Eiπ1,π2[e0τ~()Tτn^(ξs)𝑑sV(τ~()Tτn)]V(i)\displaystyle E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n}}\hat{\ell}(\xi_{s})ds}V(\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n})\biggr{]}-V(i)
=Eiπ1,π2[0τ~()Tτne0t^(ξs)𝑑s[^(ξt)V(ξt)+jSq(j|ξt,π1(t),π2(t))V(j)]𝑑t]\displaystyle=E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n}}e^{\int_{0}^{t}\hat{\ell}(\xi_{s})ds}[\hat{\ell}(\xi_{t})V(\xi_{t})+\sum_{j\in S}q(j|\xi_{t},\pi^{1}(t),\pi^{2}(t))V(j)]dt\biggr{]}
Eiπ1,π2[0τ~()Tτne0t^(ξs)𝑑sCI𝒦^(ξt)𝑑t]=0,\displaystyle\leq E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n}}e^{\int_{0}^{t}\hat{\ell}(\xi_{s})ds}CI_{\hat{\mathscr{K}}}(\xi_{t})dt\biggr{]}=0,

where τn=inf{t0:ξt𝒟^n}\tau_{n}=\inf\{t\geq 0:\xi_{t}\notin\hat{\mathscr{D}}_{n}\} . Now by Fatou’s lemma, taking first nn\rightarrow\infty and then TT\rightarrow\infty, we get the required result. ∎

Lemma 2.2.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then for nn\in\mathbb{N}, there exists a pair (ρn,ψn)×𝒟^n+(\rho_{n},\psi_{n})\in\mathbb{R}\times\mathcal{B}_{\hat{\mathscr{D}}_{n}}^{+}, ψn0\psi_{n}\gneq 0 for the following Dirichlet nonlinear eigenequation

ρnψn(i)\displaystyle\rho_{n}\psi_{n}(i) =supμP(A(i))infνP(B(i))[jSψn(j)q(j|i,μ,ν)+c(i,μ,ν)ψn(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}
=infνP(B(i))supμP(A(i))[jSψn(j)q(j|i,μ,ν)+c(i,μ,ν)ψn(i)].\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}. (2.18)

Also, for each iSi\in S such that ψn(i)>0\psi_{n}(i)>0, we have

ρnsupπ1Π1infπ2Π2lim supT1TlogEiπ1,π2[e0Tc(ξt,π1(t),π2(t))𝑑t].\displaystyle\rho_{n}\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\bigg{]}. (2.19)

Additionally the sequence {ρn}\{\rho_{n}\} is bounded satisfying lim infnρn0\liminf_{n\rightarrow\infty}\rho_{n}\geq 0.

Proof.

Let δ>0\delta>0. Set c~=csup𝒟^ncδ\tilde{c}=\displaystyle{c-\sup_{\hat{\mathscr{D}}_{n}}c-\delta} . Let T~:𝒟^n𝒟^n\tilde{T}:\mathcal{B}_{\hat{\mathscr{D}}_{n}}\rightarrow\mathcal{B}_{\hat{\mathscr{D}}_{n}} be an operator defined as

T~(g)(i):=supπ1Π1infπ2Π2Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(s))𝑑sg(ξt)𝑑t],i𝒟^n,\displaystyle\tilde{T}(g)(i):=\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]},~{}i\in\hat{\mathscr{D}}_{n}, (2.20)

with T~(g)(i)=0 for i𝒟^nc\tilde{T}(g)(i)=0~{}\text{ for }~{}i\in\hat{\mathscr{D}}_{n}^{c} . Let g1,g2𝒟^ng_{1},g_{2}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}} such that g1g2g_{1}\succeq g_{2}, i.e., g1(i)g2(i)g_{1}(i)\geq g_{2}(i) for each ii. Also, let T~(g1)=φ^1\tilde{T}(g_{1})=\hat{\varphi}_{1} and T~(g2)=φ^2\tilde{T}(g_{2})=\hat{\varphi}_{2} . Then there exists π^1Π1s\hat{\pi}^{*1}\in\Pi^{s}_{1} such that

g2(i)\displaystyle-g_{2}(i) =supμP(A(i))infνP(B(i))[jSφ^2(j)q(j|i,μ,ν)+c~(i,μ,ν)φ^2(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\hat{\varphi}_{2}(j)q(j|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\hat{\varphi}_{2}(i)\biggr{]}
=infνP(B(i))[jSφ^2(j)q(j|i,π^1(i),ν)+c~(i,π^1(i),ν)φ^2(i)]i𝒟^n.\displaystyle=\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\hat{\varphi}_{2}(j)q(j|i,\hat{\pi}^{*1}(i),\nu)+\tilde{c}(i,\hat{\pi}^{*1}(i),\nu)\hat{\varphi}_{2}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n}\,.

Also, from the proof of Proposition 2.1, we have

φ^2(i)=T~(g2)(i)=infπ2Π2Eiπ^1,π2[0τne0tc~(ξs,π^1(ξs),π2(s))𝑑sg2(ξt)𝑑t].\hat{\varphi}_{2}(i)=\tilde{T}(g_{2})(i)=\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{*1}(\xi_{s}),\pi^{2}(s))ds}g_{2}(\xi_{t})dt\biggr{]}\,.

Thus, we deduce that

T~(g1)(i)T~(g2)(i)=\displaystyle\tilde{T}(g_{1})(i)-\tilde{T}(g_{2})(i)= supπ1Π1infπ2Π2Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(s))𝑑sg1(ξt)𝑑t]\displaystyle\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g_{1}(\xi_{t})dt\biggr{]}
supπ1Π1infπ2Π2Eiπ1,π2[0τne0tc~(ξs,π1(s),π2(s))𝑑sg2(ξt)𝑑t]\displaystyle-\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g_{2}(\xi_{t})dt\biggr{]}
\displaystyle\geq infπ2Π2Eiπ^1,π2[0τne0tc~(ξs,π^1(ξs),π2(s))𝑑sg1(ξt)𝑑t]\displaystyle\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{*1}(\xi_{s}),\pi^{2}(s))ds}g_{1}(\xi_{t})dt\biggr{]}
infπ2Π2Eiπ^1,π2[0τne0tc~(ξs,π^1(ξs),π2(s))𝑑sg2(ξt)𝑑t]\displaystyle-\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{*1}(\xi_{s}),\pi^{2}(s))ds}g_{2}(\xi_{t})dt\biggr{]}
\displaystyle\geq infπ2Π2Eiπ^1,π2[0τne0tc~(ξs,π^1(ξs),π2(s))𝑑s(g1(ξt)g2(ξt))𝑑t].\displaystyle\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{*1}(\xi_{s}),\pi^{2}(s))ds}(g_{1}(\xi_{t})-g_{2}(\xi_{t}))dt\biggr{]}.

This gives us T~(g1)T~(g2)\tilde{T}(g_{1})\succeq\tilde{T}(g_{2}). Clearly T~(λg)=λT~(g)\tilde{T}(\lambda g)=\lambda\tilde{T}(g) for all λ0\lambda\geq 0. Since c~<δ\tilde{c}<-\delta, there exists a constant α2>0\alpha_{2}>0 such that

T~(g^1)T~(g^2)𝒟^nα2g^1g^2𝒟^n, for any g^1,g^2𝒟^n.\|\tilde{T}(\hat{g}_{1})-\tilde{T}(\hat{g}_{2})\|_{\hat{\mathscr{D}}_{n}}\leq\alpha_{2}\|\hat{g}_{1}-\hat{g}_{2}\|_{\hat{\mathscr{D}}_{n}},~{}\text{ for any }\hat{g}_{1},\hat{g}_{2}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}.

Thus T~\tilde{T} is continuous. Let {gm}\{g_{m}\} be a bounded sequence in 𝒟^n\mathcal{B}_{\hat{\mathscr{D}}_{n}}. Then from (2.20), for some constant α3>0\alpha_{3}>0 such that T~gm𝒟^nα3\|\tilde{T}g_{m}\|_{\hat{\mathscr{D}}_{n}}\leq\alpha_{3}. Now applying diagonalization arguments, there exist a subsequence of {T~gm}\{\tilde{T}g_{m}\}, ( denoting by the same sequence without loss of generality) and a function ϕ𝒟^n\phi\in\mathcal{B}_{\hat{\mathscr{D}}_{n}} such that T~gmϕ𝒟^n0\|\tilde{T}g_{m}-\phi\|_{\hat{\mathscr{D}}_{n}}\rightarrow 0 as mm\rightarrow\infty. Hence the map T~:𝒟^n𝒟^n\tilde{T}:\mathcal{B}_{\hat{\mathscr{D}}_{n}}\rightarrow\mathcal{B}_{\hat{\mathscr{D}}_{n}} is compact. Therefore it is completely continuous. Let g𝒟^ng\in\mathcal{B}_{\hat{\mathscr{D}}_{n}} such that g(i0)=1g(i_{0})=1 and g(j)=0g(j)=0 for all ji0j\neq i_{0}. Then by (2.20), we have

T~(g)(i0)\displaystyle\tilde{T}(g)(i_{0}) supπ1Π1infπ2Π2Ei0π1,π2[0T1e0tc~(ξs,π1(s),π2(s))𝑑sg(ξt)𝑑t]\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i_{0}}\biggl{[}\int_{0}^{T_{1}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}
g(i0)c~𝒟^nsupπ1Π1infπ2Π2Ei0π1,π2[1ec~𝒟^nT1]\displaystyle\geq\frac{g(i_{0})}{\|\tilde{c}\|_{\hat{\mathscr{D}}_{n}}}\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i_{0}}\bigg{[}1-e^{-\|\tilde{c}\|_{\hat{\mathscr{D}}_{n}}T_{1}}\bigg{]}
=g(i0)1c~𝒟^n+q(i0),\displaystyle=g(i_{0})\frac{1}{\|\tilde{c}\|_{\hat{\mathscr{D}}_{n}}+q^{*}(i_{0})},

where T1T_{1} is the first jump time (clearly, T1τnT_{1}\leq\tau_{n}). Thus NT~(g)gN\tilde{T}(g)\succeq g where N=c~𝒟^n+q(i0)>0N={\|\tilde{c}\|_{\hat{\mathscr{D}}_{n}}+q^{*}(i_{0})}>0. Therefore by Theorem 2.1, there exists a nontrivial ψn𝒟^n+\psi_{n}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}^{+} where ψn0\psi_{n}\neq 0 and a constant λ𝒟^n>0\lambda_{\hat{\mathscr{D}}_{n}}>0 such that T~(ψn)=λ𝒟^nψn\tilde{T}(\psi_{n})=\lambda_{\hat{\mathscr{D}}_{n}}\psi_{n}, i.e.,

ρ~nψn(i)=supμP(A(i))infνP(B(i))[ψn(j)q(j|i,μ,ν)+c~(i,μ,ν)ψn(i)]i𝒟^n,\displaystyle\tilde{\rho}_{n}\psi_{n}(i)=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n},

where ρ~n=[λ𝒟^n]1\tilde{\rho}_{n}=-[\lambda_{\hat{\mathscr{D}}_{n}}]^{-1}. Therefore in terms of cc, we have

ρnψn(i)=supμP(A(i))infνP(B(i))[ψn(j)q(j|i,μ,ν)+c(i,μ,ν)ψn(i)]i𝒟^n,\displaystyle\rho_{n}\psi_{n}(i)=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n},

where ρn=ρ~n+sup𝒟^nc+δ\rho_{n}=\tilde{\rho}_{n}+\displaystyle{\sup_{\hat{\mathscr{D}}_{n}}c}+\delta. Now by Fan’s minimax theorem, see [[8], Theorem 3], we have

ρnψn(i)\displaystyle\rho_{n}\psi_{n}(i) =supμP(A(i))infνP(B(i))[ψn(j)q(j|i,μ,ν)+c(i,μ,ν)ψn(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\biggr{]}
=infνP(B(i))supμP(A(i))[ψn(j)q(j|i,μ,ν)+c(i,μ,ν)ψn(i)]i𝒟^n.\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\biggl{[}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n}.

This proves that (2.18) admits a unique solution. As before by the continuity of c,qc,q and the compactness of A(i)A(i), there exists πn1Π1s\pi^{*1}_{n}\in\Pi^{s}_{1} such that (2.18), can be written as

ρnψn(i)=infνP(B(i))[ψn(j)q(j|i,πn1(i),ν)+c(i,πn1(i),ν)ψn(i)]i𝒟^n.\displaystyle\rho_{n}\psi_{n}(i)=\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j|i,\pi^{*1}_{n}(i),\nu)+c(i,\pi^{*1}_{n}(i),\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n}. (2.21)

Now applying Dynkin’s formula (see [[7], Lemma 3.1]) and using (2.21), we get

ψn(i)\displaystyle\psi_{n}(i) Eiπn1,π2[e0T(c(ξs,πn1(ξs),π2(s))ρn)𝑑sψn(ξT)I{T<τn}]\displaystyle\leq E^{\pi^{*1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}({c}(\xi_{s},\pi^{*1}_{n}(\xi_{s}),\pi^{2}(s))-\rho_{n})ds}\psi_{n}(\xi_{T})I_{\{T<\tau_{n}\}}\biggr{]}
(sup𝒟^nψn)Eiπn1,π2[e0T(c(ξs,πn1(ξs),π2(s))ρn)𝑑s].\displaystyle\leq(\sup_{\hat{\mathscr{D}}_{n}}\psi_{n})E^{\pi^{*1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}({c}(\xi_{s},\pi^{*1}_{n}(\xi_{s}),\pi^{2}(s))-\rho_{n})ds}\biggr{]}. (2.22)

If ψn(i)>0\psi_{n}(i)>0 then by taking logarithm on the both sides in (2.22), dividing by TT and letting TT\rightarrow\infty, we get

ρn\displaystyle\rho_{n} lim supT1TlogEiπn1,π2[e0Tc(ξs,πn1(ξs),π2(s))𝑑s].\displaystyle\leq\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{*1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}{c}(\xi_{s},\pi^{*1}_{n}(\xi_{s}),\pi^{2}(s))ds}\biggr{]}.

Since π2Π2\pi^{2}\in\Pi^{2} is arbitrary, we obtain

ρn\displaystyle\rho_{n} infπ2Π2lim supT1TlogEiπn1,π2[e0Tc(ξs,πn1(ξs),π2(s))𝑑s]\displaystyle\leq\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{*1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}{c}(\xi_{s},\pi^{*1}_{n}(\xi_{s}),\pi^{2}(s))ds}\biggr{]}
supπ1Π1infπ2Π2lim supT1TlogEiπ1,π2[e0Tc(ξs,π1(s),π2(s))𝑑s].\displaystyle\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}c(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}\biggr{]}.

We now show that J(i,c,π1,π2)J(i,c,\pi^{1},\pi^{2}) is finite for every (π1,π2)Π1×Π2(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2} and iSi\in S. We only provide a proof under Assumption 2.2 (b) and the proof under Assumption 2.2 (a) would be analogous. Now from (2.7) we get

sup(a,b)A(i)×B(i)jSV(j)q(j|i,a,b)(C^(i))V(i)iS.\displaystyle\sup_{(a,b)\in A(i)\times B(i)}\sum_{j\in S}V(j)q(j|i,a,b)\leq(C-\hat{\ell}(i))V(i)~{}\forall i\in S. (2.23)

Then by Dynkin formula, we get

Eiπ1,π2[e0Tτn(^(ξt)C)𝑑tV(ξTτn)]V(i)iS.\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T\wedge\tau_{n}}(\hat{\ell}(\xi_{t})-C)dt}V(\xi_{T\wedge\tau_{n}})\bigg{]}\leq V(i)~{}\forall i\in S. (2.24)

By Fatou’s lemma, taking nn\rightarrow\infty in (2.24), we get

Eiπ1,π2[e0T(^(ξt)C)𝑑tV(ξT)]V(i)iS.\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}(\hat{\ell}(\xi_{t})-C)dt}V(\xi_{T})\bigg{]}\leq V(i)~{}\forall i\in S.

Now, since V1V\geq 1, taking logarithm on both sides in the above equation, dividing both sides by TT and letting TT\rightarrow\infty, we obtain

J(i,^,π1,π2)C for alliS.J(i,\hat{\ell},\pi^{1},\pi^{2})\leq C~{}\text{ for all}~{}i\in S.

Since, ^sup(a,b)A(i)×B(i)c(,a,b)\hat{\ell}-\displaystyle\sup_{(a,b)\in A(i)\times B(i)}c(\cdot,a,b) is norm-like, we have sup(a,b)A(i)×B(i)c(i,a,b)^(i)+k1\displaystyle\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)\leq\hat{\ell}(i)+k_{1}\foralliSi\in S for some constant k1k_{1}. Hence we get

J(i,c,π1,π2)C+k1(π1,π2)Π1×Π2,iS.\displaystyle J(i,c,\pi^{1},\pi^{2})\leq C+k_{1}~{}~{}\forall(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2},\forall i\in S. (2.25)

It is clear from (2.19) and (2.25) that ρn\rho_{n} has an upper bound. Next we prove that ρn\rho_{n} is bounded below. By using assumption 2.3 (iii) and (2.18), we have ψn(i0)>0\psi_{n}(i_{0})>0. Thus normalizing ψn\psi_{n}, we have ψn(i0)=1\psi_{n}(i_{0})=1. Also, since c0c\geq 0, by (2.18) we get

ρn\displaystyle\rho_{n} supμP(A(i0))infνP(B(i0))[jSψn(j)q(j|i0,μ,ν)]\displaystyle\geq\sup_{\mu\in P(A(i_{0}))}\inf_{\nu\in P(B(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i_{0},\mu,\nu)\bigg{]}
supμP(A(i0))infνP(B(i0))q(i0|i0,μ,ν).\displaystyle\geq\sup_{\mu\in P(A(i_{0}))}\inf_{\nu\in P(B(i_{0}))}q(i_{0}|i_{0},\mu,\nu).

So, {ρn}\{\rho_{n}\} is bounded below. Now we claim that ρ^:=lim infnρn0\hat{\rho}:=\displaystyle\liminf_{n\rightarrow\infty}\rho_{n}\geq 0. If not, then on contrarary, ρ^<0\hat{\rho}<0. So, along some subsequence, we have (with an abuse of notation, we use the same sequence) ρnρ^\rho_{n}\rightarrow\hat{\rho}, as nn\rightarrow\infty and for large nn, ρn<0\rho_{n}<0. Let πn2\pi_{n}^{*2} be outer minimizing selector of (2.18). Thus, using (2.18), for large enough nn, we have

0>ρnψn(i0)\displaystyle 0>\rho_{n}\psi_{n}(i_{0}) =supμP(A(i0))[jSψn(j)q(j|i0,μ,πn2(i0))+c(i0,μ,πn2(i0))ψn(i0)]\displaystyle=\sup_{\mu\in P(A(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i_{0},\mu,\pi^{*2}_{n}(i_{0}))+c(i_{0},\mu,\pi_{n}^{*2}(i_{0}))\psi_{n}(i_{0})\bigg{]}
supμP(A(i0))[jSψn(j)q(j|i0,μ,πn2(i0))]\displaystyle\geq\sup_{\mu\in P(A(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i_{0},\mu,\pi^{*2}_{n}(i_{0}))\bigg{]}
jSψn(j)q(j|i0,μ,πn2(i0)).\displaystyle\geq\sum_{j\in S}\psi_{n}(j)q(j|i_{0},\mu,\pi_{n}^{*2}(i_{0})).

Now by Assumption 2.3 (iii), from the above equation, we get

ψn(j)q(i0|i0,μ,πn2(i0))q(j|i0,μ,πn2(i0))supμP(A(i0))supνP(B(i0))q(i0|i0,μ,ν)q(j|i0,μ,ν) for ji0.\displaystyle\psi_{n}(j)\leq\frac{-q(i_{0}|i_{0},\mu,\pi_{n}^{*2}(i_{0}))}{q(j|i_{0},\mu,\pi_{n}^{*2}(i_{0}))}\leq\sup_{\mu\in P(A(i_{0}))}\sup_{\nu\in P(B(i_{0}))}\frac{-q(i_{0}|i_{0},\mu,\nu)}{q(j|i_{0},\mu,\nu)}~{}\text{ for }~{}j\neq i_{0}.

So, by diagonalization argument we say, there exist a subsequence (denoting by the same sequence with an abuse of notation) and a function ψ\psi with ψ(i0)=1\psi(i_{0})=1 such that ψn(i)ψ(i)\psi_{n}(i)\rightarrow\psi(i), as nn\rightarrow\infty for all iSi\in S. By our assumption A(i)A(i) is compact for each iSi\in S and πn2\pi_{n}^{*2} is outer minimizing selector of (2.18). Hence we have πn2(i)π2(i)\pi^{*2}_{n}(i)\rightarrow\pi^{*2}(i), for all iSi\in S, as nn\to\infty. Therefore we have

ρnψn(i)\displaystyle\rho_{n}\psi_{n}(i) jSψn(j)q(j|i,μ,πn2(i))+c(i,μ,πn2(i))ψn(i).\displaystyle\geq\sum_{j\in S}\psi_{n}(j)q(j|i,\mu,\pi^{*2}_{n}(i))+c(i,\mu,\pi^{*2}_{n}(i))\psi_{n}(i). (2.26)

So, taking nn\rightarrow\infty in the above equation, we obtain

0>ρ^ψ(i)\displaystyle 0>\hat{\rho}\psi(i) jSψ(j)q(j|i,μ,π2(i))+c(i,μ,π2(i))ψ(i)\displaystyle\geq\sum_{j\in S}\psi(j)q(j|i,\mu,\pi^{*2}(i))+c(i,\mu,\pi^{*2}(i))\psi(i)
jSψ(j)q(j|i,μ,π2(i)).\displaystyle\geq\sum_{j\in S}\psi(j)q(j|i,\mu,\pi^{*2}(i)). (2.27)

Let π1Π1s\pi^{1}\in\Pi^{s}_{1}. Applying Dynkin formula and using (2.27), we obtain

Eiπ1,π2[ψ(ξtτn)]ψ(i)\displaystyle E^{\pi^{1},\pi^{*2}}_{i}[\psi(\xi_{t\wedge\tau_{n}})]-\psi(i)
=Eiπ1,π2[0tτnjSψ(j)q(j|ξs,π1(ξs),π2(ξs))ds]0.\displaystyle=E^{\pi^{1},\pi^{*2}}_{i}\biggl{[}\int_{0}^{t\wedge\tau_{n}}\sum_{j\in S}\psi(j)q(j|\xi_{s},\pi^{1}(\xi_{s}),\pi^{*2}(\xi_{s}))ds\biggr{]}\leq 0.

Now, using dominated convergence theorem, taking nn\rightarrow\infty, we get Eiπ1,π2[ψ(ξt)]ψ(i)E^{\pi^{1},\pi^{*2}}_{i}[\psi(\xi_{t})]\leq\psi(i). So, with respect to the canonical filtration of ξ\xi, {ψ(ξt)}\{\psi(\xi_{t})\} is supermartingale. So, by Doob’s martingale convergence theorem as tt\rightarrow\infty, ψ(ξt)\psi(\xi_{t}) converges. Now by Assumption 2.2, ξ\xi is recurrent. Thus the skeleton process {ξn:n}\{\xi_{n}:n\in\mathbb{N}\} is also recurrent (see for details [[2], Proposition 5.1.1]). This implies, that the process {ξn:n}\{\xi_{n}:n\in\mathbb{N}\} visits every state of SS infinitely often. But this is possible only if ψ1\psi\equiv 1. Since c0c\geq 0, this contradicts (2.27). Thus, lim infnρn0\displaystyle\liminf_{n\rightarrow\infty}\rho_{n}\geq 0. ∎

Lemma 2.3.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then there exists (ρ,ψ)+×LV(\rho,\psi^{*})\in\mathbb{R}_{+}\times L^{\infty}_{V} with ψ>0\psi^{*}>0, such that

ρψ(i)\displaystyle\rho\psi^{*}(i) =supμP(A(i))infνP(B(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi^{*}(i)\bigg{]}
=infνP(B(i))supμP(A(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)],iS.\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi^{*}(i)\bigg{]},~{}i\in S. (2.28)

Also, the solution (ρ,ψ)(\rho,\psi^{*}) has the following characteristic.

  1. (i)

    ρinfiSsupπ1Π1infπ2Π2lim supT1TlnEiπ1,π2[e0Tc(ξt,π1(t),π2(t))𝑑t]\rho\leq\displaystyle\inf_{i\in S}\displaystyle\displaystyle\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\ln E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\biggr{]}.

  2. (ii)

    For any mini-max selector (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi_{1}^{s}\times\Pi_{2}^{s} of (2.28), we have

    ψ(i)\displaystyle\psi^{*}(i) =supπ1Π1Eiπ1,π2[e0τ~()(c(ξt,π1(t),π2(ξt))ρ)𝑑tψ(ξτ~())]\displaystyle=\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}
    =infπ2Π2Eiπ1,π2[e0τ~()(c(ξt,π1(ξt),π2(t))ρ)𝑑tψ(ξτ~())]ic,\displaystyle=\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}\,, (2.29)

    for some finite set 𝒦^\mathscr{B}\supset\hat{\mathscr{K}}.

Proof.

Using Assumption 2.2 and the fact c0c\geq 0, there exists a finite set \mathscr{B} containing 𝒦^\hat{\mathscr{K}} such that the following hold.

  • When Assumption 2.2 (a) holds:

    (sup(a,b)A(i)×B(i)c(i,a,b)ρn)<γ^ic,for allnlarge.\displaystyle(\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)-\rho_{n})<\hat{\gamma}~{}\forall i\in\mathscr{B}^{c},~{}\text{for all}~{}n~{}\text{large}. (2.30)
  • When Assumption 2.2 (b) holds:

    (sup(a,b)A(i)×B(i)c(i,a,b)ρn)<^(i)ic,for allnlarge.\displaystyle(\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)-\rho_{n})<\hat{\ell}(i)~{}\forall i\in\mathscr{B}^{c},~{}\text{for all}~{}n~{}\text{large}. (2.31)

Now we scale ψn\psi_{n} in such a way that it touches VV from below. Define

θ^n=sup{k>0:(Vkψn)>0inS}.\hat{\theta}_{n}=\sup\{k>0:(V-k\psi_{n})>0~{}\text{in}~{}S\}.

Then we see that θ^n\hat{\theta}_{n} is finite as ψn\psi_{n} vanishes in 𝒟^nc\hat{\mathscr{D}}_{n}^{c} and ψn0\psi_{n}\gneq 0. We claim that if we replace ψn\psi_{n} by θ^nψn\hat{\theta}_{n}\psi_{n}, then ψn\psi_{n} touches VV inside \mathscr{B}. If not, then for some state i^c\hat{i}\in\mathscr{B}^{c}, (Vψn)(i^)=0(V-\psi_{n})(\hat{i})=0 and Vψn>0V-\psi_{n}>0 in 𝒟^nc\mathscr{B}\cup\hat{\mathscr{D}}_{n}^{c}. Let πn1Π1s\pi^{*1}_{n}\in\Pi_{1}^{s} be an outer maximizing selector of (2.18). Then by Dynkin formula, we get (under Assumption 2.2 (b))

ψn(i^)\displaystyle\psi_{n}(\hat{i}) Ei^πn1,π2[e0Tτ~()(c(ξs,πn1(ξs),π2(s))ρn)𝑑sψn(ξTτ~())I{Tτ~()<τn}]\displaystyle\leq E^{\pi^{*1}_{n},\pi^{2}}_{\hat{i}}\biggl{[}e^{\int_{0}^{T\wedge\tilde{\tau}(\mathscr{B})}(c(\xi_{s},\pi^{*1}_{n}(\xi_{s}),\pi^{2}(s))-\rho_{n})ds}\psi_{n}(\xi_{T\wedge\tilde{\tau}(\mathscr{B})})I_{\{T\wedge\tilde{\tau}(\mathscr{B})<\tau_{n}\}}\biggr{]}
Ei^πn1,π2[e0Tτ~()^(ξs)𝑑sψn(ξTτ~())I{Tτ~()<τn}].\displaystyle\leq E^{\pi^{*1}_{n},\pi^{2}}_{\hat{i}}\biggl{[}e^{\int_{0}^{T\wedge\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}\psi_{n}(\xi_{T\wedge\tilde{\tau}(\mathscr{B})})I_{\{T\wedge\tilde{\tau}(\mathscr{B})<\tau_{n}\}}\biggr{]}.

Since ψnV\psi_{n}\leq V, in view of Lemma 2.1, by the dominated convergence theorem, taking TT\rightarrow\infty, we get

ψn(i^)Ei^πn1,π2[e0τ~()^(ξs)𝑑sψn(ξτ~())].\displaystyle\psi_{n}(\hat{i})\leq E^{\pi^{*1}_{n},\pi^{2}}_{\hat{i}}\biggl{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}\psi_{n}(\xi_{\tilde{\tau}(\mathscr{B})})\biggr{]}.

Using this and (2.17), we have

0=(Vψn)(i^)Ei^πn1,π2[e0τ~()^(ξs)𝑑s(Vψn)(ξτ~())]>0.\displaystyle 0=(V-\psi_{n})(\hat{i})\geq E^{\pi^{*1}_{n},\pi^{2}}_{\hat{i}}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}(V-\psi_{n})(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}>0.

Hence we arrive at a contradiction. Thus ψn\psi_{n} touches VV inside \mathscr{B}. Similar conclusion holds under Assumption 2.2 (a). Now, since ψnV\psi_{n}\leq V for all large n, by diagonalization argument, there exists a subsequence (by an abuse of notation, we use the same sequence) such that, ψnψ\psi_{n}\rightarrow\psi^{*} for all iSi\in S, as nn\rightarrow\infty, and ψV\psi^{*}\leq V. Also, since by Lemma 2.2, the sequence {ρn}\{\rho_{n}\} is bounded and lim infnρn0\displaystyle\liminf_{n\rightarrow\infty}\rho_{n}\geq 0, we can find a subsequence (by an abuse of notation we use the same sequence) and some ρ0\rho\geq 0 such that ρnρ\rho_{n}\rightarrow\rho as nn\rightarrow\infty. Thus as before, there exists a mini-max selector (πn1,πn2)Π1s×Π2s(\pi^{*1}_{n},\pi^{*2}_{n})\in\Pi_{1}^{s}\times\Pi_{2}^{s} of (2.18), i.e.,

infνP(B(i))[jSψn(j)q(j|i,πn1(i),ν)+c(i,πn1(i),ν)ψn(i)]\displaystyle\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\pi^{*1}_{n}(i),\nu)+c(i,\pi^{*1}_{n}(i),\nu)\psi_{n}(i)\bigg{]}
=supμP(A(i))infνP(B(i))[jSψn(j)q(j|i,μ,ν)+c(i,μ,ν)ψn(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}
=infνP(B(i))supμP(A(i))[jSψn(j)q(j|i,μ,ν)+c(i,μ,ν)ψn(i)]\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}
=supμP(A(i))[jSψn(j)q(j|i,μ,πn2(i))+c(i,μ,πn2(i))ψn(i)].\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\mu,\pi^{*2}_{n}(i))+c(i,\mu,\pi^{*2}_{n}(i))\psi_{n}(i)\bigg{]}. (2.32)

Hence,

ρnψn(i)[jSψn(j)q(j|i,πn1(i),ν)+c(i,πn1(i),ν)ψn(i)].\displaystyle\rho_{n}\psi_{n}(i)\leq\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\pi^{*1}_{n}(i),\nu)+c(i,\pi^{*1}_{n}(i),\nu)\psi_{n}(i)\bigg{]}.

The above implies

ρnψn(i)ψn(i)q(i|i,πn1(i),ν)[jiψn(j)q(j|i,πn1(i),ν)+c(i,πn1(i),ν)ψn(i)].\displaystyle\rho_{n}\psi_{n}(i)-\psi_{n}(i)q(i|i,\pi^{*1}_{n}(i),\nu)\leq\bigg{[}\sum_{j\neq i}\psi_{n}(j)q(j|i,\pi^{*1}_{n}(i),\nu)+c(i,\pi^{*1}_{n}(i),\nu)\psi_{n}(i)\bigg{]}. (2.33)

Now, since ψn(i)V(i)\psi_{n}(i)\leq V(i) for all iSi\in S, we have

jiψ(j)q(j|i,πn1(i),ν)jiV(j)q(j|i,πn1(i),ν).\displaystyle\sum_{j\neq i}\psi(j)q(j|i,\pi^{*1}_{n}(i),\nu)\leq\sum_{j\neq i}V(j)q(j|i,\pi^{*1}_{n}(i),\nu). (2.34)

Also, since Π1s\Pi_{1}^{s} and Π2s\Pi_{2}^{s} are compact there exist π1Π1s\pi^{*1}\in\Pi_{1}^{s} and π2Π2s\pi^{*2}\in\Pi_{2}^{s} such that πn1π1\pi^{*1}_{n}\rightarrow\pi^{*1} and πn2π2\pi^{*2}_{n}\rightarrow\pi^{*2} as nn\rightarrow\infty. Under given assumptions, from [[13], Lemma 7.2] it is clear that the functions c(i,μ,ν)c(i,\mu,\nu), and jSq(j|i,μ,ν)u(j)\displaystyle\sum_{j\in S}q(j|i,\mu,\nu)u(j) are continuous at (μ,ν)(\mu,\nu) on P(A(i))×P(B(i))P(A(i))\times P(B(i)) for each fixed uLVu\in L^{\infty}_{V}, iSi\in S. Therefore by the dominated convergence theorem, letting nn\rightarrow\infty in (2.33), we obtain

ρψ(i)jSψ(j)q(j|i,π1(i),ν)+c(i,π1(i),ν)ψ(i).\displaystyle\rho\psi^{*}(i)\leq\sum_{j\in S}\psi^{*}(j)q(j|i,\pi^{*1}(i),\nu)+c(i,\pi^{*1}(i),\nu)\psi^{*}(i).

Hence we have

ρψ(i)\displaystyle\rho\psi^{*}(i) infνP(B(i))[jSψ(j)q(j|i,π1(i),ν)+c(i,π1(i),ν)ψ(i)].\displaystyle\leq\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\pi^{*1}(i),\nu)+c(i,\pi^{*1}(i),\nu)\psi^{*}(i)\biggr{]}.
supμP(A(i))infνP(B(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)].\displaystyle\leq\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi^{*}(i)\biggr{]}. (2.35)

By similar arguments using (2.32) and extended Fatou’s lemma [[20], Lemma 8.3.7], we get

ρψ(i)\displaystyle\rho\psi^{*}(i) supμP(A(i))[jSψ(j)q(j|i,μ,π2(i))+c(i,μ,π2(i))ψ(i)]\displaystyle\geq\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\pi^{*2}(i))+c(i,\mu,\pi^{*2}(i))\psi^{*}(i)\biggr{]}
infνP(B(i))supμP(A(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)].\displaystyle\geq\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi^{*}(i)\biggr{]}. (2.36)

Hence by (2.35) and (2.36), we get (2.28). Since at some point in \mathscr{B} we have (Vψn)=0(V-\psi_{n})=0, for all large nn. It follows that (Vψ)(i)=0(V-\psi^{*})(i^{*})=0 for some ii^{*}\in\mathscr{B}. Since V1V\geq 1, it is clear that ψ\psi^{*} is nontrivial. Now we claim that ψ>0\psi^{*}>0. If not, then we must have ψ(i~)=0\psi^{*}(\tilde{i})=0 for some i~S\tilde{i}\in S. Again as before, there exits a pair of a mini-max selector (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi_{1}^{s}\times\Pi_{2}^{s} such that form (2.28), we have

ρψ(i~)=[jSψ(j)q(j|i~,π1(i~),π2(i~))+c(i~,π1(i~),π2(i~))ψ(i~)].\displaystyle\rho\psi^{*}(\tilde{i})=\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|\tilde{i},\pi^{*1}(\tilde{i}),\pi^{*2}(\tilde{i}))+c(\tilde{i},\pi^{*1}(\tilde{i}),\pi^{*2}(\tilde{i}))\psi^{*}(\tilde{i})\biggr{]}. (2.37)

This implies

ji~ψ(j)q(j|i~,π1(i~),π2(i~))=0.\sum_{j\neq\tilde{i}}\psi^{*}(j)q(j|\tilde{i},\pi^{*1}(\tilde{i}),\pi^{*2}(\tilde{i}))=0.

Since the Markov chain ξ\xi is irreducible under (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}, from the above equation, it follows that ψ0\psi^{*}\equiv 0. So, we arrive at a contradiction. This proves the claim. Now we prove (i) and (ii).

(i) Since ψ>0\psi^{*}>0 and ψn(i)ψ(i)\psi_{n}(i)\rightarrow\psi^{*}(i) as nn\rightarrow\infty, we have ψn>0\psi_{n}>0 for all large enough nn. So, using (2.19), we have limnρnsupπ1Π1infπ2Π2J(i,c,π1,π2)\displaystyle\lim_{n\rightarrow\infty}\rho_{n}\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{1},\pi^{2}) for all iSi\in S.

(ii) By measurable selection theorem in [[27], Theorem 2.2], there exists a pair of strategies (a mini-max selector) (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi_{1}^{s}\times\Pi_{2}^{s} (as in (2.32)) satisfying

ρψ(i)\displaystyle\rho\psi^{*}(i) =supμP(A(i))[jSψ(j)q(j|i,μ,π2(i))+c(i,μ,π2(i))ψ(i)]\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\pi^{*2}(i))+c(i,\mu,\pi^{*2}(i))\psi^{*}(i)\biggr{]}
=infνP(B(i))[jSψ(j)q(j|i,π1(i),ν)+c(i,π1(i),ν)ψ(i)].\displaystyle=\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\pi^{*1}(i),\nu)+c(i,\pi^{*1}(i),\nu)\psi^{*}(i)\biggr{]}. (2.38)

Using (2.38), Lemma 2.1, and Dynkin’s formula, we have

ψ(i)Eiπ1,π2[e0τ~()T(c(ξt,π1(t),π2(ξt))ρ)𝑑tψ(ξτ~()T)]ic.\displaystyle\psi^{*}(i)\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T}(c(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})\wedge T})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

By Fatou’s lemma taking TT\rightarrow\infty, we get

ψ(i)Eiπ1,π2[e0τ~()(c(ξt,π1(t),π2(ξt))ρ)𝑑tψ(ξτ~())],ic.\displaystyle\psi^{*}(i)\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}. (2.39)

Hence,

ψ(i)\displaystyle\psi^{*}(i) supπ1Π1Eiπ1,π2[e0τ~()(c(ξt,π1(t),π2(ξt))ρ)𝑑tψ(ξτ~())]\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}
infπ2Π2supπ1Π1Eiπ1,π2[e0τ~()(c(ξt,π1(t),π2(t))ρ)𝑑tψ(ξτ~())],ic.\displaystyle\geq\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}. (2.40)

Also, using (2.38), Lemma 2.1, and Dynkin’s formula, we obtain

ψ(i)Eiπ1,π2[e0τ~()T(c(ξt,π1(ξt),π2(t))ρ)𝑑tψ(ξτ~()T)]ic.\displaystyle\psi^{*}(i)\leq E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})\wedge T})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

Since ψV\psi^{*}\leq V, using the estimates as in Lemma 2.1, taking TT\rightarrow\infty, by dominated convergent theorem it follows that

ψ(i)\displaystyle\psi^{*}(i) Eiπ1,π2[e0τ~()(c(ξt,π1(ξt),π2(t))ρ)𝑑tψ(ξτ~())]ic.\displaystyle\leq E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}. (2.41)

Hence

ψ(i)\displaystyle\psi^{*}(i) infπ2Π2Eiπ1,π2[e0τ~()(c(ξt,π1(ξt),π2(t))ρ)𝑑tψ(ξτ~())]\displaystyle\leq\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}
supπ1Π1infπ2Π2Eiπ1,π2[e0τ~()(c(ξt,π1(t),π2(t))ρ)𝑑tψ(ξτ~())],ic.\displaystyle\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}. (2.42)

From (2.40) and (2.42), we get (2.29). ∎

3. Existence of risk-sensitive average optimal strategies

In this section we prove that any mini-max selector of the associated HJI equation is a saddle point equilibrium. Also, exploiting the stochastic representation (2.29) we completely characterize all possible saddle point equilibrium in the space of stationary Markov strategies.

Theorem 3.1.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then for any mini-max selector (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2} of (2.28), i.e., for any pair (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2} satisfying

infνP(B(i))[jSψ(j)q(j|i,π1(i),ν)+c(i,π1(i),ν)ψ(i)]\displaystyle\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\pi^{*1}(i),\nu)+c(i,\pi^{*1}(i),\nu)\psi^{*}(i)\bigg{]}
=supμP(A(i))infνP(B(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)]\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi^{*}(i)\bigg{]}
=infνP(B(i))supμP(A(i))[jSψ(j)q(j|i,μ,ν)+c(i,μ,ν)ψ(i)]\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi^{*}(i)\bigg{]}
=supμP(A(i))[jSψ(j)q(j|i,μ,π2(i))+c(i,μ,π2(i))ψ(i)],iS,\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\mu,\pi^{*2}(i))+c(i,\mu,\pi^{*2}(i))\psi^{*}(i)\bigg{]},~{}i\in S, (3.1)

we have

ρ\displaystyle\rho =infiSsupπ1Π1lim supT1TlogEiπ1,π2[e0Tc(ξt,π1(t),π2(ξt))𝑑t]\displaystyle=\inf_{i\in S}\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))dt}\bigg{]}
=infiSsupπ1Π1infπ2Π2lim supT1TlogEiπ1,π2[e0Tc(ξt,π1(t),π2(t))𝑑t]\displaystyle=\inf_{i\in S}\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\bigg{]}
=infiSinfπ2Π2supπ1Π1lim supT1TlogEiπ1,π2[e0Tc(ξt,π1(t),π2(t))𝑑t].\displaystyle=\inf_{i\in S}\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\bigg{]}. (3.2)
Proof.

We perturb the cost function as follows.

  • If Assumption 2.2 (a) holds: We define for (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i), iSi\in S, c^n(i,a,b)=c(i,a,b)I𝒟^n(i)+(c+α3)I𝒟^nc\hat{c}_{n}(i,a,b)=c(i,a,b)I_{\hat{\mathscr{D}}_{n}}(i)+(\|c\|_{\infty}+\alpha_{3})I_{\hat{\mathscr{D}}_{n}^{c}}. Here α3>0\alpha_{3}>0, is a small number satisfying c+α3<γ^\|c\|_{\infty}+\alpha_{3}<\hat{\gamma}. Note that c^n<γ^\|\hat{c}_{n}\|_{\infty}<\hat{\gamma}.

  • If Assumption 2.2 (b) holds: We define for (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i), iSi\in S, c^n(i,a,b)=c(i,a,b)+1n[^(i)sup(a,b)A(i)×B(i)c(i,a,b)]+\hat{c}_{n}(i,a,b)=c(i,a,b)+\frac{1}{n}[\hat{\ell}(i)-\displaystyle\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)]_{+}. Note that the function [^()sup(a,b)A()×B()c(,a,b)]+[\hat{\ell}(\cdot)-\displaystyle\sup_{(a,b)\in A(\cdot)\times B(\cdot)}c(\cdot,a,b)]_{+} is norm-like function. Also, it is easy to see that for large enough nn, ^()sup(a,b)A()×B()c^n(,a,b)\hat{\ell}(\cdot)-\displaystyle\sup_{(a,b)\in A(\cdot)\times B(\cdot)}\hat{c}_{n}(\cdot,a,b) is norm-like.

In view of Lemma 2.3, it is clear that for π2Π2s\pi^{*2}\in\Pi^{s}_{2}, there exists (ψ~n,ρ~n)LV×+(\tilde{\psi}_{n},\tilde{\rho}_{n})\in L^{\infty}_{V}\times\mathbb{R}_{+}, ψ~n>0\tilde{\psi}_{n}>0 satisfying

ρ~nψ~n(i)=supμP(A(i))[jSψ~n(j)q(j|i,μ,π2(i))+c^n(i,μ,π2(i))ψ~n(i)]\displaystyle{\tilde{\rho}_{n}}\tilde{\psi}_{n}(i)=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\tilde{\psi}_{n}(j)q(j|i,\mu,\pi^{*2}(i))+{\hat{c}_{n}(i,\mu,\pi^{*2}(i))}\tilde{\psi}_{n}(i)\bigg{]} (3.3)

such that

0ρ~nsupπ1Π1lim supT1TlogEiπ1,π2[e0Tc^n(ξt,π1(t),π2(ξt))𝑑t].\displaystyle 0\leq\tilde{\rho}_{n}\leq\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))dt}\bigg{]}. (3.4)

Also, for some finite set 1𝒦\mathscr{B}_{1}\supset\mathscr{B}\supset\mathscr{K} , we have

ψ~n(i)=supπ1Π1Eiπ1,π2[e0τ~(1)(c^n(ξt,π1(t),π2(ξt))ρ~n)𝑑tψ~n(ξτ~(1))],i1c.\displaystyle\tilde{\psi}_{n}(i)=\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{1}})}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{\tilde{\tau}(\mathscr{B}_{1})})\bigg{]},~{}i\in\mathscr{B}_{1}^{c}. (3.5)

Now from the proof of Lemma 2.3, we have a finite set ~\tilde{\mathscr{B}}, depending on nn, containing 𝒦^\hat{\mathscr{K}} such that the following cases happen:

  • Under Assumption 2.2 (a): From (3.4), we have ρ~nc^n\tilde{\rho}_{n}\leq\|\hat{c}_{n}\|_{\infty}. Thus, for i𝒟^nci\in{\hat{\mathscr{D}}_{n}}^{c}, it follows that c^n(i,a,b)ρ~n0\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n}\geq 0 for all (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i). Consequently, we may take ~=𝒟^n\tilde{\mathscr{B}}=\hat{\mathscr{D}}_{n} such that c^n(i,a,b)ρ~n0\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n}\geq 0 in ~c\tilde{\mathscr{B}}^{c} for all (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i).

  • Under Assumption 2.2 (b): since c^n\hat{c}_{n} is norm-like function, we can choose suitable finite set ~\tilde{\mathscr{B}} such that (c^n(i,a,b)ρ~n)0(\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n})\geq 0 in ~c\tilde{\mathscr{B}}^{c} for all (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i).

For any π1Π1\pi^{1}\in\Pi^{1}, applying Dynkin formula and using (3.3) and Lemma 2.1, we get

ψ~n(i)Eiπ1,π2[e0τ~(~)T(c^n(ξt,π1(t),π2(ξt))ρ~n)𝑑tψ~n(ξτ~(~)T)].\displaystyle\tilde{\psi}_{n}(i)\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\tilde{\mathscr{B}})\wedge T}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{\tilde{\tau}(\tilde{\mathscr{B}})\wedge T})\bigg{]}.

Since for i~ci\in\tilde{\mathscr{B}}^{c}, c^n(i,a,b)ρ~n0\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n}\geq 0, by Fatou’s lemma taking TT\rightarrow\infty, we get

ψ~n(i)\displaystyle\tilde{\psi}_{n}(i) Eiπ1,π2[e0τ~(~)(c^n(ξt,π1(t),π2(ξt))ρ~n)𝑑tψ~n(ξτ~(~))]\displaystyle\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\tilde{\mathscr{B}})}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{\tilde{\tau}(\tilde{\mathscr{B}})})\bigg{]}
(min~ψ~n)i~c.\displaystyle\geq(\min_{\tilde{\mathscr{B}}}\tilde{\psi}_{n})~{}\forall~{}i\in\tilde{\mathscr{B}}^{c}.

This implies that, ψ~n\tilde{\psi}_{n} has a lower bound. Now, applying Dynkin formula, and using (3.3) and Lemma 2.1, we deduce that

ψ~n(i)\displaystyle\tilde{\psi}_{n}(i) Eiπ1,π2[e0TτN(c^n(ξt,π1(t),π2(ξt))ρ~n)𝑑tψ~n(ξTτN)],\displaystyle\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T\wedge\tau_{N}}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{T\wedge\tau_{N}})\bigg{]},

for any iSi\in S, where τN:=inf{t0:ξt{1,2,,N}}\tau_{N}:=\inf\{t\geq 0:\xi_{t}\notin\{1,2,\cdots,N\}\}, NN\in\mathbb{N}. By Fatou’s lemma taking NN\rightarrow\infty, we get

ψ~n(i)\displaystyle\tilde{\psi}_{n}(i) Eiπ1,π2[e0T(c^n(ξt,π1(t),π2(ξt))ρ~n)𝑑tψ~n(ξT)]\displaystyle\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{T})\bigg{]}
min~ψ~nEiπ1,π2[e0T(c^n(ξt,π1(t),π2(ξt))ρ~n)𝑑t].\displaystyle\geq\min_{\tilde{\mathscr{B}}}\tilde{\psi}_{n}E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\bigg{]}.

Thus, taking logarithm both sides, dividing by TT and letting TT\to\infty, we obtain

ρ~nlim supT1TlogEiπ1,π2[e0Tc^n(ξt,π1(t),π2(ξt))𝑑t].\displaystyle\tilde{\rho}_{n}\geq\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))dt}\bigg{]}.

Since π1Π1\pi^{1}\in\Pi^{1} arbitrary, it follows that

ρ~n\displaystyle\tilde{\rho}_{n} supπ1Π1lim supT1TlogEiπ1,π2[e0Tc^n(ξt,π1(t),π2(ξt))𝑑t]\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))dt}\bigg{]}
supπ1Π1lim supT1TlogEiπ1,π2[e0Tc(ξt,π1(t),π2(ξt))𝑑t].\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}{c}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))dt}\bigg{]}.

Using this and (3.4), we get supπ1Π1J(i,c,π1,π2)supπ1Π1J(i,c^n,π1,π2)=ρ~n\displaystyle\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{*2})\leq\sup_{\pi^{1}\in\Pi^{1}}J(i,\hat{c}_{n},\pi^{1},\pi^{*2})=\tilde{\rho}_{n} for all nn. From the definition of c^n\hat{c}_{n}, it is easy to see that ρ~n\tilde{\rho}_{n} is a decreasing sequence which has a lower bound. Now by similar arguments as in Lemma 2.3, it follows that there exists a pair (ρ~,ψ~)(\tilde{\rho},\tilde{\psi}) such that ρ~nρ~\tilde{\rho}_{n}\rightarrow\tilde{\rho} and ψ~nψ~\tilde{\psi}_{n}\rightarrow\tilde{\psi} as nn\rightarrow\infty. As in Lemma 2.3, by taking nn\rightarrow\infty in (3.3), we get

ρ~ψ~(i)=supμP(A(i))[jSψ~(j)q(j|i,μ,π2(i))+c(i,μ,π2(i))ψ~(i)].\displaystyle{\tilde{\rho}}\tilde{\psi}(i)=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\tilde{\psi}(j)q(j|i,\mu,\pi^{*2}(i))+{{c}(i,\mu,\pi^{*2}(i))}\tilde{\psi}(i)\bigg{]}. (3.6)

Also, we have ρ~supπ1Π1J(i,c,π1,π2)ρ\displaystyle\tilde{\rho}\geq\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{*2})\geq\rho. Now, we want to show that ρ~=ρ\tilde{\rho}=\rho. Let π~1\tilde{\pi}^{*1} be a selector in (3.6). Thus

ρ~ψ~(i)=[jSψ~(j)q(j|i,π~1(i),π2(i))+c(i,π~1(i),π2(i))ψ~(i)].\displaystyle{\tilde{\rho}}\tilde{\psi}(i)=\bigg{[}\sum_{j\in S}\tilde{\psi}(j)q(j|i,\tilde{\pi}^{*1}(i),\pi^{*2}(i))+{{c}(i,\tilde{\pi}^{*1}(i),\pi^{*2}(i))}\tilde{\psi}(i)\bigg{]}. (3.7)

In view of estimates in Lemma 2.1, applying Dynkin’s formula and the dominated convergence theorem, from (3.7) we deduce that there exists a finite set 21\mathscr{B}_{2}\supset\mathscr{B}_{1} such that

ψ~(i)Eiπ~1,π2[e0τ~(2)(c(ξt,π~1(ξt),π2(ξt))ρ~)𝑑tψ~(ξτ~(2))],i2c.\displaystyle\tilde{\psi}(i)\leq E^{\tilde{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{2}})}({c}(\xi_{t},\tilde{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\tilde{\rho})dt}\tilde{\psi}(\xi_{\tilde{\tau}(\mathscr{B}_{2})})\bigg{]},~{}\forall i\in\mathscr{B}_{2}^{c}. (3.8)

Since ρ~ρ\tilde{\rho}\geq\rho, arguing as in Lemma 2.3 (see, (2.29)) for 2\mathscr{B}_{2}\supset\mathscr{B} we have

ψ(i)Eiπ~1,π2[e0τ~(2)(c(ξt,π~1(ξt),π2(ξt))ρ~)𝑑tψ(ξτ~(2))]i2c.\displaystyle\psi^{*}(i)\geq E^{\tilde{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{2}})}({c}(\xi_{t},\tilde{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\tilde{\rho})dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B}_{2})})\bigg{]}~{}\forall i\in\mathscr{B}_{2}^{c}. (3.9)

Now we choose an appropriate constant κ\kappa (e.g., κ=min2ψψ~\displaystyle\kappa=\min_{\mathscr{B}_{2}}\frac{\psi^{*}}{\tilde{\psi}}), so that (ψκψ~)0(\psi^{*}-\kappa\tilde{\psi})\geq 0 in 2\mathscr{B}_{2} and for some i^02,\hat{i}_{0}\in\mathscr{B}_{2},(ψκψ~)(i^0)=0(\psi^{*}-\kappa\tilde{\psi})(\hat{i}_{0})=0. From (3.8) and (3.9), we get

ψ(i)κψ~(i)Eiπ~1,π2[e0τ~(2)(c(ξt,π~1(ξt),π2(ξt))ρ~)𝑑t(ψκψ~)(ξτ~(2))]i2c.\displaystyle\psi^{*}(i)-\kappa\tilde{\psi}(i)\geq E^{\tilde{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{2}})}({c}(\xi_{t},\tilde{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\tilde{\rho})dt}(\psi^{*}-\kappa\tilde{\psi})(\xi_{\tilde{\tau}(\mathscr{B}_{2})})\bigg{]}~{}\forall i\in\mathscr{B}_{2}^{c}. (3.10)

From the above expression it is easy to see that (ψκψ~)0(\psi^{*}-\kappa\tilde{\psi})\geq 0 in SS. Now using (3.1), (3.7) and the fact that ρ~ρ\tilde{\rho}\geq\rho, we get

ρ~(ψ\displaystyle\tilde{\rho}(\psi^{*} κψ~)(i^0)\displaystyle-\kappa\tilde{\psi})(\hat{i}_{0})
[jS(ψκψ~)(j)q(j|i^0,π~1(i^0),π2(i^0))+c(i^0,π~1(i^0),π2(i^0))(ψκψ~)(i^0)].\displaystyle\geq\biggl{[}\sum_{j\in S}(\psi^{*}-\kappa\tilde{\psi})(j)q(j|\hat{i}_{0},\tilde{\pi}^{*1}(\hat{i}_{0}),\pi^{*2}(\hat{i}_{0}))+c(\hat{i}_{0},\tilde{\pi}^{*1}(\hat{i}_{0}),\pi^{*2}(\hat{i}_{0}))(\psi^{*}-\kappa\tilde{\psi})(\hat{i}_{0})\biggr{]}.

This implies that

ji^0(ψκψ~)(j)q(j|i^0,π~1(i^0),π2(i^0))=0.\sum_{j\neq\hat{i}_{0}}(\psi^{*}-\kappa\tilde{\psi})(j)q(j|{\hat{i}_{0}},\tilde{\pi}^{*1}(\hat{i}_{0}),\pi^{*2}(\hat{i}_{0}))=0\,. (3.11)

Since the Markov chain ξ\xi is irreducible under (π~1,π2)(\tilde{\pi}^{*1},\pi^{*2}), by (3.11), we have ψ=κψ~\psi^{*}=\kappa\tilde{\psi} in SS. From (3.1) and (3.6) it follows that ρ~=ρ\tilde{\rho}=\rho. This proves (3.2). ∎

In the next theorem we show that any mini-max selector of (2.28) is a saddle point equilibrium.

Theorem 3.2.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then any mini-max selector (π1,π2)Π1s×Π2s(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2} of (2.28) is a saddle point equilibrium.

Proof.

Arguing as in Lemma 2.3 and Theorem 3.1, there exists (ρπ1,π2,ψπ1,π2)+×LV(\rho^{\pi^{*1},\pi^{*2}},\psi^{\pi^{*1},\pi^{*2}})\in\mathbb{R}_{+}\times L^{\infty}_{V} with ψπ1,π2>0\psi^{\pi^{*1},\pi^{*2}}>0 satisfying

ρπ1,π2ψπ1,π2(i)=[jSψπ1,π2(j)q(j|i,π1(i),π2(i))+c(i,π1(i),π2(i))ψπ1,π2(i)].\displaystyle\rho^{\pi^{*1},\pi^{*2}}\psi^{\pi^{*1},\pi^{*2}}(i)=\bigg{[}\sum_{j\in S}\psi^{\pi^{*1},\pi^{*2}}(j)q(j|i,\pi^{*1}(i),\pi^{*2}(i))+c(i,\pi^{*1}(i),\pi^{*2}(i))\psi^{\pi^{*1},\pi^{*2}}(i)\bigg{]}. (3.12)

Furthermore ρπ1,π2=J(i,c,π1,π2)\rho^{\pi^{*1},\pi^{*2}}=J(i,c,\pi^{*1},\pi^{*2}) and for some finite set 𝒦^\mathscr{B}\supset\hat{\mathscr{K}} (without loss of generality denoting by the same notation)

ψπ1,π2(i)=Eiπ1,π2[e0τ~()(c(ξt,π1(ξt),π2(ξt))ρπ1,π2)𝑑tψπ1,π2(ξτ~())]ic.\displaystyle\psi^{\pi^{*1},\pi^{*2}}(i)=E^{\pi^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\rho^{\pi^{*1},\pi^{*2}})dt}\psi^{\pi^{*1},\pi^{*2}}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}. (3.13)

Thus, from (3.2) it is clear that ρπ1,π2ρ\rho^{\pi^{*1},\pi^{*2}}\leq\rho . Now, following similar arguments as in Theorem 3.1 it is easy to see that ρπ1,π2=ρ\rho^{\pi^{*1},\pi^{*2}}=\rho . This implies that J(i,c,π1,π2)ρπ1,π2J(i,c,\pi^{1},\pi^{*2})\leq\rho^{\pi^{*1},\pi^{*2}} for all π1Π1\pi^{1}\in\Pi^{1} . Next, from [7] it is clear that if we consider the minimization problem minπ2Π2J(i,c,π1,π2)\displaystyle\min_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{*1},\pi^{2}), then the optimal control exists in the space of stationary Markov strategies. Thus to complete the proof, it is enough to show that J(i,c,π1,π2)J(i,c,π1,π2)J(i,c,\pi^{*1},\pi^{*2})\leq J(i,c,\pi^{*1},\pi^{2}) for any π2Π2s\pi^{2}\in\Pi^{s}_{2} . If not suppose that J(i,c,π1,π2)>J(i,c,π1,π2)J(i,c,\pi^{*1},\pi^{*2})>J(i,c,\pi^{*1},\pi^{2}) for some π2Π2s\pi^{2}\in\Pi^{s}_{2} . We know that for π2Π2s\pi^{2}\in\Pi^{s}_{2}, there exists (ρπ1,π2,ψπ1,π2)+×LV(\rho^{\pi^{*1},\pi^{2}},\psi^{\pi^{*1},\pi^{2}})\in\mathbb{R}_{+}\times L^{\infty}_{V} with ψπ1,π2>0\psi^{\pi^{*1},\pi^{2}}>0 satisfying

ρπ1,π2ψπ1,π2(i)=[jSψπ1,π2(j)q(j|i,π1(i),π2(i))+c(i,π1(i),π2(i))ψπ1,π2(i)],\displaystyle\rho^{\pi^{*1},\pi^{2}}\psi^{\pi^{*1},\pi^{2}}(i)=\bigg{[}\sum_{j\in S}\psi^{\pi^{*1},\pi^{2}}(j)q(j|i,\pi^{*1}(i),\pi^{2}(i))+c(i,\pi^{*1}(i),\pi^{2}(i))\psi^{\pi^{*1},\pi^{2}}(i)\bigg{]}, (3.14)

also we have ρπ1,π2=J(i,c,π1,π2)\rho^{\pi^{*1},\pi^{2}}=J(i,c,\pi^{*1},\pi^{2}) and for some finite set \mathscr{B} (𝒦^\supset\hat{\mathscr{K}})

ψπ1,π2(i)=Eiπ1,π2[e0τ~()(c(ξt,π1(ξt),π2(ξt))ρπ1,π2)𝑑tψπ1,π2(ξτ~())]ic.\displaystyle\psi^{\pi^{*1},\pi^{2}}(i)=E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(\xi_{t}))-\rho^{\pi^{*1},\pi^{2}})dt}\psi^{\pi^{*1},\pi^{2}}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}. (3.15)

From (2.29), we deduce that

ψ(i)Eiπ1,π2[e0τ~()(c(ξt,π1(ξt),π2(ξt))ρ)𝑑tψ(ξτ~())],ic.\displaystyle\psi^{*}(i)\leq E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}. (3.16)

Now, as in Theorem 3.1, using (3.15) and (3.16) one can deduce that ψπ1,π2=ηψ\psi^{\pi^{*1},\pi^{2}}=\eta\psi^{*} for some positive constant η\eta . In view (2.28) and (3.14), it follows that ρρπ1,π2\rho\leq\rho^{\pi^{*1},\pi^{2}}, i.e., J(i,c,π1,π2)J(i,c,π1,π2)J(i,c,\pi^{*1},\pi^{*2})\leq J(i,c,\pi^{*1},\pi^{2}), which is a contradiction. This completes the proof. ∎

Next we prove the converse of the above theorem.

Theorem 3.3.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. If there exists a saddle point equilibrium (π^1,π^2)Π1s×Π2s(\hat{\pi}^{*1},\hat{\pi}^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2} , i.e.,

J(i,c,π1,π^2)J(i,c,π^1,π^2)J(i,c,π^1,π2),\displaystyle J(i,c,\pi^{1},\hat{\pi}^{*2})\leq J(i,c,\hat{\pi}^{*1},\hat{\pi}^{*2})\leq J(i,c,\hat{\pi}^{*1},\pi^{2})\,,

for all iSi\in S , π1Π1\pi^{1}\in\Pi^{1} and π2Π2\pi^{2}\in\Pi^{2} . Then (π^1,π^2)(\hat{\pi}^{*1},\hat{\pi}^{*2}) is a mini-max selector of (2.28).

Proof.

From Theorem 3.2, we deduce that

ρ=infπ2Π2supπ1Π1J(i,c,π1,π2)\displaystyle\rho=\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{2}) supπ1Π1J(i,c,π1,π^2)J(i,c,π^1,π^2)\displaystyle\leq\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\hat{\pi}^{*2})\leq J(i,c,\hat{\pi}^{*1},\hat{\pi}^{*2})
infπ2Π2J(i,c,π^1,π2)supπ1Π1infπ2Π2J(i,c,π1,π2)=ρ.\displaystyle\leq\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\hat{\pi}^{*1},\pi^{2})\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{1},\pi^{2})=\rho\,.

This implies that ρ=J(i,c,π^1,π^2)\rho=J(i,c,\hat{\pi}^{*1},\hat{\pi}^{*2})  and ρ=infπ2Π2J(i,c,π^1,π2)\displaystyle\rho=\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\hat{\pi}^{*1},\pi^{2}) . Arguing as in Lemma 2.3 and using Theorem 3.1, it follows that for π^1Π1s\hat{\pi}^{*1}\in\Pi_{1}^{s} there exists (ρπ^1,ψπ^1)+×LV(\rho_{\hat{\pi}^{*1}},\psi_{\hat{\pi}^{*1}}^{*})\in\mathbb{R}_{+}\times L^{\infty}_{V} with ψπ^1>0\psi_{\hat{\pi}^{*1}}^{*}>0 such that

ρπ^1ψπ^1(i)=infνP(B(i))[jSψπ^1(j)q(j|i,π^1(i),ν)+c(i,π^1(i),ν)ψπ^1(i)],\displaystyle\rho_{\hat{\pi}^{*1}}\psi_{\hat{\pi}^{*1}}^{*}(i)=\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{\hat{\pi}^{*1}}^{*}(j)q(j|i,\hat{\pi}^{*1}(i),\nu)+c(i,\hat{\pi}^{*1}(i),\nu)\psi_{\hat{\pi}^{*1}}^{*}(i)\bigg{]}, (3.17)

and ρπ^1=ρ\rho_{\hat{\pi}^{*1}}=\rho (since ρ=infπ2Π2J(i,c,π^1,π2)\displaystyle\rho=\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\hat{\pi}^{*1},\pi^{2})). Let (π1,π2)({\pi}^{*1},{\pi}^{*2}) be any mini-max selector of (2.28). Then form the above, we get

ρπ^1ψπ^1(i)[jSψπ^1(j)q(j|i,π^1(i),π2(i))+c(i,π^1(i),π2(i))ψπ^1(i)].\displaystyle\rho_{\hat{\pi}^{*1}}\psi_{\hat{\pi}^{*1}}^{*}(i)\leq\bigg{[}\sum_{j\in S}\psi_{\hat{\pi}^{*1}}^{*}(j)q(j|i,\hat{\pi}^{*1}(i),\pi^{*2}(i))+c(i,\hat{\pi}^{*1}(i),\pi^{*2}(i))\psi_{\hat{\pi}^{*1}}^{*}(i)\bigg{]}. (3.18)

Again arguing as in Lemma 2.3, for some 𝒦^\mathscr{B}\supset\hat{\mathscr{K}} we have

ψπ^1(i)Eiπ^1,π2[e0τ~()(c(ξt,π^1(ξt),π2(ξt))ρ)𝑑tψπ^1(ξτ~())]ic.\displaystyle\psi_{\hat{\pi}^{*1}}^{*}(i)\leq E^{\hat{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\hat{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\rho)dt}\psi_{\hat{\pi}^{*1}}^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}. (3.19)

Since, (π1,π2)({\pi}^{*1},{\pi}^{*2}) is a mini-max selector of (2.28), we have

ρψ(i)[jSψ(j)q(j|i,π^1(i),π2(i))+c(i,π^1(i),π2(i))ψ(i)].\displaystyle\rho\psi^{*}(i)\geq\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\hat{\pi}^{*1}(i),\pi^{*2}(i))+c(i,\hat{\pi}^{*1}(i),\pi^{*2}(i))\psi^{*}(i)\bigg{]}.

Thus, by applying Dynkin’s formula and Fatou’s lemma, we obtain

ψ(i)Eiπ^1,π2[e0τ~()(c(ξt,π^1(ξt),π2(ξt))ρ)𝑑tψ(ξτ~())]ic.\displaystyle\psi^{*}(i)\geq E^{\hat{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\hat{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}. (3.20)

Using (3.19) and (3.20), and following the arguments as in Theorem 3.1 one can show that ψ=η^ψπ^1\psi^{*}=\hat{\eta}\psi_{\hat{\pi}^{*1}}^{*} for some positive constant η^\hat{\eta}. Therefore, combining (2.28) and (3.17) it is easy to see that π^1\hat{\pi}^{*1} is an outer maximizing selector of (2.28). By similar arguments we can show that π^2\hat{\pi}^{*2} is an outer minimizing selector of (2.28). This completes the proof. ∎

4. Example

In this section an illustrative example is presented. In our model the transition rate is unbounded, and the cost rate is nonnegative and unbounded.

Example 4.1.

Consider a controlled birth-death system in which the state variable denotes the total population at each time t0t\geq 0. In this system there are ‘natural’ arrival and departure rates, say μ^\hat{\mu} and λ^\hat{\lambda}, respectively. Here player 1 controls arrival parameters h^1\hat{h}_{1} and player 2 controls departure parameters h^2\hat{h}_{2}. At any time tt, when the state of the system is iS:={0,1,}i\in S:=\{0,1,\cdots\}, player 1 takes an action aa from a given set A(i)A(i) (which is a compact subset of some Polish space AA). This action may increase (h^1(i,a)0)(\hat{h}_{1}(i,a)\geq 0) or decrease (h^1(i,a)0)(\hat{h}_{1}(i,a)\leq 0), the arrival rate and these actions result in a payoff denoted by c^1(i,a)\hat{c}_{1}(i,a) per unit time. Similarly, if the state is i{1,2,}i\in\{1,2,\cdots\}, player 2 takes an action bb from a set B(i)B(i) (which is a compact subset of a Polish space BB) to increase (h^2(i,b)0)(\hat{h}_{2}(i,b)\geq 0) or to decrease (h^2(i,b)0)(\hat{h}_{2}(i,b)\leq 0), the departure rate and these actions produce a payoff denoted by c^2(i,b)\hat{c}_{2}(i,b) per unit time. Also, in addition, assume that player 1 ‘owns’ the system and he/she gets a reward p^i\hat{p}\cdot i for each unit of time during which the system remains in the state iSi\in S, where p^>0\hat{p}>0 is a fixed reward fee per customer. We also, assume that when the state of the system reaches at state i=0i=0, any number of arrivals may occur. When there is no customer in the system, (i.e., i=0i=0), control of departure is unnecessary.

We next formulate this model as a continuous-time Markov game. The corresponding transition rate q(j|i,a,b)q(j|i,a,b) and reward rate c(i,a,b)c(i,a,b) for player 1 are given as follows: for (0,a,b)K(0,a,b)\in K (KK as in the game model (2.1)). We take

q(j|0,a,b)=α(j+3)4 for allj1,such thatjSq(j|0,a,b)=0,\displaystyle q(j|0,a,b)=\frac{\alpha}{(j+3)^{4}}~{}\text{ for all}~{}j\geq 1,~{}\text{such that}~{}\sum_{j\in S}q(j|0,a,b)=0, (4.1)

where α>0\alpha>0 is some constant so that q(0|0,a,b)3q(0|0,a,b)\leq-3. Also for (i,a,b)K(i,a,b)\in K with i1i\geq 1,

q(j|i,a,b)={λ^(i+3)2+h^2(i,b),ifj=i1μ^iλ^(i+3)2h^1(i,a)h^2(i,b),ifj=iμ^i+h^1(i,a),ifj=i+10,otherwise.\displaystyle q(j|i,a,b)=\left\{\begin{array}[]{lll}{}\displaystyle&\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b),~{}\text{if}~{}j=i-1\\ &-\hat{\mu}i-\hat{\lambda}(i+3)^{2}-\hat{h}_{1}(i,a)-\hat{h}_{2}(i,b),~{}\text{if}~{}j=i\\ &\hat{\mu}i+\hat{h}_{1}(i,a),~{}\text{if}~{}j=i+1\\ &0,~{}\text{otherwise}\displaystyle{}.\end{array}\right. (4.5)
c(i,a,b):=p^ic^1(i,a)+c^2(i,b) for (i,a,b)K.\displaystyle c(i,a,b):=\hat{p}\cdot i-\hat{c}_{1}(i,a)+\hat{c}_{2}(i,b)~{}\text{ for }~{}(i,a,b)\in K. (4.6)

We now explore conditions under which there exists a pair of optimal strategies. To do so, we make the following assumptions.

  1. (I)

    Let λ^max{μ^,2}\hat{\lambda}\geq\max\{\hat{\mu},2\}, μ^i+h^1(i,a)>0\hat{\mu}i+\hat{h}_{1}(i,a)>0, and λ^(i+3)2+h^2(i,b)>0\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b)>0 for all (i,a,b)K(i,a,b)\in K with i1i\geq 1; and assume that h^1(0,a)>0\hat{h}_{1}(0,a)>0 and h^2(0,b)=0\hat{h}_{2}(0,b)=0 for all (a,b)A(i)×B(i).(a,b)\in A(i)\times B(i).

  2. (II)

    The functions h^1(,):S×A[μ^,μ^]\hat{h}_{1}(\cdot,\cdot):S\times A\rightarrow[-\hat{\mu},\hat{\mu}], h^2(,):S×B[λ^,λ^]\hat{h}_{2}(\cdot,\cdot):S\times B\rightarrow[-\hat{\lambda},\hat{\lambda}], c^1(i,a)\hat{c}_{1}(i,a), and c^2(i,b)\hat{c}_{2}(i,b) are continuous with their respective variables for each fixed iSi\in S. Also, assume that min(a,b)A(i)×B(i)[c^1(i,a)c^2(i,b)]\displaystyle\min_{(a,b)\in A(i)\times B(i)}[\hat{c}_{1}(i,a)-\hat{c}_{2}(i,b)] is norm-like function and p^ic^1(i,a)+c^2(i,b)0\hat{p}\cdot i-\hat{c}_{1}(i,a)+\hat{c}_{2}(i,b)\geq 0 for (i,a,b)K(i,a,b)\in K. Here we take p^<1\hat{p}<1.

Proposition 4.1.

Under conditions (I)-(II), the above controlled birth-death system satisfies the Assumptions 2.1, 2.2, and 2.3. Hence by Theorem 3.2, there exists a pair of optimal strategies.

Proof.

Consider the Lyapunov function given by

V(i):=(i+3)2 for iS.\displaystyle V(i):=(i+3)^{2}~{}\text{ for }~{}i\in S.

We have V(i)1V(i)\geq 1 for all iSi\in S. Now for each i1i\geq 1, and (a,b)A(i)×B(i)(a,b)\in A(i)\times B(i), we have

jSq(j|i,a,b)V(j)\displaystyle\sum_{j\in S}q(j|i,a,b)V(j) =q(i1|i,a,b)V(i1)+V(i)q(i|i,a,b)+V(i+1)q(i+1|i,a,b)\displaystyle=q(i-1|i,a,b)V(i-1)+V(i)q(i|i,a,b)+V(i+1)q(i+1|i,a,b)
=(i+2)2[λ^(i+3)2+h^2(i,b)](i+3)2[μ^i+λ^(i+3)2+h^1(i,a)+h^2(i,b)]\displaystyle=(i+2)^{2}[\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b)]-(i+3)^{2}[\hat{\mu}i+\hat{\lambda}(i+3)^{2}+\hat{h}_{1}(i,a)+\hat{h}_{2}(i,b)]
+(i+4)2[μ^i+h^1(i,a)]\displaystyle\quad+(i+4)^{2}[\hat{\mu}i+\hat{h}_{1}(i,a)]
=[λ^(i+3)2+h^2(i,b)](2i+5)+(μ^i+h^1(i,a))(2i+7)\displaystyle=-[\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b)](2i+5)+(\hat{\mu}i+\hat{h}_{1}(i,a))(2i+7)
=λ^(i+3)2(i+3+i+2)h^2(i,b)(2i+5)+(μ^i+h^1(i,a))(2i+7)\displaystyle=-\hat{\lambda}(i+3)^{2}(i+3+i+2)-\hat{h}_{2}(i,b)(2i+5)+(\hat{\mu}i+\hat{h}_{1}(i,a))(2i+7)
=λ^(i+3)2(i+3)λ^(i+3)2(i+2)h^2(i,b)(2i+5)+(μ^i+h^1(i,a))(2i+7)\displaystyle=-\hat{\lambda}(i+3)^{2}(i+3)-\hat{\lambda}(i+3)^{2}(i+2)-\hat{h}_{2}(i,b)(2i+5)+(\hat{\mu}i+\hat{h}_{1}(i,a))(2i+7)
λ^(i+3)(i+3)2λ^(i+3)2(i+2)+λ^(2i+5)+λ^i(2i+7)+λ^(2i+7)\displaystyle\leq-\hat{\lambda}(i+3)(i+3)^{2}-\hat{\lambda}(i+3)^{2}(i+2)+\hat{\lambda}(2i+5)+\hat{\lambda}i(2i+7)+\hat{\lambda}(2i+7)
( sinceh2(i,b)λ^,μ^λ^,h1(i,a)μ^λ^, by conditions (I) and (II))\displaystyle~{}(\text{ since}-h_{2}(i,b)\leq\hat{\lambda},~{}\hat{\mu}\leq\hat{\lambda},~{}h_{1}(i,a)\leq\hat{\mu}\leq\hat{\lambda},\text{ by conditions (I) and (II)})
=λ^2(i+3)V(i)+{λ^2(i+3)(i+3)2λ^(i+3)2(i+2)+λ^(2i+5)\displaystyle=-\frac{\hat{\lambda}}{2}(i+3)V(i)+\biggl{\{}-\frac{\hat{\lambda}}{2}(i+3)(i+3)^{2}-\hat{\lambda}(i+3)^{2}(i+2)+\hat{\lambda}(2i+5)
+λ^i(2i+7)+λ^(2i+7)}\displaystyle+\hat{\lambda}i(2i+7)+\hat{\lambda}(2i+7)\biggr{\}}
λ^2(i+3)V(i)( since the term within the second bracket is negative)\displaystyle\leq-\frac{\hat{\lambda}}{2}(i+3)V(i)~{}(\text{ since the term within the second bracket is negative})
(i+3)V(i)( by condition (I), since λ^2)\displaystyle\leq-(i+3)V(i)~{}(\text{ by condition (I), since }\hat{\lambda}\geq 2)
=^(i)V(i)\displaystyle=-\hat{\ell}(i)V(i) (4.7)

where ^(i)=i+3\hat{\ell}(i)=i+3. For i=0i=0,

jSq(j|i,a,b)V(j)\displaystyle\sum_{j\in S}q(j|i,a,b)V(j) =9q(0|0,a,b)+j1q(j|0,a,b)(j+3)2\displaystyle=9q(0|0,a,b)+\sum_{j\geq 1}q(j|0,a,b)(j+3)^{2}
CI𝒦^(0)^(0)V(0),\displaystyle\leq CI_{\hat{\mathscr{K}}}(0)-\hat{\ell}(0)V(0), (4.8)

where 𝒦^={0}\hat{\mathscr{K}}=\{0\} and C=απ26C=\frac{\alpha\pi^{2}}{6}. Now

sup(a,b)A(i)×B(i)q(i,a,b)\displaystyle\sup_{(a,b)\in A(i)\times B(i)}q(i,a,b) sup(a,b)A(i)×B(i){μ^i+λ^(i+3)2+|h^1(i,a)|+|h^2(i,b)|}\displaystyle\leq\sup_{(a,b)\in A(i)\times B(i)}\biggl{\{}\hat{\mu}i+\hat{\lambda}(i+3)^{2}+|\hat{h}_{1}(i,a)|+|\hat{h}_{2}(i,b)|\biggr{\}}
λ^i+2λ^+λ^(i+3)2\displaystyle\leq\hat{\lambda}i+2\hat{\lambda}+\hat{\lambda}(i+3)^{2}
2(i+3)2λ^b2V(i)i1,\displaystyle\leq 2(i+3)^{2}\hat{\lambda}\leq b_{2}V(i)~{}\forall~{}i\geq 1, (4.9)

where b2=max{2λ^,j1α(j+3)4}b_{2}=\max\{2\hat{\lambda},\displaystyle\sum_{j\geq 1}\frac{\alpha}{{(j+3)}^{4}}\}. From (4.7) and (4.8), for all iSi\in S, we get

jSq(j|i,a,b)V(j)b0V(i)+b1,\displaystyle\sum_{j\in S}q(j|i,a,b)V(j)\leq b_{0}V(i)+b_{1}, (4.10)

where b1=Cb_{1}=C and b0=1b_{0}=1. Now

^(i)max(a,b)A(i)×B(i)c(i,a,b)=3+(1p^)i+min(a,b)A(i)×B(i)[c^1(i,a)c^2(i,b)].\displaystyle\hat{\ell}(i)-\max_{(a,b)\in A(i)\times B(i)}c(i,a,b)=3+(1-\hat{p})i+\min_{(a,b)\in A(i)\times B(i)}[\hat{c}_{1}(i,a)-\hat{c}_{2}(i,b)]. (4.11)

From (4.9) and (4.10), Assumption 2.1 is verified. By the condition (II), equations (4.7), (4.8), (4.11), it is easy to see that Assumption 2.2 is verified. By (4.1), (4.6), the condition (II), and the definition of qq as defined above, Assumption 2.3 (i) is verified. By (4.7) and (4.8), Assumption 2.3 (ii) is verified. Hence by Theorem 3.2, it follows that there exists an optimal pair of stationary strategies for this controlled Birth-Death process. ∎

References

  • [1] A. ARAPOSTATHIS, A counterexample to a nonlinear version of the Kreın-Rutman theorem by R. Mahadevan, Nonlinear Anal., 171 (2018), pp. 170–176.
  • [2] W. J. ANDERSON, Continuous-time Markov chains, Springer Series in Statistics: Probab. and its Appl., Springer-Verlag, New York, (1991), An applications-oriented approach.
  • [3] A. BASU AND M. K. GHOSH, Zero-sum risk-sensitive stochastic games on a countable state space, Stoch, processes and their appli., 124 (2014), pp. 961-983.
  • [4] A. BASU AND M. K. GHOSH, Nonzero-sum risk-sensitive stochastic games on a countable state space, Math. of Oper. Res., 43 (2018), pp. 516-532.
  • [5] N. BAUERLE AND U. RIEDER, More risk-sensitive Markov decision processes, Math. Oper. Res., 39 (2014), pp. 105-120.
  • [6] N. BAUERLE AND U. RIEDER, Zero-sum risk-sensitive stochastic games, Stoch. processes and their appl., 127 (2017), pp. 622-642.
  • [7] A. BISWAS AND S. PRADHAN, Ergodic risk-sensitive control of Markov processes on countable state space revisited, ArXiv e-prints 2104.04825 (2021). Available at https://arxiv.org/abs/2104.04825.
  • [8] K. FAN, Minimax Theorems, Proc. Natl. Acad. Sci., USA, 39 (1953), pp. 42-47.
  • [9] M. K. GHOSH AND S. SAHA, Risk-sensitive control of continuous-time Markov chains, Stochastic, 86 (2014), pp. 655-675.
  • [10] M. K. GHOSH, K. S. KUMAR, AND C. PAL, Zero-sum risk-sensitive stochastic games for continuous-time Markov chains, Stoch. Anal. Appl., 34 (2016), pp. 835-851.
  • [11] S. GOLUI AND C. PAL, Continuous-time Zero-Sum Games for Markov chains with risk-sensitive finite-horizon cost criterion, Stoch. Anal. and Appl., (2021). Available at: https://doi.org/10.1080/07362994.2021.1889381.
  • [12] S. GOLUI AND C. PAL, Continuous-time zero-sum games for Markov decision processes with discounted risk-sensitive cost criterion, Dyn. Games and Appl., (2021). Available at: https://doi.org/10.1007/s13235-021-00391-2.
  • [13] X. P. GUO AND O. HERNANDEZ-LERMA, Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, J. Appl. Probab., 40 (2003), pp. 327-345.
  • [14] X.P. GUO AND YOUNGHUI HUANG, Risk-sensitive average continuous-time Markov decision processes with unbounded transition and cost rates, J. Appl. Probab., 58 (2021), pp. 523-550.
  • [15] X. P. GUO AND O. HERNANDEZ-LERMA, Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs, Adv. in Appl. Probab., 39 (2007), pp. 645-668.
  • [16] X. P. GUO AND O. HERNANDEZ-LERMA, Continuous-Time Markov decision processes, Stoch. Model. and Appl. Probab., Springer-Verlag, Berlin, 62 (2009). Theorey and applications.
  • [17] X. GUO AND X. SONG, Discounted continuous-time constrained Markov decision processes in polish spaces, Ann. Appl. Probab., 21 (2011), pp. 2016-2049.
  • [18] X. P. GUO AND Z. W. LIAO, Risk-sensitive discounted continuous-time Markov decision processes with unbounded rates, SIAM J. Control Optim., 57 (2019), pp. 3857-3883.
  • [19] X. P. GUO AND A. PIUNOVSKIY, Discounted continuous-time Markov decision processes with constraints: Unbounded transition and loss rates, Math. Oper. Res., 36 (2011), pp. 105-132.
  • [20] O. HERNANDEZ-LERMA AND J. LASSERRE, Further topics on discrete-time Markov control processes, Springer, New York, (1999).
  • [21] O. HERNANDEZ-LERMA AND J. B. LASSERRE Zero-sum stochastic games in Borel spaces: average payoff criterion, SIAM J. Control Optimization, 39 (2001), pp. 1520-1539.
  • [22] M. Y. KITAEV, Semi-Markov and jump Markov controlled models: Average cost criterion, SIAM Theory Probab. Appl., 30 (1995), pp. 272-288.
  • [23] M. Y. KITAEV AND V.V. RYKOV, Controlled Queueing Systems, CRC Press, Boca Raton, (1995).
  • [24] M. B. KLOMPSTRA, Nash equilibria in risk-sensitive dynamic games, IEEE Trans. Automat. Control. 45 (2007), pp. 1397-1401.
  • [25] K.S. KUMAR AND C. PAL, Risk-sensitive control of jump process on denumerable state space with near monotone cost, Appl. Math. Optim., 68 (2013), pp. 311-331.
  • [26] K.S. KUMAR AND C. PAL, Risk-sensitive ergodic control of continuous-time Markov processes with denumerable state space, Stoch. Anal. Appl., 33 (2015), pp. 863-881.
  • [27] A. S. NOWAK, Measurable selection theorems for minimax stochastic optimization problems, SIAM J. Control Optim., 23 (1985), pp. 466-476.
  • [28] C. PAL AND S. PRADHAN, Risk-sensitive control of pure jump processes on a general state space, Stochastics, 91 (2019), pp. 155-174. Available at https://doi.org/10.1080/17442508.2018.1521413.
  • [29] C. PAL AND S. PRADHAN, Zero-Sum Games for Pure Jump Processes with Risk-Sensitive Discounted Cost Criteria, J. Dyn. Games, (2021). Available at http://dx.doi.org/10.3934/jdg.2021020.
  • [30] A. PIUNOVSKIY AND Y. ZHANG, Discounted continuous-time Markov decision processes with unbounded rates: The convex analytic approach, SIAM J. Control Optim., 49 (2011), pp. 2032-2061.
  • [31] A. PIUNOVSKIY AND Y. ZHANG, Continuous-time Markov decision processes, Springer, (2020).
  • [32] M. L. PUTERMAN, Markov decision processes: Disc. stoch. dyn. program., Wiley, New York, (1994).
  • [33] L. S. SHAPLEY, Stochastic games, Proc. Nat. Acad. Sci., USA 39 (1953), pp. 1095-1100.
  • [34] Q. D. WEI AND X. CHEN, Stochastic games for continuous-time jump processes under finite-horizon payoff criterion, Appl. Math. Optim., 74 (2016), pp. 273–301.
  • [35] Q. D. WEI AND X. CHEN, Nonzero-sum games for continuous-time jupm processes under the expected Average payoff criterion, Appl. Math. Optim., 83 (2019), pp. 915-938.
  • [36] Q. D. WEI AND X. CHEN, Nonzero-sum Risk-Sensitive Average Stochastic Gamea: The Case of Unbounded Costs, Dyn. games and Appli., (2021). Available at https://doi.org/10.1007/s13235-021-00380-5.
  • [37] Q. WEI, Zero-sum games for continuous-time Markov jump processes with risk-sensitive finite-horizon cost criterion, Oper. Res. Lett., 46 (2018), pp. 69-75.
  • [38] W. Z. ZHANG AND X. P. GUO, Nonzero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, Sci. China Math., 55 (2012), pp. 2405–2416.
  • [39] W. Z. ZHANG, Average optimality for continuous-time Markov decision processes under weak continuty conditions, J. of Appl. probab., 51 (2014), pp. 954-970.