Zero-Sum Games for Continuous-time Markov decision processes with risk-sensitive average cost criterion

Mrinal K. Ghosh Department of Mathematics
Indian Institute of Science
Bangalore-560012, India. [email protected] , Subrata Golui Department of Mathematics
Indian Institute of Technology Guwahati
Guwahati, Assam, India [email protected] , Chandan Pal Department of Mathematics
Indian Institute of Technology Guwahati
Guwahati, Assam, India [email protected] and Somnath Pradhan Department of Mathematics and Statistics
Queen’s University
Kingston, Ontario-K7L 3N6, Canada [email protected]

Abstract.

We consider zero-sum stochastic games for continuous time Markov decision processes with risk-sensitive average cost criterion. Here the transition and cost rates may be unbounded. We prove the existence of the value of the game and a saddle-point equilibrium in the class of all stationary strategies under a Lyapunov stability condition. This is accomplished by establishing the existence of a principal eigenpair for the corresponding Hamilton-Jacobi-Isaacs (HJI) equation. This in turn is established by using the nonlinear version of Krein-Rutman theorem. We then obtain a characterization of the saddle-point equilibrium in terms of the corresponding HJI equation. Finally, we use a controlled population system to illustrate results.

Keywords: Zero-sum game; risk-sensitive average cost criterion; history dependent strategy; HJI equation; saddle point equilibrium.

1. INTRODUCTION

Markov decision processes (MDPs) are widely used for modeling control problems that arise naturally in many real-life problems, for example in queueing models, epidemiology models, birth-death models etc, see [5], [16], [31], [32]. When there is more than one controller (or player) the stochastic control problem is referred to as stochastic game problem. Stochastic dynamic game was first introduced in [33] and has been studied extensively in the literature due to its immense applications; see [3], [6], [10], [11], [15], [34], [37], [38] and the references therein. In this article we consider the risk-sensitive ergodic zero-sum game for continuous-time Markov decision processes (CTMDPs). In zero-sum game, one player is trying to minimize her/his cost and the other player is trying to maximize the same. In literature, the expected average cost criterion is a commonly used optimality criterion in the theory of CTMDPs and has been widely studied under the different sets of optimality conditions; for control problems see, [16], [39] and the references therein; for game problems see [13], [20], [35] and the references therein. In these papers the decision-makers are risk-neutral. However, risk preferences may vary from person to person in the real-world applications. In order to address this concern one of the approaches that is available in the literature is risk-sensitive criterion. In this criterion one investigates the expectation of an exponential of the random quantity. This takes into account the attitude of the controller with respect to risk. The performance of a pair of strategies is measured by risk-sensitive average cost criterion, which in our present case is defined by (2.4), below. The analysis of risk-sensitive control is technically more involved because of the exponential nature of the cost. The risk-sensitive average cost stochastic optimal control problems for CTMDPs were first considered in [9] and have been studied extensively in the literature due to its applications in finance and large deviation theory. Recently, there has been an extensive work on risk-sensitive average cost criterion problems for CTMDPs; see, for example [7], [14], [25], [26], [28] and the references therein. The risk-sensitive stochastic zero-sum games for MDPs have been studied in [[3], [6], [10], [11], [37]] and [[4], [24], [36]] consider the nonzero-sum games for MDPs. In [[3], [6]], the authors study zero-sum risk-sensitive stochastic games for discrete time MDPs with bounded cost. Both of the papers considered first the discounted cost and then ergodic cost. In [6], the authors extended the results of [3] to the general state space case. The zero-sum risk-sensitive average games have been studied in [10] and discounted risk-sensitive zero-sum games were studied in [29] for CTMDPs with bounded cost and transition rates. But this boundedness requirement restricts our domain of application, since in many real-life situations we see that the reward/cost and transition rates are unbounded as for example in queueing, telecommunication and population processes. In [11] and [37], the authors study finite horizon zero-sum risk-sensitive continuous-time stochastic games. In [11], unbounded costs and transition rates are considered while [37] considers unbounded transition but bounded cost. The discounted risk-sensitive zero-sum game for CTMDPs was studied in [12] with unbounded cost and transition rates.

Here we study zero-sum ergodic risk-sensitive stochastic games for CTMDPs having the following features: (a) transition and cost rates may be unbounded (b) state space is countable (c) at any state of the system the space of admissible actions is compact (d) the strategies may be history dependent. To the best of our knowledge, this is the first work which deals with infinite horizon continuous-time zero-sum risk-sensitive stochastic games for ergodic criterion on countable state space for unbounded transition and cost rates. Under a Lyapunov stability condition, we prove the existence of a saddle-point equilibrium in the class of stationary strategies. Using Krein-Rutman theorem, we first prove that the corresponding HJI equation has a unique solution for any finite subset of the state space. Then using the Lyapunov stability condition, we establish the existence of a unique solution for the corresponding HJI equation on the whole state space. Also we give a complete characterization of saddle point equilibrium in terms of the corresponding HJI equation.

The rest of this article is arranged as follows. Section 2 gives the description of the problem and assumptions. We also show in this section that the required risk-sensitive optimality equation (HJI equation) has a solution. In Section 3, we completely characterize all possible saddle point equilibria in the class of stationary Markov strategies. In Section 4, we present an illustrative example.

2. The game model

In this section we introduce the continuous-time zero-sum stochastic game model described by the following elements

\{S,A,B,(A(i)\subset A,B(i)\subset B,i\in S),q(\cdot|i,a,b),c(i,a,b)\},

(2.1)

where

•

$S$ , called the state space, is the set of all nonnegative integers.
•

$A$ and $B$ are the action sets for players 1 and 2, respectively. The action spaces $A$ and $B$ are assumed to be Borel spaces with the Borel $\sigma$ -algebras $\mathcal{B}(A)$ and $\mathcal{B}(B)$ , respectively.
•

For each $i\in S$ , $A(i)\in\mathcal{B}(A)$ and $B(i)\in\mathcal{B}(B)$ denote the sets of admissible actions for players 1 and 2 in state $i$ , respectively. Let $K:=\{(i,a,b)|i\in S,a\in A(i),b\in B(i)\}$ , which is a Borel subset of $S\times A\times B$ . Throughout this paper, we assume that the admissible action spaces $A(i)(\subset A)$ and $B(i)(\subset B)$ are compact for each $i$ .

•

Given any $(i,a,b)\in K$ , the transition rate $q(j|i,a,b)$ is a signed kernel on $S$ such that $q(j|i,a,b)\geq 0$ for all $j,i\in S$ with $j\neq i$ . Moreover, we assume that $q(j|i,a,b)$ satisfies the following conservative and stable conditions: for any $i\in S,$

	$\displaystyle\sum_{j\in S}q(j\|i,a,b)=0~{}\text{for all}~{}(a,b)\in A(i)\times B(i)~{}~{}~{}\text{and}$
	$\displaystyle~{}q^{*}(i):=\sup_{(a,b)\in A(i)\times B(i)}q(i,a,b)<\infty,$		(2.2)

where $q(i,a,b):=-q(i|i,a,b)\geq 0.$

•

Finally, the measurable function $c:K\to\mathbb{R}_{+}$ denotes the cost rate (representing cost for player 2 and payoff for player 1).

The game evolves as follows. The players observe continuously the current state of the system. When the system is in state $i\in S$ at time $t\geq 0$ , the players independently choose actions $a_{t}\in A(i)$ and $b_{t}\in B(i)$ according to some strategies, respectively. As a consequence of this, the following happens:

•

player 2 pays an immediate cost at rate $c(i,a_{t},b_{t})$ to player 1;
•

the system stays in state $i$ for a random time, with rate of leaving $i$ given by $q(i,a_{t},b_{t})$ , and then jumps to a new state $j\neq i$ with the probability determined by $\dfrac{q(\cdot|i,a_{t},b_{t})}{q(i,a_{t},b_{t})}$ (see Proposition in [[16], p. 205] for details).

When the state of the system transits to a new state $j$ , the above procedure is repeated.

The goal of player 2 is to minimize his/her accumulated cost, whereas player 1 tries to maximize the same with respect to some performance criterion $J(\cdot,\cdot,\cdot,\cdot)$ , which in our present case is defined by (2.4), below. Such a model is relevant in worst-case scenarios, e.g., in financial applications when a risk-averse investor is trying to maximize his long-term portfolio gain against the market which, by default, is the minimizer in this case.

To formalize what is described above, below we describe the construction of continuous time Markov decision processes (CTMDPs) under possibly admissible history-dependent strategies. To construct the underlying CTMDPs (as in [[19], [22], [30]]), we introduce some notations: let $S_{\Delta}:=S\cup\{\Delta\}$ (with some $\Delta\notin S$ ), $\Omega^{0}:=(S\times(0,\infty))^{\infty}$ , $\Omega:=\Omega^{0}\cup\{(i_{0},\theta_{1},i_{1},\cdots,\theta_{k},i_{k},\infty,\Delta,\infty,\Delta,\cdots)|i_{l}\in S,\theta_{l}\in(0,\infty),$ for each $0\leq l\leq k,k\geq 0\}$ , and let $\mathscr{F}$ be the Borel $\sigma$ -algebra on $\Omega$ . Then we obtain the measurable space $(\Omega,\mathscr{F})$ . For each $k\geq 0$ , $\omega:=(i_{0},\theta_{1},i_{1},\cdots,\theta_{k},i_{k},\cdots)\in\Omega,$ define $T_{0}(\omega):=0$ , $T_{k}(\omega):=T_{k-1}(\omega)+\theta_{k}$ , $T_{\infty}(\omega):=\lim_{k\rightarrow\infty}T_{k}(\omega)$ . Using $\{T_{k}\}$ , we define the state process $\{\xi_{t}\}_{t\geq 0}$ as

\xi_{t}(\omega):=\sum_{k\geq 0}I_{\{T_{k}\leq t<T_{k+1}\}}i_{k}+I_{\{t\geq T_{\infty}\}}\Delta,\text{ for }t\geq 0.

(2.3)

Here, $I_{E}$ denotes the indicator function of a set $E$ , and we use the convention that $0+z=:z$ and $0\cdot z=:0$ for all $z\in S_{\Delta}$ . The process after $T_{\infty}$ is regarded to be absorbed in the state $\Delta$ . Thus, let $q(\cdot|\Delta,a_{\Delta},b_{\Delta}):\equiv 0$ , $A_{\Delta}:=A\cup\{a_{\Delta}\}$ , $B_{\Delta}:=B\cup\{b_{\Delta}\}$ , $A(\Delta):=\{a_{\Delta}\}$ , $B(\Delta):=\{b_{\Delta}\}$ , $c(\Delta,a,b):\equiv 0$ for all $(a,b)\in A_{\Delta}\times B_{\Delta}$ , where $a_{\Delta}$ , $b_{\Delta}$ are isolated points. Moreover, let $\mathscr{F}_{t}:=\sigma(\{T_{k}\leq s,\xi_{T_{k}}\in D\}:D\in\mathcal{B}(S),0\leq s\leq t,k\geq 0)$ for all $t\geq 0$ , $\mathscr{F}_{s-}=:\bigvee_{0\leq t<s}\mathscr{F}_{t}$ , and $\mathscr{P}:=\sigma(\{A\times\{0\},A\in\mathscr{F}_{0}\}\cup\{B\times(s,\infty),B\in\mathscr{F}_{s-}\})$ which denotes the $\sigma$ -algebra of predictable sets on $\Omega\times[0,\infty)$ related to $\{\mathscr{F}_{t}\}_{t\geq 0}$ .

In order to define the risk sensitive cost criterion, we need to introduce the definition of strategy below.

Definition 2.1.

An admissible history-dependent strategy for player 1, denoted by $\pi^{1}$ , is determined by a sequence $\{\pi_{k}^{1},k\geq 0\}$ of stochastic kernel on $A$ such that

	$\displaystyle\pi^{1}(da\|\omega,t)$	$\displaystyle=I_{\{t=0\}}(t)\pi_{0}^{1}(da\|i_{0},0)+\sum_{k\geq 0}I_{\{T_{k}<t\leq T_{k+1}\}}\pi^{1}_{k}(da\|i_{0},\theta_{1},i_{1},\dots,\theta_{k},i_{k},t-T_{k})$
		$\displaystyle+I_{\{t\geq T_{\infty}\}}\delta_{a_{\Delta}}(da),$

where $\pi_{0}^{1}(da|i_{0},0)$ is a stochastic kernel on $A$ given $S$ such that $\pi_{0}^{1}(A(i_{0})|i_{0},0)=1$ , $\pi^{1}_{k}(k\geq 1)$ are stochastic kernels on $A$ given $(S\times(0,\infty))^{k+1}$ such that $\pi_{k}^{1}(A(i_{k})|i_{0},\theta_{1},i_{1},\cdots,\theta_{k},i_{k},t-T_{k})=1$ , and $\delta_{a_{\Delta}}(da)$ denotes the Dirac measure at the point $a_{\Delta}$ .

The set of all admissible history-dependent strategies for player 1 is denoted by $\Pi^{1}$ . A strategy $\pi^{1}\in\Pi^{1}$ for player 1, is called a Markov if $\pi^{1}(da|\omega,t)=\pi^{1}(da|\xi_{t-}(w),t)$ for every $w\in\Omega$ and $t\geq 0$ , where $\xi_{t-}(w):=\lim_{s\uparrow t}\xi_{s}(w)$ . A Markov stragegy $\pi^{1}$ is called a stationary Markov strategy if $\pi^{1}$ does not have explicit dependence on time. We denote by $\Pi^{m}_{1}$ and $\Pi^{s}_{1}$ the family of all Markov strategies and stationary Markov strategies, respectively, for player 1. The sets of all admissible history-dependent strategies $\Pi^{2}$ , all Markov strategies $\Pi^{m}_{2}$ and all stationary strategies $\Pi^{s}_{2}$ for player 2 are defined similarly.

For any compact metric space $Y$ , let $P(Y)$ denote the space of probability measures on $\mathcal{B}(Y)$ with Prohorov topology. Since for each $i\in S$ , $A(i)$ and $B(i)$ are compact sets, $P(A(i))$ and $P(B(i))$ are compact metric spaces. For each $i,j\in S$ , $\mu\in P(A(i))$ and $\nu\in P(B(i))$ , the associated cost and transition rates are defined, respectively, as follows:

c(i,\mu,\nu):=\int_{B(i)}\int_{A(i)}c(i,a,b)\mu(da)\nu(db),

q(j|i,\mu,\nu):=\int_{B(i)}\int_{A(i)}q(j|i,a,b)\mu(da)\nu(db).

Note that $\pi^{1}\in{\Pi_{1}^{s}}$ can be identified with a map $\pi^{1}:S\to{P}(A)$ such that $\pi^{1}(\cdot|j)\in{P}(A(j))$ for each $j\in S$ . Thus, we have $\Pi_{1}^{s}=\displaystyle\Pi_{i\in S}{P}(A(i))$ and $\Pi_{2}^{s}=\displaystyle\Pi_{i\in S}{P}(B(i))$ . Therefore by Tychonoff theorem, the sets ${\Pi_{1}^{s}}$ and ${\Pi_{2}^{s}}$ are compact metric spaces. Also, note that under Assumption 2.1 (given below) for any initial state $i\in S$ and any pair of strategies $(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}$ , Theorem 4.27 in [23] yields the existence of a unique probability measure denoted by $P^{\pi^{1},\pi^{2}}_{i}$ on $(\Omega,\mathscr{F})$ . Let $E^{\pi^{1},\pi^{2}}_{i}$ be the expectation operator with respect to $P^{\pi^{1},\pi^{2}}_{i}$ . Also, from [[16], pp.13-15], we know that $\{\xi_{t}\}_{t\geq 0}$ is a Markov process under any $(\pi^{1},\pi^{2})\in\Pi^{m}_{1}\times\Pi^{m}_{2}$ (in fact, strong Markov). Now we give the definition of the risk-sensitive average cost criterion for zero-sum continuous-time games. Since the risk-sensitive parameter remains fixed throughout we assume without any loss of generality that the risk-sensitivity coefficient $\theta=1$ . For each $i\in S$ and any $(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}$ , the risk-sensitive ergodic cost criterion is given by

J(i,c,\pi^{1},\pi^{2}):=\limsup_{T\rightarrow\infty}\frac{1}{T}\ln E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}\int_{B}\int_{A}c(\xi_{t},a,b)\pi^{1}(da|\omega,t)\pi^{2}(db|\omega,t)dt}\biggr{]}.

(2.4)

Player 1 tries to maximize the above over his/her admissible strategies whereas player 2 tries to minimize the same. Now we define the lower/upper value of the game. The functions on S defined by $L(i):=\displaystyle\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{1},\pi^{2})$ and $U(i):=\displaystyle\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{2})$ are called, respectively, the lower value and the upper value of the game. It is easy to see that

L(i)\leq U(i)~{}\text{for all}~{}i\in S.

Definition 2.2.

If $L(i)=U(i)$ for all $i\in S$ , then the common function is called the value of the game and is denoted by $J^{*}(i)$ .

Definition 2.3.

Suppose that the game admits a value $J^{*}$ . Then a strategy $\pi^{*1}$ in $\Pi^{1}$ is said to be optimal for player 1 if

\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{*1},\pi^{2})=J^{*}(i)~{}\text{for all}~{}i\in S.

Similarly, $\pi^{*2}\in\Pi^{2}$ is optimal for player 2 if

\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{*2})=J^{*}(i)~{}\text{for all}~{}i\in S.

If $\pi^{*k}\in\Pi^{k}$ is optimal for player k (k=1,2), then $(\pi^{*1},\pi^{*2})$ is called a pair of optimal strategies and also called a saddle-point equilibrium.

Next we list the commonly used notations below:

•

For any finite set $\mathcal{D}\subset S$ , we define $\mathcal{B}_{\mathcal{D}}=\{f:S\to\mathbb{R}\mid\,\,f(i)=0\,\,\,\forall\,\,i\in\mathcal{D}^{c}\}$ .
•

$\mathcal{B}_{\mathcal{D}}^{+}\subset\mathcal{B}_{\mathcal{D}}$ denotes the cone of all nonnegative functions vanishing outside $\mathscr{D}.$

•

Given any real-valued function $\mathcal{V}\geq 1$ on $S$ , we define a Banach space $(L^{\infty}_{\mathcal{V}},\|\cdot\|^{\infty}_{\mathcal{V}})$ of $\mathcal{V}$ -weighted functions by

L^{\infty}_{\mathcal{V}}=\biggl{\{}u:S\rightarrow\mathbb{R}\mid\|u\|^{\infty}_{\mathcal{V}}:=\sup_{i\in S}\frac{|u(i)|}{\mathcal{V}(i)}<\infty\biggr{\}}.

•

$\|c\|_{\infty}:=\displaystyle\sup_{(i,a,b)\in K}c(i,a,b)$ .
•

For any function $f\in\mathcal{B}_{\mathcal{D}}$ , $\|f\|_{\mathcal{D}}=\max\{|f(i)|:i\in\mathcal{D}\}$ .
•

For any finite set $\mathscr{B}\subset S$ , $\tilde{\tau}(\mathscr{B}):=\inf\{t>0:\xi_{t}\in\mathscr{B}\}$ .

Our main goal is to establish the existence of a saddle-point equilibrium among the class of admissible history-dependent strategies. To this end, following [3] and [7], we investigate the HJI equation given by

	$\displaystyle\rho\psi(i)$	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi(i)\bigg{]}$
		$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi(i)\bigg{]}.$		(2.5)

Here $\rho$ is a scalar and $\psi$ is an appropriate function. The above is clearly an eigenvalue problem related to a nonlinear operator on an appropriate space. By a nonlinear version of Krein-Rutman theorem, we first show that Dirichlet eigenvalue problem associated with the above equation admits a solution in the space of bounded functions. Then by using a suitable limiting argument we show that the above HJI equation admits a principal eigenpair in an appropriate space. Finally exploiting the HJI equation, we completely characterize all possible saddle-point equilibria in the space of stationary Markov strategies. This is a brief outline of our procedure of establishing a saddle point equilibrium. The details now follow.

Since the transition rates (i.e., $q(j|i,a,b)$ ) may be unbounded, to avoid the explosion of the state process $\{\xi_{t},t\geq 0\}$ , the following assumption is imposed on the transition rates, which had been widely used in CTMDPs; see, for instance, [[17], [18], [19]] and references therein.

Assumption 2.1.

There exist real-valued function $\tilde{V}\geq 1$ on $S$ , constants $b_{0}\neq 0$ and $b_{1}\geq 0$ , and $b_{2}>0$ such that :

(i)

$\sum_{j\in S}\tilde{V}(j)q(j|i,a,b)\leq b_{0}\tilde{V}(i)+b_{1}$ for all $(i,a,b)\in K$ ;
(ii)

$q^{*}(i)\leq b_{2}\tilde{V}(i)$ for all $i\in S$ , where $q^{*}(i)$ is as in (2.2).

Throughout the rest of this article we are going to assume that Assumption 2.1 holds. Note that if $\sup_{i\in S}q^{*}(i)<\infty$ then Assumption 2.1 holds trivially. In this case we can choose $\tilde{V}$ to be a suitable constant.

Since we are allowing our transition and cost rates to be unbounded, to guarantee the finiteness of $J(i,c,\pi^{1},\pi^{2})$ , we need the following Assumption.

Assumption 2.2.

We assume that the CTMDP $\{\xi_{t}\}_{t\geq 0}$ is irreducible under every pair of stationary Markov strategies $(\pi^{1},\pi^{2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}$ . Assume that the cost function $c$ is bounded below. Thus without loss of generality we assume that $c\geq 0$ . Furthermore, suppose there exist a constant $C>0$ , a finite set $\hat{\mathscr{K}}$ and a Lyapunov function $V:S\to[1,\infty)$ such that one of the following hold.

(a)

When the running cost is bounded: For some positive constant $\hat{\gamma}>\|c\|_{\infty}$ , we have following blanket stability condition

\displaystyle\sup_{(a,b)\in A(i)\times B(i)}\sum_{j\in S}V(j)q(j|i,a,b)\leq CI_{\hat{\mathscr{K}}}(i)-\hat{\gamma}V(i)~{}\forall i\in S.

(2.6)

(b)

When the running cost is unbounded: For some norm-like function $\hat{\ell}:S\rightarrow\mathbb{R}_{+}$ , the function $\hat{\ell}(\cdot)-\displaystyle\max_{(a,b)\in A(\cdot)\times B(\cdot)}c(\cdot,a,b)\;$ is norm-like and we have the following blanket stability condition

\displaystyle\sup_{(a,b)\in A(i)\times B(i)}\sum_{j\in S}V(j)q(j|i,a,b)\leq CI_{\hat{\mathscr{K}}}(i)-\hat{\ell}(i)V(i)~{}\forall i\in S.

(2.7)

We wish to establish the existence of a saddle-point equilibrium in the class of all stationary strategies. In view of this we also need the following assumptions. Let $i_{0}\in S$ be a fixed point (a reference state).

Assumption 2.3.

(i)

For any fixed $i,j\in S$ the functions $q(j|i,a,b)$ and ${c}(i,a,b)$ are continuous in $(a,b)\in A(i)\times B(i)$ .
(ii)

The sum $\displaystyle\sum_{j\in S}V(j)q(j|i,a,b)$ is continuous in $(a,b)\in A(i)\times B(i)$ for any given $i\in S$ , where $V$ is as Assumption 2.2.
(iii)

There exists $i_{0}\in S$ such that any state can be reached from $i_{0}$ , i.e., $q(j|i_{0},a,b)>0$ for all $j\neq i_{0}$ and $(a,b)\in A(i_{0})\times B(i_{0})$ .

We first construct an increasing sequence of finite subsets $\hat{\mathscr{D}}_{n}\subset S$ such that $\displaystyle\cup_{i=0}^{\infty}\hat{\mathscr{D}}_{n}=S$ and $i_{0}\in\hat{\mathscr{D}}_{n}$ for all $n\in\mathbb{N}$ . Define $\tau_{n}:=\tau(\hat{\mathscr{D}}_{n}):=\inf\{t\geq 0:\xi_{t}\notin\hat{\mathscr{D}}_{n}\}$ , first exit time from $\hat{\mathscr{D}}_{n}$ .

Proposition 2.1.

Suppose Assumption 2.3 holds. Let $\tilde{c}:K\to\mathbb{R}$ be a function continuous in $(a,b)\in A(i)\times B(i)$ for each fixed $i\in S$ . Suppose the cost function $\tilde{c}$ satisfies the relation $\tilde{c}<-\delta$ in $\hat{\mathscr{D}}_{n}$ for some $\delta>0$ and $n\in\mathbb{N}$ . Then for any $g\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ there exists a unique $\varphi\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ satisfying the following nonlinear equation

	$\displaystyle-g(i)$	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}$
		$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n},$		(2.8)

with $\varphi(i)=0$ for all $i\in\hat{\mathscr{D}}_{n}^{c}$ . Moreover the unique solution of the above equation satisfies

	$\displaystyle\varphi(i)$	$\displaystyle=\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}$
		$\displaystyle=\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}~{}\forall i\in S,$		(2.9)

where as before $\tau_{n}=\inf\{t\geq 0:\xi_{t}\notin\hat{\mathscr{D}}_{n}\}$ .

Proof.

Let $(y_{i})_{i\in\hat{\mathscr{D}}_{n}}$ be a sequence in $\mathbb{R}$ . Fix $i\in\hat{\mathscr{D}}_{n}$ . Let $F:\mathbb{R}\rightarrow\mathbb{R}$ be defined by

\displaystyle x\rightarrow F(x)=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}x\biggr{]},i\in\hat{\mathscr{D}}_{n}.

(2.10)

Suppose $x_{2}>x_{1}$ . Let $\varepsilon>0$ . Then there exists $\pi^{1}_{\varepsilon}\in\Pi^{s}_{1}$ for which the following holds

	$\displaystyle F(x_{1})-F(x_{2})=$	$\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j\|i,\mu,\nu)+\biggl{(}q(i\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}x_{1}\biggr{]}$
		$\displaystyle-\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j\|i,\mu,\nu)+\biggl{(}q(i\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}x_{2}\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j\|i,\pi^{1}_{\varepsilon}(i),\nu)+\biggl{(}q(i\|i,\pi^{1}_{\varepsilon}(i),\nu)+\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)\biggr{)}x_{1}\biggr{]}$
		$\displaystyle-\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}y_{j}q(j\|i,\pi^{1}_{\varepsilon}(i),\nu)+\biggl{(}q(i\|i,\pi^{1}_{\varepsilon}(i),\nu)+\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)\biggr{)}x_{2}+\varepsilon\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\biggl{(}q(i\|i,\pi^{1}_{\varepsilon}(i),\nu)+\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)\biggr{)}(x_{1}-x_{2})\biggr{]}-\varepsilon$
	$\displaystyle\geq$	$\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}-\tilde{c}(i,\pi^{1}_{\varepsilon}(i),\nu)(x_{2}-x_{1})\biggr{]}-\varepsilon$
	$\displaystyle>$	$\displaystyle\,\,\delta(x_{2}-x_{1})-\varepsilon.$

Since $\varepsilon>0$ is arbitrary we get $F(x_{1})>F(x_{2})$ . Also, we see that $\lim_{x\rightarrow+\infty}F(x)=-\infty$ and $\lim_{x\rightarrow-\infty}F(x)=+\infty$ . Since $F$ is continuous in $x$ , for every $y\in\mathbb{R}$ , there exists a unique $x$ satisfying $F(x)=y$ . Now using the definition of $F$ , for fixed $g\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ , we can define a map $\hat{T}:\mathcal{B}_{\hat{\mathscr{D}}_{n}}\rightarrow\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ satisfying

\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}(j)q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}(\hat{T}\tilde{\phi}(i))\biggr{]}=-g(i),~{}i\in\hat{\mathscr{D}}_{n}.

(2.11)

Let $\tilde{\phi}_{1},\tilde{\phi}_{2}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ . Also, let $\tilde{\pi}^{1}$ be an outer maximizing selector of

\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j|i,\mu,\nu)+\biggl{(}q(i|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}\,.

Assumption 2.3, ensures the existence of such a selector. It then follows that

	$\displaystyle 0=$	$\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{1}(j)q(j\|i,\mu,\nu)+\biggl{(}q(i\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{1}(i)\biggr{]}$
		$\displaystyle-\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j\|i,\mu,\nu)+\biggl{(}q(i\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{1}(j)q(j\|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i\|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}\hat{T}\tilde{\phi}_{1}(i)\biggr{]}$
		$\displaystyle-\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j\|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i\|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}(\tilde{\phi}_{1}(j)-\tilde{\phi}_{2}(j))q(j\|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i\|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}(\hat{T}\tilde{\phi}_{1}(i)-\hat{T}\tilde{\phi}_{2}(i))\biggr{]}\,.$

Now let the infimum of the RHS (of the above) attain at $\pi^{*2}$ . Then

\|\tilde{\phi}_{1}-\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}q(i|i,\tilde{\pi}^{1}(i),\pi^{*2}(i))+\biggl{(}q(i|i,\tilde{\pi}^{1}(i),\pi^{*2}(i))+\tilde{c}(i,\tilde{\pi}^{1}(i),\pi^{*2}(i))\biggr{)}(\hat{T}\tilde{\phi}_{1}(i)-\hat{T}\tilde{\phi}_{2}(i))\leq 0.

Hence, we deduce that

\displaystyle(\hat{T}\tilde{\phi}_{2}(i)-\hat{T}\tilde{\phi}_{1}(i))\leq\sup_{\mu\in P(A(i))}\sup_{\nu\in P(B(i))}\frac{-q(i|i,\mu,\nu)}{-q(i|i,\mu,\nu)-\tilde{c}(i,\mu,\nu)}\|\tilde{\phi}_{1}-\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}\,.

Now in the above calculation, interchanging $\tilde{\phi}_{1},\tilde{\phi}_{2}$ , it follows that

\|\hat{T}\tilde{\phi}_{1}-\hat{T}\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}\leq\alpha_{1}\|\tilde{\phi}_{1}-\tilde{\phi}_{2}\|_{\hat{\mathscr{D}}_{n}}\,,

where $\alpha_{1}$ is a positive constant less than $1$ . This implies that $\hat{T}$ is a contraction map. Thus, by Banach fixed point theorem, there exists a unique $\varphi\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ such that $\hat{T}(\varphi)=\varphi$ . Now by Fan’s minimax theorem, see [[8], Theorem 3], we have

	$\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}$
	$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\varphi(i)\biggr{]}.$

This proves that (2.8) admits a unique solution. Now by using Dynkin formula as in [[16], Appendix C.3], for any $(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}$ and $T>0$ , we get

	$\displaystyle E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T\wedge\tau_{n}}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}\varphi(\xi_{T\wedge\tau_{n}})\biggr{]}-\varphi(i)$
	$\displaystyle=E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{T\wedge\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}\bigg{(}\tilde{c}(\xi_{t},\pi^{1}(t),\pi^{2}(t))\varphi(\xi_{t})+\sum_{j\in S}\varphi(j)q(j\|\xi_{t},\pi^{1}(t),\pi^{2}(t))\bigg{)}dt\biggr{]}.$		(2.12)

Using the compactness of $A(i),B(i)$ and the continuity of $\tilde{c},q$ , there exists a pair of selectors $(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}$ (i.e., a mini-max selector) satisfying

	$\displaystyle-g(i)$	$\displaystyle=\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j\|i,\pi^{1}(i),\nu)+\tilde{c}(i,\pi^{1}(i),\nu)\varphi(i)\biggr{]}$
		$\displaystyle=\sup_{\mu\in P(A(i))}\biggl{[}\sum_{j\in S}\varphi(j)q(j\|i,\mu,\pi^{2}(i))+\tilde{c}(i,\mu,\pi^{2}(i))\varphi(i)\biggr{]}.$		(2.13)

Then, using (2.12) and (2.13), we obtain

\displaystyle E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{T\wedge\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}\geq-E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T\wedge\tau_{n}}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}\varphi(\xi_{T\wedge\tau_{n}})\biggr{]}+\varphi(i).

Using the dominated convergence theorem, taking $T\rightarrow\infty$ in the above equation, we get

	$\displaystyle E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}$	$\displaystyle\geq-E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{\tau_{n}}\tilde{c}(\xi_{s},\pi^{1}(\xi_{s}),\pi^{2}(s))ds}\varphi(\xi_{\tau_{n}})\biggr{]}+\varphi(i)$
		$\displaystyle=\varphi(i).$

Hence

\displaystyle\varphi(i)

\displaystyle\leq E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}.

Since $\pi^{2}\in\Pi^{2}$ is arbitrary,

\displaystyle\varphi(i)\leq\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{*1}(\xi_{s}),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}.

(2.14)

Similarly, using (2.12), (2.13), and Fatou’s Lemma, we get

\displaystyle\varphi(i)

\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{*2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{*2}(\xi_{s}))ds}g(\xi_{t})dt\biggr{]}.

(2.15)

Using (2.14) and (2.15), we obtain

	$\displaystyle\varphi(i)$	$\displaystyle=\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}$
		$\displaystyle=\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}~{}i\in S.$

This completes the proof. ∎

We now recall a version of the nonlinear Krein-Rutman theorem from [[1], Section 3.1]. Let $\hat{\mathscr{X}}$ be an ordered Banach space. In what follows $\succeq$ denotes a partial ordering in $\hat{\mathscr{X}}$ with respect to a positive cone $\hat{\mathscr{C}}$ ( $\subset\hat{\mathscr{X}}$ ), that is $x\succeq y\Leftrightarrow x-y\in\hat{\mathscr{C}}$ . Also, recall that if a map $\tilde{T}:\hat{\mathscr{X}}\rightarrow\hat{\mathscr{X}}$ is continuous and compact, it is called completely continuous.

Theorem 2.1.

Let $\hat{\mathscr{X}}$ be as above and $\hat{\mathscr{C}}\subset\hat{\mathscr{X}}$ a nonempty closed cone that satisfies $\hat{\mathscr{C}}-\hat{\mathscr{C}}=\hat{\mathscr{X}}$ . Let $\tilde{T}:\hat{\mathscr{X}}\rightarrow\hat{\mathscr{X}}$ be an order-preserving, completely continuous, 1-homogeneous map with the property that if for some nonzero $\zeta\in\hat{\mathscr{C}}$ and $N>0$ , we have $N\tilde{T}(\zeta)\succeq\zeta$ . Then there exist a nontrivial $f\in\hat{\mathscr{C}}$ and $\tilde{\lambda}>0$ satisfying $\tilde{T}f=\tilde{\lambda}f$ .

Lemma 2.1.

Suppose Assumption 2.2 holds. Consider a finite subset $\mathscr{B}$ of $S$ such that $\hat{\mathscr{K}}\subset\mathscr{B}$ . Let $\tilde{\tau}(\mathscr{B})=\inf\{t>0:\xi_{t}\in\mathscr{B}\}$ . Then for any pair of strategies $(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}$ , the following results hold.

(i)

When Assumption 2.2 (a) holds:

\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\hat{\gamma}\tilde{\tau}(\mathscr{B})}V(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}\leq V(i)~{}\forall~{}i\in\mathscr{B}^{c}.

(2.16)

(ii)

When Assumption 2.2 (b) holds:

\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}V(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}\leq V(i)~{}\forall~{}i\in\mathscr{B}^{c}.

(2.17)

Proof.

It is easy to see that the proof of (i) is analogous to that the proof of (ii) when we replace $\hat{\ell}$ with $\hat{\gamma}$ . So, we prove only part (ii). Suppose Assumption 2.2 (b) holds. Let $n$ be large enough so that $\mathscr{B}\subset\hat{\mathscr{D}}_{n}$ . Applying Dynkin’s formula [[16], Appendix C.3], for $i\in\mathscr{B}^{c}\cap\hat{\mathscr{D}}_{n}$ we have

	$\displaystyle E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n}}\hat{\ell}(\xi_{s})ds}V(\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n})\biggr{]}-V(i)$
	$\displaystyle=E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n}}e^{\int_{0}^{t}\hat{\ell}(\xi_{s})ds}[\hat{\ell}(\xi_{t})V(\xi_{t})+\sum_{j\in S}q(j\|\xi_{t},\pi^{1}(t),\pi^{2}(t))V(j)]dt\biggr{]}$
	$\displaystyle\leq E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T\wedge\tau_{n}}e^{\int_{0}^{t}\hat{\ell}(\xi_{s})ds}CI_{\hat{\mathscr{K}}}(\xi_{t})dt\biggr{]}=0,$

where $\tau_{n}=\inf\{t\geq 0:\xi_{t}\notin\hat{\mathscr{D}}_{n}\}$ . Now by Fatou’s lemma, taking first $n\rightarrow\infty$ and then $T\rightarrow\infty$ , we get the required result. ∎

Lemma 2.2.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then for $n\in\mathbb{N}$ , there exists a pair $(\rho_{n},\psi_{n})\in\mathbb{R}\times\mathcal{B}_{\hat{\mathscr{D}}_{n}}^{+}$ , $\psi_{n}\gneq 0$ for the following Dirichlet nonlinear eigenequation

	$\displaystyle\rho_{n}\psi_{n}(i)$	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}$
		$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}.$		(2.18)

Also, for each $i\in S$ such that $\psi_{n}(i)>0$ , we have

\displaystyle\rho_{n}\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\bigg{]}.

(2.19)

Additionally the sequence $\{\rho_{n}\}$ is bounded satisfying $\liminf_{n\rightarrow\infty}\rho_{n}\geq 0$ .

Proof.

Let $\delta>0$ . Set $\tilde{c}=\displaystyle{c-\sup_{\hat{\mathscr{D}}_{n}}c-\delta}$ . Let $\tilde{T}:\mathcal{B}_{\hat{\mathscr{D}}_{n}}\rightarrow\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ be an operator defined as

\displaystyle\tilde{T}(g)(i):=\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]},~{}i\in\hat{\mathscr{D}}_{n},

(2.20)

with $\tilde{T}(g)(i)=0~{}\text{ for }~{}i\in\hat{\mathscr{D}}_{n}^{c}$ . Let $g_{1},g_{2}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ such that $g_{1}\succeq g_{2}$ , i.e., $g_{1}(i)\geq g_{2}(i)$ for each $i$ . Also, let $\tilde{T}(g_{1})=\hat{\varphi}_{1}$ and $\tilde{T}(g_{2})=\hat{\varphi}_{2}$ . Then there exists $\hat{\pi}^{*1}\in\Pi^{s}_{1}$ such that

	$\displaystyle-g_{2}(i)$	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\hat{\varphi}_{2}(j)q(j\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\hat{\varphi}_{2}(i)\biggr{]}$
		$\displaystyle=\inf_{\nu\in P(B(i))}\biggl{[}\sum_{j\in S}\hat{\varphi}_{2}(j)q(j\|i,\hat{\pi}^{1}(i),\nu)+\tilde{c}(i,\hat{\pi}^{1}(i),\nu)\hat{\varphi}_{2}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n}\,.$

Also, from the proof of Proposition 2.1, we have

\hat{\varphi}_{2}(i)=\tilde{T}(g_{2})(i)=\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{*1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{*1}(\xi_{s}),\pi^{2}(s))ds}g_{2}(\xi_{t})dt\biggr{]}\,.

Thus, we deduce that

	$\displaystyle\tilde{T}(g_{1})(i)-\tilde{T}(g_{2})(i)=$	$\displaystyle\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g_{1}(\xi_{t})dt\biggr{]}$
		$\displaystyle-\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g_{2}(\xi_{t})dt\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{1}(\xi_{s}),\pi^{2}(s))ds}g_{1}(\xi_{t})dt\biggr{]}$
		$\displaystyle-\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{1}(\xi_{s}),\pi^{2}(s))ds}g_{2}(\xi_{t})dt\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\pi^{2}\in\Pi^{2}}E^{\hat{\pi}^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{\tau_{n}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\hat{\pi}^{1}(\xi_{s}),\pi^{2}(s))ds}(g_{1}(\xi_{t})-g_{2}(\xi_{t}))dt\biggr{]}.$

This gives us $\tilde{T}(g_{1})\succeq\tilde{T}(g_{2})$ . Clearly $\tilde{T}(\lambda g)=\lambda\tilde{T}(g)$ for all $\lambda\geq 0$ . Since $\tilde{c}<-\delta$ , there exists a constant $\alpha_{2}>0$ such that

\|\tilde{T}(\hat{g}_{1})-\tilde{T}(\hat{g}_{2})\|_{\hat{\mathscr{D}}_{n}}\leq\alpha_{2}\|\hat{g}_{1}-\hat{g}_{2}\|_{\hat{\mathscr{D}}_{n}},~{}\text{ for any }\hat{g}_{1},\hat{g}_{2}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}.

Thus $\tilde{T}$ is continuous. Let $\{g_{m}\}$ be a bounded sequence in $\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ . Then from (2.20), for some constant $\alpha_{3}>0$ such that $\|\tilde{T}g_{m}\|_{\hat{\mathscr{D}}_{n}}\leq\alpha_{3}$ . Now applying diagonalization arguments, there exist a subsequence of $\{\tilde{T}g_{m}\}$ , ( denoting by the same sequence without loss of generality) and a function $\phi\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ such that $\|\tilde{T}g_{m}-\phi\|_{\hat{\mathscr{D}}_{n}}\rightarrow 0$ as $m\rightarrow\infty$ . Hence the map $\tilde{T}:\mathcal{B}_{\hat{\mathscr{D}}_{n}}\rightarrow\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ is compact. Therefore it is completely continuous. Let $g\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}$ such that $g(i_{0})=1$ and $g(j)=0$ for all $j\neq i_{0}$ . Then by (2.20), we have

	$\displaystyle\tilde{T}(g)(i_{0})$	$\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i_{0}}\biggl{[}\int_{0}^{T_{1}}e^{\int_{0}^{t}\tilde{c}(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}g(\xi_{t})dt\biggr{]}$
		$\displaystyle\geq\frac{g(i_{0})}{\\|\tilde{c}\\|_{\hat{\mathscr{D}}_{n}}}\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i_{0}}\bigg{[}1-e^{-\\|\tilde{c}\\|_{\hat{\mathscr{D}}_{n}}T_{1}}\bigg{]}$
		$\displaystyle=g(i_{0})\frac{1}{\\|\tilde{c}\\|_{\hat{\mathscr{D}}_{n}}+q^{*}(i_{0})},$

where $T_{1}$ is the first jump time (clearly, $T_{1}\leq\tau_{n}$ ). Thus $N\tilde{T}(g)\succeq g$ where $N={\|\tilde{c}\|_{\hat{\mathscr{D}}_{n}}+q^{*}(i_{0})}>0$ . Therefore by Theorem 2.1, there exists a nontrivial $\psi_{n}\in\mathcal{B}_{\hat{\mathscr{D}}_{n}}^{+}$ where $\psi_{n}\neq 0$ and a constant $\lambda_{\hat{\mathscr{D}}_{n}}>0$ such that $\tilde{T}(\psi_{n})=\lambda_{\hat{\mathscr{D}}_{n}}\psi_{n}$ , i.e.,

\displaystyle\tilde{\rho}_{n}\psi_{n}(i)=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n},

where $\tilde{\rho}_{n}=-[\lambda_{\hat{\mathscr{D}}_{n}}]^{-1}$ . Therefore in terms of $c$ , we have

\displaystyle\rho_{n}\psi_{n}(i)=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n},

where $\rho_{n}=\tilde{\rho}_{n}+\displaystyle{\sup_{\hat{\mathscr{D}}_{n}}c}+\delta$ . Now by Fan’s minimax theorem, see [[8], Theorem 3], we have

	$\displaystyle\rho_{n}\psi_{n}(i)$	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\biggr{]}$
		$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\biggl{[}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n}.$

This proves that (2.18) admits a unique solution. As before by the continuity of $c,q$ and the compactness of $A(i)$ , there exists $\pi^{*1}_{n}\in\Pi^{s}_{1}$ such that (2.18), can be written as

\displaystyle\rho_{n}\psi_{n}(i)=\inf_{\nu\in P(B(i))}\biggl{[}\psi_{n}(j)q(j|i,\pi^{*1}_{n}(i),\nu)+c(i,\pi^{*1}_{n}(i),\nu)\psi_{n}(i)\biggr{]}~{}\forall i\in\hat{\mathscr{D}}_{n}.

(2.21)

Now applying Dynkin’s formula (see [[7], Lemma 3.1]) and using (2.21), we get

	$\displaystyle\psi_{n}(i)$	$\displaystyle\leq E^{\pi^{1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}({c}(\xi_{s},\pi^{1}_{n}(\xi_{s}),\pi^{2}(s))-\rho_{n})ds}\psi_{n}(\xi_{T})I_{\{T<\tau_{n}\}}\biggr{]}$
		$\displaystyle\leq(\sup_{\hat{\mathscr{D}}_{n}}\psi_{n})E^{\pi^{1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}({c}(\xi_{s},\pi^{1}_{n}(\xi_{s}),\pi^{2}(s))-\rho_{n})ds}\biggr{]}.$		(2.22)

If $\psi_{n}(i)>0$ then by taking logarithm on the both sides in (2.22), dividing by $T$ and letting $T\rightarrow\infty$ , we get

\displaystyle\rho_{n}

\displaystyle\leq\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{*1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}{c}(\xi_{s},\pi^{*1}_{n}(\xi_{s}),\pi^{2}(s))ds}\biggr{]}.

Since $\pi^{2}\in\Pi^{2}$ is arbitrary, we obtain

	$\displaystyle\rho_{n}$	$\displaystyle\leq\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1}_{n},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}{c}(\xi_{s},\pi^{1}_{n}(\xi_{s}),\pi^{2}(s))ds}\biggr{]}$
		$\displaystyle\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}c(\xi_{s},\pi^{1}(s),\pi^{2}(s))ds}\biggr{]}.$

We now show that $J(i,c,\pi^{1},\pi^{2})$ is finite for every $(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2}$ and $i\in S$ . We only provide a proof under Assumption 2.2 (b) and the proof under Assumption 2.2 (a) would be analogous. Now from (2.7) we get

\displaystyle\sup_{(a,b)\in A(i)\times B(i)}\sum_{j\in S}V(j)q(j|i,a,b)\leq(C-\hat{\ell}(i))V(i)~{}\forall i\in S.

(2.23)

Then by Dynkin formula, we get

\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T\wedge\tau_{n}}(\hat{\ell}(\xi_{t})-C)dt}V(\xi_{T\wedge\tau_{n}})\bigg{]}\leq V(i)~{}\forall i\in S.

(2.24)

By Fatou’s lemma, taking $n\rightarrow\infty$ in (2.24), we get

\displaystyle E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}(\hat{\ell}(\xi_{t})-C)dt}V(\xi_{T})\bigg{]}\leq V(i)~{}\forall i\in S.

Now, since $V\geq 1$ , taking logarithm on both sides in the above equation, dividing both sides by $T$ and letting $T\rightarrow\infty$ , we obtain

J(i,\hat{\ell},\pi^{1},\pi^{2})\leq C~{}\text{ for all}~{}i\in S.

Since, $\hat{\ell}-\displaystyle\sup_{(a,b)\in A(i)\times B(i)}c(\cdot,a,b)$ is norm-like, we have $\displaystyle\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)\leq\hat{\ell}(i)+k_{1}$ $\forall$ $i\in S$ for some constant $k_{1}$ . Hence we get

\displaystyle J(i,c,\pi^{1},\pi^{2})\leq C+k_{1}~{}~{}\forall(\pi^{1},\pi^{2})\in\Pi^{1}\times\Pi^{2},\forall i\in S.

(2.25)

It is clear from (2.19) and (2.25) that $\rho_{n}$ has an upper bound. Next we prove that $\rho_{n}$ is bounded below. By using assumption 2.3 (iii) and (2.18), we have $\psi_{n}(i_{0})>0$ . Thus normalizing $\psi_{n}$ , we have $\psi_{n}(i_{0})=1$ . Also, since $c\geq 0$ , by (2.18) we get

	$\displaystyle\rho_{n}$	$\displaystyle\geq\sup_{\mu\in P(A(i_{0}))}\inf_{\nu\in P(B(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i_{0},\mu,\nu)\bigg{]}$
		$\displaystyle\geq\sup_{\mu\in P(A(i_{0}))}\inf_{\nu\in P(B(i_{0}))}q(i_{0}\|i_{0},\mu,\nu).$

So, $\{\rho_{n}\}$ is bounded below. Now we claim that $\hat{\rho}:=\displaystyle\liminf_{n\rightarrow\infty}\rho_{n}\geq 0$ . If not, then on contrarary, $\hat{\rho}<0$ . So, along some subsequence, we have (with an abuse of notation, we use the same sequence) $\rho_{n}\rightarrow\hat{\rho}$ , as $n\rightarrow\infty$ and for large $n$ , $\rho_{n}<0$ . Let $\pi_{n}^{*2}$ be outer minimizing selector of (2.18). Thus, using (2.18), for large enough $n$ , we have

	$\displaystyle 0>\rho_{n}\psi_{n}(i_{0})$	$\displaystyle=\sup_{\mu\in P(A(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i_{0},\mu,\pi^{2}_{n}(i_{0}))+c(i_{0},\mu,\pi_{n}^{2}(i_{0}))\psi_{n}(i_{0})\bigg{]}$
		$\displaystyle\geq\sup_{\mu\in P(A(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i_{0},\mu,\pi^{*2}_{n}(i_{0}))\bigg{]}$
		$\displaystyle\geq\sum_{j\in S}\psi_{n}(j)q(j\|i_{0},\mu,\pi_{n}^{*2}(i_{0})).$

Now by Assumption 2.3 (iii), from the above equation, we get

\displaystyle\psi_{n}(j)\leq\frac{-q(i_{0}|i_{0},\mu,\pi_{n}^{*2}(i_{0}))}{q(j|i_{0},\mu,\pi_{n}^{*2}(i_{0}))}\leq\sup_{\mu\in P(A(i_{0}))}\sup_{\nu\in P(B(i_{0}))}\frac{-q(i_{0}|i_{0},\mu,\nu)}{q(j|i_{0},\mu,\nu)}~{}\text{ for }~{}j\neq i_{0}.

So, by diagonalization argument we say, there exist a subsequence (denoting by the same sequence with an abuse of notation) and a function $\psi$ with $\psi(i_{0})=1$ such that $\psi_{n}(i)\rightarrow\psi(i)$ , as $n\rightarrow\infty$ for all $i\in S$ . By our assumption $A(i)$ is compact for each $i\in S$ and $\pi_{n}^{*2}$ is outer minimizing selector of (2.18). Hence we have $\pi^{*2}_{n}(i)\rightarrow\pi^{*2}(i)$ , for all $i\in S$ , as $n\to\infty$ . Therefore we have

\displaystyle\rho_{n}\psi_{n}(i)

\displaystyle\geq\sum_{j\in S}\psi_{n}(j)q(j|i,\mu,\pi^{*2}_{n}(i))+c(i,\mu,\pi^{*2}_{n}(i))\psi_{n}(i).

(2.26)

So, taking $n\rightarrow\infty$ in the above equation, we obtain

	$\displaystyle 0>\hat{\rho}\psi(i)$	$\displaystyle\geq\sum_{j\in S}\psi(j)q(j\|i,\mu,\pi^{2}(i))+c(i,\mu,\pi^{2}(i))\psi(i)$
		$\displaystyle\geq\sum_{j\in S}\psi(j)q(j\|i,\mu,\pi^{*2}(i)).$		(2.27)

Let $\pi^{1}\in\Pi^{s}_{1}$ . Applying Dynkin formula and using (2.27), we obtain

	$\displaystyle E^{\pi^{1},\pi^{*2}}_{i}[\psi(\xi_{t\wedge\tau_{n}})]-\psi(i)$
	$\displaystyle=E^{\pi^{1},\pi^{2}}_{i}\biggl{[}\int_{0}^{t\wedge\tau_{n}}\sum_{j\in S}\psi(j)q(j\|\xi_{s},\pi^{1}(\xi_{s}),\pi^{2}(\xi_{s}))ds\biggr{]}\leq 0.$

Now, using dominated convergence theorem, taking $n\rightarrow\infty$ , we get $E^{\pi^{1},\pi^{*2}}_{i}[\psi(\xi_{t})]\leq\psi(i)$ . So, with respect to the canonical filtration of $\xi$ , $\{\psi(\xi_{t})\}$ is supermartingale. So, by Doob’s martingale convergence theorem as $t\rightarrow\infty$ , $\psi(\xi_{t})$ converges. Now by Assumption 2.2, $\xi$ is recurrent. Thus the skeleton process $\{\xi_{n}:n\in\mathbb{N}\}$ is also recurrent (see for details [[2], Proposition 5.1.1]). This implies, that the process $\{\xi_{n}:n\in\mathbb{N}\}$ visits every state of $S$ infinitely often. But this is possible only if $\psi\equiv 1$ . Since $c\geq 0$ , this contradicts (2.27). Thus, $\displaystyle\liminf_{n\rightarrow\infty}\rho_{n}\geq 0$ . ∎

Lemma 2.3.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then there exists $(\rho,\psi^{*})\in\mathbb{R}_{+}\times L^{\infty}_{V}$ with $\psi^{*}>0$ , such that

	$\displaystyle\rho\psi^{*}(i)$	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\bigg{]}$
		$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\bigg{]},~{}i\in S.$		(2.28)

Also, the solution $(\rho,\psi^{*})$ has the following characteristic.

(i)

$\rho\leq\displaystyle\inf_{i\in S}\displaystyle\displaystyle\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\ln E^{\pi^{1},\pi^{2}}_{i}\biggl{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\biggr{]}$ .

(ii)

For any mini-max selector $(\pi^{*1},\pi^{*2})\in\Pi_{1}^{s}\times\Pi_{2}^{s}$ of (2.28), we have

	$\displaystyle\psi^{*}(i)$	$\displaystyle=\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}$
		$\displaystyle=\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}\,,$		(2.29)

for some finite set $\mathscr{B}\supset\hat{\mathscr{K}}$ .

Proof.

Using Assumption 2.2 and the fact $c\geq 0$ , there exists a finite set $\mathscr{B}$ containing $\hat{\mathscr{K}}$ such that the following hold.

•

When Assumption 2.2 (a) holds:

\displaystyle(\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)-\rho_{n})<\hat{\gamma}~{}\forall i\in\mathscr{B}^{c},~{}\text{for all}~{}n~{}\text{large}.

(2.30)

•

When Assumption 2.2 (b) holds:

\displaystyle(\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)-\rho_{n})<\hat{\ell}(i)~{}\forall i\in\mathscr{B}^{c},~{}\text{for all}~{}n~{}\text{large}.

(2.31)

Now we scale $\psi_{n}$ in such a way that it touches $V$ from below. Define

\hat{\theta}_{n}=\sup\{k>0:(V-k\psi_{n})>0~{}\text{in}~{}S\}.

Then we see that $\hat{\theta}_{n}$ is finite as $\psi_{n}$ vanishes in $\hat{\mathscr{D}}_{n}^{c}$ and $\psi_{n}\gneq 0$ . We claim that if we replace $\psi_{n}$ by $\hat{\theta}_{n}\psi_{n}$ , then $\psi_{n}$ touches $V$ inside $\mathscr{B}$ . If not, then for some state $\hat{i}\in\mathscr{B}^{c}$ , $(V-\psi_{n})(\hat{i})=0$ and $V-\psi_{n}>0$ in $\mathscr{B}\cup\hat{\mathscr{D}}_{n}^{c}$ . Let $\pi^{*1}_{n}\in\Pi_{1}^{s}$ be an outer maximizing selector of (2.18). Then by Dynkin formula, we get (under Assumption 2.2 (b))

	$\displaystyle\psi_{n}(\hat{i})$	$\displaystyle\leq E^{\pi^{1}_{n},\pi^{2}}_{\hat{i}}\biggl{[}e^{\int_{0}^{T\wedge\tilde{\tau}(\mathscr{B})}(c(\xi_{s},\pi^{1}_{n}(\xi_{s}),\pi^{2}(s))-\rho_{n})ds}\psi_{n}(\xi_{T\wedge\tilde{\tau}(\mathscr{B})})I_{\{T\wedge\tilde{\tau}(\mathscr{B})<\tau_{n}\}}\biggr{]}$
		$\displaystyle\leq E^{\pi^{*1}_{n},\pi^{2}}_{\hat{i}}\biggl{[}e^{\int_{0}^{T\wedge\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}\psi_{n}(\xi_{T\wedge\tilde{\tau}(\mathscr{B})})I_{\{T\wedge\tilde{\tau}(\mathscr{B})<\tau_{n}\}}\biggr{]}.$

Since $\psi_{n}\leq V$ , in view of Lemma 2.1, by the dominated convergence theorem, taking $T\rightarrow\infty$ , we get

\displaystyle\psi_{n}(\hat{i})\leq E^{\pi^{*1}_{n},\pi^{2}}_{\hat{i}}\biggl{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}\psi_{n}(\xi_{\tilde{\tau}(\mathscr{B})})\biggr{]}.

Using this and (2.17), we have

\displaystyle 0=(V-\psi_{n})(\hat{i})\geq E^{\pi^{*1}_{n},\pi^{2}}_{\hat{i}}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}\hat{\ell}(\xi_{s})ds}(V-\psi_{n})(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}>0.

Hence we arrive at a contradiction. Thus $\psi_{n}$ touches $V$ inside $\mathscr{B}$ . Similar conclusion holds under Assumption 2.2 (a). Now, since $\psi_{n}\leq V$ for all large n, by diagonalization argument, there exists a subsequence (by an abuse of notation, we use the same sequence) such that, $\psi_{n}\rightarrow\psi^{*}$ for all $i\in S$ , as $n\rightarrow\infty$ , and $\psi^{*}\leq V$ . Also, since by Lemma 2.2, the sequence $\{\rho_{n}\}$ is bounded and $\displaystyle\liminf_{n\rightarrow\infty}\rho_{n}\geq 0$ , we can find a subsequence (by an abuse of notation we use the same sequence) and some $\rho\geq 0$ such that $\rho_{n}\rightarrow\rho$ as $n\rightarrow\infty$ . Thus as before, there exists a mini-max selector $(\pi^{*1}_{n},\pi^{*2}_{n})\in\Pi_{1}^{s}\times\Pi_{2}^{s}$ of (2.18), i.e.,

	$\displaystyle\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\pi^{1}_{n}(i),\nu)+c(i,\pi^{1}_{n}(i),\nu)\psi_{n}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}$
	$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\pi^{2}_{n}(i))+c(i,\mu,\pi^{2}_{n}(i))\psi_{n}(i)\bigg{]}.$		(2.32)

Hence,

\displaystyle\rho_{n}\psi_{n}(i)\leq\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j|i,\pi^{*1}_{n}(i),\nu)+c(i,\pi^{*1}_{n}(i),\nu)\psi_{n}(i)\bigg{]}.

The above implies

\displaystyle\rho_{n}\psi_{n}(i)-\psi_{n}(i)q(i|i,\pi^{*1}_{n}(i),\nu)\leq\bigg{[}\sum_{j\neq i}\psi_{n}(j)q(j|i,\pi^{*1}_{n}(i),\nu)+c(i,\pi^{*1}_{n}(i),\nu)\psi_{n}(i)\bigg{]}.

(2.33)

Now, since $\psi_{n}(i)\leq V(i)$ for all $i\in S$ , we have

\displaystyle\sum_{j\neq i}\psi(j)q(j|i,\pi^{*1}_{n}(i),\nu)\leq\sum_{j\neq i}V(j)q(j|i,\pi^{*1}_{n}(i),\nu).

(2.34)

Also, since $\Pi_{1}^{s}$ and $\Pi_{2}^{s}$ are compact there exist $\pi^{*1}\in\Pi_{1}^{s}$ and $\pi^{*2}\in\Pi_{2}^{s}$ such that $\pi^{*1}_{n}\rightarrow\pi^{*1}$ and $\pi^{*2}_{n}\rightarrow\pi^{*2}$ as $n\rightarrow\infty$ . Under given assumptions, from [[13], Lemma 7.2] it is clear that the functions $c(i,\mu,\nu)$ , and $\displaystyle\sum_{j\in S}q(j|i,\mu,\nu)u(j)$ are continuous at $(\mu,\nu)$ on $P(A(i))\times P(B(i))$ for each fixed $u\in L^{\infty}_{V}$ , $i\in S$ . Therefore by the dominated convergence theorem, letting $n\rightarrow\infty$ in (2.33), we obtain

\displaystyle\rho\psi^{*}(i)\leq\sum_{j\in S}\psi^{*}(j)q(j|i,\pi^{*1}(i),\nu)+c(i,\pi^{*1}(i),\nu)\psi^{*}(i).

Hence we have

	$\displaystyle\rho\psi^{*}(i)$	$\displaystyle\leq\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\pi^{1}(i),\nu)+c(i,\pi^{1}(i),\nu)\psi^{}(i)\biggr{]}.$
		$\displaystyle\leq\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\biggr{]}.$		(2.35)

By similar arguments using (2.32) and extended Fatou’s lemma [[20], Lemma 8.3.7], we get

	$\displaystyle\rho\psi^{*}(i)$	$\displaystyle\geq\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\pi^{2}(i))+c(i,\mu,\pi^{2}(i))\psi^{}(i)\biggr{]}$
		$\displaystyle\geq\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\biggr{]}.$		(2.36)

Hence by (2.35) and (2.36), we get (2.28). Since at some point in $\mathscr{B}$ we have $(V-\psi_{n})=0$ , for all large $n$ . It follows that $(V-\psi^{*})(i^{*})=0$ for some $i^{*}\in\mathscr{B}$ . Since $V\geq 1$ , it is clear that $\psi^{*}$ is nontrivial. Now we claim that $\psi^{*}>0$ . If not, then we must have $\psi^{*}(\tilde{i})=0$ for some $\tilde{i}\in S$ . Again as before, there exits a pair of a mini-max selector $(\pi^{*1},\pi^{*2})\in\Pi_{1}^{s}\times\Pi_{2}^{s}$ such that form (2.28), we have

\displaystyle\rho\psi^{*}(\tilde{i})=\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|\tilde{i},\pi^{*1}(\tilde{i}),\pi^{*2}(\tilde{i}))+c(\tilde{i},\pi^{*1}(\tilde{i}),\pi^{*2}(\tilde{i}))\psi^{*}(\tilde{i})\biggr{]}.

(2.37)

This implies

\sum_{j\neq\tilde{i}}\psi^{*}(j)q(j|\tilde{i},\pi^{*1}(\tilde{i}),\pi^{*2}(\tilde{i}))=0.

Since the Markov chain $\xi$ is irreducible under $(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}$ , from the above equation, it follows that $\psi^{*}\equiv 0$ . So, we arrive at a contradiction. This proves the claim. Now we prove (i) and (ii).

(i) Since $\psi^{*}>0$ and $\psi_{n}(i)\rightarrow\psi^{*}(i)$ as $n\rightarrow\infty$ , we have $\psi_{n}>0$ for all large enough $n$ . So, using (2.19), we have $\displaystyle\lim_{n\rightarrow\infty}\rho_{n}\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{1},\pi^{2})$ for all $i\in S$ .

(ii) By measurable selection theorem in [[27], Theorem 2.2], there exists a pair of strategies (a mini-max selector) $(\pi^{*1},\pi^{*2})\in\Pi_{1}^{s}\times\Pi_{2}^{s}$ (as in (2.32)) satisfying

	$\displaystyle\rho\psi^{*}(i)$	$\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\pi^{2}(i))+c(i,\mu,\pi^{2}(i))\psi^{}(i)\biggr{]}$
		$\displaystyle=\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\pi^{1}(i),\nu)+c(i,\pi^{1}(i),\nu)\psi^{}(i)\biggr{]}.$		(2.38)

Using (2.38), Lemma 2.1, and Dynkin’s formula, we have

\displaystyle\psi^{*}(i)\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T}(c(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})\wedge T})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

By Fatou’s lemma taking $T\rightarrow\infty$ , we get

\displaystyle\psi^{*}(i)\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}.

(2.39)

Hence,

	$\displaystyle\psi^{*}(i)$	$\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}$
		$\displaystyle\geq\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}.$		(2.40)

Also, using (2.38), Lemma 2.1, and Dynkin’s formula, we obtain

\displaystyle\psi^{*}(i)\leq E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})\wedge T}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})\wedge T})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

Since $\psi^{*}\leq V$ , using the estimates as in Lemma 2.1, taking $T\rightarrow\infty$ , by dominated convergent theorem it follows that

\displaystyle\psi^{*}(i)

\displaystyle\leq E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

(2.41)

Hence

	$\displaystyle\psi^{*}(i)$	$\displaystyle\leq\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(\xi_{t}),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}$
		$\displaystyle\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{1}(t),\pi^{2}(t))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}.$		(2.42)

From (2.40) and (2.42), we get (2.29). ∎

3. Existence of risk-sensitive average optimal strategies

In this section we prove that any mini-max selector of the associated HJI equation is a saddle point equilibrium. Also, exploiting the stochastic representation (2.29) we completely characterize all possible saddle point equilibrium in the space of stationary Markov strategies.

Theorem 3.1.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then for any mini-max selector $(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}$ of (2.28), i.e., for any pair $(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}$ satisfying

	$\displaystyle\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\pi^{1}(i),\nu)+c(i,\pi^{1}(i),\nu)\psi^{}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\bigg{]}$
	$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\pi^{2}(i))+c(i,\mu,\pi^{2}(i))\psi^{}(i)\bigg{]},~{}i\in S,$		(3.1)

we have

$\displaystyle\rho$	$\displaystyle=\inf_{i\in S}\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))dt}\bigg{]}$
	$\displaystyle=\inf_{i\in S}\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\bigg{]}$
	$\displaystyle=\inf_{i\in S}\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}c(\xi_{t},\pi^{1}(t),\pi^{2}(t))dt}\bigg{]}.$	(3.2)

Proof.

We perturb the cost function as follows.

•

If Assumption 2.2 (a) holds: We define for $(a,b)\in A(i)\times B(i)$ , $i\in S$ , $\hat{c}_{n}(i,a,b)=c(i,a,b)I_{\hat{\mathscr{D}}_{n}}(i)+(\|c\|_{\infty}+\alpha_{3})I_{\hat{\mathscr{D}}_{n}^{c}}$ . Here $\alpha_{3}>0$ , is a small number satisfying $\|c\|_{\infty}+\alpha_{3}<\hat{\gamma}$ . Note that $\|\hat{c}_{n}\|_{\infty}<\hat{\gamma}$ .
•

If Assumption 2.2 (b) holds: We define for $(a,b)\in A(i)\times B(i)$ , $i\in S$ , $\hat{c}_{n}(i,a,b)=c(i,a,b)+\frac{1}{n}[\hat{\ell}(i)-\displaystyle\sup_{(a,b)\in A(i)\times B(i)}c(i,a,b)]_{+}$ . Note that the function $[\hat{\ell}(\cdot)-\displaystyle\sup_{(a,b)\in A(\cdot)\times B(\cdot)}c(\cdot,a,b)]_{+}$ is norm-like function. Also, it is easy to see that for large enough $n$ , $\hat{\ell}(\cdot)-\displaystyle\sup_{(a,b)\in A(\cdot)\times B(\cdot)}\hat{c}_{n}(\cdot,a,b)$ is norm-like.

In view of Lemma 2.3, it is clear that for $\pi^{*2}\in\Pi^{s}_{2}$ , there exists $(\tilde{\psi}_{n},\tilde{\rho}_{n})\in L^{\infty}_{V}\times\mathbb{R}_{+}$ , $\tilde{\psi}_{n}>0$ satisfying

\displaystyle{\tilde{\rho}_{n}}\tilde{\psi}_{n}(i)=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\tilde{\psi}_{n}(j)q(j|i,\mu,\pi^{*2}(i))+{\hat{c}_{n}(i,\mu,\pi^{*2}(i))}\tilde{\psi}_{n}(i)\bigg{]}

(3.3)

such that

\displaystyle 0\leq\tilde{\rho}_{n}\leq\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))dt}\bigg{]}.

(3.4)

Also, for some finite set $\mathscr{B}_{1}\supset\mathscr{B}\supset\mathscr{K}$ , we have

\displaystyle\tilde{\psi}_{n}(i)=\sup_{\pi^{1}\in\Pi^{1}}E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{1}})}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{\tilde{\tau}(\mathscr{B}_{1})})\bigg{]},~{}i\in\mathscr{B}_{1}^{c}.

(3.5)

Now from the proof of Lemma 2.3, we have a finite set $\tilde{\mathscr{B}}$ , depending on $n$ , containing $\hat{\mathscr{K}}$ such that the following cases happen:

•

Under Assumption 2.2 (a): From (3.4), we have $\tilde{\rho}_{n}\leq\|\hat{c}_{n}\|_{\infty}$ . Thus, for $i\in{\hat{\mathscr{D}}_{n}}^{c}$ , it follows that $\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n}\geq 0$ for all $(a,b)\in A(i)\times B(i)$ . Consequently, we may take $\tilde{\mathscr{B}}=\hat{\mathscr{D}}_{n}$ such that $\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n}\geq 0$ in $\tilde{\mathscr{B}}^{c}$ for all $(a,b)\in A(i)\times B(i)$ .
•

Under Assumption 2.2 (b): since $\hat{c}_{n}$ is norm-like function, we can choose suitable finite set $\tilde{\mathscr{B}}$ such that $(\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n})\geq 0$ in $\tilde{\mathscr{B}}^{c}$ for all $(a,b)\in A(i)\times B(i)$ .

For any $\pi^{1}\in\Pi^{1}$ , applying Dynkin formula and using (3.3) and Lemma 2.1, we get

\displaystyle\tilde{\psi}_{n}(i)\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\tilde{\mathscr{B}})\wedge T}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{\tilde{\tau}(\tilde{\mathscr{B}})\wedge T})\bigg{]}.

Since for $i\in\tilde{\mathscr{B}}^{c}$ , $\hat{c}_{n}(i,a,b)-\tilde{\rho}_{n}\geq 0$ , by Fatou’s lemma taking $T\rightarrow\infty$ , we get

	$\displaystyle\tilde{\psi}_{n}(i)$	$\displaystyle\geq E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\tilde{\mathscr{B}})}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{\tilde{\tau}(\tilde{\mathscr{B}})})\bigg{]}$
		$\displaystyle\geq(\min_{\tilde{\mathscr{B}}}\tilde{\psi}_{n})~{}\forall~{}i\in\tilde{\mathscr{B}}^{c}.$

This implies that, $\tilde{\psi}_{n}$ has a lower bound. Now, applying Dynkin formula, and using (3.3) and Lemma 2.1, we deduce that

\displaystyle\tilde{\psi}_{n}(i)

\displaystyle\geq E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T\wedge\tau_{N}}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{T\wedge\tau_{N}})\bigg{]},

for any $i\in S$ , where $\tau_{N}:=\inf\{t\geq 0:\xi_{t}\notin\{1,2,\cdots,N\}\}$ , $N\in\mathbb{N}$ . By Fatou’s lemma taking $N\rightarrow\infty$ , we get

	$\displaystyle\tilde{\psi}_{n}(i)$	$\displaystyle\geq E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))-\tilde{\rho}_{n})dt}\tilde{\psi}_{n}(\xi_{T})\bigg{]}$
		$\displaystyle\geq\min_{\tilde{\mathscr{B}}}\tilde{\psi}_{n}E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}(\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))-\tilde{\rho}_{n})dt}\bigg{]}.$

Thus, taking logarithm both sides, dividing by $T$ and letting $T\to\infty$ , we obtain

\displaystyle\tilde{\rho}_{n}\geq\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{T}\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{*2}(\xi_{t}))dt}\bigg{]}.

Since $\pi^{1}\in\Pi^{1}$ arbitrary, it follows that

	$\displaystyle\tilde{\rho}_{n}$	$\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}\hat{c}_{n}(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))dt}\bigg{]}$
		$\displaystyle\geq\sup_{\pi^{1}\in\Pi^{1}}\limsup_{T\rightarrow\infty}\frac{1}{T}\log E^{\pi^{1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{T}{c}(\xi_{t},\pi^{1}(t),\pi^{2}(\xi_{t}))dt}\bigg{]}.$

Using this and (3.4), we get $\displaystyle\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{*2})\leq\sup_{\pi^{1}\in\Pi^{1}}J(i,\hat{c}_{n},\pi^{1},\pi^{*2})=\tilde{\rho}_{n}$ for all $n$ . From the definition of $\hat{c}_{n}$ , it is easy to see that $\tilde{\rho}_{n}$ is a decreasing sequence which has a lower bound. Now by similar arguments as in Lemma 2.3, it follows that there exists a pair $(\tilde{\rho},\tilde{\psi})$ such that $\tilde{\rho}_{n}\rightarrow\tilde{\rho}$ and $\tilde{\psi}_{n}\rightarrow\tilde{\psi}$ as $n\rightarrow\infty$ . As in Lemma 2.3, by taking $n\rightarrow\infty$ in (3.3), we get

\displaystyle{\tilde{\rho}}\tilde{\psi}(i)=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\tilde{\psi}(j)q(j|i,\mu,\pi^{*2}(i))+{{c}(i,\mu,\pi^{*2}(i))}\tilde{\psi}(i)\bigg{]}.

(3.6)

Also, we have $\displaystyle\tilde{\rho}\geq\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{*2})\geq\rho$ . Now, we want to show that $\tilde{\rho}=\rho$ . Let $\tilde{\pi}^{*1}$ be a selector in (3.6). Thus

\displaystyle{\tilde{\rho}}\tilde{\psi}(i)=\bigg{[}\sum_{j\in S}\tilde{\psi}(j)q(j|i,\tilde{\pi}^{*1}(i),\pi^{*2}(i))+{{c}(i,\tilde{\pi}^{*1}(i),\pi^{*2}(i))}\tilde{\psi}(i)\bigg{]}.

(3.7)

In view of estimates in Lemma 2.1, applying Dynkin’s formula and the dominated convergence theorem, from (3.7) we deduce that there exists a finite set $\mathscr{B}_{2}\supset\mathscr{B}_{1}$ such that

\displaystyle\tilde{\psi}(i)\leq E^{\tilde{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{2}})}({c}(\xi_{t},\tilde{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\tilde{\rho})dt}\tilde{\psi}(\xi_{\tilde{\tau}(\mathscr{B}_{2})})\bigg{]},~{}\forall i\in\mathscr{B}_{2}^{c}.

(3.8)

Since $\tilde{\rho}\geq\rho$ , arguing as in Lemma 2.3 (see, (2.29)) for $\mathscr{B}_{2}\supset\mathscr{B}$ we have

\displaystyle\psi^{*}(i)\geq E^{\tilde{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{2}})}({c}(\xi_{t},\tilde{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\tilde{\rho})dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B}_{2})})\bigg{]}~{}\forall i\in\mathscr{B}_{2}^{c}.

(3.9)

Now we choose an appropriate constant $\kappa$ (e.g., $\displaystyle\kappa=\min_{\mathscr{B}_{2}}\frac{\psi^{*}}{\tilde{\psi}}$ ), so that $(\psi^{*}-\kappa\tilde{\psi})\geq 0$ in $\mathscr{B}_{2}$ and for some $\hat{i}_{0}\in\mathscr{B}_{2},$ $(\psi^{*}-\kappa\tilde{\psi})(\hat{i}_{0})=0$ . From (3.8) and (3.9), we get

\displaystyle\psi^{*}(i)-\kappa\tilde{\psi}(i)\geq E^{\tilde{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}({\mathscr{B}_{2}})}({c}(\xi_{t},\tilde{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\tilde{\rho})dt}(\psi^{*}-\kappa\tilde{\psi})(\xi_{\tilde{\tau}(\mathscr{B}_{2})})\bigg{]}~{}\forall i\in\mathscr{B}_{2}^{c}.

(3.10)

From the above expression it is easy to see that $(\psi^{*}-\kappa\tilde{\psi})\geq 0$ in $S$ . Now using (3.1), (3.7) and the fact that $\tilde{\rho}\geq\rho$ , we get

	$\displaystyle\tilde{\rho}(\psi^{*}$	$\displaystyle-\kappa\tilde{\psi})(\hat{i}_{0})$
		$\displaystyle\geq\biggl{[}\sum_{j\in S}(\psi^{}-\kappa\tilde{\psi})(j)q(j\|\hat{i}_{0},\tilde{\pi}^{1}(\hat{i}_{0}),\pi^{2}(\hat{i}_{0}))+c(\hat{i}_{0},\tilde{\pi}^{1}(\hat{i}_{0}),\pi^{2}(\hat{i}_{0}))(\psi^{}-\kappa\tilde{\psi})(\hat{i}_{0})\biggr{]}.$

This implies that

\sum_{j\neq\hat{i}_{0}}(\psi^{*}-\kappa\tilde{\psi})(j)q(j|{\hat{i}_{0}},\tilde{\pi}^{*1}(\hat{i}_{0}),\pi^{*2}(\hat{i}_{0}))=0\,.

(3.11)

Since the Markov chain $\xi$ is irreducible under $(\tilde{\pi}^{*1},\pi^{*2})$ , by (3.11), we have $\psi^{*}=\kappa\tilde{\psi}$ in $S$ . From (3.1) and (3.6) it follows that $\tilde{\rho}=\rho$ . This proves (3.2). ∎

In the next theorem we show that any mini-max selector of (2.28) is a saddle point equilibrium.

Theorem 3.2.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then any mini-max selector $(\pi^{*1},\pi^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}$ of (2.28) is a saddle point equilibrium.

Proof.

Arguing as in Lemma 2.3 and Theorem 3.1, there exists $(\rho^{\pi^{*1},\pi^{*2}},\psi^{\pi^{*1},\pi^{*2}})\in\mathbb{R}_{+}\times L^{\infty}_{V}$ with $\psi^{\pi^{*1},\pi^{*2}}>0$ satisfying

\displaystyle\rho^{\pi^{*1},\pi^{*2}}\psi^{\pi^{*1},\pi^{*2}}(i)=\bigg{[}\sum_{j\in S}\psi^{\pi^{*1},\pi^{*2}}(j)q(j|i,\pi^{*1}(i),\pi^{*2}(i))+c(i,\pi^{*1}(i),\pi^{*2}(i))\psi^{\pi^{*1},\pi^{*2}}(i)\bigg{]}.

(3.12)

Furthermore $\rho^{\pi^{*1},\pi^{*2}}=J(i,c,\pi^{*1},\pi^{*2})$ and for some finite set $\mathscr{B}\supset\hat{\mathscr{K}}$ (without loss of generality denoting by the same notation)

\displaystyle\psi^{\pi^{*1},\pi^{*2}}(i)=E^{\pi^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\rho^{\pi^{*1},\pi^{*2}})dt}\psi^{\pi^{*1},\pi^{*2}}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

(3.13)

Thus, from (3.2) it is clear that $\rho^{\pi^{*1},\pi^{*2}}\leq\rho$ . Now, following similar arguments as in Theorem 3.1 it is easy to see that $\rho^{\pi^{*1},\pi^{*2}}=\rho$ . This implies that $J(i,c,\pi^{1},\pi^{*2})\leq\rho^{\pi^{*1},\pi^{*2}}$ for all $\pi^{1}\in\Pi^{1}$ . Next, from [7] it is clear that if we consider the minimization problem $\displaystyle\min_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{*1},\pi^{2})$ , then the optimal control exists in the space of stationary Markov strategies. Thus to complete the proof, it is enough to show that $J(i,c,\pi^{*1},\pi^{*2})\leq J(i,c,\pi^{*1},\pi^{2})$ for any $\pi^{2}\in\Pi^{s}_{2}$ . If not suppose that $J(i,c,\pi^{*1},\pi^{*2})>J(i,c,\pi^{*1},\pi^{2})$ for some $\pi^{2}\in\Pi^{s}_{2}$ . We know that for $\pi^{2}\in\Pi^{s}_{2}$ , there exists $(\rho^{\pi^{*1},\pi^{2}},\psi^{\pi^{*1},\pi^{2}})\in\mathbb{R}_{+}\times L^{\infty}_{V}$ with $\psi^{\pi^{*1},\pi^{2}}>0$ satisfying

\displaystyle\rho^{\pi^{*1},\pi^{2}}\psi^{\pi^{*1},\pi^{2}}(i)=\bigg{[}\sum_{j\in S}\psi^{\pi^{*1},\pi^{2}}(j)q(j|i,\pi^{*1}(i),\pi^{2}(i))+c(i,\pi^{*1}(i),\pi^{2}(i))\psi^{\pi^{*1},\pi^{2}}(i)\bigg{]},

(3.14)

also we have $\rho^{\pi^{*1},\pi^{2}}=J(i,c,\pi^{*1},\pi^{2})$ and for some finite set $\mathscr{B}$ ( $\supset\hat{\mathscr{K}}$ )

\displaystyle\psi^{\pi^{*1},\pi^{2}}(i)=E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(\xi_{t}))-\rho^{\pi^{*1},\pi^{2}})dt}\psi^{\pi^{*1},\pi^{2}}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

(3.15)

From (2.29), we deduce that

\displaystyle\psi^{*}(i)\leq E^{\pi^{*1},\pi^{2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\pi^{*1}(\xi_{t}),\pi^{2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]},~{}\forall i\in\mathscr{B}^{c}.

(3.16)

Now, as in Theorem 3.1, using (3.15) and (3.16) one can deduce that $\psi^{\pi^{*1},\pi^{2}}=\eta\psi^{*}$ for some positive constant $\eta$ . In view (2.28) and (3.14), it follows that $\rho\leq\rho^{\pi^{*1},\pi^{2}}$ , i.e., $J(i,c,\pi^{*1},\pi^{*2})\leq J(i,c,\pi^{*1},\pi^{2})$ , which is a contradiction. This completes the proof. ∎

Next we prove the converse of the above theorem.

Theorem 3.3.

Suppose Assumptions 2.1, 2.2, and 2.3 hold. If there exists a saddle point equilibrium $(\hat{\pi}^{*1},\hat{\pi}^{*2})\in\Pi^{s}_{1}\times\Pi^{s}_{2}$ , i.e.,

\displaystyle J(i,c,\pi^{1},\hat{\pi}^{*2})\leq J(i,c,\hat{\pi}^{*1},\hat{\pi}^{*2})\leq J(i,c,\hat{\pi}^{*1},\pi^{2})\,,

for all $i\in S$ , $\pi^{1}\in\Pi^{1}$ and $\pi^{2}\in\Pi^{2}$ . Then $(\hat{\pi}^{*1},\hat{\pi}^{*2})$ is a mini-max selector of (2.28).

Proof.

From Theorem 3.2, we deduce that

	$\displaystyle\rho=\inf_{\pi^{2}\in\Pi^{2}}\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\pi^{2})$	$\displaystyle\leq\sup_{\pi^{1}\in\Pi^{1}}J(i,c,\pi^{1},\hat{\pi}^{2})\leq J(i,c,\hat{\pi}^{1},\hat{\pi}^{*2})$
		$\displaystyle\leq\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\hat{\pi}^{*1},\pi^{2})\leq\sup_{\pi^{1}\in\Pi^{1}}\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\pi^{1},\pi^{2})=\rho\,.$

This implies that $\rho=J(i,c,\hat{\pi}^{*1},\hat{\pi}^{*2})$ and $\displaystyle\rho=\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\hat{\pi}^{*1},\pi^{2})$ . Arguing as in Lemma 2.3 and using Theorem 3.1, it follows that for $\hat{\pi}^{*1}\in\Pi_{1}^{s}$ there exists $(\rho_{\hat{\pi}^{*1}},\psi_{\hat{\pi}^{*1}}^{*})\in\mathbb{R}_{+}\times L^{\infty}_{V}$ with $\psi_{\hat{\pi}^{*1}}^{*}>0$ such that

\displaystyle\rho_{\hat{\pi}^{*1}}\psi_{\hat{\pi}^{*1}}^{*}(i)=\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{\hat{\pi}^{*1}}^{*}(j)q(j|i,\hat{\pi}^{*1}(i),\nu)+c(i,\hat{\pi}^{*1}(i),\nu)\psi_{\hat{\pi}^{*1}}^{*}(i)\bigg{]},

(3.17)

and $\rho_{\hat{\pi}^{*1}}=\rho$ (since $\displaystyle\rho=\inf_{\pi^{2}\in\Pi^{2}}J(i,c,\hat{\pi}^{*1},\pi^{2})$ ). Let $({\pi}^{*1},{\pi}^{*2})$ be any mini-max selector of (2.28). Then form the above, we get

\displaystyle\rho_{\hat{\pi}^{*1}}\psi_{\hat{\pi}^{*1}}^{*}(i)\leq\bigg{[}\sum_{j\in S}\psi_{\hat{\pi}^{*1}}^{*}(j)q(j|i,\hat{\pi}^{*1}(i),\pi^{*2}(i))+c(i,\hat{\pi}^{*1}(i),\pi^{*2}(i))\psi_{\hat{\pi}^{*1}}^{*}(i)\bigg{]}.

(3.18)

Again arguing as in Lemma 2.3, for some $\mathscr{B}\supset\hat{\mathscr{K}}$ we have

\displaystyle\psi_{\hat{\pi}^{*1}}^{*}(i)\leq E^{\hat{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\hat{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\rho)dt}\psi_{\hat{\pi}^{*1}}^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

(3.19)

Since, $({\pi}^{*1},{\pi}^{*2})$ is a mini-max selector of (2.28), we have

\displaystyle\rho\psi^{*}(i)\geq\bigg{[}\sum_{j\in S}\psi^{*}(j)q(j|i,\hat{\pi}^{*1}(i),\pi^{*2}(i))+c(i,\hat{\pi}^{*1}(i),\pi^{*2}(i))\psi^{*}(i)\bigg{]}.

Thus, by applying Dynkin’s formula and Fatou’s lemma, we obtain

\displaystyle\psi^{*}(i)\geq E^{\hat{\pi}^{*1},\pi^{*2}}_{i}\bigg{[}e^{\int_{0}^{\tilde{\tau}(\mathscr{B})}(c(\xi_{t},\hat{\pi}^{*1}(\xi_{t}),\pi^{*2}(\xi_{t}))-\rho)dt}\psi^{*}(\xi_{\tilde{\tau}(\mathscr{B})})\bigg{]}~{}\forall i\in\mathscr{B}^{c}.

(3.20)

Using (3.19) and (3.20), and following the arguments as in Theorem 3.1 one can show that $\psi^{*}=\hat{\eta}\psi_{\hat{\pi}^{*1}}^{*}$ for some positive constant $\hat{\eta}$ . Therefore, combining (2.28) and (3.17) it is easy to see that $\hat{\pi}^{*1}$ is an outer maximizing selector of (2.28). By similar arguments we can show that $\hat{\pi}^{*2}$ is an outer minimizing selector of (2.28). This completes the proof. ∎

4. Example

In this section an illustrative example is presented. In our model the transition rate is unbounded, and the cost rate is nonnegative and unbounded.

Example 4.1.

Consider a controlled birth-death system in which the state variable denotes the total population at each time $t\geq 0$ . In this system there are ‘natural’ arrival and departure rates, say $\hat{\mu}$ and $\hat{\lambda}$ , respectively. Here player 1 controls arrival parameters $\hat{h}_{1}$ and player 2 controls departure parameters $\hat{h}_{2}$ . At any time $t$ , when the state of the system is $i\in S:=\{0,1,\cdots\}$ , player 1 takes an action $a$ from a given set $A(i)$ (which is a compact subset of some Polish space $A$ ). This action may increase $(\hat{h}_{1}(i,a)\geq 0)$ or decrease $(\hat{h}_{1}(i,a)\leq 0)$ , the arrival rate and these actions result in a payoff denoted by $\hat{c}_{1}(i,a)$ per unit time. Similarly, if the state is $i\in\{1,2,\cdots\}$ , player 2 takes an action $b$ from a set $B(i)$ (which is a compact subset of a Polish space $B$ ) to increase $(\hat{h}_{2}(i,b)\geq 0)$ or to decrease $(\hat{h}_{2}(i,b)\leq 0)$ , the departure rate and these actions produce a payoff denoted by $\hat{c}_{2}(i,b)$ per unit time. Also, in addition, assume that player 1 ‘owns’ the system and he/she gets a reward $\hat{p}\cdot i$ for each unit of time during which the system remains in the state $i\in S$ , where $\hat{p}>0$ is a fixed reward fee per customer. We also, assume that when the state of the system reaches at state $i=0$ , any number of arrivals may occur. When there is no customer in the system, (i.e., $i=0$ ), control of departure is unnecessary.

We next formulate this model as a continuous-time Markov game. The corresponding transition rate $q(j|i,a,b)$ and reward rate $c(i,a,b)$ for player 1 are given as follows: for $(0,a,b)\in K$ ( $K$ as in the game model (2.1)). We take

\displaystyle q(j|0,a,b)=\frac{\alpha}{(j+3)^{4}}~{}\text{ for all}~{}j\geq 1,~{}\text{such that}~{}\sum_{j\in S}q(j|0,a,b)=0,

(4.1)

where $\alpha>0$ is some constant so that $q(0|0,a,b)\leq-3$ . Also for $(i,a,b)\in K$ with $i\geq 1$ ,

\displaystyle q(j|i,a,b)=\left\{\begin{array}[]{lll}{}\displaystyle&\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b),~{}\text{if}~{}j=i-1\\ &-\hat{\mu}i-\hat{\lambda}(i+3)^{2}-\hat{h}_{1}(i,a)-\hat{h}_{2}(i,b),~{}\text{if}~{}j=i\\ &\hat{\mu}i+\hat{h}_{1}(i,a),~{}\text{if}~{}j=i+1\\ &0,~{}\text{otherwise}\displaystyle{}.\end{array}\right.

(4.5)

\displaystyle c(i,a,b):=\hat{p}\cdot i-\hat{c}_{1}(i,a)+\hat{c}_{2}(i,b)~{}\text{ for }~{}(i,a,b)\in K.

(4.6)

We now explore conditions under which there exists a pair of optimal strategies. To do so, we make the following assumptions.

(I)

Let $\hat{\lambda}\geq\max\{\hat{\mu},2\}$ , $\hat{\mu}i+\hat{h}_{1}(i,a)>0$ , and $\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b)>0$ for all $(i,a,b)\in K$ with $i\geq 1$ ; and assume that $\hat{h}_{1}(0,a)>0$ and $\hat{h}_{2}(0,b)=0$ for all $(a,b)\in A(i)\times B(i).$
(II)

The functions $\hat{h}_{1}(\cdot,\cdot):S\times A\rightarrow[-\hat{\mu},\hat{\mu}]$ , $\hat{h}_{2}(\cdot,\cdot):S\times B\rightarrow[-\hat{\lambda},\hat{\lambda}]$ , $\hat{c}_{1}(i,a)$ , and $\hat{c}_{2}(i,b)$ are continuous with their respective variables for each fixed $i\in S$ . Also, assume that $\displaystyle\min_{(a,b)\in A(i)\times B(i)}[\hat{c}_{1}(i,a)-\hat{c}_{2}(i,b)]$ is norm-like function and $\hat{p}\cdot i-\hat{c}_{1}(i,a)+\hat{c}_{2}(i,b)\geq 0$ for $(i,a,b)\in K$ . Here we take $\hat{p}<1$ .

Proposition 4.1.

Under conditions (I)-(II), the above controlled birth-death system satisfies the Assumptions 2.1, 2.2, and 2.3. Hence by Theorem 3.2, there exists a pair of optimal strategies.

Proof.

Consider the Lyapunov function given by

\displaystyle V(i):=(i+3)^{2}~{}\text{ for }~{}i\in S.

We have $V(i)\geq 1$ for all $i\in S$ . Now for each $i\geq 1$ , and $(a,b)\in A(i)\times B(i)$ , we have

$\displaystyle\sum_{j\in S}q(j\|i,a,b)V(j)$	$\displaystyle=q(i-1\|i,a,b)V(i-1)+V(i)q(i\|i,a,b)+V(i+1)q(i+1\|i,a,b)$
	$\displaystyle=(i+2)^{2}[\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b)]-(i+3)^{2}[\hat{\mu}i+\hat{\lambda}(i+3)^{2}+\hat{h}_{1}(i,a)+\hat{h}_{2}(i,b)]$
	$\displaystyle\quad+(i+4)^{2}[\hat{\mu}i+\hat{h}_{1}(i,a)]$
	$\displaystyle=-[\hat{\lambda}(i+3)^{2}+\hat{h}_{2}(i,b)](2i+5)+(\hat{\mu}i+\hat{h}_{1}(i,a))(2i+7)$
	$\displaystyle=-\hat{\lambda}(i+3)^{2}(i+3+i+2)-\hat{h}_{2}(i,b)(2i+5)+(\hat{\mu}i+\hat{h}_{1}(i,a))(2i+7)$
	$\displaystyle=-\hat{\lambda}(i+3)^{2}(i+3)-\hat{\lambda}(i+3)^{2}(i+2)-\hat{h}_{2}(i,b)(2i+5)+(\hat{\mu}i+\hat{h}_{1}(i,a))(2i+7)$
	$\displaystyle\leq-\hat{\lambda}(i+3)(i+3)^{2}-\hat{\lambda}(i+3)^{2}(i+2)+\hat{\lambda}(2i+5)+\hat{\lambda}i(2i+7)+\hat{\lambda}(2i+7)$
	$\displaystyle~{}(\text{ since}-h_{2}(i,b)\leq\hat{\lambda},~{}\hat{\mu}\leq\hat{\lambda},~{}h_{1}(i,a)\leq\hat{\mu}\leq\hat{\lambda},\text{ by conditions (I) and (II)})$
	$\displaystyle=-\frac{\hat{\lambda}}{2}(i+3)V(i)+\biggl{\{}-\frac{\hat{\lambda}}{2}(i+3)(i+3)^{2}-\hat{\lambda}(i+3)^{2}(i+2)+\hat{\lambda}(2i+5)$
	$\displaystyle+\hat{\lambda}i(2i+7)+\hat{\lambda}(2i+7)\biggr{\}}$
	$\displaystyle\leq-\frac{\hat{\lambda}}{2}(i+3)V(i)~{}(\text{ since the term within the second bracket is negative})$
	$\displaystyle\leq-(i+3)V(i)~{}(\text{ by condition (I), since }\hat{\lambda}\geq 2)$
	$\displaystyle=-\hat{\ell}(i)V(i)$	(4.7)

where $\hat{\ell}(i)=i+3$ . For $i=0$ ,

	$\displaystyle\sum_{j\in S}q(j\|i,a,b)V(j)$	$\displaystyle=9q(0\|0,a,b)+\sum_{j\geq 1}q(j\|0,a,b)(j+3)^{2}$
		$\displaystyle\leq CI_{\hat{\mathscr{K}}}(0)-\hat{\ell}(0)V(0),$		(4.8)

where $\hat{\mathscr{K}}=\{0\}$ and $C=\frac{\alpha\pi^{2}}{6}$ . Now

$\displaystyle\sup_{(a,b)\in A(i)\times B(i)}q(i,a,b)$	$\displaystyle\leq\sup_{(a,b)\in A(i)\times B(i)}\biggl{\{}\hat{\mu}i+\hat{\lambda}(i+3)^{2}+\|\hat{h}_{1}(i,a)\|+\|\hat{h}_{2}(i,b)\|\biggr{\}}$
	$\displaystyle\leq\hat{\lambda}i+2\hat{\lambda}+\hat{\lambda}(i+3)^{2}$
	$\displaystyle\leq 2(i+3)^{2}\hat{\lambda}\leq b_{2}V(i)~{}\forall~{}i\geq 1,$	(4.9)

where $b_{2}=\max\{2\hat{\lambda},\displaystyle\sum_{j\geq 1}\frac{\alpha}{{(j+3)}^{4}}\}$ . From (4.7) and (4.8), for all $i\in S$ , we get

\displaystyle\sum_{j\in S}q(j|i,a,b)V(j)\leq b_{0}V(i)+b_{1},

(4.10)

where $b_{1}=C$ and $b_{0}=1$ . Now

\displaystyle\hat{\ell}(i)-\max_{(a,b)\in A(i)\times B(i)}c(i,a,b)=3+(1-\hat{p})i+\min_{(a,b)\in A(i)\times B(i)}[\hat{c}_{1}(i,a)-\hat{c}_{2}(i,b)].

(4.11)

From (4.9) and (4.10), Assumption 2.1 is verified. By the condition (II), equations (4.7), (4.8), (4.11), it is easy to see that Assumption 2.2 is verified. By (4.1), (4.6), the condition (II), and the definition of $q$ as defined above, Assumption 2.3 (i) is verified. By (4.7) and (4.8), Assumption 2.3 (ii) is verified. Hence by Theorem 3.2, it follows that there exists an optimal pair of stationary strategies for this controlled Birth-Death process. ∎

References

[1] A. ARAPOSTATHIS, A counterexample to a nonlinear version of the Kreın-Rutman theorem by R. Mahadevan, Nonlinear Anal., 171 (2018), pp. 170–176.
[2] W. J. ANDERSON, Continuous-time Markov chains, Springer Series in Statistics: Probab. and its Appl., Springer-Verlag, New York, (1991), An applications-oriented approach.
[3] A. BASU AND M. K. GHOSH, Zero-sum risk-sensitive stochastic games on a countable state space, Stoch, processes and their appli., 124 (2014), pp. 961-983.
[4] A. BASU AND M. K. GHOSH, Nonzero-sum risk-sensitive stochastic games on a countable state space, Math. of Oper. Res., 43 (2018), pp. 516-532.
[5] N. BAUERLE AND U. RIEDER, More risk-sensitive Markov decision processes, Math. Oper. Res., 39 (2014), pp. 105-120.
[6] N. BAUERLE AND U. RIEDER, Zero-sum risk-sensitive stochastic games, Stoch. processes and their appl., 127 (2017), pp. 622-642.
[7] A. BISWAS AND S. PRADHAN, Ergodic risk-sensitive control of Markov processes on countable state space revisited, ArXiv e-prints 2104.04825 (2021). Available at https://arxiv.org/abs/2104.04825.
[8] K. FAN, Minimax Theorems, Proc. Natl. Acad. Sci., USA, 39 (1953), pp. 42-47.
[9] M. K. GHOSH AND S. SAHA, Risk-sensitive control of continuous-time Markov chains, Stochastic, 86 (2014), pp. 655-675.
[10] M. K. GHOSH, K. S. KUMAR, AND C. PAL, Zero-sum risk-sensitive stochastic games for continuous-time Markov chains, Stoch. Anal. Appl., 34 (2016), pp. 835-851.
[11] S. GOLUI AND C. PAL, Continuous-time Zero-Sum Games for Markov chains with risk-sensitive finite-horizon cost criterion, Stoch. Anal. and Appl., (2021). Available at: https://doi.org/10.1080/07362994.2021.1889381.
[12] S. GOLUI AND C. PAL, Continuous-time zero-sum games for Markov decision processes with discounted risk-sensitive cost criterion, Dyn. Games and Appl., (2021). Available at: https://doi.org/10.1007/s13235-021-00391-2.
[13] X. P. GUO AND O. HERNANDEZ-LERMA, Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, J. Appl. Probab., 40 (2003), pp. 327-345.
[14] X.P. GUO AND YOUNGHUI HUANG, Risk-sensitive average continuous-time Markov decision processes with unbounded transition and cost rates, J. Appl. Probab., 58 (2021), pp. 523-550.
[15] X. P. GUO AND O. HERNANDEZ-LERMA, Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs, Adv. in Appl. Probab., 39 (2007), pp. 645-668.
[16] X. P. GUO AND O. HERNANDEZ-LERMA, Continuous-Time Markov decision processes, Stoch. Model. and Appl. Probab., Springer-Verlag, Berlin, 62 (2009). Theorey and applications.
[17] X. GUO AND X. SONG, Discounted continuous-time constrained Markov decision processes in polish spaces, Ann. Appl. Probab., 21 (2011), pp. 2016-2049.
[18] X. P. GUO AND Z. W. LIAO, Risk-sensitive discounted continuous-time Markov decision processes with unbounded rates, SIAM J. Control Optim., 57 (2019), pp. 3857-3883.
[19] X. P. GUO AND A. PIUNOVSKIY, Discounted continuous-time Markov decision processes with constraints: Unbounded transition and loss rates, Math. Oper. Res., 36 (2011), pp. 105-132.
[20] O. HERNANDEZ-LERMA AND J. LASSERRE, Further topics on discrete-time Markov control processes, Springer, New York, (1999).
[21] O. HERNANDEZ-LERMA AND J. B. LASSERRE Zero-sum stochastic games in Borel spaces: average payoff criterion, SIAM J. Control Optimization, 39 (2001), pp. 1520-1539.
[22] M. Y. KITAEV, Semi-Markov and jump Markov controlled models: Average cost criterion, SIAM Theory Probab. Appl., 30 (1995), pp. 272-288.
[23] M. Y. KITAEV AND V.V. RYKOV, Controlled Queueing Systems, CRC Press, Boca Raton, (1995).
[24] M. B. KLOMPSTRA, Nash equilibria in risk-sensitive dynamic games, IEEE Trans. Automat. Control. 45 (2007), pp. 1397-1401.
[25] K.S. KUMAR AND C. PAL, Risk-sensitive control of jump process on denumerable state space with near monotone cost, Appl. Math. Optim., 68 (2013), pp. 311-331.
[26] K.S. KUMAR AND C. PAL, Risk-sensitive ergodic control of continuous-time Markov processes with denumerable state space, Stoch. Anal. Appl., 33 (2015), pp. 863-881.
[27] A. S. NOWAK, Measurable selection theorems for minimax stochastic optimization problems, SIAM J. Control Optim., 23 (1985), pp. 466-476.
[28] C. PAL AND S. PRADHAN, Risk-sensitive control of pure jump processes on a general state space, Stochastics, 91 (2019), pp. 155-174. Available at https://doi.org/10.1080/17442508.2018.1521413.
[29] C. PAL AND S. PRADHAN, Zero-Sum Games for Pure Jump Processes with Risk-Sensitive Discounted Cost Criteria, J. Dyn. Games, (2021). Available at http://dx.doi.org/10.3934/jdg.2021020.
[30] A. PIUNOVSKIY AND Y. ZHANG, Discounted continuous-time Markov decision processes with unbounded rates: The convex analytic approach, SIAM J. Control Optim., 49 (2011), pp. 2032-2061.
[31] A. PIUNOVSKIY AND Y. ZHANG, Continuous-time Markov decision processes, Springer, (2020).
[32] M. L. PUTERMAN, Markov decision processes: Disc. stoch. dyn. program., Wiley, New York, (1994).
[33] L. S. SHAPLEY, Stochastic games, Proc. Nat. Acad. Sci., USA 39 (1953), pp. 1095-1100.
[34] Q. D. WEI AND X. CHEN, Stochastic games for continuous-time jump processes under finite-horizon payoff criterion, Appl. Math. Optim., 74 (2016), pp. 273–301.
[35] Q. D. WEI AND X. CHEN, Nonzero-sum games for continuous-time jupm processes under the expected Average payoff criterion, Appl. Math. Optim., 83 (2019), pp. 915-938.
[36] Q. D. WEI AND X. CHEN, Nonzero-sum Risk-Sensitive Average Stochastic Gamea: The Case of Unbounded Costs, Dyn. games and Appli., (2021). Available at https://doi.org/10.1007/s13235-021-00380-5.
[37] Q. WEI, Zero-sum games for continuous-time Markov jump processes with risk-sensitive finite-horizon cost criterion, Oper. Res. Lett., 46 (2018), pp. 69-75.
[38] W. Z. ZHANG AND X. P. GUO, Nonzero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, Sci. China Math., 55 (2012), pp. 2405–2416.
[39] W. Z. ZHANG, Average optimality for continuous-time Markov decision processes under weak continuty conditions, J. of Appl. probab., 51 (2014), pp. 954-970.

	$\displaystyle 0=$	$\displaystyle\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{1}(j)q(j\|i,\mu,\nu)+\biggl{(}q(i\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{1}(i)\biggr{]}$
		$\displaystyle-\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j\|i,\mu,\nu)+\biggl{(}q(i\|i,\mu,\nu)+\tilde{c}(i,\mu,\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{1}(j)q(j\|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i\|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}\hat{T}\tilde{\phi}_{1}(i)\biggr{]}$
		$\displaystyle-\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}\tilde{\phi}_{2}(j)q(j\|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i\|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}\hat{T}\tilde{\phi}_{2}(i)\biggr{]}$
	$\displaystyle\geq$	$\displaystyle\inf_{\nu\in P(B(i))}\biggl{[}\sum_{i\neq j\in\hat{\mathscr{D}}_{n}}(\tilde{\phi}_{1}(j)-\tilde{\phi}_{2}(j))q(j\|i,\tilde{\pi}^{1}(i),\nu)+\biggl{(}q(i\|i,\tilde{\pi}^{1}(i),\nu)+\tilde{c}(i,\tilde{\pi}^{1}(i),\nu)\biggr{)}(\hat{T}\tilde{\phi}_{1}(i)-\hat{T}\tilde{\phi}_{2}(i))\biggr{]}\,.$

	$\displaystyle 0>\rho_{n}\psi_{n}(i_{0})$	$\displaystyle=\sup_{\mu\in P(A(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i_{0},\mu,\pi^{2}_{n}(i_{0}))+c(i_{0},\mu,\pi_{n}^{2}(i_{0}))\psi_{n}(i_{0})\bigg{]}$
		$\displaystyle\geq\sup_{\mu\in P(A(i_{0}))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i_{0},\mu,\pi^{*2}_{n}(i_{0}))\bigg{]}$
		$\displaystyle\geq\sum_{j\in S}\psi_{n}(j)q(j\|i_{0},\mu,\pi_{n}^{*2}(i_{0})).$

	$\displaystyle\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\pi^{1}_{n}(i),\nu)+c(i,\pi^{1}_{n}(i),\nu)\psi_{n}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}$
	$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi_{n}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi_{n}(j)q(j\|i,\mu,\pi^{2}_{n}(i))+c(i,\mu,\pi^{2}_{n}(i))\psi_{n}(i)\bigg{]}.$		(2.32)

	$\displaystyle\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\pi^{1}(i),\nu)+c(i,\pi^{1}(i),\nu)\psi^{}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\inf_{\nu\in P(B(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\bigg{]}$
	$\displaystyle=\inf_{\nu\in P(B(i))}\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\nu)+c(i,\mu,\nu)\psi^{}(i)\bigg{]}$
	$\displaystyle=\sup_{\mu\in P(A(i))}\bigg{[}\sum_{j\in S}\psi^{}(j)q(j\|i,\mu,\pi^{2}(i))+c(i,\mu,\pi^{2}(i))\psi^{}(i)\bigg{]},~{}i\in S,$		(3.1)