Zero-Sum Games for Continuous-Time Markov Decision Processes with Risk-Sensitive Average Cost Criterion
Abstract.
We consider zero-sum stochastic games for continuous-time Markov decision processes with the risk-sensitive average cost criterion. The transition and cost rates may be unbounded. We prove the existence of the value of the game and of a saddle-point equilibrium in the class of all stationary strategies under a Lyapunov stability condition. This is accomplished by establishing the existence of a principal eigenpair for the corresponding Hamilton-Jacobi-Isaacs (HJI) equation, which in turn is established using a nonlinear version of the Krein-Rutman theorem. We then obtain a characterization of the saddle-point equilibrium in terms of the corresponding HJI equation. Finally, we use a controlled population system to illustrate our results.
Keywords: Zero-sum game; risk-sensitive average cost criterion; history dependent strategy; HJI equation; saddle point equilibrium.
1. INTRODUCTION
Markov decision processes (MDPs) are widely used for modeling control problems that arise naturally in many real-life situations, for example in queueing, epidemiology, and birth-death models; see [5], [16], [31], [32]. When there is more than one controller (or player), the stochastic control problem is referred to as a stochastic game. Stochastic dynamic games were first introduced in [33] and have been studied extensively in the literature due to their wide applicability; see [3], [6], [10], [11], [15], [34], [37], [38] and the references therein. In this article we consider the risk-sensitive ergodic zero-sum game for continuous-time Markov decision processes (CTMDPs). In a zero-sum game, one player tries to minimize her/his cost while the other player tries to maximize the same. In the literature, the expected average cost criterion is a commonly used optimality criterion in the theory of CTMDPs and has been widely studied under different sets of optimality conditions; for control problems see [16], [39] and the references therein; for game problems see [13], [20], [35] and the references therein. In these papers the decision-makers are risk-neutral. However, risk preferences may vary from person to person in real-world applications. One approach available in the literature to address this concern is the risk-sensitive criterion, in which one investigates the expectation of an exponential of the random cost; this takes into account the attitude of the controller with respect to risk. The performance of a pair of strategies is measured by the risk-sensitive average cost criterion, which in our present case is defined by (2.4) below. The analysis of risk-sensitive control is technically more involved because of the exponential nature of the cost. Risk-sensitive average cost stochastic optimal control problems for CTMDPs were first considered in [9] and have been studied extensively in the literature due to their applications in finance and large deviation theory. Recently, there has been extensive work on risk-sensitive average cost problems for CTMDPs; see, for example, [7], [14], [25], [26], [28] and the references therein. Risk-sensitive zero-sum stochastic games for MDPs have been studied in [3], [6], [10], [11], [37], while [4], [24], [36] consider nonzero-sum games for MDPs. In [3] and [6], the authors study zero-sum risk-sensitive stochastic games for discrete-time MDPs with bounded cost; both papers first consider the discounted cost and then the ergodic cost. In [6], the authors extended the results of [3] to the general state space case. Zero-sum risk-sensitive average games were studied in [10], and discounted risk-sensitive zero-sum games were studied in [29], for CTMDPs with bounded cost and transition rates. But this boundedness requirement restricts the domain of application, since in many real-life situations, for example in queueing, telecommunication and population processes, the reward/cost and transition rates are unbounded. In [11] and [37], the authors study finite-horizon zero-sum risk-sensitive continuous-time stochastic games; [11] allows unbounded cost and transition rates, while [37] considers unbounded transition rates but bounded costs. The discounted risk-sensitive zero-sum game for CTMDPs was studied in [12] with unbounded cost and transition rates.
Here we study zero-sum ergodic risk-sensitive stochastic games for CTMDPs with the following features: (a) the transition and cost rates may be unbounded; (b) the state space is countable; (c) at any state of the system the space of admissible actions is compact; (d) the strategies may be history-dependent. To the best of our knowledge, this is the first work dealing with infinite-horizon continuous-time zero-sum risk-sensitive stochastic games under the ergodic criterion on a countable state space with unbounded transition and cost rates. Under a Lyapunov stability condition, we prove the existence of a saddle-point equilibrium in the class of stationary strategies. Using the Krein-Rutman theorem, we first prove that the corresponding HJI equation has a unique solution for any finite subset of the state space. Then, using the Lyapunov stability condition, we establish the existence of a unique solution of the corresponding HJI equation on the whole state space. We also give a complete characterization of saddle-point equilibria in terms of the corresponding HJI equation.
The rest of this article is arranged as follows. Section 2 gives the description of the problem and assumptions. We also show in this section that the required risk-sensitive optimality equation (HJI equation) has a solution. In Section 3, we completely characterize all possible saddle point equilibria in the class of stationary Markov strategies. In Section 4, we present an illustrative example.
2. The game model
In this section we introduce the continuous-time zero-sum stochastic game model described by the following elements
(2.1) |
where
• S, called the state space, is the set of all nonnegative integers.
• The action sets for players 1 and 2, respectively, are assumed to be Borel spaces equipped with their Borel σ-algebras.
• For each state in S, the sets of admissible actions for players 1 and 2 in that state are given; the set of admissible state-action pairs is a Borel subset of the product space. Throughout this paper, we assume that the admissible action spaces are compact for each state.
• Given any state and pair of admissible actions, the transition rate is a signed kernel on S which is nonnegative off the diagonal. Moreover, we assume that the transition rates satisfy the following conservative and stable conditions: for any state,
(2.2) where
• Finally, the measurable cost rate function represents the cost for player 2 and the payoff for player 1.
The game evolves as follows. The players continuously observe the current state of the system. At each time, the players independently choose actions according to their respective strategies. As a consequence, the following happens:
• player 2 pays an immediate cost (at the prescribed cost rate) to player 1;
• the system remains in the current state for a random time, with the rate of leaving given by the transition rates, and then jumps to a new state with the probability determined by the transition kernel (see the proposition in [[16], p. 205] for details).
When the state of the system transitions to a new state, the above procedure is repeated.
The goal of player 2 is to minimize his/her accumulated cost, whereas player 1 tries to maximize the same with respect to the performance criterion, which in our present case is defined by (2.4) below. Such a model is relevant in worst-case scenarios, e.g., in financial applications where a risk-averse investor tries to maximize his/her long-term portfolio gain against the market, which, by default, acts as the minimizer.
To formalize what is described above, below we describe the construction of continuous-time Markov decision processes (CTMDPs) under admissible history-dependent strategies. To construct the underlying CTMDPs (as in [[19], [22], [30]]), we introduce some notation: let (with some ), , for each , and let be the Borel σ-algebra on . Then we obtain the measurable space . For each , define , , . Using , we define the state process as
(2.3) |
Here, denotes the indicator function of a set , and we use the convention that and for all . After , the process is regarded as absorbed in the state . Thus, let , , , , , for all , where , are isolated points. Moreover, let for all , , and which denotes the σ-algebra of predictable sets on related to .
In order to define the risk-sensitive cost criterion, we need to introduce the definition of a strategy below.
Definition 2.1.
An admissible history-dependent strategy for player 1, denoted by , is determined by a sequence of stochastic kernels on such that
where is a stochastic kernel on given such that , are stochastic kernels on given such that , and denotes the Dirac measure at the point .
The set of all admissible history-dependent strategies for player 1 is denoted by . A strategy for player 1 is called Markov if for every and , where . A Markov strategy is called a stationary Markov strategy if it does not have explicit dependence on time. We denote by and the family of all Markov strategies and stationary Markov strategies, respectively, for player 1. The sets of all admissible history-dependent strategies, all Markov strategies, and all stationary strategies for player 2 are defined similarly.
For any compact metric space , let denote the space of probability measures on with the Prohorov topology. Since for each , and are compact sets, and are compact metric spaces. For each , and , the associated cost and transition rates are defined, respectively, as follows:
Note that can be identified with a map such that for each . Thus, we have and . Therefore, by Tychonoff's theorem, the sets and are compact metric spaces. Also, note that under Assumption 2.1 (given below), for any initial state and any pair of strategies , Theorem 4.27 in [23] yields the existence of a unique probability measure, denoted by , on . Let be the expectation operator with respect to . Also, from [[16], pp. 13-15], we know that is a Markov process (in fact, strong Markov) under any . Now we give the definition of the risk-sensitive average cost criterion for zero-sum continuous-time games. Since the risk-sensitivity parameter remains fixed throughout, we assume without loss of generality that the risk-sensitivity coefficient equals one. For each and any , the risk-sensitive ergodic cost criterion is given by
(2.4) |
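For orientation, the criterion in (2.4) takes the form that is standard in this literature; in the notation sketched here (the symbols $\xi_t$ for the state process, $c$ for the cost rate and $\mathbb{E}^{\pi^1,\pi^2}_i$ for the expectation under the pair of strategies are our illustrative labels and should be matched with the paper's own notation), the expression is of the type
\[
J(i,\pi^1,\pi^2)\;=\;\limsup_{T\to\infty}\frac{1}{T}\,\log \mathbb{E}^{\pi^1,\pi^2}_i\!\left[\exp\!\left(\int_0^T c\bigl(\xi_t,\pi^1_t,\pi^2_t\bigr)\,dt\right)\right],\qquad i\in S,
\]
with the risk-sensitivity coefficient normalized to one, as discussed above.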
Player 1 tries to maximize the above over his/her admissible strategies, whereas player 2 tries to minimize the same. Now we define the lower/upper value of the game. The functions on S defined by and are called, respectively, the lower value and the upper value of the game. It is easy to see that the lower value never exceeds the upper value.
Definition 2.2.
If for all , then the common function is called the value of the game and is denoted by .
Definition 2.3.
Suppose that the game admits a value . Then a strategy in is said to be optimal for player 1 if
Similarly, is optimal for player 2 if
If is optimal for player k (k=1,2), then the pair is called a pair of optimal strategies, also called a saddle-point equilibrium.
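In the notation of the criterion sketched above (again using $J$ for the criterion and $\Pi^1,\Pi^2$ as illustrative labels for the classes of admissible strategies of the two players), the saddle-point property amounts to the familiar pair of inequalities
\[
J(i,\pi^1,\pi^{2*})\;\le\;J(i,\pi^{1*},\pi^{2*})\;\le\;J(i,\pi^{1*},\pi^2)
\qquad\text{for all } i\in S,\ \pi^1\in\Pi^1,\ \pi^2\in\Pi^2,
\]
so that $J(i,\pi^{1*},\pi^{2*})$ coincides with the value of the game.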
Next we list some commonly used notation.
• For any finite set , we define .
• denotes the cone of all nonnegative functions vanishing outside .
• Given any real-valued function on , we define a Banach space of -weighted functions by .
• For any function , .
• For any finite set , .
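As a concrete instance of the weighted-norm notation above (with $\mathcal V$ a positive function on $S$, used here as an illustrative symbol), the Banach space of $\mathcal V$-weighted functions is typically
\[
\|f\|_{\mathcal V}\;:=\;\sup_{i\in S}\frac{|f(i)|}{\mathcal V(i)},\qquad
L^{\infty}_{\mathcal V}(S)\;:=\;\bigl\{f:S\to\mathbb R \;:\; \|f\|_{\mathcal V}<\infty\bigr\},
\]
and, for a finite set $D\subset S$, $f\,\mathbf 1_{D}$ denotes the function that agrees with $f$ on $D$ and vanishes outside it (an illustrative reading of the last two items).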
Our main goal is to establish the existence of a saddle-point equilibrium in the class of admissible history-dependent strategies. To this end, following [3] and [7], we investigate the HJI equation given by
(2.5) |
Here is a scalar and is an appropriate function. The above is clearly an eigenvalue problem for a nonlinear operator on an appropriate space. Using a nonlinear version of the Krein-Rutman theorem, we first show that the Dirichlet eigenvalue problem associated with the above equation admits a solution in the space of bounded functions. Then, by a suitable limiting argument, we show that the above HJI equation admits a principal eigenpair in an appropriate space. Finally, exploiting the HJI equation, we completely characterize all possible saddle-point equilibria in the space of stationary Markov strategies. This is a brief outline of our procedure for establishing a saddle-point equilibrium. The details now follow.
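In the multiplicative form commonly used for risk-sensitive ergodic problems, and which we take (2.5) to resemble (the symbols $\rho$ for the eigenvalue, $\psi>0$ for the eigenfunction and $\mathbb P(A(i)),\mathbb P(B(i))$ for the relaxed action sets are our illustrative labels), the HJI equation reads
\[
\rho\,\psi(i)\;=\;\sup_{\mu\in\mathbb P(A(i))}\ \inf_{\nu\in\mathbb P(B(i))}
\left[\sum_{j\in S}\psi(j)\,q(j\,|\,i,\mu,\nu)\;+\;c(i,\mu,\nu)\,\psi(i)\right],
\qquad i\in S,
\]
so that $(\rho,\psi)$ is a principal eigenpair of the associated sup-inf (Isaacs) operator.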
Since the transition rates (i.e., ) may be unbounded, to avoid explosion of the state process , the following assumption is imposed on the transition rates; it has been widely used for CTMDPs, see, for instance, [[17], [18], [19]] and the references therein.
Assumption 2.1.
There exist a real-valued function on , constants and , and such that:
(i) for all ;
(ii) for all , where is as in (2.2).
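For the reader's convenience, non-explosion conditions of this type are usually written as follows (a hedged sketch; $W\ge 1$ denotes the growth function of Assumption 2.1 and $q^*(i)$ the total exit rate, both introduced here as illustrative symbols):
\[
\text{(i)}\quad \sum_{j\in S} W(j)\,q(j\,|\,i,a,b)\;\le\; c_1\,W(i)+c_2,
\qquad
\text{(ii)}\quad q^*(i)\;\le\; L\,W(i),
\]
for all states $i$ and admissible actions $(a,b)$, with constants $c_1\in\mathbb R$, $c_2\ge 0$ and $L>0$.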
Throughout the rest of this article we assume that Assumption 2.1 holds. Note that if the transition rates are bounded, then Assumption 2.1 holds trivially; in this case the function there can be chosen to be a suitable constant.
Since we allow the transition and cost rates to be unbounded, to guarantee the finiteness of , we need the following assumption.
Assumption 2.2.
We assume that the CTMDP is irreducible under every pair of stationary Markov strategies . Assume that the cost function is bounded below; thus, without loss of generality, we assume that it is nonnegative. Furthermore, suppose there exist a constant , a finite set and a Lyapunov function such that one of the following holds.
(a) When the running cost is bounded: for some positive constant , we have the following blanket stability condition
(2.6)
(b) When the running cost is unbounded: for some norm-like function , the function is norm-like and we have the following blanket stability condition
(2.7)
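A hedged sketch of the two blanket stability conditions, in the notation of the preceding illustrations ($\mathcal V\ge 1$ the Lyapunov function, $K$ the finite set, $C>0$ the constant, and $\ell$ the norm-like function, all used here as illustrative symbols), is
\[
\text{(2.6):}\quad \sum_{j\in S}\mathcal V(j)\,q(j\,|\,i,a,b)\;\le\;C\,\mathbf 1_{K}(i)\;-\;\gamma\,\mathcal V(i),
\qquad
\text{(2.7):}\quad \sum_{j\in S}\mathcal V(j)\,q(j\,|\,i,a,b)\;\le\;C\,\mathbf 1_{K}(i)\;-\;\ell(i)\,\mathcal V(i),
\]
for all $i\in S$ and admissible $(a,b)$, where in case (a) the constant $\gamma$ dominates the (bounded) running cost, and in case (b) the function $\ell-c$ is norm-like.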
We wish to establish the existence of a saddle-point equilibrium in the class of all stationary strategies. In view of this we also need the following assumptions. Let be a fixed point (a reference state).
Assumption 2.3.
(i) For any fixed , the functions and are continuous in .
(ii) The sum is continuous in for any given , where is as in Assumption 2.2.
(iii) There exists such that any state can be reached from , i.e., for all and .
We first construct an increasing sequence of finite subsets such that and for all . Define , the first exit time from .
Proposition 2.1.
Suppose Assumption 2.3 holds. Let be a function continuous in for each fixed . Suppose the cost function satisfies the relation in for some and . Then for any there exists a unique satisfying the following nonlinear equation
(2.8) |
with for all . Moreover the unique solution of the above equation satisfies
(2.9) |
where as before .
Proof.
Let be a sequence in . Fix . Let be defined by
(2.10) |
Suppose . Let . Then there exists for which the following holds
Since is arbitrary we get . Also, we see that and . Since is continuous in , for every , there exists a unique satisfying . Now using the definition of , for fixed , we can define a map satisfying
(2.11) |
Let . Also, let be an outer maximizing selector of
Assumption 2.3 ensures the existence of such a selector. It then follows that
Now let the infimum of the RHS (of the above) be attained at . Then
Hence, we deduce that
Now in the above calculation, interchanging , it follows that
where is a positive constant less than . This implies that is a contraction map. Thus, by the Banach fixed point theorem, there exists a unique such that . Now, by Fan's minimax theorem (see [[8], Theorem 3]), we have
This proves that (2.8) admits a unique solution. Now, using Dynkin's formula as in [[16], Appendix C.3], for any and , we get
(2.12) |
Using the compactness of and the continuity of , there exists a pair of selectors (i.e., a mini-max selector) satisfying
(2.13) |
Then, using (2.12) and (2.13), we obtain
Using the dominated convergence theorem, taking in the above equation, we get
Hence
Since is arbitrary,
(2.14) |
Similarly, using (2.12), (2.13), and Fatou’s Lemma, we get
(2.15) |
Using (2.14) and (2.15), we obtain
This completes the proof. ∎
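Proposition 2.1 rests on a contraction argument, so its solution can in principle be computed by the fixed-point iteration used in the proof. The following sketch is only a numerical illustration of such a Banach fixed-point step, not the paper's algorithm: all model ingredients (the finite truncation of the state space, the action grids, the rates and costs, and the parameter alpha) are hypothetical stand-ins, and the operator iterated here is a generic uniformised sup-inf operator of Dirichlet type.

```python
import numpy as np

# All ingredients below are hypothetical stand-ins: a finite truncation {0,...,N-1} of the
# state space, small action grids for the two players, and bounded rates/costs on it.
N = 15
A = np.linspace(0.0, 1.0, 5)     # actions of player 1 (maximiser)
B = np.linspace(0.0, 1.0, 5)     # actions of player 2 (minimiser)
alpha = 0.5                      # positive parameter making the operator a contraction

def rates(i, a, b):
    """Illustrative birth/death rates and cost rate at state i under actions (a, b)."""
    birth = 1.0 + a
    death = (1.0 + b) * i
    cost = 0.2 * i + 0.1 * a - 0.1 * b
    return birth, death, cost

def T(phi):
    """One sweep of a uniformised sup-inf (Bellman-Isaacs type) operator on the truncation,
    with zero boundary data outside {0,...,N-1} (a Dirichlet-type condition)."""
    new = np.zeros(N)
    for i in range(N):
        best_a = -np.inf
        for a in A:
            worst_b = np.inf
            for b in B:
                birth, death, cost = rates(i, a, b)
                exit_rate = birth + death
                up = phi[i + 1] if i + 1 < N else 0.0    # mass leaving the truncation is killed
                down = phi[i - 1] if i >= 1 else 0.0
                worst_b = min(worst_b, (cost + birth * up + death * down) / (alpha + exit_rate))
            best_a = max(best_a, worst_b)
        new[i] = best_a
    return new

phi = np.zeros(N)
for it in range(2000):                   # Banach fixed-point iteration
    phi_new = T(phi)
    if np.max(np.abs(phi_new - phi)) < 1e-10:
        break
    phi = phi_new
print("iterations:", it + 1, "phi[:5]:", np.round(phi, 4))
```

Since the denominator contains the positive parameter alpha plus the exit rate, each sweep is a contraction on the finite truncation, which is the same structural fact the proof of Proposition 2.1 exploits.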
We now recall a version of the nonlinear Krein-Rutman theorem from [[1], Section 3.1]. Let be an ordered Banach space. In what follows denotes a partial ordering in with respect to a positive cone (), that is . Also, recall that if a map is continuous and compact, it is called completely continuous.
Theorem 2.1.
Let be as above and a nonempty closed cone that satisfies . Let be an order-preserving, completely continuous, 1-homogeneous map with the property that if for some nonzero and , we have . Then there exist a nontrivial and satisfying .
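In standard symbols (a hedged paraphrase; $X$ is the ordered Banach space, $\mathcal K$ the closed cone and $T$ the map of the theorem), the hypotheses and conclusion read
\[
\mathcal K\cap(-\mathcal K)=\{0\},\qquad T:\mathcal K\to\mathcal K\ \text{order-preserving, completely continuous, 1-homogeneous},
\]
\[
\exists\,u\in\mathcal K\setminus\{0\},\ M>0\ \text{with}\ M\,Tu\succeq u
\ \ \Longrightarrow\ \
\exists\, f\in\mathcal K\setminus\{0\},\ \lambda>0\ \text{with}\ Tf=\lambda f .
\]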
Lemma 2.1.
Suppose Assumption 2.2 holds. Consider a finite subset of such that . Let . Then for any pair of strategies , the following results hold.
Proof.
It is easy to see that the proof of (i) is analogous to the proof of (ii) when we replace with . So, we prove only part (ii). Suppose Assumption 2.2 (b) holds. Let be large enough so that . Applying Dynkin's formula [[16], Appendix C.3], for we have
where . Now by Fatou’s lemma, taking first and then , we get the required result. ∎
Lemma 2.2.
Suppose Assumptions 2.1, 2.2, and 2.3 hold. Then for , there exists a pair satisfying the following Dirichlet nonlinear eigenvalue equation
(2.18) |
Also, for each such that , we have
(2.19) |
Additionally the sequence is bounded satisfying .
Proof.
Let . Set . Let be an operator defined as
(2.20) |
with . Let such that , i.e., for each . Also, let and . Then there exists such that
Also, from the proof of Proposition 2.1, we have
Thus, we deduce that
This gives us . Clearly for all . Since , there exists a constant such that
Thus is continuous. Let be a bounded sequence in . Then from (2.20), for some constant such that . Now applying a diagonalization argument, there exist a subsequence of (which we denote by the same symbol without loss of generality) and a function such that as . Hence the map is compact, and therefore completely continuous. Let such that and for all . Then by (2.20), we have
where is the first jump time (clearly, ). Thus where . Therefore by Theorem 2.1, there exists a nontrivial where and a constant such that , i.e.,
where . Therefore in terms of , we have
where . Now by Fan’s minimax theorem, see [[8], Theorem 3], we have
This proves that (2.18) admits a unique solution. As before, by the continuity of and the compactness of , there exists such that (2.18) can be written as
(2.21) |
Now applying Dynkin’s formula (see [[7], Lemma 3.1]) and using (2.21), we get
(2.22) |
If then by taking logarithms on both sides of (2.22), dividing by and letting , we get
Since is arbitrary, we obtain
We now show that is finite for every and . We provide a proof only under Assumption 2.2 (b); the proof under Assumption 2.2 (a) is analogous. Now from (2.7) we get
(2.23) |
Then by Dynkin's formula, we get
(2.24) |
By Fatou’s lemma, taking in (2.24), we get
Now, since , taking logarithms on both sides of the above equation, dividing both sides by and letting , we obtain
Since is norm-like, we have for some constant . Hence we get
(2.25) |
It is clear from (2.19) and (2.25) that has an upper bound. Next we prove that is bounded below. By using Assumption 2.3 (iii) and (2.18), we have . Thus, normalizing , we have . Also, since , by (2.18) we get
So, is bounded below. Now we claim that . If not, then on the contrary, . So, along some subsequence (with an abuse of notation, we use the same sequence), we have , as , and for large , . Let be an outer minimizing selector of (2.18). Thus, using (2.18), for large enough , we have
Now by Assumption 2.3 (iii), from the above equation, we get
So, by a diagonalization argument, there exist a subsequence (denoted by the same sequence with an abuse of notation) and a function with such that , as , for all . By our assumption, is compact for each , and is an outer minimizing selector of (2.18). Hence we have , for all , as . Therefore we have
(2.26) |
So, taking in the above equation, we obtain
(2.27) |
Let . Applying Dynkin's formula and using (2.27), we obtain
Now, using the dominated convergence theorem and taking , we get . So, with respect to the canonical filtration of , is a supermartingale. Hence, by Doob's martingale convergence theorem, converges as . Now by Assumption 2.2, is recurrent. Thus the skeleton process is also recurrent (see [[2], Proposition 5.1.1] for details). This implies that the process visits every state of infinitely often. But this is possible only if . Since , this contradicts (2.27). Thus, . ∎
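When the strategies in (2.18) are frozen at a fixed stationary pair, the Dirichlet eigenproblem reduces to a principal eigenvalue problem for the truncated generator plus the (multiplicative) cost. The short sketch below computes such a principal eigenpair numerically for a toy birth-death truncation; all numerical ingredients are hypothetical, and the computation is only an illustration of the kind of object whose existence Lemma 2.2 establishes.

```python
import numpy as np

def principal_eigenpair(Q, c):
    """Principal (largest-real-part) eigenpair of Q + diag(c) on a finite truncation.

    Q : (n, n) array; the generator restricted to the finite set D_n
        (off-diagonal entries >= 0; rows need not sum to zero because mass
        leaving D_n is discarded, i.e. the Dirichlet truncation).
    c : (n,) array; the running cost evaluated under the frozen strategies.
    """
    A = Q + np.diag(c)
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)              # principal eigenvalue (Perron-type)
    lam = vals[k].real
    psi = np.abs(vecs[:, k].real)         # eigenvector, taken componentwise nonnegative
    return lam, psi / psi[0]              # normalise at the reference state 0

# Toy birth-death sub-generator on {0,...,9} with frozen strategies (hypothetical numbers).
n = 10
birth = np.full(n, 1.0)
death = 2.0 * np.arange(n)
Q = np.zeros((n, n))
for i in range(n):
    if i + 1 < n:
        Q[i, i + 1] = birth[i]
    if i - 1 >= 0:
        Q[i, i - 1] = death[i]
    Q[i, i] = -(birth[i] + death[i])      # full exit rate, including mass leaving D_n
cost = 0.1 * np.arange(n)

lam, psi = principal_eigenpair(Q, cost)
print("principal eigenvalue:", lam, "eigenfunction head:", np.round(psi[:5], 4))
```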
Lemma 2.3.
Proof.
Using Assumption 2.2 and the fact , there exists a finite set containing such that the following hold.
Now we scale in such a way that it touches from below. Define
Then we see that is finite as vanishes in and . We claim that if we replace by , then touches inside . If not, then for some state , and in . Let be an outer maximizing selector of (2.18). Then by Dynkin formula, we get (under Assumption 2.2 (b))
Since , in view of Lemma 2.1, by the dominated convergence theorem, taking , we get
Using this and (2.17), we have
Hence we arrive at a contradiction. Thus touches inside . A similar conclusion holds under Assumption 2.2 (a). Now, since for all large n, by a diagonalization argument, there exists a subsequence (by an abuse of notation, we use the same sequence) such that, for all , as , and . Also, since by Lemma 2.2 the sequence is bounded and , we can find a subsequence (by an abuse of notation, we use the same sequence) and some such that as . Thus, as before, there exists a mini-max selector of (2.18), i.e.,
(2.32) |
Hence,
The above implies
(2.33) |
Now, since for all , we have
(2.34) |
Also, since and are compact, there exist and such that and as . Under the given assumptions, from [[13], Lemma 7.2] it is clear that the functions , and are continuous at on for each fixed , . Therefore, by the dominated convergence theorem, letting in (2.33), we obtain
Hence we have
(2.35) |
By similar arguments using (2.32) and extended Fatou’s lemma [[20], Lemma 8.3.7], we get
(2.36) |
Hence by (2.35) and (2.36), we get (2.28). Since at some point in we have , for all large . It follows that for some . Since , it is clear that is nontrivial. Now we claim that . If not, then we must have for some . Again, as before, there exists a mini-max selector pair such that, from (2.28), we have
(2.37) |
This implies
Since the Markov chain is irreducible under , from the above equation, it follows that . So, we arrive at a contradiction. This proves the claim. Now we prove (i) and (ii).
(i) Since and as , we have for all large enough . So, using (2.19), we have for all .
(ii) By the measurable selection theorem in [[27], Theorem 2.2], there exists a pair of strategies (a mini-max selector) (as in (2.32)) satisfying
(2.38) |
Using (2.38), Lemma 2.1, and Dynkin’s formula, we have
By Fatou’s lemma taking , we get
(2.39) |
Hence,
(2.40) |
Also, using (2.38), Lemma 2.1, and Dynkin’s formula, we obtain
Since , using the estimates as in Lemma 2.1 and taking , by the dominated convergence theorem it follows that
(2.41) |
Hence
(2.42) |
3. Existence of risk-sensitive average optimal strategies
In this section we prove that any mini-max selector of the associated HJI equation is a saddle-point equilibrium. Also, exploiting the stochastic representation (2.29), we completely characterize all possible saddle-point equilibria in the space of stationary Markov strategies.
Theorem 3.1.
Proof.
We perturb the cost function as follows.
• If Assumption 2.2 (a) holds: We define for , , . Here is a small number satisfying . Note that .
• If Assumption 2.2 (b) holds: We define for , , . Note that the function is a norm-like function. Also, it is easy to see that for large enough , is norm-like.
In view of Lemma 2.3, it is clear that for , there exists , satisfying
(3.3) |
such that
(3.4) |
Also, for some finite set , we have
(3.5) |
Now from the proof of Lemma 2.3, we have a finite set , depending on , containing such that one of the following cases occurs:
• Under Assumption 2.2 (a).
• Under Assumption 2.2 (b): since is a norm-like function, we can choose a suitable finite set such that in for all .
For any , applying Dynkin's formula and using (3.3) and Lemma 2.1, we get
Since for , , by Fatou’s lemma taking , we get
This implies that has a lower bound. Now, applying Dynkin's formula and using (3.3) and Lemma 2.1, we deduce that
for any , where , . By Fatou’s lemma taking , we get
Thus, taking logarithms on both sides, dividing by and letting , we obtain
Since is arbitrary, it follows that
Using this and (3.4), we get for all . From the definition of , it is easy to see that is a decreasing sequence which has a lower bound. Now, by arguments similar to those in Lemma 2.3, it follows that there exists a pair such that and as . As in Lemma 2.3, by taking in (3.3), we get
(3.6) |
Also, we have . Now, we want to show that . Let be a selector in (3.6). Thus
(3.7) |
In view of the estimates in Lemma 2.1, applying Dynkin's formula and the dominated convergence theorem, from (3.7) we deduce that there exists a finite set such that
(3.8) |
Since , arguing as in Lemma 2.3 (see (2.29)), for we have
(3.9) |
Now we choose an appropriate constant (e.g., ), so that in and for some . From (3.8) and (3.9), we get
(3.10) |
From the above expression it is easy to see that in . Now using (3.1), (3.7) and the fact that , we get
This implies that
(3.11) |
Since the Markov chain is irreducible under , by (3.11), we have in . From (3.1) and (3.6) it follows that . This proves (3.2). ∎
In the next theorem we show that any mini-max selector of (2.28) is a saddle-point equilibrium.
Theorem 3.2.
Proof.
Arguing as in Lemma 2.3 and Theorem 3.1, there exists with satisfying
(3.12) |
Furthermore, and for some finite set (which, without loss of generality, we denote by the same notation)
(3.13) |
Thus, from (3.2) it is clear that . Now, following arguments similar to those in Theorem 3.1, it is easy to see that . This implies that for all . Next, from [7] it is clear that if we consider the minimization problem , then an optimal control exists in the space of stationary Markov strategies. Thus, to complete the proof, it is enough to show that for any . If not, suppose that for some . We know that for , there exists with satisfying
(3.14) |
also we have and for some finite set ()
(3.15) |
From (2.29), we deduce that
(3.16) |
Now, as in Theorem 3.1, using (3.15) and (3.16) one can deduce that for some positive constant . In view of (2.28) and (3.14), it follows that , i.e., , which is a contradiction. This completes the proof. ∎
Next we prove the converse of the above theorem.
Theorem 3.3.
Proof.
From Theorem 3.2, we deduce that
This implies that and . Arguing as in Lemma 2.3 and using Theorem 3.1, it follows that for there exists with such that
(3.17) |
and (since ). Let be any mini-max selector of (2.28). Then from the above, we get
(3.18) |
Again arguing as in Lemma 2.3, for some we have
(3.19) |
Since is a mini-max selector of (2.28), we have
Thus, by applying Dynkin’s formula and Fatou’s lemma, we obtain
(3.20) |
Using (3.19) and (3.20), and following the arguments of Theorem 3.1, one can show that for some positive constant . Therefore, combining (2.28) and (3.17), it is easy to see that is an outer maximizing selector of (2.28). By similar arguments we can show that is an outer minimizing selector of (2.28). This completes the proof. ∎
4. Example
In this section we present an illustrative example in which the transition rates are unbounded and the cost rate is nonnegative and unbounded.
Example 4.1.
Consider a controlled birth-death system in which the state variable denotes the total population at each time . In this system there are 'natural' arrival and departure rates, say and , respectively. Here player 1 controls the arrival parameters and player 2 controls the departure parameters . At any time , when the state of the system is , player 1 takes an action from a given set (which is a compact subset of some Polish space ). This action may increase or decrease the arrival rate , and these actions result in a payoff denoted by per unit time. Similarly, if the state is , player 2 takes an action from a set (which is a compact subset of a Polish space ) to increase or decrease the departure rate , and these actions produce a payoff denoted by per unit time. In addition, assume that player 1 'owns' the system and he/she gets a reward for each unit of time during which the system remains in the state , where is a fixed reward fee per customer. We also assume that when the state of the system reaches state , any number of arrivals may occur. When there is no customer in the system (i.e., the state is ), control of the departure rate is unnecessary.
We next formulate this model as a continuous-time Markov game. The corresponding transition rate and reward rate for player 1 are given as follows: for ( as in the game model (2.1)). We take
(4.1) |
where is some constant so that . Also for with ,
(4.5) |
(4.6) |
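To make the example concrete, the following sketch simulates a controlled birth-death chain of this type under a fixed pair of stationary strategies and estimates the finite-horizon proxy $(1/T)\log E[\exp(\int_0^T c\,dt)]$ of the risk-sensitive criterion by Monte Carlo. The specific rate and payoff functions below are hypothetical placeholders, not the expressions in (4.1) and (4.5)-(4.6).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model ingredients; the exact rates of (4.5)-(4.6) are not reproduced here.
def birth_rate(i, a):       # arrival rate, controlled by player 1
    return 1.0 + 0.5 * a

def death_rate(i, b):       # departure rate, controlled by player 2 (no departures from 0)
    return (1.0 + 0.5 * b) * i

def cost_rate(i, a, b):     # payoff rate for player 1 (= cost rate for player 2)
    return 0.1 * i + 0.05 * a - 0.05 * b

# Fixed stationary strategies (hypothetical): actions as functions of the current state.
def strategy1(i):
    return 0.5

def strategy2(i):
    return 1.0

def risk_sensitive_estimate(i0, T, n_paths=2000):
    """Monte Carlo estimate of (1/T) * log E_i0[ exp( integral_0^T c(x_t, a_t, b_t) dt ) ]."""
    integrals = np.empty(n_paths)
    for k in range(n_paths):
        t, i, acc = 0.0, i0, 0.0
        while t < T:
            a, b = strategy1(i), strategy2(i)
            lam, mu = birth_rate(i, a), death_rate(i, b)
            total = lam + mu
            dt = min(rng.exponential(1.0 / total), T - t)
            acc += cost_rate(i, a, b) * dt
            t += dt
            if t < T:                       # a jump occurs before the horizon
                i = i + 1 if rng.random() < lam / total else i - 1
        integrals[k] = acc
    m = integrals.max()                     # stable log-mean-exp
    return (m + np.log(np.mean(np.exp(integrals - m)))) / T

for T in (10.0, 20.0, 40.0):
    print(T, risk_sensitive_estimate(i0=1, T=T))
```

As the horizon grows, the printed values give a rough numerical impression of the ergodic quantity whose existence and characterization are established in Sections 2 and 3.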
We now explore conditions under which there exists a pair of optimal strategies. To do so, we make the following assumptions.
(I) Let , , and for all with ; and assume that and for all .
(II) The functions , , , and are continuous in their respective variables for each fixed . Also, assume that is a norm-like function and for . Here we take .
Proposition 4.1.
Proof.
Consider the Lyapunov function given by
We have for all . Now for each , and , we have
(4.7) |
where . For ,
(4.8) |
where and . Now
(4.9) |
where . From (4.7) and (4.8), for all , we get
(4.10) |
where and . Now
(4.11) |
From (4.9) and (4.10), Assumption 2.1 is verified. By condition (II) and equations (4.7), (4.8), and (4.11), it is easy to see that Assumption 2.2 is verified. By (4.1), (4.6), condition (II), and the definition of given above, Assumption 2.3 (i) is verified. By (4.7) and (4.8), Assumption 2.3 (ii) is verified. Hence, by Theorem 3.2, it follows that there exists an optimal pair of stationary strategies for this controlled birth-death process. ∎
References
- [1] A. ARAPOSTATHIS, A counterexample to a nonlinear version of the Kreın-Rutman theorem by R. Mahadevan, Nonlinear Anal., 171 (2018), pp. 170–176.
- [2] W. J. ANDERSON, Continuous-time Markov chains, Springer Series in Statistics: Probab. and its Appl., Springer-Verlag, New York, (1991), An applications-oriented approach.
- [3] A. BASU AND M. K. GHOSH, Zero-sum risk-sensitive stochastic games on a countable state space, Stoch. Process. Appl., 124 (2014), pp. 961-983.
- [4] A. BASU AND M. K. GHOSH, Nonzero-sum risk-sensitive stochastic games on a countable state space, Math. of Oper. Res., 43 (2018), pp. 516-532.
- [5] N. BAUERLE AND U. RIEDER, More risk-sensitive Markov decision processes, Math. Oper. Res., 39 (2014), pp. 105-120.
- [6] N. BAUERLE AND U. RIEDER, Zero-sum risk-sensitive stochastic games, Stoch. Process. Appl., 127 (2017), pp. 622-642.
- [7] A. BISWAS AND S. PRADHAN, Ergodic risk-sensitive control of Markov processes on countable state space revisited, ArXiv e-prints 2104.04825 (2021). Available at https://arxiv.org/abs/2104.04825.
- [8] K. FAN, Minimax Theorems, Proc. Natl. Acad. Sci., USA, 39 (1953), pp. 42-47.
- [9] M. K. GHOSH AND S. SAHA, Risk-sensitive control of continuous-time Markov chains, Stochastics, 86 (2014), pp. 655-675.
- [10] M. K. GHOSH, K. S. KUMAR, AND C. PAL, Zero-sum risk-sensitive stochastic games for continuous-time Markov chains, Stoch. Anal. Appl., 34 (2016), pp. 835-851.
- [11] S. GOLUI AND C. PAL, Continuous-time Zero-Sum Games for Markov chains with risk-sensitive finite-horizon cost criterion, Stoch. Anal. and Appl., (2021). Available at: https://doi.org/10.1080/07362994.2021.1889381.
- [12] S. GOLUI AND C. PAL, Continuous-time zero-sum games for Markov decision processes with discounted risk-sensitive cost criterion, Dyn. Games and Appl., (2021). Available at: https://doi.org/10.1007/s13235-021-00391-2.
- [13] X. P. GUO AND O. HERNANDEZ-LERMA, Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, J. Appl. Probab., 40 (2003), pp. 327-345.
- [14] X.P. GUO AND YOUNGHUI HUANG, Risk-sensitive average continuous-time Markov decision processes with unbounded transition and cost rates, J. Appl. Probab., 58 (2021), pp. 523-550.
- [15] X. P. GUO AND O. HERNANDEZ-LERMA, Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs, Adv. in Appl. Probab., 39 (2007), pp. 645-668.
- [16] X. P. GUO AND O. HERNANDEZ-LERMA, Continuous-Time Markov Decision Processes: Theory and Applications, Stoch. Model. and Appl. Probab. 62, Springer-Verlag, Berlin, (2009).
- [17] X. GUO AND X. SONG, Discounted continuous-time constrained Markov decision processes in Polish spaces, Ann. Appl. Probab., 21 (2011), pp. 2016-2049.
- [18] X. P. GUO AND Z. W. LIAO, Risk-sensitive discounted continuous-time Markov decision processes with unbounded rates, SIAM J. Control Optim., 57 (2019), pp. 3857-3883.
- [19] X. P. GUO AND A. PIUNOVSKIY, Discounted continuous-time Markov decision processes with constraints: Unbounded transition and loss rates, Math. Oper. Res., 36 (2011), pp. 105-132.
- [20] O. HERNANDEZ-LERMA AND J. LASSERRE, Further topics on discrete-time Markov control processes, Springer, New York, (1999).
- [21] O. HERNANDEZ-LERMA AND J. B. LASSERRE, Zero-sum stochastic games in Borel spaces: average payoff criterion, SIAM J. Control Optim., 39 (2001), pp. 1520-1539.
- [22] M. Y. KITAEV, Semi-Markov and jump Markov controlled models: Average cost criterion, SIAM Theory Probab. Appl., 30 (1995), pp. 272-288.
- [23] M. Y. KITAEV AND V.V. RYKOV, Controlled Queueing Systems, CRC Press, Boca Raton, (1995).
- [24] M. B. KLOMPSTRA, Nash equilibria in risk-sensitive dynamic games, IEEE Trans. Automat. Control, 45 (2007), pp. 1397-1401.
- [25] K.S. KUMAR AND C. PAL, Risk-sensitive control of jump process on denumerable state space with near monotone cost, Appl. Math. Optim., 68 (2013), pp. 311-331.
- [26] K.S. KUMAR AND C. PAL, Risk-sensitive ergodic control of continuous-time Markov processes with denumerable state space, Stoch. Anal. Appl., 33 (2015), pp. 863-881.
- [27] A. S. NOWAK, Measurable selection theorems for minimax stochastic optimization problems, SIAM J. Control Optim., 23 (1985), pp. 466-476.
- [28] C. PAL AND S. PRADHAN, Risk-sensitive control of pure jump processes on a general state space, Stochastics, 91 (2019), pp. 155-174. Available at https://doi.org/10.1080/17442508.2018.1521413.
- [29] C. PAL AND S. PRADHAN, Zero-Sum Games for Pure Jump Processes with Risk-Sensitive Discounted Cost Criteria, J. Dyn. Games, (2021). Available at http://dx.doi.org/10.3934/jdg.2021020.
- [30] A. PIUNOVSKIY AND Y. ZHANG, Discounted continuous-time Markov decision processes with unbounded rates: The convex analytic approach, SIAM J. Control Optim., 49 (2011), pp. 2032-2061.
- [31] A. PIUNOVSKIY AND Y. ZHANG, Continuous-time Markov decision processes, Springer, (2020).
- [32] M. L. PUTERMAN, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York, (1994).
- [33] L. S. SHAPLEY, Stochastic games, Proc. Nat. Acad. Sci., USA 39 (1953), pp. 1095-1100.
- [34] Q. D. WEI AND X. CHEN, Stochastic games for continuous-time jump processes under finite-horizon payoff criterion, Appl. Math. Optim., 74 (2016), pp. 273–301.
- [35] Q. D. WEI AND X. CHEN, Nonzero-sum games for continuous-time jump processes under the expected average payoff criterion, Appl. Math. Optim., 83 (2019), pp. 915-938.
- [36] Q. D. WEI AND X. CHEN, Nonzero-sum risk-sensitive average stochastic games: the case of unbounded costs, Dyn. Games Appl., (2021). Available at https://doi.org/10.1007/s13235-021-00380-5.
- [37] Q. WEI, Zero-sum games for continuous-time Markov jump processes with risk-sensitive finite-horizon cost criterion, Oper. Res. Lett., 46 (2018), pp. 69-75.
- [38] W. Z. ZHANG AND X. P. GUO, Nonzero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, Sci. China Math., 55 (2012), pp. 2405–2416.
- [39] W. Z. ZHANG, Average optimality for continuous-time Markov decision processes under weak continuity conditions, J. Appl. Probab., 51 (2014), pp. 954-970.