Robustness of Stochastic Optimal Control to Approximate Diffusion Models under Several Cost Evaluation Criteria
Abstract.
In control theory, typically a nominal model is assumed, based on which an optimal control is designed and then applied to an actual (true) system. This gives rise to the problem of performance loss due to the mismatch between the true model and the assumed model. A robustness problem in this context is to show that the error due to the mismatch between a true model and an assumed model decreases to zero as the assumed model approaches the true model. We study this problem when the state dynamics of the system are governed by controlled diffusion processes. In particular, we will discuss continuity and robustness properties of finite horizon and infinite-horizon discounted/ergodic optimal control problems for a general class of non-degenerate controlled diffusion processes, as well as for optimal control up to an exit time. Under a general set of assumptions and a convergence criterion on the models, we first establish that the optimal value of the approximate model converges to the optimal value of the true model. We then establish that the error incurred by applying a control policy, designed for an incorrectly estimated model, to the true model decreases to zero as the incorrect model approaches the true model. We will see that, compared to related results in the discrete-time setup, the continuous-time theory lets us utilize the strong regularity properties of solutions to optimality (HJB) equations, via the theory of uniformly elliptic PDEs, to arrive at strong continuity and robustness properties.
Key words and phrases:
Robust control, Controlled diffusions, Hamilton-Jacobi-Bellman equation, Stationary control
2000 Mathematics Subject Classification: Primary: 93E20, 60J60; secondary: 49J55
1. Introduction
In stochastic control applications, typically only an ideal model is assumed, or learned from available incomplete data, based on which an optimal control is designed and then applied to the actual system. This gives rise to the problem of performance loss due to the mismatch between the actual system and the assumed system. A robustness problem in this context is to show that the error due to mismatch decreases to zero as the assumed system approaches the actual system. With this motivation, in this article our goal is to study the continuity and robustness properties of finite horizon and infinite horizon discounted/ergodic cost problems for a large class of multidimensional controlled diffusions. We note that the problems of existence, uniqueness and verification of optimality of stationary Markov policies have been studied extensively in the literature; see e.g. [Bor-book], [HP09-book] (finite horizon), [BS86], [BB96] (discounted cost), [AA12], [AA13], [BG88I], [BG90b] (ergodic cost), and references therein. For a book-length exposition of this topic see e.g. [ABG-book].
In more explicit terms, here is the problem that we will study (for a precise statement see Section 2.3). Suppose that our true model is represented by a system model together with an associated running cost function (see, e.g., Eq. 2.1), where the system model specifies the evolution via the drift and diffusion terms, and let a sequence of approximating models with associated running cost functions be given (see, e.g., Eq. 2.15), such that the approximating models converge to the true model in a sense to be made precise. Suppose that for each choice of control policy the associated total costs in the true and approximating models are given by the respective cost criteria defined below. The objective of the controller is to minimize the total cost over all admissible policies. If we denote the optimal control policies of the true and approximating models by $v^*$ and $v^*_n$, respectively, the performance loss due to mismatch is the difference between the cost of applying $v^*_n$ to the true model and the optimal cost of the true model. Thus the robustness problem in this context is to show that this performance loss vanishes as $n \to \infty$. See Section 2.3. In this sense, our paper can be viewed as a continuous-time counterpart of the setting studied in [KY-20], [KRY-20].
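For instance, in the discounted setting of Section 3 below, writing $V_\alpha$, $V^n_\alpha$ for the optimal values of the true and approximating models, $J_\alpha(x, U)$ for the true-model cost of a policy $U$, and $v^*_n$ for an optimal policy of the $n$-th approximating model, the mismatch splits as (a minimal sketch, in notation fixed in Section 2):
\[ \bigl| J_\alpha(x, v^*_n) - V_\alpha(x) \bigr| \;\le\; \bigl| J_\alpha(x, v^*_n) - V^n_\alpha(x) \bigr| \;+\; \bigl| V^n_\alpha(x) - V_\alpha(x) \bigr|, \]
so that continuity of the value functions (the second term, Theorem 3.3) together with control of the first term yields robustness (Theorem 3.4).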
This problem is of major practical importance and, accordingly, there have been many studies. Most of the existing works in this direction are concerned with discrete-time Markov decision processes; see for instance [KY-20], [KRY-20], [BJP02], [KV16], [NG05], [SX15], and references therein.
We should note that the term robustness has various interpretations, contexts and solution methods. A common approach to robustness in the literature has been to design controllers that work sufficiently well for all possible uncertain systems under some structured constraints, such as norm-bounded perturbations (see [basbern], [zhou1996robust]). For such problems, the design of robust controllers has often been developed through a game theoretic formulation where the minimizer is the controller and the maximizer is the uncertainty. In [DJP00], [jacobson1973optimal] the authors established the connections of this formulation to risk sensitive control. Using Legendre-type transforms, relative entropy constraints came into the literature to probabilistically model the uncertainties, see e.g. [dai1996connections, Eqn. (4)] or [DJP00, Eqns. (2)-(3)]. Here, one selects a nominal system satisfying a relative entropy bound between the actual measure and the nominal measure, solves a risk sensitive optimal control problem, and this solution value provides an upper bound on the original system performance. Therefore, a common approach in robust stochastic control has been to consider all models which satisfy certain bounds in terms of the relative entropy pseudo-distance (or Kullback-Leibler divergence); see e.g. [DJP00, dai1996connections, dupuis2000kernel, boel2002robustness] among others. In order to quantify the uncertainty in the system models, various other metrics/criteria besides the relative entropy pseudo-distance have also been used in the literature. In [tzortzis2015dynamic], for discrete-time controlled models, the authors have studied a min-max formulation for robust control where the one-stage transition kernel belongs to a ball under the total variation metric for each state-action pair. For distributionally robust stochastic optimization problems, it is assumed that the underlying probability measure of the system lies within an ambiguity set, and a worst-case single-stage optimization is made over the probability measures in the ambiguity set. To construct ambiguity sets, [blanchet2016], [esfahani2015] use the Wasserstein metric, [erdogan2005] uses the Prokhorov metric which metrizes the weak topology, [sun2015] uses the total variation distance, and [lam2016] works with relative entropy. For fully observed finite state-action space models with uncertain transition probabilities, the authors in [iyengar2005robust], [nilim2005robust] have studied robust dynamic programming approaches through a min-max formulation. Similar work with model uncertainty includes [oksendal2014forward], [benavoli2011robust], [xu_mannor]. In the economics literature, related work has been done in [hansen2001robust], [gossner2008entropy].
The robustness formulation we study has been considered in [KY-20], [KRY-20] for discrete-time models, where the authors studied both the continuity of value functions as transition kernel models converge and the robustness problem where an optimal control designed for an incorrect approximate model is applied to a true model and the mismatch term is studied. The solution approach is fundamentally different in the continuous-time analysis we present in this paper. In a related study [Dean18], the author studied the optimal control of systems with unknown dynamics in a linear quadratic regulator setup and proposed an algorithm to learn the system from observed data with quantitative convergence bounds. The author in [Lan81, Theorem 5.1] considered fully observed discrete-time controlled models, established continuity results for approximate models, and gave a set convergence result for sets of optimal control actions; this set convergence result is inconclusive for robustness without further assumptions on the true system model (for more details see [KY-20]). For fully observed MDPs, [muller1997does] studied continuity of the value function under a general metric defined as the integral probability metric, which captures both the total variation metric and the Kantorovich metric under different setups (and which is not weaker than the metrics leading to weak convergence). A recent study on game problems along a similar theme is presented in [subramanian2021robustness].
For control problems of MDPs with standard Borel spaces, the approximation methods through quantization, which lead to finite models, can be viewed as approximations of transition kernels, but this interpretation requires caution: indeed, [SaYuLi17, arruda2012, arruda2013], among many others, study approximation methods for MDPs where the convergence of approximate models is satisfied in a particularly constructed fashion. Reference [SaYuLi17] presents a construction for the approximate models through quantizing the actual model with continuous spaces (leading to a finite space model), which allows for continuity and robustness results under only a weak continuity assumption on the true transition kernel, which, in turn, leads to the weak convergence of the approximate models. For both fully observed and partially observed models, a detailed analysis of approximation methods for continuous state and action spaces can be found in [SaLiYuSpringer].
The literature on robustness of stochastic optimal control for continuous-time systems seems rather limited; see e.g. [GL99], [LJE15], [hansen2001robust]. In [GL99] the authors considered the problem of controlling a system whose dynamics are given by a stochastic differential equation (SDE) whose coefficients are known only up to a certain degree of accuracy. For the finite horizon reward maximization problem, using the technique of contractive operators, [GL99] obtained upper bounds on the performance loss due to mismatch (or, “robustness index”) and showed by an example that the robustness index may behave abnormally even if the value functions converge. The associated discounted payoff maximization problem has been studied in [LJE15], where, using a Lyapunov type stability assumption, the authors studied the robustness problem via a game theoretic formulation. For controlled diffusion models, the authors in [hansen2001robust] described the links between the max-min expected utility theory and the applications of robust control theory, in analogy with some of the papers on discrete-time models noted above adopting a min-max formulation. Along a further direction, for controlled diffusions, via the Maximum Principle technique, [PDPB02a], [PDPB02b], [PDPB02c] have established the robustness of optimal controls for the finite horizon payoff criterion.
In a recent comprehensive work [RZ21], the authors have studied the robustness of feedback relaxed controls for a continuous-time stochastic exit time problem. Under sufficient smoothness assumptions on the coefficients (i.e., uniform Lipschitz continuity of the diffusion coefficients and uniform Hölder continuity of the discount factor and payoff function on a fixed bounded domain), they established that a regularized control problem admits a Hölder continuous optimal feedback control, and they also showed that both the value function and the feedback control of the regularized control problem are Lipschitz stable with respect to parameter perturbations when the action space is finite. It is known that the optimal control obtained from the HJB equation (i.e., the argmin function) is in general unstable with respect to perturbations of the coefficients; in practice, this results in numerical instability of learning algorithms (as noted in [RZ21]).
Stability/continuity of solutions of PDEs with respect to coefficient perturbations is a significant mathematical and practical question in PDE theory (see e.g. [WLS01], [SI72]). The continuity results established in this paper (see Theorems 3.3, 4.3, 4.8) provide sufficient conditions which ensure stability of solutions of semilinear elliptic PDEs (HJB equations) in the whole space $\mathbb{R}^d$.
Our robustness results will also be useful in the study of robust optimal investment problems for local volatility models, e.g. as given in [AS08, Remark 2.1] (see also [KT12], [BDD20]).
When the system noise is not given by a Wiener process but by a general wide bandwidth noise (or a more general discontinuous martingale [LRT00]), the controlled process becomes non-Markovian even under stationary Markov policies. The general method for studying optimal control problems for such systems is to find suitable Markovian processes which approximate the non-Markovian process (see [K90], [KR87], [KR87a], [KR88]). For wide bandwidth noise driven controlled systems, [K90], [KR87], [KR87a], [KR88] used diffusion approximation techniques to study stochastic optimal control problems. The results described in this paper are complementary to the above mentioned works on the diffusion approximation of wide bandwidth noise driven systems.
Contributions and main results. In the present paper, our aim is to study the continuity and robustness properties for a general class of controlled diffusion processes in $\mathbb{R}^d$ for both infinite horizon discounted and ergodic costs, where the action space is a (general) compact metric space. As in [KY-20], [KRY-20], in order to establish our desired robustness results we will use the continuity result as an intermediate step. For the discounted cost case, we will establish our results following a direct approach (under a relatively weak set of assumptions on the diffusion coefficients, i.e., locally Lipschitz continuous coefficients). Using the results on existence and uniqueness of solutions of the associated discounted Hamilton-Jacobi-Bellman (HJB) equation and the complete characterization of (discounted) optimal policies in the space of stationary Markov policies (see [ABG-book, Theorem 3.5.6]), we first establish the continuity of value functions. Then, utilizing this continuity of value functions, we derive a robustness result. The analysis of the ergodic cost (or long-run expected average cost) is somewhat more involved. To the best of our knowledge there is no work on continuity and robustness properties of optimal controls for the ergodic cost criterion in the existing literature (for the discrete-time setup, see [KRY-20]). We study these ergodic cost problems under two sets of assumptions: in the first case, we assume that our running cost function satisfies a near-monotone type structural assumption (see Eq. 4.1, Assumption (A6)), and in the second case we assume Lyapunov type stability assumptions on the dynamics of the system (see Assumption (A7)).
One of the major issues in analyzing the robustness of ergodic optimal controls under the near-monotone hypothesis is the non-uniqueness/restricted uniqueness of solutions of the associated HJB equation (see [ABG-book, Example 3.8.3], [AA13]). It is shown in [ABG-book, Example 3.8.3] that the ergodic HJB equation may admit uncountably many solutions. Considering this, in [AA13, Theorem 1.1] the author has established the uniqueness of compatible solution pairs (see [AA13, Definition 1.1]). Exploiting this uniqueness result, under a suitable tightness assumption (on a certain set of invariant measures) we establish the desired robustness result. Under the Lyapunov type stability assumption it is known that the ergodic HJB equation admits a unique solution in a certain class of functions, and the complete characterization of ergodic optimal controls is also known (see [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12]). Utilizing this characterization of optimal controls, we derive the robustness properties of ergodic optimal controls under a Lyapunov stability assumption.
We also emphasize the contrast between the PDE approach and a probabilistic flow approach to studying robustness. The PDE approach presents a very general and conclusive, yet concise and unified, treatment of several cost criteria (notably, a probabilistic approach via Dynkin's lemma would require separate arguments for the discounted infinite-horizon and average cost infinite-horizon criteria), and such a unified approach had not been considered earlier, to our knowledge.
Thus, the main results of this article can be roughly described as follows.
•
For the discounted cost criterion: We establish continuity of value functions and provide sufficient conditions which ensure robustness/stability of optimal controls designed under model uncertainty.
•
For the ergodic cost criterion: Under two different sets of assumptions ((i) the running cost is near-monotone, or (ii) a Lyapunov stability condition holds), we establish the continuity of value functions and, exploiting the continuity results, we derive the robustness/stability of ergodic optimal controls designed for approximate models and applied to actual systems.
•
For the finite horizon cost criterion: Under uniform boundedness assumptions on the drift terms and diffusion matrices (of the true and approximating models), we establish continuity of value functions. Then, exploiting the continuity result, we prove the robustness/stability of optimal controls designed under model uncertainty.
•
For cost up to an exit time: As with the above criteria, under a mild set of assumptions we first establish the continuity of value functions and then, using the continuity results, we establish the robustness/stability of optimal controls designed under model uncertainty.
We will see that, compared with the discrete-time counterpart of this problem studied in [KY-20] (discounted cost) and [KRY-20] (average cost), where value iteration methods were crucially used, in our analysis here we develop rather direct arguments, with strong implications, utilizing regularity properties of value functions: in the discrete-time setup, these properties need to be established via tedious arguments, whereas the continuous-time theory allows for the use of regularity properties of solutions to PDEs. Nonetheless, we will see that continuous convergence in control actions of models and cost functions is a unifying condition for continuity and robustness properties in both the discrete-time setup studied in [KY-20] (discounted cost) and [KRY-20] (average cost) and our current paper. Compared to [RZ21], in addition to the infinite horizon criteria we study, the perturbations we consider are not restricted to coefficient/parameter variations (i.e., we consider functional perturbations), and the action space we consider is uncountable, though, unlike [RZ21], we do not establish the Lipschitz property of control policies.
The rest of the paper is organized as follows. Section 2 introduces the problem setup and summarizes the notation. Section 3 is devoted to the analysis of robustness of optimal controls for the discounted cost criterion. In Section 4 we provide the analysis of robustness of ergodic optimal controls under two different sets of hypotheses: (i) near-monotonicity and (ii) Lyapunov stability. For the finite horizon cost criterion the robustness problem is analyzed in Section 5. The robustness problem for optimal controls up to an exit time is considered in Section 6.
2. Description of the problem
Let $\mathbb{U}$ be a compact metric space and $\mathcal{P}(\mathbb{U})$ be the space of probability measures on $\mathbb{U}$ with the topology of weak convergence. Let
\[ b : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}^d, \qquad \sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d} \]
be given functions. We consider a stochastic optimal control problem whose state evolves according to a controlled diffusion process given by the solution of the following stochastic differential equation (SDE)
(2.1) \[ \mathrm{d}X_t = b(X_t, U_t)\, \mathrm{d}t + \sigma(X_t)\, \mathrm{d}W_t, \qquad X_0 = x \in \mathbb{R}^d, \]
where
•
$W_\cdot$ is a $d$-dimensional standard Wiener process, defined on a complete probability space $(\Omega, \mathfrak{F}, \mathbb{P})$.
•
We extend the drift term $b$ as follows: $\bar b(x, v) := \int_{\mathbb{U}} b(x,u)\, v(\mathrm{d}u)$ for $v \in \mathcal{P}(\mathbb{U})$.
•
$U_\cdot$ is a $\mathcal{P}(\mathbb{U})$-valued process satisfying the following non-anticipativity condition: for $s < t$, the increment $W_t - W_s$ is independent of the completion of $\sigma\{X_0, U_r, W_r : r \le s\}$.
The process $U_\cdot$ is called an admissible control, and the set of all admissible controls is denoted by $\mathfrak{U}$ (see [BG90]).
To ensure existence and uniqueness of strong solutions of Eq. 2.1, we impose the following assumptions on the drift $b$ and the diffusion matrix $\sigma$.
(A1) Local Lipschitz continuity: The functions $b$ and $\sigma$ are locally Lipschitz continuous in $x$ (uniformly with respect to the control action $u$ for $b$). In particular, for some constant $C_R > 0$ depending on $R > 0$, we have
\[ |b(x,u) - b(y,u)|^2 + \|\sigma(x) - \sigma(y)\|^2 \;\le\; C_R\, |x - y|^2 \]
for all $x, y \in B_R$ and $u \in \mathbb{U}$, where $\|\sigma\|^2 := \mathrm{trace}(\sigma \sigma^{\mathsf{T}})$. Also, we assume that $b$ is jointly continuous in $(x,u)$.
(A2) Affine growth condition: $b$ and $\sigma$ satisfy a global growth condition of the form
\[ |b(x,u)|^2 + \|\sigma(x)\|^2 \;\le\; C_0 \left( 1 + |x|^2 \right) \quad \text{for all } (x,u) \in \mathbb{R}^d \times \mathbb{U}, \]
for some constant $C_0 > 0$.
(A3) Nondegeneracy: For each $R > 0$, it holds that
\[ \sum_{i,j=1}^d a^{ij}(x)\, z_i z_j \;\ge\; C_R^{-1}\, |z|^2 \quad \text{for all } x \in B_R \]
and for all $z = (z_1, \ldots, z_d)^{\mathsf{T}} \in \mathbb{R}^d$, where $a := \sigma \sigma^{\mathsf{T}}$.
By a Markov control we mean an admissible control of the form $U_t = v(t, X_t)$ for some Borel measurable function $v : [0,\infty) \times \mathbb{R}^d \to \mathcal{P}(\mathbb{U})$. The space of all Markov controls is denoted by $\mathfrak{U}_{\mathrm{M}}$. If the function $v$ is independent of $t$, then $U_\cdot$ (or, by an abuse of notation, $v$ itself) is called a stationary Markov control. The set of all stationary Markov controls is denoted by $\mathfrak{U}_{\mathrm{SM}}$. From [ABG-book, Section 2.4], the set $\mathfrak{U}_{\mathrm{SM}}$ is metrizable with a compact metric under the following topology: a sequence $v_n \to v$ in $\mathfrak{U}_{\mathrm{SM}}$ if and only if
\[ \int_{\mathbb{R}^d} f(x) \int_{\mathbb{U}} g(x,u)\, v_n(\mathrm{d}u \mid x)\, \mathrm{d}x \;\longrightarrow\; \int_{\mathbb{R}^d} f(x) \int_{\mathbb{U}} g(x,u)\, v(\mathrm{d}u \mid x)\, \mathrm{d}x \]
for all $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ and $g \in C_b(\mathbb{R}^d \times \mathbb{U})$ (for more details, see [ABG-book, Lemma 2.4.1]). It is well known that under the hypotheses (A1)–(A3), for any admissible control, Eq. 2.1 has a unique strong solution [ABG-book, Theorem 2.2.4], and under any stationary Markov strategy Eq. 2.1 has a unique strong solution which is a strong Feller (therefore strong Markov) process [ABG-book, Theorem 2.2.12].
2.1. Cost Criteria
Let $c : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}_+$ be the running cost function. We assume that
(A4) The running cost $c$ is bounded (i.e., $\|c\|_\infty \le M$ for some positive constant $M$), jointly continuous in $(x,u)$ and locally Lipschitz continuous in its first argument uniformly with respect to $u \in \mathbb{U}$.
This condition (A4) can also be relaxed to (A4)′, to be presented further below, where the local Lipschitz property is eliminated.
We extend $c$ to $\mathbb{R}^d \times \mathcal{P}(\mathbb{U})$ as follows: $\bar c(x, v) := \int_{\mathbb{U}} c(x,u)\, v(\mathrm{d}u)$ for $v \in \mathcal{P}(\mathbb{U})$.
In this article, we consider the problem of minimizing the finite horizon, discounted, ergodic, and up-to-an-exit-time cost criteria:
Discounted cost criterion. For $U \in \mathfrak{U}$, the associated $\alpha$-discounted cost is given by
(2.2) \[ J_\alpha(x, U) := \mathbb{E}_x^U \left[ \int_0^\infty e^{-\alpha t}\, \bar c(X_t, U_t)\, \mathrm{d}t \right], \]
where $\alpha > 0$ is the discount factor, $X_\cdot$ is the solution of Eq. 2.1 corresponding to $U \in \mathfrak{U}$, and $\mathbb{E}_x^U$ is the expectation with respect to the law of the process $X_\cdot$ with initial condition $X_0 = x$. The controller tries to minimize Eq. 2.2 over the admissible policies $\mathfrak{U}$. Thus, a policy $U^* \in \mathfrak{U}$ is said to be optimal if for all $x \in \mathbb{R}^d$
(2.3) \[ J_\alpha(x, U^*) = \inf_{U \in \mathfrak{U}} J_\alpha(x, U) =: V_\alpha(x), \]
where $V_\alpha(x)$ is called the optimal value.
Ergodic cost criterion. For each $U \in \mathfrak{U}$, the associated ergodic cost is defined as
(2.4) \[ \mathscr{E}_x(c, U) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^U \left[ \int_0^T \bar c(X_t, U_t)\, \mathrm{d}t \right], \]
and the optimal value is defined as
(2.5) \[ \mathscr{E}^*(c) := \inf_{x \in \mathbb{R}^d} \inf_{U \in \mathfrak{U}} \mathscr{E}_x(c, U). \]
Then a control $U^* \in \mathfrak{U}$ is said to be optimal if we have
(2.6) \[ \mathscr{E}_x(c, U^*) = \mathscr{E}^*(c). \]
Finite horizon cost. For $U \in \mathfrak{U}$ and $T > 0$, the associated finite horizon cost is given by
(2.7) \[ \hat J_T(x, U) := \mathbb{E}_x^U \left[ \int_0^T \bar c(X_t, U_t)\, \mathrm{d}t + h_T(X_T) \right], \]
where $h_T$ is the terminal cost. The optimal value is defined as
(2.8) \[ \hat J^*_T(x) := \inf_{U \in \mathfrak{U}} \hat J_T(x, U). \]
Thus, a policy $U^* \in \mathfrak{U}$ is said to be (finite horizon) optimal if we have
(2.9) \[ \hat J_T(x, U^*) = \hat J^*_T(x). \]
Control up to an exit time. This criterion will be presented in Section 6. Our analysis for this criterion will be immediate given the study involving the above criteria.
We define a family of operators $\mathcal{L}_u$ mapping $C^2(\mathbb{R}^d)$ to $C(\mathbb{R}^d)$ by
(2.10) \[ \mathcal{L}_u f(x) := \frac{1}{2}\, \mathrm{trace}\left( a(x)\, \nabla^2 f(x) \right) + b(x,u) \cdot \nabla f(x), \]
for $u \in \mathbb{U}$, where $a := \sigma \sigma^{\mathsf{T}}$. For $v \in \mathcal{P}(\mathbb{U})$ we extend $\mathcal{L}_u$ as follows:
(2.11) \[ \mathcal{L}_v f(x) := \int_{\mathbb{U}} \mathcal{L}_u f(x)\, v(\mathrm{d}u). \]
For $v \in \mathfrak{U}_{\mathrm{SM}}$, we define
(2.12) \[ \mathcal{L}_v f(x) := \frac{1}{2}\, \mathrm{trace}\left( a(x)\, \nabla^2 f(x) \right) + \bar b(x, v(x)) \cdot \nabla f(x). \]
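For orientation, a standard identity underlying the arguments below (sketched here under (A1)-(A3), for $f \in C_c^2(\mathbb{R}^d)$): the operator family in Eq. 2.10 acts as the controlled extended generator of Eq. 2.1 in the sense of Dynkin's formula,
\[ \mathbb{E}_x^U \left[ f(X_t) \right] - f(x) \;=\; \mathbb{E}_x^U \left[ \int_0^t \mathcal{L}_{U_s} f(X_s)\, \mathrm{d}s \right], \]
which is the identity invoked repeatedly below through the Itô-Krylov formula.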
We are interested in the robustness of optimal controls under these criteria. To this end, we now introduce our approximating models.
2.2. Approximating Controlled Diffusion Processes
Let $b_n : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}^d$, $\sigma_n : \mathbb{R}^d \to \mathbb{R}^{d \times d}$, $c_n : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}_+$, $n \in \mathbb{N}$, be sequences of functions satisfying the following assumptions.
(A5)
(i) As $n \to \infty$,
(2.13) \[ b_n \to b, \qquad \sigma_n \to \sigma, \qquad c_n \to c, \quad \text{uniformly over compact subsets of their domains.} \]
(ii) Continuous convergence in controls: for any sequence $u_n \to u$ in $\mathbb{U}$,
(2.14) \[ b_n(x, u_n) \to b(x, u) \quad \text{and} \quad c_n(x, u_n) \to c(x, u) \quad \text{as } n \to \infty \]
(see the illustration after this assumption).
(iii) For each $n \in \mathbb{N}$, $b_n$ and $\sigma_n$ satisfy Assumptions (A1)-(A3), and $c_n$ is uniformly bounded (in particular, $\sup_n \|c_n\|_\infty \le M$, where $M$ is the positive constant in (A4)), jointly continuous in $(x,u)$ and locally Lipschitz continuous in its first argument uniformly with respect to $u \in \mathbb{U}$.
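As a simple illustration of (A5)(i)-(ii) (a hypothetical construction, in the spirit of Example 2.1(iv) below): take
\[ b_n(x,u) := b(x,u) + \tfrac{1}{n}\, \phi(x,u), \qquad \sigma_n := \sigma, \qquad c_n := c, \]
for some bounded continuous perturbation $\phi$. Then for any $u_n \to u$ one has $b_n(x, u_n) \to b(x,u)$, by the joint continuity of $b$ and the boundedness of $\phi$, so that both the convergence and the continuous convergence requirements hold.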
For each $n \in \mathbb{N}$, let $X^n_\cdot$ be the solution of the following SDE
(2.15) \[ \mathrm{d}X^n_t = b_n(X^n_t, U_t)\, \mathrm{d}t + \sigma_n(X^n_t)\, \mathrm{d}W_t, \qquad X^n_0 = x \in \mathbb{R}^d. \]
Define a family of operators $\mathcal{L}^n_u$ mapping $C^2(\mathbb{R}^d)$ to $C(\mathbb{R}^d)$ by
(2.16) \[ \mathcal{L}^n_u f(x) := \frac{1}{2}\, \mathrm{trace}\left( a_n(x)\, \nabla^2 f(x) \right) + b_n(x,u) \cdot \nabla f(x), \]
for $u \in \mathbb{U}$, where $a_n := \sigma_n \sigma_n^{\mathsf{T}}$ (with the analogous extensions to $\mathcal{P}(\mathbb{U})$ and $\mathfrak{U}_{\mathrm{SM}}$ as in Eq. 2.11 and Eq. 2.12). For the approximated model, for each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$, the associated discounted cost is defined as
(2.17) \[ J^n_\alpha(x, U) := \mathbb{E}_x^U \left[ \int_0^\infty e^{-\alpha t}\, \bar c_n(X^n_t, U_t)\, \mathrm{d}t \right], \]
and the optimal value is defined as
(2.18) \[ V^n_\alpha(x) := \inf_{U \in \mathfrak{U}} J^n_\alpha(x, U). \]
For each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$, the associated ergodic cost is defined as
(2.19) \[ \mathscr{E}^n_x(c_n, U) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^U \left[ \int_0^T \bar c_n(X^n_t, U_t)\, \mathrm{d}t \right], \]
and the optimal value is defined as
(2.20) \[ \mathscr{E}^*_n(c_n) := \inf_{x \in \mathbb{R}^d} \inf_{U \in \mathfrak{U}} \mathscr{E}^n_x(c_n, U). \]
Similarly, for each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$, the associated finite horizon cost is given by
(2.21) \[ \hat J^n_T(x, U) := \mathbb{E}_x^U \left[ \int_0^T \bar c_n(X^n_t, U_t)\, \mathrm{d}t + h_T(X^n_T) \right]. \]
The optimal value is given by
(2.22) \[ \hat J^{n,*}_T(x) := \inf_{U \in \mathfrak{U}} \hat J^n_T(x, U), \]
where the state process $X^n_\cdot$ is given by the solution of the SDE Eq. 2.15.
2.3. Continuity and Robustness Problems
The primary objective of this article will be to address the following problems:
•
Continuity: If the approximating models Eq. 2.15 converge to the true model Eq. 2.1 in the sense of Assumption (A5), do the optimal values of the approximating models converge to the optimal value of the true model (e.g., $V^n_\alpha(x) \to V_\alpha(x)$ for the discounted cost)?
•
Robustness: Suppose $v^*_n$ is an optimal policy designed for the incorrect model Eq. 2.15 for the finite horizon/discounted/ergodic/up-to-an-exit-time cost problem; does this imply
– for the discounted cost: $J_\alpha(x, v^*_n) \to V_\alpha(x)$,
– for the ergodic cost: $\mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c)$,
– for the finite horizon cost: $\hat J_T(x, v^*_n) \to \hat J^*_T(x)$,
– for the cost up to an exit time: the analogous convergence (for details, see Section 6),
as $n \to \infty$?
In this article, under a mild set of assumptions, we show that the answers to the above questions are affirmative.
Example 2.1.
(i) If our noise term is not the (ideal) Brownian motion and, instead of Eq. 2.1, the state dynamics of the system are governed by the following SDE
(2.23)
here we are approximating the noise term by an Itô process, given by
(2.24)
where the approximating noise converges to the Wiener process as $n \to \infty$.
- (ii)
(iii) Consider a Vasicek interest rate model, given by
\[ \mathrm{d}r_t = \theta(\mu - r_t)\, \mathrm{d}t + \sigma\, \mathrm{d}W_t; \]
this is a mean-reverting process, where $\theta$ is the rate of reversion, $\mu$ is the long-term mean and $\sigma$ is the volatility. The wealth process corresponding to this interest rate model can be described by Eq. 2.1 (see [AS08, Remark 2.1], [KT12], [DJ07]). Since market models are typically incomplete, the model parameters $(\theta, \mu, \sigma)$ are usually learned from market data. This gives rise to the problem of robustness of optimal investment (see the sketch following this example). This also applies to several other interest/pricing models as well [merton1998applications].
(iv) In the above examples, $b_n$ can be a regularized version of $b$, e.g. obtained by adding a small smooth perturbation vanishing as $n \to \infty$, which then would continuously converge (in control) to $b$ as $n \to \infty$.
In the cases above, the approximating kernel conditions in (A5) would apply.
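For instance, for the Vasicek model of item (iii), here is a minimal sketch of how parameter mismatch fits into (A5); the estimated parameters $(\theta_n, \mu_n, \sigma_n)$ below are hypothetical and introduced only for illustration. If $(\theta_n, \mu_n, \sigma_n) \to (\theta, \mu, \sigma)$ as the amount of market data grows, then with
\[ b_n(x) := \theta_n (\mu_n - x), \qquad \sigma_n(x) := \sigma_n, \]
one has $b_n \to b$ uniformly on compact sets and $\sigma_n \to \sigma$, so the learned models converge to the true model in the sense required by (A5).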
Remark 2.1.
If we replace $\sigma(x)$ by a control-dependent diffusion coefficient $\sigma(x,u)$, then in the relaxed control framework, if the stationary policy is Lipschitz continuous, Eq. 2.1 admits a unique strong solution. But in general stationary policies are just measurable functions, and the existence of suitable strong solutions in this setting is not known (see [ABG-book, Remarks 2.3.2], [B05Survey]). However, under stationary Markov policies one can prove the existence of weak solutions, which may not be unique [stroock1997multidimensional], [ABG-book, Remarks 2.3.2] (note though that uniqueness is established in [stroock1997multidimensional, pp. 192-194] under some conditions). The existence of a suitable strong solution (which is also a strong Markov process) under stationary Markov policies is essential to obtain the stochastic representation of solutions of HJB equations (by applying the Itô-Krylov formula).
Notation:
•
For any set $A \subset \mathbb{R}^d$, by $\tau(A)$ we denote the first exit time of the process $X_\cdot$ from the set $A$, defined by $\tau(A) := \inf\{ t > 0 : X_t \notin A \}$.
•
$B_r$ denotes the open ball of radius $r$ in $\mathbb{R}^d$, centered at the origin, and $B_r^c$ denotes the complement of $B_r$ in $\mathbb{R}^d$.
•
$\tau_r$, $\breve{\tau}_r$ denote the first exit times from $B_r$, $B_r^c$, respectively, i.e., $\tau_r := \tau(B_r)$ and $\breve{\tau}_r := \tau(B_r^c)$.
•
By $\mathrm{trace}(S)$ we denote the trace of a square matrix $S$.
•
For any domain $D \subset \mathbb{R}^d$, the space $C^k(D)$ ($C^\infty(D)$), $k \ge 0$, denotes the class of all real-valued functions on $D$ whose partial derivatives up to and including order $k$ (of any order) exist and are continuous.
•
$C_c^k(D)$ denotes the subset of $C^k(D)$, $0 \le k \le \infty$, consisting of functions that have compact support. This denotes the space of test functions.
•
$C_b(\mathbb{R}^d)$ denotes the class of bounded continuous functions on $\mathbb{R}^d$.
•
$C_0^k(D)$, $0 \le k \le \infty$, denotes the subspace of $C^k(D)$ consisting of functions that vanish on $\partial D$.
•
$C^{k,r}(D)$, $k \ge 0$, $r \in (0,1]$, denotes the class of functions whose partial derivatives up to order $k$ are Hölder continuous of order $r$.
•
$L^p(D)$, $p \in [1, \infty)$, denotes the Banach space of (equivalence classes of) measurable functions $f$ satisfying $\int_D |f(x)|^p\, \mathrm{d}x < \infty$.
•
$W^{k,p}(D)$, $k \ge 0$, $p \ge 1$, denotes the standard Sobolev space of functions on $D$ whose weak derivatives up to order $k$ are in $L^p(D)$, equipped with its natural norm (see [Adams]).
•
If $\mathcal{X}(D)$ is a space of real-valued functions on $D$, then $\mathcal{X}_{\mathrm{loc}}(D)$ consists of all functions $f$ such that $f\varphi \in \mathcal{X}(D)$ for every $\varphi \in C_c^\infty(D)$. In a similar fashion, we define $W^{k,p}_{\mathrm{loc}}(D)$.
•
For $\mu > 0$, weighted spaces are used for the parabolic results: $f \in L^{p,\mu}(D)$ if $w_\mu f \in L^p(D)$ for the weight $w_\mu(x) := e^{-\mu \sqrt{1 + |x|^2}}$; similarly, $W^{k,p,\mu}(D)$ denotes the corresponding weighted Sobolev space with its natural norm (see [BL84-book]).
3. Analysis of Discounted Cost
In this section we analyze the robustness of optimal controls for the discounted cost criterion. From [ABG-book, Theorem 3.5.6], we have the following characterization of the optimal $\alpha$-discounted cost $V_\alpha$ defined in Eq. 2.3.
Theorem 3.1.
Suppose Assumptions (A1)-(A4) hold. Then there exists a unique solution $V_\alpha \in C^2(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$ of the HJB equation
(3.1) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u V_\alpha(x) + c(x,u) \right] = \alpha\, V_\alpha(x). \]
Moreover, we have the following:
(i) $V_\alpha$ is the optimal $\alpha$-discounted cost, i.e., $V_\alpha(x) = \inf_{U \in \mathfrak{U}} J_\alpha(x, U)$;
(ii) $v^* \in \mathfrak{U}_{\mathrm{SM}}$ is an $\alpha$-discounted optimal control if and only if it is a measurable minimizing selector of Eq. 3.1, i.e.,
(3.2) \[ \bar b(x, v^*(x)) \cdot \nabla V_\alpha(x) + \bar c(x, v^*(x)) = \min_{u \in \mathbb{U}} \left[ b(x,u) \cdot \nabla V_\alpha(x) + c(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
Remark 3.1.
The assumption that the running cost is Lipschitz continuous in its first argument, uniformly with respect to the second, is used to obtain a $C^2$ solution of the HJB equation Eq. 3.1. If we do not have this uniform Lipschitz assumption, one can still show that the HJB equation admits a solution, now in $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$, and all the conclusions of Theorem 3.1 still hold. To see this: in view of [GilTru, Theorem 9.15] and the Schauder fixed point theorem, it can be shown that there exists a solution of the Dirichlet problem on a ball, sketched below.
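In the notation above, the truncated problem takes the following standard form (a minimal sketch, assuming the ball truncation as in [ABG-book]):
\[ \min_{u \in \mathbb{U}} \bigl[ \mathcal{L}_u \varphi_R(x) + c(x,u) \bigr] = \alpha\, \varphi_R(x) \quad \text{in } B_R, \qquad \varphi_R = 0 \quad \text{on } \partial B_R, \]
together with an interior $W^{2,p}$ estimate, uniform on compact subsets, which allows one to pass to the limit along a subsequence.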
Now letting $R \to \infty$ and following [ABG-book, Theorem 3.5.6] we arrive at the solution.
Hence, one can replace our assumption (A4) by the following (relatively weaker) assumption:
(A4)′ The running cost $c$ is bounded (i.e., $\|c\|_\infty \le M$ for some positive constant $M$) and jointly continuous in both variables $(x,u)$.
All the results of this paper will also hold if we replace (A4) by (A4)′.
As in Theorem 3.1, following [ABG-book, Theorem 3.5.6], for each approximating model we have the following complete characterization of an optimal policy, which is in the space of stationary Markov policies.
Theorem 3.2.
Suppose (A5)(iii) holds. Then for each $n \in \mathbb{N}$, there exists a unique solution $V^n_\alpha \in C^2(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$ of
(3.3) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u V^n_\alpha(x) + c_n(x,u) \right] = \alpha\, V^n_\alpha(x). \]
Moreover, we have the following:
(i) $V^n_\alpha$ is the optimal discounted cost, i.e., $V^n_\alpha(x) = \inf_{U \in \mathfrak{U}} J^n_\alpha(x, U)$;
(ii) $v^*_n \in \mathfrak{U}_{\mathrm{SM}}$ is an $\alpha$-discounted optimal control if and only if it is a measurable minimizing selector of Eq. 3.3, i.e.,
(3.4) \[ \bar b_n(x, v^*_n(x)) \cdot \nabla V^n_\alpha(x) + \bar c_n(x, v^*_n(x)) = \min_{u \in \mathbb{U}} \left[ b_n(x,u) \cdot \nabla V^n_\alpha(x) + c_n(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
In the next theorem, we prove that $V^n_\alpha(x)$ converges to $V_\alpha(x)$ as $n \to \infty$, for all $x \in \mathbb{R}^d$. This result will be useful in establishing the robustness of discounted optimal controls.
Theorem 3.3.
Suppose Assumptions (A1)-(A5) hold. Then
(3.5) \[ \lim_{n \to \infty} V^n_\alpha(x) = V_\alpha(x) \quad \text{for all } x \in \mathbb{R}^d. \]
Proof.
From Eq. 3.3 and Eq. 3.4 for any minimizing selector , it follows that
Then using the standard elliptic PDE estimate as in [GilTru, Theorem 9.11], for any and , we deduce that
(3.6) |
where is a positive constant which is independent of . Since
from Eq. 3.6 we get
(3.7) |
We know that for , the space is reflexive and separable, hence, as a corollary of the Banach-Alaoglu theorem, we have that every bounded sequence in has a weakly convergent subsequence (see, [HB-book, Theorem 3.18.]). Also, we know that for the space is compactly embedded in , where (see [ABG-book, Theorem A.2.15 (2b)]), which implies that every weakly convergent sequence in will converge strongly in . Thus, in view of estimate Eq. 3.7, by a standard diagonalization argument and the Banach-Alaoglu theorem, we can extract a subsequence such that for some
(3.8) |
In the following, we will show that . Now, for any compact set , it is easy to see that
(3.9) |
Since , continuously on compact set and in for any compact set , as we deduce that
(3.10) |
Thus, multiplying by a test function , from Eq. 3.3, we obtain
In view of Eq. 3.8 and Eq. 3.10, letting it follows that
(3.11) |
Since the test function is arbitrary, from Eq. 3.11 we deduce that
(3.12) |
Let be a minimizing selector of Eq. 3.12 and be the solution of the SDE Eq. 2.1 corresponding to it. Then, applying the Itô-Krylov formula, we obtain the following
Hence, using Eq. 3.12, we deduce that
(3.13) |
Since is bounded and
letting , it is easy to see that
Now, letting by monotone convergence theorem, from Eq. 3.13 we obtain
(3.14) |
Again, by a similar argument, applying the Itô-Krylov formula and using Eq. 3.12, for any , we have
This implies
(3.15) |
Thus, from Eq. 3.14 and Eq. 3.15, we deduce that
(3.16) |
Since both are continuous functions on , from Eq. 2.3 and Eq. 3.16, it follows that for all . This completes the proof. ∎
Let $X_\cdot$ be the solution of the SDE Eq. 2.1 corresponding to $v \in \mathfrak{U}_{\mathrm{SM}}$. Then we have
(3.17) \[ J_\alpha(x, v) = \mathbb{E}_x^{v} \left[ \int_0^\infty e^{-\alpha t}\, \bar c(X_t, v(X_t))\, \mathrm{d}t \right]. \]
Next we prove the robustness result, i.e., we prove that $J_\alpha(x, v^*_n) \to V_\alpha(x)$ as $n \to \infty$, where $v^*_n$ is an optimal control of the approximated model and $V_\alpha$ is the optimal value of the true model. As in [KY-20], we will use the continuity result above as an intermediate step.
Theorem 3.4.
Suppose Assumptions (A1)-(A5) hold. Then
(3.18) \[ \lim_{n \to \infty} J_\alpha(x, v^*_n) = V_\alpha(x) \quad \text{for all } x \in \mathbb{R}^d. \]
Proof.
Following the argument as in [ABG-book, Theorem 3.5.6], one can show that for each $n \in \mathbb{N}$, there exists $\psi_n \in C^2(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$ satisfying
(3.19) \[ \mathcal{L}_{v^*_n} \psi_n(x) + \bar c(x, v^*_n(x)) = \alpha\, \psi_n(x), \]
where $v^*_n \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control of the $n$-th approximated model.
Applying the Itô-Krylov formula, we deduce that
Now using Eq. 3.19, it follows that
(3.20) |
Since is bounded and
letting we deduce that
Thus, from Eq. 3.20, letting by monotone convergence theorem we obtain
(3.21) |
This implies that . Thus, as in Theorem 3.3 (see Eq. 3.6, Eq. 3.7), by a standard Sobolev estimate, for any we get , for some positive constant independent of . Hence, by the Banach-Alaoglu theorem and a standard diagonalization argument (as in Eq. 3.8), there exists such that along some sub-sequence
(3.22) |
Since the space of stationary Markov strategies is compact, along some further sub-sequence (without loss of generality denoted by the same sequence) we have in . It is easy to see that
Since in on any compact set strongly and by the topology of , we have weakly. Thus, in view of the topology of , and since in as we obtain
(3.23) |
Now, multiplying by a test function , from Eq. 3.19, it follows that
Hence, using Eq. 3.22, Eq. 3.23, and letting we obtain
(3.24) |
Since is arbitrary and from Eq. 3.24, we deduce that the function satisfies
(3.25) |
As earlier, applying the Itô-Krylov formula and using Eq. 3.25, it follows that
(3.26) |
where is the solution of SDE Eq. 2.1 corresponding to .
Now, we have
(3.27) |
From Theorem 3.1, we know that . Thus from Theorem 3.3, we deduce that as . To complete the proof we have to show that as . Also, from Theorem 3.2 we know that is a minimizing selector of the HJB equation Eq. 3.3 of the approximated model, thus it follows that
(3.28) |
Hence, by a standard Sobolev estimate (as in Theorem 3.3), for each we have , for some positive constant independent of . Thus, we can extract a further sub-sequence (without loss of generality denoted by the same sequence) such that for some (as in Eq. 3.8) we get
(3.29) |
Following similar steps as in Theorem 3.3, multiplying by a test function and letting $n \to \infty$, from Eq. 3.28 we deduce that the limit satisfies
(3.30) |
From the continuity results (Theorem 3.3), it is easy to see that for all . Moreover, applying the Itô-Krylov formula and using Eq. 3.30 we obtain
(3.31) |
Since both , are continuous, from Eq. 3.26 and Eq. 3.31, it follows that both (which is equal to ) and converge to the same limit. This completes the proof. ∎
Remark 3.2.
Note that in the above, we indirectly also showed the continuity of the value function in the control policy (under the topology defined); uniqueness of the solution to the PDE above implies continuity. This result, while it can be obtained from the analysis of Borkar [Bor89] (in a slightly more restrictive setup), is obtained here directly via a careful optimality analysis and has important consequences for numerical solutions and approximation results for both discounted and average cost optimality. This is studied in detail, with implications, in [YukselPradhan].
4. Analysis of Ergodic Cost
In this section we study the robustness problem for the ergodic cost criterion. The associated optimal control problem for this cost criterion has been studied extensively in the literature, see e.g., [ABG-book].
For this cost evaluation criterion we will study the robustness problem under two sets of assumptions: the first is a so-called near-monotonicity condition on the running cost, which discourages instability, and the second is Lyapunov stability.
4.1. Analysis under a near-monotonicity assumption
Here we assume that the cost function $c$ satisfies the following near-monotonicity condition:
(A6) It holds that
(4.1) \[ \liminf_{|x| \to \infty}\, \min_{u \in \mathbb{U}} c(x,u) \;>\; \mathscr{E}^*(c), \]
where $\mathscr{E}^*(c)$ is the optimal value defined in Eq. 2.5.
This condition penalizes the escape of probability mass to infinity. Since our running cost is bounded, it is easy to see that $\mathscr{E}^*(c) < \infty$. Recall that a stationary policy is said to be stable if the associated diffusion process is positive recurrent. It is known that under Eq. 4.1 an optimal control exists in the space of stable stationary Markov controls (see [ABG-book, Theorem 3.4.5]).
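As a simple illustration (a hypothetical example, not from the original setup): any bounded running cost that is uniformly larger than the optimal value outside a compact set is near-monotone; for instance,
\[ c(x,u) := \min\{ |x|,\, M \} \quad \text{with } M > \mathscr{E}^*(c), \]
since then $\liminf_{|x| \to \infty} \min_{u \in \mathbb{U}} c(x,u) = M > \mathscr{E}^*(c)$. Such costs discourage instability: letting the probability mass escape to infinity incurs a running cost strictly above the optimal value.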
Now from [ABG-book, Theorem 3.6.10], we have the following complete characterization of ergodic optimal control.
Theorem 4.1.
Suppose that Assumptions (A1)-(A4) and (A6) hold. Then there exists a unique solution pair $(V, \rho) \in C^2(\mathbb{R}^d) \times \mathbb{R}$, with $V(0) = 0$, $V$ bounded from below and $\rho \le \mathscr{E}^*(c)$, satisfying
(4.2) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u V(x) + c(x,u) \right] = \rho. \]
Moreover, we have
(i) $\rho = \mathscr{E}^*(c)$;
(ii) a stationary Markov control $v \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control if and only if it is a minimizing selector of Eq. 4.2, i.e., if and only if it satisfies
(4.3) \[ \bar b(x, v(x)) \cdot \nabla V(x) + \bar c(x, v(x)) = \min_{u \in \mathbb{U}} \left[ b(x,u) \cdot \nabla V(x) + c(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
We assume that for the approximated model, for each $n \in \mathbb{N}$, the running cost function $c_n$ satisfies the near-monotonicity condition Eq. 4.1 relative to $\mathscr{E}^*_n(c_n)$, i.e.,
(4.4) \[ \liminf_{|x| \to \infty}\, \min_{u \in \mathbb{U}} c_n(x,u) \;>\; \mathscr{E}^*_n(c_n). \]
Thus, in view of [ABG-book, Theorem 3.6.10], for the approximating model, for each we have the following theorem.
Theorem 4.2.
Suppose that Assumption (A5)(iii) holds. Then for each $n \in \mathbb{N}$ there exists a unique solution pair $(V_n, \rho_n) \in C^2(\mathbb{R}^d) \times \mathbb{R}$, with $V_n(0) = 0$, $V_n$ bounded from below and $\rho_n \le \mathscr{E}^*_n(c_n)$, satisfying
(4.5) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u V_n(x) + c_n(x,u) \right] = \rho_n. \]
Moreover, we have
(i) $\rho_n = \mathscr{E}^*_n(c_n)$;
(ii) a stationary Markov control $v_n \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control if and only if it is a minimizing selector of Eq. 4.5, i.e., if and only if it satisfies
(4.6) \[ \bar b_n(x, v_n(x)) \cdot \nabla V_n(x) + \bar c_n(x, v_n(x)) = \min_{u \in \mathbb{U}} \left[ b_n(x,u) \cdot \nabla V_n(x) + c_n(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
In view of the near-monotonicity assumption Eq. 4.4, for any minimizing selector of Eq. 4.5, it is easy to see that outside a compact set for some . Since is bounded from below, [ABG-book, Theorem 2.6.10(f)] asserts that is stable. Hence, we deduce that the optimal policies of the approximating models are stable. However, note that the compact set mentioned above may not be applicable uniformly for all , which turns out to be a consequential issue.
Now we want to show that as $n \to \infty$ the optimal value of the approximated model converges to the optimal value of the true model. Under the near-monotonicity assumption this result may not be true in general, due to the restricted uniqueness/non-uniqueness of the solution of the associated HJB equation (see e.g. [AA12], [AA13]). As a result of this, in [AA12], [M97] the authors have shown that for the optimal control problem the policy iteration algorithm (PIA) may fail to converge to the optimal value. In order to ensure convergence of the PIA, in addition to the near-monotonicity assumption, a blanket Lyapunov condition is assumed in [M97].
Accordingly, in this article, to guarantee the convergence $\mathscr{E}^*_n(c_n) \to \mathscr{E}^*(c)$, we will assume that
the family $\{\eta_n\}_{n \in \mathbb{N}}$ is tight, where $\eta_n$ is the unique invariant measure of the solution of Eq. 2.15 corresponding to $v^*_n$ (the optimal policies of the approximated models). One sufficient condition which ensures the required tightness is the following: there exists a pair of nonnegative inf-compact functions $(\mathcal{V}, h)$ such that $\mathcal{L}^n_u \mathcal{V}(x) \le C_0 - h(x)$ for some positive constant $C_0$ and for all $n \in \mathbb{N}$ and $(x,u) \in \mathbb{R}^d \times \mathbb{U}$.
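A minimal sketch of why such a pair yields tightness (assuming the stated drift condition, that $\mathcal{V}$ lies in the domain of the generators, and that each $\eta_n$ integrates $h$): integrating the inequality against the invariant measure $\eta_n$ of the optimal policy gives
\[ 0 = \int_{\mathbb{R}^d} \mathcal{L}^n_{v^*_n} \mathcal{V}\, \mathrm{d}\eta_n \;\le\; C_0 - \int_{\mathbb{R}^d} h\, \mathrm{d}\eta_n, \qquad \text{so} \qquad \sup_n \int_{\mathbb{R}^d} h\, \mathrm{d}\eta_n \le C_0, \]
and then, by Markov's inequality, $\sup_n \eta_n(\{ h > k \}) \le C_0 / k \to 0$ as $k \to \infty$; since the sub-level sets $\{ h \le k \}$ are compact, this is precisely tightness of $\{\eta_n\}$.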
Theorem 4.3.
Suppose that Assumptions (A1)-(A6) hold. Also, assume that the set $\{\eta_n\}_{n \in \mathbb{N}}$ is tight. Then, we have
(4.7) \[ \lim_{n \to \infty} \mathscr{E}^*_n(c_n) = \mathscr{E}^*(c). \]
Proof.
From Theorem 4.2, we know that for each there exists , , with and , satisfying
(4.8) |
where . Since , it follows that .
From [ABG-book, Theorem 3.6.6] (the standard vanishing discount asymptotics), we know that as $\alpha \to 0$ the difference $V^n_\alpha(x) - V^n_\alpha(0)$ converges to $V_n(x)$ and $\alpha V^n_\alpha(0) \to \rho_n$, where $V^n_\alpha$ is the solution of the $\alpha$-discounted HJB equation Eq. 3.3. Let
Since the map is continuous, it is easy to see that is closed and, due to the near-monotonicity assumption (see Eq. 4.4), it follows that is bounded. Therefore is a compact subset of . Since and is stable, from [ABG-book, Lemma 3.6.1], we have
(4.9) |
Now for any minimizing selector of Eq. 3.3, we get
Since for all , from estimate (3.6.9b) of [ABG-book, Lemma 3.6.3], it follows that
(4.10) |
for all , where is positive number such that and is a positive constant which depends only on and . Now combining Eq. 4.9 and Eq. 4.10, we obtain
(4.11) |
In view of assumption Eq. 4.4, one can choose independent of . Thus Eq. 4.11 implies that
(4.12) |
Hence, by the Banach-Alaoglu theorem and a standard diagonalization argument (as in Eq. 3.8), there exists such that along a sub-sequence
(4.13) |
Again, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Now, as before, multiplying by a test function , from Eq. 4.8, we obtain
By similar argument as in Theorem 3.3, in view of Eq. 4.13, letting it follows that
(4.14) |
Since is arbitrary and , we deduce that satisfies
Since $\mathfrak{U}_{\mathrm{SM}}$ is compact, along a further subsequence (denoted by the same sequence) in . Repeating the above argument, one can show that the pair satisfies
As we know for all (see, Eq. 4.8), it is easy to see that . Next we show that is bounded from below. From estimate (3.6.9a) of [ABG-book, Lemma 3.6.3], for each we have
(4.15) |
for some constant which depends only on and . Also, let be a sequence such that as , thus for each we have
(4.16) |
where the last inequality follows from the fact that .
(4.17) |
This implies that the limit . Note that
Since is tight, from [ABG-book, Lemma 3.2.6], we deduce that in total variation norm as , where is the unique invariant measure of Eq. 2.1 corresponding to . Thus, by writing
(4.18) |
and noting that the first term converges to zero by the total variation convergence of , while the second term converges by the convergence in the control topology on (as is fixed); in view of the fact that (continuously over control actions) we conclude that . Therefore, the pair , , which has the properties that and , is a compatible solution (see [AA13, Definition 1.1]) to Eq. 4.2. Since the solution to the equation Eq. 4.2 is unique (see [AA13, Theorem 1.1]), it follows that . This completes the proof of the theorem. ∎
In the following theorem, we prove existence and uniqueness of the solution of a certain Poisson equation. This will be useful in proving the robustness result.
Theorem 4.4.
Suppose that Assumptions (A1) - (A4) hold. Let be a stable control such that
(4.19) |
Then, there exists a unique pair , , with and and , satisfying
(4.20) |
Moreover, we have
-
(i)
.
-
(ii)
for all
(4.21)
Proof.
Since is bounded, we have . Also, since (see, Eq. 4.19) , from [ABG-book, Lemma 3.6.1], it follows that
(4.22) |
where and is the -discounted cost defined as in Eq. 2.2. It is known that is a solution to the Poisson equation (see [ABG-book, Lemma A.3.7])
(4.23) |
Since is compact, for some , we have . Thus from [ABG-book, Lemma 3.6.3], we deduce that for each there exist constants depending only on such that
(4.24) |
(4.25) |
Thus, arguing as in [ABG-book, Lemma 3.6.6], we deduce that there exists such that as , and and the pair satisfies
(4.26) |
By Eq. 4.22, we get . Now, in view of estimates Eq. 4.22 and Eq. 4.25, it is easy to see that
(4.27) |
Also, arguing as in Theorem 4.3 (see Section 4.1), from estimate Eq. 4.24 it follows that
(4.28) |
Now, applying the Itô-Krylov formula and using Eq. 4.26, we obtain
This implies
Since is stable, letting , we get
Now dividing both sides of the above inequality by and letting , it follows that
Thus, . This indeed implies that . The representation Eq. 4.21 of follows by closely mimicking the argument of [ABG-book, Lemma 3.6.9]. Therefore, we have a solution pair to Eq. 4.20 satisfying (i) and (ii).
Next we want to prove that the solution pair is unique. To this end, let , , with and and , satisfying
(4.29) |
Applying the Itô-Krylov formula and using Eq. 4.29, we obtain
(4.30) |
Since , from Eq. 4.30 we obtain . Now, from Eq. 4.26, applying the Itô-Krylov formula, we deduce that
(4.31) |
Since is stable and is bounded from below, for all we have
Hence, letting by Fatou’s lemma from Eq. 4.31, it follows that
Since , letting , we obtain
(4.32) |
From Eq. 4.21 and Eq. 4.32, it is easy to see that in . On the other hand, by Eq. 4.20 and Eq. 4.29, one has in . Hence, applying the strong maximum principle [GilTru, Theorem 9.6], one has . This proves uniqueness. ∎
Next we prove the robustness result, i.e., we prove that $\mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c)$ as $n \to \infty$, where $v^*_n$ is an optimal ergodic control of the approximated model (see Theorem 4.2). In order to establish this result we will also assume that the family $\{\tilde\eta_n\}$ is tight, where $\tilde\eta_n$ is the unique invariant measure of Eq. 2.1 corresponding to $v^*_n$.
Theorem 4.5.
Suppose that Assumptions (A1) - (A6) hold. Also, assume that
(4.33) |
and the sets and are tight. Then, we have
(4.34) |
Proof.
We shall follow a similar proof program as that of Theorem 3.4, under the discounted setup. Since is bounded, we have . From our assumption Eq. 4.33, we know that . Hence, from Theorem 4.4, there exists a unique pair , , with and , satisfying
(4.35) |
with . Moreover, in view of assumption Eq. 4.33, from Eq. 4.27 and Eq. 4.28, we have
(4.36) |
where are constants independent of . Thus by the Banach-Alaoglu theorem and standard diagonalization argument (as in Eq. 3.8), we deduce that exists such that along a sub-sequence
(4.37) |
Again, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Since $\mathfrak{U}_{\mathrm{SM}}$ is compact, along a further subsequence (without loss of generality denoted by the same sequence) we have as . Now, as before, multiplying by a test function and letting , from Eq. 4.35, we deduce that the pair , , satisfies
(4.38) |
Since for all , it is easy to see that . Also, by Eq. 4.36, it follows that . Hence, using Eq. 4.33 and Eq. 4.38, we have is stable. Since is tight, in view of [ABG-book, Lemma 3.2.6], it is easy to see that . Thus, by Lemma 4.4, we deduce that .
Note that
Since as (see, Theorem 4.3), to complete the proof we have to show that as . From Theorem 4.2, we know that the pair , , with , satisfies
(4.39) |
For any minimizing selector , rewriting Eq. 4.39, we get
(4.40) |
Now, in view of estimates Eq. 4.12 and Eq. 4.17, it follows that
(4.41) |
where are constants independent of . Hence, by the Banach-Alaoglu theorem and standard diagonalization argument (see Eq. 3.8), we have there exists such that along a sub-sequence
(4.42) |
Also, implies that along a further subsequence (denoted by the same sequence without loss of generality) . Since in , multiplying by test functions and letting , from Eq. 4.40, we obtain that the pair , satisfies
(4.43) |
From Eq. 4.41, it is easy to see that . Also, since for all , we have . Since is tight, arguing as in the proof of Theorem 4.3, we deduce that . Thus, by uniqueness of the solution of Eq. 4.43 (see Theorem 4.4) it follows that . Since both and converge to the same limit , we deduce that as . This completes the proof of the theorem. ∎
4.2. Analysis under Lyapunov stability
In this section we study the robustness problem for the ergodic cost criterion under a Lyapunov stability assumption. We assume the following Foster-Lyapunov condition on the dynamics.
(A7)
(i) There exist a positive constant $\kappa_0$ and a pair of inf-compact functions $(\mathcal{V}, h) \in C^2(\mathbb{R}^d) \times C(\mathbb{R}^d \times \mathbb{U})$ (i.e., the sub-level sets $\{\mathcal{V} \le k\}$, $\{h \le k\}$ are compact or empty sets in $\mathbb{R}^d$, $\mathbb{R}^d \times \mathbb{U}$, respectively, for each $k \in \mathbb{R}$) such that
(4.44) \[ \mathcal{L}_u \mathcal{V}(x) \le \kappa_0 - h(x,u) \quad \text{for all } (x,u) \in \mathbb{R}^d \times \mathbb{U}, \]
where $h$ is locally Lipschitz continuous in its first argument uniformly with respect to the second (see the illustration following this assumption).
(ii) The approximating models satisfy the analogous condition uniformly in $n$: with the same pair $(\mathcal{V}, h)$ and constant $\kappa_0$,
(4.45) \[ \mathcal{L}^n_u \mathcal{V}(x) \le \kappa_0 - h(x,u) \quad \text{for all } (x,u) \in \mathbb{R}^d \times \mathbb{U} \text{ and } n \in \mathbb{N}. \]
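As a simple sufficient condition (a hypothetical illustration, not part of the assumption itself): if $\sigma$ is bounded and the drift is uniformly inward-pointing, i.e. $b(x,u) \cdot x \le -\kappa |x|^2$ for all $|x| \ge R_0$ and $u \in \mathbb{U}$, then the quadratic pair
\[ \mathcal{V}(x) := 1 + |x|^2, \qquad h(x,u) := \kappa |x|^2 \]
satisfies Eq. 4.44, since $\mathcal{L}_u \mathcal{V}(x) = \mathrm{trace}(a(x)) + 2\, b(x,u) \cdot x \le \kappa_0 - \kappa |x|^2$ for a suitable constant $\kappa_0$, using the boundedness of $a$ and the affine growth of $b$ on $B_{R_0}$.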
Combining [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12], we have the following complete characterization of the ergodic optimal control.
Theorem 4.6.
Suppose that Assumptions (A1)-(A4) and (A7)(i) hold. Then the ergodic HJB equation
(4.46) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u V(x) + c(x,u) \right] = \rho \]
admits a unique solution $(V, \rho) \in C^2(\mathbb{R}^d) \times \mathbb{R}$ satisfying $V(0) = 0$ and $V \in o(\mathcal{V})$. Moreover, we have
(i) $\rho = \mathscr{E}^*(c)$;
(ii) a stationary Markov control $v \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control (i.e., $\mathscr{E}_x(c, v) = \mathscr{E}^*(c)$) if and only if it satisfies
(4.47) \[ \bar b(x, v(x)) \cdot \nabla V(x) + \bar c(x, v(x)) = \min_{u \in \mathbb{U}} \left[ b(x,u) \cdot \nabla V(x) + c(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d; \]
(iii) for any $v \in \mathfrak{U}_{\mathrm{SM}}$ satisfying Eq. 4.47, we have
(4.48) \[ \mathscr{E}_x(c, v) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^{v} \left[ \int_0^T \bar c(X_t, v(X_t))\, \mathrm{d}t \right] = \rho. \]
Again, from [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12], for the approximated model for each , we have the following complete characterization of the optimal control.
Theorem 4.7.
Suppose that Assumptions (A5) and (A7)(ii) hold. Then for each $n \in \mathbb{N}$ the ergodic HJB equation
(4.49) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u V_n(x) + c_n(x,u) \right] = \rho_n \]
admits a unique solution $(V_n, \rho_n) \in C^2(\mathbb{R}^d) \times \mathbb{R}$ satisfying $V_n(0) = 0$ and $V_n \in o(\mathcal{V})$. Moreover, we have
(i) $\rho_n = \mathscr{E}^*_n(c_n)$;
(ii) a stationary Markov control $v_n \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control (i.e., $\mathscr{E}^n_x(c_n, v_n) = \mathscr{E}^*_n(c_n)$) if and only if it satisfies
(4.50) \[ \bar b_n(x, v_n(x)) \cdot \nabla V_n(x) + \bar c_n(x, v_n(x)) = \min_{u \in \mathbb{U}} \left[ b_n(x,u) \cdot \nabla V_n(x) + c_n(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d; \]
(iii) for any $v_n$ satisfying Eq. 4.50, we have
(4.51) \[ \mathscr{E}^n_x(c_n, v_n) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^{v_n} \left[ \int_0^T \bar c_n(X^n_t, v_n(X^n_t))\, \mathrm{d}t \right] = \rho_n. \]
From [ABG-book, Lemma 3.7.8], it is easy to see that the functions are bounded from below. Next we show that under Assumption (A7), as the approximating models converge, the optimal value of the approximated model converges to the optimal value of the true model.
Theorem 4.8.
Suppose that Assumptions (A1)-(A5) and (A7) hold. Then, it follows that
(4.52) \[ \lim_{n \to \infty} \mathscr{E}^*_n(c_n) = \mathscr{E}^*(c). \]
Proof.
Since $c_n$ is bounded, we get $\rho_n \le M$. Also, Eq. 4.44 implies that every stationary Markov control is stable (see [ABG-book, Lemma 3.3.4] and [ABG-book, Lemma 3.2.4(b)]). Thus, from [ABG-book, Theorem 3.7.6], there exist constants depending only on the radius such that for all $n$, we have
(4.53) |
By a standard vanishing discount argument (see [ABG-book, Lemma 3.7.8]) as we have and . Hence the estimates Eq. 4.53 give us . Since the constant is independent of , by a standard diagonalization argument and the Banach-Alaoglu theorem, we can extract a subsequence such that for some (as in Eq. 3.8)
(4.54) |
Also, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Now multiplying both sides of the equation Eq. 4.49 by test functions , we obtain
As in Theorem 3.4, using Eq. 4.54 and letting it follows that satisfies
(4.55) |
Rewriting the equation Eq. 4.55, we have
where
In view of Eq. 4.54 and Assumptions (A1) and (A2), it is easy to see that where . Thus, by elliptic regularity [CL89, Theorem 3] (see also [GilTru, Theorem 9.19]), we obtain .
Next we want to show that . Since we have , where . Also, since is inf-compact, for large enough we have for all . Let be the solution of Eq. 2.15 corresponding to . Hence, in view of Eq. 4.45, by the Itô-Krylov formula, for any and we deduce that
where and . Letting , by Fatou’s lemma we obtain
(4.56) |
Again, by the Itô-Krylov formula, for any and we have
Thus, by Fatou’s lemma letting and using Eq. 4.56 we get
for some positive constant . Hence, by arguing as in the proof of [ABG-book, Lemma 3.7.2 (i)], we have
(4.57) |
Now, following the proof of [ABG-book, Lemma 3.7.8] (see, eq.(3.7.47)), it follows that
(4.58) |
We know that for the space is compactly embedded in where . Since for some positive constant which depends only on , we deduce that , where is a constant. Also, since from Eq. 4.58, it is easy to see that
(4.59) |
Therefore, by combining Eq. 4.54, Eq. 4.57 and Eq. 4.59, we obtain . Since the limit satisfies Eq. 4.46, by the uniqueness result of Theorem 4.6, we deduce that . This completes the proof of the theorem. ∎
The next theorem proves the existence of a unique solution to a certain equation in a suitable function space. This result will be very useful in establishing our robustness result.
Theorem 4.9.
Suppose that Assumptions (A1)-(A4) and (A7)(i) hold. Then for each there exists a unique solution pair for any satisfying
(4.60) |
Furthermore, we have
-
(i)
-
(ii)
for all , we have
(4.61)
Proof.
Existence of a solution pair for any satisfying (i) and (ii) follows from [ABG-book, Lemma 3.7.8]. Now we want to prove the uniqueness of the solutions of Eq. 4.60. Let for any be any other solution pair of Eq. 4.60 with . By the Itô-Krylov formula, for we obtain
(4.62) |
Note that
Thus, letting by monotone convergence theorem, we get
Since , in view of [ABG-book, Lemma 3.7.2 (ii)], letting , we deduce that
(4.63) |
Also, from [ABG-book, Lemma 3.7.2 (ii)], we have
Now, dividing both sides of Eq. 4.63 by and letting , we obtain
This implies that . Using Eq. 4.60, by the Itô-Krylov formula we have
(4.64) |
Also, by the Itô-Krylov formula and using Eq. 4.44, it follows that
Since , from the above estimate, we get
Thus, letting by Fatou’s lemma from Eq. 4.64, it follows that
Since , letting , we deduce that
(4.65) |
Since , from Eq. 4.61 and Eq. 4.65, it is easy to see that in . Also, since and are two solution pairs of Eq. 4.60, we have in . Hence, by the strong maximum principle [GilTru, Theorem 9.6], one has . This proves the uniqueness. ∎
Now we are ready to prove the robustness result, i.e., we want to show that $\mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c)$ as $n \to \infty$, where $v^*_n$ is an optimal ergodic control of the approximated model (see Theorem 4.7).
Theorem 4.10.
Suppose that Assumptions (A1)-(A5) and (A7) hold. Then, we have
(4.66) \[ \lim_{n \to \infty} \mathscr{E}_x(c, v^*_n) = \mathscr{E}^*(c). \]
Proof.
We shall follow a similar proof program as that of Theorem 3.4, under the discounted setup. From Theorem 4.9, we know that for each there exists a unique pair , , with satisfying
(4.67) |
In view of Eq. 4.44, it is easy to see that, each is stable and for any (see, [ABG-book, Lemma 3.3.4] and [ABG-book, Lemma 3.2.4(b)]). Thus, from [ABG-book, Theorem 3.7.4], it follows that where is a constant independent of . Therefore by the Banach-Alaoglu theorem and standard diagonalization argument (as in Eq. 3.8), we deduce that there exists such that along a sub-sequence
(4.68) |
Again, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Since $\mathfrak{U}_{\mathrm{SM}}$ is compact, along a further subsequence (without loss of generality denoted by the same sequence) we have as . Now, as in Theorem 3.4, multiplying by a test function and letting , from Eq. 4.67, it is easy to see that , , satisfies
(4.69) |
As we know that for all , we deduce that . Arguing as in Theorem 4.8 and using the estimate , we have
(4.70) |
Thus, by uniqueness of solution of Eq. 4.69 (see, Theorem 4.9), we deduce that .
By the triangle inequality
From Theorem 4.7 we have as . Hence to complete the proof we have to show that as . Now, for any minimizing selector of Eq. 4.49, we have
(4.71) |
In view of the estimate Eq. 4.53, we obtain
(4.72) |
where is a constant independent of . Hence, by the Banach-Alaoglu theorem and standard diagonalization argument (see Eq. 3.8), we have there exists such that along a sub-sequence
(4.73) |
Since, along a further subsequence (denoted by the same sequence without loss of generality), . As we know in , multiplying both sides of Eq. 4.71 by test functions and letting , it follows that , satisfies
(4.74) |
Arguing as in Theorem 4.8, one can show that . Hence, by uniqueness of the solution of Eq. 4.71 (see Theorem 4.9) we deduce that . Since both and converge to the same limit , it follows that as . This completes the proof of the theorem. ∎
5. Finite Horizon Cost
In this section we study the robustness problem under the finite horizon criterion. We will assume that the models satisfy the following:
(FN1) The functions $b$, $\sigma$ and $b_n$, $\sigma_n$, $n \in \mathbb{N}$, are uniformly bounded, i.e., they satisfy
\[ \sup_{n \in \mathbb{N}}\; \sup_{(x,u) \in \mathbb{R}^d \times \mathbb{U}} \left( |b(x,u)| + |b_n(x,u)| + \|\sigma(x)\| + \|\sigma_n(x)\| \right) \le C_1 \]
for some positive constant $C_1$. Furthermore, the terminal cost satisfies $h_T \in W^{2,p,\mu}(\mathbb{R}^d)$ for some $p \ge 2$, $\mu > 0$.
From [BL84-book, Theorem 3.3, p. 235], the finite horizon optimality equation (or, the HJB equation)
(5.1) \[ \frac{\partial \psi}{\partial t}(t,x) + \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u \psi(t,x) + c(x,u) \right] = 0, \quad (t,x) \in (0,T) \times \mathbb{R}^d, \]
(5.2) \[ \psi(T, x) = h_T(x), \]
admits a unique solution $\psi \in W^{1,2,p,\mu}((0,T) \times \mathbb{R}^d)$, for some $p \ge 2$ and $\mu > 0$. Now, by the Itô-Krylov formula (as in [HP09-book, Theorem 3.5.2]), there exists an optimal Markov policy, i.e., there exists $v^* \in \mathfrak{U}_{\mathrm{M}}$ such that $\hat J_T(x, v^*) = \hat J^*_T(x)$.
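A minimal sketch of how such an optimal Markov policy is read off from Eq. 5.1 (assuming a measurable selection, with the notation fixed above): one takes, for a.e. $(t,x)$,
\[ v^*(t,x) \in \operatorname*{arg\,min}_{u \in \mathbb{U}} \bigl[ b(x,u) \cdot \nabla \psi(t,x) + c(x,u) \bigr], \]
which exists by the compactness of $\mathbb{U}$ and a measurable selection theorem; applying the Itô-Krylov formula to $\psi(t, X_t)$ then verifies the optimality of $v^*$.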
Similarly, for each $n \in \mathbb{N}$ (for the approximating models) the optimality equation
(5.3) \[ \frac{\partial \psi_n}{\partial t}(t,x) + \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u \psi_n(t,x) + c_n(x,u) \right] = 0, \quad (t,x) \in (0,T) \times \mathbb{R}^d, \]
(5.4) \[ \psi_n(T, x) = h_T(x), \]
admits a unique solution $\psi_n \in W^{1,2,p,\mu}((0,T) \times \mathbb{R}^d)$. Moreover, by the Itô-Krylov formula (as in [HP09-book, Theorem 3.5.2]), there exists $v^*_n \in \mathfrak{U}_{\mathrm{M}}$ such that $\hat J^n_T(x, v^*_n) = \hat J^{n,*}_T(x)$.
The following theorem shows that as the approximating model approaches the true model, the optimal value of the approximating model converges to the optimal value of the true model.
Theorem 5.1.
Suppose Assumptions (A1), (A3) and (FN1) hold. Then $\hat J^{n,*}_T(x) \to \hat J^*_T(x)$ as $n \to \infty$, for all $x \in \mathbb{R}^d$.
Proof.
For any minimizing selector of Eq. 5.3, we have
(5.5) | ||||
(5.6) |
By the Itô-Krylov formula, it follows that
(5.7) |
This implies that
(5.8) |
Rewriting Eq. 5.5, it follows that
for some fixed . Thus, by the parabolic PDE estimate [BL84-book, eq. (3.8), p. 234], we deduce that
(5.9) |
Thus, from Eq. 5.8 and Eq. 5.9, we obtain for some positive constant (independent of ). Since is a reflexive Banach space, as a corollary of the Banach-Alaoglu theorem, there exists such that along a subsequence (without loss of generality denoted by the same sequence)
(5.10) |
Now, as in our earlier analysis for the different cost criteria considered, multiplying both sides of Eq. 5.3 by a test function and integrating, we get
(5.11) |
Thus, in view of Eq. 5.10, letting , from Eq. 5.11 it follows that (arguing as in Section 3 - Eq. 3.11)
Since is arbitrary, from the above equation we deduce that satisfies
(5.12) |
Since is the unique solution of Eq. 5.1-Eq. 5.2, we deduce that . This completes the proof. ∎
In the following theorem, we prove the robustness result for the finite horizon cost criterion.
Theorem 5.2.
Suppose Assumptions (A1), (A3) and (FN1) hold. Then for any optimal controls $v^*_n$ of the approximating models we have $\hat J_T(x, v^*_n) \to \hat J^*_T(x)$ as $n \to \infty$.
Proof.
Since the space is compact (with topology defined as in [YukselPradhan, Definition 2.2]), along a sub-sequence . From [BL84-book, Theorem 3.3, p. 235], we have that for each there exists a unique solution , , to the following Poisson equation
(5.13) |
By the Itô-Krylov formula, from Eq. 5.13 it follows that
(5.14) |
This gives us
(5.15) |
Arguing as in Theorem 5.1, letting $n \to \infty$ in Eq. 5.13, we deduce that there exists , , satisfying
(5.16) |
Now using Eq. 5.16, by the Itô-Krylov formula we deduce that
(5.17) |
6. Control up to an Exit Time
Before we conclude the paper, let us also briefly note that one may consider an optimal control problem up to an exit time, with the cost given as follows:
•
(in the true model:) for each $U \in \mathfrak{U}$ the associated cost is given as
\[ \hat J^e(x, U) := \mathbb{E}_x^U \left[ \int_0^{\tau(D)} e^{-\int_0^t \delta(X_s)\, \mathrm{d}s}\, \bar c(X_t, U_t)\, \mathrm{d}t + e^{-\int_0^{\tau(D)} \delta(X_s)\, \mathrm{d}s}\, h\bigl( X_{\tau(D)} \bigr) \right], \]
•
(in the approximated models:) for each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$ the associated cost is given as
\[ \hat J^e_n(x, U) := \mathbb{E}_x^U \left[ \int_0^{\tau_n(D)} e^{-\int_0^t \delta(X^n_s)\, \mathrm{d}s}\, \bar c_n(X^n_t, U_t)\, \mathrm{d}t + e^{-\int_0^{\tau_n(D)} \delta(X^n_s)\, \mathrm{d}s}\, h\bigl( X^n_{\tau_n(D)} \bigr) \right], \]
where $D \subset \mathbb{R}^d$ is a smooth bounded domain, $\tau(D)$, $\tau_n(D)$ are the first exit times of $X_\cdot$, $X^n_\cdot$ from $D$, $\delta \ge 0$ is the discount function and $h$ is the terminal cost function. In the true model the optimal value is defined as $\hat V^e(x) := \inf_{U \in \mathfrak{U}} \hat J^e(x, U)$, and in the approximated model the optimal value is defined as $\hat V^e_n(x) := \inf_{U \in \mathfrak{U}} \hat J^e_n(x, U)$. We assume that $\delta$ and $h$ are continuous on $\bar D$. As in [RZ21], [B05Survey, p. 229], the analysis leads to the HJB equations sketched below.
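A minimal sketch of the pair of Dirichlet problems in question (assuming the standard form of the exit-time criterion, as in [ABG-book, Theorem 3.5.3], with the notation fixed above):
\[ \min_{u \in \mathbb{U}} \bigl[ \mathcal{L}_u \hat V^e(x) + c(x,u) \bigr] = \delta(x)\, \hat V^e(x) \quad \text{in } D, \qquad \hat V^e = h \ \text{ on } \partial D, \]
and analogously for the approximating models, with $(\mathcal{L}^n_u, c_n, \hat V^e_n)$ in place of $(\mathcal{L}_u, c, \hat V^e)$.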
By a similar argument as in [ABG-book, Theorem 3.5.3], [ABG-book, Theorem 3.5.6], we have that $\hat V^e$, $\hat V^e_n$ are the unique solutions to their respective HJB equations: existence follows by utilizing the Leray-Schauder fixed point theorem as in [ABG-book, Theorem 3.5.3], and uniqueness follows by the Itô-Krylov formula as in [ABG-book, Theorem 3.5.6]. Using standard elliptic PDE estimates (on the bounded domain $D$) and closely mimicking the arguments of Theorem 3.3, we have the following continuity result.
Theorem 6.1.
Suppose Assumptions (A1)-(A5) hold. Then $\hat V^e_n(x) \to \hat V^e(x)$ as $n \to \infty$, for all $x \in D$.
For each $n \in \mathbb{N}$, suppose that $v^*_n$, $v^*$ are optimal controls of the approximated model and the true model, respectively. Then, in view of the above continuity result, following the steps of the proof of Theorem 3.4, we obtain the following robustness result.
Theorem 6.2.
Suppose Assumptions (A1)-(A5) hold. Then $\hat J^e(x, v^*_n) \to \hat V^e(x)$ as $n \to \infty$, for all $x \in D$.
7. Revisiting Example 2.1
Consider Example 2.1(i).
•
Discounted cost: For each $n \in \mathbb{N}$, let $v^*_n$ be a discounted cost optimal control when the system is governed by Eq. 2.23. Then, in view of Theorem 3.4, we have
(7.1) \[ J_\alpha(x, v^*_n) \to V_\alpha(x) \quad \text{as } n \to \infty. \]
•
Ergodic cost: For each $n \in \mathbb{N}$, let $v^*_n$ be an ergodic optimal control when the system is governed by Eq. 2.23. Then, in view of Theorem 4.5 (respectively, Theorem 4.10), we have
(7.2) \[ \mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c) \quad \text{as } n \to \infty. \]
•
Finite horizon cost: For each $n \in \mathbb{N}$, let $v^*_n$ be a finite horizon optimal control when the system is governed by Eq. 2.23. Then, in view of Theorem 5.2, we have
(7.3) \[ \hat J_T(x, v^*_n) \to \hat J^*_T(x) \quad \text{as } n \to \infty. \]
•
Cost up to an exit time: For each $n \in \mathbb{N}$, let $v^*_n$ be an optimal control when the system is governed by Eq. 2.23. Then Theorem 6.2 ensures that
(7.4) \[ \hat J^e(x, v^*_n) \to \hat V^e(x) \quad \text{as } n \to \infty. \]
8. Conclusion
In this paper, we studied the continuity of optimal costs and the robustness/stability of optimal control policies designed for an incorrect model and applied to the actual model, under discounted, ergodic, finite horizon, and exit-time cost criteria. In our analysis we have crucially used the fact that the actual model is a non-degenerate diffusion model. It would be an interesting problem to investigate whether such results can be proved when the limiting (actual) system is a degenerate diffusion. Also, in our analysis we have assumed that the system noise is given by a Wiener process; it would be interesting to study further noise processes, e.g., when the system noise is a wide-bandwidth process or a more general discontinuous martingale (as in [K90], [KR87], [KR87a], [KR88]). In the latter case the controlled process may become non-Markovian even under stationary Markov policies; it is then reasonable to find a suitable Markovian approximation which maintains the necessary properties of the original system. The analysis of robustness problems in this setting is a direction of research worth pursuing.