Robustness of Stochastic Optimal Control to Approximate Diffusion Models under Several Cost Evaluation Criteria
Abstract.
In control theory, typically a nominal model is assumed, based on which an optimal control is designed and then applied to an actual (true) system. This gives rise to the problem of performance loss due to the mismatch between the true model and the assumed model. A robustness problem in this context is to show that the error due to the mismatch between a true model and an assumed model decreases to zero as the assumed model approaches the true model. We study this problem when the state dynamics of the system are governed by controlled diffusion processes. In particular, we will discuss continuity and robustness properties of finite horizon and infinite-horizon discounted/ergodic optimal control problems for a general class of non-degenerate controlled diffusion processes, as well as for optimal control up to an exit time. Under a general set of assumptions and a convergence criterion on the models, we first establish that the optimal value of the approximate model converges to the optimal value of the true model. We then establish that the error incurred by applying a control policy, designed for an incorrectly estimated model, to the true model decreases to zero as the incorrect model approaches the true model. We will see that, compared to related results in the discrete-time setup, the continuous-time theory lets us utilize the strong regularity properties of solutions to optimality (HJB) equations, via the theory of uniformly elliptic PDEs, to arrive at strong continuity and robustness properties.
Key words and phrases:
Robust control, Controlled diffusions, Hamilton-Jacobi-Bellman equation, Stationary control
2000 Mathematics Subject Classification: Primary: 93E20, 60J60; secondary: 49J55
1. Introduction
In stochastic control applications, typically only an ideal model is assumed, or learned from available incomplete data, based on which an optimal control is designed and then applied to the actual system. This gives rise to the problem of performance loss due to the mismatch between the actual system and the assumed system. A robustness problem in this context is to show that the error due to mismatch decreases to zero as the assumed system approaches the actual system. With this motivation, in this article our goal is to study the continuity and robustness properties of finite horizon and infinite horizon discounted/ergodic cost problems for a large class of multidimensional controlled diffusions. We note that the problems of existence, uniqueness and verification of optimality of stationary Markov policies have been studied extensively in the literature; see e.g. [Bor-book], [HP09-book] (finite horizon), [BS86], [BB96] (discounted cost), [AA12], [AA13], [BG88I], [BG90b] (ergodic cost), and references therein. For a book-length exposition of this topic see e.g. [ABG-book].
In more explicit terms, here is the problem that we will study (for a precise statement see Section 2.3). Suppose that our true model is represented by a system model together with an associated running cost function (see, e.g., Eq. 2.1), where the system model specifies the evolution via the drift and diffusion terms, and let a sequence of approximating models with associated running cost functions be given (see, e.g., Eq. 2.15), such that the approximating models converge to the true model in a sense to be made precise. Suppose that for each choice of control policy the associated total costs in the true and approximating models are given by the respective cost criteria defined below. The objective of the controller is to minimize the total cost over all admissible policies. If we denote the optimal control policies of the true and approximating models by $v^*$ and $v^*_n$, respectively, the performance loss due to mismatch is the difference between the cost of applying $v^*_n$ to the true model and the optimal cost of the true model. Thus the robustness problem in this context is to show that this performance loss vanishes as $n \to \infty$. See Section 2.3. In this sense, our paper can be viewed as a continuous-time counterpart of the setting studied in [KY-20], [KRY-20].
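For instance, in the discounted setting of Section 3 below, writing $V_\alpha$, $V^n_\alpha$ for the optimal values of the true and approximating models, $J_\alpha(x, U)$ for the true-model cost of a policy $U$, and $v^*_n$ for an optimal policy of the $n$-th approximating model, the mismatch splits as (a minimal sketch, in notation fixed in Section 2):
\[ \bigl| J_\alpha(x, v^*_n) - V_\alpha(x) \bigr| \;\le\; \bigl| J_\alpha(x, v^*_n) - V^n_\alpha(x) \bigr| \;+\; \bigl| V^n_\alpha(x) - V_\alpha(x) \bigr|, \]
so that continuity of the value functions (the second term, Theorem 3.3) together with control of the first term yields robustness (Theorem 3.4).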
This problem is of major practical importance and, accordingly, there have been many studies. Most of the existing works in this direction are concerned with discrete-time Markov decision processes; see for instance [KY-20], [KRY-20], [BJP02], [KV16], [NG05], [SX15], and references therein.
We should note that the term robustness has various interpretations, contexts and solution methods. A common approach to robustness in the literature has been to design controllers that work sufficiently well for all possible uncertain systems under some structured constraints, such as norm-bounded perturbations (see [basbern], [zhou1996robust]). For such problems, the design of robust controllers has often been developed through a game theoretic formulation where the minimizer is the controller and the maximizer is the uncertainty. In [DJP00], [jacobson1973optimal] the authors established the connections of this formulation to risk sensitive control. Using Legendre-type transforms, relative entropy constraints came into the literature to probabilistically model the uncertainties, see e.g. [dai1996connections, Eqn. (4)] or [DJP00, Eqns. (2)-(3)]. Here, one selects a nominal system satisfying a relative entropy bound between the actual measure and the nominal measure, solves a risk sensitive optimal control problem, and this solution value provides an upper bound on the original system performance. Therefore, a common approach in robust stochastic control has been to consider all models which satisfy certain bounds in terms of the relative entropy pseudo-distance (or Kullback-Leibler divergence); see e.g. [DJP00, dai1996connections, dupuis2000kernel, boel2002robustness] among others. In order to quantify the uncertainty in the system models, various other metrics/criteria besides the relative entropy pseudo-distance have also been used in the literature. In [tzortzis2015dynamic], for discrete-time controlled models, the authors have studied a min-max formulation for robust control where the one-stage transition kernel belongs to a ball under the total variation metric for each state-action pair. For distributionally robust stochastic optimization problems, it is assumed that the underlying probability measure of the system lies within an ambiguity set, and a worst-case single-stage optimization is made over the probability measures in the ambiguity set. To construct ambiguity sets, [blanchet2016], [esfahani2015] use the Wasserstein metric, [erdogan2005] uses the Prokhorov metric which metrizes the weak topology, [sun2015] uses the total variation distance, and [lam2016] works with relative entropy. For fully observed finite state-action space models with uncertain transition probabilities, the authors in [iyengar2005robust], [nilim2005robust] have studied robust dynamic programming approaches through a min-max formulation. Similar work with model uncertainty includes [oksendal2014forward], [benavoli2011robust], [xu_mannor]. In the economics literature, related work has been done in [hansen2001robust], [gossner2008entropy].
The robustness formulation we study has been considered in [KY-20], [KRY-20] for discrete-time models, where the authors studied both the continuity of value functions as transition kernel models converge and the robustness problem where an optimal control designed for an incorrect approximate model is applied to a true model and the mismatch term is studied. The solution approach is fundamentally different in the continuous-time analysis we present in this paper. In a related study [Dean18], the author studied the optimal control of systems with unknown dynamics in a linear quadratic regulator setup and proposed an algorithm to learn the system from observed data with quantitative convergence bounds. The author in [Lan81, Theorem 5.1] considered fully observed discrete-time controlled models, established continuity results for approximate models, and gave a set convergence result for sets of optimal control actions; this set convergence result is inconclusive for robustness without further assumptions on the true system model (for more details see [KY-20]). For fully observed MDPs, [muller1997does] studied continuity of the value function under a general metric defined as the integral probability metric, which captures both the total variation metric and the Kantorovich metric under different setups (and which is not weaker than the metrics leading to weak convergence). A recent study on game problems along a similar theme is presented in [subramanian2021robustness].
For control problems of MDPs with standard Borel spaces, the approximation methods through quantization, which lead to finite models, can be viewed as approximations of transition kernels, but this interpretation requires caution: indeed, [SaYuLi17, arruda2012, arruda2013], among many others, study approximation methods for MDPs where the convergence of approximate models is satisfied in a particularly constructed fashion. Reference [SaYuLi17] presents a construction for the approximate models through quantizing the actual model with continuous spaces (leading to a finite space model), which allows for continuity and robustness results under only a weak continuity assumption on the true transition kernel, which, in turn, leads to the weak convergence of the approximate models. For both fully observed and partially observed models, a detailed analysis of approximation methods for continuous state and action spaces can be found in [SaLiYuSpringer].
The literature on robustness of stochastic optimal control for continuous-time systems seems rather limited; see e.g. [GL99], [LJE15], [hansen2001robust]. In [GL99] the authors considered the problem of controlling a system whose dynamics are given by a stochastic differential equation (SDE) whose coefficients are known only up to a certain degree of accuracy. For the finite horizon reward maximization problem, using the technique of contractive operators, [GL99] obtained upper bounds on the performance loss due to mismatch (or, “robustness index”) and showed by an example that the robustness index may behave abnormally even if the value functions converge. The associated discounted payoff maximization problem has been studied in [LJE15], where, using a Lyapunov type stability assumption, the authors studied the robustness problem via a game theoretic formulation. For controlled diffusion models, the authors in [hansen2001robust] described the links between the max-min expected utility theory and the applications of robust control theory, in analogy with some of the papers on discrete-time models noted above adopting a min-max formulation. Along a further direction, for controlled diffusions, via the Maximum Principle technique, [PDPB02a], [PDPB02b], [PDPB02c] have established the robustness of optimal controls for the finite horizon payoff criterion.
In a recent comprehensive work [RZ21], the authors have studied the robustness of feedback relaxed controls for a continuous-time stochastic exit time problem. Under sufficient smoothness assumptions on the coefficients (i.e., uniform Lipschitz continuity of the diffusion coefficients and uniform Hölder continuity of the discount factor and payoff function on a fixed bounded domain), they established that a regularized control problem admits a Hölder continuous optimal feedback control, and they also showed that both the value function and the feedback control of the regularized control problem are Lipschitz stable with respect to parameter perturbations when the action space is finite. It is known that the optimal control obtained from the HJB equation (i.e., the argmin function) is in general unstable with respect to perturbations of the coefficients; in practice, this results in numerical instability of learning algorithms (as noted in [RZ21]).
Stability/continuity of solutions of PDEs with respect to coefficient perturbations is a significant mathematical and practical question in PDE theory (see e.g. [WLS01], [SI72]). The continuity results established in this paper (see Theorems 3.3, 4.3, 4.8) provide sufficient conditions which ensure stability of solutions of semilinear elliptic PDEs (HJB equations) in the whole space $\mathbb{R}^d$.
Our robustness results will also be useful in the study of robust optimal investment problems for local volatility models, e.g. as given in [AS08, Remark 2.1] (see also [KT12], [BDD20]).
When the system noise is not given by a Wiener process but by a general wide bandwidth noise (or a more general discontinuous martingale [LRT00]), the controlled process becomes non-Markovian even under stationary Markov policies. The general method for studying optimal control problems for such systems is to find suitable Markovian processes which approximate the non-Markovian process (see [K90], [KR87], [KR87a], [KR88]). For wide bandwidth noise driven controlled systems, [K90], [KR87], [KR87a], [KR88] used diffusion approximation techniques to study stochastic optimal control problems. The results described in this paper are complementary to the above mentioned works on the diffusion approximation of wide bandwidth noise driven systems.
Contributions and main results. In the present paper, our aim is to study the continuity and robustness properties for a general class of controlled diffusion processes in $\mathbb{R}^d$ for both infinite horizon discounted and ergodic costs, where the action space is a (general) compact metric space. As in [KY-20], [KRY-20], in order to establish our desired robustness results we will use the continuity result as an intermediate step. For the discounted cost case, we will establish our results following a direct approach (under a relatively weak set of assumptions on the diffusion coefficients, i.e., locally Lipschitz continuous coefficients). Using the results on existence and uniqueness of solutions of the associated discounted Hamilton-Jacobi-Bellman (HJB) equation and the complete characterization of (discounted) optimal policies in the space of stationary Markov policies (see [ABG-book, Theorem 3.5.6]), we first establish the continuity of value functions. Then, utilizing this continuity of value functions, we derive a robustness result. The analysis of the ergodic cost (or long-run expected average cost) is somewhat more involved. To the best of our knowledge there is no work on continuity and robustness properties of optimal controls for the ergodic cost criterion in the existing literature (for the discrete-time setup, see [KRY-20]). We study these ergodic cost problems under two sets of assumptions: in the first case, we assume that our running cost function satisfies a near-monotone type structural assumption (see Eq. 4.1, Assumption (A6)), and in the second case we assume Lyapunov type stability assumptions on the dynamics of the system (see Assumption (A7)).
One of the major issues in analyzing the robustness of ergodic optimal controls under the near-monotone hypothesis is the non-uniqueness/restricted uniqueness of solutions of the associated HJB equation (see [ABG-book, Example 3.8.3], [AA13]). It is shown in [ABG-book, Example 3.8.3] that the ergodic HJB equation may admit uncountably many solutions. Considering this, in [AA13, Theorem 1.1] the author has established the uniqueness of compatible solution pairs (see [AA13, Definition 1.1]). Exploiting this uniqueness result, under a suitable tightness assumption (on a certain set of invariant measures) we establish the desired robustness result. Under the Lyapunov type stability assumption it is known that the ergodic HJB equation admits a unique solution in a certain class of functions, and the complete characterization of ergodic optimal controls is also known (see [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12]). Utilizing this characterization of optimal controls, we derive the robustness properties of ergodic optimal controls under a Lyapunov stability assumption.
We also emphasize the contrast between the PDE approach and a probabilistic flow approach to studying robustness. The PDE approach presents a very general and conclusive, yet concise and unified, treatment of several cost criteria (notably, a probabilistic approach via Dynkin's lemma would require separate arguments for the discounted infinite-horizon and average cost infinite-horizon criteria), and such a unified approach had not been considered earlier, to our knowledge.
Thus, the main results of this article can be roughly described as follows.
•
For the discounted cost criterion: We establish continuity of value functions and provide sufficient conditions which ensure robustness/stability of optimal controls designed under model uncertainty.
•
For the ergodic cost criterion: Under two different sets of assumptions ((i) the running cost is near-monotone, or (ii) a Lyapunov stability condition holds), we establish the continuity of value functions and, exploiting the continuity results, we derive the robustness/stability of ergodic optimal controls designed for approximate models and applied to actual systems.
•
For the finite horizon cost criterion: Under uniform boundedness assumptions on the drift terms and diffusion matrices (of the true and approximating models), we establish continuity of value functions. Then, exploiting the continuity result, we prove the robustness/stability of optimal controls designed under model uncertainty.
•
For cost up to an exit time: As with the above criteria, under a mild set of assumptions we first establish the continuity of value functions and then, using the continuity results, we establish the robustness/stability of optimal controls designed under model uncertainty.
We will see that, compared with the discrete-time counterpart of this problem studied in [KY-20] (discounted cost) and [KRY-20] (average cost), where value iteration methods were crucially used, in our analysis here we develop rather direct arguments, with strong implications, utilizing regularity properties of value functions: in the discrete-time setup, these properties need to be established via tedious arguments, whereas the continuous-time theory allows for the use of regularity properties of solutions to PDEs. Nonetheless, we will see that continuous convergence in control actions of models and cost functions is a unifying condition for continuity and robustness properties in both the discrete-time setup studied in [KY-20] (discounted cost) and [KRY-20] (average cost) and our current paper. Compared to [RZ21], in addition to the infinite horizon criteria we study, the perturbations we consider are not restricted to coefficient/parameter variations (i.e., we consider functional perturbations), and the action space we consider is uncountable, though, unlike [RZ21], we do not establish the Lipschitz property of control policies.
The rest of the paper is organized as follows. Section 2 introduces the problem setup and summarizes the notation. Section 3 is devoted to the analysis of robustness of optimal controls for the discounted cost criterion. In Section 4 we provide the analysis of robustness of ergodic optimal controls under two different sets of hypotheses: (i) near-monotonicity and (ii) Lyapunov stability. For the finite horizon cost criterion the robustness problem is analyzed in Section 5. The robustness problem for optimal controls up to an exit time is considered in Section 6.
2. Description of the problem
Let $\mathbb{U}$ be a compact metric space and $\mathcal{P}(\mathbb{U})$ be the space of probability measures on $\mathbb{U}$ with the topology of weak convergence. Let
\[ b : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}^d, \qquad \sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d} \]
be given functions. We consider a stochastic optimal control problem whose state evolves according to a controlled diffusion process given by the solution of the following stochastic differential equation (SDE)
(2.1) \[ \mathrm{d}X_t = b(X_t, U_t)\, \mathrm{d}t + \sigma(X_t)\, \mathrm{d}W_t, \qquad X_0 = x \in \mathbb{R}^d, \]
where
•
$W_\cdot$ is a $d$-dimensional standard Wiener process, defined on a complete probability space $(\Omega, \mathfrak{F}, \mathbb{P})$.
•
We extend the drift term $b$ as follows: $\bar b(x, v) := \int_{\mathbb{U}} b(x,u)\, v(\mathrm{d}u)$ for $v \in \mathcal{P}(\mathbb{U})$.
•
$U_\cdot$ is a $\mathcal{P}(\mathbb{U})$-valued process satisfying the following non-anticipativity condition: for $s < t$, the increment $W_t - W_s$ is independent of the completion of $\sigma\{X_0, U_r, W_r : r \le s\}$.
The process $U_\cdot$ is called an admissible control, and the set of all admissible controls is denoted by $\mathfrak{U}$ (see [BG90]).
To ensure existence and uniqueness of strong solutions of Eq. 2.1, we impose the following assumptions on the drift $b$ and the diffusion matrix $\sigma$.
(A1) Local Lipschitz continuity: The functions $b$ and $\sigma$ are locally Lipschitz continuous in $x$ (uniformly with respect to the control action $u$ for $b$). In particular, for some constant $C_R > 0$ depending on $R > 0$, we have
\[ |b(x,u) - b(y,u)|^2 + \|\sigma(x) - \sigma(y)\|^2 \;\le\; C_R\, |x - y|^2 \]
for all $x, y \in B_R$ and $u \in \mathbb{U}$, where $\|\sigma\|^2 := \mathrm{trace}(\sigma \sigma^{\mathsf{T}})$. Also, we assume that $b$ is jointly continuous in $(x,u)$.
(A2) Affine growth condition: $b$ and $\sigma$ satisfy a global growth condition of the form
\[ |b(x,u)|^2 + \|\sigma(x)\|^2 \;\le\; C_0 \left( 1 + |x|^2 \right) \quad \text{for all } (x,u) \in \mathbb{R}^d \times \mathbb{U}, \]
for some constant $C_0 > 0$.
(A3) Nondegeneracy: For each $R > 0$, it holds that
\[ \sum_{i,j=1}^d a^{ij}(x)\, z_i z_j \;\ge\; C_R^{-1}\, |z|^2 \quad \text{for all } x \in B_R \]
and for all $z = (z_1, \ldots, z_d)^{\mathsf{T}} \in \mathbb{R}^d$, where $a := \sigma \sigma^{\mathsf{T}}$.
By a Markov control we mean an admissible control of the form $U_t = v(t, X_t)$ for some Borel measurable function $v : [0,\infty) \times \mathbb{R}^d \to \mathcal{P}(\mathbb{U})$. The space of all Markov controls is denoted by $\mathfrak{U}_{\mathrm{M}}$. If the function $v$ is independent of $t$, then $U_\cdot$ (or, by an abuse of notation, $v$ itself) is called a stationary Markov control. The set of all stationary Markov controls is denoted by $\mathfrak{U}_{\mathrm{SM}}$. From [ABG-book, Section 2.4], the set $\mathfrak{U}_{\mathrm{SM}}$ is metrizable with a compact metric under the following topology: a sequence $v_n \to v$ in $\mathfrak{U}_{\mathrm{SM}}$ if and only if
\[ \int_{\mathbb{R}^d} f(x) \int_{\mathbb{U}} g(x,u)\, v_n(\mathrm{d}u \mid x)\, \mathrm{d}x \;\longrightarrow\; \int_{\mathbb{R}^d} f(x) \int_{\mathbb{U}} g(x,u)\, v(\mathrm{d}u \mid x)\, \mathrm{d}x \]
for all $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ and $g \in C_b(\mathbb{R}^d \times \mathbb{U})$ (for more details, see [ABG-book, Lemma 2.4.1]). It is well known that under the hypotheses (A1)–(A3), for any admissible control, Eq. 2.1 has a unique strong solution [ABG-book, Theorem 2.2.4], and under any stationary Markov strategy Eq. 2.1 has a unique strong solution which is a strong Feller (therefore strong Markov) process [ABG-book, Theorem 2.2.12].
2.1. Cost Criteria
Let $c : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}_+$ be the running cost function. We assume that
(A4) The running cost $c$ is bounded (i.e., $\|c\|_\infty \le M$ for some positive constant $M$), jointly continuous in $(x,u)$ and locally Lipschitz continuous in its first argument uniformly with respect to $u \in \mathbb{U}$.
This condition (A4) can also be relaxed to (A4)′, to be presented further below, where the local Lipschitz property is eliminated.
We extend $c$ to $\mathbb{R}^d \times \mathcal{P}(\mathbb{U})$ as follows: $\bar c(x, v) := \int_{\mathbb{U}} c(x,u)\, v(\mathrm{d}u)$ for $v \in \mathcal{P}(\mathbb{U})$.
In this article, we consider the problem of minimizing the finite horizon, discounted, ergodic, and up-to-an-exit-time cost criteria:
Discounted cost criterion. For $U \in \mathfrak{U}$, the associated $\alpha$-discounted cost is given by
(2.2) \[ J_\alpha(x, U) := \mathbb{E}_x^U \left[ \int_0^\infty e^{-\alpha t}\, \bar c(X_t, U_t)\, \mathrm{d}t \right], \]
where $\alpha > 0$ is the discount factor, $X_\cdot$ is the solution of Eq. 2.1 corresponding to $U \in \mathfrak{U}$, and $\mathbb{E}_x^U$ is the expectation with respect to the law of the process $X_\cdot$ with initial condition $X_0 = x$. The controller tries to minimize Eq. 2.2 over the admissible policies $\mathfrak{U}$. Thus, a policy $U^* \in \mathfrak{U}$ is said to be optimal if for all $x \in \mathbb{R}^d$
(2.3) \[ J_\alpha(x, U^*) = \inf_{U \in \mathfrak{U}} J_\alpha(x, U) =: V_\alpha(x), \]
where $V_\alpha(x)$ is called the optimal value.
Ergodic cost criterion. For each $U \in \mathfrak{U}$, the associated ergodic cost is defined as
(2.4) \[ \mathscr{E}_x(c, U) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^U \left[ \int_0^T \bar c(X_t, U_t)\, \mathrm{d}t \right], \]
and the optimal value is defined as
(2.5) \[ \mathscr{E}^*(c) := \inf_{x \in \mathbb{R}^d} \inf_{U \in \mathfrak{U}} \mathscr{E}_x(c, U). \]
Then a control $U^* \in \mathfrak{U}$ is said to be optimal if we have
(2.6) \[ \mathscr{E}_x(c, U^*) = \mathscr{E}^*(c). \]
Finite horizon cost. For $U \in \mathfrak{U}$ and $T > 0$, the associated finite horizon cost is given by
(2.7) \[ \hat J_T(x, U) := \mathbb{E}_x^U \left[ \int_0^T \bar c(X_t, U_t)\, \mathrm{d}t + h_T(X_T) \right], \]
where $h_T$ is the terminal cost. The optimal value is defined as
(2.8) \[ \hat J^*_T(x) := \inf_{U \in \mathfrak{U}} \hat J_T(x, U). \]
Thus, a policy $U^* \in \mathfrak{U}$ is said to be (finite horizon) optimal if we have
(2.9) \[ \hat J_T(x, U^*) = \hat J^*_T(x). \]
Control up to an exit time. This criterion will be presented in Section 6. Our analysis for this criterion will be immediate given the study involving the above criteria.
We define a family of operators $\mathcal{L}_u$ mapping $C^2(\mathbb{R}^d)$ to $C(\mathbb{R}^d)$ by
(2.10) \[ \mathcal{L}_u f(x) := \frac{1}{2}\, \mathrm{trace}\left( a(x)\, \nabla^2 f(x) \right) + b(x,u) \cdot \nabla f(x), \]
for $u \in \mathbb{U}$, where $a := \sigma \sigma^{\mathsf{T}}$. For $v \in \mathcal{P}(\mathbb{U})$ we extend $\mathcal{L}_u$ as follows:
(2.11) \[ \mathcal{L}_v f(x) := \int_{\mathbb{U}} \mathcal{L}_u f(x)\, v(\mathrm{d}u). \]
For $v \in \mathfrak{U}_{\mathrm{SM}}$, we define
(2.12) \[ \mathcal{L}_v f(x) := \frac{1}{2}\, \mathrm{trace}\left( a(x)\, \nabla^2 f(x) \right) + \bar b(x, v(x)) \cdot \nabla f(x). \]
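For orientation, a standard identity underlying the arguments below (sketched here under (A1)-(A3), for $f \in C_c^2(\mathbb{R}^d)$): the operator family in Eq. 2.10 acts as the controlled extended generator of Eq. 2.1 in the sense of Dynkin's formula,
\[ \mathbb{E}_x^U \left[ f(X_t) \right] - f(x) \;=\; \mathbb{E}_x^U \left[ \int_0^t \mathcal{L}_{U_s} f(X_s)\, \mathrm{d}s \right], \]
which is the identity invoked repeatedly below through the Itô-Krylov formula.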
We are interested in the robustness of optimal controls under these criteria. To this end, we now introduce our approximating models.
2.2. Approximating Controlled Diffusion Processes
Let $b_n : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}^d$, $\sigma_n : \mathbb{R}^d \to \mathbb{R}^{d \times d}$, $c_n : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}_+$, $n \in \mathbb{N}$, be sequences of functions satisfying the following assumptions.
(A5)
(i) As $n \to \infty$,
(2.13) \[ b_n \to b, \qquad \sigma_n \to \sigma, \qquad c_n \to c, \quad \text{uniformly over compact subsets of their domains.} \]
(ii) Continuous convergence in controls: for any sequence $u_n \to u$ in $\mathbb{U}$,
(2.14) \[ b_n(x, u_n) \to b(x, u) \quad \text{and} \quad c_n(x, u_n) \to c(x, u) \quad \text{as } n \to \infty \]
(see the illustration after this assumption).
(iii) For each $n \in \mathbb{N}$, $b_n$ and $\sigma_n$ satisfy Assumptions (A1)-(A3), and $c_n$ is uniformly bounded (in particular, $\sup_n \|c_n\|_\infty \le M$, where $M$ is the positive constant in (A4)), jointly continuous in $(x,u)$ and locally Lipschitz continuous in its first argument uniformly with respect to $u \in \mathbb{U}$.
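As a simple illustration of (A5)(i)-(ii) (a hypothetical construction, in the spirit of Example 2.1(iv) below): take
\[ b_n(x,u) := b(x,u) + \tfrac{1}{n}\, \phi(x,u), \qquad \sigma_n := \sigma, \qquad c_n := c, \]
for some bounded continuous perturbation $\phi$. Then for any $u_n \to u$ one has $b_n(x, u_n) \to b(x,u)$, by the joint continuity of $b$ and the boundedness of $\phi$, so that both the convergence and the continuous convergence requirements hold.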
For each $n \in \mathbb{N}$, let $X^n_\cdot$ be the solution of the following SDE
(2.15) \[ \mathrm{d}X^n_t = b_n(X^n_t, U_t)\, \mathrm{d}t + \sigma_n(X^n_t)\, \mathrm{d}W_t, \qquad X^n_0 = x \in \mathbb{R}^d. \]
Define a family of operators $\mathcal{L}^n_u$ mapping $C^2(\mathbb{R}^d)$ to $C(\mathbb{R}^d)$ by
(2.16) \[ \mathcal{L}^n_u f(x) := \frac{1}{2}\, \mathrm{trace}\left( a_n(x)\, \nabla^2 f(x) \right) + b_n(x,u) \cdot \nabla f(x), \]
for $u \in \mathbb{U}$, where $a_n := \sigma_n \sigma_n^{\mathsf{T}}$ (with the analogous extensions to $\mathcal{P}(\mathbb{U})$ and $\mathfrak{U}_{\mathrm{SM}}$ as in Eq. 2.11 and Eq. 2.12). For the approximated model, for each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$, the associated discounted cost is defined as
(2.17) \[ J^n_\alpha(x, U) := \mathbb{E}_x^U \left[ \int_0^\infty e^{-\alpha t}\, \bar c_n(X^n_t, U_t)\, \mathrm{d}t \right], \]
and the optimal value is defined as
(2.18) \[ V^n_\alpha(x) := \inf_{U \in \mathfrak{U}} J^n_\alpha(x, U). \]
For each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$, the associated ergodic cost is defined as
(2.19) \[ \mathscr{E}^n_x(c_n, U) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^U \left[ \int_0^T \bar c_n(X^n_t, U_t)\, \mathrm{d}t \right], \]
and the optimal value is defined as
(2.20) \[ \mathscr{E}^*_n(c_n) := \inf_{x \in \mathbb{R}^d} \inf_{U \in \mathfrak{U}} \mathscr{E}^n_x(c_n, U). \]
Similarly, for each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$, the associated finite horizon cost is given by
(2.21) \[ \hat J^n_T(x, U) := \mathbb{E}_x^U \left[ \int_0^T \bar c_n(X^n_t, U_t)\, \mathrm{d}t + h_T(X^n_T) \right]. \]
The optimal value is given by
(2.22) \[ \hat J^{n,*}_T(x) := \inf_{U \in \mathfrak{U}} \hat J^n_T(x, U), \]
where the state process $X^n_\cdot$ is given by the solution of the SDE Eq. 2.15.
2.3. Continuity and Robustness Problems
The primary objective of this article will be to address the following problems:
•
Continuity: If the approximating models Eq. 2.15 converge to the true model Eq. 2.1 in the sense of Assumption (A5), do the optimal values of the approximating models converge to the optimal value of the true model (e.g., $V^n_\alpha(x) \to V_\alpha(x)$ for the discounted cost)?
•
Robustness: Suppose $v^*_n$ is an optimal policy designed for the incorrect model Eq. 2.15 for the finite horizon/discounted/ergodic/up-to-an-exit-time cost problem; does this imply
– for the discounted cost: $J_\alpha(x, v^*_n) \to V_\alpha(x)$,
– for the ergodic cost: $\mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c)$,
– for the finite horizon cost: $\hat J_T(x, v^*_n) \to \hat J^*_T(x)$,
– for the cost up to an exit time: the analogous convergence (for details, see Section 6),
as $n \to \infty$?
In this article, under a mild set of assumptions, we show that the answers to the above questions are affirmative.
Example 2.1.
(i) If our noise term is not the (ideal) Brownian motion and, instead of Eq. 2.1, the state dynamics of the system are governed by the following SDE
(2.23)
here we are approximating the noise term by an Itô process, given by
(2.24)
where the approximating noise converges to the Wiener process as $n \to \infty$.
- (ii)
(iii) Consider a Vasicek interest rate model, given by
\[ \mathrm{d}r_t = \theta(\mu - r_t)\, \mathrm{d}t + \sigma\, \mathrm{d}W_t; \]
this is a mean-reverting process, where $\theta$ is the rate of reversion, $\mu$ is the long-term mean and $\sigma$ is the volatility. The wealth process corresponding to this interest rate model can be described by Eq. 2.1 (see [AS08, Remark 2.1], [KT12], [DJ07]). Since market models are typically incomplete, the model parameters $(\theta, \mu, \sigma)$ are usually learned from market data. This gives rise to the problem of robustness of optimal investment (see the sketch following this example). This also applies to several other interest/pricing models as well [merton1998applications].
(iv) In the above examples, $b_n$ can be a regularized version of $b$, e.g. obtained by adding a small smooth perturbation vanishing as $n \to \infty$, which then would continuously converge (in control) to $b$ as $n \to \infty$.
In the cases above, the approximating kernel conditions in (A5) would apply.
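For instance, for the Vasicek model of item (iii), here is a minimal sketch of how parameter mismatch fits into (A5); the estimated parameters $(\theta_n, \mu_n, \sigma_n)$ below are hypothetical and introduced only for illustration. If $(\theta_n, \mu_n, \sigma_n) \to (\theta, \mu, \sigma)$ as the amount of market data grows, then with
\[ b_n(x) := \theta_n (\mu_n - x), \qquad \sigma_n(x) := \sigma_n, \]
one has $b_n \to b$ uniformly on compact sets and $\sigma_n \to \sigma$, so the learned models converge to the true model in the sense required by (A5).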
Remark 2.1.
If we replace $\sigma(x)$ by a control-dependent diffusion coefficient $\sigma(x,u)$, then in the relaxed control framework, if the stationary policy is Lipschitz continuous, Eq. 2.1 admits a unique strong solution. But in general stationary policies are just measurable functions, and the existence of suitable strong solutions in this setting is not known (see [ABG-book, Remarks 2.3.2], [B05Survey]). However, under stationary Markov policies one can prove the existence of weak solutions, which may not be unique [stroock1997multidimensional], [ABG-book, Remarks 2.3.2] (note though that uniqueness is established in [stroock1997multidimensional, pp. 192-194] under some conditions). The existence of a suitable strong solution (which is also a strong Markov process) under stationary Markov policies is essential to obtain the stochastic representation of solutions of HJB equations (by applying the Itô-Krylov formula).
Notation:
•
For any set $A \subset \mathbb{R}^d$, by $\tau(A)$ we denote the first exit time of the process $X_\cdot$ from the set $A$, defined by $\tau(A) := \inf\{ t > 0 : X_t \notin A \}$.
•
$B_r$ denotes the open ball of radius $r$ in $\mathbb{R}^d$, centered at the origin, and $B_r^c$ denotes the complement of $B_r$ in $\mathbb{R}^d$.
•
$\tau_r$, $\breve{\tau}_r$ denote the first exit times from $B_r$, $B_r^c$, respectively, i.e., $\tau_r := \tau(B_r)$ and $\breve{\tau}_r := \tau(B_r^c)$.
•
By $\mathrm{trace}(S)$ we denote the trace of a square matrix $S$.
•
For any domain $D \subset \mathbb{R}^d$, the space $C^k(D)$ ($C^\infty(D)$), $k \ge 0$, denotes the class of all real-valued functions on $D$ whose partial derivatives up to and including order $k$ (of any order) exist and are continuous.
•
$C_c^k(D)$ denotes the subset of $C^k(D)$, $0 \le k \le \infty$, consisting of functions that have compact support. This denotes the space of test functions.
•
$C_b(\mathbb{R}^d)$ denotes the class of bounded continuous functions on $\mathbb{R}^d$.
•
$C_0^k(D)$, $0 \le k \le \infty$, denotes the subspace of $C^k(D)$ consisting of functions that vanish on $\partial D$.
•
$C^{k,r}(D)$, $k \ge 0$, $r \in (0,1]$, denotes the class of functions whose partial derivatives up to order $k$ are Hölder continuous of order $r$.
•
$L^p(D)$, $p \in [1, \infty)$, denotes the Banach space of (equivalence classes of) measurable functions $f$ satisfying $\int_D |f(x)|^p\, \mathrm{d}x < \infty$.
•
$W^{k,p}(D)$, $k \ge 0$, $p \ge 1$, denotes the standard Sobolev space of functions on $D$ whose weak derivatives up to order $k$ are in $L^p(D)$, equipped with its natural norm (see [Adams]).
•
If $\mathcal{X}(D)$ is a space of real-valued functions on $D$, then $\mathcal{X}_{\mathrm{loc}}(D)$ consists of all functions $f$ such that $f\varphi \in \mathcal{X}(D)$ for every $\varphi \in C_c^\infty(D)$. In a similar fashion, we define $W^{k,p}_{\mathrm{loc}}(D)$.
•
For $\mu > 0$, weighted spaces are used for the parabolic results: $f \in L^{p,\mu}(D)$ if $w_\mu f \in L^p(D)$ for the weight $w_\mu(x) := e^{-\mu \sqrt{1 + |x|^2}}$; similarly, $W^{k,p,\mu}(D)$ denotes the corresponding weighted Sobolev space with its natural norm (see [BL84-book]).
3. Analysis of Discounted Cost
In this section we analyze the robustness of optimal controls for the discounted cost criterion. From [ABG-book, Theorem 3.5.6], we have the following characterization of the optimal $\alpha$-discounted cost $V_\alpha$ defined in Eq. 2.3.
Theorem 3.1.
Suppose Assumptions (A1)-(A4) hold. Then there exists a unique solution $V_\alpha \in C^2(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$ of the HJB equation
(3.1) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u V_\alpha(x) + c(x,u) \right] = \alpha\, V_\alpha(x). \]
Moreover, we have the following:
(i) $V_\alpha$ is the optimal $\alpha$-discounted cost, i.e., $V_\alpha(x) = \inf_{U \in \mathfrak{U}} J_\alpha(x, U)$;
(ii) $v^* \in \mathfrak{U}_{\mathrm{SM}}$ is an $\alpha$-discounted optimal control if and only if it is a measurable minimizing selector of Eq. 3.1, i.e.,
(3.2) \[ \bar b(x, v^*(x)) \cdot \nabla V_\alpha(x) + \bar c(x, v^*(x)) = \min_{u \in \mathbb{U}} \left[ b(x,u) \cdot \nabla V_\alpha(x) + c(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
Remark 3.1.
The assumption that the running cost is Lipschitz continuous in its first argument, uniformly with respect to the second, is used to obtain a $C^2$ solution of the HJB equation Eq. 3.1. If we do not have this uniform Lipschitz assumption, one can still show that the HJB equation admits a solution, now in $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$, and all the conclusions of Theorem 3.1 still hold. To see this: in view of [GilTru, Theorem 9.15] and the Schauder fixed point theorem, it can be shown that there exists a solution of the Dirichlet problem on a ball, sketched below.
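In the notation above, the truncated problem takes the following standard form (a minimal sketch, assuming the ball truncation as in [ABG-book]):
\[ \min_{u \in \mathbb{U}} \bigl[ \mathcal{L}_u \varphi_R(x) + c(x,u) \bigr] = \alpha\, \varphi_R(x) \quad \text{in } B_R, \qquad \varphi_R = 0 \quad \text{on } \partial B_R, \]
together with an interior $W^{2,p}$ estimate, uniform on compact subsets, which allows one to pass to the limit along a subsequence.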
Now letting $R \to \infty$ and following [ABG-book, Theorem 3.5.6] we arrive at the solution.
Hence, one can replace our assumption (A4) by the following (relatively weaker) assumption:
(A4)′ The running cost $c$ is bounded (i.e., $\|c\|_\infty \le M$ for some positive constant $M$) and jointly continuous in both variables $(x,u)$.
All the results of this paper will also hold if we replace (A4) by (A4)′.
As in Theorem 3.1, following [ABG-book, Theorem 3.5.6], for each approximating model we have the following complete characterization of an optimal policy, which is in the space of stationary Markov policies.
Theorem 3.2.
Suppose (A5)(iii) holds. Then for each $n \in \mathbb{N}$, there exists a unique solution $V^n_\alpha \in C^2(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$ of
(3.3) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u V^n_\alpha(x) + c_n(x,u) \right] = \alpha\, V^n_\alpha(x). \]
Moreover, we have the following:
(i) $V^n_\alpha$ is the optimal discounted cost, i.e., $V^n_\alpha(x) = \inf_{U \in \mathfrak{U}} J^n_\alpha(x, U)$;
(ii) $v^*_n \in \mathfrak{U}_{\mathrm{SM}}$ is an $\alpha$-discounted optimal control if and only if it is a measurable minimizing selector of Eq. 3.3, i.e.,
(3.4) \[ \bar b_n(x, v^*_n(x)) \cdot \nabla V^n_\alpha(x) + \bar c_n(x, v^*_n(x)) = \min_{u \in \mathbb{U}} \left[ b_n(x,u) \cdot \nabla V^n_\alpha(x) + c_n(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
In the next theorem, we prove that $V^n_\alpha(x)$ converges to $V_\alpha(x)$ as $n \to \infty$, for all $x \in \mathbb{R}^d$. This result will be useful in establishing the robustness of discounted optimal controls.
Theorem 3.3.
Suppose Assumptions (A1)-(A5) hold. Then
(3.5) \[ \lim_{n \to \infty} V^n_\alpha(x) = V_\alpha(x) \quad \text{for all } x \in \mathbb{R}^d. \]
Proof.
From Eq. 3.3 and Eq. 3.4 for any minimizing selector , it follows that
Then using the standard elliptic PDE estimate as in [GilTru, Theorem 9.11], for any and , we deduce that
(3.6) |
where is a positive constant which is independent of . Since
from Eq. 3.6 we get
(3.7) |
We know that for , the space is reflexive and separable, hence, as a corollary of the Banach-Alaoglu theorem, we have that every bounded sequence in has a weakly convergent subsequence (see, [HB-book, Theorem 3.18.]). Also, we know that for the space is compactly embedded in , where (see [ABG-book, Theorem A.2.15 (2b)]), which implies that every weakly convergent sequence in will converge strongly in . Thus, in view of estimate Eq. 3.7, by a standard diagonalization argument and the Banach-Alaoglu theorem, we can extract a subsequence such that for some
(3.8) |
In the following, we will show that . Now, for any compact set , it is easy to see that
(3.9) |
Since , continuously on compact set and in for any compact set , as we deduce that
(3.10) |
Thus, multiplying by a test function , from Eq. 3.3, we obtain
In view of Eq. 3.8 and Eq. 3.10, letting it follows that
(3.11) |
Since the test function is arbitrary, from Eq. 3.11 we deduce that
(3.12) |
Let be a minimizing selector of Eq. 3.12 and be the solution of the SDE Eq. 2.1 corresponding to it. Then, applying the Itô-Krylov formula, we obtain the following
Hence, using Eq. 3.12, we deduce that
(3.13) |
Since is bounded and
letting , it is easy to see that
Now, letting by monotone convergence theorem, from Eq. 3.13 we obtain
(3.14) |
Again, by a similar argument, applying the Itô-Krylov formula and using Eq. 3.12, for any , we have
This implies
(3.15) |
Thus, from Eq. 3.14 and Eq. 3.15, we deduce that
(3.16) |
Since both are continuous functions on , from Eq. 2.3 and Eq. 3.16, it follows that for all . This completes the proof. ∎
Let $X_\cdot$ be the solution of the SDE Eq. 2.1 corresponding to $v \in \mathfrak{U}_{\mathrm{SM}}$. Then we have
(3.17) \[ J_\alpha(x, v) = \mathbb{E}_x^{v} \left[ \int_0^\infty e^{-\alpha t}\, \bar c(X_t, v(X_t))\, \mathrm{d}t \right]. \]
Next we prove the robustness result, i.e., we prove that $J_\alpha(x, v^*_n) \to V_\alpha(x)$ as $n \to \infty$, where $v^*_n$ is an optimal control of the approximated model and $V_\alpha$ is the optimal value of the true model. As in [KY-20], we will use the continuity result above as an intermediate step.
Theorem 3.4.
Suppose Assumptions (A1)-(A5) hold. Then
(3.18) \[ \lim_{n \to \infty} J_\alpha(x, v^*_n) = V_\alpha(x) \quad \text{for all } x \in \mathbb{R}^d. \]
Proof.
Following the argument as in [ABG-book, Theorem 3.5.6], one can show that for each $n \in \mathbb{N}$, there exists $\psi_n \in C^2(\mathbb{R}^d) \cap C_b(\mathbb{R}^d)$ satisfying
(3.19) \[ \mathcal{L}_{v^*_n} \psi_n(x) + \bar c(x, v^*_n(x)) = \alpha\, \psi_n(x), \]
where $v^*_n \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control of the $n$-th approximated model.
Applying the Itô-Krylov formula, we deduce that
Now using Eq. 3.19, it follows that
(3.20) |
Since is bounded and
letting we deduce that
Thus, from Eq. 3.20, letting by monotone convergence theorem we obtain
(3.21) |
This implies that . Thus, as in Theorem 3.3 (see Eq. 3.6, Eq. 3.7), by a standard Sobolev estimate, for any we get , for some positive constant independent of . Hence, by the Banach-Alaoglu theorem and a standard diagonalization argument (as in Eq. 3.8), there exists such that along some sub-sequence
(3.22) |
Since the space of stationary Markov strategies is compact, along some further sub-sequence (without loss of generality denoted by the same sequence) we have in . It is easy to see that
Since in on any compact set strongly and by the topology of , we have weakly. Thus, in view of the topology of , and since in as we obtain
(3.23) |
Now, multiplying by a test function , from Eq. 3.19, it follows that
Hence, using Eq. 3.22, Eq. 3.23, and letting we obtain
(3.24) |
Since is arbitrary and from Eq. 3.24, we deduce that the function satisfies
(3.25) |
As earlier, applying the Itô-Krylov formula and using Eq. 3.25, it follows that
(3.26) |
where is the solution of SDE Eq. 2.1 corresponding to .
Now, we have
(3.27) |
From Theorem 3.1, we know that . Thus from Theorem 3.3, we deduce that as . To complete the proof we have to show that as . Also, from Theorem 3.2 we know that is a minimizing selector of the HJB equation Eq. 3.3 of the approximated model, thus it follows that
(3.28) |
Hence, by a standard Sobolev estimate (as in Theorem 3.3), for each we have , for some positive constant independent of . Thus, we can extract a further sub-sequence (without loss of generality denoted by the same sequence) such that for some (as in Eq. 3.8) we get
(3.29) |
Following similar steps as in Theorem 3.3, multiplying by a test function and letting $n \to \infty$, from Eq. 3.28 we deduce that the limit satisfies
(3.30) |
From the continuity results (Theorem 3.3), it is easy to see that for all . Moreover, applying the Itô-Krylov formula and using Eq. 3.30 we obtain
(3.31) |
Since both , are continuous, from Eq. 3.26 and Eq. 3.31, it follows that both (which is equal to ) and converge to the same limit. This completes the proof. ∎
Remark 3.2.
Note that in the above, we indirectly also showed the continuity of the value function in the control policy (under the topology defined); uniqueness of the solution to the PDE above implies continuity. This result, while it can be obtained from the analysis of Borkar [Bor89] (in a slightly more restrictive setup), is obtained here directly via a careful optimality analysis and has important consequences for numerical solutions and approximation results for both discounted and average cost optimality. This is studied in detail, with implications, in [YukselPradhan].
4. Analysis of Ergodic Cost
In this section we study the robustness problem for the ergodic cost criterion. The associated optimal control problem for this cost criterion has been studied extensively in the literature, see e.g., [ABG-book].
For this cost evaluation criterion we will study the robustness problem under two sets of assumptions: the first is a so-called near-monotonicity condition on the running cost, which discourages instability, and the second is Lyapunov stability.
4.1. Analysis under a near-monotonicity assumption
Here we assume that the cost function $c$ satisfies the following near-monotonicity condition:
(A6) It holds that
(4.1) \[ \liminf_{|x| \to \infty}\, \min_{u \in \mathbb{U}} c(x,u) \;>\; \mathscr{E}^*(c), \]
where $\mathscr{E}^*(c)$ is the optimal value defined in Eq. 2.5.
This condition penalizes the escape of probability mass to infinity. Since our running cost is bounded, it is easy to see that $\mathscr{E}^*(c) < \infty$. Recall that a stationary policy is said to be stable if the associated diffusion process is positive recurrent. It is known that under Eq. 4.1 an optimal control exists in the space of stable stationary Markov controls (see [ABG-book, Theorem 3.4.5]).
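As a simple illustration (a hypothetical example, not from the original setup): any bounded running cost that is uniformly larger than the optimal value outside a compact set is near-monotone; for instance,
\[ c(x,u) := \min\{ |x|,\, M \} \quad \text{with } M > \mathscr{E}^*(c), \]
since then $\liminf_{|x| \to \infty} \min_{u \in \mathbb{U}} c(x,u) = M > \mathscr{E}^*(c)$. Such costs discourage instability: letting the probability mass escape to infinity incurs a running cost strictly above the optimal value.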
Now from [ABG-book, Theorem 3.6.10], we have the following complete characterization of ergodic optimal control.
Theorem 4.1.
Suppose that Assumptions (A1)-(A4) and (A6) hold. Then there exists a unique solution pair $(V, \rho) \in C^2(\mathbb{R}^d) \times \mathbb{R}$, with $V(0) = 0$, $V$ bounded from below and $\rho \le \mathscr{E}^*(c)$, satisfying
(4.2) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u V(x) + c(x,u) \right] = \rho. \]
Moreover, we have
(i) $\rho = \mathscr{E}^*(c)$;
(ii) a stationary Markov control $v \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control if and only if it is a minimizing selector of Eq. 4.2, i.e., if and only if it satisfies
(4.3) \[ \bar b(x, v(x)) \cdot \nabla V(x) + \bar c(x, v(x)) = \min_{u \in \mathbb{U}} \left[ b(x,u) \cdot \nabla V(x) + c(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
We assume that for the approximated model, for each $n \in \mathbb{N}$, the running cost function $c_n$ satisfies the near-monotonicity condition Eq. 4.1 relative to $\mathscr{E}^*_n(c_n)$, i.e.,
(4.4) \[ \liminf_{|x| \to \infty}\, \min_{u \in \mathbb{U}} c_n(x,u) \;>\; \mathscr{E}^*_n(c_n). \]
Thus, in view of [ABG-book, Theorem 3.6.10], for the approximating model, for each we have the following theorem.
Theorem 4.2.
Suppose that Assumption (A5)(iii) holds. Then for each $n \in \mathbb{N}$ there exists a unique solution pair $(V_n, \rho_n) \in C^2(\mathbb{R}^d) \times \mathbb{R}$, with $V_n(0) = 0$, $V_n$ bounded from below and $\rho_n \le \mathscr{E}^*_n(c_n)$, satisfying
(4.5) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u V_n(x) + c_n(x,u) \right] = \rho_n. \]
Moreover, we have
(i) $\rho_n = \mathscr{E}^*_n(c_n)$;
(ii) a stationary Markov control $v_n \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control if and only if it is a minimizing selector of Eq. 4.5, i.e., if and only if it satisfies
(4.6) \[ \bar b_n(x, v_n(x)) \cdot \nabla V_n(x) + \bar c_n(x, v_n(x)) = \min_{u \in \mathbb{U}} \left[ b_n(x,u) \cdot \nabla V_n(x) + c_n(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d. \]
In view of the near-monotonicity assumption Eq. 4.4, for any minimizing selector of Eq. 4.5, it is easy to see that outside a compact set for some . Since is bounded from below, [ABG-book, Theorem 2.6.10(f)] asserts that is stable. Hence, we deduce that the optimal policies of the approximating models are stable. However, note that the compact set mentioned above may not be applicable uniformly for all , which turns out to be a consequential issue.
Now we want to show that as $n \to \infty$ the optimal value of the approximated model converges to the optimal value of the true model. Under the near-monotonicity assumption this result may not be true in general, due to the restricted uniqueness/non-uniqueness of the solution of the associated HJB equation (see e.g. [AA12], [AA13]). As a result of this, in [AA12], [M97] the authors have shown that for the optimal control problem the policy iteration algorithm (PIA) may fail to converge to the optimal value. In order to ensure convergence of the PIA, in addition to the near-monotonicity assumption, a blanket Lyapunov condition is assumed in [M97].
Accordingly, in this article, to guarantee the convergence $\mathscr{E}^*_n(c_n) \to \mathscr{E}^*(c)$, we will assume that
the family $\{\eta_n\}_{n \in \mathbb{N}}$ is tight, where $\eta_n$ is the unique invariant measure of the solution of Eq. 2.15 corresponding to $v^*_n$ (the optimal policies of the approximated models). One sufficient condition which ensures the required tightness is the following: there exists a pair of nonnegative inf-compact functions $(\mathcal{V}, h)$ such that $\mathcal{L}^n_u \mathcal{V}(x) \le C_0 - h(x)$ for some positive constant $C_0$ and for all $n \in \mathbb{N}$ and $(x,u) \in \mathbb{R}^d \times \mathbb{U}$.
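A minimal sketch of why such a pair yields tightness (assuming the stated drift condition, that $\mathcal{V}$ lies in the domain of the generators, and that each $\eta_n$ integrates $h$): integrating the inequality against the invariant measure $\eta_n$ of the optimal policy gives
\[ 0 = \int_{\mathbb{R}^d} \mathcal{L}^n_{v^*_n} \mathcal{V}\, \mathrm{d}\eta_n \;\le\; C_0 - \int_{\mathbb{R}^d} h\, \mathrm{d}\eta_n, \qquad \text{so} \qquad \sup_n \int_{\mathbb{R}^d} h\, \mathrm{d}\eta_n \le C_0, \]
and then, by Markov's inequality, $\sup_n \eta_n(\{ h > k \}) \le C_0 / k \to 0$ as $k \to \infty$; since the sub-level sets $\{ h \le k \}$ are compact, this is precisely tightness of $\{\eta_n\}$.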
Theorem 4.3.
Suppose that Assumptions (A1)-(A6) hold. Also, assume that the set $\{\eta_n\}_{n \in \mathbb{N}}$ is tight. Then, we have
(4.7) \[ \lim_{n \to \infty} \mathscr{E}^*_n(c_n) = \mathscr{E}^*(c). \]
Proof.
From Theorem 4.2, we know that for each there exists , , with and , satisfying
(4.8) |
where . Since , it follows that .
From [ABG-book, Theorem 3.6.6] (the standard vanishing discount asymptotics), we know that as $\alpha \to 0$ the difference $V^n_\alpha(x) - V^n_\alpha(0)$ converges to $V_n(x)$ and $\alpha V^n_\alpha(0) \to \rho_n$, where $V^n_\alpha$ is the solution of the $\alpha$-discounted HJB equation Eq. 3.3. Let
Since the map is continuous, it is easy to see that is closed and, due to the near-monotonicity assumption (see Eq. 4.4), it follows that is bounded. Therefore is a compact subset of . Since and is stable, from [ABG-book, Lemma 3.6.1], we have
(4.9) |
Now for any minimizing selector of Eq. 3.3, we get
Since for all , from estimate (3.6.9b) of [ABG-book, Lemma 3.6.3], it follows that
(4.10) |
for all , where is positive number such that and is a positive constant which depends only on and . Now combining Eq. 4.9 and Eq. 4.10, we obtain
(4.11) |
In view of assumption Eq. 4.4, one can choose independent of . Thus Eq. 4.11 implies that
(4.12) |
Hence, by the Banach-Alaoglu theorem and a standard diagonalization argument (as in Eq. 3.8), there exists such that along a sub-sequence
(4.13) |
Again, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Now, as before, multiplying by a test function , from Eq. 4.8, we obtain
By similar argument as in Theorem 3.3, in view of Eq. 4.13, letting it follows that
(4.14) |
Since is arbitrary and , we deduce that satisfies
Since $\mathfrak{U}_{\mathrm{SM}}$ is compact, along a further subsequence (denoted by the same sequence) in . Repeating the above argument, one can show that the pair satisfies
As we know for all (see, Eq. 4.8), it is easy to see that . Next we show that is bounded from below. From estimate (3.6.9a) of [ABG-book, Lemma 3.6.3], for each we have
(4.15) |
for some constant which depends only on and . Also, let be a sequence such that as , thus for each we have
(4.16) |
where the last inequality follows from the fact that .
(4.17) |
This implies that the limit . Note that
Since is tight, from [ABG-book, Lemma 3.2.6], we deduce that in total variation norm as , where is the unique invariant measure of Eq. 2.1 corresponding to . Thus, by writing
(4.18) |
and noting that the first term converges to zero by the total variation convergence of , while the second term converges by the convergence in the control topology on (as is fixed); in view of the fact that (continuously over control actions) we conclude that . Therefore, the pair , , which has the properties that and , is a compatible solution (see [AA13, Definition 1.1]) to Eq. 4.2. Since the solution to the equation Eq. 4.2 is unique (see [AA13, Theorem 1.1]), it follows that . This completes the proof of the theorem. ∎
In the following theorem, we prove existence and uniqueness of the solution of a certain Poisson equation. This will be useful in proving the robustness result.
Theorem 4.4.
Suppose that Assumptions (A1) - (A4) hold. Let be a stable control such that
(4.19) |
Then, there exists a unique pair , , with and and , satisfying
(4.20) |
Moreover, we have
-
(i)
.
-
(ii)
for all
(4.21)
Proof.
Since is bounded, we have . Also, since (see, Eq. 4.19) , from [ABG-book, Lemma 3.6.1], it follows that
(4.22) |
where and is the -discounted cost defined as in Eq. 2.2. It is known that is a solution to the Poisson equation (see [ABG-book, Lemma A.3.7])
(4.23) |
Since is compact, for some , we have . Thus from [ABG-book, Lemma 3.6.3], we deduce that for each there exist constants depending only on such that
(4.24) |
(4.25) |
Thus, arguing as in [ABG-book, Lemma 3.6.6], we deduce that there exists such that as , and and the pair satisfies
(4.26) |
By Eq. 4.22, we get . Now, in view of estimates Eq. 4.22 and Eq. 4.25, it is easy to see that
(4.27) |
Also, arguing as in Theorem 4.3 (see Section 4.1), from estimate Eq. 4.24 it follows that
(4.28) |
Now, applying the Itô-Krylov formula and using Eq. 4.26, we obtain
This implies
Since is stable, letting , we get
Now dividing both sides of the above inequality by and letting , it follows that
Thus, . This indeed implies that . The representation Eq. 4.21 of follows by closely mimicking the argument of [ABG-book, Lemma 3.6.9]. Therefore, we have a solution pair to Eq. 4.20 satisfying (i) and (ii).
Next we want to prove that the solution pair is unique. To this end, let , , with and and , satisfying
(4.29) |
Applying the Itô-Krylov formula and using Eq. 4.29, we obtain
(4.30) |
Since , from Eq. 4.30 we obtain . Now, from Eq. 4.26, applying the Itô-Krylov formula, we deduce that
(4.31) |
Since is stable and is bounded from below, for all we have
Hence, letting by Fatou’s lemma from Eq. 4.31, it follows that
Since , letting , we obtain
(4.32) |
From Eq. 4.21 and Eq. 4.32, it is easy to see that in . On the other hand, by Eq. 4.20 and Eq. 4.29, one has in . Hence, applying the strong maximum principle [GilTru, Theorem 9.6], one has . This proves uniqueness. ∎
Next we prove the robustness result, i.e., we prove that $\mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c)$ as $n \to \infty$, where $v^*_n$ is an optimal ergodic control of the approximated model (see Theorem 4.2). In order to establish this result we will also assume that the family $\{\tilde\eta_n\}$ is tight, where $\tilde\eta_n$ is the unique invariant measure of Eq. 2.1 corresponding to $v^*_n$.
Theorem 4.5.
Suppose that Assumptions (A1) - (A6) hold. Also, assume that
(4.33) |
and the sets and are tight. Then, we have
(4.34) |
Proof.
We shall follow a similar proof program as that of Theorem 3.4, under the discounted setup. Since is bounded, we have . From our assumption Eq. 4.33, we know that . Hence, from Theorem 4.4, there exists a unique pair , , with and , satisfying
(4.35) |
with . Moreover, in view of assumption Eq. 4.33, from Eq. 4.27 and Eq. 4.28, we have
(4.36) |
where are constants independent of . Thus by the Banach-Alaoglu theorem and standard diagonalization argument (as in Eq. 3.8), we deduce that exists such that along a sub-sequence
(4.37) |
Again, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Since $\mathfrak{U}_{\mathrm{SM}}$ is compact, along a further subsequence (without loss of generality denoted by the same sequence) we have as . Now, as before, multiplying by a test function and letting , from Eq. 4.35, we deduce that the pair , , satisfies
(4.38) |
Since for all , it is easy to see that . Also, by Eq. 4.36, it follows that . Hence, using Eq. 4.33 and Eq. 4.38, we have is stable. Since is tight, in view of [ABG-book, Lemma 3.2.6], it is easy to see that . Thus, by Lemma 4.4, we deduce that .
Note that
Since as (see, Theorem 4.3), to complete the proof we have to show that as . From Theorem 4.2, we know that the pair , , with , satisfies
(4.39) |
For any minimizing selector , rewriting Eq. 4.39, we get
(4.40) |
Now, in view of estimates Eq. 4.12 and Eq. 4.17, it follows that
(4.41) |
where are constants independent of . Hence, by the Banach-Alaoglu theorem and standard diagonalization argument (see Eq. 3.8), we have there exists such that along a sub-sequence
(4.42) |
Also, implies that along a further subsequence (denoted by the same sequence without loss of generality) . Since in , multiplying by test functions and letting , from Eq. 4.40, we obtain that the pair , satisfies
(4.43) |
From Eq. 4.41, it is easy to see that . Also, since for all , we have . Since is tight, arguing as in the proof of Theorem 4.3, we deduce that . Thus, by uniqueness of the solution of Eq. 4.43 (see Theorem 4.4) it follows that . Since both and converge to the same limit , we deduce that as . This completes the proof of the theorem. ∎
4.2. Analysis under Lyapunov stability
In this section we study the robustness problem for the ergodic cost criterion under a Lyapunov stability assumption. We assume the following Foster-Lyapunov condition on the dynamics.
(A7)
(i) There exist a positive constant $\kappa_0$ and a pair of inf-compact functions $(\mathcal{V}, h) \in C^2(\mathbb{R}^d) \times C(\mathbb{R}^d \times \mathbb{U})$ (i.e., the sub-level sets $\{\mathcal{V} \le k\}$, $\{h \le k\}$ are compact or empty sets in $\mathbb{R}^d$, $\mathbb{R}^d \times \mathbb{U}$, respectively, for each $k \in \mathbb{R}$) such that
(4.44) \[ \mathcal{L}_u \mathcal{V}(x) \le \kappa_0 - h(x,u) \quad \text{for all } (x,u) \in \mathbb{R}^d \times \mathbb{U}, \]
where $h$ is locally Lipschitz continuous in its first argument uniformly with respect to the second (see the illustration following this assumption).
(ii) The approximating models satisfy the analogous condition uniformly in $n$: with the same pair $(\mathcal{V}, h)$ and constant $\kappa_0$,
(4.45) \[ \mathcal{L}^n_u \mathcal{V}(x) \le \kappa_0 - h(x,u) \quad \text{for all } (x,u) \in \mathbb{R}^d \times \mathbb{U} \text{ and } n \in \mathbb{N}. \]
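As a simple sufficient condition (a hypothetical illustration, not part of the assumption itself): if $\sigma$ is bounded and the drift is uniformly inward-pointing, i.e. $b(x,u) \cdot x \le -\kappa |x|^2$ for all $|x| \ge R_0$ and $u \in \mathbb{U}$, then the quadratic pair
\[ \mathcal{V}(x) := 1 + |x|^2, \qquad h(x,u) := \kappa |x|^2 \]
satisfies Eq. 4.44, since $\mathcal{L}_u \mathcal{V}(x) = \mathrm{trace}(a(x)) + 2\, b(x,u) \cdot x \le \kappa_0 - \kappa |x|^2$ for a suitable constant $\kappa_0$, using the boundedness of $a$ and the affine growth of $b$ on $B_{R_0}$.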
Combining [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12], we have the following complete characterization of the ergodic optimal control.
Theorem 4.6.
Suppose that Assumptions (A1)-(A4) and (A7)(i) hold. Then the ergodic HJB equation
(4.46) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u V(x) + c(x,u) \right] = \rho \]
admits a unique solution $(V, \rho) \in C^2(\mathbb{R}^d) \times \mathbb{R}$ satisfying $V(0) = 0$ and $V \in o(\mathcal{V})$. Moreover, we have
(i) $\rho = \mathscr{E}^*(c)$;
(ii) a stationary Markov control $v \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control (i.e., $\mathscr{E}_x(c, v) = \mathscr{E}^*(c)$) if and only if it satisfies
(4.47) \[ \bar b(x, v(x)) \cdot \nabla V(x) + \bar c(x, v(x)) = \min_{u \in \mathbb{U}} \left[ b(x,u) \cdot \nabla V(x) + c(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d; \]
(iii) for any $v \in \mathfrak{U}_{\mathrm{SM}}$ satisfying Eq. 4.47, we have
(4.48) \[ \mathscr{E}_x(c, v) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^{v} \left[ \int_0^T \bar c(X_t, v(X_t))\, \mathrm{d}t \right] = \rho. \]
Again, from [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12], for the approximated model for each , we have the following complete characterization of the optimal control.
Theorem 4.7.
Suppose that Assumptions (A5) and (A7)(ii) hold. Then for each $n \in \mathbb{N}$ the ergodic HJB equation
(4.49) \[ \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u V_n(x) + c_n(x,u) \right] = \rho_n \]
admits a unique solution $(V_n, \rho_n) \in C^2(\mathbb{R}^d) \times \mathbb{R}$ satisfying $V_n(0) = 0$ and $V_n \in o(\mathcal{V})$. Moreover, we have
(i) $\rho_n = \mathscr{E}^*_n(c_n)$;
(ii) a stationary Markov control $v_n \in \mathfrak{U}_{\mathrm{SM}}$ is an optimal control (i.e., $\mathscr{E}^n_x(c_n, v_n) = \mathscr{E}^*_n(c_n)$) if and only if it satisfies
(4.50) \[ \bar b_n(x, v_n(x)) \cdot \nabla V_n(x) + \bar c_n(x, v_n(x)) = \min_{u \in \mathbb{U}} \left[ b_n(x,u) \cdot \nabla V_n(x) + c_n(x,u) \right] \quad \text{for a.e. } x \in \mathbb{R}^d; \]
(iii) for any $v_n$ satisfying Eq. 4.50, we have
(4.51) \[ \mathscr{E}^n_x(c_n, v_n) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^{v_n} \left[ \int_0^T \bar c_n(X^n_t, v_n(X^n_t))\, \mathrm{d}t \right] = \rho_n. \]
From [ABG-book, Lemma 3.7.8], it is easy to see that the functions are bounded from below. Next we show that under Assumption (A7), as the approximating models converge, the optimal value of the approximated model converges to the optimal value of the true model.
Theorem 4.8.
Suppose that Assumptions (A1)-(A5) and (A7) hold. Then, it follows that
(4.52) \[ \lim_{n \to \infty} \mathscr{E}^*_n(c_n) = \mathscr{E}^*(c). \]
Proof.
Since $c_n$ is bounded, we get $\rho_n \le M$. Also, Eq. 4.44 implies that every stationary Markov control is stable (see [ABG-book, Lemma 3.3.4] and [ABG-book, Lemma 3.2.4(b)]). Thus, from [ABG-book, Theorem 3.7.6], there exist constants depending only on the radius such that for all $n$, we have
(4.53) |
By a standard vanishing discount argument (see [ABG-book, Lemma 3.7.8]) as we have and . Hence the estimates Eq. 4.53 give us . Since the constant is independent of , by a standard diagonalization argument and the Banach-Alaoglu theorem, we can extract a subsequence such that for some (as in Eq. 3.8)
(4.54) |
Also, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Now multiplying both sides of the equation Eq. 4.49 by test functions , we obtain
As in Theorem 3.4, using Eq. 4.54 and letting it follows that satisfies
(4.55) |
Rewriting the equation Eq. 4.55, we have
where
In view of Eq. 4.54 and Assumptions (A1) and (A2), it is easy to see that where . Thus, by elliptic regularity [CL89, Theorem 3] (see also [GilTru, Theorem 9.19]), we obtain .
Next we want to show that . Since we have , where . Also, since is inf-compact, for large enough we have for all . Let be the solution of Eq. 2.15 corresponding to . Hence, in view of Eq. 4.45, by the Itô-Krylov formula, for any and we deduce that
where and . Letting , by Fatou’s lemma we obtain
(4.56) |
Again, by the Itô-Krylov formula, for any and we have
Thus, by Fatou’s lemma letting and using Eq. 4.56 we get
for some positive constant . Hence, by arguing as in the proof of [ABG-book, Lemma 3.7.2 (i)], we have
(4.57) |
Now, following the proof of [ABG-book, Lemma 3.7.8] (see, eq.(3.7.47)), it follows that
(4.58) |
We know that for the space is compactly embedded in where . Since for some positive constant which depends only on , we deduce that , where is a constant. Also, since from Eq. 4.58, it is easy to see that
(4.59) |
Therefore, by combining Eq. 4.54, Eq. 4.57 and Eq. 4.59, we obtain . Since the limit satisfies Eq. 4.46, by the uniqueness result of Theorem 4.6, we deduce that . This completes the proof of the theorem. ∎
The next theorem proves the existence of a unique solution to a certain equation in a suitable function space. This result will be very useful in establishing our robustness result.
Theorem 4.9.
Suppose that Assumptions (A1)-(A4) and (A7)(i) hold. Then for each there exists a unique solution pair for any satisfying
(4.60) |
Furthermore, we have
-
(i)
-
(ii)
for all , we have
(4.61)
Proof.
Existence of a solution pair for any satisfying (i) and (ii) follows from [ABG-book, Lemma 3.7.8]. Now we want to prove the uniqueness of the solutions of Eq. 4.60. Let for any be any other solution pair of Eq. 4.60 with . By the Itô-Krylov formula, for we obtain
(4.62) |
Note that
Thus, letting by monotone convergence theorem, we get
Since , in view of [ABG-book, Lemma 3.7.2 (ii)], letting , we deduce that
(4.63) |
Also, from [ABG-book, Lemma 3.7.2 (ii)], we have
Now, dividing both sides of Eq. 4.63 by and letting , we obtain
This implies that . Using Eq. 4.60, by the Itô-Krylov formula we have
(4.64) |
Also, by the Itô-Krylov formula and using Eq. 4.44, it follows that
Since , from the above estimate, we get
Thus, letting by Fatou’s lemma from Eq. 4.64, it follows that
Since , letting , we deduce that
(4.65) |
Since , from Eq. 4.61 and Eq. 4.65, it is easy to see that in . Also, since and are two solution pairs of Eq. 4.60, we have in . Hence, by the strong maximum principle [GilTru, Theorem 9.6], one has . This proves the uniqueness. ∎
Now we are ready to prove the robustness result, i.e., we want to show that $\mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c)$ as $n \to \infty$, where $v^*_n$ is an optimal ergodic control of the approximated model (see Theorem 4.7).
Theorem 4.10.
Suppose that Assumptions (A1)-(A5) and (A7) hold. Then, we have
(4.66) \[ \lim_{n \to \infty} \mathscr{E}_x(c, v^*_n) = \mathscr{E}^*(c). \]
Proof.
We shall follow a similar proof program as that of Theorem 3.4, under the discounted setup. From Theorem 4.9, we know that for each there exists a unique pair , , with satisfying
(4.67) |
In view of Eq. 4.44, it is easy to see that, each is stable and for any (see, [ABG-book, Lemma 3.3.4] and [ABG-book, Lemma 3.2.4(b)]). Thus, from [ABG-book, Theorem 3.7.4], it follows that where is a constant independent of . Therefore by the Banach-Alaoglu theorem and standard diagonalization argument (as in Eq. 3.8), we deduce that there exists such that along a sub-sequence
(4.68) |
Again, since , along a further sub-sequence (without loss of generality denoted by the same sequence), we have as . Since $\mathfrak{U}_{\mathrm{SM}}$ is compact, along a further subsequence (without loss of generality denoted by the same sequence) we have as . Now, as in Theorem 3.4, multiplying by a test function and letting , from Eq. 4.67, it is easy to see that , , satisfies
(4.69) |
As we know that for all , we deduce that . Arguing as in Theorem 4.8 and using the estimate , we have
(4.70) |
Thus, by uniqueness of solution of Eq. 4.69 (see, Theorem 4.9), we deduce that .
By the triangle inequality
From Theorem 4.7 we have as . Hence to complete the proof we have to show that as . Now, for any minimizing selector of Eq. 4.49, we have
(4.71) |
In view of the estimate Eq. 4.53, we obtain
(4.72) |
where is a constant independent of . Hence, by the Banach-Alaoglu theorem and standard diagonalization argument (see Eq. 3.8), we have there exists such that along a sub-sequence
(4.73) |
Since, along a further subsequence (denoted by the same sequence without loss of generality), . As we know in , multiplying both sides of Eq. 4.71 by test functions and letting , it follows that , satisfies
(4.74) |
Arguing as in Theorem 4.8, one can show that . Hence, by uniqueness of the solution of Eq. 4.71 (see Theorem 4.9) we deduce that . Since both and converge to the same limit , it follows that as . This completes the proof of the theorem. ∎
5. Finite Horizon Cost
In this section we study the robustness problem under the finite horizon criterion. We will assume that the models satisfy the following:
(FN1) The functions $b$, $\sigma$ and $b_n$, $\sigma_n$, $n \in \mathbb{N}$, are uniformly bounded, i.e., they satisfy
\[ \sup_{n \in \mathbb{N}}\; \sup_{(x,u) \in \mathbb{R}^d \times \mathbb{U}} \left( |b(x,u)| + |b_n(x,u)| + \|\sigma(x)\| + \|\sigma_n(x)\| \right) \le C_1 \]
for some positive constant $C_1$. Furthermore, the terminal cost satisfies $h_T \in W^{2,p,\mu}(\mathbb{R}^d)$ for some $p \ge 2$, $\mu > 0$.
From [BL84-book, Theorem 3.3, p. 235], the finite horizon optimality equation (or, the HJB equation)
(5.1) \[ \frac{\partial \psi}{\partial t}(t,x) + \min_{u \in \mathbb{U}} \left[ \mathcal{L}_u \psi(t,x) + c(x,u) \right] = 0, \quad (t,x) \in (0,T) \times \mathbb{R}^d, \]
(5.2) \[ \psi(T, x) = h_T(x), \]
admits a unique solution $\psi \in W^{1,2,p,\mu}((0,T) \times \mathbb{R}^d)$, for some $p \ge 2$ and $\mu > 0$. Now, by the Itô-Krylov formula (as in [HP09-book, Theorem 3.5.2]), there exists an optimal Markov policy, i.e., there exists $v^* \in \mathfrak{U}_{\mathrm{M}}$ such that $\hat J_T(x, v^*) = \hat J^*_T(x)$.
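A minimal sketch of how such an optimal Markov policy is read off from Eq. 5.1 (assuming a measurable selection, with the notation fixed above): one takes, for a.e. $(t,x)$,
\[ v^*(t,x) \in \operatorname*{arg\,min}_{u \in \mathbb{U}} \bigl[ b(x,u) \cdot \nabla \psi(t,x) + c(x,u) \bigr], \]
which exists by the compactness of $\mathbb{U}$ and a measurable selection theorem; applying the Itô-Krylov formula to $\psi(t, X_t)$ then verifies the optimality of $v^*$.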
Similarly, for each $n \in \mathbb{N}$ (for the approximating models) the optimality equation
(5.3) \[ \frac{\partial \psi_n}{\partial t}(t,x) + \min_{u \in \mathbb{U}} \left[ \mathcal{L}^n_u \psi_n(t,x) + c_n(x,u) \right] = 0, \quad (t,x) \in (0,T) \times \mathbb{R}^d, \]
(5.4) \[ \psi_n(T, x) = h_T(x), \]
admits a unique solution $\psi_n \in W^{1,2,p,\mu}((0,T) \times \mathbb{R}^d)$. Moreover, by the Itô-Krylov formula (as in [HP09-book, Theorem 3.5.2]), there exists $v^*_n \in \mathfrak{U}_{\mathrm{M}}$ such that $\hat J^n_T(x, v^*_n) = \hat J^{n,*}_T(x)$.
The following theorem shows that as the approximating model approaches the true model, the optimal value of the approximating model converges to the optimal value of the true model.
Theorem 5.1.
Suppose Assumptions (A1), (A3) and (FN1) hold. Then $\hat J^{n,*}_T(x) \to \hat J^*_T(x)$ as $n \to \infty$, for all $x \in \mathbb{R}^d$.
Proof.
For any minimizing selector of Eq. 5.3, we have
(5.5) | ||||
(5.6) |
By the Itô-Krylov formula, it follows that
(5.7) |
This implies that
(5.8) |
Rewriting Eq. 5.5, it follows that
for some fixed . Thus, by the parabolic PDE estimate [BL84-book, eq. (3.8), p. 234], we deduce that
(5.9) |
Thus, from Eq. 5.8 and Eq. 5.9, we obtain for some positive constant (independent of ). Since is a reflexive Banach space, as a corollary of the Banach-Alaoglu theorem, there exists such that along a subsequence (without loss of generality denoted by the same sequence)
(5.10) |
Now, as in our earlier analysis for the different cost criteria considered, multiplying both sides of Eq. 5.3 by a test function and integrating, we get
(5.11) |
Thus, in view of Eq. 5.10, letting , from Eq. 5.11 it follows that (arguing as in Section 3 - Eq. 3.11)
Since is arbitrary, from the above equation we deduce that satisfies
(5.12) |
Since is the unique solution of Eq. 5.1-Eq. 5.2, we deduce that . This completes the proof. ∎
In the following theorem, we prove the robustness result for the finite horizon cost criterion.
Theorem 5.2.
Suppose Assumptions (A1), (A3) and (FN1) hold. Then for any optimal controls $v^*_n$ of the approximating models we have $\hat J_T(x, v^*_n) \to \hat J^*_T(x)$ as $n \to \infty$.
Proof.
Since the space is compact (with topology defined as in [YukselPradhan, Definition 2.2]), along a sub-sequence . From [BL84-book, Theorem 3.3, p. 235], we have that for each there exists a unique solution , , to the following Poisson equation
(5.13) |
By the Itô-Krylov formula, from Eq. 5.13 it follows that
(5.14) |
This gives us
(5.15) |
Arguing as in Theorem 5.1, letting $n \to \infty$ in Eq. 5.13, we deduce that there exists , , satisfying
(5.16) |
Now using Eq. 5.16, by the Itô-Krylov formula we deduce that
(5.17) |
6. Control up to an Exit Time
Before we conclude the paper, let us also briefly note that one may consider an optimal control problem up to an exit time, with the cost given as follows:
•
(in the true model:) for each $U \in \mathfrak{U}$ the associated cost is given as
\[ \hat J^e(x, U) := \mathbb{E}_x^U \left[ \int_0^{\tau(D)} e^{-\int_0^t \delta(X_s)\, \mathrm{d}s}\, \bar c(X_t, U_t)\, \mathrm{d}t + e^{-\int_0^{\tau(D)} \delta(X_s)\, \mathrm{d}s}\, h\bigl( X_{\tau(D)} \bigr) \right], \]
•
(in the approximated models:) for each $n \in \mathbb{N}$ and $U \in \mathfrak{U}$ the associated cost is given as
\[ \hat J^e_n(x, U) := \mathbb{E}_x^U \left[ \int_0^{\tau_n(D)} e^{-\int_0^t \delta(X^n_s)\, \mathrm{d}s}\, \bar c_n(X^n_t, U_t)\, \mathrm{d}t + e^{-\int_0^{\tau_n(D)} \delta(X^n_s)\, \mathrm{d}s}\, h\bigl( X^n_{\tau_n(D)} \bigr) \right], \]
where $D \subset \mathbb{R}^d$ is a smooth bounded domain, $\tau(D)$, $\tau_n(D)$ are the first exit times of $X_\cdot$, $X^n_\cdot$ from $D$, $\delta \ge 0$ is the discount function and $h$ is the terminal cost function. In the true model the optimal value is defined as $\hat V^e(x) := \inf_{U \in \mathfrak{U}} \hat J^e(x, U)$, and in the approximated model the optimal value is defined as $\hat V^e_n(x) := \inf_{U \in \mathfrak{U}} \hat J^e_n(x, U)$. We assume that $\delta$ and $h$ are continuous on $\bar D$. As in [RZ21], [B05Survey, p. 229], the analysis leads to the HJB equations sketched below.
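A minimal sketch of the pair of Dirichlet problems in question (assuming the standard form of the exit-time criterion, as in [ABG-book, Theorem 3.5.3], with the notation fixed above):
\[ \min_{u \in \mathbb{U}} \bigl[ \mathcal{L}_u \hat V^e(x) + c(x,u) \bigr] = \delta(x)\, \hat V^e(x) \quad \text{in } D, \qquad \hat V^e = h \ \text{ on } \partial D, \]
and analogously for the approximating models, with $(\mathcal{L}^n_u, c_n, \hat V^e_n)$ in place of $(\mathcal{L}_u, c, \hat V^e)$.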
By a similar argument as in [ABG-book, Theorem 3.5.3], [ABG-book, Theorem 3.5.6], we have that $\hat V^e$, $\hat V^e_n$ are the unique solutions to their respective HJB equations: existence follows by utilizing the Leray-Schauder fixed point theorem as in [ABG-book, Theorem 3.5.3], and uniqueness follows by the Itô-Krylov formula as in [ABG-book, Theorem 3.5.6]. Using standard elliptic PDE estimates (on the bounded domain $D$) and closely mimicking the arguments of Theorem 3.3, we have the following continuity result.
Theorem 6.1.
Suppose Assumptions (A1)-(A5) hold. Then $\hat V^e_n(x) \to \hat V^e(x)$ as $n \to \infty$, for all $x \in D$.
For each $n \in \mathbb{N}$, suppose that $v^*_n$, $v^*$ are optimal controls of the approximated model and the true model, respectively. Then, in view of the above continuity result, following the steps of the proof of Theorem 3.4, we obtain the following robustness result.
Theorem 6.2.
Suppose Assumptions (A1)-(A5) hold. Then $\hat J^e(x, v^*_n) \to \hat V^e(x)$ as $n \to \infty$, for all $x \in D$.
7. Revisiting Example 2.1
Consider Example 2.1(i).
•
Discounted cost: For each $n \in \mathbb{N}$, let $v^*_n$ be a discounted cost optimal control when the system is governed by Eq. 2.23. Then, in view of Theorem 3.4, we have
(7.1) \[ J_\alpha(x, v^*_n) \to V_\alpha(x) \quad \text{as } n \to \infty. \]
•
Ergodic cost: For each $n \in \mathbb{N}$, let $v^*_n$ be an ergodic optimal control when the system is governed by Eq. 2.23. Then, in view of Theorem 4.5 (respectively, Theorem 4.10), we have
(7.2) \[ \mathscr{E}_x(c, v^*_n) \to \mathscr{E}^*(c) \quad \text{as } n \to \infty. \]
•
Finite horizon cost: For each $n \in \mathbb{N}$, let $v^*_n$ be a finite horizon optimal control when the system is governed by Eq. 2.23. Then, in view of Theorem 5.2, we have
(7.3) \[ \hat J_T(x, v^*_n) \to \hat J^*_T(x) \quad \text{as } n \to \infty. \]
•
Cost up to an exit time: For each $n \in \mathbb{N}$, let $v^*_n$ be an optimal control when the system is governed by Eq. 2.23. Then Theorem 6.2 ensures that
(7.4) \[ \hat J^e(x, v^*_n) \to \hat V^e(x) \quad \text{as } n \to \infty. \]
8. Conclusion
In this paper, we studied the continuity of optimal costs and the robustness/stability of optimal control policies designed for an incorrect model and applied to the actual model, under discounted, ergodic, finite horizon, and exit-time cost criteria. In our analysis we have crucially used the fact that the actual model is a non-degenerate diffusion model. It would be an interesting problem to investigate whether such results can be proved when the limiting (actual) system is a degenerate diffusion. Also, in our analysis we have assumed that the system noise is given by a Wiener process; it would be interesting to study further noise processes, e.g., when the system noise is a wide-bandwidth process or a more general discontinuous martingale (as in [K90], [KR87], [KR87a], [KR88]). In the latter case the controlled process may become non-Markovian even under stationary Markov policies; it is then reasonable to find a suitable Markovian approximation which maintains the necessary properties of the original system. The analysis of robustness problems in this setting is a direction of research worth pursuing.