
Robustness of Stochastic Optimal Control to Approximate Diffusion Models under Several Cost Evaluation Criteria

Somnath Pradhan, Department of Mathematics and Statistics, Queen’s University, Kingston, ON, Canada, and Serdar Yüksel, Department of Mathematics and Statistics, Queen’s University, Kingston, ON, Canada
Abstract.

In control theory, typically a nominal model is assumed based on which an optimal control is designed and then applied to an actual (true) system. This gives rise to the problem of performance loss due to the mismatch between the true model and the assumed model. A robustness problem in this context is to show that the error due to the mismatch between a true model and an assumed model decreases to zero as the assumed model approaches the true model. We study this problem when the state dynamics of the system are governed by controlled diffusion processes. In particular, we will discuss continuity and robustness properties of finite horizon and infinite-horizon $\alpha$-discounted/ergodic optimal control problems for a general class of non-degenerate controlled diffusion processes, as well as for optimal control up to an exit time. Under a general set of assumptions and a convergence criterion on the models, we first establish that the optimal value of the approximate model converges to the optimal value of the true model. We then establish that the error due to mismatch that occurs by application of a control policy, designed for an incorrectly estimated model, to a true model decreases to zero as the incorrect model approaches the true model. We will see that, compared to related results in the discrete-time setup, the continuous-time theory will let us utilize the strong regularity properties of solutions to optimality (HJB) equations, via the theory of uniformly elliptic PDEs, to arrive at strong continuity and robustness properties.

Key words and phrases:
Robust control, Controlled diffusions, Hamilton-Jacobi-Bellman equation, Stationary control
2000 Mathematics Subject Classification:
Primary: 93E20, 60J60; secondary: 49J55

1. Introduction

In stochastic control applications, typically only an ideal model is assumed, or learned from available incomplete data, based on which an optimal control is designed and then applied to the actual system. This gives rise to the problem of performance loss due to the mismatch between the actual system and the assumed system. A robustness problem in this context is to show that the error due to mismatch decreases to zero as the assumed system approaches the actual system. With this motivation, in this article, our goal is to study the continuity and robustness properties of finite horizon and infinite horizon discounted/ergodic cost problems for a large class of multidimensional controlled diffusions. We note that the problems of existence, uniqueness and verification of optimality of stationary Markov policies have been studied extensively in the literature; see, e.g., [Bor-book], [HP09-book] (finite horizon), [BS86], [BB96] (discounted cost), [AA12], [AA13], [BG88I], [BG90b] (ergodic cost) and references therein. For a book-length exposition of this topic see, e.g., [ABG-book].

In more explicit terms, here is the problem that we will study. For a precise statement please see Section 2.3. Suppose that our true model is represented as $(X,c)$ (see, e.g., Eq. 2.1), where $X$ is the true system model (representing the system evolution model via the drift and diffusion terms) and $c$ is the associated running cost function, and let $(X_{n},c_{n})$ (see, e.g., Eq. 2.15) be a sequence of approximating models $X_{n}$ with associated running cost functions $c_{n}$, such that as $n\to\infty$ the approximating models $X_{n}$ converge to the true model $X$ in a sense to be stated precisely. Suppose that for each choice of control policy $U$ the associated total costs in the true and approximating models are ${\mathcal{J}}(c,U)$ and ${\mathcal{J}}_{n}(c_{n},U)$, respectively. The objective of the controller is to minimize the total cost over all admissible policies. If the optimal control policies of the true and approximating models are $v^{*}$ and $v^{*n}$, respectively, the performance loss due to mismatch is given by $|{\mathcal{J}}(c,v^{*})-{\mathcal{J}}(c,v^{*n})|$. Thus the robustness problem in this context is to show that $|{\mathcal{J}}(c,v^{*})-{\mathcal{J}}(c,v^{*n})|\to 0$ as $n\to\infty$. In this sense, our paper can be viewed as a continuous-time counterpart of the setting studied in [KY-20], [KRY-20].

This problem is of major practical importance and, accordingly, there have been many studies. Most of the existing works in this direction are concerned with discrete-time Markov decision processes; see for instance [KY-20], [KRY-20], [BJP02], [KV16], [NG05], [SX15], and references therein.

We should note that the term robustness has various interpretations, contexts and solution methods. A common approach to robustness in the literature has been to design controllers that work sufficiently well for all possible uncertain systems under some structured constraints, such as $H_{\infty}$ norm bounded perturbations (see [basbern], [zhou1996robust]). For such problems, the design of robust controllers has often been developed through a game theoretic formulation where the minimizer is the controller and the maximizer is the uncertainty. In [DJP00], [jacobson1973optimal] the authors established the connections of this formulation to risk sensitive control. Using Legendre-type transforms, relative entropy constraints entered the literature as a way to model the uncertainties probabilistically, see e.g. [dai1996connections, Eqn. (4)] or [DJP00, Eqns. (2)-(3)]. Here, one selects a nominal system which satisfies a relative entropy bound between the actual measure and the nominal measure, solves a risk sensitive optimal control problem, and this solution value provides an upper bound for the original system performance. Therefore, a common approach in robust stochastic control has been to consider all models which satisfy certain bounds in terms of relative entropy pseudo-distance (or Kullback-Leibler divergence); see e.g. [DJP00, dai1996connections, dupuis2000kernel, boel2002robustness] among others. In order to quantify the uncertainty in the system models, other than the relative entropy pseudo-distance, various other metrics and criteria have also been used in the literature. In [tzortzis2015dynamic], for discrete time controlled models, the authors have studied a min-max formulation for robust control where the one-stage transition kernel belongs to a ball under the total variation metric for each state-action pair.
For distributionally robust stochastic optimization problems, it is assumed that the underlying probability measure of the system lies within an ambiguity set and a worst case single-stage optimization is made considering the probability measures in the ambiguity set. To construct ambiguity sets, [blanchet2016], [esfahani2015] use the Wasserstein metric, [erdogan2005] uses the Prokhorov metric which metrizes the weak topology, [sun2015] uses the total variation distance, and [lam2016] works with relative entropy. For fully observed finite state-action space models with uncertain transition probabilities, the authors in [iyengar2005robust], [nilim2005robust] have studied robust dynamic programming approaches through a min-max formulation. Similar work with model uncertainty includes [oksendal2014forward], [benavoli2011robust], [xu_mannor]. In the economics literature related work has been done in [hansen2001robust], [gossner2008entropy].

The robustness formulation we study has been considered in [KY-20], [KRY-20] for discrete-time models, where the authors studied both continuity of value functions as transition kernel models converge, as well as the robustness problem where an optimal control designed for an incorrect approximate model is applied to a true model and the mismatch term is studied. The solution approach is fundamentally different in the continuous-time analysis we present in this paper. In a related study [Dean18], the author studied the optimal control of systems with unknown dynamics for a linear quadratic regulator setup and proposed an algorithm to learn the system from observed data with quantitative convergence bounds. The author in [Lan81, Theorem 5.1] has considered fully observed discrete time controlled models, established continuity results for approximate models, and given a set convergence result for sets of optimal control actions; this set convergence result is inconclusive for robustness without further assumptions on the true system model (for more details see [KY-20]). For fully observed MDPs, [muller1997does] studied continuity of the value function under a general metric defined as the integral probability metric, which captures both the total variation metric and the Kantorovich metric with different setups (which is not weaker than the metrics leading to weak convergence). A recent study on game problems along a similar theme is presented in [subramanian2021robustness].

For control problems of MDPs with standard Borel spaces, the approximation methods through quantization, which lead to finite models, can be viewed as approximations of transition kernels, but this interpretation requires caution: indeed, [SaYuLi17, arruda2012, arruda2013], among many others, study approximation methods for MDPs where the convergence of approximate models is satisfied in a particularly constructed fashion. Reference [SaYuLi17] presents a construction for the approximate models through quantizing the actual model with continuous spaces (leading to a finite space model), which allows for continuity and robustness results with only a weak continuity assumption on the true transition kernel which, in turn, leads to the weak convergence of the approximate models. For both fully observed and partially observed models, a detailed analysis of approximation methods for continuous state and action spaces can be found in [SaLiYuSpringer] .

The literature on robustness of stochastic optimal control for continuous time systems seems to be rather limited; see e.g., [GL99], [LJE15], [hansen2001robust]. In [GL99] the authors have considered the problem of controlling a system whose dynamics are given by a stochastic differential equation (SDE) whose coefficients are known only up to a certain degree of accuracy. For the finite horizon reward maximization problem, using the technique of contractive operators, [GL99] has obtained upper bounds of performance loss due to mismatch (or, “robustness index”) and has shown by an example that the robustness index may behave abnormally even if we have the convergence of the value functions. The associated discounted payoff maximization problem has been studied in [LJE15], where using a Lyapunov type stability assumption the authors have studied the robustness problem via a game theoretic formulation. For controlled diffusion models, the authors in [hansen2001robust] described the links between the max-min expected utility theory and the applications of robust-control theory, in analogy with some of the papers on discrete-time noted above adopting a min-max formulation. Along a further direction, for controlled diffusions, via the Maximum Principle technique, [PDPB02a], [PDPB02b], [PDPB02c] have established the robustness of optimal controls for the finite horizon payoff criterion.

In a recent comprehensive work [RZ21], the authors have studied the robustness of feedback relaxed controls for a continuous time stochastic exit time problem. Under sufficient smoothness assumptions on the coefficients (i.e., uniform Lipschitz continuity on the diffusion coefficients and uniform Hölder continuity on the discount factor and payoff function on a fixed bounded domain) they have established that a regularized control problem admits a Hölder continuous optimal feedback control, and they have also shown that both the value function and the feedback control of the regularized control problem are Lipschitz stable with respect to parameter perturbations when the action space is finite. It is known that the optimal control obtained from the HJB equation (i.e. the argmin function) in general is unstable with respect to perturbations of coefficients; in practice, this would result in numerical instability of learning algorithms (as noted in [RZ21]).

Stability/continuity of solutions of PDEs with respect to coefficient perturbations is a significant mathematical and practical question in PDE theory (see e.g. [WLS01], [SI72]). The continuity results established in this paper (see Theorems 3.3, 4.3, 4.8) will provide sufficient conditions which ensure stability of solutions of semilinear elliptic PDEs (HJB equations) in the whole space $\mathds{R}^{d}$.

Our robustness results also will be useful to the study of the robust optimal investment problems for local volatility models, e.g. given in [AS08, Remark 2.1] (also, see [KT12], [BDD20]) .

When the system noise is not given by a Wiener process, but by a general wide bandwidth noise (or by more general discontinuous martingales [LRT00]), the controlled process becomes a non-Markovian process even under stationary Markov policies. The general method of studying optimal control problems for such a system is to find suitable Markovian processes which approximate the non-Markovian process (see [K90], [KR87], [KR87a], [KR88]). For wide bandwidth noise driven controlled systems [K90], [KR87], [KR87a], [KR88], diffusion approximation techniques were used to study stochastic optimal control problems. The results described in this paper are complementary to the above mentioned works on the diffusion approximation of wide bandwidth noise driven systems.

Contributions and main results. In the present paper, our aim is to study the continuity and robustness properties for a general class of controlled diffusion processes in $\mathds{R}^{d}$ for both infinite horizon discounted and ergodic costs, where the action space is a (general) compact metric space. As in [KY-20], [KRY-20], in order to establish our desired robustness results we will use the continuity result as an intermediate step. For the discounted cost case, we will establish our results following a direct approach (under a relatively weaker set of assumptions on the diffusion coefficients, i.e., locally Lipschitz continuous coefficients). Using the results on existence and uniqueness of solutions of the associated discounted Hamilton-Jacobi-Bellman (HJB) equation and the complete characterization of (discounted) optimal policies in the space of stationary Markov policies (see [ABG-book, Theorem 3.5.6]), we first establish the continuity of value functions. Then, utilizing this continuity of value functions, we derive a robustness result. The analysis of ergodic cost (or long-run expected average cost) is somewhat more involved. To the best of our knowledge there is no work on continuity and robustness properties of optimal controls for the ergodic cost criterion in the existing literature (for the discrete-time setup, see [KRY-20]). We study these ergodic cost problems under two sets of assumptions: in the first case, we assume that our running cost function satisfies a near-monotone type structural assumption (see Eq. 4.1, Assumption (A6)), and in the second case we assume Lyapunov type stability assumptions on the dynamics of the system (see Assumption (A7)).

One of the major issues in analyzing the robustness of ergodic optimal controls under the near-monotone hypothesis is the non-uniqueness/restricted uniqueness of solutions of the associated HJB equation (see [ABG-book, Example 3.8.3], [AA13]). It is shown in [ABG-book, Example 3.8.3] that the ergodic HJB equation may admit uncountably many solutions. Considering this, in [AA13, Theorem 1.1] the author has established the uniqueness of compatible solution pairs (see [AA13, Definition 1.1]). Exploiting this uniqueness result, under a suitable tightness assumption (on a certain set of invariant measures) we will establish the desired robustness result. Under the Lyapunov type stability assumption it is known that the ergodic HJB equation admits a unique solution in a certain class of functions, and the complete characterization of ergodic optimal controls is also known (see [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12]). Utilizing this characterization of optimal controls, we derive the robustness properties of ergodic optimal controls under a Lyapunov stability assumption.

We also emphasize the duality between the PDE approach vs. a probabilistic flow approach to study robustness. The PDE approach presents a very general and conclusive, yet concise and unified, approach for several cost criteria (notably, a probabilistic approach via Dynkin’s lemma would require separate arguments for discounted infinite-horizon and average cost infinite-horizon criteria) and such a unified approach had not been considered earlier, to our knowledge.

Thus, the main results of this article can be roughly described as follows.

  • For discounted cost criterion: We establish continuity of value functions and provide sufficient conditions which ensure robustness/stability of optimal controls designed under model uncertainties.

  • For ergodic cost criterion: Under two different sets of assumptions ((i) where the running cost is near-monotone or (ii) where a Lyapunov stability condition holds) we establish the continuity of value functions and exploiting the continuity results we derive the robustness/stability of ergodic optimal controls designed for approximate models applied to actual systems.

  • For finite horizon cost criterion: Under uniform boundedness assumptions on the drift term and diffusion matrices (of the true and approximating models), we establish continuity of value functions. Then exploiting the continuity result we prove the robustness/stability of optimal controls designed under model uncertainties.

  • For cost up to an exit time: Similar to the above criteria, under a mild set of assumptions we first establish the continuity of value functions and then using the continuity results we establish the robustness/stability of optimal controls designed under model uncertainties.

We will see that compared with the discrete-time counterpart of this problem studied in [KY-20] (discounted cost) and [KRY-20] (average cost), where value iteration methods were crucially used, in our analysis here we will develop rather direct arguments, with strong implications, utilizing regularity properties of value functions: In the discrete-time setup, these properties need to be established via tedious arguments whereas the continuous-time theory allows for the use of regularity properties of solutions to PDEs. Nonetheless, we will see that continuous convergence in control actions of models and cost functions is a unifying condition for continuity and robustness properties in both the discrete-time setup studied in [KY-20] (discounted cost) and [KRY-20] (average cost) and our current paper. Compared to [RZ21], in addition to the infinite horizon criteria we study, we note that the perturbations we consider do not involve only coefficient/parameter variations, i.e., we consider functional perturbations, and the action space we consider is uncountable, though we do not establish the Lipschitz property of control policies, unlike [RZ21].

The rest of the paper is organized as follows. Section 2 introduces the problem setup and summarizes the notation. Section 3 is devoted to the analysis of robustness of optimal controls for the discounted cost criterion. In Section 4 we provide the analysis of robustness of ergodic optimal controls under two different sets of hypotheses: (i) near-monotonicity and (ii) Lyapunov stability. For the finite horizon cost criterion the robustness problem is analyzed in Section 5. The robustness problem for optimal controls up to an exit time is considered in Section 6.

2. Description of the problem

Let $\mathbb{U}$ be a compact metric space and $\mathrm{V}=\mathscr{P}(\mathbb{U})$ be the space of probability measures on $\mathbb{U}$ with the topology of weak convergence. Let

\[
b:{\mathds{R}^{d}}\times\mathbb{U}\to{\mathds{R}^{d}},
\qquad
\upsigma:{\mathds{R}^{d}}\to\mathds{R}^{d\times d},\quad \upsigma=[\upsigma^{ij}(\cdot)]_{1\leq i,j\leq d},
\]

be given functions. We consider a stochastic optimal control problem whose state is evolving according to a controlled diffusion process given by the solution of the following stochastic differential equation (SDE)

\[
\mathrm{d}X_{t}\,=\,b(X_{t},U_{t})\,\mathrm{d}t+\upsigma(X_{t})\,\mathrm{d}W_{t}\,,\quad X_{0}=x\in{\mathds{R}^{d}}, \tag{2.1}
\]

where

  • $W$ is a $d$-dimensional standard Wiener process, defined on a complete probability space $(\Omega,{\mathfrak{F}},\mathbb{P})$.

  • We extend the drift term $b:{\mathds{R}^{d}}\times\mathrm{V}\to{\mathds{R}^{d}}$ as follows:

    \[
    b(x,\mathrm{v})=\int_{\mathbb{U}}b(x,\zeta)\,\mathrm{v}(\mathrm{d}\zeta),
    \]

    for $\mathrm{v}\in\mathrm{V}$.

  • $U$ is a $\mathrm{V}$-valued process satisfying the following non-anticipativity condition: for $s<t$, $W_{t}-W_{s}$ is independent of

    \[
    {\mathfrak{F}}_{s}:=\text{the completion of}\ \ \sigma(X_{0},U_{r},W_{r}:r\leq s)\ \ \text{relative to}\ \ ({\mathfrak{F}},\mathbb{P})\,.
    \]

The process $U$ is called an admissible control, and the set of all admissible controls is denoted by $\mathfrak{U}$ (see [BG90]).

To ensure existence and uniqueness of strong solutions of Eq. 2.1, we impose the following assumptions on the drift $b$ and the diffusion matrix $\upsigma$.

  • (A1)

    Local Lipschitz continuity: The functions $\upsigma=\bigl[\upsigma^{ij}\bigr]\colon\mathds{R}^{d}\to\mathds{R}^{d\times d}$ and $b\colon{\mathds{R}^{d}}\times\mathbb{U}\to{\mathds{R}^{d}}$ are locally Lipschitz continuous in $x$ (uniformly with respect to the control action for $b$). In particular, for some constant $C_{R}>0$ depending on $R>0$, we have

    \[
    \lvert b(x,\zeta)-b(y,\zeta)\rvert^{2}+\lVert\upsigma(x)-\upsigma(y)\rVert^{2}\,\leq\,C_{R}\,\lvert x-y\rvert^{2}
    \]

    for all $x,y\in{\mathscr{B}}_{R}$ and $\zeta\in\mathbb{U}$, where $\lVert\upsigma\rVert:=\sqrt{\operatorname*{Tr}(\upsigma\upsigma^{\mathsf{T}})}$. We also assume that $b$ is jointly continuous in $(x,\zeta)$.

  • (A2)

    Affine growth condition: $b$ and $\upsigma$ satisfy a global growth condition of the form

    \[
    \sup_{\zeta\in\mathbb{U}}\,\langle b(x,\zeta),x\rangle^{+}+\lVert\upsigma(x)\rVert^{2}\,\leq\,C_{0}\bigl(1+\lvert x\rvert^{2}\bigr)\qquad\forall\,x\in\mathds{R}^{d},
    \]

    for some constant $C_{0}>0$.

  • (A3)

    Nondegeneracy: For each $R>0$, it holds that

    \[
    \sum_{i,j=1}^{d}a^{ij}(x)z_{i}z_{j}\,\geq\,C^{-1}_{R}\lvert z\rvert^{2}\qquad\forall\,x\in{\mathscr{B}}_{R}\,,
    \]

    and for all $z=(z_{1},\dotsc,z_{d})^{\mathsf{T}}\in\mathds{R}^{d}$, where $a:=\frac{1}{2}\upsigma\upsigma^{\mathsf{T}}$.

By a Markov control we mean an admissible control of the form $U_{t}=v(t,X_{t})$ for some Borel measurable function $v:\mathds{R}_{+}\times{\mathds{R}^{d}}\to\mathrm{V}$. The space of all Markov controls is denoted by $\mathfrak{U}_{\mathsf{m}}$. If the function $v$ is independent of $t$, then $U$, or by an abuse of notation $v$ itself, is called a stationary Markov control. The set of all stationary Markov controls is denoted by $\mathfrak{U}_{\mathsf{sm}}$. From [ABG-book, Section 2.4], the set $\mathfrak{U}_{\mathsf{sm}}$ is compact and metrizable under the following topology: a sequence $v_{n}\to v$ in $\mathfrak{U}_{\mathsf{sm}}$ if and only if

\[
\lim_{n\to\infty}\int_{{\mathds{R}^{d}}}f(x)\int_{\mathbb{U}}g(x,u)\,v_{n}(x)(\mathrm{d}u)\,\mathrm{d}x=\int_{{\mathds{R}^{d}}}f(x)\int_{\mathbb{U}}g(x,u)\,v(x)(\mathrm{d}u)\,\mathrm{d}x
\]

for all $f\in L^{1}({\mathds{R}^{d}})\cap L^{2}({\mathds{R}^{d}})$ and $g\in{\mathcal{C}}_{b}({\mathds{R}^{d}}\times\mathbb{U})$ (for more details, see [ABG-book, Lemma 2.4.1]). It is well known that under the hypotheses (A1)-(A3), for any admissible control Eq. 2.1 has a unique strong solution [ABG-book, Theorem 2.2.4], and under any stationary Markov strategy Eq. 2.1 has a unique strong solution which is a strong Feller (therefore strong Markov) process [ABG-book, Theorem 2.2.12].
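To make the model concrete, here is a minimal simulation sketch. It is not part of the paper's analysis. It applies the Euler-Maruyama scheme (a standard discretization, chosen here only for illustration) to a one-dimensional instance of Eq. 2.1 under a precise stationary Markov control, i.e., the special case in which $v(x)$ is a Dirac mass at a single action. The coefficients `b` and `sigma`, the policy `v`, and the action set $[-1,1]$ are hypothetical choices intended to satisfy (A1)-(A3).

```python
import math
import random


def euler_maruyama(b, sigma, v, x0, T, n_steps, rng):
    """Simulate one Euler-Maruyama path of dX = b(X, v(X)) dt + sigma(X) dW on [0, T]."""
    dt = T / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, distributed N(0, dt)
        x = x + b(x, v(x)) * dt + sigma(x) * dw
        path.append(x)
    return path


# Hypothetical coefficients (assumptions for illustration only).
def b(x, u):
    return -x + u  # locally Lipschitz in x, affine growth


def sigma(x):
    return 1.0  # constant diffusion coefficient, hence nondegenerate


def v(x):
    return max(-1.0, min(1.0, -x))  # stationary Markov control with values in [-1, 1]


path = euler_maruyama(b, sigma, v, x0=0.5, T=1.0, n_steps=1000, rng=random.Random(0))
print(len(path), path[0])
```

Passing the random number generator explicitly keeps runs reproducible; replacing `b` and `sigma` by perturbed coefficients gives a path of an approximating model driven by the same policy.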

2.1. Cost Criteria

Let $c\colon{\mathds{R}^{d}}\times\mathbb{U}\to\mathds{R}_{+}$ be the running cost function. We assume that

  • (A4)

    The running cost $c$ is bounded (i.e., $\|c\|_{\infty}\leq M$ for some positive constant $M$), jointly continuous in $(x,\zeta)$ and locally Lipschitz continuous in its first argument uniformly with respect to $\zeta\in\mathbb{U}$.

This condition (A4) can also be relaxed to (A4)$'$, to be presented further below, where the local Lipschitz property is eliminated.

We extend $c\colon{\mathds{R}^{d}}\times\mathrm{V}\to\mathds{R}_{+}$ as follows: for $\mathrm{v}\in\mathrm{V}$

\[
c(x,\mathrm{v}):=\int_{\mathbb{U}}c(x,\zeta)\,\mathrm{v}(\mathrm{d}\zeta)\,.
\]

In this article, we consider the problem of minimizing finite horizon, discounted, ergodic and control up to an exit time cost criteria:

Discounted cost criterion. For $U\in\mathfrak{U}$, the associated $\alpha$-discounted cost is given by

\[
{\mathcal{J}}_{\alpha}^{U}(x,c)\,:=\,\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\infty}e^{-\alpha s}c(X_{s},U_{s})\,\mathrm{d}s\right],\quad x\in{\mathds{R}^{d}}\,, \tag{2.2}
\]

where $\alpha>0$ is the discount factor, $X(\cdot)$ is the solution of Eq. 2.1 corresponding to $U\in\mathfrak{U}$, and $\operatorname{\mathbb{E}}_{x}^{U}$ is the expectation with respect to the law of the process $X(\cdot)$ with initial condition $x$. The controller tries to minimize Eq. 2.2 over the set of admissible policies $\mathfrak{U}$. Thus, a policy $U^{*}\in\mathfrak{U}$ is said to be optimal if for all $x\in{\mathds{R}^{d}}$

\[
{\mathcal{J}}_{\alpha}^{U^{*}}(x,c)=\inf_{U\in\mathfrak{U}}{\mathcal{J}}_{\alpha}^{U}(x,c)\quad(\,=:\,V_{\alpha}(x))\,, \tag{2.3}
\]

where $V_{\alpha}(x)$ is called the optimal value.
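As a rough numerical illustration of Eq. 2.2, the sketch below estimates the discounted cost of a fixed stationary Markov policy by Monte Carlo simulation. It is an illustration under stated assumptions, not a method from the paper: the infinite horizon is truncated at a hypothetical time `T` (since the cost is bounded by $M$, the neglected tail is at most $M e^{-\alpha T}/\alpha$), time is discretized by the Euler-Maruyama scheme, and the coefficients, cost, and policy are hypothetical one-dimensional examples.

```python
import math
import random


def discounted_cost(b, sigma, c, v, x0, alpha, T, dt, n_paths, rng):
    """Monte Carlo estimate of E[ int_0^T exp(-alpha*s) c(X_s, v(X_s)) ds ]."""
    n_steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, acc = x0, 0.0
        for k in range(n_steps):
            u = v(x)
            acc += math.exp(-alpha * k * dt) * c(x, u) * dt  # left Riemann sum in time
            x += b(x, u) * dt + sigma(x) * rng.gauss(0.0, sqrt_dt)  # Euler-Maruyama step
        total += acc
    return total / n_paths


M = 10.0  # hypothetical bound on the running cost, in the spirit of (A4)


def b(x, u):
    return -x + u


def sigma(x):
    return 1.0


def c(x, u):
    return min(x * x + u * u, M)  # bounded, continuous, locally Lipschitz in x


def v(x):
    return max(-1.0, min(1.0, -x))  # precise stationary Markov policy on [-1, 1]


estimate = discounted_cost(b, sigma, c, v, x0=0.5, alpha=1.0, T=5.0, dt=0.01,
                           n_paths=200, rng=random.Random(0))
print(estimate)
```

The estimate carries three sources of error (truncation, time discretization, sampling), so it should be read as an approximation of ${\mathcal{J}}_{\alpha}^{v}(x,c)$ only; it says nothing about the optimal value $V_{\alpha}(x)$, which requires minimizing over policies.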

Ergodic cost criterion. For each $U\in\mathfrak{U}$ the associated ergodic cost is defined as

\[
{\mathscr{E}}_{x}(c,U)=\limsup_{T\to\infty}\frac{1}{T}\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{T}c(X_{s},U_{s})\,\mathrm{d}s\right]\,, \tag{2.4}
\]

and the optimal value is defined as

\[
{\mathscr{E}}^{*}(c)\,:=\,\inf_{x\in{\mathds{R}^{d}}}\inf_{U\in\mathfrak{U}}{\mathscr{E}}_{x}(c,U)\,. \tag{2.5}
\]

Then a control $U^{*}\in\mathfrak{U}$ is said to be optimal if we have

\[
{\mathscr{E}}_{x}(c,U^{*})={\mathscr{E}}^{*}(c)\quad\text{for all}\ \ x\in{\mathds{R}^{d}}\,. \tag{2.6}
\]

Finite horizon cost. For $U\in\mathfrak{U}$, the associated finite horizon cost is given by

\[
{\mathcal{J}}_{T}^{U}(x,c)=\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{T}c(X_{s},U_{s})\,\mathrm{d}s+H(X_{T})\right]\,, \tag{2.7}
\]

where $H(\cdot)$ is the terminal cost. The optimal value is defined as

\[
{\mathcal{J}}_{T}^{*}(x,c)\,:=\,\inf_{U\in\mathfrak{U}}{\mathcal{J}}_{T}^{U}(x,c)\,. \tag{2.8}
\]

Thus, a policy $U^{*}\in\mathfrak{U}$ is said to be (finite horizon) optimal if we have

\[
{\mathcal{J}}_{T}^{U^{*}}(x,c)={\mathcal{J}}_{T}^{*}(x,c)\quad\text{for all}\ \ x\in{\mathds{R}^{d}}\,. \tag{2.9}
\]

Control up to an exit time. This criterion will be presented in Section 6. Our analysis for this criterion will be immediate given the study involving the above criteria.

We define a family of operators ${\mathscr{L}}_{\zeta}$ mapping ${\mathcal{C}}^{2}({\mathds{R}^{d}})$ to ${\mathcal{C}}({\mathds{R}^{d}})$ by

\[
{\mathscr{L}}_{\zeta}f(x)\,:=\,\operatorname*{Tr}\bigl(a(x)\nabla^{2}f(x)\bigr)+\,b(x,\zeta)\cdot\nabla f(x)\,, \tag{2.10}
\]

for $\zeta\in\mathbb{U}$, $f\in{\mathcal{C}}^{2}({\mathds{R}^{d}})$. For $\mathrm{v}\in\mathrm{V}$ we extend ${\mathscr{L}}_{\zeta}$ as follows:

\[
{\mathscr{L}}_{\mathrm{v}}f(x)\,:=\,\int_{\mathbb{U}}{\mathscr{L}}_{\zeta}f(x)\,\mathrm{v}(\mathrm{d}\zeta)\,. \tag{2.11}
\]

For $v\in\mathfrak{U}_{\mathsf{sm}}$, we define

\[
{\mathscr{L}}_{v}f(x)\,:=\,\operatorname*{Tr}\bigl(a(x)\nabla^{2}f(x)\bigr)+b(x,v(x))\cdot\nabla f(x)\,. \tag{2.12}
\]
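The following sketch evaluates the operator in Eq. 2.10 numerically in dimension $d=1$, where it reduces to ${\mathscr{L}}_{\zeta}f(x)=a(x)f''(x)+b(x,\zeta)f'(x)$ with $a=\tfrac{1}{2}\upsigma^{2}$. It is only an illustration: the central finite differences, the step size `h`, and the coefficients are assumptions introduced here, and the test function $f(x)=x^{2}$ is chosen so that the result can be checked by hand.

```python
def generator(f, sigma, b, x, zeta, h=1e-3):
    """Approximate L_zeta f(x) = a(x) f''(x) + b(x, zeta) f'(x) with a = sigma^2 / 2 (d = 1)."""
    a = 0.5 * sigma(x) ** 2
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)            # central difference for f'
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)  # central difference for f''
    return a * d2 + b(x, zeta) * d1


# Hypothetical one-dimensional coefficients and test function.
def b(x, zeta):
    return -x + zeta


def sigma(x):
    return 1.0


def f(x):
    return x * x


value = generator(f, sigma, b, x=2.0, zeta=1.0)
# Hand computation: a f'' + b f' = 0.5 * 2 + (-2 + 1) * (2 * 2)
exact = 2.0 * 0.5 + (-2.0 + 1.0) * 2.0 * 2.0
print(value, exact)
```

For a quadratic test function the central differences are exact up to floating-point rounding, which is why the two printed numbers agree closely; for general $f\in{\mathcal{C}}^{2}$ the approximation error depends on `h`.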

We are interested in the robustness of optimal controls under these criteria. To this end, we now introduce our approximating models.

2.2. Approximating Controlled Diffusion Processes

Let $\upsigma_{n}=\bigl[\upsigma_{n}^{ij}\bigr]\colon\mathds{R}^{d}\to\mathds{R}^{d\times d}$, $b_{n}\colon{\mathds{R}^{d}}\times\mathbb{U}\to{\mathds{R}^{d}}$, $c_{n}\colon{\mathds{R}^{d}}\times\mathbb{U}\to\mathds{R}_{+}$ be sequences of functions satisfying the following assumptions

  • (A5)
    • (i)

      as $n\to\infty$

      \[
      \upsigma_{n}(x)\to\upsigma(x)\quad\text{a.e.}\ \ x\in{\mathds{R}^{d}}\,, \tag{2.13}
      \]
    • (ii)

      Continuous convergence in controls: for any sequence $\zeta_{n}\to\zeta$

      \[
      c_{n}(x,\zeta_{n})\to c(x,\zeta)\quad\text{and}\quad b_{n}(x,\zeta_{n})\to b(x,\zeta)\quad\text{a.e.}\ \ x\in{\mathds{R}^{d}}\,. \tag{2.14}
      \]
    • (iii)

      for each $n\in\mathds{N}$, $b_{n}$ and $\upsigma_{n}$ satisfy Assumptions (A1)-(A3) and $c_{n}$ is uniformly bounded (in particular, $\lVert c_{n}\rVert_{\infty}\leq M$, where $M$ is the positive constant in (A4)), jointly continuous in $(x,\zeta)$ and locally Lipschitz continuous in its first argument uniformly with respect to $\zeta\in\mathbb{U}$.
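As a small sanity check of what (A5)(ii) asks for, the sketch below takes a hypothetical family of perturbed coefficients and evaluates the gaps $|b_{n}(x,\zeta_{n})-b(x,\zeta)|$ and $|c_{n}(x,\zeta_{n})-c(x,\zeta)|$ along a sequence $\zeta_{n}\to\zeta$. All functions, the bound `M`, and the action set $[-1,1]$ are assumptions made for illustration; a numerical check at one point does not prove continuous convergence, it only shows the kind of quantity the assumption controls.

```python
M = 10.0  # hypothetical uniform bound on the running costs


def b(x, zeta):
    return -x + zeta


def c(x, zeta):
    return min(x * x + zeta * zeta, M)


def b_n(n, x, zeta):
    return -(1.0 + 1.0 / n) * x + zeta  # perturbed drift, hypothetical


def c_n(n, x, zeta):
    return min((1.0 + 1.0 / n) * x * x + zeta * zeta, M)  # perturbed cost, hypothetical


def mismatch(n, x, zeta):
    """Gaps between the n-th model evaluated at zeta_n and the true model at zeta."""
    zeta_n = min(1.0, zeta + 1.0 / n)  # a sequence zeta_n -> zeta inside [-1, 1]
    return (abs(b_n(n, x, zeta_n) - b(x, zeta)),
            abs(c_n(n, x, zeta_n) - c(x, zeta)))


gaps = [mismatch(n, x=2.0, zeta=0.5) for n in (10, 100, 1000)]
for n, (gap_b, gap_c) in zip((10, 100, 1000), gaps):
    print(n, gap_b, gap_c)
```

In this example both gaps shrink as $n$ grows because the perturbation and the shift in the action vanish together, which is the joint behavior that continuous convergence requires.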

For each $n\in\mathds{N}$, let $X_{t}^{n}$ be the solution of the following SDE

\[
\mathrm{d}X_{t}^{n}\,=\,b_{n}(X_{t}^{n},U_{t})\,\mathrm{d}t+\upsigma_{n}(X_{t}^{n})\,\mathrm{d}W_{t}\,,\quad X_{0}^{n}=x\in{\mathds{R}^{d}}. \tag{2.15}
\]

Define a family of operators ${\mathscr{L}}_{\zeta}^{n}$ mapping ${\mathcal{C}}^{2}({\mathds{R}^{d}})$ to ${\mathcal{C}}({\mathds{R}^{d}})$ by

\[
{\mathscr{L}}_{\zeta}^{n}f(x)\,:=\,\operatorname*{Tr}\bigl(a_{n}(x)\nabla^{2}f(x)\bigr)+\,b_{n}(x,\zeta)\cdot\nabla f(x)\,, \tag{2.16}
\]

for $\zeta\in\mathbb{U}$, $f\in{\mathcal{C}}^{2}({\mathds{R}^{d}})$, where $a_{n}:=\frac{1}{2}\upsigma_{n}\upsigma_{n}^{\mathsf{T}}$. For the approximating model, for each $n\in\mathds{N}$ and $U\in\mathfrak{U}$ the associated discounted cost is defined as

\[
{\mathcal{J}}_{\alpha,n}^{U}(x,c_{n})\,:=\,\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\infty}e^{-\alpha s}c_{n}(X_{s}^{n},U_{s})\,\mathrm{d}s\right],\quad x\in{\mathds{R}^{d}}\,, \tag{2.17}
\]

and the optimal value is defined as

\[
V_{\alpha}^{n}(x)\,:=\,\inf_{U\in\mathfrak{U}}{\mathcal{J}}_{\alpha,n}^{U}(x,c_{n})\,. \tag{2.18}
\]

For each nn\in\mathds{N} and U𝔘U\in\mathfrak{U} the associated ergodic cost is defined as

xn(cn,U)=lim supT1T𝔼xU[0Tcn(Xsn,Us)ds],{\mathscr{E}}_{x}^{n}(c_{n},U)=\limsup_{T\to\infty}\frac{1}{T}\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{T}c_{n}(X_{s}^{n},U_{s})\mathrm{d}{s}\right]\,, (2.19)

and the optimal value is defined as

n(cn):=infxdinfU𝔘xn(cn,U).{\mathscr{E}}^{n*}(c_{n})\,:=\,\inf_{x\in{\mathds{R}^{d}}}\inf_{U\in\mathfrak{U}}{\mathscr{E}}_{x}^{n}(c_{n},U)\,. (2.20)

Similarly, for each nn\in\mathds{N} and U𝔘U\in\mathfrak{U} the associated finite horizon cost is given by

𝒥T,nU(x,cn):=𝔼xU[0Tcn(Xsn,Us)ds+H(XTn)].{\mathcal{J}}_{T,n}^{U}(x,c_{n})\,:=\,\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{T}c_{n}(X_{s}^{n},U_{s})\mathrm{d}{s}+H(X_{T}^{n})\right]\,. (2.21)

The optimal value is given by

𝒥T,n(x,cn)=infU𝔘𝒥T,nU(x,cn)for allxd,{\mathcal{J}}_{T,n}^{*}(x,c_{n})=\inf_{U\in\mathfrak{U}}{\mathcal{J}}_{T,n}^{U}(x,c_{n})\quad\text{for all}\,\,\,x\in{\mathds{R}^{d}}\,, (2.22)

where the state process $X_{t}^{n}$ is given by the solution of the SDE Eq. 2.15.
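The discounted cost Eq. 2.17 of an approximating model can be estimated numerically. The following is a minimal sketch, assuming a one-dimensional state, an Euler-Maruyama discretisation with step `dt`, truncation of the infinite horizon at a finite time `T`, and a stationary Markov policy; the function name `discounted_cost_mc` and the coefficients in the usage lines are hypothetical illustrations, not objects from the paper.

```python
import numpy as np

def discounted_cost_mc(b, sigma, c, policy, x0, alpha,
                       T=20.0, dt=1e-2, n_paths=2000, seed=0):
    """Monte Carlo estimate of E int_0^T e^{-alpha s} c(X_s, U_s) ds.

    Assumptions (not from the paper): d = 1, Euler-Maruyama time stepping,
    horizon truncated at T, and U_s = policy(X_s).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    total = np.zeros(n_paths)
    for k in range(int(round(T / dt))):
        u = policy(x)
        # left-endpoint quadrature of the discounted running cost
        total += np.exp(-alpha * k * dt) * c(x, u) * dt
        noise = rng.standard_normal(n_paths)
        x = x + b(x, u) * dt + sigma(x) * np.sqrt(dt) * noise
    return total.mean()

# Hypothetical approximating model: b_n(x, u) = -x + u + 1/n, sigma_n = 1.
def make_drift(n):
    return lambda x, u: -x + u + 1.0 / n

cost = lambda x, u: x**2 / (1.0 + x**2) + 0.1 * u**2   # bounded running cost
policy = lambda x: -np.tanh(x)                          # values in U = [-1, 1]
unit_sigma = lambda x: np.ones_like(x)

for n in (1, 10, 100):
    est = discounted_cost_mc(make_drift(n), unit_sigma, cost, policy,
                             x0=0.5, alpha=1.0)
    print(n, est)
```

With a constant running cost the estimate reduces to a deterministic geometric sum close to $1/\alpha$, which gives a quick sanity check of the discounting.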

2.3. Continuity and Robustness Problems

The primary objective of this article will be to address the following problems:

  • Continuity: If the approximated model Eq. 2.15 approaches the true model Eq. 2.1, does this imply

    • for discounted cost: VαnVα?V_{\alpha}^{n}\to V_{\alpha}?

    • for ergodic cost : n(cn)(c)?{\mathscr{E}}^{n*}(c_{n})\to{\mathscr{E}}^{*}(c)?

    • for finite horizon cost : 𝒥T,n(x,cn)𝒥T(x,c)?{\mathcal{J}}_{T,n}^{*}(x,c_{n})\to{\mathcal{J}}_{T}^{*}(x,c)?

    • for cost up to an exit time: 𝒥^e,n𝒥^e?\hat{{\mathcal{J}}}_{e,n}^{*}\to\hat{{\mathcal{J}}}_{e}^{*}?     (for details, see Section 6)

  • Robustness: Suppose $v_{n}^{*}$ is an optimal policy designed for the incorrect model Eq. 2.15 under the finite horizon/discounted/ergodic/up to an exit time cost criterion. Does this imply

    • for discounted cost: 𝒥αvn(x,c)Vα?{\mathcal{J}}_{\alpha}^{v_{n}^{*}}(x,c)\to V_{\alpha}?

    • for ergodic cost: x(c,vn)(c)?{\mathscr{E}}_{x}(c,v_{n}^{*})\to{\mathscr{E}}^{*}(c)?

    • for finite horizon cost : 𝒥Tvn(x,c)𝒥T(x,c)?{\mathcal{J}}_{T}^{v_{n}^{*}}(x,c)\to{\mathcal{J}}_{T}^{*}(x,c)?

    • for cost up to an exit time: 𝒥^evn𝒥^e?\hat{{\mathcal{J}}}_{e}^{v_{n}^{*}}\to\hat{{\mathcal{J}}}_{e}^{*}?     (for details, see Section 6)

    as nn\to\infty .

In this article, under a mild set of assumptions we show that the answers to the above mentioned questions are affirmative.

Example 2.1.
  • (i)

    Suppose the noise term is not an (ideal) Brownian motion and that, instead of Eq. 2.1, the state dynamics of the system are governed by the following SDE

    {dX^tn=b(X^tn,Ut)dt+σ(X^tn)dStndStn=b^n(X^tn)dt+σ^n(X^tn)dW^t.\begin{cases}\mathrm{d}\hat{X}_{t}^{n}\,=\,b(\hat{X}_{t}^{n},U_{t})\mathrm{d}t+\upsigma(\hat{X}_{t}^{n})\mathrm{d}S_{t}^{n}\\ \mathrm{d}S_{t}^{n}\,=\,\hat{b}_{n}(\hat{X}_{t}^{n})\mathrm{d}t+\hat{\upsigma}_{n}(\hat{X}_{t}^{n})\mathrm{d}\hat{W}_{t}\,.\end{cases} (2.23)

    Here we are approximating the noise term by an Itô process $\{S_{t}^{n}\}$, given by

    dStn=b^n(X^tn)dt+σ^n(X^tn)dW^t,\mathrm{d}S_{t}^{n}\,=\,\hat{b}_{n}(\hat{X}_{t}^{n})\mathrm{d}t+\hat{\upsigma}_{n}(\hat{X}_{t}^{n})\mathrm{d}\hat{W}_{t}\,, (2.24)

    where b^n()0\hat{b}_{n}(\cdot)\to 0 and σ^n()I\hat{\upsigma}_{n}(\cdot)\to I as nn\to\infty.

  • (ii)

    Suppose that Eq. 2.1 is approximated by Eq. 2.15 where bnb_{n} and σn\upsigma_{n} consist of polynomials of appropriate dimensions which converge pointwise to bb and σ\upsigma (which are already assumed to be continuous) where we also have continuous convergence in control variable ζ\zeta.

  • (iii)

    Consider a Vasicek interest rate model, given by

    drt=θ(μrt)dt+σdWt.\mathrm{d}r_{t}\,=\,\theta(\mu-r_{t})\mathrm{d}t+\sigma\mathrm{d}W_{t}\,.

    This is a mean-reverting process, where $\theta$ is the rate of reversion, $\mu$ is the long-term mean and $\sigma$ is the volatility. The wealth process corresponding to this interest rate model can be described by Eq. 2.1 (see [AS08, Remark 2.1], [KT12], [DJ07]). Since market models are typically incomplete, the model parameters $(\theta,\mu,\sigma)$ are usually learned from market data. This gives rise to the problem of robustness of optimal investment. The same applies to several other interest rate and pricing models [merton1998applications].

  • (iv)

    In the above examples, cnc_{n} can be a regularized version of cc, e.g. by adding, for ϵn>0\epsilon_{n}>0, ϵnζTζ\epsilon_{n}\zeta^{T}\zeta where 𝕌m\mathbb{U}\subset\mathbb{R}^{m}, which then would continuously converge (in control) to cc as ϵn0\epsilon_{n}\to 0.

In the cases above, the approximating kernel conditions in (A5) would apply.
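As a small numerical illustration of items (iii) and (iv), the sketch below checks that parameter estimates converging to the true Vasicek parameters give drift coefficients converging uniformly on a compact set of states, and that the regularised cost differs from $c$ by at most $\epsilon_{n}\max_{\zeta}\zeta^{T}\zeta$. The parameter values, the estimate sequence $\theta_{n}=\theta(1+1/n)$, $\mu_{n}=\mu+1/n$ and the action set $\mathbb{U}=[-1,1]$ are assumptions made only for this example.

```python
import numpy as np

def vasicek_drift(theta, mu):
    """Drift r -> theta * (mu - r) of the Vasicek model."""
    return lambda r: theta * (mu - r)

theta, mu = 0.5, 0.03                 # hypothetical "true" parameters
r = np.linspace(-5.0, 5.0, 1001)      # compact set of states
zeta = np.linspace(-1.0, 1.0, 201)    # compact action set U = [-1, 1]

def drift_error(n):
    # hypothetical estimates theta_n -> theta, mu_n -> mu
    b_n = vasicek_drift(theta * (1 + 1 / n), mu + 1 / n)
    return np.max(np.abs(b_n(r) - vasicek_drift(theta, mu)(r)))

def cost_error(eps):
    # c_n - c = eps * zeta^T zeta, maximised over the action grid
    return np.max(eps * zeta**2)

ns = (1, 10, 100, 1000)
drift_errors = [drift_error(n) for n in ns]
cost_errors = [cost_error(1 / n) for n in ns]
print(drift_errors)
print(cost_errors)
```

Both error sequences shrink as $n$ grows, which is the type of convergence that the conditions in (A5) ask for in these examples.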

Remark 2.1.

If we replace $\sigma(x)$ by $\sigma(x,\zeta)$, then in the relaxed control framework, if $\sigma(\cdot,v(\cdot))$ is Lipschitz continuous for $v\in\mathfrak{U}_{\mathsf{sm}}$, Eq. 2.1 admits a unique strong solution. In general, however, stationary policies $v\in\mathfrak{U}_{\mathsf{sm}}$ are merely measurable functions, and the existence of suitable strong solutions in this setting is not known (see [ABG-book, Remarks 2.3.2], [B05Survey]). Under stationary Markov policies one can still prove the existence of weak solutions, which may not be unique [stroock1997multidimensional], [ABG-book, Remarks 2.3.2] (note, though, that uniqueness is established for $d=1,2$ in [stroock1997multidimensional, p. 192-194] under some conditions). The existence of a suitable strong solution (which is also a strong Markov process) under stationary Markov policies is essential for obtaining stochastic representations of solutions of HJB equations (by applying the Itô-Krylov formula).

Notation:

  • For any set $A\subset\mathds{R}^{d}$, by $\uptau(A)$ we denote the first exit time of the process $\{X_{t}\}$ from $A$, defined by

    τ(A):=inf{t>0:XtA}.\uptau(A)\,:=\,\inf\,\{t>0\,\colon X_{t}\not\in A\}\,.
  • r{\mathscr{B}}_{r} denotes the open ball of radius rr in d\mathds{R}^{d}, centered at the origin, and rc{\mathscr{B}}_{r}^{c} denotes the complement of r{\mathscr{B}}_{r} in d{\mathds{R}^{d}} .

  • $\uptau_{r}$, ${\breve{\uptau}}_{r}$ denote the first exit times from ${\mathscr{B}}_{r}$, ${\mathscr{B}}_{r}^{c}$ respectively, i.e., $\uptau_{r}:=\uptau({\mathscr{B}}_{r})$ and ${\breve{\uptau}}_{r}:=\uptau({\mathscr{B}}^{c}_{r})$.

  • By TrS\operatorname*{Tr}S we denote the trace of a square matrix SS.

  • For any domain 𝒟d\mathcal{D}\subset\mathds{R}^{d}, the space 𝒞k(𝒟){\mathcal{C}}^{k}(\mathcal{D}) (𝒞(𝒟){\mathcal{C}}^{\infty}(\mathcal{D})), k0k\geq 0, denotes the class of all real-valued functions on 𝒟\mathcal{D} whose partial derivatives up to and including order kk (of any order) exist and are continuous.

  • 𝒞ck(𝒟){\mathcal{C}}_{\mathrm{c}}^{k}(\mathcal{D}) denotes the subset of 𝒞k(𝒟){\mathcal{C}}^{k}(\mathcal{D}), 0k0\leq k\leq\infty, consisting of functions that have compact support. This denotes the space of test functions.

  • 𝒞b(d){\mathcal{C}}_{b}({\mathds{R}^{d}}) denotes the class of bounded continuous functions on d{\mathds{R}^{d}} .

  • 𝒞0k(𝒟){\mathcal{C}}^{k}_{0}(\mathcal{D}), denotes the subspace of 𝒞k(𝒟){\mathcal{C}}^{k}(\mathcal{D}), 0k<0\leq k<\infty, consisting of functions that vanish in 𝒟c\mathcal{D}^{c}.

  • 𝒞k,r(𝒟){\mathcal{C}}^{k,r}(\mathcal{D}), denotes the class of functions whose partial derivatives up to order kk are Hölder continuous of order rr.

  • Lp(𝒟){L}^{p}(\mathcal{D}), p[1,)p\in[1,\infty), denotes the Banach space of (equivalence classes of) measurable functions ff satisfying 𝒟|f(x)|pdx<\int_{\mathcal{D}}\lvert f(x)\rvert^{p}\,\mathrm{d}{x}<\infty.

  • 𝒲k,p(𝒟){\mathscr{W}}^{k,p}(\mathcal{D}), k0k\geq 0, p1p\geq 1 denotes the standard Sobolev space of functions on 𝒟\mathcal{D} whose weak derivatives up to order kk are in Lp(𝒟){L}^{p}(\mathcal{D}), equipped with its natural norm (see, [Adams]) .

  • If 𝒳(Q)\mathcal{X}(Q) is a space of real-valued functions on QQ, 𝒳loc(Q)\mathcal{X}_{\mathrm{loc}}(Q) consists of all functions ff such that fφ𝒳(Q)f\varphi\in\mathcal{X}(Q) for every φ𝒞c(Q)\varphi\in{\mathcal{C}}_{\mathrm{c}}^{\infty}(Q). In a similar fashion, we define 𝒲lock,p(𝒟){\mathscr{W}}_{\text{loc}}^{k,p}(\mathcal{D}).

  • For μ>0\mu>0, let eμ(x)=eμ1+|x|2e_{\mu}(x)=e^{-\mu\sqrt{1+\lvert x\rvert^{2}}} , xdx\in{\mathds{R}^{d}} . Then fLp,μ((0,T)×d)f\in{L}^{p,\mu}((0,T)\times{\mathds{R}^{d}}) if feμLp((0,T)×d)fe_{\mu}\in{L}^{p}((0,T)\times{\mathds{R}^{d}}) . Similarly, 𝒲1,2,p,μ((0,T)×d)={fLp,μ((0,T)×d)f,ft,fxi,2fxixjLp,μ((0,T)×d)}{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})=\{f\in{L}^{p,\mu}((0,T)\times{\mathds{R}^{d}})\mid f,\frac{\partial f}{\partial t},\frac{\partial f}{\partial x_{i}},\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}\in{L}^{p,\mu}((0,T)\times{\mathds{R}^{d}})\} with natural norm (see [BL84-book])

    $$\lVert f\rVert_{{\mathscr{W}}^{1,2,p,\mu}}=\Bigl\lVert\frac{\partial f}{\partial t}\Bigr\rVert_{{L}^{p,\mu}((0,T)\times{\mathds{R}^{d}})}+\lVert f\rVert_{{L}^{p,\mu}((0,T)\times{\mathds{R}^{d}})}+\sum_{i}\Bigl\lVert\frac{\partial f}{\partial x_{i}}\Bigr\rVert_{{L}^{p,\mu}((0,T)\times{\mathds{R}^{d}})}+\sum_{i,j}\Bigl\lVert\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}\Bigr\rVert_{{L}^{p,\mu}((0,T)\times{\mathds{R}^{d}})}\,.$$
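For readers who simulate the diffusion, the exit times $\uptau_{r}$ and ${\breve{\uptau}}_{r}$ from the notation list above can be approximated on a time-discretised path. The sketch below assumes the path is sampled on a uniform grid of step `dt`, so the returned value is only a grid approximation of the exit time; the helper names are hypothetical.

```python
import numpy as np

def first_exit_time(path, dt, inside):
    """First grid time t_k > 0 with path[k] outside the set, or inf if none.

    `inside(x)` is the indicator of the set A; the strict t > 0 in the
    definition of tau(A) is mirrored by starting the scan at k = 1.
    """
    for k in range(1, len(path)):
        if not inside(path[k]):
            return k * dt
    return np.inf

def ball(r):
    """Indicator of the open ball B_r centred at the origin."""
    return lambda x: np.linalg.norm(np.atleast_1d(x)) < r

def ball_complement(r):
    """Indicator of the complement B_r^c."""
    return lambda x: np.linalg.norm(np.atleast_1d(x)) >= r

# Deterministic test path X_t = t on [0, 2], sampled with dt = 0.01.
dt = 0.01
path = np.linspace(0.0, 2.0, 201)
tau_r = first_exit_time(path, dt, ball(0.995))
# Reversed path starts outside the ball and moves inward.
tau_breve_r = first_exit_time(path[::-1], dt, ball_complement(0.995))
print(tau_r, tau_breve_r)
```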

3. Analysis of Discounted Cost

In this section we analyze the robustness of optimal controls for discounted cost criterion. From [ABG-book, Theorem 3.5.6], we have the following characterization of the optimal α\alpha-discounted cost VαV_{\alpha} .

Theorem 3.1.

Suppose Assumptions (A1)-(A4) hold. Then the optimal discounted cost VαV_{\alpha} defined in Eq. 2.3 is the unique solution in 𝒞2(d)𝒞b(d){\mathcal{C}}^{2}({\mathds{R}^{d}})\cap{\mathcal{C}}_{b}({\mathds{R}^{d}}) of the HJB equation

minζ𝕌[ζVα(x)+c(x,ζ)]=αVα(x),for all xd.\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}V_{\alpha}(x)+c(x,\zeta)\right]=\alpha V_{\alpha}(x)\,,\quad\text{for all\ }\,\,x\in{\mathds{R}^{d}}\,. (3.1)

Moreover, $v^{*}\in\mathfrak{U}_{\mathsf{sm}}$ is an $\alpha$-discounted optimal control if and only if it is a measurable minimizing selector of Eq. 3.1, i.e.,

b(x,v(x))Vα(x)+c(x,v(x))=minζ𝕌[b(x,ζ)Vα(x)+c(x,ζ)]a.e.xd.b(x,v^{*}(x))\cdot\nabla V_{\alpha}(x)+c(x,v^{*}(x))=\min_{\zeta\in\mathbb{U}}\left[b(x,\zeta)\cdot\nabla V_{\alpha}(x)+c(x,\zeta)\right]\quad\text{a.e.}\,\,\,x\in{\mathds{R}^{d}}\,. (3.2)
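To make the notion of a minimizing selector in Eq. 3.2 concrete, the following sketch computes a pointwise minimizer over a finite grid of actions. It assumes a one-dimensional state, a compact action set $\mathbb{U}=[-1,1]$, and a stand-in function for $\nabla V_{\alpha}$ (in practice this would come from a numerical solution of Eq. 3.1); the data `b`, `c` and `grad_V` are hypothetical and chosen so that the minimizer is known in closed form.

```python
import numpy as np

def minimizing_selector(x, grad_V, b, c, controls):
    """Grid argmin of zeta -> b(x, zeta) * grad_V(x) + c(x, zeta), for each x."""
    X, Z = np.meshgrid(x, controls, indexing="ij")
    hamiltonian = b(X, Z) * grad_V(x)[:, None] + c(X, Z)
    return controls[hamiltonian.argmin(axis=1)]

# Hypothetical data (not from the paper).
b = lambda x, z: z                                   # drift b(x, zeta) = zeta
c = lambda x, z: x**2 / (1.0 + x**2) + z**2          # bounded running cost
grad_V = lambda x: np.tanh(x)                        # stand-in for the gradient

x = np.linspace(-3.0, 3.0, 61)
controls = np.linspace(-1.0, 1.0, 201)
v = minimizing_selector(x, grad_V, b, c, controls)

# For this data, zeta * p + zeta^2 is minimised over [-1, 1] at clip(-p/2, -1, 1).
closed_form = np.clip(-grad_V(x) / 2.0, -1.0, 1.0)
selector_error = np.max(np.abs(v - closed_form))
print(selector_error)
```

The grid selector agrees with the closed-form minimizer up to the resolution of the action grid. The second-order term of ${\mathscr{L}}_{\zeta}$ does not depend on $\zeta$, which is why it does not appear in the minimization.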
Remark 3.1.

The assumption that the running cost is Lipschitz continuous in its first argument, uniformly with respect to the second, is used to obtain a ${\mathcal{C}}^{2}({\mathds{R}^{d}})$ solution of the HJB equation Eq. 3.1. Without this uniform Lipschitz assumption, one can still show that the HJB equation admits a solution, now in ${\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})$, $p\geq d+1$, and all the conclusions of Theorem 3.1 still hold. To see this: in view of [GilTru, Theorem 9.15] and the Schauder fixed point theorem, it can be shown that there exists $\phi_{R}\in{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})$ satisfying the Dirichlet problem

minζ𝕌[ζϕR(x)+c(x,ζ)]=αϕR(x),for all xR,withϕR=0onR.\displaystyle\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\phi_{R}(x)+c(x,\zeta)\right]=\alpha\phi_{R}(x)\,,\quad\text{for all\ }\,\,x\in{\mathscr{B}}_{R}\,,\quad\text{with}\quad\phi_{R}=0\,\,\,\text{on}\,\,\,\partial{{\mathscr{B}}_{R}}\,.

Now letting RR\to\infty and following [ABG-book, Theorem 3.5.6] we arrive at the solution.

Hence, one can replace our assumption (A4) by the following (relatively weaker) assumption

  • (A4)′

    The running cost $c$ is bounded (i.e., $\|c\|_{\infty}\leq M$ for some positive constant $M$) and jointly continuous in both variables $(x,\zeta)$.

All the results of this paper also hold if we replace (A4) by (A4)′.

As in Theorem 3.1, following [ABG-book, Theorem 3.5.6], for each approximating model we have the following complete characterization of an optimal policy, which is in the space of stationary Markov policies.

Theorem 3.2.

Suppose (A5)(iii) holds. Then for each nn\in\mathds{N}, there exists a unique solution Vαn𝒞2(d)𝒞b(d)V_{\alpha}^{n}\in{\mathcal{C}}^{2}({\mathds{R}^{d}})\cap{\mathcal{C}}_{b}({\mathds{R}^{d}}) of

minζ𝕌[ζnVαn(x)+cn(x,ζ)]=αVαn(x),for all xd.\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}V_{\alpha}^{n}(x)+c_{n}(x,\zeta)\right]=\alpha V_{\alpha}^{n}(x)\,,\quad\text{for all\ }\,\,x\in{\mathds{R}^{d}}\,. (3.3)

Moreover, we have the following:

  • (i)

    VαnV_{\alpha}^{n} is the optimal discounted cost, i.e.,

    $$V_{\alpha}^{n}(x)=\inf_{U\in\mathfrak{U}}\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\infty}e^{-\alpha s}c_{n}(X_{s}^{n},U_{s})\,\mathrm{d}s\right],\quad x\in{\mathds{R}^{d}}\,,$$
  • (ii)

    $v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ is an $\alpha$-discounted optimal control if and only if it is a measurable minimizing selector of Eq. 3.3, i.e.,

    bn(x,vn(x))Vαn(x)+cn(x,vn(x))=minζ𝕌[bn(x,ζ)Vαn(x)+cn(x,ζ)]a.e.xd.b_{n}(x,v_{n}^{*}(x))\cdot\nabla V_{\alpha}^{n}(x)+c_{n}(x,v_{n}^{*}(x))=\min_{\zeta\in\mathbb{U}}\left[b_{n}(x,\zeta)\cdot\nabla V_{\alpha}^{n}(x)+c_{n}(x,\zeta)\right]\quad\text{a.e.}\,\,\,x\in{\mathds{R}^{d}}\,. (3.4)

In the next theorem, we prove that Vαn(x)V_{\alpha}^{n}(x) converges to Vα(x)V_{\alpha}(x) as nn\to\infty for all xdx\in{\mathds{R}^{d}} . This result will be useful in establishing the robustness of discounted optimal controls.

Theorem 3.3.

Suppose Assumptions (A1)-(A5) hold. Then

limnVαn(x)=Vα(x)for allxd.\lim_{n\to\infty}V_{\alpha}^{n}(x)=V_{\alpha}(x)\quad\text{for all}\,\,x\in{\mathds{R}^{d}}\,. (3.5)
Proof.

From Eq. 3.3 and Eq. 3.4 for any minimizing selector vn𝔘𝗌𝗆v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}}, it follows that

Tr(an(x)2Vαn(x))+bn(x,vn(x))Vαn(x)+cn(x,vn(x))=αVαn(x).\operatorname*{Tr}\bigl{(}a_{n}(x)\nabla^{2}V_{\alpha}^{n}(x)\bigr{)}+b_{n}(x,v_{n}^{*}(x))\cdot\nabla V_{\alpha}^{n}(x)+c_{n}(x,v_{n}^{*}(x))=\alpha V_{\alpha}^{n}(x)\,.

Then using the standard elliptic PDE estimate as in [GilTru, Theorem 9.11], for any pd+1p\geq d+1 and R>0R>0, we deduce that

Vαn(x)𝒲2,p(R)κ1(Vαn(x)Lp(2R)+cn(x,vn(x))Lp(2R)),\lVert V_{\alpha}^{n}(x)\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\,\leq\,\kappa_{1}\bigl{(}\lVert V_{\alpha}^{n}(x)\rVert_{L^{p}({\mathscr{B}}_{2R})}+\lVert c_{n}(x,v_{n}^{*}(x))\rVert_{L^{p}({\mathscr{B}}_{2R})}\bigr{)}\,, (3.6)

where κ1\kappa_{1} is a positive constant which is independent of nn . Since

cn:=sup(x,u)d×𝕌cn(x,u)M,andVαn(x)cnα,\lVert c_{n}\rVert_{\infty}\,:=\,\sup_{(x,u)\in{\mathds{R}^{d}}\times\mathbb{U}}c_{n}(x,u)\leq M,\quad\text{and}\quad V_{\alpha}^{n}(x)\leq\frac{\lVert c_{n}\rVert_{\infty}}{\alpha}\,,

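and $\lVert f\rVert_{L^{p}({\mathscr{B}}_{2R})}\leq\lVert f\rVert_{\infty}\,|{\mathscr{B}}_{2R}|^{\frac{1}{p}}$ for any bounded measurable $f$, from Eq. 3.6 we get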

Vαn(x)𝒲2,p(R)κ1M(|2R|1pα+|2R|1p).\lVert V_{\alpha}^{n}(x)\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\,\leq\,\kappa_{1}M\bigl{(}\frac{|{\mathscr{B}}_{2R}|^{\frac{1}{p}}}{\alpha}+|{\mathscr{B}}_{2R}|^{\frac{1}{p}}\bigr{)}\,. (3.7)

We know that for 1<p<1<p<\infty, the space 𝒲2,p(R){\mathscr{W}}^{2,p}({\mathscr{B}}_{R}) is reflexive and separable, hence, as a corollary of the Banach-Alaoglu theorem, we have that every bounded sequence in 𝒲2,p(R){\mathscr{W}}^{2,p}({\mathscr{B}}_{R}) has a weakly convergent subsequence (see, [HB-book, Theorem 3.18.]). Also, we know that for pd+1p\geq d+1 the space 𝒲2,p(R){\mathscr{W}}^{2,p}({\mathscr{B}}_{R}) is compactly embedded in 𝒞1,β(¯R){\mathcal{C}}^{1,\beta}(\bar{{\mathscr{B}}}_{R}) , where β<1dp\beta<1-\frac{d}{p} (see [ABG-book, Theorem A.2.15 (2b)]), which implies that every weakly convergent sequence in 𝒲2,p(R){\mathscr{W}}^{2,p}({\mathscr{B}}_{R}) will converge strongly in 𝒞1,β(¯R){\mathcal{C}}^{1,\beta}(\bar{{\mathscr{B}}}_{R}) . Thus, in view of estimate Eq. 3.7, by a standard diagonalization argument and the Banach-Alaoglu theorem, we can extract a subsequence {Vαnk}\{V_{\alpha}^{n_{k}}\} such that for some Vα𝒲loc2,p(d)V_{\alpha}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})

{VαnkVαin𝒲loc2,p(d)(weakly)VαnkVαin𝒞loc1,β(d)(strongly).\begin{cases}V_{\alpha}^{n_{k}}\to&V_{\alpha}^{*}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V_{\alpha}^{n_{k}}\to&V_{\alpha}^{*}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} (3.8)

In the following, we will show that Vα=VαV^{*}_{\alpha}=V_{\alpha}. Now, for any compact set KdK\subset{\mathds{R}^{d}}, it is easy to see that

$$\begin{aligned}
&\max_{x\in K}\Bigl|\min_{\zeta\in\mathbb{U}}\{b_{n_{k}}(x,\zeta)\cdot\nabla V_{\alpha}^{n_{k}}(x)+c_{n_{k}}(x,\zeta)\}-\min_{\zeta\in\mathbb{U}}\{b(x,\zeta)\cdot\nabla V_{\alpha}^{*}(x)+c(x,\zeta)\}\Bigr|\\
&\quad\leq\,\max_{x\in K}\max_{\zeta\in\mathbb{U}}\bigl|\{b_{n_{k}}(x,\zeta)\cdot\nabla V_{\alpha}^{n_{k}}(x)+c_{n_{k}}(x,\zeta)\}-\{b(x,\zeta)\cdot\nabla V_{\alpha}^{*}(x)+c(x,\zeta)\}\bigr|\\
&\quad\leq\,\max_{x\in K}\max_{\zeta\in\mathbb{U}}\bigl|b_{n_{k}}(x,\zeta)\cdot\nabla V_{\alpha}^{n_{k}}(x)-b(x,\zeta)\cdot\nabla V_{\alpha}^{*}(x)\bigr|+\max_{x\in K}\max_{\zeta\in\mathbb{U}}\bigl|c_{n_{k}}(x,\zeta)-c(x,\zeta)\bigr|\,.
\end{aligned} \tag{3.9}$$

Since cn(x,)c(x,)c_{n}(x,\cdot)\to c(x,\cdot), bn(x,)b(x,)b_{n}(x,\cdot)\to b(x,\cdot) continuously on compact set 𝕌\mathbb{U} and VαnkVαV_{\alpha}^{n_{k}}\to V_{\alpha}^{*} in 𝒞loc1,β(d),{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\,, for any compact set KdK\subset{\mathds{R}^{d}}, as kk\to\infty we deduce that

$$\max_{x\in K}\Bigl|\min_{\zeta\in\mathbb{U}}\{b_{n_{k}}(x,\zeta)\cdot\nabla V_{\alpha}^{n_{k}}(x)+c_{n_{k}}(x,\zeta)\}-\min_{\zeta\in\mathbb{U}}\{b(x,\zeta)\cdot\nabla V_{\alpha}^{*}(x)+c(x,\zeta)\}\Bigr|\to 0\,. \tag{3.10}$$

Thus, multiplying by a test function ϕ𝒞c(d)\phi\in{\mathcal{C}}_{c}^{\infty}({\mathds{R}^{d}}), from Eq. 3.3, we obtain

dTr(ank(x)2Vαnk(x))ϕ(x)dx+dminζ𝕌{bnk(x,ζ)Vαnk(x)+cnk(x,ζ)}ϕ(x)dx=αdVαnk(x)ϕ(x)dx.\int_{{\mathds{R}^{d}}}\operatorname*{Tr}\bigl{(}a_{n_{k}}(x)\nabla^{2}V_{\alpha}^{n_{k}}(x)\bigr{)}\phi(x)\mathrm{d}x+\int_{{\mathds{R}^{d}}}\min_{\zeta\in\mathbb{U}}\{b_{n_{k}}(x,\zeta)\cdot\nabla V_{\alpha}^{n_{k}}(x)+c_{n_{k}}(x,\zeta)\}\phi(x)\mathrm{d}x=\alpha\int_{{\mathds{R}^{d}}}V_{\alpha}^{n_{k}}(x)\phi(x)\mathrm{d}x\,.

In view of Eq. 3.8 and Eq. 3.10, letting kk\to\infty it follows that

dTr(a(x)2Vα(x))ϕ(x)dx+dminζ𝕌{b(x,ζ)Vα(x)+c(x,ζ)}ϕ(x)dx=αdVα(x)ϕ(x)dx.\int_{{\mathds{R}^{d}}}\operatorname*{Tr}\bigl{(}a(x)\nabla^{2}V_{\alpha}^{*}(x)\bigr{)}\phi(x)\mathrm{d}x+\int_{{\mathds{R}^{d}}}\min_{\zeta\in\mathbb{U}}\{b(x,\zeta)\cdot\nabla V_{\alpha}^{*}(x)+c(x,\zeta)\}\phi(x)\mathrm{d}x=\alpha\int_{{\mathds{R}^{d}}}V_{\alpha}^{*}(x)\phi(x)\mathrm{d}x\,. (3.11)

Since $\phi\in{\mathcal{C}}_{c}^{\infty}({\mathds{R}^{d}})$ is arbitrary and $V_{\alpha}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})$, from Eq. 3.11 we deduce that

minζ𝕌[ζVα(x)+c(x,ζ)]=αVα(x),a.e. xd.\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}V_{\alpha}^{*}(x)+c(x,\zeta)\right]=\alpha V_{\alpha}^{*}(x)\,,\quad\text{a.e.\ }\,\,x\in{\mathds{R}^{d}}\,. (3.12)

Let $\tilde{v}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ be a minimizing selector of Eq. 3.12 and let $\tilde{X}$ be the solution of the SDE Eq. 2.1 corresponding to $\tilde{v}^{*}$. Then, applying the Itô-Krylov formula, we obtain the following

$$\begin{aligned}
&\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[e^{-\alpha T}V_{\alpha}^{*}(\tilde{X}_{T})\right]-V_{\alpha}^{*}(x)\\
&\quad=\,\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[\int_{0}^{T}e^{-\alpha s}\bigl\{\operatorname*{Tr}\bigl(a(\tilde{X}_{s})\nabla^{2}V_{\alpha}^{*}(\tilde{X}_{s})\bigr)+b(\tilde{X}_{s},\tilde{v}^{*}(\tilde{X}_{s}))\cdot\nabla V_{\alpha}^{*}(\tilde{X}_{s})-\alpha V_{\alpha}^{*}(\tilde{X}_{s})\bigr\}\,\mathrm{d}{s}\right]\,.
\end{aligned}$$

Hence, using Eq. 3.12, we deduce that

𝔼xv~[eαTVα(X~T)]Vα(x)=𝔼xv~[0Teαsc(X~s,v~(X~s))ds].\displaystyle\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[e^{-\alpha T}V_{\alpha}^{*}(\tilde{X}_{T})\right]-V_{\alpha}^{*}(x)\,=\,-\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[\int_{0}^{T}e^{-\alpha s}c(\tilde{X}_{s},\tilde{v}^{*}(\tilde{X}_{s}))\mathrm{d}{s}\right]\,. (3.13)

Since VαV_{\alpha}^{*} is bounded and

𝔼xv~[eαTVα(X~T)]=eαT𝔼xv~[Vα(X~T)],\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[e^{-\alpha T}V_{\alpha}^{*}(\tilde{X}_{T})\right]=e^{-\alpha T}\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[V_{\alpha}^{*}(\tilde{X}_{T})\right],

letting TT\to\infty, it is easy to see that

limT𝔼xv~[eαTVα(X~T)]=0.\lim_{T\to\infty}\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[e^{-\alpha T}V_{\alpha}^{*}(\tilde{X}_{T})\right]=0\,.

Now, letting $T\to\infty$, by the monotone convergence theorem, from Eq. 3.13 we obtain

Vα(x)=𝔼xv~[0eαsc(X~s,v~(X~s))ds].\displaystyle V_{\alpha}^{*}(x)\,=\,\operatorname{\mathbb{E}}_{x}^{\tilde{v}^{*}}\left[\int_{0}^{\infty}e^{-\alpha s}c(\tilde{X}_{s},\tilde{v}^{*}(\tilde{X}_{s}))\mathrm{d}{s}\right]\,. (3.14)

Again, by a similar argument, applying the Itô-Krylov formula and using Eq. 3.12, for any $U\in\mathfrak{U}$ we have

Vα(x)𝔼xU[0eαsc(X~s,Us)ds].\displaystyle V_{\alpha}^{*}(x)\,\leq\,\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\infty}e^{-\alpha s}c(\tilde{X}_{s},U_{s})\mathrm{d}{s}\right]\,.

This implies

Vα(x)infU𝔘𝔼xU[0eαsc(X~s,Us)ds].\displaystyle V_{\alpha}^{*}(x)\,\leq\,\inf_{U\in\mathfrak{U}}\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\infty}e^{-\alpha s}c(\tilde{X}_{s},U_{s})\mathrm{d}{s}\right]\,. (3.15)

Thus, from Eq. 3.14 and Eq. 3.15, we deduce that

Vα(x)=infU𝔘𝔼xU[0eαsc(X~s,Us)ds].\displaystyle V_{\alpha}^{*}(x)\,=\,\inf_{U\in\mathfrak{U}}\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\infty}e^{-\alpha s}c(\tilde{X}_{s},U_{s})\mathrm{d}{s}\right]\,. (3.16)

Since both Vα,VαV_{\alpha},V_{\alpha}^{*} are continuous functions on d{\mathds{R}^{d}}, from Eq. 2.3 and Eq. 3.16, it follows that Vα(x)=Vα(x)V_{\alpha}(x)=V_{\alpha}^{*}(x) for all xdx\in{\mathds{R}^{d}}. This completes the proof. ∎
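The continuity statement of Theorem 3.3 can be checked by hand in a degenerate special case where everything is explicit. The sketch below assumes $d=1$, a singleton action set (so there is no optimization), constant coefficients $b$ and $\upsigma$, the running cost $c(x)=1+\cos x$, and the convention $a=\frac{1}{2}\upsigma^{2}$ for the second-order coefficient; none of these choices come from the paper. Under these assumptions $V_{\alpha}(x)=\frac{1}{\alpha}+\mathrm{Re}\bigl[e^{ix}/(\alpha+\frac{1}{2}\upsigma^{2}-ib)\bigr]$, and the perturbed models use $b_{n}=b+1/n$, $\upsigma_{n}=\upsigma(1+1/n)$.

```python
import numpy as np

def value(x, b, sigma, alpha):
    """Closed form of E int_0^inf e^{-alpha s} (1 + cos X_s) ds
    for dX = b dt + sigma dW, X_0 = x (constant coefficients)."""
    A = 1.0 / (alpha + 0.5 * sigma**2 - 1j * b)
    return 1.0 / alpha + np.real(A * np.exp(1j * x))

alpha, b, sigma = 0.5, 0.3, 1.0
x = np.linspace(-4.0, 4.0, 401)
V = value(x, b, sigma, alpha)

# Finite-difference check of 0.5 sigma^2 V'' + b V' + c = alpha V.
h = x[1] - x[0]
V2 = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / h**2
V1 = (V[2:] - V[:-2]) / (2.0 * h)
residual = (0.5 * sigma**2 * V2 + b * V1
            + (1.0 + np.cos(x[1:-1])) - alpha * V[1:-1])
max_residual = np.max(np.abs(residual))

# Sup-distance between the perturbed and the true value functions.
errors = [np.max(np.abs(value(x, b + 1 / n, sigma * (1 + 1 / n), alpha) - V))
          for n in (1, 10, 100, 1000)]
print(max_residual)
print(errors)
```

The sup-distance between $V_{\alpha}^{n}$ and $V_{\alpha}$ on the grid shrinks as $n$ grows, in line with Eq. 3.5.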

Let X^n\hat{X}^{n} be the solution of the SDE Eq. 2.1 corresponding to vnv_{n}^{*}. Then we have

$${\mathcal{J}}_{\alpha}^{v_{n}^{*}}(x,c)\,=\,\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[\int_{0}^{\infty}e^{-\alpha s}c(\hat{X}^{n}_{s},v_{n}^{*}(\hat{X}^{n}_{s}))\,\mathrm{d}s\right],\quad x\in{\mathds{R}^{d}}\,. \tag{3.17}$$

Next we prove the robustness result, i.e., we prove that 𝒥αvn(x,c)𝒥αv(x,c){\mathcal{J}}_{\alpha}^{v_{n}^{*}}(x,c)\to{\mathcal{J}}_{\alpha}^{v^{*}}(x,c) as nn\to\infty , where vnv_{n}^{*} is an optimal control of the approximated model and vv^{*} is an optimal control of the true model. As in [KY-20] we will use the continuity result above as an intermediate step.

Theorem 3.4.

Suppose Assumptions (A1)-(A5) hold. Then

limn𝒥αvn(x,c)=𝒥αv(x,c)for allxd.\lim_{n\to\infty}{\mathcal{J}}_{\alpha}^{v_{n}^{*}}(x,c)={\mathcal{J}}_{\alpha}^{v^{*}}(x,c)\quad\text{for all}\,\,x\in{\mathds{R}^{d}}\,. (3.18)
Proof.

Following the argument as in [ABG-book, Theorem 3.5.6], one can show that for each vn𝔘𝗌𝗆v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}}, there exists Vαn,𝒲loc2,p(d)𝒞b(d)V_{\alpha}^{n,*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathcal{C}}_{b}({\mathds{R}^{d}}) satisfying

Tr(a(x)2Vαn,(x))+b(x,vn(x))Vαn,(x)+c(x,vn(x))=αVαn,(x).\operatorname*{Tr}\bigl{(}a(x)\nabla^{2}V_{\alpha}^{n,*}(x)\bigr{)}+b(x,v_{n}^{*}(x))\cdot\nabla V_{\alpha}^{n,*}(x)+c(x,v_{n}^{*}(x))=\alpha V_{\alpha}^{n,*}(x)\,. (3.19)

Applying the Itô-Krylov formula, we deduce that

$$\begin{aligned}
&\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[e^{-\alpha T}V_{\alpha}^{n,*}(\hat{X}^{n}_{T})\right]-V_{\alpha}^{n,*}(x)\\
&\quad=\,\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[\int_{0}^{T}e^{-\alpha s}\bigl\{\operatorname*{Tr}\bigl(a(\hat{X}^{n}_{s})\nabla^{2}V_{\alpha}^{n,*}(\hat{X}^{n}_{s})\bigr)+b(\hat{X}^{n}_{s},v_{n}^{*}(\hat{X}^{n}_{s}))\cdot\nabla V_{\alpha}^{n,*}(\hat{X}^{n}_{s})-\alpha V_{\alpha}^{n,*}(\hat{X}^{n}_{s})\bigr\}\,\mathrm{d}{s}\right]\,.
\end{aligned}$$

Now using Eq. 3.19, it follows that

𝔼xvn[eαTVαn,(X^Tn)]Vαn,(x)=𝔼xvn[0Teαsc(X^sn,vn(X^sn))ds].\displaystyle\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[e^{-\alpha T}V_{\alpha}^{n,*}(\hat{X}^{n}_{T})\right]-V_{\alpha}^{n,*}(x)\,=\,-\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[\int_{0}^{T}e^{-\alpha s}c(\hat{X}^{n}_{s},v_{n}^{*}(\hat{X}^{n}_{s}))\mathrm{d}{s}\right]\,. (3.20)

Since Vαn,V_{\alpha}^{n,*} is bounded and

𝔼xvn[eαTVαn,(X^Tn)]=eαT𝔼xvn[Vαn,(X^Tn)],\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[e^{-\alpha T}V_{\alpha}^{n,*}(\hat{X}^{n}_{T})\right]=e^{-\alpha T}\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[V_{\alpha}^{n,*}(\hat{X}^{n}_{T})\right],

letting TT\to\infty we deduce that

limT𝔼xvn[eαTVαn,(X^Tn)]=0.\lim_{T\to\infty}\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[e^{-\alpha T}V_{\alpha}^{n,*}(\hat{X}^{n}_{T})\right]=0\,.

Thus, from Eq. 3.20, letting $T\to\infty$, by the monotone convergence theorem we obtain

Vαn,(x)=𝔼xvn[0eαsc(X^sn,vn(X^sn))ds]=𝒥αvn(x,c).\displaystyle V_{\alpha}^{n,*}(x)\,=\,\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[\int_{0}^{\infty}e^{-\alpha s}c(\hat{X}^{n}_{s},v_{n}^{*}(\hat{X}^{n}_{s}))\mathrm{d}{s}\right]\,=\,{\mathcal{J}}_{\alpha}^{v_{n}^{*}}(x,c)\,. (3.21)

This implies that $V_{\alpha}^{n,*}\leq\frac{\lVert c\rVert_{\infty}}{\alpha}$. Thus, as in Theorem 3.3 (see Eq. 3.6 and Eq. 3.7), by the standard Sobolev estimate, for any $R>0$ we get $\lVert V_{\alpha}^{n,*}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\kappa_{2}$ for some positive constant $\kappa_{2}$ independent of $n$. Hence, by the Banach-Alaoglu theorem and a standard diagonalization argument (as in Eq. 3.8), there exists $\hat{V}_{\alpha}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathcal{C}}_{b}({\mathds{R}^{d}})$ such that along some subsequence $\{V_{\alpha}^{n_{k},*}\}$

{Vαnk,V^αin𝒲loc2,p(d)(weakly)Vαnk,V^αin𝒞loc1,β(d)(strongly).\begin{cases}V_{\alpha}^{n_{k},*}\to&\hat{V}_{\alpha}^{*}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V_{\alpha}^{n_{k},*}\to&\hat{V}_{\alpha}^{*}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} (3.22)

Since the space of stationary Markov strategies $\mathfrak{U}_{\mathsf{sm}}$ is compact, along a further subsequence (denoted, without loss of generality, by the same sequence) we have $v_{n_{k}}^{*}\to\hat{v}^{*}$ in $\mathfrak{U}_{\mathsf{sm}}$. It is easy to see that

b(x,vnk(x))Vαnk,(x)b(x,v^(x))V^α(x)=\displaystyle b(x,v_{n_{k}}^{*}(x))\cdot\nabla V_{\alpha}^{n_{k},*}(x)-b(x,\hat{v}^{*}(x))\cdot\nabla\hat{V}_{\alpha}^{*}(x)= b(x,vnk(x))(Vαnk,V^α)(x)\displaystyle b(x,v_{n_{k}}^{*}(x))\cdot\nabla\left(V_{\alpha}^{n_{k},*}-\hat{V}_{\alpha}^{*}\right)(x)
+(b(x,vnk(x))b(x,v^(x)))V^α(x).\displaystyle+\left(b(x,v_{n_{k}}^{*}(x))-b(x,\hat{v}^{*}(x))\right)\cdot\nabla\hat{V}_{\alpha}^{*}(x)\,.

Since Vαnk,V^αV_{\alpha}^{n_{k},*}\to\hat{V}_{\alpha}^{*} in 𝒞loc1,β(d),{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\,, on any compact set b(x,vnk(x))(Vαnk,V^α)(x)0b(x,v_{n_{k}}^{*}(x))\cdot\nabla\left(V_{\alpha}^{n_{k},*}-\hat{V}_{\alpha}^{*}\right)(x)\to 0 strongly and by the topology of 𝔘𝗌𝗆\mathfrak{U}_{\mathsf{sm}}, we have (b(x,vnk(x))b(x,v^(x)))V^α(x)0\left(b(x,v_{n_{k}}^{*}(x))-b(x,\hat{v}^{*}(x))\right)\cdot\nabla\hat{V}_{\alpha}^{*}(x)\to 0 weakly. Thus, in view of the topology of 𝔘𝗌𝗆\mathfrak{U}_{\mathsf{sm}}, and since Vαnk,V^αV_{\alpha}^{n_{k},*}\to\hat{V}_{\alpha}^{*} in 𝒞loc1,β(d),{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\,, as kk\to\infty we obtain

b(x,vnk(x))Vαnk,(x)+c(x,vnk(x))b(x,v^(x))V^α(x)+c(x,v^(x))weakly.b(x,v_{n_{k}}^{*}(x))\cdot\nabla V_{\alpha}^{n_{k},*}(x)+c(x,v_{n_{k}}^{*}(x))\to b(x,\hat{v}^{*}(x))\cdot\nabla\hat{V}_{\alpha}^{*}(x)+c(x,\hat{v}^{*}(x))\quad\text{weakly}\,. (3.23)

Now, multiplying by a test function ϕ𝒞c(d)\phi\in{\mathcal{C}}_{c}^{\infty}({\mathds{R}^{d}}), from Eq. 3.19, it follows that

dTr(a(x)2Vαnk,(x))ϕ(x)dx+d{b(x,vnk(x))Vαnk,(x)+\displaystyle\int_{{\mathds{R}^{d}}}\operatorname*{Tr}\bigl{(}a(x)\nabla^{2}V_{\alpha}^{n_{k},*}(x)\bigr{)}\phi(x)\mathrm{d}x+\int_{{\mathds{R}^{d}}}\{b(x,v_{n_{k}}^{*}(x))\cdot\nabla V_{\alpha}^{n_{k},*}(x)+ c(x,vnk(x))}ϕ(x)dx\displaystyle c(x,v_{n_{k}}^{*}(x))\}\phi(x)\mathrm{d}x
=αdVαnk,(x)ϕ(x)dx.\displaystyle=\alpha\int_{{\mathds{R}^{d}}}V_{\alpha}^{n_{k},*}(x)\phi(x)\mathrm{d}x\,.

Hence, using Eq. 3.22, Eq. 3.23, and letting kk\to\infty we obtain

dTr(a(x)2V^α(x))ϕ(x)dx+d{b(x,v^(x))V^α(x)+c(x,v^(x))}ϕ(x)dx=αdV^α(x)ϕ(x)dx.\int_{{\mathds{R}^{d}}}\operatorname*{Tr}\bigl{(}a(x)\nabla^{2}\hat{V}_{\alpha}^{*}(x)\bigr{)}\phi(x)\mathrm{d}x+\int_{{\mathds{R}^{d}}}\{b(x,\hat{v}^{*}(x))\cdot\nabla\hat{V}_{\alpha}^{*}(x)+c(x,\hat{v}^{*}(x))\}\phi(x)\mathrm{d}x=\alpha\int_{{\mathds{R}^{d}}}\hat{V}_{\alpha}^{*}(x)\phi(x)\mathrm{d}x\,. (3.24)

Since ϕ𝒞c(d)\phi\in{\mathcal{C}}_{c}^{\infty}({\mathds{R}^{d}}) is arbitrary and V^α𝒲loc2,p(d)\hat{V}_{\alpha}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}}) from Eq. 3.24, we deduce that the function V^α𝒲loc2,p(d)𝒞b(d)\hat{V}_{\alpha}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathcal{C}}_{b}({\mathds{R}^{d}}) satisfies

Tr(a(x)2V^α(x))+b(x,v^(x))V^α(x)+c(x,v^(x))=αV^α(x).\operatorname*{Tr}\bigl{(}a(x)\nabla^{2}\hat{V}_{\alpha}^{*}(x)\bigr{)}+b(x,\hat{v}^{*}(x))\cdot\nabla\hat{V}_{\alpha}^{*}(x)+c(x,\hat{v}^{*}(x))=\alpha\hat{V}_{\alpha}^{*}(x)\,. (3.25)

As earlier, applying the Itô-Krylov formula and using Eq. 3.25, it follows that

V^α(x)=𝔼xv^[0eαsc(X^s,v^(X^s))ds],\displaystyle\hat{V}_{\alpha}^{*}(x)\,=\,\operatorname{\mathbb{E}}_{x}^{\hat{v}^{*}}\left[\int_{0}^{\infty}e^{-\alpha s}c(\hat{X}_{s},\hat{v}^{*}(\hat{X}_{s}))\mathrm{d}{s}\right]\,, (3.26)

where X^\hat{X} is the solution of SDE Eq. 2.1 corresponding to v^\hat{v}^{*} .

Now, we have

|𝒥αvnk(x,c)𝒥αv(x,c)||𝒥αvnk(x,c)Vαnk(x)|+|Vαnk(x)𝒥αv(x,c)|.|{\mathcal{J}}_{\alpha}^{v_{n_{k}}^{*}}(x,c)-{\mathcal{J}}_{\alpha}^{v^{*}}(x,c)|\leq|{\mathcal{J}}_{\alpha}^{v_{n_{k}}^{*}}(x,c)-V_{\alpha}^{n_{k}}(x)|+|V_{\alpha}^{n_{k}}(x)-{\mathcal{J}}_{\alpha}^{v^{*}}(x,c)|\,. (3.27)

From Theorem 3.1, we know that 𝒥αv(x,c)=Vα(x){\mathcal{J}}_{\alpha}^{v^{*}}(x,c)=V_{\alpha}(x). Thus from Theorem 3.3, we deduce that |Vαnk(x)𝒥αv(x,c)|0|V_{\alpha}^{n_{k}}(x)-{\mathcal{J}}_{\alpha}^{v^{*}}(x,c)|\to 0 as kk\to\infty . To complete the proof we have to show that |𝒥αvnk(x,c)Vαnk(x)|0|{\mathcal{J}}_{\alpha}^{v_{n_{k}}^{*}}(x,c)-V_{\alpha}^{n_{k}}(x)|\to 0 as kk\to\infty . Also, from Theorem 3.2 we know that vn𝔘𝗌𝗆v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}} is a minimizing selector of the HJB equation Eq. 3.3 of the approximated model, thus it follows that

$$\begin{aligned}
\alpha V_{\alpha}^{n_{k}}(x)&\,=\,\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n_{k}}V_{\alpha}^{n_{k}}(x)+c_{n_{k}}(x,\zeta)\right]\\
&\,=\,\operatorname*{Tr}\bigl(a_{n_{k}}(x)\nabla^{2}V_{\alpha}^{n_{k}}(x)\bigr)+b_{n_{k}}(x,v_{n_{k}}^{*}(x))\cdot\nabla V_{\alpha}^{n_{k}}(x)+c_{n_{k}}(x,v_{n_{k}}^{*}(x))\,,\quad\text{a.e. }x\in{\mathds{R}^{d}}\,.
\end{aligned} \tag{3.28}$$

Hence, by standard Sobolev estimate (as in Theorem 3.3), for each R>0R>0 we have Vαnk𝒲2,p(R)κ3\lVert V_{\alpha}^{n_{k}}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\kappa_{3} , for some positive constant κ3\kappa_{3} independent of kk. Thus, we can extract a further sub-sequence (without loss of generality denoting by same sequence) such that for some V~α𝒲loc2,p(d)𝒞b(d)\tilde{V}_{\alpha}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathcal{C}}_{b}({\mathds{R}^{d}}) (as in Eq. 3.8) we get

{VαnkV~αin𝒲loc2,p(d)(weakly)VαnkV~αin𝒞loc1,β(d)(strongly).\begin{cases}V_{\alpha}^{n_{k}}\to&\tilde{V}_{\alpha}^{*}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V_{\alpha}^{n_{k}}\to&\tilde{V}_{\alpha}^{*}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} (3.29)

Following similar steps as in Theorem 3.3, multiplying by a test function and letting $k\to\infty$, from Section 3 we deduce that $\tilde{V}_{\alpha}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathcal{C}}_{b}({\mathds{R}^{d}})$ satisfies

\[
\begin{aligned}
\alpha\tilde{V}_{\alpha}^{*}(x) &\,=\,\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\tilde{V}_{\alpha}^{*}(x)+c(x,\zeta)\right]\\
&\,=\,\operatorname*{Tr}\bigl(a(x)\nabla^{2}\tilde{V}_{\alpha}^{*}(x)\bigr)+b(x,\hat{v}^{*}(x))\cdot\nabla\tilde{V}_{\alpha}^{*}(x)+c(x,\hat{v}^{*}(x))\,.
\end{aligned}
\tag{3.30}
\]

From the continuity results (Theorem 3.3), it is easy to see that $\tilde{V}_{\alpha}^{*}(x)={\mathcal{J}}_{\alpha}^{v^{*}}(x,c)$ for all $x\in{\mathds{R}^{d}}$. Moreover, applying the Itô-Krylov formula and using LABEL:{ETC1.4I} we obtain

\[
\tilde{V}_{\alpha}^{*}(x)\,=\,\operatorname{\mathbb{E}}_{x}^{\hat{v}^{*}}\left[\int_{0}^{\infty}e^{-\alpha s}c(\hat{X}_{s},\hat{v}^{*}(\hat{X}_{s}))\,\mathrm{d}s\right]\,. \tag{3.31}
\]

Since both $\hat{V}_{\alpha}^{*}$ and $\tilde{V}_{\alpha}^{*}$ are continuous, from Eq. 3.26 and Eq. 3.31 it follows that both ${\mathcal{J}}_{\alpha}^{v_{n_{k}}^{*}}(x,c)$ (which equals $V_{\alpha}^{n_{k},*}(x)$) and $V_{\alpha}^{n_{k}}(x)$ converge to the same limit. This completes the proof. ∎

Remark 3.2.

Note that in the above, we also indirectly showed the continuity of the value function in the control policy (under the topology defined); uniqueness of the solution to the PDE above implies continuity. This result, while it can be obtained from the analysis of Borkar [Bor89] (in a slightly more restrictive setup), is obtained here directly via a careful optimality analysis and will have important consequences for numerical solutions and approximation results for both discounted and average cost optimality. This is studied in detail, with implications, in [YukselPradhan].

4. Analysis of Ergodic Cost

In this section we study the robustness problem for the ergodic cost criterion. The associated optimal control problem for this cost criterion has been studied extensively in the literature, see e.g., [ABG-book].

For this cost evaluation criterion we will study the robustness problem under two sets of assumptions: the first is the so-called near-monotonicity condition on the running cost, which discourages instability, and the second is Lyapunov stability.

4.1. Analysis under a near-monotonicity assumption

Here we assume that the cost function $c$ satisfies the following near-monotonicity condition:

  • (A6)

    It holds that

    \[
    \liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c(x,\zeta)>{\mathscr{E}}^{*}(c)\,. \tag{4.1}
    \]

This condition penalizes the escape of probability mass to infinity. Since our running cost $c$ is bounded, it is easy to see that ${\mathscr{E}}^{*}(c)\leq\lVert c\rVert_{\infty}$. Recall that a stationary policy $v\in\mathfrak{U}_{\mathsf{sm}}$ is said to be stable if the associated diffusion process is positive recurrent. It is known that under Eq. 4.1, an optimal control exists in the space of stable stationary Markov controls (see [ABG-book, Theorem 3.4.5]).
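As an illustration of Eq. 4.1 (a hypothetical example, not taken from the source), consider a bounded running cost that does not depend on the control and saturates at infinity. Under the additional assumption that at least one stable control $v$ exists, with invariant probability measure $\eta_{v}$, such a cost is near-monotone; whether it also meets the regularity required in (A1)-(A4) has to be checked separately.

```latex
% Hypothetical example: a bounded, control-independent running cost.
\[
c(x,\zeta)\,=\,\frac{\lVert x\rVert^{2}}{1+\lVert x\rVert^{2}}\,,
\qquad
\lVert c\rVert_{\infty}=1\,,
\qquad
\liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c(x,\zeta)=1\,.
\]
% Assumption: v is a stable stationary Markov control with invariant
% probability measure \eta_v. Since c < 1 everywhere and \eta_v has total mass 1,
\[
{\mathscr{E}}^{*}(c)\,\leq\,\int_{{\mathds{R}^{d}}}c(x,v(x))\,\eta_{v}(\mathrm{d}x)\,<\,1
\,=\,\liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c(x,\zeta)\,.
\]
```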

Now from [ABG-book, Theorem 3.6.10], we have the following complete characterization of ergodic optimal control.

Theorem 4.1.

Suppose that Assumptions (A1)-(A4) and (A6) hold. Then there exists a unique solution pair $(V,\rho)\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, with $V(0)=0$, $\inf_{{\mathds{R}^{d}}}V>-\infty$ and $\rho\leq{\mathscr{E}}^{*}(c)$, satisfying

\[
\rho=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}V(x)+c(x,\zeta)\right]\,. \tag{4.2}
\]

Moreover, we have

  • (i)

    $\rho={\mathscr{E}}^{*}(c)$

  • (ii)

    a stationary Markov control $v^{*}\in\mathfrak{U}_{\mathsf{sm}}$ is an optimal control if and only if it is a minimizing selector of Eq. 4.2, i.e., if and only if it satisfies

    \[
    \min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}V(x)+c(x,\zeta)\right]\,=\,\operatorname*{Tr}\bigl(a(x)\nabla^{2}V(x)\bigr)+b(x,v^{*}(x))\cdot\nabla V(x)+c(x,v^{*}(x))\,,\quad\text{a.e.}\,\,x\in{\mathds{R}^{d}}\,. \tag{4.3}
    \]

We assume that for the approximated model, for each $n\in\mathds{N}$ the running cost function $c_{n}$ satisfies the near-monotonicity condition Eq. 4.1 relative to $\max_{n\in\mathds{N}}{\mathscr{E}}^{n*}(c_{n})$, i.e.,

\[
\max_{n\in\mathds{N}}{\mathscr{E}}^{n*}(c_{n})<\liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c_{n}(x,\zeta)\,. \tag{4.4}
\]

Thus, in view of [ABG-book, Theorem 3.6.10], for the approximating model, for each $n\in\mathds{N}$ we have the following theorem.

Theorem 4.2.

Suppose that Assumption (A5)(iii) holds. Then there exists a unique solution pair $(V_{n},\rho_{n})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, with $V_{n}(0)=0$, $\inf_{{\mathds{R}^{d}}}V_{n}>-\infty$ and $\rho_{n}\leq{\mathscr{E}}^{n*}(c_{n})$, satisfying

\[
\rho_{n}=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}V_{n}(x)+c_{n}(x,\zeta)\right]\,. \tag{4.5}
\]

Moreover, we have

  • (i)

    $\rho_{n}={\mathscr{E}}^{n*}(c_{n})$

  • (ii)

    a stationary Markov control $v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ is an optimal control if and only if it is a minimizing selector of Eq. 4.5, i.e., if and only if it satisfies

    \[
    \min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}V_{n}(x)+c_{n}(x,\zeta)\right]\,=\,\operatorname*{Tr}\bigl(a_{n}(x)\nabla^{2}V_{n}(x)\bigr)+b_{n}(x,v_{n}^{*}(x))\cdot\nabla V_{n}(x)+c_{n}(x,v_{n}^{*}(x))\,,\quad\text{a.e.}\,\,x\in{\mathds{R}^{d}}\,. \tag{4.6}
    \]

In view of the near-monotonicity assumption Eq. 4.4, for any minimizing selector $v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ of Eq. 4.5, it is easy to see that outside a compact set ${\mathscr{L}}_{v_{n}^{*}}^{n}V_{n}(x)\leq-\epsilon$ for some $\epsilon>0$. Since $V_{n}$ is bounded from below, [ABG-book, Theorem 2.6.10(f)] asserts that $v_{n}^{*}$ is stable. Hence, we deduce that the optimal policies of the approximating models are stable. However, note that the compact set mentioned above may not be applicable uniformly for all $n$, which turns out to be a consequential issue.
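The drift inequality claimed above can be spelled out as follows. This is a sketch in which $\epsilon_{n}$ and $R_{n}$ are illustrative constants (not named in the source) that may depend on $n$.

```latex
% By Eq. 4.4 and \rho_n = \mathscr{E}^{n*}(c_n), there exist \epsilon_n > 0 and R_n > 0
% (illustrative constants, possibly depending on n) such that
\[
\inf_{\zeta\in\mathbb{U}}c_{n}(x,\zeta)\,\geq\,\rho_{n}+\epsilon_{n}
\qquad\text{for all }\lVert x\rVert\geq R_{n}\,.
\]
% Rewriting Eq. 4.5 along the minimizing selector v_n^* (see Eq. 4.6) gives,
% for a.e. x with \lVert x\rVert \geq R_n,
\[
{\mathscr{L}}_{v_{n}^{*}}^{n}V_{n}(x)
\,=\,\rho_{n}-c_{n}(x,v_{n}^{*}(x))
\,\leq\,\rho_{n}-\inf_{\zeta\in\mathbb{U}}c_{n}(x,\zeta)
\,\leq\,-\epsilon_{n}\,.
\]
```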

Now we want to show that as $n\to\infty$ the optimal value of the approximated model ${\mathscr{E}}^{n*}(c_{n})$ converges to the optimal value of the true model ${\mathscr{E}}^{*}(c)$. Under the near-monotonicity assumption this result may not be true in general, due to the restricted uniqueness/non-uniqueness of the solution of the associated HJB equation (see e.g., [AA12], [AA13]). As a result, in [AA12], [M97] the authors have shown that for the optimal control problem the policy iteration algorithm (PIA) may fail to converge to the optimal value. In order to ensure convergence of the PIA, in addition to the near-monotonicity assumption a blanket Lyapunov condition is assumed in [M97].

Accordingly, in this article, to guarantee the convergence ${\mathscr{E}}^{n*}(c_{n})\to{\mathscr{E}}^{*}(c)$, we will assume that

\[
\Theta\,:=\,\{\eta_{v_{n}^{*}}^{n}:n\in\mathds{N}\}
\]

is tight, where $\eta_{v_{n}^{*}}^{n}$ is the unique invariant measure of the solution $X^{n}$ of Eq. 2.15 corresponding to $v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ (the optimal policies of the approximated models). One sufficient condition which ensures the required tightness is the following: there exists a pair of nonnegative inf-compact functions $({\mathcal{V}},h)\in{\mathcal{C}}^{2}({\mathds{R}^{d}})\times{\mathcal{C}}({\mathds{R}^{d}})$ such that ${\mathscr{L}}_{v_{n}^{*}}^{n}{\mathcal{V}}(x)\leq\hat{\kappa}_{0}-h(x)$ for some positive constant $\hat{\kappa}_{0}$ and for all $n\in\mathds{N}$ and $x\in{\mathds{R}^{d}}$.
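A heuristic sketch of why the Lyapunov-type condition above yields tightness of $\Theta$ is given next. The threshold $N>0$ is introduced here for illustration, and the first identity is used only formally; a rigorous argument needs a localization step that is omitted.

```latex
% Formal step: the integral of the generator applied to \mathcal{V} against the
% invariant measure vanishes (assumed here without the localization argument).
\[
0\,=\,\int_{{\mathds{R}^{d}}}{\mathscr{L}}_{v_{n}^{*}}^{n}{\mathcal{V}}(x)\,\eta_{v_{n}^{*}}^{n}(\mathrm{d}x)
\,\leq\,\hat{\kappa}_{0}-\int_{{\mathds{R}^{d}}}h(x)\,\eta_{v_{n}^{*}}^{n}(\mathrm{d}x)
\quad\Longrightarrow\quad
\int_{{\mathds{R}^{d}}}h(x)\,\eta_{v_{n}^{*}}^{n}(\mathrm{d}x)\,\leq\,\hat{\kappa}_{0}\,.
\]
% Markov's inequality then gives, uniformly in n and for every N > 0,
\[
\eta_{v_{n}^{*}}^{n}\bigl(\{x\in{\mathds{R}^{d}}: h(x)>N\}\bigr)\,\leq\,\frac{\hat{\kappa}_{0}}{N}\,,
\]
% and the sublevel set \{x : h(x) \leq N\} is compact because h is inf-compact.
```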

Theorem 4.3.

Suppose that Assumptions (A1) - (A6) hold. Also, assume that the set $\Theta$ is tight. Then, we have

\[
\lim_{n\to\infty}{\mathscr{E}}^{n*}(c_{n})={\mathscr{E}}^{*}(c)\,. \tag{4.7}
\]
Proof.

From Theorem 4.2, we know that for each $n\in\mathds{N}$ there exists $(V_{n},\rho_{n})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, with $V_{n}(0)=0$ and $\inf_{{\mathds{R}^{d}}}V_{n}>-\infty$, satisfying

\[
\rho_{n}=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}V_{n}(x)+c_{n}(x,\zeta)\right]\,, \tag{4.8}
\]

where $\rho_{n}={\mathscr{E}}^{n*}(c_{n})$. Since $\lVert c_{n}\rVert_{\infty}\leq M$, it follows that $\rho_{n}={\mathscr{E}}^{n*}(c_{n})\leq M$.

From [ABG-book, Theorem 3.6.6] (the standard vanishing discount asymptotics), we know that as $\alpha\to 0$ we have $V_{\alpha}^{n}(\cdot)-V_{\alpha}^{n}(0)\to V_{n}(\cdot)$ and $\alpha V_{\alpha}^{n}(0)\to\rho_{n}$, where $V_{\alpha}^{n}$ is the solution of the $\alpha$-discounted HJB equation Eq. 3.3. Let

\[
\kappa(\rho_{n})\,:=\,\{x\in{\mathds{R}^{d}}\mid\min_{\zeta\in\mathbb{U}}c_{n}(x,\zeta)\leq\rho_{n}\}\,.
\]

Since the map $x\mapsto\min_{\zeta\in\mathbb{U}}c_{n}(x,\zeta)$ is continuous, it is easy to see that $\kappa(\rho_{n})$ is closed, and due to the near-monotonicity assumption (see Eq. 4.4) it follows that $\kappa(\rho_{n})$ is bounded. Therefore $\kappa(\rho_{n})$ is a compact subset of ${\mathds{R}^{d}}$. Since $V_{\alpha}^{n}\leq{\mathcal{J}}_{\alpha,n}^{v_{n}^{*}}(x,c_{n})$ and $v_{n}^{*}$ is stable, from [ABG-book, Lemma 3.6.1], we have

\[
\inf_{\kappa(\rho_{n})}V_{\alpha}^{n}=\inf_{{\mathds{R}^{d}}}V_{\alpha}^{n}\leq\frac{\rho_{n}}{\alpha}\,. \tag{4.9}
\]

Now for any minimizing selector $\hat{v}_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ of Eq. 3.3, we get

\[
\operatorname*{Tr}\bigl(a_{n}(x)\nabla^{2}V_{\alpha}^{n}(x)\bigr)+b_{n}(x,\hat{v}_{n}^{*}(x))\cdot\nabla V_{\alpha}^{n}(x)-\alpha V_{\alpha}^{n}(x)=-c_{n}(x,\hat{v}_{n}^{*}(x))\,.
\]

Since $\lVert c_{n}\rVert_{\infty}\leq M$ for all $n\in\mathds{N}$, from estimate (3.6.9b) of [ABG-book, Lemma 3.6.3], it follows that

\[
\lVert V_{\alpha}^{n}-V_{\alpha}^{n}(0)\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\tilde{C}(R,p)\left(1+\alpha\inf_{{\mathscr{B}}_{R_{0}}}V_{\alpha}^{n}\right)\,, \tag{4.10}
\]

for all $R>R_{0}$, where $R_{0}$ is a positive number such that $\kappa(\rho_{n})\subset{\mathscr{B}}_{R_{0}}$ and $\tilde{C}(R,p)$ is a positive constant which depends only on $d$ and $R_{0}$. Now combining Eq. 4.9 and Eq. 4.10, we obtain

\[
\lVert V_{\alpha}^{n}-V_{\alpha}^{n}(0)\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\tilde{C}(R,p)\left(1+M\right)\,. \tag{4.11}
\]

In view of assumption Eq. 4.4, one can choose $R_{0}$ independent of $n$. Thus Eq. 4.11 implies that

\[
\lVert V_{n}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\tilde{C}(R,p)\left(1+M\right)\,. \tag{4.12}
\]
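The two steps leading from Eq. 4.9 and Eq. 4.10 to Eq. 4.12 can be sketched as follows. The second step assumes that the vanishing-discount convergence holds weakly in ${\mathscr{W}}^{2,p}({\mathscr{B}}_{R})$, which is an assumption of this sketch rather than a statement quoted from the source.

```latex
% Step 1 (Eq. 4.9 into Eq. 4.10): since \kappa(\rho_n) \subset \mathscr{B}_{R_0},
% the infimum over the ball coincides with the global infimum, hence
\[
\alpha\inf_{{\mathscr{B}}_{R_{0}}}V_{\alpha}^{n}
\,=\,\alpha\inf_{{\mathds{R}^{d}}}V_{\alpha}^{n}
\,\leq\,\rho_{n}\,\leq\,M\,.
\]
% Step 2 (Eq. 4.11 to Eq. 4.12): V_\alpha^n - V_\alpha^n(0) \to V_n as \alpha \to 0
% (assumed weakly in \mathscr{W}^{2,p}(\mathscr{B}_R)), and the norm is weakly
% lower semicontinuous, so
\[
\lVert V_{n}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}
\,\leq\,\liminf_{\alpha\to 0}\,\lVert V_{\alpha}^{n}-V_{\alpha}^{n}(0)\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}
\,\leq\,\tilde{C}(R,p)\,(1+M)\,.
\]
```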

Hence, by the Banach-Alaoglu theorem and a standard diagonalization argument (as in Eq. 3.8), there exists $\tilde{V}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})$ such that along a subsequence

\[
\begin{cases}V_{n_{k}}\to\tilde{V}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V_{n_{k}}\to\tilde{V}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{\text{loc}}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} \tag{4.13}
\]

Again, since $\rho_{n}\leq M$, along a further subsequence (denoted, without loss of generality, by the same sequence), we have $\rho_{n_{k}}\to\tilde{\rho}$ as $k\to\infty$. Now, as before, multiplying by a test function $\phi\in{\mathcal{C}}_{c}^{\infty}({\mathds{R}^{d}})$, from Eq. 4.8, we obtain

\[
\int_{{\mathds{R}^{d}}}\operatorname*{Tr}\bigl(a_{n_{k}}(x)\nabla^{2}V_{n_{k}}(x)\bigr)\phi(x)\,\mathrm{d}x+\int_{{\mathds{R}^{d}}}\min_{\zeta\in\mathbb{U}}\{b_{n_{k}}(x,\zeta)\cdot\nabla V_{n_{k}}(x)+c_{n_{k}}(x,\zeta)\}\phi(x)\,\mathrm{d}x=\int_{{\mathds{R}^{d}}}\rho_{n_{k}}\phi(x)\,\mathrm{d}x\,.
\]

By a similar argument as in Theorem 3.3, in view of Eq. 4.13, letting $k\to\infty$ it follows that

\[
\int_{{\mathds{R}^{d}}}\operatorname*{Tr}\bigl(a(x)\nabla^{2}\tilde{V}(x)\bigr)\phi(x)\,\mathrm{d}x+\int_{{\mathds{R}^{d}}}\min_{\zeta\in\mathbb{U}}\{b(x,\zeta)\cdot\nabla\tilde{V}(x)+c(x,\zeta)\}\phi(x)\,\mathrm{d}x=\int_{{\mathds{R}^{d}}}\tilde{\rho}\,\phi(x)\,\mathrm{d}x\,. \tag{4.14}
\]

Since $\phi\in{\mathcal{C}}_{c}^{\infty}({\mathds{R}^{d}})$ is arbitrary and $\tilde{V}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})$, we deduce that $\tilde{V}$ satisfies

\[
\tilde{\rho}=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\tilde{V}(x)+c(x,\zeta)\right]\,.
\]

Since $\mathfrak{U}_{\mathsf{sm}}$ is compact, along a further subsequence (denoted by the same sequence) $v_{n_{k}}^{*}\to\tilde{v}^{*}$ in $\mathfrak{U}_{\mathsf{sm}}$. Repeating the above argument, one can show that the pair $(\tilde{V},\tilde{\rho})$ satisfies

\[
\tilde{\rho}={\mathscr{L}}_{\tilde{v}^{*}}\tilde{V}(x)+c(x,\tilde{v}^{*}(x))\,.
\]

As we know $V_{n}(0)=0$ for all $n\in\mathds{N}$ (see Eq. 4.8), it is easy to see that $\tilde{V}(0)=0$. Next we show that $\tilde{V}$ is bounded from below. From estimate (3.6.9a) of [ABG-book, Lemma 3.6.3], for each $R>R_{0}$ we have

\[
\left(\operatorname*{osc}_{{\mathscr{B}}_{2R}}V_{\alpha}^{n}\,:=\,\right)\sup_{x\in{\mathscr{B}}_{2R}}V_{\alpha}^{n}(x)-\inf_{x\in{\mathscr{B}}_{2R}}V_{\alpha}^{n}(x)\leq\tilde{C}_{1}(R)\Bigl(1+\alpha\inf_{{\mathscr{B}}_{R_{0}}}V_{\alpha}^{n}\Bigr)\leq\tilde{C}_{1}(R)(1+M)\,, \tag{4.15}
\]

for some constant $\tilde{C}_{1}(R)>0$ which depends only on $d$ and $R_{0}$. Also, let $\alpha_{k}$ be a sequence such that $\alpha_{k}\to 0$ as $k\to\infty$. Then for each $x\in{\mathds{R}^{d}}$ we have

\[
\begin{aligned}
V_{n}(x) &=\lim_{k\to\infty}\left(V_{\alpha_{k}}^{n}(x)-V_{\alpha_{k}}^{n}(0)\right)\geq\liminf_{k\to\infty}\left(V_{\alpha_{k}}^{n}(x)-\inf_{{\mathds{R}^{d}}}V_{\alpha_{k}}^{n}+\inf_{{\mathds{R}^{d}}}V_{\alpha_{k}}^{n}-V_{\alpha_{k}}^{n}(0)\right)\\
&\geq-\limsup_{k\to\infty}\left(V_{\alpha_{k}}^{n}(0)-\inf_{{\mathds{R}^{d}}}V_{\alpha_{k}}^{n}\right)+\liminf_{k\to\infty}\left(V_{\alpha_{k}}^{n}(x)-\inf_{{\mathds{R}^{d}}}V_{\alpha_{k}}^{n}\right)\\
&\geq-\limsup_{k\to\infty}\left(\operatorname*{osc}_{{\mathscr{B}}_{R_{0}}}V_{\alpha_{k}}^{n}\right);\quad\left(\text{since}\,\,\,\inf_{{\mathscr{B}}_{R_{0}}}V_{\alpha_{k}}^{n}=\inf_{{\mathds{R}^{d}}}V_{\alpha_{k}}^{n}\right)\,,
\end{aligned}
\tag{4.16}
\]

where the last inequality follows from the fact that $V_{\alpha_{k}}^{n}(x)-\inf_{{\mathds{R}^{d}}}V_{\alpha_{k}}^{n}\geq 0$. Hence, in view of estimate Eq. 4.15, we deduce that

\[
V_{n}(x)\geq-\tilde{C}_{1}(R_{0})(1+M)\,. \tag{4.17}
\]

This implies that the limit satisfies $\tilde{V}\geq-\tilde{C}_{1}(R_{0})(1+M)$. Note that

\[
\rho_{n_{k}}={\mathscr{E}}^{{n_{k}}*}(c_{n_{k}})=\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c_{n_{k}}(x,\zeta)\,v_{n_{k}}^{*}(x)(\mathrm{d}\zeta)\,\eta_{v_{n_{k}}^{*}}^{n_{k}}(\mathrm{d}x)\,.
\]

Since $\Theta$ is tight, from [ABG-book, Lemma 3.2.6], we deduce that $\eta_{v_{n_{k}}^{*}}^{n_{k}}\to\eta_{\tilde{v}^{*}}$ in total variation norm as $k\to\infty$, where $\eta_{\tilde{v}^{*}}$ is the unique invariant measure of Eq. 2.1 corresponding to $\tilde{v}^{*}$. We write

\[
\begin{aligned}
&\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c_{n_{k}}(x,\zeta)\,v_{n_{k}}^{*}(x)(\mathrm{d}\zeta)\,\eta_{v_{n_{k}}^{*}}^{n_{k}}(\mathrm{d}x)-\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,\tilde{v}^{*}(x)(\mathrm{d}\zeta)\,\eta_{\tilde{v}^{*}}(\mathrm{d}x)\\
&=\bigg(\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c_{n_{k}}(x,\zeta)\,v_{n_{k}}^{*}(x)(\mathrm{d}\zeta)\,\eta_{v_{n_{k}}^{*}}^{n_{k}}(\mathrm{d}x)-\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c_{n_{k}}(x,\zeta)\,v_{n_{k}}^{*}(x)(\mathrm{d}\zeta)\,\eta_{\tilde{v}^{*}}(\mathrm{d}x)\bigg)\\
&\quad+\bigg(\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c_{n_{k}}(x,\zeta)\,v_{n_{k}}^{*}(x)(\mathrm{d}\zeta)\,\eta_{\tilde{v}^{*}}(\mathrm{d}x)-\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,\tilde{v}^{*}(x)(\mathrm{d}\zeta)\,\eta_{\tilde{v}^{*}}(\mathrm{d}x)\bigg)\,.
\end{aligned}
\tag{4.18}
\]

The first term converges to zero by the total variation convergence $\eta_{v_{n_{k}}^{*}}^{n_{k}}\to\eta_{\tilde{v}^{*}}$. The second term converges to zero by the convergence $v_{n_{k}}^{*}\to\tilde{v}^{*}$ in the control topology on $\mathfrak{U}_{\mathsf{sm}}$ (as $\eta_{\tilde{v}^{*}}$ is fixed), in view of the fact that $c_{n}\to c$ (continuously over control actions). Hence we conclude that $\tilde{\rho}=\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,\tilde{v}^{*}(x)(\mathrm{d}\zeta)\,\eta_{\tilde{v}^{*}}(\mathrm{d}x)$. Therefore, the pair $(\tilde{V},\tilde{\rho})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, which has the properties that $\tilde{V}(0)=0$ and $\inf_{{\mathds{R}^{d}}}\tilde{V}>-\infty$, is a compatible solution (see [AA13, Definition 1.1]) to Eq. 4.2. Since the solution to Eq. 4.2 is unique (see [AA13, Theorem 1.1]), it follows that $(\tilde{V},\tilde{\rho})\equiv(V,\rho)$. This completes the proof of the theorem. ∎
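For the first bracket in Eq. 4.18, the convergence to zero can be made quantitative. The bound below is a sketch that assumes the convention $\lVert\mu\rVert_{TV}:=\sup_{\lVert f\rVert_{\infty}\leq 1}\bigl|\int f\,\mathrm{d}\mu\bigr|$ for the total variation norm; other normalizations change the constant by a fixed factor.

```latex
% Convention assumed: \lVert\mu\rVert_{TV} := \sup_{\lVert f\rVert_\infty \leq 1} |\int f\,d\mu|.
% Here c_{n_k}(x, v(x)) abbreviates \int_{\mathbb{U}} c_{n_k}(x,\zeta)\,v(x)(d\zeta),
% and \lVert c_{n_k}\rVert_\infty \leq M.
\[
\left|\int_{{\mathds{R}^{d}}}c_{n_{k}}(x,v_{n_{k}}^{*}(x))\,\bigl(\eta_{v_{n_{k}}^{*}}^{n_{k}}-\eta_{\tilde{v}^{*}}\bigr)(\mathrm{d}x)\right|
\,\leq\,M\,\bigl\lVert\eta_{v_{n_{k}}^{*}}^{n_{k}}-\eta_{\tilde{v}^{*}}\bigr\rVert_{TV}
\,\longrightarrow\,0
\qquad\text{as }k\to\infty\,.
\]
```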

In the following theorem, we prove existence and uniqueness of solution of a certain Poisson’s equation. This will be useful in proving the robustness result.

Theorem 4.4.

Suppose that Assumptions (A1) - (A4) hold. Let $v\in\mathfrak{U}_{\mathsf{sm}}$ be a stable control such that

\[
\liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c(x,\zeta)>\inf_{x\in{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v)\,. \tag{4.19}
\]

Then, there exists a unique pair $(V^{v},\rho_{v})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, with $V^{v}(0)=0$, $\inf_{{\mathds{R}^{d}}}V^{v}>-\infty$ and $\rho_{v}\leq\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,v(x)(\mathrm{d}\zeta)\,\eta_{v}(\mathrm{d}x)$, satisfying

\[
\rho_{v}=\left[{\mathscr{L}}_{v}V^{v}(x)+c(x,v(x))\right]\,. \tag{4.20}
\]

Moreover, we have

  • (i)

    $\rho_{v}=\inf_{{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v)$.

  • (ii)

    for all $x\in{\mathds{R}^{d}}$

    \[
    V^{v}(x)\,=\,\lim_{r\downarrow 0}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-\rho_{v}\right)\mathrm{d}t\right]\,. \tag{4.21}
    \]
Proof.

Since $c$ is bounded, we have $(\rho^{v}\,:=\,)\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,v(x)(\mathrm{d}\zeta)\,\eta_{v}(\mathrm{d}x)\leq\inf_{{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v)\leq\lVert c\rVert_{\infty}$. Also, since (see Eq. 4.19) $\liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c(x,\zeta)>\rho^{v}$, from [ABG-book, Lemma 3.6.1], it follows that

\[
\inf_{\kappa(\rho^{v})}{\mathcal{J}}_{\alpha}^{v}(x,c)=\inf_{{\mathds{R}^{d}}}{\mathcal{J}}_{\alpha}^{v}(x,c)\leq\frac{\rho^{v}}{\alpha}\,, \tag{4.22}
\]

where $\kappa(\rho^{v})\,:=\,\{x\in{\mathds{R}^{d}}\mid\min_{\zeta\in\mathbb{U}}c(x,\zeta)\leq\rho^{v}\}$ and ${\mathcal{J}}_{\alpha}^{v}(x,c)$ is the $\alpha$-discounted cost defined as in Eq. 2.2. It is known that ${\mathcal{J}}_{\alpha}^{v}(x,c)$ is a solution to the Poisson equation (see [ABG-book, Lemma A.3.7])

\[
{\mathscr{L}}_{v}{\mathcal{J}}_{\alpha}^{v}(x,c)-\alpha{\mathcal{J}}_{\alpha}^{v}(x,c)=-c(x,v(x))\,. \tag{4.23}
\]

Since $\kappa(\rho^{v})$ is compact, for some $R_{0}>0$ we have $\kappa(\rho^{v})\subset{\mathscr{B}}_{R_{0}}$. Thus from [ABG-book, Lemma 3.6.3], we deduce that for each $R>R_{0}$ there exist constants $\tilde{C}_{2}(R),\tilde{C}_{2}(R,p)$ depending only on $d,R_{0}$ such that

\[
\operatorname*{osc}_{{\mathscr{B}}_{2R}}{\mathcal{J}}_{\alpha}^{v}(x,c)\leq\tilde{C}_{2}(R)\left(1+\alpha\inf_{{\mathscr{B}}_{R_{0}}}{\mathcal{J}}_{\alpha}^{v}(x,c)\right)\,, \tag{4.24}
\]
\[
\lVert{\mathcal{J}}_{\alpha}^{v}(\cdot,c)-{\mathcal{J}}_{\alpha}^{v}(0,c)\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\tilde{C}_{2}(R,p)\left(1+\alpha\inf_{{\mathscr{B}}_{R_{0}}}{\mathcal{J}}_{\alpha}^{v}(x,c)\right)\,. \tag{4.25}
\]

Thus, arguing as in [ABG-book, Lemma 3.6.6], we deduce that there exists $(V^{v},\tilde{\rho}_{v})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$ such that as $\alpha\to 0$, ${\mathcal{J}}_{\alpha}^{v}(\cdot,c)-{\mathcal{J}}_{\alpha}^{v}(0,c)\to V^{v}(\cdot)$ and $\alpha{\mathcal{J}}_{\alpha}^{v}(0,c)\to\tilde{\rho}_{v}$, and the pair $(V^{v},\tilde{\rho}_{v})$ satisfies

\[
{\mathscr{L}}_{v}V^{v}(x)+c(x,v(x))=\tilde{\rho}_{v}\,. \tag{4.26}
\]

By Eq. 4.22, we get $\tilde{\rho}_{v}\leq\rho^{v}$. Now, in view of estimates Eq. 4.22 and Eq. 4.25, it is easy to see that

\[
\lVert V^{v}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\tilde{C}_{2}(R,p)\left(1+M\right)\,. \tag{4.27}
\]

Also, arguing as in Theorem 4.3 (see Section 4.1), from estimate Eq. 4.24 it follows that

\[
V^{v}\geq-\tilde{C}_{2}(R_{0})\left(1+M\right)\,. \tag{4.28}
\]

Now, applying the Itô-Krylov formula and using Eq. 4.26 we obtain

\[
\operatorname{\mathbb{E}}_{x}^{v}\left[V^{v}\left(X_{T\wedge\uptau_{R}}\right)\right]-V^{v}(x)\,=\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T\wedge\uptau_{R}}\left(\tilde{\rho}_{v}-c(X_{t},v(X_{t}))\right)\mathrm{d}t\right]\,.
\]

This implies

\[
\inf_{y\in{\mathds{R}^{d}}}V^{v}(y)-V^{v}(x)\,\leq\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T\wedge\uptau_{R}}\left(\tilde{\rho}_{v}-c(X_{t},v(X_{t}))\right)\mathrm{d}t\right]\,.
\]

Since $v$ is stable, letting $R\to\infty$, we get

\[
\inf_{y\in{\mathds{R}^{d}}}V^{v}(y)-V^{v}(x)\,\leq\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T}\left(\tilde{\rho}_{v}-c(X_{t},v(X_{t}))\right)\mathrm{d}t\right]\,.
\]

Now dividing both sides of the above inequality by $T$ and letting $T\to\infty$, it follows that

\[
\limsup_{T\to\infty}\frac{1}{T}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T}c(X_{t},v(X_{t}))\,\mathrm{d}t\right]\,\leq\,\tilde{\rho}_{v}\,.
\]

Thus, $\rho^{v}\leq\tilde{\rho}_{v}$. This indeed implies that $\rho^{v}=\tilde{\rho}_{v}$. The representation Eq. 4.21 of $V^{v}$ follows by closely mimicking the argument of [ABG-book, Lemma 3.6.9]. Therefore, we have a solution pair $(V^{v},\rho_{v})$ to Eq. 4.20 satisfying (i) and (ii).
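The passage from the finite-horizon inequality to the bound on the long-run average cost used above can be written out explicitly; the following is only a rearrangement of the inequality obtained after letting $R\to\infty$.

```latex
% Dividing the inequality obtained after R \to \infty by T > 0 and rearranging:
\[
\frac{1}{T}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T}c(X_{t},v(X_{t}))\,\mathrm{d}t\right]
\,\leq\,\tilde{\rho}_{v}+\frac{V^{v}(x)-\inf_{y\in{\mathds{R}^{d}}}V^{v}(y)}{T}\,.
\]
% V^v(x) - \inf V^v is finite because V^v is bounded from below (Eq. 4.28),
% so the last term vanishes as T \to \infty.
```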

Next we want to prove that the solution pair is unique. To this end, let $(\hat{V}^{v},\hat{\rho}_{v})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, with $\hat{V}^{v}(0)=0$, $\inf_{{\mathds{R}^{d}}}\hat{V}^{v}>-\infty$ and $\hat{\rho}_{v}\leq\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,v(x)(\mathrm{d}\zeta)\,\eta_{v}(\mathrm{d}x)$, be a pair satisfying

\[
\hat{\rho}_{v}=\left[{\mathscr{L}}_{v}\hat{V}^{v}(x)+c(x,v(x))\right]\,. \tag{4.29}
\]

Applying the Itô-Krylov formula and using Eq. 4.29 we obtain

\[
\limsup_{T\to\infty}\frac{1}{T}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T}c(X_{t},v(X_{t}))\,\mathrm{d}t\right]\,\leq\,\hat{\rho}_{v}\,. \tag{4.30}
\]

Since $\hat{\rho}_{v}\leq\inf_{{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v)$, from Eq. 4.30 we obtain $\hat{\rho}_{v}=\rho_{v}$. Now, from Eq. 4.29, applying the Itô-Krylov formula, we deduce that

\[
\hat{V}^{v}(x)\,=\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}\wedge\uptau_{R}}\left(c(X_{t},v(X_{t}))-\hat{\rho}_{v}\right)\mathrm{d}t+\hat{V}^{v}\left(X_{{\breve{\uptau}}_{r}\wedge\uptau_{R}}\right)\right]\,. \tag{4.31}
\]

Since $v$ is stable and $\hat{V}^{v}$ is bounded from below, for all $x\in{\mathds{R}^{d}}$ we have

\[
\liminf_{R\to\infty}\operatorname{\mathbb{E}}_{x}^{v}\left[\hat{V}^{v}\left(X_{\uptau_{R}}\right)\mathds{1}_{\{{\breve{\uptau}}_{r}\geq\uptau_{R}\}}\right]\geq 0\,.
\]

Hence, letting $R\to\infty$, by Fatou's lemma, from Eq. 4.31 it follows that

\[
\begin{aligned}
\hat{V}^{v}(x) &\,\geq\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-\hat{\rho}_{v}\right)\mathrm{d}t+\hat{V}^{v}\left(X_{{\breve{\uptau}}_{r}}\right)\right]\\
&\,\geq\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-\hat{\rho}_{v}\right)\mathrm{d}t\right]+\inf_{{\mathscr{B}}_{r}}\hat{V}^{v}\,.
\end{aligned}
\]

Since $\hat{V}^{v}(0)=0$, letting $r\to 0$, we obtain

\[
\hat{V}^{v}(x)\,\geq\,\limsup_{r\downarrow 0}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-\hat{\rho}_{v}\right)\mathrm{d}t\right]\,. \tag{4.32}
\]

From Eq. 4.21 and Eq. 4.32, it is easy to see that $V^{v}-\hat{V}^{v}\leq 0$ in ${\mathds{R}^{d}}$. On the other hand, by Eq. 4.20 and Eq. 4.29 one has ${\mathscr{L}}_{v}\left(V^{v}-\hat{V}^{v}\right)(x)=0$ in ${\mathds{R}^{d}}$. Hence, applying the strong maximum principle [GilTru, Theorem 9.6], one has $V^{v}=\hat{V}^{v}$. This proves uniqueness. ∎
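The final maximum-principle step of the uniqueness argument can be summarized as follows; $W$ is a shorthand introduced here for the difference of the two solutions and does not appear in the source.

```latex
% W is a shorthand (introduced for this sketch) for the difference of the two solutions.
% The identity \mathscr{L}_v W = 0 uses \hat{\rho}_v = \rho_v, established above.
\[
W\,:=\,V^{v}-\hat{V}^{v}\,,\qquad
{\mathscr{L}}_{v}W=0\ \text{ in }{\mathds{R}^{d}}\,,\qquad
W\leq 0\ \text{ in }{\mathds{R}^{d}}\,,\qquad
W(0)=V^{v}(0)-\hat{V}^{v}(0)=0\,.
\]
% W attains its maximum value 0 at the interior point x = 0, so the strong maximum
% principle forces W to be constant on every ball centred at 0, hence W \equiv 0.
```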

Next we prove the robustness result, i.e., we prove that ${\mathscr{E}}_{x}(c,v_{n}^{*})\to\rho$ as $n\to\infty$, where $v_{n}^{*}$ is an optimal ergodic control of the approximated model (see Theorem 4.2). In order to establish this result we will assume that $\widehat{\Theta}:=\{\eta_{v_{n}^{*}}:n\in\mathds{N}\}$ is tight, where $\eta_{v_{n}^{*}}$ is the unique invariant measure of Eq. 2.1 corresponding to $v_{n}^{*}$.

Theorem 4.5.

Suppose that Assumptions (A1) - (A6) hold. Also, assume that

\[
\liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c(x,\zeta)>\inf_{x\in{\mathds{R}^{d}}}\sup_{n\in\mathds{N}}{\mathscr{E}}_{x}(c,v_{n}^{*})\,, \tag{4.33}
\]

and the sets $\widehat{\Theta}$ and $\Theta$ are tight. Then, we have

\[
\lim_{n\to\infty}\inf_{x\in{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v_{n}^{*})={\mathscr{E}}^{*}(c)\,. \tag{4.34}
\]
Proof.

We shall follow a similar proof program as that of Theorem 3.4, under the discounted setup. Since $c$ is bounded, we have $(\rho_{v_{n}^{*}}\,:=\,)\inf_{x\in{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v_{n}^{*})\leq\lVert c\rVert_{\infty}$. From our assumption Eq. 4.33, we know that $\liminf_{\lVert x\rVert\to\infty}\inf_{\zeta\in\mathbb{U}}c(x,\zeta)>\rho_{v_{n}^{*}}$. Hence, from Theorem 4.4, there exists a unique pair $(V^{v_{n}^{*}},\rho_{v_{n}^{*}})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, with $V^{v_{n}^{*}}(0)=0$ and $\inf_{{\mathds{R}^{d}}}V^{v_{n}^{*}}>-\infty$, satisfying

\[
\rho_{v_{n}^{*}}=\left[{\mathscr{L}}_{v_{n}^{*}}V^{v_{n}^{*}}(x)+c(x,{v_{n}^{*}}(x))\right], \tag{4.35}
\]

with $\rho_{v_{n}^{*}}=\inf_{x\in{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v_{n}^{*})=\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,v_{n}^{*}(x)(\mathrm{d}\zeta)\,\eta_{v_{n}^{*}}(\mathrm{d}x)$. Moreover, in view of assumption Eq. 4.33, from Eq. 4.27 and Eq. 4.28, we have

\[
\lVert V^{v_{n}^{*}}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\kappa_{1}\quad\text{and}\quad V^{v_{n}^{*}}(x)\geq-\kappa_{2}\,\,\,\text{for all}\,\,\,x\in{\mathds{R}^{d}}\,, \tag{4.36}
\]

where $\kappa_{1},\kappa_{2}$ are constants independent of $n\in\mathds{N}$. Thus, by the Banach-Alaoglu theorem and a standard diagonalization argument (as in Eq. 3.8), we deduce that there exists $\hat{V}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})$ such that along a subsequence

\[
\begin{cases}V^{v_{n_{k}}^{*}}\to\hat{V}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V^{v_{n_{k}}^{*}}\to\hat{V}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{\text{loc}}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} \tag{4.37}
\]

Again, since $\rho_{v_{n}^{*}}\leq M$, along a further subsequence (denoted, without loss of generality, by the same sequence), we have $\rho_{v_{n_{k}}^{*}}\to\hat{\rho}$ as $k\to\infty$. Since $\mathfrak{U}_{\mathsf{sm}}$ is compact, along a further subsequence (again denoted by the same sequence) we have $v_{n_{k}}^{*}\to\hat{v}^{*}$ as $k\to\infty$. Now, as before, multiplying by a test function and letting $k\to\infty$, from Eq. 4.35, we deduce that the pair $(\hat{V},\hat{\rho})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, satisfies

\[
\hat{\rho}=\left[{\mathscr{L}}_{\hat{v}^{*}}\hat{V}(x)+c(x,{\hat{v}^{*}}(x))\right]\,. \tag{4.38}
\]

Since $V^{v_{n_{k}}^{*}}(0)=0$ for all $k\in\mathds{N}$, it is easy to see that $\hat{V}(0)=0$. Also, by Eq. 4.36, it follows that $\inf_{{\mathds{R}^{d}}}\hat{V}>-\infty$. Hence, using Eq. 4.33 and Eq. 4.38, we see that $\hat{v}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ is stable. Since $\widehat{\Theta}$ is tight, in view of [ABG-book, Lemma 3.2.6], it is easy to see that $\hat{\rho}=\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\,\hat{v}^{*}(x)(\mathrm{d}\zeta)\,\eta_{\hat{v}^{*}}(\mathrm{d}x)$. Thus, by Theorem 4.4, we deduce that $(\hat{V},\hat{\rho})\equiv(V^{\hat{v}^{*}},\rho_{\hat{v}^{*}})$.

Note that

\[
|\rho_{v_{n_{k}}^{*}}-\rho|\leq|\rho_{v_{n_{k}}^{*}}-\rho_{n_{k}}|+|\rho_{n_{k}}-\rho|\,.
\]

Since $|\rho_{n_{k}}-\rho|\to 0$ as $k\to\infty$ (see Theorem 4.3), to complete the proof we have to show that $|\rho_{v_{n_{k}}^{*}}-\rho_{n_{k}}|\to 0$ as $k\to\infty$. From Theorem 4.2, we know that the pair $(V_{n_{k}},\rho_{n_{k}})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, with $V_{n_{k}}(0)=0$, satisfies

\[
\rho_{n_{k}}=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n_{k}}V_{n_{k}}(x)+c_{n_{k}}(x,\zeta)\right]\,. \tag{4.39}
\]

For any minimizing selector $v_{n_{k}}^{*}\in\mathfrak{U}_{\mathsf{sm}}$, rewriting Eq. 4.39, we get

\[
\rho_{n_{k}}=\left[{\mathscr{L}}_{v_{n_{k}}^{*}}^{n_{k}}V_{n_{k}}(x)+c_{n_{k}}(x,v_{n_{k}}^{*}(x))\right]\,. \tag{4.40}
\]

Now, in view of estimates Eq. 4.12 and Eq. 4.17, it follows that

Vnk𝒲2,p(R)κ3andVnk(x)κ4for allxd,\lVert V_{n_{k}}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\kappa_{3}\quad\text{and}\quad V_{n_{k}}(x)\geq-\kappa_{4}\,\,\,\text{for all}\,\,\,x\in{\mathds{R}^{d}}\,, (4.41)

where κ3,κ4\kappa_{3},\kappa_{4} are constants independent of kk\in\mathds{N} . Hence, by the Banach-Alaoglu theorem and standard diagonalization argument (see Eq. 3.8), we have there exists V¯𝒲loc2,p(d)\bar{V}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}}) such that along a sub-sequence

{VnkV¯in𝒲loc2,p(d)(weakly)VnkV¯in𝒞loc1,β(d)(strongly).\begin{cases}V_{n_{k}}\to&\bar{V}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V_{n_{k}}\to&\bar{V}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} (4.42)

Also, $\rho_{n_{k}}\leq M$ implies that, along a further subsequence (denoted by the same sequence without loss of generality), $\rho_{n_{k}}\to\bar{\rho}$. Since $v_{n_{k}}^{*}\to\hat{v}^{*}$ in $\mathfrak{U}_{\mathsf{sm}}$, multiplying Eq. 4.40 by test functions and letting $k\to\infty$, we obtain that the pair $(\bar{V},\bar{\rho})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, satisfies

ρ¯=[v^V¯(x)+c(x,v^(x))].\bar{\rho}=\left[{\mathscr{L}}_{\hat{v}^{*}}\bar{V}(x)+c(x,\hat{v}^{*}(x))\right]\,. (4.43)

From Eq. 4.41, it is easy to see that $\inf_{{\mathds{R}^{d}}}\bar{V}>-\infty$. Also, since $V_{n_{k}}(0)=0$ for all $k\in\mathds{N}$, we have $\bar{V}(0)=0$. Since $\Theta$ is tight, arguing as in the proof of Theorem 4.3, we deduce that $\bar{\rho}=\int_{{\mathds{R}^{d}}}\int_{\mathbb{U}}c(x,\zeta)\hat{v}^{*}(x)(\mathrm{d}\zeta)\eta_{\hat{v}^{*}}(\mathrm{d}x)$. Thus, by uniqueness of the solution of Eq. 4.43 (see Theorem 4.4), it follows that $(\bar{V},\bar{\rho})\equiv(V^{\hat{v}^{*}},\rho_{\hat{v}^{*}})$. Since both $\rho_{v_{n_{k}}^{*}}$ and $\rho_{n_{k}}$ converge to the same limit $\rho_{\hat{v}^{*}}$, we deduce that $|\rho_{v_{n_{k}}^{*}}-\rho_{n_{k}}|\to 0$ as $k\to\infty$. This completes the proof of the theorem. ∎

4.2. Analysis under Lyapunov stability

In this section we study the robustness problem for the ergodic cost criterion under a Lyapunov stability assumption. We assume the following Foster-Lyapunov condition on the dynamics.

  • (A7)
    • (i)

      There exists a positive constant C^0\widehat{C}_{0}, and a pair of inf-compact functions (𝒱,h)𝒞2(d)×𝒞(d×𝕌)({\mathcal{V}},h)\in{\mathcal{C}}^{2}({\mathds{R}^{d}})\times{\mathcal{C}}({\mathds{R}^{d}}\times\mathbb{U}) (i.e., the sub-level sets {𝒱k},{hk}\{{\mathcal{V}}\leq k\}\,,\{h\leq k\} are compact or empty sets in d{\mathds{R}^{d}} , d×𝕌{\mathds{R}^{d}}\times\mathbb{U} respectively for each kk\in\mathds{R}) such that

      \[ {\mathscr{L}}_{\zeta}{\mathcal{V}}(x)\leq\widehat{C}_{0}-h(x,\zeta)\quad\text{for all}\,\,\,(x,\zeta)\in{\mathds{R}^{d}}\times\mathbb{U}\,, \tag{4.44} \]

      where hh is locally Lipschitz continuous in its first argument uniformly with respect to second.

    • (ii)

      For each nn\in\mathds{N} , we have

      \[ {\mathscr{L}}_{\zeta}^{n}{\mathcal{V}}(x)\leq\widehat{C}_{0}-h(x,\zeta)\quad\text{for all}\,\,\,(x,\zeta)\in{\mathds{R}^{d}}\times\mathbb{U}\,, \tag{4.45} \]

      where the functions 𝒱,h{\mathcal{V}},h are as in Eq. 4.44 .
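As an illustration of (A7), the following worked computation is a minimal sketch under assumptions that are not taken from the paper: $a\equiv a_{n}\equiv I$ (in the convention ${\mathscr{L}}_{\zeta}f=\operatorname{Tr}(a\nabla^{2}f)+b\cdot\nabla f$ suggested by Eq. 4.47), $\mathbb{U}$ the closed unit ball of ${\mathds{R}^{d}}$, and the hypothetical drifts $b(x,\zeta)=-x+\zeta$ and $b_{n}(x,\zeta)=-(1+\tfrac{1}{n})x+\zeta$. With ${\mathcal{V}}(x)=|x|^{2}$ and $h(x,\zeta)=|x|^{2}$, both Eq. 4.44 and Eq. 4.45 then hold with the same constant $\widehat{C}_{0}=2d+1$. Whether this toy model meets the remaining standing assumptions of the paper is not checked here.

```latex
% Requires amsmath, amssymb, mathrsfs.
% Illustrative assumptions (not from the paper): a = a_n = I, |zeta| <= 1,
% b(x,zeta) = -x + zeta,  b_n(x,zeta) = -(1 + 1/n) x + zeta,  V(x) = |x|^2.
\begin{align*}
\mathscr{L}_{\zeta}\mathcal{V}(x)
  &= \operatorname{Tr}\bigl(\nabla^{2}\mathcal{V}(x)\bigr) + b(x,\zeta)\cdot\nabla\mathcal{V}(x)
   = 2d - 2|x|^{2} + 2\,\zeta\cdot x \\
  &\le 2d - 2|x|^{2} + 2|x|
   \le (2d+1) - |x|^{2},
   \qquad\text{since } 2|x|\le 1+|x|^{2}, \\
\mathscr{L}^{n}_{\zeta}\mathcal{V}(x)
  &= 2d - 2\bigl(1+\tfrac{1}{n}\bigr)|x|^{2} + 2\,\zeta\cdot x
   \le (2d+1) - |x|^{2}
   \qquad\text{for every } n\in\mathbb{N}.
\end{align*}
```

The point of the example is that a single pair $({\mathcal{V}},h)$ and a single constant serve the true model and every approximating model at once, which is exactly what (A7)(ii) asks for.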

Combining [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12], we have the following complete characterization of the ergodic optimal control.

Theorem 4.6.

Suppose that assumptions (A1)-(A4) and (A7)(i) hold. Then the ergodic HJB equation

ρ=minζ𝕌[ζV(x)+c(x,ζ)]\rho=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}V^{*}(x)+c(x,\zeta)\right] (4.46)

admits unique solution (V,ρ)𝒞2(d)𝔬(𝒱)×(V^{*},\rho)\in{\mathcal{C}}^{2}({\mathds{R}^{d}})\cap{\mathfrak{o}}({\mathcal{V}})\times\mathds{R} satisfying V(0)=0V^{*}(0)=0 . Moreover, we have

  • (i)

    ρ=(c)\rho={\mathscr{E}}^{*}(c)

  • (ii)

    a stationary Markov control v𝔘𝗌𝗆v^{*}\in\mathfrak{U}_{\mathsf{sm}} is an optimal control (i.e., x(c,v)=(c){\mathscr{E}}_{x}(c,v^{*})={\mathscr{E}}^{*}(c)) if and only if it satisfies

    minζ𝕌[ζV(x)+c(x,ζ)]=Tr(a(x)2V(x))+b(x,v(x))V(x)+c(x,v(x)),a.e.xd.\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}V^{*}(x)+c(x,\zeta)\right]\,=\,\operatorname*{Tr}\bigl{(}a(x)\nabla^{2}V^{*}(x)\bigr{)}+b(x,v^{*}(x))\cdot\nabla V^{*}(x)+c(x,v^{*}(x))\,,\quad\text{a.e.}\,\,x\in{\mathds{R}^{d}}\,. (4.47)
  • (iii)

    for any v𝔘𝗌𝗆v^{*}\in\mathfrak{U}_{\mathsf{sm}} satisfying Eq. 4.47, we have

    V(x)=limr0𝔼xv[0τ˘r(c(Xt,v(Xt))(c))dt]for allxd.V^{*}(x)\,=\,\lim_{r\downarrow 0}\operatorname{\mathbb{E}}_{x}^{v^{*}}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v^{*}(X_{t}))-{\mathscr{E}}^{*}(c)\right)\mathrm{d}t\right]\quad\text{for all}\,\,\,x\in{\mathds{R}^{d}}\,. (4.48)

Again, from [ABG-book, Theorem 3.7.11] and [ABG-book, Theorem 3.7.12], for the approximated model for each nn\in\mathds{N}, we have the following complete characterization of the optimal control.

Theorem 4.7.

Suppose that Assumptions (A5) and (A7)(ii) hold. Then the ergodic HJB equation

\[ \rho_{n}=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}V^{n*}(x)+c_{n}(x,\zeta)\right] \tag{4.49} \]

admits unique solution (Vn,ρn)𝒞2(d)𝔬(𝒱)×(V^{n*},\rho_{n})\in{\mathcal{C}}^{2}({\mathds{R}^{d}})\cap{\mathfrak{o}}({\mathcal{V}})\times\mathds{R} satisfying Vn(0)=0V^{n*}(0)=0 . Moreover, we have

  • (i)

    ρn=n(cn)\rho_{n}={\mathscr{E}}^{n*}(c_{n})

  • (ii)

    a stationary Markov control vn𝔘𝗌𝗆v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}} is an optimal control (i.e., xn(cn,vn)=n(cn){\mathscr{E}}_{x}^{n}(c_{n},v_{n}^{*})={\mathscr{E}}^{n*}(c_{n})) if and only if it satisfies

    \[ \min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}V^{n*}(x)+c_{n}(x,\zeta)\right]\,=\,\operatorname*{Tr}\bigl(a_{n}(x)\nabla^{2}V^{n*}(x)\bigr)+b_{n}(x,v_{n}^{*}(x))\cdot\nabla V^{n*}(x)+c_{n}(x,v_{n}^{*}(x))\,,\quad\text{a.e.}\,\,x\in{\mathds{R}^{d}}\,. \tag{4.50} \]
  • (iii)

    for any vn𝔘𝗌𝗆v_{n}^{*}\in\mathfrak{U}_{\mathsf{sm}} satisfying Eq. 4.50, we have

    Vn(x)=limr0𝔼xvn[0τ˘r(cn(Xt,vn(Xt))n(cn))dt]for allxd.V^{n*}(x)\,=\,\lim_{r\downarrow 0}\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c_{n}(X_{t},v_{n}^{*}(X_{t}))-{\mathscr{E}}^{n*}(c_{n})\right)\mathrm{d}t\right]\quad\text{for all}\,\,\,x\in{\mathds{R}^{d}}\,. (4.51)

From [ABG-book, Lemma 3.7.8], it is easy to see that the functions $V^{n*},V^{*}$ are bounded from below. Next we show that under Assumption (A7), as $n\to\infty$, the optimal value ${\mathscr{E}}^{n*}(c_{n})$ of the approximated model converges to the optimal value ${\mathscr{E}}^{*}(c)$ of the true model.

Theorem 4.8.

Suppose that Assumptions (A1)-(A5) and (A7) hold. Then, it follows that

limnn(cn)=(c).\lim_{n\to\infty}{\mathscr{E}}^{n*}(c_{n})={\mathscr{E}}^{*}(c)\,. (4.52)
Proof.

Since $\|c_{n}\|_{\infty}\leq M$, we get ${\mathscr{E}}^{n*}(c_{n})\leq M$. Also, Eq. 4.44 implies that every $v\in\mathfrak{U}_{\mathsf{sm}}$ is stable and $\inf_{v\in\mathfrak{U}_{\mathsf{sm}}}\eta_{v}({\mathscr{B}}_{R})>0$ for any $R>0$ (see [ABG-book, Lemma 3.3.4] and [ABG-book, Lemma 3.2.4(b)]). Thus, from [ABG-book, Theorem 3.7.6], there exist constants $\widehat{C}_{1},\widehat{C}_{2}>0$, depending only on the radius $R>0$, such that for all $\alpha>0$ we have

Vαn()Vαn(0)𝒲2,p(R)C^1andsupRαVαnC^2.\|V_{\alpha}^{n}(\cdot)-V_{\alpha}^{n}(0)\|_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\widehat{C}_{1}\quad\text{and}\,\,\,\sup_{{\mathscr{B}}_{R}}\alpha V_{\alpha}^{n}\leq\widehat{C}_{2}\,. (4.53)

By standard vanishing discount argument (see [ABG-book, Lemma 3.7.8]) as α0\alpha\to 0 we have Vαn()Vαn(0)VnV_{\alpha}^{n}(\cdot)-V_{\alpha}^{n}(0)\to V^{n*} and αVαn(0)ρn\alpha V_{\alpha}^{n}(0)\to\rho_{n} . Hence the estimates Eq. 4.53 give us Vn𝒲2,p(R)C^1\|V^{n*}\|_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\widehat{C}_{1} . Since the constant C^1\widehat{C}_{1} is independent of nn, by standard diagonalization argument and the Banach-Alaoglu theorem, we can extract a subsequence {Vnk}\{V^{n_{k}*}\} such that for some V^𝒲loc2,p(d)\widehat{V}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}}) (as in Eq. 3.8)

{VnkV^in𝒲loc2,p(d)(weakly)VnkV^in𝒞loc1,β(d)(strongly).\begin{cases}V^{n_{k}*}\to&\widehat{V}^{*}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V^{n_{k}*}\to&\widehat{V}^{*}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} (4.54)

Also, since $\rho_{n}\leq M$, along a further subsequence (denoted by the same sequence without loss of generality) we have $\rho_{n_{k}}\to\widehat{\rho}$ as $k\to\infty$. Now, multiplying both sides of Eq. 4.49 by test functions $\phi$, we obtain

dTr(ank(x)2Vnk(x))ϕ(x)dx+dminζ𝕌{bnk(x,ζ)Vnk(x)+cnk(x,ζ)}ϕ(x)dx=dρnkϕ(x)dx.\int_{{\mathds{R}^{d}}}\operatorname*{Tr}\bigl{(}a_{n_{k}}(x)\nabla^{2}V^{n_{k}*}(x)\bigr{)}\phi(x)\mathrm{d}x+\int_{{\mathds{R}^{d}}}\min_{\zeta\in\mathbb{U}}\{b_{n_{k}}(x,\zeta)\cdot\nabla V^{n_{k}*}(x)+c_{n_{k}}(x,\zeta)\}\phi(x)\mathrm{d}x=\int_{{\mathds{R}^{d}}}\rho_{n_{k}}\phi(x)\mathrm{d}x\,.

As in Theorem 3.4, using Eq. 4.54 and letting kk\to\infty it follows that V^𝒲loc2,p(d)\widehat{V}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}}) satisfies

ρ^=minζ𝕌[ζV^(x)+c(x,ζ)].\widehat{\rho}=\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\widehat{V}^{*}(x)+c(x,\zeta)\right]\,. (4.55)

Rewriting the equation Eq. 4.55, we have

Tr(a(x)2V^(x))=f(x),a.e.xd,\operatorname*{Tr}\bigl{(}a(x)\nabla^{2}\widehat{V}^{*}(x)\bigr{)}=f(x)\,,\quad\text{a.e.}\,\,x\in{\mathds{R}^{d}}\,,

where

f(x)=infζ𝕌[b(x,ζ)V^(x)+c(x,ζ)ρ^].f(x)=-\inf_{\zeta\in\mathbb{U}}\left[b(x,\zeta)\cdot\nabla\widehat{V}^{*}(x)+c(x,\zeta)-\widehat{\rho}\right]\,.

In view of Eq. 4.54 and assumptions (A1) and (A2), it is easy to see that f𝒞loc0,β(d)f\in{\mathcal{C}}^{0,\beta}_{loc}({\mathds{R}^{d}}) where 0<β<1dp0<\beta<1-\frac{d}{p} . Thus, by elliptic regularity [CL89, Theorem 3] (also see, [GilTru, Theorem 9.19]), we obtain V^𝒞2(d)\widehat{V}^{*}\in{\mathcal{C}}^{2}({\mathds{R}^{d}}) .

Next we want to show that $\widehat{V}^{*}\in{\mathfrak{o}}{({\mathcal{V}})}$. Since $\sup_{n}\|c_{n}\|_{\infty}\leq M$, we have $1+\tilde{c}\in{\mathfrak{o}}{(h)}$, where $\tilde{c}\,:=\,\sup_{n}c_{n}$. Also, since $h$ is inf-compact, for some $\epsilon>0$ and a large enough $r>0$ we have $\widehat{C}_{0}-\inf_{\zeta\in\mathbb{U}}h(x,\zeta)\leq-\epsilon$ for all $x\in{\mathscr{B}}_{r}^{c}$. Let $X_{t}^{n}$ be the solution of Eq. 2.15 corresponding to $v\in\mathfrak{U}_{\mathsf{sm}}$. Hence, in view of Eq. 4.45, by the Itô-Krylov formula, for any $v\in\mathfrak{U}_{\mathsf{sm}}$ and $x\in{\mathscr{B}}_{r}^{c}\cap{\mathscr{B}}_{R}$ we deduce that

𝔼xv[𝒱(Xτ˘rnτRnn)]𝒱(x)=𝔼xv[0τ˘rnτRnvn𝒱(Xsn)ds]ϵ𝔼xv[τ˘rnτRn],\operatorname{\mathbb{E}}_{x}^{v}\left[{\mathcal{V}}(X_{{\breve{\uptau}}_{r}^{n}\wedge\uptau_{R}^{n}}^{n})\right]-{\mathcal{V}}(x)=\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}^{n}\wedge\uptau_{R}^{n}}{\mathscr{L}}_{v}^{n}{\mathcal{V}}(X_{s}^{n})\mathrm{d}s\right]\leq-\epsilon\operatorname{\mathbb{E}}_{x}^{v}\left[{\breve{\uptau}}_{r}^{n}\wedge\uptau_{R}^{n}\right]\,,

where τ˘rn:=inf{t0:Xtnr}{\breve{\uptau}}_{r}^{n}:=\inf\{t\geq 0:X_{t}^{n}\in{\mathscr{B}}_{r}\} and τRn:=inf{t0:XtnRc}\uptau_{R}^{n}:=\inf\{t\geq 0:X_{t}^{n}\in{\mathscr{B}}_{R}^{c}\} . Letting RR\to\infty, by Fatou’s lemma we obtain

𝔼xv[τ˘rn]1ϵ𝒱(x)for allxrcandn.\operatorname{\mathbb{E}}_{x}^{v}\left[{\breve{\uptau}}_{r}^{n}\right]\leq\frac{1}{\epsilon}{\mathcal{V}}(x)\quad\text{for all}\,\,\,x\in{\mathscr{B}}_{r}^{c}\,\,\,\text{and}\,\,\,n\in\mathds{N}\,. (4.56)
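The passage from the preceding display to Eq. 4.56 can be spelled out as follows. This is a sketch assuming ${\mathcal{V}}\geq 0$ (an inf-compact continuous function is bounded below, so this can be arranged by adding a constant; the normalization is an assumption made here) and that the exit times from ${\mathscr{B}}_{R}$ tend to infinity almost surely as $R\to\infty$ (non-explosion). The second line is where Fatou's lemma enters.

```latex
% Requires amsmath, amssymb.  Assumes V >= 0 and tau_R^n -> infinity a.s. as R -> infinity.
\begin{align*}
\epsilon\,\mathbb{E}_{x}^{v}\bigl[\breve{\tau}_{r}^{n}\wedge\tau_{R}^{n}\bigr]
  &\le \mathcal{V}(x) - \mathbb{E}_{x}^{v}\bigl[\mathcal{V}\bigl(X^{n}_{\breve{\tau}_{r}^{n}\wedge\tau_{R}^{n}}\bigr)\bigr]
   \le \mathcal{V}(x), \\
\mathbb{E}_{x}^{v}\bigl[\breve{\tau}_{r}^{n}\bigr]
  &\le \liminf_{R\to\infty}\mathbb{E}_{x}^{v}\bigl[\breve{\tau}_{r}^{n}\wedge\tau_{R}^{n}\bigr]
   \le \frac{1}{\epsilon}\,\mathcal{V}(x).
\end{align*}
```

Since neither $\epsilon$ nor ${\mathcal{V}}$ depends on $n$ or $v$, the bound is uniform over the approximating models and over the stationary Markov controls.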

Again, by the Itô-Krylov formula, for any $v\in\mathfrak{U}_{\mathsf{sm}}$ and $x\in{\mathscr{B}}_{r}^{c}\cap{\mathscr{B}}_{R}$ we have

𝔼xv[𝒱(Xτ˘rnτRnn)]𝒱(x)=𝔼xv[0τ˘rnτRnvn𝒱(Xsn)ds]𝔼xv[0τ˘rnτRn(C^0h(Xsn,v(Xsn)))ds],\operatorname{\mathbb{E}}_{x}^{v}\left[{\mathcal{V}}(X_{{\breve{\uptau}}_{r}^{n}\wedge\uptau_{R}^{n}}^{n})\right]-{\mathcal{V}}(x)=\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}^{n}\wedge\uptau_{R}^{n}}{\mathscr{L}}_{v}^{n}{\mathcal{V}}(X_{s}^{n})\mathrm{d}s\right]\leq\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}^{n}\wedge\uptau_{R}^{n}}(\widehat{C}_{0}-h(X_{s}^{n},v(X_{s}^{n})))\mathrm{d}s\right]\,,

Thus, by Fatou’s lemma letting RR\to\infty and using Eq. 4.56 we get

supnsupv𝔘𝗌𝗆𝔼xv[0τ˘rnh(Xsn,v(Xsn))ds]M^1𝒱(x),\sup_{n\in\mathds{N}}\sup_{v\in\mathfrak{U}_{\mathsf{sm}}}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}^{n}}h(X_{s}^{n},v(X_{s}^{n}))\mathrm{d}s\right]\leq\widehat{M}_{1}{\mathcal{V}}(x)\,,

for some positive constant M^1\widehat{M}_{1} . Hence, by arguing as in the proof of [ABG-book, Lemma 3.7.2 (i)], we have

supnsupv𝔘𝗌𝗆𝔼xv[0τ˘rn(1+c~(Xsn,v(Xsn)))ds]𝔬(𝒱).\sup_{n\in\mathds{N}}\sup_{v\in\mathfrak{U}_{\mathsf{sm}}}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}^{n}}(1+\tilde{c}(X_{s}^{n},v(X_{s}^{n})))\mathrm{d}s\right]\in{\mathfrak{o}}{({\mathcal{V}})}\,. (4.57)

Now, following the proof of [ABG-book, Lemma 3.7.8] (see, eq.(3.7.47)), it follows that

\[ V^{n*}(x)\,\leq\,\sup_{v\in\mathfrak{U}_{\mathsf{sm}}}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}^{n}}\left(c_{n}(X_{t}^{n},v(X_{t}^{n}))-{\mathscr{E}}^{n*}(c_{n})\right)\mathrm{d}t+V^{n*}(X^{n}_{{\breve{\uptau}}_{r}^{n}})\right]\,. \tag{4.58} \]

We know that for pd+1p\geq d+1 the space 𝒲2,p(R){\mathscr{W}}^{2,p}({\mathscr{B}}_{R}) is compactly embedded in 𝒞1,β(¯R){\mathcal{C}}^{1,\beta}(\bar{{\mathscr{B}}}_{R}) where 0<β<1dp0<\beta<1-\frac{d}{p} . Since Vn𝒲2,p(R)C^1\|V^{n*}\|_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\widehat{C}_{1} for some positive constant C^1\widehat{C}_{1} which depends only on RR, we deduce that supnsupr|Vn|M^2\sup_{n\in\mathds{N}}\sup_{{\mathscr{B}}_{r}}|V^{n*}|\leq\widehat{M}_{2}, where M^2(>0)\widehat{M}_{2}(>0) is a constant. Also, since n(cn)cnM{\mathscr{E}}^{n*}(c_{n})\leq\|c_{n}\|_{\infty}\leq M from Eq. 4.58, it is easy to see that

|Vn(x)|Msupnsupv𝔘𝗌𝗆𝔼xv[0τ˘rn(c~(Xtn,v(Xtn))+1)dt+supnsupr|Vn|].|V^{n*}(x)|\,\leq\,M\sup_{n\in\mathds{N}}\sup_{v\in\mathfrak{U}_{\mathsf{sm}}}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}^{n}}\left(\tilde{c}(X_{t}^{n},v(X_{t}^{n}))+1\right)\mathrm{d}t+\sup_{n\in\mathds{N}}\sup_{{\mathscr{B}}_{r}}|V^{n*}|\right]\,. (4.59)

Therefore, by combining Eq. 4.54, Eq. 4.57 and Eq. 4.59, we obtain $\widehat{V}^{*}\in{\mathfrak{o}}{({\mathcal{V}})}$. Since the pair $(\widehat{V}^{*},\widehat{\rho})\in{\mathcal{C}}^{2}({\mathds{R}^{d}})\cap{\mathfrak{o}}({\mathcal{V}})\times\mathds{R}$, with $\widehat{V}^{*}(0)=0$, satisfies Eq. 4.46, by the uniqueness result of Theorem 4.6 we deduce that $(\widehat{V}^{*},\widehat{\rho})\equiv(V^{*},\rho)$. This completes the proof of the theorem. ∎
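To see the convergence in Eq. 4.52 in a case where everything is explicit, consider the following toy model. It is a sketch under assumptions that are not from the paper: $d=1$, a singleton action set (so there is nothing to optimize), $a\equiv a_{n}\equiv 1$, hypothetical drifts $b_{n}(x)=-\theta_{n}x$ and $b(x)=-\theta x$ with $\theta_{n}\to\theta>0$, and the bounded cost $c_{n}(x)=c(x)=1-\cos x$. With $\theta_{*}:=\inf_{n}\theta_{n}>0$, the pair ${\mathcal{V}}(x)=x^{2}$, $h(x)=2\theta_{*}x^{2}$ gives a Foster-Lyapunov bound of the form (A7) with $\widehat{C}_{0}=2$. The invariant law of the $n$-th model is Gaussian, and the identity $\mathbb{E}[\cos Z]=e^{-s^{2}/2}$ for $Z\sim N(0,s^{2})$ yields the ergodic value in closed form.

```latex
% Requires amsmath, amssymb, mathrsfs.
% Hypothetical scalar model: generator  L^n f = f'' - theta_n x f',  theta_n -> theta > 0.
\begin{align*}
\eta_{n}(\mathrm{d}x) &= \sqrt{\tfrac{\theta_{n}}{2\pi}}\;e^{-\theta_{n}x^{2}/2}\,\mathrm{d}x
  \qquad\text{(invariant law } N(0,1/\theta_{n})\text{)}, \\
\mathscr{E}^{n*}(c_{n}) &= \int_{\mathbb{R}}(1-\cos x)\,\eta_{n}(\mathrm{d}x)
  = 1 - e^{-1/(2\theta_{n})}
  \;\xrightarrow[n\to\infty]{}\; 1 - e^{-1/(2\theta)} = \mathscr{E}^{*}(c).
\end{align*}
```

Here the ergodic value depends continuously on the drift parameter, which is the behaviour that Theorem 4.8 asserts in general.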

The next theorem proves the existence of a unique solution to a certain equation in a suitable function space. This result will be very useful in establishing our robustness result.

Theorem 4.9.

Suppose that assumptions (A1)-(A4) and (A7)(i) hold. Then for each $v\in\mathfrak{U}_{\mathsf{sm}}$ there exists a unique solution pair $(V^{v},\rho^{v})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathfrak{o}}({\mathcal{V}})\times\mathds{R}$, for any $p>1$, satisfying

ρv=vVv(x)+c(x,v(x))withVv(0)=0.\rho^{v}={\mathscr{L}}_{v}V^{v}(x)+c(x,v(x))\quad\text{with}\quad V^{v}(0)=0\,. (4.60)

Furthermore, we have

  • (i)

    ρv=x(c,v)\rho^{v}={\mathscr{E}}_{x}(c,v)

  • (ii)

    for all xdx\in{\mathds{R}^{d}}, we have

    Vv(x)=limr0𝔼xv[0τ˘r(c(Xt,v(Xt))x(c,v))dt].V^{v}(x)\,=\,\lim_{r\downarrow 0}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-{\mathscr{E}}_{x}(c,v)\right)\mathrm{d}t\right]\,. (4.61)
Proof.

Existence of a solution pair $(V^{v},\rho^{v})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathfrak{o}}({\mathcal{V}})\times\mathds{R}$, for any $p>1$, satisfying (i) and (ii) follows from [ABG-book, Lemma 3.7.8]. Now we prove the uniqueness of the solution of Eq. 4.60. Let $(\bar{V}^{v},\bar{\rho}^{v})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathfrak{o}}({\mathcal{V}})\times\mathds{R}$, for any $p>1$, be any other solution pair of Eq. 4.60 with $\bar{V}^{v}(0)=0$. By the Itô-Krylov formula, for $R>0$ we obtain

𝔼xv[V¯v(XTτR)]V¯v(x)\displaystyle\operatorname{\mathbb{E}}_{x}^{v}\left[\bar{V}^{v}(X_{T\wedge\uptau_{R}})\right]-\bar{V}^{v}(x) =𝔼xv[0TτRvV¯v(Xs)ds]\displaystyle=\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T\wedge\uptau_{R}}{\mathscr{L}}_{v}\bar{V}^{v}(X_{s})\mathrm{d}s\right]
=𝔼xv[0TτR(ρ¯vc(Xs,v(Xs)))ds].\displaystyle=\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T\wedge\uptau_{R}}\left(\bar{\rho}^{v}-c(X_{s},v(X_{s}))\right)\mathrm{d}s\right]\,. (4.62)

Note that

0TτR(ρ¯vc(Xs,v(Xs)))ds=0TτRρ¯vds0TτRc(Xs,v(Xs))ds\int_{0}^{T\wedge\uptau_{R}}\left(\bar{\rho}^{v}-c(X_{s},v(X_{s}))\right)\mathrm{d}s=\int_{0}^{T\wedge\uptau_{R}}\bar{\rho}^{v}\mathrm{d}s-\int_{0}^{T\wedge\uptau_{R}}c(X_{s},v(X_{s}))\mathrm{d}s

Thus, letting RR\to\infty by monotone convergence theorem, we get

limR𝔼xv[0TτR(ρ¯vc(Xs,v(Xs)))ds]=𝔼xv[0T(ρ¯vc(Xs,v(Xs)))ds].\lim_{R\to\infty}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T\wedge\uptau_{R}}\left(\bar{\rho}^{v}-c(X_{s},v(X_{s}))\right)\mathrm{d}s\right]=\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T}\left(\bar{\rho}^{v}-c(X_{s},v(X_{s}))\right)\mathrm{d}s\right]\,.

Since V¯v𝔬(𝒱)\bar{V}^{v}\in{\mathfrak{o}}{({\mathcal{V}})}, in view of [ABG-book, Lemma 3.7.2 (ii)], letting RR\to\infty, we deduce that

𝔼xv[V¯v(XT)]V¯v(x)=𝔼xv[0T(ρ¯vc(Xs,v(Xs)))ds].\displaystyle\operatorname{\mathbb{E}}_{x}^{v}\left[\bar{V}^{v}(X_{T})\right]-\bar{V}^{v}(x)=\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T}\left(\bar{\rho}^{v}-c(X_{s},v(X_{s}))\right)\mathrm{d}s\right]\,. (4.63)

Also, from [ABG-book, Lemma 3.7.2 (ii)], we have

limT𝔼xv[V¯v(XT)]T=0.\lim_{T\to\infty}\frac{\operatorname{\mathbb{E}}_{x}^{v}\left[\bar{V}^{v}(X_{T})\right]}{T}=0\,.

Now, dividing both sides of Eq. 4.63 by TT and letting TT\to\infty, we obtain

ρ¯v=lim supT1T𝔼xv[0T(c(Xs,v(Xs)))ds].\displaystyle\bar{\rho}^{v}=\limsup_{T\to\infty}\frac{1}{T}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{T}\left(c(X_{s},v(X_{s}))\right)\mathrm{d}s\right]\,.
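In more detail, this step only uses Eq. 4.63 and the limit stated just before it; the display below is a restatement, not an additional assumption. The left-hand side of the first line tends to $0$ as $T\to\infty$, so the time averages of the cost actually converge, and their limit (hence also their $\limsup$) is $\bar{\rho}^{v}$.

```latex
% Requires amsmath, amssymb.  Eq. 4.63 divided by T, then T -> infinity.
\begin{align*}
\frac{\mathbb{E}_{x}^{v}\bigl[\bar{V}^{v}(X_{T})\bigr]-\bar{V}^{v}(x)}{T}
  &= \bar{\rho}^{v} - \frac{1}{T}\,\mathbb{E}_{x}^{v}\left[\int_{0}^{T}c(X_{s},v(X_{s}))\,\mathrm{d}s\right], \\
\text{so that}\qquad
\lim_{T\to\infty}\frac{1}{T}\,\mathbb{E}_{x}^{v}\left[\int_{0}^{T}c(X_{s},v(X_{s}))\,\mathrm{d}s\right]
  &= \bar{\rho}^{v}.
\end{align*}
```

Assuming, as is standard, that the ergodic cost ${\mathscr{E}}_{x}(c,v)$ is defined as this long-run average, item (i) of Theorem 4.9 identifies the same quantity with $\rho^{v}$.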

This implies that $\bar{\rho}^{v}=\rho^{v}$. Using Eq. 4.60, by the Itô-Krylov formula we have

V¯v(x)=𝔼xv[0τ˘rτR(c(Xt,v(Xt))ρ¯v)dt+V¯v(Xτ˘rτR)].\displaystyle\bar{V}^{v}(x)\,=\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}\wedge\uptau_{R}}\left(c(X_{t},v(X_{t}))-\bar{\rho}^{v}\right)\mathrm{d}{t}+\bar{V}^{v}\left(X_{{\breve{\uptau}}_{r}\wedge\uptau_{R}}\right)\right]\,. (4.64)

Also, by the Itô-Krylov formula and using Eq. 4.44, it follows that

𝔼xv[𝒱(XτR)𝟙{τ˘rτR}]C^0𝔼xv[τ˘r]+𝒱(x)for allr<|x|<R.\operatorname{\mathbb{E}}_{x}^{v}\left[{\mathcal{V}}\left(X_{\uptau_{R}}\right)\mathds{1}_{\{{\breve{\uptau}}_{r}\geq\uptau_{R}\}}\right]\leq\widehat{C}_{0}\operatorname{\mathbb{E}}_{x}^{v}\left[{\breve{\uptau}}_{r}\right]+{\mathcal{V}}(x)\quad\text{for all}\,\,\,r<|x|<R\,.

Since $\bar{V}^{v}\in{\mathfrak{o}}({\mathcal{V}})$, from the above estimate, we get

lim infR𝔼xv[V¯v(XτR)𝟙{τ˘rτR}]=0.\liminf_{R\to\infty}\operatorname{\mathbb{E}}_{x}^{v}\left[\bar{V}^{v}\left(X_{\uptau_{R}}\right)\mathds{1}_{\{{\breve{\uptau}}_{r}\geq\uptau_{R}\}}\right]=0\,.

Thus, letting RR\to\infty by Fatou’s lemma from Eq. 4.64, it follows that

V¯v(x)\displaystyle\bar{V}^{v}(x) 𝔼xv[0τ˘r(c(Xt,v(Xt))ρ¯v)dt+V¯v(Xτ˘r)]\displaystyle\,\geq\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-\bar{\rho}^{v}\right)\mathrm{d}{t}+\bar{V}^{v}\left(X_{{\breve{\uptau}}_{r}}\right)\right]
𝔼xv[0τ˘r(c(Xt,v(Xt))ρ¯v)dt]+infrV¯v.\displaystyle\,\geq\,\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-\bar{\rho}^{v}\right)\mathrm{d}{t}\right]+\inf_{{\mathscr{B}}_{r}}\bar{V}^{v}\,.

Since V¯v(0)=0\bar{V}^{v}(0)=0, letting r0r\to 0, we deduce that

V¯v(x)lim supr0𝔼xv[0τ˘r(c(Xt,v(Xt))ρ¯v)dt].\displaystyle\bar{V}^{v}(x)\,\geq\,\limsup_{r\downarrow 0}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))-\bar{\rho}^{v}\right)\mathrm{d}{t}\right]\,. (4.65)

Since $\rho^{v}=\bar{\rho}^{v}$, from Eq. 4.61 and Eq. 4.65 it is easy to see that $V^{v}-\bar{V}^{v}\leq 0$ in ${\mathds{R}^{d}}$. Also, since $(V^{v},\rho^{v})$ and $(\bar{V}^{v},\bar{\rho}^{v})$ are two solution pairs of Eq. 4.60, we have ${\mathscr{L}}_{v}\left(V^{v}-\bar{V}^{v}\right)(x)=0$ in ${\mathds{R}^{d}}$. Moreover, $V^{v}(0)-\bar{V}^{v}(0)=0$, so the nonpositive function $V^{v}-\bar{V}^{v}$ attains its maximum at the origin. Hence, by the strong maximum principle [GilTru, Theorem 9.6], one has $V^{v}=\bar{V}^{v}$. This proves the uniqueness. ∎

Now we are ready to prove the robustness result, i.e., we want to show that x(c,vn)(c){\mathscr{E}}_{x}(c,v_{n}^{*})\to{\mathscr{E}}^{*}(c) as nn\to\infty, where vnv_{n}^{*} is an optimal ergodic control of the approximated model (see, Theorem 4.7) .

Theorem 4.10.

Suppose that Assumptions (A1) - (A5) and (A7) hold. Then, we have

limninfxdx(c,vn)=(c).\lim_{n\to\infty}\inf_{x\in{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,v_{n}^{*})={\mathscr{E}}^{*}(c)\,. (4.66)
Proof.

We shall follow a similar proof program as that of Theorem 3.4, under the discounted setup. From Theorem 4.9, we know that for each nn\in\mathds{N} there exists a unique pair (Vvn,ρvn)𝒲loc2,p(d)𝔬(𝒱)×(V^{v_{n}^{*}},\rho^{v_{n}^{*}})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\cap{\mathfrak{o}}{({\mathcal{V}})}\times\mathds{R},   1<p<1<p<\infty, with Vvn(0)=0V^{v_{n}^{*}}(0)=0 satisfying

ρvn=[vnVvn(x)+c(x,vn(x))]\rho^{v_{n}^{*}}=\left[{\mathscr{L}}_{v_{n}^{*}}V^{v_{n}^{*}}(x)+c(x,{v_{n}^{*}}(x))\right] (4.67)

In view of Eq. 4.44, it is easy to see that, each v𝔘𝗌𝗆v\in\mathfrak{U}_{\mathsf{sm}} is stable and infv𝔘𝗌𝗆ηv(R)>0\inf_{v\in\mathfrak{U}_{\mathsf{sm}}}\eta_{v}({\mathscr{B}}_{R})>0 for any R>0R>0 (see, [ABG-book, Lemma 3.3.4] and [ABG-book, Lemma 3.2.4(b)]). Thus, from [ABG-book, Theorem 3.7.4], it follows that Vvn𝒲2,p(R)κ^1\lVert V^{v_{n}^{*}}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\hat{\kappa}_{1} where κ^1\hat{\kappa}_{1} is a constant independent of nn\in\mathds{N} . Therefore by the Banach-Alaoglu theorem and standard diagonalization argument (as in Eq. 3.8), we deduce that there exists V~𝒲loc2,p(d)\tilde{V}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}}) such that along a sub-sequence

{VvnkV~in𝒲loc2,p(d)(weakly)VvnkV~in𝒞loc1,β(d)(strongly).\begin{cases}V^{v_{n_{k}}^{*}}\to&\tilde{V}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V^{v_{n_{k}}^{*}}\to&\tilde{V}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} (4.68)

Again, since ρvnM\rho^{v_{n}^{*}}\leq M, along a further sub-sequence (without loss of generality denoting by same sequence), we have ρvnkρ~\rho^{v_{n_{k}}^{*}}\to\tilde{\rho} as kk\to\infty . Since 𝔘𝗌𝗆\mathfrak{U}_{\mathsf{sm}} is compact along a further subsequence (without loss of generality denoting by same sequence) we have vnkv~v_{n_{k}}^{*}\to\tilde{v}^{*} as kk\to\infty . Now, as in Theorem 3.4, multiplying by test function and letting kk\to\infty, from Eq. 4.67, it is easy to see that (V~,ρ~)𝒲loc2,p(d)×(\tilde{V},\tilde{\rho})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R},   1<p<1<p<\infty, satisfies

ρ~=[v~V~(x)+c(x,v~(x))]\tilde{\rho}=\left[{\mathscr{L}}_{\tilde{v}^{*}}\tilde{V}(x)+c(x,{\tilde{v}^{*}}(x))\right] (4.69)

As we know that Vvnk(0)=0V^{v_{n_{k}}^{*}}(0)=0 for all kk\in\mathds{N}, we deduce that V~(0)=0\tilde{V}(0)=0 . Arguing as in Theorem 4.8 and using the estimate Vvn𝒲2,p(R)κ^1\lVert V^{v_{n}^{*}}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\hat{\kappa}_{1}, we have

|V~(x)|Msupv𝔘𝗌𝗆𝔼xv[0τ˘r(c(Xt,v(Xt))+1)dt+supnsupr|Vvn|]𝔬(𝒱).|\tilde{V}(x)|\,\leq\,M\sup_{v\in\mathfrak{U}_{\mathsf{sm}}}\operatorname{\mathbb{E}}_{x}^{v}\left[\int_{0}^{{\breve{\uptau}}_{r}}\left(c(X_{t},v(X_{t}))+1\right)\mathrm{d}t+\sup_{n\in\mathds{N}}\sup_{{\mathscr{B}}_{r}}|V^{v_{n}^{*}}|\right]\in{\mathfrak{o}}{({\mathcal{V}})}\,. (4.70)

Thus, by uniqueness of solution of Eq. 4.69 (see, Theorem 4.9), we deduce that (V~,ρ~)(Vv~,ρv~)(\tilde{V},\tilde{\rho})\equiv(V^{\tilde{v}^{*}},\rho^{\tilde{v}^{*}}) .

By the triangle inequality

|ρvnkρ||ρvnkρnk|+|ρnkρ|.|\rho^{v_{n_{k}}^{*}}-\rho|\leq|\rho^{v_{n_{k}}^{*}}-\rho_{n_{k}}|+|\rho_{n_{k}}-\rho|\,.

From Theorem 4.8 we have $|\rho_{n_{k}}-\rho|\to 0$ as $k\to\infty$. Hence, to complete the proof, we have to show that $|\rho^{v_{n_{k}}^{*}}-\rho_{n_{k}}|\to 0$ as $k\to\infty$. Now, for any minimizing selector $v_{n_{k}}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ of Eq. 4.49, we have

ρnk=[vnknkVnk(x)+cnk(x,vnk(x))].\rho_{n_{k}}=\left[{\mathscr{L}}_{v_{n_{k}}^{*}}^{n_{k}}V^{n_{k}}(x)+c_{n_{k}}(x,v_{n_{k}}^{*}(x))\right]\,. (4.71)

In view of the estimate Eq. 4.53, we obtain

Vnk𝒲2,p(R)κ^\lVert V^{n_{k}}\rVert_{{\mathscr{W}}^{2,p}({\mathscr{B}}_{R})}\leq\hat{\kappa} (4.72)

where κ^>0\hat{\kappa}>0 is a constant independent of kk\in\mathds{N} . Hence, by the Banach-Alaoglu theorem and standard diagonalization argument (see Eq. 3.8), we have there exists V~𝒲loc2,p(d)\tilde{V}^{*}\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}}) such that along a sub-sequence

{VnkV~in𝒲loc2,p(d)(weakly)VnkV~in𝒞loc1,β(d)(strongly).\begin{cases}V^{n_{k}}\to&\tilde{V}^{*}\quad\text{in}\quad{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\quad\text{(weakly)}\\ V^{n_{k}}\to&\tilde{V}^{*}\quad\text{in}\quad{\mathcal{C}}^{1,\beta}_{loc}({\mathds{R}^{d}})\quad\text{(strongly)}\,.\end{cases} (4.73)

Since $\rho_{n_{k}}\leq M$, along a further subsequence (denoted by the same sequence without loss of generality) we have $\rho_{n_{k}}\to\tilde{\rho}^{*}$. As we know that $v_{n_{k}}^{*}\to\tilde{v}^{*}$ in $\mathfrak{U}_{\mathsf{sm}}$, multiplying both sides of Eq. 4.71 by test functions and letting $k\to\infty$, it follows that $(\tilde{V}^{*},\tilde{\rho}^{*})\in{\mathscr{W}}_{\text{loc}}^{2,p}({\mathds{R}^{d}})\times\mathds{R}$, $1<p<\infty$, satisfies

ρ~=[v~V~(x)+c(x,v~(x))].\tilde{\rho}^{*}=\left[{\mathscr{L}}_{\tilde{v}^{*}}\tilde{V}^{*}(x)+c(x,\tilde{v}^{*}(x))\right]\,. (4.74)

Arguing as in Theorem 4.8, one can show that $\tilde{V}^{*}\in{\mathfrak{o}}{({\mathcal{V}})}$. Hence, by uniqueness of the solution of Eq. 4.74 (see Theorem 4.9), we deduce that $(\tilde{V}^{*},\tilde{\rho}^{*})\equiv(V^{\tilde{v}^{*}},\rho^{\tilde{v}^{*}})$. Since both $\rho^{v_{n_{k}}^{*}}$ and $\rho_{n_{k}}$ converge to the same limit $\rho^{\tilde{v}^{*}}$, it follows that $|\rho^{v_{n_{k}}^{*}}-\rho_{n_{k}}|\to 0$ as $k\to\infty$. This completes the proof of the theorem. ∎

5. Finite Horizon Cost

In this section we study the robustness problem under a finite horizon criterion . We will assume that a,an,b,bn,c,cna,a_{n},b,b_{n},c,c_{n} satisfy the following:

  • (FN1)

    The functions a,an,b,bn,c,cna,a_{n},b,b_{n},c,c_{n} satisfy

    sup(x,ζ)d×𝕌[|b(x,ζ)|+a(x)+idaxi(x)+|c(x,ζ)|]K.\sup_{(x,\zeta)\in{\mathds{R}^{d}}\times\mathbb{U}}\left[\lvert b(x,\zeta)\rvert+\lVert a(x)\rVert+\sum_{i}^{d}\lVert\frac{\partial{a}}{\partial x_{i}}(x)\rVert+\lvert c(x,\zeta)\rvert\right]\,\leq\,\mathrm{K}\,.

    and

    supnsup(x,ζ)d×𝕌[|bn(x,ζ)|+an(x)+idanxi(x)+|cn(x,ζ)|]K.\sup_{n\in\mathds{N}}\sup_{(x,\zeta)\in{\mathds{R}^{d}}\times\mathbb{U}}\left[\lvert b_{n}(x,\zeta)\rvert+\lVert a_{n}(x)\rVert+\sum_{i}^{d}\lVert\frac{\partial{a_{n}}}{\partial x_{i}}(x)\rVert+\lvert c_{n}(x,\zeta)\rvert\right]\,\leq\,\mathrm{K}\,.

    for some positive constant K\mathrm{K} . Furthermore, H𝒲2,p,μ(d)L(d)H\in{\mathscr{W}}^{2,p,\mu}({\mathds{R}^{d}})\cap{L}^{\infty}({\mathds{R}^{d}}) ,   p2p\geq 2 .

From [BL84-book, Theorem 3.3, p. 235], the finite horizon optimality equation (or, the HJB equation)

ψt+infζ𝕌[ζψ+c(x,ζ)]=0\displaystyle\frac{\partial\psi}{\partial t}+\inf_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\psi+c(x,\zeta)\right]=0 (5.1)
ψ(T,x)=H(x)\displaystyle\psi(T,x)=H(x) (5.2)

admits a unique solution $\psi\in{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})\cap{L}^{\infty}((0,T)\times{\mathds{R}^{d}})$, for some $p\geq 2$ and $\mu>0$. Now, by the Itô-Krylov formula (as in [HP09-book, Theorem 3.5.2]), there exists an optimal Markov policy, i.e., there exists $v^{*}\in\mathfrak{U}_{\mathsf{m}}$ such that ${\mathcal{J}}_{T}^{v^{*}}(x,c)={\mathcal{J}}_{T}^{*}(x,c)=\psi(0,x)$.

Similarly, for each nn\in\mathds{N} (for the approximating models) the optimality equation

ψnt+infζ𝕌[ζnψn+cn(x,ζ)]=0\displaystyle\frac{\partial\psi_{n}}{\partial t}+\inf_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}\psi_{n}+c_{n}(x,\zeta)\right]=0 (5.3)
ψn(T,x)=H(x)\displaystyle\psi_{n}(T,x)=H(x) (5.4)

admits a unique solution ψn𝒲1,2,p,μ((0,T)×d)L((0,T)×d)\psi_{n}\in{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})\cap{L}^{\infty}((0,T)\times{\mathds{R}^{d}}) ,   p2p\geq 2 . Moreover, by the Itô-Krylov formula (as in [HP09-book, Theorem 3.5.2]), there exists vn𝔘𝗆v_{n}^{*}\in\mathfrak{U}_{\mathsf{m}} such that 𝒥T,nvn(x,cn)=𝒥T,n(x,cn)=ψn(0,x){\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})={\mathcal{J}}_{T,n}^{*}(x,c_{n})=\psi_{n}(0,x) .

The following theorem shows that, as the approximating model approaches the true model, the optimal value of the approximating model converges to the optimal value of the true model.

Theorem 5.1.

Suppose Assumptions (A1), (A3) and (FN1) hold. Then

limn𝒥T,n(x,cn)=𝒥T(x,c).\lim_{n\to\infty}{\mathcal{J}}_{T,n}^{*}(x,c_{n})={\mathcal{J}}_{T}^{*}(x,c)\,.
Proof.

For any minimizing selector vnv_{n}^{*} of Eq. 5.3, we have

ψnt+vnnψn+cn(x,vn(t,x))=0\displaystyle\frac{\partial\psi_{n}}{\partial t}+{\mathscr{L}}_{v_{n}^{*}}^{n}\psi_{n}+c_{n}(x,v_{n}^{*}(t,x))=0 (5.5)
ψn(T,x)=H(x)\displaystyle\psi_{n}(T,x)=H(x) (5.6)

By the Itô-Krylov formula, it follows that

\[ \psi_{n}(t,x)=\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[\int_{t}^{T}c_{n}(X_{s}^{n},v_{n}^{*}(s,X_{s}^{n}))\mathrm{d}{s}+H(X_{T}^{n})\right] \tag{5.7} \]

This implies that

ψnTcn+H.\lVert\psi_{n}\rVert_{\infty}\leq T\lVert c_{n}\rVert_{\infty}+\lVert H\rVert_{\infty}\,. (5.8)
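For completeness, the bound in Eq. 5.8 follows from the stochastic representation in Eq. 5.7 by taking absolute values inside the expectation. The display below is a sketch of that step, writing $X^{n}$ for the state process of the $n$-th model under $v_{n}^{*}$; the last inequality uses the uniform bound $|c_{n}|\leq\mathrm{K}$ from (FN1), so the right-hand side does not depend on $n$.

```latex
% Requires amsmath, amssymb.  Uses only Eq. 5.7 and the bound |c_n| <= K from (FN1).
\begin{align*}
|\psi_{n}(t,x)|
  &\le \mathbb{E}_{x}^{v_{n}^{*}}\left[\int_{t}^{T}\bigl|c_{n}(X_{s}^{n},v_{n}^{*}(s,X_{s}^{n}))\bigr|\,\mathrm{d}s
       + \bigl|H(X_{T}^{n})\bigr|\right] \\
  &\le (T-t)\,\lVert c_{n}\rVert_{\infty} + \lVert H\rVert_{\infty}
   \le T\,\mathrm{K} + \lVert H\rVert_{\infty}.
\end{align*}
```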

Rewriting Eq. 5.5, it follows that

ψnt+vnnψn+λ0ψn=λ0ψncn(x,vn(t,x))\displaystyle\frac{\partial\psi_{n}}{\partial t}+{\mathscr{L}}_{v_{n}^{*}}^{n}\psi_{n}+\lambda_{0}\psi_{n}=\lambda_{0}\psi_{n}-c_{n}(x,v_{n}^{*}(t,x))
ψn(T,x)=H(x),\displaystyle\psi_{n}(T,x)=H(x)\,,

for some fixed $\lambda_{0}>0$. Thus, by the parabolic PDE estimate [BL84-book, eq. (3.8), p. 234], we deduce that

ψn𝒲1,2,p,μκ^1λ0ψncn(x,vn(t,x))Lp,μ.\lVert\psi_{n}\rVert_{{\mathscr{W}}^{1,2,p,\mu}}\leq\hat{\kappa}_{1}\lVert\lambda_{0}\psi_{n}-c_{n}(x,v_{n}^{*}(t,x))\rVert_{{L}^{p,\mu}}\,. (5.9)

Thus, from Eq. 5.8 and Eq. 5.9, we obtain $\lVert\psi_{n}\rVert_{{\mathscr{W}}^{1,2,p,\mu}}\leq\hat{\kappa}_{2}$ for some positive constant $\hat{\kappa}_{2}$ (independent of $n$). Since ${\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})$ is a reflexive Banach space, as a corollary of the Banach-Alaoglu theorem, there exists $\bar{\psi}\in{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})$ such that, along a subsequence (without loss of generality denoted by the same sequence),

\[
\begin{cases}
\psi_{n}\to\bar{\psi} & \text{in}\quad{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})\quad\text{(weakly)}\,,\\
\psi_{n}\to\bar{\psi} & \text{in}\quad{\mathscr{W}}^{0,1,p,\mu}((0,T)\times{\mathds{R}^{d}})\quad\text{(strongly)}\,.
\end{cases} \tag{5.10}
\]
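
One possible way to trace the uniform bound $\hat{\kappa}_{2}$ back to Eq. 5.8 and Eq. 5.9 is sketched below. It rests on assumptions that are not spelled out in this section: that $\sup_{m}\lVert c_{m}\rVert_{\infty}<\infty$, that $\hat{\kappa}_{1}$ does not depend on $n$, and that bounded functions belong to ${L}^{p,\mu}((0,T)\times{\mathds{R}^{d}})$, so that the constant function $\mathbf{1}$ has finite ${L}^{p,\mu}$ norm. The symbol $\mathbf{1}$ is notation introduced only for this sketch.

```latex
\begin{align*}
\lVert\psi_{n}\rVert_{{\mathscr{W}}^{1,2,p,\mu}}
&\leq\hat{\kappa}_{1}\lVert\lambda_{0}\psi_{n}-c_{n}(x,v_{n}^{*}(t,x))\rVert_{{L}^{p,\mu}}
&&\text{(by Eq. 5.9)}\\
&\leq\hat{\kappa}_{1}\bigl(\lambda_{0}\lVert\psi_{n}\rVert_{\infty}+\lVert c_{n}\rVert_{\infty}\bigr)\lVert\mathbf{1}\rVert_{{L}^{p,\mu}}\\
&\leq\hat{\kappa}_{1}\Bigl((\lambda_{0}T+1)\sup_{m}\lVert c_{m}\rVert_{\infty}+\lambda_{0}\lVert H\rVert_{\infty}\Bigr)\lVert\mathbf{1}\rVert_{{L}^{p,\mu}}
&&\text{(by Eq. 5.8)}\,.
\end{align*}
```

Under these assumptions the right-hand side does not depend on $n$, so it can play the role of $\hat{\kappa}_{2}$.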

Now, as in our earlier analysis for the different cost criteria considered, multiplying both sides of Eq. 5.3 by a test function $\phi\in{\mathcal{C}}_{c}^{\infty}((0,T)\times{\mathds{R}^{d}})$ and integrating, we get

\[
\int_{0}^{T}\int_{{\mathds{R}^{d}}}\frac{\partial\psi_{n}}{\partial t}\phi(t,x)\,\mathrm{d}t\,\mathrm{d}x+\int_{0}^{T}\int_{{\mathds{R}^{d}}}\inf_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}\psi_{n}+c_{n}(x,\zeta)\right]\phi(t,x)\,\mathrm{d}t\,\mathrm{d}x=0\,. \tag{5.11}
\]

Thus, in view of Eq. 5.10, letting $n\to\infty$ in Eq. 5.11, it follows that (arguing as in Section 3, Eq. 3.11)

\[
\int_{0}^{T}\int_{{\mathds{R}^{d}}}\frac{\partial\bar{\psi}}{\partial t}\phi(t,x)\,\mathrm{d}t\,\mathrm{d}x+\int_{0}^{T}\int_{{\mathds{R}^{d}}}\inf_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\bar{\psi}+c(x,\zeta)\right]\phi(t,x)\,\mathrm{d}t\,\mathrm{d}x=0\,.
\]

Since $\phi\in{\mathcal{C}}_{c}^{\infty}((0,T)\times{\mathds{R}^{d}})$ is arbitrary, from the above equation we deduce that $\bar{\psi}\in{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})$ satisfies

\begin{align*}
\frac{\partial\bar{\psi}}{\partial t}+\inf_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\bar{\psi}+c(x,\zeta)\right]&=0\,,\\
\bar{\psi}(T,x)&=H(x)\,. \tag{5.12}
\end{align*}

Since $\psi$ is the unique solution of the finite horizon HJB equation of the true model (that is, of Eq. 5.12), we deduce that $\bar{\psi}(0,x)=\psi(0,x)={\mathcal{J}}_{T}^{*}(x,c)$. This completes the proof. ∎

In the following theorem, we prove the robustness result for the finite horizon cost criterion.

Theorem 5.2.

Suppose Assumptions (A1), (A3) and (FN1) hold. Then for any optimal control $v_{n}^{*}$ of the approximating models we have

\[
\lim_{n\to\infty}{\mathcal{J}}_{T}^{v_{n}^{*}}(x,c)={\mathcal{J}}_{T}^{*}(x,c)\,.
\]
Proof.

By the triangle inequality we have

\[
|{\mathcal{J}}_{T}^{v_{n}^{*}}(x,c)-{\mathcal{J}}_{T}^{*}(x,c)|\leq|{\mathcal{J}}_{T}^{v_{n}^{*}}(x,c)-{\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})|+|{\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})-{\mathcal{J}}_{T}^{*}(x,c)|\,.
\]

Since $v_{n}^{*}$ is optimal for the approximating model, ${\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})={\mathcal{J}}_{T,n}^{*}(x,c_{n})$, and hence Theorem 5.1 gives $|{\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})-{\mathcal{J}}_{T}^{*}(x,c)|\to 0$ as $n\to\infty$. Next, we show that $|{\mathcal{J}}_{T}^{v_{n}^{*}}(x,c)-{\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})|\to 0$ as $n\to\infty$.

Since the space $\mathfrak{U}_{\mathsf{m}}$ is compact (with the topology defined as in [YukselPradhan, Definition 2.2]), along a subsequence $v_{n}^{*}\to\bar{v}$. From [BL84-book, Theorem 3.3, p. 235], for each $n\in\mathds{N}$ there exists a unique solution $\bar{\psi}_{n}\in{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})\cap{L}^{\infty}((0,T)\times{\mathds{R}^{d}})$, $p\geq 2$, to the following Poisson equation

\begin{align*}
\frac{\partial\bar{\psi}_{n}}{\partial t}+\left[{\mathscr{L}}_{v_{n}^{*}}\bar{\psi}_{n}+c(x,v_{n}^{*}(t,x))\right]&=0\,,\\
\bar{\psi}_{n}(T,x)&=H(x)\,. \tag{5.13}
\end{align*}

By the Itô-Krylov formula, from Eq. 5.13 it follows that

\[
\bar{\psi}_{n}(t,x)=\operatorname{\mathbb{E}}_{x}^{v_{n}^{*}}\left[\int_{t}^{T}c(X_{s},v_{n}^{*}(s,X_{s}))\,\mathrm{d}{s}+H(X_{T})\right]. \tag{5.14}
\]

This gives us

\[
\lVert\bar{\psi}_{n}\rVert_{\infty}\leq T\lVert c\rVert_{\infty}+\lVert H\rVert_{\infty}\,. \tag{5.15}
\]

Arguing as in Theorem 5.1, letting $n\to\infty$ in Eq. 5.13, we deduce that there exists $\hat{\psi}\in{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})\cap{L}^{\infty}((0,T)\times{\mathds{R}^{d}})$, $p\geq 2$, satisfying

\begin{align*}
\frac{\partial\hat{\psi}}{\partial t}+\left[{\mathscr{L}}_{\bar{v}}\hat{\psi}+c(x,\bar{v}(t,x))\right]&=0\,,\\
\hat{\psi}(T,x)&=H(x)\,. \tag{5.16}
\end{align*}

Now using Eq. 5.16, by the Itô-Krylov formula we deduce that

\[
\hat{\psi}(t,x)=\operatorname{\mathbb{E}}_{x}^{\bar{v}}\left[\int_{t}^{T}c(X_{s},\bar{v}(s,X_{s}))\,\mathrm{d}{s}+H(X_{T})\right]. \tag{5.17}
\]

Moreover, we have

\begin{align}
\frac{\partial\psi_{n}}{\partial t}+{\mathscr{L}}_{v_{n}^{*}}^{n}\psi_{n}+c_{n}(x,v_{n}^{*}(t,x))&=0\,, \tag{5.18}\\
\psi_{n}(T,x)&=H(x)\,. \tag{5.19}
\end{align}

Letting $n\to\infty$, as in Theorem 5.1, there exists $\tilde{\psi}\in{\mathscr{W}}^{1,2,p,\mu}((0,T)\times{\mathds{R}^{d}})\cap{L}^{\infty}((0,T)\times{\mathds{R}^{d}})$, $p\geq 2$, satisfying

\begin{align*}
\frac{\partial\tilde{\psi}}{\partial t}+\left[{\mathscr{L}}_{\bar{v}}\tilde{\psi}+c(x,\bar{v}(t,x))\right]&=0\,,\\
\tilde{\psi}(T,x)&=H(x)\,. \tag{5.20}
\end{align*}

By the Itô-Krylov formula, from Eq. 5.20, we obtain

\[
\tilde{\psi}(t,x)=\operatorname{\mathbb{E}}_{x}^{\bar{v}}\left[\int_{t}^{T}c(X_{s},\bar{v}(s,X_{s}))\,\mathrm{d}{s}+H(X_{T})\right]. \tag{5.21}
\]

From Eq. 5.17 and Eq. 5.21, we have $\hat{\psi}=\tilde{\psi}$. Hence ${\mathcal{J}}_{T}^{v_{n}^{*}}(x,c)=\bar{\psi}_{n}(0,x)$ and ${\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})=\psi_{n}(0,x)$ converge to the same limit. This completes the proof. ∎
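
The final step of the proof can be written out as the following chain. This is a sketch under the assumption, used implicitly in the proof, that the convergence of $\bar{\psi}_{n}$ to $\hat{\psi}$ and of $\psi_{n}$ to $\tilde{\psi}$ holds pointwise at $(0,x)$ along the chosen subsequence. Since Eq. 5.17 and Eq. 5.21 give the same stochastic representation, the middle term below vanishes:

```latex
\begin{align*}
|{\mathcal{J}}_{T}^{v_{n}^{*}}(x,c)-{\mathcal{J}}_{T,n}^{v_{n}^{*}}(x,c_{n})|
&=|\bar{\psi}_{n}(0,x)-\psi_{n}(0,x)|\\
&\leq|\bar{\psi}_{n}(0,x)-\hat{\psi}(0,x)|+|\hat{\psi}(0,x)-\tilde{\psi}(0,x)|+|\tilde{\psi}(0,x)-\psi_{n}(0,x)|\\
&=|\bar{\psi}_{n}(0,x)-\hat{\psi}(0,x)|+|\tilde{\psi}(0,x)-\psi_{n}(0,x)|
\;\longrightarrow\;0\quad\text{as }n\to\infty\,.
\end{align*}
```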

6. Control up to an Exit Time

Before we conclude the paper, let us briefly consider optimal control up to an exit time, with the cost given as follows:

  • (in true model:) for each $U\in\mathfrak{U}$ the associated cost is given as

    \[
    \hat{{\mathcal{J}}}_{e}^{U}(x)\,:=\,\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\tau(O)}e^{-\int_{0}^{t}\delta(X_{s},U_{s})\mathrm{d}s}c(X_{t},U_{t})\,\mathrm{d}t+e^{-\int_{0}^{\tau(O)}\delta(X_{s},U_{s})\mathrm{d}s}h(X_{\tau(O)})\right],\quad x\in{\mathds{R}^{d}}\,,
    \]
  • (in approximated models:) for each $n\in\mathds{N}$ and $U\in\mathfrak{U}$ the associated cost is given as

    \[
    \hat{{\mathcal{J}}}_{e,n}^{U}(x)\,:=\,\operatorname{\mathbb{E}}_{x}^{U}\left[\int_{0}^{\tau(O)}e^{-\int_{0}^{t}\delta(X_{s},U_{s})\mathrm{d}s}c_{n}(X_{t},U_{t})\,\mathrm{d}t+e^{-\int_{0}^{\tau(O)}\delta(X_{s},U_{s})\mathrm{d}s}h(X_{\tau(O)})\right],\quad x\in{\mathds{R}^{d}}\,,
    \]

where $O\subset{\mathds{R}^{d}}$ is a smooth bounded domain, $\tau(O)\,:=\,\inf\{t\geq 0:X_{t}\notin O\}$, $\delta(\cdot,\cdot):\bar{O}\times\mathbb{U}\to[0,\infty)$ is the discount function and $h:\bar{O}\to\mathds{R}_{+}$ is the terminal cost function. In the true model the optimal value is defined as $\hat{{\mathcal{J}}}_{e}^{*}(x)=\inf_{U\in\mathfrak{U}}\hat{{\mathcal{J}}}_{e}^{U}(x)$, and in the approximated model the optimal value is defined as $\hat{{\mathcal{J}}}_{e,n}^{*}(x)=\inf_{U\in\mathfrak{U}}\hat{{\mathcal{J}}}_{e,n}^{U}(x)$. We assume that $\delta\in{\mathcal{C}}(\bar{O}\times\mathbb{U})$ and $h\in{\mathcal{C}}(\bar{O})$. As in [RZ21], [B05Survey, p.229], the analysis leads to the following HJB equation:

\[
\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}\phi(x)-\delta(x,\zeta)\phi(x)+c(x,\zeta)\right]=0\,,\quad\text{for all }\,x\in O\,,\quad\text{with}\quad\phi=h\,\,\,\text{on}\,\,\,\partial{O}\,.
\]
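
For comparison, the HJB equation of the $n$-th approximating model is expected to take the same form, with the cost $c_{n}$ in place of $c$ and the generator of the approximating diffusion in place of ${\mathscr{L}}_{\zeta}$. The display below is a sketch: it assumes that ${\mathscr{L}}_{\zeta}^{n}$ denotes the controlled generator of the $n$-th model, as in the earlier sections, and $\phi_{n}$ is a symbol introduced here for the corresponding solution.

```latex
\begin{equation*}
\min_{\zeta\in\mathbb{U}}\left[{\mathscr{L}}_{\zeta}^{n}\phi_{n}(x)-\delta(x,\zeta)\phi_{n}(x)+c_{n}(x,\zeta)\right]=0
\quad\text{for all }x\in O\,,
\qquad\text{with}\quad\phi_{n}=h\ \text{ on }\ \partial O\,.
\end{equation*}
```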

By an argument similar to [ABG-book, Theorem 3.5.3] and [ABG-book, Theorem 3.5.6], $\hat{{\mathcal{J}}}_{e}^{*}$ and $\hat{{\mathcal{J}}}_{e,n}^{*}$ are the unique solutions to their respective HJB equations. Existence follows from the Leray-Schauder fixed point theorem as in [ABG-book, Theorem 3.5.3], and uniqueness follows from the Itô-Krylov formula as in [ABG-book, Theorem 3.5.6]. Using standard elliptic PDE estimates (on the bounded domain $O$) and closely mimicking the arguments of Theorem 3.3, we have the following continuity result.

Theorem 6.1.

Suppose Assumptions (A1)-(A5) hold. Then

\[
\lim_{n\to\infty}\hat{{\mathcal{J}}}_{e,n}^{*}(x)=\hat{{\mathcal{J}}}_{e}^{*}(x)\quad\text{for all}\,\,x\in\bar{O}\,.
\]

For each $n\in\mathds{N}$, suppose that $\hat{v}_{e,n}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ and $\hat{v}_{e}^{*}\in\mathfrak{U}_{\mathsf{sm}}$ are optimal controls of the approximated model and the true model, respectively. Then, in view of the above continuity result, following the steps of the proof of Theorem 3.4, we obtain the following robustness result.

Theorem 6.2.

Suppose Assumptions (A1)-(A5) hold. Then

\[
\lim_{n\to\infty}\hat{{\mathcal{J}}}_{e}^{\hat{v}_{e,n}^{*}}(x)=\hat{{\mathcal{J}}}_{e}^{\hat{v}_{e}^{*}}(x)\quad\text{for all}\,\,x\in\bar{O}\,.
\]
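
The robustness statement can be organized around the same decomposition that appears in the proof of Theorem 5.2. The following is a sketch, not the written proof (which defers to Theorem 3.4). It uses that $\hat{v}_{e,n}^{*}$ is optimal for the $n$-th model, so that $\hat{{\mathcal{J}}}_{e,n}^{\hat{v}_{e,n}^{*}}=\hat{{\mathcal{J}}}_{e,n}^{*}$, and that $\hat{{\mathcal{J}}}_{e}^{\hat{v}_{e}^{*}}=\hat{{\mathcal{J}}}_{e}^{*}$:

```latex
\begin{equation*}
|\hat{{\mathcal{J}}}_{e}^{\hat{v}_{e,n}^{*}}(x)-\hat{{\mathcal{J}}}_{e}^{*}(x)|
\leq|\hat{{\mathcal{J}}}_{e}^{\hat{v}_{e,n}^{*}}(x)-\hat{{\mathcal{J}}}_{e,n}^{\hat{v}_{e,n}^{*}}(x)|
+|\hat{{\mathcal{J}}}_{e,n}^{*}(x)-\hat{{\mathcal{J}}}_{e}^{*}(x)|\,.
\end{equation*}
```

The second term tends to zero by Theorem 6.1. The first term compares the cost of one fixed policy under the two models, and it is this term that the argument of Theorem 3.4 is invoked to control.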

7. Revisiting Example 2.1

Consider Example 2.1(i).

  • For discounted cost: Let $\hat{v}_{n}^{*}$ be a discounted cost optimal control when the system is governed by LABEL:{ERS1.1} (existence of such a control is ensured by Theorem 3.1). Then following Theorem 3.4, we have that

    \[
    \lim_{n\to\infty}{\mathcal{J}}_{\alpha}^{\hat{v}_{n}^{*}}(x,c)={\mathcal{J}}_{\alpha}^{v^{*}}(x,c)\quad\text{for all}\,\,x\in{\mathds{R}^{d}}\,. \tag{7.1}
    \]
  • For ergodic cost: Let $\hat{v}_{n}^{*}$ be an ergodic optimal control when the system is governed by LABEL:{ERS1.1} (existence is guaranteed by Theorem 4.2 and Theorem 4.7). Then arguing as in Theorem 4.5 (for the near-monotone case) and Theorem 4.10 (for the stable case), it follows that

    \[
    \lim_{n\to\infty}\inf_{x\in{\mathds{R}^{d}}}{\mathscr{E}}_{x}(c,\hat{v}_{n}^{*})={\mathscr{E}}^{*}(c)\,. \tag{7.2}
    \]
  • For finite horizon cost: For each $n\in\mathds{N}$, let $\hat{v}_{n}^{*}$ be a finite horizon optimal control when the system is governed by LABEL:{ERS1.1}. Then in view of Theorem 5.2, we have

    \[
    \lim_{n\to\infty}{\mathcal{J}}_{T}^{\hat{v}_{n}^{*}}(x,c)={\mathcal{J}}_{T}^{*}(x,c)\quad\text{for all}\,\,x\in{\mathds{R}^{d}}\,. \tag{7.3}
    \]
  • For cost up to an exit time: For each $n\in\mathds{N}$, let $\hat{v}_{e,n}^{*}$ be an optimal control when the system is governed by LABEL:{ERS1.1}. Then Theorem 6.2 ensures that

    \[
    \lim_{n\to\infty}\hat{{\mathcal{J}}}_{e}^{\hat{v}_{e,n}^{*}}(x)=\hat{{\mathcal{J}}}_{e}^{\hat{v}_{e}^{*}}(x)\quad\text{for all}\,\,x\in\bar{O}\,. \tag{7.4}
    \]

8. Conclusion

In this paper, we studied continuity of optimal costs and robustness/stability of optimal control policies designed for incorrect models and applied to an actual model, for discounted and ergodic cost criteria as well as for finite horizon and exit time criteria. In our analysis we have crucially used the fact that our actual model is a non-degenerate diffusion model. It would be an interesting problem to investigate whether such results can be proved when the limiting system (actual system) is a degenerate diffusion system. Also, in our analysis we have assumed that the system noise is given by a Wiener process; it would be interesting to study other noise processes, e.g., when the system noise is a wide-bandwidth process or a more general discontinuous martingale noise (as in [K90], [KR87], [KR87a], [KR88]). In the latter case the controlled process may become non-Markovian even under stationary Markov policies. It is therefore reasonable to look for a suitable Markovian approximation that maintains the necessary properties of the original system. The analysis of robustness problems in this setting is a direction of research worth pursuing.

References