This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Convex operator-theoretic methods in stochastic control

Boris Houska
(ShanghaiTech University)
Abstract

This paper is about operator-theoretic methods for solving nonlinear stochastic optimal control problems to global optimality. These methods leverage on the convex duality between optimally controlled diffusion processes and Hamilton-Jacobi-Bellman (HJB) equations for nonlinear systems in an ergodic Hilbert-Sobolev space. In detail, a generalized Bakry-Emery condition is introduced under which one can establish the global exponential stabilizability of a large class of nonlinear systems. It is shown that this condition is sufficient to ensure the existence of solutions of the ergodic HJB for stochastic optimal control problems on infinite time horizons. Moreover, a novel dynamic programming recursion for bounded linear operators is introduced, which can be used to numerically solve HJB equations by a Galerkin projection.

1 Introduction

Bellman’s principle of optimality [12] and its dual counterpart, the Chapman-Kolmogorov equation [20, 30] for controlled diffusion processes [33], are the fundament of stochastic control theory. They describe the semigroup structure of the evolution of optimally operated stochastic processes. Their associated partial differential equations (PDEs) for continuous-time models are the well-known HJB equation [23] and the controlled Fokker-Planck-Kolmogorov (FPK) equation [15].


While parabolic FPKs and HJBs are related to stochastic optimal control problems on finite time horizons, their elliptic counterparts model the evolution of optimally operated diffusion processes on an infinite horizon. One distinguishes between ergodic optimal control and infinite horizon optimal control.

  • Ergodic optimal control is about the analysis of the limit behavior of optimally operated dynamic systems as well as the long-term average control performance. For instance, in the deterministic case, a control system might have optimal steady states or optimal periodic orbits, but it could as well be that it is optimal to stay on a homoclinic orbit or even to allow a chaotic limit behavior. Similarly, the optimal ergodic distribution of nonlinear stochastic control systems often has a highly complex multi-modal structure.

  • Infinite horizon optimal control is about optimizing the system’s transient behavior. Here, the hope is that this transient eventually converges to an optimal ergodic limit. For general nonlinear stochastic systems, however, such a convergence analysis often turns out to be difficult. This is because, in practice, one is often interested in optimizing the system’s economic performance, which leads to indefinite stage cost functions. Unfortunately, it is often not even clear whether the integral over such indefinite costs has an expected value on an infinite time horizon.


The goal of the present paper is to address the above challenges in ergodic- and infinite horizon optimal control by building upon recent results in elliptic diffusion operator theory and functional analysis, reviewed below.

1.1 Literature Review

The systematic mathematical study of stochastic systems goes back to the work of Kolmogorov [30], although related continuous-time electro-dynamic, molecular, quantum-theoretic, and thermal diffusions have been studied even earlier by Fokker [25], von Smoluchowski [45], Planck [37], and Chapman [20]. The relevance of such diffusions in the context of analyzing the ergodic behavior of dynamical systems has been observed by Has’minskiǐ [27] in 1960, who found an elegant way of using Harnack’s inequality in combination with a weak Lyapunov function in order to analyze the existence of steady states of FPK equations. And, in fact, it was only six years later that Wonham realized the great potential of Has’minskiǐ’s work in the context of ergodic control theory [46]. By now, these historical developments are well-understood and have been extended by many authors, as reviewed in great detail in [15] focusing on FPK equations and in [33] focusing on controlled diffusion processes.


In contrast to linear FPKs, HJBs are nonlinear PDEs, which have been introduced by Bellman [12]. They model the backward-in-time evolution of the value function of either deterministic or stochastic optimal control problems. Here, the state of the HJB equation can—at least formally—be interpreted as the co-state of the FPK-constraint of certain optimal diffusion control problems. As such, HJBs and FPKs are closely related via the concept of duality in convex optimization, as originally pointed out in [13] and [24]. HJBs for optimal control of deterministic systems are typically analyzed by introducing an artificial stochastic perturbation and then passing to the so-called viscosity limit by making this perturbation infinitesimally small. The corresponding viscosity methods go back to Crandall, Ishii and Lions [22]. They can be considered as the main tool for analyzing parabolic HJBs on finite time horizons. The textbooks [9] and [23] offer a relatively complete overview of the associated literature on HJBs between 1957 and 1997.


Regarding the existence and properties of solutions to elliptic HJBs for ergodic and infinite-horizon optimal control, however, surprisingly little is known. In [9] a whole chapter is dedicated to infinite horizon optimal control problems with exponentially discounted stage costs. The analysis methods in this chapter are, however, inapplicable to problems with a general indefinite stage cost. In the context of ergodic optimal control, this problem has partly been resolved by Arisawa [1, 2], who has proposed methods to analyze the limit of vanishing discounts, which, under certain assumptions, turns out to be a successful path towards analyzing ergodic HJBs. Moreover, Barles [10] as well as Namah and Roquejoffre [36] have suggested methods for analyzing the long term behavior of HJBs with periodic Hamiltonians. These methods have in common that they focus on special classes of stochastic systems under periodicity assumptions. And, finally, yet another option for analyzing infinite horizon optimal control problems is to work with turnpikes [5, 41], which can be used to analyze the long-term local behavior of systems near optimal steady-states or periodic orbits.


In contrast to ergodic HJBs, much more is known about the limit behavior of FPKs. This is not only due the above mentioned work by Has’minskiǐ, but also due to relatively recent breakthroughs in functional analysis that can be traced back to the seminal work of Bakry and Emery [6] on carre-du-champ operators, whose far-reaching consequences have only been understood to full extent in the last two decades. Based on earlier work by Gross on hypercontractivity and logarithmic Sobolev inequalities [26], Arnold, Markowich, Toscani, and Unterreiter [3] were among the first to come up with a comprehensive and general analysis of the convergence of FPKs to ergodic limit states by featuring Bakry-Emery conditions. Even more practical conditions for deriving logarithmic Sobolev- and general entropy-type inequalities for diffusion semigroups and FPKs have appeared in an article by Bolley and Gentil [16]. The most recent developments regarding the trend to equilibrium of FPKs in the degenerate case are summarized in [7, 8].


Last but not least, concerning recent developments in PDE-constrained optimization [28], optimal control of FPKs has received significant attention during the last five years [4, 17, 18]. These articles are, however, more concerned with the well-posedness of FPK-constrained optimization problems, typically, with least-squares or other objectives. These particular classes of FPK-constrained optimization problems are not directly related to HJBs. The only exception is the work of Bobkov, Gentil, and Ledoux [14], who are the first to attempt an analysis of Hamilton-Jacobi equations based on logarithmic Sobolev inequalities—even if these authors focus only on a very special class of HJBs that can be solved explicitly via the Hopf-Lax formula.

1.2 Contributions

This paper presents novel results about the existence and uniqueness of solutions to ergodic and infinite horizon stochastic optimal control problems. They are summarized, respectively, in Theorem 1 and Theorem 2.


In detail, Theorem 1 is about the existence of ergodic limits of optimally controlled nonlinear stochastic McKean-Vlasov systems on a general convex domain with reflecting boundary, assuming that a weak Has’minskiǐ-Lyapunov function is available. Its proof is based on an energy dissipation argument for FPKs with mixed boundary conditions that models the evolution of the probability density function of its underlying nonlinear stochastic control system. The theorem is applicable to a general class of ergodic optimal control problems with indefinite stage cost and potentially multi-modal optimal limit distributions.


Similarly, Theorem 2 is about the existence of weak solutions to ergodic HJBs with Neumann boundary condition in a suitably weighted Hilbert-Sobolev space and their relation to infinite horizon stochastic optimal control. It is based on a generalized Bakry-Emery condition for optimally controlled FPKs under which global exponential stabilizability can be established. This condition turns out to be sufficient for ensuring that the value function of the stochastic infinite horizon optimal control problem is a bounded linear operator, whose unique Riesz’ representation can be used to construct a weak solution of the ergodic HJB.


A particularly relevant aspect of Theorem 2 is that it succeeds in analyzing a class of nonlinear PDEs—in this case HJBs—by relying on convex functional analysis methods that have originally been designed for analyzing linear PDEs. This implies in particular that mature numerical methods for linear parabolic PDEs, such as finite element Galerkin methods, can be used to approximately solve finite- and infinite horizon stochastic optimal control problems via a discretized operator-theoretic dynamic programming recursion. Because such methods for discretizing linear parabolic PDEs are, in general, more accurate, easier to analyze, and easier to implement than methods for discretizing nonlinear PDEs, such as HJBs, operator-theoretic dynamic programming methods offer clear advantages over classical dynamic programming methods for optimal control.

1.3 Overview

In order to put the above contributions into context, this paper is structured as follows.

  • Section 2 reviews the relations between controlled McKean-Vlasov systems and FPKs in the presence of reflecting boundary conditions.

  • Section 3 explains how to use energy functionals to establish dissipativity results for optimally controlled FPKs, leading to the first main result on ergodic optimal control that is summarized in Theorem 1.

  • Section 4 proposes a novel approach to analyzing parabolic HJBs with Neumann boundary conditions in a suitably weighted Hilbert-Sobolev space.

  • Section 5 introduces Lyapunov diffusion operators in order to analyze the stabilizability of controlled diffusions via a generalized Bakry-Emery condition.

  • Section 6 introduces ergodic HJBs and presents the main result on infinite horizon optimal control that is summarized in Theorem 2.

  • Section 7 presents a finite element Galerkin projection based operator-theoretic dynamic programming method for solving parabolic and ergodic HJBs. A case study illustrates its performance for an infinite horizon stochastic optimal control problem with indefinite stage costs and non-trivial ergodic behavior.


And, finally, Section 8 concludes the paper.

1.4 Notation

Let Ω𝗇\Omega\subseteq\mathbb{R}^{\mathsf{n}} denote an open set, Ω¯=cl(Ω)\overline{\Omega}=\mathrm{cl}(\Omega) its closure and Ω=Ω¯Ω\partial\Omega=\overline{\Omega}\setminus\Omega its boundary. The sets

Llock(Ω),Lk(Ω),C0(Ω),Ck(Ω¯),andH1(Ω)L_{\mathrm{loc}}^{k}(\Omega),\ L^{k}(\Omega),\ C_{0}^{\infty}(\Omega),\ C^{k}\left(\overline{\Omega}\right),\ \text{and}\ \ H^{1}(\Omega)

denote—as usual—the set of functions that are, respectively, locally LkL^{k}-integrable, LkL^{k}-integrable, smooth with compact support, kk-times continuously differentiable with continuous extensions to the boundary, and L2L^{2}-integrable with L2L^{2}-integrable weak derivatives on Ω\Omega. The inner products

(φ,ψ)L2(Ω)\displaystyle(\varphi,\psi)_{L^{2}(\Omega)} =def\displaystyle\ \overset{\mathrm{def}}{=}\ Ωφψdx\displaystyle\int_{\Omega}\varphi\psi\,\mathrm{d}x
and(φ,ψ)H1(Ω)\displaystyle\text{and}\qquad(\varphi,\psi)_{H^{1}(\Omega)} =def\displaystyle\ \overset{\mathrm{def}}{=}\ Ω(φψ+φψ)dx\displaystyle\int_{\Omega}\left(\varphi\psi+\nabla\varphi^{\intercal}\nabla\psi\right)\,\mathrm{d}x

are defined such that L2(Ω)L^{2}(\Omega) and H1(Ω)H^{1}(\Omega) are Hilbert spaces. Here, \nabla denotes the gradient operator. Additionally, if ωLloc1(Ω)\omega\in L_{\mathrm{loc}}^{1}(\Omega) with ω>0\omega>0 is a strictly positive weight function, the shorthands

L2(ω)\displaystyle L^{2}(\omega) =def\displaystyle\ \overset{\mathrm{def}}{=}\ {φLloc1(Ω)|φωL2(Ω)}\displaystyle\left\{\ \varphi\in L_{\mathrm{loc}}^{1}(\Omega)\ \middle|\ \varphi\cdot\sqrt{\omega}\ \in\ L^{2}(\Omega)\ \right\}
andH1(ω)\displaystyle\text{and}\quad H^{1}(\omega) =def\displaystyle\ \overset{\mathrm{def}}{=}\ {φLloc1(Ω)|φωH1(Ω)}\displaystyle\left\{\ \varphi\in L_{\mathrm{loc}}^{1}(\Omega)\ \middle|\ \varphi\cdot\sqrt{\omega}\ \in\ H^{1}(\Omega)\ \right\}

are used. The associated inner products are then, of course, weighted with ω\omega, too. For given time intervals [0,T][0,T]\subseteq\mathbb{R} and any Banach space (X,X)(X,\|\cdot\|_{X}), we use the symbol Lk(0,T;X)L^{k}(0,T;X) to denote the set of strongly measurable functions φ:[0,T]X\varphi:[0,T]\to X with

φLk(0,T;X)=def(0Tφ(t)Xkdt)1/k<,\|\varphi\|_{L^{k}(0,T;X)}\ \overset{\mathrm{def}}{=}\ \left(\int_{0}^{T}\|\varphi(t)\|_{X}^{k}\,\mathrm{d}t\right)^{1/k}\ <\ \infty,

such that Lk(0,T;X)L^{k}(0,T;X) is a Bochner space. Weak derivatives with respect to time are indicated by a “dot”; and H1(ω)H^{1}(\omega)^{*} denotes the dual of H1(ω)H^{1}(\omega). The space

𝒫(ω)=def{φL2(0,T;H1(ω))φ˙L2(0,T;H1(ω))}\mathcal{P}(\omega)\ \overset{\mathrm{def}}{=}\ \{\ \varphi\in L^{2}(0,T;H^{1}(\omega))\mid\dot{\varphi}\in L^{2}(0,T;H^{1}(\omega)^{*})\ \}

is then again a Hilbert space with respect to its natural inner product [28, Thm. 1.31]. Finally, the natural duality pairing between a Hilbert space and its dual is denoted by ,\langle\cdot,\cdot\rangle assuming that it is clear from the context which Hilbert space is meant.

2 Stochastic Control Systems

This section is about the equivalence of closed-loop controlled stochastic nonlinear systems in a finite dimensional space and a special class of open-loop controlled deterministic linear systems in an infinite dimensional space, namely, controlled FPKs.

2.1 Nonlinear Control Systems

Deterministic control-affine systems have the form

x˙=f(x)+G(x)uon𝕋=def(0,T).\displaystyle\dot{x}=f(x)+G(x)u\qquad\text{on}\quad\mathbb{T}\ \overset{\mathrm{def}}{=}\ (0,T)\;. (1)

Here, 𝕋\mathbb{T}\subseteq\mathbb{R} denotes an open time horizon. It is finite if T<T<\infty and infinite if T=T=\infty. The right-hand functions fC1(Ω¯)nxf\in C^{1}(\overline{\Omega})^{n_{x}} and GC1(Ω¯)nuG\in C^{1}(\overline{\Omega})^{n_{u}} are assumed to be defined on the closure of an open domain Ωnx\Omega\subseteq\mathbb{R}^{n_{x}}. Moreover, x(t)x(t) denotes the state and u(t)u(t) the control at time t𝕋t\in\mathbb{T}.


The goal is to design a feedback μLloc1(0,T;Lloc1(Ω))\mu\in L_{\mathrm{loc}}^{1}(0,T;L_{\mathrm{loc}}^{1}(\Omega)) that can be used to control (1) in closed-loop mode,

x˙(t)=F[μ](t,x(t))fort𝕋,\displaystyle\dot{x}(t)=F[\mu](t,x(t))\quad\text{for}\ \ t\in\mathbb{T}, (2)

with F[μ](t,x)=deff(x)+G(x)μ(t,x)F[\mu](t,x)\ \overset{\mathrm{def}}{=}\ f(x)+G(x)\mu(t,x). Because traditional existence theorems for ordinary differential equations are inapplicable in the context of such general feedback functions μ\mu, solutions of (2) are defined in the viscosity sense, by introducing a small random process noise.

2.2 Stochastic Differential Equations

Let us temporarily assume that Ω=nx\Omega=\mathbb{R}^{n_{x}}. In this case, we can add a process noise to the right-hand of (2), arriving at a stochastic differential equation (SDE) of the form

dX=F[μ](t,X)dt+2ϵdWton𝕋,\displaystyle\mathrm{d}X=F[\mu](t,X)\,\mathrm{d}t+\sqrt{2\epsilon}\,\mathrm{d}W_{t}\qquad\text{on}\ \ \mathbb{T}, (3)

where WtW_{t}, t0t\geq 0, denotes a Wiener process, ϵ>0\epsilon>0 a diffusion constant, and x0nxx_{0}\in\mathbb{R}^{n_{x}} an initial state. The state of this SDE at time t𝕋t\in\mathbb{T} is denoted by X(t)X(t). It is a random variable, whose probability density function is denoted by ρ(t)L1(Ω)\rho(t)\in L^{1}(\Omega). This means that for any given Borel set SΩS\subseteq\Omega, the probability of the event X(t)SX(t)\in S is given by

Pr(X(t)S)=Sρ(t)dx.\mathrm{Pr}\left(\,X(t)\in S\,\right)\ =\ \int_{S}\rho(t)\,\mathrm{d}x\;.

Conditions under which (3) admits a martingale solution XX that has such an L1L^{1}-integrable probability density function ρ\rho are discussed below.

Remark 1

The assumption that ϵ\epsilon is a scalar constant is introduced for simplicity presentation. The results below can be generalized to the case that the constant scalar factor 2ϵ\sqrt{2\epsilon} in (3) is replaced by a potentially state-dependent invertible matrix.

2.3 Reflections at the Boundary

The states of many nonlinear systems have no physical interpretation outside of their natural domain Ω\Omega. For instance the concentration of the reactants in a controlled chemical process are usually required to be non-negative implying that Ωnx\Omega\neq\mathbb{R}^{n_{x}}. In such a case, the stochastic process noise model in (3) makes no sense, as the Wiener process WtW_{t} is unbounded. Instead, at least for the case that Ω\Omega is convex, a physically more meaningful stochastic model is given by

2ϵdWtdXF[μ](t,X)dt+NΩ¯(X)dton𝕋.\displaystyle\begin{array}[]{rcl}\sqrt{2\epsilon}\,\mathrm{d}W_{t}&\in&\mathrm{d}X-F[\mu](t,X)\mathrm{d}t+N_{\overline{\Omega}}(X)\,\mathrm{d}t\quad\text{on}\ \ \mathbb{T}.\end{array} (5)

This is a McKean-Vlasov SDE with reflection [21, 40], where the additional drift term,

NΩ¯(x)=def{znxyΩ¯,z(xy)0},N_{\overline{\Omega}}(x)\ \overset{\mathrm{def}}{=}\ \{z\in\mathbb{R}^{n_{x}}\mid\forall y\in\overline{\Omega},\ z^{\intercal}(x-y)\geq 0\},

acts as reflector. We have NΩ¯(x)={0}N_{\overline{\Omega}}(x)=\{0\} for xΩx\in\Omega, which means that (3) and (5) coincide on Ω\Omega. For xΩx\in\partial\Omega, however, NΩ¯(x)N_{\overline{\Omega}}(x) is equal to the normal cone of Ω¯\overline{\Omega} at xx. Consequently, whenever (5) admits a solution XX that has an L1L^{1}-integrable probability density ρ\rho, we expect that

Pr(X(t)Ω)=Ωρ(t)dx= 1,\displaystyle\mathrm{Pr}(\,X(t)\in\Omega\,)\ =\ \int_{\Omega}\rho(t)\,\mathrm{d}x\ =\ 1\;, (6)

which is the case under very mild assumptions that are, for instance, discussed in [21], but sufficient conditions under which this is the case are also discussed below. For the special case that Ω=nx\Omega=\mathbb{R}^{n_{x}}, (5) reduces to (3).

2.4 Fokker-Planck-Kolmogorov Equations

Let 𝒜\mathcal{A} denote the ϵ\epsilon-diffusion operator of ff and \mathcal{B} the linear transport operator of GG, formally defined by

𝒜ρ=div(fρ)+ϵΔρandν=div(Gν),\mathcal{A}\rho=-\mathrm{div}\left(f\rho\right)+\epsilon\,\Delta\rho\qquad\text{and}\qquad\mathcal{B}\nu=-\mathrm{div}(G\nu)\;,

where “div\mathrm{div}” denotes the divergence and “Δ\Delta” the Laplace operator. The pair (𝒜,)(\mathcal{A},\mathcal{B}) is associated with an infinite-dimensional linear control system, called the controlled FPK equation. In the presence of reflecting boundary conditions, it has the form

ρ˙=𝒜ρ+νonΩ𝕋ρ0=ρ(0)onΩ0=(ϵρfρGν)n|ΩonΣ𝕋.\displaystyle\begin{array}[]{rcll}\dot{\rho}&=&\mathcal{A}\rho+\mathcal{B}\nu&\text{on}\ \ \Omega_{\mathbb{T}}\\[4.55254pt] \rho_{0}&=&\rho(0)&\text{on}\ \ \Omega\\[4.55254pt] 0&=&\left.\left(\epsilon\nabla\rho-f\rho-G\nu\right)^{\intercal}n\right|_{\partial\Omega}&\text{on}\ \ \Sigma_{\mathbb{T}}.\\ \end{array} (10)

Here, ρ0L1(Ω)\rho_{0}\in L^{1}(\Omega) denotes a given initial probability distribution on Ω\Omega, while

Ω𝕋=def𝕋×ΩandΣ𝕋=def𝕋×Ω\Omega_{\mathbb{T}}\ \overset{\mathrm{def}}{=}\ \mathbb{T}\times\Omega\qquad\text{and}\qquad\Sigma_{\mathbb{T}}\ \overset{\mathrm{def}}{=}\ \mathbb{T}\times\partial\Omega

denote, respectively, the FPK’s open domain and its spatial boundary. Moreover, n:Ωnxn:\partial\Omega\to\mathbb{R}^{n_{x}} denotes an outer normal, such that

xΩ,n(x)NΩ¯(x)andn(x)2=1.\forall x\in\partial\Omega,\quad n(x)\in N_{\overline{\Omega}}(x)\quad\text{and}\quad\|n(x)\|_{2}=1\;.

The open-loop linear control system (10) is closely related to the closed-loop nonlinear control system (5) after identifying ρ\rho with the probability distribution of the random process XX and after substituting

ν=μρonΩ𝕋.\displaystyle\nu\ =\ \mu\rho\qquad\text{on}\ \ \Omega_{\mathbb{T}}\;. (11)

In order to elaborate on this relation between (5), (10) and (11), a couple of definitions are needed. For this aim, the shorthand

𝒰(ω)=def(L2(0,T;L2(ω)))nu\mathcal{U}(\omega)\ \overset{\mathrm{def}}{=}\ (L^{2}(0,T;L^{2}(\omega)))^{n_{u}}

is introduced, where ωLloc1(Ω)\omega\in L_{\mathrm{loc}}^{1}(\Omega), ω>0\omega>0, is a strictly positive function. Moreover, the linear operators 𝒜\mathcal{A} and \mathcal{B} are associated with the bilinear forms

a(ρ,V)\displaystyle a(\rho,V) =def\displaystyle\ \overset{\mathrm{def}}{=}\ ΩρfVϵρVdx\displaystyle\int_{\Omega}\rho f^{\intercal}\nabla V-\epsilon\nabla\rho^{\intercal}\nabla V\,\mathrm{d}x
andb(ν,V)\displaystyle\text{and}\qquad b(\nu,V) =def\displaystyle\ \overset{\mathrm{def}}{=}\ i=1nuνi,VGi,\displaystyle\sum_{i=1}^{n_{u}}\left\langle\nu_{i},\nabla V^{\intercal}G_{i}\right\rangle,

which are defined for all ρ𝒫(ω1)\rho\in\mathcal{P}(\omega^{-1}) and all V𝒫(ω)V\in\mathcal{P}(\omega).

Definition 1

Let ωL(Ω)\omega\in L^{\infty}(\Omega) with ω>0\omega>0 on Ω\Omega and ν𝒰(ω1)\nu\in\mathcal{U}(\omega^{-1}) be given. A function ρ𝒫(ω1)\rho\in\mathcal{P}(\omega^{-1}) with given initial value ρ(0)=ρ0L2(ω1)\rho(0)=\rho_{0}\in L^{2}(\omega^{-1}) is called a weak solution of (10) if

0Tρ˙,Vdt\displaystyle\int_{0}^{T}\left\langle\dot{\rho},V\right\rangle\,\mathrm{d}t =\displaystyle\ =\ 0Ta(ρ,V)+b(ν,V)dt\displaystyle\int_{0}^{T}a(\rho,V)+b(\nu,V)\,\mathrm{d}t (12)

for all test functions V𝒫(ω)V\in\mathcal{P}(\omega).


For the special case that Ω\Omega is convex and bounded and T<T<\infty, we may set the weight function to ω=1\omega=1. In this case, the existence of a unique weak solution of (10) for arbitrary ν𝒰(1)\nu\in\mathcal{U}(1) follows from a standard result about parabolic PDEs, see [28, Thm. 1.33], recalling that ff and GG are assumed to be continuously differentiable on Ω¯\overline{\Omega} and ϵ>0\epsilon>0. For unbounded domains Ω𝕋\Omega_{\mathbb{T}}, however, additional assumptions on ff, GG, and ω\omega are needed in order to ensure the existence of weak solutions, as discussed a bit further below, in Section 3.

2.5 Nonlinear Control via Linear Systems Theory

From a control theoretic perspective, it is interesting to point out that for the purpose of analyzing the finite-dimensional closed-loop controlled nonlinear system (f,G)(f,G), as defined by (5), it sufficient to analyze the infinite dimensional open-loop controlled linear system (𝒜,)(\mathcal{A},\mathcal{B}), as defined in (10). This is because the existence of a unique weak solution ρ𝒫(ω1)\rho\in\mathcal{P}(\omega^{-1}) of (10) for ωL(Ω)\omega\in L^{\infty}(\Omega) and an initial probability density function 0ρ0L2(ω1)L1(Ω)0\leq\rho_{0}\in L^{2}(\omega^{-1})\cap L^{1}(\Omega) with Ωρ0dx=1\int_{\Omega}\rho_{0}\mathrm{d}x=1 of the initial random variable X(0)X(0) is sufficient to ensure that (5) has a martingale solution XX with probability density function ρ\rho that satisfies (6). This is a standard result from the theory of diffusion processes [42], which has recently been extended for general FPKs with reflecting boundary conditions [7]; see also [21]. Of course, here we have to substitute (11), but this substitution is justified, too. Namely, it follows from the minimum principle for parabolic FPKs [15] that if ρ𝒫(ω1)\rho\in\mathcal{P}(\omega^{-1}), is a weak solution of (10) for an initial probability density ρ00\rho_{0}\geq 0, then

ρ>0onΩ𝕋,\displaystyle\rho>0\quad\text{on}\ \ \ \Omega_{\mathbb{T}}\;, (13)

recalling that Ω𝕋\Omega_{\mathbb{T}} is an open set, 0𝕋0\notin\mathbb{T}, and ϵ>0\epsilon>0. Thus, the closed-loop control law, μ=ν/ρ\mu=\nu/\rho, can be recovered from the infinite-dimensional open-loop input ν\nu and the strictly positive function ρ\rho.

Remark 2

There are many ways to reformulate finite dimensional nonlinear control systems as equivalent infinite dimensional linear control systems. This includes linear embedding methods based on occupation measures [34, 44] as well as lifting methods based on first order differential operators such as Liouville-Koopman or Perron-Frobenius operators [31, 43]. In contrast to these first order transport operators and their adjoints, the Kolmogorov forward operator 𝒜\mathcal{A} is, however, an elliptic second order operator. Such second order operators have many advantages regarding both their mathematical properties and their numerical discretization [28].

3 Ergodic Optimal Control

The idea to analyze optimal control of diffusion processes with reflection can be traced back to the work of Puterman [38], who originally introduced such optimal control problems on a finite time horizon 𝕋\mathbb{T} and on a bounded domain Ω\Omega. This section presents methods for analyzing such optimal control problems on potentially unbounded domains, with a focus on ergodic limits.

3.1 Problem Formulation

Let 𝕌nu\mathbb{U}\subseteq\mathbb{R}^{n_{u}} denote a control constraint set, let

𝔘=def{μLloc1(0,T;Lloc1(Ω))μ𝕌onΩ𝕋}\mathfrak{U}\ \overset{\mathrm{def}}{=}\ \{\mu\in L_{\mathrm{loc}}^{1}(0,T;L_{\mathrm{loc}}^{1}(\Omega))\mid\mu\in\mathbb{U}\ \ \text{on}\ \ \Omega_{\mathbb{T}}\}

denote the associated set of feasible control laws, and let :Ω×𝕌\ell:\Omega\times\mathbb{U}\to\mathbb{R} be a cost function. The current section is about ergodic optimal control problems of the form

J=limTminμ𝔘𝔼{1T0T(X(t),μ(X(t)))dt}.\displaystyle J^{\star}\ =\ \lim_{T\to\infty}\min_{\mu\in\mathfrak{U}}\ \mathbb{E}\left\{\frac{1}{T}\int_{0}^{T}\ell(X(t),\mu(X(t)))\,\mathrm{d}t\right\}. (14)

Here, XX denotes a martingale solution of the SDE (5) that depends implicitly on the optimization variable μ\mu.

Assumption 1

Throughout this paper, we assume that

  1. 1.

    the set Ωnx\Omega\subseteq\mathbb{R}^{n_{x}} is convex, open, and non-empty;

  2. 2.

    the system satisfies (f,G)C1(Ω¯)nx×(1+nu)(f,G)\in C^{1}(\overline{\Omega})^{n_{x}\times(1+n_{u})};

  3. 3.

    the set 𝕌nu\mathbb{U}\subseteq\mathbb{R}^{n_{u}} is convex, compact, 𝕌\mathbb{U}\neq\varnothing; and

  4. 4.

    the stage cost C2(Ω¯×𝕌)\ell\in C^{2}(\overline{\Omega}\times\mathbb{U}) is bounded from below, radially unbounded in xx,

    x(x,u),\|x\|\to\infty\quad\Longrightarrow\quad\ell(x,u)\to\infty,

    and strongly convex in uu, u20\nabla_{u}^{2}\ell\succ 0 on Ω×𝕌\Omega\times\mathbb{U}.

Assumption 1 neither excludes that Ω\Omega is unbounded nor that Ω=nx\Omega=\mathbb{R}^{n_{x}}. On such unbounded domains, however, a weak Has’minskiǐ-Lyapunov function is needed.

Assumption 2

There exists a function QC2(Ω¯)Q\in C^{2}(\overline{\Omega}) and positive constants γ1,γ2,γ3,γ4(0,)\gamma_{1},\gamma_{2},\gamma_{3},\gamma_{4}\in(0,\infty) such that

  1. 1.

    the Hessian of QQ is positive definite and uniformly bounded, such that

    γ1I2Qγ2IonΩ,\displaystyle\gamma_{1}I\ \preceq\ \nabla^{2}Q\ \preceq\ \gamma_{2}I\qquad\text{on}\ \ \Omega, (15)
  2. 2.

    the function QQ satisfies Qn0\nabla Q^{\intercal}n\geq 0 on Ω\partial\Omega, and

  3. 3.

    QQ satisfies the weak Lyapunov condition

    F[μ]Q\displaystyle F[\mu]^{\intercal}\nabla Q \displaystyle\ \leq\ γ3γ4[μ]onΩ\displaystyle\gamma_{3}-\gamma_{4}\ell[\mu]\qquad\text{on}\ \ \Omega (16)
    with[μ](x)\displaystyle\text{with}\qquad\quad\ell[\mu](x) =def\displaystyle\ \overset{\mathrm{def}}{=}\ (x,μ(x))\displaystyle\ell(x,\mu(x)) (17)

    for at least one μC1(Ω)\mu\in C^{1}(\Omega) with μint(𝕌)\mu\in\mathrm{int}(\mathbb{U}) on Ω\Omega.


Assumption 2 can be interpreted as a weak stabilizability condition of the system (f,G)(f,G) relative to its cost \ell.

Example 1

If Assumption 1 holds and if the domain Ω={xx<1}\Omega=\{x\mid\|x\|<1\} is the open unit ball, Assumption 2 is always satisfied, because Condition (16) holds with

Q(x)=12x2andγ1=γ2=γ4=1Q(x)=\frac{1}{2}\|x\|^{2}\qquad\text{and}\qquad\gamma_{1}=\gamma_{2}=\gamma_{4}=1

for all sufficiently large constants γ3\gamma_{3}.

Example 2

Let f(x)=xx3f(x)=x-x^{3}, G(x)=0G(x)=0, and the cost (x,u)=x2+x4+u2\ell(x,u)=x^{2}+x^{4}+u^{2} be defined on the unbounded domain Ω=\Omega=\mathbb{R}. In this case, Assumption 2 is satisfied for μ=0\mu=0,

Q(x)=12x2,γ1=γ2=γ3=1,andγ4=14.Q(x)=\frac{1}{2}\|x\|^{2},\ \ \gamma_{1}=\gamma_{2}=\gamma_{3}=1,\quad\text{and}\quad\gamma_{4}=\frac{1}{4}\;.

The associated nominal nonlinear system, however, is neither controllable nor stabilizable and, consequently, does not admit a strong control Lyapunov function.


The above examples suggest that Assumptions 1 and 2 are satisfied for a large class of nonlinear control systems. This statement is additionally supported by a long list of examples for generic FPKs in [15], where a very similar definition of Has’minskiǐ-Lyapunov functions is used.

3.2 Optimal Steady States

This section discusses methods for analyzing the expected value of the stage cost \ell at the optimal steady state of the FPK. As a first step towards this goal, the following lemma exploits a variant of Has’minskiǐ’s theorem in order to guarantee the existence of a potentially suboptimal steady state and to derive an associated upper bound on the expected value of \ell.

Lemma 1

Let Assumptions 1 and 2 be satisfied and let μC1(Ω)\mu\in C^{1}(\Omega) be a control law for which (16) holds. Then there exists an equilibrium ρsH1(Ω)L1(Ω)\rho_{\mathrm{s}}\in H^{1}(\Omega)\cap L^{1}(\Omega) such that

0=𝒜ρs+(μρs)onΩ,0ρsonΩ,0=(ϵρsF[μ]ρs)nonΩ,1=Ωρs(x)dx,ϵγ2nx+γ3γ4Ω(x,μ(x))ρs(x)dx.\displaystyle\begin{array}[]{rcll}0&\ =&\mathcal{A}\rho_{\mathrm{s}}+\mathcal{B}(\mu\rho_{\mathrm{s}})&\mathrm{on}\ \ \Omega,\\[4.55254pt] 0&\ \geq&\rho_{\mathrm{s}}&\mathrm{on}\ \ \Omega,\\[4.55254pt] 0&\ =&\left(\epsilon\nabla\rho_{\mathrm{s}}-F[\mu]\rho_{\mathrm{s}}\right)^{\intercal}n&\mathrm{on}\ \ \partial\Omega,\\[4.55254pt] 1&\ =&\int_{\Omega}\rho_{\mathrm{s}}(x)\,\mathrm{d}x\;,&\\[4.55254pt] \dfrac{\epsilon\gamma_{2}n_{x}+\gamma_{3}}{\gamma_{4}}&\ \geq&\int_{\Omega}\ell(x,\mu(x))\rho_{\mathrm{s}}(x)\,\mathrm{d}x\;.\end{array} (23)

Proof. Let QQ be defined as in Assumption 2 and let

Ωα=def{xΩQ(x)<α}forα>infxΩQ(x)\Omega_{\alpha}\ \overset{\mathrm{def}}{=}\ \{x\in\Omega\mid Q(x)<\alpha\}\qquad\text{for}\qquad\alpha\ >\ \inf_{x\in\Omega}Q(x)

denote the associated open α\alpha-sublevel sets of QQ. Because QQ is strongly convex and differentiable and because α\alpha is assumed to be sufficiently large, the sets Ωα\Omega_{\alpha} are open, non-empty, convex, bounded, and have a Lipschitz boundary. Thus, it makes sense to introduce the convex PDE-constrained auxiliary optimization problem

~α=def\displaystyle\widetilde{\ell}_{\alpha}\ \overset{\mathrm{def}}{=}\ supρ\displaystyle\sup_{\rho} Ωαγ4[μ]ρdx\displaystyle\ \int_{\Omega_{\alpha}}\gamma_{4}\ell[\mu]\rho\,\mathrm{d}x (24)
s.t.\displaystyle\mathrm{s.t.} {0=ϵΔρdiv(F[μ]ρ)onΩα0ρonΩα0=(ϵρF[μ]ρ)nαonΩα,1=Ωαρdx\displaystyle\ \left\{\begin{array}[]{rcll}0&=&\epsilon\Delta\rho-\mathrm{div}(F[\mu]\rho)&\text{on}\ \Omega_{\alpha}\\[4.55254pt] 0&\geq&\rho&\text{on}\ \Omega_{\alpha}\\[4.55254pt] 0&\ =&\left(\epsilon\nabla\rho-F[\mu]\rho\right)^{\intercal}n_{\alpha}&\text{on}\ \ \partial\Omega_{\alpha},\\[4.55254pt] 1&=&\int_{\Omega_{\alpha}}\rho\,\mathrm{d}x\end{array}\right. (29)

with optimization variable ρH1(Ωα)L1(Ωα)\rho\in H^{1}(\Omega_{\alpha})\cap L^{1}(\Omega_{\alpha}). Here, nαn_{\alpha} denotes an outer normal of Ωα\Omega_{\alpha}. Note that the generalized version of Has’minskiǐ’s theorem on the existence of unique probability solutions of the stationary FPK on convex domains [29, Thm. A], see also [15], implies that (24) is feasible. Since we assume that \ell is bounded from below, it follows that there exists a σ¯>\underline{\sigma}>-\infty with σ¯<~α\underline{\sigma}<\widetilde{\ell}_{\alpha}. Moreover, a Lagrangian relaxation of (24) yields

~αsupρ0{σ¯+(γ4[μ]+F[μ]Qσ¯,ρ)L2(Ωα)\displaystyle\widetilde{\ell}_{\alpha}\ \leq\ \sup_{\rho\geq 0}\ \left\{\overline{\sigma}+(\gamma_{4}\ell[\mu]+F[\mu]^{\intercal}\nabla Q-\overline{\sigma},\rho)_{L^{2}(\Omega_{\alpha})}\right.
ϵ(Q,ρ)L2(Ωα)}\displaystyle\left.\qquad\qquad\qquad-\epsilon(\nabla Q^{\intercal},\nabla\rho)_{L^{2}(\Omega_{\alpha})}\right\}
=supρ0{σ¯+(γ4[μ]+F[μ]Q+ϵΔQσ¯,ρ)L2(Ωα)\displaystyle\ =\ \sup_{\rho\geq 0}\ \left\{\vphantom{\int_{\partial\Omega}}\overline{\sigma}+(\gamma_{4}\ell[\mu]+F[\mu]^{\intercal}\nabla Q+\epsilon\Delta Q-\overline{\sigma},\rho)_{L^{2}(\Omega_{\alpha})}\right.
ϵΩρQnαdx}σ¯,\displaystyle\left.\qquad\qquad\quad-\epsilon\int_{\partial\Omega}\rho\nabla Q^{\intercal}n_{\alpha}\mathrm{d}x\right\}\ \leq\ \overline{\sigma}\;, (30)

for sufficiently large constants σ¯>0\overline{\sigma}>0. The latter inequality follows from (15), (16), and the fact that

Qn0onΩQnα0onΩα\nabla Q^{\intercal}n\geq 0\ \ \text{on}\ \ \partial\Omega\qquad\Longrightarrow\qquad\nabla Q^{\intercal}n_{\alpha}\geq 0\ \ \text{on}\ \ \partial\Omega_{\alpha}

holds due to the above construction of Ωα\Omega_{\alpha}. More precisely, (15) and (16) imply that

ΔQnxγ2andγ4[μ]+F[μ]Qγ3.\Delta Q\ \leq\ n_{x}\gamma_{2}\quad\text{and}\quad\gamma_{4}\ell[\mu]+F[\mu]^{\intercal}\nabla Q\ \leq\ \gamma_{3}\;.

Consequently, (30) holds for

σ¯=γ3+ϵnxγ2.\overline{\sigma}\ =\ \gamma_{3}+\epsilon\,n_{x}\gamma_{2}\;.

But this particular choice for σ¯\overline{\sigma} does not depend on α\alpha and, consequently, the bounds σ¯ασ¯\underline{\sigma}\leq\ell_{\alpha}\leq\overline{\sigma} hold uniformly. The statement of the theorem follows then by taking the limit α\alpha\to\infty in (24) and dividing the upper bound on ~\widetilde{\ell}_{\infty} by γ4\gamma_{4}. \diamond


The stage cost of (14) can be represented by the convex perspective function

𝔏(ρ,ν)=defΩ(,νρ)ρdx.\displaystyle\mathfrak{L}(\rho,\nu)\ \overset{\mathrm{def}}{=}\ \int_{\Omega}\ell\left(\cdot,\frac{\nu}{\rho}\right)\rho\,\mathrm{d}x. (31)

It is defined for any strictly positive and measurable function ρ>0\rho>0 and any measurable function ν\nu for which the above integral exists. Similarly, control constraints are represented by the convex cone

𝒦=def{(ρ,ν)H1(Ω)×L2(Ω)nu|νρ𝕌onΩρ0onΩ}.\mathcal{K}\ \overset{\mathrm{def}}{=}\ \left\{\ (\rho,\nu)\in H^{1}(\Omega)\times L^{2}(\Omega)^{n_{u}}\ \middle|\begin{array}[]{ll}\nu\in\rho\mathbb{U}&\text{on}\ \Omega\\[4.55254pt] \rho\geq 0&\text{on}\ \Omega\end{array}\right\}.

Thus, optimal steady states can be found by solving the convex optimization problem

=def\displaystyle\ell_{\infty}\overset{\mathrm{def}}{=} min(ρ,ν)𝒦\displaystyle\min_{(\rho,\nu)\in\mathcal{K}} 𝔏(ρ,ν)\displaystyle\ \mathfrak{L}(\rho,\nu) (32)
s.t.\displaystyle\mathrm{s.t.} {0=𝒜ρ+νonΩ0=(ϵρfρGν)nonΩ,1=Ωρdx,\displaystyle\ \left\{\begin{array}[]{rcll}0&=&\mathcal{A}\rho+\mathcal{B}\nu&\text{on}\ \Omega\\[4.55254pt] 0&\ =&\left(\epsilon\nabla\rho-f\rho-G\nu\right)^{\intercal}n&\text{on}\ \ \partial\Omega,\\[4.55254pt] 1&=&\int_{\Omega}\rho\,\mathrm{d}x,\end{array}\right. (36)

with (ρ,ν)𝒦(\rho,\nu)\in\mathcal{K} denoting optimization variables.

Corollary 1

Let Assumptions 1 and 2 be satisfied. Then (32) has a unique minimizer (ρ,ν)𝒦(\rho_{\infty},\nu_{\infty})\in\mathcal{K}.

Proof. The construction of ρs\rho_{\mathrm{s}} and μ\mu in Lemma 1 satisfies

(ρs,μρs)int(𝒦).(\rho_{\mathrm{s}},\mu\rho_{\mathrm{s}})\ \in\ \mathrm{int}(\mathcal{K})\;.

As such, the closed and convex cone 𝒦\mathcal{K} is compatible with Slater’s constraint qualification, recalling that ρs>0\rho_{\mathrm{s}}>0 follows from the minimum principle for elliptic operators [15]. Consequently, feasibility and boundedness of (32) follows immediately from Lemma 1, as (ρ,ν)=(ρs,μρs)(\rho,\nu)=(\rho_{\mathrm{s}},\mu\rho_{\mathrm{s}}) is a feasible point with bounded objective value. In summary, (32) has the following properties:

  1. 1.

    Slater’s (weak) constraint qualification holds,

  2. 2.

    the objective function \mathcal{L} is a convex perspective function that is—due to Assumption 1—strongly convex in the control ν\nu, and

  3. 3.

    the operator 𝒜\mathcal{A} is uniformly elliptic.


Consequently, the statement of this corollary follows from a well-known standard result on the existence and uniqueness of minimizers of elliptic PDE-constrained convex optimization problems in Hilbert-Sobolev spaces; see, for example, [28, Thm 1.45].\diamond

3.3 Dissipation of Energy

Let (ρ,ν)(\rho_{\infty},\nu_{\infty}) denote the optimal steady state, as defined in Corollary 1, and let μ=ν/ρ\mu_{\infty}=\nu_{\infty}/\rho_{\infty} denote an associated optimal control law. It can be used to study the controlled parabolic FPK

ρ˙=𝒜ρ+(μρ)onΩ𝕋ρ0=ρ(0)onΩ0=(ϵρF[μ]ρ)n|ΩonΣ𝕋\displaystyle\begin{array}[]{rcll}\dot{\rho}&\ =&\mathcal{A}\rho+\mathcal{B}(\mu_{\infty}\rho)&\text{on}\ \ \Omega_{\mathbb{T}}\\[4.55254pt] \rho_{0}&=&\rho(0)&\text{on}\ \ \Omega\\[4.55254pt] 0&=&\left.\left(\epsilon\nabla\rho-F[\mu_{\infty}]^{\intercal}\rho\right)^{\intercal}n\right|_{\partial\Omega}&\text{on}\ \ \Sigma_{\mathbb{T}}\end{array} (40)

on a possibly infinite time horizon 𝕋\mathbb{T}. Here, ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}) denotes an initial probability density that is assumed to be an element of the set

𝒫0(ρ1)=def{ρ0L2(ρ1)L1(Ω)|Ωρ0dx=1,ρ00onΩ}.\mathcal{P}_{0}(\rho_{\infty}^{-1})\ \overset{\mathrm{def}}{=}\ \left\{\ \rho_{0}\in L^{2}(\rho_{\infty}^{-1})\cap L^{1}(\Omega)\ \middle|\begin{array}[]{l}\int_{\Omega}\rho_{0}\,\mathrm{d}x=1,\\[4.55254pt] \rho_{0}\geq 0\ \ \text{on}\ \ \Omega\end{array}\right\}.

In this setting, the key idea for analyzing (40) is to introduce the energy functional

E=defρρL2(ρ1)2=Ω(ρρ)2ρdx.E\ \overset{\mathrm{def}}{=}\ \|\rho-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}^{2}\ =\ \int_{\Omega}\frac{(\rho-\rho_{\infty})^{2}}{\rho_{\infty}}\,\mathrm{d}x\ .\

Its weak time derivative is given by

dEdt=2ϵΩ(ρρ)22ρdx 0.\displaystyle\frac{\mathrm{d}E}{\mathrm{d}t}\ =\ -2\epsilon\int_{\Omega}\left\|\nabla\left(\frac{\rho}{\rho_{\infty}}\right)\right\|_{2}^{2}\rho_{\infty}\,\mathrm{d}x\ \leq\ \ 0\;. (41)

Note that this equation is well-known in the functional analysis and FPK literature [3, 15], where EE is often interpreted as energy (or entropy) that is dissipated during the evolution of the FPK. Alternatively, by translating this to control engineering language, EE could also be called a Lyapunov function for (40).

Lemma 2

Let Assumptions 1 and 2 be satisfied. Then, (40) has for every ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}) a unique solution

ρ𝒫(ρ1)with{ρ>0onΩ𝕋Ωρdx=1on𝕋.\rho\in\mathcal{P}(\rho_{\infty}^{-1})\qquad\text{with}\qquad\left\{\begin{array}[]{ll}\rho>0&\mathrm{on}\ \ \Omega_{\mathbb{T}}\\[4.55254pt] \int_{\Omega}\rho\,\mathrm{d}x=1&\mathrm{on}\ \ \mathbb{T}.\end{array}\right.

Moreover, ρ\rho satisfies (41). This results holds on finite and infinite time horizons 𝕋\mathbb{T}.

Proof. FPKs have been analyzed by so many authors that is impossible to list them all. As such, [3] and [15] should be regarded as selected examples for two out of a long list of articles in which proofs of variants of the current lemma can be found. Nevertheless, because this is important for the developments in this paper, it appears appropriate to recall the following three main arguments why the above statement holds.

  1. 1.

    First, if Ω𝕋\Omega_{\mathbb{T}} is bounded, a unique weak solution ρ𝒫(ρ1)\rho\in\mathcal{P}(\rho_{\infty}^{-1}) of (40) exists, since it is a uniformly parabolic linear PDE. Moreover, it follows from Harnack’s inequality that

    ϕ=defρρH1(ρ)on𝕋.\displaystyle\phi\ \overset{\mathrm{def}}{=}\ \frac{\rho}{\rho_{\infty}}\ \in\ H^{1}(\rho_{\infty})\qquad\text{on}\ \ \mathbb{T}. (42)

    The details of this argument can be found in [15, Chapter 7], where the square-integrability of logarithmic gradients of FPKs is discussed in all detail.

  2. 2.

    Next, (41) can be verified by direct computation,

    dEdt\displaystyle\frac{\mathrm{d}E}{\mathrm{d}t} =\displaystyle\ =\ 2ρ˙,ϕ1= 2ρ˙,ϕ1ρ˙,ϕ2\displaystyle 2\left\langle\dot{\rho},\phi-1\right\rangle\ =\ 2\left\langle\dot{\rho},\phi-1\right\rangle-\left\langle\dot{\rho}_{\infty},\phi^{2}\right\rangle (43)
    =(40)\displaystyle\overset{\eqref{eq::FPinf}}{=} 2Ω(ρF[μ]ϕϵρϕ)dx\displaystyle 2\int_{\Omega}\left(\rho F[\mu_{\infty}]^{\intercal}\nabla\phi-\epsilon\nabla\rho^{\intercal}\nabla\phi\right)\,\mathrm{d}x
    2Ω[ρF[μ]ϕϵρϕ]ϕdx\displaystyle-2\int_{\Omega}\left[\rho_{\infty}F[\mu_{\infty}]^{\intercal}\nabla\phi-\epsilon\nabla\rho_{\infty}^{\intercal}\nabla\phi\right]\phi\,\mathrm{d}x
    =(42)\displaystyle\overset{\eqref{eq::phi}}{=} 2ϵΩϕ22ρdx 0,\displaystyle-2\epsilon\int_{\Omega}\left\|\nabla\phi\right\|_{2}^{2}\rho_{\infty}\,\mathrm{d}x\ \leq\ 0,

    where the equation in the second line follows by using that the steady state ρ\rho_{\infty} also satisfies (40) but with ρ0=ρ\rho_{0}=\rho_{\infty} and ρ˙=0\dot{\rho}_{\infty}=0. Thus, Gronwall’s lemma yields the a-priori energy estimate

    ρ(t)ρL2(ρ1)ρ0ρL2(ρ1)on𝕋.\displaystyle\|\rho(t)-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}\ \leq\ \|\rho_{0}-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}\quad\text{on}\ \ \mathbb{T}. (44)
  3. 3.

    For the case that Ω\Omega and 𝕋\mathbb{T} are unbounded, one can proceed as in the proof of Lemma 1. This means that the above two steps are repeated in order to construct a sequence of solutions ρk𝒫(ρ1)\rho_{k}\in\mathcal{P}(\rho_{\infty}^{-1}) of (40) on the bounded domains Ωαk×(0,Tk)\Omega_{\alpha_{k}}\times(0,T_{k}) with increasing sequences αk\alpha_{k}\to\infty and TkT_{k}\to\infty. Because the energy estimate (44) yields a uniform upper bound, the sequence (ρk)k(\rho_{k})_{k\in\mathbb{N}} has a weakly convergent subsequence in the Hilbert space 𝒫(ρ1)\mathcal{P}(\rho_{\infty}^{-1}), whose limit must be a weak solution of (40) and, by continuity, (41) holds, too.


In addition to the above three arguments, it is also recalled that FPKs model the evolution of probability densities and, as such, the conservation law

Ωρdx=Ωρ0dx= 1\int_{\Omega}\rho\,\mathrm{d}x\ =\ \int_{\Omega}\rho_{0}\,\mathrm{d}x\ =\ 1

holds on 𝕋\mathbb{T}, referring to [15] for a formal discussion of this property. This completes the proof of the lemma. \diamond

Remark 3

Note that more general strategies for constructing Lyapunov functions for FPKs can be found in [3, 16], where so-called Φ\Phi-entropies are proposed as Lyapunov functions. These functions are useful for analyzing the properties of FPKs in general LpL^{p}-spaces, typically for 1p<1\leq p<\infty. Here, the case p=1p=1 is particularly relevant, as the L1L^{1}-norm is a natural choice for measuring the distance between probability densities. The methods in this paper are nevertheless presented in Hilbert-Sobolev spaces, as many of the constructions below rely on the concept of duality in convex PDE-constrained optimization. Extensions to other reflexive and non-reflexive LpL^{p}-spaces are beyond the scope of this article.

3.4 Ergodic Optimal Control

In order to analyze the ergodic optimal control problem (14), an additional assumption is introduced.

Assumption 3

We have [μ]L2(ρ)\ell[\mu_{\infty}]\in L^{2}(\rho_{\infty}).

This assumption is equivalent to requiring that [μ]\ell[\mu_{\infty}] has a variance at the steady state ρ\rho_{\infty}; that is,

Ω[μ]2ρdx2<.\int_{\Omega}\ell[\mu_{\infty}]^{2}\rho_{\infty}\,\mathrm{d}x-\ell_{\infty}^{2}\ <\ \infty.

If Assumption 1 and 2 hold and if Ω\Omega is bounded, this is clearly the case, and Assumption 3 automatically holds. Otherwise, if Ω\Omega is unbounded, such a variance exists under very mild growth conditions on \ell, as discussed in all detail in [15, Chapter 3], where upper bound estimates for ρ\rho_{\infty} can be found. Next, let

J(T,ρ0)=def\displaystyle J(T,\rho_{0})\overset{\mathrm{def}}{=} min(ρ,ν)𝒦𝕋\displaystyle\min_{(\rho,\nu)\in\mathcal{K}_{\mathbb{T}}} 0T𝔏(ρ,ν)dt\displaystyle\ \int_{0}^{T}\mathfrak{L}(\rho,\nu)\,\mathrm{d}t (45)
s.t.\displaystyle\mathrm{s.t.} {ρ˙=𝒜ρ+νonΩ𝕋0=(ϵρfρGν)nonΣ𝕋ρ0=ρ(0),onΩ\displaystyle\left\{\begin{array}[]{rcll}\dot{\rho}&=&\mathcal{A}\rho+\mathcal{B}\nu&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] 0&=&\left(\epsilon\nabla\rho-f\rho-G\nu\right)^{\intercal}n&\text{on}\ \Sigma_{\mathbb{T}}\\[4.55254pt] \rho_{0}&=&\rho(0),&\text{on}\ \Omega\end{array}\right. (49)

denote the value function of the stochastic control problem on the finite time horizon 𝕋=(0,T)\mathbb{T}=(0,T) with parameters T>0T>0 and ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}). Here, the optimization variables ρ\rho and ν\nu are elements of the convex cone

𝒦𝕋=def{(ρ,ν)|ρ𝒫(ρ1)ν𝒰(ρ1)(ρ(t),ν(t))𝒦f.a.t𝕋}.\mathcal{K}_{\mathbb{T}}\ \overset{\mathrm{def}}{=}\ \left\{\ (\rho,\nu)\ \middle|\begin{array}[]{l}\rho\in\mathcal{P}(\rho_{\infty}^{-1})\\[4.55254pt] \nu\in\mathcal{U}(\rho_{\infty}^{-1})\\[4.55254pt] (\rho(t),\nu(t))\in\mathcal{K}\quad\text{f.a.}\ \ t\in\mathbb{T}\end{array}\right\}\;.

By construction, Problem (14) is equivalent to computing the limit

J=limTJ(T,ρ0)T.\displaystyle J^{\star}\ =\ \lim_{T\to\infty}\ \frac{J(T,\rho_{0})}{T}\;. (50)

The following lemma proves that J=J^{\star}=\ell_{\infty}, which, in turn, implies that JJ^{\star} does not depend on ρ0\rho_{0}.

Theorem 1

Let Assumptions 12, and 3 be satisfied and let \ell_{\infty} be defined as in (32). Then (45) has for all T>0T>0 and all ρ0𝒫(ρ1)\rho_{0}\in\mathcal{P}(\rho_{\infty}^{-1}) a unique minimizer (ρ,ν)𝒦𝕋(\rho^{\star},\nu^{\star})\ \in\ \mathcal{K}_{\mathbb{T}}. Moreover, the ergodic limit (50) exists. It is given by

ρ0𝒫0(ρ1),limTJ(T,ρ0)T=.\forall\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}),\qquad\lim_{T\to\infty}\frac{J(T,\rho_{0})}{T}\ =\ \ell_{\infty}\;.

Proof. This proof is divided into two parts, where the first part verifies the claim that (45) has a unique minimizer while the second part is about the ergodic limit.


Part I. First note that an upper bound on the stage cost of (45) can be found by applying the triangle- and the Cauchy-Schwarz inequality,

|𝔏(ρ,ν)|\displaystyle\left|\mathfrak{L}(\rho,\nu)\right| \displaystyle\ \leq\ +|𝔏(ρ,ν)|\displaystyle\ell_{\infty}+\left|\mathfrak{L}(\rho,\nu)-\ell_{\infty}\right| (51)
=\displaystyle= +|Ω[μ](ρρ)dx|\displaystyle\ \ell_{\infty}+\left|\int_{\Omega}\ell[\mu_{\infty}](\rho-\rho_{\infty})\,\mathrm{d}x\right|
\displaystyle\leq +ρρL2(ρ1)[μ]L2(ρ),\displaystyle\ell_{\infty}+\|\rho-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}\left\|\ell[\mu_{\infty}]\right\|_{L^{2}(\rho_{\infty})},

which holds along the trajectory (ρ,ν)=(ρ,μρ)(\rho,\nu)=(\rho,\mu_{\infty}\rho) of (40). Due to Assumption 3 and the energy estimate (44) from the proof of Lemma 2, this upper bound is sufficient to ensure that J(T,ρ0)<J(T,\rho_{0})<\infty. Thus, the first part of this proof is analogous to the argument in the proof of Corollary 1: the conic constraint (ρ,ν)𝒦𝕋(\rho,\nu)\in\mathcal{K}_{\mathbb{T}} is strictly feasible, which implies that Slater’s weak constraint qualification holds, and \mathcal{L} is strictly convex in ν\nu. Consequently, the existence of a unique minimizer of (45) for T<T<\infty follows from the standard existence theorem for PDE-constrained optimization problems [28, Thm. 1.45].


Part II. Lemma 2 implies first that the solution ρ\rho of (40) is such that that the function

ϕ(t)=ρ(t)ρtC\phi(t)=\frac{\rho(t)}{\rho_{\infty}}\ \ \overset{t\to\infty}{\longrightarrow}\ \ C

converges to a constant CC. But we also have

ρ(t)L1(Ω)=ρ(t)L1(Ω)= 1,\|\rho(t)\|_{L^{1}(\Omega)}\ =\ \|\rho_{\infty}(t)\|_{L^{1}(\Omega)}\ =\ 1,

since ρ\rho and ρ\rho_{\infty} are both probability density functions. Consequently, this constant is given by C=1C=1 and ρ(t)\rho(t) converges to ρ\rho_{\infty} in L2(ρ1)L^{2}(\rho_{\infty}^{-1}) for tt\to\infty. By substituting ν=μρ\nu=\mu_{\infty}\rho in (45), it follows that there exists for every δ>0\delta>0 a Tδ<T_{\delta}<\infty, with TδT_{\delta}\to\infty for δ0\delta\to 0, such that

TTδ,J(T,ρ0)T+δ.\displaystyle\forall T\geq T_{\delta},\qquad\frac{J(T,\rho_{0})}{T}\ \leq\ \ell_{\infty}+\delta\;. (52)

Next, let (ρδ,νδ)(\rho_{\delta},\nu_{\delta}) denote the minimizer of (45) on the time horizon 𝕋=(0,Tδ)\mathbb{T}=(0,T_{\delta}) and let

ρ¯δ=def1Tδ0Tδρδdtandν¯δ=def1Tδ0Tδνδdt\overline{\rho}_{\delta}\ \overset{\mathrm{def}}{=}\ \frac{1}{T_{\delta}}\int_{0}^{T_{\delta}}\rho_{\delta}\,\mathrm{d}t\qquad\text{and}\qquad\overline{\nu}_{\delta}\ \overset{\mathrm{def}}{=}\ \frac{1}{T_{\delta}}\int_{0}^{T_{\delta}}\nu_{\delta}\,\mathrm{d}t

denote their time averages. Since 𝒦\mathcal{K} is a convex cone, it follows that (ρ¯δ,ν¯δ)𝒦(\overline{\rho}_{\delta},\overline{\nu}_{\delta})\in\mathcal{K} and

𝒜ρ¯δ+ν¯δ=1Tδ0Tδρ˙δdt=ρ(Tδ)ρ0Tδ,\displaystyle\mathcal{A}\overline{\rho}_{\delta}+\mathcal{B}\overline{\nu}_{\delta}\ =\ \frac{1}{T_{\delta}}\int_{0}^{T_{\delta}}\dot{\rho}_{\delta}\,\mathrm{d}t\ =\ \frac{\rho(T_{\delta})-\rho_{0}}{T_{\delta}},

which implies that

limδ0(𝒜ρ¯δ+ν¯δ)= 0,\displaystyle\lim_{\delta\to 0}\,\left(\mathcal{A}\overline{\rho}_{\delta}+\mathcal{B}\overline{\nu}_{\delta}\right)\ =\ 0, (53)

almost everywhere on Ω\Omega, since \ell is bounded from below and radially unbounded. In other words, (ρ¯δ,ν¯δ)(\overline{\rho}_{\delta},\overline{\nu}_{\delta}) converges to a feasible steady-state in 𝒦\mathcal{K}. Moreover, Jensen’s inequality yields

J(Tδ,ρ)Tδ=0Tδ𝔏(ρδ,vδ)dtTδ𝔏(ρ¯δ,ν¯δ).\displaystyle\frac{J(T_{\delta},\rho)}{T_{\delta}}\ =\ \frac{\int_{0}^{T_{\delta}}\mathfrak{L}(\rho_{\delta},v_{\delta})\,\mathrm{d}t}{T_{\delta}}\ \geq\ \mathfrak{L}\left(\overline{\rho}_{\delta},\overline{\nu}_{\delta}\right)\,. (54)

But then, on the one hand, (52) implies

lim supδ0J(Tδ,ρ0)Tδ\limsup_{\delta\to 0}\ \frac{J(T_{\delta},\rho_{0})}{T_{\delta}}\ \leq\ \ell_{\infty}

and, on the other hand, (53) and (54) and the fact that 𝒜\mathcal{A} is uniformly elliptic imply, by continuity,

lim infδ0J(Tδ,ρ0)Tδ.\liminf_{\delta\to 0}\ \frac{J(T_{\delta},\rho_{0})}{T_{\delta}}\ \geq\ \ell_{\infty}\;.

Consequently, the ergodic limit for δ0\delta\to 0, or, equivalently, TδT_{\delta}\to\infty, exists, and J=J^{\star}=\ell_{\infty}. This completes the proof of the theorem.\diamond

4 Hamilton-Jacobi-Bellman Equations

Many classical methods for analyzing the existence of solutions to HJBs [9, 23] start by establishing (or sometimes even imposing) certain properties—for example, Hölder continuity or differentiability—of the value function of an optimal control problem and then show that this function actually satisfies its associated HJB, be it in a classical-, weak-, or viscosity sense. An impressive exception is the work by Krylov [32, 33], which is using and developing methods for general nonlinear parabolic PDEs that can be applied to analyze the solutions of controlled diffusions and nonlinear HJBs. Nevertheless, it appears that the convex duality of HJBs and FPKs, which, in principle, is known since a long time [13, 24], has never been fully exploited for the purpose of analyzing HJBs. Therefore, rather than relying on general nonlinear analysis methods, this section proposes to analyze HJBs by using modern PDE-constrained convex optimization methods [28], which can exploit the above mentioned duality of HJBs and FPKs.

4.1 Hamiltonians

Let H:Ω¯×nxH:\overline{\Omega}\times\mathbb{R}^{n_{x}}\to\mathbb{R} and u:Ω¯×nx𝕌u^{\star}:\overline{\Omega}\times\mathbb{R}^{n_{x}}\to\mathbb{U} denote the Hamiltonian and its associated minimizer,

H(x,λ)\displaystyle H(x,\lambda) =def\displaystyle\overset{\mathrm{def}}{=} minu𝕌{(x,u)+λ[f(x)+G(x)u]}\displaystyle\min_{u\in\mathbb{U}}\left\{\ell(x,u)+\lambda^{\intercal}[f(x)+G(x)u]\right\}
andu(x,λ)\displaystyle\text{and}\quad u^{\star}(x,\lambda) =def\displaystyle\ \overset{\mathrm{def}}{=}\ argminu𝕌{(x,u)+λ[f(x)+G(x)u]}\displaystyle\underset{u\in\mathbb{U}}{\text{argmin}}\left\{\ell(x,u)+\lambda^{\intercal}[f(x)+G(x)u]\right\}

for all (x,λ)Ω¯×nxx,\lambda)\in\overline{\Omega}\times\mathbb{R}^{n_{x}}. Assumption 1 implies that uu^{\star} is Lipschitz continuous on Ω¯×nx\overline{\Omega}\times\mathbb{R}^{n_{x}}. Thus, HH is Lipschitz continuous and, consequently,

V𝒫(ρ),H(,V)L2(0,T;L2(ρ)),\displaystyle\forall V\in\mathcal{P}(\rho_{\infty}),\qquad H(\cdot,\nabla V)\in L^{2}(0,T;L^{2}(\rho_{\infty}))\;, (55)

recalling that ρ\rho_{\infty} denotes the optimal steady state, as defined in Corollary 1.

4.2 Parabolic HJBs

On finite time horizons 𝕋=(0,T)\mathbb{T}=(0,T) the parabolic HJB with Neumann boundary condition is given by

V˙=H(x,V)+ϵΔVonΩ𝕋0=V(T)onΩ0=VnonΣ𝕋.\displaystyle\begin{array}[]{rcll}-\dot{V}&\ =&H(x,\nabla V)+\epsilon\,\Delta V&\text{on}\ \ \Omega_{\mathbb{T}}\\[4.55254pt] 0&\ =&V(T)&\text{on}\ \ \Omega\\[4.55254pt] 0&=&\nabla V^{\intercal}n&\text{on}\ \ \Sigma_{\mathbb{T}}.\end{array} (59)

The definition of weak solutions of (59) relies on (55).

Definition 2

A function V𝒫(ρ)V\in\mathcal{P}(\rho_{\infty}) with V(T,)=0V(T,\cdot)=0 is called a weak solution of (59) if

0TV˙,ρdt=0TΩ[ρH(x,V)ϵρV]dxdt\displaystyle-\int_{0}^{T}\langle\dot{V},\rho\rangle\,\mathrm{d}t\ =\int_{0}^{T}\int_{\Omega}\left[\rho H(x,\nabla V)-\epsilon\nabla\rho^{\intercal}\nabla V\right]\mathrm{d}x\,\mathrm{d}t

for all test functions ρ𝒫(ρ1)\rho\in\mathcal{P}(\rho_{\infty}^{-1}).

Different from classical methods for analyzing HJBs, the following lemma uses Riesz’ representation theorem to establish the existence and uniqueness of weak solutions in the Hilbert space 𝒫(ρ)\mathcal{P}(\rho_{\infty}).

Lemma 3

Let Assumptions 12, and 3 hold and let ρ\rho_{\infty} be defined as in Corollary 1. The statements below hold for all T<T<\infty.

  1. 1.

    HJB (59) has a unique weak solution V𝒫(ρ)V\in\mathcal{P}(\rho_{\infty}).

  2. 2.

    The value function J(T,)J(T,\cdot) of (45) is a bounded linear operator, whose Riesz representation is given by

    ρ0𝒫0(ρ1),J(T,ρ0)=ΩV(0,x)ρ0(x)dx.\forall\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}),\ \ \ J(T,\rho_{0})\ =\ \int_{\Omega}V(0,x)\rho_{0}(x)\,\mathrm{d}x.

Proof. Let us temporarily assume that Ω\Omega is bounded. In this case, the auxiliary optimization problem

𝔍(μ)=def\displaystyle\mathfrak{J}(\mu)\ \overset{\mathrm{def}}{=}\ min(ρ,ν)𝒦𝕋\displaystyle\underset{(\rho,\nu)\in\mathcal{K}_{\mathbb{T}}}{\min} 0T𝔏(ρ,ν)dt\displaystyle\ \int_{0}^{T}\mathfrak{L}(\rho,\nu)\,\mathrm{d}t (60)
s.t.\displaystyle\mathrm{s.t.} {ρ˙=𝒜ρ+νonΩ𝕋ν=μρonΩ𝕋ρ0=ρ(0),onΩ0=(ϵρfρGν)nonΣ𝕋\displaystyle\left\{\begin{array}[]{rcll}\dot{\rho}&=&\mathcal{A}\rho+\mathcal{B}\nu&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] \nu&=&\mu\rho&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] \rho_{0}&=&\rho(0),&\text{on}\ \Omega\\[4.55254pt] 0&=&\left(\epsilon\nabla\rho-f\rho-G\nu\right)^{\intercal}n&\text{on}\ \Sigma_{\mathbb{T}}\end{array}\right. (65)

satisfies Slater’s weak constraint qualification for uniformly parabolic PDE-constrained convex optimization problems on a bounded domain, recalling that 𝔏\mathfrak{L} is strongly convex in ν\nu. Consequently, it follows from a standard existence result for PDE-constrained optimization problems [28, Chapter 1.5] that (60) has for all parameters μ𝔘\mu\in\mathfrak{U} and given ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}) a unique minimizer (ρμ,νμ)𝒦𝕋(\rho_{\mu},\nu_{\mu})\in\mathcal{K}_{\mathbb{T}}. Next, let V𝒫(ρ)V\in\mathcal{P}(\rho_{\infty}) denote the unique weak solution of the uniformly parabolic linear PDE

V˙=[μ]+F[μ]V+ϵΔVonΩ𝕋0=V(T)onΩ0=VnonΣ𝕋\displaystyle\begin{array}[]{rcll}-\dot{V}&=&\ell[\mu^{\star}]+F[\mu^{\star}]^{\intercal}\nabla V+\epsilon\Delta V&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] 0&=&V(T)&\text{on}\ \Omega\\[4.55254pt] 0&=&\nabla V^{\intercal}n&\text{on}\ \Sigma_{\mathbb{T}}\end{array} (69)

at μ=ν/ρ\mu^{\star}=\nu^{\star}/\rho^{\star}, where (ρ,ν)𝒦𝕋(\rho^{\star},\nu^{\star})\in\mathcal{K}_{\mathbb{T}} denotes—as in Theorem 1—the minimizer of (45). This notation is such that the relations ρ=ρμ\rho^{\star}=\rho_{\mu^{\star}} and ν=νμ\nu^{\star}=\nu_{\mu^{\star}} hold. Let

χ(ρ,μ)\displaystyle\chi(\rho,\mu) =def\displaystyle\ \overset{\mathrm{def}}{=}\ 0T{V˙+[μ]+F[μ]V,ρ\displaystyle\int_{0}^{T}\{\langle\dot{V}+\ell[\mu]+F[\mu]^{\intercal}\nabla V,\rho\rangle
ϵρV}dt+ΩV(0,x)ρ0(x)dx\displaystyle\quad\ -\epsilon\nabla\rho^{\intercal}\nabla V\}\,\mathrm{d}t+\int_{\Omega}V(0,x)\rho_{0}(x)\,\mathrm{d}x

denote an associated parametric Lagrangian at VV. Because VV is a weak solution of (69), the function χ(,μ)\chi(\cdot,\mu^{\star}) is constant on 𝒫(ρ1)\mathcal{P}(\rho_{\infty}^{-1}) and satisfies

ρ𝒫(ρ1),χ(ρ,μ)=ΩV(0,x)ρ0(x)dx.\displaystyle\forall\rho\in\mathcal{P}(\rho_{\infty}^{-1}),\quad\chi(\rho,\mu^{\star})\ =\ \int_{\Omega}V(0,x)\rho_{0}(x)\,\mathrm{d}x\;. (70)

Moreover, the Lagrangian χ\chi is closely related to the Hamiltonian HH. In order to elaborate on this relation, the auxiliary control law

μ^=defu(,V),\hat{\mu}\ \overset{\mathrm{def}}{=}\ u^{\star}(\cdot,\nabla V)\;,

is introduced, recalling that uu^{\star} is Lipschitz continuous. Thus, it satisfies μ^𝔘\hat{\mu}\in\mathfrak{U} and

infμ𝔘χ(ρ,μ)\displaystyle\inf_{\mu\in\mathfrak{U}}\ \chi(\rho,\mu) =\displaystyle\ =\ χ(ρ,μ^)\displaystyle\chi(\rho,\hat{\mu})
=\displaystyle= 0TΩ{ρH(x,V)ϵρV}dxdt\displaystyle\int_{0}^{T}\int_{\Omega}\left\{\rho H(x,\nabla V)-\epsilon\nabla\rho^{\intercal}\nabla V\right\}\mathrm{d}x\,\mathrm{d}t
+0TV˙,ρdt+ΩV(0,x)ρ0(x)dx\displaystyle+\int_{0}^{T}\langle\dot{V},\rho\rangle\,\mathrm{d}t+\int_{\Omega}V(0,x)\rho_{0}(x)\,\mathrm{d}x

for all ρ𝒫(ρ1)\rho\in\mathcal{P}(\rho_{\infty}^{-1}) with ρ0\rho\geq 0. Moreover, by regarding VV as a test function, it follows that

𝔍(μ)\displaystyle\mathfrak{J}(\mu) =\displaystyle\ =\ 0T{𝔏(ρμ,μρμ)ρ˙μ,V\displaystyle\int_{0}^{T}\left\{\mathfrak{L}(\rho_{\mu},\mu\rho_{\mu})-\langle\dot{\rho}_{\mu},V\rangle\right. (72)
+a(ρμ,V)+b(μρμ,V)}dt\displaystyle\qquad\qquad\qquad\quad\left.+a(\rho_{\mu},V)+b(\mu\rho_{\mu},V)\right\}\,\mathrm{d}t
=\displaystyle= χ(ρμ,μ),\displaystyle\chi(\rho_{\mu},\mu)\;,

for all μ𝔘\mu\in\mathfrak{U}, where the second equation in (72) follows by substituting the definitions of a,b,χa,b,\chi, and 𝔏\mathfrak{L} and a partial integration with respect to time. Next, an important observation is that

𝔍(μ)\displaystyle\mathfrak{J}(\mu^{\star}) =(72)\displaystyle\ \overset{\eqref{eq::dualJP}}{=}\ χ(ρ,μ)=(70)χ(ρμ^,μ)infμ𝔘χ(ρμ^,μ)\displaystyle\chi(\rho^{\star},\mu^{\star})\ \overset{\eqref{eq::chiconst}}{=}\ \chi(\rho_{\hat{\mu}},\mu^{\star})\ \geq\ \inf_{\mu\in\mathfrak{U}}\chi(\rho_{\hat{\mu}},\mu) (73)
=(4.2)\displaystyle\ \overset{\eqref{eq::chi}}{=}\ χ(ρμ^,μ^)=(72)𝔍(μ^).\displaystyle\chi(\rho_{\hat{\mu}},\hat{\mu})\ \overset{\eqref{eq::dualJP}}{=}\ \mathfrak{J}(\hat{\mu})\;.

Since μ=ν/ρ\mu^{\star}=\nu^{\star}/\rho^{\star} is—due to Theorem 1—the unique minimizer of the optimization problem

𝔍(μ)=minμ𝔘𝔍(μ)𝔍(μ^)(73)𝔍(μ)=J(μ^),\mathfrak{J}(\mu^{\star})\ =\ \min_{\mu\in\mathfrak{U}}\ \mathfrak{J}(\mu)\ \leq\ \mathfrak{J}(\hat{\mu})\quad\overset{\eqref{eq::Jineq}}{\Longrightarrow}\quad\mathfrak{J}(\mu^{\star})\ =\ J(\hat{\mu}),

it follows from (73) that μ^=μ\hat{\mu}=\mu^{\star}. Moreover, by substituting the equation μ=μ^\mu^{\star}=\hat{\mu} in (69), we find that VV satisfies (59); and, finally, (70) and (72) yield

J(T,ρ0)=𝔍(μ)=ΩV(0,x)ρ0(x)dx\quad J(T,\rho_{0})\ =\ \mathfrak{J}(\mu^{\star})\ =\ \int_{\Omega}V(0,x)\rho_{0}(x)\,\mathrm{d}x

for all ρ0𝒫(ρ1)\rho_{0}\in\mathcal{P}(\rho_{\infty}^{-1}). This implies that J(T,)J(T,\cdot) is a bounded linear operator. Next, if HJB (59) had a second solution V~V\widetilde{V}\neq V, we could repeat the above construction by using the control law

μ~=u(,V~)\widetilde{\mu}\ =\ u^{\star}(\cdot,\nabla\widetilde{V})

in order to find yet another representation of the operator J(T,)J(T,\cdot), namely,

J(T,ρ0)=ΩV~(0,x)ρ0(x)dxJ(T,\rho_{0})\ =\ \int_{\Omega}\widetilde{V}(0,x)\rho_{0}(x)\,\mathrm{d}x

for any ρ0𝒫(ρ1)\rho_{0}\in\mathcal{P}(\rho_{\infty}^{-1}). But this is impossible, as the Riesz representation of the bounded linear operator J(T,)J(T,\cdot) must be unique. Thus, HJB (59) has a unique solution and we have proven both statements of the theorem under the additional assumption that Ω\Omega is bounded.


Last but not least, if Ω\Omega is unbounded, the same strategy as in the proof of Lemma 2 can be applied. This means that we repeat the above argument by replacing the domain Ω\Omega in (45) with an increasing sequence of bounded domains Ωαk\Omega_{\alpha_{k}} in order to construct an associated sequence of bounded linear operators, Jk(T,)J_{k}(T,\cdot), that converges to J(T,)J(T,\cdot) for kk\to\infty. Since Theorem 1 has already established the fact that J(T,)J(T,\cdot) is bounded for finite TT but potentially unbounded domains Ω\Omega, it follows that J(T,)J(T,\cdot) is a bounded linear operator. Note that this is equivalent to the convergence of the corresponding sequence of Riesz’ representations Vk𝒫(ρ)V_{k}\in\mathcal{P}(\rho_{\infty}) of the operators

Jk(T,ρ0)=ΩαkVk(0,x)ρ0(x)dx,J_{k}(T,\rho_{0})\ =\ \int_{\Omega_{\alpha_{k}}}V_{k}(0,x)\rho_{0}(x)\,\mathrm{d}x,

since in Hilbert spaces one can rely on the natural isometric isomorphism between linear operators and their Riesz’ representations. Because HH is Lipschitz continuous, the limit of the sequence VkV_{k} for kk\to\infty is the unique weak solution of (59). This completes the proof of the lemma. \diamond

Remark 4

As already mentioned above, parabolic HJBs are among the most well-studied nonlinear PDEs in the literature [9, 23, 33]. Obviously, Lemma 3 adds yet another set of assumptions under which such HJBs admit a unique solution. The relevance of this result, however, is that—different from all other methods for analyzing the existence of solutions of HJBs—the proof of Lemma 3 starts from the convex optimization problem (45) and then derives the HJB (59) by constructing an associated dual problem. As discussed in the following sections, this convex optimization based derivation of HJBs turns out to have many advantages, as it enables us to analyze HJBs by leveraging on modern functional analysis methods for linear FPKs.

5 Exponential Stabilizability

Carré Du Champ (CDC) operators have originally been introduced in the French literature on functional analysis [6]. They can be used to analyze general diffusions. The following section introduces a variant of such CDC operators in order to establish global exponential stabilizability conditions for controlled FPKs.

5.1 Weighted Carré Du Champ Operators

The weighted CDC operator Γ\Gamma is a bilinear form,

φ,ϕC(Ω),Γ(φ,ϕ)=defφPϕ.\displaystyle\forall\varphi,\phi\in C^{\infty}(\Omega),\qquad\Gamma(\varphi,\phi)\ \overset{\mathrm{def}}{=}\ \nabla\varphi^{\intercal}P\nabla\phi\;. (74)

Here, PC2(Ω)nx×nxP\in C^{2}(\Omega)^{n_{x}\times n_{x}} denotes a weight that satisfies λ¯IPλ¯I\underline{\lambda}I\preceq P\preceq\overline{\lambda}I for given positive lower- and upper bounds, 0<λ¯λ¯<0<\underline{\lambda}\leq\overline{\lambda}<\infty. For the special case that P=ϵIP=\epsilon I is constant and equal to the scaled unit matrix ϵI\epsilon I, Γ\Gamma coincides with the traditional CDC operator of the diffusion semigroup that is generated by the linear operator

ϕC(Ω),[μ]ϕ=defF[μ]ϕ+ϵΔϕ.\forall\phi\in C^{\infty}(\Omega),\qquad\mathcal{L}[\mu]\phi\ \overset{\mathrm{def}}{=}\ F[\mu]^{\intercal}\nabla\phi+\epsilon\Delta\phi.

It is defined for all control laws μ𝔘\mu\in\mathfrak{U}, although the right-hand of the CDC inequality

12([μ]ϕ22ϕ[μ]ϕ)=ϵϕ22ϵλ¯Γ(ϕ,ϕ)\displaystyle\frac{1}{2}\left(\mathcal{L}[\mu]\phi^{2}-2\phi\mathcal{L}[\mu]\phi\right)\ =\ \epsilon\left\|\nabla\phi\right\|_{2}^{2}\ \leq\ \frac{\epsilon}{\underline{\lambda}}\,\Gamma(\phi,\phi) (75)

is an invariant, in the sense that it does not depend on μ\mu.

5.2 Lyapunov Diffusion Operators

The design of potentially non-trivial weight functions PP turns out to be important for the construction of practical stabilizability conditions. In order to explain why this is so, the Lyapunov diffusion operator

[μ]P=def12([μ]PF[μ]PPF[μ]),\displaystyle\mathcal{R}[\mu]P\ \overset{\mathrm{def}}{=}\ \frac{1}{2}\left(\mathcal{L}[\mu]P-F^{\prime}[\mu]P-PF^{\prime}[\mu]^{\intercal}\right), (76)

is introduced for arbitrary μ𝔘\mu\in\mathfrak{U}. Here, [μ]\mathcal{L}[\mu] is applied to PC2(Ω)nx×nxP\in C^{2}(\Omega)^{n_{x}\times n_{x}} componentwise and F[μ]F^{\prime}[\mu] denotes the distributional Jacobian of F[μ]F[\mu]. This means that we define

([μ]P)i,j\displaystyle(\mathcal{L}[\mu]P)_{i,j} =def\displaystyle\ \overset{\mathrm{def}}{=}\ F[μ]Pi,j+ϵΔPi,j\displaystyle F[\mu]^{\intercal}\nabla P_{i,j}+\epsilon\Delta P_{i,j}
and(F[μ])i,j\displaystyle\text{and}\qquad(F^{\prime}[\mu])_{i,j} =def\displaystyle\overset{\mathrm{def}}{=} xjFi[μ]\displaystyle\partial_{x_{j}}F_{i}[\mu]

for all components i,j{1,,nx}i,j\in\{1,\ldots,n_{x}\}, where xj\partial_{x_{j}} denotes the distributional derivative with respect to xjx_{j}. In the following, we use the convention that a formal symmetric matrix-valued distribution YY on Ω\Omega is positive semi-definite, writing Y0Y\succeq 0, if

ΩTr(YZ)dx 0\int_{\Omega}\mathrm{Tr}(YZ)\,\mathrm{d}x\ \geq\ 0

for all symmetric and positive semi-definite test functions ZC0(Ω)nY×nYZ\in C_{0}^{\infty}(\Omega)^{n_{Y}\times n_{Y}} with compatible dimension. Linear matrix inequalities (LMIs) are then understood in an analogous manner.

Assumption 4

Let μ=ν/ρ\mu_{\infty}=\nu_{\infty}/\rho_{\infty} denote the optimal ergodic control law as defined in Corollary 1. We assume that there exists a λ>0\lambda>0 such that the LMI

([μ]P(ϵP)(ϵP)I(ϵP))λ(P 000)onΩ\displaystyle\left(\begin{array}[]{cc}\mathcal{R}[\mu_{\infty}]P&\nabla^{\intercal}\otimes(\epsilon P)\\[4.55254pt] \nabla\otimes(\epsilon P)&I\otimes(\epsilon P)\end{array}\right)\ \succeq\ \lambda\left(\begin{array}[]{cc}P&\ 0\\ 0&0\end{array}\right)\ \ \text{on}\ \ \Omega\ \ (81)

admits a positive definite solution PC2(Ω)nx×nxP\in C^{2}(\Omega)^{n_{x}\times n_{x}} with λ¯IPλ¯P\underline{\lambda}I\preceq P\preceq\overline{\lambda}P on Ω\Omega. Here, \otimes denotes the Kronecker product.


Examples for linear and nonlinear control systems for which Assumption 4 holds can be found below.

Example 3

Let f(x)=Axf(x)=Ax and G(x)=BG(x)=B be a linear control system on Ω=nx\Omega=\mathbb{R}^{n_{x}} with (A,B)(A,B) being asymptotically stabilizable and (x,u)=x2+u2\ell(x,u)=\|x\|^{2}+\|u\|^{2}. In this case, μ=Kx\mu_{\infty}=Kx is the well-known Linear-Quadratic-Gaussian (LQG) regulator and Assumption 4 holds for a constant Pnx×nxP\in\mathbb{R}^{n_{x}\times n_{x}}. This is because for a constant P0P\succ 0, the equations [μ]P=0\mathcal{L}[\mu]P=0 and P=0\nabla\otimes P=0 hold such that LMI (81) reduces to

[μ]P=(A+BK)P+P(A+BK)2λP.\mathcal{R}[\mu]P\ =\ -\frac{(A+BK)P+P(A+BK)^{\intercal}}{2}\ \succeq\ \lambda P\;.

This LMI has a strictly positive solution P0P\succ 0 for sufficiently small λ>0\lambda>0, since we assume that (A,B)(A,B) is asymptotically stabilizable, or, equivalently, that the eigenvalues of A+BKA+BK have strictly negative real parts.


Example 4

Let us consider a stochastic system with f(x)=x2x3f(x)=\frac{x}{2}-x^{3}, G(x)=0G(x)=0, and ϵ=1\epsilon=1 on Ω=\Omega=\mathbb{R}. It leads to an SDE with multi-modal limit distribution,

ρ(x)=1Cexp(x2x44),\rho_{\infty}(x)=\frac{1}{C}\exp\left(\frac{x^{2}-x^{4}}{4}\right),

where the constant CC can be chosen such that the integral over ρ\rho_{\infty} is equal to 11. In this example, (81) cannot be satisfies for a constant function PP, but it is possible to find other classes of functions which satisfy (81). For instance, Assumption 4 can be satisfied by setting

P(x)=112ex2,λ=120,λ¯=12,andλ¯=1.P(x)=1-\frac{1}{2}e^{-x^{2}},\ \ \lambda=\frac{1}{20},\ \ \underline{\lambda}=\frac{1}{2},\ \ \text{and}\ \ \overline{\lambda}=1\;.

Apart from the above examples, another reason why Assumption 4 appears appropriate for our purposes, is elaborated in the following section. Namely, it turns out that LMI (81) is satisfied for an even broader class of problems than the original Bakry-Emery condition for FPKs, which is—at least in the FPK literature—well-accepted as a useful and practical condition; see, for instance, [3, 6, 16, 35], where many examples are collected.

5.3 Exponential Convergence Rate

The following lemma interprets LMI (81) as a generalized Bakry-Emery condition.

Lemma 4

Let Assumptions 12, and 4 hold and let ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}) be given. Then the controlled FPK

ρ˙=𝒜ρ+(μρ)onΩ𝕋ρ0=ρ(0)onΩ0=(ρF[μ]ρ)n|ΩonΣ𝕋\displaystyle\begin{array}[]{rcll}\dot{\rho}&=&\mathcal{A}\rho+\mathcal{B}(\mu_{\infty}\rho)&\text{on}\ \ \Omega_{\mathbb{T}}\\[4.55254pt] \rho_{0}&=&\rho(0)&\text{on}\ \ \Omega\\[4.55254pt] 0&=&\left.\left(\nabla\rho-F[\mu_{\infty}]\rho\right)^{\intercal}n\right|_{\partial\Omega}&\text{on}\ \ \Sigma_{\mathbb{T}}\\ \end{array} (85)

has a unique solution ρ𝒫(ρ1)\rho\in\mathcal{P}(\rho_{\infty}^{-1}) that converges to the optimal ergodic density function ρ\rho_{\infty} in L2(ρ1)L^{2}(\rho_{\infty}^{-1}) with exponential convergence rate γ=2λ(λ¯/λ¯)\gamma=2\lambda\cdot\left(\underline{\lambda}/\overline{\lambda}\right); that is,

ρ(t)ρL2(ρ1)eγtρ0ρL2(ρ1)\displaystyle\|\rho(t)-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}\ \leq\ e^{-\gamma t}\|\rho_{0}-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})} (86)

for all t>0t>0.

Proof. In order to interpret LMI (81) as a generalized Bakry-Emery condition, we work out the Γ2\Gamma_{2}-operator

Γ2(ϕ)\displaystyle\Gamma_{2}(\phi) =def\displaystyle\ \overset{\mathrm{def}}{=}\ 12([μ]Γ(ϕ,ϕ)2Γ(ϕ,[μ]ϕ))\displaystyle\dfrac{1}{2}\left(\mathcal{L}[\mu_{\infty}]\Gamma(\phi,\phi)-2\Gamma(\phi,\mathcal{L}[\mu_{\infty}]\phi)\right) (93)
=\displaystyle= 12F[μ](ϕPϕ)+ϵΔ(ϕPϕ)\displaystyle\ \dfrac{1}{2}F[\mu_{\infty}]^{\intercal}\nabla(\nabla\phi^{\intercal}P\nabla\phi)+\epsilon\Delta(\nabla\phi^{\intercal}P\nabla\phi)
ϕP(F[μ]ϕ+ϵΔϕ)\displaystyle\hphantom{F[\mu_{\infty}]}-\nabla\phi P\nabla(F[\mu_{\infty}]^{\intercal}\nabla\phi+\epsilon\Delta\phi)
=\displaystyle=\ 12ϕ[μ]Pϕ+2ϵ(ϕ)(P)ϕ\displaystyle\dfrac{1}{2}\nabla\phi^{\intercal}\mathcal{L}[\mu_{\infty}]P\nabla\phi+2\epsilon(\nabla\otimes\nabla\phi)^{\intercal}(\nabla\otimes P)\nabla\phi
+ϵ(ϕ)(IP)(ϕ)\displaystyle\hphantom{F[\mu_{\infty}]}+\epsilon(\nabla\otimes\nabla\phi)^{\intercal}(I\otimes P)(\nabla\otimes\nabla\phi)
ϕPF[μ]ϕ\displaystyle\hphantom{F[\mu]}-\nabla\phi PF^{\prime}[\mu_{\infty}]^{\intercal}\nabla\phi
=\displaystyle= (ϕϕ)([μ]PϵPϵPϵIP)(ϕϕ)\displaystyle\ \left(\begin{array}[]{c}\nabla\phi\\ \nabla\otimes\nabla\phi\end{array}\right)^{\intercal}\left(\begin{array}[]{cc}\mathcal{R}[\mu_{\infty}]P&\epsilon\nabla^{\intercal}\otimes P\\ \epsilon\nabla\otimes P&\epsilon I\otimes P\end{array}\right)\left(\begin{array}[]{c}\nabla\phi\\ \nabla\otimes\nabla\phi\end{array}\right)

for arbitrary test functions ϕC0(Ω¯)\phi\in C_{0}^{\infty}(\overline{\Omega}). Note that this equation is understood by interpreting all first order derivatives of F[μ]F[\mu_{\infty}] as distributional derivatives. Next, (81) implies that the integral version of the Bakry-Emery condition [6, 35],

ΩΓ2(ϕ)ωdxλΩΓ(ϕ,ϕ)ωdx,\displaystyle\int_{\Omega}\Gamma_{2}(\phi)\omega\,\mathrm{d}x\ \geq\ \lambda\int_{\Omega}\Gamma(\phi,\phi)\omega\,\mathrm{d}x, (94)

is satisfied for all test functions ϕC0(Ω¯)\phi\in C_{0}^{\infty}(\overline{\Omega}) and all weight functions ωH1(Ω)\omega\in H^{1}(\Omega) with ω0\omega\geq 0. This integral inequality remains correct even if we work with distributional Jacobians F[μ]F^{\prime}[\mu_{\infty}], since these distributional derivatives merely enter via the integrand of the left-hand integral in (94), and, consequently, can be resolved by partial integration. It is well-known [16, Thm. 2 & Prop. 5] that (75) and (94) imply that the entropy inequality

Ω(ϕ1)2ρdxϵ2λλ¯ΩΓ(ϕ,ϕ)ρdx,\displaystyle\int_{\Omega}(\phi-1)^{2}\,\rho_{\infty}\,\mathrm{d}x\ \leq\ \frac{\epsilon}{2\lambda\underline{\lambda}}\int_{\Omega}\Gamma(\phi,\phi)\,\rho_{\infty}\,\mathrm{d}x, (95)

holds for all ϕC0(Ω¯)\phi\in C_{0}^{\infty}(\overline{\Omega}), recalling that (ρdx)(\rho_{\infty}\,\mathrm{d}x) is, by construction, an ergodic measure of the diffusion operator [μ]\mathcal{L}[\mu_{\infty}]. Of course, by continuity, it follows that (95) then also holds for all functions ϕH1(ρ)\phi\in H^{1}(\rho_{\infty}). Consequently, it further follows from Lemma 2 that

ddtρ(t)ρL2(ρ1)2\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}\|\rho(t)-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}^{2} \displaystyle\ \leq\ 2ϵλ¯ΩΓ(ϕ,ϕ)ρdx\displaystyle\frac{-2\epsilon}{\overline{\lambda}}\int_{\Omega}\Gamma(\phi,\phi)\rho_{\infty}\,\mathrm{d}x (96)
(95)\displaystyle\overset{\eqref{eq::EntropyIneq}}{\leq} 4λ(λ¯/λ¯)Ω(ϕ1)2ρdx\displaystyle-4\lambda(\underline{\lambda}/\overline{\lambda})\int_{\Omega}(\phi-1)^{2}\,\rho_{\infty}\,\mathrm{d}x
=\displaystyle= 2γρ(t)ρL2(ρ1)2,\displaystyle-2\gamma\|\rho(t)-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}^{2},

where we have set ϕ=ρ/ρ\phi=\rho/\rho_{\infty}. An application of Gronwall’s lemma to (96) yields the energy estimate (86), which, in turn, implies that the unique weak solution ρ𝒫(ρ1)\rho\in\mathcal{P}(\rho_{\infty}^{-1}) of (85) also remains well-defined on infinite horizons.\diamond

6 Infinite Horizon Optimal Control

The technical result of Lemma 2 on the exponential convergence of the state of the optimally controlled FPK can be used to bound the value function of infinite horizon optimal control problems. Here, the main idea is to exploit the strong duality between FPKs and HJBs that has been established by Lemma 3.

6.1 Problem Formulation

This section concerns infinite horizon optimal control problems of the form

J(ρ0)=deflimT(J(T,ρ0)T),\displaystyle J_{\infty}(\rho_{0})\ \overset{\mathrm{def}}{=}\ \lim_{T\to\infty}\ \left(J(T,\rho_{0})-T\ell_{\infty}\right)\;, (97)

recalling that the cost function JJ has been defined in (45). Here, \ell_{\infty} denotes the ergodic average cost that has been defined in (32) and interpreted in Theorem 1. Compared to the ergodic optimal control problem from Theorem 1, the cost JJ_{\infty} of the infinite horizon optimal control problem (97) is more difficult to analyze, as it depends on the initial probability density function ρ0𝒫0(ρ)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}). Conditions under which the limit on the right hand of (97) exists are discussed below.

6.2 Ergodic HJB

Problem (97) is closely related to the ergodic HJB with Neumann boundary condition, given by

=H(x,V)+ϵΔVonΩ0=(V)nonΩ.\displaystyle\begin{array}[]{rcll}\ell_{\infty}&\ =&H(x,\nabla V_{\infty})+\epsilon\,\Delta V_{\infty}&\text{on}\ \ \Omega\\[4.55254pt] 0&=&\left(\nabla V_{\infty}\right)^{\intercal}n&\text{on}\ \ \partial\Omega\;.\end{array} (100)

Weak solutions of (100) are defined as follows, recalling once more that HH is Lipschitz continuous.

Definition 3

A function VH1(ρ)V_{\infty}\in H^{1}(\rho_{\infty}) is called a weak solution of (100) if

Ωρdx=Ω[ρH(x,V)ϵρV]dx\displaystyle\ell_{\infty}\int_{\Omega}\rho\,\mathrm{d}x\ =\ \int_{\Omega}\left[\rho H(x,\nabla V_{\infty})-\epsilon\nabla\rho^{\intercal}\nabla V_{\infty}\right]\mathrm{d}x

for all test functions ρH1(ρ1)\rho\in H^{1}(\rho_{\infty}^{-1}).

The relation between this ergodic HJB and (97) is explained in the following section.

6.3 Solution of the Infinite Horizon Control Problem

The following theorem summarizes a novel result about the existence of weak solutions of ergodic HJBs in the presence of general indefinite stage cost functions.

Theorem 2

If Assumptions 123, and 4 hold, then the limit JJ_{\infty} in (97) exists for all ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}) and there exists a unique VH1(ρ)V_{\infty}\in H^{1}(\rho_{\infty}) such that

ρ0𝒫0(ρ1),J(ρ0)=ΩV(x)ρ0(x)dx.\forall\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}),\qquad J_{\infty}(\rho_{0})\ =\ \int_{\Omega}V_{\infty}(x)\rho_{0}(x)\,\mathrm{d}x\;.

Moreover, VV_{\infty} is a weak solution of the ergodic HJB (100). It satisfies the energy estimate

Ω(V)2ρdx1γ2Ω[μ]2ρdx,\displaystyle\int_{\Omega}(V_{\infty})^{2}\rho_{\infty}\,\mathrm{d}x\ \leq\ \frac{1}{\gamma^{2}}\int_{\Omega}\ell[\mu_{\infty}]^{2}\rho_{\infty}\,\mathrm{d}x, (101)

recalling that γ=2λ(λ¯/λ¯)\gamma=2\lambda\cdot\left(\underline{\lambda}/\overline{\lambda}\right).


Proof. The proof is divided into three parts:


Part I. Let ρ\rho denote the solution of (85) and let ν=μρ\nu=\mu_{\infty}\rho denote its associated input function such that

|𝔏(ρ,ν)|\displaystyle\left|\mathfrak{L}(\rho,\nu)-\ell_{\infty}\right| (51)\displaystyle\overset{\eqref{eq::CS}}{\leq} ρρL2(ρ1)[μ]L2(ρ)\displaystyle\|\rho-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}\|\ell[\mu_{\infty}]\|_{L^{2}(\rho_{\infty})}
(86)\displaystyle\overset{\eqref{eq::expconv}}{\leq} eγtρ0ρL2(ρ1)[μ]L2(ρ).\displaystyle\ e^{-\gamma t}\|\rho_{0}-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}\|\ell[\mu_{\infty}]\|_{L^{2}(\rho_{\infty})}.

The above upper bounded can be used to bound the value function of the auxiliary optimization problem

J+(T,ρ0)=def\displaystyle J^{+}(T,\rho_{0})\,\overset{\mathrm{def}}{=}\, minρ,ν\displaystyle\underset{\rho,\nu}{\min} 0T(𝔏(ρ,ν))dt\displaystyle\ \int_{0}^{T}\left(\mathfrak{L}(\rho,\nu)-\ell_{\infty}\right)\,\mathrm{d}t (102)
s.t.\displaystyle\mathrm{s.t.} {ρ˙=𝒜ρ+νonΩ𝕋ν=μρonΩ𝕋ρ0=ρ(0),onΩ0=(ϵρfρGν)nonΣ𝕋.\displaystyle\left\{\begin{array}[]{rcll}\dot{\rho}&=&\mathcal{A}\rho+\mathcal{B}\nu&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] \nu&=&\mu_{\infty}\rho&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] \rho_{0}&=&\rho(0),&\text{on}\ \Omega\\[4.55254pt] 0&=&\left(\epsilon\nabla\rho-f\rho-G\nu\right)^{\intercal}n&\text{on}\ \Sigma_{\mathbb{T}}.\end{array}\right. (107)

Namely, by integrating the above inequality, we find that

|J+(T,ρ0)|ρ0ρL2(ρ1)γΩ[μ]2ρdx\displaystyle\left|J^{+}(T,\rho_{0})\right|\ \leq\ \frac{\|\rho_{0}-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}}{\gamma}\sqrt{\int_{\Omega}\ell[\mu_{\infty}]^{2}\rho_{\infty}\,\mathrm{d}x} (108)

for all T>0T>0 and all ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}). Moreover, by using an analogous argument as in the proof of Lemma 3, we can use the weak solution of the linear parabolic PDE

V˙+=[μ]+F[μ]V++ϵΔV+onΩ𝕋0=V+(T)onΩ0=(V+)nonΣ𝕋\displaystyle\begin{array}[]{rcll}-\dot{V}^{+}&=&\ell[\mu_{\infty}]-\ell_{\infty}+F[\mu_{\infty}]^{\intercal}\nabla V^{+}+\epsilon\Delta V^{+}&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] 0&=&V^{+}(T)&\text{on}\ \Omega\\[4.55254pt] 0&=&\nabla(V^{+})^{\intercal}n&\text{on}\ \Sigma_{\mathbb{T}}\end{array} (112)

to represent J+(T,ρ)J^{+}(T,\rho) in the form

ρ0𝒫0(ρ1),J+(T,ρ0)=ΩV+(0,x)ρ0(x)dx.\forall\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}),\qquad J^{+}(T,\rho_{0})\ =\ \int_{\Omega}V^{+}(0,x)\rho_{0}(x)\,\mathrm{d}x\;.

Next, because J(T,)J(T,\cdot) is a uniformly bounded linear operator, its limit for TT\to\infty admits a unique Riesz representation,

J+(ρ0)=deflimTJ+(T,ρ0)=ΩV(x)ρ0(x)dx,J_{\infty}^{+}(\rho_{0})\ \overset{\mathrm{def}}{=}\ \lim_{T\to\infty}J^{+}(T,\rho_{0})\ =\ \int_{\Omega}V_{\infty}(x)\rho_{0}(x)\,\mathrm{d}x,

where VH1(ρ)V_{\infty}\in H^{1}(\rho_{\infty}) must be a weak solution of the linear elliptic PDE

=[μ]+F[μ]V+ϵΔVonΩ0=(V)nonΩ\displaystyle\begin{array}[]{rcll}\ell_{\infty}&=&\ell[\mu_{\infty}]+F[\mu_{\infty}]^{\intercal}\nabla V_{\infty}+\epsilon\Delta V_{\infty}&\text{on}\ \Omega\\[4.55254pt] 0&=&\nabla(V_{\infty})^{\intercal}n&\text{on}\ \partial\Omega\end{array} (115)

at μ=ν/ρ\mu_{\infty}=\nu_{\infty}/\rho_{\infty}.


Part II. Let us use the function J+J_{\infty}^{+} as a terminal cost of the finite horizon optimal control problem

J~(T,ρ0)=def\displaystyle\widetilde{J}(T,\rho_{0})\overset{\mathrm{def}}{=} min(ρ,ν)𝒦𝕋\displaystyle\underset{(\rho,\nu)\in\mathcal{K}_{\mathbb{T}}}{\min} 0T(𝔏(ρ,ν))dt+J+(ρ(T))\displaystyle\int_{0}^{T}\left(\mathfrak{L}(\rho,\nu)-\ell_{\infty}\right)\,\mathrm{d}t+J_{\infty}^{+}(\rho(T)) (119)
s.t.\displaystyle\mathrm{s.t.} {ρ˙=𝒜ρ+νonΩ𝕋ρ0=ρ(0),onΩ0=(ϵρfρGν)nonΣ𝕋.\displaystyle\left\{\begin{array}[]{rcll}\dot{\rho}&=&\mathcal{A}\rho+\mathcal{B}\nu&\text{on}\ \Omega_{\mathbb{T}}\\[4.55254pt] \rho_{0}&=&\rho(0),&\text{on}\ \Omega\\[4.55254pt] 0&=&\left(\epsilon\nabla\rho-f\rho-G\nu\right)^{\intercal}n&\text{on}\ \Sigma_{\mathbb{T}}.\end{array}\right.

By using an analogous argument as in Lemma 3, it follows that J~(T,)\widetilde{J}(T,\cdot) is a bounded linear operator, whose unique Riesz representation

J~(T,ρ0)=ΩV~(0,x)ρ0(x)dx\widetilde{J}(T,\rho_{0})\ =\ \int_{\Omega}\widetilde{V}(0,x)\rho_{0}(x)\,\mathrm{d}x

is given by the weak solution of the parabolic HJB

tV~=H(x,V~)+ϵΔV~onΩ𝕋V=V~(T)onΩ0=V~nonΣ𝕋.\displaystyle\begin{array}[]{rcll}-\partial_{t}\widetilde{V}&\ =&H(x,\nabla\widetilde{V})+\epsilon\,\Delta\widetilde{V}&\text{on}\ \ \Omega_{\mathbb{T}}\\[4.55254pt] V_{\infty}&\ =&\widetilde{V}(T)&\text{on}\ \ \Omega\\[4.55254pt] 0&=&\nabla\widetilde{V}^{\intercal}n&\text{on}\ \ \Sigma_{\mathbb{T}}.\end{array} (124)

Since μ\mu_{\infty} is a feasible time-invariant control law for (LABEL:eq::Jtilde), we must have

xΩ,V~(0,x)V(x).\forall x\in\Omega,\quad\widetilde{V}(0,x)\ \leq\ V_{\infty}(x).

Next, let us assume that we can find a ρ0𝒫0(ρ1)\rho_{0}\in\mathcal{P}_{0}(\rho_{\infty}^{-1}) and a T~>0\widetilde{T}>0 for which J~(T~,ρ0)<J+(ρ0)\widetilde{J}(\widetilde{T},\rho_{0})<J_{\infty}^{+}(\rho_{0}). Then the Lebesgue measure of the set

{xΩV~(0,x)<V(x)}\{\ x\in\Omega\ \mid\ \widetilde{V}(0,x)\ <\ V_{\infty}(x)\ \}

must be strictly positive. But this implies that there exists a δ>0\delta>0 such that

TT~,J~(T,ρ)J+(ρ)δ=δ,\displaystyle\forall T\geq\widetilde{T},\qquad\widetilde{J}(T,\rho_{\infty})\ \leq\ J_{\infty}^{+}(\rho_{\infty})-\delta\ =\ -\delta\;, (125)

where δ\delta does not depend on TT. In order to show that this inequality leads to a contradiction, we denote the minimizer of (LABEL:eq::Jtilde) at ρ0=ρ\rho_{0}=\rho_{\infty} by (ρT,νT)(\rho_{T}^{\star},\nu_{T}^{\star}). Due to the above inequality, the ergodic average values

ρ¯=deflimT0TρTdtTandν¯=deflimT0TνTdtT\overline{\rho}\ \overset{\mathrm{def}}{=}\ \lim_{T\to\infty}\frac{\int_{0}^{T}\rho_{T}^{\star}\,\mathrm{d}t}{T}\quad\text{and}\quad\overline{\nu}\ \overset{\mathrm{def}}{=}\ \lim_{T\to\infty}\frac{\int_{0}^{T}\nu_{T}^{\star}\,\mathrm{d}t}{T}

exist in H1(Ω)H^{1}(\Omega), since the optimal trajectories remain bounded and must satisfy

limTρT(T)=ρandlimTνT(T)=ν,\lim_{T\to\infty}\rho_{T}^{\star}(T)\ =\ \rho_{\infty}\quad\text{and}\quad\lim_{T\to\infty}\nu_{T}^{\star}(T)\ =\ \nu_{\infty}\;,

where the limits are understood in H1(Ω)H^{1}(\Omega). The latter claim follows by using an analogous argument as in the proof of Theorem 1, recalling that the ergodic limit (ρ,ν)(\rho_{\infty},\nu_{\infty}) is strictly optimal on infinite horizons. This implies in particular that 0=𝒜ρ¯+ν¯0=\mathcal{A}\overline{\rho}+\mathcal{B}\overline{\nu} as well as

(ρ¯,ν¯)δ,\mathcal{L}(\overline{\rho},\overline{\nu})\ \leq\ \ell_{\infty}-\delta\ ,

due to (125) and Jensen’s inequality, recalling that 𝔏\mathfrak{L} is convex. But this contradicts the definition of \ell_{\infty} in (32). Thus, we must have

T>0,J~(T,)=J+()=J(),\forall T>0,\qquad\widetilde{J}(T,\cdot)\ =\ J_{\infty}^{+}(\cdot)\ =\ J_{\infty}(\cdot),

where the latter equation follows from the fact the first equation holds uniformly and the fact that the contribution of the terminal cost,

limTJ+(ρT(T))= 0\lim_{T\to\infty}J_{\infty}^{+}(\rho_{T}^{\star}(T))\ =\ 0

vanishes for TT\to\infty.


Part III. An immediate consequence of the constructions in Part II is that the unique weak solution of the parabolic HJB (124),

xΩ,V~(t,x)=V(x)\forall x\in\Omega,\qquad\widetilde{V}(t,x)\ =\ V_{\infty}(x)

does not depend on t>0t>0. Thus, VV_{\infty} is a weak solution of the ergodic HJB (100) as claimed by the statement of this theorem. Finally, the energy estimate follows from (108) by using that

VL2(ρ)\displaystyle\|V_{\infty}\|_{L^{2}(\rho_{\infty})} =\displaystyle\ =\ supρ0ρΩV(ρ0ρ)dxρ0ρL2(ρ1)\displaystyle\sup_{\rho_{0}\neq\rho_{\infty}}\frac{\int_{\Omega}V_{\infty}(\rho_{0}-\rho_{\infty})\,\mathrm{d}x}{\|\rho_{0}-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}} (126)
=\displaystyle= supρ0ρJ+(ρ0)ρ0ρL2(ρ1)\displaystyle\sup_{\rho_{0}\neq\rho_{\infty}}\frac{J_{\infty}^{+}(\rho_{0})}{\|\rho_{0}-\rho_{\infty}\|_{L^{2}(\rho_{\infty}^{-1})}}
(108)\displaystyle\overset{\eqref{eq::energy1}}{\leq} 1γΩ[μ]2ρdx.\displaystyle\frac{1}{\gamma}\sqrt{\int_{\Omega}\ell[\mu_{\infty}]^{2}\rho_{\infty}\,\mathrm{d}x}\;.

This completes the proof.\diamond

Remark 5

The above framework can, in principle, be extended for systems with state constraints of the form

𝔼{c(X(t))} 0Ωcρ(t)dx 0\mathbb{E}\{c(X(t))\}\ \leq\ 0\qquad\Longleftrightarrow\qquad\int_{\Omega}c\,\rho(t)\,\mathrm{d}x\ \leq\ 0

for a constraint function cL(Ω)c\in L^{\infty}(\Omega). This can be done by adding the above linear inequality constraint to (45), as long as one restricts the analysis to initial density functions ρ0\rho_{0} for which this linear inequality constraint is strictly feasible, such that Slater’s constraint qualification still holds. Because (45) is then still a convex PDE-constrained optimization problem, it still admits a unique solution. One major difficulty, however, is that, in this case, J(T,)J(T,\cdot) is not a bounded linear operator anymore. This renders Lemma 3 and Theorem 2 inapplicable to systems with stochastic state constraints. Instead, the KKT optimality conditions of (45) have to be analyzed by using the Zowe-Kurcyusz theorem [47], see also [28, Thm. 1.56]. In combination with Rockafellar’s theorem [11, Thm. 22.14] it then follows that the Frechet subdifferential of J(T,)J(T,\cdot) (and also JJ_{\infty}) is, under suitable regularity assumptions, a maximal monotone operator. The practical relevance of such analysis techniques is, however, limited, as the whole point of the HJBs (59) and (100) is that one can compute a value function that is independent of ρ0\rho_{0}. Because this desirable property of the HJB fails to hold in the presence of explicit state constraints, a much more practical heuristic is to choose a function \ell that takes large values in regions of the state space that one wishes to constraint. The optimal controller can then be expected to automatically avoid these regions with high probability whenever possible.

7 Global Optimal Control Algorithm

The results of Lemma 3 and Theorem 2 motivate a completely new numerical method for solving HJBs. This method relies on discretizing the dynamic programming recursion for the bounded linear operator J(T,)J(T,\cdot) rather than on discretizing the value function VV directly.

7.1 Galerkin Projection

The bounded linear operator J(T,)J(T,\cdot), as defined in (45), can be approximated by a finite dimensional linear function 𝖩0()\mathsf{J}_{0}(\cdot) by applying a simple Galerkin projection. This means that we choose basis functions

φ1,φ2,,φ𝗆H1(Ω)\varphi_{1},\varphi_{2},\ldots,\varphi_{\mathsf{m}}\in H^{1}(\Omega)

in order to discretize the states ρ\rho and controls ν\nu. For simplicity of presentation, we assume that this construction is based on given grid points

𝗑1,𝗑2,,𝗑𝗆Ωand𝗍k=kh\mathsf{x}_{1},\mathsf{x}_{2},\ldots,\mathsf{x}_{\mathsf{m}}\in\Omega\quad\text{and}\quad\mathsf{t}_{k}=kh

with step-size h>0h>0 and time index kk\in\mathbb{N} and that Lagrange’s equation holds111Note that the methods in this paper can be extended easily for all kinds of finite element-, model order reduction, and other related numerical methods for linear parabolic PDEs or even tailored methods for FPKs, which might use basis functions that do not satisfy Lagrange’s equation—this equation is merely introduced in order avoid unnecessary technical notation overhead.; that is, φi(𝗑j)=δi,j\varphi_{i}(\mathsf{x}_{j})=\delta_{i,j}, where δi,j=0\delta_{i,j}=0 if iji\neq j and δi,i=1\delta_{i,i}=1. In this case, the discrete approximation of the bounded linear operator J(T,)J(T,\cdot) at T=t𝖭T=t_{\mathsf{N}} has the form

𝖩0(z0)=def\displaystyle\mathsf{J}_{0}(z_{0})\ \overset{\mathrm{def}}{=}\ minz,v\displaystyle\min_{z,v}\ k=0𝖭1i=1𝗆𝗐i(𝗑i,vk,izk,i)zk,i\displaystyle\sum_{k=0}^{\mathsf{N}-1}\sum_{i=1}^{\mathsf{m}}\,\mathsf{w}_{i}\cdot\ell\left(\mathsf{x}_{i},\frac{v_{k,i}}{z_{k,i}}\right)\cdot z_{k,i} (127)
s.t. {k{0,1,,𝖭1},𝖤zk+1=𝖠zk+𝖡vk(zk,i,vk,i)𝕂f.a.i{1,,𝗆}.\displaystyle\left\{\begin{array}[]{l}\forall k\in\{0,1,\ldots,\mathsf{N}-1\},\\[4.55254pt] \mathsf{E}z_{k+1}=\mathsf{A}z_{k}+\mathsf{B}v_{k}\\[4.55254pt] (z_{k,i},v_{k,i})\in\mathbb{K}\quad\text{f.a.}\quad i\in\{1,\ldots,\mathsf{m}\}.\end{array}\right. (131)

Here, 𝖠\mathsf{A} and 𝖡\mathsf{B} are discrete-time system matrices that are found by a Galerkin projection of 𝒜\mathcal{A} and \mathcal{B} in combination with a suitable implicit differential equation solver in order to discretize the system also in time. In this context, 𝖤\mathsf{E} is an invertible mass matrix. The coefficients of the associated truncated series expansions

ρ(𝗍𝗄)i=1𝗆zk,iφiandν(𝗍𝗄)i=1𝗆vk,iφi\rho(\mathsf{t_{k}})\ \approx\ \sum_{i=1}^{\mathsf{m}}z_{k,i}\varphi_{i}\quad\text{and}\quad\nu(\mathsf{t_{k}})\ \approx\ \sum_{i=1}^{\mathsf{m}}v_{k,i}\varphi_{i}

are optimized subject to a conic constraint, given by

𝕂={(z,v)×nu|vz𝕌andz>0}.\mathbb{K}\ =\ \left\{\ (z,v)\in\mathbb{R}\times\mathbb{R}^{n_{u}}\ \middle|\ v\in z\mathbb{U}\ \ \text{and}\ \ z>0\ \right\}.

Moreover, the quadrature weights 𝗐i>0\mathsf{w}_{i}>0 are needed in order to discretize the integral in the objective of (45).

Remark 6

Methods for bounding the discretization error of the above Galerkin projection of (45) in dependence on 𝗆\mathsf{m} and 𝖭\mathsf{N} can be found in [19]. Moreover, we recall that a desirable property of Galerkin methods is that—at least for appropriate choices of the basis functions—Markov’s conservation laws

𝟣𝖤1𝖠=𝟣and𝟣𝖤1𝖡=0\mathsf{1}^{\intercal}\mathsf{E}^{-1}\mathsf{A}=\mathsf{1}^{\intercal}\qquad\text{and}\qquad\mathsf{1}^{\intercal}\mathsf{E}^{-1}\mathsf{B}=0

hold. Similarly, accurate discretization schemes either satisfy (or simply enforce) the condition

𝖠+𝖡diag(𝗎)> 0\mathsf{A}+\mathsf{B}\cdot\mathrm{diag}(\mathsf{u})\ >\ 0

componentwise for all vectors 𝗎\mathsf{u}, with 𝗎i𝕌\mathsf{u}_{i}\in\mathbb{U}, such that 𝗓k>0\mathsf{z}_{k}>0 implies 𝗓k+1>0\mathsf{z}_{k+1}>0 for all feasible inputs.

7.2 Dual perspective function

For the following developments it will be convenient to introduce the Fenchel conjugate of the stage cost,

𝖽(x,λ)=defminu𝕌{(x,u)+λu}.\mathsf{d}(x,\lambda)\ \overset{\mathrm{def}}{=}\ \min_{u\in\mathbb{U}}\,\left\{\ell(x,u)+\lambda^{\intercal}u\right\}\;.

Fortunately, in practical applications, one can find an explicit expression for 𝖽\mathsf{d}. For instance if \ell is a diagonal quadratic form in uu and 𝕌\mathbb{U} a simple interval box, a single evaluation of 𝖽\mathsf{d} with modern processors can be done efficiently, usually, in a couple of nanoseconds. In this case, the dual perspective function

𝖣(λ)=def(𝖽(𝗑1,λ1/𝗐1)𝗐1𝖽(𝗑𝗆,λ𝗆/𝗐𝗆)𝗐𝗆)\mathsf{D}(\lambda)\ \overset{\mathrm{def}}{=}\ \left(\begin{array}[]{c}\mathsf{d}(\mathsf{x}_{1},\lambda_{1}/\mathsf{w}_{1})\cdot\mathsf{w}_{1}\\ \vdots\\ \mathsf{d}(\mathsf{x}_{\mathsf{m}},\lambda_{\mathsf{m}}/\mathsf{w}_{\mathsf{m}})\cdot\mathsf{w}_{\mathsf{m}}\end{array}\right)

can be evaluated with reasonable effort, even if the number of basis functions, 𝗆\mathsf{m}, is very large.

7.3 Operator-Theoretic Dynamic Programming

Assuming that the stochastic optimal control problem is discretized as above, it can be solved by starting with y𝖭=0y_{\mathsf{N}}=0 and solving the recursion

𝖤yk=𝖠yk+1+𝖣(𝖡yk+1)\displaystyle\mathsf{E}^{\intercal}y_{k}\ =\ \mathsf{A}^{\intercal}y_{k+1}+\mathsf{D}(\,\mathsf{B}^{\intercal}y_{k+1}\,) (132)

backwards in time, for k{𝖭1,, 1, 0}k\in\{\mathsf{N}-1,\,\ldots,\,1,\,0\}. This procedure can be justified directly by observing that the linear cost-to-go function of the discrete optimal control problem (127) is given by 𝖩k(z)=ykEz\mathsf{J}_{k}(z)=y_{k}^{\intercal}Ez. It satisfies the discretized operator-theoretic dynamic programming recursion,

yk𝖤z\displaystyle y_{k}^{\intercal}\mathsf{E}z =\displaystyle\ =\ minvi=1𝗆𝗐i(𝗑i,vizi)zi+yk+1(𝖠z+𝖡v)\displaystyle\min_{v}\sum_{i=1}^{\mathsf{m}}\,\mathsf{w}_{i}\cdot\ell\left(\mathsf{x}_{i},\frac{v_{i}}{z_{i}}\right)\cdot z_{i}+y_{k+1}^{\intercal}(\mathsf{A}z+\mathsf{B}v)
=\displaystyle= [𝖠yk+1+𝖣(𝖡yk+1)]z.\displaystyle\left[\ \mathsf{A}^{\intercal}y_{k+1}+\mathsf{D}\left(\,\mathsf{B}^{\intercal}y_{k+1}\,\right)\ \right]^{\intercal}z\;.

A comparison of coefficients yields (132). Moreover, the associated separable minimizers

uk,i=defargminu𝕌{(xi,u)+yk+1𝖡iu𝗐i}u_{k,i}^{\star}\ \overset{\mathrm{def}}{=}\ \underset{u\in\mathbb{U}}{\mathrm{argmin}}\ \left\{\ \ell(x_{\mathrm{i}},u)+\frac{y_{k+1}^{\intercal}\mathsf{B}_{i}u}{\mathsf{w}_{i}}\ \right\}

can be used to recover an approximation of the optimal control law μ=ν/ρ\mu^{\star}=\nu^{\star}/\rho^{\star} at the minimizer (ρ,ν)(\rho^{\star},\nu^{\star}) of (45),

μ(𝗍k,𝗑i)uk,i.\mu^{\star}(\mathsf{t}_{k},\mathsf{x}_{i})\ \approx\ u_{k,i}^{\star}\;.

The computational complexity for computing this approximation of μ\mu^{\star} is given by 𝒪(𝗆𝖭)\mathcal{O}(\mathsf{m}\cdot\mathsf{N}). Finally, regarding the solution of the infinite horizon control problem, one option is to keep on iterating (132) backward in time until a suitable termination criterion is satisfied, for example, until

ukuk+1𝖳𝖮𝖫\|u_{k}-u_{k+1}\|_{\infty}\ \leq\ \mathsf{TOL}

for a user specified tolerance TOL>0\mathrm{TOL}>0. This method converges for kk\to-\infty as long as the conditions of Theorem 2 are satisfied—assuming that the above finite element discretization is reasonably accurate. Alternatively, one can use a large-scale convex optimization problem solver to solve (127) for 𝖭=1\mathsf{N}=1 but subject to the additional stationarity constraint z1=z0z_{1}=z_{0}.

Remark 7

As reviewed in the introduction, the duality between HJBs and optimally controlled FPKs is known since a long time and, in fact, in [39] a method for solving HJBs via FPK-constrained convex optimization can be found. The operator-theoretic dynamic programming recursion (132) is, however, an original contribution of this paper. It is entirely explicit and does—at least if the Fenchel conjugate dd admits an explicit expression—not even require an optimization problem solver at all.

Refer to caption
Figure 1: LEFT: the optimal ergodic control law μ:Ω𝕌\mu_{\infty}:\Omega\to\mathbb{U} on the domain Ω=[3,3]2\Omega=[-3,3]^{2} for the control bounds 𝕌=[4,4]\mathbb{U}=[-4,4]. MIDDLE: simulations of the states of the closed-loop system without noise for selected initial values (red dots), which converge to a non-trivial attractor set. UPPER RIGHT: the indefinite solution VV_{\infty} of the ergodic HJB that corresponds to the unique Riesz representation of the cost function, as introduced in Theorem 2. LOWER RIGHT: the optimal multi-modal steady state ρ\rho_{\infty}.

7.4 Numerical Illustration

This section is about a nonlinear stochastic system with two states and one control input, given by

f(x)=(x212x1x2x1)G(x)=(01),ϵ=0.2,f(x)=\left(\begin{array}[]{c}x_{2}-\frac{1}{2}x_{1}x_{2}\\ -x_{1}\end{array}\right)\quad G(x)=\left(\begin{array}[]{c}0\\ 1\end{array}\right),\ \ \epsilon=0.2,

on the domain Ω=[3,3]2\Omega=[-3,3]^{2} and control constraint set 𝕌=[4,4]\mathbb{U}=[-4,4]. An associated non-convex stage cost function is given by

(x,u)=14x12+3(x221)2+12u2.\ell(x,u)=\frac{1}{4}x_{1}^{2}+3\left(x_{2}^{2}-1\right)^{2}+\frac{1}{2}u^{2}\;.

This example trivially satisfies Assumptions 12, and 3. Assumption 4 can be verified numerically finding the exponential decay rate γ0.06\gamma\approx 0.06. Figure 1 visualizes the optimal infinite horizon control law μ\mu_{\infty}, the optimal steady state ρ\rho_{\infty}, and the associated dual solution VV_{\infty} that solves the ergodic HJB at

2.03.\ell_{\infty}\ \approx\ 2.03\;.

This solution has been found by using a Galerkin discretization with Lagrange basis functions on a simple equidistant grid with 𝗆=22500\mathsf{m}=22500 basis functions and N=103N=10^{3} time iterations with h=0.05h=0.05. The table below,

𝗆\mathsf{m} 𝖭\mathsf{N} μ𝗆μL2(Ω)\|\mu_{\infty}^{\mathsf{m}}-\mu_{\infty}^{\star}\|_{L^{2}(\Omega)} 𝖢𝖯𝖴𝗍𝗂𝗆𝖾\mathsf{CPU-time}
100100 10310^{3} 0.170.17 9ms9\,\mathrm{ms}
400400 10310^{3} 0.0150.015 35ms35\,\mathrm{ms}
900900 10310^{3} 0.00560.0056 79ms79\,\mathrm{ms}

,

lists the CPU run-time of a JULIA-prototype implementation of (132) for different choices of 𝗆\mathsf{m}. The third column of this table lists the L2L^{2}-norm between the control law μ𝗆\mu_{\infty}^{\mathsf{m}} that is found by using 𝗆\mathsf{m} basis functions and the control law μ\mu_{\infty}^{\star} that is found by using 2250022500 basis functions.


Note that this example should be regarded as a “toy problem”. More advanced implementations of the above method and case studies are left for future work. Nevertheless, the above example is non-trivial in the sense that it has a multi-modal steady state ρ\rho_{\infty}. Moreover, the associated solution VV_{\infty} of the ergodic HJB turns out to be indefinite. And, finally, the closed-loop system has a non-trivial limit behavior: the plot in the center of Figure 1 shows closed-loop simulations without noise in order to illustrate that the nominal trajectories converge to a non-trivial periodic attractor.

8 Conclusions

This paper has presented methods for analyzing ergodic- and infinite horizon stochastic optimal control problems. In detail, Theorem 1 has established the existence and uniqueness of ergodic optimal controls under Assumptions 1 and 2. These assumptions are satisfied for most nonlinear stochastic control problems of practical relevance, because nonlinear McKean-Vlasov systems on bounded domains with reflecting boundaries always admit a Lyapunov-Has’minskiǐ function. But even if the domain is unbounded such functions are—in contrast to traditional Lyapunov functions—relatively easy to find.


Concerning the analysis of infinite horizon optimal control problems, this paper has proposed to introduce a novel class of Lyapunov diffusion operators that can be used to formulate a generalized Bakry-Emery condition, as summarized in Assumption 4. The relevance of this Lyapunov diffusion operator based LMI has been clarified in Theorem 2, which has used this condition to establish the existence of solutions to ergodic HJB equations for nonlinear optimal control problems with indefinite stage costs.


For the first time, this paper has presented methods for analyzing a very general class of nonlinear HJBs by using functional analysis methods in Hilbert-Sobolev spaces that would usually find applications in the context of linear parabolic PDEs. This has far reaching consequences on the development of global robust optimal control algorithms. This is because the proposed operator-theoretic dynamic programming method renders existing numerical methods for discretizing linear parabolic PDEs fully applicable for solving nonlinear HJBs. The exploitation of the implications of this observation appear to be a fruitful direction for future research on stochastic optimal control.

References

  • [1] M. Arisawa. Ergodic problem for the Hamilton- Jacobi-Bellman equation (II). Ann. Inst. H. Poincaré Anal. Non Linéaire, 14:415–438, 1998.
  • [2] M. Arisawa and P.-L. Lions. On ergodic stochastic control. Communications in Partial Differential Equations, 23(11–12):333–358, 1998.
  • [3] A. Arnold, P. Markowich, G. Toscani, and A. Unterreiter. On convex Sobolev inequalities and the rate of convergence to equilibrium for Fokker-Planck type equations. Comm. Partial Differential Equations, 26(1–2):43–100, 2001.
  • [4] M.S. Aronna and F. Tröltzsch. First and second order optimality conditions for the control of Fokker-Planck equations. ESAIM - Control, Optimisation and Calculus of Variations, 27(15), 2021.
  • [5] V. Aškovic, E. Trélat, and H. Zidani. On the asymptotic behavior of the value function in large time optimal control problems. IFAC-PapersOnLine, 55(16):38–43, 2022.
  • [6] D. Bakry and M. Émery. Diffusions hypercontractives. Séminaire de probabilités, XIX, Lecture Notes in Math. 1123, pages 177–206, 1985.
  • [7] V. Barbu. The controllability of Fokker-Planck equations with reflecting boundary conditions and controllers in diffusion term. SIAM Journal on Control and Optimization, 59(1), 2021.
  • [8] V. Barbu and M. Röckner. The evolution to equilibrium of solutions to nonlinear Fokker-Planck equation. Indiana University Mathematics Journal, 72(1):89–131, 2023.
  • [9] M. Bardi and I. Capuzzo-Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Birkhäuser, 1997.
  • [10] G. Barles and P.E. Souganidis. On the large-time behavior of solutions of Hamilton-Jacobi equations. SIAM Journal on Mathematical Analysis, 31(4):925–939, 2000.
  • [11] H.H. Bauschke and P.L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer, 2011.
  • [12] R.E. Bellman. Dynamic Programming. Princeton University Press, 1957.
  • [13] J.-M. Bismut. Conjugate convex functions in optimal stochastic control. Journal of Mathematical Analysis and Applications, 44:383–404, 1973.
  • [14] S.G. Bobkov, I. Gentil, and M Ledoux. Hypercontractivity of Hamilton-Jacobi equations. Journal de Mathématiques Pures et Appliquées, 80(7):669–696, 2001.
  • [15] V.I. Bogachev, N.V. Krylov, M. Röckner, and S.V. Shaposhnikov. Fokker-Planck-Kolmogorov equations. American Mathematical Society, 2015.
  • [16] F. Bolley and I. Gentil. Phi-entropy inequalities for diffusion semigroups. Journal de Mathématiques Pures et Appliquées, 93:449–473, 2010.
  • [17] T. Breiten, K. Kunisch, and L. Pfeiffer. Control strategies for the Fokker- Planck equation. ESAIM: Control, Optimisation and Calculus of Variations, 24(2):741–763, 2018.
  • [18] T. Breiten, K. Kunisch, and L. Pfeiffer. Infinite-horizon bilinear optimal control problems: sensitivity analysis and polynomial feedback laws. SIAM Journal on Control and Optimization, 56(5):3184–3214, 2018.
  • [19] S. Brenner and R.L. Scott. The mathematical theory of finite element methods. Springer, 2005.
  • [20] S. Chapman. On the Brownian displacements and thermal diffusion of grains suspended in a non-uniform fluid. Proceedings of the Royal Society (A), 119:34–54, 1928.
  • [21] I. Ciotir and R. Fayad. Nonlinear Fokker-Planck equation with reflecting boundary conditions. Journal of Differential Equations, 321(1):296–317, 2022.
  • [22] M.G. Crandall, H. Ishii, and P.-L. Lions. User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the Americal Mathematical Society, 27(1):1–67, 1992.
  • [23] W.H. Fleming and H.M. Soner. Controlled Markov Processes and Viscosity Solutions. Springer, 1993.
  • [24] W.H. Fleming and D. Vermes. Convex duality approach to the control of diffusions. SIAM Journal on Control and Optimization, 27(5):1136–1155, 1989.
  • [25] A.D. Fokker. Die mittlere Energie rotierender elektrischer Dipole im Strahlungsfeld. Annalen der Physik, 4(3):810–820, 1914.
  • [26] L. Gross. Logarithmic Sobolev inequalities. Amer. J. Math., 97(4):1061–1083, 1975.
  • [27] R.Z. Has’minskiǐ. Ergodic properties of recurrent diffusion processes and stabilization of the solution of the Cauchy problem for parabolic equations. Theory of Probability and its Applications, 5:179–196, 1960.
  • [28] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich. Optimization with PDE Constraints. Springer, 2009.
  • [29] W. Huang, M. Ji, Z. Liu, and Y. Yi. Steady states of Fokker-Planck equations: I. Existence. Journal of Dynamics and Differential Equations, 27:721–742, 2015.
  • [30] A.N. Kolmogoroff. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Mathematische Annalen, 104:415–458, 1931.
  • [31] M. Korda and I. Mezić. Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica, 93:149–160, 2018.
  • [32] N.V. Krylov. Nonlinear Elliptic and Parabolic Equations of the Second Order. Reidel, Dordrecht, 1987.
  • [33] N.V. Krylov. Controlled diffusion processes. Springer, 2008.
  • [34] J.B. Lasserre, D. Henrion, C. Prieur, and E. Trelat. Nonlinear optimal control via occupation measures and LMI relaxations. SIAM Journal on Control and Optimization, 47:1649–1666, 2008.
  • [35] M. Ledoux. On an integral criterion for hypercontractivity of diffusion semigroups and extremal functions. Journal of Functional Analysis, 105(2):444–465, 1992.
  • [36] G. Namah and J.M. Roquejoffre. Remarks on the long time behavior of the solutions of Hamilton-Jacobi equations. Communications on Partial Differential Equations, 24:883–893, 1999.
  • [37] M. Planck. Über einen Satz der statistischen Dynamik und seine Erweiterung in der Quantentheorie. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, pages 324–341, 1917.
  • [38] M.L. Puterman. Optimal control of diffusion processes with reflection. Journal of Optimization Theory and Applications, 22(1), 1977.
  • [39] P. Rutquist, T. Wik, and C. Breitholtz. State constrained optimal control via the Fokker-Planck equation. IFAC-PapersOnLine, 50(1):6303–6307, 2017.
  • [40] A.-S. Sznitman. Nonlinear reflecting diffusion process, and the propagation of chaos and fluctuations associated. Journal of Functional Analysis, 56(3):311–336, 1984.
  • [41] E. Trélat and E. Zuazua. The turnpike property in finite-dimensional nonlinear optimal control. Journal of Differential Equations, 258(1):81–114, 2015.
  • [42] D. Trevisan. Well-posedness of multidimensional diffusion processes. Electronic Journal of Probability, 21:1–41, 2016.
  • [43] M.E. Villanueva, C.N. Jones, and B. Houska. Towards global optimal control via Koopman lifts. Automatica, 132(109610), 2021.
  • [44] R. Vinter. Convex duality and nonlinear optimal control. SIAM Journal on Control and Optimization, 31(2):518–538, 1993.
  • [45] M. von Smoluchowski. Über Brownsche Molekularbewegung unter Einwirkung äußerer Kräfte und den Zusammenhang mit der verallgemeinerten Diffusionsgleichung. Annalen der Physik, 353(24):1103–1112, 1915.
  • [46] W.M. Wonham. Liapunov criteria for weak stochastic stability. Journal of Differential Equations, 2:195–207, 1966.
  • [47] J. Zowe and S. Kurcyusz. Regularity and stability for the mathematical programming problem in Banach spaces. Applied Mathematics and Optimization, 5:49–62, 1979.