This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

A Feynman-Kac Type Theorem for ODEs: Solutions of Second Order ODEs as Modes of Diffusions

Zachary Selk Department of Mathematics, Purdue University, West Lafayette IN 47907, USA.  and  Harsha Honnappa School of Industrial Engineering, Purdue University, West Lafayette IN 47907, USA. [zselk,honnappa]@purdue.edu
Abstract.

In this article, we prove a Feynman-Kac type result for a broad class of second order ordinary differential equations. The classical Feynman-Kac theorem says that the solution to a broad class of second order parabolic equations is the mean of a particular diffusion. In our situation, we show that the solution to a system of second order ordinary differential equations is the mode of a diffusion, defined through the Onsager-Machlup formalism. One potential utility of our result is to use Monte Carlo type methods to estimate the solutions of ordinary differential equations. We conclude with examples of our result illustrating its utility in numerically solving linear second order ODEs.

2010 Mathematics Subject Classification:
60H30, 60G15, 34B05
H. Honnappa is supported by the NSF grant DMS-1812197.

1. Introduction

The Feynman-Kac theorem is a widely known and extensively used formula that relates the solutions of second order parabolic partial differential equations (PDEs) to the average of a transformation of a diffusion. It was introduced in [22] and has since seen tremendous utility and generalization. The Feynman-Kac formula has been extended to stochastic PDE in e.g. [4, 19, 20]. It has also seen utility in physics, see e.g. [1, 8, 17, 24, 27]. It has been used also in interacting particle systems, see e.g. [7, 11, 12, 13]. It has been used in finance, see e.g. [21].

There are many forms of the Feynman-Kac formula, see e.g. [25] for a standard reference. In one form, the Feynman-Kac formula says the following.

Theorem 1 (Feynman-Kac).

Consider the partial differential equation

tu(t,x)+μ(t,x)xu(t,x)+12σ2(t,x)2x2u(t,x)=V(t,x)u(t,x),\frac{\partial}{\partial t}u(t,x)+\mu(t,x)\frac{\partial}{\partial x}u(t,x)+\frac{1}{2}\sigma^{2}(t,x)\frac{\partial^{2}}{\partial x^{2}}u(t,x)=V(t,x)u(t,x),

on the domain [0,T]×[0,T]\times\mathbb{R}, with terminal condition u(T,x)=ψ(x)u(T,x)=\psi(x) and μ,σ,V\mu,\sigma,V sufficiently regular. Then

u(t,x)=Eμ0[etTV(s,X(s))𝑑sψ(X(T))|X(t)=x],u(t,x)=E_{\mu_{0}}[e^{-\int_{t}^{T}V(s,X(s))ds}\psi(X(T))|X(t)=x],

where X(t)X(t) is the solution to the solution to the stochastic differential equation

dX(t)=μ(t,X(t))dt+σ(t,X(t))dB(t),dX(t)=\mu(t,X(t))dt+\sigma(t,X(t))dB(t),

with initial condition X(t)=xX(t)=x, B(t)B(t) a standard Brownian motion and μ0\mu_{0} is the measure associated to B(t)B(t).

Linear second order ordinary differential equations and systems thereof find extensive use in engineering, physics, biology, geometry, control theory and many other areas of pure and applied mathematics. One example of a system of such equations in physics is the matrix Airy equation, which has connections to the intersection theory on the moduli space of curves, see [23]. Further examples abound in the computation of long-run average costs associated with diffusion processes, which can be expressed as solutions of second-order ordinary differential equations (ODEs). In this article, we demonstrate a theorem analogous to Theorem 1 for second order ordinary differential equations. However in the ODE case, solutions are represented as the conditional “mode” of a diffusion instead of a conditional mean. By “mode”, we mean the minimizer of an Onsager-Machlup function. It was first shown in [15] that the Onsager-Machlup function is a Lagrangian for the “most likely path” or “mode” of a diffusion. Our main theorem can be summarized as follows, see Theorem 22 for a full version.

Theorem 2 (Feynman-Kac for ODEs).

Consider the differential equation

(1) y¨(t)+f(t)y˙(t)+g(t)y(t)+h(t)=0,\ddot{y}(t)+f(t)\dot{y}(t)+g(t)y(t)+h(t)=0,

where the unknown yC2([0,T],)y\in C^{2}([0,T],\mathbb{R}), with boundary conditions y(0)=0y(0)=0 and y(T)=ay(T)=a. Let f(t),g(t)f(t),g(t) and h(t)h(t) be (almost) arbitrary C1C^{1} real valued functions. Then there exists a choice of C1([0,T],)C^{1}([0,T],\mathbb{R}) functions A(t)A(t) and D(t)D(t) that depend on f,gf,g and hh so that the solution to (1) is

(2) y(t)=eV(t)Mode[X(t)|X(T)=a],y(t)=e^{-V(t)}\operatorname{Mode}\left[X(t)\bigg{|}X(T)=a\right],

where X(t)=0t[A(s)B(s)+D(s)]𝑑s+B(t)X(t)=\int_{0}^{t}[A(s)B(s)+D(s)]ds+B(t), B(t)B(t) is a standard Brownian motion and additionally V(t)=210tf(s)𝑑sV(t)=2^{-1}\int_{0}^{t}{f(s)}ds. Here, the conditional mode of a diffusion is defined as the minimizer of the Onsager-Machlup function defined in Theorem 13 over all paths zz with z(T)=az(T)=a.

The eV(t)e^{-V(t)} term is analogous to the Feynman-Kac for PDEs case, Theorem 1. The Feynman-Kac formula can first be proven for diffusions with no factor in front of the u(t,x)u(t,x) term. Then including the exponential weight term, one can get more general terms in front of u(t,x)u(t,x).

The primary tool we will use is the Onsager-Machlup formalism established in [15, 10, 26]. In particular, we will extend the Portmanteau theorem in [26], which relates information projections, Kullback-Leibler weighted control, finding the most likely path of a diffusion and Euler-Lagrange equations. In [15], it was shown that the so-called “Onsager-Machlup” function is a Lagrangian for the most likely path of a diffusion. That is, the minimizer is the “mode” of the path. We show that the solution to (almost) any second order linear ODE is the (potentially weighted) conditional mode a diffusion, defined as the minimizer of the Onsager-Machlup function. Additionally, we extend our results to include systems of second order linear ODEs.

In Section 2, we introduce this formalism and the required stochastic analysis. In Section 3 we extend the Portmanteau theorem of [26] for our purposes. In Section 4 we prove Theorem 2. Finally in Section 5 we give some examples.

2. Preliminaries

In this section, we collect some preliminary results on stochastic analysis on Wiener space and finding modes of diffusions through the Onsager-Machlup formalism. For more information, see [3, 10, 18, 26]. We first define classical Wiener space and the Cameron-Martin space of Brownian motion which play a central role in our paper.

Definition 3.

We say that (𝒞0([0,T],n),(),μ0)(\mathcal{C}_{0}([0,T],\mathbb{R}^{n}),\mathcal{B}(\|\cdot\|_{\infty}),\mu_{0}) is classical Wiener space, where 𝒞0[0,T]\mathcal{C}_{0}[0,T] is the space of continuous n\mathbb{R}^{n} valued functions ff on the compact interval [0,T][0,T] with f(0)=0f(0)=0, ()\mathcal{B}(\|\cdot\|_{\infty}) are the Borel sets from the supremum norm \|\cdot\|_{\infty} and μ0\mu_{0} is classical Wiener measure corresponding to a standard nn-dimensional Brownian motion B(t)B(t).

Definition 4.

The Cameron-Martin space μ0\mathcal{H}_{\mu_{0}} associated to classical Wiener space is the Sobolev space W01,2[0,T]W_{0}^{1,2}[0,T] of all absolutely continuous functions whose derivative lie in L2[0,T]L^{2}[0,T]. We also denote the Cameron-Martin norm of an element zμ0z\in\mathcal{H}_{\mu_{0}} by

zμ02:=0Tz˙(t)2𝑑t.\|z\|_{\mu_{0}}^{2}:=\int_{0}^{T}\|\dot{z}(t)\|^{2}dt.

The primary reason why we need the Cameron-Martin space is the following theorem, called the Cameron-Martin theorem.

Theorem 5.

[18, Theorem 3.41] Let μ0\mu_{0} be classical Wiener measure on 𝒞0\mathcal{C}_{0}. For h𝒞0h\in\mathcal{C}_{0}, define the pushforward measure μ:=Th(μ0)\mu:=T_{h}^{\ast}(\mu_{0}) by

Th(μ0)(A)=μ0(Ah),T_{h}^{\ast}(\mu_{0})(A)=\mu_{0}(A-h),

for all Borel A()A\in\mathcal{B}(\|\cdot\|_{\infty}). Then the pushforward measure μ:=Th(μ0)\mu:=T_{h}^{\ast}(\mu_{0}) is absolutely continuous with respect to μ0\mu_{0} if and only hμ0h\in\mathcal{H}_{\mu_{0}}. Furthermore, the Radon-Nikodym derivative of μ\mu with respect to μ0\mu_{0} is

dμdμ0=exp(0Th˙(t)𝑑B(t)120Th˙(t)2𝑑t).\frac{d\mu}{d\mu_{0}}=\exp\left(\int_{0}^{T}\dot{h}(t)\cdot dB(t)-\frac{1}{2}\int_{0}^{T}\|\dot{h}(t)\|^{2}dt\right).

For more information on Cameron-Martin spaces see [18], Chapter 3. We formally define the set of Gaussian shift measures from Theorem 5 to be

(3) 𝒫:={Th(μ0):hμ0}.\mathcal{P}:=\{T_{h}^{\ast}(\mu_{0}):h\in\mathcal{H}_{\mu_{0}}\}.

The Cameron-Martin theorem states that one can shift by “deterministic” paths that lie in the Cameron-Martin space in a way that the pushforward measure is absolutely continuous with respect to μ0\mu_{0}. The generalization to shifting classical Wiener measure by more general processes is Girsanov’s theorem. We provide both a forward and a reverse direction here.

Theorem 6.

[3, Lemma 5.76] Let (𝒞0[0,T],μ0)(\mathcal{C}_{0}[0,T],\mu_{0}) be classical Wiener space, and assume:

(a) FF is a progressively measurable process with respect to the filtration generated by the standard Brownian motion B(t)B(t);

(b) The sample paths are almost surely in the Cameron-Martin space W01,2W_{0}^{1,2};

(c) Novikov’s condition,

Eμ0[exp(0Tf(s)2𝑑s)]<,E_{\mu_{0}}\left[\exp\left(\int_{0}^{T}\|f(s)\|^{2}ds\right)\right]<\infty,

where f(s)=F˙(s)f(s)=\dot{F}(s), holds.

Then, the process B(t)F(t)B(t)-F(t) is a standard Brownian motion under μ\mu with density

dμdμ0=exp(0Tf(s)𝑑B(s)120Tf(s)2𝑑s).\frac{d\mu}{d\mu_{0}}=\exp\left(\int_{0}^{T}f(s)\cdot dB(s)-\frac{1}{2}\int_{0}^{T}\|f(s)\|^{2}ds\right).
Theorem 7.

[3, Theorem 5.72] Let (𝒞0[0,T],μ0)(\mathcal{C}_{0}[0,T],\mu_{0}) be classical Wiener space, and suppose μμ0\mu\sim\mu_{0}. Then, there exists a progressively measurable process F(t)F(t) with sample paths almost surely in the Cameron-Martin space W01,2W_{0}^{1,2} so that the process B(t)F(t)B(t)-F(t) is a standard Brownian motion under μ\mu. Furthermore, the density is given by

dμdμ0=exp(0Tf(s)𝑑B(s)120Tf(s)2𝑑s),\frac{d\mu}{d\mu_{0}}=\exp\left(\int_{0}^{T}f(s)\cdot dB(s)-\frac{1}{2}\int_{0}^{T}\|f(s)\|^{2}ds\right),

where f(s)=F˙(s)f(s)=\dot{F}(s).

We can now introduce Onsager-Machlup formalism for finding the mode of a diffusion. The Onsager-Machlup function is defined as below.

Definition 8.

Let μ0\mu_{0} be a classical Wiener measure on 𝒞0\mathcal{C}_{0}. Let Bδ(z)B^{\delta}(z)\subset\mathcal{B} be the open ball of radius δ\delta around zz. Let μ\mu be another measure that is absolutely continuous with respect to μ0\mu_{0}. If the limit

limδ0μ(Bδ(z2))μ(Bδ(z1))=exp(OM(z1)OM(z2))\lim_{\delta\to 0}\frac{\mu(B^{\delta}(z_{2}))}{\mu(B^{\delta}(z_{1}))}=\exp\left(OM(z_{1})-OM(z_{2})\right)

exists for all z1,z2μ0z_{1},z_{2}\in\mathcal{H}_{\mu_{0}}, then OMOM is called the Onsager-Machlup function for μ\mu.

In order to ensure that the Onsager-Machlup function exists, we need the following hypothesis and theorem taken from [10].

Hypothesis 9.

Let Φ:𝒞0\Phi:\mathcal{C}_{0}\to\mathbb{R} be a functional satisfying the following conditions:

(i) For every ε>0\varepsilon>0 there is an MM\in\mathbb{R} such that

Φ(ω)Mεω2,\Phi(\omega)\geq M-\varepsilon\|\omega\|^{2},

for all ω𝒞0\omega\in\mathcal{C}_{0}.

(ii) Φ\Phi is locally bounded above, i.e., for every r>0r>0 there is a K=K(r)>0K=K(r)>0 such that

Φ(ω)K,\Phi(\omega)\leq K,

for all ω𝒞0\omega\in\mathcal{C}_{0} with ω<r\|\omega\|<r.

(iii) Φ\Phi is locally Lipschitz continuous, i.e., for every r>0r>0 there exists L=L(r)>0L=L(r)>0 such that

|Φ(ω1)Φ(ω2)|Lω1ω2,|\Phi(\omega_{1})-\Phi(\omega_{2})|\leq L\|\omega_{1}-\omega_{2}\|,

for all ω1,ω2𝒞0\omega_{1},\omega_{2}\in\mathcal{C}_{0} with ω1<r\|\omega_{1}\|<r and ω2<r\|\omega_{2}\|<r.

Theorem 10.

[10, Theorem 3.2] Let Φ:𝒞0\Phi:\mathcal{C}_{0}\to\mathbb{R} satisfy Hypothesis 9. Let the measure μ\mu have the density

dμdμ0=eΦEμ0[eΦ].\frac{d\mu}{d\mu_{0}}=\frac{e^{-\Phi}}{E_{\mu_{0}}[e^{-\Phi}]}.

Then, the OM function is given by

OMΦ(z)={Φ(z)+12zμ02 if zμ0 else.OM_{\Phi}(z)=\begin{cases}\Phi(z)+\frac{1}{2}\|z\|_{\mu_{0}}^{2}&\text{ if }z\in\mathcal{H}_{\mu_{0}}\\ \infty&\text{ else}\end{cases}.
Remark 11.

The minimizer of the Onsager-Machlup function can be seen as the “most likely” path of the diffusion, which was shown in [15]. This fact is central to the approach in this paper. Furthermore, we define the conditional mode to minimize OMOM over a region. That is, for a subset Aμ0A\subset\mathcal{H}_{\mu_{0}} and a process XX whose path measure has a density

dμdμ0=eΦEμ0[eΦ],\frac{d\mu}{d\mu_{0}}=\frac{e^{-\Phi}}{E_{\mu_{0}}[e^{-\Phi}]},

we define

Mode[X(t)|A]:=argminzAOMΦ(z),\operatorname{Mode}\left[X(t)\bigg{|}A\right]:=\arg\min_{z\in A}OM_{\Phi}(z),

if the infimum is indeed achieved.

We also recall the definition of Kullback-Leibler divergence. For two probability measures μ1,μ2\mu_{1},\mu_{2} on some measurable space (X,𝒳)(X,\mathcal{X}), the KL divergence is defined by

DKL(μ1||μ2)={Eμ1[log(dμ1dμ2)] if dμ1dμ2 exists  else.D_{KL}(\mu_{1}||\mu_{2})=\begin{cases}E_{\mu_{1}}\left[\log\left(\frac{d\mu_{1}}{d\mu_{2}}\right)\right]&\text{ if }\frac{d\mu_{1}}{d\mu_{2}}\text{ exists }\\ \infty&\text{ else}.\end{cases}

We recall three key properties of the KL-divergence: (i) DKL(μ||μ0)0D_{KL}(\mu||\mu_{0})\geq 0; (ii) DKL(μ||μ0)=0D_{KL}(\mu||\mu_{0})=0 if and only if μ=μ0\mu=\mu_{0}; and (iii) μDKL(μ||μ0)\mu\mapsto D_{KL}(\mu||\mu_{0}) is convex. The next result shows how to compute the KL-divergence between two shift measures drawn from the Cameron-Martin space μ0\mathcal{H}_{\mu_{0}}.

Theorem 12.

[9, Lemma 3.20] Let μ0\mu_{0} be a centered Gaussian measure on \mathcal{B} with Cameron-Martin space μ0\mathcal{H}_{\mu_{0}}, and let μ1=Th1(μ0)\mu_{1}=T_{h_{1}}^{\ast}(\mu_{0}) and μ2=Th2(μ0)\mu_{2}=T_{h_{2}}^{\ast}(\mu_{0}) for some h1,h2μ0h_{1},h_{2}\in\mathcal{H}_{\mu_{0}}. Then, the Radon-Nikodym derivative dμ1dμ2\frac{d\mu_{1}}{d\mu_{2}} exists and

DKL(μ1||μ2)=12h1h2μ02.D_{KL}(\mu_{1}||\mu_{2})=\frac{1}{2}\|h_{1}-h_{2}\|_{\mu_{0}}^{2}.

3. A Portmanteau Theorem

The final result we need is analogous to Theorem 1.1 from [26], a portmanteau theorem that lets us convert from Euler-Lagrange equations, to information projections, to Kullback Leibler (KL) divergence weighted control, to Onsager-Machlup functions for measures on classical Wiener space.

Theorem 13.

Let 𝒞0\mathcal{C}_{0} be a classical Wiener space with Cameron-Martin space μ0\mathcal{H}_{\mu_{0}} (see Definition 4). Let C:𝒞0C:\mathcal{C}_{0}\to\mathbb{R} be a functional so that μ0(C<+)>0\mu_{0}(C<+\infty)>0 and Eμ0[exp(C)|C|]<+E_{\mu_{0}}[\exp(-C)\,|C|]<+\infty. Furthermore, assume that the functional Φ:𝒞0\Phi:\mathcal{C}_{0}\to\mathbb{R} defined by Φ(z)=Eμ0[C(ω+z)]\Phi(z)=E_{\mu_{0}}[C(\omega+z)] exists and satisfies Hypothesis 9. Define the probability measure μ\mu^{\ast} on 𝒞0\mathcal{C}_{0} with density

dμdμ0:=eCEμ0[eC],\frac{d\mu^{\ast}}{d\mu_{0}}:=\frac{e^{-C}}{E_{\mu_{0}}[e^{-C}]},

and the probability measure μ~\tilde{\mu} on 𝒞0\mathcal{C}_{0} with density

dμ~dμ0:=eΦEμ0[eΦ].\frac{d\tilde{\mu}}{d\mu_{0}}:=\frac{e^{-\Phi}}{E_{\mu_{0}}[e^{-\Phi}]}.

Let 𝒫\mathcal{P} be the set of Gaussian shift measures which are absolutely continuous with respect to μ0\mu_{0} (defined in Eq. (3)). Define 𝒫a\mathcal{P}_{a} to be the subset of 𝒫\mathcal{P} defined in (3) defined by

(4) 𝒫a:={Th(μ0):hμ0,h(T)=a}.\mathcal{P}_{a}:=\{T_{h}^{\ast}(\mu_{0}):h\in\mathcal{H}_{\mu_{0}},h(T)=a\}.

Furthermore, suppose that CC is such that that Eμ0[C2]<E_{\mu_{0}}[C^{2}]<\infty and therefore there is a progressively measurable process Γ(t,B(t))\Gamma(t,B(t)) so that C=0TΓ(t,B(t))𝑑B(t)C=\int_{0}^{T}\Gamma(t,B(t))\cdot dB(t) by Itô’s representation theorem. Let L(t,z,z˙)L(t,z,\dot{z}) be the Lagrangian

(5) L(t,q(t),q˙(t);Γ):=Eμ0[q˙(t)Γ(t,B(t)+q(t))2].L(t,q(t),\dot{q}(t);\,\Gamma):=E_{\mu_{0}}\left[\|\dot{q}(t)-\Gamma(t,B(t)+q(t))\|^{2}\right].

Suppose that there exist α>0\alpha>0 and β0\beta\geq 0 such that L(t,x,x˙)α|x˙|2βL(t,x,\dot{x})\geq\alpha|\dot{x}|^{2}-\beta for all t[0,T]t\in[0,T], xnx\in\mathbb{R}^{n}, and x˙n\dot{x}\in\mathbb{R}^{n}. Consider the calculus of variations problem:

Consider the following four optimization problems:

(a) (Information projection)

𝕂:=infμ𝒫aDKL(μ||μ).\mathbb{K}:=\inf_{\mu\in\mathcal{P}_{a}}D_{KL}(\mu||\mu^{\ast}).

(b) (State independent KL-weighted control)

:=infμ𝒫a{Eμ[C]+DKL(μ||μ0)}.\mathbb{P}:=\inf_{\mu\in\mathcal{P}_{a}}\left\{E_{\mu}[C]+D_{KL}(\mu||\mu_{0})\right\}.

(c) (Conditional mode of μ~\tilde{\mu})

𝕄:=infzμ0,z(T)=aOMΦ(z).\mathbb{M}:=\inf_{z\in\mathcal{H}_{\mu_{0}},z(T)=a}OM_{\Phi}(z).

(d) (Calculus of variations)

𝕃:infzW01,2[0,T]{(z,z˙;Γ):=120TL(t,z(t),z˙(t);Γ)dt s.t. z(0)=0,z(T)=a}.\mathbb{L}:\inf_{z\in W_{0}^{1,2}[0,T]}\left\{\mathcal{L}(z,\dot{z};\,\Gamma):=\frac{1}{2}\int_{0}^{T}L(t,z(t),\dot{z}(t);\,\Gamma)dt\text{ s.t. }z(0)=0,z(T)=a\right\}.

Then, the optimal values val(𝕂)\operatorname{val}(\mathbb{K}), val()\operatorname{val}(\mathbb{P}), val(𝕄)\operatorname{val}(\mathbb{M}) and val(𝕃)\operatorname{val}(\mathbb{L}) are all attained (and so all three problems have optimal solutions). By the Cameron-Martin theorem (see Theorem 5), we can identify each shift measure μz𝒫\mu_{z}\in\mathcal{P} with its corresponding shift zμ0z\in\mathcal{H}_{\mu_{0}}, and so we have

sol(𝕂)=sol()sol(𝕄)=sol(𝕃).\operatorname{sol}(\mathbb{K})=\operatorname{sol}(\mathbb{P})\equiv\operatorname{sol}(\mathbb{M})=\operatorname{sol}(\mathbb{L}).

In the above, we have used equivalence “\equiv” for shorthand to denote that μzsol()\mu_{z}\in\operatorname{sol}(\mathbb{P}) (or μzsol(𝕂)\mu_{z}\in\operatorname{sol}(\mathbb{K})) if and only if its corresponding shift zsol(𝕄).z\in\operatorname{sol}(\mathbb{M}). (or zsol(𝕃)z\in\operatorname{sol}(\mathbb{L})).

Proof.

Equivalence of (a) and (b) Under assumptions the μ0(C<+)>0\mu_{0}(C<+\infty)>0 and Eμ0[exp(C)|C|]<+E_{\mu_{0}}[\exp(-C)\,|C|]<+\infty the measure μ\mu^{\ast} is well defined, see Lemma 2.4 in [6]. Then we can note that for any μ\mu equivalent to μ\mu^{\ast} we have that

DKL(μ||μ)\displaystyle D_{KL}(\mu||\mu^{\ast}) =Eμ[log(dμdμ)]\displaystyle=E_{\mu}\left[\log\left(\frac{d\mu}{d\mu^{\ast}}\right)\right]
=Eμ[log(dμdμ0)]Eμ[log(dμdμ0)]\displaystyle=E_{\mu}\left[\log\left(\frac{d\mu}{d\mu_{0}}\right)\right]-E_{\mu}\left[\log\left(\frac{d\mu^{\ast}}{d\mu_{0}}\right)\right]
=Eμ[C]+DKL(μ||μ0)logEμ0[eC].\displaystyle=E_{\mu}[C]+D_{KL}(\mu||\mu_{0})-\log E_{\mu_{0}}[e^{-C}].

Taking an infimum of both sides shows that problem (a) is equivalent to problem (b).

Equivalence of (a) and (d) We just have to check that for all μ𝒫\mu\in\mathcal{P} where 𝒫\mathcal{P} is defined in (3), with shift qμ0q\in\mathcal{H}_{\mu_{0}} we have that

DKL(μ||μ)=120TL(t,q(t),q˙(t);Γ)dt.D_{KL}(\mu||\mu^{\ast})=\frac{1}{2}\int_{0}^{T}L(t,q(t),\dot{q}(t);\Gamma)dt.

To this aim, we note that if μ𝒫\mu\in\mathcal{P} where 𝒫\mathcal{P} is defined in (3) with corresponding shift qq that

dμdμ=exp(0T(q˙Γ(t,B(t)))𝑑B(t)120Tq˙2Γ(t,B(t))2ds).\frac{d\mu}{d\mu^{\ast}}=\exp\left(\int_{0}^{T}(\dot{q}-\Gamma(t,B(t)))\cdot dB(t)-\frac{1}{2}\int_{0}^{T}\|\dot{q}\|^{2}-\|\Gamma(t,B(t))\|^{2}ds\right).

By Girsanov’s theorem, Theorem 6, we have that B(t)=B~(t)+q˙(t)B(t)=\tilde{B}(t)+\dot{q}(t) where B~(t)\tilde{B}(t) is a Brownian motion under μ\mu. Therefore the KL divergence is thus

DKL(μ||μ)\displaystyle D_{KL}(\mu||\mu^{\ast}) =Eμ[0T(q˙(t)Γ(t,B~(t)+q(t)))d(B~(t)+q(t))\displaystyle=E_{\mu}\bigg{[}\int_{0}^{T}(\dot{q}(t)-\Gamma(t,\tilde{B}(t)+q(t)))\cdot d(\tilde{B}(t)+q(t))
120T(q˙2Γ(t,B~(t)+q(t))2)dt].\displaystyle\qquad-\frac{1}{2}\int_{0}^{T}(\|\dot{q}\|^{2}-\|\Gamma(t,\tilde{B}(t)+q(t))\|^{2})dt\bigg{]}.

Using the mean zero property of Itô integration yields that

DKL(μ||μ)\displaystyle D_{KL}(\mu||\mu^{\ast}) =Eμ[0T(q˙(t)2Γ(t,B~(t)+q(t)))q˙(t))dt\displaystyle=E_{\mu}\bigg{[}\int_{0}^{T}(\|\dot{q}(t)\|^{2}-\Gamma(t,\tilde{B}(t)+q(t)))\cdot\dot{q}(t))dt
120T(q˙2Γ(t,B~(t)+q(t))2)dt].\displaystyle\qquad-\frac{1}{2}\int_{0}^{T}(\|\dot{q}\|^{2}-\|\Gamma(t,\tilde{B}(t)+q(t))\|^{2})dt\bigg{]}.

Therefore using Fubini’s theorem and a polarization identity we conclude that

DKL(μ||μ)=120TEμ0[q˙(t)Γ(t,B(t)+q(t))2]dt.D_{KL}(\mu||\mu^{\ast})=\frac{1}{2}\int_{0}^{T}E_{\mu_{0}}\left[\|\dot{q}(t)-\Gamma(t,B(t)+q(t))\|^{2}\right]dt.

The assumptions on the Lagrangian LL imply that there is a unique solution to problem (d) and hence there is a solution to problem (a).

Equivalence of (a) and (c) Using Theorem 12 and the definition of Φ\Phi for each μ𝒫\mu\in\mathcal{P} with corresponding shift zμμ0z_{\mu}\in\mathcal{H}_{\mu_{0}}, we have that

DKL(μ||μ)\displaystyle D_{KL}(\mu||\mu^{\ast}) =Eμ[C]+DKL(μ||μ0)logEμ0[eC]\displaystyle=E_{\mu}[C]+D_{KL}(\mu||\mu_{0})-\log E_{\mu_{0}}[e^{-C}]
=Eμ0[C(ω+zμ)]+12zμμ02logEμ0[eC]\displaystyle=E_{\mu_{0}}[C(\omega+z_{\mu})]+\frac{1}{2}\|z_{\mu}\|_{\mu_{0}}^{2}-\log E_{\mu_{0}}[e^{-C}]
=Φ(zμ)+12zμμ02logEμ0[eC]\displaystyle=\Phi(z_{\mu})+\frac{1}{2}\|z_{\mu}\|_{\mu_{0}}^{2}-\log E_{\mu_{0}}[e^{-C}]
=OMΦ(zμ)logEμ0[eC].\displaystyle=OM_{\Phi}(z_{\mu})-\log E_{\mu_{0}}[e^{-C}].

Using the equivalence between (a) and (d), along with the assumptions on the Lagrangian LL concludes.

Remark 14.

Theorem 13 is based on [26, Theorem 1.1]. However, there are three key differences. First, parts (a), (b), and (c) are stated in terms of a general Gaussian measure in [26, Theorem 1.1] whereas in Theorem 13 it is stated for a nn dimensional Brownian motion. Second, part (d) in [26, Theorem 1.1] is only stated in terms of a 11 dimensional Brownian motion whereas in Theorem 13 it is for nn dimensions. Third, in [26, Theorem 1.1], everything is stated in terms of unconditional mode instead of conditional mode from Theorem 13.

4. Proof of Feynman-Kac for ODEs

In this section, we restate Theorem 2 in more detail and offer a proof. For pedagogical reasons, we first prove the result for a single ODE. We then extend to the case of a system of ODEs. The proof for the system is similar but it is included for completeness.

4.1. Feynman-Kac for single ODE

In this subsection we prove our Feynman-Kac result for a single linear second order ODE. First we prove the case in which f(t)=0f(t)=0 and thus V(t)=0V(t)=0 and we are working with no exponential weight. We then use this to construct the solution for general ff. First we have the following lemma.

Lemma 15.

If Γ(t,x)=A(t)x+D(t)\Gamma(t,x)=A(t)x+D(t) for C1([0,T],n×n)C^{1}([0,T],\mathbb{R}^{n\times n}) matrix AA and C1([0,T],n)C^{1}([0,T],\mathbb{R}^{n}) vector DD, then μ=μ~\mu^{\ast}=\tilde{\mu}.

Proof.

We just have to show that C=ΦC=\Phi. Recall that C=0TΓ(t,B(t))𝑑B(t)C=\int_{0}^{T}\Gamma(t,B(t))\cdot dB(t). To the stated aim, we write that

Φ(z)\displaystyle\Phi(z) =Eμ0[C(ω+z)]\displaystyle=E_{\mu_{0}}[C(\omega+z)]
=Eμ0[0T[A(t)(B(t)+z(t))+D(t)]d(B(t)+z(t))]\displaystyle=E_{\mu_{0}}\left[\int_{0}^{T}[A(t)(B(t)+z(t))+D(t)]\cdot d(B(t)+z(t))\right]
=0T[A(t)z(t)+D(t)]𝑑z(t)\displaystyle=\int_{0}^{T}[A(t)z(t)+D(t)]\cdot dz(t)
=C(z).\displaystyle=C(z).

Proposition 16.

Consider the differential equation

(6) y¨(t)+g(t)y(t)+h(t)=0,\ddot{y}(t)+g(t)y(t)+h(t)=0,

where the unknown yC2([0,T],)y\in C^{2}([0,T],\mathbb{R}), with boundary conditions y(0)=0y(0)=0 and y(T)=ay(T)=a. Let g(t)g(t) and h(t)h(t) be C1([0,T],)C^{1}([0,T],\mathbb{R}) functions so that the differential equations

(7) A2(t)+A˙(t)=g(t)A^{2}(t)+\dot{A}(t)=-g(t)

and

D˙(t)+D(t)A(t)=h(t)\dot{D}(t)+D(t)A(t)=-h(t)

have solutions, A(t),D(t)A(t),D(t). Let XX be the diffusion corresponding to μ\mu^{\ast} given in Theorem 13 with Γ(t,x)=A(t)x+D(t)\Gamma(t,x)=A(t)x+D(t). Then the solution to (6) is

(8) y(t)=Mode[X(t)|X(T)=a],y(t)=\operatorname{Mode}\left[X(t)\bigg{|}X(T)=a\right],

Here, the conditional mode of a diffusion is defined as the minimizer of the Onsager-Machlup function defined in Theorem 13 over all paths zz with z(T)=az(T)=a, as in Remark 11.

Proof.

Let Γ(t,x)=A(t)x+D(t)\Gamma(t,x)=A(t)x+D(t) for the Lagrangian defined in Theorem 13. The Lagrangian is thus

(9) L(t,q,q˙)=Eμ0[(q˙(t)A(t)(B(t)+q(t))D(t))2].L(t,q,\dot{q})=E_{\mu_{0}}\left[(\dot{q}(t)-A(t)(B(t)+q(t))-D(t))^{2}\right].

First, we note that all of the hypotheses of Theorem 13 hold as A,DA,D are both C1C^{1} functions by assumption. Computing the derivatives of LL yields that

Lq(t,q,q˙)=2A(t)(q˙(t)A(t)q(t)D(t))L_{q}(t,q,\dot{q})=-2A(t)\left(\dot{q}(t)-A(t)q(t)-D(t)\right)

and

Lq˙(t,q,q˙)=2(q˙(t)A(t)q(t)D(t)).L_{\dot{q}}(t,q,\dot{q})=2\left(\dot{q}(t)-A(t)q(t)-D(t)\right).

The Euler Lagrange equation for LL is then

A(t)(q˙(t)A(t)q(t)D(t))q¨(t)+A˙(t)q(t)+A(t)q˙(t)+D˙(t)=0.-A(t)\left(\dot{q}(t)-A(t)q(t)-D(t)\right)-\ddot{q}(t)+\dot{A}(t)q(t)+A(t)\dot{q}(t)+\dot{D}(t)=0.

We may simplify this as

(10) q¨(t)+q(t)(A2(t)+A˙(t))+(D˙(t)+A(t)D(t))=0.-\ddot{q}(t)+q(t)\left(A^{2}(t)+\dot{A}(t)\right)+\left(\dot{D}(t)+A(t)D(t)\right)=0.

By the assumption of AA and DD, we have that the Euler Lagrange equation for the Lagrangian LL is equation (6). Therefore by Theorem 13 we have that the solution to equation (6) is the conditional mode of μ~\tilde{\mu}. By Lemma 15 we have that μ=μ~\mu^{\ast}=\tilde{\mu}. Therefore the solution to equation (6) is the conditional mode of μ\mu^{\ast}. ∎

Remark 17.

The equation

(11) A2(t)+A˙(t)=g(t)A^{2}(t)+\dot{A}(t)=-g(t)

is an example of a Riccati equation, whose general form is

A˙(t)=γ0(t)+γ1(t)A(t)+γ2(t)A2(t),\dot{A}(t)=\gamma_{0}(t)+\gamma_{1}(t)A(t)+\gamma_{2}(t)A^{2}(t),

where γi\gamma_{i} are continuous functions and γ0,γ20\gamma_{0},\gamma_{2}\neq 0. See [2] for conditions on solving Riccati equations explicitly. To this aim, they convert the Riccati equation to a second order linear differential equation

z¨(t)+(γ1(t)+γ˙2(t)/γ2(t))z˙(t)+γ0(t)γ2(t)z(t)=0,\ddot{z}(t)+(-\gamma_{1}(t)+\dot{\gamma}_{2}(t)/\gamma_{2}(t))\dot{z}(t)+\gamma_{0}(t)\gamma_{2}(t)z(t)=0,

where A(t)=z˙(t)/(γ2(t)z(t)).A(t)=-\dot{z}(t)/(\gamma_{2}(t)z(t)). In the case of Theorem 16, the Riccati equation (11) turns into the homogeneous equation

z¨(t)+g(t)z(t)=0,\ddot{z}(t)+g(t)z(t)=0,

with A(t)=z˙(t)/z(t).A(t)=-\dot{z}(t)/z(t). Note however that this is not quite the homogeneous version of (6) because in (6) we insist that y(0)=0y(0)=0.

Explicitly giving CC for an equation (6) comes down to the solvability of the equation (11). We give an example of when (11) is explicitly solvable as a corollary.

Corollary 18.

Consider the differential equation

(12) y¨(t)+gy(t)+h(t)=0,\ddot{y}(t)+gy(t)+h(t)=0,

where the unknown yC2([0,T],)y\in C^{2}([0,T],\mathbb{R}), with boundary conditions y(0)=0y(0)=0 and y(T)=ay(T)=a. Let h(t)h(t) be C1([0,T],)C^{1}([0,T],\mathbb{R}) and let g0g\leq 0 be a constant. Then the solution to (12) is

(13) y(t)=Mode[X(t)|X(T)=a],y(t)=\operatorname{Mode}\left[X(t)\bigg{|}X(T)=a\right],

where XX is the diffusion

X(t)=0t(gB(u)eg1uesgh(s)𝑑s)𝑑u+B(t).X(t)=\int_{0}^{t}\left(\sqrt{-g}B(u)-e^{-\sqrt{-g}}\int_{1}^{u}e^{s\sqrt{-g}}h(s)ds\right)du+B(t).
Remark 19.

One might ask if one can find a Γ(t,x)\Gamma(t,x) that yields an Euler-Lagrange equation of the form (1) with f0f\neq 0. The answer is no, and the Γ(t,x)\Gamma(t,x) used in Proposition 16 is of maximal generality, as the below lemma will show.

Lemma 20.

Let Γ(t,x)\Gamma(t,x) as in Portmanteau theorem be such that xΓ(t,x)\frac{\partial}{\partial x}\Gamma(t,x) depends nontrivially on xx. Then the resulting Euler Lagrange equation has a nonlinearity in qq.

Proof.

We write down the Lagrangian as

L(t,q,q˙)=Eμ0[(q˙(t)Γ(t,B(t)+q(t)))2].L(t,q,\dot{q})=E_{\mu_{0}}\left[(\dot{q}(t)-\Gamma(t,B(t)+q(t)))^{2}\right].

Taking derivatives yields

Lq(t,q,q˙)=2Eμ0[(q˙(t)Γ(t,B(t)+q(t)))(Γx(t,B(t)+q(t)))],L_{q}(t,q,\dot{q})=2E_{\mu_{0}}\left[(\dot{q}(t)-\Gamma(t,B(t)+q(t)))(-\Gamma_{x}(t,B(t)+q(t)))\right],

and

Lq˙(t,q,q˙)=2Eμ0[(q˙(t)Γ(t,B(t)+q(t)))].L_{\dot{q}}(t,q,\dot{q})=2E_{\mu_{0}}\left[(\dot{q}(t)-\Gamma(t,B(t)+q(t)))\right].

The Euler-Lagrange equation for LL is thus

q˙(t)Eμ0[Γx(t,B(t)+q(t))]+Eμ0[Γ(t,B(t)+q(t))Γx(t,B(t)+q(t))]\displaystyle-\dot{q}(t)E_{\mu_{0}}[\Gamma_{x}(t,B(t)+q(t))]+E_{\mu_{0}}[\Gamma(t,B(t)+q(t))\Gamma_{x}(t,B(t)+q(t))]
q¨(t)+ddtEμ0[Γ(t,B(t)+q(t))]=0.\displaystyle\leavevmode\nobreak\ \leavevmode\nobreak\ -\ddot{q}(t)+\frac{d}{dt}E_{\mu_{0}}[\Gamma(t,B(t)+q(t))]=0.

Because Γx(t,x)\Gamma_{x}(t,x) depends nontrivially on xx, the term

Eμ0[Γ(t,B(t)+q(t))Γx(t,B(t)+q(t))]E_{\mu_{0}}[\Gamma(t,B(t)+q(t))\Gamma_{x}(t,B(t)+q(t))]

depends nonlinearly on qq with no terms involving q˙\dot{q} or q¨\ddot{q}. The three other terms will all involve a q˙\dot{q} or a q¨\ddot{q}, so there is no chance for cancellation. ∎

Corollary 21.

If an ODE

y¨(t)+f(t)y˙(t)+g(t)y(t)+h(t)=0,\ddot{y}(t)+f(t)\dot{y}(t)+g(t)y(t)+h(t)=0,

where f,g,hC1(0,T)f,g,h\in C^{1}(0,T), comes from the Euler-Lagrange equation in Theorem 13, then Γ(t,x)=A(t)x+C(t),\Gamma(t,x)=A(t)x+C(t), for some A,CC1(0,T)A,C\in C^{1}(0,T). Thus f=0f=0 in Proposition 16 is of maximal generality.

We finally prove our main theorem for general ff by applying a simple transformation to solutions given in Proposition 16. This is the full version of Theorem 2.

Theorem 22 (Feynman-Kac for ODEs).

Consider the differential equation

(14) y¨(t)+f(t)y˙(t)+g(t)y(t)+h(t)=0,\ddot{y}(t)+f(t)\dot{y}(t)+g(t)y(t)+h(t)=0,

where the unknown yC2([0,T],)y\in C^{2}([0,T],\mathbb{R}), with boundary conditions y(0)=0y(0)=0 and y(T)=ay(T)=a. Let f(t),g(t)f(t),g(t) and h(t)h(t) be C1C^{1} real valued functions. Let V(t)=210tf(s)𝑑sV(t)=2^{-1}\int_{0}^{t}f(s)ds. Define the functions

g^(t)=V¨(t)+(V˙(t))2f(t)V˙(t)+g(t)\hat{g}(t)=-\ddot{V}(t)+(\dot{V}(t))^{2}-f(t)\dot{V}(t)+g(t)

and

h^(t)=eV(t)h(t).\hat{h}(t)=e^{V(t)}h(t).

Suppose that the differential equations

A2(t)+A˙(t)=g^(t)A^{2}(t)+\dot{A}(t)=-\hat{g}(t)

and

D˙(t)+D(t)A(t)=h^(t)\dot{D}(t)+D(t)A(t)=-\hat{h}(t)

have solutions, A(t),D(t)A(t),D(t), implying that the solution y^(t)=Mode[XX(T)=a]\hat{y}(t)=\operatorname{Mode}\left[X\mid X(T)=a\right] given in Proposition 16 to the equation

y^′′(t)+g^(t)y^(t)+h^(t)=0\hat{y}^{\prime\prime}(t)+\hat{g}(t)\hat{y}(t)+\hat{h}(t)=0

exists. Then the solution to (14) is

(15) y(t)=eV(t)y^(t)=eV(t)Mode[XX(T)=a].y(t)=e^{-V(t)}\hat{y}(t)=e^{-V(t)}\operatorname{Mode}\left[X\mid X(T)=a\right].
Proof.

Differentiating yy once yields that

y˙(t)=eV(t)[y^(t)V˙(t)y^(t)].\dot{y}(t)=e^{-V(t)}[\hat{y}^{\prime}(t)-\dot{V}(t)\hat{y}(t)].

Differentiating again gives

y¨(t)=eV(t)[y^′′(t)V¨(t)y^(t)V˙(t)y^(t)V˙(t)y^(t)+(V˙(t))2y^(t)].\ddot{y}(t)=e^{-V(t)}[\hat{y}^{\prime\prime}(t)-\ddot{V}(t)\hat{y}(t)-\dot{V}(t)\hat{y}^{\prime}(t)-\dot{V}(t)\hat{y}^{\prime}(t)+(\dot{V}(t))^{2}\hat{y}(t)].

Checking that yy satisfies equation (14) concludes that

y¨(t)+f(t)y˙(t)+g(t)y(t)+h(t)\displaystyle\ddot{y}(t)+f(t)\dot{y}(t)+g(t)y(t)+h(t) =eV(t)[y^′′(t)+g^(t)y^(t)]+h(t)\displaystyle=e^{-V(t)}[\hat{y}^{\prime\prime}(t)+\hat{g}(t)\hat{y}(t)]+h(t)
=eV(t)[eV(t)h(t)]+h(t)\displaystyle=e^{-V(t)}[-e^{V(t)}h(t)]+h(t)
=0.\displaystyle=0.

4.2. Feynman-Kac for Systems of Equations

In this subsection, we extend Theorem 22 to the case of systems of linear second order ODEs. The proofs are similar to the proofs in the previous section, but we include them for completeness.

Proposition 23.

Consider the system of differential equations

(16) y¨(t)+g(t)y(t)+h(t)=0,\ddot{y}(t)+g(t)y(t)+h(t)=0,

where the unknown yC2([0,T],n)y\in C^{2}([0,T],\mathbb{R}^{n}), with boundary conditions y(0)=0y(0)=0 and y(T)=ay(T)=a. Let g(t)g(t) be a C1([0,T],n×n)C^{1}([0,T],\mathbb{R}^{n\times n}) matrix and let h(t)h(t) be a C1([0,T],n)C^{1}([0,T],\mathbb{R}^{n}) vector so that the differential equations

AT(t)A(t)+A˙(t)=g(t)A^{T}(t)A(t)+\dot{A}(t)=-g(t)

and

D˙(t)+AT(t)D(t)=h(t)\dot{D}(t)+A^{T}(t)D(t)=-h(t)

have solutions, A(t),D(t)A(t),D(t). Let XX be the diffusion corresponding to μ\mu given in Theorem 13 with Γ(t,x)=A(t)x+D(t)\Gamma(t,x)=A(t)x+D(t). Then the solution to (16) is

(17) y(t)=Mode[X(t)|X(T)=a],y(t)=\operatorname{Mode}\left[X(t)\bigg{|}X(T)=a\right],

Here, the conditional mode of a diffusion is defined as the minimizer of the Onsager-Machlup function defined in Theorem 13 over all paths zz with z(T)=az(T)=a, as in Remark 11.

Proof.

All the assumptions for Theorem 13 are satisfied so we may write the Lagrangian for Γ\Gamma as

(18) L(t,q,q˙)=Eμ0[q˙(t)A(t)(B(t)+q(t))D(t)2].L(t,q,\dot{q})=E_{\mu_{0}}\left[\|\dot{q}(t)-A(t)(B(t)+q(t))-D(t)\|^{2}\right].

Differentiating under the integral sign yields that

Lq(t,q,q˙)\displaystyle L_{q}(t,q,\dot{q}) =Eμ0[2(q˙(t)A(t)(B(t)+q(t))D(t))TA(t)]\displaystyle=-E_{\mu_{0}}\left[2(\dot{q}(t)-A(t)(B(t)+q(t))-D(t))^{T}A(t)\right]
=2(q˙(t)A(t)q(t)D(t))TA(t)\displaystyle=-2(\dot{q}(t)-A(t)q(t)-D(t))^{T}A(t)
=2(q˙T(t)A(t)qT(t)AT(t)A(t)DT(t)A(t))\displaystyle=-2(\dot{q}^{T}(t)A(t)-q^{T}(t)A^{T}(t)A(t)-D^{T}(t)A(t))

and

Lq˙(t,q,q˙)\displaystyle L_{\dot{q}}(t,q,\dot{q}) =Eμ0[2(q˙(t)A(t)(B(t)+q(t))D(t))T]\displaystyle=E_{\mu_{0}}\left[2(\dot{q}(t)-A(t)(B(t)+q(t))-D(t))^{T}\right]
=2(q˙(t)A(t)q(t)D(t))T\displaystyle=2(\dot{q}(t)-A(t)q(t)-D(t))^{T}
=2(q˙T(t)qT(t)AT(t)DT(t)).\displaystyle=2(\dot{q}^{T}(t)-q^{T}(t)A^{T}(t)-D^{T}(t)).

The Euler-Lagrange equation is then

q˙T(t)A(t)+qT(t)AT(t)A(t)+DT(t)A(t)q¨T(t)+qTA˙T(t)+q˙T(t)AT(t)+D˙T(t)=0.-\dot{q}^{T}(t)A(t)+q^{T}(t)A^{T}(t)A(t)+D^{T}(t)A(t)-\ddot{q}^{T}(t)+q^{T}\dot{A}^{T}(t)+\dot{q}^{T}(t)A^{T}(t)+\dot{D}^{T}(t)=0.

Collecting terms yields that

(19) q¨T(t)+qT(t)(AT(t)A(t)+A˙(t))+DT(t)A(t)+D˙T(t)=0.-\ddot{q}^{T}(t)+q^{T}(t)(A^{T}(t)A(t)+\dot{A}(t))+D^{T}(t)A(t)+\dot{D}^{T}(t)=0.

Taking the transpose of equation (19) yields that

q¨(t)+(AT(t)A(t)+A˙T(t))q˙(t)+AT(t)D(t)+D˙(t)=0,-\ddot{q}(t)+(A^{T}(t)A(t)+\dot{A}^{T}(t))\dot{q}(t)+A^{T}(t)D(t)+\dot{D}(t)=0,

which is equation (16). ∎

Theorem 24 (Feynman-Kac for system of ODEs).

Consider the system of differential equations

(20) y¨(t)+f(t)y˙(t)+g(t)y(t)+h(t)=0,\ddot{y}(t)+f(t)\dot{y}(t)+g(t)y(t)+h(t)=0,

where the unknown yC2([0,T],n)y\in C^{2}([0,T],\mathbb{R}^{n}), with boundary conditions y(0)=0y(0)=0 and y(T)=ay(T)=a. Let f(t),g(t)f(t),g(t) be C1([0,T],n×n)C^{1}([0,T],\mathbb{R}^{n\times n}) matrices and let h(t)h(t) be a C1([0,T],n)C^{1}([0,T],\mathbb{R}^{n}) vector. Let V(t)=210tf(s)𝑑sV(t)=2^{-1}\int_{0}^{t}f(s)ds. Suppose that the matrix exponentials eV(t)e^{V(t)} and eV(t)e^{-V(t)} exist and eV(t)e^{-V(t)} commutes with both f(t)f(t) and g(t)g(t). Define the functions

g^(t)=V¨(t)+(V˙(t))2f(t)V˙(t)+g(t)\hat{g}(t)=-\ddot{V}(t)+(\dot{V}(t))^{2}-f(t)\dot{V}(t)+g(t)

and

h^(t)=eV(t)h(t),\hat{h}(t)=e^{V(t)}h(t),

where eV(t)e^{V(t)} denotes the standard exponential of a matrix. Suppose that the differential equations

(21) AT(t)A(t)+A˙(t)=g^(t)A^{T}(t)A(t)+\dot{A}(t)=-\hat{g}(t)

and

D˙(t)+AT(t)D(t)=h^(t)\dot{D}(t)+A^{T}(t)D(t)=-\hat{h}(t)

have solutions, A(t),D(t)A(t),D(t), implying that the solution y^(t)=Mode[XX(T)=a]\hat{y}(t)=\operatorname{Mode}\left[X\mid X(T)=a\right] given in Proposition 23 to the equation

y^′′(t)+g^(t)y^(t)+h^(t)=0\hat{y}^{\prime\prime}(t)+\hat{g}(t)\hat{y}(t)+\hat{h}(t)=0

exists. Then the solution to (14) is

(22) y(t)=eV(t)y^(t)=eV(t)Mode[XX(T)=a].y(t)=e^{-V(t)}\hat{y}(t)=e^{-V(t)}\operatorname{Mode}\left[X\mid X(T)=a\right].
Proof.

Differentiating yy once yields that

y˙(t)=eV(t)[y^(t)V˙(t)y^(t)].\dot{y}(t)=e^{-V(t)}[\hat{y}^{\prime}(t)-\dot{V}(t)\hat{y}(t)].

Differentiating again gives

y¨(t)=eV(t)[y^′′(t)V¨(t)y^(t)V˙(t)y^(t)V˙(t)y^(t)+(V˙(t))2y^(t)].\ddot{y}(t)=e^{-V(t)}[\hat{y}^{\prime\prime}(t)-\ddot{V}(t)\hat{y}(t)-\dot{V}(t)\hat{y}^{\prime}(t)-\dot{V}(t)\hat{y}^{\prime}(t)+(\dot{V}(t))^{2}\hat{y}(t)].

Checking that yy satisfies equation (14) using the commutivity assumption concludes that

y¨(t)+f(t)y˙(t)+g(t)y(t)+h(t)\displaystyle\ddot{y}(t)+f(t)\dot{y}(t)+g(t)y(t)+h(t) =eV(t)[y^′′(t)+g^(t)y^(t)]+h(t)\displaystyle=e^{-V(t)}[\hat{y}^{\prime\prime}(t)+\hat{g}(t)\hat{y}^{\prime}(t)]+h(t)
=eV(t)[eV(t)h(t)]+h(t)\displaystyle=e^{-V(t)}[-e^{V(t)}h(t)]+h(t)
=0.\displaystyle=0.

5. Examples

In this section, we give a few examples of the process XX for various ODEs.

Example 1.

Consider the ODE

y¨(t)y(t)=0.\ddot{y}(t)-y(t)=0.

As there is no y˙\dot{y} term, we may use Proposition 16 directly. In this case, A(t)A(t) solves the first order differential equation

A2(t)+A˙(t)=1.A^{2}(t)+\dot{A}(t)=1.

Thus we may let A=±1A=\pm 1. As there is no hh term, we may simply let D=0D=0. Therefore choosing Γ(t,x)=x\Gamma(t,x)=x and thus

C=0TB(t)𝑑B(t).C=\int_{0}^{T}B(t)dB(t).

Thus the solution is the conditional mode of the process X(t)X(t) with density

dμdμ0=eCEμ0[eC].\frac{d\mu^{\ast}}{d\mu_{0}}=\frac{e^{-C}}{E_{\mu_{0}}[e^{-C}]}.

That is, the process

X(t)=0tB(s)𝑑s+B(t).X(t)=\int_{0}^{t}B(s)ds+B(t).
Example 2.

Consider the ODE

y¨(t)+y˙(t)y(t)=0.\ddot{y}(t)+\dot{y}(t)-y(t)=0.

As f0f\neq 0 we must use the full Theorem 22. We note that g^(t)=1\hat{g}(t)=-1 and h^(t)=1\hat{h}(t)=-1. So therefore XX is the same as the previous example and we only need to include the exponential weight to get that

y(t)=et/2Mode[X(t)X(T)=a]y(t)=e^{-t/2}\operatorname{Mode}\left[X(t)\mid X(T)=a\right]
Example 3.

Consider the ODE

y¨(t)y(t)+t=0.\ddot{y}(t)-y(t)+t=0.

Then we can let A(t)=1A(t)=1 and D(t)=1tD(t)=1-t. Therefore Γ(t,x)=x+1t\Gamma(t,x)=x+1-t. Therefore

C=0T[B(t)+1t]𝑑B(t).C=\int_{0}^{T}[B(t)+1-t]dB(t).

Thus the solution is the conditional mode of the process X(t)X(t) with density

dμdμ0=eCEμ0[eC].\frac{d\mu^{\ast}}{d\mu_{0}}=\frac{e^{-C}}{E_{\mu_{0}}[e^{-C}]}.

That is, the process is

X(t)=0t[B(s)+1s]𝑑s+B(t).X(t)=\int_{0}^{t}[B(s)+1-s]ds+B(t).

We conclude with an example of an ODE so that equation (7) cannot be solved explicitly.

Example 4.

Consider the Airy ODE

y¨(t)ty(t)=0,\ddot{y}(t)-ty(t)=0,

with boundary conditions y(0)=0y(0)=0 and y(1)=2y(1)=2. As f=0f=0 we may use Proposition 16. AA must satisfy

A2(t)+A˙(t)=t.A^{2}(t)+\dot{A}(t)=t.

However, this equation is not explicitly solvable by hand. Therefore we estimate AA using a regular perturbation expansion and use part (d) of our portmanteau Theorem 13 to estimate the solution. We numerically minimize the functional

120TEμ0[(q˙(t)A(t)B(t))2]𝑑t\frac{1}{2}\int_{0}^{T}E_{\mu_{0}}[(\dot{q}(t)-A(t)B(t))^{2}]dt

over all qq which are Legendre polynomials of degree less than 200200 with the estimated AA, we call this minimizer qq_{\ast}. We plot the estimated solutions against the true solution, along with the value of the functional

L(q)=0T(q¨(t)tq(t))2𝑑tL(q_{\ast})=\int_{0}^{T}(\ddot{q}_{\ast}(t)-tq_{\ast}(t))^{2}dt

for various degrees of approximation in Figure 1.

Refer to caption
(a) Approximate solution compared to the Airy function.
Refer to caption
(b) Improvement in the approximation (compared with the 50 degree case).
Refer to caption
(c) Improvement in the value L(q)L(q_{\ast}) with the degree of approximation.
Figure 1.

The approximation can be expected to improve further by employing higher degree Legendre polynomials. However, as Figure 1(B) and Figure 1(C) show, this improvement is very slow. Nonetheless, what we have demonstrated is a completely new approach to solving the Airy equation.

6. Conclusions

We presented a new Feynman-Kac type formula for systems of linear second order ODEs, by demonstrating that the solution of the ODE is the mode of a specific diffusion process that is determined by the coefficients of the ODE. To the best of our knowledge, this characterization of the solution of a given linear ODE in terms of a path functional of a stochastic process is novel and has not been identified in the literature before. This result opens the door to new ways of solving second order linear ODEs, potentially leveraging stochastic simulation, just as the Feynman-Kac theorem has led to the development of Monte Carlo methods for solving linear parabolic PDEs. We gave a few examples of the utility of our result for numerically solving ODEs. The primary bottleneck to numerically implementing this method is the calculation of AA which, as noted before, is the solution to a Riccati equation which is known to be difficult to solve explicitly. There is work, of course, on numerical solutions of Riccati equations, see e.g. [5]. In our current example with the Airy equation, we compute an approximate solution to the Ricatti equation by using a regular perturbation expansion (which entails the introduction of some ‘bias’ into final solution to the second order ODE).

However, once AA is identified, numerically computing the solution to the second order ODE can be done reasonably efficiently. Part (d) of our Portmanteau in Theorem 13 provides one possible way of computing the mode. Alternatively, the part (a) of the theorem suggests that the mode could be computed by solving the constrained information projection. One approach to doing this would be to use the Iterative Proportional Fitting Procedure (IPFP) or its discrete counterpart the Sinkhorn algorithm [14], for instance. There are also direct methods for computing the mode; see [16] for examples using a trapezoidal discretization.

To conclude, we pose two open questions:

Question 1.

Is there an efficient way of numerically estimating the solution without estimating AA first?

Question 2.

Can we extend our results to nonlinear ODE? That is, given a general equation

y¨(t)+F(t,y,y˙)=0,\ddot{y}(t)+F(t,y,\dot{y})=0,

is there a condition on FF so that we can guarantee to find a corresponding Γ\Gamma?

Acknowledgements

The authors would like to acknowledge Zihe Zhou for her code for the Airy equation, Example 4.

References

  • [1] Luigi Accardi. On the quantum Feynman-Kac formula. Rend. Sem. Mat. Fis. Milano, 48:135–180 (1980), 1978.
  • [2] Anas Al Bastami, Milivoj R. Belić, and Nikola Z. Petrović. Special solutions of the Riccati equation with applications to the Gross-Pitaevskii nonlinear PDE. Electron. J. Differential Equations, pages No. 66, 10, 2010.
  • [3] Fabrice Baudoin. Diffusion processes and stochastic calculus. EMS Textbooks in Mathematics. European Mathematical Society (EMS), Zürich, 2014.
  • [4] Lorenzo Bertini and Nicoletta Cancrini. The stochastic heat equation: Feynman-Kac formula and intermittence. J. Statist. Phys., 78(5-6):1377–1401, 1995.
  • [5] Jafar Biazar and Mohsen Didgar. Numerical solution of Riccati equations by the Adomian and asymptotic decomposition methods over extended domains. Int. J. Differ. Equ., pages Art. ID 580741, 7, 2015.
  • [6] Joris Bierkens and Hilbert J. Kappen. Explicit solution of relative entropy weighted control. Systems Control Lett., 72:36–43, 2014.
  • [7] F. Cérou, P. Del Moral, and A. Guyader. A nonasymptotic theorem for unnormalized Feynman-Kac particle models. Ann. Inst. Henri Poincaré Probab. Stat., 47(3):629–649, 2011.
  • [8] K. L. Chung and K. M. Rao. Feynman-Kac functional and the Schrödinger equation. In Seminar on Stochastic Processes, 1981 (Evanston, Ill., 1981), volume 1 of Progr. Prob. Statist., pages 1–29. Birkhäuser, Boston, Mass., 1981.
  • [9] Y Dan. Bayesian inference for Gaussian models: Inverse problems and evolution equations. 2020.
  • [10] M. Dashti, K. J. H. Law, A. M. Stuart, and J. Voss. MAP estimators and their consistency in Bayesian nonparametric inverse problems. Inverse Problems, 29(9):095017, 27, 2013.
  • [11] P. Del Moral and L. Miclo. Branching and interacting particle systems approximations of Feynman-Kac formulae with applications to non-linear filtering. In Séminaire de Probabilités, XXXIV, volume 1729 of Lecture Notes in Math., pages 1–145. Springer, Berlin, 2000.
  • [12] Pierre Del Moral. Feynman-Kac formulae. Probability and its Applications (New York). Springer-Verlag, New York, 2004. Genealogical and interacting particle systems with applications.
  • [13] Pierre Del Moral, Arnaud Doucet, and Sumeetpal S. Singh. A backward particle interpretation of Feynman-Kac formulae. M2AN Math. Model. Numer. Anal., 44(5):947–975, 2010.
  • [14] Simone Di Marino and Augusto Gerolin. An optimal transport approach for the schrödinger bridge problem and convergence of sinkhorn algorithm. Journal of Scientific Computing, 85(2):1–28, 2020.
  • [15] Detlef Dürr and Alexander Bach. The Onsager-Machlup function as Lagrangian for the most probable path of a diffusion process. Communications in Mathematical Physics, 60(2):153–170, 1978.
  • [16] Dimas Abreu Dutra, Bruno Otávio Soares Teixeira, and Luis Antonio Aguirre. Maximum a posteriori state path estimation: discretization limits and their interpretation. Automatica J. IFAC, 50(5):1360–1368, 2014.
  • [17] Batu Güneysu, Matthias Keller, and Marcel Schmidt. A Feynman-Kac-Itô formula for magnetic Schrödinger operators on graphs. Probab. Theory Related Fields, 165(1-2):365–399, 2016.
  • [18] Martin Hairer. An introduction to stochastic PDEs. arXiv preprint arXiv:0907.4178, 2009.
  • [19] Yaozhong Hu, Fei Lu, and David Nualart. Feynman-Kac formula for the heat equation driven by fractional noise with Hurst parameter H<1/2H<1/2. Ann. Probab., 40(3):1041–1068, 2012.
  • [20] Yaozhong Hu, David Nualart, and Jian Song. Feynman-Kac formula for heat equation driven by fractional white noise. Ann. Probab., 39(1):291–326, 2011.
  • [21] Svante Janson and Johan Tysk. Feynman-Kac formulas for Black-Scholes-type operators. Bull. London Math. Soc., 38(2):269–282, 2006.
  • [22] M. Kac. On distributions of certain Wiener functionals. Trans. Amer. Math. Soc., 65:1–13, 1949.
  • [23] Maxim Kontsevich. Intersection theory on the moduli space of curves and the matrix Airy function. Comm. Math. Phys., 147(1):1–23, 1992.
  • [24] József Lőrinczi, Fumio Hiroshima, and Volker Betz. Feynman-Kac-type theorems and Gibbs measures on path space, volume 34 of De Gruyter Studies in Mathematics. Walter de Gruyter & Co., Berlin, 2011. With applications to rigorous quantum field theory.
  • [25] Bernt Ø ksendal. Stochastic differential equations. Universitext. Springer-Verlag, Berlin, sixth edition, 2003. An introduction with applications.
  • [26] Zachary Selk, William Haskell, and Harsha Honnappa. Information projection on banach spaces with applications to state independent kl-weighted optimal control. Appl. Math. Optim., 2021.
  • [27] Zhong Xin Zhao. Green function for Schrödinger operator and conditioned Feynman-Kac gauge. J. Math. Anal. Appl., 116(2):309–334, 1986.