This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Relative Entropy Methods for Calculating Committors

Gabriel Earle and Brian Van Koten
Abstract.

Motivated by challenges arising in molecular simulation, we analyze and develop methods of computing reactive trajectories and committor functions for systems described by the overdamped Langevin dynamics. Our main technical advance is a new loss function that measures the accuracy of approximations to the committor function describing a given chemical reaction or other rare transition event. The loss admits a simple interpretation in terms of the distribution of reactive trajectories, and it can be computed in practice to compare the accuracies of different approximations of the committor. We also derive a method of calculating committors by direct minimization of the loss via stochastic gradient descent.

BvK and GE were supported by NSF DMS-2012207

1. Introduction

Motivated by challenges arising in molecular simulation, we analyze and develop methods for computing reactive trajectories and committor functions. To be precise, we consider the stochastic differential equation (2) that characterizes reactive trajectories of the overdamped Langevin dynamics in the transition path theory of Weinan E and Eric Vanden-Eijnden [5]. We assess the effect of replacing the exact committor function qq in that equation with an approximation q~\tilde{q}, resulting in approximate reactive trajectories. Our primary theoretical advances are

  1. (1)

    a characterization of those q~\tilde{q} for which solutions of (2) with q~\tilde{q} in place of qq exist,

  2. (2)

    a characterization of those q~\tilde{q} for which the distributions of the exact and approximate reactive trajectories are mutually absolutely continuous, and

  3. (3)

    convenient formulas for the change of measure and relative entropy between the exact and approximate distributions for a given q~\tilde{q}.

Surprisingly, the change of measure can be computed exactly without prior knowledge of the exact committor function qq. Based on this observation, we propose novel computational methods, including

  1. (1)

    a stochastic gradient descent scheme for estimating committor functions that directly minimizes the relative entropy between the exact and approximate distributions of reactive trajectories, and

  2. (2)

    a model assessment procedure for comparing approximate committor functions based on relative entropy differences as in the Bayes and Akaike information criteria.

We limit our study to systems described by the overdamped Langevin dynamics

dXt=U(Xt)dt+2εdBt.\mathrm{d}X_{t}=-\nabla U(X_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t}. (1)

In models of molecular systems, XtX_{t} takes values in the space of positions 3N\mathbb{R}^{3N} of the NN atomic nuclei comprising the system, U:3NU:\mathbb{R}^{3N}\rightarrow\mathbb{R} is a potential energy, and ε=kBT\varepsilon=k_{B}T where T>0T>0 is the temperature and kBk_{B} is Boltzmann’s constant. Each local minimum of UU corresponds to a stable state. When the temperature is low, the dynamics is metastable [8]. That is, trajectories spend most of the time vibrating around stable states, undergoing transitions between states only rarely.

Metastability entails severe computational difficulties. One usually wants to estimate statistics of rare transitions, but it may be impractical to compute a trajectory long enough to observe even one. For example, in simulations of proteins, resolving the fastest vibrations requires a time step no larger than 1015s10^{-15}s, but conformational changes may occur only every 106s10^{-6}s or less frequently. To overcome difficulties related to metastability, enhanced sampling methods have been devised that perturb or condition the dynamics (1) so that transitions occur over short time scales. In various ways, these methods correct for the perturbations to consistently estimate observables of the unperturbed dynamics.

We consider a particular class of enhanced sampling methods based on Transition Path Theory (TPT) [5, 13]. In TPT, one studies transitions between a fixed pair of disjoint sets A3NA\subset\mathbb{R}^{3N} and B3NB\subset\mathbb{R}^{3N}, e.g. balls centered at different local minima of UU corresponding to different stable states. TPT characterizes reactive trajectories, which are segments of trajectories of (1) observed starting when they last leave AA and ending when they first hit BB; see Figure 1 and Section 2 for a precise definition. The reactive trajectories have the same law as the transition path process

dYt=U(Xt)dt+2εlogq(Xt)dt+2εdBt\mathrm{d}Y_{t}=-\nabla U(X_{t})\,\mathrm{d}t+2\varepsilon\nabla\log q(X_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t} (2)

where

q(x)=[τB<τA|X0=x]q(x)=\mathds{P}[\tau_{B}<\tau_{A}|X_{0}=x]

is the (forward) committor function, i.e. the probability that a trajectory of (1) with X0=xX_{0}=x will hit BB before AA. The drift 2εlogq(Yt)2\varepsilon\nabla\log q(Y_{t}) in (2) forces trajectories away from AA and towards BB. As a result, in many cases, transitions between AA and BB occur much more quickly under (2) than they would under (1); cf. Figure 2.

There has been significant recent interest in sampling reactive trajectories by first computing an approximation q~\tilde{q} to the committor function and then simulating trajectories of (4) with q~\tilde{q} in place of qq; see [17] and numerical methods for computing committor functions such as [6, 7, 3, 9, 12]. Such a procedure can be dramatically more efficient than the direct simulation of long trajectories to observe transitions, but the effect of the approximation of the committor on the law of the reactive trajectories has not been assessed. Our results in Sections 3 and 4 below provide a means of assessing and controlling the quality of the approximation in theory and practice. In Theorem 1, we characterize the set of approximate committor functions q~\tilde{q} for which the approximate process is absolutely continuous with respect to the exact process, and we derive a convenient formula for the change of measure between the exact and approximate processes.

Surprisingly, the change of measure can be computed explicitly, at least up to a normalizing constant, even when the exact committor function is unknown; cf. equation (11). Considering Girsanov’s theorem, one might expect the change of measure to depend on the difference in drifts 2εlogq(Yt)2εlogq~(Yt)2\varepsilon\nabla\log q(Y_{t})-2\varepsilon\nabla\log\tilde{q}(Y_{t}). In Section 3, we show that one can totally eliminate the dependence on exact committor qq. We propose several novel computational methods based on this observation. In Section 4.3, we derive a stochastic gradient descent method for computing committor functions by direct minimization of the relative entropy of the approximate process with respect to the exact process. Competing methods for estimating committors minimize loss functions that do not relate so closely to the law of the approximate transition path process. For example, Ritz methods minimize

J(q~):=3N(AB)|q~(x)|2exp(U(x)/ε)dxJ(\tilde{q}):=\int_{\mathbb{R}^{3N}\setminus(A\cup B)}\lvert\nabla\tilde{q}(x)\rvert^{2}\exp(-U(x)/\varepsilon)\,\mathrm{d}x

subject to q~=0\tilde{q}=0 on A\partial A and q~=1\tilde{q}=1 on B\partial B [3]. It is not clear how the value of the functional J(q~)J(\tilde{q}) relates to the discrepancy between the distributions of exact and approximate reactive trajectories. In that sense, it is not a such a natural loss function as our relative entropy. In Section 4.2, we devise a model assessment procedure that compares approximate committor functions based on relative entropy differences as in the Bayes and Akaike information criteria.

All applications of our results demand that extraordinary care be taken to correctly handle singularities. For example, since q=0q=0 on A\partial A, the drift 2εlogq(Yt)2\varepsilon\nabla\log q(Y_{t}) in (4) is singular on A\partial A, and standard numerical integrators may fail to converge. We have devised a weakly convergent operator splitting scheme for integrating (2) in the case where A\partial A is planar and the approximate committor function meets a certain boundary condition on A\partial A. We will publish a numerical analysis of this integrator in a companion to the present article. We have not yet been able to devise a practical, provably convergent integrator that correctly handles the singularity on A\partial A in all cases. We also note that the change of measure formula (11) involves an integral that may be singular, but we have devised a general means of treating the singularity and correctly computing the integral; cf. Section 5.

2. A Digest of Transition Path Theory

In this section, we present a brief review of transition path theory [5, 13]. As above, let U:dU:\mathbb{R}^{d}\rightarrow\mathbb{R} be a potential energy, describing for example a molecular system. Let ε=kBT>0\varepsilon=k_{B}T>0 be a temperature parameter, and suppose that XtX_{t} evolves according to the overdamped Langevin dynamics

dXt=U(Xt)dt+2εdBt.\mathrm{d}X_{t}=-\nabla U(X_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t}. (3)

Under certain conditions on UU [10], the overdamped Langevin dynamics is ergodic for the Boltzmann distribution

ρ(dx)=Z1exp(U(x)/ε)dx where Z=dexp(U(x)/ε)dx.\rho(\mathrm{d}x)=Z^{-1}\exp(-U(x)/\varepsilon)\,\mathrm{d}x\text{ where }Z=\int_{\mathbb{R}^{d}}\exp(-U(x)/\varepsilon)\,\mathrm{d}x.

We assume ergodicity throughout this work.

Assumption 1.

For the given potential function UU, the overdamped Langevin dynamics (3) is ergodic.

In transition path theory, to characterize rare transitions of the overdamped Langevin dynamics, one first chooses a disjoint pair of subsets AA and BB of d\mathbb{R}^{d}. In applications, AA and BB would usually be associated with metastable states, e.g. they might be sets modeling different conformations of a biomolecule. In addition to the sets AA and BB, we find it convenient to define the transition region

𝒯=d(AB).\mathscr{T}=\mathbb{R}^{d}\setminus(A\cup B).

We impose the following assumptions on AA, BB, and 𝒯\mathscr{T}.

Assumption 2.

We assume that AA and BB are disjoint, closed subsets of d\mathbb{R}^{d} with nonempty interior. The boundaries of AA and BB are regular CC^{\infty}-submanifolds of d\mathbb{R}^{d}. The transition region 𝒯\mathscr{T} is connected.

Figure 1. Schematic illustration of reactive trajectories, entrance times, and exit times. The black line depicts a trajectory of overdamped Langevin dynamics. The bold red segments depict reactive trajectories. Here, τB,0\tau_{B,0} is the first hitting time of BB, τA,1\tau_{A,1} is the first hitting time of AA after τB,0\tau_{B,0}, τB,1\tau_{B,1} is the first hitting time of BB after τA,1\tau_{A,1}, and so forth. The kk’th exit time σA,k\sigma_{A,k} is the last time before τB,k\tau_{B,k} that the process was in AA.
Refer to caption
Figure 2. A trajectory of the overdamped Langevin dynamics for a simple two-dimensional model potential. The blue curves are contours of the potential. The trajectory depicted here was initialized from the Boltzmann distribution, i.e. in equilibrium. Its starting point happened to be in the set AA on the left side of the figure. The trajectory was terminated on hitting the boundary of BB for the first time. The bold red curve depicts the end of the trajectory from the last time it left AA to the first time it entered BB; i.e. the reactive part of the trajectory. The faint red curve depicts the beginning of the trajectory up to the last time it left AA.
Refer to caption

Given a pair of sets AA and BB, one defines sequences of entrance times, exit times, and reactive trajectories as illustrated in Figure 1. The entrance times are

τA,0\displaystyle\tau_{A,0} :=inf{t0;XtA},\displaystyle:=\inf\{t\geq 0;X_{t}\in A\},
τB,0\displaystyle\tau_{B,0} :=inf{tτA,0;XtB},\displaystyle:=\inf\{t\geq\tau_{A,0};X_{t}\in B\},

and for k1k\geq 1

τA,k\displaystyle\tau_{A,k} :=inf{tτB,k1;XtA},\displaystyle:=\inf\{t\geq\tau_{B,k-1};X_{t}\in A\},
τB,k\displaystyle\tau_{B,k} :=inf{tτA,k1;XtB}.\displaystyle:=\inf\{t\geq\tau_{A,k-1};X_{t}\in B\}.

The exit times are

σA,0\displaystyle\sigma_{A,0} :=sup{τA0tτB,0;XtA},\displaystyle:=\sup\{\tau_{A_{0}}\leq t\leq\tau_{B,0};X_{t}\in A\},
σB,0\displaystyle\sigma_{B,0} :=sup{τB,0tτA,1;XtB},\displaystyle:=\sup\{\tau_{B,0}\leq t\leq\tau_{A,1};X_{t}\in B\},

and for k1k\geq 1

σA,k\displaystyle\sigma_{A,k} :=sup{τA,ktτB,k;XtA},\displaystyle:=\sup\{\tau_{A,k}\leq t\leq\tau_{B,k};X_{t}\in A\},
σB,k\displaystyle\sigma_{B,k} :=sup{τB,0tτA,1;XtB}.\displaystyle:=\sup\{\tau_{B,0}\leq t\leq\tau_{A,1};X_{t}\in B\}.

The kk’th reactive trajectory (or transition path) YtkY^{k}_{t} is the segment of the random path XtX_{t} between the kk’th exit time σA,k\sigma_{A,k} from AA and the kk’th entrance time τB,k\tau_{B,k} to BB. That is,

Ytk:=XσA,k+t for t[0,τB,kσA,k].Y^{k}_{t}:=X_{\sigma_{A,k}+t}\text{ for }t\in[0,\tau_{B,k}-\sigma_{A,k}].

Transition path theory characterizes the distribution of reactive trajectories in terms of the committor function

q(x):=[τB<τA|X0=x].q(x):=\mathds{P}[\tau_{B}<\tau_{A}|X_{0}=x].

Here, τA\tau_{A} and τB\tau_{B} are the first hitting times of AA and BB for XtX_{t}, so q(x)q(x) is the probability of hitting BB before AA when starting from xx. In [13], Lu and Nolen verified that if the overdamped Langevin dynamics is in the steady state with XtρX_{t}\sim\rho for all t0t\geq 0, then the reactive trajectories have the same distribution as the solution of the transition path equation

dYt=U(Yt)dt+2εlogq(Yt)dt+2εdBt\mathrm{d}Y_{t}=-\nabla U(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log q(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t} (4)

where Y0Y_{0} has the reactive flux distribution

Y01ν|q(x)|exp(U(x)/ε)dS(x).Y_{0}\sim\frac{1}{\nu}\lvert\nabla q(x)\rvert\exp(-U(x)/\varepsilon)\,\mathrm{d}S(x). (5)

Here, SS is the surface measure on A\partial A, and

ν=A|q(x)|exp(U(x)/ε)dS(x).\nu=\int_{\partial A}\lvert\nabla q(x)\rvert\exp(-U(x)/\varepsilon)\,\mathrm{d}S(x).

Observe that the transition path equation is the overdamped Langevin dynamics with the additional drift term 2εlogq(Xt)dt2\varepsilon\nabla\log q(X_{t})\,\mathrm{d}t. On A\partial A, we have q=0q=0, and by Lemma 1 below, |q|>0\lvert\nabla q\rvert>0. Therefore, the drift logq=qq\nabla\log q=\frac{\nabla q}{q} is singular on A\partial A. Despite the singularity, Lu and Nolen showed that a unique strong solution of (4) exists for any initial distribution, even distributions supported on A\partial A. Moreover, with probability one, XtAX_{t}\notin A for t>0t>0. In effect, the singular drift repels trajectories from AA, forcing transitions to occur.

Remark 1.

One can derive (4) formally by conditioning the overdamped Langevin dynamics on hitting BB before AA using the Doob hh-transform. This approach does not establish a definite connection with reactive trajectories or the existence of strong solutions.

The committor solves the elliptic boundary value problem

{Lq=0 in 𝒯,q=0 on A,q=1 on B,\begin{cases}Lq=0&\text{ in }\mathscr{T},\\ q=0&\text{ on }\partial A,\\ q=1&\text{ on }\partial B,\end{cases} (6)

where

L:=U+εΔL:=-\nabla U\cdot\nabla+\varepsilon\Delta

is the generator of the overdamped Langevin dynamics [5]. Therefore, under our smoothness assumptions on UU and 𝒯\mathscr{T}, elliptic regularity, the Hopf lemma, and the maximum principle imply the properties of the committor listed in Lemma 1 below.

Assumption 3.

We assume that the potential energy U:dU:\mathbb{R}^{d}\rightarrow\mathbb{R} is infinitely differentiable, but we do not assume that UU or any of its derivatives are bounded.

Lemma 1.

Under Assumptions 2 and 3, the forward committor function q:𝒯[0,1]q:\mathscr{T}\rightarrow[0,1] extends to an infinitely differentiable function defined on an open set containing A\partial A and B\partial B. By abuse of notation, we let qq refer to both the committor function and the extension. For all x𝒯̊x\in\mathring{\mathscr{T}},

q(x)>0.q(x)>0.

For all xAx\in\partial A,

q(x)n(x)=|q(x)|>0,\nabla q(x)\cdot n(x)=\lvert\nabla q(x)\rvert>0,

where n(x)n(x) is the outward unit normal to AA at xx.

Proof.

By elliptic regularity, there exists an infinitely differentiable extension of qq to an open set containing 𝒯¯\overline{\mathscr{T}}. We have q(x)n(x)>0\nabla q(x)\cdot n(x)>0 for xAx\in\partial A by the Hopf Lemma, and q(x)>0q(x)>0 in the interior of 𝒯\mathscr{T} by the maximum principle. Moreover, since qq is constant on A\partial A and q(x)n(x)>0\nabla q(x)\cdot n(x)>0, q(x)=|q(x)|n(x)\nabla q(x)=\lvert\nabla q(x)\rvert n(x). ∎

Remark 2.

It is conventional to take the domain of the committor to be d\mathbb{R}^{d} instead of 𝒯\mathscr{T}, letting q=0q=0 on AA and q=1q=1 on BB. This extension of qq does not coincide with the smooth extension described in Lemma 1. Any smooth extension must take negative values at some points in the interior of AA since qn>0\nabla q\cdot n>0 and q=0q=0 on A\partial A. Moreover, observe that for our smooth extension, Lq=0Lq=0 everywhere in 𝒯¯\bar{\mathscr{T}}, including on A\partial A. In the original work on transition state theory, LqLq was instead understood as a distribution supported on AB\partial A\cup\partial B. We require a smooth extension only to simplify the notation in certain proofs. None of our results depends on a particular choice of extension.

3. A Change of Measure Formula for Approximations to the Transition Path Process

We analyze the errors that result when one substitutes an approximation q~\tilde{q} of the committor in the transition path equation (4). To be precise, in Theorem 1 below, we present a change of measure formula relating the distribution of solutions of the approximate transition path equation

dYt=U(Yt)dt+2εlogq~(Yt)dt+2εdBt\mathrm{d}Y_{t}=-\nabla U(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log\tilde{q}(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t} (7)

with the exact transition path process. To derive the change of measure, motivated by Lemma 1, we impose the following assumptions on the approximate committor q~\tilde{q}.

Assumption 4.

Let q~:𝒯\tilde{q}:\mathscr{T}\rightarrow\mathbb{R} be an approximation to the committor with the following properties:

  1. 1)

    q~\tilde{q} extends to an infinitely differentiable function defined on an open set containing A\partial A and B\partial B.

  2. 2)

    q~(x)=0\tilde{q}(x)=0 for all xAx\in\partial A.

  3. 3)

    q~(x)n(x)>0\nabla\tilde{q}(x)\cdot n(x)>0 for all xAx\in\partial A where n(x)n(x) is the outward unit normal to AA at xx.

We now show that when an approximate committor q~\tilde{q} has the properties outlined above, the ratio q~q\frac{\tilde{q}}{q} must be smooth in a neighborhood of A\partial A. This will be the crucial property in deriving sufficient conditions for a change of measure.

Lemma 2.

Let qq be the forward committor. Let q~\tilde{q} be an approximation to qq having the properties outlined in Assumption 4. The function

r(x)=q~(x)q(x)r(x)=\frac{\tilde{q}(x)}{q(x)}

defined for x𝒯̊x\in\mathring{\mathscr{T}} extends to a function in C(𝒯¯)C^{\infty}(\overline{\mathscr{T}}). By abuse of notation, we let rr denote both the function and its extension. For xAx\in\partial A, we have

r(x)=|q~(x)||q(x)|=q~(x)n(x)q(x)n(x).r(x)=\frac{\lvert\nabla\tilde{q}(x)\rvert}{\lvert\nabla q(x)\rvert}=\frac{\nabla\tilde{q}(x)\cdot n(x)}{\nabla q(x)\cdot n(x)}.
Proof.

Under Assumption 2, there exists a tubular neighborhood of A\partial A. That is, there exist an open set 𝒟\mathscr{D} containing A\partial A, an infinitely differentiable retraction ρ:𝒟A\rho:\mathscr{D}\rightarrow\partial A, and an infinitely differentiable mapping h:𝒟h:\mathscr{D}\rightarrow\mathbb{R} so that for any x𝒟x\in\mathscr{D},

x=ρ(x)+h(x)n(ρ(x))x=\rho(x)+h(x)n(\rho(x))

where n(y)n(y) is the outward unit normal to AA at yAy\in\partial A. To simplify notation, we define

x¯:=ρ(x),\bar{x}:=\rho(x),

and we sometimes write hh and nn for h(x)h(x) and n(ρ(x))n(\rho(x)). We choose 𝒟\mathscr{D} so that 𝒟\mathscr{D} and BB are disjoint.

Since q~>0\tilde{q}>0 on 𝒯B\mathscr{T}\cup\partial B, rr is smooth in an open neighborhood of any point in 𝒯B\mathscr{T}\cup\partial B. It will suffice to show that rr is smooth up to A\partial A. Let x𝒟x\in\mathscr{D}. By Taylor’s theorem and Lemma 1,

q(x)\displaystyle q(x) =q(x¯+hn)\displaystyle=q(\bar{x}+hn)
=q(x¯)+(q(x¯)n)h+h2s=01ntD2q(x¯+shn)n(1s)ds\displaystyle=q(\bar{x})+(\nabla q(\bar{x})\cdot n)h+h^{2}\int_{s=0}^{1}n^{t}D^{2}q(\bar{x}+shn)n(1-s)\,\mathrm{d}s
=|q(x¯)|h+h2s=01ntD2q(x¯+shn)n(1s)ds.\displaystyle=\lvert\nabla q(\bar{x})\rvert h+h^{2}\int_{s=0}^{1}n^{t}D^{2}q(\bar{x}+shn)n(1-s)\,\mathrm{d}s.

Similarly, we have

q~(x)=|q~(x¯)|h+h2s=01ntD2q~(x¯+shn)n(1s)ds,\tilde{q}(x)=\lvert\nabla\tilde{q}(\bar{x})\rvert h+h^{2}\int_{s=0}^{1}n^{t}D^{2}\tilde{q}(\bar{x}+shn)n(1-s)\,\mathrm{d}s,

since the proof of Lemma 1 verifies that q~(x¯)=|q~(x¯)|n\nabla\tilde{q}(\bar{x})=\lvert\nabla\tilde{q}(\bar{x})\rvert n under our assumptions on q~\tilde{q}. Therefore,

q~(x)q(x)=|q~(x¯)||q(x¯)|1+h|q~(x¯)|s=01ntD2q~(x¯+shn)n(1s)ds1+h|q(x¯)|s=01ntD2q(x¯+shn)n(1s)ds.\frac{\tilde{q}(x)}{q(x)}=\frac{\lvert\nabla\tilde{q}(\bar{x})\rvert}{\lvert\nabla q(\bar{x})\rvert}\frac{1+\frac{h}{\lvert\nabla\tilde{q}(\bar{x})\rvert}\int_{s=0}^{1}n^{t}D^{2}\tilde{q}(\bar{x}+shn)n(1-s)\,\mathrm{d}s}{1+\frac{h}{\lvert\nabla q(\bar{x})\rvert}\int_{s=0}^{1}n^{t}D^{2}q(\bar{x}+shn)n(1-s)\,\mathrm{d}s}.

The result follows since x¯\bar{x} and hh are smooth functions of xx, q~\tilde{q} and qq are in C(Ω¯)C^{\infty}(\overline{\Omega}), and |q~(x¯)|\lvert\nabla\tilde{q}(\bar{x})\rvert and |q~(x¯)|\lvert\nabla\tilde{q}(\bar{x})\rvert are positive. ∎

Lemma 2 implies that logr\log r and logr\nabla\log r are smooth and bounded in a neighborhood of any point on A\partial A, since |q(x)|>0\lvert\nabla q(x)\rvert>0 for all xAx\in\partial A by Lemma 1. If logr\nabla\log r is in fact globally bounded, then the Novikov condition stated in Assumption 5 below holds, and Girsanov’s theorem guarantees a change of measure.

Assumption 5.

For any t>0t>0, we have the Novikov condition

[exp(ε0t|logr(Xs)|2ds)]<.\mathds{Q}\left[\exp\left(\varepsilon\int_{0}^{t}\lvert\nabla\log r(X_{s})\rvert^{2}\,\mathrm{d}s\right)\right]<\infty.

Establishing or assuming a global bound on logr\nabla\log r seems to us to be the most practical means of verifying the Novikov condition and the existence of a change of measure. For example, a global bound on logr\nabla\log r follows from Assumption 4 and Lemma 2 when 𝒯\mathscr{T} is bounded. We note, however, that a global bound is only a convenient sufficient condition, not necessary, for the results below.

To state our change of measure, we require some additional notation to properly specify the probability space on which the exact and approximate transition path processes are defined. Let BtB_{t} be a dd-dimensional standard Brownian motion on the probability space (Ω,,)(\Omega,\mathcal{F},\mathds{Q}), and suppose that Y0Y_{0} is a random variable on (Ω,)(\Omega,\mathcal{F}) taking values in 𝒯¯\overline{\mathscr{T}} and independent of BtB_{t}. Let t\mathcal{F}_{t} be the filtration generated by BtB_{t} and Y0Y_{0}. In [13], it is shown that a strong solution of

dYt=U(Yt)dt+2εlogq(Yt)dt+2εdBtdY_{t}=-\nabla U(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log q(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t} (8)

exists with initial value Y0Y_{0}. Under our assumptions, Girsanov’s theorem yields a change of measure relating the transition path process (8) to a weak solution of the approximate transition path equation

dYt=U(Yt)dt+2εlogq~(Yt)dt+2εdBt.\mathrm{d}Y_{t}=-\nabla U(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log\tilde{q}(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t}.
Lemma 3.

Let Assumptions 234, and 5 hold. Let YtY_{t} and Y0Y_{0} be as described in the previous paragraph. The process

Mt:=exp(0t2εlogr(Ys)dBs120t|2εlogr(Ys)|2ds)M_{t}:=\exp\left(\int_{0}^{t}\sqrt{2\varepsilon}\nabla\log r(Y_{s})\cdot\mathrm{d}B_{s}-\frac{1}{2}\int_{0}^{t}\lvert\sqrt{2\varepsilon}\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s\right) (9)

is a positive t\mathcal{F}_{t}-martingale under \mathds{Q} with [Mt]=1\mathds{Q}[M_{t}]=1 for all t0t\geq 0. Therefore, there exists a probability measure \mathds{P} on (Ω,)(\Omega,\mathcal{F}) so that for any fixed T>0T>0,

dd|T=MT.\left.\frac{\mathrm{d}\mathds{P}}{\mathrm{d}\mathds{Q}}\right\rvert_{\mathcal{F}_{T}}=M_{T}.

Define the process WtW_{t} by

2εWt:=YtY00tU(Ys)+2εlogq~(Ys)ds\sqrt{2\varepsilon}W_{t}:=Y_{t}-Y_{0}-\int_{0}^{t}-\nabla U(Y_{s})+2\varepsilon\nabla\log\tilde{q}(Y_{s})\,\mathrm{d}s

so that YtY_{t} solves

dYt=U(Yt)dt+2εlogq~(Yt)dt+2εdWt.\mathrm{d}Y_{t}=-\nabla U(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log\tilde{q}(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}W_{t}.

Under \mathds{P}, {Wt}t=0T\{W_{t}\}_{t=0}^{T} has the law of a standard Brownian motion.

Proof.

The above merely restates Girsanov’s theorem. Observe that Girsanov’s theorem holds even when the drift is singular. Only some form of Novikov’s condition is required. ∎

Remark 3 (Weak Solutions of the Approximate Transition Path Equation).

Lemma 3 establishes the existence of at least a weak solution to

dYt=U(Yt)dt+2εlogq~(Yt)dt+2εdBt\mathrm{d}Y_{t}=-\nabla U(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log\tilde{q}(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t}

with arbitrary initial distribution supported on A\partial A even though logq~\nabla\log\tilde{q} is singular on A\partial A. We do not verify the existence of strong solutions. However, we suspect that one could do so based on arguments roughly similar to those used in [13] to construct strong solutions of the transition path equation (4).

We now derive a formula for the change of measure that can be computed in practice. Observe that formula (9) for MtM_{t} cannot be computed without knowledge of the exact committor qq. As a first step towards eliminating this dependence on the committor, we show that logq\log q solves a Hamilton–Jacobi–Bellman equation. Results of this nature are standard; we include a proof only for the reader’s convenience.

Lemma 4.

For x𝒯̊x\in\mathring{\mathscr{T}},

0\displaystyle 0 =U(x)logq(x)+εΔlogq(x)+ε|logq(x)|2\displaystyle=-\nabla U(x)\cdot\nabla\log q(x)+\varepsilon\Delta\log q(x)+\varepsilon\lvert\nabla\log q(x)\rvert^{2}
=Llogq(x)+ε|logq(x)|2.\displaystyle=L\log q(x)+\varepsilon\lvert\nabla\log q(x)\rvert^{2}.
Proof.

Define

m(x):=logq(x)m(x):=\log q(x)

for xΩx\in\Omega. We have

0\displaystyle 0 =Lq\displaystyle=Lq
=Uq+εΔq\displaystyle=-\nabla U\cdot\nabla q+\varepsilon\Delta q
=Uexp(m)+εΔexp(m)\displaystyle=-\nabla U\cdot\nabla\exp(m)+\varepsilon\Delta\exp(m)
=exp(m)(Um+εΔm+ε|m|2)\displaystyle=\exp(m)(-\nabla U\cdot\nabla m+\varepsilon\Delta m+\varepsilon\lvert\nabla m\rvert^{2})

on xd(AB)x\in\mathbb{R}^{d}\setminus(A\cup B), which implies the result since exp(m)\exp(m) must be positive. ∎

We use the Hamilton–Jacobi–Bellman equation in Lemma 4 to eliminate qq from the exponential factor in formula (9) for MtM_{t}.

Lemma 5.

Under the hypotheses of Lemma 3, and assuming Y0Y_{0} takes values in A\partial A, we have

Mt\displaystyle M_{t} =|q(Y0)||q~(Y0)|q~(Yt)q(Yt)exp(0tLlogq~(Ys)+ε|logq~(Ys)|2ds)\displaystyle=\frac{\lvert\nabla q(Y_{0})\rvert}{\lvert\nabla\tilde{q}(Y_{0})\rvert}\frac{\tilde{q}(Y_{t})}{q(Y_{t})}\exp\left(-\int_{0}^{t}L\log\tilde{q}(Y_{s})+\varepsilon\lvert\nabla\log\tilde{q}(Y_{s})\rvert^{2}\,\mathrm{d}s\right)
=|q(Y0)||q~(Y0)|q~(Yt)q(Yt)exp(0tLq~q~(Ys)ds)\displaystyle=\frac{\lvert\nabla q(Y_{0})\rvert}{\lvert\nabla\tilde{q}(Y_{0})\rvert}\frac{\tilde{q}(Y_{t})}{q(Y_{t})}\exp\left(-\int_{0}^{t}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right) (10)

for t>0t>0 and M0=1M_{0}=1.

Proof.

First, we derive an alternative expression for the integral with respect to dBs\mathrm{d}B_{s} that appears in formula (9) for MtM_{t}. By the Ito formula, for any gC2(d;)g\in C^{2}(\mathbb{R}^{d};\mathbb{R}), we have

dg(Yt)\displaystyle\mathrm{d}g(Y_{t}) =g(Yt)tdYt+12dYttD2g(Yt)dYt\displaystyle=\nabla g(Y_{t})^{t}\mathrm{d}Y_{t}+\frac{1}{2}\mathrm{d}Y_{t}^{t}D^{2}g(Y_{t})\mathrm{d}Y_{t}
=Lg(Yt)dt+2εlogq(Yt)tg(Yt)dt+2εg(Yt)tdBt,\displaystyle=Lg(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log q(Y_{t})^{t}\nabla g(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\nabla g(Y_{t})^{t}\mathrm{d}B_{t},

so

2ε0tg(Ys)tdBs=g(Yt)g(Y0)0tLg(Ys)+2εlogq(Ys)tg(Ys)ds.\displaystyle\sqrt{2\varepsilon}\int_{0}^{t}\nabla g(Y_{s})^{t}\mathrm{d}B_{s}=g(Y_{t})-g(Y_{0})-\int_{0}^{t}Lg(Y_{s})+2\varepsilon\nabla\log q(Y_{s})^{t}\nabla g(Y_{s})\,\mathrm{d}s.

Applying the above with g=logrg=\log r and using Lemma 2, we have for t>0t>0 that

Mt\displaystyle M_{t} =r(Yt)r(Y0)exp(0tLlogr(Ys)+2εlogq(Ys)tlogr(Ys)+ε|logr(Ys)|2ds)\displaystyle=\frac{r(Y_{t})}{r(Y_{0})}\exp\left(-\int_{0}^{t}L\log r(Y_{s})+2\varepsilon\nabla\log q(Y_{s})^{t}\nabla\log r(Y_{s})+\varepsilon\lvert\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s\right)
=|q(Y0)||~q(Y0)|q~(Yt)q(Yt)\displaystyle=\frac{\lvert\nabla q(Y_{0})\rvert}{\lvert\tilde{\nabla}q(Y_{0})\rvert}\frac{\tilde{q}(Y_{t})}{q(Y_{t})}
×exp(0tLlogr(Ys)+2εlogq(Ys)tlogr(Ys)+ε|logr(Ys)|2ds).\displaystyle\quad\times\exp\left(-\int_{0}^{t}L\log r(Y_{s})+2\varepsilon\nabla\log q(Y_{s})^{t}\nabla\log r(Y_{s})+\varepsilon\lvert\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s\right).

By Lemma 4,

Llogr+2εlogqlogr+ε|logr|2\displaystyle L\log r+2\varepsilon\nabla\log q\cdot\nabla\log r+\varepsilon\lvert\nabla\log r\rvert^{2}
=Llogq~Llogq+2εlogqlogq~2ε|logq|2\displaystyle\qquad=L\log\tilde{q}-L\log q+2\varepsilon\nabla\log q\cdot\nabla\log\tilde{q}-2\varepsilon\lvert\nabla\log q\rvert^{2}
+ε|logq~|2+ε|logq|22εlogqlogq~\displaystyle\qquad\qquad+\varepsilon\lvert\nabla\log\tilde{q}\rvert^{2}+\varepsilon\lvert\nabla\log q\rvert^{2}-2\varepsilon\nabla\log q\cdot\nabla\log\tilde{q}
=Llogq~+12|logq~|2.\displaystyle\qquad=L\log\tilde{q}+\frac{1}{2}\lvert\nabla\log\tilde{q}\rvert^{2}.

Thus,

Mt=|q(Y0)||~q(Y0)|q~(Yt)q(Yt)exp(0tLlogq~(Ys)+ε|logq~(Ys)|2ds).M_{t}=\frac{\lvert\nabla q(Y_{0})\rvert}{\lvert\tilde{\nabla}q(Y_{0})\rvert}\frac{\tilde{q}(Y_{t})}{q(Y_{t})}\exp\left(-\int_{0}^{t}L\log\tilde{q}(Y_{s})+\varepsilon\lvert\nabla\log\tilde{q}(Y_{s})\rvert^{2}\,\mathrm{d}s\right).

Here, for t>0t>0, we have Yt𝒯̊Y_{t}\in\mathring{\mathscr{T}} with probability one, so q(Yt)>0q(Y_{t})>0, and the right-hand-side of the equation above is well-defined. In addition, M0=1M_{0}=1 by (9). This verifies the first formula for MtM_{t} in the statement of the lemma. The second formula follows from the first and the identity

Llogq~\displaystyle L\log\tilde{q} =εΔlogq~Ulogq~\displaystyle=\varepsilon\Delta\log\tilde{q}-\nabla U\cdot\nabla\log\tilde{q}
=Lq~q~ε|logq~|2.\displaystyle=\frac{L\tilde{q}}{\tilde{q}}-\varepsilon\lvert\nabla\log\tilde{q}\rvert^{2}.

We must now eliminate the factors |q(Y0)|\lvert\nabla q(Y_{0})\rvert and q(Yt)q(Y_{t}) from the change of measure formula (10). To get rid of q(Yt)q(Y_{t}), we simply stop observing the process at the first hitting time τ\tau of BB; note that YτBY_{\tau}\in\partial B, so q(Yτ)=1q(Y_{\tau})=1. Observing the process up to time τ\tau corresponds to restricting the measures \mathds{P} and \mathds{Q} to the stopping time σ\sigma-algebra defined below.

Definition 1.

Let τ\tau be the first hitting time of BB for the process YtY_{t} solving (8). Let τ\mathcal{F}_{\tau} be the stopping time σ\sigma-algebra of τ\tau, i.e.

τ:={A:A{τt}t for all t0}.\mathcal{F}_{\tau}:=\{A\in\mathcal{F}:A\cap\{\tau\leq t\}\in\mathcal{F}_{t}\text{ for all }t\geq 0\}.

For Theorem 1 below to hold, τ\tau must be finite for both the exact and approximate transition path processes. Assumption 1 on the ergodicity of the overdamped Langevin dyanmics implies that the exact transition path process hits BB, since BB has positive Lebesgue measure. That is, [τ<]=1\mathds{Q}[\tau<\infty]=1. We assume that that τ\tau is finite for the approximate process as well.

Assumption 6.

We assume that

[τ<]=1\mathds{P}[\tau<\infty]=1

for any initial value Y0Y_{0} supported in 𝒯¯\overline{\mathscr{T}}. Recall that \mathds{P} is the measure introduced in Lemma 3 under which YtY_{t} is a weak solution of the approximate transition path equation (7).

We eliminate |q(Y0)|\lvert\nabla q(Y_{0})\rvert from (10) by taking Y0Y_{0} to have the reactive flux distribution under \mathds{Q}. In that case, the factor |q(Y0)|\lvert\nabla q(Y_{0})\rvert in the reactive flux (5) cancels with the one in (10). Everything that remains in the change of measure can be computed without knowing the exact committor qq, except for a normalizing constant. The result is formula (11) in Theorem 1.

Theorem 1.

Let Assumptions 16 hold. Let BtB_{t} be a dd-dimensional standard Brownian motion on the probability space (Ω,,)(\Omega,\mathcal{F},\mathds{Q}). Let Y0Y_{0} be a random variable on (Ω,)(\Omega,\mathcal{F}) that has the reactive flux distribution

Y01ν|q(x)|exp(U(x)/ε)dSA(x),Y_{0}\sim\frac{1}{\nu}\lvert\nabla q(x)\rvert\exp(-U(x)/\varepsilon)\,\mathrm{d}S_{A}(x),

and assume that Y0Y_{0} is independent of BtB_{t} under \mathds{Q}. Let t\mathcal{F}_{t} be the filtration generated by BtB_{t} and Y0Y_{0}. Suppose that YtY_{t} solves the transition path equation

dYt=U(Yt)dt+2εlogq(Yt)dt+2εdBt\mathrm{d}Y_{t}=-\nabla U(Y_{t})\,\mathrm{d}t+2\varepsilon\nabla\log q(Y_{t})\,\mathrm{d}t+\sqrt{2\varepsilon}\,\mathrm{d}B_{t}

with initial condition Y0Y_{0}. Now let q~\tilde{q} be an approximation to the committor qq, and let m:A[0,)m:\partial A\rightarrow[0,\infty) be an unnormalized density over A\partial A that approximates the reactive flux distribution. Let

μ=Am(x)dSA(x)\mu=\int_{\partial A}m(x)\,\mathrm{d}S_{A}(x)

be the normalizing constant of mm. For t>0t>0, define

Zt:=νμq~(Yt)q(Yt)m(Y0)|q~(Y0)|exp(U(Y0)/ε)exp(0tLq~q~(Ys)ds),Z_{t}:=\frac{\nu}{\mu}\frac{\tilde{q}(Y_{t})}{q(Y_{t})}\frac{m(Y_{0})}{\lvert\nabla\tilde{q}(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}\exp\left(-\int_{0}^{t}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right),

and let

Z0:=νμm(Y0)|q(Y0)|exp(U(Y0)/ε).Z_{0}:=\frac{\nu}{\mu}\frac{m(Y_{0})}{\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}.

The process ZtZ_{t} is a nonnegative t\mathcal{F}_{t}-martingale with [Zt]=1\mathds{Q}[Z_{t}]=1 for all t0t\geq 0, so there exists a probability measure \mathds{P} on (Ω,)(\Omega,\mathcal{F}) with

dd|t=Zt\left.\frac{\mathrm{d}\mathds{P}}{\mathrm{d}\mathds{Q}}\right\rvert_{\mathcal{F}_{t}}=Z_{t}

for all t0t\geq 0. Under \mathds{P},

Wt:=YtY00tU(Ys)+2εlogq~(Ys)dsW_{t}:=Y_{t}-Y_{0}-\int_{0}^{t}-\nabla U(Y_{s})+2\varepsilon\nabla\log\tilde{q}(Y_{s})\,\mathrm{d}s

is a Brownian motion and Y0μ1m(x)dSA(x)Y_{0}\sim\mu^{-1}m(x)\,\mathrm{d}S_{A}(x). The density of \mathds{P} restricted to the stopping time σ\sigma-algebra τ\mathcal{F}_{\tau} is

dd|τ=Zτ=νμq~(Yτ)m(Y0)|q~(Y0)|exp(U(Y0)/ε)exp(0τLq~q~(Ys)ds).\left.\frac{\mathrm{d}\mathds{P}}{\mathrm{d}\mathds{Q}}\right\rvert_{\mathcal{F}_{\tau}}=Z_{\tau}=\frac{\nu}{\mu}\tilde{q}(Y_{\tau})\frac{m(Y_{0})}{\lvert\nabla\tilde{q}(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}\exp\left(-\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right). (11)
Proof.

See Appendix A. ∎

4. Applications of the Change of Measure

In this section, we consider two applications of the change of measure formula: selection and training. By training, we mean the minimization of the relative entropy DKL()D_{\mathrm{KL}}(\mathds{P}\|\mathds{Q}) of the approximate transition path process with respect to the exact process over a family of approximate committor functions. By selection, we mean the comparison of different approximations to the committor function based on relative entropy differences. This is somewhat similar to model selection based on the Bayes and Akaike information criteria except that we do not consider penalizing model complexity. We also speculate on two other possibilities: error estimation and importance sampling. By error estimation, we mean the direct estimation of DKL()D_{\mathrm{KL}}(\mathds{P}\|\mathds{Q}). By importance sampling, we mean the unbiased estimation of observables of the exact process given a sample of paths of the approximate process. Error estimation and importance sampling entail difficulties that we cannot completely resolve at this time; see Section 5 for details.

4.1. Relative Entropy

In general, if PP and QQ are probability measures on the same measurable space, one defines the relative entropy (or Kullback–Leibler divergence) of PP with respect to QQ as follows.

Definition 2 (Relative entropy).

If PP and QQ are probability measures on (S,𝒮)(S,\mathscr{S}) and PP is absolutely continuous with respect to QQ, we define the relative entropy of PP with respect to QQ by

DKL(PQ):=Q[dPdQlog(dPdQ)]=P[log(dPdQ)].D_{\mathrm{KL}}(P\|Q):=Q\left[\frac{\mathrm{d}P}{\mathrm{d}Q}\log\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)\right]=P\left[\log\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)\right].

If PP is not absolutely continuous with respect to QQ, DKL(PQ):=D_{\mathrm{KL}}(P\|Q):=\infty.

The relative entropy is widely used in machine learning and statistics to measure discrepancies between distributions. In particular, maximum likelihood estimation can be understood as relative entropy minimization. The relative entopy is not symmetric and the triangle inequality does not hold, so it is not a metric. However, DKL(PQ)0D_{\mathrm{KL}}(P\|Q)\geq 0 for all PP and QQ, and DKL(PQ)=0D_{\mathrm{KL}}(P\|Q)=0 implies P=QP=Q. Moreover, DKL(PQ)D_{\mathrm{KL}}(P\|Q) relates to some well-known metrics. For example, Pinsker’s inequality bounds the total variation distance in terms of DKLD_{\mathrm{KL}}:

PQTV=maxA𝒮|P(A)Q(A)|12DKL(PQ).\lVert P-Q\rVert_{\mathrm{TV}}=\max_{A\in\mathscr{S}}\,\lvert P(A)-Q(A)\rvert\leq\sqrt{\frac{1}{2}D_{\mathrm{KL}}(P\|Q)}.

By abuse of notation, throughout the rest of this work, we will write \mathds{P} and \mathds{Q} for the restrictions of \mathds{P} and \mathds{Q} to the stopping time σ\sigma-algebra τ\mathcal{F}_{\tau}. In other words, we will stop observing the process YtY_{t} when it hits BB at time τ\tau. By formula (11) in Theorem 1, we have

DKL()\displaystyle D_{\mathrm{KL}}(\mathds{P}\|\mathds{Q}) =[log(dd)]\displaystyle=\mathds{P}\left[\log\left(\frac{\mathrm{d}\mathds{P}}{\mathrm{d}\mathds{Q}}\right)\right]
=logνlogμ\displaystyle=\log\nu-\log\mu
+Alog(m(x)|q~(x)|exp(U(x)/ε))m(x)μdSA(x)\displaystyle\qquad+\int_{\partial A}\log\left(\frac{m(x)}{\lvert\nabla\tilde{q}(x)\rvert\exp(-U(x)/\varepsilon)}\right)\frac{m(x)}{\mu}\,\mathrm{d}S_{A}(x)
+[logq~(Yτ)0τLq~q~(Ys)ds]\displaystyle\qquad+\mathds{P}\left[\log\tilde{q}(Y_{\tau})-\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right] (12)

and

DKL()\displaystyle D_{\mathrm{KL}}(\mathds{Q}\|\mathds{P}) =[log(dd)]\displaystyle=\mathds{Q}\left[\log\left(\frac{\mathrm{d}\mathds{Q}}{\mathrm{d}\mathds{P}}\right)\right]
=logμlogν\displaystyle=\log\mu-\log\nu
+Alog(|q~(x)|exp(U(x)/ε)m(x))|q(x)|exp(U(x)/ε)νdSA(x)\displaystyle\qquad+\int_{\partial A}\log\left(\frac{\lvert\nabla\tilde{q}(x)\rvert\exp(-U(x)/\varepsilon)}{m(x)}\right)\frac{\lvert\nabla q(x)\rvert\exp(-U(x)/\varepsilon)}{\nu}\,\mathrm{d}S_{A}(x)
+[0τLq~q~(Ys)dslogq~(Yτ)].\displaystyle\qquad+\mathds{Q}\left[\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s-\log\tilde{q}(Y_{\tau})\right]. (13)

Both of these relative entropies measure the discrepancy between the exact distribution of transition paths and the distribution with an approximate committor function q~\tilde{q} in place of qq.

These formulas simplify under some conditions. First, note that logq~(Yτ)=0\log\tilde{q}(Y_{\tau})=0 when we impose q~=1\tilde{q}=1 on B\partial B. Second, note that the integrals over A\partial A vanish when

m(x)=|q~(x)|exp(U(x)/ε).m(x)=\lvert\nabla\tilde{q}(x)\rvert\exp(-U(x)/\varepsilon).

Of course, both conditions hold for the exact committor and reactive flux distribution, and one could impose either in practice when computing approximate committor functions.

Remark 4.

The existence of the change of measure dd\frac{\mathrm{d}\mathds{P}}{\mathrm{d}\mathds{Q}} does not imply DKL()<D_{\mathrm{KL}}(\mathds{P}\|\mathds{Q})<\infty. We note that DKL()<D_{\mathrm{KL}}(\mathds{P}\|\mathds{Q})<\infty under the same conditions that imply finite variance of the estimator in Lemma 8, but with [τ2]<\mathds{P}[\tau^{2}]<\infty replaced by [τ]<\mathds{P}[\tau]<\infty. We leave the proof to the reader.

4.2. Selection: Estimating Relative Entropy Differences

Let q~\tilde{q} and q¯\bar{q} be approximate committor functions for which Assumption 4 holds. Let m~\tilde{m} and m¯\bar{m} be approximate reactive flux densities supported on A\partial A with normalizing constants

μ~:=Am~(x)dSA(x) and μ¯:=Am¯(x)dSA(x).\tilde{\mu}:=\int_{\partial A}\tilde{m}(x)\,\mathrm{d}S_{A}(x)\text{ and }\bar{\mu}:=\int_{\partial A}\bar{m}(x)\,\mathrm{d}S_{A}(x).

Assume that m~\tilde{m} and m¯\bar{m} are strictly positive. We define ~\tilde{\mathds{P}} to be the law of the approximate transition path process corresponding to q~\tilde{q} and m~\tilde{m}, i.e. the law of a weak solution of (7) with singular drift 2εlogq~(Yt)2\varepsilon\nabla\log\tilde{q}(Y_{t}) and initial condition Y0μ~1m(x)dSA(x)Y_{0}\sim\tilde{\mu}^{-1}m(x)\,\mathrm{d}S_{A}(x). We define ¯\bar{\mathds{P}} similarly, but with q¯\bar{q} and m¯\bar{m} in place of q~\tilde{q} and m~\tilde{m}. We now explain how to estimate the relative entropy difference

δ(q~,m~;q¯,m¯):=DKL(~)DKL(¯)\delta(\tilde{q},\tilde{m};\bar{q},\bar{m}):=D_{\mathrm{KL}}(\tilde{\mathds{P}}\|\mathds{Q})-D_{\mathrm{KL}}(\bar{\mathds{P}}\|\mathds{Q})

given samples of paths from ~\tilde{\mathds{P}} and ¯\bar{\mathds{P}}. Using our estimator, one can compare the quality of approximate committor functions. For example, one can monitor the relative entropy difference between initial and successive approximations to the committor to assess progress during training, or one can compare the quality of coarse-grained approximations to the committor depending on different numbers of variables.

Suppose that we have an independent sample of N~\tilde{N} paths Y~tk\tilde{Y}^{k}_{t} drawn from ~\tilde{\mathds{P}}:

Y~tki.i.d.~ for k=1,,N~.\tilde{Y}^{k}_{t}\overset{\mathrm{i.i.d.}}{\sim}\tilde{\mathds{P}}\text{ for }k=1,\dots,\tilde{N}.

Under certain conditions on q~\tilde{q}, the sample average

I~N~:=1N~k=1N~log(m(Y~0k)|q~(Y~0k)|exp(U(Y~0k)/ε))+logq~(Y~τk)0τLq~q~(Y~sk)ds\displaystyle\tilde{I}_{\tilde{N}}:=\frac{1}{\tilde{N}}\sum_{k=1}^{\tilde{N}}\log\left(\frac{m(\tilde{Y}^{k}_{0})}{\lvert\nabla\tilde{q}(\tilde{Y}^{k}_{0})\rvert\exp(-U(\tilde{Y}^{k}_{0})/\varepsilon)}\right)+\log\tilde{q}(\tilde{Y}^{k}_{\tau})-\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(\tilde{Y}^{k}_{s})\,\mathrm{d}s

is a consistent and unbiased estimator of the term

I~\displaystyle\tilde{I} :=Alog(m(x)|q~(x)|exp(U(x)/ε))m(x)μdSA(x)\displaystyle:=\int_{\partial A}\log\left(\frac{m(x)}{\lvert\nabla\tilde{q}(x)\rvert\exp(-U(x)/\varepsilon)}\right)\frac{m(x)}{\mu}\,\mathrm{d}S_{A}(x)
+θ[logq~(Yτ)0τLq~q~(Ys)ds]\displaystyle\qquad+\mathds{P}_{\theta}\left[\log\tilde{q}(Y_{\tau})-\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right]

appearing in formula (12) for DKL(~)D_{\mathrm{KL}}(\tilde{\mathds{P}}\|\mathds{Q}). We give conditions on q~\tilde{q} in Lemma 8 in Appendix B that guarantee finite variance and a weak law of large numbers. Of course, one could estimate the analogous term I¯\bar{I} in DKL(¯)D_{\mathrm{KL}}(\bar{\mathds{P}}\|\mathds{Q}) by a similar average I¯N¯\bar{I}_{\bar{N}} given a sample of N¯\bar{N} paths Y¯tk\bar{Y}^{k}_{t} drawn from ¯\bar{\mathds{P}}. The crucial problem is to show that the variance of 0τLq~q~(Y~sk)ds\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(\tilde{Y}^{k}_{s})\,\mathrm{d}s is finite even though Lq~q~\frac{L\tilde{q}}{\tilde{q}} would typically be singular on A\partial A, since q~=0\tilde{q}=0 on A\partial A.

We have

δ(q~,m~;q¯,m¯)=log(μ~μ¯)+I~I¯,\delta(\tilde{q},\tilde{m};\bar{q},\bar{m})=\log\left(\frac{\tilde{\mu}}{\bar{\mu}}\right)+\tilde{I}-\bar{I},

so it remains to estimate the ratio μ~μ¯\frac{\tilde{\mu}}{\bar{\mu}}. Analogous problems involving ratios of normalizing constants arise in Bayesian model selection and the calculation of alchemical free energy differences. Refer to [11] for a survey of methods for free energy calculations. We propose to use the Bennett Acceptance Ratio (BAR) method [1] to estimate μ~μ¯\frac{\tilde{\mu}}{\bar{\mu}}. BAR is in some sense the optimal estimator of μ~μ¯\frac{\tilde{\mu}}{\bar{\mu}} given i.i.d. samples from m~dS\tilde{m}\,\mathrm{d}S and m¯dS\bar{m}\,\mathrm{d}S. More complex alternatives that involve sampling from multiple distributions, such as the Multistate Bennett Acceptance Ratio (MBAR) method, may be needed in cases where m~\tilde{m} and m¯\bar{m} differ greatly [15].

Remark 5.

In practice, to sample from the approximate reactive flux densities m~\tilde{m} and m¯\bar{m}, one would most likely use a Markov chain Monte Carlo (MCMC) method. Many methods have been developed to sample distributions supported on submanifolds such as A\partial A. For example, see [11, Sections 3.2.3-4]. Of course, MCMC would produce correlated samples from the approximate reactive flux distributions, not independent samples as assumed above.

4.3. Training: Minimizing the Relative Entropy

One could also try to minimize the relative entropy over a family of approximate committor functions by gradient descent. Let {qθ;θk}\{q_{\theta};\theta\in\mathbb{R}^{k}\} be such a family. Let each approximate committor function correspond to an approximate reactive flux distribution

mθ(x):=μθ1qθ(x)n(x)exp(U(x)/ε)dSA(x),m_{\theta}(x):=\mu_{\theta}^{-1}\nabla q_{\theta}(x)\cdot n(x)\exp(-U(x)/\varepsilon)\,\mathrm{d}S_{A}(x), (14)

where

μθ:=Aqθ(x)n(x)exp(U(x)/ε)dSA(x).\mu_{\theta}:=\int_{\partial A}\nabla q_{\theta}(x)\cdot n(x)\exp(-U(x)/\varepsilon)\,\mathrm{d}S_{A}(x).

We define θ\mathds{P}_{\theta} to be the law of a weak solution of the stochastic differential equation (7) observed up to time τ\tau for the approximate committor function qθq_{\theta} and initial condition Y0mθY_{0}\sim m_{\theta}. In this section, we derive a consistent estimator of θDKL(θ)\nabla_{\theta}D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}) given a sample from θ\mathds{P}_{\theta}. We then use the estimator in a stochastic gradient descent method to minimize DKL(θ)D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}).

For our results in this section, we require much stronger assumptions on qθq_{\theta} and on the distribution of τ\tau than those imposed elsewhere; cf. Assumption 7. The most significant of these new assumptions is that

Lqθ=0 on A for all θk.Lq_{\theta}=0\text{ on }\partial A\text{ for all }\theta\in\mathbb{R}^{k}. (15)

Recall that for the exact committor qq, or more precisely for the smooth extension guaranteed by Lemma 1, we have Lq=0Lq=0 everywhere in 𝒯¯\bar{\mathscr{T}}, including on A\partial A; cf. Remark 2. One can construct practical families qθq_{\theta} satisfying (15) (cf. Section 5), and our numerical experiments indicate that stochastic gradient descent may converge faster when (15) holds. However, we do not claim that (15) is necessary for differentiability.

If Lqθ=0Lq_{\theta}=0 on A\partial A, then the quantity

(x;θ):=Lqθqθ(x)\ell(x;\theta):=\frac{Lq_{\theta}}{q_{\theta}}(x)

that appears in the change of measure formula (11) extends to a smooth function of xx defined on an open neighborhood of A\partial A, even though we assume qθ=0q_{\theta}=0 on A\partial A. To see this, let LqθLq_{\theta} take the place of q~\tilde{q} in the proof of Lemma 2. Similarly,

θ(x;θ)=θLqθ(x)qθ(x)θqθ(x)qθ(x)Lqθ(x)qθ(x)\nabla_{\theta}\ell(x;\theta)=\frac{\nabla_{\theta}Lq_{\theta}(x)}{q_{\theta}(x)}-\frac{\nabla_{\theta}q_{\theta}(x)}{q_{\theta}(x)}\frac{Lq_{\theta}(x)}{q_{\theta}(x)}

also extends to a smooth function on an open neighborhood of A\partial A, since we have Lqθ=qθ=0Lq_{\theta}=q_{\theta}=0 on A\partial A for all θ\theta, hence θLqθ=θqθ=0\nabla_{\theta}Lq_{\theta}=\nabla_{\theta}q_{\theta}=0 on A\partial A. We will assume that both (x;θ)\ell(x;\theta) and θ(x;θ)\nabla_{\theta}\ell(x;\theta) are not merely smooth on a neighborhood of A\partial A but bounded uniformly over all xx and θ\theta.

Another new assumption is that, for each θk\theta\in\mathbb{R}^{k}, there is some γ(θ)>0\gamma(\theta)>0 so that

θ[exp(γ(θ)τ)]<;\mathds{P}_{\theta}[\exp(\gamma(\theta)\tau)]<\infty; (16)

that is, the moment generating function of τ\tau under θ\mathds{P}_{\theta} is finite on an open interval containing zero. We will not verify (16) for any family of approximate committor functions. However, we note that for the exact transition path process in one-dimension, under some rather stringent conditions on the potential energy UU and on the sets AA and BB, one can show that τ\tau converges in distribution as ε0\varepsilon\rightarrow 0 to a shifted and scaled Gumbel random variable [2]. Under much more general conditions, escape times from basins of attraction of minima of UU are approximately exponentially distributed for the overdamped Langevin dynamics in the limit of small ε\varepsilon [8]. The moment generating function of a Gumbel or exponential random variable will be finite on an open interval containing zero, but not globally finite, which motivates (16).

We summarize these new assumptions together with a few others below.

Assumption 7.

We assume that qθ(x)q_{\theta}(x) is infinitely differentiable in both xx and θ\theta. We impose the following boundary conditions:

  • qθ=0q_{\theta}=0 for xAx\in\partial A.

  • qθ(x)>0q_{\theta}(x)>0 for x𝒯̊Bx\in\mathring{\mathscr{T}}\cup\partial B.

  • qθ(x)n(x)>0\nabla q_{\theta}(x)\cdot n(x)>0 for xAx\in\partial A.

  • Lqθ(x)=0Lq_{\theta}(x)=0 for xAx\in\partial A.

We assume the following uniform bounds:

  • For some C>0C>0,

    |Lqθqθ(x)|C and |θLqθqθ(x)|C\left\lvert\frac{Lq_{\theta}}{q_{\theta}}(x)\right\rvert\leq C\text{ and }\left\lvert\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(x)\right\rvert\leq C

    for all x𝒯¯x\in\bar{\mathscr{T}} and all θk\theta\in\mathbb{R}^{k}.

  • For some M>0M>0,

    Mμθ1M>0 and |θmθ|MM\geq\mu_{\theta}\geq\frac{1}{M}>0\text{ and }\lvert\nabla_{\theta}m_{\theta}\rvert\leq M

    for all θk\theta\in\mathbb{R}^{k}.

We assume that for each θk\theta\in\mathbb{R}^{k}, there is some γ(θ)>0\gamma(\theta)>0 so that

θ[exp(γ(θ)τ)]<.\mathds{P}_{\theta}[\exp(\gamma(\theta)\tau)]<\infty. (17)

Under Assumption 7, the relative entropy is differentiable.

Theorem 2.

Let Assumption 7 hold, and let θ\mathds{P}_{\theta} and mθm_{\theta} be defined as above. The relative entropy DKL(θ)D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}) is differentiable as a function of θ\theta, and

θDKL(θ|)\displaystyle\nabla_{\theta}D_{\mathrm{KL}}(\mathds{P}_{\theta}|\mathds{Q}) =covθ(θlogqθ(Yτ)0τθLqθqθ(Ys)ds,\displaystyle=\operatorname{cov}_{\mathds{P}_{\theta}}\left(\nabla_{\theta}\log q_{\theta}(Y_{\tau})-\int_{0}^{\tau}\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(Y_{s})\,\mathrm{d}s,\right.
logqθ(Yτ)0τLqθqθ(Ys)ds).\displaystyle\qquad\qquad\qquad\qquad\left.\log q_{\theta}(Y_{\tau})-\int_{0}^{\tau}\frac{Lq_{\theta}}{q_{\theta}}(Y_{s})\,\mathrm{d}s\right).

We give the proof in Appendix B. We propose to estimate θDKL(θ)\nabla_{\theta}D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}) like any other covariance. Given a sample

Yki.i.d.θ for k=1,,N,Y^{k}\overset{\mathrm{i.i.d.}}{\sim}\mathds{P}_{\theta}\text{ for }k=1,\dots,N,

we define sample averages

JN\displaystyle J_{N} :=1Nk=1Nlogqθ(Yτk)0τLqθqθ(Ysk)ds\displaystyle:=\frac{1}{N}\sum_{k=1}^{N}\log q_{\theta}(Y^{k}_{\tau})-\int_{0}^{\tau}\frac{Lq_{\theta}}{q_{\theta}}(Y^{k}_{s})\,\mathrm{d}s
KN\displaystyle K_{N} :=1Nk=1Nθlogqθ(Yτk)0τθLqθqθ(Ysk)ds\displaystyle:=\frac{1}{N}\sum_{k=1}^{N}\nabla_{\theta}\log q_{\theta}(Y^{k}_{\tau})-\int_{0}^{\tau}\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(Y^{k}_{s})\,\mathrm{d}s

and the sample covariance

GN:=1N1k=1N(logqθ(Yτk)0τLqθqθ(Ysk)dsJN)×(θlogqθ(Yτk)0τθLqθqθ(Ysk)dsKN).\begin{split}G_{N}&:=\frac{1}{N-1}\sum_{k=1}^{N}\left(\log q_{\theta}(Y^{k}_{\tau})-\int_{0}^{\tau}\frac{Lq_{\theta}}{q_{\theta}}(Y^{k}_{s})\,\mathrm{d}s-J_{N}\right)\\ &\qquad\qquad\qquad\times\left(\nabla_{\theta}\log q_{\theta}(Y^{k}_{\tau})-\int_{0}^{\tau}\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(Y^{k}_{s})\,\mathrm{d}s-K_{N}\right).\end{split} (18)

Assumption 7 implies that GNG_{N} has finite variance. We leave the proof to the reader. Using this gradient estimator, one can attempt to minimize DKL(θ)D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}) by stochastic gradient descent.

4.4. Importance Sampling and Direct Estimation of Relative Entropy

Using our change of measure formula, one could in principle estimate arbitrary expectations over the transition path process \mathds{Q} given a sample of paths from the approximate process \mathds{P}. Let g:Ωg:\Omega\rightarrow\mathbb{R} be τ\mathcal{F}_{\tau}-measurable, i.e. a function of the process observed up to time τ\tau. By (11), assuming for simplicity that q~=1\tilde{q}=1 on B\partial B and m=|q~|exp(U/ε)m=\lvert\nabla\tilde{q}\rvert\exp(-U/\varepsilon), we have

[g]\displaystyle\mathds{Q}[g] =[Zτ1g]\displaystyle=\mathds{P}\left[Z_{\tau}^{-1}g\right]
=μν[exp(0τ(ω)Lq~q~(Ys(ω))ds)g(ω)],\displaystyle=\frac{\mu}{\nu}\mathds{P}\left[\exp\left(\int_{0}^{\tau(\omega)}\frac{L\tilde{q}}{\tilde{q}}(Y_{s}(\omega))\,\mathrm{d}s\right)g(\omega)\right],

which suggests the self-normalized importance sampling estimator

g¯=1Nk=1Nexp(0τLq~q~(Ysk)ds)g(Yk)1Nk=1Nexp(0τLq~q~(Ysk)ds),\bar{g}=\frac{\frac{1}{N}\sum_{k=1}^{N}\exp\left(\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y^{k}_{s})\,\mathrm{d}s\right)g(Y^{k})}{\frac{1}{N}\sum_{k=1}^{N}\exp\left(\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y^{k}_{s})\,\mathrm{d}s\right)},

where Yki.i.d. for k=1,,NY^{k}\overset{\mathrm{i.i.d.}}{\sim}\mathds{P}\text{ for }k=1,\dots,N. Experience with other estimators related to Girsanov’s theorem strongly suggests that such a naïve estimator will not be robust. Some successes have been attained in the importance sampling of paths of diffusion processes by roughly similar estimators [18, 4, 16]. We hope that our results will facilitate the development of analogous methods for the transition path process. However, we note that [18, 4, 16] treat only paths observed up to a fixed finite time, not up to an unbounded stopping time as in our work.

One might also consider the possibility of estimating the relative entropy DKL()D_{\mathrm{KL}}(\mathds{P}\|\mathds{Q}), not a relative entropy difference, for a given approximate committor function. The crux of the problem is to estimate the ratio of normalizing constants

νμ\frac{\nu}{\mu}

given a sample of transition paths from \mathds{P}. In Section 4.2, we explain how to estimate all other terms comprising the relative entropy. As above, we have

νμ\displaystyle\frac{\nu}{\mu} =[exp(0τLq~q~(Ys)ds)],\displaystyle=\mathds{P}\left[\exp\left(\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right)\right],

which suggests the estimator

1Nk=1Nexp(0τLq~q~(Ysk)ds),\frac{1}{N}\sum_{k=1}^{N}\exp\left(\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y^{k}_{s})\,\mathrm{d}s\right),

where Yki.i.d. for k=1,,NY^{k}\overset{\mathrm{i.i.d.}}{\sim}\mathds{P}\text{ for }k=1,\dots,N. We do not expect such an estimator to be much more robust than the general importance sampling estimator.

5. Numerical Methods

In this section, we address some obstacles to the practical implementation of the estimators proposed above. In Section 5.1, we explain how to construct families of committor functions that satisfy Assumption 4 and the boundary condition Lq~=0L\tilde{q}=0 on A\partial A that arises in Section 4.3. In Section 5.2, we explain how to handle the possibly singular integrand Lq~q~\frac{L\tilde{q}}{\tilde{q}} that appears in our change of measure formula (11).

5.1. Representing the Committor Function

We devise a practical means of representing approximate committor functions that satisfy Assumption 4 and the condition

Lq~=0 on AL\tilde{q}=0\text{ on }\partial A

appearing in Assumption 7. In our approach, one must first choose a computable function T:ΩT:\Omega\rightarrow\mathbb{R} with the following properties:

  • TT extends to a smooth function on an open neighborhood of Ω¯\overline{\Omega}.

  • T(x)n(x)>0\nabla T(x)\cdot n(x)>0 for all xAx\in\partial A.

  • T(x)=0T(x)=0 for all xAx\in\partial A.

  • T(x)>0T(x)>0 for all xΩAx\in\Omega\setminus\partial A.

For example, if A={xd;ξ(x)a}A=\{x\in\mathbb{R}^{d};\xi(x)\leq a\} were a sublevel set of some function ξ:d\xi:\mathbb{R}^{d}\rightarrow\mathbb{R}, one could try T=ξaT=\xi-a. Observe that all of the above conditions would hold at least locally in a neighborhood of A\partial A if |ξ(x)|\lvert\nabla\xi(x)\rvert were positive on A\partial A.

In our numerical experiments, the approximate committor takes the form

q~=Texp(w),\tilde{q}=T\exp(w),

where TT is as above and w:Ωw:\Omega\rightarrow\mathbb{R} belongs to some convenient family of smooth functions. Any q~\tilde{q} of this form satisfies Assumption 4. In particular, the normal derivative of q~\tilde{q} is positive over A\partial A, since

q~(x)n(x)\displaystyle\nabla\tilde{q}(x)\cdot n(x) ={T(x)exp(w(x))+w(x)T(x)exp(w(x))}n(x)\displaystyle=\big{\{}\nabla T(x)\exp(w(x))+\nabla w(x)T(x)\exp(w(x))\big{\}}\cdot n(x)
=T(x)n(x)exp(w(x))\displaystyle=\nabla T(x)\cdot n(x)\exp(w(x))
>0\displaystyle>0

for xAx\in\partial A. Moreover, for any TT meeting the conditions above, the exact committor qq can be written in the form q=Texp(w)q=T\exp(w) for w=log(q/T)w=\log\left(q/T\right), and this ww extends to a smooth function on an open neighborhood of Ω\Omega by Lemma 2.

The condition Lq~=0L\tilde{q}=0 on A\partial A appearing in Assumption 7 is equivalent with a Neumann boundary condition on ww. We have

Lq~={LT+2εwT+T(Lw+ε|w|2)}exp(w).L\tilde{q}=\left\{LT+2\varepsilon\nabla w\cdot\nabla T+T\left(Lw+\varepsilon\lvert\nabla w\rvert^{2}\right)\right\}\exp(w). (19)

Since T=0T=0 and Tn>0\nabla T\cdot n>0 on A\partial A, T=|T|n\nabla T=\lvert\nabla T\rvert n on A\partial A, and therefore (19) implies that Lq~=0L\tilde{q}=0 on A\partial A if and only if ww satisfies the Neumann boundary condition

w(x)n(x)=12εLT(x)|T(x)|\nabla w(x)\cdot n(x)=-\frac{1}{2\varepsilon}\frac{LT(x)}{\lvert\nabla T(x)\rvert} (20)

for xAx\in\partial A.

Depending on the geometry of A\partial A, it may be more or less difficult to impose the Neumann boundary condition. We propose a general approach that requires a smooth global retraction ρ:ΩA\rho:\Omega\rightarrow\partial A in addition to TT. We represent ww in the form

w(x)=w0(ρ(x))+T(x)w1(ρ(x))+T(x)2w2(x).w(x)=w_{0}(\rho(x))+T(x)w_{1}(\rho(x))+T(x)^{2}w_{2}(x). (21)

Here, w0:Aw_{0}:\partial A\rightarrow\mathbb{R} and w2:Ωw_{2}:\Omega\rightarrow\mathbb{R} are arbitrary. Given w0w_{0}, the condition Lq~=0L\tilde{q}=0 on A\partial A determines w1:Aw_{1}:\partial A\rightarrow\mathbb{R}. See Lemma 6 below for details.

Lemma 6.

Any function w:Ωw:\Omega\rightarrow\mathbb{R} that admits a smooth extension to an open neighborhood of Ω¯\overline{\Omega} may be expressed in the form (21) for some unique smooth w0w_{0}, w1w_{1}, and w2w_{2}. Moreover, the approximate committor function q~=Texp(w)\tilde{q}=T\exp(w) has Lq~=0L\tilde{q}=0 on A\partial A if and only if

w1(z)=w0t(z)Dρ(z)n(z)|T(z)|LT(z)|T(z)|2w_{1}(z)=-\frac{\nabla w_{0}^{t}(z)D\rho(z)n(z)}{\lvert\nabla T(z)\rvert}-\frac{LT(z)}{\lvert\nabla T(z)\rvert^{2}} (22)

for all zAz\in\partial A. In particular, the exact committor function qq can be expressed as q=Texp(w)q=T\exp(w) with ww of the form (21) for some smooth w0w_{0}, w1w_{1}, and w2w_{2}.

Proof.

See Appendix C. ∎

Observe that w0w_{0} determines the approximate reactive flux distribution corresponding to q~\tilde{q}. When ww takes the form (21), we have

q~(x)=T(x)exp(w0(x))\nabla\tilde{q}(x)=\nabla T(x)\exp(w_{0}(x))

for xAx\in\partial A, and therefore

|q~(x)|exp(U(x)/ε)dSA(x)=|T(x)|exp(w0(x))exp(U(x)/ε)dSA(x),\lvert\nabla\tilde{q}(x)\rvert\exp(-U(x)/\varepsilon)\,\mathrm{d}S_{A}(x)=\lvert\nabla T(x)\rvert\exp(w_{0}(x))\exp(U(x)/\varepsilon)\,\mathrm{d}S_{A}(x),

using that T=|T|n\nabla T=\lvert\nabla T\rvert n on A\partial A, as explained above.

5.2. Alternative Expression for the Improper Integral

Observe that Lq~q~\frac{L\tilde{q}}{\tilde{q}} must be singular on A\partial A unless Lq~=0L\tilde{q}=0 on A\partial A. Moreover, to compute Lq~L\tilde{q} requires computing Δq~\Delta\tilde{q}, which cannot be done efficiently by automatic differentiation when Ω\Omega is high-dimensional. Here, we derive a convenient alternative expression for the improper integral term

0τLq~q~(Ys)ds\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s

in the relative entropy (12). At the cost of significant additional complexity, we remove both the singularity and also the computationally undesirable dependence on Δq~\Delta\tilde{q}. Lemma 7 below is the crucial result.

Lemma 7.

Let S:𝒯S:\mathscr{T}\rightarrow\mathbb{R} be a function for which Assumption 4 holds. Let w=logq~Sw=\log\frac{\tilde{q}}{S}, so

q~=Sexp(w).\tilde{q}=S\exp(w).

We have

0τLq~q~(Yt)dt=w(Yτ)w(Y0)\displaystyle\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{t})\,\mathrm{d}t=w(Y_{\tau})-w(Y_{0}) +0τLSS(Yt)ϵ|w(Yt)|2dt\displaystyle+\int_{0}^{\tau}\frac{LS}{S}(Y_{t})-\epsilon|\nabla w(Y_{t})|^{2}\,\mathrm{d}t
0τ2ϵw(Yt)dWt,\displaystyle-\int_{0}^{\tau}\sqrt{2\epsilon}\nabla w(Y_{t})\cdot\mathrm{d}W_{t},

where WtW_{t} is the \mathds{P}-Brownian motion in Theorem 1.

We give the proof in Appendix B. Here, we have only replaced one possibly singular integrand, Lq~q~\frac{L\tilde{q}}{\tilde{q}}, by another, LSS\frac{LS}{S}, so perhaps it is not clear that this is a useful formula. Note, however, that for theoretical purposes, we may take S=qS=q, so that w=logr=log(q~/q)w=\log r=\log(\tilde{q}/q), LS=0LS=0 everywhere in 𝒯¯\bar{\mathscr{T}} including on A\partial A, and the LSS\frac{LS}{S} term vanishes. In that case, one recovers a formula for the change of measure that is similar to (9). We use this formula in the proofs of some results in Section 4.

For computational purposes, one can choose SS so that ΔS\Delta S is easy to compute explicitly without resorting to automatic differentiation, and then one need only compute q~\nabla\tilde{q}, not Δq~\Delta\tilde{q}, when evaluating 0τLq~q~(Yt)dt\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{t})\,\mathrm{d}t. Note that if q~\tilde{q} is a neural network, then Δq~\Delta\tilde{q} is most likely be intractable for high-dimensional Ω\Omega, but q~\nabla\tilde{q} can be computed efficiently by backpropagation. Moreover, is LS=0LS=0 on A\partial A, then the integrand LSS\frac{LS}{S} is no longer singular. We explain in Section 5.1 above how to construct such an SS. One possibility is

S(x)=T(x)exp(T(x)LT(ρ(x))|T(x)|2),S(x)=T(x)\exp\left(-\frac{T(x)LT(\rho(x))}{\lvert\nabla T(x)\rvert^{2}}\right),

corresponding to taking w0w_{0} and w2w_{2} to be zero in (21). Here, TT and ρ\rho are as in Section 5.1. Thus, using Lemma 7, for a judicious choice of SS, one can eliminate both the singularity and the computationally intractable Laplacian of q~\tilde{q}. We grant, however, that for the particular SS defined above, it may still be difficult to compute LSLS efficiently in practice, especially because the generator LL depends on U\nabla U.

Remark 6.

We warn the reader that since WtW_{t} depends on q~\tilde{q}, one has to be rather careful in attempts to use this formula to simplify the calculation of derivatives of the relative entropy.

Appendix A Proofs of Results in Section 3

Proof of Theorem 1.

Since we assume the Novikov condition, by Lemma 5,

Mt=|q(Y0)||q~(Y0)|q~(Yt)q(Yt)exp(0tLq~q~(Ys)ds)M_{t}=\frac{\lvert\nabla q(Y_{0})\rvert}{\lvert\nabla\tilde{q}(Y_{0})\rvert}\frac{\tilde{q}(Y_{t})}{q(Y_{t})}\exp\left(-\int_{0}^{t}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right)

is an t\mathcal{F}_{t}-martingale under \mathds{Q} with [Mt]=1\mathds{Q}[M_{t}]=1 for t0t\geq 0. It follows that

Zt\displaystyle Z_{t} =μ1m(Y0)ν1|q(Y0)|exp(U(Y0)/ε)Mt\displaystyle=\frac{\mu^{-1}m(Y_{0})}{\nu^{-1}\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}M_{t}

is a martingale with [Zt]=1\mathds{Q}[Z_{t}]=1: For 0st0\leq s\leq t, we have

[Zt|s]\displaystyle\mathds{Q}[Z_{t}|\mathcal{F}_{s}] =[μ1m(Y0)ν1|q(Y0)|exp(U(Y0)/ε)Mt|s]\displaystyle=\mathds{Q}\left[\left.\frac{\mu^{-1}m(Y_{0})}{\nu^{-1}\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}M_{t}\right|\mathcal{F}_{s}\right]
=μ1m(Y0)ν1|q(Y0)|exp(U(Y0)/ε)[Mt|s]\displaystyle=\frac{\mu^{-1}m(Y_{0})}{\nu^{-1}\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}\mathds{Q}[M_{t}|\mathcal{F}_{s}]
=μ1m(Y0)ν1|q(Y0)|exp(U(Y0)/ε)Ms.\displaystyle=\frac{\mu^{-1}m(Y_{0})}{\nu^{-1}\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}M_{s}.
=Zs,\displaystyle=Z_{s},

Moreover,

[Zt]\displaystyle\mathds{Q}[Z_{t}] =[[μ1m(Y0)ν1|q(Y0)|exp(U(Y0)/ε)Mt|0]]\displaystyle=\mathds{Q}\left[\mathds{Q}\left[\left.\frac{\mu^{-1}m(Y_{0})}{\nu^{-1}\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}M_{t}\right|\mathcal{F}_{0}\right]\right]
=[μ1m(Y0)ν1|q(Y0)|exp(U(Y0)/ε)[Mt|0]]\displaystyle=\mathds{Q}\left[\frac{\mu^{-1}m(Y_{0})}{\nu^{-1}\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}\mathds{Q}[M_{t}|\mathcal{F}_{0}]\right]
=[μ1m(Y0)ν1|q(Y0)|exp(U(Y0)/ε)M0]\displaystyle=\mathds{Q}\left[\frac{\mu^{-1}m(Y_{0})}{\nu^{-1}\lvert\nabla q(Y_{0})\rvert\exp(-U(Y_{0})/\varepsilon)}M_{0}\right]
=1,\displaystyle=1,

since M0=1M_{0}=1 and Y0ν1|q(x)|exp(U(x)/ε)Y_{0}\sim\nu^{-1}\lvert\nabla q(x)\rvert\exp(-U(x)/\varepsilon) under \mathds{Q}.

We now show that ZτZ_{\tau} is the density of \mathds{P} with respect to \mathds{Q} restricted to the stopping time σ\sigma-algebra τ\mathcal{F}_{\tau}. Since ZtZ_{t} is a martingale, the optional stopping theorem [14, Chapter 2, Theorem 3.2] implies that for any fixed t>0t>0,

Ztτ=[Zt|tτ].Z_{t\wedge\tau}=\mathds{Q}[Z_{t}|\mathcal{F}_{t\wedge\tau}].

(Here, tτt\wedge\tau denotes the minimium of tt and τ\tau.) Let AτA\in\mathcal{F}_{\tau}. Since [τ<]=1\mathds{P}[\tau<\infty]=1, we have

[A]=limN[A{τN}].\mathds{P}[A]=\lim_{N\rightarrow\infty}\mathds{P}[A\cap\{\tau\leq N\}].

Now each of the sets

AN:=A{τ<N}A_{N}:=A\cap\{\tau<N\}

is in τNN\mathcal{F}_{\tau\wedge N}\subset\mathcal{F}_{N}, since AτA\in\mathcal{F}_{\tau}. (Here, τN\tau\wedge N means the minimum of τ\tau and NN.) Therefore,

[A]\displaystyle\mathds{P}[A] =limN[AN]\displaystyle=\lim_{N\rightarrow\infty}\mathds{P}[A_{N}]
(since [τ<]=1\mathds{P}[\tau<\infty]=1)
=limNAN(dω)\displaystyle=\lim_{N\rightarrow\infty}\int_{A_{N}}\mathds{P}(\mathrm{d}\omega)
=limNANZN(ω)(dω)\displaystyle=\lim_{N\rightarrow\infty}\int_{A_{N}}Z_{N}(\omega)\mathds{Q}(\mathrm{d}\omega)
(since ANNA_{N}\in\mathcal{F}_{N} and ZN=dd|NZ_{N}=\left.\frac{\mathrm{d}\mathds{P}}{\mathrm{d}\mathds{Q}}\right\rvert_{\mathcal{F}_{N}})
=limNAN[ZN(ω)|τ(ω)N](dω)\displaystyle=\lim_{N\rightarrow\infty}\int_{A_{N}}\mathds{Q}[Z_{N}(\omega)|\mathcal{F}_{\tau(\omega)\wedge N}]\mathds{Q}(\mathrm{d}\omega)
(since ANτNA_{N}\in\mathcal{F}_{\tau\wedge N})
=limNANZτ(ω)N(ω)(dω)\displaystyle=\lim_{N\rightarrow\infty}\int_{A_{N}}Z_{\tau(\omega)\wedge N}(\omega)\mathds{Q}(\mathrm{d}\omega)
(by optional stopping, as discussed above)
=limNANZτ(ω)(ω)(dω)\displaystyle=\lim_{N\rightarrow\infty}\int_{A_{N}}Z_{\tau(\omega)}(\omega)\mathds{Q}(\mathrm{d}\omega)
(since for ωAN\omega\in A_{N}, τ(ω)N=τ(ω)\tau(\omega)\wedge N=\tau(\omega))
=AZτ(ω)(ω)(dω),\displaystyle=\int_{A}Z_{\tau(\omega)}(\omega)\mathds{Q}(\mathrm{d}\omega),
(since [τ<]=1\mathds{Q}[\tau<\infty]=1)

and so ZτZ_{\tau} is the density of \mathds{P} with respect to \mathds{Q} on τ\mathcal{F}_{\tau}. ∎

Appendix B Proofs of Results in Section 4.1

First, we prove Lemma 7 which provides an alternative expression for the improper integral in our change of measure formula.

Proof of Lemma 7.

By Lemma 2, ww is smooth. Therefore, by Ito’s formula, under \mathds{P}, we have

dw(Yt)=Lw(Yt)dt+2ϵlogq~(Yt)w(Yt)dt+2ϵw(Yt)dWt,dw(Y_{t})=Lw(Y_{t})\,\mathrm{d}t+2\epsilon\nabla\log\tilde{q}(Y_{t})\cdot\nabla w(Y_{t})\,\mathrm{d}t+\sqrt{2\epsilon}\nabla w(Y_{t})\cdot\mathrm{d}W_{t},\\ (23)

where WtW_{t} is the \mathds{P}-Brownian motion defined in Theorem 1 Now let

v~:=2ϵlog(q~) and v:=2ϵT.\tilde{v}:=-2\epsilon\log(\tilde{q})\text{ and }v:=-2\epsilon T.

By a calculation similar to the proof of Lemma 4, we have

Lq~q~=12ϵ(Lv~12|v~|2) and LTT=12ϵ(Lv12|v|2).\frac{L\tilde{q}}{\tilde{q}}=-\frac{1}{2\epsilon}\left(L\tilde{v}-\frac{1}{2}|\nabla\tilde{v}|^{2}\right)\text{ and }\frac{LT}{T}=-\frac{1}{2\epsilon}\left(Lv-\frac{1}{2}|\nabla v|^{2}\right).

Also, w=12ϵ(v~v)w=-\frac{1}{2\epsilon}(\tilde{v}-v). Plugging these expressions into  (23), we get

dw(Yt)\displaystyle\mathrm{d}w(Y_{t}) =(12ϵLv~(Yt)+12ϵLv(Yt)+12ϵv~(Yt)(v~v)(Yt))dt\displaystyle=\left(-\frac{1}{2\epsilon}L\tilde{v}(Y_{t})+\frac{1}{2\epsilon}Lv(Y_{t})+\frac{1}{2\epsilon}\nabla\tilde{v}(Y_{t})\cdot\nabla(\tilde{v}-v)(Y_{t})\right)\,\mathrm{d}t
+2ϵw(Yt)dBt\displaystyle\qquad+\sqrt{2\epsilon}\nabla w(Y_{t})\cdot\mathrm{d}B_{t}
=12ϵ(Lv~(Yt)12|v~(Yt)|2)dt\displaystyle=-\frac{1}{2\epsilon}\left(L\tilde{v}(Y_{t})-\frac{1}{2}|\nabla\tilde{v}(Y_{t})|^{2}\right)\mathrm{d}t
+12ϵ(Lv(Yt)12|v(Yt)|2)dt\displaystyle\qquad+\frac{1}{2\epsilon}\left(Lv(Y_{t})-\frac{1}{2}|\nabla v(Y_{t})|^{2}\right)\mathrm{d}t
+12ϵ(12|v~(X2)|2+12|v(X2)|2v~(Yt)v(Yt))dt\displaystyle\qquad+\frac{1}{2\epsilon}\left(\frac{1}{2}|\nabla\tilde{v}(X_{2})|^{2}+\frac{1}{2}|\nabla v(X_{2})|^{2}-\nabla\tilde{v}(Y_{t})\cdot\nabla v(Y_{t})\right)\mathrm{d}t
+2ϵw(Yt)dBt\displaystyle\qquad+\sqrt{2\epsilon}\nabla w(Y_{t})\cdot\mathrm{d}B_{t}
=(Lq~q~(Yt)LTT(Yt))dt\displaystyle=\left(\frac{L\tilde{q}}{\tilde{q}}(Y_{t})-\frac{LT}{T}(Y_{t})\right)\mathrm{d}t
+ϵ|w(Yt)|2dt+2ϵw(Yt)dWt,\displaystyle\qquad+\epsilon|\nabla w(Y_{t})|^{2}\mathrm{d}t+\sqrt{2\epsilon}\nabla w(Y_{t})\cdot\mathrm{d}W_{t},

and the result follows. ∎

We use Lemma 7 to show that our estimator of relative entropy differences has finite variance, at least under certain conditions on q~\tilde{q}.

Lemma 8.

Assume that [τ2]<\mathds{P}[\tau^{2}]<\infty, that for some C>0C>0,

|log(q~(x)q(x))|C and |log(q~(x)q(x))|C\left\lvert\log\left(\frac{\tilde{q}(x)}{q(x)}\right)\right\rvert\leq C\text{ and }\left\lvert\nabla\log\left(\frac{\tilde{q}(x)}{q(x)}\right)\right\rvert\leq C

for all x𝒯¯x\in\bar{\mathscr{T}}, and that for some M>0M>0,

1M|m(x)|M and 1M|q~(x)|exp(U(x)/ε)M\frac{1}{M}\leq\lvert m(x)\rvert\leq M\text{ and }\frac{1}{M}\leq\lvert\nabla\tilde{q}(x)\rvert\exp(-U(x)/\varepsilon)\leq M

for all xAx\in\partial A. If so,

var(log(m(Y~01)|q~(Y~01)|exp(U(Y~01)/ε))+logq~(Y~τ1)0τLq~q~(Y~s1)ds)<,\operatorname{var}_{\mathds{P}}\left(\log\left(\frac{m(\tilde{Y}^{1}_{0})}{\lvert\nabla\tilde{q}(\tilde{Y}^{1}_{0})\rvert\exp(-U(\tilde{Y}^{1}_{0})/\varepsilon)}\right)+\log\tilde{q}(\tilde{Y}^{1}_{\tau})-\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(\tilde{Y}^{1}_{s})\,\mathrm{d}s\right)<\infty,

so limN~I~N~=I~\lim_{\tilde{N}\rightarrow\infty}\tilde{I}_{\tilde{N}}=\tilde{I} in L2()L^{2}(\mathds{P}).

Proof.

The crux of the proof is to show that

[(0τLq~q~(Ys)ds)2]<\mathds{P}\left[\left(\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right)^{2}\right]<\infty

even though Lq~q~\frac{L\tilde{q}}{\tilde{q}} may be singular on A\partial A. Taking T=qT=q in Lemma 7 yields

0τLq~q~(Ys)ds\displaystyle\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s =logr(Yτ)r(Y0)ε0τ|logr(Ys)|2ds2ε0τlogr(Ys)dWs\displaystyle=\log\frac{r(Y_{\tau})}{r(Y_{0})}-\varepsilon\int_{0}^{\tau}\lvert\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s-\sqrt{2\varepsilon}\int_{0}^{\tau}\nabla\log r(Y_{s})\cdot\mathrm{d}W_{s}

where r=q~/qr=\tilde{q}/q and WsW_{s} is the \mathds{P}-Brownian motion in Theorem 1. Therefore,

[(0τLq~q~(Ys)ds)2]12\displaystyle\mathds{P}\left[\left(\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(Y_{s})\,\mathrm{d}s\right)^{2}\right]^{\frac{1}{2}} [(logr(Yτ)r(Y0))2]12+ε[(0τ|logr(Ys)|2ds)2]12\displaystyle\leq\mathds{P}\left[\left(\log\frac{r(Y_{\tau})}{r(Y_{0})}\right)^{2}\right]^{\frac{1}{2}}+\varepsilon\mathds{P}\left[\left(\int_{0}^{\tau}\lvert\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s\right)^{2}\right]^{\frac{1}{2}}
+2ε[(0τlogr(Ys)dWs)2]12\displaystyle\qquad\qquad\qquad+\sqrt{2\varepsilon}\mathds{P}\left[\left(\int_{0}^{\tau}\nabla\log r(Y_{s})\cdot\mathrm{d}W_{s}\right)^{2}\right]^{\frac{1}{2}}
2C+εC[τ2]+2ε[(0τlogr(Ys)dWs)2]12.\displaystyle\leq 2C+\varepsilon C\mathds{P}[\tau^{2}]+\sqrt{2\varepsilon}\mathds{P}\left[\left(\int_{0}^{\tau}\nabla\log r(Y_{s})\cdot\mathrm{d}W_{s}\right)^{2}\right]^{\frac{1}{2}}. (24)

We bound the third term in (24) using the Ito isometry, which entails some minor but tiresome technical difficulties, since standard forms of the isometry only apply to integrals up to finite, deterministic times. Define

St=0tlogr(Ys)dWs.S_{t}=\int_{0}^{t}\nabla\log r(Y_{s})\cdot\mathrm{d}W_{s}.

We will approximate SτS_{\tau} by SτnS_{\tau\wedge n} for nn\in\mathbb{N} to apply the Ito isometry; we claim that SτnS_{\tau\wedge n} is a Cauchy sequence in L2()L^{2}(\mathds{P}). To see this, let n<nn<n^{\prime}\in\mathbb{N}, and observe that

[(SτnSτn)2]\displaystyle\mathds{P}[(S_{\tau\wedge n^{\prime}}-S_{\tau\wedge n})^{2}] =[(τnτnlogr(Ys)dWs)2]\displaystyle=\mathds{P}\left[\left(\int_{\tau\wedge n}^{\tau\wedge n^{\prime}}\nabla\log r(Y_{s})\cdot\mathrm{d}W_{s}\right)^{2}\right]
=[(0n𝟙τns<τnlogr(Ys)dWs)2]\displaystyle=\mathds{P}\left[\left(\int_{0}^{n^{\prime}}\mathds{1}_{\tau\wedge n\leq s<\tau\wedge n^{\prime}}\nabla\log r(Y_{s})\cdot\mathrm{d}W_{s}\right)^{2}\right]
=[0n𝟙τns<τn|logr(Ys)|2ds]\displaystyle=\mathds{P}\left[\int_{0}^{n^{\prime}}\mathds{1}_{\tau\wedge n\leq s<\tau\wedge n^{\prime}}\lvert\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s\right]
[0n𝟙τn|logr(Ys)|2ds]\displaystyle\leq\mathds{P}\left[\int_{0}^{n^{\prime}}\mathds{1}_{\tau\geq n}\lvert\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s\right]
C2[τn].\displaystyle\leq C^{2}\mathds{P}[\tau\geq n].

The third equality above follows from the Ito isometry, since τ\tau is a stopping time, hence 𝟙τns<τn\mathds{1}_{\tau\wedge n\leq s<\tau\wedge n^{\prime}} is adapted. The first inequality holds, since τ<n\tau<n implies 𝟙τns<τn=0\mathds{1}_{\tau\wedge n\leq s<\tau\wedge n^{\prime}}=0. We assume that [τ<]=1\mathds{P}[\tau<\infty]=1, so limn[τn]=0\lim_{n\rightarrow\infty}\mathds{P}[\tau\geq n]=0, hence SτnS_{\tau\wedge n} is Cauchy. In fact, we have limnSτn=Sτ\lim_{n\rightarrow\infty}S_{\tau\wedge n}=S_{\tau} in L2()L^{2}(\mathds{P}), since [τ<]=1\mathds{P}[\tau<\infty]=1 implies that SτnS_{\tau\wedge n} converges to SτS_{\tau} almost surely. We conclude that

[Sτ2]\displaystyle\mathds{P}[S_{\tau}^{2}] =limn[Sτn2]\displaystyle=\lim_{n\rightarrow\infty}\mathds{P}[S_{\tau\wedge n}^{2}]
=limn[(0n𝟙sτnlogr(Ys)dWs)2]\displaystyle=\lim_{n\rightarrow\infty}\mathds{P}\left[\left(\int_{0}^{n}\mathds{1}_{s\leq\tau\wedge n}\nabla\log r(Y_{s})\cdot\mathrm{d}W_{s}\right)^{2}\right]
=limn[0n𝟙sτn|logr(Ys)|2ds]\displaystyle=\lim_{n\rightarrow\infty}\mathds{P}\left[\int_{0}^{n}\mathds{1}_{s\leq\tau\wedge n}\lvert\nabla\log r(Y_{s})\rvert^{2}\,\mathrm{d}s\right]
limnC2[τn]\displaystyle\leq\lim_{n\rightarrow\infty}C^{2}\mathds{P}[\tau\wedge n]
=C2[τ].\displaystyle=C^{2}\mathds{P}[\tau].

It follows that

var(log(m(Y~01)|q~(Y~01)|exp(U(Y~01)/ε))+logq~(Y~τ1)0τLq~q~(Y~s1)ds)<.\operatorname{var}_{\mathds{P}}\left(\log\left(\frac{m(\tilde{Y}^{1}_{0})}{\lvert\nabla\tilde{q}(\tilde{Y}^{1}_{0})\rvert\exp(-U(\tilde{Y}^{1}_{0})/\varepsilon)}\right)+\log\tilde{q}(\tilde{Y}^{1}_{\tau})-\int_{0}^{\tau}\frac{L\tilde{q}}{\tilde{q}}(\tilde{Y}^{1}_{s})\,\mathrm{d}s\right)<\infty.

We leave the remaining details to the reader. ∎

We now prove Theorem 2 on the differentiability of DKL(θ)D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}) in θ\theta.

Proof of Theorem 2.

Since we assume that Lqθqθ\frac{Lq_{\theta}}{q_{\theta}} and θLqθqθ\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}} are bounded, for each ωΩ\omega\in\Omega,

θ0τ(ω)Lqθqθ(Xs(ω))ds=0τ(ω)θLqθqθ(Xs(ω))ds.\nabla_{\theta}\int_{0}^{\tau(\omega)}\frac{Lq_{\theta}}{q_{\theta}}(X_{s}(\omega))\,\mathrm{d}s=\int_{0}^{\tau(\omega)}\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(X_{s}(\omega))\,\mathrm{d}s.

Moreover, since we assume μθ1M>0\mu_{\theta}\geq\frac{1}{M}>0, and since qθq_{\theta} is differentiable, logμθ\log\mu_{\theta} is differentiable. Therefore,

θdθd\displaystyle\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}} =dθd(θqθ(Xτ)qθ(Xτ)θμθμθ0τθLqθqθ(Xs)ds).\displaystyle=\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\left(\frac{\nabla_{\theta}q_{\theta}(X_{\tau})}{q_{\theta}(X_{\tau})}-\frac{\nabla_{\theta}\mu_{\theta}}{\mu_{\theta}}-\int_{0}^{\tau}\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(X_{s})\,\mathrm{d}s\right). (25)

One now wants to interchange differentiation with respect to θ\theta and expectation over \mathds{Q} to derive a formula for θDKL(θ)\nabla_{\theta}D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}). Write

g(x)=xlogxg(x)=x\log x

so that

DKL(θ)=[g(dθd)].D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q})=\mathds{Q}\left[g\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\right].

We claim that

θDKL(θ)=θ[g(dθd)]=[g(dθd)θdθd].\nabla_{\theta}D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q})=\nabla_{\theta}\mathds{Q}\left[g\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\right]=\mathds{Q}\left[g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right]. (26)

Assumption 7 does not imply that θdθd\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}} is bounded uniformly in θ\theta by a function in L1()L^{1}(\mathds{Q}), so textbook results related to differentiating under the integral do not apply. To verify (26), we return to the definition of the derivative. Given any h,θkh,\theta\in\mathbb{R}^{k}, by the mean value theorem, for some s:Ω[0,1]s:\Omega\rightarrow[0,1], we have

1h\displaystyle\frac{1}{\lVert h\rVert}\mathds{Q} [g(dθ+hd)g(dθd)g(dθd)θdθdh]\displaystyle\left[g\left(\frac{\mathrm{d}\mathds{P}_{\theta+h}}{\mathrm{d}\mathds{Q}}\right)-g\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)-g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\cdot h\right]
=[g(dθ+shd)θdθ+shdhhg(dθd)θdθdhh]\displaystyle=\mathds{Q}\left[g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\right)\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\cdot\frac{h}{\lVert h\rVert}-g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\cdot\frac{h}{\lVert h\rVert}\right]
=[{g(dθ+shd)g(dθ+hd)}θdθdhh\displaystyle=\mathds{Q}\left[\left\{g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\right)-g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta+h}}{\mathrm{d}\mathds{Q}}\right)\right\}\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\cdot\frac{h}{\lVert h\rVert}\right.
+g(dθ+shd){θdθ+shdθdθd}hh]\displaystyle\qquad\qquad+\left.g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\right)\left\{\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}-\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right\}\cdot\frac{h}{\lVert h\rVert}\right]
=θ[{log(dθ+shd)log(dθd)}θlogdθdhh\displaystyle=\mathds{P}_{\theta}\left[\left\{\log\left(\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\right)-\log\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\right\}\nabla_{\theta}\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\cdot\frac{h}{\lVert h\rVert}\right.
+{log(dθ+shd)+1}θlogdθ+shdhhdθ+shdddθ\displaystyle\qquad\qquad+\left\{\log\left(\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\right)+1\right\}\nabla_{\theta}\log\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\cdot\frac{h}{\lVert h\rVert}\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\frac{\mathrm{d}\mathds{Q}}{\mathrm{d}\mathds{P}_{\theta}}
{log(dθ+shd)+1}θlogdθdhh]\displaystyle\qquad\qquad-\left.\left\{\log\left(\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\right)+1\right\}\nabla_{\theta}\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\cdot\frac{h}{\lVert h\rVert}\right]
=:θ[R(θ;h)].\displaystyle=:\mathds{P}_{\theta}[R(\theta;h)].

As h0h\rightarrow 0, the integrand R(θ;h)R(\theta;h) in the expectation with respect to θ\mathds{P}_{\theta} above converges pointwise to zero. We now show that R(θ;h)R(\theta;h) is dominated in L1(θ)L^{1}(\mathds{P}_{\theta}), uniformly in hh for sufficiently small hh. If so, limh0θ[R(θ;h)]=0\lim_{h\rightarrow 0}\mathds{P}_{\theta}[R(\theta;h)]=0, which verifies differentiability of DKL(θ)D_{\mathrm{KL}}(\mathds{P}_{\theta}\|\mathds{Q}) and (26). By Assumption 7, we have

|dθ+shdddθ|\displaystyle\left\lvert\frac{\mathrm{d}\mathds{P}_{\theta+sh}}{\mathrm{d}\mathds{Q}}\frac{\mathrm{d}\mathds{Q}}{\mathrm{d}\mathds{P}_{\theta}}\right\rvert μθμθ+shexp(0τ|Lqθqθ(Yt)Lqθ+shqθ+sh(Yt)|dt)\displaystyle\leq\frac{\mu_{\theta}}{\mu_{\theta+sh}}\exp\left(\int_{0}^{\tau}\left\lvert\frac{Lq_{\theta}}{q_{\theta}}(Y_{t})-\frac{Lq_{\theta+sh}}{q_{\theta+sh}}(Y_{t})\right\rvert\,\mathrm{d}t\right)
M2exp(C|h|τ),\displaystyle\leq M^{2}\exp(C\lvert h\rvert\tau),
|log(dθd)|\displaystyle\left\lvert\log\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\right\rvert |log(μθν)|+0τ|Lqθqθ(Yt)|dt\displaystyle\leq\left\lvert\log\left(\frac{\mu_{\theta}}{\nu}\right)\right\rvert+\int_{0}^{\tau}\left\lvert\frac{Lq_{\theta}}{q_{\theta}}(Y_{t})\right\rvert\,\mathrm{d}t
|logν|+|logM|+Cτ,\displaystyle\leq\lvert\log\nu\rvert+\lvert\log M\rvert+C\tau,

and

|θlogdθd|\displaystyle\left\lvert\nabla_{\theta}\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right\rvert |θmθmθ|+0τ|θLqθqθ(Yt)|dt\displaystyle\leq\left\lvert\frac{\nabla_{\theta}m_{\theta}}{m_{\theta}}\right\rvert+\int_{0}^{\tau}\left\lvert\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(Y_{t})\right\rvert\,\mathrm{d}t
M2+Cτ.\displaystyle\leq M^{2}+C\tau.

Therefore, when |h|γ(θ)2C\lvert h\rvert\leq\frac{\gamma(\theta)}{2C},

|R(θ;h)|\displaystyle\lvert R(\theta;h)\rvert 2(|logν|+|logM|+Cτ)(M2+Cτ)\displaystyle\leq 2(\lvert\log\nu\rvert+\lvert\log M\rvert+C\tau)(M^{2}+C\tau)
+(1+|logν|+|logM|+Cτ)(M2+Cτ)\displaystyle\qquad+(1+\lvert\log\nu\rvert+\lvert\log M\rvert+C\tau)(M^{2}+C\tau)
+(1+|logν|+|logM|+Cτ)(M2+Cτ)M2exp(γ(θ)2τ).\displaystyle\qquad+(1+\lvert\log\nu\rvert+\lvert\log M\rvert+C\tau)(M^{2}+C\tau)M^{2}\exp\left(\frac{\gamma(\theta)}{2}\tau\right).

Under Assumption 7, the random variable on the right-hand-side above is in L1(θ)L^{1}(\mathds{P}_{\theta}), which concludes the proof of differentiability.

We now derive a more convenient formula for the derivative. As a first step, we observe that one can again exchange expectation and differentiation to obtain

0\displaystyle 0 =θθ[Ω]\displaystyle=\nabla_{\theta}\mathds{P}_{\theta}[\Omega]
=θ[dθd]\displaystyle=\nabla_{\theta}\mathds{Q}\left[\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right]
=[θdθd]\displaystyle=\mathds{Q}\left[\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right]
=θ[θlogdθd].\displaystyle=\mathds{P}_{\theta}\left[\nabla_{\theta}\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right]. (27)

Now we have

θDKL(θ|)\displaystyle\nabla_{\theta}D_{\mathrm{KL}}(\mathds{P}_{\theta}|\mathds{Q}) =[g(dθd)θdθd]\displaystyle=\mathds{Q}\left[g^{\prime}\left(\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right]
=[(logdθd+1)θdθd]\displaystyle=\mathds{Q}\left[\left(\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}+1\right)\nabla_{\theta}\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right]
=θ[(logdθd+1)θlogdθd]\displaystyle=\mathds{P}_{\theta}\left[\left(\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}+1\right)\nabla_{\theta}\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right]
=covθ(logdθd,θlogdθd)\displaystyle=\operatorname{cov}_{\mathds{P}_{\theta}}\left(\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}},\nabla_{\theta}\log\frac{\mathrm{d}\mathds{P}_{\theta}}{\mathrm{d}\mathds{Q}}\right)
=covθ(logνμθ+logqθ(Xτ)0τLqθqθ(Xs)ds,\displaystyle=\operatorname{cov}_{\mathds{P}_{\theta}}\left(\log\frac{\nu}{\mu_{\theta}}+\log q_{\theta}(X_{\tau})-\int_{0}^{\tau}\frac{Lq_{\theta}}{q_{\theta}}(X_{s})\,\mathrm{d}s,\right.
θqθ(Xτ)qθ(Xτ)θμθμθ0τθLqθqθ(Xs)ds)\displaystyle\qquad\qquad\quad\left.\frac{\nabla_{\theta}q_{\theta}(X_{\tau})}{q_{\theta}(X_{\tau})}-\frac{\nabla_{\theta}\mu_{\theta}}{\mu_{\theta}}-\int_{0}^{\tau}\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(X_{s})\,\mathrm{d}s\right)
=covθ(logqθ(Xτ)0τLqθqθ(Xs)ds,\displaystyle=\operatorname{cov}_{\mathds{P}_{\theta}}\left(\log q_{\theta}(X_{\tau})-\int_{0}^{\tau}\frac{Lq_{\theta}}{q_{\theta}}(X_{s})\,\mathrm{d}s,\right.
θqθ(Xτ)qθ(Xτ)0τθLqθqθ(Xs)ds)\displaystyle\qquad\qquad\quad\left.\frac{\nabla_{\theta}q_{\theta}(X_{\tau})}{q_{\theta}(X_{\tau})}-\int_{0}^{\tau}\nabla_{\theta}\frac{Lq_{\theta}}{q_{\theta}}(X_{s})\,\mathrm{d}s\right)

The fourth equality above follows from (27), and the last follows since logνμθ\log\frac{\nu}{\mu_{\theta}} and θμθμθ\frac{\nabla_{\theta}\mu_{\theta}}{\mu_{\theta}} are constants that do not depend on ω\omega. ∎

Appendix C Proofs of Results in Section 5

Proof of Lemma 6.

For any xΩx\in\Omega, we will write x¯\bar{x} for ρ(x)\rho(x) to simplify notation. Let w:Ωw:\Omega\rightarrow\mathbb{R} admit a twice continuously differentiable extension to an open neighborhood of Ω¯\overline{\Omega}. Suppose that for some twice continuously differentiable w0:Aw_{0}:\partial A\rightarrow\mathbb{R}, w1:Aw_{1}:\partial A\rightarrow\mathbb{R}, and w2:Ωw_{2}:\Omega\rightarrow\mathbb{R}, we have

w(x)=w0(x¯)+T(x)w1(x¯)+T(x)2w2(x)w(x)=w_{0}(\bar{x})+T(x)w_{1}(\bar{x})+T(x)^{2}w_{2}(x) (28)

for all xΩx\in\Omega. We will show that there can exist at most one triple of functions w0w_{0}, w1w_{1}, and w2w_{2} solving (28). First, observe that if (28) holds, then

w0(z)=w(z)w_{0}(z)=w(z) (29)

for all zAz\in\partial A, since T=0T=0 on A\partial A. Thus, given ww, equation (28) uniquely determines w0w_{0}. To see that (28) also determines w1w_{1}, observe that

w(x)t\displaystyle\nabla w(x)^{t} =w0(x)tDρ(x)+w1(x¯)T(x)t\displaystyle=\nabla w_{0}(x)^{t}D\rho(x)+w_{1}(\bar{x})\nabla T(x)^{t}
+T(x){w1(x)tDρ(x)+2w2(x)T(x)t}\displaystyle\qquad+T(x)\Big{\{}\nabla w_{1}(x)^{t}D\rho(x)+2w_{2}(x)\nabla T(x)^{t}\Big{\}}
+T(x)2w2(x)t.\displaystyle\qquad+T(x)^{2}\nabla w_{2}(x)^{t}.

Thus, for zAz\in\partial A,

w(z)n(z)=w0(z)tDρ(z)n(z)+w1(z)T(z)n(z),\nabla w(z)\cdot n(z)=\nabla w_{0}(z)^{t}D\rho(z)n(z)+w_{1}(z)\nabla T(z)\cdot n(z), (30)

hence

w1(z)=w(z)n(z)w0(z)tDρ(z)n(z)T(z)n(z),w_{1}(z)=\frac{\nabla w(z)\cdot n(z)-\nabla w_{0}(z)^{t}D\rho(z)n(z)}{\nabla T(z)\cdot n(z)}, (31)

since we assume T(z)n(z)>0\nabla T(z)\cdot n(z)>0. Finally, we have T(x)>0T(x)>0 for xΩAx\in\Omega\setminus\partial A, so

w2(x)=w(x)w0(x¯)T(x)w1(x¯)T(x)2.w_{2}(x)=\frac{w(x)-w_{0}(\bar{x})-T(x)w_{1}(\bar{x})}{T(x)^{2}}. (32)

Thus, equation (28) determines w2w_{2} over ΩA\Omega\setminus\partial A, and therefore also on A\partial A, since we assume w2w_{2} is continuous.

Now observe that for any given ww, equations (29), (31), and (32) define smooth w0w_{0}, w1w_{1}, and w2w_{2} so that (21) holds. To see this, one need only show that w2w_{2} defined by (32) is smooth even though TT is zero on A\partial A. For

v(x)=T(x)2w2(x)=w(x)w0(x¯)T(x)w1(x¯),v(x)=T(x)^{2}w_{2}(x)=w(x)-w_{0}(\bar{x})-T(x)w_{1}(\bar{x}),

we have

v(z)n(z)=0\nabla v(z)\cdot n(z)=0

for zAz\in\partial A. Therefore, by an argument similar to the proof of Lemma 2 but carrying the Taylor expansions to higher order,

w2(x)\displaystyle w_{2}(x) =v(x)T(x)2\displaystyle=\frac{v(x)}{T(x)^{2}}

is smooth. We conclude that every smooth ww admits a unique expression in the form (28).

Finally, we show that Lq~=0L\tilde{q}=0 on A\partial A if and only if

w1=w0tDρn|T|LT|T|2.w_{1}=-\frac{\nabla w_{0}^{t}D\rho n}{\lvert\nabla T\rvert}-\frac{LT}{\lvert\nabla T\rvert^{2}}. (33)

To see this, observe that for q~=Texp(w)\tilde{q}=T\exp(w),

Lq~=exp(w){LT+2εwT+T(Lw+ε|w|2)}.L\tilde{q}=\exp(w)\Big{\{}LT+2\varepsilon\nabla w\cdot\nabla T+T(Lw+\varepsilon\lvert\nabla w\rvert^{2})\Big{\}}.

Therefore, since T=0T=0 on A\partial A, Lq~=0L\tilde{q}=0 on A\partial A if and only if

LT+2εwT=0LT+2\varepsilon\nabla w\cdot\nabla T=0 (34)

on A\partial A. Since T=0T=0 and Tn>0\nabla T\cdot n>0 on A\partial A, T=|T|n\nabla T=\lvert\nabla T\rvert n on A\partial A. Thus, (34) reduces to the Neumann boundary condition

wn=12εLT|T|\nabla w\cdot n=-\frac{1}{2\varepsilon}\frac{LT}{\lvert\nabla T\rvert} (35)

on A\partial A. Equation (33) follows from this Neumann boundary condition and (30). ∎

References

  • [1] C. H. Bennett. Efficient estimation of free energy differences from Monte Carlo data. Journal of Computational Physics, 22(2):245–268, Oct. 1976.
  • [2] F. Cerou, A. Guyader, T. Lelievre, and F. Malrieu. On the length of one-dimensional reactive paths.
  • [3] Y. Chen, J. Hoskins, Y. Khoo, and M. Lindsey. Committor functions via tensor networks. Journal of Computational Physics, 472:111646, Jan. 2023.
  • [4] P. Dupuis, K. Spiliopoulos, and H. Wang. Importance Sampling for Multiscale Diffusions. Multiscale Modeling & Simulation, 10(1):1–27, Jan. 2012.
  • [5] W. E. and E. Vanden-Eijnden. Towards a Theory of Transition Paths. Journal of Statistical Physics, 123(3):503–523, May 2006.
  • [6] L. Evans, M. K. Cameron, and P. Tiwary. Computing committors via Mahalanobis diffusion maps with enhanced sampling data. The Journal of Chemical Physics, 157(21):214107, Dec. 2022.
  • [7] Y. Gao, T. Li, X. Li, and J.-G. Liu. Transition Path Theory for Langevin Dynamics on Manifolds: Optimal Control and Data-Driven Solver. Multiscale Modeling & Simulation, 21(1):1–33, Mar. 2023.
  • [8] V. Gayrard, A. Bovier, M. Eckhoff, and M. Klein. Metastability in Reversible Diffusion Processes I: Sharp Asymptotics for Capacities and Exit Times. Journal of the European Mathematical Society, 6(4):399–424, Dec. 2004.
  • [9] Y. Khoo, J. Lu, and L. Ying. Solving for high dimensional committor functions using artificial neural networks, Feb. 2018. arXiv:1802.10275 [cs, stat].
  • [10] T. Lelièvre and G. Stoltz. Partial differential equations and stochastic methods in molecular dynamics. Acta Numerica, 25:681–880, May 2016.
  • [11] T. Lelièvre, G. Stoltz, and M. Rousset. Free energy computations: a mathematical perspective. Imperial College Press, London ; Hackensack, N.J, 2010. OCLC: ocn244765923.
  • [12] Q. Li, B. Lin, and W. Ren. Computing committor functions for the study of rare events using deep learning. The Journal of Chemical Physics, 151(5):054112, Aug. 2019.
  • [13] J. Lu and J. Nolen. Reactive trajectories and the transition path process. Probability Theory and Related Fields, 161(1-2):195–244, Feb. 2015.
  • [14] D. Revuz and M. Yor. Continuous Martingales and Brownian Motion, 1999.
  • [15] M. R. Shirts and J. D. Chodera. Statistically optimal analysis of samples from multiple equilibrium states. The Journal of Chemical Physics, 129(12):124105, Sept. 2008.
  • [16] E. Vanden-Eijnden and J. Weare. Rare Event Simulation of Small Noise Diffusions. Communications on Pure and Applied Mathematics, 65(12):1770–1803, 2012. _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpa.21428.
  • [17] J. Yuan, A. Shah, C. Bentz, and M. Cameron. Optimal control for sampling the transition path process and estimating rates, June 2023. arXiv:2305.17112 [math].
  • [18] B. Zhang, T. Sahai, and Y. Marzouk. A Koopman framework for rare event simulation in stochastic differential equations. Journal of Computational Physics, 456:111025, May 2022. arXiv:2101.07330 [math, stat].