This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Geometric constraints improve inference of sparsely observed stochastic dynamics

Dimitra Maoutsa  \scalerel*
Technical University of Berlin
Germany
dimitra.maoutsa@{tu-berlin.de ; gmail.com}
https://dimitra-maoutsa.gitlab.io/
Abstract

The dynamics of systems of many degrees of freedom evolving on multiple scales are often modeled in terms of stochastic differential equations. Usually the structural form of these equations is unknown and the only manifestation of the system’s dynamics are observations at discrete points in time. Despite their widespread use, accurately inferring these systems from sparse-in-time observations remains challenging. Conventional inference methods either focus on the temporal structure of observations, neglecting the geometry of the system’s invariant density, or use geometric approximations of the invariant density, which are limited to conservative driving forces. To address these limitations, here, we introduce a novel approach that reconciles these two perspectives. We propose a path augmentation scheme that employs data-driven control to account for the geometry of the invariant system’s density. Non-parametric inference on the augmented paths, enables efficient identification of the underlying deterministic forces of systems observed at low sampling rates.

1 Introduction

Unraveling a system’s governing equations from time-series observations is often crucial for understanding unexplained natural phenomena. The goal is to find a mathematical representation that aligns with observational data and provides a comprehensive phenomenological understanding of the underlying mechanisms. This requires employing proper representations that capture key system properties while effectively simplify extraneous degrees of freedom. Stochastic differential equations (SDE) provide such a flexible representation (Arnold, 2014; Lande, 1976; Chandrasekhar, 1943; Nelson, 2004), by representing the dominant system forces in the deterministic part of the equation, (drift function) f():RdRdf(\cdot):R^{d}\rightarrow R^{d}, and summarising the unresolved or irrelevant degrees as stochastic forces acting on the system (diffusion). The resulting evolution equation has the form

dXt=f(Xt)dt+σdWt,X0=x0,\text{d}X_{t}=f(X_{t})\text{d}t+\sigma\,\text{d}W_{t},\qquad X_{0}=x_{0}, (1)

where σd×d\sigma\in\mathcal{R}^{d\times d} stands for the noise amplitude, and WtW_{t} for the d-dimensional vector of independent Wiener processes. The equation should be interpreted according to the Ito formalism. We observe the system state at distinct points in time through 𝒪k=ψ(Xkτ){\mathcal{O}_{k}=\psi(X_{k\tau})}, where Xkτ=˙Xt|t=τk{X}_{k\tau}\dot{=}{X}_{t}|{{}_{t=\tau k}}, with k=1,2,,K{k=1,2,\dots,K} observations collected at inter-observation intervals τ\tau, and want approximate the drift function f()f(\cdot) from the observations {𝒪k}k=0K\{\mathcal{O}_{k}\}^{K}_{k=0}. Here, we will consider ψ(x)=x\psi(x)=x, but the method generalises for monotonic functions ψ()\psi(\cdot).

2 Identifying stochastic systems from sparse-in-time observations

For small inter-observation intervals τ\tau, we consider that we observe the continuous path of the system state X0:TX_{0:T}. margin: High-frequency observations In that case, we can identify the drift function f()f(\cdot) by the first order Kramers-Moyal coefficient (Kramers, 1940; Moyal, 1949; Tabar, 2019) by empirically estimating conditional expectations of state increments (Friedrich et al., 2000; Ragwitz & Kantz, 2001; Boninsegna et al., 2018; Siegert et al., 1998). For the Bayesian non-parametric counterpart of this approach, (Ruttor et al., 2013; Batz et al., 2018) consider that transition probabilities between observations are Gaussian for dt0\text{d}t\rightarrow 0, resulting in a (Gaussian) likelihood for the drift (see Sec. A Eq. 7)

(X0:Tf)=exp[120Tf(Xt)σ22dt+0Tf(Xt),Xt+dtXtdt].\mathcal{L}(X_{0:T}\mid f)=\exp\left[-\frac{1}{2}\int^{T}_{0}\|f(X_{t})\|_{\sigma^{2}}^{2}\text{d}t+\int^{T}_{0}\langle f(X_{t}),X_{t+\text{d}t}-X_{t}\rangle\text{d}t\right]. (2)

To identify the drift  Ruttor et al. (2013) impose a Gaussian process prior on the function values ff (Eq. 12). In Eq. 2 we introduced the weighted inner product u,v=˙uσ2v{\langle u,v\rangle\dot{=}u^{\top}\cdot\sigma^{-2}v} and weighted norm uσ2=˙uσ2u\|u\|_{\sigma^{2}}\dot{=}u^{\top}\cdot\sigma^{-2}u.

However, as the inter-observation interval τ\tau increases, the transition probabilities between consecutive observations cannot be considered Gaussian, and thus the likelihood (Eq. 2) assumed between two successive observations is no longer valid if Eq. 1 is non-linear. margin: Low-frequency observations The likelihood for the drift P({𝒪k}k=1K|f)\text{P}(\{\mathcal{O}_{k}\}_{k=1}^{K}|f) for such settings takes the form of a path integral

P({𝒪k}k=1Kf)=P({𝒪k}k=1K,X0:Tf)𝒟(X0:T)=P({𝒪k}k=1KX0:T)P(X0:T|f)𝒟(X0:T),\text{P}(\{\mathcal{O}_{k}\}_{k=1}^{K}\mid f)=\int\text{P}(\{\mathcal{O}_{k}\}_{k=1}^{K},X_{0:T}\mid f)\mathcal{D}(X_{0:T})=\int\text{P}(\{\mathcal{O}_{k}\}_{k=1}^{K}\mid X_{0:T})\text{P}(X_{0:T}|f)\mathcal{D}(X_{0:T}), (3)

where {𝒪k}k=1K\{\mathcal{O}_{k}\}_{k=1}^{K} denotes the set of KK discrete time observations, P(X0:T|f)\text{P}(X_{0:T}|f) the prior path probability assuming the dynamics of Eq. 1, 𝒟(X0:T)\mathcal{D}(X_{0:T}) identifies the formal volume element on the path space, while P({𝒪k}k=1K|X0:T)\text{P}(\{\mathcal{O}_{k}\}_{k=1}^{K}|X_{0:T}) stands for the likelihood of observations given the latent path X0:TX_{0:T}.

\begin{overpic}[width=536.60715pt]{figs/path_augmentations.png} \put(3.0,25.0){a.} \put(23.0,25.0){b.} \put(43.0,25.0){c.} \put(63.0,25.0){d.} \put(80.0,25.0){e.} \end{overpic}
Figure 1: Existing path augmentation strategies match poorly the underlying transition density between consecutive observations underestimating its curvature. a.) Stochastic bridge marginal density (grey) between two successive observations 𝒪1\mathcal{O}_{1} and 𝒪2\mathcal{O}_{2} (pink triangles) following the ground truth dynamics. b.) Forward probability flow with estimated dynamics with a Gaussian likelihood (maroon) matches poorly the correct transition density and often fails to reach the second observation 𝒪2\mathcal{O}_{2} (downward pink triangle). Common path augmentation strategies employ either: c.) Brownian bridges, or d.) Ornstein Uhlenbeck (linear) bridge marginals resulting from local linearisations of the estimated drift with Gaussian likelihood. Both approaches match poorly the correct transition density, because they underestimate its curvature. e.) The proposed geometrically constrained path augmentation provides a better approximation of the underlying transition density by forcing the bridge paths towards the geodesic curve that connects consecutive observations on the manifold induced by the observations.

From a geometric perspective, we consider that the nonlinear system dynamics induce an invariant density that may be approximated by a (possibly low dimensional) manifold. The sparse-in-time observations are samples of that manifold. For low-frequency observations, Euclidean distances employed for computing the state increments Xt+τXtX_{t+\tau}-X_{t} do not consider the geometry induced by the nonlinear dynamics, and thereby underestimate the curvature of the transition density between consecutive observations (Figure 4).

3 Geometric path augmentation

Since the likelihood of Eq. 3 is intractable, we consider the unobserved continuous path between observations as latent random variables X0:TX_{0:T}, and obtain a maximum a posteriori estimate for the drift through Expectation Maximisation (EM) (Dempster et al., 1977). Similar parametric (Elerian et al., 2001; Sermaidis et al., 2013) and non-parametric (Batz et al., 2018; Ruttor et al., 2013) methods have addressed the drift inference in the past, primarily in high-frequency observation settings. Our approach is based on the non-parametric method discussed in  (Batz et al., 2018; Ruttor et al., 2013), with two significant advancements:

  • (i)

    We employ a path augmentation scheme following the estimated nonlinear dynamics resulting from inference with the Gaussian likelihood of Eq. 2 (as opposed to local linear approximations of these dynamics proposed in (Batz et al., 2018)).

  • (ii)

    Importantly, we further constrain the augmented paths to align with the geometry of the invariant density between consecutive observations (Fig. 4 b.).

We follow an iterative algorithm, where at each iteration nn we perform the two following steps:
(1.) An E(xpectation) step, where given a drift estimate f^n\hat{f}^{n} we construct an approximate posterior over the latent variables Q(X0:T)P(X0:T|{𝒪}k=1K,f^n(x))Q(X_{0:T})\approx\text{P}(X_{0:T}|\{\mathcal{O}\}^{K}_{k=1},\hat{f}^{n}(x)).
(2.) A M(aximisation) step, where we update the drift estimation.

\bullet Approximate posterior over paths. (E-step)

We approximate the continuous path trajectory X0:TX_{0:T} between observations by a posterior path measure defined as the minimiser of the free energy

[Q]=120T[g(x,t)f^(x)σ22+U𝒪(x,t)+U𝒢(x,t)]qt(x)dxdt.\mathcal{F}[Q]=\frac{1}{2}\int\limits^{T}_{0}\int\Big{[}\|g(x,t)-\hat{f}(x)\|_{\sigma^{2}}^{2}+{U_{\mathcal{O}}(x,t)}+{U_{\mathcal{G}}(x,t)}\Big{]}\,q_{t}(x)\,\text{d}x\,\text{d}t. (4)

The term U𝒪(x,t)=˙tklnP(𝒪k|x)δ(ttk){U_{\mathcal{O}}(x,t)\dot{=}-\sum_{t_{k}}\ln\text{P}(\mathcal{O}_{k}|x)\delta(t-t_{k})} forces the latent path to pass through the observations (or close to them depending on the observation process), while U𝒢(x,t)=˙Γtx2{U_{\mathcal{G}}(x,t)\dot{=}\|\Gamma_{t}-x\|^{2}} guides the latent path towards the geodesic curves γtk\gamma^{k}_{t^{\prime}} that connect consecutive observations on the manifold \mathcal{M} induced by the system’s invariant density (Sec. A.1.2). Here we denote Γt=˙{γtk}t=(k1)τ+tτ\Gamma_{t}\dot{=}\{\gamma^{k}_{t^{\prime}}\}_{t=(k-1)\tau+t^{\prime}\tau}, where γtk\gamma^{k}_{t^{\prime}} is the geodesic connecting 𝒪k\mathcal{O}_{k} and 𝒪k+1\mathcal{O}_{k+1}, and t[0,1]t^{\prime}\in[0,1]. We identify the geodesic γtk\gamma^{k}_{t^{\prime}} for each interval by learning the local metric of the manifold \mathcal{M} (see Sec. A.1.2 and Arvanitidis et al. (2019)).

\begin{overpic}[width=397.48499pt]{figs/After_firstbetter_FW_2Geodesic_LC_nose_0.50_dens_200_time_500_seed_13.png} \put(6.0,35.0){a.} \put(36.0,35.0){b.} \put(66.0,35.0){c.} \put(6.0,12.0){d.} \put(36.0,12.0){e.} \put(66.0,12.0){f.} \end{overpic}
Figure 2: Proposed path augmentation after two iterations already provides a good approximation of underlying drift. Estimated (red) and true (grey) force field with a.) Gaussian likelihood b.) after one and c.) after second iteration of augmentations. (insets) Ground truth against estimated angles for each point on the two dimensional grid. e.) Weighted root mean square error (wRMSE) for estimated drifts after each iteration for the presented example. The weights for averaging the error at each grid point are obtained from a kernel density estimation on the observations {𝒪k}k=1K\{\mathcal{O}_{k}\}^{K}_{k=1}. d.) wRMSE against inter-observation interval τ\tau for different noise conditions σ={0.25,0.5}\sigma=\{0.25,0.5\} for drift estimated with a Gaussian likelihood (gaus-circles), after first augmentation (1st-triangles), and after second augmentation (2nd-squares) for T=500T=500. f.) wRMSE against noise amplitude σ\sigma in the system for different trajectory durations T={500,1000}T=\{500,1000\} time units for inter-observation interval τ=240\tau=240. Markers follow the same coding as in d.). Errorbars indicate one standard deviation over 55 independent realisations.

Following Opper (2019), for each inter-observation interval [𝒪k,𝒪k+1][\mathcal{O}_{k},\mathcal{O}_{k+1}] we identify the posterior path measure (minimiser of Eq. 4) by the solution of a stochastic optimal control problem Maoutsa & Opper (2022; 2021); Maoutsa (2022) with the objective to obtain a time-dependent drift adjustment u(x,t):=g(x,t)f^(x)u(x,t):=g(x,t)-\hat{f}(x) for the system with drift f^(x)\hat{f}(x) with initial and terminal constraints determined by U𝒪(x,t)U_{\mathcal{O}}(x,t), and additional path constraints U𝒢(x,t)U_{\mathcal{G}}(x,t).

\bullet Drift estimation. (M-step)

To estimate the drift from a sampled latent path, we assume a Gaussian process prior over function values and employ a sparse kernel approximation similar to Batz et al. (2018) (see Sec. A.2 for details).

\begin{overpic}[width=258.36281pt]{figs/newer_comparison.png} \put(6.0,55.0){a.} \put(6.0,30.0){b.} \put(50.0,55.0){c.} \put(50.0,30.0){d.} \end{overpic}
Figure 3: Comparison of proposed path augmentation with Ornstein-Uhlenbeck augmentation. Weighted root mean square error (wRMSE) against noise amplitude σ\sigma for different inter-observation intervals for noise amplitude for moderate inter-observation intervals with a.) σ=0.25\sigma=0.25, and b.) σ=0.50\sigma=0.50, and for large inter-observation intervals with c.) σ=0.50\sigma=0.50, and d.) σ=0.75\sigma=0.75 and known direction of movement. In c., d. the inter-observation interval results in only one observation per oscillation period.

4 Numerical experiments

To demonstrate the performance of the proposed method we performed systematic estimations for a two-dimensional Van der Pol oscillator under different noise conditions σ\sigma, observed at different inter-observation intervals τ\tau for different lengths of trajectories TT (see Sec. B). For the examined noise amplitudes (Fig. 2 f.), the proposed path augmentation algorithm improves the naive estimation with Gaussian assumptions within two iterations for most noise amplitudes (Fig. 2). For increasing noise the improvement contributed by our approach decreases (Fig. 2f.), but is nevertheless not negligible. For low noise conditions, geodesics approximate the manifold structure better, however the path integral control is limited by the control costs proportional to inverse noise covariance. Our framework had comparable accuracy for all inter-observation lengths, but improvement was small for small lengths since in that setting the estimation with Gaussian likelihood already provides a good approximation of the ground truth drift. The proposed approach was compared to the augmentation method proposed by Batz et al. (2018) (augmentation with Ornstein-Uhlenbeck bridges) and delivered more accurate estimates for larger inter-observation intervals. For inter-observation intervals with only one observation per oscillation period (Figure 3c.,d.), the proposed approach delivered better results by considering additionally knowledge of the direction of movement in the state space (c.f. Sec. B). The variance of estimates of the proposed method was smaller compared to Batz et al. due to conditioning on the invariant geometry of the system.

5 Conclusion and Discussion

We introduced a new method for identifying stochastic systems from sparse-in-time observations of the system’s state that reconciles approaches that rely purely on the temporal structure of the observations with those that approximate the geometry of the invariant density. Our method employs a path augmentation strategy that uses the nonlinear dynamics of a coarse drift estimate and further constrains the augmented paths to follow the local geometry of the system’s invariant density. We found that the proposed approach provides efficient recovery of the underlying drift function for periodic or quasi-periodic systems under several noise conditions. Only a limited number of inference methods have attempted to merge geometric and temporal perspectives for the identification of stochastic systems, such as the Langevin regression (Callaham et al., 2021), TrajectoryNet (Tong et al., 2020), and the diffusion map method of Shnitzer et al. (2020; 2016). However, our method differs from these approaches by employing the geodesic approximation of the underlying data geometry (see also Sec. C Future directions).

Acknowledgements

We thank Mina Miolane for early guidance on extracting geodesics from variational autoencoder based approximation of data manifolds, Stefan Sommer for answering questions on diffusions on manifolds, and Prof. Manfred Opper for prompting us to work on this problem. We further thank Georgios Arvanitidis for maintaining an open repository with his previous work on approximating geodesics from data manifolds. An earlier account of this work has been presented at the NeurIPS 2022 workshop Machine Learning and the Physical Sciences (Maoutsa, 2023). We further acknowledge that previous work from the Python (Van Rossum & Drake Jr, 1995), numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), matplotlib (Hunter, 2007), seaborn (Waskom, 2021), GPflow (Matthews et al., 2017), pyEMD (Pele & Werman, 2009), and pytorch (Paszke et al., 2017) communities facilitated the implementation of the computational part of this work.

An implementation of this work can be found in the following repository: https://github.com/dimitra-maoutsa/Geometric-path-augmentation-for-SDEs.

References

\appendixpage

Appendix A Drift inference for high and low frequency observations

We consider systems whose evolution is captured by the stochastic differential equation Eq. 1.

High frequency observations.

When the system path X0:TX_{0:T} is observed in continuous time, the infinitesimal transition probabilities of the diffusion process between consecutive observations are Gaussian, i.e.,

Pf(X0:Tf)exp(12dttXt+dtXtf(Xt)dtσ22).\text{P}_{f}(X_{0:T}\mid f)\propto\exp\left(-\frac{1}{2\text{d}t}\sum_{t}\|X_{t+\text{d}t}-X_{t}-f(X_{t})\text{d}t\|_{\sigma^{2}}^{2}\right). (5)

In turn, the transition probability of (discretised) Wiener paths P𝒲(X0:T)P_{\mathcal{W}}(X_{0:T}) (i.e., paths from a drift-less process) can be expressed as

P𝒲(X0:T)=exp(12dttXt+dtXtσ22),\text{P}_{\mathcal{W}}(X_{0:T})=\exp\left(-\frac{1}{2\text{d}t}\sum_{t}\|X_{t+\text{d}t}-X_{t}\|_{\sigma^{2}}^{2}\right), (6)

where uσ2=˙uσ2u\|u\|_{\sigma^{2}}\dot{=}u^{\top}\cdot{\sigma}^{-2}u denotes the weighted norm with D=˙σ2D\dot{=}\sigma^{2} indicating the noise covariance. We can thus express the likelihood for the drift ff by the Radon-Nykodym derivative between Pf(X0:T|f)P_{f}(X_{0:T}|f) and P𝒲(X0:T)\text{P}_{\mathcal{W}}(X_{0:T}) for paths X0:TX_{0:T} within the time interval [0,T][0,\,T]  (Liptser & Shiryaev, 2013)

(X0:Tf)=exp[12tf(Xt)σ22dt+tf(Xt),Xt+dtXtσ2],\mathcal{L}(X_{0:T}\mid f)=\exp\left[-\frac{1}{2}\sum_{t}\|f(X_{t})\|_{\sigma^{2}}^{2}\text{d}t+\sum_{t}\langle f(X_{t}),X_{t+\text{d}t}-X_{t}\rangle_{\sigma^{2}}\right], (7)

where for brevity we have introduced the notation u,v=˙uσ2v\langle u,v\rangle\dot{=}u^{\top}\cdot{\sigma}^{-2}v for the weighted inner product with respect to the inverse noise covariance σ2{\sigma}^{-2}. This expression results from applying the Girsanov theorem on the path measures induced by a process with drift ff and a Wiener process, with same diffusion σ\sigma, and employing an Euler-Maruyama discretisation on the continuous path X0:TX_{0:T}.

The likelihood given a continuously observed path of the SDE (Eq. 7) has a quadratic form in terms of the drift function. Therefore a Gaussian measure over function values (Gaussian process) is a natural conjugate prior for this likelihood. To identify the drift in a non-parametric form, we assume a Gaussian process prior for the function values fP0(f)=GP(mf,kf)f\sim\text{P}_{0}({f})=\text{GP}(m^{f},k^{f}), where mfm^{f} and kfk^{f} denote the mean and covariance function of the Gaussian process  (Ruttor et al., 2013). The prior measure can be written as

P0(f)=exp[12f(x)(kf(x,x))1f(x)dxdx],\text{P}_{0}({f})=\exp\left[-\frac{1}{2}\int\int f(x)\left(k^{f}(x,x^{\prime})\right)^{-1}f(x^{\prime})\text{d}x\text{d}x^{\prime}\right], (8)

if we consider a zero mean Gaussian process mf=0m^{f}=0.

Bayesian inference for the drift function ff requires the computation of a probability distribution in the function space, the posterior probability distribution Pf(fX0:T)\text{P}_{f}(f\mid X_{0:T}). From the Bayes’ rule the posterior can be expressed as

Pf(fX0:T)=P0(f)(X0:Tf)ZP0(f)(X0:Tf),\text{P}_{f}(f\mid X_{0:T})=\frac{\text{P}_{0}(f)\mathcal{L}(X_{0:T}\mid f)}{Z}\propto\text{P}_{0}(f)\mathcal{L}(X_{0:T}\mid f), (9)

where ZZ denotes a normalising factor defined as a path integral

Z=P0(f)(X0:Tf)𝒟f,Z=\int\text{P}_{0}(f)\mathcal{L}(X_{0:T}\mid f)\mathcal{D}f, (10)

where 𝒟f\mathcal{D}f denotes integration over the Hilbert space f:H0[f]<f:H_{0}[f]<\infty . Here we have expressed the prior probability over functions as P0(f)=eH0[f]\text{P}_{0}(f)=e^{-H_{0}[f]}. In (Ruttor et al., 2013) the authors show that in the continuous time limit, nonparametric estimation of drift functions becomes equivalent to Gaussian process regression, with the objective to identify the mapping from the system state XtX_{t} to state increments dXt\text{d}X_{t} (Rasmussen, 2003). More precisely, we consider NN observations of the system state XtX_{t} as the regressor, with associated response variables

Yt=Xt+dtXtdt,Y_{t}=\frac{X_{t+\text{d}t}-X_{t}}{\text{d}t}, (11)

and denote the kernel function of the Gaussian process by k(x,x)k(x,x^{\prime}).

If we denote with 𝒳={Xt}t=0Tdt\mathcal{X}=\{X_{t}\}^{T-\text{d}t}_{t=0} and 𝒴={Yt}t=0Tdt\mathcal{Y}=\{Y_{t}\}^{T-\text{d}t}_{t=0} the set of state observations and observation increments, the mean of the posterior process over drift functions ff can be expressed as

f¯(x)=kf(x,𝒳)(𝒦+σ2dtIN)1𝒴,\bar{f}(x)=k^{f}(x,\mathcal{X})^{\top}\left(\mathcal{K}+\frac{{\sigma}^{2}}{\text{d}t}I_{N}\right)^{-1}\mathcal{Y}, (12)

where we abused the notation and denoted with kf(x,𝒳)k^{f}(x,\mathcal{X}) the vector resulting from evaluating the kernel kfk^{f} at points xx and {𝒪t}k=1K1\{\mathcal{O}_{t}\}^{K-1}_{k=1}. Similarly 𝒦=kf(𝒳,𝒳)\mathcal{K}=k^{f}(\mathcal{X},\mathcal{X}) stands for the (K1)×(K1)(K-1)\times(K-1) matrix resulting from evaluation of the kernel on all observation pairs. In a similar vein, the posterior variance can be written as

Σ2(x)=kf(x,x)kf(x,𝒳)(𝒦+σ2dt)1kf(x,𝒳),\Sigma^{2}(x)=k^{f}(x,x)-k^{f}(x,\mathcal{X})^{\top}\left(\mathcal{K}+\frac{{\sigma}^{2}}{\text{d}t}\right)^{-1}k^{f}(x,\mathcal{X}), (13)

where the term σ2/dt{\sigma}^{2}/\text{d}t plays the role of observation noise.

Low frequency observations.

When the inter-observation interval becomes large (low frequency observations), the Gaussian likelihood of Eq. 7 becomes invalid, since for large inter-observation intervals the transition density is no longer Gaussian. Thus, drift estimation with Gaussian assumptions  (Friedrich & Peinke, 1997; Ruttor et al., 2013) becomes inaccurate. To mitigate this issue Lade (Lade, 2009) introduced a method to compute finite time corrections for the drift estimates, which has been applied (to the best of our knowledge) mostly to one dimensional problems (Honisch & Friedrich, 2011). On the other hand, the statistics community has proposed path augmentation schemes that augment the observed trajectory to a nearly continuous-time trajectory by sampling a simplified system’s dynamics between observations (Golightly & Wilkinson, 2008; Papaspiliopoulos et al., 2012; Sermaidis et al., 2013; Beskos et al., 2006; Chib et al., 2006). However for large inter-observation intervals and for nonlinear systems the simplified dynamics employed for path augmentation match poorly the underlying path statistics, and these methods show poor convergence rates or fail to identify the correct dynamics (Figure 1 c. and d.). We point out here, that path augmentation with Ornstein Uhlenbeck bridges using as drift the local linearisation of the correct dynamics, provides a good approximation of the underlying transition density. However, during inference, the true underlying dynamics are unknown, and the proposed local linearisations on inaccurate drift estimates (Batz et al., 2018) perform poorly for low frequency observations.

Notice that as the inter-observation interval τ\tau increases, the Gaussian likelihood assumed between two successive observations is no longer valid if the system is non-linear or when the noise is state dependent. The likelihood for the drift for such settings can be expressed in terms of a path integral

P(𝒪1:Kf)=P(𝒪1:KX0:T)P(X0:Tf)𝒟(X0:T),\text{P}(\mathcal{O}_{1:K}\mid f)=\int\text{P}(\mathcal{O}_{1:K}\mid X_{0:T})\text{P}(X_{0:T}\mid f)\mathcal{D}(X_{0:T}), (14)

where 𝒪1:K=˙{𝒪k}k=1K\mathcal{O}_{1:K}\dot{=}\{\mathcal{O}_{k}\}_{k=1}^{K} denotes the set of KK discrete time observations, P(X0:Tf)\text{P}(X_{0:T}\mid f) the prior path probability resulting from a diffusion process with drift f(x)f(x), 𝒟(X0:T)\mathcal{D}(X_{0:T}) identifies the formal volume element on the path space, and P(𝒪1:KX0:T)\text{P}(\mathcal{O}_{1:K}\mid X_{0:T}) stands for the likelihood of observations given the latent path X0:TX_{0:T}.

However, the path integral of Eq. 14 is intractable for nonlinear systems, thus we need to simultaneously estimate the drift and latent state of the diffusion process, i.e., to approximate the joint posterior measure of latent paths and drift functions P(X0:T,f𝒪1:K)\text{P}(X_{0:T},f\mid\mathcal{O}_{1:K}). Therefore we consider the unobserved continuous path X0:TX_{0:T} as latent random variables and employ an Expectation Maximisation (EM) algorithm to identify a maximum a posteriori estimate for the drift function. More precisely, we follow an iterative algorithm, where at each iteration nn we alternate between the two following steps:

An Expectation step, where given a drift estimate f^n(x)\hat{f}^{n}(x) we construct an approximate posterior over the latent variables Q(X0:T)P(X0:T𝒪1:K,f^n(x)){Q(X_{0:T})\approx\text{P}(X_{0:T}\mid\mathcal{O}_{1:K},\hat{f}^{n}(x))}, and compute the expected log-likelihood of the augmented path

𝔏(f^n(x),Q)=𝔼Q[ln(X0:Tf^n(x))].\mathfrak{L}\big{(}\hat{f}^{n}(x),Q\big{)}=\mathbb{E}_{Q}\Big{[}\ln\mathscr{L}\big{(}X_{0:T}\mid\hat{f}^{n}(x)\big{)}\Big{]}. (15)

A Maximisation step, where we update the drift estimation by maximising the expected log likelihood

fn+1(x)=argmaxf[𝔏(fn(x),Q)lnP0(fn(x))].f^{n+1}(x)=\arg\max_{f}\Big{[}\mathfrak{L}\big{(}f^{n}(x),Q\big{)}-\ln\text{P}_{0}\big{(}f^{n}(x)\big{)}\Big{]}. (16)

In Eq. 16 P0\text{P}_{0} denotes the Gaussian process prior over function values.

A.1 Approximate posterior over paths.

Here we first formulate the approximate posterior over paths (conditional distribution for the path given the observations) by considering only individual observations as constraints (Section A.1.1). However, this approach results computationally taxing calculations during path augmentation, since the observations are atypical states of the initially estimated drift. To overcome this issue, we subsequently extend the formalism (Section A.1.2) to incorporate constraints that consider also the local geometry of the observations.

A.1.1 Approximate posterior over paths without geometric constraints.

Given a drift function (or a drift estimate) f^(x)\hat{f}(x) we can apply variational techniques to approximate the posterior measure over the latent path conditioned on the observations {𝒪k}k=1K\{\mathcal{O}_{k}\}^{K}_{k=1}. We consider that the prior process (the process without considering the observations {𝒪k}k=1K\{\mathcal{O}_{k}\}^{K}_{k=1}) is described by the equation

P(X0:Tf^):dXt=f^(Xt)dt+σdWt.\text{P}(X_{0:T}\mid\hat{f}):\qquad\text{d}X_{t}=\hat{f}(X_{t})\text{d}t+\sigma\text{d}W_{t}. (17)

We will define an approximating (posterior) process that is conditioned on the observations. The conditioned process is also a diffusion process with the same diffusion as Eq. 17 but with a modified, time-dependent drift g(x,t)g(x,t) that accounts for the observations (Chetrite & Touchette, 2015; Majumdar & Orland, 2015). We identify the approximate posterior measure QQ with the posterior measure induced by an approximating process that is conditioned by the observations 𝒪1:K\mathcal{O}_{1:K} (Opper, 2019), with governing equation

Q(X0:T):dXt=g(Xt,t)dt+σdWt=(f^(Xt)+σ2u(Xt,t))dt+σdWt.Q(X_{0:T}):\qquad\text{d}X_{t}=g(X_{t},t)\text{d}t+\sigma\text{d}W_{t}=\left(\hat{f}(X_{t})+\sigma^{2}u(X_{t},t)\right)\text{d}t+\sigma\text{d}W_{t}. (18)

The effective drift g(Xt,t)g(X_{t},t) of Eq. 18 may be obtained from the solution of the variational problem of minimising the free energy

[Q]=𝒦(Q(X0:T)||P(X0:Tf^))k=1KEQ[lnP(𝒪kXtk)].\mathcal{F}[Q]=\mathcal{KL}\Big{(}Q(X_{0:T})||\text{P}(X_{0:T}\mid\hat{f})\Big{)}-\sum\limits_{k=1}^{K}{E}_{Q}[\ln\text{P}(\mathcal{O}_{k}\mid X_{t_{k}})]. (19)

By applying the Cameron-Girsanov-Martin theorem we can express the Kullback-Leibler divergence between the two path measures induced by the diffusions with drift f^(x)\hat{f}(x) and g(x,t)g(x,t) as

𝒦(Q(X0:T)||P(X0:T|f^))\displaystyle\mathcal{KL}\Big{(}Q(X_{0:T})||\text{P}(X_{0:T}|\hat{f})\Big{)} =EQ[ln(dQ(X0:T)dP(X0:T|f^))]\displaystyle=E_{{Q}}\left[\text{ln}\left(\frac{d{Q}(X_{0:T})}{d{P}\left(X_{0:T}|\hat{f}\right)}\right)\right] (20)
=EQ[(120Tf^(Xt)g(Xt,t)σ22dt+0Tf^(Xt)g(Xt,t)σ2dWt)]\displaystyle=E_{{Q}}\left[\left(-\frac{1}{2}\int_{0}^{T}{{\|\hat{f}(X_{t})-g(X_{t},t)\|_{\sigma^{2}}^{2}}\text{d}t}+\int_{0}^{T}{\frac{\hat{f}(X_{t})-g(X_{t},t)}{{\sigma^{2}}}\text{d}W_{t}}\right)\right] (21)
=EQ[(120Tf^(Xt)g(Xt,t)σ22dt+VT)]\displaystyle=E_{{Q}}\left[\left(-\frac{1}{2}\int_{0}^{T}{{\|\hat{f}(X_{t})-g(X_{t},t)\|_{\sigma^{2}}^{2}}\text{d}t}+V_{T}\right)\right] (22)
=120Tg(x,t)f^(x)σ22qt(x)dxdt+,\displaystyle=\frac{1}{2}\int\limits^{T}_{0}\int\|g(x,t)-\hat{f}(x)\|_{\sigma^{2}}^{2}\,q_{t}(x)\,\text{d}x\,\text{d}t+\mathfrak{C}, (23)

where qt(x)q_{t}(x) stands for the marginal density for XtX_{t} of the approximate process. In the third line we have introduced the random variable VT=0Tf^(Xt)g(Xt,t)σ2dWtV_{T}=\int_{0}^{T}{\frac{\hat{f}(X_{t})-g(X_{t},t)}{{\sigma^{2}}}\text{d}W_{t}}. Under the assumption that the function (Xt)=f^(Xt)g(Xt,t){\ell(X_{t})=\hat{f}(X_{t})-g(X_{t},t)} is bounded, piece-wise continuous, and in L2[0,)L^{2}[0,\infty) , VTV_{T} follows the distribution 𝒩(VT0,0T2(s)ds)\mathcal{N}\left(V_{T}\mid 0,\int_{0}^{T}\ell^{2}(s)\text{d}s\right), which for a given TT will result into a constant \mathfrak{C}. Thus the second term in Eq. 23 is not relevant for the minimisation of the free energy and will be omitted.

We can thus express the free energy of Eq. 19 as (Opper, 2019)

[Q]=120T[g(x,t)f^(x)σ22+U(x,t)]qt(x)dxdt,\mathcal{F}[Q]=\frac{1}{2}\int\limits^{T}_{0}\int\Big{[}\|g(x,t)-\hat{f}(x)\|_{\sigma^{2}}^{2}+U(x,t)\Big{]}\,q_{t}(x)\,\text{d}x\,\text{d}t, (24)

where the term U(x,t)U(x,t) accounts for the observations U(x,t)=tklnP(𝒪kx)δ(ttk)U(x,t)=-\sum\limits_{t_{k}}\ln\text{P}(\mathcal{O}_{k}\mid x)\delta(t-t_{k}).

The minimisation of the functional of the free energy can be construed as a stochastic control problem (Opper, 2019) with the objective to identify a time-dependent drift adjustment u(x,t):=g(x,t)f^(x)u(x,t):=g(x,t)-\hat{f}(x) for the system with drift f^(x)\hat{f}(x) so that the controlled dynamics fulfil the constraints imposed by the observations.

For the case of exact observations, i.e., for an observation process ψ(x)=x\psi(x)=x, we can compute the drift adjustment for each of the K1K-1 inter-observation intervals independently. Thus for each interval between consecutive observations, we identify the optimal control u(x,t)u(x,t) required to construct a stochastic bridge following the dynamics of Eq. 17 with initial and terminal states the respective observations 𝒪k\mathcal{O}_{k} and 𝒪k+1\mathcal{O}_{k+1}.

The optimal drift adjustment for such a stochastic control problem for the inter-observation interval between 𝒪k\mathcal{O}_{k} and 𝒪k+1\mathcal{O}_{k+1} can be obtained from the solution of the backward equation (see (Maoutsa & Opper, 2022; 2021))

ϕt(x)t=f^ϕt(x)+U(x,t)ϕt(x),\frac{\partial\phi_{t}(x)}{\partial t}=-\mathcal{L}_{\hat{f}}^{\dagger}\phi_{t}(x)+U(x,t)\phi_{t}(x), (25)

with terminal condition ϕT(x)=χ(x)=δ(x𝒪k+1)\phi_{T}(x)=\chi(x)=\delta(x-\mathcal{O}_{k+1}) and with f^\mathcal{L}_{\hat{f}}^{\dagger} denoting the adjoint Fokker-Planck operator for the process of Eq. 17. As shown in Maoutsa et al. (Maoutsa & Opper, 2022; 2021) the optimal drift adjustment u(x,t)u(x,t) can be expressed in terms of the difference of the logarithmic gradients of two probability flows

u(x,t)=D(lnqTt(x)lnρt(x)),u^{*}(x,t)=D\Big{(}\nabla\ln q_{T-t}(x)-\nabla\ln\rho_{t}(x)\Big{)}, (26)

where ρt\rho_{t} fulfils the forward (filtering) partial differential equation (PDE)

ρt(x)t=f^ρt(x)U(x,t)ρt(x),\frac{\partial\rho_{t}(x)}{\partial t}={\cal{L}}_{\hat{f}}\rho_{t}(x)-U(x,t)\rho_{t}(x), (27)

while qtq_{t} is the solution of a time-reversed PDE that depends on the logarithmic gradient of ρt(x)\rho_{t}(x)

qt(x)t\displaystyle\frac{\partial{q}_{t}(x)}{\partial t} =[(σ2lnρTt(x)f(x,Tt))qt(x)]+σ222qt(x),\displaystyle=-\nabla\cdot\Bigg{[}\Big{(}\sigma^{2}\nabla\ln\rho_{T-t}(x)-f(x,T-t)\Big{)}{q}_{t}(x)\Bigg{]}+\frac{\sigma^{2}}{2}\nabla^{2}{q}_{t}(x), (28)

with initial condition q0(x)ρT(x)χ(x){q}_{0}(x)\propto\rho_{T}(x)\chi(x) .

A.1.2 Approximate posterior over paths with geometric constraints.

The previously described construction of the approximate measure in terms of stochastic bridges is relevant when the observations have non vanishing probability under the law of the prior diffusion process of Eq. 17. However, when the prior process (with the estimated drift f^\hat{f}) differs considerably from the process that generated the observations, such a construction might either provide a bad approximation of the underlying path measure, or show slow numerical convergence in the construction of the diffusion bridges. To overcome this issue, we consider here additional constraints for the posterior process that force the paths of the posterior measure to respect the local geometry of the observations. In the following we provide a brief introduction on the basics of Riemannian geometry and consequently continue with the geometric considerations of the proposed method.

Riemannian geometry.

A dd-dimensional Riemannian manifold (Do Carmo & Flaherty Francis, 1992; Lee, 2018) (,h)\left(\mathcal{M},h\right) embedded in a DD-dimensional ambient space 𝒳=D\mathcal{X}=\mathcal{R}^{D} is a smooth curved dd-dimensional surface endowed with a smoothly varying inner product (Riemannian) metric h:x|xh:x\rightarrow\langle\cdot|\cdot\rangle_{x} on 𝒯x\mathcal{T}_{x}\mathcal{M}. A tangent space 𝒯x\mathcal{T}_{x}\mathcal{M} is defined at each point xx\in\mathcal{M}. The Riemannian metric hh defines a canonical volume measure on the manifold \mathcal{M}. Intuitively this characterises how to compute inner products locally between points on the tangent space of the manifold \mathcal{M}, and therefore determines also how to compute norms and thus distances between points on \mathcal{M}.

A coordinate chart (G,ϕ)(G,\phi) provides the mapping from an open set GG on \mathcal{M} to an open set VV in the Euclidean space. The dimensionality of the manifold is dd if for each point xx\in\mathcal{M} there exists a local neighborhood GdG\subset\mathcal{R}^{d}. We can represent the metric hh on the local chart (G,ϕ)(G,\phi) by the positive definite matrix (metric tensor) H(x)=(hi,j)x,0i,j,d=(xi|xjx)0i,j,dH(x)=(h_{i,j})_{x,0\leq i,j,\leq d}=\left(\langle\frac{\partial}{\partial x_{i}}|\frac{\partial}{\partial x_{j}}\rangle_{x}\right)_{0\leq i,j,\leq d} at each point xGx\in G.

For v,w𝒯xv,w\in\mathcal{T}_{x}\mathcal{M} and xGx\in G, their inner product can be expressed in terms of the matrix representation of the metric hh on the tangent space 𝒯x\mathcal{T}_{x}\mathcal{M} as v|wx=vH(x)w\langle v|w\rangle_{x}=v^{\top}H(x)w, where H(x)d×dH(x)\in\mathcal{R}^{d\times d} .

The length of a curve γ:[0,1]\gamma:[0,1]\rightarrow\mathcal{M} on the manifold is defined as the integral of the norm of the tangent vector

(γt)=01γ˙thdt=01γ˙tH(γt)γ˙tdt,\ell(\gamma_{t^{\prime}})=\int^{1}_{0}\|\dot{\gamma}_{t^{\prime}}\|_{h}\text{d}t^{\prime}=\int^{1}_{0}\sqrt{\dot{\gamma}_{t^{\prime}}^{\top}H(\gamma_{t^{\prime}})\dot{\gamma}_{t^{\prime}}}\text{d}t^{\prime}, (29)

where the dotted letter indicates the velocity of the curve γ˙t=tγt\dot{\gamma}_{t^{\prime}}=\partial_{t^{\prime}}\gamma_{t^{\prime}}. A geodesic curve is a locally length minimising smooth curve that connects two given points on the manifold.

Riemannian geometry of the observations.

For approximating the posterior over paths we take into account the geometry of the invariant density as it is represented by the observations. To that end, we consider systems whose dynamics induce invariant (inertial) manifolds that contain the global attractor of the system and on which system trajectories concentrate (Wiggins, 1994; Mohammed & Scheutzow, 1999; Girya & Chueshov, 1995; Fenichel & Moser, 1971; Arnold, 1990; Carverhill, 1985). We assume thus that the continuous-time trajectories X0:TdX_{0:T}\in\mathcal{R}^{d} of the underlying system concentrates on an invariant manifold md\mathcal{M}\in\mathcal{R}^{m\leq d} of dimensionality mm (possibly) smaller than dd. The discrete-time observations 𝒪k\mathcal{O}_{k} are thus samples of the manifold \mathcal{M}. The central premise of our approach is that unobserved paths between successive observations will be lying either on or in the vicinity of the manifold \mathcal{M}. In particular, we postulate that unobserved paths should lie in the vicinity of geodesics that connect consecutive observations on \mathcal{M}. To that end we propose a path augmentation framework that constraints the augmented paths to lie in the vicinity of identified geodesics between consecutive observations.

However, while this view of a lower dimensional manifold embedded in a higher dimensional ambient space helps to build our intuition for the proposed method, for computational purposes we adopt a complementary view inspired by the discussion in (Fröhlich et al., 2021). According to this view, we consider the entire observation space d\mathcal{R}^{d} as a smooth Riemannian manifold, =˙d\mathcal{M}\dot{=}\mathcal{R}^{d}, characterised by a Riemannian metric hh. The effect of the nonlinear geometry of the observations is then captured by the metric hh. Thus to approximate the geometric structure of the system’s invariant density, we learn the Riemannian metric tensor H:dd×dH:\mathcal{R}^{d}\rightarrow\mathcal{R}^{d\times d} and compute the geodesics between consecutive observations according to the learned metric. Intuitively according to this view the observations {𝒪k}k=1K\{\mathcal{O}_{k}\}^{K}_{k=1} introduce distortions in the way we compute distances on the state space.

In effect this approach does not reduce the dimensionality of the space we operate, but changes the way we compute inner products and thus distances, lengths, and geodesic curves on \mathcal{M}. The alternative perspective of working on a lower dimensional manifold would strongly depend on the correct assessment of the dimensionality of said manifold. For example, one could use a Variational Autoencoder to approximate the observation manifold and subsequently obtain the Riemannian metric from the embedding of the manifold mediated by the decoder. However, our preliminary results of such an approach revealed that such a method requires considerable fine tuning to adapt to the characteristics of each dynamical system and is sensitive to the estimation of the dimensionality of the approximated manifold.

To learn the Riemannian metric and compute the geodesics we follow the framework proposed by Arvanitidis et al. (2019). In particular, we approximate the local metric induced by the observations at location 𝐱\mathbf{x} of the state space, in a non-parametric form by the inverse of the weighted local diagonal covariance computed on the observations as (Arvanitidis et al., 2019)

Hdd(𝐱)=(i=1Kwi(𝐱)(xi(d)x(d))2+ϵ)1,H_{dd}(\mathbf{x})=\left(\sum\limits^{K}_{i=1}w_{i}(\mathbf{x})\left(x^{(d)}_{i}-x^{(d)}\right)^{2}+\epsilon\right)^{-1}, (30)

with weights wi(𝐱)=exp(𝐱i𝐱222σ2)w_{i}(\mathbf{x})=\exp\left(-\frac{\|\mathbf{x}_{i}-\mathbf{x}\|^{2}_{2}}{2\sigma^{2}_{\mathcal{M}}}\right), and x(d)x^{(d)} denoting the dd-th dimensional component of the vector 𝐱\mathbf{x}. The parameter ϵ>0\epsilon>0 ensures non-zero diagonals of the weighted covariance matrix, while σ\sigma_{\mathcal{M}} characterises the curvature of the manifold.

Between consecutive observations for each interval [𝒪k,𝒪k+1][\mathcal{O}_{k},\mathcal{O}_{k+1}], we identify the geodesic γtk\gamma^{k}_{t^{\prime}} as the energy minimising curve, i.e., as the minimiser of the kinetic energy functional (γtk)=01L(γtk,γ˙tk)dt\mathcal{E}(\gamma^{k}_{t^{\prime}})=\int^{1}_{0}L_{{\mathcal{M}}}(\gamma^{k}_{t^{\prime}},\dot{\gamma}^{k}_{t^{\prime}})\text{d}t^{\prime}

γtk=argminγtk,γ0k=𝒪k,γ1k=𝒪k+101L(γtk,γ˙tk)dt,\gamma^{k*}_{t^{\prime}}=\underset{\gamma^{k}_{t^{\prime}},\gamma^{k}_{0}=\mathcal{O}_{k},\gamma^{k}_{1}=\mathcal{O}_{k+1}}{\arg\min}\int^{1}_{0}L_{{\mathcal{M}}}(\gamma^{k}_{t^{\prime}},\dot{\gamma}^{k}_{t^{\prime}})\text{d}t^{\prime},
with01L(γtk,γ˙tk)dt=1201γ˙tkh2,\text{with}\;\;\;\;\int^{1}_{0}L_{{\mathcal{M}}}(\gamma^{k}_{t^{\prime}},\dot{\gamma}^{k}_{t^{\prime}})\text{d}t^{\prime}=\frac{1}{2}\int^{1}_{0}\|\dot{\gamma}^{k}_{t^{\prime}}\|^{2}_{h}, (31)

where L(γtk,γ˙tk)L_{{\mathcal{M}}}(\gamma^{k}_{t^{\prime}},\dot{\gamma}^{k}_{t^{\prime}}) denotes the Lagrangian. The minimising curve of this functional is the same as the minimiser of the curve length functional (γt)\ell(\gamma_{t^{\prime}}) (Eq. 29), i.e., the geodesic (Do Carmo & Flaherty Francis, 1992).

By applying calculus of variations, the minimising curve of the functional (γtk)\mathcal{E}(\gamma^{k}_{t^{\prime}}) can be obtained from the Euler-Lagrange equations, resulting in the following system of second order differential equations (Arvanitidis et al., 2017; Do Carmo & Flaherty Francis, 1992)

γt¨k=12H(γtk)1(2(I(γt˙k))vec[H(γtk)]γtkγt˙kvec[H(γtk)]γtk(γt˙kγt˙k)),\ddot{\gamma_{t}}^{k}=-\frac{1}{2}{H(\gamma^{k}_{t})}^{-1}\Bigg{(}2\left(I\otimes(\dot{\gamma_{t}}^{k})^{\top}\right)\frac{\partial\text{vec}[H(\gamma^{k}_{t})]}{\partial\gamma^{k}_{t}}\dot{\gamma_{t}}^{k}-\frac{\partial\text{vec}[H(\gamma^{k}_{t})]^{\top}}{\partial\gamma^{k}_{t}}\left(\dot{\gamma_{t}}^{k}\otimes\dot{\gamma_{t}}^{k}\right)\Bigg{)}, (32)

with boundary conditions γ0k=𝒪k\gamma^{k}_{0}=\mathcal{O}_{k} and γ1k=𝒪k+1\gamma^{k}_{1}=\mathcal{O}_{k+1}, where \otimes stands for the Kroenecker product, and vec[A]\text{vec}[A] denotes the vectorisation operation of matrix AA through stacking the columns of AA into a vector. Arvanitidis et al. (2019) obtain the geodesics by approximating the solution of the boundary value problem of Eq. 32 with a probabilistic differential equation solver.

Extended free energy functional.

We denote the collection of individual geodesics by Γt=˙{γtk}t=(k1)τ+tτ\Gamma_{t}\dot{=}\{\gamma^{k}_{t^{\prime}}\}_{t=(k-1)\tau+t^{\prime}\tau}, where γtk\gamma^{k}_{t^{\prime}} is the geodesic connecting 𝒪k\mathcal{O}_{k} and 𝒪k+1\mathcal{O}_{k+1}, and t[0,1]t^{\prime}\in[0,1] denotes a rescaled time variable. Additional to the constraints imposed in the previously explained setting (Sec A.1.1), here we add an extra term in the free energy U𝒢(x,t)=˙Γtx2{U_{\mathcal{G}}(x,t)\dot{=}\|\Gamma_{t}-x\|^{2}} that accounts for the local geometry of the invariant density, and guides the latent path towards the geodesic curves γtk\gamma^{k}_{t^{\prime}} that connect consecutive observations

[Q]=120T[g(x,t)f^(x)σ22+U𝒪(x,t)+βU𝒢(x,t)]qt(x)𝑑xdt.\mathcal{F}[Q]=\frac{1}{2}\int\limits^{T}_{0}\int\Big{[}\|g(x,t)-\hat{f}(x)\|_{\sigma^{2}}^{2}+U_{\mathcal{O}}(x,t)+\beta U_{\mathcal{G}}(x,t)\Big{]}\,q_{t}(x)\,dx\,\text{d}t. (33)

Here we denote the observation term by U𝒪(x,t)=˙tklnP(𝒪k|x)δ(ttk)U_{\mathcal{O}}(x,t)\dot{=}-\sum_{t_{k}}\ln\text{P}(\mathcal{O}_{k}|x)\delta(t-t_{k}), while β\beta stands for a weighting constant that determines the relative weight of the geometric term in the control objective.

Following Opper (2019), for each inter-observation interval [𝒪k,𝒪k+1][\mathcal{O}_{k},\mathcal{O}_{k+1}] we identify the posterior path measure (minimiser of Eq. 33) by the solution of a stochastic optimal control problem (Maoutsa & Opper, 2022; 2021) with the objective to obtain a time-dependent drift adjustment u(x,t):=g(x,t)f^(x)u(x,t):=g(x,t)-\hat{f}(x) for the system with drift f^(x)\hat{f}(x) with initial and terminal constraints defined by U𝒪(x,t)U_{\mathcal{O}}(x,t), and additional path constraints U𝒢(x,t)U_{\mathcal{G}}(x,t).

A.2 Approximate posterior over drift functions.

For a fixed path measure Q{Q}, the optimal measure for the drift Qf{Q}_{f} is a Gaussian process given by

QfPfexp(12f(x)σ22A(x)2f(x),B(x)σ2dx),{Q}_{f}\propto\text{P}_{f}\exp\left({-\frac{1}{2}\int\|f(x)\|_{\sigma^{2}}^{2}A(x)-2\langle f(x),B(x)\rangle_{\sigma^{2}}}\text{d}x\right), (34)

with

A(x)=˙0Tpt(x)dt,A(x)\dot{=}\int^{T}_{0}p_{t}(x)\text{d}t,

and

B(x)=˙0Tpt(x)g(x,t)dt,B(x)\dot{=}\int^{T}_{0}p_{t}(x)g(x,t)\text{d}t,

where pt(x)p_{t}(x) denotes the marginal constrained density of the state XtX_{t}. The function g(x,t)g(x,t) denotes the effective drift.

We assume a Gaussian process prior for the unknown function ff, i.e., fP0(f)=GP(mf,kf)f\sim\text{P}_{0}({f})=\text{GP}(m^{f},k^{f}) where mfm^{f} and kfk^{f} denote the mean and covariance function of the Gaussian process. Following Ruttor et al. (Ruttor et al., 2013), we employ a sparse kernel approximation for the drift ff by optimising the function values over a sparse set of SS inducing points {Zi}i=1S\{Z_{i}\}^{S}_{i=1}. We obtain the resulting drift from

f^S(x)=kf(x,𝒵)(I+Λ𝒦S)1𝐝,\hat{f}_{S}(x)=k^{f}(x,\mathcal{Z})\left(I+\Lambda\,\mathcal{K}_{S}\right)^{-1}\mathbf{d}, (35)

where we have defined introduced the notation 𝒦S=˙kf(𝒵,𝒵)\mathcal{K}_{S}\dot{=}k^{f}(\mathcal{Z},\mathcal{Z})

Λ=1σ2𝒦S1(kf(𝒵,x)A(x)kf(x,𝒵)dx)𝒦S1.\Lambda=\frac{1}{\sigma^{2}}\mathcal{K}^{-1}_{S}\left(\int k^{f}(\mathcal{Z},x)A(x)k^{f}(x,\mathcal{Z})\text{d}x\right)\mathcal{K}^{-1}_{S}. (36)
𝐝=1σ2𝒦S1(kf(𝒵,x)B(x)dx)𝒦S1,\mathbf{d}=\frac{1}{\sigma^{2}}\mathcal{K}^{-1}_{S}\left(\int k^{f}(\mathcal{Z},x)B(x)\text{d}x\right)\mathcal{K}^{-1}_{S}, (37)

We employ a sample based approximation of the densities in Eq. 34 resulting from the particle sampling of the path measure QQ. Thus by representing the densities by samples, we can rewrite the density pt(x)p_{t}(x) in terms of a sum of Dirac delta functions centered around the particles positions

pt(x)1Nj=1Nδ(xXj(t)),p_{t}(x)\approx\frac{1}{N}\sum^{N}_{j=1}\delta(x-X_{j}(t)),

and replace the Riemannian integrals with summation over particles. Here Xj(t)X_{j}(t) represents the position of the jj-th particle at time point tt.

Appendix B Details on numerical experiments

We simulated a two dimensional Van der Pol oscillator with drift function

f1(x,y)\displaystyle f_{1}(x,y) =μ(x13x3y)\displaystyle=\mu(x-\frac{1}{3}x^{3}-y) (38)
f2(x,y)\displaystyle f_{2}(x,y) =1μx,\displaystyle=\frac{1}{\mu}x, (39)

starting from initial condition x0=[1.81,1.41]x0=[1.81,-1.41] and under noise amplitudes σ={0.25,0.50,0.75,1.00}\sigma=\{0.25,0.50,0.75,1.00\} for total duration of T={500,1000}T=\{500,1000\} time units. The employed inter-observation intervals τ[80,320]×dt\tau\in[80,320]\times dt. The last inter-observation interval exceeds the half period of the oscillator and thus samples only a single state per period. This resulted in erroneous estimates. In this setting this indicates the upper limit of τ\tau for which we can provide estimates. However for any inference method, if the observation process samples only one observation per period, identifying the underlying force field without additional assumptions is not possible with temporal methods. The discretisation time-step used for simulation of the ground truth dynamics, and path augmentation δt=0.01\delta t=0.01. For sampling the controlled bridges we employed N=100N=100 particles evolving the associated ordinary differential equation as described in Maoutsa & Opper (2022; 2021); Maoutsa et al. (2020). The logarithmic gradient estimator used M=40M=40 inducing points. The sparse Gaussian process for estimating the drift was based on a sparse kernel approximation of S=300S=300 points. In the presented simulation we have employed a weighting parameter β=0.5\beta=0.5 (Eq. 33). This provides a moderate pull towards the invariant density. The example in Figure 1 was constructed with β=1\beta=1 and provides a better approximation of the transition density, than β=0.5\beta=0.5.

For identifying geodesic curves in settings where only one observation is collected per oscillation period, we have to bias the method that computes the geodesics to identify the geodesic towards the correct direction of movement, that in this case is not always the shortest curve on the manifold between two consecutive observations. To that end we assigned to each observation a phase-like variable and considered for the computation of the geodesics only the observations that had properly valued phases given the direction of movement and the phases of the two end-point observations. For example, for a counter-clockwise rotation and for phases at the end points of the bridge ϕk=ϕ(𝒪k)\phi_{k}=\phi(\mathcal{O}_{k}) and ϕk+1=ϕ(𝒪k+1)\phi_{k+1}=\phi(\mathcal{O}_{k+1}), we construct the geodesics only on the observations that have phases within thin [ϕk,ϕk+1][\phi_{k},\phi_{k+1}] if ϕk<ϕk+1\phi_{k}<\phi_{k+1}, or within [ϕk,1][0,ϕk+1][\phi_{k},1]\cup[0,\phi_{k+1}] if ϕk>ϕk+1\phi_{k}>\phi_{k+1}. With this approach we consider essentially only the relevant part of the manifold that aligns with the direction of movement. With the function ϕ():D[0,1]\phi(\cdot):\mathcal{R}^{D}\rightarrow[0,1] we denote the function that assigns a phase variable to each observation. Here for the Van der Pol oscillator we considered ϕ(𝐱)=arctan2(𝐱(2)𝐱(1))+π2π\phi(\mathbf{x})=\frac{\text{arctan2}(\frac{\mathbf{x}^{(2)}}{\mathbf{x}^{(1)}})+\pi}{2\pi}. An alternative option for assigning phase to each observation is measuring the angle between the Hilbert transform of 𝐱(i)\mathbf{x}^{(i)} and 𝐱(i)\mathbf{x}^{(i)} itself, and rescaling the values to the [0,1][0,1] interval. Here 𝐱(i)\mathbf{x}^{(i)} denotes one dimensional component of the observations.

Appendix C Future directions

The study of topological and geometric properties of invariant densities induced by stochastic dynamical systems is a relatively unexplored area with limited research available. Cong and Huang (Cong et al., 1997) approached the study of topological properties of systems perturbed by noise through the lens of random dynamical systems, which, under certain conditions and assumptions, can be translated to the language of stochastic dynamical systems. This direction has great potential for future investigation, especially in conjunction with delay embedding approaches devised for random dynamical systems (Stark et al., 2003) for estimation of the dimensionality of the invariant manifold.

One potential direction for future research is to directly employ Riemannian Langevin dynamics (Wang et al., 2020) to construct the augmented paths on the approximated metric. Alternatively, future directions for inference at the low sampling rate limit may tackle the problem through an operation learning perspective. Neglecting geometric considerations, constructing augmented paths for each inter-observation interval essentially solves the same control problem multiple times, only with different initial and terminal constraints. Thus neural network operator learning approaches like the neural operator used in Li et al. (2020a; b) that can provide generalizations for different initial and possibly terminal conditions may be relevant for tackling the inference problem through path augmentation in a more computationally efficient way.

Further promising future directions may consider path augmentation in terms of Schrödinger bridges that would account for noisy observations (here we implicitly assume no noise in the observations or small Gaussian noise).  Tamir et al. (2023) propose an efficient algorithm for introducing path constraints in the Schrödinger bridge problem that may be employed in our framework to account for geometric inductive biases for inference of stochastic dynamics.

Appendix D Additional figures

Refer to caption
Figure 4: Considered state increments for low frequency observations under Gaussian likelihood assumptions. Euclidean distance (yellow line) - used to compute the state increments between successive observations - does not account for the curvature of the invariant density. The geodesic curve (purple line) provides a better approximation of the unobserved state of the system between successive observations (light green line).