
RESEARCH ARTICLE

INFORMATION SCIENCE

Corresponding authors. Email: [email protected]; [email protected]

EPR-Net: Constructing non-equilibrium potential landscape via a variational force projection formulation

Yue Zhao1, Wei Zhang2,3,∗, Tiejun Li1,4,5,∗

1Center for Data Science, Peking University, Beijing 100871, China
2Zuse Institute Berlin, Berlin 14195, Germany
3Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
4LMAM and School of Mathematical Sciences, Peking University, Beijing 100871, China
5Center for Machine Learning Research, Peking University, Beijing 100871, China
(XX XX Year; XX XX Year; XX XX Year)
Abstract

We present EPR-Net, a novel and effective deep learning approach that tackles a crucial challenge in biophysics: constructing potential landscapes for high-dimensional non-equilibrium steady-state (NESS) systems. EPR-Net leverages a nice mathematical fact: the desired negative potential gradient is simply the orthogonal projection of the driving force of the underlying dynamics in a weighted inner-product space. Remarkably, our loss function has an intimate connection with the steady entropy production rate (EPR), enabling simultaneous landscape construction and EPR estimation. We introduce an enhanced learning strategy for systems with small noise, and extend our framework to include dimensionality reduction and the state-dependent diffusion coefficient case in a unified fashion. Comparative evaluations on benchmark problems demonstrate the superior accuracy, effectiveness, and robustness of EPR-Net compared with existing methods. We apply our approach to challenging biophysical problems, such as an 8D limit cycle and a 52D multi-stability problem, obtaining accurate solutions and interesting insights into the constructed landscapes. With its versatility and power, EPR-Net offers a promising solution for diverse landscape construction problems in biophysics.

doi:
10.1093/nsr/XXXX
keywords:
high-dimensional potential landscape, non-equilibrium system, entropy production rate, dimensionality reduction, deep learning

INTRODUCTION

Since Waddington’s famous landscape metaphor for the development of cells in the 1950s [1], the construction of potential landscapes for non-equilibrium biochemical reaction systems has been recognized as an important problem in theoretical biology, as it provides insightful pictures for understanding the complex dynamical mechanisms of biological processes. The problem has attracted considerable attention in recent decades in both the biophysics and applied mathematics communities. To date, different approaches have been proposed to realize Waddington’s landscape metaphor in a rational way, and its connection to computing optimal epigenetic switching paths and switching rates in biochemical reaction systems has also been extensively studied; see [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] and the references therein for details and [14, 15, 16, 17, 18] for reviews. Broadly speaking, these proposals can be classified into two types: (T1) the construction of the potential landscape in the finite-noise regime [2, 3, 4, 5] and (T2) the construction of the quasi-potential in the zero-noise limit [7, 6, 8].

For low-dimensional systems (i.e., dimension less than 4), the potential landscape can be numerically computed either by solving a Fokker-Planck equation (FPE) with grid-based methods until the steady solution is approximately reached, as in (T1) type proposals [3, 19], or by solving a Hamilton-Jacobi-Bellman (HJB) equation using, for instance, the ordered upwind method [20] or minimum action type methods [8], as in (T2) type proposals. However, these approaches suffer from the curse of dimensionality when applied to high-dimensional systems. Although methods based on mean-field approximations are able to provide a semi-quantitative description of the potential landscape for some typical systems [4, 21], direct and general approaches are still favored in applications. In this respect, pioneering work has been done recently that allows the direct construction of a high-dimensional potential landscape using deep neural networks (DNNs), based either on the steady viscous HJB equation satisfied by the potential landscape function in the (T1) case [22, 23], or on the pointwise orthogonal decomposition of the force field in the (T2) case [24].

While these works significantly advanced methodological developments, the approaches are based on solving HJB equations alone and may encounter numerical difficulties due to either the non-uniqueness of the weak solution to the non-viscous HJB equation in the (T2) case [25], or the singularity of the solution for small noise in the (T1) case. Meanwhile, the loss functions considered in [24, 22, 23] are essentially of physics-informed neural network (PINN) form [26], and thus generally non-convex, so the training process might encounter troublesome local-minimum issues.

In this work, we present a simple yet efficient DNN approach to construct the potential landscape of high-dimensional non-equilibrium steady-state (NESS) systems of (T1) type with either moderate or small noise. Its intimate connection with non-equilibrium statistical mechanics, nice variational structure and superior numerical performance make it a competitive and promising approach in landscape construction methodology. Our main contributions are as follows.

(1) Proposal of the entropy production rate (EPR) loss. We introduce the convex EPR loss function, reveal its connection with the entropy production rate in statistical physics, and propose an enhanced version using a tempering technique.

(2) Dimensionality reduction. We put forward a simple dimensionality reduction strategy for the case where the reduced variables are prescribed; interestingly, the reduction formalism admits a unified formulation with the EPR framework for the primitive variables. This holds whether the system has constant or variable diffusion coefficients.

(3) Successful high-dimensional applications. We successfully apply our approach to challenging high-dimensional biological systems, including an eight-dimensional (8D) cell cycle model [4] and a 52D multi-stable system [27]. Our results reveal more delicate structure of the constructed landscapes than what mean-field approximations typically provide, yet we acknowledge that mean-field approximations are not limited by the potential challenges in SDE simulations.

Overall, EPR-Net offers a promising solution for diverse landscape construction problems in biophysics. Moreover, its nice mathematical structure and connection with non-equilibrium statistical physics make it a unique object that deserves further theoretical and numerical exploration.

Figure 1: Constructing energy landscapes through enhanced EPR workflow. (a) The primary objective is to construct the energy landscape defined through the steady-state distribution of the system. (b) Constructing the high-dimensional energy landscape using the EPR framework with primitive variables. (c) Constructing the dimensionality-reduced energy landscape using EPR with prescribed reduced variables.

EPR FRAMEWORK

In this section, we provide an overview of the whole EPR framework, including its primitive formulation, the physical interpretation, its convex property, and the enhanced EPR version.

Overview

Consider an ergodic stochastic differential equation (SDE)

$$\frac{\mathrm{d}\bm{x}(t)}{\mathrm{d}t}=\bm{F}(\bm{x}(t))+\sqrt{2D}\,\dot{\bm{w}},\quad\bm{x}(0)=\bm{x}_{0},\tag{1}$$

where $\bm{x}_{0}\in\mathbb{R}^{d}$, $\bm{F}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ is a smooth driving force, $\dot{\bm{w}}=(\dot{w}_{1},\ldots,\dot{w}_{d})^{\top}$ is the $d$-dimensional temporal Gaussian white noise with $\mathbb{E}\,\dot{w}_{i}(t)=0$ and $\mathbb{E}[\dot{w}_{i}(t)\dot{w}_{j}(s)]=\delta_{ij}\delta(t-s)$ for $i,j=1,\ldots,d$ and $s,t\geq 0$. The constant $D>0$ specifies the noise strength, which is often proportional to the system’s temperature $T$. We denote by $p_{\mathrm{ss}}(\bm{x})$ its steady-state probability density function (PDF).

Following the (T1) type proposal in [3], we define the potential of (1) as $U=-D\ln p_{\mathrm{ss}}$ and the steady probability flux $\bm{J}_{\mathrm{ss}}=p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}}$ in the domain $\Omega$, which we assume for simplicity is either $\mathbb{R}^{d}$ or a $d$-dimensional hyperrectangle. The steady-state PDF $p_{\mathrm{ss}}(\bm{x})$ satisfies the Fokker-Planck equation (FPE)

$$-\nabla\cdot(p_{\mathrm{ss}}\bm{F})+D\Delta p_{\mathrm{ss}}=0,\quad\text{in }\Omega,\tag{2}$$

and we assume the asymptotic boundary condition (BC) $p_{\mathrm{ss}}(\bm{x})\rightarrow 0$ as $|\bm{x}|\rightarrow\infty$ when $\Omega=\mathbb{R}^{d}$, or the reflecting boundary condition $\bm{J}_{\mathrm{ss}}\cdot\bm{n}=0$ when $\Omega\subset\mathbb{R}^{d}$ is a $d$-dimensional hyperrectangle, where $\bm{n}$ is the unit outer normal. In both cases, we have $p_{\mathrm{ss}}(\bm{x})\geq 0$ and $\int_{\Omega}p_{\mathrm{ss}}(\bm{x})\,\mathrm{d}\bm{x}=1$.

Here we illustrate the main EPR workflow through Fig. 1. As depicted in Fig. 1a, our primary objective is to construct the energy landscape $U=-D\ln p_{\mathrm{ss}}$ defined for Eq. (1). Leveraging our proposed approach, the enhanced EPR method, we first simulate the considered SDEs until the steady state is reached in order to obtain sample points (the ‘SDE simulation’ column of Fig. 1); a minimal sampling sketch is given below. We then train a neural network, representing the potential $U$, for the landscape construction, even in challenging high-dimensional scenarios. We initially introduce the EPR loss $\operatorname{L_{EPR}}$ (the middle column of Fig. 1b), which benefits from its convexity, with its minimum coinciding with the entropy production rate. Subsequently, we present the enhanced EPR loss $\operatorname{L_{enh}}$, tailored to cover the transition domain more effectively. Furthermore, the overall methodology extends easily to the dimensionality reduction problem (Fig. 1c), with a formulation unified with the EPR framework of Fig. 1b, but for the projected variables and the corresponding loss functions $\operatorname{L_{P-EPR}}$ and $\operatorname{L_{P-enh}}$.
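As a concrete illustration of the ‘SDE simulation’ stage, the following minimal Python/NumPy sketch integrates Eq. (1) by the Euler-Maruyama scheme; the function and parameter names are our own illustration, not part of EPR-Net’s released code.

```python
import numpy as np

def simulate_sde(F, x0, D, dt=1e-3, n_steps=100_000, rng=None):
    """Euler-Maruyama integration of dx/dt = F(x) + sqrt(2D) * white noise.

    F  : callable mapping an (N, d) array of states to an (N, d) array of drifts
    x0 : (N, d) array of initial states (N independent trajectories)
    Returns the final states x(T), T = n_steps * dt, which approximate samples
    from the steady-state distribution pi when T is large.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    sqrt_2Ddt = np.sqrt(2.0 * D * dt)
    for _ in range(n_steps):
        x = x + F(x) * dt + sqrt_2Ddt * rng.standard_normal(x.shape)
        # For a hyperrectangular domain with reflecting BC, one would
        # reflect x back into the domain here.
    return x
```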

EPR loss

Aiming at an effective approach for high-dimensional applications, we employ NNs to approximate $U(\bm{x})$; the key idea of this paper is to learn $U$ by training a DNN with the loss function

$$\operatorname{L_{EPR}}(V)=\int_{\Omega}|\bm{F}(\bm{x})+\nabla V(\bm{x};\theta)|^{2}\,\mathrm{d}\pi(\bm{x}),\tag{3}$$

where $V:=V(\bm{x};\theta)$ is a neural network function with parameters $\theta$ [28], and $\mathrm{d}\pi(\bm{x})=p_{\mathrm{ss}}(\bm{x})\,\mathrm{d}\bm{x}$. To justify (3), we note that, for any function $W$, $U$ satisfies the orthogonality relation

$$\int_{\Omega}\big(\bm{F}(\bm{x})+\nabla U(\bm{x})\big)\cdot\nabla W(\bm{x})\,\mathrm{d}\pi(\bm{x})=0,\tag{4}$$

which follows from the FPE (2), a simple integration by parts and the corresponding BC. Equation (4) means that $\bm{F}+\nabla U$ is perpendicular to the space $\mathcal{G}$ formed by $\nabla W$ for all possible $W$ under the $\pi$-weighted inner product. Equivalently, $-\nabla U$ is the orthogonal projection of the driving force $\bm{F}$ onto $\mathcal{G}$ under the $\pi$-weighted inner product, which implies that $U$ is the unique minimizer (up to a constant) of the loss function (3). See Supplementary Information (SI) Section A for detailed derivations. We note that related ideas are used in the coarse-graining of molecular systems [29, 30] and in generative modeling in machine learning [31, 32]. However, to the best of our knowledge, using the loss in (3) to construct the potential of NESS systems has never been considered before.

In fact, defining the $\pi$-weighted inner product of any square-integrable functions $f,g$ on $\Omega$ as

$$(f,g)_{\pi}:=\int_{\Omega}f(\bm{x})g(\bm{x})\,\mathrm{d}\pi(\bm{x})\tag{5}$$

and the corresponding $L^{2}_{\pi}$-norm $\|\cdot\|_{\pi}$ by $\|f\|^{2}_{\pi}:=(f,f)_{\pi}$, we obtain a Hilbert space $L^{2}_{\pi}$ (see, e.g., [33, Chapter II.1]). Eq. (4) implies that minimizing the EPR loss finds the orthogonal projection of $\bm{F}$ onto the gradient space $\mathcal{G}$, formed by the gradients $\nabla W$ of all suitable $W$, under the $\pi$-weighted inner product. Choosing $W=U$ in (4) gives

$$\bm{F}(\bm{x})=-\nabla U(\bm{x})+\bm{l}(\bm{x}),\quad\text{such that }(\nabla U,\bm{l})_{\pi}=0.\tag{6}$$

However, we remark that this orthogonality holds only in the $L^{2}_{\pi}$-inner-product sense, not in the pointwise sense. Furthermore, the two orthogonality relations (4) and (6) can be understood as follows. Using (6), relation (4) is equivalent to $\int_{\Omega}\bm{l}\cdot\nabla W\,\mathrm{d}\pi=0$ for any $W$. Integration by parts gives $\nabla\cdot(\bm{l}\,\mathrm{e}^{-U/D})=0$, which is equivalent to $-\nabla U\cdot\bm{l}+D\nabla\cdot\bm{l}=0$. When $D\rightarrow 0$, we recover the pointwise orthogonality, which is adopted in computing quasi-potentials in [24]. In mathematical language, (6) can be understood as a Hodge-type decomposition in $L^{2}_{\pi}$ instead of $L^{2}$ space.

To utilize (3) in numerical computations, we replace the ensemble average in (3) with respect to the unknown $\pi$ by the empirical average over data sampled from (1):

$$\operatorname{\widehat{L}_{EPR}}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\big|\bm{F}(\bm{x}_{i})+\nabla V(\bm{x}_{i};\theta)\big|^{2}.\tag{7}$$

Here $(\bm{x}_{i})_{1\leq i\leq N}$ could be either the final states (at time $\mathsf{T}$) of $N$ independent trajectories starting from different initializations, or equally spaced time series along a single long trajectory up to time $\mathsf{T}$, where $\mathsf{T}\gg 1$. In both cases, the ergodicity of the SDE (1) guarantees that (7) is a good approximation of (3) as long as $\mathsf{T}$ is large [34]. We adopt the former approach in the numerical experiments in this work, where the gradients of both $V$ (with respect to $\bm{x}$) and the loss itself (with respect to $\theta$) in (7) are calculated by auto-differentiation in PyTorch [35]. Given the involved approximation of $\pi$, we perform a formal stability analysis of (3) in SI Section B to ensure its reliability and robustness. Additionally, in the following subsection, we elucidate the benefits of EPR, particularly its convexity and its physical interpretation in terms of the entropy production rate.
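The empirical loss (7) is straightforward to assemble with automatic differentiation. Below is a minimal PyTorch sketch, our own illustration with a hypothetical helper name, evaluating (7) for a potential network V and a known drift F.

```python
import torch

def epr_loss(V, F, x):
    """Empirical EPR loss (7): mean of |F(x_i) + grad_x V(x_i)|^2 over samples.

    V : torch.nn.Module mapping (N, d) -> (N, 1), the potential network
    F : callable mapping (N, d) -> (N, d), the known driving force
    x : (N, d) tensor of (approximate) steady-state samples
    """
    x = x.detach().requires_grad_(True)
    # Summing over the batch lets one backward pass return per-sample gradients.
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    residual = F(x) + grad_v
    return residual.pow(2).sum(dim=1).mean()
```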

Physical interpretation and convexity

The minimum loss of (3), denoted by $\operatorname{L_{EPR}}(U)$, possesses a well-defined physical interpretation. In the following discussion, we show that this minimum EPR loss aligns precisely with the steady entropy production rate as defined in NESS theory. Following [36, 37], we have the important identity concerning the entropy production for (1):

$$D\frac{\mathrm{d}S(t)}{\mathrm{d}t}=e_{p}(t)-h_{d}(t).\tag{8}$$

Here $S(t):=-\int_{\Omega}p(\bm{x},t)\ln p(\bm{x},t)\,\mathrm{d}\bm{x}$ is the entropy of the probability density function $p(\bm{x},t)$ at time $t$, and $e_{p}$ is the entropy production rate (EPR)

$$e_{p}(t)=\int_{\Omega}\left|\bm{F}-D\nabla\ln p\right|^{2}p(\bm{x},t)\,\mathrm{d}\bm{x},\tag{9}$$

and $h_{d}$ is the heat dissipation rate

$$h_{d}(t)=\int_{\Omega}\bm{F}(\bm{x})\cdot\bm{J}(\bm{x},t)\,\mathrm{d}\bm{x},\tag{10}$$

with the probability flux $\bm{J}(\bm{x},t):=p(\bm{x},t)(\bm{F}(\bm{x})-D\nabla\ln p(\bm{x},t))$ at time $t$. When $D=k_{B}T$, the above formulas have a clear physical meaning in statistical physics.

From the loss in (3) and the steady-state form of (9), we get the identity

$$\operatorname{L_{EPR}}(U)=\int_{\Omega}\frac{|\bm{J}_{\mathrm{ss}}|^{2}}{p_{\mathrm{ss}}}\,\mathrm{d}\bm{x}=\int_{\Omega}\left|\bm{F}-D\nabla\ln p_{\mathrm{ss}}\right|^{2}p_{\mathrm{ss}}\,\mathrm{d}\bm{x}=e^{\mathrm{ss}}_{p},\tag{11}$$

where $\bm{J}_{\mathrm{ss}}(\bm{x})$ is the steady probability flux and $e^{\mathrm{ss}}_{p}$ denotes the steady entropy production rate (EPR) of the NESS system (1) [36, 3, 37]. Therefore, minimizing (3) is equivalent to approximating the steady entropy production rate. This correspondence provides the rationale for naming our approach ‘EPR-Net’.

The EPR loss has an appealing property: it is strictly convex in $V$ (up to a constant), i.e.,

$$\operatorname{L_{EPR}}(V_{\omega})<(1-\omega)\operatorname{L_{EPR}}(V_{0})+\omega\operatorname{L_{EPR}}(V_{1})\tag{12}$$

for any $0<\omega<1$ and $V_{0},V_{1}$ satisfying $\nabla(V_{0}-V_{1})\not\equiv 0$, where $V_{\omega}:=(1-\omega)V_{0}+\omega V_{1}$. Eq. (12) can be verified by a direct calculation, spelled out below. This strict convexity guarantees the uniqueness of the critical point $V$ (up to a constant) and provides a theoretical guarantee of fast convergence of the training procedure under certain assumptions [38]. It may also contribute to the better convergence behavior of EPR during the training process, as mentioned in the numerical comparisons part.
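Indeed, writing $\bm{G}_{i}:=\bm{F}+\nabla V_{i}$ for $i=0,1$, so that $\bm{F}+\nabla V_{\omega}=(1-\omega)\bm{G}_{0}+\omega\bm{G}_{1}$, the bilinearity of $(\cdot,\cdot)_{\pi}$ gives

$$(1-\omega)\operatorname{L_{EPR}}(V_{0})+\omega\operatorname{L_{EPR}}(V_{1})-\operatorname{L_{EPR}}(V_{\omega})=\omega(1-\omega)\|\bm{G}_{0}-\bm{G}_{1}\|_{\pi}^{2}=\omega(1-\omega)\|\nabla(V_{0}-V_{1})\|_{\pi}^{2},$$

which is strictly positive precisely when $\nabla(V_{0}-V_{1})\not\equiv 0$ on the support of $\pi$.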

Enhanced EPR loss

Substituting the relation $p_{\mathrm{ss}}(\bm{x})=\exp(-U(\bm{x})/D)$ into (2), we get the viscous HJB equation

$$\mathcal{N}_{\textrm{HJB}}(U):=-\bm{F}\cdot\nabla U+D\Delta U-|\nabla U|^{2}+D\nabla\cdot\bm{F}=0\tag{13}$$

with the asymptotic BC $U\rightarrow\infty$ as $|\bm{x}|\rightarrow\infty$ in the case $\Omega=\mathbb{R}^{d}$, or the reflecting BC $(\bm{F}+\nabla U)\cdot\bm{n}=0$ on $\partial\Omega$ when $\Omega$ is a $d$-dimensional hyperrectangle, respectively. As in the framework of PINNs [26], (13) motivates the HJB loss

$$\operatorname{L_{HJB}}(V)=\int_{\Omega}\big|\mathcal{N}_{\textrm{HJB}}(V(\bm{x};\theta))\big|^{2}\,\mathrm{d}\mu(\bm{x}),\tag{14}$$

where $\mu$ is any desirable distribution. By choosing $\mu$ properly, this loss allows the use of sample data that better covers the domain $\Omega$ and, when combined with the loss in (3), leads to improved training results when $D$ is small. Specifically, for small $D$, we propose the enhanced loss

$$\operatorname{\widehat{L}_{enh}}(\theta)=\operatorname{\widehat{L}_{EPR}}(\theta)+\lambda\operatorname{\widehat{L}_{HJB}}(\theta),\tag{15}$$

where $\operatorname{\widehat{L}_{EPR}}(\theta)$ is defined in (7), and $\operatorname{\widehat{L}_{HJB}}(\theta)=\frac{1}{N'}\sum_{i=1}^{N'}|\mathcal{N}_{\textrm{HJB}}(V(\bm{x}'_{i};\theta))|^{2}$ is an approximation of (14) using an independent data set $(\bm{x}'_{i})_{1\leq i\leq N'}$ obtained by sampling the trajectories of (1) with a larger $D'>D$. The weight parameter $\lambda>0$ balances the contributions of the two terms in (15). While its value can be tuned based on the system’s properties, in our numerical experiments we observed that the method performs well for $\lambda$ in a relatively broad range. Note that the proposed strategy is both general and easily adaptable. For instance, one can alternatively use data $(\bm{x}'_{i})_{1\leq i\leq N'}$ that contains more samples in the transition region, or employ in (15) a modification of the loss (14) [22]. We further illustrate the motivation of the enhanced EPR in SI Section E.
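As an illustration of how the HJB term can be assembled, the PyTorch sketch below (our own hypothetical helper names, reusing epr_loss from the earlier sketch) computes the residual of (13) with the Laplacian obtained as the trace of the Hessian via nested automatic differentiation, and combines the two terms as in (15); it assumes the divergence of $\bm{F}$ is available analytically, as is the case for the explicit ODE models considered here.

```python
import torch

def hjb_residual(V, F, div_F, x, D):
    """Residual of the viscous HJB operator (13) at the sample points x.

    V     : potential network, (N, d) -> (N, 1)
    F     : drift, (N, d) -> (N, d)
    div_F : divergence of F, (N, d) -> (N,), assumed known analytically
    """
    x = x.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    # Laplacian as the trace of the Hessian, one coordinate at a time.
    lap_v = sum(
        torch.autograd.grad(grad_v[:, i].sum(), x, create_graph=True)[0][:, i]
        for i in range(x.shape[1])
    )
    return (-(F(x) * grad_v).sum(dim=1) + D * lap_v
            - grad_v.pow(2).sum(dim=1) + D * div_F(x))

def enhanced_loss(V, F, div_F, x_ss, x_enh, D, lam=1.0):
    """Enhanced loss (15): EPR term on steady-state samples x_ss plus a
    weighted HJB term on the broader enhanced samples x_enh."""
    return epr_loss(V, F, x_ss) + lam * hjb_residual(V, F, div_F, x_enh, D).pow(2).mean()
```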

RESULTS FOR ENERGY LANDSCAPE CONSTRUCTION

In this section, we demonstrate the superiority of the enhanced EPR over the HJB loss alone and over normalizing flows (NFs), a class of generative models for density estimation that use invertible transformations to map between complex data distributions and simple latent distributions [39], on 2D benchmark examples. We then apply the enhanced EPR to the 3D Lorenz model and a 12D Gaussian mixture model to show its effectiveness for constructing energy landscapes in higher dimensions. We remark that alternative approaches, distinct from our methodology, have been investigated for potential construction in limit cycles [40] and the Lorenz system [41].

Two-dimensional benchmark examples

Within this subsection, we undertake a comparative analysis of 2D benchmark problems. These encompass a toy model, a 2D biological system exhibiting a limit cycle [3], and a 2D multi-stable system [5]; see SI Sections F.1-F.4 for training details, additional results and problem settings.

Figure 2: Two-dimensional benchmark examples solved under the EPR framework. (a-c) Filled contour plots of the learned potential $V(\bm{x};\theta^{*})$ for (a) a toy model learned by the EPR loss (3), (b) a biochemical oscillation network model [3] and (c) a tri-stable cell development model [5], the latter two learned by the enhanced loss (15). The force field $\bm{F}(\bm{x})$ is decomposed into the gradient part $-\nabla V(\bm{x};\theta^{*})$ (white arrows) and the non-gradient part $\bm{F}(\bm{x})+\nabla V(\bm{x};\theta^{*})$ (gray arrows). The length of an arrow corresponds to the magnitude of the vector. The solid dots are samples from the simulated invariant distribution. (d-f) SDE-simulated samples $(\bm{x}_{i})_{1\leq i\leq N}$ (yellow points) and enhanced samples $(\bm{x}'_{i})_{1\leq i\leq N'}$ (green points). (d) $D=0.1$, where enhanced samples are generated with $D'=2D$. (e) $D=0.1$, where enhanced samples are obtained by adding Gaussian perturbations with $\sigma=0.05$ to the SDE-simulated samples. (f) $D=0.01$, where enhanced samples are generated with $D'=10D$.

Constructing landscapes

The potential function $V(\bm{x};\theta)$ is approximated using a feedforward neural network with three hidden layers and the tanh activation function. Each hidden layer comprises 20 hidden units. Subsequently, we write the enhanced loss as

$$\operatorname{L_{enh}}=\lambda_{1}\operatorname{L_{EPR}}+\lambda_{2}\operatorname{L_{HJB}},\tag{16}$$

where $\lambda_{1}$ and $\lambda_{2}$ should be chosen to balance the corresponding loss terms. In SI Section F.1, we conduct a sensitivity analysis of $\lambda_{1}$, demonstrating that our methods are robust and effective across a broad spectrum of scenarios.

As illustrated in Fig. 2a-c, we first showcase well-constructed landscapes under the EPR framework. These include the learned potentials $V(\bm{x};\theta^{*})$, the decomposition of forces, and sample points drawn from the simulated invariant distribution. In the toy model (Fig. 2a), the gradient of the potential (white arrows) points directly towards the corresponding attractor, while the non-gradient part of the force field (gray arrows) shows a counter-clockwise rotation in the model with a limit cycle (Fig. 2b), and a splitting-and-back flow from the middle attractor to the other two attractors in the tri-stable dynamical model (Fig. 2c).

The data enhancement techniques employed to generate improved samples are flexible, as depicted in Fig. 2d-f. We present several choices for obtaining these samples, including generating more diffusive samples with a diffusion coefficient $D'>D$ or adding Gaussian perturbations with standard deviation $\sigma$. We provide a more comprehensive analysis of the data enhancement in SI Section F.1, demonstrating the resilience of the enhanced EPR to variations in the enhanced samples. Specifically, in Fig. 2 we use $D'=2D$ for the toy model, $D'=10D$ for the multi-stable problem and $\sigma=0.05$ for the limit-cycle problem. We use SDE-simulated and enhanced datasets $(\bm{x}_{i})_{1\leq i\leq N}$ and $(\bm{x}'_{i})_{1\leq i\leq N'}$ of equal size, i.e., $N=N'$.
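In code, the two enhancement options amount to a one-line change in the data pipeline; the snippet below is a usage sketch reusing the simulate_sde helper from the earlier sketch, with F, x0, D and the steady-state samples x_ss assumed to be defined as there.

```python
import numpy as np

rng = np.random.default_rng(0)
# Option (i): re-simulate the SDE with a larger diffusion, e.g. D' = 2D.
x_enh = simulate_sde(F, x0, D=2 * D, rng=rng)
# Option (ii): perturb the steady-state samples with Gaussian noise, e.g. sigma = 0.05.
# x_enh = x_ss + 0.05 * rng.standard_normal(x_ss.shape)
```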

Numerical comparisons

We proceed to perform a quantitative and comprehensive comparison of the performance between the enhanced EPR, solving HJB alone, and NF. For the toy model, the true solution is analytically known (see SI Section F.2), while for the other two 2D examples, we compute the reference solution by discretizing the steady FPE with a piecewise bilinear finite element method on a fine rectangular grid and solving the resulting sparse linear system with a least-squares solver (the normalization condition $\int_{\Omega}p_{\mathrm{ss}}(\bm{x})\,\mathrm{d}\bm{x}=1$ is used to fix the additive constant). After training, we shift the minimum of the potentials to the origin and measure their accuracy using the relative root mean square error (rRMSE) and the relative mean absolute error (rMAE):

$$\operatorname{rRMSE}=\sqrt{\frac{\int_{\mathcal{D}}\left|V(\bm{x};\theta^{*})-U_{0}(\bm{x})\right|^{2}\mathrm{d}\bm{x}}{\int_{\mathcal{D}}|U_{0}(\bm{x})|^{2}\,\mathrm{d}\bm{x}}},\tag{17}$$

$$\operatorname{rMAE}=\frac{\int_{\mathcal{D}}\left|V(\bm{x};\theta^{*})-U_{0}(\bm{x})\right|\mathrm{d}\bm{x}}{\int_{\mathcal{D}}|U_{0}(\bm{x})|\,\mathrm{d}\bm{x}},\tag{18}$$

on the domain $\mathcal{D}=\{\bm{x}\in\Omega\,|\,U_{0}(\bm{x})\leq 20D\}$, where $U_{0}$ denotes the reference solution.
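In practice, (17)-(18) can be estimated by Monte Carlo over points covering $\mathcal{D}$; a minimal NumPy sketch (our own helper name, assuming both potentials are evaluated on the same points and shifted so that their minima sit at zero) is:

```python
import numpy as np

def relative_errors(v_vals, u0_vals):
    """Monte Carlo estimates of rRMSE (17) and rMAE (18) from the values of the
    learned and reference potentials on common points covering the domain D."""
    v = v_vals - v_vals.min()   # shift minima to the origin
    u = u0_vals - u0_vals.min()
    rrmse = np.sqrt(np.mean((v - u) ** 2) / np.mean(u ** 2))
    rmae = np.mean(np.abs(v - u)) / np.mean(np.abs(u))
    return rrmse, rmae
```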

Table 1: Comparisons of different methods. We report the mean and the standard deviation over 5 random seeds. The best results are highlighted in bold.

Problem        Method             rRMSE          rMAE
Toy, D=0.1     Enhanced EPR       0.028±0.010    0.026±0.010
               HJB loss alone     0.034±0.016    0.029±0.014
               Normalizing Flow   0.124±0.016    0.082±0.011
Toy, D=0.05    Enhanced EPR       0.054±0.020    0.048±0.019
               HJB loss alone     0.191±0.218    0.160±0.179
               Normalizing Flow   0.239±0.040    0.174±0.034
Multi-stable   Enhanced EPR       0.067±0.047    0.066±0.049
               HJB loss alone     0.249±0.015    0.228±0.011
               Normalizing Flow   0.202±0.029    0.133±0.017
Limit Cycle    Enhanced EPR       0.070±0.016    0.063±0.020
               HJB loss alone     0.231±0.048    0.140±0.021
               Normalizing Flow   0.384±0.021    0.267±0.013
Figure 3: Comparisons between models learned by (a,d) enhanced EPR, (b,e) HJB loss alone and (c,f) normalizing flow. (a-c) Filled contour plots of the potential $V(\bm{x};\theta^{*})$ for the toy model with $D=0.05$. The force field $\bm{F}(\bm{x})$ is decomposed into the gradient part $-\nabla V(\bm{x};\theta^{*})$ (white arrows) and the non-gradient part (gray arrows). The length of an arrow denotes the scale of the vector. The solid dots are samples from the simulated invariant distribution. The results in the high-energy region $\{\bm{x}\,|\,V(\bm{x})\geq 30D\}$ are omitted since they are less relevant to the dynamics. (d-f) The absolute error of the learned potential constructed in different ways.

We present a detailed comparison of the 2D problems in Table 1. Our experiments involve solving HJB alone with various distributions of enhanced samples $\mu(\bm{x})$, and we report the optimal outcome as the representative entry for the table. For a fair comparison between the enhanced EPR and standalone HJB, we maintain consistent network training configurations. Additionally, a detailed analysis of $\lambda_{1}$ and $\mu(\bm{x})$ in SI Section F.1 highlights the enhanced EPR’s superior performance and robustness. Concerning the flow model, we leverage the same SDE-simulated dataset as utilized in the enhanced EPR. The metrics presented in Table 1 yield consistent results, reinforcing the superiority of the enhanced EPR over the other methods across all problems.

We further illustrate the potential landscapes learned by the different methods for the 2D toy model with $D=0.05$ in Fig. 3a-c, and plot the absolute error between the learned potential and the reference solution in Fig. 3d-f. These plots demonstrate that the errors of the enhanced EPR are consistently small. In contrast, the errors of HJB alone and NF can be significantly larger, leading to unfavorable outcomes and limited smoothness. Based on this study and our numerical experience, the advantages of the enhanced EPR over both the approach using the HJB loss alone and NF can be summarized in the following points.

Figure 4: Landscapes constructed by enhanced EPR. (a) Slices of the learned 3D potential $V(\bm{x};\theta^{*})$ in the Lorenz system. The solid dots are samples from the simulated invariant distribution. (b) The accumulated measure of the absolute error between $U_{0}(\bm{x})$ and the learned solution $V(\bm{x};\theta^{*})$ based on the conditional probability $p_{0}(\bm{x}|x_{1},x_{2})$. (c) The learned 12D potential $V(\bm{x};\theta^{*})$ along the line $\bm{x}(t)=t\bm{\mu}_{1}+(1-t)\bm{\mu}_{2}$, where $\bm{\mu}_{1},\bm{\mu}_{2}$ are the means of the two components of the 12D Gaussian mixture.
  • Accuracy. As shown in Table 1, the enhanced EPR achieves the best accuracy over the HJB loss alone and normalizing flow approaches in terms of the rRMSE and rMAE metrics. From Fig. 3, we also find that the enhanced EPR presents an overall landscape more consistent with the simulated samples and the true/reference solution than the other two methods. The decomposition of the force also matches better. The potential learned by the HJB loss alone and by the normalizing flow tends to be rough and non-smooth near the edge of the samples (Fig. 3b and c). Compared with the enhanced EPR and the HJB loss alone, the normalizing flow captures the high-probability domain but does not explicitly take advantage of the information of the dynamics, which makes its performance the worst.

  • Robustness. Without the guidance of the EPR loss, minimizing the HJB loss alone does not lead to a good approximation of the true solution, even when closely matched to a heuristically chosen distribution, as shown in Fig. 3b. The significant variance observed in the results for the toy model with $D=0.05$ when solving HJB alone is attributed to an exceptionally poor outlier. In contrast, the enhanced EPR always gives reliable approximations in the introduced setup. This supports the robustness of the enhanced EPR loss. We refer to SI Section F.1 for a detailed analysis of the chosen parameter $\lambda_{1}$ and the distribution of enhanced samples $\mu(\bm{x})$.

  • Efficiency. Our numerical experiments show that the enhanced EPR converges faster than the approach using the HJB loss alone. For instance, in the toy model with $D=0.05$, the enhanced EPR achieved an rRMSE of $0.088\pm 0.083$ and an rMAE of $0.075\pm 0.070$ within 2000 epochs, while training with the HJB loss alone cannot attain the same accuracy level even after 3000 epochs. As analyzed above, this efficiency might be attributed to the strict convexity of the EPR loss. Related theoretical support for this speculation can be found in [38].

  • Performance of single EPR. We remark that single EPR, which relies solely on the EPR loss by setting $\lambda=0$ in (15), can still give acceptable results with a relatively small $D=0.05$: its rRMSE and rMAE are $0.103\pm 0.075$ and $0.087\pm 0.063$, respectively. In the limit cycle problem, single EPR is not as good as in the toy model and the multi-stable problem, since there are far fewer samples in the domain where the non-gradient force is extremely large, as shown in Fig. 2b. However, the single EPR loss can achieve competitive performance as long as the samples cover the domain effectively.

Three-dimensional Lorenz system

We apply our landscape construction approach to the 3D Lorenz system [42] with isotropic temporal Gaussian white noise, i.e., system (1) with $\bm{F}=(F_{x},F_{y},F_{z})$ and

$$F_{x}(x,y,z)=\beta_{1}(y-x),\tag{19}$$
$$F_{y}(x,y,z)=x\left(\beta_{2}-z\right)-y,\tag{20}$$
$$F_{z}(x,y,z)=xy-\beta_{3}z,\tag{21}$$

where $\beta_{1}=10$, $\beta_{2}=28$, and $\beta_{3}=8/3$. We add noise with strength $D=1$. This model was also considered in [23] with $D=20$. We obtain the enhanced data $(\bm{x}'_{i})_{1\leq i\leq N'}$ by adding Gaussian noise with standard deviation $\sigma=5$ to the SDE-simulated data $(\bm{x}_{i})_{1\leq i\leq N}$, where $N=N'=10000$. We directly train the 3D potential $V(\bm{x};\theta)$ by the enhanced EPR (16) with $\lambda_{1}=10.0$, $\lambda_{2}=1.0$, using Adam with a learning rate of 0.001 and a batch size of 2048. A slice view of the landscape is presented in Fig. 4a; the learned 3D potential agrees well with the simulated samples.
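For reference, the Lorenz drift (19)-(21) used to generate the training samples can be coded as follows (a NumPy sketch with our own function name; it can be passed as F to the simulate_sde helper above):

```python
import numpy as np

def lorenz_force(x, beta1=10.0, beta2=28.0, beta3=8.0 / 3.0):
    """Deterministic Lorenz drift (19)-(21) for a batch of states x of shape (N, 3)."""
    fx = beta1 * (x[:, 1] - x[:, 0])
    fy = x[:, 0] * (beta2 - x[:, 2]) - x[:, 1]
    fz = x[:, 0] * x[:, 1] - beta3 * x[:, 2]
    return np.stack([fx, fy, fz], axis=1)
```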

Twelve-dimensional multi-stable system

To further validate the efficacy of EPR-Net in high-dimensional scenarios, we construct a Gaussian mixture model (GMM) in $\mathbb{R}^{12}$ with two centers, whose true solution is $U_{0}(\bm{x})=-D\log p_{0}(\bm{x})$, where $p_{0}(\bm{x})=w_{1}p_{1}(\bm{x})+w_{2}p_{2}(\bm{x})$ and $p_{i}(\bm{x})=Z_{i}^{-1}\exp(-(\bm{x}-\bm{\mu}_{i})^{\top}\Sigma_{i}^{-1}(\bm{x}-\bm{\mu}_{i})/2)$, $i=1,2$, with normalization factors $Z_{i}$. The parameters are chosen as $w_{1}=0.6$, $w_{2}=0.4$, $\bm{\mu}_{1}=(1.2,2.0,0.6,1.5,0.9,1.5,1.5,0.9,1.2,1.2,0.5,1.8)^{\top}$, $\bm{\mu}_{2}=(1.8,1.4,0.8,0.9,0.9,1.5,2.0,1.0,1.6,1.0,0.7,1.4)^{\top}$, $\Sigma_{1}=0.04I$, and $\Sigma_{2}=0.02I$.

For evaluation, we first sample 10,000 points from the known GMM distribution and calculate the relative error between the learned 12D potential $V(\bm{x};\theta^{*})$ and the true potential $U_{0}(\bm{x})$. The resulting rRMSE and rMAE are 0.054 and 0.051, respectively, indicating the high accuracy of the learned potential. In addition, we denote the first two dimensions of this 12D problem by $(x_{1},x_{2})$ and draw samples from the corresponding conditional distribution $p_{0}(\bm{x}|x_{1},x_{2})$. As depicted in Fig. 4b, we compute the accumulated measure of the absolute error between $U_{0}(\bm{x})$ and the learned solution $V(\bm{x})$ based on the conditional distribution as $\int|U_{0}(\bm{x})-V(\bm{x})|\,p_{0}(\bm{x}|x_{1},x_{2})\,\mathrm{d}\bm{x}$. Impressively, this integral consistently remains below 0.01, underscoring that our approach yields negligible errors in the effective domain. Moreover, comparing the barrier heights (BHs) further validates the precision of our approach, with a relative error of less than 5% for both BHs. Due to the diagonal covariance matrices, the saddle point lies precisely on the line through $\bm{\mu}_{1}$ and $\bm{\mu}_{2}$, allowing us to compare the BHs directly. As shown in Fig. 4c, the true and computed slice lines coincide well. For additional details, please refer to SI Section F.5.
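For this evaluation, the true potential $U_{0}=-D\log p_{0}$ of the isotropic GMM is available in closed form; a NumPy sketch (our own helper name) is:

```python
import numpy as np

def gmm_potential(x, D, mus, variances, weights):
    """True potential U0 = -D log p0 for an isotropic Gaussian mixture.

    x         : (N, d) evaluation points
    mus       : list of (d,) mean vectors
    variances : list of scalars s_i with Sigma_i = s_i * I
    weights   : list of mixture weights summing to one
    """
    d = x.shape[1]
    p = np.zeros(x.shape[0])
    for w, mu, s in zip(weights, mus, variances):
        norm = (2.0 * np.pi * s) ** (d / 2)   # det(2 pi s I)^{1/2}
        sq = np.sum((x - mu) ** 2, axis=1)
        p += w * np.exp(-0.5 * sq / s) / norm
    return -D * np.log(p)
```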

DIMENSIONALITY REDUCTION

When applying the above approach to high-dimensional problems, dimensionality reduction is necessary in order to visualize the results and gain physical insight. For simplicity, we consider the linear case and, with a slight abuse of notation, let $\bm{x}=(\bm{y},\bm{z})^{\top}$, where $\bm{z}=(x_{i},x_{j})\in\mathbb{R}^{2}$ contains the coordinates of two variables of interest, and $\bm{y}\in\mathbb{R}^{d-2}$ corresponds to the remaining $d-2$ variables. The domain $\Omega$ (either $\mathbb{R}^{d}$ or a $d$-dimensional hyperrectangle) has the decomposition $\Omega=\Sigma\times\widetilde{\Omega}$, where $\Sigma\subseteq\mathbb{R}^{d-2}$ and $\widetilde{\Omega}\subseteq\mathbb{R}^{2}$ are the domains of $\bm{y}$ and $\bm{z}$, respectively. Automatic selection of linear reduced variables and extensions to nonlinear reduced variables with general domains are also possible [43, 44]. In the current setting, the reduced potential is

$$\widetilde{U}(\bm{z})=-D\ln\widetilde{p}_{\mathrm{ss}}(\bm{z})=-D\ln\int_{\Sigma}p_{\mathrm{ss}}(\bm{y},\bm{z})\,\mathrm{d}\bm{y},\tag{22}$$

and one can show that $\widetilde{U}$ minimizes the loss function

$$\operatorname{L_{P-EPR}}(\widetilde{V})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})+\nabla_{\bm{z}}\widetilde{V}(\bm{z};\theta)\big|^{2}\,\mathrm{d}\pi(\bm{y},\bm{z}),\tag{23}$$

where $\bm{F}_{\bm{z}}(\bm{y},\bm{z})\in\mathbb{R}^{2}$ is the $\bm{z}$-component of the force field $\bm{F}=(\bm{F}_{\bm{y}},\bm{F}_{\bm{z}})^{\top}$.

Moreover, one can derive an enhanced loss as in (15) that could be used for systems with small $D$. To this end, we note that $\widetilde{U}$ satisfies the projected HJB equation

$$\mathcal{N}_{\textrm{P-HJB}}(\widetilde{U}):=-\widetilde{\bm{F}}\cdot\nabla_{\bm{z}}\widetilde{U}+D\Delta_{\bm{z}}\widetilde{U}-|\nabla_{\bm{z}}\widetilde{U}|^{2}+D\nabla_{\bm{z}}\cdot\widetilde{\bm{F}}=0,\tag{24}$$

with the asymptotic BC $\widetilde{U}\rightarrow\infty$ as $|\bm{z}|\rightarrow\infty$, or the reflecting BC $(\widetilde{\bm{F}}+\nabla_{\bm{z}}\widetilde{U})\cdot\widetilde{\bm{n}}=0$ on $\partial\widetilde{\Omega}$, where $\widetilde{\bm{F}}(\bm{z}):=\int_{\Sigma}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\,\mathrm{d}\pi(\bm{y}|\bm{z})$ is the projected force defined using the conditional distribution $\mathrm{d}\pi(\bm{y}|\bm{z})=p_{\mathrm{ss}}(\bm{y},\bm{z})/\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\mathrm{d}\bm{y}$, and $\widetilde{\bm{n}}$ denotes the unit outer normal on $\partial\widetilde{\Omega}$. Based on (24), we can formulate the projected HJB loss

$$\operatorname{L_{P-HJB}}(\widetilde{V})=\int_{\widetilde{\Omega}}\big|\mathcal{N}_{\textrm{P-HJB}}(\widetilde{V})\big|^{2}\,\mathrm{d}\mu(\bm{z}),\tag{25}$$

where $\mu$ is any suitable distribution over $\widetilde{\Omega}$, and $\widetilde{\bm{F}}$ in (24) is learned beforehand by training a DNN with the loss

$$\operatorname{L_{P-For}}(\widetilde{\bm{G}})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})-\widetilde{\bm{G}}(\bm{z};\theta)\big|^{2}\,\mathrm{d}\pi(\bm{y},\bm{z}).\tag{26}$$

The overall enhanced loss (16) used in numerical computations comprises two terms, which are empirical estimates of (23) and (25) based on two different sets of sample data. We refer the reader to SI Section C for derivation details, SI Section G.1 for experiments on Ferrell’s cell cycle model [45], and SI Sections G.2 and G.3 for details and additional results of the 8D [4] and 52D [27] models.
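The force-projection step (26) is a plain regression; a minimal PyTorch sketch (our own names; it assumes the reduced variables $\bm{z}$ occupy the last two coordinates of $\bm{x}$, which need not hold in general) is:

```python
import torch

def projected_force_loss(G, Fz, x):
    """Regression loss (26): fit G(z) to the z-component of the full force,
    averaged over joint steady-state samples (y, z).

    G  : network on the reduced coordinates, (N, 2) -> (N, 2)
    Fz : callable giving the z-component of the force, (N, d) -> (N, 2)
    x  : (N, d) samples whose last two columns are taken as z
    """
    z = x[:, -2:]
    return (Fz(x) - G(z)).pow(2).sum(dim=1).mean()
```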

Figure 5: Dimensionality reduction of high-dimensional systems. (a-c) Eight-dimensional cell cycle model. (a) Reduced potential landscape $\widetilde{V}$ with projected contour lines. The green star at $(0.13,0.59)$ denotes the saddle point of $\widetilde{V}$. (b) Projected sample points, streamlines of the projected force field $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ and the filled contour plot of $\widetilde{V}(\bm{x};\theta^{*})$. Two red circles and two red dots (close to $(0.22,0.11)$ and $(0.31,0.47)$, respectively) show the stable limit sets of the projected force field. The yellow circle is the projection of the original high-dimensional limit cycle. (c) An enlarged view of the square domain in (b), showing the detailed spiral structure of the streamlines of $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ around the stable point. (d,e) Fifty-two-dimensional multi-stable system. (d) Projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ and potential $\widetilde{V_{1}}(\bm{z};\theta^{*})$ of the 52D double-well model learned by enhanced EPR. (e) The absolute error between the reduced potentials constructed in the two different ways, i.e., $|\widetilde{V_{1}}(\bm{z};\theta^{*})-\widetilde{V_{2}}(\bm{z};\theta^{*})|$.

Eight-dimensional cell cycle: reduced potential

Subsequently, we apply our dimensionality reduction approach to construct the landscape for an 8D cell cycle model [4], which contains both a limit cycle and a stable equilibrium point. In this case, we take CycB (a cyclin B protein) and Cdc20 (an exit protein) as the reduced variables, following [4]. As shown in Fig. 5a and b, both the profile of the reduced potential and the force strength agree well with the density of the projected samples. Moreover, Fig. 5b provides some important insights into the projection of the high-dimensional dynamics to two dimensions. One particular feature is that the limit cycle induced by the projected force $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ (outer red circle) slightly differs from the limit cycle directly projected from high dimensions (yellow circle); the difference is either minor or moderate depending on whether the sample density near the limit cycle is high or low. This is natural in the reduction when $D>0$, since the distribution $\pi(\bm{y}|\bm{z})$ involved in computing $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ is not a Dirac $\delta$-distribution but a diffusive distribution with varying widths along the limit cycle, and the difference disappears as $D\rightarrow 0$. Another feature is that we unexpectedly obtain an additional stable limit cycle (inner red circle) and a stable point (red dot in the center) emerging inside the outer limit cycle. Though virtual in high dimensions and biologically irrelevant, the existence of these two limit sets is reminiscent of the Poincaré-Bendixson theorem in planar dynamics theory [46, Chapter 10.6], which depicts a common phenomenon when dimensionality reduction maps limit cycles to the 2D plane. Note that this theorem does not guarantee the stability of such fixed points; they could be stable, unstable, or even saddle points; see another case in SI Section G.1. The emergence of these two limit sets is specific to this model due to the relatively flat landscape of the potential in the central region. In addition, close to the saddle point of $\widetilde{V}$ (green star), there is a barrier along the limit cycle direction and a local well along the Cdc20 direction, which characterizes the region that biological cycle paths mainly pass through. Last but not least, an enlarged view of the local attractive domain outside the limit cycle shows its intricate spiral structure (Fig. 5c), which has not been revealed in previous work based on mean-field approximation [4].

Fifty-two-dimensional multi-stable system: high-dimensional and reduced potentials

We compare two approaches to construct the reduced potential. One is to learn the reduced force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ by (26) first and then use it to construct the landscape $\widetilde{V}(\bm{z};\theta^{*})$ by enhanced EPR in $\bm{z}$-space. The other is to build a high-dimensional potential $V(\bm{x};\theta^{*})$ by enhanced EPR first and then reduce it (see SI Section C.1). For this comparison, we apply our approach to a gene stem cell regulatory network described by an ODE in 52 dimensions [27] and take GATA6 (a major differentiation marker gene) and NANOG (a major stem cell marker gene) as the reduced variables $\bm{z}$, as suggested in [27]. We define $\mathcal{A}_{i}$ as the set of indices of genes activating $x_{i}$ and $\mathcal{R}_{i}$ as the set of indices of genes repressing $x_{i}$; the corresponding relationships are given by the 52-node network shown in [27]. For $i=1,\ldots,52$,

$$F_{i}(\bm{x})=-kx_{i}+\sum_{j\in\mathcal{A}_{i}}\frac{ax_{j}^{n}}{S^{n}+x_{j}^{n}}+\sum_{j\in\mathcal{R}_{i}}\frac{bS^{n}}{S^{n}+x_{j}^{n}},\tag{27}$$

where $a=0.37$, $b=0.5$, $k=1$, $S=0.5$, and $n=3$. We choose the noise strength $D=0.02$ and focus on the domain $\Omega=[0,3]^{52}$.

As shown in Fig. 5d, the projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ demonstrates the reduced dynamics, and the constructed potential $\widetilde{V_{1}}(\bm{z};\theta^{*})$ agrees well with the SDE-simulated samples. While $\widetilde{V_{1}}(\bm{z};\theta^{*})$ is learned by first obtaining the projected force, $\widetilde{V_{2}}(\bm{z};\theta^{*})$ is reduced from a high-dimensional $V(\bm{x};\theta^{*})$ precomputed by enhanced EPR. The gray plot of the absolute error $|\widetilde{V_{1}}(\bm{z};\theta^{*})-\widetilde{V_{2}}(\bm{z};\theta^{*})|$ is shown in Fig. 5e, which supports the consistency of the two potentials learned by different approaches. Taking $\widetilde{V_{1}}(\bm{z};\theta^{*})$ as the reference solution, we get an rRMSE of 0.113 and an rMAE of 0.097. These minor relative errors show that the two approaches to constructing the reduced potential are quantitatively consistent, although it is difficult to know which is more accurate. This also indirectly supports the reliability of the learned high-dimensional potential $V(\bm{x};\theta^{*})$. The obtained landscape shows a smoother and more delicate profile compared with the mean-field approach [27].

DISCUSSION AND CONCLUSION

The EPR-Net formulation can be extended to the case of state-dependent diffusion coefficients without difficulty. Consider the Itô SDE

$$\frac{\mathrm{d}\bm{x}(t)}{\mathrm{d}t}=\bm{F}(\bm{x})+\sqrt{2D}\,\sigma(\bm{x})\,\dot{\bm{w}},\quad\bm{x}(0)=\bm{x}_{0},\tag{28}$$

with diffusion matrix $\sigma(\bm{x})\in\mathbb{R}^{d\times m}$ and an $m$-dimensional temporal Gaussian white noise $\dot{\bm{w}}$. We assume that $m\geq d$ and that the matrix $a(\bm{x}):=(\sigma\sigma^{\top})(\bm{x})$ satisfies $\bm{u}^{\top}a(\bm{x})\bm{u}\geq c_{0}|\bm{u}|^{2}$ for all $\bm{x},\bm{u}\in\mathbb{R}^{d}$, where $c_{0}>0$ is a positive constant. Using a similar derivation as before, we can again show that the high-dimensional landscape function $U$ of (28) minimizes the EPR loss

$$\operatorname{L_{V-EPR}}(V)=\int_{\Omega}\left|\bm{F}^{v}(\bm{x})+a(\bm{x})\nabla V(\bm{x})\right|_{a^{-1}}^{2}\,\mathrm{d}\pi(\bm{x}),\tag{29}$$

where $\bm{F}^{v}(\bm{x})=\bm{F}(\bm{x})-D\nabla\cdot a(\bm{x})$ and $|\bm{u}|_{a^{-1}}^{2}:=\bm{u}^{\top}a^{-1}(\bm{x})\bm{u}$ for $\bm{u}\in\mathbb{R}^{d}$. We provide derivation details of (29) in SI Section D, and we leave the numerical study of (28)-(29) for future work.
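Although we leave numerical studies for future work, the loss (29) is as easy to implement as (3); a PyTorch sketch (our own names, assuming $\bm{F}^{v}$ and the matrix field $a$ are supplied as callables) is:

```python
import torch

def v_epr_loss(V, Fv, a, x):
    """Variable-diffusion EPR loss (29): the a^{-1}-weighted norm of
    F^v + a grad V, averaged over steady-state samples.

    Fv : modified drift F - D div(a), (N, d) -> (N, d)
    a  : diffusion matrix field, (N, d) -> (N, d, d), uniformly positive definite
    """
    x = x.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    A = a(x)
    r = Fv(x) + torch.einsum('nij,nj->ni', A, grad_v)
    # Solve A s = r instead of forming A^{-1} explicitly.
    s = torch.linalg.solve(A, r.unsqueeze(-1)).squeeze(-1)
    return (r * s).sum(dim=1).mean()
```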

To summarize, we have demonstrated the formulation, applicability and superiority of EPR over some existing approaches on benchmark and real biological examples. Below we make some final remarks. First, concerning the use of the steady-state distribution $\pi(\bm{x})$ in (3) and its approximation by a long time series of the SDE (1) in EPR-Net, we emphasize that it is the sampling approximation of $\pi$ that naturally captures the important parts of the potential function; the landscape beyond the sampled regions is not that essential in practice. Second, as exemplified in Fig. 2 and Table 1, we found that a direct application of density estimation methods (DEMs), e.g., normalizing flows [39], to the sampled time series data does not give a potential landscape of satisfactory accuracy. We speculate that this deficiency of DEMs is due to their over-generality and the fact that they do not explicitly take advantage of the force field information, in contrast to (3). Finally, we remark that in our dimensionality reduction approach, we choose the reduced variables according to available biophysical prior knowledge. Our approach also exhibits the capability to be extended to incorporate a state-dependent diffusion coefficient. Automatic selection of general nonlinear reduced variables can also be studied by incorporating ideas for the identification of collective variables in molecular simulations [47, 48].

Overall, we have presented EPR-Net, a simple yet effective DNN approach for constructing the non-equilibrium potential landscape of NESS systems. This approach is both elegant and robust due to its variational structure and its flexibility to be combined with other types of loss functions. Further extensions of dimensionality reduction to nonlinear reduced variables and numerical investigations of the state-dependent diffusion coefficient case will be explored in future work. Moreover, we are actively considering the application of our method to single-cell RNA sequencing data, which involves systems of very high dimensionality that typically lack analytical ODE formulations. We will explore this point in our forthcoming studies.

METHODS

Detailed methods are given in the supplementary material. The code is accessible at https://github.com/yzhao98/EPR-Net.

ACKNOWLEDGMENTS

We thank Professors Chunhe Li, Xiaoliang Wan and Dr. Yufei Ma for helpful discussions. The computation of this work was conducted on the High-performance Computing Platform of Peking University.

FUNDING

T.L. and Y.Z. acknowledge support from the National Natural Science Foundation of China (11825102 and 12288101) and the Ministry of Science and Technology of China (2021YFA1003301). The work of W.Z. was supported by the Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy, part of the Berlin Mathematics Research Centre MATH+ (EXC-2046/1, 390685689), and the DFG through Grant CRC 1114 ‘Scaling Cascades in Complex Systems’ (235221301).

AUTHOR CONTRIBUTIONS

Y.Z., W.Z. and T.L. designed the research; Y.Z. performed the research; Y.Z., W.Z. and T.L. wrote the paper.

Conflict of interest statement. None declared.

References

  • [1] Waddington C. The strategy of the genes 1957.
  • [2] Ao P. Potential in stochastic differential equations: novel construction. J. Phys. A-Math. Gen. 2004; 37: L25.
  • [3] Wang J, Xu L and Wang E. Potential landscape and flux framework of nonequilibrium networks: Robustness, dissipation, and coherence of biochemical oscillations. Proc. Nat. Acad. Sci. USA 2008; 105: 12271.
  • [4] Wang J, Li C and Wang E. Potential and flux landscapes quantify the stability and robustness of budding yeast cell cycle network. Proc. Nat. Acad. Sci. USA 2010; 107: 8195.
  • [5] Wang J, Zhang K, Xu L et al. Quantifying the Waddington landscape and biological paths for development and differentiation. Proc. Nat. Acad. Sci. USA 2011; 108: 8257.
  • [6] Zhou J, Aliyu M, Aurell E et al. Quasi-potential landscape in complex multi-stable systems. J. R. Soc., Interface 2012; 9: 3539.
  • [7] Ge H and Qian H. Landscapes of non-gradient dynamics without detailed balance: Stable limit cycles and multiple attractors. Chaos 2012; 22: 023140.
  • [8] Lv C, Li X, Li F et al. Constructing the energy landscape for genetic switching system driven by intrinsic noise. PLoS ONE 2014; 9: e88167.
  • [9] Shi J, Aihara K, Li T et al. Energy landscape decomposition for cell differentiation with proliferation effect. Nat. Sci. Rev. 2022; 9: nwac116.
  • [10] Maier RS and Stein DL. Escape problem for irreversible systems. Phys. Rev. E 1993; 48: 931.
  • [11] Aurell E and Sneppen K. Epigenetics as a first exit problem. Phys. Rev. Lett. 2002; 88: 048101.
  • [12] Sasai M and Wolynes P. Stochastic gene expression as a many-body problem. Proc. Nat. Acad. Sci. USA 2003; 100: 2374.
  • [13] Roma DM, O’Flanagan RA, Ruckenstein AE et al. Optimal path to epigenetic switching. Phys. Rev. E 2005; 71: 011902.
  • [14] Ferrell JEJ. Bistability, bifurcations, and Waddington’s epigenetic landscape. Curr. Biol. 2012; 22: R458.
  • [15] Wang J. Landscape and flux theory of non-equilibrium dynamical systems with application to biology. Adv. Phys. 2015; 64: 1.
  • [16] Zhou P and Li T. Construction of the landscape for multi-stable systems: Potential landscape, quasi-potential, A-type integral and beyond. J. Chem. Phys. 2016; 144: 094109.
  • [17] Yuan R, Zhu X, Wang G et al. Cancer as robust intrinsic state shaped by evolution: a key issues review. Rep. Prog. Phys. 2017; 80: 042701.
  • [18] Fang X, Kruse K, Lu T et al. Nonequilibrium physics in biology. Rev. Mod. Phys. 2019; 91: 045004.
  • [19] Li C, Hong T and Nie Q. Quantifying the landscape and kinetic paths for epithelial–mesenchymal transition from a core circuit. Phys. Chem. Chem. Phys. 2016; 18: 17949–56.
  • [20] Cameron M. Finding the quasipotential for nongradient SDEs. Phys. D 2012; 241: 1532.
  • [21] Li C and Wang J. Landscape and flux reveal a new global view and physical quantification of mammalian cell cycle. Proc. Nat. Acad. Sci. USA 2014; 111: 14130.
  • [22] Lin B, Li Q and Ren W. Computing the invariant distribution of randomly perturbed dynamical systems using deep learning. J. Sci. Comp. 2022; 91: 77.
  • [23] Lin B, Li Q and Ren W. Computing high-dimensional invariant distributions from noisy data. J. Comp. Phys. 2023; 474: 111783.
  • [24] Lin B, Li Q and Ren W. A data driven method for computing quasipotentials. Proc. Mach. Learn. Res., volume 145 of 2nd Annual Conference on Mathematical and Scientific Machine Learning (2021) 652.
  • [25] Crandall MG and Lions PL. Viscosity solution of Hamilton-Jacobi equations. Trans. Amer. Math. Soc. 1983; 277: 1.
  • [26] Raissi M, Perdikaris P and Karniadakis GE. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comp. Phys. 2019; 378: 686.
  • [27] Li C and Wang J. Quantifying cell fate decisions for differentiation and reprogramming of a human stem cell network: Landscape and biological paths. PLoS Comput. Biol. 2013; 9: e1003165.
  • [28] Goodfellow I, Bengio Y and Courville A. Deep learning 2016.
  • [29] Zhang L, Han J, Wang H et al. DeePCG: Constructing coarse-grained models via deep neural networks. J. Chem. Phys. 2018; 149: 034101.
  • [30] Husic BE, Charron NE, Lemm D et al. Coarse graining molecular dynamics with graph neural networks. J. Chem. Phys. 2020; 153: 194101.
  • [31] Hyvärinen A. Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 2005; 6: 695–709.
  • [32] Song Y and Ermon S. Generative modeling by estimating gradients of the data distribution. Wallach H, Larochelle H, Beygelzimer A et al., editors, Advances in Neural Information Processing Systems, volume 32 of NeurIPS (2019) 11918–30.
  • [33] Courant R and Hilbert D. Methods of mathematical physics 1953.
  • [34] Khasminskii R. Stochastic stability of differential equations 2012.
  • [35] Paszke A, Gross S, Massa F et al. Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, volume 32 of NeurIPS (2019) 8024.
  • [36] Qian H. Mesoscopic nonequilibrium thermodynamics of single macromolecules and dynamic entropy-energy compensation. Phys. Rev. E 2001; 65: 016102.
  • [37] Zhang X, Qian H and Qian M. Stochastic theory of nonequilibrium steady states and its applications. part i. Phys. Rep. 2012; 510: 1–86.
  • [38] Mei S, Montanari A and Nguyen PM. A mean field view of the landscape of two-layer neural networks. Proc. Nat. Acad. Sci. USA 2018; 115: E7665–E71.
  • [39] Kobyzev I, Prince SJD and Brubaker M. Normalizing flows: An introduction and review of current methods. IEEE Trans. Patt. Anal. Mach. Intel. 2021; 43: 3964–79.
  • [40] Zhu XM, Yin L and Ao P. Limit cycle and conserved dynamics. International Journal of Modern Physics B 2006; 20: 817–27.
  • [41] Ma Y, Tan Q, Yuan R et al. Potential function in a continuous dissipative chaotic system: Decomposition scheme and role of strange attractor. International Journal of Bifurcation and Chaos 2014; 24: 1450015.
  • [42] Lorenz EN. Deterministic nonperiodic flow. J. Atmos. Sci. 1963; 20: 130–41.
  • [43] Zhang W, Hartmann C and Schütte C. Effective dynamics along given reaction coordinates, and reaction rate theory. Faraday Discuss. 2016; 195: 365–94.
  • [44] Kang X and Li C. A dimension reduction approach for energy landscape: Identifying intermediate states in metabolism-emt network. Adv. Sci. 2021; 8: 2003133.
  • [45] Ferrell JE, Tsai TYC and Yang Q. Modeling the cell cycle: Why do certain circuits oscillate? Cell 2011; 144: 874–85.
  • [46] Hirsch MW, Smale S and Devaney RL. Differential equations, dynamical systems, and an introduction to chaos 2004.
  • [47] Bonati L, Zhang YY and Parrinello M. Neural networks-based variationally enhanced sampling. Proc. Nat. Acad. Sci. USA 2019; 116: 17641–7.
  • [48] Bonati L, Rizzi V and Parrinello M. Data-driven collective variables for enhanced sampling. J. Phys. Chem. Lett. 2020; 11: 2998–3004.
  • [49] Evans LC. Partial differential equations 2010.
  • [50] Holmes MH. Introduction to perturbation methods 2013.
  • [51] Frenkel D and Smit B. Understanding molecular simulation: From algorithms to applications 2002.
  • [52] Kingma DP and Ba J. Adam: a method for stochastic optimization. Proceedings of the International Conference on Learning Representations (2015) .
  • [53] Durkan C, Bekasov A, Murray I et al. Neural spline flows. Advances in Neural Information Processing Systems, volume 32 of NeurIPS (2019) .

Supplementary Information for EPR-Net:
Constructing non-equilibrium potential landscape via a variational force projection formulation

In this supplementary information (SI), we present further theoretical derivations and computational details supplementing the main text (MT). This SI consists of two parts: theory and computation.

PART 1: THEORY

We will first provide details of theoretical derivations omitted in the MT.

Appendix A Derivation of the EPR loss

In this section, we show that, up to an additive constant, the potential function $U(\bm{x}):=-D\ln p_{\mathrm{ss}}(\bm{x})$ is the unique minimizer of the EPR loss in equation (1) in MT.

First, we show that the orthogonality relation

\int_{\Omega}(\bm{F}+\nabla U)\cdot\nabla W\,\differential\pi=0 (30)

holds for any suitable function $W(\bm{x}):\mathbb{R}^{d}\rightarrow\mathbb{R}$ under both choices of the boundary conditions considered in this work, where $\differential\pi(\bm{x}):=p_{\mathrm{ss}}(\bm{x})\,\differential\bm{x}$. To see this, we note that

\begin{align*}
\int_{\Omega}(\bm{F}+\nabla U)\cdot\nabla W\,\differential\pi
&=\int_{\Omega}(p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}})\cdot\nabla W\,\differential\bm{x}\\
&=\int_{\partial\Omega}W\,(p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}})\cdot\bm{n}\,\differential\bm{x}-\int_{\Omega}W\,\nabla\cdot(p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}})\,\differential\bm{x}\\
&=0,
\end{align*}

where we have used integration by parts, the relation $p_{\mathrm{ss}}(\bm{x})=\exp(-U(\bm{x})/D)$, the corresponding BC, and the steady state Fokker-Planck equation (FPE) satisfied by $p_{\mathrm{ss}}$.

Now consider the EPR loss. We have

\begin{align*}
\operatorname{L_{EPR}}(V)&=\int_{\Omega}|\bm{F}+\nabla V|^{2}\,\differential\pi\\
&=\int_{\Omega}\Big(|\bm{F}+\nabla U|^{2}+|\nabla V-\nabla U|^{2}\Big)\,\differential\pi,
\end{align*}

where we have used the orthogonality relation (30) to arrive at the last equality, from which we conclude that $U(\bm{x})$ is the unique minimizer of the EPR loss up to an additive constant.
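For concreteness, a minimal PyTorch sketch of the empirical EPR loss is given below. The network `V`, the drift function `F`, and the sample tensor `x` are placeholder names of our own choosing for the corresponding objects in equation (3) in MT; this is an illustrative sketch, not the exact implementation used in the experiments.

```python
import torch

def epr_loss(V, F, x):
    """Monte Carlo estimate of L_EPR(V) = E_pi |F(x) + grad V(x)|^2.

    V: scalar potential network, maps (N, d) -> (N, 1);
    F: drift function, maps (N, d) -> (N, d);
    x: (N, d) samples approximately distributed as the invariant measure pi.
    """
    x = x.detach().requires_grad_(True)
    # gradient of the scalar output with respect to the inputs via autograd
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    return ((F(x) + grad_v) ** 2).sum(dim=1).mean()
```

Minimizing this loss over the network parameters yields $V\approx U$ up to an additive constant, and the minimal loss value itself is, by construction, a Monte Carlo estimate of $\int_{\Omega}|\bm{F}+\nabla U|^{2}\,\differential\pi$, the quantity entering the steady EPR.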

Appendix B Stability of the EPR minimizer

In this section, we formally show that small perturbations of the invariant distribution $\pi$ induce only correspondingly small changes in the minimizer of the corresponding EPR loss. We only consider the case where $\Omega$ is a $d$-dimensional hyperrectangle; the argument for $\Omega=\mathbb{R}^{d}$ is similar.

Suppose $\differential\pi(\bm{x})=p(\bm{x})\,\differential\bm{x}$, $\differential\mu(\bm{x})=q(\bm{x})\,\differential\bm{x}$, and the functions $U(\bm{x})$ and $\bar{U}(\bm{x})$ are the unique minimizers (up to a constant) of the following two EPR losses

\begin{align*}
U&=\mathop{\arg\min}_{V}\int_{\Omega}\big|\bm{F}+\nabla V\big|^{2}\,\differential\pi,\\
\bar{U}&=\mathop{\arg\min}_{V}\int_{\Omega}\big|\bm{F}+\nabla V\big|^{2}\,\differential\mu,
\end{align*}

respectively. It is not difficult to find that the Euler-Lagrange equations of $U$ and $\bar{U}$ are given by the following partial differential equations (PDEs) with suitable BCs:

\begin{align*}
&\nabla\cdot\big((\bm{F}+\nabla U)\,p\big)=0\ \text{in }\Omega,\qquad(\bm{F}+\nabla U)\cdot\bm{n}=0\ \text{on }\partial\Omega,\\
&\nabla\cdot\big((\bm{F}+\nabla\bar{U})\,q\big)=0\ \text{in }\Omega,\qquad(\bm{F}+\nabla\bar{U})\cdot\bm{n}=0\ \text{on }\partial\Omega.
\end{align*}

The PDEs above, defined inside the domain $\Omega$, can be converted to

\begin{align*}
p\,\Delta U+\nabla U\cdot\nabla p&=-\nabla\cdot(p\bm{F}),\\
q\,\Delta\bar{U}+\nabla\bar{U}\cdot\nabla q&=-\nabla\cdot(q\bm{F}).
\end{align*}

Define $U_{0}(\bm{x})=-D\ln p(\bm{x})$ and $\bar{U}_{0}(\bm{x})=-D\ln q(\bm{x})$. We then obtain

-\nabla U\cdot\nabla U_{0}+D\Delta U=\bm{F}\cdot\nabla U_{0}-D\nabla\cdot\bm{F}, (31)
-\nabla\bar{U}\cdot\nabla\bar{U}_{0}+D\Delta\bar{U}=\bm{F}\cdot\nabla\bar{U}_{0}-D\nabla\cdot\bm{F}. (32)

Assuming that $\delta U_{0}:=U_{0}-\bar{U}_{0}=O(\varepsilon)$, where $0<\varepsilon\ll 1$ denotes a small constant, we obtain the PDE for $U-\bar{U}$ by subtracting (32) from (31):

-\nabla(U-\bar{U})\cdot\nabla U_{0}+D\Delta(U-\bar{U})=\bm{F}\cdot\nabla(\delta U_{0})+\nabla\bar{U}\cdot\nabla(\delta U_{0})

with BC $\nabla(U-\bar{U})\cdot\bm{n}=0$. Since $U_{0},\bar{U},\bm{F}\sim O(1)$, we obtain

U(\bm{x})-\bar{U}(\bm{x})=O(\varepsilon)

by the regularity theory of elliptic PDEs [49, Section 6.3] when $D\sim O(1)$, or by matched asymptotic expansion when $D\ll 1$ [50, Chapter 2]. In fact, the closeness between $U(\bm{x})$ and $\bar{U}(\bm{x})$ can be ensured as long as $U_{0}$ and $\bar{U}_{0}$ are close enough in the region where $p(\bm{x})$ and $q(\bm{x})$ are bounded away from zero, by the method of characteristics [49, Section 2.1] and matched asymptotics.

The above derivations assume that the PDFs $p$ and $q$ are smooth enough. A stability analysis for general distributions $\pi$ and $\mu$ is also possible using tools from functional analysis. However, the derivation would be abstract and quite involved, and we will report it elsewhere.

Appendix C Dimensionality reduction

In this section, we study dimensionality reduction for high-dimensional problems in order to learn the projected potential. A straightforward approach is to first learn the high-dimensional potential $U$ and then find its low-dimensional representation, i.e., the reduced potential or free energy function, using dimensionality reduction techniques. An alternative approach is to directly learn the low-dimensional reduced potential.

Denote by $\bm{x}=(\bm{y},\bm{z})^{\top}\in\Omega$. As in MT Section 1.4, we assume the domain

\Omega=\Sigma\times\widetilde{\Omega},

where $\Sigma\subseteq\mathbb{R}^{d-2}$ and $\widetilde{\Omega}\subseteq\mathbb{R}^{2}$ are the domains of $\bm{y}$ and $\bm{z}$, respectively. The reduced potential $\widetilde{U}(\bm{z})$ is defined as

\widetilde{U}(\bm{z})=-D\ln\widetilde{p}_{\mathrm{ss}}(\bm{z})=-D\ln\int_{\Sigma}p_{\mathrm{ss}}(\bm{y},\bm{z})\,\differential\bm{y}. (33)

One natural approach for constructing $\widetilde{U}(\bm{z})$ is to directly integrate $p_{\mathrm{ss}}(\bm{y},\bm{z})$ based on the potential $U(\bm{y},\bm{z})$ learned with the EPR loss, i.e.,

\widetilde{U}(\bm{z})=-D\ln\int_{\Sigma}\exp(-U(\bm{y},\bm{z})/D)\,\differential\bm{y}. (34)

However, performing this integration is not a straightforward numerical task (see, e.g., [51, Chapter 7]).
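To illustrate the difficulty, here is a naive Monte Carlo sketch of (34) at a single reduced point $\bm{z}$. The helper name `reduced_potential_mc` and the uniform proposal on $\Sigma$ are our own illustrative choices; the estimator degrades rapidly as the dimension of $\bm{y}$ grows or $D$ shrinks, which is why the sample-based losses studied below are preferred.

```python
import torch

def reduced_potential_mc(U, z, y_samples, D, vol_sigma=1.0):
    """Naive Monte Carlo estimate of eq. (34) at one point z.

    U: learned potential, maps (M, d) -> (M, 1);
    z: (1, 2) reduced point; y_samples: (M, d-2) points drawn uniformly on Sigma;
    vol_sigma: volume of Sigma (contributes only an additive constant).
    """
    m = y_samples.shape[0]
    x = torch.cat([y_samples, z.expand(m, -1)], dim=1)
    log_w = -U(x).squeeze(-1) / D                    # log integrand, shape (M,)
    # logsumexp for numerical stability of the exponential average
    log_mean = torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(m)))
    return -D * (log_mean + torch.log(torch.tensor(vol_sigma)))
```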

C.1 Gradient projection loss

In this subsection, we study a simple approach to approximate $\widetilde{U}(\bm{z})$ based on sample points that approximately obey the invariant distribution $\pi(\bm{x})$ and on the high-dimensional potential function $U(\bm{x})$ learned with the EPR loss. This approach is used in the consistency check of the reduced potentials obtained by different methods in SI Section G.3. The idea is to utilize a gradient projection (GP) loss on the $\bm{z}$-components of $\nabla U$:

\operatorname{L_{GP}}(\widetilde{V})=\int_{\Omega}\big|\nabla_{\bm{z}}U(\bm{y},\bm{z})-\nabla_{\bm{z}}\widetilde{V}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}). (35)

To justify (35), we note that

\begin{align*}
\operatorname{L_{GP}}(\widetilde{V})&=\int_{\Omega}\big|\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\big|\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{U}+\nabla_{\bm{z}}\widetilde{U}-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\Big(\big|\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{U}\big|^{2}+\big|\nabla_{\bm{z}}\widetilde{U}-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\Big)\,\differential\pi(\bm{x})\\
&\quad+2\int_{\Omega}(\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{U})\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\pi(\bm{x})\\
&=:P_{1}+P_{2},
\end{align*}

where $P_{1}$ and $P_{2}$ denote the terms in the third and the fourth lines above, respectively. The term $P_{2}=0$ since

\begin{align*}
\int_{\Omega}\nabla_{\bm{z}}U\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\pi(\bm{x})
&=\int_{\widetilde{\Omega}}\Big(\int_{\Sigma}\nabla_{\bm{z}}U\,\mathrm{e}^{-\frac{U}{D}}\,\differential\bm{y}\Big)\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\bm{z}\\
&=-D\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\Big(\int_{\Sigma}\mathrm{e}^{-\frac{U}{D}}\,\differential\bm{y}\Big)\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\bm{z}\\
&=-D\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\bm{z}\\
&=\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\widetilde{U}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\widetilde{p}_{\mathrm{ss}}\,\differential\bm{z}
\end{align*}

and

\int_{\Omega}\nabla_{\bm{z}}\widetilde{U}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\pi(\bm{x})=\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\widetilde{U}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\widetilde{p}_{\mathrm{ss}}\,\differential\bm{z},

which cancel each other in $P_{2}$.

Therefore, the minimization of the GP loss is equivalent to minimizing

\int_{\widetilde{\Omega}}\big|\nabla_{\bm{z}}\widetilde{U}-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\,\widetilde{p}_{\mathrm{ss}}\,\differential\bm{z},

which clearly implies that $\widetilde{U}(\bm{z})$ is the unique minimizer (up to a constant) of the proposed GP loss.
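A hedged PyTorch sketch of the empirical GP loss (35) follows. Here `grad_z_U` is assumed to be precomputed by autograd from the previously learned full potential at samples $(\bm{y}_i,\bm{z}_i)\sim\pi$, and `z` holds the corresponding $\bm{z}_i$; the names are illustrative placeholders.

```python
import torch

def gp_loss(V_tilde, grad_z_U, z):
    """Empirical GP loss (35).

    V_tilde: reduced potential network, maps (N, 2) -> (N, 1);
    grad_z_U: (N, 2) z-components of grad U at full-space samples from pi;
    z: (N, 2) z-components of the same samples.
    """
    z = z.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V_tilde(z).sum(), z, create_graph=True)
    return ((grad_z_U - grad_v) ** 2).sum(dim=1).mean()
```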

C.2 Projected EPR loss

In this subsection, we study the projected EPR (P-EPR) loss, which has the form

\operatorname{L_{P-EPR}}(\widetilde{V})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})+\nabla_{\bm{z}}\widetilde{V}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}), (36)

where $\bm{F}_{\bm{z}}(\bm{y},\bm{z})\in\mathbb{R}^{2}$ is the $\bm{z}$-component of the force field $\bm{F}=(\bm{F}_{\bm{y}},\bm{F}_{\bm{z}})^{\top}$.

Define

\operatorname{\widetilde{L}_{P-EPR}}(\widetilde{V})=\int_{\Omega}\big|\bm{F}(\bm{y},\bm{z})+\nabla\widetilde{V}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}), (37)

where $\nabla$ is the full gradient with respect to $\bm{x}$. To justify (36), we first note the following equivalence

\min\operatorname{L_{P-EPR}}(\widetilde{V})\quad\Longleftrightarrow\quad\min\operatorname{\widetilde{L}_{P-EPR}}(\widetilde{V}), (38)

since $\nabla_{\bm{y}}\widetilde{V}(\bm{z})=0$ and the $\bm{y}$-components of $\bm{F}+\nabla\widetilde{V}$ only introduce an irrelevant constant in (37). Furthermore, we have

\begin{align*}
\operatorname{\widetilde{L}_{P-EPR}}(\widetilde{V})&=\int_{\Omega}\big|\bm{F}+\nabla\widetilde{V}\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\big|\bm{F}+\nabla U+\nabla\widetilde{V}-\nabla U\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\Big(\big|\bm{F}+\nabla U\big|^{2}+\big|\nabla\widetilde{V}-\nabla U\big|^{2}\Big)\,\differential\pi(\bm{x}),
\end{align*}

where the last equality is due to the orthogonality relation (30).

Using an argument similar to the one for deriving (38), the equivalence (38) itself, and the GP loss in (35), we get

\min\operatorname{L_{P-EPR}}(\widetilde{V})\quad\Longleftrightarrow\quad\min\operatorname{L_{GP}}(\widetilde{V}). (39)

Since $\widetilde{U}$ minimizes the GP loss, as shown in the previous subsection, we conclude that $\widetilde{U}$ also minimizes the loss in (36).

C.3 Force projection loss

In this subsection, we study the force projection (P-For) loss for approximating the projection of $\bm{F}_{\bm{z}}$ onto the $\bm{z}$-space.

Denote by

\widetilde{\bm{F}}(\bm{z}):=\int_{\Sigma}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\,\differential\pi(\bm{y}|\bm{z}) (40)

the projected force defined using the conditional distribution

\differential\pi(\bm{y}|\bm{z})=p_{\mathrm{ss}}(\bm{y},\bm{z})/\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{y}. (41)

We can learn $\widetilde{\bm{F}}(\bm{z})$ via the following force projection loss

\operatorname{L_{P-For}}(\widetilde{\bm{G}})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})-\widetilde{\bm{G}}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}). (42)

To justify (42), we note that

\begin{align*}
\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})-\widetilde{\bm{G}}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z})
&=\int_{\Omega}\Big(|\bm{F}_{\bm{z}}(\bm{y},\bm{z})|^{2}+|\widetilde{\bm{G}}(\bm{z})|^{2}\Big)\,\differential\pi(\bm{y},\bm{z})-2\int_{\Omega}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\cdot\widetilde{\bm{G}}(\bm{z})\,\differential\pi(\bm{y},\bm{z})\\
&=:P_{1}-2P_{2}.
\end{align*}

The term $P_{2}$ can be simplified as

P_{2}=\int_{\widetilde{\Omega}}\Big[\int_{\Sigma}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\,\differential\pi(\bm{y}|\bm{z})\Big]\cdot\widetilde{\bm{G}}(\bm{z})\,\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{z}=\int_{\widetilde{\Omega}}\widetilde{\bm{F}}(\bm{z})\cdot\widetilde{\bm{G}}(\bm{z})\,\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{z}.

Therefore, we have the equivalence

\min\operatorname{L_{P-For}}(\widetilde{\bm{G}})\quad\Longleftrightarrow\quad\min\operatorname{\widetilde{L}_{P-For}}(\widetilde{\bm{G}}), (43)

where

\operatorname{\widetilde{L}_{P-For}}(\widetilde{\bm{G}}):=\int_{\widetilde{\Omega}}\big|\widetilde{\bm{F}}(\bm{z})-\widetilde{\bm{G}}(\bm{z})\big|^{2}\,\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{z}.

From the analysis above, we conclude that $\widetilde{\bm{F}}(\bm{z})$ minimizes the loss in (42).
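In practice, learning $\widetilde{\bm{F}}$ thus amounts to a least-squares regression of the $\bm{z}$-components of the force onto a function of $\bm{z}$ alone. A minimal sketch, with placeholder names of our own choosing, is:

```python
import torch

def p_for_loss(G_tilde, F_z, z):
    """Empirical force projection loss (42).

    G_tilde: reduced force network, maps (N, 2) -> (N, 2);
    F_z: (N, 2) z-components of the force at full-space samples from pi;
    z: (N, 2) z-components of the same samples. The minimizer over all
    functions of z is the conditional average F~(z) in (40).
    """
    return ((F_z - G_tilde(z)) ** 2).sum(dim=1).mean()
```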

C.4 HJB equation for the reduced potential

In this subsection, we show that the reduced potential $\widetilde{U}$ satisfies the projected HJB equation

\widetilde{\bm{F}}\cdot\nabla_{\bm{z}}\widetilde{U}+|\nabla_{\bm{z}}\widetilde{U}|^{2}-D\Delta_{\bm{z}}\widetilde{U}-D\nabla_{\bm{z}}\cdot\widetilde{\bm{F}}=0, (44)

with asymptotic BC $\widetilde{U}\rightarrow\infty$ as $|\bm{z}|\rightarrow\infty$, or the reflecting BC $(\widetilde{\bm{F}}+\nabla_{\bm{z}}\widetilde{U})\cdot\widetilde{\bm{n}}=0$ on $\partial\widetilde{\Omega}$, where $\widetilde{\bm{n}}$ denotes the unit outer normal on $\partial\widetilde{\Omega}$. We only consider the rectangular domain case here; the argument for the unbounded case is similar.

Recall that $p_{\mathrm{ss}}(\bm{x})$ satisfies the FPE

\nabla\cdot(p_{\mathrm{ss}}\bm{F})-D\Delta p_{\mathrm{ss}}=0. (45)

Integrating both sides of (45) over $\Sigma$ with respect to $\bm{y}$ and utilizing the boundary condition $\bm{J}_{\mathrm{ss}}\cdot\bm{n}=0$, where $\bm{J}_{\mathrm{ss}}=p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}}$, we get

\nabla_{\bm{z}}\cdot\Big(\int_{\Sigma}\bm{F}_{\bm{z}}\,p_{\mathrm{ss}}\,\differential\bm{y}\Big)-D\Delta_{\bm{z}}\widetilde{p}_{\mathrm{ss}}=0. (46)

Taking (40) and (41) into account, we obtain

\nabla_{\bm{z}}\cdot\big(\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}\big)-D\Delta_{\bm{z}}\widetilde{p}_{\mathrm{ss}}=\nabla_{\bm{z}}\cdot\widetilde{\bm{J}}=0, (47)

i.e., an FPE for $\widetilde{p}_{\mathrm{ss}}(\bm{z})$ with the reduced force field $\widetilde{\bm{F}}$, where $\widetilde{\bm{J}}:=\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}-D\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}$. The corresponding boundary condition can also be derived, by integrating the original BC $\bm{J}_{\mathrm{ss}}\cdot\bm{n}=0$ over $\Sigma$ with respect to $\bm{y}$ for $\bm{z}\in\partial\widetilde{\Omega}$, which gives

\widetilde{\bm{J}}\cdot\widetilde{\bm{n}}=\big(\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}-D\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}\big)\cdot\widetilde{\bm{n}}=0. (48)

Substituting the relation $\widetilde{p}_{\mathrm{ss}}(\bm{z})=\exp(-\widetilde{U}(\bm{z})/D)$ into (47) and (48), we get (44) and the corresponding reflecting BC after some algebraic manipulations.
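For the reader's convenience, the algebra is as follows: with $\widetilde{p}_{\mathrm{ss}}=\exp(-\widetilde{U}/D)$ we have $\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}=-\widetilde{p}_{\mathrm{ss}}\nabla_{\bm{z}}\widetilde{U}/D$, so (47) becomes
\begin{align*}
0&=\nabla_{\bm{z}}\cdot\big(\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}\big)-D\Delta_{\bm{z}}\widetilde{p}_{\mathrm{ss}}\\
&=\widetilde{p}_{\mathrm{ss}}\Big(\nabla_{\bm{z}}\cdot\widetilde{\bm{F}}-\frac{1}{D}\widetilde{\bm{F}}\cdot\nabla_{\bm{z}}\widetilde{U}\Big)-D\,\widetilde{p}_{\mathrm{ss}}\Big(\frac{|\nabla_{\bm{z}}\widetilde{U}|^{2}}{D^{2}}-\frac{\Delta_{\bm{z}}\widetilde{U}}{D}\Big),
\end{align*}
and multiplying by $-D/\widetilde{p}_{\mathrm{ss}}$ yields exactly (44); the reflecting BC follows in the same way from (48).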

Appendix D State-dependent diffusion coefficients

In this section, we study the EPR loss for NESS systems with a state-dependent diffusion coefficient.

Consider the Itô SDE

\frac{\differential\bm{x}(t)}{\differential t}=\bm{F}(\bm{x}(t))+\sqrt{2D}\,\sigma(\bm{x}(t))\,\dot{\bm{w}} (49)

with the state-dependent diffusion matrix $\sigma(\bm{x})$. Under the same assumptions as in MT Section 1.1, we have the FPE

\nabla\cdot(p_{\mathrm{ss}}\bm{F})-D\nabla^{2}:\big(p_{\mathrm{ss}}a\big)=0, (50)

where $a:=\sigma\sigma^{\top}$. We show that the high-dimensional landscape function $U$ of (49) minimizes the EPR loss

\operatorname{L_{V-EPR}}(V)=\int_{\Omega}\big|\bm{F}^{v}(\bm{x})+a(\bm{x})\nabla V(\bm{x})\big|_{a^{-1}(\bm{x})}^{2}\,\differential\pi(\bm{x}), (51)

where $\bm{F}^{v}(\bm{x}):=\bm{F}(\bm{x})-D\nabla\cdot a(\bm{x})$ and $|\bm{u}|_{a^{-1}(\bm{x})}^{2}:=\bm{u}^{\top}a^{-1}(\bm{x})\bm{u}$ for $\bm{u}\in\mathbb{R}^{d}$.

To justify (51), we first note that (50) can be rewritten as

\nabla\cdot(p_{\mathrm{ss}}\bm{F}^{v}-Da\nabla p_{\mathrm{ss}})=0, (52)

which, together with the BC, implies the orthogonality relation

\int_{\Omega}\big(\bm{F}^{v}+a\nabla U\big)\cdot\nabla W\,\differential\pi=0 (53)

for a suitable test function $W(\bm{x})$. Following the same reasoning used in establishing (30) and utilizing (53), we have

\begin{align*}
\int_{\Omega}\big|\bm{F}^{v}+a\nabla V\big|_{a^{-1}}^{2}\,\differential\pi
&=\int_{\Omega}\big|\bm{F}^{v}+a\nabla U+a\nabla(V-U)\big|_{a^{-1}}^{2}\,\differential\pi\\
&=\int_{\Omega}\big|\bm{F}^{v}+a\nabla U\big|_{a^{-1}}^{2}\,\differential\pi+\int_{\Omega}\big|a\nabla(V-U)\big|_{a^{-1}}^{2}\,\differential\pi.
\end{align*}

The last expression implies that $U(\bm{x})$ is the unique minimizer of $\operatorname{L_{V-EPR}}(V)$ up to a constant.

The above derivation for the state-dependent diffusion case permits us to construct landscapes for chemical Langevin dynamics, which will be studied in future work.
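A minimal PyTorch sketch of the loss (51) is given below, assuming the diffusion matrices $a(\bm{x}_i)$ and the modified drift $\bm{F}^{v}(\bm{x}_i)$ are precomputed at the samples; the batched inverse via `torch.linalg.inv` and all names are our own illustrative choices.

```python
import torch

def v_epr_loss(V, F_v, a, x):
    """Empirical V-EPR loss (51).

    V: potential network, maps (N, d) -> (N, 1);
    F_v: (N, d) values of F - D * (div a) at samples x ~ pi;
    a: (N, d, d) diffusion tensors sigma sigma^T at the same samples.
    """
    x = x.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    r = F_v + torch.einsum('nij,nj->ni', a, grad_v)   # residual F^v + a grad V
    a_inv = torch.linalg.inv(a)                       # batched matrix inverse
    # weighted norm |r|^2_{a^{-1}} = r^T a^{-1} r, averaged over the batch
    return torch.einsum('ni,nij,nj->n', r, a_inv, r).mean()
```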

PART 2: COMPUTATION

Now we present the computational details omitted in the MT. We first demonstrate the motivation for enhanced EPR, and then provide the training details and problem settings for landscape construction in the original coordinates and with dimensionality reduction. As mentioned in the MT, we refer to the enhanced loss as Eq. (16).

Appendix E Motivation for enhanced EPR

Figure 6: An illustration of the motivation for enhanced EPR. (a) and (b) show comparisons of the learned potentials and the true solution on the line $y=1.5$ in the toy model with $D=0.1$ and $D=0.05$, respectively. (c) shows the filled contour plot of the potential learned by the EPR loss alone. The orange points are samples from the simulated invariant distribution with $D=0.05$, while the green points are enhanced samples simulated from a more diffusive distribution with $D^{\prime}=0.1$, which are used in the enhanced EPR.

We utilize enhanced samples to achieve a better coverage of the transition domain. Note that the EPR loss equation (3) in MT requires training data sampled according to $\pi(\bm{x})$. The accuracy of the learned $V(\bm{x};\theta^{*})$ (more precisely, of its gradient $\nabla V(\bm{x};\theta^{*})$) using equation (3) in MT is guaranteed only in the "visible" domain of $\pi$, i.e., regions that are covered by sample points. However, the enhanced EPR framework (16) combines the EPR loss and the HJB loss, which allows the use of sample data that better covers the domain $\Omega$ (e.g., more samples in the transition regions between meta-stable states and near the boundaries of the visible domain). This approach is particularly compelling when the diffusion coefficient $D$ is relatively small.

The single EPR loss works well for the toy model with a relatively large diffusion coefficient $D=0.1$. A slice plot of the potential at $y=1.5$ in Fig. 6a shows that the solution learned with the EPR loss coincides well with the analytical solution. The relative root mean square error (rRMSE) and the relative mean absolute error (rMAE) have mean and standard deviation of $0.084\pm 0.006$ and $0.066\pm 0.008$ over 3 runs, respectively.

However, when decreasing $D$ to $0.05$, the samples from the simulated invariant distribution mainly stay in the double wells (orange points in Fig. 6c) and are away from the transition region between the two wells. For this reason, as shown in Fig. 6b, the result with a single EPR loss captures the profile of the two wells, but these two parts of the profile are not accurately connected in the transition region, where there are few samples, making the left well a bit higher than the right one. We then generate enhanced samples $(\bm{x}^{\prime}_{i})_{1\leq i\leq N^{\prime}}$ using $D^{\prime}=0.1$, which have better coverage of the transition region (green points in Fig. 6c). Incorporating these enhanced samples, we learn the potential with the enhanced EPR loss, and the result agrees better with the true solution (Fig. 6b).

Though the enhanced EPR is recommended in general situations, we remark that the single EPR loss can achieve competitive performance as long as the samples cover the domain effectively. This numerical observation holds for all of the considered models.
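As a rough sketch of how the two terms can be combined, assume that Eq. (16) in the MT is a weighted sum of the EPR loss on invariant samples and the squared residual of the HJB equation (of the same form as (44), in full coordinates) on enhanced samples; the exact form and weights are those given in the MT, and the function names here are placeholders.

```python
import torch

def epr_loss(V, F, x):
    # EPR term on samples x ~ pi (cf. the sketch in Appendix A)
    x = x.detach().requires_grad_(True)
    g, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    return ((F(x) + g) ** 2).sum(dim=1).mean()

def hjb_loss(V, F, div_F, D, x):
    # squared residual of F . grad V + |grad V|^2 - D lap V - D div F
    # on enhanced samples x ~ mu; the Laplacian is assembled dimension by
    # dimension with a second autograd pass
    x = x.detach().requires_grad_(True)
    g, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    lap = sum(torch.autograd.grad(g[:, i].sum(), x, create_graph=True)[0][:, i]
              for i in range(x.shape[1]))
    res = (F(x) * g).sum(dim=1) + (g ** 2).sum(dim=1) - D * lap - D * div_F(x)
    return (res ** 2).mean()

def enhanced_epr_loss(V, F, div_F, D, x_pi, x_mu, lam1, lam2):
    # hypothetical combination in the spirit of Eq. (16);
    # lam1 = 0 recovers "HJB alone" as in Table 2
    return lam1 * epr_loss(V, F, x_pi) + lam2 * hjb_loss(V, F, div_F, D, x_mu)
```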

Appendix F Landscape with original coordinates

In this section, we describe the additional setup where we utilize enhanced EPR for landscape construction with original coordinates in the MT. This section includes a 2D toy model, a 2D biological system with a limit cycle [3], a 2D multi-stable system [5], and a 12D Gaussian mixture model (GMM).

F.1 Training details and additional results for benchmark problems

We begin by simulating the SDE using the Euler-Maruyama scheme with reflecting boundary conditions until time $\mathsf{T}=1000$, starting from $10000$ different initial states. This simulation yields $N=10000$ final states, which are used as the training dataset $(\bm{x}_{i})_{1\leq i\leq N}$ to approximate the invariant distribution for EPR. For the 2D toy problem and the 2D multi-stable problem, we employ a time-step size of 0.1, whereas for the 2D limit cycle problem we use a smaller time-step size of 0.01. We train the network with a batch size of 1024 and a learning rate of 0.001 using the Adam optimizer [52] for 3000 epochs in the toy models, 5000 epochs in the multi-stable model, and 10000 epochs in the limit cycle model. When focusing on the single EPR loss in the toy model with $D=0.05$, we extend training to 8000 epochs to ensure adequate learning. At each training epoch, we update the dataset by performing one Euler-Maruyama step to bring it closer to the invariant distribution. For the comparison with normalizing flows, we train a neural spline flow [53] (https://github.com/VincentStimper/normalizing-flows). We stack 4 blocks of the rational quadratic spline, each with three layers of 64 hidden units, followed by an LU linear permutation. The flow model is trained by the Adam optimizer with learning rate 0.0001 for 20000 epochs, based on the same SDE-simulated dataset as the enhanced EPR. To ensure the robustness of our results, we perform the experiments across 5 random seeds, which accounts for variability and establishes the reliability of our findings.
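The data generation step admits a compact NumPy sketch; the reflection rule at the box boundary and the helper name are our own illustrative choices (a single reflection per step suffices for the small steps used here).

```python
import numpy as np

def simulate_invariant_samples(F, D, lo, hi, n=10000, dt=0.1, T=1000.0, seed=0):
    """Euler-Maruyama with reflecting BC on the box [lo, hi].

    F: drift, maps (n, d) -> (n, d); lo, hi: (d,) box bounds.
    Returns the n final states as approximate samples of the invariant law.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = rng.uniform(lo, hi, size=(n, lo.size))
    for _ in range(int(T / dt)):
        x = x + F(x) * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal(x.shape)
        x = np.where(x < lo, 2.0 * lo - x, x)   # reflect at the lower faces
        x = np.where(x > hi, 2.0 * hi - x, x)   # reflect at the upper faces
    return x
```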

Table 2: Evaluation on various $\mu(\bm{x})$ and $\hat{\lambda}_{1}$ for HJB alone and enhanced EPR over 5 random seeds. Each cell reports mean ± standard deviation; the four values per metric correspond to $\hat{\lambda}_{1}=\{0.0,\,0.1,\,1.0,\,10.0\}\times\lambda_{1}$, where $\hat{\lambda}_{1}=0.0$ is HJB alone, and "——" marks combinations that are not evaluated.

Model | $\mu(\bm{x})$ | rRMSE ($\times$0.0 / $\times$0.1 / $\times$1.0 / $\times$10.0) | rMAE ($\times$0.0 / $\times$0.1 / $\times$1.0 / $\times$10.0)
Toy, $D=0.1$ | $D^{\prime}=1D$ | 0.034±0.016 / —— / —— / —— | 0.029±0.014 / —— / —— / ——
Toy, $D=0.1$ | $D^{\prime}=2D$ | 0.168±0.010 / 0.062±0.033 / 0.028±0.010 / 0.038±0.015 | 0.068±0.009 / 0.108±0.054 / 0.026±0.010 / 0.032±0.013
Toy, $D=0.1$ | $D^{\prime}=5D$ | 0.225±0.071 / 0.138±0.021 / 0.080±0.011 / 0.046±0.018 | 0.123±0.097 / 0.032±0.025 / 0.030±0.011 / 0.030±0.019
Toy, $D=0.1$ | $D^{\prime}=10D$ | 0.258±0.078 / 0.182±0.033 / 0.095±0.010 / 0.067±0.012 | 0.167±0.110 / 0.062±0.040 / 0.038±0.010 / 0.037±0.009
Toy, $D=0.05$ | $D^{\prime}=1D$ | 0.191±0.218 / —— / —— / —— | 0.160±0.179 / —— / —— / ——
Toy, $D=0.05$ | $D^{\prime}=2D$ | 0.267±0.116 / 0.058±0.045 / 0.054±0.020 / 0.111±0.083 | 0.164±0.120 / 0.049±0.036 / 0.048±0.019 / 0.096±0.069
Toy, $D=0.05$ | $D^{\prime}=5D$ | 0.661±0.126 / 0.417±0.162 / 0.204±0.213 / 0.055±0.028 | 0.555±0.093 / 0.328±0.155 / 0.127±0.184 / 0.029±0.008
Toy, $D=0.05$ | $D^{\prime}=10D$ | 0.544±0.061 / 0.485±0.126 / 0.227±0.132 / 0.191±0.190 | 0.489±0.057 / 0.410±0.096 / 0.157±0.136 / 0.122±0.168
Multi-stable | $D^{\prime}=1D$ | 0.255±0.007 / —— / —— / —— | 0.241±0.004 / —— / —— / ——
Multi-stable | $D^{\prime}=2D$ | 0.249±0.015 / 0.069±0.045 / 0.110±0.028 / 0.146±0.044 | 0.228±0.011 / 0.065±0.046 / 0.106±0.030 / 0.142±0.047
Multi-stable | $D^{\prime}=5D$ | 0.628±0.046 / 0.061±0.031 / 0.090±0.042 / 0.128±0.043 | 0.553±0.063 / 0.059±0.032 / 0.088±0.046 / 0.124±0.044
Multi-stable | $D^{\prime}=10D$ | 0.630±0.051 / 0.307±0.036 / 0.067±0.047 / 0.118±0.032 | 0.550±0.066 / 0.197±0.036 / 0.066±0.049 / 0.118±0.034
Limit Cycle | $\sigma=0.0$ | 0.231±0.048 / —— / —— / —— | 0.140±0.021 / —— / —— / ——
Limit Cycle | $\sigma=0.05$ | 0.287±0.175 / 0.256±0.181 / 0.070±0.016 / 0.136±0.002 | 0.165±0.108 / 0.149±0.107 / 0.063±0.020 / 0.108±0.003
Limit Cycle | $\sigma=0.1$ | 0.503±0.133 / 0.372±0.173 / 0.373±0.181 / 0.122±0.008 | 0.313±0.092 / 0.222±0.119 / 0.200±0.090 / 0.101±0.006
Limit Cycle | $\sigma=0.2$ | 0.578±0.010 / 0.556±0.013 / 0.509±0.025 / 0.298±0.055 | 0.383±0.025 / 0.355±0.030 / 0.303±0.040 / 0.175±0.029

For a fair comparison, we fix $\lambda_{2}$ at 1.0 across all models. The parameters used in the MT are $\lambda_{1}=1.0$ for the toy model and the multi-stable model, and $\lambda_{1}=0.01$ for the limit cycle model. As discussed in the MT, the selection of $\lambda_{1}$ aims to balance the two terms of the loss function. In our experience, systems with higher entropy production rates (typically featuring stronger non-gradient components in their force decomposition) require a smaller $\lambda_{1}$, as in the limit cycle problem. However, the specific choice of $\lambda_{1}$ is relatively robust. As demonstrated in Table 2, we scrutinize the stability of $\lambda_{1}$ by testing its sensitivity within the enhanced EPR method. We adjust $\lambda_{1}$ by multiplying it by a set of factors $\{0.0, 0.1, 1.0, 10.0\}$, denoted $\hat{\lambda}_{1}$, in different experiments; $\hat{\lambda}_{1}=0.0$ then corresponds to using HJB alone. Despite the varying distributions of $\mu(\bm{x})$, our selected parameters typically yield satisfactory outcomes. Even with a relatively small $\hat{\lambda}_{1}$, enhanced EPR outperforms solving by HJB alone. The performance gains are more pronounced when relatively large values of $\hat{\lambda}_{1}$ are used.

We also evaluate different sample distributions $\mu(\bm{x})$ used in the HJB loss. For enhanced EPR, using the invariant distribution in the HJB loss term is ineffective, since the HJB loss is introduced precisely to cover the transition domain; consequently, there is no need to evaluate $\mu(\bm{x})$ with $D^{\prime}=1D$ or $\sigma=0$ for enhanced EPR. In the case of the HJB loss alone, our numerical analysis suggests that overly diffusive samples can lead to significantly worse outcomes, underscoring the importance of capturing the critical domains within the loss. As shown in Table 2, the optimal results for HJB alone are achieved using the invariant distribution or a distribution slightly more extensive than that. However, with the same enhanced samples from $\mu(\bm{x})$, introducing a non-zero $\lambda_{1}$ improves performance compared to HJB alone, particularly when the sample distribution is so diffusive that solving HJB alone performs poorly. This emphasizes the robustness of enhanced EPR to variations in $\mu(\bm{x})$ and its advantage over solving HJB alone. Finally, for each problem, the optimal outcome at $\hat{\lambda}_{1}=1.0\times\lambda_{1}$ across the various $\mu(\bm{x})$ serves as the result for enhanced EPR in Table 1 of the MT. While this may not be the best result among all combinations of $\mu(\bm{x})$ and $\hat{\lambda}_{1}$, it is a robust choice that can be readily achieved without dedicated tuning.

F.2 2D toy model

To verify the applicability and accuracy of our method, we initially apply it to a toy model with the driving force

\bm{F}(\bm{x})=-(I+A)\nabla U_{0}(\bm{x}), (54)

where $A\in\mathbb{R}^{d\times d}$ is a constant skew-symmetric matrix, i.e., $A^{\top}=-A$, and $U_{0}$ is some known function. With this choice of $\bm{F}$, one can check that the true potential landscape is simply $U(\bm{x})=U_{0}(\bm{x})$. In particular, the system becomes reversible when $A=0$. We construct a 2D toy model with the double-well potential

U_{0}(\bm{x})=((x-1.5)^{2}-1.0)^{2}+0.5(y-1.5)^{2}, (55)

where $\bm{x}=(x,y)^{\top}$. We take the skew-symmetric matrix

A=\begin{bmatrix}0&0.5\\ -0.5&0\end{bmatrix}, (56)

which introduces a counter-clockwise rotation on top of the focusing central force field. This sets up a simple non-equilibrium system. In this model, we have the force decomposition

\bm{F}(\bm{x})=-\nabla U_{0}(\bm{x})+\bm{l}(\bm{x}),\quad\bm{l}(\bm{x})=-A\nabla U_{0}(\bm{x}),

and

\bm{l}(\bm{x})\cdot\nabla U_{0}(\bm{x})=0,\quad\nabla\cdot\bm{l}(\bm{x})=0

hold in the pointwise sense. Therefore, the identity $\bm{l}(\bm{x})\cdot\nabla U_{0}(\bm{x})+D\nabla\cdot\bm{l}(\bm{x})=0$ is satisfied for any $D>0$ and, following the discussion in the MT, we have constructed a non-reversible system with an analytically known double-well potential, which can be used to verify the accuracy of the learned potential. We focus on the domain $\Omega=[0,3]\times[0,3]$. To fix the extra shifting degree of freedom of the potential function, we set the minimum of the potential to zero, and we only plot the result on the domain $\{\bm{x}\,|\,V(\bm{x})\leq 30D\}$ in Fig. 3 in the MT since it is the domain of practical interest.
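For reference, the toy force field (54)-(56) and its analytically known landscape can be coded in a few lines; the batch-first NumPy conventions are our own choice.

```python
import numpy as np

A = np.array([[0.0, 0.5], [-0.5, 0.0]])   # skew-symmetric part, eq. (56)

def U0(x):
    # double-well potential, eq. (55); x has shape (N, 2)
    return ((x[:, 0] - 1.5) ** 2 - 1.0) ** 2 + 0.5 * (x[:, 1] - 1.5) ** 2

def grad_U0(x):
    gx = 4.0 * (x[:, 0] - 1.5) * ((x[:, 0] - 1.5) ** 2 - 1.0)
    gy = x[:, 1] - 1.5
    return np.stack([gx, gy], axis=1)

def F(x):
    # driving force F = -(I + A) grad U0, eq. (54); the true landscape is U0
    return -grad_U0(x) @ (np.eye(2) + A).T
```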

F.3 Two-dimensional limit cycle model

We apply our approach to the limit cycle dynamics with a Mexican-hat shape landscape [3].

Before introducing the concrete model, let us make the following observation. For any SDE of the form

\frac{\differential\bm{x}}{\differential t}=\bm{F}(\bm{x})+\sqrt{2D}\,\dot{\bm{w}}, (57)

the corresponding steady FPE is

\nabla\cdot(p_{\mathrm{ss}}\bm{F})-D\Delta p_{\mathrm{ss}}=0.

If we make the transformation

\bm{F}\rightarrow\kappa\bm{F},\quad D\rightarrow\kappa D

in (57), then the steady state PDF

p_{\mathrm{ss}}(\bm{x})=\exp\Big(-\frac{U(\bm{x})}{D}\Big)=\exp\Big(-\frac{\kappa U(\bm{x})}{\kappa D}\Big)

will not change. The transformation only changes the timescale of the dynamics (57) from $t_{0}$ to $\kappa t_{0}$. However, the potential changes from $U$ to $\kappa U$ if we utilize the drift $\kappa\bm{F}(\bm{x})$ and noise strength $\kappa D$ in the system (57). This observation allows us to set the scale of $U$ to be $O(1)$ by adjusting $\kappa$ suitably for a specific problem. An alternative way to accomplish this is to choose $\bm{F}$ to be $\kappa\bm{F}$ in the EPR loss.

We take $D=0.1$ and consider the stochastic dynamics (57) with $\bm{F}=(F_{x},F_{y})^{\top}$ and

F_{x}(x,y)=\kappa\left(\frac{\alpha^{2}+x^{2}}{1+x^{2}}\,\frac{1}{1+y}-ax\right), (58)
F_{y}(x,y)=\frac{\kappa}{\tau_{0}}\left(b-\frac{y}{1+cx^{2}}\right), (59)

where the parameters are $\kappa=100$, $\alpha=a=b=0.1$, $c=100$, and $\tau_{0}=5$. Here the choice of $\kappa=100$ is made such that $U\sim O(1)$, following [22]. We focus on the domain $\Omega=[0,8]\times[0,6]$. As explained in the paragraph above, our setting corresponds to the case $D=0.1/\kappa=0.001$ for the force field considered in [3].

F.4 Two-dimensional multi-stable model

We also study the dynamics (57) for a multi-stable system [5] with $\bm{F}=(F_{x},F_{y})^{\top}$ and

F_{x}(x,y)=\frac{ax^{n}}{S^{n}+x^{n}}+\frac{bS^{n}}{S^{n}+y^{n}}-k_{1}x, (60)
F_{y}(x,y)=\frac{ay^{n}}{S^{n}+y^{n}}+\frac{bS^{n}}{S^{n}+x^{n}}-k_{2}y, (61)

where the parameters are $a=b=k_{1}=k_{2}=1$, $S=0.5$, and $n=4$. We focus on the domain $\Omega=[0,3]\times[0,3]$ and present the results for $D=0.01$ in the MT.

F.5 Twelve-dimensional GMM model: high-dimensional potential

As mentioned in MT Methods 3.4, we can construct models with a known potential using a driving force of the form (54). The true solution is $U_{0}(\bm{x})=-D\log p_{0}(\bm{x})$, where $p_{0}$ is the probability density of the GMM, as defined in MT Section 1.3. We randomly generate a matrix $A_{0}$ with elements uniform in $[0,1]$, and then obtain a skew-symmetric matrix via $A=(A_{0}-A_{0}^{\top})/2$. The non-reversible dynamics is then constructed with the driving force (54). We set the noise strength to $D=0.01$ for this problem and use $D^{\prime}=5D$ for the enhanced samples. We use a three-layer neural network with 80 hidden units per layer. The data size is 10000, and we use Adam with a learning rate of 0.001 and a batch size of 2048. We train enhanced EPR with $\lambda_{1}=0.1$, $\lambda_{2}=1.0$ for 5000 epochs.

Appendix G Landscape with reduced coordinates

In this section, we provide additional details and results related to the dimensionality reduction problem. First, we present additional experiments on the 3D Ferrell cell cycle model [45]. We then demonstrate an 8D limit cycle dynamics [4], in which interesting properties emerge in the projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ and potential $\widetilde{V}(\bm{z};\theta^{*})$. Finally, we present a 52D multi-stable dynamics [27], on which we compare the reduced potentials $\widetilde{V}(\bm{z};\theta^{*})$ obtained by two dimensionality reduction approaches.

G.1 Ferrell’s three-ODE model: reduced potential

Figure 7: Streamlines of the projected force $\widetilde{\bm{G}}(\bm{z})$ and filled contour plot of the reduced potential $\widetilde{V}(\bm{z};\theta^{*})$ for Ferrell's three-ODE model learned by enhanced EPR. The projected force field $\widetilde{\bm{G}}(\bm{z})$ is decomposed into the gradient part $-\nabla\widetilde{V}(\bm{z};\theta^{*})$ (white arrows) and the non-gradient part (gray arrows). The length of an arrow corresponds to the magnitude of the vector. The solid dots are samples from the simulated invariant distribution.

We apply the dimensionality reduction method to Ferrell's three-ODE model for a simplified cell cycle dynamics [45], where the concentrations of cyclin-dependent kinase 1 ($\textrm{CDK}1$), Polo-like kinase 1 ($\textrm{Plk}1$), and the anaphase-promoting complex (APC), denoted by

x=[\textrm{CDK}1],\quad y=[\textrm{Plk}1],\quad z=[\textrm{APC}],

respectively, obey the ODEs in the domain $\Omega=[0,1]^{3}$:

F_{x}(x,y,z)=\alpha_{1}-\beta_{1}x\,\frac{z^{n_{1}}}{K_{1}^{n_{1}}+z^{n_{1}}}, (62)
F_{y}(x,y,z)=\alpha_{2}(1-y)\,\frac{x^{n_{2}}}{K_{2}^{n_{2}}+x^{n_{2}}}-\beta_{2}y, (63)
F_{z}(x,y,z)=\alpha_{3}(1-z)\,\frac{y^{n_{3}}}{K_{3}^{n_{3}}+y^{n_{3}}}-\beta_{3}z, (64)

with $\alpha_{1}=0.1$, $\alpha_{2}=\alpha_{3}=\beta_{1}=3$, $\beta_{2}=\beta_{3}=1$, $K_{1}=K_{2}=K_{3}=0.5$, and $n_{1}=n_{2}=n_{3}=8$. Since our focus in this paper is on the methodology of constructing the potential landscape, we refer interested readers to [45] for the concrete biological meaning of the considered variables. We add isotropic temporal Gaussian white noise with noise scale $D=0.01$. For this particular problem, we employ three-layer neural networks with 80 hidden units in each layer. We generate enhanced samples $(\bm{x}^{\prime}_{i})_{1\leq i\leq 10000}$ by simulating from a more diffusive distribution with $D^{\prime}=5D$. The projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ is trained with the loss (42) for 1000 epochs using the Adam optimizer with a learning rate of 0.001 and a batch size of 2048. The reduced variables $\bm{z}=(x,y)^{\top}$ are utilized during this training phase. Subsequently, we train the projected potential $\widetilde{V}(\bm{z};\theta^{*})$ for 4000 epochs using the enhanced EPR loss defined in (16), with $\lambda_{1}=0.1$ and $\lambda_{2}=1.0$. As shown in Fig. 7, the obtained reduced potential exhibits a plateau in the central region and a local-well tube along the reduced limit cycle.

G.2 Eight-dimensional complex system: reduced potential

Figure 8: Streamlines and limit sets of the projected force field of the 8D cell cycle model in the two reduced variables CycB and Cdc20. The outer red circle is the stable limit cycle of the reduced force field, corresponding to the yellow circle, which is the projection of the original high-dimensional limit cycle. The inner red circle, the red dot, and the two green circles are stable and unstable limit sets of the reduced dynamics, respectively, which are virtual in high dimensions.

We consider an 8D system in which the dynamics and parameters are the same as in the supplementary information of [4]. We take CycB and Cdc20 as the reduction variables $\bm{z}$, and set the mass in this problem to $m=0.8$.

We first point out that the noise strength $D=0.0005$ used in [4] is not suitable here for training neural networks, since it would lead to a potential of order $O(10^{-5})$. Using the idea in SI Section F.3, we amplify the original force field $\bm{F}$ in [4] by $\kappa=1000$ times and take $D=0.01$ for the transformed force field. This amounts to setting $D=10^{-5}$ for the original force field, which is even smaller than the parameter considered in [4]. We simulate the SDE without boundaries with time step $10^{-5}$ until $\mathsf{T}=5$, starting from initial states drawn from a uniform distribution on $[0,1.25]^{8}$. The enhanced samples $(\bm{x}^{\prime}_{i})_{1\leq i\leq N^{\prime}}$ are obtained by adding Gaussian perturbations with standard deviation $\sigma=0.05$ to the SDE-simulated dataset $(\bm{x}_{i})_{1\leq i\leq N}$. We only keep the data within the biologically meaningful domain $[0,1.25]^{8}$, resulting in a dataset of size 25000 for computation.

We use three-layer networks with 80 hidden units in each layer for both the force and the potential. For the projected force, we train for 2000 epochs to obtain $\widetilde{\bm{G}}(\bm{z};\theta^{*})$. We then conduct the enhanced EPR (16) with $\lambda_{1}=0.01$, $\lambda_{2}=1.0$ for 10000 epochs, using Adam with a learning rate of 0.001 and a batch size of 8192.

In Fig. 8, we present a more detailed picture of the reduced dynamics of the 8D model than Fig. 5c in the MT. Specifically, we further show two unstable limit cycles of the projected force field obtained by reversed-time integration (the two green circles in Fig. 8). These two unstable limit cycles play the role of separatrices between the neighboring stable limit sets. This picture arises because the landscape of the considered system is very flat in the central region. These inner limit sets are virtual in high dimensions, but they naturally appear in the reduced dynamics on the plane. Similar features might also occur in other two-dimensional reduced dynamics.
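The unstable limit cycles can be located with a simple reversed-time integration of the learned reduced field; a minimal RK4 sketch, assuming `G_tilde` maps a 2-vector to a 2-vector, is given below. The helper name and step sizes are illustrative only.

```python
import numpy as np

def integrate_reversed(G_tilde, z0, dt=1e-3, n_steps=100000):
    """Integrate dz/dt = -G~(z) with RK4; reversing time turns the unstable
    limit cycles of the reduced field into attractors, so trajectories
    started nearby settle onto them."""
    z = np.asarray(z0, dtype=float)
    traj = np.empty((n_steps, z.size))
    f = lambda u: -np.asarray(G_tilde(u))   # time-reversed vector field
    for k in range(n_steps):
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        z = z + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[k] = z
    return traj
```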

G.3 Fifty-two-dimensional multi-stable system: high-dimensional and reduced potentials

For fairness, we use neural networks with three layers and 20 hidden units in each layer to represent the reduced force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ and the reduced potential $\widetilde{V_{1}}(\bm{z};\theta^{*})$, and 80 hidden units per layer for the 52D potential $V(\bm{x};\theta^{*})$. We use enhanced samples simulated from a more diffusive distribution with $D^{\prime}=2D$. The data size is 20000, and we again use Adam with a learning rate of 0.001 and a batch size of 2048. We use $\lambda_{1}=10.0$, $\lambda_{2}=1.0$ in the enhanced EPR (16). We train the force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ for 1000 epochs and obtain $\widetilde{V_{1}}(\bm{z};\theta^{*})$ by enhanced EPR trained for 5000 epochs. We also learn a 52D potential $V(\bm{x};\theta^{*})$ by enhanced EPR for 5000 epochs and then project it to $\widetilde{V_{2}}(\bm{z};\theta^{*})$ via (35), trained for 1000 epochs.