
RESEARCH ARTICLE

INFORMATION SCIENCE

Corresponding authors. Email: [email protected]; [email protected]

EPR-Net: Constructing non-equilibrium potential landscape via a variational force projection formulation

Yue Zhao1, Wei Zhang2,3,∗, Tiejun Li1,4,5,∗

1Center for Data Science, Peking University, Beijing 100871, China
2Zuse Institute Berlin, Berlin 14195, Germany
3Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
4LMAM and School of Mathematical Sciences, Peking University, Beijing 100871, China
5Center for Machine Learning Research, Peking University, Beijing 100871, China
(XX XX Year; XX XX Year; XX XX Year)
Abstract

We present EPR-Net, a novel and effective deep learning approach that tackles a crucial challenge in biophysics: constructing potential landscapes for high-dimensional non-equilibrium steady-state (NESS) systems. EPR-Net leverages a nice mathematical fact: the desired negative potential gradient is simply the orthogonal projection of the driving force of the underlying dynamics in a weighted inner-product space. Remarkably, our loss function has an intimate connection with the steady entropy production rate (EPR), enabling simultaneous landscape construction and EPR estimation. We introduce an enhanced learning strategy for systems with small noise, and extend our framework to include dimensionality reduction and the state-dependent diffusion coefficient case in a unified fashion. Comparative evaluations on benchmark problems demonstrate the superior accuracy, effectiveness, and robustness of EPR-Net compared with existing methods. We apply our approach to challenging biophysical problems, such as an 8D limit cycle and a 52D multi-stability problem, obtaining accurate solutions and interesting insights into the constructed landscapes. With its versatility and power, EPR-Net offers a promising solution for diverse landscape construction problems in biophysics.

doi:
10.1093/nsr/XXXX
keywords:
high-dimensional potential landscape, non-equilibrium system, entropy production rate, dimensionality reduction, deep learning

INTRODUCTION

Since Waddington’s famous landscape metaphor for the development of cells in the 1950s [1], the construction of potential landscapes for non-equilibrium biochemical reaction systems has been recognized as an important problem in theoretical biology, as it provides insightful pictures for understanding the complex dynamical mechanisms of biological processes. The problem has attracted considerable attention in recent decades in both the biophysics and applied mathematics communities. To date, different approaches have been proposed to realize Waddington’s landscape metaphor in a rational way, and its connection to computing optimal epigenetic switching paths and switching rates in biochemical reaction systems has also been extensively studied; see [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] and the references therein for details and [14, 15, 16, 17, 18] for reviews. Broadly speaking, these proposals can be classified into two types: (T1) the construction of the potential landscape in the finite-noise regime [2, 3, 4, 5] and (T2) the construction of the quasi-potential in the zero-noise limit [7, 6, 8].

For low-dimensional systems (i.e., dimension less than 4), the potential landscape can be numerically computed either by solving a Fokker-Planck equation (FPE) with grid-based methods until the steady solution is approximately reached, as in (T1) type proposals [3, 19], or by solving a Hamilton-Jacobi-Bellman (HJB) equation using, for instance, the ordered upwind method [20] or minimum action type methods [8], as in (T2) type proposals. However, these approaches suffer from the curse of dimensionality when applied to high-dimensional systems. Although methods based on mean-field approximations are able to provide a semi-quantitative description of the potential landscape for some typical systems [4, 21], direct and general approaches are still favored in applications. In this respect, pioneering work has been done recently that allows the direct construction of a high-dimensional potential landscape using deep neural networks (DNNs), based either on the steady viscous HJB equation satisfied by the potential landscape function in the (T1) case [22, 23], or on the pointwise orthogonal decomposition of the force field in the (T2) case [24].

While these works significantly advanced methodological developments, the approaches are based on solving HJB equations alone and may encounter numerical difficulties due to either the non-uniqueness of the weak solution to the non-viscous HJB equation in the (T2) case [25], or the singularity of the solution for small noise in the (T1) case. Meanwhile, the loss functions considered in [24, 22, 23] are essentially of physics-informed neural network (PINN) form [26], and thus generally non-convex, so the training process might encounter troublesome local-minimum issues.

In this work, we present a simple yet efficient DNN approach to construct the potential landscape of high-dimensional non-equilibrium steady-state (NESS) systems of (T1) type with either moderate or small noise. Its intimate connection with non-equilibrium statistical mechanics, nice variational structure and superior numerical performance make it a competitive and promising approach in landscape construction methodology. Our main contributions are as follows.

(1) Proposal of the entropy production rate (EPR) loss. We introduce the convex EPR loss function, reveal its connection with the entropy production rate in statistical physics, and propose an enhanced version using a tempering technique.

(2) Dimensionality reduction. We put forward a simple dimensionality reduction strategy for the case where the reduced variables are prescribed; interestingly, the reduction formalism admits a unified formulation with the EPR framework for the primitive variables. This holds whether the system has constant or variable diffusion coefficients.

(3) Successful high-dimensional applications. We successfully apply our approach to challenging high-dimensional biological systems, including an eight-dimensional (8D) cell cycle model [4] and a 52D multi-stable system [27]. Our results reveal more delicate structure of the constructed landscapes than what mean-field approximations typically provide, yet we acknowledge that mean-field approximations are not limited by the potential challenges in SDE simulations.

Overall, EPR-Net offers a promising solution for diverse landscape construction problems in biophysics. Moreover, its nice mathematical structure and connection with non-equilibrium statistical physics make it a unique object that deserves further theoretical and numerical exploration.

Figure 1: Constructing energy landscapes through enhanced EPR workflow. (a) The primary objective is to construct the energy landscape defined through the steady-state distribution of the system. (b) Constructing the high-dimensional energy landscape using the EPR framework with primitive variables. (c) Constructing the dimensionality-reduced energy landscape using EPR with prescribed reduced variables.

EPR FRAMEWORK

In this section, we provide an overview of the whole EPR framework, including its primitive formulation, the physical interpretation, its convex property, and the enhanced EPR version.

Overview

Consider an ergodic stochastic differential equation (SDE)

$$\frac{\mathrm{d}\bm{x}(t)}{\mathrm{d}t}=\bm{F}(\bm{x}(t))+\sqrt{2D}\,\dot{\bm{w}},\quad\bm{x}(0)=\bm{x}_{0},\tag{1}$$

where $\bm{x}_{0}\in\mathbb{R}^{d}$, $\bm{F}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ is a smooth driving force, $\dot{\bm{w}}=(\dot{w}_{1},\ldots,\dot{w}_{d})^{\top}$ is the $d$-dimensional temporal Gaussian white noise with $\mathbb{E}\,\dot{w}_{i}(t)=0$ and $\mathbb{E}[\dot{w}_{i}(t)\dot{w}_{j}(s)]=\delta_{ij}\delta(t-s)$ for $i,j=1,\ldots,d$ and $s,t\geq 0$. The constant $D>0$ specifies the noise strength, which is often proportional to the system’s temperature $T$. We denote by $p_{\mathrm{ss}}(\bm{x})$ its steady-state probability density function (PDF).

Following the (T1) type proposal in [3], we define the potential of (1) as $U=-D\ln p_{\mathrm{ss}}$ and the steady probability flux $\bm{J}_{\mathrm{ss}}=p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}}$ in the domain $\Omega$, which we assume for simplicity is either $\mathbb{R}^{d}$ or a $d$-dimensional hyperrectangle. The steady-state PDF $p_{\mathrm{ss}}(\bm{x})$ satisfies the Fokker-Planck equation (FPE)

$$-\nabla\cdot(p_{\mathrm{ss}}\bm{F})+D\Delta p_{\mathrm{ss}}=0,\quad\text{in }\Omega,\tag{2}$$

and we assume the asymptotic boundary condition (BC) $p_{\mathrm{ss}}(\bm{x})\rightarrow 0$ as $|\bm{x}|\rightarrow\infty$ when $\Omega=\mathbb{R}^{d}$, or the reflecting boundary condition $\bm{J}_{\mathrm{ss}}\cdot\bm{n}=0$ when $\Omega\subset\mathbb{R}^{d}$ is a $d$-dimensional hyperrectangle, where $\bm{n}$ is the unit outer normal. In both cases, we have $p_{\mathrm{ss}}(\bm{x})\geq 0$ and $\int_{\Omega}p_{\mathrm{ss}}(\bm{x})\,\mathrm{d}\bm{x}=1$.

Here we illustrate the main EPR workflow through Fig. 1. As depicted in Fig. 1a, our primary objective is to construct the energy landscape $U=-D\ln p_{\mathrm{ss}}$ defined for Eq. (1). Leveraging our proposed approach, the enhanced EPR method, we first simulate the considered SDEs until the steady state is reached in order to obtain sample points (the ‘SDE simulation’ column of Fig. 1); a minimal sampling sketch is given below. We then train a neural network, representing the potential $U$, for the landscape construction, even in challenging high-dimensional scenarios. We initially introduce the EPR loss $\operatorname{L_{EPR}}$ (the middle column of Fig. 1b), which benefits from its convexity, with its minimum coinciding with the entropy production rate. Subsequently, we present the enhanced EPR loss $\operatorname{L_{enh}}$, tailored to cover the transition domain more effectively. Furthermore, the overall methodology extends easily to the dimensionality reduction problem (Fig. 1c), with a formulation unified with the EPR framework of Fig. 1b, but for the projected variables and the corresponding loss functions $\operatorname{L_{P-EPR}}$ and $\operatorname{L_{P-enh}}$.
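As a concrete illustration of the ‘SDE simulation’ stage, the following minimal Python/NumPy sketch integrates Eq. (1) by the Euler-Maruyama scheme; the function and parameter names are our own illustration, not part of EPR-Net’s released code.

```python
import numpy as np

def simulate_sde(F, x0, D, dt=1e-3, n_steps=100_000, rng=None):
    """Euler-Maruyama integration of dx/dt = F(x) + sqrt(2D) * white noise.

    F  : callable mapping an (N, d) array of states to an (N, d) array of drifts
    x0 : (N, d) array of initial states (N independent trajectories)
    Returns the final states x(T), T = n_steps * dt, which approximate samples
    from the steady-state distribution pi when T is large.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    sqrt_2Ddt = np.sqrt(2.0 * D * dt)
    for _ in range(n_steps):
        x = x + F(x) * dt + sqrt_2Ddt * rng.standard_normal(x.shape)
        # For a hyperrectangular domain with reflecting BC, one would
        # reflect x back into the domain here.
    return x
```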

EPR loss

Aiming at an effective approach for high-dimensional applications, we employ NNs to approximate $U(\bm{x})$; the key idea of this paper is to learn $U$ by training a DNN with the loss function

$$\operatorname{L_{EPR}}(V)=\int_{\Omega}|\bm{F}(\bm{x})+\nabla V(\bm{x};\theta)|^{2}\,\mathrm{d}\pi(\bm{x}),\tag{3}$$

where $V:=V(\bm{x};\theta)$ is a neural network function with parameters $\theta$ [28], and $\mathrm{d}\pi(\bm{x})=p_{\mathrm{ss}}(\bm{x})\,\mathrm{d}\bm{x}$. To justify (3), we note that, for any function $W$, $U$ satisfies the orthogonality relation

$$\int_{\Omega}\big(\bm{F}(\bm{x})+\nabla U(\bm{x})\big)\cdot\nabla W(\bm{x})\,\mathrm{d}\pi(\bm{x})=0,\tag{4}$$

which follows from the FPE (2), a simple integration by parts and the corresponding BC. Equation (4) means that $\bm{F}+\nabla U$ is perpendicular to the space $\mathcal{G}$ formed by $\nabla W$ for all possible $W$ under the $\pi$-weighted inner product. Equivalently, $-\nabla U$ is the orthogonal projection of the driving force $\bm{F}$ onto $\mathcal{G}$ under the $\pi$-weighted inner product, which implies that $U$ is the unique minimizer (up to a constant) of the loss function (3). See Supplementary Information (SI) Section A for detailed derivations. We note that related ideas are used in the coarse-graining of molecular systems [29, 30] and in generative modeling in machine learning [31, 32]. However, to the best of our knowledge, using the loss in (3) to construct the potential of NESS systems has never been considered before.

In fact, defining the $\pi$-weighted inner product of any square-integrable functions $f,g$ on $\Omega$ as

$$(f,g)_{\pi}:=\int_{\Omega}f(\bm{x})g(\bm{x})\,\mathrm{d}\pi(\bm{x})\tag{5}$$

and the corresponding $L^{2}_{\pi}$-norm $\|\cdot\|_{\pi}$ by $\|f\|^{2}_{\pi}:=(f,f)_{\pi}$, we obtain a Hilbert space $L^{2}_{\pi}$ (see, e.g., [33, Chapter II.1]). Eq. (4) implies that minimizing the EPR loss finds the orthogonal projection of $\bm{F}$ onto the gradient space $\mathcal{G}$, formed by the gradients $\nabla W$ of all suitable $W$, under the $\pi$-weighted inner product. Choosing $W=U$ in (4) gives

$$\bm{F}(\bm{x})=-\nabla U(\bm{x})+\bm{l}(\bm{x}),\quad\text{such that }(\nabla U,\bm{l})_{\pi}=0.\tag{6}$$

However, we remark that this orthogonality holds only in the $L^{2}_{\pi}$-inner-product sense, not in the pointwise sense. Furthermore, the two orthogonality relations (4) and (6) can be understood as follows. Using (6), relation (4) is equivalent to $\int_{\Omega}\bm{l}\cdot\nabla W\,\mathrm{d}\pi=0$ for any $W$. Integration by parts gives $\nabla\cdot(\bm{l}\,\mathrm{e}^{-U/D})=0$, which is equivalent to $-\nabla U\cdot\bm{l}+D\nabla\cdot\bm{l}=0$. When $D\rightarrow 0$, we recover the pointwise orthogonality, which is adopted in computing quasi-potentials in [24]. In mathematical language, (6) can be understood as a Hodge-type decomposition in $L^{2}_{\pi}$ instead of $L^{2}$ space.

To utilize (3) in numerical computations, we replace the ensemble average in (3) with respect to the unknown $\pi$ by the empirical average over data sampled from (1):

$$\operatorname{\widehat{L}_{EPR}}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\big|\bm{F}(\bm{x}_{i})+\nabla V(\bm{x}_{i};\theta)\big|^{2}.\tag{7}$$

Here $(\bm{x}_{i})_{1\leq i\leq N}$ could be either the final states (at time $\mathsf{T}$) of $N$ independent trajectories starting from different initializations, or equally spaced time series along a single long trajectory up to time $\mathsf{T}$, where $\mathsf{T}\gg 1$. In both cases, the ergodicity of the SDE (1) guarantees that (7) is a good approximation of (3) as long as $\mathsf{T}$ is large [34]. We adopt the former approach in the numerical experiments in this work, where the gradients of both $V$ (with respect to $\bm{x}$) and the loss itself (with respect to $\theta$) in (7) are calculated by auto-differentiation in PyTorch [35]. Given the involved approximation of $\pi$, we perform a formal stability analysis of (3) in SI Section B to ensure its reliability and robustness. Additionally, in the following subsection, we elucidate the benefits of EPR, particularly its convexity and its physical interpretation in terms of the entropy production rate.
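The empirical loss (7) is straightforward to assemble with automatic differentiation. Below is a minimal PyTorch sketch, our own illustration with a hypothetical helper name, evaluating (7) for a potential network V and a known drift F.

```python
import torch

def epr_loss(V, F, x):
    """Empirical EPR loss (7): mean of |F(x_i) + grad_x V(x_i)|^2 over samples.

    V : torch.nn.Module mapping (N, d) -> (N, 1), the potential network
    F : callable mapping (N, d) -> (N, d), the known driving force
    x : (N, d) tensor of (approximate) steady-state samples
    """
    x = x.detach().requires_grad_(True)
    # Summing over the batch lets one backward pass return per-sample gradients.
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    residual = F(x) + grad_v
    return residual.pow(2).sum(dim=1).mean()
```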

Physical interpretation and convexity

The minimum loss of (3), denoted by $\operatorname{L_{EPR}}(U)$, possesses a well-defined physical interpretation. In the following discussion, we show that this minimum EPR loss aligns precisely with the steady entropy production rate as defined in NESS theory. Following [36, 37], we have the important identity concerning the entropy production for (1):

$$D\frac{\mathrm{d}S(t)}{\mathrm{d}t}=e_{p}(t)-h_{d}(t).\tag{8}$$

Here $S(t):=-\int_{\Omega}p(\bm{x},t)\ln p(\bm{x},t)\,\mathrm{d}\bm{x}$ is the entropy of the probability density function $p(\bm{x},t)$ at time $t$, and $e_{p}$ is the entropy production rate (EPR)

$$e_{p}(t)=\int_{\Omega}\left|\bm{F}-D\nabla\ln p\right|^{2}p(\bm{x},t)\,\mathrm{d}\bm{x},\tag{9}$$

and $h_{d}$ is the heat dissipation rate

$$h_{d}(t)=\int_{\Omega}\bm{F}(\bm{x})\cdot\bm{J}(\bm{x},t)\,\mathrm{d}\bm{x},\tag{10}$$

with the probability flux $\bm{J}(\bm{x},t):=p(\bm{x},t)(\bm{F}(\bm{x})-D\nabla\ln p(\bm{x},t))$ at time $t$. When $D=k_{B}T$, the above formulas have a clear physical meaning in statistical physics.

From the loss in (3) and the steady-state form of (9), we get the identity

$$\operatorname{L_{EPR}}(U)=\int_{\Omega}\frac{|\bm{J}_{\mathrm{ss}}|^{2}}{p_{\mathrm{ss}}}\,\mathrm{d}\bm{x}=\int_{\Omega}\left|\bm{F}-D\nabla\ln p_{\mathrm{ss}}\right|^{2}p_{\mathrm{ss}}\,\mathrm{d}\bm{x}=e^{\mathrm{ss}}_{p},\tag{11}$$

where $\bm{J}_{\mathrm{ss}}(\bm{x})$ is the steady probability flux and $e^{\mathrm{ss}}_{p}$ denotes the steady entropy production rate (EPR) of the NESS system (1) [36, 3, 37]. Therefore, minimizing (3) is equivalent to approximating the steady entropy production rate. This correspondence provides the rationale for naming our approach ‘EPR-Net’.

The EPR loss has an appealing property: it is strictly convex in $V$ (up to a constant), i.e.,

$$\operatorname{L_{EPR}}(V_{\omega})<(1-\omega)\operatorname{L_{EPR}}(V_{0})+\omega\operatorname{L_{EPR}}(V_{1})\tag{12}$$

for any $0<\omega<1$ and $V_{0},V_{1}$ satisfying $\nabla(V_{0}-V_{1})\not\equiv 0$, where $V_{\omega}:=(1-\omega)V_{0}+\omega V_{1}$. Eq. (12) can be verified by a direct calculation, spelled out below. This strict convexity guarantees the uniqueness of the critical point $V$ (up to a constant) and provides a theoretical guarantee of fast convergence of the training procedure under certain assumptions [38]. It may also contribute to the better convergence behavior of EPR during the training process, as mentioned in the numerical comparisons part.
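Indeed, writing $\bm{G}_{i}:=\bm{F}+\nabla V_{i}$ for $i=0,1$, so that $\bm{F}+\nabla V_{\omega}=(1-\omega)\bm{G}_{0}+\omega\bm{G}_{1}$, the bilinearity of $(\cdot,\cdot)_{\pi}$ gives

$$(1-\omega)\operatorname{L_{EPR}}(V_{0})+\omega\operatorname{L_{EPR}}(V_{1})-\operatorname{L_{EPR}}(V_{\omega})=\omega(1-\omega)\|\bm{G}_{0}-\bm{G}_{1}\|_{\pi}^{2}=\omega(1-\omega)\|\nabla(V_{0}-V_{1})\|_{\pi}^{2},$$

which is strictly positive precisely when $\nabla(V_{0}-V_{1})\not\equiv 0$ on the support of $\pi$.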

Enhanced EPR loss

Substituting the relation $p_{\mathrm{ss}}(\bm{x})=\exp(-U(\bm{x})/D)$ into (2), we get the viscous HJB equation

$$\mathcal{N}_{\textrm{HJB}}(U):=-\bm{F}\cdot\nabla U+D\Delta U-|\nabla U|^{2}+D\nabla\cdot\bm{F}=0\tag{13}$$

with the asymptotic BC $U\rightarrow\infty$ as $|\bm{x}|\rightarrow\infty$ in the case $\Omega=\mathbb{R}^{d}$, or the reflecting BC $(\bm{F}+\nabla U)\cdot\bm{n}=0$ on $\partial\Omega$ when $\Omega$ is a $d$-dimensional hyperrectangle, respectively. As in the framework of PINNs [26], (13) motivates the HJB loss

$$\operatorname{L_{HJB}}(V)=\int_{\Omega}\big|\mathcal{N}_{\textrm{HJB}}(V(\bm{x};\theta))\big|^{2}\,\mathrm{d}\mu(\bm{x}),\tag{14}$$

where $\mu$ is any desirable distribution. By choosing $\mu$ properly, this loss allows the use of sample data that better covers the domain $\Omega$ and, when combined with the loss in (3), leads to improved training results when $D$ is small. Specifically, for small $D$, we propose the enhanced loss

$$\operatorname{\widehat{L}_{enh}}(\theta)=\operatorname{\widehat{L}_{EPR}}(\theta)+\lambda\operatorname{\widehat{L}_{HJB}}(\theta),\tag{15}$$

where $\operatorname{\widehat{L}_{EPR}}(\theta)$ is defined in (7), and $\operatorname{\widehat{L}_{HJB}}(\theta)=\frac{1}{N'}\sum_{i=1}^{N'}|\mathcal{N}_{\textrm{HJB}}(V(\bm{x}'_{i};\theta))|^{2}$ is an approximation of (14) using an independent data set $(\bm{x}'_{i})_{1\leq i\leq N'}$ obtained by sampling the trajectories of (1) with a larger $D'>D$. The weight parameter $\lambda>0$ balances the contributions of the two terms in (15). While its value can be tuned based on the system’s properties, in our numerical experiments we observed that the method performs well for $\lambda$ in a relatively broad range. Note that the proposed strategy is both general and easily adaptable. For instance, one can alternatively use data $(\bm{x}'_{i})_{1\leq i\leq N'}$ that contains more samples in the transition region, or employ in (15) a modification of the loss (14) [22]. We further illustrate the motivation of the enhanced EPR in SI Section E.
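As an illustration of how the HJB term can be assembled, the PyTorch sketch below (our own hypothetical helper names, reusing epr_loss from the earlier sketch) computes the residual of (13) with the Laplacian obtained as the trace of the Hessian via nested automatic differentiation, and combines the two terms as in (15); it assumes the divergence of $\bm{F}$ is available analytically, as is the case for the explicit ODE models considered here.

```python
import torch

def hjb_residual(V, F, div_F, x, D):
    """Residual of the viscous HJB operator (13) at the sample points x.

    V     : potential network, (N, d) -> (N, 1)
    F     : drift, (N, d) -> (N, d)
    div_F : divergence of F, (N, d) -> (N,), assumed known analytically
    """
    x = x.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    # Laplacian as the trace of the Hessian, one coordinate at a time.
    lap_v = sum(
        torch.autograd.grad(grad_v[:, i].sum(), x, create_graph=True)[0][:, i]
        for i in range(x.shape[1])
    )
    return (-(F(x) * grad_v).sum(dim=1) + D * lap_v
            - grad_v.pow(2).sum(dim=1) + D * div_F(x))

def enhanced_loss(V, F, div_F, x_ss, x_enh, D, lam=1.0):
    """Enhanced loss (15): EPR term on steady-state samples x_ss plus a
    weighted HJB term on the broader enhanced samples x_enh."""
    return epr_loss(V, F, x_ss) + lam * hjb_residual(V, F, div_F, x_enh, D).pow(2).mean()
```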

RESULTS FOR ENERGY LANDSCAPE CONSTRUCTION

In this section, we demonstrate the superiority of the enhanced EPR over the HJB loss alone and over normalizing flows (NFs), a class of generative models for density estimation that use invertible transformations to map between complex data distributions and simple latent distributions [39], on 2D benchmark examples. We then apply the enhanced EPR to the 3D Lorenz model and a 12D Gaussian mixture model to show its effectiveness for constructing energy landscapes in higher dimensions. We remark that alternative approaches, distinct from our methodology, have been investigated for potential construction in limit cycles [40] and the Lorenz system [41].

Two-dimensional benchmark examples

Within this subsection, we undertake a comparative analysis of 2D benchmark problems. These encompass a toy model, a 2D biological system exhibiting a limit cycle [3], and a 2D multi-stable system [5]; see SI Sections F.1-F.4 for training details, additional results and problem settings.

Figure 2: Two-dimensional benchmark examples solved under the EPR framework. (a-c) Filled contour plots of the learned potential $V(\bm{x};\theta^{*})$ for (a) a toy model learned by the EPR loss (3), (b) a biochemical oscillation network model [3] and (c) a tri-stable cell development model [5], the latter two learned by the enhanced loss (15). The force field $\bm{F}(\bm{x})$ is decomposed into the gradient part $-\nabla V(\bm{x};\theta^{*})$ (white arrows) and the non-gradient part $\bm{F}(\bm{x})+\nabla V(\bm{x};\theta^{*})$ (gray arrows). The length of an arrow corresponds to the magnitude of the vector. The solid dots are samples from the simulated invariant distribution. (d-f) SDE-simulated samples $(\bm{x}_{i})_{1\leq i\leq N}$ (yellow points) and enhanced samples $(\bm{x}'_{i})_{1\leq i\leq N'}$ (green points). (d) $D=0.1$, where enhanced samples are generated with $D'=2D$. (e) $D=0.1$, where enhanced samples are obtained by adding Gaussian perturbations with $\sigma=0.05$ to the SDE-simulated samples. (f) $D=0.01$, where enhanced samples are generated with $D'=10D$.

Constructing landscapes

The potential function $V(\bm{x};\theta)$ is approximated using a feedforward neural network with three hidden layers and the tanh activation function. Each hidden layer comprises 20 hidden units. Subsequently, we write the enhanced loss as

$$\operatorname{L_{enh}}=\lambda_{1}\operatorname{L_{EPR}}+\lambda_{2}\operatorname{L_{HJB}},\tag{16}$$

where $\lambda_{1}$ and $\lambda_{2}$ should be chosen to balance the corresponding loss terms. In SI Section F.1, we conduct a sensitivity analysis of $\lambda_{1}$, demonstrating that our methods are robust and effective across a broad spectrum of scenarios.

As illustrated in Fig. 2a-c, we first showcase well-constructed landscapes under the EPR framework. These include the learned potentials $V(\bm{x};\theta^{*})$, the decomposition of forces, and sample points drawn from the simulated invariant distribution. In the toy model (Fig. 2a), the gradient of the potential (white arrows) points directly towards the corresponding attractor, while the non-gradient part of the force field (gray arrows) shows a counter-clockwise rotation in the model with a limit cycle (Fig. 2b), and a splitting-and-back flow from the middle attractor to the other two attractors in the tri-stable dynamical model (Fig. 2c).

The data enhancement techniques employed to generate improved samples are flexible, as depicted in Fig. 2d-f. We present several choices for obtaining these samples, including generating more diffusive samples with a diffusion coefficient $D'>D$ or adding Gaussian perturbations with standard deviation $\sigma$. We provide a more comprehensive analysis of the data enhancement in SI Section F.1, demonstrating the resilience of the enhanced EPR to variations in the enhanced samples. Specifically, in Fig. 2 we use $D'=2D$ for the toy model, $D'=10D$ for the multi-stable problem and $\sigma=0.05$ for the limit-cycle problem. We use SDE-simulated and enhanced datasets $(\bm{x}_{i})_{1\leq i\leq N}$ and $(\bm{x}'_{i})_{1\leq i\leq N'}$ of equal size, i.e., $N=N'$.
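In code, the two enhancement options amount to a one-line change in the data pipeline; the snippet below is a usage sketch reusing the simulate_sde helper from the earlier sketch, with F, x0, D and the steady-state samples x_ss assumed to be defined as there.

```python
import numpy as np

rng = np.random.default_rng(0)
# Option (i): re-simulate the SDE with a larger diffusion, e.g. D' = 2D.
x_enh = simulate_sde(F, x0, D=2 * D, rng=rng)
# Option (ii): perturb the steady-state samples with Gaussian noise, e.g. sigma = 0.05.
# x_enh = x_ss + 0.05 * rng.standard_normal(x_ss.shape)
```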

Numerical comparisons

We proceed to perform a quantitative and comprehensive comparison of the performance between the enhanced EPR, solving HJB alone, and NF. For the toy model, the true solution is analytically known (see SI Section F.2), while for the other two 2D examples, we compute the reference solution by discretizing the steady FPE with a piecewise bilinear finite element method on a fine rectangular grid and solving the resulting sparse linear system with a least-squares solver (the normalization condition $\int_{\Omega}p_{\mathrm{ss}}(\bm{x})\,\mathrm{d}\bm{x}=1$ is used to fix the additive constant). After training, we shift the minimum of the potentials to the origin and measure their accuracy using the relative root mean square error (rRMSE) and the relative mean absolute error (rMAE):

$$\operatorname{rRMSE}=\sqrt{\frac{\int_{\mathcal{D}}\left|V(\bm{x};\theta^{*})-U_{0}(\bm{x})\right|^{2}\mathrm{d}\bm{x}}{\int_{\mathcal{D}}|U_{0}(\bm{x})|^{2}\,\mathrm{d}\bm{x}}},\tag{17}$$

$$\operatorname{rMAE}=\frac{\int_{\mathcal{D}}\left|V(\bm{x};\theta^{*})-U_{0}(\bm{x})\right|\mathrm{d}\bm{x}}{\int_{\mathcal{D}}|U_{0}(\bm{x})|\,\mathrm{d}\bm{x}},\tag{18}$$

on the domain $\mathcal{D}=\{\bm{x}\in\Omega\,|\,U_{0}(\bm{x})\leq 20D\}$, where $U_{0}$ denotes the reference solution.
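In practice, (17)-(18) can be estimated by Monte Carlo over points covering $\mathcal{D}$; a minimal NumPy sketch (our own helper name, assuming both potentials are evaluated on the same points and shifted so that their minima sit at zero) is:

```python
import numpy as np

def relative_errors(v_vals, u0_vals):
    """Monte Carlo estimates of rRMSE (17) and rMAE (18) from the values of the
    learned and reference potentials on common points covering the domain D."""
    v = v_vals - v_vals.min()   # shift minima to the origin
    u = u0_vals - u0_vals.min()
    rrmse = np.sqrt(np.mean((v - u) ** 2) / np.mean(u ** 2))
    rmae = np.mean(np.abs(v - u)) / np.mean(np.abs(u))
    return rrmse, rmae
```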

Table 1: Comparisons of different methods. We report the mean and the standard deviation over 5 random seeds. The best results are highlighted in bold.

Problem        Method             rRMSE          rMAE
Toy, D=0.1     Enhanced EPR       0.028±0.010    0.026±0.010
               HJB loss alone     0.034±0.016    0.029±0.014
               Normalizing Flow   0.124±0.016    0.082±0.011
Toy, D=0.05    Enhanced EPR       0.054±0.020    0.048±0.019
               HJB loss alone     0.191±0.218    0.160±0.179
               Normalizing Flow   0.239±0.040    0.174±0.034
Multi-stable   Enhanced EPR       0.067±0.047    0.066±0.049
               HJB loss alone     0.249±0.015    0.228±0.011
               Normalizing Flow   0.202±0.029    0.133±0.017
Limit Cycle    Enhanced EPR       0.070±0.016    0.063±0.020
               HJB loss alone     0.231±0.048    0.140±0.021
               Normalizing Flow   0.384±0.021    0.267±0.013
Figure 3: Comparisons between models learned by (a,d) enhanced EPR, (b,e) HJB loss alone and (c,f) normalizing flow. (a-c) Filled contour plots of the potential $V(\bm{x};\theta^{*})$ for the toy model with $D=0.05$. The force field $\bm{F}(\bm{x})$ is decomposed into the gradient part $-\nabla V(\bm{x};\theta^{*})$ (white arrows) and the non-gradient part (gray arrows). The length of an arrow denotes the scale of the vector. The solid dots are samples from the simulated invariant distribution. The results in the high-energy region $\{\bm{x}\,|\,V(\bm{x})\geq 30D\}$ are omitted since they are less relevant to the dynamics. (d-f) The absolute error of the learned potential constructed in different ways.

We present a detailed comparison of the 2D problems in Table 1. Our experiments involve solving HJB alone with various distributions of enhanced samples $\mu(\bm{x})$, and we report the optimal outcome as the representative entry for the table. For a fair comparison between the enhanced EPR and standalone HJB, we maintain consistent network training configurations. Additionally, a detailed analysis of $\lambda_{1}$ and $\mu(\bm{x})$ in SI Section F.1 highlights the enhanced EPR’s superior performance and robustness. Concerning the flow model, we leverage the same SDE-simulated dataset as utilized in the enhanced EPR. The metrics presented in Table 1 yield consistent results, reinforcing the superiority of the enhanced EPR over the other methods across all problems.

We further illustrate the potential landscapes learned by the different methods for the 2D toy model with $D=0.05$ in Fig. 3a-c, and plot the absolute error between the learned potential and the reference solution in Fig. 3d-f. These plots demonstrate that the errors of the enhanced EPR are consistently small. In contrast, the errors of HJB alone and NF can be significantly larger, leading to unfavorable outcomes and limited smoothness. Based on this study and our numerical experience, the advantages of the enhanced EPR over both the approach using the HJB loss alone and NF can be summarized in the following points.

Figure 4: Landscapes constructed by enhanced EPR. (a) Slices of the learned 3D potential $V(\bm{x};\theta^{*})$ in the Lorenz system. The solid dots are samples from the simulated invariant distribution. (b) The accumulated measure of the absolute error between $U_{0}(\bm{x})$ and the learned solution $V(\bm{x};\theta^{*})$ based on the conditional probability $p_{0}(\bm{x}|x_{1},x_{2})$. (c) The learned 12D potential $V(\bm{x};\theta^{*})$ along the line $\bm{x}(t)=t\bm{\mu}_{1}+(1-t)\bm{\mu}_{2}$, where $\bm{\mu}_{1},\bm{\mu}_{2}$ are the means of the two components of the 12D Gaussian mixture.
  • Accuracy. As shown in Table 1, the enhanced EPR achieves the best accuracy over the HJB loss alone and normalizing flow approaches in terms of the rRMSE and rMAE metrics. From Fig. 3, we also find that the enhanced EPR presents an overall landscape more consistent with the simulated samples and the true/reference solution than the other two methods. The decomposition of the force also matches better. The potential learned by the HJB loss alone and by the normalizing flow tends to be rough and non-smooth near the edge of the samples (Fig. 3b and c). Compared with the enhanced EPR and the HJB loss alone, the normalizing flow captures the high-probability domain but does not explicitly take advantage of the information of the dynamics, which makes its performance the worst.

  • Robustness. Without the guidance of the EPR loss, minimizing the HJB loss alone does not lead to a good approximation of the true solution, even when closely matched to a heuristically chosen distribution, as shown in Fig. 3b. The significant variance observed in the results for the toy model with $D=0.05$ when solving HJB alone is attributed to an exceptionally poor outlier. In contrast, the enhanced EPR always gives reliable approximations in the introduced setup. This supports the robustness of the enhanced EPR loss. We refer to SI Section F.1 for a detailed analysis of the chosen parameter $\lambda_{1}$ and the distribution of enhanced samples $\mu(\bm{x})$.

  • Efficiency. Our numerical experiments show that the enhanced EPR converges faster than the approach using the HJB loss alone. For instance, in the toy model with $D=0.05$, the enhanced EPR achieved an rRMSE of $0.088\pm 0.083$ and an rMAE of $0.075\pm 0.070$ within 2000 epochs, while training with the HJB loss alone cannot attain the same accuracy level even after 3000 epochs. As analyzed above, this efficiency might be attributed to the strict convexity of the EPR loss. Related theoretical support for this speculation can be found in [38].

  • Performance of single EPR. We remark that single EPR, which relies solely on the EPR loss by setting $\lambda=0$ in (15), can still give acceptable results with a relatively small $D=0.05$: its rRMSE and rMAE are $0.103\pm 0.075$ and $0.087\pm 0.063$, respectively. In the limit cycle problem, single EPR is not as good as in the toy model and the multi-stable problem, since there are far fewer samples in the domain where the non-gradient force is extremely large, as shown in Fig. 2b. However, the single EPR loss can achieve competitive performance as long as the samples cover the domain effectively.

Three-dimensional Lorenz system

We apply our landscape construction approach to the 3D Lorenz system [42] with isotropic temporal Gaussian white noise, i.e., system (1) with $\bm{F}=(F_{x},F_{y},F_{z})$ and

$$F_{x}(x,y,z)=\beta_{1}(y-x),\tag{19}$$
$$F_{y}(x,y,z)=x\left(\beta_{2}-z\right)-y,\tag{20}$$
$$F_{z}(x,y,z)=xy-\beta_{3}z,\tag{21}$$

where $\beta_{1}=10$, $\beta_{2}=28$, and $\beta_{3}=8/3$. We add noise with strength $D=1$. This model was also considered in [23] with $D=20$. We obtain the enhanced data $(\bm{x}'_{i})_{1\leq i\leq N'}$ by adding Gaussian noise with standard deviation $\sigma=5$ to the SDE-simulated data $(\bm{x}_{i})_{1\leq i\leq N}$, where $N=N'=10000$. We directly train the 3D potential $V(\bm{x};\theta)$ by the enhanced EPR (16) with $\lambda_{1}=10.0$, $\lambda_{2}=1.0$, using Adam with a learning rate of 0.001 and a batch size of 2048. A slice view of the landscape is presented in Fig. 4a; the learned 3D potential agrees well with the simulated samples.
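For reference, the Lorenz drift (19)-(21) used to generate the training samples can be coded as follows (a NumPy sketch with our own function name; it can be passed as F to the simulate_sde helper above):

```python
import numpy as np

def lorenz_force(x, beta1=10.0, beta2=28.0, beta3=8.0 / 3.0):
    """Deterministic Lorenz drift (19)-(21) for a batch of states x of shape (N, 3)."""
    fx = beta1 * (x[:, 1] - x[:, 0])
    fy = x[:, 0] * (beta2 - x[:, 2]) - x[:, 1]
    fz = x[:, 0] * x[:, 1] - beta3 * x[:, 2]
    return np.stack([fx, fy, fz], axis=1)
```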

Twelve-dimensional multi-stable system

To further validate the efficacy of EPR-Net in high-dimensional scenarios, we construct a Gaussian mixture model (GMM) in $\mathbb{R}^{12}$ with two centers, whose true solution is $U_{0}(\bm{x})=-D\log p_{0}(\bm{x})$, where $p_{0}(\bm{x})=w_{1}p_{1}(\bm{x})+w_{2}p_{2}(\bm{x})$ and $p_{i}(\bm{x})=Z_{i}^{-1}\exp(-(\bm{x}-\bm{\mu}_{i})^{\top}\Sigma_{i}^{-1}(\bm{x}-\bm{\mu}_{i})/2)$, $i=1,2$, with normalization factors $Z_{i}$. The parameters are chosen as $w_{1}=0.6$, $w_{2}=0.4$, $\bm{\mu}_{1}=(1.2,2.0,0.6,1.5,0.9,1.5,1.5,0.9,1.2,1.2,0.5,1.8)^{\top}$, $\bm{\mu}_{2}=(1.8,1.4,0.8,0.9,0.9,1.5,2.0,1.0,1.6,1.0,0.7,1.4)^{\top}$, $\Sigma_{1}=0.04I$, and $\Sigma_{2}=0.02I$.

For evaluation, we first sample 10,000 points from the known GMM distribution and calculate the relative error between the learned 12D potential $V(\bm{x};\theta^{*})$ and the true potential $U_{0}(\bm{x})$. The resulting rRMSE and rMAE are 0.054 and 0.051, respectively, indicating the high accuracy of the learned potential. In addition, we denote the first two dimensions of this 12D problem by $(x_{1},x_{2})$ and draw samples from the corresponding conditional distribution $p_{0}(\bm{x}|x_{1},x_{2})$. As depicted in Fig. 4b, we compute the accumulated measure of the absolute error between $U_{0}(\bm{x})$ and the learned solution $V(\bm{x})$ based on the conditional distribution as $\int|U_{0}(\bm{x})-V(\bm{x})|\,p_{0}(\bm{x}|x_{1},x_{2})\,\mathrm{d}\bm{x}$. Impressively, this integral consistently remains below 0.01, underscoring that our approach yields negligible errors in the effective domain. Moreover, comparing the barrier heights (BHs) further validates the precision of our approach, with a relative error of less than 5% for both BHs. Due to the diagonal covariance matrices, the saddle point lies precisely on the line through $\bm{\mu}_{1}$ and $\bm{\mu}_{2}$, allowing us to compare the BHs directly. As shown in Fig. 4c, the true and computed slice lines coincide well. For additional details, please refer to SI Section F.5.
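For this evaluation, the true potential $U_{0}=-D\log p_{0}$ of the isotropic GMM is available in closed form; a NumPy sketch (our own helper name) is:

```python
import numpy as np

def gmm_potential(x, D, mus, variances, weights):
    """True potential U0 = -D log p0 for an isotropic Gaussian mixture.

    x         : (N, d) evaluation points
    mus       : list of (d,) mean vectors
    variances : list of scalars s_i with Sigma_i = s_i * I
    weights   : list of mixture weights summing to one
    """
    d = x.shape[1]
    p = np.zeros(x.shape[0])
    for w, mu, s in zip(weights, mus, variances):
        norm = (2.0 * np.pi * s) ** (d / 2)   # det(2 pi s I)^{1/2}
        sq = np.sum((x - mu) ** 2, axis=1)
        p += w * np.exp(-0.5 * sq / s) / norm
    return -D * np.log(p)
```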

DIMENSIONALITY REDUCTION

When applying the above approach to high-dimensional problems, dimensionality reduction is necessary in order to visualize the results and gain physical insight. For simplicity, we consider the linear case and, with a slight abuse of notation, let $\bm{x}=(\bm{y},\bm{z})^{\top}$, where $\bm{z}=(x_{i},x_{j})\in\mathbb{R}^{2}$ contains the coordinates of two variables of interest, and $\bm{y}\in\mathbb{R}^{d-2}$ corresponds to the remaining $d-2$ variables. The domain $\Omega$ (either $\mathbb{R}^{d}$ or a $d$-dimensional hyperrectangle) has the decomposition $\Omega=\Sigma\times\widetilde{\Omega}$, where $\Sigma\subseteq\mathbb{R}^{d-2}$ and $\widetilde{\Omega}\subseteq\mathbb{R}^{2}$ are the domains of $\bm{y}$ and $\bm{z}$, respectively. Automatic selection of linear reduced variables and extensions to nonlinear reduced variables with general domains are also possible [43, 44]. In the current setting, the reduced potential is

$$\widetilde{U}(\bm{z})=-D\ln\widetilde{p}_{\mathrm{ss}}(\bm{z})=-D\ln\int_{\Sigma}p_{\mathrm{ss}}(\bm{y},\bm{z})\,\mathrm{d}\bm{y},\tag{22}$$

and one can show that $\widetilde{U}$ minimizes the loss function

$$\operatorname{L_{P-EPR}}(\widetilde{V})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})+\nabla_{\bm{z}}\widetilde{V}(\bm{z};\theta)\big|^{2}\,\mathrm{d}\pi(\bm{y},\bm{z}),\tag{23}$$

where $\bm{F}_{\bm{z}}(\bm{y},\bm{z})\in\mathbb{R}^{2}$ is the $\bm{z}$-component of the force field $\bm{F}=(\bm{F}_{\bm{y}},\bm{F}_{\bm{z}})^{\top}$.

Moreover, one can derive an enhanced loss as in (15) that could be used for systems with small $D$. To this end, we note that $\widetilde{U}$ satisfies the projected HJB equation

$$\mathcal{N}_{\textrm{P-HJB}}(\widetilde{U}):=-\widetilde{\bm{F}}\cdot\nabla_{\bm{z}}\widetilde{U}+D\Delta_{\bm{z}}\widetilde{U}-|\nabla_{\bm{z}}\widetilde{U}|^{2}+D\nabla_{\bm{z}}\cdot\widetilde{\bm{F}}=0,\tag{24}$$

with the asymptotic BC $\widetilde{U}\rightarrow\infty$ as $|\bm{z}|\rightarrow\infty$, or the reflecting BC $(\widetilde{\bm{F}}+\nabla_{\bm{z}}\widetilde{U})\cdot\widetilde{\bm{n}}=0$ on $\partial\widetilde{\Omega}$, where $\widetilde{\bm{F}}(\bm{z}):=\int_{\Sigma}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\,\mathrm{d}\pi(\bm{y}|\bm{z})$ is the projected force defined using the conditional distribution $\mathrm{d}\pi(\bm{y}|\bm{z})=p_{\mathrm{ss}}(\bm{y},\bm{z})/\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\mathrm{d}\bm{y}$, and $\widetilde{\bm{n}}$ denotes the unit outer normal on $\partial\widetilde{\Omega}$. Based on (24), we can formulate the projected HJB loss

$$\operatorname{L_{P-HJB}}(\widetilde{V})=\int_{\widetilde{\Omega}}\big|\mathcal{N}_{\textrm{P-HJB}}(\widetilde{V})\big|^{2}\,\mathrm{d}\mu(\bm{z}),\tag{25}$$

where $\mu$ is any suitable distribution over $\widetilde{\Omega}$, and $\widetilde{\bm{F}}$ in (24) is learned beforehand by training a DNN with the loss

$$\operatorname{L_{P-For}}(\widetilde{\bm{G}})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})-\widetilde{\bm{G}}(\bm{z};\theta)\big|^{2}\,\mathrm{d}\pi(\bm{y},\bm{z}).\tag{26}$$

The overall enhanced loss (16) used in numerical computations comprises two terms, which are empirical estimates of (23) and (25) based on two different sets of sample data. We refer the reader to SI Section C for derivation details, SI Section G.1 for experiments on Ferrell’s cell cycle model [45], and SI Sections G.2 and G.3 for details and additional results of the 8D [4] and 52D [27] models.
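The force-projection step (26) is a plain regression; a minimal PyTorch sketch (our own names; it assumes the reduced variables $\bm{z}$ occupy the last two coordinates of $\bm{x}$, which need not hold in general) is:

```python
import torch

def projected_force_loss(G, Fz, x):
    """Regression loss (26): fit G(z) to the z-component of the full force,
    averaged over joint steady-state samples (y, z).

    G  : network on the reduced coordinates, (N, 2) -> (N, 2)
    Fz : callable giving the z-component of the force, (N, d) -> (N, 2)
    x  : (N, d) samples whose last two columns are taken as z
    """
    z = x[:, -2:]
    return (Fz(x) - G(z)).pow(2).sum(dim=1).mean()
```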

Figure 5: Dimensionality reduction of high-dimensional systems. (a-c) Eight-dimensional cell cycle model. (a) Reduced potential landscape $\widetilde{V}$ with projected contour lines. The green star at $(0.13,0.59)$ denotes the saddle point of $\widetilde{V}$. (b) Projected sample points, streamlines of the projected force field $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ and the filled contour plot of $\widetilde{V}(\bm{x};\theta^{*})$. Two red circles and two red dots (close to $(0.22,0.11)$ and $(0.31,0.47)$, respectively) show the stable limit sets of the projected force field. The yellow circle is the projection of the original high-dimensional limit cycle. (c) An enlarged view of the square domain in (b), showing the detailed spiral structure of the streamlines of $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ around the stable point. (d,e) Fifty-two-dimensional multi-stable system. (d) Projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ and potential $\widetilde{V_{1}}(\bm{z};\theta^{*})$ of the 52D double-well model learned by enhanced EPR. (e) The absolute error between the reduced potentials constructed in the two different ways, i.e., $|\widetilde{V_{1}}(\bm{z};\theta^{*})-\widetilde{V_{2}}(\bm{z};\theta^{*})|$.

Eight-dimensional cell cycle: reduced potential

Subsequently, we apply our dimensionality reduction approach to construct the landscape for an 8D cell cycle model [4], which contains both a limit cycle and a stable equilibrium point. In this case, we take CycB (a cyclin B protein) and Cdc20 (an exit protein) as the reduced variables, following [4]. As shown in Fig. 5a and b, both the profile of the reduced potential and the force strength agree well with the density of the projected samples. Moreover, Fig. 5b provides some important insights into the projection of the high-dimensional dynamics to two dimensions. One particular feature is that the limit cycle induced by the projected force $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ (outer red circle) slightly differs from the limit cycle directly projected from high dimensions (yellow circle); the difference is either minor or moderate depending on whether the sample density near the limit cycle is high or low. This is natural in the reduction when $D>0$, since the distribution $\pi(\bm{y}|\bm{z})$ involved in computing $\widetilde{\bm{G}}(\bm{x};\theta^{*})$ is not a Dirac $\delta$-distribution but a diffusive distribution with varying widths along the limit cycle, and the difference disappears as $D\rightarrow 0$. Another feature is that we unexpectedly obtain an additional stable limit cycle (inner red circle) and a stable point (red dot in the center) emerging inside the outer limit cycle. Though virtual in high dimensions and biologically irrelevant, the existence of these two limit sets is reminiscent of the Poincaré-Bendixson theorem in planar dynamics theory [46, Chapter 10.6], which depicts a common phenomenon when dimensionality reduction maps limit cycles to the 2D plane. Note that this theorem does not guarantee the stability of such fixed points; they could be stable, unstable, or even saddle points; see another case in SI Section G.1. The emergence of these two limit sets is specific to this model due to the relatively flat landscape of the potential in the central region. In addition, close to the saddle point of $\widetilde{V}$ (green star), there is a barrier along the limit cycle direction and a local well along the Cdc20 direction, which characterizes the region that biological cycle paths mainly pass through. Last but not least, an enlarged view of the local attractive domain outside the limit cycle shows its intricate spiral structure (Fig. 5c), which has not been revealed in previous work based on mean-field approximation [4].

Fifty-two-dimensional multi-stable system: high-dimensional and reduced potentials

We compare two approaches to construct the reduced potential. One is to learn the reduced force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ by (26) first and then use it to construct the landscape $\widetilde{V}(\bm{z};\theta^{*})$ by enhanced EPR in $\bm{z}$-space. The other is to build a high-dimensional potential $V(\bm{x};\theta^{*})$ by enhanced EPR first and then reduce it (see SI Section C.1). For this comparison, we apply our approach to a gene stem cell regulatory network described by an ODE in 52 dimensions [27] and take GATA6 (a major differentiation marker gene) and NANOG (a major stem cell marker gene) as the reduced variables $\bm{z}$, as suggested in [27]. We define $\mathcal{A}_{i}$ as the set of indices of genes activating $x_{i}$ and $\mathcal{R}_{i}$ as the set of indices of genes repressing $x_{i}$; the corresponding relationships are given by the 52-node network shown in [27]. For $i=1,\ldots,52$,

$$F_{i}(\bm{x})=-kx_{i}+\sum_{j\in\mathcal{A}_{i}}\frac{ax_{j}^{n}}{S^{n}+x_{j}^{n}}+\sum_{j\in\mathcal{R}_{i}}\frac{bS^{n}}{S^{n}+x_{j}^{n}},\tag{27}$$

where $a=0.37$, $b=0.5$, $k=1$, $S=0.5$, and $n=3$. We choose the noise strength $D=0.02$ and focus on the domain $\Omega=[0,3]^{52}$.

As shown in Fig. 5d, the projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ demonstrates the reduced dynamics, and the constructed potential $\widetilde{V_{1}}(\bm{z};\theta^{*})$ agrees well with the SDE-simulated samples. While $\widetilde{V_{1}}(\bm{z};\theta^{*})$ is learned by first obtaining the projected force, $\widetilde{V_{2}}(\bm{z};\theta^{*})$ is reduced from a high-dimensional $V(\bm{x};\theta^{*})$ precomputed by enhanced EPR. The gray plot of the absolute error $|\widetilde{V_{1}}(\bm{z};\theta^{*})-\widetilde{V_{2}}(\bm{z};\theta^{*})|$ is shown in Fig. 5e, which supports the consistency of the two potentials learned by different approaches. Taking $\widetilde{V_{1}}(\bm{z};\theta^{*})$ as the reference solution, we get an rRMSE of 0.113 and an rMAE of 0.097. These minor relative errors show that the two approaches to constructing the reduced potential are quantitatively consistent, although it is difficult to know which is more accurate. This also indirectly supports the reliability of the learned high-dimensional potential $V(\bm{x};\theta^{*})$. The obtained landscape shows a smoother and more delicate profile compared with the mean-field approach [27].

DISCUSSION AND CONCLUSION

The EPR-Net formulation can be extended to the case of state-dependent diffusion coefficients without difficulty. Consider the Itô SDE

$$\frac{\mathrm{d}\bm{x}(t)}{\mathrm{d}t}=\bm{F}(\bm{x})+\sqrt{2D}\,\sigma(\bm{x})\,\dot{\bm{w}},\quad\bm{x}(0)=\bm{x}_{0},\tag{28}$$

with diffusion matrix $\sigma(\bm{x})\in\mathbb{R}^{d\times m}$ and an $m$-dimensional temporal Gaussian white noise $\dot{\bm{w}}$. We assume that $m\geq d$ and that the matrix $a(\bm{x}):=(\sigma\sigma^{\top})(\bm{x})$ satisfies $\bm{u}^{\top}a(\bm{x})\bm{u}\geq c_{0}|\bm{u}|^{2}$ for all $\bm{x},\bm{u}\in\mathbb{R}^{d}$, where $c_{0}>0$ is a positive constant. Using a similar derivation as before, we can again show that the high-dimensional landscape function $U$ of (28) minimizes the EPR loss

$$\operatorname{L_{V-EPR}}(V)=\int_{\Omega}\left|\bm{F}^{v}(\bm{x})+a(\bm{x})\nabla V(\bm{x})\right|_{a^{-1}}^{2}\,\mathrm{d}\pi(\bm{x}),\tag{29}$$

where $\bm{F}^{v}(\bm{x})=\bm{F}(\bm{x})-D\nabla\cdot a(\bm{x})$ and $|\bm{u}|_{a^{-1}}^{2}:=\bm{u}^{\top}a^{-1}(\bm{x})\bm{u}$ for $\bm{u}\in\mathbb{R}^{d}$. We provide derivation details of (29) in SI Section D, and we leave the numerical study of (28)-(29) for future work.
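Although we leave numerical studies for future work, the loss (29) is as easy to implement as (3); a PyTorch sketch (our own names, assuming $\bm{F}^{v}$ and the matrix field $a$ are supplied as callables) is:

```python
import torch

def v_epr_loss(V, Fv, a, x):
    """Variable-diffusion EPR loss (29): the a^{-1}-weighted norm of
    F^v + a grad V, averaged over steady-state samples.

    Fv : modified drift F - D div(a), (N, d) -> (N, d)
    a  : diffusion matrix field, (N, d) -> (N, d, d), uniformly positive definite
    """
    x = x.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    A = a(x)
    r = Fv(x) + torch.einsum('nij,nj->ni', A, grad_v)
    # Solve A s = r instead of forming A^{-1} explicitly.
    s = torch.linalg.solve(A, r.unsqueeze(-1)).squeeze(-1)
    return (r * s).sum(dim=1).mean()
```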

To summarize, we have demonstrated the formulation, applicability and superiority of EPR over some existing approaches on benchmark and real biological examples. Below we make some final remarks. First, concerning the use of the steady-state distribution $\pi(\bm{x})$ in (3) and its approximation by a long time series of the SDE (1) in EPR-Net, we emphasize that it is the sampling approximation of $\pi$ that naturally captures the important parts of the potential function; the landscape beyond the sampled regions is not that essential in practice. Second, as exemplified in Fig. 2 and Table 1, we found that a direct application of density estimation methods (DEMs), e.g., normalizing flows [39], to the sampled time series data does not give a potential landscape of satisfactory accuracy. We speculate that this deficiency of DEMs is due to their over-generality and the fact that they do not explicitly take advantage of the force field information, in contrast to (3). Finally, we remark that in our dimensionality reduction approach, we choose the reduced variables according to available biophysical prior knowledge. Our approach also exhibits the capability to be extended to incorporate a state-dependent diffusion coefficient. Automatic selection of general nonlinear reduced variables can also be studied by incorporating ideas for the identification of collective variables in molecular simulations [47, 48].

Overall, we have presented EPR-Net, a simple yet effective DNN approach for constructing the non-equilibrium potential landscape of NESS systems. This approach is both elegant and robust due to its variational structure and its flexibility to be combined with other types of loss functions. Further extensions of dimensionality reduction to nonlinear reduced variables and numerical investigations of the state-dependent diffusion coefficient case will be explored in future work. Moreover, we are actively considering the application of our method to single-cell RNA sequencing data, which involves systems of very high dimensionality that typically lack analytical ODE formulations. We will explore this point in our forthcoming studies.

METHODS

Detailed methods are given in the supplementary material. The code is accessible at https://github.com/yzhao98/EPR-Net.

ACKNOWLEDGMENTS

We thank Professors Chunhe Li, Xiaoliang Wan and Dr. Yufei Ma for helpful discussions. The computation of this work was conducted on the High-performance Computing Platform of Peking University.

FUNDING

T.L. and Y.Z. acknowledge support from the National Natural Science Foundation of China (11825102 and 12288101) and the Ministry of Science and Technology of China (2021YFA1003301). The work of W.Z. was supported by the Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy, part of the Berlin Mathematics Research Centre MATH+ (EXC-2046/1, 390685689), and the DFG through Grant CRC 1114 ‘Scaling Cascades in Complex Systems’ (235221301).

AUTHOR CONTRIBUTIONS

Y.Z., W.Z. and T.L. designed the research; Y.Z. performed the research; Y.Z., W.Z. and T.L. wrote the paper.

Conflict of interest statement. None declared.

References

  • [1] Waddington C. The strategy of the genes 1957.
  • [2] Ao P. Potential in stochastic differential equations: novel construction. J. Phys. A-Math. Gen. 2004; 37: L25.
  • [3] Wang J, Xu L and Wang E. Potential landscape and flux framework of nonequilibrium networks: Robustness, dissipation, and coherence of biochemical oscillations. Proc. Nat. Acad. Sci. USA 2008; 105: 12271.
  • [4] Wang J, Li C and Wang E. Potential and flux landscapes quantify the stability and robustness of budding yeast cell cycle network. Proc. Nat. Acad. Sci. USA 2010; 107: 8195.
  • [5] Wang J, Zhang K, Xu L et al. Quantifying the Waddington landscape and biological paths for development and differentiation. Proc. Nat. Acad. Sci. USA 2011; 108: 8257.
  • [6] Zhou J, Aliyu M, Aurell E et al. Quasi-potential landscape in complex multi-stable systems. J. R. Soc., Interface 2012; 9: 3539.
  • [7] Ge H and Qian H. Landscapes of non-gradient dynamics without detailed balance: Stable limit cycles and multiple attractors. Chaos 2012; 22: 023140.
  • [8] Lv C, Li X, Li F et al. Constructing the energy landscape for genetic switching system driven by intrinsic noise. PLoS ONE 2014; 9: e88167.
  • [9] Shi J, Aihara K, Li T et al. Energy landscape decomposition for cell differentiation with proliferation effect. Nat. Sci. Rev. 2022; 9: nwac116.
  • [10] Maier RS and Stein DL. Escape problem for irreversible systems. Phys. Rev. E 1993; 48: 931.
  • [11] Aurell E and Sneppen K. Epigenetics as a first exit problem. Phys. Rev. Lett. 2002; 88: 048101.
  • [12] Sasai M and Wolynes P. Stochastic gene expression as a many-body problem. Proc. Nat. Acad. Sci. USA 2003; 100: 2374.
  • [13] Roma DM, O’Flanagan RA, Ruckenstein AE et al. Optimal path to epigenetic switching. Phys. Rev. E 2005; 71: 011902.
  • [14] Ferrell JEJ. Bistability, bifurcations, and Waddington’s epigenetic landscape. Curr. Biol. 2012; 22: R458.
  • [15] Wang J. Landscape and flux theory of non-equilibrium dynamical systems with application to biology. Adv. Phys. 2015; 64: 1.
  • [16] Zhou P and Li T. Construction of the landscape for multi-stable systems: Potential landscape, quasi-potential, A-type integral and beyond. J. Chem. Phys. 2016; 144: 094109.
  • [17] Yuan R, Zhu X, Wang G et al. Cancer as robust intrinsic state shaped by evolution: a key issues review. Rep. Prog. Phys. 2017; 80: 042701.
  • [18] Fang X, Kruse K, Lu T et al. Nonequilibrium physics in biology. Rev. Mod. Phys. 2019; 91: 045004.
  • [19] Li C, Hong T and Nie Q. Quantifying the landscape and kinetic paths for epithelial–mesenchymal transition from a core circuit. Phys. Chem. Chem. Phys. 2016; 18: 17949–56.
  • [20] Cameron M. Finding the quasipotential for nongradient SDEs. Phys. D 2012; 241: 1532.
  • [21] Li C and Wang J. Landscape and flux reveal a new global view and physical quantification of mammalian cell cycle. Proc. Nat. Acad. Sci. USA 2014; 111: 14130.
  • [22] Lin B, Li Q and Ren W. Computing the invariant distribution of randomly perturbed dynamical systems using deep learning. J. Sci. Comp. 2022; 91: 77.
  • [23] Lin B, Li Q and Ren W. Computing high-dimensional invariant distributions from noisy data. J. Comp. Phys. 2023; 474: 111783.
  • [24] Lin B, Li Q and Ren W. A data driven method for computing quasipotentials. Proc. Mach. Learn. Res., volume 145 of 2nd Annual Conference on Mathematical and Scientific Machine Learning (2021) 652.
  • [25] Crandall MG and Lions PL. Viscosity solution of Hamilton-Jacobi equations. Trans. Amer. Math. Soc. 1983; 277: 1.
  • [26] Raissi M, Perdikaris P and Karniadakis GE. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comp. Phys. 2019; 378: 686.
  • [27] Li C and Wang J. Quantifying cell fate decisions for differentiation and reprogramming of a human stem cell network: Landscape and biological paths. PLoS Comput. Biol. 2013; 9: e1003165.
  • [28] Goodfellow I, Bengio Y and Courville A. Deep learning 2016.
  • [29] Zhang L, Han J, Wang H et al. DeePCG: Constructing coarse-grained models via deep neural networks. J. Chem. Phys. 2018; 149: 034101.
  • [30] Husic BE, Charron NE, Lemm D et al. Coarse graining molecular dynamics with graph neural networks. J. Chem. Phys. 2020; 153: 194101.
  • [31] Hyvärinen A. Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 2005; 6: 695–709.
  • [32] Song Y and Ermon S. Generative modeling by estimating gradients of the data distribution. Wallach H, Larochelle H, Beygelzimer A et al., editors, Advances in Neural Information Processing Systems, volume 32 of NeurIPS (2019) 11918–30.
  • [33] Courant R and Hilbert D. Methods of mathematical physics 1953.
  • [34] Khasminskii R. Stochastic stability of differential equations 2012.
  • [35] Paszke A, Gross S, Massa F et al. Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, volume 32 of NeurIPS (2019) 8024.
  • [36] Qian H. Mesoscopic nonequilibrium thermodynamics of single macromolecules and dynamic entropy-energy compensation. Phys. Rev. E 2001; 65: 016102.
  • [37] Zhang X, Qian H and Qian M. Stochastic theory of nonequilibrium steady states and its applications. part i. Phys. Rep. 2012; 510: 1–86.
  • [38] Mei S, Montanari A and Nguyen PM. A mean field view of the landscape of two-layer neural networks. Proc. Nat. Acad. Sci. USA 2018; 115: E7665–E71.
  • [39] Kobyzev I, Prince SJD and Brubaker M. Normalizing flows: An introduction and review of current methods. IEEE Trans. Patt. Anal. Mach. Intel. 2021; 43: 3964–79.
  • [40] Zhu XM, Yin L and Ao P. Limit cycle and conserved dynamics. International Journal of Modern Physics B 2006; 20: 817–27.
  • [41] Ma Y, Tan Q, Yuan R et al. Potential function in a continuous dissipative chaotic system: Decomposition scheme and role of strange attractor. International Journal of Bifurcation and Chaos 2014; 24: 1450015.
  • [42] Lorenz EN. Deterministic nonperiodic flow. J. Atmos. Sci. 1963; 20: 130–41.
  • [43] Zhang W, Hartmann C and Schütte C. Effective dynamics along given reaction coordinates, and reaction rate theory. Faraday Discuss. 2016; 195: 365–94.
  • [44] Kang X and Li C. A dimension reduction approach for energy landscape: Identifying intermediate states in metabolism-emt network. Adv. Sci. 2021; 8: 2003133.
  • [45] Ferrell JE, Tsai TYC and Yang Q. Modeling the cell cycle: Why do certain circuits oscillate? Cell 2011; 144: 874–85.
  • [46] Hirsch MW, Smale S and Devaney RL. Differential equations, dynamical systems, and an introduction to chaos 2004.
  • [47] Bonati L, Zhang YY and Parrinello M. Neural networks-based variationally enhanced sampling. Proc. Nat. Acad. Sci. USA 2019; 116: 17641–7.
  • [48] Bonati L, Rizzi V and Parrinello M. Data-driven collective variables for enhanced sampling. J. Phys. Chem. Lett. 2020; 11: 2998–3004.
  • [49] Evans LC. Partial differential equations 2010.
  • [50] Holmes MH. Introduction to perturbation methods 2013.
  • [51] Frenkel D and Smit B. Understanding molecular simulation: From algorithms to applications 2002.
  • [52] Kingma DP and Ba J. Adam: a method for stochastic optimization. Proceedings of the International Conference on Learning Representations (2015) .
  • [53] Durkan C, Bekasov A, Murray I et al. Neural spline flows. Advances in Neural Information Processing Systems, volume 32 of NeurIPS (2019) .

Supplementary Information for EPR-Net:
Constructing non-equilibrium potential landscape via a variational force projection formulation

In this supplementary information (SI), we present further theoretical derivations and computational details supplementing the main text (MT). This SI consists of two parts: theory and computation.

PART 1: THEORY

We will first provide details of theoretical derivations omitted in the MT.

Appendix A Derivation of the EPR loss

In this section, we show that, up to an additive constant, the potential function $U(\bm{x}):=-D\ln p_{\mathrm{ss}}(\bm{x})$ is the unique minimizer of the EPR loss in equation (1) in MT.

First, we show that the orthogonality relation

\int_{\Omega}(\bm{F}+\nabla U)\cdot\nabla W\,\differential\pi=0 (30)

holds for any suitable function $W(\bm{x}):\mathbb{R}^{d}\rightarrow\mathbb{R}$ under both choices of the boundary conditions considered in this work, where $\differential\pi(\bm{x}):=p_{\mathrm{ss}}(\bm{x})\,\differential\bm{x}$. To see this, we note that

\begin{align*}
\int_{\Omega}(\bm{F}+\nabla U)\cdot\nabla W\,\differential\pi
&=\int_{\Omega}(p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}})\cdot\nabla W\,\differential\bm{x}\\
&=\int_{\partial\Omega}W\,(p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}})\cdot\bm{n}\,\differential\bm{x}-\int_{\Omega}W\,\nabla\cdot(p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}})\,\differential\bm{x}\\
&=0,
\end{align*}

where we have used integration by parts, the relation $p_{\mathrm{ss}}(\bm{x})=\exp(-U(\bm{x})/D)$, the corresponding BC, and the steady state Fokker-Planck equation (FPE) satisfied by $p_{\mathrm{ss}}$.

Now consider the EPR loss. We have

\begin{align*}
\operatorname{L_{EPR}}(V)&=\int_{\Omega}|\bm{F}+\nabla V|^{2}\,\differential\pi\\
&=\int_{\Omega}\Big(|\bm{F}+\nabla U|^{2}+|\nabla V-\nabla U|^{2}\Big)\,\differential\pi,
\end{align*}

where we have used the orthogonality relation (30) to arrive at the last equality, from which we conclude that $U(\bm{x})$ is the unique minimizer of the EPR loss up to an additive constant.
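For concreteness, a minimal PyTorch sketch of the empirical EPR loss is given below. The network `V`, the drift function `F`, and the sample tensor `x` are placeholder names of our own choosing for the corresponding objects in equation (3) in MT; this is an illustrative sketch, not the exact implementation used in the experiments.

```python
import torch

def epr_loss(V, F, x):
    """Monte Carlo estimate of L_EPR(V) = E_pi |F(x) + grad V(x)|^2.

    V: scalar potential network, maps (N, d) -> (N, 1);
    F: drift function, maps (N, d) -> (N, d);
    x: (N, d) samples approximately distributed as the invariant measure pi.
    """
    x = x.detach().requires_grad_(True)
    # gradient of the scalar output with respect to the inputs via autograd
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    return ((F(x) + grad_v) ** 2).sum(dim=1).mean()
```

Minimizing this loss over the network parameters yields $V\approx U$ up to an additive constant, and the minimal loss value itself is, by construction, a Monte Carlo estimate of $\int_{\Omega}|\bm{F}+\nabla U|^{2}\,\differential\pi$, the quantity entering the steady EPR.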

Appendix B Stability of the EPR minimizer

In this section, we formally show that small perturbations of the invariant distribution $\pi$ induce only correspondingly small changes in the minimizer of the corresponding EPR loss. We only consider the case where $\Omega$ is a $d$-dimensional hyperrectangle; the argument for $\Omega=\mathbb{R}^{d}$ is similar.

Suppose $\differential\pi(\bm{x})=p(\bm{x})\,\differential\bm{x}$, $\differential\mu(\bm{x})=q(\bm{x})\,\differential\bm{x}$, and the functions $U(\bm{x})$ and $\bar{U}(\bm{x})$ are the unique minimizers (up to a constant) of the following two EPR losses

\begin{align*}
U&=\mathop{\arg\min}_{V}\int_{\Omega}\big|\bm{F}+\nabla V\big|^{2}\,\differential\pi,\\
\bar{U}&=\mathop{\arg\min}_{V}\int_{\Omega}\big|\bm{F}+\nabla V\big|^{2}\,\differential\mu,
\end{align*}

respectively. It is not difficult to find that the Euler-Lagrange equations of $U$ and $\bar{U}$ are given by the following partial differential equations (PDEs) with suitable BCs:

\begin{align*}
&\nabla\cdot\big((\bm{F}+\nabla U)\,p\big)=0\ \text{in }\Omega,\qquad(\bm{F}+\nabla U)\cdot\bm{n}=0\ \text{on }\partial\Omega,\\
&\nabla\cdot\big((\bm{F}+\nabla\bar{U})\,q\big)=0\ \text{in }\Omega,\qquad(\bm{F}+\nabla\bar{U})\cdot\bm{n}=0\ \text{on }\partial\Omega.
\end{align*}

The PDEs above, defined inside the domain $\Omega$, can be converted to

\begin{align*}
p\,\Delta U+\nabla U\cdot\nabla p&=-\nabla\cdot(p\bm{F}),\\
q\,\Delta\bar{U}+\nabla\bar{U}\cdot\nabla q&=-\nabla\cdot(q\bm{F}).
\end{align*}

Define $U_{0}(\bm{x})=-D\ln p(\bm{x})$ and $\bar{U}_{0}(\bm{x})=-D\ln q(\bm{x})$. We then obtain

-\nabla U\cdot\nabla U_{0}+D\Delta U=\bm{F}\cdot\nabla U_{0}-D\nabla\cdot\bm{F}, (31)
-\nabla\bar{U}\cdot\nabla\bar{U}_{0}+D\Delta\bar{U}=\bm{F}\cdot\nabla\bar{U}_{0}-D\nabla\cdot\bm{F}. (32)

Assuming that $\delta U_{0}:=U_{0}-\bar{U}_{0}=O(\varepsilon)$, where $0<\varepsilon\ll 1$ denotes a small constant, we obtain the PDE for $U-\bar{U}$ by subtracting (32) from (31):

-\nabla(U-\bar{U})\cdot\nabla U_{0}+D\Delta(U-\bar{U})=\bm{F}\cdot\nabla(\delta U_{0})+\nabla\bar{U}\cdot\nabla(\delta U_{0})

with BC $\nabla(U-\bar{U})\cdot\bm{n}=0$. Since $U_{0},\bar{U},\bm{F}\sim O(1)$, we obtain

U(\bm{x})-\bar{U}(\bm{x})=O(\varepsilon)

by the regularity theory of elliptic PDEs [49, Section 6.3] when $D\sim O(1)$, or by matched asymptotic expansion when $D\ll 1$ [50, Chapter 2]. In fact, the closeness between $U(\bm{x})$ and $\bar{U}(\bm{x})$ can be ensured as long as $U_{0}$ and $\bar{U}_{0}$ are close enough in the region where $p(\bm{x})$ and $q(\bm{x})$ are bounded away from zero, by the method of characteristics [49, Section 2.1] and matched asymptotics.

The above derivations assume that the PDFs $p$ and $q$ are smooth enough. A stability analysis for general distributions $\pi$ and $\mu$ is also possible using tools from functional analysis. However, the derivation would be abstract and quite involved, and we will report it elsewhere.

Appendix C Dimensionality reduction

In this section, we study dimensionality reduction for high-dimensional problems in order to learn the projected potential. A straightforward approach is to first learn the high-dimensional potential $U$ and then find its low-dimensional representation, i.e., the reduced potential or free energy function, using dimensionality reduction techniques. An alternative approach is to directly learn the low-dimensional reduced potential.

Denote by $\bm{x}=(\bm{y},\bm{z})^{\top}\in\Omega$. As in MT Section 1.4, we assume the domain

\Omega=\Sigma\times\widetilde{\Omega},

where $\Sigma\subseteq\mathbb{R}^{d-2}$ and $\widetilde{\Omega}\subseteq\mathbb{R}^{2}$ are the domains of $\bm{y}$ and $\bm{z}$, respectively. The reduced potential $\widetilde{U}(\bm{z})$ is defined as

\widetilde{U}(\bm{z})=-D\ln\widetilde{p}_{\mathrm{ss}}(\bm{z})=-D\ln\int_{\Sigma}p_{\mathrm{ss}}(\bm{y},\bm{z})\,\differential\bm{y}. (33)

One natural approach for constructing $\widetilde{U}(\bm{z})$ is to directly integrate $p_{\mathrm{ss}}(\bm{y},\bm{z})$ based on the potential $U(\bm{y},\bm{z})$ learned with the EPR loss, i.e.,

\widetilde{U}(\bm{z})=-D\ln\int_{\Sigma}\exp(-U(\bm{y},\bm{z})/D)\,\differential\bm{y}. (34)

However, performing this integration is not a straightforward numerical task (see, e.g., [51, Chapter 7]).
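To illustrate the difficulty, here is a naive Monte Carlo sketch of (34) at a single reduced point $\bm{z}$. The helper name `reduced_potential_mc` and the uniform proposal on $\Sigma$ are our own illustrative choices; the estimator degrades rapidly as the dimension of $\bm{y}$ grows or $D$ shrinks, which is why the sample-based losses studied below are preferred.

```python
import torch

def reduced_potential_mc(U, z, y_samples, D, vol_sigma=1.0):
    """Naive Monte Carlo estimate of eq. (34) at one point z.

    U: learned potential, maps (M, d) -> (M, 1);
    z: (1, 2) reduced point; y_samples: (M, d-2) points drawn uniformly on Sigma;
    vol_sigma: volume of Sigma (contributes only an additive constant).
    """
    m = y_samples.shape[0]
    x = torch.cat([y_samples, z.expand(m, -1)], dim=1)
    log_w = -U(x).squeeze(-1) / D                    # log integrand, shape (M,)
    # logsumexp for numerical stability of the exponential average
    log_mean = torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(m)))
    return -D * (log_mean + torch.log(torch.tensor(vol_sigma)))
```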

C.1 Gradient projection loss

In this subsection, we study a simple approach to approximate $\widetilde{U}(\bm{z})$ based on sample points that approximately obey the invariant distribution $\pi(\bm{x})$ and on the high-dimensional potential function $U(\bm{x})$ learned with the EPR loss. This approach is used in the consistency check of the reduced potentials obtained by different methods in SI Section G.3. The idea is to utilize a gradient projection (GP) loss on the $\bm{z}$-components of $\nabla U$:

\operatorname{L_{GP}}(\widetilde{V})=\int_{\Omega}\big|\nabla_{\bm{z}}U(\bm{y},\bm{z})-\nabla_{\bm{z}}\widetilde{V}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}). (35)

To justify (35), we note that

\begin{align*}
\operatorname{L_{GP}}(\widetilde{V})&=\int_{\Omega}\big|\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\big|\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{U}+\nabla_{\bm{z}}\widetilde{U}-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\Big(\big|\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{U}\big|^{2}+\big|\nabla_{\bm{z}}\widetilde{U}-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\Big)\,\differential\pi(\bm{x})\\
&\quad+2\int_{\Omega}(\nabla_{\bm{z}}U-\nabla_{\bm{z}}\widetilde{U})\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\pi(\bm{x})\\
&=:P_{1}+P_{2},
\end{align*}

where $P_{1}$ and $P_{2}$ denote the terms in the third and the fourth lines above, respectively. The term $P_{2}=0$ since

\begin{align*}
\int_{\Omega}\nabla_{\bm{z}}U\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\pi(\bm{x})
&=\int_{\widetilde{\Omega}}\Big(\int_{\Sigma}\nabla_{\bm{z}}U\,\mathrm{e}^{-\frac{U}{D}}\,\differential\bm{y}\Big)\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\bm{z}\\
&=-D\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\Big(\int_{\Sigma}\mathrm{e}^{-\frac{U}{D}}\,\differential\bm{y}\Big)\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\bm{z}\\
&=-D\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\bm{z}\\
&=\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\widetilde{U}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\widetilde{p}_{\mathrm{ss}}\,\differential\bm{z}
\end{align*}

and

\int_{\Omega}\nabla_{\bm{z}}\widetilde{U}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\differential\pi(\bm{x})=\int_{\widetilde{\Omega}}\nabla_{\bm{z}}\widetilde{U}\cdot\nabla_{\bm{z}}(\widetilde{U}-\widetilde{V})\,\widetilde{p}_{\mathrm{ss}}\,\differential\bm{z},

which cancel each other in $P_{2}$.

Therefore, the minimization of the GP loss is equivalent to minimizing

\int_{\widetilde{\Omega}}\big|\nabla_{\bm{z}}\widetilde{U}-\nabla_{\bm{z}}\widetilde{V}\big|^{2}\,\widetilde{p}_{\mathrm{ss}}\,\differential\bm{z},

which clearly implies that $\widetilde{U}(\bm{z})$ is the unique minimizer (up to a constant) of the proposed GP loss.
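A hedged PyTorch sketch of the empirical GP loss (35) follows. Here `grad_z_U` is assumed to be precomputed by autograd from the previously learned full potential at samples $(\bm{y}_i,\bm{z}_i)\sim\pi$, and `z` holds the corresponding $\bm{z}_i$; the names are illustrative placeholders.

```python
import torch

def gp_loss(V_tilde, grad_z_U, z):
    """Empirical GP loss (35).

    V_tilde: reduced potential network, maps (N, 2) -> (N, 1);
    grad_z_U: (N, 2) z-components of grad U at full-space samples from pi;
    z: (N, 2) z-components of the same samples.
    """
    z = z.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V_tilde(z).sum(), z, create_graph=True)
    return ((grad_z_U - grad_v) ** 2).sum(dim=1).mean()
```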

C.2 Projected EPR loss

In this subsection, we study the projected EPR (P-EPR) loss, which has the form

\operatorname{L_{P-EPR}}(\widetilde{V})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})+\nabla_{\bm{z}}\widetilde{V}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}), (36)

where $\bm{F}_{\bm{z}}(\bm{y},\bm{z})\in\mathbb{R}^{2}$ is the $\bm{z}$-component of the force field $\bm{F}=(\bm{F}_{\bm{y}},\bm{F}_{\bm{z}})^{\top}$.

Define

\operatorname{\widetilde{L}_{P-EPR}}(\widetilde{V})=\int_{\Omega}\big|\bm{F}(\bm{y},\bm{z})+\nabla\widetilde{V}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}), (37)

where $\nabla$ is the full gradient with respect to $\bm{x}$. To justify (36), we first note the following equivalence

\min\operatorname{L_{P-EPR}}(\widetilde{V})\quad\Longleftrightarrow\quad\min\operatorname{\widetilde{L}_{P-EPR}}(\widetilde{V}), (38)

since $\nabla_{\bm{y}}\widetilde{V}(\bm{z})=0$ and the $\bm{y}$-components of $\bm{F}+\nabla\widetilde{V}$ only introduce an irrelevant constant in (37). Furthermore, we have

\begin{align*}
\operatorname{\widetilde{L}_{P-EPR}}(\widetilde{V})&=\int_{\Omega}\big|\bm{F}+\nabla\widetilde{V}\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\big|\bm{F}+\nabla U+\nabla\widetilde{V}-\nabla U\big|^{2}\,\differential\pi(\bm{x})\\
&=\int_{\Omega}\Big(\big|\bm{F}+\nabla U\big|^{2}+\big|\nabla\widetilde{V}-\nabla U\big|^{2}\Big)\,\differential\pi(\bm{x}),
\end{align*}

where the last equality is due to the orthogonality relation (30).

Using an argument similar to the one for deriving (38), the equivalence (38) itself, and the GP loss in (35), we get

\min\operatorname{L_{P-EPR}}(\widetilde{V})\quad\Longleftrightarrow\quad\min\operatorname{L_{GP}}(\widetilde{V}). (39)

Since $\widetilde{U}$ minimizes the GP loss, as shown in the previous subsection, we conclude that $\widetilde{U}$ also minimizes the loss in (36).

C.3 Force projection loss

In this subsection, we study the force projection (P-For) loss for approximating the projection of $\bm{F}_{\bm{z}}$ onto the $\bm{z}$-space.

Denote by

\widetilde{\bm{F}}(\bm{z}):=\int_{\Sigma}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\,\differential\pi(\bm{y}|\bm{z}) (40)

the projected force defined using the conditional distribution

\differential\pi(\bm{y}|\bm{z})=p_{\mathrm{ss}}(\bm{y},\bm{z})/\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{y}. (41)

We can learn $\widetilde{\bm{F}}(\bm{z})$ via the following force projection loss

\operatorname{L_{P-For}}(\widetilde{\bm{G}})=\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})-\widetilde{\bm{G}}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z}). (42)

To justify (42), we note that

\begin{align*}
\int_{\Omega}\big|\bm{F}_{\bm{z}}(\bm{y},\bm{z})-\widetilde{\bm{G}}(\bm{z})\big|^{2}\,\differential\pi(\bm{y},\bm{z})
&=\int_{\Omega}\Big(|\bm{F}_{\bm{z}}(\bm{y},\bm{z})|^{2}+|\widetilde{\bm{G}}(\bm{z})|^{2}\Big)\,\differential\pi(\bm{y},\bm{z})-2\int_{\Omega}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\cdot\widetilde{\bm{G}}(\bm{z})\,\differential\pi(\bm{y},\bm{z})\\
&=:P_{1}-2P_{2}.
\end{align*}

The term $P_{2}$ can be simplified as

P_{2}=\int_{\widetilde{\Omega}}\Big[\int_{\Sigma}\bm{F}_{\bm{z}}(\bm{y},\bm{z})\,\differential\pi(\bm{y}|\bm{z})\Big]\cdot\widetilde{\bm{G}}(\bm{z})\,\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{z}=\int_{\widetilde{\Omega}}\widetilde{\bm{F}}(\bm{z})\cdot\widetilde{\bm{G}}(\bm{z})\,\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{z}.

Therefore, we have the equivalence

\min\operatorname{L_{P-For}}(\widetilde{\bm{G}})\quad\Longleftrightarrow\quad\min\operatorname{\widetilde{L}_{P-For}}(\widetilde{\bm{G}}), (43)

where

\operatorname{\widetilde{L}_{P-For}}(\widetilde{\bm{G}}):=\int_{\widetilde{\Omega}}\big|\widetilde{\bm{F}}(\bm{z})-\widetilde{\bm{G}}(\bm{z})\big|^{2}\,\widetilde{p}_{\mathrm{ss}}(\bm{z})\,\differential\bm{z}.

From the analysis above, we conclude that $\widetilde{\bm{F}}(\bm{z})$ minimizes the loss in (42).
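In practice, learning $\widetilde{\bm{F}}$ thus amounts to a least-squares regression of the $\bm{z}$-components of the force onto a function of $\bm{z}$ alone. A minimal sketch, with placeholder names of our own choosing, is:

```python
import torch

def p_for_loss(G_tilde, F_z, z):
    """Empirical force projection loss (42).

    G_tilde: reduced force network, maps (N, 2) -> (N, 2);
    F_z: (N, 2) z-components of the force at full-space samples from pi;
    z: (N, 2) z-components of the same samples. The minimizer over all
    functions of z is the conditional average F~(z) in (40).
    """
    return ((F_z - G_tilde(z)) ** 2).sum(dim=1).mean()
```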

C.4 HJB equation for the reduced potential

In this subsection, we show that the reduced potential $\widetilde{U}$ satisfies the projected HJB equation

\widetilde{\bm{F}}\cdot\nabla_{\bm{z}}\widetilde{U}+|\nabla_{\bm{z}}\widetilde{U}|^{2}-D\Delta_{\bm{z}}\widetilde{U}-D\nabla_{\bm{z}}\cdot\widetilde{\bm{F}}=0, (44)

with asymptotic BC $\widetilde{U}\rightarrow\infty$ as $|\bm{z}|\rightarrow\infty$, or the reflecting BC $(\widetilde{\bm{F}}+\nabla_{\bm{z}}\widetilde{U})\cdot\widetilde{\bm{n}}=0$ on $\partial\widetilde{\Omega}$, where $\widetilde{\bm{n}}$ denotes the unit outer normal on $\partial\widetilde{\Omega}$. We only consider the rectangular domain case here; the argument for the unbounded case is similar.

Recall that $p_{\mathrm{ss}}(\bm{x})$ satisfies the FPE

\nabla\cdot(p_{\mathrm{ss}}\bm{F})-D\Delta p_{\mathrm{ss}}=0. (45)

Integrating both sides of (45) over $\Sigma$ with respect to $\bm{y}$ and utilizing the boundary condition $\bm{J}_{\mathrm{ss}}\cdot\bm{n}=0$, where $\bm{J}_{\mathrm{ss}}=p_{\mathrm{ss}}\bm{F}-D\nabla p_{\mathrm{ss}}$, we get

\nabla_{\bm{z}}\cdot\Big(\int_{\Sigma}\bm{F}_{\bm{z}}\,p_{\mathrm{ss}}\,\differential\bm{y}\Big)-D\Delta_{\bm{z}}\widetilde{p}_{\mathrm{ss}}=0. (46)

Taking (40) and (41) into account, we obtain

\nabla_{\bm{z}}\cdot\big(\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}\big)-D\Delta_{\bm{z}}\widetilde{p}_{\mathrm{ss}}=\nabla_{\bm{z}}\cdot\widetilde{\bm{J}}=0, (47)

i.e., an FPE for $\widetilde{p}_{\mathrm{ss}}(\bm{z})$ with the reduced force field $\widetilde{\bm{F}}$, where $\widetilde{\bm{J}}:=\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}-D\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}$. The corresponding boundary condition can also be derived, by integrating the original BC $\bm{J}_{\mathrm{ss}}\cdot\bm{n}=0$ over $\Sigma$ with respect to $\bm{y}$ for $\bm{z}\in\partial\widetilde{\Omega}$, which gives

\widetilde{\bm{J}}\cdot\widetilde{\bm{n}}=\big(\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}-D\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}\big)\cdot\widetilde{\bm{n}}=0. (48)

Substituting the relation $\widetilde{p}_{\mathrm{ss}}(\bm{z})=\exp(-\widetilde{U}(\bm{z})/D)$ into (47) and (48), we get (44) and the corresponding reflecting BC after some algebraic manipulations.
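For the reader's convenience, the algebra is as follows: with $\widetilde{p}_{\mathrm{ss}}=\exp(-\widetilde{U}/D)$ we have $\nabla_{\bm{z}}\widetilde{p}_{\mathrm{ss}}=-\widetilde{p}_{\mathrm{ss}}\nabla_{\bm{z}}\widetilde{U}/D$, so (47) becomes
\begin{align*}
0&=\nabla_{\bm{z}}\cdot\big(\widetilde{p}_{\mathrm{ss}}\widetilde{\bm{F}}\big)-D\Delta_{\bm{z}}\widetilde{p}_{\mathrm{ss}}\\
&=\widetilde{p}_{\mathrm{ss}}\Big(\nabla_{\bm{z}}\cdot\widetilde{\bm{F}}-\frac{1}{D}\widetilde{\bm{F}}\cdot\nabla_{\bm{z}}\widetilde{U}\Big)-D\,\widetilde{p}_{\mathrm{ss}}\Big(\frac{|\nabla_{\bm{z}}\widetilde{U}|^{2}}{D^{2}}-\frac{\Delta_{\bm{z}}\widetilde{U}}{D}\Big),
\end{align*}
and multiplying by $-D/\widetilde{p}_{\mathrm{ss}}$ yields exactly (44); the reflecting BC follows in the same way from (48).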

Appendix D State-dependent diffusion coefficients

In this section, we study the EPR loss for NESS systems with a state-dependent diffusion coefficient.

Consider the Itô SDE

\frac{\differential\bm{x}(t)}{\differential t}=\bm{F}(\bm{x}(t))+\sqrt{2D}\,\sigma(\bm{x}(t))\,\dot{\bm{w}} (49)

with the state-dependent diffusion matrix $\sigma(\bm{x})$. Under the same assumptions as in MT Section 1.1, we have the FPE

\nabla\cdot(p_{\mathrm{ss}}\bm{F})-D\nabla^{2}:\big(p_{\mathrm{ss}}a\big)=0, (50)

where $a:=\sigma\sigma^{\top}$. We show that the high-dimensional landscape function $U$ of (49) minimizes the EPR loss

\operatorname{L_{V-EPR}}(V)=\int_{\Omega}\big|\bm{F}^{v}(\bm{x})+a(\bm{x})\nabla V(\bm{x})\big|_{a^{-1}(\bm{x})}^{2}\,\differential\pi(\bm{x}), (51)

where $\bm{F}^{v}(\bm{x}):=\bm{F}(\bm{x})-D\nabla\cdot a(\bm{x})$ and $|\bm{u}|_{a^{-1}(\bm{x})}^{2}:=\bm{u}^{\top}a^{-1}(\bm{x})\bm{u}$ for $\bm{u}\in\mathbb{R}^{d}$.

To justify (51), we first note that (50) can be rewritten as

\nabla\cdot(p_{\mathrm{ss}}\bm{F}^{v}-Da\nabla p_{\mathrm{ss}})=0, (52)

which, together with the BC, implies the orthogonality relation

\int_{\Omega}\big(\bm{F}^{v}+a\nabla U\big)\cdot\nabla W\,\differential\pi=0 (53)

for a suitable test function $W(\bm{x})$. Following the same reasoning used in establishing (30) and utilizing (53), we have

\begin{align*}
\int_{\Omega}\big|\bm{F}^{v}+a\nabla V\big|_{a^{-1}}^{2}\,\differential\pi
&=\int_{\Omega}\big|\bm{F}^{v}+a\nabla U+a\nabla(V-U)\big|_{a^{-1}}^{2}\,\differential\pi\\
&=\int_{\Omega}\big|\bm{F}^{v}+a\nabla U\big|_{a^{-1}}^{2}\,\differential\pi+\int_{\Omega}\big|a\nabla(V-U)\big|_{a^{-1}}^{2}\,\differential\pi.
\end{align*}

The last expression implies that $U(\bm{x})$ is the unique minimizer of $\operatorname{L_{V-EPR}}(V)$ up to a constant.

The above derivation for the state-dependent diffusion case permits us to construct landscapes for chemical Langevin dynamics, which will be studied in future work.
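A minimal PyTorch sketch of the loss (51) is given below, assuming the diffusion matrices $a(\bm{x}_i)$ and the modified drift $\bm{F}^{v}(\bm{x}_i)$ are precomputed at the samples; the batched inverse via `torch.linalg.inv` and all names are our own illustrative choices.

```python
import torch

def v_epr_loss(V, F_v, a, x):
    """Empirical V-EPR loss (51).

    V: potential network, maps (N, d) -> (N, 1);
    F_v: (N, d) values of F - D * (div a) at samples x ~ pi;
    a: (N, d, d) diffusion tensors sigma sigma^T at the same samples.
    """
    x = x.detach().requires_grad_(True)
    grad_v, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    r = F_v + torch.einsum('nij,nj->ni', a, grad_v)   # residual F^v + a grad V
    a_inv = torch.linalg.inv(a)                       # batched matrix inverse
    # weighted norm |r|^2_{a^{-1}} = r^T a^{-1} r, averaged over the batch
    return torch.einsum('ni,nij,nj->n', r, a_inv, r).mean()
```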

PART 2: COMPUTATION

Now we present the computational details omitted in the MT. We first demonstrate the motivation for enhanced EPR, and then provide the training details and problem settings for landscape construction in the original coordinates and with dimensionality reduction. As mentioned in the MT, we refer to the enhanced loss as Eq. (16).

Appendix E Motivation for enhanced EPR

Figure 6: An illustration of the motivation for enhanced EPR. (a) and (b) show comparisons of the learned potentials and the true solution on the line $y=1.5$ in the toy model with $D=0.1$ and $D=0.05$, respectively. (c) shows the filled contour plot of the potential learned by the EPR loss alone. The orange points are samples from the simulated invariant distribution with $D=0.05$, while the green points are enhanced samples simulated from a more diffusive distribution with $D^{\prime}=0.1$, which are used in the enhanced EPR.

We utilize enhanced samples to achieve a better coverage of the transition domain. Note that the EPR loss equation (3) in MT requires training data sampled according to $\pi(\bm{x})$. The accuracy of the learned $V(\bm{x};\theta^{*})$ (more precisely, of its gradient $\nabla V(\bm{x};\theta^{*})$) using equation (3) in MT is guaranteed only in the "visible" domain of $\pi$, i.e., regions that are covered by sample points. However, the enhanced EPR framework (16) combines the EPR loss and the HJB loss, which allows the use of sample data that better covers the domain $\Omega$ (e.g., more samples in the transition regions between meta-stable states and near the boundaries of the visible domain). This approach is particularly compelling when the diffusion coefficient $D$ is relatively small.

The single EPR loss works well for the toy model with a relatively large diffusion coefficient $D=0.1$. A slice plot of the potential at $y=1.5$ in Fig. 6a shows that the solution learned with the EPR loss coincides well with the analytical solution. The relative root mean square error (rRMSE) and the relative mean absolute error (rMAE) have mean and standard deviation of $0.084\pm 0.006$ and $0.066\pm 0.008$ over 3 runs, respectively.

However, when decreasing $D$ to $0.05$, the samples from the simulated invariant distribution mainly stay in the double wells (orange points in Fig. 6c) and are away from the transition region between the two wells. For this reason, as shown in Fig. 6b, the result with a single EPR loss captures the profile of the two wells, but these two parts of the profile are not accurately connected in the transition region, where there are few samples, making the left well a bit higher than the right one. We then generate enhanced samples $(\bm{x}^{\prime}_{i})_{1\leq i\leq N^{\prime}}$ using $D^{\prime}=0.1$, which have better coverage of the transition region (green points in Fig. 6c). Incorporating these enhanced samples, we learn the potential with the enhanced EPR loss, and the result agrees better with the true solution (Fig. 6b).

Though the enhanced EPR is recommended in general situations, we remark that the single EPR loss can achieve competitive performance as long as the samples cover the domain effectively. This numerical observation holds for all of the considered models.
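As a rough sketch of how the two terms can be combined, assume that Eq. (16) in the MT is a weighted sum of the EPR loss on invariant samples and the squared residual of the HJB equation (of the same form as (44), in full coordinates) on enhanced samples; the exact form and weights are those given in the MT, and the function names here are placeholders.

```python
import torch

def epr_loss(V, F, x):
    # EPR term on samples x ~ pi (cf. the sketch in Appendix A)
    x = x.detach().requires_grad_(True)
    g, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    return ((F(x) + g) ** 2).sum(dim=1).mean()

def hjb_loss(V, F, div_F, D, x):
    # squared residual of F . grad V + |grad V|^2 - D lap V - D div F
    # on enhanced samples x ~ mu; the Laplacian is assembled dimension by
    # dimension with a second autograd pass
    x = x.detach().requires_grad_(True)
    g, = torch.autograd.grad(V(x).sum(), x, create_graph=True)
    lap = sum(torch.autograd.grad(g[:, i].sum(), x, create_graph=True)[0][:, i]
              for i in range(x.shape[1]))
    res = (F(x) * g).sum(dim=1) + (g ** 2).sum(dim=1) - D * lap - D * div_F(x)
    return (res ** 2).mean()

def enhanced_epr_loss(V, F, div_F, D, x_pi, x_mu, lam1, lam2):
    # hypothetical combination in the spirit of Eq. (16);
    # lam1 = 0 recovers "HJB alone" as in Table 2
    return lam1 * epr_loss(V, F, x_pi) + lam2 * hjb_loss(V, F, div_F, D, x_mu)
```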

Appendix F Landscape with original coordinates

In this section, we describe the additional setup where we utilize enhanced EPR for landscape construction with original coordinates in the MT. This section includes a 2D toy model, a 2D biological system with a limit cycle [3], a 2D multi-stable system [5], and a 12D Gaussian mixture model (GMM).

F.1 Training details and additional results for benchmark problems

We begin by simulating the SDE using the Euler-Maruyama scheme with reflecting boundary conditions until time $\mathsf{T}=1000$, starting from $10000$ different initial states. This simulation yields $N=10000$ final states, which are used as the training dataset $(\bm{x}_{i})_{1\leq i\leq N}$ to approximate the invariant distribution for EPR. For the 2D toy problem and the 2D multi-stable problem, we employ a time-step size of 0.1, whereas for the 2D limit cycle problem we use a smaller time-step size of 0.01. We train the network with a batch size of 1024 and a learning rate of 0.001 using the Adam optimizer [52] for 3000 epochs in the toy models, 5000 epochs in the multi-stable model, and 10000 epochs in the limit cycle model. When focusing on the single EPR loss in the toy model with $D=0.05$, we extend training to 8000 epochs to ensure adequate learning. At each training epoch, we update the dataset by performing one Euler-Maruyama step to bring it closer to the invariant distribution. For the comparison with normalizing flows, we train a neural spline flow [53] (https://github.com/VincentStimper/normalizing-flows). We stack 4 blocks of the rational quadratic spline, each with three layers of 64 hidden units, followed by an LU linear permutation. The flow model is trained by the Adam optimizer with learning rate 0.0001 for 20000 epochs, based on the same SDE-simulated dataset as the enhanced EPR. To ensure the robustness of our results, we perform the experiments across 5 random seeds, which accounts for variability and establishes the reliability of our findings.
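The data generation step admits a compact NumPy sketch; the reflection rule at the box boundary and the helper name are our own illustrative choices (a single reflection per step suffices for the small steps used here).

```python
import numpy as np

def simulate_invariant_samples(F, D, lo, hi, n=10000, dt=0.1, T=1000.0, seed=0):
    """Euler-Maruyama with reflecting BC on the box [lo, hi].

    F: drift, maps (n, d) -> (n, d); lo, hi: (d,) box bounds.
    Returns the n final states as approximate samples of the invariant law.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = rng.uniform(lo, hi, size=(n, lo.size))
    for _ in range(int(T / dt)):
        x = x + F(x) * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal(x.shape)
        x = np.where(x < lo, 2.0 * lo - x, x)   # reflect at the lower faces
        x = np.where(x > hi, 2.0 * hi - x, x)   # reflect at the upper faces
    return x
```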

Table 2: Evaluation on various $\mu(\bm{x})$ and $\hat{\lambda}_{1}$ for HJB alone and enhanced EPR over 5 random seeds. Each cell reports mean ± standard deviation; the four values per metric correspond to $\hat{\lambda}_{1}=\{0.0,\,0.1,\,1.0,\,10.0\}\times\lambda_{1}$, where $\hat{\lambda}_{1}=0.0$ is HJB alone, and "——" marks combinations that are not evaluated.

Model | $\mu(\bm{x})$ | rRMSE ($\times$0.0 / $\times$0.1 / $\times$1.0 / $\times$10.0) | rMAE ($\times$0.0 / $\times$0.1 / $\times$1.0 / $\times$10.0)
Toy, $D=0.1$ | $D^{\prime}=1D$ | 0.034±0.016 / —— / —— / —— | 0.029±0.014 / —— / —— / ——
Toy, $D=0.1$ | $D^{\prime}=2D$ | 0.168±0.010 / 0.062±0.033 / 0.028±0.010 / 0.038±0.015 | 0.068±0.009 / 0.108±0.054 / 0.026±0.010 / 0.032±0.013
Toy, $D=0.1$ | $D^{\prime}=5D$ | 0.225±0.071 / 0.138±0.021 / 0.080±0.011 / 0.046±0.018 | 0.123±0.097 / 0.032±0.025 / 0.030±0.011 / 0.030±0.019
Toy, $D=0.1$ | $D^{\prime}=10D$ | 0.258±0.078 / 0.182±0.033 / 0.095±0.010 / 0.067±0.012 | 0.167±0.110 / 0.062±0.040 / 0.038±0.010 / 0.037±0.009
Toy, $D=0.05$ | $D^{\prime}=1D$ | 0.191±0.218 / —— / —— / —— | 0.160±0.179 / —— / —— / ——
Toy, $D=0.05$ | $D^{\prime}=2D$ | 0.267±0.116 / 0.058±0.045 / 0.054±0.020 / 0.111±0.083 | 0.164±0.120 / 0.049±0.036 / 0.048±0.019 / 0.096±0.069
Toy, $D=0.05$ | $D^{\prime}=5D$ | 0.661±0.126 / 0.417±0.162 / 0.204±0.213 / 0.055±0.028 | 0.555±0.093 / 0.328±0.155 / 0.127±0.184 / 0.029±0.008
Toy, $D=0.05$ | $D^{\prime}=10D$ | 0.544±0.061 / 0.485±0.126 / 0.227±0.132 / 0.191±0.190 | 0.489±0.057 / 0.410±0.096 / 0.157±0.136 / 0.122±0.168
Multi-stable | $D^{\prime}=1D$ | 0.255±0.007 / —— / —— / —— | 0.241±0.004 / —— / —— / ——
Multi-stable | $D^{\prime}=2D$ | 0.249±0.015 / 0.069±0.045 / 0.110±0.028 / 0.146±0.044 | 0.228±0.011 / 0.065±0.046 / 0.106±0.030 / 0.142±0.047
Multi-stable | $D^{\prime}=5D$ | 0.628±0.046 / 0.061±0.031 / 0.090±0.042 / 0.128±0.043 | 0.553±0.063 / 0.059±0.032 / 0.088±0.046 / 0.124±0.044
Multi-stable | $D^{\prime}=10D$ | 0.630±0.051 / 0.307±0.036 / 0.067±0.047 / 0.118±0.032 | 0.550±0.066 / 0.197±0.036 / 0.066±0.049 / 0.118±0.034
Limit Cycle | $\sigma=0.0$ | 0.231±0.048 / —— / —— / —— | 0.140±0.021 / —— / —— / ——
Limit Cycle | $\sigma=0.05$ | 0.287±0.175 / 0.256±0.181 / 0.070±0.016 / 0.136±0.002 | 0.165±0.108 / 0.149±0.107 / 0.063±0.020 / 0.108±0.003
Limit Cycle | $\sigma=0.1$ | 0.503±0.133 / 0.372±0.173 / 0.373±0.181 / 0.122±0.008 | 0.313±0.092 / 0.222±0.119 / 0.200±0.090 / 0.101±0.006
Limit Cycle | $\sigma=0.2$ | 0.578±0.010 / 0.556±0.013 / 0.509±0.025 / 0.298±0.055 | 0.383±0.025 / 0.355±0.030 / 0.303±0.040 / 0.175±0.029

For a fair comparison, we fix $\lambda_{2}$ at 1.0 across all models. The parameters used in the MT are $\lambda_{1}=1.0$ for the toy model and the multi-stable model, and $\lambda_{1}=0.01$ for the limit cycle model. As discussed in the MT, the selection of $\lambda_{1}$ aims to balance the two terms of the loss function. In our experience, systems with higher entropy production rates (typically featuring stronger non-gradient components in their force decomposition) require a smaller $\lambda_{1}$, as in the limit cycle problem. However, the specific choice of $\lambda_{1}$ is relatively robust. As demonstrated in Table 2, we scrutinize the stability of $\lambda_{1}$ by testing its sensitivity within the enhanced EPR method. We adjust $\lambda_{1}$ by multiplying it by a set of factors $\{0.0, 0.1, 1.0, 10.0\}$, denoted $\hat{\lambda}_{1}$, in different experiments; $\hat{\lambda}_{1}=0.0$ then corresponds to using HJB alone. Despite the varying distributions of $\mu(\bm{x})$, our selected parameters typically yield satisfactory outcomes. Even with a relatively small $\hat{\lambda}_{1}$, enhanced EPR outperforms solving by HJB alone. The performance gains are more pronounced when relatively large values of $\hat{\lambda}_{1}$ are used.

We also evaluate different sample distributions $\mu(\bm{x})$ used in the HJB loss. For enhanced EPR, using the invariant distribution in the HJB loss term is ineffective, since the HJB loss is introduced precisely to cover the transition domain; consequently, there is no need to evaluate $\mu(\bm{x})$ with $D^{\prime}=1D$ or $\sigma=0$ for enhanced EPR. In the case of the HJB loss alone, our numerical analysis suggests that overly diffusive samples can lead to significantly worse outcomes, underscoring the importance of capturing the critical domains within the loss. As shown in Table 2, the optimal results for HJB alone are achieved using the invariant distribution or a distribution slightly more extensive than that. However, with the same enhanced samples from $\mu(\bm{x})$, introducing a non-zero $\lambda_{1}$ improves performance compared to HJB alone, particularly when the sample distribution is so diffusive that solving HJB alone performs poorly. This emphasizes the robustness of enhanced EPR to variations in $\mu(\bm{x})$ and its advantage over solving HJB alone. Finally, for each problem, the optimal outcome at $\hat{\lambda}_{1}=1.0\times\lambda_{1}$ across the various $\mu(\bm{x})$ serves as the result for enhanced EPR in Table 1 of the MT. While this may not be the best result among all combinations of $\mu(\bm{x})$ and $\hat{\lambda}_{1}$, it is a robust choice that can be readily achieved without dedicated tuning.

F.2 2D toy model

To verify the applicability and accuracy of our method, we initially apply it to a toy model with the driving force

\bm{F}(\bm{x})=-(I+A)\nabla U_{0}(\bm{x}), (54)

where $A\in\mathbb{R}^{d\times d}$ is a constant skew-symmetric matrix, i.e., $A^{\top}=-A$, and $U_{0}$ is some known function. With this choice of $\bm{F}$, one can check that the true potential landscape is simply $U(\bm{x})=U_{0}(\bm{x})$. In particular, the system becomes reversible when $A=0$. We construct a 2D toy model with the double-well potential

U_{0}(\bm{x})=((x-1.5)^{2}-1.0)^{2}+0.5(y-1.5)^{2}, (55)

where $\bm{x}=(x,y)^{\top}$. We take the skew-symmetric matrix

A=\begin{bmatrix}0&0.5\\ -0.5&0\end{bmatrix}, (56)

which introduces a counter-clockwise rotation on top of the focusing central force field. This sets up a simple non-equilibrium system. In this model, we have the force decomposition

\bm{F}(\bm{x})=-\nabla U_{0}(\bm{x})+\bm{l}(\bm{x}),\quad\bm{l}(\bm{x})=-A\nabla U_{0}(\bm{x}),

and

\bm{l}(\bm{x})\cdot\nabla U_{0}(\bm{x})=0,\quad\nabla\cdot\bm{l}(\bm{x})=0

hold in the pointwise sense. Therefore, the identity $\bm{l}(\bm{x})\cdot\nabla U_{0}(\bm{x})+D\nabla\cdot\bm{l}(\bm{x})=0$ is satisfied for any $D>0$ and, following the discussion in the MT, we have constructed a non-reversible system with an analytically known double-well potential, which can be used to verify the accuracy of the learned potential. We focus on the domain $\Omega=[0,3]\times[0,3]$. To fix the extra shifting degree of freedom of the potential function, we set the minimum of the potential to zero, and we only plot the result on the domain $\{\bm{x}\,|\,V(\bm{x})\leq 30D\}$ in Fig. 3 in the MT since it is the domain of practical interest.
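For reference, the toy force field (54)-(56) and its analytically known landscape can be coded in a few lines; the batch-first NumPy conventions are our own choice.

```python
import numpy as np

A = np.array([[0.0, 0.5], [-0.5, 0.0]])   # skew-symmetric part, eq. (56)

def U0(x):
    # double-well potential, eq. (55); x has shape (N, 2)
    return ((x[:, 0] - 1.5) ** 2 - 1.0) ** 2 + 0.5 * (x[:, 1] - 1.5) ** 2

def grad_U0(x):
    gx = 4.0 * (x[:, 0] - 1.5) * ((x[:, 0] - 1.5) ** 2 - 1.0)
    gy = x[:, 1] - 1.5
    return np.stack([gx, gy], axis=1)

def F(x):
    # driving force F = -(I + A) grad U0, eq. (54); the true landscape is U0
    return -grad_U0(x) @ (np.eye(2) + A).T
```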

F.3 Two-dimensional limit cycle model

We apply our approach to the limit cycle dynamics with a Mexican-hat shape landscape [3].

Before introducing the concrete model, let us make the following observation. For any SDE of the form

\frac{\differential\bm{x}}{\differential t}=\bm{F}(\bm{x})+\sqrt{2D}\,\dot{\bm{w}}, (57)

the corresponding steady FPE is

\nabla\cdot(p_{\mathrm{ss}}\bm{F})-D\Delta p_{\mathrm{ss}}=0.

If we make the transformation

\bm{F}\rightarrow\kappa\bm{F},\quad D\rightarrow\kappa D

in (57), then the steady state PDF

p_{\mathrm{ss}}(\bm{x})=\exp\Big(-\frac{U(\bm{x})}{D}\Big)=\exp\Big(-\frac{\kappa U(\bm{x})}{\kappa D}\Big)

will not change. The transformation only changes the timescale of the dynamics (57) from $t_{0}$ to $\kappa t_{0}$. However, the potential changes from $U$ to $\kappa U$ if we utilize the drift $\kappa\bm{F}(\bm{x})$ and noise strength $\kappa D$ in the system (57). This observation allows us to set the scale of $U$ to be $O(1)$ by adjusting $\kappa$ suitably for a specific problem. An alternative way to accomplish this is to choose $\bm{F}$ to be $\kappa\bm{F}$ in the EPR loss.

We take $D=0.1$ and consider the stochastic dynamics (57) with $\bm{F}=(F_{x},F_{y})^{\top}$ and

F_{x}(x,y)=\kappa\left(\frac{\alpha^{2}+x^{2}}{1+x^{2}}\,\frac{1}{1+y}-ax\right), (58)
F_{y}(x,y)=\frac{\kappa}{\tau_{0}}\left(b-\frac{y}{1+cx^{2}}\right), (59)

where the parameters are $\kappa=100$, $\alpha=a=b=0.1$, $c=100$, and $\tau_{0}=5$. Here the choice of $\kappa=100$ is made such that $U\sim O(1)$, following [22]. We focus on the domain $\Omega=[0,8]\times[0,6]$. As explained in the paragraph above, our setting corresponds to the case $D=0.1/\kappa=0.001$ for the force field considered in [3].

F.4 Two-dimensional multi-stable model

We also study the dynamics (57) for a multi-stable system [5] with $\bm{F}=(F_{x},F_{y})^{\top}$ and

F_{x}(x,y)=\frac{ax^{n}}{S^{n}+x^{n}}+\frac{bS^{n}}{S^{n}+y^{n}}-k_{1}x, (60)
F_{y}(x,y)=\frac{ay^{n}}{S^{n}+y^{n}}+\frac{bS^{n}}{S^{n}+x^{n}}-k_{2}y, (61)

where the parameters are $a=b=k_{1}=k_{2}=1$, $S=0.5$, and $n=4$. We focus on the domain $\Omega=[0,3]\times[0,3]$ and present the results for $D=0.01$ in the MT.

F.5 Twelve-dimensional GMM model: high-dimensional potential

As mentioned in MT Methods 3.4, we can construct models with a known potential using a driving force of the form (54). The true solution is $U_{0}(\bm{x})=-D\log p_{0}(\bm{x})$, where $p_{0}$ is the probability density of the GMM, as defined in MT Section 1.3. We randomly generate a matrix $A_{0}$ with elements uniform in $[0,1]$, and then obtain a skew-symmetric matrix via $A=(A_{0}-A_{0}^{\top})/2$. The non-reversible dynamics is then constructed with the driving force (54). We set the noise strength to $D=0.01$ for this problem and use $D^{\prime}=5D$ for the enhanced samples. We use a three-layer neural network with 80 hidden units per layer. The data size is 10000, and we use Adam with a learning rate of 0.001 and a batch size of 2048. We train enhanced EPR with $\lambda_{1}=0.1$, $\lambda_{2}=1.0$ for 5000 epochs.

Appendix G Landscape with reduced coordinates

In this section, we provide additional details and results related to the dimensionality reduction problem. First, we present additional experiments on the 3D Ferrell cell cycle model [45]. We then demonstrate an 8D limit cycle dynamics [4], in which interesting properties emerge in the projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ and potential $\widetilde{V}(\bm{z};\theta^{*})$. Finally, we present a 52D multi-stable dynamics [27], on which we compare the reduced potentials $\widetilde{V}(\bm{z};\theta^{*})$ obtained by two dimensionality reduction approaches.

G.1 Ferrell’s three-ODE model: reduced potential

Figure 7: Streamlines of the projected force $\widetilde{\bm{G}}(\bm{z})$ and filled contour plot of the reduced potential $\widetilde{V}(\bm{z};\theta^{*})$ for Ferrell's three-ODE model learned by enhanced EPR. The projected force field $\widetilde{\bm{G}}(\bm{z})$ is decomposed into the gradient part $-\nabla\widetilde{V}(\bm{z};\theta^{*})$ (white arrows) and the non-gradient part (gray arrows). The length of an arrow corresponds to the magnitude of the vector. The solid dots are samples from the simulated invariant distribution.

We apply the dimensionality reduction method to Ferrell's three-ODE model for a simplified cell cycle dynamics [45], where the concentrations of cyclin-dependent kinase 1 ($\textrm{CDK}1$), Polo-like kinase 1 ($\textrm{Plk}1$), and the anaphase-promoting complex (APC), denoted by

x=[\textrm{CDK}1],\quad y=[\textrm{Plk}1],\quad z=[\textrm{APC}],

respectively, obey the ODEs in the domain $\Omega=[0,1]^{3}$:

F_{x}(x,y,z)=\alpha_{1}-\beta_{1}x\,\frac{z^{n_{1}}}{K_{1}^{n_{1}}+z^{n_{1}}}, (62)
F_{y}(x,y,z)=\alpha_{2}(1-y)\,\frac{x^{n_{2}}}{K_{2}^{n_{2}}+x^{n_{2}}}-\beta_{2}y, (63)
F_{z}(x,y,z)=\alpha_{3}(1-z)\,\frac{y^{n_{3}}}{K_{3}^{n_{3}}+y^{n_{3}}}-\beta_{3}z, (64)

with $\alpha_{1}=0.1$, $\alpha_{2}=\alpha_{3}=\beta_{1}=3$, $\beta_{2}=\beta_{3}=1$, $K_{1}=K_{2}=K_{3}=0.5$, and $n_{1}=n_{2}=n_{3}=8$. Since our focus in this paper is on the methodology of constructing the potential landscape, we refer interested readers to [45] for the concrete biological meaning of the considered variables. We add isotropic temporal Gaussian white noise with noise scale $D=0.01$. For this particular problem, we employ three-layer neural networks with 80 hidden units in each layer. We generate enhanced samples $(\bm{x}^{\prime}_{i})_{1\leq i\leq 10000}$ by simulating from a more diffusive distribution with $D^{\prime}=5D$. The projected force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ is trained with the loss (42) for 1000 epochs using the Adam optimizer with a learning rate of 0.001 and a batch size of 2048. The reduced variables $\bm{z}=(x,y)^{\top}$ are utilized during this training phase. Subsequently, we train the projected potential $\widetilde{V}(\bm{z};\theta^{*})$ for 4000 epochs using the enhanced EPR loss defined in (16), with $\lambda_{1}=0.1$ and $\lambda_{2}=1.0$. As shown in Fig. 7, the obtained reduced potential exhibits a plateau in the central region and a local-well tube along the reduced limit cycle.

G.2 Eight-dimensional complex system: reduced potential

Figure 8: Streamlines and limit sets of the projected force field of the 8D cell cycle model in the two reduced variables CycB and Cdc20. The outer red circle is the stable limit cycle of the reduced force field, corresponding to the yellow circle, which is the projection of the original high-dimensional limit cycle. The inner red circle, the red dot, and the two green circles are stable and unstable limit sets of the reduced dynamics, respectively, which are virtual in high dimensions.

We consider an 8D system in which the dynamics and parameters are the same as in the supplementary information of [4]. We take CycB and Cdc20 as the reduction variables $\bm{z}$, and set the mass in this problem to $m=0.8$.

We first point out that the noise strength $D=0.0005$ used in [4] is not suitable here for training neural networks, since it would lead to a potential of order $O(10^{-5})$. Using the idea in SI Section F.3, we amplify the original force field $\bm{F}$ in [4] by $\kappa=1000$ times and take $D=0.01$ for the transformed force field. This amounts to setting $D=10^{-5}$ for the original force field, which is even smaller than the parameter considered in [4]. We simulate the SDE without boundaries with time step $10^{-5}$ until $\mathsf{T}=5$, starting from initial states drawn from a uniform distribution on $[0,1.25]^{8}$. The enhanced samples $(\bm{x}^{\prime}_{i})_{1\leq i\leq N^{\prime}}$ are obtained by adding Gaussian perturbations with standard deviation $\sigma=0.05$ to the SDE-simulated dataset $(\bm{x}_{i})_{1\leq i\leq N}$. We only keep the data within the biologically meaningful domain $[0,1.25]^{8}$, resulting in a dataset of size 25000 for computation.

We use three-layer networks with 80 hidden units in each layer for both the force and the potential. For the projected force, we train for 2000 epochs to obtain $\widetilde{\bm{G}}(\bm{z};\theta^{*})$. We then conduct the enhanced EPR (16) with $\lambda_{1}=0.01$, $\lambda_{2}=1.0$ for 10000 epochs, using Adam with a learning rate of 0.001 and a batch size of 8192.

In Fig. 8, we present a more detailed picture of the reduced dynamics of the 8D model than Fig. 5c in the MT. Specifically, we further show two unstable limit cycles of the projected force field obtained by reversed-time integration (the two green circles in Fig. 8). These two unstable limit cycles play the role of separatrices between the neighboring stable limit sets. This picture arises because the landscape of the considered system is very flat in the central region. These inner limit sets are virtual in high dimensions, but they naturally appear in the reduced dynamics on the plane. Similar features might also occur in other two-dimensional reduced dynamics.
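The unstable limit cycles can be located with a simple reversed-time integration of the learned reduced field; a minimal RK4 sketch, assuming `G_tilde` maps a 2-vector to a 2-vector, is given below. The helper name and step sizes are illustrative only.

```python
import numpy as np

def integrate_reversed(G_tilde, z0, dt=1e-3, n_steps=100000):
    """Integrate dz/dt = -G~(z) with RK4; reversing time turns the unstable
    limit cycles of the reduced field into attractors, so trajectories
    started nearby settle onto them."""
    z = np.asarray(z0, dtype=float)
    traj = np.empty((n_steps, z.size))
    f = lambda u: -np.asarray(G_tilde(u))   # time-reversed vector field
    for k in range(n_steps):
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        z = z + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[k] = z
    return traj
```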

G.3 Fifty-two-dimensional multi-stable system: high-dimensional and reduced potentials

For fairness, we use neural networks with three layers and 20 hidden units in each layer to represent the reduced force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ and the reduced potential $\widetilde{V_{1}}(\bm{z};\theta^{*})$, and 80 hidden units per layer for the 52D potential $V(\bm{x};\theta^{*})$. We use enhanced samples simulated from a more diffusive distribution with $D^{\prime}=2D$. The data size is 20000, and we again use Adam with a learning rate of 0.001 and a batch size of 2048. We use $\lambda_{1}=10.0$, $\lambda_{2}=1.0$ in the enhanced EPR (16). We train the force $\widetilde{\bm{G}}(\bm{z};\theta^{*})$ for 1000 epochs and obtain $\widetilde{V_{1}}(\bm{z};\theta^{*})$ by enhanced EPR trained for 5000 epochs. We also learn a 52D potential $V(\bm{x};\theta^{*})$ by enhanced EPR for 5000 epochs and then project it to $\widetilde{V_{2}}(\bm{z};\theta^{*})$ via (35), trained for 1000 epochs.