
Extracting Stochastic Governing Laws by Nonlocal Kramers-Moyal Formulas

Yubin Lu School of Mathematics and Statistics & Center for Mathematical Sciences, Huazhong University of Science and Technology, Wuhan 430074, China Yang Li School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China Jinqiao Duan Corresponding author: [email protected] Departments of Applied Mathematics & Physics, Illinois Institute of Technology, Chicago, IL 60616, USA
Abstract

With the rapid development of computational techniques and scientific tools, great progress has been made in data-driven analysis for extracting governing laws of dynamical systems from data. Despite the wide occurrence of non-Gaussian fluctuations, effective data-driven methods for identifying stochastic differential equations with non-Gaussian Lévy noise are still relatively few. In this work, we propose a data-driven approach to extract stochastic governing laws with both (Gaussian) Brownian motion and (non-Gaussian) Lévy motion from short bursts of simulation data. Specifically, we use normalizing flows to estimate the transition probability density function (the solution of the nonlocal Fokker-Planck equation) from data, and then substitute it into the recently proposed nonlocal Kramers-Moyal formulas to approximate the Lévy jump measure, drift coefficient and diffusion coefficient. We demonstrate that this approach can learn stochastic differential equations with Lévy motion. We present one- and two-dimensional, decoupled and coupled examples to illustrate our method. This approach can become an effective tool for discovering stochastic governing laws and understanding complex dynamical behaviors.

Key Words and Phrases: Stochastic dynamics, nonlocal Kramers-Moyal formulas, normalizing flows, Lévy motions, data-driven modeling.

1 Introduction

Differential equations are widely used to model complex phenomena in climate, molecular biology, condensed matter physics, mechanical systems and other fields. In order to improve understanding and make predictions about future dynamical evolution, researchers need to describe the effects of noise in dynamical systems. Stochastic differential equations are an effective formalism for modeling physical phenomena under random fluctuations [1]. It is often assumed that the random fluctuations in deterministic dynamical systems are Gaussian noise. In the last decade or two, however, researchers have begun to actively study the dynamical behaviors of systems under non-Gaussian influences.

Stochastic differential equations with non-Gaussian Lévy motions have the potential for wider applications in the physical sciences. For instance, stochastic dynamical systems with both (Gaussian) Brownian motion and (non-Gaussian) Lévy motion can be used to model a significant and widely used class of Markov processes, the so-called Feller processes [6]. Based on Greenland ice core measurement data, Ditlevsen [18] found that the temperature in that climate system could be modeled by stochastic differential equations with $\alpha$-stable Lévy motion. Other researchers have also used non-Gaussian Lévy motion to characterize the random fluctuations that emerge in neural systems [42], gene networks [8], and Earth systems [3, 58, 60, 55, 53].

However, it is often difficult to establish mathematical models of complex systems in nature, especially when less-understood or unknown mechanisms are present. Thanks to recent improvements in the storage and computing power of computers, more experimental, observational and simulated data are now available. Data-driven approaches to complex dynamical systems have attracted increasing attention in recent years, including but not limited to system identification [37, 7, 9, 48, 22, 14, 31, 34, 49, 57, 51, 47, 27, 23, 29], model reduction [33, 27, 25, 26, 32, 61, 43, 10], and the extraction of dynamical behaviors [20, 21, 4, 15, 38, 52, 54, 28] directly from data.

System identification of differential equation models for physical phenomena is a major task in the data-driven analysis of complex dynamical systems. For deterministic systems, Brunton et al. [7, 9] developed tools to identify models in the form of ordinary differential equations. For stochastic systems, there are at least three common ways to learn or extract the coefficients of stochastic differential equation models. The first is to estimate these coefficients directly, where mainstream methods include (nonlocal) Kramers-Moyal formulas [14, 31, 36], neural differential equations [11, 30, 56, 16] and Bayesian inference [48, 22]. The second is to learn the generator of the stochastic differential equation by estimating the stochastic Koopman operator [34, 51, 47, 27], from which the coefficients of the stochastic differential equation can be obtained. In the third way, the coefficients are obtained by estimating the (nonlocal) Fokker-Planck equations corresponding to the stochastic differential equations. For example, Chen et al. and Yang et al. [12, 59] identified stochastic systems by estimating the coefficients of the associated nonlocal Fokker-Planck equations.

The usual Kramers-Moyal formulas link sample path data with the coefficients of stochastic systems driven by Gaussian noise, so such systems can be identified from sample path data of the stochastic differential equations. For stochastic dynamical systems with non-Gaussian fluctuations, we devised the so-called nonlocal Kramers-Moyal formulas in our previous work [31]; hence the nonlocal Kramers-Moyal formulas can be used to identify stochastic differential equations with non-Gaussian fluctuations (in particular, Lévy motions). However, the original algorithm based on the nonlocal Kramers-Moyal formulas required a large amount of sample data and assumed a prescribed basis for the drift and diffusion coefficients.

In the present work, we devise a method to identify stochastic dynamical systems that reduces the data requirements and discards the basis assumption on the coefficients. Concretely, we first estimate the probability density (via normalizing flow neural networks; see below) of a stochastic differential equation with Lévy motion. Subsequently, we apply the nonlocal Kramers-Moyal formulas [31] to calculate the Lévy jump measure, drift coefficient and diffusion coefficient of the stochastic differential equation.

For this new method, probability density estimation plays an important role, as can be seen from the nonlocal Kramers-Moyal formulas [31]. We are inspired by modern generative machine learning techniques that can learn the underlying distribution generating the samples of a data set. Examples of such methods include variational autoencoders (VAEs), generative adversarial networks (GANs) and normalizing flows [46, 40, 24, 41, 17, 5, 35]. VAEs and GANs, however, do not provide an explicit density estimate, which is exactly what we need here; normalizing flows do.

The idea of a normalizing flow was introduced by Tabak and Vanden-Eijnden [50] in the context of density estimation and sampling. A "normalizing" flow transforms a given distribution, generally into the standard normal one, through a sequence of invertible maps whose Jacobian determinants are tracked, so that the inverse map is available as well. By constructing a transformation that meets our requirements, we can obtain an estimate of the target probability density.

To demonstrate the performance of our method, we illustrate it with one- and two-dimensional stochastic differential equations driven by Lévy motions. Furthermore, we also consider the effects of different types of noise, i.e., additive noise and multiplicative noise.

The remainder of this article is arranged as follows. In Section 2, we introduce the idea of normalizing flows and provide two types of transformations to show how to estimate a probability density from sample data. In Section 3, we review the nonlocal Kramers-Moyal formulas, which compute the Lévy jump measure, drift and diffusion from the estimated density. In Section 4, we describe the practical implementation of our method. In Section 5, we show results for several examples and compare the learned coefficients with the true ones. The final section contains a discussion of this study and perspectives for building on this line of research.

2 Density estimation

We recall normalizing flows in subsection 2.1, and then introduce two concrete cases of normalizing flows: neural spline flows in subsection 2.2 and the real-valued non-volume preserving (RealNVP) methodology in subsection 2.3. For the examples shown in Section 5, neural spline flows are used to estimate the one-dimensional probability density and RealNVP is used to estimate the two-dimensional probability densities.

2.1 Normalizing flows

Normalizing flows are a general approach to expressing a probability density using a prior probability density and a series of bijective transformations. In the following, we adopt the notation of [40]. Let $z$ be a $D$-dimensional real vector sampled from $p_z(z)$. The main idea of a normalizing flow is to find a transformation $T$ such that

z = T(x), \quad \text{where} \quad z \sim p_z(z),    (1)

where $p_z(z)$ is a prior (or latent) density. When the transformation $T$ is invertible and both $T$ and $T^{-1}$ are differentiable, the density $p_x(x)$ can be calculated by a change of variables:

p_x(x) = p_z(T(x)) \left| \det J_T(x) \right|, \quad \text{where} \quad z = T(x),    (2)

and $J_T(x)$ is the Jacobian of $T$, i.e.,

J_T(x) = \begin{bmatrix} \dfrac{\partial T_1}{\partial x_1} & \cdots & \dfrac{\partial T_1}{\partial x_D} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial T_D}{\partial x_1} & \cdots & \dfrac{\partial T_D}{\partial x_D} \end{bmatrix}.

For two invertible and differentiable transformations $T_1$ and $T_2$, we have the following properties:

(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1},
\det J_{T_2 \circ T_1}(z) = \det J_{T_2}(T_1(z)) \cdot \det J_{T_1}(z).

Consequently, we can construct complex transformations by composing multiple instances of simpler transformations, i.e., $T = T_K \circ T_{K-1} \circ \cdots \circ T_1$, where each $T_k$ transforms $z_{k-1}$ into $z_k$, with $z_0 = x$ and $z_K = z$.

Let $\{x_i\}_{i=1}^{n}$ be a set of samples taken from an unknown distribution $p_x(x;\theta)$. We minimize the negative log-likelihood of the data,

\mathcal{L} = -\sum_{i=1}^{n} \log p_x(x_i;\theta).    (3)

Substituting the change of variables formula (2) into (3), we have

\mathcal{L} = -\sum_{i=1}^{n} \left[ \log p_z(T(x_i)) + \log \left| \det J_T(x) \right|_{x=x_i} \right].    (4)

Therefore, we can learn the transformation $T$ by minimizing the loss function (4).
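As an illustration of this training procedure, here is a minimal sketch (assuming PyTorch; the toy affine transform, the synthetic data and all hyperparameters are placeholders rather than the transformations used later in this paper) that fits a single invertible map by minimizing the negative log-likelihood (4) under a standard normal prior.

```python
import torch

class AffineFlow(torch.nn.Module):
    """Toy invertible map z = x * exp(s) + t, standing in for the spline/RealNVP layers below."""
    def __init__(self, dim):
        super().__init__()
        self.s = torch.nn.Parameter(torch.zeros(dim))
        self.t = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        z = x * torch.exp(self.s) + self.t
        log_det = self.s.sum() * torch.ones(x.shape[0])   # log|det J_T(x)|, constant for this toy map
        return z, log_det

def nll_loss(transform, x):
    """Negative log-likelihood (4) with a standard normal prior p_z."""
    z, log_det = transform(x)
    log_pz = (-0.5 * z ** 2 - 0.5 * torch.log(torch.tensor(2.0 * torch.pi))).sum(dim=1)
    return -(log_pz + log_det).sum()

flow = AffineFlow(dim=1)
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-2)
x_data = 2.0 * torch.randn(5000, 1) + 1.0                 # stand-in for SDE sample data
for _ in range(500):
    optimizer.zero_grad()
    loss = nll_loss(flow, x_data)
    loss.backward()
    optimizer.step()
```

The same loop applies unchanged when the toy map is replaced by a composition of spline or coupling layers, since only the forward pass and its log-determinant change.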

This subsection introduced the general idea of normalizing flows; the transformation $T$ plays a crucial role in approximating the target density function. We now introduce two specific transformations, namely neural spline flows and RealNVP.

2.2 Neural spline flows

In this subsection we briefly review neural spline flows. For the sake of simplicity, we only consider one-dimensional density estimation here. Durkan et al. [19] proposed to implement the transformation $T$ using monotonic rational-quadratic splines. A rational-quadratic function takes the form of a quotient of two quadratic polynomials.

We adopt the notation of [19]. The spline uses $K$ different rational-quadratic functions, with boundaries set by $K+1$ coordinates $\{x^{(k)}, y^{(k)}\}_{k=0}^{K}$ known as knots. The knots increase monotonically from $(x^{(0)}, y^{(0)}) = (-B, -B)$ to $(x^{(K)}, y^{(K)}) = (B, B)$, so the spline maps the interval $[-B, B]$ onto $[-B, B]$. The $K-1$ derivatives at the internal knots are set to arbitrary positive values, while the derivatives at the boundaries are set to $1$.

In short, we obtain a unique spline once we specify the widths $\theta^w$ and heights $\theta^h$ of the $K$ bins and the $K-1$ derivatives $\theta^d$ at the internal knots.

Denoting $\theta = [\theta^w, \theta^h, \theta^d]$, the authors of [19] constructed the transformation as follows:

\theta = NN(x), \qquad z = T_\theta(x),    (5)

where $NN$ is a neural network and $T_\theta$ is a monotonic rational-quadratic spline.

Therefore, we can estimate the target density $p_x(x)$ through the rational-quadratic spline $T_\theta$ and the prior density $p_z(z)$:

p_x(x) = p_z(T_\theta(x)) \left| \det J_{T_\theta}(x) \right|.    (6)

For more details, see [19].
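For concreteness, the following sketch evaluates one monotonic rational-quadratic segment in the form given in [19]; the knot values and derivatives here are illustrative placeholders, whereas in practice they are produced by the neural network $NN$ in (5).

```python
import numpy as np

def rq_segment(x, xk, xk1, yk, yk1, dk, dk1):
    """One monotonic rational-quadratic bin of the spline of [19].

    (xk, yk) and (xk1, yk1) are neighbouring knots; dk, dk1 are the positive
    derivatives there. Returns the transformed value and dy/dx, whose log
    enters the log|det| term of the loss.
    """
    s = (yk1 - yk) / (xk1 - xk)           # bin slope
    xi = (x - xk) / (xk1 - xk)            # relative position in the bin, in [0, 1]
    den = s + (dk1 + dk - 2.0 * s) * xi * (1.0 - xi)
    y = yk + (yk1 - yk) * (s * xi**2 + dk * xi * (1.0 - xi)) / den
    dydx = s**2 * (dk1 * xi**2 + 2.0 * s * xi * (1.0 - xi) + dk * (1.0 - xi)**2) / den**2
    return y, dydx

# illustrative single bin from knot (-1, -0.5) to knot (1, 1.5) inside [-3, 3]
y, dydx = rq_segment(np.linspace(-1.0, 1.0, 5), -1.0, 1.0, -0.5, 1.5, 0.8, 1.2)
```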

2.3 Real-valued non-volume preserving transformations

After briefly reviewing the general framework of normalizing flows and neural spline flows in the previous subsections, we introduce the construction of the real-valued non-volume preserving (RealNVP) transformation [17]. In this subsection, for the sake of simplicity, we focus on two-dimensional density estimation; the method extends to any dimension.

For the target probability density $p_x(x)$, where $x = (x_1, x_2)$ is a two-dimensional real vector and the prior density is denoted by $p_z(z)$, $z \in \mathbb{R}^2$, we aim to design an invertible and differentiable transformation $T_\theta$ such that $z = T_\theta(x)$ and the determinant of the Jacobian is easy to compute. To be specific, we propose the following transformation:

z_1 = x_1, \qquad z_2 = \frac{1}{C} \left[ x_2 e^{\mu(x_1)} + \nu(x_1) \right],    (7)

where $\mu$ and $\nu$ are two different neural networks and $C$ is a hyperparameter. We denote this transformation by $T_\theta$. The determinant of the Jacobian is then computationally tractable:

\det J_{T_\theta} = \frac{1}{C} e^{\mu(x_1)}.    (8)

In consequence, for a given dataset $\mathcal{D} = \{x_i\}_{i=1}^{n}$ sampled from $p_x(x)$, the negative log-likelihood becomes

\mathcal{L} = -\sum_{i=1}^{n} \log p_x(x_i;\theta) = -\sum_{i=1}^{n} \left[ \log p_z(T_\theta(x_i)) + \log \left| \det J_{T_\theta}(x) \right|_{x=x_i} \right].    (9)

We minimize the loss function (9) to obtain the optimal parameters of the neural networks $\mu$ and $\nu$; that is, we learn a flexible yet expressive transformation $T_\theta$ from the dataset $\mathcal{D}$. We can then use the inverse transformation $x = T_\theta^{-1}(z)$ to resample from the probability density $p_x(x)$. In other words, we have learned a generative model that mimics the underlying random variable that generated the samples in the training data set.
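As a concrete sketch (assuming PyTorch; the network sizes and the value of $C$ are illustrative), one coupling map of the form (7) together with its log-determinant from (8) can be written as follows. Composing several such layers, with the roles of the two components alternating, yields the full transformation $T_\theta$.

```python
import math
import torch

class CouplingLayer(torch.nn.Module):
    """One coupling map of the form (7): z1 = x1, z2 = (x2 * exp(mu(x1)) + nu(x1)) / C."""
    def __init__(self, hidden=16, C=1.0 / 3.0):
        super().__init__()
        self.C = C
        def net():
            return torch.nn.Sequential(
                torch.nn.Linear(1, hidden), torch.nn.Tanh(),
                torch.nn.Linear(hidden, hidden), torch.nn.Tanh(),
                torch.nn.Linear(hidden, 1))
        self.mu, self.nu = net(), net()

    def forward(self, x):
        x1, x2 = x[:, :1], x[:, 1:]
        mu, nu = self.mu(x1), self.nu(x1)
        z2 = (x2 * torch.exp(mu) + nu) / self.C
        log_det = (mu - math.log(self.C)).squeeze(-1)     # log|det J| of this layer, cf. (8)
        return torch.cat([x1, z2], dim=1), log_det

    def inverse(self, z):
        z1, z2 = z[:, :1], z[:, 1:]
        mu, nu = self.mu(z1), self.nu(z1)
        x2 = (self.C * z2 - nu) * torch.exp(-mu)
        return torch.cat([z1, x2], dim=1)
```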

So far, we have seen how to use normalizing flows to estimate a probability density. Next, we introduce the nonlocal Kramers-Moyal formulas and explain how the estimated probability density is combined with them to extract stochastic governing laws.

3 Nonlocal Kramers-Moyal formulas

The last section introduced normalizing flows for approximating the probability density function from sample data. To extract the stochastic dynamical system, we still need to compute the Lévy jump measure, drift and diffusion terms from the estimated probability density. This step can be accomplished by the recently proposed nonlocal Kramers-Moyal formulas, which we review in the following.

Random fluctuations often have both Gaussian and non-Gaussian statistical features. By the Lévy-Itô decomposition theorem [1, 13], a large class of random fluctuations are indeed modeled as linear combinations of a (Gaussian) Brownian motion $B_t$ and a (non-Gaussian) Lévy process $L_t$. We thus consider an $n$-dimensional stochastic dynamical system of the following form:

dx(t) = b(x(t))\, dt + \Lambda(x(t))\, dB_t + \sigma\, dL_t,    (10)

where $B_t = [B_{1,t}, \ldots, B_{n,t}]^T$ is an $n$-dimensional Brownian motion, and $L_t = [L_{1,t}, \ldots, L_{n,t}]^T$ is an $n$-dimensional non-Gaussian Lévy process with independent components described in the Appendix. The vector $b(x) = [b_1(x), \ldots, b_n(x)]^T$ is the drift coefficient (or vector field) in $\mathbb{R}^n$ and $\Lambda(x)$ is an $n \times n$ matrix. The diffusion matrix is defined by $a(x) = \Lambda \Lambda^T$. We take the positive constant $\sigma$ as the noise intensity of the Lévy process. Assume that the initial condition is $x(0) = z$ and the jump measure of $L_t$ is $\nu(dy) = W(y)\, dy$, with kernel $W$, for $y \in \mathbb{R}^n \backslash \{0\}$.

Based on the Fokker-Planck equation for the probability density function of the system (10), Li and Duan [31] derived the nonlocal Kramers-Moyal formulas to express the Lévy jump measure, drift coefficient and diffusion coefficient via sample paths. The key idea is to split the phase space into regions of large and small jumps by a spherical surface: the integrals inside the ball (instead of over the whole space, as in the classical Kramers-Moyal formulas) provide the drift and diffusion, while the jump measure is computed from the data outside. We restate their main results, the following theorem and corollary, as the theoretical foundation of our method.

The following theorem expresses the Lévy jump measure, drift and diffusion in terms of the solution $p(x,t|z,0)$ of the Fokker-Planck equation.

Theorem 1.

(Relation between stochastic governing law and Fokker-Planck equation)
For every $\varepsilon > 0$, the probability density function $p(x,t|z,0)$ and the jump measure, drift and diffusion have the following relations:
1) For every $x$ and $z$ satisfying $|x - z| > \varepsilon$,

\lim_{t \to 0} t^{-1} p(x,t|z,0) = \sigma^{-n} W\left(\sigma^{-1}(x - z)\right)    (11)

uniformly in $x$ and $z$.
2) For $i = 1, 2, \ldots, n$,

\lim_{t \to 0} t^{-1} \int_{|x-z|<\varepsilon} (x_i - z_i)\, p(x,t|z,0)\, dx = b_i(z).    (12)

3) For $i, j = 1, 2, \ldots, n$,

\lim_{t \to 0} t^{-1} \int_{|x-z|<\varepsilon} (x_i - z_i)(x_j - z_j)\, p(x,t|z,0)\, dx = a_{ij}(z) + \sigma^{-n} \int_{|y|<\varepsilon} y_i y_j\, W(\sigma^{-1} y)\, dy.    (13)

This theorem can be reformulated as the following corollary, the so-called nonlocal Kramers-Moyal formulas, which express the jump measure, drift and diffusion via the sample paths of the stochastic differential equation (10).

Corollary 2.

(Nonlocal Kramers-Moyal formulas)
For every $\varepsilon > 0$, the sample path solution $x(t)$ of the stochastic differential equation (10) and the jump measure, drift and diffusion have the following relations:
1) For every $m > 1$,

\lim_{t \to 0} t^{-1} \mathbb{P}\left\{ |x(t) - z| \in [\varepsilon, m\varepsilon) \,\middle|\, x(0) = z \right\} = \sigma^{-n} \int_{|y| \in [\varepsilon, m\varepsilon)} W(\sigma^{-1} y)\, dy.    (14)

2) For $i = 1, 2, \ldots, n$,

\lim_{t \to 0} t^{-1} \mathbb{P}\left\{ |x(t) - z| < \varepsilon \,\middle|\, x(0) = z \right\} \cdot \mathbb{E}\left[ x_i(t) - z_i \,\middle|\, x(0) = z;\ |x(t) - z| < \varepsilon \right] = b_i(z).    (15)

3) For $i, j = 1, 2, \ldots, n$,

\lim_{t \to 0} t^{-1} \mathbb{P}\left\{ |x(t) - z| < \varepsilon \,\middle|\, x(0) = z \right\} \cdot \mathbb{E}\left[ (x_i(t) - z_i)(x_j(t) - z_j) \,\middle|\, x(0) = z;\ |x(t) - z| < \varepsilon \right] = a_{ij}(z) + \sigma^{-n} \int_{|y| < \varepsilon} y_i y_j\, W(\sigma^{-1} y)\, dy.    (16)

In this work, for the sake of concreteness and due to its extensive physical applications [39], we consider a special but significant Lévy motion to illustrate our method: the rotationally symmetric $\alpha$-stable Lévy motion [45]. Detailed information about it is given in the Appendix. In this case, the identification of the Lévy jump measure reduces to learning the stability parameter $\alpha$ from data.

We have shown the relationship between the nonlocal Kramers-Moyal formulas and the coefficients of the stochastic differential equation. The nonlocal Kramers-Moyal formulas provide a way to estimate the Lévy jump measure, drift coefficient and diffusion coefficient from the transition probability density, which connects naturally with normalizing flows.
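To make this connection concrete, the following one-dimensional sketch approximates the limits in (12) and (13) at a small time $t^*$ by numerical quadrature over the ball $|x - z| < \varepsilon$. It assumes a vectorized callable density_fn standing in for the learned density $p(\,\cdot\,, t^*|z, 0)$ and the rotationally symmetric $\alpha$-stable kernel $W(y) = c(1,\alpha)|y|^{-1-\alpha}$; the names and numerical choices are illustrative, not part of the algorithm of [31].

```python
import numpy as np
from scipy.special import gamma

def c_alpha(n, alpha):
    """Constant c(n, alpha) of the rotationally symmetric alpha-stable jump measure."""
    return alpha * gamma((n + alpha) / 2) / (2 ** (1 - alpha) * np.pi ** (n / 2) * gamma(1 - alpha / 2))

def drift_diffusion_1d(density_fn, z, t_star, eps, alpha, sigma, n_grid=2001):
    """Approximate (12) and (13) at finite t_star by the trapezoidal rule on |x - z| < eps."""
    x = np.linspace(z - eps, z + eps, n_grid)
    p = density_fn(x)                                        # learned p(x, t*|z, 0) on the grid
    drift = np.trapz((x - z) * p, x) / t_star                # formula (12)
    second_moment = np.trapz((x - z) ** 2 * p, x) / t_star   # left-hand side of (13)
    # small-jump correction sigma^{-1} * int_{|y|<eps} y^2 W(y/sigma) dy for W(y) = c(1,alpha)|y|^{-1-alpha}
    correction = 2.0 * c_alpha(1, alpha) * sigma ** alpha * eps ** (2.0 - alpha) / (2.0 - alpha)
    diffusion = second_moment - correction                   # a(z) from formula (13)
    return drift, diffusion
```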

4 Method

In this section, we state how to learn the coefficients of the stochastic differential equation by combining the nonlocal Kramers-Moyal formulas with normalizing flows. To clarify which formulas of Section 3 are used to estimate the coefficients, we provide a practical implementation here.

Implementation. The practical implementation for learning the coefficients of the stochastic differential equation:

  • 1

    Estimate the transition probability density $p(x,t|z,0)$ from short bursts of simulation data with initial value $z$ using neural spline flows (subsection 2.2) or RealNVP (subsection 2.3).

  • 2

    Use the inverse transformation $T_\theta^{-1}$ of the normalizing flow to draw samples from the estimated density $p(x,t|z,0)$, and then use formula (14) to estimate the Lévy jump measure, i.e., the anomalous diffusion coefficient $\sigma$ and the stability parameter $\alpha$. The resampled points are used to estimate the probability on the left-hand side of formula (14); for more details, see [31]. A minimal numerical sketch of this step follows this list.

  • 3

    Use the transformation $T_\theta$ of the normalizing flow to evaluate the density $p(x,t|z,0)$, and then obtain the drift and diffusion coefficients from (12) and (13), e.g., by the quadrature sketch at the end of Section 3.
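As a minimal one-dimensional sketch of step 2: assuming the jump kernel $W(y) = c(1,\alpha)|y|^{-1-\alpha}$, formula (14) (letting $m \to \infty$) gives $t^{*-1}\,\mathbb{P}\{|x(t^*) - z| \geq r\} \approx \frac{2 c(1,\alpha)}{\alpha}\, \sigma^{\alpha} r^{-\alpha}$ for radii $r$ beyond the small-jump scale, so a log-log fit of the empirical tail probabilities of the resampled points yields $\alpha$ from the slope and $\sigma$ from the intercept. The function name and the choice of radii are illustrative.

```python
import numpy as np
from scipy.special import gamma

def estimate_alpha_sigma(resampled, z, t_star, radii):
    """Fit alpha and sigma from the empirical tail of the resampled density, cf. formula (14)."""
    r = np.asarray(radii, dtype=float)
    tail = np.array([np.mean(np.abs(resampled - z) >= ri) for ri in r]) / t_star
    slope, intercept = np.polyfit(np.log(r), np.log(tail), 1)   # log(tail) ~ intercept - alpha * log(r)
    alpha = -slope
    c1a = alpha * gamma((1 + alpha) / 2) / (2 ** (1 - alpha) * np.sqrt(np.pi) * gamma(1 - alpha / 2))
    sigma = (np.exp(intercept) * alpha / (2.0 * c1a)) ** (1.0 / alpha)
    return alpha, sigma
```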

5 Results

In the previous sections, we introduced the nonlocal Kramers-Moyal formulas and normalizing flows. The nonlocal Kramers-Moyal formulas give the relationship between the transition probability density $p(x,t|z,0)$ and the coefficients (Lévy measure, drift and diffusion) of the corresponding stochastic differential equation, while normalizing flows provide a flexible way to approximate the transition density $p(x,t|z,0)$ from sample path data. Therefore, we can estimate the coefficients of the stochastic differential equation from sample paths.

Let $t^*$ be a value of time "near" zero, and let $\mathcal{D} = \{x_i^{(z)}\}_{i=1}^{n}$ be a dataset sampled from $p_x(x, t^*|z, 0)$. To be specific, these samples come from simulating the stochastic differential equation with initial value $z$ by the Euler-Maruyama method, where $z \in [-2, 2]$ or $z \in [-2, 2] \times [-2, 2]$.
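For reference, data of this kind can be generated with a sketch like the one below (one-dimensional, pure-jump case). The drift function, the grid of initial values, and the convention that an $\alpha$-stable increment over a step $dt$ scales as $dt^{1/\alpha}$ times a standard symmetric $\alpha$-stable variable (sampled with scipy.stats.levy_stable) are our assumptions for illustration.

```python
import numpy as np
from scipy.stats import levy_stable

def simulate_bursts(drift, z_grid, alpha=1.5, sigma=1.0, t_star=0.01, n_samples=5000, n_steps=10, seed=0):
    """Euler-Maruyama short bursts x(t*) started from each initial value z (pure-jump case)."""
    rng = np.random.default_rng(seed)
    dt = t_star / n_steps
    data = {}
    for z in z_grid:
        x = np.full(n_samples, float(z))
        for _ in range(n_steps):
            dL = levy_stable.rvs(alpha, 0.0, size=n_samples, random_state=rng)
            x = x + drift(x) * dt + sigma * dt ** (1.0 / alpha) * dL
        data[z] = x
    return data

# e.g. for the one-dimensional example (17) below, with drift b(x) = 3x - x^3:
# data = simulate_bursts(lambda x: 3.0 * x - x ** 3, z_grid=np.linspace(-2.0, 2.0, 9))
```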

5.1 A one-dimensional system

Consider a one-dimensional stochastic differential equation with pure jump Lévy motion

dX(t) = \left( 3X(t) - X^3(t) \right) dt + dL^\alpha(t),    (17)

where $L^\alpha$ is a scalar, real-valued $\alpha$-stable Lévy motion with triple $(0, 0, \nu_\alpha)$. Here we take $\alpha = 1.5$. The jump measure is given by

\nu_\alpha(dy) = c(1,\alpha)\, \|y\|^{-1-\alpha}\, dy,    (18)

with $c(1,\alpha) = \frac{\alpha \Gamma((1+\alpha)/2)}{2^{1-\alpha} \pi^{1/2} \Gamma(1-\alpha/2)}$. Here we take $t^* = 0.01$, sample size $n = 5000$, and a standard normal density as the prior. For the normalizing flow, we take a neural spline flow as the basis transformation, denoted by $T_\theta$. To increase the expressiveness of the transformation, we compose $N = 32$ copies, $T_\theta^N$, as the final transformation, where the neural network $NN$ in (5) is fully connected with 3 hidden layers of 32 nodes each. In addition, we choose the interval $[-B, B] = [-3, 3]$ and $K = 5$ bins as in subsection 2.2.
We compare the true drift coefficient with the learned drift coefficient in Figure 1. In addition, the estimated values are $\alpha = 1.55$ and $\sigma = 1.09$, while the true values are $\alpha = 1.5$ and $\sigma = 1$.

Figure 1: 1D system: Learning the drift coefficient. The blue line is the true drift coefficient; the red line is the learned drift coefficient.

In this subsection, we showed how to estimate the coefficients of a stochastic differential equation using neural spline flows and nonlocal Kramers-Moyal formulas. However, a disadvantage of neural spline flows is that inverting the transformation $T_\theta$ requires finding the root of a rational polynomial. Therefore we prefer another method, RealNVP, for approximating multidimensional probability densities.

As described in Section 2.3, the transformation (7) is our basis transformation. For the target probability density $p_x(x)$, $x \in \mathbb{R}^2$, our final transformation consists of compositions of the following three prototypical transformations:

z_1 = x_1, \qquad z_2 = \frac{1}{C} \left[ x_2 e^{\mu_1(x_1)} + \nu_1(x_1) \right],    (19)

z_1 = \frac{1}{C} \left[ x_1 e^{\mu_2(x_2)} + \nu_2(x_2) \right], \qquad z_2 = x_2,    (20)

z_1 = x_1, \qquad z_2 = \frac{1}{C} \left[ x_2 e^{\mu_3(x_1)} + \nu_3(x_1) \right],    (21)

where $(\mu_i, \nu_i)$, $i = 1, 2, 3$, are six different fully connected neural networks with 3 hidden layers of 16 nodes each. We denote this final transformation by $T_\theta$.

5.2 A decoupled system

Consider a two-dimensional stochastic differential equation with a pure jump Lévy motion

dX_1(t) = \left( 3X_1(t) - X_1^3(t) \right) dt + dL_1^\alpha(t),
dX_2(t) = \left( 3X_2(t) - X_2^3(t) \right) dt + dL_2^\alpha(t),    (22)

where $(L_1^\alpha, L_2^\alpha)$ is a two-dimensional rotationally symmetric, real-valued $\alpha$-stable Lévy motion with triple $(0, 0, \nu_\alpha)$. Here we take $\alpha = 1.5$. The jump measure is given by

\nu_\alpha(dy) = c(2,\alpha)\, \|y\|^{-2-\alpha}\, dy,    (23)

with $c(2,\alpha) = \frac{\alpha \Gamma((2+\alpha)/2)}{2^{1-\alpha} \pi \Gamma(1-\alpha/2)}$. Here we take $t^* = 0.05$, sample size $n = 10000$, hyperparameter $C = \frac{1}{3}$ and a standard normal density as the prior.

We compare the true drift coefficients with the learned drift coefficients in Figure 2. In addition, the estimated values are $\alpha = 1.41$ and $\sigma = 1.04$, while the true values are $\alpha = 1.5$ and $\sigma = 1$.

Figure 2: 2D decoupled system with pure jump Lévy motion: Learning the drift coefficients. Top row: The true values of drift coefficients. Bottom row: The learned values of drift coefficients.

The example in this subsection shows that we can learn the drift term of such a decoupled stochastic system from data. Next, we illustrate our method on a more complex example.

5.3 A coupled system

Consider a two-dimensional coupled stochastic differential equation with multiplicative Brownian motion and Lévy motion

dX_1(t) = \left( 0.001 X_1(t) - X_1(t) X_2(t) \right) dt + X_1(t)\, dB_1(t) + dL_1^\alpha(t),
dX_2(t) = \left( -6 X_2(t) + 0.25 X_1^2(t) \right) dt + X_2(t)\, dB_2(t) + dL_2^\alpha(t),    (24)

where $(B_1, B_2)$ is a two-dimensional Brownian motion and $(L_1^\alpha, L_2^\alpha)$ is a two-dimensional rotationally symmetric, real-valued $\alpha$-stable Lévy motion with triple $(0, 0, \nu_\alpha)$. Here we take $\alpha = 1.5$. The jump measure is given by

\nu_\alpha(dy) = c(2,\alpha)\, \|y\|^{-2-\alpha}\, dy,    (26)

with $c(2,\alpha) = \frac{\alpha \Gamma((2+\alpha)/2)}{2^{1-\alpha} \pi \Gamma(1-\alpha/2)}$. Here we take $t^* = 0.01$, sample size $n = 10000$, hyperparameter $C = \frac{1}{3}$ and a standard normal density as the prior.

We compare the true drift coefficients with the learned drift coefficients in Figure 3, and Figure 4 shows the learned diffusion coefficients. In addition, the estimated values are $\alpha = 1.58$ and $\sigma = 1.23$, while the true values are $\alpha = 1.5$ and $\sigma = 1$.

Figure 3: 2D coupled system with multiplicative Brownian motion and Lévy motion: Learning the drift coefficients. Top row: The true values of drift coefficients. Bottom row: The learned values of drift coefficients.
Figure 4: 2D coupled system with multiplicative Brownian motion and Lévy motion: Learning the diffusion coefficients of Brownian motion. Top row: The true values of diffusion coefficients. Bottom row: The learned values of diffusion coefficients.

6 Discussion

In this work, we have presented a new method to extract stochastic governing laws by nonlocal Kramers-Moyal formulas. In particular, we use normalizing flows to estimate transition probability density from sample path data, and then identify the Lévy jump measure, drift coefficient and diffusion coefficient of a stochastic differential equation. We further develop a numerical algorithm and test it on some prototypical examples to confirm its efficacy.

Compared with the original data-driven algorithm [31], our method reduces the amount of data required and does not need a basis to represent the coefficients in the stochastic governing laws. This makes it easier to extract stochastic governing laws from noisy data with non-Gaussian statistical features. Although we have improved on our previous work, some challenges remain. We reduced the amount of data, but the accuracy of our estimates is slightly worse than that of the original method, because we do not express the coefficients in terms of a basis (whose selection is itself a challenge). The accuracy of the estimates also depends on the individual dynamical system. Designing an algorithm with strong robustness and high estimation accuracy is left for future work.

Acknowledgments

The authors are grateful to Ting Gao and Cheng Fang for helpful discussions.

Data Availability

References

  • [1] J. Duan. An Introduction to Stochastic Dynamics. Cambridge University Press, New York, 2015.
  • [2] A. E. Kyprianou . Fluctuations of Lévy Processes with Applications: Introductory Lectures. Springer, Berlin Heidelberg, 2014.
  • [3] P. Ashwin, S. Wieczorek, R. Vitolo, and P. Cox. Tipping points in open systems: Bifurcation, noise-induced and rate-dependent examples in the climate system. Phil. Trans. R. Soc. A., 370:1166–1184, 2012.
  • [4] R. Banisch and P. Koltai. Understanding the geometry of transport: Diffusion maps for lagrangian trajectory data unravel coherent sets. Chaos, 27:035804, 2017.
  • [5] G. J. Both and R. Kusters. Temporal normalizing flows. arXiv:1912.09092v1, 2019.
  • [6] B. Böttcher. Feller processes: The next generation in modeling. brownian motion, lévy processes and beyond. PLoS One, 5:e15102, 2010.
  • [7] S. L. Brunton, J. Proctor, and J. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, page 201517384, 2016.
  • [8] R. Cai, X. Chen, J. Duan, J. Kurths, and X. Li. Lévy noise-induced escape in an excitable system. J. Stat. Mech. Theory Exp., 6:063503, 2017.
  • [9] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton. Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, page 1906995116, 2019.
  • [10] B. Chen. STOCHASTIC MODELING OF WATER VAPOR IN THE CLIMATE SYSTEM. PhD thesis, Applied Mathematics in the Graduate College of the Illinois Institute of Technology, 2009.
  • [11] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 2018.
  • [12] X. Chen, L. Yang, J. Duan, and G. E. Karniadakis. Solving inverse stochastic problems from discrete particle observations using the fokker-planck equation and physics-informed neural networks. arXiv:2008.10653v1, 2020.
  • [13] D. Applebaum. Lévy Processes and Stochastic Calculus, second ed. Cambridge University Press, 2009.
  • [14] M. Dai, T. Gao, Y. Lu, Y. Zheng, and J. Duan. Detecting the maximum likelihood transition path from data of stochastic dynamical systems. Chaos, 30:113124, 2020.
  • [15] M. Dellnitz, G. Froyland, C. Horenkamp, K. Padberg-Gehle, and A. S. Gupta. Seasonal variability of the subpolar gyres in the southern ocean: a numerical investigation based on transfer operators. Nonlinear Processes in Geophysics, 406:655–664, 2009.
  • [16] F. Dietrich, A. Makeev, G. Kevrekidis, N. Evangelou, T. Bertalan, S. Reich, and I. G. Kevrekidis. Learning effective stochastic differential equations from microscopic simulations: combining stochastic numerics and deep learning. arXiv:2106.09004v1, 2021.
  • [17] L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using real nvp. arXiv:1605.08803v3, 2017.
  • [18] P. D. Ditlevsen. Observation of α\alpha-stable noise induced millennial climate changes from an ice-core record. Geophys. Res. Lett, 26:1441–1444, 1999.
  • [19] C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios. Neural spline flows. arXiv:1906.04032v1, 2019.
  • [20] G. Froyland, O. Junge, and P. Koltai. Estimating long-term behavior of flows without trajectory integration: the infinitesimal generator approach. SIAM J. NUMER. ANAL, 51(1):223–247, 2013.
  • [21] G. Froyland and K. Padberg. Almost-invariant sets and invariant manifolds - connecting probabilistic and geometric descriptions of coherent structures in flows. Physica D, 238:1507–1523, 2009.
  • [22] C. A. García, P. Félix, J. Presedo, and D. G. Márquez. Nonparametric estimation of stochastic differential equations with sparse gaussian processes. Physical Review E, page 022104, 2017.
  • [23] R. González-García, R. Rico-Martínez, and I. Kevrekidis. Identification of distributed parameter systems: A neural net based approach. Computers Chem. Engng, 22:S965–S968, 1998.
  • [24] D. P. Kingma, T. Salimans, R. Jozefowicz, I. S. X. Chen, and M. Welling. Improved variational inference with inverse autoregressive flow. arXiv:1606.04934v2, 2017.
  • [25] S. Klus, P. Koltai, and C. Schütte. On the numerical approximation of the perron- frobenius and koopman operator. Journal of Computational Dynamics, 3(1):51–79, 2016.
  • [26] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé. Data-driven model reduction and transfer operator approximation. J Nonlinear Sci, pages 985–1010, 2018.
  • [27] S. Klus, F. Nüske, S. Peitz, J. H. Niemann, C. Clementi, and C. Schütte. Data-driven approximation of the koopman generator: Model reduction, system identification, and control. Physica D: Nonlinear Phenomena, 406:132416, 2020.
  • [28] D. I. Kopelevich, A. Z. Panagiotopoulos, and I. G. Kevrekidis. Coarse-grained kinetic computations for rare events: Application to micelle formation. The Journal of Chemical Physics, 122(4):044908, 2005.
  • [29] S. Lee, M. Kooshkbaghi, K. Spiliotis, C. I. Siettos, and I. G. Kevrekidis. Coarse-scale pdes from fine-scale observations via machine learning. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(1):013141, 2020.
  • [30] X. Li, R. T. Q. Chen, T.-K. L. Wong, and D. Duvenaud. Scalable gradients for stochastic differential equations. In Artificial Intelligence and Statistics, 2020.
  • [31] Y. Li and J. Duan. A data-driven approach for discovering stochastic dynamical systems with non-gaussian lévy noise. Physica D: Nonlinear Phenomena, 417:132830, 2021.
  • [32] P. Liu, C. I. Siettos, C. W. Gear, and I. Kevrekidis. Equation-free model reduction in agent-based computations: Coarse-grained bifurcation and variable-free rare event analysis. Math. Model. Nat. Phenom, 10(3):71–90, 2015.
  • [33] F. Lu. Data-driven model reduction for stochastic burgers equations. arXiv:2010.00736v2, 2020.
  • [34] Y. Lu and J. Duan. Discovering transition phenomena from data of stochastic dynamical systems with lévy noise. Chaos, 30:093110, 2020.
  • [35] Y. Lu, R. Maulik, T. Gao, F. Dietrich, I. G. Kevrekidis, and J. Duan. Learning the temporal evolution of multivariate densities via normalizing flows. arXiv:2107.13735v1, 2021.
  • [36] M. Reza Rahimi Tabar. Analysis and Data-Based Reconstruction of Complex Nonlinear Dynamical Systems. Springer, 2019.
  • [37] R. Maulik, A. Mohan, B. Lusch, S. Madireddy, P. Balaprakash, and D. Livescu. Time-series learning of latent-space dynamics for reduced-order model closure. Physica D: Nonlinear Phenomena, 405:132368, 2020.
  • [38] P. Metzner, E. Dittmer, T. Jahnke, and C. Schütte. Generator estimation of markov jump processes. Journal of Computational Physics, pages 353–375, 2007.
  • [39] O. E. Barndorff-Nielsen and T. Mikosch and S. I. Resnick (Eds.) . Lévy Processes: Theory and Applications. Birkhäuser, Boston, 2001.
  • [40] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. arXiv:1912.02762v1, 2019.
  • [41] G. Papamakarios, T. Pavlakou, and I. Murray. Masked autoregressive flow for density estimation. arXiv:1705.07057v4, 2018.
  • [42] A. Patel and B. Kosko. Stochastic resonance in continuous and spiking neural models with lévy noise. IEEE Trans. Neural Networks, 19:1993–2008, 2008.
  • [43] L. Ping, C. I. Siettos, C. W. Gear, and I. G. Kevrekidis. Equation-free model reduction in agent-based computations: Coarse-grained bifurcation and variable-free rare event analysis. Mathematical Modelling of Natural Phenomena, 10(3):71 – 90, 2015.
  • [44] R. Adler and R. Feldman and M. Taqqu . A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, 1998.
  • [45] R. Cont and P. Tankov . Financial Modelling with Jump Processes. CRC Press, UK, 2003.
  • [46] D. J. Rezende and S. Mohamed. Variational inference with normalizing flows. arXiv:1505.05770v6, 2016.
  • [47] A. N. Riseth and J. Taylor-King. Operator fitting for parameter estimation of stochastic differential equations. arXiv:1702.07597v2, 2019.
  • [48] A. Ruttor, P. Batz, and M. Opper. Approximate gaussian process inference for the drift of stochastic differential equations. International Conference on Neural Information Processing Systems Curran Associates Inc, 2013.
  • [49] P. J. Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics, pages 5–28, 2008.
  • [50] E. G. Tabak and E. Vanden-Eijnden. Density estimation by dual ascent of the log-likelihood. Communications in Mathematical Sciences, 8(1):217–233, 2010.
  • [51] N. Takeishi, Y. Kawahara, and T. Yairi. Subspace dynamic mode decomposition for stochastic koopman analysis. Physical Review E, page 033310, 2017.
  • [52] A. Tantet, F. R. van der Burgt, and H. A. Dijkstra. An early warning indicator for atmospheric blocking events using transfer operators. Chaos, page 036406, 2015.
  • [53] D. Tesfay, P. Wei, Y. Zheng, J. Duan, and J. Kurths. Transitions between metastable states in a simplified model for the thermohaline circulation under random fluctuations. Appl. Math. Comput., 26(11):124868, 2019.
  • [54] E. H. Thiede, D. Giannakis, A. R. Dinner, and J. Weare. Galerkin approximation of dynamical quantities using trajectory data. J Chem. Phys, page 244111, 2019.
  • [55] A. Tsiairis, P. Wei, Y. Chao, and J. Duan. Maximal likely phase lines for a reduced ice growth model. Physica A arXiv:1912.00471, 2020.
  • [56] B. Tzen and M. Raginsky. Neural stochastic differential equations: Deep latent gaussian models in the diffusion limit. arXiv:1905.09883v2, 2019.
  • [57] M. O. Williams, I. G. Kevrekidis, and C. Rowley. A data-driven approximation of the koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, pages 1307–1346, 2015.
  • [58] F. Yang, Y. Zheng, J. Duan, L. Fu, and S. Wiggins. The tipping times in an arctic sea ice system under influence of extreme events. Chaos, 30:063125, 2020.
  • [59] L. Yang, C. Daskalakis, and G. E. Karniadakis. Generative ensemble-regression: Learning stochastic dynamics from discrete particle ensemble observations. arXiv:2008.01915v1, 2020.
  • [60] Y. Zheng, F. Yang, J. Duan, X. Sun, L. Fu, and J. Kurths. The maximum likelihood climate change for global warming under the influence of greenhouse effect and lévy noise. Chaos, 30:013132, 2020.
  • [61] Y. Zou, V. A. Fonoberov, M. Fonoberova, I. Mezic, and I. G. Kevrekidis. Model reduction for agent-based social simulation: Coarse-graining a civil violence model. Physical Review E, 85(6):066106, JUN 2012.

Appendix

Lévy motions

Let $L = (L_t, t \geq 0)$ be a stochastic process defined on a probability space $(\Omega, \mathcal{F}, P)$. We say that $L$ is a Lévy motion if:
(1) $L_0 = 0$ (a.s.);
(2) $L$ has independent and stationary increments;
(3) $L$ is stochastically continuous, i.e., for all $a > 0$ and for all $s \geq 0$,

\lim_{t \to s} P(|L_t - L_s| > a) = 0.

Characteristic functions

For a Lévy motion $(L_t, t \geq 0)$, we have the Lévy-Khinchine formula,

\mathbb{E}\left[ e^{i(u, L_t)} \right] = \exp\left\{ t \left[ i(b, u) - \frac{1}{2}(u, A u) + \int_{\mathbb{R}^n \backslash \{0\}} \left[ e^{i(u, y)} - 1 - i(u, y)\, I_{\{\|y\| < 1\}}(y) \right] \nu(dy) \right] \right\},

for each $t \geq 0$, $u \in \mathbb{R}^n$, where $(b, A, \nu)$ is the triple of the Lévy motion $(L_t, t \geq 0)$.

Theorem (The Lévy-Itô decomposition). If $(L_t, t \geq 0)$ is a Lévy motion with triple $(b, A, \nu)$, then there exist $b \in \mathbb{R}^n$, a Brownian motion $B_A$ with covariance matrix $A$ and an independent Poisson random measure $N$ on $\mathbb{R}^+ \times (\mathbb{R}^n \setminus \{0\})$ such that, for each $t \geq 0$,

L_t = bt + B_A(t) + \int_{0 < |x| < c} x\, \tilde{N}(t, dx) + \int_{|x| \geq c} x\, N(t, dx),

where $\int_{|x| \geq c} x\, N(t, dx)$ is a Poisson integral and $\int_{0 < |x| < c} x\, \tilde{N}(t, dx)$ is a compensated Poisson integral defined by

\int_{0 < |x| < c} x\, \tilde{N}(t, dx) = \int_{0 < |x| < c} x\, N(t, dx) - t \int_{0 < |x| < c} x\, \nu(dx).

The α\alpha-stable Lévy motions

The $\alpha$-stable Lévy motion is a special but very popular type of Lévy process, defined by the stable random variable with distribution $S_\alpha(\delta, \beta, \lambda)$ [44, 2, 39]. Here $\alpha \in (0, 2]$ is called the stability parameter, $\delta \in (0, \infty)$ is the scaling parameter, $\beta \in [-1, 1]$ is the skewness parameter and $\lambda \in (-\infty, \infty)$ is the shift parameter.

A stable random variable $X$ with $0 < \alpha < 2$ has the following "heavy tail" estimate:

\lim_{y \to \infty} y^\alpha\, \mathbb{P}(X > y) = C_\alpha \frac{1+\beta}{2} \delta^\alpha,

where $C_\alpha$ is a positive constant depending on $\alpha$. In other words, the tail decays polynomially. The $\alpha$-stable Lévy motion has larger jumps with lower jump frequencies for smaller $\alpha$ ($0 < \alpha < 1$), while it has smaller jump sizes with higher jump frequencies for larger $\alpha$ ($1 < \alpha < 2$). The special case $\alpha = 2$ corresponds to (Gaussian) Brownian motion. For more information about Lévy processes, refer to [1, 13].
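As a rough numerical check of this tail estimate (assuming scipy's levy_stable with $\beta = 0$ and unit scale as a stand-in for a standard symmetric $\alpha$-stable variable), the product $y^{\alpha}\, \mathbb{P}(X > y)$ should settle near a constant as $y$ grows:

```python
import numpy as np
from scipy.stats import levy_stable

alpha = 1.5
X = levy_stable.rvs(alpha, 0.0, size=100_000, random_state=1)
for y in [5.0, 10.0, 20.0, 40.0]:
    print(y, y ** alpha * np.mean(X > y))   # roughly constant for large y
```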

Fokker-Planck equations

Consider a stochastic differential equation in $\mathbb{R}^n$:

dX_t = f(X_{t-})\, dt + \sigma(X_{t-})\, dB_t + dL_t^\alpha, \qquad X_0 = x_0,    (A.1)

where $f$ is a vector field, $\sigma$ is an $n \times n$ matrix, $B_t$ is a Brownian motion in $\mathbb{R}^n$ and $L_t^\alpha$ is a symmetric $\alpha$-stable Lévy motion in $\mathbb{R}^n$ with generating triplet $(0, 0, \nu_\alpha)$. The jump measure is

\nu_\alpha(dy) = c(n,\alpha)\, \|y\|^{-n-\alpha}\, dy,    (A.2)

with $c(n,\alpha) = \frac{\alpha \Gamma((n+\alpha)/2)}{2^{1-\alpha} \pi^{n/2} \Gamma(1-\alpha/2)}$. The processes $B_t$ and $L_t^\alpha$ are taken to be independent.
The Fokker-Planck equation for the stochastic differential equation (A.1) is then

p_t = -\nabla \cdot (f p) + \frac{1}{2} \mathrm{Tr}\left[ H(\sigma \sigma^T p) \right] + \int_{\mathbb{R}^n \backslash \{0\}} \left[ p(x+y, t) - p(x, t) \right] \nu_\alpha(dy),    (A.3)

where $p(x, 0) = \delta(x - x_0)$ and $H$ denotes the Hessian. See [13, 1] for more details.