Variational inference of the drift function for stochastic differential equations driven by Lévy processes¹¹footnotemark: 1

Min Dai [email protected] Jinqiao Duan [email protected] Jianyu Hu Xiangjun Wang School of Mathematics and Statistics, & Center for Mathematical Science,
Huazhong University of Science and Technology, Wuhan, 430074, China. Department of Applied Mathematics, College of Computing,
Illinois Institute of Technology, Chicago, IL 60616, USA.

Abstract

In this paper, we consider the nonparametric estimation problem of the drift function of stochastic differential equations driven by $\alpha$ -stable Lévy motion. First, the Kullback-Leibler divergence between the path probabilities of two stochastic differential equations with different drift functions is optimized. By using the Lagrangian multiplier, the variational formula based on the stationary Fokker-Planck equation is constructed. Then combined with the data information, the empirical distribution is used to replace the stationary density, and the drift function is estimated non-parametrically from the perspective of the process. In the numerical experiment, the different amounts of data and different $\alpha$ values are studied. The experimental results show that the estimation result of the drift function is related to both. When the amount of data increases, the estimation result will be better, and when the $\alpha$ value increases, the estimation result is also better.

keywords:

Nonparametric estimation ,

\alpha

-stable Lévy motion , Kullback-Leibler divergence , Fokker-Planck equation , empirical distribution

1 Introduction

Stochastic dynamical systems are often used to describe random phenomena in real life. Usually, stochastic differential equation[1, 2] is used to model the stochastic process, but one of the important issues is the fitting of the model and the observed data, that is, the system identification in the data. The stochastic differential equation consists of a drift function and a diffusion part. For the thermodynamic equilibrium model, the diffusion function is proportional to the identity matrix, and the drift function is the gradient of potential energy. A simple and known method for estimating the drift function from the data is obtained from[3], which takes advantage of the fact that the potential function is proportional to the logarithm of the stationary density. Therefore, we can get an explicit representation of the drift function by using the kernel density (KDE) estimator. However, this estimator is only based on the distribution of the data, completely ignoring the time sequence of the observations and the time lag between the data. Kernel density estimator is non-parameterized, it does not require the drift function satisfies the specific parameterized form. For non-equilibrium model, in general, the explicit representation between the drift function and the density is unknown. At the same time, for the higher dimensional case, the convergence of the kernel density to the real density will be slow with the increase of sample data[4].

There are many methods for estimating the parameters of the drift function[5, 6]. For these parametric and non-parametric methods, we have to deal with the problem of low data sampling rate[7]. For example, the non-parametric method of conditional expectation is used[8], which requires numerical solution of the Kolmogorov backward equation at a given time interval, and also requires a large amount of data to calculate the conditional expectation. The Bayesian estimator using the Gaussian process prior gives a good estimate of the complete path under dense observations[9]. However, generally speaking, this method requires that unobserved diffusion paths be classified as hidden random variables between adjacent observations, which may lead to time-consuming calculations or require further approximation. Papaspiliopoulos et al.[9] introduced the Monte Carlo Gibbs sampling technique, which switched back and forth between hidden path sampling and drift function sampling of the process. For the processing of hidden paths, the expectation maximization method[10] can be used to approximate the hidden process with linear stochastic differential equations. However, these methods only consider the stochastic differential equations driven by Brownian motion. In nature, there exists a class of stochastic differential equations driven by non-Gaussian Lévy motion. The Lévy motion[11, 12], like the Brownian motion, is a type of Markov process with independent and stationary increments, which has a wide applications in the fields of physics, biology, earth science, etc. Therefore, it is of great significance to study the data recognition problem of this kind of the stochastic differential equation. For the identification problem of stochastic differential equations driven by Lévy motion, Li and Duan[13] studied this equation by using parameterized identification method based on Kramers-Moyal coefficient. However, due to the need to solve the conditional expectation value, it requires a high amount of data and increases the calculation cost.

In this paper, the main purpose is to construct an effective nonparametric inference of the drift function of a class of stochastic differential equations driven by $\alpha$ -stable Lévy motion, which can reduce the requirement for data sample size to a certain extent. In our method, we construct the Kullback-Leibler divergence[14] of the transition density through the diffusion process with two different drift functions, and use the variational formula based on the stationary Fokker-Planck equation[15] to derive the explicit form of the drift function. Combined with data information, we use empirical distribution to replace stationary density, and further minimize empirical functional to obtain target results. However, if the empirical functional has a penalty function term based on the kernel, then the method can be extended to nonparametric inference. In addition, the properties of $\alpha$ -stable Lévy motion will result in a large variation due to the different $\alpha$ values, we also carry out numerical experiments for different parameters $\alpha$ .

This paper is arranged as follows. In section 2, We derive the variational formula of the Fokker-Planck equation corresponding to the stochastic differential equation driven by an $\alpha$ -stable Lévy motion, and we use empirical distribution to replace stationary density and make full use of data information to give the expression of the drift function. In section 3, We show the feasibility of this variational method through numerical application examples. We end the paper some conclusions and discussions in section 4.

2 Theory

In this work, we consider the stochastic differential equation for the dynamics of a $d$ -dimensional diffusion processes $X_{t}\in\mathbb{R}^{d}$ given by

dX_{t}=g(X_{t})dt+\sigma(X_{t}){dB_{t}}+{dL_{t}^{\alpha}},

(1)

where $g(\cdot)\in\mathbb{R}^{d}$ is the drift function, the diffusion function $\sigma(\cdot)$ is the $d\times{k}$ dimensional matrix, $B_{t}$ is the Brownian motion in $\mathbb{R}^{k}$ and $L_{t}^{\alpha}$ is a symmetric $\alpha$ -stable L $\mathrm{\acute{e}}$ vy motion in $\mathbb{R}^{d}$ with the generating triplet $(0,0,\nu_{\alpha})$ . The jump measure[16]

\nu_{\alpha}(dy)=c(n,\alpha)||y||^{-(n+\alpha)}dy,

with $c(n,\alpha)=\frac{\alpha\Gamma((n+\alpha)/2)}{2^{1-\alpha}\pi^{n/2}\Gamma(1-\alpha/2)}.$ The processes $B_{t}$ and $L_{t}^{\alpha}$ are taken to be independent.

2.1 Variational formulation for the Fokker-Planck equation

Now, we will discuss how can we determine the drift function $g$ from data of sample path under the noise matrix $\sigma(\cdot)$ known. Suppose that the stationary density $p(x)$ of the process are given. We then splits the drift into two parts $g(x)=r(x)+f(x)$ , where $r(x)$ is known part and $f(x)$ is the part that we try to compute. Of course, in the multivariate case there is not enough information to reconstruct $f$ uniquely. However, we may search for a minimal solution which minimizes a quadratic functional

\frac{1}{2}\int{p(x)f(x)\cdot{A^{-1}(x)f(x)}}dx

(2)

for a given positive definite matrix $A(x)$ . Introducing a Lagrangian multiplier function $\psi(x)$ for the condition that the density $p$ fulfills the stationary Fokker-Planck equation with drift $g$ , we can derive the minimal $f$ from the variation of the Lagrange-functional

\frac{1}{2}\int{p(x)f(x)\cdot{A^{-1}(x)f(x)}}dx-\int{\psi(x)\{\mathcal{L}p(x)-\nabla\cdot(f(x)p(x))\}}dx

(3)

where the Fokker-Planck operator $\mathcal{L}$ corresponding to known drift $r(x)$ is given by

\mathcal{L}p(x)=-\nabla\cdot(r(x)p(x))+\frac{1}{2}{\mathrm{tr}[\nabla\nabla^{T}(D(x)p(x))]}+\int_{\mathbb{R}^{d}\setminus\{0\}}{[p(x+y,t)-p(x)]\nu_{\alpha}(dy)}

(4)

with $D(x)\doteq{\sigma(x)\sigma(x)^{T}}.$ Variation of (3) with respect to $f$ yields $f(x)=A(x)\nabla\psi(x).$ Inserting this solution back into (3) shows that the unknown function $\psi$ can be derived from the minimization of the functional

\varepsilon[\psi]=\int\Bigg{[}\frac{1}{2}\nabla{\psi(x)}\cdot{A(x)\nabla{\psi(x)}}+\mathcal{L}^{*}\psi(x)\Bigg{]}p(x)dx,

(5)

where $\mathcal{L}^{*}$ is the adjoint operator of $\mathcal{L}$ , (4) which fulfills

\int{\psi(x)\mathcal{L}p(x)dx}=\int{p(x)\mathcal{L}^{*}\psi(x)dx}

(6)

and is given by

\mathcal{L}^{*}\psi(x)=r(x)\cdot\nabla(x)+\frac{1}{2}{\mathrm{tr}[(D(x)\nabla\nabla^{T}\psi(x))]}+\int_{\mathbb{R}^{d}\setminus\{0\}}{[\psi(x+y,t)-\psi(x)]\nu_{\alpha}(dy)}

(7)

In fact, a direct minimization of (5) with respect to $\psi$ yields

\mathcal{L}[\psi]p(x)=\mathcal{L}p(x)-\nabla\cdot(A(x)\nabla\psi(x)p(x))=0,

(8)

which is the stationary Fokker-Planck equation corresponding to the density $p(x)$ and the drift $g(x)=r(x)+A(x)\nabla\psi(x).$ For the special case $D=A=I$ and $r=0$ the functional (5) was introduced in the field of machine learning as a score-function for estimating $\mathrm{ln}{p(x)}$ up to a normalization constant[17].

2.2 Minimizing the empirical functional

Our goal is to estimate $\psi$ from data by replacing the average over the stationary density $p(x)$ in the functional (5) by empirical distribution

\hat{p}(x)=\frac{1}{n}\sum_{i=1}^{n}\delta(x-x_{i})

(9)

where $x_{1},\cdots,x_{n}$ is a random, ergodic sample drawn from this density. An obvious possibility to construct estimators is to work with a parametric representation

\psi_{\omega}(x)=\sum_{k=1}^{K}\omega_{k}\varphi_{k}(x)

(10)

where the $\varphi_{k}$ are a set of given basis functions. Then we can obtain the empirical constraint optimization problem which is minimize the following

\varepsilon_{\mathrm{emp}}[\psi_{\omega}]=\sum_{i=1}^{n}\Bigg{[}\frac{1}{2}\nabla{\psi_{\omega}(x_{i})}\cdot{A(x_{i})\nabla{\psi_{\omega}(x_{i})}}+\mathcal{L}^{*}\psi_{\omega}(x_{i})\Bigg{]}

(11)

which is a quadratic form in the $\omega_{k}$ and can thus be performed in closed form. We are however interested in the case where a representation in terms of a finite set of basis functions is not rich enough to represent $\psi$ . Thus we will resort to a more general, nonparametric representation allowing for an infinite set of functions $\varphi_{k}.$ So, we can understand (10) as a Gaussian process model[18] for the function $\psi,$ that is we will choose the quadratic form $\frac{1}{2}\Sigma_{k}{\omega_{k}^{2}/\lambda_{k}}$ as an extra penalty term, where the $\lambda_{k}$ are hyper-parameters. The penalty can also be viewed from a pseudo-Bayesian perspective where $\mathrm{exp}{\{-C\varepsilon_{\mathrm{emp}}{[\psi_{\omega}]}\}}$ is interpreted as a likelihood and $\mathrm{exp}{\{-\frac{1}{2}\Sigma_{k}{\omega_{k}^{2}/\lambda_{k}}\}}$ as a Gaussian prior distribution over parameters $\omega_{k}$ . $C$ can be chosen to give different weight to the data and to the penalty.

Motivated by Gaussian process point of view we will introduce the kernel trick into our formalism avoiding an explicit specification of $\varphi_{k}$ and $\lambda_{k}$ and assume instead that these are defined implicitly as orthonormal eigenfunctions and eigenvalues of a positive definite kernel function $K(x,x^{{}^{\prime}})$ via

K(x,x^{{}^{\prime}})=\sum_{k}{\lambda_{k}\varphi_{k}(x)\varphi_{k}(x^{{}^{\prime}})}.

In the kernel approach the regularized functional can be written as

C\sum_{i=1}^{n}\Bigg{[}\frac{1}{2}\nabla{\psi(x_{i})}\cdot{A(x_{i})\nabla{\psi(x_{i})}}+\mathcal{L}^{*}\psi(x_{i})\Bigg{]}+\frac{1}{2}\int\int\psi(x)K^{-1}(x,x^{{}^{\prime}})\psi(x^{{}^{\prime}})dxdx^{{}^{\prime}},

(12)

where $K^{-1}(x,x^{{}^{\prime}})$ is the formal inverse of the kernel operator. Performing the variation with respect to $\psi$ yields

Cn\mathcal{L}[\psi]\hat{p}(x)+\int{K^{-1}(x,x^{{}^{\prime}})\psi(x^{{}^{\prime}})dx^{{}^{\prime}}}=0,

where $\mathcal{L}[\psi]$ was defined in (8). Multiplying both sides of this equation with the operator $K$ we get

\psi(x)+C\sum_{j=1}^{n}{\mathcal{L}_{x^{{}^{\prime}}}^{*}[\psi]K(x,x^{{}^{\prime}})_{x^{{}^{\prime}}=x_{j}}}=0

(13)

where the adjoint operator acts on functions $h$ as

{\mathcal{L}_{x^{{}^{\prime}}}^{*}h(x^{{}^{\prime}})}=(r(x^{{}^{\prime}})+A(x^{{}^{\prime}})\nabla\psi(x^{{}^{\prime}}))\nabla h(x^{{}^{\prime}})+\frac{1}{2}\mathrm{tr}{[D(x^{{}^{\prime}})\nabla\nabla^{T}h(x^{{}^{\prime}})]}+\int_{\mathbb{R}^{d}\setminus\{0\}}{[h(x^{{}^{\prime}}+y)-h(x^{{}^{\prime}})]\nu_{\alpha}(dy)}

(14)

Applied the family of kernel functions $h(x^{{}^{\prime}})=K(x,x^{{}^{\prime}})$ when the stationary density $p$ is replaced by its empirical approximation $\hat{p}$ (9).

The gradient of $\psi$ at the data points is computed by taking the gradient of (13) and setting $x=x_{i}.$ This yields the set of linear equations

\nabla\psi(x_{i})+C\sum_{j=1}^{n}{\mathcal{L}_{x^{{}^{\prime}}}^{*}[\psi]\nabla_{x}K(x,x^{{}^{\prime}})_{x=x_{i},x^{{}^{\prime}}=x_{j}}}=0

(15)

for the $d\times{n}$ unknowns $\nabla\psi(x_{i})$ which can be plugged into (13) to obtain the explicit result for the estimator.

3 Application

We will give a simple numerical example in this section to illustrate the results through the use of different amounts of data and the comparison of different $\alpha$ values.

Example 1.

Consider a stochastic dynamical system of the following form

dX_{t}=(X_{t}-X_{t}^{3})dt+{dW_{t}}+{dL_{t}^{\alpha}},~{}~{}~{}X_{0}=x\in{\mathbb{R}},

(16)

Where $W_{t}$ is the standard Brownian motion, $L_{t}^{\alpha}$ is a stable Lévy motion, the drift function is $f(x)=x-x^{3}$ . In the numerical simulation, we obtain a sample path data of the time series by observing the process $X_{t}$ once. In this example, we take the time step $\Delta{t}=0.001$ , and the sample sizes are $n=6000$ and $n=10000$ data points respectively, and then use the variational method of the Fokker-Planck equation to perform nonparametric inferences on the drift function $f$ . In addition, we also select different $\alpha$ values to compare the results. we will show the numerical results:

Refer to caption — Figure 1: When the sample data size $n=6000$ , the nonparametric inference results of the drift function $f$ with different $\alpha$ values.

From Figure (1) and (2), we can see that the size of the data and the different $\alpha$ values ??have an impact on the inference of the drift function. In this chapter, our method only considers the data obtained from one observation of the process. When the value of $\alpha$ is the same, the estimator of the drift function becomes better as the amount of data increases. And for the same amount of data, we’re going to consider the selection of $\alpha$ values of 1, 1.25, 1.5 and 1.75, respectively. As the value of $\alpha$ increases, the sample trajectory of the process is smoother, making the inference result of the drift function better. Of course, we also know from the numerical results that the estimation of the drift function needs to be improved, such as when $\alpha$ is close to 1, the estimation effect of the drift function is poor, so we can improve the numerical results by increasing the amount of sample data. In addition, the method to infer the drift function only. For diffusion function, we are still in the further study. On the one hand, we hope to obtain the diffusion function inference in the case of additive and multiplicative noise when the drift function is known. On the other hand, we also hope to obtain an optimization method to infer the drift function and the diffusion function at the same time.

4 Conclusion and discussions

In this paper, we have proposed a way to obtain the drift function estimator for a class of stochastic differential equations driven by $\alpha$ -stable Lévy motion from data. We construct a variational formula based on the stationary Fokker-Planck equation, replace the stationary density with an empirical distribution, and derive the drift function expression. Another contribution is that our method only needs to observe the dense data of the process once, which reduces the cost of data processing, and is more in line with the actual observation situation, so it is more conducive to processing the actual data.

In addition, there are some interesting extensions for this study. We only studied the nonparametric estimation of the drift function of the stochastic dynamical system driven by Lévy motion. It is worth thinking about the inference of the diffusion function for additive and multiplicative cases. On the other hand, we can also explore the method of extracting the dynamics of this type of the stochastic system from the data, so as to gain insight into the stochastic phenomenon in related fields.

Acknowledgements

We would like to thank Dr. Xiaoli Chen, Dr. Wei Wei, Dr. Cheng Fang for helpful discussions. This work was supported by the National Natural Science Foundation of China (NSFC) grants 12001213, 11801192, 11771449 and National Science Foundation (NSF) grant 1620449.

Appendix: Kullback-Leibler divergence and Expression of estimators

A1. Kullback-Leibler divergence

For every $t\in[0,T]$ , assuming that the dense path observation $X_{0:T}$ of equation (1) is obtained, we can derive the likelihood function. We discretize the time interval $[0,T]$ , take the time step as $\Delta{t}$ and use the fact that $\Delta{t}\rightarrow{0}$ , then the transition density of the equation (1) satisfies the Gaussian distribution. We find that the negative log-likelihood part that depends on the drift function $g$ can be approximated by the following equation

-\ln p(X_{0:T}|g)\simeq{\frac{1}{2}\sum_{t}\{\|g(X_{t})\|^{2}\Delta{t}-2\langle{g(X_{t}),(X_{T+\Delta{t}}-X_{t})}\rangle\}}+const,

Where the inner product $\langle{u,v}\rangle\doteq{u\cdot{D^{-1}}v}$ , and the corresponding square norm is $\|u\|^{2}\doteq{u\cdot{D^{-1}}u}$ .

For a sufficiently small $\Delta{t}$ , using the Gaussian form of the transition density, we can give the Kullback-Leibler divergence between the path probabilities of the two diffusion processes with drift functions of $g(x)$ and $r(x)$ respectively, where $g(x)=f(x)+r(x)$ , which can be expressed as

D(p(X_{0:T}|g)\|p(X_{0:T}|r))=\int_{0}^{T}dt\int{p_{t}(x)f(x)\cdot{D^{-1}(x)f(x)dx}}.

(17)

Assume that the two diffusion processes have the same diffusion function $D(x)$ and non-stochastic initial state [19]. The density $p_{t}(x)$ is the probability density of the diffusion process with drift function $g$ at time $t$ . Furthermore, we assume that the process reaches a stationary state and has a stationary density $p(x)$ , thus the relative entropy rate [20] can be obtained

\lim_{T\rightarrow{\infty}}\frac{1}{T}D(p(X_{0:T}|g)\|p(X_{0:T}|r))=\int{p_{t}(x)f(x)\cdot{D^{-1}(x)f(x)dx}}.

(18)

The comparison with equation (3) shows that when $A=D$ , the minimization of equation (5) will result in that the stationary density process with drift function $g$ is close enough to the process with drift function $r$ in the case of relative entropy. Therefore, we can understand this as the generalized minimum relative entropy (Kullback Leibler divergence) under the constraints given by the stationary density.

A2. Expression of estimators

We will show how to obtain an explicit representation of the estimator from equations (13) and (15). In order to simplify the notation, we make some notation simplification for the expressions covered in the equation. Let the vectors of $\nabla\psi(x_{i})$ and drift function $r(x_{i})$ at all data points be

\begin{split}&b=(\nabla\psi(x_{1}),\nabla\psi(x_{2}),\cdots,\nabla\psi(x_{n})),\\ &r=(r(x_{1}),r(x_{2}),\cdots,r(x_{n})).\end{split}

(19)

The kernel function $B$ and the block-diagonal weight matrix $A$ are defined as

\begin{split}&B_{ij}=\nabla_{x}\nabla_{x^{\prime}}K(x,x^{\prime})|_{x=x_{i},x^{\prime}=x_{j}},\\ &A_{ij}=\delta_{ij}A(x_{j}),\end{split}

(20)

and define

\begin{split}&y(x)=\frac{1}{2}\sum_{j=1}^{n}tr[D(x^{\prime})\nabla_{x^{\prime}}\nabla_{x^{\prime}}^{T}K(x,x^{\prime})]_{x^{\prime}=x_{j}},\\ &z(x)=\int_{\mathbb{R}^{d}-\{0\}}[K(x,x^{\prime}+w)-K(x,x^{\prime})]\nu_{\alpha}(dw),\end{split}

(21)

then we have

\begin{split}&\nabla y(x)=(\nabla{y(x_{1})},\nabla{y(x_{2})},\cdots,\nabla{y(x_{n})}),\\ &\nabla z(x)=(\nabla{z(x_{1})},\nabla{z(x_{2})},\cdots,\nabla{z(x_{n})}).\end{split}

(22)

We can rewrite (15) as

(I+CBA)b+C(Br+\nabla y+\nabla z)=0,

(23)

Where $I$ represents the identity matrix. Equation (13) can be similarly written as

\psi(x)+C(k(x)Ab+k(x)r+y(x)+z(x))=0,

(24)

and $k(x)=(\nabla_{x^{\prime}}K(x,x^{\prime})|_{x^{\prime}=x_{1}},\nabla_{x^{\prime}}K(x,x^{\prime})|_{x^{\prime}=x_{2}},\cdots,\nabla_{x^{\prime}}K(x,x^{\prime})|_{x^{\prime}=x_{n}}).$ Further, we can obtain the following result via solving the function $b$ of equation (23),

b=-(I+CBA)^{-1}C(Br+\nabla y+\nabla z).

Combined with equation (24), the explicit form of the estimator $\psi$ can be solved

\psi(x)=Ck(x)A(I+CBA)^{-1}C(Br+\nabla y+\nabla z)-C(k(x)r+y(x)+z(x)).

(25)

References

[1] Øksendal B. Stochastic Differential Equation: An Introduction with Applications. Sixth Edition. Berlin: Springer-Verlag, 2003.
[2] Gikhman I I, Skorokhod A V. Stochastic Differential Equations. New York: Springer-Verlag, 1972.
[3] Iacus S M. Simulation and Inference for Stochastic Differential Equations: with R Examples. New York: Springer, 2009.
[4] Sriperumbudur B, Fukumizu K, Gretton A, et al. Density estimation in infinite dimensional exponential families. Journal of Machine Learning Research, 2017, 18: 1-59.
[5] Batz P, Ruttor A, Opper M. Approximate Bayes learning of stochastic differential equations. Physical Review E, 2018, 98(2): 022109.
[6] Batz P, Ruttor A, Opper M. Variational estimation of the drift for stochastic differential equations from the empirical density. Journal of Statistical Mechanics: Theory and Experiment, 2016, 8(8): 083404.
[7] Gottschall J, Peinke J. On the definition and handling of different drift and diffusion estimates. New Journal of Physics, 2008, 10(8): 083034.
[8] Honisch C, Friedrich R. Estimation of Kramers-Moyal coefficients at low sampling rates. Physical Review E, 2011, 83(6): 066701.
[9] Papaspiliopoulos O, Pokern Y, Roberts G O, et al. Nonparametric estimation of diffusions: a differential equations approach. Biometrika, 2012, 99(3): 511-531.
[10] Ruttor A, Batz P, Opper M. Approximate Gaussian process inference for the drift function in stochastic differential equations. Advances in Neural Information Processing Systems. 2013. 2040-2048.
[11] Ken-Iti S. Lévy processes and infinitely divisible distributions. Cambridge University Press, 1999.
[12] Applebaum D. Lévy Processes and Stochastic Calculus. Cambridge University Press, 2009.
[13] Li Y, Duan J. A data-driven approach for discovering stochastic dynamical systems with non-Gaussian Lévy noise. Physica D: Nonlinear Phenomena, 2021, 417: 132830.
[14] Kullback S, Leibler R A. On information and sufficiency. The annals of mathematical statistics, 1951, 22(1): 79-86.
[15] Risken H. The Fokker-Planck Equation. Berlin: Springer, 1996.
[16] Duan J. An Introduction to Stochastic Dynamics. New York: Cambridge University Press, 2015.
[17] Hyvärinen A, Dayan P. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 2005, 6(4): 695-709.
[18] Rasmussen C E, Williams C K I. Gaussian Processes for Machine Learning. Cambridge: The MIT Press, 2006.
[19] Archambeau C, Opper M, Shen Y, et al. Variational inference for diffusion processes. Advances in Neural Information Processing Systems. 2008. 17-24.
[20] Chernyak V Y, Chertkov M, Bierkens J, et al. Stochastic optimal control as non-equilibrium statistical mechanics: Calculus of variations over density and current. Journal of Physics A: Mathematical and Theoretical, 2013, 47(2): 022001.

Variational inference of the drift function for stochastic differential equations driven by Lévy processes11footnotemark: 1

Abstract

keywords:

1 Introduction

2 Theory

2.1 Variational formulation for the Fokker-Planck equation

2.2 Minimizing the empirical functional

3 Application

Example 1.

4 Conclusion and discussions

Acknowledgements

Appendix: Kullback-Leibler divergence and Expression of estimators

References

References

Variational inference of the drift function for stochastic differential equations driven by Lévy processes¹¹footnotemark: 1