Variational inference of the drift function for stochastic differential equations driven by Lévy processes11footnotemark: 1
Abstract
In this paper, we consider the nonparametric estimation problem of the drift function of stochastic differential equations driven by -stable Lévy motion. First, the Kullback-Leibler divergence between the path probabilities of two stochastic differential equations with different drift functions is optimized. By using the Lagrangian multiplier, the variational formula based on the stationary Fokker-Planck equation is constructed. Then combined with the data information, the empirical distribution is used to replace the stationary density, and the drift function is estimated non-parametrically from the perspective of the process. In the numerical experiment, the different amounts of data and different values are studied. The experimental results show that the estimation result of the drift function is related to both. When the amount of data increases, the estimation result will be better, and when the value increases, the estimation result is also better.
keywords:
Nonparametric estimation , -stable Lévy motion , Kullback-Leibler divergence , Fokker-Planck equation , empirical distribution1 Introduction
Stochastic dynamical systems are often used to describe random phenomena in real life. Usually, stochastic differential equation[1, 2] is used to model the stochastic process, but one of the important issues is the fitting of the model and the observed data, that is, the system identification in the data. The stochastic differential equation consists of a drift function and a diffusion part. For the thermodynamic equilibrium model, the diffusion function is proportional to the identity matrix, and the drift function is the gradient of potential energy. A simple and known method for estimating the drift function from the data is obtained from[3], which takes advantage of the fact that the potential function is proportional to the logarithm of the stationary density. Therefore, we can get an explicit representation of the drift function by using the kernel density (KDE) estimator. However, this estimator is only based on the distribution of the data, completely ignoring the time sequence of the observations and the time lag between the data. Kernel density estimator is non-parameterized, it does not require the drift function satisfies the specific parameterized form. For non-equilibrium model, in general, the explicit representation between the drift function and the density is unknown. At the same time, for the higher dimensional case, the convergence of the kernel density to the real density will be slow with the increase of sample data[4].
There are many methods for estimating the parameters of the drift function[5, 6]. For these parametric and non-parametric methods, we have to deal with the problem of low data sampling rate[7]. For example, the non-parametric method of conditional expectation is used[8], which requires numerical solution of the Kolmogorov backward equation at a given time interval, and also requires a large amount of data to calculate the conditional expectation. The Bayesian estimator using the Gaussian process prior gives a good estimate of the complete path under dense observations[9]. However, generally speaking, this method requires that unobserved diffusion paths be classified as hidden random variables between adjacent observations, which may lead to time-consuming calculations or require further approximation. Papaspiliopoulos et al.[9] introduced the Monte Carlo Gibbs sampling technique, which switched back and forth between hidden path sampling and drift function sampling of the process. For the processing of hidden paths, the expectation maximization method[10] can be used to approximate the hidden process with linear stochastic differential equations. However, these methods only consider the stochastic differential equations driven by Brownian motion. In nature, there exists a class of stochastic differential equations driven by non-Gaussian Lévy motion. The Lévy motion[11, 12], like the Brownian motion, is a type of Markov process with independent and stationary increments, which has a wide applications in the fields of physics, biology, earth science, etc. Therefore, it is of great significance to study the data recognition problem of this kind of the stochastic differential equation. For the identification problem of stochastic differential equations driven by Lévy motion, Li and Duan[13] studied this equation by using parameterized identification method based on Kramers-Moyal coefficient. However, due to the need to solve the conditional expectation value, it requires a high amount of data and increases the calculation cost.
In this paper, the main purpose is to construct an effective nonparametric inference of the drift function of a class of stochastic differential equations driven by -stable Lévy motion, which can reduce the requirement for data sample size to a certain extent. In our method, we construct the Kullback-Leibler divergence[14] of the transition density through the diffusion process with two different drift functions, and use the variational formula based on the stationary Fokker-Planck equation[15] to derive the explicit form of the drift function. Combined with data information, we use empirical distribution to replace stationary density, and further minimize empirical functional to obtain target results. However, if the empirical functional has a penalty function term based on the kernel, then the method can be extended to nonparametric inference. In addition, the properties of -stable Lévy motion will result in a large variation due to the different values, we also carry out numerical experiments for different parameters .
This paper is arranged as follows. In section 2, We derive the variational formula of the Fokker-Planck equation corresponding to the stochastic differential equation driven by an -stable Lévy motion, and we use empirical distribution to replace stationary density and make full use of data information to give the expression of the drift function. In section 3, We show the feasibility of this variational method through numerical application examples. We end the paper some conclusions and discussions in section 4.
2 Theory
In this work, we consider the stochastic differential equation for the dynamics of a -dimensional diffusion processes given by
(1) |
where is the drift function, the diffusion function is the dimensional matrix, is the Brownian motion in and is a symmetric -stable Lvy motion in with the generating triplet . The jump measure[16]
with The processes and are taken to be independent.
2.1 Variational formulation for the Fokker-Planck equation
Now, we will discuss how can we determine the drift function from data of sample path under the noise matrix known. Suppose that the stationary density of the process are given. We then splits the drift into two parts , where is known part and is the part that we try to compute. Of course, in the multivariate case there is not enough information to reconstruct uniquely. However, we may search for a minimal solution which minimizes a quadratic functional
(2) |
for a given positive definite matrix . Introducing a Lagrangian multiplier function for the condition that the density fulfills the stationary Fokker-Planck equation with drift , we can derive the minimal from the variation of the Lagrange-functional
(3) |
where the Fokker-Planck operator corresponding to known drift is given by
(4) |
with Variation of (3) with respect to yields Inserting this solution back into (3) shows that the unknown function can be derived from the minimization of the functional
(5) |
where is the adjoint operator of , (4) which fulfills
(6) |
and is given by
(7) |
In fact, a direct minimization of (5) with respect to yields
(8) |
which is the stationary Fokker-Planck equation corresponding to the density and the drift For the special case and the functional (5) was introduced in the field of machine learning as a score-function for estimating up to a normalization constant[17].
2.2 Minimizing the empirical functional
Our goal is to estimate from data by replacing the average over the stationary density in the functional (5) by empirical distribution
(9) |
where is a random, ergodic sample drawn from this density. An obvious possibility to construct estimators is to work with a parametric representation
(10) |
where the are a set of given basis functions. Then we can obtain the empirical constraint optimization problem which is minimize the following
(11) |
which is a quadratic form in the and can thus be performed in closed form. We are however interested in the case where a representation in terms of a finite set of basis functions is not rich enough to represent . Thus we will resort to a more general, nonparametric representation allowing for an infinite set of functions So, we can understand (10) as a Gaussian process model[18] for the function that is we will choose the quadratic form as an extra penalty term, where the are hyper-parameters. The penalty can also be viewed from a pseudo-Bayesian perspective where is interpreted as a likelihood and as a Gaussian prior distribution over parameters . can be chosen to give different weight to the data and to the penalty.
Motivated by Gaussian process point of view we will introduce the kernel trick into our formalism avoiding an explicit specification of and and assume instead that these are defined implicitly as orthonormal eigenfunctions and eigenvalues of a positive definite kernel function via
In the kernel approach the regularized functional can be written as
(12) |
where is the formal inverse of the kernel operator. Performing the variation with respect to yields
where was defined in (8). Multiplying both sides of this equation with the operator we get
(13) |
where the adjoint operator acts on functions as
(14) |
Applied the family of kernel functions when the stationary density is replaced by its empirical approximation (9).
3 Application
We will give a simple numerical example in this section to illustrate the results through the use of different amounts of data and the comparison of different values.
Example 1.
Consider a stochastic dynamical system of the following form
(16) |
Where is the standard Brownian motion, is a stable Lévy motion, the drift function is . In the numerical simulation, we obtain a sample path data of the time series by observing the process once. In this example, we take the time step , and the sample sizes are and data points respectively, and then use the variational method of the Fokker-Planck equation to perform nonparametric inferences on the drift function . In addition, we also select different values to compare the results. we will show the numerical results:
From Figure (1) and (2), we can see that the size of the data and the different values ??have an impact on the inference of the drift function. In this chapter, our method only considers the data obtained from one observation of the process. When the value of is the same, the estimator of the drift function becomes better as the amount of data increases. And for the same amount of data, we’re going to consider the selection of values of 1, 1.25, 1.5 and 1.75, respectively. As the value of increases, the sample trajectory of the process is smoother, making the inference result of the drift function better. Of course, we also know from the numerical results that the estimation of the drift function needs to be improved, such as when is close to 1, the estimation effect of the drift function is poor, so we can improve the numerical results by increasing the amount of sample data. In addition, the method to infer the drift function only. For diffusion function, we are still in the further study. On the one hand, we hope to obtain the diffusion function inference in the case of additive and multiplicative noise when the drift function is known. On the other hand, we also hope to obtain an optimization method to infer the drift function and the diffusion function at the same time.
4 Conclusion and discussions
In this paper, we have proposed a way to obtain the drift function estimator for a class of stochastic differential equations driven by -stable Lévy motion from data. We construct a variational formula based on the stationary Fokker-Planck equation, replace the stationary density with an empirical distribution, and derive the drift function expression. Another contribution is that our method only needs to observe the dense data of the process once, which reduces the cost of data processing, and is more in line with the actual observation situation, so it is more conducive to processing the actual data.
In addition, there are some interesting extensions for this study. We only studied the nonparametric estimation of the drift function of the stochastic dynamical system driven by Lévy motion. It is worth thinking about the inference of the diffusion function for additive and multiplicative cases. On the other hand, we can also explore the method of extracting the dynamics of this type of the stochastic system from the data, so as to gain insight into the stochastic phenomenon in related fields.
Acknowledgements
We would like to thank Dr. Xiaoli Chen, Dr. Wei Wei, Dr. Cheng Fang for helpful discussions. This work was supported by the National Natural Science Foundation of China (NSFC) grants 12001213, 11801192, 11771449 and National Science Foundation (NSF) grant 1620449.
Appendix: Kullback-Leibler divergence and Expression of estimators
A1. Kullback-Leibler divergence
For every , assuming that the dense path observation of equation (1) is obtained, we can derive the likelihood function. We discretize the time interval , take the time step as and use the fact that , then the transition density of the equation (1) satisfies the Gaussian distribution. We find that the negative log-likelihood part that depends on the drift function can be approximated by the following equation
Where the inner product , and the corresponding square norm is .
For a sufficiently small , using the Gaussian form of the transition density, we can give the Kullback-Leibler divergence between the path probabilities of the two diffusion processes with drift functions of and respectively, where , which can be expressed as
(17) |
Assume that the two diffusion processes have the same diffusion function and non-stochastic initial state [19]. The density is the probability density of the diffusion process with drift function at time . Furthermore, we assume that the process reaches a stationary state and has a stationary density , thus the relative entropy rate [20] can be obtained
(18) |
The comparison with equation (3) shows that when , the minimization of equation (5) will result in that the stationary density process with drift function is close enough to the process with drift function in the case of relative entropy. Therefore, we can understand this as the generalized minimum relative entropy (Kullback Leibler divergence) under the constraints given by the stationary density.
A2. Expression of estimators
We will show how to obtain an explicit representation of the estimator from equations (13) and (15). In order to simplify the notation, we make some notation simplification for the expressions covered in the equation. Let the vectors of and drift function at all data points be
(19) |
The kernel function and the block-diagonal weight matrix are defined as
(20) |
and define
(21) |
then we have
(22) |
We can rewrite (15) as
(23) |
Where represents the identity matrix. Equation (13) can be similarly written as
(24) |
and Further, we can obtain the following result via solving the function of equation (23),
Combined with equation (24), the explicit form of the estimator can be solved
(25) |
References
References
- [1] Øksendal B. Stochastic Differential Equation: An Introduction with Applications. Sixth Edition. Berlin: Springer-Verlag, 2003.
- [2] Gikhman I I, Skorokhod A V. Stochastic Differential Equations. New York: Springer-Verlag, 1972.
- [3] Iacus S M. Simulation and Inference for Stochastic Differential Equations: with R Examples. New York: Springer, 2009.
- [4] Sriperumbudur B, Fukumizu K, Gretton A, et al. Density estimation in infinite dimensional exponential families. Journal of Machine Learning Research, 2017, 18: 1-59.
- [5] Batz P, Ruttor A, Opper M. Approximate Bayes learning of stochastic differential equations. Physical Review E, 2018, 98(2): 022109.
- [6] Batz P, Ruttor A, Opper M. Variational estimation of the drift for stochastic differential equations from the empirical density. Journal of Statistical Mechanics: Theory and Experiment, 2016, 8(8): 083404.
- [7] Gottschall J, Peinke J. On the definition and handling of different drift and diffusion estimates. New Journal of Physics, 2008, 10(8): 083034.
- [8] Honisch C, Friedrich R. Estimation of Kramers-Moyal coefficients at low sampling rates. Physical Review E, 2011, 83(6): 066701.
- [9] Papaspiliopoulos O, Pokern Y, Roberts G O, et al. Nonparametric estimation of diffusions: a differential equations approach. Biometrika, 2012, 99(3): 511-531.
- [10] Ruttor A, Batz P, Opper M. Approximate Gaussian process inference for the drift function in stochastic differential equations. Advances in Neural Information Processing Systems. 2013. 2040-2048.
- [11] Ken-Iti S. Lévy processes and infinitely divisible distributions. Cambridge University Press, 1999.
- [12] Applebaum D. Lévy Processes and Stochastic Calculus. Cambridge University Press, 2009.
- [13] Li Y, Duan J. A data-driven approach for discovering stochastic dynamical systems with non-Gaussian Lévy noise. Physica D: Nonlinear Phenomena, 2021, 417: 132830.
- [14] Kullback S, Leibler R A. On information and sufficiency. The annals of mathematical statistics, 1951, 22(1): 79-86.
- [15] Risken H. The Fokker-Planck Equation. Berlin: Springer, 1996.
- [16] Duan J. An Introduction to Stochastic Dynamics. New York: Cambridge University Press, 2015.
- [17] Hyvärinen A, Dayan P. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 2005, 6(4): 695-709.
- [18] Rasmussen C E, Williams C K I. Gaussian Processes for Machine Learning. Cambridge: The MIT Press, 2006.
- [19] Archambeau C, Opper M, Shen Y, et al. Variational inference for diffusion processes. Advances in Neural Information Processing Systems. 2008. 17-24.
- [20] Chernyak V Y, Chertkov M, Bierkens J, et al. Stochastic optimal control as non-equilibrium statistical mechanics: Calculus of variations over density and current. Journal of Physics A: Mathematical and Theoretical, 2013, 47(2): 022001.