Bayesian Filtering for Nonlinear Stochastic Systems Using
Holonomic Gradient Method with Integral Transform
Abstract
This paper proposes a symbolic-numeric Bayesian filtering method for a class of discrete-time nonlinear stochastic systems that achieves high accuracy with a relatively small online computational cost. The proposed method is based on the holonomic gradient method (HGM), a symbolic-numeric method for efficiently evaluating integrals that depend on several parameters. By approximating the posterior probability density function (PDF) of the state as a Gaussian PDF, the update process of its mean and variance can be formulated as the evaluation of several integrals that exactly take into account the nonlinearity of the system dynamics. An integral transform is used to evaluate these integrals more efficiently with the HGM than in our previous method. Further, a numerical example is provided to demonstrate the efficiency of the proposed method over other existing methods.
I Introduction
In most engineering applications, knowledge of the state of a dynamical system is required for monitoring or control via feedback. The optimal filtering theory was developed to estimate the current state of a system using the history of its outputs that are observable but contaminated by random noise. Following the success of the Kalman filter (KF) [1, 2] for linear systems with Gaussian noise, optimal filtering theory has been extended to applications involving nonlinear systems as well as non-Gaussian noise [3, 4, 5, 6, 7].
Bayesian filtering is a general framework of optimal filtering. Although numerous problems can be formulated as Bayesian filtering problems, they are difficult to implement as algorithms on computers, particularly for real-time applications in systems and control. The Monte-Carlo scheme, which was first introduced for the particle filter (PF) [8, 9] and subsequently used in other filtering algorithms, such as [7], is a powerful tool for computing the integrals that arise in statistical operations, such as marginalizations and expectations, in nonlinear Bayesian filtering problems. Moreover, the nonlinearity of the system dynamics can be exactly accounted for by updating the particles according to the dynamics. To reap these benefits of the Monte-Carlo scheme, however, the number of particles must be sufficiently large, which requires heavy computational resources and may be unacceptable for applications with short sampling intervals. Another approach involves extending the KF to nonlinear cases, as in the extended KF (EKF) [10], unscented KF (UKF) [11], cubature KF (CKF) [12], and Gauss-Hermite KF (GHKF) [13]. In all these methods, the posterior probability density function (PDF) is assumed to be, or approximated as, a Gaussian PDF. Under this Gaussian approximation, the marginalizations and expectations of the nonlinear functions, which are required to update the posterior mean and variance, can be efficiently computed using Gauss quadrature rules. However, in exchange for efficiency, the Gauss quadrature rules cannot provide exact values of the integrals, which implies that the nonlinearity of the system dynamics can only be considered in an approximate sense. Thus, there is still room to extend the boundary of the trade-off between the exact consideration of nonlinearity and the reduction of computational cost.
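The limitation of Gauss quadrature mentioned above can be seen in a minimal numerical sketch. A three-point Gauss-Hermite rule, of the kind underlying the GHKF, reproduces Gaussian moments exactly only up to polynomial degree five, so the higher-order moment effects induced by a nonlinearity are lost; the rule itself is standard, and the test functions below are only illustrative.

```python
import math

# Three-point Gauss-Hermite rule for E[g(x)] with x ~ N(0, 1)
# (probabilists' nodes and weights; exact for polynomials up to degree 5).
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
weights = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]

def gh_expectation(g):
    """Approximate E[g(x)], x ~ N(0, 1), by the three-point rule."""
    return sum(w * g(x) for x, w in zip(nodes, weights))

# Exact up to degree 5: E[x^4] = 3 is reproduced exactly ...
moment4 = gh_expectation(lambda x: x ** 4)   # 3.0
# ... but E[x^6] = 15 is not: the rule returns 9, losing higher-order effects.
moment6 = gh_expectation(lambda x: x ** 6)   # 9.0
```

The same loss of higher-order moment information occurs whenever a fixed quadrature rule is applied to nonlinear system functions, which is exactly the gap that an exact evaluation of the integrals aims to close.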
In nonlinear cases, the integrals defining the posterior PDF involve general nonlinear functions that depend on parameters such as the observed output, and they are intractable to evaluate both analytically and numerically. In our previous work [14], a symbolic-numeric method called the holonomic gradient method (HGM) [15] was used to evaluate the integrals efficiently. By utilizing symbolic computation in terms of differential operators, we can perform the integrations offline while considering the nonlinearity of the system exactly. The state estimation is then reduced to solving a set of initial-value problems (IVPs) of nonlinear ordinary differential equations (ODEs), which can be efficiently accomplished online with numerical integration methods such as the fourth-order Runge-Kutta (RK4) or Adams-Bashforth-Moulton predictor-corrector (ABM4) methods. However, our previous method requires solving an IVP for each component of the posterior mean and variance in the one-step estimation. This incurs a high online computational cost and may be unacceptable for short sampling intervals.
To overcome the computational limitations of our previous method, we use an integral transform. Integral transforms are useful tools to compute the moments of a probability distribution efficiently. For example, in [16], Idan and Speyer used an integral transform called the characteristic function of the multivariate Cauchy distribution. For linear systems under additive Cauchy noise, the characteristic function can be analytically propagated in time and can be used to derive the explicit forms of the posterior mean and variance. In this paper, we introduce an integral transform, which is different from, but similar to, the moment generating function. This similarity enables us to compute all the components of the posterior mean and variance efficiently from the integral transform and its derivatives. Moreover, the integral transform and its derivatives can be simultaneously evaluated by using the HGM only once. Hence, in contrast to our previous method, only one IVP must be solved, which leads to the reduction of the online computational cost.
Notations
$\mathbb{R}$ denotes the field of real numbers. For a vector of indeterminates $x = (x_1, \dots, x_n)$, $\mathbb{R}(x)$ denotes the field of rational functions in the components of $x$ over $\mathbb{R}$. $\partial_x = (\partial_{x_1}, \dots, \partial_{x_n})$ denotes a vector of differential operators, where $\partial_{x_i} = \partial / \partial x_i$. We abbreviate $\partial_x$ by $\partial$ if $x$ is clearly specified according to the context. For a multi-index vector $\alpha = (\alpha_1, \dots, \alpha_n)$, $x^\alpha$ and $\partial^\alpha$ denote $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ and $\partial_1^{\alpha_1} \cdots \partial_n^{\alpha_n}$, respectively. The symbol $R_x$ denotes the noncommutative ring of differential operators with coefficients in $\mathbb{R}(x)$. The subscript of $R_x$ is omitted if it is clear from the context. We denote the action of an element $\ell \in R$ on a sufficiently smooth function $g$ as $\ell \bullet g$; for instance, $\partial_{x_i} \bullet g = \partial g / \partial x_i$. We say that a differential operator $\ell$ annihilates a function $g$ if $\ell \bullet g = 0$. We denote the set of all positive definite matrices by $\mathrm{PD}$. The PDF of a stochastic variable $x$ is denoted by $p(x; \theta)$ if it depends on a parameter $\theta$.
II Problem Setting
In the Bayesian filtering approach for discrete-time nonlinear systems, the posterior PDF $p(x_t \mid y_{1:t})$ of the state $x_t$ at time step $t$ conditional on the output history $y_{1:t} := \{y_1, \dots, y_t\}$ from time steps $1$ to $t$ is recursively updated via the following equations [17].

$$p(x_t \mid y_{1:t}) = \frac{\int p(y_t \mid x_t)\, p(x_t, x_{t-1} \mid y_{1:t-1}, u_{t-1})\, \mathrm{d}x_{t-1}}{\iint p(y_t \mid x_t)\, p(x_t, x_{t-1} \mid y_{1:t-1}, u_{t-1})\, \mathrm{d}x_{t-1}\, \mathrm{d}x_t}, \quad p(x_t, x_{t-1} \mid y_{1:t-1}, u_{t-1}) = p(x_t \mid x_{t-1}, u_{t-1})\, p(x_{t-1} \mid y_{1:t-1}). \qquad (1)$$
We assume that the system dynamics and observation process are defined by the following state and observation equations.
$$x_t = f(x_{t-1}, u_{t-1}) + w_{t-1}, \qquad (2)$$
$$y_t = h(x_t) + v_t, \qquad (3)$$
where $w_{t-1}$ and $v_t$ denote the system and observation noises, which are assumed to be independent and identically distributed with the PDFs $p_w$ and $p_v$, respectively. In addition, $u_{t-1}$ denotes a given input of the system. Using the rule of transformation of PDFs [10], the conditional PDFs $p(x_t \mid x_{t-1}, u_{t-1})$ and $p(y_t \mid x_t)$ in (1) can be rewritten using the functions $f$, $h$, $p_w$, and $p_v$ as follows:
$$p(x_t \mid x_{t-1}, u_{t-1}) = p_w(x_t - f(x_{t-1}, u_{t-1})), \qquad (4)$$
$$p(y_t \mid x_t) = p_v(y_t - h(x_t)), \qquad (5)$$
where the former PDF is conditional on $x_{t-1}$ and parametrized by $u_{t-1}$ owing to (2). By substituting (4) and (5) into (1), we can describe the update law of Bayesian filtering using $f$, $h$, $p_w$, and $p_v$.
Since Bayesian filtering is a recursive algorithm, we can focus on the one-step update of the posterior PDF and omit the subscript $t$. Moreover, the past history of outputs $y_{1:t-1}$ has nothing to do with the update at time step $t$; it only appears in the previous posterior PDF $p(x_{t-1} \mid y_{1:t-1})$, so we also omit the past history. Thus, (1) can be summarized as

$$p(x \mid y, u) = \frac{\int p_v(y - h(x))\, p(x, x^- \mid u)\, \mathrm{d}x^-}{\iint p_v(y - h(x))\, p(x, x^- \mid u)\, \mathrm{d}x^-\, \mathrm{d}x}, \qquad (6)$$

where $x$, $y$, and $u$ denote the current state, output, and input, respectively; $x^-$ denotes the previous state; and $p(x, x^- \mid u) = p_w(x - f(x^-, u))\, p(x^-)$ denotes the joint PDF of $x$ and $x^-$ defined as the second line in (1).
The update law (6) can be viewed as a functional that maps the previous posterior PDF to the current one . In the linear case with Gaussian noise, this functional can be reduced to simple arithmetic operations of the mean and variance of the previous posterior PDF to yield the prediction and update steps of the Kalman filter. However, in the nonlinear cases, the Gaussian property of the posterior PDF can no longer be guaranteed, which makes it extremely difficult to compute the functional. Therefore, as the first step to overcome this difficulty, we approximate the posterior PDF using a Gaussian.
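As a concrete illustration of the linear-Gaussian reduction mentioned above, the following sketch implements one prediction/update step of a scalar Kalman filter; the model coefficients are hypothetical and not tied to any system in this paper.

```python
def kalman_step(xhat, V, y, u, a=0.9, c=1.0, Q=1.0, R=1.0):
    """One prediction/update step of the scalar Kalman filter.

    Hypothetical linear-Gaussian model (for illustration only):
        x' = a*x + u + w,  w ~ N(0, Q)
        y  = c*x' + v,     v ~ N(0, R)
    """
    # Prediction: propagate the mean and variance through the linear dynamics.
    x_pred = a * xhat + u
    V_pred = a * a * V + Q
    # Update: correct with the innovation y - c*x_pred.
    K = V_pred * c / (c * c * V_pred + R)   # Kalman gain
    x_new = x_pred + K * (y - c * x_pred)
    V_new = (1.0 - K * c) * V_pred
    return x_new, V_new

x_est, V_est = kalman_step(0.0, 1.0, y=1.0, u=0.0)
```

The entire update is a handful of arithmetic operations on the mean and variance, which is exactly the structure that is lost once $f$ or $h$ becomes nonlinear.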
For the following discussion, $\tilde{p}$ denotes an approximation of the PDF $p$ with the same stochastic variables. Suppose we have a Gaussian $\tilde{p}(x^-) = \mathcal{N}(x^-; \hat{x}, V)$ approximating $p(x^-)$, where $\mathcal{N}$ is defined as

$$\mathcal{N}(x; \mu, \Sigma) := (2\pi)^{-n/2} \det(\Sigma)^{-1/2} \exp\left(-\tfrac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right).$$

By replacing $p(x^-)$ with $\tilde{p}(x^-)$ in the definition of the predicted PDF $p(x \mid u) = \int p(x, x^- \mid u)\, \mathrm{d}x^-$, it can be approximated as

$$\tilde{p}(x \mid u) = \int p_w(x - f(x^-, u))\, \mathcal{N}(x^-; \hat{x}, V)\, \mathrm{d}x^-, \qquad (7)$$

and $p(x \mid y, u)$ can be approximated as

$$\tilde{p}(x \mid y, u) = \frac{p_v(y - h(x))\, \tilde{p}(x \mid u)}{\int p_v(y - h(x))\, \tilde{p}(x \mid u)\, \mathrm{d}x}. \qquad (8)$$
Note that the approximating PDF on the left-hand side of (8) is not Gaussian, since we consider the nonlinearities of the functions $f$ and $h$ without any approximations. Hence, we further approximate $\tilde{p}(x \mid y, u)$ with a Gaussian $\mathcal{N}(x; \hat{x}^+, V^+)$ of the same mean and variance as those of $\tilde{p}(x \mid y, u)$, that is,

$$\hat{x}^+ := \int x\, \tilde{p}(x \mid y, u)\, \mathrm{d}x \qquad (9)$$

and

$$V^+ := \int (x - \hat{x}^+)(x - \hat{x}^+)^{\top}\, \tilde{p}(x \mid y, u)\, \mathrm{d}x. \qquad (10)$$

Definitions (9) and (10) provide a finite number of functions that map the parameters $y$, $u$, $\hat{x}$, and $V$ to the current estimates $\hat{x}^+$ and $V^+$. Hence, by evaluating the right-hand sides of (9) and (10) recursively, we can perform Bayesian filtering while exactly considering the nonlinearities of $f$ and $h$. For simplicity, we define $\langle \varphi \rangle$ for a scalar-, vector-, or matrix-valued function $\varphi$ as

$$\langle \varphi \rangle := \iint \varphi(x)\, p_v(y - h(x))\, p_w(x - f(x^-, u))\, \mathcal{N}(x^-; \hat{x}, V)\, \mathrm{d}x^-\, \mathrm{d}x,$$

which is an integral over $\mathbb{R}^{2n}$ depending on the parameters $y$, $u$, $\hat{x}$, and $V$. The computations of (9) and (10) are then reduced to evaluations of three integrals, namely $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$, because $\hat{x}^+$ and $V^+$ can be written as $\langle x \rangle / \langle 1 \rangle$ and $\langle x x^{\top} \rangle / \langle 1 \rangle - \hat{x}^+ (\hat{x}^+)^{\top}$, respectively.
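For a scalar example, the reduction of the posterior mean and variance (9) and (10) to ratios of three integrals can be checked by brute-force numerical integration; the functions `f` and `h` below are hypothetical stand-ins for the system functions, and the double integral is evaluated by a simple midpoint rule.

```python
import math

def gauss(x, mu, var):
    """Scalar Gaussian PDF N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical scalar model used only for illustration:
#   x = f(x_prev, u) + w,  y = h(x) + v,  with w, v ~ N(0, 1).
f = lambda xp, u: xp - 0.1 * xp ** 3 + u
h = lambda x: x

def bracket(phi, y, u, xhat, V, lo=-8.0, hi=8.0, n=160):
    """Midpoint-rule evaluation of the double integral of
    phi(x) * p_v(y - h(x)) * p_w(x - f(x^-, u)) * N(x^-; xhat, V)."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        lik = gauss(y, h(x), 1.0)          # p_v(y - h(x))
        for j in range(n):
            xp = lo + (j + 0.5) * dx
            total += phi(x) * lik * gauss(x, f(xp, u), 1.0) * gauss(xp, xhat, V)
    return total * dx * dx

y, u, xhat, V = 0.5, 0.0, 0.0, 1.0
b1 = bracket(lambda x: 1.0, y, u, xhat, V)
bx = bracket(lambda x: x, y, u, xhat, V)
bxx = bracket(lambda x: x * x, y, u, xhat, V)
mean_new = bx / b1                # posterior mean
var_new = bxx / b1 - mean_new ** 2  # posterior variance
```

This brute-force evaluation is exact in the limit of a fine grid but far too slow for online use, which motivates evaluating the same three integrals with the HGM.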
For nonlinear KFs, the posterior PDF is usually approximated as Gaussian [12, 18]. The integrals $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$ are also approximated without considering the effects of the higher-order moments induced by the nonlinearity of $f$ and $h$. More specifically, the nonlinear KFs first approximate the integral in (7) by computing only the mean and variance of the nonlinear function $f$ and further approximate the integral in the denominator of (8) in the same way using $h$. By contrast, the proposed method evaluates the exact integrals using the HGM.
On the other hand, the PF approximates the integrals appearing in (1) using the Monte-Carlo scheme instead of the Gaussian approximation. Therefore, for a sufficiently large number of particles, the PF must exhibit a better accuracy than the proposed method. However, in the numerical example provided in Section V, we observe that the computational cost for the PF is more than that of the proposed method for the same level of accuracy.
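The Monte-Carlo counterpart of the same computation, in the spirit of the PF, estimates the three integrals by importance sampling; as before, `f` and `h` are hypothetical stand-ins, and the accuracy improves only with the particle count.

```python
import math
import random

random.seed(0)

# Hypothetical scalar model (same spirit as above, for illustration only):
#   x = f(x_prev, u) + w,  y = h(x) + v,  with w, v ~ N(0, 1).
f = lambda xp, u: xp - 0.1 * xp ** 3 + u
h = lambda x: x

def pf_one_step(y, u, xhat, V, n_particles=5000):
    """PF-style one-step estimate: sample, propagate, weight, and average."""
    w_sum = m1 = m2 = 0.0
    for _ in range(n_particles):
        xp = random.gauss(xhat, math.sqrt(V))   # previous-state particle
        x = f(xp, u) + random.gauss(0.0, 1.0)   # propagate with process noise
        w = math.exp(-(y - h(x)) ** 2 / 2.0)    # likelihood weight p_v(y - h(x))
        w_sum += w
        m1 += w * x
        m2 += w * x * x
    mean = m1 / w_sum
    var = m2 / w_sum - mean ** 2
    return mean, var

mean_mc, var_mc = pf_one_step(0.5, 0.0, 0.0, 1.0)
```

The estimate is unbiased in the limit, but the per-step cost scales linearly with the number of particles, which is the trade-off discussed above.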
Remark 1
Although, as mentioned above, the Gaussian approximation of the posterior distribution is a common approach in nonlinear filtering theory, this assumption is not suited to noise with heavy-tailed distributions, such as Cauchy noise. In such cases, however, the proposed method may still be applicable by approximating the posterior distribution with a heavy-tailed distribution having some parameters. Provided that the parameters of the approximating distribution can be expressed as functions of the previous estimates, observed output, and known input (corresponding to (9) and (10) for the Gaussian approximation), the proposed method can be performed in almost the same manner; this could be considered in future work.
III Holonomic Gradient Method
As described in the previous section, the update law of the mean and variance can be formulated as the evaluation of the three integrals $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$. For linear systems with Gaussian noise, these integrals can be computed analytically using the Gaussian integral. For nonlinear systems, however, computing the integrals analytically is extremely complex in general, so a numerical method is needed. To achieve a good trade-off between the exact consideration of nonlinearity and the reduction of computational cost, we use a symbolic-numeric method for evaluating such complex integrals, called the HGM. Owing to space limitations, we only outline the HGM in a specific situation.
Let us consider the evaluation of the following integral:

$$F(z) := \int_{\mathbb{R}^d} q(w, z)\, r(w, z)\, \mathrm{d}w, \qquad (11)$$

where $q$ and $r$ are smooth scalar-valued functions in $w \in \mathbb{R}^d$ and $z \in \mathbb{R}^p$. For example, $\langle 1 \rangle$ can be regarded as integral (11) with $d = 2n$, $w = (x, x^-)$, $q = p_v(y - h(x))$, and $r = p_w(x - f(x^-, u))\, \mathcal{N}(x^-; \hat{x}, V)$, where $z$ corresponds to the vector of parameters $y$, $u$, $\hat{x}$, and $V$.
Definition 1
The function $F$ is called holonomic if there exists a finite set of differential operators $\{r_1, \dots, r_m\} \subset R$ with $r_1 = 1$ such that the vector-valued function $Q$ defined as

$$Q(z) := \left(r_1 \bullet F(z), \dots, r_m \bullet F(z)\right)^{\top} \qquad (12)$$

satisfies a set of the following PDEs:

$$\partial_{z_i} \bullet Q = A_i Q \quad (i = 1, \dots, p), \qquad (13)$$

where $A_i \in \mathbb{R}(z)^{m \times m}$. The set of PDEs (13) is called the Pfaffian system of $F$.
We assume that we know $\{r_1, \dots, r_m\}$ and $A_1, \dots, A_p$ in Definition 1 for the holonomic function $F$. Moreover, we assume that the vector $Q(z^0)$ at a certain point $z^0$ is known. For example, $Q(z^0)$ can be directly computed from (11) if we use sufficient computational resources. Once $Q(z^0)$ is obtained, by virtue of (13), we can efficiently evaluate $F$ at a point $z^1$ without directly evaluating the multiple integral (11). The target point $z^1$ can be selected arbitrarily provided that the following process can be performed successfully.
Let $z : [0, 1] \to \mathbb{R}^p$ be a smooth curve with $z(0) = z^0$ and $z(1) = z^1$. Considering the Pfaffian system (13) on this curve, we obtain an IVP

$$\frac{\mathrm{d}\tilde{Q}}{\mathrm{d}\tau}(\tau) = \sum_{i=1}^{p} \frac{\mathrm{d}z_i}{\mathrm{d}\tau}(\tau)\, A_i(z(\tau))\, \tilde{Q}(\tau), \quad \tilde{Q}(0) = Q(z^0). \qquad (14)$$

This IVP can be solved to obtain $\tilde{Q}(1) = Q(z^1)$ using numerical solution methods such as the RK4 method, provided that none of the denominators in $A_1, \dots, A_p$ vanish at any point on the curve. Consequently, we can evaluate $F(z^1)$ as the first component of $Q(z^1)$.
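The whole procedure can be seen in miniature on a scalar holonomic integral, $F(z) = \int \exp(-x^2/2 + zx)\, \mathrm{d}x = \sqrt{2\pi}\, \exp(z^2/2)$, whose Pfaffian system reduces to the single ODE $\mathrm{d}F/\mathrm{d}z = zF$. Starting from the known value $F(0) = \sqrt{2\pi}$, RK4 transports $F$ along a path to the target point, exactly as in (14); the example and step count are illustrative.

```python
import math

# Toy HGM: F(z) = \int exp(-x^2/2 + z*x) dx = sqrt(2*pi) * exp(z^2/2)
# satisfies the scalar Pfaffian system dF/dz = z * F.
def hgm_rk4(z0, F0, z1, steps=100):
    """Transport F from (z0, F0) to z1 by RK4 applied to dF/dz = z*F."""
    rhs = lambda z, F: z * F          # right-hand side A(z) * Q in (14)
    h = (z1 - z0) / steps
    z, F = z0, F0
    for _ in range(steps):
        k1 = rhs(z, F)
        k2 = rhs(z + h / 2, F + h * k1 / 2)
        k3 = rhs(z + h / 2, F + h * k2 / 2)
        k4 = rhs(z + h, F + h * k3)
        F += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        z += h
    return F

# Initial vector at z = 0 is the Gaussian integral sqrt(2*pi);
# the transported value matches the closed form at z = 1.
F_est = hgm_rk4(0.0, math.sqrt(2.0 * math.pi), 1.0)
F_true = math.sqrt(2.0 * math.pi) * math.exp(0.5)
```

No quadrature is performed online: the single offline evaluation at $z = 0$ plus the ODE solve replaces every subsequent evaluation of the integral.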
The finite set $\{r_1, \dots, r_m\}$ and the matrices $A_1, \dots, A_p$, which are supposed to be known, can be computed from a finite set of differential operators called a basis of a zero-dimensional ideal in $R$. The following theorem provides a way to find such a basis for a given function and to determine whether the function is holonomic (we skip the details for simplicity).
Theorem 1
[19] A smooth function $F$ is holonomic if and only if, for each $i = 1, \dots, p$, there exists a differential operator described as a finite sum of the following form:

$$\ell_i = \sum_{k=0}^{m_i} c_{i,k}(z)\, \partial_{z_i}^{k}, \quad c_{i,k} \in \mathbb{R}(z),\ c_{i,m_i} \neq 0, \qquad (15)$$

which annihilates $F$, i.e., $\ell_i \bullet F = 0$ for $i = 1, \dots, p$. Moreover, the set of differential operators $\{\ell_1, \dots, \ell_p\}$ is a basis of a zero-dimensional ideal in $R$.
Example 1
It may be difficult to find a basis of a zero-dimensional ideal for $F$ directly from the expression (11). Using symbolic computation, however, we can compute a basis of a zero-dimensional ideal for $F$ from bases for $q$ and $r$ [20], which implies the following corollary [20, 19].
Corollary 1

If $q$ and $r$ in (11) are holonomic and the product $q \cdot r$ is rapidly decreasing with respect to $w$, then $F$ defined by (11) is holonomic.
Example 2
For $d = p = 1$, let $q(w, z) = \mathrm{e}^{zw}$ and $r(w, z) = \mathrm{e}^{-w^2}$. Evidently, $q$ is annihilated by $\partial_w - z$ and $\partial_z - w$, and $r$ is annihilated by $\partial_w + 2w$ and $\partial_z$. That is, both functions are holonomic. Moreover, $r$ is rapidly decreasing with respect to $w$ and so is the product $q \cdot r$, which indicates that $F$ defined by (11) is holonomic (Corollary 1). Using Risa/Asir and its library nk_restriction, we can compute $\{2\partial_z - z\}$ as a basis of a zero-dimensional ideal for $F$ from the differential operators annihilating $q$ and $r$. Indeed, it is easy to confirm that in this example $F(z) = \sqrt{\pi}\, \mathrm{e}^{z^2/4}$, which is annihilated by the basis.
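Annihilation claims of this kind can be sanity-checked numerically. The sketch below verifies by central differences that the operator $\partial_w + 2w$ annihilates $\mathrm{e}^{-w^2}$, a hypothetical instance in the spirit of Example 2.

```python
import math

# Check numerically that (d/dw + 2w) annihilates r(w) = exp(-w^2),
# a typical holonomic function of the kind appearing in Example 2.
r = lambda w: math.exp(-w ** 2)

def apply_operator(w, eps=1e-6):
    """Central-difference evaluation of (d/dw + 2w) acting on r at w."""
    dr = (r(w + eps) - r(w - eps)) / (2.0 * eps)
    return dr + 2.0 * w * r(w)

# The residual should vanish (up to finite-difference error) on a grid.
residuals = [abs(apply_operator(k / 10.0)) for k in range(-20, 21)]
```

Such numerical checks are useful for validating the output of the symbolic step before it is handed to the numerical integration.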
Corollary 1, combined with the following assumption, ensures that $\langle \varphi \rangle$ is holonomic.
Assumption 1
The conditional PDFs $p_w(x - f(x^-, u))$ and $p_v(y - h(x))$ appearing in (7) and (8) are holonomic functions.
IV Evaluation of Mean and Variance via Integral Transform
Under Assumption 1, all the components of the integrals $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$ are holonomic functions of the parameters $y$, $u$, $\hat{x}$, and $V$. Therefore, they can be efficiently evaluated using the HGM if the corresponding Pfaffian systems (13) and initial vectors (12) are computed in advance. The authors previously proposed a method to compute (9) and (10) by directly using the HGM [14]. Although the previous method showed better performance than other existing methods in a numerical example, the IVP (14) has to be solved for each component of the three integrals. If we can decrease the number of IVPs, the online computational cost is reduced accordingly. Hereafter, $z \in \mathbb{R}^p$ denotes a vector consisting of all the independent components of $y$, $u$, $\hat{x}$, and $V$. With this notation, for example, $\langle x \rangle(z)$ denotes $\langle x \rangle$ as a function of the components of $y$, $u$, $\hat{x}$, and $V$.
In this work, we use an integral transform to reduce the number of IVPs solved in the one-step estimation. Consider an integral transform defined as follows:

$$T(s, z) := \left\langle \mathrm{e}^{s^{\top} x} \right\rangle = \iint \mathrm{e}^{s^{\top} x}\, p_v(y - h(x))\, p_w(x - f(x^-, u))\, \mathcal{N}(x^-; \hat{x}, V)\, \mathrm{d}x^-\, \mathrm{d}x. \qquad (16)$$

Note that although this definition is similar to that of the moment generating function, it is different because the PDF in the integrand is that of the pair $(x, x^-)$ rather than that of $x$ alone. We assume that there exists a compact subset of $\mathbb{R}^n$ including the origin such that, for all $s$ in the subset, (16) is well defined and smooth.
The integrals $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$ are then obtained from the derivatives of $T$ at $s = 0$ as follows:

$$\langle 1 \rangle = T(0, z), \qquad \langle x \rangle_i = \frac{\partial T}{\partial s_i}(0, z), \qquad (17)$$

$$\left\langle x x^{\top} \right\rangle_{ij} = \frac{\partial^2 T}{\partial s_i \partial s_j}(0, z). \qquad (18)$$
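Relations (17) and (18) can be illustrated with a closed-form transform. For a single Gaussian with hypothetical parameters, the analogue of $T$ is $T(s) = \exp(\mu s + \sigma^2 s^2 / 2)$, and finite differences of $T$ at $s = 0$ recover the first two moments.

```python
import math

# For x ~ N(mu, sig2), T(s) = E[exp(s*x)] = exp(mu*s + sig2*s^2/2):
# its derivatives at s = 0 recover the moments, as in (17)-(18).
mu, sig2 = 0.7, 1.3
T = lambda s: math.exp(mu * s + 0.5 * sig2 * s * s)

eps = 1e-5
m1 = (T(eps) - T(-eps)) / (2.0 * eps)              # dT/ds at 0  -> E[x] = mu
m2 = (T(eps) - 2.0 * T(0.0) + T(-eps)) / eps ** 2  # d2T/ds2 at 0 -> E[x^2]
```

In the proposed method the derivatives are not taken by finite differences; they are read off exactly from the Pfaffian system, as shown next.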
Here, the PDFs in the integrand of (16) are holonomic functions under Assumption 1, and the kernel $\mathrm{e}^{s^{\top} x}$ is also holonomic. This implies that $T$ is also a holonomic function of $(s, z)$ by Corollary 1. Therefore, there exists a finite set $\{r_1, \dots, r_m\}$ such that the vector-valued function $Q$ defined by (12) with $F = T$ satisfies the Pfaffian system (13) in the variables $(s, z)$. Using the Pfaffian system, we can explicitly express $T$ and its derivatives (17) as well as (18) as linear combinations of the components of $Q$ with coefficients in $\mathbb{R}(s, z)$.
First, $T$ itself is the first component of $Q$, thus satisfying the following:

$$T = c_0^{\top} Q, \qquad (19)$$

where $c_0 := (1, 0, \dots, 0)^{\top}$ is a constant coefficient vector. Next, the first derivatives are obtained from the first components of the left-hand sides of (13). Hence, the following identities hold:

$$\frac{\partial T}{\partial s_i} = c_i^{\top} Q, \qquad (20)$$

where the coefficient vector $c_i^{\top}$ is the first row vector of $A_{s_i}$ in (13). Finally, the second derivatives are obtained by differentiating both sides of (13); $\partial_{s_i} \partial_{s_j} \bullet Q = (\partial_{s_i} \bullet A_{s_j} + A_{s_j} A_{s_i})\, Q$, and hence

$$\frac{\partial^2 T}{\partial s_i \partial s_j} = c_{ij}^{\top} Q, \qquad (21)$$

where the coefficient vector $c_{ij}^{\top}$ is the first row vector of $\partial_{s_i} \bullet A_{s_j} + A_{s_j} A_{s_i}$.
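A scalar sanity check of this differentiation step: for the hypothetical transform $T(s) = \exp(\mu s + \sigma^2 s^2 / 2)$ the Pfaffian system is $\mathrm{d}T/\mathrm{d}s = A(s)T$ with $A(s) = \mu + \sigma^2 s$, so differentiating it once gives $\mathrm{d}^2 T/\mathrm{d}s^2 = (\mathrm{d}A/\mathrm{d}s + A^2)\, T$, the scalar analogue of the first row of $\partial_{s_i} \bullet A_{s_j} + A_{s_j} A_{s_i}$ used in (21).

```python
import math

# Scalar illustration: T(s) = exp(mu*s + sig2*s^2/2) satisfies dT/ds = A(s)*T
# with A(s) = mu + sig2*s, hence d2T/ds2 = (dA/ds + A^2) * T.
mu, sig2 = 0.7, 1.3
T = lambda s: math.exp(mu * s + 0.5 * sig2 * s * s)
A = lambda s: mu + sig2 * s

def second_derivative_via_pfaffian(s):
    """Second derivative computed symbolically from the Pfaffian system."""
    return (sig2 + A(s) ** 2) * T(s)   # (dA/ds + A^2) * T

def second_difference(s, eps=1e-4):
    """Central second difference of T, used only as an independent check."""
    return (T(s + eps) - 2.0 * T(s) + T(s - eps)) / eps ** 2

errs = [abs(second_derivative_via_pfaffian(s) - second_difference(s))
        for s in (0.0, 0.3, -0.5)]
```

The two evaluations agree up to finite-difference error, confirming that the coefficient vectors in (20) and (21) can be derived purely symbolically, offline.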
We assume that the explicit expressions of $c_i$ and $c_{ij}$ have been obtained. Using these expressions, we can compute the values of $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$ from (19)–(21) once the vector $Q$ at $s = 0$ is available for the given data. The computation of $Q$ can be accomplished by numerically solving (14) using numerical integration methods, such as the RK4 method. Therefore, the current mean and variance for the given data can be computed from (9) and (10) as

$$\hat{x}^+ = \frac{\langle x \rangle}{\langle 1 \rangle}, \qquad V^+ = \frac{\langle x x^{\top} \rangle}{\langle 1 \rangle} - \hat{x}^+ (\hat{x}^+)^{\top}. \qquad (22)$$

In the previous method [14], the IVP (14) has to be solved for each of the three integrals $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$ even in the one-dimensional case; that is, three IVPs have to be solved in total. By contrast, only a single IVP, for the vector $Q$, needs to be solved in the proposed method. This decrease in the number of IVPs leads to a reduction in the computational cost, as shown in Section V.
V Numerical Example
This section presents a numerical example to show the efficiency of the proposed method. In the following demonstration, we use Risa/Asir [21] for the symbolic computation, Maple for the offline numerical computation, and Python for the online numerical computation on a PC (Intel(R) Core(TM) i7-1065G7 CPU @ 1.30 GHz; RAM: 16 GB).
V-A Problem setting and results of offline computation
Consider the following nonlinear stochastic one-dimensional system:

(23)

where $w_k$ and $v_k$ are the system and observation noises, each with the standard Gaussian PDF, and the input $u_k$ is given as a function of the discrete-time step $k$.
First, we symbolically compute the inputs of Algorithm 1, namely, bases of zero-dimensional ideals annihilating the Gaussian PDF $\mathcal{N}(x^-; \hat{x}, V)$, the conditional PDFs (4) and (5), and the kernel $\mathrm{e}^{s x}$. By differentiating $\mathcal{N}(x^-; \hat{x}, V)$ with respect to each variable, we can obtain the differential operators annihilating it as

$$V \partial_{x^-} + (x^- - \hat{x}), \qquad V \partial_{\hat{x}} - (x^- - \hat{x}), \qquad 2V^2 \partial_V + V - (x^- - \hat{x})^2. \qquad (24)$$
The conditional PDF (4) is obtained by substituting the state equation into the PDF of $w_k$ and is annihilated by the following three differential operators:

(25)
The other conditional PDF (5) is also obtained by substituting the observation equation into the PDF of $v_k$ and is annihilated by

(26)
Finally, the kernel $\mathrm{e}^{s x}$ is annihilated by

$$\partial_s - x, \qquad \partial_x - s. \qquad (27)$$

By Theorem 1, the sets of differential operators (24)–(27) are bases of zero-dimensional ideals annihilating $\mathcal{N}(x^-; \hat{x}, V)$, (4), (5), and $\mathrm{e}^{s x}$, respectively.
Using Algorithm 1, we can derive a basis of a zero-dimensional ideal annihilating $T$, a finite set of differential operators $\{r_1, \dots, r_7\}$, and the corresponding coefficient vectors and matrices. In this example, the basis consists of seven differential operators, which are omitted because of space limitations. Hence, $Q$ becomes a seven-dimensional vector-valued function:

$$Q = (r_1 \bullet T, \dots, r_7 \bullet T)^{\top}, \qquad (28)$$

where the right-hand side represents the actions of all the components $r_1, \dots, r_7$ on $T$. All entries of the coefficient matrices as well as the coefficient vectors in (20) and (21) are explicitly computed from the basis; the entries are rational functions and are omitted because of space limitations.
Finally, we have to choose the initial points and compute the corresponding initial vectors. The choice of the initial points depends on the denominators appearing in the coefficient matrices because they should not vanish on the integration path; otherwise, the right-hand side of the ODE in (14) diverges and Algorithm 2 fails (line 5). In this example, however, the least common multiple of all the denominators does not vanish for the parameter values of interest because $V > 0$, so no component of the right-hand side of (14) can diverge. For this example, we choose eight initial points, and the corresponding initial vectors can be computed numerically from the definition (28).
V-B Settings for online computation and estimation results
In the online computations, the data consisting of the input $u$, output $y$, and previous estimates $\hat{x}$ and $V$ are given at each time step. Let $z^1$ denote the given data at a time step. For this example, we fix the integration path in line 3 of Algorithm 2 to a line segment from an initial point $z^0$ to the given data $z^1$, that is, $z(\tau) = (1 - \tau) z^0 + \tau z^1$. The initial point is therefore selected for each given data such that the length of the line segment is the minimum among the prescribed set of initial points. The numerical integration along this path is performed using the ABM4 method, with the first three steps initialized using the RK4 method. Under these settings, Algorithm 2 is performed at each time step to compute the next estimates $\hat{x}^+$ and $V^+$ from the previous ones $\hat{x}$ and $V$ using the measurement $y$ and the known input $u$. Figure 1 shows a realization of the system (23) and the estimation results obtained from the proposed method.
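The online integration scheme described above can be sketched on a scalar stand-in for (14): the holonomic function $F(z) = \sqrt{2\pi}\, \exp(z^2/2)$ with $\mathrm{d}F/\mathrm{d}z = zF$. The sketch selects the nearest of a prescribed set of initial points and integrates along the straight segment with ABM4, bootstrapped by RK4 for the first three steps; the initial points, step count, and target are all illustrative.

```python
import math

# Toy stand-in for (14): dF/dz = z * F with F(z) = sqrt(2*pi) * exp(z^2/2).
rhs = lambda z, F: z * F
init_points = [-2.0, -1.0, 0.0, 1.0, 2.0]          # prescribed initial points
init_value = lambda z: math.sqrt(2.0 * math.pi) * math.exp(z * z / 2.0)

def solve_to(z_target, steps=200):
    """Integrate from the nearest initial point to z_target (RK4 + ABM4)."""
    z0 = min(init_points, key=lambda p: abs(p - z_target))
    h = (z_target - z0) / steps
    z, F, fs = z0, init_value(z0), []
    for k in range(steps):
        fs.append(rhs(z, F))
        if k < 3:                                   # RK4 start-up steps
            k1 = rhs(z, F)
            k2 = rhs(z + h / 2, F + h * k1 / 2)
            k3 = rhs(z + h / 2, F + h * k2 / 2)
            k4 = rhs(z + h, F + h * k3)
            F = F + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        else:                                       # ABM4 predictor-corrector
            f0, f1, f2, f3 = fs[-4], fs[-3], fs[-2], fs[-1]
            pred = F + h * (55 * f3 - 59 * f2 + 37 * f1 - 9 * f0) / 24
            fp = rhs(z + h, pred)
            F = F + h * (9 * fp + 19 * f3 - 5 * f2 + f1) / 24
        z += h
    return F

F_est = solve_to(0.6)          # nearest initial point is z0 = 1.0
F_true = init_value(0.6)
```

After the start-up, ABM4 needs only one new right-hand-side evaluation per step (plus the corrector), which is why it is attractive for the repeated online solves.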
For comparison, we implemented the EKF, UKF, PF, and our previous method. The number of particles in the PF was set to 100 so that its accuracy would be almost the same as that of the proposed method. Three hundred realizations were generated from random initial states and sequences of system and observation noises for this comparison.
To compare the performances of all the methods, the negative log-likelihood (NLL) [22] was computed. Figure 2 shows the NLL of each method averaged over all realizations. As can be seen, only the PF with 100 particles shows performance comparable to that of the proposed and previous methods. The averaged NLL of the previous method is identical to that of the proposed method because both approaches compute the same estimates (22). The computational times for the one-step estimations of all the methods are summarized as boxplots in Fig. 3. The proposed method is faster than both the previous method and the PF with 100 particles, which show comparable NLL values. These results indicate that the proposed method outperforms the PF and our previous method.
As shown in this example, there exists at least one problem for which the proposed method outperforms the existing methods, exhibiting its superiority in certain settings. Currently, it is difficult to demonstrate this superiority in general because of limitations such as the potential instability of the ODE (14). Therefore, theoretical or experimental analysis of when the proposed method is superior will be a part of future research.
Despite the complex procedure, the principle behind the proposed method is simple: it is basically the evaluation of the integrals $\langle 1 \rangle$, $\langle x \rangle$, and $\langle x x^{\top} \rangle$, which arise naturally from the Gaussian approximation of the posterior PDF in Bayesian filtering. Thus, the proposed method is simple to understand except for the precise evaluation of the integrals, which is complex. Moreover, since the method is formulated as Algorithms 1 and 2, its implementation can be automated using symbolic computation so that users need not be concerned with the complex procedure. Therefore, we believe the proposed method is worth studying, and the improvement of its usability should be a part of future research.
VI Conclusion
A symbolic-numeric method is proposed herein to perform Bayesian filtering for a class of nonlinear stochastic systems. The posterior PDF of the state is first approximated using a Gaussian PDF, and the mean and variance are then updated while considering the nonlinearity of the system exactly. This update process is formulated as evaluations of several integrals using the HGM. We also propose an integral transform to compute the mean and variance efficiently from the vector-valued function obtained using the HGM.
It is known that the numerical integration in the third step of the HGM may diverge owing to errors in the initial vectors. Therefore, in future work, the selection of integration paths or initial points will be studied in depth to enable more stable numerical integrations, for example, by considering the special structure of the problems considered in this work. Another direction for future work is approximating the posterior PDF more accurately using a PDF with more parameters than the Gaussian, such as the skew Gaussian.
References
- [1] R. E. Kalman, “A new approach to linear filtering and prediction problems,” ASME Journal of Basic Engineering, vol. 82, pp. 35–45, 1960.
- [2] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” ASME Journal of Basic Engineering, vol. 83, no. 1, pp. 95–108, 1961.
- [3] I. Rusnak, “Maximum likelihood optimal estimator of non-autonomous nonlinear dynamic systems,” in Proceedings of European Control Conference, pp. 909–914, 2015.
- [4] X. Luo, Y. Jiao, W. L. Chiou, and S. S. Yau, “A novel suboptimal method for solving polynomial filtering problems,” Automatica, vol. 62, pp. 26–31, 2015.
- [5] B. Chen and G. Hu, “Nonlinear state estimation under bounded noises,” Automatica, vol. 98, pp. 159–168, 2018.
- [6] N. Duong, J. L. Speyer, J. Yoneyama, and M. Idan, “Laplace estimator for linear scalar systems,” in Proceedings of the IEEE Conference on Decision and Control, pp. 2283–2290, 2018.
- [7] K. Li, F. Pfaff, and U. D. Hanebeck, “Unscented dual quaternion particle filter for SE(3) estimation,” IEEE Control Systems Letters, vol. 5, no. 2, pp. 647–652, 2021.
- [8] G. Kitagawa, “Monte Carlo filtering and smoothing method for non-Gaussian nonlinear state space model,” Institute of Statistical Mathematics Research Memorandum, vol. 462, 1993.
- [9] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” in IEE Proceedings F - Radar and Signal Processing, vol. 140, pp. 107–113, 1993.
- [10] A. H. Jazwinski, Stochastic Processes and Filtering Theory, Elsevier Science, 1970.
- [11] S. J. Julier and J. K. Uhlmann, “New extension of the Kalman filter to nonlinear systems,” in SPIE, vol. 3068, 1997.
- [12] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.
- [13] K. Ito and K. Xiong, “Gaussian filters for nonlinear filtering problems,” IEEE Transactions on Automatic Control, vol. 45, no. 5, pp. 910–927, 2000.
- [14] T. Iori and T. Ohtsuka, “Symbolic-numeric computation of posterior mean and variance for a class of discrete-time nonlinear stochastic systems,” in Proceedings of the IEEE Conference on Decision and Control, pp. 4814–4821, 2020.
- [15] H. Nakayama, K. Nishiyama, M. Noro, K. Ohara, T. Sei, N. Takayama, and A. Takemura, “Holonomic gradient descent and its application to the Fisher-Bingham integral,” Advances in Applied Mathematics, vol. 47, no. 3, pp. 639–658, 2011.
- [16] M. Idan and J. L. Speyer, “Multivariate Cauchy estimator with scalar measurement and process noises,” SIAM Journal on Control and Optimization, vol. 52, no. 2, pp. 1108–1141, 2014.
- [17] A. E. Bryson, Jr. and Y.-C. Ho, Applied Optimal Control, John Wiley & Sons, 1st edition, 1975.
- [18] H. Fang, N. Tian, Y. Wang, M. Zhou, and M. A. Haile, “Nonlinear Bayesian estimation: from Kalman filter to a broader horizon,” IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 2, pp. 401–417, 2018.
- [19] T. Hibi, ed., Gröbner Bases: Statistics and Software Systems, Springer Japan, 1st edition, 2013.
- [20] T. Oaku, Y. Shiraki, and N. Takayama, “Algebraic algorithms for D-modules and numerical analysis,” Computer Mathematics (Proceedings of ASCM 2003), vol. 10, pp. 23–39, 2003.
- [21] M. Noro, N. Takayama, H. Nakayama, K. Nishiyama, and K. Ohara, “Risa/Asir: A computer algebra system,” http://www.math.kobe-u.ac.jp/Asir/asir.html, 2020.
- [22] M. P. Deisenroth, Efficient Reinforcement Learning Using Gaussian Processes, KIT Scientific Publishing, 2010.