
Maximum Correntropy Ensemble Kalman Filter

Yangtianze Tao, Jiayi Kang and Stephen Shing-Toung Yau. This work is supported by National Natural Science Foundation of China (NSFC) grant (11961141005) and Tsinghua University Education Foundation fund (042202008). Corresponding author: Stephen Shing-Toung Yau. Yangtianze Tao and Jiayi Kang are with the Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China. Email: [email protected], [email protected]. Stephen Shing-Toung Yau is with the Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China, and with the Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Huairou district, Beijing 101400, China. Email: [email protected]
Abstract

In this article, a robust ensemble Kalman filter (EnKF) called MC-EnKF is proposed for nonlinear state-space models to deal with filtering problems involving non-Gaussian observation noise. Our MC-EnKF is derived from the maximum correntropy criterion (MCC) with some technical approximations. Moreover, we propose an effective adaptive strategy for kernel bandwidth selection. In addition, the relation between the common EnKF and MC-EnKF is established: MC-EnKF converges to the common EnKF as the kernel bandwidth tends to infinity. This justification provides a complementary understanding of kernel bandwidth selection for MC-EnKF. In experiments, non-Gaussian observation noise significantly degrades the performance of the common EnKF on both linear and nonlinear systems, whereas our proposed MC-EnKF with a suitable kernel bandwidth maintains good performance at only a marginal increase in computing cost, demonstrating its robustness and efficiency against non-Gaussian observation noise.

I Introduction

Filtering [1] is a fundamental problem in state estimation for state-space models [2]. It has a large number of applications in many fields, such as robot vision [3] and data assimilation [4]. For the linear state-space model with Gaussian noise, the optimal filtering solution can be computed analytically, yielding the well-known Kalman filter (KF) [5]. Since then, a large number of filtering algorithms for nonlinear state-space models with Gaussian noise have been proposed. Representative examples are the extended Kalman filter (EKF) [6], the unscented Kalman filter (UKF) [7], the cubature Kalman filter (CKF) [8] and the ensemble Kalman filter (EnKF) [9].

In the case of Gaussian noise, the KF and its extensions work well. However, in real-world applications the noise frequently does not follow a Gaussian distribution. For instance, in many practical settings of target tracking [10] and power systems [11], impulsive interferences and observation outliers are frequent. These disturbances are often modelled by heavy-tailed impulsive noise (such as mixed-Gaussian distributions). The main reason for this problem is that the KF is based on the well-known minimum mean square error (MMSE) criterion [12], which is sensitive to large outliers and deteriorates the KF's robustness under non-Gaussian noise [13]. Numerous studies have attempted to address filtering problems with non-Gaussian noise, such as filters based on information theoretic quantities [14], Huber-based KFs [15] and the robust Student's t filter [16]. Besides these filters, a local similarity measure called correntropy [17] has been used to develop new robust filters [18, 19, 20, 21]. These filters are derived from the maximum correntropy criterion (MCC) and are called MCC-based filters; they achieve strong performance in the presence of heavy-tailed non-Gaussian noise since correntropy is insensitive to large outliers.

In this article, we formulate the estimation problem under the MCC framework. Under this framework, the goal is to develop a robust EnKF, the maximum correntropy EnKF (MC-EnKF), based on the MCC, to address nonlinear filtering problems when observations contain large outliers. With suitable approximation techniques, we derive the ensemble recursion of MC-EnKF from the MCC cost function. It is worth noting that MC-EnKF employs the empirical prediction covariance calculated from the ensembles. This procedure is consistent with the common EnKF, which allows us to prove that MC-EnKF converges to the common EnKF as the kernel bandwidth tends to infinity. This justification provides a complementary understanding of kernel bandwidth selection for MC-EnKF. Although MC-EnKF is designed to handle non-Gaussian noise, it also has the flexibility to handle Gaussian noise with a large enough kernel bandwidth, since the EnKF performs well in that scenario. The contributions of this article are listed as follows.

  • We introduce the idea of a cost function to the field of EnKF, which inspires us to derive a robust EnKF called MC-EnKF based on the MCC cost function. Moreover, we propose an effective adaptive strategy for kernel bandwidth selection. Besides, we provide a complementary theoretical understanding of kernel bandwidth selection, i.e., MC-EnKF converges to the common EnKF when the kernel bandwidth is large enough. This justification gives MC-EnKF the flexibility to handle cases involving both Gaussian and non-Gaussian noise.

  • Our theoretical result is supported by numerical experiments on several filtering benchmarks, i.e., our proposed MC-EnKF behaves like the common EnKF when the kernel bandwidth is large enough. Furthermore, the experiments demonstrate that our proposed adaptive strategy for kernel bandwidth selection is effective. With an appropriate kernel bandwidth, MC-EnKF outperforms EnKF with only a minor increase in computing cost when the underlying observation system is hampered by heavy-tailed non-Gaussian noise.

The remainder of this article is organized as follows. In Section II, we present the concept of MCC and briefly review nonlinear filtering problems and the EnKF algorithm. In Section III, we derive our MC-EnKF algorithm and propose an effective adaptive strategy for kernel bandwidth selection, with the necessary discussions. The experiments and discussions are presented in Section IV. The conclusions are drawn in Section V.

I-A Notations

Throughout the article, we use boldface lower-case letters for vectors and boldface upper-case letters for matrices. The transpose and mathematical expectation are denoted by $\{\cdot\}^{\top}$ and $\mathbb{E}[\cdot]$, respectively. The Gaussian distribution with mean $\mu$ and covariance $\bm{\Sigma}$ is denoted by $\mathcal{N}(\mu,\bm{\Sigma})$. $\mathbb{R}^{n}$ and $\mathbb{R}^{m\times n}$ denote, respectively, the $n$-dimensional Euclidean space and the set of all $m\times n$ real matrices. In particular, $\mathbf{I}_{n}$ denotes the $n$-dimensional identity matrix. The weighted $l^{2}$ norm is defined by $\|\mathbf{x}\|_{\mathbf{A}}^{2}:=\mathbf{x}^{\top}\mathbf{A}^{-1}\mathbf{x}$ for $\mathbf{x}\in\mathbb{R}^{n}$ and a non-singular matrix $\mathbf{A}\in\mathbb{R}^{n\times n}$.

II Background

In this section, we shall first introduce the concept of correntropy [17], which is a local similarity measure between two random vectors. Then we shall formulate estimation problems based on the MCC and the MMSE criterion [12]. Finally, we shall briefly review the framework of nonlinear filtering problems and introduce the EnKF algorithm.

II-A Maximum Correntropy Criterion and Minimum Mean Square Error

II-A1 Correntropy

For two given random vectors $\mathbf{x}$ and $\mathbf{y}$, the correntropy between $\mathbf{x}$ and $\mathbf{y}$ is denoted by $\mathcal{V}(\mathbf{x},\mathbf{y})$ and is defined as follows:

\mathcal{V}(\mathbf{x},\mathbf{y})=\mathbb{E}\left[\mathcal{G}_{\sigma}(\|\mathbf{x}-\mathbf{y}\|_{\mathbf{A}_{1}})\right], (1)

where $\mathbf{A}_{1}$ is a given non-singular matrix of suitable dimension, $\mathcal{G}_{\sigma}$ is the Gaussian kernel given by $\mathcal{G}_{\sigma}(e)=\exp\left(-\frac{e^{2}}{2\sigma^{2}}\right)$, and $\sigma>0$ is the kernel bandwidth. Here we recall an important property of $\mathcal{V}(\mathbf{x},\mathbf{y})$; more details can be found in [17]. Taking the Taylor series expansion of the Gaussian kernel, we have

\mathcal{V}(\mathbf{x},\mathbf{y})=\sum_{n=0}^{\infty}\frac{(-1)^{n}}{2^{n}\sigma^{2n}n!}\mathbb{E}\left[\|\mathbf{x}-\mathbf{y}\|_{\mathbf{A}_{1}}^{2n}\right]. (2)

Hence, correntropy can be seen as a weighted sum of all even-order moments of the error vector $\mathbf{x}-\mathbf{y}$. The kernel bandwidth acts as the parameter weighting the second-order against the higher-order moments. When the kernel bandwidth is large compared to the range of the error vector, the second-order moment dominates the correntropy.
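To make the definition concrete, the following is a minimal NumPy sketch of a sample-based estimate of (1); the function names, the default choice $\mathbf{A}_{1}=\mathbf{I}$, and the toy data are ours for illustration and are not part of the paper.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel G_sigma(e) = exp(-e^2 / (2 sigma^2))."""
    return np.exp(-e**2 / (2.0 * sigma**2))

def correntropy(x, y, sigma, A=None):
    """Sample-based estimate of V(x, y) = E[G_sigma(||x - y||_A)].

    x, y : arrays of shape (N, d) holding N paired samples.
    A    : non-singular weighting matrix (identity by default), with
           ||e||_A^2 = e^T A^{-1} e as in the paper's notation.
    """
    e = x - y
    A_inv = np.eye(e.shape[1]) if A is None else np.linalg.inv(A)
    norms = np.sqrt(np.einsum('ni,ij,nj->n', e, A_inv, e))
    return gaussian_kernel(norms, sigma).mean()

# A few large outliers barely change the correntropy, unlike a mean squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
y = x + 0.1 * rng.normal(size=(1000, 2))
y[::100] += 50.0                      # inject a few large outliers
print(correntropy(x, y, sigma=1.0))   # stays close to its outlier-free value
```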

II-A2 Estimation Problems Formulation

Correntropy can be used as the optimality criterion for estimation problems. Suppose our goal is to learn a parameter $\bm{\theta}$ for a given estimator $\mathbf{x}(\bm{\theta})$, and let $\mathbf{y}$ denote the desired output. Then the MCC-based estimation problem can be formulated as the following optimization problem:

\hat{\bm{\theta}}_{1}=\arg\max_{\bm{\theta}\in\Omega_{1}}\ \mathcal{V}(\mathbf{x}(\bm{\theta}),\mathbf{y}), (3)

where $\hat{\bm{\theta}}_{1}$ denotes the optimal solution and $\Omega_{1}$ denotes the feasible set of parameters. It is instructive to compare the MCC with the conventional MMSE criterion. Let $\Omega_{2}$ denote the corresponding feasible set of parameters, with $\hat{\bm{\theta}}_{2}$ as the optimal solution. Then, for a given non-singular matrix $\mathbf{A}_{2}$ of suitable dimension, the MMSE-based estimation problem is formulated as follows:

\hat{\bm{\theta}}_{2} =\arg\max_{\bm{\theta}\in\Omega_{2}}\ -\|\mathbf{x}(\bm{\theta})-\mathbf{y}\|_{\mathbf{A}_{2}} (4)
=\arg\min_{\bm{\theta}\in\Omega_{2}}\ \|\mathbf{x}(\bm{\theta})-\mathbf{y}\|_{\mathbf{A}_{2}}.
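To illustrate the contrast between (3) and (4), the sketch below estimates a scalar location parameter from outlier-corrupted data under both criteria; the fixed-point reweighting used to maximize the MCC objective is a standard device we adopt for this one-dimensional example and is not prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 2.0
y = theta_true + rng.normal(scale=0.1, size=200)
y[:20] += 30.0                      # 10% of the samples are heavy outliers

# MMSE estimate (4) with x(theta) = theta: the sample mean, pulled towards outliers.
theta_mmse = y.mean()

# MCC estimate (3) with x(theta) = theta: fixed-point iteration that reweights
# each sample by the Gaussian kernel of its residual.
sigma, theta_mcc = 1.0, y.mean()
for _ in range(50):
    w = np.exp(-(y - theta_mcc) ** 2 / (2 * sigma ** 2))
    theta_mcc = np.sum(w * y) / np.sum(w)

print(theta_mmse, theta_mcc)   # MCC stays near 2.0, MMSE is biased upwards
```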

II-B Nonlinear Filtering Problems

In this article, we consider the nonlinear state-space model given by the following state and observation equations:

𝐱k\displaystyle\mathbf{x}_{k} =𝐟k(𝐱k1)+𝐰k\displaystyle=\mathbf{f}_{k}(\mathbf{x}_{k-1})+\mathbf{w}_{k}\quad (5)
𝐲k\displaystyle\mathbf{y}_{k} =𝐡k(𝐱k)+𝐯k,\displaystyle=\mathbf{h}_{k}(\mathbf{x}_{k})+\mathbf{v}_{k},

where the state $\mathbf{x}_{k}\in\mathbb{R}^{n}$ and the observation $\mathbf{y}_{k}\in\mathbb{R}^{m}$. $\mathbf{f}_{k}$ and $\mathbf{h}_{k}$ are nonlinear functions called the state equation and the observation equation, respectively. The state noise $\mathbf{w}_{k}$ and the observation noise $\mathbf{v}_{k}$ are zero-mean with nominal covariances $\mathbf{Q}_{k}\in\mathbb{R}^{n\times n}$ and $\mathbf{R}_{k}\in\mathbb{R}^{m\times m}$, respectively. Let $\mathbf{y}_{1:k}$ denote the $\sigma$-algebra generated by the noisy observations $\{\mathbf{y}_{1},\dots,\mathbf{y}_{k}\}$. The filtering problem refers to estimating the posterior distribution $p(\mathbf{x}_{k}\mid\mathbf{y}_{1:k})$, which is called the filtering distribution.

In what follows, we focus on the scenario in which the observation noise is non-Gaussian, i.e., it is contaminated by unknown large outliers; hence the true distribution of the observation noise is in fact unknown to us. For example, we consider $(1-\epsilon)\,\mathcal{N}(0,\mathbf{R}_{k})+\epsilon\,\mathcal{S}(0,\mathbf{S}_{k})$, where $0<\epsilon\ll 1$ is an unknown probability and $\mathcal{S}(0,\mathbf{S}_{k})$ is an arbitrary unknown distribution with large covariance $\mathbf{S}_{k}$. Here $\mathbf{R}_{k}$ is known to us, so it is called the nominal covariance matrix. In the remainder of this article, we develop our novel filter to deal with such noise by using the MCC as the cost function involving the nominal observation covariance $\mathbf{R}_{k}$.

II-C Ensemble Kalman Filter

The EnKF sequentially approximates the filtering distributions p(𝐱k𝐲1:k)p(\mathbf{x}_{k}\mid\mathbf{y}_{1:k}) using NN equally weighted ensembles {𝐱k(1),,𝐱k(N)}\{\mathbf{x}_{k}^{(1)},\dots,\mathbf{x}_{k}^{(N)}\}. At prediction steps, each ensemble 𝐱k(i)\mathbf{x}_{k}^{(i)} is propagated using the state equation, while at update steps a Kalman-type update is performed for each ensemble:

𝐱kk1(i)=𝐟k(𝐱k1k1(i))+𝐰k(i),\mathbf{x}_{k\mid k-1}^{(i)}=\mathbf{f}_{k}(\mathbf{x}_{k-1\mid k-1}^{(i)})+\mathbf{w}_{k}^{(i)}, (6)

and

𝐱kk(i)=𝐱kk1(i)+𝐊^k(𝐲k+𝐯k(i)𝐡k(𝐱kk1(i))),\mathbf{x}_{k\mid k}^{(i)}=\mathbf{x}_{k\mid k-1}^{(i)}+\hat{\mathbf{K}}_{k}\left(\mathbf{y}_{k}+\mathbf{v}_{k}^{(i)}-\mathbf{h}_{k}(\mathbf{x}_{k\mid k-1}^{(i)})\right), (7)

where 𝐰k(i)𝒩(0,𝐐k)\mathbf{w}_{k}^{(i)}\sim\mathcal{N}(0,\mathbf{Q}_{k}), 𝐯k(i)𝒩(0,𝐑k)\mathbf{v}_{k}^{(i)}\sim\mathcal{N}(0,\mathbf{R}_{k}) and the Kalman gain

𝐊^k=𝐂^k𝐇k(𝐇k𝐂^k𝐇k+𝐑k)1,\hat{\mathbf{K}}_{k}=\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}(\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}+\mathbf{R}_{k})^{-1}, (8)

is defined using the empirical prediction covariance 𝐂^k\hat{\mathbf{C}}_{k} of prediction ensembles {𝐱kk1(1),,𝐱kk1(N)}\{\mathbf{x}_{k\mid k-1}^{(1)},\dots,\mathbf{x}_{k\mid k-1}^{(N)}\}, namely

𝐂^k=1N1i=1N(𝐱kk1(i)𝐦^k)(𝐱kk1(i)𝐦^k),\hat{\mathbf{C}}_{k}=\frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_{k\mid k-1}^{(i)}-\hat{\mathbf{m}}_{k})(\mathbf{x}_{k\mid k-1}^{(i)}-\hat{\mathbf{m}}_{k})^{\top}, (9)

with

𝐦^k=1Ni=1N𝐱kk1(i),\hat{\mathbf{m}}_{k}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_{k\mid k-1}^{(i)}, (10)

and

𝐇k=𝐡k𝐱k𝐱k=𝐦^k.\mathbf{H}_{k}=\frac{\partial\mathbf{h}_{k}}{\partial\mathbf{x}_{k}}\mid_{\mathbf{x}_{k}=\hat{\mathbf{m}}_{k}}. (11)
Remark II.1

The empirical prediction mean in (10) and the empirical prediction covariance in (9) provide a Gaussian approximation to the prediction distribution, i.e.,

p(𝐱k𝐲1:k1)𝒩(𝐦^k,𝐂^k).p(\mathbf{x}_{k}\mid\mathbf{y}_{1:k-1})\approx\mathcal{N}(\hat{\mathbf{m}}_{k},\hat{\mathbf{C}}_{k}). (12)
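For reference, a minimal NumPy sketch of one EnKF cycle following (6)-(11) is given below; the function and argument names are ours, and the Jacobian of $\mathbf{h}_{k}$ is assumed to be supplied by the caller.

```python
import numpy as np

def enkf_step(ensemble, y, f, h, H_jac, Q, R, rng):
    """One EnKF cycle per (6)-(11): stochastic prediction and Kalman-type update.

    ensemble : (N, n) array of filtering ensembles at time k-1.
    H_jac    : callable returning the Jacobian of h at a given point, cf. (11).
    """
    N, n = ensemble.shape
    m = R.shape[0]
    # Prediction step (6).
    pred = (np.array([f(x) for x in ensemble])
            + rng.multivariate_normal(np.zeros(n), Q, size=N))
    # Empirical prediction mean (10) and covariance (9).
    m_hat = pred.mean(axis=0)
    dev = pred - m_hat
    C_hat = dev.T @ dev / (N - 1)
    # Kalman gain (8) with the Jacobian (11) evaluated at the empirical mean.
    H = H_jac(m_hat)
    K = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + R)
    # Update step (7) with perturbed observations.
    v = rng.multivariate_normal(np.zeros(m), R, size=N)
    innov = y + v - np.array([h(x) for x in pred])
    return pred + innov @ K.T
```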

III Proposed Algorithm

In this section, we shall present the derivation of MC-EnKF. We shall then discuss the adaptive strategy for kernel bandwidth selection and the convergence of MC-EnKF with respect to the kernel bandwidth.

III-A Derivation of the Algorithm

Here we present how to derive MC-EnKF based on the MCC estimate (3). The prediction step of MC-EnKF is the same as that of EnKF, i.e., (6). Therefore, we focus on deriving its update step. Let $\hat{\mathbf{x}}_{k\mid k-1}$ and $\mathbf{P}_{k\mid k-1}$ denote the prediction mean $\mathbb{E}\left[\mathbf{x}_{k}\mid\mathbf{y}_{1:k-1}\right]$ and the prediction covariance $\mathbb{E}\left[\left(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k\mid k-1}\right)\left(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k\mid k-1}\right)^{\top}\mid\mathbf{y}_{1:k-1}\right]$, respectively. As shown in [22], for the underlying linear system of (5), i.e., $\mathbf{f}_{k}(\mathbf{x}_{k})=\mathbf{F}_{k}\mathbf{x}_{k}$ and $\mathbf{h}_{k}(\mathbf{x}_{k})=\mathbf{H}_{k}\mathbf{x}_{k}$, the update step of the KF can be derived by minimizing the least-squares cost function

\mathcal{L}_{1}(\mathbf{x}_{k})=\|\mathbf{y}_{k}-\mathbf{H}_{k}\mathbf{x}_{k}\|_{\mathbf{R}_{k}}^{2}+\|\mathbf{x}_{k}-\hat{\mathbf{x}}_{k\mid k-1}\|_{\mathbf{P}_{k\mid k-1}}^{2}. (13)

Motivated by (13), we shall consider a new cost function based on MCC. Besides, in view of Remark II.1, we shall replace 𝐱^kk1\hat{\mathbf{x}}_{k\mid k-1} and 𝐏kk1\mathbf{P}_{k\mid k-1} in (13) by the empirical prediction mean 𝐦^k\hat{\mathbf{m}}_{k} in (10) and the empirical prediction covariance 𝐂^k\hat{\mathbf{C}}_{k} in (9). Therefore, our modified cost function for ensembles 𝐱kk(i)\mathbf{x}_{k\mid k}^{(i)} for i=1,2,,Ni=1,2,\dots,N is given by

2(𝐱k)\displaystyle\mathcal{L}_{2}(\mathbf{x}_{k}) =𝒢σ(𝐲k𝐡k(𝐱k)𝐑k)\displaystyle=\mathcal{G}_{\sigma}\left(\|\mathbf{y}_{k}-\mathbf{h}_{k}(\mathbf{x}_{k})\|_{\mathbf{R}_{k}}\right) (14)
+𝒢σ(𝐱k𝐦^k𝐂^k).\displaystyle+\mathcal{G}_{\sigma}\left(\|\mathbf{x}_{k}-\hat{\mathbf{m}}_{k}\|_{\hat{\mathbf{C}}_{k}}\right).

Then, based on $\mathcal{L}_{2}(\mathbf{x}_{k})$, the update step of MC-EnKF is obtained by solving the following optimization problem for $i=1,2,\dots,N$:

𝐱kk(i)=argmax𝐱k2(𝐱k).\mathbf{x}_{k\mid k}^{(i)}=\arg\max_{\mathbf{x}_{k}}\ \mathcal{L}_{2}(\mathbf{x}_{k}). (15)

The derivation below involves several approximations; for the sake of obtaining our algorithm, we heuristically treat them as exact equalities. Let us denote

l𝐑k\displaystyle l_{\mathbf{R}_{k}} =𝒢σ(𝐲k𝐡k(𝐱k)𝐑k)\displaystyle=\mathcal{G}_{\sigma}\left(\left\|\mathbf{y}_{k}-\mathbf{h}_{k}\left(\mathbf{x}_{k}\right)\right\|_{\mathbf{R}_{k}}\right) (16)
l𝐂^k\displaystyle l_{\hat{\mathbf{C}}_{k}} =𝒢σ(𝐱k𝐦^k𝐂^k).\displaystyle=\mathcal{G}_{\sigma}\left(\|\mathbf{x}_{k}-\hat{\mathbf{m}}_{k}\|_{\hat{\mathbf{C}}_{k}}\right).

Then, recalling $\mathbf{H}_{k}$ defined in (11), we use the following approximation when taking the derivative:

2(𝐱k)𝐱k=\displaystyle\frac{\partial\mathcal{L}_{2}\left(\mathbf{x}_{k}\right)}{\partial\mathbf{x}_{k}}= 1σ2(𝐡k𝐱k)l𝐑k𝐑k1(𝐲k𝐡k(𝐱k))\displaystyle-\frac{1}{\sigma^{2}}\left(\frac{\partial\mathbf{h}_{k}}{\partial\mathbf{x}_{k}}\right)^{\top}l_{\mathbf{R}_{k}}\mathbf{R}_{k}^{-1}\left(\mathbf{y}_{k}-\mathbf{h}_{k}\left(\mathbf{x}_{k}\right)\right) (17)
+1σ2l𝐂^k𝐂^k1(𝐱k𝐦^k)\displaystyle+\frac{1}{\sigma^{2}}l_{\hat{\mathbf{C}}_{k}}\hat{\mathbf{C}}_{k}^{-1}\left(\mathbf{x}_{k}-\hat{\mathbf{m}}_{k}\right)
\displaystyle\approx 1σ2𝐇kl𝐑k𝐑k1(𝐲k𝐡k(𝐱k))\displaystyle-\frac{1}{\sigma^{2}}\mathbf{H}_{k}^{\top}l_{\mathbf{R}_{k}}\mathbf{R}_{k}^{-1}\left(\mathbf{y}_{k}-\mathbf{h}_{k}\left(\mathbf{x}_{k}\right)\right)
+1σ2l𝐂^k𝐂^k1(𝐱k𝐦^k).\displaystyle+\frac{1}{\sigma^{2}}l_{\hat{\mathbf{C}}_{k}}\hat{\mathbf{C}}_{k}^{-1}\left(\mathbf{x}_{k}-\hat{\mathbf{m}}_{k}\right).

Now letting 2(𝐱k)𝐱k=0\frac{\partial\mathcal{L}_{2}\left(\mathbf{x}_{k}\right)}{\partial\mathbf{x}_{k}}=0, we have

𝐇kl𝐑k𝐑k1(𝐲k𝐡k(𝐱k))=l𝐂^k𝐂^k1(𝐱k𝐦^k).\mathbf{H}_{k}^{\top}l_{\mathbf{R}_{k}}\mathbf{R}_{k}^{-1}\left(\mathbf{y}_{k}-\mathbf{h}_{k}\left(\mathbf{x}_{k}\right)\right)=l_{\hat{\mathbf{C}}_{k}}\hat{\mathbf{C}}_{k}^{-1}\left(\mathbf{x}_{k}-\hat{\mathbf{m}}_{k}\right). (18)

We adopt a first-order Taylor series to approximate the nonlinear observation function $\mathbf{h}_{k}$ at $\mathbf{x}_{k\mid k-1}^{(i)}$, i.e.,

𝐡k(𝐱k)𝐡k(𝐱kk1(i))+𝐇k(𝐱k𝐱kk1(i)).\mathbf{h}_{k}\left(\mathbf{x}_{k}\right)\approx\mathbf{h}_{k}\left(\mathbf{x}_{k\mid k-1}^{(i)}\right)+\mathbf{H}_{k}\left(\mathbf{x}_{k}-\mathbf{x}_{k\mid k-1}^{(i)}\right). (19)

Substituting (19) into (18), we have

𝐱k=\displaystyle\mathbf{x}_{k}= 𝐱kk1(i)+(l𝐂^k𝐂^k1+𝐇kl𝐑k𝐑k1𝐇k)1\displaystyle\mathbf{x}_{k\mid k-1}^{(i)}+\left(l_{\hat{\mathbf{C}}_{k}}\hat{\mathbf{C}}_{k}^{-1}+\mathbf{H}_{k}^{\top}l_{\mathbf{R}_{k}}\mathbf{R}_{k}^{-1}\mathbf{H}_{k}\right)^{-1} (20)
×𝐇kl𝐑k𝐑k1(𝐲k𝐡k(𝐱kk1(i))).\displaystyle\times\mathbf{H}_{k}^{\top}l_{\mathbf{R}_{k}}\mathbf{R}_{k}^{-1}\left(\mathbf{y}_{k}-\mathbf{h}_{k}\left(\mathbf{x}_{k\mid k-1}^{(i)}\right)\right).

Then we obtain the following stochastic ensemble update rule, analogous to (7), for $\mathbf{x}_{k\mid k}^{(i)}$, where $\mathbf{v}_{k}^{(i)}\sim\mathcal{N}(0,\mathbf{R}_{k})$ is drawn:

𝐱kk(i)=𝐱kk1(i)+𝐊~k(𝐲k+𝐯k(i)𝐡k(𝐱kk1(i))),\mathbf{x}_{k\mid k}^{(i)}=\mathbf{x}_{k\mid k-1}^{(i)}+\tilde{\mathbf{K}}_{k}\left(\mathbf{y}_{k}+\mathbf{v}_{k}^{(i)}-\mathbf{h}_{k}\left(\mathbf{x}_{k\mid k-1}^{(i)}\right)\right), (21)

where the new Kalman gain 𝐊~k\tilde{\mathbf{K}}_{k} is given by

𝐊~k\displaystyle\tilde{\mathbf{K}}_{k} =(l𝐂^k𝐂^k1+𝐇kl𝐑k𝐑k1𝐇k)1𝐇kl𝐑k𝐑k1\displaystyle=\left(l_{\hat{\mathbf{C}}_{k}}\hat{\mathbf{C}}_{k}^{-1}+\mathbf{H}_{k}^{\top}l_{\mathbf{R}_{k}}\mathbf{R}_{k}^{-1}\mathbf{H}_{k}\right)^{-1}\mathbf{H}_{k}^{\top}l_{\mathbf{R}_{k}}\mathbf{R}_{k}^{-1} (22)
=((𝐂^kl𝐂^k)1+𝐇k(𝐑kl𝐑k)1𝐇k)1𝐇k(𝐑kl𝐑k)1.\displaystyle=\left(\left(\frac{\hat{\mathbf{C}}_{k}}{l_{\hat{\mathbf{C}}_{k}}}\right)^{-1}+\mathbf{H}_{k}^{\top}\left(\frac{\mathbf{R}_{k}}{l_{\mathbf{R}_{k}}}\right)^{-1}\mathbf{H}_{k}\right)^{-1}\mathbf{H}_{k}^{\top}\left(\frac{\mathbf{R}_{k}}{l_{\mathbf{R}_{k}}}\right)^{-1}.

We now turn to the practical algorithm, since both $l_{\mathbf{R}_{k}}$ and $l_{\hat{\mathbf{C}}_{k}}$ contain the unknown $\mathbf{x}_{k}$. To this end, we introduce a further technical approximation. Recalling (16), we use $\hat{\mathbf{m}}_{k}$ to approximate the $\mathbf{x}_{k}$ contained in $l_{\mathbf{R}_{k}}$ and $l_{\hat{\mathbf{C}}_{k}}$, respectively, i.e.,

l^𝐑k\displaystyle\hat{l}_{\mathbf{R}_{k}} =𝒢σ(𝐲k𝐡k(𝐦^k)𝐑k)l𝐑k\displaystyle=\mathcal{G}_{\sigma}\left(\left\|\mathbf{y}_{k}-\mathbf{h}_{k}\left(\hat{\mathbf{m}}_{k}\right)\right\|_{\mathbf{R}_{k}}\right)\approx l_{\mathbf{R}_{k}} (23)
l^𝐂^k\displaystyle\hat{l}_{\hat{\mathbf{C}}_{k}} =𝒢σ(𝐦^k𝐦^k𝐂^k)l𝐂^k.\displaystyle=\mathcal{G}_{\sigma}\left(\|\hat{\mathbf{m}}_{k}-\hat{\mathbf{m}}_{k}\|_{\hat{\mathbf{C}}_{k}}\right)\approx l_{\hat{\mathbf{C}}_{k}}.

The specific steps of MC-EnKF are summarized in Algorithm 1.

Algorithm 1 MC-EnKF
1:  Initialization. Start with initial filtering ensembles 𝐱00(1),,𝐱00(N)\mathbf{x}_{0\mid 0}^{(1)},\dots,\mathbf{x}_{0\mid 0}^{(N)}. Then, at each time k=1,2,k=1,2,\dots, given filtering ensembles 𝐱k1k1(1),,𝐱k1k1(N)\mathbf{x}_{k-1\mid k-1}^{(1)},\dots,\mathbf{x}_{k-1\mid k-1}^{(N)}, the MC-EnKF carries out the following two steps for i=1,2,,Ni=1,2,\dots,N:
2:  Prediction Step: Draw 𝐰k(i)𝒩(0,𝐐k)\mathbf{w}_{k}^{(i)}\sim\mathcal{N}(0,\mathbf{Q}_{k}) and calculate 𝐱kk1(i)\mathbf{x}_{k\mid k-1}^{(i)} via (6).  
3:  Update Step: Draw 𝐯k(i)𝒩(0,𝐑k)\mathbf{v}_{k}^{(i)}\sim\mathcal{N}(0,\mathbf{R}_{k}) and calculate 𝐱kk(i)\mathbf{x}_{k\mid k}^{(i)} by
𝐱kk(i)=𝐱kk1(i)+𝐊~k(𝐲k+𝐯k(i)𝐡k(𝐱kk1(i))),\mathbf{x}_{k\mid k}^{(i)}=\mathbf{x}_{k\mid k-1}^{(i)}+\tilde{\mathbf{K}}_{k}\left(\mathbf{y}_{k}+\mathbf{v}_{k}^{(i)}-\mathbf{h}_{k}\left(\mathbf{x}_{k\mid k-1}^{(i)}\right)\right),
where 𝐊~k\tilde{\mathbf{K}}_{k} is defined in (22) with the approximation (23).  
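A minimal NumPy sketch of the update step in Algorithm 1, i.e., the gain (22) with the approximated weights (23) and the ensemble update (21), is given below; the names are ours, and the prediction ensembles are assumed to come from a prediction step such as the one sketched in Section II-C.

```python
import numpy as np

def mc_enkf_update(pred, y, h, H_jac, R, sigma, rng):
    """MC-EnKF update: weights (23), modified gain (22), stochastic update (21).

    pred : (N, n) prediction ensembles from (6); argument names are illustrative.
    """
    N, n = pred.shape
    m = R.shape[0]
    m_hat = pred.mean(axis=0)
    dev = pred - m_hat
    C_hat = dev.T @ dev / (N - 1)                      # empirical covariance (9)
    H = H_jac(m_hat)                                   # Jacobian (11)
    # Correntropy weights (23); note G_sigma(0) = 1, so l_C reduces to one here.
    r = y - h(m_hat)
    l_R = np.exp(-r @ np.linalg.solve(R, r) / (2 * sigma**2))
    l_C = 1.0
    # Modified Kalman gain (22).
    C_w, R_w = C_hat / l_C, R / l_R
    K = np.linalg.solve(np.linalg.inv(C_w) + H.T @ np.linalg.inv(R_w) @ H,
                        H.T @ np.linalg.inv(R_w))
    # Stochastic ensemble update (21) with perturbed observations.
    v = rng.multivariate_normal(np.zeros(m), R, size=N)
    innov = y + v - np.array([h(x) for x in pred])
    return pred + innov @ K.T
```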
Remark III.1

The proposed MC-EnKF is still of Kalman type, i.e., it has a recursive structure similar to that of the common EnKF. Moreover, its computational complexity is only slightly higher than that of the common EnKF, as verified in the experiments below.

Remark III.2

In (22), we note that MC-EnKF has the potential to handle non-Gaussian observation noise thanks to the two weights, $\hat{l}_{\hat{\mathbf{C}}_{k}}$ and $\hat{l}_{\mathbf{R}_{k}}$, which adjust $\hat{\mathbf{C}}_{k}$ and $\mathbf{R}_{k}$ adaptively. We shall also verify this argument in the experiments below.

III-B Adaptive Strategy for Kernel Bandwidth Selection and Convergence with Respect to the Kernel Bandwidth

In general, the kernel bandwidth (scale parameter) in the cost function balances the convergence rate of the regression model and its robustness [23]. Since our algorithm does not involve solving such regression problems, we focus on how to tune this scale parameter for robustness (with respect to outliers). Intuitively, if the arriving observation $\mathbf{y}_{k}$ contains large outliers, $\|\mathbf{y}_{k}-\mathbf{h}_{k}(\hat{\mathbf{x}}_{k})\|_{2}$ will be large, where $\|\cdot\|_{2}$ denotes the $l^{2}$-norm. In this case, we need a smaller $\sigma$ to make our algorithm more robust. Motivated by this intuition, we propose to set $\sigma_{k}=\frac{1}{\|\mathbf{y}_{k}-\mathbf{h}_{k}(\hat{\mathbf{x}}_{k})\|_{2}}$ adaptively, where $\sigma_{k}$ denotes the kernel bandwidth used in the $k$-th step of MC-EnKF. As shown in our simulation experiments, this adaptive strategy achieves good estimation performance. Additionally, the MC-EnKF behaves more and more like the common EnKF algorithm as $\sigma$ increases. In particular, the following convergence theorem holds.

Theorem III.1

If the kernel bandwidth σ\sigma\to\infty, the proposed MC-EnKF will converge to the common EnKF.

Proof:

See Appendix A. ∎

Remark III.3

The justification in Theorem III.1 gives MC-EnKF the flexibility to handle Gaussian noise cases, since the common EnKF performs well in that setting.
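The adaptive rule from this subsection, $\sigma_{k}=1/\|\mathbf{y}_{k}-\mathbf{h}_{k}(\hat{\mathbf{x}}_{k})\|_{2}$, can be sketched as follows; taking $\hat{\mathbf{x}}_{k}$ to be the empirical prediction mean $\hat{\mathbf{m}}_{k}$ and guarding against division by zero are our own choices, as the paper does not specify them.

```python
import numpy as np

def adaptive_bandwidth(y, m_hat, h, eps=1e-12):
    """Adaptive kernel bandwidth sigma_k = 1 / ||y_k - h_k(x_hat_k)||_2.

    m_hat plays the role of x_hat_k (here: the empirical prediction mean),
    and eps is a small safeguard against division by zero.
    """
    return 1.0 / max(np.linalg.norm(y - h(m_hat)), eps)
```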

IV Experiments

In this section, we conduct performance comparisons of our proposed MC-EnKF with the common EnKF and the maximum likelihood EnKF (ML-EnKF) [24], which optimizes a nonlinear cost function through maximum likelihood. These evaluations are carried out on filtering benchmarks with non-Gaussian noise, i.e., observation noise containing large outliers. In these experiments, we consider M=100 independent Monte Carlo runs. In each run, N=1000 samples are used to evaluate the MSE of the state. The EnKF and MC-EnKF are implemented with NumPy [25] and run on an Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz.

IV-A Linear System

The first benchmark we used here is a linear system, which is given by

𝐱k\displaystyle\mathbf{x}_{k} =[cos(α)sin(α)sin(α)cos(α)]𝐱k1+𝐰k,\displaystyle=\left[\begin{array}[]{cc}\cos(\alpha)&\sin(\alpha)\\ -\sin(\alpha)&\cos(\alpha)\end{array}\right]\mathbf{x}_{k-1}+\mathbf{w}_{k}, (24)
𝐲k\displaystyle\mathbf{y}_{k} =[11]𝐱k+𝐯k,\displaystyle=\left[\begin{array}[]{cc}1&1\end{array}\right]\mathbf{x}_{k}+\mathbf{v}_{k},

where $\alpha=\frac{\pi}{18}$, $\mathbf{x}_{0}\sim\mathcal{N}(0,\mathbf{I}_{2})$ and $\mathbf{w}_{k}\sim\mathcal{N}(0,0.01\mathbf{I}_{2})$. Let $\mathbf{Q}_{1}=0.01\mathbf{I}_{1}$ be the nominal observation covariance; the observation noise is sampled from a Gaussian mixture, i.e.,

𝐯k0.9𝒩(0,𝐐1)+0.1𝒩(0,100𝐐1).\mathbf{v}_{k}\sim 0.9\ \mathcal{N}(0,\mathbf{Q}_{1})+0.1\ \mathcal{N}(0,100\mathbf{Q}_{1}). (25)
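A simulation sketch of the benchmark (24) with the mixture noise (25) is given below; the trajectory length of 1000 follows the number of samples used per run, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, T = np.pi / 18, 1000
F = np.array([[np.cos(alpha), np.sin(alpha)],
              [-np.sin(alpha), np.cos(alpha)]])
H = np.array([[1.0, 1.0]])
Q = 0.01 * np.eye(2)
R_nom = 0.01 * np.eye(1)              # nominal observation covariance Q_1

x = rng.multivariate_normal(np.zeros(2), np.eye(2))
xs, ys = [], []
for _ in range(T):
    x = F @ x + rng.multivariate_normal(np.zeros(2), Q)
    # Mixture-of-Gaussian observation noise (25): 10% of samples are outliers.
    cov = R_nom if rng.random() < 0.9 else 100 * R_nom
    y = H @ x + rng.multivariate_normal(np.zeros(1), cov)
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)
```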

Table I lists the MSEs and average CPU times of EnKF and MC-EnKF with different kernel bandwidths for this example, where the number of ensembles of both EnKF and MC-EnKF is set to 100. Note that MC-EnKF outperforms EnKF in most cases while their average CPU times are virtually identical. This demonstrates that our proposed MC-EnKF achieves better performance than EnKF with almost no extra CPU time, i.e., MC-EnKF is more robust and efficient when dealing with observation noise containing large outliers. In fact, our proposed adaptive strategy (MC-EnKF-Ada) outperforms most fixed-$\sigma$ cases and is only slightly worse than the best case, suggesting that the adaptive strategy works well for the linear system. When the simulation is conducted with a large enough $\sigma$, for example $\sigma=10000$, the performance of MC-EnKF is almost identical to that of EnKF, which verifies Theorem III.1 for the linear system.

TABLE I: MSE comparisons between EnKF and MC-EnKF with different $\sigma$ for the linear system.
Methods MSE CPU Time
EnKF 3.1004 0.4676
MC-EnKF-Ada 2.2344 0.4697
MC-EnKF (σ=0.1\sigma=0.1) 2.9655 0.4634
MC-EnKF (σ=0.5\sigma=0.5) 2.9247 0.4636
MC-EnKF (σ=2\sigma=2) 2.1031 0.4689
MC-EnKF (σ=5\sigma=5) 1.9411 0.4629
MC-EnKF (σ=10\sigma=10) 2.3073 0.4642
MC-EnKF (σ=10000\sigma=10000) 3.1605 0.4725

To better present the comparison results, we choose the case $\sigma=5$ to illustrate the true state, the EnKF estimate and the MC-EnKF estimate over time, as shown in Fig. 1. Here, the selected state dimension is indicated by the numeric suffix after the dash in the figure legends; for example, EnKF-1 denotes the EnKF estimate of the first state dimension. This convention is also used in the following figure. We can clearly see that MC-EnKF gives accurate estimates, whereas EnKF performs poorly when the observations contain large outliers.

Figure 1: The true state versus the estimate of EnKF and the estimate of MC-EnKF (σ=5\sigma=5) over time for linear system.

IV-B Nonlinear System

The second benchmark is a nonlinear system, which is given as follows:

𝐱k\displaystyle\mathbf{x}_{k} =(𝐈2+κ1[10.20.21])𝐱k1+κ2cos(𝐱k1)+𝐰k,\displaystyle=\left(\mathbf{I}_{2}+\kappa_{1}\left[\begin{array}[]{cc}-1&0.2\\ 0.2&-1\end{array}\right]\right)\mathbf{x}_{k-1}+\kappa_{2}\cos(\mathbf{x}_{k-1})+\mathbf{w}_{k}, (26)
𝐲k\displaystyle\mathbf{y}_{k} =𝐱k+sin(𝐱k)+𝐯k,\displaystyle=\mathbf{x}_{k}+\sin(\mathbf{x}_{k})+\mathbf{v}_{k},

where $\mathbf{x}_{0}\sim\mathcal{N}(0,\mathbf{I}_{2})$ and $\mathbf{w}_{k}\sim\mathcal{N}(0,\mathbf{I}_{2})$, and $\kappa_{1}=\kappa_{2}=0.1$ are constants controlling the state dynamics. Let $\mathbf{Q}_{2}=\mathbf{I}_{2}$ be the nominal observation covariance; the observation noise is sampled from a Gaussian mixture, i.e.,

𝐯k0.9𝒩(0,𝐐2)+0.1𝒩(0,1000𝐐2).\mathbf{v}_{k}\sim 0.9\ \mathcal{N}(0,\mathbf{Q}_{2})+0.1\ \mathcal{N}(0,1000\mathbf{Q}_{2}). (27)
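A simulation sketch of the nonlinear benchmark (26) with the mixture noise (27) is shown below; the Jacobian helper is included because MC-EnKF uses the linearization (11), and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
k1 = k2 = 0.1
A = np.eye(2) + k1 * np.array([[-1.0, 0.2], [0.2, -1.0]])
Q = np.eye(2)                          # state noise covariance
R_nom = np.eye(2)                      # nominal observation covariance Q_2

def f(x):      # state equation in (26)
    return A @ x + k2 * np.cos(x)

def h(x):      # observation equation in (26)
    return x + np.sin(x)

def H_jac(x):  # Jacobian of h, used for the linearization (11)
    return np.eye(2) + np.diag(np.cos(x))

x = rng.multivariate_normal(np.zeros(2), np.eye(2))
traj, obs = [], []
for _ in range(1000):
    x = f(x) + rng.multivariate_normal(np.zeros(2), Q)
    cov = R_nom if rng.random() < 0.9 else 1000 * R_nom  # mixture noise (27)
    obs.append(h(x) + rng.multivariate_normal(np.zeros(2), cov))
    traj.append(x)
```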

We list the MSEs and average CPU times of EnKF, ML-EnKF and MC-EnKF with different kernel bandwidths in Table II for this nonlinear example, where the number of ensembles of these filters is still set to 100. It is obvious that, for the nonlinear system, EnKF degrades greatly in the presence of non-Gaussian observation noise and the linearization error of the observation function, and hence has the worst estimation performance. ML-EnKF performs better than EnKF but is still affected by outliers. Our proposed adaptive strategy (MC-EnKF-Ada) obtains better results than several fixed kernel bandwidths, indicating that it remains useful for nonlinear systems. Note that MC-EnKF outperforms EnKF in all cases, which demonstrates that our proposed MC-EnKF can effectively eliminate the effect of non-Gaussian observation noise. Additionally, the filters share almost identical CPU times, which demonstrates the efficiency of MC-EnKF. Moreover, with a suitable kernel bandwidth ($\sigma=5$), MC-EnKF achieves very accurate estimates while EnKF is heavily affected by large outliers, as presented in Fig. 2. We also note that when $\sigma$ is large enough ($\sigma=10000$), the performance of MC-EnKF is identical to that of EnKF, which verifies Theorem III.1 for the nonlinear system.

TABLE II: MSE comparisons between EnKF, ML-EnKF and MC-EnKF with different $\sigma$ for the nonlinear system.
Methods MSE CPU Time
EnKF 4.0929 0.4768
ML-EnKF 3.3567 0.5059
MC-EnKF-Ada 2.9282 0.4775
MC-EnKF (σ=0.1\sigma=0.1) 3.0150 0.4793
MC-EnKF (σ=0.5\sigma=0.5) 2.9703 0.4769
MC-EnKF (σ=2\sigma=2) 1.5849 0.4713
MC-EnKF (σ=5\sigma=5) 1.3012 0.4762
MC-EnKF (σ=10\sigma=10) 1.3752 0.4729
MC-EnKF (σ=10000\sigma=10000) 4.0448 0.4799
Figure 2: The true state versus the estimate of EnKF and the estimate of MC-EnKF (σ=5\sigma=5) over time for nonlinear system.

IV-C Discussions

According to the simulation experiments above, we make the following two observations:

  • With suitable kernel bandwidth, MC-EnKF can significantly improve its robustness to observation noises containing large outliers with nearly no CPU time loss.

  • With large kernel bandwidth, MC-EnKF performs like the common EnKF, which implies its flexibility to handle Gaussian noise cases.

These two observations make our viewpoint a fruitful area for further study. Besides, we note that the kernel bandwidth $\sigma$ plays a significant role in the algorithm and affects its robustness to non-Gaussian observations (such as outliers). Although our proposed adaptive strategy is effective, it does not reach the performance obtained with a suitably tuned kernel bandwidth. Therefore, a highly intriguing direction for future work is a more effective adaptive strategy for selecting the kernel bandwidth with a theoretical guarantee, since this choice is crucial for our proposed MC-EnKF. In addition, this article only investigates using the special case (MCC) of the generalized MCC [26] as a cost function to derive a robust EnKF with stochastic update steps. Hence, developing robust variants of the EnKF [9] based on the generalized MCC will be our next research focus.

V Conclusion

This article proposes a robust EnKF algorithm called the maximum correntropy ensemble Kalman filter (MC-EnKF), which is flexible enough to handle cases involving both Gaussian and non-Gaussian noise. Instead of the well-known minimum mean square error (MMSE) criterion, MC-EnKF is derived by employing the maximum correntropy criterion (MCC) as the optimality criterion. The ensemble propagation equations remain of Kalman type. MC-EnKF behaves like the EnKF when the kernel bandwidth is large enough, and we also demonstrate this argument theoretically. With a proper kernel bandwidth, MC-EnKF can perform much better than the EnKF at only a slight increase in computational cost, especially when the underlying observation system is disturbed by heavy-tailed non-Gaussian noise. Besides, we propose an adaptive strategy to choose the kernel bandwidth, and its effectiveness has also been verified by experiments.

Appendix A Proof of the Theorem III.1

Here we give the proof of Theorem III.1. Before proceeding, we need the following technical lemma.

Lemma A.1 (Matrix Inversion Lemma [27])

If $\mathbf{A}\in\mathbb{R}^{n\times n}$ and $\mathbf{C}\in\mathbb{R}^{m\times m}$ are non-singular, $\mathbf{B}\in\mathbb{R}^{n\times m}$ and $\mathbf{D}\in\mathbb{R}^{m\times n}$, then

(𝐀+𝐁𝐂𝐃)1=𝐀1𝐀1𝐁(𝐃𝐀1𝐁+𝐂1)1𝐃𝐀1\left(\mathbf{A}+\mathbf{B}\mathbf{C}\mathbf{D}\right)^{-1}=\mathbf{A}^{-1}-\mathbf{A}^{-1}\mathbf{B}\left(\mathbf{D}\mathbf{A}^{-1}\mathbf{B}+\mathbf{C}^{-1}\right)^{-1}\mathbf{D}\mathbf{A}^{-1} (28)
Proof:

Now we state the proof of Theorem III.1. Note that when the kernel bandwidth σ\sigma\to\infty,

l^𝐑k\displaystyle\hat{l}_{\mathbf{R}_{k}} =𝒢σ(𝐲k𝐡k(𝐦^k)𝐑k)\displaystyle=\mathcal{G}_{\sigma}\left(\left\|\mathbf{y}_{k}-\mathbf{h}_{k}\left(\hat{\mathbf{m}}_{k}\right)\right\|_{\mathbf{R}_{k}}\right) (29)
=exp((𝐲k𝐡k(𝐦^k))𝐑k1(𝐲k𝐡k(𝐦^k))2σ2)\displaystyle=\exp\left(\frac{-\left(\mathbf{y}_{k}-\mathbf{h}_{k}\left(\hat{\mathbf{m}}_{k}\right)\right)^{\top}\mathbf{R}_{k}^{-1}\left(\mathbf{y}_{k}-\mathbf{h}_{k}\left(\hat{\mathbf{m}}_{k}\right)\right)}{2\sigma^{2}}\right)
1,\displaystyle\to 1,

Similarly, one can conclude that $\hat{l}_{\hat{\mathbf{C}}_{k}}\to 1$ as $\sigma\to\infty$. These arguments imply that $\tilde{\mathbf{K}}_{k}$ in (22), with the approximation (23), becomes

𝐊~k=(𝐂^k1+𝐇k𝐑k1𝐇k)1𝐇k𝐑k1,\tilde{\mathbf{K}}_{k}=\left(\hat{\mathbf{C}}_{k}^{-1}+\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1}\mathbf{H}_{k}\right)^{-1}\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1}, (30)

Then, in view of Lemma A.1 (choosing $\mathbf{A}=\hat{\mathbf{C}}_{k}^{-1}$, $\mathbf{B}=\mathbf{H}_{k}^{\top}$, $\mathbf{C}=\mathbf{R}_{k}^{-1}$ and $\mathbf{D}=\mathbf{H}_{k}$), one can conclude that

𝐊~k\displaystyle\tilde{\mathbf{K}}_{k} =(𝐂^k1+𝐇k𝐑k1𝐇k)1𝐇k𝐑k1\displaystyle=\left(\hat{\mathbf{C}}_{k}^{-1}+\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1}\mathbf{H}_{k}\right)^{-1}\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1} (31)
=(𝐂^k𝐂^k𝐇k(𝐑k+𝐇k𝐂^k𝐇k)1𝐇k𝐂^k)𝐇k𝐑k1\displaystyle=\left(\hat{\mathbf{C}}_{k}-\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\left(\mathbf{R}_{k}+\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)^{-1}\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\right)\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1}
=𝐂^k𝐇k𝐑k1𝐂^k𝐇k(𝐑k+𝐇k𝐂^k𝐇k)1\displaystyle=\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1}-\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\left(\mathbf{R}_{k}+\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)^{-1}
×𝐇k𝐂^k𝐇k𝐑k1\displaystyle\times\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1}
=𝐂^k𝐇k(𝐑k1(𝐑k+𝐇k𝐂^k𝐇k)1𝐇k𝐂^k𝐇k𝐑k1)\displaystyle=\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\left(\mathbf{R}_{k}^{-1}-\left(\mathbf{R}_{k}+\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)^{-1}\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\mathbf{R}_{k}^{-1}\right)
=𝐂^k𝐇k(𝐈(𝐑k+𝐇k𝐂^k𝐇k)1𝐇k𝐂^k𝐇k)𝐑k1.\displaystyle=\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\left(\mathbf{I}-\left(\mathbf{R}_{k}+\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)^{-1}\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)\mathbf{R}_{k}^{-1}.

Now, using Lemma A.1 again (choosing $\mathbf{A}=\mathbf{I}$, $\mathbf{B}=\mathbf{I}$, $\mathbf{C}=\mathbf{R}_{k}^{-1}$ and $\mathbf{D}=\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}$), one can further conclude that

𝐊~k\displaystyle\tilde{\mathbf{K}}_{k} =𝐂^k𝐇k(𝐈(𝐑k+𝐇k𝐂^k𝐇k)1𝐇k𝐂^k𝐇k)𝐑k1\displaystyle=\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\left(\mathbf{I}-\left(\mathbf{R}_{k}+\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)^{-1}\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)\mathbf{R}_{k}^{-1} (32)
=𝐂^k𝐇k(𝐈+𝐑k1𝐇k𝐂^k𝐇k)1𝐑k1\displaystyle=\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\left(\mathbf{I}+\mathbf{R}_{k}^{-1}\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)^{-1}\mathbf{R}_{k}^{-1}
=𝐂^k𝐇k(𝐑k+𝐇k𝐂^k𝐇k)1,\displaystyle=\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\left(\mathbf{R}_{k}+\mathbf{H}_{k}\hat{\mathbf{C}}_{k}\mathbf{H}_{k}^{\top}\right)^{-1},

which is equal to (8). ∎
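The limiting behaviour can also be checked numerically: the sketch below compares the gain (22) (with $\hat{l}_{\hat{\mathbf{C}}_{k}}=1$ and $\hat{l}_{\mathbf{R}_{k}}\to 1$) against the EnKF gain (8) on synthetic matrices of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
G = rng.normal(size=(n, n))
C_hat = G @ G.T + np.eye(n)          # a generic SPD empirical prediction covariance
H = rng.normal(size=(m, n))
R = np.eye(m)

# Common EnKF gain (8).
K_enkf = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + R)

# MC-EnKF gain (22) with l_C = 1; l_R -> 1 corresponds to sigma -> infinity.
for l_R in (0.2, 0.9, 0.999, 1.0):
    R_w = R / l_R
    K_mc = np.linalg.solve(np.linalg.inv(C_hat) + H.T @ np.linalg.inv(R_w) @ H,
                           H.T @ np.linalg.inv(R_w))
    print(l_R, np.linalg.norm(K_mc - K_enkf))   # the gap vanishes as l_R -> 1
```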

References

  • [1] S. Särkkä, Bayesian filtering and smoothing.   Cambridge university press, 2013, no. 3.
  • [2] J. D. Hamilton, “State-space models,” Handbook of econometrics, vol. 4, pp. 3039–3080, 1994.
  • [3] S. Chen, “Kalman filter for robot vision: a survey,” IEEE Transactions on industrial electronics, vol. 59, no. 11, pp. 4409–4420, 2011.
  • [4] K. Law, A. Stuart, and K. Zygalakis, “Data assimilation,” Cham, Switzerland: Springer, vol. 214, p. 52, 2015.
  • [5] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.
  • [6] M. Hoshiya, E. Saito et al., “Structural identification by extended kalman filter,” Jour. of Eng. Mech., ASCE, vol. 110, no. 12, 1984.
  • [7] S. J. Julier and J. K. Uhlmann, “New extension of the kalman filter to nonlinear systems,” in Signal processing, sensor fusion, and target recognition VI, vol. 3068.   International Society for Optics and Photonics, 1997, pp. 182–193.
  • [8] I. Arasaratnam and S. Haykin, “Cubature kalman filters,” IEEE Transactions on automatic control, vol. 54, no. 6, pp. 1254–1269, 2009.
  • [9] G. Evensen, “The ensemble kalman filter: Theoretical formulation and practical implementation,” Ocean dynamics, vol. 53, no. 4, pp. 343–367, 2003.
  • [10] Y. Lu, B. Li, H. R. Karimi, and N. Zhang, “Measurement outlier-resistant target tracking in wireless sensor networks with energy harvesting constraints,” Journal of the Franklin Institute, 2022.
  • [11] Y. Wang, Y. Sun, V. Dinavahi, S. Cao, and D. Hou, “Adaptive robust cubature kalman filter for power system dynamic state estimation against outliers,” IEEE Access, vol. 7, pp. 105 872–105 881, 2019.
  • [12] A. H. Jazwinski, Stochastic processes and filtering theory.   Courier Corporation, 2007.
  • [13] Z. Wu, J. Shi, X. Zhang, W. Ma, B. Chen, and I. Senior Member, “Kernel recursive maximum correntropy,” Signal Processing, vol. 117, pp. 11–16, 2015.
  • [14] J. C. Principe, Information theoretic learning: Renyi’s entropy and kernel perspectives.   Springer Science & Business Media, 2010.
  • [15] V. Stojanovic and N. Nedic, “Robust kalman filtering for nonlinear multivariable stochastic systems in the presence of non-gaussian noise,” International Journal of Robust and Nonlinear Control, vol. 26, no. 3, pp. 445–460, 2016.
  • [16] Y. Huang, Y. Zhang, N. Li, Z. Wu, and J. A. Chambers, “A novel robust student’s t-based kalman filter,” IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 3, pp. 1545–1554, 2017.
  • [17] W. Liu, P. P. Pokharel, and J. C. Principe, “Correntropy: Properties and applications in non-gaussian signal processing,” IEEE Transactions on signal processing, vol. 55, no. 11, pp. 5286–5298, 2007.
  • [18] B. Chen, X. Liu, H. Zhao, and J. C. Principe, “Maximum correntropy kalman filter,” Automatica, vol. 76, pp. 70–77, 2017.
  • [19] G. Wang, N. Li, and Y. Zhang, “Maximum correntropy unscented kalman and information filters for non-gaussian measurement noise,” Journal of the Franklin Institute, vol. 354, no. 18, pp. 8659–8677, 2017.
  • [20] G. Wang, Y. Zhang, and X. Wang, “Iterated maximum correntropy unscented kalman filters for non-gaussian systems,” Signal Processing, vol. 163, pp. 87–94, 2019.
  • [21] Y. Tao and S. S.-T. Yau, “Outlier-robust iterative extended kalman filtering,” IEEE Signal Processing Letters, vol. 30, pp. 743–747, 2023.
  • [22] H. E. Rauch, F. Tung, and C. T. Striebel, “Maximum likelihood estimates of linear dynamic systems,” AIAA journal, vol. 3, no. 8, pp. 1445–1450, 1965.
  • [23] Y. Feng, X. Huang, L. Shi, Y. Yang, J. A. Suykens et al., “Learning with the maximum correntropy criterion induced losses for regression.” J. Mach. Learn. Res., vol. 16, no. 30, pp. 993–1034, 2015.
  • [24] M. Zupanski, “Maximum likelihood ensemble filter: Theoretical aspects,” Monthly Weather Review, vol. 133, no. 6, pp. 1710–1726, 2005.
  • [25] C. R. Harris, K. J. Millman, S. J. Van Der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith et al., “Array programming with numpy,” Nature, vol. 585, no. 7825, pp. 357–362, 2020.
  • [26] B. Chen, L. Xing, H. Zhao, N. Zheng, J. C. Príncipe et al., “Generalized correntropy for robust adaptive filtering,” IEEE Transactions on Signal Processing, vol. 64, no. 13, pp. 3376–3387, 2016.
  • [27] W. W. Hager, “Updating the inverse of a matrix,” SIAM review, vol. 31, no. 2, pp. 221–239, 1989.