
Minimizing effects of the Kalman gain on posterior covariance eigenvalues, the characteristic polynomial, and symmetric polynomials of eigenvalues

Johannes Krotz
Abstract

The Kalman gain is commonly derived as the minimizer of the trace of the posterior covariance. It is known that it also minimizes the determinant of the posterior covariance. I will show that it also minimizes the smallest eigenvalue $\lambda_{1}$ and the characteristic polynomial on $(-\infty,\lambda_{1})$, and that it is a critical point of all symmetric polynomials of the eigenvalues, minimizing some of them. This expands the range of uncertainty measures for which the Kalman filter is optimal.

keywords:
Kalman filter, uncertainty measures, symmetric polynomials
Department of Mathematics, University of Tennessee Knoxville, Knoxville, TN 37996, USA

In a Kalman filter algorithm the Kalman gain is defined by

\[{\bm{K}}^{*}={\bm{P}}{\bm{H}}^{\top}({\bm{H}}{\bm{P}}{\bm{H}}^{\top}+{\bm{R}})^{-1},\tag{1}\]

where ${\bm{P}}$ is the covariance matrix of the prior, ${\bm{R}}$ is the covariance matrix of the likelihood, and ${\bm{H}}$ is the measurement operator [1, 5]. The posterior covariance matrix ${\bm{P}}_{{\bm{K}}^{*}}$ is defined through

\[{\bm{P}}_{\bm{K}}=({\bm{I}}-{\bm{K}}{\bm{H}}){\bm{P}}({\bm{I}}-{\bm{K}}{\bm{H}})^{\top}+{\bm{K}}{\bm{R}}{\bm{K}}^{\top}\tag{2}\]

evaluated at ${\bm{K}}^{*}$, where ${\bm{I}}$ is the identity matrix. Note that, as covariance matrices, ${\bm{P}}$ and ${\bm{R}}$ are symmetric and strictly positive definite, properties which ${\bm{P}}_{\bm{K}}$ inherits.
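As a numerical sanity check (not part of the paper's argument; the dimensions and randomly generated ${\bm{P}}$, ${\bm{R}}$, ${\bm{H}}$ are illustrative assumptions), equations (1) and (2) can be evaluated with NumPy to confirm that ${\bm{P}}_{{\bm{K}}^{*}}$ inherits symmetry and strict positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                                        # illustrative state / measurement dimensions
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)                        # prior covariance (symmetric positive definite)
B = rng.standard_normal((m, m))
R = B @ B.T + m * np.eye(m)                        # likelihood covariance (symmetric positive definite)
H = rng.standard_normal((m, n))                    # measurement operator

# Kalman gain, equation (1)
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

def posterior_cov(K):
    """Joseph-form posterior covariance, equation (2)."""
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T

post = posterior_cov(K_star)
print(np.allclose(post, post.T))                   # True: symmetric
print(bool(np.linalg.eigvalsh(post).min() > 0))    # True: strictly positive eigenvalues
```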
The Kalman gain defined in equation (1) is often derived as the minimizer of the total posterior variance, i.e. the trace of ${\bm{P}}_{\bm{K}}$ [1, 5, 3, 6, 4]. In other words,

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}\operatorname{tr}({\bm{P}}_{\bm{K}}).\tag{3}\]

It was shown in [2] that ${\bm{K}}^{*}$ also minimizes the posterior generalized variance, defined as the determinant of ${\bm{P}}_{\bm{K}}$, i.e.

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}\det({\bm{P}}_{\bm{K}}).\tag{4}\]
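The two optimality statements (3) and (4) can be checked numerically. The following sketch (the dimensions, perturbation scale, and random matrices are my own illustrative assumptions) compares ${\bm{K}}^{*}$ against randomly perturbed competing gains:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3                                                     # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)    # prior covariance
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)    # likelihood covariance
H = rng.standard_normal((m, n))                                 # measurement operator
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)               # equation (1)

def posterior_cov(K):
    """Posterior covariance P_K, equation (2)."""
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T

tr0 = np.trace(posterior_cov(K_star))
det0 = np.linalg.det(posterior_cov(K_star))
# any competing gain yields a larger trace and a larger determinant
perturbed = [K_star + 0.1 * rng.standard_normal((n, m)) for _ in range(200)]
assert all(np.trace(posterior_cov(K)) >= tr0 for K in perturbed)        # equation (3)
assert all(np.linalg.det(posterior_cov(K)) >= det0 for K in perturbed)  # equation (4)
```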

Let $\Phi({\bm{P}}_{\bm{K}},\lambda):=\det(\lambda{\bm{I}}-{\bm{P}}_{\bm{K}})=\prod_{i=1}^{n}(\lambda-\lambda^{\bm{K}}_{i})=\sum_{i=0}^{n}a_{i}^{\bm{K}}\lambda^{i}$ be the characteristic polynomial of ${\bm{P}}_{\bm{K}}$ and let $0<\lambda^{\bm{K}}_{1}\leq\dots\leq\lambda^{\bm{K}}_{n}$ be its roots, the eigenvalues of ${\bm{P}}_{\bm{K}}$. Let $\Lambda^{\bm{K}}=\{\lambda^{\bm{K}}_{i}\}_{i=1}^{n}$.

Theorem 1.

Let $\lambda\notin\Lambda^{{\bm{K}}^{*}}$ and $g_{\lambda}({\bm{K}}):=|\Phi({\bm{P}}_{\bm{K}},\lambda)|$. The Kalman gain ${\bm{K}}^{*}$ is a critical point of $g_{\lambda}$ and, if $\lambda<\lambda^{{\bm{K}}^{*}}_{1}$,

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}g_{\lambda}({\bm{K}}).\tag{5}\]
Proof.

Let $\lambda\notin\Lambda^{\bm{K}}$ and let $f_{\lambda}({\bm{K}}):=\log g_{\lambda}({\bm{K}})$. Using matrix calculus it follows that

\[\frac{df_{\lambda}({\bm{K}})}{d{\bm{K}}}=-\left(\lambda{\bm{I}}-{\bm{P}}_{\bm{K}}\right)^{-1}\frac{d{\bm{P}}_{\bm{K}}}{d{\bm{K}}}.\tag{6}\]

The equation $\frac{df_{\lambda}({\bm{K}})}{d{\bm{K}}}=0$ holds if and only if $\frac{d{\bm{P}}_{\bm{K}}}{d{\bm{K}}}=0$. The latter, however, holds only for ${\bm{K}}={\bm{K}}^{*}$, implying that ${\bm{K}}^{*}$ is a critical point of $f_{\lambda}$ and hence of $g_{\lambda}$.
For $\lambda<\lambda^{{\bm{K}}^{*}}_{1}$ the term $-\left(\lambda{\bm{I}}-{\bm{P}}_{{\bm{K}}^{*}}\right)^{-1}$ is positive definite, meaning the minimizing properties of $\frac{d{\bm{P}}_{{\bm{K}}}}{d{\bm{K}}}$ at ${\bm{K}}^{*}$ are preserved. ∎
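Theorem 1 can be probed numerically. A minimal sketch (random ${\bm{P}}$, ${\bm{R}}$, ${\bm{H}}$ and the chosen test values of $\lambda$ are assumptions for illustration) checks that $g_{\lambda}({\bm{K}}^{*})\leq g_{\lambda}({\bm{K}})$ for several $\lambda<\lambda_{1}^{{\bm{K}}^{*}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def g(K, lam):
    """g_lambda(K) = |Phi(P_K, lambda)|, the characteristic polynomial in absolute value."""
    return abs(np.linalg.det(lam * np.eye(n) - posterior_cov(K)))

lam1 = np.linalg.eigvalsh(posterior_cov(K_star)).min()         # smallest eigenvalue of P_{K*}
for lam in (-2.0, 0.0, 0.5 * lam1):                            # all strictly below lambda_1
    g0 = g(K_star, lam)
    perturbed = (K_star + 0.1 * rng.standard_normal((n, m)) for _ in range(100))
    assert all(g(K, lam) >= g0 for K in perturbed)             # K* minimizes g_lambda
```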

Corollary 1.1.

${\bm{K}}^{*}$ minimizes the smallest eigenvalue:

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}\lambda_{1}^{{\bm{K}}}.\tag{7}\]
Proof.

Let ${\bm{K}}\neq{\bm{K}}^{*}$ and suppose $\lambda_{1}^{{\bm{K}}^{*}}>\lambda_{1}^{{\bm{K}}}$. Then $|\Phi({\bm{P}}_{\bm{K}},\lambda_{1}^{\bm{K}})|=0<|\Phi({\bm{P}}_{{\bm{K}}^{*}},\lambda_{1}^{\bm{K}})|$. By continuity of the characteristic polynomials there is $\lambda<\lambda_{1}^{\bm{K}}<\lambda_{1}^{{\bm{K}}^{*}}$, i.e. $\lambda\notin\Lambda^{\bm{K}}\cup\Lambda^{{\bm{K}}^{*}}$, such that $|\Phi({\bm{P}}_{\bm{K}},\lambda)|<|\Phi({\bm{P}}_{{\bm{K}}^{*}},\lambda)|$. This contradicts Theorem 1. ∎
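Corollary 1.1 is easy to test numerically as well. In the sketch below (an illustrative random setup; dimensions and perturbation scale are assumptions), every perturbed gain yields a posterior covariance whose smallest eigenvalue is at least $\lambda_{1}^{{\bm{K}}^{*}}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def lam1(K):
    """Smallest eigenvalue of the posterior covariance P_K."""
    return np.linalg.eigvalsh(posterior_cov(K)).min()

lam1_star = lam1(K_star)
# equation (7): no competing gain produces a smaller lambda_1 (small tolerance for rounding)
assert all(lam1(K_star + 0.1 * rng.standard_normal((n, m))) >= lam1_star - 1e-9
           for _ in range(200))
```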

Corollary 1.2.

Let $\Phi({\bm{P}}_{\bm{K}},\lambda)=\sum_{i=0}^{n}a_{i}^{\bm{K}}\lambda^{i}$. Then ${\bm{K}}^{*}$ is a critical point of the map ${\bm{K}}\mapsto|a_{i}^{\bm{K}}|$ for all $i=0,\dots,n-1$ and a minimizer for even $i$:

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}|a_{i}^{\bm{K}}|\quad\text{for }i\equiv 0\pmod{2}.\tag{8}\]
Proof.

For $a_{0}$ the claim holds by applying Theorem 1 at $\lambda=0$. For $j\in\{1,\dots,n-1\}$ let $\Psi_{j}(\lambda)=\sum_{i\neq j}a_{i}^{\bm{K}}\lambda^{i}$. Pick $\mu_{j}$ such that $\Psi_{j}(\mu_{j})=0$ and let $\phi_{j,\lambda}({\bm{K}}):=\frac{|\Phi({\bm{P}}_{\bm{K}},\lambda)|}{|\mu_{j}|^{j}}$. This function has the same critical points and extrema as $g_{\lambda}$ in Theorem 1. Since $\phi_{j,\mu_{j}}({\bm{K}})=|a_{j}^{\bm{K}}|$, the first part of the claim follows.

The second part follows if $\mu_{j}<\lambda_{1}^{{\bm{K}}^{*}}$ for all even $j$. Hence let $j\neq 0$ now be an even number. Since all eigenvalues of ${\bm{P}}_{\bm{K}}$ are strictly positive and the $a_{i}^{\bm{K}}$ result from expanding the product $\prod_{i=1}^{n}(\lambda-\lambda_{i}^{\bm{K}})$, the signs of the $a_{i}^{\bm{K}}$ alternate, meaning $\operatorname{sgn}(a_{i}^{\bm{K}})=-\operatorname{sgn}(a_{i+1}^{\bm{K}})$ for all $i=0,\dots,n-1$. Since we ultimately care about $|\Phi({\bm{P}}_{\bm{K}},\lambda)|$, I assume WLOG that $a_{i}^{\bm{K}}>0$ for even $i$. Hence $\Psi_{j}(0)=a_{0}^{\bm{K}}=\Phi({\bm{P}}_{\bm{K}},0)>0$ and $\Psi_{j}(\lambda)=\Phi({\bm{P}}_{\bm{K}},\lambda)-a_{j}^{\bm{K}}\lambda^{j}<\Phi({\bm{P}}_{\bm{K}},\lambda)$ for $\lambda>0$. Therefore $\Psi_{j}$ has a root, which we choose to be $\mu_{j}$, between $0$ and $\lambda_{1}^{\bm{K}}<\lambda_{1}^{{\bm{K}}^{*}}$. ∎
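A numerical check of Corollary 1.2 (again with an illustrative random setup; `np.poly` is used here to expand the characteristic polynomial from the eigenvalues): the even-index coefficient magnitudes $|a_{i}^{\bm{K}}|$ come out smallest at ${\bm{K}}^{*}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def abs_coeffs(K):
    """|a_0|, ..., |a_n| of the characteristic polynomial of P_K."""
    c = np.poly(np.linalg.eigvalsh(posterior_cov(K)))  # c[k] is the coefficient of lambda**(n-k)
    return np.abs(c[::-1])                             # reversed so that entry i is |a_i|

a_star = abs_coeffs(K_star)
for _ in range(100):
    a = abs_coeffs(K_star + 0.1 * rng.standard_normal((n, m)))
    assert np.all(a[0::2] >= a_star[0::2] - 1e-9)      # even i: |a_i| minimized at K*
```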

Remark 1 (Elementary symmetric Polynomials).

The elementary symmetric polynomials in $n$ variables, $e_{1}(X_{1},\dots,X_{n}),\dots,e_{n}(X_{1},\dots,X_{n})$, are defined as

\[e_{k}(X_{1},\dots,X_{n})=\sum_{1\leq j_{1}<\dots<j_{k}\leq n}X_{j_{1}}\cdots X_{j_{k}}.\tag{9}\]

Examples are $e_{1}(X_{1},\dots,X_{n})=X_{1}+\dots+X_{n}$ and $e_{n}(X_{1},\dots,X_{n})=X_{1}\cdots X_{n}$. They are invariant under permutation of their arguments and appear naturally as the coefficients of the characteristic polynomial:

\[\prod_{i=1}^{n}(\lambda-\lambda_{i})=\lambda^{n}-e_{1}(\lambda_{1},\dots,\lambda_{n})\lambda^{n-1}+\dots+(-1)^{n}e_{n}(\lambda_{1},\dots,\lambda_{n}).\tag{10}\]

The previous theorem thus also applies to these elementary symmetric polynomials evaluated at the eigenvalues of ${\bm{P}}_{\bm{K}}$.
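Identity (10) can be verified directly for a concrete set of numbers; the values below are arbitrary and merely illustrative:

```python
import numpy as np
from itertools import combinations

lams = np.array([0.5, 1.0, 2.0, 3.5])           # example "eigenvalues" (arbitrary positive numbers)
n = len(lams)

def e(k):
    """Elementary symmetric polynomial e_k evaluated at lams, equation (9)."""
    return sum(np.prod(c) for c in combinations(lams, k))

# np.poly expands prod(lambda - lam_i); c[k] is the coefficient of lambda**(n - k)
c = np.poly(lams)
assert np.isclose(c[0], 1.0)                    # leading coefficient of the monic polynomial
for k in range(1, n + 1):
    assert np.isclose(c[k], (-1) ** k * e(k))   # alternating signs, matching equation (10)
```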

Corollary 1.3.

The Kalman gain ${\bm{K}}^{*}$ is a critical point of the map ${\bm{K}}\mapsto Q(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}})$, where $Q$ is an arbitrary symmetric polynomial, i.e. $Q(X_{1},\dots,X_{n})=Q(X_{\sigma(1)},\dots,X_{\sigma(n)})$ for every permutation $\sigma\in S_{n}$.

Proof.

The non-leading coefficients $a^{\bm{K}}_{0},\dots,a^{\bm{K}}_{n-1}$ of the characteristic polynomial $\Phi({\bm{P}}_{\bm{K}},\lambda)$ are, up to sign, the elementary symmetric polynomials $e_{1},\dots,e_{n}$ evaluated at the eigenvalues of ${\bm{P}}_{\bm{K}}$. The previous corollary shows that ${\bm{K}}^{*}$ is a critical point for these. By the fundamental theorem of symmetric polynomials there is a polynomial $P(X_{1},\dots,X_{n})$ such that

\[Q(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}})=P(e_{1}(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}}),\dots,e_{n}(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}})).\tag{11}\]

The claim follows by the chain rule. ∎
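As an illustration of Corollary 1.3, take the symmetric polynomial $Q(X_{1},\dots,X_{n})=\sum_{i}X_{i}^{2}$, which the fundamental theorem expresses as $e_{1}^{2}-2e_{2}$. The sketch below uses a random illustrative setup; the observation that ${\bm{K}}^{*}$ is in fact a minimizer of this particular $Q$ goes slightly beyond the corollary, which only asserts a critical point.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def Q(K):
    """Q = sum of squared eigenvalues of P_K, a symmetric polynomial."""
    return (np.linalg.eigvalsh(posterior_cov(K)) ** 2).sum()

# fundamental theorem instance, equation (11): Q = e_1**2 - 2*e_2
lams = np.linalg.eigvalsh(posterior_cov(K_star))
e1 = lams.sum()
e2 = sum(np.prod(c) for c in combinations(lams, 2))
assert np.isclose(Q(K_star), e1 ** 2 - 2 * e2)

# for this Q, the critical point K* is in fact a minimizer
q0 = Q(K_star)
assert all(Q(K_star + 0.1 * rng.standard_normal((n, m))) >= q0 - 1e-9 for _ in range(100))
```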

Remark 2 (Special cases).

For $\lambda<0$ we can see that $|\Phi({\bm{P}}_{\bm{K}},\lambda)|=\sum_{i=0}^{n}|a^{\bm{K}}_{i}||\lambda|^{i}$. The Kalman gain ${\bm{K}}^{*}$ minimizes this function for all $\lambda<\lambda^{{\bm{K}}^{*}}_{1}$. For $\lambda=-1$ it is found that the sum of all $|a_{i}^{\bm{K}}|$, i.e. the sum of all elementary symmetric polynomials in the eigenvalues of ${\bm{P}}_{\bm{K}}$, is minimized by ${\bm{K}}^{*}$. At $\lambda=0$ this evaluates to $|a^{{\bm{K}}^{*}}_{0}|=\det({\bm{P}}_{{\bm{K}}^{*}})$, reproducing the result from [2]. It is easy to see that $\frac{|\Phi({\bm{P}}_{\bm{K}},\lambda)|-|\lambda|^{n}}{|\lambda|^{n-1}}\rightarrow|a^{{\bm{K}}}_{n-1}|=\operatorname{tr}({\bm{P}}_{\bm{K}})$ as $\lambda\rightarrow-\infty$. Since ${\bm{K}}^{*}$ minimizes this along the limiting process, the minimization of $\operatorname{tr}({\bm{P}}_{\bm{K}})$ is rediscovered.
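The special cases $\lambda=-1$ and $\lambda\rightarrow-\infty$ can be confirmed numerically as well (illustrative random setup; the large negative value standing in for the limit is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)
I = np.eye(n)
post = (I - K_star @ H) @ P @ (I - K_star @ H).T + K_star @ R @ K_star.T  # equation (2)

# lambda = -1: |Phi(P_K, -1)| equals the sum of all |a_i|
coeffs = np.poly(np.linalg.eigvalsh(post))                     # characteristic polynomial coefficients
assert np.isclose(abs(np.linalg.det(-I - post)), np.abs(coeffs).sum())

# lambda -> -infinity: the scaled characteristic polynomial recovers the trace
lam = -1.0e6
val = (abs(np.linalg.det(lam * I - post)) - abs(lam) ** n) / abs(lam) ** (n - 1)
assert np.isclose(val, np.trace(post), rtol=1e-3)
```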

References

  • [1] Mark Asch, Marc Bocquet, and Maëlle Nodet. Data Assimilation: Methods, Algorithms, and Applications. Fundamentals of Algorithms. Society for Industrial and Applied Mathematics.
  • [2] Eviatar Bach. Proof that the Kalman gain minimizes the generalized variance, 2021.
  • [3] Andrew H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, Inc.
  • [4] Sören Laue, Matthias Mitterreiter, and Joachim Giesen. MatrixCalculus.org – Computing Derivatives of Matrix and Tensor Expressions. In Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, and Céline Robardet, editors, Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, pages 769–772. Springer International Publishing.
  • [5] Simo Särkkä. Bayesian Filtering and Smoothing. Cambridge University Press, Cambridge, 2013.
  • [6] Ricardo Todling. Estimation Theory and Foundations of Atmospheric Data Assimilation.