
Minimizing effects of the Kalman gain on posterior covariance eigenvalues, the characteristic polynomial, and symmetric polynomials of eigenvalues

Johannes Krotz
Abstract

The Kalman gain is commonly derived as the minimizer of the trace of the posterior covariance. It is known that it also minimizes the determinant of the posterior covariance. I will show that it also minimizes the smallest eigenvalue $\lambda_{1}$ and the characteristic polynomial on $(-\infty,\lambda_{1})$, and that it is a critical point of all symmetric polynomials of the eigenvalues, minimizing some of them. This expands the range of uncertainty measures for which the Kalman filter is optimal.

keywords:
Kalman filter, uncertainty measures, symmetric polynomials
Department of Mathematics, University of Tennessee Knoxville, Knoxville, TN 37996, USA

In a Kalman filter algorithm the Kalman gain is defined by

\[{\bm{K}}^{*}={\bm{P}}{\bm{H}}^{\top}({\bm{H}}{\bm{P}}{\bm{H}}^{\top}+{\bm{R}})^{-1},\tag{1}\]

where ${\bm{P}}$ is the covariance matrix of the prior, ${\bm{R}}$ is the covariance matrix of the likelihood, and ${\bm{H}}$ is the measurement operator [1, 5]. The posterior covariance matrix ${\bm{P}}_{{\bm{K}}^{*}}$ is defined through

\[{\bm{P}}_{\bm{K}}=({\bm{I}}-{\bm{K}}{\bm{H}}){\bm{P}}({\bm{I}}-{\bm{K}}{\bm{H}})^{\top}+{\bm{K}}{\bm{R}}{\bm{K}}^{\top}\tag{2}\]

evaluated at ${\bm{K}}^{*}$, where ${\bm{I}}$ is the identity matrix. Note that, as covariance matrices, ${\bm{P}}$ and ${\bm{R}}$ are symmetric and strictly positive definite, properties which ${\bm{P}}_{\bm{K}}$ inherits.
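As a numerical sanity check (not part of the paper's argument; the dimensions and randomly generated ${\bm{P}}$, ${\bm{R}}$, ${\bm{H}}$ are illustrative assumptions), equations (1) and (2) can be evaluated with NumPy to confirm that ${\bm{P}}_{{\bm{K}}^{*}}$ inherits symmetry and strict positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                                        # illustrative state / measurement dimensions
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)                        # prior covariance (symmetric positive definite)
B = rng.standard_normal((m, m))
R = B @ B.T + m * np.eye(m)                        # likelihood covariance (symmetric positive definite)
H = rng.standard_normal((m, n))                    # measurement operator

# Kalman gain, equation (1)
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

def posterior_cov(K):
    """Joseph-form posterior covariance, equation (2)."""
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T

post = posterior_cov(K_star)
print(np.allclose(post, post.T))                   # True: symmetric
print(bool(np.linalg.eigvalsh(post).min() > 0))    # True: strictly positive eigenvalues
```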
The Kalman gain defined in equation (1) is often derived as the minimizer of the total posterior variance, i.e. the trace of ${\bm{P}}_{\bm{K}}$ [1, 5, 3, 6, 4]. In other words,

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}\operatorname{tr}({\bm{P}}_{\bm{K}}).\tag{3}\]

It was shown in [2] that ${\bm{K}}^{*}$ also minimizes the posterior generalized variance, defined as the determinant of ${\bm{P}}_{\bm{K}}$, i.e.

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}\det({\bm{P}}_{\bm{K}}).\tag{4}\]
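The two optimality statements (3) and (4) can be checked numerically. The following sketch (the dimensions, perturbation scale, and random matrices are my own illustrative assumptions) compares ${\bm{K}}^{*}$ against randomly perturbed competing gains:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3                                                     # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)    # prior covariance
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)    # likelihood covariance
H = rng.standard_normal((m, n))                                 # measurement operator
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)               # equation (1)

def posterior_cov(K):
    """Posterior covariance P_K, equation (2)."""
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T

tr0 = np.trace(posterior_cov(K_star))
det0 = np.linalg.det(posterior_cov(K_star))
# any competing gain yields a larger trace and a larger determinant
perturbed = [K_star + 0.1 * rng.standard_normal((n, m)) for _ in range(200)]
assert all(np.trace(posterior_cov(K)) >= tr0 for K in perturbed)        # equation (3)
assert all(np.linalg.det(posterior_cov(K)) >= det0 for K in perturbed)  # equation (4)
```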

Let $\Phi({\bm{P}}_{\bm{K}},\lambda):=\det(\lambda{\bm{I}}-{\bm{P}}_{\bm{K}})=\prod_{i=1}^{n}(\lambda-\lambda^{\bm{K}}_{i})=\sum_{i=0}^{n}a_{i}^{\bm{K}}\lambda^{i}$ be the characteristic polynomial of ${\bm{P}}_{\bm{K}}$ and let $0<\lambda^{\bm{K}}_{1}\leq\dots\leq\lambda^{\bm{K}}_{n}$ be its roots, the eigenvalues of ${\bm{P}}_{\bm{K}}$. Let $\Lambda^{\bm{K}}=\{\lambda^{\bm{K}}_{i}\}_{i=1}^{n}$.

Theorem 1.

Let $\lambda\notin\Lambda^{{\bm{K}}^{*}}$ and $g_{\lambda}({\bm{K}}):=|\Phi({\bm{P}}_{\bm{K}},\lambda)|$. The Kalman gain ${\bm{K}}^{*}$ is a critical point of $g_{\lambda}$ and, if $\lambda<\lambda^{{\bm{K}}^{*}}_{1}$,

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}g_{\lambda}({\bm{K}}).\tag{5}\]
Proof.

Let $\lambda\notin\Lambda^{\bm{K}}$ and let $f_{\lambda}({\bm{K}}):=\log g_{\lambda}({\bm{K}})$. Using matrix calculus it follows that

\[\frac{df_{\lambda}({\bm{K}})}{d{\bm{K}}}=-\left(\lambda{\bm{I}}-{\bm{P}}_{\bm{K}}\right)^{-1}\frac{d{\bm{P}}_{\bm{K}}}{d{\bm{K}}}.\tag{6}\]

The equation $\frac{df_{\lambda}({\bm{K}})}{d{\bm{K}}}=0$ holds if and only if $\frac{d{\bm{P}}_{\bm{K}}}{d{\bm{K}}}=0$. The latter, however, holds only for ${\bm{K}}={\bm{K}}^{*}$, implying that ${\bm{K}}^{*}$ is a critical point of $f_{\lambda}$ and hence of $g_{\lambda}$.
For $\lambda<\lambda^{{\bm{K}}^{*}}_{1}$ the term $-\left(\lambda{\bm{I}}-{\bm{P}}_{{\bm{K}}^{*}}\right)^{-1}$ is positive definite, meaning the minimizing properties of $\frac{d{\bm{P}}_{{\bm{K}}}}{d{\bm{K}}}$ at ${\bm{K}}^{*}$ are preserved. ∎
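Theorem 1 can be probed numerically. A minimal sketch (random ${\bm{P}}$, ${\bm{R}}$, ${\bm{H}}$ and the chosen test values of $\lambda$ are assumptions for illustration) checks that $g_{\lambda}({\bm{K}}^{*})\leq g_{\lambda}({\bm{K}})$ for several $\lambda<\lambda_{1}^{{\bm{K}}^{*}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def g(K, lam):
    """g_lambda(K) = |Phi(P_K, lambda)|, the characteristic polynomial in absolute value."""
    return abs(np.linalg.det(lam * np.eye(n) - posterior_cov(K)))

lam1 = np.linalg.eigvalsh(posterior_cov(K_star)).min()         # smallest eigenvalue of P_{K*}
for lam in (-2.0, 0.0, 0.5 * lam1):                            # all strictly below lambda_1
    g0 = g(K_star, lam)
    perturbed = (K_star + 0.1 * rng.standard_normal((n, m)) for _ in range(100))
    assert all(g(K, lam) >= g0 for K in perturbed)             # K* minimizes g_lambda
```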

Corollary 1.1.

${\bm{K}}^{*}$ minimizes the smallest eigenvalue:

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}\lambda_{1}^{{\bm{K}}}.\tag{7}\]
Proof.

Let ${\bm{K}}\neq{\bm{K}}^{*}$ and suppose $\lambda_{1}^{{\bm{K}}^{*}}>\lambda_{1}^{{\bm{K}}}$. Then $|\Phi({\bm{P}}_{\bm{K}},\lambda_{1}^{\bm{K}})|=0<|\Phi({\bm{P}}_{{\bm{K}}^{*}},\lambda_{1}^{\bm{K}})|$. By continuity of the characteristic polynomials there is $\lambda<\lambda_{1}^{\bm{K}}<\lambda_{1}^{{\bm{K}}^{*}}$, i.e. $\lambda\notin\Lambda^{\bm{K}}\cup\Lambda^{{\bm{K}}^{*}}$, such that $|\Phi({\bm{P}}_{\bm{K}},\lambda)|<|\Phi({\bm{P}}_{{\bm{K}}^{*}},\lambda)|$. This contradicts Theorem 1. ∎
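Corollary 1.1 is easy to test numerically as well. In the sketch below (an illustrative random setup; dimensions and perturbation scale are assumptions), every perturbed gain yields a posterior covariance whose smallest eigenvalue is at least $\lambda_{1}^{{\bm{K}}^{*}}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def lam1(K):
    """Smallest eigenvalue of the posterior covariance P_K."""
    return np.linalg.eigvalsh(posterior_cov(K)).min()

lam1_star = lam1(K_star)
# equation (7): no competing gain produces a smaller lambda_1 (small tolerance for rounding)
assert all(lam1(K_star + 0.1 * rng.standard_normal((n, m))) >= lam1_star - 1e-9
           for _ in range(200))
```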

Corollary 1.2.

Let $\Phi({\bm{P}}_{\bm{K}},\lambda)=\sum_{i=0}^{n}a_{i}^{\bm{K}}\lambda^{i}$. Then ${\bm{K}}^{*}$ is a critical point of the map ${\bm{K}}\mapsto|a_{i}^{\bm{K}}|$ for all $i=0,\dots,n-1$ and a minimizer for even $i$:

\[{\bm{K}}^{*}=\arg\min_{\bm{K}}|a_{i}^{\bm{K}}|\quad\text{for }i\equiv 0\pmod{2}.\tag{8}\]
Proof.

For $a_{0}$ the claim holds by applying Theorem 1 at $\lambda=0$. For $j\in\{1,\dots,n-1\}$ let $\Psi_{j}(\lambda)=\sum_{i\neq j}a_{i}^{\bm{K}}\lambda^{i}$. Pick $\mu_{j}$ such that $\Psi_{j}(\mu_{j})=0$ and let $\phi_{j,\lambda}({\bm{K}}):=\frac{|\Phi({\bm{P}}_{\bm{K}},\lambda)|}{|\mu_{j}|^{j}}$. This function has the same critical points and extrema as $g_{\lambda}$ in Theorem 1. Since $\phi_{j,\mu_{j}}({\bm{K}})=|a_{j}^{\bm{K}}|$, the first part of the claim follows.

The second part follows if $\mu_{j}<\lambda_{1}^{{\bm{K}}^{*}}$ for all even $j$. Hence let $j\neq 0$ now be an even number. Since all eigenvalues of ${\bm{P}}_{\bm{K}}$ are strictly positive and the $a_{i}^{\bm{K}}$ result from expanding the product $\prod_{i=1}^{n}(\lambda-\lambda_{i}^{\bm{K}})$, the signs of the $a_{i}^{\bm{K}}$ alternate, meaning $\operatorname{sgn}(a_{i}^{\bm{K}})=-\operatorname{sgn}(a_{i+1}^{\bm{K}})$ for all $i=0,\dots,n-1$. Since we ultimately care about $|\Phi({\bm{P}}_{\bm{K}},\lambda)|$, I assume WLOG that $a_{i}^{\bm{K}}>0$ for even $i$. Hence $\Psi_{j}(0)=a_{0}^{\bm{K}}=\Phi({\bm{P}}_{\bm{K}},0)>0$ and $\Psi_{j}(\lambda)=\Phi({\bm{P}}_{\bm{K}},\lambda)-a_{j}^{\bm{K}}\lambda^{j}<\Phi({\bm{P}}_{\bm{K}},\lambda)$ for $\lambda>0$. Therefore $\Psi_{j}$ has a root, which we choose to be $\mu_{j}$, between $0$ and $\lambda_{1}^{\bm{K}}<\lambda_{1}^{{\bm{K}}^{*}}$. ∎
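A numerical check of Corollary 1.2 (again with an illustrative random setup; `np.poly` is used here to expand the characteristic polynomial from the eigenvalues): the even-index coefficient magnitudes $|a_{i}^{\bm{K}}|$ come out smallest at ${\bm{K}}^{*}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def abs_coeffs(K):
    """|a_0|, ..., |a_n| of the characteristic polynomial of P_K."""
    c = np.poly(np.linalg.eigvalsh(posterior_cov(K)))  # c[k] is the coefficient of lambda**(n-k)
    return np.abs(c[::-1])                             # reversed so that entry i is |a_i|

a_star = abs_coeffs(K_star)
for _ in range(100):
    a = abs_coeffs(K_star + 0.1 * rng.standard_normal((n, m)))
    assert np.all(a[0::2] >= a_star[0::2] - 1e-9)      # even i: |a_i| minimized at K*
```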

Remark 1 (Elementary symmetric Polynomials).

The elementary symmetric polynomials in $n$ variables, $e_{1}(X_{1},\dots,X_{n}),\dots,e_{n}(X_{1},\dots,X_{n})$, are defined as

\[e_{k}(X_{1},\dots,X_{n})=\sum_{1\leq j_{1}<\dots<j_{k}\leq n}X_{j_{1}}\cdots X_{j_{k}}.\tag{9}\]

Examples are $e_{1}(X_{1},\dots,X_{n})=X_{1}+\dots+X_{n}$ and $e_{n}(X_{1},\dots,X_{n})=X_{1}\cdots X_{n}$. They are invariant under permutation of their arguments and appear naturally as the coefficients of the characteristic polynomial:

\[\prod_{i=1}^{n}(\lambda-\lambda_{i})=\lambda^{n}-e_{1}(\lambda_{1},\dots,\lambda_{n})\lambda^{n-1}+\dots+(-1)^{n}e_{n}(\lambda_{1},\dots,\lambda_{n}).\tag{10}\]

The previous theorem thus also applies to these elementary symmetric polynomials evaluated at the eigenvalues of ${\bm{P}}_{\bm{K}}$.
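Identity (10) can be verified directly for a concrete set of numbers; the values below are arbitrary and merely illustrative:

```python
import numpy as np
from itertools import combinations

lams = np.array([0.5, 1.0, 2.0, 3.5])           # example "eigenvalues" (arbitrary positive numbers)
n = len(lams)

def e(k):
    """Elementary symmetric polynomial e_k evaluated at lams, equation (9)."""
    return sum(np.prod(c) for c in combinations(lams, k))

# np.poly expands prod(lambda - lam_i); c[k] is the coefficient of lambda**(n - k)
c = np.poly(lams)
assert np.isclose(c[0], 1.0)                    # leading coefficient of the monic polynomial
for k in range(1, n + 1):
    assert np.isclose(c[k], (-1) ** k * e(k))   # alternating signs, matching equation (10)
```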

Corollary 1.3.

The Kalman gain ${\bm{K}}^{*}$ is a critical point of the map ${\bm{K}}\mapsto Q(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}})$, where $Q$ is an arbitrary symmetric polynomial, i.e. $Q(X_{1},\dots,X_{n})=Q(X_{\sigma(1)},\dots,X_{\sigma(n)})$ for every permutation $\sigma\in S_{n}$.

Proof.

The non-leading coefficients $a^{\bm{K}}_{0},\dots,a^{\bm{K}}_{n-1}$ of the characteristic polynomial $\Phi({\bm{P}}_{\bm{K}},\lambda)$ are, up to sign, the elementary symmetric polynomials $e_{1},\dots,e_{n}$ evaluated at the eigenvalues of ${\bm{P}}_{\bm{K}}$. The previous corollary shows that ${\bm{K}}^{*}$ is a critical point for these. By the fundamental theorem of symmetric polynomials there is a polynomial $P(X_{1},\dots,X_{n})$ such that

\[Q(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}})=P(e_{1}(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}}),\dots,e_{n}(\lambda_{1}^{\bm{K}},\dots,\lambda_{n}^{\bm{K}})).\tag{11}\]

The claim follows by the chain rule. ∎
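As an illustration of Corollary 1.3, take the symmetric polynomial $Q(X_{1},\dots,X_{n})=\sum_{i}X_{i}^{2}$, which the fundamental theorem expresses as $e_{1}^{2}-2e_{2}$. The sketch below uses a random illustrative setup; the observation that ${\bm{K}}^{*}$ is in fact a minimizer of this particular $Q$ goes slightly beyond the corollary, which only asserts a critical point.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)

def posterior_cov(K):
    I = np.eye(n)
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T       # equation (2)

def Q(K):
    """Q = sum of squared eigenvalues of P_K, a symmetric polynomial."""
    return (np.linalg.eigvalsh(posterior_cov(K)) ** 2).sum()

# fundamental theorem instance, equation (11): Q = e_1**2 - 2*e_2
lams = np.linalg.eigvalsh(posterior_cov(K_star))
e1 = lams.sum()
e2 = sum(np.prod(c) for c in combinations(lams, 2))
assert np.isclose(Q(K_star), e1 ** 2 - 2 * e2)

# for this Q, the critical point K* is in fact a minimizer
q0 = Q(K_star)
assert all(Q(K_star + 0.1 * rng.standard_normal((n, m))) >= q0 - 1e-9 for _ in range(100))
```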

Remark 2 (Special cases).

For $\lambda<0$ we can see that $|\Phi({\bm{P}}_{\bm{K}},\lambda)|=\sum_{i=0}^{n}|a^{\bm{K}}_{i}||\lambda|^{i}$. The Kalman gain ${\bm{K}}^{*}$ minimizes this function for all $\lambda<\lambda^{{\bm{K}}^{*}}_{1}$. For $\lambda=-1$ it is found that the sum of all $|a_{i}^{\bm{K}}|$, i.e. the sum of all elementary symmetric polynomials in the eigenvalues of ${\bm{P}}_{\bm{K}}$, is minimized by ${\bm{K}}^{*}$. At $\lambda=0$ this evaluates to $|a^{{\bm{K}}^{*}}_{0}|=\det({\bm{P}}_{{\bm{K}}^{*}})$, reproducing the result from [2]. It is easy to see that $\frac{|\Phi({\bm{P}}_{\bm{K}},\lambda)|-|\lambda|^{n}}{|\lambda|^{n-1}}\rightarrow|a^{{\bm{K}}}_{n-1}|=\operatorname{tr}({\bm{P}}_{\bm{K}})$ as $\lambda\rightarrow-\infty$. Since ${\bm{K}}^{*}$ minimizes this along the limiting process, the minimization of $\operatorname{tr}({\bm{P}}_{\bm{K}})$ is rediscovered.
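The special cases $\lambda=-1$ and $\lambda\rightarrow-\infty$ can be confirmed numerically as well (illustrative random setup; the large negative value standing in for the limit is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 4, 3                                                    # illustrative dimensions
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))
K_star = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)              # equation (1)
I = np.eye(n)
post = (I - K_star @ H) @ P @ (I - K_star @ H).T + K_star @ R @ K_star.T  # equation (2)

# lambda = -1: |Phi(P_K, -1)| equals the sum of all |a_i|
coeffs = np.poly(np.linalg.eigvalsh(post))                     # characteristic polynomial coefficients
assert np.isclose(abs(np.linalg.det(-I - post)), np.abs(coeffs).sum())

# lambda -> -infinity: the scaled characteristic polynomial recovers the trace
lam = -1.0e6
val = (abs(np.linalg.det(lam * I - post)) - abs(lam) ** n) / abs(lam) ** (n - 1)
assert np.isclose(val, np.trace(post), rtol=1e-3)
```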

References

  • [1] Mark Asch, Marc Bocquet, and Maëlle Nodet. Data Assimilation: Methods, Algorithms, and Applications. Fundamentals of Algorithms. Society for Industrial and Applied Mathematics.
  • [2] Eviatar Bach. Proof that the Kalman gain minimizes the generalized variance, 2021.
  • [3] Andrew H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, Inc.
  • [4] Sören Laue, Matthias Mitterreiter, and Joachim Giesen. MatrixCalculus.org – Computing Derivatives of Matrix and Tensor Expressions. In Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, and Céline Robardet, editors, Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, pages 769–772. Springer International Publishing.
  • [5] Simo Särkkä. Bayesian Filtering and Smoothing. Cambridge University Press, Cambridge, 2013.
  • [6] Ricardo Todling. Estimation Theory and Foundations of Atmospheric Data Assimilation.