
Optimal Linear Filtering for
Discrete-Time Systems with
Infinite-Dimensional Measurements

Maxwell M. Varley, Timothy L. Molloy, and Girish N. Nair

This work received funding from the Australian Government, via grant AUSMURIB000001 associated with ONR MURI grant N00014-19-1-2571. M. M. Varley and G. N. Nair are with the Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC, 3010, Australia (emails: [email protected], [email protected]). T. L. Molloy is with the CIICADA Lab, School of Engineering, Australian National University, Canberra, ACT 2601, Australia (e-mail: [email protected]).
Abstract

Systems equipped with modern sensing modalities such as vision and lidar gain access to increasingly high-dimensional measurements with which to enact estimation and control schemes. In this article, we examine the continuum limit of high-dimensional measurements and analyze state estimation in linear time-invariant systems with infinite-dimensional measurements but finite-dimensional states, both corrupted by additive noise. We propose a linear filter and derive the corresponding optimal gain functional in the sense of the minimum mean square error, analogous to the classic Kalman filter. By modeling the measurement noise as a wide-sense stationary random field, we are able to derive the optimal linear filter explicitly, in contrast to previous derivations of Kalman filters in distributed-parameter settings. Interestingly, we find that we need only impose conditions that are finite-dimensional in nature to ensure that the filter is asymptotically stable. The proposed filter is verified via simulation of a linearized system with a pinhole camera sensor.

This paper will be published in IEEE Transactions on Automatic Control.
©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Index Terms: Discrete-Time Linear Systems, Distributed Parameter Systems, Kalman Filtering, State Estimation, Stochastic Fields, Stochastic Processes

1 Introduction


Many systems involve sensors that produce high-dimensional data. These types of systems occur in a wide range of fields, such as modeling economic growth [1], the spiking patterns of neurons in biological systems [2], and vision-based sensing in robotics [3]. In real-time settings with limited computational power, this data is typically converted to a lower-dimensional representation for timely processing, e.g. by extracting a limited number of key features or restricting to a subset of the data.

In this work, we propose to head in the opposite direction, and treat high-dimensional observations as infinite-dimensional random fields on a continuous index set. Our motivation for this approach comes from physics, where it is often more convenient to treat a complex physical system as a continuum, rather than as a large collection of discrete elements. With the right analytical tools, a continuous representation can simplify analysis and offer insights into the behaviour of the underlying high-dimensional system.

A system with infinite-dimensional observations does, however, come with its own challenges, which require different analytical tools and lead to a new perspective compared with its finite-dimensional counterpart. The characterization of infinite-dimensional measurement noise involves stochastic fields, which must be handled carefully to maintain rigour. An example of this is the need for a generalization of the Fubini-Tonelli theorem to wide-sense stationary random fields, which we will present in Theorem 3.1.

Another important distinction is that in a standard finite-dimensional Kalman filter setup, an optimal gain is given by solving a matrix equation, and existence and uniqueness conditions arise naturally from standard linear algebra results. The infinite-dimensional observation space, however, results in an integral equation. We will show in Section 4 (under certain regularity assumptions) that this can be solved via Fourier methods and that the existence and uniqueness conditions can, surprisingly, still be given by examining properties of finite-dimensional objects. This gives us access to a closed-form continuous representation of the optimal gain, which is amenable to Fourier analysis.

Furthermore, we note that mean square stability of finite-dimensional systems typically requires stabilizability and detectability conditions that depend on the state transition, input, and output matrices. For systems with infinite-dimensional measurements, it turns out that mean square stability of our filter requires (sufficient) notions of stabilizability and detectability that also take into account the noise covariance matrix, which is not the case for finite-dimensional systems. This is demonstrated in Section 5.

1.1 Literature Review

Infinite-dimensional representations are widely used in the study of distributed-parameter systems; see e.g. [4, 5, 6]. In such systems, the states and measurements are represented as functions of both time and a continuous-valued index taking values in some domain $\mathcal{D}$. The models may be stochastic [7, 8] or deterministic [4], in either discrete or continuous time [9]. A critical high-level survey of the approaches, claims, and results in this area has been carried out in [5], and a broader treatment of the estimation and control of distributed-parameter systems can be found in [4, 6, 10, 11]. An overview of various observer design methodologies for linear distributed-parameter systems is also given in the survey [12]. Although this survey does not discuss stochastic systems, the references therein are classified as either early lumping models, where the system is discretized before observer design, or late lumping models, where the system is discretized after observer design. In all the works discussed in [12], both the state and measurement spaces are considered infinite-dimensional.

In contrast with the rich body of work above, in this article we examine systems with infinite-dimensional measurements but finite-dimensional states. A complementary problem setup occurs in [13], although in that work the system is assumed to have an infinite-dimensional state space and a finite-dimensional measurement space, which sidesteps the issue of calculating the inverse of an infinite-dimensional object, an issue we discuss further in Section 6.2. The work in [13] uses an orthogonal projection operator to reduce the state to a finite-dimensional subspace and proceeds to establish upper bounds on the resulting discretization error by way of sensitivity analysis of a Riccati difference equation. A more common approach is the aforementioned early lumping method, where discretization is performed at the start to produce linear models with high- but finite-dimensional states and measurements. This is the approach taken by [14], where a Kalman filter is designed for a finite-dimensional subset of the infinite-dimensional measurement, corrupted by additive Gaussian noise that is spatially white. In this article, we do not perform any such lumping, early or late, and our analysis remains firmly in an infinite-dimensional setting.

See Table 1 for a concise summary of early derivations of linear distributed-parameter filters, as well as more recent work. The papers therein employ a number of different approaches, including Green's functions [15], orthogonal projections [7], least squares arguments [16], the Wiener-Hopf method [8], Bayesian approaches [17], and $\mathcal{H}_2$ estimation [18], to name a representative handful. The derivation in this article is closest to the orthogonal projection argument given in [7], although here we deal with a discrete-time system. It should also be noted that [7] assumes that the observation noise is both temporally and spatially white, whereas here we do not require the latter property. We do, however, introduce the assumption that the observation noise is stationary, which allows the novel derivation of a closed-form expression for the optimal gain in Section 4.3. This form has not been presented in any of these previous works.

The model that we propose is relevant to systems where high-dimensional measurements, for instance from vision, radar, or lidar sensors, are used to estimate a low-dimensional state, such as the position and velocity of an autonomous agent or moving target. To the best of the authors’ knowledge, such systems have not been the focus of previous work in distributed parameter filtering.

Table 1: Properties and assumptions of previous work in state estimation of distributed-parameter models. The final column describes the work presented in this article.
References compared: [7], [15], [19], [17], [16], [8], [20], [4], [18], and ours.
Properties compared: Finite-Dim. State Space; Discrete Time; Noise is Stationary on Domain $\mathcal{D}$; Explicit Optimal Gain.

1.2 Contributions

Our key contribution is the derivation of an exact solution for the optimal linear filter for discrete-time systems with measurement noise and process noise modeled by random fields and random vectors respectively (Theorem 4.1, Procedure 1). The assumptions of finite state dimension, and of measurement noise that is stationary on the domain $\mathcal{D}$, enable us to derive the optimal gain functional explicitly in terms of a multi-dimensional Fourier transform on the measurement index set $\mathcal{D}$.

This is in contrast to previous works in which the optimal gain is given in terms of a distributed-parameter inverse operator that is either not defined or defined implicitly [8], [17]. Moreover, this inverse can lead to implementation difficulties and is not always guaranteed to exist; these issues are discussed in further detail in Section 6.2. In comparison, our approach allows us to give sufficient conditions for the system to have a well-defined optimal gain function (Section 4): in rough terms, the effective bandwidth of the Fourier transform of the measurement kernel should be smaller than that of the noise covariance kernel.¹

¹The term bandwidth here refers to the effective range of "spatial" frequencies present in a measurement, not temporal frequencies.

An advantage of this continuum-limit approach is that it enables us to make precise statements about the filter's behaviour in the limit as the measurement resolution approaches infinity. This paves the way for further analysis, as various finite-point approximation schemes may be formulated and their error compared with the asymptotic error of the optimal infinite-resolution filter in a principled and quantitative manner. Further discussion and analysis of these finite-point approximation schemes will be addressed in future work.

This article contains five significant extensions in comparison to our preliminary work in the conference paper [21].

The first major development is that this work, by way of a Hilbert space framework, presents necessary and sufficient conditions which guarantee the existence and uniqueness of the optimal gain function. Our previous work [21] employed a calculus of variations approach to yield only necessary conditions on the optimal gain function.

The second development is the extension of our proposed algorithm to a $d$-dimensional, not just scalar, measurement domain $\mathcal{D}$, allowing much richer sensor data to be modeled without obscuring the underlying spatial correlations. Examples include camera images ($d=2$), as well as high-resolution lidar and magnetic resonance imaging (MRI) ($d=3$). This extension entails the use of random fields and Fourier transforms with $d$-dimensional frequency, as opposed to standard random processes and Fourier transforms on a scalar axis. This has also allowed us to present simulation results that involve two-dimensional images, as opposed to the simpler scalar functions previously presented in [21].

The third development is a weakening of the assumptions on the observation noise. The filter provided in [21] was restricted to systems with Gaussian observation noise, whereas the formulation presented here loosens this assumption and instead only requires that the noise be stationary.

The fourth development is that this work clearly defines explicit conditions such that the presented arguments are rigorous, whereas the derivations in [21] were of a more preliminary and exploratory nature. This work provides a list of assumptions such that our derived filter is well-defined, and that each step in the derivation is valid.

The fifth and final development is that we explicitly impose stabilizability and detectability conditions on the system (Section 2.4). The detectability condition in particular involves the measurement noise covariance function, as well as the deterministic dynamical matrix and measurement function. This strongly contrasts with other works in infinite-dimensional systems theory, which impose conditions of strong or weak observability [22, 23] that are agnostic to noise.

These extensions yield a clearer picture of the relationship between the system and the optimal filter, and provide a more generalized solution for the optimal gain.

Figure 1: Three normalized stationary random fields, spatially discretized into a $100\times 100$ grid over the domain $[-0.5,0.5]^2$, with length scales (a) $\ell=0.1$, (b) $\ell=0.01$, and (c) $\ell=0.001$. Each field is zero-mean with a squared exponential covariance function $R(i,i^{\prime})\propto e^{-\|i-i^{\prime}\|_2^2(2\ell^2)^{-1}}$. The length scale $\ell$ determines how closely each element is influenced by its neighbors, with a larger length scale leading to a stronger correlation. Note that small length scales are often used to approximate ideal white noise.

1.3 Organization

This article is organized as follows. Section 2 outlines the relevant definitions, theorems, and conditions relating to the Hilbert space of random variables and random fields, as well as the multi-dimensional Fourier transform and notions of system stabilizability and detectability. Section 3 gives our model definition and discusses some concrete examples which our model could feasibly represent. This section also summarizes the key assumptions needed to ensure the validity of our analysis, and the motivation behind each assumption. Section 4 derives necessary and sufficient conditions for the optimal gain of the proposed filter (Section 4.1) before presenting a unique gain function which satisfies these conditions (Section 4.2). Section 4.3 presents the full filter equations, and the stability of this filter is discussed in Section 5. Section 6 gives a procedure to implement this filter, with Section 6.1 discussing computational complexity and Section 6.2 drawing comparisons to existing results. Finally, Section 7 presents a simulated system and analyzes the empirical performance of the filter compared to its expected performance.

2 Preliminaries

2.1 Hilbert Space of Random Variables

In this work, we denote by $\mathcal{H}$ the Hilbert space of all real-valued random variables with finite second moments [24, Ch. 20]. The inner product of any two elements $u,v\in\mathcal{H}$ is defined by $\langle u,v\rangle = E[uv]$. An inner product operating on two random vectors $u,v$ in the Hilbert space $\mathcal{H}^n$ of random $n$-vectors is defined by $\langle u,v\rangle_{\mathcal{H}} = E[u^{\top}v]$. The derivation used in this article involves defining the state of our observed system as a vector residing in a Hilbert space and then selecting an error-minimizing estimate that resides in a subspace. The following theorem [25, p. 51, Th. 2] is fundamental to this approach.

Theorem 2.1 (Hilbert Projection Theorem)

Let $\mathcal{G}$ be a Hilbert space and $M$ a closed subspace of $\mathcal{G}$. Corresponding to any vector $x\in\mathcal{G}$, there exists a unique vector $m_0\in M$ such that $\|x-m_0\| \leq \|x-m\|$ for all $m\in M$. Furthermore, a necessary and sufficient condition that $m_0\in M$ is the unique minimizing vector is that $(x-m_0)\perp M$.

2.2 Random Fields

This article models measurement noise as an additive random field and some care is required when manipulating these objects. Rigorous and extensive analysis of these fields can be found in a number of textbooks [26], [27], [28]. A brief summary of relevant definitions is given below.

Let $(\Theta,\mathcal{F},P)$ be a probability space and $\mathbb{R}^d$ a Euclidean space endowed with Lebesgue measure. A random field is a function $v:\mathbb{R}^d\times\Theta\rightarrow\mathbb{R}^m$ which is measurable with respect to the product measure on $\mathbb{R}^d\times\Theta$. (For concision, the dependence on $\theta$ will usually be suppressed.) A centered w.s.s. (wide-sense stationary) random field has the properties that i) $E[v(i)]=0$ for all $i$; ii) each component of the random vector $v(i)$ lies in $\mathcal{H}$ for all $i$; and iii) the covariance function $R$ depends only on the difference between index points, so that for any $i_1,i_2\in\mathbb{R}^d$, $E[v(i_1)v^{\top}(i_2)] = R(i_1-i_2)$. Such a field that is stationary over a domain $\mathcal{D}$ is often referred to as $\mathcal{D}$-stationary noise. In the special case where the covariance is also independent of direction, the noise is said to be isotropic. The stationarity condition arises in situations where the noise statistics possess translational invariance; in other words, the noise statistics look the same everywhere throughout the domain. Some sources of noise may also be locally stationary, in that they are closely approximated by stationary processes over finite windows. Spatial stationarity is a common assumption in a number of diverse contexts such as image denoising [3, 29, 30], radio signal propagation [31], and geostatistics [32, Ch. 74]. Figure 1 gives three example realizations of a stationary random field with a squared exponential covariance function and varying length scales.
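To make this construction concrete, the following sketch samples such a field numerically. It is a minimal illustration rather than part of this article's method: it assumes a coarse $40\times 40$ grid (the $100\times 100$ grid of Figure 1 makes the dense Cholesky factor large) and a small diagonal jitter for numerical stability.

```python
import numpy as np

# Minimal sketch: sample a zero-mean w.s.s. field with squared exponential
# covariance R(i, i') = exp(-||i - i'||^2 / (2 l^2)) on an N x N grid over
# [-0.5, 0.5]^2, as in Figure 1. Grid size and jitter are assumptions made
# for tractability of the dense Cholesky factorization.
def sample_field(length_scale, N=40, jitter=1e-6, seed=0):
    axis = np.linspace(-0.5, 0.5, N)
    X, Y = np.meshgrid(axis, axis)
    pts = np.column_stack([X.ravel(), Y.ravel()])          # (N^2, 2) index points
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    R = np.exp(-d2 / (2.0 * length_scale ** 2))            # stationary covariance
    L = np.linalg.cholesky(R + jitter * np.eye(N * N))     # jitter aids stability
    rng = np.random.default_rng(seed)
    return (L @ rng.standard_normal(N * N)).reshape(N, N)

fields = [sample_field(l) for l in (0.1, 0.01, 0.001)]     # cf. Fig. 1(a)-(c)
```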

2.3 The Multi-Dimensional Fourier Transform

A critical step in this article involves the transformation of an integral equation to an algebraic equation, facilitated by the multi-dimensional Fourier transform, which we now define.

Definition 2.1 (Multi-Dimensional Fourier Transform)

[33, Ch. 1] The multi-dimensional Fourier transform of any $f\in L_1(\mathbb{R}^d,\mathbb{R})$ is given by $\mathcal{F}\{f\}\in L_{\infty}(\mathbb{R}^d,\mathbb{R})$ and is defined as

$$(\mathcal{F}\{f\})(\omega) \triangleq \int_{\mathbb{R}^d} f(i)e^{-2\pi j\omega\cdot i}\,di,$$

where $\omega$ is a $d$-dimensional vector of spatial frequencies. We will denote the Fourier transform of any function $f$ by $\bar{f}$. The inverse Fourier transform is defined by

$$(\mathcal{F}^{-1}\{\bar{f}\})(i) = \int_{\mathbb{R}^d} \bar{f}(\omega)e^{2\pi j i\cdot\omega}\,d\omega.$$

This form of the inverse generally only holds when $\bar{f}(\omega)$ is also absolutely integrable [34].
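In computations, this transform can be approximated on a uniform grid with the fast Fourier transform. The following sketch is our own illustration (with $d=2$ and a Gaussian test function, both assumptions): it matches the $e^{-2\pi j\omega\cdot i}$ convention of Definition 2.1 by scaling the DFT by the grid cell area and centering the samples with fftshift/ifftshift.

```python
import numpy as np

# Minimal sketch: approximate the d-dimensional Fourier transform of
# Definition 2.1 (d = 2 here) by a Riemann sum implemented with np.fft.
N, half = 256, 2.0                       # samples per axis, domain [-2, 2)
dx = 2 * half / N
axis = -half + dx * np.arange(N)         # zero sits at index N // 2
X, Y = np.meshgrid(axis, axis, indexing="ij")
f = np.exp(-np.pi * (X**2 + Y**2))       # Gaussian: its transform is itself

# ifftshift moves the origin to index 0; dx**2 is the grid cell area.
F = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(f))) * dx**2
freqs = np.fft.fftshift(np.fft.fftfreq(N, d=dx))  # spatial frequencies omega

k0 = N // 2                              # omega = 0 after fftshift
print(F[k0, k0].real)                    # ~1.0, the transform at the origin
```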

2.4 Stabilizability and Detectability

This article will employ notions of stabilizability and detectability, which we define as follows:

Definition 2.2 (Stabilizability)

[35, p. 342] The matrix pair $(A,B)$ is stabilizable if there exists a matrix $C$ such that

$$\rho(A + BC^{\top}) < 1,$$

where $\rho(\cdot)$ is the spectral radius.

Definition 2.3 (Detectability)

[35, p. 342] The matrix pair $(A,B)$ is detectable if $(A^{\top},B)$ is stabilizable.
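Numerically, these properties can be checked with the equivalent Popov-Belevitch-Hautus (PBH) rank test rather than by searching for a matrix $C$ achieving $\rho(A+BC^{\top})<1$. The sketch below is illustrative; the matrices are arbitrary examples, not values from this article.

```python
import numpy as np

# Minimal sketch: PBH-type test. (A, B) is stabilizable iff
# rank([A - lam*I, B]) = n for every eigenvalue lam of A with |lam| >= 1;
# detectability of (A, B) is stabilizability of (A^T, B).
def is_stabilizable(A, B, tol=1e-9):
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1 - tol:              # only unstable/marginal modes matter
            M = np.hstack([A - lam * np.eye(n), B])
            if np.linalg.matrix_rank(M, tol) < n:
                return False
    return True

def is_detectable(A, B, tol=1e-9):
    return is_stabilizable(A.T, B, tol)

A = np.array([[1.1, 0.0], [0.0, 0.5]])       # one unstable mode at 1.1
B = np.array([[0.0], [1.0]])
print(is_stabilizable(A, B))                 # False: B misses the unstable mode
print(is_detectable(A, np.array([[1.0], [0.0]])))  # True: that mode is observed
```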

3 Problem Formulation and Assumptions

Consider the linear discrete-time system

$$x_{k+1} = Ax_k + w_k, \qquad (1)$$

with state $x_k\in\mathbb{R}^n$, additive process noise $w_k\in\mathbb{R}^n$, state transition matrix $A\in\mathbb{R}^{n\times n}$, and time index $k\in\mathbb{N}$. We omit a control input for clarity; the results derived in this article remain valid with minor changes if a control input is present and available to the estimator. The state is measured via an infinite-dimensional measurement field defined on the domain $\mathbb{R}^d$, that is,

$$z_k(i) = \gamma(i)x_k + v_k(i). \qquad (2)$$

Here the measurement domain is $i\in\mathbb{R}^d$, the measurement is $z_k:\mathbb{R}^d\rightarrow\mathbb{R}^m$, the measurement noise is $v_k:\mathbb{R}^d\rightarrow\mathbb{R}^m$, and the measurement function is $\gamma:\mathbb{R}^d\rightarrow\mathbb{R}^{m\times n}$.

The model presented in (1) and (2) is well-suited to systems where the state can be represented by a low-dimensional model but the observations are represented by a very high- or infinite-dimensional model. A major application area is robotics, in the context of Simultaneous Localization and Mapping (SLAM). The state to be estimated is often the pose of a mobile robot, which could be described, for example, as a 9-vector that represents the robot's position, velocity, and acceleration in $\mathbb{R}^3$. The state updates are generally linear, obeying Newtonian dynamics. The measurement sensors, on the other hand, can be of a much higher dimensionality.

When lighting conditions are adequate, optical cameras provide a rich source of data, and even a relatively cheap 10 megapixel camera will provide a ten-million-dimensional image every frame. In these conditions, vision-based localization can be particularly suited to GPS-denied areas, such as extra-terrestrial environments [36]. In the case of a colour camera, the measurement equation (2) would apply with a 3-vector ($m=3$) describing the RGB intensity of each pixel, defined on a 2-dimensional image domain ($d=2$).

Under low or no lighting conditions, where optical cameras do not provide sufficient data, lidar is often employed. This setup has applications in the mining sector [37, 38], where underground tunnels or other "visually degraded" environments must be traversed [39]. Although not generally as "pixel-dense" as optical cameras, the commonly used SICK LMS511 is capable of providing up to 4560 samples per scan, and higher-end lidar modules can provide upwards of a million samples per scan. This model is also suited to target-tracking problems, where the state to be estimated now represents a moving body, and the measurement sensors are often radar, lidar, optical, or an ensemble thereof [40].

We assume the additive measurement noise in (2) is a w.s.s. centered random field with bounded covariance function $R$, so that each element of $z(i)$ is in $\mathcal{H}$ for all $i\in\mathbb{R}^d$. We also assume that $\gamma$ is bounded and absolutely integrable over the domain $\mathbb{R}^d$ under the Frobenius or another matrix norm.

The process noise and measurement noise are such that for all $j,k\in\mathbb{N}$ and $i,i^{\prime}\in\mathbb{R}^d$,

$$\begin{aligned}
E[w_k] &= 0, \\
E[v_k(i)] &= 0, \\
E[v_k(i)w_j^{\top}] &= 0, \\
E[w_k w_j^{\top}] &= Q\,\delta_{j-k}, & Q &\in\mathbb{R}^{n\times n}, \\
E[v_k(i)v_j^{\top}(i^{\prime})] &= R(i-i^{\prime})\,\delta_{j-k}, & R(i-i^{\prime}) &\in\mathbb{R}^{m\times m}. \qquad (3)
\end{aligned}$$

Here $Q$ is a positive-definite matrix, $\delta_{j-k}$ is the discrete impulse, and $R$ is bounded and absolutely integrable.
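As an illustration of the model (1)-(2) under the noise conditions (3), the following sketch simulates a discretized scalar-measurement instance ($d=1$, $m=1$). All numerical values ($A$, $Q$, $\gamma$, the noise length scale) are assumptions chosen for the example, not quantities from this article.

```python
import numpy as np

# Minimal sketch: simulate (1)-(2) on a 1-D measurement grid. gamma(i) is
# a pair of assumed Gaussian bumps; v_k is stationary noise drawn from a
# squared exponential covariance R, consistent with the w.s.s. assumption.
rng = np.random.default_rng(0)
n, N, T = 2, 200, 50                      # state dim, grid points, time steps
grid = np.linspace(-3.0, 3.0, N)

A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = 0.01 * np.eye(n)
gamma = np.column_stack([np.exp(-(grid - c) ** 2) for c in (-1.0, 1.0)])  # (N, n)

ell = 0.2                                 # noise correlation length
R = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * ell ** 2))
Lr = np.linalg.cholesky(R + 1e-6 * np.eye(N))   # jitter for numerical stability

x = np.zeros(n)
for k in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)   # state update (1)
    z = gamma @ x + Lr @ rng.standard_normal(N)           # measurement (2)
```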

We will examine the performance of a linear filter of the form

$$\begin{cases}\hat{x}_k = A\hat{x}_{k-1} + K_k[z_k - \hat{z}_k], \\ \hat{z}_k(i) = \gamma(i)A\hat{x}_{k-1},\end{cases} \qquad (4)$$

under a mean square error (MSE) criterion $E[\|x_k-\hat{x}_k\|_2^2]$. Here $K_k$ is a linear mapping from the space of $\mathbb{R}^m$-valued fields on $\mathbb{R}^d$ to $\mathbb{R}^n$. We denote the innovation term by $s_k \triangleq z_k - \hat{z}_k$. Motivated by the Riesz representation theorem [41, p. 188], we assume the integral form

$$K_k s_k \triangleq \int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di, \qquad (5)$$

where $\kappa_k:\mathbb{R}^d\rightarrow\mathbb{R}^{n\times m}$ is an optimal gain function to be determined.
We now present the assumptions we need, but before we do so, we introduce the following useful terms:

$$\begin{aligned}
f(i) &\triangleq \mathcal{F}^{-1}\{\bar{\gamma}^{\top}\bar{R}^{-1}\}\in\mathbb{R}^{n\times m}, \\
S &\triangleq \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\in\mathbb{R}^{n\times n}. \qquad (6)
\end{aligned}$$

These terms are well-defined given the assumptions below, and we will also show in Section 5 that $S$ is symmetric and positive semi-definite; hence there exists a unique symmetric positive semi-definite matrix $G$ such that $S = GG$ [42, p. 439]. This matrix $G$ will be termed the principal square root of $S$. A summary of the key assumptions stated for our model is now presented.
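The quantities in (6) can be approximated numerically by carrying out the Fourier division on a grid. The sketch below ($d=1$, $m=1$, $n=2$; all kernels are illustrative assumptions) reuses a squared exponential noise covariance; the Gaussian columns of $\gamma$ are spectrally narrower than the noise kernel, in line with the bandwidth condition of Section 1.2, and a small floor on $\bar{R}$ guards against floating-point underflow at high frequencies.

```python
import numpy as np

# Minimal sketch: approximate f = F^{-1}{gamma_bar^T R_bar^{-1}} and
# S = int f(i) gamma(i) di from (6) on a 1-D grid.
N, half = 1024, 8.0
dx = 2 * half / N
grid = -half + dx * np.arange(N)                       # zero sits at index N // 2

def ft(g):   # Fourier transform in the convention of Definition 2.1
    return np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g))) * dx

def ift(G):  # corresponding inverse transform
    return np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(G))) / dx

gamma = np.column_stack([np.exp(-(grid - c) ** 2) for c in (-1.0, 1.0)])
R_bar = ft(np.exp(-grid ** 2 / (2 * 0.2 ** 2))).real   # noise spectral density
R_bar = np.maximum(R_bar, 1e-12)                       # underflow guard (cf. A3)

f = np.column_stack([ift(ft(gamma[:, a]) / R_bar).real for a in range(2)])
S = f.T @ gamma * dx                                   # Riemann sum for S in (6)

w, V = np.linalg.eigh(S)                               # S symmetric PSD
G = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # principal root: S = G G
```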

A1: $\gamma\in L_1(\mathbb{R}^d,\mathbb{R}^{m\times n})\cap L_{\infty}(\mathbb{R}^d,\mathbb{R}^{m\times n})$.

A2: $R\in L_1(\mathbb{R}^d,\mathbb{R}^{m\times m})\cap L_{\infty}(\mathbb{R}^d,\mathbb{R}^{m\times m})$.

A3: $\bar{R}(\omega)\in\mathbb{R}^{m\times m}$ is invertible for almost all $\omega$.

A4: $\bar{\gamma}^{\top}\bar{R}^{-1}\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$ and has an inverse Fourier transform in $L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})$.

A5: $\kappa_k\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$ for all $k$.

A6: The matrix pair $(A,Q)$ is stabilizable.

A7: The matrix pair $(A,G)$ is detectable.

Assumptions A1 and A2 will be used in Section 4.2 to ensure that the relevant Fourier transforms of $\gamma$ and $R$ are well-defined. Assumption A1 is also needed to extend the Fubini-Tonelli theorem to wide-sense stationary random fields, as shown in Appendix 9. Assumptions A3 and A4 will be needed in Section 4.2 to ensure the existence of an explicit form of the optimal gain function $\kappa$. Assumption A5 is imposed to ensure that the integral (5) is well-defined by Theorem 3.1. Whilst there are notions of observability in infinite-dimensional systems theory (cf. [22, 23]), in this article we do not require them. Instead, we shall assume that $(A,Q)$ is stabilizable (assumption A6) and that $(A,G)$ is detectable (assumption A7) in the sense of Definitions 2.2 and 2.3, where $G$ is a matrix-valued functional of the measurement function $\gamma$ and the measurement noise covariance $R$. This will ensure that the filter error covariance matrix converges as time grows. It should be noted that traditional detectability conditions do not typically depend on properties of the measurement noise [35, 22, 23].

We verify that the filter integral (5) is well-defined by adapting the classical Fubini-Tonelli theorem [43, Cor. 13.9] to w.s.s. random fields as follows:

Theorem 3.1 (Fubini-Tonelli for w.s.s. random fields)

Let $v:\mathbb{R}^d\times\Theta\rightarrow\mathbb{R}^m$ be a w.s.s. random field defined on the probability space $(\Theta,\mathcal{F},P)$, with $\theta\in\Theta$ representing a point in the sample space $\Theta$, and with bounded covariance function $\Sigma$. Let $s:\mathbb{R}^d\times\Theta\rightarrow\mathbb{R}^m$ be a random field given by $s \triangleq v + \gamma x$, where $\gamma\in L_{\infty}(\mathbb{R}^d,\mathbb{R}^{m\times n})$ and $x$ is a random vector with elements in $\mathcal{H}$ and covariance matrix $P$. Also let $\kappa:\mathbb{R}^d\rightarrow\mathbb{R}^{n\times m}$ be a matrix-valued function, absolutely integrable under the Frobenius matrix norm. Each component of the vector-valued function $\kappa(\cdot)s(\cdot,\cdot)$ is assumed to be measurable with respect to the product $\sigma$-algebra $\mathcal{L}\times\mathcal{F}$, where $\mathcal{L}$ is the $\sigma$-algebra corresponding to the $d$-dimensional Lebesgue measure. Then

$$\int_{\mathbb{R}^d}\int_{\Theta}\|\kappa(i)s(i,\theta)\|_1\,P(d\theta)\,di < \infty,$$

and furthermore

$$\int_{\Theta}\int_{\mathbb{R}^d}\kappa(i)s(i,\theta)\,di\,P(d\theta) = \int_{\mathbb{R}^d}\int_{\Theta}\kappa(i)s(i,\theta)\,P(d\theta)\,di.$$
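Although the proof is deferred to Appendix 9, the conclusion is easy to check numerically in a toy case. The sketch below is our own illustration, with $d=1$, $m=n=1$ and assumed kernels: it compares $E[\int\kappa(i)s(i)\,di]$, estimated by Monte Carlo, against $\int\kappa(i)E[s(i)]\,di$.

```python
import numpy as np

# Minimal sketch: Monte Carlo check of the interchange in Theorem 3.1.
# kappa is absolutely integrable, v is stationary with bounded squared
# exponential covariance, s = v + gamma * x with E[x] = 1 (all assumed).
rng = np.random.default_rng(1)
N, dx, trials = 200, 0.05, 4000
grid = dx * (np.arange(N) - N // 2)

kappa = np.exp(-np.abs(grid))                          # absolutely integrable
gamma = np.exp(-grid ** 2)
Sigma = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * 0.3 ** 2))
Lc = np.linalg.cholesky(Sigma + 1e-6 * np.eye(N))

lhs = 0.0
for _ in range(trials):                                # E[ int kappa(i) s(i) di ]
    x = 1.0 + rng.standard_normal()
    s = Lc @ rng.standard_normal(N) + gamma * x
    lhs += (kappa * s).sum() * dx
lhs /= trials

rhs = (kappa * gamma).sum() * dx                       # int kappa(i) E[s(i)] di
print(lhs, rhs)                                        # agree up to MC error
```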

The proof of Theorem 3.1 is given in Appendix 9.

4 Optimal Linear Filter Derivation

We wish to find the optimal state estimate $\hat{x}_k$, where $\hat{x}_k$ is some bounded linear functional of all measurements up to time $k$. Optimality in this work refers to the estimate which minimizes the mean square error $E[\|x_k-\hat{x}_k\|_2^2]$. This random $n$-vector minimization problem is equivalent to $n$ random scalar minimization problems, as minimization of each component norm will minimize the entire vector norm. Each component of $\hat{x}_k$ is then the orthogonal projection of the corresponding component of $x_k$ onto the space of all scalar-valued linear functionals of the preceding measurements. Let $x_k^l\in\mathcal{H}$ denote the $l^{th}$ component of the state vector $x_k$. Also let our observation at time $k$ be denoted by the random $m$-vector valued function $z_k:\mathbb{R}^d\rightarrow\mathbb{R}^m$, assumed measurable with respect to the product measure on $\mathbb{R}^d\times\Theta$, and let the set of all possible measurements at time $k$ be $\mathcal{Z}_k$. Section 4.1 states the optimality conditions of the filter, and Section 4.2 derives an explicit solution that satisfies these conditions. A brief reference for the spaces and variables used in the following derivation is given in Tables 2 and 3 respectively.

Table 2: Spaces used in Section 4 derivations.
Symbol | Description
$\mathcal{H}$ | Hilbert space of random variables with zero mean and finite second moment.
$\mathcal{Z}_k$ | Space of all possible measurements $z_k(\cdot)$.
$M_{k-1}$ | Space of all possible elements of $\mathcal{H}$ expressible via (8).
$M_k^-$ | Space of all possible elements of $\mathcal{H}$ expressible via (12).
$\mathcal{S}_k$ | Space of all possible elements of $\mathcal{H}$ expressible via (10).

Table 3: Variables used in Section 4 derivations.
Symbol | Description
$x$ | System state
$z$ | Measurement
$\hat{x}$ | Optimal estimate of $x$ given past measurements
$\hat{z}$ | Optimal estimate of $z$ given past measurements
$s$ | Innovation vector $z-\hat{z}$
$\phi,\alpha$ | Arbitrary vectors in $L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m})$
$m$ | Arbitrary element expressible via (8)
$\tilde{s}$ | Arbitrary element expressible via (10)
$\beta$ | Optimal post-measurement correction term
$\kappa$ | Optimal kernel which generates $\beta$
$(\cdot)_k$ | The object $(\cdot)$ at time step $k$

4.1 Optimality Conditions

In this section, we will prove the following key theorem:

Theorem 4.1

The matrix kernel $\kappa_k(i)\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$ for (4) is optimal in the mean square error sense if and only if

$$\int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)P_{k|k-1}\gamma^{\top}(i^{\prime}) + R(i-i^{\prime})\Big)\,di = P_{k|k-1}\gamma^{\top}(i^{\prime}), \qquad (7)$$

where $P_k$ and $P_{k|k-1}$ denote the posterior and prior covariance matrices of the filter respectively.

Before we provide a proof of Theorem 4.1, we must first define some important sets, and projections onto those sets.
Consider the set of all possible elements of $\mathcal{H}$ which can be expressed as a bounded linear integral functional operating over all previous measurements:

$$M_{k-1} \triangleq \bigg\{m_{k-1}\in\mathcal{H} : m_{k-1} = \sum_{j=1}^{k-1}\int_{\mathbb{R}^d}\phi_j(i)z_j(i)\,di, \text{ for some } \phi_j\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m})\bigg\}. \qquad (8)$$

Note that $M_{k-1}$ is a subspace of $\mathcal{H}$. We will also assume for the moment that $M_{k-1}$ is a closed subspace of $\mathcal{H}$. Suppose that we are already in possession of the optimal estimate of $x_k^l$ given the previous measurements. This estimate is the orthogonal projection of $x_k^l$ onto $M_{k-1}$,

$$\hat{x}_{k|k-1}^l \triangleq \operatorname{proj}_{M_{k-1}} x_k^l.$$

Given the measurements associated with $M_{k-1}$, $\hat{x}_{k|k-1}^l$ is the best estimate of $x_k^l$ in the sense that it is the element of $M_{k-1}$ that is the unique $\mathcal{H}$-norm minimizing vector in relation to the $l^{th}$ component of the true state.

Let $z_k^p$ be the $p^{th}$ element of $z_k$ and $\hat{z}_k^p$ denote the orthogonal projection of $z_k^p$ onto $M_{k-1}$,

$$\hat{z}_k^p(i) \triangleq \operatorname{proj}_{M_{k-1}} z_k^p(i) \quad \forall i.$$

The measurement innovation vector is $s_k(i) \triangleq z_k(i) - \hat{z}_k(i)$, the $p^{th}$ entry of which is denoted $s_k^p(i)\in\mathbb{R}$. Note that for all $m_{k-1}\in M_{k-1}$,

$$s_k^p(i) \perp m_{k-1} \quad \forall i, \qquad (9)$$

and let

$$\mathcal{S}_k \triangleq \bigg\{\tilde{s}_k\in\mathcal{H} : \tilde{s}_k = \sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)s_k(i)\,di, \text{ for some } \alpha_k^p\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m})\bigg\}, \qquad (10)$$

which is a subspace of $\mathcal{H}$. Here $\alpha_k\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$, with the $p^{th}$ row given by $\alpha_k^p$.

Lemma 4.2 ($\mathcal{S}_k \perp M_{k-1}$)

Given the previous definitions of $\mathcal{S}_k$ and $M_{k-1}$, every element of $\mathcal{S}_k$ is orthogonal to every element of $M_{k-1}$.

Proof 4.3.

Since $\alpha_k^p\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})$ and $s_k^p$ is the sum of a random field with bounded covariance function and an absolutely integrable function, we may apply Theorem 3.1 and exchange the integration and expectation operators:

$$\begin{aligned}
\Big\langle\tilde{s}_k, m_{k-1}\Big\rangle_{\mathcal{H}} &= \Big\langle\sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)s_k(i)\,di,\ m_{k-1}\Big\rangle_{\mathcal{H}} \\
&= E\Big[\Big(\sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)s_k(i)\,di\Big)m_{k-1}\Big] \\
&= \sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)E[s_k(i)\,m_{k-1}]\,di \\
&= 0, \qquad (11)
\end{aligned}$$

the final equality following from the orthogonality property stated in (9).

Having shown that the spaces $\mathcal{S}_k$ and $M_{k-1}$ are orthogonal, we will now find the projection of the true state $x_k$ onto the space of all possible elements in $\mathcal{H}$ that can be expressed by a bounded linear integral functional operating over all previous measurements and the current measurement. Let $M_k^-$ be the set of all possible outputs of linear functionals of any element in $\mathcal{Z}_k$,

$$M_k^- \triangleq \bigg\{m_k^-\in\mathcal{H} : m_k^- = \int_{\mathbb{R}^d}\phi(i)z_k(i)\,di, \text{ for some } \phi\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m}),\ z_k\in\mathcal{Z}_k\bigg\}. \qquad (12)$$

Note that using this definition, it immediately follows that $M_{k-1} + M_k^- = M_k$, where the addition of sets is given by the Minkowski sum

$$U + V = \{u + v \,|\, u\in U,\ v\in V\}.$$

We also use the $\oplus$ operator to refer to a direct sum of subspaces, where any element within the direct sum is uniquely represented by a sum of elements within the relevant subspaces [44]. We wish to find the projection of $x_k^l$ onto the space $M_{k-1} + M_k^-$, which is given by

$$\begin{aligned}
\hat{x}_k^l &= \operatorname{proj}_{M_{k-1}+M_k^-} x_k^l \\
&= \operatorname{proj}_{M_{k-1}\oplus\mathcal{S}_k} x_k^l \\
&= \operatorname{proj}_{M_{k-1}} x_k^l + \operatorname{proj}_{\mathcal{S}_k} x_k^l \\
&= \hat{x}_{k|k-1}^l + \beta_k^l, \qquad \beta_k^l \triangleq \operatorname{proj}_{\mathcal{S}_k} x_k^l. \qquad (13)
\end{aligned}$$

This sum represents a decomposition of the projected state component given measurements up to time $k$ as a unique sum of a component within $M_{k-1}$ and a component orthogonal to $M_{k-1}$. Note that as $\beta_k^l\in\mathcal{S}_k$, it can be formulated as $\sum_{p=1}^m\int_{\mathbb{R}^d}\kappa_k^{l,p}(i)s_k^p(i)\,di$, each $\kappa_k^{l,p}(i)$ being the $l^{th}$ row, $p^{th}$ column element of some $\kappa_k(i)\in\mathbb{R}^{n\times m}$ which must be determined. This matrix-valued function $\kappa_k(i)$ is equivalent to our optimal gain. We are now ready to prove Theorem 4.1, which we will do by formulating necessary and sufficient conditions on the gain such that the optimal post-measurement correction term $\beta_k = \int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di$ is generated. The proof of Theorem 4.1 is as follows:

Proof 4.4.

We will denote the $l^{th}$ column of $\kappa_k^{\top}(i)$ as $\kappa_k^l(i)\in\mathbb{R}^m$. By the Hilbert projection theorem, $(x_k^l-\beta_k^l)\perp\mathcal{S}_k \implies \langle x_k^l-\beta_k^l, \tilde{s}_k\rangle_{\mathcal{H}} = 0$ for all $\tilde{s}_k\in\mathcal{S}_k$, where $\beta_k^l$ is guaranteed to be unique. We then have that $\langle x_k^l, \tilde{s}_k\rangle_{\mathcal{H}} = \langle\beta_k^l, \tilde{s}_k\rangle_{\mathcal{H}}$ which, in conjunction with the fact that $\langle\hat{x}_{k|k-1}^l, \tilde{s}_k\rangle_{\mathcal{H}} = 0$, implies

$$\langle x_k^l - \hat{x}_{k|k-1}^l, \tilde{s}_k\rangle_{\mathcal{H}} = \langle\beta_k^l, \tilde{s}_k\rangle_{\mathcal{H}}. \qquad (14)$$

This is a necessary and sufficient condition for $\beta_k^l$ to be the projection of our state component onto the space $\mathcal{S}_k$. We now must find the appropriate optimal gain function $\kappa_k(i)$ such that $\beta_k$, the corresponding vector generated, satisfies condition (14) for all $l$. Expanding out the definitions of $\beta_k^l$ and $\tilde{s}_k$, interchanging integration and summation, and rearranging, leads to

$$\bigg\langle x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}\sum_{p=1}^m\kappa_k^{l,p}(i)s_k^p(i)\,di,\ \int_{\mathbb{R}^d}\sum_{p=1}^m\alpha_k^p(i)s_k^p(i)\,di\bigg\rangle_{\mathcal{H}} = 0.$$

Note that $\sum_{p=1}^m\kappa_k^{l,p}(i)s_k^p(i)$ is simply the vector multiplication $s_k^{\top}(i)\kappa_k^l(i)$. The same reasoning may be applied to show $\sum_{p=1}^m\alpha_k^p(i)s_k^p(i) = s_k^{\top}(i)\alpha_k(i)$. We then have for all $l$

$$\begin{aligned}
0 &= \bigg\langle x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}\sum_{p=1}^m\kappa_k^{l,p}(i)s_k^p(i)\,di,\ \int_{\mathbb{R}^d}\sum_{p=1}^m\alpha_k^p(i^{\prime})s_k^p(i^{\prime})\,di^{\prime}\bigg\rangle_{\mathcal{H}} \\
&= \bigg\langle x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di,\ \int_{\mathbb{R}^d}s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,di^{\prime}\bigg\rangle_{\mathcal{H}} \\
&= E\bigg[\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)\int_{\mathbb{R}^d}s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,di^{\prime}\bigg] \\
&= \int_{\Theta}\int_{\mathbb{R}^d}\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,di^{\prime}\,dP(\theta).
\end{aligned}$$

Note that as $s_k$ is a random field with bounded covariance function, and $\alpha_k$ is composed of absolutely integrable components, we may employ Theorem 3.1 and interchange the expectation and integration operators. Therefore, for all $l$,

$$\begin{aligned}
0 &= \int_{\mathbb{R}^d}\int_{\Theta}\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,dP(\theta)\,di^{\prime} \\
&= \int_{\mathbb{R}^d}E\bigg[\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\bigg]\alpha_k(i^{\prime})\,di^{\prime}.
\end{aligned}$$

Because $\alpha_k$ is arbitrary, we must have that for all $l$ and almost all $i^{\prime}$

$$E\bigg[\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\bigg] = 0. \qquad (15)$$

As the first argument is a scalar and the second a vector, the condition that this holds true for all $l$ can be conveyed by a single matrix equality. Note that $s_k^{\top}(i)\kappa_k^l(i)$ is equivalent to the $l^{th}$ row of $\kappa_k(i)s_k(i)$. By stacking the $l$ row vectors in (15) to form an $\mathbb{R}^{n\times m}$ matrix we may impose the matrix equality that for almost all $i^{\prime}$

$$E\bigg[\bigg(x_k - \hat{x}_{k|k-1} - \int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di\bigg)s_k^{\top}(i^{\prime})\bigg] = 0. \qquad (16)$$

Substituting our expression for $s_k(i)$ into (16), we find

$$\begin{aligned}
&E\bigg[\bigg(x_k - \hat{x}_{k|k-1} - \int_{\mathbb{R}^d}\kappa_k(i)\big[\gamma(i)(x_k-\hat{x}_{k|k-1}) + v_k(i)\big]\,di\bigg)\big[\gamma(i^{\prime})(x_k-\hat{x}_{k|k-1}) + v_k(i^{\prime})\big]^{\top}\bigg] \\
&= E\big[(x_k-\hat{x}_{k|k-1})(x_k-\hat{x}_{k|k-1})^{\top}\big]\gamma^{\top}(i^{\prime}) \\
&\quad - \int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)E\big[(x_k-\hat{x}_{k|k-1})(x_k-\hat{x}_{k|k-1})^{\top}\big]\gamma^{\top}(i^{\prime}) + E\big[v_k(i)v_k^{\top}(i^{\prime})\big]\Big)\,di = 0.
\end{aligned}$$

Let the posterior covariance matrix $E[(x_k-\hat{x}_k)(x_k-\hat{x}_k)^{\top}]$ be denoted by $P_k$, and let the prior covariance matrix $E[(x_k-\hat{x}_{k|k-1})(x_k-\hat{x}_{k|k-1})^{\top}]$ be denoted by $P_{k|k-1}$. Substituting these terms for the corresponding expectations, we find that the optimal gain function $\kappa_k(i)$ must satisfy the condition

$$P_{k|k-1}\gamma^{\top}(i^{\prime}) - \int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)P_{k|k-1}\gamma^{\top}(i^{\prime}) + R(i-i^{\prime})\Big)\,di = 0.$$

Rearranging these terms, we arrive at the necessary and sufficient condition

$$\int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)P_{k|k-1}\gamma^{\top}(i^{\prime}) + R(i-i^{\prime})\Big)\,di = P_{k|k-1}\gamma^{\top}(i^{\prime}). \qquad (17)$$

An alternative derivation based on a calculus of variations approach is given in [21]. The derivation above, which uses the Hilbert projection theorem, has the advantage of guaranteed uniqueness of the correction term $\int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di$ and state estimate $\hat{x}_k$, which follows immediately from Theorem 2.1. This approach avoids the use of the functional derivative, which has attracted some criticism [8], [5], and closely mirrors the methodology of the original Kalman proof [45].

Recall that we initially assumed $M_{k-1}$ to be a closed subspace. We will now show that an explicit closed-form solution for the optimal gain function can be derived via $d$-dimensional Fourier methods, thanks to the w.s.s. assumption on the measurement noise. We will then show that when a unique gain function satisfying (17) exists, $M_{k-1}$ is indeed closed; this is established in Corollary 4.5.

4.2 An Explicit Solution for the Optimal Gain Function

We may now proceed to find an explicit expression for the optimal gain function that satisfies (7). Consider the matrix-valued functional

$$F(\kappa_k) \triangleq \bigg(I - \int_{\mathbb{R}^d}\kappa_k(i)\gamma(i)\,di\bigg)P_{k|k-1} \in\mathbb{R}^{n\times n} \qquad (18)$$

and note that

$$\int_{\mathbb{R}^d}\kappa_k(i)R(i^{\prime}-i)\,di = F(\kappa_k)\gamma^{\top}(i^{\prime}). \qquad (19)$$

The left-hand side of (19) is simply a multi-dimensional convolution. Many useful properties of the scalar Fourier transform carry over to the multi-dimensional Fourier transform. We will make use of the convolution property [46],

$$\mathcal{F}\bigg\{\int_{\mathbb{R}^d}f(i_1-y_1,\dots,i_d-y_d)g(y_1,\dots,y_d)\,dy_1\dots dy_d\bigg\} = \mathcal{F}\{(f\ast g)(i_1,i_2,\dots,i_d)\} = (\mathcal{F}\{f\})(\omega)(\mathcal{F}\{g\})(\omega). \qquad (20)$$

This enables us to express (19) equivalently as

$$(\kappa_k\ast R)(i^{\prime}) = F(\kappa_k)\gamma^{\top}(i^{\prime}). \qquad (21)$$

Note that $\omega$ will be a vector of the same dimension as $i$. We now require three assumptions (A1-A3) to ensure the existence and invertibility of the Fourier transform of $R$, as well as the existence of the Fourier transform of $\gamma^{\top}$. Applying (20) to (21) implies

$$\begin{aligned}
\bar{\kappa}_k(\omega)\bar{R}(\omega) &= F(\kappa_k)\bar{\gamma}^{\top}(\omega) \\
\bar{\kappa}_k(\omega) &= F(\kappa_k)\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1} \\
\kappa_k(i) &= F(\kappa_k)f(i), \qquad f(i) = \mathcal{F}^{-1}\{\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1}\}. \qquad (22)
\end{aligned}$$

For this method to be applicable we must also impose assumption A4. Combining (18) and (22) yields

$$\begin{aligned}
F(\kappa_k) &= \bigg(I - F(\kappa_k)\int_{\mathbb{R}^d}f(i)\gamma(i)\,di\bigg)P_{k|k-1} \\
\implies F(\kappa_k) &= P_{k|k-1}\bigg(I + \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\,P_{k|k-1}\bigg)^{-1} \qquad (23) \\
\implies \kappa_k(i) &= P_{k|k-1}\bigg(I + \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\,P_{k|k-1}\bigg)^{-1}f(i).
\end{aligned}$$

Note that this directly implies that for all $k\in\mathbb{N}$, $\kappa_k\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$. For simplicity of notation, recall from (6) that $S = \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\in\mathbb{R}^{n\times n}$. Having thus found the optimal $\kappa_k(i)$ function which generates $\beta_k$, we now modify (13) to find, for any state component $l$,

$$\operatorname{proj}_{M_{k-1}+M_k^-} x_k^l = \hat{x}_{k|k-1}^l + \sum_{p=1}^m\int_{\mathbb{R}^d}\big[P_{k|k-1}(I+SP_{k|k-1})^{-1}f(i)\big]^{l,p}s_k^p(i)\,di.$$

These $l$ equations may be stacked into a single matrix equation of the form

$$\operatorname{proj}_{M_{k-1}+M_k^-} x_k = \hat{x}_{k|k-1} + \int_{\mathbb{R}^d}P_{k|k-1}(I+SP_{k|k-1})^{-1}f(i)s_k(i)\,di, \qquad (24)$$

where the projection operator acting on the state vector is understood to represent the vector of componentwise projections. We have thus far assumed that $\hat{x}_{k|k-1}$ is known to us; we now complete the chain by determining this value explicitly as

$$\begin{aligned}
\hat{x}_{k|k-1} &= \operatorname{proj}_{M_{k-1}} x_k \\
&= \operatorname{proj}_{M_{k-1}} Ax_{k-1} + \operatorname{proj}_{M_{k-1}} w_{k-1} \\
&= A\operatorname{proj}_{M_{k-1}} x_{k-1} \\
&= A\operatorname{proj}_{M_{k-2}+M_{k-1}^-} x_{k-1} \\
&= A\hat{x}_{k-1}. \qquad (25)
\end{aligned}$$

The form of (24) with this substitution is then

$$\begin{aligned}
\hat{x}_k &= \operatorname{proj}_{M_{k-1}+M_k^-} x_k \\
&= A\hat{x}_{k-1} + P_{k|k-1}(I+SP_{k|k-1})^{-1}\int_{\mathbb{R}^d}f(i)s_k(i)\,di \\
&= A\hat{x}_{k-1} + P_{k|k-1}(I+SP_{k|k-1})^{-1}\int_{\mathbb{R}^d}f(i)\big(z_k(i)-\gamma(i)\hat{x}_{k|k-1}\big)\,di,
\end{aligned}$$

where in the last line we substitute $s_k(i) = z_k(i)-\hat{z}_k(i)$, noting that

$$\begin{aligned}
\hat{z}_k(i) &= \operatorname{proj}_{M_{k-1}} z_k(i) \\
&= \operatorname{proj}_{M_{k-1}}\big(\gamma(i)x_k + v_k(i)\big) \\
&= \gamma(i)\operatorname{proj}_{M_{k-1}} x_k \\
&= \gamma(i)\hat{x}_{k|k-1}.
\end{aligned}$$

Finally, we will show that the space $M_{k-1}$ is indeed closed, as assumed in Section 4.1. Hence it admits a unique projection by the Hilbert projection theorem (Theorem 2.1), implying the uniqueness of the optimal filter derived above.

Corollary 4.5.

Given the system defined in Section 3 and the conditions given in Section 4.1, the space $M_{k-1}$ is a closed subspace of $\mathcal{H}$.

Proof 4.6.

The optimal gain function given by (22) represents a kernel defining an operator $K_k:\mathcal{H}\rightarrow\mathcal{S}_k$ as presented in (5). This operator generates a unique $\beta_k = K_k x_k$ such that for each element $l$, $(x_k^l-\beta_k^l)\perp\mathcal{S}_k$. This unique $\beta_k$ is in fact a minimizer, as for all $\tilde{s}_k\in\mathcal{S}_k$

$$\begin{aligned}
\|\tilde{s}_k - x_k^l\|_{\mathcal{H}}^2 &= \|\beta_k^l - x_k^l + \tilde{s}_k - \beta_k^l\|_{\mathcal{H}}^2 \\
&= \|\beta_k^l - x_k^l\|_{\mathcal{H}}^2 + \|\tilde{s}_k - \beta_k^l\|_{\mathcal{H}}^2 + 2\langle\beta_k^l - x_k^l,\ \tilde{s}_k - \beta_k^l\rangle_{\mathcal{H}} \\
&= \|\beta_k^l - x_k^l\|_{\mathcal{H}}^2 + \|\tilde{s}_k - \beta_k^l\|_{\mathcal{H}}^2 \\
\implies \|x_k^l - \tilde{s}_k\|_{\mathcal{H}}^2 &\geq \|x_k^l - \beta_k^l\|_{\mathcal{H}}^2,
\end{aligned}$$

with equality iff $\tilde{s}_k = \beta_k^l$.

Let us now examine an arbitrary Cauchy sequence $\{s_1,s_2,\dots,s_n\}$ with each element in $\mathcal{S}_k$. We also denote $\lim_{n\rightarrow\infty}s_n = \bar{s}$. As $\mathcal{S}_k$ is a subspace of the closed space $\mathcal{H}$, it is guaranteed that $\bar{s}\in\mathcal{H}$ exists. For any $\bar{s}\in\mathcal{H}$ there exists a unique minimizer $\beta^l\in\mathcal{S}_k$ such that

$$\begin{aligned}
\|\beta^l - \bar{s}\|_{\mathcal{H}}^2 &\leq \|s - \bar{s}\|_{\mathcal{H}}^2 \quad \forall s\in\mathcal{S}_k \\
\implies \|\beta^l - \bar{s}\|_{\mathcal{H}}^2 &\leq \|s_n - \bar{s}\|_{\mathcal{H}}^2 \quad \forall n.
\end{aligned}$$

As the upper bound goes to zero as $n$ increases, and the inequality holds for all $n$, $\bar{s}$ must equal $\beta^l$ and hence also reside in the space $\mathcal{S}_k$. As the Cauchy sequence was arbitrary, $\mathcal{S}_k$ contains all its limit points and is hence closed. It is well known that if two closed linear subspaces of a Hilbert space are orthogonal, then the direct sum of these subspaces is also closed [47, p. 38]. This result, combined with the fact that $M_k = M_0\oplus\mathcal{S}_1\oplus\mathcal{S}_2\oplus\dots\oplus\mathcal{S}_k$, implies that if $M_0$ is a closed space, then $M_k$ is a closed space for any $k$.

4.3 The Optimal Filter

We now have sufficient knowledge to state a key theorem.

Theorem 4.7.

For the system defined in Section 3, at each time step $k$, the optimal filter gain $\kappa_k(i)$ and associated covariance matrices are given by the equations

$$\begin{aligned}
\kappa_k(i) &= P_k f(i), \\
P_{k|k-1} &= AP_{k-1}A^{\top} + Q, \\
P_k &= P_{k|k-1}\big(I + SP_{k|k-1}\big)^{-1},
\end{aligned}$$

where $f(i) = \mathcal{F}^{-1}\{\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1}\}$ and $S = \int_{\mathbb{R}^d}f(i)\gamma(i)\,di$.

Proof 4.8.

The optimal gain function κk\kappa_{k} has been derived in Section 4.2 and is given in (22). It remains to derive the covariance matrix update equations. First, we connect the prior covariance matrix to the covariance matrix of the previous time step, via

Pk|k1\displaystyle P_{k|k-1} =E[(xkx^k|k1)(xkx^k|k1)]\displaystyle=E[(x_{k}-\hat{x}_{k|k-1})(x_{k}-\hat{x}_{k|k-1})^{\top}]
=AE[(xk1x^k1)(xk1x^k1)]A+Q\displaystyle=AE[(x_{k-1}-\hat{x}_{k-1})(x_{k-1}-\hat{x}_{k-1})^{\top}]A^{\top}+Q
=APk1A+Q.\displaystyle=AP_{k-1}A^{\top}+Q. (26)

Then we determine the relation between PkP_{k} and Pk|k1P_{k|k-1} by first observing that expansion and substitution yields the equation

Pk=E[ekek]\displaystyle P_{k}=E[e_{k}e_{k}^{\top}]\mkern-14.0mu
=Pk|k1\displaystyle=P_{k|k-1} dκk(i)γ(i)𝑑iPk|k1Pk|k1dγ(i)κk(i)𝑑i\displaystyle-\int_{\mathbb{R}^{d}}\mkern-7.0mu\kappa_{k}(i)\gamma(i)diP_{k|k-1}-P_{k|k-1}\int_{\mathbb{R}^{d}}\mkern-7.0mu\gamma^{\top}(i)\kappa_{k}^{\top}(i)di
+dκk(i)γ(i)𝑑iPk|k1dγ(i)κk(i)𝑑i\displaystyle+\int_{\mathbb{R}^{d}}\kappa_{k}(i)\gamma(i)diP_{k|k-1}\int_{\mathbb{R}^{d}}\gamma^{\top}(i)\kappa_{k}^{\top}(i)di
+ddκk(i)R(ii)κk(i)𝑑i𝑑i.\displaystyle+\int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}\kappa_{k}(i)R(i-i^{\prime})\kappa_{k}(i^{\prime})^{\top}\,di\,di^{\prime}.

By substitution of (7), all but the first two terms cancel to give us

$$P_{k}=\left(I-\int_{\mathbb{R}^{d}}\kappa_{k}(i)\gamma(i)\,di\right)P_{k|k-1}. \qquad (27)$$

But this is precisely the equation for $F(\kappa_{k})$ defined in (18), so from (23) we simply have

$$P_{k}=F(\kappa_{k})=P_{k|k-1}\left(I+\int_{\mathbb{R}^{d}}f(i)\gamma(i)\,di\,P_{k|k-1}\right)^{-1}=P_{k|k-1}\left(I+SP_{k|k-1}\right)^{-1}, \qquad (28)$$

which is the stated covariance update, completing the proof.

5 Optimal Linear Filter Stability

We will now show that the derived filter provided by Theorem 4.7 is mean square stable under assumptions A1-A7. To do so we will establish via Lemma 5.1 that the associated covariance matrices obey a discrete-time algebraic Riccati equation. We will then, in Theorem 5.3, show that, so long as the initial covariance matrix is non-negative and symmetric, there exists a unique asymptotic a priori covariance matrix that satisfies this Riccati equation.

Lemma 5.1.

Consider the filter equations provided by Theorem 4.7 for the system defined in Section 3. The matrix $S$ is positive semi-definite, and it is finite under assumptions A1 and A4. The matrix $S$ admits a unique square root decomposition $S=GG$ for some symmetric positive semi-definite $G\in\mathbb{R}^{n\times n}$. Furthermore, for $k\geq 1$, the a priori covariance matrices $P_{k+1|k}$ satisfy the discrete-time algebraic Riccati equation

$$P_{k+1|k}= AP_{k|k-1}A^{\top} - AP_{k|k-1}G\left(I+G^{\top}P_{k|k-1}G\right)^{-1}G^{\top}P_{k|k-1}A^{\top}+Q. \qquad (29)$$
Proof 5.2.

First we find that $\bar{R}(\omega)$ is Hermitian, as

$$\begin{aligned}
\bar{R}(\omega)^{*} &= \int_{\mathbb{R}^{d}}e^{2\pi j\omega\cdot i}R(i)^{\top}\,di \\
&= \int_{\mathbb{R}^{d}}e^{2\pi j\omega\cdot i}R(-i)\,di \\
&= \int_{\mathbb{R}^{d}}e^{-2\pi j\omega\cdot i}R(i)\,di = \bar{R}(\omega),
\end{aligned}$$

where $(\cdot)^{*}$ denotes the complex conjugate transpose. $\bar{R}(\omega)$ is also positive semi-definite, as for almost all $\omega$ and all $u\in\mathbb{C}^{m}$,

$$\begin{aligned}
u^{*}\bar{R}(\omega)u &= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}E[u^{*}v(\ell+i)v(\ell)^{\top}u]\,di, \quad \forall\ell\in\mathbb{R}^{d} \\
&= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}E\left[\left(v(\ell)^{\top}u\right)\left(v(\ell+i)^{\top}u\right)^{*}\right]di.
\end{aligned}$$

Define the stationary complex-valued random field $\psi(\ell)\triangleq v(\ell)^{\top}u\in\mathbb{C}$. Denote the corresponding auto-covariance function and power spectral density as $K_{\psi}(i)$ and $S_{\psi}(\omega)$ respectively. It then follows that

$$\begin{aligned}
u^{*}\bar{R}(\omega)u &= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}E\left[\psi(\ell)\psi(\ell+i)^{*}\right]di \\
&= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}K_{\psi}(i)\,di = S_{\psi}(\omega),
\end{aligned}$$

where the last equality is given by the Wiener-Khintchine-Einstein relation [48, p. 82]. As $S_{\psi}(\omega)$ is non-negative, $\bar{R}(\omega)$ is positive semi-definite for almost all $\omega$. As $\bar{R}(\omega)$ is invertible by assumption A3, $\bar{R}(\omega)$ is positive-definite for almost all $\omega$. It follows directly that $\bar{R}(\omega)^{-1}$ is Hermitian and positive-definite for almost all $\omega$.

It can now be shown that $S$ is symmetric and positive semi-definite, as

$$\begin{aligned}
S &= \int_{\mathbb{R}^{d}}f(i)\gamma(i)\,di \\
&= \int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}e^{2\pi ji\cdot\omega}\,\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\gamma(i)\,d\omega\,di \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\int_{\mathbb{R}^{d}}e^{2\pi ji\cdot\omega}\gamma(i)\,di\,d\omega \qquad (30) \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\mathcal{F}^{-1}\{\gamma(i)\}\,d\omega \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\bar{\gamma}(-\omega)\,d\omega \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(-\omega)^{\top}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)\,d\omega.
\end{aligned}$$

Due to the fact that $\bar{R}(\cdot)^{-1}$ is Hermitian and $\gamma$ is real-valued, we then have

$$S = \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{*}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)\,d\omega.$$

The interchange of integral operators in (30) must be justified. This is permitted by the Fubini-Tonelli theorem [43, Cor. 13.9], as each element of $\bar{\gamma}(\omega)^{*}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)$ is absolutely integrable. This can be seen by noting that every element of $\bar{\gamma}(\omega)^{*}$ is bounded, and that every element of $\bar{R}^{-1}(\omega)\bar{\gamma}(\omega)=\mathcal{F}\{f(i)^{\top}\}$ is absolutely integrable, ensuring that every element of $\bar{\gamma}(\omega)^{*}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)$ is a finite sum of products of absolutely integrable functions and bounded functions, and hence is itself absolutely integrable.

This ensures $S$ is symmetric. $\bar{R}(\omega)^{-1}$ is a Hermitian positive-definite matrix and so admits a unique Cholesky decomposition $\bar{R}(\omega)^{-1}=D^{*}(\omega)D(\omega)$ with $\mathrm{rank}(D(\omega))=m$ for almost all $\omega\in\mathbb{R}^{d}$. We can then state that

$$\begin{aligned}
S &= \int_{\mathbb{R}^{d}}\left(D(\omega)\bar{\gamma}(\omega)\right)^{*}D(\omega)\bar{\gamma}(\omega)\,d\omega \\
\implies x^{\top}Sx &= \|D(\cdot)\bar{\gamma}(\cdot)x\|^{2}_{L_{2}}\geq 0, \quad \forall x\in\mathbb{R}^{n},
\end{aligned}$$

hence $S$ is positive semi-definite. As $S$ is symmetric positive semi-definite, it also admits a unique decomposition $S=GG$ for some symmetric positive semi-definite $G=S^{\frac{1}{2}}\in\mathbb{R}^{n\times n}$ [42, p. 440], proving the first lemma assertion.

Now, Procedure 1 asserts that

$$\begin{aligned}
P_{k+1|k} &= AP_{k}A^{\top}+Q, \\
P_{k} &= P_{k|k-1}\left(I+SP_{k|k-1}\right)^{-1}. \qquad (31)
\end{aligned}$$

Substituting the form $S=GG=GG^{\top}$ into (31), we have that

$$\begin{aligned}
P_{k+1|k} &= AP_{k|k-1}\left(I+SP_{k|k-1}\right)^{-1}A^{\top}+Q \\
&= AP_{k|k-1}\left(I+GG^{\top}P_{k|k-1}\right)^{-1}A^{\top}+Q \\
&= AP_{k|k-1}\left(I-G\left(I+G^{\top}P_{k|k-1}G\right)^{-1}G^{\top}P_{k|k-1}\right)A^{\top}+Q,
\end{aligned}$$

which follows from the Woodbury matrix identity. A final expansion allows us to arrive at the canonical form of the discrete-time Riccati equation (29), completing the proof.
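As a quick numerical check of this equivalence, the following Python sketch compares one step of the covariance recursion of Theorem 4.7 against the Riccati form (29). The matrices here are illustrative placeholders, not quantities computed in the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

# Illustrative placeholder matrices (not taken from the paper).
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
M = rng.standard_normal((2, 2))
S = M @ M.T                        # an arbitrary symmetric PSD matrix
G = np.real(sqrtm(S))              # unique symmetric PSD root, S = G @ G
P = Q.copy()                       # some prior covariance P_{k|k-1}
I = np.eye(2)

# One step via Theorem 4.7: P_k = P (I + S P)^{-1}, then predict.
P_next_thm = A @ P @ np.linalg.inv(I + S @ P) @ A.T + Q

# One step via the Riccati form (29).
mid = np.linalg.inv(I + G.T @ P @ G)
P_next_ric = A @ P @ A.T - A @ P @ G @ mid @ G.T @ P @ A.T + Q

assert np.allclose(P_next_thm, P_next_ric)   # the two forms agree
```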

Lemma 5.1 is important because it enables us to employ the standard notions of stabilizability and detectability from finite-dimensional linear systems theory to establish the following result characterizing the convergence of our optimal linear filter. Note that Theorem 5.3 requires assumptions A6 and A7 to hold.

Theorem 5.3.

Consider the filter equations given by Theorem 4.7, and let assumptions A6 and A7 hold. If $Q>0$, then the asymptotic a priori covariance $P^{-}_{\infty}$ is the unique stabilizing solution of the discrete-time algebraic Riccati equation

$$P_{\infty}^{-} = AP_{\infty}^{-}A^{\top} - AP_{\infty}^{-}G\left(I+G^{\top}P_{\infty}^{-}G\right)^{-1}G^{\top}P_{\infty}^{-}A^{\top}+Q.$$
Proof 5.4.

Given Lemma 5.1, [35, p. 77] establishes that discrete-time algebraic Riccati equations of the form (29) converge to a unique limit if $Q>0$, $(A,G)$ is detectable, and $(A,Q)$ is stabilizable. In other words, if these conditions hold, then for any non-negative symmetric initial covariance $P_{0}$, there exists a limit

$$\lim_{k\rightarrow\infty}P_{k|k-1}=P_{\infty}^{-},$$

and this limit satisfies the steady-state equation

$$P_{\infty}^{-} = AP_{\infty}^{-}A^{\top}+Q - AP_{\infty}^{-}G\left(I+G^{\top}P_{\infty}^{-}G\right)^{-1}G^{\top}P_{\infty}^{-}A^{\top},$$

completing the proof.

Given Theorem 5.3, the asymptotic a posteriori covariance matrix $P_{\infty}$ can easily be calculated via the covariance update equation of Theorem 4.7, namely, $P_{\infty}=P_{\infty}^{-}(I+SP_{\infty}^{-})^{-1}$. Surprisingly, Theorem 5.3 requires only existing concepts from finite-dimensional linear systems theory, and no concepts from infinite-dimensional systems theory.
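In practice, $P_{\infty}^{-}$ can be computed with an off-the-shelf Riccati solver. The following Python sketch uses SciPy; the value of $S$ is a hypothetical placeholder, and passing $A^{\top}$ with unit measurement weighting turns SciPy's standard DARE into the filter-form equation above.

```python
import numpy as np
from scipy.linalg import sqrtm, solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state matrix with Delta = 1
Q = 0.01 * np.eye(2)                     # process noise covariance
S = np.diag([4.0, 0.0])                  # hypothetical S; only its (1,1) entry
G = np.real(sqrtm(S))                    # is nonzero in the simulated system

# SciPy solves X = A'XA - A'XB (R + B'XB)^{-1} B'XA + Q; substituting
# a = A.T, b = G, r = I yields the DARE of Theorem 5.3.
P_prior = solve_discrete_are(A.T, G, Q, np.eye(2))          # P_infty^-
P_post = P_prior @ np.linalg.inv(np.eye(2) + S @ P_prior)   # P_infty
```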

6 Algorithm and Implementation

Theorem 4.7, combined with (25), can be used to construct an algorithm as follows:

Procedure 1 Optimal Linear Filter
1: Inputs: $A, Q, R(i,i^{\prime}), P_{0}, \hat{x}_{0}, \gamma(i)$
2: for $k\geq 1$ do
3:     $f(i)=\mathcal{F}^{-1}\{\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1}\}$
4:     $S=\int_{\mathbb{R}^{d}}f(i)\gamma(i)\,di$
5:     $P_{k|k-1}=AP_{k-1}A^{\top}+Q$
6:     $P_{k}=P_{k|k-1}(I+SP_{k|k-1})^{-1}$
7:     $\hat{x}_{k|k-1}=A\hat{x}_{k-1}$
8:     Obtain measurement $z_{k}(i)$
9:     $\hat{x}_{k}=\hat{x}_{k|k-1}+P_{k}\int_{\mathbb{R}^{d}}f(i)\left(z_{k}(i)-\gamma(i)\hat{x}_{k|k-1}\right)di$
10: end for
11: Outputs: $\{P_{1},P_{2},\ldots,P_{N}\}, \{\hat{x}_{1},\hat{x}_{2},\ldots,\hat{x}_{N}\}$
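To make the structure of Procedure 1 concrete, the following Python sketch performs one iteration on a uniform measurement grid, assuming scalar measurements ($m=1$) and that $f$, $S$, and $\gamma$ have already been discretized; all names here are illustrative. Since $f$ and $S$ do not depend on $k$, they (and indeed the whole covariance trajectory) can be computed once, before any measurements arrive.

```python
import numpy as np

def filter_step(x_hat, P, z, A, Q, S, f, gamma, dV):
    """One iteration of Procedure 1 with scalar measurements (m = 1).

    f, gamma : samples of f(i) and gamma(i) at N grid points, shape (N, n)
    z        : measurement samples z_k(i), shape (N,)
    dV       : volume of one grid cell, approximating di in the integrals
    """
    n = x_hat.size
    x_pred = A @ x_hat                                        # step 7
    P_pred = A @ P @ A.T + Q                                  # step 5
    P_post = P_pred @ np.linalg.inv(np.eye(n) + S @ P_pred)   # step 6
    innov = z - gamma @ x_pred                                # z_k(i) - gamma(i) x_pred
    integral = (f * innov[:, None]).sum(axis=0) * dV          # int f(i) innov(i) di
    x_post = x_pred + P_post @ integral                       # step 9
    return x_post, P_post
```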

We will now discuss the computational complexity of Procedure 1 and its connection to existing results, before implementing it in a simulated environment in Section 7.

6.1 Computational Complexity

Our optimal linear filter shares many properties with the finite-dimensional Kalman filter, such as the convenient ability to calculate the full trajectory of covariance matrices before run-time, as they do not rely on any measurements. This filter is similar to the filter formulated in previous work [21], although the derivation presented here relies on the methodology of projections rather than a calculus-of-variations approach. The filter given in this article extends the previously derived filter to operate not only on the index set $\mathbb{R}$, but on any Euclidean space. Filtering on the Euclidean plane $\mathbb{R}^{2}$ is perhaps the most relevant to imaging applications, but there may be numerous applications involving higher dimensions.

In the standard finite-dimensional Kalman filtering context with $N$ measurements, naive inversion of the matrix $(R+\Gamma P\Gamma^{\top})\in\mathbb{R}^{N\times N}$ requires nearly cubic complexity in $N$ [49]. This cost can sometimes be ameliorated, however, by observing that if $R$ is the covariance matrix of a stationary process then it has a Toeplitz form. This Toeplitz structure allows greatly reduced computational complexity for inversion operations; see [50] and the references therein for a brief summary of the types of Toeplitz matrices and corresponding inversion complexities. For positive-definite Toeplitz matrices in particular, inversion may be performed with $O(N\log^{2}(N))$ complexity.

The optimal filter in Procedure 1 possesses similar advantages due to the stationary structure of $R$. If $N$ measurement samples are taken, we see in Procedure 1 that Fourier operations are used to calculate the covariance matrices and optimal gain functions, resulting in a fast Fourier transform applied to an $m\times n$ matrix with complexity $O(mnN\log(N))$ [51, p. 41].
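As a concrete illustration of these Fourier operations, the following one-dimensional Python sketch approximates $f=\mathcal{F}^{-1}\{\bar{\gamma}^{\top}\bar{R}^{-1}\}$ and $S=\int f(i)\gamma(i)\,di$ on a uniform grid with the FFT. The kernel $\gamma$ and covariance $R$ below are illustrative stand-ins, and the spectrum of $R$ is floored to keep the division numerically stable.

```python
import numpy as np

# 1-D sketch (d = 1, scalar measurements) of computing f and S via the FFT.
N, ds = 4096, 0.005
i = (np.arange(N) - N // 2) * ds                  # uniform spatial grid
ell, nu = 0.025, 10.0
R = nu / np.sqrt(2 * np.pi * ell**2) * np.exp(-i**2 / (2 * ell**2))
gamma = np.exp(-(0.1 * np.abs(i))**2) * np.sin(0.8 * np.abs(i))  # stand-in kernel

# Continuous Fourier transform approximated by a scaled, centered FFT.
ft = lambda g: ds * np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g)))
ift = lambda G: np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(G))) / ds

R_bar = np.maximum(ft(R).real, 1e-12)   # real, even R gives a real spectrum
gamma_bar = ft(gamma)
f = ift(gamma_bar / R_bar).real         # gain kernel f(i)
S = np.sum(f * gamma) * ds              # S = int f(i) gamma(i) di (scalar here)
```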

Although Procedure 1 does contain matrix inversion, this is confined to low-dimensional $n\times n$ matrices. This approach to analysis in the continuum provides valuable insights into the optimal gain functions and the behavior of the system. The extension to the continuous measurement domain does introduce a trade-off between accuracy and speed of computation, as any implementable estimator will not be able to sample continuously over the entire space. A standard implementation may sample the space uniformly, which is the approach we take in Section 7, but a more sophisticated approach could involve the analysis of non-uniform schemes that exploit the underlying system structure. Examination of this trade-off for various systems is an important area of future work.

6.2 Comparisons to Existing Results

During our analysis, we found that the optimal gain must satisfy the necessary and sufficient condition in equation (17). If the measurements were finite-dimensional, this equation would reduce to a matrix equation, and the gain could then be calculated via matrix inversion as

$$K_{k}=P^{-}_{k}\gamma^{\top}\left(R+\gamma P_{k}^{-}\gamma^{\top}\right)^{-1}.$$

Note that this is the standard equation for the Kalman filter in the finite-dimensional case.

Previous works in this area have also derived condition (17) [19], [16], [8]. These works, however, have either only defined the optimal gain implicitly, or defined it via an operator inverse $(\cdot)^{\dagger}$ which satisfies

$$\int_{\mathcal{D}}G(i,i^{\prime})G^{\dagger}(i^{\prime},i_{1})\,di^{\prime}=I\delta(i-i_{1}),$$

where $\delta(\cdot)$ is the Dirac delta function [8, Eq. (14)]. The optimal gain is then expressed using this distributed-parameter matrix inverse as

$$K_{k}=P_{k}^{-}\gamma^{\top}\left(R+\gamma P_{k}^{-}\gamma^{\top}\right)^{\dagger}.$$

Some derivations also give results where the gain relies on $R^{\dagger}$. These expressions make the connection to the finite-dimensional Kalman filter clearer, but unfortunately the operator inverse is not guaranteed to exist. Furthermore, some forms of $R(i,i^{\prime})$ lead to inverses that are troublesome to implement. The derivation of the inverse corresponding to an Ornstein-Uhlenbeck process has been presented previously in [21] and results in an $R^{\dagger}$ defined in terms of the Dirac delta function and its derivatives. The Dirac delta function is well-defined only under an integral, and requires special care when implemented on real-world hardware; derivatives of the Dirac delta function cause further difficulties in implementation. The filter equations derived in this work do not rely on such an inverse, and give an explicit definition of the optimal gain. The formulation presented will, in some cases, avoid the need for generalized functions such as the Dirac delta function and its higher derivatives.

The derived prior covariance update equation (29) can be compared with standard forms of the finite-dimensional Kalman filter [35, p. 39, Eq. (1.9)]. It is interesting to note that the prior covariance matrix of our filter is equivalent to that of a finite-dimensional Kalman filter with measurement noise given by a standard multivariate normal distribution and an observation matrix equal to $G=S^{\frac{1}{2}}$.

We also highlight that the state estimation error, previously defined as $e_{k}\triangleq x_{k}-\hat{x}_{k}$, satisfies

$$e_{k+1}=M_{k}e_{k}+n_{k},$$

where $M_{k}=A-K_{k}\gamma\in\mathbb{R}^{n\times n}$ and $n_{k}=-K_{k}v_{k}+w_{k}\in\mathbb{R}^{n}$. Hence, the state estimation error dynamics and the optimal (component) gains $\kappa_{k}(i)$ to apply to points $i\in\mathbb{R}^{d}$ in the measurement domain depend on the state dynamics of the system, $A$. In particular, as $k\to\infty$, the steady-state gain operator $K_{k}$ must be chosen such that $A-K_{k}\gamma$ is stable and yields bounded mean square estimation errors. In other words, the gain must account for the underlying dynamics. This aspect of the filter is important, as many standard methods of processing high-dimensional measurements, such as vision-based systems, perform the detection and weighting of measurement features separately from the system dynamics.

7 Simulation Results

This section presents simulation results for the derived Kalman filter with infinite-dimensional measurements, as described in Procedure 1. The system consists of a state vector $x_{k}=[q_{k},\dot{q}_{k}]^{\top}$, which represents the position and velocity of an agent. The motion model is a state transition model in which $\Delta$ is the time elapsed between discrete steps. A two-dimensional image is fixed perpendicular to the agent's line of motion and is represented by the function $C(p):\mathbb{R}^{2}\rightarrow\mathbb{R}$. We note that the measurement readings are scalar-valued, perhaps representing visual light intensity, but this could easily be extended to multi-dimensional image outputs; for example, the intensities of three distinct RGB channels could be represented solely by changing the function $C(p)$ to be vector-valued. The measurement model used is the simple pinhole camera model, illustrated in Fig. 2. Both the motion model and the measurement model are affected by zero-mean, stationary, additive Gaussian noise such that $E[w_{k}w_{k}^{\top}]=Q$ and $E[v_{k}(i)v_{k}(i^{\prime})^{\top}]=R(i-i^{\prime})$.

Figure 2: Diagram of the 2-D Pinhole Camera Model.

The patterned wall possesses a gray-scale intensity function

$$C(p)=e^{-(\eta\|p\|_{2})^{2}}\cos(\xi\|p\|_{2})+1,$$
Figure 3: Two measurements of the decaying sinusoidal pattern: (a) a measurement with no additive noise; (b) a measurement perturbed by the addition of a stochastic field with a squared exponential covariance function. Note that in 3(b) the measurements are spatially discretized for computational reasons.
Table 4: Simulation Parameters

State Matrix: $A=\begin{bmatrix}1&\Delta\\ 0&1\end{bmatrix}$
Process Noise Covariance: $Q=\begin{bmatrix}\sigma_{q}^{2}&0\\ 0&\sigma_{\dot{q}}^{2}\end{bmatrix}$
Measurement Kernel: $\gamma(i)$, see (32)
State Vector: $x_{k}=[q_{k},\dot{q}_{k}]^{\top}$
Initial State: $x_{0}=[1,0]^{\top}$
Linearization Point: $\bar{x}=[1,0]^{\top}$
Integral Domain: $\mathcal{D}=[-0.5,0.5]\times[-0.5,0.5]$
Wall Pattern: $C(p)=e^{-(\eta\|p\|_{2})^{2}}\cos(\xi\|p\|_{2})+1$
Measurement Covariance: $R(i,i^{\prime})=\frac{\nu}{2\pi\ell^{2}}e^{-\|i-i^{\prime}\|_{2}^{2}(2\ell^{2})^{-1}}$
Initial Error Covariance: $P_{0}=Q$
Initial State Estimate: $\hat{x}_{0}=x_{0}$
Time Interval: $\Delta=1$
Position Variance: $\sigma_{q}^{2}=0.01$
Velocity Variance: $\sigma_{\dot{q}}^{2}=0.01$
Decay Parameter: $\eta=0.1$
Frequency Parameter: $\xi=0.8$
Measurement Noise Intensity: $\nu=10$
Focal Length: $L_{f}=0.01$
Length Scale: $\ell=0.025$
Sample Spacing: $\Delta_{s}=0.005$

where $\eta$ is a scalar parameter that determines the decay rate and $\xi$ is a scalar parameter that determines the frequency. Utilizing the pinhole camera model, our non-linear measurement equation is then of the form

$$z_{k}(i,q_{k})=C\left(\frac{iq_{k}}{L_{f}}\right)+v_{k}(i).$$

This function is linearized around some equilibrium point $\bar{x}=[\bar{q},\bar{\dot{q}}]^{\top}$ to derive the linear form used in the simulation. This linearization is given in equation (33) and leads to the corresponding measurement function

$$\gamma(i)=\left[-e^{-(\eta L_{f}^{-1}\bar{q}\|i\|_{2})^{2}}\left(2(\eta L_{f}^{-1}\|i\|_{2})^{2}\bar{q}\cos(\xi L_{f}^{-1}\|i\|_{2}\bar{q})+\xi L_{f}^{-1}\|i\|_{2}\sin(\xi L_{f}^{-1}\|i\|_{2}\bar{q})\right),\ 0\right]\in\mathbb{R}^{1\times 2}. \qquad (32)$$

The system equations then are:

$$\begin{aligned}
x_{k+1} &= \begin{bmatrix}1&\Delta\\ 0&1\end{bmatrix}\begin{bmatrix}q_{k}\\ \dot{q}_{k}\end{bmatrix}+w_{k}, \\
z_{k}(i) &= -e^{-(\eta L_{f}^{-1}\bar{q}\|i\|_{2})^{2}}\left[2(\eta L_{f}^{-1}\|i\|_{2})^{2}\bar{q}\cos(\xi L_{f}^{-1}\|i\|_{2}\bar{q})+\xi L_{f}^{-1}\|i\|_{2}\sin(\xi L_{f}^{-1}\|i\|_{2}\bar{q})\right]q_{k}+v_{k}(i). \qquad (33)
\end{aligned}$$
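For reference, a direct Python transcription of the linearized kernel (32), with the parameter values of Table 4 as defaults, might look as follows (a sketch; vectorization and input validation are omitted).

```python
import numpy as np

def gamma_kernel(i, eta=0.1, xi=0.8, Lf=0.01, q_bar=1.0):
    """Linearized pinhole measurement kernel (32); i is a 2-D pixel coordinate."""
    r = np.linalg.norm(i) / Lf                     # ||i||_2 / L_f
    g = -np.exp(-(eta * q_bar * r) ** 2) * (
        2.0 * (eta * r) ** 2 * q_bar * np.cos(xi * r * q_bar)
        + xi * r * np.sin(xi * r * q_bar)
    )
    return np.array([g, 0.0])                      # the velocity entry is zero

# Example: evaluate the kernel at the pixel i = (0.1, 0.2).
print(gamma_kernel(np.array([0.1, 0.2])))
```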

Examples of the linearized measurement pattern as measured by the system, without and with noise, are presented in Figs. 3(a) and 3(b) respectively.

We can formulate a discrete-time Riccati equation for this system as given by (29).

The matrix pair $(A,Q)$ is stabilizable in the sense of Definition 2.2: considering the matrix $M=\frac{1}{2\sigma_{q}^{2}}I$, it is clear that

$$\rho(A-MQ)=\rho\left(A-\tfrac{1}{2}I\right)=\rho\left(\begin{bmatrix}\tfrac{1}{2}&1\\ 0&\tfrac{1}{2}\end{bmatrix}\right)=\tfrac{1}{2}.$$

It is also true that the matrix pair $(A,G)$ is detectable in the sense of Definition 2.3, where $G=S^{\frac{1}{2}}$. This can be shown by demonstrating that the matrix pair $(A^{\top},G)$ is stabilizable. Specifically, in the simulated system the matrix $G=S^{\frac{1}{2}}$ takes the form

$$G=\begin{bmatrix}G_{1}&0\\ 0&0\end{bmatrix}, \quad G_{1}\in\mathbb{R}^{+}.$$

Considering a matrix $M$ of the form

$$M=\begin{bmatrix}\frac{1}{G_{1}}&\frac{1}{4G_{1}}\\ 0&0\end{bmatrix}$$

immediately gives that

$$\rho(A^{\top}-GM)=\rho\left(\begin{bmatrix}0&-\frac{1}{4}\\ 1&1\end{bmatrix}\right)=\frac{1}{2}.$$
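These spectral radius computations are easily reproduced numerically, as in the following sketch (with $\Delta=1$, $\sigma_{q}^{2}=\sigma_{\dot{q}}^{2}=0.01$, and an illustrative positive value of $G_{1}$).

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
rho = lambda X: np.abs(np.linalg.eigvals(X)).max()   # spectral radius

# (A, Q) stabilizable: with M = I / (2 sigma_q^2), A - MQ = A - I/2.
assert np.isclose(rho(A - 0.5 * np.eye(2)), 0.5)

# (A, G) detectable: G1 below is an illustrative positive value.
G1 = 2.0
G = np.array([[G1, 0.0], [0.0, 0.0]])
M = np.array([[1.0 / G1, 1.0 / (4.0 * G1)], [0.0, 0.0]])
assert np.isclose(rho(A.T - G @ M), 0.5)
```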
Figure 4: Fig. 4(a) displays the true position and estimated position for a single realization of the filter over 50 time units. Fig. 4(b) displays the empirical position and velocity MSE, averaged over 20,000 trials and compared with the theoretical position and velocity MSE.

These conditions, along with the fact that $Q$ is positive-definite, ensure that the discrete-time Riccati equation for this system possesses a unique stabilizing solution [35]. Utilizing the Matlab function idare and substituting the appropriate terms, the following covariance matrices were calculated for this system.

$$P_{\infty}^{-}=\begin{bmatrix}1.2018&0.2019\\ 0.2019&0.0695\end{bmatrix}, \quad P_{\infty}=\begin{bmatrix}0.8475&0.1424\\ 0.1424&0.0595\end{bmatrix}. \qquad (34)$$

The process noises of the position and velocity are uncorrelated. A centred, stationary stochastic field is used to represent the measurement noise, the covariance function of which is a squared exponential, also given in Table 4.

With the parameters given in Table 4, Procedure 1 was implemented and simulated using Matlab. It is necessary during implementation to discretise the system appropriately for numerical computation. Numerical functions were evaluated with spacings equal to $\Delta_{s}$ in each dimension, i.e. a frequency of $200$ samples per unit of $i$ in each dimension. Procedure 1 requires numerical integration for the posterior state update step, and the trapz function was used to implement this numerical integration. For this numerical integration step, a finite pixel domain $\mathcal{D}=[-0.5,0.5]\times[-0.5,0.5]$ was used. This is larger than the effective pixel-width of $\gamma$ and much larger than the effective width $\ell$ of the noise covariance $R$, suggesting minimal impact on the results obtained.
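A minimal Python analogue of this numerical integration step is sketched below, with random stand-ins for the discretized gain kernel and innovation field; np.trapz corresponds to Matlab's trapz (in NumPy 2.0 and later it is named np.trapezoid).

```python
import numpy as np

# Trapezoidal approximation of the update integral over D = [-0.5, 0.5]^2.
ax = np.linspace(-0.5, 0.5, 201)             # Delta_s = 0.005 per dimension
N = ax.size
rng = np.random.default_rng(1)
f_grid = rng.standard_normal((N, N, 2))      # stand-in for f(i) in R^2
innov = rng.standard_normal((N, N))          # stand-in for z_k(i) - gamma(i) x_pred

integrand = f_grid * innov[..., None]        # f(i) (z_k(i) - gamma(i) x_pred)
w = np.trapz(np.trapz(integrand, ax, axis=0), ax, axis=0)   # shape (2,)
# The posterior update of step 9 is then x_post = x_pred + P_post @ w.
```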

This naive method of numerical integration does not exploit the structure of $f(i)$; numerical methods that do exploit this structure may lead to more efficient computation. Analysis of such techniques and their impact on efficiency remains an important question for future work in this area.

A single realization of the system's true state and the filter estimate over $50$ time steps is given in Fig. 4(a). The system was subject to $20,000$ simulated trials, and the average MSE of both the position and velocity trajectories was calculated. Fig. 4(b) presents these results alongside the theoretical mean square errors according to (34).

8 Conclusion

This article has presented a Kalman filter for a linear system with finite-dimensional states and infinite-dimensional measurements, both corrupted by additive random noise. The assumption that the measurement noise covariance function is stationary allows a closed-form expression of the optimal gain function, whereas previous derivations rely on an implicitly defined inverse. Conditions are derived which ensure the filter is asymptotically stable and, surprisingly, these conditions apply to finite-dimensional components of the system. An algorithm is presented which takes advantage of this stationarity property to reduce computational complexity, and the derived filter is tested in a simulated environment motivated by a linearization of the pinhole camera model.

An extension to a continuous-time framework is possible, but would require additional tools and further technical conditions, such as the use of stochastic integrals (either in the Itô or Stratonovich sense). The extension of the Fubini-Tonelli theorem given by Theorem 3.1 would also need to be modified to validate the interchange of expectation and integration for stochastic fields over continuous spatial and temporal domains.

Future work includes investigating more efficient means of executing this algorithm, such as non-uniform integration intervals, to yield insight into effective sensor placement. This could be thought of as a form of feature selection, and this principled approach may provide insights related to existing heuristics-based and data-driven feature selection strategies often employed in high-dimensional measurement processing. It should also be noted that most real-world applications possess nonlinear measurement equations, and adapting the linear filter to nonlinear systems will be an important area for future work.


9 Existence of Integral (5) via Fubini-Tonelli Theorem

We introduce one final technical assumption:

A8. $\kappa_{k}(\cdot)s_{k}(\cdot,\cdot)$ is measurable with respect to the product $\sigma$-algebra on $\mathbb{R}^{d}\times\Theta$.

Proof 9.1.

For any fixed $i$, Jensen's inequality tells us

$$\begin{aligned}
E\left[\|\kappa(i)s(i)\|_{2}\right]^{2} &\leq E\left[\|\kappa(i)s(i)\|_{2}^{2}\right] \\
&= E\left[\mathrm{Tr}\left[\kappa(i)s(i)s^{\top}(i)\kappa^{\top}(i)\right]\right] \\
&= \mathrm{Tr}\left[\kappa(i)\,E\left[s(i)s^{\top}(i)\right]\kappa^{\top}(i)\right] \\
&= \mathrm{Tr}\left[\kappa(i)\left(\Sigma(0)+\gamma(i)P\gamma(i)^{\top}\right)\kappa^{\top}(i)\right] \\
&= \|\Sigma^{\frac{1}{2}}(0)\kappa^{\top}(i)\|^{2}_{F}+\|P^{\frac{1}{2}}(\kappa(i)\gamma(i))^{\top}\|^{2}_{F} \\
&\leq \|\Sigma^{\frac{1}{2}}(0)\|_{F}^{2}\|\kappa(i)\|_{F}^{2}+\|P^{\frac{1}{2}}\|_{F}^{2}\|\kappa(i)\gamma(i)\|_{F}^{2}.
\end{aligned}$$

This implies, due to the non-negativity of norms, that

$$E\left[\|\kappa(i)s(i)\|_{2}\right] \leq \|\Sigma^{\frac{1}{2}}(0)\|_{F}\|\kappa(i)\|_{F}+\|P^{\frac{1}{2}}\|_{F}\|\kappa(i)\gamma(i)\|_{F}.$$

As this is true for any given $i$, we have the inequality

$$\int_{\mathbb{R}^{d}}E\left[\|\kappa(i)s(i)\|_{2}\right]di \leq \|\Sigma^{\frac{1}{2}}(0)\|_{F}\int_{\mathbb{R}^{d}}\|\kappa(i)\|_{F}\,di + \|P^{\frac{1}{2}}\|_{F}\int_{\mathbb{R}^{d}}\|\kappa(i)\gamma(i)\|_{F}\,di.$$

As $\Sigma$ is bounded and $\kappa(i)$ is absolutely integrable, the first term on the right hand side is finite. By Hölder's inequality and assumptions A1 and A5 we have $\|\kappa\gamma\|_{L_{1}}\leq\|\kappa\|_{L_{1}}\|\gamma\|_{L_{\infty}}<\infty$, and hence the second term, as well as the left hand side, is finite. As all norms on $\mathbb{R}^{n}$ are equivalent, we have

$$\begin{aligned}
\int_{\mathbb{R}^{d}}E\left[\|\kappa(i)s(i)\|_{1}\right]di &< \infty \\
\iff \int_{\mathbb{R}^{d}}\int_{\Theta}\|\kappa(i)s(i,\theta)\|_{1}\,P(d\theta)\,di &< \infty \\
\iff \int_{\mathbb{R}^{d}}\int_{\Theta}\left|[\kappa(i)s(i,\theta)]^{\ell}\right|P(d\theta)\,di &< \infty \ \ \forall\ell,
\end{aligned}$$

where $[\kappa(i)s(i,\theta)]^{\ell}$ is the $\ell^{th}$ element of the vector $\kappa(i)s(i,\theta)$. As the integral of the absolute expectation of each component is finite, we may apply the Fubini-Tonelli theorem component-wise for each $\ell$ and compose the vector form

$$\int_{\mathbb{R}^{d}}E\left[\kappa(i)s(i)\right]di = E\left[\int_{\mathbb{R}^{d}}\kappa(i)s(i)\,di\right].$$

References

  • [1] A. Belloni and V. Chernozhukov, High Dimensional Sparse Econometric Models: An Introduction, pp. 121–156. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011.
  • [2] R. Sepulchre, “Spiking control systems,” Proceedings of the IEEE, vol. 110, no. 5, pp. 577–589, 2022.
  • [3] P. Corke, Robotics, Vision and Control - Fundamental Algorithms in MATLAB®, vol. 73 of Springer Tracts in Advanced Robotics. Springer, 2011.
  • [4] K. Morris, Controller Design for Distributed Parameter Systems. Springer, 2020.
  • [5] R. Curtain, “A survey of infinite-dimensional filtering,” SIAM Review, vol. 17, no. 3, pp. 395–411, 1975.
  • [6] A. Bensoussan, G. Da Prato, M. Delfour, and S. Mitter, Representation and Control of Infinite Dimensional Systems. Systems and Control: Foundations and Applications, Birkhauser, 2 ed., 2007.
  • [7] S. Tzafestas and J. Nightingale, “Optimal filtering, smoothing and prediction in linear distributed-parameter systems,” Proceedings of the Institution of Electrical Engineers, vol. 115, no. 5, pp. 1207–1212, 1968.
  • [8] H. Nagamine, S. Omatu, and T. Soeda, “The optimal filtering problem for a discrete-time distributed parameter system,” International Journal of Systems Science, vol. 10, no. 7, pp. 735–749, 1979.
  • [9] P. L. Falb, “Infinite-dimensional filtering: The Kalman-Bucy filter in Hilbert space,” Information and Control, vol. 11, pp. 102–137, 1967.
  • [10] Z. Emirsajłow, Output Observers for Linear Infinite-Dimensional Control Systems, pp. 67–92. Cham: Springer International Publishing, 2021.
  • [11] R. Curtain and H. Zwart, Introduction to Infinite-Dimensional Systems Theory: A State-Space Approach, vol. 71 of Texts in Applied Mathematics book series. Germany: Springer, 2020.
  • [12] Z. Hidayat, R. Babuska, B. De Schutter, and A. Núñez, “Observers for linear distributed-parameter systems: A survey,” in IEEE International Symposium on Robotic and Sensors Environments, pp. 166–171, 2011.
  • [13] A. Aalto, “Spatial discretization error in Kalman filtering for discrete-time infinite dimensional systems,” IMA Journal of Mathematical Control and Information, vol. 35, pp. i51–i72, 04 2017.
  • [14] A. Küper and S. Waldherr, “Numerical Gaussian process Kalman filtering for spatiotemporal systems,” IEEE Transactions on Automatic Control, pp. 1–8, 2022.
  • [15] F. E. Thau, “On Optimum Filtering for a Class of Linear Distributed-Parameter Systems,” Journal of Basic Engineering, vol. 91, no. 2, pp. 173–178, 1969.
  • [16] S. G. Tzafestas, “On the distributed parameter least-squares state estimation theory,” International Journal of Systems Science, vol. 4, no. 6, pp. 833–858, 1973.
  • [17] S. G. Tzafestas, “Bayesian approach to distributed-parameter filtering and smoothing,” International Journal of Control, vol. 15, no. 2, pp. 273–295, 1972.
  • [18] K. A. Morris, “Optimal output estimation for infinite-dimensional systems with disturbances,” Systems & Control Letters, vol. 146, p. 104803, 2020.
  • [19] J. S. Meditch, “Least-squares filtering and smoothing for linear distributed parameter systems,” Automatica, vol. 7, p. 315–322, May 1971.
  • [20] A. Bensoussan, Some Remarks on Linear Filtering Theory for Infinite Dimensional Systems, vol. 286, pp. 27–39. Springer Berlin Heidelberg, 10 2003.
  • [21] M. M. Varley, T. L. Molloy, and G. N. Nair, “Kalman filtering for discrete-time linear systems with infinite-dimensional observations,” in 2022 American Control Conference (ACC), pp. 296–303, 2022.
  • [22] P. A. Fuhrmann, “On observability and stability in infinite-dimensional linear systems,” Journal of Optimization Theory and Applications, vol. 12, pp. 173–181, Aug 1973.
  • [23] M. Megan, “Stability and observability for linear discrete-time systems in Hilbert spaces,” Bulletin mathématique de la Société des Sciences Mathématiques de la République Socialiste de Roumanie, vol. 24 (72), no. 3, pp. 277–284, 1980.
  • [24] B. Fristedt and L. Gray, A Modern Approach to Probability Theory. Boston, MA: Birkhauser, 1997.
  • [25] D. G. Luenberger, Optimization by Vector Space Methods. USA: John Wiley & Sons, Inc., 1st ed., 1997.
  • [26] R. J. Adler, The Geometry of Random Fields (Probability & Mathematical Statistics S.). John Wiley & Sons Inc, June 1981.
  • [27] J. Doob, Stochastic Processes. Probability and Statistics Series, Wiley, 1953.
  • [28] S. Janson, Gaussian Hilbert Spaces. Cambridge Tracts in Mathematics, Cambridge University Press, 1997.
  • [29] R. Boie and I. Cox, “An analysis of camera noise,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 6, pp. 671–674, 1992.
  • [30] J. Fehrenbach, P. Weiss, and C. Lorenzo, “Variational algorithms to remove stationary noise: Applications to microscopy imaging,” IEEE Transactions on Image Processing, vol. 21, no. 10, pp. 4420–4430, 2012.
  • [31] T. L. Marzetta, “Spatially-stationary propagating random field model for massive MIMO small-scale fading,” in 2018 IEEE International Symposium on Information Theory (ISIT), pp. 391–395, 2018.
  • [32] P. M. Atkinson and C. D. Lloyd, Geostatistical Models and Spatial Interpolation, pp. 1461–1476. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014.
  • [33] E. M. Stein and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press, 1971.
  • [34] G. Folland, Fourier Analysis and Its Applications. American Mathematical Society, 2009.
  • [35] B. D. O. Anderson and J. B. Moore, Optimal filtering. Prentice-Hall Englewood Cliffs, N.J, 1979.
  • [36] A. I. Mourikis, N. Trawny, S. I. Roumeliotis, A. E. Johnson, A. Ansar, and L. Matthies, “Vision-aided inertial navigation for spacecraft entry, descent, and landing,” IEEE Transactions on Robotics, vol. 25, no. 2, pp. 264–280, 2009.
  • [37] M. Mascaró, I. Parra-Tsunekawa, C. Tampier, and J. Ruiz-del Solar, “Topological navigation and localization in tunnels—application to autonomous load-haul-dump vehicles operating in underground mines,” Applied Sciences, vol. 11, no. 14, 2021.
  • [38] H. Mäkelä, “Overview of LHD navigation without artificial beacons,” Robotics and Autonomous Systems, vol. 36, no. 1, pp. 21–35, 2001.
  • [39] A. Krasner, M. Sizintsev, A. Rajvanshi, H.-P. Chiu, N. Mithun, K. Kaighn, P. Miller, R. Villamil, and S. Samarasekera, “SigNav: Semantically-informed GPS-denied navigation and mapping in visually-degraded environments,” in 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1858–1867, 2022.
  • [40] H. Cho, Y.-W. Seo, B. V. Kumar, and R. R. Rajkumar, “A multi-sensor fusion system for moving object detection and tracking in urban driving environments,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1836–1843, 2014.
  • [41] E. Kreyszig, Introductory Functional Analysis with Applications. Wiley Classics Library, Wiley, 1991.
  • [42] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge; New York: Cambridge University Press, 2nd ed., 2013.
  • [43] R. L. Schilling, Measures, Integrals and Martingales. Cambridge University Press, 2005.
  • [44] S. J. Axler, Linear Algebra Done Right. Undergraduate Texts in Mathematics, New York: Springer, 1997.
  • [45] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.
  • [46] R. Bracewell, The Two-Dimensional Fourier Transform, pp. 140–173. Boston, MA: Springer US, 2003.
  • [47] J. B. Conway, A Course in Functional Analysis, vol. 96 of Graduate Texts in Mathematics. New York: Springer Science+Business Media, 2nd ed., 2010.
  • [48] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. Adaptive computation and machine learning, MIT Press, 2006.
  • [49] V. Strassen, “Gaussian elimination is not optimal,” Numer. Math., vol. 13, pp. 354–356, Aug. 1969.
  • [50] P. J. Ferreira and M. E. Domínguez, “Trading-off matrix size and matrix structure: Handling Toeplitz equations by embedding on a larger circulant set,” Digital Signal Processing, vol. 20, no. 6, pp. 1711–1722, 2010.
  • [51] K. R. Rao, D. N. Kim, and J.-J. Hwang, Fast Fourier Transform - Algorithms and Applications. Springer Publishing Company, Incorporated, 1st ed., 2010.
Maxwell M. Varley received the B.S. degree in electrical systems and the M.S. degree in electrical engineering from the University of Melbourne, Australia, in 2017 and 2019, respectively. He is currently working towards the Ph.D. degree in electrical engineering at the University of Melbourne.

Timothy L. Molloy (Member, IEEE) was born in Emerald, Australia. He received the B.E. and Ph.D. degrees from Queensland University of Technology (QUT), Brisbane, QLD, Australia, in 2010 and 2015, respectively. He is currently a Senior Lecturer in Mechatronics at the Australian National University. From 2017 to 2019, he was an Advance Queensland Research Fellow at QUT, and from 2020 to 2022, he was a Research Fellow at the University of Melbourne. His interests include signal processing and information theory for robotics and control. Dr. Molloy is the recipient of a QUT University Medal, a QUT Outstanding Doctoral Thesis Award, a 2018 Boeing Wirraway Team Award, and an Advance Queensland Early-Career Fellowship.

Girish N. Nair (Fellow, IEEE) was born in Malaysia. He received the B.E. (First Class Hons.) degree in electrical engineering, the B.Sc. degree in mathematics, and the Ph.D. degree in electrical engineering from the University of Melbourne, Parkville, VIC, Australia, in 1994, 1995, and 2000, respectively. He is currently a Professor with the Department of Electrical and Electronic Engineering, University of Melbourne. From 2015 to 2019, he was an ARC Future Fellow, and from 2019 to 2024, he has been the Principal Australian Investigator of an AUSMURI project. He delivered a semiplenary lecture at the 60th IEEE Conference on Decision and Control, in 2021. His research interests are in information theory and control. Prof. Nair was a recipient of several prizes, including the IEEE CSS Axelby Outstanding Paper Award in 2014 and a SIAM Outstanding Paper Prize in 2006.