
Optimal Linear Filtering for
Discrete-Time Systems with
Infinite-Dimensional Measurements

Maxwell M. Varley, Timothy L. Molloy, and Girish N. Nair

This work received funding from the Australian Government, via grant AUSMURIB000001 associated with ONR MURI grant N00014-19-1-2571. M. M. Varley and G. N. Nair are with the Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC, 3010, Australia (emails: [email protected], [email protected]). T. L. Molloy is with the CIICADA Lab, School of Engineering, Australian National University, Canberra, ACT 2601, Australia (e-mail: [email protected]).
Abstract

Systems equipped with modern sensing modalities such as vision and lidar gain access to increasingly high-dimensional measurements with which to enact estimation and control schemes. In this article, we examine the continuum limit of high-dimensional measurements and analyze state estimation in linear time-invariant systems with infinite-dimensional measurements but finite-dimensional states, both corrupted by additive noise. We propose a linear filter and derive the corresponding optimal gain functional in the sense of the minimum mean square error, analogous to the classic Kalman filter. By modeling the measurement noise as a wide-sense stationary random field, we are able to derive the optimal linear filter explicitly, in contrast to previous derivations of Kalman filters in distributed-parameter settings. Interestingly, we find that we need only impose conditions that are finite-dimensional in nature to ensure that the filter is asymptotically stable. The proposed filter is verified via simulation of a linearized system with a pinhole camera sensor.

This paper will be published in IEEE Transactions on Automatic Control.
©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Index Terms: Discrete-Time Linear Systems, Distributed Parameter Systems, Kalman Filtering, State Estimation, Stochastic Fields, Stochastic Processes

1 Introduction


Many systems involve sensors that produce high-dimensional data. These types of systems occur in a wide range of fields, such as modeling economic growth [1], the spiking patterns of neurons in biological systems [2], and vision-based sensing in robotics [3]. In real-time settings with limited computational power, this data is typically converted to a lower-dimensional representation for timely processing, e.g. by extracting a limited number of key features or restricting to a subset of the data.

In this work, we propose to head in the opposite direction, and treat high-dimensional observations as infinite-dimensional random fields on a continuous index set. Our motivation for this approach comes from physics, where it is often more convenient to treat a complex physical system as a continuum, rather than as a large collection of discrete elements. With the right analytical tools, a continuous representation can simplify analysis and offer insights into the behaviour of the underlying high-dimensional system.

A system with infinite-dimensional observations does, however, come with its own challenges, which require different analytical tools and lead to a new perspective compared with its finite-dimensional counterpart. The characterization of infinite-dimensional measurement noise involves stochastic fields, which must be handled carefully to maintain rigour. An example of this is the need for a generalization of the Fubini-Tonelli theorem to wide-sense stationary random fields, which we will present in Theorem 3.1.

Another important distinction is that in a standard finite-dimensional Kalman filter setup, an optimal gain is given by solving a matrix equation, and existence and uniqueness conditions arise naturally from standard linear algebra results. The infinite-dimensional observation space, however, results in an integral equation. We will show in Section 4 (under certain regularity assumptions) that this can be solved via Fourier methods and that the existence and uniqueness conditions can, surprisingly, still be given by examining properties of finite-dimensional objects. This gives us access to a closed-form continuous representation of the optimal gain, which is amenable to Fourier analysis.

Furthermore, we note that mean square stability of finite-dimensional systems typically requires stabilizability and detectability conditions that depend on the state transition, input, and output matrices. For systems with infinite-dimensional measurements, it turns out that mean square stability of our filter requires (sufficient) notions of stabilizability and detectability that also take into account the noise covariance matrix, which is not the case for finite-dimensional systems. This is demonstrated in Section 5.

1.1 Literature Review

Infinite-dimensional representations are widely used in the study of distributed-parameter systems; see e.g. [4, 5, 6]. In such systems, the states and measurements are represented as functions of both time and a continuous-valued index taking values in some domain $\mathcal{D}$. The models may be stochastic [7, 8] or deterministic [4], in either discrete or continuous time [9]. A critical high-level survey of the approaches, claims, and results in this area has been carried out in [5], and a broader treatment of the estimation and control of distributed-parameter systems can be found in [4, 6, 10, 11]. An overview of various observer design methodologies for linear distributed-parameter systems is also given in the survey [12]. Although this survey does not discuss stochastic systems, the references therein are classified as either early lumping models, where the system is discretized before observer design, or late lumping models, where the system is discretized after observer design. In all the works discussed in [12], both the state and measurement spaces are considered infinite-dimensional.

In contrast with the rich body of work above, in this article we examine systems with infinite-dimensional measurements but finite-dimensional states. A complementary problem setup occurs in [13], although in that work the system is assumed to have an infinite-dimensional state space and a finite-dimensional measurement space, which sidesteps the issue of calculating the inverse of an infinite-dimensional object, an issue we discuss further in Section 6.2. The work in [13] uses an orthogonal projection operator to reduce the state to a finite-dimensional subspace and proceeds to establish upper bounds on the resulting discretization error by way of sensitivity analysis of a Riccati difference equation. A more common approach is the aforementioned early lumping method, where discretization is performed at the start to produce linear models with high- but finite-dimensional states and measurements. This is the approach taken by [14], where a Kalman filter is designed for a finite-dimensional subset of the infinite-dimensional measurement, corrupted by additive Gaussian noise that is spatially white. In this article, we do not perform any such lumping, early or late, and our analysis remains firmly in an infinite-dimensional setting.

See Table 1 for a concise summary of early derivations of linear distributed-parameter filters, as well as more recent work. The papers therein employ a number of different approaches, including Green's functions [15], orthogonal projections [7], least squares arguments [16], the Wiener-Hopf method [8], Bayesian approaches [17], and $\mathcal{H}_2$ estimation [18], to name a representative handful. The derivation in this article is closest to the orthogonal projection argument given in [7], although here we deal with a discrete-time system. It should also be noted that [7] assumes that the observation noise is both temporally and spatially white, whereas here we do not require the latter property. We do, however, introduce the assumption that the observation noise is stationary, which allows the novel derivation of a closed-form expression for the optimal gain in Section 4.3. This form has not been presented in any of these previous works.

The model that we propose is relevant to systems where high-dimensional measurements, for instance from vision, radar, or lidar sensors, are used to estimate a low-dimensional state, such as the position and velocity of an autonomous agent or moving target. To the best of the authors’ knowledge, such systems have not been the focus of previous work in distributed parameter filtering.

Table 1: Properties and assumptions of previous work in state estimation of distributed-parameter models. The final column describes the work presented in this article.
References compared: [7], [15], [19], [17], [16], [8], [20], [4], [18], and ours.
Properties compared: Finite-Dim. State Space; Discrete Time; Noise is Stationary on Domain $\mathcal{D}$; Explicit Optimal Gain.

1.2 Contributions

Our key contribution is the derivation of an exact solution for the optimal linear filter for discrete-time systems with measurement noise and process noise modeled by random fields and random vectors respectively (Theorem 4.1, Procedure 1). The assumptions of finite state dimension, and of measurement noise that is stationary on the domain $\mathcal{D}$, enable us to derive the optimal gain functional explicitly in terms of a multi-dimensional Fourier transform on the measurement index set $\mathcal{D}$.

This is in contrast to previous works in which the optimal gain is given in terms of a distributed-parameter inverse operator that is either not defined or defined implicitly [8], [17]. Moreover, this inverse can lead to implementation difficulties and is not always guaranteed to exist; these issues are discussed in further detail in Section 6.2. In comparison, our approach allows us to give sufficient conditions for the system to have a well-defined optimal gain function (Section 4): in rough terms, the effective bandwidth of the Fourier transform of the measurement kernel should be smaller than that of the noise covariance kernel.¹

¹The term bandwidth here refers to the effective range of "spatial" frequencies present in a measurement, not temporal frequencies.

An advantage of this continuum-limit approach is that it enables us to make precise statements about the filter's behaviour in the limit as the measurement resolution approaches infinity. This paves the way for further analysis, as various finite-point approximation schemes may be formulated and their error compared with the asymptotic error of the optimal infinite-resolution filter in a principled and quantitative manner. Further discussion and analysis of these finite-point approximation schemes will be addressed in future work.

This article contains five significant extensions in comparison to our preliminary work in the conference paper [21].

The first major development is that this work, by way of a Hilbert space framework, presents necessary and sufficient conditions which guarantee the existence and uniqueness of the optimal gain function. Our previous work [21] employed a calculus of variations approach to yield only necessary conditions on the optimal gain function.

The second development is the extension of our proposed algorithm to a $d$-dimensional, not just scalar, measurement domain $\mathcal{D}$, allowing much richer sensor data to be modeled without obscuring the underlying spatial correlations. Examples include camera images ($d=2$), as well as high-resolution lidar and magnetic resonance imaging (MRI) ($d=3$). This extension entails the use of random fields and Fourier transforms with $d$-dimensional frequency, as opposed to standard random processes and Fourier transforms on a scalar axis. This has also allowed us to present simulation results that involve two-dimensional images, as opposed to the simpler scalar functions previously presented in [21].

The third development is a weakening of the assumptions on the observation noise. The filter provided in [21] was restricted to systems with Gaussian observation noise, whereas the formulation presented here loosens this assumption and instead only requires that the noise be stationary.

The fourth development is that this work clearly defines explicit conditions such that the presented arguments are rigorous, whereas the derivations in [21] were of a more preliminary and exploratory nature. This work provides a list of assumptions such that our derived filter is well-defined, and that each step in the derivation is valid.

The fifth and final development is that we explicitly impose stabilizability and detectability conditions on the system (Section 2.4). The detectability condition in particular involves the measurement noise covariance function, as well as the deterministic dynamical matrix and measurement function. This strongly contrasts with other works in infinite-dimensional systems theory, which impose conditions of strong or weak observability [22, 23] that are agnostic to noise.

These extensions yield a clearer picture of the relationship between the system and the optimal filter, and provide a more generalized solution for the optimal gain.

Figure 1: Three normalized stationary random fields, spatially discretized into a $100\times 100$ grid over the domain $[-0.5,0.5]^2$, with length scales (a) $\ell=0.1$, (b) $\ell=0.01$, and (c) $\ell=0.001$. Each field is zero-mean with a squared exponential covariance function $R(i,i^{\prime})\propto e^{-\|i-i^{\prime}\|_2^2(2\ell^2)^{-1}}$. The length scale $\ell$ determines how closely each element is influenced by its neighbors, with a larger length scale leading to a stronger correlation. Note that small length scales are often used to approximate ideal white noise.

1.3 Organization

This article is organized as follows. Section 2 outlines the relevant definitions, theorems, and conditions relating to the Hilbert space of random variables and random fields, as well as the multi-dimensional Fourier transform and notions of system stabilizability and detectability. Section 3 gives our model definition and discusses some concrete examples which our model could feasibly represent. This section also summarizes the key assumptions needed to ensure the validity of our analysis, and the motivation behind each assumption. Section 4 derives necessary and sufficient conditions for the optimal gain of the proposed filter (Section 4.1) before presenting a unique gain function which satisfies these conditions (Section 4.2). Section 4.3 presents the full filter equations, and the stability of this filter is discussed in Section 5. Section 6 gives a procedure to implement this filter, with Section 6.1 discussing computational complexity and Section 6.2 drawing comparisons to existing results. Finally, Section 7 presents a simulated system and analyzes the empirical performance of the filter compared to its expected performance.

2 Preliminaries

2.1 Hilbert Space of Random Variables

In this work, we denote by $\mathcal{H}$ the Hilbert space of all real-valued random variables with finite second moments [24, Ch. 20]. The inner product of any two elements $u,v\in\mathcal{H}$ is defined by $\langle u,v\rangle = E[uv]$. An inner product operating on two random vectors $u,v$ in the Hilbert space $\mathcal{H}^n$ of random $n$-vectors is defined by $\langle u,v\rangle_{\mathcal{H}} = E[u^{\top}v]$. The derivation used in this article involves defining the state of our observed system as a vector residing in a Hilbert space and then selecting an error-minimizing estimate that resides in a subspace. The following theorem [25, p. 51, Th. 2] is fundamental to this approach.

Theorem 2.1 (Hilbert Projection Theorem)

Let $\mathcal{G}$ be a Hilbert space and $M$ a closed subspace of $\mathcal{G}$. Corresponding to any vector $x\in\mathcal{G}$, there exists a unique vector $m_0\in M$ such that $\|x-m_0\| \leq \|x-m\|$ for all $m\in M$. Furthermore, a necessary and sufficient condition that $m_0\in M$ is the unique minimizing vector is that $(x-m_0)\perp M$.

2.2 Random Fields

This article models measurement noise as an additive random field and some care is required when manipulating these objects. Rigorous and extensive analysis of these fields can be found in a number of textbooks [26], [27], [28]. A brief summary of relevant definitions is given below.

Let $(\Theta,\mathcal{F},P)$ be a probability space and $\mathbb{R}^d$ a Euclidean space endowed with Lebesgue measure. A random field is a function $v:\mathbb{R}^d\times\Theta\rightarrow\mathbb{R}^m$ which is measurable with respect to the product measure on $\mathbb{R}^d\times\Theta$. (For concision, the dependence on $\theta$ will usually be suppressed.) A centered w.s.s. (wide-sense stationary) random field has the properties that i) $E[v(i)]=0$ for all $i$; ii) each component of the random vector $v(i)$ lies in $\mathcal{H}$ for all $i$; and iii) the covariance function $R$ depends only on the difference between index points, so that for any $i_1,i_2\in\mathbb{R}^d$, $E[v(i_1)v^{\top}(i_2)] = R(i_1-i_2)$. Such a field that is stationary over a domain $\mathcal{D}$ is often referred to as $\mathcal{D}$-stationary noise. In the special case where the covariance is also independent of direction, the noise is said to be isotropic. The stationarity condition arises in situations where the noise statistics possess translational invariance; in other words, the noise statistics look the same everywhere throughout the domain. Some sources of noise may also be locally stationary, in that they are closely approximated by stationary processes over finite windows. Spatial stationarity is a common assumption in a number of diverse contexts such as image denoising [3, 29, 30], radio signal propagation [31], and geostatistics [32, Ch. 74]. Figure 1 gives three example realizations of a stationary random field with a squared exponential covariance function and varying length scales.
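To make this construction concrete, the following sketch samples such a field numerically. It is a minimal illustration rather than part of this article's method: it assumes a coarse $40\times 40$ grid (the $100\times 100$ grid of Figure 1 makes the dense Cholesky factor large) and a small diagonal jitter for numerical stability.

```python
import numpy as np

# Minimal sketch: sample a zero-mean w.s.s. field with squared exponential
# covariance R(i, i') = exp(-||i - i'||^2 / (2 l^2)) on an N x N grid over
# [-0.5, 0.5]^2, as in Figure 1. Grid size and jitter are assumptions made
# for tractability of the dense Cholesky factorization.
def sample_field(length_scale, N=40, jitter=1e-6, seed=0):
    axis = np.linspace(-0.5, 0.5, N)
    X, Y = np.meshgrid(axis, axis)
    pts = np.column_stack([X.ravel(), Y.ravel()])          # (N^2, 2) index points
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    R = np.exp(-d2 / (2.0 * length_scale ** 2))            # stationary covariance
    L = np.linalg.cholesky(R + jitter * np.eye(N * N))     # jitter aids stability
    rng = np.random.default_rng(seed)
    return (L @ rng.standard_normal(N * N)).reshape(N, N)

fields = [sample_field(l) for l in (0.1, 0.01, 0.001)]     # cf. Fig. 1(a)-(c)
```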

2.3 The Multi-Dimensional Fourier Transform

A critical step in this article involves the transformation of an integral equation to an algebraic equation, facilitated by the multi-dimensional Fourier transform, which we now define.

Definition 2.1 (Multi-Dimensional Fourier Transform)

[33, Ch. 1] The multi-dimensional Fourier transform of any $f\in L_1(\mathbb{R}^d,\mathbb{R})$ is given by $\mathcal{F}\{f\}\in L_{\infty}(\mathbb{R}^d,\mathbb{R})$ and is defined as

$$(\mathcal{F}\{f\})(\omega) \triangleq \int_{\mathbb{R}^d} f(i)e^{-2\pi j\omega\cdot i}\,di,$$

where $\omega$ is a $d$-dimensional vector of spatial frequencies. We will denote the Fourier transform of any function $f$ by $\bar{f}$. The inverse Fourier transform is defined by

$$(\mathcal{F}^{-1}\{\bar{f}\})(i) = \int_{\mathbb{R}^d} \bar{f}(\omega)e^{2\pi j i\cdot\omega}\,d\omega.$$

This form of the inverse generally only holds when $\bar{f}(\omega)$ is also absolutely integrable [34].
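In computations, this transform can be approximated on a uniform grid with the fast Fourier transform. The following sketch is our own illustration (with $d=2$ and a Gaussian test function, both assumptions): it matches the $e^{-2\pi j\omega\cdot i}$ convention of Definition 2.1 by scaling the DFT by the grid cell area and centering the samples with fftshift/ifftshift.

```python
import numpy as np

# Minimal sketch: approximate the d-dimensional Fourier transform of
# Definition 2.1 (d = 2 here) by a Riemann sum implemented with np.fft.
N, half = 256, 2.0                       # samples per axis, domain [-2, 2)
dx = 2 * half / N
axis = -half + dx * np.arange(N)         # zero sits at index N // 2
X, Y = np.meshgrid(axis, axis, indexing="ij")
f = np.exp(-np.pi * (X**2 + Y**2))       # Gaussian: its transform is itself

# ifftshift moves the origin to index 0; dx**2 is the grid cell area.
F = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(f))) * dx**2
freqs = np.fft.fftshift(np.fft.fftfreq(N, d=dx))  # spatial frequencies omega

k0 = N // 2                              # omega = 0 after fftshift
print(F[k0, k0].real)                    # ~1.0, the transform at the origin
```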

2.4 Stabilizability and Detectability

This article will employ notions of stabilizability and detectability, which we define as follows:

Definition 2.2 (Stabilizability)

[35, p. 342] The matrix pair $(A,B)$ is stabilizable if there exists a matrix $C$ such that

$$\rho(A + BC^{\top}) < 1,$$

where $\rho(\cdot)$ is the spectral radius.

Definition 2.3 (Detectability)

[35, p. 342] The matrix pair $(A,B)$ is detectable if $(A^{\top},B)$ is stabilizable.
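Numerically, these properties can be checked with the equivalent Popov-Belevitch-Hautus (PBH) rank test rather than by searching for a matrix $C$ achieving $\rho(A+BC^{\top})<1$. The sketch below is illustrative; the matrices are arbitrary examples, not values from this article.

```python
import numpy as np

# Minimal sketch: PBH-type test. (A, B) is stabilizable iff
# rank([A - lam*I, B]) = n for every eigenvalue lam of A with |lam| >= 1;
# detectability of (A, B) is stabilizability of (A^T, B).
def is_stabilizable(A, B, tol=1e-9):
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1 - tol:              # only unstable/marginal modes matter
            M = np.hstack([A - lam * np.eye(n), B])
            if np.linalg.matrix_rank(M, tol) < n:
                return False
    return True

def is_detectable(A, B, tol=1e-9):
    return is_stabilizable(A.T, B, tol)

A = np.array([[1.1, 0.0], [0.0, 0.5]])       # one unstable mode at 1.1
B = np.array([[0.0], [1.0]])
print(is_stabilizable(A, B))                 # False: B misses the unstable mode
print(is_detectable(A, np.array([[1.0], [0.0]])))  # True: that mode is observed
```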

3 Problem Formulation and Assumptions

Consider the linear discrete-time system

$$x_{k+1} = Ax_k + w_k, \qquad (1)$$

with state $x_k\in\mathbb{R}^n$, additive process noise $w_k\in\mathbb{R}^n$, state transition matrix $A\in\mathbb{R}^{n\times n}$, and time index $k\in\mathbb{N}$. We omit a control input for clarity; the results derived in this article remain valid with minor changes if a control input is present and available to the estimator. The state is measured via an infinite-dimensional measurement field defined on the domain $\mathbb{R}^d$, that is,

$$z_k(i) = \gamma(i)x_k + v_k(i). \qquad (2)$$

Here the measurement domain is $i\in\mathbb{R}^d$, the measurement is $z_k:\mathbb{R}^d\rightarrow\mathbb{R}^m$, the measurement noise is $v_k:\mathbb{R}^d\rightarrow\mathbb{R}^m$, and the measurement function is $\gamma:\mathbb{R}^d\rightarrow\mathbb{R}^{m\times n}$.

The model presented in (1) and (2) is well-suited to systems where the state can be represented by a low-dimensional model but the observations are represented by a very high- or infinite-dimensional model. A major application area is robotics, in the context of Simultaneous Localization and Mapping (SLAM). The state to be estimated is often the pose of a mobile robot, which could be described, for example, as a 9-vector that represents the robot's position, velocity, and acceleration in $\mathbb{R}^3$. The state updates are generally linear, obeying Newtonian dynamics. The measurement sensors, on the other hand, can be of a much higher dimensionality.

When lighting conditions are adequate, optical cameras provide a rich source of data, and even a relatively cheap 10 megapixel camera will provide a ten-million-dimensional image every frame. In these conditions, vision-based localization can be particularly suited to GPS-denied areas, such as extra-terrestrial environments [36]. In the case of a colour camera, the measurement equation (2) would apply with a 3-vector ($m=3$) describing the RGB intensity of each pixel, defined on a 2-dimensional image domain ($d=2$).

Under low or no lighting conditions, where optical cameras do not provide sufficient data, lidar is often employed. This setup has applications in the mining sector [37, 38], where underground tunnels or other "visually degraded" environments must be traversed [39]. Although not generally as "pixel-dense" as optical cameras, the commonly used SICK LMS511 is capable of providing up to 4560 samples per scan, and higher-end lidar modules can provide upwards of a million samples per scan. This model is also suited to target-tracking problems, where the state to be estimated now represents a moving body, and the measurement sensors are often radar, lidar, optical, or an ensemble thereof [40].

We assume the additive measurement noise in (2) is a w.s.s. centered random field with bounded covariance function $R$, so that each element of $z(i)$ is in $\mathcal{H}$ for all $i\in\mathbb{R}^d$. We also assume that $\gamma$ is bounded and absolutely integrable over the domain $\mathbb{R}^d$ under the Frobenius or another matrix norm.

The process noise and measurement noise are such that for all $j,k\in\mathbb{N}$ and $i,i^{\prime}\in\mathbb{R}^d$,

$$\begin{aligned}
E[w_k] &= 0, \\
E[v_k(i)] &= 0, \\
E[v_k(i)w_j^{\top}] &= 0, \\
E[w_k w_j^{\top}] &= Q\,\delta_{j-k}, & Q &\in\mathbb{R}^{n\times n}, \\
E[v_k(i)v_j^{\top}(i^{\prime})] &= R(i-i^{\prime})\,\delta_{j-k}, & R(i-i^{\prime}) &\in\mathbb{R}^{m\times m}. \qquad (3)
\end{aligned}$$

Here $Q$ is a positive-definite matrix, $\delta_{j-k}$ is the discrete impulse, and $R$ is bounded and absolutely integrable.
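As an illustration of the model (1)-(2) under the noise conditions (3), the following sketch simulates a discretized scalar-measurement instance ($d=1$, $m=1$). All numerical values ($A$, $Q$, $\gamma$, the noise length scale) are assumptions chosen for the example, not quantities from this article.

```python
import numpy as np

# Minimal sketch: simulate (1)-(2) on a 1-D measurement grid. gamma(i) is
# a pair of assumed Gaussian bumps; v_k is stationary noise drawn from a
# squared exponential covariance R, consistent with the w.s.s. assumption.
rng = np.random.default_rng(0)
n, N, T = 2, 200, 50                      # state dim, grid points, time steps
grid = np.linspace(-3.0, 3.0, N)

A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = 0.01 * np.eye(n)
gamma = np.column_stack([np.exp(-(grid - c) ** 2) for c in (-1.0, 1.0)])  # (N, n)

ell = 0.2                                 # noise correlation length
R = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * ell ** 2))
Lr = np.linalg.cholesky(R + 1e-6 * np.eye(N))   # jitter for numerical stability

x = np.zeros(n)
for k in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)   # state update (1)
    z = gamma @ x + Lr @ rng.standard_normal(N)           # measurement (2)
```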

We will examine the performance of a linear filter of the form

$$\begin{cases}\hat{x}_k = A\hat{x}_{k-1} + K_k[z_k - \hat{z}_k], \\ \hat{z}_k(i) = \gamma(i)A\hat{x}_{k-1},\end{cases} \qquad (4)$$

under a mean square error (MSE) criterion $E[\|x_k-\hat{x}_k\|_2^2]$. Here $K_k$ is a linear mapping from the space of $\mathbb{R}^m$-valued fields on $\mathbb{R}^d$ to $\mathbb{R}^n$. We denote the innovation term by $s_k \triangleq z_k - \hat{z}_k$. Motivated by the Riesz representation theorem [41, p. 188], we assume the integral form

$$K_k s_k \triangleq \int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di, \qquad (5)$$

where $\kappa_k:\mathbb{R}^d\rightarrow\mathbb{R}^{n\times m}$ is an optimal gain function to be determined.
We now present the assumptions we need, but before we do so, we introduce the following useful terms:

$$\begin{aligned}
f(i) &\triangleq \mathcal{F}^{-1}\{\bar{\gamma}^{\top}\bar{R}^{-1}\}\in\mathbb{R}^{n\times m}, \\
S &\triangleq \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\in\mathbb{R}^{n\times n}. \qquad (6)
\end{aligned}$$

These terms are well-defined given the assumptions below, and we will also show in Section 5 that $S$ is symmetric and positive semi-definite; hence there exists a unique symmetric positive semi-definite matrix $G$ such that $S = GG$ [42, p. 439]. This matrix $G$ will be termed the principal square root of $S$. A summary of the key assumptions stated for our model is now presented.
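The quantities in (6) can be approximated numerically by carrying out the Fourier division on a grid. The sketch below ($d=1$, $m=1$, $n=2$; all kernels are illustrative assumptions) reuses a squared exponential noise covariance; the Gaussian columns of $\gamma$ are spectrally narrower than the noise kernel, in line with the bandwidth condition of Section 1.2, and a small floor on $\bar{R}$ guards against floating-point underflow at high frequencies.

```python
import numpy as np

# Minimal sketch: approximate f = F^{-1}{gamma_bar^T R_bar^{-1}} and
# S = int f(i) gamma(i) di from (6) on a 1-D grid.
N, half = 1024, 8.0
dx = 2 * half / N
grid = -half + dx * np.arange(N)                       # zero sits at index N // 2

def ft(g):   # Fourier transform in the convention of Definition 2.1
    return np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g))) * dx

def ift(G):  # corresponding inverse transform
    return np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(G))) / dx

gamma = np.column_stack([np.exp(-(grid - c) ** 2) for c in (-1.0, 1.0)])
R_bar = ft(np.exp(-grid ** 2 / (2 * 0.2 ** 2))).real   # noise spectral density
R_bar = np.maximum(R_bar, 1e-12)                       # underflow guard (cf. A3)

f = np.column_stack([ift(ft(gamma[:, a]) / R_bar).real for a in range(2)])
S = f.T @ gamma * dx                                   # Riemann sum for S in (6)

w, V = np.linalg.eigh(S)                               # S symmetric PSD
G = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # principal root: S = G G
```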

A1: $\gamma\in L_1(\mathbb{R}^d,\mathbb{R}^{m\times n})\cap L_{\infty}(\mathbb{R}^d,\mathbb{R}^{m\times n})$.

A2: $R\in L_1(\mathbb{R}^d,\mathbb{R}^{m\times m})\cap L_{\infty}(\mathbb{R}^d,\mathbb{R}^{m\times m})$.

A3: $\bar{R}(\omega)\in\mathbb{R}^{m\times m}$ is invertible for almost all $\omega$.

A4: $\bar{\gamma}^{\top}\bar{R}^{-1}\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$ and has an inverse Fourier transform in $L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})$.

A5: $\kappa_k\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$ for all $k$.

A6: The matrix pair $(A,Q)$ is stabilizable.

A7: The matrix pair $(A,G)$ is detectable.

Assumptions A1 and A2 will be used in Section 4.2 to ensure that the relevant Fourier transforms of $\gamma$ and $R$ are well-defined. Assumption A1 is also needed to extend the Fubini-Tonelli theorem to wide-sense stationary random fields, as shown in Appendix 9. Assumptions A3 and A4 will be needed in Section 4.2 to ensure the existence of an explicit form of the optimal gain function $\kappa$. Assumption A5 is imposed to ensure that the integral (5) is well-defined by Theorem 3.1. Whilst there are notions of observability in infinite-dimensional systems theory (cf. [22, 23]), in this article we do not require them. Instead, we shall assume that $(A,Q)$ is stabilizable (assumption A6) and that $(A,G)$ is detectable (assumption A7) in the sense of Definitions 2.2 and 2.3, where $G$ is a matrix-valued functional of the measurement function $\gamma$ and the measurement noise covariance $R$. This will ensure that the filter error covariance matrix converges as time grows. It should be noted that traditional detectability conditions do not typically depend on properties of the measurement noise [35, 22, 23].

We verify that the filter integral (5) is well-defined by adapting the classical Fubini-Tonelli theorem [43, Cor. 13.9] to w.s.s. random fields as follows:

Theorem 3.1 (Fubini-Tonelli for w.s.s. random fields)

Let $v:\mathbb{R}^d\times\Theta\rightarrow\mathbb{R}^m$ be a w.s.s. random field defined on the probability space $(\Theta,\mathcal{F},P)$, with $\theta\in\Theta$ representing a point in the sample space $\Theta$, and with bounded covariance function $\Sigma$. Let $s:\mathbb{R}^d\times\Theta\rightarrow\mathbb{R}^m$ be a random field given by $s \triangleq v + \gamma x$, where $\gamma\in L_{\infty}(\mathbb{R}^d,\mathbb{R}^{m\times n})$ and $x$ is a random vector with elements in $\mathcal{H}$ and covariance matrix $P$. Also let $\kappa:\mathbb{R}^d\rightarrow\mathbb{R}^{n\times m}$ be a matrix-valued function, absolutely integrable under the Frobenius matrix norm. Each component of the vector-valued function $\kappa(\cdot)s(\cdot,\cdot)$ is assumed to be measurable with respect to the product $\sigma$-algebra $\mathcal{L}\times\mathcal{F}$, where $\mathcal{L}$ is the $\sigma$-algebra corresponding to the $d$-dimensional Lebesgue measure. Then

$$\int_{\mathbb{R}^d}\int_{\Theta}\|\kappa(i)s(i,\theta)\|_1\,P(d\theta)\,di < \infty,$$

and furthermore

$$\int_{\Theta}\int_{\mathbb{R}^d}\kappa(i)s(i,\theta)\,di\,P(d\theta) = \int_{\mathbb{R}^d}\int_{\Theta}\kappa(i)s(i,\theta)\,P(d\theta)\,di.$$
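Although the proof is deferred to Appendix 9, the conclusion is easy to check numerically in a toy case. The sketch below is our own illustration, with $d=1$, $m=n=1$ and assumed kernels: it compares $E[\int\kappa(i)s(i)\,di]$, estimated by Monte Carlo, against $\int\kappa(i)E[s(i)]\,di$.

```python
import numpy as np

# Minimal sketch: Monte Carlo check of the interchange in Theorem 3.1.
# kappa is absolutely integrable, v is stationary with bounded squared
# exponential covariance, s = v + gamma * x with E[x] = 1 (all assumed).
rng = np.random.default_rng(1)
N, dx, trials = 200, 0.05, 4000
grid = dx * (np.arange(N) - N // 2)

kappa = np.exp(-np.abs(grid))                          # absolutely integrable
gamma = np.exp(-grid ** 2)
Sigma = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * 0.3 ** 2))
Lc = np.linalg.cholesky(Sigma + 1e-6 * np.eye(N))

lhs = 0.0
for _ in range(trials):                                # E[ int kappa(i) s(i) di ]
    x = 1.0 + rng.standard_normal()
    s = Lc @ rng.standard_normal(N) + gamma * x
    lhs += (kappa * s).sum() * dx
lhs /= trials

rhs = (kappa * gamma).sum() * dx                       # int kappa(i) E[s(i)] di
print(lhs, rhs)                                        # agree up to MC error
```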

The proof of Theorem 3.1 is given in Appendix 9.

4 Optimal Linear Filter Derivation

We wish to find the optimal state estimate $\hat{x}_k$, where $\hat{x}_k$ is some bounded linear functional of all measurements up to time $k$. Optimality in this work refers to the estimate which minimizes the mean square error $E[\|x_k-\hat{x}_k\|_2^2]$. This random $n$-vector minimization problem is equivalent to $n$ random scalar minimization problems, as minimization of each component norm will minimize the entire vector norm. Each component of $\hat{x}_k$ is then the orthogonal projection of the corresponding component of $x_k$ onto the space of all scalar-valued linear functionals of the preceding measurements. Let $x_k^l\in\mathcal{H}$ denote the $l^{th}$ component of the state vector $x_k$. Also let our observation at time $k$ be denoted by the random $m$-vector valued function $z_k:\mathbb{R}^d\rightarrow\mathbb{R}^m$, assumed measurable with respect to the product measure on $\mathbb{R}^d\times\Theta$, and let the set of all possible measurements at time $k$ be $\mathcal{Z}_k$. Section 4.1 states the optimality conditions of the filter, and Section 4.2 derives an explicit solution that satisfies these conditions. A brief reference for the spaces and variables used in the following derivation is given in Tables 2 and 3 respectively.

Table 2: Spaces used in Section 4 derivations.
Symbol | Description
$\mathcal{H}$ | Hilbert space of random variables with zero mean and finite second moment.
$\mathcal{Z}_k$ | Space of all possible measurements $z_k(\cdot)$.
$M_{k-1}$ | Space of all possible elements of $\mathcal{H}$ expressible via (8).
$M_k^-$ | Space of all possible elements of $\mathcal{H}$ expressible via (12).
$\mathcal{S}_k$ | Space of all possible elements of $\mathcal{H}$ expressible via (10).

Table 3: Variables used in Section 4 derivations.
Symbol | Description
$x$ | System state
$z$ | Measurement
$\hat{x}$ | Optimal estimate of $x$ given past measurements
$\hat{z}$ | Optimal estimate of $z$ given past measurements
$s$ | Innovation vector $z-\hat{z}$
$\phi,\alpha$ | Arbitrary vectors in $L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m})$
$m$ | Arbitrary element expressible via (8)
$\tilde{s}$ | Arbitrary element expressible via (10)
$\beta$ | Optimal post-measurement correction term
$\kappa$ | Optimal kernel which generates $\beta$
$(\cdot)_k$ | The object $(\cdot)$ at time step $k$

4.1 Optimality Conditions

In this section, we will prove the following key theorem:

Theorem 4.1

The matrix kernel $\kappa_k(i)\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$ for (4) is optimal in the mean square error sense if and only if

$$\int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)P_{k|k-1}\gamma^{\top}(i^{\prime}) + R(i-i^{\prime})\Big)\,di = P_{k|k-1}\gamma^{\top}(i^{\prime}), \qquad (7)$$

where $P_k$ and $P_{k|k-1}$ denote the posterior and prior covariance matrices of the filter respectively.

Before we provide a proof of Theorem 4.1, we must first define some important sets, and projections onto those sets.
Consider the set of all possible elements of $\mathcal{H}$ which can be expressed as a bounded linear integral functional operating over all previous measurements:

$$M_{k-1} \triangleq \bigg\{m_{k-1}\in\mathcal{H} : m_{k-1} = \sum_{j=1}^{k-1}\int_{\mathbb{R}^d}\phi_j(i)z_j(i)\,di, \text{ for some } \phi_j\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m})\bigg\}. \qquad (8)$$

Note that $M_{k-1}$ is a subspace of $\mathcal{H}$. We will also assume for the moment that $M_{k-1}$ is a closed subspace of $\mathcal{H}$. Suppose that we are already in possession of the optimal estimate of $x_k^l$ given the previous measurements. This estimate is the orthogonal projection of $x_k^l$ onto $M_{k-1}$,

$$\hat{x}_{k|k-1}^l \triangleq \operatorname{proj}_{M_{k-1}} x_k^l.$$

Given the measurements associated with $M_{k-1}$, $\hat{x}_{k|k-1}^l$ is the best estimate of $x_k^l$ in the sense that it is the element of $M_{k-1}$ that is the unique $\mathcal{H}$-norm minimizing vector in relation to the $l^{th}$ component of the true state.

Let $z_k^p$ be the $p^{th}$ element of $z_k$ and $\hat{z}_k^p$ denote the orthogonal projection of $z_k^p$ onto $M_{k-1}$,

$$\hat{z}_k^p(i) \triangleq \operatorname{proj}_{M_{k-1}} z_k^p(i) \quad \forall i.$$

The measurement innovation vector is $s_k(i) \triangleq z_k(i) - \hat{z}_k(i)$, the $p^{th}$ entry of which is denoted $s_k^p(i)\in\mathbb{R}$. Note that for all $m_{k-1}\in M_{k-1}$,

$$s_k^p(i) \perp m_{k-1} \quad \forall i, \qquad (9)$$

and let

$$\mathcal{S}_k \triangleq \bigg\{\tilde{s}_k\in\mathcal{H} : \tilde{s}_k = \sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)s_k(i)\,di, \text{ for some } \alpha_k^p\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m})\bigg\}, \qquad (10)$$

which is a subspace of $\mathcal{H}$. Here $\alpha_k\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$, with the $p^{th}$ row given by $\alpha_k^p$.

Lemma 4.2 ($\mathcal{S}_k \perp M_{k-1}$)

Given the previous definitions of $\mathcal{S}_k$ and $M_{k-1}$, every element of $\mathcal{S}_k$ is orthogonal to every element of $M_{k-1}$.

Proof 4.3.

Since $\alpha_k^p\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})$ and $s_k^p$ is the sum of a random field with bounded covariance function and an absolutely integrable function, we may apply Theorem 3.1 and exchange the integration and expectation operators:

$$\begin{aligned}
\Big\langle\tilde{s}_k, m_{k-1}\Big\rangle_{\mathcal{H}} &= \Big\langle\sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)s_k(i)\,di,\ m_{k-1}\Big\rangle_{\mathcal{H}} \\
&= E\Big[\Big(\sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)s_k(i)\,di\Big)m_{k-1}\Big] \\
&= \sum_{p=1}^m\int_{\mathbb{R}^d}\alpha_k^p(i)E[s_k(i)\,m_{k-1}]\,di \\
&= 0, \qquad (11)
\end{aligned}$$

the final equality following from the orthogonality property stated in (9).

Having shown that the spaces $\mathcal{S}_k$ and $M_{k-1}$ are orthogonal, we will now find the projection of the true state $x_k$ onto the space of all possible elements in $\mathcal{H}$ that can be expressed by a bounded linear integral functional operating over all previous measurements and the current measurement. Let $M_k^-$ be the set of all possible outputs of linear functionals of any element in $\mathcal{Z}_k$,

$$M_k^- \triangleq \bigg\{m_k^-\in\mathcal{H} : m_k^- = \int_{\mathbb{R}^d}\phi(i)z_k(i)\,di, \text{ for some } \phi\in L_1(\mathbb{R}^d,\mathbb{R}^{1\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{1\times m}),\ z_k\in\mathcal{Z}_k\bigg\}. \qquad (12)$$

Note that using this definition, it immediately follows that $M_{k-1} + M_k^- = M_k$, where the addition of sets is given by the Minkowski sum

$$U + V = \{u + v \,|\, u\in U,\ v\in V\}.$$

We also use the $\oplus$ operator to refer to a direct sum of subspaces, where any element within the direct sum is uniquely represented by a sum of elements within the relevant subspaces [44]. We wish to find the projection of $x_k^l$ onto the space $M_{k-1} + M_k^-$, which is given by

$$\begin{aligned}
\hat{x}_k^l &= \operatorname{proj}_{M_{k-1}+M_k^-} x_k^l \\
&= \operatorname{proj}_{M_{k-1}\oplus\mathcal{S}_k} x_k^l \\
&= \operatorname{proj}_{M_{k-1}} x_k^l + \operatorname{proj}_{\mathcal{S}_k} x_k^l \\
&= \hat{x}_{k|k-1}^l + \beta_k^l, \qquad \beta_k^l \triangleq \operatorname{proj}_{\mathcal{S}_k} x_k^l. \qquad (13)
\end{aligned}$$

This sum represents a decomposition of the projected state component given measurements up to time $k$ as a unique sum of a component within $M_{k-1}$ and a component orthogonal to $M_{k-1}$. Note that as $\beta_k^l\in\mathcal{S}_k$, it can be formulated as $\sum_{p=1}^m\int_{\mathbb{R}^d}\kappa_k^{l,p}(i)s_k^p(i)\,di$, each $\kappa_k^{l,p}(i)$ being the $l^{th}$ row, $p^{th}$ column element of some $\kappa_k(i)\in\mathbb{R}^{n\times m}$ which must be determined. This matrix-valued function $\kappa_k(i)$ is equivalent to our optimal gain. We are now ready to prove Theorem 4.1, which we will do by formulating necessary and sufficient conditions on the gain such that the optimal post-measurement correction term $\beta_k = \int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di$ is generated. The proof of Theorem 4.1 is as follows:

Proof 4.4.

We will denote the $l^{th}$ column of $\kappa_k^{\top}(i)$ as $\kappa_k^l(i)\in\mathbb{R}^m$. By the Hilbert projection theorem, $(x_k^l-\beta_k^l)\perp\mathcal{S}_k \implies \langle x_k^l-\beta_k^l, \tilde{s}_k\rangle_{\mathcal{H}} = 0$ for all $\tilde{s}_k\in\mathcal{S}_k$, where $\beta_k^l$ is guaranteed to be unique. We then have that $\langle x_k^l, \tilde{s}_k\rangle_{\mathcal{H}} = \langle\beta_k^l, \tilde{s}_k\rangle_{\mathcal{H}}$ which, in conjunction with the fact that $\langle\hat{x}_{k|k-1}^l, \tilde{s}_k\rangle_{\mathcal{H}} = 0$, implies

$$\langle x_k^l - \hat{x}_{k|k-1}^l, \tilde{s}_k\rangle_{\mathcal{H}} = \langle\beta_k^l, \tilde{s}_k\rangle_{\mathcal{H}}. \qquad (14)$$

This is a necessary and sufficient condition for $\beta_k^l$ to be the projection of our state component onto the space $\mathcal{S}_k$. We now must find the appropriate optimal gain function $\kappa_k(i)$ such that $\beta_k$, the corresponding vector generated, satisfies condition (14) for all $l$. Expanding out the definitions of $\beta_k^l$ and $\tilde{s}_k$, interchanging integration and summation, and rearranging, leads to

$$\bigg\langle x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}\sum_{p=1}^m\kappa_k^{l,p}(i)s_k^p(i)\,di,\ \int_{\mathbb{R}^d}\sum_{p=1}^m\alpha_k^p(i)s_k^p(i)\,di\bigg\rangle_{\mathcal{H}} = 0.$$

Note that $\sum_{p=1}^m\kappa_k^{l,p}(i)s_k^p(i)$ is simply the vector multiplication $s_k^{\top}(i)\kappa_k^l(i)$. The same reasoning may be applied to show $\sum_{p=1}^m\alpha_k^p(i)s_k^p(i) = s_k^{\top}(i)\alpha_k(i)$. We then have for all $l$

$$\begin{aligned}
0 &= \bigg\langle x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}\sum_{p=1}^m\kappa_k^{l,p}(i)s_k^p(i)\,di,\ \int_{\mathbb{R}^d}\sum_{p=1}^m\alpha_k^p(i^{\prime})s_k^p(i^{\prime})\,di^{\prime}\bigg\rangle_{\mathcal{H}} \\
&= \bigg\langle x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di,\ \int_{\mathbb{R}^d}s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,di^{\prime}\bigg\rangle_{\mathcal{H}} \\
&= E\bigg[\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)\int_{\mathbb{R}^d}s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,di^{\prime}\bigg] \\
&= \int_{\Theta}\int_{\mathbb{R}^d}\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,di^{\prime}\,dP(\theta).
\end{aligned}$$

Note that as $s_k$ is a random field with bounded covariance function, and $\alpha_k$ is composed of absolutely integrable components, we may employ Theorem 3.1 and interchange the expectation and integration operators. Therefore, for all $l$,

$$\begin{aligned}
0 &= \int_{\mathbb{R}^d}\int_{\Theta}\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\alpha_k(i^{\prime})\,dP(\theta)\,di^{\prime} \\
&= \int_{\mathbb{R}^d}E\bigg[\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\bigg]\alpha_k(i^{\prime})\,di^{\prime}.
\end{aligned}$$

Because $\alpha_k$ is arbitrary, we must have that for all $l$ and almost all $i^{\prime}$

$$E\bigg[\bigg(x_k^l - \hat{x}_{k|k-1}^l - \int_{\mathbb{R}^d}s_k^{\top}(i)\kappa_k^l(i)\,di\bigg)s_k^{\top}(i^{\prime})\bigg] = 0. \qquad (15)$$

As the first argument is a scalar and the second a vector, the condition that this holds true for all $l$ can be conveyed by a single matrix equality. Note that $s_k^{\top}(i)\kappa_k^l(i)$ is equivalent to the $l^{th}$ row of $\kappa_k(i)s_k(i)$. By stacking the $l$ row vectors in (15) to form an $\mathbb{R}^{n\times m}$ matrix we may impose the matrix equality that for almost all $i^{\prime}$

$$E\bigg[\bigg(x_k - \hat{x}_{k|k-1} - \int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di\bigg)s_k^{\top}(i^{\prime})\bigg] = 0. \qquad (16)$$

Substituting our expression for $s_k(i)$ into (16), we find

$$\begin{aligned}
&E\bigg[\bigg(x_k - \hat{x}_{k|k-1} - \int_{\mathbb{R}^d}\kappa_k(i)\big[\gamma(i)(x_k-\hat{x}_{k|k-1}) + v_k(i)\big]\,di\bigg)\big[\gamma(i^{\prime})(x_k-\hat{x}_{k|k-1}) + v_k(i^{\prime})\big]^{\top}\bigg] \\
&= E\big[(x_k-\hat{x}_{k|k-1})(x_k-\hat{x}_{k|k-1})^{\top}\big]\gamma^{\top}(i^{\prime}) \\
&\quad - \int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)E\big[(x_k-\hat{x}_{k|k-1})(x_k-\hat{x}_{k|k-1})^{\top}\big]\gamma^{\top}(i^{\prime}) + E\big[v_k(i)v_k^{\top}(i^{\prime})\big]\Big)\,di = 0.
\end{aligned}$$

Let the posterior covariance matrix $E[(x_k-\hat{x}_k)(x_k-\hat{x}_k)^{\top}]$ be denoted by $P_k$, and let the prior covariance matrix $E[(x_k-\hat{x}_{k|k-1})(x_k-\hat{x}_{k|k-1})^{\top}]$ be denoted by $P_{k|k-1}$. Substituting these terms for the corresponding expectations, we find that the optimal gain function $\kappa_k(i)$ must satisfy the condition

$$P_{k|k-1}\gamma^{\top}(i^{\prime}) - \int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)P_{k|k-1}\gamma^{\top}(i^{\prime}) + R(i-i^{\prime})\Big)\,di = 0.$$

Rearranging these terms, we arrive at the necessary and sufficient condition

$$\int_{\mathbb{R}^d}\kappa_k(i)\Big(\gamma(i)P_{k|k-1}\gamma^{\top}(i^{\prime}) + R(i-i^{\prime})\Big)\,di = P_{k|k-1}\gamma^{\top}(i^{\prime}). \qquad (17)$$

An alternative derivation based on a calculus of variations approach is given in [21]. The derivation above, which uses the Hilbert projection theorem, has the advantage of guaranteed uniqueness of the correction term $\int_{\mathbb{R}^d}\kappa_k(i)s_k(i)\,di$ and state estimate $\hat{x}_k$, which follows immediately from Theorem 2.1. This approach avoids the use of the functional derivative, which has attracted some criticism [8], [5], and closely mirrors the methodology of the original Kalman proof [45].

Recall that we initially assumed $M_{k-1}$ to be a closed subspace. We will now show that an explicit closed-form solution for the optimal gain function can be derived via $d$-dimensional Fourier methods, thanks to the w.s.s. assumption on the measurement noise. We will then show that when a unique gain function satisfying (17) exists, $M_{k-1}$ is indeed closed; this is established in Corollary 4.5.

4.2 An Explicit Solution for the Optimal Gain Function

We may now proceed to find an explicit expression for the optimal gain function that satisfies (7). Consider the matrix-valued functional

$$F(\kappa_k) \triangleq \bigg(I - \int_{\mathbb{R}^d}\kappa_k(i)\gamma(i)\,di\bigg)P_{k|k-1} \in\mathbb{R}^{n\times n} \qquad (18)$$

and note that

$$\int_{\mathbb{R}^d}\kappa_k(i)R(i^{\prime}-i)\,di = F(\kappa_k)\gamma^{\top}(i^{\prime}). \qquad (19)$$

The left-hand side of (19) is simply a multi-dimensional convolution. Many useful properties of the scalar Fourier transform carry over to the multi-dimensional Fourier transform. We will make use of the convolution property [46],

$$\mathcal{F}\bigg\{\int_{\mathbb{R}^d}f(i_1-y_1,\dots,i_d-y_d)g(y_1,\dots,y_d)\,dy_1\dots dy_d\bigg\} = \mathcal{F}\{(f\ast g)(i_1,i_2,\dots,i_d)\} = (\mathcal{F}\{f\})(\omega)(\mathcal{F}\{g\})(\omega). \qquad (20)$$

This enables us to express (19) equivalently as

$$(\kappa_k\ast R)(i^{\prime}) = F(\kappa_k)\gamma^{\top}(i^{\prime}). \qquad (21)$$

Note that $\omega$ will be a vector of the same dimension as $i$. We now require three assumptions (A1-A3) to ensure the existence and invertibility of the Fourier transform of $R$, as well as the existence of the Fourier transform of $\gamma^{\top}$. Applying (20) to (21) implies

$$\begin{aligned}
\bar{\kappa}_k(\omega)\bar{R}(\omega) &= F(\kappa_k)\bar{\gamma}^{\top}(\omega) \\
\bar{\kappa}_k(\omega) &= F(\kappa_k)\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1} \\
\kappa_k(i) &= F(\kappa_k)f(i), \qquad f(i) = \mathcal{F}^{-1}\{\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1}\}. \qquad (22)
\end{aligned}$$

For this method to be applicable we must also impose assumption A4. Combining (18) and (22) yields

$$\begin{aligned}
F(\kappa_k) &= \bigg(I - F(\kappa_k)\int_{\mathbb{R}^d}f(i)\gamma(i)\,di\bigg)P_{k|k-1} \\
\implies F(\kappa_k) &= P_{k|k-1}\bigg(I + \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\,P_{k|k-1}\bigg)^{-1} \qquad (23) \\
\implies \kappa_k(i) &= P_{k|k-1}\bigg(I + \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\,P_{k|k-1}\bigg)^{-1}f(i).
\end{aligned}$$

Note that this directly implies that for all $k\in\mathbb{N}$, $\kappa_k\in L_1(\mathbb{R}^d,\mathbb{R}^{n\times m})\cap L_2(\mathbb{R}^d,\mathbb{R}^{n\times m})$. For simplicity of notation, recall from (6) that $S = \int_{\mathbb{R}^d}f(i)\gamma(i)\,di\in\mathbb{R}^{n\times n}$. Having thus found the optimal $\kappa_k(i)$ function which generates $\beta_k$, we now modify (13) to find, for any state component $l$,

$$\operatorname{proj}_{M_{k-1}+M_k^-} x_k^l = \hat{x}_{k|k-1}^l + \sum_{p=1}^m\int_{\mathbb{R}^d}\big[P_{k|k-1}(I+SP_{k|k-1})^{-1}f(i)\big]^{l,p}s_k^p(i)\,di.$$

These $l$ equations may be stacked into a single matrix equation of the form

$$\operatorname{proj}_{M_{k-1}+M_k^-} x_k = \hat{x}_{k|k-1} + \int_{\mathbb{R}^d}P_{k|k-1}(I+SP_{k|k-1})^{-1}f(i)s_k(i)\,di, \qquad (24)$$

where the projection operator acting on the state vector is understood to represent the vector of componentwise projections. We have thus far assumed that $\hat{x}_{k|k-1}$ is known to us; we now complete the chain by determining this value explicitly as

$$\begin{aligned}
\hat{x}_{k|k-1} &= \operatorname{proj}_{M_{k-1}} x_k \\
&= \operatorname{proj}_{M_{k-1}} Ax_{k-1} + \operatorname{proj}_{M_{k-1}} w_{k-1} \\
&= A\operatorname{proj}_{M_{k-1}} x_{k-1} \\
&= A\operatorname{proj}_{M_{k-2}+M_{k-1}^-} x_{k-1} \\
&= A\hat{x}_{k-1}. \qquad (25)
\end{aligned}$$

The form of (24) with this substitution is then

$$\begin{aligned}
\hat{x}_k &= \operatorname{proj}_{M_{k-1}+M_k^-} x_k \\
&= A\hat{x}_{k-1} + P_{k|k-1}(I+SP_{k|k-1})^{-1}\int_{\mathbb{R}^d}f(i)s_k(i)\,di \\
&= A\hat{x}_{k-1} + P_{k|k-1}(I+SP_{k|k-1})^{-1}\int_{\mathbb{R}^d}f(i)\big(z_k(i)-\gamma(i)\hat{x}_{k|k-1}\big)\,di,
\end{aligned}$$

where in the last line we substitute $s_k(i) = z_k(i)-\hat{z}_k(i)$, noting that

$$\begin{aligned}
\hat{z}_k(i) &= \operatorname{proj}_{M_{k-1}} z_k(i) \\
&= \operatorname{proj}_{M_{k-1}}\big(\gamma(i)x_k + v_k(i)\big) \\
&= \gamma(i)\operatorname{proj}_{M_{k-1}} x_k \\
&= \gamma(i)\hat{x}_{k|k-1}.
\end{aligned}$$

Finally, we will show that the space $M_{k-1}$ is indeed closed, as assumed in Section 4.1. Hence it admits a unique projection by the Hilbert projection theorem (Theorem 2.1), implying the uniqueness of the optimal filter derived above.

Corollary 4.5.

Given the system defined in Section 3 and the conditions given in Section 4.1, the space $M_{k-1}$ is a closed subspace of $\mathcal{H}$.

Proof 4.6.

The optimal gain function given by (22) represents a kernel defining an operator $K_k:\mathcal{H}\rightarrow\mathcal{S}_k$ as presented in (5). This operator generates a unique $\beta_k = K_k x_k$ such that for each element $l$, $(x_k^l-\beta_k^l)\perp\mathcal{S}_k$. This unique $\beta_k$ is in fact a minimizer, as for all $\tilde{s}_k\in\mathcal{S}_k$

$$\begin{aligned}
\|\tilde{s}_k - x_k^l\|_{\mathcal{H}}^2 &= \|\beta_k^l - x_k^l + \tilde{s}_k - \beta_k^l\|_{\mathcal{H}}^2 \\
&= \|\beta_k^l - x_k^l\|_{\mathcal{H}}^2 + \|\tilde{s}_k - \beta_k^l\|_{\mathcal{H}}^2 + 2\langle\beta_k^l - x_k^l,\ \tilde{s}_k - \beta_k^l\rangle_{\mathcal{H}} \\
&= \|\beta_k^l - x_k^l\|_{\mathcal{H}}^2 + \|\tilde{s}_k - \beta_k^l\|_{\mathcal{H}}^2 \\
\implies \|x_k^l - \tilde{s}_k\|_{\mathcal{H}}^2 &\geq \|x_k^l - \beta_k^l\|_{\mathcal{H}}^2,
\end{aligned}$$

with equality iff $\tilde{s}_k = \beta_k^l$.

Let us now examine an arbitrary Cauchy sequence $\{s_1,s_2,\dots,s_n\}$ with each element in $\mathcal{S}_k$. We also denote $\lim_{n\rightarrow\infty}s_n = \bar{s}$. As $\mathcal{S}_k$ is a subspace of the closed space $\mathcal{H}$, it is guaranteed that $\bar{s}\in\mathcal{H}$ exists. For any $\bar{s}\in\mathcal{H}$ there exists a unique minimizer $\beta^l\in\mathcal{S}_k$ such that

$$\begin{aligned}
\|\beta^l - \bar{s}\|_{\mathcal{H}}^2 &\leq \|s - \bar{s}\|_{\mathcal{H}}^2 \quad \forall s\in\mathcal{S}_k \\
\implies \|\beta^l - \bar{s}\|_{\mathcal{H}}^2 &\leq \|s_n - \bar{s}\|_{\mathcal{H}}^2 \quad \forall n.
\end{aligned}$$

As the upper bound goes to zero as $n$ increases, and the inequality holds for all $n$, $\bar{s}$ must equal $\beta^l$ and hence also reside in the space $\mathcal{S}_k$. As the Cauchy sequence was arbitrary, $\mathcal{S}_k$ contains all its limit points and is hence closed. It is well known that if two closed linear subspaces of a Hilbert space are orthogonal, then the direct sum of these subspaces is also closed [47, p. 38]. This result, combined with the fact that $M_k = M_0\oplus\mathcal{S}_1\oplus\mathcal{S}_2\oplus\dots\oplus\mathcal{S}_k$, implies that if $M_0$ is a closed space, then $M_k$ is a closed space for any $k$.

4.3 The Optimal Filter

We now have sufficient knowledge to state a key theorem.

Theorem 4.7.

For the system defined in Section 3, at each time step $k$, the optimal filter gain $\kappa_k(i)$ and associated covariance matrices are given by the equations

$$\begin{aligned}
\kappa_k(i) &= P_k f(i), \\
P_{k|k-1} &= AP_{k-1}A^{\top} + Q, \\
P_k &= P_{k|k-1}\big(I + SP_{k|k-1}\big)^{-1},
\end{aligned}$$

where $f(i) = \mathcal{F}^{-1}\{\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1}\}$ and $S = \int_{\mathbb{R}^d}f(i)\gamma(i)\,di$.

Proof 4.8.

The optimal gain function κk\kappa_{k} has been derived in Section 4.2 and is given in (22). It remains to derive the covariance matrix update equations. First, we connect the prior covariance matrix to the covariance matrix of the previous time step, via

Pk|k1\displaystyle P_{k|k-1} =E[(xkx^k|k1)(xkx^k|k1)]\displaystyle=E[(x_{k}-\hat{x}_{k|k-1})(x_{k}-\hat{x}_{k|k-1})^{\top}]
=AE[(xk1x^k1)(xk1x^k1)]A+Q\displaystyle=AE[(x_{k-1}-\hat{x}_{k-1})(x_{k-1}-\hat{x}_{k-1})^{\top}]A^{\top}+Q
=APk1A+Q.\displaystyle=AP_{k-1}A^{\top}+Q. (26)

Then we determine the relation between PkP_{k} and Pk|k1P_{k|k-1} by first observing that expansion and substitution yields the equation

Pk=E[ekek]\displaystyle P_{k}=E[e_{k}e_{k}^{\top}]\mkern-14.0mu
=Pk|k1\displaystyle=P_{k|k-1} dκk(i)γ(i)𝑑iPk|k1Pk|k1dγ(i)κk(i)𝑑i\displaystyle-\int_{\mathbb{R}^{d}}\mkern-7.0mu\kappa_{k}(i)\gamma(i)diP_{k|k-1}-P_{k|k-1}\int_{\mathbb{R}^{d}}\mkern-7.0mu\gamma^{\top}(i)\kappa_{k}^{\top}(i)di
+dκk(i)γ(i)𝑑iPk|k1dγ(i)κk(i)𝑑i\displaystyle+\int_{\mathbb{R}^{d}}\kappa_{k}(i)\gamma(i)diP_{k|k-1}\int_{\mathbb{R}^{d}}\gamma^{\top}(i)\kappa_{k}^{\top}(i)di
+ddκk(i)R(ii)κk(i)𝑑i𝑑i.\displaystyle+\int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}\kappa_{k}(i)R(i-i^{\prime})\kappa_{k}(i^{\prime})^{\top}\,di\,di^{\prime}.

By substitution of (7), all but the first two terms cancel to give us

$$P_{k}=\left(I-\int_{\mathbb{R}^{d}}\kappa_{k}(i)\gamma(i)\,di\right)P_{k|k-1}. \qquad (27)$$

But this is precisely the equation for $F(\kappa_{k})$ defined in (18), so from (23) we simply have

$$P_{k}=F(\kappa_{k})=P_{k|k-1}\left(I+\int_{\mathbb{R}^{d}}f(i)\gamma(i)\,di\,P_{k|k-1}\right)^{-1}=P_{k|k-1}\left(I+SP_{k|k-1}\right)^{-1}, \qquad (28)$$

which is the stated covariance update, completing the proof.

5 Optimal Linear Filter Stability

We will now show that the derived filter provided by Theorem 4.7 is mean square stable under assumptions A1-A7. To do so we will establish via Lemma 5.1 that the associated covariance matrices obey a discrete-time algebraic Riccati equation. We will then, in Theorem 5.3, show that, so long as the initial covariance matrix is non-negative and symmetric, there exists a unique asymptotic a priori covariance matrix that satisfies this Riccati equation.

Lemma 5.1.

Consider the filter equations provided by Theorem 4.7 for the system defined in Section 3. The matrix $S$ is positive semi-definite, and it is finite under assumptions A1 and A4. The matrix $S$ admits a unique square root decomposition $S=GG$ for some symmetric positive semi-definite $G\in\mathbb{R}^{n\times n}$. Furthermore, for $k\geq 1$, the a priori covariance matrices $P_{k+1|k}$ satisfy the discrete-time algebraic Riccati equation

$$P_{k+1|k}= AP_{k|k-1}A^{\top} - AP_{k|k-1}G\left(I+G^{\top}P_{k|k-1}G\right)^{-1}G^{\top}P_{k|k-1}A^{\top}+Q. \qquad (29)$$
Proof 5.2.

First we find that $\bar{R}(\omega)$ is Hermitian, as

$$\begin{aligned}
\bar{R}(\omega)^{*} &= \int_{\mathbb{R}^{d}}e^{2\pi j\omega\cdot i}R(i)^{\top}\,di \\
&= \int_{\mathbb{R}^{d}}e^{2\pi j\omega\cdot i}R(-i)\,di \\
&= \int_{\mathbb{R}^{d}}e^{-2\pi j\omega\cdot i}R(i)\,di = \bar{R}(\omega),
\end{aligned}$$

where $(\cdot)^{*}$ denotes the complex conjugate transpose. $\bar{R}(\omega)$ is also positive semi-definite, as for almost all $\omega$ and all $u\in\mathbb{C}^{m}$,

$$\begin{aligned}
u^{*}\bar{R}(\omega)u &= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}E[u^{*}v(\ell+i)v(\ell)^{\top}u]\,di, \quad \forall\ell\in\mathbb{R}^{d} \\
&= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}E\left[\left(v(\ell)^{\top}u\right)\left(v(\ell+i)^{\top}u\right)^{*}\right]di.
\end{aligned}$$

Define the stationary complex-valued random field $\psi(\ell)\triangleq v(\ell)^{\top}u\in\mathbb{C}$. Denote the corresponding auto-covariance function and power spectral density as $K_{\psi}(i)$ and $S_{\psi}(\omega)$ respectively. It then follows that

$$\begin{aligned}
u^{*}\bar{R}(\omega)u &= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}E\left[\psi(\ell)\psi(\ell+i)^{*}\right]di \\
&= \int_{\mathbb{R}^{d}}e^{-2\pi ji\cdot\omega}K_{\psi}(i)\,di = S_{\psi}(\omega),
\end{aligned}$$

where the last equality is given by the Wiener-Khintchine-Einstein relation [48, p. 82]. As $S_{\psi}(\omega)$ is non-negative, $\bar{R}(\omega)$ is positive semi-definite for almost all $\omega$. As $\bar{R}(\omega)$ is invertible by assumption A3, $\bar{R}(\omega)$ is positive-definite for almost all $\omega$. It follows directly that $\bar{R}(\omega)^{-1}$ is Hermitian and positive-definite for almost all $\omega$.

It can now be shown that $S$ is symmetric and positive semi-definite, as

$$\begin{aligned}
S &= \int_{\mathbb{R}^{d}}f(i)\gamma(i)\,di \\
&= \int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}e^{2\pi ji\cdot\omega}\,\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\gamma(i)\,d\omega\,di \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\int_{\mathbb{R}^{d}}e^{2\pi ji\cdot\omega}\gamma(i)\,di\,d\omega \qquad (30) \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\mathcal{F}^{-1}\{\gamma(i)\}\,d\omega \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{\top}\bar{R}(\omega)^{-1}\bar{\gamma}(-\omega)\,d\omega \\
&= \int_{\mathbb{R}^{d}}\bar{\gamma}(-\omega)^{\top}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)\,d\omega.
\end{aligned}$$

Due to the fact that $\bar{R}(\cdot)^{-1}$ is Hermitian and $\gamma$ is real-valued, we then have

$$S = \int_{\mathbb{R}^{d}}\bar{\gamma}(\omega)^{*}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)\,d\omega.$$

The interchange of integral operators in (30) must be justified. This is permitted by the Fubini-Tonelli theorem [43, Cor. 13.9], as each element of $\bar{\gamma}(\omega)^{*}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)$ is absolutely integrable. This can be seen by noting that every element of $\bar{\gamma}(\omega)^{*}$ is bounded, and that every element of $\bar{R}^{-1}(\omega)\bar{\gamma}(\omega)=\mathcal{F}\{f(i)^{\top}\}$ is absolutely integrable, ensuring that every element of $\bar{\gamma}(\omega)^{*}\bar{R}(\omega)^{-1}\bar{\gamma}(\omega)$ is a finite sum of products of absolutely integrable functions and bounded functions, and hence is itself absolutely integrable.

This ensures $S$ is symmetric. $\bar{R}(\omega)^{-1}$ is a Hermitian positive-definite matrix and so admits a unique Cholesky decomposition $\bar{R}(\omega)^{-1}=D^{*}(\omega)D(\omega)$ with $\mathrm{rank}(D(\omega))=m$ for almost all $\omega\in\mathbb{R}^{d}$. We can then state that

$$\begin{aligned}
S &= \int_{\mathbb{R}^{d}}\left(D(\omega)\bar{\gamma}(\omega)\right)^{*}D(\omega)\bar{\gamma}(\omega)\,d\omega \\
\implies x^{\top}Sx &= \|D(\cdot)\bar{\gamma}(\cdot)x\|^{2}_{L_{2}}\geq 0, \quad \forall x\in\mathbb{R}^{n},
\end{aligned}$$

hence $S$ is positive semi-definite. As $S$ is symmetric positive semi-definite, it also admits a unique decomposition $S=GG$ for some symmetric positive semi-definite $G=S^{\frac{1}{2}}\in\mathbb{R}^{n\times n}$ [42, p. 440], proving the first lemma assertion.

Now, Procedure 1 asserts that

$$\begin{aligned}
P_{k+1|k} &= AP_{k}A^{\top}+Q, \\
P_{k} &= P_{k|k-1}\left(I+SP_{k|k-1}\right)^{-1}. \qquad (31)
\end{aligned}$$

Substituting the form $S=GG=GG^{\top}$ into (31), we have that

$$\begin{aligned}
P_{k+1|k} &= AP_{k|k-1}\left(I+SP_{k|k-1}\right)^{-1}A^{\top}+Q \\
&= AP_{k|k-1}\left(I+GG^{\top}P_{k|k-1}\right)^{-1}A^{\top}+Q \\
&= AP_{k|k-1}\left(I-G\left(I+G^{\top}P_{k|k-1}G\right)^{-1}G^{\top}P_{k|k-1}\right)A^{\top}+Q,
\end{aligned}$$

which follows from the Woodbury matrix identity. A final expansion allows us to arrive at the canonical form of the discrete-time Riccati equation (29), completing the proof.
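As a quick numerical check of this equivalence, the following Python sketch compares one step of the covariance recursion of Theorem 4.7 against the Riccati form (29). The matrices here are illustrative placeholders, not quantities computed in the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

# Illustrative placeholder matrices (not taken from the paper).
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
M = rng.standard_normal((2, 2))
S = M @ M.T                        # an arbitrary symmetric PSD matrix
G = np.real(sqrtm(S))              # unique symmetric PSD root, S = G @ G
P = Q.copy()                       # some prior covariance P_{k|k-1}
I = np.eye(2)

# One step via Theorem 4.7: P_k = P (I + S P)^{-1}, then predict.
P_next_thm = A @ P @ np.linalg.inv(I + S @ P) @ A.T + Q

# One step via the Riccati form (29).
mid = np.linalg.inv(I + G.T @ P @ G)
P_next_ric = A @ P @ A.T - A @ P @ G @ mid @ G.T @ P @ A.T + Q

assert np.allclose(P_next_thm, P_next_ric)   # the two forms agree
```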

Lemma 5.1 is important because it enables us to employ the standard notions of stabilizability and detectability from finite-dimensional linear systems theory to establish the following result characterizing the convergence of our optimal linear filter. Note that Theorem 5.3 requires assumptions A6 and A7 to hold.

Theorem 5.3.

Consider the filter equations given by Theorem 4.7, and let assumptions A6 and A7 hold. If $Q>0$, then the asymptotic a priori covariance $P^{-}_{\infty}$ is the unique stabilizing solution of the discrete-time algebraic Riccati equation

$$P_{\infty}^{-} = AP_{\infty}^{-}A^{\top} - AP_{\infty}^{-}G\left(I+G^{\top}P_{\infty}^{-}G\right)^{-1}G^{\top}P_{\infty}^{-}A^{\top}+Q.$$
Proof 5.4.

Given Lemma 5.1, [35, p. 77] establishes that discrete-time algebraic Riccati equations of the form (29) converge to a unique limit if $Q>0$, $(A,G)$ is detectable, and $(A,Q)$ is stabilizable. In other words, if these conditions hold, then for any non-negative symmetric initial covariance $P_{0}$, there exists a limit

$$\lim_{k\rightarrow\infty}P_{k|k-1}=P_{\infty}^{-},$$

and this limit satisfies the steady-state equation

$$P_{\infty}^{-} = AP_{\infty}^{-}A^{\top}+Q - AP_{\infty}^{-}G\left(I+G^{\top}P_{\infty}^{-}G\right)^{-1}G^{\top}P_{\infty}^{-}A^{\top},$$

completing the proof.

Given Theorem 5.3, the asymptotic a posteriori covariance matrix $P_{\infty}$ can easily be calculated via the covariance update equation of Theorem 4.7, namely, $P_{\infty}=P_{\infty}^{-}(I+SP_{\infty}^{-})^{-1}$. Surprisingly, Theorem 5.3 requires only existing concepts from finite-dimensional linear systems theory, and no concepts from infinite-dimensional systems theory.
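In practice, $P_{\infty}^{-}$ can be computed with an off-the-shelf Riccati solver. The following Python sketch uses SciPy; the value of $S$ is a hypothetical placeholder, and passing $A^{\top}$ with unit measurement weighting turns SciPy's standard DARE into the filter-form equation above.

```python
import numpy as np
from scipy.linalg import sqrtm, solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state matrix with Delta = 1
Q = 0.01 * np.eye(2)                     # process noise covariance
S = np.diag([4.0, 0.0])                  # hypothetical S; only its (1,1) entry
G = np.real(sqrtm(S))                    # is nonzero in the simulated system

# SciPy solves X = A'XA - A'XB (R + B'XB)^{-1} B'XA + Q; substituting
# a = A.T, b = G, r = I yields the DARE of Theorem 5.3.
P_prior = solve_discrete_are(A.T, G, Q, np.eye(2))          # P_infty^-
P_post = P_prior @ np.linalg.inv(np.eye(2) + S @ P_prior)   # P_infty
```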

6 Algorithm and Implementation

Theorem 4.7, combined with (25), can be used to construct an algorithm as follows:

Procedure 1 Optimal Linear Filter
1: Inputs: $A, Q, R(i,i^{\prime}), P_{0}, \hat{x}_{0}, \gamma(i)$
2: for $k\geq 1$ do
3:     $f(i)=\mathcal{F}^{-1}\{\bar{\gamma}^{\top}(\omega)\bar{R}(\omega)^{-1}\}$
4:     $S=\int_{\mathbb{R}^{d}}f(i)\gamma(i)\,di$
5:     $P_{k|k-1}=AP_{k-1}A^{\top}+Q$
6:     $P_{k}=P_{k|k-1}(I+SP_{k|k-1})^{-1}$
7:     $\hat{x}_{k|k-1}=A\hat{x}_{k-1}$
8:     Obtain measurement $z_{k}(i)$
9:     $\hat{x}_{k}=\hat{x}_{k|k-1}+P_{k}\int_{\mathbb{R}^{d}}f(i)\left(z_{k}(i)-\gamma(i)\hat{x}_{k|k-1}\right)di$
10: end for
11: Outputs: $\{P_{1},P_{2},\ldots,P_{N}\}, \{\hat{x}_{1},\hat{x}_{2},\ldots,\hat{x}_{N}\}$
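To make the structure of Procedure 1 concrete, the following Python sketch performs one iteration on a uniform measurement grid, assuming scalar measurements ($m=1$) and that $f$, $S$, and $\gamma$ have already been discretized; all names here are illustrative. Since $f$ and $S$ do not depend on $k$, they (and indeed the whole covariance trajectory) can be computed once, before any measurements arrive.

```python
import numpy as np

def filter_step(x_hat, P, z, A, Q, S, f, gamma, dV):
    """One iteration of Procedure 1 with scalar measurements (m = 1).

    f, gamma : samples of f(i) and gamma(i) at N grid points, shape (N, n)
    z        : measurement samples z_k(i), shape (N,)
    dV       : volume of one grid cell, approximating di in the integrals
    """
    n = x_hat.size
    x_pred = A @ x_hat                                        # step 7
    P_pred = A @ P @ A.T + Q                                  # step 5
    P_post = P_pred @ np.linalg.inv(np.eye(n) + S @ P_pred)   # step 6
    innov = z - gamma @ x_pred                                # z_k(i) - gamma(i) x_pred
    integral = (f * innov[:, None]).sum(axis=0) * dV          # int f(i) innov(i) di
    x_post = x_pred + P_post @ integral                       # step 9
    return x_post, P_post
```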

We will now discuss the computational complexity of Procedure 1 and its connection to existing results, before implementing it in a simulated environment in Section 7.

6.1 Computational Complexity

Our optimal linear filter shares many properties with the finite-dimensional Kalman filter, such as the convenient ability to calculate the full trajectory of covariance matrices before run-time, as they do not rely on any measurements. This filter is similar to the filter formulated in previous work [21], although the derivation presented here relies on the methodology of projections rather than a calculus-of-variations approach. The filter given in this article extends the previously derived filter to operate not only on the index set $\mathbb{R}$, but on any Euclidean space. Filtering on the Euclidean plane $\mathbb{R}^{2}$ is perhaps the most relevant to imaging applications, but there may be numerous applications involving higher dimensions.

In the standard finite-dimensional Kalman filtering context with $N$ measurements, naive inversion of the matrix $(R+\Gamma P\Gamma^{\top})\in\mathbb{R}^{N\times N}$ requires nearly cubic complexity in $N$ [49]. This cost can sometimes be ameliorated, however, by observing that if $R$ is the covariance matrix of a stationary process then it has a Toeplitz form. This Toeplitz structure allows greatly reduced computational complexity for inversion operations; see [50] and the references therein for a brief summary of the types of Toeplitz matrices and corresponding inversion complexities. For positive-definite Toeplitz matrices in particular, inversion may be performed with $O(N\log^{2}(N))$ complexity.

The optimal filter in Procedure 1 possesses similar advantages due to the stationary structure of $R$. If $N$ measurement samples are taken, we see in Procedure 1 that Fourier operations are used to calculate the covariance matrices and optimal gain functions, resulting in a fast Fourier transform applied to an $m\times n$ matrix with complexity $O(mnN\log(N))$ [51, p. 41].
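As a concrete illustration of these Fourier operations, the following one-dimensional Python sketch approximates $f=\mathcal{F}^{-1}\{\bar{\gamma}^{\top}\bar{R}^{-1}\}$ and $S=\int f(i)\gamma(i)\,di$ on a uniform grid with the FFT. The kernel $\gamma$ and covariance $R$ below are illustrative stand-ins, and the spectrum of $R$ is floored to keep the division numerically stable.

```python
import numpy as np

# 1-D sketch (d = 1, scalar measurements) of computing f and S via the FFT.
N, ds = 4096, 0.005
i = (np.arange(N) - N // 2) * ds                  # uniform spatial grid
ell, nu = 0.025, 10.0
R = nu / np.sqrt(2 * np.pi * ell**2) * np.exp(-i**2 / (2 * ell**2))
gamma = np.exp(-(0.1 * np.abs(i))**2) * np.sin(0.8 * np.abs(i))  # stand-in kernel

# Continuous Fourier transform approximated by a scaled, centered FFT.
ft = lambda g: ds * np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g)))
ift = lambda G: np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(G))) / ds

R_bar = np.maximum(ft(R).real, 1e-12)   # real, even R gives a real spectrum
gamma_bar = ft(gamma)
f = ift(gamma_bar / R_bar).real         # gain kernel f(i)
S = np.sum(f * gamma) * ds              # S = int f(i) gamma(i) di (scalar here)
```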

Although Procedure 1 does contain matrix inversion, this is confined to low-dimensional $n\times n$ matrices. This approach to analysis in the continuum provides valuable insights into the optimal gain functions and the behavior of the system. The extension to the continuous measurement domain does introduce a trade-off between accuracy and speed of computation, as any implementable estimator will not be able to sample continuously over the entire space. A standard implementation may sample the space uniformly, which is the approach we take in Section 7, but a more sophisticated approach could involve the analysis of non-uniform schemes that exploit the underlying system structure. Examination of this trade-off for various systems is an important area of future work.

6.2 Comparisons to Existing Results

During our analysis, we found that the optimal gain must satisfy the necessary and sufficient condition in equation (17). If the measurements were finite-dimensional, this equation would reduce to a matrix equation, and the gain could then be calculated via matrix inversion as

$$K_{k}=P^{-}_{k}\gamma^{\top}\left(R+\gamma P_{k}^{-}\gamma^{\top}\right)^{-1}.$$

Note that this is the standard equation for the Kalman filter in the finite-dimensional case.

Previous works in this area have also derived condition (17) [19], [16], [8]. These works, however, have either only defined the optimal gain implicitly, or defined it via an operator inverse $(\cdot)^{\dagger}$ which satisfies

$$\int_{\mathcal{D}}G(i,i^{\prime})G^{\dagger}(i^{\prime},i_{1})\,di^{\prime}=I\delta(i-i_{1}),$$

where $\delta(\cdot)$ is the Dirac delta function [8, Eq. (14)]. The optimal gain is then expressed using this distributed-parameter matrix inverse as

$$K_{k}=P_{k}^{-}\gamma^{\top}\left(R+\gamma P_{k}^{-}\gamma^{\top}\right)^{\dagger}.$$

Some derivations also give results where the gain relies on $R^{\dagger}$. These expressions make the connection to the finite-dimensional Kalman filter clearer, but unfortunately the operator inverse is not guaranteed to exist. Furthermore, some forms of $R(i,i^{\prime})$ lead to inverses that are troublesome to implement. The derivation of the inverse corresponding to an Ornstein-Uhlenbeck process has been presented previously in [21] and results in an $R^{\dagger}$ defined in terms of the Dirac delta function and its derivatives. The Dirac delta function is well-defined only under an integral, and requires special care when implemented on real-world hardware; derivatives of the Dirac delta function cause further difficulties in implementation. The filter equations derived in this work do not rely on such an inverse, and give an explicit definition of the optimal gain. The formulation presented will, in some cases, avoid the need for generalized functions such as the Dirac delta function and its higher derivatives.

The derived prior covariance update equation (29) can be compared with standard forms of the finite-dimensional Kalman filter [35, p. 39, Eq. (1.9)]. It is interesting to note that the prior covariance matrix of our filter is equivalent to that of a finite-dimensional Kalman filter with measurement noise given by a standard multivariate normal distribution and an observation matrix equal to $G=S^{\frac{1}{2}}$.

We also highlight that the state estimation error, previously defined as $e_{k}\triangleq x_{k}-\hat{x}_{k}$, satisfies

$$e_{k+1}=M_{k}e_{k}+n_{k},$$

where $M_{k}=A-K_{k}\gamma\in\mathbb{R}^{n\times n}$ and $n_{k}=-K_{k}v_{k}+w_{k}\in\mathbb{R}^{n}$. Hence, the state estimation error dynamics and the optimal (component) gains $\kappa_{k}(i)$ to apply to points $i\in\mathbb{R}^{d}$ in the measurement domain depend on the state dynamics of the system, $A$. In particular, as $k\to\infty$, the steady-state gain operator $K_{k}$ must be chosen such that $A-K_{k}\gamma$ is stable and yields bounded mean square estimation errors. In other words, the gain must account for the underlying dynamics. This aspect of the filter is important, as many standard methods of processing high-dimensional measurements, such as vision-based systems, perform the detection and weighting of measurement features separately from the system dynamics.

7 Simulation Results

This section presents simulation results for the derived Kalman filter with infinite-dimensional measurements, as described in Procedure 1. The system consists of a state vector $x_{k}=[q_{k},\dot{q}_{k}]^{\top}$, which represents the position and velocity of an agent. The motion model is a state transition model in which $\Delta$ is the time elapsed between discrete steps. A two-dimensional image is fixed perpendicular to the agent's line of motion and is represented by the function $C(p):\mathbb{R}^{2}\rightarrow\mathbb{R}$. We note that the measurement readings are scalar-valued, perhaps representing visual light intensity, but this could easily be extended to multi-dimensional image outputs; for example, the intensities of three distinct RGB channels could be represented solely by changing the function $C(p)$ to be vector-valued. The measurement model used is the simple pinhole camera model, illustrated in Fig. 2. Both the motion model and the measurement model are affected by zero-mean, stationary, additive Gaussian noise such that $E[w_{k}w_{k}^{\top}]=Q$ and $E[v_{k}(i)v_{k}(i^{\prime})^{\top}]=R(i-i^{\prime})$.

Figure 2: Diagram of the 2-D Pinhole Camera Model.

The patterned wall possesses a gray-scale intensity function

$$C(p)=e^{-(\eta\|p\|_{2})^{2}}\cos(\xi\|p\|_{2})+1,$$
Figure 3: Two measurements of the decaying sinusoidal pattern: (a) a measurement with no additive noise; (b) a measurement perturbed by the addition of a stochastic field with a squared exponential covariance function. Note that in 3(b) the measurements are spatially discretized for computational reasons.
Table 4: Simulation Parameters

State Matrix: $A=\begin{bmatrix}1&\Delta\\ 0&1\end{bmatrix}$
Process Noise Covariance: $Q=\begin{bmatrix}\sigma_{q}^{2}&0\\ 0&\sigma_{\dot{q}}^{2}\end{bmatrix}$
Measurement Kernel: $\gamma(i)$, see (32)
State Vector: $x_{k}=[q_{k},\dot{q}_{k}]^{\top}$
Initial State: $x_{0}=[1,0]^{\top}$
Linearization Point: $\bar{x}=[1,0]^{\top}$
Integral Domain: $\mathcal{D}=[-0.5,0.5]\times[-0.5,0.5]$
Wall Pattern: $C(p)=e^{-(\eta\|p\|_{2})^{2}}\cos(\xi\|p\|_{2})+1$
Measurement Covariance: $R(i,i^{\prime})=\frac{\nu}{2\pi\ell^{2}}e^{-\|i-i^{\prime}\|_{2}^{2}(2\ell^{2})^{-1}}$
Initial Error Covariance: $P_{0}=Q$
Initial State Estimate: $\hat{x}_{0}=x_{0}$
Time Interval: $\Delta=1$
Position Variance: $\sigma_{q}^{2}=0.01$
Velocity Variance: $\sigma_{\dot{q}}^{2}=0.01$
Decay Parameter: $\eta=0.1$
Frequency Parameter: $\xi=0.8$
Measurement Noise Intensity: $\nu=10$
Focal Length: $L_{f}=0.01$
Length Scale: $\ell=0.025$
Sample Spacing: $\Delta_{s}=0.005$

where $\eta$ is a scalar parameter that determines the decay rate and $\xi$ is a scalar parameter that determines the frequency. Utilizing the pinhole camera model, our non-linear measurement equation is then of the form

$$z_{k}(i,q_{k})=C\left(\frac{iq_{k}}{L_{f}}\right)+v_{k}(i).$$

This function is linearized around some equilibrium point $\bar{x}=[\bar{q},\bar{\dot{q}}]^{\top}$ to derive the linear form used in the simulation. This linearization is given in equation (33) and leads to the corresponding measurement function

$$\gamma(i)=\left[-e^{-(\eta L_{f}^{-1}\bar{q}\|i\|_{2})^{2}}\left(2(\eta L_{f}^{-1}\|i\|_{2})^{2}\bar{q}\cos(\xi L_{f}^{-1}\|i\|_{2}\bar{q})+\xi L_{f}^{-1}\|i\|_{2}\sin(\xi L_{f}^{-1}\|i\|_{2}\bar{q})\right),\ 0\right]\in\mathbb{R}^{1\times 2}. \qquad (32)$$

The system equations then are:

$$\begin{aligned}
x_{k+1} &= \begin{bmatrix}1&\Delta\\ 0&1\end{bmatrix}\begin{bmatrix}q_{k}\\ \dot{q}_{k}\end{bmatrix}+w_{k}, \\
z_{k}(i) &= -e^{-(\eta L_{f}^{-1}\bar{q}\|i\|_{2})^{2}}\left[2(\eta L_{f}^{-1}\|i\|_{2})^{2}\bar{q}\cos(\xi L_{f}^{-1}\|i\|_{2}\bar{q})+\xi L_{f}^{-1}\|i\|_{2}\sin(\xi L_{f}^{-1}\|i\|_{2}\bar{q})\right]q_{k}+v_{k}(i). \qquad (33)
\end{aligned}$$
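For reference, a direct Python transcription of the linearized kernel (32), with the parameter values of Table 4 as defaults, might look as follows (a sketch; vectorization and input validation are omitted).

```python
import numpy as np

def gamma_kernel(i, eta=0.1, xi=0.8, Lf=0.01, q_bar=1.0):
    """Linearized pinhole measurement kernel (32); i is a 2-D pixel coordinate."""
    r = np.linalg.norm(i) / Lf                     # ||i||_2 / L_f
    g = -np.exp(-(eta * q_bar * r) ** 2) * (
        2.0 * (eta * r) ** 2 * q_bar * np.cos(xi * r * q_bar)
        + xi * r * np.sin(xi * r * q_bar)
    )
    return np.array([g, 0.0])                      # the velocity entry is zero

# Example: evaluate the kernel at the pixel i = (0.1, 0.2).
print(gamma_kernel(np.array([0.1, 0.2])))
```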

Examples of the linearized measurement pattern as measured by the system, without and with noise, are presented in Figs. 3(a) and 3(b) respectively.

We can formulate a discrete-time Riccati equation for this system as given by (29).

The matrix pair $(A,Q)$ is stabilizable in the sense of Definition 2.2: considering the matrix $M=\frac{1}{2\sigma_{q}^{2}}I$, it is clear that

$$\rho(A-MQ)=\rho\left(A-\tfrac{1}{2}I\right)=\rho\left(\begin{bmatrix}\tfrac{1}{2}&1\\ 0&\tfrac{1}{2}\end{bmatrix}\right)=\tfrac{1}{2}.$$

It is also true that the matrix pair $(A,G)$ is detectable in the sense of Definition 2.3, where $G=S^{\frac{1}{2}}$. This can be shown by demonstrating that the matrix pair $(A^{\top},G)$ is stabilizable. Specifically, in the simulated system the matrix $G=S^{\frac{1}{2}}$ takes the form

$$G=\begin{bmatrix}G_{1}&0\\ 0&0\end{bmatrix}, \quad G_{1}\in\mathbb{R}^{+}.$$

Considering a matrix $M$ of the form

$$M=\begin{bmatrix}\frac{1}{G_{1}}&\frac{1}{4G_{1}}\\ 0&0\end{bmatrix}$$

immediately gives that

$$\rho(A^{\top}-GM)=\rho\left(\begin{bmatrix}0&-\frac{1}{4}\\ 1&1\end{bmatrix}\right)=\frac{1}{2}.$$
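These spectral radius computations are easily reproduced numerically, as in the following sketch (with $\Delta=1$, $\sigma_{q}^{2}=\sigma_{\dot{q}}^{2}=0.01$, and an illustrative positive value of $G_{1}$).

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
rho = lambda X: np.abs(np.linalg.eigvals(X)).max()   # spectral radius

# (A, Q) stabilizable: with M = I / (2 sigma_q^2), A - MQ = A - I/2.
assert np.isclose(rho(A - 0.5 * np.eye(2)), 0.5)

# (A, G) detectable: G1 below is an illustrative positive value.
G1 = 2.0
G = np.array([[G1, 0.0], [0.0, 0.0]])
M = np.array([[1.0 / G1, 1.0 / (4.0 * G1)], [0.0, 0.0]])
assert np.isclose(rho(A.T - G @ M), 0.5)
```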
Figure 4: Fig. 4(a) displays the true position and estimated position for a single realization of the filter over 50 time units. Fig. 4(b) displays the empirical position and velocity MSE, averaged over 20,000 trials and compared with the theoretical position and velocity MSE.

These conditions, along with the fact that $Q$ is positive-definite, ensure that the discrete-time Riccati equation for this system possesses a unique stabilizing solution [35]. Utilizing the Matlab function idare and substituting the appropriate terms, the following covariance matrices were calculated for this system.

$$P_{\infty}^{-}=\begin{bmatrix}1.2018&0.2019\\ 0.2019&0.0695\end{bmatrix}, \quad P_{\infty}=\begin{bmatrix}0.8475&0.1424\\ 0.1424&0.0595\end{bmatrix}. \qquad (34)$$

The process noises of the position and velocity are uncorrelated. A centred, stationary stochastic field is used to represent the measurement noise, the covariance function of which is a squared exponential, also given in Table 4.

With the parameters given in Table 4, Procedure 1 was implemented and simulated using Matlab. It is necessary during implementation to discretise the system appropriately for numerical computation. Numerical functions were evaluated with spacings equal to $\Delta_{s}$ in each dimension, i.e. a frequency of $200$ samples per unit of $i$ in each dimension. Procedure 1 requires numerical integration for the posterior state update step, and the trapz function was used to implement this numerical integration. For this numerical integration step, a finite pixel domain $\mathcal{D}=[-0.5,0.5]\times[-0.5,0.5]$ was used. This is larger than the effective pixel-width of $\gamma$ and much larger than the effective width $\ell$ of the noise covariance $R$, suggesting minimal impact on the results obtained.
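A minimal Python analogue of this numerical integration step is sketched below, with random stand-ins for the discretized gain kernel and innovation field; np.trapz corresponds to Matlab's trapz (in NumPy 2.0 and later it is named np.trapezoid).

```python
import numpy as np

# Trapezoidal approximation of the update integral over D = [-0.5, 0.5]^2.
ax = np.linspace(-0.5, 0.5, 201)             # Delta_s = 0.005 per dimension
N = ax.size
rng = np.random.default_rng(1)
f_grid = rng.standard_normal((N, N, 2))      # stand-in for f(i) in R^2
innov = rng.standard_normal((N, N))          # stand-in for z_k(i) - gamma(i) x_pred

integrand = f_grid * innov[..., None]        # f(i) (z_k(i) - gamma(i) x_pred)
w = np.trapz(np.trapz(integrand, ax, axis=0), ax, axis=0)   # shape (2,)
# The posterior update of step 9 is then x_post = x_pred + P_post @ w.
```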

This naive method of numerical integration does not exploit the structure of $f(i)$; numerical methods that do exploit this structure may lead to more efficient computation. Analysis of such techniques and their impact on efficiency remains an important question for future work in this area.

A single realization of the system's true state and the filter estimate over $50$ time steps is given in Fig. 4(a). The system was subject to $20,000$ simulated trials, and the average MSE of both the position and velocity trajectories was calculated. Fig. 4(b) presents these results alongside the theoretical mean square errors according to (34).

8 Conclusion

This article has presented a Kalman filter for a linear system with finite-dimensional states and infinite-dimensional measurements, both corrupted by additive random noise. The assumption that the measurement noise covariance function is stationary allows a closed-form expression of the optimal gain function, whereas previous derivations rely on an implicitly defined inverse. Conditions are derived which ensure the filter is asymptotically stable and, surprisingly, these conditions apply to finite-dimensional components of the system. An algorithm is presented which takes advantage of this stationarity property to reduce computational complexity, and the derived filter is tested in a simulated environment motivated by a linearization of the pinhole camera model.

An extension to a continuous-time framework is possible, but would require additional tools and further technical conditions, such as the use of stochastic integrals (either in the Itô or Stratonovich sense). The extension of the Fubini-Tonelli theorem given by Theorem 3.1 would also need to be modified to validate the interchange of expectation and integration for stochastic fields over continuous spatial and temporal domains.

Future work includes investigating more efficient means of executing this algorithm, such as non-uniform integration intervals, to yield insight into effective sensor placement. This could be thought of as a form of feature selection, and this principled approach may provide insights related to existing heuristics-based and data-driven feature selection strategies often employed in high-dimensional measurement processing. It should also be noted that most real-world applications possess nonlinear measurement equations, and adapting the linear filter to nonlinear systems will be an important area for future work.


9 Existence of Integral (5) via Fubini-Tonelli Theorem

We introduce one final technical assumption:

A8. $\kappa_{k}(\cdot)s_{k}(\cdot,\cdot)$ is measurable with respect to the product $\sigma$-algebra on $\mathbb{R}^{d}\times\Theta$.

Proof 9.1.

For any fixed $i$, Jensen's inequality tells us

$$\begin{aligned}
E\left[\|\kappa(i)s(i)\|_{2}\right]^{2} &\leq E\left[\|\kappa(i)s(i)\|_{2}^{2}\right] \\
&= E\left[\mathrm{Tr}\left[\kappa(i)s(i)s^{\top}(i)\kappa^{\top}(i)\right]\right] \\
&= \mathrm{Tr}\left[\kappa(i)\,E\left[s(i)s^{\top}(i)\right]\kappa^{\top}(i)\right] \\
&= \mathrm{Tr}\left[\kappa(i)\left(\Sigma(0)+\gamma(i)P\gamma(i)^{\top}\right)\kappa^{\top}(i)\right] \\
&= \|\Sigma^{\frac{1}{2}}(0)\kappa^{\top}(i)\|^{2}_{F}+\|P^{\frac{1}{2}}(\kappa(i)\gamma(i))^{\top}\|^{2}_{F} \\
&\leq \|\Sigma^{\frac{1}{2}}(0)\|_{F}^{2}\|\kappa(i)\|_{F}^{2}+\|P^{\frac{1}{2}}\|_{F}^{2}\|\kappa(i)\gamma(i)\|_{F}^{2}.
\end{aligned}$$

This implies, due to the non-negativity of norms, that

$$E\left[\|\kappa(i)s(i)\|_{2}\right] \leq \|\Sigma^{\frac{1}{2}}(0)\|_{F}\|\kappa(i)\|_{F}+\|P^{\frac{1}{2}}\|_{F}\|\kappa(i)\gamma(i)\|_{F}.$$

As this is true for any given $i$, we have the inequality

$$\int_{\mathbb{R}^{d}}E\left[\|\kappa(i)s(i)\|_{2}\right]di \leq \|\Sigma^{\frac{1}{2}}(0)\|_{F}\int_{\mathbb{R}^{d}}\|\kappa(i)\|_{F}\,di + \|P^{\frac{1}{2}}\|_{F}\int_{\mathbb{R}^{d}}\|\kappa(i)\gamma(i)\|_{F}\,di.$$

As $\Sigma$ is bounded and $\kappa(i)$ is absolutely integrable, the first term on the right hand side is finite. By Hölder's inequality and assumptions A1 and A5 we have $\|\kappa\gamma\|_{L_{1}}\leq\|\kappa\|_{L_{1}}\|\gamma\|_{L_{\infty}}<\infty$, and hence the second term, as well as the left hand side, is finite. As all norms on $\mathbb{R}^{n}$ are equivalent, we have

$$\begin{aligned}
\int_{\mathbb{R}^{d}}E\left[\|\kappa(i)s(i)\|_{1}\right]di &< \infty \\
\iff \int_{\mathbb{R}^{d}}\int_{\Theta}\|\kappa(i)s(i,\theta)\|_{1}\,P(d\theta)\,di &< \infty \\
\iff \int_{\mathbb{R}^{d}}\int_{\Theta}\left|[\kappa(i)s(i,\theta)]^{\ell}\right|P(d\theta)\,di &< \infty \ \ \forall\ell,
\end{aligned}$$

where $[\kappa(i)s(i,\theta)]^{\ell}$ is the $\ell^{th}$ element of the vector $\kappa(i)s(i,\theta)$. As the integral of the absolute expectation of each component is finite, we may apply the Fubini-Tonelli theorem component-wise for each $\ell$ and compose the vector form

$$\int_{\mathbb{R}^{d}}E\left[\kappa(i)s(i)\right]di = E\left[\int_{\mathbb{R}^{d}}\kappa(i)s(i)\,di\right].$$

References

  • [1] A. Belloni and V. Chernozhukov, High Dimensional Sparse Econometric Models: An Introduction, pp. 121–156. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011.
  • [2] R. Sepulchre, “Spiking control systems,” Proceedings of the IEEE, vol. 110, no. 5, pp. 577–589, 2022.
  • [3] P. Corke, Robotics, Vision and Control - Fundamental Algorithms in MATLAB®, vol. 73 of Springer Tracts in Advanced Robotics. Springer, 2011.
  • [4] K. Morris, Controller Design for Distributed Parameter Systems. Springer, 2020.
  • [5] R. Curtain, “A survey of infinite-dimensional filtering,” SIAM Review, vol. 17, no. 3, pp. 395–411, 1975.
  • [6] A. Bensoussan, G. Da Prato, M. Delfour, and S. Mitter, Representation and Control of Infinite Dimensional Systems. Systems and Control: Foundations and Applications, Birkhauser, 2 ed., 2007.
  • [7] S. Tzafestas and J. Nightingale, “Optimal filtering, smoothing and prediction in linear distributed-parameter systems,” Proceedings of the Institution of Electrical Engineers, vol. 115, no. 5, pp. 1207–1212, 1968.
  • [8] H. Nagamine, S. Omatu, and T. Soeda, “The optimal filtering problem for a discrete-time distributed parameter system,” International Journal of Systems Science, vol. 10, no. 7, pp. 735–749, 1979.
  • [9] P. L. Falb, “Infinite-dimensional filtering: The Kalman-Bucy filter in Hilbert space,” Information and Control, vol. 11, pp. 102–137, 1967.
  • [10] Z. Emirsajłow, Output Observers for Linear Infinite-Dimensional Control Systems, pp. 67–92. Cham: Springer International Publishing, 2021.
  • [11] R. Curtain and H. Zwart, Introduction to Infinite-Dimensional Systems Theory: A State-Space Approach, vol. 71 of Texts in Applied Mathematics book series. Germany: Springer, 2020.
  • [12] Z. Hidayat, R. Babuska, B. De Schutter, and A. Núñez, “Observers for linear distributed-parameter systems: A survey,” in IEEE International Symposium on Robotic and Sensors Environments, pp. 166–171, 2011.
  • [13] A. Aalto, “Spatial discretization error in Kalman filtering for discrete-time infinite dimensional systems,” IMA Journal of Mathematical Control and Information, vol. 35, pp. i51–i72, 04 2017.
  • [14] A. Küper and S. Waldherr, “Numerical Gaussian process Kalman filtering for spatiotemporal systems,” IEEE Transactions on Automatic Control, pp. 1–8, 2022.
  • [15] F. E. Thau, “On Optimum Filtering for a Class of Linear Distributed-Parameter Systems,” Journal of Basic Engineering, vol. 91, no. 2, pp. 173–178, 1969.
  • [16] S. G. Tzafestas, “On the distributed parameter least-squares state estimation theory,” International Journal of Systems Science, vol. 4, no. 6, pp. 833–858, 1973.
  • [17] S. G. Tzafestas, “Bayesian approach to distributed-parameter filtering and smoothing,” International Journal of Control, vol. 15, no. 2, pp. 273–295, 1972.
  • [18] K. A. Morris, “Optimal output estimation for infinite-dimensional systems with disturbances,” Systems & Control Letters, vol. 146, p. 104803, 2020.
  • [19] J. S. Meditch, “Least-squares filtering and smoothing for linear distributed parameter systems,” Automatica, vol. 7, p. 315–322, May 1971.
  • [20] A. Bensoussan, Some Remarks on Linear Filtering Theory for Infinite Dimensional Systems, vol. 286, pp. 27–39. Springer Berlin Heidelberg, 10 2003.
  • [21] M. M. Varley, T. L. Molloy, and G. N. Nair, “Kalman filtering for discrete-time linear systems with infinite-dimensional observations,” in 2022 American Control Conference (ACC), pp. 296–303, 2022.
  • [22] P. A. Fuhrmann, “On observability and stability in infinite-dimensional linear systems,” Journal of Optimization Theory and Applications, vol. 12, pp. 173–181, Aug 1973.
  • [23] M. Megan, “Stability and observability for linear discrete-time systems in Hilbert spaces,” Bulletin mathématique de la Société des Sciences Mathématiques de la République Socialiste de Roumanie, vol. 24 (72), no. 3, pp. 277–284, 1980.
  • [24] B. Fristedt and L. Gray, A Modern Approach to Probability Theory. Boston, MA: Birkhauser, 1997.
  • [25] D. G. Luenberger, Optimization by Vector Space Methods. USA: John Wiley & Sons, Inc., 1st ed., 1997.
  • [26] R. J. Adler, The Geometry of Random Fields (Probability & Mathematical Statistics S.). John Wiley & Sons Inc, June 1981.
  • [27] J. Doob, Stochastic Processes. Probability and Statistics Series, Wiley, 1953.
  • [28] S. Janson, Gaussian Hilbert Spaces. Cambridge Tracts in Mathematics, Cambridge University Press, 1997.
  • [29] R. Boie and I. Cox, “An analysis of camera noise,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 6, pp. 671–674, 1992.
  • [30] J. Fehrenbach, P. Weiss, and C. Lorenzo, “Variational algorithms to remove stationary noise: Applications to microscopy imaging,” IEEE Transactions on Image Processing, vol. 21, no. 10, pp. 4420–4430, 2012.
  • [31] T. L. Marzetta, “Spatially-stationary propagating random field model for massive MIMO small-scale fading,” in 2018 IEEE International Symposium on Information Theory (ISIT), pp. 391–395, 2018.
  • [32] P. M. Atkinson and C. D. Lloyd, Geostatistical Models and Spatial Interpolation, pp. 1461–1476. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014.
  • [33] E. M. Stein and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press, 1971.
  • [34] G. Folland, Fourier Analysis and Its Applications. American Mathematical Society, 2009.
  • [35] B. D. O. Anderson and J. B. Moore, Optimal filtering. Prentice-Hall Englewood Cliffs, N.J, 1979.
  • [36] A. I. Mourikis, N. Trawny, S. I. Roumeliotis, A. E. Johnson, A. Ansar, and L. Matthies, “Vision-aided inertial navigation for spacecraft entry, descent, and landing,” IEEE Transactions on Robotics, vol. 25, no. 2, pp. 264–280, 2009.
  • [37] M. Mascaró, I. Parra-Tsunekawa, C. Tampier, and J. Ruiz-del Solar, “Topological navigation and localization in tunnels—application to autonomous load-haul-dump vehicles operating in underground mines,” Applied Sciences, vol. 11, no. 14, 2021.
  • [38] H. Mäkelä, “Overview of LHD navigation without artificial beacons,” Robotics and Autonomous Systems, vol. 36, no. 1, pp. 21–35, 2001.
  • [39] A. Krasner, M. Sizintsev, A. Rajvanshi, H.-P. Chiu, N. Mithun, K. Kaighn, P. Miller, R. Villamil, and S. Samarasekera, “SigNav: Semantically-informed GPS-denied navigation and mapping in visually-degraded environments,” in 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1858–1867, 2022.
  • [40] H. Cho, Y.-W. Seo, B. V. Kumar, and R. R. Rajkumar, “A multi-sensor fusion system for moving object detection and tracking in urban driving environments,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1836–1843, 2014.
  • [41] E. Kreyszig, Introductory Functional Analysis with Applications. Wiley Classics Library, Wiley, 1991.
  • [42] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge; New York: Cambridge University Press, 2nd ed., 2013.
  • [43] R. L. Schilling, Measures, Integrals and Martingales. Cambridge University Press, 2005.
  • [44] S. J. Axler, Linear Algebra Done Right. Undergraduate Texts in Mathematics, New York: Springer, 1997.
  • [45] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.
  • [46] R. Bracewell, The Two-Dimensional Fourier Transform, pp. 140–173. Boston, MA: Springer US, 2003.
  • [47] J. B. Conway, A Course in Functional Analysis, vol. 96 of Graduate Texts in Mathematics. New York: Springer Science+Business Media, 2nd ed., 2010.
  • [48] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. Adaptive computation and machine learning, MIT Press, 2006.
  • [49] V. Strassen, “Gaussian elimination is not optimal,” Numer. Math., vol. 13, pp. 354–356, Aug. 1969.
  • [50] P. J. Ferreira and M. E. Domínguez, “Trading-off matrix size and matrix structure: Handling Toeplitz equations by embedding on a larger circulant set,” Digital Signal Processing, vol. 20, no. 6, pp. 1711–1722, 2010.
  • [51] K. R. Rao, D. N. Kim, and J.-J. Hwang, Fast Fourier Transform - Algorithms and Applications. Springer Publishing Company, Incorporated, 1st ed., 2010.
Maxwell M. Varley received the B.S. degree in electrical systems and the M.S. degree in electrical engineering from the University of Melbourne, Australia, in 2017 and 2019, respectively. He is currently working towards the Ph.D. degree in electrical engineering at the University of Melbourne.

Timothy L. Molloy (Member, IEEE) was born in Emerald, Australia. He received the B.E. and Ph.D. degrees from Queensland University of Technology (QUT), Brisbane, QLD, Australia, in 2010 and 2015, respectively. He is currently a Senior Lecturer in Mechatronics at the Australian National University. From 2017 to 2019, he was an Advance Queensland Research Fellow at QUT, and from 2020 to 2022, he was a Research Fellow at the University of Melbourne. His interests include signal processing and information theory for robotics and control. Dr. Molloy is the recipient of a QUT University Medal, a QUT Outstanding Doctoral Thesis Award, a 2018 Boeing Wirraway Team Award, and an Advance Queensland Early-Career Fellowship.

Girish N. Nair (Fellow, IEEE) was born in Malaysia. He received the B.E. (First Class Hons.) degree in electrical engineering, the B.Sc. degree in mathematics, and the Ph.D. degree in electrical engineering from the University of Melbourne, Parkville, VIC, Australia, in 1994, 1995, and 2000, respectively. He is currently a Professor with the Department of Electrical and Electronic Engineering, University of Melbourne. From 2015 to 2019, he was an ARC Future Fellow, and from 2019 to 2024, he has been the Principal Australian Investigator of an AUSMURI project. He delivered a semiplenary lecture at the 60th IEEE Conference on Decision and Control, in 2021. His research interests are in information theory and control. Prof. Nair was a recipient of several prizes, including the IEEE CSS Axelby Outstanding Paper Award in 2014 and a SIAM Outstanding Paper Prize in 2006.