
Duality for Nonlinear Filtering II: Optimal Control

Jin W. Kim, Student Member, IEEE, and Prashant G. Mehta, Senior Member, IEEE. This work is supported in part by the NSF award 1761622. Research reported in this paper was carried out by J. W. Kim, as part of his PhD dissertation work, while he was a graduate student at the University of Illinois at Urbana-Champaign. He is now with the Institute of Mathematics at the University of Potsdam (e-mail: [email protected]). P. G. Mehta is with the Coordinated Science Laboratory and the Department of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign (e-mail: [email protected]).
Abstract

This paper is concerned with the development and use of duality theory for a nonlinear filtering model with white noise observations. The main contribution of this paper is to introduce a stochastic optimal control problem as a dual to the nonlinear filtering problem. The mathematical statement of the dual relationship between the two problems is given in the form of a duality principle. The constraint for the optimal control problem is the backward stochastic differential equation (BSDE) introduced in the companion paper. The optimal control solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The proposed duality is shown to be an exact extension of the classical Kalman-Bucy duality, and different from other types of optimal control and variational formulations given in the literature.

Keywords: Stochastic systems; Optimal control; Nonlinear filtering.

1 Introduction

In this paper, we continue the development of duality theory for nonlinear filtering. While the companion paper (part I) was concerned with a (dual) controllability counterpart of stochastic observability, the purpose of the present paper (part II) is to express the nonlinear filtering problem as a (dual) optimal control problem. The proposed duality is shown to be an exact extension of the original Kalman-Bucy duality [1, 2], in the sense that the dual optimal control problem has the same minimum variance structure for both linear and nonlinear filtering problems. Because of its historical importance, we begin by introducing and reviewing the classical duality for the linear Gaussian model.

1.1 Background and literature review

The linear Gaussian filtering model is as follows:

$$\mathrm{d}X_t = A^{\mathsf{T}} X_t\,\mathrm{d}t + \sigma\,\mathrm{d}B_t,\qquad X_0\sim N(m_0,\Sigma_0) \qquad (1a)$$
$$\mathrm{d}Z_t = H^{\mathsf{T}} X_t\,\mathrm{d}t + \mathrm{d}W_t \qquad (1b)$$

where $X:=\{X_t\in\mathbb{R}^d : 0\leq t\leq T\}$ is the state process, the prior $N(m_0,\Sigma_0)$ is a Gaussian density with mean $m_0\in\mathbb{R}^d$ and variance $\Sigma_0\succeq 0$, $Z:=\{Z_t : 0\leq t\leq T\}$ is the observation process, and both $B:=\{B_t : 0\leq t\leq T\}$ and $W:=\{W_t : 0\leq t\leq T\}$ are Brownian motions (B.M.). It is assumed that $X_0$, $B$, $W$ are mutually independent. The model parameters are $A\in\mathbb{R}^{d\times d}$, $H\in\mathbb{R}^{d\times m}$, and $\sigma\in\mathbb{R}^{d\times p}$.
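As a concrete illustration, a sample path of the model (1) can be generated with an Euler-Maruyama discretization. The sketch below uses hypothetical values of $(A,H,\sigma,m_0,\Sigma_0)$; only the discretization scheme, not the specific numbers, is suggested by the text.

```python
import numpy as np

# Hypothetical parameters for model (1) with d = 2, m = 1, p = 2.
rng = np.random.default_rng(0)
A = np.array([[-1.0, 0.5], [0.0, -2.0]])   # drift matrix; (1a) uses A^T
H = np.array([[1.0], [0.0]])               # observation matrix; (1b) uses H^T
sigma = 0.3 * np.eye(2)                    # process-noise coefficient
m0, Sigma0 = np.zeros(2), np.eye(2)        # Gaussian prior N(m0, Sigma0)

T, n_steps = 1.0, 1000
dt = T / n_steps

# Euler-Maruyama discretization of dX = A^T X dt + sigma dB and
# dZ = H^T X dt + dW, driven by independent Brownian increments.
X = rng.multivariate_normal(m0, Sigma0)
Z = np.zeros(1)
for _ in range(n_steps):
    dB = np.sqrt(dt) * rng.standard_normal(2)
    dW = np.sqrt(dt) * rng.standard_normal(1)
    Z = Z + H.T @ X * dt + dW   # observation increment (1b)
    X = X + A.T @ X * dt + sigma @ dB   # state increment (1a)
```

The same loop, run over many independent replicas, would produce the joint law of $(X,Z)$ against which a filter can be tested.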

For this problem, the dual optimal control formulations are well understood. They are of the following two types:

  • Minimum variance optimal control problem:

    $$\mathop{\text{Minimize}}_{u=\{u_t\in\mathbb{R}^m:0\leq t\leq T\}}:\quad {\sf J}(u) = |y_0|^2_{\Sigma_0} + \int_0^T y_t^{\mathsf{T}}(\sigma\sigma^{\mathsf{T}})y_t + |u_t|^2 \,\mathrm{d}t \qquad (2a)$$
    $$\text{Subject to}:\quad -\frac{\mathrm{d}y_t}{\mathrm{d}t} = Ay_t + Hu_t,\qquad y_T = f\;\;\text{(given)} \qquad (2b)$$

  • Minimum energy optimal control problem:

    $$\mathop{\text{Minimize}}_{\tilde m_0\in\mathbb{R}^d,\;u=\{u_t\in\mathbb{R}^p:0\leq t\leq T\}}:\quad {\sf J}(u,\tilde m_0;z) = |m_0-\tilde m_0|^2_{\Sigma_0^{-1}} + \int_0^T |u_t|^2 + |\dot z_t - H^{\mathsf{T}}\tilde m_t|^2 \,\mathrm{d}t \qquad (3a)$$
    $$\text{Subject to}:\quad \frac{\mathrm{d}\tilde m_t}{\mathrm{d}t} = A^{\mathsf{T}}\tilde m_t + \sigma u_t \qquad (3b)$$

where $z=\{z_t\in\mathbb{R}^m : 0\leq t\leq T\}$ is a given sample path of observations.

These two types of linear quadratic (LQ) optimal control problems have been known since the 1960s and are described in [3, Sec. 7.3.1 and 7.3.2]. Because it is discussed in the seminal paper [2] of Kalman and Bucy, the minimum variance duality (2) is also referred to as the Kalman-Bucy duality [4]. The relationship of the two problems to the model (1) is as follows:

  • Minimum variance duality is related to the filtering problem for the model (1). The optimal control cost (2a) comes from specifying a minimum variance objective for estimating the random variable $f^{\mathsf{T}}X_T$ for $f\in\mathbb{R}^d$.

  • Minimum energy duality is related to a smoothing problem for the model (1). The optimal cost (3a) is obtained from specifying a maximum likelihood (ML) objective for estimating a trajectory $\{\tilde m_t:0\leq t\leq T\}$ given a sample path $\{z_t:0\leq t\leq T\}$ of observations.
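For a given open-loop control $u$, the minimum variance cost (2a) can be evaluated numerically by integrating the backward ODE (2b) from the terminal condition $y_T=f$ and accumulating the running cost along the way. A minimal sketch under hypothetical parameters (the zero control is used purely for illustration):

```python
import numpy as np

# Hypothetical data for the minimum variance problem (2).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
H = np.array([[1.0], [0.0]])
sigma = 0.3 * np.eye(2)
Sigma0 = np.eye(2)
f = np.array([1.0, -1.0])   # terminal condition y_T = f

T, n = 1.0, 1000
dt = T / n
u = np.zeros((n, 1))        # a given open-loop control (here: zero)

# Integrate -dy/dt = A y + H u backward from t = T to t = 0,
# accumulating the running cost y^T (sigma sigma^T) y + |u|^2.
y = f.copy()
running = 0.0
for k in reversed(range(n)):
    running += (y @ (sigma @ sigma.T) @ y + u[k] @ u[k]) * dt
    y = y + (A @ y + H @ u[k]) * dt   # Euler step in reversed time
J = y @ Sigma0 @ y + running          # total cost (2a)
```

Minimizing `J` over the discretized control `u` (e.g., by LQ Riccati recursion or gradient descent) recovers the dual optimal control; the sketch above only evaluates the objective.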

Their respective solutions are related to (1) as follows:

  • The solution of the minimum variance duality (2) is useful to derive the Kalman filter for (1) [5, Ch. 7.6]. The derivation helps explain why the covariance equation of the Kalman filter is the same as the differential Riccati equation (DRE) of LQ optimal control. Note however that the arrow of time is reversed: the DRE is solved in forward time for the Kalman filter. This is because the constraint (2b) is a backward (in time) ordinary differential equation (ODE).

  • The solution of the minimum energy duality (3) is a favorite technique to derive the forward-backward equations of smoothing for the model (1). Hamilton's equation for (3) is referred to as the Bryson-Frazier formula [6, Eq. (13.3.4)]. By introducing a DRE, other forms of the solution, e.g., the Fraser-Potter smoother [7, Eq. (16)-(17)], are possible and useful in practice.
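The forward-in-time solution of the covariance DRE mentioned above can be sketched numerically. Assuming the conventions of model (1), the Kalman covariance satisfies $\dot\Sigma = A^{\mathsf{T}}\Sigma + \Sigma A + \sigma\sigma^{\mathsf{T}} - \Sigma H H^{\mathsf{T}}\Sigma$; the parameters below are hypothetical and a simple forward Euler step stands in for a proper ODE solver.

```python
import numpy as np

# Forward Euler integration of the DRE for the Kalman covariance:
#   dSigma/dt = A^T Sigma + Sigma A + sigma sigma^T - Sigma H H^T Sigma
# (conventions of model (1); hypothetical parameters).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
H = np.array([[1.0], [0.0]])
sigma = 0.3 * np.eye(2)
Sigma = np.eye(2)   # Sigma_0

dt, n = 1e-3, 5000
for _ in range(n):
    dSigma = (A.T @ Sigma + Sigma @ A
              + sigma @ sigma.T
              - Sigma @ H @ H.T @ Sigma)
    Sigma = Sigma + dt * dSigma
# Along the flow, Sigma stays symmetric positive semi-definite.
```

Note the contrast with the LQ control DRE, which is integrated backward from a terminal condition; here the same equation is run forward from the prior covariance.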

Given this background for the linear Gaussian model (1), there has been extensive work spanning decades on extending duality to the problems of nonlinear filtering and smoothing. The prominent duality type solution approaches in literature include the following:

  • Mortensen’s maximum likelihood estimator (MLE) [8].

  • Minimum energy estimator (MEE) in the model predictive control (MPC) literature [9, Ch. 4].

  • Log transformation relationship between the Zakai equation of nonlinear filtering and the Hamilton-Jacobi-Bellman (HJB) equation of optimal control [10].

  • Mitter and Newton’s variational formulation of the nonlinear smoothing problem [11].

In an early work [8], Mortensen considered a slightly more general version of the linear Gaussian model (1) in which the drift terms in both (1a) and (1b) are nonlinear. Both the optimal control problem and its forward-backward solution are straightforward extensions of (3). Since the 1960s, closely related extensions have appeared under different names in different communities, e.g., maximum likelihood estimation (MLE), maximum a posteriori (MAP) estimation, and minimum energy estimation (MEE), which is discussed next.

Based on the use of duality, the theory and algorithms developed in the MPC literature are readily adapted to solve state estimation problems. The resulting class of estimators is referred to as the minimum energy estimator (MEE) [9, Ch. 4]. The MEE algorithms are broadly of two types: (i) the full information estimator (FIE), where the entire history of observations is used; and (ii) the moving horizon estimator (MHE), where only the most recent fixed window of observations is used. An important motivation is to also incorporate additional constraints in estimator design. Early papers include [12, 13, 14], and more recent extensions have appeared in [15, 16, 17, 18]. A historical survey is given in [9, Sec. 4.7], where Rawlings et al. write "establishing duality [of the optimal estimator] with the optimal regulator is a favorite technique for establishing estimator stability". Although the specific comment is made for the Kalman filter, the remainder of the chapter amply demonstrates the utility of dual constructions for both algorithm design and convergence analysis (as the time-horizon $T\to\infty$). Convergence analysis typically requires additional assumptions on the model, which in turn has motivated the work on nonlinear observability and detectability definitions. A literature survey of these definitions, including their connections to duality theory, appears in the introduction of the companion paper [19].

While the focus of MEE is on deterministic models, duality is also an important theme in the study of nonlinear stochastic systems (hidden Markov models). A key concept is the log transformation [20]. In [10], the log transformation was used to transform the Zakai equation into a Hamilton-Jacobi-Bellman (HJB) equation. Because of this, the negative log of a posterior density is a value function for some stochastic optimal control problem (this is how duality is understood in stochastic settings [21, Sec. 4.8]). While the problem itself was not clarified in [10] (see however [22]), Mitter and Newton introduced a dual optimal control problem in [11] based on a variational interpretation of the Bayes’ formula. This work continues to impact algorithm design which remains an important area of research [23, 24, 25, 26, 27]. A notable ensuing contribution appeared in the PhD thesis-work [28] where Mitter-Newton duality is used to obtain results on nonlinear filter stability.

Given the importance of duality for the purposes of stability analysis in both the deterministic and stochastic settings of the problem, it is useful to return to the linear Gaussian model (1) and compare the two types of duality, (2) and (3). An important point, which has perhaps not been stressed in the literature, is that the minimum variance duality (2) is more compatible with the classical duality between controllability and observability in linear systems theory. This is because of the following reasons:

  • Inputs and outputs. In (2), the control input $u$ has the same dimension $m$ as the output process, while in (3), the control input $u$ has the dimension $p$ of the process noise. Evidently, it is natural to view inputs and outputs as dual processes that have the same dimension.

  • Constraint. If we ignore the noise terms in (1), the resulting deterministic state-output system ($\dot x_t = A^{\mathsf{T}}x_t$ and $z_t = H^{\mathsf{T}}x_t$) shares a dual relationship with the deterministic state-input system (2b). (It is shown in part I [19, Sec. III-F] that (2b) is also the dual of the stochastic system (1).) In contrast, the ODE (3b) is a modified copy of the model (1a).

  • Stability condition. The condition for asymptotic analysis of (2) is stabilizability of (2b), which by duality is detectability of $(A^{\mathsf{T}},H^{\mathsf{T}})$. The latter is known to be also the appropriate condition for stability of the Kalman filter. In contrast, for (3), asymptotic convergence of the optimal $\tilde m_T$ is possible even with $\sigma=0$. The important condition again is detectability of $(A^{\mathsf{T}},H^{\mathsf{T}})$, but it is not at all easy to see this from (3).

  • Arrow of time. Because the respective DREs are solved forward (resp. backward) in time for optimal filtering (resp. control), the arrow of time flips between optimal control and optimal filtering. Evidently, this is the case for minimum variance duality (2) but not so for the minimum energy duality (3): The constraint (2b) is a backward in time ODE while the constraint (3b) is a modified copy of the signal model which proceeds forward in time.

All of this suggests that a fruitful approach – for both defining observability and for using the definition for asymptotic stability analysis – is to consider the minimum variance duality, which naturally begets the following questions:

  • What are the appropriate extensions of (2) and (3) for nonlinear deterministic and stochastic systems?

  • What type of duality is implicit in Mitter-Newton’s work? It is already evident that MEE is an extension of (3).

Both of these questions are answered in the present paper (for the white noise observation model). Before discussing the original contributions, it is noted that past work on minimum variance duality has focused on refinements and extensions of the linear model with additional constraints. In [29], it is used to obtain the solution to a class of singular regulator problems, and in [30], the Lagrangian dual for an MEE problem with truncated measurement noise is considered. Numerical algorithms for (2) and its extensions appear in [31, 32, 33, 34]. Prior to our work, it was widely believed that a nonlinear extension of minimum variance duality is not possible [4].

1.2 Summary of original contributions

The main contribution of this paper is to present a minimum variance dual to the nonlinear filtering problem. As in the companion paper (part I), the nonlinear filtering problem is for the HMM with the white noise observation model. The mathematical statement of the dual relationship between optimal filtering and optimal control is given in the form of a duality principle (Thm. 1). The principle relates the value of the control problem to the variance of the filtering problem. The classical Kalman-Bucy duality (2) is recovered as a special case for the linear-Gaussian model (1).

Two approaches are described to solve the optimal control problem: (i) Based on the use of the stochastic maximum principle to derive the Hamilton’s equation (Thm. 4.9); and (ii) Based on a martingale characterization (Thm. 5.14). A formula for the optimal control as a feedback control law is obtained and used to derive the equation of the optimal nonlinear filter. Our duality is also related to Mitter-Newton duality with a side-by-side comparison in Table 1.

This paper is drawn from the PhD thesis of the first author [35]. A prior conference version appeared in [36]. While the duality principle was already stated in the conference paper, it relied on a certain assumption [36, Assumption A1] which has now been proved. Various formulae are stated more simply, e.g., the use of carré du champ operator to specify the running cost. Issues related to function spaces have been clarified to a large extent. While the conference version relied on the innovation process, the present version directly works with the observation process. Such a choice is more natural for the problem at hand. As a result, most of the results and certainly their proofs are novel. Comparison with Mitter-Newton duality is also novel.

1.3 Paper outline

The outline of the remainder of this paper is as follows: The mathematical model and necessary background appear in Sec. 2. The dual optimal control problem, together with the duality principle and its relation to the linear-Gaussian case, is described in Sec. 3. Its solution using the maximum principle and the martingale characterization appears in Sec. 4 and Sec. 5, respectively. The duality-based derivation of the equation of the nonlinear filter appears in Sec. 6. A comparison with Mitter-Newton duality is contained in Sec. 7. The paper closes with some conclusions and directions for future work in Sec. 8. All the proofs are contained in the Appendix.

2 Background

We briefly review the model and the notation as presented in [19]. Although the presentation is self-contained, it is in an abbreviated form with a focus on additional new concepts that are necessary for this paper.

On the probability space $(\Omega,{\cal F}_T,{\sf P})$, we consider a pair of continuous-time stochastic processes $(X,Z)$ as follows:

  • The state process $X=\{X_t:\Omega\to\mathbb{S}\,:\,0\leq t\leq T\}$ is a Feller-Markov process taking values in the state-space $\mathbb{S}$. The prior is denoted by $\mu\in{\cal P}(\mathbb{S})$ (the space of probability measures) and $X_0\sim\mu$. The infinitesimal generator is denoted by ${\cal A}$.

  • The observation process $Z=\{Z_t:0\leq t\leq T\}$ satisfies the stochastic differential equation (SDE):

    $$Z_t = \int_0^t h(X_s)\,\mathrm{d}s + W_t,\quad t\geq 0 \qquad (4)$$

    where $h:\mathbb{S}\to\mathbb{R}^m$ is referred to as the observation function and $W=\{W_t:0\leq t\leq T\}$ is an $m$-dimensional Brownian motion (B.M.); we write that $W$ is a ${\sf P}$-B.M. It is assumed that $W$ is independent of $X$.

The above is referred to as the white noise observation model of nonlinear filtering. The model is denoted by $({\cal A},h)$.

An important additional concept in this paper is the carré du champ operator $\Gamma$, defined as follows (see [37]):

$$(\Gamma f)(x) = ({\cal A}f^2)(x) - 2f(x)({\cal A}f)(x),\quad x\in\mathbb{S}$$

where $f:\mathbb{S}\to\mathbb{R}$ is a test function. Explicit formulae for the most important examples are described next.

2.1 Guiding examples

Example 1 (Finite state-space)

$\mathbb{S}=\{1,2,\ldots,d\}$. A real-valued function $f$ is identified with a vector in $\mathbb{R}^d$ whose $i^{\text{th}}$ element is $f(i)$. In this manner, the generator ${\cal A}$ of the Markov process is identified with a rate matrix $A\in\mathbb{R}^{d\times d}$ (the off-diagonal elements of $A$ are non-negative and each row sums to zero). The carré du champ operator $\Gamma:\mathbb{R}^d\to\mathbb{R}^d$ is given by

$$(\Gamma f)(i) = \sum_{j\in\mathbb{S}} A(i,j)\big(f(i)-f(j)\big)^2,\quad i\in\mathbb{S} \qquad (5)$$
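The formula (5) agrees with the defining identity $\Gamma f = {\cal A}f^2 - 2f\,{\cal A}f$ precisely because the rows of a rate matrix sum to zero. A small numerical check with a hypothetical 3-state rate matrix:

```python
import numpy as np

# Hypothetical 3-state rate matrix: off-diagonal >= 0, zero row sums.
A = np.array([[-2.0, 1.5, 0.5],
              [1.0, -1.0, 0.0],
              [0.2, 0.8, -1.0]])
f = np.array([1.0, -2.0, 0.5])

# Direct formula (5): (Gamma f)(i) = sum_j A(i,j) (f(i) - f(j))^2
Gamma = np.array([sum(A[i, j] * (f[i] - f[j]) ** 2 for j in range(3))
                  for i in range(3)])

# Defining identity: Gamma f = A(f^2) - 2 f * (A f)
Gamma_id = A @ (f ** 2) - 2 * f * (A @ f)
```

The two computations coincide entrywise, and each entry is non-negative since the off-diagonal rates are non-negative.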
Example 2 (Euclidean state-space)

$\mathbb{S}=\mathbb{R}^d$. The Markov process $X$ is an Itô diffusion modeled by a stochastic differential equation (SDE):

$$\mathrm{d}X_t = a(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}B_t,\quad X_0\sim\mu$$

where $a\in C^1(\mathbb{R}^d;\mathbb{R}^d)$ and $\sigma\in C^2(\mathbb{R}^d;\mathbb{R}^{d\times p})$ satisfy appropriate technical conditions such that a strong solution exists on $[0,T]$, and $B=\{B_t:0\leq t\leq T\}$ is a standard B.M. assumed to be independent of $X_0$ and $W$. In the Euclidean case, all measures are identified with their densities. In particular, we use the notation $\mu$ to denote the probability density function of the prior.

The infinitesimal generator ${\cal A}$ acts on $C^2(\mathbb{R}^d;\mathbb{R})$ functions in its domain according to [38, Thm. 7.3.3]

$$({\cal A}f)(x) := a^{\mathsf{T}}(x)\nabla f(x) + \frac{1}{2}\,\mathrm{tr}\big(\sigma\sigma^{\mathsf{T}}(x)(D^2 f)(x)\big),\quad x\in\mathbb{R}^d$$

where $\nabla f$ is the gradient vector and $D^2 f$ is the Hessian matrix. For $f\in C^1(\mathbb{R}^d;\mathbb{R})$, the carré du champ operator is given by

$$(\Gamma f)(x) = \big|\sigma^{\mathsf{T}}(x)\nabla f(x)\big|^2,\quad x\in\mathbb{R}^d \qquad (6)$$
Example 3 (Linear Gaussian model)

The model (1) introduced in Sec. 1 is a special case of an Itô diffusion where the drift terms are linear, $a(x)=A^{\mathsf{T}}x$ and $h(x)=H^{\mathsf{T}}x$, the coefficient of the process noise $\sigma(x)=\sigma$ is a constant matrix, and the prior $\mu$ is a Gaussian density. A real-valued linear function is expressed as

$$f(x) = \tilde{f}^{\mathsf{T}}x,\quad x\in\mathbb{R}^d$$

where $\tilde{f}\in\mathbb{R}^d$. Then ${\cal A}f$ is also a linear function, given by

$$({\cal A}f)(x) = (A\tilde{f})^{\mathsf{T}}x,\quad x\in\mathbb{R}^d$$

and $\Gamma f$ is a constant function, given by

$$(\Gamma f)(x) = \tilde{f}^{\mathsf{T}}\big(\sigma\sigma^{\mathsf{T}}\big)\tilde{f},\quad x\in\mathbb{R}^d \qquad (7)$$
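For a linear $f(x)=\tilde f^{\mathsf{T}}x$, the gradient is the constant vector $\tilde f$, so the Euclidean formula (6) reduces to the constant (7). A one-line numerical check with randomly chosen (hypothetical) $\sigma$ and $\tilde f$:

```python
import numpy as np

# Check that (6) reduces to (7) for a linear test function f(x) = f_tilde^T x.
rng = np.random.default_rng(1)
d, p = 3, 2
sigma = rng.standard_normal((d, p))    # constant diffusion coefficient
f_tilde = rng.standard_normal(d)       # gradient of the linear function

gamma_from_6 = np.linalg.norm(sigma.T @ f_tilde) ** 2    # |sigma^T grad f|^2
gamma_from_7 = f_tilde @ (sigma @ sigma.T) @ f_tilde     # f_tilde^T sigma sigma^T f_tilde
```
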

2.2 Background on nonlinear filtering

The canonical filtration is ${\cal F}_t = \sigma\big(\{(X_s,W_s):0\leq s\leq t\}\big)$. The filtration generated by the observation is denoted by ${\cal Z}:=\{{\cal Z}_t:0\leq t\leq T\}$ where ${\cal Z}_t = \sigma\big(\{Z_s:0\leq s\leq t\}\big)$. A standard approach is based upon the Girsanov change of measure. Suppose the model satisfies Novikov's condition: ${\sf E}\big(\exp\big(\frac{1}{2}\int_0^T |h(X_t)|^2\,\mathrm{d}t\big)\big) < \infty$. Define a new measure $\tilde{\sf P}$ on $(\Omega,{\cal F}_T)$ as follows:

$$\frac{\mathrm{d}\tilde{\sf P}}{\mathrm{d}{\sf P}} = \exp\Big(-\int_0^T h^{\mathsf{T}}(X_t)\,\mathrm{d}W_t - \frac{1}{2}\int_0^T |h(X_t)|^2\,\mathrm{d}t\Big) =: D_T^{-1}$$

Then it is shown that the probability law for $X$ is unchanged but $Z$ is a $\tilde{\sf P}$-B.M. that is independent of $X$ [28, Lem. 1.1.5]. The expectation with respect to $\tilde{\sf P}$ is denoted by $\tilde{\sf E}(\cdot)$.

The two probability measures are used to define the un-normalized and the normalized (or nonlinear) filters as follows: for $0\leq t\leq T$ and $f\in C_b(\mathbb{S})$,

$$\text{(un-normalized filter)}\quad \sigma_t(f) := \tilde{\sf E}\big(D_t f(X_t)\,|\,{\cal Z}_t\big)$$
$$\text{(nonlinear filter)}\quad \pi_t(f) := {\sf E}\big(f(X_t)\,|\,{\cal Z}_t\big)$$

As the name suggests, $\pi_t(f) = \frac{\sigma_t(f)}{\sigma_t({\sf 1})}$, which is referred to as the Kallianpur-Striebel formula [39, Thm. 5.3] (here ${\sf 1}$ is the constant function ${\sf 1}(x)=1$ for all $x\in\mathbb{S}$). Combining the tower property of conditional expectation with the change of measure gives

$${\sf E}(f(X_t)) = {\sf E}(\pi_t(f)) = \tilde{\sf E}(\sigma_t(f)) \qquad (8)$$

2.3 Function spaces

The notation $L^2_{{\cal Z}_T}(\Omega;\mathbb{R}^m)$ and $L^2_{{\cal Z}}([0,T];\mathbb{R}^m)$ is used to denote the Hilbert spaces of ${\cal Z}_T$-measurable random vectors and ${\cal Z}$-adapted stochastic processes, respectively. These Hilbert spaces suffice if the state-space is finite. In general settings, let ${\cal Y}$ denote a suitable Banach space of real-valued functions on $\mathbb{S}$, equipped with the norm $\|\cdot\|_{\cal Y}$. Then:

  • For a random function, the Banach space is $L^2_{{\cal Z}_T}(\Omega;{\cal Y}) := \big\{F:\Omega\to{\cal Y}\;:\;F\text{ is }{\cal Z}_T\text{-measurable},\;\tilde{\sf E}\big(\|F\|_{\cal Y}^2\big)<\infty\big\}$.

  • For a function-valued stochastic process, the Banach space is $L^2_{{\cal Z}}([0,T];{\cal Y}) := \big\{Y:\Omega\times[0,T]\to{\cal Y}\;:\;Y\text{ is }{\cal Z}\text{-adapted},\;\tilde{\sf E}\big(\int_0^T \|Y_t\|_{\cal Y}^2\,\mathrm{d}t\big)<\infty\big\}$.

In the remainder of this paper, we set ${\cal Y}:=C_b(\mathbb{S})$ (the space of continuous and bounded functions) equipped with the sup-norm. Its dual space, ${\cal M}(\mathbb{S})$ (the space of rba measures), is denoted by ${\cal Y}^{\dagger}$, with the duality pairing $\langle f,\rho\rangle = \rho(f)$ for $f\in{\cal Y}$ and $\rho\in{\cal Y}^{\dagger}$.

3 Main result: The duality principle

3.1 Problem statement

For a function $F\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$, the nonlinear filter $\pi_T(F)$ is the minimum variance estimate of $F(X_T)$ [3, Sec. 6.1.2]:

$$\pi_T(F) = \mathop{\operatorname{argmin}}_{S_T\in L^2_{{\cal Z}_T}(\Omega;\mathbb{R})} {\sf E}\big(|F(X_T)-S_T|^2\big) \qquad (9)$$

Our goal is to express the above minimum variance optimization problem as a dual optimal control problem.

The conditional variance is denoted by

$${\cal V}_T(F) := {\sf E}\big(|F(X_T)-\pi_T(F)|^2\,\big|\,{\cal Z}_T\big) = \pi_T(F^2) - \big(\pi_T(F)\big)^2$$

For notational ease, the expected value of the conditional variance is denoted by

$$\operatorname{var}_T(F) := {\sf E}\big({\cal V}_T(F)\big)$$

Strictly speaking, the above is a variance only at time $T=0$. However, the verbiage is consistent with the "minimum variance" interpretation of the nonlinear filter.
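On a finite state-space, the conditional variance ${\cal V}_T(F)=\pi_T(F^2)-(\pi_T(F))^2$ is a direct computation once the (conditional) probability vector is known. A sketch with hypothetical values:

```python
import numpy as np

# Hypothetical conditional distribution pi_T and test function F on 3 states.
pi = np.array([0.5, 0.3, 0.2])   # pi_T, a probability vector
F = np.array([1.0, 0.0, -1.0])   # F identified with a vector in R^3

mean = pi @ F                    # pi_T(F), the minimum variance estimate
var = pi @ F ** 2 - mean ** 2    # V_T(F) = pi_T(F^2) - (pi_T(F))^2
```

The same quantity can equivalently be computed as $\pi_T\big((F-\pi_T(F))^2\big)$, which makes the non-negativity evident.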

3.2 Dual optimal control problem

The function space of admissible control inputs is denoted by ${\cal U} := L^2_{{\cal Z}}([0,T];\mathbb{R}^m)$. An element of ${\cal U}$ is denoted $U=\{U_t:0\leq t\leq T\}$ and is referred to as the control input. The main contribution of this paper is the following problem.

  • Minimum variance optimal control problem:

    $$\mathop{\text{Minimize}}_{U\in{\cal U}}:\quad {\sf J}_T(U) = \operatorname{var}_0(Y_0) + {\sf E}\Big(\int_0^T l(Y_t,V_t,U_t;X_t)\,\mathrm{d}t\Big) \qquad (10a)$$

    Subject to (BSDE constraint):

    $$-\mathrm{d}Y_t(x) = \big(({\cal A}Y_t)(x) + h^{\mathsf{T}}(x)\big(U_t+V_t(x)\big)\big)\,\mathrm{d}t - V_t^{\mathsf{T}}(x)\,\mathrm{d}Z_t,\qquad Y_T(x) = F(x),\;\;x\in\mathbb{S} \qquad (10b)$$

where the running cost

$$l(y,v,u;x) := (\Gamma y)(x) + |u+v(x)|^2$$

and $\operatorname{var}_0(Y_0) = {\sf E}\big(|Y_0(X_0)-\mu(Y_0)|^2\big) = \mu(Y_0^2) - \mu(Y_0)^2$.

Remark 1

The BSDE (10b) is introduced in the companion paper (part I) as the dual control system. The data for the BSDE are the given terminal condition $F\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$ and the control input $U\in{\cal U}$. The solution of the BSDE is the pair $(Y,V)=\{(Y_t,V_t):0\leq t\leq T\}\in L^2_{{\cal Z}}([0,T];{\cal Y}\times{\cal Y}^m)$, which is (forward) adapted to the filtration ${\cal Z}$. Existence, uniqueness, and regularity theory for linear BSDEs is standard, and throughout the paper we assume that the solution $(Y,V)$ is uniquely determined in $L^2_{{\cal Z}}([0,T];{\cal Y}\times{\cal Y}^m)$ for each given $Y_T\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$ and $U\in L^2_{{\cal Z}}([0,T];\mathbb{R}^m)$. The well-posedness results can be found in [40, Ch. 7] for the finite state-space and in [41] for the Euclidean state-space.

The relationship of (10) to the minimum variance objective (9) is given in the following theorem.

Theorem 1 (Duality principle)

For any admissible control $U\in{\cal U}$, consider the estimator

$$S_T := \mu(Y_0) - \int_0^T U_t^{\mathsf{T}}\,\mathrm{d}Z_t \qquad (11)$$

Then

$${\sf J}_T(U) = {\sf E}\big(|F(X_T)-S_T|^2\big) \qquad (12)$$
Proof 3.2.

See Appendix A.1.

The problem (10) is a stochastic linear quadratic optimal control problem for which there is a well-established existence-uniqueness theory for the optimal control solution. Application of this theory is the subject of the following section. For now, we assume that the optimal control is well-defined and denote it by $U^{\text{(opt)}}=\{U_t^{\text{(opt)}}:0\leq t\leq T\}$. Because the right-hand side of the identity (12) is bounded below by $\operatorname{var}_T(F)$, the duality gap

$${\sf J}_T(U^{\text{(opt)}}) - \operatorname{var}_T(F) \geq 0$$

In order to conclude that the duality gap is zero, it is both necessary and sufficient to show that there exists a $U\in{\cal U}$ such that the estimator $S_T$, as given by (11), equals $\pi_T(F)$. Since $Z$ is a $\tilde{\sf P}$-B.M., the following lemma is a consequence of the Itô representation theorem for Brownian motion [38, Thm. 4.3.3].

Lemma 3.3.

For any $F\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$, there exists a unique $U\in{\cal U}$ such that

$$\pi_T(F) = \tilde{\sf E}\big(\pi_T(F)\big) - \int_0^T U_t^{\mathsf{T}}\,\mathrm{d}Z_t,\quad \tilde{\sf P}\text{-a.s.}$$
Proof 3.4.

See Appendix A.2.

Because the duality gap is zero, the following implications hold:

  • The optimal control $U^{\text{(opt)}}$ gives the conditional mean

    $$\pi_T(F) = \mu(Y_0) - \int_0^T \big(U_t^{\text{(opt)}}\big)^{\mathsf{T}}\,\mathrm{d}Z_t,\quad {\sf P}\text{-a.s.}$$

  • The optimal value is the expected value of the conditional variance

    $$\operatorname{var}_T(F) = \operatorname{var}_0(Y_0) + {\sf E}\Big(\int_0^T l(Y_t,V_t,U_t^{\text{(opt)}};X_t)\,\mathrm{d}t\Big)$$

    where $(Y,V)$ is the optimally controlled stochastic process obtained with $U=U^{\text{(opt)}}$ in (10b).

In fact, these two implications carry over to the entire optimal trajectory.

Proposition 3.5.

Suppose $U^{\text{(opt)}}$ is the optimal control input and $(Y,V)$ is the associated solution of the BSDE (10b). Then for almost every $0\leq t\leq T$,

$$\pi_t(Y_t) = \mu(Y_0) - \int_0^t \big(U_s^{\text{(opt)}}\big)^{\mathsf{T}}\,\mathrm{d}Z_s,\quad {\sf P}\text{-a.s.} \qquad (13)$$
$$\operatorname{var}_t(Y_t) = \operatorname{var}_0(Y_0) + {\sf E}\Big(\int_0^t l(Y_s,V_s,U_s^{\text{(opt)}};X_s)\,\mathrm{d}s\Big) \qquad (14)$$
Proof 3.6.

See Appendix A.3.

Consequently, the expected value of the conditional variance is the optimal cost-to-go (for a.e. $0\leq t\leq T$). We do not yet have a formula for the optimal control $U^{\text{(opt)}}$. The difficulty arises because there is no HJB equation for BSDE-constrained optimal control problems. Instead, the literature on such problems utilizes the stochastic maximum principle for BSDEs, which is the subject of the next section. Before that, we discuss the linear Gaussian case.

3.3 Linear Gaussian case

The goal is to show that the classical Kalman-Bucy duality (2) described in Sec. 1 for the linear Gaussian model (1) is a special case. Consider a linear function F(x)=fTxF(x)=f^{\hbox{\rm\tiny T}}x where fdf\in\mathbb{R}^{d} is a given deterministic vector. The problem is to compute a minimum variance estimate of the scalar random variable fTXTf^{\hbox{\rm\tiny T}}X_{T}. It is given by 𝖤(fTXT|𝒵T){\sf E}(f^{\hbox{\rm\tiny T}}X_{T}|{\cal Z}_{T}). Now, it is a standard result in the theory of Gaussian processes that conditional expectation can be evaluated in the form of a linear predictor [42, Cor. 1.10]. For this reason, it suffices to consider an estimator of the form

S_{T}:=b-\int_{0}^{T}u_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}

where b\in\mathbb{R} and u=\{u_{t}\in\mathbb{R}^{m}:0\leq t\leq T\} are both deterministic (the lower-case notation is used to stress this). Consequently, for linear Gaussian estimation, we can restrict the admissible space of control inputs to L^{2}\big{(}[0,T];\mathbb{R}^{m}\big{)}, which is a much smaller subspace of L_{{\cal Z}}^{2}\big{(}[0,T];\mathbb{R}^{m}\big{)}. Using a deterministic control u and the terminal condition F(x)=f^{\hbox{\rm\tiny T}}x, the solution of the BSDE is given by

Y_{t}(x)=y_{t}^{\hbox{\rm\tiny T}}x,\quad V_{t}(x)=0,\quad x\in\mathbb{R}^{d},\;\;0\leq t\leq T

where y=\{y_{t}\in\mathbb{R}^{d}:0\leq t\leq T\} is a solution of the backward ODE:

-\frac{\,\mathrm{d}y_{t}}{\,\mathrm{d}t}=Ay_{t}+Hu_{t},\quad y_{T}=f

Using the formula (7) for the carré du champ, the running cost

l(Y_{t},V_{t},U_{t};X_{t})=(\Gamma Y_{t})(X_{t})+|U_{t}+V_{t}(X_{t})|^{2}
=y_{t}^{\hbox{\rm\tiny T}}(\sigma\sigma^{\hbox{\rm\tiny T}})y_{t}+|u_{t}|^{2}

With the Gaussian prior, the initial cost is \operatorname{var}_{0}(Y_{0})=y_{0}^{\hbox{\rm\tiny T}}\Sigma_{0}y_{0}. Combining all of the above, the optimal control problem (10) reduces to (2) for the linear Gaussian model (1).
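This reduction can be checked numerically. The sketch below is a minimal sanity check, not part of the paper's development: all matrices, dimensions, and step sizes are arbitrary illustrative choices. It integrates the filter Riccati equation forward to obtain \Sigma_{T}, then (by time reversal) uses the same matrix-valued trajectory as the LQR value matrix for the dual problem (2), integrates the optimally controlled backward ODE, and verifies that the realized dual cost agrees with f^{\hbox{\rm\tiny T}}\Sigma_{T}f up to discretization error.

```python
import numpy as np

# Model (illustrative): dX = A^T X dt + sig dB,  dZ = H^T X dt + dW.
# Dual problem (2): minimize  y_0^T S0 y_0 + int_0^T (y^T QQ y + |u|^2) dt
# subject to  -dy/dt = A y + H u,  y_T = f.   Claim: min = f^T Sigma_T f.
d, m, T, N = 2, 1, 1.0, 20000
dt = T / N
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
H = np.array([[1.0], [0.5]])
sig = np.array([[0.3, 0.0], [0.1, 0.4]])
QQ = sig @ sig.T
S0 = np.array([[0.5, 0.1], [0.1, 0.4]])   # prior covariance Sigma_0
f = np.array([1.0, -1.0])

# Forward (Euler) integration of the filter Riccati equation:
#   dSig/dt = A^T Sig + Sig A + QQ - Sig H H^T Sig,  Sig(0) = Sigma_0.
Sig = S0.copy()
P_traj = []                               # Sig(t) doubles as the LQR value matrix P(T - s)
for _ in range(N):
    P_traj.append(Sig.copy())
    Sig = Sig + dt * (A.T @ Sig + Sig @ A + QQ - Sig @ H @ H.T @ Sig)
riccati_value = f @ Sig @ f               # = f^T Sigma_T f

# Time-reverse the dual ODE: with ytil(s) := y(T - s),
#   d ytil/ds = A ytil + H u,  ytil(0) = f,
# and the LQR-optimal feedback is u = -H^T P(s) ytil with P(s) = Sig(T - s).
y = f.copy()
cost = 0.0
for k in range(N):
    P = P_traj[N - 1 - k]                 # P(s_k) ~ Sig(T - s_k)
    u = -H.T @ P @ y
    cost += dt * (y @ QQ @ y + float(u @ u))
    y = y + dt * (A @ y + H @ u)
cost += y @ S0 @ y                        # initial cost  var_0 = y_0^T Sigma_0 y_0
```

The agreement of `cost` and `riccati_value` is exactly the statement that the minimum variance dual of the Kalman-Bucy filter has optimal value f^{\hbox{\rm\tiny T}}\Sigma_{T}f.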

Remark 3.7.

The solution of the optimal control problem yields the optimal control input u^{\text{\rm(opt)}}=\{u^{\text{\rm(opt)}}_{t}:0\leq t\leq T\}, along with the vector y_{0}\in\mathbb{R}^{d} that determines the minimum-variance estimator:

S_{T}=\mu(y_{0}^{\hbox{\rm\tiny T}}x)-\int_{0}^{T}\big{(}u_{t}^{\text{\rm(opt)}}\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}=y_{0}^{\hbox{\rm\tiny T}}m_{0}-\int_{0}^{T}\big{(}u_{t}^{\text{\rm(opt)}}\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}

The Kalman filter is obtained by expressing \{S_{t}(f):t\geq 0,\ f\in\mathbb{R}^{d}\} as the solution to a linear SDE [5, Ch. 7.6].

4 Solution of the optimal control problem

The BSDE-constrained optimal control problem (10) is not in its standard form [43, Eq. 5.10]. There are two issues:

  • The probability space: The driving martingale of the BSDE (10b) is Z, which is a {\tilde{\sf P}}-B.M. However, the expectation defining the optimal control objective (10a) is with respect to the measure {\sf P}.

  • The filtration: The ‘state’ of the optimal control problem (Y,V) is adapted to the filtration {\cal Z}. However, the cost function (10a) also depends upon the non-adapted exogenous process X.

The second problem is easily fixed by using the tower property of conditional expectation. To resolve the first problem, we have two choices:

  1. Use the change of measure to evaluate {\sf J}_{T}(U) with respect to the {\tilde{\sf P}} measure, or

  2. Express the BSDE using a driving martingale that is a {\sf P}-B.M. A convenient such process is the innovation process.

In this paper, the standard form of the dual optimal control problem is presented based on the first choice. For an analysis based on the second choice, see [36] and [35, Sec. 5.5].

In order to express the expectation in the control objective (10a) with respect to {\tilde{\sf P}}, we use the change of measure (see Appendix A.4 for the calculation) to obtain

{\sf J}_{T}(U)=\operatorname{var}_{0}(Y_{0})+{\tilde{\sf E}}\Big{(}\int_{0}^{T}\ell(Y_{t},V_{t},U_{t};\sigma_{t})\,\mathrm{d}t\Big{)}

where the Lagrangian \ell:{\cal Y}\times{\cal Y}^{m}\times\mathbb{R}^{m}\times{\cal Y}^{\dagger}\to\mathbb{R} is defined by

\ell(y,v,u;\rho)=\rho\big{(}\Gamma y\big{)}+\rho\big{(}|u+v|^{2}\big{)} (15)

The dual optimal control problem (standard form) is now expressed as follows:

\mathop{\text{Minimize}}_{U\in{\cal U}}\quad{\sf J}_{T}(U)=\operatorname{var}_{0}(Y_{0})+{\tilde{\sf E}}\Big{(}\int_{0}^{T}\ell(Y_{t},V_{t},U_{t};\sigma_{t})\,\mathrm{d}t\Big{)} (16a)
Subject to:
-\!\,\mathrm{d}Y_{t}(x)=\big{(}({\cal A}Y_{t})(x)+h^{\hbox{\rm\tiny T}}(x)(U_{t}+V_{t}(x))\big{)}\,\mathrm{d}t-V_{t}^{\hbox{\rm\tiny T}}(x)\,\mathrm{d}Z_{t}
\quad\;Y_{T}(x)=F(x),\;\;x\in\mathbb{S} (16b)
Remark 4.8.

The Lagrangian is a time-dependent random functional of the dual state (y,v) and the control u. The randomness and time-dependence come only from the last argument \sigma_{t}.

4.1 Solution using the maximum principle

Because y\in{\cal Y} is a function, the co-state p\in{\cal Y}^{\dagger} is a measure. The Hamiltonian {\cal H}:{\cal Y}\times{\cal Y}^{m}\times\mathbb{R}^{m}\times{\cal Y}^{\dagger}\times{\cal Y}^{\dagger}\to\mathbb{R} is defined as follows:

{\cal H}(y,v,u,p;\rho)=-p\big{(}{\cal A}y+h^{\hbox{\rm\tiny T}}(u+v)\big{)}-\ell(y,v,u;\rho)

In the following, Hamilton's equations for the optimal trajectory are derived by an application of the maximum principle for BSDE-constrained optimal control problems [44, Thm. 4.4].

Hamilton's equations are expressed in terms of the derivatives of the Hamiltonian. In order to take derivatives with respect to functions and measures, we adopt the notion of Gâteaux differentiability. Given a nonlinear functional F:{\cal Y}\to\mathbb{R}, the Gâteaux derivative F_{y}(y)\in{\cal Y}^{\dagger} is obtained from the defining relation [3, Sec. 10.1.3]:

\frac{\,\mathrm{d}}{\,\mathrm{d}\varepsilon}F(y+\varepsilon\tilde{y})\Big{|}_{\varepsilon=0}=\big{\langle}\tilde{y},F_{y}(y)\big{\rangle},\quad\forall\,\tilde{y}\in{\cal Y}
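In finite dimensions, the defining relation can be checked by a finite-difference computation. The sketch below is only an illustration of the definition: the quadratic functional F and the matrices are arbitrary choices, for which F_{y}(y)=(Q+Q^{\hbox{\rm\tiny T}})y+b.

```python
import numpy as np

# Finite-dimensional analogue of the Gateaux derivative: for
#   F(y) = y^T Q y + b^T y  on R^d,  F_y(y) = (Q + Q^T) y + b
# satisfies  d/d(eps) F(y + eps*ytil) |_{eps=0} = <ytil, F_y(y)>.
rng = np.random.default_rng(0)
d = 4
Q = rng.normal(size=(d, d))
b = rng.normal(size=d)

def F(y):
    return y @ Q @ y + b @ y

def F_y(y):
    return (Q + Q.T) @ y + b

y = rng.normal(size=d)
ytil = rng.normal(size=d)       # direction of differentiation
eps = 1e-6
fd = (F(y + eps * ytil) - F(y - eps * ytil)) / (2 * eps)   # central difference
exact = ytil @ F_y(y)           # <ytil, F_y(y)>
```

The same directional-derivative recipe, applied with respect to functions y and measures p, produces the derivatives of the Hamiltonian listed next.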

For the problem at hand, the derivatives of the Hamiltonian are as follows:

{\cal H}_{y}(y,v,u,p;\rho)=-{\cal A}^{\dagger}p-\big{(}\rho(\Gamma y)\big{)}_{y}
{\cal H}_{v}(y,v,u,p;\rho)=-ph-2(u+v)\rho
{\cal H}_{u}(y,v,u,p;\rho)=-p(h)-2\rho({\sf 1})u-2\rho(v)
{\cal H}_{p}(y,v,u,p;\rho)=-{\cal A}y-h^{\hbox{\rm\tiny T}}(u+v)

where {\cal A}^{\dagger} is the adjoint of {\cal A} (whereby ({\cal A}^{\dagger}\rho)(f)=\rho({\cal A}f) for all f\in{\cal Y},\rho\in{\cal Y}^{\dagger}). Using this notation, Hamilton's equations are as follows:

Theorem 4.9.

Consider the optimal control problem (16). Suppose U(opt)U^{\text{\rm(opt)}} is the optimal control input and the (Y,V)(Y,V) is the associated solution of the BSDE (16b). Then there exists a 𝒵{\cal Z}-adapted measure-valued stochastic process P={Pt:0tT}P=\{P_{t}:0\leq t\leq T\} such that

\,\mathrm{d}P_{t}=-{\cal H}_{y}(Y_{t},V_{t},U_{t}^{\text{\rm(opt)}},P_{t};\sigma_{t})\,\mathrm{d}t
\qquad\qquad-{\cal H}_{v}^{\hbox{\rm\tiny T}}(Y_{t},V_{t},U_{t}^{\text{\rm(opt)}},P_{t};\sigma_{t})\,\mathrm{d}Z_{t} (17a)
\,\mathrm{d}Y_{t}={\cal H}_{p}(Y_{t},V_{t},U_{t}^{\text{\rm(opt)}},P_{t};\sigma_{t})\,\mathrm{d}t+V_{t}\,\mathrm{d}Z_{t} (17b)
\frac{\,\mathrm{d}P_{0}}{\,\mathrm{d}\mu}(x)=2\big{(}Y_{0}(x)-\mu(Y_{0})\big{)},\quad Y_{T}(x)=F(x),\quad x\in\mathbb{S} (17c)

where the optimal control is given by

U_{t}^{\text{\rm(opt)}}=-\frac{1}{2}\frac{P_{t}(h)}{\sigma_{t}({\sf 1})}-\pi_{t}(V_{t}),\quad{\tilde{\sf P}}\text{-a.s.},\;0\leq t\leq T (18)

(In (17c), \frac{\,\mathrm{d}P_{0}}{\,\mathrm{d}\mu} denotes the R-N derivative of the measure P_{0} with respect to the measure \mu.)

Proof 4.10.

See Appendix A.5.

Remark 4.11.

From linear optimal control theory, it is known that P_{t} is related to Y_{t} by a ({\cal Z}_{t}-measurable) linear transformation [40, Sec. 6.6]. The boundary condition \frac{\,\mathrm{d}P_{0}}{\,\mathrm{d}\mu}(x)=2\big{(}Y_{0}(x)-\mu(Y_{0})\big{)} suggests the formula for the R-N derivative:

\frac{\,\mathrm{d}P_{t}}{\,\mathrm{d}\sigma_{t}}(x)=2\big{(}Y_{t}(x)-\pi_{t}(Y_{t})\big{)},\quad{\tilde{\sf P}}\text{-a.s.},\;0\leq t\leq T,\;x\in\mathbb{S} (19)

This is indeed the case, as we show in Appendix A.6 by verifying that (19) solves Hamilton's equations. Combining this formula with (18), we obtain the optimal control input as a feedback control law:

U_{t}^{\text{\rm(opt)}}=-\big{(}\pi_{t}(hY_{t})-\pi_{t}(h)\pi_{t}(Y_{t})\big{)}-\pi_{t}(V_{t}),\quad 0\leq t\leq T

4.2 Explicit formulae for the guiding examples

Example 4.12 (Finite state-space).

(Continued from Example 1). A real-valued function f (resp. a measure \rho) is identified with a column vector in \mathbb{R}^{d} whose i^{\text{th}} element represents f(i) (resp. \rho(i)), and \rho(f)=\rho^{\hbox{\rm\tiny T}}f. In this manner, the generator {\cal A} is identified with a rate matrix A\in\mathbb{R}^{d\times d} and the observation function h is identified with a matrix H\in\mathbb{R}^{d\times m}. Let \{e_{1},e_{2},\ldots,e_{d}\} denote the canonical basis in \mathbb{R}^{d}, Q(i)=\sum_{j\in\mathbb{S}}A(i,j)(e_{i}-e_{j})(e_{i}-e_{j})^{\hbox{\rm\tiny T}}, and \rho(Q)=\sum_{i\in\mathbb{S}}\rho(i)Q(i). For any vector b\in\mathbb{R}^{d}, B=\operatorname{diag}(b) is a d\times d diagonal matrix whose diagonal entries are defined as B(i,i)=b(i) for i=1,2,\ldots,d. For a d\times d matrix B, b=\operatorname{diag}^{\dagger}(B) is a d-dimensional vector whose entries are defined as b(i)=B(i,i) for i=1,2,\ldots,d.

The Lagrangian \ell:\mathbb{R}^{d}\times\mathbb{R}^{d\times m}\times\mathbb{R}^{m}\times\mathbb{R}^{d}\to\mathbb{R} and the Hamiltonian {\cal H}:\mathbb{R}^{d}\times\mathbb{R}^{d\times m}\times\mathbb{R}^{m}\times\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R} are as follows:

\ell(y,v,u;\rho)=y^{\hbox{\rm\tiny T}}\rho(Q)y+\rho({\sf 1})|u|^{2}+2u^{\hbox{\rm\tiny T}}v^{\hbox{\rm\tiny T}}\rho+\rho^{\hbox{\rm\tiny T}}\operatorname{diag}^{\dagger}(vv^{\hbox{\rm\tiny T}})
{\cal H}(y,v,u,p;\rho)=-p^{\hbox{\rm\tiny T}}(Ay+Hu+\operatorname{diag}^{\dagger}(Hv^{\hbox{\rm\tiny T}}))-\ell(y,v,u;\rho)

The functional derivatives are now the partial derivatives. For the Hamiltonian, these are as follows:

{\cal H}_{y}(y,v,u,p;\rho)=-A^{\hbox{\rm\tiny T}}p-2\rho(Q)y
{\cal H}_{v}(y,v,u,p;\rho)=-\operatorname{diag}(p)H-2\rho u^{\hbox{\rm\tiny T}}-2\operatorname{diag}(\rho)v
{\cal H}_{u}(y,v,u,p;\rho)=-H^{\hbox{\rm\tiny T}}p-2\rho({\sf 1})u-2v^{\hbox{\rm\tiny T}}\rho
{\cal H}_{p}(y,v,u,p;\rho)=-Ay-Hu-\operatorname{diag}^{\dagger}(Hv^{\hbox{\rm\tiny T}})

Hamilton's equations are given by

\,\mathrm{d}P_{t}=\big{(}A^{\hbox{\rm\tiny T}}P_{t}+2\sigma_{t}(Q)Y_{t}\big{)}\,\mathrm{d}t
\qquad+\big{(}\operatorname{diag}(P_{t})H+2\sigma_{t}U_{t}^{\hbox{\rm\tiny T}}+2\operatorname{diag}(\sigma_{t})V_{t}\big{)}\,\mathrm{d}Z_{t}
\,\mathrm{d}Y_{t}=-\big{(}AY_{t}+HU_{t}+\operatorname{diag}^{\dagger}(HV_{t}^{\hbox{\rm\tiny T}})\big{)}\,\mathrm{d}t+V_{t}\,\mathrm{d}Z_{t}
P_{0}=2\Sigma_{0}Y_{0},\quad Y_{T}=F

where \Sigma_{0}:=\operatorname{diag}(\mu)-\mu\mu^{\top}.
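In this matrix notation, the feedback formula for the optimal control (see Rem. 4.11) reads U_{t}=-\big{(}H^{\hbox{\rm\tiny T}}(\pi_{t}\circ Y_{t})-(H^{\hbox{\rm\tiny T}}\pi_{t})(\pi_{t}^{\hbox{\rm\tiny T}}Y_{t})\big{)}-V_{t}^{\hbox{\rm\tiny T}}\pi_{t}, where \circ denotes the elementwise product. A minimal sketch follows; the function name and the test values are illustrative choices, not from the paper.

```python
import numpy as np

# Optimal feedback control (20) in the matrix notation of Example 4.12:
#   U_t = -( H^T (pi * Y) - (H^T pi) (pi^T Y) ) - V^T pi
# where * is the elementwise product, pi is the conditional probability
# vector, Y in R^d is the dual state and V in R^{d x m} its martingale term.
def optimal_control(pi, Y, V, H):
    return -(H.T @ (pi * Y) - (H.T @ pi) * (pi @ Y)) - V.T @ pi

# Sanity check (illustrative values): when Y is a constant function and
# V = 0, the conditional covariance term vanishes and the control is zero.
d, m = 3, 2
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])
pi = np.array([0.2, 0.5, 0.3])
U = optimal_control(pi, np.ones(d), np.zeros((d, m)), H)
```

The vanishing of U for constant Y reflects that a constant terminal function carries no information to estimate, so the minimum variance estimator needs no correction from the observations.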

Example 4.13 (Euclidean state-space).

(Continued from Example 2). We consider the Itô diffusion (2) in \mathbb{R}^{d} with a prior density denoted by \mu. Likewise, \rho and p are used to denote the densities of the respective measures. Doing so, the Lagrangian and the Hamiltonian are as follows:

\ell(y,v,u;\rho)=\int_{\mathbb{R}^{d}}\rho(x)\big{(}|\sigma^{\hbox{\rm\tiny T}}(x)\nabla y(x)|^{2}+|u+v(x)|^{2}\big{)}\,\mathrm{d}x
{\cal H}(y,v,u,p;\rho)=-\int_{\mathbb{R}^{d}}p(x)\big{(}{\cal A}y(x)+h^{\hbox{\rm\tiny T}}(x)(u+v(x))\big{)}\,\mathrm{d}x
\qquad\qquad-\ell(y,v,u;\rho)

The functional derivatives are computed by evaluating the first variation. These are as follows:

{\cal H}_{y}(y,v,u,p;\rho)=-{\cal A}^{\dagger}p+2\nabla\cdot\big{(}\sigma\sigma^{\hbox{\rm\tiny T}}(\nabla y)\rho\big{)}
{\cal H}_{v}(y,v,u,p;\rho)=-ph-2(u+v)\rho
{\cal H}_{u}(y,v,u,p;\rho)=-p(h)-2\rho({\sf 1})u-2\rho(v)
{\cal H}_{p}(y,v,u,p;\rho)=-{\cal A}y-h^{\hbox{\rm\tiny T}}(u+v)

where \rho(v) is now understood to mean \int\rho(x)v(x)\,\mathrm{d}x and the formula for the adjoint is

({\cal A}^{\dagger}p)(x)=-\nabla\cdot(ap)(x)+\tfrac{1}{2}\sum_{i,j=1}^{d}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}\big{(}[\sigma\sigma^{\hbox{\rm\tiny T}}]_{ij}p\big{)}(x)

Therefore, Hamilton's equations are given by

\,\mathrm{d}P_{t}(x)=\big{(}({\cal A}^{\dagger}P_{t})(x)-2\nabla\cdot\big{(}\sigma\sigma^{\hbox{\rm\tiny T}}(\nabla Y_{t})\sigma_{t}\big{)}(x)\big{)}\,\mathrm{d}t
\qquad+\big{(}P_{t}(x)h(x)+2(U_{t}+V_{t}(x))\sigma_{t}(x)\big{)}\,\mathrm{d}Z_{t}
\,\mathrm{d}Y_{t}(x)=-\big{(}({\cal A}Y_{t})(x)+h^{\hbox{\rm\tiny T}}(x)(U_{t}+V_{t}(x))\big{)}\,\mathrm{d}t+V_{t}^{\hbox{\rm\tiny T}}(x)\,\mathrm{d}Z_{t}
P_{0}(x)=2\mu(x)\big{(}Y_{0}(x)-\mu(Y_{0})\big{)},\;Y_{T}(x)=F(x),\;x\in\mathbb{R}^{d}

where we note that P_{t} is now a (random) function (same as Y_{t}).

5 Martingale characterization

Although we do not have an HJB equation, a martingale characterization of the optimal solution is possible as described in the following theorem:

Theorem 5.14.

Fix U\in{\cal U}. Consider the {\cal Z}-adapted real-valued stochastic process M=\{M_{t}:0\leq t\leq T\} defined by

M_{t}:={\cal V}_{t}(Y_{t})-\int_{0}^{t}\ell(Y_{s},V_{s},U_{s};\pi_{s})\,\mathrm{d}s,\quad 0\leq t\leq T

where (Y,V) is the solution to the BSDE (10b) and \pi is the nonlinear filter. Then M is a {\sf P}-supermartingale, and M is a {\sf P}-martingale if and only if

U_{t}=-\big{(}\pi_{t}(hY_{t})-\pi_{t}(h)\pi_{t}(Y_{t})\big{)}-\pi_{t}(V_{t}) (20)

for 0\leq t\leq T, {\sf P}\text{-a.s.}

Proof 5.15.

See Appendix A.7.

A direct consequence of Thm. 5.14 is the optimality of the control (20), because

{\sf E}(M_{T})\leq{\sf E}(M_{0})

which means

{\sf E}\big{(}{\cal V}_{T}(F)\big{)}\leq{\sf E}\Big{(}{\cal V}_{0}(Y_{0})+\int_{0}^{T}\ell(Y_{t},V_{t},U_{t};\pi_{t})\,\mathrm{d}t\Big{)}={\sf J}_{T}(U)

with equality if and only if U=U^{\text{\rm(opt)}}. Therefore, the expected value of the conditional variance \text{var}_{T}(F)={\sf E}\big{(}{\cal V}_{T}(F)\big{)} is the optimal value functional for the optimal control problem.

Remark 5.16.

We now have a complete solution of the optimal control problem (10). Remarkably, the solution admits a meaningful interpretation not only at the terminal time T but also at intermediate times 0\leq t\leq T. At time t,

  • The optimal value functional is \text{var}_{t}(Y_{t}) (formula (14)).

  • The optimal control U_{t}^{\text{\rm(opt)}} is given by the feedback control law (20).

  • The optimal estimate is \pi_{t}(Y_{t}) (formula (13)).

Formula (13) for \pi_{t}(Y_{t}) explicitly connects the optimal control to the optimal filter. In particular, the optimal control up to time t yields an optimal estimate of Y_{t}(X_{t}).

Because of the BSDE-constrained nature of the optimal control problem (10), an explicit characterization of the optimal value functional and the feedback form of the optimal control are both welcome surprises. It is noted that the feedback formula (20) for the optimal control is derived using two approaches: the maximum principle (Rem. 4.11) and the martingale characterization (Thm. 5.14).

6 Derivation of the nonlinear filter

From Prop. 3.5, using the formula (20) for the optimal control,

\pi_{t}(Y_{t})=\mu(Y_{0})+\int_{0}^{t}\big{(}\pi_{s}(hY_{s})-\pi_{s}(h)\pi_{s}(Y_{s})+\pi_{s}(V_{s})\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{s} (21)

for 0\leq t\leq T, {\sf P}\text{-a.s.} Because the equation for Y is known, a natural question is whether (21) can be used to obtain the equation of the nonlinear filter (akin to the derivation of the Kalman filter described in Rem. 3.7). A formal derivation of the nonlinear filter along these lines is given in Appendix A.8.
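For the finite state-space of Example 1 with a scalar observation, the resulting filter is the Wonham equation \,\mathrm{d}\pi_{t}=A^{\hbox{\rm\tiny T}}\pi_{t}\,\mathrm{d}t+\big{(}\operatorname{diag}(h)\pi_{t}-\pi_{t}(h^{\hbox{\rm\tiny T}}\pi_{t})\big{)}(\,\mathrm{d}Z_{t}-h^{\hbox{\rm\tiny T}}\pi_{t}\,\mathrm{d}t). The sketch below is a minimal Euler-Maruyama illustration (the rate matrix, observation vector, and synthetic noise path are arbitrary choices): it checks that the update preserves the normalization \sum_{i}\pi_{t}(i)=1, which holds because {\sf 1}^{\hbox{\rm\tiny T}}A^{\hbox{\rm\tiny T}}\pi=0 (rows of A sum to zero) and {\sf 1}^{\hbox{\rm\tiny T}}(\operatorname{diag}(h)\pi-\pi h^{\hbox{\rm\tiny T}}\pi)=0.

```python
import numpy as np

# Wonham filter for a finite-state Markov chain, scalar observation:
#   d pi = A^T pi dt + (diag(h) pi - pi (h^T pi)) (dZ - h^T pi dt)
# Illustrative sketch: A, h, and the observation increments are arbitrary;
# the increments below are observation noise only, which suffices to test
# that the Euler update preserves the normalization of pi.
rng = np.random.default_rng(1)
d, T, N = 3, 1.0, 1000
dt = T / N
A = np.array([[-2.0, 1.0, 1.0],
              [0.5, -1.0, 0.5],
              [1.0, 1.0, -2.0]])    # rate matrix: each row sums to zero
h = np.array([1.0, 0.0, -1.0])
pi = np.array([0.3, 0.3, 0.4])      # prior probability vector

for _ in range(N):
    dZ = np.sqrt(dt) * rng.normal()            # synthetic observation increment
    gain = h * pi - pi * (h @ pi)              # diag(h) pi - pi (h^T pi)
    pi = pi + dt * (A.T @ pi) + gain * (dZ - (h @ pi) * dt)

total = pi.sum()
```

Normalization is preserved step by step (not merely in the limit), since both the drift and the gain term annihilate the constant function.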

Table 1: Comparison of the Mitter-Newton duality and the duality proposed in this paper
 | Mitter-Newton duality | Duality proposed in this paper
Filtering/smoothing objective | Minimize relative entropy (Eq. (22)) | Minimize variance (Eq. (9))
Observation (output) process | Pathwise (z is a sample path) | Z is a stochastic process
Control (input) process | U_{t} has the dimension of the process noise | U and Z are both elements of L^{2}_{{\cal Z}}([0,T];\mathbb{R}^{m})
Dual optimal control problem | Eq. (23) | Eq. (10)
Arrow of time | Forward in time | Backward in time
Dual state-space | \mathbb{S}: same as the state-space for X_{t} | {\cal Y}: the space of functions on \mathbb{S}
Constraint | Controlled copy of the state process SDE (23a) | Dual control system BSDE (10b)
Running cost (Lagrangian) | l(x,u\,;z_{t})=\tfrac{1}{2}|u|^{2}+\tfrac{1}{2}h^{2}(x)+z_{t}({\cal A}^{u}h)(x) | l(y,v,u;x)=(\Gamma y)(x)+|u+v(x)|^{2}
Value function (its interpretation) | Minus log of the posterior density | Expected value of the conditional variance
Asymptotic analysis (condition) | Unclear | Stabilizability of BSDE \Leftrightarrow Detectability of HMM
Optimal solution gives | Forward-backward equations of smoothing | Equation of nonlinear filtering
Linear-Gaussian special case | Minimum energy duality (3) | Minimum variance duality (2)

7 Comparison with Mitter-Newton Duality

7.1 Review of Mitter-Newton duality

In [11], Mitter and Newton introduced a modified version of the Markov process X. The modified process is denoted by \tilde{X}:=\{\tilde{X}_{t}:0\leq t\leq T\}. The problem is to pick (i) the initial prior \tilde{\mu}; and (ii) the state transition, such that the probability law of \tilde{X} equals the conditional law of X.

This is accomplished by setting up an optimization problem on the space of probability laws. Let {\sf P}_{X} denote the law of X, {\sf Q} denote the law of \tilde{X}, and {\sf P}_{X\mid z} denote the conditional law of X given an observation sample path z=\{z_{t}\in\mathbb{R}^{m}:0\leq t\leq T\}. Assuming {\sf Q}\ll{\sf P}_{X}, the objective function is the relative entropy between {\sf Q} and {\sf P}_{X\mid z}:

min𝖰𝖤𝖰(logd𝖰d𝖯X)𝖤𝖰(logd𝖯Xzd𝖯X)\min_{{\sf Q}}\quad{\sf E}_{{\sf Q}}\Big{(}\log\frac{\,\mathrm{d}{\sf Q}}{\,\mathrm{d}{\sf P}_{X}}\Big{)}-{\sf E}_{{\sf Q}}\Big{(}\log\frac{\,\mathrm{d}{\sf P}_{X\mid z}}{\,\mathrm{d}{\sf P}_{X}}\Big{)} (22)

In [28], (22) is referred to as the variational Kallianpur-Striebel formula. For Example 2 (Itô diffusion), this procedure yields the following stochastic optimal control problem:

\mathop{\text{Min}}_{\tilde{\mu},\;U}:\;\;{\sf J}(\tilde{\mu},U\,;z)
\qquad={\sf E}\Big{(}\log\frac{\,\mathrm{d}\tilde{\mu}}{\,\mathrm{d}\mu}(\tilde{X}_{0})-z_{T}h(\tilde{X}_{T})+\int_{0}^{T}l(\tilde{X}_{t},U_{t}\,;z_{t})\,\mathrm{d}t\Big{)} (23a)
\text{Subj.}:\;\;\,\mathrm{d}\tilde{X}_{t}=a(\tilde{X}_{t})\,\mathrm{d}t+\sigma(\tilde{X}_{t})(U_{t}\,\mathrm{d}t+\,\mathrm{d}\tilde{B}_{t}),\;\;\tilde{X}_{0}\sim\tilde{\mu} (23b)

where

l(x,u\,;z_{t}):=\tfrac{1}{2}|u|^{2}+\tfrac{1}{2}h^{2}(x)+z_{t}({\cal A}^{u}h)(x)

where {\cal A}^{u} is the generator of the controlled Markov process \tilde{X}. A similar construction is also possible for Example 1 (finite state-space) [28, Sec. 2.2.2], [45, Sec. 3.3].

The problem (23) is a standard stochastic optimal control problem whose solution is obtained by writing the HJB equation (see [45]),

-\frac{\partial v_{t}}{\partial t}(x)=\big{(}{\cal A}(v_{t}+z_{t}h)\big{)}(x)+\tfrac{1}{2}h^{2}(x)
\qquad-\tfrac{1}{2}|\sigma^{\hbox{\rm\tiny T}}\nabla(v_{t}+z_{t}h)(x)|^{2}
v_{T}(x)=-z_{T}h(x),\quad x\in\mathbb{R}^{d}

and the optimal control U_{t}=u_{t}^{\text{\rm(opt)}}(\tilde{X}_{t}) where

u_{t}^{\text{\rm(opt)}}(x)=-\sigma^{\hbox{\rm\tiny T}}\nabla(v_{t}+z_{t}h)(x)

By expressing the value function

v_{t}(x)=-\log\big{(}q_{t}(x)e^{z_{t}h(x)}\big{)}

a direct calculation shows that the process \{q_{t}:0\leq t\leq T\} satisfies the backward Zakai equation of the smoothing problem [46], [47, Thm. 3.8]. This shows the connection to both the log transformation and to the smoothing problem. In fact, the above can be used to derive the forward-backward equations of nonlinear smoothing (see [45] and [35, Appdx. B]).

Remark 7.17.

The stochastic optimal control problem (23) is equivalently stated as a deterministic optimal control problem on {\cal Y}^{\dagger} [45, Sec. 3.2]. Note that the optimal control problem depends on a (fixed) observation sample path z, which is the reason why a deterministic formulation is available.

7.2 Linear Gaussian case

The goal is to relate (23) to the minimum energy duality (3) described in Sec. 1 for the linear Gaussian model (1). In the linear Gaussian case, the controlled process (23b) becomes

\,\mathrm{d}\tilde{X}_{t}=A^{\hbox{\rm\tiny T}}\tilde{X}_{t}\,\mathrm{d}t+\sigma U_{t}\,\mathrm{d}t+\sigma\,\mathrm{d}\tilde{B}_{t},\quad\tilde{X}_{0}\sim N(\tilde{m}_{0},\tilde{\Sigma}_{0}) (24)

where U, \tilde{m}_{0}, \tilde{\Sigma}_{0} are decision variables. Because the problem is linear Gaussian, it suffices to consider a linear control law of the form

U_{t}=K_{t}(\tilde{X}_{t}-\tilde{m}_{t})+u_{t} (25)

where \tilde{m}_{t}:={\sf E}(\tilde{X}_{t}) and the two deterministic processes

K=\{K_{t}\in\mathbb{R}^{p\times d}:0\leq t\leq T\}
u=\{u_{t}\in\mathbb{R}^{p}:0\leq t\leq T\}

are the new decision variables. With the linear control law (25), the state \tilde{X}_{t} is a Gaussian random variable with mean \tilde{m}_{t} and variance \tilde{\Sigma}_{t}. It is possible to equivalently express (23) as two uncoupled deterministic optimal control problems, one for the mean and one for the variance. Detailed calculations showing this are contained in Appendix A.9. In particular, it is shown that the optimal control problem for the mean is the classical minimum energy duality (3).

7.3 Comparison

Table 1 provides a side-by-side comparison of the two types of duality:

  • Mitter-Newton duality (23) on the left-hand side; and

  • Duality (10) proposed in this paper on the right-hand side.

In Sec. 7.2 and Sec. 3.3, the two are shown to be generalizations of the classical minimum energy duality (3) and the minimum variance duality (2), respectively. All of this conclusively answers the two questions raised in Sec. 1.

We make a note of some important distinctions (compare with the bulleted list in Sec. 1):

  • Inputs and outputs. In the proposed duality (10), the inputs and outputs are dual processes that have the same dimension. These are elements of the same Hilbert space {\cal U}.

  • Constraint. The constraint is the dual control system (10b) studied in the companion paper (part I).

  • Stability condition. For asymptotic analysis of (10), stabilizability of the constraint is the most natural condition. The main result of part I was to establish that stabilizability of the dual control system is equivalent to the detectability of the HMM. The latter condition is, of course, central to filter stability.

  • Arrow of time. The dual control system is backward in time. However, it is important to note that the information structure (filtration) is forward in time. In particular, all the processes are forward adapted to the filtration 𝒵{\cal Z} defined by the observation process.

A major drawback of the proposed duality is that the problem (for the Euclidean state-space \mathbb{S}=\mathbb{R}^{d}) is infinite-dimensional. This is to be expected because the nonlinear filter is itself infinite-dimensional. In contrast, the state-space in the minimum energy duality is \mathbb{R}^{d}, which is important for algorithm design as in MEE. Having said that, the linear quadratic nature of the infinite-dimensional problem may prove to be useful in practical applications of this work.

8 Conclusions and directions of future work

In this paper, we presented the minimum variance dual optimal control problem for the nonlinear filtering problem. The mathematical relationship between the two problems is given by a duality principle. Two approaches are described to solve the problem, one based on the maximum principle and the other based on a martingale characterization. A formula for the optimal control as a feedback control law is obtained and used to derive the equation of the nonlinear filter. A detailed comparison with the Mitter-Newton duality is given.

There are several possible directions for future research. An important next step is to use the controllability and stabilizability definitions of the dual control system to recover the known results in filter stability. Research on this has already begun, with preliminary results appearing in [35, Chapters 7-8] and [48, 49]. Although some sufficient conditions have been obtained and compared with the literature, a complete resolution still remains open.

Both the stability analysis and the optimal control formulation suggest natural connections to dissipativity theory. Because the dual control system is linear, one might consider quadratic supply rate functions of the following form (compare with the formula for the running cost l):

s(y,v,u;x):=\gamma|u+v(x)|^{2}-|y(x)-{c}_{t}|^{2}

where \gamma>0 and c:=\{c_{t}:0\leq t\leq T\}\in L^{2}_{{\cal Z}}\big{(}[0,T];\mathbb{R}\big{)} is a suitable stochastic process (which can be picked). Establishing conditions for the existence of a storage function, and relating these conditions to the properties of the HMM, may be useful for stability and robustness analysis.

Another avenue is numerical approximation of the nonlinear filter by considering sub-optimal solutions of the dual optimal control problem. The simplest choice is to consider deterministic control inputs U\in L^{2}\big{(}[0,T];\mathbb{R}^{m}\big{)}. Some preliminary work on algorithm design along these lines appears in [36, Rem. 1], [35, Sec. 9.2] and [50, Ch. 4]. In particular, for the finite state-space case, this approach provides a derivation and justification of the Kalman filter for Markov chains [51]. In this regard, it is useful to relate duality both to the feedback particle filter (FPF) [52] and to the special cases (apart from the linear Gaussian case) where the optimal filter is known to be finite-dimensional, e.g. [53].

9 Acknowledgement

It is a pleasure to acknowledge Sean Meyn and Amirhossein Taghvaei for many useful technical discussions over the years on the topic of duality. The authors also acknowledge Alain Bensoussan for his early encouragement of this work.

Appendix A Proofs of the statements

A.1 Proof of Thm. 1

For a Markov process, the following process is a martingale:

N_{t}(g)=g(X_{t})-\int_{0}^{t}{\cal A}g(X_{s})\,\mathrm{d}s

Upon applying the Itô-Wentzell theorem [54, Thm. 1.17] to Y_{t}(X_{t}) (note here that all stochastic processes are forward adapted),

\,\mathrm{d}Y_{t}(X_{t})=-U_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}+\big{(}U_{t}+V_{t}(X_{t})\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}W_{t}+\,\mathrm{d}N_{t}(Y_{t})

Integrating both sides from 0 to TT,

F(X_{T})=Y_{0}(X_{0})-\int_{0}^{T}U_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}
\quad+\int_{0}^{T}(U_{t}+V_{t}(X_{t}))^{\hbox{\rm\tiny T}}\,\mathrm{d}W_{t}+\int_{0}^{T}\,\mathrm{d}N_{t}(Y_{t})

Consider now an estimator

S_{T}=b-\int_{0}^{T}U_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}

where b\in\mathbb{R} is a deterministic constant. Then

F(X_{T})-S_{T}=\big{(}Y_{0}(X_{0})-b\big{)}+\int_{0}^{T}(U_{t}+V_{t}(X_{t}))^{\hbox{\rm\tiny T}}\,\mathrm{d}W_{t}
\quad+\int_{0}^{T}\,\mathrm{d}N_{t}(Y_{t})

The left-hand side is the error of the estimator. The three terms on the right-hand side are mutually independent. Therefore, upon squaring and taking an expectation

{\sf E}\big{(}|F(X_{T})-S_{T}|^{2}\big{)}={\sf E}\big{(}|Y_{0}(X_{0})-\mu(Y_{0})|^{2}\big{)}+(\mu(Y_{0})-b)^{2}
+{\sf E}\Big{(}\int_{0}^{T}|U_{t}+V_{t}(X_{t})|^{2}+(\Gamma Y_{t})(X_{t})\,\mathrm{d}t\Big{)}

The proof is completed by setting b=\mu(Y_{0}).

A.2 Proof of Lemma 3.3

Because Z is a {\tilde{\sf P}}-B.M., the formula holds for \pi_{T}(F)\in L^{2}_{{\cal Z}_{T}}(\Omega;\mathbb{R}) by the Brownian motion representation theorem [42, Thm. 5.18]. Note that

|\pi_{T}(F)|^{2}\leq\|F\|_{\cal Y}^{2},\quad{\tilde{\sf P}}\text{-a.s.}

because \|\cdot\|_{\cal Y} is the sup norm. Therefore, if F\in L^{2}_{{\cal Z}_{T}}(\Omega;{\cal Y}) then \pi_{T}(F)\in L^{2}_{{\cal Z}_{T}}(\Omega;\mathbb{R}). The conclusion follows.

A.3 Proof of Prop. 3.5

Given the optimal control $U^{\text{(opt)}} = \{U^{\text{(opt)}}_t : 0 \leq t \leq T\} \in {\cal U}$, let $(Y, V) = \{(Y_t, V_t) : 0 \leq t \leq T\} \in L^2_{{\cal Z}}\big([0,T];{\cal Y}\times{\cal Y}^m\big)$ denote the solution of the BSDE (10b) with $Y_T = F \in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$. Fix $t \in [0,T]$ and let

\[
S_t = \mu(Y_0) - \int_0^t \big(U_s^{\text{(opt)}}\big)^{\top}\,\mathrm{d}Z_s
\]

Then, by repeating the proof of Thm. 1 over the time horizon $[0,t]$,

\[
{\sf E}\big(|Y_t(X_t) - S_t|^2\big) = \text{var}_0(Y_0) + {\sf E}\Big(\int_0^t l(Y_s, V_s, U_s^{\text{(opt)}}; X_s)\,\mathrm{d}s\Big)
\]

If ${\sf E}\big(|Y_t(X_t) - S_t|^2\big) = \text{var}_t(Y_t)$ then there is nothing to prove, because then $S_t = \pi_t(Y_t)$ (${\sf P}$-a.s.) by the uniqueness of the conditional expectation. Therefore, suppose

\[
\text{var}_t(Y_t) = {\sf E}\big(|Y_t(X_t) - \pi_t(Y_t)|^2\big) < {\sf E}\big(|Y_t(X_t) - S_t|^2\big)
\]

In this case, we show that there exists a $\tilde{U} \in {\cal U}$ such that ${\sf J}_T(\tilde{U}) < {\sf J}_T(U^{\text{(opt)}})$. Because $U^{\text{(opt)}}$ is the optimal control, this provides the necessary contradiction.

Set $C := {\sf E}\big(\int_t^T l(Y_s, V_s, U_s^{\text{(opt)}}; X_s)\,\mathrm{d}s\big)$, so that

\[
{\sf J}_T(U^{\text{(opt)}}) = {\sf E}\big(|Y_t(X_t) - S_t|^2\big) + C
\]

Because $Y_t \in L^2_{{\cal Z}_t}(\Omega;{\cal Y})$, by Lemma 3.3 there exists $\hat{U} \in L^2_{{\cal Z}}([0,t];\mathbb{R}^m)$ such that

\[
\pi_t(Y_t) = {\tilde{\sf E}}\big(\pi_t(Y_t)\big) - \int_0^t \hat{U}_s^{\top}\,\mathrm{d}Z_s, \quad {\tilde{\sf P}}\text{-a.s.}
\]

Consider the admissible control $\tilde{U}$ defined by

\[
\tilde{U}_s = \begin{cases} \hat{U}_s & s \leq t \\ U_s^{\text{(opt)}} & s > t \end{cases}
\]

and denote by $(\tilde{Y}, \tilde{V})$ the solution of the BSDE with the control $\tilde{U}$. Because of the uniqueness of the solution, $(\tilde{Y}_s, \tilde{V}_s) = (Y_s, V_s)$ for all $s > t$, and therefore

\begin{align*}
{\sf J}_T(\tilde{U}) &= {\sf E}\big(|Y_t(X_t) - \pi_t(Y_t)|^2\big) + C \\
&< {\sf E}\big(|Y_t(X_t) - S_t|^2\big) + C = {\sf J}_T(U^{\text{(opt)}})
\end{align*}

This supplies the necessary contradiction and completes the proof.

A.4 Derivation of the Lagrangian

Using the change of measure formula (8),

\begin{align*}
{\sf E}\big((\Gamma Y_t)(X_t)\big) &= {\tilde{\sf E}}\big(\sigma_t(\Gamma Y_t)\big) \\
{\sf E}\big(|U_t + V_t(X_t)|^2\big) &= {\tilde{\sf E}}\big(\sigma_t(|U_t + V_t|^2)\big)
\end{align*}

Even though the formula (8) is stated for deterministic functions, it is easily extended to ${\cal Z}_t$-measurable functions, which is how it is used above. Therefore,

\begin{align*}
{\sf J}_T(U) &= \text{var}_0(Y_0) + {\sf E}\Big(\int_0^T (\Gamma Y_t)(X_t) + |U_t + V_t(X_t)|^2\,\mathrm{d}t\Big) \\
&= \text{var}_0(Y_0) + {\tilde{\sf E}}\Big(\int_0^T \sigma_t(\Gamma Y_t) + \sigma_t(|U_t + V_t|^2)\,\mathrm{d}t\Big) \\
&= \text{var}_0(Y_0) + {\tilde{\sf E}}\Big(\int_0^T \ell(Y_t, V_t, U_t; \sigma_t)\,\mathrm{d}t\Big)
\end{align*}

A.5 Proof of Thm. 4.9

Equation (17) is Hamilton's equation for optimal control of a BSDE [44, Thm. 4.4]. The optimal control is obtained from the maximum principle:

\[
U_t = \mathop{\operatorname{argmax}}_{u\in\mathbb{R}^m} \; {\cal H}(Y_t, V_t, u, P_t; \sigma_t)
\]

Since ${\cal H}$ is quadratic in the control input, the explicit formula (18) is obtained by evaluating the derivative and setting it to zero:

\[
{\cal H}_u(Y_t, V_t, u, P_t; \sigma_t) = 2\sigma_t({\sf 1})u + 2\sigma_t(V_t) + P_t(h) = 0
\]
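The stationarity computation can be sanity-checked numerically. The sketch below uses arbitrary scalar stand-ins for $\sigma_t({\sf 1})$, $\sigma_t(V_t)$ and $P_t(h)$ (the values are illustrative, not taken from the paper's model) and verifies that the closed-form root of ${\cal H}_u = 0$ is the unique stationary point of the quadratic.

```python
# illustrative scalar stand-ins (not from the paper's model):
sigma_1 = 2.0   # plays the role of sigma_t(1)
sigma_V = 0.7   # plays the role of sigma_t(V_t)
P_h = -1.2      # plays the role of P_t(h)

def H_u(u):
    # derivative in u of the quadratic Hamiltonian
    return 2.0 * sigma_1 * u + 2.0 * sigma_V + P_h

# stationary point from setting the derivative to zero
u_star = -(2.0 * sigma_V + P_h) / (2.0 * sigma_1)
assert abs(H_u(u_star)) < 1e-12

# since the quadratic coefficient is nonzero, the stationary point is the
# unique extremum: the derivative changes sign there
assert H_u(u_star - 1e-3) * H_u(u_star + 1e-3) < 0
```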

A.6 Justification of the formula (19)

For notational ease, we drop the superscript $^{\text{(opt)}}$ and denote the optimal control input simply as $U_t$. In this proof, $\langle\cdot,\cdot\rangle$ is used to denote the duality pairing between functions and measures (e.g., $\langle f, \mu\rangle = \mu(f)$).

Let $f$ be an arbitrary test function. We show that

\[
\langle f, P_t\rangle = \big\langle 2f(Y_t - \pi_t(Y_t)), \sigma_t\big\rangle, \quad 0 < t \leq T
\]

This is known to be true at time $t = 0$ because of the boundary condition (17c). Therefore, the proof is carried out by taking the derivative of both sides and showing that they are identical.

Using the Itô-Wentzell formula for measure-valued processes [55, Thm. 1.1],

\begin{align*}
\mathrm{d}\big\langle 2f(Y_t - \pi_t(Y_t)), \sigma_t\big\rangle &= 2\big\langle {\cal A}(fY_t) - f({\cal A}Y_t) - \pi_t(Y_t)({\cal A}f), \sigma_t\big\rangle\,\mathrm{d}t \\
&\quad + \big(\langle 2f(U_t + V_t), \sigma_t\rangle + \langle fh, P_t\rangle\big)\,\mathrm{d}Z_t
\end{align*}

where we have used $\mathrm{d}\big(\pi_t(Y_t)\big) = -U_t\,\mathrm{d}Z_t$ (Prop. 3.5). From Hamilton's equation (17b), upon explicitly evaluating the terms,

\begin{align*}
\mathrm{d}\langle f, P_t\rangle &= \Big(\langle{\cal A}f, P_t\rangle + \frac{\mathrm{d}}{\mathrm{d}\epsilon}\sigma_t\big(\Gamma(Y_t + \epsilon f)\big)\Big|_{\epsilon=0}\Big)\,\mathrm{d}t \\
&\quad + \big(\langle fh, P_t\rangle + \langle 2f(U_t + V_t), \sigma_t\rangle\big)\,\mathrm{d}Z_t
\end{align*}

where

\[
\frac{\mathrm{d}}{\mathrm{d}\epsilon}\Gamma(Y_t + \epsilon f)\Big|_{\epsilon=0} = 2\big({\cal A}(Y_t f) - Y_t({\cal A}f) - f({\cal A}Y_t)\big)
\]

On comparing terms, the two derivatives are seen to be identical, where we also use the identity $\langle g, P_t\rangle = \big\langle 2g(Y_t - \pi_t(Y_t)), \sigma_t\big\rangle$ for $g = {\cal A}f$.

A.7 Proof of Thm. 5.14

The proof uses the equation of the nonlinear filter, where $\mathrm{d}I_t := \mathrm{d}Z_t - \pi_t(h)\,\mathrm{d}t$ is the innovation increment. We evaluate the derivative of ${\cal V}_t(Y_t) = \pi_t(Y_t^2) - \big(\pi_t(Y_t)\big)^2$.

\begin{align*}
\mathrm{d}\pi_t(Y_t^2) &= \pi_t({\cal A}Y_t^2)\,\mathrm{d}t + \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2)\big)\,\mathrm{d}I_t \\
&\quad + \pi_t\big(-2Y_t\big({\cal A}Y_t + h(U_t + V_t)\big) + |V_t|^2\big)\,\mathrm{d}t \\
&\quad + 2\pi_t\big(Y_t V_t\big)\,\mathrm{d}Z_t + 2\big(\pi_t(hY_t V_t) - \pi_t(h)\pi_t(Y_t V_t)\big)\,\mathrm{d}t \\
&= \pi_t\big(\Gamma Y_t\big)\,\mathrm{d}t + \pi_t(|V_t|^2)\,\mathrm{d}t - 2\pi_t(hY_t)U_t\,\mathrm{d}t \\
&\quad + \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2) + 2\pi_t(Y_t V_t)\big)\,\mathrm{d}I_t
\end{align*}

Similarly,

\begin{align*}
\mathrm{d}\pi_t(Y_t) &= \pi_t({\cal A}Y_t)\,\mathrm{d}t \\
&\quad + \big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t)\big)\big(\mathrm{d}Z_t - \pi_t(h)\,\mathrm{d}t\big) \\
&\quad - \pi_t\big({\cal A}Y_t + h(U_t + V_t)\big)\,\mathrm{d}t + \pi_t\big(V_t\big)\,\mathrm{d}Z_t \\
&\quad + \big(\pi_t(hV_t) - \pi_t(h)\pi_t(V_t)\big)\,\mathrm{d}t \\
&= \big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t) + \pi_t(V_t)\big)\,\mathrm{d}Z_t \\
&\quad - \big(U_t + \pi_t(hY_t) - \pi_t(h)\pi_t(Y_t) + \pi_t(V_t)\big)\pi_t(h)\,\mathrm{d}t \\
&= U_t^{\text{(opt)}}\,\mathrm{d}Z_t - \big(U_t - U_t^{\text{(opt)}}\big)\pi_t(h)\,\mathrm{d}t
\end{align*}

where $U_t^{\text{(opt)}} := -\pi_t(hY_t) + \pi_t(h)\pi_t(Y_t) - \pi_t(V_t)$. Therefore,

\begin{align*}
\mathrm{d}\big(\pi_t(Y_t)\big)^2 &= 2\pi_t(Y_t)U_t^{\text{(opt)}}\,\mathrm{d}Z_t + |U_t^{\text{(opt)}}|^2\,\mathrm{d}t \\
&\quad - 2\pi_t(Y_t)\big(U_t - U_t^{\text{(opt)}}\big)\pi_t(h)\,\mathrm{d}t
\end{align*}

Collecting terms, we have

\begin{align*}
\mathrm{d}M_t &= \pi_t\big(\Gamma Y_t\big)\,\mathrm{d}t + \pi_t(|V_t|^2)\,\mathrm{d}t - 2\pi_t(hY_t)U_t\,\mathrm{d}t \\
&\quad + \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2) + 2\pi_t(Y_t V_t)\big)\,\mathrm{d}I_t \\
&\quad - 2\pi_t(Y_t)U_t^{\text{(opt)}}\,\mathrm{d}Z_t + 2\pi_t(Y_t)\big(U_t - U_t^{\text{(opt)}}\big)\pi_t(h)\,\mathrm{d}t \\
&\quad - |U_t^{\text{(opt)}}|^2\,\mathrm{d}t - \ell(Y_t, V_t, U_t; \pi_t)\,\mathrm{d}t \\
&= \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2) + 2\pi_t(Y_t V_t)\big)\,\mathrm{d}I_t \\
&\quad - |U_t - U_t^{\text{(opt)}}|^2\,\mathrm{d}t
\end{align*}

Since $-|U_t - U_t^{\text{(opt)}}|^2 \leq 0$ and $I$ is a ${\sf P}$-martingale, $M$ is a ${\sf P}$-supermartingale, and it is a martingale if and only if $U_t = U_t^{\text{(opt)}}$ for all $t$.
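The supermartingale property rests only on the nonpositive drift $-|U_t - U_t^{\text{(opt)}}|^2\,\mathrm{d}t$. A minimal Monte Carlo sketch illustrates this: a made-up constant gap stands in for $U_t - U_t^{\text{(opt)}}$ and a scaled Brownian increment stands in for the $\mathrm{d}I_t$ term, and the sample mean of the resulting process decreases in time (with zero gap it would stay constant).

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, dt = 20_000, 100, 0.01

gap = 0.5  # made-up stand-in for U_t - U_t^(opt), held constant for simplicity

M = np.zeros(n_paths)
means = [M.mean()]
for _ in range(n_steps):
    dI = np.sqrt(dt) * rng.standard_normal(n_paths)  # martingale increment
    M += 1.0 * dI - gap**2 * dt  # dM = (coeff) dI - |U - U^(opt)|^2 dt
    means.append(M.mean())

# E(M_t) decreases over time: a supermartingale
assert means[-1] < means[0]
```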

A.8 Formal derivation of the nonlinear filter

We begin with an ansatz

\[
\mathrm{d}\pi_t(f) = \alpha_t(f)\,\mathrm{d}t + \beta_t(f)\,\mathrm{d}Z_t \tag{26}
\]

where the goal is to obtain formulae for $\alpha_t$ and $\beta_t$. Because we have an equation (21) for $\pi_t(Y_t)$, let us express $\mathrm{d}\big(\pi_t(Y_t)\big)$ in terms of the unknown $\alpha_t$ and $\beta_t$. Using the SDE (26) for $\pi_t$ and the BSDE (10b) for $Y_t$, apply the Itô-Wentzell formula to obtain

\begin{align*}
\mathrm{d}\big(\pi_t(Y_t)\big) &= \big(\alpha_t(Y_t) + \beta_t(V_t) - \pi_t({\cal A}Y_t + h^{\top}(U_t + V_t))\big)\,\mathrm{d}t \\
&\quad + \big(\beta_t(Y_t) + \pi_t(V_t)\big)\,\mathrm{d}Z_t
\end{align*}

Comparing with (21),

\begin{align*}
&\alpha_t(Y_t) + \beta_t(V_t) - \pi_t({\cal A}Y_t + h^{\top}(U_t + V_t)) = 0 \\
&\beta_t(Y_t) + \pi_t(V_t) = \big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t)\big) + \pi_t(V_t)
\end{align*}

for $0 \leq t \leq T$, ${\sf P}$-a.s. Because $F$, and therefore $Y_t$, is arbitrary, the second of these equations suggests setting

\[
\beta_t(f) = \pi_t(hf) - \pi_t(h)\pi_t(f)
\]

using which the first equation is manipulated to show

\begin{align*}
\alpha_t(Y_t) &= \pi_t({\cal A}Y_t) - \pi_t(h)\big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t) + \pi_t(V_t)\big) \\
&\quad + \pi_t(hV_t) - \pi_t(hV_t) + \pi_t(h)\pi_t(V_t) \\
&= \pi_t({\cal A}Y_t) - \pi_t(h)\big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t)\big)
\end{align*}

which gives the following

\[
\alpha_t(f) = \pi_t({\cal A}f) - \beta_t(f)\pi_t(h)
\]

Substituting the expressions for $\alpha_t$ and $\beta_t$ into the ansatz (26),

\begin{align*}
\mathrm{d}\pi_t(f) &= \big(\pi_t({\cal A}f) - \beta_t(f)\pi_t(h)\big)\,\mathrm{d}t + \beta_t(f)\,\mathrm{d}Z_t \\
&= \pi_t({\cal A}f)\,\mathrm{d}t + \big(\pi_t(hf) - \pi_t(h)\pi_t(f)\big)\big(\mathrm{d}Z_t - \pi_t(h)\,\mathrm{d}t\big)
\end{align*}

This is the well-known SDE of the nonlinear filter.
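For a finite state space, the filter SDE above specializes to the Wonham filter, which is straightforward to integrate numerically. The following sketch simulates a two-state chain with a made-up generator $A$ and observation function $h$ (both illustrative, not from the paper), and propagates $\pi_t$ by an Euler step of the SDE with $f$ ranging over the indicator functions; the clipping and renormalization are numerical safeguards against Euler discretization error, not part of the exact equation.

```python
import numpy as np

rng = np.random.default_rng(2)

# two-state Markov chain: generator A (rows sum to zero), observation function h
A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
h = np.array([0.0, 1.0])
dt, n_steps = 1e-3, 5000

x = 0                        # hidden state
pi = np.array([0.5, 0.5])    # conditional distribution pi_t
for _ in range(n_steps):
    # crude jump simulation of the hidden chain, valid for small dt
    if rng.random() < -A[x, x] * dt:
        x = 1 - x
    # observation increment dZ = h(X_t) dt + dW
    dZ = h[x] * dt + np.sqrt(dt) * rng.standard_normal()
    # Euler step of d pi(f) = pi(A f) dt + (pi(h f) - pi(h) pi(f)) (dZ - pi(h) dt)
    pih = pi @ h
    pi = pi + (pi @ A) * dt + pi * (h - pih) * (dZ - pih * dt)
    pi = np.clip(pi, 0.0, None)
    pi = pi / pi.sum()       # renormalize (numerical safeguard)

assert abs(pi.sum() - 1.0) < 1e-12 and (pi >= 0).all()
```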

A.9 Mitter-Newton duality for the linear Gaussian model

Consider (24) with the linear control law (25). Then $\tilde{X}_t$ is a Gaussian random variable whose mean $\tilde{m}_t$ and variance $\tilde{\Sigma}_t$ evolve as follows:

\begin{align}
\frac{\mathrm{d}\tilde{m}_t}{\mathrm{d}t} &= A^{\top}\tilde{m}_t + \sigma u_t \tag{27a} \\
\frac{\mathrm{d}\tilde{\Sigma}_t}{\mathrm{d}t} &= (A^{\top} + \sigma K_t)\tilde{\Sigma}_t + \tilde{\Sigma}_t(A^{\top} + \sigma K_t)^{\top} + \sigma\sigma^{\top} \tag{27b}
\end{align}

Note that the two equations are entirely uncoupled: $u_t$ affects only the equation for $\tilde{m}_t$, and $K_t$ affects only the equation for $\tilde{\Sigma}_t$. We now turn to explicitly computing the running cost. For the linear Gaussian model,

\[
({\cal A}^u h)(x) = H^{\top}(A^{\top}x + \sigma u)
\]

and the running cost becomes

\[
l(x, u; z_t) = \tfrac{1}{2}|u|^2 + \tfrac{1}{2}|H^{\top}x|^2 + z_t H^{\top}(A^{\top}x + \sigma u)
\]

Because $\tilde{X}_t \sim N(\tilde{m}_t, \tilde{\Sigma}_t)$,

\begin{align*}
{\sf E}\big(l(\tilde{X}_t, u_t; z_t)\big) &= \tfrac{1}{2}|u_t|^2 + \tfrac{1}{2}\text{tr}(K_t^{\top}K_t\tilde{\Sigma}_t) + \tfrac{1}{2}|H^{\top}\tilde{m}_t|^2 \\
&\quad + \tfrac{1}{2}\text{tr}(HH^{\top}\tilde{\Sigma}_t) + z_t H^{\top}(A^{\top}\tilde{m}_t + \sigma u_t)
\end{align*}

and because $\tilde{\mu}$ and $\mu$ are both Gaussian, the divergence is

\begin{align*}
{\sf E}\Big(\log\frac{\mathrm{d}\tilde{\mu}}{\mathrm{d}\mu}(\tilde{X}_0)\Big) &= \tfrac{1}{2}(m_0 - \tilde{m}_0)^{\top}\Sigma_0^{-1}(m_0 - \tilde{m}_0) \\
&\quad + \tfrac{1}{2}\log\frac{\det(\tilde{\Sigma}_0)}{\det(\Sigma_0)} - \frac{d}{2} + \tfrac{1}{2}\text{tr}(\tilde{\Sigma}_0\Sigma_0^{-1})
\end{align*}

and because $h(\cdot)$ is linear, the terminal condition term is

\[
{\sf E}\big(z_T h(\tilde{X}_T)\big) = z_T H^{\top}\tilde{m}_T
\]

Combining all of the above, upon a formal integration by parts, ${\sf J}(\tilde{\mu}, U; z)$ is expressed as the sum of two uncoupled costs

\begin{align*}
{\sf J}_1(\tilde{m}_0, u; z) &= \tfrac{1}{2}(m_0 - \tilde{m}_0)^{\top}\Sigma_0^{-1}(m_0 - \tilde{m}_0) \\
&\quad + \int_0^T \tfrac{1}{2}|u_t|^2 + \tfrac{1}{2}|\dot{z}_t - H^{\top}\tilde{m}_t|^2\,\mathrm{d}t \\
{\sf J}_2(\tilde{\Sigma}_0, K; z) &= \tfrac{1}{2}\log\big(\det(\tilde{\Sigma}_0)\big) + \tfrac{1}{2}\text{tr}(\tilde{\Sigma}_0\Sigma_0^{-1}) \\
&\quad + \int_0^T \tfrac{1}{2}\text{tr}(K_t^{\top}K_t\tilde{\Sigma}_t) + \tfrac{1}{2}\text{tr}(HH^{\top}\tilde{\Sigma}_t)\,\mathrm{d}t
\end{align*}

plus a few constant terms that are not affected by the decision variables. The first of these costs, subject to the ODE constraint (27a) for the mean $\tilde{m}_t$, is the classical minimum energy duality.
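The uncoupled structure of (27a)-(27b) is easy to verify numerically. In the scalar sketch below (with made-up coefficients, purely for illustration), changing $K$ leaves the mean trajectory untouched and changing $u$ leaves the variance trajectory untouched.

```python
# scalar stand-ins for the model matrices in (27a)-(27b); values are made up
A, sigma, dt, n_steps = -0.5, 1.0, 1e-3, 2000

def propagate(u, K, m0=0.0, Sigma0=1.0):
    """Euler integration of (27a) for the mean and (27b) for the variance,
    with constant control u and constant gain K."""
    m, Sigma = m0, Sigma0
    for _ in range(n_steps):
        m += (A * m + sigma * u) * dt                             # (27a): uses u only
        Sigma += (2.0 * (A + sigma * K) * Sigma + sigma**2) * dt  # (27b): uses K only
    return m, Sigma

m1, S1 = propagate(u=0.3, K=-1.0)
m2, S2 = propagate(u=0.3, K=-4.0)   # change K: variance changes, mean does not
m3, S3 = propagate(u=-0.8, K=-1.0)  # change u: mean changes, variance does not

assert m1 == m2 and S1 == S3
assert S1 != S2 and m1 != m3
```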

References

  • [1] R. E. Kalman, “On the general theory of control systems,” in Proceedings First International Conference on Automatic Control, Moscow, USSR, 1960, pp. 481–492.
  • [2] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” Journal of Basic Engineering, vol. 83, no. 1, pp. 95–108, 1961.
  • [3] A. Bensoussan, Estimation and Control of Dynamical Systems.   Springer, 2018, vol. 48.
  • [4] E. Todorov, “General duality between optimal control and estimation,” in 2008 IEEE 47th Conference on Decision and Control (CDC), 12 2008, pp. 4286–4292.
  • [5] K. J. Åström, Introduction to Stochastic Control Theory.   Academic Press, 1970.
  • [6] A. E. Bryson and Y.-C. Ho, Applied optimal control: optimization, estimation, and control.   Routledge, 2018.
  • [7] D. Fraser and J. Potter, “The optimum linear smoother as a combination of two optimum linear filters,” IEEE Transactions on Automatic Control, vol. 14, no. 4, pp. 387–390, 1969.
  • [8] R. E. Mortensen, “Maximum-likelihood recursive nonlinear filtering,” Journal of Optimization Theory and Applications, vol. 2, no. 6, pp. 386–394, 1968.
  • [9] J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model predictive control: theory, computation, and design.   Nob Hill Publishing Madison, WI, 2017, vol. 2.
  • [10] W. H. Fleming and S. K. Mitter, “Optimal control and nonlinear filtering for nondegenerate diffusion processes,” Stochastics: An International Journal of Probability and Stochastic Processes, vol. 8, no. 1, pp. 63–77, 1982.
  • [11] S. K. Mitter and N. J. Newton, “A variational approach to nonlinear estimation,” SIAM Journal on Control and Optimization, vol. 42, no. 5, pp. 1813–1833, 2003.
  • [12] H. Michalska and D. Q. Mayne, “Moving horizon observers and observer-based control,” IEEE Transactions on Automatic Control, vol. 40, no. 6, pp. 995–1006, 1995.
  • [13] C. V. Rao, J. B. Rawlings, and J. H. Lee, “Constrained linear state estimation—a moving horizon approach,” Automatica, vol. 37, no. 10, pp. 1619–1628, 2001.
  • [14] A. J. Krener, “The convergence of the minimum energy estimator,” in New Trends in Nonlinear Dynamics and Control and their Applications.   Springer, 2003, pp. 187–208.
  • [15] D. A. Copp and J. P. Hespanha, “Simultaneous nonlinear model predictive control and state estimation,” Automatica, vol. 77, pp. 143–154, 2017.
  • [16] M. Farina, G. Ferrari-Trecate, and R. Scattolini, “Distributed moving horizon estimation for linear constrained systems,” IEEE Trans. on Auto. Control, vol. 55, no. 11, pp. 2462–2475, 2010.
  • [17] R. Schneider, R. Hannemann-Tamás, and W. Marquardt, “An iterative partition-based moving horizon estimator with coupled inequality constraints,” Automatica, vol. 61, pp. 302–307, 2015.
  • [18] A. Alessandri, M. Baglietto, and G. Battistelli, “A maximum-likelihood Kalman filter for switching discrete-time linear systems,” Automatica, vol. 46, no. 11, pp. 1870–1876, 2010.
  • [19] J. W. Kim and P. G. Mehta, “Duality for nonlinear filtering I: Observability,” unpublished.
  • [20] W. H. Fleming, “Exit probabilities and optimal stochastic control,” Applied Mathematics and Optimization, vol. 4, no. 1, pp. 329–346, 1978.
  • [21] A. Bensoussan, Stochastic control of partially observable systems.   Cambridge University Press, 1992.
  • [22] W. H. Fleming and E. De Giorgi, “Deterministic nonlinear filtering,” Annali della Scuola Normale Superiore di Pisa-Classe di Scienze-Serie IV, vol. 25, no. 3, pp. 435–454, 1997.
  • [23] Y. Chen, T. T. Georgiou, and M. Pavon, “On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint,” Journal of Optimization Theory and Applications, vol. 169, no. 2, pp. 671–691, 2016.
  • [24] H. J. Kappen and H. C. Ruiz, “Adaptive importance sampling for control and inference,” Journal of Statistical Physics, vol. 162, no. 5, pp. 1244–1266, 2016.
  • [25] S. Reich, “Data assimilation: the Schrödinger perspective,” Acta Numerica, vol. 28, pp. 635–711, 2019.
  • [26] H. Ruiz and H. J. Kappen, “Particle smoothing for hidden diffusion processes: Adaptive path integral smoother,” IEEE Transactions on Signal Processing, vol. 65, no. 12, pp. 3191–3203, 2017.
  • [27] T. Sutter, A. Ganguly, and H. Koeppl, “A variational approach to path estimation and parameter inference of hidden diffusion processes,” Journal of Machine Learning Research, vol. 17, pp. 6544–80, 2016.
  • [28] R. van Handel, “Filtering, stability, and robustness,” Ph.D. dissertation, California Institute of Technology, Pasadena, 12 2006.
  • [29] K. W. Simon and A. R. Stubberud, “Duality of linear estimation and control,” Journal of Optimization Theory and Applications, vol. 6, no. 1, pp. 55–67, 1970.
  • [30] G. C. Goodwin, J. A. de Doná, M. M. Seron, and X. W. Zhuo, “Lagrangian duality between constrained estimation and control,” Automatica, vol. 41, no. 6, pp. 935–944, 2005.
  • [31] P. K. Mishra, G. Chowdhary, and P. G. Mehta, “Minimum variance constrained estimator,” Automatica, vol. 137, p. 110106, 2022.
  • [32] B. K. Kwon, S. Han, O. K. Kwon, and W. H. Kwon, “Minimum variance FIR smoothers for discrete-time state space models,” IEEE Signal Processing Letters, vol. 14, no. 8, pp. 557–560, 2007.
  • [33] S. Zhao, Y. S. Shmaliy, B. Huang, and F. Liu, “Minimum variance unbiased FIR filter for discrete time-variant systems,” Automatica, vol. 53, pp. 355–361, 2015.
  • [34] M. Darouach, M. Zasadzinski, and M. Boutayeb, “Extension of minimum variance estimation for systems with unknown inputs,” Automatica, vol. 39, no. 5, pp. 867–876, 2003.
  • [35] J. W. Kim, “Duality for nonlinear filtering,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, Urbana, 06 2022.
  • [36] J. W. Kim, P. G. Mehta, and S. Meyn, “What is the Lagrangian for nonlinear filtering?” in 2019 IEEE 58th Conference on Decision and Control (CDC).   Nice, France: IEEE, 12 2019, pp. 1607–1614.
  • [37] D. Bakry, I. Gentil, and M. Ledoux, Analysis and geometry of Markov diffusion operators.   Springer Science & Business Media, 2013, vol. 348.
  • [38] B. Øksendal, Stochastic differential equations: an introduction with applications.   Springer Science & Business Media, 2013.
  • [39] J. Xiong, An Introduction to Stochastic Filtering Theory.   Oxford University Press on Demand, 2008, vol. 18.
  • [40] J. Yong and X. Y. Zhou, Stochastic controls: Hamiltonian systems and HJB equations.   Springer Science & Business Media, 1999, vol. 43.
  • [41] J. Ma and J. Yong, “On linear, degenerate backward stochastic partial differential equations,” Probability Theory and Related Fields, vol. 113, no. 2, pp. 135–170, 1999.
  • [42] J. F. Le Gall, Brownian Motion, Martingales, and Stochastic Calculus.   Springer, 2016, vol. 274.
  • [43] E. Pardoux and A. Răşcanu, Stochastic Differential Equations, Backward SDEs, Partial Differential Equations.   Springer, 2014.
  • [44] S. Peng, “Backward stochastic differential equations and applications to optimal control,” Applied Mathematics and Optimization, vol. 27, no. 2, pp. 125–144, 1993.
  • [45] J. W. Kim and P. G. Mehta, “An optimal control derivation of nonlinear smoothing equations,” in Proceedings of the Workshop on Dynamics, Optimization and Computation held in honor of the 60th birthday of Michael Dellnitz.   Springer, 2020, pp. 295–311.
  • [46] E. Pardoux, “Backward and forward stochastic partial differential equations associated with a non linear filtering problem,” in 1979 18th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, vol. 2.   IEEE, 1979, pp. 166–171.
  • [47] ——, “Non-linear filtering, prediction and smoothing,” in Stochastic systems: the mathematics of filtering and identification and applications.   Springer, 1981, pp. 529–557.
  • [48] J. W. Kim, P. G. Mehta, and S. Meyn, “The conditional Poincaré inequality for filter stability,” in 2021 IEEE 60th Conference on Decision and Control (CDC), 12 2021, pp. 1629–1636.
  • [49] J. W. Kim and P. G. Mehta, “A dual characterization of the stability of the Wonham filter,” in 2021 IEEE 60th Conference on Decision and Control (CDC), 12 2021, pp. 1621–1628.
  • [50] J. Szalankiewicz, “Duality in nonlinear filtering,” Master’s thesis, Technische Universität Berlin, Institut für Mathematik, Berlin, 2021.
  • [51] N. V. Krylov, R. S. Lipster, and A. A. Novikov, “Kalman filter for Markov processes,” in Statistics and Control of Stochastic Processes.   New York: Optimization Software, inc., 1984, pp. 197–213.
  • [52] T. Yang, P. G. Mehta, and S. Meyn, “Feedback particle filter,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2465–2480, 10 2013.
  • [53] V. E. Beneš, “Exact finite-dimensional filters for certain diffusions with nonlinear drift,” Stochastics, vol. 5, no. 1-2, pp. 65–92, 1981.
  • [54] B. L. Rozovsky and S. V. Lototsky, Stochastic Evolution Systems: Linear Theory and Applications to Non-Linear Filtering.   Springer, 2018, vol. 89.
  • [55] N. V. Krylov, “On the Itô–Wentzell formula for distribution-valued processes and related topics,” Probability Theory and Related Fields, vol. 150, no. 1-2, pp. 295–319, 2011.
{IEEEbiography}

Jin Won Kim received the Ph.D. degree in Mechanical Engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, in 2022. He is now a postdoctoral research scientist in the Institute of Mathematics at the University of Potsdam. His current research interests are in nonlinear filtering and stochastic optimal control. He received the Best Student Paper Award at the IEEE Conference on Decision and Control in 2019.

{IEEEbiography}

Prashant G. Mehta received the Ph.D. degree in Applied Mathematics from Cornell University, Ithaca, NY, in 2004. He is a Professor of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign. Prior to joining Illinois, he was a Research Engineer at the United Technologies Research Center (UTRC). His current research interests are in nonlinear filtering. He received the Outstanding Achievement Award at UTRC for his contributions to the modeling and control of combustion instabilities in jet engines. His students received the Best Student Paper Awards at the IEEE Conference on Decision and Control in 2007, 2009, and 2019, and were finalists for these awards in 2010 and 2012. In the past, he has served on the editorial boards of the ASME Journal of Dynamic Systems, Measurement, and Control and Systems & Control Letters. He currently serves on the editorial board of the IEEE Transactions on Automatic Control.