
Duality for Nonlinear Filtering II: Optimal Control

Jin W. Kim, Student Member, IEEE, and Prashant G. Mehta, Senior Member, IEEE. This work is supported in part by the NSF award 1761622. Research reported in this paper was carried out by J. W. Kim, as part of his PhD dissertation work, while he was a graduate student at the University of Illinois at Urbana-Champaign. He is now with the Institute of Mathematics at the University of Potsdam (e-mail: [email protected]). P. G. Mehta is with the Coordinated Science Laboratory and the Department of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign (e-mail: [email protected]).
Abstract

This paper is concerned with the development and use of duality theory for a nonlinear filtering model with white noise observations. The main contribution of this paper is to introduce a stochastic optimal control problem as a dual to the nonlinear filtering problem. The mathematical statement of the dual relationship between the two problems is given in the form of a duality principle. The constraint for the optimal control problem is the backward stochastic differential equation (BSDE) introduced in the companion paper. The optimal control solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The proposed duality is shown to be an exact extension of the classical Kalman-Bucy duality, and different from other types of optimal control and variational formulations given in the literature.

Keywords: Stochastic systems; Optimal control; Nonlinear filtering.

1 Introduction

In this paper, we continue the development of duality theory for nonlinear filtering. While the companion paper (part I) was concerned with a (dual) controllability counterpart of stochastic observability, the purpose of the present paper (part II) is to express the nonlinear filtering problem as a (dual) optimal control problem. The proposed duality is shown to be an exact extension of the original Kalman-Bucy duality [1, 2], in the sense that the dual optimal control problem has the same minimum variance structure for both linear and nonlinear filtering problems. Because of its historical importance, we begin by introducing and reviewing the classical duality for the linear Gaussian model.

1.1 Background and literature review

The linear Gaussian filtering model is as follows:

$$\mathrm{d}X_t = A^{\mathsf{T}} X_t\,\mathrm{d}t + \sigma\,\mathrm{d}B_t,\qquad X_0\sim N(m_0,\Sigma_0) \qquad (1a)$$
$$\mathrm{d}Z_t = H^{\mathsf{T}} X_t\,\mathrm{d}t + \mathrm{d}W_t \qquad (1b)$$

where $X:=\{X_t\in\mathbb{R}^d : 0\leq t\leq T\}$ is the state process, the prior $N(m_0,\Sigma_0)$ is a Gaussian density with mean $m_0\in\mathbb{R}^d$ and variance $\Sigma_0\succeq 0$, $Z:=\{Z_t : 0\leq t\leq T\}$ is the observation process, and both $B:=\{B_t : 0\leq t\leq T\}$ and $W:=\{W_t : 0\leq t\leq T\}$ are Brownian motions (B.M.). It is assumed that $X_0$, $B$, $W$ are mutually independent. The model parameters are $A\in\mathbb{R}^{d\times d}$, $H\in\mathbb{R}^{d\times m}$, and $\sigma\in\mathbb{R}^{d\times p}$.
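As a concrete illustration, a sample path of the model (1) can be generated with an Euler-Maruyama discretization. The sketch below uses hypothetical values of $(A,H,\sigma,m_0,\Sigma_0)$; only the discretization scheme, not the specific numbers, is suggested by the text.

```python
import numpy as np

# Hypothetical parameters for model (1) with d = 2, m = 1, p = 2.
rng = np.random.default_rng(0)
A = np.array([[-1.0, 0.5], [0.0, -2.0]])   # drift matrix; (1a) uses A^T
H = np.array([[1.0], [0.0]])               # observation matrix; (1b) uses H^T
sigma = 0.3 * np.eye(2)                    # process-noise coefficient
m0, Sigma0 = np.zeros(2), np.eye(2)        # Gaussian prior N(m0, Sigma0)

T, n_steps = 1.0, 1000
dt = T / n_steps

# Euler-Maruyama discretization of dX = A^T X dt + sigma dB and
# dZ = H^T X dt + dW, driven by independent Brownian increments.
X = rng.multivariate_normal(m0, Sigma0)
Z = np.zeros(1)
for _ in range(n_steps):
    dB = np.sqrt(dt) * rng.standard_normal(2)
    dW = np.sqrt(dt) * rng.standard_normal(1)
    Z = Z + H.T @ X * dt + dW   # observation increment (1b)
    X = X + A.T @ X * dt + sigma @ dB   # state increment (1a)
```

The same loop, run over many independent replicas, would produce the joint law of $(X,Z)$ against which a filter can be tested.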

For this problem, the dual optimal control formulations are well understood. They are of the following two types:

  • Minimum variance optimal control problem:

    $$\mathop{\text{Minimize}}_{u=\{u_t\in\mathbb{R}^m:0\leq t\leq T\}}:\quad {\sf J}(u) = |y_0|^2_{\Sigma_0} + \int_0^T y_t^{\mathsf{T}}(\sigma\sigma^{\mathsf{T}})y_t + |u_t|^2 \,\mathrm{d}t \qquad (2a)$$
    $$\text{Subject to}:\quad -\frac{\mathrm{d}y_t}{\mathrm{d}t} = Ay_t + Hu_t,\qquad y_T = f\;\;\text{(given)} \qquad (2b)$$

  • Minimum energy optimal control problem:

    $$\mathop{\text{Minimize}}_{\tilde m_0\in\mathbb{R}^d,\;u=\{u_t\in\mathbb{R}^p:0\leq t\leq T\}}:\quad {\sf J}(u,\tilde m_0;z) = |m_0-\tilde m_0|^2_{\Sigma_0^{-1}} + \int_0^T |u_t|^2 + |\dot z_t - H^{\mathsf{T}}\tilde m_t|^2 \,\mathrm{d}t \qquad (3a)$$
    $$\text{Subject to}:\quad \frac{\mathrm{d}\tilde m_t}{\mathrm{d}t} = A^{\mathsf{T}}\tilde m_t + \sigma u_t \qquad (3b)$$

where $z=\{z_t\in\mathbb{R}^m : 0\leq t\leq T\}$ is a given sample path of observations.

These two types of linear quadratic (LQ) optimal control problems have been known since the 1960s and are described in [3, Sec. 7.3.1 and 7.3.2]. Because it is discussed in the seminal paper [2] of Kalman and Bucy, the minimum variance duality (2) is also referred to as the Kalman-Bucy duality [4]. The relationship of the two problems to the model (1) is as follows:

  • Minimum variance duality is related to the filtering problem for the model (1). The optimal control cost (2a) comes from specifying a minimum variance objective for estimating the random variable $f^{\mathsf{T}}X_T$ for $f\in\mathbb{R}^d$.

  • Minimum energy duality is related to a smoothing problem for the model (1). The optimal cost (3a) is obtained from specifying a maximum likelihood (ML) objective for estimating a trajectory $\{\tilde m_t:0\leq t\leq T\}$ given a sample path $\{z_t:0\leq t\leq T\}$ of observations.
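For a given open-loop control $u$, the minimum variance cost (2a) can be evaluated numerically by integrating the backward ODE (2b) from the terminal condition $y_T=f$ and accumulating the running cost along the way. A minimal sketch under hypothetical parameters (the zero control is used purely for illustration):

```python
import numpy as np

# Hypothetical data for the minimum variance problem (2).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
H = np.array([[1.0], [0.0]])
sigma = 0.3 * np.eye(2)
Sigma0 = np.eye(2)
f = np.array([1.0, -1.0])   # terminal condition y_T = f

T, n = 1.0, 1000
dt = T / n
u = np.zeros((n, 1))        # a given open-loop control (here: zero)

# Integrate -dy/dt = A y + H u backward from t = T to t = 0,
# accumulating the running cost y^T (sigma sigma^T) y + |u|^2.
y = f.copy()
running = 0.0
for k in reversed(range(n)):
    running += (y @ (sigma @ sigma.T) @ y + u[k] @ u[k]) * dt
    y = y + (A @ y + H @ u[k]) * dt   # Euler step in reversed time
J = y @ Sigma0 @ y + running          # total cost (2a)
```

Minimizing `J` over the discretized control `u` (e.g., by LQ Riccati recursion or gradient descent) recovers the dual optimal control; the sketch above only evaluates the objective.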

Their respective solutions are related to (1) as follows:

  • The solution of the minimum variance duality (2) is useful to derive the Kalman filter for (1) [5, Ch. 7.6]. The derivation helps explain why the covariance equation of the Kalman filter is the same as the differential Riccati equation (DRE) of LQ optimal control. Note however that the arrow of time is reversed: the DRE is solved in forward time for the Kalman filter. This is because the constraint (2b) is a backward (in time) ordinary differential equation (ODE).

  • The solution of the minimum energy duality (3) is a favorite technique to derive the forward-backward equations of smoothing for the model (1). Hamilton's equation for (3) is referred to as the Bryson-Frazier formula [6, Eq. (13.3.4)]. By introducing a DRE, other forms of the solution, e.g., the Fraser-Potter smoother [7, Eq. (16)-(17)], are possible and useful in practice.
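The forward-in-time solution of the covariance DRE mentioned above can be sketched numerically. Assuming the conventions of model (1), the Kalman covariance satisfies $\dot\Sigma = A^{\mathsf{T}}\Sigma + \Sigma A + \sigma\sigma^{\mathsf{T}} - \Sigma H H^{\mathsf{T}}\Sigma$; the parameters below are hypothetical and a simple forward Euler step stands in for a proper ODE solver.

```python
import numpy as np

# Forward Euler integration of the DRE for the Kalman covariance:
#   dSigma/dt = A^T Sigma + Sigma A + sigma sigma^T - Sigma H H^T Sigma
# (conventions of model (1); hypothetical parameters).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
H = np.array([[1.0], [0.0]])
sigma = 0.3 * np.eye(2)
Sigma = np.eye(2)   # Sigma_0

dt, n = 1e-3, 5000
for _ in range(n):
    dSigma = (A.T @ Sigma + Sigma @ A
              + sigma @ sigma.T
              - Sigma @ H @ H.T @ Sigma)
    Sigma = Sigma + dt * dSigma
# Along the flow, Sigma stays symmetric positive semi-definite.
```

Note the contrast with the LQ control DRE, which is integrated backward from a terminal condition; here the same equation is run forward from the prior covariance.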

Given this background for the linear Gaussian model (1), there has been extensive work spanning decades on extending duality to the problems of nonlinear filtering and smoothing. The prominent duality type solution approaches in literature include the following:

  • Mortensen’s maximum likelihood estimator (MLE) [8].

  • Minimum energy estimator (MEE) in the model predictive control (MPC) literature [9, Ch. 4].

  • Log transformation relationship between the Zakai equation of nonlinear filtering and the Hamilton-Jacobi-Bellman (HJB) equation of optimal control [10].

  • Mitter and Newton’s variational formulation of the nonlinear smoothing problem [11].

In an early work [8], Mortensen considered a slightly more general version of the linear Gaussian model (1) in which the drift terms in both (1a) and (1b) are nonlinear. Both the optimal control problem and its forward-backward solution are straightforward extensions of (3). Since the 1960s, closely related extensions have appeared under different names in different communities, e.g., maximum likelihood estimation (MLE), maximum a posteriori (MAP) estimation, and minimum energy estimation (MEE), which is discussed next.

Based on the use of duality, the theory and algorithms developed in the MPC literature are readily adapted to solve state estimation problems. The resulting class of estimators is referred to as the minimum energy estimator (MEE) [9, Ch. 4]. The MEE algorithms are broadly of two types: (i) the full information estimator (FIE), where the entire history of observations is used; and (ii) the moving horizon estimator (MHE), where only the most recent fixed window of observations is used. An important motivation is to also incorporate additional constraints in estimator design. Early papers include [12, 13, 14], and more recent extensions have appeared in [15, 16, 17, 18]. A historical survey is given in [9, Sec. 4.7], where Rawlings et al. write "establishing duality [of the optimal estimator] with the optimal regulator is a favorite technique for establishing estimator stability". Although the specific comment is made for the Kalman filter, the remainder of the chapter amply demonstrates the utility of dual constructions for both algorithm design and convergence analysis (as the time-horizon $T\to\infty$). Convergence analysis typically requires additional assumptions on the model, which in turn has motivated the work on nonlinear observability and detectability definitions. A literature survey of these definitions, including their connections to duality theory, appears in the introduction of the companion paper [19].

While the focus of MEE is on deterministic models, duality is also an important theme in the study of nonlinear stochastic systems (hidden Markov models). A key concept is the log transformation [20]. In [10], the log transformation was used to transform the Zakai equation into a Hamilton-Jacobi-Bellman (HJB) equation. Because of this, the negative log of a posterior density is a value function for some stochastic optimal control problem (this is how duality is understood in stochastic settings [21, Sec. 4.8]). While the problem itself was not clarified in [10] (see however [22]), Mitter and Newton introduced a dual optimal control problem in [11] based on a variational interpretation of the Bayes’ formula. This work continues to impact algorithm design which remains an important area of research [23, 24, 25, 26, 27]. A notable ensuing contribution appeared in the PhD thesis-work [28] where Mitter-Newton duality is used to obtain results on nonlinear filter stability.

Given the importance of duality for the purposes of stability analysis in both the deterministic and stochastic settings of the problem, it is useful to return to the linear Gaussian model (1) and compare the two types of duality, (2) and (3). An important point, which has perhaps not been stressed in the literature, is that the minimum variance duality (2) is more compatible with the classical duality between controllability and observability in linear systems theory. This is because of the following reasons:

  • Inputs and outputs. In (2), the control input $u$ has the same dimension $m$ as the output process, while in (3), the control input $u$ has the dimension $p$ of the process noise. Evidently, it is natural to view inputs and outputs as dual processes that have the same dimension.

  • Constraint. If we ignore the noise terms in (1), the resulting deterministic state-output system ($\dot x_t = A^{\mathsf{T}}x_t$ and $z_t = H^{\mathsf{T}}x_t$) shares a dual relationship with the deterministic state-input system (2b). (It is shown in part I [19, Sec. III-F] that (2b) is also the dual of the stochastic system (1).) In contrast, the ODE (3b) is a modified copy of the model (1a).

  • Stability condition. The condition for asymptotic analysis of (2) is stabilizability of (2b), which by duality is detectability of $(A^{\mathsf{T}},H^{\mathsf{T}})$. The latter is known to be also the appropriate condition for stability of the Kalman filter. In contrast, for (3), asymptotic convergence of the optimal $\tilde m_T$ is possible even with $\sigma=0$. The important condition again is detectability of $(A^{\mathsf{T}},H^{\mathsf{T}})$, but it is not at all easy to see this from (3).

  • Arrow of time. Because the respective DREs are solved forward (resp. backward) in time for optimal filtering (resp. control), the arrow of time flips between optimal control and optimal filtering. Evidently, this is the case for minimum variance duality (2) but not so for the minimum energy duality (3): The constraint (2b) is a backward in time ODE while the constraint (3b) is a modified copy of the signal model which proceeds forward in time.

All of this suggests that a fruitful approach – for both defining observability and for using the definition for asymptotic stability analysis – is to consider the minimum variance duality, which naturally begets the following questions:

  • What are the appropriate extensions of (2) and (3) for nonlinear deterministic and stochastic systems?

  • What type of duality is implicit in Mitter-Newton’s work? It is already evident that MEE is an extension of (3).

Both of these questions are answered in the present paper (for the white noise observation model). Before discussing the original contributions, it is noted that past work on minimum variance duality has focused on refinements and extensions of the linear model with additional constraints. In [29], it is used to obtain the solution to a class of singular regulator problems, and in [30], the Lagrangian dual for an MEE problem with truncated measurement noise is considered. Numerical algorithms for (2) and its extensions appear in [31, 32, 33, 34]. Prior to our work, it was widely believed that a nonlinear extension of minimum variance duality is not possible [4].

1.2 Summary of original contributions

The main contribution of this paper is to present a minimum variance dual to the nonlinear filtering problem. As in the companion paper (part I), the nonlinear filtering problem is for the HMM with the white noise observation model. The mathematical statement of the dual relationship between optimal filtering and optimal control is given in the form of a duality principle (Thm. 1). The principle relates the value of the control problem to the variance of the filtering problem. The classical Kalman-Bucy duality (2) is recovered as a special case for the linear-Gaussian model (1).

Two approaches are described to solve the optimal control problem: (i) Based on the use of the stochastic maximum principle to derive the Hamilton’s equation (Thm. 4.9); and (ii) Based on a martingale characterization (Thm. 5.14). A formula for the optimal control as a feedback control law is obtained and used to derive the equation of the optimal nonlinear filter. Our duality is also related to Mitter-Newton duality with a side-by-side comparison in Table 1.

This paper is drawn from the PhD thesis of the first author [35]. A prior conference version appeared in [36]. While the duality principle was already stated in the conference paper, it relied on a certain assumption [36, Assumption A1] which has now been proved. Various formulae are stated more simply, e.g., the use of carré du champ operator to specify the running cost. Issues related to function spaces have been clarified to a large extent. While the conference version relied on the innovation process, the present version directly works with the observation process. Such a choice is more natural for the problem at hand. As a result, most of the results and certainly their proofs are novel. Comparison with Mitter-Newton duality is also novel.

1.3 Paper outline

The outline of the remainder of this paper is as follows: The mathematical model and necessary background appear in Sec. 2. The dual optimal control problem, together with the duality principle and its relation to the linear-Gaussian case, is described in Sec. 3. Its solution using the maximum principle and the martingale characterization appears in Sec. 4 and Sec. 5, respectively. The duality-based derivation of the equation of the nonlinear filter appears in Sec. 6. A comparison with Mitter-Newton duality is contained in Sec. 7. The paper closes with some conclusions and directions for future work in Sec. 8. All the proofs are contained in the Appendix.

2 Background

We briefly review the model and the notation as presented in [19]. Although the presentation is self-contained, it is in an abbreviated form with a focus on additional new concepts that are necessary for this paper.

On the probability space $(\Omega,{\cal F}_T,{\sf P})$, we consider a pair of continuous-time stochastic processes $(X,Z)$ as follows:

  • The state process $X=\{X_t:\Omega\to\mathbb{S}\,:\,0\leq t\leq T\}$ is a Feller-Markov process taking values in the state-space $\mathbb{S}$. The prior is denoted by $\mu\in{\cal P}(\mathbb{S})$ (the space of probability measures) and $X_0\sim\mu$. The infinitesimal generator is denoted by ${\cal A}$.

  • The observation process $Z=\{Z_t:0\leq t\leq T\}$ satisfies the stochastic differential equation (SDE):

    $$Z_t = \int_0^t h(X_s)\,\mathrm{d}s + W_t,\quad t\geq 0 \qquad (4)$$

    where $h:\mathbb{S}\to\mathbb{R}^m$ is referred to as the observation function and $W=\{W_t:0\leq t\leq T\}$ is an $m$-dimensional Brownian motion (B.M.); we write that $W$ is a ${\sf P}$-B.M. It is assumed that $W$ is independent of $X$.

The above is referred to as the white noise observation model of nonlinear filtering. The model is denoted by $({\cal A},h)$.

An important additional concept in this paper is the carré du champ operator $\Gamma$, defined as follows (see [37]):

$$(\Gamma f)(x) = ({\cal A}f^2)(x) - 2f(x)({\cal A}f)(x),\quad x\in\mathbb{S}$$

where $f:\mathbb{S}\to\mathbb{R}$ is a test function. Explicit formulae for the most important examples are described next.

2.1 Guiding examples

Example 1 (Finite state-space)

$\mathbb{S}=\{1,2,\ldots,d\}$. A real-valued function $f$ is identified with a vector in $\mathbb{R}^d$ whose $i^{\text{th}}$ element is $f(i)$. In this manner, the generator ${\cal A}$ of the Markov process is identified with a rate matrix $A\in\mathbb{R}^{d\times d}$ (the off-diagonal elements of $A$ are non-negative and each row sums to zero). The carré du champ operator $\Gamma:\mathbb{R}^d\to\mathbb{R}^d$ is given by

$$(\Gamma f)(i) = \sum_{j\in\mathbb{S}} A(i,j)\big(f(i)-f(j)\big)^2,\quad i\in\mathbb{S} \qquad (5)$$
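The formula (5) agrees with the defining identity $\Gamma f = {\cal A}f^2 - 2f\,{\cal A}f$ precisely because the rows of a rate matrix sum to zero. A small numerical check with a hypothetical 3-state rate matrix:

```python
import numpy as np

# Hypothetical 3-state rate matrix: off-diagonal >= 0, zero row sums.
A = np.array([[-2.0, 1.5, 0.5],
              [1.0, -1.0, 0.0],
              [0.2, 0.8, -1.0]])
f = np.array([1.0, -2.0, 0.5])

# Direct formula (5): (Gamma f)(i) = sum_j A(i,j) (f(i) - f(j))^2
Gamma = np.array([sum(A[i, j] * (f[i] - f[j]) ** 2 for j in range(3))
                  for i in range(3)])

# Defining identity: Gamma f = A(f^2) - 2 f * (A f)
Gamma_id = A @ (f ** 2) - 2 * f * (A @ f)
```

The two computations coincide entrywise, and each entry is non-negative since the off-diagonal rates are non-negative.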
Example 2 (Euclidean state-space)

$\mathbb{S}=\mathbb{R}^d$. The Markov process $X$ is an Itô diffusion modeled by a stochastic differential equation (SDE):

$$\mathrm{d}X_t = a(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}B_t,\quad X_0\sim\mu$$

where $a\in C^1(\mathbb{R}^d;\mathbb{R}^d)$ and $\sigma\in C^2(\mathbb{R}^d;\mathbb{R}^{d\times p})$ satisfy appropriate technical conditions such that a strong solution exists on $[0,T]$, and $B=\{B_t:0\leq t\leq T\}$ is a standard B.M. assumed to be independent of $X_0$ and $W$. In the Euclidean case, all measures are identified with their densities. In particular, we use the notation $\mu$ to denote the probability density function of the prior.

The infinitesimal generator ${\cal A}$ acts on $C^2(\mathbb{R}^d;\mathbb{R})$ functions in its domain according to [38, Thm. 7.3.3]

$$({\cal A}f)(x) := a^{\mathsf{T}}(x)\nabla f(x) + \frac{1}{2}\,\mathrm{tr}\big(\sigma\sigma^{\mathsf{T}}(x)(D^2 f)(x)\big),\quad x\in\mathbb{R}^d$$

where $\nabla f$ is the gradient vector and $D^2 f$ is the Hessian matrix. For $f\in C^1(\mathbb{R}^d;\mathbb{R})$, the carré du champ operator is given by

$$(\Gamma f)(x) = \big|\sigma^{\mathsf{T}}(x)\nabla f(x)\big|^2,\quad x\in\mathbb{R}^d \qquad (6)$$
Example 3 (Linear Gaussian model)

The model (1) introduced in Sec. 1 is a special case of an Itô diffusion where the drift terms are linear, $a(x)=A^{\mathsf{T}}x$ and $h(x)=H^{\mathsf{T}}x$, the coefficient of the process noise $\sigma(x)=\sigma$ is a constant matrix, and the prior $\mu$ is a Gaussian density. A real-valued linear function is expressed as

$$f(x) = \tilde{f}^{\mathsf{T}}x,\quad x\in\mathbb{R}^d$$

where $\tilde{f}\in\mathbb{R}^d$. Then ${\cal A}f$ is also a linear function, given by

$$({\cal A}f)(x) = (A\tilde{f})^{\mathsf{T}}x,\quad x\in\mathbb{R}^d$$

and $\Gamma f$ is a constant function, given by

$$(\Gamma f)(x) = \tilde{f}^{\mathsf{T}}\big(\sigma\sigma^{\mathsf{T}}\big)\tilde{f},\quad x\in\mathbb{R}^d \qquad (7)$$
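For a linear $f(x)=\tilde f^{\mathsf{T}}x$, the gradient is the constant vector $\tilde f$, so the Euclidean formula (6) reduces to the constant (7). A one-line numerical check with randomly chosen (hypothetical) $\sigma$ and $\tilde f$:

```python
import numpy as np

# Check that (6) reduces to (7) for a linear test function f(x) = f_tilde^T x.
rng = np.random.default_rng(1)
d, p = 3, 2
sigma = rng.standard_normal((d, p))    # constant diffusion coefficient
f_tilde = rng.standard_normal(d)       # gradient of the linear function

gamma_from_6 = np.linalg.norm(sigma.T @ f_tilde) ** 2    # |sigma^T grad f|^2
gamma_from_7 = f_tilde @ (sigma @ sigma.T) @ f_tilde     # f_tilde^T sigma sigma^T f_tilde
```
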

2.2 Background on nonlinear filtering

The canonical filtration is ${\cal F}_t = \sigma\big(\{(X_s,W_s):0\leq s\leq t\}\big)$. The filtration generated by the observation is denoted by ${\cal Z}:=\{{\cal Z}_t:0\leq t\leq T\}$ where ${\cal Z}_t = \sigma\big(\{Z_s:0\leq s\leq t\}\big)$. A standard approach is based upon the Girsanov change of measure. Suppose the model satisfies Novikov's condition: ${\sf E}\big(\exp\big(\frac{1}{2}\int_0^T |h(X_t)|^2\,\mathrm{d}t\big)\big) < \infty$. Define a new measure $\tilde{\sf P}$ on $(\Omega,{\cal F}_T)$ as follows:

$$\frac{\mathrm{d}\tilde{\sf P}}{\mathrm{d}{\sf P}} = \exp\Big(-\int_0^T h^{\mathsf{T}}(X_t)\,\mathrm{d}W_t - \frac{1}{2}\int_0^T |h(X_t)|^2\,\mathrm{d}t\Big) =: D_T^{-1}$$

Then it is shown that the probability law for $X$ is unchanged but $Z$ is a $\tilde{\sf P}$-B.M. that is independent of $X$ [28, Lem. 1.1.5]. The expectation with respect to $\tilde{\sf P}$ is denoted by $\tilde{\sf E}(\cdot)$.

The two probability measures are used to define the un-normalized and the normalized (or nonlinear) filters as follows: for $0\leq t\leq T$ and $f\in C_b(\mathbb{S})$,

$$\text{(un-normalized filter)}\quad \sigma_t(f) := \tilde{\sf E}\big(D_t f(X_t)\,|\,{\cal Z}_t\big)$$
$$\text{(nonlinear filter)}\quad \pi_t(f) := {\sf E}\big(f(X_t)\,|\,{\cal Z}_t\big)$$

As the name suggests, $\pi_t(f) = \frac{\sigma_t(f)}{\sigma_t({\sf 1})}$, which is referred to as the Kallianpur-Striebel formula [39, Thm. 5.3] (here ${\sf 1}$ is the constant function ${\sf 1}(x)=1$ for all $x\in\mathbb{S}$). Combining the tower property of conditional expectation with the change of measure gives

$${\sf E}(f(X_t)) = {\sf E}(\pi_t(f)) = \tilde{\sf E}(\sigma_t(f)) \qquad (8)$$

2.3 Function spaces

The notation $L^2_{{\cal Z}_T}(\Omega;\mathbb{R}^m)$ and $L^2_{{\cal Z}}([0,T];\mathbb{R}^m)$ is used to denote the Hilbert spaces of ${\cal Z}_T$-measurable random vectors and ${\cal Z}$-adapted stochastic processes, respectively. These Hilbert spaces suffice if the state-space is finite. In general settings, let ${\cal Y}$ denote a suitable Banach space of real-valued functions on $\mathbb{S}$, equipped with the norm $\|\cdot\|_{\cal Y}$. Then:

  • For a random function, the Banach space is $L^2_{{\cal Z}_T}(\Omega;{\cal Y}) := \big\{F:\Omega\to{\cal Y}\;:\;F\text{ is }{\cal Z}_T\text{-measurable},\;\tilde{\sf E}\big(\|F\|_{\cal Y}^2\big)<\infty\big\}$.

  • For a function-valued stochastic process, the Banach space is $L^2_{{\cal Z}}([0,T];{\cal Y}) := \big\{Y:\Omega\times[0,T]\to{\cal Y}\;:\;Y\text{ is }{\cal Z}\text{-adapted},\;\tilde{\sf E}\big(\int_0^T \|Y_t\|_{\cal Y}^2\,\mathrm{d}t\big)<\infty\big\}$.

In the remainder of this paper, we set ${\cal Y}:=C_b(\mathbb{S})$ (the space of continuous and bounded functions) equipped with the sup-norm. Its dual space, ${\cal M}(\mathbb{S})$ (the space of rba measures), is denoted by ${\cal Y}^{\dagger}$, with the duality pairing $\langle f,\rho\rangle = \rho(f)$ for $f\in{\cal Y}$ and $\rho\in{\cal Y}^{\dagger}$.

3 Main result: The duality principle

3.1 Problem statement

For a function $F\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$, the nonlinear filter $\pi_T(F)$ is the minimum variance estimate of $F(X_T)$ [3, Sec. 6.1.2]:

$$\pi_T(F) = \mathop{\operatorname{argmin}}_{S_T\in L^2_{{\cal Z}_T}(\Omega;\mathbb{R})} {\sf E}\big(|F(X_T)-S_T|^2\big) \qquad (9)$$

Our goal is to express the above minimum variance optimization problem as a dual optimal control problem.

The conditional variance is denoted by

$${\cal V}_T(F) := {\sf E}\big(|F(X_T)-\pi_T(F)|^2\,\big|\,{\cal Z}_T\big) = \pi_T(F^2) - \big(\pi_T(F)\big)^2$$

For notational ease, the expected value of the conditional variance is denoted by

$$\operatorname{var}_T(F) := {\sf E}\big({\cal V}_T(F)\big)$$

Strictly speaking, the above is a variance only at time $T=0$. However, the verbiage is consistent with the "minimum variance" interpretation of the nonlinear filter.
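On a finite state-space, the conditional variance ${\cal V}_T(F)=\pi_T(F^2)-(\pi_T(F))^2$ is a direct computation once the (conditional) probability vector is known. A sketch with hypothetical values:

```python
import numpy as np

# Hypothetical conditional distribution pi_T and test function F on 3 states.
pi = np.array([0.5, 0.3, 0.2])   # pi_T, a probability vector
F = np.array([1.0, 0.0, -1.0])   # F identified with a vector in R^3

mean = pi @ F                    # pi_T(F), the minimum variance estimate
var = pi @ F ** 2 - mean ** 2    # V_T(F) = pi_T(F^2) - (pi_T(F))^2
```

The same quantity can equivalently be computed as $\pi_T\big((F-\pi_T(F))^2\big)$, which makes the non-negativity evident.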

3.2 Dual optimal control problem

The function space of admissible control inputs is denoted by ${\cal U} := L^2_{{\cal Z}}([0,T];\mathbb{R}^m)$. An element of ${\cal U}$ is denoted $U=\{U_t:0\leq t\leq T\}$ and is referred to as the control input. The main contribution of this paper is the following problem.

  • Minimum variance optimal control problem:

    $$\mathop{\text{Minimize}}_{U\in{\cal U}}:\quad {\sf J}_T(U) = \operatorname{var}_0(Y_0) + {\sf E}\Big(\int_0^T l(Y_t,V_t,U_t;X_t)\,\mathrm{d}t\Big) \qquad (10a)$$

    Subject to (BSDE constraint):

    $$-\mathrm{d}Y_t(x) = \big(({\cal A}Y_t)(x) + h^{\mathsf{T}}(x)\big(U_t+V_t(x)\big)\big)\,\mathrm{d}t - V_t^{\mathsf{T}}(x)\,\mathrm{d}Z_t,\qquad Y_T(x) = F(x),\;\;x\in\mathbb{S} \qquad (10b)$$

where the running cost

$$l(y,v,u;x) := (\Gamma y)(x) + |u+v(x)|^2$$

and $\operatorname{var}_0(Y_0) = {\sf E}\big(|Y_0(X_0)-\mu(Y_0)|^2\big) = \mu(Y_0^2) - \mu(Y_0)^2$.

Remark 1

The BSDE (10b) is introduced in the companion paper (part I) as the dual control system. The data for the BSDE are the given terminal condition $F\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$ and the control input $U\in{\cal U}$. The solution of the BSDE is the pair $(Y,V)=\{(Y_t,V_t):0\leq t\leq T\}\in L^2_{{\cal Z}}([0,T];{\cal Y}\times{\cal Y}^m)$, which is (forward) adapted to the filtration ${\cal Z}$. Existence, uniqueness, and regularity theory for linear BSDEs is standard, and throughout the paper we assume that the solution $(Y,V)$ is uniquely determined in $L^2_{{\cal Z}}([0,T];{\cal Y}\times{\cal Y}^m)$ for each given $Y_T\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$ and $U\in L^2_{{\cal Z}}([0,T];\mathbb{R}^m)$. The well-posedness results can be found in [40, Ch. 7] for the finite state-space and in [41] for the Euclidean state-space.

The relationship of (10) to the minimum variance objective (9) is given in the following theorem.

Theorem 1 (Duality principle)

For any admissible control $U\in{\cal U}$, consider the estimator

$$S_T := \mu(Y_0) - \int_0^T U_t^{\mathsf{T}}\,\mathrm{d}Z_t \qquad (11)$$

Then

$${\sf J}_T(U) = {\sf E}\big(|F(X_T)-S_T|^2\big) \qquad (12)$$
Proof 3.2.

See Appendix A.1.

The problem (10) is a stochastic linear quadratic optimal control problem for which there is a well-established existence-uniqueness theory for the optimal control solution. Application of this theory is the subject of the following section. For now, we assume that the optimal control is well-defined and denote it by $U^{\text{(opt)}}=\{U_t^{\text{(opt)}}:0\leq t\leq T\}$. Because the right-hand side of the identity (12) is bounded below by $\operatorname{var}_T(F)$, the duality gap

$${\sf J}_T(U^{\text{(opt)}}) - \operatorname{var}_T(F) \geq 0$$

In order to conclude that the duality gap is zero, it is both necessary and sufficient to show that there exists a $U\in{\cal U}$ such that the estimator $S_T$, as given by (11), equals $\pi_T(F)$. Since $Z$ is a $\tilde{\sf P}$-B.M., the following lemma is a consequence of the Itô representation theorem for Brownian motion [38, Thm. 4.3.3].

Lemma 3.3.

For any $F\in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$, there exists a unique $U\in{\cal U}$ such that

$$\pi_T(F) = \tilde{\sf E}\big(\pi_T(F)\big) - \int_0^T U_t^{\mathsf{T}}\,\mathrm{d}Z_t,\quad \tilde{\sf P}\text{-a.s.}$$
Proof 3.4.

See Appendix A.2.

Because the duality gap is zero, the following implications hold:

  • The optimal control $U^{\text{(opt)}}$ gives the conditional mean

    $$\pi_T(F) = \mu(Y_0) - \int_0^T \big(U_t^{\text{(opt)}}\big)^{\mathsf{T}}\,\mathrm{d}Z_t,\quad {\sf P}\text{-a.s.}$$

  • The optimal value is the expected value of the conditional variance

    $$\operatorname{var}_T(F) = \operatorname{var}_0(Y_0) + {\sf E}\Big(\int_0^T l(Y_t,V_t,U_t^{\text{(opt)}};X_t)\,\mathrm{d}t\Big)$$

    where $(Y,V)$ is the optimally controlled stochastic process obtained with $U=U^{\text{(opt)}}$ in (10b).

In fact, these two implications carry over to the entire optimal trajectory.

Proposition 3.5.

Suppose $U^{\text{(opt)}}$ is the optimal control input and $(Y,V)$ is the associated solution of the BSDE (10b). Then for almost every $0\leq t\leq T$,

$$\pi_t(Y_t) = \mu(Y_0) - \int_0^t \big(U_s^{\text{(opt)}}\big)^{\mathsf{T}}\,\mathrm{d}Z_s,\quad {\sf P}\text{-a.s.} \qquad (13)$$
$$\operatorname{var}_t(Y_t) = \operatorname{var}_0(Y_0) + {\sf E}\Big(\int_0^t l(Y_s,V_s,U_s^{\text{(opt)}};X_s)\,\mathrm{d}s\Big) \qquad (14)$$
Proof 3.6.

See Appendix A.3.

Consequently, the expected value of the conditional variance is the optimal cost-to-go (for a.e. $0\leq t\leq T$). We do not yet have a formula for the optimal control $U^{\text{(opt)}}$. The difficulty arises because there is no HJB equation for BSDE-constrained optimal control problems. Instead, the literature on such problems utilizes the stochastic maximum principle for BSDEs, which is the subject of the next section. Before that, we discuss the linear Gaussian case.

3.3 Linear Gaussian case

The goal is to show that the classical Kalman-Bucy duality (2) described in Sec. 1 for the linear Gaussian model (1) is a special case. Consider a linear function F(x)=fTxF(x)=f^{\hbox{\rm\tiny T}}x where fdf\in\mathbb{R}^{d} is a given deterministic vector. The problem is to compute a minimum variance estimate of the scalar random variable fTXTf^{\hbox{\rm\tiny T}}X_{T}. It is given by 𝖤(fTXT|𝒵T){\sf E}(f^{\hbox{\rm\tiny T}}X_{T}|{\cal Z}_{T}). Now, it is a standard result in the theory of Gaussian processes that conditional expectation can be evaluated in the form of a linear predictor [42, Cor. 1.10]. For this reason, it suffices to consider an estimator of the form

S_{T}:=b-\int_{0}^{T}u_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}

where b\in\mathbb{R} and u=\{u_{t}\in\mathbb{R}^{m}:0\leq t\leq T\} are both deterministic (the lower-case notation is used to stress this). Consequently, for linear Gaussian estimation, we can restrict the admissible space of control inputs to L^{2}\big{(}[0,T];\mathbb{R}^{m}\big{)}, which is a much smaller subspace of L_{{\cal Z}}^{2}\big{(}[0,T];\mathbb{R}^{m}\big{)}. Using a deterministic control u and the terminal condition F(x)=f^{\hbox{\rm\tiny T}}x, the solution of the BSDE is given by

Y_{t}(x)=y_{t}^{\hbox{\rm\tiny T}}x,\quad V_{t}(x)=0,\quad x\in\mathbb{R}^{d},\;\;0\leq t\leq T

where y=\{y_{t}\in\mathbb{R}^{d}:0\leq t\leq T\} is a solution of the backward ODE:

-\frac{\,\mathrm{d}y_{t}}{\,\mathrm{d}t}=Ay_{t}+Hu_{t},\quad y_{T}=f

Using the formula (7) for the carré du champ, the running cost

l(Y_{t},V_{t},U_{t};X_{t})=(\Gamma Y_{t})(X_{t})+|U_{t}+V_{t}(X_{t})|^{2}
=y_{t}^{\hbox{\rm\tiny T}}(\sigma\sigma^{\hbox{\rm\tiny T}})y_{t}+|u_{t}|^{2}

With the Gaussian prior, the initial cost is \operatorname{var}_{0}(Y_{0})=y_{0}^{\hbox{\rm\tiny T}}\Sigma_{0}y_{0}. Combining all of the above, the optimal control problem (10) reduces to (2) for the linear Gaussian model (1).
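This reduction can be checked numerically. The sketch below is a minimal sanity check, not part of the paper's development: all matrices, dimensions, and step sizes are arbitrary illustrative choices. It integrates the filter Riccati equation forward to obtain \Sigma_{T}, then (by time reversal) uses the same matrix-valued trajectory as the LQR value matrix for the dual problem (2), integrates the optimally controlled backward ODE, and verifies that the realized dual cost agrees with f^{\hbox{\rm\tiny T}}\Sigma_{T}f up to discretization error.

```python
import numpy as np

# Model (illustrative): dX = A^T X dt + sig dB,  dZ = H^T X dt + dW.
# Dual problem (2): minimize  y_0^T S0 y_0 + int_0^T (y^T QQ y + |u|^2) dt
# subject to  -dy/dt = A y + H u,  y_T = f.   Claim: min = f^T Sigma_T f.
d, m, T, N = 2, 1, 1.0, 20000
dt = T / N
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
H = np.array([[1.0], [0.5]])
sig = np.array([[0.3, 0.0], [0.1, 0.4]])
QQ = sig @ sig.T
S0 = np.array([[0.5, 0.1], [0.1, 0.4]])   # prior covariance Sigma_0
f = np.array([1.0, -1.0])

# Forward (Euler) integration of the filter Riccati equation:
#   dSig/dt = A^T Sig + Sig A + QQ - Sig H H^T Sig,  Sig(0) = Sigma_0.
Sig = S0.copy()
P_traj = []                               # Sig(t) doubles as the LQR value matrix P(T - s)
for _ in range(N):
    P_traj.append(Sig.copy())
    Sig = Sig + dt * (A.T @ Sig + Sig @ A + QQ - Sig @ H @ H.T @ Sig)
riccati_value = f @ Sig @ f               # = f^T Sigma_T f

# Time-reverse the dual ODE: with ytil(s) := y(T - s),
#   d ytil/ds = A ytil + H u,  ytil(0) = f,
# and the LQR-optimal feedback is u = -H^T P(s) ytil with P(s) = Sig(T - s).
y = f.copy()
cost = 0.0
for k in range(N):
    P = P_traj[N - 1 - k]                 # P(s_k) ~ Sig(T - s_k)
    u = -H.T @ P @ y
    cost += dt * (y @ QQ @ y + float(u @ u))
    y = y + dt * (A @ y + H @ u)
cost += y @ S0 @ y                        # initial cost  var_0 = y_0^T Sigma_0 y_0
```

The agreement of `cost` and `riccati_value` is exactly the statement that the minimum variance dual of the Kalman-Bucy filter has optimal value f^{\hbox{\rm\tiny T}}\Sigma_{T}f.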

Remark 3.7.

The solution of the optimal control problem yields the optimal control input u^{\text{\rm(opt)}}=\{u^{\text{\rm(opt)}}_{t}:0\leq t\leq T\}, along with the vector y_{0}\in\mathbb{R}^{d} that determines the minimum-variance estimator:

S_{T}=\mu(y_{0}^{\hbox{\rm\tiny T}}x)-\int_{0}^{T}\big{(}u_{t}^{\text{\rm(opt)}}\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}=y_{0}^{\hbox{\rm\tiny T}}m_{0}-\int_{0}^{T}\big{(}u_{t}^{\text{\rm(opt)}}\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}

The Kalman filter is obtained by expressing \{S_{t}(f):t\geq 0,\ f\in\mathbb{R}^{d}\} as the solution to a linear SDE [5, Ch. 7.6].

4 Solution of the optimal control problem

The BSDE-constrained optimal control problem (10) is not in its standard form [43, Eq. 5.10]. There are two issues:

  • The probability space: The driving martingale of the BSDE (10b) is Z, which is a {\tilde{\sf P}}-B.M. However, the expectation defining the optimal control objective (10a) is with respect to the measure {\sf P}.

  • The filtration: The ‘state’ of the optimal control problem (Y,V) is adapted to the filtration {\cal Z}. However, the cost function (10a) also depends upon the non-adapted exogenous process X.

The second problem is easily fixed by using the tower property of conditional expectation. To resolve the first problem, we have two choices:

  1. Use the change of measure to evaluate {\sf J}_{T}(U) with respect to the {\tilde{\sf P}} measure, or

  2. Express the BSDE using a driving martingale that is a {\sf P}-B.M. A convenient such process is the innovation process.

In this paper, the standard form of the dual optimal control problem is presented based on the first choice. For an analysis based on the second choice, see [36] and [35, Sec. 5.5].

In order to express the expectation in the control objective (10a) with respect to {\tilde{\sf P}}, we use the change of measure (see Appendix A.4 for the calculation) to obtain

{\sf J}_{T}(U)=\operatorname{var}_{0}(Y_{0})+{\tilde{\sf E}}\Big{(}\int_{0}^{T}\ell(Y_{t},V_{t},U_{t};\sigma_{t})\,\mathrm{d}t\Big{)}

where the Lagrangian \ell:{\cal Y}\times{\cal Y}^{m}\times\mathbb{R}^{m}\times{\cal Y}^{\dagger}\to\mathbb{R} is defined by

\ell(y,v,u;\rho)=\rho\big{(}\Gamma y\big{)}+\rho\big{(}|u+v|^{2}\big{)} (15)

The dual optimal control problem (standard form) is now expressed as follows:

\mathop{\text{Minimize}}_{U\in{\cal U}}\quad{\sf J}_{T}(U)=\operatorname{var}_{0}(Y_{0})+{\tilde{\sf E}}\Big{(}\int_{0}^{T}\ell(Y_{t},V_{t},U_{t};\sigma_{t})\,\mathrm{d}t\Big{)} (16a)
Subject to:
-\!\,\mathrm{d}Y_{t}(x)=\big{(}({\cal A}Y_{t})(x)+h^{\hbox{\rm\tiny T}}(x)(U_{t}+V_{t}(x))\big{)}\,\mathrm{d}t-V_{t}^{\hbox{\rm\tiny T}}(x)\,\mathrm{d}Z_{t}
\quad\;Y_{T}(x)=F(x),\;\;x\in\mathbb{S} (16b)
Remark 4.8.

The Lagrangian is a time-dependent random functional of the dual state (y,v) and the control u. The randomness and time-dependence come only from the last argument \sigma_{t}.

4.1 Solution using the maximum principle

Because y\in{\cal Y} is a function, the co-state p\in{\cal Y}^{\dagger} is a measure. The Hamiltonian {\cal H}:{\cal Y}\times{\cal Y}^{m}\times\mathbb{R}^{m}\times{\cal Y}^{\dagger}\times{\cal Y}^{\dagger}\to\mathbb{R} is defined as follows:

{\cal H}(y,v,u,p;\rho)=-p\big{(}{\cal A}y+h^{\hbox{\rm\tiny T}}(u+v)\big{)}-\ell(y,v,u;\rho)

In the following, Hamilton's equations for the optimal trajectory are derived by an application of the maximum principle for BSDE-constrained optimal control problems [44, Thm. 4.4].

Hamilton's equations are expressed in terms of the derivatives of the Hamiltonian. In order to take derivatives with respect to functions and measures, we adopt the notion of Gâteaux differentiability. Given a nonlinear functional F:{\cal Y}\to\mathbb{R}, the Gâteaux derivative F_{y}(y)\in{\cal Y}^{\dagger} is obtained from the defining relation [3, Sec. 10.1.3]:

\frac{\,\mathrm{d}}{\,\mathrm{d}\varepsilon}F(y+\varepsilon\tilde{y})\Big{|}_{\varepsilon=0}=\big{\langle}\tilde{y},F_{y}(y)\big{\rangle},\quad\forall\,\tilde{y}\in{\cal Y}
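In finite dimensions, the defining relation can be checked by a finite-difference computation. The sketch below is only an illustration of the definition: the quadratic functional F and the matrices are arbitrary choices, for which F_{y}(y)=(Q+Q^{\hbox{\rm\tiny T}})y+b.

```python
import numpy as np

# Finite-dimensional analogue of the Gateaux derivative: for
#   F(y) = y^T Q y + b^T y  on R^d,  F_y(y) = (Q + Q^T) y + b
# satisfies  d/d(eps) F(y + eps*ytil) |_{eps=0} = <ytil, F_y(y)>.
rng = np.random.default_rng(0)
d = 4
Q = rng.normal(size=(d, d))
b = rng.normal(size=d)

def F(y):
    return y @ Q @ y + b @ y

def F_y(y):
    return (Q + Q.T) @ y + b

y = rng.normal(size=d)
ytil = rng.normal(size=d)       # direction of differentiation
eps = 1e-6
fd = (F(y + eps * ytil) - F(y - eps * ytil)) / (2 * eps)   # central difference
exact = ytil @ F_y(y)           # <ytil, F_y(y)>
```

The same directional-derivative recipe, applied with respect to functions y and measures p, produces the derivatives of the Hamiltonian listed next.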

For the problem at hand, the derivatives of the Hamiltonian are as follows:

{\cal H}_{y}(y,v,u,p;\rho)=-{\cal A}^{\dagger}p-\big{(}\rho(\Gamma y)\big{)}_{y}
{\cal H}_{v}(y,v,u,p;\rho)=-ph-2(u+v)\rho
{\cal H}_{u}(y,v,u,p;\rho)=-p(h)-2\rho({\sf 1})u-2\rho(v)
{\cal H}_{p}(y,v,u,p;\rho)=-{\cal A}y-h^{\hbox{\rm\tiny T}}(u+v)

where {\cal A}^{\dagger} is the adjoint of {\cal A} (whereby ({\cal A}^{\dagger}\rho)(f)=\rho({\cal A}f) for all f\in{\cal Y},\rho\in{\cal Y}^{\dagger}). Using this notation, Hamilton's equations are as follows:

Theorem 4.9.

Consider the optimal control problem (16). Suppose U(opt)U^{\text{\rm(opt)}} is the optimal control input and the (Y,V)(Y,V) is the associated solution of the BSDE (16b). Then there exists a 𝒵{\cal Z}-adapted measure-valued stochastic process P={Pt:0tT}P=\{P_{t}:0\leq t\leq T\} such that

\,\mathrm{d}P_{t}=-{\cal H}_{y}(Y_{t},V_{t},U_{t}^{\text{\rm(opt)}},P_{t};\sigma_{t})\,\mathrm{d}t
\qquad\qquad-{\cal H}_{v}^{\hbox{\rm\tiny T}}(Y_{t},V_{t},U_{t}^{\text{\rm(opt)}},P_{t};\sigma_{t})\,\mathrm{d}Z_{t} (17a)
\,\mathrm{d}Y_{t}={\cal H}_{p}(Y_{t},V_{t},U_{t}^{\text{\rm(opt)}},P_{t};\sigma_{t})\,\mathrm{d}t+V_{t}\,\mathrm{d}Z_{t} (17b)
\frac{\,\mathrm{d}P_{0}}{\,\mathrm{d}\mu}(x)=2\big{(}Y_{0}(x)-\mu(Y_{0})\big{)},\quad Y_{T}(x)=F(x),\quad x\in\mathbb{S} (17c)

where the optimal control is given by

U_{t}^{\text{\rm(opt)}}=-\frac{1}{2}\frac{P_{t}(h)}{\sigma_{t}({\sf 1})}-\pi_{t}(V_{t}),\quad{\tilde{\sf P}}\text{-a.s.},\;0\leq t\leq T (18)

(In (17c), \frac{\,\mathrm{d}P_{0}}{\,\mathrm{d}\mu} denotes the R-N derivative of the measure P_{0} with respect to the measure \mu.)

Proof 4.10.

See Appendix A.5.

Remark 4.11.

From linear optimal control theory, it is known that P_{t} is related to Y_{t} by a ({\cal Z}_{t}-measurable) linear transformation [40, Sec. 6.6]. The boundary condition \frac{\,\mathrm{d}P_{0}}{\,\mathrm{d}\mu}(x)=2\big{(}Y_{0}(x)-\mu(Y_{0})\big{)} suggests the formula for the R-N derivative:

\frac{\,\mathrm{d}P_{t}}{\,\mathrm{d}\sigma_{t}}(x)=2\big{(}Y_{t}(x)-\pi_{t}(Y_{t})\big{)},\quad{\tilde{\sf P}}\text{-a.s.},\;0\leq t\leq T,\;x\in\mathbb{S} (19)

This is indeed the case, as we show in Appendix A.6 by verifying that (19) solves Hamilton's equations. Combining this formula with (18), we obtain the optimal control input as a feedback control law:

U_{t}^{\text{\rm(opt)}}=-\big{(}\pi_{t}(hY_{t})-\pi_{t}(h)\pi_{t}(Y_{t})\big{)}-\pi_{t}(V_{t}),\quad 0\leq t\leq T

4.2 Explicit formulae for the guiding examples

Example 4.12 (Finite state-space).

(Continued from Example 1). A real-valued function f (resp. a measure \rho) is identified with a column vector in \mathbb{R}^{d} whose i^{\text{th}} element represents f(i) (resp. \rho(i)), and \rho(f)=\rho^{\hbox{\rm\tiny T}}f. In this manner, the generator {\cal A} is identified with a rate matrix A\in\mathbb{R}^{d\times d} and the observation function h is identified with a matrix H\in\mathbb{R}^{d\times m}. Let \{e_{1},e_{2},\ldots,e_{d}\} denote the canonical basis in \mathbb{R}^{d}, Q(i)=\sum_{j\in\mathbb{S}}A(i,j)(e_{i}-e_{j})(e_{i}-e_{j})^{\hbox{\rm\tiny T}}, and \rho(Q)=\sum_{i\in\mathbb{S}}\rho(i)Q(i). For any vector b\in\mathbb{R}^{d}, B=\operatorname{diag}(b) is a d\times d diagonal matrix whose diagonal entries are defined as B(i,i)=b(i) for i=1,2,\ldots,d. For a d\times d matrix B, b=\operatorname{diag}^{\dagger}(B) is a d-dimensional vector whose entries are defined as b(i)=B(i,i) for i=1,2,\ldots,d.

The Lagrangian \ell:\mathbb{R}^{d}\times\mathbb{R}^{d\times m}\times\mathbb{R}^{m}\times\mathbb{R}^{d}\to\mathbb{R} and the Hamiltonian {\cal H}:\mathbb{R}^{d}\times\mathbb{R}^{d\times m}\times\mathbb{R}^{m}\times\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R} are as follows:

\ell(y,v,u;\rho)=y^{\hbox{\rm\tiny T}}\rho(Q)y+\rho({\sf 1})|u|^{2}+2u^{\hbox{\rm\tiny T}}v^{\hbox{\rm\tiny T}}\rho+\rho^{\hbox{\rm\tiny T}}\operatorname{diag}^{\dagger}(vv^{\hbox{\rm\tiny T}})
{\cal H}(y,v,u,p;\rho)=-p^{\hbox{\rm\tiny T}}(Ay+Hu+\operatorname{diag}^{\dagger}(Hv^{\hbox{\rm\tiny T}}))-\ell(y,v,u;\rho)

The functional derivatives are now the partial derivatives. For the Hamiltonian, these are as follows:

{\cal H}_{y}(y,v,u,p;\rho)=-A^{\hbox{\rm\tiny T}}p-2\rho(Q)y
{\cal H}_{v}(y,v,u,p;\rho)=-\operatorname{diag}(p)H-2\rho u^{\hbox{\rm\tiny T}}-2\operatorname{diag}(\rho)v
{\cal H}_{u}(y,v,u,p;\rho)=-H^{\hbox{\rm\tiny T}}p-2\rho({\sf 1})u-2v^{\hbox{\rm\tiny T}}\rho
{\cal H}_{p}(y,v,u,p;\rho)=-Ay-Hu-\operatorname{diag}^{\dagger}(Hv^{\hbox{\rm\tiny T}})

Hamilton's equations are given by

\,\mathrm{d}P_{t}=\big{(}A^{\hbox{\rm\tiny T}}P_{t}+2\sigma_{t}(Q)Y_{t}\big{)}\,\mathrm{d}t
\qquad+\big{(}\operatorname{diag}(P_{t})H+2\sigma_{t}U_{t}^{\hbox{\rm\tiny T}}+2\operatorname{diag}(\sigma_{t})V_{t}\big{)}\,\mathrm{d}Z_{t}
\,\mathrm{d}Y_{t}=-\big{(}AY_{t}+HU_{t}+\operatorname{diag}^{\dagger}(HV_{t}^{\hbox{\rm\tiny T}})\big{)}\,\mathrm{d}t+V_{t}\,\mathrm{d}Z_{t}
P_{0}=2\Sigma_{0}Y_{0},\quad Y_{T}=F

where \Sigma_{0}:=\operatorname{diag}(\mu)-\mu\mu^{\top}.
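In this matrix notation, the feedback formula for the optimal control (see Rem. 4.11) reads U_{t}=-\big{(}H^{\hbox{\rm\tiny T}}(\pi_{t}\circ Y_{t})-(H^{\hbox{\rm\tiny T}}\pi_{t})(\pi_{t}^{\hbox{\rm\tiny T}}Y_{t})\big{)}-V_{t}^{\hbox{\rm\tiny T}}\pi_{t}, where \circ denotes the elementwise product. A minimal sketch follows; the function name and the test values are illustrative choices, not from the paper.

```python
import numpy as np

# Optimal feedback control (20) in the matrix notation of Example 4.12:
#   U_t = -( H^T (pi * Y) - (H^T pi) (pi^T Y) ) - V^T pi
# where * is the elementwise product, pi is the conditional probability
# vector, Y in R^d is the dual state and V in R^{d x m} its martingale term.
def optimal_control(pi, Y, V, H):
    return -(H.T @ (pi * Y) - (H.T @ pi) * (pi @ Y)) - V.T @ pi

# Sanity check (illustrative values): when Y is a constant function and
# V = 0, the conditional covariance term vanishes and the control is zero.
d, m = 3, 2
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])
pi = np.array([0.2, 0.5, 0.3])
U = optimal_control(pi, np.ones(d), np.zeros((d, m)), H)
```

The vanishing of U for constant Y reflects that a constant terminal function carries no information to estimate, so the minimum variance estimator needs no correction from the observations.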

Example 4.13 (Euclidean state-space).

(Continued from Example 2). We consider the Itô diffusion (2) in \mathbb{R}^{d} with a prior density denoted by \mu. Likewise, \rho and p are used to denote the densities of the respective measures. Doing so, the Lagrangian and the Hamiltonian are as follows:

\ell(y,v,u;\rho)=\int_{\mathbb{R}^{d}}\rho(x)\big{(}|\sigma^{\hbox{\rm\tiny T}}(x)\nabla y(x)|^{2}+|u+v(x)|^{2}\big{)}\,\mathrm{d}x
{\cal H}(y,v,u,p;\rho)=-\int_{\mathbb{R}^{d}}p(x)\big{(}{\cal A}y(x)+h^{\hbox{\rm\tiny T}}(x)(u+v(x))\big{)}\,\mathrm{d}x
\qquad\qquad-\ell(y,v,u;\rho)

The functional derivatives are computed by evaluating the first variation. These are as follows:

{\cal H}_{y}(y,v,u,p;\rho)=-{\cal A}^{\dagger}p+2\nabla\cdot\big{(}\sigma\sigma^{\hbox{\rm\tiny T}}(\nabla y)\rho\big{)}
{\cal H}_{v}(y,v,u,p;\rho)=-ph-2(u+v)\rho
{\cal H}_{u}(y,v,u,p;\rho)=-p(h)-2\rho({\sf 1})u-2\rho(v)
{\cal H}_{p}(y,v,u,p;\rho)=-{\cal A}y-h^{\hbox{\rm\tiny T}}(u+v)

where \rho(v) is now understood to mean \int\rho(x)v(x)\,\mathrm{d}x and the formula for the adjoint is

({\cal A}^{\dagger}p)(x)=-\nabla\cdot(ap)(x)+\tfrac{1}{2}\sum_{i,j=1}^{d}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}\big{(}[\sigma\sigma^{\hbox{\rm\tiny T}}]_{ij}p\big{)}(x)

Therefore, Hamilton's equations are given by

\,\mathrm{d}P_{t}(x)=\big{(}({\cal A}^{\dagger}P_{t})(x)-2\nabla\cdot\big{(}\sigma\sigma^{\hbox{\rm\tiny T}}(\nabla Y_{t})\sigma_{t}\big{)}(x)\big{)}\,\mathrm{d}t
\qquad+\big{(}P_{t}(x)h(x)+2(U_{t}+V_{t}(x))\sigma_{t}(x)\big{)}\,\mathrm{d}Z_{t}
\,\mathrm{d}Y_{t}(x)=-\big{(}({\cal A}Y_{t})(x)+h^{\hbox{\rm\tiny T}}(x)(U_{t}+V_{t}(x))\big{)}\,\mathrm{d}t+V_{t}^{\hbox{\rm\tiny T}}(x)\,\mathrm{d}Z_{t}
P_{0}(x)=2\mu(x)\big{(}Y_{0}(x)-\mu(Y_{0})\big{)},\;Y_{T}(x)=F(x),\;x\in\mathbb{R}^{d}

where we note that P_{t} is now a (random) function (same as Y_{t}).

5 Martingale characterization

Although we do not have an HJB equation, a martingale characterization of the optimal solution is possible as described in the following theorem:

Theorem 5.14.

Fix U\in{\cal U}. Consider the {\cal Z}-adapted real-valued stochastic process M=\{M_{t}:0\leq t\leq T\} defined by

M_{t}:={\cal V}_{t}(Y_{t})-\int_{0}^{t}\ell(Y_{s},V_{s},U_{s};\pi_{s})\,\mathrm{d}s,\quad 0\leq t\leq T

where (Y,V) is the solution to the BSDE (10b) and \pi is the nonlinear filter. Then M is a {\sf P}-supermartingale, and M is a {\sf P}-martingale if and only if

U_{t}=-\big{(}\pi_{t}(hY_{t})-\pi_{t}(h)\pi_{t}(Y_{t})\big{)}-\pi_{t}(V_{t}) (20)

for 0\leq t\leq T, {\sf P}\text{-a.s.}

Proof 5.15.

See Appendix A.7.

A direct consequence of Thm. 5.14 is the optimality of the control (20), because

{\sf E}(M_{T})\leq{\sf E}(M_{0})

which means

{\sf E}\big{(}{\cal V}_{T}(F)\big{)}\leq{\sf E}\Big{(}{\cal V}_{0}(Y_{0})+\int_{0}^{T}\ell(Y_{t},V_{t},U_{t};\pi_{t})\,\mathrm{d}t\Big{)}={\sf J}_{T}(U)

with equality if and only if U=U^{\text{\rm(opt)}}. Therefore, the expected value of the conditional variance \text{var}_{T}(F)={\sf E}\big{(}{\cal V}_{T}(F)\big{)} is the optimal value functional for the optimal control problem.

Remark 5.16.

We now have a complete solution of the optimal control problem (10). Remarkably, the solution admits a meaningful interpretation not only at the terminal time T but also at intermediate times 0\leq t\leq T. At time t,

  • The optimal value functional is \text{var}_{t}(Y_{t}) (formula (14)).

  • The optimal control U_{t}^{\text{\rm(opt)}} is given by the feedback control law (20).

  • The optimal estimate is \pi_{t}(Y_{t}) (formula (13)).

Formula (13) for \pi_{t}(Y_{t}) explicitly connects the optimal control to the optimal filter. In particular, the optimal control up to time t yields an optimal estimate of Y_{t}(X_{t}).

Because of the BSDE-constrained nature of the optimal control problem (10), an explicit characterization of the optimal value functional and the feedback form of the optimal control are both welcome surprises. It is noted that the feedback formula (20) for the optimal control is derived using two approaches: the maximum principle (Rem. 4.11) and the martingale characterization (Thm. 5.14).

6 Derivation of the nonlinear filter

From Prop. 3.5, using the formula (20) for the optimal control,

\pi_{t}(Y_{t})=\mu(Y_{0})+\int_{0}^{t}\big{(}\pi_{s}(hY_{s})-\pi_{s}(h)\pi_{s}(Y_{s})+\pi_{s}(V_{s})\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{s} (21)

for 0\leq t\leq T, {\sf P}\text{-a.s.} Because the equation for Y is known, a natural question is whether (21) can be used to obtain the equation of the nonlinear filter (akin to the derivation of the Kalman filter described in Rem. 3.7). A formal derivation of the nonlinear filter along these lines is given in Appendix A.8.
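For the finite state-space of Example 1 with a scalar observation, the resulting filter is the Wonham equation \,\mathrm{d}\pi_{t}=A^{\hbox{\rm\tiny T}}\pi_{t}\,\mathrm{d}t+\big{(}\operatorname{diag}(h)\pi_{t}-\pi_{t}(h^{\hbox{\rm\tiny T}}\pi_{t})\big{)}(\,\mathrm{d}Z_{t}-h^{\hbox{\rm\tiny T}}\pi_{t}\,\mathrm{d}t). The sketch below is a minimal Euler-Maruyama illustration (the rate matrix, observation vector, and synthetic noise path are arbitrary choices): it checks that the update preserves the normalization \sum_{i}\pi_{t}(i)=1, which holds because {\sf 1}^{\hbox{\rm\tiny T}}A^{\hbox{\rm\tiny T}}\pi=0 (rows of A sum to zero) and {\sf 1}^{\hbox{\rm\tiny T}}(\operatorname{diag}(h)\pi-\pi h^{\hbox{\rm\tiny T}}\pi)=0.

```python
import numpy as np

# Wonham filter for a finite-state Markov chain, scalar observation:
#   d pi = A^T pi dt + (diag(h) pi - pi (h^T pi)) (dZ - h^T pi dt)
# Illustrative sketch: A, h, and the observation increments are arbitrary;
# the increments below are observation noise only, which suffices to test
# that the Euler update preserves the normalization of pi.
rng = np.random.default_rng(1)
d, T, N = 3, 1.0, 1000
dt = T / N
A = np.array([[-2.0, 1.0, 1.0],
              [0.5, -1.0, 0.5],
              [1.0, 1.0, -2.0]])    # rate matrix: each row sums to zero
h = np.array([1.0, 0.0, -1.0])
pi = np.array([0.3, 0.3, 0.4])      # prior probability vector

for _ in range(N):
    dZ = np.sqrt(dt) * rng.normal()            # synthetic observation increment
    gain = h * pi - pi * (h @ pi)              # diag(h) pi - pi (h^T pi)
    pi = pi + dt * (A.T @ pi) + gain * (dZ - (h @ pi) * dt)

total = pi.sum()
```

Normalization is preserved step by step (not merely in the limit), since both the drift and the gain term annihilate the constant function.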

Table 1: Comparison of the Mitter-Newton duality and the duality proposed in this paper
 | Mitter-Newton duality | Duality proposed in this paper
Filtering/smoothing objective | Minimize relative entropy (Eq. (22)) | Minimize variance (Eq. (9))
Observation (output) process | Pathwise (z is a sample path) | Z is a stochastic process
Control (input) process | U_{t} has the dimension of the process noise | U and Z are both elements of L^{2}_{{\cal Z}}([0,T];\mathbb{R}^{m})
Dual optimal control problem | Eq. (23) | Eq. (10)
Arrow of time | Forward in time | Backward in time
Dual state-space | \mathbb{S}: same as the state-space for X_{t} | {\cal Y}: the space of functions on \mathbb{S}
Constraint | Controlled copy of the state process SDE (23a) | Dual control system BSDE (10b)
Running cost (Lagrangian) | l(x,u\,;z_{t})=\tfrac{1}{2}|u|^{2}+\tfrac{1}{2}h^{2}(x)+z_{t}({\cal A}^{u}h)(x) | l(y,v,u;x)=(\Gamma y)(x)+|u+v(x)|^{2}
Value function (its interpretation) | Minus log of the posterior density | Expected value of the conditional variance
Asymptotic analysis (condition) | Unclear | Stabilizability of BSDE \Leftrightarrow Detectability of HMM
Optimal solution gives | Forward-backward equations of smoothing | Equation of nonlinear filtering
Linear-Gaussian special case | Minimum energy duality (3) | Minimum variance duality (2)

7 Comparison with Mitter-Newton Duality

7.1 Review of Mitter-Newton duality

In [11], Mitter and Newton introduced a modified version of the Markov process X. The modified process is denoted by \tilde{X}:=\{\tilde{X}_{t}:0\leq t\leq T\}. The problem is to pick (i) the initial prior \tilde{\mu}; and (ii) the state transition, such that the probability law of \tilde{X} equals the conditional law of X.

This is accomplished by setting up an optimization problem on the space of probability laws. Let {\sf P}_{X} denote the law of X, {\sf Q} denote the law of \tilde{X}, and {\sf P}_{X\mid z} denote the conditional law of X given an observation sample path z=\{z_{t}\in\mathbb{R}^{m}:0\leq t\leq T\}. Assuming {\sf Q}\ll{\sf P}_{X}, the objective function is the relative entropy between {\sf Q} and {\sf P}_{X\mid z}:

min𝖰𝖤𝖰(logd𝖰d𝖯X)𝖤𝖰(logd𝖯Xzd𝖯X)\min_{{\sf Q}}\quad{\sf E}_{{\sf Q}}\Big{(}\log\frac{\,\mathrm{d}{\sf Q}}{\,\mathrm{d}{\sf P}_{X}}\Big{)}-{\sf E}_{{\sf Q}}\Big{(}\log\frac{\,\mathrm{d}{\sf P}_{X\mid z}}{\,\mathrm{d}{\sf P}_{X}}\Big{)} (22)

In [28], (22) is referred to as the variational Kallianpur-Striebel formula. For Example 2 (Itô diffusion), this procedure yields the following stochastic optimal control problem:

\mathop{\text{Min}}_{\tilde{\mu},\;U}:\;\;{\sf J}(\tilde{\mu},U\,;z)
\qquad={\sf E}\Big{(}\log\frac{\,\mathrm{d}\tilde{\mu}}{\,\mathrm{d}\mu}(\tilde{X}_{0})-z_{T}h(\tilde{X}_{T})+\int_{0}^{T}l(\tilde{X}_{t},U_{t}\,;z_{t})\,\mathrm{d}t\Big{)} (23a)
\text{Subj.}:\;\;\,\mathrm{d}\tilde{X}_{t}=a(\tilde{X}_{t})\,\mathrm{d}t+\sigma(\tilde{X}_{t})(U_{t}\,\mathrm{d}t+\,\mathrm{d}\tilde{B}_{t}),\;\;\tilde{X}_{0}\sim\tilde{\mu} (23b)

where

l(x,u\,;z_{t}):=\tfrac{1}{2}|u|^{2}+\tfrac{1}{2}h^{2}(x)+z_{t}({\cal A}^{u}h)(x)

where {\cal A}^{u} is the generator of the controlled Markov process \tilde{X}. A similar construction is also possible for Example 1 (finite state-space) [28, Sec. 2.2.2], [45, Sec. 3.3].

The problem (23) is a standard stochastic optimal control problem whose solution is obtained by writing the HJB equation (see [45]),

-\frac{\partial v_{t}}{\partial t}(x)=\big{(}{\cal A}(v_{t}+z_{t}h)\big{)}(x)+\tfrac{1}{2}h^{2}(x)
\qquad-\tfrac{1}{2}|\sigma^{\hbox{\rm\tiny T}}\nabla(v_{t}+z_{t}h)(x)|^{2}
v_{T}(x)=-z_{T}h(x),\quad x\in\mathbb{R}^{d}

and the optimal control U_{t}=u_{t}^{\text{\rm(opt)}}(\tilde{X}_{t}) where

u_{t}^{\text{\rm(opt)}}(x)=-\sigma^{\hbox{\rm\tiny T}}\nabla(v_{t}+z_{t}h)(x)

By expressing the value function

v_{t}(x)=-\log\big{(}q_{t}(x)e^{z_{t}h(x)}\big{)}

a direct calculation shows that the process \{q_{t}:0\leq t\leq T\} satisfies the backward Zakai equation of the smoothing problem [46], [47, Thm. 3.8]. This shows the connection to both the log transformation and to the smoothing problem. In fact, the above can be used to derive the forward-backward equations of nonlinear smoothing (see [45] and [35, Appdx. B]).

Remark 7.17.

The stochastic optimal control problem (23) is equivalently stated as a deterministic optimal control problem on {\cal Y}^{\dagger} [45, Sec. 3.2]. Note that the optimal control problem depends on a (fixed) observation sample path z, which is the reason why a deterministic formulation is available.

7.2 Linear Gaussian case

The goal is to relate (23) to the minimum energy duality (3) described in Sec. 1 for the linear Gaussian model (1). In the linear Gaussian case, the controlled process (23b) becomes

\,\mathrm{d}\tilde{X}_{t}=A^{\hbox{\rm\tiny T}}\tilde{X}_{t}\,\mathrm{d}t+\sigma U_{t}\,\mathrm{d}t+\sigma\,\mathrm{d}\tilde{B}_{t},\quad\tilde{X}_{0}\sim N(\tilde{m}_{0},\tilde{\Sigma}_{0}) (24)

where U, \tilde{m}_{0}, \tilde{\Sigma}_{0} are decision variables. Because the problem is linear Gaussian, it suffices to consider a linear control law of the form

U_{t}=K_{t}(\tilde{X}_{t}-\tilde{m}_{t})+u_{t} (25)

where \tilde{m}_{t}:={\sf E}(\tilde{X}_{t}) and the two deterministic processes

K=\{K_{t}\in\mathbb{R}^{p\times d}:0\leq t\leq T\}
u=\{u_{t}\in\mathbb{R}^{p}:0\leq t\leq T\}

are the new decision variables. With the linear control law (25), the state \tilde{X}_{t} is a Gaussian random variable with mean \tilde{m}_{t} and variance \tilde{\Sigma}_{t}. It is possible to equivalently express (23) as two uncoupled deterministic optimal control problems, one for the mean and one for the variance. Detailed calculations showing this are contained in Appendix A.9. In particular, it is shown that the optimal control problem for the mean is the classical minimum energy duality (3).

7.3 Comparison

Table 1 provides a side-by-side comparison of the two types of duality:

  • Mitter-Newton duality (23) on the left-hand side; and

  • Duality (10) proposed in this paper on the right-hand side.

In Sec. 7.2 and Sec. 3.3, the two are shown to be generalizations of the classical minimum energy duality (3) and the minimum variance duality (2), respectively. All of this conclusively answers the two questions raised in Sec. 1.

We make a note of some important distinctions (compare with the bulleted list in Sec. 1):

  • Inputs and outputs. In the proposed duality (10), the inputs and outputs are dual processes that have the same dimension. These are elements of the same Hilbert space {\cal U}.

  • Constraint. The constraint is the dual control system (10b) studied in the companion paper (part I).

  • Stability condition. For asymptotic analysis of (10), stabilizability of the constraint is the most natural condition. The main result of part I was to establish that stabilizability of the dual control system is equivalent to the detectability of the HMM. The latter condition is, of course, central to filter stability.

  • Arrow of time. The dual control system is backward in time. However, it is important to note that the information structure (filtration) is forward in time. In particular, all the processes are forward adapted to the filtration 𝒵{\cal Z} defined by the observation process.

A major drawback of the proposed duality is that the problem (for the Euclidean state-space \mathbb{S}=\mathbb{R}^{d}) is infinite-dimensional. This is to be expected because the nonlinear filter is itself infinite-dimensional. In contrast, the state-space in the minimum energy duality is \mathbb{R}^{d}, which is important for algorithm design as in MEE. Having said that, the linear quadratic nature of the infinite-dimensional problem may prove to be useful in practical applications of this work.

8 Conclusions and directions of future work

In this paper, we presented the minimum variance dual optimal control problem for the nonlinear filtering problem. The mathematical relationship between the two problems is given by a duality principle. Two approaches are described to solve the problem, one based on the maximum principle and the other based on a martingale characterization. A formula for the optimal control as a feedback control law is obtained and used to derive the equation of the nonlinear filter. A detailed comparison with the Mitter-Newton duality is given.

There are several possible directions for future research. An important next step is to use the controllability and stabilizability definitions of the dual control system to recover the known results in filter stability. Research on this has already begun, with preliminary results appearing in [35, Chapters 7-8] and [48, 49]. Although some sufficient conditions have been obtained and compared with the literature, a complete resolution still remains open.

Both the stability analysis and the optimal control formulation suggest natural connections to dissipativity theory. Because the dual control system is linear, one might consider quadratic supply rate functions of the following form (compare with the formula for the running cost l):

s(y,v,u;x):=\gamma|u+v(x)|^{2}-|y(x)-{c}_{t}|^{2}

where \gamma>0 and c:=\{c_{t}:0\leq t\leq T\}\in L^{2}_{{\cal Z}}\big{(}[0,T];\mathbb{R}\big{)} is a suitable stochastic process (which can be picked). Establishing conditions for the existence of a storage function, and relating these conditions to the properties of the HMM, may be useful for stability and robustness analysis.

Another avenue is numerical approximation of the nonlinear filter by considering sub-optimal solutions of the dual optimal control problem. The simplest choice is to consider deterministic control inputs U\in L^{2}\big{(}[0,T];\mathbb{R}^{m}\big{)}. Some preliminary work on algorithm design along these lines appears in [36, Rem. 1], [35, Sec. 9.2] and [50, Ch. 4]. In particular, for the finite state-space case, this approach provides a derivation and justification of the Kalman filter for Markov chains [51]. In this regard, it is useful to relate duality both to the feedback particle filter (FPF) [52] and to the special cases (apart from the linear Gaussian case) where the optimal filter is known to be finite-dimensional, e.g. [53].

9 Acknowledgement

It is a pleasure to acknowledge Sean Meyn and Amirhossein Taghvaei for many useful technical discussions over the years on the topic of duality. The authors also acknowledge Alain Bensoussan for his early encouragement of this work.

Appendix A Proofs of the statements

A.1 Proof of Thm. 1

For a Markov process, the following process is a martingale:

N_{t}(g)=g(X_{t})-\int_{0}^{t}{\cal A}g(X_{s})\,\mathrm{d}s

Upon applying the Itô-Wentzell theorem [54, Thm. 1.17] to Y_{t}(X_{t}) (note here that all stochastic processes are forward adapted),

\,\mathrm{d}Y_{t}(X_{t})=-U_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}+\big{(}U_{t}+V_{t}(X_{t})\big{)}^{\hbox{\rm\tiny T}}\,\mathrm{d}W_{t}+\,\mathrm{d}N_{t}(Y_{t})

Integrating both sides from 0 to TT,

F(X_{T})=Y_{0}(X_{0})-\int_{0}^{T}U_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}
\quad+\int_{0}^{T}(U_{t}+V_{t}(X_{t}))^{\hbox{\rm\tiny T}}\,\mathrm{d}W_{t}+\int_{0}^{T}\,\mathrm{d}N_{t}(Y_{t})

Consider now an estimator

S_{T}=b-\int_{0}^{T}U_{t}^{\hbox{\rm\tiny T}}\,\mathrm{d}Z_{t}

where b\in\mathbb{R} is a deterministic constant. Then

F(X_{T})-S_{T}=\big{(}Y_{0}(X_{0})-b\big{)}+\int_{0}^{T}(U_{t}+V_{t}(X_{t}))^{\hbox{\rm\tiny T}}\,\mathrm{d}W_{t}
\quad+\int_{0}^{T}\,\mathrm{d}N_{t}(Y_{t})

The left-hand side is the error of the estimator. The three terms on the right-hand side are mutually independent. Therefore, upon squaring and taking an expectation

{\sf E}\big{(}|F(X_{T})-S_{T}|^{2}\big{)}={\sf E}\big{(}|Y_{0}(X_{0})-\mu(Y_{0})|^{2}\big{)}+(\mu(Y_{0})-b)^{2}
+{\sf E}\Big{(}\int_{0}^{T}|U_{t}+V_{t}(X_{t})|^{2}+(\Gamma Y_{t})(X_{t})\,\mathrm{d}t\Big{)}

The proof is completed by setting b=\mu(Y_{0}).

A.2 Proof of Lemma 3.3

Because Z is a {\tilde{\sf P}}-B.M., the formula holds for \pi_{T}(F)\in L^{2}_{{\cal Z}_{T}}(\Omega;\mathbb{R}) by the Brownian motion representation theorem [42, Thm. 5.18]. Note that

|\pi_{T}(F)|^{2}\leq\|F\|_{\cal Y}^{2},\quad{\tilde{\sf P}}\text{-a.s.}

because \|\cdot\|_{\cal Y} is the sup norm. Therefore, if F\in L^{2}_{{\cal Z}_{T}}(\Omega;{\cal Y}) then \pi_{T}(F)\in L^{2}_{{\cal Z}_{T}}(\Omega;\mathbb{R}). The conclusion follows.

A.3 Proof of Prop. 3.5

Given the optimal control $U^{\text{(opt)}} = \{U^{\text{(opt)}}_t : 0 \leq t \leq T\} \in {\cal U}$, let $(Y, V) = \{(Y_t, V_t) : 0 \leq t \leq T\} \in L^2_{{\cal Z}}\big([0,T];{\cal Y}\times{\cal Y}^m\big)$ denote the solution of the BSDE (10b) with $Y_T = F \in L^2_{{\cal Z}_T}(\Omega;{\cal Y})$. Fix $t \in [0,T]$ and let

\[
S_t = \mu(Y_0) - \int_0^t \big(U_s^{\text{(opt)}}\big)^{\top}\,\mathrm{d}Z_s
\]

Then, by repeating the proof of Thm. 1 over the time horizon $[0,t]$,

\[
{\sf E}\big(|Y_t(X_t) - S_t|^2\big) = \text{var}_0(Y_0) + {\sf E}\Big(\int_0^t l(Y_s, V_s, U_s^{\text{(opt)}}; X_s)\,\mathrm{d}s\Big)
\]

If ${\sf E}\big(|Y_t(X_t) - S_t|^2\big) = \text{var}_t(Y_t)$ then there is nothing to prove, because then $S_t = \pi_t(Y_t)$ (${\sf P}$-a.s.) by the uniqueness of the conditional expectation. Therefore, suppose

\[
\text{var}_t(Y_t) = {\sf E}\big(|Y_t(X_t) - \pi_t(Y_t)|^2\big) < {\sf E}\big(|Y_t(X_t) - S_t|^2\big)
\]

In this case, we show that there exists a $\tilde{U} \in {\cal U}$ such that ${\sf J}_T(\tilde{U}) < {\sf J}_T(U^{\text{(opt)}})$. Because $U^{\text{(opt)}}$ is the optimal control, this provides the necessary contradiction.

Set $C := {\sf E}\big(\int_t^T l(Y_s, V_s, U_s^{\text{(opt)}}; X_s)\,\mathrm{d}s\big)$, so that

\[
{\sf J}_T(U^{\text{(opt)}}) = {\sf E}\big(|Y_t(X_t) - S_t|^2\big) + C
\]

Because $Y_t \in L^2_{{\cal Z}_t}(\Omega;{\cal Y})$, by Lemma 3.3 there exists $\hat{U} \in L^2_{{\cal Z}}([0,t];\mathbb{R}^m)$ such that

\[
\pi_t(Y_t) = {\tilde{\sf E}}\big(\pi_t(Y_t)\big) - \int_0^t \hat{U}_s^{\top}\,\mathrm{d}Z_s, \quad {\tilde{\sf P}}\text{-a.s.}
\]

Consider the admissible control $\tilde{U}$ defined by

\[
\tilde{U}_s = \begin{cases} \hat{U}_s & s \leq t \\ U_s^{\text{(opt)}} & s > t \end{cases}
\]

and denote by $(\tilde{Y}, \tilde{V})$ the solution of the BSDE with the control $\tilde{U}$. Because of the uniqueness of the solution, $(\tilde{Y}_s, \tilde{V}_s) = (Y_s, V_s)$ for all $s > t$, and therefore

\begin{align*}
{\sf J}_T(\tilde{U}) &= {\sf E}\big(|Y_t(X_t) - \pi_t(Y_t)|^2\big) + C \\
&< {\sf E}\big(|Y_t(X_t) - S_t|^2\big) + C = {\sf J}_T(U^{\text{(opt)}})
\end{align*}

This supplies the necessary contradiction and completes the proof.

A.4 Derivation of the Lagrangian

Using the change of measure formula (8),

\begin{align*}
{\sf E}\big((\Gamma Y_t)(X_t)\big) &= {\tilde{\sf E}}\big(\sigma_t(\Gamma Y_t)\big) \\
{\sf E}\big(|U_t + V_t(X_t)|^2\big) &= {\tilde{\sf E}}\big(\sigma_t(|U_t + V_t|^2)\big)
\end{align*}

Even though the formula (8) is stated for deterministic functions, it is easily extended to ${\cal Z}_t$-measurable functions, which is how it is used above. Therefore,

\begin{align*}
{\sf J}_T(U) &= \text{var}_0(Y_0) + {\sf E}\Big(\int_0^T (\Gamma Y_t)(X_t) + |U_t + V_t(X_t)|^2\,\mathrm{d}t\Big) \\
&= \text{var}_0(Y_0) + {\tilde{\sf E}}\Big(\int_0^T \sigma_t(\Gamma Y_t) + \sigma_t(|U_t + V_t|^2)\,\mathrm{d}t\Big) \\
&= \text{var}_0(Y_0) + {\tilde{\sf E}}\Big(\int_0^T \ell(Y_t, V_t, U_t; \sigma_t)\,\mathrm{d}t\Big)
\end{align*}

A.5 Proof of Thm. 4.9

Equation (17) is Hamilton's equation for optimal control of a BSDE [44, Thm. 4.4]. The optimal control is obtained from the maximum principle:

\[
U_t = \mathop{\operatorname{argmax}}_{u\in\mathbb{R}^m} \; {\cal H}(Y_t, V_t, u, P_t; \sigma_t)
\]

Since ${\cal H}$ is quadratic in the control input, the explicit formula (18) is obtained by evaluating the derivative and setting it to zero:

\[
{\cal H}_u(Y_t, V_t, u, P_t; \sigma_t) = 2\sigma_t({\sf 1})u + 2\sigma_t(V_t) + P_t(h) = 0
\]
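The stationarity computation can be sanity-checked numerically. The sketch below uses arbitrary scalar stand-ins for $\sigma_t({\sf 1})$, $\sigma_t(V_t)$ and $P_t(h)$ (the values are illustrative, not taken from the paper's model) and verifies that the closed-form root of ${\cal H}_u = 0$ is the unique stationary point of the quadratic.

```python
# illustrative scalar stand-ins (not from the paper's model):
sigma_1 = 2.0   # plays the role of sigma_t(1)
sigma_V = 0.7   # plays the role of sigma_t(V_t)
P_h = -1.2      # plays the role of P_t(h)

def H_u(u):
    # derivative in u of the quadratic Hamiltonian
    return 2.0 * sigma_1 * u + 2.0 * sigma_V + P_h

# stationary point from setting the derivative to zero
u_star = -(2.0 * sigma_V + P_h) / (2.0 * sigma_1)
assert abs(H_u(u_star)) < 1e-12

# since the quadratic coefficient is nonzero, the stationary point is the
# unique extremum: the derivative changes sign there
assert H_u(u_star - 1e-3) * H_u(u_star + 1e-3) < 0
```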

A.6 Justification of the formula (19)

For notational ease, we drop the superscript $^{\text{(opt)}}$ and denote the optimal control input simply as $U_t$. In this proof, $\langle\cdot,\cdot\rangle$ is used to denote the duality pairing between functions and measures (e.g., $\langle f, \mu\rangle = \mu(f)$).

Let $f$ be an arbitrary test function. We show that

\[
\langle f, P_t\rangle = \big\langle 2f(Y_t - \pi_t(Y_t)), \sigma_t\big\rangle, \quad 0 < t \leq T
\]

This is known to be true at time $t = 0$ because of the boundary condition (17c). Therefore, the proof is carried out by taking the derivative of both sides and showing that they are identical.

Using the Itô-Wentzell formula for measure-valued processes [55, Thm. 1.1],

\begin{align*}
\mathrm{d}\big\langle 2f(Y_t - \pi_t(Y_t)), \sigma_t\big\rangle &= 2\big\langle {\cal A}(fY_t) - f({\cal A}Y_t) - \pi_t(Y_t)({\cal A}f), \sigma_t\big\rangle\,\mathrm{d}t \\
&\quad + \big(\langle 2f(U_t + V_t), \sigma_t\rangle + \langle fh, P_t\rangle\big)\,\mathrm{d}Z_t
\end{align*}

where we have used $\mathrm{d}\big(\pi_t(Y_t)\big) = -U_t\,\mathrm{d}Z_t$ (Prop. 3.5). From Hamilton's equation (17b), upon explicitly evaluating the terms,

\begin{align*}
\mathrm{d}\langle f, P_t\rangle &= \Big(\langle{\cal A}f, P_t\rangle + \frac{\mathrm{d}}{\mathrm{d}\epsilon}\sigma_t\big(\Gamma(Y_t + \epsilon f)\big)\Big|_{\epsilon=0}\Big)\,\mathrm{d}t \\
&\quad + \big(\langle fh, P_t\rangle + \langle 2f(U_t + V_t), \sigma_t\rangle\big)\,\mathrm{d}Z_t
\end{align*}

where

\[
\frac{\mathrm{d}}{\mathrm{d}\epsilon}\Gamma(Y_t + \epsilon f)\Big|_{\epsilon=0} = 2\big({\cal A}(Y_t f) - Y_t({\cal A}f) - f({\cal A}Y_t)\big)
\]

On comparing terms, the two derivatives are seen to be identical, where we also use the identity $\langle g, P_t\rangle = \big\langle 2g(Y_t - \pi_t(Y_t)), \sigma_t\big\rangle$ for $g = {\cal A}f$.

A.7 Proof of Thm. 5.14

The proof uses the equation of the nonlinear filter, where $\mathrm{d}I_t := \mathrm{d}Z_t - \pi_t(h)\,\mathrm{d}t$ is the innovation increment. We evaluate the derivative of ${\cal V}_t(Y_t) = \pi_t(Y_t^2) - \big(\pi_t(Y_t)\big)^2$.

\begin{align*}
\mathrm{d}\pi_t(Y_t^2) &= \pi_t({\cal A}Y_t^2)\,\mathrm{d}t + \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2)\big)\,\mathrm{d}I_t \\
&\quad + \pi_t\big(-2Y_t\big({\cal A}Y_t + h(U_t + V_t)\big) + |V_t|^2\big)\,\mathrm{d}t \\
&\quad + 2\pi_t\big(Y_t V_t\big)\,\mathrm{d}Z_t + 2\big(\pi_t(hY_t V_t) - \pi_t(h)\pi_t(Y_t V_t)\big)\,\mathrm{d}t \\
&= \pi_t\big(\Gamma Y_t\big)\,\mathrm{d}t + \pi_t(|V_t|^2)\,\mathrm{d}t - 2\pi_t(hY_t)U_t\,\mathrm{d}t \\
&\quad + \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2) + 2\pi_t(Y_t V_t)\big)\,\mathrm{d}I_t
\end{align*}

Similarly,

\begin{align*}
\mathrm{d}\pi_t(Y_t) &= \pi_t({\cal A}Y_t)\,\mathrm{d}t \\
&\quad + \big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t)\big)\big(\mathrm{d}Z_t - \pi_t(h)\,\mathrm{d}t\big) \\
&\quad - \pi_t\big({\cal A}Y_t + h(U_t + V_t)\big)\,\mathrm{d}t + \pi_t\big(V_t\big)\,\mathrm{d}Z_t \\
&\quad + \big(\pi_t(hV_t) - \pi_t(h)\pi_t(V_t)\big)\,\mathrm{d}t \\
&= \big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t) + \pi_t(V_t)\big)\,\mathrm{d}Z_t \\
&\quad - \big(U_t + \pi_t(hY_t) - \pi_t(h)\pi_t(Y_t) + \pi_t(V_t)\big)\pi_t(h)\,\mathrm{d}t \\
&= U_t^{\text{(opt)}}\,\mathrm{d}Z_t - \big(U_t - U_t^{\text{(opt)}}\big)\pi_t(h)\,\mathrm{d}t
\end{align*}

where $U_t^{\text{(opt)}} := -\pi_t(hY_t) + \pi_t(h)\pi_t(Y_t) - \pi_t(V_t)$. Therefore,

\begin{align*}
\mathrm{d}\big(\pi_t(Y_t)\big)^2 &= 2\pi_t(Y_t)U_t^{\text{(opt)}}\,\mathrm{d}Z_t + |U_t^{\text{(opt)}}|^2\,\mathrm{d}t \\
&\quad - 2\pi_t(Y_t)\big(U_t - U_t^{\text{(opt)}}\big)\pi_t(h)\,\mathrm{d}t
\end{align*}

Collecting terms, we have

\begin{align*}
\mathrm{d}M_t &= \pi_t\big(\Gamma Y_t\big)\,\mathrm{d}t + \pi_t(|V_t|^2)\,\mathrm{d}t - 2\pi_t(hY_t)U_t\,\mathrm{d}t \\
&\quad + \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2) + 2\pi_t(Y_t V_t)\big)\,\mathrm{d}I_t \\
&\quad - 2\pi_t(Y_t)U_t^{\text{(opt)}}\,\mathrm{d}Z_t + 2\pi_t(Y_t)\big(U_t - U_t^{\text{(opt)}}\big)\pi_t(h)\,\mathrm{d}t \\
&\quad - |U_t^{\text{(opt)}}|^2\,\mathrm{d}t - \ell(Y_t, V_t, U_t; \pi_t)\,\mathrm{d}t \\
&= \big(\pi_t(hY_t^2) - \pi_t(h)\pi_t(Y_t^2) + 2\pi_t(Y_t V_t)\big)\,\mathrm{d}I_t \\
&\quad - |U_t - U_t^{\text{(opt)}}|^2\,\mathrm{d}t
\end{align*}

Since $-|U_t - U_t^{\text{(opt)}}|^2 \leq 0$ and $I$ is a ${\sf P}$-martingale, $M$ is a ${\sf P}$-supermartingale, and it is a martingale if and only if $U_t = U_t^{\text{(opt)}}$ for all $t$.
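The supermartingale property rests only on the nonpositive drift $-|U_t - U_t^{\text{(opt)}}|^2\,\mathrm{d}t$. A minimal Monte Carlo sketch illustrates this: a made-up constant gap stands in for $U_t - U_t^{\text{(opt)}}$ and a scaled Brownian increment stands in for the $\mathrm{d}I_t$ term, and the sample mean of the resulting process decreases in time (with zero gap it would stay constant).

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, dt = 20_000, 100, 0.01

gap = 0.5  # made-up stand-in for U_t - U_t^(opt), held constant for simplicity

M = np.zeros(n_paths)
means = [M.mean()]
for _ in range(n_steps):
    dI = np.sqrt(dt) * rng.standard_normal(n_paths)  # martingale increment
    M += 1.0 * dI - gap**2 * dt  # dM = (coeff) dI - |U - U^(opt)|^2 dt
    means.append(M.mean())

# E(M_t) decreases over time: a supermartingale
assert means[-1] < means[0]
```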

A.8 Formal derivation of the nonlinear filter

We begin with an ansatz

\[
\mathrm{d}\pi_t(f) = \alpha_t(f)\,\mathrm{d}t + \beta_t(f)\,\mathrm{d}Z_t \tag{26}
\]

where the goal is to obtain formulae for $\alpha_t$ and $\beta_t$. Because we have an equation (21) for $\pi_t(Y_t)$, let us express $\mathrm{d}\big(\pi_t(Y_t)\big)$ in terms of the unknown $\alpha_t$ and $\beta_t$. Using the SDE (26) for $\pi_t$ and the BSDE (10b) for $Y_t$, apply the Itô-Wentzell formula to obtain

\begin{align*}
\mathrm{d}\big(\pi_t(Y_t)\big) &= \big(\alpha_t(Y_t) + \beta_t(V_t) - \pi_t({\cal A}Y_t + h^{\top}(U_t + V_t))\big)\,\mathrm{d}t \\
&\quad + \big(\beta_t(Y_t) + \pi_t(V_t)\big)\,\mathrm{d}Z_t
\end{align*}

Comparing with (21),

\begin{align*}
&\alpha_t(Y_t) + \beta_t(V_t) - \pi_t({\cal A}Y_t + h^{\top}(U_t + V_t)) = 0 \\
&\beta_t(Y_t) + \pi_t(V_t) = \big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t)\big) + \pi_t(V_t)
\end{align*}

for $0 \leq t \leq T$, ${\sf P}$-a.s. Because $F$, and therefore $Y_t$, is arbitrary, the second of these equations suggests setting

\[
\beta_t(f) = \pi_t(hf) - \pi_t(h)\pi_t(f)
\]

using which the first equation is manipulated to show

\begin{align*}
\alpha_t(Y_t) &= \pi_t({\cal A}Y_t) - \pi_t(h)\big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t) + \pi_t(V_t)\big) \\
&\quad + \pi_t(hV_t) - \pi_t(hV_t) + \pi_t(h)\pi_t(V_t) \\
&= \pi_t({\cal A}Y_t) - \pi_t(h)\big(\pi_t(hY_t) - \pi_t(h)\pi_t(Y_t)\big)
\end{align*}

which gives the following

\[
\alpha_t(f) = \pi_t({\cal A}f) - \beta_t(f)\pi_t(h)
\]

Substituting the expressions for $\alpha_t$ and $\beta_t$ into the ansatz (26),

\begin{align*}
\mathrm{d}\pi_t(f) &= \big(\pi_t({\cal A}f) - \beta_t(f)\pi_t(h)\big)\,\mathrm{d}t + \beta_t(f)\,\mathrm{d}Z_t \\
&= \pi_t({\cal A}f)\,\mathrm{d}t + \big(\pi_t(hf) - \pi_t(h)\pi_t(f)\big)\big(\mathrm{d}Z_t - \pi_t(h)\,\mathrm{d}t\big)
\end{align*}

This is the well-known SDE of the nonlinear filter.
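For a finite state space, the filter SDE above specializes to the Wonham filter, which is straightforward to integrate numerically. The following sketch simulates a two-state chain with a made-up generator $A$ and observation function $h$ (both illustrative, not from the paper), and propagates $\pi_t$ by an Euler step of the SDE with $f$ ranging over the indicator functions; the clipping and renormalization are numerical safeguards against Euler discretization error, not part of the exact equation.

```python
import numpy as np

rng = np.random.default_rng(2)

# two-state Markov chain: generator A (rows sum to zero), observation function h
A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
h = np.array([0.0, 1.0])
dt, n_steps = 1e-3, 5000

x = 0                        # hidden state
pi = np.array([0.5, 0.5])    # conditional distribution pi_t
for _ in range(n_steps):
    # crude jump simulation of the hidden chain, valid for small dt
    if rng.random() < -A[x, x] * dt:
        x = 1 - x
    # observation increment dZ = h(X_t) dt + dW
    dZ = h[x] * dt + np.sqrt(dt) * rng.standard_normal()
    # Euler step of d pi(f) = pi(A f) dt + (pi(h f) - pi(h) pi(f)) (dZ - pi(h) dt)
    pih = pi @ h
    pi = pi + (pi @ A) * dt + pi * (h - pih) * (dZ - pih * dt)
    pi = np.clip(pi, 0.0, None)
    pi = pi / pi.sum()       # renormalize (numerical safeguard)

assert abs(pi.sum() - 1.0) < 1e-12 and (pi >= 0).all()
```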

A.9 Mitter-Newton duality for the linear Gaussian model

Consider (24) with the linear control law (25). Then $\tilde{X}_t$ is a Gaussian random variable whose mean $\tilde{m}_t$ and variance $\tilde{\Sigma}_t$ evolve as follows:

\begin{align}
\frac{\mathrm{d}\tilde{m}_t}{\mathrm{d}t} &= A^{\top}\tilde{m}_t + \sigma u_t \tag{27a} \\
\frac{\mathrm{d}\tilde{\Sigma}_t}{\mathrm{d}t} &= (A^{\top} + \sigma K_t)\tilde{\Sigma}_t + \tilde{\Sigma}_t(A^{\top} + \sigma K_t)^{\top} + \sigma\sigma^{\top} \tag{27b}
\end{align}

Note that the two equations are entirely uncoupled: $u_t$ affects only the equation for $\tilde{m}_t$, and $K_t$ affects only the equation for $\tilde{\Sigma}_t$. We now turn to explicitly computing the running cost. For the linear Gaussian model,

\[
({\cal A}^u h)(x) = H^{\top}(A^{\top}x + \sigma u)
\]

and the running cost becomes

\[
l(x, u; z_t) = \tfrac{1}{2}|u|^2 + \tfrac{1}{2}|H^{\top}x|^2 + z_t H^{\top}(A^{\top}x + \sigma u)
\]

Because $\tilde{X}_t \sim N(\tilde{m}_t, \tilde{\Sigma}_t)$,

\begin{align*}
{\sf E}\big(l(\tilde{X}_t, u_t; z_t)\big) &= \tfrac{1}{2}|u_t|^2 + \tfrac{1}{2}\text{tr}(K_t^{\top}K_t\tilde{\Sigma}_t) + \tfrac{1}{2}|H^{\top}\tilde{m}_t|^2 \\
&\quad + \tfrac{1}{2}\text{tr}(HH^{\top}\tilde{\Sigma}_t) + z_t H^{\top}(A^{\top}\tilde{m}_t + \sigma u_t)
\end{align*}

and because $\tilde{\mu}$ and $\mu$ are both Gaussian, the divergence is

\begin{align*}
{\sf E}\Big(\log\frac{\mathrm{d}\tilde{\mu}}{\mathrm{d}\mu}(\tilde{X}_0)\Big) &= \tfrac{1}{2}(m_0 - \tilde{m}_0)^{\top}\Sigma_0^{-1}(m_0 - \tilde{m}_0) \\
&\quad + \tfrac{1}{2}\log\frac{\det(\tilde{\Sigma}_0)}{\det(\Sigma_0)} - \frac{d}{2} + \tfrac{1}{2}\text{tr}(\tilde{\Sigma}_0\Sigma_0^{-1})
\end{align*}

and because $h(\cdot)$ is linear, the terminal condition term is

\[
{\sf E}\big(z_T h(\tilde{X}_T)\big) = z_T H^{\top}\tilde{m}_T
\]

Combining all of the above, upon a formal integration by parts, ${\sf J}(\tilde{\mu}, U; z)$ is expressed as the sum of two uncoupled costs

\begin{align*}
{\sf J}_1(\tilde{m}_0, u; z) &= \tfrac{1}{2}(m_0 - \tilde{m}_0)^{\top}\Sigma_0^{-1}(m_0 - \tilde{m}_0) \\
&\quad + \int_0^T \tfrac{1}{2}|u_t|^2 + \tfrac{1}{2}|\dot{z}_t - H^{\top}\tilde{m}_t|^2\,\mathrm{d}t \\
{\sf J}_2(\tilde{\Sigma}_0, K; z) &= \tfrac{1}{2}\log\big(\det(\tilde{\Sigma}_0)\big) + \tfrac{1}{2}\text{tr}(\tilde{\Sigma}_0\Sigma_0^{-1}) \\
&\quad + \int_0^T \tfrac{1}{2}\text{tr}(K_t^{\top}K_t\tilde{\Sigma}_t) + \tfrac{1}{2}\text{tr}(HH^{\top}\tilde{\Sigma}_t)\,\mathrm{d}t
\end{align*}

plus a few constant terms that are not affected by the decision variables. The first of these costs, subject to the ODE constraint (27a) for the mean $\tilde{m}_t$, is the classical minimum energy duality.
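The uncoupled structure of (27a)-(27b) is easy to verify numerically. In the scalar sketch below (with made-up coefficients, purely for illustration), changing $K$ leaves the mean trajectory untouched and changing $u$ leaves the variance trajectory untouched.

```python
# scalar stand-ins for the model matrices in (27a)-(27b); values are made up
A, sigma, dt, n_steps = -0.5, 1.0, 1e-3, 2000

def propagate(u, K, m0=0.0, Sigma0=1.0):
    """Euler integration of (27a) for the mean and (27b) for the variance,
    with constant control u and constant gain K."""
    m, Sigma = m0, Sigma0
    for _ in range(n_steps):
        m += (A * m + sigma * u) * dt                             # (27a): uses u only
        Sigma += (2.0 * (A + sigma * K) * Sigma + sigma**2) * dt  # (27b): uses K only
    return m, Sigma

m1, S1 = propagate(u=0.3, K=-1.0)
m2, S2 = propagate(u=0.3, K=-4.0)   # change K: variance changes, mean does not
m3, S3 = propagate(u=-0.8, K=-1.0)  # change u: mean changes, variance does not

assert m1 == m2 and S1 == S3
assert S1 != S2 and m1 != m3
```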

References

  • [1] R. E. Kalman, “On the general theory of control systems,” in Proceedings First International Conference on Automatic Control, Moscow, USSR, 1960, pp. 481–492.
  • [2] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” Journal of Basic Engineering, vol. 83, no. 1, pp. 95–108, 1961.
  • [3] A. Bensoussan, Estimation and Control of Dynamical Systems.   Springer, 2018, vol. 48.
  • [4] E. Todorov, “General duality between optimal control and estimation,” in 2008 IEEE 47th Conference on Decision and Control (CDC), 12 2008, pp. 4286–4292.
  • [5] K. J. Åström, Introduction to Stochastic Control Theory.   Academic Press, 1970.
  • [6] A. E. Bryson and Y.-C. Ho, Applied optimal control: optimization, estimation, and control.   Routledge, 2018.
  • [7] D. Fraser and J. Potter, “The optimum linear smoother as a combination of two optimum linear filters,” IEEE Transactions on Automatic Control, vol. 14, no. 4, pp. 387–390, 1969.
  • [8] R. E. Mortensen, “Maximum-likelihood recursive nonlinear filtering,” Journal of Optimization Theory and Applications, vol. 2, no. 6, pp. 386–394, 1968.
  • [9] J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model predictive control: theory, computation, and design.   Nob Hill Publishing Madison, WI, 2017, vol. 2.
  • [10] W. H. Fleming and S. K. Mitter, “Optimal control and nonlinear filtering for nondegenerate diffusion processes,” Stochastics: An International Journal of Probability and Stochastic Processes, vol. 8, no. 1, pp. 63–77, 1982.
  • [11] S. K. Mitter and N. J. Newton, “A variational approach to nonlinear estimation,” SIAM Journal on Control and Optimization, vol. 42, no. 5, pp. 1813–1833, 2003.
  • [12] H. Michalska and D. Q. Mayne, “Moving horizon observers and observer-based control,” IEEE Transactions on Automatic Control, vol. 40, no. 6, pp. 995–1006, 1995.
  • [13] C. V. Rao, J. B. Rawlings, and J. H. Lee, “Constrained linear state estimation—a moving horizon approach,” Automatica, vol. 37, no. 10, pp. 1619–1628, 2001.
  • [14] A. J. Krener, “The convergence of the minimum energy estimator,” in New Trends in Nonlinear Dynamics and Control and their Applications.   Springer, 2003, pp. 187–208.
  • [15] D. A. Copp and J. P. Hespanha, “Simultaneous nonlinear model predictive control and state estimation,” Automatica, vol. 77, pp. 143–154, 2017.
  • [16] M. Farina, G. Ferrari-Trecate, and R. Scattolini, “Distributed moving horizon estimation for linear constrained systems,” IEEE Trans. on Auto. Control, vol. 55, no. 11, pp. 2462–2475, 2010.
  • [17] R. Schneider, R. Hannemann-Tamás, and W. Marquardt, “An iterative partition-based moving horizon estimator with coupled inequality constraints,” Automatica, vol. 61, pp. 302–307, 2015.
  • [18] A. Alessandri, M. Baglietto, and G. Battistelli, “A maximum-likelihood Kalman filter for switching discrete-time linear systems,” Automatica, vol. 46, no. 11, pp. 1870–1876, 2010.
  • [19] J. W. Kim and P. G. Mehta, “Duality for nonlinear filtering I: Observability,” unpublished.
  • [20] W. H. Fleming, “Exit probabilities and optimal stochastic control,” Applied Mathematics and Optimization, vol. 4, no. 1, pp. 329–346, 1978.
  • [21] A. Bensoussan, Stochastic control of partially observable systems.   Cambridge University Press, 1992.
  • [22] W. H. Fleming and E. De Giorgi, “Deterministic nonlinear filtering,” Annali della Scuola Normale Superiore di Pisa-Classe di Scienze-Serie IV, vol. 25, no. 3, pp. 435–454, 1997.
  • [23] Y. Chen, T. T. Georgiou, and M. Pavon, “On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint,” Journal of Optimization Theory and Applications, vol. 169, no. 2, pp. 671–691, 2016.
  • [24] H. J. Kappen and H. C. Ruiz, “Adaptive importance sampling for control and inference,” Journal of Statistical Physics, vol. 162, no. 5, pp. 1244–1266, 2016.
  • [25] S. Reich, “Data assimilation: the Schrödinger perspective,” Acta Numerica, vol. 28, pp. 635–711, 2019.
  • [26] H. Ruiz and H. J. Kappen, “Particle smoothing for hidden diffusion processes: Adaptive path integral smoother,” IEEE Transactions on Signal Processing, vol. 65, no. 12, pp. 3191–3203, 2017.
  • [27] T. Sutter, A. Ganguly, and H. Koeppl, “A variational approach to path estimation and parameter inference of hidden diffusion processes,” Journal of Machine Learning Research, vol. 17, pp. 6544–80, 2016.
  • [28] R. van Handel, “Filtering, stability, and robustness,” Ph.D. dissertation, California Institute of Technology, Pasadena, 12 2006.
  • [29] K. W. Simon and A. R. Stubberud, “Duality of linear estimation and control,” Journal of Optimization Theory and Applications, vol. 6, no. 1, pp. 55–67, 1970.
  • [30] G. C. Goodwin, J. A. de Doná, M. M. Seron, and X. W. Zhuo, “Lagrangian duality between constrained estimation and control,” Automatica, vol. 41, no. 6, pp. 935–944, 2005.
  • [31] P. K. Mishra, G. Chowdhary, and P. G. Mehta, “Minimum variance constrained estimator,” Automatica, vol. 137, p. 110106, 2022.
  • [32] B. K. Kwon, S. Han, O. K. Kwon, and W. H. Kwon, “Minimum variance FIR smoothers for discrete-time state space models,” IEEE Signal Processing Letters, vol. 14, no. 8, pp. 557–560, 2007.
  • [33] S. Zhao, Y. S. Shmaliy, B. Huang, and F. Liu, “Minimum variance unbiased FIR filter for discrete time-variant systems,” Automatica, vol. 53, pp. 355–361, 2015.
  • [34] M. Darouach, M. Zasadzinski, and M. Boutayeb, “Extension of minimum variance estimation for systems with unknown inputs,” Automatica, vol. 39, no. 5, pp. 867–876, 2003.
  • [35] J. W. Kim, “Duality for nonlinear filtering,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, Urbana, 06 2022.
  • [36] J. W. Kim, P. G. Mehta, and S. Meyn, “What is the Lagrangian for nonlinear filtering?” in 2019 IEEE 58th Conference on Decision and Control (CDC).   Nice, France: IEEE, 12 2019, pp. 1607–1614.
  • [37] D. Bakry, I. Gentil, and M. Ledoux, Analysis and geometry of Markov diffusion operators.   Springer Science & Business Media, 2013, vol. 348.
  • [38] B. Øksendal, Stochastic differential equations: an introduction with applications.   Springer Science & Business Media, 2013.
  • [39] J. Xiong, An Introduction to Stochastic Filtering Theory.   Oxford University Press on Demand, 2008, vol. 18.
  • [40] J. Yong and X. Y. Zhou, Stochastic controls: Hamiltonian systems and HJB equations.   Springer Science & Business Media, 1999, vol. 43.
  • [41] J. Ma and J. Yong, “On linear, degenerate backward stochastic partial differential equations,” Probability Theory and Related Fields, vol. 113, no. 2, pp. 135–170, 1999.
  • [42] J. F. Le Gall, Brownian Motion, Martingales, and Stochastic Calculus.   Springer, 2016, vol. 274.
  • [43] E. Pardoux and A. Răşcanu, Stochastic Differential Equations, Backward SDEs, Partial Differential Equations.   Springer, 2014.
  • [44] S. Peng, “Backward stochastic differential equations and applications to optimal control,” Applied Mathematics and Optimization, vol. 27, no. 2, pp. 125–144, 1993.
  • [45] J. W. Kim and P. G. Mehta, “An optimal control derivation of nonlinear smoothing equations,” in Proceedings of the Workshop on Dynamics, Optimization and Computation held in honor of the 60th birthday of Michael Dellnitz.   Springer, 2020, pp. 295–311.
  • [46] E. Pardoux, “Backward and forward stochastic partial differential equations associated with a non linear filtering problem,” in 1979 18th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, vol. 2.   IEEE, 1979, pp. 166–171.
  • [47] ——, “Non-linear filtering, prediction and smoothing,” in Stochastic systems: the mathematics of filtering and identification and applications.   Springer, 1981, pp. 529–557.
  • [48] J. W. Kim, P. G. Mehta, and S. Meyn, “The conditional Poincaré inequality for filter stability,” in 2021 IEEE 60th Conference on Decision and Control (CDC), 12 2021, pp. 1629–1636.
  • [49] J. W. Kim and P. G. Mehta, “A dual characterization of the stability of the Wonham filter,” in 2021 IEEE 60th Conference on Decision and Control (CDC), 12 2021, pp. 1621–1628.
  • [50] J. Szalankiewicz, “Duality in nonlinear filtering,” Master’s thesis, Technische Universität Berlin, Institut für Mathematik, Berlin, 2021.
  • [51] N. V. Krylov, R. S. Lipster, and A. A. Novikov, “Kalman filter for Markov processes,” in Statistics and Control of Stochastic Processes.   New York: Optimization Software, inc., 1984, pp. 197–213.
  • [52] T. Yang, P. G. Mehta, and S. Meyn, “Feedback particle filter,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2465–2480, 10 2013.
  • [53] V. E. Beneš, “Exact finite-dimensional filters for certain diffusions with nonlinear drift,” Stochastics, vol. 5, no. 1-2, pp. 65–92, 1981.
  • [54] B. L. Rozovsky and S. V. Lototsky, Stochastic Evolution Systems: Linear Theory and Applications to Non-Linear Filtering.   Springer, 2018, vol. 89.
  • [55] N. V. Krylov, “On the Itô–Wentzell formula for distribution-valued processes and related topics,” Probability Theory and Related Fields, vol. 150, no. 1-2, pp. 295–319, 2011.
{IEEEbiography}

Jin Won Kim received the Ph.D. degree in Mechanical Engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, in 2022. He is now a postdoctoral research scientist in the Institute of Mathematics at the University of Potsdam. His current research interests are in nonlinear filtering and stochastic optimal control. He received the Best Student Paper Award at the IEEE Conference on Decision and Control in 2019.

{IEEEbiography}

Prashant G. Mehta received the Ph.D. degree in Applied Mathematics from Cornell University, Ithaca, NY, in 2004. He is a Professor of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign. Prior to joining Illinois, he was a Research Engineer at the United Technologies Research Center (UTRC). His current research interests are in nonlinear filtering. He received the Outstanding Achievement Award at UTRC for his contributions to the modeling and control of combustion instabilities in jet engines. His students received the Best Student Paper Awards at the IEEE Conference on Decision and Control in 2007, 2009, and 2019, and were finalists for these awards in 2010 and 2012. In the past, he has served on the editorial boards of the ASME Journal of Dynamic Systems, Measurement, and Control and Systems & Control Letters. He currently serves on the editorial board of the IEEE Transactions on Automatic Control.