1. Introduction
In this work we study certain large deviation asymptotics for nonlinear filtering problems with small signal and observation noise. As the noise in the signal and observation processes vanishes, the filtering problem can formally be replaced by a variational problem, and one may approximate the filtering estimates (namely suitable conditional probabilities or expectations) by solutions of certain deterministic optimization problems. However, due to randomness there will be occasional large deviations of the true nonlinear filter estimates from the variational problem solutions. The main goal of this work is to investigate the probabilities of such deviations by establishing a suitable large deviation principle. Large deviations and related asymptotic problems in the context of small noise nonlinear filtering have been investigated, under different settings, in many works [15, 13, 2, 16, 21, 3, 24, 18, 19, 11, 22, 1]. We summarize the main results of these works and their relation to the asymptotic questions considered in the current work at the end of this section.
In order to describe our results precisely, we begin by introducing the filtering model that we study.
We consider a signal process given as the solution of the -dimensional stochastic differential equation (SDE)
(1.1)
and an -dimensional observation process governed by the equation
(1.2)
on some probability space .
Here is a small parameter, is some given finite time horizon, and are mutually independent standard Brownian motions in
and respectively, is a known deterministic initial condition of the signal, and the functions and are required to satisfy the following condition.
Assumption 1.
The following hold.
(a) The functions from , , are Lipschitz: For some
(b) The function is bounded: For some
(c) The function is twice continuously differentiable with bounded first and second derivatives.
Note that under Assumption 1 there is a unique pathwise solution of (1.1) and the solution is a stochastic process with sample paths in
(the space of continuous functions from to equipped with the uniform metric).
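For orientation, a standard small-noise signal-observation model of the type described above, written here in assumed notation (signal $X^\epsilon$, observation $Y^\epsilon$, drift $b$, diffusion coefficient $\sigma$, sensor function $h$; none of these symbols are fixed by the text), is the following sketch of the form that (1.1)-(1.2) typically take:
\[
dX^\epsilon_t = b(X^\epsilon_t)\,dt + \sqrt{\epsilon}\,\sigma(X^\epsilon_t)\,dW_t, \qquad X^\epsilon_0 = x_0,
\]
\[
dY^\epsilon_t = h(X^\epsilon_t)\,dt + \sqrt{\epsilon}\,dB_t, \qquad Y^\epsilon_0 = 0, \qquad t \in [0,T].
\]
With this scaling both the signal noise and the observation noise vanish at the same rate as $\epsilon \to 0$, which is the small noise regime discussed in this section.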
The filtering problem is concerned with the computation of the conditional expectations of the form
(1.3)
where and
is a suitable map. The stochastic process with values in the space of probability measures on , given by
is usually referred to as the nonlinear filter.
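A minimal sketch of this object, in the assumed notation of the model sketched above and with the classical convention of conditioning on the observation up to the current time, is
\[
\pi^\epsilon_t(f) = E\big[f(X^\epsilon_t)\,\big|\,\mathcal{F}^{Y^\epsilon}_t\big], \qquad \mathcal{F}^{Y^\epsilon}_t = \sigma\{Y^\epsilon_s : 0 \le s \le t\}, \quad f \in C_b(\mathbb{R}^d),
\]
so that for each $t$ the filter is a random probability measure determined by the observation path.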
In this work we study the asymptotic behavior of the nonlinear filter as .
Denote by the unique solution of
(1.4)
It can be shown, under additional conditions (see the discussion in Section 2), that, as ,
(1.5)
weakly.
In particular, for Borel subsets of whose closure does not contain , one will have in probability as .
It is of interest to study the rate of decay of conditional probabilities of such non-typical state trajectories.
As a special case of the results of the current paper (see Corollary 4.2) it will follow that
for every real continuous and bounded function on , denoting
(1.6)
(1.7)
where denotes convergence in probability under ,
and is the rate function on associated with the large deviation principle for (see Section 2).
From this convergence it follows, using standard arguments (see e.g. [6, Theorem 1.8]), that, for all Borel subsets of ,
(1.8)
where for real random variables and a constant we say [resp. ] if [resp. ] converges to in -probability, and for a set , and denote its interior and closure, respectively.
Thus the convergence in (1.7) gives information on rates of decay of conditional probabilities of non-typical state trajectories. Formally, denoting the infimum in the above display as , we can write approximations for conditional probabilities:
(1.9)
However, due to stochastic fluctuations, one may find that for some ‘rogue’ observation trajectories the conditional probabilities on the left side of (1.9) are quite different from the deterministic approximation on the right side of (1.9).
In order to quantify the probabilities of observing such rogue observation trajectories that cause deviations from the bounds in (1.8), a natural approach is to study a large deviation principle for valued random variables whose typical (law of large numbers) behavior is described by the right side of (1.7).
Establishing such a large deviation principle is the goal of this work.
Such a result gives information on decay rates of probabilities of the form
for suitable sets and .
Our main result is Theorem 2.1, which gives a large deviation principle for , for every continuous and bounded function on , with a rate function defined by the variational formula in (2.16)-(2.17).
Notation. The following notation and definitions will be used. For the Euclidean norm in will be denoted as and the corresponding inner product will be written as
. The space of finite positive measures (resp. probability measures) on a Polish space will be denoted by
(resp. ). The space of bounded measurable (resp. continuous and bounded) functions from will be denoted by
and respectively. For , .
For and , . The Borel -field on a Polish space will be denoted as .
For and , will denote the space of continuous functions from to which is equipped with the supremum norm, defined as , . Since will be fixed in most of this work, frequently the subscript in
and will be dropped. We denote by the Hilbert space of square-integrable functions from to .
By convention, the infimum over an empty set will be taken to be . For random variables , with values in some Polish space , convergence in distribution of to will be denoted as .
A function from a Polish space to is called a rate function if it has compact sub-level sets, namely the set is a compact set of for every .
Given a function such that as , and a rate function on a Polish space , a collection of valued random variables is said to satisfy a large deviation principle (LDP) with rate function and speed if for every
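In commonly used notation (a sketch, with $\{Z^\epsilon\}$ the $\mathcal{S}$-valued random variables, $I$ the rate function, and $a(\epsilon) \to 0$ the speed), the two bounds defining the LDP read
\[
\limsup_{\epsilon \to 0} a(\epsilon)\log P\big(Z^\epsilon \in F\big) \le -\inf_{z \in F} I(z) \ \text{ for all closed } F \subset \mathcal{S},
\qquad
\liminf_{\epsilon \to 0} a(\epsilon)\log P\big(Z^\epsilon \in G\big) \ge -\inf_{z \in G} I(z) \ \text{ for all open } G \subset \mathcal{S}.
\]
By [6, Theorems 1.5 and 1.8] this is equivalent to the Laplace principle: $\lim_{\epsilon \to 0} a(\epsilon)\log E\big[\exp\big(-F(Z^\epsilon)/a(\epsilon)\big)\big] = -\inf_{z \in \mathcal{S}}\{F(z) + I(z)\}$ for every $F \in C_b(\mathcal{S})$.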
Relation with the existing body of work.
Denote by the collection of absolutely continuous functions that satisfy .
For define as
(1.10)
where is the rate function of defined in (2.9). The functional was introduced by Mortensen [20] as the objective function in an optimization problem whose minima describe, in an appropriate asymptotic sense, the most probable trajectory given the data in a nonlinear filtering problem. This functional is also used in implementing the popular DVAR data assimilation algorithm (cf. [7, Section 3.2], [12, Chapter 16]). The connection of the optimization problem associated with the objective function in (1.10) with the asymptotics of the classical small noise filtering problem has been studied by several authors [15, 14, 16].
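For concreteness, a sketch of the typical form of Mortensen's energy functional, in the assumed notation of the model sketched above and for an absolutely continuous observation path $y$, is
\[
J_y(\phi) = I_{x_0}(\phi) + \frac{1}{2}\int_0^T \big|\dot y_t - h(\phi_t)\big|^2\,dt,
\]
where $I_{x_0}$ denotes the large deviation rate function of the signal (cf. (2.9)); minimizing $J_y$ over paths $\phi$ with $\phi_0 = x_0$ produces the most probable (minimum energy) trajectory consistent with the data $y$.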
We now describe this connection.
In Section 2 we will introduce a continuous map such that a.s.
In [15], Hijab established, under conditions that include boundedness and smoothness of various coefficient functions, a large deviation principle for the collection of probability measures (on ) (with speed ), for a fixed in , with rate function given by
(1.11)
In a related direction, Hijab’s Ph.D. dissertation [14] studied asymptotics of the unnormalized conditional density and established, under conditions, an asymptotic formula of the form
where denotes the solution of the Zakai equation associated with the nonlinear filter (cf. [17]). The deterministic function coincides with Mortensen’s (deterministic) minimum energy estimate [20], which is given as the solution of a certain minimization problem related to the objective function .
Results of Hijab were extended to random initial conditions in [16], once again assuming boundedness and smoothness of coefficients. In related work, the problem of constructing observers for dynamical systems as limits of stochastic nonlinear filters is studied in [2]. Heunis [13] studies a somewhat different asymptotic problem for small noise nonlinear filters. Specifically, it is shown in [13] that for every , , and for any for which the map defined in (2.13) has a unique minimizer (at, say, ),
This result and its connection to our work are further discussed in Section 2. In particular, the statement in (1.5) follows readily using ideas similar to those in [13].
The work of Pardoux and Zeitouni [21] considers a one-dimensional nonlinear filtering problem where the observation noise is small while the signal noise is (specifically, the term in (1.1) is replaced by ). In this case the conditional distribution of given converges a.s. to a Dirac measure as . The paper [21] proves a quenched LDP for this conditional distribution (regarded as a collection of probability measures on parametrized by ) in . In a somewhat different direction, in a sequence of papers [24, 19, 18], the authors have studied asymptotics of the filtering problem under a small signal-to-noise ratio limit, under various types of model settings. In this case the nonlinear filter converges to the unconditional law of the signal, and the authors establish large deviation principles characterizing probabilities of deviation of the filter from this deterministic law. An analogous result in a correlated signal-observation noise case was studied in [3]. Finally, yet another type of large deviation problem in the context of nonlinear filtering (with correlated signal-observation noise), when the observation noise is and the signal noise and drift are suitably small, has been considered in a series of papers [11, 22, 1].
The closest connections of the current work are with [15] and [13]. Specifically, the asymptotic statements in (1.7) and (1.8), which follow as a special case of our results (see Corollary 4.2), are analogous to results in [15], except that instead of a fixed observation path we consider the actual observation process (also, we make substantially weaker assumptions on the coefficients than [15]). However, our main interest is in a large deviation principle for the convergence to the deterministic limit in (1.7); thus, roughly speaking, we are interested in quantifying the probability of deviations from the convergence statement in [15] (when a fixed observation path is replaced with the observation process ). This large deviation result, given in Theorem 2.1,
is the main contribution of our work.
Proof idea.
The proof of Theorem 2.1 is based on a variational representation for functionals of Brownian motions obtained in [4] (see also [5]), using which the proof of the large deviation principle reduces to proving a key weak convergence result given in Lemma 4.1. The proof of Lemma 4.1 is the technical heart of this work. Important use is made of some key estimates obtained in [13] (see in particular Proposition 5.3). One of the key steps is to argue that terms of order can be ignored in the exponent when studying Laplace asymptotics for the quantity on the left side of (3.6). This relies on several careful large deviation exponential estimates which are developed in Section 5. Once Lemma 4.1 is available, the proof of the large deviation principle in Theorem 2.1 follows readily using the now well-developed weak convergence approach for the study of large deviation problems (cf. [6]).
Organization. It will be convenient to formulate the filtering problem on canonical path spaces and also to represent the nonlinear filter through a map given on the path space of the observation process. This formulation and our main result (Theorem 2.1) are given in Section 2. The key idea in the proof of the LDP is a variational representation from [4]. A somewhat simplified version of this representation (cf. [6]) that is used in this work is presented in Section 3. Section 4 presents a key lemma (Lemma 4.1) that is needed for implementing the weak convergence method for proving the large deviation result in Theorem 2.1. Section 5 is devoted to the proof of Lemma 4.1. Using this lemma, the proof of Theorem 2.1 is completed in Section 6.
2. Setting and Main Result
Recall that has sample paths in . Similarly, the processes have sample paths in respectively.
It will be convenient to formulate the filtering problem on suitable path spaces.
Denote, for , the standard Wiener measure on as and the Wiener measure with variance parameter as
.
Denote the canonical coordinate process on as
and consider the SDE on the probability space ,
From Assumption 1, the above SDE has a unique strong solution with sample paths in .
Consider the map defined as
and let
Next, let and consider the probability space
Abusing notation, denote the coordinate maps on the above probability space as , namely
We will frequently write as for . Similar notational shorthand will be followed for other coordinate maps.
Note that, under , , and are independent standard Brownian motions in and respectively and
(2.1)
Define, for a.e. , for ,
Note that, since under , is a standard Brownian martingale with respect to the filtration
, the first integral in the exponent is well-defined as an Itô integral.
From the independence of and under and Assumption 1 it follows that
is a -martingale under . Define a probability measure on as
Note that, by Girsanov’s theorem, under
(2.2)
is a standard -dimensional Brownian motion which is independent of .
Rewriting the above equation as
we see that
(2.3)
Next, for , define as
(2.4)
The maps are well defined -a.s. and using results of [8, 9, 10], one can obtain versions of these maps (denoted as ) which are continuous on .
Also, define as
(2.5)
Once again, for each , this map is well defined -a.s., and a continuous version of the map (which we denote as ) exists by [8, 9, 10].
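For context, maps of this type are versions of the Kallianpur-Striebel (Bayes) formula. A sketch in assumed notation, with $\xi$ the signal path, $y$ the observation path, and $\bar E$ expectation under the reference measure under which the observation is a Brownian motion independent of the signal (the symbols $\Sigma^\epsilon$ and $\Pi^\epsilon$ below are only illustrative), is
\[
\Sigma^\epsilon f(y) = \bar E\Big[f(\xi)\exp\Big(\tfrac{1}{\epsilon}\int_0^T h(\xi_s)\cdot dy_s - \tfrac{1}{2\epsilon}\int_0^T |h(\xi_s)|^2\,ds\Big)\Big], \qquad
\Pi^\epsilon f(y) = \frac{\Sigma^\epsilon f(y)}{\Sigma^\epsilon 1(y)},
\]
giving the unnormalized and normalized conditional expectations, respectively.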
Write, for
Then with as in (1.1)-(1.2), for
(2.6)
Also,
(2.7)
where denotes the expectation under the probability measure , and
(2.8)
Let for ,
(2.9)
where
is the collection of all in such that
(2.10)
Note that, by Assumption 1, for every there is a unique solution of (2.10).
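A standard control formulation of such a rate function (a sketch in the assumed notation of the model sketched in the Introduction, with the convention that the infimum over an empty set is $+\infty$) is
\[
I_{x_0}(\phi) = \inf\Big\{\tfrac{1}{2}\int_0^T |u_s|^2\,ds \;:\; u \in L^2([0,T];\mathbb{R}^d),\ \dot\phi_t = b(\phi_t) + \sigma(\phi_t)u_t \ \text{a.e.},\ \phi_0 = x_0\Big\},
\]
which is the shape that the pair (2.9)-(2.10) typically takes: (2.10) is the controlled ODE constraint and (2.9) minimizes the quadratic cost of the control.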
By classical results of Freidlin and Wentzell (see e.g. [6, Theorem 10.6])
the collection of valued random variables satisfies a LDP with rate function and speed , namely, for all
(2.11)
where we denote the first coordinate process on by , i.e. for .
In [13] it is shown that for every , and a given , the probability measure
(2.12)
weakly, if the map
(2.13)
attains its infimum over uniquely at ,
where recall that is the continuous version of .
We remark that [13] assumes in addition to (1) that and are bounded, but an examination of the proof shows (see calculations in Section 5) that these conditions can be replaced by linear growth conditions that are implied by Assumption 1.
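A typical form of such a functional in the small noise filtering literature (a sketch in assumed notation, with $y$ the fixed observation path and $I_{x_0}$ the rate function from (2.9)) is
\[
\phi \longmapsto I_{x_0}(\phi) + \frac{1}{2}\int_0^T |h(\phi_t)|^2\,dt - \int_0^T h(\phi_t)\cdot dy_t,
\]
which, when $y$ is absolutely continuous, differs from Mortensen's functional sketched in the Introduction only by a term depending on $y$ alone and therefore has the same minimizers.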
Recall the function from (1.4).
Then, using ideas similar to those in [13], under Assumption 1, and assuming in addition that either is positive definite or is a one-to-one function, it follows that
(2.14)
weakly, as .
This is a consequence of the fact that when the map in (2.13) achieves its minimum (which is ) uniquely at .
As a consequence of the results of the current paper (see Corollary 4.2) one can show the Laplace asymptotic formula in (1.7). Recall from the discussion in the Introduction that the convergence in (1.7) gives information on asymptotics of conditional probabilities of non-typical state trajectories.
In order to quantify the decay rate of probabilities of observing rare observation trajectories that cause deviations from the deterministic variational quantity in (1.7), we will establish a large deviation principle for defined in (1.6).
We now present the rate function associated with this LDP.
Define the map as
(2.15)
Also, for , let be given as the unique solution of (2.10).
We now introduce the rate function that will govern the large deviation asymptotics of .
Fix and define as
(2.16)
where is the collection of all in
such that
(2.17)
The following is the main result of the work.
Theorem 2.1.
Suppose that Assumption 1 is satisfied. Then for every , the collection
satisfies a large deviation principle on with rate function and speed .
3. A Variational Representation
Fix .
Recall the functional from (1.6).
From (2.6), note that one can write as
whose distribution under is the same as the distribution of under . Let
Using this equality of laws and the equivalence between large deviation principles and Laplace principles (see e.g. [6, Theorems 1.5 and 1.8]), in order to prove Theorem 2.1 it suffices to show that has compact sub-level sets, i.e.,
(3.1)
and for every
(3.2)
The proof of the identity in (3.2) will use a variational representation for nonnegative functionals of Brownian motions given by Boué and Dupuis [4].
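In its basic form (a sketch in commonly used notation, for a bounded Borel measurable functional $G$ of an $m$-dimensional standard Brownian motion $W$ on $[0,T]$), the representation of [4] reads
\[
-\log E\big[e^{-G(W)}\big] = \inf_{u} E\Big[\frac{1}{2}\int_0^T |u_s|^2\,ds + G\Big(W + \int_0^{\cdot} u_s\,ds\Big)\Big],
\]
the infimum being over progressively measurable processes $u$ with $E\int_0^T |u_s|^2\,ds < \infty$. Applying this with $G/\epsilon$ in place of $G$ and rescaling the control gives
\[
-\epsilon\log E\big[e^{-G(W)/\epsilon}\big] = \inf_{u} E\Big[\frac{1}{2}\int_0^T |u_s|^2\,ds + G\Big(W + \tfrac{1}{\sqrt{\epsilon}}\int_0^{\cdot} u_s\,ds\Big)\Big],
\]
which is the type of scaled variational formula that appears in (3.6).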
We now use this representation to give a variational formula for the left side of the above equation.
Let denote the -completion of and denote by
[resp. ] the collection of all -progressively measurable [resp. ] valued processes such that for some
For , let be given as the unique solution of the SDE on :
(3.3)
Also define
(3.4)
Occasionally, to emphasize the dependence of the above processes on we will write as .
Now let
(3.5)
When clear from context we will drop from the notation in and simply write .
Then it follows from [4] (cf. [6, Theorem 3.17]) that
(3.6)
5. Proof of Lemma 4.1.
Let . Define canonical coordinate processes on as and , .
Note that
and for , a.s.,
Suppressing in notation, we have
Thus, letting
(5.1)
we can write
(5.2)
Let now be as in the statement of Lemma 4.1. Using Assumption 1 it is immediate that
(5.3)
in , where
By appealing to the Skorohod representation theorem we can obtain, on some probability space , random variables with the same law as the random vector on the left side of (5.3), and with the same law as the vector on the right side of (5.3), such that
(5.4)
Henceforth, to simplify notation we will drop the from the notation in the above vectors and denote the corresponding process
as .
Then, from (5.2), and the distributional equality noted above, it follows that
(5.5)
In order to prove Lemma 4.1 it now suffices to show that, for all , as ,
(5.6)
Define as
(5.7)
Then from the continuity of and the a.s. convergence in (5.4), we see that for every
(5.8)
Furthermore, with ,
(5.9)
In order to prove (5.6) we will show
(5.10)
and
(5.11)
The fact that can be neglected in the asymptotic formula follows along the lines of [13]; however, since, unlike [13], we do not assume that is bounded and our functional of interest is somewhat different from the one considered there, we provide the details.
5.1. Proof of (5.11)
We begin with the following lemmas.
Lemma 5.1.
For any ,
Proof.
Note that for
Let . Then by an application of Gronwall’s lemma, it suffices to show that
where is the expectation under the probability measure .
Since is bounded and under , is a Brownian motion, there is such that
for every .
The result follows.
∎
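The exponential moment bound invoked at the end of the proof is the standard Gaussian estimate. A sketch, for a standard real Brownian motion $B$ on $[0,T]$ (assumed setting), is
\[
P\Big(\sup_{0\le t\le T}|B_t| \ge a\Big) \le 4\,e^{-a^2/(2T)}, \qquad a > 0,
\]
by the reflection principle and the Gaussian tail bound, and consequently $E\big[\exp\big(\delta \sup_{0\le t\le T}|B_t|^2\big)\big] < \infty$ for every $\delta < 1/(2T)$.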
Lemma 5.2.
Let for , and be measurable maps from to such that
(5.12)
Then
(5.13)
and for every
(5.14)
Proof.
For , let . Then
Thus
Since
in order to prove the lemma it suffices to show (5.14) for every .
Fix . Using the fact that, on the set ,
and the bound in (5.12),
we see that
The inequality in (5.14) now follows on applying Lemma 5.1. ∎
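A standard route from exponential moment bounds of the type in Lemma 5.1 to estimates such as (5.13)-(5.14) is the exponential Markov (Chebyshev) inequality; a generic sketch, for a nonnegative random variable $\Theta$ and $a > 0$, is
\[
P(\Theta \ge a) \le e^{-a/\epsilon}\,E\big[e^{\Theta/\epsilon}\big], \qquad \text{so that} \qquad \epsilon\log P(\Theta \ge a) \le -a + \epsilon\log E\big[e^{\Theta/\epsilon}\big].
\]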
Note that, by Itô’s formula,
where, a.s.,
and .
The following result is taken from Heunis [13] (cf. page 940 therein).
Proposition 5.3 (Heunis [13]).
The maps and are measurable and continuous, respectively, from to , and
there are such that
for all , ,
and
(5.15)
Define
(5.16)
Proposition 5.4.
For any , and a.e.
(5.17)
Proof.
Note that on the set
Also note that, using the linear growth of , one can find a measurable map such that
(5.18)
Using these observations, we have
(5.19)
Next, for every
(5.20)
We now consider the two terms in the above display separately.
For the first term, from the Cauchy-Schwarz inequality,
and therefore
where in the next to last line we have used Proposition 5.3 and in the last line we have appealed to Lemma 5.1 and the fact that a.s.
For the second term on the right side in (5.20), we have from Lemma 5.2 (see (5.14))
and (5.15)
that
Using the last two displays in (5.20) and combining with (5.19) we have (5.17) and
Next, from [13, Proposition ], it follows that
Now using the Cauchy-Schwarz inequality and arguing as before, we see that
We omit the details.
The following proposition shows that the term involving in the definition of can be ignored in proving the bound in (5.11).
Proposition 5.5.
For a.e. ,
Proof.
Fix and write
From Proposition 5.4,
(5.21)
Next note that
Now recalling (5.15) and (5.18) and applying the first inequality in
Lemma 5.2 (i.e. (5.13)), we get
Since is arbitrary, the result follows on combining the above with (5.21).
∎
The proof of the following lemma follows along the lines of Varadhan’s lemma (cf. [23, Theorem 2.6], [6, Theorem 1.18]). We provide details for the reader’s convenience.
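For comparison, the classical statement (in commonly used notation, for $\mathcal{S}$-valued random variables $\{Z^\epsilon\}$ satisfying an LDP with rate function $I$ and speed $\epsilon$, and a continuous $F$ bounded from above) reads
\[
\lim_{\epsilon \to 0} \epsilon \log E\big[\exp\big(F(Z^\epsilon)/\epsilon\big)\big] = \sup_{z \in \mathcal{S}}\{F(z) - I(z)\}.
\]
Lemma 5.6 allows the fixed function to be replaced by an $\epsilon$-dependent family converging to it in the local uniform sense made precise in its hypotheses.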
Lemma 5.6.
Let be a collection of random variables with values in a Polish space that satisfies a LDP with rate function and speed . Let be a continuous function bounded from above, namely , and let be a collection of real measurable maps on such that . Further suppose that
Then
Proof.
Define ,
and .
Since is a rate function, is a compact subset of .
Fix . From the hypothesis of the lemma, for each , there exist such that for every and ,
where is an open ball of radius in .
Also, from the continuity of , for every there exists such that
Next, from the lower semi-continuity of , for every , there exists such that
Let .
Now define an open cover of using the following open sets:
Note that for any, , and , we have
(5.22)
Since is compact, there exists and such that cover . For , we can find such that with
, for every ,
(5.23)
where, and .
Next note that
(5.24)
Defining , we have
, for .
Thus, using (5.22) and (5.23)
(5.25)
Also
(5.26)
where the second inequality is a consequence of the observation that .
Since is arbitrary, using (5.25) and (5.26) in (5.24) we now see that
(5.27)
For the lower bound, choose such that .
Let be such that for all
, and , for . Then
Sending we obtain the lower bound, and combining it with the upper bound in (5.27), we have the result.
∎
Recall the definition of from (5.7).
The following lemma will allow us to apply Lemma 5.6.
Lemma 5.7.
For a.e. and every and there exist and such that
Proof.
Consider in the set of full measure on which the convergence in (5.4) (and thus in (5.8)) holds.
From (5.8), for any fixed and , we can find such that for all
(5.28)
Also, from continuity of , we can find a such that for all with
and
Thus for all and with
We now complete the proof of (5.11).
Completing the proof of (5.11).
Note that, from Proposition 5.5, a.s.,
(5.29)
For , let . Then
(5.30)
Note that is a continuous and bounded map on , is a
continuous, nonnegative map on and
is a map uniformly bounded in which satisfies the properties in Lemma 5.7. Thus applying Lemma 5.6 and the large deviations result from (2.11), we have
(5.31)
Next, using the linear growth property of
for some measurable map . Thus, using the boundedness of and the nonnegativity of , we have
where the last equality follows from Lemma 5.2 (see (5.14)). Using the last bound together with (5.31) in (5.30) and (5.29) we now have the inequality in (5.11). ∎
5.2. Proof of (5.10)
Recall the convergence from (5.4).
We begin with the following lemma.
Lemma 5.8.
For a.e.
Proof.
Fix and . From the continuity of on , of on , and of (for a.e. ) on , the a.s. convergence of to ,
and Lemma 5.7,
we can find, for a.e. , a neighbourhood of and such that
Observe that
Noting that a.s. and applying the large deviation result from (2.11), we now have
Since and are arbitrary, the result follows.
∎
We now complete the proof of (5.10).
Completing the proof of (5.10).
Fix . Then
From Proposition 5.4 (see (5.17))
Thus to prove (5.10) it suffices to show that, a.s.,
(5.32)
However, the above is an immediate consequence of Lemma 5.8. This completes the proof of (5.10). ∎
Finally we complete the proof of Lemma 4.1.
Completing the proof of Lemma 4.1.
As noted above (5.6), in order to prove Lemma 4.1 it suffices to show (5.6) for every .
Also, for this it is enough to show (5.11) and (5.10). The inequality in (5.11) was shown in Section 5.1 and the proof of the inequality in (5.10) was provided in Section 5.2. Combining these we have Lemma 4.1. ∎