
Quantum Covariance and Filtering

John E. Gough, Aberystwyth University, Wales, SY23 3BZ, UK
Abstract

We give a tutorial exposition of the analogue of the filtering equation for quantum systems, focusing on the quantum probabilistic framework and developing the ideas from the classical theory. Quantum covariances and conditional expectations on von Neumann algebras play an essential part in the presentation.

Keywords: quantum probability, quantum filtering, quantum Markovian systems

1 Introduction

Nonlinear filtering theory is a well-developed field of engineering which is used to estimate unknown quantities in the presence of noise. One of the founders of the field was the Soviet mathematician Ruslan Stratonovich, who encouraged his student Viacheslav Belavkin to extend the problem to the quantum domain [1]. Classically, estimation works by measuring one or more variables which depend on the variables to be estimated, and Bayes' Theorem plays an essential role in inferring the unknown variables from what we measure. Belavkin's approach uses the theory of quantum stochastic calculus for continuous-in-time homodyne and photon counting measurements. There are several approaches: in the paper of Barchielli and Belavkin [2], the characteristic functional method is used to derive the photon-counting case, with the diffusive case obtained as an appropriate limit. Further details of the many approaches and applications may be found in the books by Barchielli and Gregoratti [3] and by Wiseman and Milburn [4].

However, the proof of Bayes' Theorem requires a joint probability distribution for the unknown variables and the measured ones. Once we go to quantum theory, we have to be very careful, as incompatible observables do not possess a joint probability distribution - in such cases, applying Bayes' Theorem leads to erroneous results and is the root of many of the paradoxes in the theory.

We will derive the simplest quantum filter. The filter equation itself was originally postulated by Gisin on the different grounds of continuous collapse of the wavefunction, but was subsequently given a standard filtering interpretation [5]. It also appeared as a way of simulating quantum open systems due to Carmichael [6] and Dalibard, Castin and Mølmer [7]: while this appears as a trick for simulating just the quantum master equation (the analogue of the Fokker-Planck equation) by stochastic processes, it is clear that the authors had in mind an underlying interpretation based on continual measurements. The discrete-time version of the filter also featured in the famous Paris Photon-Box experiment [8].

2 Quantum Probabilistic Setting

We start from the traditional formulation of quantum theory in terms of operators on a separable Hilbert space $\mathfrak{h}$. The norm of a linear operator $X$ is $\|X\|=\sup\{\|X\phi\|:\phi\in\mathfrak{h},\,\|\phi\|=1\}$, and the collection of bounded operators will be denoted by $B(\mathfrak{h})$. We will denote the identity operator by $\mathbb{1}$. The adjoint of $X\in B(\mathfrak{h})$ will be denoted by $X^{\ast}$.

Our interest will be in von Neumann algebras. These are unital $\ast$-algebras of bounded operators that are closed in the weak operator topology. Here we say that a sequence of operators $(X_{n})$ in $B(\mathfrak{h})$ converges weakly to $X$ if the matrix elements converge, that is, $\langle\phi,X_{n}\psi\rangle\to\langle\phi,X\psi\rangle$ for all $\phi,\psi\in\mathfrak{h}$.

A pair $(\mathfrak{A},\langle\cdot\rangle)$ consisting of a von Neumann algebra and a state is referred to as a quantum probability (QP) space [9].

Commutative = Classical

Kolmogorov's setting for classical probability is in terms of probability spaces $(\Omega,\mathcal{A},\mathbb{P})$ where $\Omega$ is a space of outcomes (the sample space), $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$, and $\mathbb{P}$ is a probability measure on the elements of $\mathcal{A}$. The collection of functions $\mathfrak{A}=L^{\infty}(\Omega,\mathcal{A},\mathbb{P})$ forms a commutative von Neumann algebra and, moreover, a state is given by $\langle A\rangle=\int_{\Omega}A(\omega)\,\mathbb{P}[d\omega]$. (Conversely, every commutative von Neumann algebra with a state that is continuous in the normal topology, see below, is isomorphic to this framework.)
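To make the dictionary "commutative = classical" concrete, here is a minimal numerical sketch (our own illustration, with arbitrarily chosen numbers): on a finite sample space, bounded random variables become diagonal matrices, the probability measure becomes a diagonal density matrix $\varrho$, and the classical expectation coincides with $\mathrm{tr}\{\varrho X\}$.

```python
import numpy as np

# Finite sample space with three outcomes and probability measure P.
P = np.array([0.5, 0.3, 0.2])
A_values = np.array([1.0, -2.0, 4.0])   # a bounded random variable A

# Embed into operators: diagonal matrices form a commutative algebra,
# and the measure P becomes the density matrix rho.
A = np.diag(A_values)
rho = np.diag(P)

classical = np.sum(P * A_values)        # E[A] as an integral over Omega
quantum = np.trace(rho @ A)             # <A> = tr(rho A)
assert np.isclose(classical, quantum)   # the two states agree (0.7)
```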

Commutants

There is an alternative characterization of von Neumann algebras which, surprisingly, is purely algebraic. For a subset of operators $\mathfrak{A}$, we define its commutant in $B(\mathfrak{h})$ to be

$\mathfrak{A}^{\prime}=\{X\in B(\mathfrak{h}):[A,X]=0,\ \forall A\in\mathfrak{A}\}.$  (1)

The commutant of the commutant of $\mathfrak{A}$ is called the bicommutant and is denoted $\mathfrak{A}^{\prime\prime}$. Von Neumann's Bicommutant Theorem states that a collection of operators $\mathfrak{A}$ is a von Neumann algebra if and only if it is closed under taking adjoints and $\mathfrak{A}=\mathfrak{A}^{\prime\prime}$.

$B(\mathfrak{h})$ itself is a von Neumann algebra. If $\mathfrak{A}$ and $\mathfrak{B}$ are von Neumann algebras, then $\mathfrak{B}$ is said to be coarser than $\mathfrak{A}$ if $\mathfrak{B}\subset\mathfrak{A}$. A collection of operators $K$ generates the von Neumann algebra $\mathrm{vN}(K)=(K\cup K^{\ast})^{\prime\prime}$.

States

A state on a von Neumann algebra is a $\ast$-linear functional $\langle\cdot\rangle:\mathfrak{A}\to\mathbb{C}$ which is positive ($\langle X\rangle\geq 0$ whenever $X\geq 0$) and normalized ($\langle\mathbb{1}\rangle=1$). We will assume that the state is continuous in the normal topology, that is, $\sup_{n}\langle X_{n}\rangle=\langle\sup_{n}X_{n}\rangle$ for any increasing sequence $(X_{n})$ of positive elements of $\mathfrak{A}$. The main point of interest is that a normal state takes the form $\langle X\rangle=\mathrm{tr}\{\varrho X\}$ for some density matrix $\varrho$.

The state satisfies the Cauchy-Schwarz inequality $|\langle X^{\ast}Y\rangle|^{2}\leq\langle X^{\ast}X\rangle\,\langle Y^{\ast}Y\rangle$.

Morphisms between QP Spaces

A morphism $\phi:(\mathfrak{A}_{1},\langle\cdot\rangle_{1})\to(\mathfrak{A}_{2},\langle\cdot\rangle_{2})$ between QP spaces is a normal, completely positive, $\ast$-linear map which preserves the identity, $\phi(\mathbb{1}_{1})=\mathbb{1}_{2}$, and the probabilities, $\langle\phi(X)\rangle_{2}=\langle X\rangle_{1}$ for all $X\in\mathfrak{A}_{1}$. If a morphism is a homomorphism, that is, $\phi(X)\phi(Y)=\phi(XY)$ for all $X,Y\in\mathfrak{A}_{1}$, then we say that $\mathfrak{A}_{1}$ is embedded into $\mathfrak{A}_{2}$.

Tomita-Takesaki Theory

As operators do not necessarily commute, we may have $\langle X^{\ast}Y\rangle$ different from $\langle YX^{\ast}\rangle$. Nevertheless, it is possible to write

$\langle YX^{\ast}\rangle=\langle X^{\ast}\Delta Y\rangle,$  (2)

where $\Delta$ is a positive (possibly unbounded) operator known as the modular operator. This plays a central role in the Tomita-Takesaki theory of von Neumann algebras. A one-parameter group $\{\sigma_{t}:t\in\mathbb{R}\}$ of maps on $\mathfrak{A}$ is defined by $\sigma_{t}(X)=\Delta^{-it}X\Delta^{it}$ and is known as the modular group associated with the QP space $(\mathfrak{A},\langle\cdot\rangle)$.

Theorem 1 (Takesaki, [10])

Let $(\mathfrak{A},\langle\cdot\rangle)$ be a QP space and let $\mathfrak{B}$ be a von Neumann subalgebra of $\mathfrak{A}$. There exists a morphism $\mathfrak{E}$ from $\mathfrak{A}$ down to $\mathfrak{B}$ which is projective ($\mathfrak{E}\circ\mathfrak{E}=\mathfrak{E}$) if and only if $\mathfrak{B}$ is invariant under the modular group of $(\mathfrak{A},\langle\cdot\rangle)$.

2.1 Quantum Conditioning

We fix a QP space $(\mathfrak{A},\langle\cdot\rangle)$, and define the covariance of two elements $X,Y\in\mathfrak{A}$ to be

$\mathrm{Cov}(X,Y)\triangleq\langle X^{\ast}Y\rangle-\langle X\rangle^{\ast}\langle Y\rangle.$  (3)

Likewise, the variance is defined as $\mathrm{Var}(X)\triangleq\mathrm{Cov}(X,X)$.

The idea is that we have a subset $\mathfrak{B}\subset\mathfrak{A}$, and we want to associate an element $\mathfrak{E}[A]\in\mathfrak{B}$ with each $A\in\mathfrak{A}$, see Figure 1. As $\mathfrak{B}$ is smaller than $\mathfrak{A}$, we think of $\mathfrak{E}[A]$ as a coarse-grained version of $A$ based on less information. The map $\mathfrak{E}$ therefore compresses the model $(\mathfrak{A},\langle\cdot\rangle)$ into a coarser one on $\mathfrak{B}$: we would like to do this in a way that preserves averages.

Figure 1: A conditional expectation $\mathfrak{E}$ is a projection from an algebra $\mathfrak{A}$ of random objects down into a smaller algebra $\mathfrak{B}$ such that $\langle\mathfrak{E}[A]\rangle=\langle A\rangle$.

We now list some desirable features for $\mathfrak{E}$, familiar from the classical case: for any $X,Y,A\in\mathfrak{A}$, $\alpha,\beta\in\mathbb{C}$ and $B_{1},B_{2}\in\mathfrak{B}$,

(CE1) linearity: $\mathfrak{E}[\alpha X+\beta Y]=\alpha\,\mathfrak{E}[X]+\beta\,\mathfrak{E}[Y]$;

(CE2) $\ast$-map: $\mathfrak{E}[X^{\ast}]=\mathfrak{E}[X]^{\ast}$;

(CE3) conservativity: $\mathfrak{E}[\mathbb{1}]=\mathbb{1}$;

(CE4) compatibility: $\langle\mathfrak{E}[A]\rangle=\langle A\rangle$;

(CE5) projectivity: $\mathfrak{E}[\mathfrak{E}[A]]=\mathfrak{E}[A]$;

(CE6) peelability: $\mathfrak{E}[B_{1}AB_{2}]=B_{1}\,\mathfrak{E}[A]\,B_{2}$;

(CE7) positivity: $\mathfrak{E}[A]\geq 0$ whenever $A\geq 0$.

We call property (CE6) "peelability" for lack of a better name, and we emphasize that the order of the operators is important. Property (CE7) is known to be insufficient to deal with quantum theory and must be strengthened as follows:

(CE7$^{\prime}$) complete positivity: for each integer $n\geq 1$,

$\left[\begin{array}{ccc}\mathfrak{E}[A_{11}]&\cdots&\mathfrak{E}[A_{1n}]\\ \vdots&\ddots&\vdots\\ \mathfrak{E}[A_{n1}]&\cdots&\mathfrak{E}[A_{nn}]\end{array}\right]\geq 0\quad\text{whenever}\quad\left[\begin{array}{ccc}A_{11}&\cdots&A_{1n}\\ \vdots&\ddots&\vdots\\ A_{n1}&\cdots&A_{nn}\end{array}\right]\geq 0.$  (10)
Definition 2

Let $\mathfrak{A}$ and $\mathfrak{B}$ be unital $\ast$-algebras with $\mathfrak{B}$ a subalgebra of $\mathfrak{A}$; then a mapping $\mathfrak{E}:\mathfrak{A}\to\mathfrak{B}$ satisfying properties (CE1)-(CE6) and (CE7$^{\prime}$) is a quantum conditional expectation.

Proposition 3

A quantum conditional expectation $\mathfrak{E}$ acts as the identity map on $\mathfrak{B}$.

Proof. Set $A=B_{1}=\mathbb{1}$ and $B_{2}=B\in\mathfrak{B}$; then peelability implies that $\mathfrak{E}[B]=\mathfrak{E}[\mathbb{1}]\,B$. The result then follows from conservativity.

Existence

We observe that the conditional expectation always exists in the classical world. Here $\mathfrak{A}$ can be identified with some $L^{\infty}(\Omega,\mathcal{A},\mathbb{P})$, and the subalgebra $\mathfrak{B}$ will then take the form $L^{\infty}(\Omega,\mathcal{B},\mathbb{P})$ where $\mathcal{B}$ is a coarser $\sigma$-algebra. The conditional expectation is then well defined: for $A\in L^{\infty}(\Omega,\mathcal{A},\mathbb{P})$ one sets $\mu_{A}[I]=\int_{I}A(\omega)\,\mathbb{P}[d\omega]$ for each $I\in\mathcal{B}$; then $\mu_{A}$ is absolutely continuous with respect to $\mathbb{P}|_{\mathcal{B}}$, and its Radon-Nikodym derivative is the conditional expectation, which we denote as $\mathbb{E}[A|\mathcal{B}]$. This is explicit in Kolmogorov's original paper.

In contrast, quantum conditional expectations need not exist. By definition, they satisfy the requirements of the Takesaki Theorem above (and additionally the peelability condition), so we require the invariance of the subalgebra $\mathfrak{B}$ under the modular group.
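When the state is tracial, however, a conditional expectation onto the algebra of operators commuting with a family of orthogonal projections $\{P_{a}\}$ (with $\sum_{a}P_{a}=\mathbb{1}$) always exists: it is given by $\mathfrak{E}[A]=\sum_{a}P_{a}AP_{a}$. The following sketch (our own illustration, with arbitrarily chosen dimension and projections) checks conservativity, compatibility, projectivity and peelability numerically for this map.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
P0 = np.diag([1.0, 1.0, 0.0, 0.0])       # orthogonal projections with
P1 = np.eye(d) - P0                       # P0 + P1 = identity
projs = [P0, P1]

def E(A):
    # Conditional expectation onto operators commuting with P0, P1.
    return sum(P @ A @ P for P in projs)

def rand_op():
    return rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

A = rand_op()
B1, B2 = E(rand_op()), E(rand_op())       # elements of the subalgebra
tau = lambda X: np.trace(X) / d           # the tracial state

assert np.allclose(E(np.eye(d)), np.eye(d))          # (CE3)
assert np.isclose(tau(E(A)), tau(A))                 # (CE4)
assert np.allclose(E(E(A)), E(A))                    # (CE5)
assert np.allclose(E(B1 @ A @ B2), B1 @ E(A) @ B2)   # (CE6)
```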

2.2 Quantum Covariance

Definition 4

Let $\mathfrak{E}$ be a quantum conditional expectation from $\mathfrak{A}$ onto a subalgebra $\mathfrak{B}$. For each $A\in\mathfrak{A}$, we define $\delta A\triangleq A-\mathfrak{E}[A]$. The conditional covariance of $X,Y\in\mathfrak{A}$ is defined to be

$\mathrm{Cov}_{\mathfrak{B}}(X,Y)\triangleq\mathfrak{E}[\delta X^{\ast}\,\delta Y].$  (11)

The conditional variance is

$\mathrm{Var}_{\mathfrak{B}}(X)\triangleq\mathrm{Cov}_{\mathfrak{B}}(X,X).$  (12)

Note that

$\mathfrak{E}[\delta A]=0\text{ and }\langle\delta A\rangle=0,$  (13)

for every $A\in\mathfrak{A}$. It is worth emphasizing that the conditional covariance defined here is an operator in $\mathfrak{B}$, not a scalar.

Lemma 5

We have $\mathfrak{E}[B_{1}\,\delta A\,B_{2}]=0$ whenever $A\in\mathfrak{A}$ and $B_{1},B_{2}\in\mathfrak{B}$. In particular, $\mathfrak{E}[B\,\delta A]=0$ whenever $A\in\mathfrak{A}$ and $B\in\mathfrak{B}$.

The proof depends crucially on peelability: $\mathfrak{E}[B_{1}\,\delta A\,B_{2}]=B_{1}\,\mathfrak{E}[\delta A]\,B_{2}=0$. The following result is trivial classically, but again requires peelability in the non-commutative setting.

Proposition 6

The conditional covariance may alternatively be written as

$\mathrm{Cov}_{\mathfrak{B}}(X,Y)=\mathfrak{E}[X^{\ast}Y]-\mathfrak{E}[X]^{\ast}\mathfrak{E}[Y].$  (14)

Proof. From Lemma 5 we have

$\mathrm{Cov}_{\mathfrak{B}}(X,Y)=\mathfrak{E}\big[X^{\ast}Y-\mathfrak{E}[X]^{\ast}Y-X^{\ast}\mathfrak{E}[Y]+\mathfrak{E}[X]^{\ast}\mathfrak{E}[Y]\big]=\mathfrak{E}[X^{\ast}Y]-\mathfrak{E}[X]^{\ast}\mathfrak{E}[Y]-\mathfrak{E}[X]^{\ast}\mathfrak{E}[Y]+\mathfrak{E}[X]^{\ast}\mathfrak{E}[Y],$

and the result follows.

Proposition 7

The conditional covariance has the invariance property

$\mathrm{Cov}_{\mathfrak{B}}(X+B_{1},Y+B_{2})=\mathrm{Cov}_{\mathfrak{B}}(X,Y),$  (15)

for all $B_{1},B_{2}\in\mathfrak{B}$.

Proof. From $\ast$-linearity and (14), we see that the left-hand side of (15) equals

$\mathfrak{E}\big[X^{\ast}Y+X^{\ast}B_{2}+B_{1}^{\ast}Y+B_{1}^{\ast}B_{2}\big]-\big(\mathfrak{E}[X]+B_{1}\big)^{\ast}\big(\mathfrak{E}[Y]+B_{2}\big),$

and the result follows using peelability.

Lemma 8

The covariance and conditional covariance are related by

$\mathrm{Cov}(X,Y)=\langle\mathrm{Cov}_{\mathfrak{B}}(X,Y)\rangle+\big\langle(\mathfrak{E}[X]-\langle X\rangle)^{\ast}(\mathfrak{E}[Y]-\langle Y\rangle)\big\rangle.$  (16)

Proof. This follows from repeated application of the compatibility property:

$\langle\mathrm{Cov}_{\mathfrak{B}}(X,Y)\rangle=\langle X^{\ast}Y\rangle-\langle\mathfrak{E}[X]^{\ast}\mathfrak{E}[Y]\rangle=\langle X^{\ast}Y\rangle-\langle X\rangle^{\ast}\langle Y\rangle-\big(\langle\mathfrak{E}[X]^{\ast}\mathfrak{E}[Y]\rangle-\langle X\rangle^{\ast}\langle Y\rangle\big),$

which is readily rearranged to give the result.

As a consequence, we have

$\mathrm{Var}(X)=\langle\mathrm{Var}_{\mathfrak{B}}(X)\rangle+\big\langle(\mathfrak{E}[X]-\langle X\rangle)^{\ast}(\mathfrak{E}[X]-\langle X\rangle)\big\rangle.$  (17)

2.3 Least Squares Property

Proposition 9

The conditional expectation has the least squares property; that is, $\mathfrak{E}[(X-B)^{\ast}(X-B)]$ is minimized over $B\in\mathfrak{B}$ by $B=\mathfrak{E}[X]$.

Proof. Let $B\in\mathfrak{B}$ and set $B^{\prime}=B-\mathfrak{E}[X]$, which is again in $\mathfrak{B}$. Then

$\mathfrak{E}[(X-B)^{\ast}(X-B)]=\mathfrak{E}[(\delta X-B^{\prime})^{\ast}(\delta X-B^{\prime})]=\mathfrak{E}[\delta X^{\ast}\,\delta X]-\mathfrak{E}[B^{\prime\ast}\,\delta X]-\mathfrak{E}[\delta X^{\ast}\,B^{\prime}]+B^{\prime\ast}B^{\prime}=\mathrm{Var}_{\mathfrak{B}}(X)+B^{\prime\ast}B^{\prime}\geq\mathrm{Var}_{\mathfrak{B}}(X),$

where the middle terms vanish by Lemma 5 and we use the positivity property. The minimum is attained at $B^{\prime}=0$, that is, at $B=\mathfrak{E}[X]$.

Corollary 10

The variance $\langle(X-B)^{\ast}(X-B)\rangle$ is also minimized over $B\in\mathfrak{B}$ by $B=\mathfrak{E}[X]$.

Proof. Using the same notation as in the proof of Proposition 9, we have

$\langle(X-B)^{\ast}(X-B)\rangle=\langle(\delta X-B^{\prime})^{\ast}(\delta X-B^{\prime})\rangle=\langle\delta X^{\ast}\delta X\rangle-\langle B^{\prime\ast}\delta X\rangle-\langle\delta X^{\ast}B^{\prime}\rangle+\langle B^{\prime\ast}B^{\prime}\rangle=\langle\delta X^{\ast}\delta X\rangle+\langle B^{\prime\ast}B^{\prime}\rangle,$

since $\langle B^{\prime\ast}\delta X\rangle=\langle\delta X^{\ast}B^{\prime}\rangle=0$ by Lemma 5. Therefore $\langle(X-B)^{\ast}(X-B)\rangle$ is also minimized over $B\in\mathfrak{B}$ by $B=\mathfrak{E}[X]$.

3 Classical Filtering

In this section we recall in detail Kolmogorov's theory of probability. In the process, we will see the commutative analogues that motivated our more general definitions in Section 2.

3.1 Kolmogorov’s Theory

Kolmogorov's axiomatic formulation of probability theory is based on the mathematical formalism of measure theory. The main concept is that of a probability space. This is a triple $(\Omega,\mathcal{F},\mathbb{P})$ where:

1. $\Omega$, called the sample space, is the collection of all possible outcomes (typically a topological space);

2. $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$, the elements of which are known as events;

3. $\mathbb{P}$ is a probability measure on $\mathcal{F}$.

In detail, $\mathcal{F}$ forms a $\sigma$-algebra if it contains the empty set $\emptyset$, is closed under complementation (that is, if $A\in\mathcal{F}$ then so too is its complement $A^{\prime}=\{\omega\in\Omega:\omega\notin A\}$), and, whenever $\{A_{n}\}$ is a countable collection of events in $\mathcal{F}$, both their intersection $\cap_{n}A_{n}$ and union $\cup_{n}A_{n}$ are in $\mathcal{F}$. Note that $\Omega$ is an event since it is the complement of the empty set.

A probability measure $\mathbb{P}$ on $\mathcal{F}$ is an assignment of a probability $\mathbb{P}[A]\geq 0$ to each event $A\in\mathcal{F}$ with the rules that $\mathbb{P}[\Omega]=1$ and $\mathbb{P}[\cup_{n}A_{n}]=\sum_{n}\mathbb{P}[A_{n}]$ for any countable collection of events $\{A_{n}\}$ that are non-overlapping (i.e., $A_{n}\cap A_{m}=\emptyset$ if $n\neq m$).

The pair $(\Omega,\mathcal{F})$ comprises a measurable space, in other words, a space where we are able to assign measures of size to selected subsets in a consistent manner: this is the branch of mathematics known as measure theory, which was set up to resolve the pathological problems that arise when one tries to assign a measure to all subsets. It follows that probability theory is formally just a special case of measure theory where the measure $\mathbb{P}$ has maximum value $\mathbb{P}[\Omega]=1$.

More exactly, the setting is measure theory, but probability theory brings its own additional concepts with it. An example is conditional probability: the probability of event $A$ given that $B$ has occurred is defined by

$\mathbb{P}[A|B]=\frac{\mathbb{P}[A\cap B]}{\mathbb{P}[B]},$  (18)

which is the joint probability $\mathbb{P}[A\cap B]$ for both $A$ and $B$ to occur, divided by the marginal probability $\mathbb{P}[B]$.

The choice of $\mathcal{F}$ in a given problem is part of the modeling process. Essentially, we have to ask which events we want to assign a probability to. Let $\mathcal{G}$ be a $\sigma$-algebra contained in $\mathcal{F}$ (that is, every event in $\mathcal{G}$ is also an event in $\mathcal{F}$); then we say that $\mathcal{G}$ is coarser, or smaller, than $\mathcal{F}$. The probability space $(\Omega,\mathcal{G},\mathbb{Q})$ is then a coarse-graining of $(\Omega,\mathcal{F},\mathbb{P})$, where we take $\mathbb{Q}$ to be the restriction of $\mathbb{P}$ to the smaller $\sigma$-algebra $\mathcal{G}$.

Just as we do not consider all subsets of $\Omega$, we do not consider all functions on $\Omega$ either. Let $X:\Omega\to\mathbb{R}$; we say $X$ is measurable with respect to a $\sigma$-algebra $\mathcal{F}$ if the sets

$X^{-1}[I]\triangleq\{\omega\in\Omega:X(\omega)\in I\}$  (19)

belong to $\mathcal{F}$ for each interval $I$. A measurable function $X$ on a probability space is called a random variable, and the probability that it takes a value in the interval $I$, denoted $\mathrm{Prob}\{X\in I\}$, is just the value $\mathbb{P}$ assigns to the event $X^{-1}[I]$. We will use the term random vector for a vector-valued function whose components are all random variables.

Let $X_{1},\cdots,X_{n}$ be random variables; then there is a coarsest $\sigma$-algebra which contains all the events of the form $X_{j}^{-1}[I]$ for all $j$ and all intervals $I$: we refer to this as the $\sigma$-algebra generated by the random variables.

The correct way to think of an ensemble is as a probability space where $\Omega$ is the collection $\Gamma$ of all possible microstates, $\mathcal{F}$ is some suitable $\sigma$-algebra, and $\mathbb{P}$ is a suitable probability measure. The Hamiltonian must, at the very least, be a measurable function with respect to whatever $\sigma$-algebra we choose. No philosophical interpretation is needed beyond this point.

3.2 Conditioning in Classical Probability

We will now restrict attention to continuous random variables with well-defined probability densities. A random variable $X$ has probability density function (pdf) $\rho_{X}$ so that

$\Pr\{x\leq X<x+dx\}=\rho_{X}(x)\,dx.$  (20)

Normalization requires $\int_{-\infty}^{\infty}\rho_{X}(x)\,dx=1$. If we have several random variables, then we need to specify their joint probability. For instance, if we have a pair $X$ and $Y$, then their joint pdf will be $\rho_{X,Y}(x,y)$ with

$\rho_{X}(x)=\int\rho_{X,Y}(x,y)\,dy$ (the $x$-marginal),  (21)

$\rho_{Y}(y)=\int\rho_{X,Y}(x,y)\,dx$ (the $y$-marginal),  (22)

and $1=\int\!\int\rho_{X,Y}(x,y)\,dx\,dy$.

We say that $X$ and $Y$ are statistically independent if their joint pdf factors into the marginals,

$\rho_{X,Y}(x,y)\equiv\rho_{X}(x)\times\rho_{Y}(y)$ (independence).  (23)

This is equivalent to pairs of events of the form $X^{-1}[I]$ and $Y^{-1}[J]$ being statistically independent for all intervals $I,J$.

More generally, we can work out the conditional probabilities from a joint probability. The pdf for $X$ given that $Y=y$ is defined to be

$\rho_{X|Y}(x|y)\triangleq\frac{\rho_{X,Y}(x,y)}{\rho_{Y}(y)}.$  (24)

In the special case where $X$ and $Y$ are independent, we have

$\rho_{X|Y}(x|y)=\rho_{X}(x).$  (25)

In other words, conditioning on the fact that $Y=y$ makes no change to our knowledge of $X$.

Definition 11

Let $A=a(X,Y)$ be a random variable for some function $a:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$; then its conditional expectation given $Y=y$ is defined to be

$\mathbb{E}[A|Y=y]\triangleq\int_{\mathbb{R}}a(x,y)\,\rho_{X|Y}(x|y)\,dx.$  (26)

More generally, let $\mathcal{Y}$ be the $\sigma$-algebra generated by $Y$; then $\mathbb{E}[A|\mathcal{Y}]$ is the $\mathcal{Y}$-measurable random variable taking the value $\mathbb{E}[A|Y=y]$ at each $\omega$, where $y=Y(\omega)$.

As $\int\rho_{X|Y}(x|y)\,dx=1$, we have

$\mathbb{E}[1|\mathcal{Y}]\equiv 1.$  (27)

We note that for any random variable $A=a(X,Y)$,

$\mathbb{E}\big[\mathbb{E}[A|\mathcal{Y}]\big]=\int_{\mathbb{R}}\left(\int_{\mathbb{R}}a(x,y)\,\rho_{X|Y}(x|y)\,dx\right)\rho_{Y}(y)\,dy=\int_{\mathbb{R}}\!\int_{\mathbb{R}}a(x,y)\,\rho_{X,Y}(x,y)\,dx\,dy=\mathbb{E}[A].$  (28)

Also, for any $A=a(X,Y)$ and $B=b(Y)$, we have

$\mathbb{E}[AB|\mathcal{Y}](\omega)=\int_{\mathbb{R}}a(x,Y(\omega))\,b(Y(\omega))\,\rho_{X|Y}(x|Y(\omega))\,dx=\left(\int_{\mathbb{R}}a(x,Y(\omega))\,\rho_{X|Y}(x|Y(\omega))\,dx\right)b(Y(\omega))=\mathbb{E}[A|\mathcal{Y}](\omega)\,B(\omega).$  (29)
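Both identities (28) and (29) are easy to confirm by simulation. The following sketch (our own illustration; the choice of a fair coin $Y$ with conditionally Gaussian $X$ is arbitrary) estimates both sides by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Y is a fair coin; given Y = y, X is Normal with mean y, variance 1.
Y = rng.integers(0, 2, size=n)
X = rng.normal(loc=Y, scale=1.0)
A = X**2 + Y                               # the random variable a(X, Y)

# Exact conditional expectation: E[A | Y=y] = (1 + y^2) + y.
cond_exp = 1.0 + Y**2 + Y

# Tower property (28): E[E[A|Y]] = E[A]; both are close to 2.0.
print(cond_exp.mean(), A.mean())

# Module property (29) with B = b(Y): check on the event {Y = 1}.
b = lambda y: 2.0 * y - 1.0
print((A * b(Y))[Y == 1].mean(), (cond_exp * b(Y))[Y == 1].mean())
```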

This construction was specific to random variables with pdfs. However, it extends to the general setting as follows.

Theorem 12

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and let $\mathcal{Y}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Then for each random variable $A$ there exists a $\mathbb{P}$-almost surely unique $\mathcal{Y}$-measurable random variable $\mathbb{E}[A|\mathcal{Y}]$ such that $\mathbb{E}[1|\mathcal{Y}]=1$, $\mathbb{E}\big[\mathbb{E}[A|\mathcal{Y}]\big]=\mathbb{E}[A]$ and $\mathbb{E}[AB|\mathcal{Y}]=\mathbb{E}[A|\mathcal{Y}]\,B$ whenever $B$ is $\mathcal{Y}$-measurable.

Proposition 13

If $B$ is $\mathcal{Y}$-measurable, then

$\mathbb{E}[B|\mathcal{Y}]=B.$  (30)

Proof. Setting $A=1$ in the identity $\mathbb{E}[AB|\mathcal{Y}]=\mathbb{E}[A|\mathcal{Y}]\,B$, valid whenever $B$ is $\mathcal{Y}$-measurable, we see that $\mathbb{E}[B|\mathcal{Y}]=\mathbb{E}[1|\mathcal{Y}]\,B$, which in turn equals $B$.

Proposition 14

Conditional expectations are projections.

Proof. For $A$ arbitrary, we set $B=\mathbb{E}[A|\mathcal{Y}]$, which is $\mathcal{Y}$-measurable, and so

$\mathbb{E}\big[\mathbb{E}[A|\mathcal{Y}]\,\big|\,\mathcal{Y}\big]=\mathbb{E}[A|\mathcal{Y}].$  (31)

 

3.3 Classical Measurement

We now suppose that we have a system with phase space $\Gamma$ and a measuring apparatus with parameter space $M$. We let $x$ denote the phase points of $\Gamma$ as before, and write $y$ for the variables of the apparatus. The components of $y$ are sometimes referred to as pointer variables. The total space will be $\Omega=\Gamma\times M$ with coordinates $\omega=(x,y)$. We take $\mathbb{P}$ to be a probability measure on $\Omega$ and consider the random vectors $X(\omega)=x$ and $Y(\omega)=y$.

In an experiment, we do not measure the system directly but instead record the value of one or more pointer variables. Let $\mathcal{Y}$ be the $\sigma$-algebra generated by $Y$. We therefore refer to $Y$ as the data.

We shall assume that the system variables and the pointer variables are statistically dependent under our probability measure $\mathbb{P}$, otherwise we learn nothing about our system from the data. As before, we assume a joint pdf $\rho(x,y)$ with marginals $\rho_{\Gamma}(x)$ for the system and $\rho_{M}(y)$ for the measuring apparatus. We will write $\rho(x|y)$ for the conditional pdf of our system given the data, and write $\lambda(y|x)$ for the conditional pdf of the data given the system. This implies that

$\rho(x,y)=\rho(x|y)\,\rho_{M}(y)=\lambda(y|x)\,\rho_{\Gamma}(x).$  (32)

In practice, we may not know $\mathbb{P}$; however, we will assume that we know $\lambda(y|x)$. That is, we assume that we know the probability distribution of the pointer variables if we prepared our system precisely in state $x$, for each possible $x\in\Gamma$. Statisticians refer to $\lambda(y|x)$ as the likelihood function of the data $y$ given $x$.

Note that

$\int_{M}\lambda(y|x)\,dy=1=\int_{\Gamma}\rho(x|y)\,dx.$  (33)

Now every random variable may be written as $A=a(X,Y)$ for some function $a:\Omega=\Gamma\times M\to\mathbb{R}$. Its conditional expectation given the data is

$\mathbb{E}[A|\mathcal{Y}]\equiv\int_{\Gamma}a(x,Y)\,\rho(x|Y)\,dx.$  (34)

Indeed, for $\omega=(x,y)$ we have

$\mathbb{E}[A|\mathcal{Y}](\omega)=\frac{1}{\rho_{M}(y)}\int_{\Gamma}a(x^{\prime},y)\,\rho(x^{\prime},y)\,dx^{\prime}=\frac{\int_{\Gamma}a(x^{\prime},y)\,\rho(x^{\prime},y)\,dx^{\prime}}{\int_{\Gamma}\rho(x^{\prime\prime},y)\,dx^{\prime\prime}}.$  (35)

This is an average over the hypersurface $\Omega_{y}=\{\omega\in\Omega:Y(\omega)=y\}$. Indeed, the decomposition $\omega=(x,y)$ can be thought of as a split into the constraint coordinates $y$ and the hypersurface coordinates $x$.


From a practical standpoint, we will have access only to the data - that is, to variables measurable with respect to $\mathcal{Y}$. We are assuming that we know $\lambda$, the conditional probability for the data given the system. However, the system is unknown and what we are given is, of course, the data. Therefore, we need to solve the inverse problem, namely to give the conditional probability for the unknown $X$ given the measured values of $Y$. As it stands, the problem is not well-posed: we do not yet have enough information to write down the joint probability.

To remedy this, we introduce a pdf for $X$ which is our a priori guess:

$\rho_{X}(x)\stackrel{\mathrm{guess!}}{=}\rho_{\mathrm{prior}}(x).$  (36)

We then have the corresponding joint probability for $X$ and $Y$:

$\rho_{\mathrm{prior}}(x,y)=\lambda(y|x)\times\rho_{\mathrm{prior}}(x).$  (37)

If we subsequently measure $Y=y$, then we obtain the a posteriori probability

$\rho_{\mathrm{post}}(x|y)=\frac{\rho_{X,Y}(x,y)}{\rho_{Y}(y)}=\frac{\lambda(y|x)\,\rho_{\mathrm{prior}}(x)}{\int\lambda(y|x^{\prime})\,\rho_{\mathrm{prior}}(x^{\prime})\,dx^{\prime}}.$  (38)

The conditional expectation in (35) can then be written as

$\mathbb{E}[A|\mathcal{Y}](\omega)=\frac{\int_{\Gamma}a(x^{\prime},y)\,\lambda(y|x^{\prime})\,\rho_{\mathrm{prior}}(x^{\prime})\,dx^{\prime}}{\int_{\Gamma}\lambda(y|x^{\prime\prime})\,\rho_{\mathrm{prior}}(x^{\prime\prime})\,dx^{\prime\prime}}.$  (39)
Example 15

Let $X$ be the position of a particle. We measure

$Y=X+\sigma Z,$  (40)

where $Z$ is a standard normal variable independent of $X$. We may refer to $X$ as the signal and $Z$ as the noise.

Now if $X$ were known to be exactly $x$, then $Y$ would be normal with mean $x$ and variance $\sigma^{2}$. Therefore, we can immediately write down the likelihood function:

$\lambda(y|x)=\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(y-x)^{2}/2\sigma^{2}},$  (41)

$\rho_{\mathrm{post}}(x|y)=\frac{\rho_{\mathrm{prior}}(x)\,e^{-(y-x)^{2}/2\sigma^{2}}}{\int\rho_{\mathrm{prior}}(x^{\prime})\,e^{-(y-x^{\prime})^{2}/2\sigma^{2}}\,dx^{\prime}}.$  (42)

In the special case where $X$ is assumed to be Gaussian, say with mean $\mu_{0}$ and variance $\sigma_{0}^{2}$, we can give the explicit form of the posterior: it is Gaussian with mean $\mu_{1}$ and variance $\sigma_{1}^{2}$, where

$\mu_{1}=\frac{\sigma_{1}^{2}}{\sigma_{0}^{2}}\,\mu_{0}+\frac{\sigma_{1}^{2}}{\sigma^{2}}\,y,$  (43)

$\frac{1}{\sigma_{1}^{2}}=\frac{1}{\sigma_{0}^{2}}+\frac{1}{\sigma^{2}}.$  (44)

There are two desirable features here. First, the new mean $\mu_{1}$ uses the data $y$. Second, the new variance $\sigma_{1}^{2}$ is smaller than the prior variance $\sigma_{0}^{2}$. In other words, the measurement is informative and decreases our uncertainty in the state.
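The update rules (43)-(44) are straightforward to code and to validate against brute-force conditioning; the sketch below (our own illustration, with arbitrarily chosen prior and noise parameters) compares the closed-form posterior with a Monte Carlo estimate obtained by keeping only the samples whose observation lands near $y$.

```python
import numpy as np

def gaussian_posterior(mu0, var0, var_noise, y):
    """Posterior mean and variance for Y = X + sigma Z, eqs. (43)-(44)."""
    var1 = 1.0 / (1.0 / var0 + 1.0 / var_noise)
    mu1 = var1 * (mu0 / var0 + y / var_noise)
    return mu1, var1

rng = np.random.default_rng(2)
mu0, var0, var_noise, y = 1.0, 2.0, 0.5, 3.0
mu1, var1 = gaussian_posterior(mu0, var0, var_noise, y)

# Brute force: sample (X, Y) jointly, keep samples with Y close to y.
X = rng.normal(mu0, np.sqrt(var0), size=2_000_000)
Ys = X + rng.normal(0.0, np.sqrt(var_noise), size=X.size)
sel = np.abs(Ys - y) < 0.02
print(mu1, X[sel].mean())    # both close to 2.6
print(var1, X[sel].var())    # both close to 0.4
```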

3.4 Classical Filtering

It is possible to extend the conditioning problem to estimating the state of a dynamical system as it evolves in time, based on continual monitoring. This involves the theory of stochastic processes, and we will use the informal language of path integrals rather than the Ito calculus.

3.4.1 Stochastic Process

A stochastic process is a family $\{X(t):t\geq 0\}$ of random variables labeled by time. The process is determined by specifying all the multi-time distributions

$\rho(x_{n},t_{n};\cdots;x_{1},t_{1})$  (45)

for $X(t_{1})=x_{1},\cdots,X(t_{n})=x_{n}$, for each $n\geq 1$.

A stochastic process is said to be Markov if the multi-time distributions take the form

$\rho(x_{n},t_{n};\cdots;x_{1},t_{1})=T(x_{n},t_{n}|x_{n-1},t_{n-1})\cdots T(x_{2},t_{2}|x_{1},t_{1})\,\rho(x_{1},t_{1}),$  (46)

whenever $t_{n}>t_{n-1}>\cdots>t_{1}$.

Here $T(x,t|x_{0},t_{0})$ is the probability density for $X(t)=x$ given that $X(t_{0})=x_{0}$ (with $t>t_{0}$):

$\mathrm{Prob}\big\{x\leq X(t)\leq x+dx\,\big|\,X(t_{0})=x_{0}\big\}=T(x,t|x_{0},t_{0})\,dx,$  (47)

for $t>t_{0}$. It is called the transition mechanism of the Markov process. For consistency, we should have the following propagation rule, known in probability theory as the Chapman-Kolmogorov equation:

$\int T(x,t|x_{1},t_{1})\,T(x_{1},t_{1}|x_{0},t_{0})\,dx_{1}=T(x,t|x_{0},t_{0}),$  (48)

for all $t>t_{1}>t_{0}$.


Example 16

The Wiener process (Brownian motion) is determined by

$T(x,t|x_{0},t_{0})=\frac{1}{\sqrt{2\pi(t-t_{0})}}\,e^{-\frac{(x-x_{0})^{2}}{2(t-t_{0})}},$  (49)

$\rho(x,0)=\delta_{0}(x).$  (50)

The transition mechanism here is the Green's function for the heat equation

$\frac{\partial}{\partial t}\rho=\frac{1}{2}\frac{\partial^{2}}{\partial x^{2}}\rho.$  (51)

(In other words, given the data $\rho(\cdot,t_{0})=f(\cdot)$ at time $t_{0}$, the solution for later times is $\rho(x,t)=\int T(x,t|x_{0},t_{0})\,f(x_{0})\,dx_{0}$.)
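For the heat kernel (49), the Chapman-Kolmogorov equation (48) amounts to the fact that convolving two Gaussians adds their variances; the following sketch (our own, on an arbitrary grid) checks the propagation rule numerically.

```python
import numpy as np

def T(x, x0, t):
    # Wiener transition density T(x, t | x0, 0), eq. (49)
    return np.exp(-(x - x0) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

x1 = np.linspace(-15.0, 15.0, 3001)      # integration grid for x1
dx1 = x1[1] - x1[0]
x0, t1, t = 0.3, 0.7, 1.9                # arbitrary times t > t1 > 0

xs = np.linspace(-5.0, 5.0, 101)
lhs = np.array([np.sum(T(x, x1, t - t1) * T(x1, x0, t1)) * dx1 for x in xs])
rhs = T(xs, x0, t)
print(np.max(np.abs(lhs - rhs)))         # tiny, up to quadrature error
```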

Norbert Wiener gave an explicit construction, known as the canonical version of Brownian motion, in which the sample space is the space of continuous paths $\mathbf{w}=\{w(t):t\geq 0\}$ starting at the origin, equipped with a suitable $\sigma$-algebra of subsets and a well-defined measure $\mathbb{P}_{\text{Wiener}}^{t}$.

The corresponding stochastic process is denoted $W(t)$. Ito constructed a stochastic differential calculus around the Wiener process, and more generally around diffusions, and we have the following Ito table:

$dt\,dt=0,\qquad dt\,dW=dW\,dt=0,\qquad dW\,dW=dt.$  (55)
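The entry $dW\,dW=dt$ encodes the fact that the Wiener process accumulates quadratic variation at unit rate; a quick simulation (our own illustration) makes this concrete.

```python
import numpy as np

rng = np.random.default_rng(3)
t, n = 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(t / n), size=n)   # Wiener increments

print(np.sum(dW ** 2))   # close to t = 1.0, i.e. dW dW = dt
```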

3.4.2 Path Integral Formulation

Indeed, for the Wiener process we have

$\rho(x_{n},t_{n};\cdots;x_{1},t_{1})\,dx_{n}\cdots dx_{1}\propto e^{-\sum_{k}\frac{(x_{k}-x_{k-1})^{2}}{2(t_{k}-t_{k-1})}}\,dx_{n}\cdots dx_{1}.$  (56)

Formally, we may introduce a limiting "path integral" probability measure on the space of paths,

$\mathbb{P}_{\text{Wiener}}^{t}[d\mathbf{w}]=e^{-S_{\text{Wiener}}[\mathbf{w}]}\,\mathcal{D}\mathbf{w},$  (57)

where we have the action

$S_{\text{Wiener}}[\mathbf{w}]=\int_{0}^{t}\frac{1}{2}\,\dot{w}(\tau)^{2}\,d\tau.$  (58)

For a diffusion $X(t)$ satisfying the Ito stochastic differential equation

$dX=v(X)\,dt+\sigma(X)\,dW,$  (59)

we have the corresponding measure

$\mathbb{P}_{X}^{t}[d\mathbf{x}]=e^{-S_{X}[\mathbf{x}]}\,\mathcal{D}\mathbf{x},$  (60)

where we have the action (substitute $\dot{w}=\frac{\dot{x}-v(x)}{\sigma(x)}$ into $S_{\text{Wiener}}[\mathbf{w}]$, and allow for a Jacobian correction)

$S_{X}[\mathbf{x}]=\int_{0}^{t}\frac{1}{2}\,\frac{[\dot{x}-v(x)]^{2}}{\sigma(x)^{2}}\,d\tau+\frac{1}{2}\int_{0}^{t}\nabla\cdot v(x)\,d\tau.$  (61)

3.4.3 The Classical Filtering Problem

Suppose that we have a system described by a process $\{X(t):t\geq 0\}$. We obtain information by observing a related process $\{Y(t):t\geq 0\}$:

$dX=v(X)\,dt+\sigma(X)\,dW$ (stochastic dynamics),  (62)

$dY=h(X)\,dt+dZ$ (noisy observations).  (63)

Here we assume that the dynamical noise $W$ and the observational noise $Z$ are independent Wiener processes.

The joint probability of both $X$ and $Y$ up to time $t$ is

$\mathbb{P}_{X,Y}^{t}[d\mathbf{x},d\mathbf{y}]=e^{-S_{X,Y}[\mathbf{x},\mathbf{y}]}\,\mathcal{D}\mathbf{x}\,\mathcal{D}\mathbf{y},$  (64)

where

$S_{X,Y}[\mathbf{x},\mathbf{y}]=S_{X}[\mathbf{x}]+\int_{0}^{t}\frac{1}{2}\,[\dot{y}-h(x)]^{2}\,d\tau=S_{X}[\mathbf{x}]+S_{\text{Wiener}}[\mathbf{y}]-\int_{0}^{t}\left[h(x)\,\dot{y}-\frac{1}{2}h(x)^{2}\right]d\tau,$  (65)-(66)

or

$\mathbb{P}_{X,Y}^{t}[d\mathbf{x},d\mathbf{y}]=\mathbb{P}_{X}^{t}[d\mathbf{x}]\,\mathbb{P}_{\mathrm{Wiener}}^{t}[d\mathbf{y}]\,\lambda_{t}(\mathbf{y}|\mathbf{x}),$  (67)

where the Kallianpur-Striebel likelihood (readers with a background in stochastic processes will recognize this as the Radon-Nikodym derivative associated with a Girsanov transformation) is

$\lambda_{t}(\mathbf{y}|\mathbf{x})=e^{\int_{0}^{t}\left[h(x)\,dy(\tau)-\frac{1}{2}h(x)^{2}\,d\tau\right]}.$  (68)

The distribution for $X(t)$ given the observations $\mathbf{y}=\{y(\tau):0\leq\tau\leq t\}$ is then

$\rho_{t}(x|\mathbf{y})=\frac{\int_{x(0)=x_{0}}^{x(t)=x}\lambda_{t}(\mathbf{y}|\mathbf{x})\,\mathbb{P}_{X}^{t}[d\mathbf{x}]}{\int_{x(0)=x_{0}}\lambda_{t}(\mathbf{y}|\mathbf{x}^{\prime})\,\mathbb{P}_{X}^{t}[d\mathbf{x}^{\prime}]}.$  (69)

Let us write $\mathcal{Y}_{t}$ for the $\sigma$-algebra generated by the observations $\{Y(\tau):0\leq\tau\leq t\}$. The estimate for $f(X(t))$, for any function $f$, conditioned on the observations up to time $t$ is called the filter and, generalizing (39) to continuous time, we may write it as

$\mathfrak{E}_{t}(f)=\mathbb{E}[f(X(t))|\mathcal{Y}_{t}]=\int\rho_{t}(x|\mathbf{y})\,f(x)\,dx=\frac{\int\sigma_{t}(x|\mathbf{y})\,f(x)\,dx}{\int\sigma_{t}(x^{\prime}|\mathbf{y})\,dx^{\prime}},$  (70)

where $\sigma_{t}(x|\mathbf{y})=\int_{x(0)=x_{0}}^{x(t)=x}\lambda_{t}(\mathbf{y}|\mathbf{x})\,\mathbb{P}_{X}^{t}[d\mathbf{x}]$ is a non-normalized density. We introduce the stochastic process $\sigma_{t}(x):\omega\mapsto\sigma_{t}(x|\mathbf{y})$, which can be shown to satisfy the Duncan-Mortensen-Zakai equation

$d\sigma_{t}(x)=\mathcal{L}^{\ast}\sigma_{t}(x)\,dt+h(x)\,\sigma_{t}(x)\,dY(t).$  (71)

This implies the filtering equation

$d\mathfrak{E}_{t}(f)=\mathfrak{E}_{t}(\mathcal{L}f)\,dt+\big\{\mathfrak{E}_{t}(fh)-\mathfrak{E}_{t}(f)\,\mathfrak{E}_{t}(h)\big\}\,dI(t),$  (72)

where the innovations process is defined by

$dI(t)=dY(t)-\mathfrak{E}_{t}(h)\,dt.$  (73)
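To get a concrete feel for (72)-(73), consider the linear example $v(x)=-x$, $\sigma=1$, $h(x)=x$. Taking $f(x)=x$ and $f(x)=x^{2}$ in (72), the filter closes on the conditional mean $m_{t}$ and conditional variance $v_{t}$, giving the Kalman-Bucy equations $dm=-m\,dt+v\,dI$ and $\dot{v}=-2v+1-v^{2}$. The sketch below (our own illustration, not part of the original development) simulates a trajectory and runs this filter on the simulated observation record.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 10.0, 10_000
dt = T / n
sqdt = np.sqrt(dt)

x, m, v = 1.5, 0.0, 1.0   # true state; filter mean and variance
sq_err = 0.0

for k in range(n):
    # Signal and observation increments, eqs. (62)-(63).
    x += -x * dt + sqdt * rng.normal()
    dY = x * dt + sqdt * rng.normal()

    # Innovations (73) and filtering equation (72) with f(x) = x.
    dI = dY - m * dt
    m += -m * dt + v * dI
    v += (-2.0 * v + 1.0 - v * v) * dt   # closed equation for the variance

    if k >= n // 2:
        sq_err += (x - m) ** 2 / (n - n // 2)

print(sq_err, v)   # both settle near sqrt(2) - 1 ~ 0.414
```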

4 Quantum Filtering

We now consider the quantum analogue of filtering. See also [11]-[15].

4.1 Quantum Measurement

The Basic Concepts

The Born interpretation of the wave function $\psi(x)$ in quantum mechanics is that $|\psi(x)|^{2}$ gives the probability density of finding the particle at position $x$. More generally, in quantum theory, observables are represented by self-adjoint operators on a Hilbert space. The basic postulate of quantum theory is that the pure states of a system correspond to normalized wave functions $\Psi$, and we will follow Dirac and denote these as kets $|\Psi\rangle$. When we measure an observable, the physical value we record will be an eigenvalue. If the state is $|\Psi\rangle$, then the average value of the observable represented by $\hat{A}$ is $\langle\hat{A}\rangle=\langle\Psi|\hat{A}|\Psi\rangle$.

Let us recall that a Hermitian operator $\hat{P}$ is called an orthogonal projection if it satisfies $\hat{P}^{2}=\hat{P}$. If we have a Hermitian operator $\hat{A}$ with a discrete set of eigenvalues, then there exists a collection of orthogonal projections $\hat{P}_{a}$, labeled by the eigenvalues $a$, satisfying $\hat{P}_{a}\hat{P}_{a^{\prime}}=0$ if $a\neq a^{\prime}$ and $\sum_{a}\hat{P}_{a}=\hat{I}$, such that

$\hat{A}=\sum_{a}a\,\hat{P}_{a}.$  (74)

This is the spectral decomposition of $\hat{A}$. The operator $\hat{P}_{a}$ projects onto $\mathcal{E}_{a}$, the eigenspace of $\hat{A}$ for eigenvalue $a$. In other words, $\mathcal{E}_{a}$ is the space of all eigenvectors of $\hat{A}$ having eigenvalue $a$. The eigenspaces are orthogonal, that is, $\langle\psi|\phi\rangle=0$ whenever $\psi$ and $\phi$ lie in different eigenspaces (this is equivalent to $\hat{P}_{a}\hat{P}_{a^{\prime}}=0$ for $a\neq a^{\prime}$), and every vector $|\psi\rangle$ can be written as a superposition $\sum_{a}|\psi_{a}\rangle$ where $|\psi_{a}\rangle$ lies in the eigenspace $\mathcal{E}_{a}$. (In fact, $|\psi_{a}\rangle=\hat{P}_{a}|\psi\rangle$.)

We note that, for any integer $n$,

$\hat{A}^{n}=\sum_{a}a^{n}\,\hat{P}_{a},$  (75)

and for any real $t$,

$e^{it\hat{A}}=\sum_{a}e^{ita}\,\hat{P}_{a}.$  (76)

Suppose we prepare a quantum system in a state $|\Psi\rangle$ and perform a measurement of an observable $\hat{A}$. We know that we may only measure an eigenvalue $a$, and quantum mechanics predicts the probability $p_{a}$. In fact, using the spectral decomposition,

$\langle\hat{A}^{n}\rangle=\Big\langle\sum_{a}a^{n}\,\hat{P}_{a}\Big\rangle=\sum_{a}a^{n}\,\langle\hat{P}_{a}\rangle=\sum_{a}a^{n}\,p_{a},$  (77)

and so

$p_{a}=\langle\hat{P}_{a}\rangle\equiv\langle\Psi|\hat{P}_{a}|\Psi\rangle.$  (78)

For the special case of a non-degenerate eigenvalue $a$, the eigenspace $\mathcal{E}_{a}$ is spanned by a single eigenvector $|a\rangle$, which we take to be normalized. In this case we have $\hat{P}_{a}=|a\rangle\langle a|$ and

$p_{a}=\langle\Psi|\hat{P}_{a}|\Psi\rangle=\langle\Psi|a\rangle\langle a|\Psi\rangle\equiv|\langle a|\Psi\rangle|^{2}.$  (79)

We see that if an observable $\hat{A}$ has a non-degenerate eigenvalue $a$ with normalized eigenvector $|a\rangle$, and the system is prepared in state $|\Psi\rangle$, then the probability of measuring $a$ in an experiment is $|\langle a|\Psi\rangle|^{2}$. The modulus squared of an overlap may therefore be interpreted as a probability.

The degenerate case needs some more attention. Here the eigenspace $\mathcal{E}_{a}$ can be spanned by a set of orthonormal vectors $|a1\rangle,|a2\rangle,\cdots$ so that $\hat{P}_{a}=\sum_{n}|an\rangle\langle an|$, and so $p_{a}=\sum_{n}|\langle an|\Psi\rangle|^{2}$. The choice of the orthonormal basis for $\mathcal{E}_{a}$ is not important!

The probability $p_{a}$ is equal to the length-squared of $\hat{P}_{a}|\Psi\rangle$, that is,

$p_{a}=\|\hat{P}_{a}\Psi\|^{2}.$  (80)

To see this, note that $\|\hat{P}_{a}\Psi\|^{2}$ is the overlap of the ket $\hat{P}_{a}|\Psi\rangle$ with its own bra $\langle\Psi|\hat{P}_{a}^{\dagger}$, so

$\|\hat{P}_{a}\Psi\|^{2}=\langle\Psi|\hat{P}_{a}^{\dagger}\hat{P}_{a}|\Psi\rangle=\langle\Psi|\hat{P}_{a}^{2}|\Psi\rangle=\langle\Psi|\hat{P}_{a}|\Psi\rangle=p_{a},$  (81)

where we used the fact that $\hat{P}_{a}=\hat{P}_{a}^{\dagger}=\hat{P}_{a}^{2}$.

In Figure 2, we project $|\Psi\rangle$ into the eigenspace $\mathcal{E}_{a}$ to get $\hat{P}_{a}|\Psi\rangle$. In the special case where $|\Psi\rangle$ was already in the eigenspace, it equals its own projection ($\hat{P}_{a}|\Psi\rangle=|\Psi\rangle$) and so $p_{a}=1$, since the state $|\Psi\rangle$ is normalized. If the state $|\Psi\rangle$ is orthogonal to the eigenspace, then its projection is zero ($\hat{P}_{a}|\Psi\rangle=0$) and so $p_{a}=0$.

In general, we get something in between: $|\Psi\rangle$ has a component in the eigenspace and a component orthogonal to it. The projected vector $\hat{P}_{a}|\Psi\rangle$ then has length less than the original $|\Psi\rangle$, and so $p_{a}<1$.

Figure 2: The state $|\Psi\rangle$ is projected into the eigenspace $\mathcal{E}_{a}$ corresponding to the eigenvalue $a$ of $\hat{A}$.
Von Neumann’s Projection Postulate

Suppose the initial state is $|\Psi\rangle$ and we measure the eigenvalue $a$ of observable $\hat{A}$ in a given experiment. A second measurement of $\hat{A}$ performed straight away ought to yield the same value $a$ again, this time with certainty.

The only way, however, to ensure that we measure a given eigenvalue with certainty is for the state to lie in the eigenspace for that eigenvalue. We therefore require that the state of the system immediately after the result $a$ is measured jumps from $|\Psi\rangle$ to something lying in the eigenspace $\mathcal{E}_{a}$. This leads us directly to the von Neumann projection postulate.

The von Neumann projection postulate: If the state of a system is given by a ket $|\Psi\rangle$, and a measurement of observable $\hat{A}$ yields the eigenvalue $a$, then the state immediately after measurement becomes $|\Psi_{a}\rangle=\frac{1}{\sqrt{p_{a}}}\,\hat{P}_{a}|\Psi\rangle$.

We note that the projected vector $\hat{P}_{a}|\Psi\rangle$ has length $\sqrt{p_{a}}$, so we need to divide by this to ensure that $|\Psi_{a}\rangle$ is properly normalized. The von Neumann postulate is essentially the simplest geometric way to get the vector $|\Psi\rangle$ into the eigenspace: project down and then normalize!
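The Born probabilities (78) and the projection postulate combine into a simple simulation recipe; the sketch below (our own illustration) performs a projective measurement of a Hermitian matrix and confirms that an immediate repetition reproduces the outcome.

```python
import numpy as np

rng = np.random.default_rng(5)

def measure(psi, A):
    """Projective measurement of Hermitian A in the normalized state psi.

    Returns (eigenvalue, post-measurement state), following eq. (78)
    and the von Neumann projection postulate.
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    outcomes = np.unique(np.round(eigvals, 10))
    probs, projs = [], []
    for a in outcomes:
        V = eigvecs[:, np.isclose(eigvals, a)]
        P = V @ V.conj().T                     # spectral projection P_a
        projs.append(P)
        probs.append(max(np.vdot(psi, P @ psi).real, 0.0))  # p_a
    p = np.array(probs) / np.sum(probs)
    k = rng.choice(len(outcomes), p=p)
    post = projs[k] @ psi / np.sqrt(probs[k])  # project, then normalize
    return outcomes[k], post

# Measure sigma_x on |0>: outcomes +1/-1, each with probability 1/2.
sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])
a, psi1 = measure(np.array([1.0, 0.0], dtype=complex), sigma_x)
a2, _ = measure(psi1, sigma_x)
assert a2 == a  # an immediate repetition reproduces the same eigenvalue
```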

Compatible Measurements

Suppose we measure a pair of observables $\hat{A}$ and $\hat{B}$, in that sequence. The $\hat{A}$-measurement leaves the state in the eigenspace of the measured value $a$; the subsequent $\hat{B}$-measurement then leaves the state in the eigenspace of the measured value $b$. If we then went back and remeasured $\hat{A}$, would we find $a$ again with certainty? The state after the second measurement will be an eigenvector of $\hat{B}$ with eigenvalue $b$, but this need not be an eigenvector of $\hat{A}$.

Let $\hat{A}$ and $\hat{B}$ be a pair of observables with spectral decompositions $\sum_{a}a\,\hat{P}_{a}$ and $\sum_{b}b\,\hat{Q}_{b}$ respectively. Let us measure $\hat{A}$ and then $\hat{B}$, recording values $a$ and $b$ respectively. If the initial state was $|\Psi_{\text{in}}\rangle$, then after both measurements the final state will be

$|\Psi_{\text{out}}\rangle\propto\hat{Q}_{b}\hat{P}_{a}\,|\Psi_{\text{in}}\rangle.$  (82)

In particular, $|\Psi_{\text{out}}\rangle$ is an eigenstate of $\hat{B}$ with eigenvalue $b$. However, suppose we also wanted $|\Psi_{\text{out}}\rangle$ to be an eigenstate of $\hat{A}$ with the original eigenvalue $a$; then we must have $\hat{P}_{a}|\Psi_{\text{out}}\rangle=|\Psi_{\text{out}}\rangle$, or equivalently

$\hat{P}_{a}\hat{Q}_{b}\hat{P}_{a}\,|\Psi_{\text{in}}\rangle=\hat{Q}_{b}\hat{P}_{a}\,|\Psi_{\text{in}}\rangle.$  (83)

If we want this to be true irrespective of the actual initial state $|\Psi_{\text{in}}\rangle$, then we arrive at the operator equation

$\hat{P}_{a}\hat{Q}_{b}\hat{P}_{a}=\hat{Q}_{b}\hat{P}_{a}.$  (84)
Proposition 17

Let $\hat{P}$ and $\hat{Q}$ be a pair of orthogonal projections satisfying $\hat{P}\hat{Q}\hat{P}=\hat{Q}\hat{P}$; then $\hat{P}\hat{Q}=\hat{Q}\hat{P}$.

Proof. We first observe that $\hat{R}=\hat{Q}\hat{P}\hat{Q}$ is again an orthogonal projection. To this end, we must show that $\hat{R}^{\dagger}=\hat{R}$ and $\hat{R}^{2}=\hat{R}$. However, $\hat{R}^{\dagger}=(\hat{Q}\hat{P}\hat{Q})^{\dagger}=\hat{Q}^{\dagger}\hat{P}^{\dagger}\hat{Q}^{\dagger}=\hat{Q}\hat{P}\hat{Q}=\hat{R}$ and

$\hat{R}^{2}=(\hat{Q}\hat{P}\hat{Q})(\hat{Q}\hat{P}\hat{Q})=\hat{Q}\hat{P}\hat{Q}^{2}\hat{P}\hat{Q}=\hat{Q}\hat{P}\hat{Q}\hat{P}\hat{Q}=\hat{Q}(\hat{P}\hat{Q}\hat{P})\hat{Q}=\hat{Q}(\hat{Q}\hat{P})\hat{Q}=\hat{Q}^{2}\hat{P}\hat{Q}=\hat{Q}\hat{P}\hat{Q}=\hat{R}.$

Moreover, taking the adjoint of the hypothesis gives $\hat{P}\hat{Q}=\hat{P}\hat{Q}\hat{P}$, so that $\hat{R}=\hat{Q}(\hat{P}\hat{Q})=\hat{Q}(\hat{P}\hat{Q}\hat{P})=\hat{Q}(\hat{Q}\hat{P})=\hat{Q}\hat{P}$, using the hypothesis in the last step. The relation $\hat{R}=\hat{R}^{\dagger}$ then implies that $\hat{Q}\hat{P}=\hat{P}^{\dagger}\hat{Q}^{\dagger}=\hat{P}\hat{Q}$.

We see that the operator identity above means that $\hat{P}_{a}$ and $\hat{Q}_{b}$ need to commute! If we want the $\hat{B}$-measurement not to disturb the $\hat{A}$-measurement for any possible outcomes $a$ and $b$, then we require that all the eigen-projections of $\hat{A}$ commute with all the eigen-projections of $\hat{B}$, and this implies that $[\hat{A},\hat{B}]=0$.

Definition 18

A collection of observables is compatible if they commute. We define the commutator of two operators as

$[\hat{A},\hat{B}]=\hat{A}\hat{B}-\hat{B}\hat{A},$  (85)

so $\hat{A}$ and $\hat{B}$ are compatible if $[\hat{A},\hat{B}]=0$.

Von Neumann’s Model of Measurement

The postulates of quantum mechanics outlined above assume that all measurements are idealized, but one might expect the actual process of extracting information from quantum systems to be more involved. Von Neumann modeled the measurement process as follows. We wish to get information about an observable $\hat{X}$, say the position of a quantum system. Rather than measure $\hat{X}$ directly, we measure an observable $\hat{Y}$ giving the pointer position of a second system (called the measurement apparatus).

We will reformulate the von Neumann measurement problem in the language of estimation theory. First, we assume that the apparatus is described by a wave function $\phi$. The initial state of the system and apparatus is $|\Psi_{0}\rangle=|\Psi_{\mathrm{prior}}\rangle\otimes|\phi\rangle$, i.e.,

$\langle x,y|\Psi_{0}\rangle=\Psi_{\mathrm{prior}}(x)\,\phi(y).$  (86)

(Note that we are already falling in line with the estimation way of thinking by referring to the initial wave function of the particle as an a priori wave function - it is something we have to fix at the outset, even if we recognize it as only a guess for the correct physical state.) The system and apparatus are taken to interact by means of the unitary

$\hat{U}=e^{i\mu\hat{X}\otimes\hat{P}_{\mathrm{app}}/\hbar},$  (87)

where $\hat{P}_{\mathrm{app}}=-i\hbar\frac{\partial}{\partial y}$ is the momentum operator of the pointer conjugate to $\hat{Y}$. After the coupling, the joint state is

$\langle x,y|\hat{U}\Psi_{0}\rangle=\Psi_{\mathrm{prior}}(x)\,\phi(y-\mu x).$  (88)

If the measured value of $\hat{Y}$ is $y$, then the a posteriori wave function must be

$\psi_{\mathrm{post}}(x|y)=\frac{1}{\sqrt{\rho_{Y}(y)}}\,\psi_{\mathrm{prior}}(x)\,\phi(y-\mu x),$  (89)

where

$\rho_{Y}(y)=\int|\psi_{\mathrm{prior}}(x)\,\phi(y-\mu x)|^{2}\,dx.$  (90)

Basically, the pointer position will be a random variable with pdf given by $\rho_{Y}$: the a posteriori wave function may then be thought of as a random wave function on the system Hilbert space:

$\psi_{\mathrm{prior}}(x)\longrightarrow\psi_{\mathrm{post}}(x|Y).$  (91)

In the parlance of quantum theorists, the wave function of the apparatus collapses to $|y\rangle$, while we update the a priori wave function to get the a posteriori one.

We have been describing events in the Schrödinger picture, where states evolve while observables remain fixed. In this picture, we measure the observable $\hat{Y}^{\mathrm{in}}=I\otimes\hat{Y}$, but the state is changing in time. It is instructive to describe events in the Heisenberg picture. Here the state is fixed as $|\Psi_{0}\rangle=|\Psi_{\mathrm{prior}}\rangle\otimes|\phi\rangle$, while the observables evolve. In fact, the observable that we actually measure is

$\hat{Y}^{\text{out}}=\hat{U}^{\ast}\big(I\otimes\hat{Y}\big)\hat{U}=\underbrace{\mu\,\hat{U}^{\ast}\big(\hat{X}\otimes I\big)\hat{U}}_{\mathrm{signal}}+\underbrace{\hat{Y}^{\mathrm{in}}}_{\mathrm{noise}},$  (92)

from which it is clear that we are obtaining some information about $\hat{X}$. Note that the measured observable $\hat{Y}^{\text{out}}$ is explicitly of the form signal plus noise, as in Example 15. The noise term $\hat{Y}^{\mathrm{in}}$ is independent of the signal and has the prescribed pdf $|\phi(y)|^{2}$.
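A grid-based sketch (our own, with an assumed Gaussian pointer state $\phi$ and arbitrary values for the coupling $\mu$ and the widths) shows the update (89)-(90) in action: sampling the pointer reading and conditioning narrows the position distribution of the system.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
mu, w = 1.0, 0.3   # coupling strength and pointer width (assumed values)

# A priori system wave function (Gaussian) and pointer state phi.
psi_prior = np.exp(-(x - 1.0) ** 2 / 4.0)
psi_prior /= np.sqrt(np.sum(psi_prior ** 2) * dx)
phi = lambda y: np.exp(-y ** 2 / (4.0 * w ** 2))

# Pointer pdf rho_Y(y), eq. (90), evaluated on a grid of readings y.
y_grid = x.copy()
rho_Y = np.array([np.sum((psi_prior * phi(y - mu * x)) ** 2) * dx
                  for y in y_grid])

# Sample a reading y and form the a posteriori wave function, eq. (89).
y = rng.choice(y_grid, p=rho_Y / rho_Y.sum())
psi_post = psi_prior * phi(y - mu * x)
psi_post /= np.sqrt(np.sum(psi_post ** 2) * dx)

var = lambda p: np.sum(x**2 * p**2) * dx - (np.sum(x * p**2) * dx) ** 2
print(var(psi_prior), var(psi_post))   # conditioning reduces the variance
```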

4.2 Quantum Markovian Systems

Quantum Systems with Classical Noise

We consider a quantum system driven by Wiener noise. For HH and RR self-adjoint, we set

U(t)=eiHtiRW(t),\displaystyle U(t)=e^{-iHt-iRW(t)}, (93)

which clearly defines a unitary process. From the Ito calculus we can quickly deduce the corresponding Schrödinger equation

dU(t)=[iH12R2]U(t)dtiRU(t)dW(t).\displaystyle dU(t)=\big{[}-iH-\frac{1}{2}R^{2}\big{]}U(t)\,dt-iRU(t)\,dW(t). (94)

If we set jt(X)=U(t)XU(t)j_{t}(X)=U(t)^{\ast}XU(t), which we may think of as an embedding of the system observable XX into a noisy environment, then we similarly obtain

djt(X)=jt((X))dtijt([X,R])dW(t).\displaystyle dj_{t}(X)=j_{t}\big{(}\mathcal{L}(X)\big{)}\,dt-ij_{t}\big{(}[X,R]\big{)}\,dW(t). (95)

where

(X)=i[X,H]12[[X,R],R].\displaystyle\mathcal{L}(X)=-i[X,H]-\frac{1}{2}\big{[}[X,R],R\big{]}. (96)
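As a quick sanity check on (94), one can propagate a finite-dimensional example with the Euler-Maruyama scheme and confirm that the solution stays (approximately) unitary along each noise path. The qubit choice H = sigma_z, R = sigma_x below is our own illustration:

import numpy as np

# Illustrative qubit example (our own choice): H = sigma_z, R = sigma_x
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
H, R = sz, sx
I2 = np.eye(2, dtype=complex)

dt, nsteps = 1e-4, 10_000
rng = np.random.default_rng(0)

U = I2.copy()
for _ in range(nsteps):
    dW = np.sqrt(dt) * rng.standard_normal()
    # Euler-Maruyama step for dU = [-iH - R^2/2] U dt - i R U dW, cf. (94)
    U = U + ((-1j * H - 0.5 * R @ R) @ U) * dt - 1j * (R @ U) * dW

print("||U*U - I|| =", np.linalg.norm(U.conj().T @ U - I2))   # close to zero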

An alternative is to use Poissonian noise. Here we apply a unitary kick, SS, at times distributed as a Poisson process with rate ν>0\nu>0. Let N(t)N(t) count the number of kicks up to time tt; then {N(t):t0}\{N(t):t\geq 0\} is a stochastic process with independent stationary increments (like the Wiener process) and we have the Ito rules

dN(t)dN(t)=dN(t),dN(t)=νdt.\displaystyle dN(t)\,dN(t)=dN(t),\qquad\langle dN(t)\rangle=\nu\,dt. (97)

The Schrödinger equation is dU(t)=(SI)U(t)dN(t)dU(t)=(S-I)U(t)\,dN(t) and for the evolution of observables we now have

djt(X)=jt((X))dN(t),(X)=SXSX.\displaystyle dj_{t}(X)=j_{t}\big{(}\mathcal{L}(X)\big{)}dN(t),\qquad\mathcal{L}(X)=S^{\ast}XS-X. (98)
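This dynamics is equally easy to sample: draw N(T) for a rate-ν Poisson process and apply the kick S that many times. A short sketch (the rotation kick S and the rate are our own illustrative choices):

import numpy as np

# Illustrative kick (our own choice): a real rotation by angle theta
theta = 0.3
S = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

nu, T = 2.0, 5.0                      # kick rate and time horizon
rng = np.random.default_rng(1)

# dU = (S - I) U dN integrates to U(T) = S^{N(T)} with N(T) ~ Poisson(nu T)
U = np.linalg.matrix_power(S, rng.poisson(nu * T))

print(U.conj().T @ sz @ U)            # one sample of j_T(sigma_z)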
Lindblad Generators

A quantum dynamical semigroup is a family of completely positive maps, {Φt:t0}\{\Phi_{t}:t\geq 0\}, such that ΦtΦs=Φt+s\Phi_{t}\circ\Phi_{s}=\Phi_{t+s} and Φt(I)=I\Phi_{t}(I)=I. Under various continuity conditions one can show that the general form of the generator is

(X)=k12Lk[X,Lk]+k12[Lk,X]Lki[X,H].\displaystyle\mathcal{L}(X)=\sum_{k}\frac{1}{2}L_{k}^{\ast}[X,L_{k}]+\sum_{k}\frac{1}{2}[L_{k}^{\ast},X]L_{k}-i[X,H]. (99)

These include the examples emerging from classical noise above - in fact, combinations of the Wiener and Poissonian cases give the general classical case. But the class of Lindblad generators is strictly larger than this, meaning that we need quantum noise! This is typically what we consider when modeling quantum optics situations.
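For concreteness, here is a small helper that evaluates the generator (99) for matrix inputs and checks the unital property L(I) = 0; the qubit operators (H = sigma_z, a single collapse operator sigma_-) are our own illustrative choices:

import numpy as np

def lindblad(X, H, Ls):
    """Evaluate (99): sum_k ( L_k* [X, L_k] + [L_k*, X] L_k ) / 2 - i [X, H]."""
    comm = lambda A, B: A @ B - B @ A
    out = -1j * comm(X, H)
    for L in Ls:
        out = out + 0.5 * L.conj().T @ comm(X, L) + 0.5 * comm(L.conj().T, X) @ L
    return out

# Illustrative qubit choices (our own): H = sigma_z, one collapse operator sigma_-
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sm = np.array([[0, 0], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

print(np.allclose(lindblad(I2, sz, [sm]), 0))   # L(I) = 0, as required
print(lindblad(sz, sz, [sm]))                   # action on sigma_z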

4.3 Quantum Noise Models

Fock Space

We recall how to model bosonic fields. We wish to describe a typical pure state |Ψ|\Psi\rangle of the field. If we look at the field we expect to see a certain number, nn, of particles at locations x1,x2,,xnx_{1},x_{2},\cdots,x_{n} and to this situation we assign a complex number (the probability amplitude) ψn(x1,x2,xn)\psi_{n}(x_{1},x_{2},\cdots x_{n}). As the particles are indistinguishable bosons, the amplitude should be completely symmetric under interchange of particle identities.

The field however can have an indefinite number of particles - that is, it can be written as a superposition of fixed number states. The general form of a pure state for the field will be

|Ψ=(ψ0,ψ1,ψ2,ψ3,).\displaystyle|\Psi\rangle=\big{(}\psi_{0},\psi_{1},\psi_{2},\psi_{3},\cdots\big{)}. (100)

Note that the case n=0n=0 is included and is understood as the vacuum state. Here ψ0\psi_{0} is a complex number, with p0=|ψ0|2p_{0}=|\psi_{0}|^{2} giving the probability for finding no particles in the field.

The probability that we have exactly nn particles is

pn=|ψn(x1,x2,,xn)|2𝑑x1𝑑x2𝑑xn,\displaystyle p_{n}=\int\left|\psi_{n}\left(x_{1},x_{2},\cdots,x_{n}\right)\right|^{2}dx_{1}dx_{2}\cdots dx_{n}, (101)

and the normalization of the state is therefore n=0pn=1\sum_{n=0}^{\infty}p_{n}=1.

In particular, we take the vacuum state to be

|Ω=(1,0,0,0,).\displaystyle|\Omega\rangle=\big{(}1,0,0,0,\cdots\big{)}. (102)

The Hilbert space spanned by such states, with an indefinite number of indistinguishable bosons, is called Fock space.

A convenient spanning set is given by the exponential vectors

x1,x2,,xn|exp(α)=1n!α(x1)α(x2)α(xn).\displaystyle\langle x_{1},x_{2},\cdots,x_{n}|\exp\left(\alpha\right)\rangle=\frac{1}{\sqrt{n!}}\alpha\left(x_{1}\right)\alpha\left(x_{2}\right)\cdots\alpha\left(x_{n}\right). (103)

They are, in fact, over-complete and we have the inner products

exp(α)|exp(β)\displaystyle\langle\exp\left(\alpha\right)|\exp\left(\beta\right)\rangle (104)
=\displaystyle= n1n!α(x1)α(xn)β(x1)β(xn)𝑑x1𝑑xn\displaystyle\sum_{n}\frac{1}{n!}\int\alpha\left(x_{1}\right)^{\ast}\cdots\alpha\left(x_{n}\right)^{\ast}\beta\left(x_{1}\right)\cdots\beta\left(x_{n}\right)\,dx_{1}\cdots dx_{n}
=\displaystyle= eα(x)β(x)𝑑x\displaystyle e^{\int\alpha\left(x\right)^{\ast}\beta\left(x\right)dx}
=\displaystyle= eα|β.\displaystyle e^{\langle\alpha|\beta\rangle}.

The exponential vectors, when normalized, give the analogues to the coherent states for a single mode.

We note that the vacuum is an example: |Ω=|exp(0)|\Omega\rangle=|\exp(0)\rangle.
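The inner product formula (104) reduces, sector by sector, to powers of ⟨α|β⟩, so it is easy to verify numerically by truncating the sum over n. A small sketch (the test functions on a grid are our own choice):

import math
import numpy as np

# Illustrative square-integrable test functions on [0, 1] (our own choice)
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
alpha = np.exp(2j * np.pi * x)
beta = 0.5 * np.ones_like(x)

ip = (alpha.conj() * beta).sum() * dx           # <alpha|beta> = int alpha* beta

# Sum over the n-particle sectors: each contributes <alpha|beta>^n / n!, cf. (104)
total = sum(ip ** n / math.factorial(n) for n in range(30))

print(np.allclose(total, np.exp(ip)))           # True: matches e^{<alpha|beta>}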

Quanta on a Wire

We now take our space to be 1-dimensional - a wire. Let’s parametrize the position on the wire by variable τ\tau, and denote by 𝔉[s,t]\mathfrak{F}_{[s,t]} the Fock space over a segment of the wire sτts\leq\tau\leq t. We have the following tensor product decomposition

𝔉AB=𝔉A𝔉B,ifAB=.\displaystyle\mathfrak{F}_{A\cup B}=\mathfrak{F}_{A}\otimes\mathfrak{F}_{B},\qquad\qquad\text{if}\quad A\cap B=\emptyset. (105)

It is convenient to introduce quantum white noises b(t)b(t) and b(t)b(t)^{\ast} satisfying the singular commutation relations

[b(t),b(s)]\displaystyle[b(t),b(s)^{\ast}] =\displaystyle= δ(ts).\displaystyle\delta(t-s). (106)

Here b(t)b(t) annihilates a quantum of the field at location tt. In keeping with the usual theory of the quantized harmonic oscillator, we take it that b(t)b(t) annihilates the vacuum: b(t)|Ω=0b(t)\,|\Omega\rangle=0. More generally, this implies that

b(t)|exp(β)=β(t)|exp(β).\displaystyle b(t)\,|\exp(\beta)\rangle=\beta(t)\,|\exp(\beta)\rangle. (107)

The adjoint b(t)b(t)^{\ast} creates a quantum at position tt.

The quantum white noises are operator densities and are singular, but their integrated forms do correspond to well defined operators which we call the annihilation and creation processes, respectively,

B(t)=0tb(τ)𝑑τ,B(t)=0tb(τ)𝑑τ.\displaystyle B(t)=\int_{0}^{t}b(\tau)d\tau,\qquad B(t)^{\ast}=\int_{0}^{t}b(\tau)^{\ast}d\tau. (108)

We see that

[B(t),B(s)]=0t𝑑τ0s𝑑σδ(τσ)=min(t,s).\displaystyle[B(t),B(s)^{\ast}]=\int_{0}^{t}d\tau\int_{0}^{s}d\sigma\,\delta(\tau-\sigma)=\text{min}(t,s). (109)

In addition we introduce a further process, called the number process, according to

Λ(t)=0tb(τ)b(τ)𝑑τ.\displaystyle\Lambda(t)=\int_{0}^{t}b(\tau)^{\ast}b(\tau)d\tau. (110)
Quantum Stochastic Models

We now think of our system as lying at the origin τ=0\tau=0 of a quantum wire. The quanta move along the wire at the speed of light, cc, and the parameter τ\tau can be thought of as x/cx/c which is the time for quanta at a distance xx away to reach the system. Better still τ\tau is the time at which this part of the field passes through the system. The process B(t)=0tb(τ)𝑑τB(t)=\int_{0}^{t}b(\tau)d\tau is the operator describing the annihilation of quanta passing through the system at some stage over the time-interval [0,t][0,t].

Fix a system Hilbert space, 𝔥0\mathfrak{h}_{0}, called the initial space. A quantum stochastic process is a family of operators, {X(t):t0}\{X(t):t\geq 0\}, acting on 𝔥0𝔉[0,)\mathfrak{h}_{0}\otimes\mathfrak{F}_{[0,\infty)}.

The process is adapted if, for each tt, the operator X(t)X(t) acts trivially on the future environment factor 𝔉[t,)\mathfrak{F}_{[t,\infty)}.

QSDEs with adapted coefficients were originally introduced by Hudson & Parthasarathy in 1984. Let {Xαβ(t):t0}\{X_{\alpha\beta}(t):t\geq 0\} be four adapted quantum stochastic processes defined for α,β{0,1}\alpha,\beta\in\{0,1\}. We then consider the QSDE

\dot{X}(t)=b(t)^{\ast}X_{11}(t)b(t)+b(t)^{\ast}X_{10}(t)+X_{01}(t)b(t)+X_{00}(t), (111)

with initial condition X(0)=X0IX(0)=X_{0}\otimes I. To understand this we take matrix elements between states of the form |ϕexp(α)|\phi\otimes\exp(\alpha)\rangle and use the eigen-relation (107) to get the integrated form

\langle\phi\otimes\exp(\alpha)|X(t)|\psi\otimes\exp(\beta)\rangle=\langle\phi|X_{0}|\psi\rangle\,\langle\exp(\alpha)|\exp(\beta)\rangle (112)
+\int_{0}^{t}\alpha(\tau)^{\ast}\,\langle\phi\otimes\exp(\alpha)|X_{11}(\tau)|\psi\otimes\exp(\beta)\rangle\,\beta(\tau)\,d\tau
+\int_{0}^{t}\alpha(\tau)^{\ast}\,\langle\phi\otimes\exp(\alpha)|X_{10}(\tau)|\psi\otimes\exp(\beta)\rangle\,d\tau
+\int_{0}^{t}\langle\phi\otimes\exp(\alpha)|X_{01}(\tau)|\psi\otimes\exp(\beta)\rangle\,\beta(\tau)\,d\tau
+\int_{0}^{t}\langle\phi\otimes\exp(\alpha)|X_{00}(\tau)|\psi\otimes\exp(\beta)\rangle\,d\tau. (113)

Processes obtained this way are called quantum stochastic integrals.

The approach of Hudson and Parthasarathy is actually different [16, 17]. They arrive at the process defined by (111) by building the analogue of the Ito theory for stochastic integration: that is, they show conditions under which

dX(t)=X_{11}(t)\otimes d\Lambda(t)+X_{10}(t)\otimes dB(t)^{\ast}+X_{01}(t)\otimes dB(t)+X_{00}(t)\otimes dt, (114)

makes sense as a limit process where all the increments are future pointing. That is ΔΛΛ(t+Δt)Λ(t)\Delta\Lambda\equiv\Lambda(t+\Delta t)-\Lambda(t) with Δt>0\Delta t>0, etc.

One has, for instance,

ϕexp(α)|X00(t)ΔB(t)|ψexp(β)\displaystyle\langle\phi\otimes\exp(\alpha)|X_{00}(t)\otimes\Delta B(t)|\psi\otimes\exp(\beta)\rangle
=(tt+Δtβ(τ)𝑑τ)×ϕexp(α)|X00(t)I|ψexp(β),\displaystyle\quad=\bigg{(}\int_{t}^{t+\Delta t}\beta(\tau)d\tau\bigg{)}\times\langle\phi\otimes\exp(\alpha)|X_{00}(t)\otimes I|\psi\otimes\exp(\beta)\rangle, (115)

etc., so the two approaches coincide.

Quantum Ito Rules

It is clear from (111) that this calculus is Wick ordered - note that the creators b(t)b(t)^{\ast} all appear to the left, and all the annihilators, b(t)b(t), to the right, of the coefficients. The product of two Wick ordered expressions is not immediately Wick ordered, and one must use the singular commutation relations to achieve this. This results in an additional term which corresponds to a quantum Ito correction.

We have

dB(t)dB(t)=dB(t)^{\ast}dB(t)=dB(t)^{\ast}dB(t)^{\ast}=0. (116)

To see this, let XtX_{t} be adapted; then

exp(α)|XtdB(t)dB(t)|exp(β)=α(t)exp(α)|Xtexp(β)β(t)(dt)2\displaystyle\langle\exp(\alpha)|X_{t}dB(t)^{\ast}dB(t)|\exp(\beta)\rangle=\alpha(t)^{\ast}\langle\exp(\alpha)|X_{t}\exp(\beta)\rangle\beta(t)\,(dt)^{2} (117)

As we have a square of dtdt we can neglect such terms.

However, we have

[B(t)B(s),B(t)B(s)]=ts,(t>s)\displaystyle[B(t)-B(s),B(t)^{\ast}-B(s)^{\ast}]=t-s,\qquad(t>s) (118)

and so ΔBΔB=ΔBΔB+Δt\Delta B\,\Delta B^{\ast}=\Delta B^{\ast}\Delta B+\Delta t. The infinitesimal form of this is then

dB(t)dB(t)=dt.\displaystyle dB(t)dB(t)^{\ast}=dt. (119)

This is strikingly similar to the classical rule for increments of the Wiener process!

In fact, we have the following quantum Ito table

×      dt    dB    dB∗    dΛ
dt     0     0     0      0
dB     0     0     dt     dB
dB∗    0     0     0      0
dΛ     0     0     dB∗    dΛ
(125)

Each of the non-zero terms arises from multiplying two processes that are not in Wick order.

For a pair of quantum stochastic integrals, we have the following quantum Ito product formula

d(XY)=(dX)\,Y+X\,(dY)+(dX)(dY). (126)

Unlike the classical version, the order of XX and YY here is crucial.

Some Classical Processes On Fock Space

The process Q(t)=B(t)+B(t)Q(t)=B(t)+B(t)^{\ast} is self-commuting, that is [Q(t),Q(s)]=0[Q(t),Q(s)]=0 for all t,st,s, and has the distribution of a Wiener process in the vacuum state:

Q˙(t)\displaystyle\langle\dot{Q}(t)\rangle =\displaystyle= Ω|[b(t)+b(t)]Ω=0,\displaystyle\langle\Omega|[b(t)+b(t)^{\ast}]\Omega\rangle=0, (127)
Q˙(t)Q˙(s)\displaystyle\langle\dot{Q}(t)\dot{Q}(s)\rangle =\displaystyle= Ω|b(t)b(s)Ω=δ(ts).\displaystyle\langle\Omega|b(t)b^{\ast}(s)\Omega\rangle=\delta(t-s). (128)

The same applies to P(t)=1i[B(t)B(t)]P(t)=\frac{1}{i}[B(t)-B(t)^{\ast}], but

[Q(t),P(s)]=2imin(t,s).\displaystyle[Q(t),P(s)]=2i\,\text{min}(t,s). (129)

So we have two non-commuting Wiener processes in Fock space. We refer to QQ and PP as canonically conjugate quadrature processes.

One sees that, for instance,

dQdQ=dBdB=dt.\displaystyle dQdQ=dBdB^{\ast}=dt. (130)

We also obtain a Poisson process by the prescription

N(t)=Λ(t)+νB(t)+νB(t)+νt.\displaystyle N(t)=\Lambda(t)+\sqrt{\nu}B^{\ast}(t)+\sqrt{\nu}B(t)+\nu t. (131)

One readily checks from the quantum Ito table that dN\,dN=dN.
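The table is small enough to implement symbolically: write a generic increment as a combination a dt + b dB + c dB∗ + d dΛ and multiply componentwise through the table. The sketch below (our own construction) confirms dQ dQ = dt and the identity dN dN = dN just quoted:

import numpy as np

# Basis order: (dt, dB, dB*, dL).  TABLE[(u, v)] expresses the product u v
# in the same basis, read off from the quantum Ito table (125).
Z = (0, 0, 0, 0)
TABLE = {
    ('dt', 'dt'): Z, ('dt', 'dB'): Z, ('dt', 'dB*'): Z, ('dt', 'dL'): Z,
    ('dB', 'dt'): Z, ('dB', 'dB'): Z,
    ('dB', 'dB*'): (1, 0, 0, 0),             # dB dB* = dt
    ('dB', 'dL'):  (0, 1, 0, 0),             # dB dLambda = dB
    ('dB*', 'dt'): Z, ('dB*', 'dB'): Z, ('dB*', 'dB*'): Z, ('dB*', 'dL'): Z,
    ('dL', 'dt'): Z, ('dL', 'dB'): Z,
    ('dL', 'dB*'): (0, 0, 1, 0),             # dLambda dB* = dB*
    ('dL', 'dL'):  (0, 0, 0, 1),             # dLambda dLambda = dLambda
}
NAMES = ['dt', 'dB', 'dB*', 'dL']

def ito_product(u, v):
    """u = (a, b, c, d) stands for a dt + b dB + c dB* + d dLambda."""
    out = np.zeros(4)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out += ui * vj * np.array(TABLE[(NAMES[i], NAMES[j])], float)
    return out

dQ = (0, 1, 1, 0)                             # dQ = dB + dB*
print(ito_product(dQ, dQ))                    # [1, 0, 0, 0], i.e. dt

nu = 2.0                                      # dN = dLambda + sqrt(nu)(dB + dB*) + nu dt
dN = (nu, np.sqrt(nu), np.sqrt(nu), 1)
print(np.allclose(ito_product(dN, dN), dN))   # True: dN dN = dN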

Emission-Absorption Interactions

Let us consider a singular Hamiltonian of the form

Υ(t)=HI+iLb(t)iLb(t).\displaystyle\Upsilon(t)=H\otimes I+iL\otimes b(t)^{\ast}-iL^{\ast}\otimes b(t). (132)

We will try to realize the solution to the Schrödinger equation

\dot{U}(t)=-i\Upsilon(t)\,U(t),\qquad U(0)=I, (133)

as a unitary quantum stochastic integral process.

Let us first remark that the annihilator part of (132) will appear out of Wick order when we consider (133). The standard approach in quantum field theory is to develop the unitary U(t)U(t) as a Dyson series expansion - often re-interpreted as a time order-exponential:

U(t) = I-i\int_{0}^{t}\Upsilon(\tau)U(\tau)d\tau (134)
= I-i\int_{0}^{t}d\tau_{1}\,\Upsilon(\tau_{1})+(-i)^{2}\int_{0}^{t}d\tau_{2}\int_{0}^{\tau_{2}}d\tau_{1}\,\Upsilon(\tau_{2})\Upsilon(\tau_{1})+\cdots
= \vec{T}e^{-i\int_{0}^{t}\Upsilon(\tau)d\tau}.

In our case the field terms - the quantum white noises - are linear; however, we have the problem that they come multiplied by the system operators LL and LL^{\ast}, which do not commute with each other, and do not necessarily commute with HH either.

Fortunately we can do the Wick ordering in one fell swoop rather than having to go down each term of the Dyson series. We have

[b(t),U(t)]\displaystyle\left[b\left(t\right),U\left(t\right)\right] =\displaystyle= [b(t),Ii0tΥ(τ)U(τ)𝑑τ]=i0t[b(t),Υ(τ)]U(τ)𝑑τ\displaystyle\left[b\left(t\right),I-i\int_{0}^{t}\Upsilon\left(\tau\right)U\left(\tau\right)d\tau\right]=-i\int_{0}^{t}\left[b\left(t\right),\Upsilon\left(\tau\right)\right]U\left(\tau\right)d\tau (135)
=\displaystyle= 0t[b(t),Lb(τ)]U(τ)𝑑τ\displaystyle\int_{0}^{t}\left[b\left(t\right),Lb\left(\tau\right)^{\ast}\right]U\left(\tau\right)d\tau
=\displaystyle= L0tδ(tτ)U(τ)𝑑τ=12LU(t),\displaystyle L\int_{0}^{t}\delta\left(t-\tau\right)U\left(\tau\right)d\tau=\frac{1}{2}LU\left(t\right),

where we dropped the [b(t),U(τ)][b(t),U(\tau)] term as this should vanish for t>τt>\tau, and took half the weight of the δ\delta-function due to the upper limit tt of the integration. Hence, we get

b(t)U(t)=U(t)b(t)+12LU(t).\displaystyle b\left(t\right)U\left(t\right)=U\left(t\right)b\left(t\right)+\frac{1}{2}LU\left(t\right). (136)

Plugging this into the equation (133), we get

\dot{U}\left(t\right) = b\left(t\right)^{\ast}LU\left(t\right)-L^{\ast}b\left(t\right)U\left(t\right)-iHU\left(t\right) (137)
=\displaystyle= b(t)LU(t)LU(t)b(t)(12LL+iH)U(t).\displaystyle b\left(t\right)^{\ast}LU\left(t\right)-L^{\ast}U\left(t\right)b\left(t\right)-\left(\frac{1}{2}L^{\ast}L+iH\right)U\left(t\right).

which is now Wick ordered. We can interpret this as the Hudson-Parthasarathy equation

dU(t)={LdB(t)LdB(t)(12LL+iH)dt}U(t).\displaystyle dU\left(t\right)=\left\{L\otimes dB\left(t\right)^{\ast}-L^{\ast}\otimes dB\left(t\right)-\left(\frac{1}{2}L^{\ast}L+iH\right)\otimes dt\right\}U\left(t\right). (138)

The corresponding Heisenberg equation for jt(X)=U(t)[XI]U(t)j_{t}(X)=U(t)^{\ast}[X\otimes I]U(t) will be

djt(X)\displaystyle dj_{t}\left(X\right) =\displaystyle= dU(t)[XI]U(t)+U(t)[XI]dU(t)\displaystyle dU\left(t\right)^{\ast}\left[X\otimes I\right]U\left(t\right)+U\left(t\right)^{\ast}\left[X\otimes I\right]dU\left(t\right) (139)
+dU(t)[XI]dU(t)\displaystyle+dU\left(t\right)^{\ast}\left[X\otimes I\right]dU\left(t\right)
=\displaystyle= jt(X)dt+jt([X,L])dB(t)+jt([L,X])dB(t)\displaystyle j_{t}\left(\mathcal{L}X\right)\otimes dt+j_{t}\left(\left[X,L\right]\right)\otimes dB\left(t\right)^{\ast}+j_{t}\left(\left[L^{\ast},X\right]\right)\otimes dB\left(t\right)

where

X\displaystyle\mathcal{L}X =\displaystyle= X(12LL+iH)(12LLiH)X+LXL\displaystyle-X\left(\frac{1}{2}L^{\ast}L+iH\right)-\left(\frac{1}{2}L^{\ast}L-iH\right)X+L^{\ast}XL (140)
=\displaystyle= 12[L,X]L+12L[X,L]i[X,H].\displaystyle\frac{1}{2}\left[L^{\ast},X\right]L+\frac{1}{2}L^{\ast}\left[X,L\right]-i\left[X,H\right].

We note that we obtain the typical Lindblad form for the generator.

Scattering Interactions

We mention that we could also treat a Hamiltonian with only scattering terms. Let us set Υ(t)=Eb(t)b(t)\Upsilon\left(t\right)=E\otimes b\left(t\right)^{\ast}b\left(t\right), with EE self-adjoint. The same sort of argument leads to

[b(t),U(t)]=iE0t[b(t),b(τ)]b(τ)U(τ)𝑑τ=i2Eb(t)U(t),\displaystyle\left[b\left(t\right),U\left(t\right)\right]=-iE\int_{0}^{t}\left[b\left(t\right),b\left(\tau\right)^{\ast}\right]b\left(\tau\right)U\left(\tau\right)d\tau=-\frac{i}{2}Eb\left(t\right)U\left(t\right), (141)

which can be rearranged to give

b\left(t\right)U\left(t\right)=\frac{1}{I+\frac{i}{2}E}U\left(t\right)b\left(t\right). (142)

So the Wick ordered form is

\dot{U}\left(t\right)=-iE\,b\left(t\right)^{\ast}b\left(t\right)U\left(t\right)=\frac{-iE}{I+\frac{i}{2}E}\,b\left(t\right)^{\ast}U\left(t\right)b\left(t\right) (143)

or in quantum Ito form

dU\left(t\right)=\left(S-I\right)\otimes d\Lambda\left(t\right)\,U\left(t\right),\qquad\left(S=\frac{I-\frac{i}{2}E}{I+\frac{i}{2}E}\text{, unitary!}\right). (144)

The Heisenberg equation here is djt(X)=jt(SXSX)dΛ(t)dj_{t}\left(X\right)=j_{t}\left(S^{\ast}XS-X\right)\otimes d\Lambda\left(t\right).

This is all comparable to the classical Poisson process driven evolution involving unitary kicks.
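Unitarity of the Cayley transform S in (144) is immediate to confirm numerically for any self-adjoint E (the random 3×3 example below is our own):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
E = A + A.conj().T                          # an arbitrary self-adjoint E
I3 = np.eye(3)

# Cayley transform from (144): S = (I - iE/2)(I + iE/2)^{-1}
S = (I3 - 0.5j * E) @ np.linalg.inv(I3 + 0.5j * E)

print(np.allclose(S.conj().T @ S, I3))      # True: S is unitary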

The SLH Formalism

We now outline the so-called SLH Formalism - named after the scattering matrix operator SS, the coupling vector operator LL and Hamiltonian HH appearing in these Markov models [18]-[20]. The examples considered up to now used only one species of quanta. We could in fact have nn channels, based on nn quantum white noises:

[bj(t),bk(s)]=δjkδ(ts).\displaystyle[b_{j}(t),b^{\ast}_{k}(s)]=\delta_{jk}\,\delta(t-s). (145)

The most general form of a unitary process with fixed coefficients may be described as follows: we have a Hamiltonian H=HH=H^{\ast}, a column vector of coupling/collapse operators

L=[L1Ln],\displaystyle L=\left[\begin{array}[]{c}L_{1}\\ \vdots\\ L_{n}\end{array}\right], (149)

and a matrix of operators

S=[S11S1nSn1Snn],S1=S.\displaystyle S=\left[\begin{array}[]{ccc}S_{11}&\cdots&S_{1n}\\ \vdots&\ddots&\vdots\\ S_{n1}&\cdots&S_{nn}\end{array}\right],\qquad S^{-1}=S^{\ast}. (153)

For each such triple (S,L,H)(S,L,H) we have the QSDE

dU(t)\displaystyle dU(t) =\displaystyle= {jk(SjkδjkI)dΛjk(t)+jLjdBj(t)\displaystyle\bigg{\{}\sum_{jk}(S_{jk}-\delta_{jk}I)\otimes d\Lambda_{jk}(t)+\sum_{j}L_{j}\otimes dB_{j}^{\ast}(t) (154)
jkLjSjkdBk(t)(12kLkLk+iH)dt}U(t)\displaystyle-\sum_{jk}L_{j}^{\ast}S_{jk}\otimes dB_{k}(t)-(\frac{1}{2}\sum_{k}L_{k}^{\ast}L_{k}+iH)\otimes dt\bigg{\}}\,U(t)

which has, for initial condition U(0)=IU(0)=I, a solution which is a unitary adapted quantum stochastic process. The emission-absorption case is the n=1n=1 model with no scattering (S=IS=I). Likewise, pure scattering corresponds to H=0H=0 and L=0L=0.

Heisenberg-Langevin Dynamics

System observables evolve according to the Heisenberg-Langevin equation

dj_{t}(X) = \sum_{jk}j_{t}\Big(\sum_{l}S^{\ast}_{lj}XS_{lk}-\delta_{jk}X\Big)\otimes d\Lambda_{jk}(t)+\sum_{jl}j_{t}(S_{lj}^{\ast}[L_{l},X])\otimes dB_{j}(t)^{\ast} (155)
+\sum_{lk}j_{t}([X,L^{\ast}_{l}]S_{lk})\otimes dB_{k}(t)+j_{t}(\mathscr{L}X)\otimes dt,

where the generator is the traditional Lindblad form

X=12kLk[X,Lk]+12k[Lk,X]Lki[X,H].\displaystyle\mathscr{L}X=\frac{1}{2}\sum_{k}L^{\ast}_{k}[X,L_{k}]+\frac{1}{2}\sum_{k}[L^{\ast}_{k},X]L_{k}-i[X,H]. (156)
Quantum Outputs

The output fields are defined by

Bkout(t)=U(t)[IBk(t)]U(t).\displaystyle B^{\text{out}}_{k}(t)=U(t)^{\ast}[I\otimes B_{k}(t)]U(t). (157)

From the quantum Ito calculus we find that

dB^{\text{out}}_{j}(t)=\sum_{k}j_{t}(S_{jk})\otimes dB_{k}(t)+j_{t}(L_{j})\otimes dt, (158)

or, perhaps more suggestively, in quantum white noise language [21],

b^{\text{out}}_{j}(t)=\sum_{k}j_{t}(S_{jk})\otimes b_{k}(t)+j_{t}(L_{j})\otimes I. (159)

4.4 Quantum Filtering

We now set up the quantum filtering problem. For simplicity, we will take n=1n=1 and set S=IS=I so that we have a simple emission-absorption interaction. We will also consider the situation where we measure the QQ-quadrature of the output.

The initial state is taken to be |ψ0|Ω|\psi_{0}\rangle\otimes|\Omega\rangle, and in the Heisenberg picture this is fixed for all time.

The analogue of the stochastic dynamical equation considered in the classical filtering problem is the Heisenberg-Langevin equation

djt(X)=jt(X)dt+jt([X,L])dB(t)+jt([L,X])dB(t)\displaystyle dj_{t}\left(X\right)=j_{t}\left(\mathcal{L}X\right)\otimes dt+j_{t}\left(\left[X,L\right]\right)\otimes dB\left(t\right)^{\ast}+j_{t}\left(\left[L^{\ast},X\right]\right)\otimes dB\left(t\right) (160)

where X=12[L,X]L+12L[X,L]i[X,H]\mathcal{L}X=\frac{1}{2}\left[L^{\ast},X\right]L+\frac{1}{2}L^{\ast}\left[X,L\right]-i\left[X,H\right].

Some care is needed in specifying what exactly we measure: we should really work in the Heisenberg picture for clarity. The QQ-quadrature of the input field is Q(t)=B(t)+B(t)Q\left(t\right)=B\left(t\right)+B\left(t\right)^{\ast} which we have already seen is a Wiener process for the vacuum state of the field. Of course this is not what we measure - we measure the output quadrature!

Set

Yin(t)=IQ(t).\displaystyle Y^{\text{in}}\left(t\right)=I\otimes Q\left(t\right). (161)

As indicated in our discussion on von Neumann’s measurement model, what we actually measure is

Yout(t)=U(t)Yin(t)U(t)=Bout(t)+Bout(t).\displaystyle Y^{\text{out}}(t)=U(t)^{\ast}Y^{\text{in}}(t)U(t)=B^{\text{out}}(t)+B^{\text{out}}(t)^{\ast}. (162)

The differential form of this is

dYout(t)=dYin(t)+jt(L+L)dt.\displaystyle dY^{\text{out}}(t)=dY^{\text{in}}(t)+j_{t}(L+L^{\ast})dt. (163)

Note that

dYin(t)dYin(t)=dt=dYout(t)dYout(t).\displaystyle dY^{\text{in}}\left(t\right)dY^{\text{in}}\left(t\right)=dt=dY^{\text{out}}\left(t\right)dY^{\text{out}}\left(t\right). (164)

The dynamical noise is generally a quantum noise and can only be considered classical in very special circumstances, while the observational noise is just its QQ-quadrature which can hardly be treated as independent!

In complete contrast to the classical filtering problem we considered earlier, we have no paths for the system - just evolving observables of the system. What is more, these observables do not typically commute amongst themselves, or indeed with the measured process.

We can only apply Bayes Theorem in the situation where the quantities involved have a joint probability distribution, and in the quantum world this requires them to be compatible. At this stage it may seem like a miracle that we have any theory of filtering in the quantum world. However, let us take stock of what we have.

What Commutes With What?

For fixed s0s\geq 0, let U(t,s)U(t,s) be the solution to the QSDE (154) in time variable tst\geq s with U(s,s)=IU(s,s)=I. Formally, we have

U(t,s)=TeistΥ(τ)𝑑τ\displaystyle U\left(t,s\right)=\vec{T}e^{-i\int_{s}^{t}\Upsilon\left(\tau\right)d\tau} (165)

which is the unitary which couples the system to the part of the field that enters over the time sτts\leq\tau\leq t. In terms of our previous definition, we have U(t)=U(t,0)U(t)=U(t,0) and we have the property

U(t)=U(t,s)U(s),(t>s>0).\displaystyle U\left(t\right)=U\left(t,s\right)U\left(s\right),\qquad\left(t>s>0\right). (166)

In the Heisenberg picture, the observables evolve

jt(X)\displaystyle j_{t}\left(X\right) =\displaystyle= U(t)[XI]U(t),\displaystyle U\left(t\right)^{\ast}\left[X\otimes I\right]U\left(t\right), (167)
Yout(t)\displaystyle Y^{\text{out}}\left(t\right) =\displaystyle= U(t)[IQ(t)]U(t).\displaystyle U\left(t\right)^{\ast}\left[I\otimes Q\left(t\right)\right]U\left(t\right). (168)

We know that the input quadrature is self-commuting, but what about the output one? A key identity here is that

Y^{\text{out}}\left(s\right)=U\left(t\right)^{\ast}Y^{\text{in}}\left(s\right)U\left(t\right),\qquad\left(t>s\right), (169)

which follows from the fact that [Yin(s),U(t,s)]=0\left[Y^{\text{in}}\left(s\right),U\left(t,s\right)\right]=0.


From this, we see that the process YoutY^{\text{out}} is also commutative since

[Yout(t),Yout(s)]=U(t)[Yin(t),Yin(s)]U(t)=0,(t>s).\displaystyle\left[Y^{\text{out}}\left(t\right),Y^{\text{out}}\left(s\right)\right]=U\left(t\right)^{\ast}\left[Y^{\text{in}}\left(t\right),Y^{\text{in}}\left(s\right)\right]U\left(t\right)=0,\quad\left(t>s\right). (170)

If this were not the case then subsequent measurements of the process YoutY^{\text{out}} would invalidate (disturb?) earlier ones. In fancier parlance, we say that the process is not self-demolishing - that is, all parts are compatible with each other.

A similar line of argument shows that

\left[j_{t}\left(X\right),Y^{\text{out}}\left(s\right)\right]=U\left(t\right)^{\ast}\left[X\otimes I,I\otimes Q\left(s\right)\right]U\left(t\right)=0,\quad\left(t>s\right). (171)

Therefore, we have a joint probability for jt(X)j_{t}\left(X\right) and the continuous collection of observables {Yout(τ):0τt}\left\{Y^{\text{out}}\left(\tau\right):0\leq\tau\leq t\right\}, and so we can use Bayes Theorem to estimate jt(X)j_{t}(X) for any XX using the past observations. Following V.P. Belavkin, we refer to this as the non-demolition principle.

The Conditioned State

In the Schrödinger picture, the state at time t0t\geq 0 is |Ψt=U(t)|ϕΩ|\Psi_{t}\rangle=U\left(t\right)|\phi\otimes\Omega\rangle, so

d|Ψt\displaystyle d|\Psi_{t}\rangle =\displaystyle= (12LL+iH)|Ψtdt+LdB(t)|ΨtLdB(t)|Ψt\displaystyle-\left(\frac{1}{2}L^{\ast}L+iH\right)|\Psi_{t}\rangle dt+LdB\left(t\right)^{\ast}|\Psi_{t}\rangle-L^{\ast}dB\left(t\right)|\Psi_{t}\rangle (172)
=\displaystyle= (12LL+iH)|Ψtdt+LdB(t)|Ψt+LdB(t)|Ψt\displaystyle-\left(\frac{1}{2}L^{\ast}L+iH\right)|\Psi_{t}\rangle dt+LdB\left(t\right)^{\ast}|\Psi_{t}\rangle+LdB\left(t\right)|\Psi_{t}\rangle
=\displaystyle= (12LL+iH)|Ψtdt+LdYin(t)|Ψt.\displaystyle-\left(\frac{1}{2}L^{\ast}L+iH\right)|\Psi_{t}\rangle dt+LdY^{\text{in}}(t)|\Psi_{t}\rangle.

Here we have used a profound trick due to A.S. Holevo. The differential dB(t)dB(t) acting on |Ψt|\Psi_{t}\rangle yields zero since it is future pointing, and so only affects the future part of the field which, by adaptedness, is in the vacuum state. To get from the first line to the second line, we remove and add a term that is technically zero. In its reconstituted form, we obtain the QQ-quadrature of the input. The result is that we obtain an expression for the state |Ψt|\Psi_{t}\rangle which is “diagonal”  in the input quadrature - our terminology here is poor (we are talking about a state, not an observable!) but hopefully it wakes physicists up to what is going on.

The above equation is equivalent to the SDE in the system Hilbert space

d|χt=(12LL+iH)|χtdt+L|χtdyt\displaystyle d|\chi_{t}\rangle=-\left(\frac{1}{2}L^{\ast}L+iH\right)|\chi_{t}\rangle dt+L|\chi_{t}\rangle dy_{t} (173)

where 𝐲\mathbf{y} is a sample path - or better still, eigen-path - of the quantum stochastic process YinY^{\text{in}}.

We refer to (173) as the Belavkin-Zakai equation.
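Being an ordinary vector-valued SDE, (173) can be integrated by the Euler-Maruyama scheme, with the driving path y sampled as a Wiener process under the reference measure. A sketch for a two-level system follows (the choices H = sigma_z, L = sqrt(gamma) sigma_- are our own); note how the norm of |χt⟩ wanders away from unity, a point we return to below.

import numpy as np

# Illustrative qubit (our own choices): H = sigma_z, L = sqrt(gamma) sigma_-
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sm = np.array([[0, 0], [1, 0]], dtype=complex)
gamma = 1.0
H, L = sz, np.sqrt(gamma) * sm
K = -(0.5 * L.conj().T @ L + 1j * H)             # drift operator in (173)

dt, nsteps = 1e-4, 20_000
rng = np.random.default_rng(3)

chi = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2.0)
for _ in range(nsteps):
    dy = np.sqrt(dt) * rng.standard_normal()     # Wiener increment (reference measure)
    chi = chi + (K @ chi) * dt + (L @ chi) * dy  # Euler-Maruyama step for (173)

print("||chi_t||^2 =", np.vdot(chi, chi).real)   # not equal to 1 in general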

The Quantum Filter

Let us begin with a useful computation:

ϕΩ|jt(X)F[Y[0,t]out]|ϕΩ\displaystyle\langle\phi\otimes\Omega|j_{t}\left(X\right)F\left[Y_{\left[0,t\right]}^{\text{out}}\right]|\phi\otimes\Omega\rangle =\displaystyle= ϕΩ|U(t)(XF[Y[0,t]in])U(t)|ϕΩ\displaystyle\langle\phi\otimes\Omega|U(t)^{\ast}\big{(}X\otimes F\left[Y_{\left[0,t\right]}^{\text{in}}\right]\big{)}U(t)|\phi\otimes\Omega\rangle (174)
=\displaystyle= Ψt|XF[Y[0,t]in]|Ψt\displaystyle\langle\Psi_{t}|X\otimes F\left[Y_{\left[0,t\right]}^{\text{in}}\right]|\Psi_{t}\rangle
=\int\langle\chi_{t}(\mathbf{y})|X|\chi_{t}(\mathbf{y})\rangle\,F\left[\mathbf{y}\right]\,\mathbb{P}_{\text{Wiener}}[d\mathbf{y}].

A few comments are in order here. The operator jt(X)j_{t}\left(X\right) will commute with any functional of the past measurements - here F[Y[0,t]out]F\left[Y_{\left[0,t\right]}^{\text{out}}\right]. The first equality pulls things back in terms of the unitary U(t)U(t). The second is just the equivalence between the Schrödinger and Heisenberg pictures. The final one just uses the equivalent form (173): note that the paths of the input quadrature get their correct weighting as Wiener processes.

Setting X=IX=I in (174), we get

\langle\phi\otimes\Omega|F\left[Y_{\left[0,t\right]}^{\text{out}}\right]|\phi\otimes\Omega\rangle = \int\langle\chi_{t}(\mathbf{y})|\chi_{t}(\mathbf{y})\rangle\,F\left[\mathbf{y}\right]\,\mathbb{P}_{\text{Wiener}}[d\mathbf{y}]. (175)

So the probability of the measured paths is

[d𝐲]=χt(𝐲)|χt(𝐲)Wiener[d𝐲].\displaystyle\mathbb{Q}[d\mathbf{y}]=\langle\chi_{t}(\mathbf{y})|\chi_{t}(\mathbf{y})\rangle\,\mathbb{P}_{\text{Wiener}}[d\mathbf{y}]. (176)

Now this last equation deserves some comment! The vector |Ψt|\Psi_{t}\rangle, which lives in the system tensor Fock space, is properly normalized, but its corresponding form |χt|\chi_{t}\rangle is not! The latter is a stochastic process taking values in the system Hilbert space and is adapted to the input quadrature. However, we never said that |χt|\chi_{t}\rangle had to be normalized too, and indeed its lack of normalization follows from our “diagonalization”  procedure. In fact, if |χt|\chi_{t}\rangle were normalized then the output measure would follow a Wiener distribution and so we would be measuring white noise!

From (174) again, we can deduce the filter: we get (using the arbitrariness of the functional FF)

𝔈t(X)=χt(𝐲)|X|χt(𝐲)χt(𝐲)|χt(𝐲).\displaystyle\mathfrak{E}_{t}(X)=\frac{\langle\chi_{t}(\mathbf{y})|X|\chi_{t}(\mathbf{y})\rangle}{\langle\chi_{t}(\mathbf{y})|\chi_{t}(\mathbf{y})\rangle}. (177)

This has a remarkable similarity to (70). Moreover, using the Ito calculus we see that

dχt(𝐲)|X|χt(𝐲)\displaystyle d\langle\chi_{t}(\mathbf{y})|X|\chi_{t}(\mathbf{y})\rangle =\displaystyle= χt(𝐲)|X|χt(𝐲)dt\displaystyle\langle\chi_{t}(\mathbf{y})|\mathcal{L}X|\chi_{t}(\mathbf{y})\rangle dt (178)
+χt(𝐲)|(XL+LX)|χt(𝐲)dy(t).\displaystyle+\langle\chi_{t}(\mathbf{y})|\big{(}XL+L^{\ast}X\big{)}|\chi_{t}(\mathbf{y})\rangle\,dy(t).

This is the quantum analogue of the Duncan-Mortensen-Zakai equation.

Only a small amount of work is left in order to derive the filter equation. We first observe that the normalization (set X=IX=I) gives

dχt(𝐲)|χt(𝐲)=χt(𝐲)|(L+L)|χt(𝐲)dy(t).\displaystyle d\langle\chi_{t}(\mathbf{y})|\chi_{t}(\mathbf{y})\rangle=\langle\chi_{t}(\mathbf{y})|\big{(}L+L^{\ast}\big{)}|\chi_{t}(\mathbf{y})\rangle\,dy(t). (179)

Using the Ito calculus, it is then routine to show that the quantum filter is

d𝔈t(X)=𝔈t(X)dt+{𝔈t(XL+LX)𝔈t(X)𝔈t(L+L)}dI(t)\displaystyle d\mathfrak{E}_{t}(X)=\mathfrak{E}_{t}(\mathcal{L}X)\,dt+\big{\{}\mathfrak{E}_{t}(XL+L^{\ast}X)-\mathfrak{E}_{t}(X)\mathfrak{E}_{t}(L+L^{\ast})\big{\}}dI(t) (180)

where the innovations are defined by

dI(t)=dYout(t)𝔈t(L+L)dt.\displaystyle dI(t)=dY^{\text{out}}(t)-\mathfrak{E}_{t}(L+L^{\ast})\,dt. (181)

Again, the innovations have the statistics of a Wiener process. As in the classical case, the innovations give the difference between what we observe next, dYout(t)dY^{\text{out}}(t), and what we would have expected based on our observations up to that point, 𝔈t(L+L)dt\mathfrak{E}_{t}(L+L^{\ast})\,dt. The fact that the innovations are a Wiener process is a reflection of the efficiency of the filter - after extracting as much information as we can out of the observations, we are left with just white noise.
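To close, the filter (180)-(181) can be simulated directly. Passing to the Schrödinger picture (a standard rewriting, not spelled out in the text), we track a conditional density matrix πt with 𝔈t(X) = tr(πtX); the adjoint of (180) then reads dπ = ℒ∗π dt + (Lπ + πL∗ − tr((L+L∗)π)π) dI. Since the innovations are a Wiener process, we may drive the filter directly with Gaussian increments. The qubit operators below are our own illustrative choices:

import numpy as np

# Illustrative qubit (our own choices): H = sigma_z, L = sqrt(gamma) sigma_-
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sm = np.array([[0, 0], [1, 0]], dtype=complex)
gamma = 1.0
H, L = sz, np.sqrt(gamma) * sm

def lstar(rho):
    """Adjoint of the generator appearing in (180): the master-equation RHS."""
    return (-1j * (H @ rho - rho @ H) + L @ rho @ L.conj().T
            - 0.5 * (L.conj().T @ L @ rho + rho @ L.conj().T @ L))

dt, nsteps = 1e-4, 20_000
rng = np.random.default_rng(4)

rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)   # conditional state pi_t
for _ in range(nsteps):
    dI = np.sqrt(dt) * rng.standard_normal()     # innovations: a Wiener process
    m = np.trace((L + L.conj().T) @ rho).real    # E_t(L + L*)
    rho = rho + lstar(rho) * dt + (L @ rho + rho @ L.conj().T - m * rho) * dI
    rho = 0.5 * (rho + rho.conj().T)             # guard Hermiticity against round-off

print("E_t(sigma_z) =", np.trace(sz @ rho).real)
# The measurement record itself is recovered as dY_out = m dt + dI, cf. (181).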

Acknowledgements

I would like to thank the staff at CIRM, Luminy (Marseille), and at the Institut Henri Poincaré (Paris) for their kind support during the 2018 Trimester on Measurement and Control of Quantum Systems where this work was begun. I am also grateful to the other organizers, Pierre Rouchon and Denis Bernard, for valuable comments during the writing of these notes.

References

  • [1] V.P. Belavkin, (1989), Non-Demolition Measurements, Nonlinear Filtering and Dynamic Programming of Quantum Stochastic Processes, Lecture Notes in Control and Information Sciences 121, 245-265, Springer-Verlag, Berlin.
  • [2] A. Barchielli and V. P. Belavkin, (1991), Measurements continuous in time and a posteriori states in quantum mechanics, J. Phys. A: Math. Gen. 24, 1495.
  • [3] A. Barchielli and M. Gregoratti, (2009), Quantum Trajectories and Measurements in Continuous Time - the diffusive case, Springer Berlin Heidelberg.
  • [4] H.M. Wiseman and G.J. Milburn, (2009), Quantum Measurement and Control, Cambridge University Press.
  • [5] D. Gatarek, N. Gisin, (1991), Continuous quantum jumps and infinite-dimensional stochastic equations, Journal of Mathematical Physics 32 (8), 2152-2157.
  • [6] H.J. Carmichael, (1993), Phys. Rev. Lett. 70(15) p.2273.
  • [7] J. Dalibard, Y. Castin, and K. Mølmer, (1992), Wave-function approach to dissipative processes in quantum optics, Phys. Rev. Lett. 68 (5), 580-583.
  • [8] C. Sayrin, I. Dotsenko, et al., (1 September 2011), Real-time quantum feedback prepares and stabilizes photon number states, Nature 477, 73-77.
  • [9] H. Maassen, (1988), Theoretical concepts in quantum probability: quantum Markov processes. Fractals, quasicrystals, chaos, knots and algebraic quantum mechanics (Maratea, 1987), 287-302, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., 235, Kluwer Acad. Publ., Dordrecht
  • [10] M. Takesaki, (1972) Conditional Expectations in von Neumann Algebras, J. Func. Anal., 9, 306-321.
  • [11] L. Bouten, R. van Handel and M.R. James, (2007), An introduction to quantum filtering, SIAM Journal on Control and Optimization 46, 2199.
  • [12] L. Bouten, R. van Handel, Quantum filtering: a reference probability approach, arXiv:math-ph/0508006.
  • [13] R. van Handel, Ph.D. Thesis, Filtering, Stability, and Robustness, CalTech, 2006, http://www.princeton.edu/~rvan/thesisf070108.pdf
  • [14] H. Wiseman, (1994), Quantum theory of continuous feedback, Phys. Rev. A, 49(3):2133-2150.
  • [15] P. Rouchon, (August 13 - 21, 2014), Models and Feedback Stabilization of Open Quantum Systems Extended version of the paper attached to an invited conference for the International Congress of Mathematicians in Seoul, arXiv:1407.7810
  • [16] R.L. Hudson and K.R. Parthasarathy, (1984), Quantum Ito’s formula and stochastic evolutions, Commun. Math. Phys. 93, 301.
  • [17] K.R. Parthasarathy, (1992) An Introduction to Quantum Stochastic Calculus, Birkhauser.
  • [18] J. Gough, M.R. James, (2009), Quantum Feedback Networks: Hamiltonian Formulation, Commun. Math. Phys. 287, 1109.
  • [19] J. Gough, M.R. James, (2009), The series product and its application to quantum feedforward and feedback networks, IEEE Trans. on Automatic Control 54, 2530.
  • [20] J. Combes, J. Kerckhoff, M. Sarovar, (2017), The SLH framework for modeling quantum input-output networks, Advances in Physics: X, 2:3, 784-888.
  • [21] C.W. Gardiner and M.J. Collett, (1985), Input and output in damped quantum systems: Quantum stochastic differential equations and the master equation. Phys. Rev. A, 31(6):3761-3774.