
Approximation of Stationary Processes and Toeplitz Spectra

Giorgio Picci and Bin Zhu

Giorgio Picci is with the Department of Information Engineering, University of Padova, Padova, Italy (e-mail: [email protected]). Bin Zhu is with the School of Intelligent Systems Engineering, Sun Yat-sen University, Waihuan East Road 132, 510006 Guangzhou, China (e-mail: [email protected]).
Abstract

We study the approximation of stationary processes by a simple class of purely deterministic signals. This has an analytic counterpart in the approximation of symmetric positive definite Toeplitz matrices by matrices of finite rank. We propose a notion of distance between them and prove a weak-sense approximation result.

I Introduction and Problem Statement

In our past investigations [1] we discovered that the a posteriori covariance operator of certain random harmonic oscillation signals corrupted by white noise has remarkable properties, very similar to those uncovered in the 1960s and 1970s by D. Slepian and coworkers in a famous series of papers concerning the energy concentration problems of time- and band-limited signals [2, 3, 4, 5]. The key property of the covariance operator in question is that its eigenvalues decay extremely fast to zero for indices greater than an a priori computable number (the so-called Slepian frequency).

From the sharp decay to zero of the eigenvalues it follows that the a posteriori probabilistic description of the signal is essentially finite-dimensional, and in simulated experiments the observations appear to be well approximated by a purely deterministic process [6] corrupted by white noise. Since purely deterministic processes of finite rank can be described by linear finite-dimensional state-space models [6, p. 277], the estimation can be carried out by rather straightforward subspace methods.

However, the precise meaning of this approximation has so far not been understood. The scope of this paper is to propose a possible metric for this approximation and to prove convergence, although only in a weak sense.

Consider the infinite covariance matrix $\boldsymbol{\Sigma}=\left[\sigma(t-s)\right]_{t,s\in\mathbb{Z}}$ (in this paper bold symbols like $\boldsymbol{\Sigma}$ are reserved for infinite arrays, e.g. stochastic processes or covariance matrices) of a scalar stationary zero-mean process having covariance function

$$\sigma(\tau)=\mathbb{E}\,\mathbf{y}(t+\tau)\mathbf{y}(t),\qquad\tau\in\mathbb{Z} \tag{1}$$

which we shall assume admits a Fourier transform

$$\varphi(e^{j\theta})=\sum_{\tau=-\infty}^{+\infty}e^{-j\theta\tau}\sigma(\tau)$$

which is a piecewise smooth function of $\theta$. For example, $\varphi$ will be continuous and bounded when $\sigma$ belongs to $\ell^{1}$. The function $\varphi$ is called the symbol of $\boldsymbol{\Sigma}$. The process $\mathbf{y}$ is not necessarily purely nondeterministic; however, we assume that $\varphi(e^{j\theta})$ is piecewise continuous and bounded, as is, for example, a rectangular function. Then $\boldsymbol{\Sigma}$ induces a bounded operator on $\ell^{2}$ [7].
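To make this concrete, here is a minimal numerical sketch (assuming Python with NumPy/SciPy; the band edge $a$ and truncation size $N$ are arbitrary illustrative choices, not values from the paper) of the truncations of $\boldsymbol{\Sigma}$ for a rectangular symbol, whose Fourier coefficients are $\sigma(\tau)=\sin(a\tau)/(\pi\tau)$:

```python
import numpy as np
from scipy.linalg import toeplitz

# Rectangular (band-limiting) symbol: phi(e^{j theta}) = 1 for |theta| <= a,
# 0 otherwise; its Fourier coefficients are sigma(tau) = sin(a tau)/(pi tau).
a = 0.4 * np.pi          # band edge (illustrative choice)
N = 256                  # truncation size (illustrative choice)

tau = np.arange(N)
sigma = np.where(tau == 0, a / np.pi,
                 np.sin(a * tau) / (np.pi * np.maximum(tau, 1)))
Sigma_N = toeplitz(sigma)                     # the N x N truncation of Sigma

# Symmetric positive semidefinite Toeplitz matrix; eigenvalues in descending order.
eigvals = np.linalg.eigvalsh(Sigma_N)[::-1]
print(eigvals[:5], eigvals[-5:])
```

For a band-limiting symbol of this kind the eigenvalues of the truncations cluster near $0$ and $1$, the essential range of $\varphi$, which is precisely the Slepian-type concentration behavior recalled above.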

Let now

$$\mathbf{y}_{N}(t)=\begin{bmatrix}\mathbf{y}(t)&\mathbf{y}(t+1)&\cdots&\mathbf{y}(t+N-1)\end{bmatrix}^{\top}$$

and consider the $N$-truncation of the matrix $\boldsymbol{\Sigma}$, defined as

$$\Sigma_{N}:=\mathbb{E}\,\mathbf{y}_{N}(t)\mathbf{y}_{N}(t)^{\top}$$

which for each $N$ has a positive point spectrum, say

$$\mathfrak{S}_{N}:=\{\lambda_{N,1},\ldots,\lambda_{N,N}\}$$

where the eigenvalues are ordered in decreasing magnitude. We shall assume that all $\Sigma_{N}$ are nonsingular, so that the eigenvalues are strictly positive for all $N$. By Weyl's theorem [8, p. 203], see also [9, Fact M], the $k$-th eigenvalue of $\Sigma_{N}$ is a nondecreasing function of $N$ and hence has a limit, $\lambda_{k}(\boldsymbol{\Sigma})$, which may possibly be equal to $+\infty$. Each such limit is called an eigenvalue of $\boldsymbol{\Sigma}$. These limits, however, are in general not true eigenvalues, as it is well known that $\boldsymbol{\Sigma}$ may have none. For example, a bounded symmetric Toeplitz matrix has a purely continuous spectrum [10].
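A quick self-contained check of this monotonicity (reusing the illustrative rectangular-symbol covariance from the sketch above) could look as follows:

```python
import numpy as np
from scipy.linalg import toeplitz

# Same rectangular-symbol covariance sequence as above (illustrative values).
a = 0.4 * np.pi
tau = np.arange(256)
sigma = np.where(tau == 0, a / np.pi,
                 np.sin(a * tau) / (np.pi * np.maximum(tau, 1)))

k = 3
for n in (16, 32, 64, 128, 256):
    lam = np.linalg.eigvalsh(toeplitz(sigma[:n]))[::-1]
    print(n, lam[k - 1])   # the k-th eigenvalue weakly increases with n
```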

Assume now that all eigenvalues of $\boldsymbol{\Sigma}$ are finite (this is equivalent to $\boldsymbol{\Sigma}$ being a bounded linear operator on $\ell^{2}$, see [11]) and let us keep in the formal spectral decomposition of $\boldsymbol{\Sigma}$ the $n$ largest eigenvalues, setting all the others to zero. In this way we form an “approximation” of $\boldsymbol{\Sigma}$ of finite rank $n$, which we shall denote $\boldsymbol{\Sigma}^{n}$. Then the (infinite) matrix $\boldsymbol{\Sigma}^{n}$ is the covariance of a purely deterministic process of rank $n$ [6] whose spectral density, say $\varphi_{n}(e^{j\theta})$, is the sum of $n$ Dirac pulses. We would like to know in what sense, if any, $\boldsymbol{\Sigma}^{n}$ could be considered an approximation of $\boldsymbol{\Sigma}$ or, equivalently, $\varphi_{n}(e^{j\theta})$ could be considered an approximation of the symbol $\varphi(e^{j\theta})$. Obviously this last fact could only happen in a weak sense; say, for arbitrary test functions $\psi(e^{j\theta})$ continuous on the unit circle one should have

$$\int_{-\pi}^{\pi}\psi(e^{j\theta})\,\varphi_{n}(e^{j\theta})\,\psi(e^{j\theta})^{*}\,\frac{d\theta}{2\pi}\;\longrightarrow\;\int_{-\pi}^{\pi}\psi(e^{j\theta})\,\varphi(e^{j\theta})\,\psi(e^{j\theta})^{*}\,\frac{d\theta}{2\pi} \tag{2}$$

as $n\to\infty$.

An equivalent question can be posed in terms of $L^{2}$ approximation of a stationary process $\mathbf{y}$ of covariance $\boldsymbol{\Sigma}$ (which could in particular be p.n.d.) by a purely deterministic one, say $\hat{\mathbf{y}}_{n}$, having covariance $\boldsymbol{\Sigma}^{n}$ of rank $n$. As we shall see, this problem should also be naturally formulated in a weak sense.

II Approximation of random vectors

To begin with, suppose we want to approximate an $N$-dimensional zero-mean random vector $y$ with covariance matrix $\Sigma\in\mathbb{R}^{N\times N}$ by another $N$-dimensional vector, say $\hat{y}$, having covariance $\Sigma_{n}$ of rank $n<N$. To make the problem well posed we shall require that the approximation $\hat{y}$ generates a linear subspace of $\mathbf{H}(y)$, which will then have to be $n$-dimensional. This means that we can represent $\hat{y}$ as a linear function of $y$, say

$$\hat{y}:=My$$

where $M\in\mathbb{R}^{N\times N}$ has rank $n$.

Motivated from the above, let us consider the following approximation problem:

Problem 1.

Find a matrix $M\in\mathbb{R}^{N\times N}$ of rank $n$ solving the minimum problem

$$\min_{\operatorname{rank}(M)=n}\mathbb{E}\{\|y-My\|^{2}\}\,. \tag{3}$$

Note that an equivalent geometric formulation is to look for an optimal $n$-dimensional subspace of $\mathbf{H}(y)$ onto which $y$ should be projected in order to minimize the approximation error variance. Let us stress that this is quite different from the usual least-squares approximation problem, which amounts to projecting onto a given subspace.

As usual, minimizing the squared distance in (3) requires that the approximation $My$ be uncorrelated with the approximation error; namely

$$y-My\,\perp\,My \tag{4}$$

which is equivalent to

$$M\Sigma-M\Sigma M^{\top}=0\,.$$

Introducing a square root $\Sigma^{1/2}$ of $\Sigma$ and defining $\hat{M}:=\Sigma^{-1/2}M\Sigma^{1/2}$, this condition is seen to be equivalent to

$$\hat{M}=\hat{M}\,\hat{M}^{\top}$$

which just says that $\hat{M}$ must be symmetric and idempotent (i.e. $\hat{M}=\hat{M}^{2}$), in other words an orthogonal projection from $\mathbb{R}^{N}$ onto some $n$-dimensional subspace. Hence $M$ must have the following structure

$$M=\Sigma^{1/2}\,\Pi\,\Sigma^{-1/2},\qquad\Pi=\Pi^{2},\qquad\Pi=\Pi^{\top} \tag{5}$$

where $\Sigma^{1/2}$ is any square root of $\Sigma$ and $\Pi$ is an orthogonal projection matrix of rank $n$.

Theorem 2.

The solutions of the approximation problem (3) are of the form

$$M=WW^{\top},\qquad W=U_{n}Q_{n}$$

where $U_{n}$ is an $N\times n$ matrix whose columns are the first $n$ normalized eigenvectors of $\Sigma$, ordered according to the decreasing magnitude of the corresponding first $n$ eigenvalues (collected in the diagonal matrix $\Lambda_{n}$), and $Q_{n}$ is an arbitrary $n\times n$ orthogonal matrix.

Proof: Let $\Lambda:=\operatorname{diag}\{\lambda_{1},\ldots,\lambda_{N}\}$ and let $\Sigma=U\Lambda U^{\top}$ be the spectral decomposition of $\Sigma$, in which $U$ is an orthogonal matrix of eigenvectors. We can pick as a square root of $\Sigma$ the matrix $\Sigma^{1/2}:=U\Lambda^{1/2}$.

Now, no matter how $\Sigma^{1/2}$ is chosen, the random vector $\mathbf{e}:=\Sigma^{-1/2}y$ has orthonormal components. Hence, using (5), the cost function of our minimum problem can be rewritten as

$$\begin{aligned}\mathbb{E}\{\|y-My\|^{2}\}&=\mathbb{E}\{\|\Sigma^{1/2}\mathbf{e}-\Sigma^{1/2}\,\Pi\,\Sigma^{-1/2}y\|^{2}\}\\ &=\mathbb{E}\{\|\Sigma^{1/2}(\mathbf{e}-\Pi\,\mathbf{e})\|^{2}\}\\ &=\mathbb{E}\{\|\Lambda^{1/2}(\mathbf{e}-\Pi\,\mathbf{e})\|^{2}\}\\ &=\mathbb{E}\,(\mathbf{e}-\Pi\,\mathbf{e})^{\top}\Lambda\,(\mathbf{e}-\Pi\,\mathbf{e})\\ &=\operatorname{tr}\left[\Lambda\,\mathbb{E}(\mathbf{e}-\Pi\,\mathbf{e})(\mathbf{e}-\Pi\,\mathbf{e})^{\top}\right]\end{aligned}$$

where $\operatorname{tr}A:=\sum_{k}a_{kk}$ is the trace of $A$. Since $\mathbb{E}\,\mathbf{e}\mathbf{e}^{\top}=I$, the last expectation equals $(I-\Pi)(I-\Pi)^{\top}=I-\Pi$. Our minimum problem can therefore be rewritten as

$$\min_{\operatorname{rank}(\Pi)=n}\operatorname{tr}\{\Lambda\Pi^{\perp}\}$$

where $\Pi^{\perp}:=I-\Pi$ is the orthogonal projection onto the orthogonal complement of the subspace $\operatorname{Im}\Pi$.

Since the eigenvalues are ordered in decreasing order, i.e. $\lambda_{1}\geq\ldots\geq\lambda_{N}$, one sees that the minimum of this function of $\Pi$ is reached when $\Pi$ projects onto the subspace spanned by the first $n$ coordinate axes. In other words, $\Pi_{\rm optimal}=\operatorname{diag}\{I_{n},\,0\}$, the minimum being $\lambda_{n+1}+\ldots+\lambda_{N}$. It is then evident that

$$M=U\Lambda^{1/2}\,\Pi_{\rm optimal}\,\Lambda^{-1/2}U^{\top}=U_{n}U_{n}^{\top}$$

Naturally, multiplying $U_{n}$ by any orthogonal $n\times n$ matrix does not change the result. $\Box$

Observe that $\hat{y}=U_{n}U_{n}^{\top}y$ is just the first-$n$ Principal Components Approximation (PCA) of $y$. In fact, it is well known that the PCA vector $\hat{y}$ can be seen as a linear transformation acting on $y$ [12]. This result confirms in particular that the truncated PCA expansion is optimal in the sense that it provides the best $M$ and the best approximation subspace for the criterion (3). This characterization has also been found by a different technique in the study of subspace approximation problems; see e.g. [13].
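As a sanity check of Theorem 2, the following sketch (with a hypothetical random covariance; the dimensions are arbitrary choices, not from the paper) verifies that $M=U_{n}U_{n}^{\top}$ attains the optimal cost $\lambda_{n+1}+\ldots+\lambda_{N}$:

```python
import numpy as np

# Sanity check of Theorem 2: the rank-n minimizer of E||y - M y||^2
# is M = U_n U_n^T, with optimal cost lambda_{n+1} + ... + lambda_N.
rng = np.random.default_rng(0)
N, n = 8, 3
A = rng.standard_normal((N, N))
Sigma = A @ A.T                          # a generic covariance matrix
lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]           # descending eigenvalue order

M = U[:, :n] @ U[:, :n].T                # optimal rank-n approximant
# For y with covariance Sigma: E||y - M y||^2 = tr[(I-M) Sigma (I-M)^T]
cost = np.trace((np.eye(N) - M) @ Sigma @ (np.eye(N) - M).T)
print(cost, lam[n:].sum())               # the two numbers agree
```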

Note for future reference that the variance matrix of $\hat{y}$ has rank $n$ since

$$\mathbb{E}\,\hat{y}\hat{y}^{\top}=U_{n}U_{n}^{\top}\,\mathbb{E}\,yy^{\top}\,U_{n}U_{n}^{\top}=U_{n}\operatorname{diag}\{\lambda_{1},\ldots,\lambda_{n}\}U_{n}^{\top} \tag{6}$$

and that this expression holds for arbitrary $N\geq n$. Naturally, one should keep in mind that the eigenvector matrices now depend on $N$ while the eigenvalues stay fixed.

III Extension to infinite dimension

In the same spirit, consider now a stationary zero-mean process $\mathbf{y}$ and any jointly stationary zero-mean process $\mathbf{z}$ (both written as doubly infinite column vectors) spanning a subspace $\mathbf{H}(\mathbf{z})\subset\mathbf{H}(\mathbf{y})$ of dimension $n$. Any such process $\mathbf{z}$ must be purely deterministic of rank $n$ and is then uniquely determined by any finite string of random variables $\{\mathbf{z}(t)\}_{t\in I}$ induced on an interval $I$ of length $N\geq n$. This statement from [6, pp. 276-277] is reported for completeness in the following lemma.

Lemma 1.

Any p.d. process $\mathbf{z}$ of rank $n$ can be represented for all $t\in\mathbb{Z}$ by a state-space model (i.e. a stochastic realization) of the form

$$\boldsymbol{\xi}(t+1)=A\,\boldsymbol{\xi}(t) \tag{7}$$
$$\mathbf{z}(t)=c\,\boldsymbol{\xi}(t) \tag{8}$$

where $\boldsymbol{\xi}(t)=[\,\xi_{1}(t),\,\xi_{2}(t),\,\ldots,\xi_{n}(t)\,]^{\top}$ is an $n$-dimensional basis vector spanning the Hilbert space $\mathbf{H}(\mathbf{z}_{N})$ linearly generated by the $N\geq n$ random variables of the set $\{\mathbf{z}(s)\,:\,t\geq s\geq t-N+1\}$, $A$ is an $n\times n$ unitary matrix, and $c$ is an $n$-vector such that the pair $(A,c)$ is observable.

Proof: See [6, p. 277]. $\Box$

This linear system extends in time the finite family of random variables $\{\mathbf{z}(s)\}$, generators of $\mathbf{H}(\mathbf{z}_{N})$, to generate the stationary p.d. process $\mathbf{z}$ defined on the whole time line $\mathbb{Z}$. Since this realization is uniquely determined by the space $\mathbf{H}(\mathbf{z}_{N})$ modulo a choice of basis, it follows that the process $\mathbf{z}$ is also uniquely determined by $\mathbf{H}(\mathbf{z}_{N})$. Hence all covariances $\sigma(\tau)=\mathbb{E}\,\mathbf{z}(t+\tau)\mathbf{z}(t)$ are also uniquely defined and determine the entries of the covariance operator of the process.
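A minimal sketch of such a realization (the orthogonal $A$ and the vector $c$ below are hypothetical random choices, not taken from the paper) illustrates that the resulting covariance sequence is Toeplitz of rank at most $n$:

```python
import numpy as np
from scipy.linalg import toeplitz

# Sketch of the realization in Lemma 1 with hypothetical A and c.
rng = np.random.default_rng(0)
n = 4
A, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal (unitary) A
c = rng.standard_normal(n)

# With E[xi xi^T] = I (invariant under A, since A I A^T = I for orthogonal A)
# the process z(t) = c A^t xi is stationary with sigma(tau) = c A^tau c^T.
sigma = np.array([c @ np.linalg.matrix_power(A, t) @ c for t in range(12)])

# The induced covariance matrix is Toeplitz of rank (at most) n.
print(np.linalg.matrix_rank(toeplitz(sigma)))     # prints n = 4
```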

In analogy to Problem 1, let us ask if there is a stationary process $\mathbf{z}$ spanning a subspace $\mathbf{H}(\mathbf{z})\subset\mathbf{H}(\mathbf{y})$ of dimension $n$ which minimizes an approximation criterion of the type (3). If such a process exists we shall call it an $n$-PC approximation of $\mathbf{y}$ and denote it by the symbol $\hat{\mathbf{y}}_{n}$.

Let then $I=[\,t,t+1,\ldots,t+N-1\,]$ denote a finite subinterval of the time line $\mathbb{Z}$ of length $N\geq n$ and $X_{I}$ an $N\times\infty$ matrix whose $(j,k)$ entry equals one if $k$ is the $j$-th element of $I$ and zero otherwise. Consider the finite random vectors $\mathbf{y}_{N}:=X_{I}\mathbf{y}$ and $\mathbf{z}_{N}:=X_{I}\mathbf{z}$, and let $\hat{\mathbf{y}}_{N}$ be the random vector minimizing the norm $\mathbb{E}\{\|X_{I}\mathbf{y}-X_{I}\mathbf{z}\|^{2}\}$ for an interval of length $N\geq n$. This problem is analogous to Problem 1, where now $y$ is replaced by $\mathbf{y}_{N}$. The solution string determines by stationary extension (Lemma 1) a purely deterministic process $\hat{\mathbf{y}}$ such that $\hat{\mathbf{y}}_{N}:=X_{I}\hat{\mathbf{y}}$ and $\mathbf{H}(\hat{\mathbf{y}})=\mathbf{H}(\hat{\mathbf{y}}_{N})$ has dimension $n$. Since $\mathbf{H}(\hat{\mathbf{y}})\subset\mathbf{H}(\mathbf{y}_{N})\subset\mathbf{H}(\mathbf{y})$, this process satisfies our requirements. By stationarity, $\hat{\mathbf{y}}$ is invariant with respect to translations of the interval $I$ provided its length $N$ is fixed. Then $\hat{\mathbf{y}}$ is an $n$-PC approximation of $\mathbf{y}$. The question now is to understand in what sense this approximation can get close to $\mathbf{y}$ as $n\to\infty$.

Since we are now studying the behavior of the $n$-PC approximation of $\mathbf{y}$ as the dimension $n$ varies, we shall attach a superscript to $\hat{\mathbf{y}}$ and denote it by $\hat{\mathbf{y}}^{n}$; likewise for its covariance matrix, which will be denoted $\hat{\boldsymbol{\Sigma}}^{n}$.

Theorem 3.

For each $n$ the $n$-PC approximation of $\mathbf{y}$ has an (infinite) covariance operator $\hat{\boldsymbol{\Sigma}}^{n}$ of rank $n$. The sequence $\{\hat{\boldsymbol{\Sigma}}^{n}\}$ converges weakly to $\boldsymbol{\Sigma}$ as $n$ diverges to $\infty$; that is,

$$\psi^{\top}[\boldsymbol{\Sigma}-\hat{\boldsymbol{\Sigma}}^{n}]\psi\rightarrow 0\qquad\text{as}\quad n\to\infty$$

for all functions (row sequences) $\psi^{\top}:=a^{\top}X_{I}$ having support in $I$, where $a\in\mathbb{R}^{N}$ is arbitrary.

Proof: Consider an $n$-PC approximation $\hat{\mathbf{y}}$ of $\mathbf{y}$ and the restriction $X_{I}\hat{\mathbf{y}}$ to any interval $I$ of length $N\geq n$. Recall that $\hat{\mathbf{y}}$ is now denoted $\hat{\mathbf{y}}^{n}$, and likewise its covariance matrix is denoted $\hat{\Sigma}^{n}$. By analogy with (6), the $N\times N$ truncation of this matrix has the structure

$$\hat{\Sigma}^{n}_{N}=U^{n}_{N}\operatorname{diag}\{\lambda_{1},\ldots,\lambda_{n}\}(U^{n}_{N})^{\top} \tag{9}$$

where the eigenvalues $\lambda_{k}$ are fixed and equal to the first $n$ eigenvalues of the $N\times N$ truncation of the covariance operator $\boldsymbol{\Sigma}$ of $\mathbf{y}$. The $N\times n$ eigenvector matrices $U^{n}_{N}$ depend on $N$, as their dimension trivially grows with $N$.

We shall now show that all $\hat{\Sigma}^{n}_{N}$ are the $N$-truncations of an infinite Toeplitz matrix $\hat{\boldsymbol{\Sigma}}^{n}$ of rank $n$ which is the covariance of the purely deterministic process $\hat{\mathbf{y}}^{n}$. That this is so follows from the fact that all covariances $\hat{\sigma}^{n}(\tau)=\mathbb{E}\,\hat{\mathbf{y}}^{n}(t+\tau)\hat{\mathbf{y}}^{n}(t)$ are uniquely defined by the state-space model (7), (8) and constitute exactly the entries of the (infinite) covariance operator $\hat{\boldsymbol{\Sigma}}^{n}$ of the p.d. process $\hat{\mathbf{y}}^{n}$. Then, in particular, each finite matrix $\hat{\Sigma}^{n}_{N}$ must be an $N$-truncation of the same $\hat{\boldsymbol{\Sigma}}^{n}$. From the expansion (9) it follows that this truncation is just the (symmetric) rank-$n$ SVD approximation of the $N\times N$ truncation of $\boldsymbol{\Sigma}$, which is of course well defined for all finite $N$.

By a well-known property of the SVD (see e.g. [14, Chap. 2]), the variance matrix $\hat{\Sigma}^{n}_{N}$ of $X_{I}\hat{\mathbf{y}}^{n}$ is the symmetric $N\times N$ matrix of rank $n$ which has minimum distance from that of $X_{I}\mathbf{y}$ in the Frobenius norm. This in turn implies that $\hat{\boldsymbol{\Sigma}}^{n}$ is the (infinite) covariance matrix of rank $n$ which minimizes

$$\psi^{\top}(\boldsymbol{\Sigma}-\hat{\boldsymbol{\Sigma}}^{n})\psi=a^{\top}(\Sigma_{N}-\hat{\Sigma}^{n}_{N})a \tag{10}$$

for all functions $\psi$ having support in an interval $I\subset\mathbb{Z}$ of length $N\geq n$. The sequence (10) is nonnegative and monotonically decreasing in $n$ for fixed $N\geq n$. In fact, by obvious properties of the SVD, for each $\psi$ of finite support of length $N$ it tends to zero in a finite number of steps, since for $n=N$ the difference is zero. But this happens for all $N$ and hence for all $\psi$ of finite support. $\Box$
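The following sketch illustrates the proof mechanism numerically (reusing the illustrative rectangular-symbol covariance of Sec. I; all parameter values are hypothetical): the quadratic form (10) is nonnegative and decreases to zero as $n$ grows.

```python
import numpy as np
from scipy.linalg import toeplitz

# Rectangular-symbol Toeplitz covariance, as in the earlier sketches.
N = 64
a = 0.4 * np.pi
tau = np.arange(N)
sig = np.where(tau == 0, a / np.pi,
               np.sin(a * tau) / (np.pi * np.maximum(tau, 1)))
S = toeplitz(sig)
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]                   # descending order

psi = np.random.default_rng(0).standard_normal(N)  # arbitrary test vector
for n in (4, 8, 16, 32, 64):
    S_n = (U[:, :n] * lam[:n]) @ U[:, :n].T        # rank-n PC truncation, as in (9)
    print(n, psi @ (S - S_n) @ psi)                # decreases to 0 as n -> N
```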

Remark 1.

Contrary to its finite submatrices $\Sigma_{N}$, the infinite covariance operator $\boldsymbol{\Sigma}$ may not have eigenvalues (nor corresponding eigenvectors), and consequently the idea of PC approximation does not apply to the full matrix. For this reason the approximation and the convergence results may not hold in a strong sense.

IV Approximation in the spectral domain

We now go back to the problem formulation of Sec. I. From [6, Chap. 3], the processes $\mathbf{y}$ and $\hat{\mathbf{y}}^{n}$ have spectral representations with random spectral measures $dZ(e^{i\theta})$ and $dZ^{n}(e^{i\theta})$ such that

$$\mathbb{E}\,dZ(e^{i\theta})\,dZ(e^{i\theta})^{*}=\varphi(e^{i\theta})\,\frac{d\theta}{2\pi}\,,\qquad \mathbb{E}\,dZ^{n}(e^{i\theta})\,dZ^{n}(e^{i\theta})^{*}=\varphi_{n}(e^{i\theta})\,\frac{d\theta}{2\pi}\,.$$

Letting $\hat{\psi}(e^{i\theta}):=\sum_{k=0}^{N-1}\psi(k)e^{i\theta k}$, one has

$$\begin{aligned}\psi^{\top}\boldsymbol{\Sigma}\psi&=\mathbb{E}\Big[\sum_{k=0}^{N-1}\psi(k)\mathbf{y}(k)\Big]^{2}=\mathbb{E}\Big[\int_{-\pi}^{\pi}\hat{\psi}(e^{i\theta})\,dZ(e^{i\theta})\Big]^{2}\\ &=\int_{-\pi}^{\pi}\hat{\psi}(e^{i\theta})\,\varphi(e^{i\theta})\,\hat{\psi}(e^{i\theta})^{*}\,\frac{d\theta}{2\pi}\end{aligned}$$

and similarly for $\psi^{\top}\hat{\boldsymbol{\Sigma}}^{n}\psi$, which can be written

$$\begin{aligned}\psi^{\top}\hat{\boldsymbol{\Sigma}}^{n}\psi&=\mathbb{E}\Big[\sum_{k=0}^{N-1}\psi(k)\hat{\mathbf{y}}^{n}(k)\Big]^{2}=\mathbb{E}\Big[\int_{-\pi}^{\pi}\hat{\psi}(e^{i\theta})\,dZ^{n}(e^{i\theta})\Big]^{2}\\ &=\int_{-\pi}^{\pi}\hat{\psi}(e^{i\theta})\,\varphi_{n}(e^{i\theta})\,\hat{\psi}(e^{i\theta})^{*}\,\frac{d\theta}{2\pi}\,.\end{aligned}$$

Therefore (2) follows from Theorem 3.
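This chain of identities can also be checked numerically. The sketch below (assuming Python with NumPy/SciPy; the parameters and the quadrature grid are illustrative choices) compares $\psi^{\top}\boldsymbol{\Sigma}\psi$ computed in the time domain with the spectral integral, for the rectangular symbol of Sec. I:

```python
import numpy as np
from scipy.linalg import toeplitz

# Time-domain quadratic form psi^T Sigma_N psi ...
N = 32
a = 0.4 * np.pi
tau = np.arange(N)
sig = np.where(tau == 0, a / np.pi,
               np.sin(a * tau) / (np.pi * np.maximum(tau, 1)))
psi = np.random.default_rng(1).standard_normal(N)
lhs = psi @ toeplitz(sig) @ psi

# ... versus the spectral integral (1/2pi) * int |psi_hat|^2 phi dtheta.
theta = np.linspace(-np.pi, np.pi, 20001)
psi_hat = np.exp(1j * np.outer(theta, np.arange(N))) @ psi
phi = (np.abs(theta) <= a).astype(float)
rhs = np.trapz(np.abs(psi_hat) ** 2 * phi, theta) / (2 * np.pi)
print(lhs, rhs)   # the two values agree up to quadrature error
```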

Now note that, because of the orthogonality (4), $\psi^{\top}(\mathbf{y}-\hat{\mathbf{y}}^{n})\perp\psi^{\top}\hat{\mathbf{y}}^{n}$, so the difference (10) can be rewritten as $\mathbb{E}\,[\psi^{\top}(\mathbf{y}-\hat{\mathbf{y}}^{n})]^{2}$, which must also tend to zero as $n\to\infty$ for all functions $\psi$ having support in an interval $I\subset\mathbb{Z}$ of length $N\geq n$. Therefore $\hat{\mathbf{y}}^{n}$ converges weakly to $\mathbf{y}$.

References

  • [1] G. Picci and B. Zhu, “An empirical Bayesian approach to frequency estimation,” Dept. of Information Engineering, University of Padova, arXiv:1910.09475, 2019.
  • [2] D. Slepian and H. Pollak, “Prolate spheroidal wave functions, Fourier analysis and uncertainty I,” Bell Syst. Tech. Jour., vol. 40, pp. 43–63, 1961.
  • [3] H. J. Landau and H. Pollak, “Prolate spheroidal wave functions, Fourier analysis and uncertainty II,” Bell Syst. Tech. Jour., vol. 40, pp. 65–84, 1961.
  • [4] D. Slepian, “Prolate spheroidal wave functions, Fourier analysis and uncertainty V: The discrete case,” Bell Syst. Tech. Jour., vol. 57, no. 5, pp. 1371–1430, 1978.
  • [5] H. J. Landau and H. Widom, “Eigenvalue distribution of time and frequency limiting,” Journal of Mathematical Analysis and Applications, vol. 77, no. 2, pp. 469–481, 1980.
  • [6] A. Lindquist and G. Picci, Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification. Springer Verlag, 2015.
  • [7] N. Akhiezer and I. M. Glazman, Theory of Linear Operators in Hilbert Space, Vol. I. New York: Frederick Ungar Pub. Co., 1961.
  • [8] G. W. Stewart and J. Sun, Matrix Perturbation Theory. Boston: Academic Press, 1990.
  • [9] M. Forni and M. Lippi, “The generalized dynamic factor model: representation theory,” Econometric Theory, vol. 17, pp. 1113–1141, 2001.
  • [10] P. Hartman and A. Wintner, “The spectra of Toeplitz matrices,” American Journal of Mathematics, vol. 76, pp. 867–882, 1954.
  • [11] G. Bottegal and G. Picci, “Modeling complex systems by generalized factor analysis,” IEEE Trans. Autom. Control, vol. 60, no. 3, pp. 759–774, 2015.
  • [12] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, pp. 321–377, 1936.
  • [13] B. Yang, “Projection approximation subspace tracking,” IEEE Transactions on Signal Processing, vol. 43, pp. 95–107, 1995.
  • [14] G. Golub and C. Van Loan, Matrix Computations, 2nd ed. Johns Hopkins University Press, 1989.