Approximation of Stationary Processes and Toeplitz Spectra
Abstract
We study the approximation of stationary processes by a simple class of purely deterministic signals. This has an analytic counterpart in the approximation of symmetric positive definite Toeplitz matrices by submatrices of finite rank. We propose a notion of distance between them and prove a weak-sense approximation result.
I Introduction and Problem Statement
In our past investigations [1] we have discovered that the a posteriori covariance operator of some random harmonic oscillation signals corrupted by white noise has remarkable properties, very similar to those uncovered in the 1960s and 1970s by D. Slepian and coworkers in a famous series of papers concerning the energy concentration problem for time- and band-limited signals [2, 3, 4, 5]. The key property of the covariance operator in question is that its eigenvalues decay extremely fast to zero for indices greater than an a priori computable number (the so-called Slepian frequency).
From the sharp decay to zero of the eigenvalues it follows that the a posteriori probabilistic description of the signal is essentially finite dimensional, and in simulated experiments the observations appear to be well approximated by a purely deterministic process [6] corrupted by white noise. Since purely deterministic processes of finite rank can be described by linear finite-dimensional state-space models [6, p. 277], the estimation can be carried out by rather straightforward subspace methods.
However, the precise meaning of this approximation has so far not been understood. The scope of this paper is to propose a possible metric for this approximation and to prove convergence, although only in a weak sense.
Consider the infinite covariance matrix $\mathbf{T}$ (in this paper bold symbols like $\mathbf{y}$, $\mathbf{T}$ are reserved for infinite arrays, say stochastic processes or covariance matrices) of a scalar stationary zero-mean process $\mathbf{y} = \{y(t);\ t \in \mathbb{Z}\}$ having covariance function

$\sigma(k) := E\,[\,y(t+k)\,y(t)\,], \qquad k \in \mathbb{Z},$   (1)
which we shall assume admits a Fourier transform

$\hat{\sigma}(e^{j\theta}) = \sum_{k=-\infty}^{+\infty} \sigma(k)\, e^{-jk\theta},$

which is a piecewise smooth function of $\theta$. For example, $\hat{\sigma}$ will be continuous and bounded when $\sigma$ belongs to $\ell^1$. The function $\hat{\sigma}$ is called the symbol of $\mathbf{T}$. The process $\mathbf{y}$ is not necessarily purely non-deterministic; however we assume that $\hat{\sigma}$ is piecewise continuous and bounded, as for example is a rectangular function. Then $\mathbf{T}$ induces a bounded operator on $\ell^2$ [7].
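As a concrete illustration of this boundedness, here is a minimal numerical sketch (our own, not from the paper; the bandwidth w and the helper names are arbitrary choices) using the rectangular symbol just mentioned: the spectral norm of every truncation of $\mathbf{T}$ stays below the supremum of the symbol.

```python
import numpy as np
from scipy.linalg import toeplitz

# Covariance of an ideally band-limited process: the symbol (Fourier
# transform of sigma) is the rectangular function equal to 1 on [-w, w].
w = np.pi / 4
def sigma(k):
    return w / np.pi if k == 0 else np.sin(w * k) / (np.pi * k)

for N in (16, 64, 256):
    T_N = toeplitz([sigma(k) for k in range(N)])
    # The spectral norm of each truncation is bounded by sup|symbol| = 1,
    # consistent with T inducing a bounded operator on l2.
    print(N, np.linalg.norm(T_N, 2))
```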
Let now $N \in \mathbb{N}$ and consider the $N$-truncation of the matrix $\mathbf{T}$, defined as

$T_N := [\,\sigma(t-s)\,]_{t,s=1}^{N},$

which for each $N$ has a positive point spectrum, say

$\lambda_1(N) \ge \lambda_2(N) \ge \dots \ge \lambda_N(N) > 0,$
where the eigenvalues are ordered in decreasing magnitude. We shall assume that all $T_N$ are non-singular, so that the eigenvalues must be strictly positive for all $N$. By Weyl's theorem [8, p. 203], see also [9, Fact M], the $k$-th eigenvalue of $T_N$ is a non-decreasing function of $N$ and hence has a limit $\lambda_k := \lim_{N\to\infty} \lambda_k(N)$, which may possibly be equal to $+\infty$. Each such limit is called an eigenvalue of $\mathbf{T}$. These limits, however, are in general not true eigenvalues, as it is well known that $\mathbf{T}$ may not have eigenvalues. For example, a bounded symmetric Toeplitz operator has a purely continuous spectrum [10].
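The monotonicity of the truncated eigenvalues is easy to observe numerically; the sketch below (our own, with the same band-limited covariance as above) checks that each $\lambda_k(N)$ can only grow with $N$ and levels off at a limit $\lambda_k \le 1$:

```python
import numpy as np
from scipy.linalg import toeplitz

w = np.pi / 4  # same band-limited covariance as in the previous sketch
sigma = lambda k: w / np.pi if k == 0 else np.sin(w * k) / (np.pi * k)

prev = None
for N in (10, 20, 40, 80):
    lam = np.sort(np.linalg.eigvalsh(toeplitz([sigma(k) for k in range(N)])))[::-1]
    if prev is not None:
        # Weyl's theorem: the k-th eigenvalue is non-decreasing in N.
        assert np.all(lam[: len(prev)] >= prev - 1e-10)
    prev = lam
    print(N, np.round(lam[:4], 4))  # leading eigenvalues creep up toward 1
```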
Assume now that all eigenvalues of $\mathbf{T}$ are finite (this is equivalent to $\mathbf{T}$ being a bounded linear operator on $\ell^2$; see [11]) and let us keep in the formal spectral decomposition of $\mathbf{T}$ the $n$ largest eigenvalues, setting all the others to zero. In this way we form an "approximation" of $\mathbf{T}$ of finite rank $n$, which we shall denote $\mathbf{T}_n$. Then the (infinite) matrix $\mathbf{T}_n$ is the covariance of a purely deterministic process of rank $n$ [6] whose spectral density, say $\hat{\sigma}_n$, is a sum of Dirac pulses. We would like to know in what sense, if any, $\mathbf{T}_n$ could be considered an approximation of $\mathbf{T}$ or, equivalently, $\hat{\sigma}_n$ could be considered an approximation of the symbol $\hat{\sigma}$. Obviously this last fact could only happen in a weak sense, say for arbitrary test functions $f$ continuous on the unit circle one should have
$\int_{-\pi}^{\pi} f(e^{j\theta})\, \hat{\sigma}_n(e^{j\theta})\, \frac{d\theta}{2\pi} \;\longrightarrow\; \int_{-\pi}^{\pi} f(e^{j\theta})\, \hat{\sigma}(e^{j\theta})\, \frac{d\theta}{2\pi}$   (2)

as $n \to \infty$.
An equivalent question can be posed in terms of approximation of a stationary process $\mathbf{y}$ of covariance $\mathbf{T}$ (which could in particular be p.n.d.) by a purely deterministic one, say $\mathbf{y}_n$, having covariance $\mathbf{T}_n$ of rank $n$. As we shall see, this problem should also be naturally formulated in a weak sense.
II Approximation of random vectors
To begin with, suppose we want to approximate an $N$-dimensional zero-mean random vector $y$ of covariance matrix $\Sigma$ by another $N$-dimensional vector, say $\hat{y}$, having covariance of rank $n < N$. To make the problem well-posed we shall require that the approximation $\hat{y}$ generate a linear subspace of the span of $y$, which will then have to be $n$-dimensional. This means that we can represent $\hat{y}$ as a linear function of $y$, say

$\hat{y} = A\,y,$

where $A$ has rank $n$.
Motivated by the above, let us consider the following approximation problem:
Problem 1.
Find a matrix $A$ of rank $n$ solving the following minimum problem

$\min_{\mathrm{rank}(A) = n}\; E\,\|\,y - A\,y\,\|^2.$   (3)
Note that an equivalent geometric formulation is to look for an optimal $n$-dimensional subspace of the span of $y$ onto which $y$ should be projected in order to minimize the approximation error variance. Let us stress that this is quite different from the usual least-squares approximation problem, which amounts to projecting onto a given subspace.
As usual, minimizing the square distance in (3) requires that the approximation $\hat{y} = A\,y$ be uncorrelated with the approximation error; namely

$E\,[\,A\,y\,(y - A\,y)^{\top}\,] = 0,$   (4)

which is equivalent to

$A\,\Sigma = A\,\Sigma\,A^{\top}.$
Introducing a square root $\Sigma^{1/2}$ of $\Sigma$ (so that $\Sigma = \Sigma^{1/2}\,\Sigma^{1/2\,\top}$) and defining $P := \Sigma^{-1/2}\,A\,\Sigma^{1/2}$, this condition is seen to be equivalent to

$P = P\,P^{\top},$

which just says that $P$ must be symmetric and idempotent (i.e. $P^2 = P$), in other words an orthogonal projection from $\mathbb{R}^N$ onto some $n$-dimensional subspace. Hence $A$ must have the following structure

$A = \Sigma^{1/2}\,P\,\Sigma^{-1/2},$   (5)

where $\Sigma^{1/2}$ is any square root of $\Sigma$ and $P$ is an orthogonal projection matrix of rank $n$.
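The structure (5) is easy to check numerically. The following sketch (our own; the covariance and the projection are generated at random) verifies that any $A$ of this form satisfies the orthogonality condition (4):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 6, 2

# A random symmetric positive definite covariance Sigma.
M = rng.standard_normal((N, N))
Sigma = M @ M.T + N * np.eye(N)

L = np.linalg.cholesky(Sigma)        # one admissible square root of Sigma
V, _ = np.linalg.qr(rng.standard_normal((N, n)))
P = V @ V.T                          # orthogonal projection of rank n

A = L @ P @ np.linalg.inv(L)         # the structure (5)

# Condition (4): E[y_hat (y - y_hat)'] = A Sigma (I - A)' must vanish.
print(np.allclose(A @ Sigma @ (np.eye(N) - A).T, 0))  # True
```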
Theorem 2.
The solutions of the approximation problem (3) are of the form

$\hat{y} = U_n\,Q\,(U_n\,Q)^{\top}\,y = U_n\,U_n^{\top}\,y,$

where $U_n$ is the matrix whose columns are the first $n$ normalized eigenvectors of $\Sigma$, ordered in the descending magnitude ordering of the corresponding first $n$ eigenvalues (collected in the diagonal matrix $\Lambda_n$), and $Q$ is an arbitrary orthogonal matrix.
Proof: Let $\Sigma = U\,\Lambda\,U^{\top}$ be the spectral decomposition of $\Sigma$, in which $U$ is an orthogonal matrix of eigenvectors. We can pick as a square root of $\Sigma$ the matrix $\Sigma^{1/2} := U\,\Lambda^{1/2}$.
Now, no matter how the square root is chosen, the random vector $u := \Sigma^{-1/2}\,y$ has orthonormal components. Hence, using (5), the cost function of our minimum problem can be rewritten as

$E\,\|\,y - A\,y\,\|^2 = E\,\|\,\Sigma^{1/2}(I - P)\,u\,\|^2 = \mathrm{tr}\,[\,(I - P)\,\Lambda\,(I - P)\,],$

where $\mathrm{tr}\,M$ is the trace of $M$. Our minimum problem can therefore be rewritten as

$\min_{P}\;\mathrm{tr}\,[\,\Lambda\,P^{\perp}\,],$

where $P^{\perp} := I - P$ is the orthogonal projection onto the orthogonal complement of the subspace $\mathrm{Im}\,P$.

Since the eigenvalues are ordered in decreasing order, i.e. $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$, one sees that the minimum of this function of $P$ is reached when $P$ projects onto the subspace spanned by the first $n$ coordinate axes; in other words $P = \mathrm{diag}\,(I_n,\,0)$, the minimum being $\sum_{k=n+1}^{N} \lambda_k$. It is then evident that

$A = \Sigma^{1/2}\,P\,\Sigma^{-1/2} = U\,\Lambda^{1/2}\,\mathrm{diag}\,(I_n,\,0)\,\Lambda^{-1/2}\,U^{\top} = U_n\,U_n^{\top}.$
Naturally, multiplying the square root by any orthogonal matrix $Q$ does not change the result.
Observe that $\hat{y} = U_n\,U_n^{\top}\,y$ is just the first-$n$ Principal Components Approximation (PCA) of $y$. In fact it is well known that the PCA vector can be seen as a linear transformation acting on $y$ [12]. This result confirms in particular that the truncated PCA expansion is optimal in the sense that it provides both the best approximant and the best approximation subspace for the criterion (3). This characterization has also been found by a different technique studying subspace approximation problems; see e.g. [13].
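As a sanity check of Theorem 2 (a sketch of our own; dimensions and seed are arbitrary), one can compare the cost achieved by the PCA map $A = U_n U_n^{\top}$ with that of other admissible maps of the form (5): none does better, and the optimum equals the tail eigenvalue sum.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 8, 3

M = rng.standard_normal((N, N))
Sigma = M @ M.T                       # a generic covariance matrix

lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]        # decreasing eigenvalue order

def cost(A):
    # E||y - Ay||^2 = tr[(I - A) Sigma (I - A)']
    R = np.eye(N) - A
    return np.trace(R @ Sigma @ R.T)

A_pca = U[:, :n] @ U[:, :n].T         # the optimal map of Theorem 2
best = cost(A_pca)
print(best, lam[n:].sum())            # both equal the tail sum of eigenvalues

L = np.linalg.cholesky(Sigma + 1e-9 * np.eye(N))
for _ in range(1000):                 # other admissible maps of the form (5)
    V, _ = np.linalg.qr(rng.standard_normal((N, n)))
    A = L @ (V @ V.T) @ np.linalg.inv(L)
    assert cost(A) >= best - 1e-6     # PCA is never beaten
```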
Note for future reference that the variance matrix of $\hat{y}$ has rank $n$, since

$\mathrm{Var}\,(\hat{y}) = U_n\,U_n^{\top}\,\Sigma\,U_n\,U_n^{\top} = U_n\,\Lambda_n\,U_n^{\top},$   (6)

and that this expression holds for arbitrary $N > n$. Naturally one should keep in mind that the eigenvector matrices $U_n$ now depend on $N$ but the eigenvalues stay fixed.
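The identity (6) can be verified in the same toy setting (our sketch, arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 8, 3
M = rng.standard_normal((N, N))
Sigma = M @ M.T

lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]
Un, Ln = U[:, :n], np.diag(lam[:n])

Var_hat = (Un @ Un.T) @ Sigma @ (Un @ Un.T)   # variance of y_hat = A y

print(np.allclose(Var_hat, Un @ Ln @ Un.T))   # True: this is exactly (6)
print(np.linalg.matrix_rank(Var_hat))         # n
```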
III Extension to infinite dimension
In the same spirit, consider now a stationary zero-mean process $\mathbf{y}$ and any jointly stationary zero-mean process $\hat{\mathbf{y}}$ (both written as doubly infinite column vectors), spanning a subspace $\hat{\mathcal{Y}}$ of dimension $n$. Any such process must be purely deterministic of rank $n$ and is then uniquely determined by any finite string of random variables induced on an interval of length $n$. This statement from [6, pp. 276-277] is reported for completeness in the following lemma.
Lemma 1.
Any p.d. process $\hat{\mathbf{y}}$ of rank $n$ can be represented for all $t \in \mathbb{Z}$ by a state-space model (i.e. a stochastic realization) of the form

$x(t+1) = F\,x(t),$   (7)
$\hat{y}(t) = h^{\top}\,x(t),$   (8)

where $x(t)$ is an $n$-dimensional basis vector spanning the Hilbert space linearly generated by the random variables of the set $\{\hat{y}(t);\ t \in \mathbb{Z}\}$, $F$ is a unitary matrix and $h$ is an $n$-vector such that the pair $(h^{\top}, F)$ is observable.
Proof: See [6, p. 277].
This linear system extends in time the finite family of random variables $\{x_1(0), \dots, x_n(0)\}$, generators of $\hat{\mathcal{Y}}$, to generate the stationary p.d. process $\hat{\mathbf{y}}$ defined on the whole time line $\mathbb{Z}$. Since this realization is uniquely determined by the space $\hat{\mathcal{Y}}$ modulo a choice of basis, it follows that the process $\hat{\mathbf{y}}$ is also uniquely determined by the space $\hat{\mathcal{Y}}$. Hence all covariances $E\,\hat{y}(t+k)\,\hat{y}(t)$ are also uniquely defined and determine the entries of the covariance operator $\hat{\mathbf{T}}$ of the process.
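To see the realization (7)-(8) at work, here is a small simulation sketch (our own; the rotation angles, the output vector and the rank test are arbitrary illustrative choices). A unitary $F$ and a generic $h$ produce a stationary purely deterministic output whose shifted-trajectory matrix has rank exactly $n$:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
n = 4  # rank: two real oscillators = two 2x2 rotation blocks

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

F = block_diag(rot(0.3), rot(1.1))   # unitary (orthogonal) state matrix
h = rng.standard_normal(n)           # output map; (h', F) observable a.s.
x = rng.standard_normal(n)           # random initial state, basis of Y_hat

y = []
for t in range(500):
    y.append(h @ x)                  # (8): y_hat(t) = h' x(t)
    x = F @ x                        # (7): the state is simply rotated

# Purely deterministic of rank n: any n+1 shifted copies of the
# trajectory are linearly dependent.
Y = np.array(y)
H = np.column_stack([Y[k : 400 + k] for k in range(n + 1)])
print(np.linalg.matrix_rank(H))      # n
```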
In analogy to Problem 1, let us ask if there is a stationary process $\hat{\mathbf{y}}$ spanning a subspace of dimension $n$ which minimizes an approximation criterion of the type (3). If such a process exists we shall call it an $n$-PC approximation of $\mathbf{y}$ and denote it by the symbol $\hat{\mathbf{y}}_n$.
Let then $I_N$ denote a finite subinterval of the time line of length $N$ and $E_N$ a diagonal selection matrix with entries equal to one if the index belongs to $I_N$ and zero otherwise. Consider the finite random vectors $y_N := E_N\,\mathbf{y}$ and $\hat{y}_N := E_N\,\hat{\mathbf{y}}$, and let $\hat{y}_N$ be the random vector minimizing the norm $E\,\|\,y_N - \hat{y}_N\,\|^2$ for an interval of length $N \ge n$. This problem is analogous to Problem 1, where now $y$ is substituted by $y_N$. The solution string $\hat{y}_N$ determines by stationary extension (Lemma 1) a purely deterministic process $\hat{\mathbf{y}}_n$ such that $E_N\,\hat{\mathbf{y}}_n = \hat{y}_N$ and the span of $\hat{\mathbf{y}}_n$ has dimension $n$. Since $n \le N$, this process satisfies our requirements. By stationarity the construction is invariant with respect to translations of the interval $I_N$, provided its length is kept fixed. Then $\hat{\mathbf{y}}_n$ is an $n$-PC approximation of $\mathbf{y}$. The question now is to understand in what sense this approximation can get close to $\mathbf{y}$ as $N \to \infty$.
Since we are now studying the behavior of the $n$-PC approximation of $\mathbf{y}$ when the dimension $N$ varies, we shall attach a superscript $N$ to $\hat{\mathbf{y}}_n$ and denote it by $\hat{\mathbf{y}}_n^N$; likewise we shall do for its covariance matrix, which will be denoted $\hat{\mathbf{T}}_n^N$.
Theorem 3.
For each $N$ the $n$-PC approximation $\hat{\mathbf{y}}_n^N$ of $\mathbf{y}$ has an (infinite) covariance operator $\hat{\mathbf{T}}_n^N$ of rank $n$. The sequence $\{\hat{\mathbf{T}}_n^N\}$ converges weakly to $\mathbf{T}$ as $n$ diverges to $\infty$, that is,

$\lim_{n \to \infty} f\,(\mathbf{T} - \hat{\mathbf{T}}_n^N)\,f^{\top} = 0$

for all functions (row sequences) $f$ having support in an interval of length $N_0$, where $N_0$ is arbitrary.
Proof: Consider an $n$-PC approximation of $\mathbf{y}$ and its restriction to any interval of length $N$. Recall that this approximation is now denoted $\hat{\mathbf{y}}_n^N$, and likewise for its covariance matrix, denoted $\hat{\mathbf{T}}_n^N$. By analogy to (6) the $N$-truncation of this matrix has the structure

$\hat{T}_n^N = U_n\,\Lambda_n\,U_n^{\top},$   (9)

where the eigenvalues in $\Lambda_n$ are fixed and equal to the first $n$ eigenvalues of the $N$-truncation of the covariance operator $\mathbf{T}$ of $\mathbf{y}$. The eigenvector matrices $U_n$ depend on $N$, as their dimension trivially grows with $N$.
We shall now show that the matrices $\hat{T}_n^N$ are the $N$-truncations of an infinite Toeplitz matrix $\hat{\mathbf{T}}_n$ of rank $n$ which is the covariance of the purely deterministic process $\hat{\mathbf{y}}_n$. That this is so follows from the fact that all covariances $E\,\hat{y}_n(t+k)\,\hat{y}_n(t)$ are uniquely defined by the state-space model (7), (8) and constitute exactly the entries of the (infinite) covariance operator of the p.d. process $\hat{\mathbf{y}}_n$. Then, in particular, each finite matrix $\hat{T}_n^N$ must be an $N$-truncation of the same $\hat{\mathbf{T}}_n$. Then from the expansion (9) it follows that this truncation is just the (symmetric) SVD approximation of rank $n$ of the $N$-truncation of $\mathbf{T}$, which is of course well defined for all finite $N$.
By a well-known property of the SVD (see e.g. [14, Chap. 2]) the variance matrix of $\hat{y}_n^N$ is the symmetric matrix of rank $n$ which has minimum distance from that of $y_N$ in the Frobenius norm. This in turn implies that the variance $\hat{\mathbf{T}}_n$ is the (infinite) covariance matrix of rank $n$ which minimizes

$f\,(\mathbf{T} - \hat{\mathbf{T}}_n)\,f^{\top}$   (10)

for all functions $f$ having support in an interval of length $N$. But the sequence (10) is monotonically decreasing and nonnegative for all $n$ and fixed $f$. In fact, by obvious properties of the SVD, for each $f$ of finite support of length $N_0$ it tends to zero with $n$ in a finite number of steps, since when $n \ge N_0$ the difference is zero. But this happens for all $N_0$ and hence for all $f$ of finite support.
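The finite-step convergence used in the proof can be observed directly (our own numerical sketch, reusing the band-limited covariance of Section I): for a test sequence $f$ supported on $N_0$ samples, the quadratic form (10), evaluated on the $N_0$-truncation, is nonnegative, decreases with $n$, and vanishes once $n \ge N_0$.

```python
import numpy as np
from scipy.linalg import toeplitz

w = np.pi / 4  # band-limited covariance with rectangular symbol (Section I)
sigma = lambda k: w / np.pi if k == 0 else np.sin(w * k) / (np.pi * k)

N0 = 10                                  # support length of the test sequence
rng = np.random.default_rng(4)
f = rng.standard_normal(N0)

T = toeplitz([sigma(k) for k in range(N0)])
lam, U = np.linalg.eigh(T)
lam, U = lam[::-1], U[:, ::-1]

prev = np.inf
for n in range(1, N0 + 1):
    T_n = U[:, :n] @ np.diag(lam[:n]) @ U[:, :n].T  # rank-n PC truncation
    err = f @ (T - T_n) @ f                          # the form (10)
    assert -1e-12 <= err <= prev + 1e-12             # nonnegative, decreasing
    prev = err
    print(n, err)                        # hits zero (round-off) at n = N0
```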
Remark 1.
Contrary to all the submatrices $T_N$, the infinite covariance operator $\mathbf{T}$ may not have eigenvalues (nor corresponding eigenvectors), and consequently the idea of PC approximation does not apply to the full matrix. For this reason the approximation and the convergence results may not hold in a strong sense.
IV Approximation in the spectral domain
References
- [1] G. Picci and B. Zhu, "An empirical Bayesian approach to frequency estimation," Dept. of Information Engineering, University of Padova, arXiv:1910.09475, 2019.
- [2] D. Slepian and H. Pollak, "Prolate spheroidal wave functions, Fourier analysis and uncertainty I," Bell Syst. Tech. Jour., vol. 40, pp. 43-63, 1961.
- [3] H. J. Landau and H. Pollak, "Prolate spheroidal wave functions, Fourier analysis and uncertainty II," Bell Syst. Tech. Jour., vol. 40, pp. 65-84, 1961.
- [4] D. Slepian, "Prolate spheroidal wave functions, Fourier analysis and uncertainty V: The discrete case," Bell Syst. Tech. Jour., vol. 57, no. 5, pp. 1371-1430, 1978.
- [5] H. J. Landau and H. Widom, “Eigenvalue distribution of time and frequency limiting,” Journal of Mathematical Analysis and Applications, vol. 77, no. 2, pp. 469–481, 1980.
- [6] A. Lindquist and G. Picci, Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification. Springer Verlag, 2015.
- [7] N. Akhiezer and I. M. Glazman, Theory of Linear Operators in Hilbert Space, Vol. I. New York: Frederick Ungar Pub. Co., 1961.
- [8] G. W. Stewart and J. Sun, Matrix perturbation theory. Boston: Academic Press, 1990.
- [9] M. Forni and M. Lippi, “The generalized dynamic factor model: representation theory,” Econometric Theory, vol. 17, pp. 1113–1141, 2001.
- [10] P. Hartman and A. Wintner, “The spectra of Toeplitz matrices,” American Journal of Mathematics, vol. 76, pp. 867–882, 1954.
- [11] G. Bottegal and G. Picci, “Modeling complex systems by generalized factor analysis,” IEEE Trans. Autom. Control, vol. 60, no. 3, pp. 759–774, 2015.
- [12] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, pp. 321–377, 1936.
- [13] B. Yang, "Projection approximation subspace tracking," IEEE Transactions on Signal Processing, vol. 43, pp. 95-107, 1995.
- [14] G. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press, 2nd edition, 1989.