Joint Mean-Vector and VAR-Matrix Estimation
for Locally Stationary VAR(1) Processes
Giovanni Motta, Department of Statistics, Texas A&M University
Correspondence to: Giovanni Motta, 3143 TAMU, Department of Statistics, College Station, TX 77843, USA. E-mail: [email protected]
Abstract
During the last two decades, locally stationary processes have been widely studied in the time series literature. In this paper we consider the locally stationary vector-autoregression model of order one, or LS-VAR(1), and estimate its parameters by weighted least squares. The LS-VAR(1) we consider allows for a smoothly time-varying non-diagonal VAR matrix, as well as for a smoothly time-varying non-zero mean. The weighting scheme is based on kernel smoothers. The time-varying mean and the time-varying VAR matrix are estimated jointly, and the local-linear weighting matrix is given in closed form. The quality of the estimated curves is illustrated through simulation results.
Keywords: Local Stationarity, Local Polynomials, Weighted Least Squares
Data Availability Statement:
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
1 Introduction
In this paper we consider $d$-dimensional multivariate data generated by a locally stationary process, and our goal is to fit to the data a parametric model with time-varying coefficients. The notation $X_{t,T}$ emphasizes that the data form a triangular array where, at each time point $t$, the structure of the process depends on the sample size $T$.
To introduce the problem, consider the following univariate zero-mean autoregressive model of order $p$,
$$X_{t,T} \;=\; \sum_{j=1}^{p} a_j\!\left(\frac{t}{T}\right) X_{t-j,T} \;+\; \sigma\!\left(\frac{t}{T}\right)\varepsilon_{t}, \qquad (1)$$
or AR($p$), where the coefficients $a_1(u), \dots, a_p(u)$ are differentiable for $u \in [0,1]$ with bounded derivatives.
In terms of modeling, local stationarity means that if the parameters are smooth in rescaled time $u = t/T$ and $T$ is large, then $a_j(s/T) \approx a_j(t/T)$ for all $s$ in a neighborhood of $t$. For estimation, rescaling time allows us to apply non-parametric methods to recover the unknown curves. In the frequency domain, the importance of rescaling time by the sample size and developing the analysis in rescaled time relies upon the uniqueness of the transfer function. Dahlhaus (1996) introduced
the spectral representation of a locally stationary process,
$$X_{t,T} \;=\; \int_{-\pi}^{\pi} \exp(\mathrm{i}\lambda t)\, A^{0}_{t,T}(\lambda)\, \mathrm{d}\xi(\lambda),$$
where $\xi(\lambda)$ is a stochastic process with orthogonal increments, and where the sequence $A^{0}_{t,T}(\lambda)$ converges (uniformly in $t$ and $\lambda$, as $T$ diverges) to another function $A(u,\lambda)$:
$$\sup_{t,\lambda}\Big| A^{0}_{t,T}(\lambda) - A\!\left(\tfrac{t}{T},\lambda\right) \Big| \;\le\; \frac{C}{T}.$$
If $A(u,\lambda)$ is smooth in $u$, then the time-varying spectral density $f(u,\lambda) = |A(u,\lambda)|^{2}$ is uniquely determined from the triangular array.
The dichotomy between $A^{0}_{t,T}(\lambda)$ and $A(t/T,\lambda)$ is particularly relevant in the case of AR processes. To see this, consider the simple case where $p = 1$, $\sigma \equiv 1$ and $a(\cdot) := a_1(\cdot)$. In the stationary case where the coefficient $a(u) \equiv a$ is time-invariant, the process in (1) can be written as
$$X_{t} \;=\; \sum_{j=0}^{\infty} \psi_{j}\, \varepsilon_{t-j}, \qquad \text{with } \psi_{j} = a^{j}.$$
By contrast, the locally stationary process does not have a solution of the form
$$X_{t,T} \;=\; \sum_{j=0}^{\infty} \psi_{j}\!\left(\tfrac{t}{T}\right) \varepsilon_{t-j},$$
but only of the form $X_{t,T} = \sum_{j=0}^{\infty} \psi_{t,T,j}\, \varepsilon_{t-j}$, with $\psi_{t,T,j} \approx \psi_{j}(t/T)$.
The seminal papers on local stationarity (Dahlhaus, 1996, 1997) provide details on the mathematics in the frequency domain. For an overview of multivariate locally stationary processes, see Dahlhaus (2012, Section 7.2).
Without loss of generality, assume that $\sigma$ is known and time-invariant, that is, $\sigma(u) \equiv \sigma$. Suppose that the vector of interest
$$\boldsymbol{a}(u) \;=\; \big[a_{1}(u), \dots, a_{p}(u)\big]^{\top}$$
depends on a finite-dimensional parameter. For example, if the coefficients are polynomials in time,
$$a_{j}(u) \;=\; \sum_{k=0}^{K} \theta_{j,k}\, u^{k}, \qquad j = 1, \dots, p, \qquad (2)$$
estimating the time-varying vector $\boldsymbol{a}(u)$ at $u = t/T$ translates into estimating the time-invariant vector $\boldsymbol{\theta} = \big[\theta_{1,0}, \dots, \theta_{p,K}\big]^{\top}$.
The specification in (2) approximates the coefficient vector by global polynomials. Dahlhaus (1997, Section 4) obtained an explicit formula for the vector $\boldsymbol{\theta}$ as the solution of a linear system similar to the Yule-Walker equations. In the univariate setting, Dahlhaus (1996) estimates the time-varying parameters by kernel smoothers, that is, using a local-constant approximation. In the multivariate setting, Dahlhaus (2000, p. 1776) mentions the possibility of estimating the unknown parameters by minimizing a local-polynomial approximation of the local likelihood.
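As a toy illustration of the global-polynomial parameterization in (2), the following Python sketch (function name and coefficient values are hypothetical, not taken from the paper) maps a small matrix of polynomial coefficients into the implied time-varying AR coefficients evaluated on the observation grid; it is only meant to make the reparameterization concrete.

```python
import numpy as np

def ar_coeffs_from_polynomials(theta, T):
    """Evaluate a_j(t/T) = sum_k theta[j, k] * (t/T)**k for t = 1, ..., T.

    theta : (p, K+1) array of polynomial coefficients, one row per AR lag.
    Returns a (T, p) array holding the implied time-varying AR coefficients.
    """
    u = np.arange(1, T + 1) / T                                 # rescaled times t/T
    powers = u[:, None] ** np.arange(theta.shape[1])[None, :]   # (T, K+1) Vandermonde-type matrix
    return powers @ theta.T

# Example: p = 2 lags, quadratic polynomials (K = 2), arbitrary coefficients.
theta = np.array([[0.5, -0.2, 0.1],
                  [0.1,  0.3, -0.2]])
a_tv = ar_coeffs_from_polynomials(theta, T=500)   # row t-1 holds (a_1(t/T), a_2(t/T))
```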
Zhou and Wu (2010) consider univariate linear models with time-varying coefficients, where both the regressors and the errors are assumed to be locally stationary, and estimate the time-varying vector of coefficients by means of local polynomials.
The contribution of this paper is threefold. In terms of modeling, we consider a multivariate version of model (1) and estimate the time-varying parameters in the time domain. Our main contribution is the closed-form definition of the local-linear estimator of the parameters. Finally, we emphasize that the estimation of the time-varying mean and the time-varying AR matrix is performed jointly.
In Section 2 we derive the localized Yule-Walker equations for a locally stationary zero-mean VAR process. In Section 3 we consider a locally stationary VAR with time-varying mean. First, we derive the local-constant weighted-least-squares estimator, see Proposition 1. Then, in Theorem 1, we establish our main result, the closed-form definition of the local-linear weighted-least-squares estimator. In Section 4 we illustrate and compare the performance of the two weighted-least-squares estimators (local-constant and local-linear). Section 5 concludes and highlights the extension of our approach to the high-dimensional setting.
Throughout the paper we use bold uppercase letters to denote matrices, and bold slanted letters to denote vectors. We denote by $\boldsymbol{I}_{n}$ the identity matrix of size $n$, by $\boldsymbol{\imath}_{n}$ the $n$-dimensional vector of ones, by $\operatorname{tr}(\boldsymbol{A})$ the trace of $\boldsymbol{A}$,
by $\boldsymbol{A}^{\top}$ the transpose of $\boldsymbol{A}$, by $\|\boldsymbol{A}\| = \sqrt{\operatorname{tr}(\boldsymbol{A}^{\top}\boldsymbol{A})}$ the Frobenius norm of $\boldsymbol{A}$, and by $\boldsymbol{A}^{-1}$ the inverse of $\boldsymbol{A}$, that is, the square matrix such that $\boldsymbol{A}^{-1}\boldsymbol{A} = \boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{I}$. Finally, we use the acronyms VAR and WLS for vector autoregression and weighted least squares, respectively.
2 Locally Stationary Vector Auto Regression
Consider the following $d$-dimensional locally stationary vector autoregression of order 1,
$$\boldsymbol{X}_{t,T} \;=\; \boldsymbol{A}\!\left(\frac{t}{T}\right) \boldsymbol{X}_{t-1,T} \;+\; \boldsymbol{\varepsilon}_{t}, \qquad t = 1, \dots, T, \qquad (3)$$
with $\mathrm{E}\,[\boldsymbol{\varepsilon}_{t}] = \boldsymbol{0}$ and $\mathrm{E}\,[\boldsymbol{\varepsilon}_{t}\boldsymbol{\varepsilon}_{t}^{\top}] = \boldsymbol{\Sigma}_{\varepsilon}$.
If the largest eigenvalue of $\boldsymbol{A}(u)$ lies inside the unit circle uniformly in $u \in [0,1]$, the process in (3)
is locally stationary and causal. Our goal is to estimate $\boldsymbol{A}(u_{0})$ at a fixed $u_{0} \in (0,1)$ using a localized version of the Yule-Walker equations. If we assume that the matrix-valued function $\boldsymbol{A}(\cdot)$ is smooth in $u$, we can write the Taylor expansion of $\boldsymbol{A}(u)$ around $u_{0}$:
$$\boldsymbol{A}(u) \;=\; \boldsymbol{A}(u_{0}) \;+\; \boldsymbol{A}'(u_{0})\,(u - u_{0}) \;+\; O\!\big((u - u_{0})^{2}\big),$$
where $\boldsymbol{A}'(u) = \mathrm{d}\boldsymbol{A}(u)/\mathrm{d}u$. We are interested in evaluating the function $\boldsymbol{A}(\cdot)$ at those values of $u = t/T$ lying in a neighborhood of $u_{0}$. For example, for a fixed $u_{0}$, let $t_{0} = \lfloor u_{0} T \rfloor$, where $\lfloor x \rfloor$ is the largest integer not exceeding $x$.
Then, for $t$ in a neighborhood of $t_{0}$, we have the following uniform bound:
$$\max_{t \,:\, |t/T - u_{0}| \le h} \Big\| \boldsymbol{A}\!\left(\tfrac{t}{T}\right) - \boldsymbol{A}(u_{0}) \Big\| \;=\; O(h).$$
As a consequence, assuming that the half-width $h$ of the neighborhood (in rescaled time) tends to zero as $T$ diverges, the local-constant approximation $\boldsymbol{A}(t/T) \approx \boldsymbol{A}(u_{0})$ holds uniformly in $t$ such that $|t/T - u_{0}| \le h$.
Let $w_{t}(u_{0}) = \frac{1}{Th}\, K\!\left(\frac{t/T - u_{0}}{h}\right)$, where $K(\cdot)$ is a kernel function integrating to one, and $h$ is the smoothing parameter (the bandwidth introduced above) that tends to zero as $T$ diverges, but more slowly than $1/T$:
$$h \to 0 \quad \text{and} \quad Th \to \infty \quad \text{as } T \to \infty.$$
If we right-multiply (3) by $\boldsymbol{X}_{t-1,T}^{\top}\, w_{t}(u_{0})$ and sum over $t$, we obtain the localized Yule-Walker equations, whose solution is the estimator of $\boldsymbol{A}(u_{0})$ given in (8).
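To make the localized Yule-Walker idea concrete, here is a minimal Python sketch (function and variable names are ours, not the paper's) that forms kernel-weighted lag-0 and lag-1 moment matrices around a rescaled time point u0 and solves for an estimate of the VAR matrix; a Gaussian kernel is assumed purely for illustration.

```python
import numpy as np

def localized_yule_walker(X, u0, h):
    """Kernel-weighted Yule-Walker estimate of the VAR(1) matrix at rescaled time u0.

    X  : (T, d) array holding the observed zero-mean series X_{1,T}, ..., X_{T,T}.
    u0 : rescaled time point in (0, 1).
    h  : bandwidth of the kernel weights K((t/T - u0)/h) / (T h).
    """
    T, d = X.shape
    u = np.arange(1, T + 1) / T                        # rescaled times t/T
    w = np.exp(-0.5 * ((u - u0) / h) ** 2) / (T * h)   # Gaussian kernel weights

    S1 = np.zeros((d, d))   # kernel-weighted sum of X_t X_{t-1}'
    S0 = np.zeros((d, d))   # kernel-weighted sum of X_{t-1} X_{t-1}'
    for t in range(1, T):
        S1 += w[t] * np.outer(X[t], X[t - 1])
        S0 += w[t] * np.outer(X[t - 1], X[t - 1])

    # Localized Yule-Walker solution: A_hat(u0) = S1 * S0^{-1}.
    return S1 @ np.linalg.inv(S0)
```

The explicit loop makes the weighting transparent; in practice the two sums can be computed with a single weighted matrix product.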
3 Joint estimation of time-varying mean-vector and VAR-matrix by Weighted Least Squares
The estimator in (8) can be obtained as the minimizer of a WLS problem. Consider the LS-VAR(1) in (3), and let us now allow for a time-varying (non-zero) mean $\boldsymbol{\mu}(t/T)$:
(9)
with the innovations $\boldsymbol{\varepsilon}_{t}$ satisfying the same assumptions as in (3).
If the largest eigenvalue of $\boldsymbol{A}(u)$ lies inside the unit circle uniformly in rescaled time,
$$\sup_{u \in [0,1]} \big| \lambda_{\max}\big(\boldsymbol{A}(u)\big) \big| \;<\; 1, \qquad (10)$$
the process in (9) is locally stationary and causal. Our goal is to estimate the time-varying mean $\boldsymbol{\mu}(u_{0})$ and the VAR matrix $\boldsymbol{A}(u_{0})$ at a fixed $u_{0} \in (0,1)$ by WLS. We can rewrite (9) as
and it makes sense to define the estimator as the minimizer of the weighted loss function
(13)
where the bandwidth sequence $h = h_{T}$ tends to zero more slowly than $1/T$: $h \to 0$ and $Th \to \infty$ as $T \to \infty$. The following proposition provides a closed form of the local-constant minimizer of (13).
We consider model (12) and use the local-constant approximation of the time-varying parameters in a neighborhood
of $u_{0}$
to estimate our parameter of interest in the approximate model
Our first result generalizes the Yule-Walker solutions in (8) to allow for the time-varying mean vector $\boldsymbol{\mu}(\cdot)$.
Proposition 1.
Let $\{\boldsymbol{X}_{t,T}\}$ follow the locally stationary model in (9), with $\boldsymbol{A}(\cdot)$ satisfying (10). Assume that the mean function $\boldsymbol{\mu}(\cdot)$ and the VAR matrix $\boldsymbol{A}(\cdot)$ in (9) are both differentiable uniformly in $u \in [0,1]$, that is,
(14)
where the derivatives are bounded uniformly in $u$. Then, the local-constant minimizer of (13) is given in closed form in (15)-(16).
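To fix ideas, a minimal local-constant WLS sketch is given below (Python, hypothetical names): it regresses X_t on an intercept and X_{t-1} with kernel weights centred at u0, which is one natural way to estimate the intercept and the VAR matrix jointly. The paper's exact closed form is the one stated in (15)-(16); the recovery of the mean from the fitted intercept below assumes the standard intercept parameterization of a VAR with non-zero mean, which may differ from the parameterization used in (9).

```python
import numpy as np

def local_constant_wls(X, u0, h):
    """Local-constant WLS fit of the intercept and the VAR(1) matrix at rescaled time u0."""
    T, d = X.shape
    u = np.arange(1, T + 1) / T
    w = np.exp(-0.5 * ((u - u0) / h) ** 2) / (T * h)   # Gaussian kernel weights

    Y = X[1:]                                          # responses X_t, t = 2, ..., T
    Z = np.hstack([np.ones((T - 1, 1)), X[:-1]])       # regressors (1, X_{t-1}')
    W = np.diag(w[1:])                                 # diagonal weighting matrix

    # Weighted normal equations: (Z' W Z) B = Z' W Y, with B stacking (intercept, A(u0)')'.
    B = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ Y)
    c_hat, A_hat = B[0], B[1:].T

    # Under the intercept parameterization X_t = c(u0) + A(u0) X_{t-1} + eps_t,
    # the implied local mean is mu(u0) = (I - A(u0))^{-1} c(u0).
    mu_hat = np.linalg.solve(np.eye(d) - A_hat, c_hat)
    return mu_hat, A_hat
```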
The following theorem provides a closed form of the local-linear minimizer of (13). We consider model (12) and use the local-linear approximation of the time-varying parameters in a neighborhood
of $u_{0}$,
(21)
to estimate our parameter of interest in the approximate model
(22)
Theorem 1.
Let $\{\boldsymbol{X}_{t,T}\}$ follow the locally stationary model in (9), with $\boldsymbol{A}(\cdot)$ satisfying (10). Assume that the mean function $\boldsymbol{\mu}(\cdot)$ and the VAR matrix $\boldsymbol{A}(\cdot)$ in (9) are both continuously differentiable, uniformly in $u \in [0,1]$, that is,
(23)
where the derivatives are bounded uniformly in $u$. Let
where the kernel weights have been defined in (5), and define the diagonal matrix
(24)
and the weighting matrix
(25)
where the kernel matrix has been defined in (7). Then, the local-linear minimizer of (13) is given in closed form in (26)-(27).
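The local-linear construction augments each regressor with its interaction with (t/T - u0), so that levels and first derivatives of the local parameters are fitted jointly and only the level block is retained. The sketch below (Python, hypothetical names; an Epanechnikov kernel is assumed, as in the simulations of Section 4) illustrates this construction; the paper's closed form is the one given in (25)-(27).

```python
import numpy as np

def local_linear_wls(X, u0, h):
    """Local-linear WLS fit at rescaled time u0 for a VAR(1) with time-varying mean."""
    T, d = X.shape
    u = np.arange(1, T + 1) / T
    du = u - u0                                                    # centred rescaled times
    w = 0.75 * np.clip(1.0 - (du / h) ** 2, 0.0, None) / (T * h)   # Epanechnikov kernel weights

    Y = X[1:]
    Z0 = np.hstack([np.ones((T - 1, 1)), X[:-1]])    # level regressors (1, X_{t-1}')
    Z1 = Z0 * du[1:, None]                           # derivative regressors (t/T - u0)*(1, X_{t-1}')
    Z = np.hstack([Z0, Z1])                          # local-linear design
    W = np.diag(w[1:])

    B = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ Y)    # weighted normal equations
    level = B[: d + 1]                               # level block: intercept and A(u0)'
    return level[0], level[1:].T                     # (intercept, A_hat(u0))
```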
4 Simulation results
We first consider the zero-mean locally stationary VAR(1) model in (3), and estimate the VAR matrix by means of the localized Yule-Walker equations. Then we consider the locally stationary VAR(1) model in (9) with time-varying mean, and compare the WLS estimates obtained with local-constant and local-linear weights, respectively.
We simulate model (3). For a given sample size $T$ and number of replications, we generate the time-varying entries of the matrix $\boldsymbol{A}(t/T)$ according to
(32)
a specification chosen so that the largest eigenvalue of $\boldsymbol{A}(u)$ lies inside the unit circle for all $u \in [0,1]$.
We estimate the parameters according to (8), using the Gaussian kernel $K(x) = \tfrac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}$.
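For concreteness, the following sketch simulates a bivariate version of model (3) with smooth, arbitrarily chosen coefficient curves (they merely stand in for the actual specification in (32), which is not reproduced here) and then applies the localized Yule-Walker sketch from Section 2 on a grid of rescaled time points.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 1000, 2

# Hypothetical smooth curves standing in for the entries of A(u); their
# eigenvalues stay inside the unit circle for all u in [0, 1].
def A_of(u):
    return np.array([[0.5 * np.cos(2 * np.pi * u), 0.2 * np.sin(np.pi * u)],
                     [0.1 * u,                     -0.5 + 0.8 * u]])

# Simulate the zero-mean LS-VAR(1) in (3): X_t = A(t/T) X_{t-1} + eps_t.
X = np.zeros((T, d))
for t in range(1, T):
    X[t] = A_of(t / T) @ X[t - 1] + rng.standard_normal(d)

# Estimate A(.) on a grid of rescaled time points (bandwidth chosen ad hoc),
# reusing the localized_yule_walker sketch from Section 2.
grid = np.linspace(0.05, 0.95, 19)
A_hat = np.stack([localized_yule_walker(X, u0, h=0.1) for u0 in grid])
```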
The results are reported in FigureΒ 1.
(a) One realization of the time series simulated according to (3), with $\boldsymbol{A}(t/T)$ as in (32).
(b) Red lines: simulated parameters according to (32). Solid black lines: average of the estimates over replications.
Dashed black lines: 90% confidence bands (empirical quantiles) of the estimates.
Figure 1: Left: simulated time series according to (3), with $\boldsymbol{A}(t/T)$ as in (32). Right: estimates obtained according to (8) over replications.
We simulate model (9). For a given sample size $T$, we generate
the time-varying entries of the mean vector $\boldsymbol{\mu}(t/T)$ according to
(33)
and the matrix $\boldsymbol{A}(t/T)$ according to
(34)
a specification chosen so that the largest eigenvalue of $\boldsymbol{A}(u)$ lies inside the unit circle for all $u \in [0,1]$.
Figure 2 exhibits the parameters estimated by WLS using the Epanechnikov kernel $K(x) = \tfrac{3}{4}(1 - x^{2})\,\mathbb{1}\{|x| \le 1\}$. The local-constant estimates, obtained according to (15)-(16), are reported in Figure 2(a).
The local-linear estimates, obtained according to (25)-(27), are reported in Figure 2(b).
Figure 1 shows that in the absence of a (time-varying) mean, that is, when $\boldsymbol{\mu} \equiv \boldsymbol{0}$, the local-constant estimator performs very well. However, as Figure 2(a) illustrates, this is not the case in the presence of a (time-varying) mean.
It is clear from Figure 2 that,
although the local-constant estimates look satisfactory, the local-linear approach delivers superior results. As in the univariate case, the bias of the local-linear estimator depends only on the second derivative of the unknown regression function, and this does not come at the cost of a larger asymptotic variance. Moreover, the local-linear estimator does not suffer from boundary-bias problems. The quality of the estimates in Figure 2(b) is remarkable.
(a) Local-constant WLS estimates obtained according to (15)-(16) over replications.
(b) Local-linear WLS estimates obtained according to (25)-(27) over replications.
Figure 2: First row: one realization of the time series simulated according to (9), with $\boldsymbol{\mu}(t/T)$ as in (33) and $\boldsymbol{A}(t/T)$ as in (34). Second row: estimated time-varying means. Third, fourth, and last rows: estimated time-varying VAR coefficients. Red: simulated curves. Solid black: average of the estimates. Dashed black: 95% confidence bands.
5 Conclusions and future research
In this paper we consider the problem of estimating the time-varying mean-vector and the time-varying AR-matrix of a locally stationary VAR(1). We provide the closed-form definition of the local-linear solution to the weighted least-squares problem, in such a way that the mean-vector and the AR-matrix are estimated jointly.
The asymptotic properties of our estimator remain to be studied. It would also be interesting to develop data-driven methods to select the smoothing parameters. Moreover, we might consider the problem of estimating the parameters of a locally stationary VAR($p$) of order $p > 1$.
An important contribution of future studies would be the extension of the WLS in (40) to the high-dimensional setting. To this end, the WLS approach can be generalized in more than one direction. In fact, the closed form in (25)-(27) of Theorem 1 becomes particularly attractive when the length of the time series becomes large. Indeed, we can stick to the linear regression model in (22),
with the same assumptions as in Section 3, and fit (22) in a way that shrinks the regression coefficients
towards zero. More precisely, we can consider minimizing, with respect to the coefficient matrix, the following WLS-Ridge loss function
(35)
the effect of the penalty being to shrink the entries of the coefficient matrix towards zero. The approach based on (35) can be generalized to the case of a non-spherical penalty. The loss function corresponding to this scenario is
(36)
which comprises a WLS criterion, as in (40), and a generalized ridge penalty. In both (35) and (36), the weighting matrix is diagonal, its $t$-th element representing the weight of the $t$-th observation, with the weights summing to one. The penalty in (36) is a quadratic form with a positive-definite penalty matrix. When the penalty matrix is a positive scalar multiple of the identity, we obtain the spherical penalty of the WLS-Ridge regression in (35). Generalizing the (positive) scalar penalty to the class of (positive-definite) matrices allows for (i) a different penalization per regression parameter, and (ii) joint shrinkage among the elements of the coefficient matrix.
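As a sketch of how the generalized ridge penalty in (36) changes the computation relative to plain WLS, the following Python function (hypothetical names; the design matrix, responses and weights are taken as given) solves the penalized weighted normal equations; setting the penalty matrix to a scalar multiple of the identity recovers the spherical WLS-Ridge of (35).

```python
import numpy as np

def wls_generalized_ridge(Z, Y, w, Omega):
    """Generalized-ridge WLS: solve (Z' W Z + Omega) B = Z' W Y.

    Z     : (n, q) design matrix (e.g. the local-linear design of (22)).
    Y     : (n, d) matrix of responses.
    w     : (n,) non-negative kernel weights.
    Omega : (q, q) positive-definite penalty matrix; Omega = lam * I_q gives
            the spherical WLS-Ridge penalty of (35).
    """
    W = np.diag(w)
    return np.linalg.solve(Z.T @ W @ Z + Omega, Z.T @ W @ Y)
```

A non-diagonal Omega penalizes linear combinations of the coefficients jointly, which is exactly the flexibility referred to in points (i) and (ii) above.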
References
Dahlhaus (1996)
Dahlhaus, R. (1996).
Asymptotic statistical inference for nonstationary processes with
evolutionary spectra.
In P. M. Robinson and M. Rosenblatt (eds), Athens
Conference on Applied Probability and Time Series Analysis, Vol. II,
Springer-Verlag, New York.
Dahlhaus (1997)
Dahlhaus, R. (1997).
Fitting time series models to nonstationary processes.
The Annals of Statistics 25, 1-37.
Dahlhaus (2000)
Dahlhaus, R. (2000).
A likelihood approximation for locally stationary processes.
The Annals of Statistics 28(6), 1762-1794.
Dahlhaus (2012)
Dahlhaus, R. (2012).
Time Series Analysis: Methods and Applications.
Vol. 30, Elsevier, chapter: Locally Stationary Processes.
Lütkepohl (1996)
Lütkepohl, H. (1996).
Handbook of Matrices.
John Wiley & Sons.
Zhou and Wu (2010)
Zhou, Z. and Wu, W. B. (2010).
Simultaneous inference of linear models with time-varying
coefficients.
Journal of the Royal Statistical Society, Series B 72(4), 513-531.
Appendix: Proof of Theorem 1
The local-linear approximation holds uniformly in $t$ such that $|t/T - u_{0}| \le h$. Therefore, adopting (21)-(22), the loss in (13) can be approximated by
(39)
Letting
and
where $\boldsymbol{\imath}$ denotes a vector of ones, and where the remaining quantities have been defined in (5) and (24), respectively, the loss in (39) can be written in matrix form as
(40)
with the design and weighting matrices as defined in (6) and (7). The loss in (40) is equal to
and thus minimizing the former with respect to the parameter matrix is equivalent to minimizing the latter
with respect to the same parameters. Differentiating with respect to the parameters and equating the derivative to zero, we obtain the normal equations,
that is,
Notice that
and that
where
The local-linear estimator of the parameters of interest is given by the first columns of the matrix obtained in the previous display.
Hence, we need the first columns of the inverse of a partitioned matrix. Without proof we state the following lemma; see
Lütkepohl (1996, result (2) in Section 3.5.3, page 30).
Lemma 2.
Consider a square matrix partitioned into four conformable blocks, with invertible diagonal blocks; its inverse is again a partitioned matrix, with blocks given in terms of the original blocks and their Schur complements (a standard block-inverse identity, restated at the end of this appendix for reference). If we define the weighting matrix according to (25), the local-linear estimator can be written as
and (26) is proved. The estimator has the same form as the estimator in (15) of Proposition 1, with the weighting matrix in (25) in place of the local-constant kernel weights. Hence the result in (27), with the quantities in (28), (29), (30), and (31), respectively, follows directly from Proposition 1.
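For reference, a block-inverse identity of the kind invoked in Lemma 2 (Lütkepohl, 1996, Section 3.5.3) reads as follows; the block labels are generic placeholders and the statement assumes the indicated inverses exist:
$$
\begin{pmatrix} \boldsymbol{E} & \boldsymbol{F} \\ \boldsymbol{G} & \boldsymbol{H} \end{pmatrix}^{-1}
=
\begin{pmatrix}
\boldsymbol{Q}^{-1} & -\,\boldsymbol{Q}^{-1}\boldsymbol{F}\boldsymbol{H}^{-1} \\
-\,\boldsymbol{H}^{-1}\boldsymbol{G}\boldsymbol{Q}^{-1} & \boldsymbol{H}^{-1} + \boldsymbol{H}^{-1}\boldsymbol{G}\boldsymbol{Q}^{-1}\boldsymbol{F}\boldsymbol{H}^{-1}
\end{pmatrix},
\qquad
\boldsymbol{Q} \;=\; \boldsymbol{E} - \boldsymbol{F}\boldsymbol{H}^{-1}\boldsymbol{G}.
$$
In particular, the top-left block of the inverse is the inverse of the Schur complement $\boldsymbol{Q}$, which is what is needed to extract the leading block of columns in the argument above.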