
Joint Mean-Vector and VAR-Matrix Estimation
for Locally Stationary VAR(1) Processes

Giovanni Motta. Correspondence to: Giovanni Motta, 3143 TAMU, Department of Statistics, College Station, TX 77843, USA. E-mail: [email protected]
Department of Statistics, Texas A&M University
Abstract

During the last two decades, locally stationary processes have been widely studied in the time series literature. In this paper we consider the locally stationary vector auto-regression model of order one, or LS-VAR(1), and estimate its parameters by weighted least squares. The LS-VAR(1) we consider allows for a smoothly time-varying non-diagonal VAR matrix, as well as for a smoothly time-varying non-zero mean. The weighting scheme is based on kernel smoothers. The time-varying mean and the time-varying VAR matrix are estimated jointly, and the local-linear weighting matrix is defined in closed form. The quality of the estimated curves is illustrated through simulation results.

Keywords: Local Stationarity, Local Polynomials, Weighted Least Squares

Data Availability Statement: Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

1 Introduction

In this paper we consider $r$-dimensional multivariate data $\bm{X}_{T}(1),\dots,\bm{X}_{T}(T)$ generated by a locally stationary process, and our goal is to fit to the data a parametric model with time-varying coefficients. The notation $\bm{X}_{T}(t)$ emphasizes that the data form a triangular array where, at each $t$, the structure of the process depends on the sample size $T$.

To introduce the problem, consider the following univariate zero-mean autoregressive model of order $p$,

\[
\sum_{j=0}^{p}a_{j}(\tfrac{t}{T})\,X_{T}(t-j)=\sigma(\tfrac{t}{T})\,\varepsilon(t), \tag{1}
\]

or AR($p$), where the coefficients $a_{j}(u)$ are differentiable for $u\in(0,1)$ with bounded derivatives.

In terms of modeling, local stationarity means that if the parameters $a_{j}(u)$ are smooth in rescaled time $u$ and $T$ is large, then $a_{j}(\tfrac{s}{T})\approx a_{j}(\tfrac{t}{T})$ for all $s$ in a neighborhood of $t$. For estimation, rescaling time allows one to apply nonparametric methods to recover the unknown curves. In the frequency domain, the importance of rescaling time $t$ by the sample size $T$ and developing the analysis in rescaled time $u\in(0,1)$ relies upon the uniqueness of the transfer function. Dahlhaus (1996) introduced the spectral representation of a locally stationary process

\[
X_{T}(t)=\int_{-\pi}^{\pi}\exp(i\lambda t)\,A_{T}^{0}(t,\lambda)\,d\xi(\lambda),
\]

where $\xi(\lambda)$ is a stochastic process with orthogonal increments, and where the sequence $A_{T}^{0}(t,\lambda)$ converges (uniformly in $t$ and $\lambda$, as $T$ diverges) to another function $A(u,\lambda)$:

\[
\sup_{t,\lambda}\big|A_{T}^{0}(t,\lambda)-A(\tfrac{t}{T},\lambda)\big|=\mathcal{O}(\tfrac{1}{T}).
\]

If $A$ is smooth in $u$, then the time-varying spectral density $f(u,\lambda)=|A(u,\lambda)|^{2}$ is uniquely determined from the triangular array.

The dichotomy between $A_{T}^{0}$ and $A$ is particularly relevant in the case of AR processes. To see this, consider the simple case where $p=1$ and $\sigma(u)\equiv 1$. In the stationary case where the coefficient $a$ is time-invariant, the process in (1) can be written as

\[
X(t)=\sum_{k=0}^{\infty}\psi_{k}\,\varepsilon_{t-k}
\]

with $\psi_{k}=a^{k}$. By contrast, the locally stationary process $X_{T}(t)=a(\tfrac{t}{T})X_{T}(t-1)+\varepsilon(t)$ does not have a solution of the form

\[
X_{T}(t)=\sum_{k=0}^{\infty}\psi_{k}(u)\,\varepsilon_{t-k},
\]

but only of the form $X_{T}(t)=\sum_{k=0}^{\infty}\psi_{T,k}(t)\,\varepsilon_{t-k}$, with $\sup_{t}\sum_{k\in\mathbb{Z}}|\psi_{T,k}(t)-\psi_{k}(\tfrac{t}{T})|=\mathcal{O}(\tfrac{1}{T})$.

The seminal papers on local stationarity (Dahlhaus, 1996, 1997) provide details on the mathematics in the frequency domain. For an overview on multivariate locally stationary processes, see Dahlhaus (2012, Section 7.2).

Without loss of generality, assume that $\sigma(u)$ is known and time-invariant, that is, $\sigma(u)\equiv\sigma$. Suppose that the vector of interest $\bm{a}(u)=[a_{1}(u),\dots,a_{p}(u)]^{\top}$ depends on a finite-dimensional parameter. For example, if the coefficients are polynomials in time,

\[
a_{j}(u)=\sum_{k=1}^{K}\theta_{jk}\,f_{k}(u),\quad 1\leq j\leq p,
\qquad\mbox{with}\quad f_{k}(u)=u^{k-1},\quad 1\leq k\leq K, \tag{2}
\]

estimating the time-varying vector $\bm{a}(u)$ at $u\in(0,1)$ translates into estimating the time-invariant vector $\bm{\theta}=[\bm{\theta}_{1}^{\top},\dots,\bm{\theta}_{p}^{\top}]^{\top}$, where $\bm{\theta}_{j}=[\theta_{j1},\dots,\theta_{jK}]^{\top}$, with $1\leq j\leq p$.

The specification in (2) approximates the coefficient vector $\bm{a}(u)$ by global polynomials. Dahlhaus (1997, Section 4) obtained an explicit formula for the vector $\bm{\theta}$ as the solution of a linear system similar to the Yule-Walker equations. In the univariate setting, Dahlhaus (1996) estimates the time-varying parameters $\bm{a}(u)$ by kernel smoothers, that is, using a local-constant approximation. In the multivariate setting, Dahlhaus (2000, p. 1776) mentions the possibility of estimating the unknown parameters $\bm{\theta}(u)$ by minimizing a local-polynomial approximation of the local likelihood.

Zhou and Wu (2010) consider univariate linear models of the form $\bm{Y}(t)=\bm{X}^{\top}\bm{\beta}(t)+\bm{\varepsilon}(t)$, where both $\bm{X}$ and $\bm{\varepsilon}$ are assumed to be locally stationary, and estimate the time-varying vector of coefficients $\bm{\beta}(t)$ by means of local polynomials.

The contribution of this paper is threefold. In terms of modeling, we consider a multivariate version of model (1) and estimate the time-varying parameters in the time domain. Our main contribution is the closed-form definition of the local-linear estimator of the parameters. Finally, we emphasize that the estimation of the time-varying mean and the time-varying AR matrix is performed jointly.

In Section 2 we derive the localized Yule-Walker equations for a locally stationary zero-mean VAR process. In Section 3 we consider a locally stationary VAR with time-varying mean. First, we derive the local-constant weighted-least-squares estimator, see Proposition 1. Then in Theorem 1 we establish our main result, the closed-form definition of the local-linear weighted-least-squares estimator. In Section 4 we illustrate and compare the performance of the two weighted-least-squares estimators (local-constant and local-linear). Section 5 concludes and highlights the extension of our approach to the high-dimensional ($r>T$) setting.

Throughout the paper we use bold uppercase letters to denote matrices, and bold slanted letters to denote vectors. We denote by $\mathbf{I}_{m}$ the identity matrix of size $m$, by $\bm{1}_{m}$ the $m\times 1$ vector of ones, by ${\rm tr}\{\mathbf{A}\}$ the trace of $\mathbf{A}$, by $\mathbf{A}^{\top}$ the transpose of $\mathbf{A}$, by $\|\mathbf{A}\|$ the Frobenius norm $\|\mathbf{A}\|=[{\rm tr}\{\mathbf{A}^{\top}\mathbf{A}\}]^{1/2}$, and by $\mathbf{A}^{-1}$ the inverse of $\mathbf{A}$, that is, the square matrix such that $\mathbf{A}^{-1}\mathbf{A}=\mathbf{A}\mathbf{A}^{-1}=\mathbf{I}$. Finally, we use the acronyms VAR and WLS for vector auto-regression and weighted least squares, respectively.

2 Locally Stationary Vector Auto Regression

Consider the following $r$-dimensional locally stationary vector auto-regression of order 1,

\[
\underset{r\times 1}{\bm{X}_{t}}=\underset{r\times r}{\mathbf{A}(\tfrac{t}{T})}\,\underset{r\times 1}{\bm{X}_{t-1}}+\underset{r\times 1}{\bm{\varepsilon}_{t}},\qquad t=1,\dots,T, \tag{3}
\]

with $\bm{X}_{0}=\bm{0}$ and $\mathbb{E}[\bm{\varepsilon}_{t}\bm{X}_{s}^{\top}]=\mathbbm{1}_{\{s=t\}}\mathbf{\Gamma}_{\varepsilon}$. If the largest eigenvalue of $\mathbf{A}(u)$ lies inside the unit circle uniformly in $u$,

\[
\sup_{u\in(0,1)}\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<1,
\]

then $\bm{X}_{t}$ is locally stationary and causal. Our goal is to estimate $\mathbf{A}(u)$ at a fixed $u\in(0,1)$ using a localized version of the Yule-Walker equations. If we assume that the matrix-valued function $\mathbf{A}(x)$ is smooth in $x$, we can write the Taylor expansion of $\mathbf{A}(x)$ around $u$:

\[
\mathbf{A}(x)=\sum_{j=0}^{\infty}\tfrac{(x-u)^{j}}{j!}\mathbf{A}^{(j)}(u)=\mathbf{A}(u)+(x-u)\mathbf{A}^{(1)}(u)+\mathcal{O}([x-u]^{2}),
\]

where $\mathbf{A}^{(j)}(u):=\tfrac{d^{j}\mathbf{A}(x)}{dx^{j}}\big|_{x=u}$. We are interested in evaluating the function $\mathbf{A}(\tfrac{t}{T})$ at those values of $t$ for which $\tfrac{t}{T}$ lies in a neighborhood of $u$. For example, for a fixed $u_{0}\in(0,1)$ let $t_{0}=\lfloor u_{0}T\rfloor$, where $\lfloor x\rfloor$ is the largest integer not exceeding $x$. Then we have the following uniform bound:

\[
\sup_{u_{0}\in(0,1)}|\tfrac{t_{0}}{T}-u_{0}|<\tfrac{1}{T}.
\]

As a consequence, assuming that

\[
\sup_{u\in(0,1)}\|\mathbf{A}^{(1)}(u)\|<\infty,
\]

we obtain the following bound,

\[
\|\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)\|\leq|\tfrac{t}{T}-u|\,\|\mathbf{A}^{(1)}(u)\|=\mathcal{O}(\tfrac{1}{T})\times\mathcal{O}(1)=\mathcal{O}(\tfrac{1}{T}),
\]

uniformly in $u$. Let $K_{h}(x)=\tfrac{1}{h}K(\tfrac{x}{h})$, where $K(\cdot)$ is a kernel function such that $K(x/h)=0$ if $|x|>h$, and where $h=h_{T}$ is the smoothing parameter, which tends to zero as $T$ diverges, but more slowly than $1/T$:

\[
h=h_{T}\to 0\quad\mbox{and}\quad T\,h_{T}\to\infty\quad\mbox{as}\ T\to\infty.
\]
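For concreteness, the kernel weights $K_{h}(\tfrac{t}{T}-u)$, $1\leq t\leq T$, that appear throughout the paper can be computed in a few lines; the following Python sketch is only illustrative (the function name and the choice of kernels are ours, not part of the paper).

```python
import numpy as np

def kernel_weights(T, u, h, kernel="epanechnikov"):
    """Weights K_h(t/T - u) for t = 1, ..., T, with K_h(x) = K(x/h)/h."""
    x = (np.arange(1, T + 1) / T - u) / h
    if kernel == "epanechnikov":
        # compactly supported: K(x) = 0.75 (1 - x^2) on |x| <= 1
        k = 0.75 * (1.0 - x**2) * (np.abs(x) <= 1.0)
    elif kernel == "gaussian":
        # not compactly supported, but used in Section 4
        k = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    else:
        raise ValueError("unknown kernel")
    return k / h  # diagonal entries of K_T(u) in (7)
```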

If we right-multiply (3) by $\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)$ and sum over $t$, we obtain

\[
\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)
=\sum_{t=1}^{T}\mathbf{A}(\tfrac{t}{T})\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)+\sum_{t=1}^{T}\bm{\varepsilon}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u).
\]

Since $\mathbb{E}[\bm{\varepsilon}_{t}\bm{X}_{t-1}^{\top}]=\bm{0}$, and since

\[
\sum_{t=1}^{T}\mathbf{A}(\tfrac{t}{T})\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)-\mathbf{A}(u)\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)=\sum_{t=1}^{T}[\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)]\,\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u),
\]

\[
\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t}^{\top}K_{h}(\tfrac{t}{T}-u)-\mathbf{\Gamma}(0,u)\Big\|=\mathcal{O}_{p}(\tfrac{1}{\sqrt{T\,h}}),
\qquad
\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)-\mathbf{\Gamma}(1,u)\Big\|=\mathcal{O}_{p}(\tfrac{1}{\sqrt{T\,h}}),
\]

and

\[
\begin{split}
\Big\|\tfrac{1}{T}\sum_{t=1}^{T}[\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)]\,\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big\|
&\leq \sup_{t,T}\|\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)\|\times\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big\|\\
&\leq |\tfrac{t}{T}-u|\times\|\mathbf{A}^{(1)}(u)\|\times\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big\|\\
&\leq \mathcal{O}(\tfrac{1}{T})\times\mathcal{O}(1)\times\big\|\mathbf{\Gamma}(0,u)+\mathcal{O}_{p}(\tfrac{1}{\sqrt{T\,h}})\big\|=\mathcal{O}_{p}(\tfrac{1}{T}),
\end{split}
\]

it makes sense to estimate $\mathbf{A}(u)$ as

\[
\widehat{\mathbf{A}}(u)=\Big[\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big]\Big[\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big]^{-1}. \tag{4}
\]

If we define

\begin{align}
\underset{T\times r}{\mathbf{X}_{0}} &= \{\underset{r\times 1}{\bm{X}_{0}},\underset{r\times 1}{\bm{X}_{1}},\dots,\underset{r\times 1}{\bm{X}_{T-1}}\}^{\top}, \tag{5}\\
\underset{T\times r}{\mathbf{X}_{1}} &= \{\underset{r\times 1}{\bm{X}_{1}},\underset{r\times 1}{\bm{X}_{2}},\dots,\underset{r\times 1}{\bm{X}_{T}}\}^{\top},\quad\mbox{and} \tag{6}\\
\underset{T\times T}{\mathbf{K}_{T}(u)} &= {\rm diag}\{K_{h}(\tfrac{1}{T}-u),K_{h}(\tfrac{2}{T}-u),\dots,K_{h}(1-u)\}, \tag{7}
\end{align}

the estimator in (4) can be written as

\[
\widehat{\mathbf{A}}(u)=[\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}]\,[\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}]^{-1}. \tag{8}
\]
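To make the localized Yule-Walker estimator concrete, the following Python sketch computes $\widehat{\mathbf{A}}(u)$ as in (4) and (8); it assumes the data are stored row-wise as $\bm{X}_{0},\dots,\bm{X}_{T}$ and uses an Epanechnikov kernel, both of which are illustrative choices rather than requirements of the paper.

```python
import numpy as np

def local_yule_walker(X, u, h):
    """Local-constant estimate A_hat(u) as in (8).

    X : array of shape (T+1, r) whose rows are X_0, X_1, ..., X_T.
    """
    T = X.shape[0] - 1
    X0, X1 = X[:-1], X[1:]                     # rows X_0..X_{T-1} and X_1..X_T
    x = (np.arange(1, T + 1) / T - u) / h
    k = np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0) / h   # K_h(t/T - u)
    S10 = (X1 * k[:, None]).T @ X0             # sum_t X_t X_{t-1}' K_h(t/T - u)
    S00 = (X0 * k[:, None]).T @ X0             # sum_t X_{t-1} X_{t-1}' K_h(t/T - u)
    return S10 @ np.linalg.inv(S00)            # A_hat(u), an r x r matrix
```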

3 Joint estimation of time-varying mean-vector and VAR-matrix by Weighted Least Squares

The estimator in (8) can be obtained as the minimizer of a WLS problem. Consider the LS-VAR(1) in (3), and let us now allow for a time-varying (non-zero) mean,

\[
\bm{X}_{t}-\bm{\mu}(\tfrac{t}{T})=\mathbf{A}(\tfrac{t}{T})\,[\bm{X}_{t-1}-\bm{\mu}(\tfrac{t-1}{T})]+\bm{\varepsilon}_{t},\qquad t=1,\dots,T, \tag{9}
\]

with $\bm{X}_{0}=\bm{\mu}(0)=\bm{0}$ and $\mathbb{E}[\bm{\varepsilon}_{t}\bm{X}_{s}^{\top}]=\mathbbm{1}_{\{s=t\}}\mathbf{\Gamma}_{\varepsilon}$. If the largest eigenvalue of $\mathbf{A}(u)$ lies inside the unit circle uniformly in $u$,

\[
\sup_{u\in(0,1)}\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<1, \tag{10}
\]

then $\bm{X}_{t}$ is locally stationary and causal. Our goal is to estimate $\mathbf{A}(u)$ at a fixed $u\in(0,1)$ by WLS. We can rewrite (9) as

\[
\bm{X}_{t}=\bm{m}(\tfrac{t}{T})+\mathbf{A}(\tfrac{t}{T})\bm{X}_{t-1}+\bm{\varepsilon}_{t}, \tag{11}
\]

where $\bm{m}(\tfrac{t}{T})=\bm{\mu}(\tfrac{t}{T})-\mathbf{A}(\tfrac{t}{T})\bm{\mu}(\tfrac{t-1}{T})$ for all $t=1,\dots,T$. If we define

\[
\underset{r\times(r+1)}{\mathbf{B}(u)}=[\bm{m}(u),\mathbf{A}(u)]\quad\mbox{and}\quad
\underset{(r+1)\times 1}{\bm{Z}_{t}}=\begin{pmatrix}1\\ \bm{X}_{t-1}\end{pmatrix}
\]

for all $t=1,\dots,T$, model (11) can be written as

\[
\bm{X}_{t}=\mathbf{B}(\tfrac{t}{T})\bm{Z}_{t}+\bm{\varepsilon}_{t}, \tag{12}
\]

and it makes sense to define $\widehat{\mathbf{B}}(u)$ as the minimizer of the weighted loss function

\[
\sum_{t=1}^{T}\|\bm{X}_{t}-\mathbf{B}(\tfrac{t}{T})\bm{Z}_{t}\|^{2}\,K_{h}(\tfrac{t}{T}-u), \tag{13}
\]

where the bandwidth sequence $h\equiv h_{T}$ tends to zero more slowly than $T^{-1}$: $h_{T}\to 0$ and $T\,h_{T}\to\infty$ as $T\to\infty$. The following proposition provides a closed form for the local-constant minimizer of (13). We consider model (12) and use the local-constant approximation of $\mathbf{B}(\tfrac{t}{T})$ in a neighborhood of $u$,

\[
\mathbf{B}(\tfrac{t}{T})\approx\mathbf{B}(u),
\]

to estimate $\mathbf{B}(u)$, our parameter of interest, in the approximate model

\[
\bm{X}_{t}\approx\mathbf{B}(u)\bm{Z}_{t}+\bm{\varepsilon}_{t}.
\]

Our first result generalizes the Yule-Walker solutions in (8) to allow for the time-varying mean vector $\bm{\mu}$.

Proposition 1.

Let $\bm{X}_{t}$ follow the locally stationary model in (9), with $\mathbf{A}(u)$ satisfying (10). Assume that the mean function $\bm{\mu}(u)$ and the VAR matrix $\mathbf{A}(u)$ in (9) are both differentiable uniformly in $u$, that is,

\[
\sup_{u\in(0,1)}\|\bm{\mu}^{(1)}(u)\|<\infty\qquad\mbox{and}\qquad\sup_{u\in(0,1)}\|\mathbf{A}^{(1)}(u)\|<\infty, \tag{14}
\]

where $\bm{\mu}^{(1)}(u):=\tfrac{d\bm{\mu}(x)}{dx}\big|_{x=u}$ and $\mathbf{A}^{(1)}(u):=\tfrac{d\mathbf{A}(x)}{dx}\big|_{x=u}$. Then the local-constant minimizer of (13) is

\begin{align}
\widehat{\mathbf{B}}(u) &= \mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1} \tag{15}\\
&= [\widehat{\bm{m}}(u),\widehat{\mathbf{A}}(u)], \tag{16}
\end{align}

with $\widehat{\mathbf{A}}(u)=\widehat{\mathbf{G}}(u,1)\,\widehat{\mathbf{G}}(u,0)^{-1}$ and $\widehat{\bm{m}}(u)=\widehat{\bm{\mu}}_{1}(u)-\widehat{\mathbf{A}}(u)\widehat{\bm{\mu}}_{0}(u)$, where

\begin{align}
\widehat{\bm{\mu}}_{0}(u) &= \frac{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= \sum_{t=1}^{T}\tfrac{K_{h}(\frac{t}{T}-u)}{\sum_{s=1}^{T}K_{h}(\frac{s}{T}-u)}\,\bm{X}_{t-1}, \tag{17}\\
\widehat{\bm{\mu}}_{1}(u) &= \frac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= \sum_{t=1}^{T}\tfrac{K_{h}(\frac{t}{T}-u)}{\sum_{s=1}^{T}K_{h}(\frac{s}{T}-u)}\,\bm{X}_{t}, \tag{18}\\
\widehat{\mathbf{G}}(u,0) &= \mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}-\tfrac{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= [\mathbf{X}_{0}-\bm{1}_{T}\widehat{\bm{\mu}}_{0}^{\top}(u)]^{\top}\mathbf{K}_{T}(u)\,[\mathbf{X}_{0}-\bm{1}_{T}\widehat{\bm{\mu}}_{0}^{\top}(u)] \notag\\
&= \sum_{t=1}^{T}[\bm{X}_{t-1}-\widehat{\bm{\mu}}_{0}(u)][\bm{X}_{t-1}-\widehat{\bm{\mu}}_{0}(u)]^{\top}K_{h}(\tfrac{t}{T}-u), \tag{19}\\
\widehat{\mathbf{G}}(u,1) &= \mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}-\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= [\mathbf{X}_{1}-\bm{1}_{T}\widehat{\bm{\mu}}_{1}^{\top}(u)]^{\top}\mathbf{K}_{T}(u)\,[\mathbf{X}_{0}-\bm{1}_{T}\widehat{\bm{\mu}}_{0}^{\top}(u)] \notag\\
&= \sum_{t=1}^{T}[\bm{X}_{t}-\widehat{\bm{\mu}}_{1}(u)][\bm{X}_{t-1}-\widehat{\bm{\mu}}_{0}(u)]^{\top}K_{h}(\tfrac{t}{T}-u). \tag{20}
\end{align}
Proof.

See Appendix A. ∎
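A direct implementation of the local-constant estimator in (15)-(16) only requires a weighted least-squares fit of $\mathbf{X}_{1}$ on $\mathbf{Z}_{0}=[\bm{1}_{T}\,|\,\mathbf{X}_{0}]$. The Python sketch below is illustrative (data layout, kernel, and function name are our own choices).

```python
import numpy as np

def local_constant_wls(X, u, h):
    """Local-constant WLS estimate B_hat(u) = [m_hat(u), A_hat(u)] as in (15)-(16).

    X : array of shape (T+1, r) whose rows are X_0, ..., X_T.
    """
    T = X.shape[0] - 1
    X0, X1 = X[:-1], X[1:]
    Z0 = np.hstack([np.ones((T, 1)), X0])        # Z_0 = [1_T | X_0]
    x = (np.arange(1, T + 1) / T - u) / h
    k = np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0) / h   # diag of K_T(u)
    KZ = Z0 * k[:, None]                          # K_T(u) Z_0
    B_hat = (X1.T @ KZ) @ np.linalg.inv(Z0.T @ KZ)
    m_hat, A_hat = B_hat[:, 0], B_hat[:, 1:]      # as in (16)
    return m_hat, A_hat
```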

The following theorem provides a closed form for the local-linear minimizer of (13). We consider model (12) and use the local-linear approximation of $\mathbf{B}(\tfrac{t}{T})$ in a neighborhood of $u$,

\[
\mathbf{B}(\tfrac{t}{T})\approx\mathbf{B}(u)+(\tfrac{t}{T}-u)\,\mathbf{B}^{(1)}(u), \tag{21}
\]

to estimate $\mathbf{B}(u)$, our parameter of interest, in the approximate model

\[
\bm{X}_{t}\approx\mathbf{B}(u)\bm{Z}_{t}+(\tfrac{t}{T}-u)\,\mathbf{B}^{(1)}(u)\bm{Z}_{t}+\bm{\varepsilon}_{t}. \tag{22}
\]
Theorem 1.

Let $\bm{X}_{t}$ follow the locally stationary model in (9), with $\mathbf{A}(u)$ satisfying (10). Assume that the mean function $\bm{\mu}(u)$ and the VAR matrix $\mathbf{A}(u)$ in (9) are both twice differentiable, uniformly in $u$, that is,

\[
\sup_{u\in(0,1)}\|\bm{\mu}^{(2)}(u)\|<\infty\qquad\mbox{and}\qquad\sup_{u\in(0,1)}\|\mathbf{A}^{(2)}(u)\|<\infty, \tag{23}
\]

where $\mathbf{A}^{(2)}(u):=\tfrac{d^{2}\mathbf{A}(x)}{dx^{2}}\big|_{x=u}$. Let

\[
\underset{T\times(r+1)}{\mathbf{Z}_{0}}=[\bm{1}_{T}\,|\,\mathbf{X}_{0}],
\]

where $\mathbf{X}_{0}$ has been defined in (5), and define the diagonal matrix

\[
\underset{T\times T}{\mathbf{\Delta}_{1}(u)}={\rm diag}\{\tfrac{t}{T}-u,\ 1\leq t\leq T\}, \tag{24}
\]

and the weighting matrix

\[
\mathbf{W}_{T}(u;\mathbf{X})=\mathbf{K}_{T}(u)-\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}]^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u), \tag{25}
\]

where $\mathbf{K}_{T}(u)$ has been defined in (7). Then the local-linear minimizer of (13) is

\begin{align}
\widetilde{\mathbf{B}}(u) &= \mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}]^{-1} \tag{26}\\
&= [\widetilde{\bm{m}}(u),\widetilde{\mathbf{A}}(u)], \tag{27}
\end{align}

with $\widetilde{\mathbf{A}}(u)=\widetilde{\mathbf{G}}(u,1)\,\widetilde{\mathbf{G}}(u,0)^{-1}$ and $\widetilde{\bm{m}}(u)=\widetilde{\bm{\mu}}_{1}(u)-\widetilde{\mathbf{A}}(u)\widetilde{\bm{\mu}}_{0}(u)$, where

\begin{align}
\widetilde{\bm{\mu}}_{0}(u) &= \frac{\mathbf{X}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{28}\\
\widetilde{\bm{\mu}}_{1}(u) &= \frac{\mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{29}\\
\widetilde{\mathbf{G}}(u,0) &= \mathbf{X}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}-\tfrac{\mathbf{X}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{30}\\
\widetilde{\mathbf{G}}(u,1) &= \mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}-\tfrac{\mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{31}
\end{align}

$\mathbf{X}_{1}$ being defined in (6).

Proof.

See Appendix B. ∎
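Since the local-linear estimator in (26) has the same algebraic form as the local-constant one, with $\mathbf{W}_{T}(u;\mathbf{X})$ replacing $\mathbf{K}_{T}(u)$, it can be coded by first building the weighting matrix in (25). The following Python sketch is again illustrative (kernel choice, data layout, and names are ours); for large $T$ the explicit $T\times T$ matrix could be avoided, but it keeps the sketch close to the formulas.

```python
import numpy as np

def local_linear_wls(X, u, h):
    """Local-linear WLS estimate B_tilde(u) = [m_tilde(u), A_tilde(u)], as in (24)-(27).

    X : array of shape (T+1, r) whose rows are X_0, ..., X_T.
    """
    T = X.shape[0] - 1
    X0, X1 = X[:-1], X[1:]
    Z0 = np.hstack([np.ones((T, 1)), X0])               # Z_0 = [1_T | X_0]
    grid = np.arange(1, T + 1) / T
    x = (grid - u) / h
    k = np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0) / h   # diag of K_T(u)
    d = grid - u                                        # diag of Delta_1(u)
    DZ = Z0 * d[:, None]                                # Delta_1(u) Z_0
    KDZ = DZ * k[:, None]                               # K_T(u) Delta_1(u) Z_0
    # W_T(u;X) = K - K D Z_0 [Z_0' D K D Z_0]^{-1} Z_0' D K, equation (25)
    W = np.diag(k) - KDZ @ np.linalg.inv(DZ.T @ KDZ) @ KDZ.T
    B_tilde = (X1.T @ W @ Z0) @ np.linalg.inv(Z0.T @ W @ Z0)   # equation (26)
    return B_tilde[:, 0], B_tilde[:, 1:]                # m_tilde(u), A_tilde(u)
```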

4 Simulation Results

We first consider the zero-mean locally stationary VAR(1) model in (3), and estimate the VAR matrix by means of the localized Yule-Walker equations. Then we consider the locally stationary VAR(1) model in (9) with time-varying mean, and compare the WLS estimates obtained with local-constant and local-linear weights, respectively.

We simulate model (3) with $r=6$. For $j=1,4$ and $k=1,\dots,r$, we generate the time-varying entries of the $r\times r$ matrix $\mathbf{A}(\tfrac{t}{T})$ as

\begin{align}
A_{j,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{j+3}}{\log(k+3)}\,\sin\big(4\pi\tfrac{t}{T}\tfrac{\sqrt{j+4}}{\log(k+4)}\big), \notag\\
A_{j+1,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{j+2}}{\log(k+3)}\,\cos\big(2\pi\tfrac{t}{T}\tfrac{\sqrt{j+4}}{\log(k+2)}\big), \tag{32}\\
A_{j+2,k}(\tfrac{t}{T}) &= a_{2}\tfrac{\sqrt{j+1}}{\log(k+3)}\,\sin\big(\pi\tfrac{t}{T}\tfrac{\sqrt{j+4}}{\log(k+2)}\big), \notag
\end{align}

with $T=800$, $a_{1}=0.2$ and $a_{2}=0.1$. This specification is such that, for all $u\in(0,1)$,

\[
0.1<\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<0.9.
\]

We estimate the parameters according to (8), with $h=0.03$ and the Gaussian kernel $K(x)=\tfrac{1}{\sqrt{2\pi}}\exp(-0.5\,x^{2})$. The results are reported in Figure 1.
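For completeness, one realization of model (3) under the specification in (32) can be generated as in the Python sketch below; since the paper does not specify the distribution of $\bm{\varepsilon}_{t}$, standard normal innovations are used here purely as an illustrative assumption.

```python
import numpy as np

def A_of(u, r=6, a1=0.2, a2=0.1):
    """Time-varying VAR matrix A(u) following (32), for j = 1, 4."""
    A = np.zeros((r, r))
    for j in (1, 4):
        for k in range(1, r + 1):
            A[j - 1, k - 1] = a1 * np.sqrt(j + 3) / np.log(k + 3) * \
                np.sin(4 * np.pi * u * np.sqrt(j + 4) / np.log(k + 4))
            A[j, k - 1] = a1 * np.sqrt(j + 2) / np.log(k + 3) * \
                np.cos(2 * np.pi * u * np.sqrt(j + 4) / np.log(k + 2))
            A[j + 1, k - 1] = a2 * np.sqrt(j + 1) / np.log(k + 3) * \
                np.sin(np.pi * u * np.sqrt(j + 4) / np.log(k + 2))
    return A

def simulate_lsvar1(T=800, r=6, seed=0):
    """One realization X_0, ..., X_T of model (3), with X_0 = 0."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T + 1, r))
    for t in range(1, T + 1):
        X[t] = A_of(t / T, r) @ X[t - 1] + rng.standard_normal(r)
    return X
```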

(a) One realization of the $r=6$ time series simulated according to (3) with $\mathbf{A}(u)$ as in (32) and $T=800$.
(b) Red lines: simulated parameters according to (32). Solid black lines: average of the estimates over $M$ replications. Dashed black lines: 90% confidence bands (empirical quantiles) of the $M$ estimates.
Figure 1: Left: simulated time series according to (3), with $r=6$ and $\mathbf{A}(u)$ as in (32). Right: estimates $\widehat{\mathbf{A}}(u)$ obtained according to (8) over $M=100$ replications.

We simulate model (9) with $r=3$. For $k=1,\dots,r$, we generate the time-varying entries of the $r\times 1$ vector $\bm{\mu}(\tfrac{t}{T})$ as

\[
\mu_{k}(\tfrac{t}{T})=\sqrt{6}\,\sin(\pi\omega_{k}\,\tfrac{t}{T}-\phi_{k}) \tag{33}
\]

and the $r\times r$ matrix $\mathbf{A}(\tfrac{t}{T})$ as

\begin{align}
A_{1,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{6}}{\log(k+3)}\,\sin\big(1.2+2\pi\tfrac{t}{T}\tfrac{\sqrt{7}}{\log(k+4)}\big), \notag\\
A_{2,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{5}}{\log(k+3)}\,\cos\big(1.2+2\pi\tfrac{t}{T}\tfrac{\sqrt{7}}{\log(k+2)}\big), \tag{34}\\
A_{3,k}(\tfrac{t}{T}) &= a_{2}\tfrac{\sqrt{4}}{\log(k+3)}\,\sin\big(1.2+\pi\tfrac{t}{T}\tfrac{\sqrt{7}}{\log(k+2)}\big), \notag
\end{align}

with $T=600$, $a_{1}=0.3$, $a_{2}=0.2$, $\omega_{k}=0.5+k$, and $\phi_{k}=0.2+k/3$. This specification is such that, for all $u\in(0,1)$,

\[
0<\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<0.9.
\]

Figure 2 exhibits the parameters estimated by WLS with $h=0.04$ and the Epanechnikov kernel $K(x)=\tfrac{3}{4}(1-x^{2})\mathbbm{1}_{\{|x|\leq 1\}}$. The local-constant estimates, obtained according to (15)-(16), are reported in Figure 2(a). The local-linear estimates, obtained according to (25)-(27), are reported in Figure 2(b).

Figure 1 shows that in the absence of the (time-varying) mean, that is, when $\bm{\mu}(u)\equiv\bm{0}$, the local-constant estimator performs very well. However, as Figure 2(a) illustrates, this is not the case in the presence of a (time-varying) mean.

It is clear from Figure 2 that although the local-constant estimates look satisfactory, the local-linear approach delivers superior results. As in the univariate case, the bias of the local-linear estimator only depends on the second derivative of the unknown regression function, and this does not come at a cost in the asymptotic variance. Moreover, the local-linear estimator does not suffer from boundary bias problems. The quality of the estimates in Figure 2(b) is remarkable.

(a) Local-constant WLS estimates obtained according to (15)-(16) over $M=100$ replications.
(b) Local-linear WLS estimates obtained according to (25)-(27) over $M=100$ replications.
Figure 2: First row: one realization of the $r=3$ time series simulated according to (9) with $\bm{\mu}(u)$ as in (33), $\mathbf{A}(u)$ as in (34), and $T=600$. Second row: estimated time-varying means. Third, fourth, and last row: estimated time-varying VAR coefficients. Red: simulated curves. Solid black: average of the estimates. Dashed black: 95% confidence bands.

5 Conclusions and future research

In this paper we consider the problem of estimating the time-varying mean vector and the time-varying AR matrix of a locally stationary VAR(1). We provide the closed-form definition of the local-linear solution to the weighted least-squares problem, in such a way that the mean vector and the AR matrix are estimated jointly.

The asymptotic properties of our estimator need to be studied. Also, it would be interesting to develop data-driven methods to select the smoothing parameters. Moreover, we might consider the problem of estimating the parameters of a locally stationary VAR($p$) of order $p>1$.

An important contribution of future studies is the extension of the WLS in (40) to the high-dimensional setting $r\gg T$. To this end, the WLS approach can be generalized in more than one direction. In fact, the closed form in (25)-(27) of Theorem 1 becomes particularly attractive when the dimension $r$ of the time series becomes large. Indeed, we can stick to the linear regression model in (22) with the same assumptions as in Section 3, and fit (22) so as to shrink the regression coefficients towards zero. More precisely, we can consider minimizing, with respect to $\mathbf{B}$, the following WLS-Ridge loss function

\[
\|\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\mathbf{B}(u)^{\top}\|^{2}_{\mathbf{K}(u)}+\lambda\,\|\mathbf{B}(u)\|^{2}, \tag{35}
\]

the effect of the penalty being to shrink the entries of $\mathbf{B}$ towards zero. The approach based on (35) can be generalized to the case of a non-spherical penalty. The loss function corresponding to this scenario is

\[
\|\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\mathbf{B}(u)^{\top}\|^{2}_{\mathbf{K}(u)}+\|\mathbf{B}(u)\|^{2}_{\mathbf{\Lambda}}, \tag{36}
\]

which comprises a WLS criterion, as in (40), and a generalized ridge penalty given by the matrix $\mathbf{\Lambda}$. In both (35) and (36), the $(T\times T)$ matrix $\mathbf{K}(u)$ is diagonal, the $t$-th element $K_{t}(u)=\tfrac{1}{h}K(\tfrac{u-t/T}{h})$, $1\leq t\leq T$, representing the weight of the $t$-th observation, with $\tfrac{1}{T}K_{t}(u)\in[0,1]$. The penalty in (36) is a quadratic form with penalty parameter $\mathbf{\Lambda}$, an $r$-dimensional positive-definite matrix. When $\mathbf{\Lambda}=\lambda\mathbf{I}_{r}$, we obtain the spherical penalty of the WLS-Ridge regression in (35). Generalizing the (positive) scalar $\lambda$ to the class of (positive-definite) matrices $\mathbf{\Lambda}$ allows for (i) different penalization per regression parameter, and (ii) joint shrinkage among the elements of $\mathbf{B}(u)$.
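The paper does not spell out the minimizer of (35); for a local-constant design (i.e., with $\mathbf{Z}_{0}$ in place of $\widetilde{\mathbf{Z}}_{0}$), the standard ridge-type closed form is $\mathbf{B}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}(u)\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{K}(u)\mathbf{Z}_{0}+\lambda\mathbf{I}]^{-1}$. The Python sketch below illustrates this formula under those assumptions; it is our illustration of the direction described here, not a result stated in the paper.

```python
import numpy as np

def wls_ridge(X1, Z, k, lam):
    """Ridge-penalized WLS fit: minimizes ||X1 - Z B'||_K^2 + lam ||B||^2 over B,
    whose solution is B = X1' K Z (Z' K Z + lam I)^{-1}, with K = diag(k).

    X1  : (T x r) responses, rows X_1, ..., X_T,
    Z   : (T x q) design, e.g. Z_0 = [1_T | X_0],
    k   : length-T vector of kernel weights K_h(t/T - u),
    lam : non-negative ridge parameter.
    """
    KZ = Z * k[:, None]
    return (X1.T @ KZ) @ np.linalg.inv(Z.T @ KZ + lam * np.eye(Z.shape[1]))
```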

References

  • Dahlhaus, R. (1996). Asymptotic statistical inference for nonstationary processes with evolutionary spectra. In P. M. Robinson and M. Rosenblatt (eds), Athens Conference on Applied Probability and Time Series Analysis, Vol. II, Springer-Verlag, New York.
  • Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. The Annals of Statistics 25, 1–37.
  • Dahlhaus, R. (2000). A likelihood approximation for locally stationary processes. The Annals of Statistics 28(6), 1762–1794.
  • Dahlhaus, R. (2012). Locally stationary processes. In Time Series Analysis: Methods and Applications, Vol. 30, Elsevier.
  • Lütkepohl, H. (1996). Handbook of Matrices. John Wiley & Sons.
  • Zhou, Z. and Wu, W. B. (2010). Simultaneous inference of linear models with time varying coefficients. Journal of the Royal Statistical Society, Series B 72(4), 513–531.

Appendix A Proof of Proposition 1

If we assume that the matrix-valued function

\[
\mathbf{B}(x)=[\bm{m}(x),\mathbf{A}(x)]
\]

is smooth in $x$, we can write the Taylor expansion of $\mathbf{B}(x)$ around $u$:

\[
\mathbf{B}(x)=\sum_{j=0}^{\infty}\tfrac{(x-u)^{j}}{j!}\mathbf{B}^{(j)}(u)=\mathbf{B}(u)+(x-u)\mathbf{B}^{(1)}(u)+\mathcal{O}([x-u]^{2}),
\]

where $\mathbf{B}^{(j)}(u):=\tfrac{d^{j}\mathbf{B}(x)}{dx^{j}}\big|_{x=u}$. Assuming (14) implies that

\[
\sup_{u\in(0,1)}\|\mathbf{B}^{(1)}(u)\|<\infty,
\]

and that

\[
\|\mathbf{B}(\tfrac{t}{T})-\mathbf{B}(u)\|\leq|\tfrac{t}{T}-u|\,\|\mathbf{B}^{(1)}(u)\|=\mathcal{O}(\tfrac{1}{T})\times\mathcal{O}(1)=\mathcal{O}(\tfrac{1}{T})
\]

uniformly in $u$, so that the loss in (13) can be approximated by

\[
\sum_{t=1}^{T}\|\bm{X}_{t}-\mathbf{B}(u)\bm{Z}_{t}\|^{2}\,K_{h}(\tfrac{t}{T}-u). \tag{37}
\]

Letting

\[
\underset{T\times(r+1)}{\mathbf{Z}_{0}}=[\bm{1}_{T},\mathbf{X}_{0}],
\]

where $\bm{1}_{T}$ is a $T\times 1$ vector of ones, and $\mathbf{X}_{0}$ has been defined in (5), the loss in (37) can be written in matrix form as

\[
\begin{split}
\mathcal{L}_{T}(u)&=\|\mathbf{X}_{1}-\mathbf{Z}_{0}\mathbf{B}(u)^{\top}\|^{2}_{\mathbf{K}_{T}(u)}\\
&={\rm tr}\{[\mathbf{X}_{1}-\mathbf{Z}_{0}\mathbf{B}(u)^{\top}]^{\top}\,\mathbf{K}_{T}(u)\,[\mathbf{X}_{1}-\mathbf{Z}_{0}\mathbf{B}(u)^{\top}]\},
\end{split} \tag{38}
\]

with $\mathbf{X}_{1}$ as in (6) and $\mathbf{K}_{T}(u)$ as in (7). The loss in (38) is equal to

\[
{\rm tr}\{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{1}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}+\mathbf{B}(u)\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}\},
\]

and thus minimizing $\mathcal{L}_{T}(u)$ with respect to $\mathbf{B}(u)$ is equivalent to minimizing

\[
{\rm tr}\{\mathbf{B}(u)\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}\}
\]

with respect to $\mathbf{B}(u)$. Differentiating with respect to $\mathbf{B}(u)$ and equating to zero, we obtain

\[
2\,\mathbf{B}(u)\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}=\mathbf{0},
\]

that is,

\[
\widehat{\mathbf{B}}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1},
\]

and thus (15) is proved. Notice that

\[
\underset{r\times(r+1)}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}}=\big[\underset{r\times 1}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\,\big|\,\underset{r\times r}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}\big],
\]

and that the matrix we need to invert can be partitioned as

\[
\underset{(r+1)\times(r+1)}{\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}}=\left[\begin{array}{c|c}\underset{1\times 1}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}&\underset{1\times r}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}\\ \hline \underset{r\times 1}{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}&\underset{r\times r}{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}\end{array}\right].
\]

Without proof we state the following lemma; see Lütkepohl (1996, result (1) in Section 3.5.3, pages 29-30).

Lemma 1.

Let $\mathbf{A}$ be $m\times m$, $\mathbf{B}$ be $m\times n$, $\mathbf{C}$ be $n\times m$, and $\mathbf{D}$ be $n\times n$, and consider the $(m+n)\times(m+n)$ partitioned matrix

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right].
\]

If $\mathbf{A}$ and $[\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}]$ are both nonsingular, then

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right]^{-1}=\left[\begin{array}{c|c}\mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{B}(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{C}\mathbf{A}^{-1}&-\mathbf{A}^{-1}\mathbf{B}(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\\ \hline -(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{C}\mathbf{A}^{-1}&(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\end{array}\right].
\]

We can now prove (16), together with (17), (18), (19) and (20). By Lemma 1,

\[
\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1}=\left[\begin{array}{c|c}(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{-1}+\tfrac{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{2}}&-\tfrac{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\,\widehat{\mathbf{G}}(u,0)^{-1}\\ \hline -\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\,(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{-1}&\widehat{\mathbf{G}}(u,0)^{-1}\end{array}\right],
\]

where $\widehat{\mathbf{G}}(u,0)$ has been defined in (19), and therefore

\[
\widehat{\mathbf{B}}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1}=[\widehat{\bm{m}}(u),\widehat{\mathbf{A}}(u)],
\]

with

\[
\begin{split}
\widehat{\mathbf{A}}(u) &= -\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\,\widehat{\mathbf{G}}(u,0)^{-1}+\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\\
&= \widehat{\mathbf{G}}(u,1)\,\widehat{\mathbf{G}}(u,0)^{-1},
\end{split}
\]

where $\widehat{\mathbf{G}}(u,1)$ has been defined in (20), and

\[
\begin{split}
\widehat{\bm{m}}(u) &= \tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}+\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{2}}-\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\\
&= \tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}+\big[\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}-\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\big]\,\widehat{\mathbf{G}}(u,0)^{-1}\,\tfrac{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\\
&= \widehat{\bm{\mu}}_{1}(u)-\widehat{\mathbf{G}}(u,1)\,\widehat{\mathbf{G}}(u,0)^{-1}\,\widehat{\bm{\mu}}_{0}(u)=\widehat{\bm{\mu}}_{1}(u)-\widehat{\mathbf{A}}(u)\widehat{\bm{\mu}}_{0}(u),
\end{split}
\]

where $\widehat{\bm{\mu}}_{0}(u)$ and $\widehat{\bm{\mu}}_{1}(u)$ are given by (17) and (18), respectively.

Appendix B Proof of Theorem 1

If we assume that the matrix-valued function

\[
\mathbf{B}(x)=[\bm{m}(x),\mathbf{A}(x)]
\]

is smooth in $x$, we can write the Taylor expansion of $\mathbf{B}(x)$ around $u$:

\[
\mathbf{B}(x)=\sum_{j=0}^{\infty}\tfrac{(x-u)^{j}}{j!}\mathbf{B}^{(j)}(u)=\mathbf{B}(u)+(x-u)\mathbf{B}^{(1)}(u)+\tfrac{1}{2}(x-u)^{2}\mathbf{B}^{(2)}(u)+\mathcal{O}([x-u]^{3}),
\]

where $\mathbf{B}^{(j)}(u):=\tfrac{d^{j}\mathbf{B}(x)}{dx^{j}}\big|_{x=u}$. Assuming (23) implies that

\[
\sup_{u\in(0,1)}\|\mathbf{B}^{(2)}(u)\|<\infty,
\]

and that

\[
\left\|\mathbf{B}(\tfrac{t}{T})-\big[\mathbf{B}(u)+(\tfrac{t}{T}-u)\mathbf{B}^{(1)}(u)\big]\right\|\leq\tfrac{1}{2}|\tfrac{t}{T}-u|^{2}\,\|\mathbf{B}^{(2)}(u)\|=\mathcal{O}(\tfrac{1}{T^{2}})\times\mathcal{O}(1)=\mathcal{O}(\tfrac{1}{T^{2}})
\]

uniformly in $u$. Therefore, adopting (21)-(22), the loss in (13) can be approximated by

\[
\sum_{t=1}^{T}\|\bm{X}_{t}-[\mathbf{B}(u)+(\tfrac{t}{T}-u)\mathbf{B}^{(1)}(u)]\bm{Z}_{t}\|^{2}\,K_{h}(\tfrac{t}{T}-u). \tag{39}
\]

Letting

\[
\underset{[2(r+1)]\times r}{\widetilde{\mathbf{B}}_{1}(u)^{\top}}=\begin{bmatrix}\underset{(r+1)\times r}{\mathbf{B}(u)^{\top}}\\[4pt] \underset{(r+1)\times r}{\mathbf{B}^{(1)}(u)^{\top}}\end{bmatrix}
\]

and

\[
\underset{T\times[2(r+1)]}{\widetilde{\mathbf{Z}}_{0}}=[\mathbf{Z}_{0}\,|\,\mathbf{\Delta}_{1}(u)\,\mathbf{Z}_{0}],\quad\mbox{with}\quad
\underset{T\times(r+1)}{\mathbf{Z}_{0}}=[\bm{1}_{T}\,|\,\mathbf{X}_{0}],
\]

where $\bm{1}_{T}$ is a $T\times 1$ vector of ones, and where $\mathbf{X}_{0}$ and $\mathbf{\Delta}_{1}(u)$ have been defined in (5) and (24), respectively, the loss in (39) can be written in matrix form as

\[
\begin{split}
\widetilde{\mathcal{L}}_{T}(u)&=\|\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}\|^{2}_{\mathbf{K}_{T}(u)}\\
&={\rm tr}\{[\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}]^{\top}\,\mathbf{K}_{T}(u)\,[\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}]\},
\end{split} \tag{40}
\]

with $\mathbf{X}_{1}$ as in (6) and $\mathbf{K}_{T}(u)$ as in (7). The loss in (40) is equal to

\[
{\rm tr}\{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{1}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}+\widetilde{\mathbf{B}}_{1}(u)\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}\},
\]

and thus minimizing $\widetilde{\mathcal{L}}_{T}(u)$ with respect to $\widetilde{\mathbf{B}}_{1}(u)$ is equivalent to minimizing

\[
{\rm tr}\{\widetilde{\mathbf{B}}_{1}(u)\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}\}
\]

with respect to $\widetilde{\mathbf{B}}_{1}(u)$. Differentiating with respect to $\widetilde{\mathbf{B}}_{1}(u)$ and equating to zero, we obtain

\[
2\,\widetilde{\mathbf{B}}_{1}(u)\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}=\mathbf{0},
\]

that is,

\[
\widetilde{\mathbf{B}}_{1}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\,\big(\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\big)^{-1}.
\]

Notice that

\[
\underset{r\times[2(r+1)]}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}}=\big[\underset{r\times(r+1)}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}}\,\big|\,\underset{r\times(r+1)}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}}\big],
\]

and that

\[
\underset{[2(r+1)]\times[2(r+1)]}{\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}}=\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right],
\]

where

\begin{align*}
\underset{(r+1)\times(r+1)}{\mathbf{A}} &= \mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0},\\
\underset{(r+1)\times(r+1)}{\mathbf{B}} &= \mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0},\\
\underset{(r+1)\times(r+1)}{\mathbf{C}} &= \mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0},\\
\underset{(r+1)\times(r+1)}{\mathbf{D}} &= \mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}.
\end{align*}

The local-linear estimator $\widetilde{\mathbf{B}}(u)$ of $\mathbf{B}(u)=[\bm{m}(u),\mathbf{A}(u)]$ is given by the first $r+1$ columns of the $r\times[2(r+1)]$ matrix

\[
\underset{r\times[2(r+1)]}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}}\,\,\underset{[2(r+1)]\times[2(r+1)]}{[\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}]^{-1}}.
\]

Hence, we need the first $r+1$ columns of the $[2(r+1)]\times[2(r+1)]$ matrix $[\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}]^{-1}$. Without proof we state the following lemma; see Lütkepohl (1996, result (2) in Section 3.5.3, page 30).

Lemma 2.

Let $\mathbf{A}$ be $m\times m$, $\mathbf{B}$ be $m\times n$, $\mathbf{C}$ be $n\times m$, and $\mathbf{D}$ be $n\times n$, and consider the $(m+n)\times(m+n)$ partitioned matrix

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right].
\]

If $\mathbf{D}$ and $[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]$ are both nonsingular, then

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right]^{-1}=\left[\begin{array}{c|c}[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}&-[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}\mathbf{B}\mathbf{D}^{-1}\\ \hline -\mathbf{D}^{-1}\mathbf{C}[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}&\mathbf{D}^{-1}+\mathbf{D}^{-1}\mathbf{C}[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}\mathbf{B}\mathbf{D}^{-1}\end{array}\right].
\]

Using Lemma 2 we can write

\[
\begin{split}
\widetilde{\mathbf{B}}(u)=\ &\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,
[\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}-\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,\mathbf{D}^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0}]^{-1}\\
&-\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,\mathbf{D}^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,
[\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}-\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,\mathbf{D}^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0}]^{-1},
\end{split}
\]

where $\mathbf{D}=\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}$. If we define the matrix $\mathbf{W}_{T}(u;\mathbf{X})$ according to (25), the local-linear estimator $\widetilde{\mathbf{B}}(u)$ can be written as

\[
\widetilde{\mathbf{B}}(u)=\mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}]^{-1},
\]

and (26) is proved. The estimator $\widetilde{\mathbf{B}}(u)$ has the same form as the estimator $\widehat{\mathbf{B}}(u)$ in (15) of Proposition 1, with $\mathbf{W}_{T}(u;\mathbf{X})$ instead of $\mathbf{K}_{T}(u)$. Hence the result in (27), with $\widetilde{\bm{\mu}}_{0}(u)$, $\widetilde{\bm{\mu}}_{1}(u)$, $\widetilde{\mathbf{G}}(u,0)$, and $\widetilde{\mathbf{G}}(u,1)$ as in (28), (29), (30), and (31), respectively, follows directly from Proposition 1.