
Joint Mean-Vector and VAR-Matrix Estimation
for Locally Stationary VAR(1) Processes

Giovanni Motta. Correspondence to: Giovanni Motta, 3143 TAMU, Department of Statistics, College Station, TX 77843, USA. E-mail: [email protected]
Department of Statistics, Texas A&M University
Abstract

During the last two decades, locally stationary processes have been widely studied in the time series literature. In this paper we consider the locally stationary vector auto-regression model of order one, or LS-VAR(1), and estimate its parameters by weighted least squares. The LS-VAR(1) we consider allows for a smoothly time-varying non-diagonal VAR matrix, as well as for a smoothly time-varying non-zero mean. The weighting scheme is based on kernel smoothers. The time-varying mean and the time-varying VAR matrix are estimated jointly, and the local-linear weighting matrix is defined in closed form. The quality of the estimated curves is illustrated through simulation results.

Keywords: Local Stationarity, Local Polynomials, Weighted Least Squares

Data Availability Statement: Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

1 Introduction

In this paper we consider $r$-dimensional multivariate data $\bm{X}_{T}(1),\dots,\bm{X}_{T}(T)$ generated by a locally stationary process, and our goal is to fit to the data a parametric model with time-varying coefficients. The notation $\bm{X}_{T}(t)$ emphasizes that the data form a triangular array where, at each $t$, the structure of the process depends on the sample size $T$.

To introduce the problem, consider the following univariate zero-mean autoregressive model of order $p$,

\[
\sum_{j=0}^{p}a_{j}(\tfrac{t}{T})\,X_{T}(t-j)=\sigma(\tfrac{t}{T})\,\varepsilon(t), \tag{1}
\]

or AR($p$), where the coefficients $a_{j}(u)$ are differentiable for $u\in(0,1)$ with bounded derivatives.

In terms of modeling, local stationarity means that if the parameters $a_{j}(u)$ are smooth in rescaled time $u$ and $T$ is large, then $a_{j}(\tfrac{s}{T})\approx a_{j}(\tfrac{t}{T})$ for all $s$ in a neighborhood of $t$. For estimation, rescaling time allows one to apply nonparametric methods to recover the unknown curves. In the frequency domain, the importance of rescaling time $t$ by the sample size $T$ and developing the analysis in rescaled time $u\in(0,1)$ relies upon the uniqueness of the transfer function. Dahlhaus (1996) introduced the spectral representation of a locally stationary process

\[
X_{T}(t)=\int_{-\pi}^{\pi}\exp(i\lambda t)\,A_{T}^{0}(t,\lambda)\,d\xi(\lambda),
\]

where $\xi(\lambda)$ is a stochastic process with orthogonal increments, and where the sequence $A_{T}^{0}(t,\lambda)$ converges (uniformly in $t$ and $\lambda$, as $T$ diverges) to another function $A(u,\lambda)$:

\[
\sup_{t,\lambda}\big|A_{T}^{0}(t,\lambda)-A(\tfrac{t}{T},\lambda)\big|=\mathcal{O}(\tfrac{1}{T}).
\]

If $A$ is smooth in $u$, then the time-varying spectral density $f(u,\lambda)=|A(u,\lambda)|^{2}$ is uniquely determined from the triangular array.

The dichotomy between $A_{T}^{0}$ and $A$ is particularly relevant in the case of AR processes. To see this, consider the simple case where $p=1$ and $\sigma(u)\equiv 1$. In the stationary case where the coefficient $a$ is time-invariant, the process in (1) can be written as

\[
X(t)=\sum_{k=0}^{\infty}\psi_{k}\,\varepsilon_{t-k}
\]

with $\psi_{k}=a^{k}$. By contrast, the locally stationary process $X_{T}(t)=a(\tfrac{t}{T})X_{T}(t-1)+\varepsilon(t)$ does not have a solution of the form

\[
X_{T}(t)=\sum_{k=0}^{\infty}\psi_{k}(u)\,\varepsilon_{t-k},
\]

but only of the form $X_{T}(t)=\sum_{k=0}^{\infty}\psi_{T,k}(t)\,\varepsilon_{t-k}$, with $\sup_{t}\sum_{k\in\mathbb{Z}}|\psi_{T,k}(t)-\psi_{k}(\tfrac{t}{T})|=\mathcal{O}(\tfrac{1}{T})$.

The seminal papers on local stationarity (Dahlhaus, 1996, 1997) provide details on the mathematics in the frequency domain. For an overview on multivariate locally stationary processes, see Dahlhaus (2012, Section 7.2).

Without loss of generality, assume that $\sigma(u)$ is known and time-invariant, that is, $\sigma(u)\equiv\sigma$. Suppose that the vector of interest $\bm{a}(u)=[a_{1}(u),\dots,a_{p}(u)]^{\top}$ depends on a finite-dimensional parameter. For example, if the coefficients are polynomials in time,

\[
a_{j}(u)=\sum_{k=1}^{K}\theta_{jk}\,f_{k}(u),\quad 1\leq j\leq p,
\qquad\mbox{with}\quad f_{k}(u)=u^{k-1},\quad 1\leq k\leq K, \tag{2}
\]

estimating the time-varying vector $\bm{a}(u)$ at $u\in(0,1)$ translates into estimating the time-invariant vector $\bm{\theta}=[\bm{\theta}_{1}^{\top},\dots,\bm{\theta}_{p}^{\top}]^{\top}$, where $\bm{\theta}_{j}=[\theta_{j1},\dots,\theta_{jK}]^{\top}$, with $1\leq j\leq p$.

The specification in (2) approximates the coefficient vector $\bm{a}(u)$ by global polynomials. Dahlhaus (1997, Section 4) obtained an explicit formula for the vector $\bm{\theta}$ as the solution of a linear system similar to the Yule-Walker equations. In the univariate setting, Dahlhaus (1996) estimates the time-varying parameters $\bm{a}(u)$ by kernel smoothers, that is, using a local-constant approximation. In the multivariate setting, Dahlhaus (2000, p. 1776) mentions the possibility of estimating the unknown parameters $\bm{\theta}(u)$ by minimizing a local-polynomial approximation of the local likelihood.

Zhou and Wu (2010) consider univariate linear models of the form $\bm{Y}(t)=\bm{X}^{\top}\bm{\beta}(t)+\bm{\varepsilon}(t)$, where both $\bm{X}$ and $\bm{\varepsilon}$ are assumed to be locally stationary, and estimate the time-varying vector of coefficients $\bm{\beta}(t)$ by means of local polynomials.

The contribution of this paper is threefold. In terms of modeling, we consider a multivariate version of model (1) and estimate the time-varying parameters in the time domain. Our main contribution is the closed-form definition of the local-linear estimator of the parameters. Finally, we emphasize that the estimation of the time-varying mean and the time-varying AR matrix is performed jointly.

In Section 2 we derive the localized Yule-Walker equations for a locally stationary zero-mean VAR process. In Section 3 we consider a locally stationary VAR with time-varying mean. First, we derive the local-constant weighted-least-squares estimator, see Proposition 1. Then in Theorem 1 we establish our main result, the closed-form definition of the local-linear weighted-least-squares estimator. In Section 4 we illustrate and compare the performance of the two weighted-least-squares estimators (local-constant and local-linear). Section 5 concludes and highlights the extension of our approach to the high-dimensional ($r>T$) setting.

Throughout the paper we use bold uppercase letters to denote matrices, and bold slanted letters to denote vectors. We denote by $\mathbf{I}_{m}$ the identity matrix of size $m$, by $\bm{1}_{m}$ the $m\times 1$ vector of ones, by ${\rm tr}\{\mathbf{A}\}$ the trace of $\mathbf{A}$, by $\mathbf{A}^{\top}$ the transpose of $\mathbf{A}$, by $\|\mathbf{A}\|$ the Frobenius norm $\|\mathbf{A}\|=[{\rm tr}\{\mathbf{A}^{\top}\mathbf{A}\}]^{1/2}$, and by $\mathbf{A}^{-1}$ the inverse of $\mathbf{A}$, that is, the square matrix such that $\mathbf{A}^{-1}\mathbf{A}=\mathbf{A}\mathbf{A}^{-1}=\mathbf{I}$. Finally, we use the acronyms VAR and WLS for vector auto-regression and weighted least squares, respectively.

2 Locally Stationary Vector Auto Regression

Consider the following $r$-dimensional locally stationary vector auto-regression of order 1,

\[
\underset{r\times 1}{\bm{X}_{t}}=\underset{r\times r}{\mathbf{A}(\tfrac{t}{T})}\,\underset{r\times 1}{\bm{X}_{t-1}}+\underset{r\times 1}{\bm{\varepsilon}_{t}},\qquad t=1,\dots,T, \tag{3}
\]

with $\bm{X}_{0}=\bm{0}$ and $\mathbb{E}[\bm{\varepsilon}_{t}\bm{X}_{s}^{\top}]=\mathbbm{1}_{\{s=t\}}\mathbf{\Gamma}_{\varepsilon}$. If the largest eigenvalue of $\mathbf{A}(u)$ lies inside the unit circle uniformly in $u$,

\[
\sup_{u\in(0,1)}\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<1,
\]

then $\bm{X}_{t}$ is locally stationary and causal. Our goal is to estimate $\mathbf{A}(u)$ at a fixed $u\in(0,1)$ using a localized version of the Yule-Walker equations. If we assume that the matrix-valued function $\mathbf{A}(x)$ is smooth in $x$, we can write the Taylor expansion of $\mathbf{A}(x)$ around $u$:

\[
\mathbf{A}(x)=\sum_{j=0}^{\infty}\tfrac{(x-u)^{j}}{j!}\mathbf{A}^{(j)}(u)=\mathbf{A}(u)+(x-u)\mathbf{A}^{(1)}(u)+\mathcal{O}([x-u]^{2}),
\]

where $\mathbf{A}^{(j)}(u):=\tfrac{d^{j}\mathbf{A}(x)}{dx^{j}}\big|_{x=u}$. We are interested in evaluating the function $\mathbf{A}(\tfrac{t}{T})$ at those values of $t$ for which $\tfrac{t}{T}$ lies in a neighborhood of $u$. For example, for a fixed $u_{0}\in(0,1)$ let $t_{0}=\lfloor u_{0}T\rfloor$, where $\lfloor x\rfloor$ is the largest integer not exceeding $x$. Then we have the following uniform bound:

\[
\sup_{u_{0}\in(0,1)}|\tfrac{t_{0}}{T}-u_{0}|<\tfrac{1}{T}.
\]

As a consequence, assuming that

\[
\sup_{u\in(0,1)}\|\mathbf{A}^{(1)}(u)\|<\infty,
\]

we obtain the following bound,

\[
\|\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)\|\leq|\tfrac{t}{T}-u|\,\|\mathbf{A}^{(1)}(u)\|=\mathcal{O}(\tfrac{1}{T})\times\mathcal{O}(1)=\mathcal{O}(\tfrac{1}{T}),
\]

uniformly in $u$. Let $K_{h}(x)=\tfrac{1}{h}K(\tfrac{x}{h})$, where $K(\cdot)$ is a kernel function such that $K(x/h)=0$ if $|x|>h$, and where $h=h_{T}$ is the smoothing parameter, which tends to zero as $T$ diverges, but more slowly than $1/T$:

\[
h=h_{T}\to 0\quad\mbox{and}\quad T\,h_{T}\to\infty\quad\mbox{as}\ T\to\infty.
\]
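For concreteness, the kernel weights $K_{h}(\tfrac{t}{T}-u)$, $1\leq t\leq T$, that appear throughout the paper can be computed in a few lines; the following Python sketch is only illustrative (the function name and the choice of kernels are ours, not part of the paper).

```python
import numpy as np

def kernel_weights(T, u, h, kernel="epanechnikov"):
    """Weights K_h(t/T - u) for t = 1, ..., T, with K_h(x) = K(x/h)/h."""
    x = (np.arange(1, T + 1) / T - u) / h
    if kernel == "epanechnikov":
        # compactly supported: K(x) = 0.75 (1 - x^2) on |x| <= 1
        k = 0.75 * (1.0 - x**2) * (np.abs(x) <= 1.0)
    elif kernel == "gaussian":
        # not compactly supported, but used in Section 4
        k = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    else:
        raise ValueError("unknown kernel")
    return k / h  # diagonal entries of K_T(u) in (7)
```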

If we right-multiply (3) by $\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)$ and sum over $t$, we obtain

\[
\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)
=\sum_{t=1}^{T}\mathbf{A}(\tfrac{t}{T})\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)+\sum_{t=1}^{T}\bm{\varepsilon}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u).
\]

Since $\mathbb{E}[\bm{\varepsilon}_{t}\bm{X}_{t-1}^{\top}]=\bm{0}$, and since

\[
\sum_{t=1}^{T}\mathbf{A}(\tfrac{t}{T})\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)-\mathbf{A}(u)\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)=\sum_{t=1}^{T}[\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)]\,\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u),
\]

\[
\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t}^{\top}K_{h}(\tfrac{t}{T}-u)-\mathbf{\Gamma}(0,u)\Big\|=\mathcal{O}_{p}(\tfrac{1}{\sqrt{T\,h}}),
\qquad
\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)-\mathbf{\Gamma}(1,u)\Big\|=\mathcal{O}_{p}(\tfrac{1}{\sqrt{T\,h}}),
\]

and

\[
\begin{split}
\Big\|\tfrac{1}{T}\sum_{t=1}^{T}[\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)]\,\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big\|
&\leq \sup_{t,T}\|\mathbf{A}(\tfrac{t}{T})-\mathbf{A}(u)\|\times\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big\|\\
&\leq |\tfrac{t}{T}-u|\times\|\mathbf{A}^{(1)}(u)\|\times\Big\|\tfrac{1}{T}\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big\|\\
&\leq \mathcal{O}(\tfrac{1}{T})\times\mathcal{O}(1)\times\big\|\mathbf{\Gamma}(0,u)+\mathcal{O}_{p}(\tfrac{1}{\sqrt{T\,h}})\big\|=\mathcal{O}_{p}(\tfrac{1}{T}),
\end{split}
\]

it makes sense to estimate $\mathbf{A}(u)$ as

\[
\widehat{\mathbf{A}}(u)=\Big[\sum_{t=1}^{T}\bm{X}_{t}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big]\Big[\sum_{t=1}^{T}\bm{X}_{t-1}\bm{X}_{t-1}^{\top}K_{h}(\tfrac{t}{T}-u)\Big]^{-1}. \tag{4}
\]

If we define

\begin{align}
\underset{T\times r}{\mathbf{X}_{0}} &= \{\underset{r\times 1}{\bm{X}_{0}},\underset{r\times 1}{\bm{X}_{1}},\dots,\underset{r\times 1}{\bm{X}_{T-1}}\}^{\top}, \tag{5}\\
\underset{T\times r}{\mathbf{X}_{1}} &= \{\underset{r\times 1}{\bm{X}_{1}},\underset{r\times 1}{\bm{X}_{2}},\dots,\underset{r\times 1}{\bm{X}_{T}}\}^{\top},\quad\mbox{and} \tag{6}\\
\underset{T\times T}{\mathbf{K}_{T}(u)} &= {\rm diag}\{K_{h}(\tfrac{1}{T}-u),K_{h}(\tfrac{2}{T}-u),\dots,K_{h}(1-u)\}, \tag{7}
\end{align}

the estimator in (4) can be written as

\[
\widehat{\mathbf{A}}(u)=[\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}]\,[\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}]^{-1}. \tag{8}
\]
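To make the localized Yule-Walker estimator concrete, the following Python sketch computes $\widehat{\mathbf{A}}(u)$ as in (4) and (8); it assumes the data are stored row-wise as $\bm{X}_{0},\dots,\bm{X}_{T}$ and uses an Epanechnikov kernel, both of which are illustrative choices rather than requirements of the paper.

```python
import numpy as np

def local_yule_walker(X, u, h):
    """Local-constant estimate A_hat(u) as in (8).

    X : array of shape (T+1, r) whose rows are X_0, X_1, ..., X_T.
    """
    T = X.shape[0] - 1
    X0, X1 = X[:-1], X[1:]                     # rows X_0..X_{T-1} and X_1..X_T
    x = (np.arange(1, T + 1) / T - u) / h
    k = np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0) / h   # K_h(t/T - u)
    S10 = (X1 * k[:, None]).T @ X0             # sum_t X_t X_{t-1}' K_h(t/T - u)
    S00 = (X0 * k[:, None]).T @ X0             # sum_t X_{t-1} X_{t-1}' K_h(t/T - u)
    return S10 @ np.linalg.inv(S00)            # A_hat(u), an r x r matrix
```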

3 Joint estimation of time-varying mean-vector and VAR-matrix by Weighted Least Squares

The estimator in (8) can be obtained as the minimizer of a WLS problem. Consider the LS-VAR(1) in (3), and let us now allow for a time-varying (non-zero) mean,

\[
\bm{X}_{t}-\bm{\mu}(\tfrac{t}{T})=\mathbf{A}(\tfrac{t}{T})\,[\bm{X}_{t-1}-\bm{\mu}(\tfrac{t-1}{T})]+\bm{\varepsilon}_{t},\qquad t=1,\dots,T, \tag{9}
\]

with $\bm{X}_{0}=\bm{\mu}(0)=\bm{0}$ and $\mathbb{E}[\bm{\varepsilon}_{t}\bm{X}_{s}^{\top}]=\mathbbm{1}_{\{s=t\}}\mathbf{\Gamma}_{\varepsilon}$. If the largest eigenvalue of $\mathbf{A}(u)$ lies inside the unit circle uniformly in $u$,

\[
\sup_{u\in(0,1)}\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<1, \tag{10}
\]

then $\bm{X}_{t}$ is locally stationary and causal. Our goal is to estimate $\mathbf{A}(u)$ at a fixed $u\in(0,1)$ by WLS. We can rewrite (9) as

\[
\bm{X}_{t}=\bm{m}(\tfrac{t}{T})+\mathbf{A}(\tfrac{t}{T})\bm{X}_{t-1}+\bm{\varepsilon}_{t}, \tag{11}
\]

where $\bm{m}(\tfrac{t}{T})=\bm{\mu}(\tfrac{t}{T})-\mathbf{A}(\tfrac{t}{T})\bm{\mu}(\tfrac{t-1}{T})$ for all $t=1,\dots,T$. If we define

\[
\underset{r\times(r+1)}{\mathbf{B}(u)}=[\bm{m}(u),\mathbf{A}(u)]\quad\mbox{and}\quad
\underset{(r+1)\times 1}{\bm{Z}_{t}}=\begin{pmatrix}1\\ \bm{X}_{t-1}\end{pmatrix}
\]

for all $t=1,\dots,T$, model (11) can be written as

\[
\bm{X}_{t}=\mathbf{B}(\tfrac{t}{T})\bm{Z}_{t}+\bm{\varepsilon}_{t}, \tag{12}
\]

and it makes sense to define $\widehat{\mathbf{B}}(u)$ as the minimizer of the weighted loss function

\[
\sum_{t=1}^{T}\|\bm{X}_{t}-\mathbf{B}(\tfrac{t}{T})\bm{Z}_{t}\|^{2}\,K_{h}(\tfrac{t}{T}-u), \tag{13}
\]

where the bandwidth sequence $h\equiv h_{T}$ tends to zero more slowly than $T^{-1}$: $h_{T}\to 0$ and $T\,h_{T}\to\infty$ as $T\to\infty$. The following proposition provides a closed form for the local-constant minimizer of (13). We consider model (12) and use the local-constant approximation of $\mathbf{B}(\tfrac{t}{T})$ in a neighborhood of $u$,

\[
\mathbf{B}(\tfrac{t}{T})\approx\mathbf{B}(u),
\]

to estimate $\mathbf{B}(u)$, our parameter of interest, in the approximate model

\[
\bm{X}_{t}\approx\mathbf{B}(u)\bm{Z}_{t}+\bm{\varepsilon}_{t}.
\]

Our first result generalizes the Yule-Walker solutions in (8) to allow for the time-varying mean vector $\bm{\mu}$.

Proposition 1.

Let $\bm{X}_{t}$ follow the locally stationary model in (9), with $\mathbf{A}(u)$ satisfying (10). Assume that the mean function $\bm{\mu}(u)$ and the VAR matrix $\mathbf{A}(u)$ in (9) are both differentiable uniformly in $u$, that is,

\[
\sup_{u\in(0,1)}\|\bm{\mu}^{(1)}(u)\|<\infty\qquad\mbox{and}\qquad\sup_{u\in(0,1)}\|\mathbf{A}^{(1)}(u)\|<\infty, \tag{14}
\]

where $\bm{\mu}^{(1)}(u):=\tfrac{d\bm{\mu}(x)}{dx}\big|_{x=u}$ and $\mathbf{A}^{(1)}(u):=\tfrac{d\mathbf{A}(x)}{dx}\big|_{x=u}$. Then the local-constant minimizer of (13) is

\begin{align}
\widehat{\mathbf{B}}(u) &= \mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1} \tag{15}\\
&= [\widehat{\bm{m}}(u),\widehat{\mathbf{A}}(u)], \tag{16}
\end{align}

with $\widehat{\mathbf{A}}(u)=\widehat{\mathbf{G}}(u,1)\,\widehat{\mathbf{G}}(u,0)^{-1}$ and $\widehat{\bm{m}}(u)=\widehat{\bm{\mu}}_{1}(u)-\widehat{\mathbf{A}}(u)\widehat{\bm{\mu}}_{0}(u)$, where

\begin{align}
\widehat{\bm{\mu}}_{0}(u) &= \frac{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= \sum_{t=1}^{T}\tfrac{K_{h}(\frac{t}{T}-u)}{\sum_{s=1}^{T}K_{h}(\frac{s}{T}-u)}\,\bm{X}_{t-1}, \tag{17}\\
\widehat{\bm{\mu}}_{1}(u) &= \frac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= \sum_{t=1}^{T}\tfrac{K_{h}(\frac{t}{T}-u)}{\sum_{s=1}^{T}K_{h}(\frac{s}{T}-u)}\,\bm{X}_{t}, \tag{18}\\
\widehat{\mathbf{G}}(u,0) &= \mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}-\tfrac{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= [\mathbf{X}_{0}-\bm{1}_{T}\widehat{\bm{\mu}}_{0}^{\top}(u)]^{\top}\mathbf{K}_{T}(u)\,[\mathbf{X}_{0}-\bm{1}_{T}\widehat{\bm{\mu}}_{0}^{\top}(u)] \notag\\
&= \sum_{t=1}^{T}[\bm{X}_{t-1}-\widehat{\bm{\mu}}_{0}(u)][\bm{X}_{t-1}-\widehat{\bm{\mu}}_{0}(u)]^{\top}K_{h}(\tfrac{t}{T}-u), \tag{19}\\
\widehat{\mathbf{G}}(u,1) &= \mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}-\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}
= [\mathbf{X}_{1}-\bm{1}_{T}\widehat{\bm{\mu}}_{1}^{\top}(u)]^{\top}\mathbf{K}_{T}(u)\,[\mathbf{X}_{0}-\bm{1}_{T}\widehat{\bm{\mu}}_{0}^{\top}(u)] \notag\\
&= \sum_{t=1}^{T}[\bm{X}_{t}-\widehat{\bm{\mu}}_{1}(u)][\bm{X}_{t-1}-\widehat{\bm{\mu}}_{0}(u)]^{\top}K_{h}(\tfrac{t}{T}-u). \tag{20}
\end{align}
Proof.

See Appendix A. ∎
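A direct implementation of the local-constant estimator in (15)-(16) only requires a weighted least-squares fit of $\mathbf{X}_{1}$ on $\mathbf{Z}_{0}=[\bm{1}_{T}\,|\,\mathbf{X}_{0}]$. The Python sketch below is illustrative (data layout, kernel, and function name are our own choices).

```python
import numpy as np

def local_constant_wls(X, u, h):
    """Local-constant WLS estimate B_hat(u) = [m_hat(u), A_hat(u)] as in (15)-(16).

    X : array of shape (T+1, r) whose rows are X_0, ..., X_T.
    """
    T = X.shape[0] - 1
    X0, X1 = X[:-1], X[1:]
    Z0 = np.hstack([np.ones((T, 1)), X0])        # Z_0 = [1_T | X_0]
    x = (np.arange(1, T + 1) / T - u) / h
    k = np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0) / h   # diag of K_T(u)
    KZ = Z0 * k[:, None]                          # K_T(u) Z_0
    B_hat = (X1.T @ KZ) @ np.linalg.inv(Z0.T @ KZ)
    m_hat, A_hat = B_hat[:, 0], B_hat[:, 1:]      # as in (16)
    return m_hat, A_hat
```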

The following theorem provides a closed form for the local-linear minimizer of (13). We consider model (12) and use the local-linear approximation of $\mathbf{B}(\tfrac{t}{T})$ in a neighborhood of $u$,

\[
\mathbf{B}(\tfrac{t}{T})\approx\mathbf{B}(u)+(\tfrac{t}{T}-u)\,\mathbf{B}^{(1)}(u), \tag{21}
\]

to estimate $\mathbf{B}(u)$, our parameter of interest, in the approximate model

\[
\bm{X}_{t}\approx\mathbf{B}(u)\bm{Z}_{t}+(\tfrac{t}{T}-u)\,\mathbf{B}^{(1)}(u)\bm{Z}_{t}+\bm{\varepsilon}_{t}. \tag{22}
\]
Theorem 1.

Let $\bm{X}_{t}$ follow the locally stationary model in (9), with $\mathbf{A}(u)$ satisfying (10). Assume that the mean function $\bm{\mu}(u)$ and the VAR matrix $\mathbf{A}(u)$ in (9) are both twice differentiable, uniformly in $u$, that is,

\[
\sup_{u\in(0,1)}\|\bm{\mu}^{(2)}(u)\|<\infty\qquad\mbox{and}\qquad\sup_{u\in(0,1)}\|\mathbf{A}^{(2)}(u)\|<\infty, \tag{23}
\]

where $\mathbf{A}^{(2)}(u):=\tfrac{d^{2}\mathbf{A}(x)}{dx^{2}}\big|_{x=u}$. Let

\[
\underset{T\times(r+1)}{\mathbf{Z}_{0}}=[\bm{1}_{T}\,|\,\mathbf{X}_{0}],
\]

where $\mathbf{X}_{0}$ has been defined in (5), and define the diagonal matrix

\[
\underset{T\times T}{\mathbf{\Delta}_{1}(u)}={\rm diag}\{\tfrac{t}{T}-u,\ 1\leq t\leq T\}, \tag{24}
\]

and the weighting matrix

\[
\mathbf{W}_{T}(u;\mathbf{X})=\mathbf{K}_{T}(u)-\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}]^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u), \tag{25}
\]

where $\mathbf{K}_{T}(u)$ has been defined in (7). Then the local-linear minimizer of (13) is

\begin{align}
\widetilde{\mathbf{B}}(u) &= \mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}]^{-1} \tag{26}\\
&= [\widetilde{\bm{m}}(u),\widetilde{\mathbf{A}}(u)], \tag{27}
\end{align}

with $\widetilde{\mathbf{A}}(u)=\widetilde{\mathbf{G}}(u,1)\,\widetilde{\mathbf{G}}(u,0)^{-1}$ and $\widetilde{\bm{m}}(u)=\widetilde{\bm{\mu}}_{1}(u)-\widetilde{\mathbf{A}}(u)\widetilde{\bm{\mu}}_{0}(u)$, where

\begin{align}
\widetilde{\bm{\mu}}_{0}(u) &= \frac{\mathbf{X}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{28}\\
\widetilde{\bm{\mu}}_{1}(u) &= \frac{\mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{29}\\
\widetilde{\mathbf{G}}(u,0) &= \mathbf{X}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}-\tfrac{\mathbf{X}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{30}\\
\widetilde{\mathbf{G}}(u,1) &= \mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}-\tfrac{\mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\bm{1}_{T}}, \tag{31}
\end{align}

$\mathbf{X}_{1}$ being defined in (6).

Proof.

See Appendix B. ∎
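Since the local-linear estimator in (26) has the same algebraic form as the local-constant one, with $\mathbf{W}_{T}(u;\mathbf{X})$ replacing $\mathbf{K}_{T}(u)$, it can be coded by first building the weighting matrix in (25). The following Python sketch is again illustrative (kernel choice, data layout, and names are ours); for large $T$ the explicit $T\times T$ matrix could be avoided, but it keeps the sketch close to the formulas.

```python
import numpy as np

def local_linear_wls(X, u, h):
    """Local-linear WLS estimate B_tilde(u) = [m_tilde(u), A_tilde(u)], as in (24)-(27).

    X : array of shape (T+1, r) whose rows are X_0, ..., X_T.
    """
    T = X.shape[0] - 1
    X0, X1 = X[:-1], X[1:]
    Z0 = np.hstack([np.ones((T, 1)), X0])               # Z_0 = [1_T | X_0]
    grid = np.arange(1, T + 1) / T
    x = (grid - u) / h
    k = np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0) / h   # diag of K_T(u)
    d = grid - u                                        # diag of Delta_1(u)
    DZ = Z0 * d[:, None]                                # Delta_1(u) Z_0
    KDZ = DZ * k[:, None]                               # K_T(u) Delta_1(u) Z_0
    # W_T(u;X) = K - K D Z_0 [Z_0' D K D Z_0]^{-1} Z_0' D K, equation (25)
    W = np.diag(k) - KDZ @ np.linalg.inv(DZ.T @ KDZ) @ KDZ.T
    B_tilde = (X1.T @ W @ Z0) @ np.linalg.inv(Z0.T @ W @ Z0)   # equation (26)
    return B_tilde[:, 0], B_tilde[:, 1:]                # m_tilde(u), A_tilde(u)
```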

4 Simulation Results

We first consider the zero-mean locally stationary VAR(1) model in (3), and estimate the VAR matrix by means of the localized Yule-Walker equations. Then we consider the locally stationary VAR(1) model in (9) with time-varying mean, and compare the WLS estimates obtained with local-constant and local-linear weights, respectively.

We simulate model (3) with $r=6$. For $j=1,4$ and $k=1,\dots,r$, we generate the time-varying entries of the $r\times r$ matrix $\mathbf{A}(\tfrac{t}{T})$ as

\begin{align}
A_{j,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{j+3}}{\log(k+3)}\,\sin\big(4\pi\tfrac{t}{T}\tfrac{\sqrt{j+4}}{\log(k+4)}\big), \notag\\
A_{j+1,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{j+2}}{\log(k+3)}\,\cos\big(2\pi\tfrac{t}{T}\tfrac{\sqrt{j+4}}{\log(k+2)}\big), \tag{32}\\
A_{j+2,k}(\tfrac{t}{T}) &= a_{2}\tfrac{\sqrt{j+1}}{\log(k+3)}\,\sin\big(\pi\tfrac{t}{T}\tfrac{\sqrt{j+4}}{\log(k+2)}\big), \notag
\end{align}

with $T=800$, $a_{1}=0.2$ and $a_{2}=0.1$. This specification is such that, for all $u\in(0,1)$,

\[
0.1<\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<0.9.
\]

We estimate the parameters according to (8), with $h=0.03$ and the Gaussian kernel $K(x)=\tfrac{1}{\sqrt{2\pi}}\exp(-0.5\,x^{2})$. The results are reported in Figure 1.
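For completeness, one realization of model (3) under the specification in (32) can be generated as in the Python sketch below; since the paper does not specify the distribution of $\bm{\varepsilon}_{t}$, standard normal innovations are used here purely as an illustrative assumption.

```python
import numpy as np

def A_of(u, r=6, a1=0.2, a2=0.1):
    """Time-varying VAR matrix A(u) following (32), for j = 1, 4."""
    A = np.zeros((r, r))
    for j in (1, 4):
        for k in range(1, r + 1):
            A[j - 1, k - 1] = a1 * np.sqrt(j + 3) / np.log(k + 3) * \
                np.sin(4 * np.pi * u * np.sqrt(j + 4) / np.log(k + 4))
            A[j, k - 1] = a1 * np.sqrt(j + 2) / np.log(k + 3) * \
                np.cos(2 * np.pi * u * np.sqrt(j + 4) / np.log(k + 2))
            A[j + 1, k - 1] = a2 * np.sqrt(j + 1) / np.log(k + 3) * \
                np.sin(np.pi * u * np.sqrt(j + 4) / np.log(k + 2))
    return A

def simulate_lsvar1(T=800, r=6, seed=0):
    """One realization X_0, ..., X_T of model (3), with X_0 = 0."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T + 1, r))
    for t in range(1, T + 1):
        X[t] = A_of(t / T, r) @ X[t - 1] + rng.standard_normal(r)
    return X
```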

(a) One realization of the $r=6$ time series simulated according to (3) with $\mathbf{A}(u)$ as in (32) and $T=800$.
(b) Red lines: simulated parameters according to (32). Solid black lines: average of the estimates over $M$ replications. Dashed black lines: 90% confidence bands (empirical quantiles) of the $M$ estimates.
Figure 1: Left: simulated time series according to (3), with $r=6$ and $\mathbf{A}(u)$ as in (32). Right: estimates $\widehat{\mathbf{A}}(u)$ obtained according to (8) over $M=100$ replications.

We simulate model (9) with $r=3$. For $k=1,\dots,r$, we generate the time-varying entries of the $r\times 1$ vector $\bm{\mu}(\tfrac{t}{T})$ as

\[
\mu_{k}(\tfrac{t}{T})=\sqrt{6}\,\sin(\pi\omega_{k}\,\tfrac{t}{T}-\phi_{k}) \tag{33}
\]

and the $r\times r$ matrix $\mathbf{A}(\tfrac{t}{T})$ as

\begin{align}
A_{1,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{6}}{\log(k+3)}\,\sin\big(1.2+2\pi\tfrac{t}{T}\tfrac{\sqrt{7}}{\log(k+4)}\big), \notag\\
A_{2,k}(\tfrac{t}{T}) &= a_{1}\tfrac{\sqrt{5}}{\log(k+3)}\,\cos\big(1.2+2\pi\tfrac{t}{T}\tfrac{\sqrt{7}}{\log(k+2)}\big), \tag{34}\\
A_{3,k}(\tfrac{t}{T}) &= a_{2}\tfrac{\sqrt{4}}{\log(k+3)}\,\sin\big(1.2+\pi\tfrac{t}{T}\tfrac{\sqrt{7}}{\log(k+2)}\big), \notag
\end{align}

with $T=600$, $a_{1}=0.3$, $a_{2}=0.2$, $\omega_{k}=0.5+k$, and $\phi_{k}=0.2+k/3$. This specification is such that, for all $u\in(0,1)$,

\[
0<\left|{\rm v}_{1}[\mathbf{A}(u)]\right|<0.9.
\]

Figure 2 exhibits the parameters estimated by WLS with $h=0.04$ and the Epanechnikov kernel $K(x)=\tfrac{3}{4}(1-x^{2})\mathbbm{1}_{\{|x|\leq 1\}}$. The local-constant estimates, obtained according to (15)-(16), are reported in Figure 2(a). The local-linear estimates, obtained according to (25)-(27), are reported in Figure 2(b).

Figure 1 shows that in the absence of the (time-varying) mean, that is, when $\bm{\mu}(u)\equiv\bm{0}$, the local-constant estimator performs very well. However, as Figure 2(a) illustrates, this is not the case in the presence of a (time-varying) mean.

It is clear from Figure 2 that although the local-constant estimates look satisfactory, the local-linear approach delivers superior results. As in the univariate case, the bias of the local-linear estimator only depends on the second derivative of the unknown regression function, and this does not come at a cost in the asymptotic variance. Moreover, the local-linear estimator does not suffer from boundary bias problems. The quality of the estimates in Figure 2(b) is remarkable.

(a) Local-constant WLS estimates obtained according to (15)-(16) over $M=100$ replications.
(b) Local-linear WLS estimates obtained according to (25)-(27) over $M=100$ replications.
Figure 2: First row: one realization of the $r=3$ time series simulated according to (9) with $\bm{\mu}(u)$ as in (33), $\mathbf{A}(u)$ as in (34), and $T=600$. Second row: estimated time-varying means. Third, fourth, and last row: estimated time-varying VAR coefficients. Red: simulated curves. Solid black: average of the estimates. Dashed black: 95% confidence bands.

5 Conclusions and future research

In this paper we consider the problem of estimating the time-varying mean vector and the time-varying AR matrix of a locally stationary VAR(1). We provide the closed-form definition of the local-linear solution to the weighted least-squares problem, in such a way that the mean vector and the AR matrix are estimated jointly.

The asymptotic properties of our estimator need to be studied. Also, it would be interesting to develop data-driven methods to select the smoothing parameters. Moreover, we might consider the problem of estimating the parameters of a locally stationary VAR($p$) of order $p>1$.

An important contribution of future studies is the extension of the WLS in (40) to the high-dimensional setting $r\gg T$. To this end, the WLS approach can be generalized in more than one direction. In fact, the closed form in (25)-(27) of Theorem 1 becomes particularly attractive when the dimension $r$ of the time series becomes large. Indeed, we can stick to the linear regression model in (22) with the same assumptions as in Section 3, and fit (22) so as to shrink the regression coefficients towards zero. More precisely, we can consider minimizing, with respect to $\mathbf{B}$, the following WLS-Ridge loss function

\[
\|\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\mathbf{B}(u)^{\top}\|^{2}_{\mathbf{K}(u)}+\lambda\,\|\mathbf{B}(u)\|^{2}, \tag{35}
\]

the effect of the penalty being to shrink the entries of $\mathbf{B}$ towards zero. The approach based on (35) can be generalized to the case of a non-spherical penalty. The loss function corresponding to this scenario is

\[
\|\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\mathbf{B}(u)^{\top}\|^{2}_{\mathbf{K}(u)}+\|\mathbf{B}(u)\|^{2}_{\mathbf{\Lambda}}, \tag{36}
\]

which comprises a WLS criterion, as in (40), and a generalized ridge penalty given by the matrix $\mathbf{\Lambda}$. In both (35) and (36), the $(T\times T)$ matrix $\mathbf{K}(u)$ is diagonal, the $t$-th element $K_{t}(u)=\tfrac{1}{h}K(\tfrac{u-t/T}{h})$, $1\leq t\leq T$, representing the weight of the $t$-th observation, with $\tfrac{1}{T}K_{t}(u)\in[0,1]$. The penalty in (36) is a quadratic form with penalty parameter $\mathbf{\Lambda}$, an $r$-dimensional positive-definite matrix. When $\mathbf{\Lambda}=\lambda\mathbf{I}_{r}$, we obtain the spherical penalty of the WLS-Ridge regression in (35). Generalizing the (positive) scalar $\lambda$ to the class of (positive-definite) matrices $\mathbf{\Lambda}$ allows for (i) different penalization per regression parameter, and (ii) joint shrinkage among the elements of $\mathbf{B}(u)$.
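The paper does not spell out the minimizer of (35); for a local-constant design (i.e., with $\mathbf{Z}_{0}$ in place of $\widetilde{\mathbf{Z}}_{0}$), the standard ridge-type closed form is $\mathbf{B}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}(u)\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{K}(u)\mathbf{Z}_{0}+\lambda\mathbf{I}]^{-1}$. The Python sketch below illustrates this formula under those assumptions; it is our illustration of the direction described here, not a result stated in the paper.

```python
import numpy as np

def wls_ridge(X1, Z, k, lam):
    """Ridge-penalized WLS fit: minimizes ||X1 - Z B'||_K^2 + lam ||B||^2 over B,
    whose solution is B = X1' K Z (Z' K Z + lam I)^{-1}, with K = diag(k).

    X1  : (T x r) responses, rows X_1, ..., X_T,
    Z   : (T x q) design, e.g. Z_0 = [1_T | X_0],
    k   : length-T vector of kernel weights K_h(t/T - u),
    lam : non-negative ridge parameter.
    """
    KZ = Z * k[:, None]
    return (X1.T @ KZ) @ np.linalg.inv(Z.T @ KZ + lam * np.eye(Z.shape[1]))
```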

References

  • Dahlhaus, R. (1996). Asymptotic statistical inference for nonstationary processes with evolutionary spectra. In P. M. Robinson and M. Rosenblatt (eds), Athens Conference on Applied Probability and Time Series Analysis, Vol. II, Springer-Verlag, New York.
  • Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. The Annals of Statistics 25, 1–37.
  • Dahlhaus, R. (2000). A likelihood approximation for locally stationary processes. The Annals of Statistics 28(6), 1762–1794.
  • Dahlhaus, R. (2012). Locally stationary processes. In Time Series Analysis: Methods and Applications, Vol. 30, Elsevier.
  • Lütkepohl, H. (1996). Handbook of Matrices. John Wiley & Sons.
  • Zhou, Z. and Wu, W. B. (2010). Simultaneous inference of linear models with time varying coefficients. Journal of the Royal Statistical Society, Series B 72(4), 513–531.

Appendix A Proof of Proposition 1

If we assume that the matrix-valued function

\[
\mathbf{B}(x)=[\bm{m}(x),\mathbf{A}(x)]
\]

is smooth in $x$, we can write the Taylor expansion of $\mathbf{B}(x)$ around $u$:

\[
\mathbf{B}(x)=\sum_{j=0}^{\infty}\tfrac{(x-u)^{j}}{j!}\mathbf{B}^{(j)}(u)=\mathbf{B}(u)+(x-u)\mathbf{B}^{(1)}(u)+\mathcal{O}([x-u]^{2}),
\]

where $\mathbf{B}^{(j)}(u):=\tfrac{d^{j}\mathbf{B}(x)}{dx^{j}}\big|_{x=u}$. Assuming (14) implies that

\[
\sup_{u\in(0,1)}\|\mathbf{B}^{(1)}(u)\|<\infty,
\]

and that

\[
\|\mathbf{B}(\tfrac{t}{T})-\mathbf{B}(u)\|\leq|\tfrac{t}{T}-u|\,\|\mathbf{B}^{(1)}(u)\|=\mathcal{O}(\tfrac{1}{T})\times\mathcal{O}(1)=\mathcal{O}(\tfrac{1}{T})
\]

uniformly in $u$, so that the loss in (13) can be approximated by

\[
\sum_{t=1}^{T}\|\bm{X}_{t}-\mathbf{B}(u)\bm{Z}_{t}\|^{2}\,K_{h}(\tfrac{t}{T}-u). \tag{37}
\]

Letting

\[
\underset{T\times(r+1)}{\mathbf{Z}_{0}}=[\bm{1}_{T},\mathbf{X}_{0}],
\]

where $\bm{1}_{T}$ is a $T\times 1$ vector of ones, and $\mathbf{X}_{0}$ has been defined in (5), the loss in (37) can be written in matrix form as

\[
\begin{split}
\mathcal{L}_{T}(u)&=\|\mathbf{X}_{1}-\mathbf{Z}_{0}\mathbf{B}(u)^{\top}\|^{2}_{\mathbf{K}_{T}(u)}\\
&={\rm tr}\{[\mathbf{X}_{1}-\mathbf{Z}_{0}\mathbf{B}(u)^{\top}]^{\top}\,\mathbf{K}_{T}(u)\,[\mathbf{X}_{1}-\mathbf{Z}_{0}\mathbf{B}(u)^{\top}]\},
\end{split} \tag{38}
\]

with $\mathbf{X}_{1}$ as in (6) and $\mathbf{K}_{T}(u)$ as in (7). The loss in (38) is equal to

\[
{\rm tr}\{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{1}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}+\mathbf{B}(u)\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}\},
\]

and thus minimizing $\mathcal{L}_{T}(u)$ with respect to $\mathbf{B}(u)$ is equivalent to minimizing

\[
{\rm tr}\{\mathbf{B}(u)\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\mathbf{B}(u)^{\top}\}
\]

with respect to $\mathbf{B}(u)$. Differentiating with respect to $\mathbf{B}(u)$ and equating to zero, we obtain

\[
2\,\mathbf{B}(u)\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}=\mathbf{0},
\]

that is,

\[
\widehat{\mathbf{B}}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1},
\]

and thus (15) is proved. Notice that

\[
\underset{r\times(r+1)}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}}=\big[\underset{r\times 1}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\,\big|\,\underset{r\times r}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}\big],
\]

and that the matrix we need to invert can be partitioned as

\[
\underset{(r+1)\times(r+1)}{\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}}=\left[\begin{array}{c|c}\underset{1\times 1}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}&\underset{1\times r}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}\\ \hline \underset{r\times 1}{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}&\underset{r\times r}{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}\end{array}\right].
\]

Without proof we state the following lemma; see Lütkepohl (1996, result (1) in Section 3.5.3, pages 29-30).

Lemma 1.

Let $\mathbf{A}$ be $m\times m$, $\mathbf{B}$ be $m\times n$, $\mathbf{C}$ be $n\times m$, and $\mathbf{D}$ be $n\times n$, and consider the $(m+n)\times(m+n)$ partitioned matrix

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right].
\]

If $\mathbf{A}$ and $[\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}]$ are both nonsingular, then

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right]^{-1}=\left[\begin{array}{c|c}\mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{B}(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{C}\mathbf{A}^{-1}&-\mathbf{A}^{-1}\mathbf{B}(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\\ \hline -(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{C}\mathbf{A}^{-1}&(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\end{array}\right].
\]

We can now prove (16), together with (17), (18), (19) and (20). By Lemma 1,

\[
\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1}=\left[\begin{array}{c|c}(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{-1}+\tfrac{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{2}}&-\tfrac{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\,\widehat{\mathbf{G}}(u,0)^{-1}\\ \hline -\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\,(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{-1}&\widehat{\mathbf{G}}(u,0)^{-1}\end{array}\right],
\]

where $\widehat{\mathbf{G}}(u,0)$ has been defined in (19), and therefore

\[
\widehat{\mathbf{B}}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,\big(\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\big)^{-1}=[\widehat{\bm{m}}(u),\widehat{\mathbf{A}}(u)],
\]

with

\[
\begin{split}
\widehat{\mathbf{A}}(u) &= -\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\,\widehat{\mathbf{G}}(u,0)^{-1}+\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\\
&= \widehat{\mathbf{G}}(u,1)\,\widehat{\mathbf{G}}(u,0)^{-1},
\end{split}
\]

where $\widehat{\mathbf{G}}(u,1)$ has been defined in (20), and

\[
\begin{split}
\widehat{\bm{m}}(u) &= \tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}+\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{(\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T})^{2}}-\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\,\widehat{\mathbf{G}}(u,0)^{-1}\,\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\\
&= \tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}+\big[\tfrac{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}-\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{0}\big]\,\widehat{\mathbf{G}}(u,0)^{-1}\,\tfrac{\mathbf{X}_{0}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}{\bm{1}_{T}^{\top}\mathbf{K}_{T}(u)\bm{1}_{T}}\\
&= \widehat{\bm{\mu}}_{1}(u)-\widehat{\mathbf{G}}(u,1)\,\widehat{\mathbf{G}}(u,0)^{-1}\,\widehat{\bm{\mu}}_{0}(u)=\widehat{\bm{\mu}}_{1}(u)-\widehat{\mathbf{A}}(u)\widehat{\bm{\mu}}_{0}(u),
\end{split}
\]

where $\widehat{\bm{\mu}}_{0}(u)$ and $\widehat{\bm{\mu}}_{1}(u)$ are given by (17) and (18), respectively.

Appendix B Proof of Theorem 1

If we assume that the matrix-valued function

\[
\mathbf{B}(x)=[\bm{m}(x),\mathbf{A}(x)]
\]

is smooth in $x$, we can write the Taylor expansion of $\mathbf{B}(x)$ around $u$:

\[
\mathbf{B}(x)=\sum_{j=0}^{\infty}\tfrac{(x-u)^{j}}{j!}\mathbf{B}^{(j)}(u)=\mathbf{B}(u)+(x-u)\mathbf{B}^{(1)}(u)+\tfrac{1}{2}(x-u)^{2}\mathbf{B}^{(2)}(u)+\mathcal{O}([x-u]^{3}),
\]

where $\mathbf{B}^{(j)}(u):=\tfrac{d^{j}\mathbf{B}(x)}{dx^{j}}\big|_{x=u}$. Assuming (23) implies that

\[
\sup_{u\in(0,1)}\|\mathbf{B}^{(2)}(u)\|<\infty,
\]

and that

\[
\left\|\mathbf{B}(\tfrac{t}{T})-\big[\mathbf{B}(u)+(\tfrac{t}{T}-u)\mathbf{B}^{(1)}(u)\big]\right\|\leq\tfrac{1}{2}|\tfrac{t}{T}-u|^{2}\,\|\mathbf{B}^{(2)}(u)\|=\mathcal{O}(\tfrac{1}{T^{2}})\times\mathcal{O}(1)=\mathcal{O}(\tfrac{1}{T^{2}})
\]

uniformly in $u$. Therefore, adopting (21)-(22), the loss in (13) can be approximated by

\[
\sum_{t=1}^{T}\|\bm{X}_{t}-[\mathbf{B}(u)+(\tfrac{t}{T}-u)\mathbf{B}^{(1)}(u)]\bm{Z}_{t}\|^{2}\,K_{h}(\tfrac{t}{T}-u). \tag{39}
\]

Letting

\[
\underset{[2(r+1)]\times r}{\widetilde{\mathbf{B}}_{1}(u)^{\top}}=\begin{bmatrix}\underset{(r+1)\times r}{\mathbf{B}(u)^{\top}}\\[4pt] \underset{(r+1)\times r}{\mathbf{B}^{(1)}(u)^{\top}}\end{bmatrix}
\]

and

\[
\underset{T\times[2(r+1)]}{\widetilde{\mathbf{Z}}_{0}}=[\mathbf{Z}_{0}\,|\,\mathbf{\Delta}_{1}(u)\,\mathbf{Z}_{0}],\quad\mbox{with}\quad
\underset{T\times(r+1)}{\mathbf{Z}_{0}}=[\bm{1}_{T}\,|\,\mathbf{X}_{0}],
\]

where $\bm{1}_{T}$ is a $T\times 1$ vector of ones, and where $\mathbf{X}_{0}$ and $\mathbf{\Delta}_{1}(u)$ have been defined in (5) and (24), respectively, the loss in (39) can be written in matrix form as

\[
\begin{split}
\widetilde{\mathcal{L}}_{T}(u)&=\|\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}\|^{2}_{\mathbf{K}_{T}(u)}\\
&={\rm tr}\{[\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}]^{\top}\,\mathbf{K}_{T}(u)\,[\mathbf{X}_{1}-\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}]\},
\end{split} \tag{40}
\]

with $\mathbf{X}_{1}$ as in (6) and $\mathbf{K}_{T}(u)$ as in (7). The loss in (40) is equal to

\[
{\rm tr}\{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{X}_{1}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}+\widetilde{\mathbf{B}}_{1}(u)\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}\},
\]

and thus minimizing $\widetilde{\mathcal{L}}_{T}(u)$ with respect to $\widetilde{\mathbf{B}}_{1}(u)$ is equivalent to minimizing

\[
{\rm tr}\{\widetilde{\mathbf{B}}_{1}(u)\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\widetilde{\mathbf{B}}_{1}(u)^{\top}\}
\]

with respect to $\widetilde{\mathbf{B}}_{1}(u)$. Differentiating with respect to $\widetilde{\mathbf{B}}_{1}(u)$ and equating to zero, we obtain

\[
2\,\widetilde{\mathbf{B}}_{1}(u)\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}-2\,\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}=\mathbf{0},
\]

that is,

\[
\widetilde{\mathbf{B}}_{1}(u)=\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\,\big(\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}\big)^{-1}.
\]

Notice that

\[
\underset{r\times[2(r+1)]}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}}=\big[\underset{r\times(r+1)}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}}\,\big|\,\underset{r\times(r+1)}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}}\big],
\]

and that

\[
\underset{[2(r+1)]\times[2(r+1)]}{\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}}=\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right],
\]

where

\begin{align*}
\underset{(r+1)\times(r+1)}{\mathbf{A}} &= \mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0},\\
\underset{(r+1)\times(r+1)}{\mathbf{B}} &= \mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0},\\
\underset{(r+1)\times(r+1)}{\mathbf{C}} &= \mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0},\\
\underset{(r+1)\times(r+1)}{\mathbf{D}} &= \mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}.
\end{align*}

The local-linear estimator $\widetilde{\mathbf{B}}(u)$ of $\mathbf{B}(u)=[\bm{m}(u),\mathbf{A}(u)]$ is given by the first $r+1$ columns of the $r\times[2(r+1)]$ matrix

\[
\underset{r\times[2(r+1)]}{\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}}\,\,\underset{[2(r+1)]\times[2(r+1)]}{[\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}]^{-1}}.
\]

Hence, we need the first $r+1$ columns of the $[2(r+1)]\times[2(r+1)]$ matrix $[\widetilde{\mathbf{Z}}_{0}^{\top}\mathbf{K}_{T}(u)\widetilde{\mathbf{Z}}_{0}]^{-1}$. Without proof we state the following lemma; see Lütkepohl (1996, result (2) in Section 3.5.3, page 30).

Lemma 2.

Let $\mathbf{A}$ be $m\times m$, $\mathbf{B}$ be $m\times n$, $\mathbf{C}$ be $n\times m$, and $\mathbf{D}$ be $n\times n$, and consider the $(m+n)\times(m+n)$ partitioned matrix

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right].
\]

If $\mathbf{D}$ and $[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]$ are both nonsingular, then

\[
\left[\begin{array}{c|c}\mathbf{A}&\mathbf{B}\\ \hline \mathbf{C}&\mathbf{D}\end{array}\right]^{-1}=\left[\begin{array}{c|c}[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}&-[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}\mathbf{B}\mathbf{D}^{-1}\\ \hline -\mathbf{D}^{-1}\mathbf{C}[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}&\mathbf{D}^{-1}+\mathbf{D}^{-1}\mathbf{C}[\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}]^{-1}\mathbf{B}\mathbf{D}^{-1}\end{array}\right].
\]

Using Lemma 2 we can write

\[
\begin{split}
\widetilde{\mathbf{B}}(u)=\ &\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,
[\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}-\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,\mathbf{D}^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0}]^{-1}\\
&-\mathbf{X}_{1}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,\mathbf{D}^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0}\,
[\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{Z}_{0}-\mathbf{Z}_{0}^{\top}\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}\,\mathbf{D}^{-1}\,\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{Z}_{0}]^{-1},
\end{split}
\]

where $\mathbf{D}=\mathbf{Z}_{0}^{\top}\mathbf{\Delta}_{1}(u)\mathbf{K}_{T}(u)\mathbf{\Delta}_{1}(u)\mathbf{Z}_{0}$. If we define the matrix $\mathbf{W}_{T}(u;\mathbf{X})$ according to (25), the local-linear estimator $\widetilde{\mathbf{B}}(u)$ can be written as

\[
\widetilde{\mathbf{B}}(u)=\mathbf{X}_{1}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}\,[\mathbf{Z}_{0}^{\top}\mathbf{W}_{T}(u;\mathbf{X})\mathbf{Z}_{0}]^{-1},
\]

and (26) is proved. The estimator $\widetilde{\mathbf{B}}(u)$ has the same form as the estimator $\widehat{\mathbf{B}}(u)$ in (15) of Proposition 1, with $\mathbf{W}_{T}(u;\mathbf{X})$ instead of $\mathbf{K}_{T}(u)$. Hence the result in (27), with $\widetilde{\bm{\mu}}_{0}(u)$, $\widetilde{\bm{\mu}}_{1}(u)$, $\widetilde{\mathbf{G}}(u,0)$, and $\widetilde{\mathbf{G}}(u,1)$ as in (28), (29), (30), and (31), respectively, follows directly from Proposition 1.