
Bayesian Inference for Relational Graph in a Causal Vector Autoregressive Time Series

Arkaprava Roy (Department of Biostatistics, University of Florida, [email protected]), Anindya Roy (Department of Mathematics and Statistics, University of Maryland Baltimore County, [email protected]) and Subhashis Ghosal (Department of Statistics, North Carolina State University, [email protected])
Abstract

We propose a method for simultaneously estimating a contemporaneous graph structure and autocorrelation structure for a causal high-dimensional vector autoregressive process (VAR). The graph is estimated by estimating the stationary precision matrix using a Bayesian framework. We introduce a novel parameterization that is convenient for jointly estimating the precision matrix and the autocovariance matrices. The methodology based on the new parameterization has several desirable properties. A key feature of the proposed methodology is that it maintains causality of the process in its estimates and also provides a fast feasible way for computing the reduced rank likelihood for a high-dimensional Gaussian VAR. We use sparse priors along with the likelihood under the new parameterization to obtain the posterior of the graphical parameters as well as that of the temporal parameters. An efficient Markov Chain Monte Carlo (MCMC) algorithm is developed for posterior computation. We also establish theoretical consistency properties for the high-dimensional posterior. The proposed methodology shows excellent performance in simulations and real data applications.
Keywords: Graphical model; reduced rank VAR; Schur stability; sparse prior; stationary graph.

1 Introduction

Graphical models encode scientific linkages between variables of interest through a conditional independence structure and provide a parsimonious representation of the joint probability distribution of the variables, particularly when the number of variables is large. Thus, learning the graph structure underlying the joint probability distribution of a large collection of variables is an important problem with a long history. When the joint distribution is Gaussian, the graphical structure can be learned by estimating the precision matrix, leading to the popular Gaussian Graphical Model (GGM); even in the non-Gaussian case, one can encode the graph using the partial correlation structure and learn the Partial Correlation Graphical Model (PCGM) by estimating the precision matrix. However, when the sample has temporal dependence, estimating the precision matrix efficiently from the sample can be challenging. If estimating the graphical structure is the main objective, the temporal dependence can be treated as a nuisance feature and ignored in the process of graph estimation. When the temporal features are also important to learn, estimating the graph structure and the temporal dependence simultaneously can be considerably more complex.

There are several formulations of graphical models for time series. A time series graph could be one with the nodes representing the entire coordinate processes. In particular, if the coordinate processes are jointly stationary, this leads to a ‘stationary graphical’ structure. Conditional independence in such a graph can be expressed equivalently in terms of the absence of partial spectral coherence between two nodal series; see Dahlhaus (2000). Estimation of such stationary graphs has been investigated by Jung et al. (2015), Fiecas et al. (2019), Basu and Rao (2023), Basu et al. (2015), and Ma and Michailidis (2016).

A weaker form of conditional independence encoding for stationary multivariate time series is a ‘contemporaneous stationary graphical’ structure, where the graphical structure is encoded in the marginal precision matrix. In a contemporaneous stationary graph, the nodes are the coordinate variables of the vector time series. If the time series is Gaussian, this leads to contemporaneous conditional independence among the coordinates of the vector time series. This is the GGM structure, which has the appealing property that conditional independence of a pair of variables given the others is equivalent to the zero-value of the corresponding off-diagonal entry of the precision matrix; hence graph learning can be achieved via estimation of precision matrices. A rich literature on estimating the graphical structure can be found in classical books like Lauritzen (1996) and Koller and Friedman (2009); see also Maathuis et al. (2019) and references therein for more recent results. When the distribution is specified only up to the second moment, which is common in the study of second-order stationary time series, one could learn the contemporaneous partial correlation graphical structure by estimating the stationary precision matrix. In this paper, we use the contemporaneous precision matrix of a stationary time series to estimate the graph along the lines of Qiu et al. (2016) and Zhang and Wu (2017). For modeling the temporal dynamics we use a vector autoregression (VAR) model.

This paper brings together two popular models: the PCGM for graph estimation and the VAR model for estimating the temporal dynamics in vector time series. However, modeling the partial correlation graph and the temporal dependence with the VAR structure simultaneously is challenging, particularly under constraints such as reduced rank and causality on the VAR model and sparsity on the graphical model. This is achieved in this paper via a novel parameterization of the VAR process. The main contribution of the paper is a methodology that allows for meeting the following two challenging objectives simultaneously:

  (i) Estimation of the contemporaneous stationary graph structure under a sparsity constraint.

  (ii) Estimation of VAR processes with a reduced rank structure under causality constraints.

A novelty of the proposed approach is that while developing the methodology for performing the above two tasks, we can

  (a) develop a recursive computation scheme for computing the reduced-rank VAR likelihood through low-rank updates;

  (b) establish posterior concentration under priors based on the new parameterization.

2 Partial Correlation Graph Under Autoregression

The partial correlation graph for a set of variables $\bm{X}=(X_1,\ldots,X_d)$ can be identified by the precision matrix of $\bm{X}$, that is, the inverse of the dispersion matrix of $\bm{X}$. Two components $X_j$ and $X_k$ are conditionally uncorrelated given the remaining components if and only if the $(j,k)$th entry of the precision matrix of $\bm{X}$ is zero. Then the relations can be expressed as a graph on $\{1,\ldots,d\}$, where $j$ and $k$ are connected if and only if $X_j$ and $X_k$ are conditionally correlated given the remaining components. Equivalently, an edge connects $j$ and $k$ if and only if the partial correlation between $X_j$ and $X_k$ is non-zero.
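As a concrete illustration of this encoding, the following minimal numpy sketch (the function name and tolerance are ours, not the paper's) reads off the partial correlations and the edge set from a given precision matrix:

```python
import numpy as np

def partial_correlation_graph(Omega, tol=1e-10):
    """Partial correlations and edge set encoded by a precision matrix.

    The partial correlation between X_j and X_k given the rest is
    -Omega[j, k] / sqrt(Omega[j, j] * Omega[k, k]); an edge joins j and k
    exactly when this quantity is nonzero (here: exceeds `tol`).
    """
    d = Omega.shape[0]
    denom = np.sqrt(np.outer(np.diag(Omega), np.diag(Omega)))
    pcorr = -Omega / denom
    np.fill_diagonal(pcorr, 1.0)
    edges = [(j, k) for j in range(d) for k in range(j + 1, d)
             if abs(pcorr[j, k]) > tol]
    return pcorr, edges
```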

In many contexts, the variables of interest evolve over time and are temporally dependent. If the process $\{\bm{X}_t: t=0,1,2,\ldots\}$ is stationary, that is, the joint distributions remain invariant under a time-shift, the relational graph of $\bm{X}_t$ remains time-invariant. A vector autoregressive (VAR) process provides a simple, interpretable mechanism for temporal dependence by representing the process as a fixed linear combination of itself at a few immediate time points plus an independent random error. It is widely used in time series modeling. In this paper, we propose a Bayesian method for learning the relational graph of a stationary VAR process.

Let $\bm{X}_1,\ldots,\bm{X}_T$ be a sample of size $T$ from a vector autoregressive process of order $p$, VAR($p$) in short, given by

\bm{X}_{t}=\bm{\mu}+\bm{A}_{1}\bm{X}_{t-1}+\cdots+\bm{A}_{p}\bm{X}_{t-p}+\bm{Z}_{t} \qquad (2.1)

where $\bm{\mu}\in\mathbb{R}^d$, $\bm{A}_1,\ldots,\bm{A}_p$ are $d\times d$ real matrices and $\bm{Z}_t\sim\mathrm{WN}(\bm{0},\bm{\Sigma})$ is a $d$-dimensional white noise process with covariance matrix $\bm{\Sigma}$, that is, the $\bm{Z}_t$ are independent $\mathrm{N}_d(\bm{0},\bm{\Sigma})$. We consider $p$ to be given, but in practice, it may not be known and may have to be assessed by some selection method.
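For concreteness, the following sketch simulates from (2.1) under the Gaussian white noise assumption; it presumes the supplied coefficients already define a causal process, and the function name and burn-in default are our own illustrative choices.

```python
import numpy as np

def simulate_var(mu, A_list, Sigma, T, burn=200, rng=None):
    """Simulate T observations from the VAR(p) model (2.1).

    A_list holds the d x d coefficient matrices A_1, ..., A_p; the
    innovations Z_t are i.i.d. N_d(0, Sigma).  A burn-in period is used
    so the returned sample is approximately stationary; the supplied
    coefficients are assumed to define a causal process.
    """
    rng = np.random.default_rng(rng)
    d, p = len(mu), len(A_list)
    chol = np.linalg.cholesky(Sigma)
    X = np.zeros((T + burn, d))
    for t in range(p, T + burn):
        X[t] = mu + sum(A @ X[t - i - 1] for i, A in enumerate(A_list))
        X[t] += chol @ rng.standard_normal(d)
    return X[burn:]
```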

Causality is a property that plays an important role in multivariate time series models, particularly in terms of forecasting. For a causal time series, the prediction formula includes current and past innovations, and hence a causal time series allows a stable forecast of the future in terms of present and past data. However, the condition of causality imposes complex constraints on the parameters of the process, often making it extremely difficult for one to impose causality during the estimation process. An effective approach is to parameterize the constrained parameter space of causal processes in terms of unconstrained parameters and write the likelihood and the prior distribution in terms of the unconstrained parameters.

For a VAR($p$) time series defined by (2.1), causality is determined by the roots of the determinantal equation

\det(\bm{A}(z))=0,\mbox{ where }\bm{A}(z)=\bm{I}-\bm{A}_{1}z-\cdots-\bm{A}_{p}z^{p},\; z\in\mathbb{C}, \qquad (2.2)

is the matrix polynomial associated with the VAR equation. For the VAR model to be causal, all roots of the determinantal equation must lie outside the closed unit disc $\mathcal{D}=\{z\in\mathbb{C}:|z|\leq 1\}$. When the roots of the determinantal equation lie outside the unit disc, the associated monic matrix polynomial $\tilde{\bm{A}}(z)=z^p-\bm{A}_1z^{p-1}-\cdots-\bm{A}_p$ is called Schur stable. Roy et al. (2019) provided a parameterization of the entire class of Schur stable polynomials and thereby parameterized the space of causal VAR models.
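Equivalently, causality can be checked through the companion form: all roots of $\det(\bm{A}(z))=0$ lie outside the closed unit disc exactly when every eigenvalue of the $dp\times dp$ companion matrix lies strictly inside the unit circle. A small numpy sketch of this standard check (the function name is ours):

```python
import numpy as np

def is_causal(A_list):
    """Causality check for a VAR(p) via the companion matrix.

    All roots of det(I - A_1 z - ... - A_p z^p) lie outside the closed
    unit disc exactly when every eigenvalue of the dp x dp companion
    matrix lies strictly inside the unit circle.
    """
    d, p = A_list[0].shape[0], len(A_list)
    comp = np.zeros((d * p, d * p))
    comp[:d] = np.hstack(A_list)          # top block row: [A_1, ..., A_p]
    comp[d:, :-d] = np.eye(d * (p - 1))   # subdiagonal identity blocks
    return bool(np.max(np.abs(np.linalg.eigvals(comp))) < 1.0)
```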

For convenience, we briefly describe the Roy et al. (2019) framework. Roy et al. (2019) noted that the VAR($p$) model (2.1) is causal if and only if the block Toeplitz covariance matrix $\bm{\Upsilon}_p$ is positive definite, where

\bm{\Upsilon}_{j}=\begin{bmatrix}\bm{\Gamma}(0)&\bm{\Gamma}(1)&\cdots&\bm{\Gamma}(j)\\ \bm{\Gamma}(1)^{\mathrm{T}}&\bm{\Gamma}(0)&\cdots&\bm{\Gamma}(j-1)\\ \vdots&\vdots&\ddots&\vdots\\ \bm{\Gamma}(j)^{\mathrm{T}}&\bm{\Gamma}(j-1)^{\mathrm{T}}&\cdots&\bm{\Gamma}(0)\end{bmatrix} \qquad (2.3)

is the covariance matrix of $(j+1)$ consecutive observations $(\bm{X}_t,\bm{X}_{t-1},\ldots,\bm{X}_{t-j})^{\mathrm{T}}$ and

\bm{\Gamma}(h)=\mathbb{E}[(\bm{X}_{t}-\mathbb{E}(\bm{X}_{t}))(\bm{X}_{t-h}-\mathbb{E}(\bm{X}_{t-h}))^{\mathrm{T}}] \qquad (2.4)

is the lag-$h$ autocovariance matrix of the process. As shown in Roy et al. (2019), this condition is equivalent to

\bm{\Gamma}(0)=\bm{C}_{0}\geq\bm{C}_{1}\geq\cdots\geq\bm{C}_{p}=\bm{\Sigma}, \qquad (2.5)

where $\bm{C}_j=\mathrm{Var}(\bm{X}_{j+1}|\bm{X}_j,\ldots,\bm{X}_1)$ are the conditional dispersion matrices; here and below, we use the Löwner ordering: for two matrices $\bm{A},\bm{B}$, $\bm{A}\geq\bm{B}$ means that $\bm{A}-\bm{B}$ is nonnegative definite. The condition (2.5) plays a central role in the formulation of our parameterization. Since the VAR parameters $\bm{A}_1,\ldots,\bm{A}_p,\bm{\Sigma}$ can be expressed as a one-to-one map of the sequence $\bm{C}_0-\bm{C}_1,\ldots,\bm{C}_{p-1}-\bm{C}_p,\bm{C}_p$, the parameterization of the causal VAR process is achieved by parameterizing the nonnegative definite matrices of successive differences $\bm{C}_{j-1}-\bm{C}_j$, $j=1,\ldots,p$, and the positive definite matrix $\bm{C}_p$ in terms of unconstrained parameters. Several options for parameterizing nonnegative definite matrices in terms of unconstrained parameters are available in the literature.

Our main objective in this paper is to estimate the stationary precision matrix of the process, $[\mathrm{Var}(\bm{X}_t)]^{-1}=\bm{\Omega}=\bm{\Gamma}^{-1}(0)$, under sparsity restrictions. In the Roy et al. (2019) parameterization, the stationary variance $\bm{\Gamma}(0)$, and hence $\bm{\Omega}$, are functions of the basic free parameters and hence not suitable for estimation of the graphical structure with the desired sparsity properties. We suggest a novel modification of the previous parameterization that achieves the goal of parameterizing the graphical structure and the VAR correlation structure directly and separately, thereby facilitating the estimation of both components under the desired restrictions, even in higher dimensions.

3 Reduced Rank Parameterization and Priors

The main idea in the proposed parameterization is the separation of the parameters pertaining to the graphical structure, $\bm{\Omega}$, and the parameters that are used to describe the temporal correlation present in the sample. The following result is essential in developing the parameterization. It follows easily from (2.5), but due to its central role in the new parameterization, we state it formally.

Proposition 1.

Let $\bm{X}_t$ satisfying (2.1) be a stationary vector autoregressive time series with stationary variance $\mathrm{Var}(\bm{X}_t)=\bm{\Omega}^{-1}$, a positive definite matrix, and error covariance matrix $\bm{\Sigma}$. Then $\bm{X}_t$ is causal if and only if

\bm{\Omega}\leq\bm{C}_{1}^{-1}\leq\cdots\leq\bm{C}_{p}^{-1}=\bm{\Sigma}^{-1}, \qquad (3.1)

where for any $j\geq 1$, $\bm{C}_j=\mathrm{Var}(\bm{X}_{j+1}|\bm{X}_j,\ldots,\bm{X}_1)$.

Constraint-free parameterization: Based on Proposition 1, for a causal VAR process the successive differences $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$ are nonnegative definite for $j=1,\ldots,p$, where $\bm{\Omega}=\bm{C}_0^{-1}$. Hence, these successive differences, along with the precision matrix $\bm{\Omega}$, can be parameterized using an unrestricted parameterization that maps nonnegative definite matrices to free parameters.
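For instance, with the square-root map $\bm{B}\mapsto\bm{B}\bm{B}^{\mathrm{T}}$ as the unrestricted parameterization, the chain of conditional precisions is nondecreasing by construction, as the following sketch (our own illustrative function) makes explicit:

```python
import numpy as np

def conditional_precisions(Omega, B_list):
    """Chain C_0^{-1} = Omega, C_j^{-1} = C_{j-1}^{-1} + B_j B_j^T.

    Mapping an unconstrained factor B_j to the nonnegative definite
    increment B_j B_j^T makes the monotonicity in (3.1), and hence
    causality, hold by construction for any parameter values.
    """
    precisions = [Omega]
    for B in B_list:
        precisions.append(precisions[-1] + B @ B.T)
    return precisions  # [C_0^{-1}, ..., C_p^{-1} = Sigma^{-1}]
```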

3.1 Efficient Computation of the Likelihood

The computation of the likelihood is difficult for a multivariate time series. For a Gaussian VAR process, the likelihood can be computed relatively fast using the Durbin-Levinson (DL) or innovations algorithm. However, under different parameterizations, the computational burden can increase significantly depending on the complexity of the parameterization. Moreover, a DL-type algorithm can still be challenging if the process dimension is high.

In models with a high number of parameters, a common approach is to seek low-dimensional sub-models that could adequately describe the data and answer basic inferential questions of interest.

Using the proposed parameterization of $\bm{\Omega}$ and the differences $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$, we provide a low-rank formulation of a causal VAR process that leads to an efficient algorithm for computing the likelihood based on a Gaussian VAR($p$) sample. The algorithm achieves computational efficiency by avoiding inversion of large matrices. The sparse precision matrix $\bm{\Omega}=\bm{C}_0^{-1}$ is modeled as a separate parameter, allowing direct inference about the graphical structure under sparsity constraints. The likelihood computation requires the inversion of only $r_j\times r_j$ matrices instead of $d\times d$ matrices, which substantially decreases the computational complexity. In particular, when $r_j=r=1$, i.e., when the conditional precision updates $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$ are all rank-one, all components of the likelihood computation can be done without matrix inversion or factorization.

Before we proceed, we define some notation that is used throughout the article. The stationary variance matrix $\bm{\Upsilon}_j$ of $(\bm{X}_t,\bm{X}_{t-1},\ldots,\bm{X}_{t-j})^{\mathrm{T}}$, as defined in (2.3), will be written in the following nested structures

\bm{\Upsilon}_{j}=\begin{bmatrix}\bm{\Gamma}(0)&\bm{\xi}_{j}^{\mathrm{T}}\\ \bm{\xi}_{j}&\bm{\Upsilon}_{j-1}\end{bmatrix}=\begin{bmatrix}\bm{\Upsilon}_{j-1}&\bm{\kappa}_{j}\\ \bm{\kappa}_{j}^{\mathrm{T}}&\bm{\Gamma}(0)\end{bmatrix} \qquad (3.2)

where

\bm{\xi}_{j}^{\mathrm{T}}=(\bm{\Gamma}(1),\,\bm{\Gamma}(2),\ldots,\bm{\Gamma}(j)),\quad \bm{\kappa}_{j}^{\mathrm{T}}=(\bm{\Gamma}(j)^{\mathrm{T}},\ldots,\bm{\Gamma}(2)^{\mathrm{T}},\,\bm{\Gamma}(1)^{\mathrm{T}}). \qquad (3.3)

Denote the Schur complements of $\bm{\Upsilon}_{j-1}$ in the two representations as

\bm{C}_{j}=\bm{\Gamma}(0)-\bm{\xi}_{j}^{\mathrm{T}}\bm{\Upsilon}_{j-1}^{-1}\bm{\xi}_{j},\quad \bm{D}_{j}=\bm{\Gamma}(0)-\bm{\kappa}_{j}^{\mathrm{T}}\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_{j}. \qquad (3.4)

Also, let $\phi_d(\cdot|\bm{\mu},\bm{\Sigma})$ and $\Phi_d(\cdot|\bm{\mu},\bm{\Sigma})$ denote the probability density and cumulative distribution function of the $d$-dimensional normal with mean $\bm{\mu}$ and covariance $\bm{\Sigma}$. The basic computation will be to successively compute the likelihood contribution of the conditional densities $f(\bm{X}_j|\bm{X}_{j-1},\ldots,\bm{X}_1)$, $j=2,\ldots,T$. Let $\bm{Y}_j=(\bm{X}_j^{\mathrm{T}},\ldots,\bm{X}_{\max(1,j-p)}^{\mathrm{T}})^{\mathrm{T}}$. Under the assumption that the errors are Gaussian, i.e., $\bm{Z}_j\sim\mathrm{N}_d(\bm{0},\bm{\Sigma})$, and that they are independent, and writing $f(\bm{X}_1|\bm{Y}_0)=f(\bm{X}_1)$, $\bm{\mu}_1=\bm{0}$, $\bm{\Sigma}_1=\bm{C}_0$, we have

f(\bm{X}_{j}|\bm{Y}_{j-1})=\phi_{d}(\bm{X}_{j}|\bm{\mu}_{j},\bm{\Sigma}_{j}), \qquad (3.5)

where the conditional mean and variance are given by $\bm{\mu}_j=\bm{\xi}_{j-1}^{\mathrm{T}}\bm{\Upsilon}_{j-2}^{-1}\bm{Y}_{j-1}$, $\bm{\Sigma}_j=\bm{C}_{j-1}$ for any $2\leq j\leq p$, and $\bm{\mu}_j=\bm{\xi}_p^{\mathrm{T}}\bm{\Upsilon}_{p-1}^{-1}(\bm{X}_{j-1}^{\mathrm{T}},\ldots,\bm{X}_{j-p}^{\mathrm{T}})^{\mathrm{T}}$, $\bm{\Sigma}_j=\bm{C}_p$ for $p+1\leq j\leq T$. Thus, the full Gaussian likelihood

L=f(\bm{X}_{1})\prod_{j=2}^{T}f(\bm{X}_{j}|\bm{Y}_{j-1}) \qquad (3.6)

can be obtained by recursively deriving the conditional means and variances from the constraint-free parameters describing the sparse precision matrix $\bm{\Omega}$ and the reduced-rank conditional variance differences $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$, $j=1,\ldots,p$. For parameterization of the conditional precision updates, we use a low-rank parameterization. Specifically, let

\bm{C}_{j}^{-1}-\bm{C}_{j-1}^{-1}=\bm{L}_{j}\bm{L}_{j}^{\mathrm{T}},\mbox{ where }\bm{L}_{j}\mbox{ are }d\times r_{j}\mbox{ matrices with }r_{j}\ll d. \qquad (3.7)

The rank factors $\bm{L}_j$ are not directly solvable from the precision updates $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$. To complete the parameterization, we need to define a bijection. The bijection can be defined by augmenting $\bm{L}_j$ with $d\times r_j$ semi-orthogonal matrices $\bm{Q}_j$ and defining $\bm{L}_j\bm{Q}_j^{\mathrm{T}}$ as a unique square root of $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$, such as the unique symmetric square root. This would lead to an identifiable parameterization. However, given that the $d\times r_j$ entries in $\bm{Q}_j$ are constrained by the orthogonality requirement, we will use a slightly over-identified system of parameters to describe the reduced rank formulation of $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$, $j=1,\ldots,p$. The over-identification facilitates computation enormously without creating any challenges in inference for the parameters of interest.

Specifically, for each $j=1,\ldots,p$, along with $\bm{L}_j$ we will use $d\times r_j$ parameters arranged in a $d\times r_j$ matrix, $\bm{K}_j$, to describe the reduced-rank updates $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$. The exact definition of the matrices $\bm{K}_j$, along with the steps for computing the causal VAR likelihood based on the basic constraint-free parameters $\bm{\Omega}$, $\bm{L}_j$, and $\bm{K}_j$, $j=1,\ldots,p$, are given below. We assume that the ranks $r_1,\ldots,r_p$ are specified and fixed. Also, unless otherwise specified, we take the matrix square root of a positive definite matrix to be the unique symmetric square root. Then, given the parameters $\bm{\Omega},\bm{L}_1,\ldots,\bm{L}_p$ and the initialization $\bm{C}_0^{-1}=\bm{\Omega}$, the following steps describe the remaining parameters $\bm{K}_1,\ldots,\bm{K}_p$ recursively along with the recursive computation of the components $f(\bm{X}_j|\bm{Y}_{j-1})$ of the VAR likelihood.

Recursive Computation of the Reduced Rank VAR Likelihood

For $j=1,\ldots,p$:

  1. Since $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}=\bm{L}_j\bm{L}_j^{\mathrm{T}}$, we have $\bm{C}_j^{-1}=\bm{C}_{j-1}^{-1}+\bm{L}_j\bm{L}_j^{\mathrm{T}}$.

  2. Using the Sherman-Morrison-Woodbury (SMW) formula, $\bm{C}_j=\bm{C}_{j-1}-\bm{U}_j\bm{U}_j^{\mathrm{T}}$, where

     \bm{U}_{j}=\bm{C}_{j-1}\bm{L}_{j}(\bm{I}+\bm{L}_{j}^{\mathrm{T}}\bm{C}_{j-1}\bm{L}_{j})^{-1/2}. \qquad (3.8)

     Note that

     \bm{U}_{j}^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_{j}=(\bm{I}+\bm{L}_{j}^{\mathrm{T}}\bm{C}_{j-1}\bm{L}_{j})^{-1/2}\bm{L}_{j}^{\mathrm{T}}\bm{C}_{j-1}\bm{C}_{j-1}^{-1}\bm{C}_{j-1}\bm{L}_{j}(\bm{I}+\bm{L}_{j}^{\mathrm{T}}\bm{C}_{j-1}\bm{L}_{j})^{-1/2}=(\bm{I}+\bm{L}_{j}^{\mathrm{T}}\bm{C}_{j-1}\bm{L}_{j})^{-1/2}\bm{L}_{j}^{\mathrm{T}}\bm{C}_{j-1}\bm{L}_{j}(\bm{I}+\bm{L}_{j}^{\mathrm{T}}\bm{C}_{j-1}\bm{L}_{j})^{-1/2}.

     Since for a positive definite matrix $\bm{A}$, $\|(\bm{I}+\bm{A})^{-1/2}\bm{A}(\bm{I}+\bm{A})^{-1/2}\|=\|\bm{A}\|/(1+\|\bm{A}\|)<1$, $\bm{U}_j$ satisfies the restriction $\|\bm{U}_j^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_j\|<1$.

  3. Define

     \bm{W}_{j}=\bm{\Gamma}(j)^{\mathrm{T}}-\bm{\xi}_{j-1}^{\mathrm{T}}\bm{\Upsilon}_{j-2}^{-1}\bm{\kappa}_{j-1}, \qquad (3.9)

     with $\bm{W}_1=\bm{\Gamma}(1)^{\mathrm{T}}$. From Roy et al. (2019), $\bm{U}_j\bm{U}_j^{\mathrm{T}}=\bm{W}_j\bm{D}_{j-1}^{-1}\bm{W}_j^{\mathrm{T}}$, and hence for any $\bm{V}_j$ such that $\bm{V}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{V}_j=\bm{I}$ we have

     \bm{W}_{j}=\bm{U}_{j}\bm{V}_{j}^{\mathrm{T}}. \qquad (3.10)

     While the $\bm{U}_j$ are determined by the $\bm{L}_j$, the $\bm{V}_j$ matrices are determined by the other parameters $\bm{K}_j$. We construct $\bm{V}_j$ from the basic parameters $\bm{K}_j$ as

     \bm{V}_{j}=\bm{K}_{j}(\bm{K}_{j}^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{K}_{j})^{-1/2}. \qquad (3.11)

     Note that $\bm{V}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{V}_j=\bm{I}$.

  4. Thus, entries of the covariance matrices can be updated as

     \bm{\Gamma}(j)^{\mathrm{T}}=\bm{U}_{j}\bm{V}_{j}^{\mathrm{T}}+\bm{\xi}_{j-1}^{\mathrm{T}}\bm{\Upsilon}_{j-2}^{-1}\bm{\kappa}_{j-1},\quad \bm{\xi}_{j}^{\mathrm{T}}=(\bm{\xi}_{j-1}^{\mathrm{T}},\;\bm{\Gamma}(j)),\quad \bm{\kappa}_{j}^{\mathrm{T}}=(\bm{\Gamma}(j)^{\mathrm{T}},\;\bm{\kappa}_{j-1}^{\mathrm{T}}).
  5. Applying SMW successively, we have

     \bm{D}_{j}=\bm{D}_{j-1}-\bm{R}_{j}\bm{R}_{j}^{\mathrm{T}},\qquad \bm{D}_{j}^{-1}=\bm{D}_{j-1}^{-1}+\bm{P}_{j}\bm{P}_{j}^{\mathrm{T}},

     where $\bm{R}_j=\bm{V}_j(\bm{U}_j^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_j)^{1/2}$ and $\bm{P}_j=\bm{D}_{j-1}^{-1}\bm{R}_j(\bm{I}-\bm{R}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{R}_j)^{-1/2}$. For the last update to hold, one needs $(\bm{I}-\bm{R}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{R}_j)$ to be positive definite. This follows from the fact that

     \bm{R}_{j}^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{R}_{j}=(\bm{U}_{j}^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_{j})^{1/2}\bm{V}_{j}^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{V}_{j}(\bm{U}_{j}^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_{j})^{1/2}=\bm{U}_{j}^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_{j},

     where the final equality holds because $\bm{V}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{V}_j=\bm{I}$. Recalling $\|\bm{U}_j^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_j\|<1$, the result follows. Thus, writing $\bm{S}_j=\bm{U}_j^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_j$, we get $\bm{D}_j^{-1}=\bm{D}_{j-1}^{-1}+\bm{D}_{j-1}^{-1}\bm{V}_j\bm{S}_j(\bm{I}-\bm{S}_j)^{-1}\bm{V}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}$.

  6. Finally, the $j$th conditional density in the likelihood is updated as

     f(\bm{X}_{j}\,|\,\bm{Y}_{j-1})=\phi_{d}(\bm{X}_{j}|\bm{\xi}_{j-1}^{\mathrm{T}}\bm{\Upsilon}_{j-2}^{-1}\bm{Y}_{j-1},\bm{C}_{j-1}).

     The determinant term can be updated recursively as $\det(\bm{C}_{j-1}^{-1})=\det(\bm{C}_{j-2}^{-1})\det(\bm{I}+\bm{L}_{j-1}^{\mathrm{T}}\bm{C}_{j-2}\bm{L}_{j-1})$.

The subsequent updates $j=p+1,\ldots,T$, required to compute the remaining factors of the likelihood, can be obtained by simply noting that $\bm{\mu}_j=\bm{\xi}_p^{\mathrm{T}}\bm{\Upsilon}_{p-1}^{-1}\bm{Y}_{j-1,p}$ and $\bm{C}_{j-1}^{-1}=\bm{C}_p^{-1}$, where $\bm{Y}_{j-1,p}=(\bm{X}_{j-1}^{\mathrm{T}},\ldots,\bm{X}_{j-p}^{\mathrm{T}})^{\mathrm{T}}$.
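The recursion above translates almost line by line into code. The following numpy/scipy sketch maps $(\bm{\Omega},\bm{L}_j,\bm{K}_j)$ to the autocovariances $\bm{\Gamma}(0),\ldots,\bm{\Gamma}(p)$; for readability it forms the Schur complements $\bm{D}_{j-1}$ and the correction terms through dense solves rather than the low-rank update identities, so it illustrates the algebra but not the paper's computational savings. All function names are ours.

```python
import numpy as np
from scipy.linalg import sqrtm

def block_toeplitz(Gamma):
    """Assemble Upsilon_j as in (2.3) from the list [Gamma(0), ..., Gamma(j)]."""
    m, d = len(Gamma), Gamma[0].shape[0]
    Ups = np.zeros((m * d, m * d))
    for a in range(m):
        for b in range(m):
            blk = Gamma[b - a] if b >= a else Gamma[a - b].T
            Ups[a * d:(a + 1) * d, b * d:(b + 1) * d] = blk
    return Ups

def autocovariances(Omega, L_list, K_list):
    """Steps 1-4 of the recursion: map (Omega, L_j, K_j) to Gamma(0..p)."""
    Gamma = [np.linalg.inv(Omega)]                 # Gamma(0) = C_0 = D_0
    C = Gamma[0].copy()                            # C_{j-1}
    for j, (L, K) in enumerate(zip(L_list, K_list), start=1):
        if j == 1:
            D, corr = Gamma[0], 0.0                # D_0; W_1 = Gamma(1)^T
        else:
            Ups = block_toeplitz(Gamma[:j - 1])    # Upsilon_{j-2}
            xi = np.vstack([G.T for G in Gamma[1:j]])   # xi_{j-1}
            kap = np.vstack(Gamma[1:j][::-1])           # kappa_{j-1}
            D = Gamma[0] - kap.T @ np.linalg.solve(Ups, kap)  # D_{j-1}, (3.4)
            corr = xi.T @ np.linalg.solve(Ups, kap)
        r = L.shape[1]
        U = C @ L @ np.linalg.inv(sqrtm(np.eye(r) + L.T @ C @ L).real)  # (3.8)
        V = K @ np.linalg.inv(sqrtm(K.T @ np.linalg.solve(D, K)).real)  # (3.11)
        Gamma.append((U @ V.T + corr).T)           # Gamma(j), step 4
        C = C - U @ U.T                            # C_j, step 2
    return Gamma
```

Given the autocovariances, the Gaussian likelihood (3.6) follows from the conditional densities (3.5) with $\bm{\mu}_j$ and $\bm{\Sigma}_j$ as above.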

3.2 Identification of the Parameters

In the recursive algorithm described above, the parameters $\bm{\Omega},(\bm{L}_1,\bm{K}_1),\ldots,(\bm{L}_p,\bm{K}_p)$ are mapped to the modified parameters $\bm{\Omega},(\bm{U}_1,\bm{V}_1),\ldots,(\bm{U}_p,\bm{V}_p)$. The mapping is not one-to-one since each $\bm{V}_j$ has the restriction $\bm{V}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{V}_j=\bm{I}$, while the basic parameters $\bm{K}_j$ do not have any restrictions. The intermediate parameters in the mapping, $\bm{\Omega},(\bm{U}_1,\bm{V}_1),\ldots,(\bm{U}_p,\bm{V}_p)$, are identifiable only up to rotations. Consider the equivalence classes of pairs of $d\times r_j$ matrices $(\bm{U},\bm{V})$ defined through the relation $(\bm{U},\bm{V})\equiv(\bm{U}\bm{Q},\bm{V}\bm{Q})$ for any $r_j\times r_j$ orthogonal matrix $\bm{Q}$. For each $r_j$, let $\mathbb{C}_{r_j}$ be the set of such equivalence classes. Also let $\mathcal{S}^{++}_{d,s}$ be the subset of the positive definite cone $\mathcal{S}^{++}_d$ with specified sparsity level $s$. The following proposition shows that there is a bijection from the set $\mathcal{S}^{++}_{d,s}\times\mathbb{C}_{r_1}\times\cdots\times\mathbb{C}_{r_p}$ to the causal VAR($p$) parameter space defined by the VAR coefficients $\bm{A}_1,\ldots,\bm{A}_p$ and innovation variance $\bm{\Sigma}$, where polynomials with coefficients $\bm{A}_1,\ldots,\bm{A}_p$ satisfy low-rank restrictions and the stationary precision satisfies $s$-sparsity. The definition of sparsity is made more precise in the prior specification section. For simplicity, we use a mean-zero VAR($p$) process. Before we formally state the identification result, we state and prove a short lemma that is used to show identification.

Lemma 1.

Given an $n\times n$ matrix $\bm{A}$ of rank $1\leq r\leq n$ for some positive integer $n$, and a positive definite $n\times n$ matrix $\bm{B}$, there exist $n\times r$ matrices $\bm{C}$ and $\bm{R}$ such that $\bm{A}=\bm{C}\bm{R}^{\mathrm{T}}$, $\mathrm{rank}(\bm{C})=\mathrm{rank}(\bm{R})=r$ and $\bm{R}^{\mathrm{T}}\bm{B}\bm{R}=\bm{I}$.

Proof.

Given a rank factorization $\bm{A}=\bm{U}\bm{V}$ and the symmetric square root $\bm{B}^{1/2}$, let $\bm{S}=\bm{Q}\bm{Z}$ be the QR decomposition of $\bm{S}=\bm{B}^{1/2}\bm{V}^{\mathrm{T}}$, where $\bm{Q}$ is $n\times r$ semi-orthogonal and $\bm{Z}$ is an $r\times r$ nonsingular matrix. Then $\bm{A}=\bm{U}\bm{V}=\bm{U}\bm{V}\bm{B}^{1/2}\bm{B}^{-1/2}=\bm{U}\bm{Z}^{\mathrm{T}}\bm{Q}^{\mathrm{T}}\bm{B}^{-1/2}=\bm{C}\bm{R}^{\mathrm{T}}$, where $\bm{C}=\bm{U}\bm{Z}^{\mathrm{T}}$ and $\bm{R}=\bm{B}^{-1/2}\bm{Q}$. For this choice, $\bm{R}^{\mathrm{T}}\bm{B}\bm{R}=\bm{I}$. ∎
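A quick numerical check of the construction in the proof (SVD for the rank factorization, eigendecomposition for the square roots; the function name is ours and the input $\bm{A}$ is assumed to have rank exactly $r$):

```python
import numpy as np

def lemma1_factorization(A, B, r):
    """Factor a rank-r matrix A as C R^T with R^T B R = I (Lemma 1)."""
    # rank factorization A = U V via the SVD, keeping r components
    u, s, vt = np.linalg.svd(A)
    U, V = u[:, :r] * s[:r], vt[:r]
    # symmetric square root of B (and its inverse) via eigendecomposition
    w, P = np.linalg.eigh(B)
    Bh = P @ np.diag(np.sqrt(w)) @ P.T
    Bhinv = P @ np.diag(1.0 / np.sqrt(w)) @ P.T
    # QR step from the proof: S = B^{1/2} V^T = Q Z
    Q, Z = np.linalg.qr(Bh @ V.T)
    C, R = U @ Z.T, Bhinv @ Q
    assert np.allclose(C @ R.T, A) and np.allclose(R.T @ B @ R, np.eye(r))
    return C, R
```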

Proposition 2.

Let $\bm{\Omega}>0$ be a $d\times d$ positive definite matrix which is $s$-sparse. Let $(\bm{U}_1,\bm{V}_1),\ldots,(\bm{U}_p,\bm{V}_p)$ be given full column rank matrix pairs, of order $d\times r_1,\ldots,d\times r_p$, respectively. Then there is a unique causal VAR($p$) process with stationary variance matrix $\bm{\Gamma}(0)=\bm{\Omega}^{-1}$ and autocovariances $\bm{\Gamma}(1),\ldots,\bm{\Gamma}(p)$ (and hence the coefficients $\bm{A}_1,\ldots,\bm{A}_p$ as the Yule-Walker solution) determined recursively from $\bm{W}_j=\bm{U}_j\bm{V}_j^{\mathrm{T}}$. The associated increments in the conditional precision matrices $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$ will be of rank $r_j$.

Conversely, let $\bm{X}_t$ be a zero mean causal VAR($p$) process such that $\bm{\Omega}=\bm{\Gamma}(0)^{-1}$ is $s$-sparse and the increments in the conditional precision $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$ are of rank $r_j$. Let $\bm{W}_j$ be defined recursively as in (3.9). Then there are $d\times r_j$ matrices $(\bm{U}_j,\bm{V}_j)$, determined up to rotation, such that $\bm{W}_j=\bm{U}_j\bm{V}_j^{\mathrm{T}}$ and $\bm{V}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{V}_j=\bm{I}$ for each $j$.

Proof.

Given the parameters $\bm{\Omega},(\bm{U}_1,\bm{V}_1),\ldots,(\bm{U}_p,\bm{V}_p)$, the autocovariances are obtained as $\bm{\Gamma}(0)=\bm{\Omega}^{-1}$ and $\bm{\Gamma}(j)^{\mathrm{T}}=\bm{U}_j\bm{V}_j^{\mathrm{T}}+\bm{\xi}_{j-1}^{\mathrm{T}}\bm{\Upsilon}_{j-2}^{-1}\bm{\kappa}_{j-1}$, with $\bm{\xi}_j$ and $\bm{\kappa}_j$ determined recursively. The block Toeplitz matrix $\bm{\Upsilon}_p$ will be positive definite since the associated $\bm{C}_{j-1}-\bm{C}_j=\bm{U}_j\bm{U}_j^{\mathrm{T}}$ are all nonnegative definite. Hence, the associated VAR polynomial, whose coefficients are obtained as $(\bm{A}_1,\ldots,\bm{A}_p)=\bm{\xi}_p^{\mathrm{T}}\bm{\Upsilon}_{p-1}^{-1}$, will be Schur stable. Also, $\bm{\Sigma}^{-1}=\bm{\Omega}+\sum_{j=1}^p\bm{C}_j^{-1}\bm{U}_j(\bm{I}+\bm{U}_j^{\mathrm{T}}\bm{C}_j^{-1}\bm{U}_j)^{-1}\bm{U}_j^{\mathrm{T}}\bm{C}_j^{-1}$ will be positive definite, thereby making the associated VAR process a causal process.

For the converse, note that once the $\bm{W}_j$ are obtained, by Lemma 1 they can be factorized as $\bm{W}_j=\bm{U}_j\bm{V}_j^{\mathrm{T}}$ such that the $\bm{V}_j$ satisfy $\bm{V}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{V}_j=\bm{I}$. However, the pairs $(\bm{U}_j,\bm{V}_j)$ are unique only up to rotation by orthogonal matrices of order $r_j\times r_j$. The increments $\bm{C}_{j-1}-\bm{C}_j$ are equal to $\bm{W}_j\bm{D}_{j-1}^{-1}\bm{W}_j^{\mathrm{T}}$ and hence are nonnegative definite of rank $r_j$. Thus, the increments $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$ are also nonnegative definite of rank $r_j$. ∎

The over-parameterization of the reduced rank matrices is intentional; it reduces computational complexity. Consistent posterior inference on the set of identifiable parameters of interest is still possible, and the details are given in subsequent sections. For a general $d\times d$ matrix with rank $r$, the number of free parameters is $d^2-(d-r)^2=2dr-r^2$, where $(d-r)^2$ is the co-dimension of the manifold of $d\times d$ rank-$r$ matrices. In the proposed parameterization, we parameterize the rank-$r_j$ updates $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$ using $2dr_j$ parameters in the matrix pair $(\bm{L}_j,\bm{K}_j)$. Thus, $r_j^2$ extra parameters are introduced for computational convenience. Typically, $r_j$ will be small, and hence the number of extra parameters will be small as well.

3.3 Rank-one Updates

The case when the increments $\bm{C}_j^{-1}-\bm{C}_{j-1}^{-1}$ are rank-one matrices, i.e., $r_j=1$ for all $j$, is of special interest. Then, the parameters $\bm{U}_j,\bm{V}_j$ are fully identifiable if one fixes the sign of the first entry of $\bm{U}_j$. Moreover, in this case, the likelihood can be computed without having to invert any matrices or compute square roots.

Specifically, the quantities from the original algorithm can be simplified and recursively computed. The key quantities to be computed at each iteration are the matrices $\bm{C}_j,\bm{C}_j^{-1},\bm{D}_j,\bm{D}_j^{-1},\bm{U}_j,\bm{V}_j$. At the $j$th stage of the recursion, let $l_j$ and $k_j$ be the scalar quantities $l_j=\bm{L}_j^{\mathrm{T}}\bm{C}_{j-1}\bm{L}_j$ and $k_j=\bm{K}_j^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{K}_j$. Then, given the vectors $\bm{L}_j,\bm{K}_j$,

\bm{C}_{j}^{-1}=\bm{C}_{j-1}^{-1}+\bm{L}_{j}\bm{L}_{j}^{\mathrm{T}},\qquad \bm{U}_{j}=\frac{1}{\sqrt{1+l_{j}}}\bm{C}_{j-1}\bm{L}_{j},\qquad \bm{C}_{j}=\bm{C}_{j-1}-\bm{U}_{j}\bm{U}_{j}^{\mathrm{T}},

\bm{V}_{j}=\frac{1}{\sqrt{k_{j}}}\bm{K}_{j},\qquad \bm{D}_{j}=\bm{D}_{j-1}-\frac{l_{j}}{1+l_{j}}\bm{V}_{j}\bm{V}_{j}^{\mathrm{T}},\qquad \bm{D}_{j}^{-1}=\bm{D}_{j-1}^{-1}+l_{j}\,\bm{D}_{j-1}^{-1}\bm{V}_{j}\bm{V}_{j}^{\mathrm{T}}\bm{D}_{j-1}^{-1}.

When the dimension is large, the rank-one update parameterization can effectively capture a reasonable dependence structure while providing extremely fast updates in the likelihood computation. In the simulation section, we demonstrate the effectiveness of the rank-one update method via numerical illustrations.
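A minimal numpy sketch of one step of this inversion-free recursion, consistent with the updates displayed above (the function name is ours):

```python
import numpy as np

def rank_one_step(C, Cinv, D, Dinv, L, K):
    """One step of the rank-one recursion; only matrix-vector products
    are needed, with no matrix inversions or square roots."""
    l = float(L @ C @ L)                    # l_j = L_j^T C_{j-1} L_j
    k = float(K @ Dinv @ K)                 # k_j = K_j^T D_{j-1}^{-1} K_j
    Cinv_new = Cinv + np.outer(L, L)        # C_j^{-1}
    U = (C @ L) / np.sqrt(1.0 + l)          # U_j
    C_new = C - np.outer(U, U)              # C_j
    V = K / np.sqrt(k)                      # V_j, so V^T D_{j-1}^{-1} V = 1
    D_new = D - (l / (1.0 + l)) * np.outer(V, V)    # D_j
    DinvV = Dinv @ V
    Dinv_new = Dinv + l * np.outer(DinvV, DinvV)    # D_j^{-1}
    return C_new, Cinv_new, D_new, Dinv_new, U, V
```

Since only matrix-vector products appear, each step costs $O(d^2)$ operations.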

3.4 Prior Specification

If the model is assumed to be causal, the prior should charge only autoregression coefficients complying with the causality restrictions while putting prior probability directly on the stationary covariance matrix. Presently, the literature lacks a prior that charges only causal processes while having the prior on the precision $\bm{\Omega}$ as an independent component. Using our parameterization, we achieve that. Recall that in our setup, $p$ is considered given; hence, no prior distribution is assigned to $p$. Also, the ranks $r_1,\ldots,r_p$ are specified. While all of the procedures described in this paper go through when the $p$ ranks are potentially different from each other, we develop the methodology of low-rank updates by fixing $r_1=\cdots=r_p=r$.

We consider the modified Cholesky decomposition $\bm{\Omega}=(\bm{I}_d-\bm{E})\bm{F}(\bm{I}_d-\bm{E})^{\mathrm{T}}$, where $\bm{E}$ is strictly lower triangular and $\bm{F}=\mathrm{diag}(\bm{f})$ is diagonal with $\bm{f}$ the vector of diagonal entries. For sparse estimation of $\bm{\Omega}$, we put a sparsity-inducing prior on $\bm{E}$. We first define a hard thresholding operator $H_\lambda(\bm{H})=\{h_{i,j}\mathbbm{1}(|h_{i,j}|>\lambda)\}$. Then we set $\bm{E}=H_\lambda(\bm{E}_1)$, where $\bm{E}_1$ is a strictly lower triangular matrix. For the below-diagonal entries of $\bm{E}_1$, we let $e_{ij,1}\sim\mathrm{N}(0,\sigma^2_e)$ for $i>j$ and $\sigma^2_e\sim\hbox{Inv-Ga}(c_1,c_1)$. The components $f_1,\ldots,f_d$ of $\bm{f}$ are independently distributed according to the inverse Gaussian distribution with density function $\pi(t)\propto t^{-3/2}e^{-(t-\xi)^2/(2t)}$, $t>0$, for some $\xi>0$; see Chhikara (1988). This prior has an exponential-like tail near both zero and infinity. We put a weakly informative mean-zero normal prior with a large variance on $\xi$. While employing sparsity using hard thresholding, the modified Cholesky form is easier to work with. Lastly, we put a uniform prior on $\lambda$.
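A sketch of a single prior draw of $\bm{\Omega}$ under this specification; numpy's `wald` sampler is used for the inverse Gaussian with `scale=xi**2`, which matches the stated density, and the function name and arguments are illustrative:

```python
import numpy as np

def draw_omega_prior(d, sigma_e, lam, xi, rng=None):
    """One draw of Omega = (I - E) F (I - E)^T from the Section 3.4 prior.

    E is a hard-thresholded strictly lower triangular normal matrix and
    F = diag(f) has inverse Gaussian entries; wald(mean=xi, scale=xi**2)
    has density proportional to t^{-3/2} exp(-(t - xi)^2 / (2 t)).
    """
    rng = np.random.default_rng(rng)
    E1 = np.tril(rng.normal(0.0, sigma_e, size=(d, d)), k=-1)
    E = np.where(np.abs(E1) > lam, E1, 0.0)        # H_lambda(E_1)
    f = rng.wald(mean=xi, scale=xi ** 2, size=d)   # inverse Gaussian draws
    I = np.eye(d)
    return (I - E) @ np.diag(f) @ (I - E).T
```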

  • Conditional precision updates $\bm{L}_j$ of dimension $d\times r$: We first build the matrix $\bm{\Lambda}=[\bm{\lambda}_1;\bm{\lambda}_2;\ldots;\bm{\lambda}_p]$, where $\bm{\lambda}_j=\hbox{vec}(\bm{L}_j)$, placing them one after the other, and $p$ is a pre-specified maximum order of the estimated VAR model. Subsequently, we impose a cumulative shrinkage prior on $\bm{\Lambda}$ to ensure that the higher-order columns are shrunk to zero. Furthermore, for each individual $\bm{L}_j$ too, we impose another cumulative shrinkage prior to shrink the higher-order columns of $\bm{L}_j$. Our prior follows the cumulative shrinkage architecture of Bhattacharya and Dunson (2011), but with two layers of cumulative shrinkage on $\bm{\Lambda}$ to shrink its higher-order columns as well as the entries corresponding to $\bm{L}_j$'s with higher $j$ (a single-layer sketch is given after this list):

    \lambda_{\ell k}|\phi_{\ell k},\tau_{k},\psi_{k,\lceil\ell/r\rceil}\sim\mathrm{N}(0,\phi_{\ell k}^{-1}\tau_{k}^{-1}\psi_{k,\lceil\ell/r\rceil}^{-1}),
    \phi_{\ell k}\sim\mathrm{Ga}(\nu_{1},\nu_{1}),\quad\tau_{k}=\prod_{i=1}^{k}\delta_{i},\quad\psi_{k,m}=\prod_{i=1}^{m}\delta^{(k)}_{i},
    \delta_{1}\sim\mathrm{Ga}(\kappa_{1},1),\quad\delta_{i}\sim\mathrm{Ga}(\kappa_{2},1)\textrm{ for }i\geq 2,
    \delta^{(k)}_{1}\sim\mathrm{Ga}(\kappa_{1},1),\quad\delta^{(k)}_{i}\sim\mathrm{Ga}(\kappa_{2},1)\textrm{ for }i\geq 2,

    where $\lceil x\rceil$ stands for the ceiling function that maps $x$ to the smallest integer greater than or equal to $x$, and Ga stands for the gamma distribution. The parameters $\phi_{\ell k}$ control local shrinkage of the elements of $\bm{\Lambda}$, whereas $\tau_k$ controls column shrinkage of the $k$th column. Similarly, $\psi_{k,\lceil\ell/r\rceil}$ helps to shrink the higher-order columns of $\bm{L}_k$. Following the well-known shrinkage properties of the multiplicative gamma prior, we let $\kappa_1=2.1$ and $\kappa_2=3.1$, which work well in all of our simulation experiments. However, in the case of rank-one updates, the $\psi_{k,m}$'s are omitted. Computationally, it seems reasonable to keep $\psi_{k,m}=1$ as long as $r$ is pre-specified to a small number.

  • $\bm{K}_j$ of dimension $d\times r$: We consider a non-informative flat prior for the entries of $\bm{K}_j$. The rank of $\bm{K}_j$ is specified simultaneously with the rank of $\bm{L}_j$. Hence, a cumulative shrinkage prior for $\bm{K}_j$ is not needed.
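As referenced above, the following sketch draws one matrix from a single-layer multiplicative gamma shrinkage prior in the sense of Bhattacharya and Dunson (2011); the two-layer prior used here repeats the same construction within each $\bm{L}_j$. The default for $\nu_1$ is an illustrative assumption, while $\kappa_1,\kappa_2$ are the values stated above.

```python
import numpy as np

def multiplicative_gamma_draw(nrow, ncol, nu1=3.0, kappa1=2.1, kappa2=3.1,
                              rng=None):
    """One draw from a single-layer multiplicative gamma shrinkage prior.

    Column k has precision tau_k = prod_{i<=k} delta_i, which grows
    stochastically with k, so higher-order columns shrink toward zero;
    phi provides elementwise local shrinkage.
    """
    rng = np.random.default_rng(rng)
    delta = np.concatenate([rng.gamma(kappa1, 1.0, size=1),
                            rng.gamma(kappa2, 1.0, size=ncol - 1)])
    tau = np.cumprod(delta)                             # column precisions
    phi = rng.gamma(nu1, 1.0 / nu1, size=(nrow, ncol))  # local precisions
    return rng.normal(size=(nrow, ncol)) / np.sqrt(phi * tau)
```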

3.5 Posterior Computation

We use Markov chain Monte Carlo algorithms for posterior inference. The individual sampling strategies are described below. Due to the non-smooth and non-linear mapping between the parameters and the likelihood, it is convenient to use Metropolis-within-Gibbs updates for the different blocks of parameters.

Adaptive Metropolis-Hastings (M-H) moves are used to update the lower-triangular entries of the latent $\bm{E}_1$ and the entries of $\bm{f}$. Due to the positivity constraint on $\bm{f}$, we update this parameter on the log scale with the necessary Jacobian adjustment. Specifically, we generate the updates from a multivariate normal, where the associated covariance matrix is computed based on the generated posterior samples. Our algorithm is similar to Haario et al. (2001) with some modifications, as discussed below. The initial part of the chain relies on random-walk updates, as no information is available to compute the covariance. However, after the 3500th iteration, we start computing the covariance matrices based on the last $S$ accepted samples. The choice of 3500 is based on our extensive simulation experimentation. Instead of updating these matrices at each iteration, we update them once every 100th iteration using the last $S$ accepted samples. The value of $S$ is increased gradually. Furthermore, the constant variance multiplying the covariance matrix, as in Haario et al. (2001), is tuned to maintain a pre-specified level of acceptance. The thresholding parameter $\lambda$ is also updated on the log scale with a Jacobian adjustment.
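A compact sketch of such an adaptive random-walk proposal (the iteration cutoff and regularization defaults here are illustrative, not the tuned values used in the paper):

```python
import numpy as np

def adaptive_rw_proposal(theta, accepted, it, scale, t_adapt=3500,
                         eps=1e-8, rng=None):
    """Random-walk proposal in the spirit of Haario et al. (2001).

    Before iteration t_adapt (or with too few stored samples) an
    isotropic step is used; afterwards, the proposal covariance is the
    empirical covariance of recently accepted samples, regularized by
    eps and multiplied by the tunable constant `scale`.
    """
    rng = np.random.default_rng(rng)
    dim = theta.size
    if it < t_adapt or len(accepted) < 2 * dim:
        cov = np.eye(dim)
    else:
        cov = np.cov(np.asarray(accepted).T) + eps * np.eye(dim)
    return theta + rng.multivariate_normal(np.zeros(dim), scale * cov)
```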

  1. Adaptive Metropolis-Hastings update for each column of $\bm{\Lambda}$ and full conditional Gibbs updates for the other hyperparameters.

  2. Adaptive M-H update for $\bm{K}_j$.

To speed up the computation, we initialize $\bm{\Omega}$ at the graphical lasso (Friedman et al., 2008) output using the glasso R package, computed from the marginal distribution of the multivariate time series at every cross-section while ignoring the temporal dependence, for a warm start. From this $\bm{\Omega}$, the modified Cholesky parameters $\bm{E}$ and $\bm{F}$ are initialized. We start the chain setting $p=\min(p_m,T/2)$, where $p_m$ is a pre-specified upper bound on the order, and initialize the entries of the $\bm{L}_j$'s and $\bm{K}_j$'s from $\hbox{Normal}(0,1/j)$. At the 1000th iteration, we discard those $j$'s for which the entries of $\bm{L}_j$ have very little contribution. In the M-H algorithm, the acceptance rate is maintained between 25% and 50% to ensure adequate mixing of posterior samples.

Recently, Heaps (2023) applied the parameterization from Roy et al. (2019) with flat priors on the new set of parameters and developed HMC-based MCMC computation using rstan. Our priors, however, involve several layers of sparsity structures and, hence, are inherently more complex. We primarily rely on M-H sampling, as direct computation of the gradients required for HMC is difficult. We leave the exploration of more efficient gradient-based samplers, possibly implemented using rstan, to future investigations.

4 Posterior Contraction

In this section, we study the asymptotic properties of the posterior distribution under some additional boundedness conditions on the support of the prior for the collection of all parameters $\bm{\zeta}:=(\bm{\Omega},\bm{L}_1,\ldots,\bm{L}_p,\bm{K}_1,\ldots,\bm{K}_p)$. The corresponding true value is denoted by $\bm{\zeta}_0=(\bm{\Omega}_0,\bm{L}_{1,0},\ldots,\bm{L}_{p,0},\bm{K}_{1,0},\ldots,\bm{K}_{p,0})$, with the related true autoregression coefficients $\bm{A}_{1,0},\ldots,\bm{A}_{p,0}$. As the parameters $\bm{L}_1,\ldots,\bm{L}_p,\bm{K}_1,\ldots,\bm{K}_p$ are not identifiable, the above true values may not be unique. Below, we assume that the true model parameters admit the assumed form for some choice of true values satisfying the required conditions. The posterior contraction rate is finally obtained for the identifiable parameters $\bm{\Omega}$ and $\bm{A}_1,\ldots,\bm{A}_p$.

(A1) The common rank $r$ of $\bm{L}_1,\ldots,\bm{L}_p,\bm{K}_1,\ldots,\bm{K}_p$ is given.

(A2) The prior densities of all entries of $\bm{E},\bm{f},\bm{L}_1,\ldots,\bm{L}_p,\bm{K}_1,\ldots,\bm{K}_p$ are positive at the corresponding true values $\bm{E}_0,\bm{f}_0,\bm{L}_{1,0},\ldots,\bm{L}_{p,0},\bm{K}_{1,0},\ldots,\bm{K}_{p,0}$.

(A3) The entries of $\bm{E},\bm{f},\bm{L}_1,\ldots,\bm{L}_p,\bm{K}_1,\ldots,\bm{K}_p$ are independent, and their densities are positive and continuous. Further, the prior densities of the entries of $\bm{f},\bm{L}_1,\ldots,\bm{L}_p$ have a common power-exponential tail (that is, bounded by $C_0\exp[-c_0|x|^\gamma]$ for some constants $C_0,c_0,\gamma>0$).

(A4) The entries of $\bm{f}$ are independent, bounded below by a fixed positive number, and have a common power-exponential upper tail.

(A5) The entries of $\bm{f}_0,\bm{E}_0,\bm{L}_{1,0},\ldots,\bm{L}_{p,0},\bm{K}_{1,0},\ldots,\bm{K}_{p,0},\bm{A}_{1,0},\ldots,\bm{A}_{p,0}$ and the eigenvalues of $\bm{\Omega}_0$ and $\bm{\Sigma}_0$ lie within a fixed interval, and the entries of $\bm{f}_0$ lie in the interior of the support of the prior for $\bm{f}$.

Condition (A1) can be disposed of, and the rate in Theorem 2 will remain valid, but writing the proof will become more cumbersome. We note that the assumed bounds on $\bm{f}$ in Condition (A4) ensure that the eigenvalues of $\bm{\Omega}$ are bounded below by a fixed positive number. This also ensures that the eigenvalues of $\bm{\Sigma}^{-1}$ are bounded below by a fixed positive number, that is, the eigenvalues of $\bm{\Sigma}$ are bounded above, given the representation $\bm{\Sigma}^{-1}=\bm{\Omega}+\sum_{j=1}^p\bm{L}_j\bm{L}_j^{\mathrm{T}}$, in which all terms inside the sum are nonnegative definite. The condition that the entries of $\bm{f}$ are bounded below is not essential; it can be removed at the expense of weakening the Frobenius distance or by assuming a skinny tail of the prior density at zero and slightly weakening the rate, depending on the tail of the prior. We forgo the slight generalization in favor of clarity and simplicity. The condition can be satisfied with a minor change in the methodology by replacing the inverse Gaussian prior for $\bm{f}$ with a lower-truncated version, provided that the truncation does not exclude the true value. The last assertion will be valid unless the true precision matrix is close to singularity.

Let $\bm{Y}_T=\hbox{vec}(\bm{X}_1,\bm{X}_2,\ldots,\bm{X}_T)$. Recall that the joint distribution of $\bm{Y}_T$ is $dT$-variate normal with mean the zero vector and dispersion matrix $\bm{\Upsilon}_T$. Thus the likelihood is given by $Q_{\bm{\zeta}}=(2\pi)^{-dT/2}\det(\bm{\Upsilon}_T)^{-1/2}\exp[-\bm{Y}_T^{\mathrm{T}}\bm{\Upsilon}_T^{-1}\bm{Y}_T/2]$.

Theorem 2.

Under Conditions (A1)–(A5), the posterior contraction rate for $\bm{\Omega}$ at $\bm{\Omega}_0$ and for the autoregression parameters $\bm{A}_1,\ldots,\bm{A}_p$ at $\bm{A}_{1,0},\ldots,\bm{A}_{p,0}$ with respect to the Frobenius distance is $d\sqrt{(\log T)/T}$.

To prove the theorem, we apply Theorem 3 of Ghosal and van der Vaart (2007) (equivalently, Theorem 8.19 of Ghosal and van der Vaart (2017)) and verify the testing condition directly. As the joint distribution of all components of the observations is a $dT$-dimensional multivariate normal, likelihood ratios can be explicitly written down. We construct a test based on the likelihood ratio at a few selected alternative values and obtain uniform bounds for the likelihood in small neighborhoods of these alternative values to establish that the resulting test also has exponentially small type II error probabilities in these small neighborhoods. The resulting finitely many tests are combined to obtain the desired test, a technique used earlier in the high-dimensional context by Ning et al. (2020), Jeong and Ghosal (2021), and Shi et al. (2021) for independent data. By relating the Kullback-Leibler divergence for the time series to the Frobenius distance on the precision matrix $\bm{\Omega}$ and the low-rank increment terms $\bm{L}_1,\ldots,\bm{L}_p,\bm{K}_1,\ldots,\bm{K}_p$, we obtain the prior concentration rate, as in Lemma 8 of Ghosal and van der Vaart (2007) for a one-dimensional Gaussian time series. The details are given in the supplementary materials.

If the true precision matrix $\bm{\Omega}_0$ has an appropriate lower-dimensional structure and the rank $r$ of the low-rank increment terms $\bm{L}_1,\ldots,\bm{L}_p,\bm{K}_1,\ldots,\bm{K}_p$ in Condition (A1) is bounded by a fixed constant, the contraction rate can be improved by introducing a sparsity-inducing mechanism in the prior for $\bm{E}$, for instance, as in Subsection 3.4. Let $\bm{\Omega}_0$ have the modified Cholesky decomposition $(\bm{I}-\bm{E}_0)\bm{F}_0(\bm{I}-\bm{E}_0)^{\mathrm{T}}$.

Theorem 3.

If Conditions (A1)–(A5) hold, $r$ is a fixed constant, and the number of non-zero entries of $\bm{E}_0$ is $s$, then the posterior contraction rate for $\bm{\Omega}$ at $\bm{\Omega}_0$ and for the autoregression parameters $\bm{A}_1,\ldots,\bm{A}_p$ at $\bm{A}_{1,0},\ldots,\bm{A}_{p,0}$ with respect to the Frobenius distance is $\sqrt{(d+s)(\log T)/T}$.

5 Numerical Illustrations

We present the results from a limited simulation experiment, along with the results from applying the proposed method to analyze graphical structures among several component time series of the US gross domestic product (GDP).

5.1 Simulation

For the simulation experiment, we primarily study the impact of the sample size and the sparsity level on the estimation accuracy for the stationary precision matrix of a first-order VAR process. We compare the accuracy of estimating the stationary precision matrix and the associated partial correlation graph for the proposed method along with two other popular graphical model estimation methods, the Gaussian Graphical Model (GGM) and the Gaussian Copula Graphical Model (GCGM).

We generate the data from a VAR(1) model with a fixed marginal precision matrix $\bm{\Omega}_1$. Specifically, the data $\bm{X}_1,\ldots,\bm{X}_T$ are generated from the model

\bm{X}_{t}=\bm{A}_{1}\bm{X}_{t-1}+\bm{Z}_{t} \qquad (5.1)

with the stationary precision matrix chosen as $\bm{\Omega}=\bm{\Omega}_1$, and the parameters associated with the update of the conditional precision, $(\bm{L}_1,\bm{K}_1)$, chosen as random $30\times 30$ matrices with entries generated independently from $\mathrm{N}(0,2.5^2)$. The VAR coefficient $\bm{A}_1$ and the innovation variance $\bm{\Sigma}_1$ are solved from the specified parameters $\bm{\Omega}_1,\bm{L}_1,\bm{K}_1$ following the steps defined in Section 3.1. The initial observation $\bm{X}_1$ is generated from the stationary distribution, and subsequent observations are generated via the iteration in (5.1). Two different sample sizes are used, $T=40$ and $T=60$. For the sparse precision matrix, we used two different levels of sparsity, 15% and 25%. The sparse precision $\bm{\Omega}_1$ is generated using the following method.

  (1) Adjacency matrix: we generate three small-world networks on disjoint sets of 10 nodes each, with two different choices (10 and 5) of the nei argument in sample_smallworld from igraph (Csárdi et al., 2024). Then, we randomly connect some nodes across the small worlds with probability $q$. The parameters of the small-world distribution and $q$ are adjusted to attain the desired sparsity levels in the precision matrix.

  (2) Precision matrix: using the above adjacency matrix, we draw from a G-Wishart distribution with scale 6 and truncate entries smaller than 1 in magnitude (a rough Python analogue follows this list).
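The sketch below is a Python approximation of this construction, using networkx in place of igraph and a diagonally dominant symmetric weight filling as a stand-in for the G-Wishart draw, for which numpy/networkx have no built-in sampler; all defaults are illustrative.

```python
import numpy as np
import networkx as nx  # stand-in for the igraph calls used in the paper

def sparse_precision(d=30, blocks=3, nei=2, q=0.02, rng=None):
    """Adjacency from small-world blocks plus random cross links, then a
    diagonally dominant symmetric filling in place of a G-Wishart draw."""
    rng = np.random.default_rng(rng)
    size = d // blocks
    A = np.zeros((d, d))
    for b in range(blocks):
        g = nx.watts_strogatz_graph(size, 2 * nei, 0.1,
                                    seed=int(rng.integers(10 ** 6)))
        idx = np.arange(b * size, (b + 1) * size)
        A[np.ix_(idx, idx)] = nx.to_numpy_array(g)
    cross = np.triu(rng.random((d, d)) < q, 1)       # random cross links
    A = np.clip(A + cross + cross.T, 0, 1)
    W = A * rng.uniform(1.0, 2.0, (d, d)) * rng.choice([-1.0, 1.0], (d, d))
    W = (W + W.T) / 2                                # symmetrize weights
    np.fill_diagonal(W, np.abs(W).sum(axis=1) + 1.0) # ensure positive definite
    return W
```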

The simulated model is a full-rank 30-dimensional first-order VAR. Thus, the proposed method of fitting low-rank matrices for the conditional precision updates is merely an approximation and fits a possibly misspecified model. Posterior computation for the proposed method is done using the steps given in Section 3.5, where a higher-order VAR with low-rank updates is used to fit the data. The upper bound on the order of the process is chosen to be $p_m=10$. Thus, we are fitting a misspecified model and evaluating the robustness characteristics of the method along with estimation accuracy. Table 1 shows the median MSE for estimation of the $30\times 30$ precision matrix for model (5.1). The proposed method has higher estimation accuracy than the GCGM and GGM methods that ignore the temporal dependence. Thus, explicitly modeling the dependence, even under incorrect specification of the rank, provides a substantial gain in terms of MSE. The comparative performance of the proposed method is better when the sparsity level is 25%. The Gaussian Graphical Model seemed sensitive to the specification of the simulation model. The MSE for the GGM increases with increasing sample size, presumably an artifact of ignoring the dependence in the sample. We also simulated the parameters of the conditional precision update, $\bm{L}_1,\bm{K}_1$, independently from $\mathrm{N}(0,1)$ (not reported here) and found the MSE for the GGM to decrease with the sample size. This could be because the $\mathrm{N}(0,1)$ draws are more concentrated around zero, making the simulated model closer to the independent model. The MSE for all the methods goes up with a decreased sparsity level. A greater number of nonzero entries in the precision matrix makes the problem harder for sparse estimation methods and leads to higher estimation errors. Our results above are all based on rank-one updates. We also obtained results for rank-3 updates (not shown here), but they show only marginal improvement over rank-one updates for precision matrix estimation.

We also investigate how well the partial correlation graph is estimated under the different methods when the samples are dependent, arising from a VAR process. The true graph is given by the nonzero entries of the true precision $\bm{\Omega}$. The estimated graph is obtained by thresholding the estimated partial correlations at a given level $\tau$. For the present investigation we use $\tau=0.15$, i.e., two different time series nodes are declared connected if the partial correlation between the two components, computed from the estimated precision matrix, exceeds 0.15. Figure 1 shows the ROC curves for the three estimation methods under different settings: two sample sizes, $T=40, 60$, and two sparsity levels, 15% and 25%. In every case the proposed method performs better than the GCGM and GGM, with the GCGM performing better than the GGM. The AUC values for the proposed method are all larger than those for the GCGM and GGM.

Table 1: Median MSE in estimating the precision matrix of dimension $30\times 30$ when the data are generated from VAR(1).

                    15% Non-zero               25% Non-zero
  Time points  Causal VAR  GCGM   GGM     Causal VAR  GCGM   GGM
  40               7.01    6.30  21.87       8.95    14.92  17.83
  60               5.37    6.46  34.33       7.18    10.68  34.10
Figure 1: ROC comparison for the different cases (Black = GCGM, Red = GGM, Green = Causal VAR) when the data are generated from a VAR(1) model.

5.2 Graph Structure in US GDP Components

For graph estimation, the simulation experiment illustrated the benefit of explicitly accounting for temporal dependence in the sample. Here we analyze the graphical association among components of the US GDP based on time series data on each of the components. The data are obtained from bea.gov, collected quarterly for the period 2010 to 2019 with a total of 40 time points. We chose this particular period to avoid large external shocks like the Great Recession and the COVID-19 pandemic. We study 14 components that are used in the computation of the aggregate GDP value. Specifically, we study associations between the following components: Durable Goods (Dura), Nondurable Goods (NonDur), Services (Serv), Structures (Strct), Equipment (Equip), Intellectual Property Products (IPP), Residential Products (Resid), Exports-Goods (Exp-G), Exports-Services (Exp-S), Imports-Goods (Imp-G), Imports-Services (Imp-S), National Defense (NatDef), Nondefense expenditure (NonDef), State and Local expenditures (St-Lo). The time series for the components are reported independently. However, several of these variables share implicit dependencies. We apply our proposed model to study these dependencies.

The graphical structure is established from the partial correlation graph obtained by estimating the stationary precision matrix. Figure 2 shows the estimated graphical dependence based on the estimated partial correlations with absolute value greater than 0.15. Thresholding the partial correlations at 0.15 to determine active edges leads to 18 connections, about a fifth of the 91 possible edges. To check the sensitivity of the estimated graph to this choice, the threshold was varied over values ranging from 0.05 to 0.5.

Figure 2: The estimated graphical association among 14 macro-economic variables governing US GDP (threshold 0.15).
Table 2: Connectivity by thresholding partial correlations at 0.15; connections are denoted by ‘*’.
Dura Nondur Serv Strct Equip IPP Resid Exp-G Exp-S Imp-G Imp-S NatDef NonDef St-Lo
Dura * *
NonDur *
Serv *
Strct * *
Equip * * * *
IPP * *
Resid * * * * *
Exp-G * * * * *
Exp-S * * * * *
Imp-G *
Imp-S * * *
NatDef * *
NonDef * *
St-Lo *

To interpret the estimated network, we compute graph-theoretic nodal attributes. Specifically, we consider 1) Betweenness centrality, 2) Node impact, 3) Degree, 4) Participation coefficient, 5) Efficiency, 6) Average shortest path length, 7) Shortest path eccentricity, and 8) Leverage. For completeness, we provide short descriptions of these attributes in Section X of the supplementary materials.
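Several of these attributes are standard and can be computed directly from the thresholded adjacency matrix; a minimal sketch using networkx follows. The more specialized measures (node impact, participation coefficient, efficiency, leverage) come from dedicated toolboxes and are omitted; `adj` and `names` are assumed inputs, and the graph in Figure 2 is connected, which the path-based measures require.

```python
# Standard nodal attributes of the thresholded partial-correlation graph.
import networkx as nx
import numpy as np

def nodal_attributes(adj, names):
    """Betweenness, degree, eccentricity, and average shortest path length
    for each node; adj is a boolean adjacency matrix, names the node labels."""
    G = nx.relabel_nodes(nx.from_numpy_array(adj.astype(int)),
                         dict(enumerate(names)))
    return {
        "betweenness": nx.betweenness_centrality(G, normalized=False),
        "degree": dict(G.degree()),
        "eccentricity": nx.eccentricity(G),   # requires a connected graph
        "avg_shortest_path": {
            v: np.mean([l for u, l in
                        nx.shortest_path_length(G, source=v).items() if u != v])
            for v in G
        },
    }
```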

From the attributes for the 14 GDP components, equipment, residential products, and the export/import goods and services components show high betweenness centrality and degree. These nodes appear to sit deep in the graph. A better picture would emerge if the GDP components were studied at a more granular level in terms of their subcomponents; we plan to investigate this in the future.

Table 3: Graph attributes, computed based on the graphical association in Figure 2.
Dura Nondur Serv Strct Equip IPP Resid Exp-G Exp-S Imp-G Imp-S NatDef NonDef St-Lo
Betweenness 9.000 0.000 0.000 10.000 39.000 10.000 34.000 50.000 34.000 0.000 14.000 2.000 5.000 0.000
Node impact 0.027 -0.101 -0.062 0.066 -0.104 0.040 -0.201 -0.577 0.425 -0.101 0.168 -0.075 -0.024 -0.101
Degree 2.000 1.000 1.000 2.000 4.000 2.000 5.000 5.000 5.000 1.000 3.000 2.000 2.000 1.000
Participation coefficient 0.500 1.000 1.000 1.000 0.375 0.500 0.440 0.440 0.440 1.000 0.556 1.000 1.000 1.000
Efficiency 0.000 0.000 0.000 0.000 0.000 0.000 0.100 0.000 0.100 0.000 0.333 0.000 0.000 0.000
Avg shortest path length 2.071 2.786 2.571 2.071 1.714 2.000 1.929 1.929 1.643 2.786 1.857 2.643 2.357 2.786
Shortest path eccentricity 3.000 5.000 4.000 3.000 3.000 3.000 4.000 4.000 3.000 5.000 3.000 5.000 4.000 5.000
Leverage -0.381 -0.667 -0.600 -0.314 0.178 -0.429 0.355 0.460 0.244 -0.667 -0.100 -0.214 -0.214 -0.667

6 Discussion

In this article, we propose a new Bayesian method for estimating the stationary precision matrix of a high-dimensional VAR process under stability constraints and subsequently estimating the contemporaneous stationary graph for the components of the VAR process. The method has several natural advantages. The methodology introduces a parameterization that allows fast computation of the stationary likelihood of a reduced rank high-dimensional Gaussian VAR process, a popular high-dimensional time series model. The new parameterization introduced here gives a natural way to directly model the sparse stationary precision matrix of the high-dimensional VAR process, which is the quantity needed to construct contemporaneous stationary graphs for the VAR time series. Most estimation methods for high-dimensional VAR processes fail to impose causality (or even stationarity) on the solution, thereby making estimation of the stationary precision matrix less meaningful. The proposed methodology uses causality as a hard constraint so that any estimate of the VAR process is restricted to the causal VAR space. We also show the posterior consistency of our Bayesian estimation scheme when priors are defined through the proposed parameterization. The focus of the present article is on the estimation of the stationary precision matrix of a high-dimensional stationary VAR. Hence, the direct parameterization of the precision, independent from other parameters involved in the temporal dynamics, is critical. However, a similar scheme can provide a direct parameterization of the VAR autocovariance matrices and, hence, the VAR coefficients under reduced rank and causality constraints for a high-dimensional VAR. This is left for future investigation. We also plan to investigate graph consistency for the partial correlation graph obtained by thresholding the entries of the stationary precision matrix of a high-dimensional VAR.

Funding

The authors gratefully acknowledge support from National Science Foundation collaborative research grants DMS-2210280 (Subhashis Ghosal), DMS-2210281 (Anindya Roy), and DMS-2210282 (Arkaprava Roy).

References

  • Basu et al. [2015] S. Basu, A. Shojaie, and G. Michailidis. Network Granger causality with inherent grouping structure. The Journal of Machine Learning Research, 16(1):417–453, 2015.
  • Basu and Rao [2023] Sumanta Basu and Suhasini Subba Rao. Graphical models for nonstationary time series. The Annals of Statistics, 51:1453–1483, 2023.
  • Bhattacharya and Dunson [2011] Anirban Bhattacharya and David B Dunson. Sparse Bayesian infinite factor models. Biometrika, 98(2):291–306, 2011.
  • Chhikara [1988] Raj Chhikara. The Inverse Gaussian Distribution: Theory, Methodology, and Applications, volume 95. CRC Press, 1988.
  • Christensen [2018] Alexander P Christensen. NetworkToolbox: Methods and measures for brain, cognitive, and psychometric network analysis in R. The R Journal, pages 422–439, 2018. doi: 10.32614/RJ-2018-065.
  • Csárdi et al. [2024] Gábor Csárdi, Tamás Nepusz, Vincent Traag, Szabolcs Horvát, Fabio Zanini, Daniel Noom, and Kirill Müller. igraph: Network Analysis and Visualization in R, 2024. URL https://CRAN.R-project.org/package=igraph. R package version 2.0.3.
  • Dahlhaus [2000] R. Dahlhaus. Graphical interaction models for multivariate time series. Metrika, 51:157–172, 2000.
  • Fiecas et al. [2019] M. Fiecas, C. Leng, W. Liu, and Y Yu. Spectral analysis of high-dimensional time series. Electronic Journal of Statistics, 13(2):4079–4101, 2019.
  • Friedman et al. [2008] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
  • Ghosal and Van Der Vaart [2007] Subhashis Ghosal and Aad Van Der Vaart. Convergence rates of posterior distributions for noniid observations. The Annals of Statistics, 35(1):192–223, 2007.
  • Ghosal and van der Vaart [2017] Subhashis Ghosal and Aad van der Vaart. Fundamentals of Nonparametric Bayesian Inference, volume 44 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2017. ISBN 978-0-521-87826-5. doi: 10.1017/9781139029834. URL https://doi.org/10.1017/9781139029834.
  • Ghosh et al. [2019] Satyajit Ghosh, Kshitij Khare, and George Michailidis. High-dimensional posterior consistency in Bayesian vector autoregressive models. Journal of the American Statistical Association, 114(526):735–748, 2019.
  • Haario et al. [2001] Heikki Haario, Eero Saksman, and Johanna Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242, 2001.
  • Heaps [2023] Sarah E Heaps. Enforcing stationarity through the prior in vector autoregressions. Journal of Computational and Graphical Statistics, 32(1):74–83, 2023.
  • Jeong and Ghosal [2021] Seonghyun Jeong and Subhashis Ghosal. Unified bayesian theory of sparse linear regression with nuisance parameters. Electronic Journal of Statistics, 15(1):3040–3111, 2021.
  • Joyce et al. [2010] Karen E Joyce, Paul J Laurienti, Jonathan H Burdette, and Satoru Hayasaka. A new measure of centrality for brain networks. PloS one, 5:e12200, 2010.
  • Jung et al. [2015] A. Jung, G. Hannak, and N. Goertz. Graphical lasso based model selection for time series. IEEE Signal Processing Letters, 22(10):1781–1785, 2015.
  • Kenett et al. [2011] Yoed N Kenett, Dror Y Kenett, Eshel Ben-Jacob, and Miriam Faust. Global and local features of semantic networks: Evidence from the Hebrew mental lexicon. PloS one, 6:e23912, 2011.
  • Koller and Friedman [2009] Daphne Koller and Nir Friedman. Probabilistic Graphical Models - Principles and Techniques. MIT Press, 2009.
  • Lauritzen [1996] Steffen L. Lauritzen. Graphical Models. Oxford, 1996.
  • Lütkepohl [2005] Helmut Lütkepohl. New Introduction to Multiple Time Series Analysis. Springer Science & Business Media, 2005.
  • Ma and Michailidis [2016] Jing Ma and George Michailidis. Joint structural estimation of multiple graphical models. The Journal of Machine Learning Research, 17:5777–5824, 2016.
  • Maathuis et al. [2019] Marloes Maathuis, Mathias Drton, Steffen L. Lauritzen, and Martin Wainwright, editors. Handbook of Graphical Models. Handbooks of Modern Statistical Methods. CRC Press, 2019.
  • Ning et al. [2020] Bo Ning, Seonghyun Jeong, and Subhashis Ghosal. Bayesian linear regression for multivariate responses under group sparsity. Bernoulli, 26(3):2353–2382, 2020.
  • Pons and Latapy [2005] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In International symposium on computer and information sciences, pages 284–293. Springer, 2005.
  • Qiu et al. [2016] Huitong Qiu, Fang Han, Han Liu, and Brian Caffo. Joint estimation of multiple graphical models from high dimensional time series. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(2):487–504, 2016.
  • Roy et al. [2019] Anindya Roy, Tucker S Mcelroy, and Peter Linton. Constrained estimation of causal invertible VARMA. Statistica Sinica, 29(1):455–478, 2019.
  • Roy et al. [2024] Arkaprava Roy, Anindya Roy, and Subhashis Ghosal. Bayesian inference for high-dimensional time series by latent process modeling. arXiv preprint arXiv:2403.04915, 2024.
  • Shi et al. [2021] Wenli Shi, Subhashis Ghosal, and Ryan Martin. Bayesian estimation of sparse precision matrices in the presence of Gaussian measurement error. Electronic Journal of Statistics, 15(2):4545–4579, 2021.
  • Watson [2020] Christopher G. Watson. brainGraph: Graph Theory Analysis of Brain MRI Data, 2020. R package version 3.0.0.
  • Zhang and Wu [2017] D. Zhang and W. B. Wu. Gaussian approximation for high dimensional time series. The Annals of Statistics, 45:1895–1919, 2017.

Supplementary materials

S1 Proof of the Main Theorems

We follow the general posterior contraction rate result, Theorem 8.19 of Ghosal and van der Vaart [2017], applied to the joint density of the entire multivariate time series, by directly constructing a likelihood ratio test satisfying a condition like (8.17) of Ghosal and van der Vaart [2017]. We also use the simplified prior concentration condition (8.22) and the global entropy condition (8.23) of Ghosal and van der Vaart [2017]. It is more convenient to give direct arguments than to fit our setting within the notation of that theorem.

A.1 Prior concentration: pre-rate

Recall that the true parameter is denoted by 𝜻0\bm{\zeta}_{0}, and the corresponding dispersion matrix is 𝚼T,0\bm{\Upsilon}_{T,0}. We shall obtain the pre-rate ϵ¯T\bar{\epsilon}_{T} satisfying the relation

logΠ(K(Q𝜻0,Q𝜻)Tϵ¯T2,V(Q𝜻0,Q𝜻)Tϵ¯T2)Tϵ¯T2,\displaystyle-\log\Pi(K(Q_{\bm{\zeta}_{0}},Q_{\bm{\zeta}})\leq T\bar{\epsilon}_{T}^{2},V(Q_{\bm{\zeta}_{0}},Q_{\bm{\zeta}})\leq T\bar{\epsilon}_{T}^{2})\lesssim T\bar{\epsilon}_{T}^{2}, (A.1)

where KK and VV, respectively, stand for the Kullback-Leibler divergence and Kullback-Leibler variation. To proceed, we show that the above event contains {𝜻:𝜻𝜻0ηT}\{\bm{\zeta}:\|\bm{\zeta}-\bm{\zeta}_{0}\|_{\infty}\leq\eta_{T}\} for some ηT\eta_{T} and estimate the probability of the latter, when ηT\eta_{T} is small.

We note that by Relation (iii) of Lemma 5, $\|\bm{\Upsilon}_{T,0}\|^2_{\mathrm{op}}\leq\|\bm{\Sigma}_0\|^2_{\mathrm{op}}$, while by Relation (i), $\|\bm{\Upsilon}_{T,0}^{-1}\|^2_{\mathrm{op}}\leq\|\bm{\Sigma}_0^{-1}\|^2_{\mathrm{op}}$; both are bounded by a constant by Condition (A5). Next, observe that by Relation (iii) applied to $\bm{\Upsilon}_T$, $\|\bm{\Upsilon}_T\|_{\mathrm{op}}\leq\|\bm{\Sigma}\|_{\mathrm{op}}\leq\|\bm{\Omega}^{-1}\|_{\mathrm{op}}=\|1/\bm{f}\|_\infty$, where $1/\bm{f}$ denotes the vector of reciprocals of the entries of $\bm{f}$. Since the entries of $\bm{f}_0$ lie between two fixed positive numbers, so do the entries of $\bm{f}$ and $1/\bm{f}$ when $\|\bm{f}-\bm{f}_0\|_\infty$ is small. It is then immediate (see, e.g., Lemma 4 of Roy et al. [2024]) that $\|\bm{\Omega}-\bm{\Omega}_0\|_{\mathrm{F}}\lesssim d\eta$. Applying Lemma 8, we obtain $\|\bm{\Upsilon}_{T,0}-\bm{\Upsilon}_T\|_{\mathrm{F}}\lesssim drT^2\eta$, which is small for $\eta$ sufficiently small. Thus $\|\bm{\Upsilon}_{T,0}^{-1/2}(\bm{\Upsilon}_{T,0}-\bm{\Upsilon}_T)\bm{\Upsilon}_{T,0}^{-1/2}\|_{\mathrm{F}}$ is also small. Since the operator norm is weaker than the Frobenius norm, all eigenvalues of $\bm{\Upsilon}_{T,0}^{-1/2}(\bm{\Upsilon}_{T,0}-\bm{\Upsilon}_T)\bm{\Upsilon}_{T,0}^{-1/2}$ are then close to zero. Now, expanding the estimates in Parts (ii) and (iii) of Lemma 4 in a Taylor series, the Kullback-Leibler divergence and variation between $Q_{\bm{\zeta}_0}$ and $Q_{\bm{\zeta}}$ are bounded by a multiple of $\|\bm{\Upsilon}_{T,0}^{-1/2}(\bm{\Upsilon}_{T,0}-\bm{\Upsilon}_T)\bm{\Upsilon}_{T,0}^{-1/2}\|_{\mathrm{F}}^2\lesssim d^2r^2T^4\eta^2\lesssim T\bar{\epsilon}_T^2$ if $\eta$ is chosen to be $d^{-1}T^{-3/2}\bar{\epsilon}_T$. The prior probability of $\{\|\bm{\zeta}-\bm{\zeta}_0\|_\infty\leq\eta\}$ is, in view of the assumed a priori independence of all components of $\bm{\zeta}$, bounded below by $(\bar{c}\eta)^{\mathrm{dim}(\bm{\zeta})}$ for some constant $\bar{c}>0$. Hence $-\log\Pi(\|\bm{\zeta}-\bm{\zeta}_0\|_\infty\leq\eta)\lesssim(d^2+d+2pdr)\log(1/\eta)\lesssim d^2\log(1/\eta)$. Thus (A.1) holds for $T\bar{\epsilon}_T^2\asymp d^2\log T$, that is, for $\bar{\epsilon}_T\asymp d\sqrt{(\log T)/T}$, or any larger sequence.

A.2 Test construction

Recall that the true parameter is denoted by 𝜻0\bm{\zeta}_{0}, and the corresponding dispersion matrix is 𝚼T,0\bm{\Upsilon}_{T,0}. Let 𝜻1\bm{\zeta}_{1} be another point in the parameter space such that the corresponding dispersion matrix 𝚼T,1\bm{\Upsilon}_{T,1} satisfies 𝚼T,1𝚼T,0F>TϵT\|\bm{\Upsilon}_{T,1}-\bm{\Upsilon}_{T,0}\|_{\mathrm{F}}>\sqrt{T}\epsilon_{T}, where ϵT=Mϵ¯T\epsilon_{T}=M\bar{\epsilon}_{T} is a large constant multiple of the pre-rate ϵ¯T\bar{\epsilon}_{T} obtained in Subsection A.1. We first obtain a bound for the type I and type II error probabilities for testing the hypothesis 𝜻=𝜻0\bm{\zeta}=\bm{\zeta}_{0} against 𝜻=𝜻1\bm{\zeta}=\bm{\zeta}_{1}.

Let $\phi_T=\mathbbm{1}\{Q_{\bm{\zeta}_1}/Q_{\bm{\zeta}_0}>1\}$ stand for the likelihood ratio test. Then, by the Markov inequality applied to the square root of the likelihood ratio, both error probabilities are bounded by $e^{-R(Q_{\bm{\zeta}_1},Q_{\bm{\zeta}_0})}$, where $R$ stands for the Rényi divergence $-\log\int\sqrt{Q_{\bm{\zeta}_1}Q_{\bm{\zeta}_0}}$.
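For completeness, the one-line argument behind this bound is the following: applying the Markov inequality to the square root of the likelihood ratio under $Q_{\bm{\zeta}_0}$ (and symmetrically under $Q_{\bm{\zeta}_1}$ for the type II error),

$\mathbb{P}_{\bm{\zeta}_0}(\phi_T=1)=\mathbb{P}_{\bm{\zeta}_0}\big(\sqrt{Q_{\bm{\zeta}_1}/Q_{\bm{\zeta}_0}}>1\big)\leq\int\sqrt{Q_{\bm{\zeta}_1}/Q_{\bm{\zeta}_0}}\,\mathrm{d}Q_{\bm{\zeta}_0}=\int\sqrt{Q_{\bm{\zeta}_1}Q_{\bm{\zeta}_0}}=e^{-R(Q_{\bm{\zeta}_1},Q_{\bm{\zeta}_0})}.$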

Let $\rho_1,\ldots,\rho_{dT}$ stand for the eigenvalues of $\bm{\Upsilon}_{T,0}^{-1/2}(\bm{\Upsilon}_{T,1}-\bm{\Upsilon}_{T,0})\bm{\Upsilon}_{T,0}^{-1/2}$. By Lemma 4, the Rényi divergence is given by $\frac{1}{4}\sum_{j=1}^{dT}[2\log(1+\rho_j/2)-\log(1+\rho_j)]$. By arguing as in the proof of Lemma 1 of Roy et al. [2024], each term inside the sum can be bounded below by $c_2\min(\rho_j^2,c_1)$ for some constants $c_1,c_2>0$, where $c_1$ can be chosen as large as we like at the expense of making $c_2$ appropriately smaller. Under Condition (A4), using Relation (iii) of Lemma 5, we obtain that the $\rho_j$ are bounded by some constant not changing with $T$. Thus, with a sufficiently large $c_1$, the minimum operation in the estimate is redundant; that is, the Rényi divergence is bounded below by $\frac{1}{4}c_2\|\bm{\Upsilon}_{T,0}^{-1/2}(\bm{\Upsilon}_{T,1}-\bm{\Upsilon}_{T,0})\bm{\Upsilon}_{T,0}^{-1/2}\|_{\mathrm{F}}^2\geq c_2'T\epsilon_T^2$ for some constant $c_2'>0$. This follows since $\|\bm{\Upsilon}_{T,0}^{-1/2}(\bm{\Upsilon}_{T,1}-\bm{\Upsilon}_{T,0})\bm{\Upsilon}_{T,0}^{-1/2}\|_{\mathrm{F}}\geq\|\bm{\Upsilon}_{T,1}-\bm{\Upsilon}_{T,0}\|_{\mathrm{F}}/\|\bm{\Upsilon}_{T,0}\|_{\mathrm{op}}$ and, by Relation (iii) of Lemma 5, $\|\bm{\Upsilon}_{T,0}\|^2_{\mathrm{op}}\leq\|\bm{\Omega}_0^{-1}\|^2_{\mathrm{op}}$, which is bounded by a constant by Condition (A5). Thus both error probabilities are bounded by $e^{-c_2'T\epsilon_T^2}$.

Now let 𝜻2\bm{\zeta}_{2} be another parameter with the associated dispersion matrix 𝚼T,2\bm{\Upsilon}_{T,2} such that 𝚼T,2𝚼T,1F<δT\|\bm{\Upsilon}_{T,2}-\bm{\Upsilon}_{T,1}\|_{\mathrm{F}}<\delta_{T}, where δT\delta_{T} is to be chosen sufficiently small. Then, using the Cauchy-Schwarz inequality and Part (iv) of Lemma 4, the probability of type II error of ϕT\phi_{T} at 𝜻2\bm{\zeta}_{2} is bounded by

$\mathbb{E}_{\bm{\zeta}_2}(1-\phi_T)\leq[\mathbb{E}_{\bm{\zeta}_1}(1-\phi_T)]^{1/2}\,[\mathbb{E}_{\bm{\zeta}_1}(Q_{\bm{\zeta}_2}/Q_{\bm{\zeta}_1})^2]^{1/2}$
exp{c2TϵT2/2+𝚼T,11op2𝚼T,2𝚼T,1F2/2}.\displaystyle\leq\exp\{-c_{2}^{\prime}T\epsilon_{T}^{2}/2+\|\bm{\Upsilon}_{T,1}^{-1}\|_{\mathrm{op}}^{2}\|\bm{\Upsilon}_{T,2}-\bm{\Upsilon}_{T,1}\|_{\mathrm{F}}^{2}/2\}. (A.2)

Consider a sieve $\mathcal{F}_T$ consisting of all alternative parameter points $\bm{\zeta}$ such that $\|\bm{f}\|_\infty,\|1/\bm{f}\|_\infty\leq C(T\bar{\epsilon}_T^2)^{1/\gamma}$, $\max(\|\bm{L}_1\|_\infty,\ldots,\|\bm{L}_p\|_\infty)\leq C(T\bar{\epsilon}_T^2)^{1/\gamma}$, $\max(\|\bm{K}_1\|_\infty,\ldots,\|\bm{K}_p\|_\infty)\leq C(T\bar{\epsilon}_T^2)^{1/\gamma}$, and $\|\bm{E}\|_\infty\leq C(T\bar{\epsilon}_T^2)^{1/\gamma}$ for some constant $C$ to be chosen later, where $\gamma$ is as in Condition (A2). By Part (i) of Lemma 5, for $\bm{\zeta}_1\in\mathcal{F}_T$,

𝚼T,11op𝚺11op𝛀1op+j=1p𝑳j,1𝑳j,1Top𝒇1+j=1ptr(𝑳j,1𝑳j,1T),\|\bm{\Upsilon}_{T,1}^{-1}\|_{\mathrm{op}}\leq\|\bm{\Sigma}_{1}^{-1}\|_{\mathrm{op}}\leq\|\bm{\Omega}_{1}\|_{\mathrm{op}}+\sum_{j=1}^{p}\|\bm{L}_{j,1}\bm{L}_{j,1}^{\rm T}\|_{\mathrm{op}}\leq\|\bm{f}_{1}\|_{\infty}+\sum_{j=1}^{p}\mathrm{tr}(\bm{L}_{j,1}\bm{L}_{j,1}^{\rm T}),

which is bounded by 𝒇1+pdrmaxj(𝑳j,12)C(Tϵ¯T2)1/γ+pdr(Tϵ¯T2)2/γCdr(Tϵ¯T2)2/γ\|\bm{f}_{1}\|_{\infty}+pdr\max_{j}(\|\bm{L}_{j,1}\|_{\infty}^{2})\leq C({T}\bar{\epsilon}_{T}^{2})^{1/\gamma}+pdr(T\bar{\epsilon}_{T}^{2})^{2/\gamma}\leq C^{\prime}dr(T\bar{\epsilon}_{T}^{2})^{2/\gamma} for some constant C>0C^{\prime}>0 as pp is fixed and Tϵ¯T2T\bar{\epsilon}_{T}^{2}\to\infty. Thus the expression in (A.2) is bounded by exp[c2TϵT2+(Cdr)2(Tϵ¯T2)4/γδT2]exp[c3TϵT2]\exp[-c_{2}T\epsilon_{T}^{2}+(C^{\prime}dr)^{2}(T\bar{\epsilon}_{T}^{2})^{4/\gamma}\delta_{T}^{2}]\leq\exp[-c_{3}T\epsilon_{T}^{2}] for some constant c3>0c_{3}>0 by choosing δT\delta_{T} a sufficiently small constant multiple of (Cdr)1(Tϵ¯T2)1/22/γ(C^{\prime}dr)^{-1}(T\bar{\epsilon}_{T}^{2})^{1/2-2/\gamma} and M>0M>0 large enough.

By Lemma 9, within the sieve we have $O_T=G_T\lesssim(T\bar{\epsilon}_T^2)^{2/\gamma}$ and $C_T=C^2dr(T\bar{\epsilon}_T^2)^{2/\gamma}$. Then $M_{\ell,U}\lesssim d(T\bar{\epsilon}_T^2)^{4/\gamma}$, $M_{\ell,V}\lesssim d(T\bar{\epsilon}_T^2)^{2/\gamma}$, and $M_{\ell,C,U,V}\lesssim d^p(T\bar{\epsilon}_T^2)^{8p/\gamma}$. Consequently $M_{P,1},M_{P,2}\lesssim d^{3+p}(T\bar{\epsilon}_T^2)^{(8+8p)/\gamma}$. Thus,

$\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}\leq T^3\zeta_T^2(T\bar{\epsilon}_T^2)^{2/\gamma}d^{3+p}(T\bar{\epsilon}_T^2)^{(8+8p)/\gamma}=\delta_T^2,$

whenever $\{\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{F}},\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_{\mathrm{F}},\|\bm{L}_{1,k}-\bm{L}_{2,k}\|^2_{\mathrm{F}}\}\lesssim\zeta_T^2$.

We have: 1) $\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_{\mathrm{F}}\leq dr\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_\infty$; 2) $\|\bm{L}_{1,j}-\bm{L}_{2,j}\|^2_{\mathrm{F}}\leq dr\|\bm{L}_{1,j}-\bm{L}_{2,j}\|^2_\infty$; and 3) $\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{F}}\leq\|\bm{f}_1-\bm{f}_2\|^2_{\mathrm{F}}+2\|\bm{f}_1\|_\infty\|\bm{E}_1-\bm{E}_2\|_{\mathrm{F}}^2\leq d\|\bm{f}_1-\bm{f}_2\|^2_\infty+2\|\bm{f}_1\|_\infty\|\bm{E}_1\|_0\|\bm{E}_1-\bm{E}_2\|_\infty^2$.

Using 1), 2), and 3) above, we see that i) $\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_\infty,\|\bm{L}_{1,j}-\bm{L}_{2,j}\|^2_\infty\leq\zeta_T^2/dr$, ii) $\|\bm{f}_1-\bm{f}_2\|^2_\infty\leq\zeta_T^2/d$, and iii) $\|\bm{E}_1-\bm{E}_2\|_\infty^2\leq\max\{(T\bar{\epsilon}_T^2)^{-2/\gamma}\log d,1/d^2\}\zeta_T^2$ together imply $\{\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{F}},\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_{\mathrm{F}},\|\bm{L}_{1,k}-\bm{L}_{2,k}\|^2_{\mathrm{F}}\}\lesssim\zeta_T^2$. In view of the above display, we set $\zeta_T^2=\delta_T^2T^{-3}(T\bar{\epsilon}_T^2)^{-2/\gamma}d^{-3-p}(T\bar{\epsilon}_T^2)^{-(8+8p)/\gamma}$. Thus, at $\bm{\zeta}_2\in\mathcal{F}_T$ with $\|\bm{\Upsilon}_{T,2}-\bm{\Upsilon}_{T,1}\|_{\mathrm{F}}<\delta_T$, we have ${\mathbb{E}}_{\bm{\zeta}_2}(1-\phi_T)\leq\exp[-c_3T\epsilon_T^2]$.

A.3 Rest of the proof of Theorem 2

To show that the posterior probability of $\{\bm{\zeta}:\|\bm{\Upsilon}_T-\bm{\Upsilon}_{T,0}\|_{\mathrm{F}}>\sqrt{T}\epsilon_T\}$ converges to zero in probability, it remains to show that the prior probability of the complement of the sieve is exponentially small: $\Pi(\mathcal{F}_T^c)\leq e^{-c'T\bar{\epsilon}_T^2}$ for some sufficiently large $c'>0$.

Observe that

$\Pi(\mathcal{F}_T^c)\leq\sum_{j=1}^d\Pi(f_j>C(T\bar{\epsilon}_T^2)^{1/\gamma})+\sum_{k=1}^p\sum_{j=1}^d\sum_{l=1}^r\Pi(|L_{k,jl}|>C(T\bar{\epsilon}_T^2)^{1/\gamma})+\sum_{k=1}^p\sum_{j=1}^d\sum_{l=1}^r\Pi(|K_{k,jl}|>C(T\bar{\epsilon}_T^2)^{1/\gamma})\leq dC_0e^{-c_0C^\gamma T\bar{\epsilon}_T^2}+2pdrC_0e^{-c_0C^\gamma T\bar{\epsilon}_T^2}.$

Since $\log d\ll T\bar{\epsilon}_T^2$, the above expression is bounded by $B_0e^{-b_0T\epsilon_T^2}$ for some $B_0,b_0>0$, and $b_0$ can be chosen as large as we please by making $C$ in the definition of the sieve $\mathcal{F}_T$ large enough. Hence, by arguing as in the proof of Theorem 8.19 of Ghosal and van der Vaart [2017], it follows that the posterior contraction rate for $\bm{\Upsilon}_T$ at $\bm{\Upsilon}_{T,0}$ in terms of the Frobenius distance is $\sqrt{T}\cdot d\sqrt{(\log T)/T}=d\sqrt{\log T}$.

Let the collection of such $\bm{\zeta}_2$ with $\|\bm{\Upsilon}_{T,2}-\bm{\Upsilon}_{T,1}\|_{\mathrm{F}}<\delta_T$ be denoted by $\mathcal{B}_T=\{\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_\infty,\|\bm{L}_{1,j}-\bm{L}_{2,j}\|^2_\infty\leq\zeta_T^2/dr,\|\bm{f}_1-\bm{f}_2\|^2_\infty\leq\zeta_T^2/d,\|\bm{E}_1-\bm{E}_2\|_\infty^2\leq\zeta_T^2/d^2\}$, where $\zeta_T^2=\delta_T^2T^{-3}(T\bar{\epsilon}_T^2)^{-2/\gamma}d^{-3-p}(T\bar{\epsilon}_T^2)^{-(8+8p)/\gamma}$. The number of such sets $\mathcal{B}_T$ needed to cover $\mathcal{F}_T$ is at most $N_T\leq T^B$, following arguments similar to those in Lemma 8 of Roy et al. [2024] and choosing $\delta_T$ to be a negative power of $T$. Therefore $\log N_T\lesssim T\epsilon_T^2$. For each $\mathcal{B}_T$ containing a point $\bm{\zeta}_1$ such that $\|\bm{\Upsilon}_{T,1}-\bm{\Upsilon}_{T,0}\|_{\mathrm{F}}>\sqrt{T}\epsilon_T$, we choose a representative and construct the likelihood ratio test $\phi_T$. The final test for testing the hypothesis $\bm{\zeta}=\bm{\zeta}_0$ against the alternative $\{\bm{\zeta}:\|\bm{\Upsilon}_T-\bm{\Upsilon}_{T,0}\|_{\mathrm{F}}>\sqrt{T}\epsilon_T\}$ is the maximum of these tests. The resulting test has probability of type II error bounded by $\exp[-c_3T\epsilon_T^2]$, while the probability of type I error is bounded by $\exp[\log N_T-c_2'T\epsilon_T^2]\leq\exp[-c_2'T\epsilon_T^2/2]$ if we choose $M>0$ sufficiently large.

Finally, we convert the notion of convergence to those for $\bm{\Omega}$ and $\bm{A}_1,\ldots,\bm{A}_p$. Recall that the eigenvalues of $\bm{\Upsilon}_{T,0}$ are bounded between two fixed positive numbers, because by Lemma 5, $\|\bm{\Upsilon}_{T,0}\|_{\mathrm{op}}\leq\|\bm{\Omega}_0^{-1}\|_{\mathrm{op}}$ and $\|\bm{\Upsilon}_{T,0}^{-1}\|_{\mathrm{op}}\leq\|\bm{\Sigma}_0^{-1}\|_{\mathrm{op}}$, and $\bm{\Sigma}_0$ and $\bm{\Omega}_0$ are assumed to have eigenvalues bounded between two fixed positive numbers. Now, applying Lemmas 6 and 7 respectively, $\|\bm{\Omega}-\bm{\Omega}_0\|_{\mathrm{F}}^2\leq C_1T^{-1}\|\bm{\Upsilon}_T-\bm{\Upsilon}_{T,0}\|_{\mathrm{F}}^2$ and, for all $j=1,\ldots,p$, $\|\bm{A}_j-\bm{A}_{j,0}\|_{\mathrm{F}}^2\leq C_1T^{-1}\|\bm{\Upsilon}_T-\bm{\Upsilon}_{T,0}\|_{\mathrm{F}}^2$ for some constant $C_1>0$. These relations yield the rate $d\sqrt{(\log T)/T}$ for the precision matrix $\bm{\Omega}$ and all regression coefficients at their respective true values in terms of the Frobenius distance.

A.4 Proof of Theorem 3

When $\bm{\Omega}_0$ is sparse and the prior for $\bm{\Omega}$ imposes sparsity as described in Section 3.4, we show the improved rate $\sqrt{(d+s)(\log T)/T}$. To this end, we refine the estimate of the prior concentration in Subsection A.1. The arguments used for the test construction, for bounding the prior probability of the complement of the sieve, and for converting the rate result in terms of the Frobenius distance on $\bm{\Omega}$ and $\bm{R}_1,\ldots,\bm{R}_p$ remain the same. The only change is that, to control the number of nonzero entries in $\bm{E}$, we add $\|\bm{E}\|_0\leq C(T\bar{\epsilon}_T^2)^{1/\gamma}/\log d$ to the sieve and let $\mathcal{B}_T=\{\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_\infty,\|\bm{L}_{1,j}-\bm{L}_{2,j}\|^2_\infty\leq\zeta_T^2/dr,\|\bm{f}_1-\bm{f}_2\|^2_\infty\leq\zeta_T^2/d,\|\bm{E}_1-\bm{E}_2\|_\infty^2\leq(T\bar{\epsilon}_T^2)^{-2/\gamma}(\log d)\zeta_T^2\}$. The covering number calculation remains identical to that in Lemma 8 of Roy et al. [2024].

Since $r$ is assumed to be a fixed constant, $-\log\Pi(\|\bm{L}_j-\bm{L}_{j,0}\|\leq\eta)\lesssim d\log(1/\eta)$, $j=1,\ldots,p$, and so is the corresponding estimate for $\bm{K}_1,\ldots,\bm{K}_p$. For $\bm{f}$, the corresponding estimate $d\log(1/\eta)$ does not change from the previous scenario. However, $\bm{E}_0$ now has $s$ nonzero entries instead of possibly $\binom{d}{2}$. The estimate for $-\log\Pi(\|\bm{E}-\bm{E}_0\|_\infty\leq\eta)$ under the shrinkage prior improves to a multiple of $(d+s)\log(1/\eta)$ by the arguments given in the proof of Theorem 4.2 of Shi et al. [2021]. Thus $-\log\Pi(\|\bm{\zeta}-\bm{\zeta}_0\|_\infty\leq\eta)\lesssim(d+s)\log(1/\eta)$, leading to the asserted pre-rate $\bar{\epsilon}_T=\sqrt{(d+s)(\log T)/T}$. We also note that the additional sparsity imposed on $\bm{L}_1,\ldots,\bm{L}_p$ through the prior does not further improve the rate when $r$ is a fixed constant, and it may be replaced by a fully non-singular prior.

S2 Auxiliary lemmas and proofs

The following lemma expresses certain divergence measures in the family of centered multivariate normal distributions. The proofs follow from direct calculations.

Lemma 4.

Let f1f_{1} and f2f_{2} be probability densities of kk-dimensional normal distributions with mean zero and dispersion matrices 𝚫1\bm{\Delta}_{1} and 𝚫2\bm{\Delta}_{2} respectively. Let 𝐑=𝚫11/2(𝚫2𝚫1)𝚫11/2\bm{R}=\bm{\Delta}_{1}^{-1/2}(\bm{\Delta}_{2}-\bm{\Delta}_{1})\bm{\Delta}_{1}^{-1/2} with eigenvalues 1<ρ1,,ρk<-1<\rho_{1},\ldots,\rho_{k}<\infty. Then the following assertions hold:

  1. (i)

The Rényi divergence $R(f_1,f_2)=-\log\int\sqrt{f_1f_2}$ is given by

    12logdet((𝚫1+𝚫2)/2)14logdet(𝚫1)14logdet(𝚫2)\displaystyle\frac{1}{2}\log\det((\bm{\Delta}_{1}+\bm{\Delta}_{2})/2)-\frac{1}{4}\log\det(\bm{\Delta}_{1})-\frac{1}{4}\log\det(\bm{\Delta}_{2})
    =12logdet(𝑰+𝑹/2)14logdet(𝑰+𝑹)\displaystyle=\frac{1}{2}\log\det(\bm{I}+\bm{R}/2)-\frac{1}{4}\log\det(\bm{I}+\bm{R})
    =14j=1k[2log(1+ρj/2)log(1+ρj)].\displaystyle=\frac{1}{4}\sum_{j=1}^{k}[2\log(1+\rho_{j}/2)-\log(1+\rho_{j})].
  2. (ii)

The Kullback-Leibler divergence $K(f_1,f_2)=\int f_1\log(f_1/f_2)$ is given by

$\frac{1}{2}\log\det(\bm{\Delta}_2)-\frac{1}{2}\log\det(\bm{\Delta}_1)+\frac{1}{2}\mathrm{tr}(\bm{\Delta}_2^{-1/2}(\bm{\Delta}_1-\bm{\Delta}_2)\bm{\Delta}_2^{-1/2})$
    =12logdet(𝑰+𝑹)+12tr((𝑰+𝑹)1𝑰)\displaystyle=\frac{1}{2}\log\det(\bm{I}+\bm{R})+\frac{1}{2}\mathrm{tr}((\bm{I}+\bm{R})^{-1}-\bm{I})
    =12j=1k[log(1+ρj)ρj/(1+ρj)].\displaystyle=\frac{1}{2}\sum_{j=1}^{k}[\log(1+\rho_{j})-\rho_{j}/(1+\rho_{j})].
  3. (iii)

The Kullback-Leibler variation $V(f_1,f_2)=\int f_1(\log(f_1/f_2)-K(f_1,f_2))^2$ is given by

    12tr((𝚫21/2(𝚫2𝚫1)𝚫21/2)2)=12tr[((𝑰+𝑹)1𝑰)2]=12j=1kρj2/(1+ρj)2.\displaystyle\frac{1}{2}\mathrm{tr}((\bm{\Delta}_{2}^{-1/2}(\bm{\Delta}_{2}-\bm{\Delta}_{1})\bm{\Delta}_{2}^{-1/2})^{2})=\frac{1}{2}\mathrm{tr}[((\bm{I}+\bm{R})^{-1}-\bm{I})^{2}]=\frac{1}{2}\sum_{j=1}^{k}\rho_{j}^{2}/(1+\rho_{j})^{2}.
  4. (iv)

If $2\bm{\Delta}_1-\bm{\Delta}_2$ is positive definite, then the expected squared likelihood ratio $\int(f_2/f_1)^2f_1$ is given by

$\frac{\det(\bm{\Delta}_1)}{\sqrt{\det(\bm{\Delta}_2)\det(2\bm{\Delta}_1-\bm{\Delta}_2)}}=\frac{1}{\sqrt{\det(\bm{I}+\bm{R})\det(\bm{I}-\bm{R})}}=\exp\Big[-\sum_{j=1}^k\log(1-\rho_j^2)/2\Big],$

which, when $\max_j\rho_j^2\leq1/2$, is bounded by

$\exp\Big[\sum_{j=1}^k\rho_j^2\Big]=\exp[\|\bm{R}\|_{\mathrm{F}}^2]\leq\exp[\|\bm{\Delta}_1^{-1}\|_{\mathrm{op}}^2\|\bm{\Delta}_2-\bm{\Delta}_1\|_{\mathrm{F}}^2].$
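As a quick numerical sanity check of these closed forms (not part of the proofs), the direct Gaussian Kullback-Leibler formula and the eigenvalue form in (ii) can be compared for random dispersion matrices:

```python
# Numerical check of Lemma 4 (ii): the KL divergence between N(0, D1) and
# N(0, D2), computed directly, matches the eigenvalue formula
# (1/2) * sum[log(1 + rho_j) - rho_j / (1 + rho_j)].
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
k = 5
B1, B2 = rng.normal(size=(k, k)), rng.normal(size=(k, k))
D1, D2 = B1 @ B1.T + np.eye(k), B2 @ B2.T + np.eye(k)   # two SPD dispersions

# Direct Gaussian KL: (1/2)[tr(D2^{-1} D1) - k + log det(D2) - log det(D1)].
kl_direct = 0.5 * (np.trace(np.linalg.solve(D2, D1)) - k
                   + np.linalg.slogdet(D2)[1] - np.linalg.slogdet(D1)[1])

# Eigenvalue form via R = D1^{-1/2} (D2 - D1) D1^{-1/2}.
S = np.real(sqrtm(np.linalg.inv(D1)))
rho = np.linalg.eigvalsh(S @ (D2 - D1) @ S)
kl_eigen = 0.5 * np.sum(np.log1p(rho) - rho / (1 + rho))

assert np.isclose(kl_direct, kl_eigen)   # agreement up to numerical error
```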
Lemma 5.

For all jj, we have

  • (i)

    𝚼j1op2𝑪p1op2\|\bm{\Upsilon}_{j}^{-1}\|^{2}_{\mathrm{op}}\leq\|\bm{C}_{p}^{-1}\|^{2}_{\mathrm{op}},

  • (ii)

    𝑫j1op2𝑪p1op2\|\bm{D}_{j}^{-1}\|^{2}_{\mathrm{op}}\leq\|\bm{C}_{p}^{-1}\|^{2}_{\mathrm{op}},

  • (iii)

    𝚼jop2𝚪(0)op2\|\bm{\Upsilon}_{j}\|^{2}_{\mathrm{op}}\leq\|\bm{\Gamma}(0)\|^{2}_{\mathrm{op}},

  • (iv)

    𝑫jop2𝚪(0)op2=𝛀1op2\|\bm{D}_{j}\|^{2}_{\mathrm{op}}\leq\|\bm{\Gamma}(0)\|^{2}_{\mathrm{op}}=\|\bm{\Omega}^{-1}\|^{2}_{\mathrm{op}},

  • (v)

$\|\bm{\Upsilon}_j^{-1}\|^2_{\mathrm{op}}\leq\max\{\|\bm{D}_j^{-1}\|^2_{\mathrm{op}},\|\bm{\Upsilon}_{j-1}^{-1}\|^2_{\mathrm{op}}\}\leq\|\bm{C}_p^{-1}\|^2_{\mathrm{op}}$

Lemma 6.

We have $\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{F}}\lesssim\frac{1}{T}\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$ whenever $\frac{1}{T}\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$ is sufficiently small.

Lemma 7.

If $\frac{1}{T}\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$ is sufficiently small, then $\|\bm{A}_1-\bm{A}_2\|^2_{\mathrm{F}}\lesssim\frac{1}{T}\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$, where $\bm{A}_\ell$ denotes the corresponding matrix of autoregressive coefficients.

The following lemma treats perturbations around the truth.

Lemma 8.

With {𝛀1𝛀0F2,𝐊1,j𝐊0,jF2,𝐋1,k𝐋0,kF2}ϵ2\{\|\bm{\Omega}_{1}-\bm{\Omega}_{0}\|^{2}_{\mathrm{F}},\|\bm{K}_{1,j}-\bm{K}_{0,j}\|^{2}_{\mathrm{F}},\|\bm{L}_{1,k}-\bm{L}_{0,k}\|^{2}_{\mathrm{F}}\}\lesssim\epsilon^{2}, we have

𝚼1,T𝚼0,TF2ϵ2T3,\displaystyle\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{0,T}\|^{2}_{\mathrm{F}}\lesssim\epsilon^{2}T^{3},

assuming fixed lower and upper bounds on the eigenvalues of $\bm{\Omega}_0$ and an upper bound on the singular values of $\bm{L}_{0,k}$.

Lemma 9.

With $\{\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{F}},\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_{\mathrm{F}},\|\bm{L}_{1,k}-\bm{L}_{2,k}\|^2_{\mathrm{F}}\}\lesssim\epsilon^2$, and $\|\bm{\Omega}_\ell\|^2_{\mathrm{op}}\leq O_T$, $\|\bm{\Upsilon}_\ell(0)\|^2_{\mathrm{op}}\leq G_T$, $\|\bm{L}_{\ell,k}\|^2_\infty\leq L_T$, and $\|\bm{C}_{\ell,p}^{-1}\|^2_{\mathrm{op}}\leq C_T=O_T+drL_T$ for $\ell=1,2$, and setting $M_{1,U}=G_TL_Tdr$, $M_{1,V}=C_T$, and $M_{1,C,U,V}=(1+C_T^2G_TL_Tdr)^p$, we have

𝚼1,T𝚼2,TF2ϵ2[T2MP,1+T3GTMP,2],\displaystyle\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^{2}_{\mathrm{F}}\leq\epsilon^{2}[T^{2}M_{P,1}+T^{3}G_{T}M_{P,2}],

where MP,1=[(p+1)+p3{M1,UM1,VM2,UM2,V+2CT(M1,V+M2,U)}+2GT{p4M1,UM1,VM1,C,U,V+2p2CTM2,VM1,VM1,C,U,V+2p2CTM2,UM1,VM1,C,U,V}]M_{P,1}=[(p+1)+p^{3}\{M_{1,U}M_{1,V}M_{2,U}M_{2,V}+2C_{T}(M_{1,V}+M_{2,U})\}+2G_{T}\{p^{4}M_{1,U}M_{1,V}M_{1,C,U,V}+2p^{2}C_{T}M_{2,V}M_{1,V}M_{1,C,U,V}+2p^{2}C_{T}M_{2,U}M_{1,V}M_{1,C,U,V}\}] and MP,2=[2p2(M1,UM1,VM1,C,U,V+p)+2pCTM2,VM1,VM1,C,U,V+2pCTM2,UM1,VM1,C,U,V]M_{P,2}=[2p^{2}(M_{1,U}M_{1,V}M_{1,C,U,V}+p)+2pC_{T}M_{2,V}M_{1,V}M_{1,C,U,V}+2pC_{T}M_{2,U}M_{1,V}M_{1,C,U,V}].

The proof is based on the next five lemmas and results.

Lemma 10.

We have

𝚼1,T𝚼2,TF2T2𝚼1,p1𝚼2,p1F2+T2(Tp+1)𝚼1(0)op2𝜿1,p𝚼1,p11𝜿2,p𝚼2,p11F2.\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^{2}_{\mathrm{F}}\leq T^{2}\|\bm{\Upsilon}_{1,p-1}-\bm{\Upsilon}_{2,p-1}\|^{2}_{F}+T^{2}(T-p+1)\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}\|\bm{\kappa}_{1,p}\bm{\Upsilon}_{1,p-1}^{-1}-\bm{\kappa}_{2,p}\bm{\Upsilon}_{2,p-1}^{-1}\|^{2}_{F}.
Result 11 (Auxiliary recurrences).

We have,

  1. 1.

    𝚼1,j𝚼2,jF2𝚼1,j1𝚼2,j1F2+𝑫1,j𝑫2,jF2+(𝚼1(0)op2+𝚼2(0)op2)𝚼1,j11𝜿1,jT𝚼2,j11𝜿2,jTF2\|\bm{\Upsilon}_{1,j}-\bm{\Upsilon}_{2,j}\|^{2}_{F}\leq\|\bm{\Upsilon}_{1,j-1}-\bm{\Upsilon}_{2,j-1}\|^{2}_{F}+\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}+(\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}+\|\bm{\Upsilon}_{2}(0)\|^{2}_{\mathrm{op}})\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}

  2. 2.

    𝑫1,j𝑫2,jF2𝑫1,j1𝑫2,j1F2+𝑾1,jop2𝑾2,jop2𝑪1,j11𝑪2,j11F2+(𝑪1,j11op2+𝑪2,j11op2)𝑾1,j𝑾2,jF2\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}\leq\|\bm{D}_{1,j-1}-\bm{D}_{2,j-1}\|^{2}_{F}+\|\bm{W}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{W}_{2,j}\|^{2}_{\mathrm{op}}\|\bm{C}_{1,j-1}^{-1}-\bm{C}_{2,j-1}^{-1}\|^{2}_{F}+(\|\bm{C}_{1,j-1}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,j-1}^{-1}\|^{2}_{\mathrm{op}})\|\bm{W}_{1,j}-\bm{W}_{2,j}\|^{2}_{F}

  3. 3.

    𝑾1,j𝑾2,jF2𝑼1,j𝑼2,jF2𝑽1,jop2+𝑽1,j𝑽2,jF2𝑼2,jop2\|\bm{W}_{1,j}-\bm{W}_{2,j}\|^{2}_{F}\leq\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^{2}_{F}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}}+\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}\|\bm{U}_{2,j}\|^{2}_{\mathrm{op}}

  4. 4.
    𝚼1,j11𝜿1,jT𝚼2,j11𝜿2,jTF2𝚼1,j21𝜿1,j1T𝚼2,j21𝜿2,j1TF2(1+𝑪1,j1op2𝑼1,jop2𝑽1,jop2)\displaystyle\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}\leq\|\bm{\Upsilon}_{1,j-2}^{-1}\bm{\kappa}_{1,j-1}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-2}^{-1}\bm{\kappa}_{2,j-1}^{\mathrm{T}}\|^{2}_{F}(1+\|\bm{C}_{1,j}^{-1}\|^{2}_{\mathrm{op}}\|\bm{U}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}})
    +2𝑪1,j1𝑼1,j𝑽1,j𝑪2,j1𝑼2,j𝑽2,jF2\displaystyle\quad+2\|\bm{C}_{1,j}^{-1}\bm{U}_{1,j}\bm{V}_{1,j}-\bm{C}_{2,j}^{-1}\bm{U}_{2,j}\bm{V}_{2,j}\|^{2}_{F}
Lemma 12 (Bounding first term of Lemma 10).

The following bounds hold in terms of $\bm{C}_k^{-1}=\bm{\Omega}+\sum_{j=1}^k\bm{L}_j\bm{L}_j^{\mathrm{T}}$ and the $\bm{U}_k$, $\bm{V}_k$:

𝚼1,p1𝚼2,p1F2\displaystyle\|\bm{\Upsilon}_{1,p-1}-\bm{\Upsilon}_{2,p-1}\|^{2}_{F}
𝚼1(0)𝚼2(0)F2+j=0p1𝑫1,j𝑫2,jF2+(𝚼1(0)op2+𝚼2(0)op2)j=0p1𝚼1,j11𝜿1,jT𝚼2,j11𝜿2,jTF2,\displaystyle\quad\leq\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+\sum_{j=0}^{p-1}\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}+(\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}+\|\bm{\Upsilon}_{2}(0)\|^{2}_{\mathrm{op}})\sum_{j=0}^{p-1}\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}, (A.1)
where
j=0p1𝑫1,j𝑫2,jF2\displaystyle\sum_{j=0}^{p-1}\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}
p𝚼1(0)𝚼2(0)F2+pk=1p[M1,UM1,VM2,UM2,V𝑪1,k11𝑪2,k11F2+(𝑪1,p1op2\displaystyle\quad\leq p\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+p\sum_{k=1}^{p}\big{[}M_{1,U}M_{1,V}M_{2,U}M_{2,V}\|\bm{C}_{1,k-1}^{-1}-\bm{C}_{2,k-1}^{-1}\|^{2}_{F}+(\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}
+𝑪2,p1op2)(𝑼1,k𝑼2,kF2M1,V+𝑽1,k𝑽2,kF2M2,U)]\displaystyle\quad+\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}})(\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^{2}_{F}M_{1,V}+\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^{2}_{F}M_{2,U})\big{]} (A.2)

where $M_{\ell,C,U,V}=(1+\|\bm{C}_{\ell,p}^{-1}\|^2_{\mathrm{op}}M_{\ell,U}M_{\ell,V})^p$ and $M_{\ell,U}=\max_j\|\bm{U}_{\ell,j}\|^2_{\mathrm{op}}$, $M_{\ell,V}=\max_j\|\bm{V}_{\ell,j}\|^2_{\mathrm{op}}$ for $\ell\in\{1,2\}$.

Lemma 13 (Bounding second term of Lemma 10).

The final bound for the autoregressive coefficients is

𝚼1,p11𝜿1,pT𝚼2,p11𝜿2,pTF2\displaystyle\|\bm{\Upsilon}_{1,p-1}^{-1}\bm{\kappa}_{1,p}^{\mathrm{T}}-\bm{\Upsilon}_{2,p-1}^{-1}\bm{\kappa}_{2,p}^{\mathrm{T}}\|^{2}_{F}
2M1,UM1,VM1,C,U,Vk=0p𝑪1,k1𝑪2,k1F2+2𝑪2,p1op2M2,VM1,VM1,C,U,Vk=1p𝑼1,k𝑼2,kF2\displaystyle\quad\leq 2M_{1,U}M_{1,V}M_{1,C,U,V}\sum_{k=0}^{p}\|\bm{C}^{-1}_{1,k}-\bm{C}^{-1}_{2,k}\|^{2}_{F}+2\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}}M_{2,V}M_{1,V}M_{1,C,U,V}\sum_{k=1}^{p}\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^{2}_{F}
$\qquad+2\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,U}M_{1,V}M_{1,C,U,V}\sum_{k=1}^p\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^2_{\mathrm{F}}.$ (A.3)

where $M_{\ell,U}=\max_j\|\bm{U}_{\ell,j}\|^2_{\mathrm{op}}$, $M_{\ell,V}=\max_j\|\bm{V}_{\ell,j}\|^2_{\mathrm{op}}$, and $M_{\ell,C,U,V}=(1+\|\bm{C}_{\ell,p}^{-1}\|^2_{\mathrm{op}}M_{\ell,U}M_{\ell,V})^p$.

The following steps transform the bounds in terms of the $\bm{U}_j$'s and $\bm{V}_j$'s into bounds in terms of the $\bm{L}_j$'s and $\bm{K}_j$'s.

Lemma 14.
$\|\bm{V}_{j,1}-\bm{V}_{j,2}\|^2_{\mathrm{F}}\leq r\|\bm{K}_{j,1}\|^2_{\mathrm{op}}\|(\bm{K}_{j,1}^{\mathrm{T}}\bm{D}_{j-1,1}^{-1}\bm{K}_{j,1})^{-1}\|_{\mathrm{op}}\|(\bm{K}_{j,2}^{\mathrm{T}}\bm{D}_{j-1,2}^{-1}\bm{K}_{j,2})^{-1}\|_{\mathrm{op}}\{\|\bm{K}_{j,1}\|_{\mathrm{op}}\|\bm{K}_{j,2}\|_{\mathrm{op}}\|\bm{D}_{j-1,1}^{-1}-\bm{D}_{j-1,2}^{-1}\|_{\mathrm{op}}+\|\bm{K}_{j,1}-\bm{K}_{j,2}\|^2_{\mathrm{op}}(\|\bm{D}_{j-1,1}^{-1}\|_{\mathrm{op}}\|\bm{K}_{j,1}\|_{\mathrm{op}}+\|\bm{D}_{j-1,2}^{-1}\|_{\mathrm{op}}\|\bm{K}_{j,2}\|_{\mathrm{op}})\}+r\|(\bm{K}_{j,2}^{\mathrm{T}}\bm{D}_{j-1,2}^{-1}\bm{K}_{j,2})^{-1/2}\|^2_{\mathrm{op}}\|\bm{K}_{j,1}-\bm{K}_{j,2}\|^2_{\mathrm{op}},$

$\|\bm{U}_{j,1}-\bm{U}_{j,2}\|^2_{\mathrm{op}}\leq\|\bm{C}_{j-1,1}\|^2_{\mathrm{op}}\|\bm{L}_{j,1}\|^2_{\mathrm{op}}\|(\bm{I}+\bm{L}^{\mathrm{T}}_{j,1}\bm{C}_{j-1,1}\bm{L}_{j,1})^{-1}\|_{\mathrm{op}}\|(\bm{I}+\bm{L}^{\mathrm{T}}_{j,2}\bm{C}_{j-1,2}\bm{L}_{j,2})^{-1}\|_{\mathrm{op}}\|\bm{L}^{\mathrm{T}}_{j,1}\bm{C}_{j-1,1}\bm{L}_{j,1}-\bm{L}^{\mathrm{T}}_{j,2}\bm{C}_{j-1,2}\bm{L}_{j,2}\|_{\mathrm{op}}+\|(\bm{I}+\bm{L}^{\mathrm{T}}_{j,2}\bm{C}_{j-1,2}\bm{L}_{j,2})^{-1/2}\|^2_{\mathrm{op}}\|\bm{C}_{j-1,1}\bm{L}_{j,1}-\bm{C}_{j-1,2}\bm{L}_{j,2}\|^2_{\mathrm{op}}.$

A.1 Proofs of the Auxiliary Lemmas and Results

Proof of Lemma 5.

Since, for a partitioned symmetric matrix,

$\bm{M}=\begin{bmatrix}\bm{A}&\bm{C}^{\mathrm{T}}\\ \bm{C}&\bm{B}\end{bmatrix}=\begin{bmatrix}\bm{I}_p&\bm{0}\\ \bm{C}\bm{A}^{-1}&\bm{I}_q\end{bmatrix}\begin{bmatrix}\bm{A}&\bm{0}\\ \bm{0}&\bm{B}-\bm{C}\bm{A}^{-1}\bm{C}^{\mathrm{T}}\end{bmatrix}\begin{bmatrix}\bm{I}_p&\bm{A}^{-1}\bm{C}^{\mathrm{T}}\\ \bm{0}&\bm{I}_q\end{bmatrix},$

applying this factorization to $\bm{\Upsilon}_j$ gives
$\bm{\Upsilon}_j=\begin{bmatrix}\bm{I}_{d(j-1)}&\bm{0}\\ \bm{\kappa}_j\bm{\Upsilon}_{j-1}^{-1}&\bm{I}_d\end{bmatrix}\begin{bmatrix}\bm{\Upsilon}_{j-1}&\bm{0}\\ \bm{0}&\bm{\Gamma}(0)-\bm{\kappa}_j^{\mathrm{T}}\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_j\end{bmatrix}\begin{bmatrix}\bm{I}_{d(j-1)}&\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_j^{\mathrm{T}}\\ \bm{0}&\bm{I}_d\end{bmatrix}=\bm{P}\bm{Q}\bm{P}^{\mathrm{T}}.$

The operator norm of 𝑷\bm{P} is 1 and 𝑫j=𝚪(0)𝜿jT𝚼j11𝜿j\bm{D}_{j}=\bm{\Gamma}(0)-\bm{\kappa}_{j}^{\mathrm{T}}\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_{j}.

𝚼jop2𝑸op2max{𝑫jop2,𝚼j1op2}𝚪(0)op2,\|\bm{\Upsilon}_{j}\|^{2}_{\mathrm{op}}\leq\|\bm{Q}\|^{2}_{\mathrm{op}}\leq\max\{\|\bm{D}_{j}\|^{2}_{\mathrm{op}},\|\bm{\Upsilon}_{j-1}\|^{2}_{\mathrm{op}}\}\leq\|\bm{\Gamma}(0)\|^{2}_{\mathrm{op}},

by applying the first inequality recursively as 𝑫jop2𝚪(0)op2\|\bm{D}_{j}\|^{2}_{\mathrm{op}}\leq\|\bm{\Gamma}(0)\|^{2}_{\mathrm{op}}.

Applying the above, we also have,

𝚼j1=[𝑰d(j1)𝟎𝜿j𝚼j11𝑰d][𝚼j1100𝑫j1][𝑰d(j1)𝚼j11𝜿jT𝟎𝑰d]=𝑮𝑯𝑮T.\bm{\Upsilon}_{j}^{-1}={\begin{bmatrix}\bm{I}_{d(j-1)}&\bm{0}\\ -\bm{\kappa}_{j}\bm{\Upsilon}_{j-1}^{-1}&\bm{I}_{d}\end{bmatrix}}{\begin{bmatrix}\bm{\Upsilon}_{j-1}^{-1}&0\\ 0&\bm{D}_{j}^{-1}\end{bmatrix}}{\begin{bmatrix}\bm{I}_{d(j-1)}&-\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_{j}^{\mathrm{T}}\\ \bm{0}&\bm{I}_{d}\end{bmatrix}}=\bm{G}\bm{H}\bm{G}^{\mathrm{T}}.

The operator norm of 𝑮\bm{G} is 1.

𝚼j1op2𝑯op2max{𝑫j1op2,𝚼j11op2}𝑪p1op2,\|\bm{\Upsilon}_{j}^{-1}\|^{2}_{\mathrm{op}}\leq\|\bm{H}\|^{2}_{\mathrm{op}}\leq\max\{\|\bm{D}^{-1}_{j}\|^{2}_{\mathrm{op}},\|\bm{\Upsilon}_{j-1}^{-1}\|^{2}_{\mathrm{op}}\}\leq\|\bm{C}_{p}^{-1}\|^{2}_{\mathrm{op}},

by applying the first inequality recursively, since $\|\bm{D}_j^{-1}\|^2_{\mathrm{op}}\leq\|\bm{C}_p^{-1}\|^2_{\mathrm{op}}$, where $\bm{C}_p^{-1}=\bm{\Omega}+\sum_{k=1}^p\bm{L}_k\bm{L}_k^{\mathrm{T}}$.

Thus, $\|\bm{\Upsilon}_{1,j}-\bm{\Upsilon}_{2,j}\|^2_{\mathrm{F}}\leq\|\bm{Q}_1-\bm{Q}_2\|^2_{\mathrm{F}}+(\|\bm{Q}_1\|^2_{\mathrm{op}}+\|\bm{Q}_2\|^2_{\mathrm{op}})\|\bm{P}_1-\bm{P}_2\|^2_{\mathrm{F}}\leq\|\bm{\Upsilon}_{1,j-1}-\bm{\Upsilon}_{2,j-1}\|^2_{\mathrm{F}}+\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^2_{\mathrm{F}}+(\|\bm{\Upsilon}_1(0)\|^2_{\mathrm{op}}+\|\bm{\Upsilon}_2(0)\|^2_{\mathrm{op}})\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^2_{\mathrm{F}}$.

Since $\|\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_j^{\mathrm{T}}\|^2_{\mathrm{op}}\leq1$, we have $\|\bm{\kappa}_j\|^2_{\mathrm{op}}\leq\|\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_j^{\mathrm{T}}\|^2_{\mathrm{op}}\|\bm{\Upsilon}_{j-1}\|^2_{\mathrm{op}}\leq\|\bm{\Upsilon}_{j-1}\|^2_{\mathrm{op}}\leq\|\bm{\Gamma}(0)\|^2_{\mathrm{op}}$. ∎

Proof of Lemma 6.

There are $T$ diagonal blocks of $\bm{\Upsilon}_1(0)$ and $\bm{\Upsilon}_2(0)$ in $\bm{\Upsilon}_{1,T}$ and $\bm{\Upsilon}_{2,T}$, respectively. Thus $\|\bm{\Upsilon}_1(0)-\bm{\Upsilon}_2(0)\|^2_{\mathrm{F}}\leq\frac{1}{T}\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$ and $\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{F}}\leq\|\bm{\Upsilon}_1(0)-\bm{\Upsilon}_2(0)\|^2_{\mathrm{F}}\|\bm{\Upsilon}_1(0)^{-1}\|^2_{\mathrm{op}}\|\bm{\Upsilon}_2(0)^{-1}\|^2_{\mathrm{op}}$.

By the resolvent identity, $\|\bm{\Upsilon}_2(0)^{-1}\|^2_{\mathrm{op}}\leq\|\bm{\Upsilon}_1(0)^{-1}\|^2_{\mathrm{op}}+\|\bm{\Upsilon}_2(0)^{-1}\|^2_{\mathrm{op}}\|\bm{\Upsilon}_1(0)^{-1}\|^2_{\mathrm{op}}\|\bm{\Upsilon}_1(0)-\bm{\Upsilon}_2(0)\|^2_{\mathrm{op}}$. Thus, $\|\bm{\Upsilon}_2(0)^{-1}\|^2_{\mathrm{op}}\leq\frac{\|\bm{\Upsilon}_1(0)^{-1}\|^2_{\mathrm{op}}}{1-\|\bm{\Upsilon}_1(0)^{-1}\|^2_{\mathrm{op}}\|\bm{\Upsilon}_1(0)-\bm{\Upsilon}_2(0)\|^2_{\mathrm{op}}}$ for small $\|\bm{\Upsilon}_1(0)-\bm{\Upsilon}_2(0)\|^2_{\mathrm{op}}$.

Hence, 𝛀1𝛀2F2𝚼1(0)𝚼2(0)F2𝚼1(0)1op21𝚼1(0)1op2𝚼1(0)𝚼2(0)op2\|\bm{\Omega}_{1}-\bm{\Omega}_{2}\|^{2}_{F}\leq\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}\frac{\|\bm{\Upsilon}_{1}(0)^{-1}\|^{2}_{\mathrm{op}}}{1-\|\bm{\Upsilon}_{1}(0)^{-1}\|^{2}_{\mathrm{op}}\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{\mathrm{op}}}. This completes the proof.

Proof of Lemma 7.

$\|\bm{A}_1-\bm{A}_2\|^2_{\mathrm{F}}\leq\|\bm{\xi}_{1,p}-\bm{\xi}_{2,p}\|^2_{\mathrm{F}}\|\bm{\Gamma}^{-1}_{1,p-1}\|^2_{\mathrm{op}}+\|\bm{\Gamma}^{-1}_{1,p-1}-\bm{\Gamma}^{-1}_{2,p-1}\|^2_{\mathrm{F}}\|\bm{\xi}_{2,p-1}\|^2_{\mathrm{op}}.$

There are a total of $T-p$ blocks of $\bm{\xi}_{1,p}$ in $\bm{\Upsilon}_{1,T}$. Thus $(T-p)\|\bm{\xi}_{1,p}-\bm{\xi}_{2,p}\|^2_{\mathrm{F}}\leq\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$, so that $\frac{T-p}{T}\|\bm{\xi}_{1,p}-\bm{\xi}_{2,p}\|^2_{\mathrm{F}}\leq\frac{1}{T}\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$. Furthermore, $\|\bm{\xi}_{2,p-1}\|^2_{\mathrm{op}}\leq\|\bm{\xi}_{1,p-1}\|^2_{\mathrm{op}}+\|\bm{\xi}_{1,p-1}-\bm{\xi}_{2,p-1}\|^2_{\mathrm{op}}$.

In $\bm{\Upsilon}_{1,T}$, there are $T/p$ diagonal blocks of $\bm{\Upsilon}_{1,p-1}$, assuming $p$ divides $T$, and thus $(T/p)\|\bm{\Upsilon}_{1,p-1}-\bm{\Upsilon}_{2,p-1}\|^2_{\mathrm{F}}\leq\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$. Hence, $\frac{1}{p}\|\bm{\Upsilon}_{1,p-1}-\bm{\Upsilon}_{2,p-1}\|^2_{\mathrm{F}}\leq\frac{1}{T}\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}$.

The case where $p$ does not divide $T$ requires only trivial modifications and is immaterial in the asymptotic regime. ∎

Proof of Lemma 8.

We assume $\{\|\bm{\Omega}_1-\bm{\Omega}_2\|_{\mathrm{F}},\|\bm{L}_{1,j}-\bm{L}_{2,j}\|_{\mathrm{F}},\|\bm{U}_{1,j}-\bm{U}_{2,j}\|_{\mathrm{F}},\|\bm{V}_{1,j}-\bm{V}_{2,j}\|_{\mathrm{F}}\}\leq\epsilon$. Since $\|\bm{U}_{\ell,j}\|^2_{\mathrm{op}}\leq\|\bm{C}_{j-1}\|^2_{\mathrm{op}}\|\bm{L}_j\|^2_{\mathrm{op}}$ and $\|\bm{V}_{\ell,j}\|^2_{\mathrm{op}}\leq\|\bm{D}_{j-1}^{-1}\|^2_{\mathrm{op}}\leq\|\bm{C}_p^{-1}\|_{\mathrm{op}}$, we have $\max_j\|\bm{U}_{\ell,j}\|^2_{\mathrm{op}}\leq\|\bm{\Upsilon}_1(0)\|^2_{\mathrm{op}}\|\bm{L}_{\ell,j}\|^2_{\mathrm{op}}\leq M_{\ell,U}$, $\max_j\|\bm{V}_{\ell,j}\|^2_{\mathrm{op}}\leq\|\bm{C}_{\ell,p}^{-1}\|_{\mathrm{op}}=M_{\ell,V}$, and $(1+\|\bm{C}^{-1}_{1,p}\|_{\mathrm{op}}\|\bm{\Upsilon}_1(0)\|^2_{\mathrm{op}}\|\bm{L}_{1,j}\|^2_{\mathrm{op}}\|\bm{C}_p^{-1}\|_{\mathrm{op}})^p\leq M_{\ell,C,U,V}$. Therefore,

$\|\bm{\Upsilon}_{1,p-1}-\bm{\Upsilon}_{2,p-1}\|^2_{\mathrm{F}}\leq\epsilon^2[(p+1)+p^3\{M_{1,U}M_{1,V}M_{2,U}M_{2,V}+(\|\bm{C}_{1,p}^{-1}\|^2_{\mathrm{op}}+\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}})(M_{1,V}+M_{2,U})\}+(\|\bm{\Upsilon}_1(0)\|^2_{\mathrm{op}}+\|\bm{\Upsilon}_2(0)\|^2_{\mathrm{op}})\{p^4M_{1,U}M_{1,V}M_{1,C,U,V}+2p^2\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,V}M_{1,V}M_{1,C,U,V}+2p^2\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,U}M_{1,V}M_{1,C,U,V}\}].$

And,

$\|\bm{\Upsilon}_{1,p-1}^{-1}\bm{\kappa}_{1,p}^{\mathrm{T}}-\bm{\Upsilon}_{2,p-1}^{-1}\bm{\kappa}_{2,p}^{\mathrm{T}}\|^2_{\mathrm{F}}\leq\epsilon^2[2p^2(M_{1,U}M_{1,V}M_{1,C,U,V}+p)+2p\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,V}M_{1,V}M_{1,C,U,V}+2p\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,U}M_{1,V}M_{1,C,U,V}].$

Finally,

$\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^2_{\mathrm{F}}\leq\epsilon^2[T^2M_{P,1}+(T-p+1)\|\bm{\Upsilon}_{1,p-1}\|^2_{\mathrm{op}}M_{P,2}],$ (A.4)

where $M_{P,1}=[(p+1)+p^3\{M_{1,U}M_{1,V}M_{2,U}M_{2,V}+(\|\bm{C}_{1,p}^{-1}\|^2_{\mathrm{op}}+\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}})(M_{1,V}+M_{2,U})\}+(\|\bm{\Upsilon}_1(0)\|^2_{\mathrm{op}}+\|\bm{\Upsilon}_2(0)\|^2_{\mathrm{op}})\{p^4M_{1,U}M_{1,V}M_{1,C,U,V}+2p^2\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,V}M_{1,V}M_{1,C,U,V}+2p^2\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,U}M_{1,V}M_{1,C,U,V}\}]$ and $M_{P,2}=[2p^2(M_{1,U}M_{1,V}M_{1,C,U,V}+p)+2p\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,V}M_{1,V}M_{1,C,U,V}+2p\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}}M_{2,U}M_{1,V}M_{1,C,U,V}]$.

While bounding around the truth, we let the eigenvalues of $\bm{\Omega}_1$ be bounded between two fixed constants, and let $\|\bm{L}_{1,j}\|^2_{\mathrm{op}},\|\bm{K}_{1,j}\|^2_{\mathrm{op}}$ be bounded by a fixed constant as well. This also puts an upper bound on $\|\bm{C}_p^{-1}\|_{\mathrm{op}}$. We have $\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^2_{\mathrm{F}}\lesssim\{\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{op}}+\sum_{k=1}^j\|\bm{L}_{1,k}\bm{L}_{1,k}^{\mathrm{T}}-\bm{L}_{2,k}\bm{L}_{2,k}^{\mathrm{T}}\|^2_{\mathrm{op}}\}$.

We can similarly show that, 𝑽1,j𝑽2,jF2{𝑫j1,1𝑫j1,2op2+𝑲1,j𝑲2,jop2}\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}\lesssim\{\|\bm{D}_{j-1,1}-\bm{D}_{j-1,2}\|^{2}_{\mathrm{op}}+\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^{2}_{\mathrm{op}}\}

Using the recurrence relations, we have $\|\bm{D}_{1,j-1}-\bm{D}_{2,j-1}\|^2_{\mathrm{op}}\leq\|\bm{D}_{1,j-2}-\bm{D}_{2,j-2}\|^2_{\mathrm{op}}+\|\bm{U}_{1,j-1}\|^2_{\mathrm{op}}\|\bm{U}_{2,j-1}\|^2_{\mathrm{op}}\|\bm{V}_{1,j-1}\|^2_{\mathrm{op}}\|\bm{V}_{2,j-1}\|^2_{\mathrm{op}}\|\bm{C}_{1,j-2}^{-1}-\bm{C}_{2,j-2}^{-1}\|^2_{\mathrm{op}}+(\|\bm{C}_{1,p}^{-1}\|^2_{\mathrm{op}}+\|\bm{C}_{2,p}^{-1}\|^2_{\mathrm{op}})(\|\bm{U}_{1,j-1}-\bm{U}_{2,j-1}\|^2_{\mathrm{op}}\|\bm{V}_{1,j-1}\|^2_{\mathrm{op}}+\|\bm{V}_{1,j-1}-\bm{V}_{2,j-1}\|^2_{\mathrm{op}}\|\bm{U}_{2,j-1}\|^2_{\mathrm{op}})$.

Then $\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^2_{\mathrm{F}}\lesssim\epsilon$ and $\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^2_{\mathrm{F}}\lesssim\epsilon$ whenever $\{\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{op}},\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_{\mathrm{op}}\}\lesssim\epsilon$ and $\|\bm{L}_{1,k}-\bm{L}_{2,k}\|^2_{\mathrm{op}}\lesssim\epsilon$. The constants depend on the bounds assumed at the truth. ∎

Proof of Lemma 9.

In the sieve, we have 𝑼1,j𝑼2,jF2uT{𝛀1𝛀2op2+k=1j𝑳1,k𝑳1,kT𝑳2,k𝑳2,kTop2}\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^{2}_{F}\leq u_{T}\{\|\bm{\Omega}_{1}-\bm{\Omega}_{2}\|^{2}_{\mathrm{op}}+\sum_{k=1}^{j}\|\bm{L}_{1,k}\bm{L}_{1,k}^{\mathrm{T}}-\bm{L}_{2,k}\bm{L}_{2,k}^{\mathrm{T}}\|^{2}_{\mathrm{op}}\} for some uTu_{T}, a polynomial in TT.

We can similarly show that, 𝑽1,j𝑽2,jF2vT{𝑫j1,1𝑫j1,2op2+𝑲1,j𝑲2,jop2}\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}\leq v_{T}\{\|\bm{D}_{j-1,1}-\bm{D}_{j-1,2}\|^{2}_{\mathrm{op}}+\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^{2}_{\mathrm{op}}\} for some vTv_{T} which is polynomial in TT.

Then $\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^2_{\mathrm{F}}\leq\epsilon$ and $\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^2_{\mathrm{F}}\leq\epsilon$ if $\{\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{op}},\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_{\mathrm{op}}\}\leq\min\{\epsilon/u_T,\epsilon/v'_T\}$ and $\|\bm{L}_{1,k}\bm{L}_{1,k}^{\mathrm{T}}-\bm{L}_{2,k}\bm{L}_{2,k}^{\mathrm{T}}\|^2_{\mathrm{op}}\leq\min\{\epsilon/(pu_T),\epsilon/(pv'_T)\}$, where $v'_T$ is defined below. In summary, for the above to hold within $\mathcal{G}_T$ we need $\|\bm{L}_{1,k}-\bm{L}_{2,k}\|^2_{\mathrm{op}}\leq o_T\epsilon$ for some polynomial $o_T$ in $T$.

Finally, 𝑫1,0𝑫2,0op2cT𝛀1𝛀2op2\|\bm{D}_{1,0}-\bm{D}_{2,0}\|^{2}_{\mathrm{op}}\leq c_{T}\|\bm{\Omega}_{1}-\bm{\Omega}_{2}\|^{2}_{\mathrm{op}} for some cTc_{T} which is polynomial in TT and the bound for 𝑼1,j1𝑼2,j1op2\|\bm{U}_{1,j-1}-\bm{U}_{2,j-1}\|^{2}_{\mathrm{op}} is established above.

Thus, $\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^2_{\mathrm{F}}\leq v'_T\{\|\bm{\Omega}_1-\bm{\Omega}_2\|^2_{\mathrm{op}}+\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^2_{\mathrm{op}}+\sum_{k=1}^{j-1}\|\bm{L}_{1,k}\bm{L}_{1,k}^{\mathrm{T}}-\bm{L}_{2,k}\bm{L}_{2,k}^{\mathrm{T}}\|^2_{\mathrm{op}}\}$ for some $v'_T$ which is a polynomial in $T$.

Within $\mathcal{G}_T$, if $a_{T,1},a_{T,2},b_T,c_T$ are all polynomials in $T$, then $M_{P,1}$ and $M_{P,2}$ are also bounded by polynomials in $T$, say $M_{P,1}\leq c'_T$ and $M_{P,2}\leq e_T$.

Setting $\epsilon=\delta_T/[T^2c'_T+(T-p+1)a_{T,1}e_T]$, we have $\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|_{\mathrm{F}}\leq\delta_T$ whenever $\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^2_{\mathrm{F}}\leq\epsilon$, $\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^2_{\mathrm{F}}\leq\epsilon$, $\|\bm{L}_{1,k}\bm{L}_{1,k}^{\mathrm{T}}-\bm{L}_{2,k}\bm{L}_{2,k}^{\mathrm{T}}\|^2_{\mathrm{F}}\leq\epsilon$, and $\|\bm{\Upsilon}_1(0)-\bm{\Upsilon}_2(0)\|^2_{\mathrm{F}}\leq\epsilon$.

Then, using (A.4), we obtain the final result. ∎

Proof of Lemma 10.

We first compute a bound for the VAR(1) case and then use it for VAR($p$). From Lütkepohl [2005], we get $\bm{\Gamma}(j)=\sum_{i=0}^\infty\bm{A}^{j+i}\bm{C}_1(\bm{A}^i)^{\mathrm{T}}=\bm{A}^j\sum_{i=0}^\infty\bm{A}^i\bm{C}_1(\bm{A}^i)^{\mathrm{T}}$ for a VAR(1) process, where $\bm{A}=\bm{\Gamma}(1)\bm{\Omega}$.

Using the telescoping sum with $\bm{\Gamma}(0)-\bm{A}\bm{\Gamma}(0)\bm{A}^{\mathrm{T}}=\bm{C}_1$, we have $\sum_{i=0}^\infty\bm{A}^i\bm{C}_1(\bm{A}^i)^{\mathrm{T}}=\bm{\Gamma}(0)-\lim_{k\rightarrow\infty}\bm{A}^k\bm{\Gamma}(0)(\bm{A}^k)^{\mathrm{T}}=\bm{\Gamma}(0)$ under our parameterization. Hence, $\bm{\Gamma}(j)=\bm{A}^j\bm{\Gamma}(0)$.
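As a quick numerical illustration of this identity (not part of the proof), $\bm{\Gamma}(0)$ solves the discrete Lyapunov equation and the lag-1 autocovariance of simulated data should match $\bm{A}\bm{\Gamma}(0)$; the sketch below assumes scipy's Lyapunov solver.

```python
# Check Gamma(j) = A^j Gamma(0) for a stable VAR(1): Gamma(0) solves
# Gamma(0) = A Gamma(0) A' + C1, and the empirical lag-1 autocovariance
# of a long simulation approaches A @ Gamma(0).
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(2)
d = 4
A = rng.normal(scale=0.3, size=(d, d))
A *= 0.8 / max(abs(np.linalg.eigvals(A)))   # enforce spectral radius < 1
C1 = np.eye(d)                              # innovation covariance

Gamma0 = solve_discrete_lyapunov(A, C1)
Gamma1 = A @ Gamma0                         # the identity being checked

T = 100_000
x = np.zeros((T, d))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(size=d)
emp = (x[1:].T @ x[:-1]) / (T - 1)          # empirical E[x_t x_{t-1}']
print(np.max(np.abs(emp - Gamma1)))         # small up to Monte Carlo error
```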

$\bm{\Gamma}(1)=\bm{U}_{1}\bm{V}_{1}^{\mathrm{T}}$ and $\bm{C}_{1}=\bm{\Omega}^{-1}-\bm{U}_{1}\bm{U}_{1}^{\mathrm{T}}$. We have $\bm{V}_{1}^{\mathrm{T}}\bm{\Omega}\bm{V}_{1}=\bm{I}$ and $\bm{\Omega}^{-1}\geq\bm{U}_{1}\bm{U}_{1}^{\mathrm{T}}$; equivalently, $\lambda_{\min}(\bm{\Omega}^{-1}-\bm{U}_{1}\bm{U}_{1}^{\mathrm{T}})\geq 0$.

Since $\bm{V}_{1}^{\mathrm{T}}\bm{\Omega}\bm{V}_{1}=\bm{I}$, we have $\|\bm{V}_{1}^{\mathrm{T}}\bm{\Omega}^{1/2}\|^{2}_{\mathrm{op}}=1$. Hence, $\|\bm{V}_{1}\|^{2}_{\mathrm{op}}\leq\|\bm{\Omega}^{-1/2}\|^{2}_{\mathrm{op}}$.

We have $\bm{A}^{k}-\bm{B}^{k}=\bm{A}^{k-1}(\bm{A}-\bm{B})+(\bm{A}^{k-1}-\bm{B}^{k-1})\bm{B}$. Applying this relation recursively and taking norms, the triangle inequality and the submultiplicativity of the operator norm give $\|\bm{A}^{k}-\bm{B}^{k}\|^{2}_{F}\leq\|\bm{A}-\bm{B}\|^{2}_{F}\big(\sum_{i=1}^{k}\|\bm{A}\|_{\mathrm{op}}^{2(k-i)}\|\bm{B}\|_{\mathrm{op}}^{2(i-1)}\big)$.

After some simplification using the geometric series sum, $\sum_{i=1}^{k}\|\bm{A}\|_{\mathrm{op}}^{2(k-i)}\|\bm{B}\|_{\mathrm{op}}^{2(i-1)}=\frac{\|\bm{A}\|_{\mathrm{op}}^{2k}-\|\bm{B}\|_{\mathrm{op}}^{2k}}{\|\bm{A}\|^{2}_{\mathrm{op}}-\|\bm{B}\|^{2}_{\mathrm{op}}}=\frac{\{\|\bm{B}\|^{2}_{\mathrm{op}}+(\|\bm{A}\|^{2}_{\mathrm{op}}-\|\bm{B}\|^{2}_{\mathrm{op}})\}^{k}-\|\bm{B}\|_{\mathrm{op}}^{2k}}{\|\bm{A}\|^{2}_{\mathrm{op}}-\|\bm{B}\|^{2}_{\mathrm{op}}}=\sum_{i=0}^{k-1}{k\choose i}\|\bm{B}\|^{2i}_{\mathrm{op}}(\|\bm{A}\|^{2}_{\mathrm{op}}-\|\bm{B}\|^{2}_{\mathrm{op}})^{k-1-i}$.

Hence, $\|\bm{A}^{k}-\bm{B}^{k}\|^{2}_{F}\leq\|\bm{A}-\bm{B}\|^{2}_{F}\left\{\sum_{i=0}^{k-1}{k\choose i}\|\bm{B}\|^{2i}_{\mathrm{op}}(\|\bm{A}\|^{2}_{\mathrm{op}}-\|\bm{B}\|^{2}_{\mathrm{op}})^{k-1-i}\right\}\leq k\|\bm{A}-\bm{B}\|^{2}_{F}\min\{\|\bm{A}\|_{\mathrm{op}}^{2(k-1)},|\|\bm{A}\|^{2}_{\mathrm{op}}-\|\bm{B}\|^{2}_{\mathrm{op}}|^{k-1}\}$.
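The unsquared form of the telescoping bound, $\|\bm{A}^{k}-\bm{B}^{k}\|_{F}\leq\|\bm{A}-\bm{B}\|_{F}\sum_{i=1}^{k}\|\bm{A}\|_{\mathrm{op}}^{k-i}\|\bm{B}\|_{\mathrm{op}}^{i-1}$, follows directly from the recursion above and can be verified numerically; here is a minimal sketch with arbitrary Schur-stable matrices (our choices, not the paper's):

## Numerical check of the telescoping bound
## ||A^k - B^k||_F <= ||A - B||_F * sum_i ||A||_op^(k-i) ||B||_op^(i-1).
mpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify = FALSE),
                              init = diag(nrow(M)))
set.seed(1)
A <- matrix(rnorm(9), 3, 3); A <- 0.9 * A / norm(A, "2")   # spectral norm 0.9
B <- matrix(rnorm(9), 3, 3); B <- 0.8 * B / norm(B, "2")   # spectral norm 0.8
k   <- 6
lhs <- norm(mpow(A, k) - mpow(B, k), "F")
rhs <- norm(A - B, "F") *
       sum(norm(A, "2")^(k - (1:k)) * norm(B, "2")^((1:k) - 1))
lhs <= rhs   # TRUE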

$\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^{2}_{F}\leq T\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{\mathrm{F}}+\sum_{k=1}^{T-1}(T-k)\|\bm{\Upsilon}_{1}(k)-\bm{\Upsilon}_{2}(k)\|^{2}_{F}$.

$\|\bm{\Upsilon}_{1}(k)-\bm{\Upsilon}_{2}(k)\|^{2}_{F}\leq\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}\|\bm{A}_{1}^{k}-\bm{A}_{2}^{k}\|^{2}_{\mathrm{F}}+\|\bm{A}_{2}\|^{2k}_{\mathrm{op}}\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}$.

$\|\bm{A}_{1}-\bm{A}_{2}\|^{2}_{\mathrm{F}}\leq\|\bm{\Upsilon}_{1}(1)\|^{2}_{\mathrm{op}}\|\bm{\Omega}_{1}-\bm{\Omega}_{2}\|^{2}_{F}+\|\bm{\Omega}_{2}\|^{2}_{\mathrm{op}}\|\bm{\Upsilon}_{1}(1)-\bm{\Upsilon}_{2}(1)\|^{2}_{F}\leq\|\bm{U}_{1,1}\|^{2}_{\mathrm{op}}\|\bm{V}_{1,1}\|^{2}_{\mathrm{op}}\|\bm{\Omega}_{1}-\bm{\Omega}_{2}\|^{2}_{F}+\|\bm{\Omega}_{2}\|^{2}_{\mathrm{op}}\{\|\bm{U}_{1,1}\|^{2}_{\mathrm{op}}\|\bm{V}_{1,1}-\bm{V}_{2,1}\|^{2}_{F}+\|\bm{V}_{2,1}\|^{2}_{\mathrm{op}}\|\bm{U}_{1,1}-\bm{U}_{2,1}\|^{2}_{F}\}$.

Combining these bounds,

\begin{align}
\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^{2}_{F}
&\leq T\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{\mathrm{F}}+\sum_{k=1}^{T-1}(T-k)\{\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}\|\bm{A}_{1}^{k}-\bm{A}_{2}^{k}\|^{2}_{\mathrm{F}}+\|\bm{A}_{2}\|^{2k}_{\mathrm{op}}\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}\}\nonumber\\
&\leq T\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{\mathrm{F}}+\sum_{k=1}^{T-1}(T-k)\{\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}k\|\bm{A}_{1}-\bm{A}_{2}\|^{2}_{F}a^{k-1}+\|\bm{A}_{2}\|^{2k}_{\mathrm{op}}\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}\}\nonumber\\
&\leq\Big(T\frac{b-b^{T}}{1-b}-\frac{b\{(T-1)b^{T}-Tb^{T-1}+1\}}{(1-b)^{2}}+T\Big)\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+T\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}\|\bm{A}_{1}-\bm{A}_{2}\|^{2}_{F}\sum_{k=1}^{T-1}ka^{k-1} \tag{A.5}\\
&\leq\Big(\frac{Tb-Tb^{2}+b^{T+1}-b}{(1-b)^{2}}+T\Big)\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+T\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}\|\bm{A}_{1}-\bm{A}_{2}\|^{2}_{F}\frac{1+(T-2)a^{T-1}-(T-1)a^{T-2}}{(1-a)^{2}}, \tag{A.6}
\end{align}

where $a=\|\bm{A}_{1}\|^{2}_{\mathrm{op}}$ and $b=\|\bm{A}_{2}\|^{2}_{\mathrm{op}}$. Since $a<1$, we have $\frac{1+(T-2)a^{T-1}-(T-1)a^{T-2}}{(1-a)^{2}}<\frac{1}{(1-a)^{2}}$, so the bound is expressed in terms of the model parameters.

It is easy to verify that $\frac{1+(T-2)a^{T-1}-(T-1)a^{T-2}}{(1-a)^{2}}$ and $\frac{Tb-Tb^{2}+b^{T+1}-b}{(1-b)^{2}}$ are increasing in $a$ and $b$, respectively. The corresponding limits are
$$\lim_{a\rightarrow 1}\frac{1+(T-2)a^{T-1}-(T-1)a^{T-2}}{(1-a)^{2}}={T-1\choose 2},\qquad
\lim_{b\rightarrow 1}\frac{Tb-Tb^{2}+b^{T+1}-b}{(1-b)^{2}}={T\choose 2}.$$
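Both limits follow from two applications of L'Hôpital's rule and can be confirmed numerically; a quick sketch (the value of $T$ is arbitrary):

## Numerical check of the two limits, evaluating the ratios near a = b = 1.
T_ <- 10; a <- 1 - 1e-5; b <- 1 - 1e-5
(1 + (T_ - 2) * a^(T_ - 1) - (T_ - 1) * a^(T_ - 2)) / (1 - a)^2  # ~ choose(T_ - 1, 2)
(T_ * b - T_ * b^2 + b^(T_ + 1) - b) / (1 - b)^2                 # ~ choose(T_, 2)
choose(T_ - 1, 2); choose(T_, 2)                                 # reference values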

Due to the VAR(1) representation of a VAR($p$) process [Ghosh et al., 2019], equation (A.6), and $\bm{A}_{\ell}=\bm{\kappa}_{\ell,p}\bm{\Upsilon}_{\ell,p-1}^{-1}$ for $\ell=1,2$, we have

\begin{align*}
\|\bm{\Upsilon}_{1,T}-\bm{\Upsilon}_{2,T}\|^{2}_{\mathrm{F}}
&\leq T^{2}\|\bm{\Upsilon}_{1,p-1}-\bm{\Upsilon}_{2,p-1}\|^{2}_{F}+T^{3}\|\bm{\Upsilon}_{1,p-1}\|^{2}_{\mathrm{op}}\|\bm{\kappa}_{1,p}\bm{\Upsilon}_{1,p-1}^{-1}-\bm{\kappa}_{2,p}\bm{\Upsilon}_{2,p-1}^{-1}\|^{2}_{F}\\
&\leq T^{2}\|\bm{\Upsilon}_{1,p-1}-\bm{\Upsilon}_{2,p-1}\|^{2}_{F}+T^{3}\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}\|\bm{\kappa}_{1,p}\bm{\Upsilon}_{1,p-1}^{-1}-\bm{\kappa}_{2,p}\bm{\Upsilon}_{2,p-1}^{-1}\|^{2}_{F}.
\end{align*}

Proof of Result 11.

$\bm{D}_{j}=\bm{D}_{j-1}-\bm{W}_{j}^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{W}_{j}$ with $\bm{W}_{j}=\bm{U}_{j}\bm{V}_{j}$.

$\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}\leq\|\bm{D}_{1,j-1}-\bm{D}_{2,j-1}\|^{2}_{F}+\|\bm{W}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{W}_{2,j}\|^{2}_{\mathrm{op}}\|\bm{C}_{1,j-1}^{-1}-\bm{C}_{2,j-1}^{-1}\|^{2}_{F}+(\|\bm{C}_{1,j-1}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,j-1}^{-1}\|^{2}_{\mathrm{op}})\|\bm{W}_{1,j}-\bm{W}_{2,j}\|^{2}_{F}$.

We have $\|\bm{W}_{1,j}-\bm{W}_{2,j}\|^{2}_{F}\leq\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^{2}_{F}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}}+\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}\|\bm{U}_{2,j}\|^{2}_{\mathrm{op}}$, together with $\|\bm{C}_{1,j-1}^{-1}\|^{2}_{\mathrm{op}}\leq\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}$ and $\|\bm{C}_{2,j-1}^{-1}\|^{2}_{\mathrm{op}}\leq\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}}$.

Mathematical induction then shows that this difference can be bounded in terms of the model parameters.

Thus, $\|\bm{\Gamma}^{-1}_{1,j}-\bm{\Gamma}^{-1}_{2,j}\|^{2}_{F}\leq\|\bm{G}_{1}-\bm{G}_{2}\|^{2}_{\mathrm{F}}+(\|\bm{G}_{1}\|^{2}_{\mathrm{op}}+\|\bm{G}_{2}\|^{2}_{\mathrm{op}})\|\bm{H}_{1}-\bm{H}_{2}\|^{2}_{F}\leq\|\bm{\Gamma}^{-1}_{1,j-1}-\bm{\Gamma}^{-1}_{2,j-1}\|^{2}_{F}+\|\bm{D}^{-1}_{1,j}-\bm{D}^{-1}_{2,j}\|^{2}_{F}+(\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}})\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}$.

$\|\bm{D}^{-1}_{1,j}-\bm{D}^{-1}_{2,j}\|^{2}_{F}\leq\|\bm{D}^{-1}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{D}^{-1}_{2,j}\|^{2}_{\mathrm{op}}\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}\leq\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}}\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}$.

The last step holds since $\|\bm{\Upsilon}_{1,j-1}^{-1}\|^{2}_{\mathrm{op}}\leq\|\bm{\Upsilon}_{1,p}^{-1}\|^{2}_{\mathrm{op}}$ and $\|\bm{\Upsilon}_{2,j-1}^{-1}\|^{2}_{\mathrm{op}}\leq\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}}$.

$$\mathbf{P}={\begin{bmatrix}\mathbf{A_{1}}&\mathbf{A_{2}}\\ \mathbf{A_{3}}&\mathbf{A_{4}}\end{bmatrix}}^{-1}={\begin{bmatrix}\left(\mathbf{A_{1}}-\mathbf{A_{2}A_{4}}^{-1}\mathbf{A_{3}}\right)^{-1}&-\left(\mathbf{A_{1}}-\mathbf{A_{2}A_{4}}^{-1}\mathbf{A_{3}}\right)^{-1}\mathbf{A_{2}A_{4}}^{-1}\\ -\mathbf{A_{4}}^{-1}\mathbf{A_{3}}\left(\mathbf{A_{1}}-\mathbf{A_{2}A_{4}}^{-1}\mathbf{A_{3}}\right)^{-1}&\mathbf{A_{4}}^{-1}+\mathbf{A_{4}}^{-1}\mathbf{A_{3}}\left(\mathbf{A_{1}}-\mathbf{A_{2}A_{4}}^{-1}\mathbf{A_{3}}\right)^{-1}\mathbf{A_{2}A_{4}}^{-1}\end{bmatrix}}.$$
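This standard block-inverse (Schur complement) formula is easy to verify numerically; a purely illustrative check against solve() on a random symmetric positive definite matrix:

## Check of the 2x2 block-inverse formula against solve().
set.seed(2)
M  <- crossprod(matrix(rnorm(36), 6, 6)) + diag(6)   # SPD 6x6 test matrix
i1 <- 1:3; i2 <- 4:6
A1 <- M[i1, i1]; A2 <- M[i1, i2]; A3 <- M[i2, i1]; A4 <- M[i2, i2]
S  <- solve(A1 - A2 %*% solve(A4) %*% A3)            # inverse Schur complement
P  <- rbind(cbind(S, -S %*% A2 %*% solve(A4)),
            cbind(-solve(A4) %*% A3 %*% S,
                  solve(A4) + solve(A4) %*% A3 %*% S %*% A2 %*% solve(A4)))
max(abs(P - solve(M)))   # numerically zero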

Setting $\bm{P}=\bm{\Upsilon}_{j-1},\bm{A}_{1}=\bm{\Gamma}(0),\bm{A}_{2}=\bm{\xi}_{j-1}^{\mathrm{T}},\bm{A}_{3}=\bm{\xi}_{j-1},\bm{A}_{4}=\bm{\Upsilon}_{j-2}$, we have $\bm{\Upsilon}_{j-1}^{-1}\bm{\kappa}_{j}^{\mathrm{T}}=\{\bm{C}_{j-1}^{-1}\bm{U}_{j}\bm{V}_{j},\bm{\Upsilon}_{j-2}^{-1}\bm{\kappa}_{j-1}^{\mathrm{T}}-\bm{\Upsilon}_{j-2}^{-1}\bm{\xi}_{j-1}^{\mathrm{T}}\bm{C}_{j-1}^{-1}\bm{U}_{j}\bm{V}_{j}\}$, since $\bm{\Gamma}(j)=\bm{U}_{j}\bm{V}_{j}^{\mathrm{T}}+\bm{\xi}_{j-1}\bm{\Gamma}^{-1}_{j-2}\bm{\kappa}_{j-1}$.

Since $\|\bm{\Upsilon}_{1,j-2}^{-1}\bm{\kappa}_{1,j-1}^{\mathrm{T}}\|^{2}_{\mathrm{op}},\|\bm{\Upsilon}_{2,j-2}^{-1}\bm{\kappa}_{2,j-1}^{\mathrm{T}}\|^{2}_{\mathrm{op}}\leq 1$ and $\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\xi}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\xi}_{2,j}^{\mathrm{T}}\|^{2}_{F}=\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}$, the stated bound follows.

Proof of Lemma 12.
\begin{align}
\|\bm{\Upsilon}_{1,j}-\bm{\Upsilon}_{2,j}\|^{2}_{F}
&\leq\|\bm{\Upsilon}_{1,j-1}-\bm{\Upsilon}_{2,j-1}\|^{2}_{F}+\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}+(\|\bm{\Upsilon}_{1}(0)\|^{2}_{\mathrm{op}}+\|\bm{\Upsilon}_{2}(0)\|^{2}_{\mathrm{op}})\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}, \tag{A.7}\\
\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}
&\leq\|\bm{D}_{1,j-1}-\bm{D}_{2,j-1}\|^{2}_{F}+\|\bm{U}_{1,j}\bm{V}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{U}_{2,j}\bm{V}_{2,j}\|^{2}_{\mathrm{op}}\|\bm{C}_{1,j-1}^{-1}-\bm{C}_{2,j-1}^{-1}\|^{2}_{F}\nonumber\\
&\qquad+(\|\bm{C}_{1,j-1}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,j-1}^{-1}\|^{2}_{\mathrm{op}})(\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^{2}_{F}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}}+\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}\|\bm{U}_{2,j}\|^{2}_{\mathrm{op}})\nonumber\\
&\leq\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+\sum_{k=1}^{j}\big[\|\bm{U}_{1,k}\bm{V}_{1,k}\|^{2}_{\mathrm{op}}\|\bm{U}_{2,k}\bm{V}_{2,k}\|^{2}_{\mathrm{op}}\|\bm{C}_{1,k-1}^{-1}-\bm{C}_{2,k-1}^{-1}\|^{2}_{F} \tag{A.8}\\
&\qquad+(\|\bm{C}_{1,k-1}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,k-1}^{-1}\|^{2}_{\mathrm{op}})(\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^{2}_{F}\|\bm{V}_{1,k}\|^{2}_{\mathrm{op}}+\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^{2}_{F}\|\bm{U}_{2,k}\|^{2}_{\mathrm{op}})\big],\nonumber
\end{align}

$\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}\leq\bm{B}_{1}(1+\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}\max_{j}\|\bm{U}_{1,j}\|^{2}_{\mathrm{op}}\max_{j}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}})^{j}+\cdots.$

\begin{align*}
\sum_{j=0}^{p-1}\|\bm{D}_{1,j}-\bm{D}_{2,j}\|^{2}_{F}
&\leq p\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+\sum_{j=0}^{p-1}\sum_{k=1}^{j}\big[\|\bm{U}_{1,k}\bm{V}_{1,k}\|^{2}_{\mathrm{op}}\|\bm{U}_{2,k}\bm{V}_{2,k}\|^{2}_{\mathrm{op}}\|\bm{C}_{1,k-1}^{-1}-\bm{C}_{2,k-1}^{-1}\|^{2}_{F}\\
&\qquad+(\|\bm{C}_{1,k-1}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,k-1}^{-1}\|^{2}_{\mathrm{op}})(\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^{2}_{F}\|\bm{V}_{1,k}\|^{2}_{\mathrm{op}}+\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^{2}_{F}\|\bm{U}_{2,k}\|^{2}_{\mathrm{op}})\big]\\
&\leq p\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+\sum_{j=1}^{p-1}\sum_{k=1}^{j}\big[M_{1,U}M_{1,V}M_{2,U}M_{2,V}\|\bm{C}_{1,k-1}^{-1}-\bm{C}_{2,k-1}^{-1}\|^{2}_{F}\\
&\qquad+(\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}})(\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^{2}_{F}M_{1,V}+\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^{2}_{F}M_{2,U})\big]\\
&\leq p\|\bm{\Upsilon}_{1}(0)-\bm{\Upsilon}_{2}(0)\|^{2}_{F}+p\sum_{k=1}^{p}\big[M_{1,U}M_{1,V}M_{2,U}M_{2,V}\|\bm{C}_{1,k-1}^{-1}-\bm{C}_{2,k-1}^{-1}\|^{2}_{F}\\
&\qquad+(\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}+\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}})(\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^{2}_{F}M_{1,V}+\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^{2}_{F}M_{2,U})\big].
\end{align*}

Proof of Lemma 13.
\begin{align*}
\|\bm{\Upsilon}_{1,j-1}^{-1}\bm{\kappa}_{1,j}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-1}^{-1}\bm{\kappa}_{2,j}^{\mathrm{T}}\|^{2}_{F}
&\leq\|\bm{\Upsilon}_{1,j-2}^{-1}\bm{\kappa}_{1,j-1}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-2}^{-1}\bm{\kappa}_{2,j-1}^{\mathrm{T}}\|^{2}_{F}(1+\|\bm{C}_{1,j}^{-1}\|^{2}_{\mathrm{op}}\|\bm{U}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}})\\
&\qquad+2\|\bm{C}_{1,j}^{-1}\bm{U}_{1,j}\bm{V}_{1,j}-\bm{C}_{2,j}^{-1}\bm{U}_{2,j}\bm{V}_{2,j}\|^{2}_{F}\\
&\leq\|\bm{\Upsilon}_{1,j-2}^{-1}\bm{\kappa}_{1,j-1}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-2}^{-1}\bm{\kappa}_{2,j-1}^{\mathrm{T}}\|^{2}_{F}(1+\|\bm{C}_{1,j}^{-1}\|^{2}_{\mathrm{op}}\|\bm{U}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}})\\
&\qquad+2\|\bm{C}^{-1}_{1,j}-\bm{C}^{-1}_{2,j}\|^{2}_{F}\|\bm{U}_{1,j}\|^{2}_{\mathrm{op}}\|\bm{V}_{1,j}\|^{2}_{\mathrm{op}}\\
&\qquad+2\|\bm{C}_{2,j}^{-1}\|^{2}_{\mathrm{op}}\big[\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^{2}_{F}\|\bm{V}_{2,j}\|^{2}_{\mathrm{op}}+\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}\|\bm{U}_{1,j}\|^{2}_{\mathrm{op}}\big]\\
&\leq\|\bm{\Upsilon}_{1,j-2}^{-1}\bm{\kappa}_{1,j-1}^{\mathrm{T}}-\bm{\Upsilon}_{2,j-2}^{-1}\bm{\kappa}_{2,j-1}^{\mathrm{T}}\|^{2}_{F}(1+\|\bm{C}_{1,p}^{-1}\|^{2}_{\mathrm{op}}M_{1,U}M_{1,V})\\
&\qquad+2\|\bm{C}^{-1}_{1,j}-\bm{C}^{-1}_{2,j}\|^{2}_{F}M_{1,U}M_{1,V}\\
&\qquad+2\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}}\big[\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^{2}_{F}M_{2,V}+\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}M_{1,U}\big],
\end{align*}

where $M_{\ell,U}=\max_{j}\|\bm{U}_{\ell,j}\|^{2}_{\mathrm{op}}$ and $M_{\ell,V}=\max_{j}\|\bm{V}_{\ell,j}\|^{2}_{\mathrm{op}}$. Let $M_{\ell,C,U,V}=(1+\|\bm{C}_{\ell,p}^{-1}\|^{2}_{\mathrm{op}}M_{\ell,U}M_{\ell,V})^{p}$; then

\begin{align*}
\|\bm{\Upsilon}_{1,p-1}^{-1}\bm{\kappa}_{1,p}^{\mathrm{T}}-\bm{\Upsilon}_{2,p-1}^{-1}\bm{\kappa}_{2,p}^{\mathrm{T}}\|^{2}_{F}
&\leq 2M_{1,U}M_{1,V}M_{1,C,U,V}\sum_{k=0}^{p}\|\bm{C}^{-1}_{1,k}-\bm{C}^{-1}_{2,k}\|^{2}_{F}\\
&\qquad+2\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}}M_{2,V}M_{1,V}M_{1,C,U,V}\sum_{k=1}^{p}\|\bm{U}_{1,k}-\bm{U}_{2,k}\|^{2}_{F}\\
&\qquad+2\|\bm{C}_{2,p}^{-1}\|^{2}_{\mathrm{op}}M_{2,U}M_{1,V}M_{1,C,U,V}\sum_{k=1}^{p}\|\bm{V}_{1,k}-\bm{V}_{2,k}\|^{2}_{F}.
\end{align*}

Combining (A.2) and (A.3), we get an upper bound for (A.1). ∎

Proof of Lemma 14.

We have $\bm{U}_{j}=\bm{C}_{j-1}\bm{L}_{j}(\bm{I}+\bm{L}^{\mathrm{T}}_{j}\bm{C}_{j-1}\bm{L}_{j})^{-1/2}$ and $\bm{V}_{j}=\bm{K}_{j}(\bm{K}_{j}^{\mathrm{T}}\bm{D}_{j-1}^{-1}\bm{K}_{j})^{-1/2}$.

Thus, $\|\bm{U}_{\ell,j}\|^{2}_{\mathrm{op}}\leq\|\bm{C}_{\ell,j-1}\|^{2}_{\mathrm{op}}\|\bm{L}_{\ell,j}\|^{2}_{\mathrm{op}}$ and $\|\bm{V}_{\ell,j}\|^{2}_{\mathrm{op}}\leq\|\bm{D}_{\ell,j-1}^{-1}\|^{2}_{\mathrm{op}}$.

Using the inequality $\|\bm{A}^{1/2}-\bm{B}^{1/2}\|_{\mathrm{op}}\leq\|\bm{A}-\bm{B}\|^{1/2}_{\mathrm{op}}$ for positive semi-definite $\bm{A}$ and $\bm{B}$, we can bound

\begin{align*}
\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{F}&\leq r\|\bm{V}_{1,j}-\bm{V}_{2,j}\|^{2}_{\mathrm{op}}\\
&\leq\|\bm{K}_{1,j}\|^{2}_{\mathrm{op}}\|(\bm{K}_{1,j}^{\mathrm{T}}\bm{D}_{1,j-1}^{-1}\bm{K}_{1,j})^{-1}-(\bm{K}_{2,j}^{\mathrm{T}}\bm{D}_{2,j-1}^{-1}\bm{K}_{2,j})^{-1}\|_{\mathrm{op}}\\
&\qquad+r\|(\bm{K}_{2,j}^{\mathrm{T}}\bm{D}_{2,0}^{-1}\bm{K}_{2,j})^{-1/2}\|^{2}_{\mathrm{op}}\|\bm{K}_{1,j}-\bm{K}_{2,j}\|^{2}_{\mathrm{op}}.
\end{align*}

Similarly,

\begin{align*}
\|\bm{U}_{1,j}-\bm{U}_{2,j}\|^{2}_{\mathrm{op}}
&\leq\|\bm{C}_{1,j-1}\|^{2}_{\mathrm{op}}\|\bm{L}_{1,j}\|^{2}_{\mathrm{op}}\|(\bm{I}+\bm{L}^{\mathrm{T}}_{1,j}\bm{C}_{1,j-1}\bm{L}_{1,j})^{-1}\|_{\mathrm{op}}\|(\bm{I}+\bm{L}^{\mathrm{T}}_{2,j}\bm{C}_{2,j-1}\bm{L}_{2,j})^{-1}\|_{\mathrm{op}}\\
&\qquad\times\|\bm{L}^{\mathrm{T}}_{1,j}\bm{C}_{1,j-1}\bm{L}_{1,j}-\bm{L}^{\mathrm{T}}_{2,j}\bm{C}_{2,j-1}\bm{L}_{2,j}\|_{\mathrm{op}}\\
&\qquad+\|(\bm{I}+\bm{L}^{\mathrm{T}}_{2,j}\bm{C}_{2,j-1}\bm{L}_{2,j})^{-1/2}\|^{2}_{\mathrm{op}}\|\bm{C}_{1,j-1}\bm{L}_{1,j}-\bm{C}_{2,j-1}\bm{L}_{2,j}\|^{2}_{\mathrm{op}}.
\end{align*}
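The operator square-root inequality invoked above is a standard fact for positive semi-definite matrices and is easy to confirm numerically; a minimal sketch with random positive semi-definite matrices (our choices):

## Numerical check of ||A^{1/2} - B^{1/2}||_op <= ||A - B||_op^{1/2} for PSD A, B.
msqrt <- function(M) {                      # PSD square root via eigendecomposition
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)
}
set.seed(3)
A <- crossprod(matrix(rnorm(16), 4, 4))     # random PSD matrices
B <- crossprod(matrix(rnorm(16), 4, 4))
norm(msqrt(A) - msqrt(B), "2") <= sqrt(norm(A - B, "2"))   # TRUE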

S3 Attributes of the realized graph

The nodal attributes are computed using NetworkToolbox [Christensen, 2018] and brainGraph [Watson, 2020] in R. For the convenience of the reader, we briefly describe the definitions and characteristics of the nodal attributes. For a graph $G$ with $V$ nodes, let us denote the adjacency matrix by $\bm{A}=(\!(a_{i,j})\!)_{1\leq i,j\leq V}$. Then, the degree of node $i$ is $\sum_{j}a_{i,j}$; it measures the strength of each node in the graph. Global efficiency of the graph $G$ is measured by $F(G)=[V(V-1)]^{-1}\sum_{i\neq\ell\in G}r_{i,\ell}^{-1}$, where $r_{i,\ell}$ is the shortest path length between nodes $i$ and $\ell$. The nodal efficiency of node $i$ is defined by $N_{i}^{-1}\sum_{\ell\in N_{i}}F(G_{i})$, where $G_{i}$ is the subgraph of neighbors of $i$ and $N_{i}$ stands for the number of nodes in $G_{i}$. The betweenness centrality of node $i$ is $\sum_{\ell\neq i\neq k}s_{\ell,k}(i)/s_{\ell,k}$, where $s_{\ell,k}$ stands for the total number of shortest paths from node $\ell$ to node $k$ and $s_{\ell,k}(i)$ denotes the number of those paths that pass through $i$. Nodal impact quantifies the impact of each node by measuring the change in average distances in the network upon removal of that node [Kenett et al., 2011]. The local shortest path length of a node is the average shortest path length from that node to all other nodes in the network; it quantifies the mean separation of a node in the network. Eccentricity is the maximal shortest path length between a node and any other node, and thus captures how far a node is from its most distant node in the network. The participation coefficient requires detecting communities within the network first; when using the function participation of the R package NetworkToolbox, we use the walktrap algorithm [Pons and Latapy, 2005] to detect communities. The participation coefficient quantifies the distribution of a node's edges among the different communities in the graph: it is zero if all the edges of a node are restricted to its own community, and it reaches its maximum value of 1 when the node's edges are evenly distributed among all the communities. Within-community centrality describes how central a node is within its own community; we again use the walktrap algorithm to detect the communities. The final centrality measure that we include in our study is leverage centrality, introduced by Joyce et al. [2010]; it is the ratio of the degree of a node to the degree of its neighbors, and it thus measures the reliance of its immediate neighbors on that node for information.
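Several of these attributes can be reproduced from an adjacency matrix with the igraph package; the paper itself uses NetworkToolbox and brainGraph, so the sketch below is only a widely available stand-in, and the toy adjacency matrix is our own illustrative choice:

## Sketch of computing nodal attributes from an adjacency matrix with igraph.
library(igraph)
set.seed(4)
A <- matrix(rbinom(100, 1, 0.3), 10, 10)      # toy adjacency matrix
A <- 1 * ((A + t(A)) > 0); diag(A) <- 0       # symmetrize, zero diagonal
g  <- graph_from_adjacency_matrix(A, mode = "undirected")
sp <- distances(g)                            # shortest path lengths r_{i,l}
deg  <- degree(g)                             # degree of each node
btw  <- betweenness(g)                        # betweenness centrality
ecc  <- eccentricity(g)                       # eccentricity
lspl <- rowSums(sp) / (vcount(g) - 1)         # local shortest path length
geff <- mean(1 / sp[upper.tri(sp)])           # global efficiency F(G)
comm <- membership(cluster_walktrap(g))       # walktrap communities, used for the
                                              # participation coefficient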