
Large Order-Invariant Bayesian VARs with Stochastic Volatility

We would like to thank Chenghan Hou for many helpful comments.

Joshua C. C. Chan
Purdue University
   Gary Koop
University of Strathclyde
   Xuewen Yu
Purdue University
(October 2021)
Abstract

Many popular specifications for Vector Autoregressions (VARs) with multivariate stochastic volatility are not invariant to the way the variables are ordered due to the use of a Cholesky decomposition for the error covariance matrix. We show that the order invariance problem in existing approaches is likely to become more serious in large VARs. We propose a specification which avoids this Cholesky decomposition. We show that the presence of multivariate stochastic volatility allows for identification of the proposed model and prove that it is invariant to ordering. We develop a Markov Chain Monte Carlo algorithm which allows for Bayesian estimation and prediction. In exercises involving artificial and real macroeconomic data, we demonstrate that the choice of variable ordering can have non-negligible effects on empirical results. In a macroeconomic forecasting exercise involving VARs with 20 variables we find that our order-invariant approach leads to the best forecasts and that some choices of variable ordering can lead to poor forecasts using a conventional, non-order-invariant approach.

Keywords: large vector autoregression, order invariance, stochastic volatility, shrinkage prior


JEL classifications: C11, C52, C55

1 Introduction

Vector Autoregressions (VARs) are one of the most popular models in modern macroeconomics. In recent years, two prominent developments have occurred in the Bayesian VAR literature. First, beginning with Banbura, Giannone, and Reichlin (2010), researchers have been working with large VARs involving dozens or even hundreds of dependent variables. Second, there is increasing recognition of the need to allow for time variation in VAR error variances in most macroeconomic datasets; see, among many others, Clark (2011). Both of these developments have led to VAR specifications which are not invariant to the way the variables are ordered in the VAR. This does not matter when a logical ordering of the variables suggests itself (e.g., as in some structural VARs) or when the number of variables in the VAR is very small, since the researcher can then consider all possible ways of ordering the variables. But with a large number of variables it is unlikely that a logical ordering suggests itself, and it is not practical to consider every possible ordering. These considerations motivate the present paper. In it we describe an alternative VAR specification with stochastic volatility (SV) and prove that it is invariant to the way the variables are ordered. We develop a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm which is scalable and allows for Bayesian estimation and forecasting even in high dimensional VARs. We carry out empirical work with artificial and real data which demonstrates the consequences of a lack of order invariance in conventional approaches and shows that our approach does not suffer from them.

There are two basic insights that underlie our approach. The first is that it is the use of a Cholesky decomposition of the error covariance matrix \boldsymbol{\Sigma}_{t} that leads to order dependence. Beginning with influential early VAR-SV papers such as Cogley and Sargent (2005) and Primiceri (2005), this Cholesky decomposition has been used in numerous papers. The second is that SV can be used to identify structural VARs, see Bertsche and Braun (2020). We adopt a VAR-SV specification that avoids the Cholesky decomposition. If we were working with a homoscedastic VAR, our model would be unidentified, but the presence of SV achieves identification. We prove order invariance in theory and show that the approach works well in practice.

The remainder of this paper is organized as follows. The next section discusses ordering issues in VAR-SVs. The third section introduces our multivariate SV process and proves that it is identified and order invariant. The fourth section adds the VAR component to the model and discusses Bayesian estimation of the resulting order-invariant VAR-SV. The fifth section uses artificial data to illustrate the performance of our model in terms of accuracy and computation time. The sixth section investigates the practical consequences of using the non-order invariant approach of Cogley and Sargent (2005) using artificial and real macroeconomic data. Finally, section seven carries out a forecasting exercise which demonstrates that our order-invariant approach forecasts well, but that forecasts produced by the model of Cogley and Sargent (2005) are sensitive to the way that the variables are ordered and some ordering choices can lead to poor forecast performance.

2 Ordering Issues in VAR-SVs

Ordering issues in VARs relate to the error covariance matrix, not the VAR coefficients themselves. For instance, Carriero, Clark, and Marcellino (2019) use a triangular VAR-SV which involves a Cholesky decomposition of the error covariance matrix, \boldsymbol{\Sigma}_{t}. They demonstrate that, conditional on \boldsymbol{\Sigma}_{t}, the draws of the VAR coefficients are invariant to ordering.¹ Such invariance to ordering of the VAR coefficients holds for all the models discussed in this paper. Accordingly, in this section and the next we focus on the multivariate stochastic volatility process. We say a model is invariant to ordering if the order of the variables does not affect the posterior distribution of the error covariance matrix, subject to permutations. In other words, if the ordering of variables i and j in the VAR is changed, then posterior inference on the error covariances and variances associated with variables i and j is unaffected, for all i and j.

¹It is worth noting that one major reason for triangularizing the system is to allow for equation-by-equation estimation of the model. This leads to large computational benefits since there is no need to manipulate the (potentially enormous) posterior covariance matrix of the VAR coefficients in a single block. In particular, Carriero, Clark, and Marcellino (2019) show how, relative to full-system estimation, triangularization can reduce the computational complexity of Bayesian estimation of large VARs from O(n^{6}) to O(n^{4}), where n is the VAR dimension. This highlights the importance of developing VAR specifications, such as the one in this paper, which allow for equation-by-equation estimation.

Before discussing multivariate stochastic volatility approaches which are not order invariant, it is worthwhile noting that there are some multivariate SV approaches which are order invariant. Wishart or inverse-Wishart processes (e.g., Philipov and Glickman (2006), Asai and McAleer (2009) and Chan, Doucet, León-González, and Strachan (2018)) are by construction invariant to ordering. However, estimation of SV models based on these multivariate processes is computationally intensive as it typically involves drawing from high-dimensional, non-standard distributions. Consequently, these models are generally not applicable to large datasets. In contrast, the common stochastic volatility models considered in Carriero, Clark, and Marcellino (2016) and Chan (2020) are order invariant and can be estimated quickly even for large systems. However, these models are less flexible as the time-varying error covariance matrix depends on a single SV process (e.g., error variances are always proportional to each other). Another order invariant approach is based on the discounted Wishart process (e.g., Uhlig (1997), West and Harrison (2006), Prado and West (2010) and Bognanni (2018)), which admits efficient filtering and smoothing algorithms for estimation. However, this approach appears to be too tightly parameterized for macroeconomic data and it tends to underperform in terms of both point and density forecasts relative to standard SV models such as Cogley and Sargent (2005) and Primiceri (2005); see, for example, Arias, Rubio-Ramirez, and Shin (2021) for a forecast comparison exercise.

Another model that is potentially order invariant is the factor stochastic volatility model, and recently such models have been applied to large datasets. However, factor models have some well-known drawbacks, such as the mis-specification concerns associated with assuming that a high dimensional volatility process is well approximated by a low dimensional set of factors. Kastner (2019) notes that, if one imposes the usual identification restrictions (e.g., triangular factor loadings), then the model is not order invariant. Kastner (2019) does not impose those restrictions, arguing that identification of the factor loadings is not necessary for many purposes (e.g., forecasting).

The multivariate stochastic volatility specification that is perhaps most commonly used in macroeconomics is that of Cogley and Sargent (2005). The SV specification of the reduced-form errors, the n-dimensional vector \mathbf{u}_{t}, in Cogley and Sargent (2005) can be written as

\mathbf{u}_{t}=\mathbf{B}_{0}^{-1}\boldsymbol{\varepsilon}_{t},\quad\boldsymbol{\varepsilon}_{t}\sim\mathcal{N}(\mathbf{0},\mathbf{D}_{t}), (1)

where \mathbf{B}_{0} is a unit lower triangular matrix with elements b_{ij} and \mathbf{D}_{t}=\text{diag}(\text{e}^{h_{1,t}},\ldots,\text{e}^{h_{n,t}}) is a diagonal matrix containing the log-volatilities \mathbf{h}_{t}=(h_{1,t},\ldots,h_{n,t})^{\prime}. By assuming that \mathbf{B}_{0} is lower triangular, the implied covariance matrix of \mathbf{u}_{t} naturally depends on the order of the elements in \mathbf{u}_{t}. In particular, one can show that under standard assumptions, the variance of the i-th element of \mathbf{u}_{t}, u_{i,t}, increases as i increases. Hence, this ordering issue becomes worse in higher dimensional models. In addition, since the stochastic volatility specification in Primiceri (2005) is based on Cogley and Sargent (2005) but with \mathbf{B}_{0} time-varying, it has a similar ordering issue.

To illustrate this key point, we assume for simplicity that \boldsymbol{\varepsilon}_{t}\sim\mathcal{N}(\mathbf{0},\mathbf{I}_{n}). We further assume that the priors on the lower triangular elements of \mathbf{B}_{0} are independent with prior means 0 and variances 1. Since \mathbf{B}_{0}\mathbf{u}_{t}=\boldsymbol{\varepsilon}_{t}, we can obtain the elements of \mathbf{u}_{t} recursively and compute their variances \text{Var}(u_{i,t})=\mathbb{E}u_{i,t}^{2}. In particular, we have

u_{1,t} = \varepsilon_{1,t}
u_{2,t} = \varepsilon_{2,t}-b_{2,1}u_{1,t}
u_{3,t} = \varepsilon_{3,t}-b_{3,2}u_{2,t}-b_{3,1}u_{1,t}
\vdots
u_{i,t} = \varepsilon_{i,t}-b_{i,i-1}u_{i-1,t}-\cdots-b_{i,1}u_{1,t}.

Since \mathbb{E}(b_{i,j}b_{i,k})=0 for all j\neq k, the expectation of any cross-product term in u_{i,t}^{2} is 0. Hence, we have

\mathbb{E}u_{1,t}^{2} = \mathbb{E}\varepsilon_{1,t}^{2}=1
\mathbb{E}u_{2,t}^{2} = \mathbb{E}\varepsilon_{2,t}^{2}+\mathbb{E}b_{2,1}^{2}\mathbb{E}u_{1,t}^{2}=1+\mathbb{E}u_{1,t}^{2}=2
\mathbb{E}u_{3,t}^{2} = \mathbb{E}\varepsilon_{3,t}^{2}+\mathbb{E}b_{3,2}^{2}\mathbb{E}u_{2,t}^{2}+\mathbb{E}b_{3,1}^{2}\mathbb{E}u_{1,t}^{2}=1+\mathbb{E}u_{2,t}^{2}+\mathbb{E}u_{1,t}^{2}=4
\vdots
\mathbb{E}u_{i,t}^{2} = \mathbb{E}\varepsilon_{i,t}^{2}+\mathbb{E}b_{i,i-1}^{2}\mathbb{E}u_{i-1,t}^{2}+\cdots+\mathbb{E}b_{i,1}^{2}\mathbb{E}u_{1,t}^{2}=2^{i-1}.

In other words, the variance of the reduced-form error increases (exponentially) the lower it is ordered in the n-tuple, even though the assumptions about b_{i,j} and \varepsilon_{i,t} are identical across i=1,\ldots,n.
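This exponential growth is easy to verify by simulation. The sketch below (our own illustration using NumPy, not code from the paper) draws unit lower triangular matrices \mathbf{B}_{0} with independent standard normal free elements, solves \mathbf{B}_{0}\mathbf{u}_{t}=\boldsymbol{\varepsilon}_{t}, and estimates \text{Var}(u_{i,t}) by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# batched unit lower triangular B0 matrices with independent N(0, 1) free elements
B0 = np.tile(np.eye(n), (reps, 1, 1))
r, c = np.tril_indices(n, -1)
B0[:, r, c] = rng.standard_normal((reps, len(r)))

# one draw of u_t = B0^{-1} eps_t per replication, with eps_t ~ N(0, I_n)
eps = rng.standard_normal((reps, n, 1))
u = np.linalg.solve(B0, eps)[:, :, 0]

var_u = (u**2).mean(axis=0)
print(var_u)   # approximately [1, 2, 4, 8, 16], i.e. 2^(i-1)
```

The Monte Carlo estimates match the 2^{i-1} pattern derived above.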

3 An Order-Invariant Stochastic Volatility Model

3.1 Model Definition

In this section we extend the stochastic volatility specification in Cogley and Sargent (2005) with the goal of constructing an order-invariant model. The key reason the ordering issue in Cogley and Sargent (2005) arises is the lower triangular assumption for \mathbf{B}_{0}. A simple solution is to drop this assumption and let \mathbf{B}_{0} be an unrestricted square matrix. To that end, let \mathbf{y}_{t}=(y_{1,t},\ldots,y_{n,t})^{\prime} be an n\times 1 vector of variables observed over the periods t=1,\ldots,T. To fix ideas, consider the following model with zero conditional mean:

\mathbf{y}_{t}=\mathbf{B}_{0}^{-1}\boldsymbol{\varepsilon}_{t},\quad\boldsymbol{\varepsilon}_{t}\sim\mathcal{N}(\mathbf{0},\mathbf{D}_{t}), (2)

where \mathbf{D}_{t}=\text{diag}(\text{e}^{h_{1,t}},\ldots,\text{e}^{h_{n,t}}) is diagonal. Here we assume that \mathbf{B}_{0} is non-singular, but it is otherwise unrestricted. Each of the log-volatilities follows a stationary AR(1) process:

h_{i,t}=\phi_{i}h_{i,t-1}+u_{i,t}^{h},\quad u_{i,t}^{h}\sim\mathcal{N}(0,\omega_{i}^{2}), (3)

for t=2,\ldots,T, where |\phi_{i}|<1 and the initial condition is specified as h_{i,1}\sim\mathcal{N}(0,\omega_{i}^{2}/(1-\phi_{i}^{2})). Note that the unconditional mean of the AR(1) process is normalized to zero for identification. We call this the Order-Invariant SV (OI-SV) model.
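To make the OI-SV specification concrete, here is a minimal simulation sketch of (2)-(3). The parameter values for \phi_{i}, \omega_{i}^{2} and \mathbf{B}_{0} below are illustrative choices of ours, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 500

# illustrative parameter values (our choices, not estimates from the paper)
phi = np.array([0.95, 0.90, 0.80])
omega2 = np.array([0.05, 0.10, 0.20])
B0 = rng.standard_normal((n, n)) + 2 * np.eye(n)   # non-singular, otherwise unrestricted

# log-volatilities: zero-mean stationary AR(1) processes, eq. (3)
h = np.zeros((T, n))
h[0] = rng.normal(0, np.sqrt(omega2 / (1 - phi**2)))
for t in range(1, T):
    h[t] = phi * h[t - 1] + rng.normal(0, np.sqrt(omega2))

# y_t = B0^{-1} eps_t with eps_t ~ N(0, diag(exp(h_t))), eq. (2)
eps = rng.standard_normal((T, n)) * np.exp(h / 2)
y = np.linalg.solve(B0, eps.T).T
```

Note that \mathbf{B}_{0} is drawn as a full matrix; no triangular structure is imposed at any point.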

3.2 Identification and Order-Invariance

The question of identification of our OI-SV model can be dealt with quickly. If there is no SV, it is well known that \mathbf{B}_{0} is not identified; in particular, any orthogonal transformation of \mathbf{B}_{0} gives the same likelihood. But in the presence of SV, Bertsche and Braun (2020) show that \mathbf{B}_{0} is identified up to permutations and sign changes:

Proposition 1 (Bertsche and Braun (2020)).

Consider the stochastic volatility model given in (2)-(3) with \phi_{i}\neq 0, i=1,\ldots,n. Then, \mathbf{B}_{0} is unique up to permutation of its columns and multiplication of its columns by -1.

The assumptions in Proposition 1 are stronger than necessary for identification. In fact, Bertsche and Braun (2020) show that having n-1 SV processes is sufficient for identification and derive results relating to partial identification, which occurs if there are n-r SV processes with r>1. Intuitively, allowing for time-varying volatility generates non-trivial autocovariances in the squared reduced-form errors. These additional moments help identify \mathbf{B}_{0}. We refer readers to Lanne, Lütkepohl, and Maciejowska (2010) and Lewis (2021) for a more detailed discussion of how identification can be achieved in structural VARs via heteroscedasticity.

Below we outline a few theoretical properties of the stochastic volatility model described in (2). First, we show that the model is order invariant, in the sense that if we permute the order of the dependent variables, the likelihood function implied by the new ordering is the same as that of the original ordering, provided that we permute the parameters accordingly. Stacking \mathbf{y}=(\mathbf{y}_{1}^{\prime},\ldots,\mathbf{y}_{T}^{\prime})^{\prime} and \mathbf{h}=(\mathbf{h}_{1}^{\prime},\ldots,\mathbf{h}_{T}^{\prime})^{\prime}, note that the likelihood implied by (2) is

p(\mathbf{y}\,|\,\mathbf{B}_{0},\mathbf{h})=(2\pi)^{-\frac{nT}{2}}|\det\mathbf{B}_{0}|^{T}\prod_{t=1}^{T}|\det\mathbf{D}_{t}|^{-\frac{1}{2}}\text{e}^{-\frac{1}{2}\sum_{t=1}^{T}\mathbf{y}_{t}^{\prime}\mathbf{B}_{0}^{\prime}\mathbf{D}_{t}^{-1}\mathbf{B}_{0}\mathbf{y}_{t}},

where |\det\mathbf{C}| denotes the absolute value of the determinant of the square matrix \mathbf{C}. Now, suppose we permute the order of the dependent variables. More precisely, let \mathbf{P} denote an arbitrary permutation matrix of dimension n, and define \widetilde{\mathbf{y}}_{t}=\mathbf{P}\mathbf{y}_{t}. By the standard change of variable result, the likelihood of \widetilde{\mathbf{y}}=(\widetilde{\mathbf{y}}_{1}^{\prime},\ldots,\widetilde{\mathbf{y}}_{T}^{\prime})^{\prime} is

|\det\mathbf{P}^{-1}|^{T}(2\pi)^{-\frac{nT}{2}}|\det\mathbf{B}_{0}|^{T}\prod_{t=1}^{T}|\det\mathbf{D}_{t}|^{-\frac{1}{2}}\text{e}^{-\frac{1}{2}\sum_{t=1}^{T}(\mathbf{P}^{-1}\widetilde{\mathbf{y}}_{t})^{\prime}\mathbf{B}_{0}^{\prime}\mathbf{D}_{t}^{-1}\mathbf{B}_{0}\mathbf{P}^{-1}\widetilde{\mathbf{y}}_{t}}
=(2\pi)^{-\frac{nT}{2}}|\det\mathbf{B}_{0}\mathbf{P}^{\prime}|^{T}\prod_{t=1}^{T}|\det\mathbf{D}_{t}|^{-\frac{1}{2}}\text{e}^{-\frac{1}{2}\sum_{t=1}^{T}\widetilde{\mathbf{y}}_{t}^{\prime}\mathbf{P}\mathbf{B}_{0}^{\prime}\mathbf{D}_{t}^{-1}\mathbf{B}_{0}\mathbf{P}^{\prime}\widetilde{\mathbf{y}}_{t}}
=(2\pi)^{-\frac{nT}{2}}|\det\mathbf{P}\mathbf{B}_{0}\mathbf{P}^{\prime}|^{T}\prod_{t=1}^{T}|\det\mathbf{P}\mathbf{D}_{t}\mathbf{P}^{\prime}|^{-\frac{1}{2}}\text{e}^{-\frac{1}{2}\sum_{t=1}^{T}\widetilde{\mathbf{y}}_{t}^{\prime}\mathbf{P}\mathbf{B}_{0}^{\prime}\mathbf{P}^{\prime}\mathbf{P}\mathbf{D}_{t}^{-1}\mathbf{P}^{\prime}\mathbf{P}\mathbf{B}_{0}\mathbf{P}^{\prime}\widetilde{\mathbf{y}}_{t}}
=(2\pi)^{-\frac{nT}{2}}|\det\widetilde{\mathbf{B}}_{0}|^{T}\prod_{t=1}^{T}|\det\widetilde{\mathbf{D}}_{t}|^{-\frac{1}{2}}\text{e}^{-\frac{1}{2}\sum_{t=1}^{T}\widetilde{\mathbf{y}}_{t}^{\prime}\widetilde{\mathbf{B}}_{0}^{\prime}\widetilde{\mathbf{D}}_{t}^{-1}\widetilde{\mathbf{B}}_{0}\widetilde{\mathbf{y}}_{t}}=p(\widetilde{\mathbf{y}}\,|\,\widetilde{\mathbf{B}}_{0},\widetilde{\mathbf{h}}),

where \widetilde{\mathbf{B}}_{0}=\mathbf{P}\mathbf{B}_{0}\mathbf{P}^{\prime}, \widetilde{\mathbf{D}}_{t}=\mathbf{P}\mathbf{D}_{t}\mathbf{P}^{\prime}=\text{diag}(\text{e}^{\widetilde{\mathbf{h}}_{t}}) and \widetilde{\mathbf{h}}=(\widetilde{\mathbf{h}}_{1}^{\prime},\ldots,\widetilde{\mathbf{h}}_{T}^{\prime})^{\prime} with \widetilde{\mathbf{h}}_{t}=\mathbf{P}\mathbf{h}_{t}. Note that we have used the fact that \mathbf{P}^{-1}=\mathbf{P}^{\prime} and |\det\mathbf{P}|=1, as all permutation matrices are orthogonal. In addition, since \mathbf{P}^{-1}\widetilde{\mathbf{y}}_{t}=\mathbf{y}_{t}, the first line in the above derivation is equal to p(\mathbf{y}\,|\,\mathbf{B}_{0},\mathbf{h}). Hence, we have shown that p(\mathbf{y}\,|\,\mathbf{B}_{0},\mathbf{h})=p(\widetilde{\mathbf{y}}\,|\,\widetilde{\mathbf{B}}_{0},\widetilde{\mathbf{h}}). We summarize this result in the following proposition.

Proposition 2 (Order Invariance).

Let p(\mathbf{y}\,|\,\mathbf{B}_{0},\mathbf{h}) denote the likelihood of the stochastic volatility model given in (2). Let \mathbf{P} be an arbitrary n\times n permutation matrix and define \widetilde{\mathbf{y}}_{t}=\mathbf{P}\mathbf{y}_{t} and \widetilde{\mathbf{h}}_{t}=\mathbf{P}\mathbf{h}_{t}. Then, the likelihood with dependent variables \widetilde{\mathbf{y}}_{t} has exactly the same form but with the parameters permuted accordingly. More precisely,

p(\mathbf{y}\,|\,\mathbf{B}_{0},\mathbf{h})=p(\widetilde{\mathbf{y}}\,|\,\widetilde{\mathbf{B}}_{0},\widetilde{\mathbf{h}}),

where \widetilde{\mathbf{B}}_{0}=\mathbf{P}\mathbf{B}_{0}\mathbf{P}^{\prime} and \widetilde{\mathbf{h}}=(\widetilde{\mathbf{h}}_{1}^{\prime},\ldots,\widetilde{\mathbf{h}}_{T}^{\prime})^{\prime}.
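Proposition 2 can be checked numerically. The sketch below (our own illustration, with arbitrary parameter values) implements the log-likelihood formula above and verifies that permuting the data, the rows and columns of \mathbf{B}_{0}, and the log-volatilities leaves the log-likelihood unchanged:

```python
import numpy as np

def oisv_loglik(y, B0, h):
    """Log-likelihood of the OI-SV model: y_t = B0^{-1} eps_t, eps_t ~ N(0, diag(e^{h_t}))."""
    T, n = y.shape
    e = y @ B0.T                       # rows are eps_t' = y_t' B0'
    quad = np.sum(e**2 * np.exp(-h))   # sum_t y_t' B0' D_t^{-1} B0 y_t
    return (-0.5 * n * T * np.log(2 * np.pi)
            + T * np.log(abs(np.linalg.det(B0)))
            - 0.5 * h.sum() - 0.5 * quad)

rng = np.random.default_rng(2)
n, T = 4, 50
y = rng.standard_normal((T, n))
h = 0.1 * rng.standard_normal((T, n))
B0 = rng.standard_normal((n, n)) + 2 * np.eye(n)

P = np.eye(n)[rng.permutation(n)]      # arbitrary permutation matrix
ll = oisv_loglik(y, B0, h)
ll_perm = oisv_loglik(y @ P.T, P @ B0 @ P.T, h @ P.T)
assert np.isclose(ll, ll_perm)
```

The assertion holds for any permutation, as the proposition states.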

Next, we derive the unconditional covariance matrix of the data implied by the stochastic volatility model in (2)-(3). More specifically, given the model parameters \mathbf{B}_{0}, \phi_{i} and \omega^{2}_{i}, i=1,\ldots,n, suppose we generate the log-volatility processes via (3). It is then straightforward to compute the implied unconditional covariance matrix of \mathbf{y}_{t}.

Since the unconditional distribution of h_{i,t} implied by (3) is \mathcal{N}(0,\omega_{i}^{2}/(1-\phi_{i}^{2})), the unconditional distribution of \text{e}^{h_{i,t}} is log-normal with mean \text{e}^{\frac{\omega_{i}^{2}}{2(1-\phi_{i}^{2})}}. Hence, the unconditional mean of \mathbf{D}_{t} is \mathbf{V}_{\mathbf{D}}=\text{diag}(\text{e}^{\frac{\omega_{1}^{2}}{2(1-\phi_{1}^{2})}},\ldots,\text{e}^{\frac{\omega_{n}^{2}}{2(1-\phi_{n}^{2})}}). It then follows that the unconditional covariance matrix of \mathbf{y}_{t} can be written as

\boldsymbol{\Sigma}\equiv(\mathbf{B}_{0}^{\prime}\mathbf{V}_{\mathbf{D}}^{-1}\mathbf{B}_{0})^{-1}=(\overline{\mathbf{B}}_{0}^{\prime}\overline{\mathbf{B}}_{0})^{-1},

where \overline{\mathbf{B}}_{0}=\mathbf{V}_{\mathbf{D}}^{-\frac{1}{2}}\mathbf{B}_{0}, i.e., each row i of \mathbf{B}_{0} is scaled by \text{e}^{-\frac{\omega_{i}^{2}}{4(1-\phi_{i}^{2})}}, i=1,\ldots,n. From this expression it is also clear that if we permute the order of the dependent variables via \widetilde{\mathbf{y}}_{t}=\mathbf{P}\mathbf{y}_{t}, the unconditional covariance matrix of \widetilde{\mathbf{y}}_{t} can be obtained by permuting the rows and columns of \overline{\mathbf{B}}_{0} accordingly. More precisely:

\widetilde{\boldsymbol{\Sigma}}\equiv\mathbf{P}\boldsymbol{\Sigma}\mathbf{P}^{\prime}=(\mathbf{P}\overline{\mathbf{B}}_{0}^{\prime}\mathbf{P}^{\prime}\mathbf{P}\overline{\mathbf{B}}_{0}\mathbf{P}^{\prime})^{-1}=(\widetilde{\overline{\mathbf{B}}}_{0}^{\prime}\widetilde{\overline{\mathbf{B}}}_{0})^{-1},

where \widetilde{\overline{\mathbf{B}}}_{0}=\mathbf{P}\overline{\mathbf{B}}_{0}\mathbf{P}^{\prime}. Hence, the unconditional variance of any element of \mathbf{y}_{t} does not depend on its position in the n-tuple.
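A quick numerical check of this permutation identity, with arbitrary illustrative parameter values of our own:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
phi = rng.uniform(0.5, 0.95, n)
omega2 = rng.uniform(0.01, 0.2, n)
B0 = rng.standard_normal((n, n)) + 2 * np.eye(n)

# V_D: unconditional mean of D_t (log-normal means); Sigma: implied covariance of y_t
VD = np.diag(np.exp(omega2 / (2 * (1 - phi**2))))
Sigma = np.linalg.inv(B0.T @ np.linalg.inv(VD) @ B0)

# permuting the variables simply permutes the rows and columns of Sigma
P = np.eye(n)[rng.permutation(n)]
B0_t = P @ B0 @ P.T
VD_t = P @ VD @ P.T
Sigma_perm = np.linalg.inv(B0_t.T @ np.linalg.inv(VD_t) @ B0_t)
assert np.allclose(Sigma_perm, P @ Sigma @ P.T)
```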

This establishes order-invariance of the likelihood function defined by the OI-SV model. Of course, the posterior will also be order invariant (subject to permutations and sign switches) if the prior is. This will hold in any reasonable case: all it requires is that the prior for the parameters in equation i under one ordering becomes the prior for equation j under a different ordering whenever variable i in the first ordering becomes variable j in the second.

4 A VAR with an Order-Invariant SV Specification

4.1 Model Definition

We now add the VAR part to our OI-SV model, leading to the OI-VAR-SV which is given by

\mathbf{y}_{t}=\mathbf{a}+\mathbf{A}_{1}\mathbf{y}_{t-1}+\cdots+\mathbf{A}_{p}\mathbf{y}_{t-p}+\mathbf{B}_{0}^{-1}\boldsymbol{\varepsilon}_{t},\quad\boldsymbol{\varepsilon}_{t}\sim\mathcal{N}(\mathbf{0},\mathbf{D}_{t}), (4)

where \mathbf{a} is an n\times 1 vector of intercepts, \mathbf{A}_{1},\ldots,\mathbf{A}_{p} are n\times n matrices of VAR coefficients, \mathbf{B}_{0} is a non-singular but otherwise unrestricted n\times n matrix, and \mathbf{D}_{t}=\text{diag}(\text{e}^{h_{1,t}},\ldots,\text{e}^{h_{n,t}}) is diagonal. Finally, each log-volatility h_{i,t} follows the stationary AR(1) process specified in (3).
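The OI-VAR-SV in (4) extends the earlier simulation of (2)-(3) only by the VAR recursion. A minimal sketch for p=1, again with illustrative parameter values of our own:

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 3, 300

# illustrative parameters for a VAR(1) version of (4); not values from the paper
a = np.zeros(n)
A1 = 0.5 * np.eye(n)                               # VAR coefficient matrix
B0 = rng.standard_normal((n, n)) + 2 * np.eye(n)   # non-singular, unrestricted
phi = np.full(n, 0.9)
omega2 = np.full(n, 0.1)

h = np.zeros((T, n))
eps = np.zeros((T, n))
y = np.zeros((T, n))
h[0] = rng.normal(0, np.sqrt(omega2 / (1 - phi**2)))
B0inv = np.linalg.inv(B0)
for t in range(1, T):
    h[t] = phi * h[t - 1] + rng.normal(0, np.sqrt(omega2))   # log-volatilities, eq. (3)
    eps[t] = rng.normal(0, np.exp(h[t] / 2))                 # eps_t ~ N(0, D_t)
    y[t] = a + A1 @ y[t - 1] + B0inv @ eps[t]                # eq. (4) with p = 1
```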

4.2 Shrinkage Priors

We argued previously that ordering issues are likely to be most important in high dimensional models. With large VARs over-parameterization concerns can be substantial and, for this reason, Bayesian methods involving shrinkage priors are commonly used. In this sub-section, we describe a particular VAR prior with attractive properties. But it is worth noting that any of the standard Bayesian VAR priors (e.g. the Minnesota prior) could have been used and the resulting model would still have been order invariant.

Let \boldsymbol{\alpha}_{i} denote the VAR coefficients in the i-th equation, i=1,\ldots,n. We consider the Minnesota-type adaptive hierarchical priors proposed in Chan (2021), which combine the advantages of the Minnesota priors (e.g., rich prior beliefs such as cross-variable shrinkage) and of modern adaptive hierarchical priors (e.g., heavy tails and substantial mass around 0). In particular, we construct a Minnesota-type horseshoe prior as follows. For \alpha_{i,j}, the j-th coefficient in the i-th equation, let \kappa_{i,j}=\kappa_{1} if it is a coefficient on an ‘own lag’ and \kappa_{i,j}=\kappa_{2} if it is a coefficient on an ‘other lag’. Given the constants C_{i,j} defined below, consider the horseshoe prior on the VAR coefficients (excluding the intercepts) \alpha_{i,j}, i=1,\ldots,n, j=2,\ldots,k, with k=np+1:

(\alpha_{i,j}\,|\,\kappa_{1},\kappa_{2},\psi_{i,j}) \sim \mathcal{N}(m_{i,j},\kappa_{i,j}\psi_{i,j}C_{i,j}), (5)
\sqrt{\psi_{i,j}} \sim \mathcal{C}^{+}(0,1), (6)
\sqrt{\kappa_{1}},\sqrt{\kappa_{2}} \sim \mathcal{C}^{+}(0,1), (7)

where \mathcal{C}^{+}(0,1) denotes the standard half-Cauchy distribution. Here \kappa_{1} and \kappa_{2} are global variance components common to the coefficients on, respectively, own and other lags, whereas each \psi_{i,j} is a local variance component specific to the coefficient \alpha_{i,j}. Lastly, the constants C_{i,j} are obtained as in the Minnesota prior, i.e.,

C_{i,j}=\begin{cases}\frac{1}{l^{2}}, & \text{for the coefficient on the } l\text{-th lag of variable } i,\\ \frac{s_{i}^{2}}{l^{2}s_{j}^{2}}, & \text{for the coefficient on the } l\text{-th lag of variable } j,\ j\neq i,\end{cases} (8)

where s_{r}^{2} denotes the sample variance of the residuals from an AR(4) model for variable r, r=1,\ldots,n. Furthermore, for data in growth rates m_{i,j} is set to 0; for data in levels, m_{i,j} is set to zero as well, except for the coefficient associated with the first own lag, which is set to one.
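The constants in (8) are simple to compute. The following sketch (our own helper functions with hypothetical names; the column ordering of lagged variables is an assumption of the illustration) fits AR(4) regressions by least squares to obtain s_{r}^{2} and assembles C_{i,j}:

```python
import numpy as np

def ar4_resid_var(x):
    """Sample variance of the residuals from an AR(4) fitted by least squares."""
    X = np.column_stack([np.ones(len(x) - 4)] + [x[4 - l:len(x) - l] for l in range(1, 5)])
    beta, *_ = np.linalg.lstsq(X, x[4:], rcond=None)
    return np.var(x[4:] - X @ beta)

def minnesota_constants(data, p):
    """C[i, (l-1)*n + j]: constant for the l-th lag of variable j in equation i, as in (8)."""
    n = data.shape[1]
    s2 = np.array([ar4_resid_var(data[:, j]) for j in range(n)])
    C = np.empty((n, n * p))
    for l in range(1, p + 1):
        for i in range(n):
            for j in range(n):
                C[i, (l - 1) * n + j] = 1 / l**2 if i == j else s2[i] / (l**2 * s2[j])
    return C

# example with simulated data
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 3)).cumsum(axis=0)
C = minnesota_constants(data, p=2)   # own-lag constants: 1 at lag 1, 1/4 at lag 2
```

Note the own-lag constants decay as 1/l², so coefficients on longer lags are shrunk more aggressively.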

It is easy to verify that if all the local variances are fixed, i.e., \psi_{i,j}\equiv 1, then the Minnesota-type horseshoe prior given in (5)–(7) reduces to the Minnesota prior. Hence, it can be viewed as an extension of the Minnesota prior that introduces a local variance component so that the marginal prior distribution of \alpha_{i,j} has heavy tails. On the other hand, if m_{i,j}=0, C_{i,j}=1 and \kappa_{1}=\kappa_{2}, then the Minnesota-type horseshoe prior reduces to the standard horseshoe prior in which the coefficients are identically distributed. It can therefore also be viewed as an extension of the horseshoe prior (Carvalho, Polson, and Scott, 2010) that incorporates richer prior beliefs on the VAR coefficients, such as cross-variable shrinkage, i.e., shrinking coefficients on own lags differently from those on other lags.

To facilitate estimation, we follow Makalic and Schmidt (2016) and use the following latent variable representations of the half-Cauchy distributions in (6)-(7):

(\psi_{i,j}\,|\,z_{\psi_{i,j}}) \sim \mathcal{IG}(1/2,1/z_{\psi_{i,j}}),\quad z_{\psi_{i,j}}\sim\mathcal{IG}(1/2,1), (9)
(\kappa_{l}\,|\,z_{\kappa_{l}}) \sim \mathcal{IG}(1/2,1/z_{\kappa_{l}}),\quad z_{\kappa_{l}}\sim\mathcal{IG}(1/2,1), (10)

for i=1,\ldots,n, j=2,\ldots,k and l=1,2, where \mathcal{IG} denotes the inverse-gamma distribution.
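This latent variable representation can be verified by simulation: drawing z\sim\mathcal{IG}(1/2,1) and then \psi\,|\,z\sim\mathcal{IG}(1/2,1/z) should make \sqrt{\psi} a standard half-Cauchy draw. A quick Monte Carlo check (our own sketch) against the half-Cauchy quantile function Q(p)=\tan(\pi p/2):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500_000

# IG(a, b) draws via 1 / Gamma(shape=a, scale=1/b)
z = 1.0 / rng.gamma(0.5, 1.0, size=N)   # z ~ IG(1/2, 1)
psi = 1.0 / rng.gamma(0.5, z)           # psi | z ~ IG(1/2, 1/z)  (gamma scale = z)
x = np.sqrt(psi)                        # should be distributed C+(0, 1)

# compare sample quantiles to Q(p) = tan(pi * p / 2)
q = np.quantile(x, [0.25, 0.5, 0.75])
print(q)   # approximately [0.414, 1.0, 2.414]
```

The sample quantiles match the half-Cauchy quantiles, as the representation implies.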

Next, we specify the priors on the other model parameters. Let \mathbf{b}_{i} denote the i-th row of \mathbf{B}_{0} for i=1,\ldots,n, i.e., \mathbf{B}_{0}^{\prime}=(\mathbf{b}_{1},\ldots,\mathbf{b}_{n}). We consider independent priors on \mathbf{b}_{i}, i=1,\ldots,n:

\mathbf{b}_{i}\sim\mathcal{N}(\mathbf{b}_{0,i},\mathbf{V}_{\mathbf{b}_{i}}).

For the parameters in the stochastic volatility equations, we assume the following priors for j=1,\ldots,n:

\phi_{j}\sim\mathcal{N}(\phi_{0,j},V_{\phi_{j}})1(|\phi_{j}|<1),\quad\omega_{j}^{2}\sim\mathcal{IG}(\nu_{j},S_{j}).

4.3 MCMC Algorithm

In this section we develop a posterior sampler for Bayesian estimation of the order-invariant stochastic volatility model with the Minnesota-type horseshoe prior. For later reference, let \boldsymbol{\psi}_{i}=(\psi_{i,1},\ldots,\psi_{i,k})^{\prime} denote the local variance components associated with \boldsymbol{\alpha}_{i} and define \boldsymbol{\kappa}=(\kappa_{1},\kappa_{2})^{\prime}. Furthermore, let \mathbf{z}_{\boldsymbol{\psi}_{i}}=(z_{\psi_{i,1}},\ldots,z_{\psi_{i,k}})^{\prime} denote the latent variables corresponding to \boldsymbol{\psi}_{i} and similarly define \mathbf{z}_{\boldsymbol{\kappa}}. Next, stack \mathbf{y}=(\mathbf{y}_{1}^{\prime},\ldots,\mathbf{y}_{T}^{\prime})^{\prime}, \boldsymbol{\alpha}=(\boldsymbol{\alpha}_{1}^{\prime},\ldots,\boldsymbol{\alpha}_{n}^{\prime})^{\prime}, \boldsymbol{\psi}=(\boldsymbol{\psi}_{1}^{\prime},\ldots,\boldsymbol{\psi}_{n}^{\prime})^{\prime}, \mathbf{z}_{\boldsymbol{\psi}}=(\mathbf{z}_{\boldsymbol{\psi}_{1}}^{\prime},\ldots,\mathbf{z}_{\boldsymbol{\psi}_{n}}^{\prime})^{\prime} and \mathbf{h}=(\mathbf{h}_{1}^{\prime},\ldots,\mathbf{h}_{T}^{\prime})^{\prime} with \mathbf{h}_{t}=(h_{1,t},\ldots,h_{n,t})^{\prime}. Finally, let \mathbf{h}_{i,\boldsymbol{\cdot}}=(h_{i,1},\ldots,h_{i,T})^{\prime} denote the vector of log-volatilities for the i-th equation, i=1,\ldots,n. Then, posterior draws can be obtained by sampling sequentially from:

  1. 1.

    p(𝐁0|𝐲,𝜶,𝐡,ϕ,𝝎2,𝝍,𝜿,𝐳𝝍,𝐳𝜿)p(\mathbf{B}_{0}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{h},\boldsymbol{\phi},\boldsymbol{\omega}^{2},\boldsymbol{\psi},\boldsymbol{\kappa},\mathbf{z}_{\boldsymbol{\psi}},\mathbf{z}_{\boldsymbol{\kappa}}) = p(𝐁0|𝐲,𝜶,𝐡)p(\mathbf{B}_{0}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{h});

  2. 2.

    p(𝜶|𝐲,𝐁0,𝐡,ϕ,𝝎2,𝝍,𝜿,𝐳𝝍,𝐳𝜿)p(\boldsymbol{\alpha}\,|\,\mathbf{y},\mathbf{B}_{0},\mathbf{h},\boldsymbol{\phi},\boldsymbol{\omega}^{2},\boldsymbol{\psi},\boldsymbol{\kappa},\mathbf{z}_{\boldsymbol{\psi}},\mathbf{z}_{\boldsymbol{\kappa}}) = p(𝜶|𝐲,𝐁0,𝐡,𝝍,𝜿)p(\boldsymbol{\alpha}\,|\,\mathbf{y},\mathbf{B}_{0},\mathbf{h},\boldsymbol{\psi},\boldsymbol{\kappa});

  3. 3.

    p(𝝍|𝐲,𝜶,𝐁0,𝐡,ϕ,𝝎2,𝜿,𝐳𝝍,𝐳𝜿)p(\boldsymbol{\psi}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{B}_{0},\mathbf{h},\boldsymbol{\phi},\boldsymbol{\omega}^{2},\boldsymbol{\kappa},\mathbf{z}_{\boldsymbol{\psi}},\mathbf{z}_{\boldsymbol{\kappa}}) = i=1nj=2kp(ψi,j|αi,j,𝜿,zψi,j)\prod_{i=1}^{n}\prod_{j=2}^{k}p(\psi_{i,j}\,|\,\alpha_{i,j},\boldsymbol{\kappa},z_{\psi_{i,j}});

  4. 4.

    p(𝜿|𝐲,𝜶,𝐁0,𝐡,ϕ,𝝎2,𝝍,𝐳𝝍,𝐳𝜿)p(\boldsymbol{\kappa}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{B}_{0},\mathbf{h},\boldsymbol{\phi},\boldsymbol{\omega}^{2},\boldsymbol{\psi},\mathbf{z}_{\boldsymbol{\psi}},\mathbf{z}_{\boldsymbol{\kappa}}) = l=12p(κl|𝜶,𝝍,𝐳κl)\prod_{l=1}^{2}p(\kappa_{l}\,|\,\boldsymbol{\alpha},\boldsymbol{\psi},\mathbf{z}_{\kappa_{l}});

  5.

    p(𝐳𝝍|𝐲,𝜶,𝐁0,𝐡,ϕ,𝝎2,𝜿,𝝍,𝐳𝜿)p(\mathbf{z}_{\boldsymbol{\psi}}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{B}_{0},\mathbf{h},\boldsymbol{\phi},\boldsymbol{\omega}^{2},\boldsymbol{\kappa},\boldsymbol{\psi},\mathbf{z}_{\boldsymbol{\kappa}}) = i=1nj=2kp(zψi,j|ψi,j)\prod_{i=1}^{n}\prod_{j=2}^{k}p(z_{\psi_{i,j}}\,|\,\psi_{i,j});

  6.

    p(𝐳𝜿|𝐲,𝜶,𝐁0,𝐡,ϕ,𝝎2,𝜿,𝝍,𝐳𝝍)p(\mathbf{z}_{\boldsymbol{\kappa}}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{B}_{0},\mathbf{h},\boldsymbol{\phi},\boldsymbol{\omega}^{2},\boldsymbol{\kappa},\boldsymbol{\psi},\mathbf{z}_{\boldsymbol{\psi}}) = l=12p(zκl|κl)\prod_{l=1}^{2}p(z_{\kappa_{l}}\,|\,\kappa_{l});

  7.

    p(𝐡|𝐲,𝐁0,𝜶,ϕ,𝝎2,𝝍,𝜿,𝐳𝝍,𝐳𝜿)=i=1np(𝐡i,|𝐲,𝐁0,𝜶,ϕ,𝝎2)p(\mathbf{h}\,|\,\mathbf{y},\mathbf{B}_{0},\boldsymbol{\alpha},\boldsymbol{\phi},\boldsymbol{\omega}^{2},\boldsymbol{\psi},\boldsymbol{\kappa},\mathbf{z}_{\boldsymbol{\psi}},\mathbf{z}_{\boldsymbol{\kappa}})=\prod_{i=1}^{n}p(\mathbf{h}_{i,\boldsymbol{\cdot}}\,|\,\mathbf{y},\mathbf{B}_{0},\boldsymbol{\alpha},\boldsymbol{\phi},\boldsymbol{\omega}^{2});

  8.

    p(𝝎2|𝐲,𝐁0,𝜶,𝐡,ϕ,𝝍,𝜿,𝐳𝝍,𝐳𝜿)=i=1np(ωi2|𝐡i,,ϕi)p(\boldsymbol{\omega}^{2}\,|\,\mathbf{y},\mathbf{B}_{0},\boldsymbol{\alpha},\mathbf{h},\boldsymbol{\phi},\boldsymbol{\psi},\boldsymbol{\kappa},\mathbf{z}_{\boldsymbol{\psi}},\mathbf{z}_{\boldsymbol{\kappa}})=\prod_{i=1}^{n}p(\omega_{i}^{2}\,|\,\mathbf{h}_{i,\boldsymbol{\cdot}},\phi_{i});

  9.

    p(ϕ|𝐲,𝐁0,𝜶,𝐡,𝝎2,𝝍,𝜿,𝐳𝝍,𝐳𝜿)=i=1np(ϕi|𝐡i,,ωi2)p(\boldsymbol{\phi}\,|\,\mathbf{y},\mathbf{B}_{0},\boldsymbol{\alpha},\mathbf{h},\boldsymbol{\omega}^{2},\boldsymbol{\psi},\boldsymbol{\kappa},\mathbf{z}_{\boldsymbol{\psi}},\mathbf{z}_{\boldsymbol{\kappa}})=\prod_{i=1}^{n}p(\phi_{i}\,|\,\mathbf{h}_{i,\boldsymbol{\cdot}},\omega^{2}_{i}).

Step 1. To implement Step 1, we adapt the sampling approach in Waggoner and Zha (2003) and Villani (2009) to the setting with stochastic volatility. More specifically, we aim to draw each row of 𝐁0\mathbf{B}_{0} given all other rows. To that end, we first rewrite (4) as:

(𝐘𝐗𝐀)𝐁0=𝐄,(\mathbf{Y}-\mathbf{X}\mathbf{A})\mathbf{B}_{0}^{\prime}=\mathbf{E}, (11)

where 𝐘\mathbf{Y} is the T×nT\times n matrix of dependent variables, 𝐗\mathbf{X} is the T×kT\times k matrix of lagged dependent variables with k=1+npk=1+np, 𝐀=(𝐚,𝐀1,,𝐀p)\mathbf{A}=(\mathbf{a},\mathbf{A}_{1},\ldots,\mathbf{A}_{p})^{\prime} is the k×nk\times n matrix of VAR coefficients and 𝐄\mathbf{E} is the T×nT\times n matrix of errors. Then, for i=1,,ni=1,\ldots,n, we have

(𝐘𝐗𝐀)𝐛i=𝐄i,𝐄i𝒩(𝟎,𝛀𝐡i,),(\mathbf{Y}-\mathbf{X}\mathbf{A})\mathbf{b}_{i}=\mathbf{E}_{i},\quad\mathbf{E}_{i}\sim\mathcal{N}(\mathbf{0},\boldsymbol{\Omega}_{\mathbf{h}_{i,\boldsymbol{\cdot}}}),

where 𝐄i\mathbf{E}_{i} is the ii-th column of 𝐄\mathbf{E} and 𝛀𝐡i,=diag(ehi,1,,ehi,T).\boldsymbol{\Omega}_{\mathbf{h}_{i,\boldsymbol{\cdot}}}=\text{diag}(\text{e}^{h_{i,1}},\ldots,\text{e}^{h_{i,T}}). Hence, the full conditional distribution of 𝐛i\mathbf{b}_{i} is given by

p(𝐛i|𝐲,𝜶,𝐛i,𝐡i,)\displaystyle p(\mathbf{b}_{i}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{b}_{-i},\mathbf{h}_{i,\boldsymbol{\cdot}}) |det𝐁0|Te12𝐛i(𝐘𝐗𝐀)𝛀𝐡i,1(𝐘𝐗𝐀)𝐛i×e12(𝐛i𝐛0,i)𝐕𝐛i1(𝐛i𝐛0,i)\displaystyle\propto|\det\mathbf{B}_{0}|^{T}\text{e}^{-\frac{1}{2}\mathbf{b}_{i}^{\prime}(\mathbf{Y}-\mathbf{X}\mathbf{A})^{\prime}\boldsymbol{\Omega}_{\mathbf{h}_{i,\boldsymbol{\cdot}}}^{-1}(\mathbf{Y}-\mathbf{X}\mathbf{A})\mathbf{b}_{i}}\times\text{e}^{-\frac{1}{2}(\mathbf{b}_{i}-\mathbf{b}_{0,i})^{\prime}\mathbf{V}_{\mathbf{b}_{i}}^{-1}(\mathbf{b}_{i}-\mathbf{b}_{0,i})}
|det𝐁0|Te12(𝐛i𝐛^i)𝐊𝐛i(𝐛i𝐛^i)\displaystyle\propto|\det\mathbf{B}_{0}|^{T}\text{e}^{-\frac{1}{2}(\mathbf{b}_{i}-\widehat{\mathbf{b}}_{i})^{\prime}\mathbf{K}_{\mathbf{b}_{i}}(\mathbf{b}_{i}-\widehat{\mathbf{b}}_{i})} (12)

where

𝐊𝐛i=𝐕𝐛i1+(𝐘𝐗𝐀)𝛀𝐡i,1(𝐘𝐗𝐀),𝐛^i=𝐊𝐛i1𝐕𝐛i1𝐛0,i.\mathbf{K}_{\mathbf{b}_{i}}=\mathbf{V}_{\mathbf{b}_{i}}^{-1}+(\mathbf{Y}-\mathbf{X}\mathbf{A})^{\prime}\boldsymbol{\Omega}_{\mathbf{h}_{i,\boldsymbol{\cdot}}}^{-1}(\mathbf{Y}-\mathbf{X}\mathbf{A}),\quad\widehat{\mathbf{b}}_{i}=\mathbf{K}_{\mathbf{b}_{i}}^{-1}\mathbf{V}_{\mathbf{b}_{i}}^{-1}\mathbf{b}_{0,i}.

The above full conditional posterior distribution is non-standard and direct sampling is infeasible. Fortunately, Waggoner and Zha (2003) develop an efficient algorithm to sample 𝐛i\mathbf{b}_{i} in the case when 𝐛^i=𝟎.\widehat{\mathbf{b}}_{i}=\mathbf{0}. Villani (2009) further generalizes this sampling approach for non-zero 𝐛^i\widehat{\mathbf{b}}_{i}.

The key idea is to show that 𝐛i\mathbf{b}_{i} has the same distribution as a linear transformation of an nn-vector that consists of one absolute normal and n1n-1 normal random variables. A random variable ZZ follows the absolute normal distribution 𝒜𝒩(μ,ρ)\mathcal{AN}(\mu,\rho) if it has the density function

f𝒜𝒩(z;μ,ρ)=c|z|1ρe12ρ(zμ)2,z,ρ+,μ,f_{\mathcal{AN}}(z;\mu,\rho)=c|z|^{\frac{1}{\rho}}\text{e}^{-\frac{1}{2\rho}(z-\mu)^{2}},\quad z\in\mathbb{R},\rho\in\mathbb{R}^{+},\mu\in\mathbb{R},

where cc is a normalizing constant.
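Exact draws from this distribution are not required in practice — the sampler described below uses a mixture approximation — but as a concrete, hedged illustration, the density can be sampled by inverse-CDF on a grid. The sketch below is our own (in Python/NumPy; the function name, grid bounds and resolution are our choices, not from the paper):

```python
import numpy as np

def sample_absolute_normal(mu, rho, size=1, grid_points=20000, rng=None):
    """Draw from AN(mu, rho), whose density is proportional to
    |z|^(1/rho) * exp(-(z - mu)^2 / (2 * rho)), by inverse-CDF sampling
    on a grid (a crude stand-in for the two-component normal mixture
    approximation of Villani, 2009)."""
    rng = np.random.default_rng() if rng is None else rng
    # the modes solve z^2 - mu*z - 1 = 0, so a grid of half-width
    # |mu| + 3 + 5*sqrt(rho) comfortably covers the mass
    half_width = abs(mu) + 3.0 + 5.0 * np.sqrt(rho)
    grid = np.linspace(-half_width, half_width, grid_points)
    with np.errstate(divide="ignore"):
        log_f = np.log(np.abs(grid)) / rho - (grid - mu) ** 2 / (2.0 * rho)
    log_f -= log_f.max()              # normalize in logs to avoid overflow
    cdf = np.cumsum(np.exp(log_f))
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, rng.random(size))]
```

With ρ = T⁻¹ the |z|^{1/ρ} factor concentrates the draws sharply around the dominant mode; for example, with μ = 0.5 and ρ = 0.01 essentially all draws are positive and close to 1.28, the positive root of z² − μz − 1 = 0.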

To formally state the results, let 𝐂i\mathbf{C}_{i} denote the Cholesky factor of T1𝐊𝐛iT^{-1}\mathbf{K}_{\mathbf{b}_{i}} such that 𝐊𝐛i=T𝐂i𝐂i\mathbf{K}_{\mathbf{b}_{i}}=T\mathbf{C}_{i}\mathbf{C}_{i}^{\prime}. Let 𝐅i\mathbf{F}_{-i} represent the matrix 𝐅\mathbf{F} with the ii-th column deleted and 𝐅\mathbf{F}^{\perp} be the orthogonal complement of 𝐅\mathbf{F}. Furthermore, define 𝐯1=𝐂i1(𝐁0,i)/𝐂i1(𝐁0,i)\mathbf{v}_{1}=\mathbf{C}_{i}^{-1}(\mathbf{B}_{0,-i}^{\prime})^{\perp}/||\mathbf{C}_{i}^{-1}(\mathbf{B}_{0,-i}^{\prime})^{\perp}||, where (𝐁0,i)((𝐁0)i)(\mathbf{B}_{0,-i}^{\prime})^{\perp}\equiv((\mathbf{B}_{0}^{\prime})_{-i})^{\perp}, and let (𝐯2,,𝐯n)=𝐯1(\mathbf{v}_{2},\ldots,\mathbf{v}_{n})=\mathbf{v}_{1}^{\perp}. Then, we claim that

(𝐛i|𝐲,𝜶,𝐛i,𝐡i,)=d(𝐂i)1j=1nξj𝐯j,(\mathbf{b}_{i}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{b}_{-i},\mathbf{h}_{i,\boldsymbol{\cdot}})\stackrel{{\scriptstyle\text{d}}}{{=}}(\mathbf{C}_{i}^{\prime})^{-1}\sum_{j=1}^{n}\xi_{j}\mathbf{v}_{j},

where ξ1𝒜𝒩(ξ^1,T1)\xi_{1}\sim\mathcal{AN}(\widehat{\xi}_{1},T^{-1}) and ξj𝒩(ξ^j,T1),j=2,,n\xi_{j}\sim\mathcal{N}(\widehat{\xi}_{j},T^{-1}),j=2,\ldots,n with ξ^j=𝐛^i𝐂i𝐯j\widehat{\xi}_{j}=\widehat{\mathbf{b}}_{i}^{\prime}\mathbf{C}_{i}\mathbf{v}_{j}.

To prove the claim, let 𝐛i=(𝐂i)1j=1nξj𝐯j\mathbf{b}_{i}=(\mathbf{C}_{i}^{\prime})^{-1}\sum_{j=1}^{n}\xi_{j}\mathbf{v}_{j}. It suffices to show that if we substitute 𝐛i\mathbf{b}_{i} into (12), ξ1,,ξn\xi_{1},\ldots,\xi_{n} are independent with ξ1𝒜𝒩(ξ^1,T1)\xi_{1}\sim\mathcal{AN}(\widehat{\xi}_{1},T^{-1}) and ξj𝒩(ξ^j,T1),j=2,,n\xi_{j}\sim\mathcal{N}(\widehat{\xi}_{j},T^{-1}),j=2,\ldots,n. To that end, note that by construction, 𝐯1,,𝐯n\mathbf{v}_{1},\ldots,\mathbf{v}_{n} forms an orthonormal basis of n\mathbb{R}^{n}, particularly j=1n𝐯j𝐯j=𝐈n\sum_{j=1}^{n}\mathbf{v}_{j}\mathbf{v}_{j}^{\prime}=\mathbf{I}_{n}. We first write the quadratic form in (12) as:

(𝐛i𝐛^i)𝐊𝐛i(𝐛i𝐛^i)\displaystyle(\mathbf{b}_{i}-\widehat{\mathbf{b}}_{i})^{\prime}\mathbf{K}_{\mathbf{b}_{i}}(\mathbf{b}_{i}-\widehat{\mathbf{b}}_{i}) =T(j=1nξj𝐯j𝐂i𝐛^i)(j=1nξj𝐯j𝐂i𝐛^i)\displaystyle=T\left(\sum_{j=1}^{n}\xi_{j}\mathbf{v}_{j}-\mathbf{C}_{i}^{\prime}\widehat{\mathbf{b}}_{i}\right)^{\prime}\left(\sum_{j=1}^{n}\xi_{j}\mathbf{v}_{j}-\mathbf{C}_{i}^{\prime}\widehat{\mathbf{b}}_{i}\right)
=T(j=1nξj22j=1nξj𝐛^i𝐂i𝐯j+𝐛^i𝐂i(j=1n𝐯j𝐯j)𝐂i𝐛^i)\displaystyle=T\left(\sum_{j=1}^{n}\xi_{j}^{2}-2\sum_{j=1}^{n}\xi_{j}\widehat{\mathbf{b}}_{i}^{\prime}\mathbf{C}_{i}\mathbf{v}_{j}+\widehat{\mathbf{b}}_{i}^{\prime}\mathbf{C}_{i}\left(\sum_{j=1}^{n}\mathbf{v}_{j}\mathbf{v}_{j}^{\prime}\right)\mathbf{C}_{i}^{\prime}\widehat{\mathbf{b}}_{i}\right)
=Tj=1n(ξjξ^j)2.\displaystyle=T\sum_{j=1}^{n}(\xi_{j}-\widehat{\xi}_{j})^{2}.

Next, note that by construction, the vectors (𝐂i)1𝐯j(\mathbf{C}_{i}^{\prime})^{-1}\mathbf{v}_{j}, j=2,,nj=2,\ldots,n, lie in the span of the columns of 𝐁0,i\mathbf{B}_{0,-i}^{\prime} (that is, the span of the other rows of 𝐁0\mathbf{B}_{0}). Hence, it follows that

|det𝐁0|=|det𝐁0|=|det(𝐛1,,𝐛i1,(𝐂i)1j=1nξj𝐯j,𝐛i+1,,𝐛n)||ξ1|.|\det\mathbf{B}_{0}|=|\det\mathbf{B}_{0}^{\prime}|=|\det(\mathbf{b}_{1},\ldots,\mathbf{b}_{i-1},(\mathbf{C}_{i}^{\prime})^{-1}\sum_{j=1}^{n}\xi_{j}\mathbf{v}_{j},\mathbf{b}_{i+1},\ldots,\mathbf{b}_{n})|\propto|\xi_{1}|.

Finally, substituting the quadratic form and the determinant into (12), we obtain

p(𝐛i|𝐲,𝜶,𝐛i,𝐡i,)\displaystyle p(\mathbf{b}_{i}\,|\,\mathbf{y},\boldsymbol{\alpha},\mathbf{b}_{-i},\mathbf{h}_{i,\boldsymbol{\cdot}})\propto |ξ1|TeT2j=1n(ξjξ^j)2\displaystyle|\xi_{1}|^{T}\text{e}^{-\frac{T}{2}\sum_{j=1}^{n}(\xi_{j}-\widehat{\xi}_{j})^{2}}
=\displaystyle= |ξ1|TeT2(ξ1ξ^1)2j=2neT2(ξjξ^j)2.\displaystyle|\xi_{1}|^{T}\text{e}^{-\frac{T}{2}(\xi_{1}-\widehat{\xi}_{1})^{2}}\prod_{j=2}^{n}\text{e}^{-\frac{T}{2}(\xi_{j}-\widehat{\xi}_{j})^{2}}.

In other words, ξ1𝒜𝒩(ξ^1,T1)\xi_{1}\sim\mathcal{AN}(\widehat{\xi}_{1},T^{-1}) and ξj𝒩(ξ^j,T1),j=2,,n\xi_{j}\sim\mathcal{N}(\widehat{\xi}_{j},T^{-1}),j=2,\ldots,n, and we have proved the claim.

In order to use the above result, we need an efficient way to sample from 𝒜𝒩(ξ^1,T1)\mathcal{AN}(\widehat{\xi}_{1},T^{-1}). This can be done by using the 2-component normal mixture approximation considered in Appendix C of Villani (2009). In addition, the orthogonal complement of 𝐯1\mathbf{v}_{1}, namely, 𝐯1\mathbf{v}_{1}^{\perp}, can be obtained using the singular value decomposition (in MATLAB, the orthogonal complement of 𝐯1\mathbf{v}_{1} can be obtained using null(𝐯1)(\mathbf{v}_{1}^{\prime})). In the sampler we fix the sign of the ii-th element of 𝐛i\mathbf{b}_{i} to be positive (so that the diagonal elements of 𝐁0\mathbf{B}_{0} are positive). This is done by multiplying the draw 𝐛i\mathbf{b}_{i} by 1-1 if its ii-th element is negative.
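To make the construction concrete, the following numerical sketch (our own illustration in Python/NumPy, with a randomly generated 𝐁0 and a stand-in precision matrix 𝐊) builds 𝐯1,…,𝐯n via SVD-based null spaces and verifies that the determinant of 𝐁0 with its ii-th row replaced by 𝐛i depends on ξ1,…,ξn only through ξ1:

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis for the null space of A, via the SVD."""
    _, s, vt = np.linalg.svd(A)
    rank = int((s > tol).sum())
    return vt[rank:].T

rng = np.random.default_rng(1)
n, T, i = 4, 50, 1                      # update row i of B0
B0 = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
K = M @ M.T + n * np.eye(n)             # a stand-in for K_{b_i}
C = np.linalg.cholesky(K / T)           # so that K = T * C @ C'

B_minus = np.delete(B0, i, axis=0)      # the other rows of B0
w = null_space(B_minus)[:, 0]           # (B0,-i')^perp
v1 = np.linalg.solve(C, w)
v1 /= np.linalg.norm(v1)
V = np.column_stack([v1, null_space(v1[None, :])])   # v1, ..., vn

def det_with_row(xi):
    B = B0.copy()
    B[i] = np.linalg.solve(C.T, V @ xi)  # b_i = (C')^{-1} sum_j xi_j v_j
    return np.linalg.det(B)

xi = rng.standard_normal(n)
xi_b = xi.copy()
xi_b[1:] = rng.standard_normal(n - 1)   # change xi_2, ..., xi_n only
assert abs(det_with_row(xi) - det_with_row(xi_b)) < 1e-8
xi_c = xi.copy()
xi_c[0] *= 2.0                          # the determinant is linear in xi_1
assert abs(det_with_row(xi_c) - 2.0 * det_with_row(xi)) < 1e-8
```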

Step 2. We sample the reduced-form VAR coefficients row by row along the lines of Carriero, Clark, and Marcellino (2019). In particular, we extend the triangular algorithm in Carriero, Chan, Clark, and Marcellino (2021) that is designed for a lower triangular impact matrix 𝐁0\mathbf{B}_{0} to the case where 𝐁0\mathbf{B}_{0} is a full matrix. To that end, define 𝐀i=0\mathbf{A}_{i=0} to be a k×nk\times n matrix that has exactly the same elements as 𝐀\mathbf{A} except for the ii-th column, which is set to be zero, i.e., 𝐀i=0=(𝜶1,,𝜶i1,𝟎,𝜶i+1,,𝜶n).\mathbf{A}_{i=0}=(\boldsymbol{\alpha}_{1},\ldots,\boldsymbol{\alpha}_{i-1},\mathbf{0},\boldsymbol{\alpha}_{i+1},\ldots,\boldsymbol{\alpha}_{n}). Then, we can rewrite (4) as

𝐁0(𝐲t𝐀i=0𝐱t)=(𝐁0,i𝐱t)𝜶i+𝜺t,𝜺t𝒩(𝟎,𝐃t),\mathbf{B}_{0}(\mathbf{y}_{t}-\mathbf{A}_{i=0}^{\prime}\mathbf{x}_{t})=(\mathbf{B}_{0,i}\otimes\mathbf{x}_{t}^{\prime})\boldsymbol{\alpha}_{i}+\boldsymbol{\varepsilon}_{t},\quad\boldsymbol{\varepsilon}_{t}\sim\mathcal{N}(\mathbf{0},\mathbf{D}_{t}),

where 𝐱t=(1,𝐲t1,,𝐲tp)\mathbf{x}_{t}=(1,\mathbf{y}_{t-1}^{\prime},\ldots,\mathbf{y}_{t-p}^{\prime})^{\prime} and 𝐁0,i\mathbf{B}_{0,i} is the ii-th column of 𝐁0\mathbf{B}_{0}. Letting 𝐳ti=𝐁0(𝐲t𝐀i=0𝐱t)\mathbf{z}^{i}_{t}=\mathbf{B}_{0}(\mathbf{y}_{t}-\mathbf{A}_{i=0}^{\prime}\mathbf{x}_{t}) and stacking 𝐳i=(𝐳1i,,𝐳Ti)\mathbf{z}^{i}=(\mathbf{z}^{i^{\prime}}_{1},\ldots,\mathbf{z}^{i^{\prime}}_{T})^{\prime}, we have

𝐳i=𝐖i𝜶i+𝜺,𝜺𝒩(𝟎,𝐃),\mathbf{z}^{i}=\mathbf{W}^{i}\boldsymbol{\alpha}_{i}+\boldsymbol{\varepsilon},\quad\boldsymbol{\varepsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{D}),

where 𝐖i=𝐗𝐁0,i\mathbf{W}^{i}=\mathbf{X}\otimes\mathbf{B}_{0,i} and 𝐃=diag(𝐃1,,𝐃T)\mathbf{D}=\text{diag}(\mathbf{D}_{1},\ldots,\mathbf{D}_{T}).
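In code, the stacked regressor matrix 𝐖i can be formed with a single Kronecker product. A small sketch (in Python/NumPy; the dimensions are illustrative) checks that it matches the period-by-period construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, T = 3, 2, 6
k = 1 + n * p
X = rng.standard_normal((T, k))      # rows are x_t'
B0 = rng.standard_normal((n, n))
alpha_i = rng.standard_normal(k)
b0i = B0[:, [0]]                     # first column of B0 (i = 1), as n x 1

W = np.kron(X, b0i)                  # W^i = X kron B0,i, a (T n) x k matrix

# period-by-period: z_t^i has mean (B0,i kron x_t') alpha_i = B0,i x_t' alpha_i
direct = np.concatenate([(b0i @ X[[t]]) @ alpha_i for t in range(T)])
assert np.allclose(W @ alpha_i, direct)
```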

Next, the Minnesota-type horseshoe prior on 𝜶i\boldsymbol{\alpha}_{i} is conditionally Gaussian given the local variance component 𝝍i\boldsymbol{\psi}_{i}. More specifically, we can rewrite the conditional prior on 𝜶i\boldsymbol{\alpha}_{i} in (5) as:

(𝜶i|𝜿,𝝍i)𝒩(𝐦i,𝐕𝜶i),(\boldsymbol{\alpha}_{i}\,|\,\boldsymbol{\kappa},\boldsymbol{\psi}_{i})\sim\mathcal{N}(\mathbf{m}_{i},\mathbf{V}_{\boldsymbol{\alpha}_{i}}),

where 𝐦i=(mi,1,,mi,k)\mathbf{m}_{i}=(m_{i,1},\ldots,m_{i,k})^{\prime} and 𝐕𝜶i=diag(Ci,1,κi,2ψi,2Ci,2,,κi,kψi,kCi,k)\mathbf{V}_{\boldsymbol{\alpha}_{i}}=\text{diag}(C_{i,1},\kappa_{i,2}\psi_{i,2}C_{i,2},\ldots,\kappa_{i,k}\psi_{i,k}C_{i,k}) (the prior on the intercept is Gaussian). Then, by standard linear regression results, we have

(𝜶i|𝐲,𝐁0,𝜶i,𝐡,𝝍i,𝜿)𝒩(𝜶^i,𝐊𝜶i1),(\boldsymbol{\alpha}_{i}\,|\,\mathbf{y},\mathbf{B}_{0},\boldsymbol{\alpha}_{-i},\mathbf{h},\boldsymbol{\psi}_{i},\boldsymbol{\kappa})\sim\mathcal{N}(\widehat{\boldsymbol{\alpha}}_{i},\mathbf{K}_{\boldsymbol{\alpha}_{i}}^{-1}),

where 𝜶i=(𝜶1,,𝜶i1,𝜶i+1,,𝜶n)\boldsymbol{\alpha}_{-i}=(\boldsymbol{\alpha}_{1}^{\prime},\ldots,\boldsymbol{\alpha}_{i-1}^{\prime},\boldsymbol{\alpha}_{i+1}^{\prime},\ldots,\boldsymbol{\alpha}_{n}^{\prime})^{\prime},

𝐊𝜶i=𝐕𝜶i1+𝐖i𝐃1𝐖i,𝜶^i=𝐊𝜶i1(𝐕𝜶i1𝐦i+𝐖i𝐃1𝐳i).\mathbf{K}_{\boldsymbol{\alpha}_{i}}=\mathbf{V}_{\boldsymbol{\alpha}_{i}}^{-1}+\mathbf{W}^{i^{\prime}}\mathbf{D}^{-1}\mathbf{W}^{i},\quad\widehat{\boldsymbol{\alpha}}_{i}=\mathbf{K}_{\boldsymbol{\alpha}_{i}}^{-1}\left(\mathbf{V}_{\boldsymbol{\alpha}_{i}}^{-1}\mathbf{m}_{i}+\mathbf{W}^{i^{\prime}}\mathbf{D}^{-1}\mathbf{z}^{i}\right).
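A draw from this Gaussian can be obtained from a single Cholesky factorization of the precision matrix, without ever forming its inverse. A hedged sketch (the function name and toy inputs below are ours, not from the paper):

```python
import numpy as np

def sample_gaussian_precision(K, c, rng):
    """Draw from N(K^{-1} c, K^{-1}), where K is the posterior precision
    and c = V^{-1} m + W' D^{-1} z.  Uses one Cholesky factorization
    K = L L': the mean solves K mean = c, and L'^{-1} eps ~ N(0, K^{-1})."""
    L = np.linalg.cholesky(K)
    mean = np.linalg.solve(L.T, np.linalg.solve(L, c))
    eps = rng.standard_normal(len(c))
    return mean + np.linalg.solve(L.T, eps)

rng = np.random.default_rng(3)
K = np.array([[4.0, 1.0], [1.0, 3.0]])   # a toy 2x2 precision matrix
c = np.array([1.0, -2.0])
draw = sample_gaussian_precision(K, c, rng)
assert draw.shape == (2,)
```

For band or sparse precision matrices the same idea applies with a sparse Cholesky factorization, which is what keeps the step scalable.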

The computational complexity of this step is the same, i.e., 𝒪(n4)\mathcal{O}(n^{4}), as in the triangular case considered in Carriero, Chan, Clark, and Marcellino (2021).

Step 3. First, note that the elements of 𝝍i\boldsymbol{\psi}_{i} are conditionally independent and we can sample them one by one without loss of efficiency. Next, combining (5) and (9), we obtain

p(ψi,j|αi,j,𝜿,zψi,j)\displaystyle p(\psi_{i,j}\,|\,\alpha_{i,j},\boldsymbol{\kappa},z_{\psi_{i,j}}) ψi,j12e12κi,jCi,jψi,j(αi,jmi,j)2×ψi,j32e1ψi,jzψi,j\displaystyle\propto\psi_{i,j}^{-\frac{1}{2}}\text{e}^{-\frac{1}{2\kappa_{i,j}C_{i,j}\psi_{i,j}}(\alpha_{i,j}-m_{i,j})^{2}}\times\psi_{i,j}^{-\frac{3}{2}}\text{e}^{-\frac{1}{\psi_{i,j}z_{\psi_{i,j}}}}
=ψi,j2e1ψi,j(zψi,j1+(αi,jmi,j)22κi,jCi,j),\displaystyle=\psi_{i,j}^{-2}\text{e}^{-\frac{1}{\psi_{i,j}}\left(z_{\psi_{i,j}}^{-1}+\frac{(\alpha_{i,j}-m_{i,j})^{2}}{2\kappa_{i,j}C_{i,j}}\right)},

which is the kernel of the following inverse-gamma distribution:

(ψi,j|αi,j,𝜿,zψi,j)𝒢(1,zψi,j1+(αi,jmi,j)22κi,jCi,j).(\psi_{i,j}\,|\,\alpha_{i,j},\boldsymbol{\kappa},z_{\psi_{i,j}})\sim\mathcal{IG}\left(1,z_{\psi_{i,j}}^{-1}+\frac{(\alpha_{i,j}-m_{i,j})^{2}}{2\kappa_{i,j}C_{i,j}}\right).

Step 4. Note that κ1\kappa_{1} and κ2\kappa_{2} only appear in their priors in (10) and in (5) — recall κi,j=κ1\kappa_{i,j}=\kappa_{1} for coefficients on own lags and κi,j=κ2\kappa_{i,j}=\kappa_{2} for coefficients on other lags. To sample κ1\kappa_{1} and κ2\kappa_{2}, first define the index set Sκ1S_{\kappa_{1}} that collects all the indexes (i,j)(i,j) such that αi,j\alpha_{i,j} is a coefficient associated with an own lag. That is, Sκ1={(i,j):αi,j is a coefficient associated with an own lag}S_{\kappa_{1}}=\{(i,j):\alpha_{i,j}\text{ is a coefficient associated with an own lag}\}. Similarly, define Sκ2S_{\kappa_{2}} as the set that collects all the indexes (i,j)(i,j) such that αi,j\alpha_{i,j} is a coefficient associated with a lag of other variables. It is easy to check that the numbers of elements in Sκ1S_{\kappa_{1}} and Sκ2S_{\kappa_{2}} are respectively npnp and (n1)np(n-1)np. Then, we have

p(κ1|𝜶,𝝍,zκ1)\displaystyle p(\kappa_{1}\,|\,\boldsymbol{\alpha},\boldsymbol{\psi},z_{\kappa_{1}}) (i,j)Sκ1κ112e12κ1Ci,jψi,j(αi,jmi,j)2×κ132e1κ1zκ1\displaystyle\propto\prod_{(i,j)\in S_{\kappa_{1}}}\kappa_{1}^{-\frac{1}{2}}\text{e}^{-\frac{1}{2\kappa_{1}C_{i,j}\psi_{i,j}}(\alpha_{i,j}-m_{i,j})^{2}}\times\kappa_{1}^{-\frac{3}{2}}\text{e}^{-\frac{1}{\kappa_{1}z_{\kappa_{1}}}}
=κ1(np+12+1)e1κ1(zκ11+(i,j)Sκ1(αi,jmi,j)22ψi,jCi,j),\displaystyle=\kappa_{1}^{-\left(\frac{np+1}{2}+1\right)}\text{e}^{-\frac{1}{\kappa_{1}}\left(z_{\kappa_{1}}^{-1}+\sum_{(i,j)\in S_{\kappa_{1}}}\frac{(\alpha_{i,j}-m_{i,j})^{2}}{2\psi_{i,j}C_{i,j}}\right)},

which is the kernel of the 𝒢(np+12,zκ11+(i,j)Sκ1(αi,jmi,j)22ψi,jCi,j)\mathcal{IG}\left(\frac{np+1}{2},z_{\kappa_{1}}^{-1}+\sum_{(i,j)\in S_{\kappa_{1}}}\frac{(\alpha_{i,j}-m_{i,j})^{2}}{2\psi_{i,j}C_{i,j}}\right) distribution. Similarly, we have

(κ2|𝜶,𝝍,zκ2)𝒢((n1)np+12,zκ21+(i,j)Sκ2(αi,jmi,j)22ψi,jCi,j).(\kappa_{2}\,|\,\boldsymbol{\alpha},\boldsymbol{\psi},z_{\kappa_{2}})\sim\mathcal{IG}\left(\frac{(n-1)np+1}{2},z_{\kappa_{2}}^{-1}+\sum_{(i,j)\in S_{\kappa_{2}}}\frac{(\alpha_{i,j}-m_{i,j})^{2}}{2\psi_{i,j}C_{i,j}}\right).

Steps 5-6. It is straightforward to sample the latent variables 𝐳𝝍\mathbf{z}_{\boldsymbol{\psi}} and 𝐳𝜿\mathbf{z}_{\boldsymbol{\kappa}}. In particular, it follows from (9) that zψi,j𝒢(1,1+ψi,j1),i=1,,n,j=2,,kz_{\psi_{i,j}}\sim\mathcal{IG}(1,1+\psi_{i,j}^{-1}),i=1,\ldots,n,j=2,\ldots,k. Similarly, from (10) we have zκl𝒢(1,1+κl1),l=1,2z_{\kappa_{l}}\sim\mathcal{IG}(1,1+\kappa_{l}^{-1}),l=1,2.
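Since every conditional in Steps 3-6 is inverse-gamma, each update reduces to a single Gamma draw. A sketch with toy values (all numerical inputs below are illustrative, not from the paper):

```python
import numpy as np

def draw_inv_gamma(shape, rate, rng):
    """If g ~ Gamma(shape, scale=1), then rate / g ~ IG(shape, rate)."""
    return rate / rng.gamma(shape, 1.0)

rng = np.random.default_rng(4)
# toy values for a single coefficient alpha_{i,j} with prior mean m_{i,j}
alpha, m, kappa, C, z_psi = 0.3, 0.0, 0.05, 1.0, 0.7

# Step 3: psi | alpha, kappa, z_psi ~ IG(1, 1/z_psi + (alpha-m)^2/(2 kappa C))
psi = draw_inv_gamma(1.0, 1.0 / z_psi + (alpha - m) ** 2 / (2.0 * kappa * C), rng)

# Step 5: z_psi | psi ~ IG(1, 1 + 1/psi)
z_psi = draw_inv_gamma(1.0, 1.0 + 1.0 / psi, rng)

# Step 4 (own-lag hyperparameter kappa_1, with n*p own-lag coefficients);
# ssq stands in for the sum over S_kappa1 of (alpha - m)^2 / (2 psi C)
n_own, ssq, z_kappa1 = 12, 0.8, 0.5
kappa1 = draw_inv_gamma((n_own + 1) / 2.0, 1.0 / z_kappa1 + ssq, rng)

# Step 6: z_kappa1 | kappa_1 ~ IG(1, 1 + 1/kappa_1)
z_kappa1 = draw_inv_gamma(1.0, 1.0 + 1.0 / kappa1, rng)
assert min(psi, z_psi, kappa1, z_kappa1) > 0.0
```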

Step 7. The log-volatility vector for each equation, 𝐡i,,i=1,,n,\mathbf{h}_{i,\boldsymbol{\cdot}},i=1,\ldots,n, can be sampled using the auxiliary mixture sampler of Kim, Shephard, and Chib (1998) once we obtain the orthogonalized errors. More specifically, we first compute 𝐄\mathbf{E} using (11). Then, we transform the ii-th column of 𝐄\mathbf{E} via 𝐲i=(logE1,i2,,logET,i2)\mathbf{y}_{i}^{*}=(\log E_{1,i}^{2},\ldots,\log E_{T,i}^{2})^{\prime}. Finally we implement the auxiliary mixture sampler in conjunction with the precision sampler of Chan and Jeliazkov (2009) to sample 𝐡i,\mathbf{h}_{i,\boldsymbol{\cdot}} using 𝐲i\mathbf{y}_{i}^{*} as data.
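A compact sketch of the resulting draw of 𝐡i,· is below, assuming the auxiliary mixture step has already selected, for each tt, a mixture mean dtd_t and variance vtv_t (the function name is ours; dense linear algebra is used for clarity, whereas Chan and Jeliazkov (2009) exploit the banded structure of the precision matrix with sparse factorizations):

```python
import numpy as np

def sample_h(y_star, d, v, phi, omega2, rng):
    """Draw h = (h_1, ..., h_T)' for one equation from its Gaussian full
    conditional, given linearized observations y*_t = h_t + e_t with
    e_t ~ N(d_t, v_t) (from the auxiliary mixture) and the AR(1) state
    equation h_t = phi * h_{t-1} + u_t, u_t ~ N(0, omega2), with
    stationary initial condition h_1 ~ N(0, omega2 / (1 - phi^2))."""
    T = len(y_star)
    H = np.eye(T) - phi * np.eye(T, k=-1)   # first-difference-type operator
    su_inv = np.full(T, 1.0 / omega2)
    su_inv[0] = (1.0 - phi ** 2) / omega2
    K = H.T @ (su_inv[:, None] * H) + np.diag(1.0 / np.asarray(v))
    c = (np.asarray(y_star) - np.asarray(d)) / np.asarray(v)
    L = np.linalg.cholesky(K)
    mean = np.linalg.solve(L.T, np.linalg.solve(L, c))
    return mean + np.linalg.solve(L.T, rng.standard_normal(T))
```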

Steps 8-9. These two steps are standard; see, e.g., Chan and Hsiao (2014).

5 Experiments with Artificial Data

In this section we first present results on two artificial data experiments to illustrate how the new model performs under DGPs with and without stochastic volatility. In the first experiment, we generate a dataset from the VAR in (4) with n=3n=3, T=500T=500, p=4p=4 and the stochastic volatility processes as specified in (3). We then estimate the model using the posterior sampler outlined in the previous section. The first dataset is generated as follows. First, the intercepts are drawn independently from 𝒰(−10,10)\mathcal{U}(-10,10), the uniform distribution on the interval (−10,10)(-10,10). For the VAR coefficients, the diagonal elements of the first VAR coefficient matrix are iid 𝒰(0,0.5)\mathcal{U}(0,0.5) and the off-diagonal elements are from 𝒰(−0.2,0.2)\mathcal{U}(-0.2,0.2); all other elements of the jj-th (j>1j>1) VAR coefficient matrices are iid 𝒩(0,0.12/j2)\mathcal{N}(0,0.1^{2}/j^{2}). For the impact matrix 𝐁0\mathbf{B}_{0}, the diagonal elements are iid 𝒰(0.5,2)\mathcal{U}(0.5,2), whereas the off-diagonal elements are iid 𝒩(0,1)\mathcal{N}(0,1). Finally, for the SV processes, we set ϕi=0.95\phi_{i}=0.95 and ωi2=0.05,i=1,,n\omega_{i}^{2}=0.05,i=1,\ldots,n. The results of this first experiment are reported in Figures 1-3.
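Since the model implies the reduced form y_t = A'x_t + B0^{-1}ε_t, the first DGP can be simulated directly. A sketch in Python/NumPy (the burn-in length and random seed are our choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, T, burn = 3, 4, 500, 100

# intercepts and VAR coefficient matrices, following the text
a = rng.uniform(-10, 10, n)
A = [np.zeros((n, n)) for _ in range(p)]
A[0][np.diag_indices(n)] = rng.uniform(0.0, 0.5, n)
off = ~np.eye(n, dtype=bool)
A[0][off] = rng.uniform(-0.2, 0.2, off.sum())
for j in range(2, p + 1):               # j-th lag matrix, j > 1
    A[j - 1] = rng.normal(0.0, 0.1 / j, (n, n))

# impact matrix B0 and SV parameters
B0 = rng.normal(0.0, 1.0, (n, n))
B0[np.diag_indices(n)] = rng.uniform(0.5, 2.0, n)
phi, omega2 = 0.95, 0.05
B0_inv = np.linalg.inv(B0)

# simulate y_t = a + A_1 y_{t-1} + ... + A_p y_{t-p} + B0^{-1} eps_t,
# with eps_{i,t} ~ N(0, exp(h_{i,t})) and h_{i,t} an AR(1) process
h = np.zeros(n)
Y = np.zeros((T + burn + p, n))
for t in range(p, T + burn + p):
    h = phi * h + np.sqrt(omega2) * rng.standard_normal(n)
    eps = np.exp(h / 2.0) * rng.standard_normal(n)
    Y[t] = a + sum(A[j] @ Y[t - j - 1] for j in range(p)) + B0_inv @ eps
Y = Y[-T:]                              # discard burn-in
```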

Refer to caption
Figure 1: Scatter plot of the posterior means of 𝐁0\mathbf{B}_{0} (left panel), intercepts (middle panel) and VAR coefficients (right panel) against true values from a DGP with stochastic volatility.
Refer to caption
Figure 2: Estimated time-varying reduced-form variances, where σii,t\sigma_{ii,t} denote the ii-th diagonal element of 𝚺t=𝐁01𝐃t(𝐁01)\boldsymbol{\Sigma}_{t}=\mathbf{B}_{0}^{-1}\mathbf{D}_{t}(\mathbf{B}_{0}^{-1})^{\prime}, from a DGP with stochastic volatility.
Refer to caption
Figure 3: Estimated time-varying reduced-form covariances, where σij,t\sigma_{ij,t} denote the (i,j)(i,j) element of 𝚺t=𝐁01𝐃t(𝐁01)\boldsymbol{\Sigma}_{t}=\mathbf{B}_{0}^{-1}\mathbf{D}_{t}(\mathbf{B}_{0}^{-1})^{\prime}, from a DGP with stochastic volatility.

Overall, all the estimates track the true values closely. In particular, the new model is able to recover the time-varying reduced-form covariance matrices. In addition, it is also evident that 𝐁0\mathbf{B}_{0} can be estimated accurately.
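The reduced-form covariance matrices plotted above are simple functions of the posterior draws: for each draw of 𝐁0 and 𝐡t one computes Σ_t = B0^{-1} D_t (B0^{-1})'. A minimal sketch (the numerical values below are illustrative):

```python
import numpy as np

def reduced_form_cov(B0, h_t):
    """Sigma_t = B0^{-1} D_t (B0^{-1})' with
    D_t = diag(exp(h_{1,t}), ..., exp(h_{n,t}))."""
    B0_inv = np.linalg.inv(B0)
    return B0_inv @ np.diag(np.exp(h_t)) @ B0_inv.T

B0 = np.array([[1.2, -0.5, 0.3],
               [0.4, 0.9, -0.6],
               [-0.2, 0.7, 1.5]])
Sigma = reduced_form_cov(B0, np.array([0.1, -0.3, 0.2]))
assert np.allclose(Sigma, Sigma.T)                 # symmetric
assert np.all(np.linalg.eigvalsh(Sigma) > 0.0)     # positive definite
```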

In the second experiment, we generate data from the same VAR but with the stochastic volatility component turned off so that the errors are homoscedastic (in particular, we set hi,t=0h_{i,t}=0). In other words, we have generated data from a homoscedastic, unidentified model, but we estimate it using the (mis-specified) heteroscedastic model with the posterior sampler outlined in the previous section. The results are reported in Figures 4-6.

Refer to caption
Figure 4: Scatter plot of the posterior means of 𝐁0\mathbf{B}_{0} (left panel), intercepts (middle panel) and VAR coefficients (right panel) against true values from a DGP without stochastic volatility.
Refer to caption
Figure 5: Estimated time-varying reduced-form variances, where σii,t\sigma_{ii,t} denote the ii-th diagonal element of 𝚺t=𝐁01𝐃t(𝐁01)\boldsymbol{\Sigma}_{t}=\mathbf{B}_{0}^{-1}\mathbf{D}_{t}(\mathbf{B}_{0}^{-1})^{\prime}, from a DGP without stochastic volatility.
Refer to caption
Figure 6: Estimated time-varying reduced-form covariances, where σij,t\sigma_{ij,t} denote the (i,j)(i,j) element of 𝚺t=𝐁01𝐃t(𝐁01)\boldsymbol{\Sigma}_{t}=\mathbf{B}_{0}^{-1}\mathbf{D}_{t}(\mathbf{B}_{0}^{-1})^{\prime}, from a DGP without stochastic volatility.

When the DGP does not have stochastic volatility, elements of the impact matrix 𝐁0\mathbf{B}_{0} are much harder to pin down (as expected since these parameters are not identified). But it is interesting to note that the estimates of the time-varying variances are still able to track the true values fairly closely (as these reduced-form error variances are identified). The VAR coefficients, as well, are well-estimated.

Finally, we document the runtimes of estimating OI-VAR-SV models of different dimensions to assess how well the posterior sampler scales to higher dimensions. More specifically, we report in Table 1 the computation times (in minutes) to obtain 10,000 posterior draws from the proposed OI-VAR-SV model of dimensions n=10,20,50n=10,20,50 and sample sizes T=300,800T=300,800. The posterior sampler is implemented in MATLAB on a desktop with an Intel Core i7-7700 @3.60 GHz processor and 64 GB memory. It is evident from the table that for typical applications with 15-30 variables, the OI-VAR-SV model can be estimated quickly. More generally, its estimation time is comparable to that of the triangular model in Carriero, Clark, and Marcellino (2019).

Table 1: The computation times (in minutes) to obtain 10,000 posterior draws from the proposed order invariant VAR-SV model with nn variables and TT observations. All VARs have p=4p=4 lags.
T=300T=300 T=800T=800
n=10n=10 n=20n=20 n=50n=50 n=10n=10 n=20n=20 n=50n=50
2.5 17.5 233 6.8 39.7 630

6 Sensitivity to Ordering in the VAR-SV of Cogley and Sargent (2005)

In the preceding sections, we have developed Bayesian methods for order-invariant inference in VARs with SV. In this section, we illustrate the importance of this by showing the degree to which results are sensitive to ordering in the VAR-SV of Cogley and Sargent (2005), hereafter CS-VAR-SV, which is one of the most popular VAR-SV specifications in empirical macroeconomics. We do this using both simulated and macroeconomic data. To ensure comparability, we use the same prior for the CS-VAR-SV as for our OI-VAR-SV. The only difference is the prior on 𝐁0\mathbf{B}_{0}. For the CS-VAR-SV, 𝐁0\mathbf{B}_{0} is unit lower triangular, and we assume the lower triangular elements have prior distribution 𝒩(0,1)\mathcal{N}(0,1). For OI-VAR-SV, we assume the same 𝒩(0,1)\mathcal{N}(0,1) prior for the off-diagonal elements of 𝐁0\mathbf{B}_{0}, whereas the diagonal elements are iid 𝒩(1,1)\mathcal{N}(1,1). MCMC estimation of the CS-VAR-SV is carried out using the algorithm of Carriero, Chan, Clark, and Marcellino (2021).

6.1 A Simulation Experiment

As discussed in Primiceri (2005), the Cholesky structure of the model in Cogley and Sargent (2005) is unable to accommodate certain co-volatility patterns. To illustrate this point, we simulate T=500T=500 observations from the model in (3) and (4) with p=4p=4 and n=3n=3. To investigate the consequences of the lower triangular restriction we choose a non-triangular data generating process (DGP). We emphasise that the presence of SV in the DGP means that the model is identified. We set

𝐁0=(10.80.80.810.80.80.81).\mathbf{B}_{0}=\begin{pmatrix}1&-0.8&-0.8\\ 0.8&1&-0.8\\ 0.8&0.8&1\end{pmatrix}.

The VAR coefficients and volatilities are simulated in exactly the same way as described in Section 5. Figure 7 reports the estimated time-varying reduced-form error covariance matrices produced by the CS-VAR-SV and OI-VAR-SV models.

Refer to caption
Figure 7: Estimated time-varying reduced-form covariances from the OI-VAR-SV and CS-VAR-SV models, where σij,t\sigma_{ij,t} denote the (i,j)(i,j) element of 𝚺t=𝐁01𝐃t(𝐁01).\boldsymbol{\Sigma}_{t}=\mathbf{B}_{0}^{-1}\mathbf{D}_{t}(\mathbf{B}_{0}^{-1})^{\prime}.

It is clear from the figure that the estimates from the CS-VAR-SV model tend to be flat and do not capture the time-variation in the error covariances well. This is perhaps not surprising, as 𝐁0\mathbf{B}_{0} is restricted to be lower triangular in the CS-VAR-SV. The OI-VAR-SV, however, tracks the true σij,t\sigma_{ij,t} quite well. Since error covariances play an important role in features of interest such as impulse responses, this illustrates the potential negative consequences of working with triangularized systems.

6.2 Ordering Issues in a 20-Variable VAR

Next we provide a simple illustration of how the choice of ordering can influence empirical results in large VARs using a dataset of 20 popular US monthly variables obtained from the FRED-MD database (McCracken and Ng, 2016). The sample period is from 1959:03 to 2019:12. These variables, along with their transformations, are listed in Appendix A. Four core variables, namely, industrial production, the unemployment rate, PCE inflation and the Federal funds rate, are ordered first through fourth. The remaining 16 variables are ordered as in Carriero, Clark, and Marcellino (2019). We estimate the OI-VAR-SV and CS-VAR-SV models with the variables in this order (we call these models OI-VAR-SV-1 and CS-VAR-SV-1). Then we reverse the order of the variables and re-estimate them (these models are OI-VAR-SV-2 and CS-VAR-SV-2).

Estimates (posterior means) of reduced-form variances and covariances of selected variables are reported in Figure 8 and Figure 9. That is, the first figure reports selected diagonal elements of 𝚺t=𝐁01𝐃t(𝐁01)\boldsymbol{\Sigma}_{t}=\mathbf{B}_{0}^{-1}\mathbf{D}_{t}(\mathbf{B}_{0}^{-1})^{\prime} under two different variable orderings, whereas the second figure reports selected off-diagonal elements of 𝚺t\boldsymbol{\Sigma}_{t}.

There are three points to note about these figures. First, as expected, estimates produced by the OI-VAR-SV model under the two different variable orderings are identical (up to MCMC approximation error). Second, estimates produced by the CS-VAR-SV model under the two different variable orderings are often similar to one another, but occasionally differ, particularly in periods of high volatility. Third, estimates from the CS-VAR-SV models are often similar to the OI-VAR-SV models, but sometimes are substantially different, particularly in the error covariances.

Refer to caption
Figure 8: Estimates of time-varying reduced-form error variances from CS-VAR-SV and OI-VAR-SV under two different variable orderings
Refer to caption
Figure 9: Estimates of time-varying reduced-form error covariances from CS-VAR-SV and OI-VAR-SV under two different variable orderings

We next present results relating to 𝐁0\mathbf{B}_{0} which highlight the differences between our OI-VAR-SV-1 and the CS-VAR-SV-1. (Note that we are comparing the two models under the first ordering of the variables; the comparable figure using the second ordering reveals similar patterns.) Figure 10 reports the posterior means of 𝐁0\mathbf{B}_{0} of these two models. Note that the two panels of the figure are very different. Part of this difference is due to the fact that, under OI-VAR-SV, the SV processes are assumed to have zero unconditional means and the diagonal elements of 𝐁0\mathbf{B}_{0} are unrestricted. Hence, the diagonal elements of 𝐁0\mathbf{B}_{0} play a key role in adjusting the scale of the error variances. In contrast, the diagonal elements of 𝐁0\mathbf{B}_{0} are restricted to be one for the CS-VAR-SV model. But the key difference lies in the upper-triangular elements of 𝐁0\mathbf{B}_{0}. These are restricted to be zero in the CS-VAR-SV, whereas several of them are estimated to be large in magnitude using our OI-VAR-SV model. The data strongly support an unrestricted impact matrix 𝐁0\mathbf{B}_{0}.

Refer to caption
Figure 10: Posterior means of 𝐁0\mathbf{B}_{0} using CS-VAR-SV-1 (left panel) and OI-VAR-SV-1 (right panel)

Next, we report in Table 2 the posterior means of the shrinkage hyperparameters κ1\kappa_{1} and κ2\kappa_{2} under the two models with two different orderings of the variables. As is clear from the table, the estimates of κ1\kappa_{1} and κ2\kappa_{2} can differ substantially when using the CS-VAR-SV model, depending on how the variables are ordered. For example, the estimate of κ2\kappa_{2} increases by 84% when one reverses the order of the variables. Of course, under the proposed order-invariant model the estimates are the same (up to MCMC approximation error).

Table 2: Posterior means of κ1\kappa_{1} and κ2\kappa_{2} from the proposed order-invariant stochastic volatility model under two different orderings (OI-VAR-SV-1 and OI-VAR-SV-2), as well as those from the model of Cogley and Sargent (2005) (CS-VAR-SV-1 and CS-VAR-SV-2).
CS-VAR-SV-1 CS-VAR-SV-2 OI-VAR-SV-1 OI-VAR-SV-2
κ1\kappa_{1} 0.0177 0.0170 0.0356 0.0358
κ2\kappa_{2} 2.34×1062.34\times 10^{-6} 4.30×1064.30\times 10^{-6} 2.36×1062.36\times 10^{-6} 2.25×1062.25\times 10^{-6}

7 A Forecasting Exercise

Since VARs are commonly used for forecasting, it is interesting to investigate how the sensitivity to ordering of the CS-VAR-SV models affects forecasts and whether working with our order-invariant specification can lead to forecast improvements. Accordingly we carry out a forecasting exercise using the monthly macroeconomic dataset of sub-section 6.2 for the four models (OI-VAR-SV-1, OI-VAR-SV-2, CS-VAR-SV-1 and CS-VAR-SV-2). Forecast performance of the models is evaluated from 1970:03 till the end of the sample. Root mean squared forecast errors (RMSFEs) are used to evaluate the quality of point forecasts and averages of log predictive likelihoods (ALPLs) are used to evaluate the quality of density forecasts. We focus on four core variables: industrial production, the unemployment rate, PCE inflation and the Federal funds rate. Results for three different forecast horizons (h=1,6,12h=1,6,12) are presented in Table 3. We use the iterated method of forecasting to compute longer horizon forecasts.
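For reference, both metrics can be computed from a set of predictive draws as sketched below; the Gaussian approximation to the predictive density is our simplification (in practice the predictive likelihood is often evaluated as a mixture over MCMC draws):

```python
import numpy as np

def rmsfe(point_forecasts, outcomes):
    """Root mean squared forecast error over the evaluation sample."""
    e = np.asarray(point_forecasts, dtype=float) - np.asarray(outcomes, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def log_pred_lik(draws, outcome):
    """Log predictive likelihood of the realized outcome, approximating the
    predictive density by a Gaussian fitted to the predictive draws (a
    simplification; mixture evaluations over MCMC draws are also common)."""
    mu, sig2 = float(np.mean(draws)), float(np.var(draws))
    return -0.5 * (np.log(2.0 * np.pi * sig2) + (outcome - mu) ** 2 / sig2)

# errors of -0.5 and 0.5 give an RMSFE of 0.5
assert rmsfe([1.0, 2.0], [1.5, 1.5]) == 0.5
```

The ALPL for a variable and horizon is then the average of `log_pred_lik` over the evaluation sample.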

Note first that, as expected, OI-VAR-SV-1 and OI-VAR-SV-2 produce the same forecasts (up to MCMC approximation error). CS-VAR-SV-1 and CS-VAR-SV-2 often produce forecasts which are substantially different both from one another and from those produced by the order-invariant approaches. These differences are not that noticeable in the RMSFEs, but are very noticeable in the ALPLs. This suggests ordering issues are more important for density forecasts than for point forecasts. This is not surprising since ordering issues matter for the multivariate stochastic volatility process (which plays an important role in determining predictive variances) but not for the VAR coefficients (which are the key determinants of predictive means). These findings are similar to those found in smaller VAR-SVs by Arias, Rubio-Ramirez, and Shin (2021).

A careful examination of the ALPL results indicates that the best forecasting model is almost always the OI-VAR-SV, and in most cases the forecast improvements are statistically significant relative to the CS-VAR-SV-1 benchmark. A comparison of CS-VAR-SV-1 to CS-VAR-SV-2 yields no simple conclusion. Recall that CS-VAR-SV-1 uses an ordering similar to that of Carriero, Clark, and Marcellino (2019). For most variables and forecast horizons, this ordering leads to higher ALPLs than the reverse ordering, a finding that is also consistent with the results in Arias, Rubio-Ramirez, and Shin (2021). But there are exceptions, such as forecasting the Fed funds rate for h=6h=6 and h=12h=12. The important point, however, is not that one ordering is better than the other when using CS-VAR-SV models. The important points are that ordering matters (often in a statistically significant way) and that different variables prefer different orderings. That is, no single ordering forecasts all of the variables best.

Table 3: RMSFE and ALPL of four core macroeconomic time series.
Variables Models RMSFE ALPL
h=1 h=6 h=12 h=1 h=6 h=12
Industrial production CS-VAR-SV-1 0.007 0.007 0.007 2.071 2.159 2.266
CS-VAR-SV-2 0.007∗∗ 0.007∗∗ 0.007∗∗ 1.522∗∗∗ 1.651∗∗∗ 1.767∗∗∗
OI-VAR-SV-1 0.007 0.007 0.007∗∗∗ 3.660∗∗∗ 3.458∗∗∗ 3.360∗∗∗
OI-VAR-SV-2 0.007 0.007 0.007∗∗∗ 3.659∗∗∗ 3.459∗∗∗ 3.358∗∗∗
Unemployment rate CS-VAR-SV-1 0.159 0.169 0.173 -0.005 -0.689 -0.707
CS-VAR-SV-2 0.158 0.169 0.173 -0.124 -0.761 -0.742
OI-VAR-SV-1 0.158 0.167 0.173 0.463∗∗∗ 0.276 0.152
OI-VAR-SV-2 0.158 0.167 0.173 0.463∗∗∗ 0.276 0.156
PCE inflation CS-VAR-SV-1 0.002 0.002 0.002 2.210 2.311 2.425
CS-VAR-SV-2 0.002∗∗∗ 0.002 0.002 1.811∗∗∗ 1.919∗∗∗ 2.024∗∗∗
OI-VAR-SV-1 0.002∗∗ 0.002 0.002 4.821∗∗∗ 4.448∗∗∗ 4.249∗∗∗
OI-VAR-SV-2 0.002∗∗ 0.002 0.002 4.823∗∗∗ 4.453∗∗∗ 4.260∗∗∗
Federal funds rate CS-VAR-SV-1 0.492 1.566 2.216 0.222 -8.473 -16.704
CS-VAR-SV-2 0.497 1.597 2.264 -0.327∗∗∗ -6.708∗∗∗ -12.474∗∗∗
OI-VAR-SV-1 0.493 1.576 2.233 0.294∗∗∗ -6.956 -13.466
OI-VAR-SV-2 0.494 1.570 2.236 0.296∗∗∗ -7.039 -13.662

Note: Bold figures indicate the best model in each case. ∗, ∗∗ and ∗∗∗ denote, respectively, significance at the 0.10, 0.05 and 0.01 levels for a two-sided Diebold and Mariano (1995) test. The benchmark model is CS-VAR-SV-1.
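The significance stars in Table 3 are based on Diebold and Mariano (1995) tests against the CS-VAR-SV-1 benchmark. As a rough illustration of the mechanics (our own sketch, not the authors' implementation), the test compares average loss differentials, with a simple long-run variance correction using h−1 autocovariances at horizons beyond one:

```python
import math
import numpy as np

def dm_test(loss_benchmark, loss_alternative, h=1):
    """Diebold-Mariano test of equal predictive accuracy.

    loss_* are per-period losses (e.g., squared forecast errors, or
    negative log predictive likelihoods); h is the forecast horizon.
    Returns the DM statistic and a two-sided p-value from the
    standard normal distribution."""
    d = np.asarray(loss_benchmark, float) - np.asarray(loss_alternative, float)
    T = d.size
    dbar = d.mean()
    dev = d - dbar
    # long-run variance of d using h-1 autocovariances (rectangular kernel)
    lrv = np.mean(dev ** 2)
    for k in range(1, h):
        lrv += 2 * np.mean(dev[k:] * dev[:-k])
    stat = dbar / math.sqrt(lrv / T)
    # two-sided p-value via the standard normal CDF
    pval = 2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2))))
    return stat, pval
```

With losses defined as benchmark minus alternative, a positive statistic indicates the alternative model incurs lower loss than the benchmark.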

8 Conclusions

In this paper, we have demonstrated, both theoretically and empirically, the consequences of working with VAR-SVs which use Cholesky decompositions of the error covariance matrices, such as that used in Cogley and Sargent (2005), and are thus not order invariant. We have proposed a new, order-invariant specification which involves working with an unrestricted version of the impact matrix, B0. Such a model would be unidentified in the homoscedastic VAR, but we draw on Bertsche and Braun (2020) to establish that the incorporation of SV identifies the model. We develop an MCMC algorithm which allows for Bayesian inference and prediction in our order-invariant model. In an empirical exercise involving 20 macroeconomic variables, we demonstrate the ability of our methods to produce accurate forecasts in a computationally efficient manner.

Appendix A: Data Description

This appendix provides details of the monthly dataset used in the forecasting exercise. The variables and their transformations are the same as in Carriero, Clark, and Marcellino (2019).

Table 4: Monthly dataset of 20 variables from FRED-MD.
Variable Mnemonic Transformation
Real personal income RPI Δ log
Real PCE DPCERA3M086SBEA Δ log
Real manufacturing and trade sales CMRMTSPLx Δ log
Industrial production INDPRO Δ log
Capacity utilization in manufacturing CUMFNS Δ
Civilian unemployment rate UNRATE Δ
Total nonfarm employment PAYEMS Δ log
Hours worked: goods-producing CES0600000007 no transformation
Average hourly earnings: goods-producing CES0600000008 Δ log
PPI for finished goods WPSFD49207 Δ² log
PPI for commodities PPICMM Δ² log
PCE price index PCEPI Δ² log
Federal funds rate FEDFUNDS no transformation
Total housing starts HOUST log
S&P 500 price index S&P 500 Δ log
U.S.-U.K. exchange rate EXUSUKx Δ log
1 yr. Treasury-FEDFUNDS spread T1YFFM no transformation
10 yr. Treasury-FEDFUNDS spread T10YFFM no transformation
BAA-FEDFUNDS spread BAAFFM no transformation
ISM: new orders index NAPMNOI no transformation
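The transformations in Table 4 can be applied mechanically to the raw FRED-MD series. A small sketch (assuming the raw series are already loaded as numeric arrays; the helper name and code labels are ours):

```python
import numpy as np

def apply_transformation(x, code):
    """Apply a Table 4 transformation to a raw monthly series x.

    Codes: 'none' (no transformation), 'd' (first difference, Δ),
    'log', 'dlog' (first difference of logs, i.e. a growth rate),
    and 'd2log' (second difference of logs)."""
    x = np.asarray(x, dtype=float)
    if code == 'none':
        return x
    if code == 'd':
        return np.diff(x)
    if code == 'log':
        return np.log(x)
    if code == 'dlog':
        return np.diff(np.log(x))
    if code == 'd2log':
        return np.diff(np.log(x), n=2)
    raise ValueError(f"unknown transformation code: {code}")
```

Note that each difference shortens the series by one observation, so the first one or two months are lost for the differenced variables.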

References

  • Arias, J. E., J. F. Rubio-Ramirez, and M. Shin (2021): “Macroeconomic forecasting and variable ordering in multivariate stochastic volatility models,” Federal Reserve Bank of Philadelphia Working Papers.
  • Asai, M., and M. McAleer (2009): “The structure of dynamic correlations in multivariate stochastic volatility models,” Journal of Econometrics, 150(2), 182–192.
  • Banbura, M., D. Giannone, and L. Reichlin (2010): “Large Bayesian vector auto regressions,” Journal of Applied Econometrics, 25(1), 71–92.
  • Bertsche, D., and R. Braun (2020): “Identification of structural vector autoregressions by stochastic volatility,” Journal of Business & Economic Statistics, just-accepted, 1–39.
  • Bognanni, M. (2018): “A class of time-varying parameter structural VARs for inference under exact or set identification,” FRB of Cleveland Working Paper.
  • Carriero, A., J. C. C. Chan, T. E. Clark, and M. G. Marcellino (2021): “Corrigendum to: Large Bayesian vector autoregressions with stochastic volatility and non-conjugate priors,” working paper.
  • Carriero, A., T. E. Clark, and M. G. Marcellino (2016): “Common drifting volatility in large Bayesian VARs,” Journal of Business and Economic Statistics, 34(3), 375–390.
  • Carriero, A., T. E. Clark, and M. G. Marcellino (2019): “Large Bayesian vector autoregressions with stochastic volatility and non-conjugate priors,” Journal of Econometrics, 212(1), 137–154.
  • Carvalho, C. M., N. G. Polson, and J. G. Scott (2010): “The horseshoe estimator for sparse signals,” Biometrika, 97(2), 465–480.
  • Chan, J. C. C. (2020): “Large Bayesian VARs: A flexible Kronecker error covariance structure,” Journal of Business and Economic Statistics, 38(1), 68–79.
  • Chan, J. C. C. (2021): “Minnesota-type adaptive hierarchical priors for large Bayesian VARs,” International Journal of Forecasting, 37(3), 1212–1226.
  • Chan, J. C. C., A. Doucet, R. León-González, and R. W. Strachan (2018): “Multivariate stochastic volatility with co-heteroscedasticity,” CAMA Working Paper.
  • Chan, J. C. C., and C. Y. L. Hsiao (2014): “Estimation of stochastic volatility models with heavy tails and serial dependence,” in Bayesian Inference in the Social Sciences, ed. by I. Jeliazkov and X.-S. Yang, pp. 159–180. John Wiley & Sons, Hoboken.
  • Chan, J. C. C., and I. Jeliazkov (2009): “Efficient simulation and integrated likelihood estimation in state space models,” International Journal of Mathematical Modelling and Numerical Optimisation, 1(1), 101–120.
  • Clark, T. E. (2011): “Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility,” Journal of Business and Economic Statistics, 29(3), 327–341.
  • Cogley, T., and T. J. Sargent (2005): “Drifts and volatilities: Monetary policies and outcomes in the post WWII US,” Review of Economic Dynamics, 8(2), 262–302.
  • Diebold, F. X., and R. S. Mariano (1995): “Comparing predictive accuracy,” Journal of Business and Economic Statistics, 13(3), 253–263.
  • Kastner, G. (2019): “Sparse Bayesian time-varying covariance estimation in many dimensions,” Journal of Econometrics, 210(1), 98–115.
  • Kim, S., N. Shephard, and S. Chib (1998): “Stochastic volatility: Likelihood inference and comparison with ARCH models,” Review of Economic Studies, 65(3), 361–393.
  • Lanne, M., H. Lütkepohl, and K. Maciejowska (2010): “Structural vector autoregressions with Markov switching,” Journal of Economic Dynamics and Control, 34(2), 121–131.
  • Lewis, D. J. (2021): “Identifying shocks via time-varying volatility,” The Review of Economic Studies.
  • Makalic, E., and D. F. Schmidt (2016): “A simple sampler for the horseshoe estimator,” IEEE Signal Processing Letters, 23(1), 179–182.
  • McCracken, M. W., and S. Ng (2016): “FRED-MD: A monthly database for macroeconomic research,” Journal of Business and Economic Statistics, 34(4), 574–589.
  • Philipov, A., and M. Glickman (2006): “Multivariate stochastic volatility via Wishart processes,” Journal of Business and Economic Statistics, 24, 313–328.
  • Prado, R., and M. West (2010): Time Series: Modeling, Computation, and Inference. Chapman and Hall/CRC.
  • Primiceri, G. E. (2005): “Time varying structural vector autoregressions and monetary policy,” Review of Economic Studies, 72(3), 821–852.
  • Uhlig, H. (1997): “Bayesian vector autoregressions with stochastic volatility,” Econometrica, 65(1), 59–73.
  • Villani, M. (2009): “Steady-state priors for vector autoregressions,” Journal of Applied Econometrics, 24(4), 630–650.
  • Waggoner, D., and T. Zha (2003): “A Gibbs sampler for structural vector autoregressions,” Journal of Economic Dynamics and Control, 28(2), 349–366.
  • West, M., and J. Harrison (2006): Bayesian Forecasting and Dynamic Models. Springer Science & Business Media.