
Cluster GARCH

Chen Tong and Peter Reinhard Hansen and Ilya Archakov§


School of Economics, Xiamen University
University of North Carolina, Chapel Hill
§York University, Toronto, Canada
We are grateful for many valuable comments made by participants at the 2023 conference “Robust Econometric Methods in Financial Econometrics”. Chen Tong acknowledges financial support from the Youth Fund of the National Natural Science Foundation of China (72301227), and the Ministry of Education of China, Humanities and Social Sciences Youth Fund (22YJC790117). Corresponding author: Peter Reinhard Hansen. Email: [email protected].
(April 8, 2025)
Abstract

We introduce a novel multivariate GARCH model with flexible convolution-t distributions that is applicable in high-dimensional systems. The model is called Cluster GARCH because it can accommodate cluster structures in the conditional correlation matrix and in the tail dependencies. The expressions for the log-likelihood function and its derivatives are tractable, and the latter facilitate a score-driven model for the dynamic correlation structure. We apply the Cluster GARCH model to daily returns for 100 assets and find that it outperforms existing models, both in-sample and out-of-sample. Moreover, the convolution-t distribution provides a better empirical performance than the conventional multivariate t-distribution.


Keywords: Multivariate GARCH, Score-Driven Model, Cluster Structure, Block Correlation Matrix, Heavy-Tailed Distributions.

JEL Classification: G11, G17, C32, C58

1 Introduction

Univariate GARCH models have enjoyed considerable empirical success since they were introduced in Engle (1982) and refined in Bollerslev (1986). In contrast, the success of multivariate GARCH models has been more moderate due to a number of challenges, see e.g. Bauwens et al. (2006). A common approach to modeling covariance matrices is to model variances and correlations separately, as is the case in the Constant Conditional Correlation (CCC) model by Bollerslev (1990) and the Dynamic Conditional Correlation (DCC) model by Engle (2002). See also Engle and Sheppard (2001), Tse and Tsui (2002), Aielli (2013), Engle et al. (2019), and Pakel et al. (2021). While univariate conditional variances can be effectively modeled using standard GARCH models, the modeling of dynamic conditional correlation matrices necessitates less intuitive choices. One challenge is that the number of correlations increases with the square of the number of variables, a second is that the conditional correlation matrix must be positive semidefinite, and a third is to determine how correlations should be updated in response to sample information.

In this paper, we develop a novel dynamic model of the conditional correlation matrix, the Cluster GARCH model, which has three main features. First, it uses convolution-t distributions, a flexible class of multivariate heavy-tailed distributions with tractable likelihood expressions. The multivariate t-distributions are nested in this framework, but a convolution-t distribution can have heterogeneous marginal distributions and cluster-based dependencies. For instance, convolution-t distributions can generate the type of sector-specific price jumps reported in Andersen et al. (2024). Second, the dynamic model is based on the score-driven framework by Creal et al. (2013), which leads to closed-form expressions for all key quantities. Third, the model can be combined with a block correlation structure that makes it applicable to high-dimensional systems. The partitioning that defines the block structure can also be interpreted as a second type of cluster structure.

Heavy-tailed distributions are common in financial returns, and many empirical studies adopt the multivariate t-distribution to model vectors of financial series, e.g., Kotz and Nadarajah (2004), Harvey (2013), and Ibragimov et al. (2015). An implication of the multivariate t-distribution is that all standardized returns have identical and time-invariant marginal distributions. This is a restrictive assumption, especially in high dimensions. The convolution-t distributions by Hansen and Tong (2024) relax these assumptions, and one of the main contributions of this paper is to incorporate this class of distributions into a tractable multivariate GARCH model. A convolution-t distribution is a convolution of multivariate t-distributions. In the Cluster GARCH model, standardized returns are time-varying linear combinations of independent t-distributions, which can have different degrees of freedom. This leads to dynamic and heterogeneous marginal distributions for standardized returns, although the conventional multivariate t-distribution is nested in this framework as a special case. We focus on three particular types of convolution-t distributions, labelled Canonical-Block-t, Cluster-t, and Hetero-t. These all have relatively simple log-likelihood functions, such that we can obtain closed-form expressions for the first two derivatives, the score and the information matrix, of the conditional log-likelihood functions. These are used in our score-driven model for the time-varying correlation structure, which is a key component of the Cluster GARCH model.

High-dimensional correlation matrices can be modeled using a parsimonious block structure for the conditional correlation matrix. The DCC model is parsimonious but greatly limits the way the conditional covariance matrix can be updated. Without additional structure, the number of latent variables increases with n^{2}, where n is the number of assets. This number becomes unmanageable once n is more than a single digit, and maintaining a positive definite correlation matrix can be challenging too. The correlation structure in the Block DECO model by Engle and Kelly (2012) is an effective way to reduce the dimension of the estimated parameters. However, the estimation strategy in Engle and Kelly (2012) was based on an ad-hoc averaging of within-block correlations for an auxiliary DCC model, and they did not fully utilize the simplifications offered by the block structure. (They derived likelihood expressions for the case with K=2 blocks; for more than two blocks, K>2, they resort to a composite likelihood evaluation.) The model proposed in this paper draws on recent advances in correlation matrix analysis by Archakov and Hansen (2021, 2024). We will, in some specifications, adopt the block parameterization of the conditional correlation matrix used in Archakov et al. (2020), which has (at most) K(K+1)/2 free parameters, where K is the number of blocks. This approach guarantees a positive definite correlation matrix and the likelihood evaluation is greatly simplified. Overall, the Cluster GARCH model offers a good balance between flexibility and computational feasibility in high dimensions.

We adopt the convenient parametrization of the conditional correlation matrix, \gamma(C), which is defined by taking the matrix logarithm of the correlation matrix, C, and stacking the off-diagonal elements of \log C into the vector \gamma\in\mathbb{R}^{d}, where d=n(n-1)/2. This parametrization was introduced in Archakov and Hansen (2021) and the mapping C\mapsto\gamma(C) is one-to-one between the set of non-singular correlation matrices \mathcal{C}_{n\times n} and \mathbb{R}^{d}. So, the inverse mapping, C(\gamma), will always yield a positive definite correlation matrix and any non-singular correlation matrix can be generated in this way. The parametrization can be viewed as a generalization of Fisher’s Z-transformation to the multivariate case. It has attractive finite sample properties, which makes it suitable for an autoregressive model structure, see Archakov and Hansen (2021).

A block correlation structure arises when the variables can be partitioned into clusters, K say, and the correlation between two variables is determined by their cluster assignments. When C has a block structure, then \log C also has a block structure. This leads to a new parametrization of block correlation matrices, which defines a one-to-one mapping C\mapsto\eta(C) between the set of non-singular block correlation matrices \mathcal{C}_{n\times n} and \mathbb{R}^{d} with d=K(K+1)/2. We adopt the canonical representation by Archakov and Hansen (2024), which is a quasi-spectral decomposition of block matrices that diagonalizes the matrix with the exception of a small K\times K submatrix. This decomposition makes the model parsimonious and greatly simplifies the evaluation of the log-likelihood function. This parameterization of block correlation matrices is more general than the factor-based approach to parametrizing block correlation matrices. (The factor-induced block structure, see Creal and Tsay (2015), Opschoor et al. (2021), and Oh and Patton (2023), entails superfluous restrictions on C, see Tong and Hansen (2023).) Both approaches simplify the computation of \det C and C^{-1}, but only the parametrization based on the canonical representation simplifies the evaluation of the likelihood function for the convolution-t distributions.

Our paper contributes to the literature on score-driven models for the dynamics of covariance matrices. Using the multivariate t-distribution, Creal et al. (2012) and Hafner and Wang (2023) proposed score-driven models for a time-varying covariance and correlation matrix, respectively. (The model by Hafner and Wang (2023) updates parameters using the unscaled score, i.e., they did not use the information matrix.) Oh and Patton (2023) proposed score-driven dynamic factor copula models with a skew-t copula function; however, the analytical information matrices in these copula models are not available. Using realized measures of the covariance matrix, Gorgi et al. (2019) proposed the Realized Wishart-GARCH, which relies on a Wishart distribution for realized covariance matrices and on a Gaussian distribution for returns. Opschoor et al. (2017) constructed a multivariate HEAVY model based on heavy-tailed distributions for both returns and the realized covariances. An aspect that sets the Cluster GARCH model apart from the existing literature is that it is based on the convolution-t distributions, which include the Gaussian distribution and the multivariate t-distributions as special cases. The block structures we impose on the correlation matrix in some specifications were previously used in Archakov et al. (2020). Their model used the Realized GARCH framework with a Gaussian specification, whereas we adopt the score-driven framework for convolution-t distributions and do not require realized volatility measures in the modeling.

We conduct an extensive empirical investigation of the performance of our dynamic model for correlation matrices. The sample spans the period from January 3, 2005 to December 31, 2021. The new model is applicable in high dimensions, and we consider a “small universe” with n=9 assets and a “large universe” with n=100 assets. The small universe allows us to compare the new models with a range of existing models, as most of these are not applicable to the large universe. We also undertake a more detailed specification analysis with the small universe. The nine stocks are from three sectors, three from each sector, which motivates certain block and cluster structures. First, we find that the convolution-t distribution offers a better fit than the conventional t-distribution. Overall, the Cluster-t distribution has the largest log-likelihood value. Second, we find that the score-driven models successfully capture the dynamic variation in the conditional correlation matrix. The new score-driven models outperform traditional DCC models when based on the same distributional assumptions, and the proposed score-driven model with a sector-motivated block correlation matrix has the smallest BIC.

The large universe with n=100 stocks poses no obstacles for the Cluster GARCH model. We used the sector classification of the stocks to define the block structure in the correlation matrix. We also used the sector classification to explore possible cluster structures in the tail dependencies, which are related to parameters of the convolution-t distribution. With K=10 sectors, this reduces the number of free parameters in the correlation matrix from 4950 to 55, and the model estimation is very fast and stable, in part because the required computations only involve K\times K matrices (instead of n\times n matrices). For the large universe, the empirical results favor the Hetero-t specification, which entails a convolution of a large number of univariate t-distributions. We also find that correlation targeting, which is analogous to variance targeting in GARCH models, is beneficial.

The rest of this paper is organized as follows: In Section 2, we introduce a new parametrization of block correlation matrices, based on Archakov and Hansen (2021, 2024). In Section 3, we introduce the convolution-t distributions. We derive the score-driven models in Section 4, and we obtain analytical expressions for the score and information matrix for the convolution-t distributions, including the special case where C has a block structure. Some details about the practical implementation are given in Section 5. The empirical analysis is presented in Section 6 and includes in-sample and out-of-sample evaluations and comparisons. All proofs are given in the Appendix.

2 The Theoretical Model

Consider an n-dimensional time series, R_{t}, t=1,2,\ldots,T, and let \{\mathcal{F}_{t}\} be a filtration to which R_{t} is adapted, i.e. R_{t}\in\mathcal{F}_{t}. We denote the conditional mean by \mu_{t}=\mathbb{E}(R_{t}|\mathcal{F}_{t-1}) and the conditional covariance matrix by \Sigma_{t}=\mathrm{var}(R_{t}|\mathcal{F}_{t-1}). With \Lambda_{\sigma_{t}}\equiv\mathrm{diag}(\sigma_{1t},\ldots,\sigma_{nt}), where \sigma_{it}^{2}=\mathrm{var}(R_{it}|\mathcal{F}_{t-1}), i=1,\ldots,n, it follows that the conditional correlation matrix is given by

C_{t}=\Lambda_{\sigma_{t}}^{-1}\Sigma_{t}\Lambda_{\sigma_{t}}^{-1}.

Initially, we take \mu_{t} and \Lambda_{\sigma_{t}} as given and focus on the dynamic modeling of C_{t}. We are particularly interested in the case where n is large. We define the following standardized variables with dynamic correlation matrix C_{t},

Z_{t}=\Lambda_{\sigma_{t}}^{-1}(R_{t}-\mu_{t}).

To simplify the notation, we omit the subscript t in most of Sections 2 and 3 and reintroduce it in Section 4, where the dynamic model is presented.

2.1 Block Correlation Matrix

If n is relatively small, we can model the dynamic correlation matrix using d=n(n-1)/2 latent variables. Additional structure on C is required when n is larger, because the number of latent variables becomes unmanageable. Additional structure can be imposed using a block structure on C, as in Engle and Kelly (2012).

A block correlation matrix is characterized by a partitioning of the variables into clusters, such that the correlation between two variables is solely determined by their cluster assignments. Let K be the number of clusters, and let n_{k} be the number of variables in the k-th cluster, k=1,\ldots,K, such that n=\sum_{k=1}^{K}n_{k}. We let \bm{n}=(n_{1},n_{2},\ldots,n_{K})^{\prime} be the vector with cluster sizes and sort the variables such that the first n_{1} variables are those in the first cluster, the next n_{2} variables are those in the second cluster, and so forth. Then C=\mathrm{corr}(Z) will have the following block structure

C=\left[\begin{array}{cccc}C_{[1,1]}&C_{[1,2]}&\cdots&C_{[1,K]}\\ C_{[2,1]}&C_{[2,2]}\\ \vdots&&\ddots\\ C_{[K,1]}&&&C_{[K,K]}\end{array}\right], (1)

where C_{[k,l]} is an n_{k}\times n_{l} matrix given by

C_{[k,l]}=\left[\begin{array}{ccc}\rho_{kl}&\cdots&\rho_{kl}\\ \vdots&\ddots&\vdots\\ \rho_{kl}&\cdots&\rho_{kl}\end{array}\right],\text{ for }k\neq l,\qquad\text{and}\quad C_{[k,k]}=\left[\begin{array}{cccc}1&\rho_{kk}&\cdots&\rho_{kk}\\ \rho_{kk}&1&\ddots\\ \vdots&\ddots&\ddots\\ \rho_{kk}&&&1\end{array}\right].

Each block, C_{[k,l]}, has just one correlation coefficient, such that the block structure reduces the number of unique correlations from n(n-1)/2 to at most K(K+1)/2. (This is based on the general case where the number of assets in each group is at least two. When there are \tilde{K}\leq K groups with only one asset, this number becomes K(K+1)/2-\tilde{K}. The reason for the distinction between these two cases is that a 1\times 1 diagonal block has no correlation coefficients.) This number does not increase with n, and this makes it possible to scale the model to accommodate high-dimensional correlation matrices.
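
To make this concrete, the following minimal numpy sketch (ours, not from the paper) assembles a block correlation matrix of the form (1) from cluster sizes and a K x K matrix of within- and between-cluster correlations; the example values match the 7 x 7 matrix used in Section 2.2 below.

```python
import numpy as np

def block_corr(sizes, rho):
    """Assemble the block correlation matrix in (1) from the cluster sizes
    (n_1,...,n_K) and a symmetric K x K matrix rho, where rho[k, l] is the
    correlation between clusters k and l (rho[k, k] is the within-cluster
    correlation)."""
    n = int(np.sum(sizes))
    edges = np.concatenate(([0], np.cumsum(sizes)))
    C = np.empty((n, n))
    for k in range(len(sizes)):
        for l in range(len(sizes)):
            C[edges[k]:edges[k + 1], edges[l]:edges[l + 1]] = rho[k, l]
    np.fill_diagonal(C, 1.0)
    return C

# cluster sizes and correlations of the 7 x 7 example in Section 2.2
C = block_corr([2, 2, 3], np.array([[0.8, 0.4, 0.2],
                                    [0.4, 0.6, 0.1],
                                    [0.2, 0.1, 0.3]]))
assert np.all(np.linalg.eigvalsh(C) > 0)   # positive definite for these values
```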

Below we derive score-driven models for unrestricted correlation matrices and for the case where C has a block structure, with cluster assignments that are fixed over time. (It is unproblematic to extend the model to allow for some missing observations and occasional changes in the cluster assignments.)

2.2 Parametrizing the Correlation Matrix

We parameterize the correlation matrix with the vector

\gamma(C)\equiv{\rm vecl}\left(\log C\right)\in\mathbb{R}^{d},\qquad d=n(n-1)/2, (2)

where {\rm vecl}(\cdot) extracts and vectorizes the elements below the diagonal and \log C is the matrix logarithm of the correlation matrix. (For a nonsingular correlation matrix, we have \log C=Q\log\Lambda Q^{\prime}, where C=Q\Lambda Q^{\prime} is the spectral decomposition of C, so that \Lambda is a diagonal matrix with the eigenvalues of C.) The following example illustrates this parametrization for a 3\times 3 correlation matrix:

{\rm vecl}\left[\log\left(\begin{array}{ccc}1.0&\bullet&\bullet\\ 0.5&1.0&\bullet\\ 0.3&0.7&1.0\end{array}\right)\right]={\rm vecl}\left[\left(\begin{array}{ccc}-0.15&\bullet&\bullet\\ 0.53&-0.47&\bullet\\ 0.13&0.85&-0.34\end{array}\right)\right]=\left(\begin{array}{c}0.53\\ 0.13\\ 0.85\end{array}\right)=:\gamma.

This parametrization is convenient because it guarantees a unique positive definite correlation matrix, C(\gamma), for any vector \gamma, without imposing superfluous restrictions on the correlation matrix, see Archakov and Hansen (2021).
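
As an illustration of the parametrization, the following numpy/scipy sketch computes \gamma(C)={\rm vecl}(\log C) and recovers C(\gamma) with the fixed-point iteration proposed in Archakov and Hansen (2021), in which the diagonal of \log C is adjusted until the implied matrix has a unit diagonal; the function names and the ordering convention for vecl are our choices.

```python
import numpy as np
from scipy.linalg import expm, logm

def gamma_of_C(C):
    """gamma(C) = vecl(log C): the below-diagonal elements of the matrix logarithm."""
    G = np.real(logm(C))
    return G[np.tril_indices_from(G, k=-1)]

def C_of_gamma(gamma, n, tol=1e-12, max_iter=200):
    """Inverse mapping C(gamma): fix the off-diagonal of log C at gamma and adjust
    its diagonal x until exp(log C) has a unit diagonal (the fixed-point iteration
    of Archakov and Hansen, 2021)."""
    G = np.zeros((n, n))
    G[np.tril_indices(n, k=-1)] = gamma
    G = G + G.T
    x = np.zeros(n)
    for _ in range(max_iter):
        np.fill_diagonal(G, x)
        d = np.diag(np.real(expm(G)))
        if np.max(np.abs(d - 1.0)) < tol:
            break
        x = x - np.log(d)              # update the candidate diagonal of log C
    return np.real(expm(G))

# round trip on the 3 x 3 example above
C = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.7],
              [0.3, 0.7, 1.0]])
g = gamma_of_C(C)                      # approximately (0.53, 0.13, 0.85)
assert np.allclose(C_of_gamma(g, 3), C, atol=1e-8)
```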

For a block correlation matrix the logarithmic transformation preserves the block structure as illustrated in the following example:

\underbrace{\left[\begin{array}{ccccccc}1.0&0.8&0.4&0.4&0.2&0.2&0.2\\ 0.8&1.0&0.4&0.4&0.2&0.2&0.2\\ 0.4&0.4&1.0&0.6&0.1&0.1&0.1\\ 0.4&0.4&0.6&1.0&0.1&0.1&0.1\\ 0.2&0.2&0.1&0.1&1.0&0.3&0.3\\ 0.2&0.2&0.1&0.1&0.3&1.0&0.3\\ 0.2&0.2&0.1&0.1&0.3&0.3&1.0\end{array}\right]}_{=C}\quad\underbrace{\left[\begin{array}{ccccccc}-.59&1.02&.251&.251&.115&.115&.115\\ 1.02&-.59&.251&.251&.115&.115&.115\\ .251&.251&-.29&.626&.036&.036&.036\\ .251&.251&.626&-.29&.036&.036&.036\\ .115&.115&.036&.036&-.09&.259&.259\\ .115&.115&.036&.036&.259&-.09&.259\\ .115&.115&.036&.036&.259&.259&-.09\end{array}\right]}_{=\log C}.

The parameter vector \gamma will only have as many unique elements as there are different blocks in C. This number is (K+1)K/2, and we can therefore condense \gamma into a subvector, \eta, such that

\gamma=B\eta, (3)

where B is a known bit-matrix with a single one in each row and \eta\in\mathbb{R}^{K(K+1)/2}. This factor structure for \gamma was first proposed in Archakov et al. (2020).

For later use, we define the condensed log-correlation matrix, \tilde{C}\in\mathbb{R}^{K\times K}, whose (k,l)-th element is the off-diagonal element from the (k,l)-th block of \log C, k,l=1,\ldots,K, and we can set \eta=\mathrm{vech}(\tilde{C})\in\mathbb{R}^{K(K+1)/2}. In the example above, we have

\tilde{C}=\left[\begin{array}{ccc}1.02&.251&.115\\ .251&.626&.036\\ .115&.036&.259\end{array}\right],

such that \eta=\left[1.02,0.251,0.115,0.626,0.036,0.259\right]^{\prime} has dimension six, whereas \gamma has dimension 21. Since the block correlation matrix, C, is only a function of \eta, we can model the time variation in C using a dynamic model for the unrestricted vector \eta. This will be our approach below.

2.3 Canonical Form for the Block Correlation Matrix

Block matrices have a canonical representation that resembles the eigendecomposition of matrices, see Archakov and Hansen (2024). For a block correlation matrix with block sizes (n_{1},\ldots,n_{K}), we have

C=QDQ^{\prime},\quad D=\left[\begin{array}{cccc}A&0&\cdots&0\\ 0&\lambda_{1}I_{n_{1}-1}&\ddots&\vdots\\ \vdots&\ddots&\ddots&0\\ 0&\cdots&0&\lambda_{K}I_{n_{K}-1}\end{array}\right],\quad\lambda_{k}=\frac{n_{k}-A_{kk}}{n_{k}-1}, (4)

where the upper left block, A, is a K\times K matrix with elements A_{kl}=\rho_{kl}\sqrt{n_{k}n_{l}}, for k\neq l, and A_{kk}=1+\left(n_{k}-1\right)\rho_{kk}. The matrix Q is a group-specific orthonormal matrix, i.e., Q^{\prime}Q=QQ^{\prime}=I_{n}. Importantly, Q is solely determined by the block sizes, (n_{1},\ldots,n_{K}), and does not depend on the elements in C. This matrix is given by

Q=\left[\begin{array}{cccccccc}v_{n_{1}}&0&\cdots&&v_{n_{1}}^{\perp}&0&\cdots&0\\ 0&v_{n_{2}}&&&0&v_{n_{2}}^{\perp}&&\vdots\\ \vdots&&\ddots&&&&\ddots\\ 0&\cdots&&v_{n_{K}}&0&\cdots&&v_{n_{K}}^{\perp}\end{array}\right],

where v_{n_{k}}=(1/\sqrt{n_{k}},\ldots,1/\sqrt{n_{k}})^{\prime}\in\mathbb{R}^{n_{k}} and v_{n_{k}}^{\perp} is an n_{k}\times(n_{k}-1) matrix, which is orthogonal to v_{n_{k}}, i.e., v_{n_{k}}^{\prime}v_{n_{k}}^{\perp}=0, and orthonormal, such that v_{n_{k}}^{\perp\prime}v_{n_{k}}^{\perp}=I_{n_{k}-1}. (The Gram-Schmidt process can be used to obtain v_{n_{k}}^{\perp} from v_{n_{k}}.) The canonical representation enables us to rotate Z with Q and define

Y=Q^{\prime}Z,\qquad\text{with}\quad Y=(Y_{0}^{\prime},Y_{1}^{\prime},\ldots,Y_{K}^{\prime})^{\prime}, (5)

where Y_{0} is K-dimensional with {\rm var}(Y_{0})=A, and Y_{k} is (n_{k}-1)-dimensional with {\rm var}(Y_{k})=\lambda_{k}I_{n_{k}-1} for k=1,\ldots,K. The block-diagonal structure of D implies that Y_{0},Y_{1},\ldots, and Y_{K} are uncorrelated, which simplifies several expressions. For instance, we have the following identities:

|C|=|A|\cdot\prod_{k=1}^{K}\lambda_{k}^{n_{k}-1},\qquad Z^{\prime}C^{-1}Z=Y_{0}^{\prime}A^{-1}Y_{0}+\sum_{k=1}^{K}\lambda_{k}^{-1}Y_{k}^{\prime}Y_{k}, (6)

such that the computation of the determinant and any power of C is greatly simplified. The square root of the n\times n correlation matrix, C^{1/2}, is straightforward to compute. From the eigendecomposition of A, A=P\Lambda_{a}P^{\prime}, we define the block diagonal matrix D^{1/2}=\mathrm{diag}(P\Lambda_{a}^{1/2}P^{\prime},\lambda_{1}^{1/2}I_{n_{1}-1},\ldots,\lambda_{K}^{1/2}I_{n_{K}-1}), and set C^{1/2}\equiv QD^{1/2}Q^{\prime}. It is easy to verify that C=C^{1/2}C^{1/2} and that C^{1/2} is symmetric. Computing C^{1/2} therefore only requires an eigendecomposition of the symmetric and positive definite K\times K matrix, A, rather than the eigendecomposition of C, which is n\times n. Computing other powers of C can be done similarly.
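
The following sketch (our illustration, assuming all cluster sizes are at least two) builds Q, A and the \lambda_k's from cluster sizes and block correlations, forms C=QDQ^{\prime}, and computes \log|C| and C^{1/2} using only the K\times K matrix A, as described above.

```python
import numpy as np

def canonical_blocks(sizes, rho):
    """Sketch of the canonical representation (4): build Q, A and lambda_k from
    the cluster sizes (each assumed >= 2) and the K x K block correlations rho."""
    sizes = np.asarray(sizes)
    K, n = len(sizes), int(sizes.sum())
    s = np.sqrt(sizes.astype(float))
    A = rho * np.outer(s, s)                          # A_kl = rho_kl sqrt(n_k n_l)
    np.fill_diagonal(A, 1.0 + (sizes - 1) * np.diag(rho))
    lam = (sizes - np.diag(A)) / (sizes - 1)          # lambda_k = (n_k - A_kk)/(n_k - 1)
    Q = np.zeros((n, n))
    row, col = 0, K
    for k, nk in enumerate(sizes):
        v = np.full((nk, 1), 1.0 / np.sqrt(nk))
        basis, _ = np.linalg.qr(v, mode='complete')   # columns 2..n_k span v^perp
        Q[row:row + nk, k] = v[:, 0]
        Q[row:row + nk, col:col + nk - 1] = basis[:, 1:]
        row, col = row + nk, col + nk - 1
    return Q, A, lam

sizes = np.array([2, 2, 3])
rho = np.array([[0.8, 0.4, 0.2],
                [0.4, 0.6, 0.1],
                [0.2, 0.1, 0.3]])
Q, A, lam = canonical_blocks(sizes, rho)
D = np.zeros((7, 7))
D[:3, :3] = A
D[3:, 3:] = np.diag(np.repeat(lam, sizes - 1))
C = Q @ D @ Q.T                                       # the 7 x 7 block matrix from Section 2.2

# log|C| via (6), and C^{1/2} from the K x K eigendecomposition of A
logdet_blocks = np.linalg.slogdet(A)[1] + np.sum((sizes - 1) * np.log(lam))
assert np.isclose(logdet_blocks, np.linalg.slogdet(C)[1])
w, P = np.linalg.eigh(A)
D_half = np.diag(np.sqrt(np.diag(D)))
D_half[:3, :3] = P @ np.diag(np.sqrt(w)) @ P.T
C_half = Q @ D_half @ Q.T
assert np.allclose(C_half @ C_half, C)
```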

We can use Archakov and Hansen (2024, corollary 2) to recover the elements of the condensed log-correlation matrix,

\tilde{C}=\Lambda_{n}^{-1}W\Lambda_{n}^{-1},\qquad W=\log A-\log\Lambda_{\lambda},

where

\Lambda_{\lambda}=\left[\begin{array}{ccc}\lambda_{1}&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&\lambda_{K}\end{array}\right],\qquad\text{and}\qquad\Lambda_{n}=\left[\begin{array}{ccc}\sqrt{n_{1}}&&0\\ &\ddots\\ 0&&\sqrt{n_{K}}\end{array}\right].

The unique values in \tilde{C}, which are the elements in \eta, can be expressed as

\eta={\rm vech}(\tilde{C})=L_{K}\left(\Lambda_{n}^{-1}\otimes\Lambda_{n}^{-1}\right){\rm vec}(W),

where L_{K} is the elimination matrix that solves \mathrm{vech}(A)=L_{K}\mathrm{vec}(A). This parametrization of block correlation matrices does not impose additional superfluous restrictions, and the canonical representation facilitates simple computation of the determinant, the matrix inverse, and any other power, as well as the matrix logarithm and the matrix exponential. This is very useful for the evaluation of the likelihood function, especially for the more complicated models with heterogeneous heavy tails and complex dependencies, which we pursue in the next section.
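
A short check of this recovery, reusing the 7\times 7 example from Section 2.2 (our code; the printed values should match the example up to rounding):

```python
import numpy as np
from scipy.linalg import logm

# Recover eta = vech(C~) using only the K x K blocks A and Lambda_lambda.
sizes = np.array([2, 2, 3])
rho = np.array([[0.8, 0.4, 0.2],
                [0.4, 0.6, 0.1],
                [0.2, 0.1, 0.3]])
s = np.sqrt(sizes.astype(float))
A = rho * np.outer(s, s)
np.fill_diagonal(A, 1.0 + (sizes - 1) * np.diag(rho))
lam = (sizes - np.diag(A)) / (sizes - 1)

W = np.real(logm(A)) - np.diag(np.log(lam))      # W = log A - log Lambda_lambda
C_tilde = W / np.outer(s, s)                     # Lambda_n^{-1} W Lambda_n^{-1}
eta = C_tilde[np.triu_indices(3)]                # vech of a symmetric matrix
print(np.round(eta, 3))  # should be close to (1.02, 0.251, 0.115, 0.626, 0.036, 0.259)
```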

3 Distributions

The next step is to specify a distribution for the n-dimensional random vector Z, from which the log-likelihood function, \ell, is defined. We consider several specifications, ranging from the multivariate normal distribution to convolutions of multivariate t-distributions. The convolution-t distributions by Hansen and Tong (2024) have simple log-likelihood functions, and the canonical representation of a block correlation matrix motivates some particular specifications of the convolution-t distribution.

We define

U=C^{-1/2}Z,

such that \mathrm{var}(U)=I_{n}. (An advantage of having defined C^{1/2} from the eigendecomposition is that the normalized variables in U are invariant to a reordering of the elements in Z, which would not be the case if a Cholesky form was used to define C^{1/2}.) A convenient property of any log-likelihood function, \ell, is that

\ell\left(Z\right)=-\tfrac{1}{2}\log|C|+\ell\left(U\right). (7)

This shows that the log-likelihood function will be in closed form if we adopt a distribution for U with a closed-form expression for \ell(U), and this is important for obtaining tractable score-driven models. It is well known that the multivariate t-distribution and the Gaussian distribution have simple expressions for \ell(U). Fortunately, so do the multivariate convolution-t distributions, which have different and interesting statistical properties for Z.

3.1 Multivariate t-Distributions

We begin with the simplest heavy-tailed distribution, a scaled multivariate t-distribution, which nests the Gaussian distribution as a limiting case. The multivariate t-distribution is widely used to model vectors of returns with heavy-tailed distributions, see e.g. Creal et al. (2012), Opschoor et al. (2017), and Hafner and Wang (2023).

The n-dimensional multivariate t-distribution with \nu degrees of freedom, location \mu\in\mathbb{R}^{n}, and scale matrix \Sigma\in\mathbb{R}^{n\times n}, typically written X\sim t_{\nu}(\mu,\Sigma), has density

f_{X}(x)=\tfrac{\Gamma(\tfrac{\nu+n}{2})}{\Gamma(\tfrac{\nu}{2})}[\nu\pi]^{-\frac{n}{2}}|\Sigma|^{-\frac{1}{2}}\left[1+\tfrac{1}{\nu}(x-\mu)^{\prime}\Sigma^{-1}(x-\mu)\right]^{-\frac{\nu+n}{2}}.

The variance is well-defined when \nu>2, in which case \mathrm{var}(X)=\tfrac{\nu}{\nu-2}\Sigma. The parameter \nu governs the heaviness of the tails and the multivariate t-distribution converges to the multivariate normal distribution, N(\mu,\Sigma), as \nu\rightarrow\infty.

To simplify the notation, we will use a scaled multivariate t-distribution, denoted t_{\nu}^{\mathrm{std}}(0,\Sigma), which is defined for \nu>2. Its density is given by

f_{Y}(y)=\tfrac{\Gamma(\tfrac{\nu+n}{2})}{\Gamma(\tfrac{\nu}{2})}[(\nu-2)\pi]^{-\frac{n}{2}}|\Sigma|^{-\frac{1}{2}}\left[1+\tfrac{1}{\nu-2}y^{\prime}\Sigma^{-1}y\right]^{-\frac{\nu+n}{2}},\qquad\nu>2. (8)

The relation between the two distributions is as follows: If X\sim t_{\nu}(0,\Sigma) with \nu>2, then Y=\sqrt{\tfrac{\nu-2}{\nu}}X\sim t_{\nu}^{\mathrm{std}}(0,\Sigma). The main advantage of the scaled t-distribution is that \mathrm{var}(Y)=\Sigma. Thus, if U\sim t_{\nu}^{\mathrm{std}}(0,I_{n}) then Z=C^{1/2}U\sim t_{\nu}^{\mathrm{std}}(0,C), and the corresponding log-likelihood function is given by

\ell(Z)=c(\nu,n)-\tfrac{1}{2}\log|C|-\tfrac{\nu+n}{2}\log\left(1+\tfrac{1}{\nu-2}Z^{\prime}C^{-1}Z\right), (9)

where c(\nu,n)=\log\left(\Gamma(\tfrac{\nu+n}{2})/\Gamma(\tfrac{\nu}{2})\right)-\frac{n}{2}\log\left[\left(\nu-2\right)\pi\right] is a normalizing constant that does not depend on the correlation matrix, C. If C has a block structure, we can use the identities in (6) and obtain the following simplified expression,

\ell(Z)=c(\nu,n)-\tfrac{1}{2}\log|A|-\tfrac{1}{2}\sum_{k=1}^{K}\left(n_{k}-1\right)\log\lambda_{k}-\tfrac{\nu+n}{2}\log\left(1+\tfrac{1}{\nu-2}\left(Y_{0}^{\prime}A^{-1}Y_{0}+\sum_{k=1}^{K}\lambda_{k}^{-1}Y_{k}^{\prime}Y_{k}\right)\right). (10)
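
A minimal numpy/scipy sketch of the log-likelihood in (9); with a block correlation matrix one would instead evaluate (10), which only involves the K\times K matrix A.

```python
import numpy as np
from scipy.special import gammaln

def mvt_std_loglik(Z, C, nu):
    """Log-likelihood (9) of the standardized multivariate t-distribution
    t_nu^std(0, C), evaluated at a single n-vector Z (requires nu > 2)."""
    n = len(Z)
    c = gammaln((nu + n) / 2) - gammaln(nu / 2) - 0.5 * n * np.log((nu - 2) * np.pi)
    _, logdetC = np.linalg.slogdet(C)
    quad = Z @ np.linalg.solve(C, Z)
    return c - 0.5 * logdetC - 0.5 * (nu + n) * np.log1p(quad / (nu - 2))
```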

The multivariate t-distribution has two implications for all elements of the vector Z. First, all elements of a multivariate t-distribution are dependent, because they share a common random mixing variable. Second, all elements of U are identically distributed, because they are t-distributed with the same degrees of freedom. Both implications may be too restrictive in many applications, especially if the dimension, n, is large. Below we consider the convolution-t distribution proposed in Hansen and Tong (2024), which allows for heterogeneity and cluster structures in the tail properties and the tail dependencies.

3.2 Multivariate Convolution-t Distributions

The multivariate convolution-t distribution is a suitable rotation of a random vector that is made up of independent multivariate t-distributions. More specifically, let V_{1},\ldots,V_{G} be mutually independent standardized multivariate t-distributed variables, V_{g}\sim t_{\nu_{g}}^{\mathrm{std}}(0,I_{m_{g}}), with \nu_{g}>2 for all g=1,\ldots,G and n=\sum_{g=1}^{G}m_{g}.

Then V=(V_{1}^{\prime},\ldots,V_{G}^{\prime})^{\prime}\in\mathbb{R}^{n} has the standardized convolution-t distribution (with zero location vector and identity scale-rotation matrix), which is denoted by

V\sim\mathrm{CT}_{\boldsymbol{m},\boldsymbol{\nu}}^{{\rm std}}(0,I_{n}),

where \boldsymbol{\nu}=(\nu_{1},\ldots,\nu_{G})^{\prime} is the vector with degrees of freedom and \boldsymbol{m}=(m_{1},\ldots,m_{G})^{\prime} is the vector with the dimensions of the G multivariate t-distributions. We can think of the partitioning of the elements in V as a second cluster structure, as we discuss below.

We will model the distribution of U using U=PV, where P\in\mathbb{R}^{n\times n} is an orthonormal matrix, i.e. P^{\prime}P=I_{n}, and we use the notation U\sim\mathrm{CT}_{\boldsymbol{m},\boldsymbol{\nu}}^{{\rm std}}(0,P). While \mathrm{var}(U)=\mathrm{var}(V)=I_{n}, U and V will not have the same distribution, unless P has a particular structure, such as P=I_{n}. Similarly, we use the following notation for the distribution of

Z=C^{1/2}PV\sim\mathrm{CT}_{\boldsymbol{m},\boldsymbol{\nu}}^{{\rm std}}(0,C^{1/2}P),

which is a convolution-t distribution with location zero and scale-rotation matrix C^{1/2}P. Note that we have \mathrm{var}(Z)=C for any orthonormal matrix, P, but different choices for P lead to different distributions with distinct non-linear dependencies that arise from the cluster structure in V.

Conveniently, we have the expression V=P^{\prime}C^{-1/2}Z=P^{\prime}U, and if we partition the columns in P using the same cluster structure as in V, i.e. P=(P_{1},\ldots,P_{G}) with P_{g}\in\mathbb{R}^{n\times m_{g}}, then it follows that V_{g}=P_{g}^{\prime}U\in\mathbb{R}^{m_{g}}, for g=1,\ldots,G. Next, U and V have the exact same log-likelihoods, \ell(U)=\ell(P^{\prime}U)=\ell(V), and we can use (7) to express the log-likelihood function for Z as

\ell(Z)=-\tfrac{1}{2}\log|C|+\sum_{g=1}^{G}\left[c_{g}-\tfrac{\nu_{g}+m_{g}}{2}\log\left(1+\tfrac{1}{\nu_{g}-2}V_{g}^{\prime}V_{g}\right)\right], (11)

where c_{g}=c(\nu_{g},m_{g}). When C has a block structure, we also have

V=P^{\prime}Q\left[\begin{array}{c}A^{-1/2}Y_{0}\\ \lambda_{1}^{-1/2}Y_{1}\\ \vdots\\ \lambda_{K}^{-1/2}Y_{K}\end{array}\right],

where Y=Q^{\prime}Z, and some interesting special cases emerge from this structure.
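
The log-likelihood in (11) is straightforward to evaluate; the following sketch (ours) does so for a single observation, using the symmetric square root of C obtained from its eigendecomposition.

```python
import numpy as np
from scipy.special import gammaln

def convolution_t_loglik(Z, C, P, m, nu):
    """Log-likelihood (11) of Z ~ CT_{m,nu}^std(0, C^{1/2} P) for a single
    observation Z. m and nu hold the group dimensions and degrees of freedom,
    and P is an n x n orthonormal matrix (P = I_n gives the Cluster-t case)."""
    w, E = np.linalg.eigh(C)
    U = E @ np.diag(w ** -0.5) @ E.T @ Z           # U = C^{-1/2} Z (symmetric root)
    V = P.T @ U                                    # V = P' U
    ll = -0.5 * np.sum(np.log(w))                  # -0.5 log|C|
    start = 0
    for mg, nug in zip(m, nu):
        Vg = V[start:start + mg]
        cg = (gammaln((nug + mg) / 2) - gammaln(nug / 2)
              - 0.5 * mg * np.log((nug - 2) * np.pi))
        ll += cg - 0.5 * (nug + mg) * np.log1p(Vg @ Vg / (nug - 2))
        start += mg
    return ll
```

With G=1, m=(n) and P=I_{n}, this reduces to the multivariate t log-likelihood in (9).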

We have previously used a partitioning of the variables to form a block correlation structure, which arises from a cluster structure for the variables. The convolution-t distribution involves a second partitioning that defines the G independent multivariate t-distributions. This is a cluster structure in the underlying random innovations of the model. The two cluster structures can be identical or different, as we illustrate with examples and in our empirical application. Next, we highlight six distributional properties that are the product of this model design.

  1. Each element of V_{g}\in\mathbb{R}^{m_{g}} has the same marginal t-distribution with \nu_{g} degrees of freedom. This does not carry over to the corresponding elements of Z (even if P=I). In general, the marginal distribution of an element of Z will be a unique convolution of (as many as) G independent t-distributions with different degrees of freedom.

  2. While the (multivariate) t-distributions are independent across groups, this does not carry over to the corresponding sub-vectors of Z.

  3. The convolution for each element of Z is, in part, defined by the correlation matrix, C. So, time-variation in C will induce time-varying marginal distributions for the elements of Z.

  4. The partitioning of V=P^{\prime}U into G clusters (G-clusters) induces heterogeneity in tail dependencies and the heaviness of the tails. The G-clusters can be entirely different from the K-clusters (the partitioning of the Z variables) that define the block structure in the correlation matrix, and the two numbers of clusters can be different.

  5. Increasing the number of G-clusters does not necessarily improve the empirical fit. While increasing G will increase the number of parameters (degrees of freedom) in the model, it also entails dividing V into additional subvectors, which eliminates the innate dependence between elements of V that belong to the same multivariate t-distribution.

  6. This model framework nests the conventional multivariate t-distribution as the special case G=1, which facilitates simple comparisons with a natural benchmark model.

3.3 Density and CDF of the Convolution-t Distribution

The marginal distributions of the elements of Z are convolutions of independent t-distributed variables, and neither their densities nor their cumulative distribution functions have simple expressions. (Even for the simplest case, a convolution of two univariate t-distributions, the resulting density does not have a simple closed-form expression.) However, using Hansen and Tong (2024, theorem 1) we obtain the following semi-analytical expressions, where {\rm Re}\left[x\right] and {\rm Im}\left[x\right] denote the real and imaginary parts of x\in\mathbb{C}, respectively, and e_{j,n} is the j-th column of the identity matrix I_{n}.

Proposition 1.

Suppose Z\sim\mathrm{CT}_{\boldsymbol{m},\boldsymbol{\nu}}^{{\rm std}}(0,C^{1/2}P). Then the marginal density and cumulative distribution function for Z_{j}, j=1,\ldots,n, are given by

f_{Z_{j}}(z)=\frac{1}{\pi}\int_{0}^{\infty}{\rm Re}\left[e^{-isz}\varphi_{Z_{j}}(s)\right]\mathrm{d}s,\qquad F_{Z_{j}}(z)=\frac{1}{2}-\frac{1}{\pi}\int_{0}^{\infty}\frac{{\rm Im}\left[e^{-isz}\varphi_{Z_{j}}(s)\right]}{s}\mathrm{d}s,

respectively, where \varphi_{Z_{j}}(s)=\prod_{g=1}^{G}\varphi_{\nu_{g}}^{\mathrm{std}}\left(is\|P_{g}^{\prime}C^{\frac{1}{2}}e_{j,n}\|\right) is the characteristic function for Z_{j}, and

\varphi_{\nu}^{\mathrm{std}}(s)=\frac{K_{\frac{\nu}{2}}(\sqrt{\nu-2}|s|)(\sqrt{\nu-2}|s|)^{\frac{1}{2}\nu}}{\Gamma\left(\frac{\nu}{2}\right)2^{\frac{\nu}{2}-1}},

is the characteristic function of the univariate t_{\nu}^{\mathrm{std}}-distribution.
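
A sketch of how the semi-analytical expressions in Proposition 1 could be evaluated numerically; the integration grid and truncation point are illustrative choices of ours, not part of the result.

```python
import numpy as np
from scipy.special import kv, gamma

def phi_t_std(s, nu):
    """Characteristic function of the univariate standardized t_nu distribution
    (phi(0) = 1 is handled explicitly)."""
    s = np.abs(np.asarray(s, dtype=float))
    out = np.ones_like(s)
    pos = s > 0
    x = np.sqrt(nu - 2.0) * s[pos]
    out[pos] = kv(nu / 2, x) * x ** (nu / 2) / (gamma(nu / 2) * 2 ** (nu / 2 - 1))
    return out

def marginal_pdf(z, j, C, P, m, nu, s_max=60.0, n_grid=40000):
    """Sketch of f_{Z_j}(z) in Proposition 1 by numerical Fourier inversion;
    the integrand decays quickly, so a truncated uniform grid is used."""
    w, E = np.linalg.eigh(C)
    C_half = E @ np.diag(np.sqrt(w)) @ E.T
    s = np.linspace(1e-10, s_max, n_grid)
    phi = np.ones_like(s)
    start = 0
    for mg, nug in zip(m, nu):
        a = np.linalg.norm(P[:, start:start + mg].T @ C_half[:, j])  # ||P_g' C^{1/2} e_j||
        phi *= phi_t_std(a * s, nug)
        start += mg
    # the characteristic function is real and even, so Re[e^{-isz} phi(s)] = cos(sz) phi(s)
    return np.sum(np.cos(s * z) * phi) * (s[1] - s[0]) / np.pi
```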

To gain some insight about convolution-t distributions and the expressions in Proposition 1, we present features of two densities in Figure 1. We specifically consider the convolutions \tfrac{1}{\sqrt{G}}\sum_{g=1}^{G}V_{g}, for G=2 and G=10, where V_{1},\ldots,V_{G} are independent and standardized t-distributed with six degrees of freedom.

Figure 1: Panel (a) plots the logarithm of the marginal distribution of \sum_{g=1}^{G}V_{g}/\sqrt{G} for G=1, G=2, and G=6, where the V_{g} are independent and identically distributed as t_{6}^{\mathrm{std}}(0,1). Panels (b) and (c) are Q-Q plots of the convolution-t distribution, \sum_{g=1}^{G}V_{g}/\sqrt{G}, with G=2 and G=10, respectively, against the best approximating standardized Student's t-distribution, as defined by the Kullback-Leibler discrepancy.

The upper panel of Figure 1, Panel (a), shows the log-densities of the (left) tail of the distribution, and how they compare to those of a standardized t_{(6)}^{{\rm std}}-distribution and a standard normal distribution. As G\rightarrow\infty the convolution-t distribution approaches the normal distribution. So, it is not surprising that the log-densities for the convolutions are between that of a t_{(6)}^{{\rm std}} and that of a standard normal. Unsurprisingly, the convolution of G=10 standardized t-distributions is closer to the normal distribution than the convolution of G=2 distributions. However, the convolution-t distribution is not a t-distribution for G>1. In terms of the Kullback-Leibler discrepancy, the best approximating t-distribution to the convolution-t distribution is a t_{(8.75)}^{{\rm std}}-distribution when G=2 and a t_{(26.15)}^{{\rm std}}-distribution when G=10, see the Q-Q plots in Panels (b) and (c) of Figure 1.

The expression for the marginal density of convolution-t distributions is particularly useful in our empirical analysis, because it gives us a factorization of the joint density into marginal densities and the copula density by Sklar's theorem. This leads to the decomposition of the log-likelihood, \ell(Z)=\sum_{j=1}^{n}\ell(Z_{j})+\log(c(Z)), where c(Z) denotes the copula density, and we can see if gains in the log-likelihood are primarily driven by gains in the marginal distributions or by gains in the copula density.

3.4 Three Special Types of Convolution-t Distributions

The convolution-t distributions define a broad class of distributions, with many possible partitions of V and choices for P. Below we elaborate on some particular details of three special types of convolution-t distributions. For later use, we use e_{k}\in\mathbb{R}^{K\times 1} to denote the k-th column of the identity matrix I_{K}.

3.4.1 Special Type 1: Cluster-t Distribution

The first special type of convolution-t distribution has P=I, such that U=V, and a single cluster structure. The cluster structure, \boldsymbol{m}, is imposed on V, whereas C can be unrestricted or have a block structure based on the same clustering, in which case \bm{n}=\bm{m} and G=K.

Without a block correlation structure on C, we have V=C^{-1/2}Z and the log-likelihood function is simply computed using (11). If the block structure is imposed on C, then we can express the multivariate t-distributed variables as linear combinations of the canonical variables, Y_{0},\ldots,Y_{K},

U_{k}=V_{k}=v_{n_{k}}e_{k}^{\prime}A^{-\frac{1}{2}}Y_{0}+\lambda_{k}^{-\frac{1}{2}}v_{n_{k}}^{\perp}Y_{k},\qquad\text{for}\quad k=1,\ldots,K. (12)

We therefore have the expression for the quadratic terms,

U_{k}^{\prime}U_{k}=Y_{0}^{\prime}A^{-\frac{1}{2}}e_{k}e_{k}^{\prime}A^{-\frac{1}{2}}Y_{0}+\lambda_{k}^{-1}Y_{k}^{\prime}Y_{k},\qquad k=1,\ldots,K,

and the log-likelihood function simplifies to

\ell(Z)=-\tfrac{1}{2}\log|A|+\sum_{k=1}^{K}\left[c_{k}-\tfrac{1}{2}\left(n_{k}-1\right)\log\lambda_{k}-\tfrac{\nu_{k}+n_{k}}{2}\log\left(1+\tfrac{1}{\nu_{k}-2}U_{k}^{\prime}U_{k}\right)\right], (13)

where c_{k}=c(\nu_{k},n_{k}). The block structure simplifies the implementation of the score-driven model for this specification and makes it possible to implement the model with a large number of stocks.
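
A small sketch (ours) of the evaluation of (13) in terms of the canonical variables; it only requires the K\times K matrix A, the \lambda_k's, and Y_{0},Y_{1},\ldots,Y_{K}.

```python
import numpy as np
from scipy.special import gammaln

def cluster_t_block_loglik(Y0, Y_list, A, lam, sizes, nu):
    """Log-likelihood (13) of the Cluster-t distribution under a block
    correlation matrix, written in terms of the canonical variables
    Y0 (a K-vector) and Y_k (an (n_k - 1)-vector for each cluster)."""
    w, P = np.linalg.eigh(A)
    x0 = P @ np.diag(w ** -0.5) @ P.T @ Y0           # A^{-1/2} Y_0
    ll = -0.5 * np.sum(np.log(w))                    # -0.5 log|A|
    for k, (nk, nuk) in enumerate(zip(sizes, nu)):
        quad = x0[k] ** 2 + Y_list[k] @ Y_list[k] / lam[k]   # U_k' U_k from (12)
        ck = (gammaln((nuk + nk) / 2) - gammaln(nuk / 2)
              - 0.5 * nk * np.log((nuk - 2) * np.pi))
        ll += ck - 0.5 * (nk - 1) * np.log(lam[k]) \
              - 0.5 * (nuk + nk) * np.log1p(quad / (nuk - 2))
    return ll
```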

3.4.2 Special Type 2: Hetero-t Distribution

A second special type of convolution-t distribution has P=I and G=n. So, the elements of U are made up of n independent univariate t-distributions with degrees of freedom \nu_{i}, i=1,\ldots,n. This distribution can accommodate a high degree of heterogeneity in the tail properties of Z_{i}, i=1,\ldots,n, which are different convolutions of the n independent t-distributions. For this reason, we refer to these distributions as the Hetero-t distributions. The number of degrees-of-freedom parameters increases from G to n, but the additional parameters do not guarantee a better in-sample log-likelihood, because all dependence between the elements of V is eliminated. The Cluster-t distribution has dependence between V-variables within the same cluster. This has implications for the linear combinations of U, including those that define Z.

For the case with a general correlation matrix, the Hetero-t distribution simplifies the log-likelihood function in (11) to

\ell(Z)=-\tfrac{1}{2}\log|C|+\sum_{i=1}^{n}\left[c_{i}-\tfrac{\nu_{i}+1}{2}\log\left(1+\tfrac{1}{\nu_{i}-2}U_{i}^{2}\right)\right],

where c_{i}=c(\nu_{i},1). (Note that we can obtain preliminary estimates (starting values) of the n degrees-of-freedom parameters by estimating \nu_{i} from e_{i}^{\prime}\tilde{U}_{t}, where \tilde{U}_{t}=\tilde{C}^{-\frac{1}{2}}Z_{t} and \tilde{C} is an estimate of the unconditional correlation matrix, for i=1,\ldots,n.)

We can combine the heterogeneous t-distributions with a block correlation matrix, in which case the log-likelihood function simplifies to

\ell(Z)=c-\tfrac{1}{2}\log|A|-\tfrac{1}{2}\sum_{k=1}^{K}\left(n_{k}-1\right)\log\lambda_{k}-\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\tfrac{\nu_{k,j}+1}{2}\log\left(1+\tfrac{1}{\nu_{k,j}-2}U_{k,j}^{2}\right), (14)

where c=\sum_{i=1}^{n}c(\nu_{i},1) and U_{k,j} is the j-th element of the vector U_{k} expressed by (12).

3.4.3 Special Type 3: Canonical-Block-t Distribution

A third special type of convolution-t distributions is based on the canonical variables, as defined by the canonical representation of the block correlation matrix. The Canonical-Block-t distribution has P=Q and \boldsymbol{m}=(K,n_{1}-1,\ldots,n_{K}-1)^{\prime}, such that V=Q^{\prime}U is composed of G=K+1 independent multivariate t-distributions. So,

Q^{\prime}U=\left(V_{0}^{\prime},V_{1}^{\prime},\cdots,V_{K}^{\prime}\right)^{\prime},\quad{\rm where}\ V_{0}\sim t_{\nu_{0}}(0,I_{K}),\quad{\rm and}\ V_{k}\sim t_{\nu_{k}}(0,I_{n_{k}-1}).

This construction is motivated by the K+1 canonical variables, Y_{0},\ldots,Y_{K}, that arise from the canonical representation of block correlation matrices. Interestingly, this type of convolution-t distribution can be used regardless of whether C has a block structure or not. For a general correlation matrix, C, the log-likelihood function is given by

\ell(Z)=-\tfrac{1}{2}\log|C|+c_{0}-\tfrac{\nu_{0}+K}{2}\log\left(1+\frac{V_{0}^{\prime}V_{0}}{\nu_{0}-2}\right)+\sum_{k=1}^{K}\left[c_{k}-\tfrac{\nu_{k}+n_{k}-1}{2}\log\left(1+\frac{V_{k}^{\prime}V_{k}}{\nu_{k}-2}\right)\right],

where V=Q^{\prime}U=Q^{\prime}C^{-1/2}Z.

From a practical viewpoint, a more interesting situation is when C has a block structure, such that C=QDQ^{\prime}. With this structure, the log-likelihood function simplifies to

\ell(Z)=c_{0}-\tfrac{1}{2}\log|A|-\tfrac{\nu_{0}+K}{2}\log\left(1+\tfrac{1}{\nu_{0}-2}Y_{0}^{\prime}A^{-1}Y_{0}\right)+\sum_{k=1}^{K}\left[c_{k}-\tfrac{1}{2}\left(n_{k}-1\right)\log\lambda_{k}-\tfrac{\nu_{k}+n_{k}-1}{2}\log\left(1+\tfrac{1}{\nu_{k}-2}Y_{k}^{\prime}Y_{k}\lambda_{k}^{-1}\right)\right], (15)

which is computationally advantageous, because it does not require an inverse (nor a determinant) of an n\times n matrix.
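
The following sketch (ours) evaluates (15); note that no n\times n matrix appears anywhere in the computation, only the K\times K matrix A and the scalars \lambda_k.

```python
import numpy as np
from scipy.special import gammaln

def c_const(nu, dim):
    """Normalizing constant c(nu, dim) used in the likelihood expressions."""
    return gammaln((nu + dim) / 2) - gammaln(nu / 2) - 0.5 * dim * np.log((nu - 2) * np.pi)

def canonical_block_t_loglik(Y0, Y_list, A, lam, sizes, nu0, nu):
    """Log-likelihood (15) of the Canonical-Block-t distribution with a block
    correlation matrix, in terms of the canonical variables Y0 and Y_k."""
    K = len(sizes)
    ll = c_const(nu0, K) - 0.5 * np.linalg.slogdet(A)[1] \
         - 0.5 * (nu0 + K) * np.log1p(Y0 @ np.linalg.solve(A, Y0) / (nu0 - 2))
    for k, nk in enumerate(sizes):
        ll += c_const(nu[k], nk - 1) - 0.5 * (nk - 1) * np.log(lam[k]) \
              - 0.5 * (nu[k] + nk - 1) * np.log1p(Y_list[k] @ Y_list[k] / lam[k] / (nu[k] - 2))
    return ll
```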

The expression for the log-likelihood function shows that this distribution is equivalent to assuming that Y_{0},Y_{1},\ldots,Y_{K} are independent and distributed as Y_{0}\sim t_{\nu_{0}}^{\mathrm{std}}(0,A) and Y_{k}\sim t_{\nu_{k}}^{\mathrm{std}}(0,\lambda_{k}I_{n_{k}-1}), for k=1,\ldots,K. This yields insight about the standardized returns within each block. Let Z_{k} be the n_{k}-dimensional subvector of Z=(Z_{1}^{\prime},\ldots,Z_{K}^{\prime})^{\prime}. From Z=QQ^{\prime}Z=QY it follows that

Z_{k}=v_{n_{k}}Y_{0,k}+v_{n_{k}}^{\perp}Y_{k},

such that a standardized return in the k-th block has the same loading on the common variable Y_{0,k}, and orthogonal loadings on the vector Y_{k}.

Additional convolution-t distributions could be based on this structure. For instance, we could combine P=Q with heterogeneous univariate t-distributions for some or all of the canonical variables. For example, the canonical variable V_{0} could be made up of K heterogeneous t-distributions, while the other canonical variables, V_{1},\ldots,V_{K}, have multivariate t_{\nu_{k}}^{\mathrm{std}}-distributions.

4 Score-Driven Models

We turn to the dynamic modeling of the conditional correlation matrix in this section. To this end, we adopt the score-driven framework by Creal et al. (2013) to model the dynamic properties of \gamma_{t}={\rm vecl}(\log C_{t})\in\mathbb{R}^{d}, with d=n\left(n-1\right)/2. Specifically, we adopt the vector autoregressive model of order one, VAR(1):

\gamma_{t+1}=(I_{d}-\beta)\mu+\beta\gamma_{t}+\alpha\varepsilon_{t}, (16)

where \beta and \alpha are d\times d matrices of coefficients, \mu=\mathbb{E}(\gamma_{t}), and \varepsilon_{t} will be defined by the first-order conditions of the log-likelihood at time t. (It is straightforward to include additional lagged values of \gamma_{t}, such that (16) has a higher-order VAR(p) structure, and adding q lagged values of \varepsilon_{t} would generalize (16) to a VARMA(p,q) model; we do not pursue these extensions in this paper.) The key aspect of a score-driven model is that the score of the predictive likelihood function is used to define the innovation \varepsilon_{t}, specifically

\varepsilon_{t}=\mathcal{S}_{t}^{-1}\nabla_{t},\quad{\rm where}\quad\nabla_{t}=\frac{\partial\ell_{t-1}(Z_{t})}{\partial\gamma_{t}}, (17)

and \mathcal{S}_{t} is a scaling matrix. The score \nabla_{t} is the first-order derivative of the log-likelihood with respect to \gamma_{t}, and \nabla_{t} is a martingale difference process if the model is correctly specified. The Fisher information matrix, \mathcal{I}_{t}=\mathbb{E}_{t-1}\left(\nabla_{t}\nabla_{t}^{\prime}\right), is often used as the scaling matrix, in which case the time-varying parameter vector is updated in a manner that resembles a Newton-Raphson algorithm, see Creal et al. (2013). (One exception is Hafner and Wang (2023), who used an unscaled score, i.e. \mathcal{S}_{t}=I, which does not take any curvature of the log-likelihood into account when parameter values are revised.)

A potential drawback of using \mathcal{S}_{t}^{-1} as the scaling matrix in (17) is that the precision of the inverse deteriorates as the dimension increases. We will therefore approximate \mathcal{S}_{t}^{-1} by imposing a diagonal structure and simply inverting the diagonal elements of \mathcal{I}_{t}. This is equivalent to using the scaling matrix,

\mathcal{S}_{t}={\rm diag}\left(\mathcal{I}_{t,11},\ldots,\mathcal{I}_{t,dd}\right).

In this manner, each element of the parameter vector is updated with a scaled version of the corresponding element of the score. Computing the inverse, \mathcal{S}_{t}^{-1}, is now straightforward and simple to implement.

The score is computed using the following decomposition,

\frac{\partial\ell}{\partial\gamma^{\prime}}=\frac{\partial\ell}{\partial{\rm vecl}(C)^{\prime}}\frac{\partial{\rm vecl}(C)}{\partial{\rm vecl}\left(\log C\right)^{\prime}}. (18)

The expression for the last term was derived in Archakov and Hansen (2021) using results from Linton and McCrorie (1995). The drawback of this approach is that it requires an eigendecomposition of an n^{2}\times n^{2} matrix, and this is impractical and unstable when n is large. Moreover, the computational burden for the corresponding information matrix is even worse. Fortunately, when C has a block structure, we have the following simplified expression,

\frac{\partial\ell}{\partial\eta^{\prime}}=\frac{\partial\ell}{\partial{\rm vec}\left(A\right)^{\prime}}\frac{\partial{\rm vec}\left(A\right)}{\partial{\rm vec}\left(W\right)^{\prime}}\left(\Lambda_{n}\otimes\Lambda_{n}\right)D_{K}.

The first term can be computed very fast for all the variants of the convolution-t distributions we consider. The second term only requires an eigendecomposition of A (the upper-left K\times K submatrix of D), and this greatly reduces the computational burden for evaluating both the score and the information matrix.

For block correlation matrices, we use the vector autoregression of order one for the subvector,

\eta_{t+1}=\left(I_{d}-\beta\right)\mu+\beta\eta_{t}+\alpha\varepsilon_{t}, (19)

where \mu=\mathbb{E}(\eta_{t})\in\mathbb{R}^{d}, and \alpha and \beta are d\times d matrices with d=K\left(K+1\right)/2.
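
The recursion in (19), combined with the diagonal scaling described above, can be sketched as follows; `score_and_info` is a hypothetical placeholder for the score and information expressions derived below, not an object defined in the paper.

```python
import numpy as np

def score_driven_filter(Z, eta0, mu, alpha, beta, score_and_info):
    """Sketch of the recursion (19). score_and_info(eta, z) is a user-supplied
    (hypothetical) function returning the score nabla_t and the diagonal of the
    information matrix for the chosen convolution-t likelihood."""
    T, d = len(Z), len(eta0)
    eta = np.empty((T + 1, d))
    eta[0] = eta0
    I_d = np.eye(d)
    for t in range(T):
        nabla, info_diag = score_and_info(eta[t], Z[t])
        eps = nabla / info_diag                     # eps_t = S_t^{-1} nabla_t
        eta[t + 1] = (I_d - beta) @ mu + beta @ eta[t] + alpha @ eps
    return eta
```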

To implement the score-driven model, we need to derive the appropriate score and scaling matrix for each of the log-likelihoods. For this purpose, we will adopt the following notation involving matrices and matrix operators, with some notation adopted from Creal et al. (2012). Let A and B be two matrices with suitable dimensions. The Kronecker product is denoted by A\otimes B, and we use A_{\otimes}\equiv A\otimes A and A\oplus B\equiv A\otimes B+B\otimes A. We let K_{k} denote the commutation matrix, D_{k} the duplication matrix, and L_{k}, E_{l}, E_{u}, and E_{d} elimination matrices. These are defined by the following identities:

\begin{array}{c}K_{k}{\rm vec}(B)={\rm vec}\left(B^{\prime}\right),\quad D_{k}{\rm vech}(A)={\rm vec}(A),\quad L_{k}{\rm vec}(B)={\rm vech}(B),\\ E_{l}{\rm vec}(B)={\rm vecl}(B),\quad E_{u}{\rm vec}(B)={\rm vecl}\left(B^{\prime}\right),\quad E_{d}{\rm vec}(B)={\rm diag}(B),\end{array}

for any symmetric matrix, A\in\mathbb{R}^{k\times k}, and any matrix, B\in\mathbb{R}^{k\times k}.
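
For concreteness, these operators can be constructed explicitly as 0/1 matrices; the sketch below (ours, assuming a column-major ordering of vec, vech and vecl) verifies the defining identities on a random matrix.

```python
import numpy as np

def vec(B):
    """Column-major stacking, vec(B)."""
    return B.reshape(-1, order='F')

def commutation(k):
    """K_k, defined by K_k vec(B) = vec(B')."""
    K = np.zeros((k * k, k * k))
    for j in range(k):
        for i in range(k):
            K[i + j * k, j + i * k] = 1.0
    return K

def selection_matrices(k):
    """E_l, E_u, E_d, L_k and D_k as 0/1 matrices (column-major conventions)."""
    pos = np.arange(k * k).reshape(k, k, order='F')   # position of B[i, j] in vec(B)
    low = [(i, j) for j in range(k) for i in range(j + 1, k)]
    half = [(i, j) for j in range(k) for i in range(j, k)]
    I = np.eye(k * k)
    E_l = I[[pos[i, j] for i, j in low]]
    E_u = I[[pos[j, i] for i, j in low]]
    E_d = I[[pos[i, i] for i in range(k)]]
    L_k = I[[pos[i, j] for i, j in half]]
    D_k = np.zeros((k * k, len(half)))
    for r, (i, j) in enumerate(half):
        D_k[pos[i, j], r] = 1.0
        D_k[pos[j, i], r] = 1.0
    return E_l, E_u, E_d, L_k, D_k

# quick check of the defining identities
k = 4
B = np.random.randn(k, k)
A = B + B.T
E_l, E_u, E_d, L_k, D_k = selection_matrices(k)
assert np.allclose(commutation(k) @ vec(B), vec(B.T))
assert np.allclose(E_d @ vec(B), np.diag(B))
assert np.allclose(E_u @ vec(B), E_l @ vec(B.T))      # vecl(B') from E_u
assert np.allclose(D_k @ (L_k @ vec(A)), vec(A))      # D_k vech(A) = vec(A)
```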

4.1 Scores and Information Matrices for a General Correlation Matrix

We first derive expressions for \nabla and \mathcal{I} with a general correlation matrix. Recall that the log-likelihood function based on a convolution-t distribution is given by (11), and in the special case with a multivariate t-distribution, the log-likelihood simplifies to the expression in (9).

4.1.1 Score-Driven Model with Multivariate t-Distribution

Theorem 1.

Suppose that Z\sim t_{n,\nu}^{\mathrm{std}}(0,C). Then the score vector and information matrix with respect to \gamma={\rm vecl}\left(\log C\right) are given by:

\nabla=\tfrac{1}{2}M^{\prime}C_{\otimes}^{-1}\left[W{\rm vec}\left(ZZ^{\prime}\right)-{\rm vec}\left(C\right)\right], (20)
\mathcal{I}=\tfrac{1}{4}M^{\prime}\left[\phi C_{\otimes}^{-1}H_{n}+(\phi-1){\rm vec}(C^{-1}){\rm vec}(C^{-1})^{\prime}\right]M, (21)

respectively, with H_{n}=I_{n^{2}}+K_{n},

W=\frac{\nu+n}{\nu-2+Z^{\prime}C^{-1}Z},\qquad\phi=\frac{\nu+n}{\nu+n+2},

and

M=\partial{\rm vec}\left(C\right)/\partial\gamma^{\prime}=\left(E_{l}+E_{u}\right)^{\prime}E_{l}\left(I_{n^{2}}-\Gamma E_{d}^{\prime}\left(E_{d}\Gamma E_{d}^{\prime}\right)^{-1}E_{d}\right)\Gamma\left(E_{l}+E_{u}\right)^{\prime},

where the expression for \Gamma=\partial{\rm vec}(C)/\partial{\rm vec}\left(\log C\right)^{\prime} is presented in the appendix, see (A.1).

The expression for W shows that the impact of extreme values (outliers) is dampened by the degrees of freedom; however, this mitigation subsides as \nu\rightarrow\infty. The result for the Gaussian distribution is obtained by setting W=\phi=1, which are their limits as \nu\rightarrow\infty.

4.1.2 Score-Driven Model with Convolution-tt Distributions

Theorem 2.

Suppose that ZCT𝐦,𝛎std(0,C1/2P)Z\sim\mathrm{CT}_{\boldsymbol{m},\boldsymbol{\nu}}^{{\rm std}}(0,C^{1/2}P). Then the score vector and information matrix with respect to γ=vecl(logC)\gamma={\rm vecl}\left(\log C\right), are given by:

\displaystyle\nabla =MΩ[g=1GWgvec(PgVgU)vec(In)],\displaystyle=M^{\prime}\Omega\left[\sum_{g=1}^{G}W_{g}{\rm vec}\left(P_{g}V_{g}U^{\prime}\right)-{\rm vec}\left(I_{n}\right)\right],
\displaystyle\mathcal{I} =MΩ(Kn+ΥG)ΩM,\displaystyle=M^{\prime}\Omega\left(K_{n}+\Upsilon_{G}\right)\Omega M,

respectively, where MM is defined in Theorem 1, Ω=(InC12)(C12In)1\Omega=(I_{n}\otimes C^{-\frac{1}{2}})(C^{\frac{1}{2}}\oplus I_{n})^{-1}, and ΥG=g=1GΨg\Upsilon_{G}=\sum_{g=1}^{G}\Psi_{g} with

Ψg\displaystyle\Psi_{g} =ψg(InJg)+(ϕgψg)Jg+(ϕg1)[JgKn+vec(Jg)vec(Jg)],\displaystyle=\psi_{g}\left(I_{n}\otimes J_{g}\right)+\left(\phi_{g}-\psi_{g}\right)J_{g\otimes}+\left(\phi_{g}-1\right)\left[J_{g\otimes}K_{n}+{\rm vec}\left(J_{g}\right){\rm vec}\left(J_{g}\right)^{\prime}\right],

where Jg=PgPgJ_{g}=P_{g}P_{g}^{\prime},

Wg=νg+mgνg2+VgVg,ϕg=νg+mgνg+mg+2,ψg=ϕgνgνg2,W_{g}=\frac{\nu_{g}+m_{g}}{\nu_{g}-2+V_{g}^{\prime}V_{g}},\quad\phi_{g}=\frac{\nu_{g}+m_{g}}{\nu_{g}+m_{g}+2},\quad\psi_{g}=\phi_{g}\frac{\nu_{g}}{\nu_{g}-2},

for g=1,,Gg=1,\ldots,G.

The inverse of C12InC^{\frac{1}{2}}\oplus I_{n} (an n2×n2n^{2}\times n^{2} matrix) is available in closed form (see Appendix A) and is computationally inexpensive because it relies on an eigendecomposition of CC, which is already needed for computing Γ\Gamma in the expression of MM.

Some insight can be gained from considering the case P=I. A key component of ∇ is ∑_{g=1}^{G}(W_g P_g V_g) = (W_1V_1′, W_2V_2′, …, W_GV_G′)′, which shows that the impact that the g-th cluster, V_g, has on the score is controlled by the coefficient W_g.

4.2 Scores and Information Matrices for a Block Correlation Matrix

Next, we derive the corresponding expression for the case where CC has a block structure. For the score we have the following expression

=η=AΠA,whereA=vec(A),\nabla^{\prime}=\frac{\partial\ell}{\partial\eta^{\prime}}=\nabla_{A}^{\prime}\Pi_{A},\quad{\rm where}\quad\nabla_{A}=\frac{\partial\ell}{\partial{\rm vec}(A)},

and the expression for ΠA\Pi_{A} is given in the following Lemma.

Lemma 1.

Let ΠA=vec(A)/η\Pi_{A}=\partial{\rm vec}(A)/\partial\eta^{\prime}, then

ΠA=[ΓAΓAEd(Φ+EdΓAEd)1EdΓA]ΛnDk,\Pi_{A}=\left[\Gamma_{A}-\Gamma_{A}E_{d}^{\prime}\left(\Phi+E_{d}\Gamma_{A}E_{d}^{\prime}\right)^{-1}E_{d}\Gamma_{A}\right]\Lambda_{n\otimes}D_{k}, (22)

where Φ\Phi is a K×KK\times K diagonal matrix with Φkk=λk(nk1)\Phi_{kk}=\lambda_{k}\left(n_{k}-1\right), k=1,,Kk=1,\ldots,K, and ΓA=vec(A)/vec(logA)\Gamma_{A}=\partial{\rm vec}(A)/\partial{\rm vec}\left(\log A\right)^{\prime} has the expression given in (A.2).

Conveniently, the computation of ΠA\Pi_{A} only requires the inverse of a K×KK\times K matrix. From the results for A\nabla_{A} we have =ΠAA\nabla=\Pi_{A}^{\prime}\nabla_{A} and similarly,

=ΠAAΠA,whereA=𝔼(AA).\mathcal{I}=\Pi_{A}^{\prime}\mathcal{I}_{A}\Pi_{A},\quad{\rm where}\quad\mathcal{I}_{A}=\mathbb{E}\left(\nabla_{A}\nabla_{A}^{\prime}\right).

4.2.1 Score-Driven Model with Block Correlation and Multivariate tt-Distribution

With a block correlation structure, we define the standardized canonical variables

X=(X0,X1,,XK)=QU=D12Y,X=\left(X_{0}^{\prime},X_{1}^{\prime},\ldots,X_{K}^{\prime}\right)^{\prime}=Q^{\prime}U=D^{-\frac{1}{2}}Y,

such that X0=A12Y0X_{0}=A^{-\frac{1}{2}}Y_{0} with var(X0)=IK{\rm var}(X_{0})=I_{K} and Xk=λk12YkX_{k}=\lambda_{k}^{-\frac{1}{2}}Y_{k} with var(Xk)=Ink1{\rm var}(X_{k})=I_{n_{k}-1} for k=1,,Kk=1,\ldots,K.

Theorem 3.

Suppose that Ztν,nstd(0,C)Z\sim t_{\nu,n}^{{\rm std}}(0,C). Then the score vector and information matrix with respect to the dynamic parameters, vec(A){\rm vec}(A), are given by:

A\displaystyle\nabla_{A} =12A12[Wvec(X0X0)vec(IK)]+12EdS,\displaystyle=\tfrac{1}{2}A_{\otimes}^{-\frac{1}{2}}\left[W{\rm vec}\left(X_{0}X_{0}^{\prime}\right)-{\rm vec}\left(I_{K}\right)\right]+\tfrac{1}{2}E_{d}^{\prime}S,
A\displaystyle\mathcal{I}_{A} =14[ϕA1HK+(ϕ1)vec(A1)vec(A1)]+ϕ2EdΞEd\displaystyle=\tfrac{1}{4}\left[\phi A_{\otimes}^{-1}H_{K}+(\phi-1){\rm vec}(A^{-1}){\rm vec}(A^{-1})^{\prime}\right]+\tfrac{\phi}{2}E_{d}^{\prime}\Xi E_{d}
+1ϕ4[vec(A1)ξEd+Edξvec(A1)EdξξEd],\displaystyle\quad+\tfrac{1-\phi}{4}\left[{\rm vec}(A^{-1})\xi^{\prime}E_{d}+E_{d}^{\prime}\xi{\rm vec}(A^{-1})^{\prime}-E_{d}^{\prime}\xi\xi^{\prime}E_{d}\right],

respectively, where

ϕ=ν+nν+n+2,W=ν+nν2+X0X0+k=1KXkXk,\phi=\frac{\nu+n}{\nu+n+2},\quad W=\frac{\nu+n}{\nu-2+X_{0}^{\prime}X_{0}+\sum_{k=1}^{K}X_{k}^{\prime}X_{k}},

and SKS\in\mathbb{R}^{K}, ξK\xi\in\mathbb{R}^{K}, and the diagonal matrix, Ξ\Xi, are defined by

Sk=1λkWXkXkλk(nk1),ξk=λk1,Ξkk=λk2(nk1)1,S_{k}=\frac{1}{\lambda_{k}}-\frac{WX_{k}^{\prime}X_{k}}{\lambda_{k}\left(n_{k}-1\right)},\quad\xi_{k}=\lambda_{k}^{-1},\quad\Xi_{kk}=\lambda_{k}^{-2}\left(n_{k}-1\right)^{-1},

for k=1,,Kk=1,\ldots,K. In the special case where ZZ has a multivariate Gaussian distribution (ν=\nu=\infty, ϕ=1\phi=1), the expression for the information matrix simplifies to A=14A1HK+12EdΞEd\mathcal{I}_{A}=\tfrac{1}{4}A_{\otimes}^{-1}H_{K}+\tfrac{1}{2}E_{d}^{\prime}\Xi E_{d}.

4.2.2 Score-Driven Model with Block Correlation and Cluster-tt Distribution

Theorem 4 (Cluster-tt with Block-CC).

Suppose that ZCT𝐧,𝛎std(0,C1/2)Z\sim\mathrm{CT}_{\boldsymbol{n},\boldsymbol{\nu}}^{{\rm std}}(0,C^{1/2}) where CC has the block structure defined by 𝐧\boldsymbol{n}. Then the score vector and information matrix with respect to dynamic parameters, vec(A){\rm vec}(A), are given by:

A\displaystyle\nabla_{A} =ΩA[k=1KWkX0,kvec(ekX0)vec(IK)]+12EdS,\displaystyle=\Omega_{A}\left[\sum_{k=1}^{K}W_{k}X_{0,k}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)-{\rm vec}\left(I_{K}\right)\right]+\tfrac{1}{2}E_{d}^{\prime}S,
A\displaystyle\mathcal{I}_{A} =ΩA(KK+ΥK)ΩA+14EdΞEd+12EdΘΩA+12ΩAΘEd,\displaystyle=\Omega_{A}\left(K_{K}+\Upsilon_{K}\right)\Omega_{A}+\tfrac{1}{4}E_{d}^{\prime}\Xi E_{d}+\tfrac{1}{2}E_{d}^{\prime}\Theta\Omega_{A}+\tfrac{1}{2}\Omega_{A}\Theta^{\prime}E_{d},

respectively, where ΩA=(IKA12)(A12IK)1\Omega_{A}=(I_{K}\otimes A^{-\frac{1}{2}})(A^{\frac{1}{2}}\oplus I_{K})^{-1}, and vector eke_{k} is the kk-th column of the identity matrix IKI_{K}. The vector SKS\in\mathbb{R}^{K}, the diagonal matrix, Ξ\Xi, and Θ\Theta are defined as

Sk\displaystyle S_{k} =1λkWkXkXkλk(nk1),Wk=νk+nkνk2+X0,k2+XkXk,\displaystyle=\frac{1}{\lambda_{k}}-\frac{W_{k}X_{k}^{\prime}X_{k}}{\lambda_{k}\left(n_{k}-1\right)},\quad\quad\ W_{k}=\frac{\nu_{k}+n_{k}}{\nu_{k}-2+X_{0,k}^{2}+X_{k}^{\prime}X_{k}},
Ξkk\displaystyle\Xi_{kk} =ϕk1λk2+2ϕkλk2(nk1),Θ=k=1Kλk1(1ϕk)ekvec(Jke),\displaystyle=\frac{\phi_{k}-1}{\lambda_{k}^{2}}+\frac{2\phi_{k}}{\lambda_{k}^{2}\left(n_{k}-1\right)},\quad\Theta=\sum_{k=1}^{K}\lambda_{k}^{-1}\left(1-\phi_{k}\right)e_{k}{\rm vec}\left(J_{k}^{e}\right)^{\prime},

for k=1,,Kk=1,\ldots,K. The matrix ΥK\Upsilon_{K} is defined analogously to ΥG\Upsilon_{G} in Theorem 2.

4.2.3 Score-Driven Model with Block Correlation and Hetero-tt Distribution

Theorem 5 (Heterogeneous-Block Convolution-tt).

Suppose that ZCT𝐧,𝛎std(0,C1/2)Z\sim\mathrm{CT}_{\boldsymbol{n},\boldsymbol{\nu}}^{{\rm std}}(0,C^{1/2}), where CC has the block structure defined by 𝐧\boldsymbol{n}. Then the score vector and information matrix with respect to the dynamic parameters, vec(A){\rm vec}(A), are given by:

A\displaystyle\nabla_{A} =ΩA[k=1Ki=1nkWk,iUk,ivec(ekX0)nk12vec(IK)]+12EdS,\displaystyle=\Omega_{A}\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)n_{k}^{-\frac{1}{2}}-{\rm vec}(I_{K})\right]+\tfrac{1}{2}E_{d}^{\prime}S,
A\displaystyle\mathcal{I}_{A} =ΩA(KK+ΥKe)ΩA+14EdΞEd+12EdΘΩA+12ΩAΘEd,\displaystyle=\Omega_{A}\left(K_{K}+\Upsilon_{K}^{e}\right)\Omega_{A}+\tfrac{1}{4}E_{d}^{\prime}\Xi E_{d}+\tfrac{1}{2}E_{d}^{\prime}\Theta\Omega_{A}+\tfrac{1}{2}\Omega_{A}\Theta^{\prime}E_{d},

respectively, where

Sk=1λki=1nkWk,iUk,iFk,iUk(nk1)λk,k=1,,K,S_{k}=\frac{1}{\lambda_{k}}-\frac{\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}F_{k,i}U_{k}}{\left(n_{k}-1\right)\lambda_{k}},\qquad k=1,\ldots,K,

with

Wk,i=νk,i+1νk,i2+Uk,i2,Fk,i=e~i(Inkvnkvnk),W_{k,i}=\frac{\nu_{k,i}+1}{\nu_{k,i}-2+U_{k,i}^{2}},\qquad F_{k,i}=\tilde{e}_{i}^{\prime}\left(I_{n_{k}}-v_{n_{k}}v_{n_{k}}^{\prime}\right),

and e~i\tilde{e}_{i} is the ii-th column of identity matrix InkI_{n_{k}}. The matrix ΥKe=k=1KΨke\Upsilon_{K}^{e}=\sum_{k=1}^{K}\Psi_{k}^{e} is given by:

Ψke=nk1(3ϕ¯k2ψ¯k)Jke+ψ¯k(IKJke),\Psi_{k}^{e}=n_{k}^{-1}\left(3\bar{\phi}_{k}-2-\bar{\psi}_{k}\right)J_{k\otimes}^{e}+\bar{\psi}_{k}\left(I_{K}\otimes J_{k}^{e}\right),

where Jke=ekekJ_{k}^{e}=e_{k}e_{k}^{\prime}, and

ϕ¯k=1nki=1nkϕk,i,ψ¯k=1nki=1nkψk,i.\bar{\phi}_{k}=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\phi_{k,i},\quad\bar{\psi}_{k}=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\psi_{k,i}.

The diagonal matrix Ξ\Xi and Θ\Theta are given by:

Ξkk\displaystyle\Xi_{kk} =λk2nk1[3ϕ¯k1+(ψ¯k+1)(nk1)1],\displaystyle=\lambda_{k}^{-2}n_{k}^{-1}\left[3\bar{\phi}_{k}-1+\left(\bar{\psi}_{k}+1\right)\left(n_{k}-1\right)^{-1}\right],
Θ\displaystyle\Theta =k=1Kλk1nk1(ψ¯k+23ϕ¯k)ekvec(Jke).\displaystyle=\sum_{k=1}^{K}\lambda_{k}^{-1}n_{k}^{-1}\left(\bar{\psi}_{k}+2-3\bar{\phi}_{k}\right)e_{k}{\rm vec}\left(J_{k}^{e}\right)^{\prime}.

4.2.4 Score-Driven Model with Block Correlation and Canonical-Block-tt Distribution

Theorem 6 (Canonical-Block Convolution-tt).

Suppose that ZCT𝐦,𝛎std(0,C1/2Q)Z\sim\mathrm{CT}_{\boldsymbol{m},\boldsymbol{\nu}}^{{\rm std}}(0,C^{1/2}Q), where CC has the block structure defined by 𝐧\boldsymbol{n} and 𝐦=(K,n11,,nK1)\boldsymbol{m}=(K,n_{1}-1,\ldots,n_{K}-1)^{\prime}. Then the score vector and information matrix with respect to the dynamic parameters, vec(A){\rm vec}(A), are given by:

A\displaystyle\nabla_{A} =12A12[W0vec(X0X0)vec(IK)]+12EdS,\displaystyle=\tfrac{1}{2}A_{\otimes}^{-\frac{1}{2}}\left[W_{0}{\rm vec}(X_{0}X_{0}^{\prime})-{\rm vec}(I_{K})\right]+\tfrac{1}{2}E_{d}^{\prime}S,
A\displaystyle\mathcal{I}_{A} =14[ϕ0A1HK+(ϕ01)vec(A1)vec(A1)+EdΞEd],\displaystyle=\tfrac{1}{4}\left[\phi_{0}A_{\otimes}^{-1}H_{K}+(\phi_{0}-1){\rm vec}(A^{-1}){\rm vec}(A^{-1})^{\prime}+E_{d}^{\prime}\Xi E_{d}\right],

where the expressions for SS and diagonal matrix, Ξ\Xi, are those given in Theorem 5 with

W0\displaystyle W_{0} =ν0+Kν02+X0X0,Wk=νk+nk1νk2+XkXk,\displaystyle=\frac{\nu_{0}+K}{\nu_{0}-2+X_{0}^{\prime}X_{0}},\quad W_{k}=\frac{\nu_{k}+n_{k}-1}{\nu_{k}-2+X_{k}^{\prime}X_{k}},
ϕ0\displaystyle\phi_{0} =ν0+Kν0+K+2,ϕk=νk+nk1νk+nk+1,\displaystyle=\frac{\nu_{0}+K}{\nu_{0}+K+2},\quad\quad\phi_{k}=\frac{\nu_{k}+n_{k}-1}{\nu_{k}+n_{k}+1},

for k=1,,Kk=1,\ldots,K.

5 Some details about practical implementation

5.1 Obtaining the AA-matrix from the vector η\eta

The K×KK\times K matrix, A=var(Y0)A=\mathrm{var}(Y_{0}), plays a central role in the score models with block-correlation matrices. Below we show how AtA_{t} can be computed from ηt\eta_{t}.

In order to obtain A from η, we adopt the algorithm developed in Archakov et al., (2024, Theorem 5) for generating random block correlation matrices. The algorithm has three steps; a minimal code sketch is given after the list.

  1.

    Compute the elements of the K×KK\times K matrix, A~\tilde{A}, using

    A~k,l={c~kk(nk1) for k=l,c~klnknl for kl,\tilde{A}_{k,l}=\begin{cases}\tilde{c}_{kk}\left(n_{k}-1\right)&\text{ for }k=l,\\ \tilde{c}_{kl}\sqrt{n_{k}n_{l}}&\text{ for }k\neq l,\end{cases}

    where c~kl\tilde{c}_{kl} are elements of η\eta, as defined by the identity, η=vech(C~)\eta={\rm vech}(\tilde{C}).

  2.

    From an arbitrary starting value, y(0)Ky^{(0)}\in\mathbb{R}^{K}, e.g. a vector of zeroes, evaluate the recursion,

    yk(N+1)=yk(N)+lognklog([exp{A~+diag(y(N))}]kk+(nk1)eyk(N)c~kk),y_{k}^{(N+1)}=y_{k}^{(N)}+\log n_{k}-\log\left(\left[\exp\left\{\tilde{A}+{\rm diag}\left(y^{(N)}\right)\right\}\right]_{kk}+\left(n_{k}-1\right)e^{y_{k}^{(N)}-\tilde{c}_{kk}}\right),

    repeatedly, until convergence, and let y denote the final value. (Convergence tends to be fast because y is the fixed point of a contraction.)

  3.

    Compute A=exp(A~+diag(y))A={\rm exp}\left(\tilde{A}+{\rm diag}(y)\right).
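A minimal implementation of the three steps, assuming η is ordered as vech(C̃) (column-major) and that SciPy's matrix exponential is available; the function name and the tolerance defaults are ours.

```python
import numpy as np
from scipy.linalg import expm

def A_from_eta(eta, n_sizes, tol=1e-12, max_iter=1000):
    K = len(n_sizes)
    # Step 1: unpack eta = vech(C_tilde) and build A_tilde
    Ct = np.zeros((K, K))
    idx = 0
    for j in range(K):                      # column-major vech ordering
        for i in range(j, K):
            Ct[i, j] = Ct[j, i] = eta[idx]
            idx += 1
    n = np.asarray(n_sizes, dtype=float)
    At = Ct * np.sqrt(np.outer(n, n))       # off-diagonal: c_kl * sqrt(n_k n_l)
    np.fill_diagonal(At, np.diag(Ct) * (n - 1.0))   # diagonal: c_kk * (n_k - 1)
    # Step 2: fixed-point iteration for y
    y = np.zeros(K)
    for _ in range(max_iter):
        Ey = expm(At + np.diag(y))
        y_new = y + np.log(n) - np.log(np.diag(Ey) + (n - 1.0) * np.exp(y - np.diag(Ct)))
        if np.max(np.abs(y_new - y)) < tol:
            y = y_new
            break
        y = y_new
    # Step 3: A = exp(A_tilde + diag(y)) (matrix exponential)
    return expm(At + np.diag(y))
```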

5.2 Correlation/Moment Targeting of Dynamic Parameters

The dimension of η\eta in the score-driven model with KK groups is d=K(K+1)/2d=K\left(K+1\right)/2. For this model we adopt the following dynamic model

ηt+1=(Idβ)μ+βηt+αst,\eta_{t+1}=\left(I_{d}-\beta\right)\mu+\beta\eta_{t}+\alpha s_{t},

where β and α are diagonal matrices, so the total number of parameters to be estimated is 3K(K+1)/2 under the Gaussian specification. Specifications with t-distributions have additional degrees-of-freedom parameters.
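Schematically, the resulting filter is a simple recursion. The sketch below is ours and assumes a user-supplied function that returns the scaled score s_t at time t (for instance, built from the score and information matrix of the relevant theorem); since α and β are diagonal, they are stored as vectors of their diagonal elements.

```python
import numpy as np

def filter_eta(eta0, mu, alpha, beta, scaled_score, T):
    """Score-driven filter eta_{t+1} = (I - beta) mu + beta eta_t + alpha s_t.

    alpha and beta are the diagonals of the (diagonal) coefficient matrices;
    scaled_score(t, eta_t) is assumed to return s_t for the data at time t.
    """
    d = len(eta0)
    eta = np.empty((T + 1, d))
    eta[0] = eta0
    for t in range(T):
        s_t = scaled_score(t, eta[t])
        eta[t + 1] = (1.0 - beta) * mu + beta * eta[t] + alpha * s_t
    return eta
```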

So-called variance targeting is often used when estimating multivariate GARCH models, where the expected value of the conditional covariance matrix is estimated in an initial step. (Targeting is often found to be beneficial for prediction but can have drawbacks, e.g. for inference, see Pedersen, (2016).) This idea can also be applied to the transformed correlations, with an estimate of μ=𝔼(η_t) as the target. In the present context, it would be more appropriate to call it correlation targeting, or, more generally, moment targeting, which encompasses many variations of this method. For the initial estimation of the target, 𝔼(η_t), we follow Archakov and Hansen, (2024) and estimate the unconditional sample block-correlation matrix with

C^=QD^Q,D^=[A^000λ^1In11000λ^KInK1],\hat{C}=Q\hat{D}Q^{\prime},\quad\hat{D}=\left[\begin{array}[]{cccc}\hat{A}&0&\cdots&0\\ 0&\hat{\lambda}_{1}I_{n_{1}-1}&\ddots&\vdots\\ \vdots&\ddots&\ddots&0\\ 0&\cdots&0&\hat{\lambda}_{K}I_{n_{K}-1}\end{array}\right],

where

Y_{t}=Q^{\prime}X_{t}=\left(Y_{0,t}^{\prime},Y_{1,t}^{\prime},\ldots,Y_{K,t}^{\prime}\right)^{\prime},\quad\hat{A}=\frac{1}{T}\sum_{t=1}^{T}Y_{0,t}Y_{0,t}^{\prime},\quad\hat{\lambda}_{k}=\frac{n_{k}-\hat{A}_{kk}}{n_{k}-1}.

We then proceed to compute μ^=γ(C^)\hat{\mu}=\gamma(\hat{C}). Because γ(C)\gamma(C) is non-linear, μ^\hat{\mu} is only a first-order approximation of μ\mu, but our empirical results suggest that it is a good approximation.
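A small sketch of this targeting step, assuming the canonical variables Y_{0,t} have already been computed (a T×K array) and that the canonical transformation matrix Q is available; it returns Â, the λ̂_k's, D̂, and (optionally) Ĉ = Q D̂ Q′. The function name is ours.

```python
import numpy as np

def block_corr_target(Y0, n_sizes, Q=None):
    """Moment-targeting estimates A_hat, lambda_hat and (optionally) C_hat."""
    T, K = Y0.shape
    n = np.asarray(n_sizes, dtype=float)
    A_hat = Y0.T @ Y0 / T                        # sample second moment of Y_{0,t}
    lam_hat = (n - np.diag(A_hat)) / (n - 1.0)   # lambda_k = (n_k - A_kk)/(n_k - 1)
    n_total = int(n.sum())
    D_hat = np.zeros((n_total, n_total))
    D_hat[:K, :K] = A_hat
    pos = K
    for k in range(K):
        m = int(n[k]) - 1
        D_hat[pos:pos + m, pos:pos + m] = lam_hat[k] * np.eye(m)
        pos += m
    C_hat = Q @ D_hat @ Q.T if Q is not None else None
    return A_hat, lam_hat, D_hat, C_hat
```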

5.3 Benchmark Correlation Model: The DCC Model

The original DCC model was proposed by Engle, (2002), see also Engle and Sheppard, (2001). The original form of variance targeting could result in inconsistencies, see Aielli, (2013), who proposed a modification that resolves this issue. This model is known as the cDCC model and is given by:

Ct\displaystyle C_{t} =ΛQt1/2QtΛQt1/2,\displaystyle=\Lambda_{Q_{t}}^{-1/2}Q_{t}\Lambda_{Q_{t}}^{-1/2},

where QtQ_{t} is a symmetric positive definite matrix (whose dynamic properties are defined below) and ΛQt\Lambda_{Q_{t}} is the diagonal matrix with the same diagonal elements as QtQ_{t}. This structure ensures that CtC_{t} is a valid correlation matrix. The dynamic properties of CtC_{t} are defined from those of QtQ_{t}, which are defined by

Qt+1\displaystyle Q_{t+1} =(ιιαβ)C¯+βQt+α(ΛQt1/2ZtZtΛQt1/2),\displaystyle=\left(\iota\iota^{\prime}-\alpha-\beta\right)\odot\bar{C}+\beta\odot Q_{t}+\alpha\odot\left(\Lambda_{Q_{t}}^{1/2}Z_{t}Z_{t}^{\prime}\Lambda_{Q_{t}}^{1/2}\right), (23)

where ι is a vector of ones, Z_t is an n×1 vector of standardized return shocks, ⊙ is the Hadamard product (element-by-element multiplication), and C̄, β, and α are unknown n×n matrices. Here C̄ is the unconditional correlation matrix, which can be parametrized as μ=vecl(log C̄). Note that this model has n(n+1)/2 time-varying parameters, as defined by the unique elements of vech(Q_t). However, C_t only has n(n−1)/2 distinct correlations, so there are n redundant variables in Q_t.
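For completeness, one step of the cDCC recursion (23) and the implied correlation matrix can be coded directly; the sketch below is ours and treats α, β, and C̄ as given n×n matrices.

```python
import numpy as np

def cdcc_step(Q_t, Z_t, C_bar, alpha, beta):
    """One update of the cDCC recursion (23) and the implied correlation matrix."""
    n = len(Z_t)
    Lam_half = np.diag(np.sqrt(np.diag(Q_t)))                  # Lambda_{Q_t}^{1/2}
    outer = Lam_half @ np.outer(Z_t, Z_t) @ Lam_half
    # (iota iota' - alpha - beta) o C_bar + beta o Q_t + alpha o (Lam^{1/2} Z Z' Lam^{1/2})
    Q_next = (np.ones((n, n)) - alpha - beta) * C_bar + beta * Q_t + alpha * outer
    d = 1.0 / np.sqrt(np.diag(Q_next))
    C_next = Q_next * np.outer(d, d)                           # Lambda^{-1/2} Q Lambda^{-1/2}
    return Q_next, C_next
```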

6 Empirical Analysis

We estimate and evaluate the models using nine stocks (small universe) as well as 100 stocks (large universe). We will use industry sectors, as defined by the Global Industry Classification Standard (GICS), to form block structures in the correlation matrix and/or the heavy tail index. The ticker symbols for all 100 stocks are listed in Table 1, organized by industry sectors. The nine stocks in the small universe are highlighted with bold font.

Table 1: List of 100 stocks
Energy Materials Industrials Consumer Discretionary Consumer Staples
APA APD BA AMZN CL
BKR DD CAT EBAY COST
COP FCX EMR F CPB
CVX IP FDX HD KO
DVN SHW GD LOW MDLZ
HAL GE MCD MO
MRO HON NKE PEP
NOV LMT SBUX PG
OXY MMM TGT WBA
SLB NSC WMT
WMB UNP
XOM UPS
Healthcare Financials Information Technology Telecom. Services Utilities
ABT ALL AAPL CMCSA AEE
AMGN AXP ACN DIS AEP
BAX BAC ADBE DISH DUK
BMY BK CRM GOOGL ETR
DHR C CSCO OMC EXC
GILD COF IBM T NEE
JNJ GS INTC VZ SO
LLY JPM MSFT
MDT MET NVDA
MRK RF ORCL
PFE USB QCOM
TMO WFC TXN
UNH XRX

Note: Ticker symbols for 100 stocks that define the Large Universe, listed by sector according to their Global Industry Classification Standard (GICS) codes. The nine stocks in the Small Universe are highlighted with bold font.

The sample period spans January 3, 2005 to December 31, 2021, with a total of T=4,280 trading days. We obtained daily close-to-close returns from the CRSP daily stock files in the WRDS database.

The focus of this paper concerns the dynamic modeling of correlations, but in practice we also need to estimate the conditional variances. In our empirical analysis, we estimated each of the univariate time series of conditional variances using the EGARCH models by Nelson, (1991), where the conditional mean has an AR(1) structure, as is common in this literature. Thus, the model for the ii-th asset return on day tt, ri,tr_{i,t}, is given by:

ri,t\displaystyle r_{i,t} =κi+ϕiri,t1+hi,tzi,t,zi,t(0,1),\displaystyle=\kappa_{i}+\phi_{i}r_{i,t-1}+\sqrt{h_{i,t}}z_{i,t},\quad z_{i,t}\sim\left(0,1\right),
loghi,t+1\displaystyle\log h_{i,t+1} =ξi+θiloghi,t+τizi,t+δi|zi,t|.\displaystyle=\xi_{i}+\theta_{i}\log h_{i,t}+\tau_{i}z_{i,t}+\delta_{i}|z_{i,t}|. (24)

The parameter τ_i is related to the well-known leverage effect, whereas θ_i is tied to the degree of volatility clustering. By modeling the logarithm of conditional volatility, the estimated volatility paths are guaranteed to be positive, which in conjunction with the parametrization of the correlation matrix, C(γ), guarantees a positive definite conditional covariance matrix. At this stage of the estimation, we do not want to select a particular type of heavy-tailed distribution for z_{i,t}, so we simply estimate the EGARCH models by quasi-maximum likelihood using a Gaussian specification. From the estimated time series for h_{i,t}, we obtain the vector of standardized returns, Z_t=[z_{1,t},z_{2,t},…,z_{n,t}]′, which is common to all the multivariate models we consider below.
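A minimal filter for the AR(1)-EGARCH model (24), given a return series and parameter values (in the paper these are obtained by Gaussian QML estimation); the initialization of log h at the sample variance and the treatment of the first observation's mean are our choices.

```python
import numpy as np

def egarch_filter(r, kappa, phi, xi, theta, tau, delta, h1=None):
    """Filter (24): r_t = kappa + phi r_{t-1} + sqrt(h_t) z_t,
    log h_{t+1} = xi + theta log h_t + tau z_t + delta |z_t|.
    Returns the conditional variances h_t and standardized shocks z_t."""
    T = len(r)
    h = np.empty(T)
    z = np.empty(T)
    logh = np.log(h1) if h1 is not None else np.log(np.var(r))   # initialization (our choice)
    for t in range(T):
        h[t] = np.exp(logh)
        mean_t = kappa + phi * r[t - 1] if t > 0 else kappa / (1.0 - phi)
        z[t] = (r[t] - mean_t) / np.sqrt(h[t])
        logh = xi + theta * logh + tau * z[t] + delta * abs(z[t])  # log h_{t+1}
    return h, z
```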

Table 2: Small Universe: Sample Correlation Matrix, C^\hat{C}, and logC^\log\hat{C} (full sample)
Energy Financial Information Tech.
MRO OXY DVN BAC C JPM MSFT INTC CSCO
Energy MRO 0.758 0.855 0.152 0.149 0.180 0.139 0.137 0.133
OXY 0.790 0.709 0.152 0.153 0.199 0.117 0.153 0.163
DVN 0.814 0.775 0.145 0.153 0.126 0.125 0.145 0.136
Financial BAC 0.439 0.442 0.424 0.859 0.873 0.143 0.152 0.200
C 0.429 0.433 0.418 0.819 0.608 0.151 0.151 0.195
JPM 0.459 0.466 0.435 0.829 0.762 0.222 0.245 0.251
Info Tech. MSFT 0.372 0.367 0.361 0.422 0.412 0.467 0.494 0.455
INTC 0.386 0.392 0.381 0.435 0.422 0.484 0.576 0.426
CSCO 0.391 0.401 0.384 0.471 0.457 0.506 0.584 0.576

Note: The sample correlation matrix estimated for the nine assets (Small Universe) over the full sample period, January 3, 2005, to December 31, 2021. The elements of Ĉ are given below the diagonal and the elements of log Ĉ are given above the diagonal. The block structure is illustrated with shaded regions.

6.1 Small Universe: Dynamic Correlations for Nine Stocks

We begin by analyzing nine stocks, which we refer to as the small universe. The nine stocks are: Marathon Oil (MRO), Occidental Petroleum (OXY), and Devon Energy (DVN) from the Energy sector; Bank of America (BAC), Citigroup (C), and JPMorgan Chase & Co (JPM) from the Financial sector; and Microsoft (MSFT), Intel (INTC), and Cisco (CSCO) from the Information Technology sector. Table 2 reports the full-sample unconditional correlation matrix (lower triangle) and its logarithm (upper triangle), with the sector-based block structure illustrated by the shaded regions. Note that the estimated unconditional correlations are similar within each block. The assets within the Energy and Financial sectors are highly correlated, with an average correlation of about 0.80. Within-sector correlations for Information Technology stock returns tend to be smaller, with an average of about 0.58. The between-sector correlations tend to be smaller still and range from 0.36 to 0.51. A similar pattern is observed for the corresponding elements of the logarithm of the unconditional correlation matrix, as the matrix logarithm preserves the block structure.

Table 3: Small Universe Estimation Results: CtC_{t} Unrestricted
Score-Driven Model DCC Model
Gaussian Multiv.-tt Canon-tt Cluster-tt Hetero-tt Gaussian Multiv.-tt Canon-tt Cluster-tt Hetero-tt
μ\mu Mean 0.248 0.265 0.268 0.269 0.268 0.239 0.255 0.250 0.266 0.266
Min 0.022 0.027 0.091 0.076 0.037 0.077 0.097 0.093 0.088 0.095
Q25Q_{25} 0.127 0.124 0.124 0.123 0.129 0.106 0.116 0.116 0.117 0.129
Q50Q_{50} 0.164 0.169 0.166 0.165 0.169 0.136 0.144 0.149 0.150 0.155
Q75Q_{75} 0.227 0.327 0.307 0.325 0.306 0.240 0.331 0.261 0.317 0.305
Max 0.816 0.777 0.805 0.807 0.815 0.874 0.783 0.829 0.848 0.856
β\beta Mean 0.917 0.962 0.952 0.970 0.970 0.967 0.973 0.974 0.971 0.972
Min 0.503 0.601 0.502 0.803 0.817 0.937 0.945 0.938 0.938 0.939
Q25Q_{25} 0.881 0.976 0.951 0.966 0.964 0.964 0.972 0.973 0.967 0.968
Q50Q_{50} 0.980 0.990 0.988 0.988 0.985 0.968 0.975 0.977 0.975 0.976
Q75Q_{75} 0.995 0.996 0.994 0.994 0.994 0.972 0.979 0.980 0.978 0.978
Max 0.999 0.999 0.999 0.999 0.999 0.979 0.984 0.991 0.986 0.981
α\alpha Mean 0.019 0.014 0.016 0.015 0.016 0.016 0.013 0.011 0.012 0.012
Min 0.001 0.001 0.001 0.001 0.001 0.011 0.008 0.005 0.007 0.008
Q25Q_{25} 0.007 0.005 0.004 0.005 0.005 0.012 0.010 0.008 0.009 0.011
Q50Q_{50} 0.014 0.011 0.009 0.009 0.011 0.016 0.013 0.012 0.012 0.012
Q75Q_{75} 0.025 0.019 0.023 0.023 0.024 0.019 0.015 0.014 0.014 0.014
Max 0.071 0.069 0.071 0.057 0.063 0.028 0.021 0.016 0.022 0.019
ν0\nu_{0} 6.232 6.391 6.029 6.501
6.193 5.907
ν1\nu_{1} 5.517 6.022 5.320 5.097 5.839 5.162
4.597 4.533
4.552 4.330
ν2\nu_{2} 4.919 4.871 4.614 4.397 4.671 4.340
5.165 4.985
3.289 3.237
ν3\nu_{3} 3.714 4.175 3.807 3.315 4.046 3.740
4.023 3.910
pp 108 109 112 111 117 126 127 130 129 135
\ell -42603 -39971 -39762 -39282 -39348 -42675 -40054 -39827 -39370 -39465
m\ell_{m} -54653 -52866 -52893 -52774 -52770 -54653 -52857 -52858 -52767 -52746
c\ell_{c} 12050 12895 13131 13492 13422 11978 12803 13031 13397 13281
AIC 85422 80160 79748 78786 78930 85602 80362 79914 78998 79200
BIC 86109 80853 80461 79492 79674 86404 81170 80741 79819 80059

Note: Parameter estimates for the full sample period, January 2005 to December 2021. The Score-Driven model and DCC model are both estimated with five distributional specifications, without imposing a block structure on CtC_{t}. We report summary statistics for the estimates of μ\mu, α\alpha, and β\beta, and report all estimates of the degrees of freedom. pp is the number of parameters and we report the maximized log-likelihoods, =m+c\ell=\ell_{m}+\ell_{c}, and its two components: the log-likelihoods for the nine marginal distributions, m\ell_{m}, and the corresponding log-copula density, c\ell_{c}. We also report the AIC =2+2p=-2\ell+2p and BIC =2+plnT=-2\ell+p\ln T. Bold font is used to identify the “best performing” specification in each row among Score-Driven models and among DCC models.

We estimate three types of dynamic correlation models using five different distributions. The first is the DCC model, see (23). The second is the new score-driven model for C_t, which we introduced in Section 4.1. The third is the score-driven model for a block correlation matrix, see Section 4.2. For each of these models we consider five distributional specifications for U: the Gaussian, multivariate t, Canonical-Block-t, Cluster-t, and Hetero-t distributions. We impose a diagonal structure on the matrices α and β. In Tables 3 and 5 we report means and quantiles of the estimated parameters, μ, diag(β), and diag(α) for the score-driven models, and μ, vech(β), and vech(α) for the DCC models, i.e. the DCC models have more parameters. We let p denote the number of parameters, ℓ the full log-likelihood, and ℓ_m and ℓ_c the log-likelihoods for the marginal densities and the copula function, respectively. We also report the Akaike and Bayesian information criteria (AIC and BIC) to compare models with different numbers of parameters.

Table 3 reports the estimation results for the DCC model and the score-driven models for a general correlation matrix (Score-Full model). There are several interesting findings. First, the score-driven model outperforms the simple DCC model for all five distributional specifications. Second, the models with heavy-tailed distributions perform better than the corresponding model with a Gaussian distribution. For the score-driven models, the persistence parameter, β, is larger for the heavy-tailed specifications, because the weight W mitigates the effect of extreme values when the parameters of interest are updated. Third, introducing structured heavy tails greatly improves model performance, as indicated by higher likelihood values ℓ. That this improves the empirical fit is supported by the estimated degrees of freedom, which differ across asset groups. The Information Technology sector is estimated to have the heaviest tails, followed by the Financial and Energy sectors. Fourth, the degrees of freedom estimated with the Cluster-t distribution are larger than the corresponding group averages from the Hetero-t distribution, as we explained earlier. Fifth, the decomposition of ℓ shows that the improvements of the Canonical-Block-t relative to the multivariate t-distribution are driven entirely by the copula part. The same holds when comparing the Hetero-t and Cluster-t distributions: although the former is more flexible in fitting the marginal distributions of individual assets, this does not necessarily lead to a better dependence structure. In this dataset, the Cluster-t yields the largest copula log-likelihood, because it allows for a common χ² shock among assets within the same group.

Table 4: Small Universe Estimation Results: CtC_{t} with Block Structure
Gaussian Multiv.-tt Canon-tt Cluster-tt Hetero-tt
μ11\mu_{11} 0.663 0.666 0.687 0.697 0.689
μ12\mu_{12} 0.138 0.139 0.140 0.140 0.141
μ13\mu_{13} 0.089 0.110 0.108 0.111 0.112
μ22\mu_{22} 0.793 0.778 0.802 0.811 0.810
μ23\mu_{23} 0.169 0.188 0.179 0.178 0.185
μ33\mu_{33} 0.302 0.435 0.380 0.434 0.405
β11\beta_{11} 0.918 0.988 0.964 0.982 0.974
β12\beta_{12} 0.990 0.987 0.988 0.990 0.990
β13\beta_{13} 0.988 0.990 0.988 0.987 0.990
β22\beta_{22} 0.868 0.985 0.920 0.954 0.956
β23\beta_{23} 0.921 0.912 0.921 0.942 0.936
β33\beta_{33} 0.930 0.950 0.956 0.956 0.958
α11\alpha_{11} 0.082 0.035 0.058 0.044 0.052
α12\alpha_{12} 0.025 0.034 0.031 0.031 0.031
α13\alpha_{13} 0.025 0.029 0.030 0.033 0.030
α22\alpha_{22} 0.129 0.052 0.123 0.093 0.093
α23\alpha_{23} 0.041 0.064 0.056 0.051 0.054
α33\alpha_{33} 0.067 0.045 0.047 0.050 0.053
ν0\nu_{0} 6.323 6.476
6.496
ν1\nu_{1} 5.465 6.098 5.287
4.797
4.568
ν2\nu_{2} 4.771 4.918 4.691
4.977
3.254
ν3\nu_{3} 3.637 4.182 3.816
4.094
pp 18 19 22 21 27
\ell -42696 -40068 -39814 -39352 -39428
m\ell_{m} -54653 -52870 -52883 -52770 -52769
c\ell_{c} 11957 12803 13070 13417 13341
AIC 85428 80174 79672 78746 78910
BIC 85543 80295 79812 78880 79082

Note: Parameter estimates for the full sample period, January 2005 to December 2021. Score-Driven models with a block correlation structure and five distributional specifications are estimated. The parameter estimates are reported with subscripts that refer to within/between clusters, where Energy=1, Financial=2, and Information Tech=3. p is the number of parameters and we report the maximized log-likelihoods, ℓ=ℓ_m+ℓ_c, and its two components: the log-likelihoods for the nine marginal distributions, ℓ_m, and the corresponding log-copula density, ℓ_c. We also report the AIC =−2ℓ+2p and BIC =−2ℓ+p ln T. Bold font is used to identify the “best performing” specification in each row among the Score-Driven models.

Table 4 presents the estimation results for the score-driven models with a block correlation matrix (Score-Block model). We report all the estimated coefficients, with subscripts referring to within/between groups, where Energy=1, Financial=2, and Information Tech=3. The results are similar to those in Table 3. Comparing with Table 3, we find that although the DCC model uses a general correlation matrix, the restricted Score-Block models deliver superior performance for the last three convolution-t specifications. Compared with the Score-Full models, the Score-Block models have smaller BIC for all specifications and smaller AIC for the last three cases. Figure 2 plots the time series of correlations filtered with the Cluster-t distribution. Several heterogeneous patterns are observed: First, except for the within-sector correlations in the Financial sector, the correlations decline sharply in late 2010 and increase again in early 2011. Second, the between-sector correlations involving the Energy sector decline markedly in late 2008 and then recover.

Figure 2: Within-sector and between-sector conditional correlations implied by the estimated Score-Driven model with a sector-based cluster structure in correlations and the Convolution-tt distribution (Cluster-tt with block correlation matrix).

6.2 Large Universe: Dynamic Correlation Matrix for 100100 Assets

Next, we estimate the model with the large universe, where C_t has dimension 100×100. We use the sector classification, see Table 1, to define the block structure on C_t. Ten (of the eleven) GICS sectors are represented in the Large Universe, such that K=10, and the number of unique correlations in C_t is reduced from 4,950 to 55. We estimate the score-driven model with and without correlation targeting, see Section 5.2. With correlation targeting, the intercept, μ, is estimated first, and the remaining parameters are estimated in a second stage.

Table 5: Large Universe Estimation Results: CtC_{t} with Block Structure
Score-Block Model with Full Parametrization Score-Block Model with Correlation Targeting
Gaussian Multiv.-tt Canon-tt Cluster-tt Hetero-tt Gaussian Multiv.-tt Canon-tt Cluster-tt Hetero-tt
μ\mu Mean 0.053 0.051 0.056 0.056 0.055 0.053 0.053 0.053 0.053 0.053
Min 0.004 0.001 0.008 0.008 0.007 0.002 0.002 0.002 0.002 0.002
Q25Q_{25} 0.026 0.025 0.026 0.027 0.026 0.025 0.025 0.025 0.025 0.025
Q50Q_{50} 0.038 0.037 0.038 0.039 0.040 0.036 0.036 0.036 0.036 0.036
Q75Q_{75} 0.048 0.049 0.048 0.048 0.050 0.049 0.049 0.049 0.049 0.049
Max 0.333 0.327 0.343 0.348 0.342 0.349 0.349 0.349 0.349 0.349
β\beta Mean 0.842 0.886 0.888 0.903 0.887 0.850 0.891 0.902 0.915 0.905
Min 0.443 0.432 0.612 0.654 0.619 0.401 0.415 0.614 0.651 0.621
Q25Q_{25} 0.794 0.859 0.817 0.853 0.808 0.799 0.854 0.827 0.861 0.846
Q50Q_{50} 0.897 0.983 0.935 0.940 0.944 0.898 0.983 0.941 0.951 0.954
Q75Q_{75} 0.975 0.996 0.989 0.985 0.989 0.976 0.996 0.991 0.989 0.990
Max 0.999 0.999 0.999 0.999 0.999 0.999 1.000 0.999 0.999 0.999
α\alpha Mean 0.030 0.023 0.042 0.041 0.041 0.030 0.023 0.041 0.039 0.041
Min 0.001 0.003 0.005 0.006 0.004 0.001 0.004 0.008 0.005 0.003
Q25Q_{25} 0.013 0.007 0.015 0.014 0.016 0.013 0.007 0.013 0.015 0.015
Q50Q_{50} 0.028 0.015 0.035 0.038 0.036 0.028 0.015 0.032 0.029 0.036
Q75Q_{75} 0.045 0.029 0.057 0.058 0.055 0.045 0.028 0.058 0.055 0.055
Max 0.102 0.108 0.137 0.127 0.140 0.103 0.106 0.138 0.128 0.139
ν0\nu_{0} 10.06 12.25 10.11 12.20
ν1\nu_{1} 7.111 7.320 4.852 7.158 7.341 4.882
ν2\nu_{2} 5.778 6.012 4.723 5.489 5.812 4.635
ν3\nu_{3} 6.640 6.903 4.451 6.397 6.644 4.368
ν4\nu_{4} 5.373 5.579 3.884 5.306 5.546 3.849
ν5\nu_{5} 5.657 5.896 4.049 5.627 5.847 4.018
ν6\nu_{6} 6.024 6.263 4.070 5.983 6.183 4.021
ν7\nu_{7} 6.018 6.159 4.280 6.008 6.127 4.289
ν8\nu_{8} 5.581 5.784 3.829 5.516 5.706 3.779
ν9\nu_{9} 7.022 7.286 5.347 7.043 7.283 5.337
ν10\nu_{10} 5.693 6.007 4.366 5.658 5.945 4.334
pp 165 166 176 175 265 110 111 121 120 210
\ell -481966 -464247 -448572 -446633 -436613 -482019 -464307 -448640 -446711 -436704
m\ell_{m} -607256 -588971 -589777 -587713 -586615 -607256 -589006 -589625 -587506 -586446
c\ell_{c} 125292 124724 141204 141079 150002 125236 124699 140986 140795 149742
AIC 964262 928826 897496 893616 873756 964258 928836 897522 893662 873828
BIC 965312 929882 898616 894729 875442 964958 929542 898292 894425 875164

Note: Parameter estimates for the full sample period, January 2005 to December 2021. Score-Driven models with a block correlation structure and five distributional specifications are estimated without correlation targeting (left panel) and with correlation targeting (right panel). We report summary statistics for the estimates of μ, α, and β, and all estimates of the degrees of freedom, except for the Heterogeneous Convolution-t specifications, for which we report the average estimate within each cluster, as identified with the †-superscript. p is the number of parameters and we report the maximized log-likelihoods, ℓ=ℓ_m+ℓ_c, and its two components: the log-likelihoods for the 100 marginal distributions, ℓ_m, and the corresponding log-copula density, ℓ_c. We also report the AIC =−2ℓ+2p and BIC =−2ℓ+p ln T. Bold font is used to identify the “best performing” specification in each row for models with and without correlation targeting.

Table 5 reports the estimation results for the score-driven models with block correlation matrices. The left panel contains estimation results for models without correlation targeting, and the right panel contains the estimation results based on correlation targeting. The estimates identified with a †-superscript are the average degrees of freedom within each cluster; these are reported for the Heterogeneous Convolution-t (Hetero-t) specification, which estimates 100 degrees-of-freedom parameters. Compared with the results for the Small Universe, we note some interesting differences. First, unlike in the small universe, the model with the Hetero-t distribution now provides the best fit, and relative to the Cluster-t distribution its improvement is concentrated in the copula part. This may be due to the high level of heterogeneity across the large dataset, for which the simple GICS-based classification is too coarse. (One could estimate the group structure using the method in Oh and Patton, (2023); here we rely on the simple GICS classification to assess our score-driven model for high-dimensional systems.) Second, the models estimated with targeting perform well and have the smallest BIC across all distributional specifications.

6.3 Out-of-sample Results

We next compare the out-of-sample (OOS) performance of the different models and specifications. We estimate all models (once) using data from 2005–2014 and evaluate the estimated models with out-of-sample data spanning the years 2015–2021.

The OOS results for the Small Universe are shown in Panel A of Table 6. We decompose the predicted log-likelihood, ℓ, into the marginal component, ℓ_m, and the copula component, ℓ_c. For each of the five distributional specifications, we have highlighted the largest predicted log-likelihood, which is attained by the Score-Driven model without a block structure on C_t in all five cases. This is consistent with our in-sample results, where this model also had the largest (in-sample) log-likelihood for each of the five distributional specifications, see Tables 3 and 4. Overall, the Convolution-t distribution with a sector-based cluster structure, Cluster-t, has the largest predictive log-likelihood. We also note that the DCC model has the worst performance across all distributional specifications. In-sample, the DCC model was slightly better than the Score-Driven model with a block correlation matrix for two of the five distributions (Gaussian and multivariate t), which suggests that the DCC model suffers from overfitting.

Table 6: Out-of-sample Results
Panel A: Out-of-sample Results for 9 Assets
Gaussian Multiv.-tt Canon-tt Cluster-tt Hetero-tt
DCC Model
pp 126 127 130 129 135
\ell -17148 -15887 -15719 -15538 -15648
m\ell_{m} -22721 -21825 -21849 -21770 -21784
c\ell_{c} 5573 5938 6130 6232 6136
Score-Full Model
pp 108 109 112 111 117
\ell -17139 -15812 -15672 -15459 -15572
m\ell_{m} -22721 -21839 -21864 -21782 -21804
c\ell_{c} 5582 6027 6192 6323 6232
Score-Block Model
pp 18 19 22 21 27
\ell -17142 -15832 -15698 -15484 -15591
m\ell_{m} -22721 -21842 -21865 -21783 -21805
c\ell_{c} 5579 6010 6167 6299 6214
Panel B: Out-of-sample Results for 100 Assets
Gaussian Multiv.-tt Canon-tt Cluster-tt Hetero-tt
Score-Block Model
pp 165 166 176 175 265
\ell -202633 -192366 -184318 -183362 -179509
m\ell_{m} -251946 -242925 -243415 -242198 -241225
c\ell_{c} 49313 50559 59098 58836 61716
Score-Block Model with Correlation Targeting
pp 110 111 121 120 210
\ell -201910 -192041 -184080 -183145 -179205
m\ell_{m} -251946 -242891 -243364 -242005 -241072
c\ell_{c} 50036 50850 59284 58860 61867

Note: Out-of-sample results for the period January 2015 to December 2021. p is the number of parameters and ℓ is the predictive log-likelihood, decomposed into the marginal component, ℓ_m, and the copula component, ℓ_c. We report results for the DCC model and the score-driven models under several distributional assumptions. The highest log-likelihood in each row is highlighted in bold.

We report the OOS results for the Large Universe in Panel B of Table 6, where all model specifications employ a block structure on C_t. The empirical results favor correlation targeting, because the Score-Driven model with correlation targeting has the largest predicted log-likelihood for each of the five distributions. Across the five distributions, the Convolution-t distribution based on 100 independent t-distributions, Hetero-t, has the largest predictive log-likelihood.

7 Summary

We have introduced the Cluster GARCH model, a novel multivariate GARCH model with two types of cluster structures: one that relates to the correlation structure and one that defines nonlinear dependencies. The Cluster GARCH framework combines several useful components from the existing literature. For instance, we incorporate the block correlation structure by Engle and Kelly, (2012), the correlation parametrization by Archakov and Hansen, (2021), and the convolution-t distributions by Hansen and Tong, (2024). A convolution-t distribution is a multivariate heavy-tailed distribution with cluster structures, flexible nonlinear dependencies, and heterogeneous marginal distributions. We also adopted the score-driven framework by Creal et al., (2013) to model the dynamic variation in the correlation structure. The convolution-t distributions are well suited for score-driven models, because their density functions are sufficiently tractable to allow closed-form expressions for the key ingredients of score-driven models: the score and the Hessian. We derived detailed results for three special types of convolution-t distributions, labelled Canonical-Block-t, Cluster-t, and Hetero-t, whose score functions and Fisher information matrices are all available in closed form.

Applying the model to high-dimensional systems is possible when the block correlation structure is imposed. This was pointed out in Archakov et al., (2020), but the present paper is the first to demonstrate it empirically with n=100. This was achieved with K=10 sector-based clusters that were used to define the block structure on the correlation matrix. The block structure is advantageous for several reasons. First, it reduces the number of distinct correlations in C_t from 4,950 to 55 (from n(n−1)/2 to K(K+1)/2). Second, many likelihood computations are greatly simplified due to the canonical representation of the block correlation matrix, see Archakov and Hansen, (2024). An important implication for the dynamic model is that computations only involve inverses, determinants, and square roots of K×K matrices rather than n×n matrices.

We conducted an extensive empirical investigation of the performance of our dynamic model for correlation matrices, considering a “small universe” with n=9 assets and a “large universe” with n=100 assets. The empirical results provide strong support for the convolution-t distributions, which outperform conventional distributions both in-sample and out-of-sample. Moreover, the score-driven framework outperforms the standard DCC model in all cases (dimensions and choices of distribution). The score-driven model with a sector-based block correlation matrix has the smallest BIC.

References

  • Aielli, (2013) Aielli, G. P. (2013). Dynamic conditional correlation: on properties and estimation. Journal of Business and Economic Statistics, 31:282–299.
  • Andersen et al., (2024) Andersen, T. G., Ding, Y., and Todorov, V. (2024). The granular source of tail risk in the cross-section of asset prices. Working paper.
  • Archakov and Hansen, (2021) Archakov, I. and Hansen, P. R. (2021). A new parametrization of correlation matrices. Econometrica, 89:1699–1715.
  • Archakov and Hansen, (2024) Archakov, I. and Hansen, P. R. (2024). A canonical representation of block matrices with applications to covariance and correlation matrices. Review of Economics and Statistics, 106:1–15.
  • Archakov et al., (2020) Archakov, I., Hansen, P. R., and Lunde, A. (2020). A multivariate Realized GARCH model. arXiv:2012.02708, [econ.EM].
  • Archakov et al., (2024) Archakov, I., Hansen, P. R., and Luo, Y. (2024). A new method for generating random correlation matrices. Econometrics Journal, forthcoming.
  • Bauwens et al., (2006) Bauwens, L., Laurent, S., and Rombouts, J. V. K. (2006). Multivariate GARCH models: A survey. Journal of Applied Econometrics, 21:79–109.
  • Bollerslev, (1986) Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327.
  • Bollerslev, (1990) Bollerslev, T. (1990). Modelling the Coherence in Short-run Nominal Exchange Rates: A Multivariate Generalized ARCH Model. The Review of Economics and Statistics, 72:498–505.
  • Creal et al., (2012) Creal, D., Koopman, S. J., and Lucas, A. (2012). A dynamic multivariate heavy-tailed model for time-varying volatilities and correlations. Journal of Business and Economic Statistics, 29:552–563.
  • Creal et al., (2013) Creal, D., Koopman, S. J., and Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28:777–795.
  • Creal and Tsay, (2015) Creal, D. and Tsay, R. S. (2015). High dimensional dynamic stochastic copula models. Journal of Econometrics, 189:335–345.
  • Engle and Kelly, (2012) Engle, R. and Kelly, B. (2012). Dynamic equicorrelation. Journal of Business and Economic Statistics, 30:212–228.
  • Engle, (1982) Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica, 50:987–1007.
  • Engle, (2002) Engle, R. F. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics, 20:339–350.
  • Engle et al., (2019) Engle, R. F., Ledoit, O., and Wolf, M. (2019). Large dynamic covariance matrices. Journal of Business and Economic Statistics, 37:363–375.
  • Engle and Sheppard, (2001) Engle, R. F. and Sheppard, K. (2001). Theoretical and Empirical properties of Dynamic Conditional Correlation Multivariate GARCH. NBER Working Papers 8554, National Bureau of Economic Research, Inc.
  • Gorgi et al., (2019) Gorgi, P., Hansen, P. R., Janus, P., and Koopman, S. J. (2019). Realized Wishart-GARCH: A score-driven multi-asset volatility model. Journal of Financial Econometrics, 17:1–32.
  • Hafner and Wang, (2023) Hafner, C. M. and Wang, L. (2023). A dynamic conditional score model for the log correlation matrix. Journal of Econometrics, 237.
  • Hansen and Tong, (2024) Hansen, P. R. and Tong, C. (2024). Convolution-t distributions. arXiv:2404.00864, [econ.EM].
  • Harvey, (2013) Harvey, A. (2013). Dynamic Models for Volatility and Heavy Tails: with Applications to Financial and Economic Time Series. Cambridge University Press.
  • Hurst, (1995) Hurst, S. (1995). The characteristic function of the Student t distribution. Financial Mathematics Research Report No. FMRR006-95, Statistics Research Report No. SRR044-95.
  • Ibragimov et al., (2015) Ibragimov, M., Ibragimov, R., and Walden, J. (2015). Heavy-Tailed Distributions and Robustness in Economics and Finance. Springer.
  • Joarder, (1995) Joarder, A. H. (1995). The characteristic function of the univariate T-distribution. Dhaka University Journal of Science, 43:117–25.
  • Kotz and Nadarajah, (2004) Kotz, S. and Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge University Press.
  • Laub, (2004) Laub, A. J. (2004). Matrix analysis for scientists and engineers. SIAM.
  • Linton and McCrorie, (1995) Linton, O. and McCrorie, J. R. (1995). Differentiation of an exponential matrix function: Solution. Econometric Theory, 11:1182–1185.
  • Magnus and Neudecker, (1979) Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: some properties and applications. The Annals of Statistics, 7:381–394.
  • Nelson, (1991) Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 59:347–370.
  • Oh and Patton, (2023) Oh, D. H. and Patton, A. J. (2023). Dynamic factor copula models with estimated cluster assignments. Journal of Econometrics, 237. 105374.
  • Opschoor et al., (2017) Opschoor, A., Janus, P., Lucas, A., and van Dijk, D. (2017). New HEAVY models for fat-tailed realized covariances and returns. Journal of Business and Economic Statistics, 36:643–657.
  • Opschoor et al., (2021) Opschoor, A., Lucas, A., Barra, I., and van Dijk, D. (2021). Closed-form multi-factor copula models with observation-driven dynamic factor loadings. Journal of Business and Economic Statistics, 39:1066–1079.
  • Pakel et al., (2021) Pakel, C., Shephard, N., Sheppard, K., and Engle, R. F. (2021). Fitting vast dimensional time-varying covariance models. Journal of Business and Economic Statistics, 39:652–668.
  • Pedersen, (2016) Pedersen, R. S. (2016). Targeting estimation of CCC-GARCH models with infinite fourth moments. Econometric Theory, 32:498–531.
  • Tong and Hansen, (2023) Tong, C. and Hansen, P. R. (2023). Characterizing correlation matrices that admit a clustered factor representation. Economics Letters, 233. 111433.
  • Tse and Tsui, (2002) Tse, Y. K. and Tsui, A. K. C. (2002). A Multivariate Generalized Autoregressive Conditional Heteroscedasticity Model with Time-Varying Correlations. Journal of Business and Economic Statistics, 20:351–362.

Appendix A Proofs

Proof of Proposition 1. Let X∼t_ν^{std}(0,I_n) and consider X_α≡α′X, for some α∈ℝ^n. It follows that X_α=aY, where a=‖α‖=√(α′α) and Y is a univariate random variable with distribution Y∼t_ν^{std}(0,1). The characteristic function of the conventional Student's t-distribution with ν degrees of freedom, see Hurst, (1995) and Joarder, (1995), is given by:

φν(s)=Kν2(ν|s|)(ν|s|)12νΓ(ν2)2ν21,\varphi_{\nu}(s)=\frac{K_{\frac{\nu}{2}}(\sqrt{\nu}|s|)(\sqrt{\nu}|s|)^{\frac{1}{2}\nu}}{\Gamma\left(\frac{\nu}{2}\right)2^{\frac{\nu}{2}-1}},

where Kν2()K_{\frac{\nu}{2}}(\cdot) is the modified Bessel function of the second kind, such that the characteristic function for YY is given by,

φνstd(s)=φν(ν2νs)=Kν2(ν2|s|)(ν2|s|)12νΓ(ν2)2ν21,\varphi_{\nu}^{\mathrm{std}}(s)=\varphi_{\nu}(\sqrt{\tfrac{\nu-2}{\nu}}s)=\frac{K_{\frac{\nu}{2}}(\sqrt{\nu-2}|s|)(\sqrt{\nu-2}|s|)^{\frac{1}{2}\nu}}{\Gamma\left(\frac{\nu}{2}\right)2^{\frac{\nu}{2}-1}},

and the characteristic function for XαX_{\alpha} is simply φXα(s)=φνstd(sα)\varphi_{X_{\alpha}}(s)=\varphi_{\nu}^{\mathrm{std}}(s\left\|\alpha\right\|).

Next, the j-th element of Z=C^{1/2}U can be expressed as

Zj=ej,nC12U=g=1G(ej,nC12Pg)Vg=g=1GαjgVg,Z_{j}=e_{j,n}^{\prime}C^{\frac{1}{2}}U=\sum_{g=1}^{G}\left(e_{j,n}^{\prime}C^{\frac{1}{2}}P_{g}\right)V_{g}=\sum_{g=1}^{G}\alpha_{jg}^{\prime}V_{g},

where αjg=PgC12ej,nmg\alpha_{jg}=P_{g}^{\prime}C^{\frac{1}{2}}e_{j,n}\in\mathbb{R}^{m_{g}} and ej,ne_{j,n} is the jj-th column of identity matrix InI_{n}. From the independence of V1,,VGV_{1},\ldots,V_{G} it now follows that the characteristic function for ZjZ_{j} is given by

φZj(s)\displaystyle\varphi_{Z_{j}}(s) =g=1G𝔼(exp(isαjgVg))=g=1Gφνstd(sαjg).\displaystyle=\prod_{g=1}^{G}\mathbb{E}\left(\exp\left(is\alpha_{jg}^{\prime}V_{g}\right)\right)=\prod_{g=1}^{G}\varphi_{\nu}^{\mathrm{std}}\left(s\|\alpha_{jg}\|\right).

Finally, using the inverse Fourier transform, we can recover the probability density function and the cumulative distribution function from the characteristic function of Z_j, given by

fZj(z)=1π0Re[eiszφZj(s)]ds,f_{Z_{j}}(z)=\frac{1}{\pi}\int_{0}^{\infty}{\rm Re}\left[e^{-isz}\varphi_{Z_{j}}(s)\right]\mathrm{d}s,

and

FZj(z)=121π0Im[eiszφZj(s)]sds,F_{Z_{j}}(z)=\frac{1}{2}-\frac{1}{\pi}\int_{0}^{\infty}\frac{{\rm Im}\left[e^{-isz}\varphi_{Z_{j}}(s)\right]}{s}\mathrm{d}s,

respectively. \square
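Numerically, the density of Z_j can be recovered from the characteristic function above by one-dimensional quadrature. The sketch below is ours; it takes the norms ‖α_jg‖ and degrees of freedom ν_g as inputs and truncates the Fourier integral at a finite upper limit.

```python
import numpy as np
from scipy.special import kve, gammaln
from scipy.integrate import quad

def phi_t_std(s, nu):
    """Characteristic function of a standardized (unit-variance) t_nu variable."""
    a = np.sqrt(nu - 2.0) * abs(s)
    if a == 0.0:
        return 1.0
    # kve is the exponentially scaled Bessel K, so log K_v(a) = log kve(v, a) - a
    log_phi = np.log(kve(nu / 2.0, a)) - a + (nu / 2.0) * np.log(a) \
              - gammaln(nu / 2.0) - (nu / 2.0 - 1.0) * np.log(2.0)
    return np.exp(log_phi)

def density_Zj(z, norms, nus, s_max=50.0):
    """Density of Z_j = sum_g alpha_jg' V_g via the inverse Fourier transform above.

    norms[g] = ||alpha_jg|| and nus[g] is the degrees of freedom of group g;
    the upper limit s_max truncates the integral (a numerical approximation)."""
    def integrand(s):
        phi = np.prod([phi_t_std(s * a, nu) for a, nu in zip(norms, nus)])
        return np.real(np.exp(-1j * s * z) * phi)
    val, _ = quad(integrand, 0.0, s_max, limit=200)
    return val / np.pi
```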

A.1 Proofs of Results for Score Model (Section 4)

First some notation. Let A and B be k×k matrices; then A⊗B denotes the Kronecker product. We use A_⊗ to denote A⊗A and A⊕B for A⊗B + B⊗A, as in Creal et al., (2012). The operator vec(A) stacks the columns of the matrix A into a k²×1 column vector, while vech(A) stacks the lower triangular part (including diagonal elements) into a k*×1 column vector, where k*=k(k+1)/2. The k×k identity matrix is denoted by I_k.

From the eigendecomposition, C=QΛQC=Q\Lambda Q^{\prime}, we have from Laub, (2004, Theorem 13.16) that CI=(QQ)(ΛI)(QQ).C\oplus I=(Q\otimes Q)(\Lambda\oplus I)(Q^{\prime}\otimes Q^{\prime}). The inverse is therefore given by

(CI)1=(QQ)(Λ1I)(QQ).\left(C\oplus I\right)^{-1}=\left(Q\otimes Q\right)\left(\Lambda^{-1}\oplus I\right)\left(Q^{\prime}\otimes Q^{\prime}\right).

From Linton and McCrorie, (1995), the expression for Γ=vec(C)/vec(logC)\Gamma=\partial{\rm vec}\left(C\right)/\partial{\rm vec}\left(\log C\right)^{\prime} is

Γ=(QQ)Ξ(QQ),\Gamma=(Q\otimes Q)\varXi(Q\otimes Q)^{\prime}, (A.1)

where Q is an orthonormal matrix whose columns are the eigenvectors of log C, with eigenvalues λ_1,…,λ_n, and Ξ is an n²×n² diagonal matrix with elements δ_ij, for i,j=1,…,n,

δij=Ξ(i1)n+j,(i1)n+j={eλi, if λi=λj,eλieλjλiλj, if λiλj,\delta_{ij}=\varXi_{(i-1)n+j,(i-1)n+j}=\begin{cases}e^{\lambda_{i}},&\text{ if }\lambda_{i}=\lambda_{j},\\ \frac{e^{\lambda_{i}}-e^{\lambda_{j}}}{\lambda_{i}-\lambda_{j}},&\text{ if }\lambda_{i}\neq\lambda_{j},\end{cases}

Note that the expression for ∂vec(log C)/∂vec(C)′ is just the inverse of Γ, given by

Γ1=(QQ)Ξ1(QQ),\Gamma^{-1}=(Q\otimes Q)\varXi^{-1}(Q\otimes Q)^{\prime}, (A.2)

where Ξ1\varXi^{-1} is a n2×n2n^{2}\times n^{2} diagonal matrix with elements δij1\delta_{ij}^{-1}, for i,j=1,,ni,j=1,\ldots,n.
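A sketch of Γ in (A.1), computed from the eigendecomposition of C (the eigenvectors of C and of log C coincide, and the eigenvalues of log C are the logs of those of C); this is the gamma_matrix helper assumed in the sketch following Theorem 1, and Γ⁻¹ in (A.2) follows by inverting the diagonal entries δ_ij. The same eigendecomposition also yields (C⊕I)⁻¹ as described above.

```python
import numpy as np

def gamma_matrix(C):
    """Gamma = dvec(C)/dvec(log C)' from (A.1)."""
    eigval, Qmat = np.linalg.eigh(C)            # C = Q diag(eigval) Q'
    lam = np.log(eigval)                        # eigenvalues of log C
    L1, L2 = np.meshgrid(lam, lam, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        # delta_ij = exp(lam_i) if lam_i == lam_j, else (exp(lam_i)-exp(lam_j))/(lam_i-lam_j)
        delta = np.where(np.isclose(L1, L2), np.exp(L1),
                         (np.exp(L1) - np.exp(L2)) / (L1 - L2))
    QQ = np.kron(Qmat, Qmat)
    return QQ @ np.diag(delta.flatten()) @ QQ.T
```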

Next, we present expectations of some quantities related to the t_ν^{std}(0,I_n) distribution, which involve the following constant,

ζp,q=(ν+nν2)p2(ν22)q2Γ(ν+n2)Γ(ν2)Γ(ν+pq2)Γ(ν+p+n2).\zeta_{p,q}=\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\left(\tfrac{\nu-2}{2}\right)^{\frac{q}{2}}\frac{\Gamma(\frac{\nu+n}{2})}{\Gamma(\tfrac{\nu}{2})}\frac{\Gamma(\tfrac{\nu+p-q}{2})}{\Gamma(\tfrac{\nu+p+n}{2})}.
Lemma A.1.

Suppose that Xtνstd(0,In)X\sim t_{\nu}^{\mathrm{std}}(0,I_{n}) and define

W=ν+nν2+XX.W=\frac{\nu+n}{\nu-2+X^{\prime}X}.

(i) For any integrable function gg and any p>2νp>2-\nu, it holds that

𝔼[Wp2g(X)]=ζp,0𝔼[g(Y)],\mathbb{E}\left[W^{\frac{p}{2}}g(X)\right]=\zeta_{p,0}\mathbb{E}\left[g(Y)\right],

where Y\sim t_{\nu+p}^{\mathrm{std}}\left(0,\frac{\nu-2}{\nu+p-2}I_{n}\right).

(ii) Moreover, if gg is homogeneous of degree q<ν+pq<\nu+p, then

𝔼[Wp2g(X)]=ζp,q𝔼[g(Z)],\mathbb{E}\left[W^{\frac{p}{2}}g(X)\right]=\zeta_{p,q}\mathbb{E}\left[g(Z)\right],

where ZN(0,In)Z\sim N\left(0,I_{n}\right).

By an integrable function, g, we mean that 𝔼|g(Y)|<∞ and 𝔼|g(Z)|<∞ in parts (i) and (ii), respectively. Note that p is allowed to be negative, since 2−ν<0. Also, if p/2 is a positive integer, then

ζp,q\displaystyle\zeta_{p,q} =\displaystyle= (ν+nν2)p2(ν22)q2ν+q2ν+n2ν+q2+1ν+n2+1ν+q2+p21ν+n2+p21\displaystyle\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\left(\tfrac{\nu-2}{2}\right)^{\frac{q}{2}}\frac{\tfrac{\nu+q}{2}}{\tfrac{\nu+n}{2}}\frac{\tfrac{\nu+q}{2}+1}{\tfrac{\nu+n}{2}+1}\cdots\frac{\tfrac{\nu+q}{2}+\tfrac{p}{2}-1}{\tfrac{\nu+n}{2}+\tfrac{p}{2}-1}
=\displaystyle= (ν+nν2)p2(ν22)q2ν+qν+nν+q+2ν+n+2ν+q+p2ν+n+p2\displaystyle\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\left(\tfrac{\nu-2}{2}\right)^{\frac{q}{2}}\frac{\nu+q}{\nu+n}\frac{\nu+q+2}{\nu+n+2}\cdots\frac{\nu+q+p-2}{\nu+n+p-2}
=\displaystyle= (ν+nν2)p2(ν22)q2k=0p21ν+q+2kν+n+2k.\displaystyle\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\left(\tfrac{\nu-2}{2}\right)^{\frac{q}{2}}\prod_{k=0}^{\tfrac{p}{2}-1}\frac{\nu+q+2k}{\nu+n+2k}.

where we used Γ(x+1)=xΓ(x) repeatedly. This simplifies the terms we use to derive the Fisher information matrix in several score models:

ζ2,0=νν2,ζ2,2=1,ζ4,0=(ν+n)(ν+n+2)(ν+2)ν(ν2)2,ζ4,2=(ν+n)ν(ν+n+2)(ν2),ζ4,4=(ν+n)(ν+n+2).\begin{array}[]{rclcrcl}\zeta_{2,0}&=&\frac{\nu}{\nu-2},&&\zeta_{2,2}&=&1,\\ \zeta_{4,0}&=&\frac{(\nu+n)}{(\nu+n+2)}\frac{(\nu+2)\nu}{(\nu-2)^{2}},&&\zeta_{4,2}&=&\frac{(\nu+n)\nu}{(\nu+n+2)(\nu-2)},\\ &&&&\zeta_{4,4}&=&\frac{(\nu+n)}{(\nu+n+2)}.\end{array}

Proof of Lemma A.1. Let κν,n=Γ(ν+n2)/Γ(ν2)\kappa_{\nu,n}=\Gamma(\tfrac{\nu+n}{2})/\Gamma(\tfrac{\nu}{2}) and the density for Xtνstd(0,In)X\sim t_{\nu}^{\mathrm{std}}(0,I_{n}) is

fx(x)=κν,n[(ν2)π]n2(1+xxν2)ν+n2,f_{x}(x)=\kappa_{\nu,n}[(\nu-2)\pi]^{-\frac{n}{2}}\left(1+\tfrac{x^{\prime}x}{\nu-2}\right)^{-\frac{\nu+n}{2}},

whereas the density for Y\sim t_{\nu+p}^{\mathrm{std}}\left(0,\frac{\nu-2}{\nu+p-2}I_{n}\right) is

fy(y)\displaystyle f_{y}(y) =\displaystyle= κν+p,n[(ν+p2)π]n2(ν+p2ν2)n2(1+1ν+p2x[ν2ν+p2In]1x)ν+p+n2\displaystyle\kappa_{\nu+p,n}[(\nu+p-2)\pi]^{-\frac{n}{2}}\left(\tfrac{\nu+p-2}{\nu-2}\right)^{\frac{n}{2}}\left(1+\tfrac{1}{\nu+p-2}x^{\prime}\left[\tfrac{\nu-2}{\nu+p-2}I_{n}\right]^{-1}x\right)^{-\frac{\nu+p+n}{2}}
=\displaystyle= κν+p,n[(ν2)π]n2(1+xxν2)ν+p+n2.\displaystyle\kappa_{\nu+p,n}[(\nu-2)\pi]^{-\frac{n}{2}}\left(1+\tfrac{x^{\prime}x}{\nu-2}\right)^{-\frac{\nu+p+n}{2}}.

The expected value we seek is

𝔼[Wp2g(X)]\displaystyle\mathbb{E}\left[W^{\frac{p}{2}}g(X)\right] =\displaystyle= (ν+nν2+xx)p2g(x)κν,n[(ν2)π]n2(1+xxν2)ν+n2dx\displaystyle\int\left(\tfrac{\nu+n}{\nu-2+x^{\prime}x}\right)^{\frac{p}{2}}g(x)\kappa_{\nu,n}[(\nu-2)\pi]^{-\frac{n}{2}}\left(1+\tfrac{x^{\prime}x}{\nu-2}\right)^{-\frac{\nu+n}{2}}\mathrm{d}x
=\displaystyle= (ν+nν2)p2g(x)κν,n[(ν2)π]n2(1+xxν2)ν+p+n2dx\displaystyle\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\int g(x)\kappa_{\nu,n}[(\nu-2)\pi]^{-\frac{n}{2}}\left(1+\tfrac{x^{\prime}x}{\nu-2}\right)^{-\frac{\nu+p+n}{2}}\mathrm{d}x
=\displaystyle= (ν+nν2)p2κν,nκν+p,ng(x)fy(x)dx,\displaystyle\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\frac{\kappa_{\nu,n}}{\kappa_{\nu+p,n}}\int g(x)f_{y}(x)\mathrm{d}x,

and the result for part (i) follows, since

ζp,0=(ν+nν2)p2Γ(ν+n2)/Γ(ν2)Γ(ν+p+n2)/Γ(ν+p2)=(ν+nν2)p2κν,nκν+p,n.\zeta_{p,0}=\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\frac{\Gamma(\tfrac{\nu+n}{2})/\Gamma(\tfrac{\nu}{2})}{\Gamma(\tfrac{\nu+p+n}{2})/\Gamma(\tfrac{\nu+p}{2})}=\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\frac{\kappa_{\nu,n}}{\kappa_{\nu+p,n}}.

To prove (ii) we use that Y\sim t_{\nu+p}^{\mathrm{std}}\left(0,\frac{\nu-2}{\nu+p-2}I_{n}\right) can be expressed as Y=Z/\sqrt{\xi/(\nu-2)} where Z\sim N(0,I_{n}) and \xi is an independent \chi^{2}-distributed random variable with \nu+p degrees of freedom. Hence, Y=\psi Z, with \psi=1/\sqrt{\xi/(\nu-2)}, such that \psi^{q}=\left(\tfrac{\nu-2}{\xi}\right)^{\frac{q}{2}}. Now using part (i) and that g is homogeneous of degree q, we find

𝔼[Wp2g(X)]\displaystyle\mathbb{E}\left[W^{\frac{p}{2}}g(X)\right] =\displaystyle= ζp,0𝔼[g(Y)]=ζp,0𝔼[ψqg(Z)]\displaystyle\zeta_{p,0}\mathbb{E}\left[g(Y)\right]=\zeta_{p,0}\mathbb{E}\left[\psi^{q}g(Z)\right]
=\displaystyle= ζp,0(ν2)q2𝔼[ξq2]𝔼[g(Z)],\displaystyle\zeta_{p,0}\left(\nu-2\right)^{\frac{q}{2}}\mathbb{E}[\xi^{-\frac{q}{2}}]\mathbb{E}\left[g(Z)\right],
=\displaystyle= ζp,0(ν2)q2Γ(ν+pq2)Γ(ν+p2)(12)q2𝔼[g(Z)]\displaystyle\zeta_{p,0}\left(\nu-2\right)^{\frac{q}{2}}\frac{\Gamma(\tfrac{\nu+p-q}{2})}{\Gamma(\tfrac{\nu+p}{2})}\left(\tfrac{1}{2}\right)^{\frac{q}{2}}\mathbb{E}\left[g(Z)\right]
=\displaystyle= (ν+nν2)p2Γ(ν+n2)Γ(ν2)Γ(ν+pq2)Γ(ν+p+n2)(ν22)q2𝔼[g(Z)],\displaystyle\left(\tfrac{\nu+n}{\nu-2}\right)^{\frac{p}{2}}\frac{\Gamma(\tfrac{\nu+n}{2})}{\Gamma(\tfrac{\nu}{2})}\frac{\Gamma(\tfrac{\nu+p-q}{2})}{\Gamma(\tfrac{\nu+p+n}{2})}\left(\tfrac{\nu-2}{2}\right)^{\frac{q}{2}}\mathbb{E}\left[g(Z)\right],

where we used that ξ\xi and ZZ are independent, and Creal et al., (2012, Results 2), which states that if ξχν+p2\xi\sim\chi_{\nu+p}^{2}, then

𝔼(ξq2)=Γ(ν+pq2)Γ(ν+p2)(12)q2,forq<ν+p.\mathbb{E}\left(\xi^{-\frac{q}{2}}\right)=\frac{\Gamma(\tfrac{\nu+p-q}{2})}{\Gamma(\tfrac{\nu+p}{2})}\left(\tfrac{1}{2}\right)^{\frac{q}{2}},\qquad\text{for}\quad q<\nu+p.

This completes the proof. \square
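
Because W and the products W^{p/2}g(X) considered here are bounded, the lemma is also easy to confirm by simulation. The following Python sketch is purely illustrative (the values of \nu, n and the number of draws are arbitrary) and checks part (i) with g\equiv 1 and part (ii) with g(x)=x^{\prime}x and g(x)=(x^{\prime}x)^{2}.

```python
# Monte Carlo check of Lemma A.1: X ~ standardized t_nu(0, I_n) is generated as
# X = Z * sqrt((nu-2)/xi) with Z ~ N(0, I_n) and xi ~ chi^2_nu, so that Var(X) = I_n.
# Illustrative sketch only; nu, n, and the number of draws are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
nu, n, R = 8.0, 5, 1_000_000
Z = rng.standard_normal((R, n))
xi = rng.chisquare(nu, R)
X = Z * np.sqrt((nu - 2) / xi)[:, None]
XX = np.sum(X ** 2, axis=1)                 # X'X
W = (nu + n) / (nu - 2 + XX)

# Part (i) with g = 1:   E[W] = zeta_{2,0} = nu/(nu-2)
print(W.mean(), nu / (nu - 2))
# Part (ii), p = q = 2:  E[W X'X] = zeta_{2,2} E[Z'Z] = n
print((W * XX).mean(), n)
# Part (ii), p = q = 4:  E[W^2 (X'X)^2] = zeta_{4,4} E[(Z'Z)^2] = (nu+n)/(nu+n+2) * n(n+2)
print((W ** 2 * XX ** 2).mean(), (nu + n) / (nu + n + 2) * n * (n + 2))
```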

Proof of Theorem 1. The log-likelihood function for a vector, ZZ, with the multivariate tt-distribution, is given by

(Z)\displaystyle\ell(Z) =cν,n12log|C|ν+n2log(1+1ν2ZC1Z).\displaystyle=c_{\nu,n}-\tfrac{1}{2}\log\left|C\right|-\tfrac{\nu+n}{2}\log\left(1+\tfrac{1}{\nu-2}Z^{\prime}C^{-1}Z\right).

Defining W=\left(\nu+n\right)/\left(\nu-2+Z^{\prime}C^{-1}Z\right), we have

vec(C)\displaystyle\frac{\partial\ell}{\partial{\rm vec}(C)^{\prime}} =12vec(C1)12ν+nν2+ZC1Z(ZC1Z)vec(C1)vec(C1)vec(C)\displaystyle=-\tfrac{1}{2}{\rm vec}\left(C^{-1}\right)^{\prime}-\tfrac{1}{2}\frac{\nu+n}{\nu-2+Z^{\prime}C^{-1}Z}\frac{\partial\left(Z^{\prime}C^{-1}Z\right)}{\partial{\rm vec}(C^{-1})^{\prime}}\frac{\partial{\rm vec}(C^{-1})}{\partial{\rm vec}(C)^{\prime}}
=-\tfrac{1}{2}\left[{\rm vec}\left(C^{-1}\right)^{\prime}-W{\rm vec}\left(ZZ^{\prime}\right)^{\prime}C_{\otimes}^{-1}\right]
=12[Wvec(ZZ)vec(C)]C1,\displaystyle=\tfrac{1}{2}\left[W{\rm vec}\left(ZZ^{\prime}\right)^{\prime}-{\rm vec}\left(C\right)^{\prime}\right]C_{\otimes}^{-1},

such that the score is given by

=γ=vec(C)vec(C)γ=12[Wvec(ZZ)vec(C)]C1M.\nabla^{\prime}=\frac{\partial\ell}{\partial\gamma^{\prime}}=\frac{\partial\ell}{\partial{\rm vec}(C)^{\prime}}\frac{\partial{\rm vec}(C)^{\prime}}{\partial\gamma^{\prime}}=\tfrac{1}{2}\left[W{\rm vec}\left(ZZ^{\prime}\right)^{\prime}-{\rm vec}\left(C\right)^{\prime}\right]C_{\otimes}^{-1}M.

From Archakov and Hansen, (2021, Proposition 3) we have the expression

M=vec(C)γ=(El+Eu)El(IΓEd(EdΓEd)1Ed)Γ(El+Eu),M=\frac{\partial{\rm vec}\left(C\right)}{\partial\gamma^{\prime}}=\left(E_{l}+E_{u}\right)^{\prime}E_{l}\left(I-\Gamma E_{d}^{\prime}\left(E_{d}\Gamma E_{d}^{\prime}\right)^{-1}E_{d}\right)\Gamma\left(E_{l}+E_{u}\right)^{\prime}, (A.3)

which uses the fact that vec(C)/vecl(C)=El+Eu\partial{\rm vec}\left(C\right)/\partial{\rm vecl}\left(C\right)=E_{l}+E_{u}, where ElE_{l}, EuE_{u}, EdE_{d} are elimination matrices, and the expression Γ=vec(C)/vec(logC)\Gamma=\partial{\rm vec}\left(C\right)/\partial{\rm vec}\left(\log C\right)^{\prime} is given in (A.1).

Next we rewrite \nabla as

\displaystyle\nabla^{\prime} =12[Wvec(ZZ)vec(C)]C1M=12[Wvec(UU)vec(I)]C12M,\displaystyle=\tfrac{1}{2}\left[W{\rm vec}\left(ZZ^{\prime}\right)^{\prime}-{\rm vec}\left(C\right)^{\prime}\right]C_{\otimes}^{-1}M=\tfrac{1}{2}\left[W{\rm vec}\left(UU^{\prime}\right)^{\prime}-{\rm vec}\left(I\right)^{\prime}\right]C_{\otimes}^{-\tfrac{1}{2}}M,

where U=C12Ztνstd(0,In)U=C^{-\frac{1}{2}}Z\sim t_{\nu}^{\mathrm{std}}\left(0,I_{n}\right), such that

=𝔼[]=14MC12[𝔼(W2vec(UU)vec(UU))vec(I)vec(I)]C12M.\mathcal{I}=\mathbb{E}\left[\nabla\nabla^{\prime}\right]=\frac{1}{4}M^{\prime}C_{\otimes}^{-\tfrac{1}{2}}\left[\mathbb{E}\left(W^{2}{\rm vec}\left(UU^{\prime}\right){\rm vec}\left(UU^{\prime}\right)^{\prime}\right)-{\rm vec}\left(I\right){\rm vec}\left(I\right)^{\prime}\right]C_{\otimes}^{-\tfrac{1}{2}}M.

From Lemma A.1 with \phi=\zeta_{4,4}=(\nu+n)/(\nu+n+2), we have

𝔼[W2vec(UU)vec(UU)]=ϕ𝔼[vec(Z~Z~)vec(Z~Z~)]=ϕ[Hn+vec(I)vec(I)],\mathbb{E}\left[W^{2}{\rm vec}\left(UU^{\prime}\right){\rm vec}\left(UU^{\prime}\right)^{\prime}\right]=\phi\mathbb{E}\left[{\rm vec}\left(\tilde{Z}\tilde{Z}^{\prime}\right){\rm vec}\left(\tilde{Z}\tilde{Z}^{\prime}\right)^{\prime}\right]=\phi\left[H_{n}+{\rm vec}\left(I\right){\rm vec}\left(I\right)^{\prime}\right],

where \tilde{Z}\sim N\left(0,I_{n}\right). The expression for the last expectation follows from Magnus and Neudecker, (1979, Theorem 4.1), which states that

𝔼[vec(Z~Z~)vec(Z~Z~)]=Hn+vec(I)vec(I),\mathbb{E}\left[{\rm vec}\left(\tilde{Z}\tilde{Z}^{\prime}\right){\rm vec}\left(\tilde{Z}\tilde{Z}^{\prime}\right)^{\prime}\right]=H_{n}+{\rm vec}\left(I\right){\rm vec}\left(I\right)^{\prime},

if Z~N(0,In)\tilde{Z}\sim N\left(0,I_{n}\right), where Hn=In2+KnH_{n}=I_{n^{2}}+K_{n}, and KnK_{n} is the commutation matrix. Finally,

\displaystyle\mathcal{I} =14MC12[ϕHn+(ϕ1)vec(In)vec(In)]C12M\displaystyle=\tfrac{1}{4}M^{\prime}C_{\otimes}^{-\tfrac{1}{2}}\left[\phi H_{n}+\left(\phi-1\right){\rm vec}\left(I_{n}\right){\rm vec}\left(I_{n}\right)^{\prime}\right]C_{\otimes}^{-\tfrac{1}{2}}M
=14M[ϕC1Hn+(ϕ1)vec(C1)vec(C1)]M.\displaystyle=\tfrac{1}{4}M^{\prime}\left[\phi C_{\otimes}^{-1}H_{n}+(\phi-1){\rm vec}\left(C^{-1}\right){\rm vec}\left(C^{-1}\right)^{\prime}\right]M. (A.4)

This completes the proof. \square
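
The key moment identity used above, \mathbb{E}[W^{2}{\rm vec}(UU^{\prime}){\rm vec}(UU^{\prime})^{\prime}]=\phi[H_{n}+{\rm vec}(I){\rm vec}(I)^{\prime}], is straightforward to confirm by simulation. The sketch below is our own illustration (the values of \nu, n and the number of draws are arbitrary) and is not part of the paper's replication code.

```python
# Monte Carlo check of the moment identity
#   E[ W^2 vec(UU') vec(UU')' ] = phi * [ H_n + vec(I) vec(I)' ],  phi = (nu+n)/(nu+n+2),
# for U ~ standardized t_nu(0, I_n). Illustrative sketch; nu, n, and R are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
nu, n, R = 10.0, 3, 500_000
Z = rng.standard_normal((R, n))
xi = rng.chisquare(nu, R)
U = Z * np.sqrt((nu - 2) / xi)[:, None]
W = (nu + n) / (nu - 2 + np.sum(U ** 2, axis=1))

# Row r holds vec(U_r U_r')' (the ordering is immaterial because UU' is symmetric).
vecUU = (U[:, :, None] * U[:, None, :]).reshape(R, n * n)
lhs = (W[:, None] * vecUU).T @ (W[:, None] * vecUU) / R   # E[W^2 vec(UU') vec(UU')']

# Commutation matrix K_n and H_n = I + K_n.
K = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        K[i * n + j, j * n + i] = 1.0
H = np.eye(n * n) + K
vecI = np.eye(n).reshape(-1, order='F')[:, None]
rhs = (nu + n) / (nu + n + 2) * (H + vecI @ vecI.T)

print(np.max(np.abs(lhs - rhs)))   # should be small (Monte Carlo error)
```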

Proof of Theorem 2. For this case we have the log-likelihood function

(Z)\displaystyle\ell\left(Z\right) =log|C12|+g=1Gcgνg+mg2log(1+1νg2VgVg),\displaystyle=-\log|C^{\frac{1}{2}}|+\sum_{g=1}^{G}c_{g}-\tfrac{\nu_{g}+m_{g}}{2}\log\left(1+\tfrac{1}{\nu_{g}-2}V_{g}^{\prime}V_{g}\right),

where V_{g}=P_{g}^{\prime}U=P_{g}^{\prime}C^{-\frac{1}{2}}Z, and J_{g}=P_{g}P_{g}^{\prime}. Note that

(VgVg)vec(C12)\displaystyle\frac{\partial\left(V_{g}^{\prime}V_{g}\right)}{\partial{\rm vec}(C^{\frac{1}{2}})^{\prime}} =(VgVg)Vgvec(PgC12Z)vec(C12)vec(C12)vec(C12)\displaystyle=\frac{\partial\left(V_{g}^{\prime}V_{g}\right)}{\partial V_{g}^{\prime}}\frac{\partial{\rm vec}(P_{g}^{\prime}C^{-\frac{1}{2}}Z)}{\partial{\rm vec}(C^{-\frac{1}{2}})^{\prime}}\frac{\partial{\rm vec}(C^{-\frac{1}{2}})}{\partial{\rm vec}(C^{\frac{1}{2}})^{\prime}}
=2Vg(ZPg)C12\displaystyle=-2V_{g}^{\prime}\left(Z^{\prime}\otimes P_{g}^{\prime}\right)C_{\otimes}^{-\frac{1}{2}}
=2Vg(UPgC12)\displaystyle=-2V_{g}^{\prime}\left(U^{\prime}\otimes P_{g}^{\prime}C^{-\frac{1}{2}}\right)
=2vec(C12PgVgU).\displaystyle=-2{\rm vec}\left(C^{-\frac{1}{2}}P_{g}V_{g}U^{\prime}\right)^{\prime}.

Define Wg=(νg+mg)/(νg2+VgVg)W_{g}=\left(\nu_{g}+m_{g}\right)/\left(\nu_{g}-2+V_{g}^{\prime}V_{g}\right), then we have

vec(C12)\displaystyle\frac{\partial\ell}{\partial{\rm vec}(C^{\frac{1}{2}})} =g=1GWgvec(C12PgVgU)vec(C12)=(InC12)s,\displaystyle=\sum_{g=1}^{G}W_{g}{\rm vec}\left(C^{-\frac{1}{2}}P_{g}V_{g}U^{\prime}\right)-{\rm vec}\left(C^{-\frac{1}{2}}\right)=\left(I_{n}\otimes C^{-\frac{1}{2}}\right)\nabla_{s},

where s=g=1GWgvec(PgVgU)vec(In)\nabla_{s}=\sum_{g=1}^{G}W_{g}{\rm vec}\left(P_{g}V_{g}U^{\prime}\right)-{\rm vec}\left(I_{n}\right). So, we have the formula for the score

\displaystyle\nabla^{\prime} =γ=vec(C12)vec(C12)vec(C)vec(C)γ=sΩM\displaystyle=\frac{\partial\ell}{\partial\gamma^{\prime}}=\frac{\partial\ell}{\partial{\rm vec}(C^{\frac{1}{2}})}\frac{\partial{\rm vec}(C^{\frac{1}{2}})}{\partial{\rm vec}(C)^{\prime}}\frac{\partial{\rm vec}(C)}{\partial\gamma^{\prime}}=\nabla_{s}^{\prime}\Omega M

where the matrix MM is defined in (A.3) and Ω=(InC12)(C12In)1\Omega=(I_{n}\otimes C^{-\frac{1}{2}})(C^{\frac{1}{2}}\oplus I_{n})^{-1}, which is based on

vec(C12)vec(C)=(vec(C)vec(C12))1=(C12In)1.\frac{\partial{\rm vec}(C^{\frac{1}{2}})}{\partial{\rm vec}(C)^{\prime}}=\left(\frac{\partial{\rm vec}(C)}{\partial{\rm vec}(C^{\frac{1}{2}})^{\prime}}\right)^{-1}=\left(C^{\frac{1}{2}}\oplus I_{n}\right)^{-1}.

This proves (20). Next, the inverse of the n^{2}\times n^{2} matrix C^{\frac{1}{2}}\oplus I is available in closed form, see Appendix A, based on the eigendecomposition C^{\frac{1}{2}}=Q\Lambda^{\frac{1}{2}}Q^{\prime}. This adds no additional computational burden, because the eigendecomposition of C^{\frac{1}{2}} is available from that of \log C=Q\log\Lambda Q^{\prime}, which was needed for computing \Theta from M.
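
To illustrate the closed-form inverse, note that the Jacobian of C=C^{\frac{1}{2}}C^{\frac{1}{2}} is \partial{\rm vec}(C)/\partial{\rm vec}(C^{\frac{1}{2}})^{\prime}=C^{\frac{1}{2}}\otimes I_{n}+I_{n}\otimes C^{\frac{1}{2}}, which shares the eigenvectors Q of C, so its inverse is (Q\otimes Q)\,{\rm diag}\{(\lambda_{i}^{1/2}+\lambda_{j}^{1/2})^{-1}\}\,(Q\otimes Q)^{\prime}. The Python sketch below is our own illustration of this (interpreting the matrix above as this Jacobian) and is not taken from the paper's code.

```python
# Sketch: closed-form inverse of the Jacobian of C = C^{1/2} C^{1/2} via the
# eigendecomposition C = Q Lambda Q'. Assumption: the matrix to invert is
# J = C^{1/2} (x) I_n + I_n (x) C^{1/2}; then J^{-1} = (Q (x) Q) D^{-1} (Q (x) Q)'
# with D_{ij} = sqrt(lam_i) + sqrt(lam_j). Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)          # a generic symmetric positive definite matrix

lam, Q = np.linalg.eigh(C)            # C = Q diag(lam) Q'
Chalf = Q @ np.diag(np.sqrt(lam)) @ Q.T

J = np.kron(Chalf, np.eye(n)) + np.kron(np.eye(n), Chalf)
d = (np.sqrt(lam)[:, None] + np.sqrt(lam)[None, :]).reshape(-1)  # sqrt(lam_i)+sqrt(lam_j)
QQ = np.kron(Q, Q)
J_inv_closed = QQ @ np.diag(1.0 / d) @ QQ.T

print(np.max(np.abs(J_inv_closed - np.linalg.inv(J))))   # ~ machine precision
```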

The Information Matrix

Next we turn to the information matrix. Note that \mathcal{I}=M^{\prime}\Omega\mathbb{E}\left(\nabla_{s}\nabla_{s}^{\prime}\right)\Omega M, with \mathbb{E}\left(\nabla_{s}\nabla_{s}^{\prime}\right) given by

𝔼(ss)\displaystyle\mathbb{E}\left(\nabla_{s}\nabla_{s}^{\prime}\right) =𝔼[k=1Gl=1GWkWlvec(PkVkU)vec(PlVlU)vec(In)vec(In)]\displaystyle=\mathbb{E}\left[\sum_{k=1}^{G}\sum_{l=1}^{G}W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}U^{\prime}\right){\rm vec}\left(P_{l}V_{l}U^{\prime}\right)^{\prime}-{\rm vec}\left(I_{n}\right){\rm vec}\left(I_{n}\right)^{\prime}\right]

For later use, we define \psi_{k}=\zeta_{4,2} and \phi_{k}=\zeta_{4,4}, for k=1,\ldots,G, where the constants are obtained from Lemma A.1 with \nu=\nu_{k} and n=m_{k}, such that

ϕk=νk+mkνk+mk+2,andψk=ϕkνkνk2,\phi_{k}=\frac{\nu_{k}+m_{k}}{\nu_{k}+m_{k}+2},\quad\text{and}\quad\psi_{k}=\phi_{k}\frac{\nu_{k}}{\nu_{k}-2},

and define the function φ(k,l)\varphi\left(k,l\right) as

φ(k,l)=WkWlvec(PkVkU)vec(PlVlU).\varphi\left(k,l\right)=W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}U^{\prime}\right){\rm vec}\left(P_{l}V_{l}U^{\prime}\right)^{\prime}.

Note that we will use the following preliminary results in later analysis

U=g=1GPgVg,Jg=PgPg,g=1GJg=In.U=\sum_{g=1}^{G}P_{g}V_{g},\quad J_{g}=P_{g}P_{g}^{\prime},\quad\sum_{g=1}^{G}J_{g}=I_{n}.
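
For illustration, these identities can be verified directly when each P_{g} is taken to be the coordinate-selection block of the partition (a special case of an orthonormal partition matrix). The Python sketch below is ours and purely illustrative.

```python
# Sketch of the partition identities: with P_g holding orthonormal columns for group g,
# V_g = P_g' U, J_g = P_g P_g', sum_g P_g V_g = U and sum_g J_g = I_n.
# Here P_g simply selects the coordinates of group g (a special case); illustrative only.
import numpy as np

rng = np.random.default_rng(3)
groups = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]      # an arbitrary partition of {0,...,n-1}
n = sum(len(g) for g in groups)
U = rng.standard_normal(n)

P = [np.eye(n)[:, g] for g in groups]            # P_g is n x m_g with orthonormal columns
V = [Pg.T @ U for Pg in P]                       # V_g = P_g' U
J = [Pg @ Pg.T for Pg in P]                      # J_g = P_g P_g'

print(np.allclose(sum(Pg @ Vg for Pg, Vg in zip(P, V)), U))   # U = sum_g P_g V_g
print(np.allclose(sum(J), np.eye(n)))                          # sum_g J_g = I_n
```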

Expectation of φ(k,l)\varphi\left(k,l\right) when k=lk=l

We have the expectation for φ(k,k)\varphi\left(k,k\right) given by

𝔼[φ(k,k)]=\displaystyle\mathbb{E}\left[\varphi\left(k,k\right)\right]= 𝔼[Wk2p=1Gvec(PkVkVpPp)q=1Gvec(PkVkVqPq)]\displaystyle\mathbb{E}\left[W_{k}^{2}\sum_{p=1}^{G}{\rm vec}\left(P_{k}V_{k}V_{p}^{\prime}P_{p}^{\prime}\right)\sum_{q=1}^{G}{\rm vec}\left(P_{k}V_{k}V_{q}^{\prime}P_{q}^{\prime}\right)^{\prime}\right]
=\displaystyle= 𝔼[Wk2pkGvec(PkVkVpPp)vec(PkVkVpPp)+Wk2vec(PkVkVkPk)vec(PkVkVkPk)].\displaystyle\mathbb{E}\left[W_{k}^{2}\sum_{p\neq k}^{G}{\rm vec}\left(P_{k}V_{k}V_{p}^{\prime}P_{p}^{\prime}\right){\rm vec}\left(P_{k}V_{k}V_{p}^{\prime}P_{p}^{\prime}\right)^{\prime}+W_{k}^{2}{\rm vec}\left(P_{k}V_{k}V_{k}^{\prime}P_{k}^{\prime}\right){\rm vec}\left(P_{k}V_{k}V_{k}^{\prime}P_{k}^{\prime}\right)^{\prime}\right].

Based on Lemma A.1 with \psi_{k}=\zeta_{4,2} (the cross-products with p\neq q vanished above because the groups are independent with mean zero), we have

pkG𝔼[Wk2vec(PkVkVpPp)vec(PkVkVpPp)]\displaystyle\sum_{p\neq k}^{G}\mathbb{E}\left[W_{k}^{2}{\rm vec}\left(P_{k}V_{k}V_{p}^{\prime}P_{p}^{\prime}\right){\rm vec}\left(P_{k}V_{k}V_{p}^{\prime}P_{p}^{\prime}\right)^{\prime}\right]
=\displaystyle= pkG(PpPk)𝔼[Wk2vec(VkVp)vec(VkVp)](PpPk)\displaystyle\sum_{p\neq k}^{G}\left(P_{p}\otimes P_{k}\right)\mathbb{E}\left[W_{k}^{2}{\rm vec}\left(V_{k}V_{p}^{\prime}\right){\rm vec}\left(V_{k}V_{p}^{\prime}\right)^{\prime}\right]\left(P_{p}^{\prime}\otimes P_{k}^{\prime}\right)
=\displaystyle= ψkpkG(PpPk)(PpPk)\displaystyle\psi_{k}\sum_{p\neq k}^{G}\left(P_{p}\otimes P_{k}\right)\left(P_{p}^{\prime}\otimes P_{k}^{\prime}\right)
=\displaystyle= ψkpkG(JpJk)\displaystyle\psi_{k}\sum_{p\neq k}^{G}\left(J_{p}\otimes J_{k}\right)
=\displaystyle= ψk(InJk)Jk,\displaystyle\psi_{k}\left(I_{n}-J_{k}\right)\otimes J_{k},

and from Lemma A.1 we have \phi_{k}=\zeta_{4,4}, such that

𝔼[Wk2vec(PkVkVkPk)vec(PkVkVkPk)]\displaystyle\mathbb{E}\left[W_{k}^{2}{\rm vec}\left(P_{k}V_{k}V_{k}^{\prime}P_{k}^{\prime}\right){\rm vec}\left(P_{k}V_{k}V_{k}^{\prime}P_{k}^{\prime}\right)^{\prime}\right]
=\displaystyle= (PkPk)𝔼[Wk2vec(VkVk)vec(VkVk)](PkPk)\displaystyle\left(P_{k}\otimes P_{k}\right)\mathbb{E}\left[W_{k}^{2}{\rm vec}\left(V_{k}V_{k}^{\prime}\right){\rm vec}\left(V_{k}V_{k}^{\prime}\right)^{\prime}\right]\left(P_{k}^{\prime}\otimes P_{k}^{\prime}\right)
=\phi_{k}\left(P_{k}\otimes P_{k}\right)\left[H_{m_{k}}+{\rm vec}(I_{m_{k}}){\rm vec}(I_{m_{k}})^{\prime}\right]\left(P_{k}^{\prime}\otimes P_{k}^{\prime}\right)
=\displaystyle= ϕk[JkHn+vec(Jk)vec(Jk)].\displaystyle\phi_{k}\left[J_{k\otimes}H_{n}+{\rm vec}(J_{k}){\rm vec}(J_{k})^{\prime}\right].

Finally we arrive at the expression,

𝔼[φ(k,k)]\displaystyle\mathbb{E}\left[\varphi\left(k,k\right)\right] =ψk(InJk)Jk+ϕk[JkHn+vec(Jk)vec(Jk)].\displaystyle=\psi_{k}\left(I_{n}-J_{k}\right)\otimes J_{k}+\phi_{k}\left[J_{k\otimes}H_{n}+{\rm vec}(J_{k}){\rm vec}(J_{k})^{\prime}\right].

Expectation of φ(k,l)\varphi\left(k,l\right) when klk\neq l

When klk\neq l, we have

𝔼[φ(k,l)]=\displaystyle\mathbb{E}\left[\varphi\left(k,l\right)\right]= 𝔼[WkWlp=1Gvec(PkVkVpPp)q=1Gvec(PlVlVqPq)]\displaystyle\mathbb{E}\left[W_{k}W_{l}\sum_{p=1}^{G}{\rm vec}\left(P_{k}V_{k}V_{p}^{\prime}P_{p}^{\prime}\right)\sum_{q=1}^{G}{\rm vec}\left(P_{l}V_{l}V_{q}^{\prime}P_{q}^{\prime}\right)^{\prime}\right]
=\displaystyle= 𝔼[WkWlvec(PkVkVkPk)vec(PlVlVlPl)+WkWlvec(PkVkVlPl)vec(PlVlVkPk)].\displaystyle\mathbb{E}\left[W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}V_{k}^{\prime}P_{k}^{\prime}\right){\rm vec}\left(P_{l}V_{l}V_{l}^{\prime}P_{l}^{\prime}\right)^{\prime}+W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}V_{l}^{\prime}P_{l}^{\prime}\right){\rm vec}\left(P_{l}V_{l}V_{k}^{\prime}P_{k}^{\prime}\right)^{\prime}\right].

From Lemma A.1 with p=2p=2 and q=2q=2, we have

\mathbb{E}\left[W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}V_{k}^{\prime}P_{k}^{\prime}\right){\rm vec}\left(P_{l}V_{l}V_{l}^{\prime}P_{l}^{\prime}\right)^{\prime}\right]
=\displaystyle= vec(Pk𝔼[WkVkVk]Pk)vec(Pl𝔼[WlVlVl]Pl)\displaystyle{\rm vec}\left(P_{k}\mathbb{E}\left[W_{k}V_{k}V_{k}^{\prime}\right]P_{k}^{\prime}\right){\rm vec}\left(P_{l}\mathbb{E}\left[W_{l}V_{l}V_{l}^{\prime}\right]P_{l}^{\prime}\right)^{\prime}
=\displaystyle= vec(Jk)vec(Jl),\displaystyle{\rm vec}\left(J_{k}\right){\rm vec}\left(J_{l}\right)^{\prime},

and we also have

𝔼[WkWlvec(PkVkVlPl)vec(PlVlVkPk)]\displaystyle\mathbb{E}\left[W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}V_{l}^{\prime}P_{l}^{\prime}\right){\rm vec}\left(P_{l}V_{l}V_{k}^{\prime}P_{k}^{\prime}\right)^{\prime}\right]
=\displaystyle= 𝔼[WkWlvec(PkVkVlPl)vec(PkVkVlPl)Kn]\displaystyle\mathbb{E}\left[W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}V_{l}^{\prime}P_{l}^{\prime}\right){\rm vec}\left(P_{k}V_{k}V_{l}^{\prime}P_{l}^{\prime}\right)^{\prime}K_{n}\right]
=\displaystyle= (PlPk)𝔼[vec(V~kV~l)vec(V~kV~l)](PlPk)Kn\displaystyle\left(P_{l}\otimes P_{k}\right)\mathbb{E}\left[{\rm vec}\left(\tilde{V}_{k}\tilde{V}_{l}^{\prime}\right){\rm vec}\left(\tilde{V}_{k}\tilde{V}_{l}^{\prime}\right)^{\prime}\right]\left(P_{l}^{\prime}\otimes P_{k}^{\prime}\right)K_{n}
=\displaystyle= (PlPk)(PlPk)Kn\displaystyle\left(P_{l}\otimes P_{k}\right)\left(P_{l}^{\prime}\otimes P_{k}^{\prime}\right)K_{n}
=\displaystyle= (JlJk)Kn,\displaystyle\left(J_{l}\otimes J_{k}\right)K_{n},

where \tilde{V}_{k}\sim N(0,I_{m_{k}}) and \tilde{V}_{l}\sim N(0,I_{m_{l}}) are independent. Finally, we arrive at

𝔼[φ(k,l)]=vec(Jk)vec(Jl)+(JlJk)Kn.\mathbb{E}\left[\varphi\left(k,l\right)\right]={\rm vec}\left(J_{k}\right){\rm vec}\left(J_{l}\right)^{\prime}+\left(J_{l}\otimes J_{k}\right)K_{n}.

The Expression for 𝔼(ss)\mathbb{E}\left(\nabla_{s}\nabla_{s}^{\prime}\right)

We have the following expression,

𝔼(ss)\displaystyle\mathbb{E}\left(\nabla_{s}\nabla_{s}^{\prime}\right) =𝔼[k=1Gl=1GWkWlvec(PkVkU)vec(PlVlU)]vec(In)vec(In)\displaystyle=\mathbb{E}\left[\sum_{k=1}^{G}\sum_{l=1}^{G}W_{k}W_{l}{\rm vec}\left(P_{k}V_{k}U^{\prime}\right){\rm vec}\left(P_{l}V_{l}U^{\prime}\right)^{\prime}\right]-{\rm vec}\left(I_{n}\right){\rm vec}\left(I_{n}\right)^{\prime}
=k=1Gl=1G[vec(Jk)vec(Jl)+(JlJk)Kn]vec(In)vec(In)\displaystyle=\sum_{k=1}^{G}\sum_{l=1}^{G}\left[{\rm vec}\left(J_{k}\right){\rm vec}\left(J_{l}\right)^{\prime}+\left(J_{l}\otimes J_{k}\right)K_{n}\right]-{\rm vec}\left(I_{n}\right){\rm vec}\left(I_{n}\right)^{\prime}
\quad+\sum_{k=1}^{G}\left[\mathbb{E}\left[\varphi\left(k,k\right)\right]-{\rm vec}\left(J_{k}\right){\rm vec}\left(J_{k}\right)^{\prime}-J_{k\otimes}K_{n}\right]
=Kn+ΥG.\displaystyle=K_{n}+\Upsilon_{G}.

where ΥG=k=1GΨk\Upsilon_{G}=\sum_{k=1}^{G}\Psi_{k} with Ψk\Psi_{k} given by

Ψk\displaystyle\Psi_{k} =ψk(InJk)Jk+ϕk[JkHn+vec(Jk)vec(Jk)]vec(Jk)vec(Jk)JkKn\displaystyle=\psi_{k}\left(I_{n}-J_{k}\right)\otimes J_{k}+\phi_{k}\left[J_{k\otimes}H_{n}+{\rm vec}\left(J_{k}\right){\rm vec}\left(J_{k}\right)^{\prime}\right]-{\rm vec}\left(J_{k}\right){\rm vec}\left(J_{k}\right)^{\prime}-J_{k\otimes}K_{n}
=ψk(InJk)Jk+ϕkJk+(ϕk1)[JkKn+vec(Jk)vec(Jk)]\displaystyle=\psi_{k}\left(I_{n}-J_{k}\right)\otimes J_{k}+\phi_{k}J_{k\otimes}+\left(\phi_{k}-1\right)\left[J_{k\otimes}K_{n}+{\rm vec}\left(J_{k}\right){\rm vec}\left(J_{k}\right)^{\prime}\right]
=ψk(InJk)+(ϕkψk)Jk+(ϕk1)[JkKn+vec(Jk)vec(Jk)].\displaystyle=\psi_{k}\left(I_{n}\otimes J_{k}\right)+\left(\phi_{k}-\psi_{k}\right)J_{k\otimes}+\left(\phi_{k}-1\right)\left[J_{k\otimes}K_{n}+{\rm vec}\left(J_{k}\right){\rm vec}\left(J_{k}\right)^{\prime}\right].

So, the final formula for the information matrix \mathcal{I} is given by

\displaystyle\mathcal{I} =MΩ𝔼(ss)ΩM=MΩ(Kn+ΥG)ΩM,\displaystyle=M^{\prime}\Omega\mathbb{E}\left(\nabla_{s}\nabla_{s}^{\prime}\right)\Omega M=M^{\prime}\Omega\left(K_{n}+\Upsilon_{G}\right)\Omega M,

as stated in (21). This completes the proof. \square

A.2 Block Correlation Matrix with Multivariate tt-Distribution

Next, we prove the results in Section 4.2. For later use, we define the following variables:

QY=Y0A1Y0+k=1Kλk1YkYk,W=ν+nν2+QY,A=vec(A),ΠA=vec(A)vec(W).Q_{Y}=Y_{0}^{\prime}A^{-1}Y_{0}+\sum_{k=1}^{K}\lambda_{k}^{-1}Y_{k}^{\prime}Y_{k},\quad W=\frac{\nu+n}{\nu-2+Q_{Y}},\quad\nabla_{A}=\frac{\partial\ell}{\partial{\rm vec}\left(A\right)},\quad\Pi_{A}=\frac{\partial{\rm vec}\left(A\right)}{\partial{\rm vec}\left(W\right)^{\prime}}.

By (6), the log-likelihood function takes the form

\ell(Z)=c-\tfrac{1}{2}\log|A|-\frac{1}{2}\sum_{k=1}^{K}\left(n_{k}-1\right)\log\lambda_{k}-\tfrac{\nu+n}{2}\log\left(1+\tfrac{1}{\nu-2}Q_{Y}\right).

Because C~=Λn1WΛn1\tilde{C}=\Lambda_{n}^{-1}W\Lambda_{n}^{-1}, we have

η=vech(C~)=LKΛn1vec(W),\eta={\rm vech}(\tilde{C})=L_{K}\Lambda_{n\otimes}^{-1}{\rm vec}\left(W\right),

and the score is given by

=η\displaystyle\nabla^{\prime}=\frac{\partial\ell}{\partial\eta^{\prime}} =vec(A)=Avec(A)vec(W)vec(W)η=ΠA.\displaystyle=\underset{=\nabla_{A}^{\prime}}{\underbrace{\frac{\partial\ell}{\partial{\rm vec}\left(A\right)^{\prime}}}}\underset{=\Pi_{A}}{\underbrace{\frac{\partial{\rm vec}\left(A\right)}{\partial{\rm vec}\left(W\right)^{\prime}}\frac{\partial{\rm vec}\left(W\right)}{\partial\eta^{\prime}}}}.

Proof of Lemma 1. We have ΠA=vec(A)vec(W)vec(W)η\Pi_{A}=\frac{\partial{\rm vec}(A)}{\partial{\rm vec}(W)^{\prime}}\frac{\partial{\rm vec}\left(W\right)}{\partial\eta^{\prime}}, where vec(W)η=ΛnDK\frac{\partial{\rm vec}\left(W\right)}{\partial\eta^{\prime}}=\Lambda_{n\otimes}D_{K} and

vec(W)vec(A)=vec(logA)vec(A)vec(logΛλ)vec(A)=vec(logA)vec(A)Eddiag(logΛλ)diag(A)Ed,\frac{\partial{\rm vec}\left(W\right)}{\partial{\rm vec}\left(A\right)^{\prime}}=\frac{\partial{\rm vec}\left(\log A\right)}{\partial{\rm vec}\left(A\right)^{\prime}}-\frac{\partial{\rm vec}\left(\log\Lambda_{\lambda}\right)}{\partial{\rm vec}\left(A\right)^{\prime}}=\frac{\partial{\rm vec}\left(\log A\right)}{\partial{\rm vec}\left(A\right)^{\prime}}-E_{d}^{\prime}\frac{\partial{\rm diag}\left(\log\Lambda_{\lambda}\right)}{\partial{\rm diag}\left(A\right)^{\prime}}E_{d},

where the matrix \tilde{\Phi}=-\partial{\rm diag}\left(\log\Lambda_{\lambda}\right)/\partial{\rm diag}\left(A\right)^{\prime} is diagonal with diagonal elements \left(n_{k}-A_{kk}\right)^{-1}=\lambda_{k}^{-1}\left(n_{k}-1\right)^{-1} for k=1,\ldots,K. The formula for \partial{\rm vec}(\log A)/\partial{\rm vec}(A)^{\prime} is given by \Gamma_{A}^{-1} in (A.2). So, we have

ΠA=vec(A)vec(W)ΛnDK=(vec(W)vec(A))1ΛnDK.\Pi_{A}=\frac{\partial{\rm vec}\left(A\right)}{\partial{\rm vec}\left(W\right)^{\prime}}\Lambda_{n\otimes}D_{K}=\left(\frac{\partial{\rm vec}\left(W\right)}{\partial{\rm vec}\left(A\right)^{\prime}}\right)^{-1}\Lambda_{n\otimes}D_{K}.

Using the Woodbury formula, we simplify the inverse of the K2×K2K^{2}\times K^{2} matrix,

(vec(W)vec(A))1=(ΓA1+EdΦ~Ed)1=ΓAΓAEd(Φ~1+EdΓAEd)1EdΓA,\left(\frac{\partial{\rm vec}\left(W\right)}{\partial{\rm vec}\left(A\right)^{\prime}}\right)^{-1}=\left(\Gamma_{A}^{-1}+E_{d}^{\prime}\tilde{\Phi}E_{d}\right)^{-1}=\Gamma_{A}-\Gamma_{A}E_{d}^{\prime}\left(\tilde{\Phi}^{-1}+E_{d}\Gamma_{A}E_{d}^{\prime}\right)^{-1}E_{d}\Gamma_{A},

which only requires inverting the low-dimensional K\times K matrix \tilde{\Phi}^{-1}+E_{d}\Gamma_{A}E_{d}^{\prime}. Moreover, because \tilde{\Phi} is a diagonal matrix with elements \tilde{\Phi}_{kk}=\lambda_{k}^{-1}\left(n_{k}-1\right)^{-1}, we define the diagonal matrix \Phi with diagonal elements \Phi_{kk}=\lambda_{k}\left(n_{k}-1\right), such that \Phi=\tilde{\Phi}^{-1}. This proves (22) and completes the proof of Lemma 1. \square
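
The Woodbury step can also be verified numerically. In the Python sketch below (ours and purely illustrative), a random symmetric positive definite matrix stands in for \Gamma_{A} and a positive diagonal matrix for \tilde{\Phi}; E_{d} is the K\times K^{2} selection matrix with E_{d}{\rm vec}(A)={\rm diag}(A).

```python
# Numerical check of the Woodbury step:
#   (Gamma_A^{-1} + E_d' PhiTilde E_d)^{-1}
#     = Gamma_A - Gamma_A E_d' (PhiTilde^{-1} + E_d Gamma_A E_d')^{-1} E_d Gamma_A,
# which only requires inverting a K x K matrix. Random stand-ins are used for
# Gamma_A (symmetric positive definite) and PhiTilde (positive diagonal); illustrative only.
import numpy as np

rng = np.random.default_rng(4)
K = 4
B = rng.standard_normal((K * K, K * K))
Gamma_A = B @ B.T + K * np.eye(K * K)            # stand-in for Gamma_A
PhiTilde = np.diag(rng.uniform(0.5, 2.0, K))     # stand-in for the diagonal matrix

# E_d: K x K^2 selection matrix with E_d vec(A) = diag(A) (column-major vec).
E_d = np.zeros((K, K * K))
for k in range(K):
    E_d[k, k * K + k] = 1.0

lhs = np.linalg.inv(np.linalg.inv(Gamma_A) + E_d.T @ PhiTilde @ E_d)
small = np.linalg.inv(np.linalg.inv(PhiTilde) + E_d @ Gamma_A @ E_d.T)
rhs = Gamma_A - Gamma_A @ E_d.T @ small @ E_d @ Gamma_A

print(np.max(np.abs(lhs - rhs)))   # ~ machine precision
```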

Proof of Theorem 3. The expression for A\nabla_{A} is given by,

A=12vec(A1)12k=1Knk1λkλkvec(A)12WQYvec(A),\nabla_{A}^{\prime}=-\tfrac{1}{2}{\rm vec}\left(A^{-1}\right)^{\prime}-\tfrac{1}{2}\sum_{k=1}^{K}\frac{n_{k}-1}{\lambda_{k}}\frac{\partial\lambda_{k}}{\partial{\rm vec}\left(A\right)^{\prime}}-\tfrac{1}{2}W\frac{\partial Q_{Y}}{\partial{\rm vec}(A)^{\prime}},

with

λkvec(A)=λkdiag(A)diag(A)vec(A)=(1nk)1ek,KEd,\frac{\partial\lambda_{k}}{\partial{\rm vec}\left(A\right)^{\prime}}=\frac{\partial\lambda_{k}}{\partial{\rm diag}\left(A\right)^{\prime}}\frac{\partial{\rm diag}\left(A\right)}{\partial{\rm vec}\left(A\right)^{\prime}}=\left(1-n_{k}\right)^{-1}e_{k,K}^{\prime}E_{d},

where ek,Ke_{k,K} is the kk-th column of the K×KK\times K identity matrix IKI_{K}. Then we obtain

QYvec(A)=vec(Y0Y0)A1k=1Kλk2(1nk)1(YkYk)ek,KEd,\frac{\partial Q_{Y}}{\partial{\rm vec}(A)^{\prime}}=-{\rm vec}\left(Y_{0}Y_{0}^{\prime}\right)^{\prime}A_{\otimes}^{-1}-\sum_{k=1}^{K}\lambda_{k}^{-2}\left(1-n_{k}\right)^{-1}\left(Y_{k}^{\prime}Y_{k}\right)e_{k,K}^{\prime}E_{d},

which leads to

A\displaystyle\nabla_{A}^{\prime} =12[Wvec(Y0Y0)vec(A)]A1+12SEd\displaystyle=\tfrac{1}{2}\left[W{\rm vec}\left(Y_{0}Y_{0}^{\prime}\right)^{\prime}-{\rm vec}\left(A\right)^{\prime}\right]A_{\otimes}^{-1}+\tfrac{1}{2}S^{\prime}E_{d}
=12[Wvec(X0X0)vec(IK)]A12+12SEd,\displaystyle=\tfrac{1}{2}\left[W{\rm vec}\left(X_{0}X_{0}^{\prime}\right)^{\prime}-{\rm vec}\left(I_{K}\right)^{\prime}\right]A_{\otimes}^{-\frac{1}{2}}+\tfrac{1}{2}S^{\prime}E_{d}, (A.5)

where SS is a K×1K\times 1 vector defined by

S=\sum_{k=1}^{K}\left(\lambda_{k}^{-1}-W\lambda_{k}^{-1}\left(n_{k}-1\right)^{-1}X_{k}^{\prime}X_{k}\right)e_{k,K},\quad{\rm with}\quad S_{k}=\frac{\left(n_{k}-1\right)-WX_{k}^{\prime}X_{k}}{\lambda_{k}\left(n_{k}-1\right)}.

The Information Matrix

First, from the formula for the score, the information matrix takes the form

=ΠAAΠA.\mathcal{I}=\Pi_{A}^{\prime}\mathcal{I}_{A}\Pi_{A}.

So we need to compute the matrix \mathcal{I}_{A}=\mathbb{E}\left(\nabla_{A}\nabla_{A}^{\prime}\right). From (A.5), the first term is a function of X_{0} and the second term is a function of X_{k}^{\prime}X_{k}, k=1,\ldots,K. So, for the first term, we have

A(1)\displaystyle\nabla_{A}^{(1)} \displaystyle\equiv 12A12[Wvec(X0X0)vec(IK)]\displaystyle\tfrac{1}{2}A_{\otimes}^{-\frac{1}{2}}\left[W{\rm vec}\left(X_{0}X_{0}^{\prime}\right)-{\rm vec}\left(I_{K}\right)\right]
A(1)\displaystyle\mathcal{I}_{A}^{(1)} =\displaystyle= 14A1/2(𝔼[W2vec(X0X0)vec(X0X0)]vec(IK)vec(IK))A1/2.\displaystyle\tfrac{1}{4}A_{\otimes}^{-1/2}\left(\mathbb{E}\left[W^{2}{\rm vec}\left(X_{0}X_{0}^{\prime}\right){\rm vec}\left(X_{0}X_{0}^{\prime}\right)^{\prime}\right]-{\rm vec}\left(I_{K}\right){\rm vec}\left(I_{K}\right)^{\prime}\right)A_{\otimes}^{-1/2}.

Similar to (A.4), we have

A(1)\displaystyle\mathcal{I}_{A}^{(1)} =14[ϕA1HK+(ϕ1)vec(A1)vec(A1)].\displaystyle=\tfrac{1}{4}\left[\phi A_{\otimes}^{-1}H_{K}+\left(\phi-1\right){\rm vec}\left(A^{-1}\right){\rm vec}\left(A^{-1}\right)^{\prime}\right].

For the second term, we first define

A(2)\displaystyle\nabla_{A}^{(2)} 12EdS=12EdΛλS¯,A(2)=14EdΛλ𝔼(S¯S¯)ΛλEd,\displaystyle\equiv\tfrac{1}{2}E_{d}^{\prime}S=\tfrac{1}{2}E_{d}^{\prime}\Lambda_{\lambda}\bar{S},\quad\mathcal{I}_{A}^{(2)}=\tfrac{1}{4}E_{d}^{\prime}\Lambda_{\lambda}\mathbb{E}\left(\bar{S}\bar{S}^{\prime}\right)\Lambda_{\lambda}E_{d},

where the elements of the vector \bar{S} and the diagonal matrix \Lambda_{\lambda} are defined by \bar{S}_{k}=WX_{k}^{\prime}X_{k}-\left(n_{k}-1\right) and \left[\Lambda_{\lambda}\right]_{kk}=\left[-\lambda_{k}\left(n_{k}-1\right)\right]^{-1}. We know that

𝔼[W2(XkXk)(XlXl)]\displaystyle\mathbb{E}\left[W^{2}\left(X_{k}^{\prime}X_{k}\right)\left(X_{l}^{\prime}X_{l}\right)\right] =ϕ𝔼[(Z~kZ~k)(Z~lZ~l)]={ϕ(nk1)(nl1)kl,ϕ[(nk1)2+2(nk1)]k=l,\displaystyle=\phi\mathbb{E}\left[\left(\tilde{Z}_{k}^{\prime}\tilde{Z}_{k}\right)\left(\tilde{Z}_{l}^{\prime}\tilde{Z}_{l}\right)\right]=\begin{cases}\phi\left(n_{k}-1\right)\left(n_{l}-1\right)&k\neq l,\\ \phi\left[\left(n_{k}-1\right)^{2}+2\left(n_{k}-1\right)\right]&k=l,\end{cases}

where \tilde{Z}_{k}\sim N\left(0,I_{n_{k}-1}\right). So \mathbb{E}\left(\bar{S}_{k}\bar{S}_{l}\right)=\mathbb{E}\left[W^{2}\left(X_{k}^{\prime}X_{k}\right)\left(X_{l}^{\prime}X_{l}\right)\right]-\left(n_{k}-1\right)\left(n_{l}-1\right) is given by

𝔼(S¯kS¯l)={(ϕ1)(nk1)(nl1)kl,(ϕ1)(nk1)2+2ϕ(nk1)k=l,\mathbb{E}\left(\bar{S}_{k}\bar{S}_{l}^{\prime}\right)=\begin{cases}\left(\phi-1\right)\left(n_{k}-1\right)\left(n_{l}-1\right)&k\neq l,\\ \left(\phi-1\right)\left(n_{k}-1\right)^{2}+2\phi\left(n_{k}-1\right)&k=l,\end{cases}

and along with the following K×1K\times 1 vector ξ\xi and diagonal matrix Ξ\Xi, we have

Λλ𝔼(S¯S¯)Λλ=(ϕ1)ξξ+2ϕΞ,ξk=λk1,Ξkk=λk2(nk1)1.\Lambda_{\lambda}\mathbb{E}\left(\bar{S}\bar{S}^{\prime}\right)\Lambda_{\lambda}=\left(\phi-1\right)\xi\xi^{\prime}+2\phi\Xi,\quad\xi_{k}=\lambda_{k}^{-1},\quad\Xi_{kk}=\lambda_{k}^{-2}\left(n_{k}-1\right)^{-1}.

So,

A(2)\displaystyle\mathcal{I}_{A}^{(2)} =14Ed[(ϕ1)ξξ+2ϕΞ]Ed=ϕ14EdξξEd+ϕ2EdΞEd,\displaystyle=\tfrac{1}{4}E_{d}^{\prime}\left[\left(\phi-1\right)\xi\xi^{\prime}+2\phi\Xi\right]E_{d}=\tfrac{\phi-1}{4}E_{d}^{\prime}\xi\xi^{\prime}E_{d}+\tfrac{\phi}{2}E_{d}^{\prime}\Xi E_{d},

and we also have

\mathbb{E}\left[\left(W{\rm vec}\left(X_{0}X_{0}^{\prime}\right)-{\rm vec}(I_{K})\right)\bar{S}_{k}\right] =\mathbb{E}\left[W^{2}{\rm vec}\left(X_{0}X_{0}^{\prime}\right)\left(X_{k}^{\prime}X_{k}\right)\right]-\left(n_{k}-1\right){\rm vec}(I_{K})
=(ϕ1)(nk1)vec(IK).\displaystyle=\left(\phi-1\right)\left(n_{k}-1\right){\rm vec}\left(I_{K}\right).

Hence,

\mathcal{I}_{A}^{(12)}=\mathbb{E}\left(\nabla_{A}^{(1)}\nabla_{A}^{(2)\prime}\right)=-\tfrac{1}{4}A_{\otimes}^{-1/2}\left(\phi-1\right){\rm vec}\left(I_{K}\right)\xi^{\prime}E_{d}=\tfrac{1-\phi}{4}{\rm vec}\left(A^{-1}\right)\xi^{\prime}E_{d}.

Finally, we have

A=A(1)+A(2)+A(12)+A(21),\mathcal{I}_{A}=\mathcal{I}_{A}^{(1)}+\mathcal{I}_{A}^{(2)}+\mathcal{I}_{A}^{(12)}+\mathcal{I}_{A}^{(21)},

which gives the expression in Theorem 3. Finally, in the limiting case \nu\rightarrow\infty, which corresponds to the multivariate normal distribution, we have \phi\rightarrow 1, and the information matrix simplifies to \mathcal{I}_{A}=\tfrac{1}{4}A_{\otimes}^{-1}H_{K}+\tfrac{1}{2}E_{d}^{\prime}\Xi E_{d}. This completes the proof. \square

Proof of Theorem 4. For this case we have

A=12[W0vec(X0X0)vec(IK)]A12+12SEd,\nabla_{A}^{\prime}=\tfrac{1}{2}\left[W_{0}{\rm vec}\left(X_{0}X_{0}^{\prime}\right)^{\prime}-{\rm vec}\left(I_{K}\right)^{\prime}\right]A_{\otimes}^{-\frac{1}{2}}+\tfrac{1}{2}S^{\prime}E_{d},

where W0W_{0} and SKS\in\mathbb{R}^{K} are given by

W0=ν0+Kν02+X0X0,Sk=(nk1)WkXkXkλk(nk1),withWk=νk+nk1νk2+XkXk.W_{0}=\frac{\nu_{0}+K}{\nu_{0}-2+X_{0}^{\prime}X_{0}},\quad S_{k}=\frac{(n_{k}-1)-W_{k}X_{k}^{\prime}X_{k}}{\lambda_{k}(n_{k}-1)},\quad\text{with}\quad W_{k}=\frac{\nu_{k}+n_{k}-1}{\nu_{k}-2+X_{k}^{\prime}X_{k}}.

The covariance of the first part was derived in Theorem 3 and is given by

\mathcal{I}_{A}^{(1)}=\tfrac{1}{4}\left(\phi_{0}A_{\otimes}^{-1}H_{K}+\left(\phi_{0}-1\right){\rm vec}\left(A^{-1}\right){\rm vec}\left(A^{-1}\right)^{\prime}\right),\quad\phi_{0}=\frac{\nu_{0}+K}{\nu_{0}+K+2}.

For the second part, we have 𝔼(S¯kS¯l)=0\mathbb{E}\left(\bar{S}_{k}\bar{S}_{l}\right)=0 for klk\neq l, and

𝔼(S¯k2)=(ϕk1)(nk1)2+2ϕk(nk1),\mathbb{E}\left(\bar{S}_{k}^{2}\right)=\left(\phi_{k}-1\right)\left(n_{k}-1\right)^{2}+2\phi_{k}\left(n_{k}-1\right),

with ϕk=(νk+nk1)/(νk+nk+1)\phi_{k}=\left(\nu_{k}+n_{k}-1\right)/\left(\nu_{k}+n_{k}+1\right). Therefore, we have

\mathcal{I}_{A}^{(2)}=\tfrac{1}{4}E_{d}^{\prime}\Lambda_{\lambda}\mathbb{E}\left(\bar{S}\bar{S}^{\prime}\right)\Lambda_{\lambda}E_{d}=\tfrac{1}{4}E_{d}^{\prime}\Xi E_{d},\quad\Xi_{kk}=\frac{2\phi_{k}}{\lambda_{k}^{2}\left(n_{k}-1\right)}+\frac{\phi_{k}-1}{\lambda_{k}^{2}},

and A(12)=𝔼(A(1)A(2))=0\mathcal{I}_{A}^{(12)}=\mathbb{E}\left(\nabla_{A}^{(1)}\nabla_{A}^{(2)\prime}\right)=0. Finally, we obtain

\mathcal{I}_{A}=\tfrac{1}{4}\left[\phi_{0}A_{\otimes}^{-1}H_{K}+(\phi_{0}-1){\rm vec}\left(A^{-1}\right){\rm vec}\left(A^{-1}\right)^{\prime}+E_{d}^{\prime}\Xi E_{d}\right].
This completes the proof. \square

Appendix B Block Correlation Matrix with Cluster-tt Distribution

Because P=InP=I_{n}, and 𝒏=𝒎\bm{n}=\bm{m}, we have V=UV=U, and Vk=UkV_{k}=U_{k}. Then the log-likelihood function is given by

(Z)\displaystyle\ell\left(Z\right) =12log|C|+k=1Kckνk+nk2log(1+1νk2UkUk),\displaystyle=-\tfrac{1}{2}\log|C|+\sum_{k=1}^{K}c_{k}-\tfrac{\nu_{k}+n_{k}}{2}\log\left(1+\tfrac{1}{\nu_{k}-2}U_{k}^{\prime}U_{k}\right),

By using the canonical representation of the block correlation matrix, C=QDQ^{\prime}, we define the vectors X and Y as Y=Q^{\prime}Z and X=Q^{\prime}U, so that X_{0}=A^{-\frac{1}{2}}Y_{0} and X_{k}=\lambda_{k}^{-\frac{1}{2}}Y_{k}. From U=QX and the structure of Q, we have

Uk,i\displaystyle U_{k,i} =nk1/2X0,k+(e~ivnk)Xk\displaystyle=n_{k}^{-1/2}X_{0,k}+\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}\right)X_{k}
Uk,i2\displaystyle U_{k,i}^{2} =X0,k2nk1+Xk(vnke~ie~ivnk)Xk+2X0,k(vnke~ie~ivnk)Xk,\displaystyle=X_{0,k}^{2}n_{k}^{-1}+X_{k}^{\prime}\left(v_{n_{k}}^{\perp\prime}\tilde{e}_{i}\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}\right)X_{k}+2X_{0,k}\left(v_{n_{k}}^{\prime}\tilde{e}_{i}\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}\right)X_{k},

for i=1,\ldots,n_{k}, where \tilde{e}_{i}\in\mathbb{R}^{n_{k}\times 1} denotes the i-th column of I_{n_{k}}. So we obtain

UkUk=i=1nkUk,i2=X0,k2+XkXk=Y0A~k1Y0+λk1YkYk,U_{k}^{\prime}U_{k}=\sum_{i=1}^{n_{k}}U_{k,i}^{2}=X_{0,k}^{2}+X_{k}^{\prime}X_{k}=Y_{0}^{\prime}\tilde{A}_{k}^{-1}Y_{0}+\lambda_{k}^{-1}Y_{k}^{\prime}Y_{k},

where \tilde{A}_{k}^{-1}=A^{-\tfrac{1}{2}}J_{k}^{e}A^{-\tfrac{1}{2}}, and J_{k}^{e}=e_{k}e_{k}^{\prime} with e_{k} the k-th column of the K\times K identity matrix I_{K}. This leads to the simplified expression for the log-likelihood function

\ell\left(Z\right)=-\tfrac{1}{2}\log|A|+\sum_{k=1}^{K}\left[c_{\nu_{k},n_{k}}-\tfrac{n_{k}-1}{2}\log\lambda_{k}-\tfrac{\nu_{k}+n_{k}}{2}\log\left(1+\tfrac{1}{\nu_{k}-2}\left(X_{0,k}^{2}+X_{k}^{\prime}X_{k}\right)\right)\right].

Note that we have

X0,k=vnkUk,Xk=vnkUkfork=1,,K,X_{0,k}=v_{n_{k}}^{\prime}U_{k},\quad X_{k}=v_{n_{k}}^{\perp\ \prime}U_{k}\qquad\text{for}\quad k=1,\ldots,K,

such that X_{0,k} and X_{k} are simply linear combinations of U_{k}. From the structure of Q it follows that X_{0,k} and X_{0,l} are independent for k\neq l, just as is the case for X_{k} and X_{l} (by their definition). We also have that X_{0,k} and X_{k} are uncorrelated, but they are not independent, because they have t-distributed shocks in common.
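
The within-block rotation behind these expressions is easy to verify directly: with v_{n_{k}} the normalized vector of ones and v_{n_{k}}^{\perp} an orthonormal basis for its orthogonal complement, one has U_{k}^{\prime}U_{k}=X_{0,k}^{2}+X_{k}^{\prime}X_{k}. The Python sketch below is our own illustration (the block size is arbitrary).

```python
# Sketch of the within-block rotation: v is the normalized ones-vector, v_perp an
# orthonormal basis of its complement, X0 = v'U_k, Xk = v_perp'U_k, and
# U_k'U_k = X0^2 + Xk'Xk. Illustrative only; the block size is arbitrary.
import numpy as np

rng = np.random.default_rng(5)
nk = 6
Uk = rng.standard_normal(nk)

v = np.ones(nk) / np.sqrt(nk)                       # v_{n_k}
# Orthonormal complement of v: the remaining columns of a QR factorization.
Qfull, _ = np.linalg.qr(np.column_stack([v, rng.standard_normal((nk, nk - 1))]))
v_perp = Qfull[:, 1:]                               # n_k x (n_k - 1), orthonormal, orthogonal to v

X0 = v @ Uk                                         # X_{0,k}
Xk = v_perp.T @ Uk                                  # X_k
print(np.isclose(Uk @ Uk, X0 ** 2 + Xk @ Xk))       # U_k'U_k = X_{0,k}^2 + X_k'X_k
```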

B.1 The Form of the Score

By using X0,k=ek,KA12Y0X_{0,k}=e_{k,K}^{\prime}A^{-\frac{1}{2}}Y_{0}, we first have

X0,k2vec(A)\displaystyle\frac{\partial X_{0,k}^{2}}{\partial{\rm vec}\left(A\right)^{\prime}} =X0,k2X0,k(ekA12Y0)vec(A12)vec(A12)vec(A12)vec(A12)vec(A)\displaystyle=\frac{\partial X_{0,k}^{2}}{\partial X_{0,k}}\frac{\partial\left(e_{k}^{\prime}A^{-\frac{1}{2}}Y_{0}\right)}{\partial{\rm vec}\left(A^{-\frac{1}{2}}\right)^{\prime}}\frac{\partial{\rm vec}\left(A^{-\frac{1}{2}}\right)^{\prime}}{\partial{\rm vec}\left(A^{\frac{1}{2}}\right)^{\prime}}\frac{\partial{\rm vec}\left(A^{\frac{1}{2}}\right)^{\prime}}{\partial{\rm vec}\left(A\right)^{\prime}}
=2X0,k(Y0ek)A12(A12I)1\displaystyle=-2X_{0,k}\left(Y_{0}^{\prime}\otimes e_{k}^{\prime}\right)A_{\otimes}^{-\frac{1}{2}}\left(A^{\frac{1}{2}}\oplus I\right)^{-1}
=2X0,k(X0ekA12)(A12I)1\displaystyle=-2X_{0,k}\left(X_{0}^{\prime}\otimes e_{k}^{\prime}A^{-\frac{1}{2}}\right)\left(A^{\frac{1}{2}}\oplus I\right)^{-1}
=2X0,kvec(A12ekX0)(A12I)1\displaystyle=-2X_{0,k}{\rm vec}\left(A^{-\frac{1}{2}}e_{k}X_{0}^{\prime}\right)^{\prime}\left(A^{\frac{1}{2}}\oplus I\right)^{-1}
=2X0,kvec(ekX0)Ω,\displaystyle=-2X_{0,k}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}\Omega,

where Ω=(IA12)(A12I)1\Omega=\left(I\otimes A^{-\frac{1}{2}}\right)\left(A^{\frac{1}{2}}\oplus I\right)^{-1}. It follows that

\nabla_{A}^{\prime}=\frac{\partial\ell}{\partial{\rm vec}\left(A\right)^{\prime}}=\left[\sum_{k=1}^{K}W_{k}X_{0,k}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}-{\rm vec}\left(I_{K}\right)^{\prime}\right]\Omega+\tfrac{1}{2}S^{\prime}E_{d}

where SS is a K×1K\times 1 vector defined by

Sk=(nk1)WkXkXk(nk1)λk,Wk=νk+nkνk2+X0,k2+XkXk.S_{k}=\frac{\left(n_{k}-1\right)-W_{k}X_{k}^{\prime}X_{k}}{\left(n_{k}-1\right)\lambda_{k}},\quad W_{k}=\frac{\nu_{k}+n_{k}}{\nu_{k}-2+X_{0,k}^{2}+X_{k}^{\prime}X_{k}}.

B.2 The Form of the Information Matrix

The information matrix of \nabla_{A} can be decomposed into four components, given by

A=A(1)+A(2)+A(21)+A(12).\mathcal{I}_{A}=\mathcal{I}_{A}^{(1)}+\mathcal{I}_{A}^{(2)}+\mathcal{I}_{A}^{(21)}+\mathcal{I}_{A}^{(12)}.

B.2.1 The Form of Matrix A(1)\mathcal{I}_{A}^{(1)}

Similar to the previous proofs, the covariance of the first part of \partial\ell/\partial{\rm vec}\left(A\right) is given by

A(1)\displaystyle\mathcal{I}_{A}^{(1)} =Ω𝔼[k=1Kl=1KWkWlX0,kX0,lvec(ekX0)vec(elX0)vec(IK)vec(IK)]Ω,\displaystyle=\Omega\mathbb{E}\left[\sum_{k=1}^{K}\sum_{l=1}^{K}W_{k}W_{l}X_{0,k}X_{0,l}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}-{\rm vec}\left(I_{K}\right){\rm vec}\left(I_{K}\right)^{\prime}\right]\Omega,
=Ω(KK+Ψe)Ω\displaystyle=\Omega\left(K_{K}+\Psi^{e}\right)\Omega

where Ψe=k=1KΨke\Psi^{e}=\sum_{k=1}^{K}\Psi_{k}^{e} with

Ψke\displaystyle\Psi_{k}^{e} =ψk(IJke)Jke+ϕkJke+(ϕk1)[JkeKK+vec(Jke)vec(Jke)],\displaystyle=\psi_{k}\left(I-J_{k}^{e}\right)\otimes J_{k}^{e}+\phi_{k}J_{k\otimes}^{e}+\left(\phi_{k}-1\right)\left[J_{k\otimes}^{e}K_{K}+{\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}\right],

with \phi_{k}=(\nu_{k}+n_{k})/(\nu_{k}+n_{k}+2) and \psi_{k}=\phi_{k}\nu_{k}/(\nu_{k}-2), as in Lemma A.1.

B.2.2 The Form of Matrix A(2)\mathcal{I}_{A}^{(2)}

As for the second part, we have

A(2)=14Ed𝔼(SS)Ed=14EdΞEd,\mathcal{I}_{A}^{(2)}=\tfrac{1}{4}E_{d}^{\prime}\mathbb{E}\left(SS^{\prime}\right)E_{d}=\tfrac{1}{4}E_{d}^{\prime}\Xi E_{d},

where \Xi_{kk}=\mathbb{E}\left(S_{k}^{2}\right) is given by

𝔼(Sk2)\displaystyle\mathbb{E}\left(S_{k}^{2}\right) =[𝔼(Wk2(XkXk)2)(nk1)2]/[λk2(nk1)2]\displaystyle=\left[\mathbb{E}\left(W_{k}^{2}\left(X_{k}^{\prime}X_{k}\right)^{2}\right)-\left(n_{k}-1\right)^{2}\right]/\left[\lambda_{k}^{2}\left(n_{k}-1\right)^{2}\right]
=[ϕk[(nk1)2+2(nk1)](nk1)2]/[λk2(nk1)2]\displaystyle=\left[\phi_{k}\left[\left(n_{k}-1\right)^{2}+2\left(n_{k}-1\right)\right]-\left(n_{k}-1\right)^{2}\right]/\left[\lambda_{k}^{2}\left(n_{k}-1\right)^{2}\right]
=(ϕk1)λk2+2ϕkλk2(nk1)1,\displaystyle=\left(\phi_{k}-1\right)\lambda_{k}^{-2}+2\phi_{k}\lambda_{k}^{-2}\left(n_{k}-1\right)^{-1},

and Ξkl=𝔼(SkSl)=0\Xi_{kl}=\mathbb{E}\left(S_{k}S_{l}\right)=0 for klk\neq l, so Ξ\Xi is a K×KK\times K diagonal matrix.

B.2.3 The Form of Matrix A(12)\mathcal{I}_{A}^{(12)}

As for the interaction term, we have A(21)=[A(12)]\mathcal{I}_{A}^{(21)}=\left[\mathcal{I}_{A}^{(12)}\right]^{\prime}, and

A(12)=12Ed𝔼[S(k=1KWkX0,kvec(ekX0)vec(I))]Ω=12EdΘΩ,\mathcal{I}_{A}^{(12)}=\tfrac{1}{2}E_{d}^{\prime}\mathbb{E}\left[S\left(\sum_{k=1}^{K}W_{k}X_{0,k}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}-{\rm vec}\left(I\right)^{\prime}\right)\right]\Omega=\tfrac{1}{2}E_{d}^{\prime}\Theta\Omega,

where the kk-th row of the K×K2K\times K^{2} matrix Θ\Theta is given by

ekΘ\displaystyle e_{k}^{\prime}\Theta =𝔼[Sk(l=1KWlX0,lvec(elX0)vec(IK))]\displaystyle=\mathbb{E}\left[S_{k}\left(\sum_{l=1}^{K}W_{l}X_{0,l}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}-{\rm vec}\left(I_{K}\right)^{\prime}\right)\right]
=-\left(n_{k}-1\right)^{-1}\lambda_{k}^{-1}\left[\mathbb{E}\left(\sum_{l=1}^{K}W_{k}W_{l}\left(X_{k}^{\prime}X_{k}\right)X_{0,l}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)\right)-\left(n_{k}-1\right){\rm vec}\left(I_{K}\right)\right],

when k=lk=l, we have

𝔼[Wk2(XkXk)X0,kvec(ekX0)]=\displaystyle\mathbb{E}\left[W_{k}^{2}\left(X_{k}^{\prime}X_{k}\right)X_{0,k}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)\right]= 𝔼[p=1KWk2(XkXk)X0,kvec(ekX0,pep)]\displaystyle\mathbb{E}\left[\sum_{p=1}^{K}W_{k}^{2}\left(X_{k}^{\prime}X_{k}\right)X_{0,k}{\rm vec}\left(e_{k}X_{0,p}e_{p}^{\prime}\right)\right]
=\displaystyle= 𝔼[Wk2(XkXk)X0,k2vec(Jke)]\displaystyle\mathbb{E}\left[W_{k}^{2}\left(X_{k}^{\prime}X_{k}\right)X_{0,k}^{2}{\rm vec}\left(J_{k}^{e}\right)\right]
=\displaystyle= ϕk(nk1)vec(Jke),\displaystyle\phi_{k}\left(n_{k}-1\right){\rm vec}\left(J_{k}^{e}\right),

when klk\neq l, we have

\mathbb{E}\left[W_{k}W_{l}\left(X_{k}^{\prime}X_{k}\right)X_{0,l}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)\right]=\mathbb{E}\left[\sum_{p=1}^{K}W_{k}W_{l}\left(X_{k}^{\prime}X_{k}\right)X_{0,l}{\rm vec}\left(e_{l}X_{0,p}e_{p}^{\prime}\right)\right]
=\displaystyle= 𝔼[WkWl(XkXk)X0,l2]vec(Jle)\displaystyle\mathbb{E}\left[W_{k}W_{l}\left(X_{k}^{\prime}X_{k}\right)X_{0,l}^{2}\right]{\rm vec}\left(J_{l}^{e}\right)
=\displaystyle= (nk1)vec(Jle).\displaystyle\left(n_{k}-1\right){\rm vec}\left(J_{l}^{e}\right).

Thus, we have

𝔼[l=1KWkWl(XkXk)X0,lvec(elX0)](nk1)vec(IK)\displaystyle\mathbb{E}\left[\sum_{l=1}^{K}W_{k}W_{l}\left(X_{k}^{\prime}X_{k}\right)X_{0,l}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)\right]-\left(n_{k}-1\right){\rm vec}\left(I_{K}\right)
=\displaystyle= l(nk1)vec(Jle)(nk1)vec(Jke)+ϕk(nk1)vec(Jke)(nk1)vec(IK)\displaystyle\sum_{l}\left(n_{k}-1\right){\rm vec}\left(J_{l}^{e}\right)-\left(n_{k}-1\right){\rm vec}\left(J_{k}^{e}\right)+\phi_{k}\left(n_{k}-1\right){\rm vec}\left(J_{k}^{e}\right)-\left(n_{k}-1\right){\rm vec}\left(I_{K}\right)
=\displaystyle= (nk1)vec(IK)(nk1)vec(Jke)+ϕk(nk1)vec(Jke)(nk1)vec(IK)\displaystyle\left(n_{k}-1\right){\rm vec}\left(I_{K}\right)-\left(n_{k}-1\right){\rm vec}\left(J_{k}^{e}\right)+\phi_{k}\left(n_{k}-1\right){\rm vec}\left(J_{k}^{e}\right)-\left(n_{k}-1\right){\rm vec}\left(I_{K}\right)
=\displaystyle= (nk1)(ϕk1)vec(Jke).\displaystyle\left(n_{k}-1\right)\left(\phi_{k}-1\right){\rm vec}\left(J_{k}^{e}\right).

Finally, the k-th row of the matrix \Theta is e_{k}^{\prime}\Theta=-\lambda_{k}^{-1}\left(\phi_{k}-1\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}, so we have

Θ=k=1Kek(ekΘ)=k=1Kλk1(1ϕk)ekvec(Jke).\Theta=\sum_{k=1}^{K}e_{k}\left(e_{k}^{\prime}\Theta\right)=\sum_{k=1}^{K}\lambda_{k}^{-1}\left(1-\phi_{k}\right)e_{k}{\rm vec}\left(J_{k}^{e}\right)^{\prime}.

Appendix C Block Correlation Matrix with Hetero-tt Distribution

Because P=IP=I, we have U=PV=VU=PV=V. By using C=QDQC=QDQ^{\prime}, the log-likelihood function is now given by

(Z)=c12log|A|12k=1K(nk1)logλkk=1Ki=1nkνk,i+12log(1+1νk,i2Uk,i2),\ell(Z)=c-\tfrac{1}{2}\log|A|-\tfrac{1}{2}\sum_{k=1}^{K}\left(n_{k}-1\right)\log\lambda_{k}-\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}\tfrac{\nu_{k,i}+1}{2}\log\left(1+\tfrac{1}{\nu_{k,i}-2}U_{k,i}^{2}\right),

where c=\sum_{i=1}^{n}c\left(\nu_{i},1\right), and U_{k,i} is the i-th element of U_{k}. The cluster structure is implied by the block correlation matrix. To simplify the notation, we let \tilde{e}_{i}\in\mathbb{R}^{n_{k}\times 1} denote the i-th column of I_{n_{k}}. We use the identity U_{k,i}=n_{k}^{-1/2}X_{0,k}+\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}\right)X_{k}, which implies that

\frac{\partial\left(U_{k,i}^{2}\right)}{\partial{\rm vec}\left(A\right)^{\prime}} =\frac{\partial\left(U_{k,i}^{2}\right)}{\partial U_{k,i}}\frac{\partial\left(n_{k}^{-1/2}X_{0,k}+\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}\right)X_{k}\right)}{\partial{\rm vec}\left(A\right)^{\prime}}
=2U_{k,i}\left[-n_{k}^{-1/2}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}\Omega-\frac{1}{2}\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}\right)X_{k}\lambda_{k}^{-1}\left(1-n_{k}\right)^{-1}e_{k}^{\prime}E_{d}\right]
=-2U_{k,i}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-1/2}\Omega-U_{k,i}F_{k,i}U_{k}e_{k}^{\prime}E_{d}/\left[\lambda_{k}\left(1-n_{k}\right)\right],

where e_{k} is the k-th column of the K\times K identity matrix I_{K}, \tilde{e}_{i} is the i-th column of the n_{k}\times n_{k} identity matrix I_{n_{k}}, X_{k}=v_{n_{k}}^{\perp\ \prime}U_{k} and F_{k,i}=\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}v_{n_{k}}^{\perp\prime}. So we have

A=vec(A)\displaystyle\nabla_{A}^{\prime}=\frac{\partial\ell}{\partial{\rm vec}\left(A\right)^{\prime}} =[k=1Ki=1nkWk,iUk,ivec(ekX0)nk12vec(IK)]Ω+12SEd,\displaystyle=\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}-{\rm vec}(I_{K})^{\prime}\right]\Omega+\tfrac{1}{2}S^{\prime}E_{d},

with the Wk,iW_{k,i} and SkS_{k} defined by

Wk,i=νk,i+1νk,i2+Uk,iUk,i,Sk=(nk1)i=1nkWk,iUk,iFk,iUk(nk1)λk.W_{k,i}=\frac{\nu_{k,i}+1}{\nu_{k,i}-2+U_{k,i}^{\prime}U_{k,i}},\quad S_{k}=\frac{\left(n_{k}-1\right)-\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}F_{k,i}U_{k}}{\left(n_{k}-1\right)\lambda_{k}}.

The information matrix of A\nabla_{A} can be expressed by A=A(1)+A(2)+A(21)+A(12)\mathcal{I}_{A}=\mathcal{I}_{A}^{(1)}+\mathcal{I}_{A}^{(2)}+\mathcal{I}_{A}^{(21)}+\mathcal{I}_{A}^{(12)}.

C.1 The Form of Matrix A(1)\mathcal{I}_{A}^{(1)}

\mathcal{I}_{A}^{(1)}=\mathbb{E}\left(\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)n_{k}^{-\frac{1}{2}}\right]\left[\sum_{l=1}^{K}\sum_{j=1}^{n_{l}}W_{l,j}U_{l,j}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right]\right)-{\rm vec}(I_{K}){\rm vec}(I_{K})^{\prime}
=\displaystyle= 𝔼[k=1Kl=1Ki=1nkj=1nlWk,iWl,jUk,iUl,jvec(ekX0)vec(elX0)nl12nk12]vec(IK)vec(IK)\displaystyle\mathbb{E}\left[\sum_{k=1}^{K}\sum_{l=1}^{K}\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]-{\rm vec}(I_{K}){\rm vec}(I_{K})^{\prime}
=\displaystyle= 𝔼[k=1Ki=1nkWk,i2Uk,i2vec(ekX0)vec(ekX0)nk1]vec(IK)vec(IK)\displaystyle\mathbb{E}\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-1}\right]-{\rm vec}(I_{K}){\rm vec}(I_{K})^{\prime}
+𝔼[k=1KlkKi=1nkj=1nlWk,iWl,jUk,iUl,jvec(ekX0)vec(elX0)nl12nk12]\displaystyle+\mathbb{E}\left[\sum_{k=1}^{K}\sum_{l\neq k}^{K}\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
+𝔼[k=1Ki=1nkjinkWk,iWk,jUk,iUk,jvec(ekX0)vec(ekX0)nk12nk12],\displaystyle+\mathbb{E}\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}\sum_{j\neq i}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right],

and for later use, we have X0=kX0,kek=pUpvnpepX_{0}=\sum_{k}X_{0,k}e_{k}=\sum_{p}U_{p}^{\prime}v_{n_{p}}e_{p}.

C.1.1 The First Term

We have

𝔼[Wk,i2Uk,i2vec(ekX0)vec(ekX0)nk1]\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= 𝔼[p=1Kq=1KWk,i2Uk,i2vec(ekUpvnpep)vec(ekUqvnqeq)nk1]\displaystyle\mathbb{E}\left[\sum_{p=1}^{K}\sum_{q=1}^{K}W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}U_{p}^{\prime}v_{n_{p}}e_{p}^{\prime}\right){\rm vec}\left(e_{k}U_{q}^{\prime}v_{n_{q}}e_{q}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= 𝔼[Wk,i2Uk,i2vec(ekUkvnkek)vec(ekUkvnkek)nk1]\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right){\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
+𝔼[pkKWk,i2Uk,i2vec(ekUpvnpep)vec(ekUpvnpep)nk1],\displaystyle+\mathbb{E}\left[\sum_{p\neq k}^{K}W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}U_{p}^{\prime}v_{n_{p}}e_{p}^{\prime}\right){\rm vec}\left(e_{k}U_{p}^{\prime}v_{n_{p}}e_{p}^{\prime}\right)^{\prime}n_{k}^{-1}\right],

and with p=k=qp=k=q, we have

𝔼[Wk,i2Uk,i2vec(ekUkvnkek)vec(ekUkvnkek)nk1]\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right){\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= 𝔼[p=1nkq=1nkWk,i2Uk,i2Uk,pUk,qvec(ekek)vec(ekek)nk2]\displaystyle\mathbb{E}\left[\sum_{p=1}^{n_{k}}\sum_{q=1}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}U_{k,p}U_{k,q}{\rm vec}\left(e_{k}e_{k}^{\prime}\right){\rm vec}\left(e_{k}e_{k}^{\prime}\right)^{\prime}n_{k}^{-2}\right]
=\displaystyle= 𝔼[p=1nkq=1nkWk,i2Uk,i2Uk,pUk,qJkenk2]\displaystyle\mathbb{E}\left[\sum_{p=1}^{n_{k}}\sum_{q=1}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}U_{k,p}U_{k,q}J_{k\otimes}^{e}n_{k}^{-2}\right]
=\displaystyle= 𝔼[Wk,i2Uk,i4]Jkenk2+pink𝔼[Wk,i2Uk,i2Uk,p2]Jkenk2\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{4}\right]J_{k\otimes}^{e}n_{k}^{-2}+\sum_{p\neq i}^{n_{k}}\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}U_{k,p}^{2}\right]J_{k\otimes}^{e}n_{k}^{-2}
=\displaystyle= [3ϕk,i+ψk,i(nk1)]Jkenk2,\displaystyle\left[3\phi_{k,i}+\psi_{k,i}\left(n_{k}-1\right)\right]J_{k\otimes}^{e}n_{k}^{-2},

where we use that vec(ekek)=ekek{\rm vec}\left(e_{k}e_{k}^{\prime}\right)=e_{k}\otimes e_{k} because eke_{k} is a K×1K\times 1 vector. Next, when pkp\neq k, we have

pkK𝔼[Wk,i2Uk,i2vec(ekUpvnpep)vec(ekUpvnpep)nk1]\displaystyle\sum_{p\neq k}^{K}\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}U_{p}^{\prime}v_{n_{p}}e_{p}^{\prime}\right){\rm vec}\left(e_{k}U_{p}^{\prime}v_{n_{p}}e_{p}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= pkK𝔼[r=1npm=1npWk,i2Uk,i2Up,rUp,mvec(ekep)vec(ekep)nk1np1]\displaystyle\sum_{p\neq k}^{K}\mathbb{E}\left[\sum_{r=1}^{n_{p}}\sum_{m=1}^{n_{p}}W_{k,i}^{2}U_{k,i}^{2}U_{p,r}U_{p,m}{\rm vec}\left(e_{k}e_{p}^{\prime}\right){\rm vec}\left(e_{k}e_{p}^{\prime}\right)^{\prime}n_{k}^{-1}n_{p}^{-1}\right]
=\displaystyle= pkK𝔼[r=1npWk,i2Uk,i2Up,r2(epek)(epek)nk1np1]\displaystyle\sum_{p\neq k}^{K}\mathbb{E}\left[\sum_{r=1}^{n_{p}}W_{k,i}^{2}U_{k,i}^{2}U_{p,r}^{2}\left(e_{p}\otimes e_{k}\right)\left(e_{p}^{\prime}\otimes e_{k}^{\prime}\right)n_{k}^{-1}n_{p}^{-1}\right]
=\displaystyle= pkKψk,inp(epek)(epek)nk1np1\displaystyle\sum_{p\neq k}^{K}\psi_{k,i}n_{p}\left(e_{p}\otimes e_{k}\right)\left(e_{p}^{\prime}\otimes e_{k}^{\prime}\right)n_{k}^{-1}n_{p}^{-1}
=\displaystyle= pkKψk,i(JpeJke)nk1\displaystyle\sum_{p\neq k}^{K}\psi_{k,i}\left(J_{p}^{e}\otimes J_{k}^{e}\right)n_{k}^{-1}
=\psi_{k,i}\left[\left(I_{K}-J_{k}^{e}\right)\otimes J_{k}^{e}\right]n_{k}^{-1}.

So, we have

k=1Ki=1nk𝔼[Wk,i2Uk,i2vec(ekX0)vec(ekX0)nk1]\displaystyle\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}\left[\left[3\phi_{k,i}+\psi_{k,i}\left(n_{k}-1\right)\right]n_{k}^{-2}J_{k\otimes}^{e}+\psi_{k,i}n_{k}^{-1}\left[\left(I_{K}-J_{k}^{e}\right)\otimes J_{k}^{e}\right]\right]
=\displaystyle= k=1K[nk1(3ϕ¯kψ¯k)Jke+ψ¯k(IKJke)],\displaystyle\sum_{k=1}^{K}\left[n_{k}^{-1}\left(3\bar{\phi}_{k}-\bar{\psi}_{k}\right)J_{k\otimes}^{e}+\bar{\psi}_{k}\left(I_{K}\otimes J_{k}^{e}\right)\right],

where ϕ¯k\bar{\phi}_{k} and ψ¯k\bar{\psi}_{k}, k=1,,Kk=1,\ldots,K are defined as

\bar{\phi}_{k}=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\phi_{k,i},\qquad\text{and}\qquad\bar{\psi}_{k}=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\psi_{k,i},

respectively.

C.1.2 The Second Term

For lkl\neq k, we have

𝔼[Wk,iWl,jUk,iUl,jvec(ekX0)vec(elX0)nl12nk12]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
=\mathbb{E}\left[\sum_{p=1}^{K}\sum_{q=1}^{K}W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}U_{p}^{\prime}v_{n_{p}}e_{p}^{\prime}\right){\rm vec}\left(e_{l}U_{q}^{\prime}v_{n_{q}}e_{q}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[Wk,iWl,jUk,iUl,jvec(ekUkvnkek)vec(elUlvnlel)nl12nk12]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right){\rm vec}\left(e_{l}U_{l}^{\prime}v_{n_{l}}e_{l}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
+𝔼[Wk,iWl,jUk,iUl,jvec(ekUlvnlel)vec(elUkvnkek)nl12nk12].\displaystyle+\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}U_{l}^{\prime}v_{n_{l}}e_{l}^{\prime}\right){\rm vec}\left(e_{l}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right].

The first term is given by

𝔼[Wk,iWl,jUk,iUl,jvec(ekUkvnkek)vec(elUlvnlel)nl12nk12]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right){\rm vec}\left(e_{l}U_{l}^{\prime}v_{n_{l}}e_{l}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[rnkmnlWk,iWl,jUk,iUl,jUk,rUl,mvec(ekek)vec(elel)nl1nk1]\displaystyle\mathbb{E}\left[\sum_{r}^{n_{k}}\sum_{m}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}U_{k,r}U_{l,m}{\rm vec}\left(e_{k}e_{k}^{\prime}\right){\rm vec}\left(e_{l}e_{l}^{\prime}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}\right]
=\displaystyle= 𝔼[rnkmnlWk,iWl,jUk,iUl,jUk,rUl,mvec(Jke)vec(Jle)nl1nk1]\displaystyle\mathbb{E}\left[\sum_{r}^{n_{k}}\sum_{m}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}U_{k,r}U_{l,m}{\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(J_{l}^{e}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}\right]
=\displaystyle= 𝔼[Wk,iWl,jUk,i2Ul,j2vec(Jke)vec(Jle)nl1nk1]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}^{2}U_{l,j}^{2}{\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(J_{l}^{e}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}\right]
=\displaystyle= vec(Jke)vec(Jle)nl1nk1.\displaystyle{\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(J_{l}^{e}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}.

The second term is given by

𝔼[Wk,iWl,jUk,iUl,jvec(ekUlvnlel)vec(elUkvnkek)nl12nk12]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}U_{l}^{\prime}v_{n_{l}}e_{l}^{\prime}\right){\rm vec}\left(e_{l}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[r=1nlm=1nkWk,iWl,jUk,iUl,jUl,rUk,mvec(ekel)vec(elek)nl1nk1]\displaystyle\mathbb{E}\left[\sum_{r=1}^{n_{l}}\sum_{m=1}^{n_{k}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}U_{l,r}U_{k,m}{\rm vec}\left(e_{k}e_{l}^{\prime}\right){\rm vec}\left(e_{l}e_{k}^{\prime}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}\right]
=\displaystyle= 𝔼[Wk,iWl,jUk,i2Ul,j2vec(ekel)vec(elek)nl1nk1]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}^{2}U_{l,j}^{2}{\rm vec}\left(e_{k}e_{l}^{\prime}\right){\rm vec}\left(e_{l}e_{k}^{\prime}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}\right]
=\displaystyle= vec(ekel)vec(elek)nl1nk1\displaystyle{\rm vec}\left(e_{k}e_{l}^{\prime}\right){\rm vec}\left(e_{l}e_{k}^{\prime}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}
=\displaystyle= (JleJke)KKnl1nk1.\displaystyle\left(J_{l}^{e}\otimes J_{k}^{e}\right)K_{K}n_{l}^{-1}n_{k}^{-1}.

So, we have

k=1KlkKi=1nkj=1nl𝔼[Wk,iWl,jUk,iUl,jvec(ekX0)vec(elX0)nl12nk12]\displaystyle\sum_{k=1}^{K}\sum_{l\neq k}^{K}\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{l}}\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}U_{l,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= k=1KlkKi=1nkj=1nl[vec(Jke)vec(Jle)nl1nk1+(JleJke)KKnl1nk1]\displaystyle\sum_{k=1}^{K}\sum_{l\neq k}^{K}\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{l}}\left[{\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(J_{l}^{e}\right)^{\prime}n_{l}^{-1}n_{k}^{-1}+\left(J_{l}^{e}\otimes J_{k}^{e}\right)K_{K}n_{l}^{-1}n_{k}^{-1}\right]
=\displaystyle= k=1KlkK[vec(Jke)vec(Jle)+(JleJke)KK]\displaystyle\sum_{k=1}^{K}\sum_{l\neq k}^{K}\left[{\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(J_{l}^{e}\right)^{\prime}+\left(J_{l}^{e}\otimes J_{k}^{e}\right)K_{K}\right]
=\displaystyle= k=1K[vec(Jke)vec(IK)+(IKJke)KKJke(KK+IK2)]\displaystyle\sum_{k=1}^{K}\left[{\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(I_{K}\right)^{\prime}+\left(I_{K}\otimes J_{k}^{e}\right)K_{K}-J_{k\otimes}^{e}\left(K_{K}+I_{K^{2}}\right)\right]
=\displaystyle= vec(IK)vec(IK)+KK2k=1KJke,\displaystyle{\rm vec}\left(I_{K}\right){\rm vec}\left(I_{K}\right)^{\prime}+K_{K}-2\sum_{k=1}^{K}J_{k\otimes}^{e},

where the last equality uses the facts that {\rm vec}\left(J_{k}^{e}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}=J_{k\otimes}^{e} (since J_{k}^{e}=e_{k}e_{k}^{\prime}) and J_{k\otimes}^{e}K_{K}=J_{k\otimes}^{e}.

C.1.3 The Third Term

The third term has both indices in the same block with $i\neq j$; we then have

𝔼[Wk,iWk,jUk,iUk,jvec(ekX0)vec(ekX0)nk12nk12]\displaystyle\mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}U_{k,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= \mathbb{E}\left[\sum_{p=1}^{K}\sum_{q=1}^{K}W_{k,i}W_{k,j}U_{k,i}U_{k,j}{\rm vec}\left(e_{k}U_{p}^{\prime}v_{n_{p}}e_{p}^{\prime}\right){\rm vec}\left(e_{k}U_{q}^{\prime}v_{n_{q}}e_{q}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= 𝔼[Wk,iWk,jUk,iUk,jvec(ekUkvnkek)vec(ekUkvnkek)nk1]\displaystyle\mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}U_{k,j}{\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right){\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= rnkmnk𝔼[Wk,iWk,jUk,iUk,jUk,rUk,mvec(ekek)vec(ekek)nk2]\displaystyle\sum_{r}^{n_{k}}\sum_{m}^{n_{k}}\mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}U_{k,j}U_{k,r}U_{k,m}{\rm vec}\left(e_{k}e_{k}^{\prime}\right){\rm vec}\left(e_{k}e_{k}^{\prime}\right)^{\prime}n_{k}^{-2}\right]
=\displaystyle= 𝔼[Wk,iWk,jUk,iUk,jUk,iUk,j+Wk,iWk,jUk,iUk,jUk,jUk,i]Jkenk2\displaystyle\mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}U_{k,j}U_{k,i}U_{k,j}+W_{k,i}W_{k,j}U_{k,i}U_{k,j}U_{k,j}U_{k,i}\right]J_{k\otimes}^{e}n_{k}^{-2}
=\displaystyle= 2𝔼[Wk,iWk,jUk,i2Uk,j2]Jkenk2\displaystyle 2\mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}^{2}U_{k,j}^{2}\right]J_{k\otimes}^{e}n_{k}^{-2}
=\displaystyle= 2Jkenk2.\displaystyle 2J_{k\otimes}^{e}n_{k}^{-2}.

So we have

𝔼[k=1Ki=1nkjinkWk,iWk,jUk,iUk,jvec(ekX0)vec(ekX0)nk12nk12]\displaystyle\mathbb{E}\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}\sum_{j\neq i}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[k=1Ki=1nkjink2Jkenk2]=k=1K2Jke(1nk1).\displaystyle\mathbb{E}\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}\sum_{j\neq i}^{n_{k}}2J_{k\otimes}^{e}n_{k}^{-2}\right]=\sum_{k=1}^{K}2J_{k\otimes}^{e}\left(1-n_{k}^{-1}\right).

C.1.4 Combining the Terms

Collecting the three terms, we have

\displaystyle\mathbb{E}\left\{\left[\sum_{k=1}^{K}\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)n_{k}^{-\frac{1}{2}}\right]\left[\sum_{l=1}^{K}\sum_{j=1}^{n_{l}}W_{l,j}U_{l,j}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right]\right\}-{\rm vec}\left(I_{K}\right){\rm vec}\left(I_{K}\right)^{\prime}
=\displaystyle= KK+k=1K[nk1(3ϕ¯kψ¯k)Jke+ψ¯k(IKJke)2Jke+2Jke(1nk1)]\displaystyle K_{K}+\sum_{k=1}^{K}\left[n_{k}^{-1}\left(3\bar{\phi}_{k}-\bar{\psi}_{k}\right)J_{k\otimes}^{e}+\bar{\psi}_{k}\left(I_{K}\otimes J_{k}^{e}\right)-2J_{k\otimes}^{e}+2J_{k\otimes}^{e}\left(1-n_{k}^{-1}\right)\right]
=\displaystyle= KK+ΥKe,\displaystyle K_{K}+\Upsilon_{K}^{e},

where ΥKe=k=1KΨke\Upsilon_{K}^{e}=\sum_{k=1}^{K}\Psi_{k}^{e} with

Ψke\displaystyle\Psi_{k}^{e} =nk1(3ϕ¯kψ¯k)Jke+ψ¯k(IKJke)2Jke+2Jke(1nk1)\displaystyle=n_{k}^{-1}\left(3\bar{\phi}_{k}-\bar{\psi}_{k}\right)J_{k\otimes}^{e}+\bar{\psi}_{k}\left(I_{K}\otimes J_{k}^{e}\right)-2J_{k\otimes}^{e}+2J_{k\otimes}^{e}\left(1-n_{k}^{-1}\right)
=nk1(3ϕ¯k2ψ¯k)Jke+ψ¯k(IKJke).\displaystyle=n_{k}^{-1}\left(3\bar{\phi}_{k}-2-\bar{\psi}_{k}\right)J_{k\otimes}^{e}+\bar{\psi}_{k}\left(I_{K}\otimes J_{k}^{e}\right).
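For readers who wish to assemble these objects in practice, the following sketch (assuming NumPy; the block sizes $n_{k}$ and the moments $\bar{\phi}_{k},\bar{\psi}_{k}$ are placeholder values, not estimates from the paper) constructs $\Psi_{k}^{e}$ and $\Upsilon_{K}^{e}=\sum_{k}\Psi_{k}^{e}$ directly from the last display:

import numpy as np

n_k = np.array([5, 10, 25])               # block sizes (placeholders)
phi_bar = np.array([1.20, 1.10, 1.30])    # phi-bar_k (placeholders)
psi_bar = np.array([1.05, 1.02, 1.10])    # psi-bar_k (placeholders)
K = len(n_k)
I = np.eye(K)

Upsilon = np.zeros((K * K, K * K))
for k in range(K):
    J_k = np.outer(I[:, k], I[:, k])                     # J_k^e = e_k e_k'
    J_k_kron = np.kron(J_k, J_k)                         # J_{k otimes}^e
    Psi_k = (3 * phi_bar[k] - 2 - psi_bar[k]) / n_k[k] * J_k_kron \
        + psi_bar[k] * np.kron(I, J_k)
    Upsilon += Psi_k

assert np.allclose(Upsilon, Upsilon.T)                   # Upsilon_K^e is symmetric
print("Upsilon_K^e assembled, shape", Upsilon.shape)

Adding the commutation matrix $K_{K}$ (constructed as in the earlier sketch) then yields $K_{K}+\Upsilon_{K}^{e}$ above.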

C.2 The Form of Matrix A(2)\mathcal{I}_{A}^{(2)}

We first define the K×1K\times 1 vector S¯\bar{S} with elements

\bar{S}_{k}=\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}F_{k,i}U_{k}-\left(n_{k}-1\right),

and clearly $\mathbb{E}\left(S_{k}S_{l}\right)=0$ for $k\neq l$. We then need to compute

𝔼(S¯k2)\displaystyle\mathbb{E}\left(\bar{S}_{k}^{2}\right) =𝔼(i=1nkj=1nkWk,iWk,jUk,iUk,jFk,iUkUkFk,j)(nk1)2\displaystyle=\mathbb{E}\left(\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}F_{k,i}U_{k}U_{k}^{\prime}F_{k,j}^{\prime}\right)-\left(n_{k}-1\right)^{2}
=𝔼(i=1nkj=1nkWk,iWk,j(Uk,iUk,j)Fk,i(UkUk)Fk,j)(nk1)2\displaystyle=\mathbb{E}\left(\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{k}}W_{k,i}W_{k,j}\left(U_{k,i}U_{k,j}\right)F_{k,i}\left(U_{k}U_{k}^{\prime}\right)F_{k,j}^{\prime}\right)-\left(n_{k}-1\right)^{2}
=𝔼(i=1nkWk,i2Uk,i2Fk,i(UkUk)Fk,i)(nk1)2\displaystyle=\mathbb{E}\left(\sum_{i=1}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}F_{k,i}\left(U_{k}U_{k}^{\prime}\right)F_{k,i}^{\prime}\right)-\left(n_{k}-1\right)^{2}
+𝔼(i=1nkjinkWk,iWk,j(Uk,iUk,j)Fk,i(UkUk)Fk,j).\displaystyle\quad+\mathbb{E}\left(\sum_{i=1}^{n_{k}}\sum_{j\neq i}^{n_{k}}W_{k,i}W_{k,j}\left(U_{k,i}U_{k,j}\right)F_{k,i}\left(U_{k}U_{k}^{\prime}\right)F_{k,j}^{\prime}\right).

Using $F_{k,i}=\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}v_{n_{k}}^{\perp\ \prime}$, we have, for $i\neq j$,

Fk,iFk,i\displaystyle F_{k,i}F_{k,i}^{\prime} =e~i(vnkvnk)e~i=e~i(Inkvnkvnk)e~i=1nk1\displaystyle=\tilde{e}_{i}^{\prime}\left(v_{n_{k}}^{\perp}v_{n_{k}}^{\perp\ \prime}\right)\tilde{e}_{i}=\tilde{e}_{i}^{\prime}\left(I_{n_{k}}-v_{n_{k}}v_{n_{k}}^{\prime}\right)\tilde{e}_{i}=1-n_{k}^{-1}
Fk,ie~ie~iFk,i\displaystyle F_{k,i}\tilde{e}_{i}\tilde{e}_{i}^{\prime}F_{k,i}^{\prime} =(e~ivnkvnke~i)(e~ivnkvnke~i)=(1nk1)2\displaystyle=\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}v_{n_{k}}^{\perp\ \prime}\tilde{e}_{i}\right)\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}v_{n_{k}}^{\perp\ \prime}\tilde{e}_{i}\right)=\left(1-n_{k}^{-1}\right)^{2}
Fk,ie~je~iFk,j\displaystyle F_{k,i}\tilde{e}_{j}\tilde{e}_{i}^{\prime}F_{k,j}^{\prime} =(e~ivnkvnke~j)(e~ivnkvnke~j)=(nk1)2=nk2,\displaystyle=\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}v_{n_{k}}^{\perp\ \prime}\tilde{e}_{j}\right)\left(\tilde{e}_{i}^{\prime}v_{n_{k}}^{\perp}v_{n_{k}}^{\perp\ \prime}\tilde{e}_{j}\right)=\left(-n_{k}^{-1}\right)^{2}=n_{k}^{-2},

from which we obtain

𝔼[Wk,i2Uk,i2Fk,i(UkUk)Fk,i]\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}F_{k,i}\left(U_{k}U_{k}^{\prime}\right)F_{k,i}^{\prime}\right]
=\displaystyle= 𝔼[pnkqnkWk,i2Uk,i2(Uk,pUk,q)Fk,i(e~pe~q)Fk,i]\displaystyle\mathbb{E}\left[\sum_{p}^{n_{k}}\sum_{q}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}\left(U_{k,p}U_{k,q}\right)F_{k,i}\left(\tilde{e}_{p}\tilde{e}_{q}^{\prime}\right)F_{k,i}^{\prime}\right]
=\displaystyle= 𝔼[Wk,i2Uk,i4Fk,i(e~ie~i)Fk,i]+pink𝔼[Wk,i2Uk,i2Uk,p2Fk,i(e~pe~p)Fk,i]\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{4}F_{k,i}\left(\tilde{e}_{i}\tilde{e}_{i}^{\prime}\right)F_{k,i}^{\prime}\right]+\sum_{p\neq i}^{n_{k}}\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}U_{k,p}^{2}F_{k,i}\left(\tilde{e}_{p}\tilde{e}_{p}^{\prime}\right)F_{k,i}^{\prime}\right]
=\displaystyle= 3ϕk,iFk,i(e~ie~i)Fk,i+pinkψk,iFk,i(e~pe~p)Fk,i\displaystyle 3\phi_{k,i}F_{k,i}\left(\tilde{e}_{i}\tilde{e}_{i}^{\prime}\right)F_{k,i}^{\prime}+\sum_{p\neq i}^{n_{k}}\psi_{k,i}F_{k,i}\left(\tilde{e}_{p}\tilde{e}_{p}^{\prime}\right)F_{k,i}^{\prime}
=\displaystyle= 3ϕk,iFk,i(e~ie~i)Fk,i+ψk,ink2(nk1)\displaystyle 3\phi_{k,i}F_{k,i}\left(\tilde{e}_{i}\tilde{e}_{i}^{\prime}\right)F_{k,i}^{\prime}+\psi_{k,i}n_{k}^{-2}\left(n_{k}-1\right)
=\displaystyle= 3ϕk,i(1nk1)2+ψk,ink2(nk1).\displaystyle 3\phi_{k,i}\left(1-n_{k}^{-1}\right)^{2}+\psi_{k,i}n_{k}^{-2}\left(n_{k}-1\right).

So

i=1nk𝔼[Wk,i2Uk,i2Fk,i(UkUk)Fk,i]=3nkϕ¯k(1nk1)2+ψ¯k(1nk1),\sum_{i=1}^{n_{k}}\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}F_{k,i}\left(U_{k}U_{k}^{\prime}\right)F_{k,i}^{\prime}\right]=3n_{k}\bar{\phi}_{k}\left(1-n_{k}^{-1}\right)^{2}+\bar{\psi}_{k}\left(1-n_{k}^{-1}\right),

and for iji\neq j, we have

𝔼[Wk,iWk,j(Uk,iUk,j)Fk,i(UkUk)Fk,j]\displaystyle\mathbb{E}\left[W_{k,i}W_{k,j}\left(U_{k,i}U_{k,j}\right)F_{k,i}\left(U_{k}U_{k}^{\prime}\right)F_{k,j}^{\prime}\right]
=\displaystyle= 𝔼[pnkqnkWk,iWk,j(Uk,iUk,j)(Uk,pUk,q)Fk,i(e~pe~q)Fk,j]\displaystyle\mathbb{E}\left[\sum_{p}^{n_{k}}\sum_{q}^{n_{k}}W_{k,i}W_{k,j}\left(U_{k,i}U_{k,j}\right)\left(U_{k,p}U_{k,q}\right)F_{k,i}\left(\tilde{e}_{p}\tilde{e}_{q}^{\prime}\right)F_{k,j}^{\prime}\right]
=\displaystyle= 𝔼[Wk,iWk,j(Uk,i2Uk,j2)Fk,i(e~ie~j)Fk,j]+𝔼[Wk,iWk,j(Uk,i2Uk,j2)Fk,i(e~je~i)Fk,j]\displaystyle\mathbb{E}\left[W_{k,i}W_{k,j}\left(U_{k,i}^{2}U_{k,j}^{2}\right)F_{k,i}\left(\tilde{e}_{i}\tilde{e}_{j}^{\prime}\right)F_{k,j}^{\prime}\right]+\mathbb{E}\left[W_{k,i}W_{k,j}\left(U_{k,i}^{2}U_{k,j}^{2}\right)F_{k,i}\left(\tilde{e}_{j}\tilde{e}_{i}^{\prime}\right)F_{k,j}^{\prime}\right]
=\displaystyle= Fk,i(e~ie~j+e~je~i)Fk,j=(1nk1)2+nk2.\displaystyle F_{k,i}\left(\tilde{e}_{i}\tilde{e}_{j}^{\prime}+\tilde{e}_{j}\tilde{e}_{i}^{\prime}\right)F_{k,j}^{\prime}=\left(1-n_{k}^{-1}\right)^{2}+n_{k}^{-2}.

So,

𝔼[jinkWk,iWk,j(Uk,iUk,j)Fk,i(UkUk)Fk,j]=(nk1)[(1nk1)2+nk2],\mathbb{E}\left[\sum_{j\neq i}^{n_{k}}W_{k,i}W_{k,j}\left(U_{k,i}U_{k,j}\right)F_{k,i}\left(U_{k}U_{k}^{\prime}\right)F_{k,j}^{\prime}\right]=\left(n_{k}-1\right)\left[\left(1-n_{k}^{-1}\right)^{2}+n_{k}^{-2}\right],

and this leads to

𝔼(S¯k2)\displaystyle\mathbb{E}\left(\bar{S}_{k}^{2}\right) =3nkϕ¯k(1nk1)2+ψ¯k(1nk1)+nk(nk1)[(1nk1)2+nk2](nk1)2,\displaystyle=3n_{k}\bar{\phi}_{k}\left(1-n_{k}^{-1}\right)^{2}+\bar{\psi}_{k}\left(1-n_{k}^{-1}\right)+n_{k}\left(n_{k}-1\right)\left[\left(1-n_{k}^{-1}\right)^{2}+n_{k}^{-2}\right]-\left(n_{k}-1\right)^{2},
𝔼(Sk2)\displaystyle\mathbb{E}\left(S_{k}^{2}\right) =λk2(nk1)2𝔼(S¯k2)=λk2nk1[3ϕ¯k1+(ψ¯k+1)(nk1)1],\displaystyle=\lambda_{k}^{-2}\left(n_{k}-1\right)^{-2}\mathbb{E}\left(\bar{S}_{k}^{2}\right)=\lambda_{k}^{-2}n_{k}^{-1}\left[3\bar{\phi}_{k}-1+\left(\bar{\psi}_{k}+1\right)\left(n_{k}-1\right)^{-1}\right],

and, defining the diagonal matrix $\Xi$ by $\Xi_{kk}=\mathbb{E}\left(S_{k}^{2}\right)$ and $\Xi_{kl}=\mathbb{E}\left(S_{k}S_{l}\right)=0$ for $k\neq l$, we have $\mathcal{I}_{A}^{(2)}=\frac{1}{4}E_{d}^{\prime}\Xi E_{d}$.
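Two ingredients of this subsection can be checked mechanically: the projection identities for $F_{k,i}$ and the simplification from $\mathbb{E}(\bar{S}_{k}^{2})$ to $\mathbb{E}(S_{k}^{2})$. A sketch (assuming Python with NumPy and SymPy; the block size is a placeholder, and $v_{n_{k}}^{\perp}$ is taken to be an arbitrary orthonormal complement of $v_{n_{k}}$, which is all the identities require) is:

import numpy as np
import sympy as sp

# --- projection identities for F_{k,i} = e_i' v_perp v_perp' ---
nk = 6                                                   # block size (placeholder)
rng = np.random.default_rng(0)
ones = np.ones((nk, 1))
Q, _ = np.linalg.qr(np.hstack([ones, rng.standard_normal((nk, nk - 1))]))
v_perp = Q[:, 1:]                                        # orthonormal complement of v_n
P = v_perp @ v_perp.T                                    # = I_n - v_n v_n'
I_n = np.eye(nk)
i, j = 1, 4
F_i, F_j = P[i, :], P[j, :]

assert np.isclose(F_i @ F_i, 1 - 1 / nk)
assert np.isclose((F_i @ I_n[:, i]) ** 2, (1 - 1 / nk) ** 2)
assert np.isclose((F_i @ I_n[:, j]) * (F_j @ I_n[:, i]), nk ** -2)

# --- simplification of E(S_k^2) ---
n, phi, psi, lam = sp.symbols("n_k phi_bar psi_bar lambda_k", positive=True)
ES_bar2 = (3 * n * phi * (1 - 1 / n) ** 2 + psi * (1 - 1 / n)
           + n * (n - 1) * ((1 - 1 / n) ** 2 + n ** -2) - (n - 1) ** 2)
target = (3 * phi - 1 + (psi + 1) / (n - 1)) / (lam ** 2 * n)
assert sp.simplify(ES_bar2 / (lam ** 2 * (n - 1) ** 2) - target) == 0
print("projection identities and E(S_k^2) simplification verified")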

C.3 The Form of Matrix A(12)\mathcal{I}_{A}^{(12)}

We first need to compute

\displaystyle\mathbb{E}\left\{\left[\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}F_{k,i}U_{k}-\left(n_{k}-1\right)\right]\left[\sum_{l=1}^{K}\sum_{j=1}^{n_{l}}W_{l,j}U_{l,j}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}-{\rm vec}(I_{K})^{\prime}\right]\right\}
=\displaystyle= 𝔼[l=1Ki=1nkj=1nlWk,iWl,jUk,iUl,j(Fk,iUk)vec(elX0)nl12](nk1)vec(IK)\displaystyle\mathbb{E}\left[\sum_{l=1}^{K}\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right]-\left(n_{k}-1\right){\rm vec}(I_{K})^{\prime}
=\displaystyle= 𝔼[i=1nkWk,i2Uk,i2(Fk,iUk)vec(ekX0)nk12](nk1)vec(IK)\displaystyle\mathbb{E}\left[\sum_{i=1}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]-\left(n_{k}-1\right){\rm vec}(I_{K})^{\prime}
+𝔼[i=1nkjinkWk,iWk,jUk,iUk,j(Fk,iUk)vec(ekX0)nk12]\displaystyle+\mathbb{E}\left[\sum_{i=1}^{n_{k}}\sum_{j\neq i}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]
+𝔼[lkKi=1nkj=1nlWk,iWl,jUk,iUl,j(Fk,iUk)vec(elX0)nl12].\displaystyle+\mathbb{E}\left[\sum_{l\neq k}^{K}\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right].

For the first term, we have

𝔼[Wk,i2Uk,i2(Fk,iUk)vec(ekX0)nk12]\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{2}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[qKpnkWk,i2Uk,i2(Fk,iUk,pe~p)vec(ekUqvnqeq)nk12]\displaystyle\mathbb{E}\left[\sum_{q}^{K}\sum_{p}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}\left(F_{k,i}U_{k,p}\tilde{e}_{p}\right){\rm vec}\left(e_{k}U_{q}^{\prime}v_{n_{q}}e_{q}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[pnkWk,i2Uk,i2Uk,p(Fk,ie~p)vec(ekUkvnkek)nk12]\displaystyle\mathbb{E}\left[\sum_{p}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}U_{k,p}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[pnkrnkWk,i2Uk,i2Uk,pUk,r(Fk,ie~r)vec(ekek)nk1]\displaystyle\mathbb{E}\left[\sum_{p}^{n_{k}}\sum_{r}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}U_{k,p}U_{k,r}\left(F_{k,i}\tilde{e}_{r}\right){\rm vec}\left(e_{k}e_{k}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= 𝔼[Wk,i2Uk,i4(Fk,ie~i)vec(ekek)nk1]+𝔼[pinkWk,i2Uk,i2Uk,p2(Fk,ie~p)vec(ekek)nk1]\displaystyle\mathbb{E}\left[W_{k,i}^{2}U_{k,i}^{4}\left(F_{k,i}\tilde{e}_{i}\right){\rm vec}\left(e_{k}e_{k}^{\prime}\right)^{\prime}n_{k}^{-1}\right]+\mathbb{E}\left[\sum_{p\neq i}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}U_{k,p}^{2}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{k}e_{k}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= 3ϕk,i[(Fk,ie~i)vec(Jke)nk1]+pinkψk,i(Fk,ie~p)vec(Jke)nk1\displaystyle 3\phi_{k,i}\left[\left(F_{k,i}\tilde{e}_{i}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}\right]+\sum_{p\neq i}^{n_{k}}\psi_{k,i}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}
=\displaystyle= 3ϕk,i(1nk1)vec(Jke)nk1ψk,i(nk1)nk1vec(Jke)nk1\displaystyle 3\phi_{k,i}\left(1-n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}-\psi_{k,i}\left(n_{k}-1\right)n_{k}^{-1}{\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}
=\displaystyle= (3ϕk,iψk,i)(1nk1)vec(Jke)nk1.\displaystyle\left(3\phi_{k,i}-\psi_{k,i}\right)\left(1-n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}.

So

\displaystyle\mathbb{E}\left[\sum_{i=1}^{n_{k}}W_{k,i}^{2}U_{k,i}^{2}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]=\left(3\bar{\phi}_{k}-\bar{\psi}_{k}\right)\left(1-n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime},

and for the second term, where $i\neq j$, we have

𝔼[Wk,iWk,jUk,iUk,j(Fk,iUk)vec(ekX0)nk12]\displaystyle\mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}U_{k,j}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[q=1Kp=1nkWk,iWk,jUk,iUk,jUk,p(Fk,ie~p)vec(ekUqvnqeq)nk12]\displaystyle\mathbb{E}\left[\sum_{q=1}^{K}\sum_{p=1}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}U_{k,p}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{k}U_{q}^{\prime}v_{n_{q}}e_{q}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[p=1nkWk,iWk,jUk,iUk,jUk,p(Fk,ie~p)vec(ekUkvnkek)nk12]\displaystyle\mathbb{E}\left[\sum_{p=1}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}U_{k,p}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{k}U_{k}^{\prime}v_{n_{k}}e_{k}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[p=1nkr=1nkWk,iWk,jUk,iUk,jUk,pUk,r(Fk,ie~p)vec(ekek)nk1]\displaystyle\mathbb{E}\left[\sum_{p=1}^{n_{k}}\sum_{r=1}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}U_{k,p}U_{k,r}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{k}e_{k}^{\prime}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= \mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}^{2}U_{k,j}^{2}\left(F_{k,i}\tilde{e}_{i}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}\right]+\mathbb{E}\left[W_{k,i}W_{k,j}U_{k,i}^{2}U_{k,j}^{2}\left(F_{k,i}\tilde{e}_{j}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}\right]
=\displaystyle= (Fk,ie~i)vec(Jke)nk1+(Fk,ie~j)vec(Jke)nk1\displaystyle\left(F_{k,i}\tilde{e}_{i}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}+\left(F_{k,i}\tilde{e}_{j}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}
=\displaystyle= (1nk1)vec(Jke)nk1nk1vec(Jke)nk1\displaystyle\left(1-n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}-n_{k}^{-1}{\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}
=\displaystyle= (12nk1)vec(Jke)nk1.\displaystyle\left(1-2n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}n_{k}^{-1}.

So

𝔼[i=1nkjinkWk,iWk,jUk,iUk,j(Fk,iUk)vec(ekX0)nk12]=(nk1)(12nk1)vec(Jke),\mathbb{E}\left[\sum_{i=1}^{n_{k}}\sum_{j\neq i}^{n_{k}}W_{k,i}W_{k,j}U_{k,i}U_{k,j}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}\right]=\left(n_{k}-1\right)\left(1-2n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime},

and for the third term, where $k\neq l$, we have

𝔼[Wk,iWl,jUk,iUl,j(Fk,iUk)vec(elX0)nl12]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}U_{l,j}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[q=1Kp=1nkWk,iWl,jUk,iUl,jUk,p(Fk,ie~p)vec(elUqvnqeq)nl12]\displaystyle\mathbb{E}\left[\sum_{q=1}^{K}\sum_{p=1}^{n_{k}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}U_{k,p}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{l}U_{q}^{\prime}v_{n_{q}}e_{q}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[p=1nkWk,iWl,jUk,iUl,jUk,p(Fk,ie~p)vec(elUlvnlel)nl12]\displaystyle\mathbb{E}\left[\sum_{p=1}^{n_{k}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}U_{k,p}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{l}U_{l}^{\prime}v_{n_{l}}e_{l}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right]
=\displaystyle= 𝔼[p=1nkr=1nlWk,iWl,jUk,iUk,pUl,jUl,r(Fk,ie~p)vec(elel)nl1]\displaystyle\mathbb{E}\left[\sum_{p=1}^{n_{k}}\sum_{r=1}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{k,p}U_{l,j}U_{l,r}\left(F_{k,i}\tilde{e}_{p}\right){\rm vec}\left(e_{l}e_{l}^{\prime}\right)^{\prime}n_{l}^{-1}\right]
=\displaystyle= 𝔼[Wk,iWl,jUk,i2Ul,j2(Fk,ie~i)vec(elel)nl1]\displaystyle\mathbb{E}\left[W_{k,i}W_{l,j}U_{k,i}^{2}U_{l,j}^{2}\left(F_{k,i}\tilde{e}_{i}\right){\rm vec}\left(e_{l}e_{l}^{\prime}\right)^{\prime}n_{l}^{-1}\right]
=\displaystyle= (Fk,ie~i)vec(Jle)nl1\displaystyle\left(F_{k,i}\tilde{e}_{i}\right){\rm vec}\left(J_{l}^{e}\right)^{\prime}n_{l}^{-1}
=\displaystyle= (1nk1)vec(Jle)nl1.\displaystyle\left(1-n_{k}^{-1}\right){\rm vec}\left(J_{l}^{e}\right)^{\prime}n_{l}^{-1}.

So, we have

𝔼[lkKi=1nkj=1nlWk,iWl,jUk,iUl,j(Fk,iUk)vec(elX0)nl12]=(nk1)vec(IKJke),\mathbb{E}\left[\sum_{l\neq k}^{K}\sum_{i=1}^{n_{k}}\sum_{j=1}^{n_{l}}W_{k,i}W_{l,j}U_{k,i}U_{l,j}\left(F_{k,i}U_{k}\right){\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}\right]=\left(n_{k}-1\right){\rm vec}\left(I_{K}-J_{k}^{e}\right)^{\prime},

and

\displaystyle\mathbb{E}\left\{\left[\sum_{i=1}^{n_{k}}W_{k,i}U_{k,i}F_{k,i}U_{k}-\left(n_{k}-1\right)\right]\left[\sum_{l=1}^{K}\sum_{j=1}^{n_{l}}W_{l,j}U_{l,j}{\rm vec}\left(e_{l}X_{0}^{\prime}\right)^{\prime}n_{l}^{-\frac{1}{2}}-{\rm vec}(I_{K})^{\prime}\right]\right\}
=\displaystyle= (3ϕ¯kψ¯k)(1nk1)vec(Jke)+(nk1)(12nk1)vec(Jke)\displaystyle\left(3\bar{\phi}_{k}-\bar{\psi}_{k}\right)\left(1-n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}+\left(n_{k}-1\right)\left(1-2n_{k}^{-1}\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}
+(nk1)vec(IKJke)(nk1)vec(IK)\displaystyle+\left(n_{k}-1\right){\rm vec}\left(I_{K}-J_{k}^{e}\right)^{\prime}-\left(n_{k}-1\right){\rm vec}(I_{K})^{\prime}
=\displaystyle= [(3ϕ¯kψ¯k2)(1nk1)]vec(Jke).\displaystyle\left[\left(3\bar{\phi}_{k}-\bar{\psi}_{k}-2\right)\left(1-n_{k}^{-1}\right)\right]{\rm vec}\left(J_{k}^{e}\right)^{\prime}.

We finally arrive at the following expression for Θ\Theta, with A(21)=12EdΘEd\mathcal{I}_{A}^{(21)}=\frac{1}{2}E_{d}^{\prime}\Theta E_{d},

Θ=𝔼[S(k=1Kj=1nkWk,jUk,jvec(ekX0)nk12vec(IK))]=k=1Kek(ekΘ),\Theta=\mathbb{E}\left[S\left(\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}W_{k,j}U_{k,j}{\rm vec}\left(e_{k}X_{0}^{\prime}\right)^{\prime}n_{k}^{-\frac{1}{2}}-{\rm vec}(I_{K})^{\prime}\right)\right]=\sum_{k=1}^{K}e_{k}\left(e_{k}^{\prime}\Theta\right),

where its kk-th row ekΘe_{k}^{\prime}\Theta is given by

ekΘ\displaystyle e_{k}^{\prime}\Theta =(nk1)1λk1[(3ϕ¯kψ¯k2)(1nk1)]vec(Jke)\displaystyle=-\left(n_{k}-1\right)^{-1}\lambda_{k}^{-1}\left[\left(3\bar{\phi}_{k}-\bar{\psi}_{k}-2\right)\left(1-n_{k}^{-1}\right)\right]{\rm vec}\left(J_{k}^{e}\right)^{\prime}
=λk1nk1(3ϕ¯kψ¯k2)vec(Jke).\displaystyle=-\lambda_{k}^{-1}n_{k}^{-1}\left(3\bar{\phi}_{k}-\bar{\psi}_{k}-2\right){\rm vec}\left(J_{k}^{e}\right)^{\prime}.
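The coefficient algebra in the last two displays can be verified symbolically as well; a short sketch (assuming SymPy) checks that the $\mathrm{vec}(J_{k}^{e})^{\prime}$ contributions combine to $(3\bar{\phi}_{k}-\bar{\psi}_{k}-2)(1-n_{k}^{-1})$ and that the scaling by $-(n_{k}-1)^{-1}\lambda_{k}^{-1}$ gives the stated row of $\Theta$:

import sympy as sp

n, phi, psi, lam = sp.symbols("n_k phi_bar psi_bar lambda_k", positive=True)

# coefficient on vec(J_k^e)': the first two terms plus the net -(n_k - 1)
# coming from the vec(I_K - J_k^e)' and -vec(I_K)' parts
combined = (3 * phi - psi) * (1 - 1 / n) + (n - 1) * (1 - 2 / n) - (n - 1)
assert sp.simplify(combined - (3 * phi - psi - 2) * (1 - 1 / n)) == 0

# scaling by -(n_k - 1)^{-1} lambda_k^{-1} reduces to -lambda_k^{-1} n_k^{-1} (...)
row_coeff = -combined / ((n - 1) * lam)
assert sp.simplify(row_coeff + (3 * phi - psi - 2) / (lam * n)) == 0
print("Theta row coefficients verified")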