Cluster GARCH
Abstract
We introduce a novel multivariate GARCH model with flexible convolution- distributions that is applicable in high-dimensional systems. The model is called Cluster GARCH because it can accommodate cluster structures in the conditional correlation matrix and in the tail dependencies. The expressions for the log-likelihood function and its derivatives are tractable, and the latter facilitate a score-drive model for the dynamic correlation structure. We apply the Cluster GARCH model to daily returns for 100 assets and find it outperforms existing models, both in-sample and out-of-sample. Moreover, the convolution- distribution provides a better empirical performance than the conventional multivariate -distribution.
Keywords: Multivariate GARCH, Score-Driven Model, Cluster Structure, Block Correlation Matrix, Heavy Tailed Distributions.
JEL Classification: G11, G17, C32, C58
1 Introduction
Univariate GARCH models have enjoyed considerable empirical success since they were introduced in Engle, (1982) and refined in Bollerslev, (1986). In contrast, the success of multivariate GARCH models has been more moderate due to a number of challenges, see e.g. Bauwens et al., (2006). A common approach to modeling covariance matrices is to model variances and correlations separately, as is the case in the Constant Conditional Correlation (CCC) model by Bollerslev, (1990) and the Dynamic Conditional Correlation (DCC) model by Engle, (2002). See also Engle and Sheppard, (2001), Tse and Tsui, (2002), Aielli, (2013), Engle et al., (2019), and Pakel et al., (2021). While univariate conditional variances can be effectively modeled using standard GARCH models, the modeling of dynamic conditional correlation matrices necessitates less intuitive choices to be made. One challenge is that the number of correlations increases with the square of the number of variables, a second challenge is that the conditional correlation matrix must be positive semidefinite, and a third challenge is to determine how correlations should be updated in response to sample information.
In this paper, we develop a novel dynamic model of the conditional correlation matrix, the Cluster GARCH model, which has three main features. First, use convolution- distributions, which is a flexible class of multivariate heavy-tailed distributions with tractable likelihood expressions. The multivariate -distributions are nested in this framework, but a convolution- distribution can have heterogeneous marginal distributions and cluster-based dependencies. For instance, convolution- distributions can generate the type of sector-specific price jumps reported in Andersen et al., (2024). Second, the dynamic model is based on the score-driven framework by Creal et al., (2013), which leads to closed-form expressions for all key quantities. Third, the model can be combined with a block correlation structure that makes the model applicable to high-dimensional systems. This partitioning, defining the block structure, can also be interpreted as a second type of cluster structure.
Heavy-tailed distributions are common in financial returns, and many empirical studies adopt the multivariate -distribution to model vectors of financial series, e.g., Kotz and Nadarajah, (2004), Harvey, (2013), and Ibragimov et al., (2015). An implication of the multivariate -distribution is that all standardized returns have identical and time-invariant marginal distributions. This is a restrictive assumption, especially in high dimensions. The convolution- distributions by Hansen and Tong, (2024) relax these assumptions, and one of the main contributions of this paper is to incorporate this class of distributions into a tractable multivariate GARCH model. A convolution- distribution is a convolution of multivariate -distributions. In the Cluster GARCH model, standardized returns are time-varying linear combinations of independent -distributions, which can have different degrees of freedom. This leads to dynamic and heterogeneous marginal distributions for standardized returns, albeit the conventional multivariate -distribution is nested in this framework as a special case. We focus on three particular types of convolution- distributions, labelled Canonical-Block-, Cluster-, and Hetero-. These all have relatively simple log-likelihood functions, such that we can obtain closed-form expressions for the first two derivatives, score and information matrix, of the conditional log-likelihood functions. These are used in our score-driven model for the time-varying correlation structure, which is a key component of the Cluster GARCH model.
High-dimensional correlation matrices can be modeled using a parsimonious block structure for the conditional correlation matrix. The DCC model is parsimonious but greatly limits the way the conditional covariance matrix can be updated. Without additional structure, the number of latent variables increases with , where is the number of assets. This number becomes unmanageable once is more than a single digit, and maintaining a positive definite correlation matrix can be challenging too. The correlation structure in the Block DECO model by Engle and Kelly, (2012) is an effective way to reduce the dimension of the estimated parameters. However, the estimation strategy in Engle and Kelly, (2012) was based on an ad-hoc averaging of within-block correlations for an auxiliary DCC model, and they did not fully utilize the simplifications offered by the block structure.111They derived likelihood expressions for the case with blocks. For more two blocks, , they resort to a composite likelihood evaluation. The model proposed in this paper draws on recent advances in correlation matrix analysis by Archakov and Hansen, (2021, 2024). We will, in some specifications, adopt the block parameterization of the conditional correlation matrix, used in Archakov et al., (2020), which has (at most) free parameters where is the number of blocks. This approach guarantees a positive definite correlation matrix and the likelihood evaluation is greatly simplified. Overall, the Cluster GARCH offers a good balance between flexibility and computational feasibility in high dimensions.
We adopt the convenient parametrization of the conditional correlation matrix, , which is defined by taking the matrix logarithm of the correlation matrix, , and stacking the off-diagonal elements of into the vector, , where . This parametrization was introduced in Archakov and Hansen, (2021) and the mapping is one-to-one between the set of non-singular correlation matrices and . So, the inverse mapping, , will always yield a positive definite correlation matrix and any non-singular correlation matrix can be generated in this way. The parametrization can be viewed as a generalization of Fisher’s Z-transformation to the multivariate case. It has attractive finite sample properties, which makes it suitable for an autoregressive model structure, see Archakov and Hansen, (2021).
A block correlation structure arises when variables can be partitioned into clusters, say, and the correlation between two variables is determined by their cluster assignments. When has a block structure, then also has a block structure. This leads to a new parametrization of block correlation matrices, which defines a one-to-one mapping between the set of non-singular block correlation matrices and with . We adopt the canonical representation by Archakov and Hansen, (2024), which is a quasi-spectral decomposition of block matrices that diagonalizes the matrix with the exception of a small submatrix. This decomposition makes the model parsimonious and greatly simplifies the evaluations of the log-likelihood function. This parameterization of block correlation matrices is more general than the factor-based approach to parametrizing block correlation matrices.222The factor-induced block structure, see Creal and Tsay, (2015), Opschoor et al., (2021), and Oh and Patton, (2023), entails superfluous restrictions on , see Tong and Hansen, (2023). Both approach simplifies the computation of and , but only the parametrization based on the canonical representation simplifies the evaluation of the likelihood function for the convolution- distributions.
Our paper contributes to the literature on score-driven model for dynamics of covariance matrices. Using the multivariate -distribution, Creal et al., (2012) and Hafner and Wang, (2023) proposed score-driven model for time-varying covariance and correlation matrix, respectively.333The model by Hafner and Wang, (2023) update parameters using the unscaled score, i.e., they did not use the information matrix. Oh and Patton, (2023) proposed a score-driven dynamic factor copula models with skew- copula function, however, the analytical information matrices in these copula models are not available. Using realized measures of the covariance matrix, Gorgi et al., (2019) proposed the Realized Wishart-GARCH, which relies on a Wishart distribution for realized covariance matrices and on a Gaussian distribution for returns. Opschoor et al., (2017) constructed a multivariate HEAVY model based on Heavy-tailed distributions for both returns and the realized covariances. An aspect, which sets the Cluster GARCH apart from the existing literature, is that the model is based on the convolution- distributions, which includes the Gaussian distribution and the multivariate -distributions as special cases. The block structures we impose on the correlation matrix in some specifications, was previously used in Archakov et al., (2020). Their model used the Realized GARCH framework with a Gaussian specification, whereas we adopt the score-driven framework for convolution- distributions, and do require realized volatility measures in the modeling.
We conduct an extensive empirical investigation on the performance of our dynamic model for correlation matrices. The sample period spans the period from January 3, 2005 to December 31, 2021. The new model is applicable to high dimensions, and we consider a “small universe” with assets and a “large universe” with assets. The small universe allows us compare the new models with a range of existing models, as most of these are not applicable to the large universe. We also undertake an more detailed specification analysis with the small universe. The nine stocks are from three sectors, three from each sector, which motivates certain block and cluster structures. First, we find that the convolution- distribution offers a better fit than the conventional -distribution. Overall, the Cluster- distribution has the largest log-likelihood value. Second, we find that score-driven models successfully captures the dynamic variation in the conditional correlation matrix. The new score-driven models outperform traditional DCC models when based on the same distributional assumptions, and the proposed score-driven model with a sector motivated block correlation matrix has the smallest BIC.
The large universe with stocks poses no obstacles for the Cluster GARCH model. We used the sector classification of the stocks to define the block structure in the correlation matrix. We also used the sector classification to explore possible cluster structures in the tail-dependencies, which are related to parameters in the convolution- distribution. With sectors this reduces the number of free parameters in the correlation matrix from 4950 to 55, and the model estimation is very fast and stable, in part because the required computations only involve matrices (instead of matrices). For the large universe, the empirical results favor the Hetero- specification, which entails a convolutions of a large number of univariate -distributions. We also find that correlation targeting, which is analogous to variance targeting in GARCH models, is beneficial.
The rest of this paper is organized as follows: In Section 2 we introduce a new parametrization of block correlation matrices, based on Archakov and Hansen, (2021) and Archakov and Hansen, (2024). In Section 3, we introduce the convolution- distributions. We derive the score-driven models in Section 4, and we obtain analytical expressions for the score and information matrix for the convolution- distributions, including the special case where has a block structure. Some details about practical implementation are given in Section 5. The empirical analysis is presented in Section 6 and includes in-sample and out-of-sample evaluations and comparisons. All proofs are given in the Appendix.
2 The Theoretical Model
Consider an -dimensional time-series, , , and let be a filtration to which is adapted, i.e. . We denote the conditional mean by and the conditional covariance matrix by . With , where , , it follows that the conditional correlation matrix is given by
Initially, we take and as given and focus on the dynamic modeling of . We are particularly interested in the case where is large. We define the following standardized variables with a dynamic correlation matrix ,
To simplify the notation, we omit subscript- in most of Sections 2 and 3 and reintroduce it again in Section 4 where the dynamic model is presented.
2.1 Block Correlation Matrix
If is relatively small, we can model the dynamic correlation matrix using latent variables. Additional structure on is required when is larger, because the number of latent variables becomes unmanageable. Additional structure can be imposed using a block structures on , as in Engle and Kelly, (2012).
A block correlation matrix is characterized by a partitioning of the variables into clusters, such that the correlation between two variables is solely determined by their cluster assignments. Let be the number of clusters, and let be the number of variables in the -th cluster, , such that . We let be the vector with cluster sizes and sort the variables such that the first variables are those in the first cluster , the next variables are those in the second cluster, and so forth. Then will have the following block structure
(1) |
where is an matrix given by
Each block, , has just one correlation coefficient, such that the block structure reduces the number of unique correlations from to at most .444This is based on the general case that the number of assets in each group is at least two. When there are groups with only one asset, this number become . The reason for the distinction between these two cases is that an diagonal block has no correlation coefficients. This number does not increase with , and this makes it possible to scale the model to accommodate high-dimensional correlation matrices.
Below we derive score-driven models for unrestricted correlation matrices and for the case where has a block structure. time].555It is unproblematic to extend the model to allow for some missing observations and occasional changes in the cluster assignments.
2.2 Parametrizing the Correlation Matrix
We parameterize the correlation matrix with the vector
(2) |
where extracts and vectorizes the elements below the diagonal and is the matrix logarithm of the correlation matrix.666For a nonsingular correlation matrix, we have , where is the spectral decomposition of , so that is a diagonal matrix with the eigenvalues of . The following example illustrates this parametrization for an correlation matrix:
This parametrization is convenient because it guarantees a unique positive definiteness correlation matrix, for any vector without imposing superfluous restrictions on the correlation matrix, see Archakov and Hansen, (2021).
For a block correlation matrix the logarithmic transformation preserves the block structure as illustrated in the following example:
The parameter vector, will only have as many unique elements as there are different blocks in . This number is , and we can therefore condense into a subvector, , such that
(3) |
where is a known bit-matrix with a single one in each row and . This factor structure for was first proposed in Archakov et al., (2020).
For later use, we define the condensed log-correlation matrix, , whose )-th element is the off-diagonal element from the -th block of , , and we can set . In the example above, we have
such that has dimension six whereas has dimension 21. Since the block correlation matrix, , is only a function of we can model the time-variation in using a dynamic model for the unrestricted vector . This will be our approach below.
2.3 Canonical Form for the Block Correlation Matrix
Block matrices has a canonical representation that resembles the eigendecomposition of matrices, see Archakov and Hansen, (2024). For a block correlation matrix with block-sizes, , we have
(4) |
where the upper left block, , is an matrix with elements , for and . The matrix is a group-specific orthonormal matrix, i.e., . Importantly, is solely determined by the block sizes, , and does not depend on the elements in . This matrix is given by
where and is an matrix, which is orthogonal to , i.e., , and orthonormal, such that 777The Gram-Schmidt process can be used to obtain from . The canonical representation enables us to rotate with and define
(5) |
where is -dimensional with , and is dimensional with for . The block-diagonal structure of implies that , and are uncorrelated, which simplifies several expressions. For instance, we have the following identities:
(6) |
such that the computation of the determinant and any power of is greatly simplified. The square-root of the correlation matrix, , is straight forward to compute. From the eigendecomposition of , , we define the block diagonal matrix: , and set . It is easy to verify that and that is symmetric. Computing therefore only requires an eigendecomposition of the symmetric and positive definite matrix, , rather than the eigendecomposition of , which is . Computing other power of can be done similarly.
We can use Archakov and Hansen, (2024, corollary 2) to recover the elements of the condensed log-correlation matrix,
where
The unique values in , which are the elements in , can be expressed as
where is the elimination matrix, that solves . This parametrization of block correlation matrices does not impose additional superfluous restrictions, and the canonical representation facilitates simple computation of the determinant, the matrix inverse, and any other power, as well as the matrix logarithm and the matrix exponential. This is very useful for the evaluation of the likelihood function, especially for the more complicated models with heterogeneous heavy tails and complex dependencies, which we pursue in the next section.
3 Distributions
The next step is to specify a distribution for the -dimensional random vector , from which the log-likelihood function, , is defined. We consider several specifications, ranging from the multivariate normal distribution to convolutions of multivariate -distributions. The convolution- distributions by Hansen and Tong, (2024) have simple log-likelihood functions and the canonical representation of a block correlation matrix motivates some particular specifications of the convolution- distribution.
We define
such that ,888An advantage of having defined from the eigendecomposition, is that the normalized variables in are invariant to reordering of the elements in , which would not be the case if a Cholesky form was used to define . and a convenient property of any log-likelihood function, , is that
(7) |
This shows that the log-likelihood function will be in closed-form if we adopt a distribution for with a closed-form expression for , and this is important for obtaining tractable score-driven models. It is well known that the multivariate -distribution and the Gaussian distribution have simple expression for . Fortunately, so does the multivariate convolution- distributions, which has different and interesting statistical properties for .
3.1 Multivariate -Distributions
We begin with the simplest heavy-tailed distribution, a scaled multivariate -distribution, which nests the Gaussian distribution as a limited case. The multivariate -distribution is widely used to model vectors of returns with heavy tailed distributions, see e.g. Creal et al., (2012), Opschoor et al., (2017), and Hafner and Wang, (2023).
The -dimensional multivariate -distribution with degrees of freedom, location , and scale matrix , typically written , has density
The variance is well-defined when , in which case . The parameter governs the heaviness of the tail and the multivariate -distribution converges to the multivariate normal distribution, , as .
To simplify the notation, we will use a scaled multivariate -distribution, denoted , which is defined for . Its density is given by,
(8) |
The relation between the two distributions is as follows: If with , then . The main advantage of the scaled -distribution is that . Thus, if then , and the corresponding log-likelihood function is given by
(9) |
where is a normalizing constant that does not depend on the correlation matrix, . If has a block structure we can use the identities in (6), and obtain the following simplified expression,
(10) | ||||
The multivariate -distribution has two implications for all elements of the vector . First, all elements of a multivariate -distribution are dependent, because they share a common random mixing variable. Second, all elements of are identically distributed, because they are -distributed with the same degrees of freedom. Both implications may be too restrictive in many applications, especially if the dimension, , is large. Below we consider the convolution- distribution proposed in Hansen and Tong, (2024), which allows for heterogeneity and cluster structures in the tail properties and the tail dependencies.
3.2 Multivariate Convolution- Distributions
The multivariate convolution- distribution is a suitable rotations of a random vector that is made up of independent multivariate -distributions. More specific, let be mutually independent standardized multivariate -distributed variables, , with for all and .
Then has the standardized convolution- distribution (with zero location vector and identity scale-rotation matrix) that is denoted by
where is the vector with degrees of freedom and is the vector with the dimensions for the multivariate -distributions. We can think of the partitioning of elements in as a second cluster structure, as we discuss below.
We will model the distribution of using , where is an orthonormal matrix, i.e. , and we use the notation . While , they will not have the same distribution, unless has a particular structure, such as . Similarly, we use the following notation for the distribution of
which is a convolution- distribution with location zero and scale-rotation matrix . Note that we have , for any orthonormal matrix, , but different choices for lead to different distributions with distinct non-linear dependencies that arise from the cluster structure in .
Conveniently, we have the expression, , and if we partition the columns in , using the same cluster structure as in , i.e. with , then it follows that , for . Next, and have the exact same log-likelihoods, , and we can use (7) to express the log-likelihood function for as
(11) |
where . When has a block structure, then we also have
where , and some interesting special cases emerge from this structure.
We have previously used a partitioning of the variables to form a block correlation structure, which arises from a cluster structure for the variables. The convolution- distribution involves a second partitioning that defines the independent multivariate -distributions. This is a cluster structure in the underlying random innovations in the model. The two cluster structures can be identical, or can be different, as we illustrate with examples and in our empirical application. Next, we highlight six distributional properties that are the product of this model design.
-
1.
Each element of , has the same marginal -distribution with degrees of freedom. This does not carry over to the same elements of (even if ). In general, the marginal distribution of an element of , will be a unique convolution of (as many as) independent -distributions with different degrees of freedom.
-
2.
While the (multivariate) -distributions are independent across groups, this does not carry over to the corresponding sub-vectors of .
-
3.
The convolution for each element of is, in part, defined by the correlation matrix, . So, time-variation in will induce time-varying marginal distributions for the elements of .
-
4.
The partitioning of into clusters (-clusters) induces heterogeneity in tail dependencies and the heavyness of the tails. The -clusters can be entirely different from the -clusters (partitioning of variables) that define the block structure in the correlation matrix, and the two numbers of clusters can be different.
-
5.
Increasing the number of -clusters, does not necessarily improve the empirical fit. While increasing will increase the number parameters (degrees of freedom) in the model, it also entails dividing into additional subvectors, which eliminates the innate dependence between elements of , which apply to elements from the same multivariate -distribution.
-
6.
Sixth, this model framework nests the conventional multivariate -distribution as the special case, , which facilitates simple comparisons with a natural benchmark model.
3.3 Density and CDF of Convolution- Distribution
The marginal distributions of the elements of are convolutions of independent -distributed variables, and neither their densities nor their cumulative distribution function have simple expressions.999Even for the simplest case – a convolution of two univariate -distributions – the resulting density does not have a simple closed-form expression. However, using Hansen and Tong, (2024, theorem 1) we obtain the following semi-analytical expressions, where and denote the real and imaginary part of , respectively, and is the -th column of identity matrix .
Proposition 1.
Suppose . Then the marginal density and cumulative distribution function for , , are given by
respectively, where is the characteristic function for , and
is the characteristic function of the univariate -distribution.
To gain some insight about convolution- distributions and the expressions in Proposition 1 we present features of two densities in Figure 1. We specifically consider convolutions, , for and , where are independent and standardized -distributed with six degrees of freedom.

The upper panel of Figure 1, Panel (a), shows the log-densities of the (left) tail of the distribution, and how they compare to those of a standardized -distribution and a standard normal distribution. As the convolution- distribution will approach the normal distribution. So, it is not surprisingly that the log-densities for the convolutions are between that of a and that of a standard normal. Unsurprisingly, the convolution of standardized -distributions is closer to the normal distribution than the convolution of distributions. However, the convolution- distribution is not a -distribution for . In terms of Kullback-Leibler discrepancy, the best approximating -distribution to the convolution- distribution is a -distribution when and a -distribution when , see the Q-Q plots in Panels (b) and (c) in Figure 1.
The expression for the marginal density of convolution- distributions is particularly useful in our empirical analysis, because it gives us a factorization of the joint density into marginal densities and the copula density by Sklar’s theorem. This leads to the decomposition of the log-likelihood, , where denotes the copula density, and we can see if gains in the log-likelihood are primarily driven by gains in the marginal distributions or by gains in the copula density.
3.4 Three Special Types of Convolution- Distributions
The convolution- distributions define a broad class of distributions, with many possible partitions of and choices for . Below we elaborate on som particular details of three special types of convolution- distributions. For latter use, we use to denote the -th column of identity matrix .
3.4.1 Special Type 1: Cluster- Distribution
The first special type of convolution- distribution has , such that , and a single cluster structure. The cluster structure, , is imposed on , whereas can be unrestricted, or have block structure based on the the same clustering, in which case and .
Without a block correlation structure on , we have and the log-likelihood function is simply computed using (11). If the block structure is imposed on , then we can express the multivariate -distributed variables as linear combinations on the canonical variables, ,
(12) |
We therefore have the expression for the quadratic terms,
and the log-likelihood function simplifies to
(13) |
where . The block structure simplifies implementation of the score-driven model for this specification, and makes it possible to implement the model with a large number of stocks.
3.4.2 Special Type 2: Hetero- Distribution
A second special type of convolution- distributions has and . So, the elements of are made up of independent univariate -distributions with degrees of freedom, , . This distribution can accommodate a high degree of heterogeneity in the tail properties of , , which are different convolutions of the independent -distributions. For this reason, we refer to these distributions as the Hetero- distributions. The number of degrees of freedom increases from to , but the additional parameters do not guarantee a better in-sample log-likelihood, because all dependence between elements of is eliminated. The Cluster- distribution has dependence between -variables within the same cluster. This has implications the linear combinations of , including those that define .
For the case with a general correlation matrix, the Hetero- distribution simplifies the log-likelihood function in (11) to
where .101010Note that we can obtain preliminary estimates (starting values) of the degrees of freedom parameters, by estimating from , where and is an estimate of the unconditional correlation matrix, for .
We can combine the heterogenous -distributions with a block correlation matrix, in which case the log-likelihood function simplifies to
(14) |
where and is the -th element of the vector expressed by (12).
3.4.3 Special Type 3: Canonical-Block- Distribution
A third special type of convolution- distributions is based on the canonical canonical variables, as defined by the canonical representation of the block correlation matrix. The Canonical-Block- distribution has and , such that is composed of independent multivariate -distributions. So,
This construction is motivated by the canonical variables, , that arises from the canonical representation of block correlation matrices. Interestingly, this type of convolution- distribution can be used, regardless of having a block structure or not. For a general correlation matrix, , the log-likelihood function is given by
where .
From a practical viewpoint, a more interesting situation is when has a block structure, such that . With this structure, the log-likelihood function simplifies to
(15) |
which is computationally advantageous, because it does note require an inverse (nor a determinant) of an matrix.
The expression for the log-likelihood function shows that this distribution is equivalent to assuming that are independent and distributed as and , for . This yields insight about the standardized returns within each block. Let be the -dimensional subvector of . From it follows that
such that a standardized return in the -th block has the same loading on the common variable , and orthogonal loadings on the vector .
Additional convolution- distributions could be bases on this structure. For instance, we could combine with heterogeneous univariate -distributions, for some or all of the canonical variables. For instance, the canonical variable, , could be made up of heterogeneous -distributions, while other canonical variables, have multivariate -distributions.
4 Score-Driven Models
We turn to the dynamic modeling of the conditional correlation matrix in this section. To this end, we adopt the score-driven framework by Creal et al., (2013) to model the dynamic properties of , with . Specifically, we adopt the vector autoregressive model of order one, VAR(1):
(16) |
where and are matrices of coefficients, , and will be defined by the first-order conditions of the log-likelihood at times .¹¹ It is straightforward to include additional lagged values of , such that (16) has a higher-order VAR(p) structure, and adding lagged values of would generalize (16) to a VARMA(p,q) model; we do not pursue these extensions in this paper. The key aspect of a score-driven model is that the score of the predictive likelihood function is used to define the innovation , specifically
(17) |
and is a scaling matrix. The score is the first-order derivative of the log-likelihood with respect to , and is a martingale difference process if the model is correctly specified. The Fisher information matrix, , is often used as the scaling matrix, in which case the time-varying parameter vector is updated in a manner that resembles a Newton-Raphson algorithm, see Creal et al., (2013).¹² One exception is Hafner and Wang, (2023), who used an unscaled score, i.e. , which does not take the curvature of the log-likelihood into account when parameter values are revised.
A potential drawback of using as the scaling matrix in (17) is that the precision of the inverse deteriorates as the dimension increases. We will therefore approximate by imposing a diagonal structure, and simply inverting the diagonal elements of . This is equivalent to using the scaling matrix,
In this manner, each element of the parameter vector is updated with a scaled version of the corresponding element of the score. Computing the inverse, , is now straightforward, and the update is simple to implement.
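As a concrete illustration, the diagonally scaled update can be sketched in a few lines. This is a minimal sketch under our own naming conventions, not the paper's implementation: `score_driven_step`, `diag_info`, and the small `eps` guard against near-zero diagonal elements are all illustrative choices.

```python
import numpy as np

def score_driven_step(gamma, omega, B, A, score, diag_info, eps=1e-8):
    """One step of a score-driven VAR(1) recursion where the score is
    scaled by the inverted diagonal of the information matrix."""
    # Jacobi-style scaling: divide each score element by the matching
    # diagonal element of the information matrix (cheap and stable).
    u = score / (diag_info + eps)
    # VAR(1) update of the transformed-correlation parameter vector.
    return omega + B @ gamma + A @ u
```

Relative to a full Newton-type scaling, this avoids inverting a large information matrix at every time step, at the cost of ignoring the off-diagonal curvature of the log-likelihood.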
The score is computed using the following decomposition,
(18) |
The expression for the last term was derived in Archakov and Hansen, (2021) using results from Linton and McCrorie, (1995). The drawback of this approach is that it requires an eigendecomposition of the matrix , and this is impractical and unstable when is large. Moreover, the computational burden for the corresponding information matrix is even worse. Fortunately, when has a block structure, we have the following simplified expression,
The first term can be computed very fast for all the variants of the convolution- distributions we consider. The second term only requires an eigendecomposition of (the upper-left submatrix of ), and this greatly reduces the computational burden for evaluating both the score and the information matrix.
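The eigendecomposition-based derivative of the matrix logarithm can be illustrated with the directional (Fréchet) derivative computed via the Daleckii-Krein formula. This is a minimal sketch only: the full Jacobian used in the score, and the block-structure shortcut, require additional bookkeeping not shown here, and the function names are our own.

```python
import numpy as np

def logm_sym(C):
    """Matrix logarithm of a symmetric positive definite matrix."""
    lam, U = np.linalg.eigh(C)
    return (U * np.log(lam)) @ U.T

def dlogm(C, E):
    """Frechet derivative of logm at C in direction E (Daleckii-Krein):
    U (F o (U' E U)) U', where F holds the first divided differences
    of log at the eigenvalues of C (1/lambda on coincident pairs)."""
    lam, U = np.linalg.eigh(C)
    diff = np.subtract.outer(lam, lam)
    same = np.abs(diff) < 1e-12
    F = np.where(same, 1.0 / lam[:, None],
                 np.subtract.outer(np.log(lam), np.log(lam))
                 / np.where(same, 1.0, diff))
    return U @ (F * (U.T @ E @ U)) @ U.T
```

Each evaluation costs one eigendecomposition of the full matrix, which is precisely why restricting the eigendecomposition to a small submatrix under a block structure is such a large saving.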
For block correlation matrices, we use the vector autoregression of order one for the subvector,
(19) |
where , and and are matrices with .
To implement the score-driven model we need to derive the appropriate score and scaling matrix for each of the log-likelihoods. For this purpose, we will adopt the following notation involving matrices and matrix operators, with some notation adopted from Creal et al., (2012). Let and be two matrices with suitable dimensions. The Kronecker product is denoted by and we use and . We let denote the commutation matrix, the duplication matrix, and , , , are elimination matrices. These are defined by the following identities:
for any symmetric matrix, , and any matrix, .
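The commutation and duplication matrices have simple explicit constructions. The sketch below builds them for the column-stacking vec operator (the elimination matrices are defined analogously); the function names are illustrative.

```python
import numpy as np

def commutation_matrix(m, n):
    """K satisfying K vec(A) = vec(A') for any m x n matrix A,
    where vec stacks columns."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # vec(A)[i + j*m] = A[i, j] maps to vec(A')[j + i*n].
            K[j + i * n, i + j * m] = 1.0
    return K

def duplication_matrix(n):
    """D satisfying D vech(S) = vec(S) for any symmetric n x n matrix S,
    where vech stacks the lower triangle, column by column."""
    D = np.zeros((n * n, n * (n + 1) // 2))
    k = 0
    for j in range(n):
        for i in range(j, n):
            D[i + j * n, k] = 1.0  # lower-triangle position
            D[j + i * n, k] = 1.0  # mirrored upper-triangle position
            k += 1
    return D
```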
4.1 Scores and Information Matrices for a General Correlation Matrix
We first derive expressions for and with a general correlation matrix. Recall that the log-likelihood function, based on a convolution- distribution, is given by (9), and in the special case with a multivariate -distribution, the log-likelihood simplifies to the expression in (11).
4.1.1 Score-Driven Model with Multivariate -Distribution
Theorem 1.
Suppose that . Then the score vector and information matrix with respect to , are given by:
(20) | ||||
(21) |
respectively, with ,
and
where the expression for is presented in the appendix, see (A.1).
The expression for shows that the impact of extreme values (outliers) is dampened by the degrees of freedom; however, this mitigation subsides as . The result for the Gaussian distribution is obtained by setting , which are their limits as .
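The dampening mechanism can be illustrated numerically. The sketch below assumes a score weight of the standard multivariate-t form (nu + n)/(nu + z'z); the exact normalization in Theorem 1 may differ, but the qualitative behavior, a small weight for outliers and a weight tending to one in the Gaussian limit, is the same.

```python
import numpy as np

def t_weight(z, nu):
    """Score weight of an observation z under a multivariate t with nu
    degrees of freedom and dimension n = len(z). Large |z| shrinks the
    weight; as nu -> infinity the weight tends to 1 (Gaussian case)."""
    n = len(z)
    return (nu + n) / (nu + z @ z)
```

An outlier such as z = (10, 0) receives a weight of roughly 0.07 at nu = 5, so its influence on the parameter update is heavily discounted, while for very large nu the weight is essentially 1.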
4.1.2 Score-Driven Model with Convolution- Distributions
Theorem 2.
Suppose that . Then the score vector and information matrix with respect to , are given by:
respectively, where is defined in Theorem 1, , and with
where ,
for .
The inverse of (an matrix) is available in closed form (see Appendix A) and is computationally inexpensive because it relies on an eigendecomposition of , which is already needed for computing in the expression of .
Some insight can be gained from considering the case . A key component of is , which shows that the impact that the -th cluster, , has on the score is controlled by the coefficient .
4.2 Scores and Information Matrices for a Block Correlation Matrix
Next, we derive the corresponding expression for the case where has a block structure. For the score we have the following expression
and the expression for is given in the following Lemma.
Lemma 1.
Conveniently, the computation of only requires the inverse of a matrix. From the results for we have and similarly,
4.2.1 Score-Driven Model with Block Correlation and Multivariate -Distribution
With a block correlation structure, we define the standardized canonical variables
such that with and with for .
Theorem 3.
Suppose that . Then the score vector and information matrix with respect to the dynamic parameters, , are given by:
respectively, where
and , , and the diagonal matrix, , are defined by
for . In the special case where has a multivariate Gaussian distribution (, ), the expression for the information matrix simplifies to .
4.2.2 Score-Driven Model with Block Correlation and Cluster- Distribution
Theorem 4 (Cluster- with Block-).
Suppose that where has the block structure defined by . Then the score vector and information matrix with respect to dynamic parameters, , are given by:
respectively, where , and vector is the -th column of the identity matrix . The vector , the diagonal matrix, , and are defined as
for . The matrix is defined analogously to in Theorem 2.
4.2.3 Score-Driven Model with Block Correlation and Hetero- Distribution
Theorem 5 (Heterogeneous-Block Convolution-).
Suppose that , where has the block structure defined by . Then the score vector and information matrix with respect to the dynamic parameters, , are given by:
respectively, where
with
and is the -th column of identity matrix . The matrix is given by:
where , and
The diagonal matrix and are given by:
4.2.4 Score-Driven Model with Block Correlation and Canonical-Block- Distribution
Theorem 6 (Canonical-Block Convolution-).
Suppose that , where has the block structure defined by and . Then the score vector and information matrix with respect to the dynamic parameters, , are given by:
where the expressions for and diagonal matrix, , are those given in Theorem 5 with
for .
5 Some Details About Practical Implementation
5.1 Obtaining the -matrix from the vector
The matrix, , plays a central role in the score models with block-correlation matrices. Below we show how can be computed from .
In order to obtain from , we adopt the algorithm developed in Archakov et al., (2024, theorem 5) to generate random block correlation matrices. The algorithm has three steps.
1. Compute the elements of the matrix, , using

   where are elements of , as defined by the identity, .

2. From an arbitrary starting value, , e.g. a vector of zeroes, evaluate the recursion,

   repeatedly, until convergence. Let denote the final value. (Convergence tends to be quick because is a fixed point of a contraction.)

3. Compute .
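The second step is a generic fixed-point iteration, which can be sketched with a small driver. The mapping `phi` stands in for the recursion of Archakov et al., (2024, theorem 5); the driver itself, its name, and its tolerances are our own.

```python
import numpy as np

def fixed_point(phi, x0, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = phi(x_k) until successive values are within tol.
    Convergence is fast when phi is a contraction, as in the algorithm
    for recovering the block matrix from the transformed correlations."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = np.asarray(phi(x))
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")
```

Starting from a vector of zeroes, as suggested in step 2, the iteration is run until successive values agree to numerical tolerance.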
5.2 Correlation/Moment Targeting of Dynamic Parameters
The dimension of in the score-driven model with groups is . For this model we adopt the following dynamic model
where and are diagonal matrices. This brings the total number of parameters to be estimated to when we use the Gaussian specification. Specifications with -distributions have additional degrees of freedom parameters.
So-called variance targeting is often used when estimating multivariate GARCH models, where the expected value of the conditional covariance matrix is estimated in an initial step.¹³ Targeting is often found to be beneficial for prediction but can have drawbacks, e.g. for inference, see Pedersen, (2016). This idea can also be applied to the transformed correlations, with an estimate of as the target. In the present context, it would be more appropriate to call it correlation targeting or, more generally, moment targeting, which encompasses many variations of this method. For the initial estimation of the target, , we follow Archakov and Hansen, (2024) and estimate the unconditional sample block-correlation matrix with
where
We then proceed to compute . Because is non-linear, is only a first-order approximation of , but our empirical results suggest that it is a good approximation.
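The targeting step, extracting the off-diagonal elements of the matrix logarithm of the sample correlation matrix, can be sketched as follows. The function name is illustrative, and for block correlation matrices one would exploit the canonical representation rather than the full eigendecomposition used here.

```python
import numpy as np

def log_correlation_target(R_bar):
    """First-stage target: the unique off-diagonal elements of
    logm(R_bar), computed via the eigendecomposition of the
    (symmetric positive definite) sample correlation matrix."""
    lam, U = np.linalg.eigh(R_bar)
    G = (U * np.log(lam)) @ U.T          # matrix logarithm of R_bar
    return G[np.tril_indices_from(G, k=-1)]  # lower-triangle elements
```

Because the matrix logarithm is non-linear, this sample-based target is only a first-order approximation of the population quantity, as noted in the text.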
5.3 Benchmark Correlation Model: The DCC Model
The original DCC model was proposed by Engle, (2002), see also Engle and Sheppard, (2001). The original form of variance targeting could result in inconsistent estimation, see Aielli, (2013), who proposed a modification that resolves this issue. This model is known as the cDCC model and is given by:
where is a symmetric positive definite matrix (whose dynamic properties are defined below) and is the diagonal matrix with the same diagonal elements as . This structure ensures that is a valid correlation matrix. The dynamic properties of are defined from those of , which are defined by
(23) |
where is the vector of ones, is a vector with standardized return shocks, is the Hadamard product (element-by-element multiplication), and , and are unknown matrices. Here is the unconditional correlation matrix, which can be parametrized as . Note that this model has time-varying parameters, as defined by the unique elements of . However, only has distinct correlations, so there are redundant variables in .
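One cDCC update can be sketched as follows. The sketch is specialized to the common scalar-parameter case, whereas the text allows full coefficient matrices combined via Hadamard products; `cdcc_step` and its argument names are ours.

```python
import numpy as np

def cdcc_step(Q, eps, S, alpha, beta):
    """One cDCC update (scalar alpha, beta). Q is the pseudo-correlation
    recursion; the conditional correlation matrix is recovered as
    R = D^{-1/2} Q D^{-1/2} with D = diag(Q)."""
    q = np.sqrt(np.diag(Q))
    e_star = q * eps   # Aielli's rescaling of the standardized shocks
    Q_new = (1 - alpha - beta) * S + alpha * np.outer(e_star, e_star) + beta * Q
    d = 1.0 / np.sqrt(np.diag(Q_new))
    R_new = Q_new * np.outer(d, d)   # normalize to unit diagonal
    return Q_new, R_new
```

The final normalization is what guarantees that each filtered matrix is a valid correlation matrix, regardless of the path of the pseudo-correlation process.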
6 Empirical Analysis
We estimate and evaluate the models using nine stocks (small universe) as well as 100 stocks (large universe). We will use industry sectors, as defined by the Global Industry Classification Standard (GICS), to form block structures in the correlation matrix and/or the heavy tail index. The ticker symbols for all 100 stocks are listed in Table 1, organized by industry sectors. The nine stocks in the small universe are highlighted with bold font.
Energy | Materials | Industrials | Consumer | Consumer |
Discretionary | Staples | |||
APA | APD | BA | AMZN | CL |
BKR | DD | CAT | EBAY | COST |
COP | FCX | EMR | F | CPB |
CVX | IP | FDX | HD | KO |
DVN | SHW | GD | LOW | MDLZ |
HAL | GE | MCD | MO | |
MRO | HON | NKE | PEP | |
NOV | LMT | SBUX | PG | |
OXY | MMM | TGT | WBA | |
SLB | NSC | WMT | ||
WMB | UNP | |||
XOM | UPS | |||
Healthcare | Financials | Information | Telecom. | Utilities |
Technology | Services | |||
ABT | ALL | AAPL | CMCSA | AEE |
AMGN | AXP | ACN | DIS | AEP |
BAX | BAC | ADBE | DISH | DUK |
BMY | BK | CRM | GOOGL | ETR |
DHR | C | CSCO | OMC | EXC |
GILD | COF | IBM | T | NEE |
JNJ | GS | INTC | VZ | SO |
LLY | JPM | MSFT | ||
MDT | MET | NVDA | ||
MRK | RF | ORCL | ||
PFE | USB | QCOM | ||
TMO | WFC | TXN | ||
UNH | XRX | |||
Note: Ticker symbols for 100 stocks that define the Large Universe, listed by sector according to their Global Industry Classification Standard (GICS) codes. The nine stocks in the Small Universe are highlighted with bold font.
The sample period spans January 3, 2005 to December 31, 2021, a total of trading days. We obtained daily close-to-close returns from the CRSP daily stock files in the WRDS database.
The focus of this paper is the dynamic modeling of correlations, but in practice we also need to estimate the conditional variances. In our empirical analysis, we estimated each univariate time series of conditional variances using the EGARCH model by Nelson, (1991), where the conditional mean has an AR(1) structure, as is common in this literature. Thus, the model for the -th asset return on day , , is given by:
(24) |
The parameter, , is related to the well-known leverage effect, whereas is tied to the degree of volatility clustering. By modeling the logarithm of conditional volatility, the estimated volatility paths are guaranteed to be positive, which, in conjunction with the parametrization of the correlation matrix, , guarantees a positive definite conditional covariance matrix. At this stage of the estimation, we do not want to select a particular type of heavy-tailed distribution for . So, we simply estimate the EGARCH models by quasi maximum likelihood estimation using a Gaussian specification. From the estimated time series for , we obtain the vector of standardized returns, , which are common to all the multivariate models we consider below.
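The variance-filtering step can be sketched as follows. This assumes one common parameterization of Nelson's EGARCH(1,1) with an AR(1) mean, centering |z| at its Gaussian expectation sqrt(2/pi) in line with Gaussian QML; the exact form of (24) may differ in details, and the parameter values would in practice come from QML estimation rather than being supplied by hand.

```python
import numpy as np

def egarch_filter(r, c, phi, omega, alpha, gamma, beta):
    """Filter conditional variances from an AR(1)-EGARCH(1,1):
        r_t = c + phi * r_{t-1} + sigma_t * z_t
        log sigma_t^2 = omega + beta * log sigma_{t-1}^2
                        + alpha * (|z_{t-1}| - sqrt(2/pi)) + gamma * z_{t-1}
    Returns the standardized residuals z_t and the variances sigma_t^2."""
    T = len(r)
    logs2 = np.empty(T)
    z = np.empty(T)
    logs2[0] = np.log(np.var(r))   # initialize at the sample variance
    for t in range(T):
        if t > 0:
            logs2[t] = (omega + beta * logs2[t - 1]
                        + alpha * (np.abs(z[t - 1]) - np.sqrt(2 / np.pi))
                        + gamma * z[t - 1])
        mu = c + phi * r[t - 1] if t > 0 else c
        z[t] = (r[t] - mu) / np.exp(0.5 * logs2[t])
    return z, np.exp(logs2)
```

The standardized residuals produced by this filter are the inputs shared by all the correlation models that follow.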
Energy | Financial | Information Tech. | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
MRO | OXY | DVN | BAC | C | JPM | MSFT | INTC | CSCO | ||
Energy | MRO | 0.758 | 0.855 | 0.152 | 0.149 | 0.180 | 0.139 | 0.137 | 0.133 | |
OXY | 0.790 | 0.709 | 0.152 | 0.153 | 0.199 | 0.117 | 0.153 | 0.163 | ||
DVN | 0.814 | 0.775 | 0.145 | 0.153 | 0.126 | 0.125 | 0.145 | 0.136 | ||
Financial | BAC | 0.439 | 0.442 | 0.424 | 0.859 | 0.873 | 0.143 | 0.152 | 0.200 | |
C | 0.429 | 0.433 | 0.418 | 0.819 | 0.608 | 0.151 | 0.151 | 0.195 | ||
JPM | 0.459 | 0.466 | 0.435 | 0.829 | 0.762 | 0.222 | 0.245 | 0.251 | ||
Info Tech. | MSFT | 0.372 | 0.367 | 0.361 | 0.422 | 0.412 | 0.467 | 0.494 | 0.455 | |
INTC | 0.386 | 0.392 | 0.381 | 0.435 | 0.422 | 0.484 | 0.576 | 0.426 | ||
CSCO | 0.391 | 0.401 | 0.384 | 0.471 | 0.457 | 0.506 | 0.584 | 0.576 |
Note: The sample correlation matrix estimated for the nine assets (Small Universe) over the full sample period, January 3, 2005, to December 31, 2021. The elements of are given below the diagonal and elements of are given above the diagonal. The block structure is illustrated with shaded regions.
6.1 Small Universe: Dynamic Correlations for Nine Stocks
We begin by analyzing nine stocks, and we refer to this data set as the small universe. The nine stocks are: Marathon Oil (MRO), Occidental Petroleum (OXY), and Devon Energy (DVN) from the Energy sector, Bank of America (BAC), Citigroup (C), and JPMorgan Chase & Co (JPM) from the Financial sector, and Microsoft (MSFT), Intel (INTC), and Cisco (CSCO) from the Information Technology sector. Table 2 reports the full-sample unconditional correlation matrix (lower triangle) and its logarithm (upper triangle), with the sector-based block structure illustrated by the shaded regions. Note that the estimated unconditional correlations within each of the blocks have similar averages. The assets within the Energy sector and Financial sector are highly correlated, with an average correlation of about 0.80. Within-sector correlations for Information Technology stock returns tend to be smaller, with an average of about 0.58. The between-sector correlations tend to be smaller and range from 0.36 to 0.51. A similar pattern is observed for the corresponding elements of the logarithm of the unconditional correlation matrix, as the logarithm transformation preserves the block structure.
Score-Driven Model | DCC Model | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gaussian | Multiv.- | Canon- | Cluster- | Hetero- | Gaussian | Multiv.- | Canon- | Cluster- | Hetero- | ||||||||||||
Mean | 0.248 | 0.265 | 0.268 | 0.269 | 0.268 | 0.239 | 0.255 | 0.250 | 0.266 | 0.266 | |||||||||||
Min | 0.022 | 0.027 | 0.091 | 0.076 | 0.037 | 0.077 | 0.097 | 0.093 | 0.088 | 0.095 | |||||||||||
0.127 | 0.124 | 0.124 | 0.123 | 0.129 | 0.106 | 0.116 | 0.116 | 0.117 | 0.129 | ||||||||||||
0.164 | 0.169 | 0.166 | 0.165 | 0.169 | 0.136 | 0.144 | 0.149 | 0.150 | 0.155 | ||||||||||||
0.227 | 0.327 | 0.307 | 0.325 | 0.306 | 0.240 | 0.331 | 0.261 | 0.317 | 0.305 | ||||||||||||
Max | 0.816 | 0.777 | 0.805 | 0.807 | 0.815 | 0.874 | 0.783 | 0.829 | 0.848 | 0.856 | |||||||||||
Mean | 0.917 | 0.962 | 0.952 | 0.970 | 0.970 | 0.967 | 0.973 | 0.974 | 0.971 | 0.972 | |||||||||||
Min | 0.503 | 0.601 | 0.502 | 0.803 | 0.817 | 0.937 | 0.945 | 0.938 | 0.938 | 0.939 | |||||||||||
0.881 | 0.976 | 0.951 | 0.966 | 0.964 | 0.964 | 0.972 | 0.973 | 0.967 | 0.968 | ||||||||||||
0.980 | 0.990 | 0.988 | 0.988 | 0.985 | 0.968 | 0.975 | 0.977 | 0.975 | 0.976 | ||||||||||||
0.995 | 0.996 | 0.994 | 0.994 | 0.994 | 0.972 | 0.979 | 0.980 | 0.978 | 0.978 | ||||||||||||
Max | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.979 | 0.984 | 0.991 | 0.986 | 0.981 | |||||||||||
Mean | 0.019 | 0.014 | 0.016 | 0.015 | 0.016 | 0.016 | 0.013 | 0.011 | 0.012 | 0.012 | |||||||||||
Min | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.011 | 0.008 | 0.005 | 0.007 | 0.008 | |||||||||||
0.007 | 0.005 | 0.004 | 0.005 | 0.005 | 0.012 | 0.010 | 0.008 | 0.009 | 0.011 | ||||||||||||
0.014 | 0.011 | 0.009 | 0.009 | 0.011 | 0.016 | 0.013 | 0.012 | 0.012 | 0.012 | ||||||||||||
0.025 | 0.019 | 0.023 | 0.023 | 0.024 | 0.019 | 0.015 | 0.014 | 0.014 | 0.014 | ||||||||||||
Max | 0.071 | 0.069 | 0.071 | 0.057 | 0.063 | 0.028 | 0.021 | 0.016 | 0.022 | 0.019 | |||||||||||
6.232 | 6.391 | 6.029 | 6.501 | ||||||||||||||||||
6.193 | 5.907 | ||||||||||||||||||||
5.517 | 6.022 | 5.320 | 5.097 | 5.839 | 5.162 | ||||||||||||||||
4.597 | 4.533 | ||||||||||||||||||||
4.552 | 4.330 | ||||||||||||||||||||
4.919 | 4.871 | 4.614 | 4.397 | 4.671 | 4.340 | ||||||||||||||||
5.165 | 4.985 | ||||||||||||||||||||
3.289 | 3.237 | ||||||||||||||||||||
3.714 | 4.175 | 3.807 | 3.315 | 4.046 | 3.740 | ||||||||||||||||
4.023 | 3.910 | ||||||||||||||||||||
108 | 109 | 112 | 111 | 117 | 126 | 127 | 130 | 129 | 135 | ||||||||||||
-42603 | -39971 | -39762 | -39282 | -39348 | -42675 | -40054 | -39827 | -39370 | -39465 | ||||||||||||
-54653 | -52866 | -52893 | -52774 | -52770 | -54653 | -52857 | -52858 | -52767 | -52746 | ||||||||||||
12050 | 12895 | 13131 | 13492 | 13422 | 11978 | 12803 | 13031 | 13397 | 13281 | ||||||||||||
AIC | 85422 | 80160 | 79748 | 78786 | 78930 | 85602 | 80362 | 79914 | 78998 | 79200 | |||||||||||
BIC | 86109 | 80853 | 80461 | 79492 | 79674 | 86404 | 81170 | 80741 | 79819 | 80059 |
Note: Parameter estimates for the full sample period, January 2005 to December 2021. The Score-Driven model and DCC model are both estimated with five distributional specifications, without imposing a block structure on . We report summary statistics for the estimates of , , and , and report all estimates of the degrees of freedom. is the number of parameters and we report the maximized log-likelihoods, , and its two components: the log-likelihoods for the nine marginal distributions, , and the corresponding log-copula density, . We also report the AIC and BIC . Bold font is used to identify the “best performing” specification in each row among Score-Driven models and among DCC models.
We estimate three types of dynamic correlation models using five different distributions. The first is the DCC model, see (23). The second is the new score-driven model for , which we introduced in Section 4.1. The third is the score-driven model for a block correlation matrix, see Section 4.2. For each of these models we consider five distributional specifications for : the Gaussian, multivariate , Canonical-Block-, Cluster-, and Hetero- distributions. We impose a diagonal structure on the matrices and . In Tables 3 and 5 we report means and quantiles for the estimated parameters, , , for the score-driven model, and , , for the DCC model; the DCC models have more parameters. We denote by the number of parameters, by the full log-likelihood function, and by and the log-likelihoods for the marginal densities and the copula function, respectively. We also report the Akaike and Bayesian information criteria (AIC and BIC) to compare the performance of models with different numbers of parameters.
Table 3 reports the estimation results for the DCC model and the score-driven model with a general correlation matrix (Score-Full model). There are several interesting findings. First, the score-driven model outperforms the simple DCC model for all five distributional specifications. Second, the models with heavy-tailed distributions perform better than the corresponding models with a Gaussian distribution. For the score-driven models, the persistence parameter, , is larger with heavy-tailed specifications, because the score dampens the influence of extreme values when the dynamic parameters are updated. Third, introducing structured heavy tails greatly improves model performance, as indicated by the higher likelihood values, . That this improves the empirical fit is supported by the estimated degrees of freedom, which differ across asset groups. The Information Technology sector is estimated to have the heaviest tails, followed by the Financial and Energy sectors. Fourth, the degrees of freedom estimated with the Cluster- distribution are larger than the group averages from the Hetero- distribution, as we explained earlier. Fifth, the decomposition of shows that the improvements of the Canonical-Block- distribution relative to the multivariate -distribution are entirely driven by the copula component. The same holds when comparing the Hetero- and Cluster- distributions: although the former provides more flexibility in fitting the marginal distributions of individual assets, it does not necessarily lead to a better dependence structure. In this dataset, the Cluster- distribution yields the largest copula log-likelihood, as it allows for a common shock among assets within the same group.
Gaussian | Multiv.- | Canon- | Cluster- | Hetero- | |||||
---|---|---|---|---|---|---|---|---|---|
0.663 | 0.666 | 0.687 | 0.697 | 0.689 | |||||
0.138 | 0.139 | 0.140 | 0.140 | 0.141 | |||||
0.089 | 0.110 | 0.108 | 0.111 | 0.112 | |||||
0.793 | 0.778 | 0.802 | 0.811 | 0.810 | |||||
0.169 | 0.188 | 0.179 | 0.178 | 0.185 | |||||
0.302 | 0.435 | 0.380 | 0.434 | 0.405 | |||||
0.918 | 0.988 | 0.964 | 0.982 | 0.974 | |||||
0.990 | 0.987 | 0.988 | 0.990 | 0.990 | |||||
0.988 | 0.990 | 0.988 | 0.987 | 0.990 | |||||
0.868 | 0.985 | 0.920 | 0.954 | 0.956 | |||||
0.921 | 0.912 | 0.921 | 0.942 | 0.936 | |||||
0.930 | 0.950 | 0.956 | 0.956 | 0.958 | |||||
0.082 | 0.035 | 0.058 | 0.044 | 0.052 | |||||
0.025 | 0.034 | 0.031 | 0.031 | 0.031 | |||||
0.025 | 0.029 | 0.030 | 0.033 | 0.030 | |||||
0.129 | 0.052 | 0.123 | 0.093 | 0.093 | |||||
0.041 | 0.064 | 0.056 | 0.051 | 0.054 | |||||
0.067 | 0.045 | 0.047 | 0.050 | 0.053 | |||||
6.323 | 6.476 | ||||||||
6.496 | |||||||||
5.465 | 6.098 | 5.287 | |||||||
4.797 | |||||||||
4.568 | |||||||||
4.771 | 4.918 | 4.691 | |||||||
4.977 | |||||||||
3.254 | |||||||||
3.637 | 4.182 | 3.816 | |||||||
4.094 | |||||||||
18 | 19 | 22 | 21 | 27 | |||||
-42696 | -40068 | -39814 | -39352 | -39428 | |||||
-54653 | -52870 | -52883 | -52770 | -52769 | |||||
11957 | 12803 | 13070 | 13417 | 13341 | |||||
AIC | 85428 | 80174 | 79672 | 78746 | 78910 | ||||
BIC | 85543 | 80295 | 79812 | 78880 | 79082 |
Note: Parameter estimates for the full sample period, January 2005 to December 2021. Score-Driven models with a block correlation structure and five distributional specifications are estimated. The parameter estimates are reported with subscripts that refer to (within/between) clusters, where Energy=1, Financial=2, and Information Tech=3. is the number of parameters and we report the maximized log-likelihoods, , and its two components: the log-likelihoods for the nine marginal distributions, , and the corresponding log-copula density, . We also report the AIC and BIC . Bold font is used to identify the “best performing” specification in each row among Score-Driven models.
Table 4 presents the estimation results for the score-driven models with a block correlation matrix (Score-Block model). All estimated coefficients are reported with subscripts referring to within/between-group parameters, where Energy=1, Financial=2, and Information Tech=3. The results are similar to those in Table 3. Comparing with Table 3, we find that although the DCC model has an unrestricted correlation matrix, the restricted Score-Block models provide superior performance for the last three convolution- specifications. Compared with the Score-Full models, the Score-Block models deliver a smaller BIC for all specifications and a smaller AIC for the last three cases. Figure 2 plots the time series of correlations filtered with the Cluster- distribution. Several heterogeneous patterns are observed. First, except for the within-sector correlations in the Financial sector, the correlations decline sharply in late 2010 and increase in early 2011. Second, the between-group correlations that involve the Energy sector decline markedly in late 2008 and subsequently recover.
[Figure 2: Time series of filtered correlations for the nine assets, based on the Cluster- distribution.]
6.2 Large Universe: Dynamic Correlation Matrix for Assets
Next, we estimate the model with the large universe, where has dimension . We use the sector classification, see Table 1, to define the block structure on . Ten (of the eleven) sectors are represented in the Large Universe, such that , and the number of unique correlations in is reduced from 4,950 to 55. We estimate the score-driven model with and without correlation targeting, see Section 5.2. With correlation targeting, the intercept, , is estimated first, and the remaining parameters are estimated in a second stage.
Score-Block Model with Full Parametrization | Score-Block Model with Correlation Targeting | ||||||||||||||||||||
Gaussian | Multiv.- | Canon- | Cluster- | Hetero- | Gaussian | Multiv.- | Canon- | Cluster- | Hetero- | ||||||||||||
Mean | 0.053 | 0.051 | 0.056 | 0.056 | 0.055 | 0.053 | 0.053 | 0.053 | 0.053 | 0.053 | |||||||||||
Min | 0.004 | 0.001 | 0.008 | 0.008 | 0.007 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | |||||||||||
0.026 | 0.025 | 0.026 | 0.027 | 0.026 | 0.025 | 0.025 | 0.025 | 0.025 | 0.025 | ||||||||||||
0.038 | 0.037 | 0.038 | 0.039 | 0.040 | 0.036 | 0.036 | 0.036 | 0.036 | 0.036 | ||||||||||||
0.048 | 0.049 | 0.048 | 0.048 | 0.050 | 0.049 | 0.049 | 0.049 | 0.049 | 0.049 | ||||||||||||
Max | 0.333 | 0.327 | 0.343 | 0.348 | 0.342 | 0.349 | 0.349 | 0.349 | 0.349 | 0.349 | |||||||||||
Mean | 0.842 | 0.886 | 0.888 | 0.903 | 0.887 | 0.850 | 0.891 | 0.902 | 0.915 | 0.905 | |||||||||||
Min | 0.443 | 0.432 | 0.612 | 0.654 | 0.619 | 0.401 | 0.415 | 0.614 | 0.651 | 0.621 | |||||||||||
0.794 | 0.859 | 0.817 | 0.853 | 0.808 | 0.799 | 0.854 | 0.827 | 0.861 | 0.846 | ||||||||||||
0.897 | 0.983 | 0.935 | 0.940 | 0.944 | 0.898 | 0.983 | 0.941 | 0.951 | 0.954 | ||||||||||||
0.975 | 0.996 | 0.989 | 0.985 | 0.989 | 0.976 | 0.996 | 0.991 | 0.989 | 0.990 | ||||||||||||
Max | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 1.000 | 0.999 | 0.999 | 0.999 | |||||||||||
Mean | 0.030 | 0.023 | 0.042 | 0.041 | 0.041 | 0.030 | 0.023 | 0.041 | 0.039 | 0.041 | |||||||||||
Min | 0.001 | 0.003 | 0.005 | 0.006 | 0.004 | 0.001 | 0.004 | 0.008 | 0.005 | 0.003 | |||||||||||
0.013 | 0.007 | 0.015 | 0.014 | 0.016 | 0.013 | 0.007 | 0.013 | 0.015 | 0.015 | ||||||||||||
0.028 | 0.015 | 0.035 | 0.038 | 0.036 | 0.028 | 0.015 | 0.032 | 0.029 | 0.036 | ||||||||||||
0.045 | 0.029 | 0.057 | 0.058 | 0.055 | 0.045 | 0.028 | 0.058 | 0.055 | 0.055 | ||||||||||||
Max | 0.102 | 0.108 | 0.137 | 0.127 | 0.140 | 0.103 | 0.106 | 0.138 | 0.128 | 0.139 | |||||||||||
10.06 | 12.25 | 10.11 | 12.20 | ||||||||||||||||||
7.111 | 7.320 | 4.852† | 7.158 | 7.341 | 4.882† | ||||||||||||||||
5.778 | 6.012 | 4.723† | 5.489 | 5.812 | 4.635† | ||||||||||||||||
6.640 | 6.903 | 4.451† | 6.397 | 6.644 | 4.368† | ||||||||||||||||
5.373 | 5.579 | 3.884† | 5.306 | 5.546 | 3.849† | ||||||||||||||||
5.657 | 5.896 | 4.049† | 5.627 | 5.847 | 4.018† | ||||||||||||||||
6.024 | 6.263 | 4.070† | 5.983 | 6.183 | 4.021† | ||||||||||||||||
6.018 | 6.159 | 4.280† | 6.008 | 6.127 | 4.289† | ||||||||||||||||
5.581 | 5.784 | 3.829† | 5.516 | 5.706 | 3.779† | ||||||||||||||||
7.022 | 7.286 | 5.347† | 7.043 | 7.283 | 5.337† | ||||||||||||||||
5.693 | 6.007 | 4.366† | 5.658 | 5.945 | 4.334† | ||||||||||||||||
165 | 166 | 176 | 175 | 265 | 110 | 111 | 121 | 120 | 210 | ||||||||||||
-481966 | -464247 | -448572 | -446633 | -436613 | -482019 | -464307 | -448640 | -446711 | -436704 | ||||||||||||
-607256 | -588971 | -589777 | -587713 | -586615 | -607256 | -589006 | -589625 | -587506 | -586446 | ||||||||||||
125292 | 124724 | 141204 | 141079 | 150002 | 125236 | 124699 | 140986 | 140795 | 149742 | ||||||||||||
AIC | 964262 | 928826 | 897496 | 893616 | 873756 | 964258 | 928836 | 897522 | 893662 | 873828 | |||||||||||
BIC | 965312 | 929882 | 898616 | 894729 | 875442 | 964958 | 929542 | 898292 | 894425 | 875164 |
Note: Parameter estimates for the full sample period, January 2005 to December 2021. Score-Driven models with a block correlation structure and five distributional specifications are estimated without correlation targeting (left panel) and with correlation targeting (right panel). We report summary statistics for the estimates of , , and , and all estimates of the degrees of freedom, except for the Heterogeneous Convolution- specifications, for which we report the average estimate within each cluster, as identified with the -superscript. is the number of parameters and we report the maximized log-likelihoods, , and its two components: the log-likelihoods for the marginal distributions, , and the corresponding log-copula density, . We also report the AIC and BIC . Bold font is used to identify the “best performing” specification in each row for models with and without correlation targeting.
Table 5 reports the estimation results for the score-driven models with block correlation matrices. The left panel contains the estimation results for models without correlation targeting, and the right panel contains those based on correlation targeting. The estimates identified with a -superscript are the average degrees of freedom within each cluster. These are used for the specifications with heterogeneous Convolution- distributions (Hetero-), which involve 100 degrees of freedom parameters. Compared with the results for the Small Universe, we note some interesting differences. First, unlike in the Small Universe, the model with the Hetero- distribution now provides the best fit, and relative to the Cluster- distribution, its improvement is concentrated in the copula component. This may be due to the high degree of heterogeneity across the large dataset, for which the simple GICS-based classification is too coarse.¹⁴ One could instead estimate the group structure using the method in Oh and Patton, (2023); here we simply use this classification to assess our score-driven model for high-dimensional systems. Second, the models estimated with targeting perform well and have the smallest BIC across all distributional specifications.
6.3 Out-of-sample Results
We next compare the out-of-sample (OOS) performance of the different models/specifications. We estimate all models (once) using data from 2005-2014 and evaluate the estimated models with out-of-sample data spanning 2015-2021.
The OOS results for the Small Universe are shown in Panel A of Table 6. We decompose the predicted log-likelihood, , into the marginal, , and copula, , components. For each of the five distributional specifications, we have highlighted the largest predicted log-likelihood, which is attained by the Score-Driven model without a block structure on for all five distributions. This is consistent with our in-sample results, where this model also had the largest (in-sample) log-likelihood for each of the five distributional specifications, see Tables 3 and 4. Overall, the Convolution- distribution with a sector-based cluster structure, Cluster-, has the largest predictive log-likelihood. We also note that the DCC model has the worst performance across all distributional specifications. In sample, the DCC model was slightly better than the Score-Driven model with a block correlation matrix for two of the five distributions (Gaussian and multivariate ). This suggests that the DCC model suffers from overfitting.
Panel A: Out-of-sample Results for 9 Assets

| | Gaussian | Multiv.-t | Canon-t | Cluster-t | Hetero-t |
|---|---|---|---|---|---|
| DCC Model | | | | | |
| #par | 126 | 127 | 130 | 129 | 135 |
| ℓ | -17148 | -15887 | -15719 | -15538 | -15648 |
| ℓ_M | -22721 | -21825 | -21849 | -21770 | -21784 |
| ℓ_C | 5573 | 5938 | 6130 | 6232 | 6136 |
| Score-Full Model | | | | | |
| #par | 108 | 109 | 112 | 111 | 117 |
| ℓ | -17139 | -15812 | -15672 | -15459 | -15572 |
| ℓ_M | -22721 | -21839 | -21864 | -21782 | -21804 |
| ℓ_C | 5582 | 6027 | 6192 | 6323 | 6232 |
| Score-Block Model | | | | | |
| #par | 18 | 19 | 22 | 21 | 27 |
| ℓ | -17142 | -15832 | -15698 | -15484 | -15591 |
| ℓ_M | -22721 | -21842 | -21865 | -21783 | -21805 |
| ℓ_C | 5579 | 6010 | 6167 | 6299 | 6214 |

Panel B: Out-of-sample Results for 100 Assets

| | Gaussian | Student-t | Convo-t | Group-t | Hetero-t |
|---|---|---|---|---|---|
| Score-Block Model | | | | | |
| #par | 165 | 166 | 176 | 175 | 265 |
| ℓ | -202633 | -192366 | -184318 | -183362 | -179509 |
| ℓ_M | -251946 | -242925 | -243415 | -242198 | -241225 |
| ℓ_C | 49313 | 50559 | 59098 | 58836 | 61716 |
| Score-Block Model with Correlation Targeting | | | | | |
| #par | 110 | 111 | 121 | 120 | 210 |
| ℓ | -201910 | -192041 | -184080 | -183145 | -179205 |
| ℓ_M | -251946 | -242891 | -243364 | -242005 | -241072 |
| ℓ_C | 50036 | 50850 | 59284 | 58860 | 61867 |
Note: Out-of-sample results for the sample period January 2015 to December 2021. #par is the number of parameters, ℓ is the predicted log-likelihood, and ℓ_M and ℓ_C denote its marginal and copula components. The Akaike and Bayesian information criteria are respectively computed as AIC = 2 #par − 2ℓ and BIC = #par log(n) − 2ℓ. We include the score-driven log Group-correlation model under several distributional assumptions. The largest log-likelihood for each distribution is highlighted in bold.
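The information criteria in the note can be reproduced from the reported number of parameters and log-likelihood. A minimal sketch (the out-of-sample size n needed for BIC is not restated in the note, so the BIC helper is defined but not applied to the table values):

```python
import math

def aic(k, loglik):
    """Akaike information criterion: AIC = 2k - 2*loglik."""
    return 2 * k - 2 * loglik

def bic(k, loglik, n):
    """Bayesian information criterion: BIC = k*log(n) - 2*loglik.
    n is the number of (out-of-sample) observations."""
    return k * math.log(n) - 2 * loglik

# Score-Block model, Gaussian, 9 assets (Panel A): k = 18, loglik = -17142.
k, ll = 18, -17142.0
print(aic(k, ll))  # 34320.0

# The decomposition loglik = marginal + copula can be checked the same way:
ll_m, ll_c = -22721.0, 5579.0
assert abs((ll_m + ll_c) - ll) < 1e-9
```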
We report the OOS results for the Large Universe in Panel B of Table 6, where all model specifications employ a block structure on . The empirical results favor correlation targeting: the Score-Driven model with correlation targeting has the largest predicted log-likelihood for each of the five distributions. Across the five distributions, the Convolution- distribution based on 100 independent -distributions, Hetero-, has the largest predictive log-likelihood.
7 Summary
We have introduced the Cluster GARCH model, a novel multivariate GARCH model with two types of cluster structures: one that relates to the correlation structure and one that defines nonlinear dependencies. The Cluster GARCH framework combines several useful components from the existing literature. For instance, we incorporate the block correlation structure by Engle and Kelly, (2012), the correlation parametrization by Archakov and Hansen, (2021), and the convolution- distributions by Hansen and Tong, (2024). A convolution- distribution is a multivariate heavy-tailed distribution with cluster structures, flexible nonlinear dependencies, and heterogeneous marginal distributions. We also adopted the score-driven framework by Creal et al., (2013) to model the dynamic variation in the correlation structure. The convolution- distributions are well-suited for score-driven models, because their density functions are sufficiently tractable, allowing us to derive closed-form expressions for the key ingredients in score-driven models: the score and the Hessian. We derived detailed results for three special types of convolution- distributions. These are labelled Canonical-Block-, Cluster-, and Hetero-, and their score functions and Fisher information matrices are all available in closed form.
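To illustrate the convolution-t idea described above, the sketch below draws from a cluster-t-style distribution: independent t-distributed blocks, each with its own mixing variable, are standardized and rotated by a square root of the target correlation matrix. The cluster sizes, degrees of freedom, and correlation matrix are illustrative assumptions, not the paper's estimates.

```python
import numpy as np

def cluster_t_draws(C, sizes, dfs, T, rng):
    """Draw T vectors driven by independent multivariate-t blocks
    (one chi-square mixing variable per cluster), rotated by a
    square root of the correlation matrix C. Requires each df > 2."""
    n = C.shape[0]
    A = np.linalg.cholesky(C)                  # one choice of C^{1/2}
    V = np.empty((T, n))
    start = 0
    for size, df in zip(sizes, dfs):
        z = rng.standard_normal((T, size))
        w = rng.chisquare(df, size=(T, 1)) / df  # cluster-specific mixing
        # scale so each block entry has unit variance
        V[:, start:start + size] = z / np.sqrt(w) * np.sqrt((df - 2) / df)
        start += size
    return V @ A.T

# illustrative 4-dimensional example with two clusters
rng = np.random.default_rng(0)
C = np.full((4, 4), 0.3) + 0.7 * np.eye(4)
X = cluster_t_draws(C, sizes=[2, 2], dfs=[5.0, 8.0], T=200_000, rng=rng)
print(np.round(np.corrcoef(X.T), 2))  # sample correlations close to C
```

The two clusters share correlation through C but have different tail behavior, which is the feature the Cluster- specification exploits.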
Applying the model to high-dimensional systems is possible when the block correlation structure is imposed. This was pointed out in Archakov et al., (2020), but the present paper is the first to demonstrate it empirically with . This was achieved with sector-based clusters that were used to define the block structure on the correlation matrix. The block structure is advantageous for several reasons. First, it reduces the number of distinct correlations in from 4,950 to 55 ( to ). Second, many likelihood computations are greatly simplified by the canonical representation of the block correlation matrix, see Archakov and Hansen, (2024). An important implication for the dynamic model is that computations only involve inverses, determinants, and square roots of low-dimensional matrices rather than full matrices.
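The parameter reduction from the block structure is easy to verify numerically: with K clusters, a block correlation matrix has at most K(K+1)/2 distinct correlations, regardless of the dimension n. A sketch with illustrative cluster sizes and correlation values (K = 3 clusters over n = 9 assets, matching the small universe's dimension; the correlation values themselves are arbitrary):

```python
import numpy as np

def block_correlation(sizes, within, between):
    """Build a block correlation matrix: correlation within[k] inside
    cluster k and between[k][l] across clusters k != l."""
    K, n = len(sizes), sum(sizes)
    C = np.eye(n)
    bounds = np.cumsum([0] + list(sizes))
    for k in range(K):
        for l in range(K):
            rk = slice(bounds[k], bounds[k + 1])
            rl = slice(bounds[l], bounds[l + 1])
            block = np.full((sizes[k], sizes[l]),
                            within[k] if k == l else between[k][l])
            if k == l:
                np.fill_diagonal(block, 1.0)
            C[rk, rl] = block
    return C

within = [0.6, 0.5, 0.4]
between = [[0, 0.2, 0.1], [0.2, 0, 0.3], [0.1, 0.3, 0]]
C = block_correlation([3, 3, 3], within, between)
off = C[np.triu_indices_from(C, k=1)]
print(len(np.unique(np.round(off, 12))))  # 6 distinct values = K(K+1)/2
print(np.linalg.eigvalsh(C).min() > 0)    # positive definite here
```

With n = 100 and K = 10 sectors, the same count gives 55 distinct correlations instead of 4,950, as noted above.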
We conduct an extensive empirical investigation of the performance of our dynamic model for correlation matrices, considering a “small universe” with assets and a “large universe” with assets. The empirical results provide strong support for convolution- distributions, which outperform conventional distributions both in-sample and out-of-sample. Moreover, the score-driven framework outperforms the standard DCC model in all cases (dimensions and choices of distribution). The score-driven model with a sector-based block correlation matrix has the smallest BIC.
References
- Aielli, (2013) Aielli, G. P. (2013). Dynamic conditional correlation: on properties and estimation. Journal of Business and Economic Statistics, 31:282–299.
- Andersen et al., (2024) Andersen, T. G., Ding, Y., and Todorov, V. (2024). The granular source of tail risk in the cross-section of asset prices. Working paper.
- Archakov and Hansen, (2021) Archakov, I. and Hansen, P. R. (2021). A new parametrization of correlation matrices. Econometrica, 89:1699–1715.
- Archakov and Hansen, (2024) Archakov, I. and Hansen, P. R. (2024). A canonical representation of block matrices with applications to covariance and correlation matrices. Review of Economics and Statistics, 106:1–15.
- Archakov et al., (2020) Archakov, I., Hansen, P. R., and Lunde, A. (2020). A multivariate Realized GARCH model. arXiv:2012.02708, [econ.EM].
- Archakov et al., (2024) Archakov, I., Hansen, P. R., and Luo, Y. (2024). A new method for generating random correlation matrices. Econometrics Journal, forthcoming.
- Bauwens et al., (2006) Bauwens, L., Laurent, S., and Rombouts, J. V. K. (2006). Multivariate GARCH models: A survey. Journal of Applied Econometrics, 21:79–109.
- Bollerslev, (1986) Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327.
- Bollerslev, (1990) Bollerslev, T. (1990). Modelling the Coherence in Short-run Nominal Exchange Rates: A Multivariate Generalized ARCH Model. The Review of Economics and Statistics, 72:498–505.
- Creal et al., (2012) Creal, D., Koopman, S. J., and Lucas, A. (2012). A dynamic multivariate heavy-tailed model for time-varying volatilities and correlations. Journal of Business and Economic Statistics, 29:552–563.
- Creal et al., (2013) Creal, D., Koopman, S. J., and Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28:777–795.
- Creal and Tsay, (2015) Creal, D. and Tsay, R. S. (2015). High dimensional dynamic stochastic copula models. Journal of Econometrics, 189:335–345.
- Engle and Kelly, (2012) Engle, R. and Kelly, B. (2012). Dynamic equicorrelation. Journal of Business and Economic Statistics, 30:212–228.
- Engle, (1982) Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica, 50:987–1007.
- Engle, (2002) Engle, R. F. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics, 20:339–350.
- Engle et al., (2019) Engle, R. F., Ledoit, O., and Wolf, M. (2019). Large dynamic covariance matrices. Journal of Business and Economic Statistics, 37:363–375.
- Engle and Sheppard, (2001) Engle, R. F. and Sheppard, K. (2001). Theoretical and Empirical properties of Dynamic Conditional Correlation Multivariate GARCH. NBER Working Papers 8554, National Bureau of Economic Research, Inc.
- Gorgi et al., (2019) Gorgi, P., Hansen, P. R., Janus, P., and Koopman, S. J. (2019). Realized Wishart-GARCH: A score-driven multi-asset volatility model. Journal of Financial Econometrics, 17:1–32.
- Hafner and Wang, (2023) Hafner, C. M. and Wang, L. (2023). A dynamic conditional score model for the log correlation matrix. Journal of Econometrics, 237.
- Hansen and Tong, (2024) Hansen, P. R. and Tong, C. (2024). Convolution-t distributions. arXiv:2404.00864, [econ.EM].
- Harvey, (2013) Harvey, A. (2013). Dynamic Models for Volatility and Heavy Tails: with Applications to Financial and Economic Time Series. Cambridge University Press.
- Hurst, (1995) Hurst, S. (1995). The characteristic function of the Student t distribution. Financial Mathematics Research Report No. FMRR006-95, Statistics Research Report No. SRR044-95.
- Ibragimov et al., (2015) Ibragimov, M., Ibragimov, R., and Walden, J. (2015). Heavy-Tailed Distributions and Robustness in Economics and Finance. Springer.
- Joarder, (1995) Joarder, A. H. (1995). The characteristic function of the univariate T-distribution. Dhaka University Journal of Science, 43:117–125.
- Kotz and Nadarajah, (2004) Kotz, S. and Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge University Press.
- Laub, (2004) Laub, A. J. (2004). Matrix analysis for scientists and engineers. SIAM.
- Linton and McCrorie, (1995) Linton, O. and McCrorie, J. R. (1995). Differentiation of an exponential matrix function: Solution. Econometric Theory, 11:1182–1185.
- Magnus and Neudecker, (1979) Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: some properties and applications. The Annals of Statistics, 7:381–394.
- Nelson, (1991) Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 59:347–370.
- Oh and Patton, (2023) Oh, D. H. and Patton, A. J. (2023). Dynamic factor copula models with estimated cluster assignments. Journal of Econometrics, 237:105374.
- Opschoor et al., (2017) Opschoor, A., Janus, P., Lucas, A., and van Dijk, D. (2017). New HEAVY models for fat-tailed realized covariances and returns. Journal of Business and Economic Statistics, 36:643–657.
- Opschoor et al., (2021) Opschoor, A., Lucas, A., Barra, I., and van Dijk, D. (2021). Closed-form multi-factor copula models with observation-driven dynamic factor loadings. Journal of Business and Economic Statistics, 39:1066–1079.
- Pakel et al., (2021) Pakel, C., Shephard, N., Sheppard, K., and Engle, R. F. (2021). Fitting vast dimensional time-varying covariance models. Journal of Business and Economic Statistics, 39:652–668.
- Pedersen, (2016) Pedersen, R. S. (2016). Targeting estimation of CCC-GARCH models with infinite fourth moments. Econometric Theory, 32:498–531.
- Tong and Hansen, (2023) Tong, C. and Hansen, P. R. (2023). Characterizing correlation matrices that admit a clustered factor representation. Economics Letters, 233:111433.
- Tse and Tsui, (2002) Tse, Y. K. and Tsui, A. K. C. (2002). A Multivariate Generalized Autoregressive Conditional Heteroscedasticity Model with Time-Varying Correlations. Journal of Business and Economic Statistics, 20:351–362.
Appendix A Proofs
Proof of Proposition 1. Let and consider , for some . It follows that where and is a univariate random variable with distribution, . The characteristic function for the conventional Student’s -distribution with degrees of freedom, see Hurst, (1995) and Joarder, (1995), is given by:
where is the modified Bessel function of the second kind, such that the characteristic function for is given by,
and the characteristic function for is simply .
Next, the -th element of can be expressed as
where and is the -th column of identity matrix . From the independence of it now follows that the characteristic function for is given by
Finally, from the inverse Fourier transform, we can recover the probability density and cumulative distribution functions from the characteristic function of , given by
and
respectively.
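The inversion argument can be checked numerically in the univariate case: inverting the Hurst (1995)/Joarder (1995) characteristic function of the Student's t-distribution by Fourier integration should reproduce the familiar t density. A minimal sketch (the degrees of freedom and evaluation points are arbitrary choices):

```python
import numpy as np
from scipy.special import kv, gammaln
from scipy.integrate import quad
from scipy.stats import t as student_t

def t_charfun(s, nu):
    """Characteristic function of the Student's t-distribution
    (Hurst, 1995; Joarder, 1995), in modified-Bessel form:
    phi(s) = K_{nu/2}(sqrt(nu)|s|) (sqrt(nu)|s|)^{nu/2}
             / (Gamma(nu/2) 2^{nu/2 - 1})."""
    z = np.sqrt(nu) * abs(s)
    if z < 1e-8:
        return 1.0  # limit as s -> 0
    log_phi = (np.log(kv(nu / 2, z)) + (nu / 2) * np.log(z)
               - gammaln(nu / 2) - (nu / 2 - 1) * np.log(2))
    return np.exp(log_phi)

def t_pdf_via_inversion(x, nu):
    """Recover the density by the inverse Fourier transform:
    f(x) = (1/pi) * int_0^inf cos(s x) phi(s) ds  (phi is real and even)."""
    val, _ = quad(lambda s: np.cos(s * x) * t_charfun(s, nu), 0, 60, limit=200)
    return val / np.pi

nu = 5.0
for x in (0.0, 0.7, 2.5):
    assert abs(t_pdf_via_inversion(x, nu) - student_t.pdf(x, df=nu)) < 1e-6
```

The same inversion applies coordinate-wise to the convolution-t characteristic function built from the independent blocks, at higher computational cost.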
A.1 Proofs of Results for Score Model (Section 4)
First some notation. Let and be matrices; then denotes the Kronecker product. We use to denote and for , as in Creal et al., (2012). The vec operator stacks the columns of a matrix into a column vector, while vech stacks the lower-triangular part (including diagonal elements) into a column vector, where . The identity matrix is denoted by .
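The vec/vech bookkeeping can be made concrete. The sketch below constructs the elimination matrix L_n with vech(A) = L_n vec(A); this is the standard construction, not specific to the paper:

```python
import numpy as np

def elimination_matrix(n):
    """L_n of shape (n(n+1)/2, n^2) with vech(A) = L_n @ vec(A),
    where vec stacks columns and vech keeps the lower triangle
    (including the diagonal), column by column."""
    rows = n * (n + 1) // 2
    L = np.zeros((rows, n * n))
    r = 0
    for j in range(n):              # column-major order
        for i in range(j, n):       # lower triangle of column j
            L[r, j * n + i] = 1.0   # vec index of A[i, j]
            r += 1
    return L

n = 3
A = np.arange(1.0, 10.0).reshape(n, n)
A = (A + A.T) / 2                           # make symmetric
vec = A.flatten(order="F")                  # stack columns
vech = vec[[j * n + i for j in range(n) for i in range(j, n)]]
assert np.allclose(elimination_matrix(n) @ vec, vech)
```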
From the eigendecomposition, , we have from Laub, (2004, Theorem 13.16) that . The inverse is therefore given by
From Linton and McCrorie, (1995), the expression for is
(A.1) |
where is an orthonormal matrix from the eigenvectors of with eigenvalues, , and is a diagonal matrix with elements , for
Note that the expression for is just the inverse of , given by
(A.2) |
where is a diagonal matrix with elements , for .
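The divided-difference structure behind (A.1) can be checked numerically. For a symmetric matrix G with eigendecomposition G = Q diag(lam) Q', the directional derivative of the matrix exponential is Q (Phi * (Q' H Q)) Q', where Phi_ij = (exp(lam_i) - exp(lam_j)) / (lam_i - lam_j) for distinct eigenvalues and Phi_ii = exp(lam_i); this is the Daleckii-Krein form of the Linton and McCrorie (1995) expression. A sketch, validated against central finite differences:

```python
import numpy as np
from scipy.linalg import expm

def dexpm_sym(G, H):
    """Directional derivative of expm at symmetric G in direction H,
    via the eigendecomposition / divided-difference formula."""
    lam, Q = np.linalg.eigh(G)
    diff = lam[:, None] - lam[None, :]
    safe = np.where(np.abs(diff) > 1e-12, diff, 1.0)   # avoid 0/0
    Phi = np.where(np.abs(diff) > 1e-12,
                   (np.exp(lam)[:, None] - np.exp(lam)[None, :]) / safe,
                   np.exp(lam)[:, None] * np.ones_like(diff))
    return Q @ (Phi * (Q.T @ H @ Q)) @ Q.T

rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5)); G = (G + G.T) / 2
H = rng.standard_normal((5, 5)); H = (H + H.T) / 2
eps = 1e-6
fd = (expm(G + eps * H) - expm(G - eps * H)) / (2 * eps)
assert np.max(np.abs(dexpm_sym(G, H) - fd)) < 1e-6
```

The diagonal matrix in (A.1) collects exactly these divided differences of the exponential at pairs of eigenvalues, and (A.2) inverts them.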
Next, we present expectations of some quantities involving the distribution, which involve the following constant,
Lemma A.1.
Suppose that and define
(i) For any integrable function and any , it holds that
where .
(ii) Moreover, if is homogeneous of degree , then
where .
By integrable function, , the requirement is and in parts (i) and (ii), respectively. Note that is allowed to be negative, since . Also, if is a positive integer, then
where we used , repeatedly. This simplifies the terms we use to derive the Fisher information matrix in several score models.
Proof of Lemma A.1. Let and the density for is
whereas the density for is
The expected value we seek is
and the results for part (i) follows, since
To prove (ii) we use that can be expressed as , where and is an independent -distributed random variable with degrees of freedom. Hence, , with , such that . Now, using part (i) and that is homogeneous, we find
where we used that and are independent, and Creal et al., (2012, Results 2), which states that if , then
This completes the proof.
Proof of Theorem 1. The log-likelihood function for a vector, , with the multivariate -distribution, is given by
So, defining , we have
such that the score is given by
From Archakov and Hansen, (2021, Proposition 3) we have the expression
(A.3) |
which uses the fact that , where , , are elimination matrices, and the expression is given in (A.1).
Next we rewrite as
where , such that
From Lemma A.1 with , we have
where . The expression for the last expectation follows from Magnus and Neudecker, (1979, Theorem 4.1), which states that
if , where , and is the commutation matrix. Finally,
(A.4) |
This completes the proof.
Proof of Theorem 2. For this case we have the log-likelihood function
where , and . Because we have
Define , then we have
where . So, we have the formula for the score
where the matrix is defined in (A.3) and , which is based on
This proves (20). Next, the inverse of the matrix is available in closed form, see Appendix A, based on the eigendecomposition . This adds no additional computational burden, because the eigendecomposition of is available from that of , which was needed for computing from .
The Information Matrix
Next we turn to the information matrix. Note that, with given by
For later use, we define , and , for , where the constants are given from Lemma A.1, given by
and define the function as
Note that we will use the following preliminary results in the later analysis:
Expectation of when
Expectation of when
The Expression for
We have the following expression,
where with given by
So, the final formula for information matrix is given by
as stated in (21). This completes the proof.
A.2 Block Correlation Matrix with Multivariate -Distribution
Next, we prove the results in Section 4.2. For later use, we define the following variables:
By (6), we have the following form of the log-likelihood function
Because , we have
and the score is given by
Proof of Lemma 1. We have , where and
where the matrix is a diagonal matrix with diagonal elements for . The formula for is given by in (A.2). So, we have
Using the Woodbury formula, we simplify the inverse of the matrix,
which only requires the inverse of the low-dimensional matrix, , to be evaluated. Moreover, because is a diagonal matrix with elements , we define the diagonal matrix with diagonal elements , such that . This proves (22) and completes the proof of Lemma 1.
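The computational point of Lemma 1, inverting an n-dimensional "diagonal plus low rank" matrix using only a small dense inverse, can be illustrated with the Woodbury identity, (D + U C U')^{-1} = D^{-1} - D^{-1} U (C^{-1} + U' D^{-1} U)^{-1} U' D^{-1}. The matrices below are illustrative stand-ins, not the specific quantities in the lemma:

```python
import numpy as np

def woodbury_inv(d, U, C):
    """(diag(d) + U C U')^{-1} computed with only a K x K dense inverse,
    where d is the vector of diagonal elements and U is n x K."""
    Dinv_U = U / d[:, None]                    # D^{-1} U, element-wise
    K_small = np.linalg.inv(C) + U.T @ Dinv_U  # the K x K core matrix
    middle = np.linalg.inv(K_small)            # the only dense inverse
    return np.diag(1.0 / d) - Dinv_U @ middle @ Dinv_U.T

rng = np.random.default_rng(2)
n, K = 200, 5
d = rng.uniform(0.5, 2.0, size=n)
U = rng.standard_normal((n, K))
C = np.eye(K)                                  # keep the low-rank part PSD
M = np.diag(d) + U @ C @ U.T
assert np.allclose(woodbury_inv(d, U, C), np.linalg.inv(M), atol=1e-8)
```

The cost of the inversion is driven by K rather than n, which is what makes the block-structured likelihood tractable in high dimensions.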
Proof of Theorem 3. The expression for is given by,
with
where is the -th column of the identity matrix . Then we obtain
which leads to
(A.5) |
where is a vector defined by
The Information Matrix
First, from the formula for the score, we have the following form of the information matrix,
So we need to compute the matrix . From (A.5), its first term is a function of and its second term is a function of . So, for the first term, we have
Similar to (A.4), we have
For the second term, we first define
where the elements in the vector and diagonal matrix are defined by and . We know that
where . So is given by
and along with the following vector and diagonal matrix , we have
So,
and we also have
Hence,
Finally, we have
which gives the expression in Theorem 3. Finally, in the limiting case, , which corresponds to the multivariate normal distribution, we have , and the information matrix simplifies to . This completes the proof.
Appendix B Block Correlation Matrix with Cluster- Distribution
Because , and , we have , and . Then the log-likelihood function is given by
by using the canonical representation of the block correlation matrix , we define the vectors and as and , so that , . From and the structure of , we have
for , and denote the -th column of . So we obtain
where , and with the -th column of the identity matrix . This leads to the simplified expression for the log-likelihood function
Note that we have
such that and are simply linear combinations of . From the structure of it follows that and are independent for , just as is the case for and (by their definition). We also have that and are uncorrelated, but they are not independent, because they have -distributed shocks in common.
B.1 The Form of the Score
By using , we first have
where . It follows that
where is a vector defined by
B.2 The Form of the Information Matrix
The information matrix of can be decomposed into four components, given by
B.2.1 The Form of Matrix
Similar to the previous proof, the covariance of the first part of is given by
where with
with .
B.2.2 The Form of Matrix
As for the second part, we have
where given by
and for , so is a diagonal matrix.
B.2.3 The Form of Matrix
As for the interaction term, we have , and
where the -th row of the matrix is given by
when , we have
when , we have
Thus, we have
Finally, the -th row of matrix is , so we have
Appendix C Block Correlation Matrix with Hetero- Distribution
Because , we have . By using , the log-likelihood function is now given by
where , and is the -th innovation of . The cluster structure is implied by the block correlation matrix. To simplify the notation, we let denote the -th column of . We use the identity , which means that
where is -th column of the identity matrix , is the -th column of the identity matrix , and . So we have
with the and defined by
The information matrix of can be expressed as .
C.1 The Form of Matrix
and for later use, we have .
C.1.1 The First Term
We have
and with , we have
where we use that because is a vector. Next, when , we have
So, we have
where and , are defined as
respectively.
C.1.2 The Second Term
For , we have
The first term is given by
The second term is given by
So, we have
where the last equality uses the fact that as , and .
C.1.3 The Third Term
We have , then
So we have
C.1.4 Combining the Terms
Now we have
where with
C.2 The Form of Matrix
We first define the vector with elements
and obviously we have for . We then need to compute
Based on the following results on , for we have
we have
So
and for , we have
So,
and this leads to
and define the matrix as and for , we have .
C.3 The Form of Matrix
We first need to compute
For the first term, we have
So
and for the second term, we have and
So
as for the third term, we have , and
So, we have
and
We finally arrive at the following expression for , with ,
where its -th row is given by