
Optimal Portfolio Using Factor Graphical Lasso*

*The authors would like to thank the editor Fabio Trojani and three anonymous referees for helpful and constructive comments on the paper.

Tae-Hwy Lee (Department of Economics, University of California, Riverside. Email: [email protected]) and Ekaterina Seregina (Department of Economics, Colby College. Email: [email protected])
Abstract

Graphical models are a powerful tool for estimating a high-dimensional inverse covariance (precision) matrix and have been applied to the portfolio allocation problem. These models rely on the assumption that the precision matrix is sparse. However, when stock returns are driven by common factors, this assumption does not hold. We address this limitation and develop a framework, Factor Graphical Lasso (FGL), which integrates graphical models with the factor structure in the context of portfolio allocation by decomposing the precision matrix into low-rank and sparse components. Our theoretical results and simulations show that FGL consistently estimates the portfolio weights and risk exposure, and that FGL is robust to heavy-tailed distributions, which makes our method suitable for financial applications. In an empirical application to the S&P500 constituents, FGL-based portfolios are shown to exhibit superior performance over several prominent competitors, including the equal-weighted and Index portfolios.

Keywords: High-dimensionality, Portfolio optimization, Graphical Lasso, Approximate Factor Model, Sharpe Ratio, Elliptical distributions

JEL Classifications: C13, C55, C58, G11, G17

1 Introduction

Estimating the inverse covariance matrix, or precision matrix, of excess stock returns is crucial for constructing the weights of financial assets in a portfolio and for estimating the out-of-sample Sharpe Ratio. In a high-dimensional setting, when the number of assets, $p$, is greater than or equal to the sample size, $T$, using an estimator of the covariance matrix to obtain portfolio weights leads to unstable investment allocations. This is known as Markowitz's curse: a higher number of assets increases correlation between the investments, which calls for a more diversified portfolio, and yet unstable corner solutions for the weights become more likely. The reason behind this curse is the need to invert a high-dimensional covariance matrix to obtain the optimal weights from the quadratic optimization problem: when $p\geq T$, the condition number of the covariance matrix (i.e., the absolute value of the ratio between the maximal and minimal eigenvalues of the covariance matrix) is high. Hence, inverting the estimated covariance matrix yields an unstable estimator of the precision matrix. To circumvent this issue one can estimate the precision matrix directly, rather than inverting an estimated covariance matrix.

Graphical models were shown to provide consistent estimates of the precision matrix (Friedman et al., (2008); Meinshausen and Bühlmann, (2006); Cai et al., (2011)). Goto and Xu, (2015) estimated a sparse precision matrix for portfolio hedging using graphical models. They found that their portfolio achieves significant out-of-sample risk reduction and higher return compared to portfolios based on equal weights, a shrunk covariance matrix, industry factor models, and no-short-sale constraints. Awoye, (2016) used the Graphical Lasso (Friedman et al., (2008)) to estimate a sparse covariance matrix for the Markowitz mean-variance portfolio problem and reduce the realized portfolio risk. Millington and Niranjan, (2017) conducted an empirical study that applies the Graphical Lasso to the estimation of the covariance matrix for portfolio allocation. Their empirical findings suggest that portfolios using the Graphical Lasso enjoy lower risk and higher returns compared to those using the empirical covariance matrix. Millington and Niranjan, (2017) also construct a financial network using the estimated precision matrix to explore the relationships between companies and show how the constructed network helps to make investment decisions. Callot et al., (2021) use the nodewise-regression method of Meinshausen and Bühlmann, (2006) to establish consistency of the estimated covariance matrix, weights, and risk of a high-dimensional financial portfolio. Their empirical application demonstrates that the precision matrix estimator based on nodewise regression outperforms the principal orthogonal complement thresholding estimator (POET) (Fan et al., (2013)) and linear shrinkage (Ledoit and Wolf, (2004)). Cai et al., (2020) use constrained $\ell_{1}$-minimization for inverse matrix estimation (Clime) of the precision matrix (Cai et al., (2011)) to develop a consistent estimator of the minimum variance of a high-dimensional global minimum-variance portfolio. It is important to note that all the aforementioned methods impose some sparsity assumption on the precision matrix of excess returns.

An alternative strategy for handling the high-dimensional setting uses factor models to acknowledge the common variation in stock prices, which has been documented in many empirical studies (see Campbell et al., (1997) among many others). A common approach decomposes the covariance matrix of excess returns into low-rank and sparse parts; the latter is further regularized since, after the common factors are accounted for, the remaining covariance matrix of the idiosyncratic components is still high-dimensional (Fan et al., (2013, 2011, 2018)). This stream of literature, however, focuses on the estimation of a covariance matrix. The accuracy of precision matrices obtained by inverting a factor-based covariance matrix was investigated by Ait-Sahalia and Xiu, (2017), but they did not study the high-dimensional case. Factor models are generally treated as competitors to graphical models: for example, Callot et al., (2021) find evidence of superior performance of the nodewise-regression estimator of the precision matrix over the factor-based estimator POET (Fan et al., (2013)) in terms of the out-of-sample Sharpe Ratio and risk of a financial portfolio. The root cause of factor models and graphical models being treated separately is the sparsity assumption on the precision matrix made in the latter. Specifically, as pointed out in Koike, (2020), when asset returns have common factors, the precision matrix cannot be sparse because all pairs of assets are partially correlated, conditional on the other assets, through the common factors. One attempt to integrate factor modeling and high-dimensional precision matrix estimation was made by Fan et al., (2018) (Section 5.2): the authors referred to this class of models as “conditional graphical models”. However, this was not the main focus of their paper, which concentrated on covariance estimation through elliptical factor models. As Fan et al., (2018) pointed out, “though substantial amount of efforts have been made to understand the graphical model, little has been done for estimating conditional graphical model, which is more general and realistic”. Concretely, to the best of our knowledge there are no studies that examine the theoretical and empirical performance of graphical models integrated with the factor structure in the context of portfolio allocation.

In this paper we fill this gap and develop a new conditional precision matrix estimator for the excess returns under the approximate factor model that combines the benefits of graphical models and the factor structure. We call our algorithm the Factor Graphical Lasso (FGL). We use a factor model to remove the co-movements induced by the factors, and then apply the Weighted Graphical Lasso to estimate the precision matrix of the idiosyncratic terms. We prove consistency of FGL in the spectral and $\ell_{1}$ matrix norms. In addition, we prove consistency of the estimated portfolio weights and risk exposure for three formulations of the optimal portfolio allocation.

Our empirical application uses daily and monthly data for the constituents of the S&P500: we demonstrate that FGL outperforms the equal-weighted portfolio, the index portfolio, and portfolios based on other estimators of the precision matrix (Clime, Cai et al., (2011)) and the covariance matrix, including POET (Fan et al., (2013)) and shrinkage estimators adjusted to allow for the factor structure (Ledoit and Wolf, (2004); Ledoit and Wolf, (2017)), in terms of the out-of-sample Sharpe Ratio. Furthermore, we find strong empirical evidence that relaxing the constraint that portfolio weights sum up to one leads to a large increase in the out-of-sample Sharpe Ratio, which, to the best of our knowledge, has not been previously well-studied in the empirical finance literature.

From the theoretical perspective, our paper makes several important contributions to the existing literature on graphical models and factor models. First, to the best of our knowledge, there are no equivalent theoretical results that establish consistency of the portfolio weights and risk exposure in a high-dimensional setting without assuming sparsity of the covariance or precision matrix of stock returns. Second, we extend the theoretical results of POET (Fan et al., (2013)) to allow the number of factors to grow with the number of assets. Concretely, we establish uniform consistency for the factors and factor loadings estimated using PCA. Third, we are not aware of any other papers that provide convergence results for estimating a high-dimensional precision matrix using the Weighted Graphical Lasso under the approximate factor model with unobserved factors. Furthermore, all theoretical results established in this paper hold for a wide range of distributions: the sub-Gaussian family (including the Gaussian) and the elliptical family. Our simulations demonstrate that FGL is robust to very heavy-tailed distributions, which makes our method suitable for financial applications. Finally, we demonstrate that, in contrast to POET, the success of the proposed method does not heavily depend on the factor pervasiveness assumption: FGL is robust to scenarios in which the gap between the diverging and bounded eigenvalues decreases.

This paper is organized as follows: Section 2 reviews the basics of the Markowitz mean-variance portfolio theory. Section 3 provides a brief summary of graphical models and introduces the Factor Graphical Lasso. Section 4 contains the theoretical results, and Section 5 validates these results using simulations. Section 6 provides the empirical application. Section 7 concludes.

Notation

For the convenience of the reader, we summarize the notation used throughout the paper. Let $\mathcal{S}_{p}$ denote the set of all $p\times p$ symmetric matrices, and $\mathcal{S}_{p}^{++}$ the set of all $p\times p$ positive definite matrices. For any matrix ${\mathbf{C}}$, its $(i,j)$-th element is denoted $c_{ij}$. Given a vector ${\mathbf{u}}\in\mathbb{R}^{d}$ and a parameter $a\in[1,\infty)$, let $\lVert{\mathbf{u}}\rVert_{a}$ denote its $\ell_{a}$-norm. Given a matrix ${\mathbf{U}}\in\mathcal{S}_{p}$, let $\Lambda_{\max}({\mathbf{U}})\equiv\Lambda_{1}({\mathbf{U}})\geq\Lambda_{2}({\mathbf{U}})\geq\ldots\geq\Lambda_{\min}({\mathbf{U}})\equiv\Lambda_{p}({\mathbf{U}})$ denote the eigenvalues of ${\mathbf{U}}$, and let $\text{eig}_{K}({\mathbf{U}})\in\mathbb{R}^{K\times p}$ denote the first $K\leq p$ normalized eigenvectors corresponding to $\Lambda_{1}({\mathbf{U}}),\ldots,\Lambda_{K}({\mathbf{U}})$. Given parameters $a,b\in[1,\infty)$, let $|||{\mathbf{U}}|||_{a,b}\equiv\max_{\lVert{\mathbf{y}}\rVert_{a}=1}\lVert{\mathbf{U}}{\mathbf{y}}\rVert_{b}$ denote the induced matrix-operator norm. The special cases are $|||{\mathbf{U}}|||_{1}\equiv\max_{1\leq j\leq p}\sum_{i=1}^{p}\lvert u_{i,j}\rvert$ for the $\ell_{1}/\ell_{1}$-operator norm; the operator norm ($\ell_{2}$-matrix norm) $|||{\mathbf{U}}|||_{2}^{2}\equiv\Lambda_{\max}({\mathbf{U}}{\mathbf{U}}^{\prime})$, so that $|||{\mathbf{U}}|||_{2}$ equals the maximal singular value of ${\mathbf{U}}$; and $|||{\mathbf{U}}|||_{\infty}\equiv\max_{1\leq j\leq p}\sum_{i=1}^{p}\lvert u_{j,i}\rvert$ for the $\ell_{\infty}/\ell_{\infty}$-operator norm. Finally, $\lVert{\mathbf{U}}\rVert_{\max}\equiv\max_{i,j}\lvert u_{i,j}\rvert$ denotes the element-wise maximum, and $|||{\mathbf{U}}|||_{F}^{2}\equiv\sum_{i,j}u_{i,j}^{2}$ defines the Frobenius matrix norm.

2 Optimal Portfolio Allocation

Suppose we observe $p$ assets (indexed by $i$) over $T$ periods of time (indexed by $t$). Let $\widetilde{{\mathbf{r}}}_{t}=(\widetilde{r}_{1t},\widetilde{r}_{2t},\ldots,\widetilde{r}_{pt})^{\prime}\sim\mathcal{D}({\mathbf{m}},{\bm{\Sigma}})$ be a $p\times 1$ vector of excess returns drawn from a distribution $\mathcal{D}$, where ${\mathbf{m}}$ and ${\bm{\Sigma}}$ are the unconditional mean and covariance matrix of the returns. The goal of the Markowitz theory is to choose asset weights in a portfolio optimally. We study two optimization problems: the well-known Markowitz weight-constrained (MWC) optimization problem, and the Markowitz risk-constrained (MRC) optimization problem that relaxes the constraint on portfolio weights.

The first optimization problem searches for asset weights such that the portfolio achieves a desired expected rate of return with minimum risk, under the restriction that all weights sum up to one. This can be formulated as the following quadratic optimization problem:

$$\min_{{\mathbf{w}}}\ \frac{1}{2}{\mathbf{w}}^{\prime}{\bm{\Sigma}}{\mathbf{w}},\quad\text{s.t.}\quad{\mathbf{w}}^{\prime}{\bm{\iota}}_{p}=1\ \text{and}\ {\mathbf{m}}^{\prime}{\mathbf{w}}\geq\mu \qquad (2.1)$$

where ${\mathbf{w}}$ is a $p\times 1$ vector of asset weights in the portfolio, ${\bm{\iota}}_{p}$ is a $p\times 1$ vector of ones, and $\mu$ is the desired expected rate of portfolio return. Let ${\bm{\Theta}}\equiv{\bm{\Sigma}}^{-1}$ denote the precision matrix.

If ${\mathbf{m}}^{\prime}{\mathbf{w}}>\mu$, then the solution to (2.1) yields the global minimum-variance (GMV) portfolio weights ${\mathbf{w}}_{GMV}$:

$${\mathbf{w}}_{GMV}=({\bm{\iota}}_{p}^{\prime}{\bm{\Theta}}{\bm{\iota}}_{p})^{-1}{\bm{\Theta}}{\bm{\iota}}_{p}. \qquad (2.2)$$

If ${\mathbf{m}}^{\prime}{\mathbf{w}}=\mu$, the solution to (2.1) is given by the well-known two-fund separation theorem introduced by Tobin, (1958):

$${\mathbf{w}}_{MWC}=(1-a_{1}){\mathbf{w}}_{GMV}+a_{1}{\mathbf{w}}_{M}, \qquad (2.3)$$

where ${\mathbf{w}}_{MWC}$ denotes the portfolio allocation under the constraint that the weights sum up to one, ${\mathbf{w}}_{M}=({\bm{\iota}}_{p}^{\prime}{\bm{\Theta}}{\mathbf{m}})^{-1}{\bm{\Theta}}{\mathbf{m}}$, and $a_{1}=[\mu({\mathbf{m}}^{\prime}{\bm{\Theta}}{\bm{\iota}}_{p})({\bm{\iota}}_{p}^{\prime}{\bm{\Theta}}{\bm{\iota}}_{p})-({\mathbf{m}}^{\prime}{\bm{\Theta}}{\bm{\iota}}_{p})^{2}]/[({\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}})({\bm{\iota}}_{p}^{\prime}{\bm{\Theta}}{\bm{\iota}}_{p})-({\mathbf{m}}^{\prime}{\bm{\Theta}}{\bm{\iota}}_{p})^{2}]$.

The MRC problem maximizes the Sharpe Ratio (SR) subject to either a target return or a target risk constraint, while the portfolio weights are not required to sum up to one:

$$\max_{{\mathbf{w}}}\ \frac{{\mathbf{m}}^{\prime}{\mathbf{w}}}{\sqrt{{\mathbf{w}}^{\prime}{\bm{\Sigma}}{\mathbf{w}}}}\quad\text{s.t.}\quad\text{(i)}\ {\mathbf{m}}^{\prime}{\mathbf{w}}\geq\mu\quad\text{or}\quad\text{(ii)}\ {\mathbf{w}}^{\prime}{\bm{\Sigma}}{\mathbf{w}}\leq\sigma^{2}. \qquad (2.4)$$

When $\mu=\sigma\sqrt{{\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}}$, the solution under either constraint is given by

$${\mathbf{w}}_{MRC}=\frac{\sigma}{\sqrt{{\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}}}\,{\bm{\Theta}}{\mathbf{m}}. \qquad (2.5)$$

Equation (2.4) tells us that once an investor specifies the desired return, $\mu$, and the maximum risk-tolerance level, $\sigma$, the MRC weights maximize the Sharpe Ratio of the portfolio.

Therefore, we have three alternative portfolio allocations commonly used in the existing literature: GMV in (2.2), MWC in (2.3), and MRC in (2.5). All three formulations require an estimate of the precision matrix ${\bm{\Theta}}$.
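These three allocations can be computed directly from estimates of ${\bm{\Theta}}$ and ${\mathbf{m}}$. The following is a minimal sketch in Python (ours, not part of the paper; the function names and inputs are assumptions), taking a precision matrix `Theta`, a mean vector `m`, a target return `mu`, and a risk budget `sigma` as given:

```python
import numpy as np

def gmv_weights(Theta):
    """Global minimum-variance weights, eq. (2.2)."""
    iota = np.ones(Theta.shape[0])
    return Theta @ iota / (iota @ Theta @ iota)

def mwc_weights(Theta, m, mu):
    """Weight-constrained mean-variance weights via two-fund separation, eq. (2.3)."""
    iota = np.ones(Theta.shape[0])
    a, b, d = iota @ Theta @ iota, m @ Theta @ iota, m @ Theta @ m
    a1 = (mu * b * a - b**2) / (d * a - b**2)
    return (1 - a1) * gmv_weights(Theta) + a1 * (Theta @ m) / b

def mrc_weights(Theta, m, sigma):
    """Risk-constrained (maximum Sharpe Ratio) weights, eq. (2.5)."""
    return sigma / np.sqrt(m @ Theta @ m) * (Theta @ m)
```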

3 Factor Graphical Lasso

In this section we introduce a framework for estimating the precision matrix for the aforementioned financial portfolios that accounts for the fact that the returns follow an approximate factor structure. We examine how to solve the Markowitz mean-variance portfolio allocation problems using the factor structure in the returns. We also develop the Factor Graphical Lasso algorithm, which uses the estimated common factors to obtain a sparse precision matrix of the idiosyncratic component. The resulting estimator is used to obtain the precision matrix of the asset returns necessary to form portfolio weights.

The arbitrage pricing theory (APT), developed by Ross, (1976), postulates that the expected returns on securities should be related to their covariance with the common components, or factors. The goal of the APT is to model the tendency of asset returns to move together via a factor decomposition. Assume that the return-generating process $\widetilde{{\mathbf{r}}}_{t}$ follows a $K$-factor model:

$$\underbrace{\widetilde{{\mathbf{r}}}_{t}}_{p\times 1}={\mathbf{m}}+{\mathbf{B}}\underbrace{{\mathbf{f}}_{t}}_{K\times 1}+{\bm{\varepsilon}}_{t},\quad t=1,\ldots,T \qquad (3.1)$$

where ${\mathbf{f}}_{t}=(f_{1t},\ldots,f_{Kt})^{\prime}$ is the vector of common factors, ${\mathbf{B}}$ is a $p\times K$ matrix of factor loadings, and ${\bm{\varepsilon}}_{t}$ is the idiosyncratic component that cannot be explained by the common factors. Without loss of generality, we assume throughout the paper that the unconditional means of the factors and the idiosyncratic component are zero. The factors in (3.1) can be either observable, as in Fama and French, (1993, 2015), or estimated using statistical factor models. Unobservable factors and loadings are usually estimated by principal component analysis (PCA), as studied in Bai, (2003), Bai and Ng, (2002), Connor and Korajczyk, (1988), and Stock and Watson, (2002).

In this paper our main interest lies in establishing the asymptotic properties of the estimators of the precision matrix, portfolio weights, and risk exposure in the high-dimensional case. We allow the number of common factors, $K=K_{p,T}$, to grow: $K\rightarrow\infty$ as $p\rightarrow\infty$, or $T\rightarrow\infty$, or both $p,T\rightarrow\infty$, but we require that $\max\{K/p,K/T\}\rightarrow 0$ as $p,T\rightarrow\infty$.

Our setup is similar to the one studied in Fan et al., (2013): we consider a spiked covariance model in which the first $K$ principal eigenvalues of ${\bm{\Sigma}}$ grow with $p$, while the remaining $p-K$ eigenvalues are bounded.

Rewrite equation (3.1) in matrix form:

$$\underbrace{\widetilde{{\mathbf{R}}}}_{p\times T}={\mathbf{m}}{\bm{\iota}}^{\prime}_{T}+\underbrace{{\mathbf{B}}}_{p\times K}{\mathbf{F}}+{\mathbf{E}}, \qquad (3.2)$$

where ${\bm{\iota}}_{T}$ is a $T\times 1$ vector of ones. We further demean the returns using the sample mean, $\widehat{{\mathbf{m}}}$, to obtain ${\mathbf{R}}\equiv\widetilde{{\mathbf{R}}}-\widehat{{\mathbf{m}}}{\bm{\iota}}^{\prime}_{T}$. We assume that $\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\rVert_{\max}=\mathcal{O}_{P}(\sqrt{\log p/T})$, which was proven to hold in Chang et al., (2018) (see their Lemma 1).

Let ${\bm{\Sigma}}_{\varepsilon}=T^{-1}{\mathbf{E}}{\mathbf{E}}^{\prime}$ and ${\bm{\Sigma}}_{f}=T^{-1}{\mathbf{F}}{\mathbf{F}}^{\prime}$ be the covariance matrices of the idiosyncratic components and the factors, and let ${\bm{\Theta}}_{\varepsilon}={\bm{\Sigma}}_{\varepsilon}^{-1}$ and ${\bm{\Theta}}_{f}={\bm{\Sigma}}_{f}^{-1}$ be their inverses. The factors and loadings in (3.2) are estimated by solving the following minimization problem: $(\widehat{{\mathbf{B}}},\widehat{{\mathbf{F}}})=\arg\min_{{\mathbf{B}},{\mathbf{F}}}\lVert{\mathbf{R}}-{\mathbf{B}}{\mathbf{F}}\rVert^{2}_{F}$ s.t. $\frac{1}{T}{\mathbf{F}}{\mathbf{F}}^{\prime}={\mathbf{I}}_{K}$ and ${\mathbf{B}}^{\prime}{\mathbf{B}}$ is diagonal. The constraints are needed to identify the factors (Fan et al., (2018)). It was shown in Stock and Watson, (2002) that $\widehat{{\mathbf{F}}}=\sqrt{T}\,\text{eig}_{K}({\mathbf{R}}^{\prime}{\mathbf{R}})$ and $\widehat{{\mathbf{B}}}=T^{-1}{\mathbf{R}}\widehat{{\mathbf{F}}}^{\prime}$. Given $\widehat{{\mathbf{F}}}$ and $\widehat{{\mathbf{B}}}$, define $\widehat{{\mathbf{E}}}={\mathbf{R}}-\widehat{{\mathbf{B}}}\widehat{{\mathbf{F}}}$. Given a sample of the estimated residuals $\{\widehat{{\bm{\varepsilon}}}_{t}={\mathbf{r}}_{t}-\widehat{{\mathbf{B}}}\widehat{{\mathbf{f}}}_{t}\}_{t=1}^{T}$ and the estimated factors $\{\widehat{{\mathbf{f}}}_{t}\}_{t=1}^{T}$, let $\widehat{{\bm{\Sigma}}}_{\varepsilon}=(1/T)\sum_{t=1}^{T}\widehat{{\bm{\varepsilon}}}_{t}\widehat{{\bm{\varepsilon}}}_{t}^{\prime}$ and $\widehat{{\bm{\Sigma}}}_{f}=(1/T)\sum_{t=1}^{T}\widehat{{\mathbf{f}}}_{t}\widehat{{\mathbf{f}}}_{t}^{\prime}$ be the sample counterparts of these covariance matrices. Since our interest is in constructing portfolio weights, our goal is to estimate the precision matrix of the excess returns, ${\bm{\Theta}}$.
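As an illustration of this step, the following sketch (ours, not the authors' code) computes $\widehat{{\mathbf{F}}}$, $\widehat{{\mathbf{B}}}$, the residuals, and the sample covariance matrices from a demeaned $p\times T$ return matrix `R`, treating the number of factors `K` as given:

```python
import numpy as np

def estimate_factor_model(R, K):
    """PCA estimation of a K-factor model from demeaned p x T returns R,
    under the normalization F F' / T = I_K."""
    p, T = R.shape
    eigval, eigvec = np.linalg.eigh(R.T @ R)        # eigen-decomposition of the T x T matrix R'R
    top = np.argsort(eigval)[::-1][:K]              # indices of the K largest eigenvalues
    F_hat = np.sqrt(T) * eigvec[:, top].T           # K x T estimated factors
    B_hat = R @ F_hat.T / T                         # p x K estimated loadings
    E_hat = R - B_hat @ F_hat                       # p x T estimated residuals
    Sigma_eps_hat = E_hat @ E_hat.T / T             # sample covariance of residuals
    Sigma_f_hat = F_hat @ F_hat.T / T               # sample covariance of factors (= I_K here)
    return F_hat, B_hat, E_hat, Sigma_eps_hat, Sigma_f_hat
```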

We impose a sparsity assumption on the precision matrix of the idiosyncratic errors, ${\bm{\Theta}}_{\varepsilon}$, which is obtained using the estimated residuals after removing the co-movements induced by the factors (see Barigozzi et al., (2018); Brownlees et al., (2018); Koike, (2020)).

Let us elaborate on three reasons justifying the assumption of sparsity on the precision matrix of the residuals. First, from a technical viewpoint, this assumption is widely used in high-dimensional settings when $p>T$. Second, a more intuitive rationale for the sparsity assumption on ${\bm{\Theta}}_{\varepsilon}$ stems from its implication for the structure of the corresponding optimal portfolios. Let $r_{t}^{\text{portf}}\equiv\widetilde{{\mathbf{r}}}_{t}^{\prime}{\mathbf{w}}_{t}$ be the optimal portfolio return. Plugging in the definition of $\widetilde{{\mathbf{r}}}_{t}$ from (3.1), we get $r_{t}^{\text{portf}}=({\mathbf{m}}+{\bm{\varepsilon}}_{t})^{\prime}{\mathbf{w}}_{t}+{\mathbf{f}}_{t}^{\prime}{\mathbf{B}}^{\prime}{\mathbf{w}}_{t}$. Hence, after hedging the factor risk, we can isolate the excess return component that loads only on non-factor risk. In this context, since ${\mathbf{w}}_{t}$ is a function of ${\bm{\Theta}}_{\varepsilon}$, imposing sparsity on ${\bm{\Theta}}_{\varepsilon}$ reduces the contribution of the more volatile non-factor risk to the optimal portfolio and thus leads to less sensitive (more robust) investment strategies.

Third, another rationale comes from the relatively high “concentration” of the S&P 500 Composite Index: as evidenced by the S&P Global index methodology and financial data on the S&P 500 constituents by weight, 15 large companies (the top 3%) comprise 30% of the total index weight (starting with Apple, which has the highest weight of nearly 7%). As the number of firms, $p$, increases, a reasonable assumption is that the number of large firms increases at a rate slower than $p$ (Chudik et al., (2011); Gabaix, (2011)). This suggests that one could divide the firms into dominant firms and followers. After the effect of the common factors is accounted for, dominant firms still have significant idiosyncratic movements that influence other firms and must be taken into account when constructing a portfolio. For fringe firms (or market followers), idiosyncratic movements are smaller in magnitude and might be less relevant for portfolio allocation purposes. Hence, the network of the idiosyncratic returns is sparse, and the sparsity increases with $p$. By imposing sparsity, we keep only the relatively large partial correlations among the idiosyncratic components: as illustrated in Supplemental Appendix D.2, in our empirical application the estimated share of zero off-diagonal elements of ${\bm{\Theta}}_{\varepsilon}$ varies over time between 74.5% and 98.8%.

Having established the need for a sparse precision matrix of the errors, we now search for a tool that helps us recover its entries. This brings us to the family of graphical models, which have evolved from the connection between partial correlations and the entries of an adjacency matrix. The adjacency matrix has zero or one in its entries, with a zero entry indicating that the two corresponding variables are conditionally independent given the rest. The adjacency matrix is sometimes referred to as a “graph”. The Graphical Lasso procedure (Friedman et al., (2008)), described in Supplemental Appendix A, is a representative member of the graphical models family: its theoretical and empirical properties have been thoroughly examined in the standard sparse setting (Friedman et al., (2008); Mazumder and Hastie, (2012); Janková and van de Geer, (2018)). One of the goals of our paper is to extend graphical models to non-sparse settings by integrating them with factor modeling. By doing so, graphical models become adequate for applications in economics and finance.

A common way to induce sparsity is to use a Lasso-type penalty. This strategy is used in the Graphical Lasso (GL), together with an objective function based on the Bregman divergence, for estimating the inverse covariance. The discussion of GL is presented in Supplemental Appendix A. We now elaborate on the Bregman divergence class, which unifies many commonly used loss functions, including the quasi-likelihood function. Let ${\mathbf{W}}_{\varepsilon}$ be an estimate of ${\bm{\Sigma}}_{\varepsilon}$. Ravikumar et al., (2011) showed that the Bregman divergence of the form $\text{trace}({\mathbf{W}}_{\varepsilon}{\bm{\Theta}}_{\varepsilon})-\log\det({\bm{\Theta}}_{\varepsilon})$, known as the log-determinant Bregman function, is suitable as a measure of the quality of constructed sparse approximations of signals such as precision matrices. As pointed out by Ravikumar et al., (2011), in principle one could use other Bregman divergences, including the von Neumann entropy or the Frobenius divergence, which would lead to alternative divergence minimizations for estimating the precision matrix. We proceed with the log-determinant Bregman function since (i) it ensures a positive definite estimator of the precision matrix; (ii) the population optimization problem involves only the population covariance and not its inverse; and (iii) the log-determinant divergence gives rise to the likelihood function in the multivariate Gaussian case. At the same time, despite its resemblance to the Gaussian log-likelihood, the Bregman divergence was shown to be applicable to non-Gaussian distributions (Ravikumar et al., (2011)). Let $\widehat{{\mathbf{D}}}_{\varepsilon}^{2}\equiv\textup{diag}({\mathbf{W}}_{\varepsilon})$. To sparsify the entries of the precision matrix of the idiosyncratic errors, ${\bm{\Theta}}_{\varepsilon}$, we minimize the following penalized Bregman divergence with the Weighted Graphical Lasso penalty:

$$\widehat{{\bm{\Theta}}}_{\varepsilon,\lambda}=\arg\min_{{\bm{\Theta}}_{\varepsilon}\in\mathcal{S}_{p}^{++}}\ \text{trace}({\mathbf{W}}_{\varepsilon}{\bm{\Theta}}_{\varepsilon})-\log\det({\bm{\Theta}}_{\varepsilon})+\lambda\sum_{i\neq j}\widehat{d}_{\varepsilon,ii}\widehat{d}_{\varepsilon,jj}\lvert\theta_{\varepsilon,ij}\rvert. \qquad (3.3)$$

The subscript $\lambda$ in $\widehat{{\bm{\Theta}}}_{\varepsilon,\lambda}$ indicates that the solution of the optimization problem in (3.3) depends on the choice of the tuning parameter, which is discussed below. Section 4 establishes the sparsity requirements that guarantee convergence of (3.3). To simplify notation, we will omit the subscript $\lambda$.

The objective function in (3.3) extends the family of linear shrinkage estimators of the first moment to linear shrinkage estimators of the inverse of the second moment. Instead of restricting the number of regressors for estimating a conditional mean, equation (3.3) restricts the number of edges in a graph by shrinking some off-diagonal entries of the precision matrix to zero. Note that the shrinkage occurs adaptively with respect to the partial covariances.
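To illustrate how (3.3) can be computed, the sketch below uses the standard Graphical Lasso solver from scikit-learn as a stand-in for the authors' implementation, taking ${\mathbf{W}}_{\varepsilon}=\widehat{{\bm{\Sigma}}}_{\varepsilon}$. It relies on the fact that the weighted penalty $\lambda\widehat{d}_{\varepsilon,ii}\widehat{d}_{\varepsilon,jj}\lvert\theta_{\varepsilon,ij}\rvert$ is equivalent to applying an unweighted Graphical Lasso to the correlation matrix and rescaling the solution back; the function name and this reformulation are ours:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def weighted_glasso(Sigma_eps_hat, lam):
    """Weighted Graphical Lasso of eq. (3.3) with penalty lam * d_ii * d_jj * |theta_ij|."""
    d = np.sqrt(np.diag(Sigma_eps_hat))               # d_ii: square roots of the diagonal of W_eps
    corr = Sigma_eps_hat / np.outer(d, d)             # correlation matrix
    _, prec_corr = graphical_lasso(corr, alpha=lam)   # unweighted GL on the correlation scale
    return prec_corr / np.outer(d, d)                 # rescale back: Theta = D^{-1} K D^{-1}
```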

Let us discuss the choice of the tuning parameter $\lambda$ in (3.3). Let $\widehat{{\bm{\Theta}}}_{\varepsilon,\lambda}$ be the solution to (3.3) for a fixed $\lambda$. Following Koike, (2020), we minimize the following Bayesian Information Criterion (BIC) using a grid search:

$$\text{BIC}(\lambda)\equiv T\Big[\text{trace}(\widehat{{\bm{\Theta}}}_{\varepsilon,\lambda}\widehat{{\bm{\Sigma}}}_{\varepsilon})-\log\det(\widehat{{\bm{\Theta}}}_{\varepsilon,\lambda})\Big]+(\log T)\sum_{i\leq j}\mathds{1}\big[\widehat{\theta}_{\varepsilon,\lambda,ij}\neq 0\big]. \qquad (3.4)$$

The grid $\mathcal{G}\equiv\{\lambda_{1},\ldots,\lambda_{M}\}$ is constructed as follows: the maximum value in the grid, $\lambda_{M}$, is set to the smallest value for which all the off-diagonal entries of $\widehat{{\bm{\Theta}}}_{\varepsilon,\lambda_{M}}$ are zero. The smallest value in the grid, $\lambda_{1}\in\mathcal{G}$, is defined as $\lambda_{1}\equiv\vartheta\lambda_{M}$ for a constant $0<\vartheta<1$. The remaining grid values $\lambda_{2},\ldots,\lambda_{M-1}$ are constructed in ascending order from $\lambda_{1}$ to $\lambda_{M}$ on the log scale:

$$\lambda_{i}=\exp\Big(\log(\lambda_{1})+\frac{i-1}{M-1}\log(\lambda_{M}/\lambda_{1})\Big),\quad i=2,\ldots,M-1.$$

We use $\vartheta=\omega_{3T}$, which is defined in Theorem 2 of the next section, and $M=10$ in the simulations and the empirical exercise.
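A sketch of this tuning step is given below (ours, not the authors' code). It takes the grid endpoint $\lambda_{M}$ and a generic fitting routine `fit` returning $\widehat{{\bm{\Theta}}}_{\varepsilon,\lambda}$ for a given $\lambda$ (for instance, the weighted Graphical Lasso sketch above with $\widehat{{\bm{\Sigma}}}_{\varepsilon}$ held fixed), and selects $\lambda$ by minimizing (3.4):

```python
import numpy as np

def bic_select(Sigma_eps_hat, fit, T, lambda_max, vartheta=0.1, M=10):
    """Grid search for lambda via the BIC in eq. (3.4); `fit(lam)` returns Theta_eps_hat."""
    grid = np.exp(np.linspace(np.log(vartheta * lambda_max), np.log(lambda_max), M))
    best = (np.inf, None, None)
    for lam in grid:
        Theta = fit(lam)
        _, logdet = np.linalg.slogdet(Theta)
        bic = T * (np.trace(Theta @ Sigma_eps_hat) - logdet) \
              + np.log(T) * np.count_nonzero(np.triu(Theta))   # nonzero entries with i <= j
        if bic < best[0]:
            best = (bic, lam, Theta)
    return best[1], best[2]    # selected lambda and the corresponding estimate
```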

Having estimated the factors, the factor loadings, and the precision matrix of the idiosyncratic components, we combine them using the Sherman-Morrison-Woodbury formula to estimate the precision matrix of excess returns:

$$\widehat{{\bm{\Theta}}}=\widehat{{\bm{\Theta}}}_{\varepsilon}-\widehat{{\bm{\Theta}}}_{\varepsilon}\widehat{{\mathbf{B}}}\big[\widehat{{\bm{\Theta}}}_{f}+\widehat{{\mathbf{B}}}^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}\widehat{{\mathbf{B}}}\big]^{-1}\widehat{{\mathbf{B}}}^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}. \qquad (3.5)$$
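A minimal sketch of this combination step (ours; the variable names are assumptions) is:

```python
import numpy as np

def combine_precision(Theta_eps_hat, Theta_f_hat, B_hat):
    """Sherman-Morrison-Woodbury combination of eq. (3.5)."""
    middle = Theta_f_hat + B_hat.T @ Theta_eps_hat @ B_hat        # K x K matrix in brackets
    correction = Theta_eps_hat @ B_hat @ np.linalg.solve(middle, B_hat.T @ Theta_eps_hat)
    return Theta_eps_hat - correction                             # p x p precision of returns
```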

To solve (3.3) we use a procedure based on the GL. However, the original algorithm developed by Friedman et al., (2008) is not suitable under the factor structure. Our procedure, called the Factor Graphical Lasso (FGL) and summarized in Procedure 1, augments the standard GL: it starts by estimating the factors, loadings (the low-rank part), and error terms (the sparse part); it then recovers the sparse precision matrix of the errors using GL; finally, the low-rank and sparse components are combined through the Sherman-Morrison-Woodbury formula in (3.5).

Procedure 1: Factor Graphical Lasso
1: (Factor Model) Estimate $\widehat{{\mathbf{f}}}_{t}$ and $\widehat{{\mathbf{b}}}_{i}$ (Theorem 1). Get $\widehat{{\bm{\varepsilon}}}_{t}={\mathbf{r}}_{t}-\widehat{{\mathbf{B}}}\widehat{{\mathbf{f}}}_{t}$, $\widehat{{\bm{\Sigma}}}_{\varepsilon}$, $\widehat{{\bm{\Sigma}}}_{f}$ and $\widehat{{\bm{\Theta}}}_{f}=\widehat{{\bm{\Sigma}}}_{f}^{-1}$.
2: (GL) Use the GL of Friedman et al., (2008) (see Supplemental Appendix A for more details) to get $\widehat{{\bm{\Theta}}}_{\varepsilon}$. (Theorem 2)
3: (FGL) Use $\widehat{{\bm{\Theta}}}_{\varepsilon}$, $\widehat{{\bm{\Theta}}}_{f}$ and $\widehat{{\mathbf{b}}}_{i}$ from Steps 1-2 to get $\widehat{{\bm{\Theta}}}$ in Equation (3.5). (Theorem 3)
4: Use $\widehat{{\bm{\Theta}}}$ to get $\widehat{{\mathbf{w}}}_{\xi}$, $\xi\in\{\text{GMV, MWC, MRC}\}$. (Theorem 4)
5: Use $\widehat{{\bm{\Sigma}}}=\widehat{{\bm{\Theta}}}^{-1}$ and $\widehat{{\mathbf{w}}}_{\xi}$ to get the portfolio risk exposure $\widehat{{\mathbf{w}}}_{\xi}^{\prime}\widehat{{\bm{\Sigma}}}\widehat{{\mathbf{w}}}_{\xi}$. (Theorem 5)

The estimator produced by GL in general, and by FGL in particular, is guaranteed to be positive definite; we have verified this in the simulations (Section 5) and in the empirical application (Section 6). In Section 4, consistency is established for the estimators of the factors and loadings (Theorem 1), the precision matrix of ${\bm{\varepsilon}}$ (Theorem 2), the precision matrix ${\bm{\Theta}}$ (Theorem 3), the portfolio weights (Theorem 4), and the portfolio risk exposure (Theorem 5). We can then use $\widehat{{\bm{\Theta}}}$ obtained from (3.5) to estimate the portfolio weights in (2.2), (2.3) and (2.5), as in Step 4 of Procedure 1.

4 Asymptotic Properties

In this section we first provide a brief review of the terminology used in the literature on graphical models and of the approaches to estimating a precision matrix. After that, we establish consistency of the Factor Graphical Lasso in Procedure 1. We also study consistency of the estimators of the weights in (2.2), (2.3) and (2.5) and the implications for the out-of-sample Sharpe Ratio. Throughout the main text we assume that the errors and factors have exponential-type tails (Assumption (A.3)(c)). Supplemental Appendix B.10 proves that the conclusions of all theorems in Section 4 continue to hold when this assumption is relaxed.

The review of Gaussian graphical models is based on Hastie et al., (2001) and Bishop, (2006). A graph consists of a set of vertices (nodes) and a set of edges (arcs) that join some pairs of the vertices. In graphical models, each vertex represents a random variable, and the graph visualizes the joint distribution of the entire set of random variables. The edges in a graph are parameterized by potentials (values) that encode the strength of the conditional dependence between the random variables at the corresponding vertices. Sparse graphs have a relatively small number of edges. Among the main challenges in working with graphical models are choosing the structure of the graph (model selection) and estimating the edge parameters from the data.

Let $A\in\mathcal{S}_{p}$. Define the following quantities for $j=1,\ldots,p$:

$$D_{j}(A)\equiv\{i:A_{ij}\neq 0,\ i\neq j\},\qquad d_{j}(A)\equiv\text{card}(D_{j}(A)),\qquad d(A)\equiv\max_{j=1,\ldots,p}d_{j}(A), \qquad (4.1)$$

where $d_{j}(A)$ is the number of edges adjacent to vertex $j$ (i.e., the degree of vertex $j$), and $d(A)$ measures the maximum vertex degree. Define $S(A)\equiv\bigcup_{j=1}^{p}D_{j}(A)$ to be the overall off-diagonal sparsity pattern, and $s(A)\equiv\sum_{j=1}^{p}d_{j}(A)$ to be the overall number of edges contained in the graph. Note that $\text{card}(S(A))\leq s(A)$: when $s(A)=p(p-1)/2$ this would give a fully connected graph.
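For concreteness, the sparsity measures in (4.1) can be read off the support of a symmetric matrix as in the following sketch (ours; the tolerance argument is our addition for numerical zeros):

```python
import numpy as np

def graph_degrees(A, tol=0.0):
    """Vertex degrees d_j(A), maximum degree d(A), and total degree s(A) of eq. (4.1)."""
    off_diag = (np.abs(A) > tol) & ~np.eye(A.shape[0], dtype=bool)
    d_j = off_diag.sum(axis=0)        # degree of each vertex j
    return d_j, d_j.max(), d_j.sum()  # d_j(A), d(A), s(A)
```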

4.1 Assumptions

We now list the assumptions on the model (3.1):

(A.1) (Spiked covariance model) As $p\rightarrow\infty$, $\Lambda_{1}({\bm{\Sigma}})>\Lambda_{2}({\bm{\Sigma}})>\ldots>\Lambda_{K}({\bm{\Sigma}})\gg\Lambda_{K+1}({\bm{\Sigma}})\geq\ldots\geq\Lambda_{p}({\bm{\Sigma}})\geq 0$, where $\Lambda_{j}({\bm{\Sigma}})=\mathcal{O}(p)$ for $j\leq K$, while the non-spiked eigenvalues are bounded, that is, $c_{0}\leq\Lambda_{j}({\bm{\Sigma}})\leq C_{0}$ for $j>K$, for constants $c_{0},C_{0}>0$.

(A.2) (Pervasive factors) There exists a positive definite $K\times K$ matrix $\breve{{\mathbf{B}}}$ such that $|||p^{-1}{\mathbf{B}}^{\prime}{\mathbf{B}}-\breve{{\mathbf{B}}}|||_{2}\rightarrow 0$ and $\Lambda_{\min}(\breve{{\mathbf{B}}})^{-1}=\mathcal{O}(1)$ as $p\rightarrow\infty$.

(A.3)

(a) $\{{\bm{\varepsilon}}_{t},{\mathbf{f}}_{t}\}_{t\geq 1}$ is strictly stationary. Also, $\mathbb{E}[\varepsilon_{it}]=\mathbb{E}[\varepsilon_{it}f_{jt}]=0$ for all $i\leq p$, $j\leq K$ and $t\leq T$.

(b) There are constants $c_{1},c_{2}>0$ such that $\Lambda_{\min}({\bm{\Sigma}}_{\varepsilon})>c_{1}$, $|||{\bm{\Sigma}}_{\varepsilon}|||_{1}<c_{2}$ and $\min_{i\leq p,j\leq p}\text{var}(\varepsilon_{it}\varepsilon_{jt})>c_{1}$.

(c) There are $r_{1},r_{2}>0$ and $b_{1},b_{2}>0$ such that for any $s>0$, $i\leq p$ and $j\leq K$,

$$\Pr(\lvert\varepsilon_{it}\rvert>s)\leq\exp\{-(s/b_{1})^{r_{1}}\},\qquad \Pr(\lvert f_{jt}\rvert>s)\leq\exp\{-(s/b_{2})^{r_{2}}\}.$$

We also impose a strong mixing condition. Let $\mathcal{F}_{-\infty}^{0}$ and $\mathcal{F}_{T}^{\infty}$ denote the $\sigma$-algebras generated by $\{({\mathbf{f}}_{t},{\bm{\varepsilon}}_{t}):t\leq 0\}$ and $\{({\mathbf{f}}_{t},{\bm{\varepsilon}}_{t}):t\geq T\}$, respectively. Define the mixing coefficient

$$\alpha(T)=\sup_{A\in\mathcal{F}_{-\infty}^{0},\,B\in\mathcal{F}_{T}^{\infty}}\lvert\Pr(A)\Pr(B)-\Pr(AB)\rvert. \qquad (4.2)$$
(A.4) (Strong mixing) There exists $r_{3}>0$ such that $3r_{1}^{-1}+1.5r_{2}^{-1}+3r_{3}^{-1}>1$, and there exists $C>0$ satisfying, for all $T\in\mathbb{Z}^{+}$, $\alpha(T)\leq\exp(-CT^{r_{3}})$.

(A.5) (Regularity conditions) There exists $M>0$ such that, for all $i\leq p$, $t\leq T$ and $s\leq T$:

(a) $\lVert{\mathbf{b}}_{i}\rVert_{\max}<M$;

(b) $\mathbb{E}\big[p^{-1/2}\{{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}-\mathbb{E}[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}]\}\big]^{4}<M$; and

(c) $\mathbb{E}\big[\lVert p^{-1/2}\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}\rVert^{4}\big]<K^{2}M$.

Some comments regarding the aforementioned assumptions are in order. Assumptions (A.1)-(A.4) are essentially the same as in Fan et al., (2013), and Assumption (A.5) is modified to account for the increasing number of factors. Assumption (A.1) divides the eigenvalues into diverging and bounded ones. Without loss of generality, we assume that the $K$ largest eigenvalues each have multiplicity 1. The assumption of a spiked covariance model is common in the literature on approximate factor models. However, we note that the model studied in this paper can be characterized as a “very spiked model”: the gap between the first $K$ eigenvalues and the rest increases with $p$. As pointed out by Fan et al., (2018), (A.1) is typically satisfied by factor models with pervasive factors, which brings us to Assumption (A.2): the factors impact a non-vanishing proportion of the individual time series. Supplemental Appendix C.4 explores the sensitivity of portfolios constructed using FGL when the pervasiveness assumption is relaxed, that is, when the gap between the diverging and bounded eigenvalues decreases. Assumption (A.3)(a) is slightly stronger than in Bai, (2003), since it requires strict stationarity and non-correlation between $\{{\bm{\varepsilon}}_{t}\}$ and $\{{\mathbf{f}}_{t}\}$ to simplify technical calculations. In (A.3)(b) we require $|||{\bm{\Sigma}}_{\varepsilon}|||_{1}<c_{2}$ instead of $\Lambda_{\max}({\bm{\Sigma}}_{\varepsilon})=\mathcal{O}(1)$ in order to estimate $K$ consistently. When $K$ is known, as in Koike, (2020) and Fan et al., (2011), this condition can be relaxed. (A.3)(c) requires exponential-type tails, which allows us to apply large deviation theory to $(1/T)\sum_{t=1}^{T}\varepsilon_{it}\varepsilon_{jt}-\sigma_{\varepsilon,ij}$ and $(1/T)\sum_{t=1}^{T}f_{jt}\varepsilon_{it}$. However, in Supplemental Appendix B.10 we discuss the extension of our results to the elliptical distribution family, which is more appropriate for financial applications. Specifically, we discuss the appropriate modifications to the initial estimator of the covariance matrix of returns such that the bounds derived in this paper continue to hold. Assumptions (A.4)-(A.5) are technical conditions needed to consistently estimate the common factors and loadings. The conditions in (A.5)(a-b) are weaker than those in Bai, (2003), since our goal is to estimate a precision matrix, and (A.5)(c) differs from Bai, (2003) and Bai and Ng, (2006) in that the number of factors is allowed to grow slowly with $p$.

In addition, the following structural assumption on the population quantities is imposed:

(B.1) $\lVert{\bm{\Sigma}}\rVert_{\max}=\mathcal{O}(1)$, $\lVert{\mathbf{B}}\rVert_{\max}=\mathcal{O}(1)$, and $\lVert{\mathbf{m}}\rVert_{\infty}=\mathcal{O}(1)$.

The sparsity of ${\bm{\Theta}}_{\varepsilon}$ is controlled by deterministic sequences $s_{T}$ and $d_{T}$: $s({\bm{\Theta}}_{\varepsilon})=\mathcal{O}_{P}(s_{T})$ for some sequence $s_{T}\in(0,\infty)$, $T=1,2,\ldots$, and $d({\bm{\Theta}}_{\varepsilon})=\mathcal{O}_{P}(d_{T})$ for some sequence $d_{T}\in(0,\infty)$, $T=1,2,\ldots$. We will impose restrictions on the growth rates of $s_{T}$ and $d_{T}$. Note that the assumptions on $d_{T}$ are weaker, since they are always satisfied when $s_{T}=d_{T}$; however, $d_{T}$ can generally be smaller than $s_{T}$. In contrast to Fan et al., (2013), we do not impose sparsity on the covariance matrix of the idiosyncratic component. Instead, it is more realistic and relevant for error quantification in portfolio analysis to impose conditional sparsity on the precision matrix after the common factors are accounted for.

4.2 The FGL Procedure

Recall the definition of the Weighted Graphical Lasso estimator in (3.3) for the precision matrix of the idiosyncratic components, and recall that to estimate ${\bm{\Theta}}$ we use equation (3.5). Therefore, in order to obtain the FGL estimator $\widehat{{\bm{\Theta}}}$ we take the following steps: (1) estimate the unknown factors and factor loadings to get an estimator of ${\bm{\Sigma}}_{\varepsilon}$; (2) use $\widehat{{\bm{\Sigma}}}_{\varepsilon}$ to get the estimator of ${\bm{\Theta}}_{\varepsilon}$ in (3.3); (3) use $\widehat{{\bm{\Theta}}}_{\varepsilon}$ together with the estimators of the factors and factor loadings from Step 1 to obtain the final precision matrix estimator $\widehat{{\bm{\Theta}}}$, the portfolio weight estimator $\widehat{{\mathbf{w}}}_{\xi}$, and the risk exposure estimator $\widehat{\Phi}_{\xi}=\widehat{{\mathbf{w}}}^{\prime}_{\xi}\widehat{{\bm{\Theta}}}^{-1}\widehat{{\mathbf{w}}}_{\xi}$, where $\xi\in\{\text{GMV, MWC, MRC}\}$.

Subsection 4.3 examines the theoretical foundations of the first step, and Subsections 4.4-4.5 are devoted to Steps 2 and 3.

4.3 Convergence in Estimation of Factors and Loadings

As pointed out in Bai, (2003) and Fan et al., (2013), the $K\times 1$-dimensional factor loadings $\{{\mathbf{b}}_{i}\}_{i=1}^{p}$, which are the rows of the factor loadings matrix ${\mathbf{B}}$, and the $K\times 1$-dimensional common factors $\{{\mathbf{f}}_{t}\}_{t=1}^{T}$, which are the columns of ${\mathbf{F}}$, are not separately identifiable. Concretely, for any $K\times K$ matrix ${\mathbf{H}}$ such that ${\mathbf{H}}^{\prime}{\mathbf{H}}={\mathbf{I}}_{K}$, we have ${\mathbf{B}}{\mathbf{f}}_{t}={\mathbf{B}}{\mathbf{H}}^{\prime}{\mathbf{H}}{\mathbf{f}}_{t}$; therefore, we cannot identify the tuple $({\mathbf{B}},{\mathbf{f}}_{t})$ from $({\mathbf{B}}{\mathbf{H}}^{\prime},{\mathbf{H}}{\mathbf{f}}_{t})$. Let $\widehat{K}\in\{1,\ldots,K_{\max}\}$ denote the estimated number of factors, where $K_{\max}$ is allowed to increase at a slower speed than $\min\{p,T\}$, such that $K_{\max}=o(\min\{p^{1/3},T\})$ (see Li et al., (2017) for a discussion of this rate).

Define ${\mathbf{V}}$ to be the $\widehat{K}\times\widehat{K}$ diagonal matrix of the first $\widehat{K}$ largest eigenvalues of the sample covariance matrix, in decreasing order. Further, define the $\widehat{K}\times\widehat{K}$ matrix ${\mathbf{H}}=(1/T){\mathbf{V}}^{-1}\widehat{{\mathbf{F}}}^{\prime}{\mathbf{F}}{\mathbf{B}}^{\prime}{\mathbf{B}}$. For $t\leq T$, ${\mathbf{H}}{\mathbf{f}}_{t}=T^{-1}{\mathbf{V}}^{-1}\widehat{{\mathbf{F}}}^{\prime}({\mathbf{B}}{\mathbf{f}}_{1},\ldots,{\mathbf{B}}{\mathbf{f}}_{T})^{\prime}{\mathbf{B}}{\mathbf{f}}_{t}$, which depends only on the data ${\mathbf{V}}^{-1}\widehat{{\mathbf{F}}}^{\prime}$ and an identifiable part of the parameters $\{{\mathbf{B}}{\mathbf{f}}_{t}\}_{t=1}^{T}$. Hence, ${\mathbf{H}}{\mathbf{f}}_{t}$ does not have an identifiability problem, regardless of the imposed identifiability condition.

Let $\gamma^{-1}=3r_{1}^{-1}+1.5r_{2}^{-1}+r_{3}^{-1}+1$. The following theorem is an extension of the results in Fan et al., (2013) to the case when the number of factors is unknown and allowed to grow. Proofs of all theorems are given in Supplemental Appendix B.

Theorem 1.

Suppose that $K_{\max}=o(\min\{p^{1/3},T\})$, $K^{3}\log p=o(T^{\gamma/6})$, $KT=o(p^{2})$, and Assumptions (A.1)-(A.5) and (B.1) hold. Let $\omega_{1T}\equiv K^{3/2}\sqrt{\log p/T}+K/\sqrt{p}$ and $\omega_{2T}\equiv K/\sqrt{T}+KT^{1/4}/\sqrt{p}$. Then $\max_{i\leq p}\lVert\widehat{{\mathbf{b}}}_{i}-{\mathbf{H}}{\mathbf{b}}_{i}\rVert=\mathcal{O}_{P}(\omega_{1T})$ and $\max_{t\leq T}\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\rVert=\mathcal{O}_{P}(\omega_{2T})$.

The conditions $K^{3}\log p=o(T^{\gamma/6})$ and $KT=o(p^{2})$ are similar to those in Fan et al., (2013); the difference arises from the fact that we do not fix $K$, and hence, in addition to the factor loadings, there are $KT$ factors to estimate. Therefore, the number of parameters introduced by the unknown growing factors should not be “too large”, so that we can consistently estimate them uniformly. The growth rate of the number of factors is controlled by $K_{\max}=o(\min\{p^{1/3},T\})$.

The bounds derived in Theorem 1 help us establish the convergence properties of the estimated idiosyncratic covariance matrix, $\widehat{{\bm{\Sigma}}}_{\varepsilon}$, and the precision matrix $\widehat{{\bm{\Theta}}}_{\varepsilon}$, which are presented in the next theorem:

Theorem 2.

Let $\omega_{3T}\equiv K^{2}\sqrt{\log p/T}+K^{3}/\sqrt{p}$. Under the assumptions of Theorem 1 and with $\lambda\asymp\omega_{3T}$ (where $\lambda$ is the tuning parameter in (3.3)), the estimator $\widehat{{\bm{\Sigma}}}_{\varepsilon}$ obtained by estimating the factor model in (3.2) satisfies $\lVert\widehat{{\bm{\Sigma}}}_{\varepsilon}-{\bm{\Sigma}}_{\varepsilon}\rVert_{\max}=\mathcal{O}_{P}(\omega_{3T})$. Let $\varrho_{T}$ be a sequence of positive-valued random variables such that $\varrho_{T}^{-1}\omega_{3T}\xrightarrow{p}0$. If $s_{T}\varrho_{T}\xrightarrow{p}0$, then $|||\widehat{{\bm{\Theta}}}_{\varepsilon}-{\bm{\Theta}}_{\varepsilon}|||_{l}=\mathcal{O}_{P}(\varrho_{T}s_{T})$ as $T\rightarrow\infty$, for any $l\in[1,\infty]$.

Note that the term containing $K^{3}/\sqrt{p}$ arises due to the need to estimate the unknown factors. Fan et al., (2011) obtained a similar rate for the case when the factors are observable (in their work, $\omega_{3T}=K^{1/2}\sqrt{\log p/T}$). The second part of Theorem 2 is based on the relationship between the convergence rates of the estimated covariance and precision matrices established in Janková and van de Geer, (2018) (Theorem 14.1.3). Koike, (2020) obtained the convergence rate when the factors are observable: the rate obtained in our paper is slower because the factors need to be estimated (concretely, under observable factors the rate would satisfy $\varrho_{T}^{-1}\sqrt{K\log p/T}\xrightarrow{p}0$). We now comment on the optimality of the rate in Theorem 2: as pointed out in Koike, (2020), in the standard Gaussian setting without factor structure, the minimax optimal rate is $d({\bm{\Theta}}_{\varepsilon})\sqrt{\log p/T}$, which can be faster than the rate obtained in Theorem 2 if $d({\bm{\Theta}}_{\varepsilon})<s_{T}$. Using penalized nodewise regression could help achieve this faster rate. However, our empirical application to monthly stock returns demonstrated superior performance of the Weighted Graphical Lasso compared to the nodewise regression in terms of the out-of-sample Sharpe Ratio and portfolio risk. Hence, in order not to divert the focus of this paper, we leave the theoretical properties of the nodewise regression for future research.

4.4 Convergence in Estimation of Precision Matrix and Portfolio Weights

Having established the convergence properties of $\widehat{{\bm{\Sigma}}}_{\varepsilon}$ and $\widehat{{\bm{\Theta}}}_{\varepsilon}$, we now move to the estimation of the precision matrix of the factor-adjusted returns in equation (3.5).

Theorem 3.

Under the assumptions of Theorem 2, if $d_{T}s_{T}\varrho_{T}\xrightarrow{p}0$, then $|||\widehat{{\bm{\Theta}}}-{\bm{\Theta}}|||_{2}=\mathcal{O}_{P}(\varrho_{T}s_{T})$ and $|||\widehat{{\bm{\Theta}}}-{\bm{\Theta}}|||_{1}=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})$.

Note that since, by construction, the precision matrix obtained using the Factor Graphical Lasso is symmetric, the corresponding bound on $|||\widehat{{\bm{\Theta}}}-{\bm{\Theta}}|||_{\infty}$ follows trivially from the above theorem.

Using Theorem 3, we can then establish the consistency of the estimated weights of portfolios based on the Factor Graphical Lasso.

Theorem 4.

Under the assumptions of Theorem 3, assume additionally that $|||{\bm{\Theta}}|||_{2}=\mathcal{O}(1)$ (this additional requirement essentially imposes $\Lambda_{p}({\bm{\Sigma}})>0$ in (A.1)) and $\varrho_{T}d_{T}^{2}s_{T}=o(1)$. Then Procedure 1 consistently estimates the portfolio weights in (2.2), (2.3) and (2.5):

$$\lVert\widehat{{\mathbf{w}}}_{\text{GMV}}-{\mathbf{w}}_{\text{GMV}}\rVert_{1}=\mathcal{O}_{P}\big(\varrho_{T}d_{T}^{2}K^{3}s_{T}\big)=o_{P}(1),\qquad \lVert\widehat{{\mathbf{w}}}_{\text{MWC}}-{\mathbf{w}}_{\text{MWC}}\rVert_{1}=\mathcal{O}_{P}\big(\varrho_{T}d_{T}^{2}K^{3}s_{T}\big)=o_{P}(1),$$
$$\lVert\widehat{{\mathbf{w}}}_{\text{MRC}}-{\mathbf{w}}_{\text{MRC}}\rVert_{1}=\mathcal{O}_{P}\big(d_{T}^{3/2}K^{3}\,[\varrho_{T}s_{T}]^{1/2}\big)=o_{P}(1).$$

We now comment on the rates in Theorem 4. First, the rates obtained by Callot et al., (2021) for the GMV and MWC formulations, when no factor structure of stock returns is assumed, require $s({\bm{\Theta}})^{3/2}\sqrt{\log p/T}=o_{P}(1)$, where the authors imposed sparsity on the precision matrix of stock returns, ${\bm{\Theta}}$. Therefore, if the precision matrix of stock returns is not sparse, portfolio weights can be consistently estimated only if $p$ is less than $T^{1/3}$ (since $(p-1)^{3/2}\sqrt{\log p/T}=o(1)$ is required to ensure consistent estimation of portfolio weights). Our result in Theorem 4 improves upon this rate and shows that as long as $d_{T}^{2}s_{T}K^{3}\sqrt{\log p/T}=o_{P}(1)$ we can consistently estimate the weights of the financial portfolio. Specifically, when the precision matrix of the factor-adjusted returns is sparse, we can consistently estimate portfolio weights even when $p>T$, without assuming sparsity of ${\bm{\Sigma}}$ or ${\bm{\Theta}}$. Second, note that the GMV and MWC weights converge slightly slower than the MRC weights. This result is further supported by our simulations presented in the next section.

4.5 Implications on Portfolio Risk Exposure

Having examined the properties of the portfolio weights, it is natural to comment on the estimation error of the portfolio variance, which is determined by the errors in two components: the estimated covariance matrix and the estimated portfolio weights. Define $a={\bm{\iota}}^{\prime}_{p}{\bm{\Theta}}{\bm{\iota}}_{p}/p$, $b={\bm{\iota}}^{\prime}_{p}{\bm{\Theta}}{\mathbf{m}}/p$, $d={\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}/p$, $g=\sqrt{{\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}}/p$, and $\widehat{a}={\bm{\iota}}^{\prime}_{p}\widehat{{\bm{\Theta}}}{\bm{\iota}}_{p}/p$, $\widehat{b}={\bm{\iota}}^{\prime}_{p}\widehat{{\bm{\Theta}}}\widehat{{\mathbf{m}}}/p$, $\widehat{d}=\widehat{{\mathbf{m}}}^{\prime}\widehat{{\bm{\Theta}}}\widehat{{\mathbf{m}}}/p$, $\widehat{g}=\sqrt{\widehat{{\mathbf{m}}}^{\prime}\widehat{{\bm{\Theta}}}\widehat{{\mathbf{m}}}}/p$. Define $\Phi_{\text{GMV}}={\mathbf{w}}_{GMV}^{\prime}{\bm{\Sigma}}{\mathbf{w}}_{GMV}=(pa)^{-1}$ to be the global minimum variance, $\Phi_{\text{MWC}}={\mathbf{w}}_{MWC}^{\prime}{\bm{\Sigma}}{\mathbf{w}}_{MWC}=p^{-1}\big[\frac{a\mu^{2}-2b\mu+d}{ad-b^{2}}\big]$ the MWC portfolio variance, and $\Phi_{\text{MRC}}={\mathbf{w}}_{MRC}^{\prime}{\bm{\Sigma}}{\mathbf{w}}_{MRC}=\sigma^{2}(pg)$ the MRC portfolio variance. We use the terms variance and risk exposure interchangeably. Let $\widehat{\Phi}_{\text{GMV}}$, $\widehat{\Phi}_{\text{MWC}}$, and $\widehat{\Phi}_{\text{MRC}}$ be the sample counterparts of the respective portfolio variances. The expressions for $\Phi_{\text{GMV}}$ and $\Phi_{\text{MWC}}$ were derived in Fan et al., (2008) and Callot et al., (2021). Theorem 5 establishes the consistency of a large portfolio's variance estimator.

Theorem 5.

Under the assumptions of Theorem 3, FGL consistently estimates the GMV, MWC, and MRC portfolio variances:

$$\lvert\widehat{\Phi}_{\text{GMV}}/\Phi_{\text{GMV}}-1\rvert=\mathcal{O}_{P}(\varrho_{T}d_{T}s_{T}K^{3/2})=o_{P}(1),\qquad \lvert\widehat{\Phi}_{\text{MWC}}/\Phi_{\text{MWC}}-1\rvert=\mathcal{O}_{P}(\varrho_{T}d_{T}s_{T}K^{3/2})=o_{P}(1),$$
$$\lvert\widehat{\Phi}_{\text{MRC}}/\Phi_{\text{MRC}}-1\rvert=\mathcal{O}_{P}\big([\varrho_{T}d_{T}s_{T}K^{3/2}]^{1/2}\big)=o_{P}(1).$$

Callot et al., (2021) derived a similar result for $\Phi_{\text{GMV}}$ and $\Phi_{\text{MWC}}$ under the assumption that the precision matrix of stock returns is sparse. Also, Ding et al., (2021) derived bounds for $\Phi_{\text{GMV}}$ under the factor structure, assuming a sparse covariance matrix of the idiosyncratic components and a gross-exposure constraint on portfolio weights that limits negative positions.

The empirical application in Section 6 reveals that portfolios constructed using the MRC formulation have higher risk than the GMV and MWC alternatives: using monthly and daily returns of the constituents of the S&P500 index, MRC portfolios exhibit higher out-of-sample risk and return compared to the alternative formulations. Furthermore, the empirical exercise demonstrates that for the monthly data the higher return of the MRC portfolios outweighs their higher risk, as evidenced by the increased out-of-sample Sharpe Ratio.

5 Monte Carlo

In order to validate our theoretical results, we perform several simulation studies, which are divided into four parts. The first set of results computes the empirical convergence rates and compares them with the theoretical rates derived in Theorems 3-5. The second set of results compares the performance of FGL with several alternative models for estimating the covariance and precision matrices. To highlight the benefit of using the information about the factor structure, as opposed to standard graphical models, we include the Graphical Lasso of Friedman et al., (2008) (GL), which does not account for the factor structure. To explore the benefits of using FGL for error quantification in (3.5), we consider several alternative estimators of the covariance/precision matrix of the idiosyncratic component in (3.5): (1) the linear shrinkage estimator of covariance developed by Ledoit and Wolf, (2004), further referred to as Factor LW or FLW; (2) the nonlinear shrinkage estimator of covariance by Ledoit and Wolf, (2017) (Factor NLW or FNLW); (3) POET (Fan et al., (2013)); (4) the constrained $\ell_{1}$-minimization for inverse matrix estimator, Clime (Cai et al., (2011)) (Factor Clime or FClime). Furthermore, we discovered that in certain setups the covariance estimator produced by POET is not positive definite. In such cases we use the matrix symmetrization procedure as in Fan et al., (2018) and then apply eigenvalue cleaning as in Callot et al., (2017) and Hautsch et al., (2012). This estimator is referred to as Projected POET; it coincides with POET when the covariance estimator produced by the latter is positive definite. The third set of results examines the performance of FGL and Robust FGL (described in Supplemental Appendix B.10) when the dependent variable follows an elliptical distribution. The fourth set of results explores the sensitivity of portfolios constructed using different covariance and precision estimators of interest when the pervasiveness assumption 1 is relaxed, that is, when the gap between the diverging and bounded eigenvalues decreases. All exercises in this section use 100 Monte Carlo simulations.

We consider the following setup: let $p=T^{\delta}$, $\delta=0.85$, $K=2(\log T)^{0.5}$ and $T=[2^{h}]$ for $h=7,7.5,8,\ldots,9.5$. A sparse precision matrix of the idiosyncratic components is constructed as follows: we first generate the adjacency matrix using a random graph structure. Define a $p\times p$ adjacency matrix ${\mathbf{A}}_{\varepsilon}$, which is used to represent the structure of the graph:

$$a_{\varepsilon,ij}=\begin{cases}1,&\text{for}\ i\neq j\ \text{with probability}\ q,\\ 0,&\text{otherwise,}\end{cases} \qquad (5.1)$$

where $a_{\varepsilon,ij}$ denotes the $(i,j)$-th element of ${\mathbf{A}}_{\varepsilon}$ and we set $a_{\varepsilon,ij}=a_{\varepsilon,ji}$. This structure results in $s_{T}=p(p-1)q/2$ edges in the graph. To control sparsity, we set $q=1/(pT^{0.8})$, which makes $s_{T}=\mathcal{O}(T^{0.05})$. The adjacency matrix has all diagonal elements equal to zero. Hence, to obtain a positive definite precision matrix we apply the procedure described in Zhao et al., (2012): using their notation, ${\bm{\Theta}}_{\varepsilon}={\mathbf{A}}_{\varepsilon}\cdot v+{\mathbf{I}}(\left\lvert\tau\right\rvert+0.1+u)$, where $u>0$ is a positive constant added to the diagonal of the precision matrix, $v$ controls the magnitude of the partial correlations (jointly with $u$), and $\tau$ is the smallest eigenvalue of ${\mathbf{A}}_{\varepsilon}\cdot v$. In our simulations we use $u=0.1$ and $v=0.3$.
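A minimal sketch of this construction is given below; it follows the recipe described above (Zhao et al., (2012)), with $q=1/(pT^{0.8})$ supplied by the caller, and all names are ours.

```python
import numpy as np

def random_graph_precision(p, q, u=0.1, v=0.3, seed=0):
    """Sparse precision matrix of the idiosyncratic components generated from a
    random graph, as described in the text; names are ours."""
    rng = np.random.default_rng(seed)
    # Each off-diagonal pair (i, j) is connected with probability q.
    upper = np.triu(rng.random((p, p)) < q, k=1)
    A = (upper | upper.T).astype(float)            # symmetric, zero diagonal
    tau = np.linalg.eigvalsh(A * v).min()          # smallest eigenvalue of A*v
    Theta_eps = A * v + np.eye(p) * (abs(tau) + 0.1 + u)
    return Theta_eps

# Example: T = 128, p = round(T**0.85), q = 1 / (p * T**0.8) as in the setup above.
```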

Factors are assumed to have the following structure:

$${\mathbf{f}}_{t}=\phi_{f}{\mathbf{f}}_{t-1}+{\bm{\zeta}}_{t}, \qquad (5.2)$$
$$\underbrace{{\mathbf{r}}_{t}}_{p\times 1}={\mathbf{m}}+{\mathbf{B}}\underbrace{{\mathbf{f}}_{t}}_{K\times 1}+{\bm{\varepsilon}}_{t},\quad t=1,\ldots,T, \qquad (5.3)$$

where $m_{i}\sim\mathcal{N}(1,1)$ independently for each $i=1,\ldots,p$; ${\bm{\varepsilon}}_{t}$ is a $p\times 1$ random vector of idiosyncratic errors following $\mathcal{N}(\bm{0},{\bm{\Sigma}}_{\varepsilon})$, with sparse ${\bm{\Theta}}_{\varepsilon}$ that has the random graph structure described above; ${\mathbf{f}}_{t}$ is a $K\times 1$ vector of factors; $\phi_{f}$ is the autoregressive parameter of the factors, taken to be a scalar for simplicity; ${\mathbf{B}}$ is a $p\times K$ matrix of factor loadings; and ${\bm{\zeta}}_{t}$ is a $K\times 1$ random vector with each component independently following $\mathcal{N}(0,\sigma^{2}_{\zeta})$. To create ${\mathbf{B}}$ in (5.3) we take the first $K$ rows of the upper triangular matrix from the Cholesky decomposition of a $p\times p$ Toeplitz matrix parameterized by $\rho$. For the first set of results we set $\rho=0.2$, $\phi_{f}=0.2$ and $\sigma^{2}_{\zeta}=1$. The specification in (5.3) leads to the low-rank plus sparse decomposition of the covariance matrix of stock returns ${\mathbf{r}}_{t}$.
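The sketch below simulates one sample from (5.2)-(5.3) under the stated parameter values. We read "Toeplitz matrix parameterized by $\rho$" as the matrix with entries $\rho^{|i-j|}$ and transpose the first $K$ rows of its Cholesky factor to obtain a $p\times K$ loading matrix, which is one interpretation of the description above; all names are ours.

```python
import numpy as np
from scipy.linalg import cholesky, toeplitz

def simulate_factor_returns(T, p, K, Theta_eps, phi_f=0.2, rho=0.2,
                            sigma_zeta=1.0, seed=0):
    """One draw from the DGP in (5.2)-(5.3); names and the Toeplitz
    parameterization rho**|i-j| are our reading of the text."""
    rng = np.random.default_rng(seed)
    Sigma_eps = np.linalg.inv(Theta_eps)
    # Loadings: first K rows of the upper-triangular Cholesky factor of the
    # Toeplitz matrix, transposed to give a p x K matrix B.
    B = cholesky(toeplitz(rho ** np.arange(p)))[:K, :].T
    m = rng.normal(1.0, 1.0, size=p)                          # m_i ~ N(1, 1)
    f = np.zeros(K)
    R = np.empty((T, p))
    for t in range(T):
        f = phi_f * f + rng.normal(0.0, sigma_zeta, size=K)   # AR(1) factors
        eps = rng.multivariate_normal(np.zeros(p), Sigma_eps)
        R[t] = m + B @ f + eps
    return R
```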

As a first exercise, we compare the empirical and theoretical convergence rates of the precision matrix, portfolio weights and risk exposure. A detailed description of the procedure and the simulation results is provided in Supplemental Appendix C.1. We confirm that the empirical rates match the theoretical rates derived in Theorems 3-5.

As a second exercise, we compare the performance of FGL with the alternative models listed at the beginning of this section. We consider two cases: Case 1 uses the same setting as the first set of simulations ($p<T$): $p=T^{\delta}$, $\delta=0.85$, $K=2(\log T)^{0.5}$, $s_{T}=\mathcal{O}(T^{0.05})$. Case 2 captures the setting with $p>T$: $p=3\cdot T^{\delta}$, $\delta=0.85$, all else equal. The results for Case 2 are reported in Figures 1-3, and those for Case 1 are located in Supplemental Appendix C.2. FGL demonstrates superior performance for estimating the precision matrix and portfolio weights in both cases, exhibiting consistency in both the Case 1 and Case 2 settings. FGL also outperforms GL for estimating the portfolio risk exposure and consistently estimates the latter; however, depending on the case under consideration, some alternative models produce a lower averaged error.

As a third exercise, we examine the performance of FGL and Robust FGL (described in Supplemental Appendix B.10) when the dependent variable follows an elliptical distribution. A detailed description of the data generating process (DGP) and the simulation results is provided in Supplemental Appendix C.3. We find that the performance of FGL for estimating the precision matrix is comparable to that of Robust FGL: this suggests that our FGL algorithm is robust to heavy-tailed distributions even without additional modifications.

As a final exercise, we explore the sensitivity of portfolios constructed using different covariance and precision estimators of interest when the pervasiveness assumption 1 is relaxed. A detailed description of the data generating process (DGP) and the simulation results is provided in Supplemental Appendix C.4. We verify that FGL exhibits robust performance when the gap between the diverging and bounded eigenvalues decreases. In contrast, POET and Projected POET are the most sensitive to relaxing the pervasiveness assumption, which is consistent with our empirical findings and also with the simulation results of Onatski, (2013).

6 Empirical Application

In this section we examine the performance of the Factor Graphical Lasso for constructing a financial portfolio using daily data. The description and empirical results for monthly data can be found in Supplemental Appendix D. We first describe the data and the estimation methodology, then we list four metrics commonly reported in the finance literature, and, finally, we present the results.

6.1 Data

We use daily returns of the components of the S&P500 index. The data on historical S&P500 constituents and stock returns are fetched from CRSP and Compustat using the SAS interface. For the daily data, the full sample has 5040 observations on 420 stocks from January 20, 2000 to January 31, 2020. We use January 20, 2000 - January 24, 2002 (504 obs) as the first training (estimation) period and January 25, 2002 - January 31, 2020 (4536 obs) as the out-of-sample (OOS) test period. Supplemental Appendix D.3 examines the performance of the different competing methods for longer training periods. We roll the estimation window (training periods) over the test sample to rebalance the portfolios monthly. At the end of each month, prior to portfolio construction, we remove stocks with less than 2 years of historical stock return data. The performance of the competing models is compared with the Index – the composite S&P500 index listed as ^GSPC. We take the risk-free rate and Fama/French factors from Kenneth R. French's data library.
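A bare-bones sketch of this rolling exercise is given below; the month-grouping, the two-year history filter, and all names are ours, and `estimate_weights` stands in for any of the estimators compared later in Table 1.

```python
import pandas as pd

def rolling_backtest(returns, estimate_weights, window=504):
    """Skeleton of the monthly-rebalanced rolling-window exercise described
    above. `returns` is a T x p DataFrame of daily excess returns indexed by
    date; `estimate_weights` maps a window of returns to a weight vector."""
    out = []
    months = returns.index.to_period("M").unique()
    for month in months:
        in_month = returns.loc[returns.index.to_period("M") == month]
        past = returns.loc[returns.index < in_month.index[0]].iloc[-window:]
        if len(past) < window:
            continue                      # still inside the initial training period
        # Drop stocks with fewer than two years (window obs) of history.
        valid = past.columns[past.notna().sum() >= window]
        w = estimate_weights(past[valid].fillna(0.0))
        out.append(in_month[valid].fillna(0.0) @ w)   # hold weights within the month
    return pd.concat(out)
```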

6.2 Performance Measures

Similarly to Callot et al., (2021), we consider four metrics commonly reported in the finance literature: the Sharpe Ratio, the portfolio turnover, the average return and the risk of a portfolio (defined as the square root of the out-of-sample variance of the portfolio). We consider two scenarios: with and without transaction costs. Let $T$ denote the total number of observations; the training sample consists of $m=504$ observations, and the test sample is $n=T-m$.

When transaction costs are not taken into account, the out-of-sample average portfolio return, variance and SR are

$$\hat{\mu}_{\text{test}}=\frac{1}{n}\sum_{t=m}^{T-1}\widehat{{\mathbf{w}}}^{\prime}_{t}{\mathbf{r}}_{t+1},\quad \hat{\sigma}_{\text{test}}^{2}=\frac{1}{n-1}\sum_{t=m}^{T-1}(\widehat{{\mathbf{w}}}^{\prime}_{t}{\mathbf{r}}_{t+1}-\hat{\mu}_{\text{test}})^{2},\quad \text{SR}=\hat{\mu}_{\text{test}}/\hat{\sigma}_{\text{test}}. \qquad (6.1)$$

When transaction costs are considered, we follow Ban et al., (2018), Callot et al., (2021), DeMiguel et al., (2009), and Li, (2015) to account for the transaction costs, further denoted as tc. In line with the aforementioned papers, we set $\textup{tc}=10\text{bps}$. Define the excess portfolio return at time $t+1$ with transaction costs (tc) as

$$r_{t+1,\text{portfolio}}=\widehat{{\mathbf{w}}}^{\prime}_{t}{\mathbf{r}}_{t+1}-\textup{tc}\,(1+\widehat{{\mathbf{w}}}^{\prime}_{t}{\mathbf{r}}_{t+1})\sum_{j=1}^{p}\left\lvert\hat{w}_{t+1,j}-\hat{w}_{t,j}^{+}\right\rvert, \qquad (6.2)$$

where

$$\hat{w}_{t,j}^{+}=\hat{w}_{t,j}\frac{1+r_{t+1,j}+r^{f}_{t+1}}{1+r_{t+1,\text{portfolio}}+r^{f}_{t+1}}, \qquad (6.3)$$

$r_{t+1,j}+r^{f}_{t+1}$ is the sum of the excess return of the $j$-th asset and the risk-free rate, and $r_{t+1,\text{portfolio}}+r^{f}_{t+1}$ is the sum of the excess return of the portfolio and the risk-free rate. The out-of-sample average portfolio return, variance, Sharpe Ratio and turnover are defined accordingly:

$$\hat{\mu}_{\text{test,tc}}=\frac{1}{n}\sum_{t=m}^{T-1}r_{t,\text{portfolio}},\quad \hat{\sigma}_{\text{test,tc}}^{2}=\frac{1}{n-1}\sum_{t=m}^{T-1}(r_{t,\text{portfolio}}-\hat{\mu}_{\text{test,tc}})^{2},\quad \text{SR}_{\text{tc}}=\hat{\mu}_{\text{test,tc}}/\hat{\sigma}_{\text{test,tc}}, \qquad (6.4)$$
$$\text{Turnover}=\frac{1}{n}\sum_{t=m}^{T-1}\sum_{j=1}^{p}\left\lvert\hat{w}_{t+1,j}-\hat{w}_{t,j}^{+}\right\rvert. \qquad (6.5)$$
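The following sketch computes the four metrics under one common reading of (6.1)-(6.5). The timing of the turnover term and the use of the gross portfolio return when drifting the weights (cf. (6.3)) are simplifications, and all names are ours.

```python
import numpy as np

def oos_metrics(weights, returns, rf, tc=0.001):
    """Out-of-sample return, risk, Sharpe Ratio and turnover with proportional
    transaction costs tc. Row t of `weights` holds the weights chosen at
    rebalancing time t; row t of `returns` (and rf[t]) is the corresponding
    next-period excess return (and risk-free rate)."""
    n = returns.shape[0]
    port_ret = np.empty(n)
    turnover = np.zeros(n)
    w_plus = weights[0]                   # drifted weights carried into period t
    for t in range(n):
        gross = weights[t] @ returns[t]
        trade = np.abs(weights[t] - w_plus).sum()
        port_ret[t] = gross - tc * (1.0 + gross) * trade       # cf. (6.2)
        turnover[t] = trade
        # Weights drift with realized returns before the next rebalancing, cf. (6.3).
        w_plus = weights[t] * (1.0 + returns[t] + rf[t]) / (1.0 + gross + rf[t])
    mu, sig = port_ret.mean(), port_ret.std(ddof=1)
    return mu, sig, mu / sig, turnover.mean()
```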

6.3 Description of Empirical Design

In the empirical application for constructing the financial portfolio we consider two scenarios: when the factors are unknown and estimated using standard PCA (statistical factors), and when the factors are known. The number of statistical factors, $\hat{K}$, is estimated in accordance with Remark 1 in Supplemental Appendix D.1. For the scenario with known factors we include up to 5 Fama-French factors: FF1 includes the excess return on the market, FF3 includes FF1 plus the size factor (Small Minus Big, SMB) and the value factor (High Minus Low, HML), and FF5 includes FF3 plus the profitability factor (Robust Minus Weak, RMW) and the investment factor (Conservative Minus Aggressive, CMA).
We examine the performance of the Factor Graphical Lasso for three alternative portfolio allocations, (2.2), (2.3) and (2.5), and compare it with the equal-weighted portfolio (EW), the index portfolio (Index), FClime, FLW, FNLW (as in the simulations, these alternative covariance and precision estimators incorporate the factor structure through the Sherman-Morrison inversion formula), POET, Projected POET, and factor models without a sparsity restriction on the residual risk (FF1, FF3, and FF5).
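For intuition on how a factor component and a sparse idiosyncratic precision matrix can be combined, the sketch below uses the Sherman-Morrison-Woodbury identity, which avoids inverting a $p\times p$ covariance matrix. This is a generic illustration rather than the exact FGL estimator of Section 3, and the names are ours.

```python
import numpy as np

def combine_factor_and_sparse(Theta_eps_hat, B_hat, Sigma_f_hat):
    """Precision matrix of returns implied by Sigma = B Sigma_f B' + Sigma_eps,
    via the Sherman-Morrison-Woodbury identity:
    Theta = Theta_eps - Theta_eps B (Sigma_f^{-1} + B' Theta_eps B)^{-1} B' Theta_eps."""
    TB = Theta_eps_hat @ B_hat                                         # p x K
    middle = np.linalg.inv(np.linalg.inv(Sigma_f_hat) + B_hat.T @ TB)  # K x K
    return Theta_eps_hat - TB @ middle @ TB.T
```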
In Table 1 and Supplemental Appendix D, we report the daily and monthly portfolio performance for the three alternative portfolio allocations in (2.2), (2.3) and (2.5). We consider a relatively risk-averse investor, in the sense that they are willing to tolerate no more risk than that incurred by holding the S&P500 Index: the target level of risk for the weight-constrained and risk-constrained Markowitz portfolios (MWC and MRC) is set at $\sigma=0.013$, which is the standard deviation of the daily excess returns of the S&P500 index in the first training set. The return target is $\mu=0.0378\%$, which is equivalent to a $10\%$ yearly return when compounded. Transaction costs for each individual stock are set to a constant $0.1\%$. Supplemental Appendix D.3 provides the results for less risk-averse investors with higher target levels of risk and return for both monthly and daily data.

To compare the relative performance of the investment strategies induced by different precision matrix estimators, we use the stepwise multiple testing procedure developed in Romano and Wolf, (2005) and further covered in Romano and Wolf, (2016). Let $\text{SR}^{P}=\mu_{\text{test}}/\sigma_{\text{test}}$ be the population counterpart of the sample Sharpe Ratio defined in (6.1). We compare each strategy $s$, $1\leq s\leq S$, with the benchmark (Index) strategy, indexed as $S+1$. Define $\chi_{s}\equiv\text{SR}_{s}^{P}-\text{SR}_{S+1}^{P}$; the test statistic is $\hat{\chi}_{s}\equiv\text{SR}_{s}-\text{SR}_{S+1}$. For a given strategy $s$, we consider the individual testing problem $\mathbb{H}_{0}:\chi_{s}\leq 0$ vs. $\mathbb{H}_{A}:\chi_{s}>0$. Using the stepwise multiple testing procedure we aim to identify as many strategies as possible for which $\chi_{s}>0$: we relabel the strategies according to the size of the individual test statistics, from largest to smallest, and make the individual decisions in a stepdown manner, starting with the null hypothesis that corresponds to the largest test statistic. P-values for the competing methods are reported in the tables with the empirical results. We note that, by construction of the stepwise multiple testing procedure, the resulting p-values are relatively conservative, consistent with Remark 3.1 of Romano and Wolf, (2005).
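For illustration, the sketch below implements a heavily simplified version of this comparison: an i.i.d. bootstrap of the maximal centred statistic with a stepdown pass over the ordered hypotheses, rather than the studentized block bootstrap of Romano and Wolf, (2005); all names are ours.

```python
import numpy as np

def stepdown_sr_pvalues(strategy_rets, benchmark_ret, n_boot=1000, seed=0):
    """Simplified stepdown p-values for SR_s - SR_benchmark > 0.
    strategy_rets: n x S array of strategy returns; benchmark_ret: length-n array."""
    rng = np.random.default_rng(seed)
    sr = lambda x: x.mean(axis=0) / x.std(axis=0, ddof=1)
    n, S = strategy_rets.shape
    stat = sr(strategy_rets) - sr(benchmark_ret[:, None])      # chi_hat_s
    order = np.argsort(stat)[::-1]                             # largest statistic first
    # Bootstrap the centred statistics jointly across strategies.
    boot = np.empty((n_boot, S))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot[b] = sr(strategy_rets[idx]) - sr(benchmark_ret[idx, None]) - stat
    pvals = np.empty(S)
    for k, s in enumerate(order):
        # Stepdown: compare against the bootstrap max over not-yet-rejected hypotheses.
        pvals[s] = (boot[:, order[k:]].max(axis=1) >= stat[s]).mean()
    pvals[order] = np.maximum.accumulate(pvals[order])         # enforce monotonicity
    return pvals
```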

6.4 Empirical Results

This section explores the performance of the Factor Graphical Lasso for the financial portfolio using daily data.

Let us summarize the results for daily data in Table 1: (1) MRC portfolios produce higher return and higher risk than MWC and GMV. However, the out-of-sample Sharpe Ratio of MRC is lower than that of MWC and GMV, which implies that the higher risk of MRC portfolios is not fully compensated by the higher return. (2) FGL outperforms all the competitors, including EW and Index. Specifically, our method has the lowest risk and turnover (compared to FClime, FLW, FNLW and POET), and the highest out-of-sample Sharpe Ratio among all alternative methods. (3) The implementation of POET for MRC resulted in erratic behavior of this method for estimating portfolio weights; many entries in the weight matrix were "NaN". We elaborate on the reasons behind such performance below. (4) Using the observable Fama-French factors in FGL, in general, produces portfolios with higher return and higher out-of-sample Sharpe Ratio than the portfolios based on statistical factors. Interestingly, this increase in return is not accompanied by higher risk. (5) FGL strongly dominates all factor models that do not impose sparsity on the precision matrix of the idiosyncratic component. The results for monthly data are provided in Supplemental Appendix D: all conclusions are similar to those for daily data.

We now examine possible reasons behind the observed puzzling behavior of POET and Projected POET. The erratic behavior of the former is caused by the fact that the POET estimator of the covariance matrix was not positive definite, which produced poor estimates of the GMV and MWC weights and made it infeasible to compute the MRC weights (recall that, by construction, the MRC weight in (2.5) requires taking a square root). To explore the deteriorated behavior of Projected POET, let us highlight two findings from the closely related literature. First, Bailey et al., (2021) examined the "pervasiveness" degree, or strength, of 146 factors commonly used in the empirical finance literature, and found that only the market factor was strong, while all other factors were semi-strong. This indicates that the factor pervasiveness assumption 1 might be unrealistic in practice. Second, as pointed out by Onatski, (2013), "the quality of POET dramatically deteriorates as the systematic-idiosyncratic eigenvalue gap becomes small". Guided by these two findings, we attribute the deteriorated performance of POET and Projected POET to the decreased gap between the diverging and bounded eigenvalues documented in past studies of financial returns. The high sensitivity of these two covariance estimators in such settings is further supported by our additional simulation study (Supplemental Appendix C.4) examining the robustness of portfolios constructed using different covariance and precision estimators.

Table 2 compares the performance of MRC portfolios for the daily data over several periods of interest in terms of the cumulative excess return (CER), risk, and SR. To demonstrate the performance of all methods during periods of recession and expansion, we chose four periods and recorded the CER for the whole year in each period of interest. Two years, 2002 and 2008, correspond to recession periods, which is why we refer to them as "Downturns". We note that the references to the Argentine Great Depression and the Financial Crisis are not intended to limit these economic downturns to only one year; they merely provide context for the recessions. The other two years, 2017 and 2019, correspond to years that were relatively favorable to the stock market ("Booms"). Overall, it is easier to beat the Index in Downturns than in Booms. In most cases FGL shows superior performance in terms of CER and SR for Downturn #1, Boom #1 and Boom #2. For Downturn #2, even though FGL has the highest CER, its SR is smaller than the SR of some other competing methods. One explanation is the following: as evidenced by the high risk of the competing methods during Downturn #2, there were both high positive and high negative returns during the period, with the high returns driving up the average used in computing the SR. However, if one were to follow the alternative strategies, ignoring the CER statistics, the return on money deposited at the beginning of 2008 would either be negative (e.g., FClime, Projected POET) or smaller than the CER of the FGL-based strategies. This exercise demonstrates that the SR statistic alone, especially during recession periods characterized by higher volatility, can be misleading. Another interesting finding from this exercise is that FGL exhibits smaller risk than most competing methods even during the periods of recession, which holds for all portfolio formulations. This allows FGL to minimize cumulative losses during economic downturns. Subperiod analyses for the MWC and GMV portfolio formulations are presented in Supplemental Appendix D.5.

7 Conclusion

In this paper, we propose a new conditional precision matrix estimator for the excess returns under the approximate factor model with unobserved factors, which combines the benefits of graphical models and the factor structure. We establish consistency of FGL in the spectral and $\ell_{1}$ matrix norms. In addition, we prove consistency of the portfolio weights and risk exposure for three formulations of the optimal portfolio allocation without assuming sparsity of the covariance or precision matrix of stock returns. All theoretical results established in this paper hold for a wide range of distributions: the sub-Gaussian family (including Gaussian) and the elliptical family. Our simulations demonstrate that FGL is robust to very heavy-tailed distributions, which makes our method suitable for financial applications. Furthermore, we demonstrate that, in contrast to POET and Projected POET, the success of the proposed method does not heavily depend on the factor pervasiveness assumption: FGL is robust to scenarios in which the gap between the diverging and bounded eigenvalues decreases.

The empirical exercise uses the constituents of the S&P500 index and demonstrates superior performance of FGL relative to several alternative models for estimating the precision (FClime) and covariance (FLW, FNLW, POET) matrices, the Equal-Weighted (EW) portfolio and the Index portfolio in terms of the OOS SR and risk. This result is robust to using monthly and daily data. We examine three portfolio formulations and discover that the only portfolios that produce a positive CER during recessions are the ones that relax the constraint requiring the portfolio weights to sum to one.

References

  • Ait-Sahalia and Xiu, (2017) Ait-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data. Journal of Econometrics, 201(2):384–399.
  • Awoye, (2016) Awoye, O. A. (2016). Markowitz Minimum Variance Portfolio Optimization Using New Machine Learning Methods. PhD thesis, University College London.
  • Bai, (2003) Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.
  • Bai and Ng, (2002) Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
  • Bai and Ng, (2006) Bai, J. and Ng, S. (2006). Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica, 74(4):1133–1150.
  • Bailey et al., (2021) Bailey, N., Kapetanios, G., and Pesaran, M. H. (2021). Measurement of factor strength: Theory and practice. Journal of Applied Econometrics, 36(5):587–613.
  • Ban et al., (2018) Ban, G.-Y., El Karoui, N., and Lim, A. E. (2018). Machine learning and portfolio optimization. Management Science, 64(3):1136–1154.
  • Barigozzi et al., (2018) Barigozzi, M., Brownlees, C., and Lugosi, G. (2018). Power-law partial correlation network models. Electronic Journal of Statistics, 12(2):2905–2929.
  • Bishop, (2006) Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.
  • Brownlees et al., (2018) Brownlees, C., Nualart, E., and Sun, Y. (2018). Realized networks. Journal of Applied Econometrics, 33(7):986–1006.
  • Cai et al., (2011) Cai, T., Liu, W., and Luo, X. (2011). A constrained l1-minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607.
  • Cai et al., (2020) Cai, T. T., Hu, J., Li, Y., and Zheng, X. (2020). High-dimensional minimum variance portfolio estimation based on high-frequency data. Journal of Econometrics, 214(2):482–494.
  • Callot et al., (2021) Callot, L., Caner, M., Önder, A. O., and Ulaşan, E. (2021). A nodewise regression approach to estimating large portfolios. Journal of Business & Economic Statistics, 39(2):520–531.
  • Callot et al., (2017) Callot, L. A. F., Kock, A. B., and Medeiros, M. C. (2017). Modeling and forecasting large realized covariance matrices and portfolio choice. Journal of Applied Econometrics, 32(1):140–158.
  • Campbell et al., (1997) Campbell, J. Y., Lo, A. W., and MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton University Press.
  • Chang et al., (2018) Chang, J., Qiu, Y., Yao, Q., and Zou, T. (2018). Confidence regions for entries of a large precision matrix. Journal of Econometrics, 206(1):57–82.
  • Chudik et al., (2011) Chudik, A., Pesaran, M. H., and Tosetti, E. (2011). Weak and strong cross-section dependence and estimation of large panels. The Econometrics Journal, 14(1):C45–C90.
  • Connor and Korajczyk, (1988) Connor, G. and Korajczyk, R. A. (1988). Risk and return in an equilibrium APT: Application of a new test methodology. Journal of Financial Economics, 21(2):255–289.
  • DeMiguel et al., (2009) DeMiguel, V., Garlappi, L., and Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/n portfolio strategy? The Review of Financial Studies, 22(5):1915–1953.
  • Ding et al., (2021) Ding, Y., Li, Y., and Zheng, X. (2021). High dimensional minimum variance portfolio estimation under statistical factor models. Journal of Econometrics, 222(1, Part B):502–515.
  • Fama and French, (1993) Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56.
  • Fama and French, (2015) Fama, E. F. and French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1):1–22.
  • Fan et al., (2008) Fan, J., Fan, Y., and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147(1):186 – 197.
  • Fan et al., (2011) Fan, J., Liao, Y., and Mincheva, M. (2011). High-dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39(6):3320–3356.
  • Fan et al., (2013) Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B, 75(4):603–680.
  • Fan et al., (2018) Fan, J., Liu, H., and Wang, W. (2018). Large covariance estimation through elliptical factor models. The Annals of Statistics, 46(4):1383–1414.
  • Friedman et al., (2008) Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the Graphical Lasso. Biostatistics, 9(3):432–441.
  • Gabaix, (2011) Gabaix, X. (2011). The granular origins of aggregate fluctuations. Econometrica, 79(3):733–772.
  • Goto and Xu, (2015) Goto, S. and Xu, Y. (2015). Improving mean variance optimization through sparse hedging restrictions. Journal of Financial and Quantitative Analysis, 50(6):1415–1441.
  • Hastie et al., (2001) Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA.
  • Hautsch et al., (2012) Hautsch, N., Kyj, L. M., and Oomen, R. C. A. (2012). A blocking and regularization approach to high-dimensional realized covariance estimation. Journal of Applied Econometrics, 27(4):625–645.
  • Janková and van de Geer, (2018) Janková, J. and van de Geer, S. (2018). Inference in high-dimensional graphical models. Handbook of Graphical Models, Chapter 14, pages 325–351. CRC Press.
  • Kapetanios, (2010) Kapetanios, G. (2010). A testing procedure for determining the number of factors in approximate factor models with large datasets. Journal of Business & Economic Statistics, 28(3):397–409.
  • Koike, (2020) Koike, Y. (2020). De-biased graphical lasso for high-frequency data. Entropy, 22(4):456.
  • Ledoit and Wolf, (2004) Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411.
  • Ledoit and Wolf, (2017) Ledoit, O. and Wolf, M. (2017). Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets goldilocks. The Review of Financial Studies, 30(12):4349–4388.
  • Li et al., (2017) Li, H., Li, Q., and Shi, Y. (2017). Determining the number of factors when the number of factors can increase with sample size. Journal of Econometrics, 197(1):76–86.
  • Li, (2015) Li, J. (2015). Sparse and stable portfolio selection with parameter uncertainty. Journal of Business & Economic Statistics, 33(3):381–392.
  • Mazumder and Hastie, (2012) Mazumder, R. and Hastie, T. (2012). The Graphical Lasso: new insights and alternatives. Electronic Journal of Statistics, 6:2125–2149.
  • Meinshausen and Bühlmann, (2006) Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462.
  • Millington and Niranjan, (2017) Millington, T. and Niranjan, M. (2017). Robust portfolio risk minimization using the graphical lasso. In Neural Information Processing, pages 863–872, Cham. Springer International Publishing.
  • Onatski, (2013) Onatski, A. (2013). Discussion on the paper by Fan J., Liao Y., and Mincheva M. Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B, 75(4):650–652.
  • Pourahmadi, (2013) Pourahmadi, M. (2013). High-Dimensional Covariance Estimation: With High-Dimensional Data. Wiley Series in Probability and Statistics. John Wiley and Sons, 2013.
  • Ravikumar et al., (2011) Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. (2011). High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935–980.
  • Romano and Wolf, (2005) Romano, J. P. and Wolf, M. (2005). Stepwise multiple testing as formalized data snooping. Econometrica, 73(4):1237–1282.
  • Romano and Wolf, (2016) Romano, J. P. and Wolf, M. (2016). Efficient computation of adjusted p-values for resampling-based stepdown multiple testing. Statistics and Probability Letters, 113:38–40.
  • Ross, (1976) Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3):341–360.
  • Stock and Watson, (2002) Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460):1167–1179.
  • Tobin, (1958) Tobin, J. (1958). Liquidity preference as behavior towards risk. The Review of Economic Studies, 25(2):65–86.
  • Zhao et al., (2012) Zhao, T., Liu, H., Roeder, K., Lafferty, J., and Wasserman, L. (2012). The HUGE package for high-dimensional undirected graph estimation in R. Journal of Machine Learning Research, 13(1):1059–1062.
Figure 1: Averaged errors of the estimators of ${\bm{\Theta}}$ for Case 2 on logarithmic scale: $p=3\cdot T^{0.85}$, $K=2(\log T)^{0.5}$, $s_{T}=\mathcal{O}(T^{0.05})$.
Figure 2: Averaged errors of the estimators of ${\mathbf{w}}_{\text{GMV}}$ (left) and ${\mathbf{w}}_{\text{MRC}}$ (right) for Case 2 on logarithmic scale: $p=3\cdot T^{0.85}$, $K=2(\log T)^{0.5}$, $s_{T}=\mathcal{O}(T^{0.05})$.
Figure 3: Averaged errors of the estimators of $\Phi_{\text{GMV}}$ (left) and $\Phi_{\text{MRC}}$ (right) for Case 2 on logarithmic scale: $p=3\cdot T^{0.85}$, $K=2(\log T)^{0.5}$, $s_{T}=\mathcal{O}(T^{0.05})$.
Table 1: Daily portfolio returns, risk, SR and turnover. In the upper part, corresponding to the results without transaction costs, p-values are in parentheses. In the lower part, corresponding to the results with transaction costs, *** indicates p-value < 0.01, ** indicates p-value < 0.05, and * indicates p-value < 0.10. In-sample: January 20, 2000 - January 24, 2002 (504 obs); Out-of-sample: January 17, 2002 - January 31, 2020 (4536 obs).
Without transaction costs — each cell reports Return / Risk / SR (p-value); turnover is not reported.

Method          | Markowitz Risk-Constrained (MRC)        | Markowitz Weight-Constrained (MWC)        | Global Minimum-Variance (GMV)
EW              | 2.33E-04 / 1.90E-02 / 0.0123            | 2.33E-04 / 1.90E-02 / 0.0123              | 2.33E-04 / 1.90E-02 / 0.0123
Index           | 1.86E-04 / 1.17E-02 / 0.0159            | 1.86E-04 / 1.17E-02 / 0.0159              | 1.86E-04 / 1.17E-02 / 0.0159
FGL             | 8.12E-04 / 2.66E-02 / 0.0305 (0.0579)   | 2.95E-04 / 8.21E-03 / 0.0360 (0.024)      | 2.94E-04 / 7.51E-03 / 0.0392 (0.0279)
FClime          | 2.15E-03 / 8.46E-02 / 0.0254 (0.0758)   | 2.02E-04 / 9.85E-03 / 0.0205 (0.0299)     | 2.73E-04 / 1.07E-02 / 0.0255 (0.0419)
FLW             | 4.34E-04 / 2.65E-02 / 0.0164 (0.1782)   | 3.12E-04 / 9.96E-03 / 0.0313 (0.024)      | 3.10E-04 / 9.38E-03 / 0.0330 (0.0279)
FNLW            | 4.91E-04 / 6.66E-02 / 0.0074 (0.5515)   | 2.98E-04 / 1.24E-02 / 0.0241 (0.0419)     | 3.06E-04 / 1.32E-02 / 0.0231 (0.0419)
POET            | NaN / NaN / NaN                         | -7.06E-04 / 2.74E-01 / -0.0026 (0.9137)   | 1.07E-03 / 2.71E-01 / 0.0039 (0.7912)
Projected POET  | 1.20E-03 / 1.71E-01 / 0.0070 (0.5515)   | -8.06E-05 / 1.61E-02 / -0.0050 (0.9337)   | -7.57E-05 / 1.93E-02 / -0.0039 (0.9482)
FGL (FF1)       | 7.96E-04 / 2.80E-02 / 0.0285 (0.0758)   | 3.73E-04 / 8.73E-03 / 0.0427 (0.024)      | 3.52E-04 / 8.62E-03 / 0.0408 (0.0259)
FGL (FF3)       | 6.51E-04 / 2.74E-02 / 0.0238 (0.0758)   | 3.52E-04 / 8.96E-03 / 0.0393 (0.024)      | 3.39E-04 / 8.94E-03 / 0.0379 (0.022)
FGL (FF5)       | 5.87E-04 / 2.70E-02 / 0.0217 (0.0758)   | 3.47E-04 / 9.38E-03 / 0.0370 (0.024)      | 3.36E-04 / 9.29E-03 / 0.0362 (0.022)
FF1             | 7.38E-04 / 1.11E-01 / 0.0067 (0.5821)   | 3.30E-05 / 1.62E-02 / 0.0020 (0.7139)     | 2.49E-05 / 1.61E-02 / 0.0015 (0.8430)
FF3             | 7.52E-04 / 1.11E-01 / 0.0068 (0.5821)   | 2.68E-05 / 1.62E-02 / 0.0017 (0.7139)     | 2.06E-05 / 1.61E-02 / 0.0013 (0.8430)
FF5             | 7.59E-04 / 1.11E-01 / 0.0069 (0.5821)   | 2.01E-05 / 1.62E-02 / 0.0012 (0.7139)     | 1.38E-05 / 1.61E-02 / 0.0009 (0.8430)

With transaction costs — each cell reports Return / Risk / SR / Turnover.

Method          | Markowitz Risk-Constrained (MRC)          | Markowitz Weight-Constrained (MWC)        | Global Minimum-Variance (GMV)
EW              | 2.01E-04 / 1.90E-02 / 0.0106 / 0.0292     | 2.01E-04 / 1.90E-02 / 0.0106 / 0.0292     | 2.01E-04 / 1.90E-02 / 0.0106 / 0.0292
FGL             | 4.47E-04 / 2.66E-02 / 0.0168 / 0.3655     | 2.30E-04 / 8.22E-03 / 0.0280* / 0.0666    | 2.32E-04 / 7.52E-03 / 0.0309* / 0.0633
FClime          | 1.18E-03 / 8.48E-02 / 0.0139 / 1.0005     | 1.67E-04 / 9.86E-03 / 0.0170 / 0.0369     | 2.46E-04 / 1.07E-02 / 0.0230* / 0.0290
FLW             | -5.54E-05 / 2.65E-02 / -0.0021 / 0.4874   | 1.92E-04 / 9.98E-03 / 0.0193 / 0.1207     | 1.92E-04 / 9.39E-03 / 0.0204* / 0.1194
FNLW            | -2.39E-03 / 7.03E-02 / -0.0340 / 3.6370   | 5.50E-05 / 1.25E-02 / 0.0044 / 0.2441     | 6.08E-05 / 1.33E-02 / 0.0046 / 0.2457
POET            | NaN / NaN / NaN / NaN                     | -2.28E-02 / 5.55E-01 / -0.0411 / 113.3848 | -2.81E-02 / 4.21E-01 / -0.0666 / 132.8215
Projected POET  | -1.59E-02 / 3.64E-01 / -0.0437 / 35.9692  | -1.03E-03 / 1.68E-02 / -0.0616 / 0.9544   | -1.37E-03 / 2.06E-02 / -0.0666 / 1.2946
FGL (FF1)       | 3.86E-04 / 2.80E-02 / 0.0138 / 0.4068     | 2.82E-04 / 8.74E-03 / 0.0323** / 0.0903   | 2.63E-04 / 8.63E-03 / 0.0305* / 0.0887
FGL (FF3)       | 2.47E-04 / 2.74E-02 / 0.0090 / 0.4043     | 2.60E-04 / 8.98E-03 / 0.0290** / 0.0928   | 2.49E-04 / 8.96E-03 / 0.0278* / 0.0911
FGL (FF5)       | 1.83E-04 / 2.71E-02 / 0.0068 / 0.4032     | 2.53E-04 / 9.40E-03 / 0.0269* / 0.0952    | 2.43E-04 / 9.30E-03 / 0.0262* / 0.0937
FF1             | -6.69E-03 / 1.28E-01 / -0.0639 / 8.5721   | -5.27E-04 / 1.65E-02 / -0.0319 / 0.5704   | -5.30E-04 / 1.64E-02 / -0.0323 / 0.5641
FF3             | -6.65E-03 / 1.28E-01 / -0.0635 / 8.5411   | -5.33E-04 / 1.65E-02 / -0.0323 / 0.5701   | -5.34E-04 / 1.64E-02 / -0.0326 / 0.5638
FF5             | -6.63E-03 / 1.28E-01 / -0.0634 / 8.5262   | -5.40E-04 / 1.65E-02 / -0.0327 / 0.5703   | -5.41E-04 / 1.64E-02 / -0.0330 / 0.5646
Table 2: Cumulative excess return (CER) and risk of MRC portfolios using daily data. Targeted risk is set at $\sigma=0.013$; the daily targeted return is $0.0378\%$. P-values are in parentheses. In-sample: January 20, 2000 - January 24, 2002 (504 obs); Out-of-sample: January 17, 2002 - January 31, 2020 (4536 obs).
Downturn #1: Argentine Great Depression (2002)
Method     | CER      | Risk    | SR (p-value)
EW         | -0.1633  | 0.0160  | -0.0393
Index      | -0.2418  | 0.0168  | -0.0615
FGL        |  0.2909  | 0.0206  |  0.0629 (0.0619)
FClime     | -0.0079  | 0.0348  |  0.0164 (0.0759)
FLW        |  0.0308  | 0.0231  |  0.0171 (0.0759)
FNLW       |  0.0728  | 0.0213  |  0.0246 (0.0759)
ProjPOET   | -0.6178  | 0.0545  | -0.0467 (0.4852)
FGL(FF1)   |  0.3375  | 0.0211  |  0.0689 (0.0619)
FGL(FF3)   |  0.3423  | 0.0211  |  0.0696 (0.0619)
FGL(FF5)   |  0.3401  | 0.0212  |  0.0692 (0.0619)
FF1        | -0.0860  | 0.0495  |  0.0169 (0.0759)
FF3        | -0.0860  | 0.0495  |  0.0169 (0.0759)
FF5        | -0.0860  | 0.0495  |  0.0169 (0.0759)

Downturn #2: Financial Crisis (2008)
Method     | CER      | Risk    | SR (p-value)
EW         | -0.5622  | 0.0310  | -0.0857
Index      | -0.4746  | 0.0258  | -0.0857
FGL        |  0.2938  | 0.0282  |  0.0315 (0.0889)
FClime     | -0.8912  | 0.1484  |  0.1045 (0.1079)
FLW        |  0.2885  | 0.0315  |  0.0282 (0.1079)
FNLW       |  0.2075  | 0.0392  |  0.0392 (0.1079)
ProjPOET   | -0.9999  | 0.1963  |  0.1963 (0.1079)
FGL(FF1)   |  0.2665  | 0.0320  |  0.0320 (0.0889)
FGL(FF3)   |  0.2650  | 0.0319  |  0.0319 (0.0889)
FGL(FF5)   |  0.2560  | 0.0319  |  0.0319 (0.0889)
FF1        |  0.0404  | 0.0986  |  0.0006 (0.1079)
FF3        |  0.0404  | 0.0986  |  0.0006 (0.1079)
FF5        |  0.0404  | 0.0986  |  0.0006 (0.1079)

Boom #1 (2017)
Method     | CER      | Risk    | SR (p-value)
EW         |  0.0627  | 0.0218  |  0.0220
Index      |  0.1752  | 0.0042  |  0.1536
FGL        |  0.7267  | 0.0142  |  0.1606 (0.5544)
FClime     |  0.5331  | 0.0383  |  0.1231 (0.6264)
FLW        |  0.3164  | 0.0118  |  0.0987 (0.6264)
FNLW       |  0.5796  | 0.0497  |  0.1008 (0.5465)
ProjPOET   | -0.7599  | 0.1197  |  0.0151 (0.9815)
FGL(FF1)   |  0.6568  | 0.0135  |  0.1563 (0.5455)
FGL(FF3)   |  0.6607  | 0.0134  |  0.1581 (0.5455)
FGL(FF5)   |  0.6486  | 0.0132  |  0.1575 (0.5455)
FF1        | -0.5070  | 0.0720  | -0.0022 (0.9985)
FF3        | -0.5294  | 0.0721  | -0.0046 (0.9985)
FF5        | -0.4755  | 0.0710  |  0.0002 (0.9815)

Boom #2 (2019)
Method     | CER      | Risk    | SR (p-value)
EW         |  0.1642  | 0.0185  |  0.0418
Index      |  0.2934  | 0.0086  |  0.1228
FGL        |  0.6872  | 0.0263  |  0.0919 (0.1738)
FClime     |  0.2346  | 0.0557  |  0.0436 (0.2298)
FLW        |  0.5520  | 0.0287  |  0.0753 (0.2298)
FNLW       |  0.9315  | 0.0355  |  0.0905 (0.2298)
ProjPOET   |  1.8592  | 0.1177  |  0.0898 (0.2298)
FGL(FF1)   |  0.5166  | 0.0247  |  0.0793 (0.1728)
FGL(FF3)   |  0.5168  | 0.0248  |  0.0792 (0.1728)
FGL(FF5)   |  0.5037  | 0.0248  |  0.0779 (0.1728)
FF1        |  0.2690  | 0.1094  |  0.0798 (0.2298)
FF3        |  0.2682  | 0.1094  |  0.0798 (0.2298)
FF5        |  0.2730  | 0.1094  |  0.0799 (0.2298)

This Online Supplemental Appendix is structured as follows: Appendix A summarizes the Graphical Lasso algorithm; Appendix B contains the proofs of the theorems, accompanying lemmas, and an extension of the theorems to elliptical distributions; Appendix C provides additional simulations for Section 5; and Appendix D contains additional empirical results for Section 6.

Appendix A Graphical Lasso Algorithm

To solve (3.3) we use a procedure based on the weighted Graphical Lasso, which was first proposed in Friedman et al., (2008) and further studied in Mazumder and Hastie, (2012) and Janková and van de Geer, (2018), among others. Define the following partitions of ${\mathbf{W}}_{\varepsilon}$, $\widehat{{\bm{\Sigma}}}_{\varepsilon}$ and ${\bm{\Theta}}_{\varepsilon}$:

$${\mathbf{W}}_{\varepsilon}=\begin{pmatrix}{\mathbf{W}}_{\varepsilon,11}&{\mathbf{w}}_{\varepsilon,12}\\ {\mathbf{w}}_{\varepsilon,12}^{\prime}&w_{\varepsilon,22}\end{pmatrix},\quad\widehat{{\bm{\Sigma}}}_{\varepsilon}=\begin{pmatrix}\widehat{{\bm{\Sigma}}}_{\varepsilon,11}&\widehat{{\bm{\sigma}}}_{\varepsilon,12}\\ \widehat{{\bm{\sigma}}}_{\varepsilon,12}^{\prime}&\widehat{\sigma}_{\varepsilon,22}\end{pmatrix},\quad{\bm{\Theta}}_{\varepsilon}=\begin{pmatrix}{\bm{\Theta}}_{\varepsilon,11}&{\bm{\theta}}_{\varepsilon,12}\\ {\bm{\theta}}_{\varepsilon,12}^{\prime}&\theta_{\varepsilon,22}\end{pmatrix}, \qquad (A.1)$$

where ${\mathbf{W}}_{\varepsilon,11}$, $\widehat{{\bm{\Sigma}}}_{\varepsilon,11}$ and ${\bm{\Theta}}_{\varepsilon,11}$ are $(p-1)\times(p-1)$ blocks and ${\mathbf{w}}_{\varepsilon,12}$, $\widehat{{\bm{\sigma}}}_{\varepsilon,12}$ and ${\bm{\theta}}_{\varepsilon,12}$ are $(p-1)\times 1$ vectors.

Let ${\bm{\beta}}\equiv-{\bm{\theta}}_{\varepsilon,12}/\theta_{\varepsilon,22}$. The idea of GLASSO is to set ${\mathbf{W}}_{\varepsilon}=\widehat{{\bm{\Sigma}}}_{\varepsilon}+\lambda{\mathbf{I}}$ in (3.3) and combine the gradient of (3.3) with the formula for partitioned inverses to obtain the following $\ell_{1}$-regularized quadratic program:

$$\widehat{{\bm{\beta}}}=\arg\!\min_{{\bm{\beta}}\in\mathbb{R}^{p-1}}\Bigl\{\frac{1}{2}{\bm{\beta}}^{\prime}{\mathbf{W}}_{\varepsilon,11}{\bm{\beta}}-{\bm{\beta}}^{\prime}\widehat{{\bm{\sigma}}}_{\varepsilon,12}+\lambda\left\lVert{\bm{\beta}}\right\rVert_{1}\Bigr\}. \qquad (A.2)$$

As shown by Friedman et al., (2008), (A.2) can be viewed as a LASSO regression, where the LASSO estimates are functions of the inner products of ${\mathbf{W}}_{\varepsilon,11}$ and $\widehat{{\bm{\sigma}}}_{\varepsilon,12}$. Hence, (3.3) is equivalent to $p$ coupled LASSO problems. Once we obtain $\widehat{{\bm{\beta}}}$, we can recover the entries of ${\bm{\Theta}}_{\varepsilon}$ using the formula for partitioned inverses. The procedure to obtain a sparse ${\bm{\Theta}}_{\varepsilon}$ is summarized in Algorithm 1.

Algorithm 1 Graphical Lasso (Friedman et al., (2008))
1:  Initialize ${\mathbf{W}}_{\varepsilon}=\widehat{{\bm{\Sigma}}}_{\varepsilon}+\lambda{\mathbf{I}}$. The diagonal of ${\mathbf{W}}_{\varepsilon}$ remains unchanged in what follows.
2:  Repeat for $j=1,\ldots,p,1,\ldots,p,\ldots$ until convergence:
  • Partition ${\mathbf{W}}_{\varepsilon}$ into part 1: all but the $j$-th row and column, and part 2: the $j$-th row and column.

  • Solve the score equations using cyclical coordinate descent: ${\mathbf{W}}_{\varepsilon,11}{\bm{\beta}}-\widehat{{\bm{\sigma}}}_{\varepsilon,12}+\lambda\cdot\text{Sign}({\bm{\beta}})=\mathbf{0}$. This gives a $(p-1)\times 1$ solution vector $\widehat{{\bm{\beta}}}$.

  • Update $\widehat{{\mathbf{w}}}_{\varepsilon,12}={\mathbf{W}}_{\varepsilon,11}\widehat{{\bm{\beta}}}$.

3:  In the final cycle (for $j=1,\ldots,p$) solve for $\frac{1}{\widehat{\theta}_{22}}=w_{\varepsilon,22}-\widehat{{\bm{\beta}}}^{\prime}\widehat{{\mathbf{w}}}_{\varepsilon,12}$ and $\widehat{{\bm{\theta}}}_{12}=-\widehat{\theta}_{22}\widehat{{\bm{\beta}}}$.

As was shown in Friedman et al., (2008) and the follow-up paper by Mazumder and Hastie, (2012), the estimator produced by Graphical Lasso is guaranteed to be positive definite.
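In practice one rarely needs to code Algorithm 1 from scratch: scikit-learn ships a coordinate-descent implementation of the Friedman et al., (2008) estimator. The sketch below applies it to factor-model residuals to obtain a sparse estimate of ${\bm{\Theta}}_{\varepsilon}$; the residuals and the choice of $\lambda$ (here `alpha`) are inputs, and the names are ours.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def sparse_idiosyncratic_precision(residuals, alpha):
    """Sparse precision matrix of the idiosyncratic component estimated by the
    Graphical Lasso; `residuals` is a T x p array of factor-model residuals and
    `alpha` the l1 penalty (lambda in the notation above)."""
    S = np.cov(residuals, rowvar=False)          # sample covariance of residuals
    _, Theta_eps_hat = graphical_lasso(S, alpha=alpha)
    return Theta_eps_hat
```

When the penalty is not fixed in advance, `sklearn.covariance.GraphicalLassoCV` selects it by cross-validation; this is one common tuning choice and not necessarily the rule used in the paper.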

Appendix B Proofs of the Theorems

B.1 Lemmas for Theorem 1

Lemma 1.

Under the assumptions of Theorem 1,

  (a) $\max_{i,j\leq K}\left\lvert(1/T)\sum_{t=1}^{T}f_{it}f_{jt}-\mathbb{E}\left[f_{it}f_{jt}\right]\right\rvert=\mathcal{O}_{P}(\sqrt{1/T})$,

  (b) $\max_{i,j\leq p}\left\lvert(1/T)\sum_{t=1}^{T}\varepsilon_{it}\varepsilon_{jt}-\mathbb{E}\left[\varepsilon_{it}\varepsilon_{jt}\right]\right\rvert=\mathcal{O}_{P}(\sqrt{\log p/T})$,

  (c) $\max_{i\leq K,j\leq p}\left\lvert(1/T)\sum_{t=1}^{T}f_{it}\varepsilon_{jt}\right\rvert=\mathcal{O}_{P}(\sqrt{\log p/T})$.

Proof.

The proof of Lemma 1 can be found in Fan et al. (2011) (Lemma B.1). ∎

Lemma 2.

Under Assumption (A.4), $\max_{t\leq T}\sum_{s=1}^{T}\left\lvert\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]\right\rvert/p=\mathcal{O}(1)$.

Proof.

The proof of Lemma 2 can be found in Fan et al. (2013) (Lemma A.6). ∎

Lemma 3.

For $\widehat{K}$ defined in expression (3.6),

$$\Pr\Big(\widehat{K}=K\Big)\rightarrow 1.$$
Proof.

The proof of Lemma 3 can be found in Li et al. (2017) (Theorem 1 and Corollary 1). ∎

Using the expressions (A.1) in Bai (2003) and (C.2) in Fan et al. (2013), we have the following identity:

$$\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}=\Big(\frac{{\mathbf{V}}}{p}\Big)^{-1}\Bigg[\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}_{s}\frac{\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]}{p}+\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}_{s}\zeta_{st}+\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}_{s}\eta_{st}+\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}_{s}\xi_{st}\Bigg], \qquad (B.1)$$

where $\zeta_{st}={\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}/p-\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]/p$, $\eta_{st}={\mathbf{f}}^{\prime}_{s}\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}/p$ and $\xi_{st}={\mathbf{f}}^{\prime}_{t}\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{is}/p$.

Lemma 4.

For all $i\leq\widehat{K}$,

  (a) $(1/T)\sum_{t=1}^{T}\big[(1/T)\sum_{s=1}^{T}\hat{f}_{is}\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]/p\big]^{2}=\mathcal{O}_{P}(T^{-1})$,

  (b) $(1/T)\sum_{t=1}^{T}\big[(1/T)\sum_{s=1}^{T}\hat{f}_{is}\zeta_{st}\big]^{2}=\mathcal{O}_{P}(p^{-1})$,

  (c) $(1/T)\sum_{t=1}^{T}\big[(1/T)\sum_{s=1}^{T}\hat{f}_{is}\eta_{st}\big]^{2}=\mathcal{O}_{P}(K^{2}/p)$,

  (d) $(1/T)\sum_{t=1}^{T}\big[(1/T)\sum_{s=1}^{T}\hat{f}_{is}\xi_{st}\big]^{2}=\mathcal{O}_{P}(K^{2}/p)$.

Proof.

We only prove (c) and (d); the proof of (a) and (b) can be found in Fan et al. (2013) (Lemma 8).

  (c) Recall that $\eta_{st}={\mathbf{f}}^{\prime}_{s}\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}/p$. Using Assumption (A.5), we get $\mathbb{E}\big[(1/T)\sum_{t=1}^{T}\lVert\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}\rVert^{2}\big]=\mathbb{E}\big[\lVert\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}\rVert^{2}\big]=\mathcal{O}(pK)$. Therefore, by the Cauchy-Schwarz inequality and the facts that $(1/T)\sum_{t=1}^{T}\lVert{\mathbf{f}}_{t}\rVert^{2}=\mathcal{O}(K)$ and, for all $i$, $\sum_{s=1}^{T}\hat{f}_{is}^{2}=T$,

$$\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}\eta_{st}\Big)^{2}\leq\frac{1}{Tp^{2}}\sum_{t=1}^{T}\Big\lVert\sum_{j=1}^{p}{\mathbf{b}}_{j}\varepsilon_{jt}\Big\rVert^{2}\Bigg(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}^{2}\,\frac{1}{T}\sum_{s=1}^{T}\left\lVert{\mathbf{f}}_{s}\right\rVert^{2}\Bigg)=\mathcal{O}_{P}\Big(\frac{K}{p}\cdot K\Big)=\mathcal{O}_{P}\Big(\frac{K^{2}}{p}\Big).$$

  (d) Using a similar approach as in part (c):

$$\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}\xi_{st}\Big)^{2}\leq\Big(\frac{1}{T}\sum_{t=1}^{T}\left\lVert{\mathbf{f}}_{t}\right\rVert^{2}\Big)\frac{1}{T}\sum_{s=1}^{T}\Big\lVert\frac{1}{p}\sum_{j=1}^{p}{\mathbf{b}}_{j}\varepsilon_{js}\Big\rVert^{2}\Big(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}^{2}\Big)=\mathcal{O}_{P}\Big(K\cdot\frac{pK}{p^{2}}\cdot 1\Big)=\mathcal{O}_{P}\Big(\frac{K^{2}}{p}\Big).$$

Lemma 5.
  (a) $\max_{t\leq T}\left\lVert(1/(Tp))\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]\right\rVert=\mathcal{O}_{P}(K/\sqrt{T})$,

  (b) $\max_{t\leq T}\left\lVert(1/T)\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\zeta_{st}\right\rVert=\mathcal{O}_{P}(\sqrt{K}T^{1/4}/\sqrt{p})$,

  (c) $\max_{t\leq T}\left\lVert(1/T)\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\eta_{st}\right\rVert=\mathcal{O}_{P}(KT^{1/4}/\sqrt{p})$,

  (d) $\max_{t\leq T}\left\lVert(1/T)\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\xi_{st}\right\rVert=\mathcal{O}_{P}(KT^{1/4}/\sqrt{p})$.

Proof.

Our proof is similar to that in Fan et al. (2013); however, we relax the assumption of a fixed $K$.

  (a) Using the Cauchy-Schwarz inequality, Lemma 2, and the fact that $(1/T)\sum_{t=1}^{T}\lVert\widehat{{\mathbf{f}}}_{t}\rVert^{2}=\mathcal{O}_{P}(K)$, we get

$$\max_{t\leq T}\left\lVert\frac{1}{Tp}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]\right\rVert\leq\max_{t\leq T}\Bigg[\frac{1}{T}\sum_{s=1}^{T}\left\lVert\widehat{{\mathbf{f}}}_{s}\right\rVert\frac{1}{T}\sum_{s=1}^{T}\Bigg(\frac{\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]}{p}\Bigg)^{2}\Bigg]^{1/2}\leq\mathcal{O}_{P}(K)\max_{s,t}\sqrt{\left\lvert\frac{\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]}{p}\right\rvert}\,\max_{t\leq T}\Bigg[\frac{1}{T}\sum_{s=1}^{T}\left\lvert\frac{\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]}{p}\right\rvert\Bigg]^{1/2}=\mathcal{O}_{P}\Big(K\cdot 1\cdot\frac{1}{\sqrt{T}}\Big)=\mathcal{O}_{P}\Big(\frac{K}{\sqrt{T}}\Big).$$

  (b) Using the Cauchy-Schwarz inequality,

$$\max_{t\leq T}\left\lVert\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\zeta_{st}\right\rVert\leq\max_{t\leq T}\frac{1}{T}\Bigg(\sum_{s=1}^{T}\left\lVert\widehat{{\mathbf{f}}}_{s}\right\rVert^{2}\sum_{s=1}^{T}\zeta_{st}^{2}\Bigg)^{1/2}\leq\Bigg(\mathcal{O}_{P}(K)\max_{t}\frac{1}{T}\sum_{s=1}^{T}\zeta_{st}^{2}\Bigg)^{1/2}=\mathcal{O}_{P}\Big(\sqrt{K}\,T^{1/4}/\sqrt{p}\Big).$$

To obtain the last inequality we used Assumption (A.5)(b) to get $\mathbb{E}\big[(1/T)\sum_{s=1}^{T}\zeta_{st}^{2}\big]^{2}\leq\max_{s,t\leq T}\mathbb{E}\big[\zeta_{st}^{4}\big]=\mathcal{O}(1/p^{2})$, and then applied the Chebyshev inequality and Bonferroni's method, which yield $\max_{t}(1/T)\sum_{s=1}^{T}\zeta_{st}^{2}=\mathcal{O}_{P}\big(\sqrt{T}/p\big)$.

  (c) Using the definition of $\eta_{st}$ we get

$$\max_{t\leq T}\left\lVert\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\eta_{st}\right\rVert\leq\left\lVert\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}_{s}{\mathbf{f}}^{\prime}_{s}\right\rVert\max_{t}\left\lVert\frac{1}{p}\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}\right\rVert=\mathcal{O}_{P}\Big(K\cdot T^{1/4}/\sqrt{p}\Big).$$

To obtain the last rate we used Assumption (A.5)(c) together with the Chebyshev inequality and Bonferroni's method to get $\max_{t\leq T}\left\lVert\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}\right\rVert=\mathcal{O}_{P}\big(T^{1/4}\sqrt{p}\big)$.

  (d) In the proof of Lemma 4 we showed that $\lVert(1/T)\sum_{t=1}^{T}\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}(1/p)\widehat{{\mathbf{f}}}_{s}\rVert^{2}=\mathcal{O}\big(\sqrt{K/p}\big)$. Furthermore, Assumption (A.3) implies $\mathbb{E}\left[K^{-2}{\mathbf{f}}_{t}\right]^{4}<M$; therefore, $\max_{t\leq T}\left\lVert{\mathbf{f}}_{t}\right\rVert=\mathcal{O}_{P}\big(T^{1/4}\sqrt{K}\big)$. Using these bounds we get

$$\max_{t\leq T}\left\lVert\frac{1}{T}\sum_{s=1}^{T}\widehat{{\mathbf{f}}}^{\prime}_{s}\xi_{st}\right\rVert\leq\max_{t\leq T}\left\lVert{\mathbf{f}}_{t}\right\rVert\cdot\left\lVert\sum_{s=1}^{T}\sum_{i=1}^{p}{\mathbf{b}}_{i}\varepsilon_{it}\frac{1}{p}\widehat{{\mathbf{f}}}_{s}\right\rVert=\mathcal{O}_{P}\Big(T^{1/4}\sqrt{K}\cdot\sqrt{K/p}\Big)=\mathcal{O}_{P}\Big(T^{1/4}K/\sqrt{p}\Big).$$

Lemma 6.
  (a) $\max_{i\leq K}(1/T)\sum_{t=1}^{T}(\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t})_{i}^{2}=\mathcal{O}_{P}(1/T+K^{2}/p)$,

  (b) $(1/T)\sum_{t=1}^{T}\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\rVert^{2}=\mathcal{O}_{P}(K/T+K^{3}/p)$,

  (c) $\max_{t\leq T}\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\rVert=\mathcal{O}_{P}(K/\sqrt{T}+KT^{1/4}/\sqrt{p})$.

Proof.

Similarly to Fan et al. (2013), we prove this lemma conditioning on the event $\hat{K}=K$. Since $\Pr(\hat{K}\neq K)=o(1)$, the unconditional statements follow.

  (a) Using (B.1), for some constant $C>0$,

$$\max_{i\leq K}\frac{1}{T}\sum_{t=1}^{T}(\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t})_{i}^{2}\leq C\max_{i\leq K}\frac{1}{T}\sum_{t=1}^{T}\Bigg(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}\frac{\mathbb{E}\left[{\bm{\varepsilon}}^{\prime}_{s}{\bm{\varepsilon}}_{t}\right]}{p}\Bigg)^{2}+C\max_{i\leq K}\frac{1}{T}\sum_{t=1}^{T}\Bigg(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}\zeta_{st}\Bigg)^{2}+C\max_{i\leq K}\frac{1}{T}\sum_{t=1}^{T}\Bigg(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}\eta_{st}\Bigg)^{2}+C\max_{i\leq K}\frac{1}{T}\sum_{t=1}^{T}\Bigg(\frac{1}{T}\sum_{s=1}^{T}\hat{f}_{is}\xi_{st}\Bigg)^{2}=\mathcal{O}_{P}\Bigg(\frac{1}{T}+\frac{1}{p}+\frac{K^{2}}{p}+\frac{K^{2}}{p}\Bigg)=\mathcal{O}_{P}(1/T+K^{2}/p).$$

  (b) Part (b) follows from part (a) and

$$\frac{1}{T}\sum_{t=1}^{T}\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\rVert^{2}\leq K\max_{i\leq K}\frac{1}{T}\sum_{t=1}^{T}(\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t})_{i}^{2}.$$

  (c) Part (c) is a direct consequence of (B.1) and Lemma 5.

Lemma 7.
  (a) ${\mathbf{H}}{\mathbf{H}}^{\prime}={\mathbf{I}}_{\hat{K}}+\mathcal{O}_{P}(K^{5/2}/\sqrt{T}+K^{5/2}/\sqrt{p})$,

  (b) ${\mathbf{H}}^{\prime}{\mathbf{H}}={\mathbf{I}}_{K}+\mathcal{O}_{P}(K^{5/2}/\sqrt{T}+K^{5/2}/\sqrt{p})$.

Proof.

Similarly to Lemma 6, we first condition on K^=K\hat{K}=K.

  1. (a)

    The key observation here is that, according to the definition of 𝐇{\mathbf{H}}, its rank grows with KK, that is, 𝐇=𝒪P(K)\left\lVert{\mathbf{H}}\right\rVert=\mathcal{O}_{P}(K). Let cov^(𝐇𝐟t)=(1/T)t=1T𝐇𝐟t(𝐇𝐟t)\widehat{\text{cov}}({\mathbf{H}}{\mathbf{f}}_{t})=(1/T)\sum_{t=1}^{T}{\mathbf{H}}{\mathbf{f}}_{t}({\mathbf{H}}{\mathbf{f}}_{t})^{\prime}. Using the triangular inequality we get

    𝐇𝐇𝐈K^F𝐇𝐇cov^(𝐇𝐟t)F+cov^(𝐇𝐟t)𝐈K^F.\left\lVert{\mathbf{H}}{\mathbf{H}}^{\prime}-{\mathbf{I}}_{\hat{K}}\right\rVert_{F}\leq\left\lVert{\mathbf{H}}{\mathbf{H}}^{\prime}-\widehat{\text{cov}}({\mathbf{H}}{\mathbf{f}}_{t})\right\rVert_{F}+\left\lVert\widehat{\text{cov}}({\mathbf{H}}{\mathbf{f}}_{t})-{\mathbf{I}}_{\hat{K}}\right\rVert_{F}. (B.2)

    To bound the first term in (B.2), we use Lemma 1: 𝐇𝐇cov^(𝐇𝐟t)F𝐇2𝐈Kcov^(𝐇𝐟t)F=𝒪P(K5/2/T)\left\lVert{\mathbf{H}}{\mathbf{H}}^{\prime}-\widehat{\text{cov}}({\mathbf{H}}{\mathbf{f}}_{t})\right\rVert_{F}\leq\left\lVert{\mathbf{H}}\right\rVert^{2}\left\lVert{\mathbf{I}}_{K}-\widehat{\text{cov}}({\mathbf{H}}{\mathbf{f}}_{t})\right\rVert_{F}=\mathcal{O}_{P}(K^{5/2}/\sqrt{T}).
    To bound the second term in (B.2), we use the Cauchy-Schwarz inequality and Lemma 6:

    1Tt=1T𝐇𝐟t(𝐇𝐟t)1Tt=1T𝐟^t𝐟^tF1Tt=1T(𝐇𝐟t𝐟^t)(𝐇𝐟t)F+1Tt𝐟^t(𝐟^t(𝐇𝐟t))F\displaystyle\left\lVert\frac{1}{T}\sum_{t=1}^{T}{\mathbf{H}}{\mathbf{f}}_{t}({\mathbf{H}}{\mathbf{f}}_{t})^{\prime}-\frac{1}{T}\sum_{t=1}^{T}\widehat{{\mathbf{f}}}_{t}\widehat{{\mathbf{f}}}^{\prime}_{t}\right\rVert_{F}\leq\left\lVert\frac{1}{T}\sum_{t=1}^{T}({\mathbf{H}}{\mathbf{f}}_{t}-\widehat{{\mathbf{f}}}_{t})({\mathbf{H}}{\mathbf{f}}_{t})^{\prime}\right\rVert_{F}+\left\lVert\frac{1}{T}\sum_{t}\widehat{{\mathbf{f}}}_{t}(\widehat{{\mathbf{f}}}^{\prime}_{t}-({\mathbf{H}}{\mathbf{f}}_{t})^{\prime})\right\rVert_{F}
    (1Tt=1T𝐇𝐟t𝐟^t21Tt=1T𝐇𝐟t2)1/2+(1Tt=1T𝐇𝐟t𝐟^t21Tt=1T𝐟^t2)1/2\displaystyle\leq\Bigg{(}\frac{1}{T}\sum_{t=1}^{T}\left\lVert{\mathbf{H}}{\mathbf{f}}_{t}-\widehat{{\mathbf{f}}}_{t}\right\rVert^{2}\frac{1}{T}\sum_{t=1}^{T}\left\lVert{\mathbf{H}}{\mathbf{f}}_{t}\right\rVert^{2}\Bigg{)}^{1/2}+\Bigg{(}\frac{1}{T}\sum_{t=1}^{T}\left\lVert{\mathbf{H}}{\mathbf{f}}_{t}-\widehat{{\mathbf{f}}}_{t}\right\rVert^{2}\frac{1}{T}\sum_{t=1}^{T}\left\lVert\widehat{{\mathbf{f}}}_{t}\right\rVert^{2}\Bigg{)}^{1/2}
    =𝒪P((KT+K3pK)1/2+(KT+K3pK2)1/2)=𝒪P(K3/2T+K5/2p).\displaystyle=\mathcal{O}_{P}\Bigg{(}\Big{(}\frac{K}{T}+\frac{K^{3}}{p}\cdot K\Big{)}^{1/2}+\Big{(}\frac{K}{T}+\frac{K^{3}}{p}\cdot K^{2}\Big{)}^{1/2}\Bigg{)}=\mathcal{O}_{P}\Bigg{(}\frac{K^{3/2}}{\sqrt{T}}+\frac{K^{5/2}}{\sqrt{p}}\Bigg{)}.
  2. (b)

    The proof of (b) follows from Pr(K^=K)1\Pr(\hat{K}=K)\rightarrow 1 and the arguments made in Fan et al. (2013) (Lemma 11) for fixed KK.

B.2 Proof of Theorem 1

The second part of Theorem 1 was proved in Lemma 6. We now derive the convergence rate in the first part. Using the definition 𝐛^i=(1/T)t=1Trit𝐟^t\widehat{{\mathbf{b}}}_{i}=(1/T)\sum_{t=1}^{T}r_{it}\widehat{{\mathbf{f}}}_{t} and the normalization (1/T)t=1T𝐟^t𝐟^t=𝐈K(1/T)\sum_{t=1}^{T}\widehat{{\mathbf{f}}}_{t}\widehat{{\mathbf{f}}}^{\prime}_{t}={\mathbf{I}}_{K}, we obtain

𝐛^i𝐇𝐛i=1Tt=1T𝐇𝐟tεit+1Tt=1Trit(𝐟^t𝐇𝐟t)+𝐇(1Tt=1T𝐟t𝐟t𝐈K)𝐛i.\widehat{{\mathbf{b}}}_{i}-{\mathbf{H}}{\mathbf{b}}_{i}=\frac{1}{T}\sum_{t=1}^{T}{\mathbf{H}}{\mathbf{f}}_{t}\varepsilon_{it}+\frac{1}{T}\sum_{t=1}^{T}r_{it}(\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t})+{\mathbf{H}}\Big{(}\frac{1}{T}\sum_{t=1}^{T}{\mathbf{f}}_{t}{\mathbf{f}}^{\prime}_{t}-{\mathbf{I}}_{K}\Big{)}{\mathbf{b}}_{i}. (B.3)

Let us bound each term on the right-hand side of (B.3). The first term is

maxip1Tt=1T𝐇𝐟tεit𝐇maxik=1K(1Tt=1Tfktεit)2\displaystyle\max_{i\leq p}\left\lVert\frac{1}{T}\sum_{t=1}^{T}{\mathbf{H}}{\mathbf{f}}_{t}\varepsilon_{it}\right\rVert\leq\left\lVert{\mathbf{H}}\right\rVert\max_{i}\sqrt{\sum_{k=1}^{K}\Bigg{(}\frac{1}{T}\sum_{t=1}^{T}f_{kt}\varepsilon_{it}\Bigg{)}^{2}} 𝐇Kmaxip,jK|1Tt=1Tfjtεit|\displaystyle\leq\left\lVert{\mathbf{H}}\right\rVert\sqrt{K}\max_{i\leq p,j\leq K}\left\lvert\frac{1}{T}\sum_{t=1}^{T}f_{jt}\varepsilon_{it}\right\rvert
=𝒪P(KK1/2logp/T),\displaystyle=\mathcal{O}_{P}\Big{(}K\cdot K^{1/2}\cdot\sqrt{\log p/T}\Big{)},

where we used Lemmas 1 and 7 together with Bonferroni’s method. For the second term,

maxi1Tt=1Trit(𝐟^t𝐇𝐟t)maxi(1Tt=1Trit21Tt=1T𝐟^t𝐇𝐟t2)1/2=𝒪P(1T+K2p)1/2,\displaystyle\max_{i}\left\lVert\frac{1}{T}\sum_{t=1}^{T}r_{it}\Big{(}\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\Big{)}\right\rVert\leq\max_{i}\Bigg{(}\frac{1}{T}\sum_{t=1}^{T}r_{it}^{2}\frac{1}{T}\sum_{t=1}^{T}\left\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\right\rVert^{2}\Bigg{)}^{1/2}=\mathcal{O}_{P}\Bigg{(}\frac{1}{T}+\frac{K^{2}}{p}\Bigg{)}^{1/2},

where we used Lemma 6 and the fact that maxiT1t=1Trit2=𝒪P(1)\max_{i}T^{-1}\sum_{t=1}^{T}r_{it}^{2}=\mathcal{O}_{P}(1) since 𝔼[rit2]=𝒪(1)\mathbb{E}_{\,\!\!}\left[r_{it}^{2}\right]=\mathcal{O}(1).
Finally, the third term is 𝒪P(K2T1/2)\mathcal{O}_{P}(K^{2}T^{-1/2}) since (1/T)t=1T𝐟t𝐟t𝐈K=𝒪P(KT1/2)\lVert(1/T)\sum_{t=1}^{T}{\mathbf{f}}_{t}{\mathbf{f}}^{\prime}_{t}-{\mathbf{I}}_{K}\rVert=\mathcal{O}_{P}\Big{(}KT^{-1/2}\Big{)}, 𝐇=𝒪P(K)\left\lVert{\mathbf{H}}\right\rVert=\mathcal{O}_{P}(K) and maxi𝐛i=𝒪(1)\max_{i}\left\lVert{\mathbf{b}}_{i}\right\rVert=\mathcal{O}(1) by Assumption (B.1).

B.3 Corollary 1

As a consequence of Theorem 1, we get the following corollary:

Corollary 1.

Under the assumptions of Theorem 1,

maxip,tT𝐛^i𝐟^t𝐛i𝐟t=𝒪P(logT1/r2K2logp/T+K2T1/4/p).\max_{i\leq p,t\leq T}\left\lVert\widehat{{\mathbf{b}}}_{i}^{{}^{\prime}}\widehat{{\mathbf{f}}}_{t}-{\mathbf{b}}^{\prime}_{i}{\mathbf{f}}_{t}\right\rVert=\mathcal{O}_{P}(\log T^{1/r_{2}}K^{2}\sqrt{\log p/T}+K^{2}T^{1/4}/\sqrt{p}).
Proof.

Using Assumption (A.4) and Bonferroni’s method, we have maxtT𝐟t=𝒪P(KlogT1/r2)\max_{t\leq T}\lVert{\mathbf{f}}_{t}\rVert=\mathcal{O}_{P}(\sqrt{K}\log T^{1/r_{2}}). By Theorem 1, uniformly in ii and tt:

𝐛^i𝐟^t𝐛i𝐟t\displaystyle\left\lVert\widehat{{\mathbf{b}}}^{\prime}_{i}\widehat{{\mathbf{f}}}_{t}-{\mathbf{b}}^{\prime}_{i}{\mathbf{f}}_{t}\right\rVert 𝐛^i𝐇𝐛i𝐟^t𝐇𝐟t+𝐇𝐛i𝐟^t𝐇𝐟t\displaystyle\leq\left\lVert\widehat{{\mathbf{b}}}_{i}-{\mathbf{H}}{\mathbf{b}}_{i}\right\rVert\left\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\right\rVert+\left\lVert{\mathbf{H}}{\mathbf{b}}_{i}\right\rVert\left\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\right\rVert
+𝐛^i𝐇𝐛i𝐇𝐟t+𝐛i𝐟t𝐇𝐇𝐈K\displaystyle+\left\lVert\widehat{{\mathbf{b}}}_{i}-{\mathbf{H}}{\mathbf{b}}_{i}\right\rVert\left\lVert{\mathbf{H}}{\mathbf{f}}_{t}\right\rVert+\left\lVert{\mathbf{b}}_{i}\right\rVert\left\lVert{\mathbf{f}}_{t}\right\rVert\left\lVert{\mathbf{H}}^{\prime}{\mathbf{H}}-{\mathbf{I}}_{K}\right\rVert
=𝒪P((K3/2logpT+Kp)(KT+KT1/4p))+𝒪P(K(KT+KT1/4p))\displaystyle=\mathcal{O}_{P}\Bigg{(}\Big{(}K^{3/2}\sqrt{\frac{\log p}{T}}+\frac{K}{\sqrt{p}}\Big{)}\cdot\Big{(}\frac{K}{\sqrt{T}}+\frac{KT^{1/4}}{\sqrt{p}}\Big{)}\Bigg{)}+\mathcal{O}_{P}\Bigg{(}K\cdot\Big{(}\frac{K}{\sqrt{T}}+\frac{KT^{1/4}}{\sqrt{p}}\Big{)}\Bigg{)}
+𝒪P((K3/2logpT+Kp)logT1/r2K1/2)+𝒪P(logT1/r2K1/2(K5/2T+K5/2p))\displaystyle+\mathcal{O}_{P}\Bigg{(}\Big{(}K^{3/2}\sqrt{\frac{\log p}{T}}+\frac{K}{\sqrt{p}}\Big{)}\cdot\log T^{1/r_{2}}K^{1/2}\Bigg{)}+\mathcal{O}_{P}\Bigg{(}\log T^{1/r_{2}}K^{1/2}\Big{(}\frac{K^{5/2}}{\sqrt{T}}+\frac{K^{5/2}}{\sqrt{p}}\Big{)}\Bigg{)}
=𝒪P(logT1/r2K2logp/T+K2T1/4/p).\displaystyle=\mathcal{O}_{P}\Big{(}\log T^{1/r_{2}}K^{2}\sqrt{\log p/T}+K^{2}T^{1/4}/\sqrt{p}\Big{)}.

B.4 Proof of Theorem 2

Using the definition of the idiosyncratic components we have εitε^it=𝐛i𝐇(𝐟^t𝐇𝐟t)+(𝐛^i𝐛i𝐇)𝐟^t+𝐛i(𝐇𝐇𝐈K)𝐟t\varepsilon_{it}-\hat{\varepsilon}_{it}={\mathbf{b}}^{\prime}_{i}{\mathbf{H}}^{\prime}(\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t})+(\widehat{{\mathbf{b}}}^{\prime}_{i}-{\mathbf{b}}^{\prime}_{i}{\mathbf{H}}^{\prime})\widehat{{\mathbf{f}}}_{t}+{\mathbf{b}}^{\prime}_{i}({\mathbf{H}}^{\prime}{\mathbf{H}}-{\mathbf{I}}_{K}){\mathbf{f}}_{t}. We bound the maximum element-wise difference as follows:

maxip1Tt=1T(εitε^it)2\displaystyle\max_{i\leq p}\frac{1}{T}\sum_{t=1}^{T}(\varepsilon_{it}-\hat{\varepsilon}_{it})^{2} 4maxi𝐛i𝐇21Tt=1T𝐟^t𝐇𝐟t2+4maxi𝐛^i𝐛i𝐇21Tt=1T𝐟^t2\displaystyle\leq 4\max_{i}\left\lVert{\mathbf{b}}^{\prime}_{i}{\mathbf{H}}^{\prime}\right\rVert^{2}\frac{1}{T}\sum_{t=1}^{T}\left\lVert\widehat{{\mathbf{f}}}_{t}-{\mathbf{H}}{\mathbf{f}}_{t}\right\rVert^{2}+4\max_{i}\left\lVert\widehat{{\mathbf{b}}}^{\prime}_{i}-{\mathbf{b}}^{\prime}_{i}{\mathbf{H}}^{\prime}\right\rVert^{2}\frac{1}{T}\sum_{t=1}^{T}\left\lVert\widehat{{\mathbf{f}}}_{t}\right\rVert^{2}
+4maxi𝐛i1Tt=1T𝐟t2𝐇𝐇𝐈KF2\displaystyle+4\max_{i}\left\lVert{\mathbf{b}}^{\prime}_{i}\right\rVert\frac{1}{T}\sum_{t=1}^{T}\left\lVert{\mathbf{f}}_{t}\right\rVert^{2}\left\lVert{\mathbf{H}}^{\prime}{\mathbf{H}}-{\mathbf{I}}_{K}\right\rVert_{F}^{2}
=𝒪(K2(KT+K3p))+𝒪((K3logpT+K2p)K)+𝒪(K(K5T+K5p))\displaystyle=\mathcal{O}\Bigg{(}K^{2}\cdot\Big{(}\frac{K}{T}+\frac{K^{3}}{p}\Big{)}\Bigg{)}+\mathcal{O}\Bigg{(}\Big{(}\frac{K^{3}\log p}{T}+\frac{K^{2}}{p}\Big{)}\cdot K\Bigg{)}+\mathcal{O}\Bigg{(}K\cdot\Big{(}\frac{K^{5}}{T}+\frac{K^{5}}{p}\Big{)}\Bigg{)}
=𝒪(K4logpT+K6p).\displaystyle=\mathcal{O}\Bigg{(}\frac{K^{4}\log p}{T}+\frac{K^{6}}{p}\Bigg{)}.

Let ω3TK2logp/T+K3/p\omega_{3T}\equiv K^{2}\sqrt{\log p/T}+K^{3}/\sqrt{p}, so that maxip(1/T)t=1T(εitε^it)2=𝒪P(ω3T2)\max_{i\leq p}(1/T)\sum_{t=1}^{T}(\varepsilon_{it}-\hat{\varepsilon}_{it})^{2}=\mathcal{O}_{P}(\omega_{3T}^{2}). Then, maxi,t|εitε^it|=𝒪P(ω3T)=oP(1)\max_{i,t}\left\lvert\varepsilon_{it}-\hat{\varepsilon}_{it}\right\rvert=\mathcal{O}_{P}(\omega_{3T})=o_{P}(1), where the last equality is implied by Corollary 1.
As pointed out in the main text, the second part of Theorem 2 is based on the relationship between the convergence rates of the estimated covariance and precision matrices established in Janková and van de Geer (2018) (Theorem 14.1.3).

B.5 Lemmas for Theorem 3

Lemma 8.

Under the assumptions of Theorem 1, we have the following results:

  1. (a)

    𝐁=𝐁𝐇=𝒪(p)\left\lVert{\mathbf{B}}\right\rVert=\left\lVert{\mathbf{B}}{\mathbf{H}}^{\prime}\right\rVert=\mathcal{O}(\sqrt{p}).

  2. (b)

    ϱT1max1ip𝐛^i𝐇𝐛i=oP(1/K)\varrho_{T}^{-1}\max_{1\leq i\leq p}\left\lVert\widehat{{\mathbf{b}}}_{i}-{\mathbf{H}}^{\prime}{\mathbf{b}}_{i}\right\rVert=o_{P}(1/\sqrt{K}) and max1ip𝐛^i=𝒪P(K)\max_{1\leq i\leq p}\left\lVert\widehat{{\mathbf{b}}}_{i}\right\rVert=\mathcal{O}_{P}(\sqrt{K}).

  3. (c)

    ϱT1𝐁^𝐁𝐇=oP(p/K)\varrho_{T}^{-1}\left\lVert\widehat{{\mathbf{B}}}-{\mathbf{B}}{\mathbf{H}}^{\prime}\right\rVert=o_{P}\Big{(}\sqrt{p/K}\Big{)} and 𝐁^=𝒪P(p)\left\lVert\widehat{{\mathbf{B}}}\right\rVert=\mathcal{O}_{P}(\sqrt{p}).

Proof.

Part (c) is a direct consequence of Parts (a)-(b); therefore, we only prove the first two parts in what follows.

  1. (a)

    Part (a) easily follows from (B.1): since tr(𝚺𝐁𝐁)=tr(𝚺)𝐁20\text{tr}({\bm{\Sigma}}-{\mathbf{B}}{\mathbf{B}}^{\prime})=\text{tr}({\bm{\Sigma}})-\left\lVert{\mathbf{B}}\right\rVert^{2}\geq 0 and tr(𝚺)=𝒪(p)\text{tr}({\bm{\Sigma}})=\mathcal{O}(p) by (B.1), we get 𝐁2=𝒪(p)\left\lVert{\mathbf{B}}\right\rVert^{2}=\mathcal{O}(p). The same bound applies to 𝐁𝐇\left\lVert{\mathbf{B}}{\mathbf{H}}^{\prime}\right\rVert because the linear space spanned by the rows of 𝐁{\mathbf{B}} is the same as that spanned by the rows of 𝐁𝐇{\mathbf{B}}{\mathbf{H}}^{\prime}, hence it does not matter which one is used.

  2. (b)

    From Theorem 1, we have maxip𝐛^i𝐇𝐛i=𝒪P(ω1T)\max_{i\leq p}\left\lVert\widehat{{\mathbf{b}}}_{i}-{\mathbf{H}}{\mathbf{b}}_{i}\right\rVert=\mathcal{O}_{P}(\omega_{1T}). Using the definition of ϱT\varrho_{T} from Theorem 2, it follows that ϱT1ω1T=oP(ω1Tω3T1)\varrho_{T}^{-1}\omega_{1T}=o_{P}(\omega_{1T}\omega_{3T}^{-1}). Let z~Tω1Tω3T1\widetilde{z}_{T}\equiv\omega_{1T}\omega_{3T}^{-1}. Consider
    ϱT1max1ip𝐛^i𝐇𝐛i=oP(zT)\varrho_{T}^{-1}\max_{1\leq i\leq p}\left\lVert\widehat{{\mathbf{b}}}_{i}-{\mathbf{H}}{\mathbf{b}}_{i}\right\rVert=o_{P}(z_{T}). The latter holds for any zTz~Tz_{T}\geq\widetilde{z}_{T}, with the tightest bound obtained when zT=z~Tz_{T}=\widetilde{z}_{T}. For ease of exposition, we use zT=1/Kz_{T}=1/\sqrt{K} instead of z~T\widetilde{z}_{T}.
    The second result in Part (b) is obtained using the fact that max1ip𝐛^iK𝐁max\max_{1\leq i\leq p}\left\lVert\widehat{{\mathbf{b}}}_{i}\right\rVert\leq\sqrt{K}\left\lVert{\mathbf{B}}\right\rVert_{\text{max}}, where 𝐁max=𝒪(1)\left\lVert{\mathbf{B}}\right\rVert_{\text{max}}=\mathcal{O}(1) by (B.1).

Lemma 9.

Let 𝚷[𝚯f+(𝐁𝐇)𝚯ε(𝐁𝐇)]1{\bm{\Pi}}\equiv\Big{[}{\bm{\Theta}}_{f}+({\mathbf{B}}{\mathbf{H}}^{\prime})^{\prime}{\bm{\Theta}}_{\varepsilon}({\mathbf{B}}{\mathbf{H}}^{\prime})\Big{]}^{-1}, 𝚷^[𝚯^f+𝐁^𝚯^ε𝐁^]1\widehat{{\bm{\Pi}}}\equiv\Big{[}\widehat{{\bm{\Theta}}}_{f}+\widehat{{\mathbf{B}}}^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}\widehat{{\mathbf{B}}}\Big{]}^{-1}. Also, define 𝚺f=(1/T)t=1T𝐇𝐟t(𝐇𝐟t){\bm{\Sigma}}_{f}=(1/T)\sum_{t=1}^{T}{\mathbf{H}}{\mathbf{f}}_{t}({\mathbf{H}}{\mathbf{f}}_{t})^{\prime}, 𝚯f=𝚺f1{\bm{\Theta}}_{f}={\bm{\Sigma}}_{f}^{-1}, 𝚺^f(1/T)t=1T𝐟^t𝐟^t\widehat{{\bm{\Sigma}}}_{f}\equiv(1/T)\sum_{t=1}^{T}\widehat{{\mathbf{f}}}_{t}\widehat{{\mathbf{f}}}^{\prime}_{t}, and 𝚯^f=𝚺^f1\widehat{{\bm{\Theta}}}_{f}=\widehat{{\bm{\Sigma}}}_{f}^{-1}. Under the assumptions of Theorem 2, we have the following results:

  1. (a)

    Λmin(𝐁𝐁)1=𝒪(1/p)\Lambda_{\text{min}}({\mathbf{B}}^{\prime}{\mathbf{B}})^{-1}=\mathcal{O}(1/p).

  2. (b)

    |𝚷|2=𝒪(1/p){\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=\mathcal{O}(1/p).

  3. (c)

    ϱT1|𝚯^f𝚯f|2=oP(1/K)\varrho_{T}^{-1}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}_{f}-{\bm{\Theta}}_{f}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=o_{P}\Big{(}1/\sqrt{K}\Big{)}.

  4. (d)

    ϱT1|𝚷^𝚷|2=𝒪P(sT/p)\varrho_{T}^{-1}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Pi}}}-{\bm{\Pi}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=\mathcal{O}_{P}\Big{(}s_{T}/p\Big{)} and |𝚷^|2=𝒪P(1/p){\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Pi}}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=\mathcal{O}_{P}(1/p).

Proof.
  1. (a)

    Using Assumption (A.2) and Weyl's inequality, we have |Λmin(p1𝐁𝐁)Λmin(𝐁˘)||p1𝐁𝐁𝐁˘|2\left\lvert\Lambda_{\text{min}}(p^{-1}{\mathbf{B}}^{\prime}{\mathbf{B}})-\Lambda_{\text{min}}(\breve{{\mathbf{B}}})\right\rvert\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|p^{-1}{\mathbf{B}}^{\prime}{\mathbf{B}}-\breve{{\mathbf{B}}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}, which implies Part (a).

  2. (b)

    First, notice that |𝚷|2=Λmin(𝚯f+(𝐁𝐇)𝚯ε(𝐁𝐇))1{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=\Lambda_{\text{min}}({\bm{\Theta}}_{f}+({\mathbf{B}}{\mathbf{H}}^{\prime})^{\prime}{\bm{\Theta}}_{\varepsilon}({\mathbf{B}}{\mathbf{H}}^{\prime}))^{-1}. Therefore, we get

    |𝚷|2Λmin((𝐁𝐇)𝚯ε(𝐁𝐇))1Λmin(𝐁𝐁)1Λmin(𝚯ε)1=Λmin(𝐁𝐁)1Λmax(𝚺ε),{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}\leq\Lambda_{\text{min}}(({\mathbf{B}}{\mathbf{H}}^{\prime})^{\prime}{\bm{\Theta}}_{\varepsilon}({\mathbf{B}}{\mathbf{H}}^{\prime}))^{-1}\leq\Lambda_{\text{min}}({\mathbf{B}}^{\prime}{\mathbf{B}})^{-1}\Lambda_{\text{min}}({\bm{\Theta}}_{\varepsilon})^{-1}=\Lambda_{\text{min}}({\mathbf{B}}^{\prime}{\mathbf{B}})^{-1}\Lambda_{\text{max}}({\bm{\Sigma}}_{\varepsilon}),

    where the second inequality uses the fact that the linear space spanned by the rows of 𝐁{\mathbf{B}} is the same as that spanned by the rows of 𝐁𝐇{\mathbf{B}}{\mathbf{H}}^{\prime}, so the two can be used interchangeably. Therefore, the result in Part (b) follows from Part (a) and Assumptions (A.1) and (A.2).

  3. (c)

    From Lemma 7 we obtained:

    1Tt=1T𝐇𝐟t(𝐇𝐟t)1Tt=1T𝐟^t𝐟^tF=𝒪P(K3/2T+K5/2p).\displaystyle\left\lVert\frac{1}{T}\sum_{t=1}^{T}{\mathbf{H}}{\mathbf{f}}_{t}({\mathbf{H}}{\mathbf{f}}_{t})^{\prime}-\frac{1}{T}\sum_{t=1}^{T}\widehat{{\mathbf{f}}}_{t}\widehat{{\mathbf{f}}}^{\prime}_{t}\right\rVert_{F}=\mathcal{O}_{P}\Bigg{(}\frac{K^{3/2}}{\sqrt{T}}+\frac{K^{5/2}}{\sqrt{p}}\Bigg{)}.

    Since |𝚯f(𝚺^f𝚺f)|2<1{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{f}(\widehat{{\bm{\Sigma}}}_{f}-{\bm{\Sigma}}_{f})\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}<1, we have

    |𝚯^f𝚯f|2|𝚯f|2|𝚯f(𝚺^f𝚺f)|21|𝚯f(𝚺^f𝚺f)|2=𝒪P(K3/2T+K5/2p).\displaystyle{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}_{f}-{\bm{\Theta}}_{f}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}\leq\frac{{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{f}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{f}(\widehat{{\bm{\Sigma}}}_{f}-{\bm{\Sigma}}_{f})\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}}{1-{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{f}(\widehat{{\bm{\Sigma}}}_{f}-{\bm{\Sigma}}_{f})\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}}=\mathcal{O}_{P}\Bigg{(}\frac{K^{3/2}}{\sqrt{T}}+\frac{K^{5/2}}{\sqrt{p}}\Bigg{)}.

    Let ω4T=K3/2/T+K5/2/p\omega_{4T}=K^{3/2}/\sqrt{T}+K^{5/2}/\sqrt{p}. Using the definition of ϱT\varrho_{T} from Theorem 2, it follows that ϱT1ω4T=oP(ω4Tω3T1)\varrho_{T}^{-1}\omega_{4T}=o_{P}(\omega_{4T}\omega_{3T}^{-1}). Let γ~Tω4Tω3T1\widetilde{\gamma}_{T}\equiv\omega_{4T}\omega_{3T}^{-1}. Consider ϱT1|𝚯^f𝚯f|2=oP(γT)\varrho_{T}^{-1}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}_{f}-{\bm{\Theta}}_{f}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=o_{P}(\gamma_{T}). The latter holds for any γTγ~T\gamma_{T}\geq\widetilde{\gamma}_{T}, with the tightest bound obtained when γT=γ~T\gamma_{T}=\widetilde{\gamma}_{T}. For ease of exposition, we use γT=1/K\gamma_{T}=1/\sqrt{K} instead of γ~T\widetilde{\gamma}_{T}.

  4. (d)

    We will bound each term in the definition of 𝚷^𝚷\widehat{{\bm{\Pi}}}-{\bm{\Pi}}. First, we have

    |𝐁^𝚯^ε𝐁^(𝐁𝐇)𝚯ε(𝐁𝐇)|2\displaystyle{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\mathbf{B}}}^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}\widehat{{\mathbf{B}}}-({\mathbf{B}}{\mathbf{H}}^{\prime})^{\prime}{\bm{\Theta}}_{\varepsilon}({\mathbf{B}}{\mathbf{H}}^{\prime})\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2} |𝐁^𝐁𝐇|2|𝚯^ε|2|𝐁^|2+|𝐁𝐇|2|𝚯^ε𝚯ε|2|𝐁^|2\displaystyle\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\mathbf{B}}}-{\mathbf{B}}{\mathbf{H}}^{\prime}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\mathbf{B}}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\mathbf{B}}{\mathbf{H}}^{\prime}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}_{\varepsilon}-{\bm{\Theta}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\mathbf{B}}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}
    +|𝐁𝐇|2|𝚯ε|2|𝐁^𝐁𝐇|2=𝒪P(psTϱT).\displaystyle+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\mathbf{B}}{\mathbf{H}}^{\prime}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\mathbf{B}}}-{\mathbf{B}}{\mathbf{H}}^{\prime}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=\mathcal{O}_{P}\Bigg{(}p\cdot s_{T}\cdot\varrho_{T}\Bigg{)}. (B.4)

    Now we combine (B.4) with the results from Parts (b)-(c):

    ϱT1|𝚷(𝚷^1𝚷1)|2=𝒪P(sT).\displaystyle\varrho_{T}^{-1}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\Big{(}\widehat{{\bm{\Pi}}}^{-1}-{\bm{\Pi}}^{-1}\Big{)}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=\mathcal{O}_{P}\Big{(}s_{T}\Big{)}.

    Finally, since |𝚷(𝚷^1𝚷1)|2<1{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\Big{(}\widehat{{\bm{\Pi}}}^{-1}-{\bm{\Pi}}^{-1}\Big{)}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}<1, we have

    ϱT1|𝚷^𝚷|2ϱT1|𝚷|2|𝚷(𝚷^1𝚷1)|21|𝚷(𝚷^1𝚷1)|2=𝒪P(sTp).\displaystyle\varrho_{T}^{-1}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Pi}}}-{\bm{\Pi}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}\leq\varrho_{T}^{-1}\frac{{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\Big{(}\widehat{{\bm{\Pi}}}^{-1}-{\bm{\Pi}}^{-1}\Big{)}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}}{1-{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Pi}}\Big{(}\widehat{{\bm{\Pi}}}^{-1}-{\bm{\Pi}}^{-1}\Big{)}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}}=\mathcal{O}_{P}\Bigg{(}\frac{s_{T}}{p}\Bigg{)}.

B.6 Proof of Theorem 3

Using the Sherman-Morrison-Woodbury formula, we have

|𝚯^𝚯|l\displaystyle{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{l} |𝚯^ε𝚯ε|l+|(𝚯^ε𝚯ε)𝐁^𝚷^𝐁^𝚯^ε|l+|𝚯ε(𝐁^𝐁𝐇)𝚷^𝐁^𝚯^ε|l\displaystyle\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}_{\varepsilon}-{\bm{\Theta}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{l}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|(\widehat{{\bm{\Theta}}}_{\varepsilon}-{\bm{\Theta}}_{\varepsilon})\widehat{{\mathbf{B}}}\widehat{{\bm{\Pi}}}\widehat{{\mathbf{B}}}^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{l}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{\varepsilon}(\widehat{{\mathbf{B}}}-{\mathbf{B}}{\mathbf{H}}^{\prime})\widehat{{\bm{\Pi}}}\widehat{{\mathbf{B}}}^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{l}
+|𝚯ε𝐁𝐇(𝚷^𝚷)𝐁^𝚯^ε|l+|𝚯ε𝐁𝐇𝚷(𝐁^𝐁)𝚯^ε|l+|𝚯ε𝐁𝐇𝚷(𝐁𝐇)(𝚯^ε𝚯ε)|l\displaystyle+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{\varepsilon}{\mathbf{B}}{\mathbf{H}}^{\prime}(\widehat{{\bm{\Pi}}}-{\bm{\Pi}})\widehat{{\mathbf{B}}}^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{l}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{\varepsilon}{\mathbf{B}}{\mathbf{H}}^{\prime}{\bm{\Pi}}(\widehat{{\mathbf{B}}}-{\mathbf{B}})^{\prime}\widehat{{\bm{\Theta}}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{l}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{\varepsilon}{\mathbf{B}}{\mathbf{H}}^{\prime}{\bm{\Pi}}({\mathbf{B}}{\mathbf{H}}^{\prime})^{\prime}(\widehat{{\bm{\Theta}}}_{\varepsilon}-{\bm{\Theta}}_{\varepsilon})\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{l}
=Δ1+Δ2+Δ3+Δ4+Δ5+Δ6.\displaystyle=\Delta_{1}+\Delta_{2}+\Delta_{3}+\Delta_{4}+\Delta_{5}+\Delta_{6}. (B.5)

We now bound the terms in (B.5) for l=2l=2 and l=l=\infty. We start with l=2l=2. First, note that ϱT1Δ1=𝒪P(sT)\varrho_{T}^{-1}\Delta_{1}=\mathcal{O}_{P}(s_{T}) by Theorem 2. Second, using Lemmas 8-9 together with Theorem 2, we have ϱT1(Δ2+Δ6)=𝒪P(sTp(1/p)p1)=𝒪P(sT)\varrho_{T}^{-1}(\Delta_{2}+\Delta_{6})=\mathcal{O}_{P}(s_{T}\cdot\sqrt{p}\cdot(1/p)\cdot\sqrt{p}\cdot 1)=\mathcal{O}_{P}(s_{T}). Third, ϱT1(Δ3+Δ5)\varrho_{T}^{-1}(\Delta_{3}+\Delta_{5}) is negligible according to Lemma 8(c). Finally, ϱT1Δ4=𝒪P(1p(sT/p)p1)=𝒪P(sT)\varrho_{T}^{-1}\Delta_{4}=\mathcal{O}_{P}\Big{(}1\cdot\sqrt{p}\cdot(s_{T}/p)\cdot\sqrt{p}\cdot 1\Big{)}=\mathcal{O}_{P}(s_{T}) by Lemmas 8-9 and Theorem 2.
Now consider l=l=\infty. First, similarly to the previous case, ϱT1Δ1=𝒪P(sT)\varrho_{T}^{-1}\Delta_{1}=\mathcal{O}_{P}(s_{T}). Second, ϱT1(Δ2+Δ6)=𝒪P(sTpK(K/p)pKdT)=𝒪P(sTK3/2dT)\varrho_{T}^{-1}(\Delta_{2}+\Delta_{6})=\mathcal{O}_{P}\Big{(}s_{T}\cdot\sqrt{pK}\cdot(\sqrt{K}/p)\cdot\sqrt{pK}\cdot\sqrt{d_{T}}\Big{)}=\mathcal{O}_{P}(s_{T}K^{3/2}\sqrt{d_{T}}), where we used the fact that for any 𝐀𝒮p{\mathbf{A}}\in\mathcal{S}_{p} we have |𝐀|1=|𝐀|d(𝐀)|𝐀|2{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\mathbf{A}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}={\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\mathbf{A}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\infty}\leq\sqrt{d({\mathbf{A}})}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\mathbf{A}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}, where d(𝐀)d({\mathbf{A}}) measures the maximum vertex degree as described at the beginning of Section 4. Third, the term ϱT1(Δ3+Δ5)\varrho_{T}^{-1}(\Delta_{3}+\Delta_{5}) is negligible according to Lemma 8(c). Finally, ϱT1Δ4=𝒪P(dTpKK(sT)/ppKdT)=𝒪P(dTK3/2sT)\varrho_{T}^{-1}\Delta_{4}=\mathcal{O}_{P}(\sqrt{d_{T}}\cdot\sqrt{pK}\cdot\sqrt{K}(s_{T})/p\cdot\sqrt{pK}\cdot\sqrt{d_{T}})=\mathcal{O}_{P}(d_{T}K^{3/2}s_{T}).

B.7 Lemmas for Theorem 4

Lemma 10.

Under the assumptions of Theorem 4, |𝚯|1=𝒪(dTK3/2){\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}=\mathcal{O}(d_{T}K^{3/2}), where dTd_{T} was defined in Section 4.

Proof.

We use the Sherman-Morrison-Woodbury formula:

|𝚯|1\displaystyle{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1} |𝚯ε|1+|𝚯ε𝐁[𝚯f+𝐁𝚯ε𝐁]1𝐁𝚯ε|1\displaystyle\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}_{\varepsilon}{\mathbf{B}}[{\bm{\Theta}}_{f}+{\mathbf{B}}^{\prime}{\bm{\Theta}}_{\varepsilon}{\mathbf{B}}]^{-1}{\mathbf{B}}^{\prime}{\bm{\Theta}}_{\varepsilon}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}
=𝒪(dT)+𝒪(dTpKpKdT)=𝒪(dTK3/2).\displaystyle=\mathcal{O}(\sqrt{d_{T}})+\mathcal{O}\Big{(}\sqrt{d_{T}}\cdot p\cdot\frac{\sqrt{K}}{p}\cdot K\cdot\sqrt{d_{T}}\Big{)}=\mathcal{O}(d_{T}K^{3/2}). (B.6)

The last equality in (B.6) is obtained under the assumptions of Theorem 4. This result is important: it shows that the sparsity of the precision matrix of stock returns is controlled by the sparsity of the precision matrix of the idiosyncratic returns. Hence, one does not need to impose an unrealistic sparsity assumption on the precision matrix of returns a priori when the latter follow a factor structure; it suffices that the precision matrix is sparse once the common movements have been taken into account. ∎
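
To make the role of the Sherman-Morrison-Woodbury decomposition concrete, the following sketch (our numerical check, not part of the formal argument; dimensions and parameter values are illustrative) verifies that 𝚯 = 𝚯_ε − 𝚯_ε𝐁[𝚯_f + 𝐁′𝚯_ε𝐁]^{-1}𝐁′𝚯_ε equals the inverse of 𝚺 = 𝐁𝚺_f𝐁′ + 𝚺_ε for a small simulated factor model.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 50, 3                          # small, illustrative dimensions

B = rng.standard_normal((p, K))       # factor loadings
Sigma_f = np.eye(K)                   # factor covariance
# Toeplitz idiosyncratic covariance with a sparse inverse
rho = 0.5
Sigma_eps = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))

Sigma = B @ Sigma_f @ B.T + Sigma_eps         # factor-structured covariance
Theta_eps = np.linalg.inv(Sigma_eps)
Theta_f = np.linalg.inv(Sigma_f)

# Sherman-Morrison-Woodbury: Theta = Theta_eps - Theta_eps B (Theta_f + B' Theta_eps B)^{-1} B' Theta_eps
middle = np.linalg.inv(Theta_f + B.T @ Theta_eps @ B)
Theta_smw = Theta_eps - Theta_eps @ B @ middle @ B.T @ Theta_eps

# agrees with the direct inverse up to numerical error
print(np.max(np.abs(Theta_smw - np.linalg.inv(Sigma))))
```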

Lemma 11.

Define a=𝛊p𝚯𝛊p/pa={\bm{\iota}}^{\prime}_{p}{\bm{\Theta}}{\bm{\iota}}_{p}/p, b=𝛊p𝚯𝐦/pb={\bm{\iota}}^{\prime}_{p}{\bm{\Theta}}{\mathbf{m}}/p, d=𝐦𝚯𝐦/pd={\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}/p, g=𝐦𝚯𝐦/pg=\sqrt{{\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}}/p and a^=𝛊p𝚯^𝛊p/p\widehat{a}={\bm{\iota}}^{\prime}_{p}\widehat{{\bm{\Theta}}}{\bm{\iota}}_{p}/p, b^=𝛊p𝚯^𝐦^/p\widehat{b}={\bm{\iota}}^{\prime}_{p}\widehat{{\bm{\Theta}}}\widehat{{\mathbf{m}}}/p, d^=𝐦^𝚯^𝐦^/p\widehat{d}=\widehat{{\mathbf{m}}}^{\prime}\widehat{{\bm{\Theta}}}\widehat{{\mathbf{m}}}/p, g^=𝐦^𝚯^𝐦^/p\widehat{g}=\sqrt{\widehat{{\mathbf{m}}}^{\prime}\widehat{{\bm{\Theta}}}\widehat{{\mathbf{m}}}}/p . Under the assumptions of Theorem 4 and assuming (adb2)>0(ad-b^{2})>0,

  1. (a)

    aC0>0a\geq C_{0}>0, b=𝒪(1)b=\mathcal{O}(1), d=𝒪(1)d=\mathcal{O}(1), where C0C_{0} is a positive constant that bounds the minimum eigenvalue of 𝚯{\bm{\Theta}} from below.

  2. (b)

    |a^a|=𝒪P(ϱTdTK3/2sT)=oP(1)\left\lvert\widehat{a}-a\right\rvert=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})=o_{P}(1).

  3. (c)

    |b^b|=𝒪P(ϱTdTK3/2sT)=oP(1)\left\lvert\widehat{b}-b\right\rvert=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})=o_{P}(1)

  4. (d)

    |d^d|=𝒪P(ϱTdTK3/2sT)=oP(1)\left\lvert\widehat{d}-d\right\rvert=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})=o_{P}(1).

  5. (e)

    |g^g|=𝒪P([ϱTdTK3/2sT]1/2)=oP(1)\left\lvert\widehat{g}-g\right\rvert=\mathcal{O}_{P}\Big{(}[\varrho_{T}d_{T}K^{3/2}s_{T}]^{1/2}\Big{)}=o_{P}(1).

  6. (f)

    |(a^d^b^2)(adb2)|=𝒪P(ϱTdTK3/2sT)=oP(1)\left\lvert(\widehat{a}\widehat{d}-\widehat{b}^{2})-(ad-b^{2})\right\rvert=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}=o_{P}(1).

  7. (g)

    |adb2|=𝒪(1)\left\lvert ad-b^{2}\right\rvert=\mathcal{O}(1).

Proof.
  1. (a)

    Part (a) is trivial and follows directly from |𝚯|2=𝒪(1){\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}=\mathcal{O}(1) and 𝐦=𝒪(1)\left\lVert{\mathbf{m}}\right\rVert_{\infty}=\mathcal{O}(1) from Assumption 1. We illustrate with dd: recall that d=𝐦𝚯𝐦/p|𝚯|2𝐦22/p=𝒪(1)d={\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}/p\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}\left\lVert{\mathbf{m}}\right\rVert_{2}^{2}/p=\mathcal{O}(1).

  2. (b)

    Using Hölder's inequality, we have

    |a^a|=|𝜾p(𝚯^𝚯)𝜾pp|(𝚯^𝚯)𝜾p1𝜾pmaxp\displaystyle\left\lvert\widehat{a}-a\right\rvert=\left\lvert\frac{{\bm{\iota}}^{\prime}_{p}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\bm{\iota}}_{p}}{p}\right\rvert\leq\frac{\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\bm{\iota}}_{p}\right\rVert_{1}\left\lVert{\bm{\iota}}_{p}\right\rVert_{\text{max}}}{p} |𝚯^𝚯|1\displaystyle\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}
    =𝒪P(ϱTdTK3/2(sT+(1/p)))=oP(1),\displaystyle=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}(s_{T}+(1/p))\Big{)}=o_{P}(1),

    where the last rate is obtained using the assumptions of Theorem 3.

  3. (c)

    First, rewrite the expression of interest:

    b^b=[𝜾p(𝚯^𝚯)(𝐦^𝐦)]/p+[𝜾p(𝚯^𝚯)𝐦]/p+[𝜾p𝚯(𝐦^𝐦)]/p.\displaystyle\widehat{b}-b=[{\bm{\iota}}^{\prime}_{p}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})]/p+[{\bm{\iota}}^{\prime}_{p}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\mathbf{m}}]/p+[{\bm{\iota}}^{\prime}_{p}{\bm{\Theta}}(\widehat{{\mathbf{m}}}-{\mathbf{m}})]/p. (B.7)

    We now bound each of the terms in (B.7) using the expressions derived in Callot et al. (2019) (see their Proof of Lemma A.2) and the fact that logp/T=o(1)\log p/T=o(1).

    |𝜾p(𝚯^𝚯)(𝐦^𝐦)|/p|𝚯^𝚯|1𝐦^𝐦max=𝒪P(ϱTdTK3/2sTlogpT).\displaystyle\left\lvert{\bm{\iota}}^{\prime}_{p}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rvert/p\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\cdot\sqrt{\frac{\log p}{T}}\Big{)}. (B.8)
    |𝜾p(𝚯^𝚯)𝐦|/p|𝚯^𝚯|1=𝒪P(ϱTdTK3/2sT).\displaystyle\left\lvert{\bm{\iota}}^{\prime}_{p}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\mathbf{m}}\right\rvert/p\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}. (B.9)
    |𝜾p𝚯(𝐦^𝐦)|/p|𝚯|1𝐦^𝐦max=𝒪P(dTK3/2logpT).\displaystyle\left\lvert{\bm{\iota}}^{\prime}_{p}{\bm{\Theta}}(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rvert/p\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}=\mathcal{O}_{P}\Big{(}d_{T}K^{3/2}\cdot\sqrt{\frac{\log p}{T}}\Big{)}. (B.10)
  4. (d)

    First, rewrite the expression of interest:

    d^d\displaystyle\widehat{d}-d =[(𝐦^𝐦)(𝚯^𝚯)(𝐦^𝐦)]/p+[(𝐦^𝐦)𝚯(𝐦^𝐦)]/p\displaystyle=[(\widehat{{\mathbf{m}}}-{\mathbf{m}})^{\prime}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})]/p+[(\widehat{{\mathbf{m}}}-{\mathbf{m}})^{\prime}{\bm{\Theta}}(\widehat{{\mathbf{m}}}-{\mathbf{m}})]/p
    +[2(𝐦^𝐦)𝚯𝐦]/p+[2𝐦(𝚯^𝚯)(𝐦^𝐦)]/p\displaystyle+[2(\widehat{{\mathbf{m}}}-{\mathbf{m}})^{\prime}{\bm{\Theta}}{\mathbf{m}}]/p+[2{\mathbf{m}}^{\prime}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})]/p
    +[𝐦(𝚯^𝚯)𝐦]/p.\displaystyle+[{\mathbf{m}}^{\prime}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\mathbf{m}}]/p. (B.11)

    We now bound each of the terms in (B.11) using the expressions derived in Callot et al. (2019) (see their Proof of Lemma A.3) and the facts that logp/T=o(1)\log p/T=o(1) and 𝐦^𝐦max=𝒪P(logp/T)\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}=\mathcal{O}_{P}(\sqrt{\log p/T}).

    |(𝐦^𝐦)(𝚯^𝚯)(𝐦^𝐦)|/p\displaystyle\left\lvert(\widehat{{\mathbf{m}}}-{\mathbf{m}})^{\prime}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rvert/p 𝐦^𝐦max2|𝚯^𝚯|1\displaystyle\leq\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}^{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}
    =𝒪P(logpTϱTdTK3/2sT)\displaystyle=\mathcal{O}_{P}\Big{(}\frac{\log p}{T}\cdot\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)} (B.12)
    |(𝐦^𝐦)𝚯(𝐦^𝐦)|/p𝐦^𝐦max2|𝚯|1=𝒪P(logpTdTK3/2).\displaystyle\left\lvert(\widehat{{\mathbf{m}}}-{\mathbf{m}})^{\prime}{\bm{\Theta}}(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rvert/p\leq\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}^{2}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}=\mathcal{O}_{P}\Big{(}\frac{\log p}{T}\cdot d_{T}K^{3/2}\Big{)}. (B.13)
    |(𝐦^𝐦)𝚯𝐦|/p𝐦^𝐦max|𝚯|1=𝒪P(logpTdTK3/2).\displaystyle\left\lvert(\widehat{{\mathbf{m}}}-{\mathbf{m}})^{\prime}{\bm{\Theta}}{\mathbf{m}}\right\rvert/p\leq\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}=\mathcal{O}_{P}\Big{(}\sqrt{\frac{\log p}{T}}\cdot d_{T}K^{3/2}\Big{)}. (B.14)
    |𝐦(𝚯^𝚯)(𝐦^𝐦)|/p\displaystyle\left\lvert{\mathbf{m}}^{\prime}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rvert/p 𝐦^𝐦max|𝚯^𝚯|1\displaystyle\leq\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}
    =𝒪P(logpTϱTdTK3/2sT).\displaystyle=\mathcal{O}_{P}\Big{(}\sqrt{\frac{\log p}{T}}\cdot\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}. (B.15)
    |𝐦(𝚯^𝚯)𝐦|/p|𝚯^𝚯|1=𝒪P(ϱTdTK3/2sT).\displaystyle\left\lvert{\mathbf{m}}^{\prime}(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\mathbf{m}}\right\rvert/p\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}. (B.16)
  5. (e)

    This is a direct consequence of Part (d) and the fact that |d^d||d^d|\lvert\sqrt{\widehat{d}}-\sqrt{d}\rvert\leq\sqrt{\lvert\widehat{d}-d\rvert}.

  6. (f)

    First, rewrite the expression of interest:

    (a^d^b^2)(adb2)=[(a^a)+a][(d^d)+d][(b^b)+b]2,\displaystyle(\widehat{a}\widehat{d}-\widehat{b}^{2})-(ad-b^{2})=[(\widehat{a}-a)+a][(\widehat{d}-d)+d]-[(\widehat{b}-b)+b]^{2},

    therefore, using Parts (a)-(d) above, we have

    |(a^d^b^2)(adb2)|\displaystyle\left\lvert(\widehat{a}\widehat{d}-\widehat{b}^{2})-(ad-b^{2})\right\rvert [|a^a||d^d|+|a^a|d+a|d^d|+(b^b)2+2|b||b^b|]\displaystyle\leq\Big{[}\left\lvert\widehat{a}-a\right\rvert\left\lvert\widehat{d}-d\right\rvert+\left\lvert\widehat{a}-a\right\rvert d+a\left\lvert\widehat{d}-d\right\rvert+(\widehat{b}-b)^{2}+2\left\lvert b\right\rvert\left\lvert\widehat{b}-b\right\rvert\Big{]}
    =𝒪P(ϱTdTK3/2sT)=oP(1).\displaystyle=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}=o_{P}(1).
  7. (g)

    This is a direct consequence of Part (a): adb2ad=𝒪(1)ad-b^{2}\leq ad=\mathcal{O}(1).

B.8 Proof of Theorem 4

Let us derive the convergence rates for each of the portfolio weight formulations one by one. We start with the GMV formulation.

𝐰^GMV𝐰GMV1a(𝚯^𝚯)𝜾p1p+|aa^|𝚯𝜾p1p|a^|a=𝒪P(ϱTdT2K3sT)=oP(1),\displaystyle\left\lVert\widehat{{\mathbf{w}}}_{\text{GMV}}-{\mathbf{w}}_{\text{GMV}}\right\rVert_{1}\leq\frac{a\frac{\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\bm{\iota}}_{p}\right\rVert_{1}}{p}+\left\lvert a-\widehat{a}\right\rvert\frac{\left\lVert{\bm{\Theta}}{\bm{\iota}}_{p}\right\rVert_{1}}{p}}{\left\lvert\widehat{a}\right\rvert a}=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}^{2}K^{3}s_{T}\Big{)}=o_{P}(1),

where the first inequality was shown in Callot et al. (2019) (see their expression A.50), and the rate follows from Lemmas 11 and 10.
We now proceed with the MWC weight formulation. First, let us simplify the weight expression as follows: 𝐰MWC=κ1(𝚯𝜾p/p)+κ2(𝚯𝐦/p){\mathbf{w}}_{\text{MWC}}=\kappa_{1}({\bm{\Theta}}{\bm{\iota}}_{p}/p)+\kappa_{2}({\bm{\Theta}}{\mathbf{m}}/p), where

κ1=dμbadb2\displaystyle\kappa_{1}=\frac{d-\mu b}{ad-b^{2}}
κ2=μabadb2.\displaystyle\kappa_{2}=\frac{\mu a-b}{ad-b^{2}}.

Let 𝐰^MWC=κ^1(𝚯^𝜾p/p)+κ^2(𝚯^𝐦^/p)\widehat{{\mathbf{w}}}_{\text{MWC}}=\widehat{\kappa}_{1}(\widehat{{\bm{\Theta}}}{\bm{\iota}}_{p}/p)+\widehat{\kappa}_{2}(\widehat{{\bm{\Theta}}}\widehat{{\mathbf{m}}}/p), where κ^1\widehat{\kappa}_{1} and κ^2\widehat{\kappa}_{2} are the estimators of κ1\kappa_{1} and κ2\kappa_{2} respectively. As shown in Callot et al. (2019) (see their equation A.57), we can bound the quantity of interest as follows:

𝐰^MWC𝐰MWC1\displaystyle\left\lVert\widehat{{\mathbf{w}}}_{\text{MWC}}-{\mathbf{w}}_{\text{MWC}}\right\rVert_{1} |(κ^1κ1)|(𝚯^𝚯)𝜾p1/p+|(κ^1κ1)|𝚯𝜾p1/p+|κ1|(𝚯^𝚯)𝜾p1/p\displaystyle\leq\left\lvert(\widehat{\kappa}_{1}-\kappa_{1})\right\rvert\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\bm{\iota}}_{p}\right\rVert_{1}/p+\left\lvert(\widehat{\kappa}_{1}-\kappa_{1})\right\rvert\left\lVert{\bm{\Theta}}{\bm{\iota}}_{p}\right\rVert_{1}/p+\left\lvert\kappa_{1}\right\rvert\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\bm{\iota}}_{p}\right\rVert_{1}/p
+|(κ^2κ2)|(𝚯^𝚯)(𝐦^𝐦)1/p+|(κ^2κ2)|𝚯(𝐦^𝐦)1/p\displaystyle+\left\lvert(\widehat{\kappa}_{2}-\kappa_{2})\right\rvert\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rVert_{1}/p+\left\lvert(\widehat{\kappa}_{2}-\kappa_{2})\right\rvert\left\lVert{\bm{\Theta}}(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rVert_{1}/p
+|(κ^2κ2)|(𝚯^𝚯)𝐦1/p+|(κ^2κ2)|𝚯𝐦1/p\displaystyle+\left\lvert(\widehat{\kappa}_{2}-\kappa_{2})\right\rvert\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\mathbf{m}}\right\rVert_{1}/p+\left\lvert(\widehat{\kappa}_{2}-\kappa_{2})\right\rvert\left\lVert{\bm{\Theta}}{\mathbf{m}}\right\rVert_{1}/p
+|κ2|(𝚯^𝚯)(𝐦^𝐦)1/p+|κ2|(𝚯^𝚯)𝐦1/p.\displaystyle+\left\lvert\kappa_{2}\right\rvert\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rVert_{1}/p+\left\lvert\kappa_{2}\right\rvert\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\mathbf{m}}\right\rVert_{1}/p. (B.17)

For ease of notation, denote y=adb2y=ad-b^{2}. Then, using a similar technique to that in Callot et al. (2019), we get

|(κ^1κ1)|y|d^d|+yμ|b^b|+|y^y||dμb|y^y=𝒪P(ϱTdTK3/2sT)=oP(1),\displaystyle\left\lvert(\widehat{\kappa}_{1}-\kappa_{1})\right\rvert\leq\frac{y\left\lvert\widehat{d}-d\right\rvert+y\mu\left\lvert\widehat{b}-b\right\rvert+\left\lvert\widehat{y}-y\right\rvert\left\lvert d-\mu b\right\rvert}{\widehat{y}y}=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}=o_{P}(1),

where the rate trivially follows from Lemma 11.
Similarly, we get

|(κ^2κ2)|=𝒪P(ϱTdTK3/2sT)=oP(1).\displaystyle\left\lvert(\widehat{\kappa}_{2}-\kappa_{2})\right\rvert=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}=o_{P}(1).

Callot et al. (2019) showed that |κ1|=𝒪(1)\left\lvert\kappa_{1}\right\rvert=\mathcal{O}(1) and |κ2|=𝒪(1)\left\lvert\kappa_{2}\right\rvert=\mathcal{O}(1). Therefore, we can get the rate of (B.17):

𝐰^MWC𝐰MWC1=𝒪P(ϱTdT2K3sT)=oP(1).\displaystyle\left\lVert\widehat{{\mathbf{w}}}_{\text{MWC}}-{\mathbf{w}}_{\text{MWC}}\right\rVert_{1}=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}^{2}K^{3}s_{T}\Big{)}=o_{P}(1).

We now proceed with the MRC weight formulation:

𝐰^MRC𝐰MRC1gp[(𝚯^𝚯)(𝐦^𝐦)1+(𝚯^𝚯)𝐦1+𝚯(𝐦^𝐦)1]+|g^g|𝚯𝐦1|g^|g\displaystyle\left\lVert\widehat{{\mathbf{w}}}_{\text{MRC}}-{\mathbf{w}}_{\text{MRC}}\right\rVert_{1}\leq\frac{\frac{g}{p}\Big{[}\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}})(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rVert_{1}+\left\lVert(\widehat{{\bm{\Theta}}}-{\bm{\Theta}}){\mathbf{m}}\right\rVert_{1}+\left\lVert{\bm{\Theta}}(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rVert_{1}\Big{]}+\left\lvert\widehat{g}-g\right\rvert\left\lVert{\bm{\Theta}}{\mathbf{m}}\right\rVert_{1}}{\left\lvert\widehat{g}\right\rvert g}
gp[p|𝚯^𝚯|1(𝐦^𝐦)max+p|𝚯^𝚯|1𝐦max+p|𝚯|1(𝐦^𝐦)max]+p|g^g||𝚯|1𝐦max|g^|g\displaystyle\leq\frac{\frac{g}{p}\Big{[}p{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}\left\lVert(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rVert_{\text{max}}+p{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}\left\lVert{\mathbf{m}}\right\rVert_{\text{max}}+p{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}\left\lVert(\widehat{{\mathbf{m}}}-{\mathbf{m}})\right\rVert_{\text{max}}\Big{]}+p\left\lvert\widehat{g}-g\right\rvert{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{\bm{\Theta}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}\left\lVert{\mathbf{m}}\right\rVert_{\text{max}}}{\left\lvert\widehat{g}\right\rvert g}
=𝒪P(ϱTdTK3/2sTlogpT)+𝒪P(ϱTdTK3/2sT)\displaystyle=\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\cdot\sqrt{\frac{\log p}{T}}\Big{)}+\mathcal{O}_{P}\Big{(}\varrho_{T}d_{T}K^{3/2}s_{T}\Big{)}
+𝒪P(dTK3/2logpT)+𝒪P([ϱTdTK3/2sT]1/2dTK3/2)=oP(1),\displaystyle+\mathcal{O}_{P}\Big{(}d_{T}K^{3/2}\cdot\sqrt{\frac{\log p}{T}}\Big{)}+\mathcal{O}_{P}\Big{(}[\varrho_{T}d_{T}K^{3/2}s_{T}]^{1/2}\cdot d_{T}K^{3/2}\Big{)}=o_{P}(1),

where we used Lemmas 10-11 and the fact that 𝐦^𝐦max=𝒪P(logp/T)\left\lVert\widehat{{\mathbf{m}}}-{\mathbf{m}}\right\rVert_{\text{max}}=\mathcal{O}_{P}(\sqrt{\log p/T}).
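
For concreteness, the sketch below collects the scalars a, b, d, g and the three weight formulations appearing in the bounds above into a single helper. It is our illustration (the function name and inputs are hypothetical placeholders), not the authors' code, and the MRC expression is the one implied by the bound for 𝐰̂_MRC above.

```python
import numpy as np

def fgl_portfolio_weights(Theta_hat, m_hat, mu=None):
    """Portfolio weights implied by a precision-matrix estimate.

    Theta_hat : (p, p) estimated precision matrix of excess returns
    m_hat     : (p,)   estimated mean excess returns
    mu        : return target used by the MWC rule (optional)
    The scalars a, b, d, g mirror the quantities defined in Lemma 11.
    """
    p = Theta_hat.shape[0]
    iota = np.ones(p)

    a = iota @ Theta_hat @ iota / p
    b = iota @ Theta_hat @ m_hat / p
    d = m_hat @ Theta_hat @ m_hat / p
    g = np.sqrt(m_hat @ Theta_hat @ m_hat) / p

    # GMV: w = Theta iota / (iota' Theta iota)
    w_gmv = Theta_hat @ iota / (iota @ Theta_hat @ iota)

    # MWC: w = kappa_1 (Theta iota / p) + kappa_2 (Theta m / p)
    w_mwc = None
    if mu is not None:
        y = a * d - b ** 2
        kappa_1 = (d - mu * b) / y
        kappa_2 = (mu * a - b) / y
        w_mwc = kappa_1 * (Theta_hat @ iota) / p + kappa_2 * (Theta_hat @ m_hat) / p

    # MRC rule implied by the bound for w_hat_MRC above: w = (Theta m / p) / g
    w_mrc = (Theta_hat @ m_hat / p) / g

    return w_gmv, w_mwc, w_mrc
```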

B.9 Proof of Theorem 5

We start with the GMV formulation. Using Lemma 11 (a)-(b), we get

|a^1a11|=|aa^||a^|=𝒪P(ϱTdTK3/2sT)=oP(1).\displaystyle\left\lvert\frac{\hat{a}^{-1}}{a^{-1}}-1\right\rvert=\frac{\left\lvert a-\hat{a}\right\rvert}{\left\lvert\hat{a}\right\rvert}=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})=o_{P}(1).

Proceeding to the MWC risk exposure, we follow Callot et al. (2019) and introduce the following notation: x=aμ22bμ+dx=a\mu^{2}-2b\mu+d and x^=a^μ22b^μ+d^\hat{x}=\hat{a}\mu^{2}-2\hat{b}\mu+\hat{d} to rewrite Φ^MWC=p1(x^/y^)\widehat{\Phi}_{\text{MWC}}=p^{-1}(\hat{x}/\hat{y}). As shown in Callot et al. (2019), y/x=𝒪(1)y/x=\mathcal{O}(1) (see their equation A.42). Furthermore, by Lemma 11 (b)-(d)

|x^x||a^a|μ2+2|b^b|μ+|d^d|=𝒪P(ϱTdTK3/2sT)=oP(1),\displaystyle\left\lvert\hat{x}-x\right\rvert\leq\left\lvert\hat{a}-a\right\rvert\mu^{2}+2\left\lvert\hat{b}-b\right\rvert\mu+\left\lvert\hat{d}-d\right\rvert=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})=o_{P}(1),

and by Lemma 11 (f):

|y^y|=|a^d^b^2(adb2)|=𝒪P(ϱTdTK3/2sT)=oP(1).\displaystyle\left\lvert\hat{y}-y\right\rvert=\left\lvert\hat{a}\hat{d}-\hat{b}^{2}-(ad-b^{2})\right\rvert=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})=o_{P}(1).

Using the above and the facts that y=𝒪(1)y=\mathcal{O}(1) and x=𝒪(1)x=\mathcal{O}(1) (which were derived by Callot et al. (2019) in A.45 and A.46), we have

|Φ^MWCΦMWCΦMWC|=|(x^x)y+x(yy^)y^y|𝒪(1)=𝒪P(ϱTdTK3/2sT)=oP(1).\displaystyle\left\lvert\frac{\widehat{\Phi}_{\text{MWC}}-\Phi_{\text{MWC}}}{\Phi_{\text{MWC}}}\right\rvert=\left\lvert\frac{(\hat{x}-x)y+x(y-\hat{y})}{\hat{y}y}\right\rvert\cdot\mathcal{O}(1)=\mathcal{O}_{P}(\varrho_{T}d_{T}K^{3/2}s_{T})=o_{P}(1).

Finally, to bound MRC risk exposure, we use Lemma 11 (e) and rewrite

|gg^||g^|=𝒪P([ϱTdTK3/2sT]1/2)=oP(1).\displaystyle\frac{\left\lvert g-\hat{g}\right\rvert}{\left\lvert\hat{g}\right\rvert}=\mathcal{O}_{P}\Big{(}[\varrho_{T}d_{T}K^{3/2}s_{T}]^{1/2})=o_{P}(1).

B.10 Generalization: Sub-Gaussian and Elliptical Distributions

So far the consistency of the Factor Graphical Lasso in Theorem 4 relied on the assumption of exponential-type tails in Assumption 1(c). Since this tail behavior may be too restrictive for financial portfolios, we comment on the possibility of relaxing it. First, recall where Assumption 1(c) was used before: we required this assumption in order to establish the convergence of the unknown factors and loadings in Theorem 1, which was further used to obtain the convergence properties of 𝚺^ε\widehat{{\bm{\Sigma}}}_{\varepsilon} in Theorem 2. Hence, when Assumption 1(c) is relaxed, one needs to find another way to consistently estimate 𝚺ε{\bm{\Sigma}}_{\varepsilon}. We achieve this using the tools developed in Fan et al., (2018). Specifically, let 𝚺=𝚪𝚲𝚪{\bm{\Sigma}}={\bm{\Gamma}}{\bm{\Lambda}}{\bm{\Gamma}}^{{}^{\prime}}, where 𝚺{\bm{\Sigma}} is the covariance matrix of returns that follow the factor structure described in equation (3.1). Define 𝚺^,𝚲^K,𝚪^K\widehat{{\bm{\Sigma}}},\widehat{{\bm{\Lambda}}}_{K},\widehat{{\bm{\Gamma}}}_{K} to be the estimators of 𝚺,𝚲,𝚪{\bm{\Sigma}},{\bm{\Lambda}},{\bm{\Gamma}}. We further let 𝚲^K=diag(λ^1,,λ^K)\widehat{{\bm{\Lambda}}}_{K}=\text{diag}(\hat{\lambda}_{1},\ldots,\hat{\lambda}_{K}) and 𝚪^K=(v^1,,v^K)\widehat{{\bm{\Gamma}}}_{K}=(\hat{v}_{1},\ldots,\hat{v}_{K}) be constructed from the KK leading empirical eigenvalues and the corresponding eigenvectors of 𝚺^\widehat{{\bm{\Sigma}}}, and set 𝐁^𝐁^=𝚪^K𝚲^K𝚪^K\widehat{{\mathbf{B}}}\widehat{{\mathbf{B}}}^{\prime}=\widehat{{\bm{\Gamma}}}_{K}\widehat{{\bm{\Lambda}}}_{K}\widehat{{\bm{\Gamma}}}_{K}^{{}^{\prime}}. Similarly to Fan et al., (2018), we require the following bounds on the componentwise maximums of the estimators:

  1. (C.1)

    𝚺^𝚺max=𝒪P(logp/T)\left\lVert\widehat{{\bm{\Sigma}}}-{\bm{\Sigma}}\right\rVert_{\text{max}}=\mathcal{O}_{P}(\sqrt{\log p/T}),

  2. (C.2)

    (𝚲^K𝚲)𝚲1max=𝒪P(Klogp/T)\left\lVert(\widehat{{\bm{\Lambda}}}_{K}-{\bm{\Lambda}}){\bm{\Lambda}}^{-1}\right\rVert_{\text{max}}=\mathcal{O}_{P}(K\sqrt{\log p/T}),

  3. (C.3)

    𝚪^K𝚪max=𝒪P(K1/2logp/(Tp))\left\lVert\widehat{{\bm{\Gamma}}}_{K}-{\bm{\Gamma}}\right\rVert_{\text{max}}=\mathcal{O}_{P}(K^{1/2}\sqrt{\log p/(Tp)}).

Let 𝚺^SG\widehat{{\bm{\Sigma}}}^{SG} be the sample covariance matrix, with 𝚲^KSG\widehat{{\bm{\Lambda}}}_{K}^{SG} and 𝚪^KSG\widehat{{\bm{\Gamma}}}_{K}^{SG} constructed with the first KK leading empirical eigenvalues and eigenvectors of 𝚺^SG\widehat{{\bm{\Sigma}}}^{SG} respectively. Also, let 𝚺^EL1=𝐃^𝐑^1𝐃^\widehat{{\bm{\Sigma}}}^{EL1}=\widehat{{\mathbf{D}}}\widehat{{\mathbf{R}}}_{1}\widehat{{\mathbf{D}}}, where 𝐑^1\widehat{{\mathbf{R}}}_{1} is obtained using Kendall’s tau correlation coefficients and 𝐃^\widehat{{\mathbf{D}}} is a robust estimator of variances constructed using the Huber loss. Furthermore, let 𝚺^EL2=𝐃^𝐑^2𝐃^\widehat{{\bm{\Sigma}}}^{EL2}=\widehat{{\mathbf{D}}}\widehat{{\mathbf{R}}}_{2}\widehat{{\mathbf{D}}}, where 𝐑^2\widehat{{\mathbf{R}}}_{2} is obtained using the spatial Kendall’s tau estimator. Define 𝚲^KEL\widehat{{\bm{\Lambda}}}_{K}^{EL} to be the matrix of the first KK leading empirical eigenvalues of 𝚺^EL1\widehat{{\bm{\Sigma}}}^{EL1}, and 𝚪^KEL\widehat{{\bm{\Gamma}}}_{K}^{EL} to be the matrix of the first KK leading empirical eigenvectors of 𝚺^EL2\widehat{{\bm{\Sigma}}}^{EL2}. For more details on constructing 𝚺^SG\widehat{{\bm{\Sigma}}}^{SG}, 𝚺^EL1\widehat{{\bm{\Sigma}}}^{EL1} and 𝚺^EL2\widehat{{\bm{\Sigma}}}^{EL2}, see Fan et al., (2018), Sections 3 and 4.
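
To illustrate the construction of 𝚺^EL1 = 𝐃^𝐑^1𝐃^ just described, here is a minimal sketch of our own (assuming the standard sin(πτ/2) map from Kendall's tau to correlations for elliptical distributions); the MAD-based scale estimate below is a simple stand-in for the Huber-loss variance estimator used in Fan et al., (2018), not their implementation.

```python
import numpy as np
from scipy.stats import kendalltau

def robust_covariance_el1(X):
    """Sketch of Sigma_hat^{EL1} = D_hat R_hat_1 D_hat for a (T, p) matrix of returns X.

    R_hat_1 maps marginal Kendall's tau through sin(pi * tau / 2), the standard
    correlation transform for elliptical distributions; the MAD-based D_hat is a
    simple stand-in for the Huber-loss variance estimator used in the paper.
    """
    T, p = X.shape
    R1 = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            tau, _ = kendalltau(X[:, i], X[:, j])
            R1[i, j] = R1[j, i] = np.sin(np.pi * tau / 2.0)

    # robust scale proxy: 1.4826 * MAD is consistent for the standard deviation under normality
    mad = np.median(np.abs(X - np.median(X, axis=0)), axis=0)
    D = np.diag(1.4826 * mad)

    return D @ R1 @ D
```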

Proposition 1.

For sub-Gaussian distributions, 𝚺^SG\widehat{{\bm{\Sigma}}}^{SG}, 𝚲^KSG\widehat{{\bm{\Lambda}}}_{K}^{SG} and 𝚪^KSG\widehat{{\bm{\Gamma}}}_{K}^{SG} satisfy (C.1)-(C.3).
For elliptical distributions, 𝚺^EL1\widehat{{\bm{\Sigma}}}^{EL1}, 𝚲^KEL\widehat{{\bm{\Lambda}}}_{K}^{EL} and 𝚪^KEL\widehat{{\bm{\Gamma}}}_{K}^{EL} satisfy (C.1)-(C.3).
When (C.1)-(C.3) are satisfied, the bounds obtained in Theorems 2-5 continue to hold.

Proposition 1 is essentially a rephrasing of the results obtained in Fan et al., (2018), Sections 3 and 4. The difference arises due to the fact that we allow KK to increase, which is reflected in the modified rates in (C.1)-(C.3). As evidenced by the above Proposition, 𝚺^EL2\widehat{{\bm{\Sigma}}}^{EL2} is only used for estimating the eigenvectors. This is necessary because, in contrast with 𝚺^EL2\widehat{{\bm{\Sigma}}}^{EL2}, the theoretical properties of the eigenvectors of 𝚺^EL1\widehat{{\bm{\Sigma}}}^{EL1} are mathematically involved because of the sin function. The FGL for elliptical distributions will be called the Robust FGL.

Appendix C Additional Simulations

C.1 Verifying Theoretical Rates

To compare the empirical rate with the theoretical expressions derived in Theorems 3-5, we use the facts from Theorem 2 that ω3TK2logp/T+K3/p\omega_{3T}\equiv K^{2}\sqrt{\log p/T}+K^{3}/\sqrt{p} and ϱT1ω3T𝑝0\varrho_{T}^{-1}\omega_{3T}\xrightarrow{p}0 to introduce the following functions that correspond to the theoretical rates for the choice of parameters in the empirical setting:

f||||||2=C1+C2log2(sTϱT)g||||||1=C3+C2log2(dTK3/2sTϱT)}for𝚯^\displaystyle\begin{rcases*}f_{{\left|\kern-0.75346pt\left|\kern-0.75346pt\left|\cdot\right|\kern-0.75346pt\right|\kern-0.75346pt\right|}_{2}}=C_{1}+C_{2}\cdot\log_{2}(s_{T}\varrho_{T})\\ g_{{\left|\kern-0.75346pt\left|\kern-0.75346pt\left|\cdot\right|\kern-0.75346pt\right|\kern-0.75346pt\right|}_{1}}=C_{3}+C_{2}\cdot\log_{2}(d_{T}K^{3/2}s_{T}\varrho_{T})\end{rcases*}\text{for}\ \widehat{{\bm{\Theta}}} (C.1)
h1=C4+C2log2(ϱTdT2K3sT)for𝐰^GMV,𝐰^MWC\displaystyle h_{1}=C_{4}+C_{2}\cdot\log_{2}(\varrho_{T}d_{T}^{2}K^{3}s_{T})\hskip 30.5pt\text{for}\ \widehat{{\mathbf{w}}}_{\text{GMV}},\widehat{{\mathbf{w}}}_{\text{MWC}} (C.2)
h2=C5+C6log2([ϱTsT]1/2dT3/2K3)for𝐰^MRC\displaystyle h_{2}=C_{5}+C_{6}\cdot\log_{2}([\varrho_{T}s_{T}]^{1/2}d_{T}^{3/2}K^{3})\quad\text{for}\ \widehat{{\mathbf{w}}}_{\text{MRC}} (C.3)
h3=C7+C2log2(dTK3/2sTϱT)forΦ^GMV,Φ^MWC\displaystyle h_{3}=C_{7}+C_{2}\cdot\log_{2}(d_{T}K^{3/2}s_{T}\varrho_{T})\hskip 22.0pt\text{for}\ \widehat{\Phi}_{\text{GMV}},\widehat{\Phi}_{\text{MWC}} (C.4)
h4=C8+C9log2(dTK3/2sTϱT)forΦ^MRC\displaystyle h_{4}=C_{8}+C_{9}\cdot\log_{2}(d_{T}K^{3/2}s_{T}\varrho_{T})\hskip 22.0pt\text{for}\ \widehat{\Phi}_{\text{MRC}} (C.5)

where C1,,C9C_{1},\ldots,C_{9} are constants with C6>C2C_{6}>C_{2} (by Theorem 4), C9>C2C_{9}>C_{2} (by Theorem 5).
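
For reference, the sketch below evaluates (on a base-2 logarithmic scale) the rate sequences entering (C.1)-(C.5) under the design of Figure 4; the sequences d_T and ϱ_T are not fully pinned down by the text, so the placeholders below are our assumptions for illustration only, and the constants C_1, ..., C_9 are omitted.

```python
import numpy as np

# Design of Figure 4 (our reading): T = 2^h, p = T^0.85, K = 2(log T)^0.5, s_T ~ T^0.05
T = 2.0 ** np.arange(7, 10, 0.5)
p, K, s_T = T ** 0.85, 2 * np.log(T) ** 0.5, T ** 0.05
d_T = s_T                                # placeholder: d_T is not pinned down by the text

omega_3T = K ** 2 * np.sqrt(np.log(p) / T) + K ** 3 / np.sqrt(p)
rho_T = omega_3T * np.log(T)             # placeholder sequence satisfying rho_T^{-1} * omega_3T -> 0

# log2 of the rate sequences entering the functions above, up to the constants C_1, ..., C_9
rate_spectral = np.log2(s_T * rho_T)                        # precision matrix, spectral norm
rate_l1 = np.log2(d_T * K ** 1.5 * s_T * rho_T)             # precision matrix, l1 norm; GMV/MWC risk
rate_weights = np.log2(rho_T * d_T ** 2 * K ** 3 * s_T)     # GMV/MWC portfolio weights

for row in zip(T, rate_spectral, rate_l1, rate_weights):
    print("T=%6.0f  spectral: %6.2f  l1: %6.2f  weights: %6.2f" % row)
```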

Figure 4 shows the averaged (over Monte Carlo simulations) errors of the estimators of 𝚯{\bm{\Theta}}, 𝐰{\mathbf{w}} and Φ\Phi versus the sample size TT in the logarithmic scale (base 2). In order to confirm the theoretical findings from Theorems 3-5, we also plot the theoretical rates of convergence given by the functions in (C.1)-(C.5). We verify that the empirical and theoretical rates match. Since the convergence rates for GMV and MWC portfolio weights 𝐰{\mathbf{w}} and risk exposures Φ\Phi are very similar, we only report the former. Note that as predicted by Theorem 3, the rate of convergence of the precision matrix in ||||||2{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{2}-norm is faster than the rate in ||||||1{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}-norm. Furthermore, the convergence rates of the GMV, MWC and MRC portfolio weights and risk exposures are close to the rate of the precision matrix 𝚯{\bm{\Theta}} in ||||||1{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{1}-norm, which is confirmed by Theorem 4. As evidenced by Figure 4, the convergence rate of the MRC risk exposure is slower than the rates of the GMV and MWC exposures. This finding is in accordance with Theorem 5 and is also consistent with the empirical findings that indicate higher overall risk associated with MRC portfolios.

Figure 4: Averaged empirical errors (solid lines) and theoretical rates of convergence (dashed lines) on logarithmic scale: p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5}, sT=𝒪(T0.05)s_{T}=\mathcal{O}(T^{0.05}).

C.2 Results for Case 1

We compare the performance of FGL with the alternative models listed at the beginning of Section 5 for Case 1. The only instance in which FGL is dominated, and only slightly so, occurs in Figure 5: POET outperforms FGL in terms of the convergence of the precision matrix in the spectral norm. This is different from Case 2 in Figure 1, where FGL outperforms all the competing models.

Figure 5: Averaged errors of the estimators of 𝚯{\bm{\Theta}} for Case 1 on logarithmic scale: p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5}, sT=𝒪(T0.05)s_{T}=\mathcal{O}(T^{0.05}).
Figure 6: Averaged errors of the estimators of 𝐰GMV{\mathbf{w}}_{\text{GMV}} (left) and 𝐰MRC{\mathbf{w}}_{\text{MRC}} (right) for Case 1 on logarithmic scale: p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5}, sT=𝒪(T0.05)s_{T}=\mathcal{O}(T^{0.05}).
Figure 7: Averaged errors of the estimators of ΦGMV\Phi_{\text{GMV}} (left) and ΦMRC\Phi_{\text{MRC}} (right) for Case 1 on logarithmic scale: p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5}, sT=𝒪(T0.05)s_{T}=\mathcal{O}(T^{0.05}).

C.3 Robust FGL

The DGP for elliptical distributions is similar to Fan et al., (2018): let (𝐟t,𝜺t)({\mathbf{f}}_{t},{\bm{\varepsilon}}_{t}) from (3.1) jointly follow the multivariate t-distribution with ν\nu degrees of freedom. When ν=\nu=\infty, this corresponds to the multivariate normal distribution; smaller values of ν\nu are associated with thicker tails. We draw TT independent samples of (𝐟t,𝜺t)({\mathbf{f}}_{t},{\bm{\varepsilon}}_{t}) from the multivariate t-distribution with zero mean and covariance matrix 𝚺=diag(𝚺f,𝚺ε){\bm{\Sigma}}=\text{diag}({\bm{\Sigma}}_{f},{\bm{\Sigma}}_{\varepsilon}), where 𝚺f=𝐈K{\bm{\Sigma}}_{f}={\mathbf{I}}_{K}. To construct 𝚺ε{\bm{\Sigma}}_{\varepsilon} we use a Toeplitz structure parameterized by ρ=0.5\rho=0.5, which leads to a sparse 𝚯ε=𝚺ε1{\bm{\Theta}}_{\varepsilon}={\bm{\Sigma}}^{-1}_{\varepsilon}. The rows of 𝐁{\mathbf{B}} are drawn from 𝒩(𝟎,𝐈K)\mathcal{N}(\bm{0},{\mathbf{I}}_{K}). We let p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5} and T=[2h],forh{7,7.5,8,,9.5}T=[2^{h}],\ \text{for}\ h\in\{7,7.5,8,\ldots,9.5\}. Figures 8-9 report the averaged (over Monte Carlo simulations) estimation errors (in the logarithmic scale, base 2) for 𝚯{\bm{\Theta}} and two portfolio weights (GMV and MRC) using FGL and Robust FGL for ν=4.2\nu=4.2. Noticeably, the performance of FGL for estimating the precision matrix is comparable with that of Robust FGL: this suggests that our FGL algorithm is insensitive to heavy-tailed distributions even without additional modifications. Furthermore, FGL outperforms its Robust counterpart in terms of estimating portfolio weights, as evidenced by Figure 9. We further compare the performance of FGL and Robust FGL for different degrees of freedom: Figure 10 reports the log-ratios (base 2) of the averaged (over Monte Carlo simulations) estimation errors for ν=4.2\nu=4.2, ν=7\nu=7 and ν=\nu=\infty. The results for the estimation of 𝚯{\bm{\Theta}} presented in Figure 10 are consistent with the findings in Fan et al., (2018): Robust FGL outperforms the non-robust counterpart for thicker tails.
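
A sketch of this DGP is given below; it reflects our reading of the design (in particular, rescaling the scale matrix by (ν−2)/ν so that the draws have covariance 𝚺 is our choice), and the function name is a placeholder.

```python
import numpy as np
from scipy.linalg import toeplitz

def simulate_elliptical_returns(T, p, K, nu=4.2, rho=0.5, seed=0):
    """(f_t, eps_t) jointly multivariate t(nu) with covariance diag(I_K, Sigma_eps),
    Sigma_eps Toeplitz(rho); returns r_t = B f_t + eps_t with rows of B from N(0, I_K)."""
    rng = np.random.default_rng(seed)
    Sigma_eps = toeplitz(rho ** np.arange(p))
    Sigma = np.block([[np.eye(K), np.zeros((K, p))],
                      [np.zeros((p, K)), Sigma_eps]])

    # multivariate t with covariance Sigma: rescale the scale matrix by (nu - 2) / nu
    L = np.linalg.cholesky(Sigma * (nu - 2) / nu)
    Z = rng.standard_normal((T, K + p)) @ L.T
    W = rng.chisquare(nu, size=(T, 1))
    FE = Z / np.sqrt(W / nu)

    F, E = FE[:, :K], FE[:, K:]
    B = rng.standard_normal((p, K))
    R = F @ B.T + E                      # (T, p) matrix of simulated returns
    return R, F, B, E
```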

Figure 8: Averaged errors of the estimators of 𝚯{\bm{\Theta}} on logarithmic scale: p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5}, ν=4.2\nu=4.2.
Figure 9: Averaged errors of the estimators of 𝐰GMV{\mathbf{w}}_{\text{GMV}} (left) and 𝐰MRC{\mathbf{w}}_{\text{MRC}} (right) on logarithmic scale: p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5}, ν=4.2\nu=4.2.
Figure 10: Log ratios (base 2) of the averaged errors of the FGL and the Robust FGL estimators of 𝚯{\bm{\Theta}}: log2(|𝚯^𝚯|2|𝚯^R𝚯|2)\log_{2}\Big{(}\frac{{\left|\kern-0.67812pt\left|\kern-0.67812pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-0.67812pt\right|\kern-0.67812pt\right|}_{2}}{{\left|\kern-0.67812pt\left|\kern-0.67812pt\left|\widehat{{\bm{\Theta}}}_{\text{R}}-{\bm{\Theta}}\right|\kern-0.67812pt\right|\kern-0.67812pt\right|}_{2}}\Big{)} (left), log2(|𝚯^𝚯|1|𝚯^R𝚯|1)\log_{2}\Big{(}\frac{{\left|\kern-0.67812pt\left|\kern-0.67812pt\left|\widehat{{\bm{\Theta}}}-{\bm{\Theta}}\right|\kern-0.67812pt\right|\kern-0.67812pt\right|}_{1}}{{\left|\kern-0.67812pt\left|\kern-0.67812pt\left|\widehat{{\bm{\Theta}}}_{\text{R}}-{\bm{\Theta}}\right|\kern-0.67812pt\right|\kern-0.67812pt\right|}_{1}}\Big{)} (right): p=T0.85p=T^{0.85}, K=2(logT)0.5K=2(\log T)^{0.5}.

C.4 Relaxing Pervasiveness Assumption

As pointed out by Onatski, (2013), the data on 100 industrial portfolios shows that there are no large gaps between eigenvalues ii and i+1i+1 of the sample covariance matrix except for i=1i=1. However, as is commonly believed, such data contains at least three factors. Therefore, the factor pervasiveness assumption suggests the existence of a large gap at i=3i=3. In order to examine the sensitivity of the portfolios to the pervasiveness assumption and quantify the degree of pervasiveness, we use the same DGP as in (5.2)-(5.3), but with σε,ij=ρ|ij|\sigma_{\varepsilon,ij}=\rho^{\left\lvert i-j\right\rvert} and K=3K=3. We consider ρ{0.4,0.5,0.6,0.7,0.8,0.9}\rho\in\{0.4,0.5,0.6,0.7,0.8,0.9\}, which corresponds to λ3/λ4{3.1,2.7,2.6,2.2,1.5,1.1}\lambda_{3}/\lambda_{4}\in\{3.1,2.7,2.6,2.2,1.5,1.1\}. In other words, as ρ\rho increases, the systematic-idiosyncratic gap measured by λ^3/λ^4\hat{\lambda}_{3}/\hat{\lambda}_{4} decreases. Tables 3-4 report the mean quality of the estimators for portfolio weights and risk over 100 replications for T=300T=300 and p{300,400}p\in\{300,400\}. The sample size and the number of assets are chosen to closely match the values from the empirical application. POET and Projected POET are the most sensitive to a reduction in the gap between the leading and bounded eigenvalues, which is evident from a dramatic deterioration in the quality of these estimators. The remaining methods, including FGL, exhibit robust performance. Since the behavior of the estimators for portfolio weights is similar to that of the estimators of the precision matrix, we only report the former for ease of presentation. For (T,p)=(300,300)(T,p)=(300,300), FClime shows the best performance followed by FGL and FLW, whereas for (T,p)=(300,400)(T,p)=(300,400) FGL takes the lead. Interestingly, despite the inferior performance of POET and Projected POET in terms of estimating portfolio weights, the risk exposure of the portfolios based on these estimators is competitive with that of the other approaches.
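
The mapping from ρ to the systematic-idiosyncratic gap can be illustrated with the sketch below, which computes λ3/λ4 for a covariance matrix with K = 3 factors and a Toeplitz idiosyncratic block; since the loadings here are an arbitrary draw rather than the exact DGP (5.2)-(5.3), the ratios will differ somewhat from those reported in Tables 3-4.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
p, K = 300, 3
B = rng.standard_normal((p, K))                     # illustrative loadings, not the exact DGP (5.2)-(5.3)

for rho in [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    Sigma_eps = toeplitz(rho ** np.arange(p))       # sigma_{eps,ij} = rho^{|i-j|}
    Sigma = B @ B.T + Sigma_eps                     # covariance with K = 3 factors
    lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    print("rho=%.1f  lambda_3/lambda_4 = %.2f" % (rho, lam[2] / lam[3]))
```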

Table 3: Sensitivity of portfolio weights and risk exposure when the gap between the diverging and bounded eigenvalues decreases: (T,p)=(300,300)(T,p)=(300,300).
ρ=0.4\rho=0.4 ρ=0.5\rho=0.5 ρ=0.6\rho=0.6 ρ=0.7\rho=0.7 ρ=0.8\rho=0.8 ρ=0.9\rho=0.9
(λ3/λ4=3.1\lambda_{3}/\lambda_{4}=3.1) (λ3/λ4=2.7\lambda_{3}/\lambda_{4}=2.7) (λ3/λ4=2.6\lambda_{3}/\lambda_{4}=2.6) (λ3/λ4=2.2\lambda_{3}/\lambda_{4}=2.2) (λ3/λ4=1.5\lambda_{3}/\lambda_{4}=1.5) (λ3/λ4=1.1\lambda_{3}/\lambda_{4}=1.1)
𝐰^GMV𝐰GMV1\left\lVert\widehat{{\mathbf{w}}}_{\text{GMV}}-{\mathbf{w}}_{\text{GMV}}\right\rVert_{1}
FGL 2.3198 2.3465 2.5177 2.4504 2.5010 2.7319
FClime 1.9554 1.9359 1.9795 1.9103 1.9813 1.9948
FLW 2.3445 2.3948 2.5328 2.4715 2.5918 3.0515
FNLW 2.2381 2.3009 2.3293 2.5497 2.9039 3.1980
POET 47.6746 82.1873 43.9722 54.1131 157.6963 235.8119
Projected POET 9.6335 7.8669 10.1546 10.6205 12.1795 15.2581
|Φ^GMVΦGMV|\left\lvert\widehat{\Phi}_{\text{GMV}}-\Phi_{\text{GMV}}\right\rvert
FGL 0.0033 0.0032 0.0034 0.0027 0.0021 0.0023
FClime 0.0012 0.0012 0.0012 0.0011 0.0010 0.0010
FLW 0.0049 0.0052 0.0061 0.0056 0.0049 0.0059
FNLW 0.0055 0.0060 0.0054 0.0052 0.0066 0.0057
POET 0.0070 0.0122 0.0058 0.0063 0.0103 0.0160
Projected POET 0.0021 0.0022 0.0019 0.0019 0.0018 0.0026
𝐰^MWC𝐰MWC1\left\lVert\widehat{{\mathbf{w}}}_{\text{MWC}}-{\mathbf{w}}_{\text{MWC}}\right\rVert_{1}
FGL 2.3766 2.4108 2.7411 2.6094 2.5669 3.4633
FClime 2.0502 2.0279 2.2901 2.1400 2.1028 3.0737
FLW 2.4694 2.5132 2.8902 2.7315 2.7210 4.0248
FNLW 2.7268 2.3060 2.8984 3.5902 2.9232 3.2076
POET 49.8603 34.2024 469.3605 108.1529 74.8016 99.4561
Projected POET 9.0261 7.4028 8.1899 9.4806 11.9642 13.3890
|Φ^MWCΦMWC|\left\lvert\widehat{\Phi}_{\text{MWC}}-\Phi_{\text{MWC}}\right\rvert
FGL 0.0033 0.0032 0.0034 0.0027 0.0021 0.0024
FClime 0.0012 0.0012 0.0013 0.0011 0.0010 0.0009
FLW 0.0050 0.0053 0.0062 0.0057 0.0050 0.0059
FNLW 0.0055 0.0060 0.0055 0.0053 0.0066 0.0057
POET 0.0068 0.0047 0.0363 0.0092 0.0060 0.0056
Projected POET 0.0022 0.0022 0.0020 0.0020 0.0018 0.0027
𝐰^MRC𝐰MRC1\left\lVert\widehat{{\mathbf{w}}}_{\text{MRC}}-{\mathbf{w}}_{\text{MRC}}\right\rVert_{1}
FGL 0.4872 0.1793 1.0044 0.6332 1.4568 2.3353
FClime 0.5160 0.2148 1.0188 0.6694 1.4855 2.3519
FLW 0.5333 0.2279 1.0345 0.6734 1.4904 2.3691
FNLW 0.8365 1.1285 1.1181 1.4419 1.7694 2.4612
POET NaN NaN NaN NaN NaN NaN
Projected POET 0.7414 0.6383 1.6686 1.8013 2.3297 3.2791
|Φ^MRCΦMRC|\left\lvert\widehat{\Phi}_{\text{MRC}}-\Phi_{\text{MRC}}\right\rvert
FGL 0.0004 0.0003 0.0025 0.0007 0.0021 0.0071
FClime 0.0005 0.0003 0.0024 0.0004 0.0016 0.0062
FLW 0.0002 0.0002 0.0021 0.0003 0.0018 0.0066
FNLW 0.0062 0.0062 0.0069 0.0119 0.0059 0.0143
POET NaN NaN NaN NaN NaN NaN
Projected POET 0.0003 0.0003 0.0027 0.0031 0.0069 0.0062
Table 4: Sensitivity of portfolio weights and risk exposure when the gap between the diverging and bounded eigenvalues decreases: (T,p)=(300,400)(T,p)=(300,400).
ρ=0.4\rho=0.4 ρ=0.5\rho=0.5 ρ=0.6\rho=0.6 ρ=0.7\rho=0.7 ρ=0.8\rho=0.8 ρ=0.9\rho=0.9
(λ3/λ4=3.1\lambda_{3}/\lambda_{4}=3.1) (λ3/λ4=2.7\lambda_{3}/\lambda_{4}=2.7) (λ3/λ4=2.6\lambda_{3}/\lambda_{4}=2.6) (λ3/λ4=2.2\lambda_{3}/\lambda_{4}=2.2) (λ3/λ4=1.5\lambda_{3}/\lambda_{4}=1.5) (λ3/λ4=1.1\lambda_{3}/\lambda_{4}=1.1)
𝐰^GMV𝐰GMV1\left\lVert\widehat{{\mathbf{w}}}_{\text{GMV}}-{\mathbf{w}}_{\text{GMV}}\right\rVert_{1}
FGL 1.6900 1.8134 1.8577 1.8839 1.9843 2.0692
FClime 1.9073 1.9524 1.9997 1.9490 1.9898 2.0330
FLW 2.0239 2.0945 2.1195 2.1235 2.2473 2.4745
FNLW 2.0316 2.0790 2.1927 2.2503 2.4143 2.4710
POET 18.7934 28.0493 155.8479 32.4197 41.8098 71.5811
Projected POET 7.8696 8.4915 8.8641 10.7522 11.2092 19.0424
|Φ^GMVΦGMV|\left\lvert\widehat{\Phi}_{\text{GMV}}-\Phi_{\text{GMV}}\right\rvert
FGL 8.62E-04 9.22E-04 7.23E-04 7.31E-04 6.83E-04 5.73E-04
FClime 8.40E-04 8.27E-04 8.02E-04 7.87E-04 7.36E-04 6.71E-04
FLW 1.59E-03 1.73E-03 1.57E-03 1.68E-03 1.69E-03 1.54E-03
FNLW 2.24E-03 2.10E-03 1.83E-03 1.88E-03 2.07E-03 1.29E-03
POET 1.11E-03 1.46E-03 3.59E-03 1.27E-03 1.88E-03 2.51E-03
Projected POET 8.97E-04 8.80E-04 6.83E-04 6.79E-04 7.98E-04 6.55E-04
𝐰^MWC𝐰MWC1\left\lVert\widehat{{\mathbf{w}}}_{\text{MWC}}-{\mathbf{w}}_{\text{MWC}}\right\rVert_{1}
FGL 1.9034 2.2843 1.9118 3.2569 2.7055 2.8812
FClime 2.1193 2.4024 2.0540 3.3487 2.7277 2.8593
FLW 2.2573 2.5809 2.1790 3.5728 3.0072 3.3164
FNLW 2.3207 3.3335 3.5518 3.4282 2.6446 4.8827
POET 15.8824 100.1419 56.9827 33.6483 38.8961 103.0434
Projected POET 6.5386 7.2169 7.8583 9.7342 12.1420 17.7368
|Φ^MWCΦMWC|\left\lvert\widehat{\Phi}_{\text{MWC}}-\Phi_{\text{MWC}}\right\rvert
FGL 8.72E-04 9.41E-04 7.26E-04 7.99E-04 7.12E-04 6.08E-04
FClime 8.52E-04 8.49E-04 8.06E-04 8.32E-04 7.50E-04 6.86E-04
FLW 1.59E-03 1.74E-03 1.57E-03 1.71E-03 1.70E-03 1.56E-03
FNLW 2.25E-03 2.22E-03 1.89E-03 1.91E-03 2.08E-03 1.56E-03
POET 1.14E-03 4.91E-03 1.78E-03 1.45E-03 1.57E-03 2.93E-03
Projected POET 9.19E-04 9.20E-04 7.11E-04 7.04E-04 8.26E-04 6.78E-04
𝐰^MRC𝐰MRC1\left\lVert\widehat{{\mathbf{w}}}_{\text{MRC}}-{\mathbf{w}}_{\text{MRC}}\right\rVert_{1}
FGL 0.6683 0.7390 1.3103 1.5195 1.7124 3.0935
FClime 0.6903 0.7635 1.3238 1.5403 1.7415 3.1180
FLW 0.7132 0.7828 1.3430 1.5549 1.7517 3.1364
FNLW 0.4909 1.2121 1.4974 1.1996 1.8020 3.2989
POET NaN NaN NaN NaN NaN NaN
Projected POET 1.6851 1.4434 1.9628 2.6182 2.7716 4.1753
|Φ^MRCΦMRC|\left\lvert\widehat{\Phi}_{\text{MRC}}-\Phi_{\text{MRC}}\right\rvert
FGL 1.02E-03 9.73E-04 4.63E-03 4.49E-03 3.23E-03 8.73E-03
FClime 1.14E-03 1.01E-03 4.55E-03 4.22E-03 2.70E-03 7.72E-03
FLW 6.62E-04 5.54E-04 4.19E-03 4.01E-03 2.71E-03 8.11E-03
FNLW 2.73E-04 6.93E-03 5.11E-03 1.93E-03 6.42E-03 2.98E-02
POET NaN NaN NaN NaN NaN NaN
Projected POET 3.59E-03 1.20E-03 1.49E-03 2.58E-03 7.86E-03 1.39E-02

Appendix D Additional Empirical Results

This Appendix contains the description of the procedure used to estimate unknown factors and loadings using PCA (Appendix D.1), and additional empirical results: portfolio performance for monthly data (Appendix D.2), robustness of FGL to different training periods (Appendix D.3) and to different target risk and return (Appendix D.4), and subperiod analyses for the MWC and GMV portfolios (Appendix D.5).

D.1 Estimating Unknown Factors and Loadings

Remark 1.

In practice, the number of common factors, KK, is unknown and needs to be estimated. One of the standard and commonly used approaches is to determine KK in a data-driven way (Bai and Ng, (2002); Kapetanios, (2010)); for example, Fan et al., (2013) adopt the approach of Bai and Ng, (2002). However, all of the aforementioned papers deal with a fixed number of factors, so we need to adopt a different criterion since KK is allowed to grow in our setup. For this reason, we use the methodology of Li et al., (2017): let 𝐛i,K{\mathbf{b}}_{i,K} and 𝐟t,K{\mathbf{f}}_{t,K} denote the K×1K\times 1 vectors of loadings and factors when KK needs to be estimated, and let 𝐁K{\mathbf{B}}_{K} be the p×Kp\times K matrix of stacked 𝐛i,K{\mathbf{b}}_{i,K}. Define

V(K)=\min_{{\mathbf{B}}_{K},{\mathbf{F}}_{K}}\frac{1}{pT}\sum_{i=1}^{p}\sum_{t=1}^{T}\Big{(}r_{it}-\frac{1}{\sqrt{K}}{\mathbf{b}}^{\prime}_{i,K}{\mathbf{f}}_{t,K}\Big{)}^{2}, (D.1)

where the minimum is taken over 1KKmax1\leq K\leq K_{\textup{max}}, subject to normalization 𝐁K𝐁K/p=𝐈K{\mathbf{B}}^{\prime}_{K}{\mathbf{B}}_{K}/p={\mathbf{I}}_{K}. Hence, 𝐅¯K=K𝐑𝐁K/p\bar{{\mathbf{F}}}^{\prime}_{K}=\sqrt{K}{\mathbf{R}}^{\prime}{\mathbf{B}}_{K}/p. Define 𝐅^K=𝐅¯K(𝐅¯K𝐅¯K/T)1/2\widehat{{\mathbf{F}}}^{\prime}_{K}=\bar{{\mathbf{F}}}^{\prime}_{K}(\bar{{\mathbf{F}}}_{K}\bar{{\mathbf{F}}}^{\prime}_{K}/T)^{1/2}, which is a rescaled estimator of the factors that is used to determine the number of factors when KK grows with the sample size. We then apply the following procedure described in Li et al., (2017) to estimate KK:

\widehat{K}=\arg\!\min_{1\leq K\leq K_{\textup{max}}}\ln(V(K,\hat{{\mathbf{F}}}_{K}))+Kg(p,T), (D.2)

where 1KKmax=o(min{p1/17,T1/16})1\leq K\leq K_{\textup{max}}=o(\min\{p^{1/17},T^{1/16}\}) and g(p,T)g(p,T) is a penalty function of (p,T)(p,T) such that (i) Kmaxg(p,T)0K_{\textup{max}}\cdot g(p,T)\rightarrow 0 and (ii) Cp,T,Kmax1g(p,T)C_{p,T,K_{\textup{max}}}^{-1}\cdot g(p,T)\rightarrow\infty with Cp,T,Kmax=𝒪P(max[Kmax3p,Kmax5/2T])C_{p,T,K_{\textup{max}}}=\mathcal{O}_{P}\Big{(}\max\Big{[}\frac{K^{3}_{\textup{max}}}{\sqrt{p}},\frac{K^{5/2}_{\textup{max}}}{\sqrt{T}}\Big{]}\Big{)}. The choice of the penalty function is similar to Bai and Ng, (2002). Throughout the paper we let K^\widehat{K} be the solution to (D.2).
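A minimal sketch of this selection rule is given below (in Python), assuming the returns are stored in a T×p array and that V(K) is evaluated at the standard PCA solution; the penalty g(p,T) shown at the end is only an illustrative assumption in the spirit of Bai and Ng, (2002) (conditions (i)-(ii) above restrict the admissible choices), and all names are ours.

import numpy as np

def estimate_num_factors(R, K_max, g):
    # Information criterion in (D.2): pick K minimizing ln V(K) + K * g(p, T),
    # where V(K) is the in-sample mean squared residual of the K-factor PCA fit.
    T, p = R.shape
    eigval, eigvec = np.linalg.eigh(R.T @ R)          # eigenvalues in ascending order
    best_K, best_ic = 1, np.inf
    for K in range(1, K_max + 1):
        B_K = np.sqrt(p) * eigvec[:, -K:]             # loadings with B_K' B_K / p = I_K
        F_bar = np.sqrt(K) * R @ B_K / p              # factors, as in the text
        fitted = F_bar @ B_K.T / np.sqrt(K)           # r_it ~ (1/sqrt(K)) b_i' f_t
        V_K = np.mean((R - fitted) ** 2)
        ic = np.log(V_K) + K * g(p, T)
        if ic < best_ic:
            best_K, best_ic = K, ic
    return best_K

# One possible penalty (an illustrative assumption, in the spirit of Bai and Ng, 2002):
g = lambda p, T: (p + T) / (p * T) * np.log(p * T / (p + T))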

D.2 Monthly Data

Similarly to the daily data, we use monthly returns of the components of the S&P500. The data are fetched from CRSP and Compustat using the SAS interface. The full sample for the monthly data has 480 observations on 355 stocks from January 1, 1980 to December 1, 2019. We use January 1, 1980 - December 1, 1994 (180 obs) as the training (estimation) period and January 1, 1995 - December 1, 2019 (300 obs) as the out-of-sample test period. At the end of each month, prior to portfolio construction, we remove stocks with fewer than 15 years of historical return data. We set the return target μ=0.7974%\mu=0.7974\%, which is equivalent to a 10%10\% yearly return when compounded. The target level of risk for the weight-constrained and risk-constrained Markowitz portfolios (MWC and MRC) is set at σ=0.05\sigma=0.05, which is the standard deviation of the monthly excess returns of the S&P500 index in the first training set. Transaction costs are taken to be the same as for the daily returns in Section 6.

Table 5 reports the results for monthly data. Some comments are in order: (1) interestingly, MRC produces a portfolio return and Sharpe Ratio that are mostly higher than those for the weight-constrained allocations MWC and GMV. This means that relaxing the constraint that portfolio weights sum up to one leads to a large increase in the out-of-sample Sharpe Ratio and portfolio return, a finding that has not been previously well-studied in the empirical finance literature. (2) Similarly to the results from Table 1, FGL outperforms the competitors, including EW and Index, in terms of the out-of-sample Sharpe Ratio and turnover. (3) Similarly to the results in Table 1, the observable Fama-French factors produce FGL portfolios with higher return and higher out-of-sample Sharpe Ratio compared to the FGL portfolios based on statistical factors. Again, this increase in return is not accompanied by higher risk. (4) To further verify that the shrinkage is functioning as desired and that the estimated 𝚯ε{\bm{\Theta}}_{\varepsilon} is indeed sparse, we include several visualizations. Figure 11 reports the optimally tuned values of λ\lambda (please refer to Section 3 of the main manuscript for a discussion of choosing the optimal shrinkage intensity) over the estimation period. Figure 12 plots the proportion of zero elements in the precision matrix of the idiosyncratic part, 𝚯^ε\widehat{{\bm{\Theta}}}_{\varepsilon}, corresponding to the optimal values of λ=λ^\lambda=\hat{\lambda} and several fixed values of λ\lambda for monthly data over the testing period. Extracting the common factors significantly reduces the partial correlations of the error terms, rendering 𝚯^ε\widehat{{\bm{\Theta}}}_{\varepsilon} sparse over the testing period: the proportion of zeros for the optimally tuned λ\lambda ranges from 74.5% to 98.8%. Figure 13 plots the Sharpe Ratio of GMV portfolios for a set of fixed values of λ{0.005,0.01,0.05,0.08,0.1,0.12,0.15,0.17,0.2,0.25,0.3,0.4,0.5}\lambda\in\{0.005,0.01,0.05,0.08,0.1,0.12,0.15,0.17,0.2,0.25,0.3,0.4,0.5\}. In other words, instead of using the optimally tuned λ\lambda, we fix its value throughout the whole testing period and report the corresponding SR of such portfolios. For comparison, the SR that corresponds to the optimally tuned λ\lambda is equal to 0.2023, which is significantly higher than the SR achieved for any fixed λ\lambda, confirming the importance of selecting the shrinkage intensity optimally.

We would like to emphasize that the selection of the tuning parameter is critically important in the literature on graphical models, which is why we build our tuning methodology on the Bayesian Information Criterion (BIC), as used and described in Koike, (2020); Bishop, (2006); Pourahmadi, (2013); Janková and van de Geer, (2018), among others (the detailed treatment relevant to our paper can be found on p.13 of the main manuscript). The SR advantage obtained with the optimally tuned λ\lambda highlights the importance of tuning and demonstrates that λ\lambda changes over time; hence, using a fixed value is expected to produce suboptimal performance.

We now elaborate on the discrepancy between the SR obtained with the optimal versus a fixed λ\lambda. Note that Figure 11 should not be compared with Figure 13: in contrast to Figure 13, the range of λ\lambda in Figure 11 is selected optimally by minimizing the BIC. In other words, the SR is not the objective function that we use for selecting the tuning parameter. To demonstrate the relevant range of λ\lambda selected by the BIC, we include Figure 14, which shows the optimally selected λ\lambda for six different rolling windows.
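To make the tuning step concrete, the sketch below selects λ by minimizing a BIC over a grid and reports the implied sparsity of the estimated idiosyncratic precision matrix (as plotted in Figure 12). It uses scikit-learn's graphical_lasso as a stand-in for the estimator; the exact BIC variant and the degrees-of-freedom count are illustrative assumptions, and the idiosyncratic residuals are assumed to be available after removing the estimated factors.

import numpy as np
from sklearn.covariance import graphical_lasso

def bic_tuned_glasso(residuals, grid):
    # residuals: T x p matrix of idiosyncratic errors; grid: candidate shrinkage intensities.
    T, p = residuals.shape
    S = np.cov(residuals, rowvar=False)
    best_lam, best_bic, best_Theta = None, np.inf, None
    for lam in grid:
        _, Theta = graphical_lasso(S, alpha=lam)
        loglik = T / 2.0 * (np.linalg.slogdet(Theta)[1] - np.trace(S @ Theta))
        df = np.count_nonzero(np.triu(Theta, k=1))     # nonzero partial correlations
        bic = -2.0 * loglik + np.log(T) * df
        if bic < best_bic:
            best_lam, best_bic, best_Theta = lam, bic, Theta
    # Proportion of zeros in the lower triangle of the selected Theta (diagonal excluded)
    sparsity = 1.0 - np.count_nonzero(np.tril(best_Theta, k=-1)) / (p * (p - 1) / 2)
    return best_lam, best_Theta, sparsity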

Figure 11: Optimally tuned values of λ\lambda over the testing period.
Figure 12: Proportion of zero elements in 𝚯^ε\widehat{{\bm{\Theta}}}_{\varepsilon} with respect to the total number of elements in a lower-triangular part of 𝚯^ε\widehat{{\bm{\Theta}}}_{\varepsilon} (diagonals are excluded) corresponding to the optimal values of λ=λ^\lambda=\hat{\lambda}, and several fixed values of λ\lambda.
Figure 13: Sharpe Ratios for GMV portfolios associated with fixed λ\lambda.
Figure 14: BICs for several rolling windows indexed by t=1,,300t=1,\ldots,300.
Table 5: Monthly portfolio returns, risk, SR and turnover. In the upper part, corresponding to the results without transaction costs, p-values are in parentheses. In the lower part, corresponding to the results with transaction costs, ∗∗∗ indicates p-value < 0.01, ∗∗ indicates p-value < 0.05, and ∗ indicates p-value < 0.10. In-sample: January 1, 1980 - December 31, 1994 (180 obs), Out-of-sample: January 1, 1995 - December 31, 2019 (300 obs).
Markowitz Risk-Constrained Markowitz Weight-Constrained Global Minimum-Variance
Return Risk SR Turnover Return Risk SR Turnover Return Risk SR Turnover
Without TC
EW 0.0081 0.0519 0.1553 - 0.0081 0.0519 0.1553 - 0.0081 0.0519 0.1553 -
Index 0.0063 0.0453 0.1389 - 0.0063 0.0453 0.1389 - 0.0063 0.0453 0.1389 -
FGL 0.0256 0.0828 0.3099 (0.0799) - 0.0059 0.0329 0.1804 (0.0430) - 0.0065 0.0321 0.2023 (0.046) -
FClime 0.0372 0.2337 0.1593 (0.2715) - 0.0067 0.0471 0.1434 (0.0791) - 0.0076 0.0466 0.1643 (0.047) -
FLW 0.0296 0.1049 0.2817 (0.0879) - 0.0059 0.0353 0.1662 (0.0791) - 0.0063 0.0353 0.1774 (0.047) -
FNLW 0.0264 0.0925 0.2853 (0.0879) - 0.0060 0.0333 0.1793 (0.0430) - 0.0064 0.0332 0.1930 (0.046) -
POET NaN NaN NaN - -0.1041 2.0105 -0.0518 (0.9925) - 0.5984 11.0064 0.0544 (0.6344) -
Projected POET 0.0583 0.3300 0.1766 (0.2715) - 0.0058 0.0546 0.1056 (0.0791) - 0.0069 0.0612 0.1128 (0.2693) -
FGL (FF1) 0.0275 0.0800 0.3433 (0.0659) - 0.0061 0.0316 0.1941 (0.0415) - 0.0073 0.0302 0.2427 (0.035) -
FGL (FF3) 0.0274 0.0797 0.3437 (0.0659) - 0.0061 0.0314 0.1955 (0.0415) - 0.0073 0.0300 0.2440 (0.035) -
FGL (FF5) 0.0273 0.0793 0.3443 (0.0659) - 0.0061 0.0314 0.1943 (0.0415) - 0.0073 0.0300 0.2426 (0.035) -
FF1 0.0403 0.2250 0.1789 (0.2715) - 0.0025 0.0548 0.0452 (0.9318) - 0.0043 0.0546 0.0781 (0.6344) -
FF3 0.0389 0.2022 0.1926 (0.2715) - 0.0032 0.0528 0.0610 (0.9318) - 0.0047 0.0517 0.0915 (0.6344) -
FF5 0.0354 0.1803 0.1962 (0.2715) - 0.0036 0.0531 0.0670 (0.9318) - 0.0048 0.0513 0.0945 (0.6344) -
With TC
EW 0.0080 0.0520 0.1538 0.0630 0.0080 0.0520 0.1538 0.0630 0.0080 0.0520 0.1538 0.0630
FGL 0.0222 0.0828 0.2682* 3.1202 0.0050 0.0329 0.1525* 0.8786 0.0056 0.0321 0.1740** 0.8570
FClime 0.0334 0.2334 0.1429 4.9174 0.0062 0.0471 0.1307 0.5945 0.0071 0.0466 0.1522* 0.5528
FLW 0.0237 0.1052 0.2257 5.5889 0.0043 0.0353 0.1231 1.5166 0.0048 0.0354 0.1343 1.5123
FNLW 0.0224 0.0927 0.2415* 3.7499 0.0049 0.0334 0.1463* 1.0812 0.0053 0.0333 0.1596* 1.0793
POET NaN NaN NaN NaN -0.1876 1.7274 -0.1086 152.3298 1.0287 14.2676 0.0721 354.6043
Projected POET 0.0166 0.2859 0.0579 69.7600 -0.0002 0.0540 -0.0044 5.9131 -0.0002 0.0613 -0.0027 7.0030
FGL (FF1) 0.0243 0.0800 0.3036* 2.8514 0.0054 0.0317 0.1692* 0.7513 0.0066 0.0302 0.2176** 0.7095
FGL (FF3) 0.0242 0.0797 0.3037* 2.8708 0.0054 0.0314 0.1703* 0.7545 0.0066 0.0300 0.2186** 0.7127
FGL (FF5) 0.0241 0.0793 0.3037* 2.8857 0.0053 0.0315 0.1686* 0.7630 0.0065 0.0300 0.2167** 0.7224
FF1 0.0169 0.2331 0.0767 23.3910 -0.0023 0.0545 -0.0415 4.6257 -0.0004 0.0543 -0.0079 4.5751
FF3 0.0185 0.2268 0.0924 20.6137 -0.0013 0.0524 -0.0243 4.3667 0.0003 0.0514 0.0059 4.2956
FF5 0.0164 0.2254 0.0918 18.5514 -0.0008 0.0527 -0.0145 4.2134 0.0005 0.0508 0.0108 4.1681

D.3 Portfolio Performance for Longer Training Periods

This section examines the performance of the methods when the training period is increased. Tables 6 and 7 report the results: the conclusions that we highlighted when analyzing Tables 5 and 1 continue to hold. We observed an interesting finding: for MRC portfolios (both monthly and daily), a longer training period changed the values of portfolio return and risk for all methods; however, their relative performance, as measured by the SR, remained unchanged. This is because MRC portfolios maximize the SR subject to either a target risk or a target return constraint:

\max_{{\mathbf{w}}}\frac{{\mathbf{m}}^{\prime}{\mathbf{w}}}{\sqrt{{\mathbf{w}}^{\prime}{\bm{\Sigma}}{\mathbf{w}}}}\quad\text{s.t.}\quad\text{(i)}\ {\mathbf{m}}^{\prime}{\mathbf{w}}\geq\mu\quad\text{or}\quad\text{(ii)}\ {\mathbf{w}}^{\prime}{\bm{\Sigma}}{\mathbf{w}}\leq\sigma^{2},

when μ=σ𝐦𝚯𝐦\mu=\sigma\sqrt{{\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}}, the solution under either constraint is given by 𝐰MRC=σ𝐦𝚯𝐦𝚯𝐦{\mathbf{w}}_{\text{MRC}}=\frac{\sigma}{\sqrt{{\mathbf{m}}^{\prime}{\bm{\Theta}}{\mathbf{m}}}}{\bm{\Theta}}{\mathbf{m}}. Hence, even though the training period was increased, the maximum achievable SR remained the same, since neither the target risk nor the target return was changed.
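The invariance is easy to verify numerically; a minimal sketch (with an arbitrary positive definite Σ and names of our choosing) is given below: changing the target risk scales the portfolio return and risk proportionally, so the SR stays at the value implied by m and Θ.

import numpy as np

def mrc_weights(m, Theta, sigma):
    # MRC solution: w = sigma / sqrt(m' Theta m) * Theta m
    Theta_m = Theta @ m
    return sigma / np.sqrt(m @ Theta_m) * Theta_m

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)       # arbitrary positive definite covariance
Theta = np.linalg.inv(Sigma)
m = rng.standard_normal(p)            # expected excess returns
for sigma in (0.02, 0.05):
    w = mrc_weights(m, Theta, sigma)
    sr = (m @ w) / np.sqrt(w @ Sigma @ w)
    print(sigma, round(sr, 6))        # identical SR for both target risk levels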

Table 6: Monthly portfolio returns, risk, SR and turnover. In the upper part, corresponding to the results without transaction costs, p-values are in parentheses. In the lower part, corresponding to the results with transaction costs, ∗∗∗ indicates p-value < 0.01, ∗∗ indicates p-value < 0.05, and ∗ indicates p-value < 0.10. In-sample: January 1, 1980 - December 31, 1999 (240 obs), Out-of-sample: January 1, 2000 - December 31, 2019 (240 obs).
Markowitz Risk-Constrained Markowitz Weight-Constrained Global Minimum-Variance
Return Risk SR Turnover Return Risk SR Turnover Return Risk SR Turnover
Without TC
EW 0.0061 0.0029 0.1126 - 0.0061 0.0029 0.1126 - 0.0061 0.0029 0.1126 -
Index 0.0032 0.0022 0.0692 - 0.0032 0.0022 0.0692 - 0.0032 0.0022 0.0692 -
FGL 0.0223 0.0805 0.2770 (0.0609) - 0.0054 0.0343 0.1581 (0.0939) - 0.0062 0.0333 0.1848 (0.1189) -
FClime 0.0339 0.2642 0.1285 (0.0989) - 0.0068 0.0510 0.1341 (0.1119) - 0.0072 0.0483 0.1482 (0.1419) -
FLW 0.0268 0.1030 0.2606 (0.0609) - 0.0047 0.0400 0.1173 (0.1119) - 0.0056 0.0396 0.1423 (0.1419) -
FNLW 0.0234 0.0887 0.2633 (0.0609) - 0.0050 0.0360 0.1397 (0.1119) - 0.0060 0.0354 0.1701 (0.1189) -
POET NaN NaN NaN - 0.2800 3.7119 0.0754 (0.2617) - -0.0239 0.6870 -0.0348 (0.7729) -
Projected POET -0.0371 0.8637 -0.0430 (0.9137) - 0.0032 0.0447 0.0708 (0.2617) - 0.0014 0.0823 0.0174 (0.7729) -
FGL(FF1) 0.0224 0.0775 0.2891 (0.0579) - 0.0056 0.0325 0.1721 (0.0839) - 0.0064 0.0307 0.2070 (0.0889) -
FGL(FF3) 0.0223 0.0772 0.2885 - 0.0056 0.0324 0.1728 (0.0839) - 0.0064 0.0307 0.2075 (0.0889) -
FGL(FF5) 0.0222 0.0769 0.2887 (0.0579) - 0.0056 0.0325 0.1719 (0.0839) - 0.0063 0.0307 0.2058 (0.0889) -
FF1 -0.0074 1.0260 -0.0072 (0.9512) - 0.0090 0.1158 0.0777 (0.2617) - 0.0099 0.1162 0.0853 (0.1738) -
FF3 0.0234 0.9301 0.0252 (0.9512) - 0.0073 0.1011 0.0726 (0.2617) - 0.0083 0.1040 0.0796 (0.1738) -
FF5 0.0111 0.7406 0.0149 (0.9512) - 0.0064 0.0906 0.0708 (0.2617) - 0.0056 0.0942 0.0590 (0.1738) -
With TC
EW 0.0060 0.0029 0.1109 0.0627 0.0060 0.0029 0.1109 0.0627 0.0060 0.0029 0.1109 0.0627
FGL 0.0190 0.1826 0.2351* 3.0398 0.0044 0.0344 0.1280 0.9940 0.0052 0.0334 0.1548 0.9519
FClime 0.0305 0.7518 0.1158 5.1354 0.0063 0.0510 0.1231 0.5606 0.0067 0.0484 0.1378 0.4962
FLW 0.0203 0.1992 0.1961* 6.3365 0.0028 0.0401 0.0703 1.8813 0.0038 0.0397 0.0952 1.8528
FNLW 0.0195 0.2199 0.2193* 3.6238 0.0038 0.0361 0.1066 1.1847 0.0049 0.0355 0.1367 1.1596
POET NaN NaN NaN NaN 0.1707 3.4886 0.0489 151.5813 -0.0846 0.5652 -0.1496 76.9912
Projected POET -0.0556 0.1885 -0.0757* 33.2465 -0.0014 0.0451 -0.0313 4.3028 -0.0031 0.0734 -0.0429 15.5429
FGL(FF1) 0.0194 0.2708 0.2497* 2.7486 0.0047 0.0326 0.1454 0.8299 0.0055 0.0308 0.1803 0.7703
FGL(FF3) 0.0192 0.1754 0.2485* 2.7682 0.0047 0.0325 0.1460 0.8343 0.0055 0.0307 0.1807 0.7754
FGL(FF5) 0.0192 0.1753 0.2488* 2.7807 0.0047 0.0325 0.1451 0.8377 0.0055 0.0307 0.1789 0.7815
FF1 -0.1143 0.1753 -0.1499 338.4639 -0.0039 0.1160 -0.0337 13.1081 -0.0041 0.1167 -0.0353 14.2580
FF3 -0.0763 0.3416 -0.1047 285.5053 -0.0038 0.1010 -0.0380 11.3194 -0.0042 0.1043 -0.0407 12.6905
FF5 -0.0547 0.3229 -0.0922 604.2117 -0.0038 0.0904 -0.0423 10.3659 -0.0052 0.0944 -0.0548 10.9466
Table 7: Daily portfolio returns, risk, SR and turnover. In the upper part, corresponding to the results without transaction costs, p-values are in parentheses. In the lower part, corresponding to the results with transaction costs, ∗∗∗ indicates p-value < 0.01, ∗∗ indicates p-value < 0.05, and ∗ indicates p-value < 0.10. In-sample: January 20, 2000 - January 25, 2005 (1260 obs), Out-of-sample: January 26, 2005 - January 31, 2020 (3780 obs).
Markowitz Risk-Constrained Markowitz Weight-Constrained Global Minimum-Variance
Return Risk SR Turnover Return Risk SR Turnover Return Risk SR Turnover
Without TC
EW 2.19E-04 1.98E-02 0.0111 - 2.19E-04 1.98E-02 0.0111 - 2.19E-04 1.98E-02 0.0111 -
Index 2.15E-04 1.16E-02 0.0185 - 2.15E-04 1.16E-02 0.0185 - 2.15E-04 1.16E-02 0.0185 -
FGL 8.86E-04 2.90E-02 0.0305 (0.0450) - 3.51E-04 7.07E-03 0.0496 (0.0020) - 3.51E-04 6.98E-03 0.0503 (0.0025) -
FClime 1.30E-03 8.36E-02 0.0156 (0.2513) - 2.41E-04 1.04E-02 0.0231 (0.0315) - 2.75E-04 1.10E-02 0.0250 (0.0415) -
FLW 4.24E-04 2.88E-02 0.0147 (0.2513) - 3.12E-04 7.06E-03 0.0443 (0.0025) - 3.15E-04 7.41E-03 0.0425 (0.0033) -
FNLW 3.20E-04 5.33E-02 0.0060 (0.6397) - 3.23E-04 7.01E-03 0.0461 (0.0020) - 3.49E-04 8.44E-03 0.0414 (0.0033) -
POET NaN NaN NaN - 5.39E-03 3.82E-01 0.0141 (0.1384) - -8.23E-05 9.49E-02 -0.0009 (0.9218) -
Projected POET 7.86E-04 7.74E-02 0.0101 (0.2513) - -1.70E-04 1.09E-02 -0.0156 (0.9713) - -1.78E-04 1.15E-02 -0.0155 (0.9218) -
FGL(FF1) 6.03E-04 3.56E-02 0.0169 (0.2513) - 3.58E-04 6.98E-03 0.0513 (0.0010) - 3.68E-04 7.02E-03 0.0523 (0.0025) -
FGL(FF3) 6.02E-04 3.56E-02 0.0169 (0.2513) - 3.58E-04 6.98E-03 0.0514 (0.0010) - 3.68E-04 7.02E-03 0.0524 (0.0025) -
FGL(FF5) 6.01E-04 3.56E-02 0.0169 (0.2513) - 3.57E-04 6.98E-03 0.0512 (0.0010) - 3.67E-04 7.02E-03 0.0522 (0.0025) -
FF1 6.13E-04 5.22E-02 0.0117 (0.2513) - 2.93E-04 7.23E-03 0.0405 (0.0032) - 2.99E-04 8.06E-03 0.0371 (0.0285) -
FF3 6.13E-04 5.22E-02 0.0117 (0.2513) - 2.93E-04 7.23E-03 0.0405 (0.0032) - 2.99E-04 8.06E-03 0.0371 (0.0285) -
FF5 6.13E-04 5.22E-02 0.0117 (0.2513) - 2.93E-04 7.23E-03 0.0405 (0.0032) - 2.99E-04 8.06E-03 0.0371 (0.0285) -
With TC
EW 1.87E-04 1.98E-02 0.0094 0.0294 1.87E-04 1.98E-02 0.0094 0.0294 1.87E-04 1.98E-02 0.0094 0.0294
FGL 5.07E-04 8.37E-02 0.0175 0.3845 2.64E-04 7.09E-03 0.0372*** 0.0882 2.66E-04 7.00E-03 0.038** 0.0863
FClime 4.36E-04 3.03E-01 0.0052 0.9279 2.10E-04 1.04E-02 0.0201* 0.0333 2.51E-04 1.10E-02 0.0228** 0.0266
FLW -1.26E-05 8.62E-02 -0.0004 0.4399 1.86E-04 7.09E-03 0.0263** 0.1267 1.91E-04 7.44E-03 0.0256** 0.1251
FNLW -4.74E-04 1.05E-01 -0.009 0.806 1.38E-04 7.07E-03 0.0195 0.1856 1.65E-04 8.48E-03 0.0194 0.1846
POET NaN NaN NaN NaN -6.70E-03 1.95E-01 -0.0344 25.1772 -7.54E-03 9.20E-02 -0.082 11.6087
Projected POET -3.51E-03 9.21E-02 -0.0485 10.9077 -6.81E-04 1.12E-02 -0.0610 0.5127 -7.37E-04 1.19E-02 -0.0621 0.5609
FGL(FF1) 2.02E-04 1.09E-01 0.0057 0.4028 2.68E-04 7.00E-03 0.0383*** 0.0899 2.80E-04 7.04E-03 0.0397** 0.0877
FGL(FF3) 2.02E-04 8.39E-02 0.0057 0.4028 2.68E-04 7.00E-03 0.0383*** 0.0901 2.80E-04 7.04E-03 0.0397** 0.0879
FGL(FF5) 1.99E-04 8.39E-02 0.0056 0.4032 2.67E-04 6.99E-03 0.0382*** 0.0901 2.79E-04 7.04E-03 0.0396** 0.088
FF1 -7.16E-04 8.39E-02 -0.0139 1.3523 1.61E-05 7.34E-03 0.0022 0.2748 2.35E-05 8.15E-03 0.0029 0.2736
FF3 -7.16E-04 9.03E-02 -0.0139 1.3523 1.61E-05 7.34E-03 0.0022 0.2748 2.35E-05 8.15E-03 0.0029 0.2736
FF5 -7.16E-04 9.03E-02 -0.0139 1.3523 1.61E-05 7.34E-03 0.0022 0.2748 2.35E-05 8.15E-03 0.0029 0.2736

D.4 Less Risk-Averse Investors

Tables 8 and 9 provide the empirical results for higher target levels of risk and return for both monthly and daily data: the target risk for monthly and daily data is set at σ=0.08\sigma=0.08 and σ=0.02\sigma=0.02, respectively, and the target return for monthly and daily data is set at 1.1715%1.1715\% and 0.0555%0.0555\%, respectively; both are equivalent to a 15%15\% yearly return when compounded. Since the GMV portfolio weights are not affected by the target risk and return, only updated results for MRC and MWC are reported. Furthermore, since the EW and Index portfolios are also not affected by the target risk and return, their values are the same as in Table 5 and, hence, are not reported to avoid repetition. The conclusions that we highlighted when analyzing Tables 5 and 1 continue to hold for these updated results.
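As a quick sanity check of the compounding convention behind these targets (assuming 12 months and 252 trading days per year):

annual = 0.15
monthly_target = (1 + annual) ** (1 / 12) - 1    # ~0.011715, i.e., 1.1715% per month
daily_target = (1 + annual) ** (1 / 252) - 1     # ~0.000555, i.e., 0.0555% per day
print(round(100 * monthly_target, 4), round(100 * daily_target, 4))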

Table 8: Monthly portfolio returns, risk, SR and turnover. Targeted risk is set at σ=0.08\sigma=0.08; the monthly targeted return is 1.1715%1.1715\%, which is equivalent to a 15%15\% yearly return when compounded. In the upper part, corresponding to the results without transaction costs, p-values are in parentheses. In the lower part, corresponding to the results with transaction costs, ∗∗∗ indicates p-value < 0.01, ∗∗ indicates p-value < 0.05, and ∗ indicates p-value < 0.10. In-sample: January 1, 1980 - December 31, 1994 (180 obs), Out-of-sample: January 1, 1995 - December 31, 2019 (300 obs).
Markowitz Risk-Constrained Markowitz Weight-Constrained
Return Risk SR Turnover Return Risk SR Turnover
Without TC
FGL 0.041 0.1324 0.3099 (0.0769) - 0.0069 0.0317 0.2187 (0.028) -
FClime 0.0596 0.3739 0.1593 (0.1272) - 0.0076 0.0441 0.1717 (0.034) -
FLW 0.0473 0.1679 0.2817 (0.0849) - 0.007 0.0344 0.2047 (0.028) -
FNLW 0.0422 0.148 0.2853 (0.0849) - 0.0071 0.0324 0.2190 (0.028) -
POET NaN NaN NaN - -0.1144 1.9928 -0.0574 (0.9471) -
Projected POET 0.0933 0.5281 0.1766 (0.1272) - 0.0075 0.051 0.1471 (0.0837) -
FGL(FF1) 0.0439 0.128 0.3433 (0.0649) - 0.0072 0.0303 0.2369 (0.0220) -
FGL(FF3) 0.0438 0.1275 0.3437 (0.0649) - 0.0072 0.0301 0.2385 (0.0220) -
FGL(FF5) 0.0437 0.1269 0.3443 (0.0649) - 0.0072 0.0301 0.2377 (0.0220) -
FF1 0.0644 0.36 0.1789 (0.1272) - 0.0038 0.0538 0.0706 (0.4833) -
FF3 0.0623 0.3235 0.1926 (0.1272) - 0.0045 0.0513 0.0869 (0.4833) -
FF5 0.0566 0.2885 0.1962 (0.1272) - 0.0047 0.0513 0.0908 (0.4833) -
With TC
FGL 0.0353 0.1792 0.2666* 5.2184 0.006 0.0317 0.1897** 0.8622
FClime 0.0528 3.7772 0.1422 10.133 0.007 0.0442 0.1577* 0.5971
FLW 0.0375 0.1881 0.223 9.5001 0.0055 0.0345 0.1606* 1.5019
FNLW 0.0355 0.2159 0.2393* 6.3769 0.006 0.0325 0.185** 1.0653
POET NaN NaN NaN NaN -0.1933 1.7451 -0.1108 124.9832
Projected POET 0.0313 0.1825 0.073 85.8766 0.0014 0.0505 0.0277 5.9556
FGL(FF1) 0.0386 0.2476 0.3018* 4.8017 0.0064 0.0303 0.2113** 0.7219
FGL(FF3) 0.0385 0.1738 0.3018* 4.8312 0.0064 0.0301 0.2127** 0.7245
FGL(FF5) 0.0383 0.1733 0.3018* 4.8537 0.0064 0.0302 0.2112** 0.7335
FF1 0.0244 0.1733 0.0707 64.7017 -0.0009 0.0535 -0.0162 4.5438
FF3 0.028 0.2331 0.0896 168.9642 4.04E-05 0.051 0.0008 4.2854
FF5 0.0237 0.2268 0.0836 34.1596 0.0004 0.0509 0.0077 4.1438
Table 9: Daily portfolio returns, risk, SR and turnover. Targeted risk is set at σ=0.02\sigma=0.02; the daily targeted return is 0.0555%0.0555\%, which is equivalent to a 15%15\% yearly return when compounded. In the upper part, corresponding to the results without transaction costs, p-values are in parentheses. In the lower part, corresponding to the results with transaction costs, ∗∗∗ indicates p-value < 0.01, ∗∗ indicates p-value < 0.05, and ∗ indicates p-value < 0.10. In-sample: January 20, 2000 - January 24, 2002 (504 obs), Out-of-sample: January 17, 2002 - January 31, 2020 (4536 obs).
Markowitz Risk-Constrained Markowitz Weight-Constrained
Return Risk SR Turnover Return Risk SR Turnover
Without TC
FGL 1.25E-03 4.09E-02 0.0305 (0.0709) - 3.10E-04 7.86E-03 0.0394 (0.0260) -
FClime 3.30E-03 1.30E-01 0.0254 (0.0814) - 2.20E-04 9.61E-03 0.0229 (0.036) -
FLW 6.68E-04 4.08E-02 0.0164 (0.1539) - 3.21E-04 9.36E-03 0.0343 (0.0280) -
FNLW 7.56E-04 1.02E-01 0.0074 (0.7312) - 3.02E-04 1.16E-02 0.0261 (0.0360) -
POET NaN NaN NaN - -5.17E-04 2.89E-01 -0.0018 (0.7419) -
Projected POET 1.84E-03 2.63E-01 0.0070 (0.7312) - -6.76E-05 1.58E-02 -0.0043 (0.7419) -
FGL(FF1) 1.24E-03 4.10E-02 0.0303 (0.0709) - 3.10E-04 7.56E-03 0.0410 (0.0260) -
FGL(FF3) 1.25E-03 4.09E-02 0.0306 (0.0709) - 3.15E-04 7.54E-03 0.0417 (0.0260) -
FGL(FF5) 1.24E-03 4.11E-02 0.0301 (0.0709) - 3.15E-04 7.52E-03 0.0419 (0.0260) -
FF1 1.14E-03 1.71E-01 0.0067 (0.7312) - 3.78E-05 1.64E-02 0.0023 (0.5813) -
FF3 1.16E-03 1.70E-01 0.0068 (0.7312) - 3.14E-05 1.64E-02 0.0019 (0.5813) -
FF5 1.17E-03 1.70E-01 0.0069 (0.7312) - 2.47E-05 1.64E-02 0.0015 (0.5813) -
With TC
FGL 6.14E-04 8.67E-02 0.0150 0.6385 2.43E-04 7.86E-03 0.0310** 0.0673
FClime 1.31E-03 6.49E-01 0.0101 2.4056 1.84E-04 9.61E-03 0.0191 0.0382
FLW -1.58E-04 9.69E-02 -0.0039 0.8283 2.01E-04 9.38E-03 0.0214** 0.1218
FNLW -4.50E-03 1.03E-01 -0.0422 10.5211 5.71E-05 1.17E-02 0.0049 0.2461
POET NaN NaN NaN NaN -2.50E-02 6.21E-01 -0.0403 113.1667
Projected POET -2.93E-02 1.15E-01 -0.0315 84.1090 -1.02E-03 1.65E-02 -0.0615 0.9502
FGL(FF1) 5.81E-04 1.43E-01 0.0141 0.6642 2.43E-04 7.57E-03 0.0321** 0.0681
FGL(FF3) 5.89E-04 8.51E-02 0.0144 0.6642 2.47E-04 7.55E-03 0.0327** 0.0685
FGL(FF5) 5.76E-04 8.50E-02 0.0140 0.6646 2.47E-04 7.53E-03 0.0328** 0.0687
FF1 -1.33E-02 8.49E-02 -0.0858 15.6900 -5.30E-04 1.66E-02 -0.0319 0.5790
FF3 -1.32E-02 1.28E-01 -0.0854 15.6211 -5.36E-04 1.66E-02 -0.0323 0.5785
FF5 -1.32E-02 1.28E-01 -0.0852 15.5866 -5.43E-04 1.66E-02 -0.0327 0.5786

D.5 Subperiod Analyses: MWC and GMV

Tables 10 and 11 report subperiod analyses for the MWC and GMV portfolio formulations. The values of the EW and Index portfolios are the same as in Table 2 and, hence, are not reported to avoid repetition. In terms of the relative comparison between the competing models, the conclusions are similar to those drawn when examining Table 2 in the main text. However, in terms of relative magnitude, all models that use MWC or GMV portfolios exhibit deteriorated performance in terms of CER and SR during economic downturns (Downturn #1 and Downturn #2): the MRC formulation from Table 2 is the only type of portfolio that produces a positive CER during both recessions.

Table 10: Cumulative excess return (CER) and risk of MWC portfolios using daily data. Targeted risk is set at σ=0.013\sigma=0.013, daily targeted return is 0.0378%0.0378\%. P-values are in parentheses. In-sample: January 20, 2000 - January 24, 2002 (504 obs), Out-of-sample: January 17, 2002 - January 31, 2020 (4536 obs).
FGL FClime FLW FNLW POET ProjPOET FGL(FF1) FGL(FF3) FGL(FF5) FF1 FF3 FF5
Downturn #1: Argentine Great Depression (2002)
CER -0.0138 -0.1045 -0.0158 -0.0195 -0.2820 -0.0217 -0.0153 -0.0176 -0.0187 -0.0334 -0.0334 -0.0334
Risk 0.0082 0.0124 0.0080 0.0078 0.0324 0.0130 0.0078 0.0078 0.0078 0.0095 0.0095 0.0095
SR -0.0031 -0.0314 (0.6753) -0.0045 (0.6753) -0.0069 (0.6753) -0.0265 (0.6753) -0.0007 (0.6753) -0.0044 (0.6753) -0.0057 (0.6753) -0.0063 (0.6753) -0.0194 (0.6414) -0.0194 (0.6414) -0.0194 (0.6414)
Downturn #2: Financial Crisis (2008)
CER -0.1956 -0.3974 -0.2789 -0.2811 -0.9989 -0.0842 -0.2107 -0.2074 -0.2053 -0.2669 -0.2669 -0.2669
Risk 0.0135 0.0204 0.0126 0.0123 0.1198 0.0176 0.0134 0.0134 0.0133 0.0183 0.0183 0.0183
SR 0.0135 (0.4555) 0.0204 (0.4715) 0.0126 (0.4715) 0.0123 (0.4715) 0.1198 (0.4715) 0.0176 (0.4715) 0.0134 (0.4645) 0.0134 (0.4685) 0.0133 (0.4715) 0.0113 (0.4486) 0.0113 (0.4486) 0.0183 (0.4486)
Boom #1 (2017)
CER 0.1398 0.1309 0.1267 -0.0361 0.5720 -0.0877 0.1406 0.1407 0.1419 -0.0361 -0.0349 -0.0676
Risk 0.0044 0.0041 0.0037 0.0087 0.0630 0.0089 0.0046 0.0046 0.0046 0.0070 0.0070 0.0070
SR 0.1194 (0.5814) 0.1227 (0.5884) 0.1308 (0.5814) -0.0124 (0.7644) 0.0510 (0.5218) -0.0367 (0.7644) 0.1151 (0.5814) 0.1154 (0.5814) 0.1177 (0.5814) -0.0173 (0.7644) -0.0165 (0.7644) -0.0361 (0.7644)
Boom #2 (2019)
CER 0.3787 0.2595 0.3018 0.4078 1.4756 0.5300 0.2492 0.2497 0.2506 0.3839 0.3845 0.3896
Risk 0.0085 0.0078 0.0072 0.0098 0.0403 0.0176 0.0063 0.0064 0.0064 0.0175 0.0175 0.0175
SR 0.1533 (0.5715) 0.1215 (0.5920) 0.1495 (0.5920) 0.1432 (0.5920) 0.1092 (0.6512) 0.1046 (0.6512) 0.1423 (0.5920) 0.1423 (0.5920) 0.1427 (0.5920) 0.0816 (0.8912) 0.0817 (0.8912) 0.0826 (0.8912)
Table 11: Cumulative excess return (CER) and risk of GMV portfolios using daily data. Targeted risk is set at σ=0.013\sigma=0.013, daily targeted return is 0.0378%0.0378\%. P-values are in parentheses. In-sample: January 20, 2000 - January 24, 2002 (504 obs), Out-of-sample: January 17, 2002 - January 31, 2020 (4536 obs).
FGL FClime FLW FNLW POET ProjPOET FGL(FF1) FGL(FF3) FGL(FF5) FF1 FF3 FF5
Downturn #1: Argentine Great Depression (2002)
CER -0.0044 -0.1061 -0.0151 -0.0206 -0.3190 -0.0662 -0.0038 -0.0059 -0.0076 -0.0335 -0.0335 -0.0335
Risk 0.0081 0.0129 0.0080 0.0078 0.0330 0.0135 0.0077 0.0077 0.0077 0.0096 0.0096 0.0096
SR 0.0017 (0.6543) -0.0306 (0.7564) -0.0041 (0.6583) -0.0075 (0.6583) -0.0322 (0.7564) -0.0148 (0.6583) 0.0017 (0.6543) 0.0006 (0.6543) -0.0004 (0.6543) -0.0193 (0.6583) -0.0193 (0.6583) -0.0193 (0.6583)
Downturn #2: Financial Crisis (2008)
CER -0.2113 -0.4410 -0.2926 -0.2959 -0.9928 0.0829 -0.2291 -0.2251 -0.2226 -0.2938 -0.2938 -0.2938
Risk 0.0138 0.0241 0.0128 0.0124 0.0931 0.0247 0.0136 0.0136 0.0136 0.0186 0.0186 0.0186
SR 0.0138 (0.4146) 0.0241 (0.4296) 0.0128 (0.4296) 0.0124 (0.4296) 0.0931 (0.4296) 0.0247 (0.4296) 0.0136 (0.4196) 0.0136 (0.4276) 0.0136 (0.4296) 0.0116 (0.4096) 0.0116 (0.4096) 0.0186 (0.4096)
Boom #1 (2017)
CER 0.1384 0.1264 0.1323 -0.0388 -1.0000 -0.1106 0.1387 0.1388 0.1402 -0.0389 -0.0366 -0.0698
Risk 0.0045 0.0041 0.0037 0.0090 0.2414 0.0115 0.0047 0.0047 0.0046 0.0065 0.0065 0.0066
SR 0.1177 (0.5994) 0.1183 (0.6044) 0.1366 (0.6044) -0.0131 (0.8024) -0.0723 (0.9115) -0.0347 (0.8024) 0.1131 (0.6014) 0.1133 (0.6044) 0.1157 (0.6044) -0.0211 (0.8024) -0.0196 (0.8024) -0.0404 (0.9023)
Boom #2 (2019)
CER 0.3703 0.2829 0.2994 0.3287 1.6301 0.6870 0.2503 0.2504 0.2504 0.4031 0.4038 0.4087
Risk 0.0072 0.0081 0.0084 0.0097 0.0318 0.0186 0.0063 0.0063 0.0063 0.0185 0.0185 0.0184
SR 0.1521 (0.5644) 0.1266 (0.5714) 0.1478 (0.5714) 0.1419 (0.5714) 0.1366 (0.5714) 0.1209 (0.5714) 0.1441 (0.5684) 0.1440 (0.5684) 0.1439 (0.5684) 0.0810 (0.7592) 0.0811 (0.7592) 0.0819 (0.7592)