Multi Anchor Point Shrinkage for the Sample Covariance Matrix (Extended Version)

Hubeyb Gurdogan, Alec Kercheval
Abstract

Portfolio managers faced with limited sample sizes must use factor models to estimate the covariance matrix of a high-dimensional returns vector. For the simplest one-factor market model, success rests on the quality of the estimated leading eigenvector “beta”.

When only the returns themselves are observed, the practitioner has available the “PCA” estimate equal to the leading eigenvector of the sample covariance matrix. This estimator performs poorly in various ways. To address this problem in the high-dimension, limited sample size asymptotic regime and in the context of estimating the minimum variance portfolio, Goldberg, Papanicolaou, and Shkolnik (Goldberg et al., [2021]) developed a shrinkage method (the “GPS estimator”) that improves the PCA estimator of beta by shrinking it toward the target unit vector q=(1,\dots,1)/\sqrt{p}\in\mathbb{R}^{p}.

In this paper we continue their work to develop a more general framework of shrinkage targets that allows the practitioner to make use of further information to improve the estimator. Examples include sector separation of stock betas, and recent information from prior estimates. We prove some precise statements and illustrate the resulting improvements over the GPS estimator with some numerical experiments.


Acknowledgement: The authors thank Lisa Goldberg and Alex Shkolnik for helpful conversations. Any remaining errors are our own.

Last revision: September 9, 2021

1 Introduction

This paper is about the problem of estimating covariance matrices for large random vectors, when the data for estimation is a relatively small sample. We discuss a shrinkage approach to reducing the sampling error asymptotically in the high dimensional, bounded sample size regime, denoted HL. We note at the outset that this context differs from that of the more well-known random matrix theory of the asymptotic “HH regime” in which the sample size grows in proportion to the dimension (e.g. El Karoui, [2008]). See Hall et al., [2005] for earlier discussion of the HL regime, and Fan et al., [2008] for a discussion of the estimation problem for factor models in high dimension.

Our interest in the HL asymptotic regime comes from the problem of portfolio optimization in financial markets. There, a portfolio manager is likely to confront a large number of assets, like stocks, in a universe of hundreds or thousands of individual issues. However, typical return periods of days, weeks, or months, combined with the irrelevance of the distant past, mean that the useful length of data time series is usually much shorter than the dimension of the returns vectors being estimated.

In this paper we extend the successful shrinkage approach introduced in Goldberg et al., [2021] (GPS) to a framework that allows the user to incorporate additional information into the shrinkage target and improve results. Our “multi anchor point shrinkage” (MAPS) approach includes the GPS method as a special case, but improves results when some a priori order information about the betas is known.

The problem of sampling error for portfolio optimization has been widely studied ever since Markowitz, [1952] introduced the approach of mean-variance optimization. That paper immediately gave rise to the importance of estimating the covariance matrix \Sigma of asset returns, as the risk, measured by variance of returns, is given by w^{T}\Sigma w, where w is the vector of weights defining the portfolio.

For a survey of various approaches over the years, see Goldberg et al., [2021] and references therein. Reducing the number of parameters via factor models has long been standard; see for example Rosenberg, [1974] and Ross, [1976]. Vasicek, [1973] and Frost and Savarino, [1986] initiated a Bayesian approach to portfolio estimation and the efficient frontier. Vasicek used a prior cross-sectional distribution for betas to produce an empirical Bayes estimator for beta that amounts to shrinking the least-squares estimator toward the prior in an optimal way. This is one of a number of “shrinkage” approaches in which initial sample estimates of the covariance matrix are “shrunk” toward a prior e.g. Lai and Xing, [2008], Bickel and Levina, [2008], Ledoit and Wolf, [2003], Ledoit and Wolf, [2004], Fan et al., [2013]. Ledoit and Wolf, [2017] describes a nonlinear shrinkage of the covariance matrix focused on correcting the eigenvalues, set in the HH asymptotic regime.

The key insight of Goldberg et al., [2021] was to identify the PCA leading eigenvector of the sample covariance matrix as the primary culprit contributing to sampling error for the minimum variance portfolio problem in the HL asymptotic regime. Their approach to eigenvector shrinkage is not explicitly Bayesian, but can be viewed in that spirit. This is the starting point for the present work.

1.1 Mathematical setting and background

Next we describe the mathematical setting, motivation, and results in more detail. We restrict attention to a familiar and well-studied baseline model for financial returns: the one-factor, or “market”, model

\mathbf{r}=\beta x+\mathbf{z}, (1)

where \mathbf{r}\in\mathbb{R}^{p} is a p-dimensional random vector of asset (excess) returns in a universe of p assets, \beta\in\mathbb{R}^{p} is an unobserved non-zero vector of parameters to be estimated, x is an unobserved random variable representing the common factor return, and \mathbf{z}\in\mathbb{R}^{p} is an unobserved random vector of residual returns.

With the assumption that the components of \mathbf{z} are uncorrelated with x and each other, the returns of different assets are correlated only through \beta, and therefore the covariance matrix of \mathbf{r} is

\Sigma=\sigma^{2}\beta\beta^{T}+\Delta,

where \sigma^{2} denotes the variance of x, and \Delta is the diagonal covariance matrix of \mathbf{z}.

Under the further simplifying model assumption that each component of \mathbf{z} has a common variance \delta^{2} (also not observed), we obtain the covariance matrix of returns

\Sigma=\sigma^{2}\beta\beta^{T}+\delta^{2}\mathbf{I}, (2)

where \mathbf{I} denotes the p\times p identity matrix. (The assumption of homogeneous residual variance \delta^{2} is a mathematical convenience. If the diagonal covariance matrix \Delta of residual returns can be reasonably estimated, then the problem can be rescaled as \Delta^{-1/2}\mathbf{r}=\Delta^{-1/2}\beta x+\Delta^{-1/2}\mathbf{z}, which has covariance matrix \sigma^{2}\beta_{\Delta}\beta_{\Delta}^{T}+I, where \beta_{\Delta}=\Delta^{-1/2}\beta.)

This means that \beta, or its normalization b=\beta/||\beta||, is the leading eigenvector of \Sigma, corresponding to the largest eigenvalue \sigma^{2}||\beta||^{2}+\delta^{2}. As estimating b becomes the most significant part of the estimation problem for \Sigma, a natural approach is to take as an estimate the first principal component (leading unit eigenvector) h_{PCA} of the sample covariance of returns data generated by the model. This principal component analysis (PCA) estimate is our starting point.

Consider the optimization problem

\min_{w\in\mathbb{R}^{p}} w^{T}\Sigma w \quad\text{subject to}\quad e^{T}w=1,

where e=(1,1,\dots,1), the vector of all ones.

The solution, the “minimum variance portfolio”, is the unique fully invested portfolio minimizing the variance of returns. Of course the true covariance matrix \Sigma is not observable and must be estimated from data. Denote an estimate by

\hat{\Sigma}=\hat{\sigma}^{2}\hat{\beta}\hat{\beta}^{T}+\hat{\delta}^{2}\mathbf{I} (3)

corresponding to estimated parameters \hat{\sigma}, \hat{\beta}, and \hat{\delta}.

Let \hat{w} denote the solution of the optimization problem

\min_{w\in\mathbb{R}^{p}} w^{T}\hat{\Sigma}w \quad\text{subject to}\quad e^{T}w=1.

It is interesting to compare the estimated minimum variance

\hat{V}^{2}=\hat{w}^{T}\hat{\Sigma}\hat{w}

with the actual variance of \hat{w}:

V^{2}=\hat{w}^{T}\Sigma\hat{w},

and consider the variance forecast ratio V^{2}/\hat{V}^{2} as one measure of the error made in the estimation of minimum variance, hence of the covariance matrix \Sigma.

The remarkable fact proved in Goldberg et al., [2021] is that, asymptotically as p tends to infinity, the true variance of the estimated portfolio doesn’t depend on \hat{\sigma}, \hat{\delta}, or ||\hat{\beta}||, but only on the unit eigenvector \hat{\beta}/||\hat{\beta}||. Under some mild assumptions stated later, they show the following.

Definition 1.1.

For a p-vector \beta=(\beta(1),\dots,\beta(p)), define the mean \mu(\beta) and dispersion d^{2}(\beta) of \beta by

\mu(\beta)=\frac{1}{p}\sum_{i=1}^{p}\beta(i) \quad\text{and}\quad d^{2}(\beta)=\frac{1}{p}\sum_{i=1}^{p}\Big(\frac{\beta(i)}{\mu(\beta)}-1\Big)^{2}. (4)

We use the notation for normalized vectors

b=\frac{\beta}{||\beta||},\quad q=\frac{e}{\sqrt{p}},\quad\text{and}\quad h=\frac{\hat{\beta}}{||\hat{\beta}||}.
Proposition 1.1 (Goldberg et al., [2021]).

The true variance of the estimated portfolio \hat{w} is given by

V^{2}=\hat{w}^{T}\Sigma\hat{w}=\sigma^{2}\mu^{2}(\beta)(1+d^{2}(\beta))\,\mathcal{E}^{2}(h)+o_{p}

where \mathcal{E}(h) is defined by

\mathcal{E}(h)=\frac{(b,q)-(b,h)(h,q)}{1-(h,q)^{2}},

and where the remainder o_{p} is such that for some constants c,C, c/p\leq o_{p}\leq C/p for all p sufficiently large.

In addition, the variance forecast ratio V^{2}/\hat{V}^{2} is asymptotically equal to p\,\mathcal{E}^{2}(h).
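These quantities are straightforward to compute for a given estimate; the following minimal sketch (the helper names are ours, not the paper's) evaluates the mean, dispersion, and optimization bias directly from their definitions.

```python
import numpy as np

def dispersion_stats(beta):
    """Mean mu(beta) and dispersion d^2(beta) of a beta vector, as in Eq. (4)."""
    mu = beta.mean()
    d2 = np.mean((beta / mu - 1.0) ** 2)
    return mu, d2

def optimization_bias(h, b):
    """Optimization bias E(h) of a unit-norm estimate h of the unit vector b."""
    p = len(b)
    q = np.ones(p) / np.sqrt(p)
    return (b @ q - (b @ h) * (h @ q)) / (1.0 - (h @ q) ** 2)

# Proposition 1.1: for large p the variance forecast ratio V^2 / V-hat^2
# behaves like p * optimization_bias(h, b) ** 2.
```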

Goldberg, Papanicolaou and Shkolnik call the quantity \mathcal{E}(h) the optimization bias associated to an estimate h of the true vector b. They note that the optimization bias \mathcal{E}(h_{PCA}) is asymptotically bounded away from zero almost surely, and hence the variance forecast ratio explodes as p\to\infty.

With this background, the estimation problem becomes focused on finding a better estimate h of b from an observed time series of returns. GPS Goldberg et al., [2021] introduces a shrinkage estimate for b, the GPS estimator h_{GPS}, obtained by “shrinking” the PCA eigenvector h_{PCA} along the unit sphere toward q, to reduce excess dispersion. That is, h_{GPS} is obtained by moving a specified distance (computed only from observed data) toward q along the spherical geodesic connecting h_{PCA} and q. “Shrinkage” refers to the reduced geodesic distance to the “shrinkage target” q.

The GPS estimator h_{GPS} is a significant improvement on h_{PCA}. First, \mathcal{E}(h_{GPS}) tends to zero with p, and in fact p\mathcal{E}^{2}(h_{GPS})/\log\log(p) is bounded (proved in Gurdogan, [2021]). In Goldberg et al., [2021] it is conjectured, with numerical support, that E[p\mathcal{E}^{2}(h_{GPS})] is bounded in p, and hence the expected variance forecast ratio remains bounded. Moreover, asymptotically h_{GPS} is closer than h_{PCA} to the true value b in the \ell_{2} norm; and it yields a portfolio with better tracking error against the true minimum variance portfolio.

1.2 Our contributions

The purpose of this paper is to generalize the GPS estimator by introducing a way to use additional information about beta to adjust the shrinkage target q in order to improve the estimate.

We can consider the space of all possible shrinkage targets \tau as determined by the family of all nontrivial proper linear subspaces L of \mathbb{R}^{p} as follows. Given L (assumed not orthogonal to h), let the unit vector \tau(L) be the normalized projection of h onto L. Then \tau(L) is a shrinkage target for h determined by L (and h). We will describe such a subspace L as the linear span of a set of unit vectors called “anchor points”. In the case of a single anchor point q, note that \tau(\text{span}\{q\})=q, so this case corresponds to the GPS shrinkage target.

The “MAPS” estimator is a shrinkage estimator with a shrinkage target defined by an arbitrary collection of anchor points, usually including q. When q is the only anchor point, the MAPS estimator reduces to the GPS estimator. We can therefore think of the MAPS approach as allowing for the incorporation of additional anchor points when this provides additional information.

In Theorem 2.2, we show that expanding \text{span}\{q\} by adding additional anchor points at random asymptotically does no harm, but makes no improvement.

In Theorem 2.3, we show that if the user has certain mild a priori rank ordering information about groups of components of \beta, even with no information about magnitudes, an appropriately constructed MAPS estimator converges exactly to the true vector b in the asymptotic limit.

Theorem 2.4 shows that if the betas have positive serial correlation over recent history, then adding the prior PCA estimator h as an anchor point improves the \ell_{2} error in comparison with the GPS estimator, even if the GPS estimator is computed with the same total data history.

The benefit of improving the \ell_{2} error in addition to the optimization bias is that it also allows us to reduce the tracking error of the estimated minimum variance fully invested portfolio, discussed in Section 3 and Theorem 3.1.

In the next sections we present the main results. The framework, assumptions, and statements of the main theorems are presented in Sections 2 and 3. Some simulation experiments are presented in Section 4 to illustrate the impact of the main results for some specific situations.

Proofs of the theorems of Section 2 are organized in Section 5, with the proofs of some of the needed technical propositions and lemmas appearing in Section 6. Additional details and computations may be found in Gurdogan, [2021].

2 Main Theorems

2.1 Assumptions and Definitions

We consider a simple random sample history generated from the basic model (1). The sample data can be summarized as

R=\beta X^{T}+Z (5)

where R\in\mathbb{R}^{p\times n} holds the observed individual (excess) returns of p assets for a time window that is set by n consecutive observations. We may consider the observables R to be generated by non-observable random variables \beta\in\mathbb{R}^{p}, X\in\mathbb{R}^{n} and Z\in\mathbb{R}^{p\times n}.

The entries of X are the market factor returns for each observation time; the entries of Z are the specific returns for each asset at each time; the entries of \beta are the exposure of each asset to the market factor, and we interpret \beta as random but fixed at the start of the observation window of times 1,2,3,\dots,n and remaining constant throughout the window. Only R is observable.

In this paper we are interested in asymptotic results as p tends to infinity with n fixed. Therefore we consider equation (5) as defining an infinite sequence of models, one for each p.

To specify the relationship between models with different values of p, we need a more precise notation. We’ll let \beta refer to an infinite sequence (\beta(1),\beta(2),\dots)\in\mathbb{R}^{\infty}, and \beta^{p}=(\beta(1),\dots,\beta(p))\in\mathbb{R}^{p} the vector obtained by truncation after p entries. When the value p is understood or implied, we will frequently drop the superscript and write \beta for \beta^{p}.

Similarly, Z\in\mathbb{R}^{\infty\times n} is a vector of n sequences (the columns), and Z^{p}\in\mathbb{R}^{p\times n} is obtained by truncating the sequences at p.

With this setup, passing from p to p+1 amounts to simply adding an additional asset to the model without changing the existing p assets. The pth model is denoted

R^{p}=\beta^{p}X^{T}+Z^{p},

but for convenience we will often drop the superscript p in our notation when there is no ambiguity, in favor of equation (5).

Let \mu_{p}(\beta) and d_{p}(\beta) denote the mean and dispersion of \beta^{p}, given by

\mu_{p}(\beta)=\frac{1}{p}\sum_{i=1}^{p}\beta(i) \quad\text{and}\quad d_{p}(\beta)^{2}=\frac{1}{p}\sum_{i=1}^{p}\Big(\frac{\beta(i)-\mu_{p}(\beta)}{\mu_{p}(\beta)}\Big)^{2}. (6)

We make the following assumptions regarding \beta, X and Z:

  A1. (Regularity of beta) The entries \beta(i) of \beta are uniformly bounded, independent random variables, fixed at time 1. The mean \mu_{p}(\beta) and dispersion d_{p}(\beta) converge to limits \mu_{\infty}(\beta)\in(0,\infty) and d_{\infty}(\beta)\in(0,\infty).

  A2. (Independence of beta, X, Z) \beta, X and Z are jointly independent of each other.

  A3. (Regularity of X) The entries X_{i} of X are iid random variables with mean zero and variance \sigma^{2}.

  A4. (Regularity of Z) The entries Z_{ij} of Z have mean zero, finite variance \delta^{2}, and uniformly bounded fourth moment. In addition, the n-dimensional rows of Z are mutually independent, and within each row the entries are pairwise uncorrelated. (Note that we do not assume \beta, X, or Z are Normal or belong to any specific family of distributions.)


We carry out our analysis with the projections of the vectors onto the unit sphere \mathbb{S}^{p-1}\subset\mathbb{R}^{p}. To that end we define

b=\frac{\beta}{||\beta||}, \quad q=\frac{e}{\sqrt{p}}, (7)

where e=e^{p}=(1,1,\dots,1)\in\mathbb{R}^{p}, and ||\cdot|| denotes the usual Euclidean norm. With the given assumptions the covariance matrix \Sigma_{\beta} of R, conditional on \beta, is

\Sigma_{\beta}=\sigma^{2}\beta\beta^{T}+\delta^{2}I. (8)

Since \beta stays constant over the n observations, the sample covariance matrix \frac{1}{n}RR^{T} converges to \Sigma_{\beta} almost surely if n is taken to \infty, and is the maximum likelihood estimator of \Sigma_{\beta}.

Since b is a leading eigenvector of \Sigma_{\beta} (corresponding to the largest eigenvalue), the PCA estimator h (the unit leading eigenvector of the sample covariance matrix \frac{1}{n}RR^{T}) is a natural estimator of b. (We always select the choice of unit eigenvector h such that (h,q)\geq 0.)

Since \beta and X only appear in the model R=\beta X^{T}+Z as a product, there is a scale ambiguity that we can resolve by combining their scales into a single parameter \eta:

\eta^{p}=\frac{1}{p}|\beta^{p}|^{2}\sigma^{2}.

It is easy to verify that

\eta^{p}=\mu_{p}(\beta)^{2}(d_{p}(\beta)^{2}+1)\sigma^{2},

and therefore by our assumptions \eta^{p} tends to a positive, finite limit \eta^{\infty} as p\to\infty.

Our covariance matrix becomes

\Sigma_{\beta}\equiv\Sigma_{b}=p\eta bb^{T}+\delta^{2}I, (9)

where we drop the superscript p when convenient. The scalars \eta,\delta and the unit vector b are to be estimated by \hat{\eta}, \hat{\delta}, and h. As described above, asymptotically only the estimate h of b will be significant. Improving this estimate is the main technical goal of this paper.

In Goldberg et al., [2021] the PCA estimate h is replaced by an estimate h_{GPS} that is “data driven”, meaning that it is computable solely from the observed data R. We henceforth use the notation h_{GPS}=\hat{h}_{q}, for a reason that will be clear shortly. As an intermediate step we also consider a non-observable “oracle” version h_{q}, defined as the orthogonal projection in \mathbb{S}^{p-1} of b onto the geodesic joining h to q. The oracle version is not data driven because it requires knowledge of the unobserved vector b that we are trying to estimate, but it is a useful concept in the definition and analysis of the data driven version. Both the data driven estimate \hat{h}_{q} and the oracle estimate h_{q} can be thought of as obtained from the eigenvector h via “shrinkage” along the geodesic connecting h to the anchor point q.

The GPS data-driven estimator \hat{h}_{q} is successful in improving the variance forecast ratio, and in arriving at a better estimate of the true variance of the minimum variance portfolio. In this paper we have the additional goal of reducing the \ell_{2} error of the estimator, which, for example, is helpful in reducing tracking error. To that end, we introduce the following new data driven estimator, denoted \hat{h}_{L}.

Let L_{p}\subset\mathbb{R}^{p} denote a nontrivial proper linear subspace of \mathbb{R}^{p}. We will sometimes drop the dimension p from the notation. Denote by k_{p} the dimension of L_{p}, with 1\leq k_{p}\leq p-1.

Let h^{p} denote the normalized leading eigenvector of \frac{1}{n}R^{p}(R^{p})^{T}, s_{p}^{2} its largest eigenvalue, and l_{p}^{2} the average of the remaining eigenvalues. Then we define the data driven “MAPS” (Multi Anchor Point Shrinkage) estimator by

\hat{h}_{L}=\frac{\tau_{p}h+\operatorname{proj}_{L}(h)}{||\tau_{p}h+\operatorname{proj}_{L}(h)||} \quad\text{where}\quad \tau_{p}=\frac{\psi_{p}^{2}-||\operatorname{proj}_{L}(h)||^{2}}{1-\psi_{p}^{2}} (10)

and

\psi_{p}=\sqrt{\frac{s_{p}^{2}-l_{p}^{2}}{s_{p}^{2}}} (11)

is the relative gap between s_{p}^{2} and l_{p}^{2}.
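The estimator (10)-(11) is simple to implement from a returns matrix and a choice of anchor points. Below is a minimal sketch (our own code, not the authors' implementation); we read l_{p}^{2} as the average of the nonzero non-leading eigenvalues, equivalently the non-leading eigenvalues of \frac{1}{n}R^{T}R, which is consistent with the estimate \hat{\delta}^{2}=\frac{n}{p}l_{p}^{2} used in Section 4. Passing the single anchor q recovers the GPS estimator.

```python
import numpy as np

def maps_estimator(R, anchors):
    """Data-driven MAPS estimator of Eqs. (10)-(11); a sketch, not the authors' code.

    R       : (p, n) matrix of observed excess returns.
    anchors : (p, k) matrix whose columns span the anchor subspace L.
    """
    p, n = R.shape
    q = np.ones(p) / np.sqrt(p)

    # Spectrum of (1/n) R R^T obtained from the n x n Gram matrix (same nonzero eigenvalues).
    G = R.T @ R / n
    eigvals, eigvecs = np.linalg.eigh(G)      # ascending order
    s2 = eigvals[-1]                          # largest eigenvalue s_p^2
    l2 = eigvals[:-1].mean()                  # l_p^2: average of the remaining (nonzero) eigenvalues
    h = R @ eigvecs[:, -1]
    h /= np.linalg.norm(h)                    # PCA estimator: unit leading eigenvector
    if h @ q < 0:                             # sign convention (h, q) >= 0
        h = -h

    psi2 = (s2 - l2) / s2                     # psi_p^2, the relative eigenvalue gap, Eq. (11)
    B, _ = np.linalg.qr(anchors)              # orthonormal basis of L
    proj_h = B @ (B.T @ h)                    # orthogonal projection of h onto L
    tau = (psi2 - proj_h @ proj_h) / (1.0 - psi2)
    v = tau * h + proj_h
    return v / np.linalg.norm(v)              # Eq. (10)

# With anchors = q.reshape(-1, 1), maps_estimator reduces to the GPS estimator.
```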

Lemma 2.1 (Goldberg et al., [2021]).

The limits

\psi_{\infty}=\lim_{p\to\infty}\psi_{p} \quad\text{and}\quad (h,b)_{\infty}=\lim_{p\to\infty}(h^{p},b^{p})

exist almost surely, and

\psi_{\infty}=(h,b)_{\infty}\in(0,1).

When L is the one-dimensional subspace spanned by the vector q, then \hat{h}_{L} is precisely the GPS estimator \hat{h}_{q}, located along the spherical geodesic connecting h to q. The phrase “multi anchor point” comes from thinking of q as an “anchor point” shrinkage target in the GPS paper, and L as a subspace spanned by one or more anchor points. The new shrinkage target determined by L is the normalized orthogonal projection of h onto L. When L is the one-dimensional subspace spanned by q, the normalized projection of h onto L is just q itself. In the event that L is orthogonal to h, the MAPS estimator \hat{h}_{L} reverts to h itself.

2.2 The MAPS estimator with random extra anchor points

Does adding anchor points to create a MAPS estimator from a higher-dimensional subspace improve the estimation? The answer depends on whether there is any relevant information in the added anchor points.

We need the concept of a random linear subspace of \mathbb{R}^{p}. Let k_{p} be a positive integer such that 1\leq k_{p}\leq p-1. Let \xi^{p} be an O(p)-valued random variable, where O(p) denotes the orthogonal group in \mathbb{R}^{p}. Let \{e^{p}_{1},e^{p}_{2},\dots,e^{p}_{p}\} denote the standard Cartesian basis of \mathbb{R}^{p}.

We say that L_{p} is a random linear subspace of \mathbb{R}^{p} with dimension k_{p} if, for some \xi^{p} as above,

L_{p}=\text{span}_{p}\{\xi^{p}e^{p}_{i}\,|\,i=1,2,\dots,k_{p}\},

where \text{span}_{p} denotes the linear span of a set of vectors in \mathbb{R}^{p}.

We say L_{p} is independent of a random variable X if the generator \xi^{p} is independent of X. Moreover, we say H_{p} is a Haar random subspace of \mathbb{R}^{p} if it is a random linear subspace as above, and the random variable \xi^{p} induces the (uniform) Haar measure on O(p).

Definition 2.1.

A non-decreasing sequence \{k_{p}\} of positive integers is square root dominated if

\sum_{p=1}^{\infty}\frac{k_{p}^{2}}{p^{2}}<\infty.

For example, any non-decreasing sequence satisfying k_{p}\leq Cp^{\alpha} for \alpha<1/2 is square root dominated.

Theorem 2.2.

Let assumptions A1, A2, A3 and A4 hold. Suppose, for each p, L_{p} is a random linear subspace and H_{p} is a Haar random subspace of \mathbb{R}^{p}. Suppose also that L_{p} and H_{p} are independent of \beta and Z, and that the sequences \dim L_{p} and \dim H_{p} are square root dominated.

Let L^{\prime}_{p}=\text{span}\{L_{p},q^{p}\} and L^{\prime\prime}_{p}=\text{span}\{H_{p},q^{p}\}.

Then, almost surely,

(a) \quad \limsup_{p\rightarrow\infty}||\hat{h}_{L^{\prime}}-b|| \leq ||\hat{h}_{q}-b||_{\infty}, (12)
(b) \quad \lim_{p\rightarrow\infty}||\hat{h}_{L^{\prime\prime}}-b|| = ||\hat{h}_{q}-b||_{\infty}, \text{ and} (13)
(c) \quad \lim_{p\rightarrow\infty}||\hat{h}_{H}-b|| = ||h-b||_{\infty}. (14)

Theorem 2.2 says adding random anchor points to form a MAPS estimator does no harm, but also makes no improvement asymptotically. Equation (13) says that the GPS estimator is neither improved nor harmed by adding extra anchor points uniformly at random. Therefore the goal will be to find useful anchor points that take advantage of additional information that might be available.
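To illustrate the objects appearing in Theorem 2.2, a Haar random subspace can be generated by orthonormalizing a standard Gaussian matrix; the resulting basis, together with q, spans L^{\prime\prime}_{p} and can be passed to the maps_estimator sketch above. This is our own construction, not code from the paper.

```python
import numpy as np

def haar_random_subspace(p, k, rng):
    """Orthonormal basis of a k-dimensional Haar random subspace of R^p.

    QR of a standard Gaussian matrix, with signs fixed by the diagonal of R,
    gives a Haar-distributed orthonormal k-frame; its column span is H_p.
    """
    G = rng.standard_normal((p, k))
    Q, Rfac = np.linalg.qr(G)
    return Q * np.sign(np.diag(Rfac))

rng = np.random.default_rng(0)
p, k = 500, 10                        # k of order p^alpha with alpha < 1/2 is square root dominated
q = np.ones(p) / np.sqrt(p)
H = haar_random_subspace(p, k, rng)
anchors = np.column_stack([H, q])     # spans L''_p = span{H_p, q^p}
# maps_estimator(R, anchors) then gives the MAPS estimate for L''; by Theorem 2.2 it
# matches the GPS estimator asymptotically.
```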

2.3 The MAPS estimator with rank order information about the entries of beta

As with stocks grouped by sector, it may be that the betas can be separated into ordered groups, where the rank ordering of the groups is known, but not the ordering within groups. This turns out to be enough information for the MAPS estimator to converge asymptotically to the true value almost surely.

Definition 2.2.

For any p\in\mathbb{N}, let \mathcal{P}=\mathcal{P}(p) be a partition of the index set \{1,2,\dots,p\} (i.e. a collection of pairwise disjoint non-empty subsets, called atoms, whose union is \{1,2,\dots,p\}). The number of atoms of \mathcal{P} is denoted by |\mathcal{P}|.

We say the sequence of partitions \mathcal{P}(p) is semi-uniform if there exists M>0 such that for all p,

\max_{I\in\mathcal{P}(p)}|I|\leq M\frac{p}{|\mathcal{P}(p)|}. (15)

In other words, no atom is larger than a multiple M of the average atom size.

Given \beta\in\mathbb{R}^{p}, we say \mathcal{P} is \beta-ordered if, for each distinct I,J\in\mathcal{P}, either \max_{i\in I}\beta_{i}\leq\min_{j\in J}\beta_{j} or \max_{j\in J}\beta_{j}\leq\min_{i\in I}\beta_{i}.

Definition 2.3.

For any A\subset\{1,2,\dots,p\} define a unit vector v^{A}\in\mathbb{R}^{p} by

v^{A}(i)=1_{A}(i)\frac{1}{\sqrt{|A|}}, (16)

where 1_{A} denotes the indicator function of A. We may then define, for any partition \mathcal{P}=\mathcal{P}(p), an induced linear subspace L(\mathcal{P}) of \mathbb{R}^{p} by

L(\mathcal{P})=\text{span}_{p}\{v^{A}\,|\,A\in\mathcal{P}\}\equiv\,<v^{A}\,|\,A\in\mathcal{P}>. (17)
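The subspace L(\mathcal{P}) is easy to represent in code as the span of the indicator vectors v^{A}; a short sketch (our own helper, with a hypothetical beta-ordered partition in the comments):

```python
import numpy as np

def partition_anchors(partition, p):
    """Matrix whose columns are the unit indicator vectors v^A of Eq. (16).

    partition : list of index arrays forming a partition of {0, ..., p-1}.
    The column span of the result is the subspace L(P) of Eq. (17).
    """
    V = np.zeros((p, len(partition)))
    for j, atom in enumerate(partition):
        idx = np.asarray(atom)
        V[idx, j] = 1.0 / np.sqrt(len(idx))
    return V

# Example (hypothetical): a beta-ordered partition with 8 atoms from a known ranking of beta.
# order = np.argsort(beta)                   # rank order information assumed in Theorem 2.3
# partition = np.array_split(order, 8)
# anchors = partition_anchors(partition, p)  # pass to maps_estimator(R, anchors)
```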
Theorem 2.3.

Let assumptions A1, A2, A3 and A4 hold. Consider a semi-uniform sequence \{\mathcal{P}(p):p=1,2,3,\dots\} of \beta-ordered partitions such that the sequence \{|\mathcal{P}(p)|\} tends to infinity and is square root dominated. Then

\lim_{p\rightarrow\infty}||\hat{h}_{L(\mathcal{P}(p))}-b||=0 \quad\text{almost surely.} (18)

Theorem 2.3 says that if we have certain prior information about the ordering of the \beta elements in the sense of finding an ordered partition (but with no prior information about the magnitudes of the elements or their ordering within partition atoms), then asymptotically we can estimate b exactly.

In practice, having in hand a genuinely \beta-ordered partition a priori is likely to be possible only approximately. Theorem 2.3 suggests that even partial grouped order information about the betas can be helpful in strictly improving the GPS estimate. This is confirmed empirically in Section 4.

The next theorem shows that even with no a priori information beyond the observed time series of returns, we can still use MAPS to improve the GPS estimator.

2.4 A data-driven dynamic MAPS estimator

In the analysis above we have treated \beta as a constant throughout the sampling period, but in reality we expect \beta to vary slowly over time. To capture this in a simple way, let’s now assume that we have access to returns observations for p assets over a fixed number of 2n periods. The first n periods we call the first (or previous) time block, and the second n periods the second (or current) time block. We then have returns matrices R_{1},R_{2}\in\mathbb{R}^{p\times n} corresponding to the two time blocks, and R=[R_{1}\,R_{2}]\in\mathbb{R}^{p\times 2n} the full returns matrix over the full set of 2n observation times.

Define the sample covariance matrices S,S_{1},S_{2} as \frac{1}{2n}RR^{T}, \frac{1}{n}R_{1}R_{1}^{T}, and \frac{1}{n}R_{2}R_{2}^{T}, respectively. Let h,h_{1},h_{2} denote the respective (normalized) leading eigenvectors (PCA estimators) of S,S_{1},S_{2}. (Of the two choices of eigenvector, we always select the one having non-negative inner product with q.)

Instead of a single \beta for the entire observation period, we suppose there are random vectors \beta_{1} and \beta_{2} that enter the model during the first and second time blocks, respectively, and are fixed during their respective blocks. We assume both \beta_{1} and \beta_{2} satisfy assumptions A1 and A2 above, and denote by b_{1} and b_{2} the corresponding normalized vectors. The vectors \beta_{1} and \beta_{2} should not be too dissimilar in the mild sense that (\beta_{1},\beta_{2})\geq 0.

Definition 2.4.

Define the co-dispersion d_{p}(\beta_{1},\beta_{2}) and pointwise correlation \rho_{p}(\beta_{1},\beta_{2}) of \beta_{1} and \beta_{2} by

d_{p}(\beta_{1},\beta_{2})=\frac{1}{p}\sum_{i=1}^{p}\Big(\frac{\beta_{1}(i)}{\mu_{p}(\beta_{1})}-1\Big)\Big(\frac{\beta_{2}(i)}{\mu_{p}(\beta_{2})}-1\Big)

and

\rho_{p}(\beta_{1},\beta_{2})=\frac{d_{p}(\beta_{1},\beta_{2})}{d_{p}(\beta_{1})d_{p}(\beta_{2})}.

The Cauchy-Schwarz inequality shows -1\leq\rho_{p}(\beta_{1},\beta_{2})\leq 1. Furthermore, it is straightforward to verify that

(b_{1},b_{2})-(b_{1},q)(b_{2},q)=\frac{d_{p}(\beta_{1},\beta_{2})}{\sqrt{1+d_{p}(\beta_{1})^{2}}\sqrt{1+d_{p}(\beta_{2})^{2}}}, (19)

and hence d_{p}(\beta_{1},\beta_{2}) and \rho_{p}(\beta_{1},\beta_{2}) have limits d_{\infty}(\beta_{1},\beta_{2}) and \rho_{\infty}(\beta_{1},\beta_{2}) as p\to\infty.
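A small sketch of these quantities (our own helper names):

```python
import numpy as np

def co_dispersion(beta1, beta2):
    """Co-dispersion d_p(beta1, beta2) and pointwise correlation rho_p of Definition 2.4."""
    x1 = beta1 / beta1.mean() - 1.0
    x2 = beta2 / beta2.mean() - 1.0
    d12 = np.mean(x1 * x2)
    d1, d2 = np.sqrt(np.mean(x1 ** 2)), np.sqrt(np.mean(x2 ** 2))
    return d12, d12 / (d1 * d2)
```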

Our motivation for this model is the intuition that the betas for different time periods are noisy representations of a fundamental beta, and that the beta from a recent time block provides some useful information about beta in the current time block. To make this precise in support of the following theorem, we make the following additional assumption.

  A5. (Relation between \beta_{1} and \beta_{2}) Almost surely, (\beta_{1},\beta_{2})>0, \mu_{\infty}(\beta_{1})=\mu_{\infty}(\beta_{2}), d_{\infty}(\beta_{1})=d_{\infty}(\beta_{2}), and the limit \lim_{p\to\infty}d_{p}(\beta_{1},\beta_{2})=d_{\infty}(\beta_{1},\beta_{2}) exists.

Theorem 2.4.

Assume \beta_{1}, \beta_{2}, R, X, Z satisfy assumptions A1-A5. Denote by \hat{h}_{q}^{s} and \hat{h}_{q}^{d} the GPS estimators for R_{2} and R, respectively, i.e. the current (single) and previous plus current (double) time blocks. Let h_{1} and h_{2} be the PCA estimators for R_{1} and R_{2}, respectively.

Let L_{p}=\,<h_{1},q> and define a MAPS estimator for the current time block as

\hat{h}_{L}=\frac{\tau_{p}h_{2}+\operatorname{proj}_{L}(h_{2})}{||\tau_{p}h_{2}+\operatorname{proj}_{L}(h_{2})||} \quad\text{where}\quad \tau_{p}=\frac{\psi_{p}^{2}-||\operatorname{proj}_{L}(h_{2})||^{2}}{1-\psi_{p}^{2}}. (20)

Then, almost surely,

  (a) \lim_{p\rightarrow\infty}\big(||\hat{h}_{L}-b_{2}||-||\hat{h}_{q}^{s}-b_{2}||\big)\leq 0 \quad\text{and}\quad \lim_{p\rightarrow\infty}\big(||\hat{h}_{L}-b_{2}||-||\hat{h}_{q}^{d}-b_{2}||\big)\leq 0. (21)

  (b) If 0<|\rho_{\infty}(\beta_{1},\beta_{2})|<1 almost surely,

      \lim_{p\rightarrow\infty}\big(||\hat{h}_{L}-b_{2}||-||\hat{h}^{s}_{q}-b_{2}||\big)<0 \quad\text{and}\quad \lim_{p\rightarrow\infty}\big(||\hat{h}_{L}-b_{2}||-||\hat{h}_{q}^{d}-b_{2}||\big)<0. (22)

Theorem 2.4 says that the MAPS estimator obtained by adding the PCA estimator h_{1} from the previous time block as a second anchor point outperforms the GPS estimator asymptotically, as measured by \ell_{2} error, even if the latter is estimated with the full 2n (double) data set. This works when the previous time block carries some information about the current beta (non-zero correlation). In the case of perfect correlation \rho_{\infty}(\beta_{1},\beta_{2})=1 the two betas are equal, and we then return to the GPS setting where beta is assumed constant, so no improved performance is expected.

The cost of implementing this “dynamic MAPS” estimator is comparable to that of the GPS estimator, so it should generally be preferred when no rank order information is available for beta.
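A sketch of the dynamic MAPS estimator of Theorem 2.4 (our own code; we read \psi_{p} in (20) as computed from the current block R_{2}, whose leading eigenvector h_{2} is the one being shrunk):

```python
import numpy as np

def dynamic_maps(R1, R2):
    """Dynamic MAPS estimator of Eq. (20); a sketch, not the authors' code.

    R1, R2 : (p, n) returns matrices for the previous and current time blocks.
    The anchor subspace is L = span{h_1, q}, where h_1 is the PCA estimate
    from the previous block.
    """
    p, n = R2.shape
    q = np.ones(p) / np.sqrt(p)

    def pca_stats(R):
        # Leading eigenvector and eigenvalue statistics of (1/n) R R^T via the Gram matrix.
        G = R.T @ R / R.shape[1]
        w, V = np.linalg.eigh(G)
        h = R @ V[:, -1]
        h /= np.linalg.norm(h)
        if h @ q < 0:
            h = -h
        return h, w[-1], w[:-1].mean()

    h1, _, _ = pca_stats(R1)
    h2, s2, l2 = pca_stats(R2)
    psi2 = (s2 - l2) / s2                              # relative eigenvalue gap of the current block

    B, _ = np.linalg.qr(np.column_stack([h1, q]))      # orthonormal basis of L = span{h_1, q}
    proj_h2 = B @ (B.T @ h2)
    tau = (psi2 - proj_h2 @ proj_h2) / (1.0 - psi2)
    v = tau * h2 + proj_h2
    return v / np.linalg.norm(v)
```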

3 Tracking Error

Our task has been to estimate the covariance matrix of returns for a large number p of assets but a short time series of n returns observations.

Recall that for the returns model (1), under the given assumptions, we have the true covariance matrix

\Sigma_{b}=p\eta bb^{T}+\delta^{2}I,

where \eta and \delta are positive constants and b is a unit p-vector, and we are interested in corresponding estimates \hat{\eta}, \hat{\delta}, and h that define an estimator

\Sigma_{h}=p\hat{\eta}hh^{T}+\hat{\delta}^{2}I.

The theorems above are about finding an estimator h of b that asymptotically controls the \ell_{2} error ||h-b||. We are ignoring \hat{\eta} and \hat{\delta} because of Proposition 1.1, showing that the true variance of the estimated minimum variance portfolio \hat{w}, and the variance forecast ratio, are asymptotically controlled by h via the optimization bias

\mathcal{E}(h)=\frac{(b,q)-(b,h)(h,q)}{1-(h,q)^{2}}.

We now turn to another important measure of portfolio estimation quality: the tracking error.

Recall that w denotes the true minimum variance portfolio using \Sigma, and \hat{w} is the minimum variance portfolio using the estimated covariance matrix \hat{\Sigma}.

Definition 3.1.

The (true) tracking error \mathcal{T}(h) associated to \hat{w} is defined by

\mathcal{T}^{2}(h)=(\hat{w}-w)^{T}\Sigma(\hat{w}-w). (23)
Definition 3.2.

Given the notation above, define the eigenvector bias \mathcal{D}(h) associated to a unit leading eigenvector estimate h as

\mathcal{D}(h)=\frac{(h,q)^{2}(1-(h,b)^{2})}{(1-(h,q)^{2})(1-(b,q)^{2})}=\frac{(h,q)^{2}||h-b||^{2}}{||h-q||^{2}\,||b-q||^{2}}.
Theorem 3.1.

Let h be an estimator of b such that \mathcal{E}(h)\to 0 as p\to\infty (such as a GPS or MAPS estimator). Then the tracking error of h is asymptotically (neglecting terms of higher order in 1/p) given by

\mathcal{T}^{2}(h)=\eta\mathcal{E}^{2}(h)+\frac{\delta^{2}}{p}\mathcal{D}(h)+\frac{C}{p}\mathcal{E}(h), (24)

where

C=\frac{2}{\xi(1+d_{\infty}^{2}(\beta))}\Big(\delta^{2}+\frac{\eta}{\hat{\eta}}\hat{\delta}^{2}\Big)

and \xi>0 is a constant depending only on \psi_{\infty}, \mu_{\infty}(\beta), and d_{\infty}(\beta).

We consider what this theorem means for various estimators h. For the PCA estimate, it was already shown in Goldberg et al., [2021] that \mathcal{E}(h_{PCA}) is asymptotically bounded away from zero, and hence so is the tracking error.

On the other hand, \mathcal{E}(h_{GPS}) tends to zero as p\to\infty. In addition Goldberg et al., [2021] shows that

\limsup_{p\to\infty}p\,\mathcal{E}^{2}(h_{GPS})=\infty

almost surely, while Gurdogan, [2021] shows

\limsup_{p\to\infty}\frac{p\,\mathcal{E}^{2}(h_{GPS})}{\log\log p}<\infty,

and we conjecture the same is true for the more general estimator h_{MAPS}.

This implies that the leading terms, asymptotically, are

\mathcal{T}^{2}(h_{MAPS})\leq\eta\mathcal{E}^{2}(h_{MAPS})+(\delta^{2}/p)\mathcal{D}(h_{MAPS}).

Note here the estimated parameters \hat{\eta} and \hat{\delta} have dropped out, with the tracking error asymptotically controlled by the eigenvector estimate h alone.

Theorem 3.1 helps justify our interest in the \ell_{2} error results of Theorems 2.3 and 2.4. Reducing the \ell_{2} error ||h-b|| of the h estimate controls the second term \mathcal{D}(h) of the asymptotic estimate for tracking error. We therefore expect to see improved total tracking error when we are able to make an informed choice of additional anchor points in forming the MAPS estimator. This is borne out in our numerical experiments described in Section 4.
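To make these quantities concrete, the following sketch (our own illustration, not the paper's code) computes the exact tracking error of Definition 3.1 together with the leading terms \eta\mathcal{E}^{2}(h)+(\delta^{2}/p)\mathcal{D}(h); the minimum variance portfolios are obtained from the standard closed form \Sigma^{-1}e/(e^{T}\Sigma^{-1}e).

```python
import numpy as np

def min_var_portfolio(Sigma):
    """Fully invested minimum variance portfolio: argmin w' Sigma w subject to e'w = 1."""
    e = np.ones(Sigma.shape[0])
    x = np.linalg.solve(Sigma, e)
    return x / (e @ x)

def tracking_error_sq(h, b, eta, delta, eta_hat, delta_hat):
    """Exact T^2(h) of Definition 3.1 and the leading terms of Eq. (24)."""
    p = len(b)
    q = np.ones(p) / np.sqrt(p)
    Sigma   = p * eta * np.outer(b, b) + delta ** 2 * np.eye(p)       # true covariance
    Sigma_h = p * eta_hat * np.outer(h, h) + delta_hat ** 2 * np.eye(p)  # estimated covariance
    w, w_hat = min_var_portfolio(Sigma), min_var_portfolio(Sigma_h)
    T2 = (w_hat - w) @ Sigma @ (w_hat - w)                            # exact tracking error
    E = (b @ q - (b @ h) * (h @ q)) / (1.0 - (h @ q) ** 2)            # optimization bias E(h)
    D = (h @ q) ** 2 * (1.0 - (h @ b) ** 2) / ((1.0 - (h @ q) ** 2) * (1.0 - (b @ q) ** 2))
    return T2, eta * E ** 2 + delta ** 2 / p * D                      # exact vs. leading terms
```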


Proof of Theorem 3.1

Lemma 3.2.

There exists \xi>0, depending only on \psi_{\infty}, \mu_{\infty}(\beta), and d_{\infty}(\beta), such that for any p sufficiently large, and any linear subspace L of \mathbb{R}^{p} that contains q,

||h_{L}-q||^{2}>\xi>0,

where h_{L} is the MAPS estimator determined by L.

The Lemma follows from the fact that (h_{L},q)\leq(h_{GPS},q), and is proved for the case h_{GPS} using the definitions and the known limits

(h_{PCA},q)_{\infty}=(b,q)_{\infty}(h_{PCA},b)_{\infty} (25)
(b,q)^{2}_{\infty}=\frac{1}{1+d_{\infty}^{2}(\beta)}\in(0,1) (26)
(h_{PCA},b)_{\infty}=\psi_{\infty}>0. (27)

From the Lemma and equation (26), we may assume without loss of generality that \xi>0 is an asymptotic lower bound for both ||h_{L}-q||^{2}=1-(h_{L},q)^{2} and ||b-q||^{2}=1-(b,q)^{2}.

Next, we recall it is straightforward to find explicit formulas for the minimum variance portfolios w and \hat{w}:

w=\frac{1}{\sqrt{p}}\,\frac{\rho q-b}{\rho-(b,q)},\quad\text{where }\rho=\frac{1+k^{2}}{(b,q)},\quad k^{2}=\frac{\delta^{2}}{p\eta},

and

\hat{w}=\frac{1}{\sqrt{p}}\,\frac{\hat{\rho}q-h}{\hat{\rho}-(h,q)},\quad\text{where }\hat{\rho}=\frac{1+\hat{k}^{2}}{(h,q)},\quad\hat{k}^{2}=\frac{\hat{\delta}^{2}}{p\hat{\eta}}.
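As a quick numerical sanity check of these closed forms (our own illustration, with arbitrary parameter values), one can compare w against a direct computation of \Sigma^{-1}e/(e^{T}\Sigma^{-1}e):

```python
import numpy as np

rng = np.random.default_rng(1)
p, eta, delta = 400, 0.1, 0.5
beta = rng.uniform(0.5, 1.5, p)          # positive betas, as in the model assumptions
b = beta / np.linalg.norm(beta)
q = np.ones(p) / np.sqrt(p)
e = np.ones(p)
Sigma = p * eta * np.outer(b, b) + delta ** 2 * np.eye(p)

k2 = delta ** 2 / (p * eta)
rho = (1.0 + k2) / (b @ q)
w_closed = (rho * q - b) / (np.sqrt(p) * (rho - b @ q))   # closed form above

x = np.linalg.solve(Sigma, e)
w_direct = x / (e @ x)                                    # Sigma^{-1} e / (e' Sigma^{-1} e)
assert np.allclose(w_closed, w_direct)
```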

We may use these expressions to obtain an explicit formula for the tracking error:

\mathcal{T}^{2}(h)=(\hat{w}-w)^{T}\Sigma(\hat{w}-w)=(\hat{w}-w)^{T}(p\eta bb^{T}+\delta^{2}I)(\hat{w}-w)=p\eta(\hat{w}-w,b)^{2}+\delta^{2}||\hat{w}-w||^{2}.

We now estimate the two terms on the right hand side separately.

(1) For the first term p\eta(\hat{w}-w,b)^{2}, it is convenient to introduce the notation

\Gamma=\frac{k^{2}}{1+k^{2}-(b,q)^{2}}\quad\text{and}\quad\hat{\Gamma}=\frac{\hat{k}^{2}}{1+\hat{k}^{2}-(h,q)^{2}},

and since

\Gamma\leq\frac{k^{2}}{\xi}\quad\text{and}\quad\hat{\Gamma}\leq\frac{\hat{k}^{2}}{\xi},

both \Gamma and \hat{\Gamma} are of order 1/p.

A straightforward computation verifies that

(w,b)=\frac{1}{\sqrt{p}}\,\Gamma(b,q) (28)
(\hat{w},b)=\frac{1}{\sqrt{p}}\left(\mathcal{E}(h)+\hat{\Gamma}[(b,q)-\mathcal{E}(h)]\right). (29)

We then obtain

p(\hat{w}-w,b)^{2}=p[(\hat{w},b)-(w,b)]^{2} (30)
=\mathcal{E}(h)^{2}+2\mathcal{E}(h)G+G^{2}, (31)

where G=\hat{\Gamma}((b,q)-\mathcal{E}(h))-\Gamma(b,q).

Since asymptotically (b,q) is bounded below and \mathcal{E}(h)\to 0, the third term G^{2} is of order 1/p^{2} and can be dropped. We thus obtain the asymptotic estimate

p(\hat{w}-w,b)^{2}\leq\mathcal{E}^{2}+2\mathcal{E}(h)(\hat{\Gamma}-\Gamma)(b,q).

Multiplying by \eta and using the bounds on \Gamma,\hat{\Gamma} and the limit of (b,q), we obtain

p\eta(\hat{w}-w,b)^{2}\leq\eta\mathcal{E}^{2}+\frac{C}{p}\mathcal{E}(h),

where CC is the constant defined in the statement of the theorem.

(2) We now turn to the second term ||\hat{w}-w||^{2}=||\hat{w}||^{2}+||w||^{2}-2(\hat{w},w).

Using the definitions of \hat{w} and w and the fact that k^{2}, \hat{k}^{2} are of order 1/p, after a calculation we obtain, to lowest order in 1/p,

p||\hat{w}-w||^{2}=\frac{(h,q)^{2}[1-(h,b)^{2}]}{(1-(h,q)^{2})(1-(b,q)^{2})}+\frac{1-(h,q)^{2}}{1-(b,q)^{2}}\,\mathcal{E}^{2}(h). (32)

Since \mathcal{E}(h)\to 0, we may neglect the second term, and putting (1) and (2) together yields

\mathcal{T}^{2}(h)\leq\eta\mathcal{E}^{2}+\frac{C}{p}\mathcal{E}(h)+\frac{\delta^{2}}{p}\mathcal{D}(h).

4 Simulation Experiments

To illustrate the previous theorems, we present the results of numerical experiments showing the improvement that MAPS estimators can bring in estimating the covariance matrix. To approximate the asymptotic regime, in these experiments we use p=500 stocks. The Python code used to run these experiments and create the figures is available at
https://github.com/hugurdog/MAPS_NumericalExperiments.

4.1 Simulated betas with correlation

First we set up a test bed consisting of a double block of data where we can control the beta correlation. Set n=24. We generate observations R_{i}^{1},R_{i}^{2}\in\mathbb{R}^{p} for i=1,2,\dots,n, according to the market model of Equation (1):

R_{i}^{t}=\beta_{t}X_{i}^{t}+Z_{i}^{t},\quad t=1,2,\quad i=1,2,\dots,n, (33)

for unobserved market returns X^{t}_{i}\in\mathbb{R} and unobserved asset specific returns Z^{t}_{i}\in\mathbb{R}^{p} for each time window of data.

Here the p\times n matrices R^{1} and R^{2} represent the previous and current block of n consecutive excess returns, respectively, and are obtained from Equation (33) by randomly generating \beta, X, and Z as follows (a simulation sketch is given after the list):

  • the market returns X^{t}_{i} are an iid random sample drawn from a normal distribution with mean 0 and variance \sigma^{2}=0.16,

  • the asset specific returns \{Z^{1}_{i}\}_{1}^{n} and \{Z^{2}_{i}\}_{1}^{n} are i.i.d. normal with mean 0 and variance \delta^{2}I=(.5)^{2}I, and

  • the p-vectors \beta_{1} and \beta_{2} are drawn independently of X and Z from a normal distribution with mean 0 and variance (.5)^{2}I and with pointwise correlation \rho_{p}(\beta_{1},\beta_{2})\in[0,1] for a range of values of \rho specified below. (Footnote: exact recipe for \beta_{1},\beta_{2} here.)
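One way to generate this test bed in code is sketched below (the exact recipe for the correlated betas is left unspecified in the footnote above, so the Gaussian mixing used here is our own choice; the other parameter values follow the list):

```python
import numpy as np

def simulate_two_blocks(p=500, n=24, rho=0.2, sigma2=0.16, delta=0.5, beta_sd=0.5, seed=0):
    """Simulate the two-block test bed of Eq. (33); a sketch with our own beta recipe."""
    rng = np.random.default_rng(seed)
    beta1 = beta_sd * rng.standard_normal(p)
    eps   = beta_sd * rng.standard_normal(p)
    beta2 = rho * beta1 + np.sqrt(1.0 - rho ** 2) * eps   # entrywise correlation approximately rho
    blocks = []
    for beta in (beta1, beta2):
        X = np.sqrt(sigma2) * rng.standard_normal(n)      # market factor returns
        Z = delta * rng.standard_normal((p, n))           # asset specific returns
        blocks.append(np.outer(beta, X) + Z)              # R^t = beta_t X^t + Z^t
    return beta1, beta2, blocks[0], blocks[1]
```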

The true covariance matrix of the n most recent returns R^{2} is

\Sigma=\sigma^{2}\beta_{2}\beta_{2}^{T}+\delta^{2}I, (34)

which we wish to estimate by

\hat{\Sigma}=\hat{\sigma}^{2}\hat{\beta}\hat{\beta}^{T}+\hat{\delta}^{2}I. (35)

Following the lead of Goldberg et al., [2021], we fix

\hat{\sigma}^{2}|\hat{\beta}|^{2}=s_{p}^{2}-l_{p}^{2}\quad\text{and}\quad\hat{\delta}^{2}=\frac{n}{p}\,l_{p}^{2} (36)

and vary only the estimator of \frac{\hat{\beta}}{|\hat{\beta}|}=h (a plug-in sketch of (35)-(36) follows the list of estimators below). In our numerical experiments we compare performance for the following choices of h:

  1. h^{s}, the PCA estimator on the single block R^{2} (PCA1)

  2. h^{d}, the PCA estimator on the double block R=[R^{1},R^{2}] (PCA2)

  3. \hat{h}_{q}^{s}, the GPS estimator on the single block R^{2} (GPS1)

  4. \hat{h}_{L_{D}}, the dynamical MAPS estimator defined on the double block of data R=[R^{1},R^{2}] by Equation (20). (Dynamical MAPS)

  5. \hat{h}_{q}^{d}, the GPS estimator on the double block R=[R^{1},R^{2}] (GPS2)

  6. \hat{h}^{s}_{L(\mathcal{P})}, the MAPS estimator on the single block R^{2}, where \mathcal{P} is a beta-ordered uniform partition constructed by using the ordering of the entries of \beta_{2} and where the number of atoms k_{p} is set to 8, which is approximately \sqrt[3]{p}. (Footnote: The largest 479 beta values are partitioned into 7 groups of 71, and the three lowest values form the eighth partition atom.) (Beta Ordered MAPS)
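As referenced above, the plug-in covariance estimate (35)-(36) for any of these choices of h can be sketched as follows (our own helper; l_{p}^{2} is again taken as the average of the nonzero non-leading eigenvalues):

```python
import numpy as np

def estimate_covariance(R, h):
    """Plug-in covariance estimate of Eqs. (35)-(36) for a unit eigenvector estimate h."""
    p, n = R.shape
    eig = np.linalg.eigvalsh(R.T @ R / n)        # nonzero spectrum of (1/n) R R^T
    s2, l2 = eig[-1], eig[:-1].mean()            # s_p^2 and l_p^2
    sigma2_beta2 = s2 - l2                       # estimate of sigma^2 |beta|^2
    delta2_hat = n / p * l2                      # estimate of delta^2
    return sigma2_beta2 * np.outer(h, h) + delta2_hat * np.eye(p)
```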

We report the performance of each of these estimators according to the following two metrics:

  • The \ell_{2} error ||b-h|| between the true normalized beta b=\frac{\beta_{2}}{|\beta_{2}|} of the current data block and the estimated version h=\frac{\hat{\beta}}{|\hat{\beta}|}.

  • The tracking error between the true and estimated minimum variance portfolios w and \hat{w}:

    \mathcal{T}^{2}(\hat{w})=(\hat{w}-w)^{T}\Sigma(\hat{w}-w). (37)

Results of the comparison are displayed below for values of the pointwise correlation \rho selected from \{0, 0.2, 0.4, 0.6, 0.8, 1\}. For each choice of \rho, the experiment was run 100 times, resulting in 100 \ell_{2} and tracking error values each. These values are summarized using standard box-and-whisker plots generated in Python using matplotlib.pyplot.boxplot.

Figure 1 shows the \ell_{2} error ||h-b|| for different estimators h (in the same order, left to right, as listed above) for the case \rho=0.2. The worst performer is the single block PCA. (It is independent of \rho since it doesn’t see the earlier data at all.) Double block PCA is a little better, but the other estimators are far better. Since GPS effectively assumes that the betas have perfect serial correlation, it’s not surprising that the double block GPS does slightly worse than the single block in this case. The best estimator is the MAPS estimator with prior information about group ordering. Assuming no such information is available, the GPS2 and Dynamical MAPS estimators are about tied for best.

Figure 2 shows the results for \rho=0, 0.2, 0.4, 0.6, 0.8, 1.0 in smaller size for visual comparison. Throughout the range, the dynamical MAPS estimator outperforms all the other purely data-driven estimators, but the beta-ordered MAPS estimator remains in the lead.

Figure 3 presents the results for tracking error, reported as p\mathcal{T}^{2}. Results are similar to the \ell_{2} error, but stronger. Again, the dynamical MAPS estimator does best among all methods that don’t use order information, and the beta ordered MAPS estimator is significantly better than all others. Figure 4 displays tracking error outcomes for a range of correlation values \rho(\beta_{1},\beta_{2}).

We conclude from these experiments that the dynamical MAPS estimator is best when only the returns are available, and the beta ordered MAPS estimator is preferred when rank order information on the betas is available.

Figure 1: The \ell_{2} error performance of six different estimators as defined in the text. Here the pointwise correlation of betas between the two time blocks is \rho=0.2.
(a) \rho=0 (b) \rho=0.2 (c) \rho=0.4 (d) \rho=0.6 (e) \rho=0.8 (f) \rho=1.0
Figure 2: Results of simulation experiments for different estimators PCA1, PCA2, GPS1, Random Partition, Dynamical Maps, GPS2, and Beta Ordered. The pointwise correlation \rho is the correlation between betas in the two different time blocks. Figure 2(b) is the same as Figure 1.
Figure 3: The tracking error performance of different estimators. Here the pointwise correlation of betas between the two time blocks is \rho=0.2.
(a) \rho=0 (b) \rho=0.2 (c) \rho=0.4 (d) \rho=0.6 (e) \rho=0.8 (f) \rho=1.0
Figure 4: Tracking error results of simulation experiments for different estimators PCA1, PCA2, GPS1, Random Partition, Dynamical Maps, GPS2, and Beta Ordered. The pointwise correlation \rho is the correlation between betas in the two different time blocks. Figure 4(b) is the same as Figure 3.

4.2 Simulations with historical betas

In this section we use historical betas rather than randomly generated ones to test the quality of some MAPS estimators. We use 24 historical monthly CAPM betas for each of the S&P 500 firms provided by WRDS (Wharton Research Data Services, wrds-www.wharton.upenn.edu) between the dates 01/01/2018 and 11/30/2020. We denote these betas as \beta_{1},\dots,\beta_{24}. We will have two different mechanisms for generating observations of the market model, for the single and double data block test beds.

4.2.1 Single Data Block

The WRDS beta suite estimates beta each month from the prior 12 monthly returns. Hence we generate n=12 sequential observations of the market model for each beta separately,

R_{i}^{t}=\beta_{t}X_{i}^{t}+Z_{i}^{t},\quad i=1,2,\dots,12,\quad t=1,2,\dots,24, (38)

with the unobserved market return X^{t} and the asset specific return Z^{t} generated using the same settings as in the previous section.

For each \beta_{t} this produces a p\times n returns matrix R^{t} from which we can derive the following four estimators h^{t} of \beta_{t}:

  1. h_{PCA}^{t}, the PCA estimator of R^{t}. (PCA)

  2. \hat{h}^{t}_{q}, the GPS estimator of R^{t}. (GPS)

  3. \hat{h}^{t}_{L(\mathcal{P}_{s})}, the MAPS estimator of R^{t}, where \mathcal{P}_{s} is a sector partitioning of the indices \{1,2,\dots,p\} in which each atom of the partition contains the indices of one of the 11 sectors in the market. (The 11 sectors of the Global Industry Classification Standard are: Information Technology, Health Care, Financials, Consumer Discretionary, Communication Services, Industrials, Consumer Staples, Energy, Utilities, Real Estate, and Materials.) This is one possible data-driven proxy for the beta-ordered uniform partition; a small sketch of the construction follows this list. (Sector Separated)

  4. \hat{h}^{t}_{L(\mathcal{P})}, the MAPS estimator of R^{t} where \mathcal{P} is a beta ordered uniform partition with 11 atoms constructed by using the true ordering of the entries of \beta_{t}. (Beta Ordered)
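A short sketch of how a sector partition can be turned into anchor points (our own helper; it reuses partition_anchors and maps_estimator from the earlier sketches, and the sector labels are whatever classification data the practitioner has available):

```python
import numpy as np

def sector_partition(sector_labels):
    """Partition of {0, ..., p-1} induced by sector membership (one atom per distinct label)."""
    labels = np.asarray(sector_labels)
    return [np.flatnonzero(labels == s) for s in np.unique(labels)]

# Example (hypothetical data): gics is a length-p array of GICS sector names.
# anchors = partition_anchors(sector_partition(gics), p)
# h_sector = maps_estimator(R_t, anchors)
```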

For each of these four choices of estimator h^{t}, we examine three different measures of error: the squared \ell_{2} error ||h^{t}-b_{t}||^{2}, the scaled squared tracking error p\mathcal{T}^{2}(h^{t}), and the scaled optimization bias p\mathcal{E}^{2}_{p}(h^{t}).

Since we are interested in expected outcomes, we repeat the above experiment 100 times, and take the average of the errors as a Monte Carlo estimate of the expectations

\mathbb{E}[||h^{t}-b_{t}||^{2}],\quad\mathbb{E}[p\mathcal{T}^{2}(h^{t})],\quad\mathbb{E}[p\mathcal{E}^{2}_{p}(h^{t})],

once for each t. We then display box plots for the resulting distribution of 24 expected errors of each type, corresponding to the 24 historical betas.

Figure 5 shows a similar story for all three error measures. The GPS estimator significantly outperforms the PCA estimator, and the Beta Ordered estimator, which assumes the ability to rank order partition the betas, is significantly the best.

The result of more interest is that a sector partition approximates a beta ordered partition well enough to improve on the GPS estimate. This approach takes advantage of the fact that betas of stocks in a common sector tend on average to be closer to each other than to betas in other sectors. The Sector Separated MAPS estimator does not require any information not easily available to the practitioner, and so represents a costless improvement on the GPS estimation method.

(a) \ell_{2} error (b) tracking error (c) optimization bias
Figure 5: Box plots summarizing the distribution of 24 Monte Carlo-estimated expected errors for the PCA, GPS, Sector Separated, and Beta Ordered estimators (left to right in each figure). The experiment is conducted over 488 S&P 500 companies. This experiment reveals that the Sector Separated estimator is able to capture some of the ordering information and therefore outperforms the GPS estimator. The Beta Ordered estimator performs best.

4.2.2 Double Data Block

In order to test the dynamical MAPS estimator that is designed to take advantage of serial correlation in the betas, we will generate a test bed of double data blocks of simulated market observations using the 24 WRDS historical betas for the same time period as before.

For each t=1,2,\dots,12, we generate 12 simulated monthly market returns for \beta_{t} and \beta_{t+12} according to

R_{i}^{t}=\beta_{t}X_{i}^{t}+Z_{i}^{t},\quad i=1,2,\dots,12, (39)
R_{i}^{t+12}=\beta_{t+12}X_{i}^{t+12}+Z_{i}^{t+12},\quad i=1,2,\dots,12, (40)

where X and Z are generated independently as before.

This provides, for each t, two p\times 12 “single block” returns matrices R^{t} and R^{t+12}, each covering 12 months, and a combined “double block” p\times 2n returns matrix R_{t}=[R^{t}\,R^{t+12}] containing 24 consecutive monthly returns of the p stocks.

The estimation problem, given observation of the double block of data R_{t}, is to estimate the normalized beta vector \beta_{t+12}/||\beta_{t+12}|| corresponding to the most recent 12 months. This estimate then implies an estimated covariance matrix for that 12 month period according to equations (35) and (36), and allows us to measure the estimation error as before.

We compare the following estimators:

  • h_{PCA}^{s}, the PCA estimator of R^{t+12}.

  • h_{PCA}^{d}, the PCA estimator of the double block R_{t}=[R^{t}\,R^{t+12}].

  • \hat{h}^{s}_{q}, the GPS estimator of R^{t+12}.

  • \hat{h}^{d}_{q}, the GPS estimator of R_{t}=[R^{t}\,R^{t+12}].

  • \hat{h}_{L_{D}}, the dynamical MAPS estimator defined on the double block R_{t}=[R^{t}\,R^{t+12}] by Equation (20).

We will report our results using the same three error metrics as before: \mathbb{E}[||\cdot-b||^{2}], \mathbb{E}[p\mathcal{T}^{2}(\cdot)], and \mathbb{E}[p\mathcal{E}^{2}_{p}(\cdot)], for each of the five estimators. To obtain estimated expectations, we repeat the experiments 100 times and compute the average. The box plots summarize the distribution of the 12 overlapping double block expected errors.

The experiment shows that the Dynamical MAPS estimator outperforms the others, and illustrates the promise of Theorem 2.4, which is based on the hypothesis that betas exhibit some serial correlation. Another benefit of the Dynamical MAPS approach is to relieve the practitioner from the burden of choosing whether to use a GPS1 or GPS2 estimator when a double block of data is available.

(a) \ell_{2} error (b) tracking error (c) optimization bias
Figure 6: Box plots for three kinds of expected error for (left to right) the PCA1, PCA2, GPS1, Dynamical MAPS, and GPS2 estimators, summarizing the distribution of 12 different expected errors for each estimator corresponding to 12 consecutive months of 2020. The experiment is conducted using 488 S&P 500 companies.

5 Proofs of the Main Theorems

The proofs of the main theorems proceed by means of some intermediate results involving an “oracle estimator”, defined in terms of the unobservable b but equal to the MAPS estimator in the asymptotic limit (Theorem 5.1 below). Several supporting technical propositions and lemmas are needed; their proofs are postponed to Section 6.

5.1 Oracle Theorems

A key tool in the proofs is the oracle estimator hLh_{L}, which is a version of h^L\hat{h}_{L} but defined in terms of bb, our estimation target.

Given a subspace L=LpL=L_{p} of p\mathbb{R}^{p}, we define

hL=proj<h,L>(b)proj<h,L>(b).h_{L}=\frac{\underset{<h,L>}{\operatorname{proj}}(b)}{||\underset{<h,L>}{\operatorname{proj}}(b)||}. (41)

Here <h,L> denotes the span of h and L; note that if L=\{0\} we get h_{L}=h, the PCA estimator. A nontrivial example of this selection is L_{p}=<q>, which generates h_{q}, the oracle version of the GPS estimator in Goldberg et al., [2021]. The following theorem says that the oracle estimator (41) and the MAPS estimator (10) converge to each other asymptotically.
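As a concrete illustration (outside the formal development), the oracle estimator (41) is simply the normalized orthogonal projection of b onto the span of h and L. The sketch below assumes L is supplied as a p x k matrix whose columns span it; b, although unobservable in practice, is available here because we are simulating.

    import numpy as np

    def oracle_estimator(b, h, L_basis):
        """Oracle h_L of (41): normalized projection of b onto span(h, L).

        b, h    : unit vectors in R^p (b is the unobservable target, h the PCA vector)
        L_basis : p x k array whose columns span the anchor subspace L (k = 0 allowed)
        """
        A = np.column_stack([h, L_basis]) if L_basis.size else h[:, None]
        Q, _ = np.linalg.qr(A)                 # orthonormal basis of span(h, L)
        proj = Q @ (Q.T @ b)                   # projection of b onto that span
        return proj / np.linalg.norm(proj)

    # L = <q> gives the oracle version of the GPS estimator; L = {0} returns h itself
    # whenever (h, b) > 0.
    p = 500
    rng = np.random.default_rng(1)
    b = rng.standard_normal(p); b /= np.linalg.norm(b)
    h = b + 0.7 * rng.standard_normal(p); h /= np.linalg.norm(h)
    q = np.ones(p) / np.sqrt(p)
    h_q = oracle_estimator(b, h, q[:, None])           # oracle GPS
    h_pca = oracle_estimator(b, h, np.empty((p, 0)))   # L = {0}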

Theorem 5.1.

Let the assumptions 1, 2, 3 and 4 hold. Let \{L_{p}\} be any sequence of random linear subspaces that is independent of the entries of Z, such that \dim(L_{p}) is a square root dominated sequence. Then

limph^LhL=0.\lim\limits_{p\rightarrow\infty}||\hat{h}_{L}-h_{L}||=0. (42)

The proof of Theorem 5.1 requires the following proposition, proved in Section 6.

Proposition 5.2.

Under the assumptions of Theorem 5.1, let h=hPCAh=h_{PCA} be the PCA estimator, equal to the unit leading eigenvector of the sample covariance matrix. Then, almost surely:

  1. \lim\limits_{p\rightarrow\infty}\big{(}(h,\underset{L}{\operatorname{proj}}(h))-(h,b)^{2}(b,\underset{L}{\operatorname{proj}}(b))\big{)}=0,

  2. \lim\limits_{p\rightarrow\infty}\big{(}(b,\underset{L}{\operatorname{proj}}(h))-(h,b)(b,\underset{L}{\operatorname{proj}}(b))\big{)}=0, and

  3. \lim\limits_{p\rightarrow\infty}||\underset{L}{\operatorname{proj}}(h)-(h,b)\underset{L}{\operatorname{proj}}(b)||=0.

In particular, proj𝐿(h)proj𝐿(h)\frac{\underset{L}{\operatorname{proj}}(h)}{||\underset{L}{\operatorname{proj}}(h)||} converges asymptotically to proj𝐿(b)proj𝐿(b)\frac{\underset{L}{\operatorname{proj}}(b)}{||\underset{L}{\operatorname{proj}}(b)||}.

Proof of Theorem 5.1.

Recall from (10) that,

h^L=τph+proj𝐿(h)τph+proj𝐿(h)  where  τp=ψp2proj𝐿(h)21ψp2.\hat{h}_{L}=\frac{\tau_{p}h+\underset{L}{\operatorname{proj}}(h)}{||\tau_{p}h+\underset{L}{\operatorname{proj}}(h)||}\text{ }\text{ where }\text{ }\tau_{p}=\frac{\psi_{p}^{2}-||\underset{L}{\operatorname{proj}}(h)||^{2}}{1-\psi_{p}^{2}}.

By Lemma 2.1, ψp\psi_{p} has an almost sure limit ψ=(h,b)(0,1)\psi_{\infty}=(h,b)_{\infty}\in(0,1), and hence τp\tau_{p} is bounded in pp almost surely.

Let Ω1Ω\Omega_{1}\subset\Omega be the almost sure set for which the conclusions of Proposition 5.2 hold. Define the notation

ap(ω)=h^LphLpa_{p}(\omega)=||\hat{h}_{L_{p}}-h_{L_{p}}||

and

γp=(h,b)(b,proj𝐿(h))1proj𝐿(h)2.\gamma_{p}=\frac{(h,b)-(b,\underset{L}{\operatorname{proj}}(h))}{1-||\underset{L}{\operatorname{proj}}(h)||^{2}}.

The proof will follow steps 1-4 below:

  1.

    For every ωΩ1\omega\in\Omega_{1} and sub-sequence {pk}k=1{p}1\{p_{k}\}_{k=1}^{\infty}\subset\{p\}_{1}^{\infty} satisfying

    lim supkprojLpk(b)(ω)<1\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(b)||(\omega)<1

    we prove

    0<lim infkγpk(ω)lim supkγpk(ω)<0<\liminf\limits_{k\rightarrow\infty}\gamma_{p_{k}}(\omega)\leq\limsup\limits_{k\rightarrow\infty}\gamma_{p_{k}}(\omega)<\infty

    and

    0<lim infkτpk(ω)lim supkτpk(ω)<.0<\liminf\limits_{k\rightarrow\infty}\tau_{p_{k}}(\omega)\leq\limsup\limits_{k\rightarrow\infty}\tau_{p_{k}}(\omega)<\infty.
  2.

    For every ωΩ1\omega\in\Omega_{1} and sub-sequence {pk}k=1{p}1\{p_{k}\}_{k=1}^{\infty}\subset\{p\}_{1}^{\infty} satisfying

    lim supkprojLpk(b)(ω)<1\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(b)||(\omega)<1

    we use step 1 to prove that \lim\limits_{k\rightarrow\infty}a_{p_{k}}(\omega)=0.

  3.

    Set Ω0={ωΩ|lim suppprojLp(b)2=1}\Omega_{0}=\{\omega\in\Omega\big{|}\limsup\limits_{p\rightarrow\infty}||\underset{L_{p}}{\operatorname{proj}}(b)||^{2}=1\}. Fix ωΩ0Ω1\omega\in\Omega_{0}\cap\Omega_{1} and prove using step 2 that limpap(ω)=0\lim\limits_{p\rightarrow\infty}a_{p}(\omega)=0

  4.

    Finish the proof by applying step 2 for all ωΩ0cΩ1\omega\in\Omega_{0}^{c}\cap\Omega_{1} when {pk}\{p_{k}\} is set to {p}\{p\}.

Step 1: Since ωΩ1\omega\in\Omega_{1} we have the following immediate implications of Proposition 5.2,

lim supkprojLpk(h)2=(h,b)2lim supkprojLpk(b)2.\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}=(h,b)_{\infty}^{2}\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(b)||^{2}. (43)
lim supk(b,projLpk(h))=(h,b)lim supkprojLpk(b)2.\limsup\limits_{k\rightarrow\infty}(b,\underset{L_{p_{k}}}{\operatorname{proj}}(h))=(h,b)_{\infty}\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(b)||^{2}. (44)

Using the assumption \limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(b)||^{2}<1, we can sharpen (43) and (44) to

lim supkprojLpk(h)2<(h,b)2<1\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}<(h,b)_{\infty}^{2}<1 (45)
lim supk(b,projLpk(h))<(h,b)\limsup\limits_{k\rightarrow\infty}(b,\underset{L_{p_{k}}}{\operatorname{proj}}(h))<(h,b)_{\infty} (46)

for the given ωΩ1\omega\in\Omega_{1}. We can use (45) on the numerator of τpk\tau_{p_{k}} to show,

\liminf\limits_{k\rightarrow\infty}\big{(}\psi_{p_{k}}^{2}-||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}\big{)}\geq\liminf\limits_{k\rightarrow\infty}\psi_{p_{k}}^{2}-\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}
=(h,b)2lim supkprojLpk(h)2>0.\displaystyle=(h,b)^{2}_{\infty}-\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}>0.

That together with the fact that the denominator of τpk\tau_{p_{k}} has a limit in (0,)(0,\infty) implies,

0<lim infkτpk(ω)lim supkτpk(ω)<0<\liminf\limits_{k\rightarrow\infty}\tau_{p_{k}}(\omega)\leq\limsup\limits_{k\rightarrow\infty}\tau_{p_{k}}(\omega)<\infty (47)

Similarly we can use (46) on the numerator of γpk\gamma_{p_{k}} as,

lim infk((h,b)(b,projLpk(h)))(h,b)lim supk(b,projLpk(h))>0.\liminf\limits_{k\rightarrow\infty}\big{(}(h,b)-(b,\underset{L_{p_{k}}}{\operatorname{proj}}(h))\big{)}\geq(h,b)_{\infty}-\limsup\limits_{k\rightarrow\infty}(b,\underset{L_{p_{k}}}{\operatorname{proj}}(h))>0. (48)

Also (45) can be used on the denominator of γpk\gamma_{p_{k}} as,

\liminf\limits_{k\rightarrow\infty}\big{(}1-||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}\big{)}=1-\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}>0 (49)

Using (48) and (49) we get,

0<lim infkγpk(ω)lim supkγpk(ω)<0<\liminf\limits_{k\rightarrow\infty}\gamma_{p_{k}}(\omega)\leq\limsup\limits_{k\rightarrow\infty}\gamma_{p_{k}}(\omega)<\infty (50)

for the given ωΩ1\omega\in\Omega_{1}. This completes the step 1.

Step 2: We have the following initial observation,

1proj<h,Lpk>(b)proj<h>(b)=(h,b)1\geq||\underset{<h,L_{p_{k}}>}{\operatorname{proj}}(b)||\geq||\underset{<h>}{\operatorname{proj}}(b)||=(h,b) (51)

and using that we get

1\geq\limsup\limits_{k\rightarrow\infty}||\underset{<h,L_{p_{k}}>}{\operatorname{proj}}(b)||\geq\liminf\limits_{k\rightarrow\infty}||\underset{<h,L_{p_{k}}>}{\operatorname{proj}}(b)||\geq(h,b)_{\infty}>0.

Given that, in order to show \lim\limits_{k\rightarrow\infty}a_{p_{k}}(\omega)=0, it suffices to show that \tau_{p_{k}}h+\underset{L_{p_{k}}}{\operatorname{proj}}(h) converges to a scalar multiple of \underset{<h,L_{p_{k}}>}{\operatorname{proj}}(b), since that scalar clears after normalizing the vectors. To that end, let us rewrite \underset{<h,L_{p_{k}}>}{\operatorname{proj}}(b) as

proj<h,Lpk>(b)\displaystyle\underset{<h,L_{p_{k}}>}{\operatorname{proj}}(b) =proj<hprojLpk(h),Lpk>(b)\displaystyle=\underset{<h-\underset{L_{p_{k}}}{\operatorname{proj}}(h),L_{p_{k}}>}{\operatorname{proj}}(b)
=projLpk(b)+(hprojLpk(h)hprojLpk(h),b)hprojLpk(h)hprojLpk(h)\displaystyle=\underset{L_{p_{k}}}{\operatorname{proj}}(b)+\bigg{(}\frac{h-\underset{L_{p_{k}}}{\operatorname{proj}}(h)}{||h-\underset{L_{p_{k}}}{\operatorname{proj}}(h)||},b\bigg{)}\frac{h-\underset{L_{p_{k}}}{\operatorname{proj}}(h)}{||h-\underset{L_{p_{k}}}{\operatorname{proj}}(h)||}
=projLpk(b)+γpk(hprojLpk(h))\displaystyle=\underset{L_{p_{k}}}{\operatorname{proj}}(b)+\gamma_{p_{k}}(h-\underset{L_{p_{k}}}{\operatorname{proj}}(h)) (52)
=γpk(h+1γpkprojLpk(b)projLpk(h)).\displaystyle=\gamma_{p_{k}}(h+\frac{1}{\gamma_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(b)-\underset{L_{p_{k}}}{\operatorname{proj}}(h)). (53)

We also have,

τpkh+projLpk(h)=τpk(h+1τpkprojLpk(h)).\tau_{p_{k}}h+\underset{L_{p_{k}}}{\operatorname{proj}}(h)=\tau_{p_{k}}(h+\frac{1}{\tau_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(h)). (54)

Since we have τpk\tau_{p_{k}} and γpk\gamma_{p_{k}} satisfying (47) and (50) respectively, we have the equations (53) and (54) well defined asymptotically, which is sufficient for our purpose. Hence, from the above argument it is sufficient to show the convergence of h+1τpkprojLpk(h)h+\frac{1}{\tau_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(h) to h+1γpkprojLpk(b)projLpk(h)h+\frac{1}{\gamma_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(b)-\underset{L_{p_{k}}}{\operatorname{proj}}(h). That is equivalent to showing 1τpkprojLpk(h)\frac{1}{\tau_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(h) converges to 1γpkprojLpk(b)projLpk(h)\frac{1}{\gamma_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(b)-\underset{L_{p_{k}}}{\operatorname{proj}}(h). We can re-write the associated quantity as,

|1τpkprojLpk(h)(1γpkprojLpk(b)projLpk(h))|=|(1+1τpk)projLpk(h)1γpkprojLpk(b)|\big{|}\frac{1}{\tau_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(h)-\big{(}\frac{1}{\gamma_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(b)-\underset{L_{p_{k}}}{\operatorname{proj}}(h)\big{)}\big{|}=\big{|}(1+\frac{1}{\tau_{p_{k}}})\underset{L_{p_{k}}}{\operatorname{proj}}(h)-\frac{1}{\gamma_{p_{k}}}\underset{L_{p_{k}}}{\operatorname{proj}}(b)\big{|} (55)

Using Proposition 5.2 part 3 in (55), it is equivalent to prove
|(1+1τpk)(h,b)1γpk|\big{|}(1+\frac{1}{\tau_{p_{k}}})(h,b)-\frac{1}{\gamma_{p_{k}}}\big{|} converges to 0. We re-write it as

\big{|}(\frac{1}{\tau_{p_{k}}}+1)(h,b)-\frac{1}{\gamma_{p_{k}}}\big{|}=\bigg{|}\frac{(h,b)(1-||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2})}{\psi_{p_{k}}^{2}-||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}}-\frac{1-||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}}{(h,b)-(\underset{L_{p_{k}}}{\operatorname{proj}}(h),b)}\bigg{|}
=|1projLpk(h)2||(h,b)ψpk2projLpk(h)21(h,b)(projLpk(h),b)|\displaystyle=|1-||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}|\bigg{|}\frac{(h,b)}{\psi_{p_{k}}^{2}-||\underset{L_{p_{k}}}{\operatorname{proj}}(h)||^{2}}-\frac{1}{(h,b)-(\underset{L_{p_{k}}}{\operatorname{proj}}(h),b)}\bigg{|} (56)

Using parts (1) and (2) of Proposition 5.2 and the fact that ψpk2\psi_{p_{k}}^{2} converges to (h,b)2(h,b)_{\infty}^{2} shows that (56) converges to 0 for the given ωΩ1\omega\in\Omega_{1}. This completes step 2.

Step 3: Fix \omega\in\Omega_{0}\cap\Omega_{1}. To show that \lim\limits_{p\rightarrow\infty}a_{p}(\omega)=0, it suffices to show that for any sub-sequence \{p_{k}\}_{k=1}^{\infty}\subset\{p\}_{1}^{\infty} there exists a further sub-sequence \{s_{t}\}_{t=1}^{\infty} such that \lim\limits_{t\rightarrow\infty}a_{s_{t}}(\omega)=0. Let \{p_{k}\}_{k=1}^{\infty} be such a sub-sequence. We have one of the following cases,

lim supkprojLpk(b)(ω)2<1\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(b)||(\omega)^{2}<1

or

lim supkprojLpk(b)(ω)2=1\limsup\limits_{k\rightarrow\infty}||\underset{L_{p_{k}}}{\operatorname{proj}}(b)||(\omega)^{2}=1

If it is strictly less than 1, then we get from step 2 that \lim\limits_{k\rightarrow\infty}a_{p_{k}}(\omega)=0. In that case we take the further sub-sequence to be \{p_{k}\} itself.

If it is equal to 1, then we get a further sub-sequence \{s_{t}\} such that \lim\limits_{t\rightarrow\infty}||\underset{L_{s_{t}}}{\operatorname{proj}}(b)||^{2}=1. Using this and Proposition 5.2 we get the following,

limtprojLst(h)2=(h,b)2  and limt(b,projLst(h))=(h,b)\lim\limits_{t\rightarrow\infty}||\underset{L_{s_{t}}}{\operatorname{proj}}(h)||^{2}=(h,b)^{2}_{\infty}\text{ }\text{ and }\lim\limits_{t\rightarrow\infty}(b,\underset{L_{s_{t}}}{\operatorname{proj}}(h))=(h,b)_{\infty}

which implies limtτst(ω)=limtγst(ω)=0\lim\limits_{t\rightarrow\infty}\tau_{s_{t}}(\omega)=\lim\limits_{t\rightarrow\infty}\gamma_{s_{t}}(\omega)=0. Using this on the definition of h^L\hat{h}_{L} and the equation (52) we get,

limth^LstprojLst(h)projLst(h)=0  and  limthLstprojLst(b)projLst(b)=0\lim\limits_{t\rightarrow\infty}\big{|}\big{|}\hat{h}_{L_{s_{t}}}-\frac{\underset{L_{s_{t}}}{\operatorname{proj}}(h)}{||\underset{L_{s_{t}}}{\operatorname{proj}}(h)||}\big{|}\big{|}=0\text{ }\text{ and }\text{ }\lim\limits_{t\rightarrow\infty}\big{|}\big{|}h_{L_{s_{t}}}-\frac{\underset{L_{s_{t}}}{\operatorname{proj}}(b)}{||\underset{L_{s_{t}}}{\operatorname{proj}}(b)||}\big{|}\big{|}=0 (57)

We can now decompose ast=h^LsthLsta_{s_{t}}=||\hat{h}_{L_{s_{t}}}-h_{L_{s_{t}}}|| into familiar components via the triangle inequality as follows,

ast=||h^LsthLst|||\displaystyle a_{s_{t}}=||\hat{h}_{L_{s_{t}}}-h_{L_{s_{t}}}||\leq\big{|} |h^LstprojLst(h)projLst(h)||+||hLstprojLst(b)projLst(b)||\displaystyle\big{|}\hat{h}_{L_{s_{t}}}-\frac{\underset{L_{s_{t}}}{\operatorname{proj}}(h)}{||\underset{L_{s_{t}}}{\operatorname{proj}}(h)||}\big{|}\big{|}+\big{|}\big{|}h_{L_{s_{t}}}-\frac{\underset{L_{s_{t}}}{\operatorname{proj}}(b)}{||\underset{L_{s_{t}}}{\operatorname{proj}}(b)||}\big{|}\big{|}
+projLst(b)projLst(b)projLst(h)projLst(h)\displaystyle+\big{|}\big{|}\frac{\underset{L_{s_{t}}}{\operatorname{proj}}(b)}{||\underset{L_{s_{t}}}{\operatorname{proj}}(b)||}-\frac{\underset{L_{s_{t}}}{\operatorname{proj}}(h)}{||\underset{L_{s_{t}}}{\operatorname{proj}}(h)||}\big{|}\big{|}

Using (57), we know that the first and the second terms on the right hand side converge to 0 for the given ωΩ0Ω1\omega\in\Omega_{0}\cap\Omega_{1}. Since we have limtprojLst(h)2=(h,b)2\lim\limits_{t\rightarrow\infty}||\underset{L_{s_{t}}}{\operatorname{proj}}(h)||^{2}=(h,b)^{2}_{\infty} and limtprojLst(b)2=1\lim\limits_{t\rightarrow\infty}||\underset{L_{s_{t}}}{\operatorname{proj}}(b)||^{2}=1, proving the third term on the right hand side converges to 0 is equivalent to proving

limtprojLst(h)(h,b)projLst(b)=0,\lim\limits_{t\rightarrow\infty}\big{|}\big{|}\underset{L_{s_{t}}}{\operatorname{proj}}(h)-(h,b)\underset{L_{s_{t}}}{\operatorname{proj}}(b)\big{|}\big{|}=0,

which is true by Proposition 5.2. This completes step 3.

Step 4: In step 3 we proved the theorem for every \omega\in\Omega_{0}\cap\Omega_{1}. Replacing \{p_{k}\} in step 2 by the whole sequence of indices \{p\}, we get the theorem for every \omega\in\Omega_{0}^{c}\cap\Omega_{1}. Together these show that

\lim\limits_{p\rightarrow\infty}a_{p}(\omega)=0\text{ }\text{ for all }\omega\in\Omega_{1}

which completes the proof of Theorem 5.1. ∎

5.2 Proof of Theorem 2.2

The proof of Theorem 2.2(a) is an immediate application of Theorem 5.1.

Proof of Theorem 2.2(a).

From the definitions of hLh_{L} and hqh_{q}, and as long as qLpq\in L_{p}, we have

hLpbhqb||h_{L_{p}}-b||\leq||h_{q}-b||

and therefore

h^Lpb\displaystyle||\hat{h}_{L_{p}}-b|| \displaystyle\leq h^LphLp+hLpb\displaystyle||\hat{h}_{L_{p}}-h_{L_{p}}||+||h_{L_{p}}-b||
\displaystyle\leq h^LphLp+hqb\displaystyle||\hat{h}_{L_{p}}-h_{L_{p}}||+||h_{q}-b||
\displaystyle\leq h^LphLp+h^qb\displaystyle||\hat{h}_{L_{p}}-h_{L_{p}}||+||\hat{h}_{q}-b||

since hqbh^qb||h_{q}-b||\leq||\hat{h}_{q}-b|| for all pp. Applying Theorem 5.1 gives

lim suph^Lpbh^qb.\limsup||\hat{h}_{L_{p}}-b||\leq||\hat{h}_{q}-b||_{\infty}.

To prove the remainder of Theorem 2.2 we need the following intermediate result concerning Haar random subspaces, proved in Section 6.

Proposition 5.3.

Suppose, for each pp, zpz_{p} is a (possibly random) point in 𝕊p1\mathbb{S}^{p-1} and p\mathcal{H}_{p} is a Haar random subspace of p\mathbb{R}^{p} that is independent of zpz_{p}. Assume the sequence {dimp}\{\dim\mathcal{H}_{p}\} is square root dominated.

Then

limpprojp(zp)2=0 almost surely.\lim\limits_{p\rightarrow\infty}||\underset{\mathcal{H}_{p}}{\operatorname{proj}}(z_{p})||^{2}=0\text{ almost surely.}
Proof of Theorem 2.2(b,c).

Theorem 5.1 is applicable. Hence, it suffices to prove the results for the oracle version of the MAPS estimator.

Since the scalars clear after normalization, it suffices to prove the following assertions,

limpproj<h,>(b)proj<h>(b)2=0\lim\limits_{p\rightarrow\infty}||\underset{<h,\mathcal{H}>}{\operatorname{proj}}(b)-\underset{<h>}{\operatorname{proj}}(b)||_{2}=0 (58)

and

limpproj<h,q,>(b)proj<h,q>(b)2=0.\lim\limits_{p\rightarrow\infty}||\underset{<h,q,\mathcal{H}>}{\operatorname{proj}}(b)-\underset{<h,q>}{\operatorname{proj}}(b)||_{2}=0. (59)

We first consider (58), rewriting the left hand side as

limpproj(b)+projhproj(h)(b)proj<h>(b)2\displaystyle\lim\limits_{p\rightarrow\infty}||\underset{\mathcal{H}}{\operatorname{proj}}(b)+\underset{h-\underset{\mathcal{H}}{\operatorname{proj}}(h)}{\operatorname{proj}}(b)-\underset{<h>}{\operatorname{proj}}(b)||_{2}
proj(b)2+projhproj(h)(b)proj<h>(b)2\displaystyle\leq||\underset{\mathcal{H}}{\operatorname{proj}}(b)||_{2}+||\underset{h-\underset{\mathcal{H}}{\operatorname{proj}}(h)}{\operatorname{proj}}(b)-\underset{<h>}{\operatorname{proj}}(b)||_{2} (60)

The first term of (60) converges to 0 by setting z=b in Proposition 5.3. Moreover, Propositions 5.3 and 5.2 imply that \underset{\mathcal{H}}{\operatorname{proj}}(h) converges to the origin in the l_{2} norm, so h-\underset{\mathcal{H}}{\operatorname{proj}}(h) converges to h in the l_{2} norm. That implies the second term in (60) converges to 0, which in turn proves (58).

Next, rewrite the expression in the assertion (59) as,

proj(b)+proj<hproj(h),qproj(q)>(b)proj<h,q>(b)\displaystyle||\underset{\mathcal{H}}{\operatorname{proj}}(b)+\underset{<h-\underset{\mathcal{H}}{\operatorname{proj}}(h),q-\underset{\mathcal{H}}{\operatorname{proj}}(q)>}{\operatorname{proj}}(b)-\underset{<h,q>}{\operatorname{proj}}(b)||
proj(b)+proj<hproj(h),qproj(q)>(b)proj<h,q>(b)\displaystyle\leq||\underset{\mathcal{H}}{\operatorname{proj}}(b)||+||\underset{<h-\underset{\mathcal{H}}{\operatorname{proj}}(h),q-\underset{\mathcal{H}}{\operatorname{proj}}(q)>}{\operatorname{proj}}(b)-\underset{<h,q>}{\operatorname{proj}}(b)|| (61)

Similarly the first term of (61) converges to 0 by Proposition 5.3. Note that Proposition 5.3 also applies when we set z=q, and hence \underset{\mathcal{H}}{\operatorname{proj}}(q) converges to the origin in the l_{2} norm. Consequently the spanning vectors of <h-\underset{\mathcal{H}}{\operatorname{proj}}(h),q-\underset{\mathcal{H}}{\operatorname{proj}}(q)> converge to those of <h,q>, which implies the second term of (61) converges to 0 as well. That completes the proof. ∎

5.3 Proof of Theorem 2.3

We need the following lemma.

Lemma 5.4.

Let 𝒫(p)\mathcal{P}(p) be a sequence of uniform β\beta-ordered partitions such that limp|𝒫(p)|=\lim\limits_{p\rightarrow\infty}|\mathcal{P}(p)|=\infty. Then for Lp=L(𝒫(p))L_{p}=L(\mathcal{P}(p)) we have,

limpproj𝐿(b)=1\lim\limits_{p\rightarrow\infty}||\underset{L}{\operatorname{proj}}(b)||=1 (62)

almost surely.

Proof.

To be more precise about L=L(𝒫)L=L(\mathcal{P}), set 𝒫(p)={I1,I2,,Ikp}\mathcal{P}(p)=\{I_{1},I_{2},...,I_{k_{p}}\} and denote the defining basis of the corresponding subspace Lp=L(𝒫)L_{p}=L(\mathcal{P}) by the orthonormal set {v1,v2,,vkp}\{v_{1},v_{2},...,v_{k_{p}}\}.

Then

1-||\underset{L}{\operatorname{proj}}(b)||^{2} =\sum\limits_{i=1}^{p}b_{i}^{2}-\sum\limits_{i=1}^{k_{p}}(b,v_{i})^{2}
=\frac{1}{||\beta||^{2}}\sum\limits_{i=1}^{k_{p}}\Big{(}\sum\limits_{j\in I_{i}}\beta_{j}^{2}-\frac{1}{|I_{i}|}\big{(}\sum\limits_{j\in I_{i}}\beta_{j}\big{)}^{2}\Big{)}
=\frac{1}{||\beta||^{2}}\sum\limits_{i=1}^{k_{p}}\sum\limits_{j\in I_{i}}\Big{(}\beta_{j}-\frac{1}{|I_{i}|}\sum\limits_{l\in I_{i}}\beta_{l}\Big{)}^{2} (63)

Now define the random variables ai=maxjIi(βj)a_{i}=\underset{j\in I_{i}}{max}(\beta_{j}), ci=minjIi(βj)c_{i}=\underset{j\in I_{i}}{min}(\beta_{j}) for all 1ikp1\leq i\leq k_{p}. Without loss of generality, ckpakpc1a1c_{k_{p}}\leq a_{k_{p}}\leq...\leq c_{1}\leq a_{1}. Since the sequence {𝒫(p)}\{\mathcal{P}(p)\} is uniform, there exists M>0M>0 such that

maxI𝒫(p)|I|Mp|𝒫(p)|.\max\limits_{I\in\mathcal{P}(p)}|I|\leq\frac{Mp}{|\mathcal{P}(p)|}. (64)

Then

\lim\limits_{p\rightarrow\infty}\frac{1}{||\beta||^{2}}\sum\limits_{i=1}^{k_{p}}\sum\limits_{j\in I_{i}}\Big{(}\beta_{j}-\frac{1}{|I_{i}|}\sum\limits_{l\in I_{i}}\beta_{l}\Big{)}^{2} \leq\lim\limits_{p\rightarrow\infty}\frac{1}{||\beta||^{2}}\sum\limits_{i=1}^{k_{p}}|I_{i}|(a_{i}-c_{i})^{2}
\leq\lim\limits_{p\rightarrow\infty}\frac{\frac{Mp}{k_{p}}}{||\beta||^{2}}\sum\limits_{i=1}^{k_{p}}(a_{i}-c_{i})^{2} (65)
\leq\lim\limits_{p\rightarrow\infty}\frac{M}{\frac{||\beta||^{2}}{p}}\frac{1}{k_{p}}(a_{1}-c_{k_{p}})^{2} (66)

The last inequality uses the ordering c_{k_{p}}\leq a_{k_{p}}\leq\dots\leq c_{1}\leq a_{1}: the nonnegative increments a_{i}-c_{i} sum to at most a_{1}-c_{k_{p}}, so \sum_{i=1}^{k_{p}}(a_{i}-c_{i})^{2}\leq(a_{1}-c_{k_{p}})^{2}.

The term a1ckpa_{1}-c_{k_{p}} appearing in (66) is uniformly bounded since the β\beta’s are uniformly bounded. Also, β2p\frac{||\beta||^{2}}{p} is finite and away from zero asymptotically. Using those together with the fact that limpkp=\lim\limits_{p\rightarrow\infty}k_{p}=\infty we get the limit in (66) equal to 0 for any realization of the random variables β\beta. Note that this is stronger than almost sure convergence. ∎
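A quick numerical illustration of Lemma 5.4 (not part of the proof): take the normalized indicator vectors of a uniform beta-ordered partition as the defining basis of L(\mathcal{P}) and observe ||\underset{L}{\operatorname{proj}}(b)|| approach 1 as the number of groups grows. The beta distribution below is a placeholder.

    import numpy as np

    rng = np.random.default_rng(2)
    p = 5000
    beta = 0.5 + rng.random(p)                  # placeholder bounded betas
    b = beta / np.linalg.norm(beta)

    order = np.argsort(beta)                    # a beta-ordered partition groups sorted betas
    for k in (1, 10, 50, 200):
        groups = np.array_split(order, k)       # uniform partition into k contiguous groups
        # ||proj_L(b)||^2 = sum_i (b, v_i)^2 with v_i the normalized indicator of group I_i
        proj_sq = sum(b[g].sum() ** 2 / len(g) for g in groups)
        print(k, np.sqrt(proj_sq))              # increases toward 1 as k grows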

Proof of Theorem 2.3.

By an application of Theorem 5.1 it suffices to prove the theorem for the oracle version of the MAPS estimator. Now

bproj<h,L>(b)2bproj𝐿(b)2=1proj𝐿(b)2||b-\underset{<h,L>}{\operatorname{proj}}(b)||^{2}\leq||b-\underset{L}{\operatorname{proj}}(b)||^{2}=1-||\underset{L}{\operatorname{proj}}(b)||^{2} (67)

and note that application of Lemma 5.4 shows that proj𝐿(b)||\underset{L}{\operatorname{proj}}(b)|| converges to 1 as pp tends to \infty. ∎

5.4 Proof of Theorem 2.4(a)

The proof of Theorem 2.4 requires the following proposition, from which part (a) of the theorem easily follows. The proof of the proposition, along with the more difficult proof of the strict inequality in Theorem 2.4(b), appears in Section 6.

Recall that h1,h2h_{1},h_{2} and hh are the PCA leading eigenvectors of the sample covariance matrices of the returns R1,R2R_{1},R_{2} and RR, respectively.

Proposition 5.5.

For each pp there is a vector h~\tilde{h} in the linear subspace LRpL\subset R^{p} generated by h1h_{1} and h2h_{2} such that limph~h=0\lim\limits_{p\rightarrow\infty}||\tilde{h}-h||=0 almost surely.

Proof of Theorem 2.4(a).

Since dim(Lp)=2dim(L_{p})=2 and Lp=span(h1,q)L_{p}=span(h_{1},q) is independent of the asset specific portion Z2Z_{2} of the current block, Theorem 2.1 implies that h^L\hat{h}_{L} converges to hLh_{L} almost surely in l2l_{2} norm. Hence it suffices to establish the result for the oracle versions of the MAPS and the GPS estimators.

Note

(hL,b)=projspan(q,h1,h2)(b)(h_{L},b)=||\underset{span(q,h_{1},h_{2})}{\operatorname{proj}}(b)|| (68)
(hqs,b)=projspan(q,h2)(b)(h_{q}^{s},b)=||\underset{span(q,h_{2})}{\operatorname{proj}}(b)|| (69)
(hqd,b)=projspan(q,h)(b)(h_{q}^{d},b)=||\underset{span(q,h)}{\operatorname{proj}}(b)|| (70)

Using Proposition 5.5 we know there exists \tilde{h}\in span(h_{1},h_{2}) such that \tilde{h} converges to h in l_{2} almost surely. Since span(q,\tilde{h})\subset span(q,h_{1},h_{2}),

projspan(q,h1,h2)(b)projspan(q,h~)(b).||\underset{span(q,h_{1},h_{2})}{\operatorname{proj}}(b)||\geq||\underset{span(q,\tilde{h})}{\operatorname{proj}}(b)||.

Taking the limits of both sides we get

limp(hL,b)=limpprojspan(q,h1,h2)(b)limpprojspan(q,h)(b)=limp(hqd,b).\lim\limits_{p\rightarrow\infty}(h_{L},b)=\lim\limits_{p\rightarrow\infty}||\underset{span(q,h_{1},h_{2})}{\operatorname{proj}}(b)||\geq\lim\limits_{p\rightarrow\infty}||\underset{span(q,h)}{\operatorname{proj}}(b)||=\lim\limits_{p\rightarrow\infty}(h^{d}_{q},b). (71)

Similarly, since span(q,h_{2})\subset span(q,h_{1},h_{2}),

\lim\limits_{p\rightarrow\infty}(h_{L},b)=\lim\limits_{p\rightarrow\infty}||\underset{span(q,h_{1},h_{2})}{\operatorname{proj}}(b)||\geq\lim\limits_{p\rightarrow\infty}||\underset{span(q,h_{2})}{\operatorname{proj}}(b)||=\lim\limits_{p\rightarrow\infty}(h^{s}_{q},b). (72)

Inequalities (71) and (72) complete the proof of Theorem 2.4(a). ∎
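The oracle comparison behind (68), (69) and (70) is easy to probe numerically. The sketch below (an illustration under placeholder parameters, not part of the proof) simulates two blocks with serially correlated betas and prints the three projection norms; the first always dominates the second, and for large p it is typically at least as large as the third.

    import numpy as np

    rng = np.random.default_rng(5)
    p, n = 2000, 12

    beta1 = 1.0 + 0.4 * rng.standard_normal(p)                 # previous-block betas (placeholder)
    beta2 = 0.8 * beta1 + 0.2 + 0.2 * rng.standard_normal(p)   # serially correlated current betas

    def block(beta):
        X = 0.05 * rng.standard_normal(n)
        Z = 0.15 * rng.standard_normal((p, n))
        return beta[:, None] * X[None, :] + Z

    R1, R2 = block(beta1), block(beta2)
    b = beta2 / np.linalg.norm(beta2)                          # estimation target
    q = np.ones(p) / np.sqrt(p)

    def pca(R):
        _, vecs = np.linalg.eigh(R @ R.T / R.shape[1])
        v = vecs[:, -1]
        return v if v.sum() >= 0 else -v

    h1, h2, h = pca(R1), pca(R2), pca(np.hstack([R1, R2]))

    def proj_norm(vectors, target):
        Q, _ = np.linalg.qr(np.column_stack(vectors))
        return float(np.linalg.norm(Q.T @ target))

    print(proj_norm([q, h1, h2], b))   # (h_L, b): dynamical MAPS oracle, cf. (68)
    print(proj_norm([q, h2], b))       # (h_q^s, b): single-block GPS oracle, cf. (69)
    print(proj_norm([q, h], b))        # (h_q^d, b): double-block GPS oracle, cf. (70)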

6 Supplemental Proofs

This section is devoted to four tasks: proving Proposition 5.2, Proposition 5.3, Proposition 5.5, and part (b) of Theorem 2.4.

6.1 Proof of Proposition 5.2

The following is equation (33) from Goldberg et al., [2021]:

h=βXTϕspn+Zϕspnh=\frac{\beta X^{T}\phi}{s_{p}\sqrt{n}}+\frac{Z\phi}{s_{p}\sqrt{n}} (73)

where sp2s_{p}^{2} is the eigenvalue corresponding to hh and ϕ\phi is the right singular vector of 1nR\frac{1}{\sqrt{n}}R. For notational convenience we set

\Gamma_{p}=\frac{|\beta|X^{T}\phi}{s_{p}\sqrt{n}}.
Lemma 6.1 (Goldberg et al., [2021]).

Under the assumptions 1,2,3 and 4 we have

  1. \lim\limits_{p\rightarrow\infty}\Gamma_{p}=(h,b)_{\infty},

  2. \lim\limits_{p\rightarrow\infty}\frac{s_{p}^{2}}{p}=\sigma_{\beta}^{2}+\frac{\delta^{2}}{n}, and

  3. \lim\limits_{p\rightarrow\infty}\phi_{p}=\frac{X}{|X|}

almost surely.

We need one more lemma to prepare for the proof of Proposition 5.2. Let ZkpZ_{k}\in\mathbb{R}^{p} denote the kkth column of ZZ. From our assumptions we know

  • for each k=1,,nk=1,\dots,n, {Zk(i)=Zik:i{1,2,,p}}\{Z_{k}(i)=Z_{ik}:i\in\{1,2,...,p\}\} is an independent set of mean zero random variables, and

  • there exists M(0,)M\in(0,\infty) such that 𝔼[Zk(i)4]M\mathbb{E}[Z_{k}(i)^{4}]\leq M for every i{1,2,,p}i\in\{1,2,...,p\}.

Lemma 6.2.

Let k{1,,n}k\in\{1,\dots,n\} and let uu be a random unit vector of dimension pp and independent of ZkZ_{k}. Then

𝔼[(u,Zk)]=0 and 𝔼[(u,Zk)4]3M.\mathbb{E}[(u,Z_{k})]=0\text{ and }\mathbb{E}[(u,Z_{k})^{4}]\leq 3M. (74)
Proof.

For notational convenience set X=ZkX=Z_{k}.

The first assertion of (74) is immediate, since u is independent of X and \mathbb{E}[X_{i}]=0 for every i; we prove the moment bound. Let \theta=(\theta_{1},\theta_{2},...,\theta_{p})\in\mathbb{N}^{p} with \sum\limits_{i=1}^{p}\theta_{i}=4. For such a \theta, define u^{\theta}=\prod\limits_{i=1}^{p}u_{i}^{\theta_{i}}, and define X^{\theta} similarly. Expanding by the multinomial theorem,

\mathbb{E}[(u,X)^{4}]=\sum\limits_{\theta}\frac{4!}{\theta_{1}!\,\theta_{2}!\cdots\theta_{p}!}\,\mathbb{E}[u^{\theta}X^{\theta}]. (75)

Since the X_{i} are independent with mean 0 and u is independent of X, any \theta with an entry equal to 1 makes the corresponding term \mathbb{E}[u^{\theta}X^{\theta}] vanish. Hence we continue from the sum in (75) as,

=4!2!2!i<j𝔼[ui2uj2Xi2Xj2]+i𝔼[ui4Xi4]\displaystyle=\frac{4!}{2!2!}\sum\limits_{i<j}\mathbb{E}[u_{i}^{2}u_{j}^{2}X_{i}^{2}X_{j}^{2}]+\sum\limits_{i}\mathbb{E}[u_{i}^{4}X_{i}^{4}]
=6i<j𝔼[ui2uj2]𝔼[Xi2Xj2]+i𝔼[ui4]𝔼[Xi4]\displaystyle=6\sum\limits_{i<j}\mathbb{E}[u_{i}^{2}u_{j}^{2}]\mathbb{E}[X_{i}^{2}X_{j}^{2}]+\sum\limits_{i}\mathbb{E}[u_{i}^{4}]\mathbb{E}[X_{i}^{4}]
M(6i<j𝔼[ui2uj2]+i𝔼[ui4])\displaystyle\leq M\big{(}6\sum\limits_{i<j}\mathbb{E}[u_{i}^{2}u_{j}^{2}]+\sum\limits_{i}\mathbb{E}[u_{i}^{4}]\big{)} (76)

The last inequality in (76) follows from the assumption \mathbb{E}[X_{i}^{4}]\leq M and an application of the Cauchy-Schwarz inequality

𝔼[Xi2Xj2]𝔼[Xi4]𝔼[Xj4]M.\mathbb{E}[X_{i}^{2}X_{j}^{2}]\leq\sqrt{\mathbb{E}[X_{i}^{4}]\mathbb{E}[X_{j}^{4}]}\leq M.

We can continue from (76) with

=3M𝔼[(i=1pui2)2]2M𝔼[i=1pui4]3M𝔼[(i=1pui2)2]=3M=3M\mathbb{E}\big{[}(\sum\limits_{i=1}^{p}u_{i}^{2})^{2}\big{]}-2M\mathbb{E}[\sum\limits_{i=1}^{p}u_{i}^{4}]\leq 3M\mathbb{E}\big{[}(\sum\limits_{i=1}^{p}u_{i}^{2})^{2}\big{]}=3M (77)

completing the proof of Lemma 6.2. ∎

Lemma 6.3.

For each pp, let LpL_{p} be a (possibly random) linear subspace of p\mathbb{R}^{p}, with dimension kpk_{p}, that is independent of ZpZ^{p}. Assume the sequence {kp}\{k_{p}\} is square root dominated. Let ZspZ^{p}_{s} denote the ssth column of ZpZ^{p}.

Then, for any s{1,2,,n}s\in\{1,2,...,n\},

limp1pproj𝐿(Zsp)2=0\lim\limits_{p\rightarrow\infty}\frac{1}{p}||\underset{L}{\operatorname{proj}}(Z^{p}_{s})||^{2}=0 (78)

almost surely.

Proof.

Note first that for any p with k_{p}=0 we have ||\underset{L}{\operatorname{proj}}(Z^{p}_{s})||=0, so without loss of generality we can assume k_{p}>0 for all p. Under that assumption, there exists an orthonormal basis \{u_{1},u_{2},...,u_{k_{p}}\} of L_{p} that is independent of Z^{p}. Then we can rewrite the expression in (78) as

1pproj𝐿(Zsp)2\displaystyle\frac{1}{p}||\underset{L}{\operatorname{proj}}(Z^{p}_{s})||^{2} =1pi=1kp(ui,Zsp)2\displaystyle=\frac{1}{p}\sum\limits_{i=1}^{k_{p}}(u_{i},Z^{p}_{s})^{2} (79)

Set Y_{i}=(u_{i},Z_{s}) and observe that Y_{i} depends on the selection of the orthonormal basis but \frac{1}{p}\sum\limits_{i=1}^{k_{p}}Y_{i}^{2} does not. Now Lemma 6.2 implies

𝔼[Yi4]<3M\mathbb{E}[Y_{i}^{4}]<3M (80)

for each i. Using that together with an application of the Cauchy-Schwarz inequality shows,

𝔼[Yi2Yj2]𝔼[Yi4]𝔼[Yj4]<3M\mathbb{E}[Y_{i}^{2}Y_{j}^{2}]\leq\sqrt{\mathbb{E}[Y_{i}^{4}]\mathbb{E}[Y_{j}^{4}]}<3M (81)

for any i\neq j. Reading (81) together with (80), we see that (81) holds for every combination of i and j, including the case i=j. We want to prove,

limp1pi=1kpYi2=0.\lim\limits_{p\rightarrow\infty}\frac{1}{p}\sum\limits_{i=1}^{k_{p}}Y_{i}^{2}=0. (82)

Using an application of Chebyshev's inequality together with (80) and (81), we argue as follows,

(proj𝐿(Zsp)2>ϵp)\displaystyle\mathbb{P}(||\underset{L}{\operatorname{proj}}(Z^{p}_{s})||^{2}>\epsilon p) =(i=1kpYi2>ϵp)\displaystyle=\mathbb{P}(\sum\limits_{i=1}^{k_{p}}Y_{i}^{2}>\epsilon p)
<𝔼[(i=1kpYi2)2]ϵ2p2\displaystyle<\frac{\mathbb{E}[(\sum\limits_{i=1}^{k_{p}}Y_{i}^{2})^{2}]}{\epsilon^{2}p^{2}}
<3Mkp2ϵ2p2\displaystyle<\frac{3Mk_{p}^{2}}{\epsilon^{2}p^{2}} (83)

Note that the event Ap:={proj𝐿(Zsp)2>ϵp}A_{p}:=\{||\underset{L}{\operatorname{proj}}(Z^{p}_{s})||^{2}>\epsilon p\} does not depend on the selection of the orthonormal basis. Since {kp}\{k_{p}\} is square root dominated, we get

p=1(Ap)=p=1(proj𝐿(Zsp)2>ϵp)3Mϵ2p=1kp2p2<.\sum\limits_{p=1}^{\infty}\mathbb{P}(A_{p})=\sum\limits_{p=1}^{\infty}\mathbb{P}(||\underset{L}{\operatorname{proj}}(Z^{p}_{s})||^{2}>\epsilon p)\leq\frac{3M}{\epsilon^{2}}\sum\limits_{p=1}^{\infty}\frac{k_{p}^{2}}{p^{2}}<\infty. (84)

Since \epsilon was arbitrary, an application of the Borel-Cantelli lemma yields,

limp1pproj𝐿(Zsp)2=0 almost surely .\lim\limits_{p\rightarrow\infty}\frac{1}{p}||\underset{L}{\operatorname{proj}}(Z^{p}_{s})||^{2}=0\text{ almost surely }.

Proof of Proposition 5.2.

Consider (h,proj𝐿(h))(h,\underset{L}{\operatorname{proj}}(h)) and let {u1,u2,,ukp}\{u_{1},u_{2},...,u_{k_{p}}\} be an orthonormal basis of LL. Using (73) and setting ϵip=uiTZpϕspn\epsilon_{i}^{p}=\frac{u_{i}^{T}Z}{\sqrt{p}}\frac{\phi}{\frac{s}{\sqrt{p}}\sqrt{n}}, we obtain

(h,proj𝐿(h))\displaystyle(h,\underset{L}{\operatorname{proj}}(h)) =i=1kp(h,ui)2\displaystyle=\sum\limits_{i=1}^{k_{p}}(h,u_{i})^{2}
=i=1kp(Γp(b,ui)+uiTZpϕspn)2\displaystyle=\sum\limits_{i=1}^{k_{p}}(\Gamma_{p}(b,u_{i})+\frac{u_{i}^{T}Z}{\sqrt{p}}\frac{\phi}{\frac{s}{\sqrt{p}}\sqrt{n}})^{2}
=Γp2i=1kp(b,ui)2+2Γpi=1kp(b,ui)ϵip+i=1kp(ϵip)2\displaystyle=\Gamma_{p}^{2}\sum\limits_{i=1}^{k_{p}}(b,u_{i})^{2}+2\Gamma_{p}\sum\limits_{i=1}^{k_{p}}(b,u_{i})\epsilon_{i}^{p}+\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2}
=Γp2(b,proj𝐿(b))+2Γpi=1kp(b,ui)ϵip+i=1kp(ϵip)2.\displaystyle=\Gamma_{p}^{2}(b,\underset{L}{\operatorname{proj}}(b))+2\Gamma_{p}\sum\limits_{i=1}^{k_{p}}(b,u_{i})\epsilon_{i}^{p}+\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2}. (85)

We can relate the third term in (85) to Lemma 6.3 as follows,

i=1kp(ϵip)2\displaystyle\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2} =1nsp2p1pi=1kpl,sϕlϕs(ui,Zl)(ui,Zs)\displaystyle=\frac{1}{n\frac{s_{p}^{2}}{p}}\frac{1}{p}\sum\limits_{i=1}^{k_{p}}\sum\limits_{l,s}\phi_{l}\phi_{s}(u_{i},Z_{l})(u_{i},Z_{s})
=1nsp2pl,sϕlϕs1pi=1kp(ui,Zl)(ui,Zs)\displaystyle=\frac{1}{n\frac{s_{p}^{2}}{p}}\sum\limits_{l,s}\phi_{l}\phi_{s}\frac{1}{p}\sum\limits_{i=1}^{k_{p}}(u_{i},Z_{l})(u_{i},Z_{s})
=1nsp2pl,sϕlϕs1p(Zs,proj𝐿(Zl)).\displaystyle=\frac{1}{n\frac{s_{p}^{2}}{p}}\sum\limits_{l,s}\phi_{l}\phi_{s}\frac{1}{p}(Z_{s},\underset{L}{\operatorname{proj}}(Z_{l})). (86)

Using the Cauchy-Schwarz inequality yields

1p(Zs,proj𝐿(Zl))\displaystyle\frac{1}{p}(Z_{s},\underset{L}{\operatorname{proj}}(Z_{l})) =1p(proj𝐿(Zs),proj𝐿(Zl))\displaystyle=\frac{1}{p}(\underset{L}{\operatorname{proj}}(Z_{s}),\underset{L}{\operatorname{proj}}(Z_{l}))
1pproj𝐿(Zs)21pproj𝐿(Zl)2\displaystyle\leq\sqrt{\frac{1}{p}||\underset{L}{\operatorname{proj}}(Z_{s})||^{2}}\sqrt{\frac{1}{p}||\underset{L}{\operatorname{proj}}(Z_{l})||^{2}}
=1p(Zs,proj𝐿(Zs))1p(Zl,proj𝐿(Zl))\displaystyle=\sqrt{\frac{1}{p}(Z_{s},\underset{L}{\operatorname{proj}}(Z_{s}))}\sqrt{\frac{1}{p}(Z_{l},\underset{L}{\operatorname{proj}}(Z_{l}))} (87)

Moreover, since Lemma 6.3 applies to each column of ZZ we get,

limp1p(Zs,proj𝐿(Zl))=0.\lim\limits_{p\rightarrow\infty}\frac{1}{p}(Z_{s},\underset{L}{\operatorname{proj}}(Z_{l}))=0. (88)

for any l and s. By Lemma 6.1, s_{p}^{2}/p is bounded below asymptotically, so \frac{1}{n\frac{s_{p}^{2}}{p}} has a finite limit supremum as p goes to infinity. Also, since \phi is a unit vector whose dimension n is fixed and finite, the double sum in (86) has finitely many terms with bounded coefficients, and we obtain

limpi=1kp(ϵip)2=0\lim\limits_{p\rightarrow\infty}\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2}=0 (89)

almost surely.

Now we are ready to prove the first part of Proposition 5.2:

|(h,proj𝐿(h))(h,b)2(b,proj𝐿(b))|\displaystyle|(h,\underset{L}{\operatorname{proj}}(h))-(h,b)^{2}(b,\underset{L}{\operatorname{proj}}(b))|
|Γp2(h,b)2|(b,proj𝐿(b))+2|Γpi=1kp(b,ui)ϵip|+i=1kp(ϵip)2\displaystyle\leq|\Gamma^{2}_{p}-(h,b)^{2}|(b,\underset{L}{\operatorname{proj}}(b))+2|\Gamma_{p}\sum\limits_{i=1}^{k_{p}}(b,u_{i})\epsilon_{i}^{p}|+\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2} (90)
=|Γp2(h,b)2|i=1kp(b,ui)2+2|Γpi=1kp(b,ui)ϵip|+i=1kp(ϵip)2\displaystyle=|\Gamma^{2}_{p}-(h,b)^{2}|\sum\limits_{i=1}^{k_{p}}(b,u_{i})^{2}+2|\Gamma_{p}\sum\limits_{i=1}^{k_{p}}(b,u_{i})\epsilon_{i}^{p}|+\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2} (91)
|Γp2(h,b)2|i=1kp(b,ui)2+2|Γp|i=1kp(b,ui)2i=1kp(ϵip)2+i=1kp(ϵip)2\displaystyle\leq|\Gamma^{2}_{p}-(h,b)^{2}|\sum\limits_{i=1}^{k_{p}}(b,u_{i})^{2}+2|\Gamma_{p}|\sqrt{\sum\limits_{i=1}^{k_{p}}(b,u_{i})^{2}}\sqrt{\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2}}+\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2} (92)
|Γp2(h,b)2|+2|Γp|i=1kp(ϵip)2+i=1kp(ϵip)2.\displaystyle\leq|\Gamma^{2}_{p}-(h,b)^{2}|+2|\Gamma_{p}|\sqrt{\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2}}+\sum\limits_{i=1}^{k_{p}}(\epsilon_{i}^{p})^{2}. (93)

Note that we used the Cauchy-Schwarz and Bessel’s inequalities for the transitions from (91) to (92), and (92) to (93), respectively. Using (89) and Lemma 6.1, the right hand side of the inequality (93) converges to zero almost surely.

For the second part of the Proposition, a similar argument shows that

|(b,proj𝐿(h))(h,b)(b,proj𝐿(b))||(b,\underset{L}{\operatorname{proj}}(h))-(h,b)(b,\underset{L}{\operatorname{proj}}(b))|

converges to zero almost surely.

For the last part of Proposition 5.2,

proj𝐿(h)(h,b)proj𝐿(b)2\displaystyle||\underset{L}{\operatorname{proj}}(h)-(h,b)\underset{L}{\operatorname{proj}}(b)||^{2}
=||\underset{L}{\operatorname{proj}}(h)||^{2}-2(h,b)(\underset{L}{\operatorname{proj}}(h),\underset{L}{\operatorname{proj}}(b))+(h,b)^{2}||\underset{L}{\operatorname{proj}}(b)||^{2}
=(h,proj𝐿(h))2(h,b)(b,proj𝐿(h))+(h,b)2(b,proj𝐿(b))\displaystyle=(h,\underset{L}{\operatorname{proj}}(h))-2(h,b)(b,\underset{L}{\operatorname{proj}}(h))+(h,b)^{2}(b,\underset{L}{\operatorname{proj}}(b))
=[(h,proj𝐿(h))(h,b)2(b,proj𝐿(b))]+2(h,b)[(h,b)(b,proj𝐿(b))(b,proj𝐿(h))].\displaystyle=\big{[}(h,\underset{L}{\operatorname{proj}}(h))-(h,b)^{2}(b,\underset{L}{\operatorname{proj}}(b))\big{]}+2(h,b)\big{[}(h,b)(b,\underset{L}{\operatorname{proj}}(b))-(b,\underset{L}{\operatorname{proj}}(h))\big{]}.

An application of the first two parts of Proposition 5.2 completes the proof. ∎

6.2 Proof of Proposition 5.3

Two preliminary lemmas are useful.

Lemma 6.4.

Fix an \epsilon>0, and positive integers p,k_{p} with 1\leq k_{p}<p. For g\in O(p), define the linear subspace H_{p}(g)=<ge_{1},ge_{2},...,ge_{k_{p}}> of \mathbb{R}^{p}. Let u\in\mathbb{S}^{p-1} be a fixed vector, and define

Npu={gO(p)| projHp(g)(u)2>ϵ}.N_{p}^{u}=\{g\in O(p)\big{|}\text{ }||\underset{H_{p}(g)}{\operatorname{proj}}(u)||^{2}>\epsilon\}.

Similarly, for a fixed subspace VpV_{p} of dimension kpk_{p}, we define

MpV={gO(p)| projVp(ge1)2>ϵ}.M_{p}^{V}=\{g\in O(p)\big{|}\text{ }||\underset{V_{p}}{\operatorname{proj}}(ge_{1})||^{2}>\epsilon\}.

For any choice of the nonrandom vector uu and the nonrandom subspace VpV_{p} we have σ(Npu)=σ(MpV)\sigma(N_{p}^{u})=\sigma(M_{p}^{V}), where σ\sigma denotes Haar measure on O(p)O(p).

The lemma asserts that, under the Haar measure, the event involving the projection of a fixed point of \mathbb{S}^{p-1} onto a random linear subspace has the same measure as the event involving the projection of a random point of \mathbb{S}^{p-1} onto a fixed linear subspace.

The proof of this lemma makes use of the fact that Haar measure is invariant under left and right translations, and is omitted.

Lemma 6.5.

For ϵ>0\epsilon>0 and any linear subspace VppV_{p}\subset\mathbb{R}^{p} of dimension kpk_{p}, define, as before,

MpV={gO(p)| projVp(ge1)2>ϵ}.M_{p}^{V}=\{g\in O(p)\big{|}\text{ }||\underset{V_{p}}{\operatorname{proj}}(ge_{1})||^{2}>\epsilon\}.

Then we have the following bound depending only on ϵ\epsilon, pp, and kpk_{p}:

σ(MpV)Dp=kpI1ϵkp(p12,12)I(p12,12)\sigma(M_{p}^{V})\leq D_{p}=k_{p}\frac{I_{1-\frac{\epsilon}{k_{p}}}(\frac{p-1}{2},\frac{1}{2})}{I(\frac{p-1}{2},\frac{1}{2})}

where I(x,y)I(x,y) and Ia(x,y)I_{a}(x,y) are the beta function and the incomplete beta function, respectively. Moreover, if kpk_{p} is of order O(pα)O(p^{\alpha}) for α<1\alpha<1, the bound satisfies

p=1Dp<.\sum\limits_{p=1}^{\infty}D_{p}<\infty.

We omit the purely geometric proof, which uses an analysis of volumes of spherical caps and properties of the beta functions.
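Although the geometric proof is omitted, the statement behind Lemma 6.5 and Proposition 5.3 is easy to probe numerically: the column span of the Q factor of a p x k Gaussian matrix is a Haar-random k-dimensional subspace, and the squared norm of the projection of a fixed unit vector onto it concentrates near k/p. The sketch below is an illustration, not a substitute for the proof.

    import numpy as np

    rng = np.random.default_rng(3)

    def haar_projection_sq(p, k, u):
        """||proj_H(u)||^2 for a Haar-random k-dimensional subspace H of R^p."""
        Q, _ = np.linalg.qr(rng.standard_normal((p, k)))   # orthonormal basis of H
        return float(np.sum((Q.T @ u) ** 2))

    for p in (200, 2000, 20000):
        k = int(np.sqrt(p))                 # a square root dominated choice of dim(H_p)
        u = np.zeros(p); u[0] = 1.0         # a fixed point on the unit sphere
        vals = [haar_projection_sq(p, k, u) for _ in range(20)]
        print(p, k, np.mean(vals))          # approximately k/p, tending to 0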

Proof of Proposition 5.3.

For the random linear subspace

p(ω)=<ξ(ω)ei | i=1,2,kp>,\mathcal{H}_{p}(\omega)=<\xi(\omega)e_{i}\text{ }|\text{ }i=1,2...,k_{p}>,

where \xi is an O(p)-valued random variable inducing the Haar measure on O(p), we would like to show

limpprojp(z)2=0  almost surely.\lim\limits_{p\rightarrow\infty}||\underset{\mathcal{H}_{p}}{\operatorname{proj}}(z)||^{2}=0\text{ }\text{ almost surely.}

It suffices to show, for any ϵ>0\epsilon>0 that

lim suppprojp(z)2ϵ  almost surely.\limsup\limits_{p\rightarrow\infty}||\underset{\mathcal{H}_{p}}{\operatorname{proj}}(z)||^{2}\leq\epsilon\text{ }\text{ almost surely.} (94)

Fix an ϵ>0\epsilon>0, and define the set

Fp={(g,y)O(p)×𝕊p1| i=1kp(gei,y)2>ϵ}.F_{p}=\big{\{}(g,y)\in O(p)\times\mathbb{S}^{p-1}\big{|}\text{ }\sum\limits_{i=1}^{k_{p}}(ge_{i},y)^{2}>\epsilon\big{\}}.

Now consider the event Ap={ωΩ| (ξ(ω),zp(ω))Fp}A_{p}=\{\omega\in\Omega\big{|}\text{ }(\xi(\omega),z_{p}(\omega))\in F_{p}\}. Since

A_{p}=\{\omega\in\Omega\big{|}\text{ }||\underset{\mathcal{H}_{p}(\omega)}{\operatorname{proj}}(z_{p}(\omega))||^{2}>\epsilon\},

if we show

p=1(Ap)<\sum\limits_{p=1}^{\infty}\mathbb{P}(A_{p})<\infty

then an application of the Borel-Cantelli lemma will establish (94).

To that end, note

(Ap)\displaystyle\mathbb{P}(A_{p}) =𝔼[𝟏{Ap}]=𝔼[𝟏{Fp}(ξ,zp)]\displaystyle=\mathbb{E}_{\mathbb{P}}[\mathbf{1}_{\{A_{p}\}}]=\mathbb{E}[\mathbf{1}_{\{F_{p}\}}(\xi,z_{p})]
=𝔼[𝔼[𝟏{Fp}(ξ,zp)|zp]].\displaystyle=\mathbb{E}[\mathbb{E}[\mathbf{1}_{\{F_{p}\}}(\xi,z_{p})\big{|}z_{p}]]. (95)

For any u𝕊p1u\in\mathbb{S}^{p-1} define the (nonrandom) function h(u)=𝔼[𝟏{Fp}(ξ,u)]h(u)=\mathbb{E}[\mathbf{1}_{\{F_{p}\}}(\xi,u)]. Since ξ\xi is independent of zpz_{p}, we have for any ωΩ\omega\in\Omega

𝔼[𝟏{Fp}(ξ,zp)|zp](ω)=h(zp(ω)),\mathbb{E}[\mathbf{1}_{\{F_{p}\}}(\xi,z_{p})\big{|}z_{p}](\omega)=h(z_{p}(\omega)),

and therefore

𝔼[𝔼[𝟏{Fp}(ξ,zp)|zp]]=𝔼[h(zp)].\mathbb{E}[\mathbb{E}[\mathbf{1}_{\{F_{p}\}}(\xi,z_{p})\big{|}z_{p}]]=\mathbb{E}[h(z_{p})].

On the other hand we have h(u)=(ωΩ| ξ(ω)Npu)=σ(Npu)h(u)=\mathbb{P}(\omega\in\Omega\big{|}\text{ }\xi(\omega)\in N_{p}^{u})=\sigma(N_{p}^{u}), using the notation of Lemmas 6.4 and 6.5. By the use of those lemmas we obtain a bound DpD_{p} on h(u)h(u) that does not depend on uu. Hence, using (95), we have

(Ap)=𝔼[h(zp)]𝔼[Dp]=Dp,\displaystyle\mathbb{P}(A_{p})=\mathbb{E}[h(z_{p})]\leq\mathbb{E}[D_{p}]=D_{p},

and by an application of Lemma 6.5 we get

p=1(Ap)p=1Dp<.\sum\limits_{p=1}^{\infty}\mathbb{P}(A_{p})\leq\sum\limits_{p=1}^{\infty}D_{p}<\infty.

6.3 Proof of Proposition 5.5

From now on we will use the shorthand ν\nu for the quantity (b1,b2)(b_{1},b_{2})_{\infty}.

Also, recall assumption (5): μ(β1)=μ(β2)\mu_{\infty}(\beta_{1})=\mu_{\infty}(\beta_{2}) and d(β1)=d(β2)d_{\infty}(\beta_{1})=d_{\infty}(\beta_{2}) a.s. Therefore

\lim\limits_{p\rightarrow\infty}\frac{|\beta_{1}|}{\sqrt{p}}=\mu_{\infty}(\beta_{1})\sqrt{1+d_{\infty}(\beta_{1})^{2}}=\mu_{\infty}(\beta_{2})\sqrt{1+d_{\infty}(\beta_{2})^{2}}=\lim\limits_{p\rightarrow\infty}\frac{|\beta_{2}|}{\sqrt{p}} (96)

and

(b_{1},q)_{\infty}=\lim\limits_{p\rightarrow\infty}\frac{\frac{1}{p}\sum\limits_{i=1}^{p}\beta_{1}(i)}{\frac{|\beta_{1}|}{\sqrt{p}}}=\frac{\mu_{\infty}(\beta_{1})}{\lim\limits_{p\rightarrow\infty}\frac{|\beta_{1}|}{\sqrt{p}}}=\frac{\mu_{\infty}(\beta_{2})}{\lim\limits_{p\rightarrow\infty}\frac{|\beta_{2}|}{\sqrt{p}}}=\lim\limits_{p\rightarrow\infty}\frac{\frac{1}{p}\sum\limits_{i=1}^{p}\beta_{2}(i)}{\frac{|\beta_{2}|}{\sqrt{p}}}=(b_{2},q)_{\infty}. (97)

For the proof of Proposition 5.5 we will need three intermediate lemmas.

Lemma 6.6.

For x1,x2nx_{1},x_{2}\in\mathbb{R}^{n} set x=[x1x2]2nx=\bigg{[}\begin{array}[]{c}x_{1}\\ x_{2}\end{array}\bigg{]}\in\mathbb{R}^{2n} and impose x2=1||x||^{2}=1. Recall ν=(b1,b2)\nu=(b_{1},b_{2})_{\infty} and define the following functions:

gp(x)gp(x1,x2)=Rx22np=R1x1+R2x222np,g_{p}(x)\equiv g_{p}(x_{1},x_{2})=\frac{||Rx||^{2}}{2np}=\frac{||R_{1}x_{1}+R_{2}x_{2}||^{2}}{2np}, (98)
g_{\infty}(x)=\frac{1}{2n}\bigg{[}\left(\lim\limits_{p\rightarrow\infty}\frac{|\beta|^{2}}{p}\right)\Big{(}\sum\limits_{t=1}^{2}(X_{t}^{T}x_{t})^{2}+2\nu(X_{1}^{T}x_{1})(X_{2}^{T}x_{2})\Big{)}+\delta^{2}\bigg{]}. (99)

Then gp(x)g_{p}(x) converges to g(x)g_{\infty}(x) uniformly almost surely as pp tends to \infty.

Proof.

Notice

gp(x1,x2)=12n[1pt=12Rtxt2+1p2(R1x1,R2x2)]g_{p}(x_{1},x_{2})=\frac{1}{2n}\bigg{[}\frac{1}{p}\sum\limits_{t=1}^{2}||R_{t}x_{t}||^{2}+\frac{1}{p}2(R_{1}x_{1},R_{2}x_{2})\bigg{]} (100)

By using the proof of Lemma 6.14 on each summand, the first term in the bracket converges to

(limp|β|2p)t=12(XtTxt)2+δ2\left(\lim\limits_{p\rightarrow\infty}\frac{|\beta|^{2}}{p}\right)\sum\limits_{t=1}^{2}(X_{t}^{T}x_{t})^{2}+\delta^{2} (101)

uniformly almost surely. Hence it suffices to prove that the remaining term 1np(R1x1,R2x2)\frac{1}{np}(R_{1}x_{1},R_{2}x_{2}) converges to (limp|β|2p)ν1n(X1Tx1)(X2Tx2)\left(\lim\limits_{p\rightarrow\infty}\frac{|\beta|^{2}}{p}\right)\nu\frac{1}{n}(X_{1}^{T}x_{1})(X_{2}^{T}x_{2}) uniformly almost surely. We can re-write it as follows,

1np(R1x1,R2x2)=\displaystyle\frac{1}{np}(R_{1}x_{1},R_{2}x_{2})= 1n(X1Tx1)1pβ1TZ2x2+1n(X2Tx2)1pβ2TZ1x1\displaystyle\frac{1}{n}(X_{1}^{T}x_{1})\frac{1}{p}\beta_{1}^{T}Z_{2}x_{2}+\frac{1}{n}(X_{2}^{T}x_{2})\frac{1}{p}\beta_{2}^{T}Z_{1}x_{1}
+\displaystyle+ 1n(X1Tx1)(X2Tx2)1pβ1Tβ2+x1TZ1TZ2x2np.\displaystyle\frac{1}{n}(X_{1}^{T}x_{1})(X_{2}^{T}x_{2})\frac{1}{p}\beta_{1}^{T}\beta_{2}+\frac{x_{1}^{T}Z_{1}^{T}Z_{2}x_{2}}{np}. (102)

The first and second terms converge to zero uniformly almost surely by an application of Lemma 6.13 and the fact that X_{1}^{T}x_{1} and X_{2}^{T}x_{2} are uniformly bounded by |X_{1}| and |X_{2}|, respectively. The third term converges to the desired limit uniformly almost surely. Hence it remains to prove that the fourth term converges to 0 uniformly almost surely. We can rewrite it as,

x1TZ1TZ2x2np=1ni,jn(x1(i))(x2(j))1pk=1p(Z1)ki(Z2)kj.\frac{x_{1}^{T}Z_{1}^{T}Z_{2}x_{2}}{np}=\frac{1}{n}\sum\limits_{i,j}^{n}(x_{1}(i))(x_{2}(j))\frac{1}{p}\sum\limits_{k=1}^{p}(Z_{1})_{ki}(Z_{2})_{kj}. (103)

Let us now fix i,j and set Y_{k}:=(Z_{1})_{ki}(Z_{2})_{kj}. Since the entries (Z_{1})_{ki} and (Z_{2})_{kj} belong to the same row of Z, they are uncorrelated by Assumption 4. Moreover, \{Y_{k}\}_{k=1}^{\infty} is a sequence of independent random variables by Assumption 4 as well. Using that, an application of the Cauchy-Schwarz inequality, and the 4th moment condition on the entries of Z, we get

𝔼[Yk]=𝔼[(Z1)ki]𝔼[(Z2)kj]=0,\mathbb{E}[Y_{k}]=\mathbb{E}[(Z_{1})_{ki}]\mathbb{E}[(Z_{2})_{kj}]=0,
𝔼[Yk2]=𝔼[(Z1)ki2(Z2)kj2]𝔼[(Z1)ki4]𝔼[(Z2)kj4]M.\mathbb{E}[Y_{k}^{2}]=\mathbb{E}[(Z_{1})_{ki}^{2}(Z_{2})_{kj}^{2}]\leq\sqrt{\mathbb{E}[(Z_{1})_{ki}^{4}]\mathbb{E}[(Z_{2})_{kj}^{4}]}\leq M.

From here we can apply the Kolmogorov strong law of large numbers to conclude

limp1pk=1p(Z1)ki(Z2)kj=limp1pk=1pYk=0  almost surely\lim\limits_{p\rightarrow\infty}\frac{1}{p}\sum\limits_{k=1}^{p}(Z_{1})_{ki}(Z_{2})_{kj}=\lim\limits_{p\rightarrow\infty}\frac{1}{p}\sum\limits_{k=1}^{p}Y_{k}=0\text{ }\text{ almost surely} (104)

which is true for all i,j{1,2,,n}i,j\in\{1,2,...,n\}. We are now ready to complete the argument.

limp|1n\displaystyle\lim\limits_{p\rightarrow\infty}\Big{|}\frac{1}{n} i,jn(x1)i(x2)j1pk=1p(Z1)ki(Z2)kj|\displaystyle\sum\limits_{i,j}^{n}(x_{1})_{i}(x_{2})_{j}\frac{1}{p}\sum\limits_{k=1}^{p}(Z_{1})_{ki}(Z_{2})_{kj}\Big{|}
|1ni,jn(x1)i(x2)j|limpmaxi,j|1pk=1p(Z1)ki(Z2)kj|\displaystyle\leq\Big{|}\frac{1}{n}\sum\limits_{i,j}^{n}(x_{1})_{i}(x_{2})_{j}\Big{|}\lim\limits_{p\rightarrow\infty}\max\limits_{i,j}\Big{|}\frac{1}{p}\sum\limits_{k=1}^{p}(Z_{1})_{ki}(Z_{2})_{kj}\Big{|}
=1n|in(x1)i||in(x2)i|limpmaxi,j|1pk=1p(Z1)ki(Z2)kj|\displaystyle=\frac{1}{n}\Big{|}\sum\limits_{i}^{n}(x_{1})_{i}\Big{|}\Big{|}\sum\limits_{i}^{n}(x_{2})_{i}\Big{|}\lim\limits_{p\rightarrow\infty}\max\limits_{i,j}\Big{|}\frac{1}{p}\sum\limits_{k=1}^{p}(Z_{1})_{ki}(Z_{2})_{kj}\Big{|}
in(x1)i2in(x2)i2limpmaxi,j|1pk=1p(Z1)ki(Z2)kj|\displaystyle\leq\sqrt{\sum\limits_{i}^{n}(x_{1})_{i}^{2}}\sqrt{\sum\limits_{i}^{n}(x_{2})_{i}^{2}}\lim\limits_{p\rightarrow\infty}\max\limits_{i,j}\Big{|}\frac{1}{p}\sum\limits_{k=1}^{p}(Z_{1})_{ki}(Z_{2})_{kj}\Big{|}
\leq\lim\limits_{p\rightarrow\infty}\max\limits_{i,j}\Big{|}\frac{1}{p}\sum\limits_{k=1}^{p}(Z_{1})_{ki}(Z_{2})_{kj}\Big{|}=0\text{ }\text{ almost surely}.

The final limit is 0 since the number of possible pairs of i and j is n^{2}<\infty and we have (104). ∎

The proof of the following lemma is a straightforward computation.

Lemma 6.7.

Let

λ=12((|X1|2+|X2|2)+(|X1|2|X2|2)2+4ν2|X1|2|X2|2).\lambda=\frac{1}{2}\bigg{(}(|X_{1}|^{2}+|X_{2}|^{2})+\sqrt{(|X_{1}|^{2}-|X_{2}|^{2})^{2}+4\nu^{2}|X_{1}|^{2}|X_{2}|^{2}}\bigg{)}.

The maximum of the function g(x)g(x1,x2)g_{\infty}(x)\equiv g_{\infty}(x_{1},x_{2}) defined in Lemma 6.6 is attained at

x=[x1x2]=±[X1|X1|a1X2|X2|a2]x^{*}=\bigg{[}\begin{array}[]{c}x_{1}^{*}\\ x_{2}^{*}\end{array}\bigg{]}=\pm\bigg{[}\begin{array}[]{c}\frac{X_{1}}{|X_{1}|}a_{1}\\ \frac{X_{2}}{|X_{2}|}a_{2}\end{array}\bigg{]} (105)

where

a1=λ|X2|22λ|X|2a_{1}=\sqrt{\frac{\lambda-|X_{2}|^{2}}{2\lambda-|X|^{2}}}, a2=λ|X1|22λ|X|2a_{2}=\sqrt{\frac{\lambda-|X_{1}|^{2}}{2\lambda-|X|^{2}}} for ν>0\nu>0
a1=12a_{1}=\frac{1}{\sqrt{2}}, a2=12a_{2}=\frac{1}{\sqrt{2}} for ν=0\nu=0 and |X2|=|X1||X_{2}|=|X_{1}|
a1=1a_{1}=1, a2a_{2}=0 for ν=0\nu=0 and |X1|>|X2||X_{1}|>|X_{2}|
a1=0a_{1}=0, a2=1a_{2}=1 for ν=0\nu=0 and |X2|>|X1||X_{2}|>|X_{1}|

and the maximum value is

max(g(x1,x2))=12n[(limp|β|2p)λ+δ2]max(g_{\infty}(x_{1},x_{2}))=\frac{1}{2n}\Big{[}\Big{(}\lim\limits_{p\rightarrow\infty}\frac{|\beta|^{2}}{p}\Big{)}\lambda+\delta^{2}\Big{]} (106)
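As a numerical sanity check on these closed-form expressions (our own rephrasing, not part of the paper's argument): after aligning x_{t} with X_{t}, maximizing g_{\infty} over the unit sphere reduces to the leading eigenpair of the 2 x 2 matrix M with diagonal entries |X_{1}|^{2}, |X_{2}|^{2} and off-diagonal entry \nu|X_{1}||X_{2}|. The sketch below compares \lambda, a_{1}, a_{2} with numpy's eigendecomposition of M.

    import numpy as np

    rng = np.random.default_rng(4)
    X1, X2 = rng.standard_normal(12), rng.standard_normal(12)
    nu = 0.6                                            # a value of (b1, b2)_infinity in (0, 1)

    n1, n2 = np.sum(X1 ** 2), np.sum(X2 ** 2)           # |X1|^2 and |X2|^2
    lam = 0.5 * ((n1 + n2) + np.sqrt((n1 - n2) ** 2 + 4 * nu ** 2 * n1 * n2))
    a1 = np.sqrt((lam - n2) / (2 * lam - (n1 + n2)))
    a2 = np.sqrt((lam - n1) / (2 * lam - (n1 + n2)))

    # The same quantities from the leading eigenpair of the 2 x 2 matrix M.
    M = np.array([[n1, nu * np.sqrt(n1 * n2)],
                  [nu * np.sqrt(n1 * n2), n2]])
    vals, vecs = np.linalg.eigh(M)
    print(np.isclose(lam, vals[-1]))                    # True
    print(np.allclose([a1, a2], np.abs(vecs[:, -1])))   # True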
Lemma 6.8.

With the notation as before and a_{1}, a_{2} and \lambda as defined in Lemma 6.7,

limps2p=12n((limp|β|2p)λ+δ2)  and  limpχ=[X1|X1|a1X2|X2|a2]\lim\limits_{p\rightarrow\infty}\frac{s^{2}}{p}=\frac{1}{2n}(\Big{(}\lim\limits_{p\rightarrow\infty}\frac{|\beta|^{2}}{p}\Big{)}\lambda+\delta^{2})\text{ }\text{ and }\text{ }\lim\limits_{p\rightarrow\infty}\chi=\bigg{[}\begin{array}[]{c}\frac{X_{1}}{|X_{1}|}a_{1}\\ \frac{X_{2}}{|X_{2}|}a_{2}\end{array}\bigg{]}

almost surely.

Proof.

Define χ1,χ2n\chi_{1},\chi_{2}\in\mathbb{R}^{n} by χ=[χ1χ2]\chi=\bigg{[}\begin{array}[]{c}\chi_{1}\\ \chi_{2}\end{array}\bigg{]}. The definitions of ss,hh and χ\chi are such that

sh=Rχ2nsh=\frac{R\chi}{\sqrt{2n}} (107)

Since s2s^{2} is the largest eigenvalue we get

\frac{s^{2}}{p}=\frac{|R\chi|^{2}}{2np}=g_{p}(\chi_{1},\chi_{2})=\underset{|x_{1}|^{2}+|x_{2}|^{2}=1}{\sup}g_{p}(x_{1},x_{2}). (108)

By Lemma 6.6, we have the uniform almost sure convergence of gp(x1,x2)g_{p}(x_{1},x_{2}) to g(x1,x2)g_{\infty}(x_{1},x_{2}). Hence the supremum and limits are interchangeable:

limpgp(χ1,χ2)\displaystyle\lim\limits_{p\rightarrow\infty}g_{p}(\chi_{1},\chi_{2}) =limpsup|x1|2+|x2|2=1gp(x1,x2)\displaystyle=\lim\limits_{p\rightarrow\infty}\underset{|x_{1}|^{2}+|x_{2}|^{2}=1}{\sup}g_{p}(x_{1},x_{2})
=sup|x1|2+|x2|2=1limpgp(x1,x2)=sup|x1|2+|x2|2=1g(x1,x2)\displaystyle=\underset{|x_{1}|^{2}+|x_{2}|^{2}=1}{\sup}\lim\limits_{p\rightarrow\infty}g_{p}(x_{1},x_{2})=\underset{|x_{1}|^{2}+|x_{2}|^{2}=1}{\sup}g_{\infty}(x_{1},x_{2})
=g(x1,x2).\displaystyle=g_{\infty}(x_{1}^{*},x_{2}^{*}). (109)

By (109) along with Lemma 6.7,

limps2p=limpgp(χ1,χ2)=g(x1,x2)=12n((limp|β|2p)λ+δ2).\lim\limits_{p\rightarrow\infty}\frac{s^{2}}{p}=\lim\limits_{p\rightarrow\infty}g_{p}(\chi_{1},\chi_{2})=g_{\infty}(x_{1}^{*},x_{2}^{*})=\frac{1}{2n}(\Big{(}\lim\limits_{p\rightarrow\infty}\frac{|\beta|^{2}}{p}\Big{)}\lambda+\delta^{2}). (110)

This proves the first part of the Lemma. On the other hand, the following almost sure convergence also follows from Lemma 6.6:

limp|gp(χ1,χ2)g(χ1,χ2)|=0.\lim\limits_{p\rightarrow\infty}|g_{p}(\chi_{1},\chi_{2})-g_{\infty}(\chi_{1},\chi_{2})|=0. (111)

Combining (109) and (111) we obtain,

limp|g(χ1,χ2)g(x1,x2)|=0\lim\limits_{p\rightarrow\infty}|g_{\infty}(\chi_{1},\chi_{2})-g_{\infty}(x_{1}^{*},x_{2}^{*})|=0 (112)

Expanding equation (107) yields

(h,q)=\frac{(b_{1},q)|\beta_{1}|X_{1}^{T}\chi_{1}+(b_{2},q)|\beta_{2}|X_{2}^{T}\chi_{2}+q^{T}Z_{1}\chi_{1}+q^{T}Z_{2}\chi_{2}}{s\sqrt{2n}}. (113)

By an application of Lemma 6.13, it is straightforward to show that the last two terms vanish almost surely. Hence we have,

(h,q)=(b1,q)(limp|β1|X1Tχ1s2n)+(b2,q)(limp|β2|X2Tχ2s2n).(h,q)_{\infty}=(b_{1},q)_{\infty}\bigg{(}\lim\limits_{p\rightarrow\infty}\frac{|\beta_{1}|X_{1}^{T}\chi_{1}}{s\sqrt{2n}}\bigg{)}+(b_{2},q)_{\infty}\bigg{(}\lim\limits_{p\rightarrow\infty}\frac{|\beta_{2}|X_{2}^{T}\chi_{2}}{s\sqrt{2n}}\bigg{)}. (114)

Note that the sequence of points \chi=\bigg{[}\begin{array}[]{c}\chi_{1}\\ \chi_{2}\end{array}\bigg{]} lies on the closed and bounded set \mathbb{S}^{2n-1}. Hence every sub-sequence has a further convergent sub-sequence. Since we have (112) and g_{\infty} is continuous, any such convergent sub-sequence must converge to either \bigg{[}\begin{array}[]{c}\frac{X_{1}}{|X_{1}|}a_{1}\\ \frac{X_{2}}{|X_{2}|}a_{2}\end{array}\bigg{]} or -\bigg{[}\begin{array}[]{c}\frac{X_{1}}{|X_{1}|}a_{1}\\ \frac{X_{2}}{|X_{2}|}a_{2}\end{array}\bigg{]}.

By (114) and the facts that a_{1}\geq 0, a_{2}\geq 0, and (h,q)\geq 0, every such convergent sub-sequence must converge to

[X1X1a1X2|X2|a2].\bigg{[}\begin{array}[]{c}\frac{X_{1}}{||X_{1}||}a_{1}\\ \frac{X_{2}}{|X_{2}|}a_{2}\end{array}\bigg{]}.

Hence

limpχ1=X1X1a1  and  limpχ2=X2X2a2.\lim\limits_{p\rightarrow\infty}\chi_{1}=\frac{X_{1}}{||X_{1}||}a_{1}\text{ }\text{ and }\text{ }\lim\limits_{p\rightarrow\infty}\chi_{2}=\frac{X_{2}}{||X_{2}||}a_{2}. (115)

Now we are ready at last to prove Proposition 5.5.

Proof of Proposition 5.5.

Let (h1,s1)(h_{1},s_{1}) and (h2,s2)(h_{2},s_{2}) be the leading eigenpairs for the sample covariance matrices S1=1nR1R1TS_{1}=\frac{1}{n}R_{1}R_{1}^{T} and S2=1nR2R2TS_{2}=\frac{1}{n}R_{2}R_{2}^{T} respectively. Also let ϕ1\phi_{1} and ϕ2\phi_{2} be the right singular vectors of R1R_{1} and R2R_{2} that are associated with h1h_{1} and h2h_{2} respectively. Hence we can write

h1=R1ϕ1s1n=β1X1Tϕ1+Z1ϕ1s1nh_{1}=\frac{R_{1}\phi_{1}}{s_{1}\sqrt{n}}=\frac{\beta_{1}X_{1}^{T}\phi_{1}+Z_{1}\phi_{1}}{s_{1}\sqrt{n}} (116)
h2=R2ϕ2s2n=β2X2Tϕ2+Z2ϕ2s2nh_{2}=\frac{R_{2}\phi_{2}}{s_{2}\sqrt{n}}=\frac{\beta_{2}X_{2}^{T}\phi_{2}+Z_{2}\phi_{2}}{s_{2}\sqrt{n}} (117)
h=\frac{R\chi}{s\sqrt{2n}}=\frac{\beta_{1}X_{1}^{T}\chi_{1}+Z_{1}\chi_{1}+\beta_{2}X_{2}^{T}\chi_{2}+Z_{2}\chi_{2}}{s\sqrt{2n}} (118)

Now define

h~=s1s2a1h1+s2s2a2h2.\tilde{h}=\frac{s_{1}}{s\sqrt{2}}a_{1}h_{1}+\frac{s_{2}}{s\sqrt{2}}a_{2}h_{2}. (119)

Clearly, h~\tilde{h} resides in the span of h1h_{1} and h2h_{2}. By Lemma 6.14 we have limpϕ1=X1|X1|\lim\limits_{p\rightarrow\infty}\phi_{1}=\frac{X_{1}}{|X_{1}|} and limpϕ2=X2|X2|\lim\limits_{p\rightarrow\infty}\phi_{2}=\frac{X_{2}}{|X_{2}|} almost surely. We have,

h~h\displaystyle||\tilde{h}-h|| =||β1X1Ts2n(a1ϕ1χ1)+β2X2Ts2n(a2ϕ2χ2)\displaystyle=\bigg{|}\bigg{|}\frac{\beta_{1}X_{1}^{T}}{s\sqrt{2n}}(a_{1}\phi_{1}-\chi_{1})+\frac{\beta_{2}X_{2}^{T}}{s\sqrt{2n}}(a_{2}\phi_{2}-\chi_{2})
+Z1s2n(a1ϕ1χ1)+Z2s2n(a2ϕ2χ2)||\displaystyle+\frac{Z_{1}}{s\sqrt{2n}}(a_{1}\phi_{1}-\chi_{1})+\frac{Z_{2}}{s\sqrt{2n}}(a_{2}\phi_{2}-\chi_{2})\bigg{|}\bigg{|}
{β1X1Ts2nF+Z1s2nF}a1ϕ1χ1\displaystyle\leq\bigg{\{}\bigg{|}\bigg{|}\frac{\beta_{1}X_{1}^{T}}{s\sqrt{2n}}\bigg{|}\bigg{|}_{F}+\bigg{|}\bigg{|}\frac{Z_{1}}{s\sqrt{2n}}\bigg{|}\bigg{|}_{F}\bigg{\}}\bigg{|}\bigg{|}a_{1}\phi_{1}-\chi_{1}\bigg{|}\bigg{|}
+{β2X2Ts2nF+Z2s2nF}a2ϕ2χ2.\displaystyle+\bigg{\{}\bigg{|}\bigg{|}\frac{\beta_{2}X_{2}^{T}}{s\sqrt{2n}}\bigg{|}\bigg{|}_{F}+\bigg{|}\bigg{|}\frac{Z_{2}}{s\sqrt{2n}}\bigg{|}\bigg{|}_{F}\bigg{\}}\bigg{|}\bigg{|}a_{2}\phi_{2}-\chi_{2}\bigg{|}\bigg{|}. (120)

By Lemmas 6.8 and 6.14, both terms a_{1}\phi_{1}-\chi_{1} and a_{2}\phi_{2}-\chi_{2} converge to 0 almost surely. Since \lim\limits_{p\rightarrow\infty}\frac{s}{\sqrt{p}}\in(0,\infty) by Lemma 6.8, a few applications of the strong law of large numbers show that the Frobenius norm terms remain bounded asymptotically. This completes the proof. ∎

6.4 Proof of Theorem 2.4(b)

We first need to tackle the following technical propositions.

Proposition 6.9.

Under assumptions 1-5,

  1. |\nu-(b,q)_{\infty}^{2}|>0 almost surely if and only if |\rho_{\infty}(\beta_{1},\beta_{2})|>0 almost surely,

  2. \nu<1 almost surely if and only if \rho_{\infty}(\beta_{1},\beta_{2})<1 almost surely, and

  3. 1+\nu-2(b,q)^{2}_{\infty}>0 almost surely if and only if \rho_{\infty}(\beta_{1},\beta_{2})>-1 almost surely.

Proof.

From the definitions of d_{p}(\beta_{1}), d_{p}(\beta_{2}) and d_{p}(\beta_{1},\beta_{2}) it follows that

(b_{1},b_{2})-(b_{1},q)(b_{2},q)=\frac{d_{p}(\beta_{1},\beta_{2})}{\sqrt{1+d_{p}(\beta_{1})^{2}}\sqrt{1+d_{p}(\beta_{2})^{2}}} (121)
(b1,q)=11+dp(β1)2  and  (b2,q)=11+dp(β2)2.(b_{1},q)=\frac{1}{\sqrt{1+d_{p}(\beta_{1})^{2}}}\text{ }\text{ and }\text{ }(b_{2},q)=\frac{1}{\sqrt{1+d_{p}(\beta_{2})^{2}}}. (122)

Now, using (121) and assumption (5), we can prove the first part of the proposition:

|\nu-(b,q)_{\infty}^{2}|=\lim\limits_{p\rightarrow\infty}|(b_{1},b_{2})-(b_{1},q)(b_{2},q)|
=\Big{|}\frac{d_{\infty}(\beta_{1},\beta_{2})}{1+d_{\infty}(\beta)^{2}}\Big{|}=|\rho_{\infty}(\beta_{1},\beta_{2})|\frac{d_{\infty}(\beta)^{2}}{1+d_{\infty}(\beta)^{2}} (123)

Next, using equations (121), (122) and assumption (5), we can rewrite \nu as

ν=d(β1,β2)1+d(β)2+11+d(β)2=1+d(β1,β2)1+d(β)2\displaystyle\nu=\frac{d_{\infty}(\beta_{1},\beta_{2})}{1+d_{\infty}(\beta)^{2}}+\frac{1}{1+d_{\infty}(\beta)^{2}}=\frac{1+d_{\infty}(\beta_{1},\beta_{2})}{1+d_{\infty}(\beta)^{2}} (124)

From there it is easy to see

\nu<1\iff d_{\infty}(\beta_{1},\beta_{2})<d_{\infty}(\beta)^{2}\iff\rho_{\infty}(\beta_{1},\beta_{2})<1 (125)

which proves the second assertion. Finally, using the equations (121), (122) and the assumption (5) we can rewrite 1+ν2(b,q)21+\nu-2(b,q)^{2}_{\infty} as,

1+\nu-2(b,q)^{2}_{\infty}=1+\frac{d_{\infty}(\beta_{1},\beta_{2})}{1+d_{\infty}(\beta)^{2}}-\frac{1}{1+d_{\infty}(\beta)^{2}}=\frac{d_{\infty}(\beta)^{2}+d_{\infty}(\beta_{1},\beta_{2})}{1+d_{\infty}(\beta)^{2}}. (126)

Therefore

1+ν2(b,q)2>0d2(β)>d(β1,β2)ρ(β1,β2)>1,1+\nu-2(b,q)^{2}_{\infty}>0\iff d_{\infty}^{2}(\beta)>-d_{\infty}(\beta_{1},\beta_{2})\iff\rho_{\infty}(\beta_{1},\beta_{2})>-1, (127)

which proves the third part of the proposition. ∎
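The algebra behind (121) and (122) can be checked numerically at finite p. The sketch below is a minimal illustration only; in particular, the explicit formulas d_{p}(\beta_{i})^{2}=|\beta_{i}|^{2}/(p\,\mu_{p}(\beta_{i})^{2})-1 and d_{p}(\beta_{1},\beta_{2})=(\beta_{1},\beta_{2})/(p\,\mu_{p}(\beta_{1})\mu_{p}(\beta_{2}))-1 used in the code are our reading of the definitions given earlier in the paper and should be treated as assumptions here.

# Hedged numerical check of identities (121)-(122); the explicit formulas for
# d_p below are assumptions consistent with those identities.
import numpy as np

rng = np.random.default_rng(1)
p = 500
beta1 = 1.0 + 0.4 * rng.standard_normal(p)
beta2 = 1.0 + 0.4 * rng.standard_normal(p)

q = np.ones(p) / np.sqrt(p)                      # shrinkage target q
b1, b2 = beta1 / np.linalg.norm(beta1), beta2 / np.linalg.norm(beta2)

d1_sq = np.dot(beta1, beta1) / (p * beta1.mean() ** 2) - 1.0
d2_sq = np.dot(beta2, beta2) / (p * beta2.mean() ** 2) - 1.0
d12 = np.dot(beta1, beta2) / (p * beta1.mean() * beta2.mean()) - 1.0

lhs = np.dot(b1, b2) - np.dot(b1, q) * np.dot(b2, q)
rhs = d12 / (np.sqrt(1 + d1_sq) * np.sqrt(1 + d2_sq))
print(np.isclose(lhs, rhs))                                  # identity (121)
print(np.isclose(np.dot(b1, q), 1 / np.sqrt(1 + d1_sq)))     # identity (122)
print(np.isclose(np.dot(b2, q), 1 / np.sqrt(1 + d2_sq)))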

Proposition 6.10.

Given the modeling assumptions and |ρ(β1,β2)|>0|\rho_{\infty}(\beta_{1},\beta_{2})|>0 we have

limp(proj<q,h1,h2>(b2)2proj<q,h2>(b2)2)>0  almost surely. \lim\limits_{p\rightarrow\infty}\big{(}||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,h_{2}>}{\operatorname{proj}}(b_{2})||^{2}\big{)}>0\text{ }\text{ almost surely. }
Proof.

First re-write it as

||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,h_{2}>}{\operatorname{proj}}(b_{2})||^{2} =||\underset{<q,h_{2},h_{1}^{*}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,h_{2}>}{\operatorname{proj}}(b_{2})||^{2}=(h_{1}^{*},b_{2})^{2} (128)

where h1=h1proj<q,h2>(h1)h1proj<q,h2>(h1)h_{1}^{*}=\frac{h_{1}-\underset{<q,h_{2}>}{\operatorname{proj}}(h_{1})}{||h_{1}-\underset{<q,h_{2}>}{\operatorname{proj}}(h_{1})||}. From (128) it is sufficient to prove,

(h_{1}^{*},b_{2})_{\infty}^{2}=\lim\limits_{p\rightarrow\infty}\Big{(}h_{1}-\underset{<q,h_{2}>}{\operatorname{proj}}(h_{1}),b_{2}\Big{)}^{2}\frac{1}{||h_{1}-\underset{<q,h_{2}>}{\operatorname{proj}}(h_{1})||^{2}}>0. (129)

With some computation and Proposition 5.2 one can verify that

limp|(h1proj<q,h2>(h1),b2)|=|(ν(b,q)2)(h1,b1)(1(h2,b2)2)1(h2,q)2|.\lim\limits_{p\rightarrow\infty}\Big{|}\Big{(}h_{1}-\underset{<q,h_{2}>}{\operatorname{proj}}(h_{1}),b_{2}\Big{)}\Big{|}=\Big{|}\frac{(\nu-(b,q)_{\infty}^{2})(h_{1},b_{1})_{\infty}(1-(h_{2},b_{2})_{\infty}^{2})}{1-(h_{2},q)^{2}_{\infty}}\Big{|}. (130)

By Lemma 6.14, we have (h_{1},b_{1})_{\infty},(h_{2},b_{2})_{\infty}\in(0,1). By part (1) of Proposition 6.9 and the hypothesis |\rho_{\infty}(\beta_{1},\beta_{2})|>0, we have |\nu-(b,q)_{\infty}^{2}|>0 almost surely. Together these prove

limp|(h1proj<q,h2>(h1),b2)|>0,\lim\limits_{p\rightarrow\infty}\Big{|}\Big{(}h_{1}-\underset{<q,h_{2}>}{\operatorname{proj}}(h_{1}),b_{2}\Big{)}\Big{|}>0, (131)

which implies (h_{1}^{*},b_{2})_{\infty}^{2}>0. This finishes the proof. ∎
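The elementary projection identity used in (128) (and again in (134) below) states that enlarging the spanning set {q, h_{2}} by h_{1} increases the squared norm of the projection of b_{2} by exactly (h_{1}^{*},b_{2})^{2}. The following sketch verifies this with random stand-in vectors; it is illustrative only and the vectors carry no statistical meaning.

# Illustrative check of the projection identity behind (128) and (134):
# enlarging the spanning set by one vector increases the squared projection
# norm by the squared inner product with the orthonormalized new direction.
import numpy as np

rng = np.random.default_rng(2)
p = 200
q, h1, h2, b2 = (rng.standard_normal(p) for _ in range(4))

def proj_norm_sq(v, cols):
    # Squared norm of the orthogonal projection of v onto span(cols).
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return np.linalg.norm(Q.T @ v) ** 2

small = proj_norm_sq(b2, [q, h2])
big = proj_norm_sq(b2, [q, h2, h1])

Qsmall, _ = np.linalg.qr(np.column_stack([q, h2]))
h1_star = h1 - Qsmall @ (Qsmall.T @ h1)          # remove the component in span{q, h2}
h1_star /= np.linalg.norm(h1_star)

print(np.isclose(big - small, np.dot(h1_star, b2) ** 2))     # matches (128)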

Proposition 6.11.

Given the modeling assumptions and 0<|ρ(β1,β2)|<10<|\rho_{\infty}(\beta_{1},\beta_{2})|<1 almost surely, we have

limp(proj<q,h1,h2>(b2)2proj<q,h~>(b2)2)>0  almost surely. \lim\limits_{p\rightarrow\infty}\big{(}||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,\tilde{h}>}{\operatorname{proj}}(b_{2})||^{2}\big{)}>0\text{ }\text{ almost surely. } (132)

where h~\tilde{h} is defined by equation (119).

Proof.

Rewrite the definition of h~\tilde{h} as

h~=s1s2a1h1+s2s2a2h2=A1h1+A2h2.\tilde{h}=\frac{s_{1}}{s\sqrt{2}}a_{1}h_{1}+\frac{s_{2}}{s\sqrt{2}}a_{2}h_{2}=A_{1}h_{1}+A_{2}h_{2}. (133)

where we set A_{1}=\frac{s_{1}}{s\sqrt{2}}a_{1} and A_{2}=\frac{s_{2}}{s\sqrt{2}}a_{2}. We write A_{1,\infty} and A_{2,\infty} for the limits of A_{1} and A_{2} respectively. From the definitions of a_{1} and a_{2}, both A_{1} and A_{2} are nonzero as long as \nu\neq 0. From the statement of Lemma 6.7 and an application of Lemma 6.8, it is easy to recover the following implications in the sub-cases of \nu=0:

  (a) If |X_{2}|=|X_{1}| and \nu=0, then A_{1}=A_{2}=\frac{1}{\sqrt{2}},

  (b) if |X_{2}|>|X_{1}| and \nu=0, then A_{1}=0, A_{2}=1,

  (c) if |X_{1}|>|X_{2}| and \nu=0, then A_{1}=1, A_{2}=0.

For sub-case (b) we get \tilde{h}=h_{2}, so the assertion of the proposition in this sub-case is the same as that of Proposition 6.10. The remaining cases are therefore sub-cases (a), (c) and the case \nu\neq 0. In all of these A_{1}>0, and hence <h_{1},h_{2}>=<\tilde{h},h_{2}>. For that reason we can re-write (132) as follows,

proj<q,h1,h2>(b2)2proj<q,h~>(b2)2=proj<q,h~,h2>(b2)2proj<q,h~>(b2)2=(h2,b2)2\displaystyle||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,\tilde{h}>}{\operatorname{proj}}(b_{2})||^{2}=||\underset{<q,\tilde{h},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,\tilde{h}>}{\operatorname{proj}}(b_{2})||^{2}=(h_{2}^{*},b_{2})^{2} (134)

where we set h_{2}^{*}=\frac{h_{2}-\underset{<q,\tilde{h}>}{\operatorname{proj}}(h_{2})}{||h_{2}-\underset{<q,\tilde{h}>}{\operatorname{proj}}(h_{2})||}, which is by definition a unit vector orthogonal to the linear subspace generated by q and \tilde{h}. Continuing from (134), it is sufficient to show

(h_{2}^{*},b_{2})_{\infty}^{2}=\lim\limits_{p\rightarrow\infty}\Big{(}\frac{h_{2}-\underset{<q,\tilde{h}>}{\operatorname{proj}}(h_{2})}{||h_{2}-\underset{<q,\tilde{h}>}{\operatorname{proj}}(h_{2})||},b_{2}\Big{)}^{2}>0\text{ }\text{ almost surely}.

By means of Theorem 5.1 and Proposition 5.5 we can derive the following decomposition:

Lemma 6.12.
\lim\limits_{p\rightarrow\infty}\big{(}h_{2}-\underset{<q,\tilde{h}>}{\operatorname{proj}}(h_{2}),b_{2}\big{)}=\frac{Q_{1}+(1-(b,q)^{2}_{\infty})A_{1,\infty}Q_{2}+Q_{3}}{1-(\tilde{h},q)_{\infty}^{2}}

where

Q_{1}=(b,q)^{2}_{\infty}(1-\nu)A_{1,\infty}A_{2,\infty}(h_{1},b_{1})_{\infty}(1-(h_{2},b_{2})_{\infty}^{2})
Q_{2}=A_{1,\infty}(h_{2},b_{2})_{\infty}(1-(h_{1},b_{1})_{\infty}^{2})-\nu A_{2,\infty}(h_{1},b_{1})_{\infty}(1-(h_{2},b_{2})_{\infty}^{2})
Q_{3}=(1-\nu)(1+\nu-2(b,q)^{2}_{\infty})A_{1,\infty}^{2}(h_{1},b_{1})_{\infty}^{2}(h_{2},b_{2})_{\infty}

The proof of Lemma 6.12 requires some algebraic computations and is omitted here, but complete details appear in Gurdogan, [2021].

We now prove Q_{2}\geq 0, Q_{1}\geq 0, and Q_{3}>0 in turn. We will use the following implications of the second and third parts of Proposition 6.9,

\nu<1\text{ }\text{ and }\text{ }1+\nu-2(b,q)^{2}_{\infty}>0\text{ }\text{ almost surely.} (135)

Let us start by proving Q_{2}\geq 0. Using Lemmas 6.7, 6.8 and 6.14 one can derive the following,

Q_{2}=\frac{\sqrt{\gamma}\delta^{2}}{C}\frac{\sqrt{\lambda-|X_{2}|^{2}}}{|X_{2}|}(|X|^{2}-\lambda) (136)
C=\sqrt{(\gamma\lambda+\delta^{2})(\gamma|X_{1}|^{2}+\delta^{2})(\gamma|X_{2}|^{2}+\delta^{2})(2\lambda-|X|^{2})}\text{,}\text{ }\text{ }\gamma=\lim\limits_{p\rightarrow\infty}\frac{|\beta|^{2}}{p}.

From the definition of λ\lambda in Lemma 6.7, we can immediately infer that

max(|X1|2,|X2|2)λ|X|2.max(|X_{1}|^{2},|X_{2}|^{2})\leq\lambda\leq|X|^{2}.

By assumptions (1) and (4), the remaining terms in (136) are positive. Hence Q_{2}\geq 0 almost surely.

As for Q_{1}, the terms A_{1} and A_{2} are non-negative by definition, and the terms (h_{1},b_{1})_{\infty} and 1-(h_{2},b_{2})_{\infty}^{2} are positive by Lemma 6.14. We have (b,q)_{\infty}^{2}>0 as a straightforward implication of modeling assumption (1), and 1-\nu>0 by (135). Together these prove that Q_{1}\geq 0.

Now let us check that all the terms involved in Q_{3} are positive. The terms 1-\nu and 1+\nu-2(b,q)_{\infty}^{2} are positive by (135). The terms (h_{1},b_{1})_{\infty} and (h_{2},b_{2})_{\infty} are positive by Lemma 6.14. Finally, A_{1}>0 in the remaining case and sub-cases being treated. Altogether this shows Q_{3}>0.

Applying Q_{1}\geq 0, Q_{2}\geq 0, Q_{3}>0 and A_{1,\infty}>0 to Lemma 6.12 proves that

\lim\limits_{p\rightarrow\infty}\big{(}h_{2}-\underset{<q,\tilde{h}>}{\operatorname{proj}}(h_{2}),b_{2}\big{)}>0 (137)

almost surely. This finishes the proof. ∎

Proof of Theorem 2.4(b).

Using Theorem 2.2 it suffices to prove

limp(hLbhqsb)<0 and  limp(hLbhqdb)<0  almost surely\lim\limits_{p\rightarrow\infty}\big{(}||h_{L}-b||-||h^{s}_{q}-b||\big{)}<0\text{ and }\text{ }\lim\limits_{p\rightarrow\infty}\big{(}||h_{L}-b||-||h_{q}^{d}-b||\big{)}<0\text{ }\text{ almost surely} (138)

Recalling the definitions of h_{L}, h^{s}_{q} and h_{q}^{d}, we can re-write this as

limp(proj<q,h1,h2>(b2)2proj<q,h>(b2)2)>0\lim\limits_{p\rightarrow\infty}\big{(}||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,h>}{\operatorname{proj}}(b_{2})||^{2}\big{)}>0 (139)

and

limp(proj<q,h1,h2>(b2)2proj<q,h2>(b2)2)>0\lim\limits_{p\rightarrow\infty}\big{(}||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,h_{2}>}{\operatorname{proj}}(b_{2})||^{2}\big{)}>0 (140)

almost surely. Recall from Proposition 5.5 that \tilde{h} converges to h almost surely in the l_{2} norm. Using this, we can replace (139) and (140) with the following equivalent versions,

limp(proj<q,h1,h2>(b2)2proj<q,h~>(b2)2)>0\lim\limits_{p\rightarrow\infty}\big{(}||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,\tilde{h}>}{\operatorname{proj}}(b_{2})||^{2}\big{)}>0 (141)

and

limp(proj<q,h1,h2>(b2)2proj<q,h2>(b2)2)>0\lim\limits_{p\rightarrow\infty}\big{(}||\underset{<q,h_{1},h_{2}>}{\operatorname{proj}}(b_{2})||^{2}-||\underset{<q,h_{2}>}{\operatorname{proj}}(b_{2})||^{2}\big{)}>0 (142)

almost surely. Propositions 6.11 and 6.10 now give (141) and (142) respectively, which finishes the proof. ∎
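Theorem 2.4(b) rests on the projection comparisons (139) and (140). The following sketch, which is illustrative only, simulates the two-block one-factor model and compares the three projection norms directly; here b_{2}=\beta_{2}/|\beta_{2}| is assumed, the parameters are arbitrary choices, and since the statements are asymptotic the differences are only typically positive at finite p.

# Simulation sketch of the projection comparisons (139)-(140); parameters and
# the identification b2 = beta2/|beta2| are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
p, n, delta = 5000, 10, 1.0
beta1 = 1.0 + 0.3 * rng.standard_normal(p)
beta2 = beta1 + 0.2 * rng.standard_normal(p)     # correlated but distinct betas
X1, X2 = rng.standard_normal(n), rng.standard_normal(n)
R1 = np.outer(beta1, X1) + delta * rng.standard_normal((p, n))
R2 = np.outer(beta2, X2) + delta * rng.standard_normal((p, n))

def lead(M):
    # Leading left singular vector = leading eigenvector of the sample covariance.
    return np.linalg.svd(M, full_matrices=False)[0][:, 0]

h1, h2, h = lead(R1), lead(R2), lead(np.hstack([R1, R2]))
q = np.ones(p) / np.sqrt(p)
b2 = beta2 / np.linalg.norm(beta2)

def proj_norm_sq(v, cols):
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return np.linalg.norm(Q.T @ v) ** 2

big = proj_norm_sq(b2, [q, h1, h2])
print(big - proj_norm_sq(b2, [q, h]))    # typically > 0, cf. (139)
print(big - proj_norm_sq(b2, [q, h2]))   # typically > 0, cf. (140)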

6.5 Two Lemmas

For the reader’s convenience, we state two lemmas from Goldberg et al., [2021] that are used above. The reader may consult that paper for proofs.

Lemma 6.13 (Lemma A.1).

Let η={η(i)}i=1\eta=\{\eta(i)\}_{i=1}^{\infty} be a real sequence with μp(η)=1pi=1pη(i)\mu_{p}(\eta)=\frac{1}{p}\sum_{i=1}^{p}\eta(i) satisfying

lim infpμp(η)>0.\liminf_{p\to\infty}\mu_{p}(\eta)>0.

Let {Z(i)}i=1\{Z(i)\}_{i=1}^{\infty} be a sequence of mean zero, pairwise independent and identically distributed real random variables with finite variance.

Then

ηpTZpp|ηp|0\frac{\eta_{p}^{T}Z_{p}}{\sqrt{p}|\eta_{p}|}\to 0

almost surely as p\to\infty, where

ηp=(η(1),,η(p))T and Zp=(Z(1),,Z(p))T.\eta_{p}=(\eta(1),\dots,\eta(p))^{T}\text{ and }Z_{p}=(Z(1),\dots,Z(p))^{T}.
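A quick Monte Carlo illustration of Lemma 6.13 follows; the choice of \eta and of the distribution of Z below are illustrative assumptions only. The normalized inner product \eta_{p}^{T}Z_{p}/(\sqrt{p}|\eta_{p}|) visibly shrinks as p grows.

# Numerical illustration of Lemma 6.13 with an illustrative choice of eta and Z.
import numpy as np

rng = np.random.default_rng(3)
for p in (100, 1_000, 10_000, 100_000):
    eta = 1.0 + 0.5 * np.sin(np.arange(p))   # deterministic, with liminf of means > 0
    Z = rng.standard_normal(p)               # mean zero, iid, finite variance
    print(p, abs(eta @ Z) / (np.sqrt(p) * np.linalg.norm(eta)))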

With the notation and assumptions of our theorems, define a non-degenerate random variable σX\sigma_{X} by

σX2=|X|2nμ2(1+d2(β)),\sigma_{X}^{2}=\frac{|X|^{2}}{n}\mu_{\infty}^{2}(1+d_{\infty}^{2}(\beta)),

and recall: sp2s_{p}^{2} is the leading eigenvalue of the sample covariance matrix S=RRT/nS=RR^{T}/n, p2\ell_{p}^{2} is the average of the remaining eigenvalues, and χp\chi_{p} is the normalized right singular vector of Y/nY/\sqrt{n} corresponding to the singular value sp0s_{p}\geq 0.

Lemma 6.14 (Lemma A.2).

Almost surely as pp\to\infty,

sppσX2+δ2/n,χpX|X|, and p2pδ2n.\frac{s_{p}}{\sqrt{p}}\to\sqrt{\sigma_{X}^{2}+\delta^{2}/n},\,\chi_{p}\to\frac{X}{|X|},\text{ and }\frac{\ell_{p}^{2}}{p}\to\frac{\delta^{2}}{n}.
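Lemma 6.14 can also be visualized by simulation. The sketch below assumes the one-factor model Y=\beta X^{T}+Z with homoscedastic noise of variance \delta^{2} (our reading of the notation Y above), approximates \sigma_{X}^{2} by |X|^{2}|\beta|^{2}/(np), which agrees with the stated formula when \mu_{\infty}^{2}(1+d_{\infty}^{2}(\beta)) is read as the limit of |\beta|^{2}/p, and computes \ell_{p}^{2} from the remaining nonzero eigenvalues. These identifications and the parameters are assumptions made only for illustration.

# Simulation sketch of Lemma 6.14 at large but finite p (illustrative assumptions:
# Y = beta X^T + Z, sigma_X^2 approximated by |X|^2 |beta|^2 / (n p)).
import numpy as np

rng = np.random.default_rng(4)
p, n, delta = 20_000, 10, 1.0
beta = 1.0 + 0.3 * rng.standard_normal(p)
X = rng.standard_normal(n)
Y = np.outer(beta, X) + delta * rng.standard_normal((p, n))

U, svals, Vt = np.linalg.svd(Y / np.sqrt(n), full_matrices=False)
s_p = svals[0]                         # square root of the leading eigenvalue of S
chi_p = Vt[0]                          # leading right singular vector of Y/sqrt(n)
ell_sq = np.mean(svals[1:] ** 2)       # average of the remaining nonzero eigenvalues of S

sigmaX_sq = np.dot(X, X) / n * np.dot(beta, beta) / p
print(s_p / np.sqrt(p), np.sqrt(sigmaX_sq + delta ** 2 / n))          # close for large p
print(min(np.linalg.norm(chi_p - X / np.linalg.norm(X)),
          np.linalg.norm(chi_p + X / np.linalg.norm(X))))             # small (sign ambiguity)
print(ell_sq / p, delta ** 2 / n)                                     # close for large p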

References

  • Bickel and Levina, [2008] Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. The Annals of Statistics, pages 2577–2604.
  • El Karoui, [2008] El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. The Annals of Statistics, pages 2757–2790.
  • Fan et al., [2008] Fan, J., Fan, Y., and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147(1):186–197.
  • Fan et al., [2013] Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680.
  • Frost and Savarino, [1986] Frost, P. A. and Savarino, J. E. (1986). An empirical Bayes approach to efficient portfolio selection. Journal of Financial and Quantitative Analysis, 21(3):293–305.
  • Goldberg et al., [2021] Goldberg, L., Papanicolaou, A., and Shkolnik, A. (2021). The dispersion bias. CDAR working paper https://cdar.berkeley.edu/publications/dispersion-bias.
  • Gurdogan, [2021] Gurdogan, H. (2021). Eigenvector Shrinkage for Estimating Covariance Matrices. PhD thesis, Florida State University.
  • Hall et al., [2005] Hall, P., Marron, J. S., and Neeman, A. (2005). Geometric representation of high dimension, low sample size data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(3):427–444.
  • Lai and Xing, [2008] Lai, T. L. and Xing, H. (2008). Statistical models and methods for financial markets. Springer.
  • Ledoit and Wolf, [2003] Ledoit, O. and Wolf, M. (2003). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10(5):603–621.
  • Ledoit and Wolf, [2004] Ledoit, O. and Wolf, M. (2004). Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management, 30:110–119.
  • Ledoit and Wolf, [2017] Ledoit, O. and Wolf, M. (2017). Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. The Review of Financial Studies, 30(12):4349–4388.
  • Markowitz, [1952] Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1):77–91.
  • Rosenberg, [1974] Rosenberg, B. (1974). Extra-market components of covariance in security returns. Journal of Financial and Quantitative Analysis, 9(2):263–274.
  • Ross, [1976] Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3):341–360.
  • Vasicek, [1973] Vasicek, O. A. (1973). A note on using cross-sectional information in bayesian estimation of security betas. The Journal of Finance, 28(5):1233–1239.