
A Bootstrap Method for Spectral Statistics in High-Dimensional Elliptical Models

Siyao Wang and Miles E. Lopes
Department of Statistics, University of California, Davis
(2020)
Abstract

Although there is an extensive literature on the eigenvalues of high-dimensional sample covariance matrices, much of it is specialized to independent components (IC) models—in which observations are represented as linear transformations of random vectors with independent entries. By contrast, less is known in the context of elliptical models, which violate the independence structure of IC models and exhibit quite different statistical phenomena. In particular, very little is known about the scope of bootstrap methods for doing inference with spectral statistics in high-dimensional elliptical models. To fill this gap, we show how a bootstrap approach developed previously for IC models can be extended to handle the different properties of elliptical models. Within this setting, our main theoretical result guarantees that the proposed method consistently approximates the distributions of linear spectral statistics, which play a fundamental role in multivariate analysis. We also provide empirical results showing that the proposed method performs well for a variety of nonlinear spectral statistics.

Keywords: high-dimensional statistics, bootstrap, covariance matrices, elliptical distributions

MSC classifications: 62H99, 62F40, 62H25

Supported in part by NSF Grant DMS-1915786.

1 Introduction

The analysis of spectral statistics of sample covariance matrices is a major research area within multivariate analysis, random matrix theory, high-dimensional statistics, and related fields [5, 48, 46, 41]. If $\mathbf{x}_{1},\dots,\mathbf{x}_{n}$ are centered i.i.d. observations in $\mathbb{R}^{p}$ with a sample covariance matrix denoted by

$$\widehat{\Sigma}_{n}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_{i}\mathbf{x}_{i}^{\top}, \qquad (1.1)$$

then we say that a random variable $T_{n}$ is a spectral statistic if it has the form $T_{n}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}))$, where $\lambda_{1}(\widehat{\Sigma}_{n})\geq\cdots\geq\lambda_{p}(\widehat{\Sigma}_{n})$ are the sorted eigenvalues of $\widehat{\Sigma}_{n}$, and $\psi$ is a generic real-valued function. Over the past two decades, there has been a tremendous growth of interest in spectral statistics in high-dimensional settings where $p=p(n)$ grows so that $p/n$ converges to a positive constant as $n\to\infty$. Likewise, statistical tools for approximating the distributions of spectral statistics have been applied to high-dimensional data in a broad range of domains, such as electrical engineering, finance, and biology [40, 1, 9, 10].
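As a concrete illustration, the definitions above can be reproduced in a few lines of NumPy; the Gaussian data, the sample sizes, and the two choices of $\psi$ below are purely illustrative and are not part of the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 100

# Centered i.i.d. observations (Gaussian only for illustration).
X = rng.standard_normal((n, p))

# Sample covariance matrix as in (1.1): 1/n normalization, no centering term.
Sigma_hat = X.T @ X / n

# Sorted eigenvalues lambda_1 >= ... >= lambda_p.
lam = np.sort(np.linalg.eigvalsh(Sigma_hat))[::-1]

# Two spectral statistics T_n = psi(lambda_1, ..., lambda_p):
largest = lam[0]              # psi = max, a nonlinear spectral statistic
avg = lam.mean()              # psi = average, equal to tr(Sigma_hat)/p
```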

Although the research on spectral statistics has dealt with many different statistical models, two of the most influential ones have been elliptical and independent components (IC) models. To be specific, we say that the random vector $\mathbf{x}_{1}$ follows an elliptical model if it can be represented in the form

$$\mathbf{x}_{1}=\xi_{1}\Sigma_{n}^{1/2}\mathbf{u}_{1}, \qquad (1.2)$$

where $\Sigma_{n}=\mathbb{E}(\mathbf{x}_{1}\mathbf{x}_{1}^{\top})$, and $(\mathbf{u}_{1},\xi_{1})\in\mathbb{R}^{p}\times[0,\infty)$ is a random vector such that $\mathbf{u}_{1}$ and $\xi_{1}$ are independent, with $\mathbf{u}_{1}$ being uniformly distributed on the unit sphere of $\mathbb{R}^{p}$. Alternatively, we say that $\mathbf{x}_{1}$ follows an IC model if

$$\mathbf{x}_{1}=\Sigma_{n}^{1/2}\mathbf{z}_{1}, \qquad (1.3)$$

where $\mathbf{z}_{1}\in\mathbb{R}^{p}$ is a random vector whose entries are centered and independent.
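The two representations can be simulated directly; in the sketch below, the diagonal $\Sigma_{n}^{1/2}$ and the choice $\xi_{1}^{2}\sim\chi^{2}_{p}$ are illustrative assumptions (the chi-squared choice satisfies $\mathbb{E}(\xi_{1}^{2})=p$ and in fact recovers the Gaussian case of the elliptical model).

```python
import numpy as np

rng = np.random.default_rng(1)
p = 50
# An illustrative Sigma_n^{1/2}; any positive semidefinite square root works.
Sigma_half = np.diag(np.linspace(0.5, 2.0, p)) ** 0.5

# Elliptical draw (1.2): x = xi * Sigma^{1/2} u, with u uniform on the unit
# sphere and xi >= 0 independent of u.
g = rng.standard_normal(p)
u = g / np.linalg.norm(g)                 # uniform on the unit sphere
xi = np.sqrt(rng.chisquare(p))            # xi^2 ~ chi-squared(p)
x_elliptical = xi * (Sigma_half @ u)

# IC draw (1.3): x = Sigma^{1/2} z with independent centered entries in z.
z = rng.standard_normal(p)
x_ic = Sigma_half @ z
```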

At first sight, these two models may seem to be very similar, but this outward appearance conceals some crucial differences in modelling capabilities. In particular, it should be stressed that the entries of the random vector $\xi_{1}\mathbf{u}_{1}$ in an elliptical model are correlated, which contrasts with the independence of the entries of $\mathbf{z}_{1}$ in an IC model. Also, since the scalar random variable $\xi_{1}$ is shared across all entries of $\mathbf{x}_{1}$ in an elliptical model, this enhances the ability to capture scenarios where the magnitudes of all entries of $\mathbf{x}_{1}$ move in the same direction simultaneously. This is a key effect in some application domains, such as in finance, where the entries of $\mathbf{x}_{1}$ correspond to stock prices that can fall in tandem during a sharp market downturn. Additional background on related merits of elliptical models can be found in [11, 37, 16]. More generally, the multivariate analysis literature has placed a longstanding emphasis on the benefits of elliptical models in fitting various types of non-Gaussian data [14, 2, 17].

However, looking beyond the points just mentioned, IC models have played a more dominant role than elliptical models in the literature on spectral statistics in high dimensions. Consequently, the established body of high-dimensional limit theory is much less complete for elliptical models. Indeed, the challenge of extending results from IC models to elliptical ones has become a prominent topic of ongoing research, which has led to important advances in the limit theory for spectral statistics [e.g. 28, 19, 18, 47, 24, 29, 51]. As a matter of historical context, it is also worth bearing in mind that for some spectral statistics, it took many years for such extensions to be established.

From the standpoint of statistical methodology, a corresponding set of gaps exists between elliptical and IC models. These gaps are especially apparent in the current state of bootstrap methods for high-dimensional data. In particular, it is known from [34] that a form of parametric bootstrapping can successfully approximate the distributions of spectral statistics in IC models, whereas very little is known for elliptical models. Accordingly, our primary goal in the current paper is to resolve this issue by developing a parametric bootstrap method that is both theoretically and empirically effective in high-dimensional elliptical models.

With regard to theory, we will focus on the class of linear spectral statistics, which have the form

$$T_{n}(f)=\frac{1}{p}\sum_{j=1}^{p}f(\lambda_{j}(\widehat{\Sigma}_{n})), \qquad (1.4)$$

for a suitable real-valued function $f$. Beginning with the pathbreaking works [23, 3] that established the earliest versions of the central limit theorem for linear spectral statistics in high dimensions, these statistics have been a perennial focus of research. Their importance is underscored by the fact that they appear frequently throughout multivariate analysis, with some of the most well-known examples being $\frac{1}{p}\textup{tr}(\widehat{\Sigma}_{n})$, $\frac{1}{p}\textup{tr}(\widehat{\Sigma}_{n}^{2})$, and $\frac{1}{p}\log\det(\widehat{\Sigma}_{n})$, among various other classical statistics for testing hypotheses [48].
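The three classical examples just mentioned can be computed directly from the sorted eigenvalues; the simulated data below are only for illustration, and $p<n$ is used so that $\widehat{\Sigma}_{n}$ is invertible almost surely.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 100                     # p < n, so Sigma_hat is invertible a.s.
X = rng.standard_normal((n, p))    # illustrative data
Sigma_hat = X.T @ X / n
lam = np.linalg.eigvalsh(Sigma_hat)

# T_n(f) = (1/p) * sum_j f(lambda_j) for three classical choices of f.
T_trace = lam.mean()               # f(t) = t,     i.e. tr(Sigma_hat)/p
T_square = (lam ** 2).mean()       # f(t) = t^2,   i.e. tr(Sigma_hat^2)/p
T_logdet = np.log(lam).mean()      # f(t) = log t, i.e. log det(Sigma_hat)/p
```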

Motivated by these considerations, our main theoretical result (Theorem 2) shows that the proposed bootstrap method consistently approximates the distributions of linear spectral statistics when the underlying data are elliptical and $p/n$ converges to a positive constant as $n\to\infty$. The proof substantially leverages recent progress on the central limit theorem for linear spectral statistics in elliptical models due to [18]. Also, an intermediate step in the proof (Lemma A.4) shows that the well-known eigenvalue estimation method QuEST is consistent in elliptical models—which may be of independent interest, since it seems that QuEST’s consistency has not previously been reported outside of IC models [26]. Moreover, Section 3.4 develops an application of Theorem 2 where we establish the asymptotic validity of inference procedures related to the stable rank parameter $r_{n}=\textup{tr}(\Sigma_{n})^{2}/\textup{tr}(\Sigma_{n}^{2})$ of the population covariance matrix $\Sigma_{n}$.

To address the empirical performance of the proposed method, Section 4 presents numerical results for a wide variety of model settings and statistics. Most notably, these results show encouraging performance for both linear and nonlinear spectral statistics. (We regard any function of $(\lambda_{1}(\widehat{\Sigma}_{n}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}))$ that is not of the form (1.4) as a nonlinear spectral statistic.) To put this point into perspective, it is important to highlight the fact that asymptotic formulas for the distributions of nonlinear spectral statistics are typically developed on a case-by-case basis, and are relatively scarce in comparison to those for linear spectral statistics. Even when such formulas are available, they may be very different from those for linear spectral statistics—as they may require the estimation of different model parameters, or the implementation of different algorithms for numerical evaluation. On the other hand, the bootstrap approach may be considered more user-friendly, since it can be applied to different types of spectral statistics in an automatic and unified way. Similarly, the bootstrap approach can provide the user with the freedom to easily explore statistics that depend on several linear spectral statistics in a complicated manner (which would otherwise require intricate delta-method calculations), or statistics for which formulas may not be available at all.

Notation and terminology. For a random object $Y$, the expression $\mathcal{L}(Y)$ denotes its distribution, and $\mathcal{L}(Y|X)$ denotes its conditional distribution given the observations $\mathbf{x}_{1},\dots,\mathbf{x}_{n}$. Similarly, we use $|X$ when referring to probabilities, expectations, and variances that are conditional on the observations. The symbols $\xrightarrow{\mathbb{P}}$ and $\Rightarrow$ respectively denote convergence in probability and convergence in distribution. For a set $A\subset\mathbb{R}^{k}$ and a number $\delta>0$, the outer $\delta$-neighborhood of $A$ is defined as $A^{\delta}=\{\mathbf{a}^{\prime}\in\mathbb{R}^{k}\,|\,\inf_{\mathbf{a}\in A}\|\mathbf{a}^{\prime}-\mathbf{a}\|_{2}\leq\delta\}$, where $\|\cdot\|_{2}$ is the Euclidean norm. If $\mathbf{v}$ and $\mathbf{w}$ are random vectors in $\mathbb{R}^{k}$, then the Lévy-Prohorov metric $d_{\textup{LP}}(\mathcal{L}(\mathbf{v}),\mathcal{L}(\mathbf{w}))$ between their distributions is defined as the infimum over all numbers $\delta>0$ such that the inequality $\mathbb{P}(\mathbf{v}\in A)\leq\mathbb{P}(\mathbf{w}\in A^{\delta})+\delta$ holds for all Borel sets $A\subset\mathbb{R}^{k}$. If $\nu,\nu_{1},\nu_{2},\dots$ is a sequence of random probability distributions on $\mathbb{R}^{k}$, then the expression $\nu_{n}\overset{\mathbb{P}}{\Rightarrow}\nu$ means that the sequence of scalar random variables $d_{\textup{LP}}(\nu_{n},\nu)$ converges to 0 in probability as $n\to\infty$. For two sequences of non-negative real numbers $\{a_{n}\}$ and $\{b_{n}\}$, we write $a_{n}\lesssim b_{n}$ if there is a constant $C>0$ not depending on $n$ such that $a_{n}\leq Cb_{n}$ holds for all large $n$. When both of the relations $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$ hold, we write $a_{n}\asymp b_{n}$. The relation $a_{n}=o(b_{n})$ means $a_{n}/b_{n}\to 0$ as $n\to\infty$, and the relation $a_{n}=\mathcal{O}(b_{n})$ is equivalent to $a_{n}\lesssim b_{n}$. The $k\times k$ identity matrix is denoted as $I_{k}$, and the indicator function for a condition $\cdots$ is denoted as $1\{\cdots\}$. Lastly, we use $\mathbb{C}^{+}$ to refer to the set of complex numbers with positive imaginary part.

2 Method

Conceptually, the proposed method is motivated by the fact that the standard nonparametric bootstrap, based on sampling with replacement, often performs poorly when it is applied naively to high-dimensional data, unless special low-dimensional structure is available. (For additional background, see the papers [13, 32, 49], as well as the numerical results presented here at the end of Section 4.2.) This general difficulty can be understood by noting that sampling with replacement implicitly relies on the empirical distribution of the data as a substitute for the true data-generating distribution. In other words, the nonparametric bootstrap attempts to approximate a $p$-dimensional distribution in a fully non-parametric way, which can be challenging for even moderately large values of $p$. For this reason, alternative bootstrap methods that sample from parametric distributions have been advocated to improve upon the nonparametric bootstrap in high dimensions [e.g. 35, 34, 52], and this is the viewpoint that we pursue here.

2.1 Bootstrap algorithm

At an algorithmic level, the proposed method is built on top of two estimators. The first is an estimator $\widehat{\varsigma}_{n}^{\,2}$ for the variance parameter $\varsigma_{n}^{2}=\operatorname{var}(\xi_{1}^{2})$. The second is an estimator $\tilde{\Sigma}_{n}$ for the diagonal matrix of population eigenvalues $\textup{diag}(\lambda_{1}(\Sigma_{n}),\dots,\lambda_{p}(\Sigma_{n}))$. Once these two estimators have been assembled, the method generates bootstrap data from an elliptical model parameterized in terms of $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$. More specifically, the $i$th sample of each bootstrap dataset $\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}$ is generated to be of the form

$$\mathbf{x}_{i}^{*}=\xi_{i}^{*}\tilde{\Sigma}^{1/2}_{n}\mathbf{u}_{i}^{*},$$

where $\xi_{i}^{*}$ is a non-negative random variable satisfying $\mathbb{E}((\xi_{i}^{*})^{2}|X)=p$ and $\operatorname{var}((\xi_{i}^{*})^{2}|X)=\widehat{\varsigma}_{n}^{\,2}$, and $\mathbf{u}_{i}^{*}\in\mathbb{R}^{p}$ is drawn uniformly from the unit sphere, independently of $\xi_{i}^{*}$. Then, a single bootstrap sample of a generic spectral statistic $T_{n}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}))$ is computed as $T_{n}^{*}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}^{*}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}^{*}))$, where $\widehat{\Sigma}_{n}^{*}$ is the sample covariance matrix of the bootstrap data.

To emphasize the modular role that $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$ play in the bootstrap sampling process, we will provide the details for their construction later in Sections 2.2 and 2.3. With these points understood, the following algorithm shows that the method is very easy to implement.

Algorithm 1 (Bootstrap for spectral statistics).

Input: The number of bootstrap replicates $B$, as well as $\tilde{\Sigma}_{n}$ and $\widehat{\varsigma}_{n}^{\,2}$.

For $b=1,\dots,B$ do in parallel:

1. Generate independent random variables $g_{1}^{*},\dots,g_{n}^{*}$ from a Gamma distribution with mean $p$ and variance $\widehat{\varsigma}_{n}^{\,2}$, and then put $\xi^{*}_{i}=\sqrt{g^{*}_{i}}$ for $i=1,\dots,n$.

2. Generate independent random vectors $\mathbf{u}^{*}_{1},\ldots,\mathbf{u}^{*}_{n}\in\mathbb{R}^{p}$ from the uniform distribution on the unit sphere.

3. Compute $\mathbf{x}^{*}_{i}=\xi^{*}_{i}\tilde{\Sigma}_{n}^{1/2}\mathbf{u}^{*}_{i}$ for $i=1,\dots,n$, and form $\widehat{\Sigma}_{n}^{*}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_{i}^{*}(\mathbf{x}_{i}^{*})^{\top}$.

4. Compute the spectral statistic $T^{*}_{n,b}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}^{*}),\ldots,\lambda_{p}(\widehat{\Sigma}_{n}^{*}))$.

End for.

Return: The empirical distribution of $T^{*}_{n,1},\ldots,T^{*}_{n,B}$.

Remarks. One basic but valuable feature of the algorithm is that it can be applied with equal ease to both linear and nonlinear spectral statistics. To comment on some more technical aspects of the algorithm, the Gamma distribution is used in step 1 because it offers a convenient way to generate non-negative random variables whose means and variances can be matched to any pair of positive numbers. (If the event $\widehat{\varsigma}_{n}^{\,2}=0$ happens to occur, then the Gamma distribution in step 1 is interpreted as the point mass at $p$, so that $\xi_{i}^{*}=\sqrt{p}$ for all $i=1,\dots,n$.) Nevertheless, the choice of the Gamma distribution for generating $g_{1}^{*},\dots,g_{n}^{*}$ is not required. Any other family of distributions on $[0,\infty)$ parameterized by means $\mu$ and variances $\sigma^{2}$, say $\{G_{\mu,\sigma^{2}}\}$, will be compatible with our bootstrap consistency result in Theorem 2 if it satisfies the following two conditions: First, the pair $(\mu,\sigma^{2})$ can be set to $(p,v)$ for any integer $p\geq 1$ and real number $v>0$. Second, there exists some fixed $\varepsilon>0$ such that the quantity $\int|\frac{t-p}{\sqrt{p}}|^{4+\varepsilon}\,dG_{p,\sigma^{2}}(t)$ remains bounded when $p$ diverges and $\sigma^{2}/p$ converges to a finite limit.
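The four steps of Algorithm 1 can be sketched in NumPy as follows. The statistic `psi`, the eigenvalue vector `lam_tilde`, and the input values `varsigma2_hat=80.0`, `n=80`, `B=20` in the usage line are placeholders for illustration, rather than outputs of the estimators in Sections 2.2 and 2.3; note that a Gamma distribution with shape $p^{2}/v$ and scale $v/p$ has mean $p$ and variance $v$, as step 1 requires.

```python
import numpy as np

def bootstrap_spectral(psi, lam_tilde, varsigma2_hat, n, B=100, seed=0):
    """Sketch of Algorithm 1.  `lam_tilde` holds the estimated population
    eigenvalues (the diagonal of Sigma_tilde) and `varsigma2_hat` the
    estimated variance of xi_1^2."""
    rng = np.random.default_rng(seed)
    p = len(lam_tilde)
    root = np.sqrt(lam_tilde)              # Sigma_tilde^{1/2} is diagonal
    out = np.empty(B)
    for b in range(B):
        # Step 1: Gamma draws with mean p and variance varsigma2_hat.
        if varsigma2_hat > 0:
            g = rng.gamma(p ** 2 / varsigma2_hat, varsigma2_hat / p, size=n)
        else:
            g = np.full(n, float(p))       # point mass at p if the estimate is 0
        xi = np.sqrt(g)
        # Step 2: independent uniform draws on the unit sphere.
        U = rng.standard_normal((n, p))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        # Step 3: bootstrap data and its sample covariance matrix.
        X_star = xi[:, None] * U * root[None, :]
        S_star = X_star.T @ X_star / n
        # Step 4: the spectral statistic on the bootstrap eigenvalues.
        out[b] = psi(np.sort(np.linalg.eigvalsh(S_star))[::-1])
    return out                              # empirical bootstrap distribution

samples = bootstrap_spectral(lambda lam: lam.mean(),
                             lam_tilde=np.linspace(0.5, 2.0, 40),
                             varsigma2_hat=80.0, n=80, B=20)
```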

2.2 Variance estimation

The estimation of the parameter $\varsigma_{n}^{2}=\operatorname{var}(\xi_{1}^{2})$ is complicated by the fact that the random variables $\xi_{1},\dots,\xi_{n}$ are not directly observable in an elliptical model. It is possible to overcome this challenge with the following estimating equation, which can be derived from an explicit formula for $\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2})$ that is given in Lemma D.1:

$$\varsigma_{n}^{2}\ =\ p(p+2)\frac{\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2})-2\textup{tr}(\Sigma_{n}^{2})}{\textup{tr}(\Sigma_{n})^{2}+2\textup{tr}(\Sigma_{n}^{2})}+2p. \qquad (2.1)$$

Based on this equation, our approach is to separately estimate each of the three moment parameters on the right hand side, denoted as

$$\alpha_{n}=\textup{tr}(\Sigma_{n}^{2}), \qquad \beta_{n}=\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2}), \qquad \gamma_{n}=\textup{tr}(\Sigma_{n})^{2}.$$

These parameters have the advantage that they can be estimated in a more direct manner, due to their simpler relations with the observations and the matrix $\Sigma_{n}$. Specifically, we use estimates defined according to

$$\widehat{\alpha}_{n}=\textup{tr}(\widehat{\Sigma}_{n}^{2})-\tfrac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2},\qquad \widehat{\beta}_{n}=\frac{1}{n-1}\sum_{i=1}^{n}\Big(\|\mathbf{x}_{i}\|_{2}^{2}-\tfrac{1}{n}\sum_{i^{\prime}=1}^{n}\|\mathbf{x}_{i^{\prime}}\|_{2}^{2}\Big)^{2},\qquad \widehat{\gamma}_{n}=\textup{tr}(\widehat{\Sigma}_{n})^{2}.$$

Substituting these estimates into (2.1) yields our proposed estimate for $\varsigma_{n}^{2}$:

$$\widehat{\varsigma}_{n}^{\,2}=\bigg(p(p+2)\frac{\widehat{\beta}_{n}-2\widehat{\alpha}_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}+2p\bigg)_{\!+}, \qquad (2.2)$$

where $x_{+}=\max\{x,0\}$ denotes the non-negative part of any real number $x$. The consistency of this estimate will be established in Theorem 1, which shows $\frac{1}{p}(\widehat{\varsigma}_{n}^{\,2}-\varsigma_{n}^{2})\to 0$ in probability as $n\to\infty$.
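The plug-in estimate (2.2) is straightforward to compute from the data matrix. In the sketch below, the simulated check uses $\Sigma_{n}=I_{p}$ and $\xi_{1}^{2}\sim\chi^{2}_{p}$, for which $\varsigma_{n}^{2}=2p$; this data-generating choice is an illustrative assumption, not part of the method.

```python
import numpy as np

def varsigma2_hat(X):
    """Estimate (2.2) of var(xi_1^2) from the n x p data matrix X (rows = observations)."""
    n, p = X.shape
    S = X.T @ X / n                                 # sample covariance (1.1)
    trS = np.trace(S)
    alpha = np.trace(S @ S) - trS ** 2 / n          # alpha_hat, estimates tr(Sigma^2)
    norms2 = np.sum(X ** 2, axis=1)
    beta = norms2.var(ddof=1)                       # beta_hat, estimates var(||x_1||^2)
    gamma = trS ** 2                                # gamma_hat, estimates tr(Sigma)^2
    raw = p * (p + 2) * (beta - 2 * alpha) / (gamma + 2 * alpha) + 2 * p
    return max(raw, 0.0)                            # non-negative part

# Illustrative check with Sigma_n = I_p and xi^2 ~ chi-squared(p).
rng = np.random.default_rng(3)
n, p = 500, 50
U = rng.standard_normal((n, p))
U /= np.linalg.norm(U, axis=1, keepdims=True)
xi = np.sqrt(rng.chisquare(p, size=n))
est = varsigma2_hat(xi[:, None] * U)
```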

2.3 Spectrum estimation

The problem of estimating the eigenvalues of a population covariance matrix has attracted long-term interest in the high-dimensional statistics literature, and many different estimation methods have been proposed [e.g. 12, 38, 7, 26, 25]. In order to estimate $\lambda_{1}(\Sigma_{n}),\dots,\lambda_{p}(\Sigma_{n})$ in our current setting, we modify the method of QuEST [26], which has become widely used in recent years. This choice has the benefit of making all aspects of our proposed method easy to implement, because QuEST is supported by a turnkey software package [27].

We denote the estimates produced by QuEST as $\widehat{\lambda}_{\textup{Q},1}\geq\dots\geq\widehat{\lambda}_{\textup{Q},p}$, and we will use modified versions of them defined by

$$\tilde{\lambda}_{j}=\min(\widehat{\lambda}_{\textup{Q},j},\widehat{b}_{n}), \qquad (2.3)$$

for $j=1,\dots,p$, where we let $\widehat{b}_{n}=\lambda_{1}(\widehat{\Sigma}_{n})+1$. The modification is done for theoretical reasons, to ensure that $\tilde{\lambda}_{1},\dots,\tilde{\lambda}_{p}$ are asymptotically bounded, which follows from Lemma A.5. In addition, we define the diagonal $p\times p$ matrix associated with these estimates as

$$\tilde{\Sigma}_{n}=\textup{diag}(\tilde{\lambda}_{1},\ldots,\tilde{\lambda}_{p}). \qquad (2.4)$$

Later, in Theorem 1, we will show that the estimates $\tilde{\lambda}_{1},\dots,\tilde{\lambda}_{p}$ are consistent, in the sense that their empirical distribution converges weakly in probability to the correct limit as $n\to\infty$.
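The truncation step (2.3) and the assembly of $\tilde{\Sigma}_{n}$ in (2.4) reduce to elementary array operations; in the sketch below, both the QuEST output and the sample eigenvalues are hypothetical placeholder values, since QuEST itself comes from the external software package of [27].

```python
import numpy as np

# Hypothetical QuEST output: decreasing estimates of the population eigenvalues.
# In practice these would come from the QuEST software package of [27].
lam_quest = np.array([3.5, 2.0, 1.5, 1.0, 0.5])

# Sorted sample eigenvalues of Sigma_hat (illustrative values only).
lam_sample = np.array([2.2, 1.8, 1.3, 0.9, 0.4])

b_hat = lam_sample[0] + 1.0                  # b_n = lambda_1(Sigma_hat) + 1
lam_tilde = np.minimum(lam_quest, b_hat)     # truncation step (2.3)
Sigma_tilde = np.diag(lam_tilde)             # diagonal matrix (2.4)
```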

3 Theoretical results

In this section, we present three theoretical guarantees for the proposed method. First, Theorem 1 establishes appropriate notions of consistency for each of the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$. Second, our main result in Theorem 2 shows that the bootstrap samples generated in Algorithm 1 consistently approximate the distributions of linear spectral statistics. Lastly, Theorem 3 demonstrates the asymptotic validity of bootstrap-based inference procedures involving nonlinear spectral statistics.

3.1 Setup

All of our theoretical analysis is framed in terms of a sequence of models indexed by $n$, so that all model parameters are allowed to vary with $n$, except when stated otherwise. The details of our model assumptions are given below in Assumptions 1 and 2.

Assumption 1 (Data generating model).

As $n\to\infty$, the dimension $p$ grows so that the ratio $c_{n}=p/n$ satisfies $c_{n}\to c$ for some positive constant $c$ different from 1. For each $i=1,\dots,n$, the observation $\mathbf{x}_{i}\in\mathbb{R}^{p}$ can be represented as

$$\mathbf{x}_{i}=\xi_{i}\Sigma_{n}^{1/2}\mathbf{u}_{i}, \qquad (3.1)$$

where $\Sigma_{n}\in\mathbb{R}^{p\times p}$ is a deterministic non-zero positive semidefinite matrix, and $(\mathbf{u}_{1},\xi_{1}),\dots,(\mathbf{u}_{n},\xi_{n})$ are i.i.d. random vectors in $\mathbb{R}^{p}\times[0,\infty)$ satisfying the following conditions: The vector $\mathbf{u}_{1}$ is drawn from the uniform distribution on the unit sphere of $\mathbb{R}^{p}$, and is independent of $\xi_{1}$. In addition, as $n\to\infty$, the random variable $\xi_{1}^{2}$ satisfies $\mathbb{E}(\xi_{1}^{2})=p$, as well as the conditions

$$\operatorname{var}\Big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big)=\tau+o(1) \qquad\text{and}\qquad \mathbb{E}\,\Big|\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big|^{4+\varepsilon}\,\lesssim\,1, \qquad (3.2)$$

for some fixed constants $\tau\geq 0$ and $\varepsilon>0$ that do not depend on $n$.

Remarks. With regard to the limiting value $c$ for the ratio $p/n$, the single case of $c=1$ is excluded so that we may employ certain facts about the QuEST method that were established in [26]. From a practical standpoint, our proposed method can still be used effectively when $p/n=1$, as shown in Section 4. To address the moment conditions on $\xi_{1}^{2}$, a similar set of conditions was used in [18] to establish a high-dimensional central limit theorem for linear spectral statistics. However, our condition involving a $4+\varepsilon$ moment for $(\xi_{1}^{2}-p)/\sqrt{p}$ replaces a corresponding $2+\varepsilon$ moment condition in that work. The extra bit of integrability is used here to show that the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$ have suitable asymptotic properties for ensuring bootstrap consistency.

Examples. To illustrate that the conditions in (3.2) cover a substantial range of situations, it is possible to provide quite a few explicit examples of distributions for $\xi_{1}^{2}$ that are conforming:

1. Chi-Squared distribution with $p$ degrees of freedom
2. Poisson$(p)$
3. $(1-\tau)\,$Negative-Binomial$(p,1-\tau)$, for any $\tau\in(0,1)$
4. Gamma$(p/\tau,1/\tau)$, for any $\tau>0$
5. Beta-Prime$\left(\frac{p(1+p+\tau)}{\tau},\frac{1+p+2\tau}{\tau}\right)$, for any $\tau>0$
6. Log-Normal$\left(\log(p)-\frac{1}{2}\log\big(1+\frac{\tau}{p}\big),\log\big(1+\frac{\tau}{p}\big)\right)$, for any $\tau>0$
7. $(p+2\beta)\,$Beta$(p/2,\beta)$, for any $\beta>0$

It is also possible to give a more abstract class of examples that subsumes some of the previous ones as special cases. In detail, the conditions in (3.2) will hold for any $\tau\geq 0$ if $\xi_{1}^{2}=\sum_{j=1}^{p}z_{1j}^{2}$ for some independent random variables $z_{11},\dots,z_{1p}$ satisfying

$$\frac{1}{p}\sum_{j=1}^{p}\mathbb{E}(z_{1j}^{2})=1,\qquad \frac{1}{p}\sum_{j=1}^{p}\operatorname{var}(z_{1j}^{2})\to\tau, \qquad\text{and}\qquad \max_{1\leq j\leq p}\mathbb{E}|z_{1j}|^{8+2\varepsilon}\,\lesssim\,p^{1+\frac{\varepsilon}{4}}. \qquad (3.3)$$

Further details for checking the validity of the previous examples, as well as explicit parameterizations, are provided in Appendix E.
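Example 1 admits a quick numerical sanity check: since a chi-squared random variable with $p$ degrees of freedom has variance $2p$, the standardized quantity $(\xi_{1}^{2}-p)/\sqrt{p}$ has variance exactly 2, matching (3.2) with $\tau=2$. The Monte Carlo sketch below (with arbitrary sample sizes) is an illustration, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 400
xi2 = rng.chisquare(p, size=200_000)         # Example 1: xi^2 ~ chi-squared(p)

standardized = (xi2 - p) / np.sqrt(p)
tau_mc = standardized.var()                  # close to tau = 2
fourth_moment = np.mean(np.abs(standardized) ** 4)   # stays bounded, as in (3.2)
```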

In addition to Assumption 1, we need one more assumption dealing with the spectrum of the population covariance matrix $\Sigma_{n}$. To state this assumption, let $H_{n}$ denote the empirical distribution function associated with $\lambda_{1}(\Sigma_{n}),\dots,\lambda_{p}(\Sigma_{n})$, which is defined for any $t\in\mathbb{R}$ according to

$$H_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\lambda_{j}(\Sigma_{n})\leq t\}. \qquad (3.4)$$

Assumption 2 (Spectral structure).

There is a limiting spectral distribution $H$ such that as $n\to\infty$,

$$H_{n}\ \Rightarrow\ H, \qquad (3.5)$$

where the support of $H$ is a finite union of closed intervals, bounded away from 0 and $\infty$. Furthermore, there is a fixed compact interval in $(0,\infty)$ containing the support of $H_{n}$ for all large $n$.

Remarks. Variations of these conditions on population eigenvalues are commonly used throughout random matrix theory. This particular set of conditions was used in [26] to establish theoretical guarantees for the QuEST estimation method in the context of IC models.

3.2 Consistency of estimators

Here, we establish the consistency of the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$, defined in (2.2) and (2.4). The appropriate notion of consistency for $\tilde{\Sigma}_{n}$ is stated in terms of its empirical spectral distribution function, which is defined for any $t\in\mathbb{R}$ as

$$\tilde{H}_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\tilde{\lambda}_{j}\leq t\}. \qquad (3.6)$$

Theorem 1.

Under Assumptions 1 and 2, the following limits hold as $n\to\infty$:

$$\frac{1}{p}(\widehat{\varsigma}_{n}^{\,2}-\varsigma_{n}^{2})\ \xrightarrow{\mathbb{P}}\ 0, \qquad (3.7)$$

$$\tilde{H}_{n}\ \overset{\mathbb{P}}{\Rightarrow}\ H. \qquad (3.8)$$

Remarks. The limits (3.7) and (3.8) are proved in Appendices A.1 and A.2 respectively. Although these limits can be stated in a succinct form, quite a few details are involved in their proofs. For instance, the analysis of $\widehat{\varsigma}_{n}^{\,2}$ is based on extensive calculations with polynomial functions of the quadratic forms $\|\mathbf{x}_{i}\|_{2}^{2}$ and $\mathbf{x}_{i}^{\top}\mathbf{x}_{j}$ with $i\neq j$, as well as associated mixed moments. The consistency of $\tilde{H}_{n}$ is also notable because it requires showing the consistency of QuEST in elliptical models, and it seems that the consistency of QuEST has not previously been reported outside of IC models.

3.3 Consistency of bootstrap

To develop our main result on the consistency of the proposed bootstrap method, it is necessary to recall some background facts and introduce several pieces of notation.

Under Assumptions 1 and 2, it is known from [6, Theorem 1.1] that an extended version of the classical Marčenko-Pastur Theorem holds for the empirical spectral distribution function $\widehat{H}_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\lambda_{j}(\widehat{\Sigma}_{n})\leq t\}$. Namely, there is a probability distribution $\Psi(H,c)$ on $[0,\infty)$, depending only on $H$ and $c$, such that the weak limit $\widehat{H}_{n}\Rightarrow\Psi(H,c)$ occurs almost surely. In this statement, we may regard $\Psi(\cdot,\cdot)$ as a map that takes a distribution $H^{\prime}$ on $[0,\infty)$ and a number $c^{\prime}>0$ as input, and returns a new distribution $\Psi(H^{\prime}\!,c^{\prime})$ on $[0,\infty)$ whose Stieltjes transform $m_{H^{\prime}\!,c^{\prime}}(z)=\int\frac{1}{\lambda-z}\,d(\Psi(H^{\prime}\!,c^{\prime}))(\lambda)$ solves the Marčenko-Pastur equation (3.9) below. That is, for any $z\in\mathbb{C}^{+}$, the number $m_{H^{\prime}\!,c^{\prime}}=m_{H^{\prime}\!,c^{\prime}}(z)$ is the unique solution to the equation

$$m_{H^{\prime}\!,c^{\prime}}=\int\frac{1}{t(1-c^{\prime}-c^{\prime}zm_{H^{\prime}\!,c^{\prime}})-z}\,dH^{\prime}(t) \qquad (3.9)$$

within the set $\{m\in\mathbb{C}\,|\,(c^{\prime}-1)/z+c^{\prime}m\in\mathbb{C}^{+}\}$.
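Equation (3.9) can be solved numerically. As a minimal sketch, take $H^{\prime}$ to be the point mass at $t=1$, so that $\Psi(H^{\prime},c^{\prime})$ is the standard Marčenko-Pastur law; then (3.9) reduces to the quadratic $c^{\prime}zm^{2}+(z+c^{\prime}-1)m+1=0$, whose root with positive imaginary part is the Stieltjes transform. The fixed-point iteration and branch selection below are assumptions of this sketch, not part of the paper.

```python
import numpy as np

c = 0.5
z = 1.0 + 1.0j                       # a point in C^+

# Fixed-point iteration of (3.9) with dH' concentrated at t = 1.
m = 0.0 + 0.0j
for _ in range(2000):
    m = 1.0 / ((1.0 - c - c * z * m) - z)

# Closed-form solution of the quadratic c z m^2 + (z + c - 1) m + 1 = 0,
# selecting the root with positive imaginary part.
m_closed = max(np.roots([c * z, z + c - 1.0, 1.0]), key=lambda r: r.imag)
```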

The map $\Psi(\cdot,\cdot)$ is relevant to our purposes here, because it determines a centering parameter that is commonly used in limit theorems for linear spectral statistics. In detail, if $H_{n,c_{n}}$ denotes a shorthand for the probability distribution $\Psi(H_{n},c_{n})$, and if $f$ is a real-valued function defined on the support of $H_{n,c_{n}}$, then the associated centering parameter is defined as

$$\vartheta_{n}(f)\ =\ \int f(t)\,dH_{n,c_{n}}(t). \qquad (3.10)$$

Similarly, let $\tilde{H}_{n,c_{n}}=\Psi(\tilde{H}_{n},c_{n})$ and $\tilde{\vartheta}_{n}(f)=\int f(t)\,d\tilde{H}_{n,c_{n}}(t)$. Also, in order to simplify notation for handling the joint distribution of several linear spectral statistics arising from a fixed set of functions $\mathbf{f}=(f_{1},\dots,f_{k})$, we write $T_{n}(\mathbf{f})=(T_{n}(f_{1}),\dots,T_{n}(f_{k}))$, and likewise for $T_{n,1}^{*}(\mathbf{f})$, $\vartheta_{n}(\mathbf{f})$, and $\tilde{\vartheta}_{n}(\mathbf{f})$.

As one more preparatory item, recall from page 1 that dLPd_{\textup{LP}} denotes the Lévy-Prohorov metric for comparing distributions on k\mathbb{R}^{k}. This metric is a standard choice for formulating bootstrap consistency results, as it has the fundamental property of metrizing weak convergence.

Theorem 2.

Suppose Assumptions 1 and 2 hold, and let $\mathbf{f}=(f_{1},\ldots,f_{k})$ be a fixed set of real-valued functions that are analytic on an open subset of $\mathbb{R}$ containing $[0,\infty)$. Under these conditions, if $T_{n,1}^{*}(\mathbf{f})$ is generated as in Algorithm 1 using the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$ defined by (2.2) and (2.4), then the following limit holds as $n\to\infty$:

$$d_{\mathrm{LP}}\Big(\mathcal{L}\big(p\{T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\}\big)\,,\,\mathcal{L}\big(p\{T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\}\,\big|\,X\big)\Big)\ \xrightarrow{\mathbb{P}}\ 0. \qquad (3.11)$$

Remarks. The proof is given in Appendix B, and makes key use of a recently developed central limit theorem for linear spectral statistics due to [18]. Regarding other aspects of the theorem, there are two points to discuss. First, the assumption that the functions $f_{1},\dots,f_{k}$ are defined on an open set containing $[0,\infty)$ has been made for technical simplicity. In the setting where $p/n\to c>1$, this assumption is minor because $f_{1},\dots,f_{k}$ must be defined at 0 due to the singularity of $\widehat{\Sigma}_{n}$. Nevertheless, if $p/n\to c\in(0,1)$, then it is possible to show that a corresponding version of the theorem holds for analytic functions that are not defined at 0, and our numerical results confirm that the proposed method can successfully handle such cases. Second, the quantities $\vartheta_{n}(\mathbf{f})$ and $\tilde{\vartheta}_{n}(\mathbf{f})$ are only introduced so that the distributions appearing in (3.11) have non-trivial weak limits, which facilitates the proof. Still, the proposed method can be applied without requiring any particular type of centering.

3.4 Application to inference on stable rank with guarantees

When dealing with high-dimensional covariance matrices, it is often of interest to have a measure of the number of “dominant eigenvalues”. One such measure is the stable rank, defined as

rn=tr(Σn)2tr(Σn2),r_{n}=\frac{\ \textup{tr}(\Sigma_{n})^{2}}{\textup{tr}(\Sigma_{n}^{2})}, (3.12)

which arises naturally in a wide range of situations [e.g. 4, 43, 44, 30, 36, 33]. Whenever Σn\Sigma_{n} is non-zero, this parameter satisfies 1rnrank(Σn)1\leq r_{n}\leq\textup{rank}(\Sigma_{n}), and the equality rn=pr_{n}=p holds if and only if Σn\Sigma_{n} is proportional to the identity matrix.
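As a concrete illustration, the stable rank in (3.12) can be computed as follows; this is a minimal sketch, and the function name `stable_rank` is ours rather than the paper's.

```python
import numpy as np

def stable_rank(Sigma):
    """Stable rank r_n = tr(Sigma)^2 / tr(Sigma^2) for a nonzero matrix Sigma."""
    return np.trace(Sigma) ** 2 / np.trace(Sigma @ Sigma)

p = 50
# Sigma proportional to the identity attains the maximal value r_n = p ...
r_id = stable_rank(3.0 * np.eye(p))
# ... while a single dominant eigenvalue pulls r_n well below p.
r_spiked = stable_rank(np.diag(np.r_[5.0, np.ones(p - 1)]))
```

For the spiked example, r_n = (p - 1 + 5)^2 / (p - 1 + 25), which is about 39.4 when p = 50, compared to the maximal value 50 for the identity.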

In this subsection, we illustrate how the proposed bootstrap method can be applied to solve some inference problems involving the parameter rnr_{n}. Our first example shows how to construct a confidence interval for rnr_{n}, and our subsequent examples deal with testing procedures related to rnr_{n}. Later, in Theorem 3, we establish the theoretical validity of the methods used in these examples—showing that the confidence interval has asymptotically exact coverage, and that the relevant testing procedures maintain asymptotic control of their levels.

3.4.1 Confidence interval for stable rank

Our confidence interval for rnr_{n} is constructed using the estimator

r^n=tr(Σ^n)2tr(Σ^n2)Δ^n,\displaystyle\widehat{r}_{n}=\frac{\textup{tr}(\widehat{\Sigma}_{n})^{2}}{\textup{tr}(\widehat{\Sigma}_{n}^{2})-\widehat{\Delta}_{n}}, (3.13)

where we define

Δ^n\displaystyle\widehat{\Delta}_{n} =tr(Σ^n2)n[2(n1)np2+ς^n2p(p+2)1]\displaystyle=\frac{\textup{tr}(\widehat{\Sigma}_{n}^{2})}{n}\bigg{[}\frac{2(n-1)}{n}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}-1\bigg{]}
+tr(Σ^n)2n[n+1n+n1nς^n22pp(p+2)2(n1)n2p2+ς^n2p(p+2)],\displaystyle\ \ \ +\frac{\textup{tr}(\widehat{\Sigma}_{n})^{2}}{n}\bigg{[}\frac{n+1}{n}+\frac{n-1}{n}\frac{{\widehat{\varsigma}}_{n}^{2}-2p}{p(p+2)}-\frac{2(n-1)}{n^{2}}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}\bigg{]},

and we set r^n\widehat{r}_{n} equal to nn in the exceptional case that its denominator is 0. It should be noted that r^n\widehat{r}_{n} differs from the naive plug-in rule tr(Σ^n)2/tr(Σ^n2)\textup{tr}(\widehat{\Sigma}_{n})^{2}/\textup{tr}(\widehat{\Sigma}_{n}^{2}), since the extra term Δ^n\widehat{\Delta}_{n} in the denominator serves as a bias correction.
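The estimator (3.13) with the bias correction above can be sketched as follows. The function name is ours, and the value `varsigma2_hat` is assumed to be the estimator ς̂n² from (2.2), supplied by the user.

```python
import numpy as np

def stable_rank_estimator(Sigma_hat, varsigma2_hat, n):
    """Sketch of the bias-corrected estimator (3.13):
    hat r_n = tr(Sigma_hat)^2 / (tr(Sigma_hat^2) - Delta_hat),
    with hat r_n = n in the exceptional case that the denominator is 0."""
    p = Sigma_hat.shape[0]
    t1 = np.trace(Sigma_hat) ** 2           # tr(Sigma_hat)^2
    t2 = np.trace(Sigma_hat @ Sigma_hat)    # tr(Sigma_hat^2)
    a = (p ** 2 + varsigma2_hat) / (p * (p + 2))
    Delta = (t2 / n) * (2 * (n - 1) / n * a - 1) \
          + (t1 / n) * ((n + 1) / n
                        + (n - 1) / n * (varsigma2_hat - 2 * p) / (p * (p + 2))
                        - 2 * (n - 1) / n ** 2 * a)
    denom = t2 - Delta
    return float(n) if denom == 0 else t1 / denom
```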

To proceed, let q1αq_{1-\alpha} denote the (1α)(1-\alpha)-quantile of the random variable (r^nrn)/p(\widehat{r}_{n}-r_{n})/p for any fixed α(0,1)\alpha\in(0,1), and consider the interval n=[r^npq1α/2,r^npqα/2]\mathcal{I}_{n}=[\widehat{r}_{n}-pq_{1-\alpha/2},\widehat{r}_{n}-pq_{\alpha/2}]. Whenever the distribution of (r^nrn)/p(\widehat{r}_{n}-r_{n})/p is continuous, this interval satisfies

(rnn)=1α.\mathbb{P}(r_{n}\in\mathcal{I}_{n})=1-\alpha. (3.14)

However, the quantiles qα/2q_{\alpha/2} and q1α/2q_{1-\alpha/2} are unknown, and so they must be estimated. This can be accomplished by generating bootstrap samples of the following form in Algorithm 1,

Tn,1=1p(tr(Σ^n)2tr((Σ^n)2)Δ^ntr(Σ~n)2tr(Σ~n2)),T^{*}_{n,1}=\frac{1}{p}\bigg{(}\frac{\textup{tr}(\widehat{\Sigma}^{*}_{n})^{2}}{\textup{tr}((\widehat{\Sigma}^{*}_{n})^{2})-\widehat{\Delta}_{n}^{{}^{*}}}\,-\,\frac{\textup{tr}(\tilde{\Sigma}_{n})^{2}}{\textup{tr}(\tilde{\Sigma}_{n}^{2})}\bigg{)},

where Δ^n\widehat{\Delta}_{n}^{{}^{*}} is defined by modifying the previous formula for Δ^n\widehat{\Delta}_{n} so that Σ^n\widehat{\Sigma}_{n} and ς^n2\widehat{\varsigma}_{n}^{2} are replaced by versions computed with the bootstrap data 𝐱1,,𝐱n\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}. Also, we define Tn,1T_{n,1}^{*} to be 0 in the exceptional case of a denominator being equal to 0. Letting q^α/2\widehat{q}_{\alpha/2} and q^1α/2\widehat{q}_{1-\alpha/2} denote the respective α/2\alpha/2 and (1α/2)(1-\alpha/2)-quantiles of (Tn,1|X)\mathcal{L}(T_{n,1}^{*}|X), the proposed confidence interval is defined as

^n=[r^npq^1α/2,r^npq^α/2].\widehat{\mathcal{I}}_{n}\ =\ \Big{[}\widehat{r}_{n}-p\widehat{q}_{1-\alpha/2}\,,\,\widehat{r}_{n}-p\widehat{q}_{\alpha/2}\Big{]}. (3.15)

Below, Theorem 3 shows that as nn\to\infty, the coverage probability (rn^n)\mathbb{P}(r_{n}\in\widehat{\mathcal{I}}_{n}) converges to 1α1-\alpha, as desired.
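The assembly of the interval (3.15) from bootstrap draws can be sketched as follows; the function name is ours, and `boot_draws` is assumed to hold B realizations of T*_{n,1} produced by Algorithm 1.

```python
import numpy as np

def stable_rank_ci(r_hat, p, boot_draws, alpha=0.05):
    """Interval (3.15): invert bootstrap quantile estimates of the law
    of (r_hat - r_n)/p."""
    q_lo, q_hi = np.quantile(boot_draws, [alpha / 2, 1 - alpha / 2])
    return (r_hat - p * q_hi, r_hat - p * q_lo)

# Illustration with synthetic draws (not from the actual algorithm):
rng = np.random.default_rng(0)
lo, hi = stable_rank_ci(r_hat=30.0, p=100, boot_draws=rng.normal(0.0, 0.02, 250))
```

Note that the upper bootstrap quantile produces the lower endpoint and vice versa, as in (3.15).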

3.4.2 Hypotheses related to stable rank

Screening data for PCA. Consider a scenario where a collection of different datasets are to be screened for further investigation by principal components analysis (PCA). In this situation, the datasets that should be discarded are the ones that cannot be well summarized by a moderate number of principal components. To put this in more quantitative terms, a dataset may be considered unsuitable for PCA if the stable rank rnr_{n} exceeds a certain fraction of the full dimension pp. That is, if rnp>ϵ0\frac{r_{n}}{p}>\epsilon_{0} for some fixed reference value ϵ0(0,1)\epsilon_{0}\in(0,1). On the other hand, if rnpϵ0\frac{r_{n}}{p}\leq\epsilon_{0}, then the dataset may be retained. This leads to considering the hypothesis testing problem

𝖧0,n:rnpϵ0 vs. 𝖧1,n:rnp>ϵ0.\mathsf{H}_{0,n}:\textstyle\frac{r_{n}}{p}\leq\epsilon_{0}\text{ \ \ \ \ \ \ vs. \ \ \ \ \ \ }\mathsf{H}_{1,n}:\textstyle\frac{r_{n}}{p}>\epsilon_{0}. (3.16)

To develop a testing procedure, we may again consider the (1α)(1-\alpha)-quantile q1αq_{1-\alpha} of the random variable (r^nrn)/p(\widehat{r}_{n}-r_{n})/p. This quantile serves as a conceptual basis for a rejection criterion, because it satisfies the following inequality under the null hypothesis

(r^npϵ0>q1α)α.\mathbb{P}\Big{(}\textstyle\frac{\widehat{r}_{n}}{p}-\epsilon_{0}>q_{1-\alpha}\Big{)}\ \leq\ \alpha. (3.17)

In other words, if q1αq_{1-\alpha} were known, then a level-α\alpha testing procedure would result from using 1pr^nϵ0>q1α\frac{1}{p}\widehat{r}_{n}-\epsilon_{0}>q_{1-\alpha} as a rejection criterion. Accordingly, a bootstrap-based version of this procedure rejects the null hypothesis when 1pr^nϵ0>q^1α\frac{1}{p}\widehat{r}_{n}-\epsilon_{0}>\widehat{q}_{1-\alpha}, with the quantile estimate q^1α\widehat{q}_{1-\alpha} defined as before.
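The bootstrap rejection criterion just described amounts to a one-line comparison; the sketch below uses our own function name, with `boot_draws` again assumed to be T*_{n,1} values from Algorithm 1.

```python
import numpy as np

def screen_for_pca(r_hat, p, boot_draws, eps0, alpha=0.05):
    """Bootstrap test for (3.16): reject H0 (r_n/p <= eps0) in favor of
    r_n/p > eps0 when r_hat/p - eps0 exceeds the estimated
    (1-alpha)-quantile of (r_hat - r_n)/p."""
    q_hat = np.quantile(boot_draws, 1 - alpha)
    return bool(r_hat / p - eps0 > q_hat)
```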

Testing for sphericity. One more example of a testing problem related to rnr_{n} is that of testing for sphericity,

𝖧0,n:Σn=σ2I for some σ2>0 vs. 𝖧1,n:Σnσ2I for any σ2>0.\mathsf{H}^{\prime}_{0,n}:\Sigma_{n}=\sigma^{2}I\text{ \ for some \ $\sigma^{2}>0$}\text{ \ \ \ vs. \ \ \ }\mathsf{H}^{\prime}_{1,n}:\Sigma_{n}\neq\sigma^{2}I\text{ \ for any \ $\sigma^{2}>0$}. (3.18)

The connection to the parameter rnr_{n} arises from the fact that (3.18) is equivalent to the problem 𝖧0,n:rnp=1\mathsf{H}^{\prime}_{0,n}:\frac{r_{n}}{p}=1 vs. 𝖧1,n:rnp<1\mathsf{H}^{\prime}_{1,n}:\frac{r_{n}}{p}<1. This observation was used in the influential paper [43] to develop a formula-based sphericity test. By analogy with our discussion of the problem (3.16), this observation can also be used to develop a bootstrap-based sphericity test. However, there is one point of distinction in the current situation, which is that the eigenvalues of Σn\Sigma_{n} no longer need to be estimated. The reason is that under 𝖧0,n\mathsf{H}_{0,n}^{\prime}, the scale-invariance of the statistic (r^nrn)/p(\widehat{r}_{n}-r_{n})/p causes it to behave as if Σn=Ip\Sigma_{n}=I_{p}. Consequently, to estimate the quantiles of the null distribution of (r^nrn)/p(\widehat{r}_{n}-r_{n})/p, Algorithm 1 can be run using Σ~n=Ip\tilde{\Sigma}_{n}=I_{p}. To summarize, if we let q^α\widehat{q}_{\alpha}^{\prime} denote the resulting estimate for the α\alpha-quantile of the null distribution of (r^nrn)/p(\widehat{r}_{n}-r_{n})/p, then the rejection criterion is r^np1q^α\frac{\widehat{r}_{n}}{p}-1\leq\widehat{q}_{\alpha}^{\prime}.
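The sphericity test differs from the previous test only in how the bootstrap quantile is obtained; a sketch is given below, where `null_boot_draws` is assumed to contain T*_{n,1} values from Algorithm 1 run with Σ̃n = I_p, and the function name is ours.

```python
import numpy as np

def sphericity_test(r_hat, p, null_boot_draws, alpha=0.05):
    """Bootstrap sphericity test for (3.18): reject H0' when
    r_hat/p - 1 <= q'_alpha, where the draws come from Algorithm 1 run
    with Sigma_tilde = I_p (valid by the scale-invariance of
    (r_hat - r_n)/p under H0')."""
    q_alpha = np.quantile(null_boot_draws, alpha)
    return bool(r_hat / p - 1 <= q_alpha)
```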

The following result establishes the theoretical validity of the procedures discussed in this subsection.

Theorem 3.

If Assumptions 1 and 2 hold, then as nn\to\infty,

(rn^n) 1α.\mathbb{P}\big{(}r_{n}\in\widehat{\mathcal{I}}_{n}\big{)}\ \to\ 1-\alpha. (3.19)

Furthermore, if 𝖧0,n\mathsf{H}_{0,n} holds for all large nn, then

(r^npϵ0>q^1α)α+o(1),\mathbb{P}\Big{(}\textstyle\frac{\widehat{r}_{n}}{p}-\epsilon_{0}>\widehat{q}_{1-\alpha}\Big{)}\ \leq\ \alpha+o(1), (3.20)

or if 𝖧0,n\mathsf{H}^{\prime}_{0,n} holds for all large nn, then

(r^np1q^α)α.\mathbb{P}\Big{(}\textstyle\frac{\widehat{r}_{n}}{p}-1\leq\widehat{q}_{\alpha}^{\prime}\Big{)}\ \to\ \alpha. (3.21)

Remarks. This result provides a notable complement to Theorem 2, because it demonstrates that bootstrap consistency can be established in tasks that are based on a nonlinear spectral statistic, namely r^nrn\widehat{r}_{n}-r_{n}. The proof of this result is given in Appendix C, where it can be seen that the limiting distribution of r^nrn\widehat{r}_{n}-r_{n} has a very complicated dependence on the moments of HH, due to the correlation between tr(Σ^n)\textup{tr}(\widehat{\Sigma}_{n}) and tr(Σ^n2)\textup{tr}(\widehat{\Sigma}_{n}^{2}). In this way, the proof illustrates the utility of the bootstrap, since the bootstrap enables the user to completely bypass such complexity.

4 Numerical results

This section explores the empirical performance of the proposed bootstrap method in three different ways. Sections 4.2 and 4.3 deal with bootstrap approximations for linear and nonlinear spectral statistics, while Section 4.4 looks at procedures for doing inference on the stable rank parameter rnr_{n}.

4.1 Parameter settings

In our simulations, we generated data from elliptical models that were parameterized as follows. The random variable ξ12\xi_{1}^{2} was generated using four choices of distributions:

  (i). Chi-Squared distribution with pp degrees of freedom,

  (ii). Beta-Prime(p(1+p+8)8,1+p+168)\left(\frac{p(1+p+8)}{8},\frac{1+p+16}{8}\right),

  (iii). (p+4)Beta(p/2,2)(p+4)\textup{Beta}(p/2,2),

  (iv). 9p10F(p,20)\frac{9p}{10}\textup{F}(p,20),

where F(d1,d2)(d_{1},d_{2}) denotes an F-distribution with d1d_{1} and d2d_{2} degrees of freedom. Note that cases (i), (iii), and (iv) correspond, respectively, to a multivariate Gaussian distribution, a multivariate Pearson type II distribution, and a multivariate t-distribution with 20 degrees of freedom. Also, the numerical values appearing in (i)-(iv) were chosen to ensure the normalization condition 𝔼(ξ12)=p\mathbb{E}(\xi_{1}^{2})=p.
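A sketch of how the four choices of ξ1² and the corresponding elliptical observations can be simulated is given below. The function names are ours; the generative form x_i = ξ_i Σ^{1/2} u_i, with u_i uniform on the unit sphere, follows the sample-covariance expression used later in Section 4.2, and the Beta-Prime law is sampled as a ratio of independent Gamma variables.

```python
import numpy as np

def sample_xi2(p, n, case, rng):
    """Draw n copies of xi_1^2 under choices (i)-(iv); each case is
    normalized so that E(xi_1^2) = p."""
    if case == "i":    # chi-squared with p dof -> multivariate Gaussian
        return rng.chisquare(p, n)
    if case == "ii":   # Beta-Prime(a, b), sampled as a ratio of Gammas
        a, b = p * (1 + p + 8) / 8, (1 + p + 16) / 8
        return rng.gamma(a, 1.0, n) / rng.gamma(b, 1.0, n)
    if case == "iii":  # (p+4) * Beta(p/2, 2) -> Pearson type II
        return (p + 4) * rng.beta(p / 2, 2, n)
    if case == "iv":   # (9p/10) * F(p, 20) -> multivariate t with 20 dof
        return 9 * p / 10 * rng.f(p, 20, n)
    raise ValueError(case)

def sample_elliptical(Sigma, xi2, rng):
    """Observations x_i = xi_i * Sigma^{1/2} u_i with u_i uniform on the
    unit sphere of R^p."""
    n, p = len(xi2), Sigma.shape[0]
    w, V = np.linalg.eigh(Sigma)
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    G = rng.standard_normal((n, p))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)
    return np.sqrt(xi2)[:, None] * U @ S_half
```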

The population covariance matrix Σn\Sigma_{n} was selected from five options:

  1. The eigenvalues of Σn\Sigma_{n} are λ1(Σn)==λ5(Σn)=43\lambda_{1}(\Sigma_{n})=\cdots=\lambda_{5}(\Sigma_{n})=\frac{4}{3} and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for j{6,,p}j\in\{6,\dots,p\}. The p×pp\times p matrix of eigenvectors of Σn\Sigma_{n} is generated from the uniform distribution on orthogonal matrices.

  2. The eigenvalues of Σn\Sigma_{n} are λj(Σn)=exp(j/4)\lambda_{j}(\Sigma_{n})=\exp(-j/4) for j{1,,20}j\in\{1,\dots,20\}, and λ20(Σn)==λp(Σn)\lambda_{20}(\Sigma_{n})=\cdots=\lambda_{p}(\Sigma_{n}). The eigenvectors are the same as in case 1.

  3. The matrix Σn\Sigma_{n} has entries of the form (Σn)ij=(110)|ij|+1{i=j}(\Sigma_{n})_{ij}=(\frac{1}{10})^{|i-j|}+1\{i=j\}.

  4. The matrix Σn\Sigma_{n} has entries of the form (Σn)ij=(110)1{ij}+1{i=j}(\Sigma_{n})_{ij}=(\frac{1}{10})1\{i\neq j\}+1\{i=j\}.

  5. The eigenvalues of Σn\Sigma_{n} are λ1(Σn)=5\lambda_{1}(\Sigma_{n})=5 and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for j{2,,p}j\in\{2,\dots,p\}. The eigenvectors are the same as in case 1.
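The five covariance options can be constructed as sketched below; the function name is ours. For cases 1, 2, and 5, Haar-uniform eigenvectors are obtained from a sign-corrected QR decomposition of a Gaussian matrix, a standard construction (case 2 assumes p ≥ 20).

```python
import numpy as np

def make_sigma(p, case, rng):
    """Population covariance choices 1-5 from Section 4.1."""
    if case in (1, 2, 5):
        if case == 1:
            eigs = np.ones(p); eigs[:5] = 4 / 3
        elif case == 2:
            eigs = np.exp(-np.arange(1, p + 1) / 4)
            eigs[19:] = eigs[19]              # lambda_20 = ... = lambda_p
        else:
            eigs = np.ones(p); eigs[0] = 5.0
        Q, R = np.linalg.qr(rng.standard_normal((p, p)))
        Q = Q * np.sign(np.diag(R))           # sign fix gives the Haar measure
        return Q @ np.diag(eigs) @ Q.T
    if case == 3:                             # banded Toeplitz-type matrix
        i = np.arange(p)
        return 0.1 ** np.abs(i[:, None] - i[None, :]) + np.eye(p)
    if case == 4:                             # equi-correlation matrix
        return 0.1 * np.ones((p, p)) + 0.9 * np.eye(p)
    raise ValueError(case)
```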

4.2 Linear spectral statistics

Our experiments for linear spectral statistics were based on the task of using the proposed bootstrap to estimate three parameters of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))): the mean, standard deviation, and 95th percentile. We considered two choices for the function ff, namely f(x)=x2f(x)=x^{2} and f(x)=xlog(x)1f(x)=x-\log(x)-1. In the first case, we selected the ratio p/np/n so that p/n{0.5,1,1.5}p/n\in\{0.5,1,1.5\}, and in the second case we used p/n{0.3,0.5,0.7}p/n\in\{0.3,0.5,0.7\}.

Design of experiments. For each possible choice of (ξ1,Σn,p/n)(\xi_{1},\Sigma_{n},p/n), we generated 5000 realizations of the dataset 𝐱1,,𝐱n\mathbf{x}_{1},\dots,\mathbf{x}_{n}, with n=400n=400. These datasets allowed us to compute 5000 realizations of the statistic p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)), and we treated the empirical mean, standard deviation, and 95th percentile of these 5000 realizations as ground truth for our parameters of interest. In Tables 1 and 2, the ground truth values are reported in the first row of numbers corresponding to each choice of Σn\Sigma_{n}.

With regard to the bootstrap, we ran Algorithm 1 on the first 500 datasets corresponding to each parameter setting. Also, we generated B=250B=250 bootstrap samples of the form p(Tn,1(f)ϑ~n(f))p(T_{n,1}^{*}(f)-\tilde{\vartheta}_{n}(f)) during every run. As a result of these runs, we obtained 500 different bootstrap estimates of the mean, standard deviation, and 95th percentile of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))). In Tables 1 and 2, we report the empirical mean and standard deviation (in parentheses) of these 500 estimates in the second row of numbers corresponding to each choice of Σn\Sigma_{n}.

One more detail to mention is related to the computation of ϑn(f)\vartheta_{n}(f) and ϑ~n(f)\tilde{\vartheta}_{n}(f). For each parameter setting, we approximated ϑn(f)\vartheta_{n}(f) as follows. We averaged 30 realizations of 140pj=140pf(λj(140ni=140nξi2Σn1/2𝐮i𝐮iΣn1/2))\frac{1}{40p}\sum_{j=1}^{40p}f(\lambda_{j}(\frac{1}{40n}\sum_{i=1}^{40n}\xi_{i}^{2}\mathsf{\Sigma}_{n}^{1/2}\mathbf{u}_{i}\mathbf{u}_{i}^{\top}\mathsf{\Sigma}_{n}^{1/2})), where Σn=I40Σn\mathsf{\Sigma}_{n}=I_{40}\otimes\Sigma_{n} is of size 40p×40p40p\times 40p, each 𝐮i\mathbf{u}_{i} was drawn from the uniform distribution on the unit sphere of 40p\mathbb{R}^{40p}, and each ξi2\xi_{i}^{2} was generated as in (i)-(iv), but with 40p40p replacing pp. For the bootstrap samples, we computed one realization of the statistic 140pj=140pf(λj(140ni=140n(ξi)2Σ~n1/2𝐮i(𝐮i)Σ~n1/2))\frac{1}{40p}\sum_{j=1}^{40p}f(\lambda_{j}(\frac{1}{40n}\sum_{i=1}^{40n}(\xi_{i}^{*})^{2}\tilde{\mathsf{\Sigma}}_{n}^{1/2}\mathbf{u}_{i}^{*}(\mathbf{u}_{i}^{*})^{\top}\tilde{\mathsf{\Sigma}}_{n}^{1/2})) to approximate ϑ~n(f)\tilde{\vartheta}_{n}(f) during every run of Algorithm 1, where Σ~n=I40Σ~n\tilde{\mathsf{\Sigma}}_{n}=I_{40}\otimes\tilde{\Sigma}_{n} is of size 40p×40p40p\times 40p, each 𝐮i\mathbf{u}_{i}^{*} was drawn from the uniform distribution on the unit sphere of 40p\mathbb{R}^{40p}, and each (ξi)2(\xi_{i}^{*})^{2} was drawn from a Gamma distribution with mean 40p40p and variance 40ς^n240\widehat{\varsigma}_{n}^{2}.
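The enlargement scheme just described can be sketched as follows. This is only a sketch of the computation, not the authors' code: `approx_centering` and `xi2_sampler` are our names, the enlargement factor m and repetition count are parameters (the text uses m = 40 with 30 repetitions for ϑn(f) and a single repetition for ϑ̃n(f)), and f must accept a NumPy array of eigenvalues.

```python
import numpy as np

def approx_centering(f, Sigma, n, xi2_sampler, reps=30, m=40, rng=None):
    """Monte Carlo approximation of the centering parameter: average over
    `reps` realizations of the mean of f over the spectrum of an m-fold
    enlarged sample covariance matrix of size mp x mp."""
    rng = np.random.default_rng() if rng is None else rng
    p = Sigma.shape[0]
    big_p, big_n = m * p, m * n
    w, V = np.linalg.eigh(Sigma)
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T  # Sigma^{1/2}
    vals = []
    for _ in range(reps):
        G = rng.standard_normal((big_n, big_p))
        U = G / np.linalg.norm(G, axis=1, keepdims=True)  # rows uniform on sphere
        X = np.sqrt(xi2_sampler(big_p, big_n, rng))[:, None] * U
        # apply (I_m kron Sigma)^{1/2} = I_m kron Sigma^{1/2} blockwise
        X = (X.reshape(big_n, m, p) @ S_half).reshape(big_n, big_p)
        lam = np.linalg.eigvalsh(X.T @ X / big_n)
        vals.append(float(np.mean(f(lam))))
    return float(np.mean(vals))
```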

Comments on results. It is easiest to explain the format of the tables with an example: the two entries in the upper right corner of Table 1 show that in settings (i) and 1 with p/n=1.5p/n=1.5, the 95th percentile of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))) with f(x)=x2f(x)=x^{2} is equal to 16.48, and the bootstrap estimate for the 95th percentile has a mean (standard deviation) of 16.73 (1.48). Table 2 presents results for f(x)=xlog(x)1f(x)=x-\log(x)-1 in the same format.

In most settings, the bootstrap estimates perform well, with their bias and standard deviation being small in proportion to the parameter being estimated. However, there are some specific parameter settings that require more attention. These settings involve choice 44 for Σn\Sigma_{n}, which is an equi-correlation matrix, and choice (iv) for ξ1\xi_{1}, which induces a multivariate t-distribution with 20 degrees of freedom. Notably, these choices correspond to settings that violate Assumptions 1 and 2 of our theoretical results. In the case of the equi-correlation matrix, the bootstrap approximations for f(x)=x2f(x)=x^{2} are less accurate in comparison to other choices of Σn\Sigma_{n}, due to increased variance. By contrast, if f(x)=xlog(x)1f(x)=x-\log(x)-1, then the bootstrap approximations have similar accuracy across all choices of Σn\Sigma_{n} while holding other parameters fixed.

Lastly, in the case of the multivariate t-distribution, the bootstrap is able to accurately estimate the standard deviation of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))) for both choices of ff, but difficulties arise in estimating the mean and 95th percentile. To understand these mixed results, it is important to recognize that the standard deviation does not depend on the centering parameter ϑn(f)\vartheta_{n}(f), whereas the mean and 95th percentile do. Also, the choice of ϑn(f)\vartheta_{n}(f) as a centering parameter is based on the CLT for linear spectral statistics established in [18], and the assumptions underlying that result are violated by the multivariate t-distribution.

Table 1: Results for p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) with f(x)=x2f(x)=x^{2}
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 0.51 3.31 6.02 0.85 6.11 10.87 1.44 9.15 16.48
0.48(0.24) 3.27(0.19) 5.85(0.53) 0.96(0.44) 6.09(0.35) 11.02(1) 1.47(0.65) 9.25(0.51) 16.73(1.48)
2 0 0.11 0.19 0.01 0.11 0.19 0 0.11 0.2
0(0.01) 0.11(0.01) 0.19(0.03) 0(0.01) 0.11(0.01) 0.19(0.02) 0(0.01) 0.11(0.01) 0.2(0.03)
(i) 3 2.16 12.88 23.12 4.02 24.12 44.03 5.8 36.13 65.72
1.9(0.88) 12.85(0.73) 23.2(2.07) 3.94(1.75) 24.27(1.35) 43.95(3.82) 5.8(2.77) 37.08(2.2) 67.09(6.1)
4 0.64 63.35 110.6 2.48 241.3 420 14.11 540 952.2
1.53(4.7) 62.9(9.6) 111.2(19.68) 4.95(17.04) 245(34.43) 433(70.15) 11.44(36.41) 541.9(78.44) 958.3(161.6)
5 0.55 5.03 9.1 1.03 7.49 13.4 1.57 10.42 18.58
0.55(0.34) 5.1(0.46) 9.15(1.05) 1.04(0.53) 7.41(0.48) 13.4(1.22) 1.55(0.74) 10.35(0.65) 18.61(1.7)
1 3.57 6.35 14.09 6.69 11.93 26.9 10.25 18.09 39.92
3.47(0.59) 6.36(0.45) 14.03(1.29) 6.83(1) 11.85(0.76) 26.51(2.32) 10.26(1.53) 17.99(1.11) 39.92(3.25)
2 0.01 0.12 0.21 0 0.11 0.2 0 0.11 0.19
0.01(0.01) 0.12(0.01) 0.2(0.03) 0.01(0.01) 0.11(0.01) 0.2(0.03) 0(0.01) 0.11(0.01) 0.2(0.03)
(ii) 3 14.16 25.27 56 27.59 47.57 105.8 41.49 71.66 159.7
13.67(2.24) 25.12(1.76) 55.35(5.1) 26.94(3.72) 47.09(2.97) 104.9(8.33) 40.39(6.13) 71.59(4.5) 158.6(13.38)
4 2.26 65.49 114.9 9.27 246.8 435.7 23.51 539.9 970.1
4.7(4.65) 66.35(9.67) 119.7(20.26) 9.81(16.29) 248.2(38.15) 441.7(75.8) 19.95(37.73) 548.5(83.74) 971.1(172)
5 3.75 7.75 16.66 7.21 13.01 28.58 10.46 18.87 42.02
3.59(0.65) 7.86(0.62) 16.69(1.58) 7.03(1.05) 12.87(0.84) 28.29(2.43) 10.4(1.51) 18.83(1.1) 41.45(3.22)
1 -0.46 1.13 1.44 -0.95 2.13 2.61 -1.47 3.12 3.71
-0.47(0.08) 1.14(0.07) 1.4(0.17) -0.96(0.14) 2.15(0.11) 2.58(0.33) -1.43(0.21) 3.17(0.15) 3.81(0.45)
2 0.01 0.11 0.19 0 0.11 0.19 0 0.11 0.19
0(0.01) 0.11(0.01) 0.19(0.03) 0(0.01) 0.11(0.01) 0.19(0.02) 0(0.01) 0.11(0.01) 0.19(0.03)
(iii) 3 -1.93 4.49 5.41 -3.89 8.66 10.62 -5.62 13 15.33
-1.85(0.33) 4.56(0.25) 5.68(0.69) -3.78(0.57) 8.69(0.48) 10.57(1.26) -5.7(0.9) 12.87(0.68) 15.52(1.88)
4 1.61 63.23 110 6.1 242 420.7 -6.78 541.8 944.3
0.75(4.21) 63.54(9.08) 111.2(18.37) 3.06(16.56) 242.5(37.18) 423.6(74.62) 5.52(35.54) 530.5(80.96) 928.1(161.8)
5 -0.52 3.78 5.98 -0.9 4.45 6.74 -1.3 5.35 7.75
-0.48(0.25) 3.83(0.51) 6.14(1.02) -0.94(0.32) 4.53(0.55) 6.81(1.17) -1.4(0.36) 5.38(0.53) 7.67(1.15)
1 0.67 12.57 21.94 1.19 33.98 58.49 0.77 64.69 114.1
13.02(2.03) 12.28(1.26) 33.54(4.18) 50.46(7.66) 33.11(3.74) 105.7(13.79) 111.2(15.79) 62.29(6.76) 215.5(26.71)
2 0.01 0.13 0.24 0.01 0.14 0.25 0 0.14 0.23
0.01(0.01) 0.13(0.02) 0.24(0.04) 0.02(0.01) 0.14(0.02) 0.25(0.04) 0.02(0.01) 0.14(0.02) 0.25(0.04)
(iv) 3 0.68 48.68 82.62 1.3 136.4 236.7 15.26 260 457.1
51.21(8.4) 48.62(5.03) 132.7(17.04) 198.5(28.4) 131.5(13.71) 418.8(51.54) 442.9(64.24) 250.9(28.24) 861.1(108)
4 2.75 73.19 135.4 17.25 281.2 506.3 27.98 603.8 1097
14.03(5.46) 73.41(12.32) 142.2(25.81) 54.28(20.28) 277.7(44.26) 538.9(90.87) 123.8(46.62) 622.4(97.37) 1217(206.5)
5 0.42 14.02 23.86 2.62 35.23 62.08 -0.69 66.26 110.7
13.2(2.08) 13.82(1.41) 36.31(4.4) 50.71(7.33) 34.45(3.39) 108.3(12.79) 111.5(15.1) 64.04(6.65) 219(26.3)
Table 2: Results for p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) with f(x)=xlog(x)1f(x)=x-\log(x)-1
p/n=0.3p/n=0.3 p/n=0.5p/n=0.5 p/n=0.7p/n=0.7
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 0.18 0.35 0.77 0.34 0.64 1.4 0.6 1 2.24
0.17(0.03) 0.35(0.02) 0.74(0.05) 0.34(0.06) 0.63(0.03) 1.37(0.09) 0.58(0.09) 1.01(0.05) 2.25(0.14)
(i) 2 0.18 0.83 1.55 0.36 1.16 2.25 0.59 1.53 3.1
0.19(0.22) 0.79(0.26) 1.49(0.65) 0.36(0.45) 1.11(0.37) 2.19(1.05) 0.62(0.7) 1.51(0.44) 3.1(1.42)
3 0.18 0.86 1.62 0.35 1.19 2.27 0.55 1.58 3.12
0.17(0.06) 0.86(0.05) 1.59(0.14) 0.33(0.09) 1.19(0.06) 2.3(0.19) 0.58(0.12) 1.56(0.08) 3.17(0.25)
4 0.16 0.91 1.66 0.32 1.53 2.92 0.57 2.21 4.29
0.17(0.09) 0.91(0.07) 1.71(0.18) 0.33(0.18) 1.54(0.12) 2.91(0.31) 0.58(0.31) 2.21(0.16) 4.29(0.49)
5 0.18 0.44 0.91 0.32 0.7 1.46 0.56 1.04 2.28
0.17(0.04) 0.45(0.03) 0.92(0.07) 0.34(0.06) 0.69(0.03) 1.48(0.11) 0.59(0.09) 1.05(0.05) 2.31(0.15)
1 1 0.37 1.62 1.74 0.65 2.79 2.56 1.02 4.25
1.04(0.11) 0.37(0.02) 1.66(0.12) 1.79(0.17) 0.66(0.03) 2.87(0.19) 2.64(0.22) 1.04(0.05) 4.35(0.25)
(ii) 2 1.04 1.52 3.54 1.74 2.02 5.06 2.54 2.49 6.54
1.05(0.4) 1.51(0.25) 3.54(0.81) 1.77(0.7) 2.01(0.35) 5.09(1.27) 2.54(0.96) 2.46(0.41) 6.59(1.63)
3 1.02 1.63 3.78 1.78 2.15 5.34 2.57 2.63 6.94
1.04(0.16) 1.6(0.12) 3.68(0.36) 1.8(0.22) 2.11(0.14) 5.31(0.45) 2.61(0.28) 2.59(0.16) 6.88(0.53)
4 1.02 0.94 2.59 1.69 1.57 4.26 2.58 2.22 6.3
1.04(0.16) 0.94(0.07) 2.63(0.24) 1.8(0.31) 1.58(0.12) 4.45(0.43) 2.63(0.49) 2.25(0.17) 6.4(0.63)
5 1.02 0.47 1.8 1.78 0.71 2.96 2.56 1.06 4.31
1.05(0.13) 0.48(0.03) 1.84(0.16) 1.8(0.17) 0.72(0.04) 2.98(0.2) 2.64(0.22) 1.08(0.05) 4.41(0.27)
1 -0.11 0.34 0.45 -0.13 0.62 0.9 -0.14 1 1.49
-0.11(0.02) 0.34(0.02) 0.45(0.05) -0.14(0.04) 0.62(0.03) 0.88(0.08) -0.09(0.07) 1.01(0.04) 1.57(0.14)
(iii) 2 -0.12 0.36 0.49 -0.14 0.64 0.93 -0.08 1.01 1.57
-0.04(0.12) 0.47(0.18) 0.74(0.43) 0.05(0.29) 0.85(0.28) 1.45(0.74) 0.18(0.45) 1.22(0.31) 2.19(0.95)
3 -0.11 0.38 0.52 -0.14 0.65 0.93 -0.07 1.03 1.66
-0.11(0.03) 0.38(0.02) 0.51(0.05) -0.14(0.05) 0.65(0.03) 0.93(0.09) -0.09(0.07) 1.03(0.05) 1.6(0.14)
4 -0.09 0.91 1.43 -0.12 1.55 2.47 -0.14 2.21 3.64
-0.09(0.07) 0.9(0.08) 1.42(0.17) -0.1(0.13) 1.54(0.11) 2.48(0.28) 0.01(0.21) 2.21(0.16) 3.7(0.4)
5 -0.11 0.43 0.62 -0.15 0.67 0.94 -0.15 1.06 1.57
-0.11(0.03) 0.44(0.03) 0.62(0.07) -0.14(0.05) 0.68(0.03) 0.99(0.09) -0.09(0.07) 1.04(0.05) 1.64(0.15)
1 0.19 0.45 0.95 0.37 0.86 1.81 0.68 1.44 3.15
2.35(0.28) 0.44(0.03) 3.07(0.31) 6.45(0.79) 0.84(0.06) 7.85(0.87) 12.53(1.51) 1.41(0.1) 14.87(1.64)
2 0.15 2.15 3.75 0.26 3.41 5.85 0.46 4.77 8.24
2.37(0.81) 2.2(0.31) 6.01(1.31) 6.37(1.52) 3.64(0.43) 12.36(2.23) 12.6(2.65) 5.16(0.56) 21.06(3.51)
(iv) 3 0.13 2.42 4.3 0.22 3.96 6.84 0.57 5.75 10.49
2.37(0.34) 2.32(0.21) 6.25(0.7) 6.45(0.86) 3.8(0.35) 12.82(1.43) 12.67(1.58) 5.31(0.5) 21.49(2.33)
4 0.19 1.01 1.89 0.4 1.7 3.29 0.5 2.53 4.75
2.39(0.35) 0.99(0.09) 4.06(0.44) 6.46(0.93) 1.72(0.15) 9.37(1.08) 12.44(1.78) 2.52(0.2) 16.64(1.97)
5 0.19 0.54 1.1 0.35 0.93 1.92 0.62 1.49 3.12
2.36(0.3) 0.54(0.04) 3.25(0.35) 6.35(0.78) 0.9(0.07) 7.86(0.87) 12.55(1.55) 1.46(0.1) 14.97(1.7)

Breakdown of nonparametric bootstrap. To highlight one of the motivations for our parametric bootstrap method, we close this subsection with some numerical results exhibiting the breakdown of the standard nonparametric bootstrap, based on sampling with replacement. For the sake of brevity, we focus only on the task of estimating the standard deviation of p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) in a subset of the previous parameter settings, corresponding to p/n=1/2p/n=1/2, and (ξ1,Σn){(i)}×{1,2,3,4,5}(\xi_{1},\Sigma_{n})\in\{\text{(i)}\}\times\{1,2,3,4,5\}. The standard deviation is of particular interest, because it clarifies that the breakdown does not depend on how the statistic is centered.

The results are presented in Table 3, showing that the nonparametric bootstrap tends to overestimate the true standard deviation, often by a factor of 2 or more. For comparison, the corresponding estimates obtained from the proposed bootstrap method are much more accurate, as can be seen from Tables 1 and 2. In addition, the nonparametric bootstrap can have difficulties with nonlinear spectral statistics in settings like those considered here. Numerical results along these lines can be found in the paper [13].

Table 3: Estimation of standard deviation of p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) with the nonparametric bootstrap
p/np/n ξ1\xi_{1} Σn\Sigma_{n} f(x)=x2f(x)=x^{2} f(x)=xlog(x)1f(x)=x-\log(x)-1
0.5 (i) 1 3.31 0.64
9.1(0.45) 6.08(0.27)
0.5 (i) 2 0.11 1.16
0.11(0.01) 6.11(0.28)
0.5 (i) 3 12.88 1.19
35.72(1.83) 6.23(0.31)
0.5 (i) 4 63.35 1.53
66.65(10.63) 6.25(0.3)
0.5 (i) 5 5.03 0.7
10.35(0.63) 6.11(0.27)

4.3 Nonlinear spectral statistics

This subsection looks at how well the proposed bootstrap handles nonlinear spectral statistics. The statistics under consideration here are the largest sample eigenvalue λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}), and the leading eigengap λ1(Σ^n)λ2(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})-\lambda_{2}(\widehat{\Sigma}_{n}). The underlying experiments for these statistics were designed in the same manner as in Section 4.2, and Tables 4 and 5 display the results in the same format as Tables 1 and 2. To a large extent, the favorable patterns that were noted in the results for linear spectral statistics are actually enhanced in the results for nonlinear spectral statistics—in the sense that the bias and standard deviations of the bootstrap estimates are generally smaller here than before. Also, in contrast to linear spectral statistics, the results for nonlinear spectral statistics are relatively unaffected by choice 4 for Σn\Sigma_{n}. Lastly, under choice (iv) for ξ1\xi_{1}, the accuracy for nonlinear spectral statistics is reduced in comparison to other choices of ξ1\xi_{1}. Nevertheless, the reduction in accuracy under choice (iv) is less pronounced here than it was in the context of linear spectral statistics. This makes sense in light of the fact that the statistics λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}) and λ1(Σ^n)λ2(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})-\lambda_{2}(\widehat{\Sigma}_{n}) do not involve the centering parameter ϑn(f)\vartheta_{n}(f) that was used for linear spectral statistics. (Recall from Section 4.2 that the reduced accuracy for linear spectral statistics under choice (iv) appeared to be related to ϑn(f)\vartheta_{n}(f).)
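For readers wishing to reproduce experiments of this kind, the following sketch combines the bootstrap sampling step with the two nonlinear statistics above. It assumes, based on the Gamma draw described in Section 4.2, that Algorithm 1 generates (ξ*)² from a Gamma distribution with mean pp and variance ς̂n²; the function names are ours.

```python
import numpy as np

def bootstrap_top_eigs(Sigma_tilde, varsigma2_hat, n, B=250, rng=None):
    """For each of B bootstrap samples, return the largest sample
    eigenvalue and the leading eigengap of the bootstrap sample
    covariance matrix."""
    rng = np.random.default_rng() if rng is None else rng
    p = Sigma_tilde.shape[0]
    w, V = np.linalg.eigh(Sigma_tilde)
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    # Gamma law with mean p and variance varsigma2_hat
    shape, scale = p ** 2 / varsigma2_hat, varsigma2_hat / p
    out = np.empty((B, 2))
    for b in range(B):
        G = rng.standard_normal((n, p))
        U = G / np.linalg.norm(G, axis=1, keepdims=True)  # uniform on sphere
        xi = np.sqrt(rng.gamma(shape, scale, n))
        X = xi[:, None] * U @ S_half
        lam = np.sort(np.linalg.eigvalsh(X.T @ X / n))[::-1]
        out[b] = lam[0], lam[0] - lam[1]
    return out  # columns: lambda_1, leading eigengap
```

Quantiles of the two columns then serve as bootstrap estimates for the distributions of the corresponding statistics.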

Table 4: Results for λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 2.9 0.06 3 3.96 0.06 4.07 4.9 0.06 5.01
2.93(0.05) 0.06(0.01) 3.04(0.07) 3.99(0.04) 0.06(0.01) 4.1(0.06) 4.93(0.04) 0.07(0.01) 5.05(0.06)
2 0.8 0.05 0.89 0.8 0.05 0.89 0.81 0.05 0.9
0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.05) 0.05(0.01) 0.89(0.06) 0.81(0.05) 0.05(0.01) 0.9(0.06)
(i) 3 5.76 0.11 5.95 7.91 0.12 8.12 9.81 0.13 10.03
5.81(0.08) 0.12(0.02) 6.02(0.11) 7.98(0.09) 0.13(0.02) 8.21(0.13) 9.87(0.1) 0.14(0.03) 10.12(0.14)
4 21.35 1.47 23.81 41.79 2.87 46.62 62.34 4.31 69.6
21.21(1.5) 1.47(0.13) 23.68(1.71) 41.8(2.73) 2.9(0.23) 46.7(3.09) 62.34(4.19) 4.3(0.35) 69.62(4.73)
5 5.62 0.35 6.21 6.25 0.34 6.81 6.88 0.33 7.44
5.61(0.33) 0.35(0.03) 6.2(0.38) 6.22(0.33) 0.34(0.03) 6.79(0.37) 6.87(0.33) 0.33(0.03) 7.43(0.38)
1 2.96 0.07 3.08 4.02 0.07 4.14 4.96 0.07 5.08
3.03(0.07) 0.07(0.01) 3.15(0.09) 4.09(0.07) 0.07(0.01) 4.22(0.09) 5.04(0.06) 0.08(0.01) 5.17(0.08)
2 0.8 0.06 0.89 0.8 0.05 0.89 0.8 0.05 0.9
0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.06) 0.05(0.01) 0.89(0.06) 0.81(0.06) 0.05(0.01) 0.9(0.06)
(ii) 3 5.88 0.13 6.11 8.04 0.14 8.26 9.93 0.14 10.16
6.03(0.13) 0.14(0.02) 6.27(0.16) 8.2(0.12) 0.15(0.02) 8.47(0.15) 10.1(0.12) 0.16(0.02) 10.37(0.16)
4 21.34 1.51 23.86 41.8 2.92 46.72 62.32 4.3 69.65
21.44(1.48) 1.51(0.13) 23.99(1.69) 41.87(3) 2.92(0.25) 46.79(3.38) 62.37(4.37) 4.34(0.38) 69.67(4.99)
5 5.65 0.35 6.24 6.28 0.35 6.86 6.9 0.34 7.48
5.67(0.36) 0.36(0.03) 6.27(0.41) 6.31(0.35) 0.35(0.03) 6.9(0.4) 6.91(0.35) 0.34(0.03) 7.48(0.4)
1 2.88 0.05 2.97 3.94 0.06 4.04 4.88 0.06 4.99
2.89(0.03) 0.06(0.01) 2.98(0.05) 3.95(0.03) 0.06(0.01) 4.06(0.05) 4.89(0.03) 0.06(0.01) 5(0.05)
2 0.8 0.05 0.89 0.8 0.05 0.89 0.8 0.05 0.9
0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.05) 0.05(0.01) 0.89(0.06)
(iii) 3 5.72 0.1 5.9 7.88 0.12 8.08 9.77 0.12 9.99
5.74(0.07) 0.11(0.02) 5.94(0.11) 7.9(0.08) 0.12(0.02) 8.12(0.13) 9.79(0.08) 0.13(0.02) 10.01(0.11)
4 21.39 1.47 23.84 41.83 2.88 46.61 62.14 4.34 69.51
21.39(1.38) 1.47(0.12) 23.87(1.56) 41.74(2.95) 2.87(0.25) 46.57(3.35) 61.71(4.39) 4.25(0.37) 68.87(4.96)
5 5.61 0.34 6.17 6.24 0.34 6.82 6.87 0.34 7.44
5.59(0.34) 0.34(0.03) 6.17(0.38) 6.23(0.35) 0.34(0.03) 6.8(0.39) 6.88(0.34) 0.34(0.03) 7.45(0.39)
1 3.24 0.18 3.55 4.77 0.42 5.48 6.27 0.67 7.51
3.39(0.22) 0.12(0.03) 3.59(0.27) 5(0.46) 0.18(0.05) 5.31(0.54) 6.59(0.64) 0.25(0.07) 7.03(0.75)
2 0.8 0.06 0.9 0.8 0.06 0.91 0.81 0.06 0.91
0.81(0.06) 0.06(0.01) 0.91(0.07) 0.81(0.06) 0.06(0.01) 0.91(0.07) 0.81(0.06) 0.06(0.01) 0.91(0.07)
(iv) 3 6.42 0.36 7.03 9.51 0.81 10.97 12.52 1.33 15.13
6.77(0.4) 0.24(0.06) 7.18(0.49) 10.17(0.81) 0.37(0.1) 10.82(0.97) 13.55(1.35) 0.54(0.16) 14.48(1.6)
4 21.45 1.62 24.28 42.01 3.23 47.49 62.59 4.68 70.57
21.52(1.71) 1.62(0.16) 24.26(1.96) 42.08(3.22) 3.17(0.29) 47.44(3.66) 63.05(4.62) 4.76(0.43) 71.15(5.29)
5 5.71 0.39 6.37 6.48 0.4 7.18 7.33 0.51 8.14
5.79(0.39) 0.39(0.04) 6.45(0.45) 6.66(0.42) 0.39(0.04) 7.33(0.49) 7.69(0.45) 0.39(0.05) 8.37(0.53)
Table 5: Results for λ1(Σ^n)λ2(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})-\lambda_{2}(\widehat{\Sigma}_{n})
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 0.09 0.05 0.18 0.1 0.05 0.19 0.1 0.06 0.21
0.1(0.02) 0.06(0.01) 0.2(0.05) 0.1(0.02) 0.06(0.01) 0.21(0.04) 0.11(0.02) 0.06(0.01) 0.22(0.04)
2 0.18 0.07 0.3 0.18 0.07 0.29 0.18 0.07 0.29
0.19(0.06) 0.06(0.01) 0.29(0.07) 0.18(0.06) 0.06(0.01) 0.29(0.07) 0.19(0.06) 0.06(0.01) 0.3(0.07)
(i) 3 0.18 0.1 0.35 0.19 0.1 0.38 0.2 0.11 0.41
0.2(0.04) 0.11(0.02) 0.4(0.08) 0.22(0.05) 0.12(0.03) 0.44(0.1) 0.23(0.05) 0.13(0.03) 0.47(0.11)
4 18.78 1.47 21.26 38.26 2.87 43.07 57.95 4.31 65.22
18.62(1.5) 1.47(0.13) 21.09(1.71) 38.24(2.74) 2.9(0.23) 43.15(3.09) 57.94(4.19) 4.3(0.35) 65.22(4.73)
5 2.78 0.35 3.38 2.32 0.35 2.9 2.01 0.34 2.58
2.75(0.33) 0.35(0.03) 3.34(0.38) 2.28(0.33) 0.34(0.03) 2.85(0.37) 1.98(0.33) 0.34(0.03) 2.55(0.38)
1 0.1 0.05 0.2 0.1 0.05 0.2 0.11 0.06 0.21
0.11(0.02) 0.06(0.01) 0.22(0.05) 0.11(0.02) 0.06(0.01) 0.23(0.05) 0.12(0.02) 0.07(0.01) 0.24(0.04)
2 0.18 0.07 0.29 0.18 0.07 0.29 0.18 0.07 0.29
0.18(0.06) 0.06(0.01) 0.29(0.07) 0.19(0.06) 0.06(0.01) 0.29(0.07) 0.19(0.06) 0.06(0.01) 0.3(0.07)
(ii) 3 0.19 0.1 0.38 0.2 0.11 0.41 0.21 0.11 0.42
0.22(0.04) 0.12(0.02) 0.44(0.09) 0.23(0.05) 0.13(0.03) 0.48(0.1) 0.24(0.04) 0.13(0.03) 0.49(0.1)
4 18.72 1.5 21.24 38.21 2.92 43.11 57.88 4.3 65.19
18.76(1.48) 1.51(0.13) 21.3(1.68) 38.22(2.99) 2.92(0.25) 43.13(3.38) 57.87(4.37) 4.34(0.38) 65.17(4.98)
5 2.74 0.36 3.34 2.29 0.35 2.88 1.97 0.35 2.55
2.68(0.36) 0.36(0.03) 3.29(0.41) 2.24(0.35) 0.35(0.03) 2.84(0.4) 1.89(0.35) 0.34(0.03) 2.47(0.4)
1 0.09 0.05 0.17 0.09 0.05 0.19 0.1 0.06 0.21
0.09(0.02) 0.05(0.01) 0.19(0.04) 0.1(0.02) 0.05(0.01) 0.2(0.03) 0.11(0.01) 0.06(0.01) 0.21(0.03)
2 0.18 0.07 0.29 0.18 0.07 0.29 0.18 0.07 0.29
0.19(0.06) 0.06(0.01) 0.3(0.07) 0.18(0.06) 0.06(0.01) 0.29(0.07) 0.18(0.06) 0.06(0.01) 0.29(0.07)
(iii) 3 0.17 0.09 0.34 0.19 0.1 0.38 0.2 0.11 0.41
0.19(0.04) 0.1(0.02) 0.38(0.08) 0.21(0.05) 0.11(0.03) 0.42(0.1) 0.18(0.06) 0.06(0.01) 0.29(0.07)
4 18.84 1.48 21.29 38.31 2.88 43.1 57.77 4.34 65.18
18.83(1.38) 1.47(0.12) 21.32(1.57) 38.21(2.95) 2.88(0.25) 43.05(3.35) 57.32(4.39) 4.25(0.37) 64.49(4.96)
5 2.78 0.34 3.35 2.34 0.35 2.92 2.02 0.34 2.6
2.75(0.34) 0.35(0.03) 3.34(0.38) 2.31(0.35) 0.34(0.03) 2.9(0.39) 2.03(0.34) 0.34(0.03) 2.6(0.39)
1 0.17 0.14 0.39 0.31 0.34 0.91 0.5 0.54 1.5
0.17(0.08) 0.09(0.03) 0.34(0.14) 0.27(0.22) 0.14(0.06) 0.53(0.32) 0.4(0.33) 0.21(0.09) 0.78(0.47)
2 0.19 0.07 0.3 0.18 0.07 0.3 0.19 0.07 0.3
0.19(0.06) 0.07(0.01) 0.31(0.07) 0.19(0.06) 0.07(0.01) 0.3(0.08) 0.19(0.06) 0.07(0.01) 0.3(0.07)
(iv) 3 0.32 0.28 0.75 0.62 0.64 1.73 0.99 1.06 2.97
0.35(0.17) 0.19(0.07) 0.69(0.28) 0.6(0.44) 0.3(0.13) 1.15(0.64) 0.98(0.87) 0.45(0.21) 1.8(1.19)
4 18.59 1.61 21.41 37.75 3.22 43.23 57.01 4.69 65
18.45(1.69) 1.61(0.16) 21.18(1.93) 37.46(3.21) 3.16(0.29) 42.78(3.62) 56.94(4.6) 4.74(0.43) 64.99(5.25)
5 2.54 0.41 3.21 1.79 0.47 2.53 1.24 0.49 2.03
2.4(0.39) 0.39(0.04) 3.06(0.45) 1.58(0.41) 0.4(0.05) 2.27(0.47) 1.09(0.37) 0.4(0.06) 1.78(0.45)

4.4 Inference on stable rank

Confidence interval. Table 6 presents numerical results on the width and coverage probability of the bootstrap confidence interval ^\widehat{\mathcal{I}} (defined in (3.15)) for the stable rank parameter rnr_{n}. The results were computed from experiments based on the same design as in Section 4.2, with every interval having a nominal coverage probability of 95%95\%. Because the width of the interval scales with the value of rnr_{n}, we report the width as a percentage of rnr_{n}. For example, the upper right corner of Table 6 shows that under covariance setting 1 and distribution (i) with p/n=1.5p/n=1.5, the average width of the interval over repeated experiments is 1.97% of rnr_{n}, with a standard deviation of 0.12%. Under choices 1, 3, and 5 for Σn\Sigma_{n}, the width is typically quite small as a percentage of rnr_{n}. By contrast, the percentage is larger in cases 2 and 4, which seems to occur because rnr_{n} is smaller in these cases than in cases 1, 3, and 5. Lastly, turning to coverage probability, the table shows good general agreement with the nominal level: among the 60 distinct settings, there are only a few where the coverage probability differs from the nominal level by more than 2%.

Table 6: Results for the stable rank confidence interval (95% nominal coverage probability)
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} width/rn/r_{n} (%)(\%) coverage (%)(\%) width/rn/r_{n} (%)(\%) coverage (%)(\%) width/rn/r_{n} (%)(\%) coverage (%)(\%)
(i) 1 2.00(0.12) 95.20 1.96(0.12) 95.40 1.97(0.12) 93.80
2 14.53(1.19) 95.40 16.99(1.32) 91.60 18.7(1.34) 95.40
3 2.00(0.12) 95.00 1.98(0.12) 97.40 1.97(0.12) 95.40
4 34.68(2.87) 94.00 41.55(4.30) 94.00 43.82(5.33) 93.80
5 5.18(0.65) 94.80 3.24(0.35) 95.20 2.64(0.24) 94.60
(ii) 1 2.07(0.12) 94.00 2.01(0.13) 92.80 1.98(0.12) 93.80
2 14.68(1.17) 94.60 17.10(1.34) 95.00 18.75(1.42) 93.20
3 2.08(0.13) 95.20 2.02(0.12) 95.20 2.00(0.13) 93.40
4 34.86(2.76) 93.40 41.47(4.09) 94.80 44.99(5.34) 94.00
5 5.12(0.63) 94.00 3.22(0.33) 93.40 2.63(0.24) 94.00
(iii) 1 1.96(0.12) 94.40 1.96(0.11) 96.00 1.96(0.11) 95.80
2 14.45(1.11) 93.40 17.01(1.27) 94.60 18.57(1.42) 94.00
3 1.97(0.12) 95.00 1.96(0.12) 93.20 1.96(0.12) 93.00
4 34.71(2.67) 93.80 41.49(4.38) 92.80 44.37(5.21) 92.40
5 5.17(0.64) 94.20 3.22(0.33) 95.80 2.61(0.24) 95.20
(iv) 1 2.38(0.21) 95.00 2.32(0.19) 94.80 2.31(0.25) 93.80
2 15.31(1.19) 97.40 17.86(1.42) 93.80 19.53(1.49) 94.00
3 2.4(0.20) 96.00 2.37(0.24) 95.00 2.33(0.27) 94.20
4 35.43(2.90) 93.60 42.01(4.33) 93.60 44.59(5.06) 93.40
5 5.20(0.64) 93.20 3.26(0.33) 93.00 2.73(0.29) 93.40

Stable rank test. Here, we discuss numerical results for an instance of the testing problem (3.16) given by

𝖧0,n:rnp0.1 vs. 𝖧1,n:rnp>0.1.\mathsf{H}_{0,n}:\textstyle\frac{r_{n}}{p}\leq 0.1\ \ \ \text{\ \ \ \ \ \ \ vs. \ \ \ \ \ \ \ }\ \ \ \mathsf{H}_{1,n}:\textstyle\frac{r_{n}}{p}>0.1. (4.1)

As a way to unify the study of level and power, we modified the experiments from Section 4.2 as follows. We rescaled the leading 15 eigenvalues in setting 1 to tune the ratio rn/pr_{n}/p within the grid {0.0980,0.0985,,0.1045,0.1050}\{0.0980,0.0985,\dots,0.1045,0.1050\}. More precisely, the eigenvalues of Σn\Sigma_{n} were taken to be of the form λj(Σn)=4s/3\lambda_{j}(\Sigma_{n})=4s/3 for j{1,,5}j\in\{1,\dots,5\}, λj(Σn)=s\lambda_{j}(\Sigma_{n})=s for j{6,,15}j\in\{6,\dots,15\}, and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for j{16,,p}j\in\{16,\dots,p\}, with different values of ss being chosen to produce values of rn/pr_{n}/p matching the stated gridpoints. Hence, gridpoints less than 0.10.1 correspond to 𝖧0,n\mathsf{H}_{0,n}, and gridpoints larger than 0.10.1 correspond to 𝖧1,n\mathsf{H}_{1,n}.
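To make the eigenvalue calibration concrete, the following sketch solves for ss by bisection so that rn/pr_{n}/p matches a given gridpoint. It is only an illustration: it assumes the stable rank is given by rn=tr(Σn)2/tr(Σn2)r_{n}=\textup{tr}(\Sigma_{n})^{2}/\textup{tr}(\Sigma_{n}^{2}) (the definition from Section 3.4 is not reproduced here), and the choice p=500p=500 is arbitrary.

```python
import numpy as np

def stable_rank_ratio(s, p):
    """r_n / p for the spiked eigenvalue profile used for (4.1):
    4s/3 (5 times), s (10 times), 1 (p - 15 times),
    with r_n taken to be tr(Sigma)^2 / tr(Sigma^2) (assumed definition)."""
    lam = np.concatenate([np.full(5, 4 * s / 3), np.full(10, s), np.ones(p - 15)])
    return lam.sum() ** 2 / (lam ** 2).sum() / p

def solve_scale(target, p, lo=1.0, hi=200.0):
    """Bisection for s such that r_n/p = target; the ratio decreases in s
    over this range, starting near 1 at s = 1."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if stable_rank_ratio(mid, p) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = 500
s = solve_scale(0.1, p)
print(round(stable_rank_ratio(s, p), 6))  # → 0.1
```

The same routine applied to each gridpoint in the stated grid produces the full family of covariance settings.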

At each gridpoint, we performed experiments based on the design of those in Section 4.2, allowing p/np/n to take the values 0.5,1.0,1.50.5,1.0,1.5, and allowing the distribution of ξ1\xi_{1} to be of the types (i), (ii), (iii), and (iv). For each such setting, we applied the relevant bootstrap test from Section 3.4.2 at a 5% nominal level to 500 datasets, and then we recorded the rejection rate over the 500 trials. Figures 1, 2, 3, and 4 display the results by plotting the rejection rate as a function of the ratio rn/pr_{n}/p. The separate figures correspond to choices of the distribution of ξ1\xi_{1}, and within each figure, three colored curves correspond to choices of p/np/n, as indicated in the legend. In addition, all the plots include a dashed horizontal line to signify the 5% nominal level.

An important feature that is shared by all the curves is that they stay below the nominal 5% level for essentially every value of rn/p<0.1r_{n}/p<0.1, which corroborates our theoretical bound (3.20) in Theorem 3. Furthermore, when rn/p=0.1r_{n}/p=0.1, the curves are mostly quite close to 5%, demonstrating that the testing procedure is well calibrated. For values of rn/p>0.1r_{n}/p>0.1, the procedure exhibits substantial power, with the curve corresponding to p/n=0.5p/n=0.5 achieving approximately 100% power at rn/p=0.105r_{n}/p=0.105 for every choice of ξ1\xi_{1}. In the cases of p/n{1.0,1.5}p/n\in\{1.0,1.5\}, the procedure still retains power, but with some diminution, as might be anticipated in settings of higher dimension.

[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 1: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a Chi-Squared distribution with pp degrees of freedom.
[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 2: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a Beta-Prime(p(1+p+8)8,1+p+168)\left(\frac{p(1+p+8)}{8},\frac{1+p+16}{8}\right) distribution.
[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 3: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a (p+4)Beta(p/2,2)(p+4)\textup{Beta}(p/2,2) distribution.
[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 4: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a 9p10F(p,20)\frac{9p}{10}\textup{F}(p,20) distribution.

4.5 Sphericity test

Let 𝖳sr=r^n/p1\mathsf{T}_{\text{sr}}=\widehat{r}_{n}/p-1 denote the statistic introduced in Section 3.4.2 for testing sphericity. We now provide numerical comparisons with three other testing procedures based on linear spectral statistics. To define these other procedures, let

𝒮^n=pni=1n𝐱i𝐱i𝐱i22\widehat{\mathcal{S}}_{n}=\frac{p}{n}\sum_{i=1}^{n}\frac{\mathbf{x}_{i}\mathbf{x}_{i}^{\top}}{\|\mathbf{x}_{i}\|_{2}^{2}} (4.2)

denote the sample covariance matrix of rescaled observations, and consider the following three statistics,

𝖳1\displaystyle\mathsf{T}_{1} =1ptr(𝒮^n2)cn1\displaystyle=\textstyle\frac{1}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{2})-c_{n}-1 (4.3)
𝖳2\displaystyle\mathsf{T}_{2} =1ptr(𝒮^n4)4cnptr(𝒮^n3)2cn(1ptr(𝒮^n2))2+10cn2ptr(𝒮^n2)5cn31\displaystyle=\textstyle\frac{1}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{4})-\textstyle\frac{4c_{n}}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{3})-2c_{n}(\textstyle\frac{1}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{2}))^{2}+\textstyle\frac{10c_{n}^{2}}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{2})-5c_{n}^{3}-1 (4.4)
𝖳3\displaystyle\mathsf{T}_{3} =max{n𝖳1+12,n𝖳2+6cn8(18+12cn+cn2)}.\displaystyle=\max\Big{\{}\textstyle\frac{n\mathsf{T}_{1}+1}{2}\,,\,\frac{n\mathsf{T}_{2}+6-c_{n}}{\sqrt{8(18+12c_{n}+c_{n}^{2})}}\Big{\}}. (4.5)

The testing procedures corresponding to these statistics reject the sphericity hypothesis when the statistics take large values. The first two statistics can be attributed in part to the papers [43] and [15], and the proposal of taking the maximum was made in [45]. However, in all of these works, an ordinary sample covariance matrix was used in place of 𝒮^n\widehat{\mathcal{S}}_{n}. Variants of the definition of 𝖳1\mathsf{T}_{1} in (4.3) have been studied in [39] and references therein, while the definitions of 𝖳2\mathsf{T}_{2} and 𝖳3\mathsf{T}_{3} in (4.4) and (4.5) were proposed in [18]. The latter paper also derived the limiting null distributions of all three statistics in high-dimensional elliptical models.
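A minimal numpy sketch of the quantities in (4.2)–(4.5) may help fix ideas; it is a plain transcription of the displayed formulas, not an implementation from the papers cited above. Note that tr(𝒮^n)=p\textup{tr}(\widehat{\mathcal{S}}_{n})=p holds by construction, which gives a simple sanity check.

```python
import numpy as np

def sphericity_stats(X):
    """Compute S_hat from (4.2) and the statistics T1, T2, T3 from
    (4.3)-(4.5) for an n x p data matrix X whose rows are the x_i."""
    n, p = X.shape
    c = p / n
    U = X / np.linalg.norm(X, axis=1, keepdims=True)   # rows x_i / ||x_i||_2
    S = (p / n) * U.T @ U                              # (4.2)
    tr = lambda k: np.trace(np.linalg.matrix_power(S, k)) / p
    T1 = tr(2) - c - 1                                 # (4.3)
    T2 = (tr(4) - 4 * c * tr(3) - 2 * c * tr(2) ** 2
          + 10 * c ** 2 * tr(2) - 5 * c ** 3 - 1)      # (4.4)
    T3 = max((n * T1 + 1) / 2,
             (n * T2 + 6 - c) / np.sqrt(8 * (18 + 12 * c + c ** 2)))  # (4.5)
    return S, T1, T2, T3

rng = np.random.default_rng(0)
S, T1, T2, T3 = sphericity_stats(rng.standard_normal((200, 100)))
print(round(np.trace(S), 6))  # → 100.0, since tr(S_hat) = p by construction
```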

Since numerical comparisons of the statistics 𝖳1\mathsf{T}_{1}, 𝖳2\mathsf{T}_{2}, and 𝖳3\mathsf{T}_{3} were given previously in [18], our experiments here are designed using similar settings. Under the null hypothesis, we generated data from a standard multivariate normal distribution in 15 cases, corresponding to 3 choices of p/n{0.5,1,2}p/n\in\{0.5,1,2\} and 5 choices of n{100,200,300,400,500}n\in\{100,200,300,400,500\}. For the statistics 𝖳1\mathsf{T}_{1}, 𝖳2\mathsf{T}_{2}, and 𝖳3\mathsf{T}_{3}, we used the analytical critical values derived previously in [18], and for the statistic 𝖳sr\mathsf{T}_{\text{sr}}, we determined its critical value using the proposed bootstrap with B=500B=500. For each setting under the null hypothesis, we generated 50000 datasets and calculated the empirical level of each test as the fraction of rejections among the 50000 trials. The results corresponding to a nominal level of 5% are displayed in Table 7, which shows that the empirical and nominal levels are in close agreement for all four statistics.
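Operationally, a bootstrap critical value is just an empirical quantile of BB resampled statistics. The sketch below shows only this generic final step; `draw_bootstrap_stat` is a placeholder standing in for the paper's resampling scheme for 𝖳sr\mathsf{T}_{\text{sr}}, which is not reproduced here.

```python
import numpy as np

def bootstrap_critical_value(draw_bootstrap_stat, B=500, alpha=0.05, rng=None):
    """Generic (1 - alpha) quantile of B bootstrap replicates.
    `draw_bootstrap_stat` is a stand-in for the paper's parametric
    bootstrap resampler and is NOT part of the original method as stated."""
    rng = rng if rng is not None else np.random.default_rng()
    stats = np.array([draw_bootstrap_stat(rng) for _ in range(B)])
    return np.quantile(stats, 1 - alpha)

# Illustration with a placeholder resampler (standard normal draws):
crit = bootstrap_critical_value(lambda rng: rng.standard_normal(), B=500,
                                rng=np.random.default_rng(3))
```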

Regarding the alternative hypothesis, we retained all the settings described above, except that we replaced the null covariance matrix Σn=Ip\Sigma_{n}=I_{p} with a diagonal spiked covariance matrix such that λj(Σn)=1.3\lambda_{j}(\Sigma_{n})=1.3 for all 1jp/21\leq j\leq p/2, and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for all other jj. This choice has the benefit that it creates variation in the numerical values of power, so that they are not too concentrated near 1. Similar alternatives were also used for the experiments in [18]. The results are presented in Table 8, which is organized in the same format as Table 7. In each setting, the power of the statistic 𝖳sr\mathsf{T}_{\text{sr}} approximately matches the highest power achieved among 𝖳1\mathsf{T}_{1}, 𝖳2\mathsf{T}_{2}, and 𝖳3\mathsf{T}_{3}.
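For reference, the departure from sphericity induced by this alternative can be computed in closed form, again under the assumption that rn=tr(Σn)2/tr(Σn2)r_{n}=\textup{tr}(\Sigma_{n})^{2}/\textup{tr}(\Sigma_{n}^{2}): the ratio rn/pr_{n}/p equals 1.152/1.3450.9831.15^{2}/1.345\approx 0.983 for every even pp.

```python
import numpy as np

p = 400
lam = np.where(np.arange(p) < p // 2, 1.3, 1.0)   # spiked alternative eigenvalues
ratio = lam.sum() ** 2 / ((lam ** 2).sum() * p)   # r_n / p (assumed definition)
print(round(ratio, 4))  # → 0.9833
```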

Table 7: Results on empirical level for sphericity tests (5% nominal level)
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=2p/n=2
nn 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}}
100 0.049 0.048 0.051 0.050 0.049 0.049 0.051 0.054 0.048 0.051 0.053 0.051
200 0.049 0.050 0.052 0.057 0.043 0.047 0.048 0.047 0.051 0.052 0.054 0.051
300 0.046 0.048 0.051 0.046 0.046 0.047 0.048 0.048 0.051 0.054 0.059 0.052
400 0.048 0.051 0.050 0.051 0.049 0.056 0.051 0.054 0.051 0.049 0.053 0.054
500 0.049 0.045 0.048 0.052 0.050 0.046 0.048 0.053 0.044 0.045 0.044 0.046
Table 8: Results on power for sphericity tests
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=2p/n=2
nn 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}}
100 0.190 0.165 0.185 0.217 0.201 0.154 0.190 0.211 0.208 0.136 0.194 0.207
200 0.497 0.391 0.466 0.502 0.513 0.350 0.473 0.503 0.503 0.271 0.465 0.520
300 0.793 0.660 0.766 0.797 0.801 0.589 0.763 0.812 0.793 0.458 0.752 0.804
400 0.952 0.863 0.941 0.952 0.951 0.790 0.933 0.954 0.958 0.670 0.941 0.949
500 0.993 0.958 0.990 0.994 0.993 0.922 0.989 0.993 0.993 0.816 0.987 0.995

5 Conclusion

Up to now, high-dimensional elliptical models have generally fallen outside the scope of existing bootstrap methods for spectral statistics. In the current paper, we have addressed this problem by showing how a parametric bootstrap approach that is specialized to IC models [34] can be extended to elliptical models in high dimensions. In addition, we have shown that the new method is supported by two types of theoretical guarantees in the elliptical setting: First, the method consistently approximates the distributions of linear spectral statistics (Theorem 2). Second, the method can be applied to a nonlinear combination of these statistics to construct asymptotically valid confidence intervals and hypothesis tests (Theorem 3). From a practical perspective, a valuable property of the method is its user-friendliness, since it can be applied to generic spectral statistics in an automatic way. In particular, this provides the user with the flexibility to easily explore spectral statistics whose asymptotic distributions are analytically inconvenient or unknown. With regard to empirical performance, we have presented extensive simulation results, showing that with few exceptions, the method accurately approximates the distributions of both linear and nonlinear spectral statistics across many settings and inference tasks. An interesting question for future work is to determine if a consistent parametric bootstrap method for spectral statistics can be developed within more general models that unify IC and elliptical models.

Appendices

In Appendices A, B, and C, we give the proofs of Theorems 1, 2, and 3, respectively. Background results are given in Appendix D, and additional details related to the examples in Section 3.1 are given in Appendix E.

Appendix A Proof of Theorem 1

Our consistency guarantees for ς^n2\widehat{\varsigma}_{n}^{2} and H~n\tilde{H}_{n} are proven separately in the next two subsections.

A.1 Consistency of ς^n 2\widehat{\varsigma}_{n}^{\,2}: Proof of (3.7) in Theorem 1

Define the parameter

τn=(p+2)(βn2αn)γn+2αn+2,\displaystyle\tau_{n}=\frac{(p+2)(\beta_{n}-2\alpha_{n})}{\gamma_{n}+2\alpha_{n}}+2, (A.1)

and the estimate

τ^n=(p+2)(β^n2α^n)γ^n+2α^n+2.\displaystyle\widehat{\tau}_{n}=\frac{(p+2)(\widehat{\beta}_{n}-2\widehat{\alpha}_{n})}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}+2. (A.2)

Based on the definitions of ςn2\varsigma_{n}^{2} and ς^n2\widehat{\varsigma}_{n}^{2}, note that

ςn2p=τn and ς^n2p=max{τ^n,0},\textstyle\frac{\varsigma_{n}^{2}}{p}=\tau_{n}\ \ \ \text{ and }\ \ \ \ \textstyle\frac{\widehat{\varsigma}_{n}^{2}}{p}=\max\{\widehat{\tau}_{n},0\},

as well as the fact that Assumption 1 implies τnτ\tau_{n}\to\tau as nn\to\infty. Since the function xmax{x,0}x\mapsto\max\{x,0\} is 1-Lipschitz and τn0\tau_{n}\geq 0, it suffices to show τ^nτn0\widehat{\tau}_{n}-\tau_{n}\xrightarrow{\mathbb{P}}0.
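For concreteness, the plug-in computation of τ^n\widehat{\tau}_{n} can be sketched as follows, using the estimators recalled later in Lemmas A.1–A.3 (α^n=tr(Σ^n2)1ntr(Σ^n)2\widehat{\alpha}_{n}=\textup{tr}(\widehat{\Sigma}_{n}^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}, γ^n=tr(Σ^n)2\widehat{\gamma}_{n}=\textup{tr}(\widehat{\Sigma}_{n})^{2}, and β^n\widehat{\beta}_{n} the sample variance of 𝐱i22\|\mathbf{x}_{i}\|_{2}^{2}). The observations are assumed centered, the sample sizes are arbitrary, and the Gaussian test case uses the fact that βn=2αn\beta_{n}=2\alpha_{n} (hence τn=2\tau_{n}=2) for normal data.

```python
import numpy as np

def tau_hat(X):
    """Plug-in estimate (A.2), built from the estimators recalled in
    Lemmas A.1-A.3; the rows of X are assumed centered."""
    n, p = X.shape
    Sig = X.T @ X / n                                  # Sigma_hat
    alpha = np.trace(Sig @ Sig) - np.trace(Sig) ** 2 / n
    gamma = np.trace(Sig) ** 2
    beta = np.var((X ** 2).sum(axis=1), ddof=1)        # sample variance of ||x_i||^2
    return (p + 2) * (beta - 2 * alpha) / (gamma + 2 * alpha) + 2

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 50))   # Gaussian case: tau_n = 2 exactly
t = tau_hat(X)                        # should be close to 2 here
varsigma2 = 50 * max(t, 0.0)          # varsigma_hat_n^2 = p * max{tau_hat_n, 0}
```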

In Lemmas A.1, A.2, and A.3 given later in this subsection, the following three limits will be established,

α^nαn 1\displaystyle\frac{\widehat{\alpha}_{n}}{\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 1 (A.3)
γ^nγn 1\displaystyle\frac{\widehat{\gamma}_{n}}{\gamma_{n}}\ \xrightarrow{\mathbb{P}}\ 1 (A.4)
(p+2)β^nβnγn+2αn 0.\displaystyle(p+2)\frac{\widehat{\beta}_{n}-\beta_{n}}{\gamma_{n}+2\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 0. (A.5)

Due to the ratio-consistency of α^n\widehat{\alpha}_{n} and γ^n\widehat{\gamma}_{n}, as well as the fact that (p+2)αnγn+2αn1\frac{(p+2)\alpha_{n}}{\gamma_{n}+2\alpha_{n}}\asymp 1 holds under Assumption 2, it is straightforward to check that

γn+2αnγ^n+2α^n1 and (p+2)αnγn+2αn(p+2)α^nγ^n+2α^n0.\frac{\gamma_{n}+2\alpha_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}\xrightarrow{\mathbb{P}}1\ \ \ \ \text{ and }\ \ \ \ \frac{(p+2)\alpha_{n}}{\gamma_{n}+2\alpha_{n}}-\frac{(p+2)\widehat{\alpha}_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}\xrightarrow{\mathbb{P}}0. (A.6)

Therefore,

τ^nτn=(p+2)β^nγn+2αn(γn+2αnγ^n+2α^n1)+(p+2)β^nβnγn+2αn+(2(p+2)αnγn+2αn2(p+2)α^nγ^n+2α^n)=(p+2)βnγn+2αno(1)+o(1),\begin{split}\widehat{\tau}_{n}-\tau_{n}&=\frac{(p+2)\widehat{\beta}_{n}}{\gamma_{n}+2\alpha_{n}}\left(\frac{\gamma_{n}+2\alpha_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}-1\right)+(p+2)\frac{\widehat{\beta}_{n}-\beta_{n}}{\gamma_{n}+2\alpha_{n}}+\bigg{(}2\frac{(p+2)\alpha_{n}}{\gamma_{n}+2\alpha_{n}}-2\frac{(p+2)\widehat{\alpha}_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}\bigg{)}\\ &=\frac{(p+2)\beta_{n}}{\gamma_{n}+2\alpha_{n}}o_{\mathbb{P}}(1)+o_{\mathbb{P}}(1),\end{split} (A.7)

where we have applied (A.5) twice in the second step. Under Assumption 1, we have 𝔼(ξ14)=p2+τp+o(p)\mathbb{E}(\xi_{1}^{4})=p^{2}+\tau p+o(p) and so Lemma D.1 gives

βn\displaystyle\beta_{n} =𝔼(ξ14)p(p+2)(tr(Σn)2+2tr(Σn2))tr(Σn)2\displaystyle\ =\ \textstyle\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\Big{(}\textup{tr}(\Sigma_{n})^{2}+2\textup{tr}(\Sigma_{n}^{2})\Big{)}\ -\ \textup{tr}(\Sigma_{n})^{2}
=p2+τp+o(p)p(p+2)(tr(Σn)2+2tr(Σn2))tr(Σn)2\displaystyle\ =\ \textstyle\frac{p^{2}+\tau p+o(p)}{p(p+2)}\Big{(}\textup{tr}(\Sigma_{n})^{2}+2\textup{tr}(\Sigma_{n}^{2})\Big{)}\ -\ \textup{tr}(\Sigma_{n})^{2}
=((τ2)p+o(p)p(p+2))tr(Σn)2+2(1+o(1))tr(Σn2)\displaystyle\ =\ \Big{(}\textstyle\frac{(\tau-2)p+o(p)}{p(p+2)}\Big{)}\textup{tr}(\Sigma_{n})^{2}+2(1+o(1))\textup{tr}(\Sigma_{n}^{2})
p.\displaystyle\ \lesssim\ p.

Consequently, we have (p+2)βnγn+2αn1\frac{(p+2)\beta_{n}}{\gamma_{n}+2\alpha_{n}}\lesssim 1, and applying this to (A.7) completes the proof of (3.7) in Theorem 1.∎

Lemma A.1.

If Assumptions 1 and 2 hold, then as nn\rightarrow\infty

α^nαn 1.\frac{\widehat{\alpha}_{n}}{\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 1.
Proof.

Recall that αn=tr(Σn2)\alpha_{n}=\textup{tr}(\Sigma_{n}^{2}) and that α^n=tr(Σ^n2)1ntr(Σ^n)2\widehat{\alpha}_{n}=\textup{tr}(\widehat{\Sigma}_{n}^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}. The two terms in the estimate can be expanded as

tr(Σ^n2)=1n2i=1nj=1n(𝐱i𝐱j)21ntr(Σ^n)2=1n3i=1nj=1n(𝐱i𝐱i)(𝐱j𝐱j),\begin{split}\textup{tr}(\widehat{\Sigma}_{n}^{2})&\ =\ \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\\[5.69046pt] \frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}&\ =\ \frac{1}{n^{3}}\sum_{i=1}^{n}\sum_{j=1}^{n}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})(\mathbf{x}_{j}^{\top}\mathbf{x}_{j}),\end{split} (A.8)

which leads to the algebraic relation

α^nαnαn=(1αnn2ij(𝐱i𝐱j)2 1)+1αnn3ij((𝐱i𝐱i)2(𝐱i𝐱i)(𝐱j𝐱j))=:An+Bn.\begin{split}\frac{\widehat{\alpha}_{n}-\alpha_{n}}{\alpha_{n}}&\ =\ \bigg{(}\frac{1}{\alpha_{n}n^{2}}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\ -\ 1\bigg{)}\ +\ \frac{1}{\alpha_{n}n^{3}}\sum_{i\neq j}\Big{(}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})^{2}-(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})(\mathbf{x}_{j}^{\top}\mathbf{x}_{j})\Big{)}\\[5.69046pt] &\ =:\ A_{n}+B_{n}.\end{split} (A.9)

In the remainder of the proof, we will show that AnA_{n} and BnB_{n} are both o(1)o_{\mathbb{P}}(1). We begin with the analysis of BnB_{n}, since it is simpler. Note that BnB_{n} is always non-negative, since it can be rewritten as

Bn=1αnn3i>j(𝐱i𝐱i𝐱j𝐱j)2,B_{n}\ =\ \textstyle\frac{1}{\alpha_{n}n^{3}}\sum_{i>j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i}-\mathbf{x}_{j}^{\top}\mathbf{x}_{j})^{2}, (A.10)

and so it suffices to show that 𝔼(Bn)=o(1)\mathbb{E}(B_{n})=o(1). Furthermore, the expectation of BnB_{n} can be computed directly as

𝔼(Bn)=1αnn3i>j2var(𝐱i𝐱i)=n(n1)αnn3var(𝐱1𝐱1)=n(n1)αnn3[(𝔼(ξ14)p(p+2)1)tr(Σn)2+ 2𝔼(ξ14)p(p+2)tr(Σn2)](Lemma D.1)=n(n1)αnn3[𝒪(1p)tr(Σn)2+ 2(1+𝒪(1p))tr(Σn2)](Assumption 1)1n,\begin{split}\mathbb{E}(B_{n})&\ =\ \textstyle\frac{1}{\alpha_{n}n^{3}}\sum_{i>j}2\,\operatorname{var}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})\\[5.69046pt] &\ =\ \textstyle\frac{n(n-1)}{\alpha_{n}n^{3}}\operatorname{var}(\mathbf{x}_{1}^{\top}\mathbf{x}_{1})\\[5.69046pt] &\ =\ \frac{n(n-1)}{\alpha_{n}n^{3}}\bigg{[}\Big{(}\textstyle\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}-1\Big{)}\textup{tr}(\Sigma_{n})^{2}\ +\ 2\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\textup{tr}(\Sigma_{n}^{2})\bigg{]}\ \ \ \ \ \text{(Lemma \ref{lem:quadform})}\\[5.69046pt] &\ =\ \frac{n(n-1)}{\alpha_{n}n^{3}}\bigg{[}\mathcal{O}\big{(}\textstyle\frac{1}{p}\big{)}\textup{tr}(\Sigma_{n})^{2}\ +\ 2\Big{(}1+\mathcal{O}\big{(}\textstyle\frac{1}{p}\big{)}\Big{)}\textup{tr}(\Sigma_{n}^{2})\bigg{]}\ \ \ \ \ \ \ \text{(Assumption \ref{Data generating model})}\\[5.69046pt] &\ \lesssim\ \textstyle\frac{1}{n},\end{split} (A.11)

where the last step uses tr(Σn)2αnp\frac{\textup{tr}(\Sigma_{n})^{2}}{\alpha_{n}}\leq p. Thus, 𝔼(Bn)=o(1)\mathbb{E}(B_{n})=o(1).
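The expansions in (A.8) and the rewriting of BnB_{n} in (A.10) are purely algebraic, so they can be checked numerically on a small example:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 5))      # rows x_1, ..., x_n
n = X.shape[0]
Sig = X.T @ X / n                    # Sigma_hat
G = X @ X.T                          # Gram matrix, G[i, j] = x_i^T x_j
d = np.diag(G)                       # d[i] = x_i^T x_i

# The two expansions in (A.8):
ok1 = np.isclose(np.trace(Sig @ Sig), (G ** 2).sum() / n ** 2)
ok2 = np.isclose(np.trace(Sig) ** 2 / n, np.outer(d, d).sum() / n ** 3)

# The rewriting in (A.10), dropping the common 1/(alpha_n n^3) factor:
off = ~np.eye(n, dtype=bool)
lhs = (d[:, None] ** 2 - np.outer(d, d))[off].sum()   # sum over i != j
rhs = 0.5 * ((d[:, None] - d[None, :]) ** 2).sum()    # sum over i > j of (d_i - d_j)^2
print(ok1, ok2, np.isclose(lhs, rhs))  # → True True True
```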

Now we handle the term AnA_{n} by showing that 𝔼(An2)=o(1)\mathbb{E}(A_{n}^{2})=o(1). It is helpful to start by noting that if iji\neq j, then

𝔼((𝐱i𝐱j)2)=tr(Σn2),\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2})=\textup{tr}(\Sigma_{n}^{2}), (A.12)

which can be checked by a direct calculation. (See the calculation in (D.5) for additional details.) Consequently, if we expand out the square An2A_{n}^{2} and then take the expectation, it follows that

𝔼(An2)=1αn2n4𝔼[(ij(𝐱i𝐱j)2)2]2n(n1)n2+ 1.\mathbb{E}(A_{n}^{2})\ =\ \frac{1}{\alpha_{n}^{2}n^{4}}\mathbb{E}\bigg{[}\Big{(}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\Big{)}^{2}\bigg{]}\ -\ \frac{2n(n-1)}{n^{2}}\ +\ 1. (A.13)

Next, we compute the second moment on the right as

𝔼[(ij(𝐱i𝐱j)2)2]=ijlk𝔼[(𝐱i𝐱j)2(𝐱l𝐱k)2].\mathbb{E}\bigg{[}\Big{(}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\Big{)}^{2}\bigg{]}\ =\ \displaystyle\sum_{i\neq j}\sum_{l\neq k}\mathbb{E}\Big{[}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2}\Big{]}. (A.14)

In Lemma D.2, it is shown that if (i,j,k,l)(i,j,k,l) are four distinct indices, then

𝔼((𝐱i𝐱j)4)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{4})\ = 3(𝔼(ξ14)p(p+2))2(αn2+2tr(Σn4))\displaystyle=\ 3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4})) (A.15)
𝔼((𝐱i𝐱j)2(𝐱i𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{i}^{\top}\mathbf{x}_{k})^{2})\ =𝔼(ξ14)p(p+2)(αn2+2tr(Σn4))\displaystyle=\ \frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4})) (A.16)
𝔼((𝐱i𝐱j)2(𝐱l𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2})\ =αn2.\displaystyle=\ \alpha_{n}^{2}. (A.17)

Note that in the double sum (A.14), the numbers of terms involving 2, 3, and 4 distinct indices are respectively 𝒪(n2)\mathcal{O}(n^{2}), 𝒪(n3)\mathcal{O}(n^{3}) and n4+𝒪(n3)n^{4}+\mathcal{O}(n^{3}). Applying these observations to (A.14), we have

1αn2n4𝔼[(ij(𝐱i𝐱j)2)2]=αn2αn2n4(n4+𝒪(n3))+1αn2n4(𝔼(ξ14)p(p+2)(αn2+2tr(Σn4)))𝒪(n3)+1αn2n4(3(𝔼(ξ14)p(p+2))2(αn2+2tr(Σn4)))𝒪(n2)= 1+𝒪(1n).\begin{split}\textstyle\frac{1}{\alpha_{n}^{2}n^{4}}\mathbb{E}\bigg{[}\Big{(}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\Big{)}^{2}\bigg{]}&\ =\ \textstyle\frac{\alpha_{n}^{2}}{\alpha_{n}^{2}n^{4}}\Big{(}n^{4}+\mathcal{O}(n^{3})\Big{)}\\ &\ \ \ \ \ +\ \textstyle\frac{1}{\alpha_{n}^{2}n^{4}}\Big{(}\textstyle\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4}))\Big{)}\mathcal{O}(n^{3})\\ &\ \ \ \ \ +\ \textstyle\frac{1}{\alpha_{n}^{2}n^{4}}\Big{(}3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4}))\Big{)}\mathcal{O}(n^{2})\\[5.69046pt] &\ =\ 1+\mathcal{O}(\textstyle\frac{1}{n}).\end{split}

Combining this with (A.13), we reach the following bound on 𝔼(An2)\mathbb{E}(A_{n}^{2}),

𝔼(An2)= 1+𝒪(1n)2n(n1)n2+ 11n,\begin{split}\mathbb{E}(A_{n}^{2})&\ =\ 1+\mathcal{O}(\textstyle\frac{1}{n})-\frac{2n(n-1)}{n^{2}}\ +\ 1\\ &\ \lesssim\ \textstyle\frac{1}{n},\end{split} (A.18)

which completes the proof. ∎

Lemma A.2.

If Assumptions 1 and 2 hold, then as nn\to\infty

γ^nγn 1.\frac{\widehat{\gamma}_{n}}{\gamma_{n}}\ \xrightarrow{\mathbb{P}}\ 1.
Proof.

Recall γn=tr(Σn)2\gamma_{n}=\textup{tr}(\Sigma_{n})^{2} and γ^n=tr(Σ^n)2\widehat{\gamma}_{n}=\textup{tr}(\widehat{\Sigma}_{n})^{2}, and note that the algebraic identity a2b2=2b(ab)+(ab)2a^{2}-b^{2}=2b(a-b)+(a-b)^{2} gives

𝔼|γ^nγn|\displaystyle\mathbb{E}|\widehat{\gamma}_{n}-\gamma_{n}| =𝔼|2tr(Σn)(tr(Σ^n)tr(Σn))+(tr(Σ^n)tr(Σn))2|\displaystyle\ =\ \mathbb{E}\Big{|}2\textup{tr}(\Sigma_{n})\big{(}\textup{tr}(\widehat{\Sigma}_{n})-\textup{tr}(\Sigma_{n})\big{)}+\big{(}\textup{tr}(\widehat{\Sigma}_{n})-\textup{tr}(\Sigma_{n})\big{)}^{2}\Big{|}
2tr(Σn)(var(tr(Σ^n)))1/2+var(tr(Σ^n)).\displaystyle\ \leq\ 2\textup{tr}(\Sigma_{n})\big{(}\operatorname{var}(\textup{tr}(\widehat{\Sigma}_{n}))\big{)}^{1/2}+\operatorname{var}(\textup{tr}(\widehat{\Sigma}_{n})).

Next, observe that var(tr(Σ^n))=1nvar(𝐱1𝐱1)\operatorname{var}(\textup{tr}(\widehat{\Sigma}_{n}))=\frac{1}{n}\operatorname{var}(\mathbf{x}_{1}^{\top}\mathbf{x}_{1}), and that the calculation in (A.11) shows var(𝐱1𝐱1)p\operatorname{var}(\mathbf{x}_{1}^{\top}\mathbf{x}_{1})\lesssim p. Combining this with the fact that γnp2\gamma_{n}\asymp p^{2} under Assumption 2, we have

𝔼|γ^nγn|γntr(Σn)+1p2\frac{\mathbb{E}|\widehat{\gamma}_{n}-\gamma_{n}|}{\gamma_{n}}\ \lesssim\ \frac{\textup{tr}(\Sigma_{n})+1}{p^{2}} (A.19)

which leads to the stated result.∎

Lemma A.3.

If Assumptions 1 and 2 hold, then as nn\to\infty,

(p+2)β^nβnγn+2αn 0.(p+2)\frac{\widehat{\beta}_{n}-\beta_{n}}{\gamma_{n}+2\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 0. (A.20)
Proof.

Recall that βn=var(𝐱122)\beta_{n}=\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2}) and β^n=1n1i=1n(𝐱i221ni=1n𝐱i22)2\widehat{\beta}_{n}=\frac{1}{n-1}\sum_{i=1}^{n}\Big{(}\|\mathbf{x}_{i}\|_{2}^{2}-\textstyle\frac{1}{n}\sum_{i^{\prime}=1}^{n}\|\mathbf{x}_{i^{\prime}}\|_{2}^{2}\Big{)}^{2}. It is clear that 𝔼(β^n)=βn\mathbb{E}(\widehat{\beta}_{n})=\beta_{n}, and so it suffices to show

(p+2)2(γn+2αn)2var(β^n)=o(1).\textstyle\frac{(p+2)^{2}}{(\gamma_{n}+2\alpha_{n})^{2}}\operatorname{var}(\widehat{\beta}_{n})=o(1).

Since Assumption 2 implies γn+2αnp2\gamma_{n}+2\alpha_{n}\asymp p^{2}, it remains to show var(β^n)=o(p2)\operatorname{var}(\widehat{\beta}_{n})=o(p^{2}).

Making use of a standard bound for the variance of a sample variance, we have

var(β^n)1n𝔼|𝐱122tr(Σn)|4.\operatorname{var}(\widehat{\beta}_{n})\ \lesssim\ \textstyle\frac{1}{n}\,\mathbb{E}\Big{|}\left\|\mathbf{x}_{1}\right\|_{2}^{2}-\textup{tr}(\Sigma_{n})\Big{|}^{4}. (A.21)

To bound the right side of (A.21), observe that

𝔼|𝐱122tr(Σn)|4\displaystyle\mathbb{E}\Big{|}\!\left\|\mathbf{x}_{1}\right\|_{2}^{2}-\textup{tr}\left(\Sigma_{n}\right)\!\Big{|}^{4} =𝔼|ξ12(𝐮1Σn𝐮11ptr(Σn))+(1pξ121)tr(Σn)|4\displaystyle\ =\ \mathbb{E}\left|\xi_{1}^{2}\left(\mathbf{u}_{1}^{\top}\Sigma_{n}\mathbf{u}_{1}-\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\right)+\left(\textstyle\frac{1}{p}\xi_{1}^{2}-1\right)\textup{tr}(\Sigma_{n})\right|^{4}
𝔼(ξ18)𝔼|𝐮1Σn𝐮11ptr(Σn)|4+tr(Σn)4𝔼|1pξ121|4.\displaystyle\ \lesssim\ \mathbb{E}(\xi_{1}^{8})\cdot\mathbb{E}\left|\mathbf{u}_{1}^{\top}\Sigma_{n}\mathbf{u}_{1}-\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\right|^{4}+\textup{tr}(\Sigma_{n})^{4}\cdot\mathbb{E}\left|\textstyle\frac{1}{p}\xi_{1}^{2}-1\right|^{4}.

Since λ1(Σn)1\lambda_{1}(\Sigma_{n})\lesssim 1 holds under Assumption 2, it follows from Lemma D.3 that

𝔼|𝐮1Σn𝐮11ptr(Σn)|41p2.\mathbb{E}\left|\mathbf{u}_{1}^{\top}\Sigma_{n}\mathbf{u}_{1}-\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\right|^{4}\ \lesssim\ \textstyle\frac{1}{p^{2}}.

Also, Assumption 1 implies

𝔼|1pξ121|41p2 and 𝔼(ξ18)p4.\mathbb{E}\left|\textstyle\frac{1}{p}\xi_{1}^{2}-1\right|^{4}\ \lesssim\ \textstyle\frac{1}{p^{2}}\ \ \ \ \ \text{ and }\ \ \ \ \ \mathbb{E}(\xi_{1}^{8})\,\lesssim\,p^{4}. (A.22)

Applying the last several observations to (A.21) implies var(β^n)p\operatorname{var}(\widehat{\beta}_{n})\lesssim p, which yields the stated result. ∎

A.2 Consistency of H~n\tilde{H}_{n}: Proof of (3.8) in Theorem 1

For each tt\in\mathbb{R}, denote the empirical distribution function of the QuEST eigenvalue estimates as

H^Q,n(t)=1pj=1p1{λ^Q,jt}.\widehat{H}_{\textup{Q},n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\widehat{\lambda}_{\textup{Q},j}\leq t\}.

It follows from Lemma A.4 below that the limit

H^Q,nH\widehat{H}_{\textup{Q},n}\Rightarrow H (A.23)

holds almost surely as nn\to\infty. So, to prove (3.8), it is sufficient to show supt|H^Q,n(t)H~n(t)|0\sup_{t\in\mathbb{R}}|\widehat{H}_{\textup{Q},n}(t)-\tilde{H}_{n}(t)|\xrightarrow{\mathbb{P}}0.

Since λ^Q,j\widehat{\lambda}_{\textup{Q},j} and λ~j\tilde{\lambda}_{j} can only disagree when λ^Q,j>b^n\widehat{\lambda}_{\textup{Q},j}>\widehat{b}_{n}, we have

supt|H^Q,n(t)H~n(t)|1pj=1p1{λ^Q,jλ~j}=1pj=1p1{λ^Q,j>b^n}= 1H^Q,n(b^n).\begin{split}\sup_{t\in\mathbb{R}}|\widehat{H}_{\textup{Q},n}(t)-\tilde{H}_{n}(t)|&\ \leq\ \frac{1}{p}\sum_{j=1}^{p}1\{\widehat{\lambda}_{\textup{Q},j}\neq\tilde{\lambda}_{j}\}\\[5.69046pt] &\ =\ \frac{1}{p}\sum_{j=1}^{p}1\{\widehat{\lambda}_{\textup{Q},j}>\widehat{b}_{n}\}\\[5.69046pt] &\ =\ 1-\widehat{H}_{\textup{Q},n}(\widehat{b}_{n}).\end{split} (A.24)
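The inequality (A.24) can be illustrated numerically. The snippet below uses hypothetical eigenvalue estimates and assumes the truncation takes the form λ~j=min{λ^Q,j,b^n}\tilde{\lambda}_{j}=\min\{\widehat{\lambda}_{\textup{Q},j},\widehat{b}_{n}\}; this form is consistent with the disagreement condition above, but it is an assumption here, since λ~j\tilde{\lambda}_{j} is defined elsewhere in the paper.

```python
import numpy as np

def ecdf(vals, t):
    """Empirical distribution function of vals, evaluated at the points t."""
    return (vals[None, :] <= t[:, None]).mean(axis=1)

lam_q = np.array([0.5, 1.0, 1.8, 2.5, 3.7, 6.0])   # hypothetical QuEST estimates
b_hat = 3.0                                         # hypothetical threshold
lam_tilde = np.minimum(lam_q, b_hat)                # assumed form of the truncation

t_grid = np.linspace(0.0, 7.0, 2001)
sup_diff = np.abs(ecdf(lam_q, t_grid) - ecdf(lam_tilde, t_grid)).max()
bound = (lam_q > b_hat).mean()                      # right side of (A.24)
print(sup_diff <= bound + 1e-12)  # → True
```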

Let uu denote the upper endpoint of the support of the distribution associated with HH, and fix any ϵ(0,14)\epsilon\in(0,\frac{1}{4}). Since H^Q,n\widehat{H}_{\textup{Q},n} is a non-decreasing function, we have

supt|H^Q,n(t)H~n(t)| 1H^Q,n(u+ϵ)1{b^nu+ϵ}.\begin{split}\sup_{t\in\mathbb{R}}|\widehat{H}_{\textup{Q},n}(t)-\tilde{H}_{n}(t)|&\ \leq\ 1-\widehat{H}_{\textup{Q},n}(u+\epsilon)1\{\widehat{b}_{n}\geq u+\epsilon\}.\end{split} (A.25)

By the definition of uu, the value u+ϵu+\epsilon is a continuity point of HH with H(u+ϵ)=1H(u+\epsilon)=1, and so the limit (A.23) implies H^Q,n(u+ϵ)=1+o(1)\widehat{H}_{\textup{Q},n}(u+\epsilon)=1+o_{\mathbb{P}}(1). Hence, in order to show that the right side of (A.25) converges to 0 in probability, it is enough to show that (b^nu+ϵ)1\mathbb{P}(\widehat{b}_{n}\geq u+\epsilon)\to 1.

We will handle this remaining task by showing instead that (b^n<u+ϵ)0\mathbb{P}(\widehat{b}_{n}<u+\epsilon)\to 0. Recall that b^n=λ1(Σ^n)+1\widehat{b}_{n}=\lambda_{1}(\widehat{\Sigma}_{n})+1 and note that the limit HnHH_{n}\Rightarrow H implies that λ1(Σn)uϵ\lambda_{1}(\Sigma_{n})\geq u-\epsilon must hold for all large nn. Therefore, we have

(b^n<u+ϵ)(λ1(Σ^n)λ1(Σn)+2ϵ1)\begin{split}\mathbb{P}(\widehat{b}_{n}<u+\epsilon)&\ \leq\ \mathbb{P}\Big{(}\lambda_{1}(\widehat{\Sigma}_{n})\leq\lambda_{1}(\Sigma_{n})+2\epsilon-1\Big{)}\end{split} (A.26)

for all large nn. To derive an upper bound on the last probability, we will replace λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}) with a smaller random variable, and then rearrange the event. Let 𝐯1\mathbf{v}_{1} denote an eigenvector of Σn\Sigma_{n} corresponding to λ1(Σn)\lambda_{1}(\Sigma_{n}) with 𝐯12=1\|\mathbf{v}_{1}\|_{2}=1. Defining the random matrix Wn=1ni=1nξi2𝐮i𝐮iW_{n}=\frac{1}{n}\sum_{i=1}^{n}\xi_{i}^{2}\mathbf{u}_{i}\mathbf{u}_{i}^{\top}, the variational representation of λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}) gives λ1(Σ^n)λ1(Σn)𝐯1Wn𝐯1\lambda_{1}(\widehat{\Sigma}_{n})\geq\lambda_{1}(\Sigma_{n})\mathbf{v}_{1}^{\top}W_{n}\mathbf{v}_{1}. Also note that our choice of ϵ\epsilon ensures 2ϵ11/22\epsilon-1\leq-1/2. This yields the following bounds for all large nn,

(b^n<u+ϵ)(λ1(Σn)|𝐯1Wn𝐯11|1/2) 4λ1(Σn)2var(𝐯1Wn𝐯1)=4λ1(Σn)2nvar(ξ12𝐮1(𝐯1𝐯1)𝐮1)=4λ1(Σn)2n(3𝔼(ξ14)p(p+2)1)(Lemma D.1)1n,\begin{split}\mathbb{P}(\widehat{b}_{n}<u+\epsilon)&\ \leq\ \mathbb{P}\Big{(}\lambda_{1}(\Sigma_{n})|\mathbf{v}_{1}^{\top}W_{n}\mathbf{v}_{1}-1|\geq 1/2\Big{)}\\[5.69046pt] &\ \leq\ 4\lambda_{1}(\Sigma_{n})^{2}\operatorname{var}\big{(}\mathbf{v}_{1}^{\top}W_{n}\mathbf{v}_{1}\big{)}\\[5.69046pt] &\ =\ \textstyle\frac{4\lambda_{1}(\Sigma_{n})^{2}}{n}\,\operatorname{var}\big{(}\xi_{1}^{2}\mathbf{u}_{1}^{\top}(\mathbf{v}_{1}\mathbf{v}_{1}^{\top})\mathbf{u}_{1}\big{)}\\[5.69046pt] &\ =\ \textstyle\frac{4\lambda_{1}(\Sigma_{n})^{2}}{n}\,\left(3\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}-1\right)\ \ \ \ \ \ \ \ \ \ \ \text{(Lemma~{}\ref{lem:quadform})}\\[0.0pt] &\ \lesssim\ \textstyle\frac{1}{n},\end{split} (A.27)

where the last step uses Assumptions 1 and 2.∎

Lemma A.4.

Suppose that Assumptions 1 and 2 hold. Then, the following limit holds almost surely as nn\to\infty

H^Q,nH.\widehat{H}_{\textup{Q},n}\Rightarrow H. (A.28)
Proof.

In the paper [26, pp.381-382], the limit (A.28) is established in the context of an IC model where p/nc(0,){1}p/n\to c\in(0,\infty)\setminus\{1\}, and the eigenvalues of the population covariance matrix Σn\Sigma_{n} satisfy Assumption 2. To adapt the proof from [26] to our current setting, it is sufficient to show that the following two facts hold under Assumptions 1 and 2. First, there is a constant C>0C>0 such that the bound lim supnλ1(Σ^n)C\limsup_{n\to\infty}\lambda_{1}(\widehat{\Sigma}_{n})\leq C holds almost surely, which we prove later using a truncation argument in Lemmas A.5 and A.6. Second, the random distribution function H^n(t)=1pj=1p1{λj(Σ^n)t}\widehat{H}_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\lambda_{j}(\widehat{\Sigma}_{n})\leq t\} satisfies H^nΨ(H,c)\widehat{H}_{n}\Rightarrow\Psi(H,c) almost surely, where we recall that the distribution Ψ(H,c)\Psi(H,c) is defined near equation (3.9). The validity of this second fact under our current assumptions is a consequence of Theorem 1.1 in [6].∎

A.3 Boundedness of sample eigenvalues

Lemma A.5.

Under Assumptions 1 and 2, there is a constant C>0C>0 such that the bounds

lim supnλ1(Σ^n)\displaystyle\limsup_{n\to\infty}\lambda_{1}({\widehat{\Sigma}_{n}}) C and lim supnλ1(Σ~n)C\displaystyle\leq C\text{ \ \ \ \ \ \ and \ \ \ \ \ \ }\limsup_{n\to\infty}\lambda_{1}({\tilde{\Sigma}_{n}})\leq C (A.29)

hold almost surely.

Proof.

Since the estimates λ~j=λj(Σ~n)\tilde{\lambda}_{j}=\lambda_{j}(\tilde{\Sigma}_{n}) are bounded above by b^n=λ1(Σ^n)+1\widehat{b}_{n}=\lambda_{1}(\widehat{\Sigma}_{n})+1 for all j{1,,p}j\in\{1,\dots,p\}, it is enough to focus on the first inequality in (A.29). Define a sequence of truncated random variables ξ˘i=ξi 1{|ξi2p|<pn}\breve{\xi}_{i}=\xi_{i}\,1\{|\xi_{i}^{2}-p|<\sqrt{pn}\} for i=1,,ni=1,\dots,n, as well as the following truncated version of Σ^n\widehat{\Sigma}_{n},

Σ˘n=1ni=1nξ˘i2Σn1/2𝐮i𝐮iΣn1/2.\breve{\Sigma}_{n}=\frac{1}{n}\sum_{i=1}^{n}\breve{\xi}_{i}^{2}\,\Sigma_{n}^{1/2}\mathbf{u}_{i}\mathbf{u}_{i}^{\top}\Sigma_{n}^{1/2}. (A.30)

Lemma A.6 below shows that (Σ^nΣ˘ni.o.)=0\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \text{i.o.})=0. Consequently, it suffices to show there is a constant C>0C>0 such that lim supnλ1(Σ˘n)C\limsup_{n\to\infty}\lambda_{1}(\breve{\Sigma}_{n})\leq C holds almost surely.

Since the vectors 𝐮1,,𝐮n\mathbf{u}_{1},\dots,\mathbf{u}_{n} are uniformly distributed on the unit sphere of p\mathbb{R}^{p}, we may express them as 𝐮i=𝐳i/𝐳i2\mathbf{u}_{i}=\mathbf{z}_{i}/\|\mathbf{z}_{i}\|_{2} for a sequence of i.i.d. standard Gaussian vectors 𝐳1,,𝐳n\mathbf{z}_{1},\dots,\mathbf{z}_{n} in p\mathbb{R}^{p}. This yields

Σ˘n=1ni=1nξ˘i2/p𝐳i22/pΣn1/2𝐳i𝐳iΣn1/2.\breve{\Sigma}_{n}=\frac{1}{n}\sum_{i=1}^{n}\frac{\breve{\xi}_{i}^{2}/p}{\|\mathbf{z}_{i}\|_{2}^{2}/p}\Sigma_{n}^{1/2}\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\Sigma_{n}^{1/2}. (A.31)

By construction, we have max1inξ˘i2/p1+1/cn\max_{1\leq i\leq n}\breve{\xi}_{i}^{2}/p\leq 1+1/\sqrt{c_{n}} for all n1n\geq 1. Also, using standard tail bounds for the χp2\chi_{p}^{2} distribution and the Borel-Cantelli lemma, it is straightforward to show that lim infnmin1in𝐳i22/p\liminf_{n\to\infty}\min_{1\leq i\leq n}\|\mathbf{z}_{i}\|_{2}^{2}/p is at least 1/21/2 almost surely. Taken together, these observations imply there is a constant C>0C>0 such that the bound

λ1(Σ˘n)Cλ1(Σn)λ1(1ni=1n𝐳i𝐳i)\lambda_{1}(\breve{\Sigma}_{n})\ \leq\ C\,\lambda_{1}(\Sigma_{n})\,\lambda_{1}\Big{(}\textstyle\frac{1}{n}\sum_{i=1}^{n}\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\Big{)} (A.32)

holds almost surely for all large nn. In addition, note that supn1λ1(Σn)1\sup_{n\geq 1}\lambda_{1}(\Sigma_{n})\lesssim 1 holds by Assumption 2. Lastly, it is known from [50, Theorem 3.1] that the limit

limnλ1(1ni=1n𝐳i𝐳i)=(1+c)2\lim_{n\to\infty}\lambda_{1}\Big{(}\textstyle\frac{1}{n}\sum_{i=1}^{n}\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\Big{)}\ =\ (1+\sqrt{c})^{2}

holds almost surely, which completes the proof.∎
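The edge value (1+c)2(1+\sqrt{c})^{2} cited from [50, Theorem 3.1] is easy to confirm numerically; the following is a small simulation sketch (the dimensions n, p and the seed are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np

# Simulate lambda_1((1/n) * sum_i z_i z_i^T) for i.i.d. standard Gaussian
# vectors z_i in R^p, and compare with the almost-sure limit (1 + sqrt(c))^2.
rng = np.random.default_rng(0)
n, p = 4000, 1000                      # aspect ratio c = p/n = 0.25
Z = rng.standard_normal((p, n))
S = (Z @ Z.T) / n                      # p x p matrix (1/n) * sum_i z_i z_i^T
lam1 = np.linalg.eigvalsh(S)[-1]       # largest eigenvalue
edge = (1 + np.sqrt(p / n)) ** 2       # = 2.25 for c = 0.25
print(lam1, edge)
```

For these dimensions the largest eigenvalue already sits within a few hundredths of the limiting edge, consistent with the Tracy-Widom fluctuation scale of order n^{-2/3}.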

Lemma A.6.

Suppose that Assumption 1 holds, and let Σ˘n\breve{\Sigma}_{n} be as defined in (A.30). Then,

(Σ^nΣ˘ni.o.)=0.\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})=0. (A.33)
Proof.

We adapt a classical argument from [50]. For a fixed number nn, the matrices Σ^n\widehat{\Sigma}_{n} and Σ˘n\breve{\Sigma}_{n} can only disagree if |ξi2p|pn|\xi_{i}^{2}-p|\geq\sqrt{pn} for at least one i=1,,ni=1,\dots,n, and so

(Σ^nΣ˘ni.o.)limj(n=ji=1n{|ξi2p|pn}).\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})\ \leq\ \lim_{j\to\infty}\mathbb{P}\bigg{(}\bigcup_{n=j}^{\infty}\bigcup_{i=1}^{n}\{|\xi_{i}^{2}-p|\geq\sqrt{pn}\}\bigg{)}. (A.34)

Next, we partition the values of nn into the intervals [2m1,2m),[2m,2m+1),[2^{m-1},2^{m}),[2^{m},2^{m+1}),\dots and take a union bound across the intervals, yielding

(Σ^nΣ˘ni.o.)limjm=j(max1in2m1n<2m|ξi2pp|2m1).\begin{split}\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})&\ \leq\ \lim_{j\to\infty}\sum_{m=j}^{\infty}\mathbb{P}\Bigg{(}\max_{\underset{2^{m-1}\leq n<2^{m}}{1\leq i\leq n}}|\textstyle\frac{\xi_{i}^{2}-p}{\sqrt{p}}|\,\geq\,\sqrt{2^{m-1}}\Bigg{)}.\end{split} (A.35)

For a generic sequence of random variables Y1,,YNY_{1},\dots,Y_{N} and number q1q\geq 1, recall the standard maximal inequality

𝔼|max1iNYi|qNmax1iN𝔼|Yi|q.\mathbb{E}\Big{|}\!\max_{1\leq i\leq N}Y_{i}\Big{|}^{q}\ \leq\ N\max_{1\leq i\leq N}\mathbb{E}|Y_{i}|^{q}. (A.36)
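The maximal inequality (A.36) can be checked exactly on a tiny discrete example by enumerating all outcomes; the two-point distribution below is an arbitrary illustration, not from the paper:

```python
import itertools
from fractions import Fraction

# Y_1, Y_2, Y_3 i.i.d., each equal to -1 or 2 with probability 1/2, and q = 2.
# Compute E|max_i Y_i|^q exactly by enumeration, and compare with the
# bound N * max_i E|Y_i|^q from (A.36).
values, q, N = [-1, 2], 2, 3
prob = Fraction(1, 2)

E_single = sum(prob * Fraction(abs(v)) ** q for v in values)   # E|Y_1|^q = 5/2
rhs = N * E_single                                             # bound = 15/2
lhs = sum(prob ** N * Fraction(abs(max(y))) ** q
          for y in itertools.product(values, repeat=N))        # exact value = 29/8
print(lhs, rhs)  # 29/8 <= 15/2
```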

Since the number of pairs (i,n)(i,n) in the maximum in (A.35) is at most 22m2^{2m}, if we apply Chebyshev’s inequality to (A.35) and use the condition 𝔼|ξ12pp|4+ε1\mathbb{E}|\textstyle\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{4+\varepsilon}\lesssim 1 from Assumption 1, then for each m1m\geq 1 we have

(max1in2m1n&lt;2m|ξi2pp|2m1)22m(2m1)4+ε 2mε/2.\begin{split}\mathbb{P}\Bigg{(}\max_{\underset{2^{m-1}\leq n<2^{m}}{1\leq i\leq n}}|\textstyle\frac{\xi_{i}^{2}-p}{\sqrt{p}}|\,\geq\,\sqrt{2^{m-1}}\Bigg{)}&\ \lesssim\ \frac{2^{2m}}{\big{(}\sqrt{2^{m-1}}\big{)}^{4+\varepsilon}}\\[5.69046pt] &\ \lesssim\ 2^{-m\varepsilon/2}.\end{split} (A.37)

Hence, we may insert this bound into (A.35) to conclude that

(Σ^nΣ˘ni.o.)limjm=j2mε/2= 0,\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})\ \lesssim\ \lim_{j\to\infty}\sum_{m=j}^{\infty}2^{-m\varepsilon/2}\ =\ 0,

which completes the proof.∎

Appendix B Proof of Theorem 2

Let 𝐳k\mathbf{z}\in\mathbb{R}^{k} denote a Gaussian random vector to be described in a moment, and consider the triangle inequality

dLP((p{Tn(𝐟)ϑn(𝐟)}),(p{Tn,1(𝐟)ϑ~n(𝐟)}|X))In+IIn,d_{\mathrm{LP}}\Big{(}\mathcal{L}(p\big{\{}T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\big{\}}\big{)}\,,\,\mathcal{L}\big{(}p\big{\{}T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\big{\}}\big{|}X\big{)}\Big{)}\ \leq\ \textup{I}_{n}\ +\ \textup{II}_{n}, (B.1)

where we define

In\displaystyle\textup{I}_{n} =dLP((p{Tn(𝐟)ϑn(𝐟)}),(𝐳))\displaystyle\ =\ d_{\mathrm{LP}}\Big{(}\mathcal{L}\big{(}p\big{\{}T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\big{\}}\big{)},\mathcal{L}(\mathbf{z})\Big{)} (B.2)
IIn\displaystyle\textup{II}_{n} =dLP((𝐳),(p{Tn,1(𝐟)ϑ~n(𝐟)}|X)).\displaystyle\ =\ d_{\mathrm{LP}}\Big{(}\mathcal{L}(\mathbf{z})\,,\,\mathcal{L}\big{(}p\big{\{}T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\}\big{|}X\big{)}\Big{)}. (B.3)

To handle the terms In\textup{I}_{n} and IIn\textup{II}_{n}, we will apply a central limit theorem for linear spectral statistics established in [18], which relies on the following two conditions when cnc>0c_{n}\to c>0 as nn\to\infty:

  • (a).

    The elliptical model in Assumption 1 holds with the conditions on ξ12\xi_{1}^{2} being replaced by

    𝔼(ξ12)=p,var(ξ12pp)=τ+o(1) and 𝔼|ξ12pp|2+ε1,\mathbb{E}(\xi_{1}^{2})=p,\ \ \ \ \operatorname{var}\Big{(}\textstyle\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big{)}=\tau+o(1)\ \ \ \ \ \text{ and }\ \ \ \ \mathbb{E}\Big{|}\textstyle\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big{|}^{2+\varepsilon}\lesssim 1, (B.4)

    for some constants τ0\tau\geq 0 and ε>0\varepsilon>0 that do not depend on nn.

  • (b).

    There is a distribution HH such that HnHH_{n}\Rightarrow H as nn\to\infty, and λ1(Σn)1\lambda_{1}(\Sigma_{n})\lesssim 1.

Under these conditions, Theorem 2.2 in [18] ensures there exists a Gaussian distribution (𝐳)\mathcal{L}(\mathbf{z}) depending only on (𝐟,H,c,τ)(\mathbf{f},H,c,\tau) such that In0\textup{I}_{n}\to 0 as nn\to\infty.

To finish the proof, we must show IIn0\textup{II}_{n}\xrightarrow{\mathbb{P}}0. This can be done using the central limit theorem mentioned above, but extra care is needed because IIn\textup{II}_{n} is random. It is sufficient to show that for any subsequence J{1,2,}J\subset\{1,2,\dots\}, the limit IIn0\textup{II}_{n}\to 0 holds almost surely along a further subsequence of JJ. Since the bootstrap data are generated from an elliptical model that is parameterized in terms of ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) and Σ~n\tilde{\Sigma}_{n}, this amounts to verifying that ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) and H~n\tilde{H}_{n} satisfy analogues of (a) and (b) almost surely along a subsequence of JJ.

To proceed, recall that Algorithm 1 is designed so that 𝔼((ξ1)2|X)=p\mathbb{E}((\xi_{1}^{*})^{2}|X)=p and var((ξ1)2|X)=ς^n2\operatorname{var}((\xi_{1}^{*})^{2}|X)=\widehat{\varsigma}_{n}^{2}. Also, note that under Assumption 1, the parameter ςn2=var(ξ12)\varsigma_{n}^{2}=\operatorname{var}(\xi_{1}^{2}) satisfies ςn2/pτ\varsigma_{n}^{2}/p\to\tau. Consequently, Theorem 1 implies ς^n2/p=τ+o(1)\widehat{\varsigma}_{n}^{2}/p=\tau+o_{\mathbb{P}}(1), and so there is a subsequence JJJ^{\prime}\subset J along which the limit ς^n2/pτ\widehat{\varsigma}_{n}^{2}/p\to\tau holds almost surely. In other words, the limit

var((ξ1)2pp|X)τ\operatorname{var}\!\Big{(}\textstyle\frac{(\xi_{1}^{*})^{2}-p}{\sqrt{p}}\Big{|}X\Big{)}\ \to\ \tau (B.5)

holds almost surely along JJ^{\prime}. Moreover, since ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) is a Gamma distribution with mean pp and variance ς^n2\widehat{\varsigma}_{n}^{2}, Lemma E.4 implies there is a constant C>0C>0 not depending on nn such that the bound

(𝔼(|(ξ1)2pp|6|X))1/6C(ς^np+ς^n2p3/2)\Big{(}\mathbb{E}\Big{(}\big{|}\textstyle\frac{(\xi_{1}^{*})^{2}-p}{\sqrt{p}}\big{|}^{6}\Big{|}X\Big{)}\Big{)}^{1/6}\leq C\big{(}\textstyle\frac{\widehat{\varsigma}_{n}}{\sqrt{p}}+\textstyle\frac{\widehat{\varsigma}_{n}^{2}}{p^{3/2}}\big{)} (B.6)

holds almost surely. Therefore, the left side of (B.6) is bounded almost surely along JJ^{\prime}.
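As noted above, ℒ((ξ1*)²|X) is a Gamma distribution with mean p and variance ς̂n². A Gamma distribution with shape k and scale θ has mean kθ and variance kθ², so the required parameterization is k = p²/ς̂n² and θ = ς̂n²/p. The following is a minimal sketch of this step; the numeric values of p and ς̂n² are placeholders, not quantities from the paper:

```python
import numpy as np

def gamma_params(mean, var):
    """Shape/scale of a Gamma distribution with the given mean and variance."""
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

p, varsigma_sq = 500.0, 1200.0   # placeholder values for p and hat{varsigma}_n^2
k, theta = gamma_params(p, varsigma_sq)
assert np.isclose(k * theta, p) and np.isclose(k * theta ** 2, varsigma_sq)

# Draws of (xi_1^*)^2; the empirical mean/variance match p and varsigma_sq.
rng = np.random.default_rng(0)
xi_star_sq = rng.gamma(shape=k, scale=theta, size=100_000)
print(xi_star_sq.mean(), xi_star_sq.var())
```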

With regard to the empirical spectral distribution H~n\tilde{H}_{n} associated with the matrix Σ~n\tilde{\Sigma}_{n}, it satisfies the limit H~nH\tilde{H}_{n}\Rightarrow H almost surely along a further subsequence J′′JJ^{\prime\prime}\subset J^{\prime}, due to Theorem 1. In addition, Lemma A.5 ensures there is a constant C>0C>0 such that lim supnλ1(Σ~n)C\limsup_{n\to\infty}\lambda_{1}(\tilde{\Sigma}_{n})\leq C almost surely. Altogether, it follows that ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) and H~n\tilde{H}_{n} simultaneously satisfy analogues of (a) and (b) almost surely along J′′J^{\prime\prime}, which implies IIn0\textup{II}_{n}\xrightarrow{\mathbb{P}}0.∎

Appendix C Proof of Theorem 3

To begin the proof, we need to introduce three auxiliary statistics defined by

r˘n\displaystyle\breve{r}_{n} =tr(Σ^n)2tr(Σ^n2)1ntr(Σ^n)2\displaystyle=\frac{\textup{tr}(\widehat{\Sigma}_{n})^{2}}{\textup{tr}(\widehat{\Sigma}_{n}^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}} (C.1)
r˘n\displaystyle\breve{r}_{n}^{*} =tr(Σ^n)2tr((Σ^n)2)1ntr(Σ^n)2\displaystyle=\frac{\textup{tr}(\widehat{\Sigma}^{*}_{n})^{2}}{\textup{tr}((\widehat{\Sigma}^{*}_{n})^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n}^{*})^{2}} (C.2)
r~n\displaystyle\tilde{r}_{n} =tr(Σ~n)2tr(Σ~n2).\displaystyle=\frac{\textup{tr}(\tilde{\Sigma}_{n})^{2}}{\textup{tr}(\tilde{\Sigma}_{n}^{2})}. (C.3)
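The statistics above are simple trace functionals of a covariance matrix. As a quick numeric illustration of the ratios in (C.1) and (C.3) (the matrix below is an arbitrary example, not from the paper):

```python
import numpy as np

def trace_ratio(S, n=None):
    """tr(S)^2 / (tr(S^2) - (1/n) tr(S)^2); if n is None, omit the correction."""
    t1, t2 = np.trace(S), np.trace(S @ S)
    denom = t2 - t1 ** 2 / n if n is not None else t2
    return t1 ** 2 / denom

p = 4
S = np.eye(p)
print(trace_ratio(S))          # identity matrix: p^2 / p = p = 4.0
print(trace_ratio(S, n=100))   # version with the 1/n correction, as in (C.1)
```

For the identity matrix the uncorrected ratio equals p, its maximal value, which is why ratios of this kind serve as sphericity statistics.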

Also, we define r^n\widehat{r}_{n}^{*} as the statistic obtained by applying the formula (3.13) for r^n\widehat{r}_{n} to the bootstrap data 𝐱1,,𝐱n\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}. The primary task is to establish the following four limits, which are proven later in this appendix in Propositions 1 and 2. Specifically, these results show that there exists a non-degenerate Gaussian random variable ζ\zeta and a constant aa, such that as nn\to\infty,

(r˘nrn)\displaystyle\mathcal{L}(\breve{r}_{n}-r_{n}) (ζ),\displaystyle\ \Rightarrow\ \mathcal{L}(\zeta), (C.4)
(r˘nr~n|X)\displaystyle\mathcal{L}\big{(}\breve{r}_{n}^{*}-\tilde{r}_{n}\big{|}X\big{)} (ζ),\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\zeta), (C.5)
(r^nr˘n)\displaystyle\mathcal{L}(\widehat{r}_{n}-\breve{r}_{n}) (a),\displaystyle\ \Rightarrow\ \mathcal{L}(a), (C.6)
(r^nr˘n|X)\displaystyle\mathcal{L}(\widehat{r}_{n}^{*}-\breve{r}_{n}^{*}\big{|}X\big{)} (a),\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(a), (C.7)

where (a)\mathcal{L}(a) denotes the point mass distribution at aa. (In the current section, we sometimes use weak-convergence notation in limits where convergence in probability holds, because it will help to clarify how r^nr˘n\widehat{r}_{n}^{*}-\breve{r}_{n}^{*} can be analyzed in an analogous manner to r^nr˘n\widehat{r}_{n}-\breve{r}_{n}.) Using Slutsky’s lemma, it follows from the limits (C.4) and (C.6) that

(r^nrn)\displaystyle\mathcal{L}(\widehat{r}_{n}-r_{n}) (ζ+a).\displaystyle\ \Rightarrow\ \mathcal{L}(\zeta+a). (C.8)

Analogously, Slutsky’s lemma can be applied in a conditional manner to the limits (C.5) and (C.7), yielding

(r^nr~n|X)\displaystyle\mathcal{L}\big{(}\widehat{r}_{n}^{*}-\tilde{r}_{n}\big{|}X\big{)} (ζ+a).\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\zeta+a). (C.9)

Since the limiting distribution (ζ+a)\mathcal{L}(\zeta+a) is continuous, Pólya’s theorem implies

supt|(r^nrnt)(r^nr~nt|X)| 0.\displaystyle\sup_{t\in\mathbb{R}}\Big{|}\mathbb{P}(\widehat{r}_{n}-r_{n}\leq t)-\mathbb{P}(\widehat{r}_{n}^{*}-\tilde{r}_{n}\leq t|X)\Big{|}\ \xrightarrow{\mathbb{P}}\ 0. (C.10)

Due to this uniform limit and the continuity of the distribution (ζ+a)\mathcal{L}(\zeta+a), standard arguments can be used to show that the quantiles of (r^nr~n|X)\mathcal{L}(\widehat{r}_{n}^{*}-\tilde{r}_{n}|X) are asymptotically equivalent to those of (r^nrn)\mathcal{L}(\widehat{r}_{n}-r_{n}). (For example, see the proof of Lemma 10.4 in [31].) More precisely, if we note that the quantile estimate q^1α\widehat{q}_{1-\alpha} defined in Section 3.4 is the same as the (1α)(1-\alpha)-quantile of (1p(r^nr~n)|X)\mathcal{L}\big{(}\frac{1}{p}(\widehat{r}_{n}^{*}-\tilde{r}_{n})|X\big{)}, then as nn\to\infty we have the limit

(1p(r^nrn)>q^1α)α.\mathbb{P}\big{(}\textstyle\frac{1}{p}(\widehat{r}_{n}-r_{n})>\widehat{q}_{1-\alpha}\big{)}\to\alpha. (C.11)

This limit directly implies the first two statements (3.19) and (3.20) in Theorem 3. Regarding the third statement (3.21), if 𝖧0,n\mathsf{H}_{0,n}^{\prime} holds for all large nn, then replacing Σ~n=diag(λ~1,,λ~p)\tilde{\Sigma}_{n}=\textup{diag}(\tilde{\lambda}_{1},\dots,\tilde{\lambda}_{p}) with Σ~n=Ip\tilde{\Sigma}_{n}=I_{p} does not affect the reasoning leading up to (C.11), because this replacement does not affect the proofs of Propositions 1 and 2 given later. Consequently, if 𝖧0,n\mathsf{H}_{0,n}^{\prime} holds for all large nn, then we have (1p(r^n1)q^α)α\mathbb{P}\big{(}\frac{1}{p}(\widehat{r}_{n}-1)\leq\widehat{q}_{\alpha}^{\prime}\big{)}\to\alpha as nn\to\infty, which completes the proof.∎
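In practice, the quantile estimate q̂_{1−α} appearing in (C.11) is computed from Monte Carlo bootstrap replicates of 1p(r̂n* − r̃n). A generic hedged sketch of that final step, with simulated placeholder replicates standing in for actual bootstrap output:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05
# Placeholder: B replicate values of (1/p)(r_n^* - tilde{r}_n); in practice
# these would come from rerunning the bootstrap algorithm B times.
reps = rng.normal(loc=0.0, scale=1.0, size=2000)
q_hat = np.quantile(reps, 1 - alpha)   # empirical (1 - alpha)-quantile
print(q_hat)
```

With Gaussian placeholder replicates, the empirical 95% quantile lands near the theoretical value of about 1.645.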

Proposition 1.

Under Assumptions 1 and 2, the limits (C.4) and (C.5) hold as nn\to\infty.

Proof.

The proof is based on viewing r˘n\breve{r}_{n} as a nonlinear function of Tn(𝐟)T_{n}(\mathbf{f}), where the components of 𝐟=(f1,f2)\mathbf{f}=(f_{1},f_{2}) are taken to be f1(x)=xf_{1}(x)=x and f2(x)=x2f_{2}(x)=x^{2}. In this case, the centering parameter ϑn(𝐟)\vartheta_{n}(\mathbf{f}) defined by (3.10) reduces to

ϑn(𝐟)=(1ptr(Σn),1ptr(Σn2)+1nptr(Σn)2),\vartheta_{n}(\mathbf{f})\ =\ \left(\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\,,\,\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}^{2})+\frac{1}{np}\textup{tr}(\Sigma_{n})^{2}\right), (C.12)

as recorded in Lemma 2.16 of [48]. Consequently, by considering the function

gn(x1,x2)=x12x2cnx12,g_{n}(x_{1},x_{2})=\textstyle\frac{x_{1}^{2}}{x_{2}-c_{n}x_{1}^{2}}, (C.13)

we have the key relation

r˘nrn=p{gn(Tn(𝐟))gn(ϑn(𝐟))}.\breve{r}_{n}-r_{n}\ =\ p\big{\{}g_{n}(T_{n}(\mathbf{f}))-g_{n}(\vartheta_{n}(\mathbf{f}))\big{\}}. (C.14)

We can also develop a corresponding relation for the bootstrap statistic r˘nr~n\breve{r}_{n}^{*}-\tilde{r}_{n}. Due to Lemma 2.16 in [48] and our choice of 𝐟\mathbf{f}, the definition of ϑ~n(𝐟)\tilde{\vartheta}_{n}(\mathbf{f}) below (3.10) implies

ϑ~n(𝐟)=(1ptr(Σ~n),1ptr(Σ~n2)+1nptr(Σ~n)2).\tilde{\vartheta}_{n}(\mathbf{f})\ =\ \left(\textstyle\frac{1}{p}\textup{tr}(\tilde{\Sigma}_{n})\,,\,\textstyle\frac{1}{p}\textup{tr}(\tilde{\Sigma}_{n}^{2})+\frac{1}{np}\textup{tr}(\tilde{\Sigma}_{n})^{2}\right). (C.15)

Likewise, the bootstrap version of (C.14) is

r˘nr~n=p{gn(Tn(𝐟))gn(ϑ~n(𝐟))}.\breve{r}_{n}^{*}-\tilde{r}_{n}\ =\ p\big{\{}g_{n}(T_{n}^{*}(\mathbf{f}))-g_{n}(\tilde{\vartheta}_{n}(\mathbf{f}))\big{\}}. (C.16)

We will proceed by applying the delta method to the relations (C.14) and (C.16). For this purpose, note that the proof of Theorem 2 shows there exists a Gaussian random vector 𝐳2\mathbf{z}\in\mathbb{R}^{2} such that as nn\to\infty,

(p{Tn(𝐟)ϑn(𝐟)})\displaystyle\mathcal{L}\big{(}p\big{\{}T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\big{\}}\big{)} (𝐳)\displaystyle\ \Rightarrow\ \mathcal{L}(\mathbf{z}) (C.17)
(p{Tn,1(𝐟)ϑ~n(𝐟)}|X)\displaystyle\mathcal{L}\big{(}p\big{\{}T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\}\big{|}X\big{)} (𝐳).\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\mathbf{z}). (C.18)

Also, to introduce some further notation, we refer to the jjth moment of the distribution HH as

ϕj=tj𝑑H(t),\phi_{j}=\textstyle\int t^{j}dH(t), (C.19)

and we use these moments to define the parameter

ϑ(𝐟)=(ϕ1,ϕ2+cϕ12).\vartheta(\mathbf{f})=(\phi_{1},\phi_{2}+c\phi_{1}^{2}). (C.20)

This parameter arises in the following limits as nn\to\infty,

ϑn(𝐟)ϑ(𝐟) and ϑ~n(𝐟)ϑ(𝐟).\vartheta_{n}(\mathbf{f})\to\vartheta(\mathbf{f})\ \ \ \ \ \ \text{ and }\ \ \ \ \ \ \tilde{\vartheta}_{n}(\mathbf{f})\xrightarrow{\mathbb{P}}\vartheta(\mathbf{f}). (C.21)

To see why these limits hold, first note that by Assumption 2, there is a compact interval containing the support of HnH_{n} for all large nn, and by Lemma A.5, the same statement holds almost surely for H~n\tilde{H}_{n}. Moreover, Assumption 2 and Theorem 1 ensure the limits HnHH_{n}\Rightarrow H and H~nH\tilde{H}_{n}\xRightarrow{\mathbb{P}}H, and so it follows that the moments of HnH_{n} converge to those of HH, and the moments of H~n\tilde{H}_{n} converge in probability to those of HH. Combining these facts with the formulas (C.12) and (C.15) yields the limits (C.21).

In light of the relations (C.14) and (C.16), and the limits (C.21), we will expand the function gng_{n} around ϑ(𝐟)\vartheta(\mathbf{f}) and apply the delta method. This is justified because the gradient gn\nabla g_{n} has the following continuity property. Namely, if we let g(x1,x2)=x12x2cx12g(x_{1},x_{2})=\frac{x_{1}^{2}}{x_{2}-cx_{1}^{2}}, let 𝒰\mathcal{U} be a sufficiently small open neighborhood of ϑ(𝐟)\vartheta(\mathbf{f}) in 2\mathbb{R}^{2}, and let {(x1,n,x2,n)}\{(x_{1,n},x_{2,n})\} be any sequence of points within 𝒰\mathcal{U} that converges to ϑ(𝐟)\vartheta(\mathbf{f}), then we have the limit

gn(x1,n,x2,n)g(ϑ(𝐟))=(2ϕ1ϕ2+2cϕ13ϕ22,ϕ12ϕ22).\nabla g_{n}(x_{1,n},x_{2,n})\ \to\ \nabla g(\vartheta(\mathbf{f}))\ =\ \left(\textstyle\frac{2\phi_{1}}{\phi_{2}}+\frac{2c\phi_{1}^{3}}{\phi_{2}^{2}},-\frac{\phi_{1}^{2}}{\phi_{2}^{2}}\right). (C.22)
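The gradient formula in (C.22) can be verified by finite differences; a small numeric sketch, with c and the moments φ1, φ2 set to arbitrary illustrative values:

```python
import numpy as np

c, phi1, phi2 = 0.5, 2.0, 5.0
theta = (phi1, phi2 + c * phi1 ** 2)     # the point vartheta(f)

def g(x1, x2):
    # g(x1, x2) = x1^2 / (x2 - c * x1^2), the limiting version of g_n
    return x1 ** 2 / (x2 - c * x1 ** 2)

# Closed-form gradient from (C.22).
grad_exact = (2 * phi1 / phi2 + 2 * c * phi1 ** 3 / phi2 ** 2,
              -phi1 ** 2 / phi2 ** 2)

# Central finite differences at vartheta(f).
h = 1e-6
x1, x2 = theta
grad_num = ((g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h),
            (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h))
print(grad_exact, grad_num)
```

Note that at ϑ(f) the denominator x2 − c·x1² reduces to φ2, which is why φ2 alone appears in the denominators of (C.22).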

So, applying the delta method to the relations (C.14) and (C.16) and the weak limits (C.17) and (C.18) gives

(r˘nrn)\displaystyle\mathcal{L}(\breve{r}_{n}-r_{n}) (g(ϑ(𝐟))𝐳)\displaystyle\ \Rightarrow\ \mathcal{L}(\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z}) (C.23)
(r˘nr~n|X)\displaystyle\mathcal{L}(\breve{r}_{n}^{*}-\tilde{r}_{n}\big{|}X\big{)} (g(ϑ(𝐟))𝐳).\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z}). (C.24)

Now, it remains to show that the variance of the Gaussian random variable g(ϑ(𝐟))𝐳\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z} is positive. Letting KK denote the 2×22\times 2 covariance matrix of 𝐳\mathbf{z}, it is shown below equation (2.10) in [18] that the entries of KK are given by

K11\displaystyle K_{11} =2cϕ2+c(τ2)ϕ12,\displaystyle=2c\phi_{2}+c(\tau-2)\phi_{1}^{2},
K12\displaystyle K_{12} =4cϕ3+4c2ϕ1ϕ2+2c(τ2)ϕ1(cϕ12+ϕ2)\displaystyle=4c\phi_{3}+4c^{2}\phi_{1}\phi_{2}+2c(\tau-2)\phi_{1}(c\phi_{1}^{2}+\phi_{2})
K22\displaystyle K_{22} =8cϕ4+4c2ϕ22+16c2ϕ1ϕ3+8c3ϕ12ϕ2+4c(τ2)(cϕ12+ϕ2)2.\displaystyle=8c\phi_{4}+4c^{2}\phi_{2}^{2}+16c^{2}\phi_{1}\phi_{3}+8c^{3}\phi_{1}^{2}\phi_{2}+4c(\tau-2)(c\phi_{1}^{2}+\phi_{2})^{2}.

Combining this with the formula for g(ϑ(𝐟))\nabla g(\vartheta(\mathbf{f})) in (C.22), a direct but lengthy computation of the quadratic form g(ϑ(𝐟))Kg(ϑ(𝐟))\nabla g(\vartheta(\mathbf{f}))^{\top}K\,\nabla g(\vartheta(\mathbf{f})) yields

var(g(ϑ(𝐟))𝐳)=4cϕ12ϕ24(cϕ12ϕ22+2ϕ12ϕ4+2ϕ234ϕ1ϕ2ϕ3).\begin{split}\operatorname{var}\!\Big{(}\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z}\Big{)}&\ =\ \textstyle\frac{4c\phi_{1}^{2}}{\phi_{2}^{4}}\Big{(}c\phi_{1}^{2}\phi_{2}^{2}+2\phi_{1}^{2}\phi_{4}+2\phi_{2}^{3}-4\phi_{1}\phi_{2}\phi_{3}\Big{)}.\end{split} (C.25)

To see that the variance is positive, it suffices to check that 2ϕ12ϕ4+2ϕ234ϕ1ϕ2ϕ32\phi_{1}^{2}\phi_{4}+2\phi_{2}^{3}-4\phi_{1}\phi_{2}\phi_{3} is non-negative. Rewriting this as 2(ϕ1ϕ41/2ϕ23/2)2+4ϕ1ϕ2(ϕ41/2ϕ21/2ϕ3)2(\phi_{1}\phi_{4}^{1/2}-\phi_{2}^{3/2})^{2}+4\phi_{1}\phi_{2}(\phi_{4}^{1/2}\phi_{2}^{1/2}-\phi_{3}), its non-negativity follows from the observation that ϕ3ϕ41/2ϕ21/2\phi_{3}\leq\phi_{4}^{1/2}\phi_{2}^{1/2} is an instance of the Cauchy-Schwarz inequality. ∎
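The rewriting used in the last step is an algebraic identity, which can be confirmed numerically: for the moments of any distribution H on [0, ∞), the quantity 2φ1²φ4 + 2φ2³ − 4φ1φ2φ3 equals 2(φ1φ4^{1/2} − φ2^{3/2})² + 4φ1φ2(φ4^{1/2}φ2^{1/2} − φ3), and Cauchy-Schwarz makes the second term non-negative. A quick check with an arbitrary two-point distribution:

```python
import numpy as np

# Two-point distribution H placing mass 0.3 at 1 and 0.7 at 4 (arbitrary choice).
atoms, weights = np.array([1.0, 4.0]), np.array([0.3, 0.7])
phi = [np.sum(weights * atoms ** j) for j in range(5)]   # phi[j] = j-th moment

lhs = 2 * phi[1] ** 2 * phi[4] + 2 * phi[2] ** 3 - 4 * phi[1] * phi[2] * phi[3]
rhs = (2 * (phi[1] * np.sqrt(phi[4]) - phi[2] ** 1.5) ** 2
       + 4 * phi[1] * phi[2] * (np.sqrt(phi[4] * phi[2]) - phi[3]))
print(lhs, rhs)   # the two expressions agree, and both are non-negative
```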

Proposition 2.

Under Assumptions 1 and 2, the limits (C.6) and (C.7) hold as nn\to\infty.

Proof.

Here, we retain the definitions of f1f_{1} and f2f_{2} used in the proof of the previous proposition. The difference r^nr˘n\widehat{r}_{n}-\breve{r}_{n} can be written explicitly in terms of Tn(f1)T_{n}(f_{1}), Tn(f2)T_{n}(f_{2}), and Δ^n\widehat{\Delta}_{n} (defined below (3.13)) as

r^nr˘n=Tn(f1)2(Δ^npcnTn(f1)2)(Tn(f2)cnTn(f1)2)(Tn(f2)Δ^n/p).\displaystyle\widehat{r}_{n}-\breve{r}_{n}=\frac{T_{n}(f_{1})^{2}\Big{(}\widehat{\Delta}_{n}-pc_{n}T_{n}(f_{1})^{2}\Big{)}}{(T_{n}(f_{2})-c_{n}T_{n}(f_{1})^{2})(T_{n}(f_{2})-\widehat{\Delta}_{n}/p)}. (C.26)

Furthermore, Δ^n\widehat{\Delta}_{n} can be decomposed as

Δ^n=κ^1Tn(f1)2+κ^2Tn(f2),\widehat{\Delta}_{n}=\widehat{\kappa}_{1}T_{n}(f_{1})^{2}\ +\ \widehat{\kappa}_{2}T_{n}(f_{2}), (C.27)

where the random coefficients κ^1\widehat{\kappa}_{1} and κ^2\widehat{\kappa}_{2} are defined by

κ^1\displaystyle\widehat{\kappa}_{1}\ =cnp[n+1n+n1nς^n22pp(p+2)2(n1)n2p2+ς^n2p(p+2)]\displaystyle=\ c_{n}p\bigg{[}\frac{n+1}{n}+\frac{n-1}{n}\frac{{\widehat{\varsigma}}_{n}^{2}-2p}{p(p+2)}-\frac{2(n-1)}{n^{2}}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}\bigg{]} (C.28)
κ^2\displaystyle\widehat{\kappa}_{2}\ =cn[2(n1)np2+ς^n2p(p+2)1].\displaystyle=\ c_{n}\bigg{[}\frac{2(n-1)}{n}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}-1\bigg{]}. (C.29)

The last few displays show that in order to determine the limit of r^nr˘n\widehat{r}_{n}-\breve{r}_{n}, it suffices to determine the limit of the triple (Tn(f1),Tn(f2),ς^n2/p)(T_{n}(f_{1}),T_{n}(f_{2}),\widehat{\varsigma}_{n}^{2}/p). For the random variables Tn(f1)T_{n}(f_{1}) and Tn(f2)T_{n}(f_{2}), we can apply the limits (C.17) and (C.21) as well as the formula (C.20) to obtain

(Tn(f1),Tn(f2))(ϕ1,ϕ2+cϕ12).\displaystyle\mathcal{L}(T_{n}(f_{1}),T_{n}(f_{2}))\Rightarrow\mathcal{L}(\phi_{1},\phi_{2}+c\phi_{1}^{2}). (C.30)

In addition, the proof of the limit (3.7) in Theorem 1 shows that

(1pς^n2)(τ).\mathcal{L}(\textstyle\frac{1}{p}\widehat{\varsigma}_{n}^{2})\ \Rightarrow\ \mathcal{L}(\tau). (C.31)

Combining the previous two displays with the formulas (C.26)-(C.29), a direct calculation leads to

(r^nr˘n)(a),\displaystyle\mathcal{L}(\widehat{r}_{n}-\breve{r}_{n})\ \Rightarrow\ \mathcal{L}(a), (C.32)

where

a=cϕ12(ϕ2+(τ2)ϕ12)ϕ22,a=\textstyle\frac{c\phi_{1}^{2}(\phi_{2}+(\tau-2)\phi_{1}^{2})}{\phi_{2}^{2}},

which proves (C.6).

Now we turn to the proof of (C.7). By analogy with the previous argument that led to (C.32), it is enough to show the following two limits hold as nn\to\infty,

(Tn(f1),Tn(f2)|X)\displaystyle\mathcal{L}(T_{n}^{*}(f_{1}),T_{n}^{*}(f_{2})|X) (ϕ1,ϕ2+cϕ12)\displaystyle\xRightarrow{\mathbb{P}}\mathcal{L}(\phi_{1},\phi_{2}+c\phi_{1}^{2}) (C.33)
(1p(ς^n2)|X)\displaystyle\mathcal{L}(\textstyle\frac{1}{p}(\widehat{\varsigma}_{n}^{2})^{*}|X) (τ),\displaystyle\xRightarrow{\mathbb{P}}\mathcal{L}(\tau), (C.34)

where (ς^n2)(\widehat{\varsigma}_{n}^{2})^{*} is obtained by applying the formula (2.2) to the bootstrap data 𝐱1,,𝐱n\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}. The first limit (C.33) is a consequence of two limits that were established in the proof of Theorem 2, which are that IIn0\textup{II}_{n}\xrightarrow{\mathbb{P}}0 and ϑ~n(𝐟)(ϕ1,ϕ2+cϕ12)\tilde{\vartheta}_{n}(\mathbf{f})\xrightarrow{\mathbb{P}}(\phi_{1},\phi_{2}+c\phi_{1}^{2}).

Regarding the second limit (C.34), note that it is equivalent to showing that for any subsequence J{1,2,}J\subset\{1,2,\dots\}, there is a further subsequence JJJ^{\prime}\subset J such that (1p(ς^n2)|X)(τ)\mathcal{L}(\textstyle\frac{1}{p}(\widehat{\varsigma}_{n}^{2})^{*}|X)\Rightarrow\mathcal{L}(\tau) holds almost surely along JJ^{\prime}. The latter statement can be proven by analogy with the limit (1pς^n2)(τ)\mathcal{L}(\frac{1}{p}\widehat{\varsigma}_{n}^{2})\Rightarrow\mathcal{L}(\tau), which follows from the proof of (3.7) in Theorem 1. To be more precise, this analogy can be justified as follows: The proof of (3.7) only relies on Assumption 1 and two other conditions, which are

λ1(Σn)1, and (1ptr(Σn),1ptr(Σn2))(ϕ1,ϕ2).\lambda_{1}(\Sigma_{n})\lesssim 1,\ \ \ \ \text{ and }\ \ \ \ (\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}),\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}^{2}))\to(\phi_{1},\phi_{2}). (C.35)

Consequently, it is enough to check that bootstrap counterparts of these conditions hold almost surely along JJ^{\prime}. First, the bootstrap counterpart of Assumption 1 was shown to hold almost surely along subsequences in the proof of Theorem 2. Second, the bootstrap counterpart of λ1(Σn)1\lambda_{1}(\Sigma_{n})\lesssim 1 is implied by Lemma A.5, which guarantees that there is a constant C>0C>0 such that lim supnλ1(Σ~n)C\limsup_{n\to\infty}\lambda_{1}(\tilde{\Sigma}_{n})\leq C holds almost surely. Lastly, the bootstrap counterpart of the limit (1ptr(Σn),1ptr(Σn2))(ϕ1,ϕ2)(\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}),\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}^{2}))\to(\phi_{1},\phi_{2}) is handled by the fact that the moments of H~n\tilde{H}_{n} converge in probability to the moments of HH, which was shown in the proof of Proposition 1. This completes the proof. ∎

Appendix D Background results

Lemma D.1 ([18], Lemma A.1).

Let ξ1\xi_{1}\in\mathbb{R} and 𝐮1p\mathbf{u}_{1}\in\mathbb{R}^{p} satisfy the conditions in Assumption 1, and fix any symmetric matrix Mp×pM\in\mathbb{R}^{p\times p}. Then,

var(ξ12𝐮1M𝐮1)=𝔼(ξ14)p(p+2)(tr(M)2+2tr(M2))tr(M)2.\displaystyle\operatorname{var}(\xi_{1}^{2}\mathbf{u}_{1}^{\top}M\mathbf{u}_{1})=\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\textup{tr}(M)^{2}+2\textup{tr}(M^{2}))-\textup{tr}(M)^{2}. (D.1)
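In the special case ξ1² ~ χp² (so that ξ1𝐮1 is a standard Gaussian vector 𝐳), we have 𝔼(ξ1⁴) = p(p+2), and the right side of (D.1) collapses to the familiar identity var(𝐳⊤M𝐳) = 2tr(M²). A numeric sanity check of this special case, with an arbitrary seeded symmetric matrix M:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
M = (A + A.T) / 2                         # arbitrary symmetric matrix M

# Formula (D.1) with E(xi_1^4) = p*(p+2), the chi-square_p fourth moment.
trM, trM2 = np.trace(M), np.trace(M @ M)
E_xi4 = p * (p + 2)
formula = E_xi4 / (p * (p + 2)) * (trM ** 2 + 2 * trM2) - trM ** 2
assert np.isclose(formula, 2 * trM2)      # reduces exactly to 2*tr(M^2)

# Monte Carlo estimate of var(z^T M z) for z ~ N(0, I_p).
Z = rng.standard_normal((200_000, p))
quad = np.einsum('ij,jk,ik->i', Z, M, Z)  # quad[i] = z_i^T M z_i
print(quad.var(), 2 * trM2)
```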
Lemma D.2.

Let 𝐱1,,𝐱n\mathbf{x}_{1},\ldots,\mathbf{x}_{n} satisfy the conditions in Assumption 1, and let i,j,k,li,j,k,l be four distinct indices in {1,,n}\{1,\dots,n\}. Then,

𝔼((𝐱i𝐱j)4)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{4})\ = 3(𝔼(ξ14)p(p+2))2(tr(Σn2)2+2tr(Σn4))\displaystyle=\ 3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}(\textup{tr}(\Sigma_{n}^{2})^{2}+2\textup{tr}(\Sigma_{n}^{4})) (D.2)
𝔼((𝐱i𝐱j)2(𝐱i𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{i}^{\top}\mathbf{x}_{k})^{2})\ =𝔼(ξ14)p(p+2)(tr(Σn2)2+2tr(Σn4))\displaystyle=\ \frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\textup{tr}(\Sigma_{n}^{2})^{2}+2\textup{tr}(\Sigma_{n}^{4})) (D.3)
𝔼((𝐱i𝐱j)2(𝐱l𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2})\ =tr(Σn2)2.\displaystyle=\ \textup{tr}(\Sigma^{2}_{n})^{2}. (D.4)
Proof.

Since the observations are i.i.d. and centered, we have

𝔼((𝐱i𝐱j)2(𝐱l𝐱k)2)=(var(𝐱i𝐱j))2=(cov(𝐱i𝐱j,𝐱i𝐱j))2=(r=1ps=1p𝔼(𝐱ir𝐱jr𝐱is𝐱js))2=(r=1ps=1p((Σn)rs)2)2,\begin{split}\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2})&=(\operatorname{var}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j}))^{2}\\[5.69046pt] &=(\text{cov}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j}\,,\,\mathbf{x}_{i}^{\top}\mathbf{x}_{j}))^{2}\\ &=\Big{(}\sum_{r=1}^{p}\sum_{s=1}^{p}\mathbb{E}(\mathbf{x}_{ir}\mathbf{x}_{jr}\mathbf{x}_{is}\mathbf{x}_{js})\Big{)}^{2}\\[5.69046pt] &=\Big{(}\sum_{r=1}^{p}\sum_{s=1}^{p}((\Sigma_{n})_{rs})^{2}\Big{)}^{2},\end{split} (D.5)

which establishes (D.4). The two other assertions can be shown using conditional expectation. For the statement (D.2), we use Lemma D.1 to obtain

\begin{align*}
\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{4}) &= \mathbb{E}\big(\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j}\mathbf{x}_{j}^{\top}\mathbf{x}_{i})^{2}\,|\,\mathbf{x}_{j})\big)\\
&= \frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\,\mathbb{E}\bigg(\Big(\textup{tr}(\Sigma_{n}^{1/2}\mathbf{x}_{j}\mathbf{x}_{j}^{\top}\Sigma_{n}^{1/2})\Big)^{2}+2\,\textup{tr}\Big(\big(\Sigma_{n}^{1/2}\mathbf{x}_{j}\mathbf{x}_{j}^{\top}\Sigma_{n}^{1/2}\big)^{2}\Big)\bigg)\\
&= 3\,\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\,\mathbb{E}((\mathbf{x}_{j}^{\top}\Sigma_{n}\mathbf{x}_{j})^{2})\\
&= 3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}\big(\textup{tr}(\Sigma_{n}^{2})^{2}+2\,\textup{tr}(\Sigma_{n}^{4})\big).
\end{align*}

The argument for (D.3) is similar. ∎
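As an informal numerical complement (not part of the proof), the identity (D.2) can be checked by simulation in the Gaussian special case $\Sigma_{n}=I_{p}$, where $\xi_{1}^{2}\sim\chi_{p}^{2}$ and $\mathbb{E}(\xi_{1}^{4})=p(p+2)$, so that the right-hand side reduces to $3p(p+2)$. The dimension and sample size below are arbitrary choices for this sketch.

```python
import numpy as np

# Gaussian special case of (D.2): with Sigma_n = I_p and xi_1^2 ~ chi^2_p,
# E(xi_1^4) = p(p + 2), so the right-hand side of (D.2) reduces to 3p(p + 2).
rng = np.random.default_rng(0)
p, n_mc = 5, 200_000
x = rng.standard_normal((n_mc, p))
y = rng.standard_normal((n_mc, p))
est = np.mean(np.sum(x * y, axis=1) ** 4)   # Monte Carlo estimate of E((x' y)^4)
print(est, 3 * p * (p + 2))                 # estimate vs the exact value 3p(p + 2)
```

With $p=5$ the exact value is $105$, and the Monte Carlo estimate should agree to within a few percent.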

Lemma D.3.

Let $\mathbf{u}_{1}\in\mathbb{R}^{p}$ be a random vector that is uniformly distributed on the unit sphere, and let $M\in\mathbb{R}^{p\times p}$ be a non-random positive semidefinite matrix with $\lambda_{1}(M)\lesssim 1$. Then,

\begin{equation}
\mathbb{E}\Big|\mathbf{u}_{1}^{\top}M\mathbf{u}_{1}-\tfrac{1}{p}\textup{tr}(M)\Big|^{4}\ \lesssim\ \frac{1}{p^{2}}. \tag{D.6}
\end{equation}
Proof.

Due to the orthogonal invariance of $\mathbf{u}_{1}$, we may work under the assumption that $M$ is diagonal, i.e.\ $M=\textup{diag}(\lambda_{1}(M),\dots,\lambda_{p}(M))$. Therefore, the quantity $\mathbb{E}\big|\mathbf{u}_{1}^{\top}M\mathbf{u}_{1}-\frac{1}{p}\textup{tr}(M)\big|^{4}$ is the same as

\begin{equation}
\sum_{j_{1},j_{2},j_{3},j_{4}}\lambda_{j_{1}}(M)\lambda_{j_{2}}(M)\lambda_{j_{3}}(M)\lambda_{j_{4}}(M)\,\mathbb{E}\Big(\prod_{l=1}^{4}\big(\mathbf{u}_{1j_{l}}^{2}-\tfrac{1}{p}\big)\Big). \tag{D.7}
\end{equation}

Depending on the number of distinct indices among $(j_{1},j_{2},j_{3},j_{4})$, the following bounds can be obtained via direct calculation:

\begin{equation}
\bigg|\mathbb{E}\Big(\prod_{l=1}^{4}\big(\mathbf{u}_{1j_{l}}^{2}-\tfrac{1}{p}\big)\Big)\bigg|\ \lesssim\ \begin{cases}\frac{1}{p^{6}} & \text{(4 distinct indices)}\\[4pt]\frac{1}{p^{5}} & \text{(3 distinct indices)}\\[4pt]\frac{1}{p^{4}} & \text{(1 or 2 distinct indices).}\end{cases} \tag{D.8}
\end{equation}

Since the eigenvalues of $M$ are non-negative with $\lambda_{1}(M)\lesssim 1$, and since the number of terms in (D.7) involving $k$ distinct indices is $\mathcal{O}(p^{k})$, combining the cases in (D.8) yields the stated result (D.6). ∎
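The $1/p^{2}$ rate in (D.6) can also be illustrated numerically (this is only an informal sanity check, not part of the proof); the eigenvalue choice, sample size, and the helper `fourth_moment` below are hypothetical choices for this sketch.

```python
import numpy as np

def fourth_moment(p, n_mc=50_000, seed=0):
    """Monte Carlo estimate of E|u' M u - tr(M)/p|^4 for u uniform on the
    unit sphere and M diagonal with eigenvalues in [0, 1], so lambda_1(M) <= 1."""
    rng = np.random.default_rng(seed)
    lam = np.linspace(0.0, 1.0, p)                    # eigenvalues of M
    g = rng.standard_normal((n_mc, p))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on the unit sphere
    quad = u**2 @ lam - lam.mean()                    # u' M u - tr(M)/p
    return float(np.mean(quad**4))

# The rescaled moment p^2 * E|u' M u - tr(M)/p|^4 should stay bounded as p grows.
for p in (25, 50, 100):
    print(p, p**2 * fourth_moment(p))
```

The printed values should remain of comparable magnitude across the three dimensions, consistent with the $\lesssim 1/p^{2}$ bound.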

Appendix E Discussion of examples in Section 3.1

This section provides detailed information related to the examples of the random variable $\xi_{1}^{2}$ stated in Section 3.1. We give explicit parameterizations, and we check that the distributions satisfy the conditions in Assumption 1. The only three examples we do not cover individually are the Chi-Squared, Poisson, and Negative-Binomial distributions, because they can be decomposed into sums of independent random variables, and consequently, they are covered by the following lemma.

Lemma E.1.

Suppose $\xi_{1}^{2}=\sum_{j=1}^{p}z_{1j}^{2}$ for some independent random variables $z_{11},\dots,z_{1p}$ satisfying

\begin{equation}
\frac{1}{p}\sum_{j=1}^{p}\mathbb{E}(z_{1j}^{2})=1,\qquad \frac{1}{p}\sum_{j=1}^{p}\operatorname{var}(z_{1j}^{2})=\tau+o(1),\qquad \max_{1\leq j\leq p}\mathbb{E}|z_{1j}|^{8+2\varepsilon}\,\lesssim\,p^{1+\frac{\varepsilon}{4}} \tag{E.1}
\end{equation}

as $n\to\infty$, for some fixed constants $\tau\geq 0$ and $\varepsilon>0$ not depending on $n$. Then, $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

It is only necessary to show $\mathbb{E}\big|(\xi_{1}^{2}-\mathbb{E}(\xi_{1}^{2}))/\sqrt{p}\big|^{4+\varepsilon}\lesssim 1$. Using Rosenthal's inequality [22] to bound the $L^{4+\varepsilon}$ norm of a sum of independent centered random variables, we have

\begin{equation}
\begin{split}
\Big(\mathbb{E}\big|\xi_{1}^{2}-\mathbb{E}(\xi_{1}^{2})\big|^{4+\varepsilon}\Big)^{\frac{1}{4+\varepsilon}} &= \Big(\mathbb{E}\big|\textstyle\sum_{j=1}^{p}\big(z_{1j}^{2}-\mathbb{E}(z_{1j}^{2})\big)\big|^{4+\varepsilon}\Big)^{\frac{1}{4+\varepsilon}}\\
&\lesssim \sqrt{\textstyle\sum_{j=1}^{p}\operatorname{var}(z_{1j}^{2})}+p^{\frac{1}{4+\varepsilon}}\max_{1\leq j\leq p}\big(\mathbb{E}|z_{1j}^{2}-\mathbb{E}(z_{1j}^{2})|^{4+\varepsilon}\big)^{\frac{1}{4+\varepsilon}}\\
&\lesssim \sqrt{p}+p^{1/4}\max_{1\leq j\leq p}\big(\mathbb{E}|z_{1j}|^{8+2\varepsilon}\big)^{\frac{1}{4+\varepsilon}}\\
&\lesssim \sqrt{p},
\end{split} \tag{E.2}
\end{equation}

which completes the proof. ∎
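As a concrete instance of Lemma E.1 (illustrative only), the Chi-Squared case $\xi_{1}^{2}\sim\chi_{p}^{2}$ corresponds to $z_{1j}\sim N(0,1)$, for which the conditions in (E.1) hold with $\tau=2$ and, say, $\varepsilon=2$, since all of the Gaussian moments involved are constants not depending on $p$. The helper name below is hypothetical.

```python
import math

def gaussian_even_abs_moment(k):
    # E|Z|^k for Z ~ N(0, 1) and even k equals the double factorial (k - 1)!!
    assert k % 2 == 0
    return math.prod(range(k - 1, 0, -2))

# Chi-Squared example: xi_1^2 = sum_j z_{1j}^2 with z_{1j} ~ N(0, 1).
mean_z2 = gaussian_even_abs_moment(2)              # E(z^2) = 1
var_z2 = gaussian_even_abs_moment(4) - mean_z2**2  # var(z^2) = 2, so tau = 2
m12 = gaussian_even_abs_moment(12)                 # E|z|^{8 + 2*2} = 10395
print(mean_z2, var_z2, m12)  # constants, so the bound by p^{1 + eps/4} is immediate
```

Because the moments are fixed constants, the third condition in (E.1) holds for every $p\geq 1$.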

Beta distribution. The Beta$(a,b)$ distribution with parameters $a,b>0$ has a density function that is proportional to $x^{a-1}(1-x)^{b-1}$ for $x\in(0,1)$.

Lemma E.2.

If $\beta>0$ is fixed with respect to $n$ and $\xi_{1}^{2}\sim(p+2\beta)\,\text{Beta}(p/2,\beta)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

The mean and variance are given by $\mathbb{E}(\xi_{1}^{2})=p$ and $\operatorname{var}(\xi_{1}^{2})=\frac{4\beta p}{p+2\beta+2}$, and in particular we have $\operatorname{var}(\xi_{1}^{2})/p\to 0$ as $p\to\infty$. Based on Equation (25.14) in [21], the higher moments of $\xi_{1}^{2}$ are

\begin{equation*}
\mathbb{E}(\xi_{1}^{2k})=\prod_{j=0}^{k-1}\Big(p+\frac{4j\beta}{p+2\beta+2j}\Big).
\end{equation*}

Using the general relationship between central moments and ordinary moments,

\begin{equation}
\mathbb{E}\Big|\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big|^{6}\ =\ \frac{1}{p^{3}}\sum_{j=0}^{6}\binom{6}{j}(-1)^{6-j}\,\mathbb{E}(\xi_{1}^{2})^{6-j}\,\mathbb{E}(\xi_{1}^{2j}), \tag{E.3}
\end{equation}

it can be checked that $\mathbb{E}|\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{6}$ is a rational function of $\beta$ that converges pointwise to $0$ as $p\to\infty$. This implies the $(4+\varepsilon)$-moment condition in (3.2). ∎
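The mean and variance stated at the start of the proof can be double-checked exactly with rational arithmetic, using the standard moments of the Beta distribution. This is only a sanity check of the parameterization $\xi_{1}^{2}\sim(p+2\beta)\,\text{Beta}(p/2,\beta)$, and the function name is hypothetical.

```python
from fractions import Fraction as F

def scaled_beta_mean_var(p, beta):
    # xi_1^2 = c * U with U ~ Beta(a, b), where a = p/2, b = beta, c = p + 2*beta.
    # Uses E(U) = a/(a+b) and var(U) = a*b / ((a+b)^2 (a+b+1)).
    a, b, c = F(p, 2), F(beta), F(p + 2 * beta)
    mean = c * a / (a + b)                             # simplifies to p
    var = c**2 * a * b / ((a + b) ** 2 * (a + b + 1))  # simplifies to 4*beta*p/(p+2*beta+2)
    return mean, var

p, beta = 40, 3
mean, var = scaled_beta_mean_var(p, beta)
print(mean, var)  # p and 4*beta*p/(p + 2*beta + 2)
```

For instance, with $p=40$ and $\beta=3$ the variance is $480/48=10$, matching $\frac{4\beta p}{p+2\beta+2}$.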

Beta-Prime distribution. A random variable $W$ is said to follow a Beta-Prime$(a,b)$ distribution with parameters $a,b>0$ if it can be expressed as $W=\frac{U}{1-U}$ with $U\sim\text{Beta}(a,b)$.

Lemma E.3.

If $\tau>0$ and $\xi_{1}^{2}\sim\text{Beta-Prime}\left(\frac{p(1+p+\tau)}{\tau},\frac{1+p+2\tau}{\tau}\right)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

For any positive integer $k<(1+p+2\tau)/\tau$, the random variable $\xi_{1}^{2}$ satisfies the following moment formula [42, §5],

\begin{equation}
\mathbb{E}(\xi_{1}^{2k})=\prod_{j=1}^{k}\frac{p(1+p+\tau)+(j-1)\tau}{1+p+(2-j)\tau}. \tag{E.4}
\end{equation}

Since $2<(1+p+2\tau)/\tau$, this gives $\mathbb{E}(\xi_{1}^{2})=p$ and $\mathbb{E}(\xi_{1}^{4})=p(p+\tau)$, which yield $\operatorname{var}\big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\big)=\tau$. Also, using the formula (E.3), it can be checked that $\mathbb{E}|\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{6}$, viewed as a function of $\tau$, converges pointwise to a polynomial function of $\tau$ as $p\to\infty$. This implies the $(4+\varepsilon)$-moment condition in (3.2). ∎
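The values $\mathbb{E}(\xi_{1}^{2})=p$ and $\mathbb{E}(\xi_{1}^{4})=p(p+\tau)$ can also be recovered from the standard Beta-Prime moment formula $\mathbb{E}(W^{k})=\prod_{j=1}^{k}\frac{a+j-1}{b-j}$, valid for $k<b$. The short check below uses exact rational arithmetic and is only illustrative; the helper names are hypothetical.

```python
import math
from fractions import Fraction as F

def beta_prime_moment(a, b, k):
    # E(W^k) for W ~ Beta-Prime(a, b), valid when k < b:
    # E(W^k) = prod_{j=1}^{k} (a + j - 1) / (b - j).
    return math.prod((a + j - 1) / (b - j) for j in range(1, k + 1))

p, tau = 30, 2
a = F(p * (1 + p + tau), tau)   # first Beta-Prime parameter from Lemma E.3
b = F(1 + p + 2 * tau, tau)     # second Beta-Prime parameter from Lemma E.3
m1 = beta_prime_moment(a, b, 1)
m2 = beta_prime_moment(a, b, 2)
print(m1, m2, (m2 - m1**2) / p)  # p, p*(p + tau), and tau
```

With $p=30$ and $\tau=2$, this reproduces $\mathbb{E}(\xi_{1}^{2})=30$, $\mathbb{E}(\xi_{1}^{4})=960$, and $\operatorname{var}\big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\big)=2$.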
 
Gamma distribution. For $\alpha,\beta>0$, we parameterize the Gamma$(\alpha,\beta)$ distribution so that its density function is proportional to $x^{\alpha-1}e^{-\beta x}$ for $x\in(0,\infty)$.

Lemma E.4.

If $\tau>0$ and $\xi_{1}^{2}\sim\textup{Gamma}(p/\tau,1/\tau)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

The conditions $\mathbb{E}(\xi_{1}^{2})=p$ and $\operatorname{var}\big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\big)=\tau$ follow directly from the stated parameterization. Also, Theorem 2.3 in [8] gives

\begin{equation}
\big(\mathbb{E}|\xi_{1}^{2}-p|^{6}\big)^{1/6}\leq C(\sqrt{\tau p}+\tau) \tag{E.5}
\end{equation}

for an absolute constant $C>0$, which implies the $(4+\varepsilon)$-moment condition in (3.2). ∎
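Since the parameterization uses a rate parameter, the identities $\mathbb{E}(\xi_{1}^{2})=\alpha/\beta=p$ and $\operatorname{var}(\xi_{1}^{2})=\alpha/\beta^{2}=p\tau$ can be confirmed with an exact one-line computation; this is only an illustrative check, and the helper name is hypothetical.

```python
from fractions import Fraction as F

def gamma_mean_var(alpha, beta):
    # Mean and variance of Gamma(alpha, beta) with rate parameter beta.
    return alpha / beta, alpha / beta**2

p, tau = 50, 3
mean, var = gamma_mean_var(F(p, tau), F(1, tau))  # Gamma(p/tau, 1/tau)
print(mean, var / p)  # p and tau
```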
 
Log-Normal distribution. A positive random variable $Y$ is said to follow a Log-Normal$(\mu,\sigma^{2})$ distribution if $\log(Y)\sim N(\mu,\sigma^{2})$.

Lemma E.5.

If $\tau>0$ and $\xi_{1}^{2}\sim\text{Log-Normal}\left(\log(p)-\frac{1}{2}\log\big(1+\frac{\tau}{p}\big),\,\log\big(1+\frac{\tau}{p}\big)\right)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

Equations (14.8a)-(14.8b) in [20] show that if $Y\sim\textup{Log-Normal}(\mu,\sigma^{2})$, then $\mathbb{E}(Y)=e^{\mu+\sigma^{2}/2}$ and $\operatorname{var}(Y)=e^{2\mu+\sigma^{2}}\big(e^{\sigma^{2}}-1\big)$. Consequently, the stated choice of $\xi_{1}^{2}$ satisfies $\mathbb{E}(\xi_{1}^{2})=p$ and $\operatorname{var}(\xi_{1}^{2})/p=\tau$. Equation (14.8c) in [20] shows that the central 6th moment of $Y$ is

\begin{equation*}
\mathbb{E}|Y-\mathbb{E}(Y)|^{6}=e^{6\mu+3\sigma^{2}}\sum_{j=0}^{6}(-1)^{j}\binom{6}{j}e^{\sigma^{2}(6-j)(5-j)/2},
\end{equation*}

and a direct calculation reveals that $\mathbb{E}|\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{6}$ is a polynomial function of $\tau$ whose coefficients are bounded as $p\to\infty$. This ensures that $\xi_{1}$ satisfies the $(4+\varepsilon)$-moment condition in (3.2). ∎
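The first two identities follow because $e^{\mu+\sigma^{2}/2}=e^{\log(p)}=p$ and $e^{2\mu+\sigma^{2}}(e^{\sigma^{2}}-1)=p^{2}\cdot\frac{\tau}{p}=p\tau$ for the stated choice of $(\mu,\sigma^{2})$. A floating-point check (illustrative only, with a hypothetical helper name) is:

```python
import math

def lognormal_mean_var(mu, sigma2):
    # Mean and variance of Log-Normal(mu, sigma^2); see (14.8a)-(14.8b) in [20].
    mean = math.exp(mu + sigma2 / 2)
    var = math.exp(2 * mu + sigma2) * (math.exp(sigma2) - 1)
    return mean, var

p, tau = 200.0, 4.0
sigma2 = math.log1p(tau / p)          # sigma^2 = log(1 + tau/p)
mu = math.log(p) - sigma2 / 2         # mu = log(p) - (1/2) log(1 + tau/p)
mean, var = lognormal_mean_var(mu, sigma2)
print(mean, var / p)  # approximately p and tau
```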

Acknowledgements

The authors thank the reviewers and Associate Editor for their helpful feedback, which significantly improved the paper.

References

  • Akemann et al. [2011] Akemann, G., Baik, J., and Di Francesco, P. The Oxford Handbook of Random Matrix Theory. Oxford, 2011.
  • Anderson [2003] Anderson, T. W. Introduction to Multivariate Statistical Analysis. Wiley, 2003.
  • Bai and Silverstein [2004] Bai, Z. and Silverstein, J. W. CLT for linear spectral statistics of large-dimensional sample covariance matrices. Annals of Probability, 32(1A):553–605, 2004.
  • Bai and Saranadasa [1996] Bai, Z. and Saranadasa, H. Effect of high dimension: by an example of a two sample problem. Statistica Sinica, pages 311–329, 1996.
  • Bai and Silverstein [2010] Bai, Z. and Silverstein, J. W. Spectral Analysis of Large Dimensional Random Matrices. Springer, 2010.
  • Bai and Zhou [2008] Bai, Z. and Zhou, W. Large sample covariance matrices without independence structures in columns. Statistica Sinica, pages 425–442, 2008.
  • Bai et al. [2010] Bai, Z., Chen, J., and Yao, J. On estimation of the population spectral distribution from a high-dimensional sample covariance matrix. Australian & New Zealand Journal of Statistics, 52(4):423–437, 2010.
  • Boucheron et al. [2013] Boucheron, S., Lugosi, G., and Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford, 2013.
  • Couillet and Debbah [2011] Couillet, R. and Debbah, M. Random Matrix Methods for Wireless Communications. Cambridge, 2011.
  • Edelman and Wang [2013] Edelman, A. and Wang, Y. Random matrix theory and its innovative applications. In Advances in Applied Mathematics, Modeling, and Computational Science, pages 91–116. Springer, 2013.
  • El Karoui [2009] El Karoui, N. Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond. Annals of Applied Probability, 19(6):2362–2405, 2009.
  • El Karoui [2008] El Karoui, N. Spectrum estimation for large dimensional covariance matrices using random matrix theory. Annals of Statistics, 36(6):2757–2790, 2008.
  • El Karoui and Purdom [2019] El Karoui, N. and Purdom, E. The non-parametric bootstrap and spectral analysis in moderate and high-dimension. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2115–2124. PMLR, 2019.
  • Fang et al. [1990] Fang, K.-T., Kotz, S., and Ng, K. W. Symmetric Multivariate and Related Distributions. Chapman and Hall, 1990.
  • Fisher et al. [2010] Fisher, T. J., Sun, X., and Gallagher, C. M. A new test for sphericity of the covariance matrix for high dimensional data. Journal of Multivariate Analysis, 101(10):2554–2570, 2010.
  • Gupta et al. [2013] Gupta, A. K., Varga, T., and Bodnar, T. Elliptically Contoured Models in Statistics and Portfolio Theory. Springer, 2013.
  • Härdle and Simar [2019] Härdle, W. K. and Simar, L. Applied Multivariate Statistical Analysis. Springer, 2019.
  • Hu et al. [2019a] Hu, J., Li, W., Liu, Z., and Zhou, W. High-dimensional covariance matrices in elliptical distributions with application to spherical test. Annals of Statistics, 47(1):527–555, 2019a.
  • Hu et al. [2019b] Hu, J., Li, W., and Zhou, W. Central limit theorem for mutual information of large MIMO systems with elliptically correlated channels. IEEE Transactions on Information Theory, 65(11):7168–7180, 2019b.
  • Johnson et al. [1994] Johnson, N. L., Kotz, S., and Balakrishnan, N. Continuous Univariate Distributions, Volume 1, (2nd ed.). Wiley, 1994.
  • Johnson et al. [1995] Johnson, N. L., Kotz, S., and Balakrishnan, N. Continuous Univariate Distributions, Volume 2, (2nd ed.). Wiley, 1995.
  • Johnson et al. [1985] Johnson, W. B., Schechtman, G., and Zinn, J. Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Annals of Probability, pages 234–253, 1985.
  • Jonsson [1982] Jonsson, D. Some limit theorems for the eigenvalues of a sample covariance matrix. Journal of Multivariate Analysis, 12(1):1–38, 1982.
  • Jun et al. [2022] Jun, W., Jiahui, X., Long, Y., and Wang, Z. Tracy-Widom limit for the largest eigenvalue of high-dimensional covariance matrices in elliptical distributions. Bernoulli, 28(4):2941–2967, 2022.
  • Kong and Valiant [2017] Kong, W. and Valiant, G. Spectrum estimation from samples. Annals of Statistics, 45(5):2218–2247, 2017.
  • Ledoit and Wolf [2015] Ledoit, O. and Wolf, M. Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. Journal of Multivariate Analysis, 139:360–384, 2015.
  • Ledoit and Wolf [2017] Ledoit, O. and Wolf, M. Numerical implementation of the QuEST function. Computational Statistics & Data Analysis, 115:199–223, 2017.
  • Li and Yao [2018] Li, W. and Yao, J. On structure testing for component covariance matrices of a high dimensional mixture. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):293–318, 2018.
  • Li et al. [2022] Li, W., Wang, Q., Yao, J., and Zhou, W. On eigenvalues of a high-dimensional spatial-sign covariance matrix. Bernoulli, 28(1):606–637, 2022.
  • Lopes [2016] Lopes, M. E. Unknown sparsity in compressed sensing: Denoising and inference. IEEE Transactions on Information Theory, 62(9):5145–5166, 2016.
  • Lopes [2022a] Lopes, M. E. Central limit theorem and bootstrap approximation in high dimensions: Near 1/n1/\sqrt{n} rates via implicit smoothing. The Annals of Statistics, 50(5):2492–2513, 2022a.
  • Lopes [2022b] Lopes, M. E. Improved rates of bootstrap approximation for the operator norm: A coordinate-free approach. arXiv:2208.03050, 2022b.
  • Lopes and Yao [2022] Lopes, M. E. and Yao, J. A sharp lower-tail bound for Gaussian maxima with application to bootstrap methods in high dimensions. Electronic Journal of Statistics, 16(1):58–83, 2022.
  • Lopes et al. [2019] Lopes, M. E., Blandino, A., and Aue, A. Bootstrapping spectral statistics in high dimensions. Biometrika, 106(4):781–801, 2019.
  • Mammen [1993] Mammen, E. Bootstrap and wild bootstrap for high dimensional linear models. Annals of Statistics, 21(1):255–285, 1993.
  • Marcus et al. [2022] Marcus, A. W., Spielman, D. A., and Srivastava, N. Interlacing families III: Sharper restricted invertibility estimates. Israel Journal of Mathematics, 247(2):519–546, 2022.
  • McNeil et al. [2011] McNeil, A., Frey, R., and Embrechts, P. Quantitative Risk Management. Princeton, 2011.
  • Mestre [2008] Mestre, X. Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates. IEEE Transactions on Information Theory, 54(11):5113–5129, 2008.
  • Paindaveine and Verdebout [2016] Paindaveine, D. and Verdebout, T. On high-dimensional sign tests. Bernoulli, 22(3):1745 – 1769, 2016.
  • Patterson et al. [2006] Patterson, N., Price, A. L., and Reich, D. Population structure and eigenanalysis. PLoS genetics, 2(12):e190, 2006.
  • Potters and Bouchaud [2020] Potters, M. and Bouchaud, J.-P. A First Course in Random Matrix Theory: For Physicists, Engineers and Data Scientists. Cambridge, 2020.
  • Siegrist [2017] Siegrist, K. Probability, Mathematical Statistics, and Stochastic Processes. LibreTexts, 2017.
  • Srivastava [2005] Srivastava, M. S. Some tests concerning the covariance matrix in high dimensional data. Journal of the Japan Statistical Society, 35(2):251–272, 2005.
  • Tang and Nehorai [2012] Tang, G. and Nehorai, A. The stability of low-rank matrix reconstruction: a constrained singular value view. IEEE Transactions on Information Theory, 58(9):6079–6092, 2012.
  • Tian et al. [2015] Tian, X., Lu, Y., and Li, W. A robust test for sphericity of high-dimensional covariance matrices. Journal of Multivariate Analysis, 141:217–227, 2015.
  • Vershynin [2018] Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge, 2018.
  • Yang et al. [2021] Yang, X., Zheng, X., and Chen, J. Testing high-dimensional covariance matrices under the elliptical distribution and beyond. Journal of Econometrics, 221(2):409–423, 2021.
  • Yao et al. [2015] Yao, J., Zheng, S., and Bai, Z. Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge, 2015.
  • Yao and Lopes [2023] Yao, J. and Lopes, M. E. Rates of bootstrap approximation for eigenvalues in high-dimensional PCA. to appear: Statistica Sinica (arXiv:2104.07328), 2023.
  • Yin et al. [1988] Yin, Y. Q., Bai, Z., and Krishnaiah, P. R. On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probability Theory and Related Fields, 78(4):509–521, 1988.
  • Zhang et al. [2022] Zhang, Y., Hu, J., and Li, W. CLT for linear spectral statistics of high-dimensional sample covariance matrices in elliptical distributions. Journal of Multivariate Analysis, page 105007, 2022.
  • Zhao and Candès [2022] Zhao, Q. and Candès, E. J. An adaptively resized parametric bootstrap for inference in high-dimensional generalized linear models. arXiv:2208.08944, 2022.