
A Bootstrap Method for Spectral Statistics in High-Dimensional Elliptical Models

Siyao Wang and Miles E. Lopes
Department of Statistics, University of California, Davis
(2020)
Abstract

Although there is an extensive literature on the eigenvalues of high-dimensional sample covariance matrices, much of it is specialized to independent components (IC) models—in which observations are represented as linear transformations of random vectors with independent entries. By contrast, less is known in the context of elliptical models, which violate the independence structure of IC models and exhibit quite different statistical phenomena. In particular, very little is known about the scope of bootstrap methods for doing inference with spectral statistics in high-dimensional elliptical models. To fill this gap, we show how a bootstrap approach developed previously for IC models can be extended to handle the different properties of elliptical models. Within this setting, our main theoretical result guarantees that the proposed method consistently approximates the distributions of linear spectral statistics, which play a fundamental role in multivariate analysis. We also provide empirical results showing that the proposed method performs well for a variety of nonlinear spectral statistics.

Keywords: high-dimensional statistics, bootstrap, covariance matrices, elliptical distributions

MSC classifications: 62H99, 62F40, 62H25

Supported in part by NSF Grant DMS-1915786.

1 Introduction

The analysis of spectral statistics of sample covariance matrices is a major research area within multivariate analysis, random matrix theory, high-dimensional statistics, and related fields [5, 48, 46, 41]. If $\mathbf{x}_{1},\dots,\mathbf{x}_{n}$ are centered i.i.d. observations in $\mathbb{R}^{p}$ with a sample covariance matrix denoted by

$$\widehat{\Sigma}_{n}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_{i}\mathbf{x}_{i}^{\top}, \qquad (1.1)$$

then we say that a random variable $T_{n}$ is a spectral statistic if it has the form $T_{n}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}))$, where $\lambda_{1}(\widehat{\Sigma}_{n})\geq\cdots\geq\lambda_{p}(\widehat{\Sigma}_{n})$ are the sorted eigenvalues of $\widehat{\Sigma}_{n}$, and $\psi$ is a generic real-valued function. Over the past two decades, there has been a tremendous growth of interest in spectral statistics in high-dimensional settings where $p=p(n)$ grows so that $p/n$ converges to a positive constant as $n\to\infty$. Likewise, statistical tools for approximating the distributions of spectral statistics have been applied to high-dimensional data in a broad range of domains, such as electrical engineering, finance, and biology [40, 1, 9, 10].
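As a concrete illustration, the definitions above can be reproduced in a few lines of NumPy; the Gaussian data, the sample sizes, and the two choices of $\psi$ below are purely illustrative and are not part of the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 100

# Centered i.i.d. observations (Gaussian only for illustration).
X = rng.standard_normal((n, p))

# Sample covariance matrix as in (1.1): 1/n normalization, no centering term.
Sigma_hat = X.T @ X / n

# Sorted eigenvalues lambda_1 >= ... >= lambda_p.
lam = np.sort(np.linalg.eigvalsh(Sigma_hat))[::-1]

# Two spectral statistics T_n = psi(lambda_1, ..., lambda_p):
largest = lam[0]              # psi = max, a nonlinear spectral statistic
avg = lam.mean()              # psi = average, equal to tr(Sigma_hat)/p
```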

Although the research on spectral statistics has dealt with many different statistical models, two of the most influential ones have been elliptical and independent components (IC) models. To be specific, we say that the random vector $\mathbf{x}_{1}$ follows an elliptical model if it can be represented in the form

$$\mathbf{x}_{1}=\xi_{1}\Sigma_{n}^{1/2}\mathbf{u}_{1}, \qquad (1.2)$$

where $\Sigma_{n}=\mathbb{E}(\mathbf{x}_{1}\mathbf{x}_{1}^{\top})$, and $(\mathbf{u}_{1},\xi_{1})\in\mathbb{R}^{p}\times[0,\infty)$ is a random vector such that $\mathbf{u}_{1}$ and $\xi_{1}$ are independent, with $\mathbf{u}_{1}$ being uniformly distributed on the unit sphere of $\mathbb{R}^{p}$. Alternatively, we say that $\mathbf{x}_{1}$ follows an IC model if

$$\mathbf{x}_{1}=\Sigma_{n}^{1/2}\mathbf{z}_{1}, \qquad (1.3)$$

where $\mathbf{z}_{1}\in\mathbb{R}^{p}$ is a random vector whose entries are centered and independent.
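The two representations can be simulated directly; in the sketch below, the diagonal $\Sigma_{n}^{1/2}$ and the choice $\xi_{1}^{2}\sim\chi^{2}_{p}$ are illustrative assumptions (the chi-squared choice satisfies $\mathbb{E}(\xi_{1}^{2})=p$ and in fact recovers the Gaussian case of the elliptical model).

```python
import numpy as np

rng = np.random.default_rng(1)
p = 50
# An illustrative Sigma_n^{1/2}; any positive semidefinite square root works.
Sigma_half = np.diag(np.linspace(0.5, 2.0, p)) ** 0.5

# Elliptical draw (1.2): x = xi * Sigma^{1/2} u, with u uniform on the unit
# sphere and xi >= 0 independent of u.
g = rng.standard_normal(p)
u = g / np.linalg.norm(g)                 # uniform on the unit sphere
xi = np.sqrt(rng.chisquare(p))            # xi^2 ~ chi-squared(p)
x_elliptical = xi * (Sigma_half @ u)

# IC draw (1.3): x = Sigma^{1/2} z with independent centered entries in z.
z = rng.standard_normal(p)
x_ic = Sigma_half @ z
```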

At first sight, these two models may seem to be very similar, but this outward appearance conceals some crucial differences in modelling capabilities. In particular, it should be stressed that the entries of the random vector $\xi_{1}\mathbf{u}_{1}$ in an elliptical model are correlated, which contrasts with the independence of the entries of $\mathbf{z}_{1}$ in an IC model. Also, since the scalar random variable $\xi_{1}$ is shared across all entries of $\mathbf{x}_{1}$ in an elliptical model, this enhances the ability to capture scenarios where the magnitudes of all entries of $\mathbf{x}_{1}$ move in the same direction simultaneously. This is a key effect in some application domains, such as in finance, where the entries of $\mathbf{x}_{1}$ correspond to stock prices that can fall in tandem during a sharp market downturn. Additional background on related merits of elliptical models can be found in [11, 37, 16]. More generally, the multivariate analysis literature has placed a longstanding emphasis on the benefits of elliptical models in fitting various types of non-Gaussian data [14, 2, 17].

However, looking beyond the points just mentioned, IC models have played a more dominant role than elliptical models in the literature on spectral statistics in high dimensions. Consequently, the established body of high-dimensional limit theory is much less complete for elliptical models. Indeed, the challenge of extending results from IC models to elliptical ones has become a prominent topic of ongoing research, which has led to important advances in the limit theory for spectral statistics [e.g. 28, 19, 18, 47, 24, 29, 51]. As a matter of historical context, it is also worth bearing in mind that for some spectral statistics, it took many years for such extensions to be established.

From the standpoint of statistical methodology, a corresponding set of gaps exists between elliptical and IC models. These gaps are especially apparent in the current state of bootstrap methods for high-dimensional data. In particular, it is known from [34] that a form of parametric bootstrapping can successfully approximate the distributions of spectral statistics in IC models, whereas very little is known for elliptical models. Accordingly, our primary goal in the current paper is to resolve this issue by developing a parametric bootstrap method that is both theoretically and empirically effective in high-dimensional elliptical models.

With regard to theory, we will focus on the class of linear spectral statistics, which have the form

$$T_{n}(f)=\frac{1}{p}\sum_{j=1}^{p}f(\lambda_{j}(\widehat{\Sigma}_{n})), \qquad (1.4)$$

for a suitable real-valued function $f$. Beginning with the pathbreaking works [23, 3] that established the earliest versions of the central limit theorem for linear spectral statistics in high dimensions, these statistics have been a perennial focus of research. Their importance is underscored by the fact that they appear frequently throughout multivariate analysis, with some of the most well-known examples being $\frac{1}{p}\textup{tr}(\widehat{\Sigma}_{n})$, $\frac{1}{p}\textup{tr}(\widehat{\Sigma}_{n}^{2})$, and $\frac{1}{p}\log\det(\widehat{\Sigma}_{n})$, among various other classical statistics for testing hypotheses [48].
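The three classical examples just mentioned can be computed directly from the sorted eigenvalues; the simulated data below are only for illustration, and $p<n$ is used so that $\widehat{\Sigma}_{n}$ is invertible almost surely.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 100                     # p < n, so Sigma_hat is invertible a.s.
X = rng.standard_normal((n, p))    # illustrative data
Sigma_hat = X.T @ X / n
lam = np.linalg.eigvalsh(Sigma_hat)

# T_n(f) = (1/p) * sum_j f(lambda_j) for three classical choices of f.
T_trace = lam.mean()               # f(t) = t,     i.e. tr(Sigma_hat)/p
T_square = (lam ** 2).mean()       # f(t) = t^2,   i.e. tr(Sigma_hat^2)/p
T_logdet = np.log(lam).mean()      # f(t) = log t, i.e. log det(Sigma_hat)/p
```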

Motivated by these considerations, our main theoretical result (Theorem 2) shows that the proposed bootstrap method consistently approximates the distributions of linear spectral statistics when the underlying data are elliptical and $p/n$ converges to a positive constant as $n\to\infty$. The proof substantially leverages recent progress on the central limit theorem for linear spectral statistics in elliptical models due to [18]. Also, an intermediate step in the proof (Lemma A.4) shows that the well-known eigenvalue estimation method QuEST is consistent in elliptical models—which may be of independent interest, since it seems that QuEST’s consistency has not previously been reported outside of IC models [26]. Moreover, Section 3.4 develops an application of Theorem 2 where we establish the asymptotic validity of inference procedures related to the stable rank parameter $r_{n}=\textup{tr}(\Sigma_{n})^{2}/\textup{tr}(\Sigma_{n}^{2})$ of the population covariance matrix $\Sigma_{n}$.

To address the empirical performance of the proposed method, Section 4 presents numerical results for a wide variety of model settings and statistics. Most notably, these results show encouraging performance for both linear and nonlinear spectral statistics. (We regard any function of $(\lambda_{1}(\widehat{\Sigma}_{n}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}))$ that is not of the form (1.4) as a nonlinear spectral statistic.) To put this point into perspective, it is important to highlight the fact that asymptotic formulas for the distributions of nonlinear spectral statistics are typically developed on a case-by-case basis, and are relatively scarce in comparison to those for linear spectral statistics. Even when such formulas are available, they may be very different from those for linear spectral statistics—as they may require the estimation of different model parameters, or the implementation of different algorithms for numerical evaluation. On the other hand, the bootstrap approach may be considered more user-friendly, since it can be applied to different types of spectral statistics in an automatic and unified way. Similarly, the bootstrap approach can provide the user with the freedom to easily explore statistics that depend on several linear spectral statistics in a complicated manner (which would otherwise require intricate delta-method calculations), or statistics for which formulas may not be available at all.

Notation and terminology. For a random object $Y$, the expression $\mathcal{L}(Y)$ denotes its distribution, and $\mathcal{L}(Y|X)$ denotes its conditional distribution given the observations $\mathbf{x}_{1},\dots,\mathbf{x}_{n}$. Similarly, we use $|X$ when referring to probabilities, expectations, and variances that are conditional on the observations. The symbols $\xrightarrow{\mathbb{P}}$ and $\Rightarrow$ respectively denote convergence in probability and convergence in distribution. For a set $A\subset\mathbb{R}^{k}$ and a number $\delta>0$, the outer $\delta$-neighborhood of $A$ is defined as $A^{\delta}=\{\mathbf{a}^{\prime}\in\mathbb{R}^{k}\,|\,\inf_{\mathbf{a}\in A}\|\mathbf{a}^{\prime}-\mathbf{a}\|_{2}\leq\delta\}$, where $\|\cdot\|_{2}$ is the Euclidean norm. If $\mathbf{v}$ and $\mathbf{w}$ are random vectors in $\mathbb{R}^{k}$, then the Lévy-Prohorov metric $d_{\textup{LP}}(\mathcal{L}(\mathbf{v}),\mathcal{L}(\mathbf{w}))$ between their distributions is defined as the infimum over all numbers $\delta>0$ such that the inequality $\mathbb{P}(\mathbf{v}\in A)\leq\mathbb{P}(\mathbf{w}\in A^{\delta})+\delta$ holds for all Borel sets $A\subset\mathbb{R}^{k}$. If $\nu,\nu_{1},\nu_{2},\dots$ is a sequence of random probability distributions on $\mathbb{R}^{k}$, then the expression $\nu_{n}\overset{\mathbb{P}}{\Rightarrow}\nu$ means that the sequence of scalar random variables $d_{\textup{LP}}(\nu_{n},\nu)$ converges to 0 in probability as $n\to\infty$. For two sequences of non-negative real numbers $\{a_{n}\}$ and $\{b_{n}\}$, we write $a_{n}\lesssim b_{n}$ if there is a constant $C>0$ not depending on $n$ such that $a_{n}\leq Cb_{n}$ holds for all large $n$. When both of the relations $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$ hold, we write $a_{n}\asymp b_{n}$. The relation $a_{n}=o(b_{n})$ means $a_{n}/b_{n}\to 0$ as $n\to\infty$, and the relation $a_{n}=\mathcal{O}(b_{n})$ is equivalent to $a_{n}\lesssim b_{n}$. The $k\times k$ identity matrix is denoted as $I_{k}$, and the indicator function for a condition $\cdots$ is denoted as $1\{\cdots\}$. Lastly, we use $\mathbb{C}^{+}$ to refer to the set of complex numbers with positive imaginary part.

2 Method

Conceptually, the proposed method is motivated by the fact that the standard nonparametric bootstrap, based on sampling with replacement, often performs poorly when it is applied naively to high-dimensional data, unless special low-dimensional structure is available. (For additional background, see the papers [13, 32, 49], as well as the numerical results presented here at the end of Section 4.2.) This general difficulty can be understood by noting that sampling with replacement implicitly relies on the empirical distribution of the data as a substitute for the true data-generating distribution. In other words, the nonparametric bootstrap attempts to approximate a $p$-dimensional distribution in a fully non-parametric way, which can be challenging for even moderately large values of $p$. For this reason, alternative bootstrap methods that sample from parametric distributions have been advocated to improve upon the nonparametric bootstrap in high dimensions [e.g. 35, 34, 52], and this is the viewpoint that we pursue here.

2.1 Bootstrap algorithm

At an algorithmic level, the proposed method is built on top of two estimators. The first is an estimator $\widehat{\varsigma}_{n}^{\,2}$ for the variance parameter $\varsigma_{n}^{2}=\operatorname{var}(\xi_{1}^{2})$. The second is an estimator $\tilde{\Sigma}_{n}$ for the diagonal matrix of population eigenvalues $\textup{diag}(\lambda_{1}(\Sigma_{n}),\dots,\lambda_{p}(\Sigma_{n}))$. Once these two estimators have been assembled, the method generates bootstrap data from an elliptical model parameterized in terms of $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$. More specifically, the $i$th sample of each bootstrap dataset $\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}$ is generated to be of the form

$$\mathbf{x}_{i}^{*}=\xi_{i}^{*}\tilde{\Sigma}^{1/2}_{n}\mathbf{u}_{i}^{*},$$

where $\xi_{i}^{*}$ is a non-negative random variable satisfying $\mathbb{E}((\xi_{i}^{*})^{2}|X)=p$ and $\operatorname{var}((\xi_{i}^{*})^{2}|X)=\widehat{\varsigma}_{n}^{\,2}$, and $\mathbf{u}_{i}^{*}\in\mathbb{R}^{p}$ is drawn uniformly from the unit sphere, independently of $\xi_{i}^{*}$. Then, a single bootstrap sample of a generic spectral statistic $T_{n}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}))$ is computed as $T_{n}^{*}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}^{*}),\dots,\lambda_{p}(\widehat{\Sigma}_{n}^{*}))$, where $\widehat{\Sigma}_{n}^{*}$ is the sample covariance matrix of the bootstrap data.

To emphasize the modular role that $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$ play in the bootstrap sampling process, we will provide the details for their construction later in Sections 2.2 and 2.3. With these points understood, the following algorithm shows that the method is very easy to implement.

Algorithm 1 (Bootstrap for spectral statistics).

Input: The number of bootstrap replicates $B$, as well as $\tilde{\Sigma}_{n}$ and $\widehat{\varsigma}_{n}^{\,2}$.

For $b=1,\dots,B$ do in parallel:

1. Generate independent random variables $g_{1}^{*},\dots,g_{n}^{*}$ from a Gamma distribution with mean $p$ and variance $\widehat{\varsigma}_{n}^{\,2}$, and then put $\xi^{*}_{i}=\sqrt{g^{*}_{i}}$ for $i=1,\dots,n$.

2. Generate independent random vectors $\mathbf{u}^{*}_{1},\ldots,\mathbf{u}^{*}_{n}\in\mathbb{R}^{p}$ from the uniform distribution on the unit sphere.

3. Compute $\mathbf{x}^{*}_{i}=\xi^{*}_{i}\tilde{\Sigma}_{n}^{1/2}\mathbf{u}^{*}_{i}$ for $i=1,\dots,n$, and form $\widehat{\Sigma}_{n}^{*}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_{i}^{*}(\mathbf{x}_{i}^{*})^{\top}$.

4. Compute the spectral statistic $T^{*}_{n,b}=\psi(\lambda_{1}(\widehat{\Sigma}_{n}^{*}),\ldots,\lambda_{p}(\widehat{\Sigma}_{n}^{*}))$.

End for.

Return: The empirical distribution of $T^{*}_{n,1},\ldots,T^{*}_{n,B}$.

Remarks. One basic but valuable feature of the algorithm is that it can be applied with equal ease to both linear and nonlinear spectral statistics. To comment on some more technical aspects of the algorithm, the Gamma distribution is used in step 1 because it offers a convenient way to generate non-negative random variables whose means and variances can be matched to any pair of positive numbers. (If the event $\widehat{\varsigma}_{n}^{\,2}=0$ happens to occur, then the Gamma distribution in step 1 is interpreted as the point mass at $p$, so that $\xi_{i}^{*}=\sqrt{p}$ for all $i=1,\dots,n$.) Nevertheless, the choice of the Gamma distribution for generating $g_{1}^{*},\dots,g_{n}^{*}$ is not required. Any other family of distributions on $[0,\infty)$ parameterized by means $\mu$ and variances $\sigma^{2}$, say $\{G_{\mu,\sigma^{2}}\}$, will be compatible with our bootstrap consistency result in Theorem 2 if it satisfies the following two conditions: First, the pair $(\mu,\sigma^{2})$ can be set to $(p,v)$ for any integer $p\geq 1$ and real number $v>0$. Second, there exists some fixed $\varepsilon>0$ such that the quantity $\int|\frac{t-p}{\sqrt{p}}|^{4+\varepsilon}\,dG_{p,\sigma^{2}}(t)$ remains bounded when $p$ diverges and $\sigma^{2}/p$ converges to a finite limit.
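The four steps of Algorithm 1 can be sketched in NumPy as follows. The statistic `psi`, the eigenvalue vector `lam_tilde`, and the input values `varsigma2_hat=80.0`, `n=80`, `B=20` in the usage line are placeholders for illustration, rather than outputs of the estimators in Sections 2.2 and 2.3; note that a Gamma distribution with shape $p^{2}/v$ and scale $v/p$ has mean $p$ and variance $v$, as step 1 requires.

```python
import numpy as np

def bootstrap_spectral(psi, lam_tilde, varsigma2_hat, n, B=100, seed=0):
    """Sketch of Algorithm 1.  `lam_tilde` holds the estimated population
    eigenvalues (the diagonal of Sigma_tilde) and `varsigma2_hat` the
    estimated variance of xi_1^2."""
    rng = np.random.default_rng(seed)
    p = len(lam_tilde)
    root = np.sqrt(lam_tilde)              # Sigma_tilde^{1/2} is diagonal
    out = np.empty(B)
    for b in range(B):
        # Step 1: Gamma draws with mean p and variance varsigma2_hat.
        if varsigma2_hat > 0:
            g = rng.gamma(p ** 2 / varsigma2_hat, varsigma2_hat / p, size=n)
        else:
            g = np.full(n, float(p))       # point mass at p if the estimate is 0
        xi = np.sqrt(g)
        # Step 2: independent uniform draws on the unit sphere.
        U = rng.standard_normal((n, p))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        # Step 3: bootstrap data and its sample covariance matrix.
        X_star = xi[:, None] * U * root[None, :]
        S_star = X_star.T @ X_star / n
        # Step 4: the spectral statistic on the bootstrap eigenvalues.
        out[b] = psi(np.sort(np.linalg.eigvalsh(S_star))[::-1])
    return out                              # empirical bootstrap distribution

samples = bootstrap_spectral(lambda lam: lam.mean(),
                             lam_tilde=np.linspace(0.5, 2.0, 40),
                             varsigma2_hat=80.0, n=80, B=20)
```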

2.2 Variance estimation

The estimation of the parameter $\varsigma_{n}^{2}=\operatorname{var}(\xi_{1}^{2})$ is complicated by the fact that the random variables $\xi_{1},\dots,\xi_{n}$ are not directly observable in an elliptical model. It is possible to overcome this challenge with the following estimating equation, which can be derived from an explicit formula for $\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2})$ that is given in Lemma D.1:

$$\varsigma_{n}^{2}\ =\ p(p+2)\frac{\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2})-2\textup{tr}(\Sigma_{n}^{2})}{\textup{tr}(\Sigma_{n})^{2}+2\textup{tr}(\Sigma_{n}^{2})}+2p. \qquad (2.1)$$

Based on this equation, our approach is to separately estimate each of the three moment parameters on the right hand side, denoted as

$$\alpha_{n}=\textup{tr}(\Sigma_{n}^{2}), \qquad \beta_{n}=\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2}), \qquad \gamma_{n}=\textup{tr}(\Sigma_{n})^{2}.$$

These parameters have the advantage that they can be estimated in a more direct manner, due to their simpler relations with the observations and the matrix $\Sigma_{n}$. Specifically, we use estimates defined according to

$$\widehat{\alpha}_{n}=\textup{tr}(\widehat{\Sigma}_{n}^{2})-\tfrac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2},\qquad \widehat{\beta}_{n}=\frac{1}{n-1}\sum_{i=1}^{n}\Big(\|\mathbf{x}_{i}\|_{2}^{2}-\tfrac{1}{n}\sum_{i^{\prime}=1}^{n}\|\mathbf{x}_{i^{\prime}}\|_{2}^{2}\Big)^{2},\qquad \widehat{\gamma}_{n}=\textup{tr}(\widehat{\Sigma}_{n})^{2}.$$

Substituting these estimates into (2.1) yields our proposed estimate for $\varsigma_{n}^{2}$:

$$\widehat{\varsigma}_{n}^{\,2}=\bigg(p(p+2)\frac{\widehat{\beta}_{n}-2\widehat{\alpha}_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}+2p\bigg)_{\!+}, \qquad (2.2)$$

where $x_{+}=\max\{x,0\}$ denotes the non-negative part of any real number $x$. The consistency of this estimate will be established in Theorem 1, which shows $\frac{1}{p}(\widehat{\varsigma}_{n}^{\,2}-\varsigma_{n}^{2})\to 0$ in probability as $n\to\infty$.
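The plug-in estimate (2.2) is straightforward to compute from the data matrix. In the sketch below, the simulated check uses $\Sigma_{n}=I_{p}$ and $\xi_{1}^{2}\sim\chi^{2}_{p}$, for which $\varsigma_{n}^{2}=2p$; this data-generating choice is an illustrative assumption, not part of the method.

```python
import numpy as np

def varsigma2_hat(X):
    """Estimate (2.2) of var(xi_1^2) from the n x p data matrix X (rows = observations)."""
    n, p = X.shape
    S = X.T @ X / n                                 # sample covariance (1.1)
    trS = np.trace(S)
    alpha = np.trace(S @ S) - trS ** 2 / n          # alpha_hat, estimates tr(Sigma^2)
    norms2 = np.sum(X ** 2, axis=1)
    beta = norms2.var(ddof=1)                       # beta_hat, estimates var(||x_1||^2)
    gamma = trS ** 2                                # gamma_hat, estimates tr(Sigma)^2
    raw = p * (p + 2) * (beta - 2 * alpha) / (gamma + 2 * alpha) + 2 * p
    return max(raw, 0.0)                            # non-negative part

# Illustrative check with Sigma_n = I_p and xi^2 ~ chi-squared(p).
rng = np.random.default_rng(3)
n, p = 500, 50
U = rng.standard_normal((n, p))
U /= np.linalg.norm(U, axis=1, keepdims=True)
xi = np.sqrt(rng.chisquare(p, size=n))
est = varsigma2_hat(xi[:, None] * U)
```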

2.3 Spectrum estimation

The problem of estimating the eigenvalues of a population covariance matrix has attracted long-term interest in the high-dimensional statistics literature, and many different estimation methods have been proposed [e.g. 12, 38, 7, 26, 25]. In order to estimate $\lambda_{1}(\Sigma_{n}),\dots,\lambda_{p}(\Sigma_{n})$ in our current setting, we modify the method of QuEST [26], which has become widely used in recent years. This choice has the benefit of making all aspects of our proposed method easy to implement, because QuEST is supported by a turnkey software package [27].

We denote the estimates produced by QuEST as $\widehat{\lambda}_{\textup{Q},1}\geq\dots\geq\widehat{\lambda}_{\textup{Q},p}$, and we will use modified versions of them defined by

$$\tilde{\lambda}_{j}=\min(\widehat{\lambda}_{\textup{Q},j},\widehat{b}_{n}), \qquad (2.3)$$

for $j=1,\dots,p$, where we let $\widehat{b}_{n}=\lambda_{1}(\widehat{\Sigma}_{n})+1$. The modification is done for theoretical reasons, to ensure that $\tilde{\lambda}_{1},\dots,\tilde{\lambda}_{p}$ are asymptotically bounded, which follows from Lemma A.5. In addition, we define the diagonal $p\times p$ matrix associated with these estimates as

$$\tilde{\Sigma}_{n}=\textup{diag}(\tilde{\lambda}_{1},\ldots,\tilde{\lambda}_{p}). \qquad (2.4)$$

Later, in Theorem 1, we will show that the estimates $\tilde{\lambda}_{1},\dots,\tilde{\lambda}_{p}$ are consistent, in the sense that their empirical distribution converges weakly in probability to the correct limit as $n\to\infty$.
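The truncation step (2.3) and the assembly of $\tilde{\Sigma}_{n}$ in (2.4) reduce to elementary array operations; in the sketch below, both the QuEST output and the sample eigenvalues are hypothetical placeholder values, since QuEST itself comes from the external software package of [27].

```python
import numpy as np

# Hypothetical QuEST output: decreasing estimates of the population eigenvalues.
# In practice these would come from the QuEST software package of [27].
lam_quest = np.array([3.5, 2.0, 1.5, 1.0, 0.5])

# Sorted sample eigenvalues of Sigma_hat (illustrative values only).
lam_sample = np.array([2.2, 1.8, 1.3, 0.9, 0.4])

b_hat = lam_sample[0] + 1.0                  # b_n = lambda_1(Sigma_hat) + 1
lam_tilde = np.minimum(lam_quest, b_hat)     # truncation step (2.3)
Sigma_tilde = np.diag(lam_tilde)             # diagonal matrix (2.4)
```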

3 Theoretical results

In this section, we present three theoretical guarantees for the proposed method. First, Theorem 1 establishes appropriate notions of consistency for each of the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$. Second, our main result in Theorem 2 shows that the bootstrap samples generated in Algorithm 1 consistently approximate the distributions of linear spectral statistics. Lastly, Theorem 3 demonstrates the asymptotic validity of bootstrap-based inference procedures involving nonlinear spectral statistics.

3.1 Setup

All of our theoretical analysis is framed in terms of a sequence of models indexed by $n$, so that all model parameters are allowed to vary with $n$, except when stated otherwise. The details of our model assumptions are given below in Assumptions 1 and 2.

Assumption 1 (Data generating model).

As $n\to\infty$, the dimension $p$ grows so that the ratio $c_{n}=p/n$ satisfies $c_{n}\to c$ for some positive constant $c$ different from 1. For each $i=1,\dots,n$, the observation $\mathbf{x}_{i}\in\mathbb{R}^{p}$ can be represented as

$$\mathbf{x}_{i}=\xi_{i}\Sigma_{n}^{1/2}\mathbf{u}_{i}, \qquad (3.1)$$

where $\Sigma_{n}\in\mathbb{R}^{p\times p}$ is a deterministic non-zero positive semidefinite matrix, and $(\mathbf{u}_{1},\xi_{1}),\dots,(\mathbf{u}_{n},\xi_{n})$ are i.i.d. random vectors in $\mathbb{R}^{p}\times[0,\infty)$ satisfying the following conditions: The vector $\mathbf{u}_{1}$ is drawn from the uniform distribution on the unit sphere of $\mathbb{R}^{p}$, and is independent of $\xi_{1}$. In addition, as $n\to\infty$, the random variable $\xi_{1}^{2}$ satisfies $\mathbb{E}(\xi_{1}^{2})=p$, as well as the conditions

$$\operatorname{var}\Big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big)=\tau+o(1) \qquad\text{and}\qquad \mathbb{E}\,\Big|\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big|^{4+\varepsilon}\,\lesssim\,1, \qquad (3.2)$$

for some fixed constants $\tau\geq 0$ and $\varepsilon>0$ that do not depend on $n$.

Remarks. With regard to the limiting value $c$ for the ratio $p/n$, the single case of $c=1$ is excluded so that we may employ certain facts about the QuEST method that were established in [26]. From a practical standpoint, our proposed method can still be used effectively when $p/n=1$, as shown in Section 4. To address the moment conditions on $\xi_{1}^{2}$, a similar set of conditions was used in [18] to establish a high-dimensional central limit theorem for linear spectral statistics. However, our condition involving a $4+\varepsilon$ moment for $(\xi_{1}^{2}-p)/\sqrt{p}$ replaces a corresponding $2+\varepsilon$ moment condition in that work. The extra bit of integrability is used here to show that the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$ have suitable asymptotic properties for ensuring bootstrap consistency.

Examples. To illustrate that the conditions in (3.2) cover a substantial range of situations, it is possible to provide quite a few explicit examples of distributions for $\xi_{1}^{2}$ that are conforming:

1. Chi-Squared distribution with $p$ degrees of freedom
2. Poisson$(p)$
3. $(1-\tau)\,$Negative-Binomial$(p,1-\tau)$, for any $\tau\in(0,1)$
4. Gamma$(p/\tau,1/\tau)$, for any $\tau>0$
5. Beta-Prime$\left(\frac{p(1+p+\tau)}{\tau},\frac{1+p+2\tau}{\tau}\right)$, for any $\tau>0$
6. Log-Normal$\left(\log(p)-\frac{1}{2}\log\big(1+\frac{\tau}{p}\big),\log\big(1+\frac{\tau}{p}\big)\right)$, for any $\tau>0$
7. $(p+2\beta)\,$Beta$(p/2,\beta)$, for any $\beta>0$

It is also possible to give a more abstract class of examples that subsumes some of the previous ones as special cases. In detail, the conditions in (3.2) will hold for any $\tau\geq 0$ if $\xi_{1}^{2}=\sum_{j=1}^{p}z_{1j}^{2}$ for some independent random variables $z_{11},\dots,z_{1p}$ satisfying

$$\frac{1}{p}\sum_{j=1}^{p}\mathbb{E}(z_{1j}^{2})=1,\qquad \frac{1}{p}\sum_{j=1}^{p}\operatorname{var}(z_{1j}^{2})\to\tau, \qquad\text{and}\qquad \max_{1\leq j\leq p}\mathbb{E}|z_{1j}|^{8+2\varepsilon}\,\lesssim\,p^{1+\frac{\varepsilon}{4}}. \qquad (3.3)$$

Further details for checking the validity of the previous examples, as well as explicit parameterizations, are provided in Appendix E.
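Example 1 admits a quick numerical sanity check: since a chi-squared random variable with $p$ degrees of freedom has variance $2p$, the standardized quantity $(\xi_{1}^{2}-p)/\sqrt{p}$ has variance exactly 2, matching (3.2) with $\tau=2$. The Monte Carlo sketch below (with arbitrary sample sizes) is an illustration, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 400
xi2 = rng.chisquare(p, size=200_000)         # Example 1: xi^2 ~ chi-squared(p)

standardized = (xi2 - p) / np.sqrt(p)
tau_mc = standardized.var()                  # close to tau = 2
fourth_moment = np.mean(np.abs(standardized) ** 4)   # stays bounded, as in (3.2)
```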

In addition to Assumption 1, we need one more assumption dealing with the spectrum of the population covariance matrix $\Sigma_{n}$. To state this assumption, let $H_{n}$ denote the empirical distribution function associated with $\lambda_{1}(\Sigma_{n}),\dots,\lambda_{p}(\Sigma_{n})$, which is defined for any $t\in\mathbb{R}$ according to

$$H_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\lambda_{j}(\Sigma_{n})\leq t\}. \qquad (3.4)$$

Assumption 2 (Spectral structure).

There is a limiting spectral distribution $H$ such that as $n\to\infty$,

$$H_{n}\ \Rightarrow\ H, \qquad (3.5)$$

where the support of $H$ is a finite union of closed intervals, bounded away from 0 and $\infty$. Furthermore, there is a fixed compact interval in $(0,\infty)$ containing the support of $H_{n}$ for all large $n$.

Remarks. Variations of these conditions on population eigenvalues are commonly used throughout random matrix theory. This particular set of conditions was used in [26] to establish theoretical guarantees for the QuEST estimation method in the context of IC models.

3.2 Consistency of estimators

Here, we establish the consistency of the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$, defined in (2.2) and (2.4). The appropriate notion of consistency for $\tilde{\Sigma}_{n}$ is stated in terms of its empirical spectral distribution function, which is defined for any $t\in\mathbb{R}$ as

$$\tilde{H}_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\tilde{\lambda}_{j}\leq t\}. \qquad (3.6)$$

Theorem 1.

Under Assumptions 1 and 2, the following limits hold as $n\to\infty$:

$$\frac{1}{p}(\widehat{\varsigma}_{n}^{\,2}-\varsigma_{n}^{2})\ \xrightarrow{\mathbb{P}}\ 0, \qquad (3.7)$$

$$\tilde{H}_{n}\ \overset{\mathbb{P}}{\Rightarrow}\ H. \qquad (3.8)$$

Remarks. The limits (3.7) and (3.8) are proved in Appendices A.1 and A.2 respectively. Although these limits can be stated in a succinct form, quite a few details are involved in their proofs. For instance, the analysis of $\widehat{\varsigma}_{n}^{\,2}$ is based on extensive calculations with polynomial functions of the quadratic forms $\|\mathbf{x}_{i}\|_{2}^{2}$ and $\mathbf{x}_{i}^{\top}\mathbf{x}_{j}$ with $i\neq j$, as well as associated mixed moments. The consistency of $\tilde{H}_{n}$ is also notable because it requires showing the consistency of QuEST in elliptical models, and it seems that the consistency of QuEST has not previously been reported outside of IC models.

3.3 Consistency of bootstrap

To develop our main result on the consistency of the proposed bootstrap method, it is necessary to recall some background facts and introduce several pieces of notation.

Under Assumptions 1 and 2, it is known from [6, Theorem 1.1] that an extended version of the classical Marčenko-Pastur Theorem holds for the empirical spectral distribution function $\widehat{H}_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\lambda_{j}(\widehat{\Sigma}_{n})\leq t\}$. Namely, there is a probability distribution $\Psi(H,c)$ on $[0,\infty)$, depending only on $H$ and $c$, such that the weak limit $\widehat{H}_{n}\Rightarrow\Psi(H,c)$ occurs almost surely. In this statement, we may regard $\Psi(\cdot,\cdot)$ as a map that takes a distribution $H^{\prime}$ on $[0,\infty)$ and a number $c^{\prime}>0$ as input, and returns a new distribution $\Psi(H^{\prime}\!,c^{\prime})$ on $[0,\infty)$ whose Stieltjes transform $m_{H^{\prime}\!,c^{\prime}}(z)=\int\frac{1}{\lambda-z}\,d(\Psi(H^{\prime}\!,c^{\prime}))(\lambda)$ solves the Marčenko-Pastur equation (3.9) below. That is, for any $z\in\mathbb{C}^{+}$, the number $m_{H^{\prime}\!,c^{\prime}}=m_{H^{\prime}\!,c^{\prime}}(z)$ is the unique solution to the equation

$$m_{H^{\prime}\!,c^{\prime}}=\int\frac{1}{t(1-c^{\prime}-c^{\prime}zm_{H^{\prime}\!,c^{\prime}})-z}\,dH^{\prime}(t) \qquad (3.9)$$

within the set $\{m\in\mathbb{C}\,|\,(c^{\prime}-1)/z+c^{\prime}m\in\mathbb{C}^{+}\}$.
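Equation (3.9) can be solved numerically. As a minimal sketch, take $H^{\prime}$ to be the point mass at $t=1$, so that $\Psi(H^{\prime},c^{\prime})$ is the standard Marčenko-Pastur law; then (3.9) reduces to the quadratic $c^{\prime}zm^{2}+(z+c^{\prime}-1)m+1=0$, whose root with positive imaginary part is the Stieltjes transform. The fixed-point iteration and branch selection below are assumptions of this sketch, not part of the paper.

```python
import numpy as np

c = 0.5
z = 1.0 + 1.0j                       # a point in C^+

# Fixed-point iteration of (3.9) with dH' concentrated at t = 1.
m = 0.0 + 0.0j
for _ in range(2000):
    m = 1.0 / ((1.0 - c - c * z * m) - z)

# Closed-form solution of the quadratic c z m^2 + (z + c - 1) m + 1 = 0,
# selecting the root with positive imaginary part.
m_closed = max(np.roots([c * z, z + c - 1.0, 1.0]), key=lambda r: r.imag)
```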

The map $\Psi(\cdot,\cdot)$ is relevant to our purposes here, because it determines a centering parameter that is commonly used in limit theorems for linear spectral statistics. In detail, if $H_{n,c_{n}}$ denotes a shorthand for the probability distribution $\Psi(H_{n},c_{n})$, and if $f$ is a real-valued function defined on the support of $H_{n,c_{n}}$, then the associated centering parameter is defined as

$$\vartheta_{n}(f)\ =\ \int f(t)\,dH_{n,c_{n}}(t). \qquad (3.10)$$

Similarly, let $\tilde{H}_{n,c_{n}}=\Psi(\tilde{H}_{n},c_{n})$ and $\tilde{\vartheta}_{n}(f)=\int f(t)\,d\tilde{H}_{n,c_{n}}(t)$. Also, in order to simplify notation for handling the joint distribution of several linear spectral statistics arising from a fixed set of functions $\mathbf{f}=(f_{1},\dots,f_{k})$, we write $T_{n}(\mathbf{f})=(T_{n}(f_{1}),\dots,T_{n}(f_{k}))$, and likewise for $T_{n,1}^{*}(\mathbf{f})$, $\vartheta_{n}(\mathbf{f})$, and $\tilde{\vartheta}_{n}(\mathbf{f})$.

As one more preparatory item, recall from page 1 that dLPd_{\textup{LP}} denotes the Lévy-Prohorov metric for comparing distributions on k\mathbb{R}^{k}. This metric is a standard choice for formulating bootstrap consistency results, as it has the fundamental property of metrizing weak convergence.

Theorem 2.

Suppose Assumptions 1 and 2 hold, and let $\mathbf{f}=(f_{1},\ldots,f_{k})$ be a fixed set of real-valued functions that are analytic on an open subset of $\mathbb{R}$ containing $[0,\infty)$. Under these conditions, if $T_{n,1}^{*}(\mathbf{f})$ is generated as in Algorithm 1 using the estimators $\widehat{\varsigma}_{n}^{\,2}$ and $\tilde{\Sigma}_{n}$ defined by (2.2) and (2.4), then the following limit holds as $n\to\infty$:

$$d_{\mathrm{LP}}\Big(\mathcal{L}\big(p\{T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\}\big)\,,\,\mathcal{L}\big(p\{T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\}\,\big|\,X\big)\Big)\ \xrightarrow{\mathbb{P}}\ 0. \qquad (3.11)$$

Remarks. The proof is given in Appendix B, and makes key use of a recently developed central limit theorem for linear spectral statistics due to [18]. Regarding other aspects of the theorem, there are two points to discuss. First, the assumption that the functions $f_{1},\dots,f_{k}$ are defined on an open set containing $[0,\infty)$ has been made for technical simplicity. In the setting where $p/n\to c>1$, this assumption is minor because $f_{1},\dots,f_{k}$ must be defined at 0 due to the singularity of $\widehat{\Sigma}_{n}$. Nevertheless, if $p/n\to c\in(0,1)$, then it is possible to show that a corresponding version of the theorem holds for analytic functions that are not defined at 0, and our numerical results confirm that the proposed method can successfully handle such cases. Second, the quantities $\vartheta_{n}(\mathbf{f})$ and $\tilde{\vartheta}_{n}(\mathbf{f})$ are only introduced so that the distributions appearing in (3.11) have non-trivial weak limits, which facilitates the proof. Still, the proposed method can be applied without requiring any particular type of centering.

3.4 Application to inference on stable rank with guarantees

When dealing with high-dimensional covariance matrices, it is often of interest to have a measure of the number of “dominant eigenvalues”. One such measure is the stable rank, defined as

rn=tr(Σn)2tr(Σn2),r_{n}=\frac{\ \textup{tr}(\Sigma_{n})^{2}}{\textup{tr}(\Sigma_{n}^{2})}, (3.12)

which arises naturally in a wide range of situations [e.g. 4, 43, 44, 30, 36, 33]. Whenever Σn\Sigma_{n} is non-zero, this parameter satisfies 1rnrank(Σn)1\leq r_{n}\leq\textup{rank}(\Sigma_{n}), and the equality rn=pr_{n}=p holds if and only if Σn\Sigma_{n} is proportional to the identity matrix.
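As a concrete illustration, the stable rank in (3.12) can be computed as follows; this is a minimal sketch, and the function name `stable_rank` is ours rather than the paper's.

```python
import numpy as np

def stable_rank(Sigma):
    """Stable rank r_n = tr(Sigma)^2 / tr(Sigma^2) for a nonzero matrix Sigma."""
    return np.trace(Sigma) ** 2 / np.trace(Sigma @ Sigma)

p = 50
# Sigma proportional to the identity attains the maximal value r_n = p ...
r_id = stable_rank(3.0 * np.eye(p))
# ... while a single dominant eigenvalue pulls r_n well below p.
r_spiked = stable_rank(np.diag(np.r_[5.0, np.ones(p - 1)]))
```

For the spiked example, r_n = (p - 1 + 5)^2 / (p - 1 + 25), which is about 39.4 when p = 50, compared to the maximal value 50 for the identity.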

In this subsection, we illustrate how the proposed bootstrap method can be applied to solve some inference problems involving the parameter rnr_{n}. Our first example shows how to construct a confidence interval for rnr_{n}, and our subsequent examples deal with testing procedures related to rnr_{n}. Later, in Theorem 3, we establish the theoretical validity of the methods used in these examples—showing that the confidence interval has asymptotically exact coverage, and that the relevant testing procedures maintain asymptotic control of their levels.

3.4.1 Confidence interval for stable rank

Our confidence interval for rnr_{n} is constructed using the estimator

r^n=tr(Σ^n)2tr(Σ^n2)Δ^n,\displaystyle\widehat{r}_{n}=\frac{\textup{tr}(\widehat{\Sigma}_{n})^{2}}{\textup{tr}(\widehat{\Sigma}_{n}^{2})-\widehat{\Delta}_{n}}, (3.13)

where we define

Δ^n\displaystyle\widehat{\Delta}_{n} =tr(Σ^n2)n[2(n1)np2+ς^n2p(p+2)1]\displaystyle=\frac{\textup{tr}(\widehat{\Sigma}_{n}^{2})}{n}\bigg{[}\frac{2(n-1)}{n}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}-1\bigg{]}
+tr(Σ^n)2n[n+1n+n1nς^n22pp(p+2)2(n1)n2p2+ς^n2p(p+2)],\displaystyle\ \ \ +\frac{\textup{tr}(\widehat{\Sigma}_{n})^{2}}{n}\bigg{[}\frac{n+1}{n}+\frac{n-1}{n}\frac{{\widehat{\varsigma}}_{n}^{2}-2p}{p(p+2)}-\frac{2(n-1)}{n^{2}}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}\bigg{]},

and we set r^n\widehat{r}_{n} equal to nn in the exceptional case that its denominator is 0. It should be noted that r^n\widehat{r}_{n} differs from the naive plug-in rule tr(Σ^n)2/tr(Σ^n2)\textup{tr}(\widehat{\Sigma}_{n})^{2}/\textup{tr}(\widehat{\Sigma}_{n}^{2}), since the extra term Δ^n\widehat{\Delta}_{n} in the denominator serves as a bias correction.
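The estimator (3.13) with the bias correction above can be sketched as follows. The function name is ours, and the value `varsigma2_hat` is assumed to be the estimator ς̂n² from (2.2), supplied by the user.

```python
import numpy as np

def stable_rank_estimator(Sigma_hat, varsigma2_hat, n):
    """Sketch of the bias-corrected estimator (3.13):
    hat r_n = tr(Sigma_hat)^2 / (tr(Sigma_hat^2) - Delta_hat),
    with hat r_n = n in the exceptional case that the denominator is 0."""
    p = Sigma_hat.shape[0]
    t1 = np.trace(Sigma_hat) ** 2           # tr(Sigma_hat)^2
    t2 = np.trace(Sigma_hat @ Sigma_hat)    # tr(Sigma_hat^2)
    a = (p ** 2 + varsigma2_hat) / (p * (p + 2))
    Delta = (t2 / n) * (2 * (n - 1) / n * a - 1) \
          + (t1 / n) * ((n + 1) / n
                        + (n - 1) / n * (varsigma2_hat - 2 * p) / (p * (p + 2))
                        - 2 * (n - 1) / n ** 2 * a)
    denom = t2 - Delta
    return float(n) if denom == 0 else t1 / denom
```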

To proceed, let q1αq_{1-\alpha} denote the (1α)(1-\alpha)-quantile of the random variable (r^nrn)/p(\widehat{r}_{n}-r_{n})/p for any fixed α(0,1)\alpha\in(0,1), and consider the interval n=[r^npq1α/2,r^npqα/2]\mathcal{I}_{n}=[\widehat{r}_{n}-pq_{1-\alpha/2},\widehat{r}_{n}-pq_{\alpha/2}]. Whenever the distribution of (r^nrn)/p(\widehat{r}_{n}-r_{n})/p is continuous, this interval satisfies

(rnn)=1α.\mathbb{P}(r_{n}\in\mathcal{I}_{n})=1-\alpha. (3.14)

However, the quantiles qα/2q_{\alpha/2} and q1α/2q_{1-\alpha/2} are unknown, and so they must be estimated. This can be accomplished by generating bootstrap samples of the following form in Algorithm 1,

Tn,1=1p(tr(Σ^n)2tr((Σ^n)2)Δ^ntr(Σ~n)2tr(Σ~n2)),T^{*}_{n,1}=\frac{1}{p}\bigg{(}\frac{\textup{tr}(\widehat{\Sigma}^{*}_{n})^{2}}{\textup{tr}((\widehat{\Sigma}^{*}_{n})^{2})-\widehat{\Delta}_{n}^{{}^{*}}}\,-\,\frac{\textup{tr}(\tilde{\Sigma}_{n})^{2}}{\textup{tr}(\tilde{\Sigma}_{n}^{2})}\bigg{)},

where Δ^n\widehat{\Delta}_{n}^{{}^{*}} is defined by modifying the previous formula for Δ^n\widehat{\Delta}_{n} so that Σ^n\widehat{\Sigma}_{n} and ς^n2\widehat{\varsigma}_{n}^{2} are replaced by versions computed with the bootstrap data 𝐱1,,𝐱n\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}. Also, we define Tn,1T_{n,1}^{*} to be 0 in the exceptional case of a denominator being equal to 0. Letting q^α/2\widehat{q}_{\alpha/2} and q^1α/2\widehat{q}_{1-\alpha/2} denote the respective α/2\alpha/2 and (1α/2)(1-\alpha/2)-quantiles of (Tn,1|X)\mathcal{L}(T_{n,1}^{*}|X), the proposed confidence interval is defined as

^n=[r^npq^1α/2,r^npq^α/2].\widehat{\mathcal{I}}_{n}\ =\ \Big{[}\widehat{r}_{n}-p\widehat{q}_{1-\alpha/2}\,,\,\widehat{r}_{n}-p\widehat{q}_{\alpha/2}\Big{]}. (3.15)

Below, Theorem 3 shows that as nn\to\infty, the coverage probability (rn^n)\mathbb{P}(r_{n}\in\widehat{\mathcal{I}}_{n}) converges to 1α1-\alpha, as desired.
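The assembly of the interval (3.15) from bootstrap draws can be sketched as follows; the function name is ours, and `boot_draws` is assumed to hold B realizations of T*_{n,1} produced by Algorithm 1.

```python
import numpy as np

def stable_rank_ci(r_hat, p, boot_draws, alpha=0.05):
    """Interval (3.15): invert bootstrap quantile estimates of the law
    of (r_hat - r_n)/p."""
    q_lo, q_hi = np.quantile(boot_draws, [alpha / 2, 1 - alpha / 2])
    return (r_hat - p * q_hi, r_hat - p * q_lo)

# Illustration with synthetic draws (not from the actual algorithm):
rng = np.random.default_rng(0)
lo, hi = stable_rank_ci(r_hat=30.0, p=100, boot_draws=rng.normal(0.0, 0.02, 250))
```

Note that the upper bootstrap quantile produces the lower endpoint and vice versa, as in (3.15).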

3.4.2 Hypotheses related to stable rank

Screening data for PCA. Consider a scenario where a collection of different datasets are to be screened for further investigation by principal components analysis (PCA). In this situation, the datasets that should be discarded are the ones that cannot be well summarized by a moderate number of principal components. To put this in more quantitative terms, a dataset may be considered unsuitable for PCA if the stable rank rnr_{n} exceeds a certain fraction of the full dimension pp. That is, if rnp>ϵ0\frac{r_{n}}{p}>\epsilon_{0} for some fixed reference value ϵ0(0,1)\epsilon_{0}\in(0,1). On the other hand, if rnpϵ0\frac{r_{n}}{p}\leq\epsilon_{0}, then the dataset may be retained. This leads to considering the hypothesis testing problem

𝖧0,n:rnpϵ0 vs. 𝖧1,n:rnp>ϵ0.\mathsf{H}_{0,n}:\textstyle\frac{r_{n}}{p}\leq\epsilon_{0}\text{ \ \ \ \ \ \ vs. \ \ \ \ \ \ }\mathsf{H}_{1,n}:\textstyle\frac{r_{n}}{p}>\epsilon_{0}. (3.16)

To develop a testing procedure, we may again consider the (1α)(1-\alpha)-quantile q1αq_{1-\alpha} of the random variable (r^nrn)/p(\widehat{r}_{n}-r_{n})/p. This quantile serves as a conceptual basis for a rejection criterion, because it satisfies the following inequality under the null hypothesis

(r^npϵ0>q1α)α.\mathbb{P}\Big{(}\textstyle\frac{\widehat{r}_{n}}{p}-\epsilon_{0}>q_{1-\alpha}\Big{)}\ \leq\ \alpha. (3.17)

In other words, if q1αq_{1-\alpha} were known, then a level-α\alpha testing procedure would result from using 1pr^nϵ0>q1α\frac{1}{p}\widehat{r}_{n}-\epsilon_{0}>q_{1-\alpha} as a rejection criterion. Accordingly, a bootstrap-based version of this procedure rejects the null hypothesis when 1pr^nϵ0>q^1α\frac{1}{p}\widehat{r}_{n}-\epsilon_{0}>\widehat{q}_{1-\alpha}, with the quantile estimate q^1α\widehat{q}_{1-\alpha} defined as before.
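The bootstrap rejection criterion just described amounts to a one-line comparison; the sketch below uses our own function name, with `boot_draws` again assumed to be T*_{n,1} values from Algorithm 1.

```python
import numpy as np

def screen_for_pca(r_hat, p, boot_draws, eps0, alpha=0.05):
    """Bootstrap test for (3.16): reject H0 (r_n/p <= eps0) in favor of
    r_n/p > eps0 when r_hat/p - eps0 exceeds the estimated
    (1-alpha)-quantile of (r_hat - r_n)/p."""
    q_hat = np.quantile(boot_draws, 1 - alpha)
    return bool(r_hat / p - eps0 > q_hat)
```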

Testing for sphericity. One more example of a testing problem related to rnr_{n} is that of testing for sphericity,

𝖧0,n:Σn=σ2I for some σ2>0 vs. 𝖧1,n:Σnσ2I for any σ2>0.\mathsf{H}^{\prime}_{0,n}:\Sigma_{n}=\sigma^{2}I\text{ \ for some \ $\sigma^{2}>0$}\text{ \ \ \ vs. \ \ \ }\mathsf{H}^{\prime}_{1,n}:\Sigma_{n}\neq\sigma^{2}I\text{ \ for any \ $\sigma^{2}>0$}. (3.18)

The connection to the parameter rnr_{n} arises from the fact that (3.18) is equivalent to the problem 𝖧0,n:rnp=1\mathsf{H}^{\prime}_{0,n}:\frac{r_{n}}{p}=1 vs. 𝖧1,n:rnp<1\mathsf{H}^{\prime}_{1,n}:\frac{r_{n}}{p}<1. This observation was used in the influential paper [43] to develop a formula-based sphericity test. By analogy with our discussion of the problem (3.16), this observation can also be used to develop a bootstrap-based sphericity test. However, there is one point of distinction in the current situation, which is that the eigenvalues of Σn\Sigma_{n} no longer need to be estimated. The reason is that under 𝖧0,n\mathsf{H}_{0,n}^{\prime}, the scale-invariance of the statistic (r^nrn)/p(\widehat{r}_{n}-r_{n})/p causes it to behave as if Σn=Ip\Sigma_{n}=I_{p}. Consequently, to estimate the quantiles of the null distribution of (r^nrn)/p(\widehat{r}_{n}-r_{n})/p, Algorithm 1 can be run using Σ~n=Ip\tilde{\Sigma}_{n}=I_{p}. To summarize, if we let q^α\widehat{q}_{\alpha}^{\prime} denote the resulting estimate for the α\alpha-quantile of the null distribution of (r^nrn)/p(\widehat{r}_{n}-r_{n})/p, then the rejection criterion is r^np1q^α\frac{\widehat{r}_{n}}{p}-1\leq\widehat{q}_{\alpha}^{\prime}.
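The sphericity test differs from the previous test only in how the bootstrap quantile is obtained; a sketch is given below, where `null_boot_draws` is assumed to contain T*_{n,1} values from Algorithm 1 run with Σ̃n = I_p, and the function name is ours.

```python
import numpy as np

def sphericity_test(r_hat, p, null_boot_draws, alpha=0.05):
    """Bootstrap sphericity test for (3.18): reject H0' when
    r_hat/p - 1 <= q'_alpha, where the draws come from Algorithm 1 run
    with Sigma_tilde = I_p (valid by the scale-invariance of
    (r_hat - r_n)/p under H0')."""
    q_alpha = np.quantile(null_boot_draws, alpha)
    return bool(r_hat / p - 1 <= q_alpha)
```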

The following result establishes the theoretical validity of the procedures discussed in this subsection.

Theorem 3.

If Assumptions 1 and 2 hold, then as nn\to\infty,

(rn^n) 1α.\mathbb{P}\big{(}r_{n}\in\widehat{\mathcal{I}}_{n}\big{)}\ \to\ 1-\alpha. (3.19)

Furthermore, if 𝖧0,n\mathsf{H}_{0,n} holds for all large nn, then

(r^npϵ0>q^1α)α+o(1),\mathbb{P}\Big{(}\textstyle\frac{\widehat{r}_{n}}{p}-\epsilon_{0}>\widehat{q}_{1-\alpha}\Big{)}\ \leq\ \alpha+o(1), (3.20)

or if 𝖧0,n\mathsf{H}^{\prime}_{0,n} holds for all large nn, then

(r^np1q^α)α.\mathbb{P}\Big{(}\textstyle\frac{\widehat{r}_{n}}{p}-1\leq\widehat{q}_{\alpha}^{\prime}\Big{)}\ \to\ \alpha. (3.21)

Remarks. This result provides a notable complement to Theorem 2, because it demonstrates that bootstrap consistency can be established in tasks that are based on a nonlinear spectral statistic, namely r^nrn\widehat{r}_{n}-r_{n}. The proof of this result is given in Appendix C, where it can be seen that the limiting distribution of r^nrn\widehat{r}_{n}-r_{n} has a very complicated dependence on the moments of HH, due to the correlation between tr(Σ^n)\textup{tr}(\widehat{\Sigma}_{n}) and tr(Σ^n2)\textup{tr}(\widehat{\Sigma}_{n}^{2}). In this way, the proof illustrates the utility of the bootstrap, since the bootstrap enables the user to completely bypass such complexity.

4 Numerical results

This section explores the empirical performance of the proposed bootstrap method in three different ways. Sections 4.2 and 4.3 deal with bootstrap approximations for linear and nonlinear spectral statistics, while Section 4.4 looks at procedures for doing inference on the stable rank parameter rnr_{n}.

4.1 Parameter settings

In our simulations, we generated data from elliptical models that were parameterized as follows. The random variable ξ12\xi_{1}^{2} was generated using four choices of distributions:

  (i). Chi-Squared distribution with pp degrees of freedom,

  (ii). Beta-Prime(p(1+p+8)8,1+p+168)\left(\frac{p(1+p+8)}{8},\frac{1+p+16}{8}\right),

  (iii). (p+4)Beta(p/2,2)(p+4)\textup{Beta}(p/2,2),

  (iv). 9p10F(p,20)\frac{9p}{10}\textup{F}(p,20),

where F(d1,d2)(d_{1},d_{2}) denotes an F-distribution with d1d_{1} and d2d_{2} degrees of freedom. Note that cases (i), (iii), and (iv) correspond, respectively, to a multivariate Gaussian distribution, a multivariate Pearson type II distribution, and a multivariate t-distribution with 20 degrees of freedom. Also, the numerical values appearing in (i)-(iv) were chosen to ensure the normalization condition 𝔼(ξ12)=p\mathbb{E}(\xi_{1}^{2})=p.
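A sketch of how the four choices of ξ1² and the corresponding elliptical observations can be simulated is given below. The function names are ours; the generative form x_i = ξ_i Σ^{1/2} u_i, with u_i uniform on the unit sphere, follows the sample-covariance expression used later in Section 4.2, and the Beta-Prime law is sampled as a ratio of independent Gamma variables.

```python
import numpy as np

def sample_xi2(p, n, case, rng):
    """Draw n copies of xi_1^2 under choices (i)-(iv); each case is
    normalized so that E(xi_1^2) = p."""
    if case == "i":    # chi-squared with p dof -> multivariate Gaussian
        return rng.chisquare(p, n)
    if case == "ii":   # Beta-Prime(a, b), sampled as a ratio of Gammas
        a, b = p * (1 + p + 8) / 8, (1 + p + 16) / 8
        return rng.gamma(a, 1.0, n) / rng.gamma(b, 1.0, n)
    if case == "iii":  # (p+4) * Beta(p/2, 2) -> Pearson type II
        return (p + 4) * rng.beta(p / 2, 2, n)
    if case == "iv":   # (9p/10) * F(p, 20) -> multivariate t with 20 dof
        return 9 * p / 10 * rng.f(p, 20, n)
    raise ValueError(case)

def sample_elliptical(Sigma, xi2, rng):
    """Observations x_i = xi_i * Sigma^{1/2} u_i with u_i uniform on the
    unit sphere of R^p."""
    n, p = len(xi2), Sigma.shape[0]
    w, V = np.linalg.eigh(Sigma)
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    G = rng.standard_normal((n, p))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)
    return np.sqrt(xi2)[:, None] * U @ S_half
```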

The population covariance matrix Σn\Sigma_{n} was selected from five options:

  1. The eigenvalues of Σn\Sigma_{n} are λ1(Σn)==λ5(Σn)=43\lambda_{1}(\Sigma_{n})=\cdots=\lambda_{5}(\Sigma_{n})=\frac{4}{3} and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for j{6,,p}j\in\{6,\dots,p\}. The p×pp\times p matrix of eigenvectors of Σn\Sigma_{n} is generated from the uniform distribution on orthogonal matrices.

  2. The eigenvalues of Σn\Sigma_{n} are λj(Σn)=exp(j/4)\lambda_{j}(\Sigma_{n})=\exp(-j/4) for j{1,,20}j\in\{1,\dots,20\}, and λ20(Σn)==λp(Σn)\lambda_{20}(\Sigma_{n})=\cdots=\lambda_{p}(\Sigma_{n}). The eigenvectors are the same as in case 1.

  3. The matrix Σn\Sigma_{n} has entries of the form (Σn)ij=(110)|ij|+1{i=j}(\Sigma_{n})_{ij}=(\frac{1}{10})^{|i-j|}+1\{i=j\}.

  4. The matrix Σn\Sigma_{n} has entries of the form (Σn)ij=(110)1{ij}+1{i=j}(\Sigma_{n})_{ij}=(\frac{1}{10})1\{i\neq j\}+1\{i=j\}.

  5. The eigenvalues of Σn\Sigma_{n} are λ1(Σn)=5\lambda_{1}(\Sigma_{n})=5 and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for j{2,,p}j\in\{2,\dots,p\}. The eigenvectors are the same as in case 1.
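The five covariance options can be constructed as sketched below; the function name is ours. For cases 1, 2, and 5, Haar-uniform eigenvectors are obtained from a sign-corrected QR decomposition of a Gaussian matrix, a standard construction (case 2 assumes p ≥ 20).

```python
import numpy as np

def make_sigma(p, case, rng):
    """Population covariance choices 1-5 from Section 4.1."""
    if case in (1, 2, 5):
        if case == 1:
            eigs = np.ones(p); eigs[:5] = 4 / 3
        elif case == 2:
            eigs = np.exp(-np.arange(1, p + 1) / 4)
            eigs[19:] = eigs[19]              # lambda_20 = ... = lambda_p
        else:
            eigs = np.ones(p); eigs[0] = 5.0
        Q, R = np.linalg.qr(rng.standard_normal((p, p)))
        Q = Q * np.sign(np.diag(R))           # sign fix gives the Haar measure
        return Q @ np.diag(eigs) @ Q.T
    if case == 3:                             # banded Toeplitz-type matrix
        i = np.arange(p)
        return 0.1 ** np.abs(i[:, None] - i[None, :]) + np.eye(p)
    if case == 4:                             # equi-correlation matrix
        return 0.1 * np.ones((p, p)) + 0.9 * np.eye(p)
    raise ValueError(case)
```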

4.2 Linear spectral statistics

Our experiments for linear spectral statistics were based on the task of using the proposed bootstrap to estimate three parameters of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))): the mean, standard deviation, and 95th percentile. We considered two choices for the function ff, namely f(x)=x2f(x)=x^{2} and f(x)=xlog(x)1f(x)=x-\log(x)-1. In the first case, we selected the ratio p/np/n so that p/n{0.5,1,1.5}p/n\in\{0.5,1,1.5\}, and in the second case we used p/n{0.3,0.5,0.7}p/n\in\{0.3,0.5,0.7\}.

Design of experiments. For each possible choice of (ξ1,Σn,p/n)(\xi_{1},\Sigma_{n},p/n), we generated 5000 realizations of the dataset 𝐱1,,𝐱n\mathbf{x}_{1},\dots,\mathbf{x}_{n}, with n=400n=400. These datasets allowed us to compute 5000 realizations of the statistic p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)), and we treated the empirical mean, standard deviation, and 95th percentile of these 5000 realizations as ground truth for our parameters of interest. In Tables 1 and 2, the ground truth values are reported in the first row of numbers corresponding to each choice of Σn\Sigma_{n}.

With regard to the bootstrap, we ran Algorithm 1 on the first 500 datasets corresponding to each parameter setting. Also, we generated B=250B=250 bootstrap samples of the form p(Tn,1(f)ϑ~n(f))p(T_{n,1}^{*}(f)-\tilde{\vartheta}_{n}(f)) during every run. As a result of these runs, we obtained 500 different bootstrap estimates of the mean, standard deviation, and 95th percentile of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))). In Tables 1 and 2, we report the empirical mean and standard deviation (in parentheses) of these 500 estimates in the second row of numbers corresponding to each choice of Σn\Sigma_{n}.

One more detail to mention is related to the computation of ϑn(f)\vartheta_{n}(f) and ϑ~n(f)\tilde{\vartheta}_{n}(f). For each parameter setting, we approximated ϑn(f)\vartheta_{n}(f) as follows. We averaged 30 realizations of 140pj=140pf(λj(140ni=140nξi2Σn1/2𝐮i𝐮iΣn1/2))\frac{1}{40p}\sum_{j=1}^{40p}f(\lambda_{j}(\frac{1}{40n}\sum_{i=1}^{40n}\xi_{i}^{2}\mathsf{\Sigma}_{n}^{1/2}\mathbf{u}_{i}\mathbf{u}_{i}^{\top}\mathsf{\Sigma}_{n}^{1/2})), where Σn=I40Σn\mathsf{\Sigma}_{n}=I_{40}\otimes\Sigma_{n} is of size 40p×40p40p\times 40p, each 𝐮i\mathbf{u}_{i} was drawn from the uniform distribution on the unit sphere of 40p\mathbb{R}^{40p}, and each ξi2\xi_{i}^{2} was generated as in (i)-(iv), but with 40p40p replacing pp. For the bootstrap samples, we computed one realization of the statistic 140pj=140pf(λj(140ni=140n(ξi)2Σ~n1/2𝐮i(𝐮i)Σ~n1/2))\frac{1}{40p}\sum_{j=1}^{40p}f(\lambda_{j}(\frac{1}{40n}\sum_{i=1}^{40n}(\xi_{i}^{*})^{2}\tilde{\mathsf{\Sigma}}_{n}^{1/2}\mathbf{u}_{i}^{*}(\mathbf{u}_{i}^{*})^{\top}\tilde{\mathsf{\Sigma}}_{n}^{1/2})) to approximate ϑ~n(f)\tilde{\vartheta}_{n}(f) during every run of Algorithm 1, where Σ~n=I40Σ~n\tilde{\mathsf{\Sigma}}_{n}=I_{40}\otimes\tilde{\Sigma}_{n} is of size 40p×40p40p\times 40p, each 𝐮i\mathbf{u}_{i}^{*} was drawn from the uniform distribution on the unit sphere of 40p\mathbb{R}^{40p}, and each (ξi)2(\xi_{i}^{*})^{2} was drawn from a Gamma distribution with mean 40p40p and variance 40ς^n240\widehat{\varsigma}_{n}^{2}.
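The enlargement scheme just described can be sketched as follows. This is only a sketch of the computation, not the authors' code: `approx_centering` and `xi2_sampler` are our names, the enlargement factor m and repetition count are parameters (the text uses m = 40 with 30 repetitions for ϑn(f) and a single repetition for ϑ̃n(f)), and f must accept a NumPy array of eigenvalues.

```python
import numpy as np

def approx_centering(f, Sigma, n, xi2_sampler, reps=30, m=40, rng=None):
    """Monte Carlo approximation of the centering parameter: average over
    `reps` realizations of the mean of f over the spectrum of an m-fold
    enlarged sample covariance matrix of size mp x mp."""
    rng = np.random.default_rng() if rng is None else rng
    p = Sigma.shape[0]
    big_p, big_n = m * p, m * n
    w, V = np.linalg.eigh(Sigma)
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T  # Sigma^{1/2}
    vals = []
    for _ in range(reps):
        G = rng.standard_normal((big_n, big_p))
        U = G / np.linalg.norm(G, axis=1, keepdims=True)  # rows uniform on sphere
        X = np.sqrt(xi2_sampler(big_p, big_n, rng))[:, None] * U
        # apply (I_m kron Sigma)^{1/2} = I_m kron Sigma^{1/2} blockwise
        X = (X.reshape(big_n, m, p) @ S_half).reshape(big_n, big_p)
        lam = np.linalg.eigvalsh(X.T @ X / big_n)
        vals.append(float(np.mean(f(lam))))
    return float(np.mean(vals))
```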

Comments on results. It is easiest to explain the format of the tables with an example: the two entries in the upper right corner of Table 1 show that in settings (i) and 1 with p/n=1.5p/n=1.5, the 95th percentile of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))) with f(x)=x2f(x)=x^{2} is equal to 16.48, and the bootstrap estimate for the 95th percentile has a mean (standard deviation) of 16.73 (1.48). Table 2 presents results for f(x)=xlog(x)1f(x)=x-\log(x)-1 in the same format.

In most settings, the bootstrap estimates perform well, with their bias and standard deviation being small in proportion to the parameter being estimated. However, there are some specific parameter settings that require more attention. These settings involve choice 44 for Σn\Sigma_{n}, which is an equi-correlation matrix, and choice (iv) for ξ1\xi_{1}, which induces a multivariate t-distribution with 20 degrees of freedom. Notably, these choices correspond to settings that violate Assumptions 1 and 2 of our theoretical results. In the case of the equi-correlation matrix, the bootstrap approximations for f(x)=x2f(x)=x^{2} are less accurate in comparison to other choices of Σn\Sigma_{n}, due to increased variance. By contrast, if f(x)=xlog(x)1f(x)=x-\log(x)-1, then the bootstrap approximations have similar accuracy across all choices of Σn\Sigma_{n} while holding other parameters fixed.

Lastly, in the case of the multivariate t-distribution, the bootstrap is able to accurately estimate the standard deviation of (p(Tn(f)ϑn(f)))\mathcal{L}(p(T_{n}(f)-\vartheta_{n}(f))) for both choices of ff, but difficulties arise in estimating the mean and 95th percentile. To understand these mixed results, it is important to recognize that the standard deviation does not depend on the centering parameter ϑn(f)\vartheta_{n}(f), whereas the mean and 95th percentile do. Also, the choice of ϑn(f)\vartheta_{n}(f) as a centering parameter is based on the CLT for linear spectral statistics established in [18], and the assumptions underlying that result are violated by the multivariate t-distribution.

Table 1: Results for p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) with f(x)=x2f(x)=x^{2}
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 0.51 3.31 6.02 0.85 6.11 10.87 1.44 9.15 16.48
0.48(0.24) 3.27(0.19) 5.85(0.53) 0.96(0.44) 6.09(0.35) 11.02(1) 1.47(0.65) 9.25(0.51) 16.73(1.48)
2 0 0.11 0.19 0.01 0.11 0.19 0 0.11 0.2
0(0.01) 0.11(0.01) 0.19(0.03) 0(0.01) 0.11(0.01) 0.19(0.02) 0(0.01) 0.11(0.01) 0.2(0.03)
(i) 3 2.16 12.88 23.12 4.02 24.12 44.03 5.8 36.13 65.72
1.9(0.88) 12.85(0.73) 23.2(2.07) 3.94(1.75) 24.27(1.35) 43.95(3.82) 5.8(2.77) 37.08(2.2) 67.09(6.1)
4 0.64 63.35 110.6 2.48 241.3 420 14.11 540 952.2
1.53(4.7) 62.9(9.6) 111.2(19.68) 4.95(17.04) 245(34.43) 433(70.15) 11.44(36.41) 541.9(78.44) 958.3(161.6)
5 0.55 5.03 9.1 1.03 7.49 13.4 1.57 10.42 18.58
0.55(0.34) 5.1(0.46) 9.15(1.05) 1.04(0.53) 7.41(0.48) 13.4(1.22) 1.55(0.74) 10.35(0.65) 18.61(1.7)
1 3.57 6.35 14.09 6.69 11.93 26.9 10.25 18.09 39.92
3.47(0.59) 6.36(0.45) 14.03(1.29) 6.83(1) 11.85(0.76) 26.51(2.32) 10.26(1.53) 17.99(1.11) 39.92(3.25)
2 0.01 0.12 0.21 0 0.11 0.2 0 0.11 0.19
0.01(0.01) 0.12(0.01) 0.2(0.03) 0.01(0.01) 0.11(0.01) 0.2(0.03) 0(0.01) 0.11(0.01) 0.2(0.03)
(ii) 3 14.16 25.27 56 27.59 47.57 105.8 41.49 71.66 159.7
13.67(2.24) 25.12(1.76) 55.35(5.1) 26.94(3.72) 47.09(2.97) 104.9(8.33) 40.39(6.13) 71.59(4.5) 158.6(13.38)
4 2.26 65.49 114.9 9.27 246.8 435.7 23.51 539.9 970.1
4.7(4.65) 66.35(9.67) 119.7(20.26) 9.81(16.29) 248.2(38.15) 441.7(75.8) 19.95(37.73) 548.5(83.74) 971.1(172)
5 3.75 7.75 16.66 7.21 13.01 28.58 10.46 18.87 42.02
3.59(0.65) 7.86(0.62) 16.69(1.58) 7.03(1.05) 12.87(0.84) 28.29(2.43) 10.4(1.51) 18.83(1.1) 41.45(3.22)
1 -0.46 1.13 1.44 -0.95 2.13 2.61 -1.47 3.12 3.71
-0.47(0.08) 1.14(0.07) 1.4(0.17) -0.96(0.14) 2.15(0.11) 2.58(0.33) -1.43(0.21) 3.17(0.15) 3.81(0.45)
2 0.01 0.11 0.19 0 0.11 0.19 0 0.11 0.19
0(0.01) 0.11(0.01) 0.19(0.03) 0(0.01) 0.11(0.01) 0.19(0.02) 0(0.01) 0.11(0.01) 0.19(0.03)
(iii) 3 -1.93 4.49 5.41 -3.89 8.66 10.62 -5.62 13 15.33
-1.85(0.33) 4.56(0.25) 5.68(0.69) -3.78(0.57) 8.69(0.48) 10.57(1.26) -5.7(0.9) 12.87(0.68) 15.52(1.88)
4 1.61 63.23 110 6.1 242 420.7 -6.78 541.8 944.3
0.75(4.21) 63.54(9.08) 111.2(18.37) 3.06(16.56) 242.5(37.18) 423.6(74.62) 5.52(35.54) 530.5(80.96) 928.1(161.8)
5 -0.52 3.78 5.98 -0.9 4.45 6.74 -1.3 5.35 7.75
-0.48(0.25) 3.83(0.51) 6.14(1.02) -0.94(0.32) 4.53(0.55) 6.81(1.17) -1.4(0.36) 5.38(0.53) 7.67(1.15)
1 0.67 12.57 21.94 1.19 33.98 58.49 0.77 64.69 114.1
13.02(2.03) 12.28(1.26) 33.54(4.18) 50.46(7.66) 33.11(3.74) 105.7(13.79) 111.2(15.79) 62.29(6.76) 215.5(26.71)
2 0.01 0.13 0.24 0.01 0.14 0.25 0 0.14 0.23
0.01(0.01) 0.13(0.02) 0.24(0.04) 0.02(0.01) 0.14(0.02) 0.25(0.04) 0.02(0.01) 0.14(0.02) 0.25(0.04)
(iv) 3 0.68 48.68 82.62 1.3 136.4 236.7 15.26 260 457.1
51.21(8.4) 48.62(5.03) 132.7(17.04) 198.5(28.4) 131.5(13.71) 418.8(51.54) 442.9(64.24) 250.9(28.24) 861.1(108)
4 2.75 73.19 135.4 17.25 281.2 506.3 27.98 603.8 1097
14.03(5.46) 73.41(12.32) 142.2(25.81) 54.28(20.28) 277.7(44.26) 538.9(90.87) 123.8(46.62) 622.4(97.37) 1217(206.5)
5 0.42 14.02 23.86 2.62 35.23 62.08 -0.69 66.26 110.7
13.2(2.08) 13.82(1.41) 36.31(4.4) 50.71(7.33) 34.45(3.39) 108.3(12.79) 111.5(15.1) 64.04(6.65) 219(26.3)
Table 2: Results for p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) with f(x)=xlog(x)1f(x)=x-\log(x)-1
p/n=0.3p/n=0.3 p/n=0.5p/n=0.5 p/n=0.7p/n=0.7
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 0.18 0.35 0.77 0.34 0.64 1.4 0.6 1 2.24
0.17(0.03) 0.35(0.02) 0.74(0.05) 0.34(0.06) 0.63(0.03) 1.37(0.09) 0.58(0.09) 1.01(0.05) 2.25(0.14)
(i) 2 0.18 0.83 1.55 0.36 1.16 2.25 0.59 1.53 3.1
0.19(0.22) 0.79(0.26) 1.49(0.65) 0.36(0.45) 1.11(0.37) 2.19(1.05) 0.62(0.7) 1.51(0.44) 3.1(1.42)
3 0.18 0.86 1.62 0.35 1.19 2.27 0.55 1.58 3.12
0.17(0.06) 0.86(0.05) 1.59(0.14) 0.33(0.09) 1.19(0.06) 2.3(0.19) 0.58(0.12) 1.56(0.08) 3.17(0.25)
4 0.16 0.91 1.66 0.32 1.53 2.92 0.57 2.21 4.29
0.17(0.09) 0.91(0.07) 1.71(0.18) 0.33(0.18) 1.54(0.12) 2.91(0.31) 0.58(0.31) 2.21(0.16) 4.29(0.49)
5 0.18 0.44 0.91 0.32 0.7 1.46 0.56 1.04 2.28
0.17(0.04) 0.45(0.03) 0.92(0.07) 0.34(0.06) 0.69(0.03) 1.48(0.11) 0.59(0.09) 1.05(0.05) 2.31(0.15)
1 1 0.37 1.62 1.74 0.65 2.79 2.56 1.02 4.25
1.04(0.11) 0.37(0.02) 1.66(0.12) 1.79(0.17) 0.66(0.03) 2.87(0.19) 2.64(0.22) 1.04(0.05) 4.35(0.25)
(ii) 2 1.04 1.52 3.54 1.74 2.02 5.06 2.54 2.49 6.54
1.05(0.4) 1.51(0.25) 3.54(0.81) 1.77(0.7) 2.01(0.35) 5.09(1.27) 2.54(0.96) 2.46(0.41) 6.59(1.63)
3 1.02 1.63 3.78 1.78 2.15 5.34 2.57 2.63 6.94
1.04(0.16) 1.6(0.12) 3.68(0.36) 1.8(0.22) 2.11(0.14) 5.31(0.45) 2.61(0.28) 2.59(0.16) 6.88(0.53)
4 1.02 0.94 2.59 1.69 1.57 4.26 2.58 2.22 6.3
1.04(0.16) 0.94(0.07) 2.63(0.24) 1.8(0.31) 1.58(0.12) 4.45(0.43) 2.63(0.49) 2.25(0.17) 6.4(0.63)
5 1.02 0.47 1.8 1.78 0.71 2.96 2.56 1.06 4.31
1.05(0.13) 0.48(0.03) 1.84(0.16) 1.8(0.17) 0.72(0.04) 2.98(0.2) 2.64(0.22) 1.08(0.05) 4.41(0.27)
1 -0.11 0.34 0.45 -0.13 0.62 0.9 -0.14 1 1.49
-0.11(0.02) 0.34(0.02) 0.45(0.05) -0.14(0.04) 0.62(0.03) 0.88(0.08) -0.09(0.07) 1.01(0.04) 1.57(0.14)
(iii) 2 -0.12 0.36 0.49 -0.14 0.64 0.93 -0.08 1.01 1.57
-0.04(0.12) 0.47(0.18) 0.74(0.43) 0.05(0.29) 0.85(0.28) 1.45(0.74) 0.18(0.45) 1.22(0.31) 2.19(0.95)
3 -0.11 0.38 0.52 -0.14 0.65 0.93 -0.07 1.03 1.66
-0.11(0.03) 0.38(0.02) 0.51(0.05) -0.14(0.05) 0.65(0.03) 0.93(0.09) -0.09(0.07) 1.03(0.05) 1.6(0.14)
4 -0.09 0.91 1.43 -0.12 1.55 2.47 -0.14 2.21 3.64
-0.09(0.07) 0.9(0.08) 1.42(0.17) -0.1(0.13) 1.54(0.11) 2.48(0.28) 0.01(0.21) 2.21(0.16) 3.7(0.4)
5 -0.11 0.43 0.62 -0.15 0.67 0.94 -0.15 1.06 1.57
-0.11(0.03) 0.44(0.03) 0.62(0.07) -0.14(0.05) 0.68(0.03) 0.99(0.09) -0.09(0.07) 1.04(0.05) 1.64(0.15)
1 0.19 0.45 0.95 0.37 0.86 1.81 0.68 1.44 3.15
2.35(0.28) 0.44(0.03) 3.07(0.31) 6.45(0.79) 0.84(0.06) 7.85(0.87) 12.53(1.51) 1.41(0.1) 14.87(1.64)
2 0.15 2.15 3.75 0.26 3.41 5.85 0.46 4.77 8.24
2.37(0.81) 2.2(0.31) 6.01(1.31) 6.37(1.52) 3.64(0.43) 12.36(2.23) 12.6(2.65) 5.16(0.56) 21.06(3.51)
(iv) 3 0.13 2.42 4.3 0.22 3.96 6.84 0.57 5.75 10.49
2.37(0.34) 2.32(0.21) 6.25(0.7) 6.45(0.86) 3.8(0.35) 12.82(1.43) 12.67(1.58) 5.31(0.5) 21.49(2.33)
4 0.19 1.01 1.89 0.4 1.7 3.29 0.5 2.53 4.75
2.39(0.35) 0.99(0.09) 4.06(0.44) 6.46(0.93) 1.72(0.15) 9.37(1.08) 12.44(1.78) 2.52(0.2) 16.64(1.97)
5 0.19 0.54 1.1 0.35 0.93 1.92 0.62 1.49 3.12
2.36(0.3) 0.54(0.04) 3.25(0.35) 6.35(0.78) 0.9(0.07) 7.86(0.87) 12.55(1.55) 1.46(0.1) 14.97(1.7)

Breakdown of nonparametric bootstrap. To highlight one of the motivations for our parametric bootstrap method, we close this subsection with some numerical results exhibiting the breakdown of the standard nonparametric bootstrap, based on sampling with replacement. For the sake of brevity, we focus only on the task of estimating the standard deviation of p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) in a subset of the previous parameter settings, corresponding to p/n=1/2p/n=1/2, and (ξ1,Σn){(i)}×{1,2,3,4,5}(\xi_{1},\Sigma_{n})\in\{\text{(i)}\}\times\{1,2,3,4,5\}. The standard deviation is of particular interest, because it clarifies that the breakdown does not depend on how the statistic is centered.

The results are presented in Table 3, showing that the nonparametric bootstrap tends to overestimate the true standard deviation, often by a factor of 2 or more. For comparison, the corresponding estimates obtained from the proposed bootstrap method are much more accurate, as can be seen from Tables 1 and 2. In addition, the nonparametric bootstrap can have difficulties with nonlinear spectral statistics in settings like those considered here. Numerical results along these lines can be found in the paper [13].

Table 3: Estimation of standard deviation of p(Tn(f)ϑn(f))p(T_{n}(f)-\vartheta_{n}(f)) with the nonparametric bootstrap
p/np/n ξ1\xi_{1} Σn\Sigma_{n} f(x)=x2f(x)=x^{2} f(x)=xlog(x)1f(x)=x-\log(x)-1
0.5 (i) 1 3.31 0.64
9.1(0.45) 6.08(0.27)
0.5 (i) 2 0.11 1.16
0.11(0.01) 6.11(0.28)
0.5 (i) 3 12.88 1.19
35.72(1.83) 6.23(0.31)
0.5 (i) 4 63.35 1.53
66.65(10.63) 6.25(0.3)
0.5 (i) 5 5.03 0.7
10.35(0.63) 6.11(0.27)

4.3 Nonlinear spectral statistics

This subsection looks at how well the proposed bootstrap handles nonlinear spectral statistics. The statistics under consideration here are the largest sample eigenvalue λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}), and the leading eigengap λ1(Σ^n)λ2(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})-\lambda_{2}(\widehat{\Sigma}_{n}). The underlying experiments for these statistics were designed in the same manner as in Section 4.2, and Tables 4 and 5 display the results in the same format as Tables 1 and 2. To a large extent, the favorable patterns that were noted in the results for linear spectral statistics are actually enhanced in the results for nonlinear spectral statistics—in the sense that the bias and standard deviations of the bootstrap estimates are generally smaller here than before. Also, in contrast to linear spectral statistics, the results for nonlinear spectral statistics are relatively unaffected by choice 4 for Σn\Sigma_{n}. Lastly, under choice (iv) for ξ1\xi_{1}, the accuracy for nonlinear spectral statistics is reduced in comparison to other choices of ξ1\xi_{1}. Nevertheless, the reduction in accuracy under choice (iv) is less pronounced here than it was in the context of linear spectral statistics. This makes sense in light of the fact that the statistics λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}) and λ1(Σ^n)λ2(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})-\lambda_{2}(\widehat{\Sigma}_{n}) do not involve the centering parameter ϑn(f)\vartheta_{n}(f) that was used for linear spectral statistics. (Recall from Section 4.2 that the reduced accuracy for linear spectral statistics under choice (iv) appeared to be related to ϑn(f)\vartheta_{n}(f).)
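For readers wishing to reproduce experiments of this kind, the following sketch combines the bootstrap sampling step with the two nonlinear statistics above. It assumes, based on the Gamma draw described in Section 4.2, that Algorithm 1 generates (ξ*)² from a Gamma distribution with mean pp and variance ς̂n²; the function names are ours.

```python
import numpy as np

def bootstrap_top_eigs(Sigma_tilde, varsigma2_hat, n, B=250, rng=None):
    """For each of B bootstrap samples, return the largest sample
    eigenvalue and the leading eigengap of the bootstrap sample
    covariance matrix."""
    rng = np.random.default_rng() if rng is None else rng
    p = Sigma_tilde.shape[0]
    w, V = np.linalg.eigh(Sigma_tilde)
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    # Gamma law with mean p and variance varsigma2_hat
    shape, scale = p ** 2 / varsigma2_hat, varsigma2_hat / p
    out = np.empty((B, 2))
    for b in range(B):
        G = rng.standard_normal((n, p))
        U = G / np.linalg.norm(G, axis=1, keepdims=True)  # uniform on sphere
        xi = np.sqrt(rng.gamma(shape, scale, n))
        X = xi[:, None] * U @ S_half
        lam = np.sort(np.linalg.eigvalsh(X.T @ X / n))[::-1]
        out[b] = lam[0], lam[0] - lam[1]
    return out  # columns: lambda_1, leading eigengap
```

Quantiles of the two columns then serve as bootstrap estimates for the distributions of the corresponding statistics.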

Table 4: Results for λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 2.9 0.06 3 3.96 0.06 4.07 4.9 0.06 5.01
2.93(0.05) 0.06(0.01) 3.04(0.07) 3.99(0.04) 0.06(0.01) 4.1(0.06) 4.93(0.04) 0.07(0.01) 5.05(0.06)
2 0.8 0.05 0.89 0.8 0.05 0.89 0.81 0.05 0.9
0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.05) 0.05(0.01) 0.89(0.06) 0.81(0.05) 0.05(0.01) 0.9(0.06)
(i) 3 5.76 0.11 5.95 7.91 0.12 8.12 9.81 0.13 10.03
5.81(0.08) 0.12(0.02) 6.02(0.11) 7.98(0.09) 0.13(0.02) 8.21(0.13) 9.87(0.1) 0.14(0.03) 10.12(0.14)
4 21.35 1.47 23.81 41.79 2.87 46.62 62.34 4.31 69.6
21.21(1.5) 1.47(0.13) 23.68(1.71) 41.8(2.73) 2.9(0.23) 46.7(3.09) 62.34(4.19) 4.3(0.35) 69.62(4.73)
5 5.62 0.35 6.21 6.25 0.34 6.81 6.88 0.33 7.44
5.61(0.33) 0.35(0.03) 6.2(0.38) 6.22(0.33) 0.34(0.03) 6.79(0.37) 6.87(0.33) 0.33(0.03) 7.43(0.38)
1 2.96 0.07 3.08 4.02 0.07 4.14 4.96 0.07 5.08
3.03(0.07) 0.07(0.01) 3.15(0.09) 4.09(0.07) 0.07(0.01) 4.22(0.09) 5.04(0.06) 0.08(0.01) 5.17(0.08)
2 0.8 0.06 0.89 0.8 0.05 0.89 0.8 0.05 0.9
0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.06) 0.05(0.01) 0.89(0.06) 0.81(0.06) 0.05(0.01) 0.9(0.06)
(ii) 3 5.88 0.13 6.11 8.04 0.14 8.26 9.93 0.14 10.16
6.03(0.13) 0.14(0.02) 6.27(0.16) 8.2(0.12) 0.15(0.02) 8.47(0.15) 10.1(0.12) 0.16(0.02) 10.37(0.16)
4 21.34 1.51 23.86 41.8 2.92 46.72 62.32 4.3 69.65
21.44(1.48) 1.51(0.13) 23.99(1.69) 41.87(3) 2.92(0.25) 46.79(3.38) 62.37(4.37) 4.34(0.38) 69.67(4.99)
5 5.65 0.35 6.24 6.28 0.35 6.86 6.9 0.34 7.48
5.67(0.36) 0.36(0.03) 6.27(0.41) 6.31(0.35) 0.35(0.03) 6.9(0.4) 6.91(0.35) 0.34(0.03) 7.48(0.4)
1 2.88 0.05 2.97 3.94 0.06 4.04 4.88 0.06 4.99
2.89(0.03) 0.06(0.01) 2.98(0.05) 3.95(0.03) 0.06(0.01) 4.06(0.05) 4.89(0.03) 0.06(0.01) 5(0.05)
2 0.8 0.05 0.89 0.8 0.05 0.89 0.8 0.05 0.9
0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.05) 0.05(0.01) 0.89(0.06) 0.8(0.05) 0.05(0.01) 0.89(0.06)
(iii) 3 5.72 0.1 5.9 7.88 0.12 8.08 9.77 0.12 9.99
5.74(0.07) 0.11(0.02) 5.94(0.11) 7.9(0.08) 0.12(0.02) 8.12(0.13) 9.79(0.08) 0.13(0.02) 10.01(0.11)
4 21.39 1.47 23.84 41.83 2.88 46.61 62.14 4.34 69.51
21.39(1.38) 1.47(0.12) 23.87(1.56) 41.74(2.95) 2.87(0.25) 46.57(3.35) 61.71(4.39) 4.25(0.37) 68.87(4.96)
5 5.61 0.34 6.17 6.24 0.34 6.82 6.87 0.34 7.44
5.59(0.34) 0.34(0.03) 6.17(0.38) 6.23(0.35) 0.34(0.03) 6.8(0.39) 6.88(0.34) 0.34(0.03) 7.45(0.39)
1 3.24 0.18 3.55 4.77 0.42 5.48 6.27 0.67 7.51
3.39(0.22) 0.12(0.03) 3.59(0.27) 5(0.46) 0.18(0.05) 5.31(0.54) 6.59(0.64) 0.25(0.07) 7.03(0.75)
2 0.8 0.06 0.9 0.8 0.06 0.91 0.81 0.06 0.91
0.81(0.06) 0.06(0.01) 0.91(0.07) 0.81(0.06) 0.06(0.01) 0.91(0.07) 0.81(0.06) 0.06(0.01) 0.91(0.07)
(iv) 3 6.42 0.36 7.03 9.51 0.81 10.97 12.52 1.33 15.13
6.77(0.4) 0.24(0.06) 7.18(0.49) 10.17(0.81) 0.37(0.1) 10.82(0.97) 13.55(1.35) 0.54(0.16) 14.48(1.6)
4 21.45 1.62 24.28 42.01 3.23 47.49 62.59 4.68 70.57
21.52(1.71) 1.62(0.16) 24.26(1.96) 42.08(3.22) 3.17(0.29) 47.44(3.66) 63.05(4.62) 4.76(0.43) 71.15(5.29)
5 5.71 0.39 6.37 6.48 0.4 7.18 7.33 0.51 8.14
5.79(0.39) 0.39(0.04) 6.45(0.45) 6.66(0.42) 0.39(0.04) 7.33(0.49) 7.69(0.45) 0.39(0.05) 8.37(0.53)
Table 5: Results for λ1(Σ^n)λ2(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n})-\lambda_{2}(\widehat{\Sigma}_{n})
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} mean sd 95th mean sd 95th mean sd 95th
1 0.09 0.05 0.18 0.1 0.05 0.19 0.1 0.06 0.21
0.1(0.02) 0.06(0.01) 0.2(0.05) 0.1(0.02) 0.06(0.01) 0.21(0.04) 0.11(0.02) 0.06(0.01) 0.22(0.04)
2 0.18 0.07 0.3 0.18 0.07 0.29 0.18 0.07 0.29
0.19(0.06) 0.06(0.01) 0.29(0.07) 0.18(0.06) 0.06(0.01) 0.29(0.07) 0.19(0.06) 0.06(0.01) 0.3(0.07)
(i) 3 0.18 0.1 0.35 0.19 0.1 0.38 0.2 0.11 0.41
0.2(0.04) 0.11(0.02) 0.4(0.08) 0.22(0.05) 0.12(0.03) 0.44(0.1) 0.23(0.05) 0.13(0.03) 0.47(0.11)
4 18.78 1.47 21.26 38.26 2.87 43.07 57.95 4.31 65.22
18.62(1.5) 1.47(0.13) 21.09(1.71) 38.24(2.74) 2.9(0.23) 43.15(3.09) 57.94(4.19) 4.3(0.35) 65.22(4.73)
5 2.78 0.35 3.38 2.32 0.35 2.9 2.01 0.34 2.58
2.75(0.33) 0.35(0.03) 3.34(0.38) 2.28(0.33) 0.34(0.03) 2.85(0.37) 1.98(0.33) 0.34(0.03) 2.55(0.38)
1 0.1 0.05 0.2 0.1 0.05 0.2 0.11 0.06 0.21
0.11(0.02) 0.06(0.01) 0.22(0.05) 0.11(0.02) 0.06(0.01) 0.23(0.05) 0.12(0.02) 0.07(0.01) 0.24(0.04)
2 0.18 0.07 0.29 0.18 0.07 0.29 0.18 0.07 0.29
0.18(0.06) 0.06(0.01) 0.29(0.07) 0.19(0.06) 0.06(0.01) 0.29(0.07) 0.19(0.06) 0.06(0.01) 0.3(0.07)
(ii) 3 0.19 0.1 0.38 0.2 0.11 0.41 0.21 0.11 0.42
0.22(0.04) 0.12(0.02) 0.44(0.09) 0.23(0.05) 0.13(0.03) 0.48(0.1) 0.24(0.04) 0.13(0.03) 0.49(0.1)
4 18.72 1.5 21.24 38.21 2.92 43.11 57.88 4.3 65.19
18.76(1.48) 1.51(0.13) 21.3(1.68) 38.22(2.99) 2.92(0.25) 43.13(3.38) 57.87(4.37) 4.34(0.38) 65.17(4.98)
5 2.74 0.36 3.34 2.29 0.35 2.88 1.97 0.35 2.55
2.68(0.36) 0.36(0.03) 3.29(0.41) 2.24(0.35) 0.35(0.03) 2.84(0.4) 1.89(0.35) 0.34(0.03) 2.47(0.4)
1 0.09 0.05 0.17 0.09 0.05 0.19 0.1 0.06 0.21
0.09(0.02) 0.05(0.01) 0.19(0.04) 0.1(0.02) 0.05(0.01) 0.2(0.03) 0.11(0.01) 0.06(0.01) 0.21(0.03)
2 0.18 0.07 0.29 0.18 0.07 0.29 0.18 0.07 0.29
0.19(0.06) 0.06(0.01) 0.3(0.07) 0.18(0.06) 0.06(0.01) 0.29(0.07) 0.18(0.06) 0.06(0.01) 0.29(0.07)
(iii) 3 0.17 0.09 0.34 0.19 0.1 0.38 0.2 0.11 0.41
0.19(0.04) 0.1(0.02) 0.38(0.08) 0.21(0.05) 0.11(0.03) 0.42(0.1) 0.18(0.06) 0.06(0.01) 0.29(0.07)
4 18.84 1.48 21.29 38.31 2.88 43.1 57.77 4.34 65.18
18.83(1.38) 1.47(0.12) 21.32(1.57) 38.21(2.95) 2.88(0.25) 43.05(3.35) 57.32(4.39) 4.25(0.37) 64.49(4.96)
5 2.78 0.34 3.35 2.34 0.35 2.92 2.02 0.34 2.6
2.75(0.34) 0.35(0.03) 3.34(0.38) 2.31(0.35) 0.34(0.03) 2.9(0.39) 2.03(0.34) 0.34(0.03) 2.6(0.39)
1 0.17 0.14 0.39 0.31 0.34 0.91 0.5 0.54 1.5
0.17(0.08) 0.09(0.03) 0.34(0.14) 0.27(0.22) 0.14(0.06) 0.53(0.32) 0.4(0.33) 0.21(0.09) 0.78(0.47)
2 0.19 0.07 0.3 0.18 0.07 0.3 0.19 0.07 0.3
0.19(0.06) 0.07(0.01) 0.31(0.07) 0.19(0.06) 0.07(0.01) 0.3(0.08) 0.19(0.06) 0.07(0.01) 0.3(0.07)
(iv) 3 0.32 0.28 0.75 0.62 0.64 1.73 0.99 1.06 2.97
0.35(0.17) 0.19(0.07) 0.69(0.28) 0.6(0.44) 0.3(0.13) 1.15(0.64) 0.98(0.87) 0.45(0.21) 1.8(1.19)
4 18.59 1.61 21.41 37.75 3.22 43.23 57.01 4.69 65
18.45(1.69) 1.61(0.16) 21.18(1.93) 37.46(3.21) 3.16(0.29) 42.78(3.62) 56.94(4.6) 4.74(0.43) 64.99(5.25)
5 2.54 0.41 3.21 1.79 0.47 2.53 1.24 0.49 2.03
2.4(0.39) 0.39(0.04) 3.06(0.45) 1.58(0.41) 0.4(0.05) 2.27(0.47) 1.09(0.37) 0.4(0.06) 1.78(0.45)

4.4 Inference on stable rank

Confidence interval. Table 6 presents numerical results on the width and coverage probability of the bootstrap confidence interval ^\widehat{\mathcal{I}} (defined in (3.15)) for the stable rank parameter rnr_{n}. The results were computed from experiments based on the same design as in Section 4.2, with every interval having a nominal coverage probability of 95%95\%. Because the width of the interval scales with the value of rnr_{n}, we report the width as a percentage of rnr_{n}. For example, the upper right corner of Table 6 shows that under covariance setting 1 and distribution (i) with p/n=1.5p/n=1.5, the average width of the interval over repeated experiments is 1.97% of rnr_{n}, with a standard deviation of 0.12%. Under choices 1, 3, and 5 for Σn\Sigma_{n}, the width is typically quite small as a percentage of rnr_{n}. By contrast, the percentage is larger in cases 2 and 4, which seems to occur because rnr_{n} is smaller in these cases than in cases 1, 3, and 5. Lastly, turning to coverage probability, the table shows good general agreement with the nominal level: among the 60 distinct settings, there are only a few where the coverage probability differs from the nominal level by more than 2%.

Table 6: Results for the stable rank confidence interval (95% nominal coverage probability)
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=1.5p/n=1.5
ξ1\xi_{1} Σn\Sigma_{n} width/rn/r_{n} (%)(\%) coverage (%)(\%) width/rn/r_{n} (%)(\%) coverage (%)(\%) width/rn/r_{n} (%)(\%) coverage (%)(\%)
(i) 1 2.00(0.12) 95.20 1.96(0.12) 95.40 1.97(0.12) 93.80
2 14.53(1.19) 95.40 16.99(1.32) 91.60 18.7(1.34) 95.40
3 2.00(0.12) 95.00 1.98(0.12) 97.40 1.97(0.12) 95.40
4 34.68(2.87) 94.00 41.55(4.30) 94.00 43.82(5.33) 93.80
5 5.18(0.65) 94.80 3.24(0.35) 95.20 2.64(0.24) 94.60
(ii) 1 2.07(0.12) 94.00 2.01(0.13) 92.80 1.98(0.12) 93.80
2 14.68(1.17) 94.60 17.10(1.34) 95.00 18.75(1.42) 93.20
3 2.08(0.13) 95.20 2.02(0.12) 95.20 2.00(0.13) 93.40
4 34.86(2.76) 93.40 41.47(4.09) 94.80 44.99(5.34) 94.00
5 5.12(0.63) 94.00 3.22(0.33) 93.40 2.63(0.24) 94.00
(iii) 1 1.96(0.12) 94.40 1.96(0.11) 96.00 1.96(0.11) 95.80
2 14.45(1.11) 93.40 17.01(1.27) 94.60 18.57(1.42) 94.00
3 1.97(0.12) 95.00 1.96(0.12) 93.20 1.96(0.12) 93.00
4 34.71(2.67) 93.80 41.49(4.38) 92.80 44.37(5.21) 92.40
5 5.17(0.64) 94.20 3.22(0.33) 95.80 2.61(0.24) 95.20
(iv) 1 2.38(0.21) 95.00 2.32(0.19) 94.80 2.31(0.25) 93.80
2 15.31(1.19) 97.40 17.86(1.42) 93.80 19.53(1.49) 94.00
3 2.4(0.20) 96.00 2.37(0.24) 95.00 2.33(0.27) 94.20
4 35.43(2.90) 93.60 42.01(4.33) 93.60 44.59(5.06) 93.40
5 5.20(0.64) 93.20 3.26(0.33) 93.00 2.73(0.29) 93.40

Stable rank test. Here, we discuss numerical results for an instance of the testing problem (3.16) given by

𝖧0,n:rnp0.1 vs. 𝖧1,n:rnp>0.1.\mathsf{H}_{0,n}:\textstyle\frac{r_{n}}{p}\leq 0.1\ \ \ \text{\ \ \ \ \ \ \ vs. \ \ \ \ \ \ \ }\ \ \ \mathsf{H}_{1,n}:\textstyle\frac{r_{n}}{p}>0.1. (4.1)

As a way to unify the study of level and power, we modified the experiments from Section 4.2 as follows. We rescaled the leading 15 eigenvalues in setting 1 to tune the ratio rn/pr_{n}/p within the grid {0.0980,0.0985,,0.1045,0.1050}\{0.0980,0.0985,\dots,0.1045,0.1050\}. More precisely, the eigenvalues of Σn\Sigma_{n} were taken to be of the form λj(Σn)=4s/3\lambda_{j}(\Sigma_{n})=4s/3 for j{1,,5}j\in\{1,\dots,5\}, λj(Σn)=s\lambda_{j}(\Sigma_{n})=s for j{6,,15}j\in\{6,\dots,15\}, and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for j{16,,p}j\in\{16,\dots,p\}, with different values of ss being chosen to produce values of rn/pr_{n}/p matching the stated gridpoints. Hence, gridpoints less than 0.10.1 correspond to 𝖧0,n\mathsf{H}_{0,n}, and gridpoints larger than 0.10.1 correspond to 𝖧1,n\mathsf{H}_{1,n}.
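To make the eigenvalue calibration concrete, the following sketch solves for ss by bisection so that rn/pr_{n}/p matches a given gridpoint. It is only an illustration: it assumes the stable rank is given by rn=tr(Σn)2/tr(Σn2)r_{n}=\textup{tr}(\Sigma_{n})^{2}/\textup{tr}(\Sigma_{n}^{2}) (the definition from Section 3.4 is not reproduced here), and the choice p=500p=500 is arbitrary.

```python
import numpy as np

def stable_rank_ratio(s, p):
    """r_n / p for the spiked eigenvalue profile used for (4.1):
    4s/3 (5 times), s (10 times), 1 (p - 15 times),
    with r_n taken to be tr(Sigma)^2 / tr(Sigma^2) (assumed definition)."""
    lam = np.concatenate([np.full(5, 4 * s / 3), np.full(10, s), np.ones(p - 15)])
    return lam.sum() ** 2 / (lam ** 2).sum() / p

def solve_scale(target, p, lo=1.0, hi=200.0):
    """Bisection for s such that r_n/p = target; the ratio decreases in s
    over this range, starting near 1 at s = 1."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if stable_rank_ratio(mid, p) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = 500
s = solve_scale(0.1, p)
print(round(stable_rank_ratio(s, p), 6))  # → 0.1
```

The same routine applied to each gridpoint in the stated grid produces the full family of covariance settings.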

At each gridpoint, we performed experiments based on the design of those in Section 4.2, allowing p/np/n to take the values 0.5,1.0,1.50.5,1.0,1.5, and allowing the distribution of ξ1\xi_{1} to be of the types (i), (ii), (iii), and (iv). For each such setting, we applied the relevant bootstrap test from Section 3.4.2 at a 5% nominal level to 500 datasets, and then we recorded the rejection rate over the 500 trials. Figures 1, 2, 3, and 4 display the results by plotting the rejection rate as a function of the ratio rn/pr_{n}/p. The separate figures correspond to choices of the distribution of ξ1\xi_{1}, and within each figure, three colored curves correspond to choices of p/np/n, as indicated in the legend. In addition, all the plots include a dashed horizontal line to signify the 5% nominal level.

An important feature that is shared by all the curves is that they stay below the nominal 5% level for essentially every value of rn/p<0.1r_{n}/p<0.1, which corroborates our theoretical bound (3.20) in Theorem 3. Furthermore, when rn/p=0.1r_{n}/p=0.1, the curves are mostly quite close to 5%, demonstrating that the testing procedure is well calibrated. For values of rn/p>0.1r_{n}/p>0.1, the procedure exhibits substantial power, with the curve corresponding to p/n=0.5p/n=0.5 achieving approximately 100% power at rn/p=0.105r_{n}/p=0.105 for every choice of ξ1\xi_{1}. In the cases of p/n{1.0,1.5}p/n\in\{1.0,1.5\}, the procedure still retains power, but with some diminution, as might be anticipated in settings of higher dimension.

[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 1: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a Chi-Squared distribution with pp degrees of freedom.
[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 2: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a Beta-Prime(p(1+p+8)8,1+p+168)\left(\frac{p(1+p+8)}{8},\frac{1+p+16}{8}\right) distribution.
[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 3: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a (p+4)Beta(p/2,2)(p+4)\textup{Beta}(p/2,2) distribution.
[Plot: rejection rate (%) versus rn/pr_{n}/p (%)]
Figure 4: Results for the testing problem (4.1) when ξ12\xi_{1}^{2} follows a 9p10F(p,20)\frac{9p}{10}\textup{F}(p,20) distribution.

4.5 Sphericity test

Let 𝖳sr=r^n/p1\mathsf{T}_{\text{sr}}=\widehat{r}_{n}/p-1 denote the statistic introduced in Section 3.4.2 for testing sphericity. We now provide numerical comparisons with three other testing procedures based on linear spectral statistics. To define these other procedures, let

𝒮^n=pni=1n𝐱i𝐱i𝐱i22\widehat{\mathcal{S}}_{n}=\frac{p}{n}\sum_{i=1}^{n}\frac{\mathbf{x}_{i}\mathbf{x}_{i}^{\top}}{\|\mathbf{x}_{i}\|_{2}^{2}} (4.2)

denote the sample covariance matrix of rescaled observations, and consider the following three statistics,

𝖳1\displaystyle\mathsf{T}_{1} =1ptr(𝒮^n2)cn1\displaystyle=\textstyle\frac{1}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{2})-c_{n}-1 (4.3)
𝖳2\displaystyle\mathsf{T}_{2} =1ptr(𝒮^n4)4cnptr(𝒮^n3)2cn(1ptr(𝒮^n2))2+10cn2ptr(𝒮^n2)5cn31\displaystyle=\textstyle\frac{1}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{4})-\textstyle\frac{4c_{n}}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{3})-2c_{n}(\textstyle\frac{1}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{2}))^{2}+\textstyle\frac{10c_{n}^{2}}{p}\textup{tr}(\widehat{\mathcal{S}}_{n}^{2})-5c_{n}^{3}-1 (4.4)
𝖳3\displaystyle\mathsf{T}_{3} =max{n𝖳1+12,n𝖳2+6cn8(18+12cn+cn2)}.\displaystyle=\max\Big{\{}\textstyle\frac{n\mathsf{T}_{1}+1}{2}\,,\,\frac{n\mathsf{T}_{2}+6-c_{n}}{\sqrt{8(18+12c_{n}+c_{n}^{2})}}\Big{\}}. (4.5)

The testing procedures corresponding to these statistics reject the sphericity hypothesis when the statistics take large values. The first two statistics can be attributed in part to the papers [43] and [15], and the proposal of taking the maximum was made in [45]. However, in all of these works, an ordinary sample covariance matrix was used in place of 𝒮^n\widehat{\mathcal{S}}_{n}. Variants of the definition of 𝖳1\mathsf{T}_{1} in (4.3) have been studied in [39] and references therein, while the definitions of 𝖳2\mathsf{T}_{2} and 𝖳3\mathsf{T}_{3} in (4.4) and (4.5) were proposed in [18]. The latter paper also derived the limiting null distributions of all three statistics in high-dimensional elliptical models.
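A minimal numpy sketch of the quantities in (4.2)–(4.5) may help fix ideas; it is a plain transcription of the displayed formulas, not an implementation from the papers cited above. Note that tr(𝒮^n)=p\textup{tr}(\widehat{\mathcal{S}}_{n})=p holds by construction, which gives a simple sanity check.

```python
import numpy as np

def sphericity_stats(X):
    """Compute S_hat from (4.2) and the statistics T1, T2, T3 from
    (4.3)-(4.5) for an n x p data matrix X whose rows are the x_i."""
    n, p = X.shape
    c = p / n
    U = X / np.linalg.norm(X, axis=1, keepdims=True)   # rows x_i / ||x_i||_2
    S = (p / n) * U.T @ U                              # (4.2)
    tr = lambda k: np.trace(np.linalg.matrix_power(S, k)) / p
    T1 = tr(2) - c - 1                                 # (4.3)
    T2 = (tr(4) - 4 * c * tr(3) - 2 * c * tr(2) ** 2
          + 10 * c ** 2 * tr(2) - 5 * c ** 3 - 1)      # (4.4)
    T3 = max((n * T1 + 1) / 2,
             (n * T2 + 6 - c) / np.sqrt(8 * (18 + 12 * c + c ** 2)))  # (4.5)
    return S, T1, T2, T3

rng = np.random.default_rng(0)
S, T1, T2, T3 = sphericity_stats(rng.standard_normal((200, 100)))
print(round(np.trace(S), 6))  # → 100.0, since tr(S_hat) = p by construction
```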

Since numerical comparisons of the statistics 𝖳1\mathsf{T}_{1}, 𝖳2\mathsf{T}_{2}, and 𝖳3\mathsf{T}_{3} were given previously in [18], our experiments here are designed using similar settings. Under the null hypothesis, we generated data from a standard multivariate normal distribution in 15 cases, corresponding to 3 choices of p/n{0.5,1,2}p/n\in\{0.5,1,2\} and 5 choices of n{100,200,300,400,500}n\in\{100,200,300,400,500\}. For the statistics 𝖳1\mathsf{T}_{1}, 𝖳2\mathsf{T}_{2}, and 𝖳3\mathsf{T}_{3}, we used the analytical critical values derived previously in [18], and for the statistic 𝖳sr\mathsf{T}_{\text{sr}}, we determined its critical value using the proposed bootstrap with B=500B=500. For each setting under the null hypothesis, we generated 50000 datasets and calculated the empirical level of each test as the fraction of rejections among the 50000 trials. The results corresponding to a nominal level of 5% are displayed in Table 7, which shows that the empirical and nominal levels are in close agreement for all four statistics.
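Operationally, a bootstrap critical value is just an empirical quantile of BB resampled statistics. The sketch below shows only this generic final step; `draw_bootstrap_stat` is a placeholder standing in for the paper's resampling scheme for 𝖳sr\mathsf{T}_{\text{sr}}, which is not reproduced here.

```python
import numpy as np

def bootstrap_critical_value(draw_bootstrap_stat, B=500, alpha=0.05, rng=None):
    """Generic (1 - alpha) quantile of B bootstrap replicates.
    `draw_bootstrap_stat` is a stand-in for the paper's parametric
    bootstrap resampler and is NOT part of the original method as stated."""
    rng = rng if rng is not None else np.random.default_rng()
    stats = np.array([draw_bootstrap_stat(rng) for _ in range(B)])
    return np.quantile(stats, 1 - alpha)

# Illustration with a placeholder resampler (standard normal draws):
crit = bootstrap_critical_value(lambda rng: rng.standard_normal(), B=500,
                                rng=np.random.default_rng(3))
```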

Regarding the alternative hypothesis, we retained all the settings described above, except that we replaced the null covariance matrix Σn=Ip\Sigma_{n}=I_{p} with a diagonal spiked covariance matrix such that λj(Σn)=1.3\lambda_{j}(\Sigma_{n})=1.3 for all 1jp/21\leq j\leq p/2, and λj(Σn)=1\lambda_{j}(\Sigma_{n})=1 for all other jj. This choice has the benefit that it creates variation in the numerical values of power, so that they are not too concentrated near 1. Similar alternatives were also used for the experiments in [18]. The results are presented in Table 8, which is organized in the same format as Table 7. In each setting, the power of the statistic 𝖳sr\mathsf{T}_{\text{sr}} approximately matches the highest power achieved among 𝖳1\mathsf{T}_{1}, 𝖳2\mathsf{T}_{2}, and 𝖳3\mathsf{T}_{3}.
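For reference, the departure from sphericity induced by this alternative can be computed in closed form, again under the assumption that rn=tr(Σn)2/tr(Σn2)r_{n}=\textup{tr}(\Sigma_{n})^{2}/\textup{tr}(\Sigma_{n}^{2}): the ratio rn/pr_{n}/p equals 1.152/1.3450.9831.15^{2}/1.345\approx 0.983 for every even pp.

```python
import numpy as np

p = 400
lam = np.where(np.arange(p) < p // 2, 1.3, 1.0)   # spiked alternative eigenvalues
ratio = lam.sum() ** 2 / ((lam ** 2).sum() * p)   # r_n / p (assumed definition)
print(round(ratio, 4))  # → 0.9833
```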

Table 7: Results on empirical level for sphericity tests (5% nominal level)
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=2p/n=2
nn 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}}
100 0.049 0.048 0.051 0.050 0.049 0.049 0.051 0.054 0.048 0.051 0.053 0.051
200 0.049 0.050 0.052 0.057 0.043 0.047 0.048 0.047 0.051 0.052 0.054 0.051
300 0.046 0.048 0.051 0.046 0.046 0.047 0.048 0.048 0.051 0.054 0.059 0.052
400 0.048 0.051 0.050 0.051 0.049 0.056 0.051 0.054 0.051 0.049 0.053 0.054
500 0.049 0.045 0.048 0.052 0.050 0.046 0.048 0.053 0.044 0.045 0.044 0.046
Table 8: Results on power for sphericity tests
p/n=0.5p/n=0.5 p/n=1p/n=1 p/n=2p/n=2
nn 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}} 𝖳1\mathsf{T}_{1} 𝖳2\mathsf{T}_{2} 𝖳3\mathsf{T}_{3} 𝖳sr\mathsf{T}_{\text{sr}}
100 0.190 0.165 0.185 0.217 0.201 0.154 0.190 0.211 0.208 0.136 0.194 0.207
200 0.497 0.391 0.466 0.502 0.513 0.350 0.473 0.503 0.503 0.271 0.465 0.520
300 0.793 0.660 0.766 0.797 0.801 0.589 0.763 0.812 0.793 0.458 0.752 0.804
400 0.952 0.863 0.941 0.952 0.951 0.790 0.933 0.954 0.958 0.670 0.941 0.949
500 0.993 0.958 0.990 0.994 0.993 0.922 0.989 0.993 0.993 0.816 0.987 0.995

5 Conclusion

Up to now, high-dimensional elliptical models have generally fallen outside the scope of existing bootstrap methods for spectral statistics. In the current paper, we have addressed this problem by showing how a parametric bootstrap approach that is specialized to IC models [34] can be extended to elliptical models in high dimensions. In addition, we have shown that the new method is supported by two types of theoretical guarantees in the elliptical setting: First, the method consistently approximates the distributions of linear spectral statistics (Theorem 2). Second, the method can be applied to a nonlinear combination of these statistics to construct asymptotically valid confidence intervals and hypothesis tests (Theorem 3). From a practical perspective, a valuable property of the method is its user-friendliness, since it can be applied to generic spectral statistics in an automatic way. In particular, this provides the user with the flexibility to easily explore spectral statistics whose asymptotic distributions are analytically inconvenient or unknown. With regard to empirical performance, we have presented extensive simulation results, showing that with few exceptions, the method accurately approximates the distributions of both linear and nonlinear spectral statistics across many settings and inference tasks. An interesting question for future work is to determine if a consistent parametric bootstrap method for spectral statistics can be developed within more general models that unify IC and elliptical models.

Appendices

In Appendices A, B, and C, we give the proofs of Theorems 1, 2, and 3, respectively. Background results are given in Appendix D, and additional details related to the examples in Section 3.1 are given in Appendix E.

Appendix A Proof of Theorem 1

Our consistency guarantees for ς^n2\widehat{\varsigma}_{n}^{2} and H~n\tilde{H}_{n} are proven separately in the next two subsections.

A.1 Consistency of ς^n 2\widehat{\varsigma}_{n}^{\,2}: Proof of (3.7) in Theorem 1

Define the parameter

τn=(p+2)(βn2αn)γn+2αn+2,\displaystyle\tau_{n}=\frac{(p+2)(\beta_{n}-2\alpha_{n})}{\gamma_{n}+2\alpha_{n}}+2, (A.1)

and the estimate

τ^n=(p+2)(β^n2α^n)γ^n+2α^n+2.\displaystyle\widehat{\tau}_{n}=\frac{(p+2)(\widehat{\beta}_{n}-2\widehat{\alpha}_{n})}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}+2. (A.2)

Based on the definitions of ςn2\varsigma_{n}^{2} and ς^n2\widehat{\varsigma}_{n}^{2}, note that

ςn2p=τn and ς^n2p=max{τ^n,0},\textstyle\frac{\varsigma_{n}^{2}}{p}=\tau_{n}\ \ \ \text{ and }\ \ \ \ \textstyle\frac{\widehat{\varsigma}_{n}^{2}}{p}=\max\{\widehat{\tau}_{n},0\},

as well as the fact that Assumption 1 implies τnτ\tau_{n}\to\tau as nn\to\infty. Since the function xmax{x,0}x\mapsto\max\{x,0\} is 1-Lipschitz and τn0\tau_{n}\geq 0, it suffices to show τ^nτn0\widehat{\tau}_{n}-\tau_{n}\xrightarrow{\mathbb{P}}0.
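For concreteness, the plug-in computation of τ^n\widehat{\tau}_{n} can be sketched as follows, using the estimators recalled later in Lemmas A.1–A.3 (α^n=tr(Σ^n2)1ntr(Σ^n)2\widehat{\alpha}_{n}=\textup{tr}(\widehat{\Sigma}_{n}^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}, γ^n=tr(Σ^n)2\widehat{\gamma}_{n}=\textup{tr}(\widehat{\Sigma}_{n})^{2}, and β^n\widehat{\beta}_{n} the sample variance of 𝐱i22\|\mathbf{x}_{i}\|_{2}^{2}). The observations are assumed centered, the sample sizes are arbitrary, and the Gaussian test case uses the fact that βn=2αn\beta_{n}=2\alpha_{n} (hence τn=2\tau_{n}=2) for normal data.

```python
import numpy as np

def tau_hat(X):
    """Plug-in estimate (A.2), built from the estimators recalled in
    Lemmas A.1-A.3; the rows of X are assumed centered."""
    n, p = X.shape
    Sig = X.T @ X / n                                  # Sigma_hat
    alpha = np.trace(Sig @ Sig) - np.trace(Sig) ** 2 / n
    gamma = np.trace(Sig) ** 2
    beta = np.var((X ** 2).sum(axis=1), ddof=1)        # sample variance of ||x_i||^2
    return (p + 2) * (beta - 2 * alpha) / (gamma + 2 * alpha) + 2

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 50))   # Gaussian case: tau_n = 2 exactly
t = tau_hat(X)                        # should be close to 2 here
varsigma2 = 50 * max(t, 0.0)          # varsigma_hat_n^2 = p * max{tau_hat_n, 0}
```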

In Lemmas A.1, A.2, and A.3 given later in this subsection, the following three limits will be established,

α^nαn 1\displaystyle\frac{\widehat{\alpha}_{n}}{\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 1 (A.3)
γ^nγn 1\displaystyle\frac{\widehat{\gamma}_{n}}{\gamma_{n}}\ \xrightarrow{\mathbb{P}}\ 1 (A.4)
(p+2)β^nβnγn+2αn 0.\displaystyle(p+2)\frac{\widehat{\beta}_{n}-\beta_{n}}{\gamma_{n}+2\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 0. (A.5)

Due to the ratio-consistency of α^n\widehat{\alpha}_{n} and γ^n\widehat{\gamma}_{n}, as well as the fact that (p+2)αnγn+2αn1\frac{(p+2)\alpha_{n}}{\gamma_{n}+2\alpha_{n}}\asymp 1 holds under Assumption 2, it is straightforward to check that

γn+2αnγ^n+2α^n1 and (p+2)αnγn+2αn(p+2)α^nγ^n+2α^n0.\frac{\gamma_{n}+2\alpha_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}\xrightarrow{\mathbb{P}}1\ \ \ \ \text{ and }\ \ \ \ \frac{(p+2)\alpha_{n}}{\gamma_{n}+2\alpha_{n}}-\frac{(p+2)\widehat{\alpha}_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}\xrightarrow{\mathbb{P}}0. (A.6)

Therefore,

τ^nτn=(p+2)β^nγn+2αn(γn+2αnγ^n+2α^n1)+(p+2)β^nβnγn+2αn+(2(p+2)αnγn+2αn2(p+2)α^nγ^n+2α^n)=(p+2)βnγn+2αno(1)+o(1),\begin{split}\widehat{\tau}_{n}-\tau_{n}&=\frac{(p+2)\widehat{\beta}_{n}}{\gamma_{n}+2\alpha_{n}}\left(\frac{\gamma_{n}+2\alpha_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}-1\right)+(p+2)\frac{\widehat{\beta}_{n}-\beta_{n}}{\gamma_{n}+2\alpha_{n}}+\bigg{(}2\frac{(p+2)\alpha_{n}}{\gamma_{n}+2\alpha_{n}}-2\frac{(p+2)\widehat{\alpha}_{n}}{\widehat{\gamma}_{n}+2\widehat{\alpha}_{n}}\bigg{)}\\ &=\frac{(p+2)\beta_{n}}{\gamma_{n}+2\alpha_{n}}o_{\mathbb{P}}(1)+o_{\mathbb{P}}(1),\end{split} (A.7)

where we have applied (A.5) twice in the second step. Under Assumption 1, we have 𝔼(ξ14)=p2+τp+o(p)\mathbb{E}(\xi_{1}^{4})=p^{2}+\tau p+o(p) and so Lemma D.1 gives

βn\displaystyle\beta_{n} =𝔼(ξ14)p(p+2)(tr(Σn)2+2tr(Σn2))tr(Σn)2\displaystyle\ =\ \textstyle\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\Big{(}\textup{tr}(\Sigma_{n})^{2}+2\textup{tr}(\Sigma_{n}^{2})\Big{)}\ -\ \textup{tr}(\Sigma_{n})^{2}
=p2+τp+o(p)p(p+2)(tr(Σn)2+2tr(Σn2))tr(Σn)2\displaystyle\ =\ \textstyle\frac{p^{2}+\tau p+o(p)}{p(p+2)}\Big{(}\textup{tr}(\Sigma_{n})^{2}+2\textup{tr}(\Sigma_{n}^{2})\Big{)}\ -\ \textup{tr}(\Sigma_{n})^{2}
=((τ2)p+o(p)p(p+2))tr(Σn)2+2(1+o(1))tr(Σn2)\displaystyle\ =\ \Big{(}\textstyle\frac{(\tau-2)p+o(p)}{p(p+2)}\Big{)}\textup{tr}(\Sigma_{n})^{2}+2(1+o(1))\textup{tr}(\Sigma_{n}^{2})
p.\displaystyle\ \lesssim\ p.

Consequently, we have (p+2)βnγn+2αn1\frac{(p+2)\beta_{n}}{\gamma_{n}+2\alpha_{n}}\lesssim 1, and applying this to (A.7) completes the proof of (3.7) in Theorem 1.∎

Lemma A.1.

If Assumptions 1 and 2 hold, then as nn\rightarrow\infty

α^nαn 1.\frac{\widehat{\alpha}_{n}}{\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 1.
Proof.

Recall that αn=tr(Σn2)\alpha_{n}=\textup{tr}(\Sigma_{n}^{2}) and that α^n=tr(Σ^n2)1ntr(Σ^n)2\widehat{\alpha}_{n}=\textup{tr}(\widehat{\Sigma}_{n}^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}. The two terms in the estimate can be expanded as

tr(Σ^n2)=1n2i=1nj=1n(𝐱i𝐱j)21ntr(Σ^n)2=1n3i=1nj=1n(𝐱i𝐱i)(𝐱j𝐱j),\begin{split}\textup{tr}(\widehat{\Sigma}_{n}^{2})&\ =\ \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\\[5.69046pt] \frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}&\ =\ \frac{1}{n^{3}}\sum_{i=1}^{n}\sum_{j=1}^{n}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})(\mathbf{x}_{j}^{\top}\mathbf{x}_{j}),\end{split} (A.8)

which leads to the algebraic relation

α^nαnαn=(1αnn2ij(𝐱i𝐱j)2 1)+1αnn3ij((𝐱i𝐱i)2(𝐱i𝐱i)(𝐱j𝐱j))=:An+Bn.\begin{split}\frac{\widehat{\alpha}_{n}-\alpha_{n}}{\alpha_{n}}&\ =\ \bigg{(}\frac{1}{\alpha_{n}n^{2}}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\ -\ 1\bigg{)}\ +\ \frac{1}{\alpha_{n}n^{3}}\sum_{i\neq j}\Big{(}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})^{2}-(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})(\mathbf{x}_{j}^{\top}\mathbf{x}_{j})\Big{)}\\[5.69046pt] &\ =:\ A_{n}+B_{n}.\end{split} (A.9)

In the remainder of the proof, we will show that AnA_{n} and BnB_{n} are both o(1)o_{\mathbb{P}}(1). We begin with the analysis of BnB_{n}, since it is simpler. Note that BnB_{n} is always non-negative, since it can be rewritten as

Bn=1αnn3i>j(𝐱i𝐱i𝐱j𝐱j)2,B_{n}\ =\ \textstyle\frac{1}{\alpha_{n}n^{3}}\sum_{i>j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i}-\mathbf{x}_{j}^{\top}\mathbf{x}_{j})^{2}, (A.10)

and so it suffices to show that 𝔼(Bn)=o(1)\mathbb{E}(B_{n})=o(1). Furthermore, the expectation of BnB_{n} can be computed directly as

𝔼(Bn)=1αnn3i>j2var(𝐱i𝐱i)=n(n1)αnn3var(𝐱1𝐱1)=n(n1)αnn3[(𝔼(ξ14)p(p+2)1)tr(Σn)2+ 2𝔼(ξ14)p(p+2)tr(Σn2)](Lemma D.1)=n(n1)αnn3[𝒪(1p)tr(Σn)2+ 2(1+𝒪(1p))tr(Σn2)](Assumption 1)1n,\begin{split}\mathbb{E}(B_{n})&\ =\ \textstyle\frac{1}{\alpha_{n}n^{3}}\sum_{i>j}2\,\operatorname{var}(\mathbf{x}_{i}^{\top}\mathbf{x}_{i})\\[5.69046pt] &\ =\ \textstyle\frac{n(n-1)}{\alpha_{n}n^{3}}\operatorname{var}(\mathbf{x}_{1}^{\top}\mathbf{x}_{1})\\[5.69046pt] &\ =\ \frac{n(n-1)}{\alpha_{n}n^{3}}\bigg{[}\Big{(}\textstyle\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}-1\Big{)}\textup{tr}(\Sigma_{n})^{2}\ +\ 2\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\textup{tr}(\Sigma_{n}^{2})\bigg{]}\ \ \ \ \ \text{(Lemma \ref{lem:quadform})}\\[5.69046pt] &\ =\ \frac{n(n-1)}{\alpha_{n}n^{3}}\bigg{[}\mathcal{O}\big{(}\textstyle\frac{1}{p}\big{)}\textup{tr}(\Sigma_{n})^{2}\ +\ 2\Big{(}1+\mathcal{O}\big{(}\textstyle\frac{1}{p}\big{)}\Big{)}\textup{tr}(\Sigma_{n}^{2})\bigg{]}\ \ \ \ \ \ \ \text{(Assumption \ref{Data generating model})}\\[5.69046pt] &\ \lesssim\ \textstyle\frac{1}{n},\end{split} (A.11)

where the last step uses tr(Σn)2αnp\frac{\textup{tr}(\Sigma_{n})^{2}}{\alpha_{n}}\leq p. Thus, 𝔼(Bn)=o(1)\mathbb{E}(B_{n})=o(1).
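The expansions in (A.8) and the rewriting of BnB_{n} in (A.10) are purely algebraic, so they can be checked numerically on a small example:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 5))      # rows x_1, ..., x_n
n = X.shape[0]
Sig = X.T @ X / n                    # Sigma_hat
G = X @ X.T                          # Gram matrix, G[i, j] = x_i^T x_j
d = np.diag(G)                       # d[i] = x_i^T x_i

# The two expansions in (A.8):
ok1 = np.isclose(np.trace(Sig @ Sig), (G ** 2).sum() / n ** 2)
ok2 = np.isclose(np.trace(Sig) ** 2 / n, np.outer(d, d).sum() / n ** 3)

# The rewriting in (A.10), dropping the common 1/(alpha_n n^3) factor:
off = ~np.eye(n, dtype=bool)
lhs = (d[:, None] ** 2 - np.outer(d, d))[off].sum()   # sum over i != j
rhs = 0.5 * ((d[:, None] - d[None, :]) ** 2).sum()    # sum over i > j of (d_i - d_j)^2
print(ok1, ok2, np.isclose(lhs, rhs))  # → True True True
```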

Now we handle the term AnA_{n} by showing that 𝔼(An2)=o(1)\mathbb{E}(A_{n}^{2})=o(1). It is helpful to start by noting that if iji\neq j, then

𝔼((𝐱i𝐱j)2)=tr(Σn2),\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2})=\textup{tr}(\Sigma_{n}^{2}), (A.12)

which can be checked by a direct calculation. (See the calculation in (D.5) for additional details.) Consequently, if we expand out the square An2A_{n}^{2} and then take the expectation, it follows that

𝔼(An2)=1αn2n4𝔼[(ij(𝐱i𝐱j)2)2]2n(n1)n2+ 1.\mathbb{E}(A_{n}^{2})\ =\ \frac{1}{\alpha_{n}^{2}n^{4}}\mathbb{E}\bigg{[}\Big{(}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\Big{)}^{2}\bigg{]}\ -\ \frac{2n(n-1)}{n^{2}}\ +\ 1. (A.13)

Next, we compute the second moment on the right as

𝔼[(ij(𝐱i𝐱j)2)2]=ijlk𝔼[(𝐱i𝐱j)2(𝐱l𝐱k)2].\mathbb{E}\bigg{[}\Big{(}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\Big{)}^{2}\bigg{]}\ =\ \displaystyle\sum_{i\neq j}\sum_{l\neq k}\mathbb{E}\Big{[}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2}\Big{]}. (A.14)

In Lemma D.2, it is shown that if (i,j,k,l)(i,j,k,l) are four distinct indices, then

𝔼((𝐱i𝐱j)4)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{4})\ = 3(𝔼(ξ14)p(p+2))2(αn2+2tr(Σn4))\displaystyle=\ 3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4})) (A.15)
𝔼((𝐱i𝐱j)2(𝐱i𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{i}^{\top}\mathbf{x}_{k})^{2})\ =𝔼(ξ14)p(p+2)(αn2+2tr(Σn4))\displaystyle=\ \frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4})) (A.16)
𝔼((𝐱i𝐱j)2(𝐱l𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2})\ =αn2.\displaystyle=\ \alpha_{n}^{2}. (A.17)

Note that in the double sum (A.14), the numbers of terms involving 2, 3, and 4 distinct indices are respectively 𝒪(n2)\mathcal{O}(n^{2}), 𝒪(n3)\mathcal{O}(n^{3}) and n4+𝒪(n3)n^{4}+\mathcal{O}(n^{3}). Applying these observations to (A.14), we have

1αn2n4𝔼[(ij(𝐱i𝐱j)2)2]=αn2αn2n4(n4+𝒪(n3))+1αn2n4(𝔼(ξ14)p(p+2)(αn2+2tr(Σn4)))𝒪(n3)+1αn2n4(3(𝔼(ξ14)p(p+2))2(αn2+2tr(Σn4)))𝒪(n2)= 1+𝒪(1n).\begin{split}\textstyle\frac{1}{\alpha_{n}^{2}n^{4}}\mathbb{E}\bigg{[}\Big{(}\sum_{i\neq j}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}\Big{)}^{2}\bigg{]}&\ =\ \textstyle\frac{\alpha_{n}^{2}}{\alpha_{n}^{2}n^{4}}\Big{(}n^{4}+\mathcal{O}(n^{3})\Big{)}\\ &\ \ \ \ \ +\ \textstyle\frac{1}{\alpha_{n}^{2}n^{4}}\Big{(}\textstyle\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4}))\Big{)}\mathcal{O}(n^{3})\\ &\ \ \ \ \ +\ \textstyle\frac{1}{\alpha_{n}^{2}n^{4}}\Big{(}3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}(\alpha_{n}^{2}+2\textup{tr}(\Sigma_{n}^{4}))\Big{)}\mathcal{O}(n^{2})\\[5.69046pt] &\ =\ 1+\mathcal{O}(\textstyle\frac{1}{n}).\end{split}

Combining this with (A.13), we reach the following bound on 𝔼(An2)\mathbb{E}(A_{n}^{2}),

𝔼(An2)= 1+𝒪(1n)2n(n1)n2+ 11n,\begin{split}\mathbb{E}(A_{n}^{2})&\ =\ 1+\mathcal{O}(\textstyle\frac{1}{n})-\frac{2n(n-1)}{n^{2}}\ +\ 1\\ &\ \lesssim\ \textstyle\frac{1}{n},\end{split} (A.18)

which completes the proof. ∎

Lemma A.2.

If Assumptions 1 and 2 hold, then as nn\to\infty

γ^nγn 1.\frac{\widehat{\gamma}_{n}}{\gamma_{n}}\ \xrightarrow{\mathbb{P}}\ 1.
Proof.

Recall γn=tr(Σn)2\gamma_{n}=\textup{tr}(\Sigma_{n})^{2} and γ^n=tr(Σ^n)2\widehat{\gamma}_{n}=\textup{tr}(\widehat{\Sigma}_{n})^{2}, and note that the algebraic identity a2b2=2b(ab)+(ab)2a^{2}-b^{2}=2b(a-b)+(a-b)^{2} gives

𝔼|γ^nγn|\displaystyle\mathbb{E}|\widehat{\gamma}_{n}-\gamma_{n}| =𝔼|2tr(Σn)(tr(Σ^n)tr(Σn))+(tr(Σ^n)tr(Σn))2|\displaystyle\ =\ \mathbb{E}\Big{|}2\textup{tr}(\Sigma_{n})\big{(}\textup{tr}(\widehat{\Sigma}_{n})-\textup{tr}(\Sigma_{n})\big{)}+\big{(}\textup{tr}(\widehat{\Sigma}_{n})-\textup{tr}(\Sigma_{n})\big{)}^{2}\Big{|}
2tr(Σn)(var(tr(Σ^n)))1/2+var(tr(Σ^n)).\displaystyle\ \leq\ 2\textup{tr}(\Sigma_{n})\big{(}\operatorname{var}(\textup{tr}(\widehat{\Sigma}_{n}))\big{)}^{1/2}+\operatorname{var}(\textup{tr}(\widehat{\Sigma}_{n})).

Next, observe that var(tr(Σ^n))=1nvar(𝐱1𝐱1)\operatorname{var}(\textup{tr}(\widehat{\Sigma}_{n}))=\frac{1}{n}\operatorname{var}(\mathbf{x}_{1}^{\top}\mathbf{x}_{1}), and that the calculation in (A.11) shows var(𝐱1𝐱1)p\operatorname{var}(\mathbf{x}_{1}^{\top}\mathbf{x}_{1})\lesssim p. Combining this with the fact that γnp2\gamma_{n}\asymp p^{2} under Assumption 2, we have

𝔼|γ^nγn|γntr(Σn)+1p2\frac{\mathbb{E}|\widehat{\gamma}_{n}-\gamma_{n}|}{\gamma_{n}}\ \lesssim\ \frac{\textup{tr}(\Sigma_{n})+1}{p^{2}} (A.19)

which leads to the stated result.∎

Lemma A.3.

If Assumptions 1 and 2 hold, then as nn\to\infty,

(p+2)β^nβnγn+2αn 0.(p+2)\frac{\widehat{\beta}_{n}-\beta_{n}}{\gamma_{n}+2\alpha_{n}}\ \xrightarrow{\mathbb{P}}\ 0. (A.20)
Proof.

Recall that βn=var(𝐱122)\beta_{n}=\operatorname{var}(\|\mathbf{x}_{1}\|_{2}^{2}) and β^n=1n1i=1n(𝐱i221ni=1n𝐱i22)2\widehat{\beta}_{n}=\frac{1}{n-1}\sum_{i=1}^{n}\Big{(}\|\mathbf{x}_{i}\|_{2}^{2}-\textstyle\frac{1}{n}\sum_{i^{\prime}=1}^{n}\|\mathbf{x}_{i^{\prime}}\|_{2}^{2}\Big{)}^{2}. It is clear that 𝔼(β^n)=βn\mathbb{E}(\widehat{\beta}_{n})=\beta_{n}, and so it suffices to show

(p+2)2(γn+2αn)2var(β^n)=o(1).\textstyle\frac{(p+2)^{2}}{(\gamma_{n}+2\alpha_{n})^{2}}\operatorname{var}(\widehat{\beta}_{n})=o(1).

Since Assumption 2 implies γn+2αnp2\gamma_{n}+2\alpha_{n}\asymp p^{2}, it remains to show var(β^n)=o(p2)\operatorname{var}(\widehat{\beta}_{n})=o(p^{2}).

Making use of a standard bound for the variance of a sample variance, we have

var(β^n)1n𝔼|𝐱122tr(Σn)|4.\operatorname{var}(\widehat{\beta}_{n})\ \lesssim\ \textstyle\frac{1}{n}\,\mathbb{E}\Big{|}\left\|\mathbf{x}_{1}\right\|_{2}^{2}-\textup{tr}(\Sigma_{n})\Big{|}^{4}. (A.21)

To bound the right side of (A.21), observe that

𝔼|𝐱122tr(Σn)|4\displaystyle\mathbb{E}\Big{|}\!\left\|\mathbf{x}_{1}\right\|_{2}^{2}-\textup{tr}\left(\Sigma_{n}\right)\!\Big{|}^{4} =𝔼|ξ12(𝐮1Σn𝐮11ptr(Σn))+(1pξ121)tr(Σn)|4\displaystyle\ =\ \mathbb{E}\left|\xi_{1}^{2}\left(\mathbf{u}_{1}^{\top}\Sigma_{n}\mathbf{u}_{1}-\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\right)+\left(\textstyle\frac{1}{p}\xi_{1}^{2}-1\right)\textup{tr}(\Sigma_{n})\right|^{4}
𝔼(ξ18)𝔼|𝐮1Σn𝐮11ptr(Σn)|4+tr(Σn)4𝔼|1pξ121|4.\displaystyle\ \lesssim\ \mathbb{E}(\xi_{1}^{8})\cdot\mathbb{E}\left|\mathbf{u}_{1}^{\top}\Sigma_{n}\mathbf{u}_{1}-\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\right|^{4}+\textup{tr}(\Sigma_{n})^{4}\cdot\mathbb{E}\left|\textstyle\frac{1}{p}\xi_{1}^{2}-1\right|^{4}.

Since λ1(Σn)1\lambda_{1}(\Sigma_{n})\lesssim 1 holds under Assumption 2, it follows from Lemma D.3 that

𝔼|𝐮1Σn𝐮11ptr(Σn)|41p2.\mathbb{E}\left|\mathbf{u}_{1}^{\top}\Sigma_{n}\mathbf{u}_{1}-\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\right|^{4}\ \lesssim\ \textstyle\frac{1}{p^{2}}.

Also, Assumption 1 implies

𝔼|1pξ121|41p2 and 𝔼(ξ18)p4.\mathbb{E}\left|\textstyle\frac{1}{p}\xi_{1}^{2}-1\right|^{4}\ \lesssim\ \textstyle\frac{1}{p^{2}}\ \ \ \ \ \text{ and }\ \ \ \ \ \mathbb{E}(\xi_{1}^{8})\,\lesssim\,p^{4}. (A.22)

Applying the last several observations to (A.21) implies var(β^n)p\operatorname{var}(\widehat{\beta}_{n})\lesssim p, which yields the stated result. ∎

A.2 Consistency of H~n\tilde{H}_{n}: Proof of (3.8) in Theorem 1

For each tt\in\mathbb{R}, denote the empirical distribution function of the QuEST eigenvalue estimates as

H^Q,n(t)=1pj=1p1{λ^Q,jt}.\widehat{H}_{\textup{Q},n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\widehat{\lambda}_{\textup{Q},j}\leq t\}.

It follows from Lemma A.4 below that the limit

H^Q,nH\widehat{H}_{\textup{Q},n}\Rightarrow H (A.23)

holds almost surely as nn\to\infty. So, to prove (3.8), it is sufficient to show supt|H^Q,n(t)H~n(t)|0\sup_{t\in\mathbb{R}}|\widehat{H}_{\textup{Q},n}(t)-\tilde{H}_{n}(t)|\xrightarrow{\mathbb{P}}0.

Since λ^Q,j\widehat{\lambda}_{\textup{Q},j} and λ~j\tilde{\lambda}_{j} can only disagree when λ^Q,j>b^n\widehat{\lambda}_{\textup{Q},j}>\widehat{b}_{n}, we have

supt|H^Q,n(t)H~n(t)|1pj=1p1{λ^Q,jλ~j}=1pj=1p1{λ^Q,j>b^n}= 1H^Q,n(b^n).\begin{split}\sup_{t\in\mathbb{R}}|\widehat{H}_{\textup{Q},n}(t)-\tilde{H}_{n}(t)|&\ \leq\ \frac{1}{p}\sum_{j=1}^{p}1\{\widehat{\lambda}_{\textup{Q},j}\neq\tilde{\lambda}_{j}\}\\[5.69046pt] &\ =\ \frac{1}{p}\sum_{j=1}^{p}1\{\widehat{\lambda}_{\textup{Q},j}>\widehat{b}_{n}\}\\[5.69046pt] &\ =\ 1-\widehat{H}_{\textup{Q},n}(\widehat{b}_{n}).\end{split} (A.24)
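The inequality (A.24) can be illustrated numerically. The snippet below uses hypothetical eigenvalue estimates and assumes the truncation takes the form λ~j=min{λ^Q,j,b^n}\tilde{\lambda}_{j}=\min\{\widehat{\lambda}_{\textup{Q},j},\widehat{b}_{n}\}; this form is consistent with the disagreement condition above, but it is an assumption here, since λ~j\tilde{\lambda}_{j} is defined elsewhere in the paper.

```python
import numpy as np

def ecdf(vals, t):
    """Empirical distribution function of vals, evaluated at the points t."""
    return (vals[None, :] <= t[:, None]).mean(axis=1)

lam_q = np.array([0.5, 1.0, 1.8, 2.5, 3.7, 6.0])   # hypothetical QuEST estimates
b_hat = 3.0                                         # hypothetical threshold
lam_tilde = np.minimum(lam_q, b_hat)                # assumed form of the truncation

t_grid = np.linspace(0.0, 7.0, 2001)
sup_diff = np.abs(ecdf(lam_q, t_grid) - ecdf(lam_tilde, t_grid)).max()
bound = (lam_q > b_hat).mean()                      # right side of (A.24)
print(sup_diff <= bound + 1e-12)  # → True
```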

Let uu denote the upper endpoint of the support of the distribution associated with HH, and fix any ϵ(0,14)\epsilon\in(0,\frac{1}{4}). Since H^Q,n\widehat{H}_{\textup{Q},n} is a non-decreasing function, we have

supt|H^Q,n(t)H~n(t)| 1H^Q,n(u+ϵ)1{b^nu+ϵ}.\begin{split}\sup_{t\in\mathbb{R}}|\widehat{H}_{\textup{Q},n}(t)-\tilde{H}_{n}(t)|&\ \leq\ 1-\widehat{H}_{\textup{Q},n}(u+\epsilon)1\{\widehat{b}_{n}\geq u+\epsilon\}.\end{split} (A.25)

By the definition of uu, the value u+ϵu+\epsilon is a continuity point of HH with H(u+ϵ)=1H(u+\epsilon)=1, and so the limit (A.23) implies H^Q,n(u+ϵ)=1+o(1)\widehat{H}_{\textup{Q},n}(u+\epsilon)=1+o_{\mathbb{P}}(1). Hence, in order to show that the right side of (A.25) converges to 0 in probability, it is enough to show that (b^nu+ϵ)1\mathbb{P}(\widehat{b}_{n}\geq u+\epsilon)\to 1.

We will handle this remaining task by showing instead that (b^n<u+ϵ)0\mathbb{P}(\widehat{b}_{n}<u+\epsilon)\to 0. Recall that b^n=λ1(Σ^n)+1\widehat{b}_{n}=\lambda_{1}(\widehat{\Sigma}_{n})+1 and note that the limit HnHH_{n}\Rightarrow H implies that λ1(Σn)uϵ\lambda_{1}(\Sigma_{n})\geq u-\epsilon must hold for all large nn. Therefore, we have

(b^n<u+ϵ)(λ1(Σ^n)λ1(Σn)+2ϵ1)\begin{split}\mathbb{P}(\widehat{b}_{n}<u+\epsilon)&\ \leq\ \mathbb{P}\Big{(}\lambda_{1}(\widehat{\Sigma}_{n})\leq\lambda_{1}(\Sigma_{n})+2\epsilon-1\Big{)}\end{split} (A.26)

for all large nn. To derive an upper bound on the last probability, we will replace λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}) with a smaller random variable, and then rearrange the event. Let 𝐯1\mathbf{v}_{1} denote an eigenvector of Σn\Sigma_{n} corresponding to λ1(Σn)\lambda_{1}(\Sigma_{n}) with 𝐯12=1\|\mathbf{v}_{1}\|_{2}=1. Defining the random matrix Wn=1ni=1nξi2𝐮i𝐮iW_{n}=\frac{1}{n}\sum_{i=1}^{n}\xi_{i}^{2}\mathbf{u}_{i}\mathbf{u}_{i}^{\top}, the variational representation of λ1(Σ^n)\lambda_{1}(\widehat{\Sigma}_{n}) gives λ1(Σ^n)λ1(Σn)𝐯1Wn𝐯1\lambda_{1}(\widehat{\Sigma}_{n})\geq\lambda_{1}(\Sigma_{n})\mathbf{v}_{1}^{\top}W_{n}\mathbf{v}_{1}. Also note that our choice of ϵ\epsilon ensures 2ϵ11/22\epsilon-1\leq-1/2. This yields the following bounds for all large nn,

(b^n<u+ϵ)(λ1(Σn)|𝐯1Wn𝐯11|1/2) 4λ1(Σn)2var(𝐯1Wn𝐯1)=4λ1(Σn)2nvar(ξ12𝐮1(𝐯1𝐯1)𝐮1)=4λ1(Σn)2n(3𝔼(ξ14)p(p+2)1)(Lemma D.1)1n,\begin{split}\mathbb{P}(\widehat{b}_{n}<u+\epsilon)&\ \leq\ \mathbb{P}\Big{(}\lambda_{1}(\Sigma_{n})|\mathbf{v}_{1}^{\top}W_{n}\mathbf{v}_{1}-1|\geq 1/2\Big{)}\\[5.69046pt] &\ \leq\ 4\lambda_{1}(\Sigma_{n})^{2}\operatorname{var}\big{(}\mathbf{v}_{1}^{\top}W_{n}\mathbf{v}_{1}\big{)}\\[5.69046pt] &\ =\ \textstyle\frac{4\lambda_{1}(\Sigma_{n})^{2}}{n}\,\operatorname{var}\big{(}\xi_{1}^{2}\mathbf{u}_{1}^{\top}(\mathbf{v}_{1}\mathbf{v}_{1}^{\top})\mathbf{u}_{1}\big{)}\\[5.69046pt] &\ =\ \textstyle\frac{4\lambda_{1}(\Sigma_{n})^{2}}{n}\,\left(3\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}-1\right)\ \ \ \ \ \ \ \ \ \ \ \text{(Lemma~{}\ref{lem:quadform})}\\[0.0pt] &\ \lesssim\ \textstyle\frac{1}{n},\end{split} (A.27)

where the last step uses Assumptions 1 and 2.∎

Lemma A.4.

Suppose that Assumptions 1 and 2 hold. Then, the following limit holds almost surely as nn\to\infty

H^Q,nH.\widehat{H}_{\textup{Q},n}\Rightarrow H. (A.28)
Proof.

In the paper [26, pp.381-382], the limit (A.28) is established in the context of an IC model where p/nc(0,){1}p/n\to c\in(0,\infty)\setminus\{1\}, and the eigenvalues of the population covariance matrix Σn\Sigma_{n} satisfy Assumption 2. To adapt the proof from [26] to our current setting, it is sufficient to show that the following two facts hold under Assumptions 1 and 2. First, there is a constant C>0C>0 such that the bound lim supnλ1(Σ^n)C\limsup_{n\to\infty}\lambda_{1}(\widehat{\Sigma}_{n})\leq C holds almost surely, which we prove later using a truncation argument in Lemmas A.5 and A.6. Second, the random distribution function H^n(t)=1pj=1p1{λj(Σ^n)t}\widehat{H}_{n}(t)=\frac{1}{p}\sum_{j=1}^{p}1\{\lambda_{j}(\widehat{\Sigma}_{n})\leq t\} satisfies H^nΨ(H,c)\widehat{H}_{n}\Rightarrow\Psi(H,c) almost surely, where we recall that the distribution Ψ(H,c)\Psi(H,c) is defined near equation (3.9). The validity of this second fact under our current assumptions is a consequence of Theorem 1.1 in [6].∎

A.3 Boundedness of sample eigenvalues

Lemma A.5.

Under Assumptions 1 and 2, there is a constant C>0C>0 such that the bounds

lim supnλ1(Σ^n)\displaystyle\limsup_{n\to\infty}\lambda_{1}({\widehat{\Sigma}_{n}}) C and lim supnλ1(Σ~n)C\displaystyle\leq C\text{ \ \ \ \ \ \ and \ \ \ \ \ \ }\limsup_{n\to\infty}\lambda_{1}({\tilde{\Sigma}_{n}})\leq C (A.29)

hold almost surely.

Proof.

Since the estimates λ~j=λj(Σ~n)\tilde{\lambda}_{j}=\lambda_{j}(\tilde{\Sigma}_{n}) are bounded above by b^n=λ1(Σ^n)+1\widehat{b}_{n}=\lambda_{1}(\widehat{\Sigma}_{n})+1 for all j{1,,p}j\in\{1,\dots,p\}, it is enough to focus on the first inequality in (A.29). Define a sequence of truncated random variables ξ˘i=ξi 1{|ξi2p|<pn}\breve{\xi}_{i}=\xi_{i}\,1\{|\xi_{i}^{2}-p|<\sqrt{pn}\} for i=1,,ni=1,\dots,n, as well as the following truncated version of Σ^n\widehat{\Sigma}_{n},

Σ˘n=1ni=1nξ˘i2Σn1/2𝐮i𝐮iΣn1/2.\breve{\Sigma}_{n}=\frac{1}{n}\sum_{i=1}^{n}\breve{\xi}_{i}^{2}\,\Sigma_{n}^{1/2}\mathbf{u}_{i}\mathbf{u}_{i}^{\top}\Sigma_{n}^{1/2}. (A.30)

Lemma A.6 below shows that (Σ^nΣ˘ni.o.)=0\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \text{i.o.})=0. Consequently, it suffices to show there is a constant C>0C>0 such that lim supnλ1(Σ˘n)C\limsup_{n\to\infty}\lambda_{1}(\breve{\Sigma}_{n})\leq C holds almost surely.

Since the vectors 𝐮1,,𝐮n\mathbf{u}_{1},\dots,\mathbf{u}_{n} are uniformly distributed on the unit sphere of p\mathbb{R}^{p}, we may express them as 𝐮i=𝐳i/𝐳i2\mathbf{u}_{i}=\mathbf{z}_{i}/\|\mathbf{z}_{i}\|_{2} for a sequence of i.i.d. standard Gaussian vectors 𝐳1,,𝐳n\mathbf{z}_{1},\dots,\mathbf{z}_{n} in p\mathbb{R}^{p}. This yields

Σ˘n=1ni=1nξ˘i2/p𝐳i22/pΣn1/2𝐳i𝐳iΣn1/2.\breve{\Sigma}_{n}=\frac{1}{n}\sum_{i=1}^{n}\frac{\breve{\xi}_{i}^{2}/p}{\|\mathbf{z}_{i}\|_{2}^{2}/p}\Sigma_{n}^{1/2}\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\Sigma_{n}^{1/2}. (A.31)

By construction, we have max1inξ˘i2/p1+1/cn\max_{1\leq i\leq n}\breve{\xi}_{i}^{2}/p\leq 1+1/\sqrt{c_{n}} for all n1n\geq 1. Also, using standard tail bounds for the χp2\chi_{p}^{2} distribution and the Borel-Cantelli lemma, it is straightforward to show that lim infnmin1in𝐳i22/p\liminf_{n\to\infty}\min_{1\leq i\leq n}\|\mathbf{z}_{i}\|_{2}^{2}/p is at least 1/21/2 almost surely. Taken together, these observations imply there is a constant C>0C>0 such that the bound

λ1(Σ˘n)Cλ1(Σn)λ1(1ni=1n𝐳i𝐳i)\lambda_{1}(\breve{\Sigma}_{n})\ \leq\ C\,\lambda_{1}(\Sigma_{n})\,\lambda_{1}\Big{(}\textstyle\frac{1}{n}\sum_{i=1}^{n}\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\Big{)} (A.32)

holds almost surely for all large nn. In addition, note that supn1λ1(Σn)1\sup_{n\geq 1}\lambda_{1}(\Sigma_{n})\lesssim 1 holds by Assumption 2. Lastly, it is known from [50, Theorem 3.1] that the limit

limnλ1(1ni=1n𝐳i𝐳i)=(1+c)2\lim_{n\to\infty}\lambda_{1}\Big{(}\textstyle\frac{1}{n}\sum_{i=1}^{n}\mathbf{z}_{i}\mathbf{z}_{i}^{\top}\Big{)}\ =\ (1+\sqrt{c})^{2}

holds almost surely, which completes the proof.∎
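The edge value (1+c)2(1+\sqrt{c})^{2} cited from [50, Theorem 3.1] is easy to confirm numerically; the following is a small simulation sketch (the dimensions n, p and the seed are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np

# Simulate lambda_1((1/n) * sum_i z_i z_i^T) for i.i.d. standard Gaussian
# vectors z_i in R^p, and compare with the almost-sure limit (1 + sqrt(c))^2.
rng = np.random.default_rng(0)
n, p = 4000, 1000                      # aspect ratio c = p/n = 0.25
Z = rng.standard_normal((p, n))
S = (Z @ Z.T) / n                      # p x p matrix (1/n) * sum_i z_i z_i^T
lam1 = np.linalg.eigvalsh(S)[-1]       # largest eigenvalue
edge = (1 + np.sqrt(p / n)) ** 2       # = 2.25 for c = 0.25
print(lam1, edge)
```

For these dimensions the largest eigenvalue already sits within a few hundredths of the limiting edge, consistent with the Tracy-Widom fluctuation scale of order n^{-2/3}.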

Lemma A.6.

Suppose that Assumption 1 holds, and let Σ˘n\breve{\Sigma}_{n} be as defined in (A.30). Then,

(Σ^nΣ˘ni.o.)=0.\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})=0. (A.33)
Proof.

We adapt a classical argument from [50]. For a fixed number nn, the matrices Σ^n\widehat{\Sigma}_{n} and Σ˘n\breve{\Sigma}_{n} can only disagree if |ξi2p|pn|\xi_{i}^{2}-p|\geq\sqrt{pn} for at least one i=1,,ni=1,\dots,n, and so

(Σ^nΣ˘ni.o.)limj(n=ji=1n{|ξi2p|pn}).\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})\ \leq\ \lim_{j\to\infty}\mathbb{P}\bigg{(}\bigcup_{n=j}^{\infty}\bigcup_{i=1}^{n}\{|\xi_{i}^{2}-p|\geq\sqrt{pn}\}\bigg{)}. (A.34)

Next, we partition the values of nn into the intervals [2m1,2m),[2m,2m+1),[2^{m-1},2^{m}),[2^{m},2^{m+1}),\dots and take a union bound across the intervals, yielding

(Σ^nΣ˘ni.o.)limjm=j(max1in2m1n<2m|ξi2pp|2m1).\begin{split}\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})&\ \leq\ \lim_{j\to\infty}\sum_{m=j}^{\infty}\mathbb{P}\Bigg{(}\max_{\underset{2^{m-1}\leq n<2^{m}}{1\leq i\leq n}}|\textstyle\frac{\xi_{i}^{2}-p}{\sqrt{p}}|\,\geq\,\sqrt{2^{m-1}}\Bigg{)}.\end{split} (A.35)

For a generic sequence of random variables Y1,,YNY_{1},\dots,Y_{N} and number q1q\geq 1, recall the standard maximal inequality

𝔼|max1iNYi|qNmax1iN𝔼|Yi|q.\mathbb{E}\Big{|}\!\max_{1\leq i\leq N}Y_{i}\Big{|}^{q}\ \leq\ N\max_{1\leq i\leq N}\mathbb{E}|Y_{i}|^{q}. (A.36)
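The maximal inequality (A.36) can be checked exactly on a tiny discrete example by enumerating all outcomes; the two-point distribution below is an arbitrary illustration, not from the paper:

```python
import itertools
from fractions import Fraction

# Y_1, Y_2, Y_3 i.i.d., each equal to -1 or 2 with probability 1/2, and q = 2.
# Compute E|max_i Y_i|^q exactly by enumeration, and compare with the
# bound N * max_i E|Y_i|^q from (A.36).
values, q, N = [-1, 2], 2, 3
prob = Fraction(1, 2)

E_single = sum(prob * Fraction(abs(v)) ** q for v in values)   # E|Y_1|^q = 5/2
rhs = N * E_single                                             # bound = 15/2
lhs = sum(prob ** N * Fraction(abs(max(y))) ** q
          for y in itertools.product(values, repeat=N))        # exact value = 29/8
print(lhs, rhs)  # 29/8 <= 15/2
```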

Since the number of pairs (i,n)(i,n) in the maximum in (A.35) is at most 22m2^{2m}, if we apply Chebyshev’s inequality to (A.35) and use the condition 𝔼|ξ12pp|4+ε1\mathbb{E}|\textstyle\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{4+\varepsilon}\lesssim 1 from Assumption 1, then for each m1m\geq 1 we have

(max1in2m1n&lt;2m|ξi2pp|2m1)22m(2m1)4+ε 2mε/2.\begin{split}\mathbb{P}\Bigg{(}\max_{\underset{2^{m-1}\leq n<2^{m}}{1\leq i\leq n}}|\textstyle\frac{\xi_{i}^{2}-p}{\sqrt{p}}|\,\geq\,\sqrt{2^{m-1}}\Bigg{)}&\ \lesssim\ \frac{2^{2m}}{\big{(}\sqrt{2^{m-1}}\big{)}^{4+\varepsilon}}\\[5.69046pt] &\ \lesssim\ 2^{-m\varepsilon/2}.\end{split} (A.37)

Hence, we may insert this bound into (A.35) to conclude that

(Σ^nΣ˘ni.o.)limjm=j2mε/2= 0,\mathbb{P}(\widehat{\Sigma}_{n}\neq\breve{\Sigma}_{n}\ \textup{i.o.})\ \lesssim\ \lim_{j\to\infty}\sum_{m=j}^{\infty}2^{-m\varepsilon/2}\ =\ 0,

which completes the proof.∎

Appendix B Proof of Theorem 2

Let 𝐳k\mathbf{z}\in\mathbb{R}^{k} denote a Gaussian random vector to be described in a moment, and consider the triangle inequality

dLP((p{Tn(𝐟)ϑn(𝐟)}),(p{Tn,1(𝐟)ϑ~n(𝐟)}|X))In+IIn,d_{\mathrm{LP}}\Big{(}\mathcal{L}(p\big{\{}T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\big{\}}\big{)}\,,\,\mathcal{L}\big{(}p\big{\{}T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\big{\}}\big{|}X\big{)}\Big{)}\ \leq\ \textup{I}_{n}\ +\ \textup{II}_{n}, (B.1)

where we define

In\displaystyle\textup{I}_{n} =dLP((p{Tn(𝐟)ϑn(𝐟)}),(𝐳))\displaystyle\ =\ d_{\mathrm{LP}}\Big{(}\mathcal{L}\big{(}p\big{\{}T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\big{\}}\big{)},\mathcal{L}(\mathbf{z})\Big{)} (B.2)
IIn\displaystyle\textup{II}_{n} =dLP((𝐳),(p{Tn,1(𝐟)ϑ~n(𝐟)}|X)).\displaystyle\ =\ d_{\mathrm{LP}}\Big{(}\mathcal{L}(\mathbf{z})\,,\,\mathcal{L}\big{(}p\big{\{}T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\}\big{|}X\big{)}\Big{)}. (B.3)

To handle the terms In\textup{I}_{n} and IIn\textup{II}_{n}, we will apply a central limit theorem for linear spectral statistics established in [18], which relies on the following two conditions when cnc>0c_{n}\to c>0 as nn\to\infty:

  • (a).

    The elliptical model in Assumption 1 holds with the conditions on ξ12\xi_{1}^{2} being replaced by

    𝔼(ξ12)=p,var(ξ12pp)=τ+o(1) and 𝔼|ξ12pp|2+ε1,\mathbb{E}(\xi_{1}^{2})=p,\ \ \ \ \operatorname{var}\Big{(}\textstyle\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big{)}=\tau+o(1)\ \ \ \ \ \text{ and }\ \ \ \ \mathbb{E}\Big{|}\textstyle\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big{|}^{2+\varepsilon}\lesssim 1, (B.4)

    for some constants τ0\tau\geq 0 and ε>0\varepsilon>0 that do not depend on nn.

  • (b).

    There is a distribution HH such that HnHH_{n}\Rightarrow H as nn\to\infty, and λ1(Σn)1\lambda_{1}(\Sigma_{n})\lesssim 1.

Under these conditions, Theorem 2.2 in [18] ensures there exists a Gaussian distribution (𝐳)\mathcal{L}(\mathbf{z}) depending only on (𝐟,H,c,τ)(\mathbf{f},H,c,\tau) such that In0\textup{I}_{n}\to 0 as nn\to\infty.

To finish the proof, we must show IIn0\textup{II}_{n}\xrightarrow{\mathbb{P}}0. This can be done using the central limit theorem mentioned above, but extra care is needed because IIn\textup{II}_{n} is random. It is sufficient to show that for any subsequence J{1,2,}J\subset\{1,2,\dots\}, the limit IIn0\textup{II}_{n}\to 0 holds almost surely along a further subsequence of JJ. Since the bootstrap data are generated from an elliptical model that is parameterized in terms of ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) and Σ~n\tilde{\Sigma}_{n}, this amounts to verifying that ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) and H~n\tilde{H}_{n} satisfy analogues of (a) and (b) almost surely along a subsequence of JJ.

To proceed, recall that Algorithm 1 is designed so that 𝔼((ξ1)2|X)=p\mathbb{E}((\xi_{1}^{*})^{2}|X)=p and var((ξ1)2|X)=ς^n2\operatorname{var}((\xi_{1}^{*})^{2}|X)=\widehat{\varsigma}_{n}^{2}. Also, note that under Assumption 1, the parameter ςn2=var(ξ12)\varsigma_{n}^{2}=\operatorname{var}(\xi_{1}^{2}) satisfies ςn2/pτ\varsigma_{n}^{2}/p\to\tau. Consequently, Theorem 1 implies ς^n2/p=τ+o(1)\widehat{\varsigma}_{n}^{2}/p=\tau+o_{\mathbb{P}}(1), and so there is a subsequence JJJ^{\prime}\subset J along which the limit ς^n2/pτ\widehat{\varsigma}_{n}^{2}/p\to\tau holds almost surely. In other words, the limit

var((ξ1)2pp|X)τ\operatorname{var}\!\Big{(}\textstyle\frac{(\xi_{1}^{*})^{2}-p}{\sqrt{p}}\Big{|}X\Big{)}\ \to\ \tau (B.5)

holds almost surely along JJ^{\prime}. Moreover, since ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) is a Gamma distribution with mean pp and variance ς^n2\widehat{\varsigma}_{n}^{2}, Lemma E.4 implies there is a constant C>0C>0 not depending on nn such that the bound

(𝔼(|(ξ1)2pp|6|X))1/6C(ς^np+ς^n2p3/2)\Big{(}\mathbb{E}\Big{(}\big{|}\textstyle\frac{(\xi_{1}^{*})^{2}-p}{\sqrt{p}}\big{|}^{6}\Big{|}X\Big{)}\Big{)}^{1/6}\leq C\big{(}\textstyle\frac{\widehat{\varsigma}_{n}}{\sqrt{p}}+\textstyle\frac{\widehat{\varsigma}_{n}^{2}}{p^{3/2}}\big{)} (B.6)

holds almost surely. Therefore, the left side of (B.6) is bounded almost surely along JJ^{\prime}.
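As noted above, ℒ((ξ1*)²|X) is a Gamma distribution with mean p and variance ς̂n². A Gamma distribution with shape k and scale θ has mean kθ and variance kθ², so the required parameterization is k = p²/ς̂n² and θ = ς̂n²/p. The following is a minimal sketch of this step; the numeric values of p and ς̂n² are placeholders, not quantities from the paper:

```python
import numpy as np

def gamma_params(mean, var):
    """Shape/scale of a Gamma distribution with the given mean and variance."""
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

p, varsigma_sq = 500.0, 1200.0   # placeholder values for p and hat{varsigma}_n^2
k, theta = gamma_params(p, varsigma_sq)
assert np.isclose(k * theta, p) and np.isclose(k * theta ** 2, varsigma_sq)

# Draws of (xi_1^*)^2; the empirical mean/variance match p and varsigma_sq.
rng = np.random.default_rng(0)
xi_star_sq = rng.gamma(shape=k, scale=theta, size=100_000)
print(xi_star_sq.mean(), xi_star_sq.var())
```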

With regard to the empirical spectral distribution H~n\tilde{H}_{n} associated with the matrix Σ~n\tilde{\Sigma}_{n}, it satisfies the limit H~nH\tilde{H}_{n}\Rightarrow H almost surely along a further subsequence J′′JJ^{\prime\prime}\subset J^{\prime}, due to Theorem 1. In addition, Lemma A.5 ensures there is a constant C>0C>0 such that lim supnλ1(Σ~n)C\limsup_{n\to\infty}\lambda_{1}(\tilde{\Sigma}_{n})\leq C almost surely. Altogether, it follows that ((ξ1)2|X)\mathcal{L}((\xi_{1}^{*})^{2}|X) and H~n\tilde{H}_{n} simultaneously satisfy analogues of (a) and (b) almost surely along J′′J^{\prime\prime}, which implies IIn0\textup{II}_{n}\xrightarrow{\mathbb{P}}0.∎

Appendix C Proof of Theorem 3

To begin the proof, we need to introduce three auxiliary statistics defined by

r˘n\displaystyle\breve{r}_{n} =tr(Σ^n)2tr(Σ^n2)1ntr(Σ^n)2\displaystyle=\frac{\textup{tr}(\widehat{\Sigma}_{n})^{2}}{\textup{tr}(\widehat{\Sigma}_{n}^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n})^{2}} (C.1)
r˘n\displaystyle\breve{r}_{n}^{*} =tr(Σ^n)2tr((Σ^n)2)1ntr(Σ^n)2\displaystyle=\frac{\textup{tr}(\widehat{\Sigma}^{*}_{n})^{2}}{\textup{tr}((\widehat{\Sigma}^{*}_{n})^{2})-\frac{1}{n}\textup{tr}(\widehat{\Sigma}_{n}^{*})^{2}} (C.2)
r~n\displaystyle\tilde{r}_{n} =tr(Σ~n)2tr(Σ~n2).\displaystyle=\frac{\textup{tr}(\tilde{\Sigma}_{n})^{2}}{\textup{tr}(\tilde{\Sigma}_{n}^{2})}. (C.3)
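The statistics above are simple trace functionals of a covariance matrix. As a quick numeric illustration of the ratios in (C.1) and (C.3) (the matrix below is an arbitrary example, not from the paper):

```python
import numpy as np

def trace_ratio(S, n=None):
    """tr(S)^2 / (tr(S^2) - (1/n) tr(S)^2); if n is None, omit the correction."""
    t1, t2 = np.trace(S), np.trace(S @ S)
    denom = t2 - t1 ** 2 / n if n is not None else t2
    return t1 ** 2 / denom

p = 4
S = np.eye(p)
print(trace_ratio(S))          # identity matrix: p^2 / p = p = 4.0
print(trace_ratio(S, n=100))   # version with the 1/n correction, as in (C.1)
```

For the identity matrix the uncorrected ratio equals p, its maximal value, which is why ratios of this kind serve as sphericity statistics.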

Also, we define r^n\widehat{r}_{n}^{*} as the statistic obtained by applying the formula (3.13) for r^n\widehat{r}_{n} to the bootstrap data 𝐱1,,𝐱n\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}. The primary task is to establish the following four limits, which are proven later in this appendix in Propositions 1 and 2. Specifically, these results show that there exists a non-degenerate Gaussian random variable ζ\zeta and a constant aa, such that as nn\to\infty,

(r˘nrn)\displaystyle\mathcal{L}(\breve{r}_{n}-r_{n}) (ζ),\displaystyle\ \Rightarrow\ \mathcal{L}(\zeta), (C.4)
(r˘nr~n|X)\displaystyle\mathcal{L}\big{(}\breve{r}_{n}^{*}-\tilde{r}_{n}\big{|}X\big{)} (ζ),\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\zeta), (C.5)
(r^nr˘n)\displaystyle\mathcal{L}(\widehat{r}_{n}-\breve{r}_{n}) (a),\displaystyle\ \Rightarrow\ \mathcal{L}(a), (C.6)
(r^nr˘n|X)\displaystyle\mathcal{L}(\widehat{r}_{n}^{*}-\breve{r}_{n}^{*}\big{|}X\big{)} (a),\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(a), (C.7)

where (a)\mathcal{L}(a) denotes the point mass distribution at aa. (In the current section, we sometimes use weak-convergence notation in limits where convergence in probability holds, because it will help to clarify how r^nr˘n\widehat{r}_{n}^{*}-\breve{r}_{n}^{*} can be analyzed in an analogous manner to r^nr˘n\widehat{r}_{n}-\breve{r}_{n}.) Using Slutsky’s lemma, it follows from the limits (C.4) and (C.6) that

(r^nrn)\displaystyle\mathcal{L}(\widehat{r}_{n}-r_{n}) (ζ+a).\displaystyle\ \Rightarrow\ \mathcal{L}(\zeta+a). (C.8)

Analogously, Slutsky’s lemma can be applied in a conditional manner to the limits (C.5) and (C.7), yielding

(r^nr~n|X)\displaystyle\mathcal{L}\big{(}\widehat{r}_{n}^{*}-\tilde{r}_{n}\big{|}X\big{)} (ζ+a).\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\zeta+a). (C.9)

Since the limiting distribution (ζ+a)\mathcal{L}(\zeta+a) is continuous, Pólya’s theorem implies

supt|(r^nrnt)(r^nr~nt|X)| 0.\displaystyle\sup_{t\in\mathbb{R}}\Big{|}\mathbb{P}(\widehat{r}_{n}-r_{n}\leq t)-\mathbb{P}(\widehat{r}_{n}^{*}-\tilde{r}_{n}\leq t|X)\Big{|}\ \xrightarrow{\mathbb{P}}\ 0. (C.10)

Due to this uniform limit and the continuity of the distribution (ζ+a)\mathcal{L}(\zeta+a), standard arguments can be used to show that the quantiles of (r^nr~n|X)\mathcal{L}(\widehat{r}_{n}^{*}-\tilde{r}_{n}|X) are asymptotically equivalent to those of (r^nrn)\mathcal{L}(\widehat{r}_{n}-r_{n}). (For example, see the proof of Lemma 10.4 in [31].) More precisely, if we note that the quantile estimate q^1α\widehat{q}_{1-\alpha} defined in Section 3.4 is the same as the (1α)(1-\alpha)-quantile of (1p(r^nr~n)|X)\mathcal{L}\big{(}\frac{1}{p}(\widehat{r}_{n}^{*}-\tilde{r}_{n})|X\big{)}, then as nn\to\infty we have the limit

(1p(r^nrn)>q^1α)α.\mathbb{P}\big{(}\textstyle\frac{1}{p}(\widehat{r}_{n}-r_{n})>\widehat{q}_{1-\alpha}\big{)}\to\alpha. (C.11)

This limit directly implies the first two statements (3.19) and (3.20) in Theorem 3. Regarding the third statement (3.21), if 𝖧0,n\mathsf{H}_{0,n}^{\prime} holds for all large nn, then replacing Σ~n=diag(λ~1,,λ~p)\tilde{\Sigma}_{n}=\textup{diag}(\tilde{\lambda}_{1},\dots,\tilde{\lambda}_{p}) with Σ~n=Ip\tilde{\Sigma}_{n}=I_{p} does not affect the reasoning leading up to (C.11), because this replacement does not affect the proofs of Propositions 1 and 2 given later. Consequently, if 𝖧0,n\mathsf{H}_{0,n}^{\prime} holds for all large nn, then we have (1p(r^n1)q^α)α\mathbb{P}\big{(}\frac{1}{p}(\widehat{r}_{n}-1)\leq\widehat{q}_{\alpha}^{\prime}\big{)}\to\alpha as nn\to\infty, which completes the proof.∎
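In practice, the quantile estimate q̂_{1−α} appearing in (C.11) is computed from Monte Carlo bootstrap replicates of 1p(r̂n* − r̃n). A generic hedged sketch of that final step, with simulated placeholder replicates standing in for actual bootstrap output:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05
# Placeholder: B replicate values of (1/p)(r_n^* - tilde{r}_n); in practice
# these would come from rerunning the bootstrap algorithm B times.
reps = rng.normal(loc=0.0, scale=1.0, size=2000)
q_hat = np.quantile(reps, 1 - alpha)   # empirical (1 - alpha)-quantile
print(q_hat)
```

With Gaussian placeholder replicates, the empirical 95% quantile lands near the theoretical value of about 1.645.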

Proposition 1.

Under Assumptions 1 and 2, the limits (C.4) and (C.5) hold as nn\to\infty.

Proof.

The proof is based on viewing r˘n\breve{r}_{n} as a nonlinear function of Tn(𝐟)T_{n}(\mathbf{f}), where the components of 𝐟=(f1,f2)\mathbf{f}=(f_{1},f_{2}) are taken to be f1(x)=xf_{1}(x)=x and f2(x)=x2f_{2}(x)=x^{2}. In this case, the centering parameter ϑn(𝐟)\vartheta_{n}(\mathbf{f}) defined by (3.10) reduces to

ϑn(𝐟)=(1ptr(Σn),1ptr(Σn2)+1nptr(Σn)2),\vartheta_{n}(\mathbf{f})\ =\ \left(\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n})\,,\,\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}^{2})+\frac{1}{np}\textup{tr}(\Sigma_{n})^{2}\right), (C.12)

as recorded in Lemma 2.16 of [48]. Consequently, by considering the function

gn(x1,x2)=x12x2cnx12,g_{n}(x_{1},x_{2})=\textstyle\frac{x_{1}^{2}}{x_{2}-c_{n}x_{1}^{2}}, (C.13)

we have the key relation

r˘nrn=p{gn(Tn(𝐟))gn(ϑn(𝐟))}.\breve{r}_{n}-r_{n}\ =\ p\big{\{}g_{n}(T_{n}(\mathbf{f}))-g_{n}(\vartheta_{n}(\mathbf{f}))\big{\}}. (C.14)

We can also develop a corresponding relation for the bootstrap statistic r˘nr~n\breve{r}_{n}^{*}-\tilde{r}_{n}. Due to Lemma 2.16 in [48] and our choice of 𝐟\mathbf{f}, the definition of ϑ~n(𝐟)\tilde{\vartheta}_{n}(\mathbf{f}) below (3.10) implies

ϑ~n(𝐟)=(1ptr(Σ~n),1ptr(Σ~n2)+1nptr(Σ~n)2).\tilde{\vartheta}_{n}(\mathbf{f})\ =\ \left(\textstyle\frac{1}{p}\textup{tr}(\tilde{\Sigma}_{n})\,,\,\textstyle\frac{1}{p}\textup{tr}(\tilde{\Sigma}_{n}^{2})+\frac{1}{np}\textup{tr}(\tilde{\Sigma}_{n})^{2}\right). (C.15)

Likewise, the bootstrap version of (C.14) is

r˘nr~n=p{gn(Tn(𝐟))gn(ϑ~n(𝐟))}.\breve{r}_{n}^{*}-\tilde{r}_{n}\ =\ p\big{\{}g_{n}(T_{n}^{*}(\mathbf{f}))-g_{n}(\tilde{\vartheta}_{n}(\mathbf{f}))\big{\}}. (C.16)

We will proceed by applying the delta method to the relations (C.14) and (C.16). For this purpose, note that the proof of Theorem 2 shows there exists a Gaussian random vector 𝐳2\mathbf{z}\in\mathbb{R}^{2} such that as nn\to\infty,

(p{Tn(𝐟)ϑn(𝐟)})\displaystyle\mathcal{L}\big{(}p\big{\{}T_{n}(\mathbf{f})-\vartheta_{n}(\mathbf{f})\big{\}}\big{)} (𝐳)\displaystyle\ \Rightarrow\ \mathcal{L}(\mathbf{z}) (C.17)
(p{Tn,1(𝐟)ϑ~n(𝐟)}|X)\displaystyle\mathcal{L}\big{(}p\big{\{}T_{n,1}^{*}(\mathbf{f})-\tilde{\vartheta}_{n}(\mathbf{f})\}\big{|}X\big{)} (𝐳).\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\mathbf{z}). (C.18)

Also, to introduce some further notation, we refer to the jjth moment of the distribution HH as

ϕj=tj𝑑H(t),\phi_{j}=\textstyle\int t^{j}dH(t), (C.19)

and we use these moments to define the parameter

ϑ(𝐟)=(ϕ1,ϕ2+cϕ12).\vartheta(\mathbf{f})=(\phi_{1},\phi_{2}+c\phi_{1}^{2}). (C.20)

This parameter arises in the following limits as nn\to\infty,

ϑn(𝐟)ϑ(𝐟) and ϑ~n(𝐟)ϑ(𝐟).\vartheta_{n}(\mathbf{f})\to\vartheta(\mathbf{f})\ \ \ \ \ \ \text{ and }\ \ \ \ \ \ \tilde{\vartheta}_{n}(\mathbf{f})\xrightarrow{\mathbb{P}}\vartheta(\mathbf{f}). (C.21)

To see why these limits hold, first note that by Assumption 2, there is a compact interval containing the support of HnH_{n} for all large nn, and by Lemma A.5, the same statement holds almost surely for H~n\tilde{H}_{n}. Moreover, Assumption 2 and Theorem 1 ensure the limits HnHH_{n}\Rightarrow H and H~nH\tilde{H}_{n}\xRightarrow{\mathbb{P}}H, and so it follows that the moments of HnH_{n} converge to those of HH, and the moments of H~n\tilde{H}_{n} converge in probability to those of HH. Combining these facts with the formulas (C.12) and (C.15) yields the limits (C.21).

In light of the relations (C.14) and (C.16), and the limits (C.21), we will expand the function gng_{n} around ϑ(𝐟)\vartheta(\mathbf{f}) and apply the delta method. This is justified because the gradient gn\nabla g_{n} has the following continuity property. Namely, if we let g(x1,x2)=x12x2cx12g(x_{1},x_{2})=\frac{x_{1}^{2}}{x_{2}-cx_{1}^{2}}, let 𝒰\mathcal{U} be a sufficiently small open neighborhood of ϑ(𝐟)\vartheta(\mathbf{f}) in 2\mathbb{R}^{2}, and let {(x1,n,x2,n)}\{(x_{1,n},x_{2,n})\} be any sequence of points within 𝒰\mathcal{U} that converges to ϑ(𝐟)\vartheta(\mathbf{f}), then we have the limit

gn(x1,n,x2,n)g(ϑ(𝐟))=(2ϕ1ϕ2+2cϕ13ϕ22,ϕ12ϕ22).\nabla g_{n}(x_{1,n},x_{2,n})\ \to\ \nabla g(\vartheta(\mathbf{f}))\ =\ \left(\textstyle\frac{2\phi_{1}}{\phi_{2}}+\frac{2c\phi_{1}^{3}}{\phi_{2}^{2}},-\frac{\phi_{1}^{2}}{\phi_{2}^{2}}\right). (C.22)
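The gradient formula in (C.22) can be verified by finite differences; a small numeric sketch, with c and the moments φ1, φ2 set to arbitrary illustrative values:

```python
import numpy as np

c, phi1, phi2 = 0.5, 2.0, 5.0
theta = (phi1, phi2 + c * phi1 ** 2)     # the point vartheta(f)

def g(x1, x2):
    # g(x1, x2) = x1^2 / (x2 - c * x1^2), the limiting version of g_n
    return x1 ** 2 / (x2 - c * x1 ** 2)

# Closed-form gradient from (C.22).
grad_exact = (2 * phi1 / phi2 + 2 * c * phi1 ** 3 / phi2 ** 2,
              -phi1 ** 2 / phi2 ** 2)

# Central finite differences at vartheta(f).
h = 1e-6
x1, x2 = theta
grad_num = ((g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h),
            (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h))
print(grad_exact, grad_num)
```

Note that at ϑ(f) the denominator x2 − c·x1² reduces to φ2, which is why φ2 alone appears in the denominators of (C.22).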

So, applying the delta method to the relations (C.14) and (C.16) and the weak limits (C.17) and (C.18) gives

(r˘nrn)\displaystyle\mathcal{L}(\breve{r}_{n}-r_{n}) (g(ϑ(𝐟))𝐳)\displaystyle\ \Rightarrow\ \mathcal{L}(\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z}) (C.23)
(r˘nr~n|X)\displaystyle\mathcal{L}(\breve{r}_{n}^{*}-\tilde{r}_{n}\big{|}X\big{)} (g(ϑ(𝐟))𝐳).\displaystyle\ \xRightarrow{\mathbb{P}}\ \mathcal{L}(\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z}). (C.24)

Now, it remains to show that the variance of the Gaussian random variable g(ϑ(𝐟))𝐳\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z} is positive. Letting KK denote the 2×22\times 2 covariance matrix of 𝐳\mathbf{z}, it is shown below equation (2.10) in [18] that the entries of KK are given by

K11\displaystyle K_{11} =2cϕ2+c(τ2)ϕ12,\displaystyle=2c\phi_{2}+c(\tau-2)\phi_{1}^{2},
K12\displaystyle K_{12} =4cϕ3+4c2ϕ1ϕ2+2c(τ2)ϕ1(cϕ12+ϕ2)\displaystyle=4c\phi_{3}+4c^{2}\phi_{1}\phi_{2}+2c(\tau-2)\phi_{1}(c\phi_{1}^{2}+\phi_{2})
K22\displaystyle K_{22} =8cϕ4+4c2ϕ22+16c2ϕ1ϕ3+8c3ϕ12ϕ2+4c(τ2)(cϕ12+ϕ2)2.\displaystyle=8c\phi_{4}+4c^{2}\phi_{2}^{2}+16c^{2}\phi_{1}\phi_{3}+8c^{3}\phi_{1}^{2}\phi_{2}+4c(\tau-2)(c\phi_{1}^{2}+\phi_{2})^{2}.

Combining this with the formula for g(ϑ(𝐟))\nabla g(\vartheta(\mathbf{f})) in (C.22), a direct but lengthy computation of the quadratic form g(ϑ(𝐟))Kg(ϑ(𝐟))\nabla g(\vartheta(\mathbf{f}))^{\top}K\,\nabla g(\vartheta(\mathbf{f})) yields

var(g(ϑ(𝐟))𝐳)=4cϕ12ϕ24(cϕ12ϕ22+2ϕ12ϕ4+2ϕ234ϕ1ϕ2ϕ3).\begin{split}\operatorname{var}\!\Big{(}\nabla g(\vartheta(\mathbf{f}))^{\top}\mathbf{z}\Big{)}&\ =\ \textstyle\frac{4c\phi_{1}^{2}}{\phi_{2}^{4}}\Big{(}c\phi_{1}^{2}\phi_{2}^{2}+2\phi_{1}^{2}\phi_{4}+2\phi_{2}^{3}-4\phi_{1}\phi_{2}\phi_{3}\Big{)}.\end{split} (C.25)

To see that the variance is positive, it suffices to check that 2ϕ12ϕ4+2ϕ234ϕ1ϕ2ϕ32\phi_{1}^{2}\phi_{4}+2\phi_{2}^{3}-4\phi_{1}\phi_{2}\phi_{3} is non-negative. Rewriting this as 2(ϕ1ϕ41/2ϕ23/2)2+4ϕ1ϕ2(ϕ41/2ϕ21/2ϕ3)2(\phi_{1}\phi_{4}^{1/2}-\phi_{2}^{3/2})^{2}+4\phi_{1}\phi_{2}(\phi_{4}^{1/2}\phi_{2}^{1/2}-\phi_{3}), its non-negativity follows from the observation that ϕ3ϕ41/2ϕ21/2\phi_{3}\leq\phi_{4}^{1/2}\phi_{2}^{1/2} is an instance of the Cauchy-Schwarz inequality. ∎
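The rewriting used in the last step is an algebraic identity, which can be confirmed numerically: for the moments of any distribution H on [0, ∞), the quantity 2φ1²φ4 + 2φ2³ − 4φ1φ2φ3 equals 2(φ1φ4^{1/2} − φ2^{3/2})² + 4φ1φ2(φ4^{1/2}φ2^{1/2} − φ3), and Cauchy-Schwarz makes the second term non-negative. A quick check with an arbitrary two-point distribution:

```python
import numpy as np

# Two-point distribution H placing mass 0.3 at 1 and 0.7 at 4 (arbitrary choice).
atoms, weights = np.array([1.0, 4.0]), np.array([0.3, 0.7])
phi = [np.sum(weights * atoms ** j) for j in range(5)]   # phi[j] = j-th moment

lhs = 2 * phi[1] ** 2 * phi[4] + 2 * phi[2] ** 3 - 4 * phi[1] * phi[2] * phi[3]
rhs = (2 * (phi[1] * np.sqrt(phi[4]) - phi[2] ** 1.5) ** 2
       + 4 * phi[1] * phi[2] * (np.sqrt(phi[4] * phi[2]) - phi[3]))
print(lhs, rhs)   # the two expressions agree, and both are non-negative
```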

Proposition 2.

Under Assumptions 1 and 2, the limits (C.6) and (C.7) hold as nn\to\infty.

Proof.

Here, we retain the definitions of f1f_{1} and f2f_{2} used in the proof of the previous proposition. The difference r^nr˘n\widehat{r}_{n}-\breve{r}_{n} can be written explicitly in terms of Tn(f1)T_{n}(f_{1}), Tn(f2)T_{n}(f_{2}), and Δ^n\widehat{\Delta}_{n} (defined below (3.13)) as

r^nr˘n=Tn(f1)2(Δ^npcnTn(f1)2)(Tn(f2)cnTn(f1)2)(Tn(f2)Δ^n/p).\displaystyle\widehat{r}_{n}-\breve{r}_{n}=\frac{T_{n}(f_{1})^{2}\Big{(}\widehat{\Delta}_{n}-pc_{n}T_{n}(f_{1})^{2}\Big{)}}{(T_{n}(f_{2})-c_{n}T_{n}(f_{1})^{2})(T_{n}(f_{2})-\widehat{\Delta}_{n}/p)}. (C.26)

Furthermore, Δ^n\widehat{\Delta}_{n} can be decomposed as

Δ^n=κ^1Tn(f1)2+κ^2Tn(f2),\widehat{\Delta}_{n}=\widehat{\kappa}_{1}T_{n}(f_{1})^{2}\ +\ \widehat{\kappa}_{2}T_{n}(f_{2}), (C.27)

where the random coefficients κ^1\widehat{\kappa}_{1} and κ^2\widehat{\kappa}_{2} are defined by

κ^1\displaystyle\widehat{\kappa}_{1}\ =cnp[n+1n+n1nς^n22pp(p+2)2(n1)n2p2+ς^n2p(p+2)]\displaystyle=\ c_{n}p\bigg{[}\frac{n+1}{n}+\frac{n-1}{n}\frac{{\widehat{\varsigma}}_{n}^{2}-2p}{p(p+2)}-\frac{2(n-1)}{n^{2}}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}\bigg{]} (C.28)
κ^2\displaystyle\widehat{\kappa}_{2}\ =cn[2(n1)np2+ς^n2p(p+2)1].\displaystyle=\ c_{n}\bigg{[}\frac{2(n-1)}{n}\frac{p^{2}+\widehat{\varsigma}_{n}^{2}}{p(p+2)}-1\bigg{]}. (C.29)

The last few displays show that in order to determine the limit of r^nr˘n\widehat{r}_{n}-\breve{r}_{n}, it suffices to determine the limit of the triple (Tn(f1),Tn(f2),ς^n2/p)(T_{n}(f_{1}),T_{n}(f_{2}),\widehat{\varsigma}_{n}^{2}/p). For the random variables Tn(f1)T_{n}(f_{1}) and Tn(f2)T_{n}(f_{2}), we can apply the limits (C.17) and (C.21) as well as the formula (C.20) to obtain

(Tn(f1),Tn(f2))(ϕ1,ϕ2+cϕ12).\displaystyle\mathcal{L}(T_{n}(f_{1}),T_{n}(f_{2}))\Rightarrow\mathcal{L}(\phi_{1},\phi_{2}+c\phi_{1}^{2}). (C.30)

In addition, the proof of the limit (3.7) in Theorem 1 shows that

(1pς^n2)(τ).\mathcal{L}(\textstyle\frac{1}{p}\widehat{\varsigma}_{n}^{2})\ \Rightarrow\ \mathcal{L}(\tau). (C.31)

Combining the previous two displays with the formulas (C.26)-(C.29), a direct calculation leads to

(r^nr˘n)(a),\displaystyle\mathcal{L}(\widehat{r}_{n}-\breve{r}_{n})\ \Rightarrow\ \mathcal{L}(a), (C.32)

where

a=cϕ12(ϕ2+(τ2)ϕ12)ϕ22,a=\textstyle\frac{c\phi_{1}^{2}(\phi_{2}+(\tau-2)\phi_{1}^{2})}{\phi_{2}^{2}},

which proves (C.6).

Now we turn to the proof of (C.7). By analogy with the previous argument that led to (C.32), it is enough to show the following two limits hold as nn\to\infty,

(Tn(f1),Tn(f2)|X)\displaystyle\mathcal{L}(T_{n}^{*}(f_{1}),T_{n}^{*}(f_{2})|X) (ϕ1,ϕ2+cϕ12)\displaystyle\xRightarrow{\mathbb{P}}\mathcal{L}(\phi_{1},\phi_{2}+c\phi_{1}^{2}) (C.33)
(1p(ς^n2)|X)\displaystyle\mathcal{L}(\textstyle\frac{1}{p}(\widehat{\varsigma}_{n}^{2})^{*}|X) (τ),\displaystyle\xRightarrow{\mathbb{P}}\mathcal{L}(\tau), (C.34)

where (ς^n2)(\widehat{\varsigma}_{n}^{2})^{*} is obtained by applying the formula (2.2) to the bootstrap data 𝐱1,,𝐱n\mathbf{x}_{1}^{*},\dots,\mathbf{x}_{n}^{*}. The first limit (C.33) is a consequence of two limits that were established in the proof of Theorem 2, which are that IIn0\textup{II}_{n}\xrightarrow{\mathbb{P}}0 and ϑ~n(𝐟)(ϕ1,ϕ2+cϕ12)\tilde{\vartheta}_{n}(\mathbf{f})\xrightarrow{\mathbb{P}}(\phi_{1},\phi_{2}+c\phi_{1}^{2}).

Regarding the second limit (C.34), note that it is equivalent to showing that for any subsequence J{1,2,}J\subset\{1,2,\dots\}, there is a further subsequence JJJ^{\prime}\subset J such that (1p(ς^n2)|X)(τ)\mathcal{L}(\textstyle\frac{1}{p}(\widehat{\varsigma}_{n}^{2})^{*}|X)\Rightarrow\mathcal{L}(\tau) holds almost surely along JJ^{\prime}. The latter statement can be proven by analogy with the limit (1pς^n2)(τ)\mathcal{L}(\frac{1}{p}\widehat{\varsigma}_{n}^{2})\Rightarrow\mathcal{L}(\tau), which follows from the proof of (3.7) in Theorem 1. To be more precise, this analogy can be justified as follows: The proof of (3.7) only relies on Assumption 1 and two other conditions, which are

λ1(Σn)1, and (1ptr(Σn),1ptr(Σn2))(ϕ1,ϕ2).\lambda_{1}(\Sigma_{n})\lesssim 1,\ \ \ \ \text{ and }\ \ \ \ (\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}),\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}^{2}))\to(\phi_{1},\phi_{2}). (C.35)

Consequently, it is enough to check that bootstrap counterparts of these conditions hold almost surely along JJ^{\prime}. First, the bootstrap counterpart of Assumption 1 was shown to hold almost surely along subsequences in the proof of Theorem 2. Second, the bootstrap counterpart of λ1(Σn)1\lambda_{1}(\Sigma_{n})\lesssim 1 is implied by Lemma A.5, which guarantees that there is a constant C>0C>0 such that lim supnλ1(Σ~n)C\limsup_{n\to\infty}\lambda_{1}(\tilde{\Sigma}_{n})\leq C holds almost surely. Lastly, the bootstrap counterpart of the limit (1ptr(Σn),1ptr(Σn2))(ϕ1,ϕ2)(\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}),\textstyle\frac{1}{p}\textup{tr}(\Sigma_{n}^{2}))\to(\phi_{1},\phi_{2}) is handled by the fact that the moments of H~n\tilde{H}_{n} converge in probability to the moments of HH, which was shown in the proof of Proposition 1. This completes the proof. ∎

Appendix D Background results

Lemma D.1 ([18], Lemma A.1).

Let ξ1\xi_{1}\in\mathbb{R} and 𝐮1p\mathbf{u}_{1}\in\mathbb{R}^{p} satisfy the conditions in Assumption 1, and fix any symmetric matrix Mp×pM\in\mathbb{R}^{p\times p}. Then,

var(ξ12𝐮1M𝐮1)=𝔼(ξ14)p(p+2)(tr(M)2+2tr(M2))tr(M)2.\displaystyle\operatorname{var}(\xi_{1}^{2}\mathbf{u}_{1}^{\top}M\mathbf{u}_{1})=\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\textup{tr}(M)^{2}+2\textup{tr}(M^{2}))-\textup{tr}(M)^{2}. (D.1)
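In the special case ξ1² ~ χp² (so that ξ1𝐮1 is a standard Gaussian vector 𝐳), we have 𝔼(ξ1⁴) = p(p+2), and the right side of (D.1) collapses to the familiar identity var(𝐳⊤M𝐳) = 2tr(M²). A numeric sanity check of this special case, with an arbitrary seeded symmetric matrix M:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
M = (A + A.T) / 2                         # arbitrary symmetric matrix M

# Formula (D.1) with E(xi_1^4) = p*(p+2), the chi-square_p fourth moment.
trM, trM2 = np.trace(M), np.trace(M @ M)
E_xi4 = p * (p + 2)
formula = E_xi4 / (p * (p + 2)) * (trM ** 2 + 2 * trM2) - trM ** 2
assert np.isclose(formula, 2 * trM2)      # reduces exactly to 2*tr(M^2)

# Monte Carlo estimate of var(z^T M z) for z ~ N(0, I_p).
Z = rng.standard_normal((200_000, p))
quad = np.einsum('ij,jk,ik->i', Z, M, Z)  # quad[i] = z_i^T M z_i
print(quad.var(), 2 * trM2)
```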
Lemma D.2.

Let 𝐱1,,𝐱n\mathbf{x}_{1},\ldots,\mathbf{x}_{n} satisfy the conditions in Assumption 1, and let i,j,k,li,j,k,l be four distinct indices in {1,,n}\{1,\dots,n\}. Then,

𝔼((𝐱i𝐱j)4)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{4})\ = 3(𝔼(ξ14)p(p+2))2(tr(Σn2)2+2tr(Σn4))\displaystyle=\ 3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}(\textup{tr}(\Sigma_{n}^{2})^{2}+2\textup{tr}(\Sigma_{n}^{4})) (D.2)
𝔼((𝐱i𝐱j)2(𝐱i𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{i}^{\top}\mathbf{x}_{k})^{2})\ =𝔼(ξ14)p(p+2)(tr(Σn2)2+2tr(Σn4))\displaystyle=\ \frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}(\textup{tr}(\Sigma_{n}^{2})^{2}+2\textup{tr}(\Sigma_{n}^{4})) (D.3)
𝔼((𝐱i𝐱j)2(𝐱l𝐱k)2)\displaystyle\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2})\ =tr(Σn2)2.\displaystyle=\ \textup{tr}(\Sigma^{2}_{n})^{2}. (D.4)
Proof.

Since the observations are i.i.d. and centered, we have

𝔼((𝐱i𝐱j)2(𝐱l𝐱k)2)=(var(𝐱i𝐱j))2=(cov(𝐱i𝐱j,𝐱i𝐱j))2=(r=1ps=1p𝔼(𝐱ir𝐱jr𝐱is𝐱js))2=(r=1ps=1p((Σn)rs)2)2,\begin{split}\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{2}(\mathbf{x}_{l}^{\top}\mathbf{x}_{k})^{2})&=(\operatorname{var}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j}))^{2}\\[5.69046pt] &=(\text{cov}(\mathbf{x}_{i}^{\top}\mathbf{x}_{j}\,,\,\mathbf{x}_{i}^{\top}\mathbf{x}_{j}))^{2}\\ &=\Big{(}\sum_{r=1}^{p}\sum_{s=1}^{p}\mathbb{E}(\mathbf{x}_{ir}\mathbf{x}_{jr}\mathbf{x}_{is}\mathbf{x}_{js})\Big{)}^{2}\\[5.69046pt] &=\Big{(}\sum_{r=1}^{p}\sum_{s=1}^{p}((\Sigma_{n})_{rs})^{2}\Big{)}^{2},\end{split} (D.5)

which establishes (D.4). The two other assertions can be shown using conditional expectation. For the statement (D.2), we use Lemma D.1 to obtain

\begin{align*}
\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j})^{4}) &= \mathbb{E}\big(\mathbb{E}((\mathbf{x}_{i}^{\top}\mathbf{x}_{j}\mathbf{x}_{j}^{\top}\mathbf{x}_{i})^{2}\,|\,\mathbf{x}_{j})\big)\\
&= \frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\,\mathbb{E}\bigg(\Big(\textup{tr}(\Sigma_{n}^{1/2}\mathbf{x}_{j}\mathbf{x}_{j}^{\top}\Sigma_{n}^{1/2})\Big)^{2}+2\,\textup{tr}\Big(\big(\Sigma_{n}^{1/2}\mathbf{x}_{j}\mathbf{x}_{j}^{\top}\Sigma_{n}^{1/2}\big)^{2}\Big)\bigg)\\
&= 3\,\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\,\mathbb{E}((\mathbf{x}_{j}^{\top}\Sigma_{n}\mathbf{x}_{j})^{2})\\
&= 3\left(\frac{\mathbb{E}(\xi_{1}^{4})}{p(p+2)}\right)^{2}\big(\textup{tr}(\Sigma_{n}^{2})^{2}+2\,\textup{tr}(\Sigma_{n}^{4})\big).
\end{align*}

The argument for (D.3) is similar. ∎
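As an informal numerical complement (not part of the proof), the identity (D.2) can be checked by simulation in the Gaussian special case $\Sigma_{n}=I_{p}$, where $\xi_{1}^{2}\sim\chi_{p}^{2}$ and $\mathbb{E}(\xi_{1}^{4})=p(p+2)$, so that the right-hand side reduces to $3p(p+2)$. The dimension and sample size below are arbitrary choices for this sketch.

```python
import numpy as np

# Gaussian special case of (D.2): with Sigma_n = I_p and xi_1^2 ~ chi^2_p,
# E(xi_1^4) = p(p + 2), so the right-hand side of (D.2) reduces to 3p(p + 2).
rng = np.random.default_rng(0)
p, n_mc = 5, 200_000
x = rng.standard_normal((n_mc, p))
y = rng.standard_normal((n_mc, p))
est = np.mean(np.sum(x * y, axis=1) ** 4)   # Monte Carlo estimate of E((x' y)^4)
print(est, 3 * p * (p + 2))                 # estimate vs the exact value 3p(p + 2)
```

With $p=5$ the exact value is $105$, and the Monte Carlo estimate should agree to within a few percent.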

Lemma D.3.

Let $\mathbf{u}_{1}\in\mathbb{R}^{p}$ be a random vector that is uniformly distributed on the unit sphere, and let $M\in\mathbb{R}^{p\times p}$ be a non-random positive semidefinite matrix with $\lambda_{1}(M)\lesssim 1$. Then,

\begin{equation}
\mathbb{E}\Big|\mathbf{u}_{1}^{\top}M\mathbf{u}_{1}-\tfrac{1}{p}\textup{tr}(M)\Big|^{4}\ \lesssim\ \frac{1}{p^{2}}. \tag{D.6}
\end{equation}
Proof.

Due to the orthogonal invariance of $\mathbf{u}_{1}$, we may work under the assumption that $M$ is diagonal, i.e.\ $M=\textup{diag}(\lambda_{1}(M),\dots,\lambda_{p}(M))$. Therefore, the quantity $\mathbb{E}\big|\mathbf{u}_{1}^{\top}M\mathbf{u}_{1}-\frac{1}{p}\textup{tr}(M)\big|^{4}$ is the same as

\begin{equation}
\sum_{j_{1},j_{2},j_{3},j_{4}}\lambda_{j_{1}}(M)\lambda_{j_{2}}(M)\lambda_{j_{3}}(M)\lambda_{j_{4}}(M)\,\mathbb{E}\Big(\prod_{l=1}^{4}\big(\mathbf{u}_{1j_{l}}^{2}-\tfrac{1}{p}\big)\Big). \tag{D.7}
\end{equation}

Depending on the number of distinct indices among $(j_{1},j_{2},j_{3},j_{4})$, the following bounds can be obtained via direct calculation:

\begin{equation}
\bigg|\mathbb{E}\Big(\prod_{l=1}^{4}\big(\mathbf{u}_{1j_{l}}^{2}-\tfrac{1}{p}\big)\Big)\bigg|\ \lesssim\ \begin{cases}\frac{1}{p^{6}} & \text{(4 distinct indices)}\\[4pt]\frac{1}{p^{5}} & \text{(3 distinct indices)}\\[4pt]\frac{1}{p^{4}} & \text{(1 or 2 distinct indices).}\end{cases} \tag{D.8}
\end{equation}

Since the eigenvalues of $M$ are non-negative with $\lambda_{1}(M)\lesssim 1$, and since the number of terms in (D.7) involving $k$ distinct indices is $\mathcal{O}(p^{k})$, combining the cases in (D.8) yields the stated result (D.6). ∎
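The $1/p^{2}$ rate in (D.6) can also be illustrated numerically (this is only an informal sanity check, not part of the proof); the eigenvalue choice, sample size, and the helper `fourth_moment` below are hypothetical choices for this sketch.

```python
import numpy as np

def fourth_moment(p, n_mc=50_000, seed=0):
    """Monte Carlo estimate of E|u' M u - tr(M)/p|^4 for u uniform on the
    unit sphere and M diagonal with eigenvalues in [0, 1], so lambda_1(M) <= 1."""
    rng = np.random.default_rng(seed)
    lam = np.linspace(0.0, 1.0, p)                    # eigenvalues of M
    g = rng.standard_normal((n_mc, p))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on the unit sphere
    quad = u**2 @ lam - lam.mean()                    # u' M u - tr(M)/p
    return float(np.mean(quad**4))

# The rescaled moment p^2 * E|u' M u - tr(M)/p|^4 should stay bounded as p grows.
for p in (25, 50, 100):
    print(p, p**2 * fourth_moment(p))
```

The printed values should remain of comparable magnitude across the three dimensions, consistent with the $\lesssim 1/p^{2}$ bound.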

Appendix E Discussion of examples in Section 3.1

This section provides detailed information related to the examples of the random variable $\xi_{1}^{2}$ stated in Section 3.1. We give explicit parameterizations, and we check that the distributions satisfy the conditions in Assumption 1. The only three examples we do not cover individually are the Chi-Squared, Poisson, and Negative-Binomial distributions, because they can be decomposed into sums of independent random variables, and consequently, they are covered by the following lemma.

Lemma E.1.

Suppose $\xi_{1}^{2}=\sum_{j=1}^{p}z_{1j}^{2}$ for some independent random variables $z_{11},\dots,z_{1p}$ satisfying

\begin{equation}
\frac{1}{p}\sum_{j=1}^{p}\mathbb{E}(z_{1j}^{2})=1,\qquad \frac{1}{p}\sum_{j=1}^{p}\operatorname{var}(z_{1j}^{2})=\tau+o(1),\qquad \max_{1\leq j\leq p}\mathbb{E}|z_{1j}|^{8+2\varepsilon}\,\lesssim\,p^{1+\frac{\varepsilon}{4}} \tag{E.1}
\end{equation}

as $n\to\infty$, for some fixed constants $\tau\geq 0$ and $\varepsilon>0$ not depending on $n$. Then, $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

It is only necessary to show $\mathbb{E}\big|(\xi_{1}^{2}-\mathbb{E}(\xi_{1}^{2}))/\sqrt{p}\big|^{4+\varepsilon}\lesssim 1$. Using Rosenthal's inequality [22] to bound the $L^{4+\varepsilon}$ norm of a sum of independent centered random variables, we have

\begin{equation}
\begin{split}
\Big(\mathbb{E}\big|\xi_{1}^{2}-\mathbb{E}(\xi_{1}^{2})\big|^{4+\varepsilon}\Big)^{\frac{1}{4+\varepsilon}} &= \Big(\mathbb{E}\big|\textstyle\sum_{j=1}^{p}\big(z_{1j}^{2}-\mathbb{E}(z_{1j}^{2})\big)\big|^{4+\varepsilon}\Big)^{\frac{1}{4+\varepsilon}}\\
&\lesssim \sqrt{\textstyle\sum_{j=1}^{p}\operatorname{var}(z_{1j}^{2})}+p^{\frac{1}{4+\varepsilon}}\max_{1\leq j\leq p}\big(\mathbb{E}|z_{1j}^{2}-\mathbb{E}(z_{1j}^{2})|^{4+\varepsilon}\big)^{\frac{1}{4+\varepsilon}}\\
&\lesssim \sqrt{p}+p^{1/4}\max_{1\leq j\leq p}\big(\mathbb{E}|z_{1j}|^{8+2\varepsilon}\big)^{\frac{1}{4+\varepsilon}}\\
&\lesssim \sqrt{p},
\end{split} \tag{E.2}
\end{equation}

which completes the proof. ∎
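As a concrete instance of Lemma E.1 (illustrative only), the Chi-Squared case $\xi_{1}^{2}\sim\chi_{p}^{2}$ corresponds to $z_{1j}\sim N(0,1)$, for which the conditions in (E.1) hold with $\tau=2$ and, say, $\varepsilon=2$, since all of the Gaussian moments involved are constants not depending on $p$. The helper name below is hypothetical.

```python
import math

def gaussian_even_abs_moment(k):
    # E|Z|^k for Z ~ N(0, 1) and even k equals the double factorial (k - 1)!!
    assert k % 2 == 0
    return math.prod(range(k - 1, 0, -2))

# Chi-Squared example: xi_1^2 = sum_j z_{1j}^2 with z_{1j} ~ N(0, 1).
mean_z2 = gaussian_even_abs_moment(2)              # E(z^2) = 1
var_z2 = gaussian_even_abs_moment(4) - mean_z2**2  # var(z^2) = 2, so tau = 2
m12 = gaussian_even_abs_moment(12)                 # E|z|^{8 + 2*2} = 10395
print(mean_z2, var_z2, m12)  # constants, so the bound by p^{1 + eps/4} is immediate
```

Because the moments are fixed constants, the third condition in (E.1) holds for every $p\geq 1$.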

Beta distribution. The Beta$(a,b)$ distribution with parameters $a,b>0$ has a density function that is proportional to $x^{a-1}(1-x)^{b-1}$ for $x\in(0,1)$.

Lemma E.2.

If $\beta>0$ is fixed with respect to $n$ and $\xi_{1}^{2}\sim(p+2\beta)\,\text{Beta}(p/2,\beta)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

The mean and variance are given by $\mathbb{E}(\xi_{1}^{2})=p$ and $\operatorname{var}(\xi_{1}^{2})=\frac{4\beta p}{p+2\beta+2}$, and in particular we have $\operatorname{var}(\xi_{1}^{2})/p\to 0$ as $p\to\infty$. Based on Equation (25.14) in [21], the higher moments of $\xi_{1}^{2}$ are

\begin{equation*}
\mathbb{E}(\xi_{1}^{2k})=\prod_{j=0}^{k-1}\Big(p+\frac{4j\beta}{p+2\beta+2j}\Big).
\end{equation*}

Using the general relationship between central moments and ordinary moments,

\begin{equation}
\mathbb{E}\Big|\frac{\xi_{1}^{2}-p}{\sqrt{p}}\Big|^{6}\ =\ \frac{1}{p^{3}}\sum_{j=0}^{6}\binom{6}{j}(-1)^{6-j}\,\mathbb{E}(\xi_{1}^{2})^{6-j}\,\mathbb{E}(\xi_{1}^{2j}), \tag{E.3}
\end{equation}

it can be checked that $\mathbb{E}|\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{6}$ is a rational function of $\beta$ that converges pointwise to $0$ as $p\to\infty$. This implies the $(4+\varepsilon)$-moment condition in (3.2). ∎
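The mean and variance stated at the start of the proof can be double-checked exactly with rational arithmetic, using the standard moments of the Beta distribution. This is only a sanity check of the parameterization $\xi_{1}^{2}\sim(p+2\beta)\,\text{Beta}(p/2,\beta)$, and the function name is hypothetical.

```python
from fractions import Fraction as F

def scaled_beta_mean_var(p, beta):
    # xi_1^2 = c * U with U ~ Beta(a, b), where a = p/2, b = beta, c = p + 2*beta.
    # Uses E(U) = a/(a+b) and var(U) = a*b / ((a+b)^2 (a+b+1)).
    a, b, c = F(p, 2), F(beta), F(p + 2 * beta)
    mean = c * a / (a + b)                             # simplifies to p
    var = c**2 * a * b / ((a + b) ** 2 * (a + b + 1))  # simplifies to 4*beta*p/(p+2*beta+2)
    return mean, var

p, beta = 40, 3
mean, var = scaled_beta_mean_var(p, beta)
print(mean, var)  # p and 4*beta*p/(p + 2*beta + 2)
```

For instance, with $p=40$ and $\beta=3$ the variance is $480/48=10$, matching $\frac{4\beta p}{p+2\beta+2}$.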

Beta-Prime distribution. A random variable $W$ is said to follow a Beta-Prime$(a,b)$ distribution with parameters $a,b>0$ if it can be expressed as $W=\frac{U}{1-U}$ with $U\sim\text{Beta}(a,b)$.

Lemma E.3.

If $\tau>0$ and $\xi_{1}^{2}\sim\text{Beta-Prime}\left(\frac{p(1+p+\tau)}{\tau},\frac{1+p+2\tau}{\tau}\right)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

For any positive integer $k<(1+p+2\tau)/\tau$, the random variable $\xi_{1}^{2}$ satisfies the following moment formula [42, §5],

\begin{equation}
\mathbb{E}(\xi_{1}^{2k})=\prod_{j=1}^{k}\frac{p(1+p+\tau)+(j-1)\tau}{1+p+(2-j)\tau}. \tag{E.4}
\end{equation}

Since $2<(1+p+2\tau)/\tau$, this gives $\mathbb{E}(\xi_{1}^{2})=p$ and $\mathbb{E}(\xi_{1}^{4})=p(p+\tau)$, which yield $\operatorname{var}\big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\big)=\tau$. Also, using the formula (E.3), it can be checked that $\mathbb{E}|\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{6}$, viewed as a function of $\tau$, converges pointwise to a polynomial function of $\tau$ as $p\to\infty$. This implies the $(4+\varepsilon)$-moment condition in (3.2). ∎
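The values $\mathbb{E}(\xi_{1}^{2})=p$ and $\mathbb{E}(\xi_{1}^{4})=p(p+\tau)$ can also be recovered from the standard Beta-Prime moment formula $\mathbb{E}(W^{k})=\prod_{j=1}^{k}\frac{a+j-1}{b-j}$, valid for $k<b$. The short check below uses exact rational arithmetic and is only illustrative; the helper names are hypothetical.

```python
import math
from fractions import Fraction as F

def beta_prime_moment(a, b, k):
    # E(W^k) for W ~ Beta-Prime(a, b), valid when k < b:
    # E(W^k) = prod_{j=1}^{k} (a + j - 1) / (b - j).
    return math.prod((a + j - 1) / (b - j) for j in range(1, k + 1))

p, tau = 30, 2
a = F(p * (1 + p + tau), tau)   # first Beta-Prime parameter from Lemma E.3
b = F(1 + p + 2 * tau, tau)     # second Beta-Prime parameter from Lemma E.3
m1 = beta_prime_moment(a, b, 1)
m2 = beta_prime_moment(a, b, 2)
print(m1, m2, (m2 - m1**2) / p)  # p, p*(p + tau), and tau
```

With $p=30$ and $\tau=2$, this reproduces $\mathbb{E}(\xi_{1}^{2})=30$, $\mathbb{E}(\xi_{1}^{4})=960$, and $\operatorname{var}\big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\big)=2$.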
 
Gamma distribution. For $\alpha,\beta>0$, we parameterize the Gamma$(\alpha,\beta)$ distribution so that its density function is proportional to $x^{\alpha-1}e^{-\beta x}$ for $x\in(0,\infty)$.

Lemma E.4.

If $\tau>0$ and $\xi_{1}^{2}\sim\textup{Gamma}(p/\tau,1/\tau)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

The conditions $\mathbb{E}(\xi_{1}^{2})=p$ and $\operatorname{var}\big(\frac{\xi_{1}^{2}-p}{\sqrt{p}}\big)=\tau$ follow directly from the stated parameterization. Also, Theorem 2.3 in [8] gives

\begin{equation}
\big(\mathbb{E}|\xi_{1}^{2}-p|^{6}\big)^{1/6}\leq C(\sqrt{\tau p}+\tau) \tag{E.5}
\end{equation}

for an absolute constant $C>0$, which implies the $(4+\varepsilon)$-moment condition in (3.2). ∎
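Since the parameterization uses a rate parameter, the identities $\mathbb{E}(\xi_{1}^{2})=\alpha/\beta=p$ and $\operatorname{var}(\xi_{1}^{2})=\alpha/\beta^{2}=p\tau$ can be confirmed with an exact one-line computation; this is only an illustrative check, and the helper name is hypothetical.

```python
from fractions import Fraction as F

def gamma_mean_var(alpha, beta):
    # Mean and variance of Gamma(alpha, beta) with rate parameter beta.
    return alpha / beta, alpha / beta**2

p, tau = 50, 3
mean, var = gamma_mean_var(F(p, tau), F(1, tau))  # Gamma(p/tau, 1/tau)
print(mean, var / p)  # p and tau
```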
 
Log-Normal distribution. A positive random variable $Y$ is said to follow a Log-Normal$(\mu,\sigma^{2})$ distribution if $\log(Y)\sim N(\mu,\sigma^{2})$.

Lemma E.5.

If $\tau>0$ and $\xi_{1}^{2}\sim\text{Log-Normal}\left(\log(p)-\frac{1}{2}\log\big(1+\frac{\tau}{p}\big),\,\log\big(1+\frac{\tau}{p}\big)\right)$, then $\xi_{1}^{2}$ satisfies the conditions in Assumption 1.

Proof.

Equations (14.8a)-(14.8b) in [20] show that if $Y\sim\textup{Log-Normal}(\mu,\sigma^{2})$, then $\mathbb{E}(Y)=e^{\mu+\sigma^{2}/2}$ and $\operatorname{var}(Y)=e^{2\mu+\sigma^{2}}\big(e^{\sigma^{2}}-1\big)$. Consequently, the stated choice of $\xi_{1}^{2}$ satisfies $\mathbb{E}(\xi_{1}^{2})=p$ and $\operatorname{var}(\xi_{1}^{2})/p=\tau$. Equation (14.8c) in [20] shows that the central 6th moment of $Y$ is

\begin{equation*}
\mathbb{E}|Y-\mathbb{E}(Y)|^{6}=e^{6\mu+3\sigma^{2}}\sum_{j=0}^{6}(-1)^{j}\binom{6}{j}e^{\sigma^{2}(6-j)(5-j)/2},
\end{equation*}

and a direct calculation reveals that $\mathbb{E}|\frac{\xi_{1}^{2}-p}{\sqrt{p}}|^{6}$ is a polynomial function of $\tau$ whose coefficients are bounded as $p\to\infty$. This ensures that $\xi_{1}$ satisfies the $(4+\varepsilon)$-moment condition in (3.2). ∎
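The first two identities follow because $e^{\mu+\sigma^{2}/2}=e^{\log(p)}=p$ and $e^{2\mu+\sigma^{2}}(e^{\sigma^{2}}-1)=p^{2}\cdot\frac{\tau}{p}=p\tau$ for the stated choice of $(\mu,\sigma^{2})$. A floating-point check (illustrative only, with a hypothetical helper name) is:

```python
import math

def lognormal_mean_var(mu, sigma2):
    # Mean and variance of Log-Normal(mu, sigma^2); see (14.8a)-(14.8b) in [20].
    mean = math.exp(mu + sigma2 / 2)
    var = math.exp(2 * mu + sigma2) * (math.exp(sigma2) - 1)
    return mean, var

p, tau = 200.0, 4.0
sigma2 = math.log1p(tau / p)          # sigma^2 = log(1 + tau/p)
mu = math.log(p) - sigma2 / 2         # mu = log(p) - (1/2) log(1 + tau/p)
mean, var = lognormal_mean_var(mu, sigma2)
print(mean, var / p)  # approximately p and tau
```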

Acknowledgements

The authors thank the reviewers and Associate Editor for their helpful feedback, which significantly improved the paper.

References

  • Akemann et al. [2011] Akemann, G., Baik, J., and Di Francesco, P. The Oxford Handbook of Random Matrix Theory. Oxford, 2011.
  • Anderson [2003] Anderson, T. W. Introduction to Multivariate Statistical Analysis. Wiley, 2003.
  • Bai and Silverstein [2004] Bai, Z. and Silverstein, J. W. CLT for linear spectral statistics of large-dimensional sample covariance matrices. Annals of Probability, 32(1A):553–605, 2004.
  • Bai and Saranadasa [1996] Bai, Z. and Saranadasa, H. Effect of high dimension: by an example of a two sample problem. Statistica Sinica, pages 311–329, 1996.
  • Bai and Silverstein [2010] Bai, Z. and Silverstein, J. W. Spectral Analysis of Large Dimensional Random Matrices. Springer, 2010.
  • Bai and Zhou [2008] Bai, Z. and Zhou, W. Large sample covariance matrices without independence structures in columns. Statistica Sinica, pages 425–442, 2008.
  • Bai et al. [2010] Bai, Z., Chen, J., and Yao, J. On estimation of the population spectral distribution from a high-dimensional sample covariance matrix. Australian & New Zealand Journal of Statistics, 52(4):423–437, 2010.
  • Boucheron et al. [2013] Boucheron, S., Lugosi, G., and Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford, 2013.
  • Couillet and Debbah [2011] Couillet, R. and Debbah, M. Random Matrix Methods for Wireless Communications. Cambridge, 2011.
  • Edelman and Wang [2013] Edelman, A. and Wang, Y. Random matrix theory and its innovative applications. In Advances in Applied Mathematics, Modeling, and Computational Science, pages 91–116. Springer, 2013.
  • El Karoui [2009] El Karoui, N. Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond. Annals of Applied Probability, 19(6):2362–2405, 2009.
  • El Karoui [2008] El Karoui, N. Spectrum estimation for large dimensional covariance matrices using random matrix theory. Annals of Statistics, 36(6):2757–2790, 2008.
  • El Karoui and Purdom [2019] El Karoui, N. and Purdom, E. The non-parametric bootstrap and spectral analysis in moderate and high-dimension. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2115–2124. PMLR, 2019.
  • Fang et al. [1990] Fang, K.-T., Kotz, S., and Ng, K. W. Symmetric Multivariate and Related Distributions. Chapman and Hall, 1990.
  • Fisher et al. [2010] Fisher, T. J., Sun, X., and Gallagher, C. M. A new test for sphericity of the covariance matrix for high dimensional data. Journal of Multivariate Analysis, 101(10):2554–2570, 2010.
  • Gupta et al. [2013] Gupta, A. K., Varga, T., and Bodnar, T. Elliptically Contoured Models in Statistics and Portfolio Theory. Springer, 2013.
  • Härdle and Simar [2019] Härdle, W. K. and Simar, L. Applied Multivariate Statistical Analysis. Springer, 2019.
  • Hu et al. [2019a] Hu, J., Li, W., Liu, Z., and Zhou, W. High-dimensional covariance matrices in elliptical distributions with application to spherical test. Annals of Statistics, 47(1):527–555, 2019a.
  • Hu et al. [2019b] Hu, J., Li, W., and Zhou, W. Central limit theorem for mutual information of large MIMO systems with elliptically correlated channels. IEEE Transactions on Information Theory, 65(11):7168–7180, 2019b.
  • Johnson et al. [1994] Johnson, N. L., Kotz, S., and Balakrishnan, N. Continuous Univariate Distributions, Volume 1, (2nd ed.). Wiley, 1994.
  • Johnson et al. [1995] Johnson, N. L., Kotz, S., and Balakrishnan, N. Continuous Univariate Distributions, Volume 2, (2nd ed.). Wiley, 1995.
  • Johnson et al. [1985] Johnson, W. B., Schechtman, G., and Zinn, J. Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Annals of Probability, pages 234–253, 1985.
  • Jonsson [1982] Jonsson, D. Some limit theorems for the eigenvalues of a sample covariance matrix. Journal of Multivariate Analysis, 12(1):1–38, 1982.
  • Jun et al. [2022] Jun, W., Jiahui, X., Long, Y., and Wang, Z. Tracy-Widom limit for the largest eigenvalue of high-dimensional covariance matrices in elliptical distributions. Bernoulli, 28(4):2941–2967, 2022.
  • Kong and Valiant [2017] Kong, W. and Valiant, G. Spectrum estimation from samples. Annals of Statistics, 45(5):2218–2247, 2017.
  • Ledoit and Wolf [2015] Ledoit, O. and Wolf, M. Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. Journal of Multivariate Analysis, 139:360–384, 2015.
  • Ledoit and Wolf [2017] Ledoit, O. and Wolf, M. Numerical implementation of the QuEST function. Computational Statistics & Data Analysis, 115:199–223, 2017.
  • Li and Yao [2018] Li, W. and Yao, J. On structure testing for component covariance matrices of a high dimensional mixture. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):293–318, 2018.
  • Li et al. [2022] Li, W., Wang, Q., Yao, J., and Zhou, W. On eigenvalues of a high-dimensional spatial-sign covariance matrix. Bernoulli, 28(1):606–637, 2022.
  • Lopes [2016] Lopes, M. E. Unknown sparsity in compressed sensing: Denoising and inference. IEEE Transactions on Information Theory, 62(9):5145–5166, 2016.
  • Lopes [2022a] Lopes, M. E. Central limit theorem and bootstrap approximation in high dimensions: Near 1/n1/\sqrt{n} rates via implicit smoothing. The Annals of Statistics, 50(5):2492–2513, 2022a.
  • Lopes [2022b] Lopes, M. E. Improved rates of bootstrap approximation for the operator norm: A coordinate-free approach. arXiv:2208.03050, 2022b.
  • Lopes and Yao [2022] Lopes, M. E. and Yao, J. A sharp lower-tail bound for Gaussian maxima with application to bootstrap methods in high dimensions. Electronic Journal of Statistics, 16(1):58–83, 2022.
  • Lopes et al. [2019] Lopes, M. E., Blandino, A., and Aue, A. Bootstrapping spectral statistics in high dimensions. Biometrika, 106(4):781–801, 2019.
  • Mammen [1993] Mammen, E. Bootstrap and wild bootstrap for high dimensional linear models. Annals of Statistics, 21(1):255–285, 1993.
  • Marcus et al. [2022] Marcus, A. W., Spielman, D. A., and Srivastava, N. Interlacing families III: Sharper restricted invertibility estimates. Israel Journal of Mathematics, 247(2):519–546, 2022.
  • McNeil et al. [2011] McNeil, A., Frey, R., and Embrechts, P. Quantitative Risk Management. Princeton, 2011.
  • Mestre [2008] Mestre, X. Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates. IEEE Transactions on Information Theory, 54(11):5113–5129, 2008.
  • Paindaveine and Verdebout [2016] Paindaveine, D. and Verdebout, T. On high-dimensional sign tests. Bernoulli, 22(3):1745 – 1769, 2016.
  • Patterson et al. [2006] Patterson, N., Price, A. L., and Reich, D. Population structure and eigenanalysis. PLoS genetics, 2(12):e190, 2006.
  • Potters and Bouchaud [2020] Potters, M. and Bouchaud, J.-P. A First Course in Random Matrix Theory: For Physicists, Engineers and Data Scientists. Cambridge, 2020.
  • Siegrist [2017] Siegrist, K. Probability, Mathematical Statistics, and Stochastic Processes. LibreTexts, 2017.
  • Srivastava [2005] Srivastava, M. S. Some tests concerning the covariance matrix in high dimensional data. Journal of the Japan Statistical Society, 35(2):251–272, 2005.
  • Tang and Nehorai [2012] Tang, G. and Nehorai, A. The stability of low-rank matrix reconstruction: a constrained singular value view. IEEE Transactions on Information Theory, 58(9):6079–6092, 2012.
  • Tian et al. [2015] Tian, X., Lu, Y., and Li, W. A robust test for sphericity of high-dimensional covariance matrices. Journal of Multivariate Analysis, 141:217–227, 2015.
  • Vershynin [2018] Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge, 2018.
  • Yang et al. [2021] Yang, X., Zheng, X., and Chen, J. Testing high-dimensional covariance matrices under the elliptical distribution and beyond. Journal of Econometrics, 221(2):409–423, 2021.
  • Yao et al. [2015] Yao, J., Zheng, S., and Bai, Z. Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge, 2015.
  • Yao and Lopes [2023] Yao, J. and Lopes, M. E. Rates of bootstrap approximation for eigenvalues in high-dimensional PCA. to appear: Statistica Sinica (arXiv:2104.07328), 2023.
  • Yin et al. [1988] Yin, Y. Q., Bai, Z., and Krishnaiah, P. R. On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probability Theory and Related Fields, 78(4):509–521, 1988.
  • Zhang et al. [2022] Zhang, Y., Hu, J., and Li, W. CLT for linear spectral statistics of high-dimensional sample covariance matrices in elliptical distributions. Journal of Multivariate Analysis, page 105007, 2022.
  • Zhao and Candès [2022] Zhao, Q. and Candès, E. J. An adaptively resized parametric bootstrap for inference in high-dimensional generalized linear models. arXiv:2208.08944, 2022.