
Covariance test and universal bootstrap by operator norm

Guoyu Zhang ([email protected]), Dandan Jiang ([email protected]) and Fang Yao ([email protected])

Department of Probability and Statistics, School of Mathematical Sciences, Center for Statistical Science, Peking University (G. Zhang and F. Yao); School of Mathematics and Statistics, Xi'an Jiaotong University (D. Jiang)
Abstract

Testing covariance matrices is crucial in high-dimensional statistics. Traditional methods based on the Frobenius and supremum norms are widely used but often treat the covariance matrix as a vector and neglect its inherent matrix structure. This paper introduces a new testing framework based on the operator norm, designed to capture the spectral properties of the covariance matrix more accurately. The commonly used empirical bootstrap and multiplier bootstrap methods are shown to fail for operator norm-based statistics. To derive the critical values of such statistics, we propose a universal bootstrap procedure, utilizing the concept of universality from random matrix theory. Our method demonstrates consistency across both high- and ultra-high-dimensional regimes, accommodating scenarios where the dimension-to-sample-size ratio $p/n$ converges to a nonzero constant or diverges to infinity. As a byproduct, we provide the first proof of the Tracy–Widom law for the largest eigenvalue of sample covariance matrices with non-Gaussian entries as $p/n\to\infty$. We also show that such universality does not hold for the Frobenius norm and supremum norm statistics. Extensive simulations and a real-world data study support our findings, highlighting the favorable finite sample performance of the proposed operator norm-based statistics.

Keywords: covariance test, high-dimensional hypothesis test, operator norm, random matrix theory, universality, bootstrap.

MSC2020 subject classifications: 62H15, 62F40, 60B20.

1 Introduction

Testing large covariance matrices has received considerable attention due to its critical role in high-dimensional statistics. Generally, statistics for testing one-sample high-dimensional covariance matrices are based on the distance between the sample covariance matrix and the hypothesized covariance matrix. As highlighted by Chen, Qiu and Zhang (2023), this distance is typically measured using two types of norms: the Frobenius norm (e.g., Chen, Zhang and Zhong (2010); Cai and Ma (2013)) and the supremum norm (e.g., Jiang (2004); Cai and Jiang (2011)), similar to their application in high-dimensional mean tests (Chen and Qin (2010); Cai, Liu and Xia (2014)).

In contrast to the Frobenius norm and the supremum norm, which treat the covariance matrix as a vector, the operator norm captures the spectral structure of the covariance matrix and has recently gained significant attention. For $n$ independent, not necessarily identically distributed random vectors $\bm{X}_{1},\cdots,\bm{X}_{n}\in\mathbb{R}^{p}$ with zero mean and a common covariance matrix $\bm{\Sigma}$, consider the statistic

\displaystyle T=\left\|\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}\right\|_{\text{op}}, (1)

where $\bm{\hat{\Sigma}}=\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{T}/n$ is the sample covariance matrix, $\bm{\Sigma}_{0}$ is the hypothesized covariance matrix, and the operator norm of a matrix $\bm{A}$ is defined as $\|\bm{A}\|_{\text{op}}=\sup_{\|\bm{u}\|=1}\|\bm{A}\bm{u}\|$. The statistic $T$ is fundamental in principal component analysis, as it bounds the error of sample eigenvalues and eigenvectors. As a result, numerous studies have developed non-asymptotic upper bounds for the tail of $T$, e.g., Adamczak et al. (2011); Bunea and Xiao (2015); Koltchinskii and Lounici (2017). However, these upper bounds are often conservative when used to construct confidence intervals for $T$. Only a few studies have explored the asymptotic distribution of $T$ due to the complex nature of the spectral analysis, including Han, Xu and Zhou (2018); Lopes (2022a); Lopes, Erichson and Mahoney (2023); Giessing (2023). Nonetheless, these works assume either that the dimension $p$ grows more slowly than $n$, or that the eigenvalues of $\bm{\Sigma}$ and $\bm{\Sigma}_{0}$ decay fast enough to make the effective rank smaller than $n$. Under such conditions, the empirical covariance matrix $\bm{\hat{\Sigma}}$ is a consistent estimator of $\bm{\Sigma}$ with respect to the operator norm. Essentially, these assumptions reduce the intrinsic dimension of the problem, leaving the truly high-dimensional cases unresolved. This article studies the behavior of $T$ under both high- and ultra-high-dimensional regimes, characterized by constants $C_{1},C_{2}>0$ and $\alpha\geq 1$ such that

\displaystyle C_{1}n^{\alpha}\leq p\leq C_{2}n^{\alpha}. (2)

Furthermore, no eigen-decay assumptions are made on $\bm{\Sigma}$ or $\bm{\Sigma}_{0}$. In this context, the dimension-to-sample-size ratio $\phi=p/n$ may converge to a nonzero constant or diverge to infinity, and no consistent estimator of $\bm{\Sigma}$ exists under the operator norm, as discussed in Ding and Wang (2023); Ding, Hu and Wang (2024). This setting, while challenging, is common in practice. For example, the simple case of $\bm{\Sigma}=\bm{I}_{p}$ with $p$ growing proportionally to $n$ falls into this inconsistent regime. Consequently, existing results are not applicable to such cases. The theoretical difficulty of determining the distribution of $T$ has restricted the use of the operator norm in high-dimensional covariance testing within the inconsistent regime.
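Although its null distribution is the hard part, the statistic $T$ itself is simple to compute: since $\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}$ is symmetric, its operator norm equals its largest absolute eigenvalue. A minimal NumPy sketch (the function name `operator_norm_statistic` is ours, not from the paper's code repository):

```python
import numpy as np

def operator_norm_statistic(X, Sigma0):
    """T = ||Sigma_hat - Sigma0||_op as in (1), for an (n, p) data
    matrix X whose rows are assumed to have mean zero."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n          # sample covariance (mean known to be zero)
    D = Sigma_hat - Sigma0
    # For a symmetric matrix, the operator norm is the largest |eigenvalue|.
    return np.max(np.abs(np.linalg.eigvalsh(D)))

# Example in the inconsistent regime p ~ n with Sigma = Sigma0 = I_p:
rng = np.random.default_rng(0)
n, p = 200, 200
X = rng.standard_normal((n, p))
T = operator_norm_statistic(X, np.eye(p))
```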

1.1 Random matrix theory

Building upon prior work, various approaches have been developed for analyzing $T$ and its variants in specific scenarios using random matrix theory. When $\bm{\Sigma}_{0}=\bm{I}_{p}$, the distribution of $T$ can be derived by examining the limiting behavior of the extreme eigenvalues of the sample covariance matrix $\bm{\hat{\Sigma}}$. The study of extreme spectral properties of high-dimensional covariance matrices has attracted substantial interest, resulting in a number of influential contributions, including Johnstone (2001); Bianchi et al. (2011); Onatski, Moreira and Hallin (2013); Johnstone and Paul (2018). Building on the progress of these studies, El Karoui (2007); Bao, Pan and Zhou (2015); Lee and Schnelli (2016); Knowles and Yin (2017) established that when the dimension $p$ and sample size $n$ grow proportionally and certain other regularity conditions hold, the limiting distribution of the largest eigenvalue of $\bm{\hat{\Sigma}}$ follows the Tracy–Widom law. In cases where $\bm{\Sigma}_{0}$ is invertible, Bao, Pan and Zhou (2015) proposed transforming the data $\bm{X}_{1},\cdots,\bm{X}_{n}$ into $\bm{\Sigma}_{0}^{-1/2}\bm{X}_{1},\cdots,\bm{\Sigma}_{0}^{-1/2}\bm{X}_{n}$, allowing one to test whether the covariance of $\bm{\Sigma}_{0}^{-1/2}\bm{X}_{i}$ is the identity matrix $\bm{I}_{p}$. This transformation results in the statistic $T$ taking the form of Roy's largest root statistic

\displaystyle T^{\text{Roy}}=\left\|\bm{\Sigma}_{0}^{-\frac{1}{2}}\bm{\hat{\Sigma}}\bm{\Sigma}_{0}^{-\frac{1}{2}}-\bm{I}_{p}\right\|_{\text{op}}, (3)

and the Tracy–Widom distribution results hold for $T^{\text{Roy}}$ when $p$ and $n$ grow proportionally, i.e., $\alpha=1$. However, similar distributional results for $T$ have not yet been obtained for general $\bm{\Sigma}_{0}$, even when $p/n$ is bounded. To address this gap, we leverage random matrix theory to characterize the extreme singular values of matrices of the form $\hat{\bm{\Sigma}}+\bm{R}$ for a general matrix $\bm{R}$. Taking $\bm{R}=-\bm{\Sigma}_{0}$ yields the limiting properties of $T$. Additionally, we introduce a novel technique, the universal bootstrap, designed to directly approximate the distribution of $T$ for general $\bm{\Sigma}_{0}$.
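For invertible $\bm{\Sigma}_{0}$, the whitening transformation above is equally direct to implement. The sketch below is our own illustration (not the paper's code), forming $\bm{\Sigma}_{0}^{-1/2}$ by eigen-decomposition and assuming $\bm{\Sigma}_{0}$ is positive-definite:

```python
import numpy as np

def roy_statistic(X, Sigma0):
    """Roy's largest root statistic (3): the operator norm of
    Sigma0^{-1/2} Sigma_hat Sigma0^{-1/2} - I_p."""
    n, p = X.shape
    vals, vecs = np.linalg.eigh(Sigma0)
    inv_sqrt = (vecs * vals ** -0.5) @ vecs.T   # Sigma0^{-1/2}
    W = X @ inv_sqrt                            # whitened data Sigma0^{-1/2} X_i
    D = W.T @ W / n - np.eye(p)
    return np.max(np.abs(np.linalg.eigvalsh(D)))
```

When $\bm{\Sigma}_{0}=\bm{I}_{p}$, this reduces to the plain operator norm statistic, matching the reduction of (3) to (1).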

1.2 Bootstrap

The bootstrap method, a generic resampling technique to approximate the distribution of a statistic, was first introduced by Efron (1979). Originally designed for fixed-dimensional problems, the bootstrap has recently been adapted to high-dimensional settings through substantial work. Chernozhukov, Chetverikov and Kato (2013, 2017); Lopes, Lin and Müller (2020); Lopes (2022b); Chernozhukov, Chetverikov and Koike (2023) developed Gaussian approximation methods and established approximation rates for the empirical bootstrap and multiplier bootstrap for the supremum norm of a sum of high-dimensional vectors. Moreover, Han, Xu and Zhou (2018); Lopes, Blandino and Aue (2019); Yao and Lopes (2021); Lopes (2022a); Lopes, Erichson and Mahoney (2023) explored bootstrap methods for spectral statistics of covariance matrices, though these works often assume a low intrinsic dimension or effective rank relative to the sample size $n$. Despite such advances in high-dimensional bootstrap methods, El Karoui and Purdom (2019) and Yu, Zhao and Zhou (2024) demonstrated that the empirical and multiplier bootstraps can be inconsistent for $T$, even when $\bm{\Sigma}_{0}=\bm{I}_{p}$, if $p$ and $n$ grow proportionally and the eigenvalues of $\bm{\Sigma}$ and $\bm{\Sigma}_{0}$ do not decay.

To address the inconsistency of high-dimensional bootstraps based on Gaussian approximation, and motivated by the concept of universality, a topic of recent interest in random matrix theory (Hu and Lu, 2022; Montanari and Saeed, 2022), we develop a novel resampling method informed by universality principles. Specifically, traditional bootstrap methods (Chernozhukov, Chetverikov and Kato, 2013, 2017; Chernozhukov, Chetverikov and Koike, 2023), which rely on the high-dimensional central limit theorem, treat the sample covariance matrix $\bm{\hat{\Sigma}}=\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{T}/n$ as a sum of random matrices and approximate its distribution with a Gaussian matrix sharing the same covariance as $\bm{\hat{\Sigma}}$. Nonetheless, as the dimension $p$ increases, the accuracy of this approximation diminishes, eventually leading to inconsistency, as shown by El Karoui and Purdom (2019). This inconsistency indicates that in high-dimensional contexts, the large matrix structure dominates the central limit theorem's applicability, causing the sample covariance matrix $\bm{\hat{\Sigma}}$ to deviate from Gaussian behavior. To circumvent this limitation, the universal property leverages high-dimensional structures. Broadly, universality suggests that the asymptotic distribution of $T$ depends on the distribution of $\bm{X}_{1},\cdots,\bm{X}_{n}$ only through their first two moments. This allows us to construct the universal bootstrap statistic by substituting independent Gaussian samples $\bm{Y}_{1},\dots,\bm{Y}_{n}\sim\mathcal{N}(\bm{0},\bm{\Sigma})$ for $\bm{X}_{1},\dots,\bm{X}_{n}$ in the definition (1),

\displaystyle T^{\text{ub}}=\left\|\bm{\hat{\Sigma}}^{\text{ub}}-\bm{\Sigma}_{0}\right\|_{\text{op}}, (4)

where $\bm{\hat{\Sigma}}^{\text{ub}}=\sum_{i=1}^{n}\bm{Y}_{i}\bm{Y}_{i}^{T}/n$. The key insight is that while the universal bootstrap matrix $\bm{\hat{\Sigma}}^{\text{ub}}$ need not approximate a Gaussian matrix, it effectively replicates the structure of $\bm{\hat{\Sigma}}$, which is disrupted by the empirical and multiplier bootstraps. Although this method relies on sophisticated random matrix theory, its implementation remains straightforward.

1.3 Our contributions

We summarize our contributions as three-fold. First, we establish the anisotropic local law for matrices of the form $\hat{\bm{\Sigma}}+\bm{R}$ within both the high- and ultra-high-dimensional settings (2), where $\bm{R}$ may have positive, negative, or zero eigenvalues. Erdős, Yau and Yin (2012); Knowles and Yin (2017) demonstrated that the anisotropic local law is essential for proving eigenvalue rigidity and universality. The anisotropic local law for $\hat{\bm{\Sigma}}$ has been previously established by Knowles and Yin (2017) for $\alpha=1$ and by Ding and Wang (2023); Ding, Hu and Wang (2024) for $\alpha>1$. However, the presence of $\bm{R}$ alters the structure of the sample covariance matrix $\hat{\bm{\Sigma}}$, rendering existing techniques invalid. Additionally, due to the potentially ultra-high dimension, each block of the Green function matrix (see (21) for the definition) may converge at a different rate, making the bound derived in Ding and Wang (2023) suboptimal. To address these challenges, we introduce a new "double Schur inversion" technique to represent the Green function matrix using local blocks. We also define an auxiliary parameter-dependent covariance matrix and establish the corresponding parameter-dependent Marčenko–Pastur law. This parameter-dependent structure enables us to prove the entry-wise local law, a weaker version of the anisotropic local law. Furthermore, we present an improved representation of the anisotropic local law, enhancing the results of Ding and Wang (2023) and Ding, Hu and Wang (2024). This unified structure provides a valuable tool for establishing the anisotropic local law.

Second, we establish universality results for the statistic $T$, based on which the universal bootstrap procedure is introduced. Leveraging this universality, we demonstrate that the empirical distribution generated by the universal bootstrap in (4) effectively approximates the distribution of $T$. Additionally, the asymptotic distribution of $T$ is shown to be independent of the third and fourth moments of $\bm{X}_{1},\cdots,\bm{X}_{n}$, allowing us to focus primarily on the covariance. In contrast, we find that two commonly used norms, the Frobenius norm and the supremum norm, lack this universality, as their asymptotic distributions explicitly depend on all fourth moments of the data. Therefore, standardized statistics based on these norms require fourth-moment estimation, which is generally more complex than estimating the covariance. This difference highlights a key advantage of employing the operator norm for covariance testing.

Third, we perform size and power analyses for the statistics $T$ and $T^{\text{Roy}}$ and propose a combined approach to enhance testing power. We also develop a generalized universality theorem to show the consistency of the universal bootstrap for statistics based on extreme eigenvalues, including this combined statistic. For the size analysis, we demonstrate that the universality results apply to $T^{\text{Roy}}$. As a byproduct, we extend the Tracy–Widom law for the largest eigenvalue of the sample covariance matrix with general entries to the ultra-high-dimensional regime, addressing a long-standing gap, as it had previously been proven only for Gaussian entries by Karoui (2003). For the power analysis, we study both $T$ and $T^{\text{Roy}}$ within the generalized spiked model framework of Bai and Yao (2012); Jiang and Bai (2021). We show that $T^{\text{Roy}}$ performs better in worst-case scenarios, while $T$ excels in average performance. To further enhance the power, we propose a new combined statistic, $T^{\text{Com}}$, supported by a generalized universality theorem that confirms the validity of the universal bootstrap for $T^{\text{Com}}$. This also serves as a theoretical guarantee for the universal bootstrap applicable to a broader range of statistics based on extreme eigenvalues. Extensive simulations validate these findings, highlighting the superior performance of our combined statistic across a range of scenarios.

1.4 Notations and paper organization

Throughout the paper, we reserve boldfaced symbols for vectors and matrices. For a complex number $z\in\mathbb{C}$, we write $\Re z$ and $\Im z$ for its real and imaginary parts, respectively. For a vector $\bm{u}\in\mathbb{R}^{p}$, we use $\|\bm{u}\|=\sqrt{\sum_{i=1}^{p}u_{i}^{2}}$ for its Euclidean norm. For a matrix $\bm{A}=(a_{ij})_{M\times N}$, we use $\|\bm{A}\|_{\text{op}}$, $\|\bm{A}\|_{\text{F}}:=\sqrt{\sum_{i=1}^{M}\sum_{j=1}^{N}a_{ij}^{2}}$, and $\|\bm{A}\|_{\sup}:=\sup_{1\leq i\leq M,1\leq j\leq N}|a_{ij}|$ to denote its operator norm, Frobenius norm, and supremum norm, respectively. Denote the singular values of $\bm{A}$ by $\sigma_{1}(\bm{A})\geq\sigma_{2}(\bm{A})\geq\cdots\geq\sigma_{r}(\bm{A})$, where $r=\min\{M,N\}$. When $M=N$, we use $\lambda_{1}(\bm{A}),\lambda_{2}(\bm{A}),\cdots,\lambda_{M}(\bm{A})$ for the eigenvalues of $\bm{A}$, and denote the trace of the square matrix $\bm{A}$ by $\text{tr}(\bm{A})=\sum_{i=1}^{M}a_{ii}$. For two sequences $\{a_{n}\}_{n=1}^{\infty}$ and $\{b_{n}\}_{n=1}^{\infty}$, we write $a_{n}\lesssim b_{n}$ or $a_{n}=O(b_{n})$ if there exists a constant $C$ not depending on $n$ such that $|a_{n}|\leq Cb_{n}$ for all $n\in\mathbb{N}$. We write $a_{n}=o(b_{n})$ if $a_{n}/b_{n}\to 0$, and $a_{n}\asymp b_{n}$ if both $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$.

The paper is organized as follows. Section 2 provides an overview of the proposed method and presents our main results on the consistency of the proposed universal bootstrap procedure. Section 3 describes the key tools from random matrix theory and applies the universality theorem to covariance testing problems. Section 4 presents the power analysis of the universal bootstrap procedure. Section 5 provides simulation results and a real data example, demonstrating the numerical performance of our operator norm-based statistics. Detailed technical proofs are provided in the supplementary material. The data and code are publicly available in a GitHub repository (https://github.com/zhang-guoyu/universal_bootstrap).

2 Proposed method and theoretical outlines

In this section, we introduce the covariance testing problem and present an informal summary of our main universality results. The formal results are provided in Section 3.3. We also discuss statistical applications of our proposed universal bootstrap method, which utilizes the concept of universality. Consider $n$ independent random vectors $\bm{X}_{1},\cdots,\bm{X}_{n}\in\mathbb{R}^{p}$ with zero mean and covariance matrix $\bm{\Sigma}=(\sigma_{ij})_{p\times p}$, which are not necessarily identically distributed. For a given $p\times p$ non-negative definite matrix $\bm{\Sigma}_{0}$, we aim to test the hypothesis

\displaystyle H_{0}:\bm{\Sigma}=\bm{\Sigma}_{0}\quad\text{vs.}\quad H_{1}:\bm{\Sigma}\neq\bm{\Sigma}_{0}. (5)

As we focus on testing the covariance structure, we allow the third and fourth moments of $\bm{X}_{i}$ to differ across $i=1,\cdots,n$.

To simplify notation, we arrange the data into an $n\times p$ matrix $\bm{X}=(\bm{X}_{1},\cdots,\bm{X}_{n})^{T}\in\mathbb{R}^{n\times p}$, where $\bm{A}^{T}$ denotes the transpose of $\bm{A}$, and the sample covariance matrix is expressed as $\bm{\hat{\Sigma}}=\bm{X}^{T}\bm{X}/n$. Recalling the definition of the statistic $T$ in (1), we aim to control the size of our procedure by characterizing the asymptotic distribution of $T$ under the null hypothesis $H_{0}$. Consider the Gaussian data matrix $\bm{Y}=(\bm{Y}_{1},\cdots,\bm{Y}_{n})^{T}\in\mathbb{R}^{n\times p}$, with $\bm{Y}_{1},\cdots,\bm{Y}_{n}\sim\mathcal{N}(\bm{0},\bm{\Sigma})$. Its sample covariance matrix is defined accordingly as $\bm{\hat{\Sigma}}^{\text{ub}}=\bm{Y}^{T}\bm{Y}/n$. Throughout this article, we work under both the proportionally growing high-dimensional regime and the ultra-high-dimensional regime (2). Under these regimes, the key quantity of interest to bound is

\displaystyle\rho_{n}(\bm{\Sigma}_{0})=\sup_{t\geq 0}\bigg|\mathbb{P}\bigg(\bm{\hat{\Sigma}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg)-\mathbb{P}\bigg(\bm{\hat{\Sigma}}^{\text{ub}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg)\bigg|, (6)

where $\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)=\big\{\bm{N}\ :\ \|\bm{N}-\bm{\Sigma}_{0}\|_{\text{op}}\leq t\big\}$ denotes the operator norm ball with center $\bm{\Sigma}_{0}$ and radius $t$.

A simplified form is given to illustrate our main results.

Result 1 (Informal).

We have

\displaystyle\rho_{n}(\bm{\Sigma})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (7)

for some constants $C$ and $\delta>0$, where $\bm{\Sigma}$ is the covariance matrix of $\bm{X}$.

This result enables us to approximate the asymptotic distribution of $T$ by that of $T^{\text{ub}}$, as defined in (4). Under $H_{0}$, $\bm{\Sigma}=\bm{\Sigma}_{0}$, and the distribution of $T^{\text{ub}}$ contains no unknown quantities. This inspires the universal bootstrap procedure. Given the analytical complexity of the distribution of $T^{\text{ub}}$, we generate $B$ independent samples, $\bm{Y}^{1},\dots,\bm{Y}^{B}$, from the same distribution as $\bm{Y}$. We define $\bm{\hat{\Sigma}}^{\text{ub},b}=(\bm{Y}^{b})^{T}(\bm{Y}^{b})/n$ for each $b=1,\dots,B$, allowing us to compute

T^{\text{ub},b}=\left\|\bm{\hat{\Sigma}}^{\text{ub},b}-\bm{\Sigma}_{0}\right\|_{\text{op}},\quad b=1,\cdots,B.

The empirical distribution of $T^{\text{ub},b}$, $b=1,\dots,B$, serves as an approximation for the distribution of $T$. In particular, we use the empirical upper-$\alpha$ quantile, $\hat{q}^{\text{ub},B}_{\bm{\Sigma},\bm{\Sigma}_{0}}(\alpha)$, of $T^{\text{ub},b}$, $b=1,\dots,B$, as the threshold for the test (5) based on $T$. Defining $\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)=\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0},\bm{\Sigma}_{0}}(\alpha)$, we reject $H_{0}$ if $T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)$. Results such as (7) ensure the convergence of $\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)$ to the upper-$\alpha$ quantile $q_{\bm{\Sigma}_{0}}(\alpha)$ of $T$ under $H_{0}$ as $n\to\infty$ and $B\to\infty$, validating the universal bootstrap procedure.
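The full procedure is easy to implement. The following is our own minimal sketch (not the paper's released code), assuming a positive-definite $\bm{\Sigma}_{0}$ so that Gaussian replicates can be generated from a Cholesky factor:

```python
import numpy as np

def universal_bootstrap_threshold(Sigma0, n, alpha=0.05, B=500, rng=None):
    """Empirical upper-alpha quantile of T^{ub,b}, b = 1..B, under H0.

    Each replicate draws Y^b with i.i.d. N(0, Sigma0) rows and computes
    T^{ub,b} = ||(Y^b)^T Y^b / n - Sigma0||_op; the observed data are
    never resampled, which is the point of the universal bootstrap.
    """
    rng = np.random.default_rng(rng)
    p = Sigma0.shape[0]
    L = np.linalg.cholesky(Sigma0)          # Sigma0 assumed positive-definite
    stats = np.empty(B)
    for b in range(B):
        Y = rng.standard_normal((n, p)) @ L.T   # rows ~ N(0, Sigma0)
        D = Y.T @ Y / n - Sigma0
        stats[b] = np.max(np.abs(np.linalg.eigvalsh(D)))
    return np.quantile(stats, 1 - alpha)

def universal_bootstrap_test(X, Sigma0, alpha=0.05, B=500, rng=None):
    """Reject H0: Sigma = Sigma0 iff T >= the universal bootstrap threshold."""
    n = X.shape[0]
    T = np.max(np.abs(np.linalg.eigvalsh(X.T @ X / n - Sigma0)))
    return T >= universal_bootstrap_threshold(Sigma0, n, alpha, B, rng)
```

In practice $B$ on the order of several hundred suffices, and the $B$ replicates are embarrassingly parallel.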

To contextualize our findings, we briefly compare (7) with existing high-dimensional bootstrap results. Generally, these results are presented as

\rho(\mathcal{A})=\sup_{A\in\mathcal{A}}\big|\mathbb{P}\big(T\in A\big)-\mathbb{P}\big(T^{*}\in A\mid\bm{X}\big)\big|\to 0,

where $T$ is a statistic, $T^{*}$ is its Gaussian counterpart, and $\mathcal{A}$ represents a specified family of sets. For example, Chernozhukov, Chetverikov and Kato (2013, 2017); Chernozhukov, Chetverikov and Koike (2023) considered $T$ as a mean estimator and $\mathcal{A}$ as the family of all rectangles in $\mathbb{R}^{p}$. A more closely related choice of $\mathcal{A}$ appears in Zhai (2018); Xu, Zhang and Wu (2019); Fang and Koike (2024), who also considered mean estimators but took $\mathcal{A}$ to be the families of Euclidean balls and convex sets in $\mathbb{R}^{p}$. Their results demonstrated that under mild conditions and for the family of Euclidean balls $\mathcal{A}$, $\rho(\mathcal{A})$ converges to $0$ if and only if $p/n\to 0$, meaning the Gaussian approximation holds only when $p=o(n)$. For comparison, we observe that the operator norm ball $\mathbf{B}_{\text{op}}(\bm{v},t)$ for a vector $\bm{v}\in\mathbb{R}^{p}$ coincides with a Euclidean ball, and our results show that the universality approximation holds when $p/n$ converges to a nonzero constant or even diverges to infinity. For the covariance test, Han, Xu and Zhou (2018) took $T$ as the sample covariance matrix and $\mathcal{A}$ as all $s$-sparse operator norm balls (defined in their work). In particular, with $\mathcal{A}$ as all operator norm balls, i.e., $s=p$, they required $p=o(n^{1/9})$, limiting $p$ to be considerably smaller than $n$. Similarly, Lopes (2022a); Lopes, Erichson and Mahoney (2023) considered $T$ as the sample covariance matrix with $\mathcal{A}$ as operator norm balls, but imposed a decay rate $i^{-\beta}$ with $\beta>1$ on the $i$-th largest eigenvalue $\lambda_{i}(\bm{\Sigma})$ of $\bm{\Sigma}$, implying a low intrinsic test dimension. Likewise, Giessing (2023) required the effective rank $r(\bm{\Sigma})=\text{tr}(\bm{\Sigma})/\|\bm{\Sigma}\|_{\text{op}}$ to satisfy $r(\bm{\Sigma})=o(n^{1/6})$. In contrast, we impose no such assumptions, allowing every eigenvalue of $\bm{\Sigma}$ to be of comparable scale. To summarize, previous work has typically assumed either $p\ll n$ or fast-decaying eigenvalues, yielding consistent estimates of $\bm{\Sigma}$. In our setting, however, no consistent estimator of $\bm{\Sigma}$ exists (see Cai and Ma (2013)). These comparisons underscore the advantages of our proposal, even in regimes lacking consistency.

Given these improvements on existing results, we establish a universal result that extends beyond (7).

Result 2 (Informal).

For $\bm{\Sigma}_{0}$ that commutes with $\bm{\Sigma}$, we have

\displaystyle\rho_{n}(\bm{\Sigma}_{0})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (8)

for some constants $C$ and $\delta>0$. See Theorem 3.5 for a formal description.

This result generalizes the universal bootstrap consistency of $T$ under $H_{0}$ to alternative covariances $\bm{\Sigma}_{0}$ distinct from $\bm{\Sigma}$. The commutativity requirement between $\bm{\Sigma}_{0}$ and $\bm{\Sigma}$ means the two matrices share eigenvectors. Similar assumptions appear in Zhou, Bai and Hu (2023); Zhou et al. (2024). The result in (8) further guarantees universal bootstrap consistency for statistics beyond $T$. For instance, we can estimate $\bm{\Sigma}$ with shrinkage estimators such as $a\hat{\bm{\Sigma}}+(1-a)\bm{I}_{p}$ for $a\in(0,1)$, as proposed by Schäfer and Strimmer (2005), or $a\hat{\bm{\Sigma}}$ for some $a>0$, as in Tsukuma (2016). Using these estimators, the covariance can be tested with the statistics $T^{\text{shr}}_{1}=\|a\hat{\bm{\Sigma}}+(1-a)\bm{I}_{p}-\bm{\Sigma}_{0}\|_{\text{op}}=a\|\hat{\bm{\Sigma}}-(\bm{\Sigma}_{0}-(1-a)\bm{I}_{p})/a\|_{\text{op}}$ or $T^{\text{shr}}_{2}=\|a\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\|_{\text{op}}=a\|\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}/a\|_{\text{op}}$. Under $H_{0}$, universal bootstrap consistency for $T^{\text{shr}}_{1}$ and $T^{\text{shr}}_{2}$ follows from $\rho_{n}\left((\bm{\Sigma}_{0}-(1-a)\bm{I}_{p})/a\right)\to 0$ and $\rho_{n}\left(\bm{\Sigma}_{0}/a\right)\to 0$, as $(\bm{\Sigma}_{0}-(1-a)\bm{I}_{p})/a$ and $\bm{\Sigma}_{0}/a$ commute with $\bm{\Sigma}$.

Despite the broad applicability of (8), certain statistics based on extreme eigenvalues remain outside its scope. For example, as outlined in Section 1, we define $T^{\text{Com}}$ to combine $T$ and $T^{\text{Roy}}$ for enhanced statistical power, where $T^{\text{Com}}$ is given by

\displaystyle T^{\text{Com}}=\frac{T^{2}}{\text{tr}(\bm{\Sigma}_{0})}+(T^{\text{Roy}})^{2}. (9)
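Given the two statistics already defined, $T^{\text{Com}}$ costs only two eigen-decompositions per sample. A sketch of the computation (our own illustration; critical values would come from applying the same universal bootstrap, i.e., recomputing this quantity on Gaussian replicates):

```python
import numpy as np

def combined_statistic(X, Sigma0):
    """T^{Com} = T^2 / tr(Sigma0) + (T^{Roy})^2 as in (9),
    for positive-definite Sigma0."""
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    # T: operator norm of Sigma_hat - Sigma0.
    T = np.max(np.abs(np.linalg.eigvalsh(Sigma_hat - Sigma0)))
    # T^Roy: operator norm of the whitened deviation from the identity.
    vals, vecs = np.linalg.eigh(Sigma0)
    inv_sqrt = (vecs * vals ** -0.5) @ vecs.T
    T_roy = np.max(np.abs(np.linalg.eigvalsh(inv_sqrt @ Sigma_hat @ inv_sqrt - np.eye(p))))
    return T ** 2 / np.trace(Sigma0) + T_roy ** 2
```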

While (8) demonstrates universal bootstrap consistency for $T$ and $T^{\text{Roy}}$ separately, it does not apply to $T^{\text{Com}}$. This limitation arises from the dependence of $T^{\text{Com}}$ on the joint law of $\sigma_{1}\left(\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}\right)$ and $\sigma_{1}\left(\bm{\Sigma}_{0}^{-1/2}\bm{\hat{\Sigma}}\bm{\Sigma}_{0}^{-1/2}-\bm{I}_{p}\right)$, which introduces a complex dependence structure. To address this limitation, we develop a generalized universality theorem that accounts for dependencies among various extreme eigenvalues. We begin by introducing the generalized operator norm ball $\mathbf{B}_{k,\text{op}}(\bm{\Sigma}_{1},\bm{\Sigma}_{2},\bm{t}_{k})$,

\displaystyle\mathbf{B}_{k,\text{op}}(\bm{\Sigma}_{1},\bm{\Sigma}_{2},\bm{t}_{k})=\bigg\{\bm{N}\ :\ \sigma_{i}\left(\bm{\Sigma}_{2}^{-\frac{1}{2}}(\bm{N}-\bm{\Sigma}_{1})\bm{\Sigma}_{2}^{-\frac{1}{2}}\right)\leq t_{i},\ i=1,\cdots,k\bigg\}, (10)

where $\bm{\Sigma}_{2}$ is positive-definite and $\bm{t}_{k}=(t_{1},\cdots,t_{k})^{T}$ with $t_{i}\geq 0$ for each $i=1,\cdots,k$. Since the operator norm $\|\bm{N}-\bm{\Sigma}_{1}\|_{\text{op}}$ equals the largest singular value $\sigma_{1}(\bm{N}-\bm{\Sigma}_{1})$, we obtain $\mathbf{B}_{\text{op}}(\bm{\Sigma},t)=\mathbf{B}_{1,\text{op}}(\bm{\Sigma},\bm{I}_{p},t)$. Next, we define the following quantity for symmetric matrices $\bm{\Sigma}_{1,m}$ and positive-definite matrices $\bm{\Sigma}_{2,m}$, $m=1,\cdots,M$:

\displaystyle\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})=\sup_{\bm{t}_{k_{1}},\cdots,\bm{t}_{k_{M}}\geq\bm{0}}\bigg|\mathbb{P}\bigg(\bm{\hat{\Sigma}}\in\bigcap_{m=1}^{M}\mathbf{B}_{k_{m},\text{op}}(\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m},\bm{t}_{k_{m}})\bigg)-\mathbb{P}\bigg(\bm{\hat{\Sigma}}^{\text{ub}}\in\bigcap_{m=1}^{M}\mathbf{B}_{k_{m},\text{op}}(\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m},\bm{t}_{k_{m}})\bigg)\bigg|, (11)

where $\bm{t}_{k}\geq\bm{0}$ means $t_{i}\geq 0$ for each $i=1,\cdots,k$.

Result 3 (Informal).

For $\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}_{1,m}\bm{\Sigma}_{2,m}^{-1/2}$ that commutes with $\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}\bm{\Sigma}_{2,m}^{-1/2}$ for $m=1,\cdots,M$, we have

\displaystyle\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty. (12)

Taking $M=1$, $k_{1}=1$, $\bm{\Sigma}_{1,1}=\bm{\Sigma}_{0}$, and $\bm{\Sigma}_{2,1}=\bm{I}_{p}$, (12) recovers (8). But (12) provides a more general result, showing that the joint law of

\displaystyle\left\{\left(\sigma_{k}\left(\bm{\Sigma}_{2,1}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,1})\bm{\Sigma}_{2,1}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{1}},\cdots,\left(\sigma_{k}\left(\bm{\Sigma}_{2,M}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,M})\bm{\Sigma}_{2,M}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{M}}\right\}, (13)

can be approximated by the universal bootstrap. This result allows for constructing statistics by combining extreme eigenvalues in (13) in various ways, with (12) confirming universal bootstrap consistency for these statistics.

3 Universal bootstrap

3.1 Preliminaries

Our results rely on universality from random matrix theory. To proceed, we first introduce some preliminary results relevant to our analysis. Write $\bm{X}_{i}=\bm{\Sigma}^{1/2}\bm{Z}_{i}$, where $\bm{Z}_{i}$ has zero mean and identity covariance matrix for $i=1,\cdots,n$. We accordingly define $\bm{Z}=(\bm{Z}_{1},\cdots,\bm{Z}_{n})^{T}\in\mathbb{R}^{n\times p}$. The primary matrix of interest is $\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}$. This inspires us to consider matrices of the general form

\displaystyle\bm{M}_{n}=\bm{\hat{\Sigma}}+\bm{R}=\phi^{\frac{1}{2}}\bm{M}^{\prime}_{n}, (14)

where $\bm{M}^{\prime}_{n}=(np)^{-1/2}\,\bm{\Sigma}^{1/2}\bm{Z}^{T}\bm{Z}\bm{\Sigma}^{1/2}+\bm{R}^{\prime}$ and $\bm{R}^{\prime}=\phi^{-1/2}\,\bm{R}$. Here, $\bm{R}$ is a symmetric matrix whose eigenvalues may be positive, negative, or zero, and $\bm{M}^{\prime}_{n}$ is normalized to address cases where $\alpha>1$, as in Ding and Wang (2023). We impose some assumptions to determine the limit of the Stieltjes transform $m_{\bm{M}^{\prime}_{n}}(z)=\text{tr}(\bm{M}_{n}^{\prime}-z\bm{I}_{p})^{-1}/p$ of $\bm{M}^{\prime}_{n}$.

Assumption 1.

Suppose that 𝒁=(Zij)n×p\bm{Z}=(Z_{ij})_{n\times p} and {Zij,i=1,,n;j=1,,p}\left\{Z_{ij},i=1,\cdots,n;j=1,\cdots,p\right\} are independent with 𝔼[Zij]=0\mathbb{E}[Z_{ij}]=0, 𝔼[Zij2]=1\mathbb{E}[Z_{ij}^{2}]=1. There exists a positive sequence CkC_{k} such that 𝔼[|Zij|k]Ck\mathbb{E}[|Z_{ij}|^{k}]\leq C_{k} for i=1,,n;j=1,,pi=1,\cdots,n;j=1,\cdots,p and kk\in\mathbb{N}.

Assumption 2.

The matrices 𝚺\bm{\Sigma} and 𝑹\bm{R} are bounded in spectral norm, i.e., there exists some positive CC such that 𝚺op,𝑹opC\|\bm{\Sigma}\|_{\text{op}},\ \|\bm{R}\|_{\text{op}}\leq C. Furthermore, there are constants c1,c2>0c_{1},c_{2}>0 such that the empirical spectral distribution F𝚺F^{\bm{\Sigma}} of 𝚺\bm{\Sigma} satisfies F𝚺(c1)c2F^{\bm{\Sigma}}(c_{1})\leq c_{2}.

Assumption 3.

The matrices 𝚺\bm{\Sigma} and 𝑹\bm{R} commute, i.e., 𝚺𝑹=𝑹𝚺\bm{\Sigma}\bm{R}=\bm{R}\bm{\Sigma}.

Assumptions 1 and 2 are standard and often appear in the random matrix literature. Notably, Assumption 1 relies only on independence, without assuming identical distributions as in Qiu, Li and Yao (2023). While all moments exist under Assumption 1, this condition could be relaxed as discussed in Ding and Yang (2018); we do not pursue this here. We also permit 𝚺\bm{\Sigma} to be singular, a less restrictive condition than the invertibility assumed in Ding and Wang (2023); Ding, Hu and Wang (2024). This generalization enables testing singular 𝚺0\bm{\Sigma}_{0}, where TRoyT^{\text{Roy}} is invalid but TT remains applicable. Lastly, Assumption 3, imposed for technical reasons, necessitates that 𝚺\bm{\Sigma} and 𝑹\bm{R} share the same eigenvectors. The same assumption is required for the signal-plus-noise model in Zhou, Bai and Hu (2023); Zhou et al. (2024).

We introduce the following deterministic equivalence matrix

𝑸¯p(z)=(𝑹+1ϕ12(1+ϕ12e(z))𝚺z𝑰p)1,\displaystyle\bar{\bm{Q}}_{p}(z)=\bigg{(}\bm{R}^{\prime}+\frac{1}{\phi^{\frac{1}{2}}(1+\phi^{\frac{1}{2}}e(z))}\bm{\Sigma}-z\bm{I}_{p}\bigg{)}^{-1}, (15)

where ϕ=p/n\phi=p/n and e(z)e(z) is the fixed point of the equation e(z)=tr(𝑹+ϕ1/2(1+ϕ1/2e(z))1𝚺z𝑰p)1𝚺/pe(z)=\text{tr}\big{(}\bm{R}^{\prime}+\phi^{-1/2}(1+\phi^{1/2}e(z))^{-1}\bm{\Sigma}-z\bm{I}_{p}\big{)}^{-1}\bm{\Sigma}/p. Using 𝑸¯p(z)\bar{\bm{Q}}_{p}(z), we define the associated Stieltjes transform m¯p(z)\bar{m}_{p}(z) as m¯p(z)=tr(𝑸¯p(z))/p\bar{m}_{p}(z)=\text{tr}(\bar{\bm{Q}}_{p}(z))/p. According to Couillet, Debbah and Silverstein (2011), e(z)e(z) and 𝑸¯p(z)\bar{\bm{Q}}_{p}(z) are well-defined for every z+z\in\mathbb{C}^{+}, and m¯p(z)\bar{m}_{p}(z) serves as the Stieltjes transform of a measure ρ\rho on \mathbb{R}. We further denote E+E_{+} and EE_{-} as the endpoints of ρ\rho. Formally, if we define supp(ρ)={x:ρ([xϵ,x+ϵ])>0,ϵ>0}\text{supp}(\rho)=\left\{x\in\mathbb{R}\ :\ \rho([x-\epsilon,x+\epsilon])>0,\ \forall\epsilon>0\right\}, we have E+=supsupp(ρ)E_{+}=\sup\text{supp}(\rho) and E=infsupp(ρ)E_{-}=\inf\text{supp}(\rho). Combining results of Knowles and Yin (2017) for the α=1\alpha=1 case and Ding and Wang (2023) for the α>1\alpha>1 case, it follows that E+ϕ1/2E_{+}\asymp\phi^{1/2} and |E|1|E_{-}|\lesssim 1. Intuitively, the largest eigenvalue λ1(𝑴n)\lambda_{1}(\bm{M}^{\prime}_{n}) approaches E+E_{+}, while λp(𝑴n)\lambda_{p}(\bm{M}^{\prime}_{n}) approaches EE_{-}. The next subsection characterizes the fluctuations of λ1(𝑴n)E+\lambda_{1}(\bm{M}^{\prime}_{n})-E_{+} and λp(𝑴n)E\lambda_{p}(\bm{M}^{\prime}_{n})-E_{-}.
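For intuition, e(z) and the deterministic equivalent m̄_p(z) are easy to evaluate numerically. The sketch below is our own illustration, not code from the paper: it solves the fixed-point equation by plain iteration for a diagonal Σ with R′ = 0, and compares m̄_p(z) with the empirical Stieltjes transform of M′_n for Gaussian data with Σ = I_p and φ = 1, in line with the average local law stated later in (29).

```python
import numpy as np

def e_fixed_point(z, sigma, phi, tol=1e-12, max_iter=100_000):
    """Iterate e <- tr[(Sigma / (phi^{1/2}(1 + phi^{1/2} e)) - z I_p)^{-1} Sigma] / p
    for a diagonal Sigma (vector of eigenvalues `sigma`) and R' = 0."""
    e = 1.0j  # start inside the upper half-plane C^+
    for _ in range(max_iter):
        denom = sigma / (np.sqrt(phi) * (1.0 + np.sqrt(phi) * e)) - z
        e_new = np.mean(sigma / denom)
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e_new

def m_bar(z, sigma, phi):
    """Deterministic equivalent m_bar_p(z) = tr(Q_bar_p(z)) / p, as in (15)."""
    e = e_fixed_point(z, sigma, phi)
    return np.mean(1.0 / (sigma / (np.sqrt(phi) * (1.0 + np.sqrt(phi) * e)) - z))

# Empirical Stieltjes transform of M'_n = (np)^{-1/2} Z^T Z  (Sigma = I_p, R = 0)
rng = np.random.default_rng(0)
n = p = 400
Z = rng.standard_normal((n, p))
lam = np.linalg.eigvalsh(Z.T @ Z / np.sqrt(n * p))
z = 2.0 + 0.5j
m_emp = np.mean(1.0 / (lam - z))
print(abs(m_emp - m_bar(z, np.ones(p), p / n)))  # small discrepancy
```

For Σ = I_p and φ = 1 the fixed point reduces to the quadratic ze² + ze + 1 = 0, the familiar Marčenko–Pastur self-consistency equation, which the iteration recovers.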

3.2 Universality

In this subsection, we establish the anisotropic local law, which forms the foundation for our universality results. Specifically, we aim to describe the fluctuations of λ1(𝑴n)E+\lambda_{1}(\bm{M}^{\prime}_{n})-E_{+} and λp(𝑴n)E\lambda_{p}(\bm{M}^{\prime}_{n})-E_{-}. This requires us to analyze the convergence of the Stieltjes transform near the endpoints E+E_{+} and EE_{-}. We define the local domains 𝒟+\mathcal{D}_{+} and 𝒟\mathcal{D}_{-} as

𝒟±=𝒟±(τ)={z=E+iη+:|z|τ,|EE±|τ1,n1+τητ1}\displaystyle\mathcal{D}_{\pm}=\mathcal{D}_{\pm}(\tau)=\left\{z=E+\text{i}\eta\in\mathbb{C}^{+}\ :\ |z|\geq\tau,|E-E_{\pm}|\leq\tau^{-1},n^{-1+\tau}\leq\eta\leq\tau^{-1}\right\} (16)

for a fixed parameter 0<τ<10<\tau<1. For α=1\alpha=1, let 𝒟=𝒟+𝒟\mathcal{D}=\mathcal{D}_{+}\cup\mathcal{D}_{-}, and for α>1\alpha>1, define 𝒟=𝒟+\mathcal{D}=\mathcal{D}_{+}. We will focus on the behavior of the Green function on 𝒟\mathcal{D}. This definition ensures that for α=1\alpha=1, both the largest and smallest eigenvalues of 𝑴n\bm{M}_{n}^{\prime} are controlled, so that σ1(𝑴n)=max{|λ1(𝑴n)|,|λp(𝑴n)|}\sigma_{1}(\bm{M}_{n}^{\prime})=\max\{|\lambda_{1}(\bm{M}_{n}^{\prime})|,|\lambda_{p}(\bm{M}_{n}^{\prime})|\} can also be controlled. For the case α>1\alpha>1, by contrast, we always have |E+||E||E_{+}|\gg|E_{-}|, so only the largest eigenvalue requires attention.

To facilitate the presentation of the anisotropic local law, we assume in this subsection that

𝚺 is invertible and 𝑹 is positive-definite”.\displaystyle\text{"}\bm{\Sigma}\text{ is invertible and }\bm{R}\text{ is positive-definite"}. (17)

Under (17), we can rewrite 𝑴n\bm{M}^{\prime}_{n} as 𝑴n=𝚺1/2((np)1/2𝒁T𝒁+𝑹′′)𝚺1/2\bm{M}^{\prime}_{n}=\bm{\Sigma}^{1/2}\left((np)^{-1/2}\ \bm{Z}^{T}\bm{Z}+\bm{R}^{{}^{\prime\prime}}\right)\bm{\Sigma}^{1/2} where 𝑹′′=𝚺1/2𝑹𝚺1/2\bm{R}^{{}^{\prime\prime}}=\bm{\Sigma}^{-1/2}\bm{R}^{{}^{\prime}}\bm{\Sigma}^{-1/2}. Since 𝑴n\bm{M}^{\prime}_{n} is quadratic in 𝑿\bm{X}, we follow Knowles and Yin (2017) to define the linearization matrix 𝑯(z)\bm{H}(z) and the corresponding linearized Green function 𝑮(z)\bm{G}(z)

𝑯(z)=[𝚺1(𝑹′′)121(np)14𝒁T(𝑹′′)12z𝑰p𝟎1(np)14𝒁𝟎z𝑰n],𝑮(z)=𝑯1(z).\displaystyle\bm{H}(z)=\left[\begin{array}[]{ccc}-\bm{\Sigma}^{-1}&(\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}&\frac{1}{(np)^{\frac{1}{4}}}\bm{Z}^{T}\\ (\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}&-z\bm{I}_{p}&\bm{0}\\ \frac{1}{(np)^{\frac{1}{4}}}\bm{Z}&\bm{0}&-z\bm{I}_{n}\end{array}\right],\quad\bm{G}(z)=\bm{H}^{-1}(z). (21)
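A direct way to see what the linearization encodes: eliminating the last two (block-diagonal) block rows of H(z) by a Schur complement shows that the upper-left p×p block of G(z) = H(z)^{-1} equals zΣ^{1/2}(M′_n − zI_p)^{-1}Σ^{1/2}, the random counterpart of the corresponding block of the deterministic equivalent defined below. The following numerical check of this identity is our own illustration with toy matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 4
z = 0.7 + 0.3j

# Toy ingredients satisfying (17): diagonal Sigma > 0 and R'' = 0.3 I_p > 0
sigma = rng.uniform(0.5, 2.0, size=p)
Sigma, sq = np.diag(sigma), np.diag(np.sqrt(sigma))
Rpp = 0.3 * np.eye(p)
S = np.sqrt(0.3) * np.eye(p)                  # (R'')^{1/2}
Z = rng.standard_normal((n, p))
A = Z / (n * p) ** 0.25

# M'_n = Sigma^{1/2} ((np)^{-1/2} Z^T Z + R'') Sigma^{1/2}
Mn = sq @ (Z.T @ Z / np.sqrt(n * p) + Rpp) @ sq

# Linearization H(z) from (21) and its Green function G(z) = H(z)^{-1}
H = np.block([
    [-np.linalg.inv(Sigma) + 0j, S,                A.T],
    [S,                          -z * np.eye(p),   np.zeros((p, n))],
    [A,                          np.zeros((n, p)), -z * np.eye(n)],
])
G = np.linalg.inv(H)

# Schur-complement identity for the upper-left p x p block of G(z)
target = z * sq @ np.linalg.inv(Mn - z * np.eye(p)) @ sq
err = np.abs(G[:p, :p] - target).max()
print(err)  # machine-precision small
```

The point of the detour through the larger (n+2p)×(n+2p) matrix is that H(z) is linear in Z, which makes resolvent expansions tractable.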

Notice that 𝑮(z)\bm{G}(z) is a large (n+2p)×(n+2p)(n+2p)\times(n+2p) matrix. We also define the deterministic equivalent Green function as

𝑮¯(z)=[z𝚺12𝑸¯p(z)𝚺12𝚺12𝑸¯p(z)𝚺12(𝑹′′)12𝟎(𝑹′′)12𝚺12𝑸¯p(z)𝚺121z((𝑹′′)12𝚺12𝑸¯p(z)𝚺12(𝑹′′)12𝑰p)𝟎𝟎𝟎m~(z)𝑰n],\displaystyle\bar{\bm{G}}(z)=\left[\begin{array}[]{ccc}z\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}&\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}(\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}&\bm{0}\\ (\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}&\frac{1}{z}\left((\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}(\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}-\bm{I}_{p}\right)&\bm{0}\\ \bm{0}&\bm{0}&\tilde{m}(z)\bm{I}_{n}\end{array}\right], (25)

where m~(z)=z1(1+ϕ1/2e(z))1\tilde{m}(z)=-z^{-1}(1+\phi^{1/2}e(z))^{-1}. The anisotropic local law aims to control the difference 𝑮(z)𝑮¯(z)\bm{G}(z)-\bar{\bm{G}}(z) for z𝒟z\in\mathcal{D}. A crucial observation is that defining the parameter zz-dependent covariance 𝚺(z)=z𝚺(z𝑰p𝑹)1\bm{\Sigma}(z)=z\bm{\Sigma}(z\bm{I}_{p}-\bm{R}^{\prime})^{-1} with the corresponding measure πz=i=1pδλi(𝚺(z))/p\pi^{z}=\sum_{i=1}^{p}\delta_{\lambda_{i}(\bm{\Sigma}(z))}/p, where δz\delta_{z} is the Dirac point measure at zz, we have the following zz-dependent deformed Marčenko-Pastur law,

1m~(z)=z+ϕ12t1+ϕ12m~(z)tdπz(t).\displaystyle\frac{1}{\tilde{m}(z)}=-z+\int\frac{\phi^{\frac{1}{2}}t}{1+\phi^{-\frac{1}{2}}\tilde{m}(z)t}\mathrm{d}\pi^{z}(t). (26)

This result mirrors the form of the deformed Marčenko-Pastur law in Ding and Wang (2023), with 𝚺(z)\bm{\Sigma}(z) as the covariance matrix. This demonstrates that the effect of 𝑹\bm{R} can be represented by turning 𝚺\bm{\Sigma} into the zz-dependent covariance 𝚺(z)\bm{\Sigma}(z). This insight simplifies our proof and presentation. For the denominator in (26), we impose the following technical assumption.

Assumption 4.

When α=1,\alpha=1, we require that there exists τ>0\tau>0 such that

|1+ϕ12m~(E±)λi(𝚺(E±))|τ,i=1,,p.\displaystyle|1+\phi^{-\frac{1}{2}}\tilde{m}(E_{\pm})\lambda_{i}(\bm{\Sigma}(E_{\pm}))|\geq\tau,\quad i=1,\cdots,p. (27)

We provide several remarks regarding Assumption 4. Informally, condition (27) ensures that the extreme eigenvalues of 𝚺\bm{\Sigma} do not spread near the endpoints E±E_{\pm}, thereby preventing spikes outside the support of ρ\rho. Similar assumptions have been made in the literature for the universality of 𝚺^\hat{\bm{\Sigma}}, using the Stieltjes transform in place of m~\tilde{m} and λi(𝚺)\lambda_{i}(\bm{\Sigma}) in place of λi(𝚺(E±))\lambda_{i}(\bm{\Sigma}(E_{\pm})), as in Bao, Pan and Zhou (2015); Knowles and Yin (2017). This aligns with the intuition that the effect of 𝑹\bm{R} can be expressed through the transformation from 𝚺\bm{\Sigma} to 𝚺(z)\bm{\Sigma}(z). Moreover, (27) is automatically satisfied when ϕ=p/n\phi=p/n\to\infty, i.e. α>1\alpha>1. We thus impose Assumption 4 only in the case α=1\alpha=1.

To state our main results, we define the concept of stochastic domination, introduced by Erdős, Knowles and Yau (2013) and widely applied in random matrix theory (Knowles and Yin, 2017). For two families of non-negative random variables A=(A(n)(u):n,uU(n))A=\left(A^{(n)}(u)\ :\ n\in\mathbb{N},u\in U^{(n)}\right) and B=(B(n)(u):n,uU(n))B=\left(B^{(n)}(u)\ :\ n\in\mathbb{N},u\in U^{(n)}\right), where U(n)U^{(n)} is an nn-dependent parameter set, we say AA is stochastically dominated by BB uniformly in uu if for any ϵ>0\epsilon>0 and D>0D>0, we have supuU(n)(A(n)(u)>nϵB(n)(u))nD\sup_{u\in U^{(n)}}\mathbb{P}\left(A^{(n)}(u)>n^{\epsilon}B^{(n)}(u)\right)\leq n^{-D} for large enough nn0(ϵ,D)n\geq n_{0}(\epsilon,D). We use the notation ABA\prec B to represent this relationship. When AA is a family of general (possibly negative or complex) random variables, we also write ABA\prec B or A=O(B)A=O_{\prec}(B) to indicate |A|B|A|\prec B. With these definitions, we state the anisotropic local law as follows.
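As a brief aside before the theorem, here is a concrete instance of this notation (our own illustrative example, not from the paper): for i.i.d. standard Gaussians Z_1, …, Z_n, one has max_{i≤n}|Z_i| ≺ √(log n), and the defining tail bound is already visible in simulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def exceed_prob(n, eps=0.1, reps=500):
    """Monte Carlo estimate of P(max_{i<=n} |Z_i| > n^eps * sqrt(log n))."""
    m = np.abs(rng.standard_normal((reps, n))).max(axis=1)
    return float(np.mean(m > n**eps * np.sqrt(np.log(n))))

# The exceedance probability collapses as n grows, as stochastic domination requires
for n in (100, 1000, 10_000):
    print(n, exceed_prob(n))
```

The n^ε slack is what makes ≺ convenient: it absorbs logarithmic factors that would otherwise clutter every bound.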

Theorem 3.1 (Anisotropic local law).

Define the control parameter as Ψ(z)=m~(z)nη+1nη\Psi(z)=\sqrt{\frac{\Im\tilde{m}(z)}{n\eta}}+\frac{1}{n\eta}. Suppose that Assumptions 1, 2, 3, 4 and (17) hold.

(i) We have

𝒖T𝚽𝚺¯1(𝑮(z)𝑮¯(z))𝚺¯1𝚽𝒗=O(𝒖𝒗Ψ(z)),\displaystyle\bm{u}^{T}\bm{\Phi}\bar{\bm{\Sigma}}^{-1}(\bm{G}(z)-\bar{\bm{G}}(z))\bar{\bm{\Sigma}}^{-1}\bm{\Phi}\bm{v}=O_{\prec}(\|\bm{u}\|\|\bm{v}\|\Psi(z)), (28)

uniformly in vectors 𝐮,𝐯n+2p\bm{u},\bm{v}\in\mathbb{R}^{n+2p} and z𝒟z\in\mathcal{D}, where the weight matrix 𝚽\bm{\Phi} is defined as 𝚽=diag(ϕ1/4𝐈p,ϕ3/4𝐈p,𝐈n)\bm{\Phi}=\text{diag}(\phi^{-1/4}\bm{I}_{p},\phi^{-3/4}\bm{I}_{p},\bm{I}_{n}) and the augmented covariance matrix is 𝚺¯=diag(𝚺,𝐈n+p)\bar{\bm{\Sigma}}=\text{diag}(\bm{\Sigma},\bm{I}_{n+p}).

(ii) Moreover, we have the average local law

mp(z)m¯p(z)=O((nη)1),\displaystyle m_{p}(z)-\bar{m}_{p}(z)=O_{\prec}((n\eta)^{-1}), (29)

uniformly in z𝒟z\in\mathcal{D}.

The anisotropic local law provides a delicate characterization of the discrepancy between the random Green function and its deterministic counterpart, yielding a precise bound to analyze the behavior of the extreme eigenvalues of 𝑴n\bm{M}_{n}^{\prime}. When 𝑹=𝟎\bm{R}=\bm{0}, similar results have been established for α=1\alpha=1 in Knowles and Yin (2017) and for α1\alpha\geq 1 in Ding and Wang (2023); Ding, Hu and Wang (2024). As demonstrated in Ding and Wang (2023), the results for general α1\alpha\geq 1 hold without requiring the dimension pp and sample size nn to grow proportionally, resulting in different convergence rates for the blocks of 𝑮(z)\bm{G}(z). Ding and Wang (2023) separately provides convergence rates for each block and offers a coarse bound for the anisotropic local law

Giμ(z)=O(ϕ14Ψ(z)),i=1,,p,μ=p+1,,2p+n.\displaystyle G_{i\mu}(z)=O_{\prec}(\phi^{-\frac{1}{4}}\Psi(z)),\ i=1,\cdots,p,\mu=p+1,\cdots,2p+n.

In this study, we enhance these findings by introducing the weight matrix 𝚽\bm{\Phi}, allowing us to express the anisotropic local law (28) in a more compact form. This reformulation refines the convergence rate and simplifies our proof.

Building on the anisotropic local law theorem, we proceed to establish our universality result. Our first step involves deriving key implications from Theorem 3.1. Recall that ρ\rho is the measure on \mathbb{R} whose Stieltjes transform is the deterministic equivalence m¯p(z)\bar{m}_{p}(z). We define the quantile sequence of ρ\rho as w1wpw_{1}\geq\cdots\geq w_{p} such that wi+ρ(x)dx=(i1/2)/p\int_{w_{i}}^{+\infty}\rho(x)\mathrm{d}x=(i-1/2)/p for i=1,,pi=1,\cdots,p. We aim to demonstrate that the eigenvalues λi(𝑴n)\lambda_{i}(\bm{M}_{n}^{\prime}) of 𝑴n\bm{M}_{n}^{\prime} remain close to wiw_{i} for i=1,,pi=1,\cdots,p.

Theorem 3.2 (Rigidity of eigenvalues).

Suppose that Assumptions 1, 2, 3, 4 hold.

(i) When α=1\alpha=1, we have for any fixed integer k1k\geq 1,

|λi(𝑴n)wi|,|λpi(𝑴n)wpi|(min{i,p+1i})13n23,\displaystyle|\lambda_{i}(\bm{M}_{n}^{\prime})-w_{i}|,\ |\lambda_{p-i}(\bm{M}_{n}^{\prime})-w_{p-i}|\prec(\min\{i,p+1-i\})^{-\frac{1}{3}}n^{-\frac{2}{3}}, (30)

uniformly for 1ik1\leq i\leq k.

(ii) When α>1\alpha>1, we have

|λi(𝑴n)wi|(min{i,n+1i})13n23,\displaystyle|\lambda_{i}(\bm{M}_{n}^{\prime})-w_{i}|\prec(\min\{i,n+1-i\})^{-\frac{1}{3}}n^{-\frac{2}{3}}, (31)

uniformly for 1in1\leq i\leq n.

Theorem 3.2 establishes the rigidity of the eigenvalues of 𝑴n\bm{M}_{n}^{\prime}, with two main implications. First, when α=1\alpha=1, equation (30) shows that the kk largest and kk smallest eigenvalues lie within an n2/3+ϵn^{-2/3+\epsilon}-neighborhood of w1w_{1} and wpw_{p}, respectively, for any ϵ>0\epsilon>0. Furthermore, we have |w1E+|,|wpE|n2/3|w_{1}-E_{+}|,|w_{p}-E_{-}|\prec n^{-2/3}, leading to |λi(𝑴n)E+|,|λpi(𝑴n)E|n2/3|\lambda_{i}(\bm{M}_{n}^{\prime})-E_{+}|,|\lambda_{p-i}(\bm{M}_{n}^{\prime})-E_{-}|\prec n^{-2/3} uniformly for 1ik1\leq i\leq k. Second, with Theorem 3.2, we demonstrate that wiϕ1/2+w_{i}\asymp\phi^{1/2}\to+\infty for 1in1\leq i\leq n, revealing a fast n2/3+ϵn^{-2/3+\epsilon} approximation rate for the diverging quantities λi(𝑴n)\lambda_{i}(\bm{M}_{n}^{\prime}) and wiw_{i}.
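The quantiles and the rigidity phenomenon are easy to visualize numerically. In the simplest case Σ = I_p, R = 0 and n = p (so φ = 1), the measure ρ is the Marčenko–Pastur law on [0, 4], and every eigenvalue of M′_n sticks to its quantile. The following sketch is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = p = 500  # alpha = 1, Sigma = I_p, R = 0: rho is the Marcenko-Pastur law on [0, 4]

# Quantiles w_1 >= ... >= w_p of rho: integral from w_i to infinity of rho = (i - 1/2)/p
x = np.linspace(1e-6, 4.0, 200_001)
rho = np.sqrt(np.clip(x * (4.0 - x), 0.0, None)) / (2.0 * np.pi * x)
cdf = np.cumsum(rho) * (x[1] - x[0])
cdf /= cdf[-1]                                   # absorb discretization error
w = np.interp(1.0 - (np.arange(1, p + 1) - 0.5) / p, cdf, x)

# Eigenvalues of M'_n = (np)^{-1/2} Z^T Z, sorted decreasingly
Z = rng.standard_normal((n, p))
lam = np.linalg.eigvalsh(Z.T @ Z / np.sqrt(n * p))[::-1]

print(np.abs(lam - w).max())  # every eigenvalue sticks to its quantile
```

The maximal deviation is dominated by the edges, where fluctuations of order n^{-2/3} are expected; in the bulk the deviations are an order of magnitude smaller.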

To establish the universality result, we provide bounds for the discrepancy between the distribution of λ1(𝑴n)\lambda_{1}(\bm{M}_{n}^{\prime}) and its Gaussian counterpart. We denote by Gau\mathbb{P}^{\text{Gau}} and 𝔼Gau\mathbb{E}^{\text{Gau}} the probability and expectation under the additional assumption that {Zij}1in,1jp\{Z_{ij}\}_{1\leq i\leq n,1\leq j\leq p} are independent standard Gaussian variables. With this notation, we present the following result.

Theorem 3.3 (Universality of the largest eigenvalue).

Under Assumptions 1, 2, 3, 4, there exists a constant CC such that for large enough nn, any tt\in\mathbb{R}, and any small ϵ>0\epsilon>0,

(n23(\displaystyle\mathbb{P}(n^{\frac{2}{3}}( λ1(𝑴n)E+)tnϵ)n16+Cϵ\displaystyle\lambda_{1}(\bm{M}_{n}^{\prime})-E_{+})\leq t-n^{-\epsilon})-n^{-\frac{1}{6}+C\epsilon}
\displaystyle\leq Gau(n23(λ1(𝑴n)E+)t)\displaystyle\mathbb{P}^{\text{Gau}}(n^{\frac{2}{3}}(\lambda_{1}(\bm{M}_{n}^{\prime})-E_{+})\leq t) (32)
(n23(λ1(𝑴n)E+)t+nϵ)+n16+Cϵ.\displaystyle\leq\mathbb{P}(n^{\frac{2}{3}}(\lambda_{1}(\bm{M}_{n}^{\prime})-E_{+})\leq t+n^{-\epsilon})+n^{-\frac{1}{6}+C\epsilon}.

When α=1\alpha=1, similar results also hold for λp(𝐌n)\lambda_{p}(\bm{M}_{n}^{\prime}).

The universality Theorem 3.3 shows that the asymptotic distributions of λ1(𝑴n)\lambda_{1}(\bm{M}_{n}), λp(𝑴n)\lambda_{p}(\bm{M}_{n}), and consequently σ1(𝑴n)=𝑴nop\sigma_{1}(\bm{M}_{n})=\|\bm{M}_{n}\|_{\text{op}}, rely solely on the first two moments of 𝑿\bm{X}. In contrast, as discussed in Section 4.3, other widely-used norms of 𝑴n\bm{M}_{n}, such as 𝑴nF\|\bm{M}_{n}\|_{\text{F}} and 𝑴nsup\|\bm{M}_{n}\|_{\text{sup}}, are influenced by the first four moments of 𝑿\bm{X}. This universality characteristic of the operator norm offers a straightforward yet effective framework for constructing rejection regions in hypothesis testing. A detailed exploration of this application is presented in Section 3.3. Before applying universality to covariance matrix testing, we outline several corollaries of Theorem 3.3, which are of notable independent interest.

Corollary 1.

Consider the case α>1\alpha>1, 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p}, 𝑹=𝟎\bm{R}=\bm{0}. Define μ=(n+p)2\mu=(\sqrt{n}+\sqrt{p})^{2}, σ=p1/2n1/6\sigma=p^{1/2}n^{-1/6}. Under Assumption 1, we have as nn\to\infty,

λ1(𝒁T𝒁)μσTW1,\displaystyle\frac{\lambda_{1}(\bm{Z}^{T}\bm{Z})-\mu}{\sigma}\Rightarrow\text{TW}_{1}, (33)

where “\Rightarrow” represents convergence in law, and TW1\text{TW}_{1} is the Tracy-Widom distribution of type 11.

One of the central problems in random matrix theory is establishing the asymptotic distribution of the largest eigenvalue of a sample covariance matrix. In this context, (33) was derived for Gaussian 𝒁\bm{Z} in the setting where p/np/n\to\infty (Karoui, 2003). However, for general distributions of 𝒁\bm{Z} where α>1\alpha>1, no further results have been established. Our universality Theorem 3.3 addresses this significant gap in random matrix theory.
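Corollary 1 can be probed by simulation. The sketch below is our own Monte Carlo, not code from the paper; it uses non-Gaussian Rademacher entries together with the classical centering (√n+√p)² and scaling p^{1/2}n^{−1/6} from the Gaussian result of El Karoui, and exploits the identity λ1(Z^TZ) = λ1(ZZ^T) so that only an n×n eigenproblem is solved.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 100, 2000, 200  # p / n = 20, mimicking the alpha > 1 regime

def tw_stat():
    """Normalized largest eigenvalue for non-Gaussian (Rademacher) entries,
    using lambda_1(Z^T Z) = lambda_1(Z Z^T) to keep the eigenproblem n x n."""
    Z = rng.choice([-1.0, 1.0], size=(n, p))
    lam1 = np.linalg.eigvalsh(Z @ Z.T)[-1]
    mu = (np.sqrt(n) + np.sqrt(p)) ** 2          # classical Gaussian centering
    sigma = np.sqrt(p) * n ** (-1.0 / 6.0)
    return (lam1 - mu) / sigma

stats = np.array([tw_stat() for _ in range(reps)])
print(stats.mean(), stats.std())  # TW_1 has mean about -1.21 and s.d. about 1.27
```

At these moderate sizes the match is approximate, as finite-sample corrections to the centering are not negligible; the non-Gaussian entries are the point of the exercise.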

Corollary 2.

Consider the case α=1\alpha=1, 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p}, and a general symmetric matrix 𝑹\bm{R}. Under Assumptions 1, 2, 4, we have as nn\to\infty,

n23(λ1(𝚺^+𝑹)E+)σTW1.\displaystyle\frac{n^{\frac{2}{3}}(\lambda_{1}(\hat{\bm{\Sigma}}+\bm{R})-E_{+})}{\sigma}\Rightarrow\text{TW}_{1}. (34)

Here E+E_{+} and σ\sigma are two constants depending on 𝑹\bm{R}.

As demonstrated in Section 5 in the Supplement, when 𝒁\bm{Z} follows a Gaussian distribution, (34) can be derived using the findings from Ji and Park (2021). For non-Gaussian distributions of 𝒁\bm{Z}, our universality Theorem 3.3 extends this result to the same form. These results expand the traditional analysis of extreme eigenvalues in the sample covariance matrix 𝚺^\bm{\hat{\Sigma}} to a more general setting 𝚺^𝚺0\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}, where 𝑹=𝚺0\bm{R}=-\bm{\Sigma}_{0}, with broader applications in covariance testing within high-dimensional statistics.

3.3 Universal bootstrap for covariance test

Theorem 3.3 is commonly termed universality in the random matrix literature concerning extreme eigenvalues; see Bao, Pan and Zhou (2015); Knowles and Yin (2017). However, this universality does not ensure the validity of our universal bootstrap procedure, which depends on bounds like (7) and (8). Specifically, (32) provides bounds on (n2/3(Tϕ1/2E)t±nϵ)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t\pm n^{-\epsilon}) and (n2/3(Tubϕ1/2E)t)\mathbb{P}(n^{2/3}(T^{\text{ub}}-\phi^{1/2}E)\leq t) with E=max{E+,E}E=\max\{E_{+},-E_{-}\}, while we aim to bound (n2/3(Tϕ1/2E)t)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t) and (n2/3(Tubϕ1/2E)t)\mathbb{P}(n^{2/3}(T^{\text{ub}}-\phi^{1/2}E)\leq t). Establishing this bound requires demonstrating the closeness between (n2/3(Tϕ1/2E)t±nϵ)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t\pm n^{-\epsilon}) and (n2/3(Tϕ1/2E)t)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t). This step relies on the anti-concentration inequality (Chernozhukov, Chetverikov and Kato, 2013) that will be established below.

We first define the Lévy anti-concentration function for random variable TT as

a(T,δ)=supt(tTt+δ),\displaystyle a(T,\delta)=\sup_{t\in\mathbb{R}}\mathbb{P}(t\leq T\leq t+\delta), (35)

for δ>0\delta>0. We shall provide bounds for a(T,δ)a(T,\delta). We note that the theorems derived thus far apply specifically to the extreme eigenvalues λ1(𝑴n)\lambda_{1}(\bm{M}_{n}^{\prime}) and λp(𝑴n)\lambda_{p}(\bm{M}_{n}^{\prime}). As we will show, extending these results to extreme singular values σ1(𝑴n)\sigma_{1}(\bm{M}_{n}^{\prime}) requires only weaker assumptions.
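A crude plug-in estimate conveys what a(T, δ) measures. The sketch below is our own illustration with Σ = Σ0 = I_p: over windows of width δ = n^{−2/3}, the statistic T places only a bounded fraction of its mass, consistent with the anti-concentration bound stated in Theorem 3.4 below.

```python
import numpy as np

rng = np.random.default_rng(8)
n = p = 100
reps, delta = 500, n ** (-2.0 / 3.0)

# T = || Sigma_hat - I_p ||_op under the null Sigma = I_p, with Sigma_hat = Z^T Z / n
Ts = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((n, p))
    Ts[r] = np.abs(np.linalg.eigvalsh(Z.T @ Z / n - np.eye(p))).max()

# Plug-in estimate of the Levy function a(T, delta) = sup_t P(t <= T <= t + delta)
grid = np.linspace(Ts.min(), Ts.max(), 200)
a_hat = max(np.mean((Ts >= t) & (Ts <= t + delta)) for t in grid)
print(a_hat)  # strictly between 0 and 1: T spreads over a window wider than n^{-2/3}
```

Because T itself fluctuates on the Tracy–Widom scale n^{−2/3}, no interval of that length can capture all of its mass, which is exactly what the bound a(T, δ) ≺ n^{2/3}δ quantifies.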

Assumption 4′.

When α=1,\alpha=1, we require that there exists τ>0\tau>0 such that

(i) if |E+|>|E||E_{+}|>|E_{-}|, we assume |1+ϕ1/2m~(E+)λi(𝚺(E+))|τ|1+\phi^{-1/2}\tilde{m}(E_{+})\lambda_{i}(\bm{\Sigma}(E_{+}))|\geq\tau for i=1,,pi=1,\cdots,p.

(ii) if |E+|<|E||E_{+}|<|E_{-}|, we assume |1+ϕ1/2m~(E)λi(𝚺(E))|τ|1+\phi^{-1/2}\tilde{m}(E_{-})\lambda_{i}(\bm{\Sigma}(E_{-}))|\geq\tau for i=1,,pi=1,\cdots,p.

(iii) if |E+|=|E||E_{+}|=|E_{-}|, we assume |1+ϕ1/2m~(E±)λi(𝚺(E±))|τ|1+\phi^{-1/2}\tilde{m}(E_{\pm})\lambda_{i}(\bm{\Sigma}(E_{\pm}))|\geq\tau for i=1,,pi=1,\cdots,p.

Assumption 4′ is evidently weaker than Assumption 4. The reasoning behind Assumption 4′ is straightforward: given that the singular value σ1(𝑴n)=max{|λ1(𝑴n)|,|λp(𝑴n)|}\sigma_{1}(\bm{M}_{n}^{\prime})=\max\{|\lambda_{1}(\bm{M}_{n}^{\prime})|,|\lambda_{p}(\bm{M}_{n}^{\prime})|\}, if |E+|>|E||E_{+}|>|E_{-}|, then with probability 11, σ1(𝑴n)=|λ1(𝑴n)|\sigma_{1}(\bm{M}_{n}^{\prime})=|\lambda_{1}(\bm{M}_{n}^{\prime})| for sufficiently large nn, meaning only the spectral behavior near the right edge E+E_{+} is necessary to consider. A similar argument applies when |E+|<|E||E_{+}|<|E_{-}|. While this relaxation may appear minor, the following example demonstrates its merit in extending the applicability of our theorem, even in the simplest case.

Example 1.

Consider the case α=1\alpha=1 and 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p}, 𝑹=𝑰p\bm{R}=-\bm{I}_{p}. We can show that 𝑴n=ϕ1/2(𝚺^𝑰p)\bm{M}_{n}^{\prime}=\phi^{-1/2}(\hat{\bm{\Sigma}}-\bm{I}_{p}) satisfies Assumption 4′ for all nn and pp, but fails to satisfy Assumption 4 if p>np>n.

This demonstrates the relevance of introducing Assumption 4′ when focusing on σ1(𝑴n)\sigma_{1}(\bm{M}_{n}^{\prime}). Specifically, in the case where p>np>n, the smallest eigenvalue, λp(𝑴n)\lambda_{p}(\bm{M}_{n}^{\prime}), becomes isolated at the left edge E=ϕ1/2E_{-}=-\phi^{-1/2} and does not satisfy Assumption 4. However, we can show that when p>np>n, |E+|>|E||E_{+}|>|E_{-}|, allowing Assumption 4′ to hold consistently. Accordingly, we adopt Assumption 4′ in our analysis of extreme singular values. Recalling that TT is the largest singular value of 𝚺^𝚺0\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}, we are led to define the concept of an admissible pair.

Definition 1 (Admissible pair).

We call two pp by pp non-negative matrices (𝚺,𝚺0)(\bm{\Sigma},\bm{\Sigma}_{0}) an admissible pair, if (𝚺,𝑹)=(𝚺,𝚺0)(\bm{\Sigma},\bm{R})=(\bm{\Sigma},-\bm{\Sigma}_{0}) satisfy Assumptions 2, 3, 4′.

Theorem 3.4 (Anti-concentration inequality).

Under Assumption 1, for any admissible pair (𝚺,𝚺0)(\bm{\Sigma},\bm{\Sigma}_{0}) and δn2/3\delta\leq n^{-2/3}, we have

a(T,δ)n23δ.\displaystyle a(T,\delta)\prec n^{\frac{2}{3}}\delta. (36)

Equipped with these results, we present the universal consistency theorem for TT. Recall the test in (5), where we reject H0H_{0} if Tq^𝚺0ub,B(α)T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha). Here q^𝚺0ub,B(α)=q^𝚺0,𝚺0ub,B(α)\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)=\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0},\bm{\Sigma}_{0}}(\alpha) and q^𝚺,𝚺0ub,B(α)\hat{q}^{\text{ub},B}_{\bm{\Sigma},\bm{\Sigma}_{0}}(\alpha) denotes the empirical upper α\alpha-th quantile of Tub,bT^{\text{ub},b}, b=1,,Bb=1,\cdots,B. To establish the theoretical validity of this universal bootstrap procedure, we provide a bound on the uniform Gaussian approximation error

ρn(𝚺0)=supt0|(𝚺^𝐁op(𝚺0,t))(𝚺^ub𝐁op(𝚺0,t))|,\displaystyle\rho_{n}(\bm{\Sigma}_{0})=\sup_{t\geq 0}\bigg{|}\mathbb{P}\bigg{(}\bm{\hat{\Sigma}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg{)}-\mathbb{P}\bigg{(}\bm{\hat{\Sigma}}^{\text{ub}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg{)}\bigg{|}, (37)

where 𝐁op(𝚺0,t)\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t) is the operator norm ball centered at 𝚺0\bm{\Sigma}_{0} with radius tt. The probability in (37) is defined with respect to 𝑿\bm{X} and 𝒀\bm{Y}, whose rows have covariance matrix 𝚺\bm{\Sigma}, which may differ from 𝚺0\bm{\Sigma}_{0}. We summarize the results for universal bootstrap consistency as follows.

Theorem 3.5 (Universal bootstrap consistency).

Under Assumption 1, for any admissible pair (𝚺,𝚺0)(\bm{\Sigma},\bm{\Sigma}_{0}), we have the uniform Gaussian approximation bound

ρn(𝚺0)Cnδ0asn,\displaystyle\rho_{n}(\bm{\Sigma}_{0})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (38)

for some constants CC, δ>0\delta>0.

Moreover, we have with probability approaching 11,

sup0α1|(Tq^𝚺,𝚺0ub,B(α)|𝒀1,,𝒀B)α|ρn(𝚺0)+B12.\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma},\bm{\Sigma}_{0}}(\alpha)\bigg{|}\bm{Y}^{1},\cdots,\bm{Y}^{B}\bigg{)}-\alpha\bigg{|}\lesssim\rho_{n}(\bm{\Sigma}_{0})+B^{-\frac{1}{2}}. (39)

Theorem 3.5 establishes the uniform Gaussian approximation bound and Type I error bound of the universal bootstrap. Expression (39) ensures that we can uniformly control the test size with an error of at most nδ+B1/2n^{-\delta}+B^{-1/2}, which vanishes as nn\to\infty and BB\to\infty. This result, therefore, guarantees the uniform consistency of the universal bootstrap procedure. To appreciate these results, we compare our universal bootstrap with the widely used multiplier bootstrap that approximates the distribution of TT using the distribution of Tmb=1ni=1nϵi(𝑿i𝑿iT𝚺^)opT^{\text{mb}}=\left\|\frac{1}{n}\sum_{i=1}^{n}\epsilon_{i}(\bm{X}_{i}\bm{X}_{i}^{T}-\hat{\bm{\Sigma}})\right\|_{\text{op}} conditioned on 𝑿\bm{X}, where ϵ1,,ϵn\epsilon_{1},\cdots,\epsilon_{n}\in\mathbb{R} are random variables with zero mean and unit variance. To ensure consistency of the multiplier bootstrap, Han, Xu and Zhou (2018) required a low dimension condition, while Lopes (2022a); Lopes, Erichson and Mahoney (2023); Giessing (2023) imposed a fast eigen-decay condition, as discussed in Section 2. These methods rely on the consistency of the sample covariance, i.e., 𝚺^𝚺op0\|\bm{\hat{\Sigma}}-\bm{\Sigma}\|_{\text{op}}\to 0, which does not hold even when 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p} and α1\alpha\geq 1. This highlights the key advantage of our proposal for covariance testing in a high-dimensional setting where estimation consistency is unattainable, yet test consistency is assured.
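Operationally, the universal bootstrap is short: draw Gaussian samples with covariance Σ0, recompute the operator-norm statistic, and take the empirical upper-α quantile. The following minimal sketch is our own illustration of this size control; it assumes mean-zero rows so that Σ̂ = X^T X/n, and a diagonal Σ0 so that its square root is trivial.

```python
import numpy as np

rng = np.random.default_rng(5)

def op_stat(X, Sigma0):
    """T = || X^T X / n - Sigma0 ||_op, assuming mean-zero rows."""
    return np.linalg.norm(X.T @ X / X.shape[0] - Sigma0, ord=2)

def universal_bootstrap_quantile(Sigma0, n, alpha=0.05, B=200):
    """Empirical upper-alpha quantile of T^{ub,b}, b = 1..B, computed from
    Gaussian resamples with covariance Sigma0 (diagonal here)."""
    p = Sigma0.shape[0]
    root = np.sqrt(np.diag(Sigma0))
    Tub = [op_stat(rng.standard_normal((n, p)) * root, Sigma0) for _ in range(B)]
    return np.quantile(Tub, 1.0 - alpha)

n, p = 100, 150                                   # p > n: no consistent estimation
Sigma0 = np.eye(p)
q = universal_bootstrap_quantile(Sigma0, n)

# Size check: non-Gaussian (Rademacher) data generated under H0
rejections = np.mean([op_stat(rng.choice([-1.0, 1.0], size=(n, p)), Sigma0) >= q
                      for _ in range(200)])
print(q, rejections)  # rejection rate close to alpha = 0.05
```

Note that no resampling of the observed data occurs: the Gaussian draws play the role of a universal reference distribution, which is exactly what Theorem 3.3 licenses.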

3.4 Sharp uniform simultaneous confidence intervals

The approximation of the distribution of the largest singular value has numerous applications, including determining the number of spikes, as discussed in Ding and Yang (2022). Next, we present an additional important application of Theorem 3.5. Define the inner product of two matrices 𝑨\bm{A} and 𝑩\bm{B} as 𝑨,𝑩=tr(𝑨T𝑩)\langle\bm{A},\bm{B}\rangle=\text{tr}(\bm{A}^{T}\bm{B}), and the Schatten 11-norm of a matrix 𝑨\bm{A} as 𝑨S1=i=1pσi(𝑨)\|\bm{A}\|_{S_{1}}=\sum_{i=1}^{p}\sigma_{i}(\bm{A}). We shall construct sharp uniform simultaneous confidence intervals for 𝑨,𝚺\langle\bm{A},\bm{\Sigma}\rangle and 𝒄1T𝚺𝒄2\bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2} for all 𝑨p×p\bm{A}\in\mathbb{R}^{p\times p} and 𝒄1,𝒄2p\bm{c}_{1},\bm{c}_{2}\in\mathbb{R}^{p}. To achieve this, we note that q^𝚺ub,B(α)=q^Sp(𝚺)ub,B(α)\hat{q}^{\text{ub},B}_{\bm{\Sigma}}(\alpha)=\hat{q}^{\text{ub},B}_{\text{Sp}(\bm{\Sigma})}(\alpha) where the spectral matrix is defined as Sp(𝚺)=diag(λ1(𝚺),,λp(𝚺))\text{Sp}(\bm{\Sigma})=\text{diag}(\lambda_{1}(\bm{\Sigma}),\cdots,\lambda_{p}(\bm{\Sigma})). Although the complete matrix 𝚺\bm{\Sigma} cannot be consistently estimated in our setting, certain estimators Sp^(𝚺)\widehat{\text{Sp}}(\bm{\Sigma}) of the spectral matrix Sp(𝚺)\text{Sp}(\bm{\Sigma}) have been shown to be consistent under a suitable normalized distance. For instance, given the distance dSp2(Sp^(𝚺),Sp(𝚺))=tr(Sp^(𝚺)Sp(𝚺))2/pd^{2}_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))=\text{tr}(\widehat{\text{Sp}}(\bm{\Sigma})-\text{Sp}(\bm{\Sigma}))^{2}/p, Ledoit and Wolf (2015); Kong and Valiant (2017) proposed estimators Sp^(𝚺)\widehat{\text{Sp}}(\bm{\Sigma}) satisfying dSp(Sp^(𝚺),Sp(𝚺))0d_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))\to 0 as nn\to\infty when α=1\alpha=1. Therefore, the estimated threshold q^Sp^(𝚺)ub,B(α)\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha) can replace the unknown q^Sp(𝚺)ub,B(α)\hat{q}^{\text{ub},B}_{\text{Sp}(\bm{\Sigma})}(\alpha).

Theorem 3.6.

Under Assumption 1, for any admissible pair (𝚺,𝚺)(\bm{\Sigma},\bm{\Sigma}) and estimators Sp^(𝚺)\widehat{\text{Sp}}(\bm{\Sigma}) of Sp(𝚺)\text{Sp}(\bm{\Sigma}), we have the following sharp uniform simultaneous confidence intervals for 𝐀,𝚺\langle\bm{A},\bm{\Sigma}\rangle

sup0α1|(𝑨p×p,𝑨,𝚺[𝑨,𝚺^q^Sp^(𝚺)ub,B(α)\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}\forall\bm{A}\in\mathbb{R}^{p\times p},\ \langle\bm{A},\bm{\Sigma}\rangle\in\bigg{[}\langle\bm{A},\bm{\hat{\Sigma}}\rangle-\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ 𝑨S1,\displaystyle\|\bm{A}\|_{S_{1}}, (40)
𝑨,𝚺^+q^Sp^(𝚺)ub,B(α)𝑨S1]|𝒀1,,𝒀B)(1α)|ρn(𝚺0)+\displaystyle\langle\bm{A},\bm{\hat{\Sigma}}\rangle+\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ \|\bm{A}\|_{S_{1}}\bigg{]}\bigg{|}\bm{Y}^{1},\cdots,\bm{Y}^{B}\bigg{)}-(1-\alpha)\bigg{|}\lesssim\rho_{n}(\bm{\Sigma}_{0})+ dSp(Sp^(𝚺),Sp(𝚺))+B12.\displaystyle d_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))+B^{-\frac{1}{2}}.

As a special case, we can also construct sharp uniform simultaneous confidence intervals for 𝐜1T𝚺𝐜2\bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2}

sup0α1|(𝒄1,𝒄2p,𝒄1T𝚺𝒄2[𝒄1T𝚺^𝒄2q^Sp^(𝚺)ub,B(α)\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}\forall\bm{c}_{1},\bm{c}_{2}\in\mathbb{R}^{p},\ \bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2}\in\bigg{[}\bm{c}_{1}^{T}\bm{\hat{\Sigma}}\bm{c}_{2}-\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ 𝒄1𝒄2,\displaystyle\|\bm{c}_{1}\|\cdot\|\bm{c}_{2}\|, (41)
𝒄1T𝚺^𝒄2+q^Sp^(𝚺)ub,B(α)𝒄1𝒄2]|𝒀1,,𝒀B)(1α)|\displaystyle\bm{c}_{1}^{T}\bm{\hat{\Sigma}}\bm{c}_{2}+\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ \|\bm{c}_{1}\|\cdot\|\bm{c}_{2}\|\bigg{]}\bigg{|}\bm{Y}^{1},\cdots,\bm{Y}^{B}\bigg{)}-(1-\alpha)\bigg{|} ρn(𝚺0)+dSp(Sp^(𝚺),Sp(𝚺))+B12.\displaystyle\lesssim\rho_{n}(\bm{\Sigma}_{0})+d_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))+B^{-\frac{1}{2}}.

Theorem 3.6 establishes sharp uniform simultaneous confidence intervals for both 𝑨,𝚺\langle\bm{A},\bm{\Sigma}\rangle and 𝒄1T𝚺𝒄2\bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2}. Specifically, intervals (40) and (41) are sharp, as their probability converges uniformly to 1α1-\alpha, rather than merely being no less than 1α1-\alpha. When 𝒄1\bm{c}_{1} and 𝒄2\bm{c}_{2} are chosen as the canonical basis vectors 𝒆1,,𝒆p\bm{e}_{1},\cdots,\bm{e}_{p} in p\mathbb{R}^{p}, we obtain a uniform entry-wise confidence interval for all σij\sigma_{ij}. Notably, Theorem 3.6 extends this result by providing confidence intervals for all linear combinations of entries of 𝚺\bm{\Sigma}. Constructing confidence intervals for all such combinations is considerably more challenging than for individual entries. In the simpler setting of a mean vector, achieving confidence intervals for each entry requires only lnp=o(na)\ln p=o(n^{a}) for some a>0a>0; see Chernozhukov, Chetverikov and Kato (2013, 2017); Chernozhukov, Chetverikov and Koike (2023). In contrast, confidence intervals for all linear combinations of the mean vector require p=o(n)p=o(n); see Zhai (2018); Xu, Zhang and Wu (2019); Fang and Koike (2024). Remarkably, we are able to construct sharp simultaneous confidence intervals for all 𝑨,𝚺\langle\bm{A},\bm{\Sigma}\rangle even as p/np/n converges to a constant or diverges to infinity.
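For entries, the intervals with canonical vectors reduce to σ̂_ij ± q̂, and simultaneous coverage follows from |e_i^T(Σ̂ − Σ)e_j| ≤ ‖Σ̂ − Σ‖_op. The toy sketch below is our own illustration; it plugs in the true spectrum where the theorem would use an estimator of Sp(Σ).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, B, alpha = 150, 100, 200, 0.1
Sigma = np.diag(np.linspace(0.5, 2.0, p))       # toy spectrum (our choice)
root = np.sqrt(np.diag(Sigma))

# Universal-bootstrap quantile of T = || Sigma_hat - Sigma ||_op at the true spectrum
Tub = np.empty(B)
for b in range(B):
    Y = rng.standard_normal((n, p)) * root
    Tub[b] = np.linalg.norm(Y.T @ Y / n - Sigma, ord=2)
q = np.quantile(Tub, 1.0 - alpha)

# One data set; simultaneous intervals sigma_hat_ij +/- q for every entry
X = rng.standard_normal((n, p)) * root
Shat = X.T @ X / n
covered_all = bool(np.all(np.abs(Shat - Sigma) <= q))
print(covered_all)  # holds whenever ||Shat - Sigma||_op <= q
```

The same single quantile q simultaneously calibrates every linear functional ⟨A, Σ⟩, with the Schatten 1-norm of A setting the interval width.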

4 Power analysis and the generalized universal bootstrap approximation

4.1 Power analysis for operator norm-based statistics

As noted in Section 1, several popular statistics utilize the operator norm in covariance testing, such as Roy’s largest root statistic TRoyT^{\text{Roy}} in (3). Our Theorem 3.5 also applies to Roy’s largest root with universal bootstrap, prompting the question of which test to use. In this subsection, we perform a power analysis across a family of statistics, with TT and TRoyT^{\text{Roy}} examined as specific cases.

We conduct a power analysis under the generalized spike model setting, which allows variability in the bulk eigenvalues and does not assume block independence between the spiked and bulk parts, as in Bai and Yao (2012) and further developed by Jiang and Bai (2021). We assume the following generalized spike structure

𝚺=𝑼[𝑫𝟎𝟎𝑽2]𝑼T,𝚺0=𝑼[𝑽1𝟎𝟎𝑽2]𝑼T,𝚺1=𝑼[𝑹1𝟎𝟎𝑹2]𝑼T,\displaystyle\bm{\Sigma}=\bm{U}\left[\begin{array}[]{cc}\bm{D}&\bm{0}\\ \bm{0}&\bm{V}_{2}\end{array}\right]\bm{U}^{T},\ \bm{\Sigma}_{0}=\bm{U}\left[\begin{array}[]{cc}\bm{V}_{1}&\bm{0}\\ \bm{0}&\bm{V}_{2}\end{array}\right]\bm{U}^{T},\ \bm{\Sigma}_{1}=\bm{U}\left[\begin{array}[]{cc}\bm{R}_{1}&\bm{0}\\ \bm{0}&\bm{R}_{2}\end{array}\right]\bm{U}^{T}, (48)

where 𝑫,𝑽1,𝑹1k×k\bm{D},\bm{V}_{1},\bm{R}_{1}\in\mathbb{R}^{k\times k}, 𝑽2,𝑹2(pk)×(pk)\bm{V}_{2},\bm{R}_{2}\in\mathbb{R}^{(p-k)\times(p-k)} are diagonal matrices, kk is a fixed integer, and 𝑼\bm{U} is an orthogonal matrix. Thus

𝚺112𝚺𝚺112=𝑼[𝑫𝟎𝟎𝑽2]𝑼T,𝚺112𝚺0𝚺112=𝑼[𝑽1𝟎𝟎𝑽2]𝑼T,\displaystyle\bm{\Sigma}_{1}^{-\frac{1}{2}}\bm{\Sigma}\bm{\Sigma}_{1}^{-\frac{1}{2}}=\bm{U}\left[\begin{array}[]{cc}\bm{D}^{\prime}&\bm{0}\\ \bm{0}&\bm{V}^{\prime}_{2}\end{array}\right]\bm{U}^{T},\ \bm{\Sigma}_{1}^{-\frac{1}{2}}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-\frac{1}{2}}=\bm{U}\left[\begin{array}[]{cc}\bm{V}^{\prime}_{1}&\bm{0}\\ \bm{0}&\bm{V}^{\prime}_{2}\end{array}\right]\bm{U}^{T}, (53)

where 𝑫=𝑫𝑹11\bm{D}^{\prime}=\bm{D}\bm{R}^{-1}_{1}, 𝑽1=𝑽1𝑹11\bm{V}_{1}^{\prime}=\bm{V}_{1}\bm{R}^{-1}_{1}, 𝑽2=𝑽2𝑹21\bm{V}_{2}^{\prime}=\bm{V}_{2}\bm{R}^{-1}_{2}. We assume that 𝑫op\|\bm{D}^{\prime}\|_{\text{op}}, 𝑽1op\|\bm{V}^{\prime}_{1}\|_{\text{op}}, 𝑽2op\|\bm{V}^{\prime}_{2}\|_{\text{op}} are bounded, without requiring the eigenvalues of 𝑽2\bm{V}_{2} to be the same, unlike the classical spike model. We consider the family of statistics

T(𝚺1)=𝚺112(𝚺^𝚺0)𝚺112op,\displaystyle T(\bm{\Sigma}_{1})=\left\|\bm{\Sigma}_{1}^{-\frac{1}{2}}\left(\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\right)\bm{\Sigma}_{1}^{-\frac{1}{2}}\right\|_{\text{op}}, (54)

and reject H0H_{0} if T(𝚺1)q^𝚺11/2𝚺0𝚺11/2ub,B(α)T(\bm{\Sigma}_{1})\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{1}^{-1/2}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-1/2}}(\alpha). Notably, T=T(𝑰p)T=T(\bm{I}_{p}) and TRoy=T(𝚺0)T^{\text{Roy}}=T(\bm{\Sigma}_{0}) are both included in this family. Define E+(𝚺1)E_{+}(\bm{\Sigma}_{1}), E(𝚺1)E_{-}(\bm{\Sigma}_{1}) and m~(z;𝚺1)\tilde{m}(z;\bm{\Sigma}_{1}) as in Section 3 with (𝚺,𝑹)=(𝑽2,𝑽2)(\bm{\Sigma},\bm{R})=(\bm{V}_{2}^{\prime},-\bm{V}_{2}^{\prime}) in 𝑴n\bm{M}_{n} in (14). Intuitively, E+(𝚺1)E_{+}(\bm{\Sigma}_{1}) and E(𝚺1)E_{-}(\bm{\Sigma}_{1}) are the limiting upper and lower edges of the bulk spectrum of ϕ1/2𝚺11/2(𝚺^𝚺0)𝚺11/2\phi^{-1/2}\bm{\Sigma}_{1}^{-1/2}\left(\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\right)\bm{\Sigma}_{1}^{-1/2}, and m~(z;𝚺1)\tilde{m}(z;\bm{\Sigma}_{1}) is the limiting Stieltjes transform of its bulk eigenvalues. For simplicity, we restrict our analysis to α=1\alpha=1 and |E+(𝚺1)|>|E(𝚺1)||E_{+}(\bm{\Sigma}_{1})|>|E_{-}(\bm{\Sigma}_{1})|.
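The universal bootstrap test for the family (54) is simple to carry out in practice: under the null, the data are replaced by Gaussian draws with the null covariance, and the upper-α quantile of the resulting operator norm statistics serves as the critical value. The sketch below is illustrative only; it assumes zero-mean data with uncentered sample covariance X^T X/n, and all function names are ours, not from the paper's implementation.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def op_stat(Sigma_hat, Sigma0, S1m):
    """T(Sigma_1) = ||Sigma_1^{-1/2} (Sigma_hat - Sigma_0) Sigma_1^{-1/2}||_op."""
    return np.linalg.norm(S1m @ (Sigma_hat - Sigma0) @ S1m, 2)

def universal_bootstrap_test(X, Sigma0, Sigma1, alpha=0.05, B=200, seed=0):
    """Reject H0: Sigma = Sigma0 when T(Sigma_1) exceeds the universal
    bootstrap quantile, obtained by replacing the data with Gaussian
    counterparts sharing the null covariance Sigma0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S1m = inv_sqrt(Sigma1)
    T = op_stat(X.T @ X / n, Sigma0, S1m)      # observed statistic
    L = np.linalg.cholesky(Sigma0)
    boot = np.empty(B)
    for b in range(B):
        G = rng.standard_normal((n, p)) @ L.T  # rows ~ N(0, Sigma0)
        boot[b] = op_stat(G.T @ G / n, Sigma0, S1m)
    q = np.quantile(boot, 1 - alpha)           # empirical upper-alpha quantile
    return T, q, T >= q
```

Choosing `Sigma1 = np.eye(p)` gives the statistic TT, while `Sigma1 = Sigma0` gives Roy's largest root TRoyT^{\text{Roy}}.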

Theorem 4.1.

Under Assumption 1 and the generalized spike structure (48), write 𝐃=diag((di)i=1k)\bm{D}^{\prime}=\text{diag}\left((d_{i}^{\prime})_{i=1}^{k}\right), 𝐕1=diag((vi)i=1k)\bm{V}_{1}^{\prime}=\text{diag}\left((v_{i}^{\prime})_{i=1}^{k}\right), and for each 1ik1\leq i\leq k define the threshold

κi=ϕ12m~(ϕ12E+;𝚺1)ϕ12viE+m~(ϕ12E+;𝚺1).\displaystyle\kappa_{i}=-\frac{\phi^{\frac{1}{2}}}{\tilde{m}(\phi^{-\frac{1}{2}}E_{+};\bm{\Sigma}_{1})}-\frac{\phi^{\frac{1}{2}}v_{i}^{\prime}}{E_{+}\tilde{m}(\phi^{-\frac{1}{2}}E_{+};\bm{\Sigma}_{1})}.

(i) If di>κid_{i}^{\prime}>\kappa_{i} for some 1ik1\leq i\leq k, then

(T(𝚺1)q^𝚺112𝚺0𝚺112ub,B(α))1,asn,B,\displaystyle\mathbb{P}\left(T(\bm{\Sigma}_{1})\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{1}^{-\frac{1}{2}}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-\frac{1}{2}}}(\alpha)\right)\to 1,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the power tends to 11.

(ii) If di<κid_{i}^{\prime}<\kappa_{i} for all 1ik1\leq i\leq k and T(𝚺1)=λ1(𝚺11/2(𝚺^𝚺0)𝚺11/2)T(\bm{\Sigma}_{1})=\lambda_{1}\left(\bm{\Sigma}_{1}^{-1/2}\left(\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\right)\bm{\Sigma}_{1}^{-1/2}\right), then

(T(𝚺1)q^𝚺11/2𝚺0𝚺11/2ub,B(α))α,asn,B,\displaystyle\mathbb{P}\left(T(\bm{\Sigma}_{1})\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{1}^{-1/2}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-1/2}}(\alpha)\right)\to\alpha,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the asymptotic power equals the nominal level α\alpha, so the test is powerless in this setting.

Theorem 4.1 provides the power analysis of the test based on the statistic (54). Specifically, it identifies the phase transition points κi\kappa_{i}, which closely resemble the well-known Baik-Ben Arous-Péché (BBP) phase transition for the largest eigenvalues of the sample covariance matrix, see Baik, Ben Arous and Péché (2005); Baik and Silverstein (2006); Paul (2007). When a spike eigenvalue did_{i}^{\prime} exceeds its threshold κi\kappa_{i}, it separates from the support of the bulk eigenvalues, yielding full asymptotic power. Conversely, if every spike did_{i}^{\prime} falls below its threshold, distinguishing the alternative from the null using T(𝚺1)T(\bm{\Sigma}_{1}) becomes infeasible, as the spike and bulk eigenvalues are indistinguishably close. Note that Theorem 4.1 addresses only spikes in the largest eigenvalues, as analyzing spikes in the smallest eigenvalues is more involved. Since this power analysis serves as a preliminary illustration, we do not extend our investigation in that direction.

With Theorem 4.1, we can now deduce the power of TT and TRoyT^{\text{Roy}} as special cases.

Corollary 3.

Under the same assumptions as in Theorem 4.1, define 𝑫=diag((di)i=1k)\bm{D}=\text{diag}\left((d_{i})_{i=1}^{k}\right), 𝑽1=diag((vi)i=1k)\bm{V}_{1}=\text{diag}\left((v_{i})_{i=1}^{k}\right).

(i) If for some 1ik1\leq i\leq k, we have divi>ϕ1/2m~(ϕ1/2E+)vi(1+ϕ1/2E+m~(ϕ1/2E+))d_{i}-v_{i}>-\frac{\phi^{1/2}}{\tilde{m}(\phi^{-1/2}E_{+})}-v_{i}\left(1+\frac{\phi^{1/2}}{E_{+}\tilde{m}(\phi^{-1/2}E_{+})}\right), then

(Tq^𝚺0ub,B(α))1,asn,B,\displaystyle\mathbb{P}\left(T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)\right)\to 1,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the power goes to 11. Here m~(z)=m~(z;𝚺)\tilde{m}(z)=\tilde{m}(z;\bm{\Sigma}) satisfies E+m~(E+)<1E_{+}\tilde{m}(E_{+})<-1.

(ii) If for some 1ik1\leq i\leq k, we have divi>ϕ1/2vid_{i}-v_{i}>\phi^{1/2}v_{i}, then

(TRoyq^𝑰pub,B(α))1,asn,B,\displaystyle\mathbb{P}\left(T^{\text{Roy}}\geq\hat{q}^{\text{ub},B}_{\bm{I}_{p}}(\alpha)\right)\to 1,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the power goes to 11.

We now compare the phase transition points in (i) and (ii) for TT and TRoyT^{\text{Roy}}, respectively. These points reveal distinct behaviors: since ϕ1/2E+m~(ϕ1/2E+)<1\phi^{-1/2}E_{+}\tilde{m}(\phi^{-1/2}E_{+})<-1, the phase transition point in (i) decreases with viv_{i}, whereas the phase transition point in (ii) increases with viv_{i}. This indicates that when spikes affect directions associated with larger eigenvalues, i.e. for large viv_{i}, the statistic TT achieves full power more readily than the normalized TRoyT^{\text{Roy}} owing to its lower phase transition point. Conversely, the normalized statistic TRoyT^{\text{Roy}} performs better for smaller viv_{i}. Corollary 3 thus quantifies this intuitive observation.

4.2 The generalized universal bootstrap

Given that neither TT nor TRoyT^{\text{Roy}} dominates across all settings, a natural approach is to combine the two statistics to enhance testing power; see TComT^{\text{Com}} in (9) as an example. In Section 5, simulations demonstrate that TComT^{\text{Com}} can outperform both TT and TRoyT^{\text{Roy}} in certain scenarios. However, conducting tests with TComT^{\text{Com}} requires calculating an appropriate threshold. While Theorem 3.5 provides thresholds for TT and TRoyT^{\text{Roy}} individually, it cannot be directly applied to TComT^{\text{Com}} due to the complex dependence between TT and TRoyT^{\text{Roy}}. To address this, we develop a generalized universality theorem that applies to a broad class of statistics involving extreme singular values.

We consider the statistic TExST^{\text{ExS}} as a general function of extreme singular values

TExS=f({(σk(𝚺2,112(𝚺^𝚺1,1)𝚺2,112))k=1k1,,(σk(𝚺2,M12(𝚺^𝚺1,M)𝚺2,M12))k=1kM}),\displaystyle T^{\text{ExS}}=f\bigg{(}\bigg{\{}\left(\sigma_{k}\left(\bm{\Sigma}_{2,1}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,1})\bm{\Sigma}_{2,1}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{1}},\cdots,\left(\sigma_{k}\left(\bm{\Sigma}_{2,M}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,M})\bm{\Sigma}_{2,M}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{M}}\bigg{\}}\bigg{)}, (55)

where ff is a measurable function, (𝚺1,m,𝚺2,m)(\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}) are matrices, and kmk_{m} are fixed integers for m=1,,Mm=1,\cdots,M. The universal bootstrapped version, TExS,ubT^{\text{ExS,ub}}, is defined by replacing 𝚺^\bm{\hat{\Sigma}} with 𝚺^ub\bm{\hat{\Sigma}}^{\text{ub}} in (55). We then define the empirical universal bootstrap threshold q^𝚺,{𝚺1,m,𝚺2,m}m=1MExS,ub,B(α)\hat{q}^{\text{ExS,ub,B}}_{\bm{\Sigma},\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M}}(\alpha) as the upper α\alpha-th quantile of BB i.i.d. copies TExS,ub,1,,TExS,ub,BT^{\text{ExS,ub,1}},\cdots,T^{\text{ExS,ub,B}} of TExS,ubT^{\text{ExS,ub}}. This setup enables the following generalized universal bootstrap consistency result.
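Preserving the dependence among the combined statistics is exactly what makes the generalized bootstrap work: every normalization is evaluated on the same bootstrapped Gaussian sample covariance. A minimal illustration of this idea (function names and the form of ff are ours, not the paper's code):

```python
import numpy as np

def extreme_svals(Sigma_hat, Sigma1, S2m, k):
    """Leading k singular values of Sigma_2^{-1/2} (Sigma_hat - Sigma_1) Sigma_2^{-1/2}."""
    M = S2m @ (Sigma_hat - Sigma1) @ S2m
    return np.linalg.svd(M, compute_uv=False)[:k]

def combined_quantile(Sigma0, n, f, norms, alpha=0.05, B=200, seed=0):
    """Upper-alpha quantile of T^ExS = f(...) under the universal bootstrap.
    `norms` is a list of triples (Sigma_1m, Sigma_2m^{-1/2}, k_m); crucially,
    every normalization is evaluated on the SAME Gaussian Sigma_hat^ub per
    replicate, so the dependence between the pieces is preserved."""
    rng = np.random.default_rng(seed)
    p = Sigma0.shape[0]
    L = np.linalg.cholesky(Sigma0)
    vals = np.empty(B)
    for b in range(B):
        G = rng.standard_normal((n, p)) @ L.T  # rows ~ N(0, Sigma0)
        S_ub = G.T @ G / n
        vals[b] = f([extreme_svals(S_ub, S1, S2m, k) for S1, S2m, k in norms])
    return np.quantile(vals, 1 - alpha)
```

For instance, `f = lambda parts: max(parts[0][0], parts[1][0])` with the two normalizations of TT and TRoyT^{\text{Roy}} yields a threshold for a combined statistic of maximum type.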

Theorem 4.2 (Generalized universal bootstrap consistency).

Under Assumption 1, given MM admissible pairs (𝚺2,m1/2𝚺𝚺2,m1/2,𝚺2,m1/2𝚺1,m𝚺2,m1/2)(\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}\bm{\Sigma}_{2,m}^{-1/2},\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}_{1,m}\bm{\Sigma}_{2,m}^{-1/2}) and fixed integers kmk_{m} for m=1,,Mm=1,\cdots,M, we have the uniform Gaussian approximation bound

ρn({𝚺1,m,𝚺2,m}m=1M)Cnδ0asn,\displaystyle\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (56)

for some constants CC, δ>0\delta>0, where ρn({𝚺1,m,𝚺2,m}m=1M)\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M}) is defined in (11).

Moreover, we have

sup0α1|(TExSq^𝚺,{𝚺1,m,𝚺2,m}m=1MExS,ub,B(α))α|ρn({𝚺1,m,𝚺2,m}m=1M)+B12.\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}T^{\text{ExS}}\geq\hat{q}^{\text{ExS,ub,B}}_{\bm{\Sigma},\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M}}(\alpha)\bigg{)}-\alpha\bigg{|}\lesssim\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})+B^{-\frac{1}{2}}. (57)

Theorem 4.2 extends Theorem 3.5 by demonstrating that universal bootstrap consistency applies not only to TT and TRoyT^{\text{Roy}}, but also to combinations like TComT^{\text{Com}}, to the sum of the kk largest singular values of 𝚺^\bm{\hat{\Sigma}} proposed by Ke (2016), and to all statistics based on extreme singular values with arbitrary dependencies in the form of (55). This result reinforces our universal bootstrap principle: to approximate the distribution of statistics involving extreme eigenvalues, it suffices to substitute all random variables with their Gaussian counterparts, showcasing the practical advantage of universality.

4.3 Comparison with other norms

In this subsection, we compare statistics based on the operator norm with those based on the two widely used norms: the Frobenius norm and the supremum norm. Define the statistics

TF=𝚺12𝚺^𝚺12𝑰pF,Tsup=𝚺12𝚺^𝚺12𝑰psup.\displaystyle T^{\text{F}}=\left\|\bm{\Sigma}^{-\frac{1}{2}}\bm{\hat{\Sigma}}\bm{\Sigma}^{-\frac{1}{2}}-\bm{I}_{p}\right\|_{\text{\text{F}}},\ T^{\text{sup}}=\left\|\bm{\Sigma}^{-\frac{1}{2}}\bm{\hat{\Sigma}}\bm{\Sigma}^{-\frac{1}{2}}-\bm{I}_{p}\right\|_{\text{\text{sup}}}. (58)
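Both norms in (58) are computed directly from the standardized residual matrix. A small sketch, assuming zero-mean data with uncentered sample covariance and a hypothetical function name:

```python
import numpy as np

def frob_sup_stats(X, Sigma):
    """T^F and T^sup of (58) for zero-mean data X of shape (n, p)."""
    n, p = X.shape
    w, V = np.linalg.eigh(Sigma)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T        # Sigma^{-1/2}
    R = S_inv_half @ (X.T @ X / n) @ S_inv_half - np.eye(p)
    return np.linalg.norm(R, 'fro'), np.abs(R).max()  # Frobenius and sup norms
```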

While the universality Theorem 3.3 holds for operator norm-based statistics TT and TRoyT^{\text{Roy}}, we demonstrate that it does not extend to TFT^{\text{F}} or TsupT^{\text{sup}}. Theorem 3.3 specifies that the asymptotic distribution of TT and TRoyT^{\text{Roy}} depends solely on the first two moments of 𝑿\bm{X}, allowing for the construction of a universal bootstrap procedure. In contrast, the asymptotic distributions of TFT^{\text{F}} and TsupT^{\text{sup}} rely on the first four moments of 𝑿\bm{X}. We present the following modified assumptions to establish the asymptotic distribution of TFT^{\text{F}}.

Assumption 1′.

Suppose that 𝒁={Zij,i=1,,n;j=1,,p}\bm{Z}=\left\{Z_{ij},i=1,\cdots,n;j=1,\cdots,p\right\} are i.i.d. random variables with 𝔼[Zij]=0\mathbb{E}[Z_{ij}]=0, 𝔼[Zij2]=1\mathbb{E}[Z_{ij}^{2}]=1, 𝔼[Zij4]=ν4\mathbb{E}[Z_{ij}^{4}]=\nu_{4}. Assume that 𝔼[|Zij|6+ϵ]<\mathbb{E}[|Z_{ij}|^{6+\epsilon}]<\infty for some ϵ>0\epsilon>0.

Proposition 1.

Under Assumption 1′, if α=1\alpha=1 or α2\alpha\geq 2, we have

np(TF)21p(tr(𝚺^𝚺1))2(ν42)𝒩(0,4).\displaystyle\frac{n}{p}(T^{\text{F}})^{2}-\frac{1}{p}\left(\text{tr}(\bm{\hat{\Sigma}}\bm{\Sigma}^{-1})\right)^{2}-(\nu_{4}-2)\Rightarrow\mathcal{N}(0,4).

To facilitate the Gaussian approximation of TsupT^{\text{sup}}, we introduce the following sub-Gaussian assumption. This assumption is employed here for simplicity and can be extended to a more general, though more tedious, bound if needed.

Assumption 1′′.

Suppose that for i=1,,n;j1,j2=1,,pi=1,\cdots,n;\ j_{1},j_{2}=1,\cdots,p,

𝔼[exp(|Zij1Zij2|/Bn)]2,\displaystyle\mathbb{E}\left[\text{exp}(|Z_{ij_{1}}Z_{ij_{2}}|/B_{n})\right]\leq 2,

for some positive sequence BnB_{n}.

Assumption 2′′.

We have for i=1,,n;j1,j2=1,,pi=1,\cdots,n;\ j_{1},j_{2}=1,\cdots,p,

b121ni=1n𝔼[(Zij1Zij2σj1j2)2],1ni=1n𝔼[(Zij1Zij2σj1j2)4]Bn2b22\displaystyle b_{1}^{2}\leq\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big{[}(Z_{ij_{1}}Z_{ij_{2}}-\sigma_{j_{1}j_{2}})^{2}\big{]},\ \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big{[}(Z_{ij_{1}}Z_{ij_{2}}-\sigma_{j_{1}j_{2}})^{4}\big{]}\leq B_{n}^{2}b_{2}^{2}

for some positive numbers b1b_{1}, b2b_{2} and sequence BnB_{n} in Assumption 1′′.

Proposition 2.

If Assumptions 1′′, 2′′ hold, we have

supt0|(Tsupt)(𝑮supt)|(Bn2ln2(pn)n)14,\displaystyle\sup_{t\geq 0}\bigg{|}\mathbb{P}\bigg{(}T^{\text{sup}}\leq t\bigg{)}-\mathbb{P}\bigg{(}\|\bm{G}\|_{\text{sup}}\leq t\bigg{)}\bigg{|}\lesssim\left(\frac{B_{n}^{2}\text{ln}^{2}(pn)}{n}\right)^{\frac{1}{4}},

where 𝑮p×p\bm{G}\in\mathbb{R}^{p\times p} is a Gaussian random matrix with zero mean and covariance Cov(Gi1j1,Gi2j2)=1nk=1nCov(Zki1Zkj1,Zki2Zkj2)\text{Cov}(G_{i_{1}j_{1}},G_{i_{2}j_{2}})=\frac{1}{n}\sum_{k=1}^{n}\text{Cov}(Z_{ki_{1}}Z_{kj_{1}},Z_{ki_{2}}Z_{kj_{2}}), i1,i2,j1,j2=1,,pi_{1},i_{2},j_{1},j_{2}=1,\cdots,p.

Propositions 1 and 2 follow as direct corollaries of Qiu, Li and Yao (2023) and Chernozhuokov et al. (2022), and we omit the proofs here. These propositions demonstrate that statistics based on the Frobenius and supremum norms lack the second-moment universality enjoyed by the operator norm. Consequently, tests relying on the Frobenius and supremum norms require normalization through fourth-moment estimates of 𝑿\bm{X}, a process more complex than estimating the covariance structure itself. This complexity arises because the Frobenius and supremum norms reduce the covariance matrix to a vector, computing its L2L^{2} and LL^{\infty} norms, respectively. As a result, deriving the asymptotic distribution necessitates the second moment of the sample covariance, equivalent to the fourth moment of 𝑿\bm{X}. The operator norm thus emerges as an easier-to-implement option for covariance testing.

5 Numerical results

5.1 Simulation

In this subsection, we evaluate the empirical size and power of the proposed universal bootstrap for the operator norm-based statistic TT in (1) (Opn) and the combined statistic TComT^{\text{Com}} in (9) (Com). These are compared with the operator norm-based statistic TRoyT^{\text{Roy}} in (3) (Roy), the Frobenius norm-based linear spectral statistic TF,1T^{\text{F},1} from Qiu, Li and Yao (2023) (Lfn), the debiased Frobenius norm-based U-statistic TF,2T^{\text{F},2} from Cai and Ma (2013) (Ufn), and the supremum norm-based statistic TsupT^{\text{sup}} inspired by Cai and Jiang (2011); Cai, Liu and Xia (2013) (Supn). The expressions for these statistics are provided in Section 12 of the Supplement.

To evaluate the robustness of our method, we generate nn independent, non-identically distributed samples of pp-dimensional random vectors with zero means and covariance matrix 𝚺\bm{\Sigma}. We set sample sizes n=100n=100 and n=300n=300 and vary the dimension pp over 100100, 200200, 500500, 10001000, and 20002000 to cover both proportional and ultra-high-dimensional cases. For each configuration, we report the empirical rejection rate over 20002000 replications, with the universal bootstrap procedure using 10001000 resamplings. The significance level for all tests is fixed at 0.050.05. We consider three different structures for the null covariance matrix 𝚺0=(σ0,ij)p×p\bm{\Sigma}_{0}=(\sigma_{0,ij})_{p\times p}:

(a). Exponential decay. The elements σ0,ij=0.6|ij|\sigma_{0,ij}=0.6^{|i-j|}.

(b). Block diagonal. Let K=p/10K=\lfloor p/10\rfloor be the number of blocks, with each diagonal block being 0.55𝟏10𝟏10T+0.45𝑰100.55\bm{1}_{10}^{\ }\bm{1}_{10}^{T}+0.45\bm{I}_{10}.

(c). Signed sub-exponential decay. The elements σ0,ij=(1)i+j0.4|ij|0.5\sigma_{0,ij}=(-1)^{i+j}0.4^{|i-j|^{0.5}}.
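The three null structures follow directly from their entrywise formulas; a sketch of their generation (function names are ours; when pp is not a multiple of 10 in model (b), we leave the remainder block as the identity, which is our own assumption):

```python
import numpy as np

def exp_decay_cov(p, rho=0.6):
    """Model (a): sigma_ij = rho^{|i-j|}."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def block_diag_cov(p, b=10, off=0.55):
    """Model (b): K = floor(p/b) diagonal blocks off*1 1^T + (1-off)*I_b;
    any remainder rows/columns are left as the identity (an assumption)."""
    K = p // b
    S = np.eye(p)
    B = off * np.ones((b, b)) + (1.0 - off) * np.eye(b)
    for k in range(K):
        S[k * b:(k + 1) * b, k * b:(k + 1) * b] = B
    return S

def signed_subexp_cov(p, rho=0.4):
    """Model (c): sigma_ij = (-1)^{i+j} rho^{|i-j|^{0.5}}."""
    idx = np.arange(p)
    D = np.abs(idx[:, None] - idx[None, :]) ** 0.5
    sign = (-1.0) ** (idx[:, None] + idx[None, :])
    return sign * rho ** D
```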

These structures have been previously analyzed in Cai, Liu and Xia (2013); Zhu et al. (2017); Yu, Li and Xue (2024) for models (a) and (b), and in Cai, Liu and Xia (2013); Yu, Li and Xue (2024) for model (c), highlighting the broad applicability of our approach. To assess the empirical size, we examine the following distributions of 𝑿=𝚺12𝒁\bm{X}=\bm{\Sigma}^{\frac{1}{2}}\bm{Z}:

(1). Gaussian distribution: {Zij,i=1,,n;j=1,,p}\{Z_{ij},\ i=1,\cdots,n;j=1,\cdots,p\} are i.i.d. standard normal random variables.

(2). Uniform and t-distributions: {Zij,i=1,,n/2;j=1,,p}\{Z_{ij},\ i=1,\cdots,\lfloor n/2\rfloor;j=1,\cdots,p\} are i.i.d. normalized uniform random variables on [1,1][-1,1], while {Zij,i=n/2+1,,n;j=1,,p}\{Z_{ij},\ i=\lfloor n/2\rfloor+1,\cdots,n;j=1,\cdots,p\} are i.i.d. normalized t random variables with 1212 degrees of freedom.

(3). Gaussian and uniform distributions: {Zij,i=1,,n/2;j=1,,p}\{Z_{ij},\ i=1,\cdots,\lfloor n/2\rfloor;j=1,\cdots,p\} are i.i.d. standard normal random variables, while {Zij,i=n/2+1,,n;j=1,,p}\{Z_{ij},\ i=\lfloor n/2\rfloor+1,\cdots,n;j=1,\cdots,p\} are i.i.d. normalized uniform random variables on [1,1][-1,1].

(4). Gaussian and t-distributions: {Zij,i=1,,n/2;j=1,,p}\{Z_{ij},\ i=1,\cdots,\lfloor n/2\rfloor;j=1,\cdots,p\} are i.i.d. standard normal random variables, while {Zij,i=n/2+1,,n;j=1,,p}\{Z_{ij},\ i=\lfloor n/2\rfloor+1,\cdots,n;j=1,\cdots,p\} are i.i.d. normalized t random variables with 1212 degrees of freedom.

Covariance tests commonly consider Gaussian and uniform distributions, see Chen, Qiu and Zhang (2023), while the t-distribution with 1212 degrees of freedom is examined in Cai, Liu and Xia (2013); Zhu et al. (2017). All distributions are standardized to zero mean and unit variance. Except for the Gaussian baseline, the other cases share identical covariance matrices but differ in distribution, creating challenging yet crucial scenarios for testing covariance structures without imposing additional distributional assumptions. These cases reflect the broad applicability of our universality result.
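For reproducibility, the four standardized innovation schemes (1)-(4) can be generated as below (function name is ours); the scaling constants 3\sqrt{3} and 12/10\sqrt{12/10} standardize the uniform and t12t_{12} variables to unit variance:

```python
import numpy as np

def generate_Z(n, p, case, rng):
    """Generate the standardized innovation matrix Z for cases (1)-(4):
    1 Gaussian, 2 uniform/t12, 3 Gaussian/uniform, 4 Gaussian/t12.
    Each variable has mean 0 and variance 1."""
    half = n // 2
    unif = lambda m: rng.uniform(-1.0, 1.0, (m, p)) * np.sqrt(3.0)   # Var(U[-1,1]) = 1/3
    t12 = lambda m: rng.standard_t(12, (m, p)) / np.sqrt(12 / 10)    # Var(t_12) = 12/10
    gauss = lambda m: rng.standard_normal((m, p))
    if case == 1:
        return gauss(n)
    first, second = {2: (unif, t12), 3: (gauss, unif), 4: (gauss, t12)}[case]
    return np.vstack([first(half), second(n - half)])
```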

Table 1: The empirical sizes of the supremum, the Frobenius, and the operator norm tests at significance level 0.050.05, for data with the Gaussian distribution and the mixed uniform and t distributions, across combinations of the sample size nn, the dimension pp, and the covariance matrix 𝚺0\bm{\Sigma}_{0}.
𝚺0\bm{\Sigma}_{0} pp Exp. decay(0.6) Block diagonal Signed subExp. decay(0.4)
100 200 500 1000 2000 100 200 500 1000 2000 100 200 500 1000 2000
Gaussian, n=100
Supn 0.203 0.240 0.316 0.406 0.487 0.203 0.240 0.316 0.406 0.487 0.203 0.240 0.316 0.406 0.487
Lfn 0.049 0.050 0.049 0.050 0.055 0.049 0.050 0.049 0.050 0.055 0.049 0.050 0.049 0.050 0.055
Ufn 0.048 0.056 0.046 0.049 0.056 0.048 0.056 0.046 0.049 0.056 0.048 0.056 0.046 0.049 0.056
Roy 0.050 0.050 0.045 0.053 0.049 0.050 0.050 0.045 0.053 0.049 0.050 0.050 0.045 0.053 0.049
Opn 0.051 0.046 0.048 0.044 0.049 0.048 0.047 0.048 0.047 0.052 0.053 0.055 0.049 0.049 0.052
Com 0.050 0.046 0.046 0.047 0.049 0.050 0.046 0.047 0.045 0.050 0.055 0.054 0.048 0.052 0.051
Gaussian, n=300
Supn 0.073 0.066 0.071 0.070 0.076 0.073 0.066 0.071 0.070 0.076 0.073 0.066 0.071 0.070 0.076
Lfn 0.056 0.048 0.045 0.045 0.053 0.056 0.048 0.045 0.045 0.053 0.056 0.048 0.045 0.045 0.053
Ufn 0.053 0.048 0.044 0.044 0.057 0.053 0.048 0.044 0.044 0.057 0.053 0.048 0.044 0.044 0.057
Roy 0.048 0.049 0.050 0.052 0.051 0.048 0.049 0.050 0.052 0.051 0.048 0.049 0.050 0.052 0.051
Opn 0.057 0.052 0.047 0.051 0.052 0.052 0.050 0.052 0.052 0.049 0.042 0.046 0.054 0.049 0.054
Com 0.057 0.050 0.046 0.052 0.052 0.053 0.050 0.050 0.051 0.048 0.042 0.048 0.052 0.049 0.055
uniform and t(df=12), n=100
Supn 0.251 0.293 0.391 0.492 0.613 0.251 0.293 0.391 0.492 0.613 0.251 0.293 0.391 0.492 0.613
Lfn 0.057 0.050 0.053 0.052 0.062 0.057 0.050 0.053 0.052 0.062 0.057 0.050 0.053 0.052 0.062
Ufn 0.056 0.051 0.048 0.051 0.064 0.056 0.051 0.048 0.051 0.064 0.056 0.051 0.048 0.051 0.064
Roy 0.050 0.045 0.051 0.050 0.050 0.050 0.045 0.051 0.050 0.050 0.050 0.051 0.051 0.050 0.050
Opn 0.045 0.033 0.043 0.041 0.047 0.052 0.043 0.042 0.051 0.043 0.047 0.047 0.051 0.060 0.049
Com 0.045 0.033 0.044 0.041 0.050 0.051 0.044 0.041 0.055 0.046 0.049 0.047 0.046 0.063 0.055
uniform and t(df=12), n=300
Supn 0.093 0.089 0.098 0.109 0.125 0.093 0.089 0.098 0.109 0.125 0.093 0.089 0.098 0.109 0.125
Lfn 0.055 0.047 0.050 0.039 0.053 0.055 0.047 0.050 0.039 0.053 0.055 0.047 0.050 0.039 0.053
Ufn 0.056 0.044 0.049 0.040 0.056 0.056 0.044 0.049 0.040 0.056 0.056 0.044 0.049 0.040 0.056
Roy 0.050 0.046 0.038 0.053 0.054 0.050 0.046 0.038 0.053 0.054 0.050 0.046 0.038 0.053 0.054
Opn 0.040 0.051 0.049 0.051 0.050 0.042 0.053 0.051 0.047 0.051 0.046 0.042 0.036 0.042 0.061
Com 0.046 0.051 0.041 0.048 0.047 0.043 0.052 0.050 0.046 0.052 0.049 0.045 0.038 0.045 0.058

Table 1 reports the empirical sizes of the supremum, the Frobenius, and the operator norm tests at the significance level 0.050.05 across various sample sizes nn, dimensions pp, and covariances 𝚺0\bm{\Sigma}_{0}, for the Gaussian and the mixed uniform and t data-generating distributions (distributions (1) and (2)). For results on the Gaussian-uniform and Gaussian-t distributions (distributions (3) and (4)), see Table 2 in the Supplement, which demonstrates similar patterns. The three operator norm tests are performed using the proposed universal bootstrap procedure, while the supremum and Frobenius norm tests rely on their respective asymptotic distribution formulas. Both Table 1 and Table 2 in the Supplement show that the universal bootstrap effectively controls the size of all operator norm-based statistics at the nominal significance level across all tested scenarios. This approach performs well under both Gaussian and non-Gaussian distributions, for both i.i.d. and non-i.i.d. data, in proportional and ultra-high dimensional settings, and across various covariance structures. These findings confirm the universality and consistency of the universal bootstrap procedure, providing empirical support for our theoretical results in Theorems 3.5 and 4.2. Additionally, the Frobenius norm-based tests maintain appropriate size control, while the supremum norm-based test exhibits substantial size inflation at n=100n=100 and moderate inflation even at the larger sample size n=300n=300.

To evaluate the empirical power of these statistics, we consider a setting where 𝚺=𝚺0+𝚫\bm{\Sigma}=\bm{\Sigma}_{0}+\bm{\Delta}, with 𝚫\bm{\Delta} having a low rank. This setup corresponds to the power analysis presented in Theorem 4.1. As shown in Theorem 4.1, the statistic TT outperforms TRoyT^{\text{Roy}} when the eigenspaces of 𝚫\bm{\Delta} align with the eigenvectors of 𝚺0\bm{\Sigma}_{0} corresponding to larger eigenvalues. In contrast, TRoyT^{\text{Roy}} performs better when 𝚫\bm{\Delta} aligns with the smaller eigenvectors of 𝚺0\bm{\Sigma}_{0}. For a fair comparison, we define 𝚫=σ𝒖𝒖T/2+σ𝒗𝒗T/4\bm{\Delta}=\sigma\bm{u}\bm{u}^{T}/2+\sigma\bm{v}\bm{v}^{T}/4, where 𝒖\bm{u} is the fifth-largest eigenvector of 𝚺0\bm{\Sigma}_{0}, 𝒗\bm{v} is uniformly sampled on the unit sphere in p\mathbb{R}^{p}, and σ\sigma is the signal strength. We term this configuration the spike setting. For completeness, we also conduct a power analysis where 𝚫\bm{\Delta} has full rank by setting 𝚫=σ𝑰p\bm{\Delta}=\sigma\bm{I}_{p}, where σ\sigma represents the signal level. This configuration, termed the white noise setting, represents the covariance structure after adding white noise to the null covariance. Together, the spike and white noise settings cover both low- and full-rank deviations from the null covariance. For illustration, we consider sample sizes n=100n=100 and 300300, with dimension p=1000p=1000. Based on the universality results in Theorem 3.5 and empirical size performance, we conduct the power analysis using a Gaussian distribution for simplicity. Further empirical power comparisons under varying sample sizes and dimensions are provided in Table 1 of the Supplement.
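The two alternative constructions above can be sketched as follows (function names are ours; Σ0 can be any of the null structures described earlier):

```python
import numpy as np

def spike_alt(Sigma0, sigma, rng):
    """Spike setting: Sigma = Sigma0 + sigma*u u^T/2 + sigma*v v^T/4,
    with u the fifth-largest eigenvector of Sigma0 and v uniform on
    the unit sphere."""
    w, V = np.linalg.eigh(Sigma0)    # eigenvalues in ascending order
    u = V[:, -5]                     # eigenvector of the fifth-largest eigenvalue
    v = rng.standard_normal(Sigma0.shape[0])
    v /= np.linalg.norm(v)           # uniform direction on the sphere
    return Sigma0 + sigma * np.outer(u, u) / 2 + sigma * np.outer(v, v) / 4

def white_noise_alt(Sigma0, sigma):
    """White noise setting: Sigma = Sigma0 + sigma * I_p."""
    return Sigma0 + sigma * np.eye(Sigma0.shape[0])
```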

Figure 1: Empirical powers of the supremum, the Frobenius, and the proposed operator norm tests with respect to the signal level of the spike setting alternatives under three covariance structures, the sample size n=100n=100, 300300, dimension p=1000p=1000, and the Gaussian data with 20002000 replications.
Figure 2: Empirical powers of the supremum, the Frobenius, and the proposed operator norm tests with respect to the signal level of the white noise setting alternatives under three covariance structures, the sample size n=100n=100, 300300, dimension p=1000p=1000, and the Gaussian data with 20002000 replications.

Figure 1 illustrates the empirical power of the six test statistics across signal levels and null covariance structures under the spike setting, where the dashed line represents the significance level 0.050.05. Under this setting, the proposed operator norm-based statistics TT and TComT^{\text{Com}} exhibit similar power, while the two Frobenius norm-based statistics and the normalized operator norm statistic TRoyT^{\text{Roy}} perform comparably to one another. When n=100n=100, the operator norm-based statistics TT and TComT^{\text{Com}} maintain appropriate test size and achieve the highest power, approaching 11 quickly across all three covariance configurations. The normalized operator norm statistic TRoyT^{\text{Roy}} and the two Frobenius norm-based statistics TF,1T^{\text{F},1} and TF,2T^{\text{F},2} also control the test size but exhibit lower power, with TRoyT^{\text{Roy}} slightly ahead. The supremum norm-based statistic, however, exhibits significant size inflation and fails to detect signals, as its power curve does not increase with the signal level. The pattern is similar for n=300n=300, except that the power curves for the supremum norm-based statistic remain near the dashed line, indicating limited power. This provides empirical evidence for the advantage of operator norm-based statistics over the other norms in the spike setting, particularly supporting the proposed statistic TT over TRoyT^{\text{Roy}}. We further make comparisons under the white noise setting across various signal levels in Figure 2. In this setting, the combined statistic TComT^{\text{Com}} and the normalized statistic TRoyT^{\text{Roy}} outperform the other statistics. Because the white noise setting applies a uniform signal across eigenvector directions of 𝚺0\bm{\Sigma}_{0} with both large and small eigenvalues, the normalized statistic TRoyT^{\text{Roy}} achieves higher power than TT, consistent with the power analysis in Theorem 4.1.
Notably, the combined statistic TComT^{\text{Com}} performs robustly in both the spike and white noise settings, demonstrating enhanced power without size inflation, as supported by the generalized universal bootstrap consistency in Theorem 4.2. As in the spike setting, the Frobenius norm-based statistics display lower power, while the supremum norm-based statistic fails to control size. In summary, when the covariance structure is known, we recommend the statistic TT; when it is unknown, we prefer the more robust combined statistic TComT^{\text{Com}}.

5.2 Data application

We apply our test procedure to global annual mean near-surface air temperature data for the period 196020101960-2010, using the HadCRUT4 dataset (Morice et al. (2012)) and the Coupled Model Intercomparison Project Phase 5 (CMIP5, Taylor, Stouffer and Meehl (2012)), as detailed in Li et al. (2021). The global dataset includes monthly anomalies of near-surface air temperature across 5×55^{\circ}\times 5^{\circ} grid boxes. To reduce dimensionality, these grid boxes are aggregated into larger 40×3040^{\circ}\times 30^{\circ} boxes, resulting in S=54S=54 spatial grid boxes. To mitigate distribution shifts due to long-term trends, we divide the 196020101960-2010 period into five decades and analyze the data for each decade separately. The monthly data for each decade are averaged over five-year intervals, yielding a temporal dimension T=2T=2 for each ten-year period. To illustrate the rationale for the covariance test in this context, we first outline the modeling procedure commonly used in climatology. Following Li et al. (2021), one considers a high-dimensional linear regression with errors-in-variables

𝒀k=i=12𝑿k,iβk,i+ϵk,k=1,,5,\displaystyle\bm{Y}_{k}=\sum_{i=1}^{2}\bm{X}_{k,i}\beta_{k,i}+\bm{\epsilon}_{k},\ k=1,\cdots,5,

where 𝒀kL\bm{Y}_{k}\in\mathbb{R}^{L} is the vectorized observed mean temperature across spatial and temporal dimensions for the kk-th decade after 19601960, with dimension L=S×TL=S\times T. The vectors 𝑿k,1\bm{X}_{k,1}, 𝑿k,2,ϵkL\bm{X}_{k,2},\bm{\epsilon}_{k}\in\mathbb{R}^{L} represent the expected unobserved climate responses under the ANT and NAT forcings, and the unobserved noise, respectively. The coefficients βk,1\beta_{k,1}, βk,2\beta_{k,2} are unknown scaling factors of interest for statistical inference. To estimate βk,1\beta_{k,1}, βk,2\beta_{k,2}, in addition to the observed data 𝒀k\bm{Y}_{k}, we have nin_{i} noisy observed fingerprints 𝑿~k,i,j\tilde{\bm{X}}_{k,i,j} of the climate system

𝑿~k,i,j=𝑿k,i+ϵ~k,i,j,k=1,,5;j=1,,ni;i=1,2.\displaystyle\tilde{\bm{X}}_{k,i,j}=\bm{X}_{k,i}+\tilde{\bm{\epsilon}}_{k,i,j},\ k=1,\cdots,5;\ j=1,\cdots,n_{i};\ i=1,2.

In our dataset, n1=35n_{1}=35, n2=46n_{2}=46. For privacy, the data 𝑿~k,i,j\tilde{\bm{X}}_{k,i,j} are preprocessed by adding white noise, with the corresponding noise covariance added to the hypothesized matrix 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k} defined below. The climate models also provide N=223N=223 simulations ϵ^k,i\hat{\bm{\epsilon}}_{k,i}, k=1,,5;i=1,,Nk=1,\cdots,5;\ i=1,\cdots,N in this dataset, which are assumed to follow the same distribution as the unobserved noise ϵk\bm{\epsilon}_{k}. To efficiently estimate βk,1\beta_{k,1}, βk,2\beta_{k,2}, a typical assumption is that the natural variability simulated by the climate models matches the observed variability, i.e., the covariance of ϵ~k,i,j\tilde{\bm{\epsilon}}_{k,i,j}, k=1,,5;j=1,,ni;i=1,2k=1,\dots,5;\ j=1,\dots,n_{i};\ i=1,2, equals the covariance matrix 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k} of ϵk\bm{\epsilon}_{k}. Since ϵk\bm{\epsilon}_{k} is unobserved, a common choice for 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k} is the sample covariance of the simulations 𝚺ϵ,k=i=1Nϵ^k,iϵ^k,iT/N\bm{\Sigma}_{\bm{\epsilon},k}=\sum_{i=1}^{N}\hat{\bm{\epsilon}}_{k,i}\hat{\bm{\epsilon}}_{k,i}^{T}/N. Therefore, it is important to test the hypothesis

H0,k:Cov(ϵ~k,i,j)=𝚺ϵ,k,k=1,,5;j=1,,ni;i=1,2.\displaystyle H_{0,k}:\text{Cov}(\tilde{\bm{\epsilon}}_{k,i,j})=\bm{\Sigma}_{\bm{\epsilon},k},\ k=1,\cdots,5;\ j=1,\cdots,n_{i};\ i=1,2. (59)

As noted by Olonscheck and Notz (2017), the equivalence (59) is crucial for optimal fingerprint estimation in climate models. However, few studies have validated (59) with statistical evidence, which motivates a formal test of this hypothesis.

We construct the data matrix 𝑿~kn×p\tilde{\bm{X}}_{k}\in\mathbb{R}^{n\times p} by combining quantities 𝑿~k,1,j𝑿~¯k,1\tilde{\bm{X}}_{k,1,j}-\bar{\tilde{\bm{X}}}_{k,1} for j=1,,n1j=1,\dots,n_{1} and 𝑿~k,2,j𝑿~¯k,2\tilde{\bm{X}}_{k,2,j}-\bar{\tilde{\bm{X}}}_{k,2} for j=1,,n2j=1,\dots,n_{2}, where n=n1+n2=81n=n_{1}+n_{2}=81 and p=L=108p=L=108. Here 𝑿~¯k,i=j=1ni𝑿~k,i,j/ni\bar{\tilde{\bm{X}}}_{k,i}=\sum_{j=1}^{n_{i}}\tilde{\bm{X}}_{k,i,j}/n_{i}. We apply several statistics to the data matrix 𝑿~k\tilde{\bm{X}}_{k} with the hypothesized matrix 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k}: the proposed operator norm-based statistics TT (Opn) and TComT^{\text{Com}} (Com), the operator norm-based statistics TRoyT^{\text{Roy}} (Roy), the Frobenius norm-based statistics TF,1T^{\text{F},1} (Lfn) and TF,2T^{\text{F},2} (Ufn), and the supremum norm-based statistic TsupT^{\text{sup}} (Supn). For comparison, we generate i.i.d. Gaussian samples 𝑿~k,j𝒩(𝟎,𝚺ϵ,k)\tilde{\bm{X}}_{k,j}^{\prime}\sim\mathcal{N}(\bm{0},\bm{\Sigma}_{\bm{\epsilon},k}) for j=1,,n;k=1,,5j=1,\dots,n;\ k=1,\dots,5 as a control group. The observed data 𝑿~k\tilde{\bm{X}}_{k} forms the observation group. The p-value results for these statistics are summarized in Table 2. The supremum norm-based statistic TsupT^{\text{sup}} fails to reject the observation group for all periods but rejects the control group in the periods 196019701960-1970, 197019801970-1980, and 198019901980-1990. The Frobenius norm-based statistics TF,1T^{\text{F},1} and TF,2T^{\text{F},2} fail to reject the observation group in the periods 197019801970-1980 and 199020001990-2000, and TF,2T^{\text{F},2} rejects the control group during 199020001990-2000. Only the operator norm-based statistics TT, TComT^{\text{Com}}, and TRoyT^{\text{Roy}} reject the null hypothesis in the observation group while not rejecting it in the control group for all years. 
This suggests that the commonly assumed hypothesis (59) should be rejected, and a more suitable assumption should be used to estimate \beta_{k,1} and \beta_{k,2} for all k.
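The test pipeline above can be sketched in code. The following is a minimal illustration, not the paper's actual procedure: `opnorm_stat` is a simplified stand-in for the operator norm-based statistic T, and `universal_bootstrap_pvalue` caricatures the universal bootstrap by drawing Gaussian samples with the hypothesized covariance, exploiting the universality idea that the null law of the statistic depends on the data only through \bm{\Sigma}_{\bm{\epsilon},k}. All function names are illustrative.

```python
import numpy as np

def combined_data_matrix(X1, X2):
    """Stack the two group-wise centered samples into one n x p data
    matrix, mirroring the construction of X_tilde_k with n = n1 + n2."""
    return np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])

def opnorm_stat(X, Sigma0):
    """Operator-norm deviation of the sample covariance from the
    hypothesized Sigma0 (a simplified stand-in for the statistic T)."""
    n = X.shape[0]
    S = X.T @ X / n
    return np.linalg.norm(S - Sigma0, ord=2)

def universal_bootstrap_pvalue(X, Sigma0, B=200, rng=None):
    """Approximate the null distribution of the statistic by resampling
    Gaussian data with covariance Sigma0, then return a bootstrap p-value."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    t_obs = opnorm_stat(X, Sigma0)
    # Cholesky factor of Sigma0 (small ridge for numerical safety).
    L = np.linalg.cholesky(Sigma0 + 1e-10 * np.eye(p))
    t_boot = np.empty(B)
    for b in range(B):
        Z = rng.standard_normal((n, p)) @ L.T
        t_boot[b] = opnorm_stat(Z - Z.mean(axis=0), Sigma0)
    return (1 + np.sum(t_boot >= t_obs)) / (B + 1)
```

For instance, data drawn with covariance 4\bm{I} tested against \bm{\Sigma}_{0}=\bm{I} yields a small p-value, while data generated under the null does not systematically reject, which is the pattern the control group is designed to check.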

Table 2: Estimated p-values of tests for covariances for the observation group and control group using the supremum norm-based statistic (Supn), the Frobenius norm-based statistics (Lfn, Ufn), and the operator norm-based statistics (Roy, Opn, Com), with sample size n=81 and dimension p=108.

                      observation group                      control group
year        Supn  Lfn   Ufn   Roy   Opn   Com      Supn  Lfn   Ufn   Roy   Opn   Com
1960-1970   .783  .006  .001  .000  .000  .000     .002  .566  .521  .805  .646  .734
1970-1980   .406  .146  .115  .000  .000  .000     .004  .230  .243  .401  .452  .427
1980-1990   .249  .001  .000  .000  .000  .000     .000  .419  .451  .638  .713  .677
1990-2000   .905  .634  .410  .000  .000  .000     .223  .072  .042  .445  .383  .414
2000-2010   .984  .002  .001  .000  .000  .000     .719  .368  .376  .244  .288  .268

In summary, the operator norm-based statistics demonstrate strong performance across various covariance structures and signal configurations, with the universal bootstrap ensuring the consistency of tests constructed from diverse combinations of operator norms. Numerical results align with the theoretical framework, indicating that the proposed tests exhibit robust power properties and that the universal bootstrap procedures maintain appropriate size control.

Funding

Fang Yao is the corresponding author. This research is partially supported by the National Key Research and Development Program of China (No. 2022YFA1003801), the National Natural Science Foundation of China (No. 12292981, 11931001), the LMAM, the Fundamental Research Funds for the Central Universities, Peking University, and LMEQF.

References

  • Adamczak, R., Litvak, A. E., Pajor, A. and Tomczak-Jaegermann, N. (2011). Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus. Mathématique 349, 195–200.
  • Bai, Z. and Yao, J. (2012). On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis 106, 167–177.
  • Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices.
  • Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis 97, 1382–1408.
  • Bao, Z., Pan, G. and Zhou, W. (2015). Universality for the largest eigenvalue of sample covariance matrices with general population.
  • Bianchi, P., Debbah, M., Maïda, M. and Najim, J. (2011). Performance of statistical tests for single-source detection using random matrix theory. IEEE Transactions on Information Theory 57, 2400–2419.
  • Bunea, F. and Xiao, L. (2015). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA.
  • Cai, T. T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices.
  • Cai, T., Liu, W. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association 108, 265–277.
  • Cai, T. T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society Series B: Statistical Methodology 76, 349–372.
  • Cai, T. T. and Ma, Z. (2013). Optimal hypothesis testing for high dimensional covariance matrices.
  • Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing.
  • Chen, S. X., Qiu, Y. and Zhang, S. (2023). Sharp optimality for high-dimensional covariance testing under sparse signals. The Annals of Statistics 51, 1921–1945.
  • Chen, S. X., Zhang, L.-X. and Zhong, P.-S. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association 105, 810–819.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions.
  • Chernozhukov, V., Chetverikov, D. and Koike, Y. (2023). Nearly optimal central limit theorem and bootstrap approximations in high dimensions. The Annals of Applied Probability 33, 2374–2425.
  • Chernozhuokov, V., Chetverikov, D., Kato, K. and Koike, Y. (2022). Improved central limit theorem and bootstrap approximations in high dimensions. The Annals of Statistics 50, 2562–2586.
  • Couillet, R., Debbah, M. and Silverstein, J. W. (2011). A deterministic equivalent for the analysis of correlated MIMO multiple access channels. IEEE Transactions on Information Theory 57, 3493–3514.
  • Ding, X., Hu, Y. and Wang, Z. (2024). Two sample test for covariance matrices in ultra-high dimension. Journal of the American Statistical Association, 1–12.
  • Ding, X. and Wang, Z. (2023). Global and local CLTs for linear spectral statistics of general sample covariance matrices when the dimension is much larger than the sample size with applications. arXiv preprint arXiv:2308.08646.
  • Ding, X. and Yang, F. (2018). A necessary and sufficient condition for edge universality at the largest singular values of covariance matrices. The Annals of Applied Probability 28, 1679–1738.
  • Ding, X. and Yang, F. (2022). Tracy-Widom distribution for heterogeneous Gram matrices with applications in signal detection. IEEE Transactions on Information Theory 68, 6682–6715.
  • Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics 7, 1–26.
  • El Karoui, N. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices.
  • El Karoui, N. and Purdom, E. (2019). The non-parametric bootstrap and spectral analysis in moderate and high-dimension. In The 22nd International Conference on Artificial Intelligence and Statistics, 2115–2124. PMLR.
  • Erdős, L., Knowles, A. and Yau, H.-T. (2013). Averaging fluctuations in resolvents of random band matrices. In Annales Henri Poincaré 14, 1837–1926. Springer.
  • Erdős, L., Yau, H.-T. and Yin, J. (2012). Rigidity of eigenvalues of generalized Wigner matrices. Advances in Mathematics 229, 1435–1515.
  • Fang, X. and Koike, Y. (2024). Large-dimensional central limit theorem with fourth-moment error bounds on convex sets and balls. The Annals of Applied Probability 34, 2065–2106.
  • Giessing, A. (2023). Gaussian and bootstrap approximations for suprema of empirical processes. arXiv preprint arXiv:2309.01307.
  • Han, F., Xu, S. and Zhou, W.-X. (2018). On Gaussian comparison inequality and its application to spectral analysis of large random matrices.
  • Hu, H. and Lu, Y. M. (2022). Universality laws for high-dimensional learning with random features. IEEE Transactions on Information Theory 69, 1932–1964.
  • Ji, H. C. and Park, J. (2021). Tracy-Widom limit for free sum of random matrices. arXiv preprint arXiv:2110.05147.
  • Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices.
  • Jiang, D. and Bai, Z. (2021). Generalized four moment theorem and an application to CLT for spiked eigenvalues of high-dimensional covariance matrices.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics 29, 295–327.
  • Johnstone, I. M. and Paul, D. (2018). PCA in high dimensions: an orientation. Proceedings of the IEEE 106, 1277–1292.
  • Karoui, N. E. (2003). On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n tend to infinity. arXiv preprint math/0309355.
  • Ke, Z. T. (2016). Detecting rare and weak spikes in large covariance matrices. arXiv preprint arXiv:1609.00883.
  • Knowles, A. and Yin, J. (2017). Anisotropic local laws for random matrices. Probability Theory and Related Fields 169, 257–352.
  • Koltchinskii, V. and Lounici, K. (2017). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli, 110–133.
  • Kong, W. and Valiant, G. (2017). Spectrum estimation from samples.
  • Ledoit, O. and Wolf, M. (2015). Spectrum estimation: a unified framework for covariance matrix estimation and PCA in large dimensions. Journal of Multivariate Analysis 139, 360–384.
  • Lee, J. O. and Schnelli, K. (2016). Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population.
  • Li, Y., Chen, K., Yan, J. and Zhang, X. (2021). Uncertainty in optimal fingerprinting is underestimated. Environmental Research Letters 16, 084043.
  • Lopes, M. E. (2022a). Improved rates of bootstrap approximation for the operator norm: a coordinate-free approach. arXiv preprint arXiv:2208.03050.
  • Lopes, M. E. (2022b). Central limit theorem and bootstrap approximation in high dimensions: near 1/\sqrt{n} rates via implicit smoothing. The Annals of Statistics 50, 2492–2513.
  • Lopes, M. E., Blandino, A. and Aue, A. (2019). Bootstrapping spectral statistics in high dimensions. Biometrika 106, 781–801.
  • Lopes, M. E., Erichson, N. B. and Mahoney, M. W. (2023). Bootstrapping the operator norm in high dimensions: error estimation for covariance matrices and sketching. Bernoulli 29, 428–450.
  • Lopes, M. E., Lin, Z. and Müller, H.-G. (2020). Bootstrapping max statistics in high dimensions: near-parametric rates under weak variance decay and application to functional and multinomial data.
  • Montanari, A. and Saeed, B. N. (2022). Universality of empirical risk minimization. In Conference on Learning Theory, 4310–4312. PMLR.
  • Morice, C. P., Kennedy, J. J., Rayner, N. A. and Jones, P. D. (2012). Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set. Journal of Geophysical Research: Atmospheres 117.
  • Olonscheck, D. and Notz, D. (2017). Consistently estimating internal climate variability from climate model simulations. Journal of Climate 30, 9555–9573.
  • Onatski, A., Moreira, M. J. and Hallin, M. (2013). Asymptotic power of sphericity tests for high-dimensional data.
  • Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 1617–1642.
  • Qiu, J., Li, Z. and Yao, J. (2023). Asymptotic normality for eigenvalue statistics of a general sample covariance matrix when p/n\to\infty and applications. The Annals of Statistics 51, 1427–1451.
  • Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology 4.
  • Taylor, K. E., Stouffer, R. J. and Meehl, G. A. (2012). An overview of CMIP5 and the experiment design. Bulletin of the American Meteorological Society 93, 485–498.
  • Tsukuma, H. (2016). Estimation of a high-dimensional covariance matrix with the Stein loss. Journal of Multivariate Analysis 148, 1–17.
  • Xu, M., Zhang, D. and Wu, W. B. (2019). Pearson's chi-squared statistics: approximation theory and beyond. Biometrika 106, 716–723.
  • Yao, J. and Lopes, M. E. (2021). Rates of bootstrap approximation for eigenvalues in high-dimensional PCA. arXiv preprint arXiv:2104.07328.
  • Yu, X., Li, D. and Xue, L. (2024). Fisher's combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association 119, 511–524.
  • Yu, L., Zhao, P. and Zhou, W. (2024). Testing the number of common factors by bootstrapped sample covariance matrix in high-dimensional factor models. Journal of the American Statistical Association, 1–12.
  • Zhai, A. (2018). A high-dimensional CLT in \mathcal{W}_{2} distance with near optimal convergence rate. Probability Theory and Related Fields 170, 821–845.
  • Zhou, H., Bai, Z. and Hu, J. (2023). The limiting spectral distribution of large-dimensional general information-plus-noise-type matrices. Journal of Theoretical Probability 36, 1203–1226.
  • Zhou, H., Hu, J., Bai, Z. and Silverstein, J. W. (2024). Analysis of the limiting spectral distribution of large-dimensional general information-plus-noise-type matrices. Journal of Theoretical Probability 37, 1199–1229.
  • Zhu, L., Lei, J., Devlin, B. and Roeder, K. (2017). Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes. The Annals of Applied Statistics 11, 1810.