
Covariance test and universal bootstrap by operator norm

Guoyu Zhang ([email protected]), Dandan Jiang ([email protected]) and Fang Yao ([email protected])

Department of Probability and Statistics, School of Mathematical Sciences, Center for Statistical Science, Peking University (G. Zhang and F. Yao); School of Mathematics and Statistics, Xi'an Jiaotong University (D. Jiang)
Abstract

Testing covariance matrices is crucial in high-dimensional statistics. Traditional methods based on the Frobenius and supremum norms are widely used but often treat the covariance matrix as a vector and neglect its inherent matrix structure. This paper introduces a new testing framework based on the operator norm, designed to capture the spectral properties of the covariance matrix more accurately. The commonly used empirical bootstrap and multiplier bootstrap methods are shown to fail for operator norm-based statistics. To derive the critical values of such statistics, we propose a universal bootstrap procedure, utilizing the concept of universality from random matrix theory. Our method demonstrates consistency across both high- and ultra-high-dimensional regimes, accommodating scenarios where the dimension-to-sample-size ratio $p/n$ converges to a nonzero constant or diverges to infinity. As a byproduct, we provide the first proof of the Tracy–Widom law for the largest eigenvalue of sample covariance matrices with non-Gaussian entries as $p/n\to\infty$. We also show that such universality does not hold for the Frobenius norm and supremum norm statistics. Extensive simulations and a real-world data study support our findings, highlighting the favorable finite sample performance of the proposed operator norm-based statistics.

Keywords: covariance test, high-dimensional hypothesis test, operator norm, random matrix theory, universality, bootstrap.

MSC2020 subject classifications: 62H15, 62F40, 60B20.

1 Introduction

Testing large covariance matrices has received considerable attention due to its critical role in high-dimensional statistics. Generally, statistics for testing one-sample high-dimensional covariance matrices are based on the distance between the sample covariance matrix and the hypothesized covariance matrix. As highlighted by Chen, Qiu and Zhang (2023), this distance is typically measured using two types of norms: the Frobenius norm (e.g., Chen, Zhang and Zhong (2010); Cai and Ma (2013)) and the supremum norm (e.g., Jiang (2004); Cai and Jiang (2011)), similar to their application in high-dimensional mean tests (Chen and Qin (2010); Cai, Liu and Xia (2014)).

In contrast to the Frobenius norm and the supremum norm, which treat the covariance matrix as a vector, the operator norm captures the spectral structure of the covariance matrix and has recently gained significant attention. For $n$ independent, not necessarily identically distributed random vectors $\bm{X}_{1},\cdots,\bm{X}_{n}\in\mathbb{R}^{p}$ with zero mean and a common covariance matrix $\bm{\Sigma}$, consider the statistic

\displaystyle T=\left\|\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}\right\|_{\text{op}}, (1)

where $\bm{\hat{\Sigma}}=\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{T}/n$ is the sample covariance matrix, $\bm{\Sigma}_{0}$ is the hypothesized covariance matrix, and the operator norm of a matrix $\bm{A}$ is defined as $\|\bm{A}\|_{\text{op}}=\sup_{\|\bm{u}\|=1}\|\bm{A}\bm{u}\|$. The statistic $T$ is fundamental in principal component analysis, as it bounds the error of sample eigenvalues and eigenvectors. As a result, numerous studies have developed non-asymptotic upper bounds for the tail of $T$, e.g., Adamczak et al. (2011); Bunea and Xiao (2015); Koltchinskii and Lounici (2017). However, these upper bounds are often conservative when used to construct confidence intervals for $T$. Only a few studies have explored the asymptotic distribution of $T$ due to the complex nature of the spectral analysis, including Han, Xu and Zhou (2018); Lopes (2022a); Lopes, Erichson and Mahoney (2023); Giessing (2023). Nonetheless, these works assume either that the dimension $p$ grows more slowly than $n$, or that the eigenvalues of $\bm{\Sigma}$ and $\bm{\Sigma}_{0}$ decay fast enough to make the effective rank smaller than $n$. Under such conditions, the empirical covariance matrix $\bm{\hat{\Sigma}}$ is a consistent estimator of $\bm{\Sigma}$ with respect to the operator norm. Essentially, these assumptions reduce the intrinsic dimension of the problem, leaving the truly high-dimensional cases unresolved. This article studies the behavior of $T$ under both high- and ultra-high-dimensional regimes, characterized by constants $C_{1},C_{2}>0$ and $\alpha\geq 1$ such that

\displaystyle C_{1}n^{\alpha}\leq p\leq C_{2}n^{\alpha}. (2)

Furthermore, no eigen-decay assumptions are made on $\bm{\Sigma}$ or $\bm{\Sigma}_{0}$. In this context, the dimension-to-sample-size ratio $\phi=p/n$ may converge to a nonzero constant or diverge to infinity, and no consistent estimator of $\bm{\Sigma}$ exists under the operator norm, as discussed in Ding and Wang (2023); Ding, Hu and Wang (2024). This setting, while challenging, is common in practice. For example, the simple case of $\bm{\Sigma}=\bm{I}_{p}$ with $p$ growing proportionally to $n$ falls into this inconsistent regime. Consequently, existing results are not applicable to such cases. The theoretical difficulty of determining the distribution of $T$ has restricted the use of the operator norm in high-dimensional covariance testing within the inconsistent regime.
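Although its null distribution is the hard part, the statistic $T$ itself is simple to compute: since $\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}$ is symmetric, its operator norm equals its largest absolute eigenvalue. A minimal NumPy sketch (the function name `operator_norm_statistic` is ours, not from the paper's code repository):

```python
import numpy as np

def operator_norm_statistic(X, Sigma0):
    """T = ||Sigma_hat - Sigma0||_op as in (1), for an (n, p) data
    matrix X whose rows are assumed to have mean zero."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n          # sample covariance (mean known to be zero)
    D = Sigma_hat - Sigma0
    # For a symmetric matrix, the operator norm is the largest |eigenvalue|.
    return np.max(np.abs(np.linalg.eigvalsh(D)))

# Example in the inconsistent regime p ~ n with Sigma = Sigma0 = I_p:
rng = np.random.default_rng(0)
n, p = 200, 200
X = rng.standard_normal((n, p))
T = operator_norm_statistic(X, np.eye(p))
```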

1.1 Random matrix theory

Building upon prior work, various approaches have been developed for analyzing $T$ and its variants in specific scenarios using random matrix theory. When $\bm{\Sigma}_{0}=\bm{I}_{p}$, the distribution of $T$ can be derived by examining the limiting behavior of the extreme eigenvalues of the sample covariance matrix $\bm{\hat{\Sigma}}$. The study of extreme spectral properties of high-dimensional covariance matrices has attracted substantial interest, resulting in a number of influential contributions, including Johnstone (2001); Bianchi et al. (2011); Onatski, Moreira and Hallin (2013); Johnstone and Paul (2018). Building on the progress of these studies, El Karoui (2007); Bao, Pan and Zhou (2015); Lee and Schnelli (2016); Knowles and Yin (2017) established that when the dimension $p$ and sample size $n$ grow proportionally and certain other regularity conditions hold, the limiting distribution of the largest eigenvalue of $\bm{\hat{\Sigma}}$ follows the Tracy–Widom law. In cases where $\bm{\Sigma}_{0}$ is invertible, Bao, Pan and Zhou (2015) proposed transforming the data $\bm{X}_{1},\cdots,\bm{X}_{n}$ into $\bm{\Sigma}_{0}^{-1/2}\bm{X}_{1},\cdots,\bm{\Sigma}_{0}^{-1/2}\bm{X}_{n}$, allowing one to test whether the covariance of $\bm{\Sigma}_{0}^{-1/2}\bm{X}_{i}$ is the identity matrix $\bm{I}_{p}$. This transformation results in the statistic $T$ taking the form of Roy's largest root statistic

\displaystyle T^{\text{Roy}}=\left\|\bm{\Sigma}_{0}^{-\frac{1}{2}}\bm{\hat{\Sigma}}\bm{\Sigma}_{0}^{-\frac{1}{2}}-\bm{I}_{p}\right\|_{\text{op}}, (3)

and the Tracy–Widom distribution results hold for $T^{\text{Roy}}$ when $p$ and $n$ grow proportionally, i.e., $\alpha=1$. However, similar distributional results for $T$ have not yet been obtained for general $\bm{\Sigma}_{0}$, even when $p/n$ is bounded. To address this gap, we leverage random matrix theory to characterize the extreme singular values of matrices of the form $\hat{\bm{\Sigma}}+\bm{R}$ for a general matrix $\bm{R}$. Taking $\bm{R}=-\bm{\Sigma}_{0}$ yields the limiting properties of $T$. Additionally, we introduce a novel technique, the universal bootstrap, designed to directly approximate the distribution of $T$ for general $\bm{\Sigma}_{0}$.
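For invertible $\bm{\Sigma}_{0}$, the whitening transformation above is equally direct to implement. The sketch below is our own illustration (not the paper's code), forming $\bm{\Sigma}_{0}^{-1/2}$ by eigen-decomposition and assuming $\bm{\Sigma}_{0}$ is positive-definite:

```python
import numpy as np

def roy_statistic(X, Sigma0):
    """Roy's largest root statistic (3): the operator norm of
    Sigma0^{-1/2} Sigma_hat Sigma0^{-1/2} - I_p."""
    n, p = X.shape
    vals, vecs = np.linalg.eigh(Sigma0)
    inv_sqrt = (vecs * vals ** -0.5) @ vecs.T   # Sigma0^{-1/2}
    W = X @ inv_sqrt                            # whitened data Sigma0^{-1/2} X_i
    D = W.T @ W / n - np.eye(p)
    return np.max(np.abs(np.linalg.eigvalsh(D)))
```

When $\bm{\Sigma}_{0}=\bm{I}_{p}$, this reduces to the plain operator norm statistic, matching the reduction of (3) to (1).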

1.2 Bootstrap

The bootstrap method, a generic resampling technique to approximate the distribution of a statistic, was first introduced by Efron (1979). Originally designed for fixed-dimensional problems, the bootstrap has recently been adapted to high-dimensional settings through substantial work. Chernozhukov, Chetverikov and Kato (2013, 2017); Lopes, Lin and Müller (2020); Lopes (2022b); Chernozhukov, Chetverikov and Koike (2023) developed Gaussian approximation methods and established approximation rates for the empirical bootstrap and multiplier bootstrap for the supremum norm of a sum of high-dimensional vectors. Moreover, Han, Xu and Zhou (2018); Lopes, Blandino and Aue (2019); Yao and Lopes (2021); Lopes (2022a); Lopes, Erichson and Mahoney (2023) explored bootstrap methods for spectral statistics of covariance matrices, though these works often assume a low intrinsic dimension or effective rank relative to the sample size $n$. Despite such advances in high-dimensional bootstrap methods, El Karoui and Purdom (2019) and Yu, Zhao and Zhou (2024) demonstrated that the empirical and multiplier bootstraps can be inconsistent for $T$, even when $\bm{\Sigma}_{0}=\bm{I}_{p}$, if $p$ and $n$ grow proportionally and the eigenvalues of $\bm{\Sigma}$ and $\bm{\Sigma}_{0}$ do not decay.

To address the inconsistency of high-dimensional bootstraps based on Gaussian approximation, and motivated by the concept of universality, a topic of recent interest in random matrix theory (Hu and Lu, 2022; Montanari and Saeed, 2022), we develop a novel resampling method informed by universality principles. Specifically, traditional bootstrap methods (Chernozhukov, Chetverikov and Kato, 2013, 2017; Chernozhukov, Chetverikov and Koike, 2023), which rely on the high-dimensional central limit theorem, treat the sample covariance matrix $\bm{\hat{\Sigma}}=\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{T}/n$ as a sum of random matrices and approximate its distribution with a Gaussian matrix sharing the same covariance as $\bm{\hat{\Sigma}}$. Nonetheless, as the dimension $p$ increases, the accuracy of this approximation diminishes, eventually leading to inconsistency, as shown by El Karoui and Purdom (2019). This inconsistency indicates that in high-dimensional contexts, the large matrix structure dominates the central limit theorem's applicability, causing the sample covariance matrix $\bm{\hat{\Sigma}}$ to deviate from Gaussian behavior. To circumvent this limitation, the universal property leverages high-dimensional structures. Broadly, universality suggests that the asymptotic distribution of $T$ depends on the distribution of $\bm{X}_{1},\cdots,\bm{X}_{n}$ only through their first two moments. This allows us to construct the universal bootstrap statistic by substituting independent Gaussian samples $\bm{Y}_{1},\dots,\bm{Y}_{n}\sim\mathcal{N}(\bm{0},\bm{\Sigma})$ for $\bm{X}_{1},\dots,\bm{X}_{n}$ in the definition (1),

\displaystyle T^{\text{ub}}=\left\|\bm{\hat{\Sigma}}^{\text{ub}}-\bm{\Sigma}_{0}\right\|_{\text{op}}, (4)

where $\bm{\hat{\Sigma}}^{\text{ub}}=\sum_{i=1}^{n}\bm{Y}_{i}\bm{Y}_{i}^{T}/n$. The key insight is that while the universal bootstrap matrix $\bm{\hat{\Sigma}}^{\text{ub}}$ need not approximate a Gaussian matrix, it effectively replicates the structure of $\bm{\hat{\Sigma}}$, which is disrupted by the empirical and multiplier bootstraps. Although this method relies on sophisticated random matrix theory, its implementation remains straightforward.

1.3 Our contributions

We summarize our contributions as three-fold. First, we establish the anisotropic local law for matrices of the form $\hat{\bm{\Sigma}}+\bm{R}$ within both the high- and ultra-high-dimensional settings (2), where $\bm{R}$ may have positive, negative, or zero eigenvalues. Erdős, Yau and Yin (2012); Knowles and Yin (2017) demonstrated that the anisotropic local law is essential for proving eigenvalue rigidity and universality. The anisotropic local law for $\hat{\bm{\Sigma}}$ has been previously established by Knowles and Yin (2017) for $\alpha=1$ and by Ding and Wang (2023); Ding, Hu and Wang (2024) for $\alpha>1$. However, the presence of $\bm{R}$ alters the structure of the sample covariance matrix $\hat{\bm{\Sigma}}$, rendering existing techniques invalid. Additionally, due to the potentially ultra-high dimension, each block of the Green function matrix (see (21) for the definition) may converge at a different rate, making the bound derived in Ding and Wang (2023) suboptimal. To address these challenges, we introduce a new "double Schur inversion" technique to represent the Green function matrix using local blocks. We also define an auxiliary parameter-dependent covariance matrix and establish the corresponding parameter-dependent Marčenko–Pastur law. This parameter-dependent structure enables us to prove the entry-wise local law, a weaker version of the anisotropic local law. Furthermore, we present an improved representation of the anisotropic local law, enhancing the results of Ding and Wang (2023) and Ding, Hu and Wang (2024). This unified structure provides a valuable tool for establishing the anisotropic local law.

Second, we establish universality results for the statistic $T$, based on which the universal bootstrap procedure is introduced. Leveraging this universality, we demonstrate that the empirical distribution generated by the universal bootstrap in (4) effectively approximates the distribution of $T$. Additionally, the asymptotic distribution of $T$ is shown to be independent of the third and fourth moments of $\bm{X}_{1},\cdots,\bm{X}_{n}$, allowing us to focus primarily on the covariance. In contrast, we find that two commonly used norms, the Frobenius norm and the supremum norm, lack this universality, as their asymptotic distributions explicitly depend on all fourth moments of the data. Therefore, standardized statistics based on these norms require fourth-moment estimation, which is generally more complex than estimating the covariance. This difference highlights a key advantage of employing the operator norm for covariance testing.

Third, we perform size and power analyses for the statistics $T$ and $T^{\text{Roy}}$ and propose a combined approach to enhance testing power. We also develop a generalized universality theorem to show the consistency of the universal bootstrap for statistics based on extreme eigenvalues, including this combined statistic. For the size analysis, we demonstrate that the universality results apply to $T^{\text{Roy}}$. As a byproduct, we extend the Tracy–Widom law for the largest eigenvalue of the sample covariance matrix with general entries to the ultra-high-dimensional regime, addressing a long-standing gap, as it had previously been proven only for Gaussian entries by Karoui (2003). For the power analysis, we study both $T$ and $T^{\text{Roy}}$ within the generalized spiked model framework of Bai and Yao (2012); Jiang and Bai (2021). We show that $T^{\text{Roy}}$ performs better in worst-case scenarios, while $T$ excels in average performance. To further enhance the power, we propose a new combined statistic, $T^{\text{Com}}$, supported by a generalized universality theorem that confirms the validity of the universal bootstrap for $T^{\text{Com}}$. This also serves as a theoretical guarantee for the universal bootstrap applicable to a broader range of statistics based on extreme eigenvalues. Extensive simulations validate these findings, highlighting the superior performance of our combined statistic across a range of scenarios.

1.4 Notations and paper organization

Throughout the paper, we reserve boldfaced symbols for vectors and matrices. For a complex number $z\in\mathbb{C}$, we write $\Re z$ and $\Im z$ for its real and imaginary parts, respectively. For a vector $\bm{u}\in\mathbb{R}^{p}$, we use $\|\bm{u}\|=\sqrt{\sum_{i=1}^{p}u_{i}^{2}}$ for its Euclidean norm. For a matrix $\bm{A}=(a_{ij})_{M\times N}$, we use $\|\bm{A}\|_{\text{op}}$, $\|\bm{A}\|_{\text{F}}:=\sqrt{\sum_{i=1}^{M}\sum_{j=1}^{N}a_{ij}^{2}}$, and $\|\bm{A}\|_{\sup}:=\sup_{1\leq i\leq M,1\leq j\leq N}|a_{ij}|$ to denote its operator norm, Frobenius norm, and supremum norm, respectively. Denote the singular values of $\bm{A}$ by $\sigma_{1}(\bm{A})\geq\sigma_{2}(\bm{A})\geq\cdots\geq\sigma_{r}(\bm{A})$, where $r=\min\{M,N\}$. When $M=N$, we use $\lambda_{1}(\bm{A}),\lambda_{2}(\bm{A}),\cdots,\lambda_{M}(\bm{A})$ for the eigenvalues of $\bm{A}$, and denote the trace of the square matrix $\bm{A}$ by $\text{tr}(\bm{A})=\sum_{i=1}^{M}a_{ii}$. For two sequences $\{a_{n}\}_{n=1}^{\infty}$ and $\{b_{n}\}_{n=1}^{\infty}$, we write $a_{n}\lesssim b_{n}$ or $a_{n}=O(b_{n})$ if there exists a constant $C$ not depending on $n$ such that $|a_{n}|\leq Cb_{n}$ for all $n\in\mathbb{N}$. We write $a_{n}=o(b_{n})$ if $a_{n}/b_{n}\to 0$, and $a_{n}\asymp b_{n}$ if both $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$.

The paper is organized as follows. Section 2 provides an overview of the proposed method and presents our main results on the consistency of the proposed universal bootstrap procedure. Section 3 describes the key tools from random matrix theory and applies the universality theorem to covariance testing problems. Section 4 presents the power analysis of the universal bootstrap procedure. Section 5 provides simulation results and a real data example, demonstrating the numerical performance of our operator norm-based statistics. Detailed technical proofs are provided in the supplementary material. The data and code are publicly available in a GitHub repository (https://github.com/zhang-guoyu/universal_bootstrap).

2 Proposed method and theoretical outlines

In this section, we introduce the covariance testing problem and present an informal summary of our main universality results. The formal results are provided in Section 3.3. We also discuss statistical applications of our proposed universal bootstrap method, which utilizes the concept of universality. Consider $n$ independent random vectors $\bm{X}_{1},\cdots,\bm{X}_{n}\in\mathbb{R}^{p}$ with zero mean and covariance matrix $\bm{\Sigma}=(\sigma_{ij})_{p\times p}$, which are not necessarily identically distributed. For a given $p\times p$ non-negative definite matrix $\bm{\Sigma}_{0}$, we aim to test the hypothesis

\displaystyle H_{0}:\bm{\Sigma}=\bm{\Sigma}_{0}\quad\text{vs.}\quad H_{1}:\bm{\Sigma}\neq\bm{\Sigma}_{0}. (5)

As we focus on testing the covariance structure, we allow the third and fourth moments of $\bm{X}_{i}$ to differ across $i=1,\cdots,n$.

To simplify notation, we arrange the data into an $n\times p$ matrix $\bm{X}=(\bm{X}_{1},\cdots,\bm{X}_{n})^{T}\in\mathbb{R}^{n\times p}$, where $\bm{A}^{T}$ denotes the transpose of $\bm{A}$, and the sample covariance matrix is expressed as $\bm{\hat{\Sigma}}=\bm{X}^{T}\bm{X}/n$. Recalling the definition of the statistic $T$ in (1), we aim to control the size of our procedure by characterizing the asymptotic distribution of $T$ under the null hypothesis $H_{0}$. Consider the Gaussian data matrix $\bm{Y}=(\bm{Y}_{1},\cdots,\bm{Y}_{n})^{T}\in\mathbb{R}^{n\times p}$, with $\bm{Y}_{1},\cdots,\bm{Y}_{n}\sim\mathcal{N}(\bm{0},\bm{\Sigma})$. Its sample covariance matrix is defined accordingly as $\bm{\hat{\Sigma}}^{\text{ub}}=\bm{Y}^{T}\bm{Y}/n$. Throughout this article, we work under both the proportionally growing high-dimensional regime and the ultra-high-dimensional regime (2). Under these regimes, the key quantity of interest to bound is

\displaystyle\rho_{n}(\bm{\Sigma}_{0})=\sup_{t\geq 0}\bigg|\mathbb{P}\bigg(\bm{\hat{\Sigma}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg)-\mathbb{P}\bigg(\bm{\hat{\Sigma}}^{\text{ub}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg)\bigg|, (6)

where $\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)=\big\{\bm{N}\ :\ \|\bm{N}-\bm{\Sigma}_{0}\|_{\text{op}}\leq t\big\}$ denotes the operator norm ball with center $\bm{\Sigma}_{0}$ and radius $t$.

A simplified form is given to illustrate our main results.

Result 1 (Informal).

We have

\displaystyle\rho_{n}(\bm{\Sigma})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (7)

for some constants $C$ and $\delta>0$, where $\bm{\Sigma}$ is the covariance matrix of $\bm{X}$.

This result enables us to approximate the asymptotic distribution of $T$ by that of $T^{\text{ub}}$, as defined in (4). Under $H_{0}$, $\bm{\Sigma}=\bm{\Sigma}_{0}$, and the distribution of $T^{\text{ub}}$ contains no unknown quantities. This inspires the universal bootstrap procedure. Given the analytical complexity of the distribution of $T^{\text{ub}}$, we generate $B$ independent samples, $\bm{Y}^{1},\dots,\bm{Y}^{B}$, from the same distribution as $\bm{Y}$. We define $\bm{\hat{\Sigma}}^{\text{ub},b}=(\bm{Y}^{b})^{T}(\bm{Y}^{b})/n$ for each $b=1,\dots,B$, allowing us to compute

T^{\text{ub},b}=\left\|\bm{\hat{\Sigma}}^{\text{ub},b}-\bm{\Sigma}_{0}\right\|_{\text{op}},\quad b=1,\cdots,B.

The empirical distribution of $T^{\text{ub},b}$, $b=1,\dots,B$, serves as an approximation for the distribution of $T$. In particular, we use the empirical upper-$\alpha$ quantile, $\hat{q}^{\text{ub},B}_{\bm{\Sigma},\bm{\Sigma}_{0}}(\alpha)$, of $T^{\text{ub},b}$, $b=1,\dots,B$, as the threshold for the test (5) based on $T$. Defining $\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)=\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0},\bm{\Sigma}_{0}}(\alpha)$, we reject $H_{0}$ if $T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)$. Results such as (7) ensure the convergence of $\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)$ to the upper-$\alpha$ quantile $q_{\bm{\Sigma}_{0}}(\alpha)$ of $T$ under $H_{0}$ as $n\to\infty$ and $B\to\infty$, validating the universal bootstrap procedure.
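The full procedure is easy to implement. The following is our own minimal sketch (not the paper's released code), assuming a positive-definite $\bm{\Sigma}_{0}$ so that Gaussian replicates can be generated from a Cholesky factor:

```python
import numpy as np

def universal_bootstrap_threshold(Sigma0, n, alpha=0.05, B=500, rng=None):
    """Empirical upper-alpha quantile of T^{ub,b}, b = 1..B, under H0.

    Each replicate draws Y^b with i.i.d. N(0, Sigma0) rows and computes
    T^{ub,b} = ||(Y^b)^T Y^b / n - Sigma0||_op; the observed data are
    never resampled, which is the point of the universal bootstrap.
    """
    rng = np.random.default_rng(rng)
    p = Sigma0.shape[0]
    L = np.linalg.cholesky(Sigma0)          # Sigma0 assumed positive-definite
    stats = np.empty(B)
    for b in range(B):
        Y = rng.standard_normal((n, p)) @ L.T   # rows ~ N(0, Sigma0)
        D = Y.T @ Y / n - Sigma0
        stats[b] = np.max(np.abs(np.linalg.eigvalsh(D)))
    return np.quantile(stats, 1 - alpha)

def universal_bootstrap_test(X, Sigma0, alpha=0.05, B=500, rng=None):
    """Reject H0: Sigma = Sigma0 iff T >= the universal bootstrap threshold."""
    n = X.shape[0]
    T = np.max(np.abs(np.linalg.eigvalsh(X.T @ X / n - Sigma0)))
    return T >= universal_bootstrap_threshold(Sigma0, n, alpha, B, rng)
```

In practice $B$ on the order of several hundred suffices, and the $B$ replicates are embarrassingly parallel.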

To contextualize our findings, we briefly compare (7) with existing high-dimensional bootstrap results. Generally, these results are presented as

\rho(\mathcal{A})=\sup_{A\in\mathcal{A}}\big|\mathbb{P}\big(T\in A\big)-\mathbb{P}\big(T^{*}\in A\mid\bm{X}\big)\big|\to 0,

where $T$ is a statistic, $T^{*}$ is its Gaussian counterpart, and $\mathcal{A}$ represents a specified family of sets. For example, Chernozhukov, Chetverikov and Kato (2013, 2017); Chernozhukov, Chetverikov and Koike (2023) considered $T$ as a mean estimator and $\mathcal{A}$ as the family of all rectangles in $\mathbb{R}^{p}$. A more closely related choice of $\mathcal{A}$ appears in Zhai (2018); Xu, Zhang and Wu (2019); Fang and Koike (2024), who also considered mean estimators but took $\mathcal{A}$ to be the families of Euclidean balls and convex sets in $\mathbb{R}^{p}$. Their results demonstrated that under mild conditions and for the family of Euclidean balls $\mathcal{A}$, $\rho(\mathcal{A})$ converges to $0$ if and only if $p/n\to 0$, meaning the Gaussian approximation holds only when $p=o(n)$. For comparison, we observe that the operator norm ball $\mathbf{B}_{\text{op}}(\bm{v},t)$ for a vector $\bm{v}\in\mathbb{R}^{p}$ coincides with a Euclidean ball, and our results show that the universality approximation holds when $p/n$ converges to a nonzero constant or even diverges to infinity. For the covariance test, Han, Xu and Zhou (2018) took $T$ as the sample covariance matrix and $\mathcal{A}$ as all $s$-sparse operator norm balls (defined in their work). In particular, with $\mathcal{A}$ as all operator norm balls, i.e., $s=p$, they required $p=o(n^{1/9})$, limiting $p$ to be considerably smaller than $n$. Similarly, Lopes (2022a); Lopes, Erichson and Mahoney (2023) considered $T$ as the sample covariance matrix with $\mathcal{A}$ as operator norm balls, but imposed a decay rate $i^{-\beta}$ with $\beta>1$ on the $i$-th largest eigenvalue $\lambda_{i}(\bm{\Sigma})$ of $\bm{\Sigma}$, implying a low intrinsic test dimension. Likewise, Giessing (2023) required the effective rank $r(\bm{\Sigma})=\text{tr}(\bm{\Sigma})/\|\bm{\Sigma}\|_{\text{op}}$ to satisfy $r(\bm{\Sigma})=o(n^{1/6})$. In contrast, we impose no such assumptions, allowing every eigenvalue of $\bm{\Sigma}$ to be of comparable scale. To summarize, previous work has typically assumed either $p\ll n$ or fast-decaying eigenvalues, yielding consistent estimates of $\bm{\Sigma}$. In our setting, however, no consistent estimator of $\bm{\Sigma}$ exists (see Cai and Ma (2013)). These comparisons underscore the advantages of our proposal, even in regimes lacking consistency.

Given these improvements on existing results, we establish a universal result that extends beyond (7).

Result 2 (Informal).

For $\bm{\Sigma}_{0}$ that commutes with $\bm{\Sigma}$, we have

\displaystyle\rho_{n}(\bm{\Sigma}_{0})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (8)

for some constants $C$ and $\delta>0$. See Theorem 3.5 for a formal description.

This result generalizes the universal bootstrap consistency of $T$ under $H_{0}$ to alternative covariances $\bm{\Sigma}_{0}$ distinct from $\bm{\Sigma}$. The commutativity requirement between $\bm{\Sigma}_{0}$ and $\bm{\Sigma}$ means the two matrices share eigenvectors. Similar assumptions appear in Zhou, Bai and Hu (2023); Zhou et al. (2024). The result in (8) further guarantees universal bootstrap consistency for statistics beyond $T$. For instance, we can estimate $\bm{\Sigma}$ with shrinkage estimators such as $a\hat{\bm{\Sigma}}+(1-a)\bm{I}_{p}$ for $a\in(0,1)$, as proposed by Schäfer and Strimmer (2005), or $a\hat{\bm{\Sigma}}$ for some $a>0$, as in Tsukuma (2016). Using these estimators, the covariance can be tested with the statistics $T^{\text{shr}}_{1}=\|a\hat{\bm{\Sigma}}+(1-a)\bm{I}_{p}-\bm{\Sigma}_{0}\|_{\text{op}}=a\|\hat{\bm{\Sigma}}-(\bm{\Sigma}_{0}-(1-a)\bm{I}_{p})/a\|_{\text{op}}$ or $T^{\text{shr}}_{2}=\|a\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\|_{\text{op}}=a\|\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}/a\|_{\text{op}}$. Under $H_{0}$, universal bootstrap consistency for $T^{\text{shr}}_{1}$ and $T^{\text{shr}}_{2}$ follows from $\rho_{n}\left((\bm{\Sigma}_{0}-(1-a)\bm{I}_{p})/a\right)\to 0$ and $\rho_{n}\left(\bm{\Sigma}_{0}/a\right)\to 0$, as $(\bm{\Sigma}_{0}-(1-a)\bm{I}_{p})/a$ and $\bm{\Sigma}_{0}/a$ commute with $\bm{\Sigma}$.

Despite the broad applicability of (8), certain statistics based on extreme eigenvalues remain outside its scope. For example, as outlined in Section 1, we define $T^{\text{Com}}$ to combine $T$ and $T^{\text{Roy}}$ for enhanced statistical power, where $T^{\text{Com}}$ is given by

\displaystyle T^{\text{Com}}=\frac{T^{2}}{\text{tr}(\bm{\Sigma}_{0})}+(T^{\text{Roy}})^{2}. (9)
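Given the two statistics already defined, $T^{\text{Com}}$ costs only two eigen-decompositions per sample. A sketch of the computation (our own illustration; critical values would come from applying the same universal bootstrap, i.e., recomputing this quantity on Gaussian replicates):

```python
import numpy as np

def combined_statistic(X, Sigma0):
    """T^{Com} = T^2 / tr(Sigma0) + (T^{Roy})^2 as in (9),
    for positive-definite Sigma0."""
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    # T: operator norm of Sigma_hat - Sigma0.
    T = np.max(np.abs(np.linalg.eigvalsh(Sigma_hat - Sigma0)))
    # T^Roy: operator norm of the whitened deviation from the identity.
    vals, vecs = np.linalg.eigh(Sigma0)
    inv_sqrt = (vecs * vals ** -0.5) @ vecs.T
    T_roy = np.max(np.abs(np.linalg.eigvalsh(inv_sqrt @ Sigma_hat @ inv_sqrt - np.eye(p))))
    return T ** 2 / np.trace(Sigma0) + T_roy ** 2
```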

While (8) demonstrates universal bootstrap consistency for $T$ and $T^{\text{Roy}}$ separately, it does not apply to $T^{\text{Com}}$. This limitation arises from the dependence of $T^{\text{Com}}$ on the joint law of $\sigma_{1}\left(\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}\right)$ and $\sigma_{1}\left(\bm{\Sigma}_{0}^{-1/2}\bm{\hat{\Sigma}}\bm{\Sigma}_{0}^{-1/2}-\bm{I}_{p}\right)$, which introduces a complex dependence structure. To address this limitation, we develop a generalized universality theorem that accounts for dependencies among various extreme eigenvalues. We begin by introducing the generalized operator norm ball $\mathbf{B}_{k,\text{op}}(\bm{\Sigma}_{1},\bm{\Sigma}_{2},\bm{t}_{k})$,

\displaystyle\mathbf{B}_{k,\text{op}}(\bm{\Sigma}_{1},\bm{\Sigma}_{2},\bm{t}_{k})=\bigg\{\bm{N}\ :\ \sigma_{i}\left(\bm{\Sigma}_{2}^{-\frac{1}{2}}(\bm{N}-\bm{\Sigma}_{1})\bm{\Sigma}_{2}^{-\frac{1}{2}}\right)\leq t_{i},\ i=1,\cdots,k\bigg\}, (10)

where $\bm{\Sigma}_{2}$ is positive-definite and $\bm{t}_{k}=(t_{1},\cdots,t_{k})^{T}$ with $t_{i}\geq 0$ for each $i=1,\cdots,k$. Since the operator norm $\|\bm{N}-\bm{\Sigma}_{1}\|_{\text{op}}$ equals the largest singular value $\sigma_{1}(\bm{N}-\bm{\Sigma}_{1})$, we obtain $\mathbf{B}_{\text{op}}(\bm{\Sigma},t)=\mathbf{B}_{1,\text{op}}(\bm{\Sigma},\bm{I}_{p},t)$. Next, we define the following quantity for symmetric matrices $\bm{\Sigma}_{1,m}$ and positive-definite matrices $\bm{\Sigma}_{2,m}$, $m=1,\cdots,M$:

\displaystyle\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})=\sup_{\bm{t}_{k_{1}},\cdots,\bm{t}_{k_{M}}\geq\bm{0}}\bigg|\mathbb{P}\bigg(\bm{\hat{\Sigma}}\in\bigcap_{m=1}^{M}\mathbf{B}_{k_{m},\text{op}}(\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m},\bm{t}_{k_{m}})\bigg)-\mathbb{P}\bigg(\bm{\hat{\Sigma}}^{\text{ub}}\in\bigcap_{m=1}^{M}\mathbf{B}_{k_{m},\text{op}}(\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m},\bm{t}_{k_{m}})\bigg)\bigg|, (11)

where $\bm{t}_{k}\geq\bm{0}$ means $t_{i}\geq 0$ for each $i=1,\cdots,k$.

Result 3 (Informal).

For $\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}_{1,m}\bm{\Sigma}_{2,m}^{-1/2}$ that commutes with $\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}\bm{\Sigma}_{2,m}^{-1/2}$ for $m=1,\cdots,M$, we have

\displaystyle\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty. (12)

Taking $M=1$, $k_{1}=1$, $\bm{\Sigma}_{1,1}=\bm{\Sigma}_{0}$, and $\bm{\Sigma}_{2,1}=\bm{I}_{p}$, (12) recovers (8). But (12) provides a more general result, showing that the joint law of

\displaystyle\left\{\left(\sigma_{k}\left(\bm{\Sigma}_{2,1}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,1})\bm{\Sigma}_{2,1}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{1}},\cdots,\left(\sigma_{k}\left(\bm{\Sigma}_{2,M}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,M})\bm{\Sigma}_{2,M}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{M}}\right\}, (13)

can be approximated by the universal bootstrap. This result allows for constructing statistics by combining extreme eigenvalues in (13) in various ways, with (12) confirming universal bootstrap consistency for these statistics.

3 Universal bootstrap

3.1 Preliminaries

Our results rely on universality from random matrix theory. To proceed, we first introduce some preliminary results relevant to our analysis. Write $\bm{X}_{i}=\bm{\Sigma}^{1/2}\bm{Z}_{i}$, where $\bm{Z}_{i}$ has zero mean and identity covariance matrix for $i=1,\cdots,n$. We accordingly define $\bm{Z}=(\bm{Z}_{1},\cdots,\bm{Z}_{n})^{T}\in\mathbb{R}^{n\times p}$. The primary matrix of interest is $\bm{\hat{\Sigma}}-\bm{\Sigma}_{0}$. This inspires us to consider matrices of the general form

\displaystyle\bm{M}_{n}=\bm{\hat{\Sigma}}+\bm{R}=\phi^{\frac{1}{2}}\bm{M}^{\prime}_{n}, (14)

where $\bm{M}^{\prime}_{n}=(np)^{-1/2}\,\bm{\Sigma}^{1/2}\bm{Z}^{T}\bm{Z}\bm{\Sigma}^{1/2}+\bm{R}^{\prime}$ and $\bm{R}^{\prime}=\phi^{-1/2}\,\bm{R}$. Here, $\bm{R}$ is a symmetric matrix whose eigenvalues may be positive, negative, or zero, and $\bm{M}^{\prime}_{n}$ is normalized to address cases where $\alpha>1$, as in Ding and Wang (2023). We impose some assumptions to determine the limit of the Stieltjes transform $m_{\bm{M}^{\prime}_{n}}(z)=\text{tr}(\bm{M}_{n}^{\prime}-z\bm{I}_{p})^{-1}/p$ of $\bm{M}^{\prime}_{n}$.

Assumption 1.

Suppose that 𝒁=(Zij)n×p\bm{Z}=(Z_{ij})_{n\times p} and {Zij,i=1,,n;j=1,,p}\left\{Z_{ij},i=1,\cdots,n;j=1,\cdots,p\right\} are independent with 𝔼[Zij]=0\mathbb{E}[Z_{ij}]=0, 𝔼[Zij2]=1\mathbb{E}[Z_{ij}^{2}]=1. There exists a positive sequence CkC_{k} such that 𝔼[|Zij|k]Ck\mathbb{E}[|Z_{ij}|^{k}]\leq C_{k} for i=1,,n;j=1,,pi=1,\cdots,n;j=1,\cdots,p and kk\in\mathbb{N}.

Assumption 2.

The matrices 𝚺\bm{\Sigma} and 𝑹\bm{R} are bounded in spectral norm, i.e., there exists some positive CC such that 𝚺op,𝑹opC\|\bm{\Sigma}\|_{\text{op}},\ \|\bm{R}\|_{\text{op}}\leq C. Furthermore, there are constants c1,c2>0c_{1},c_{2}>0 such that the empirical spectral distribution F𝚺F^{\bm{\Sigma}} of 𝚺\bm{\Sigma} satisfies F𝚺(c1)c2F^{\bm{\Sigma}}(c_{1})\leq c_{2}.

Assumption 3.

The matrices 𝚺\bm{\Sigma} and 𝑹\bm{R} commute, i.e., 𝚺𝑹=𝑹𝚺\bm{\Sigma}\bm{R}=\bm{R}\bm{\Sigma}.

Assumptions 1 and 2 are standard and often appear in the random matrix literature. Notably, Assumption 1 relies only on independence, without assuming identical distributions as in Qiu, Li and Yao (2023). While all moments exist under Assumption 1, this condition could be relaxed as discussed in Ding and Yang (2018); we do not pursue this here. We also permit 𝚺\bm{\Sigma} to be singular, a less restrictive condition than the invertibility assumed in Ding and Wang (2023); Ding, Hu and Wang (2024). This generalization enables testing singular 𝚺0\bm{\Sigma}_{0}, where TRoyT^{\text{Roy}} is invalid but TT remains applicable. Lastly, Assumption 3, imposed for technical reasons, necessitates that 𝚺\bm{\Sigma} and 𝑹\bm{R} share the same eigenvectors. The same assumption is required for the signal-plus-noise model in Zhou, Bai and Hu (2023); Zhou et al. (2024).

We introduce the following deterministic equivalence matrix

𝑸¯p(z)=(𝑹+1ϕ12(1+ϕ12e(z))𝚺z𝑰p)1,\displaystyle\bar{\bm{Q}}_{p}(z)=\bigg{(}\bm{R}^{\prime}+\frac{1}{\phi^{\frac{1}{2}}(1+\phi^{\frac{1}{2}}e(z))}\bm{\Sigma}-z\bm{I}_{p}\bigg{)}^{-1}, (15)

where ϕ=p/n\phi=p/n and e(z)e(z) is the fixed point of the equation e(z)=tr(𝑹+ϕ1/2(1+ϕ1/2e(z))1𝚺z𝑰p)1𝚺/pe(z)=\text{tr}\big{(}\bm{R}^{\prime}+\phi^{-1/2}(1+\phi^{1/2}e(z))^{-1}\bm{\Sigma}-z\bm{I}_{p}\big{)}^{-1}\bm{\Sigma}/p. Using 𝑸¯p(z)\bar{\bm{Q}}_{p}(z), we define the associated Stieltjes transform m¯p(z)\bar{m}_{p}(z) as m¯p(z)=tr(𝑸¯p(z))/p\bar{m}_{p}(z)=\text{tr}(\bar{\bm{Q}}_{p}(z))/p. According to Couillet, Debbah and Silverstein (2011), e(z)e(z) and 𝑸¯p(z)\bar{\bm{Q}}_{p}(z) are well-defined for every z+z\in\mathbb{C}^{+}, and m¯p(z)\bar{m}_{p}(z) serves as the Stieltjes transform of a measure ρ\rho on \mathbb{R}. We further denote E+E_{+} and EE_{-} as the endpoints of ρ\rho. Formally, if we define supp(ρ)={x:ρ([xϵ,x+ϵ])>0,ϵ>0}\text{supp}(\rho)=\left\{x\in\mathbb{R}\ :\ \rho([x-\epsilon,x+\epsilon])>0,\ \forall\epsilon>0\right\}, we have E+=supsupp(ρ)E_{+}=\sup\text{supp}(\rho) and E=infsupp(ρ)E_{-}=\inf\text{supp}(\rho). Combining results of Knowles and Yin (2017) for the α=1\alpha=1 case and Ding and Wang (2023) for the α>1\alpha>1 case, it follows that E+ϕ1/2E_{+}\asymp\phi^{1/2} and |E|1|E_{-}|\lesssim 1. Intuitively, the largest eigenvalue λ1(𝑴n)\lambda_{1}(\bm{M}^{\prime}_{n}) approaches E+E_{+}, while λp(𝑴n)\lambda_{p}(\bm{M}^{\prime}_{n}) approaches EE_{-}. The next subsection characterizes the fluctuations of λ1(𝑴n)E+\lambda_{1}(\bm{M}^{\prime}_{n})-E_{+} and λp(𝑴n)E\lambda_{p}(\bm{M}^{\prime}_{n})-E_{-}.
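For intuition, e(z) and the deterministic equivalent m̄_p(z) are easy to evaluate numerically. The sketch below is our own illustration, not code from the paper: it solves the fixed-point equation by plain iteration for a diagonal Σ with R′ = 0, and compares m̄_p(z) with the empirical Stieltjes transform of M′_n for Gaussian data with Σ = I_p and φ = 1, in line with the average local law stated later in (29).

```python
import numpy as np

def e_fixed_point(z, sigma, phi, tol=1e-12, max_iter=100_000):
    """Iterate e <- tr[(Sigma / (phi^{1/2}(1 + phi^{1/2} e)) - z I_p)^{-1} Sigma] / p
    for a diagonal Sigma (vector of eigenvalues `sigma`) and R' = 0."""
    e = 1.0j  # start inside the upper half-plane C^+
    for _ in range(max_iter):
        denom = sigma / (np.sqrt(phi) * (1.0 + np.sqrt(phi) * e)) - z
        e_new = np.mean(sigma / denom)
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e_new

def m_bar(z, sigma, phi):
    """Deterministic equivalent m_bar_p(z) = tr(Q_bar_p(z)) / p, as in (15)."""
    e = e_fixed_point(z, sigma, phi)
    return np.mean(1.0 / (sigma / (np.sqrt(phi) * (1.0 + np.sqrt(phi) * e)) - z))

# Empirical Stieltjes transform of M'_n = (np)^{-1/2} Z^T Z  (Sigma = I_p, R = 0)
rng = np.random.default_rng(0)
n = p = 400
Z = rng.standard_normal((n, p))
lam = np.linalg.eigvalsh(Z.T @ Z / np.sqrt(n * p))
z = 2.0 + 0.5j
m_emp = np.mean(1.0 / (lam - z))
print(abs(m_emp - m_bar(z, np.ones(p), p / n)))  # small discrepancy
```

For Σ = I_p and φ = 1 the fixed point reduces to the quadratic ze² + ze + 1 = 0, the familiar Marčenko–Pastur self-consistency equation, which the iteration recovers.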

3.2 Universality

In this subsection, we establish the anisotropic local law, which forms the foundation for our universality results. Specifically, we aim to describe the fluctuations of λ1(𝑴n)E+\lambda_{1}(\bm{M}^{\prime}_{n})-E_{+} and λp(𝑴n)E\lambda_{p}(\bm{M}^{\prime}_{n})-E_{-}. This requires us to analyze the convergence of the Stieltjes transform near the endpoints E+E_{+} and EE_{-}. We define the local domains 𝒟+\mathcal{D}_{+} and 𝒟\mathcal{D}_{-} as

𝒟±=𝒟±(τ)={z=E+iη+:|z|τ,|EE±|τ1,n1+τητ1}\displaystyle\mathcal{D}_{\pm}=\mathcal{D}_{\pm}(\tau)=\left\{z=E+\text{i}\eta\in\mathbb{C}^{+}\ :\ |z|\geq\tau,|E-E_{\pm}|\leq\tau^{-1},n^{-1+\tau}\leq\eta\leq\tau^{-1}\right\} (16)

for a fixed parameter 0<τ<10<\tau<1. For α=1\alpha=1, let 𝒟=𝒟+𝒟\mathcal{D}=\mathcal{D}_{+}\cup\mathcal{D}_{-}, and for α>1\alpha>1, define 𝒟=𝒟+\mathcal{D}=\mathcal{D}_{+}. We will focus on the behavior of the Green function on 𝒟\mathcal{D}. This definition ensures that for α=1\alpha=1, both the largest and smallest eigenvalues of 𝑴n\bm{M}_{n}^{\prime} are controlled, so that σ1(𝑴n)=max{|λ1(𝑴n)|,|λp(𝑴n)|}\sigma_{1}(\bm{M}_{n}^{\prime})=\max\{|\lambda_{1}(\bm{M}_{n}^{\prime})|,|\lambda_{p}(\bm{M}_{n}^{\prime})|\} can also be controlled. For the case α>1\alpha>1, by contrast, we always have |E+||E||E_{+}|\gg|E_{-}|, so only the largest eigenvalue requires attention.

To facilitate the presentation of the anisotropic local law, we assume in this subsection that

𝚺 is invertible and 𝑹 is positive-definite”.\displaystyle\text{"}\bm{\Sigma}\text{ is invertible and }\bm{R}\text{ is positive-definite"}. (17)

Under (17), we can rewrite 𝑴n\bm{M}^{\prime}_{n} as 𝑴n=𝚺1/2((np)1/2𝒁T𝒁+𝑹′′)𝚺1/2\bm{M}^{\prime}_{n}=\bm{\Sigma}^{1/2}\left((np)^{-1/2}\ \bm{Z}^{T}\bm{Z}+\bm{R}^{{}^{\prime\prime}}\right)\bm{\Sigma}^{1/2} where 𝑹′′=𝚺1/2𝑹𝚺1/2\bm{R}^{{}^{\prime\prime}}=\bm{\Sigma}^{-1/2}\bm{R}^{{}^{\prime}}\bm{\Sigma}^{-1/2}. Since 𝑴n\bm{M}^{\prime}_{n} is quadratic in 𝑿\bm{X}, we follow Knowles and Yin (2017) to define the linearization matrix 𝑯(z)\bm{H}(z) and the corresponding linearized Green function 𝑮(z)\bm{G}(z)

𝑯(z)=[𝚺1(𝑹′′)121(np)14𝒁T(𝑹′′)12z𝑰p𝟎1(np)14𝒁𝟎z𝑰n],𝑮(z)=𝑯1(z).\displaystyle\bm{H}(z)=\left[\begin{array}[]{ccc}-\bm{\Sigma}^{-1}&(\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}&\frac{1}{(np)^{\frac{1}{4}}}\bm{Z}^{T}\\ (\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}&-z\bm{I}_{p}&\bm{0}\\ \frac{1}{(np)^{\frac{1}{4}}}\bm{Z}&\bm{0}&-z\bm{I}_{n}\end{array}\right],\quad\bm{G}(z)=\bm{H}^{-1}(z). (21)
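A direct way to see what the linearization encodes: eliminating the last two (block-diagonal) block rows of H(z) by a Schur complement shows that the upper-left p×p block of G(z) = H(z)^{-1} equals zΣ^{1/2}(M′_n − zI_p)^{-1}Σ^{1/2}, the random counterpart of the corresponding block of the deterministic equivalent defined below. The following numerical check of this identity is our own illustration with toy matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 4
z = 0.7 + 0.3j

# Toy ingredients satisfying (17): diagonal Sigma > 0 and R'' = 0.3 I_p > 0
sigma = rng.uniform(0.5, 2.0, size=p)
Sigma, sq = np.diag(sigma), np.diag(np.sqrt(sigma))
Rpp = 0.3 * np.eye(p)
S = np.sqrt(0.3) * np.eye(p)                  # (R'')^{1/2}
Z = rng.standard_normal((n, p))
A = Z / (n * p) ** 0.25

# M'_n = Sigma^{1/2} ((np)^{-1/2} Z^T Z + R'') Sigma^{1/2}
Mn = sq @ (Z.T @ Z / np.sqrt(n * p) + Rpp) @ sq

# Linearization H(z) from (21) and its Green function G(z) = H(z)^{-1}
H = np.block([
    [-np.linalg.inv(Sigma) + 0j, S,                A.T],
    [S,                          -z * np.eye(p),   np.zeros((p, n))],
    [A,                          np.zeros((n, p)), -z * np.eye(n)],
])
G = np.linalg.inv(H)

# Schur-complement identity for the upper-left p x p block of G(z)
target = z * sq @ np.linalg.inv(Mn - z * np.eye(p)) @ sq
err = np.abs(G[:p, :p] - target).max()
print(err)  # machine-precision small
```

The point of the detour through the larger (n+2p)×(n+2p) matrix is that H(z) is linear in Z, which makes resolvent expansions tractable.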

Notice that 𝑮(z)\bm{G}(z) is a large (n+2p)×(n+2p)(n+2p)\times(n+2p) matrix. We also define the deterministic equivalent Green function as

𝑮¯(z)=[z𝚺12𝑸¯p(z)𝚺12𝚺12𝑸¯p(z)𝚺12(𝑹′′)12𝟎(𝑹′′)12𝚺12𝑸¯p(z)𝚺121z((𝑹′′)12𝚺12𝑸¯p(z)𝚺12(𝑹′′)12𝑰p)𝟎𝟎𝟎m~(z)𝑰n],\displaystyle\bar{\bm{G}}(z)=\left[\begin{array}[]{ccc}z\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}&\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}(\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}&\bm{0}\\ (\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}&\frac{1}{z}\left((\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}\bm{\Sigma}^{\frac{1}{2}}\bar{\bm{Q}}_{p}(z)\bm{\Sigma}^{\frac{1}{2}}(\bm{R}^{{}^{\prime\prime}})^{\frac{1}{2}}-\bm{I}_{p}\right)&\bm{0}\\ \bm{0}&\bm{0}&\tilde{m}(z)\bm{I}_{n}\end{array}\right], (25)

where m~(z)=z1(1+ϕ1/2e(z))1\tilde{m}(z)=-z^{-1}(1+\phi^{1/2}e(z))^{-1}. The anisotropic local law aims to control the difference 𝑮(z)𝑮¯(z)\bm{G}(z)-\bar{\bm{G}}(z) for z𝒟z\in\mathcal{D}. A crucial observation is that defining the parameter zz-dependent covariance 𝚺(z)=z𝚺(z𝑰p𝑹)1\bm{\Sigma}(z)=z\bm{\Sigma}(z\bm{I}_{p}-\bm{R}^{\prime})^{-1} with the corresponding measure πz=i=1pδλi(𝚺(z))/p\pi^{z}=\sum_{i=1}^{p}\delta_{\lambda_{i}(\bm{\Sigma}(z))}/p, where δz\delta_{z} is the Dirac point measure at zz, we have the following zz-dependent deformed Marčenko-Pastur law,

1m~(z)=z+ϕ12t1+ϕ12m~(z)tdπz(t).\displaystyle\frac{1}{\tilde{m}(z)}=-z+\int\frac{\phi^{\frac{1}{2}}t}{1+\phi^{-\frac{1}{2}}\tilde{m}(z)t}\mathrm{d}\pi^{z}(t). (26)

This result mirrors the form of the deformed Marčenko-Pastur law in Ding and Wang (2023), with 𝚺(z)\bm{\Sigma}(z) as the covariance matrix. This demonstrates that the effect of 𝑹\bm{R} can be represented by turning 𝚺\bm{\Sigma} into the zz-dependent covariance 𝚺(z)\bm{\Sigma}(z). This insight simplifies our proof and presentation. For the denominator in (26), we impose the following technical assumption.

Assumption 4.

When α=1,\alpha=1, we require that there exists τ>0\tau>0 such that

|1+ϕ12m~(E±)λi(𝚺(E±))|τ,i=1,,p.\displaystyle|1+\phi^{-\frac{1}{2}}\tilde{m}(E_{\pm})\lambda_{i}(\bm{\Sigma}(E_{\pm}))|\geq\tau,\quad i=1,\cdots,p. (27)

We provide several remarks regarding Assumption 4. Informally, condition (27) ensures that the extreme eigenvalues of 𝚺\bm{\Sigma} do not spread near the endpoints E±E_{\pm}, thereby preventing spikes outside the support of ρ\rho. Similar assumptions have been made in the literature for the universality of 𝚺^\hat{\bm{\Sigma}}, using the Stieltjes transform in place of m~\tilde{m} and λi(𝚺)\lambda_{i}(\bm{\Sigma}) in place of λi(𝚺(E±))\lambda_{i}(\bm{\Sigma}(E_{\pm})), as in Bao, Pan and Zhou (2015); Knowles and Yin (2017). This aligns with the intuition that the effect of 𝑹\bm{R} can be expressed through the transformation from 𝚺\bm{\Sigma} to 𝚺(z)\bm{\Sigma}(z). Moreover, (27) is automatically satisfied when ϕ=p/n\phi=p/n\to\infty, i.e. α>1\alpha>1. We thus impose Assumption 4 only in the case α=1\alpha=1.

To state our main results, we define the concept of stochastic domination, introduced by Erdős, Knowles and Yau (2013) and widely applied in random matrix theory (Knowles and Yin, 2017). For two families of non-negative random variables A=(A(n)(u):n,uU(n))A=\left(A^{(n)}(u)\ :\ n\in\mathbb{N},u\in U^{(n)}\right) and B=(B(n)(u):n,uU(n))B=\left(B^{(n)}(u)\ :\ n\in\mathbb{N},u\in U^{(n)}\right), where U(n)U^{(n)} is an nn-dependent parameter set, we say AA is stochastically dominated by BB uniformly in uu if for any ϵ>0\epsilon>0 and D>0D>0, we have supuU(n)(A(n)(u)>nϵB(n)(u))nD\sup_{u\in U^{(n)}}\mathbb{P}\left(A^{(n)}(u)>n^{\epsilon}B^{(n)}(u)\right)\leq n^{-D} for large enough nn0(ϵ,D)n\geq n_{0}(\epsilon,D). We use the notation ABA\prec B to represent this relationship. When AA is a family of general (possibly negative or complex) random variables, we also write ABA\prec B or A=O(B)A=O_{\prec}(B) to indicate |A|B|A|\prec B. With these definitions, we state the anisotropic local law as follows.
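As a brief aside before the theorem, here is a concrete instance of this notation (our own illustrative example, not from the paper): for i.i.d. standard Gaussians Z_1, …, Z_n, one has max_{i≤n}|Z_i| ≺ √(log n), and the defining tail bound is already visible in simulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def exceed_prob(n, eps=0.1, reps=500):
    """Monte Carlo estimate of P(max_{i<=n} |Z_i| > n^eps * sqrt(log n))."""
    m = np.abs(rng.standard_normal((reps, n))).max(axis=1)
    return float(np.mean(m > n**eps * np.sqrt(np.log(n))))

# The exceedance probability collapses as n grows, as stochastic domination requires
for n in (100, 1000, 10_000):
    print(n, exceed_prob(n))
```

The n^ε slack is what makes ≺ convenient: it absorbs logarithmic factors that would otherwise clutter every bound.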

Theorem 3.1 (Anisotropic local law).

Define the control parameter as Ψ(z)=m~(z)nη+1nη\Psi(z)=\sqrt{\frac{\Im\tilde{m}(z)}{n\eta}}+\frac{1}{n\eta}. Suppose that Assumptions 1, 2, 3, 4 and (17) hold.

(i) We have

𝒖T𝚽𝚺¯1(𝑮(z)𝑮¯(z))𝚺¯1𝚽𝒗=O(𝒖𝒗Ψ(z)),\displaystyle\bm{u}^{T}\bm{\Phi}\bar{\bm{\Sigma}}^{-1}(\bm{G}(z)-\bar{\bm{G}}(z))\bar{\bm{\Sigma}}^{-1}\bm{\Phi}\bm{v}=O_{\prec}(\|\bm{u}\|\|\bm{v}\|\Psi(z)), (28)

uniformly in vectors 𝐮,𝐯n+2p\bm{u},\bm{v}\in\mathbb{R}^{n+2p} and z𝒟z\in\mathcal{D}, where the weight matrix 𝚽\bm{\Phi} is defined as 𝚽=diag(ϕ1/4𝐈p,ϕ3/4𝐈p,𝐈n)\bm{\Phi}=\text{diag}(\phi^{-1/4}\bm{I}_{p},\phi^{-3/4}\bm{I}_{p},\bm{I}_{n}) and the augmented covariance matrix is 𝚺¯=diag(𝚺,𝐈n+p)\bar{\bm{\Sigma}}=\text{diag}(\bm{\Sigma},\bm{I}_{n+p}).

(ii) Moreover, we have the average local law

mp(z)m¯p(z)=O((nη)1),\displaystyle m_{p}(z)-\bar{m}_{p}(z)=O_{\prec}((n\eta)^{-1}), (29)

uniformly in z𝒟z\in\mathcal{D}.

The anisotropic local law provides a delicate characterization of the discrepancy between the random Green function and its deterministic counterpart, yielding a precise bound to analyze the behavior of the extreme eigenvalues of 𝑴n\bm{M}_{n}^{\prime}. When 𝑹=𝟎\bm{R}=\bm{0}, similar results have been established for α=1\alpha=1 in Knowles and Yin (2017) and for α1\alpha\geq 1 in Ding and Wang (2023); Ding, Hu and Wang (2024). As demonstrated in Ding and Wang (2023), the results for general α1\alpha\geq 1 hold without requiring the dimension pp and sample size nn to grow proportionally, resulting in different convergence rates for the blocks of 𝑮(z)\bm{G}(z). Ding and Wang (2023) separately provides convergence rates for each block and offers a coarse bound for the anisotropic local law

Giμ(z)=O(ϕ14Ψ(z)),i=1,,p,μ=p+1,,2p+n.\displaystyle G_{i\mu}(z)=O_{\prec}(\phi^{-\frac{1}{4}}\Psi(z)),\ i=1,\cdots,p,\mu=p+1,\cdots,2p+n.

In this study, we enhance these findings by introducing the weight matrix 𝚽\bm{\Phi}, allowing us to express the anisotropic local law (28) in a more compact form. This reformulation refines the convergence rate and simplifies our proof.

Building on the anisotropic local law theorem, we proceed to establish our universality result. Our first step involves deriving key implications from Theorem 3.1. Recall that ρ\rho is the measure on \mathbb{R} whose Stieltjes transform is the deterministic equivalence m¯p(z)\bar{m}_{p}(z). We define the quantile sequence of ρ\rho as w1wpw_{1}\geq\cdots\geq w_{p} such that wi+ρ(x)dx=(i1/2)/p\int_{w_{i}}^{+\infty}\rho(x)\mathrm{d}x=(i-1/2)/p for i=1,,pi=1,\cdots,p. We aim to demonstrate that the eigenvalues λi(𝑴n)\lambda_{i}(\bm{M}_{n}^{\prime}) of 𝑴n\bm{M}_{n}^{\prime} remain close to wiw_{i} for i=1,,pi=1,\cdots,p.

Theorem 3.2 (Rigidity of eigenvalues).

Suppose that Assumptions 1, 2, 3, 4 hold.

(i) When α=1\alpha=1, we have for any fixed integer k1k\geq 1,

|λi(𝑴n)wi|,|λpi(𝑴n)wpi|(min{i,p+1i})13n23,\displaystyle|\lambda_{i}(\bm{M}_{n}^{\prime})-w_{i}|,\ |\lambda_{p-i}(\bm{M}_{n}^{\prime})-w_{p-i}|\prec(\min\{i,p+1-i\})^{-\frac{1}{3}}n^{-\frac{2}{3}}, (30)

uniformly for 1ik1\leq i\leq k.

(ii) When α>1\alpha>1, we have

|λi(𝑴n)wi|(min{i,n+1i})13n23,\displaystyle|\lambda_{i}(\bm{M}_{n}^{\prime})-w_{i}|\prec(\min\{i,n+1-i\})^{-\frac{1}{3}}n^{-\frac{2}{3}}, (31)

uniformly for 1in1\leq i\leq n.

Theorem 3.2 establishes the rigidity of the eigenvalues of 𝑴n\bm{M}_{n}^{\prime}, with two main implications. First, when α=1\alpha=1, equation (30) shows that the kk largest and kk smallest eigenvalues lie within an n2/3+ϵn^{-2/3+\epsilon}-neighborhood of w1w_{1} and wpw_{p}, respectively, for any ϵ>0\epsilon>0. Furthermore, we have |w1E+|,|wpE|n2/3|w_{1}-E_{+}|,|w_{p}-E_{-}|\prec n^{-2/3}, leading to |λi(𝑴n)E+|,|λpi(𝑴n)E|n2/3|\lambda_{i}(\bm{M}_{n}^{\prime})-E_{+}|,|\lambda_{p-i}(\bm{M}_{n}^{\prime})-E_{-}|\prec n^{-2/3} uniformly for 1ik1\leq i\leq k. Second, with Theorem 3.2, we demonstrate that wiϕ1/2+w_{i}\asymp\phi^{1/2}\to+\infty for 1in1\leq i\leq n, revealing a fast n2/3+ϵn^{-2/3+\epsilon} approximation rate for the diverging quantities λi(𝑴n)\lambda_{i}(\bm{M}_{n}^{\prime}) and wiw_{i}.
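The quantiles and the rigidity phenomenon are easy to visualize numerically. In the simplest case Σ = I_p, R = 0 and n = p (so φ = 1), the measure ρ is the Marčenko–Pastur law on [0, 4], and every eigenvalue of M′_n sticks to its quantile. The following sketch is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = p = 500  # alpha = 1, Sigma = I_p, R = 0: rho is the Marcenko-Pastur law on [0, 4]

# Quantiles w_1 >= ... >= w_p of rho: integral from w_i to infinity of rho = (i - 1/2)/p
x = np.linspace(1e-6, 4.0, 200_001)
rho = np.sqrt(np.clip(x * (4.0 - x), 0.0, None)) / (2.0 * np.pi * x)
cdf = np.cumsum(rho) * (x[1] - x[0])
cdf /= cdf[-1]                                   # absorb discretization error
w = np.interp(1.0 - (np.arange(1, p + 1) - 0.5) / p, cdf, x)

# Eigenvalues of M'_n = (np)^{-1/2} Z^T Z, sorted decreasingly
Z = rng.standard_normal((n, p))
lam = np.linalg.eigvalsh(Z.T @ Z / np.sqrt(n * p))[::-1]

print(np.abs(lam - w).max())  # every eigenvalue sticks to its quantile
```

The maximal deviation is dominated by the edges, where fluctuations of order n^{-2/3} are expected; in the bulk the deviations are an order of magnitude smaller.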

To establish the universality result, we provide bounds for the discrepancy between the distribution of λ1(𝑴n)\lambda_{1}(\bm{M}_{n}^{\prime}) and its Gaussian counterpart. We denote by Gau\mathbb{P}^{\text{Gau}} and 𝔼Gau\mathbb{E}^{\text{Gau}} the probability and expectation under the additional assumption that {Zij}1in,1jp\{Z_{ij}\}_{1\leq i\leq n,1\leq j\leq p} are independent standard Gaussian variables. With this notation, we present the following result.

Theorem 3.3 (Universality of the largest eigenvalue).

Under Assumptions 1, 2, 3, 4, there exists a constant CC such that for large enough nn, any tt\in\mathbb{R}, and any small ϵ>0\epsilon>0,

(n23(\displaystyle\mathbb{P}(n^{\frac{2}{3}}( λ1(𝑴n)E+)tnϵ)n16+Cϵ\displaystyle\lambda_{1}(\bm{M}_{n}^{\prime})-E_{+})\leq t-n^{-\epsilon})-n^{-\frac{1}{6}+C\epsilon}
\displaystyle\leq Gau(n23(λ1(𝑴n)E+)t)\displaystyle\mathbb{P}^{\text{Gau}}(n^{\frac{2}{3}}(\lambda_{1}(\bm{M}_{n}^{\prime})-E_{+})\leq t) (32)
(n23(λ1(𝑴n)E+)t+nϵ)+n16+Cϵ.\displaystyle\leq\mathbb{P}(n^{\frac{2}{3}}(\lambda_{1}(\bm{M}_{n}^{\prime})-E_{+})\leq t+n^{-\epsilon})+n^{-\frac{1}{6}+C\epsilon}.

When α=1\alpha=1, similar results also hold for λp(𝐌n)\lambda_{p}(\bm{M}_{n}^{\prime}).

The universality Theorem 3.3 shows that the asymptotic distributions of λ1(𝑴n)\lambda_{1}(\bm{M}_{n}), λp(𝑴n)\lambda_{p}(\bm{M}_{n}), and consequently σ1(𝑴n)=𝑴nop\sigma_{1}(\bm{M}_{n})=\|\bm{M}_{n}\|_{\text{op}}, rely solely on the first two moments of 𝑿\bm{X}. In contrast, as discussed in Section 4.3, other widely-used norms of 𝑴n\bm{M}_{n}, such as 𝑴nF\|\bm{M}_{n}\|_{\text{F}} and 𝑴nsup\|\bm{M}_{n}\|_{\text{sup}}, are influenced by the first four moments of 𝑿\bm{X}. This universality characteristic of the operator norm offers a straightforward yet effective framework for constructing rejection regions in hypothesis testing. A detailed exploration of this application is presented in Section 3.3. Before applying universality to covariance matrix testing, we outline several corollaries of Theorem 3.3, which are of notable independent interest.

Corollary 1.

Consider the case α>1\alpha>1, 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p}, 𝑹=𝟎\bm{R}=\bm{0}. Define μ=(n+p)2\mu=(\sqrt{n}+\sqrt{p})^{2}, σ=p1/2n1/6\sigma=p^{1/2}n^{-1/6}. Under Assumption 1, we have as nn\to\infty,

λ1(𝒁T𝒁)μσTW1,\displaystyle\frac{\lambda_{1}(\bm{Z}^{T}\bm{Z})-\mu}{\sigma}\Rightarrow\text{TW}_{1}, (33)

where “\Rightarrow” represents convergence in law, and TW1\text{TW}_{1} is the Tracy-Widom distribution of type 11.

One of the central problems in random matrix theory is establishing the asymptotic distribution of the largest eigenvalue of a sample covariance matrix. In this context, (33) was derived for Gaussian 𝒁\bm{Z} in the setting where p/np/n\to\infty (Karoui, 2003). However, for general distributions of 𝒁\bm{Z} where α>1\alpha>1, no further results have been established. Our universality Theorem 3.3 addresses this significant gap in random matrix theory.
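Corollary 1 can be probed by simulation. The sketch below is our own Monte Carlo, not code from the paper; it uses non-Gaussian Rademacher entries together with the classical centering (√n+√p)² and scaling p^{1/2}n^{−1/6} from the Gaussian result of El Karoui, and exploits the identity λ1(Z^TZ) = λ1(ZZ^T) so that only an n×n eigenproblem is solved.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 100, 2000, 200  # p / n = 20, mimicking the alpha > 1 regime

def tw_stat():
    """Normalized largest eigenvalue for non-Gaussian (Rademacher) entries,
    using lambda_1(Z^T Z) = lambda_1(Z Z^T) to keep the eigenproblem n x n."""
    Z = rng.choice([-1.0, 1.0], size=(n, p))
    lam1 = np.linalg.eigvalsh(Z @ Z.T)[-1]
    mu = (np.sqrt(n) + np.sqrt(p)) ** 2          # classical Gaussian centering
    sigma = np.sqrt(p) * n ** (-1.0 / 6.0)
    return (lam1 - mu) / sigma

stats = np.array([tw_stat() for _ in range(reps)])
print(stats.mean(), stats.std())  # TW_1 has mean about -1.21 and s.d. about 1.27
```

At these moderate sizes the match is approximate, as finite-sample corrections to the centering are not negligible; the non-Gaussian entries are the point of the exercise.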

Corollary 2.

Consider the case α=1\alpha=1, 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p}, and a general symmetric matrix 𝑹\bm{R}. Under Assumptions 1, 2, 4, we have as nn\to\infty,

n23(λ1(𝚺^+𝑹)E+)σTW1.\displaystyle\frac{n^{\frac{2}{3}}(\lambda_{1}(\hat{\bm{\Sigma}}+\bm{R})-E_{+})}{\sigma}\Rightarrow\text{TW}_{1}. (34)

Here E+E_{+} and σ\sigma are two constants depending on 𝑹\bm{R}.

As demonstrated in Section 5 in the Supplement, when 𝒁\bm{Z} follows a Gaussian distribution, (34) can be derived using the findings from Ji and Park (2021). For non-Gaussian distributions of 𝒁\bm{Z}, our universality Theorem 3.3 extends this result to the same form. These results expand the traditional analysis of extreme eigenvalues in the sample covariance matrix 𝚺^\bm{\hat{\Sigma}} to a more general setting 𝚺^𝚺0\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}, where 𝑹=𝚺0\bm{R}=-\bm{\Sigma}_{0}, with broader applications in covariance testing within high-dimensional statistics.

3.3 Universal bootstrap for covariance test

Theorem 3.3 is commonly termed universality in the random matrix literature concerning extreme eigenvalues; see Bao, Pan and Zhou (2015); Knowles and Yin (2017). However, this universality does not ensure the validity of our universal bootstrap procedure, which depends on bounds like (7) and (8). Specifically, (32) provides bounds on (n2/3(Tϕ1/2E)t±nϵ)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t\pm n^{-\epsilon}) and (n2/3(Tubϕ1/2E)t)\mathbb{P}(n^{2/3}(T^{\text{ub}}-\phi^{1/2}E)\leq t) with E=max{E+,E}E=\max\{E_{+},-E_{-}\}, while we aim to bound (n2/3(Tϕ1/2E)t)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t) and (n2/3(Tubϕ1/2E)t)\mathbb{P}(n^{2/3}(T^{\text{ub}}-\phi^{1/2}E)\leq t). Establishing this bound requires demonstrating the closeness between (n2/3(Tϕ1/2E)t±nϵ)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t\pm n^{-\epsilon}) and (n2/3(Tϕ1/2E)t)\mathbb{P}(n^{2/3}(T-\phi^{1/2}E)\leq t). This step relies on the anti-concentration inequality (Chernozhukov, Chetverikov and Kato, 2013) that will be established below.

We first define the Lévy anti-concentration function for random variable TT as

a(T,δ)=supt(tTt+δ),\displaystyle a(T,\delta)=\sup_{t\in\mathbb{R}}\mathbb{P}(t\leq T\leq t+\delta), (35)

for δ>0\delta>0. We shall provide bounds for a(T,δ)a(T,\delta). We note that the theorems derived thus far apply specifically to the extreme eigenvalues λ1(𝑴n)\lambda_{1}(\bm{M}_{n}^{\prime}) and λp(𝑴n)\lambda_{p}(\bm{M}_{n}^{\prime}). As we will show, extending these results to extreme singular values σ1(𝑴n)\sigma_{1}(\bm{M}_{n}^{\prime}) requires only weaker assumptions.
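A crude plug-in estimate conveys what a(T, δ) measures. The sketch below is our own illustration with Σ = Σ0 = I_p: over windows of width δ = n^{−2/3}, the statistic T places only a bounded fraction of its mass, consistent with the anti-concentration bound stated in Theorem 3.4 below.

```python
import numpy as np

rng = np.random.default_rng(8)
n = p = 100
reps, delta = 500, n ** (-2.0 / 3.0)

# T = || Sigma_hat - I_p ||_op under the null Sigma = I_p, with Sigma_hat = Z^T Z / n
Ts = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((n, p))
    Ts[r] = np.abs(np.linalg.eigvalsh(Z.T @ Z / n - np.eye(p))).max()

# Plug-in estimate of the Levy function a(T, delta) = sup_t P(t <= T <= t + delta)
grid = np.linspace(Ts.min(), Ts.max(), 200)
a_hat = max(np.mean((Ts >= t) & (Ts <= t + delta)) for t in grid)
print(a_hat)  # strictly between 0 and 1: T spreads over a window wider than n^{-2/3}
```

Because T itself fluctuates on the Tracy–Widom scale n^{−2/3}, no interval of that length can capture all of its mass, which is exactly what the bound a(T, δ) ≺ n^{2/3}δ quantifies.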

Assumption 4′.

When α=1,\alpha=1, we require that there exists τ>0\tau>0 such that

(i) if |E+|>|E||E_{+}|>|E_{-}|, we assume |1+ϕ1/2m~(E+)λi(𝚺(E+))|τ|1+\phi^{-1/2}\tilde{m}(E_{+})\lambda_{i}(\bm{\Sigma}(E_{+}))|\geq\tau for i=1,,pi=1,\cdots,p.

(ii) if |E+|<|E||E_{+}|<|E_{-}|, we assume |1+ϕ1/2m~(E)λi(𝚺(E))|τ|1+\phi^{-1/2}\tilde{m}(E_{-})\lambda_{i}(\bm{\Sigma}(E_{-}))|\geq\tau for i=1,,pi=1,\cdots,p.

(iii) if |E+|=|E||E_{+}|=|E_{-}|, we assume |1+ϕ1/2m~(E±)λi(𝚺(E±))|τ|1+\phi^{-1/2}\tilde{m}(E_{\pm})\lambda_{i}(\bm{\Sigma}(E_{\pm}))|\geq\tau for i=1,,pi=1,\cdots,p.

Assumption 4′ is evidently weaker than Assumption 4. The reasoning behind Assumption 4′ is straightforward: given that the singular value σ1(𝑴n)=max{|λ1(𝑴n)|,|λp(𝑴n)|}\sigma_{1}(\bm{M}_{n}^{\prime})=\max\{|\lambda_{1}(\bm{M}_{n}^{\prime})|,|\lambda_{p}(\bm{M}_{n}^{\prime})|\}, if |E+|>|E||E_{+}|>|E_{-}|, then with probability 11, σ1(𝑴n)=|λ1(𝑴n)|\sigma_{1}(\bm{M}_{n}^{\prime})=|\lambda_{1}(\bm{M}_{n}^{\prime})| for sufficiently large nn, meaning only the spectral behavior near the right edge E+E_{+} is necessary to consider. A similar argument applies when |E+|<|E||E_{+}|<|E_{-}|. While this relaxation may appear minor, the following example demonstrates its merit in extending the applicability of our theorem, even in the simplest case.

Example 1.

Consider the case α=1\alpha=1 and 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p}, 𝑹=𝑰p\bm{R}=-\bm{I}_{p}. We can show that 𝑴n=ϕ1/2(𝚺^𝑰p)\bm{M}_{n}^{\prime}=\phi^{-1/2}(\hat{\bm{\Sigma}}-\bm{I}_{p}) satisfies Assumption 4′ for all nn and pp, but fails to satisfy Assumption 4 if p>np>n.

This demonstrates the relevance of introducing Assumption 4′ when focusing on σ1(𝑴n)\sigma_{1}(\bm{M}_{n}^{\prime}). Specifically, in the case where p>np>n, the smallest eigenvalue, λp(𝑴n)\lambda_{p}(\bm{M}_{n}^{\prime}), becomes isolated at the left edge E=ϕ1/2E_{-}=-\phi^{-1/2} and does not satisfy Assumption 4. However, we can show that when p>np>n, |E+|>|E||E_{+}|>|E_{-}|, allowing Assumption 4′ to hold consistently. Accordingly, we adopt Assumption 4′ in our analysis of extreme singular values. Recalling that TT is the largest singular value of 𝚺^𝚺0\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}, we are led to define the concept of an admissible pair.

Definition 1 (Admissible pair).

We call two pp by pp non-negative matrices (𝚺,𝚺0)(\bm{\Sigma},\bm{\Sigma}_{0}) an admissible pair, if (𝚺,𝑹)=(𝚺,𝚺0)(\bm{\Sigma},\bm{R})=(\bm{\Sigma},-\bm{\Sigma}_{0}) satisfy Assumptions 2, 3, 4′.

Theorem 3.4 (Anti-concentration inequality).

Under Assumption 1, for any admissible pair (𝚺,𝚺0)(\bm{\Sigma},\bm{\Sigma}_{0}) and δn2/3\delta\leq n^{-2/3}, we have

a(T,δ)n23δ.\displaystyle a(T,\delta)\prec n^{\frac{2}{3}}\delta. (36)

Equipped with these results, we present the universal consistency theorem for TT. Recall the test in (5), where we reject H0H_{0} if Tq^𝚺0ub,B(α)T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha). Here q^𝚺0ub,B(α)=q^𝚺0,𝚺0ub,B(α)\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)=\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0},\bm{\Sigma}_{0}}(\alpha) and q^𝚺,𝚺0ub,B(α)\hat{q}^{\text{ub},B}_{\bm{\Sigma},\bm{\Sigma}_{0}}(\alpha) denotes the empirical upper α\alpha-th quantile of Tub,bT^{\text{ub},b}, b=1,,Bb=1,\cdots,B. To establish the theoretical validity of this universal bootstrap procedure, we provide a bound on the uniform Gaussian approximation error

ρn(𝚺0)=supt0|(𝚺^𝐁op(𝚺0,t))(𝚺^ub𝐁op(𝚺0,t))|,\displaystyle\rho_{n}(\bm{\Sigma}_{0})=\sup_{t\geq 0}\bigg{|}\mathbb{P}\bigg{(}\bm{\hat{\Sigma}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg{)}-\mathbb{P}\bigg{(}\bm{\hat{\Sigma}}^{\text{ub}}\in\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t)\bigg{)}\bigg{|}, (37)

where 𝐁op(𝚺0,t)\mathbf{B}_{\text{op}}(\bm{\Sigma}_{0},t) is the operator norm ball centered at 𝚺0\bm{\Sigma}_{0} with radius tt. The probability in (37) is defined with respect to 𝑿\bm{X} and 𝒀\bm{Y}, whose rows have covariance matrix 𝚺\bm{\Sigma}, which may differ from 𝚺0\bm{\Sigma}_{0}. We summarize the results for universal bootstrap consistency as follows.

Theorem 3.5 (Universal bootstrap consistency).

Under Assumption 1, for any admissible pair (𝚺,𝚺0)(\bm{\Sigma},\bm{\Sigma}_{0}), we have the uniform Gaussian approximation bound

ρn(𝚺0)Cnδ0asn,\displaystyle\rho_{n}(\bm{\Sigma}_{0})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (38)

for some constants CC, δ>0\delta>0.

Moreover, we have with probability approaching 11,

sup0α1|(Tq^𝚺,𝚺0ub,B(α)|𝒀1,,𝒀B)α|ρn(𝚺0)+B12.\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma},\bm{\Sigma}_{0}}(\alpha)\bigg{|}\bm{Y}^{1},\cdots,\bm{Y}^{B}\bigg{)}-\alpha\bigg{|}\lesssim\rho_{n}(\bm{\Sigma}_{0})+B^{-\frac{1}{2}}. (39)

Theorem 3.5 establishes the uniform Gaussian approximation bound and Type I error bound of the universal bootstrap. Expression (39) ensures that we can uniformly control the test size with an error of at most nδ+B1/2n^{-\delta}+B^{-1/2}, which vanishes as nn\to\infty and BB\to\infty. This result, therefore, guarantees the uniform consistency of the universal bootstrap procedure. To appreciate these results, we compare our universal bootstrap with the widely used multiplier bootstrap that approximates the distribution of TT using the distribution of Tmb=1ni=1nϵi(𝑿i𝑿iT𝚺^)opT^{\text{mb}}=\left\|\frac{1}{n}\sum_{i=1}^{n}\epsilon_{i}(\bm{X}_{i}\bm{X}_{i}^{T}-\hat{\bm{\Sigma}})\right\|_{\text{op}} conditioned on 𝑿\bm{X}, where ϵ1,,ϵn\epsilon_{1},\cdots,\epsilon_{n}\in\mathbb{R} are random variables with zero mean and unit variance. To ensure consistency of the multiplier bootstrap, Han, Xu and Zhou (2018) required a low dimension condition, while Lopes (2022a); Lopes, Erichson and Mahoney (2023); Giessing (2023) imposed a fast eigen-decay condition, as discussed in Section 2. These methods rely on the consistency of the sample covariance, i.e., 𝚺^𝚺op0\|\bm{\hat{\Sigma}}-\bm{\Sigma}\|_{\text{op}}\to 0, which does not hold even when 𝚺=𝑰p\bm{\Sigma}=\bm{I}_{p} and α1\alpha\geq 1. This highlights the key advantage of our proposal for covariance testing in a high-dimensional setting where estimation consistency is unattainable, yet test consistency is assured.
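Operationally, the universal bootstrap is short: draw Gaussian samples with covariance Σ0, recompute the operator-norm statistic, and take the empirical upper-α quantile. The following minimal sketch is our own illustration of this size control; it assumes mean-zero rows so that Σ̂ = X^T X/n, and a diagonal Σ0 so that its square root is trivial.

```python
import numpy as np

rng = np.random.default_rng(5)

def op_stat(X, Sigma0):
    """T = || X^T X / n - Sigma0 ||_op, assuming mean-zero rows."""
    return np.linalg.norm(X.T @ X / X.shape[0] - Sigma0, ord=2)

def universal_bootstrap_quantile(Sigma0, n, alpha=0.05, B=200):
    """Empirical upper-alpha quantile of T^{ub,b}, b = 1..B, computed from
    Gaussian resamples with covariance Sigma0 (diagonal here)."""
    p = Sigma0.shape[0]
    root = np.sqrt(np.diag(Sigma0))
    Tub = [op_stat(rng.standard_normal((n, p)) * root, Sigma0) for _ in range(B)]
    return np.quantile(Tub, 1.0 - alpha)

n, p = 100, 150                                   # p > n: no consistent estimation
Sigma0 = np.eye(p)
q = universal_bootstrap_quantile(Sigma0, n)

# Size check: non-Gaussian (Rademacher) data generated under H0
rejections = np.mean([op_stat(rng.choice([-1.0, 1.0], size=(n, p)), Sigma0) >= q
                      for _ in range(200)])
print(q, rejections)  # rejection rate close to alpha = 0.05
```

Note that no resampling of the observed data occurs: the Gaussian draws play the role of a universal reference distribution, which is exactly what Theorem 3.3 licenses.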

3.4 Sharp uniform simultaneous confidence intervals

The approximation of the distribution of the largest singular value has numerous applications, including determining the number of spikes, as discussed in Ding and Yang (2022). Next, we present an additional important application of Theorem 3.5. Define the inner product of two matrices 𝑨\bm{A} and 𝑩\bm{B} as 𝑨,𝑩=tr(𝑨T𝑩)\langle\bm{A},\bm{B}\rangle=\text{tr}(\bm{A}^{T}\bm{B}), and the Schatten 11-norm of a matrix 𝑨\bm{A} as 𝑨S1=i=1pσi(𝑨)\|\bm{A}\|_{S_{1}}=\sum_{i=1}^{p}\sigma_{i}(\bm{A}). We shall construct sharp uniform simultaneous confidence intervals for 𝑨,𝚺\langle\bm{A},\bm{\Sigma}\rangle and 𝒄1T𝚺𝒄2\bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2} for all 𝑨p×p\bm{A}\in\mathbb{R}^{p\times p} and 𝒄1,𝒄2p\bm{c}_{1},\bm{c}_{2}\in\mathbb{R}^{p}. To achieve this, we note that q^𝚺ub,B(α)=q^Sp(𝚺)ub,B(α)\hat{q}^{\text{ub},B}_{\bm{\Sigma}}(\alpha)=\hat{q}^{\text{ub},B}_{\text{Sp}(\bm{\Sigma})}(\alpha) where the spectral matrix is defined as Sp(𝚺)=diag(λ1(𝚺),,λp(𝚺))\text{Sp}(\bm{\Sigma})=\text{diag}(\lambda_{1}(\bm{\Sigma}),\cdots,\lambda_{p}(\bm{\Sigma})). Although the complete matrix 𝚺\bm{\Sigma} cannot be consistently estimated in our setting, certain estimators Sp^(𝚺)\widehat{\text{Sp}}(\bm{\Sigma}) of the spectral matrix Sp(𝚺)\text{Sp}(\bm{\Sigma}) have been shown to be consistent under a suitable normalized distance. For instance, given the distance dSp2(Sp^(𝚺),Sp(𝚺))=tr(Sp^(𝚺)Sp(𝚺))2/pd^{2}_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))=\text{tr}(\widehat{\text{Sp}}(\bm{\Sigma})-\text{Sp}(\bm{\Sigma}))^{2}/p, Ledoit and Wolf (2015); Kong and Valiant (2017) proposed estimators Sp^(𝚺)\widehat{\text{Sp}}(\bm{\Sigma}) satisfying dSp(Sp^(𝚺),Sp(𝚺))0d_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))\to 0 as nn\to\infty when α=1\alpha=1. Therefore, the estimated threshold q^Sp^(𝚺)ub,B(α)\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha) can replace the unknown q^Sp(𝚺)ub,B(α)\hat{q}^{\text{ub},B}_{\text{Sp}(\bm{\Sigma})}(\alpha).

Theorem 3.6.

Under Assumption 1, for any admissible pair (𝚺,𝚺)(\bm{\Sigma},\bm{\Sigma}) and estimators Sp^(𝚺)\widehat{\text{Sp}}(\bm{\Sigma}) of Sp(𝚺)\text{Sp}(\bm{\Sigma}), we have the following sharp uniform simultaneous confidence intervals for 𝐀,𝚺\langle\bm{A},\bm{\Sigma}\rangle

sup0α1|(𝑨p×p,𝑨,𝚺[𝑨,𝚺^q^Sp^(𝚺)ub,B(α)\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}\forall\bm{A}\in\mathbb{R}^{p\times p},\ \langle\bm{A},\bm{\Sigma}\rangle\in\bigg{[}\langle\bm{A},\bm{\hat{\Sigma}}\rangle-\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ 𝑨S1,\displaystyle\|\bm{A}\|_{S_{1}}, (40)
𝑨,𝚺^+q^Sp^(𝚺)ub,B(α)𝑨S1]|𝒀1,,𝒀B)(1α)|ρn(𝚺0)+\displaystyle\langle\bm{A},\bm{\hat{\Sigma}}\rangle+\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ \|\bm{A}\|_{S_{1}}\bigg{]}\bigg{|}\bm{Y}^{1},\cdots,\bm{Y}^{B}\bigg{)}-(1-\alpha)\bigg{|}\lesssim\rho_{n}(\bm{\Sigma}_{0})+ dSp(Sp^(𝚺),Sp(𝚺))+B12.\displaystyle d_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))+B^{-\frac{1}{2}}.

As a special case, we can also construct sharp uniform simultaneous confidence intervals for 𝐜1T𝚺𝐜2\bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2}

sup0α1|(𝒄1,𝒄2p,𝒄1T𝚺𝒄2[𝒄1T𝚺^𝒄2q^Sp^(𝚺)ub,B(α)\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}\forall\bm{c}_{1},\bm{c}_{2}\in\mathbb{R}^{p},\ \bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2}\in\bigg{[}\bm{c}_{1}^{T}\bm{\hat{\Sigma}}\bm{c}_{2}-\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ 𝒄1𝒄2,\displaystyle\|\bm{c}_{1}\|\cdot\|\bm{c}_{2}\|, (41)
𝒄1T𝚺^𝒄2+q^Sp^(𝚺)ub,B(α)𝒄1𝒄2]|𝒀1,,𝒀B)(1α)|\displaystyle\bm{c}_{1}^{T}\bm{\hat{\Sigma}}\bm{c}_{2}+\hat{q}^{\text{ub},B}_{\widehat{\text{Sp}}(\bm{\Sigma})}(\alpha)\ \|\bm{c}_{1}\|\cdot\|\bm{c}_{2}\|\bigg{]}\bigg{|}\bm{Y}^{1},\cdots,\bm{Y}^{B}\bigg{)}-(1-\alpha)\bigg{|} ρn(𝚺0)+dSp(Sp^(𝚺),Sp(𝚺))+B12.\displaystyle\lesssim\rho_{n}(\bm{\Sigma}_{0})+d_{\text{Sp}}(\widehat{\text{Sp}}(\bm{\Sigma}),\text{Sp}(\bm{\Sigma}))+B^{-\frac{1}{2}}.

Theorem 3.6 establishes sharp uniform simultaneous confidence intervals for both 𝑨,𝚺\langle\bm{A},\bm{\Sigma}\rangle and 𝒄1T𝚺𝒄2\bm{c}_{1}^{T}\bm{\Sigma}\bm{c}_{2}. Specifically, intervals (40) and (41) are sharp, as their probability converges uniformly to 1α1-\alpha, rather than merely being no less than 1α1-\alpha. When 𝒄1\bm{c}_{1} and 𝒄2\bm{c}_{2} are chosen as the canonical basis vectors 𝒆1,,𝒆p\bm{e}_{1},\cdots,\bm{e}_{p} in p\mathbb{R}^{p}, we obtain a uniform entry-wise confidence interval for all σij\sigma_{ij}. Notably, Theorem 3.6 extends this result by providing confidence intervals for all linear combinations of entries of 𝚺\bm{\Sigma}. Constructing confidence intervals for all such combinations is considerably more challenging than for individual entries. In the simpler setting of a mean vector, achieving confidence intervals for each entry requires only lnp=o(na)\ln p=o(n^{a}) for some a>0a>0; see Chernozhukov, Chetverikov and Kato (2013, 2017); Chernozhukov, Chetverikov and Koike (2023). In contrast, confidence intervals for all linear combinations of the mean vector require p=o(n)p=o(n); see Zhai (2018); Xu, Zhang and Wu (2019); Fang and Koike (2024). Remarkably, we are able to construct sharp simultaneous confidence intervals for all 𝑨,𝚺\langle\bm{A},\bm{\Sigma}\rangle even as p/np/n converges to a constant or diverges to infinity.
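For entries, the intervals with canonical vectors reduce to σ̂_ij ± q̂, and simultaneous coverage follows from |e_i^T(Σ̂ − Σ)e_j| ≤ ‖Σ̂ − Σ‖_op. The toy sketch below is our own illustration; it plugs in the true spectrum where the theorem would use an estimator of Sp(Σ).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, B, alpha = 150, 100, 200, 0.1
Sigma = np.diag(np.linspace(0.5, 2.0, p))       # toy spectrum (our choice)
root = np.sqrt(np.diag(Sigma))

# Universal-bootstrap quantile of T = || Sigma_hat - Sigma ||_op at the true spectrum
Tub = np.empty(B)
for b in range(B):
    Y = rng.standard_normal((n, p)) * root
    Tub[b] = np.linalg.norm(Y.T @ Y / n - Sigma, ord=2)
q = np.quantile(Tub, 1.0 - alpha)

# One data set; simultaneous intervals sigma_hat_ij +/- q for every entry
X = rng.standard_normal((n, p)) * root
Shat = X.T @ X / n
covered_all = bool(np.all(np.abs(Shat - Sigma) <= q))
print(covered_all)  # holds whenever ||Shat - Sigma||_op <= q
```

The same single quantile q simultaneously calibrates every linear functional ⟨A, Σ⟩, with the Schatten 1-norm of A setting the interval width.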

4 Power analysis and the generalized universal bootstrap approximation

4.1 Power analysis for operator norm-based statistics

As noted in Section 1, several popular statistics utilize the operator norm in covariance testing, such as Roy’s largest root statistic TRoyT^{\text{Roy}} in (3). Our Theorem 3.5 also applies to Roy’s largest root with universal bootstrap, prompting the question of which test to use. In this subsection, we perform a power analysis across a family of statistics, with TT and TRoyT^{\text{Roy}} examined as specific cases.

We conduct a power analysis under the generalized spike model setting, which allows variability in the bulk eigenvalues and does not assume block independence between the spiked and bulk parts, as in Bai and Yao (2012) and further developed by Jiang and Bai (2021). We assume the following generalized spike structure

𝚺=𝑼[𝑫𝟎𝟎𝑽2]𝑼T,𝚺0=𝑼[𝑽1𝟎𝟎𝑽2]𝑼T,𝚺1=𝑼[𝑹1𝟎𝟎𝑹2]𝑼T,\displaystyle\bm{\Sigma}=\bm{U}\left[\begin{array}[]{cc}\bm{D}&\bm{0}\\ \bm{0}&\bm{V}_{2}\end{array}\right]\bm{U}^{T},\ \bm{\Sigma}_{0}=\bm{U}\left[\begin{array}[]{cc}\bm{V}_{1}&\bm{0}\\ \bm{0}&\bm{V}_{2}\end{array}\right]\bm{U}^{T},\ \bm{\Sigma}_{1}=\bm{U}\left[\begin{array}[]{cc}\bm{R}_{1}&\bm{0}\\ \bm{0}&\bm{R}_{2}\end{array}\right]\bm{U}^{T}, (48)

where 𝑫,𝑽1,𝑹1k×k\bm{D},\bm{V}_{1},\bm{R}_{1}\in\mathbb{R}^{k\times k}, 𝑽2,𝑹2(pk)×(pk)\bm{V}_{2},\bm{R}_{2}\in\mathbb{R}^{(p-k)\times(p-k)} are diagonal matrices, kk is a fixed integer, and 𝑼\bm{U} is an orthogonal matrix. Thus

𝚺112𝚺𝚺112=𝑼[𝑫𝟎𝟎𝑽2]𝑼T,𝚺112𝚺0𝚺112=𝑼[𝑽1𝟎𝟎𝑽2]𝑼T,\displaystyle\bm{\Sigma}_{1}^{-\frac{1}{2}}\bm{\Sigma}\bm{\Sigma}_{1}^{-\frac{1}{2}}=\bm{U}\left[\begin{array}[]{cc}\bm{D}^{\prime}&\bm{0}\\ \bm{0}&\bm{V}^{\prime}_{2}\end{array}\right]\bm{U}^{T},\ \bm{\Sigma}_{1}^{-\frac{1}{2}}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-\frac{1}{2}}=\bm{U}\left[\begin{array}[]{cc}\bm{V}^{\prime}_{1}&\bm{0}\\ \bm{0}&\bm{V}^{\prime}_{2}\end{array}\right]\bm{U}^{T}, (53)

where 𝑫=𝑫𝑹11\bm{D}^{\prime}=\bm{D}\bm{R}^{-1}_{1}, 𝑽1=𝑽1𝑹11\bm{V}_{1}^{\prime}=\bm{V}_{1}\bm{R}^{-1}_{1}, 𝑽2=𝑽2𝑹21\bm{V}_{2}^{\prime}=\bm{V}_{2}\bm{R}^{-1}_{2}. We assume that 𝑫op\|\bm{D}^{\prime}\|_{\text{op}}, 𝑽1op\|\bm{V}^{\prime}_{1}\|_{\text{op}}, 𝑽2op\|\bm{V}^{\prime}_{2}\|_{\text{op}} are bounded, without requiring the eigenvalues of 𝑽2\bm{V}_{2} to be the same, unlike the classical spike model. We consider the family of statistics

T(𝚺1)=𝚺112(𝚺^𝚺0)𝚺112op,\displaystyle T(\bm{\Sigma}_{1})=\left\|\bm{\Sigma}_{1}^{-\frac{1}{2}}\left(\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\right)\bm{\Sigma}_{1}^{-\frac{1}{2}}\right\|_{\text{op}}, (54)

and reject H0H_{0} if T(𝚺1)q^𝚺11/2𝚺0𝚺11/2ub,B(α)T(\bm{\Sigma}_{1})\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{1}^{-1/2}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-1/2}}(\alpha). Notably, T=T(𝑰p)T=T(\bm{I}_{p}) and TRoy=T(𝚺0)T^{\text{Roy}}=T(\bm{\Sigma}_{0}) are both included in this family. Define E+(𝚺1)E_{+}(\bm{\Sigma}_{1}), E(𝚺1)E_{-}(\bm{\Sigma}_{1}) and m~(z;𝚺1)\tilde{m}(z;\bm{\Sigma}_{1}) as in Section 3 with (𝚺,𝑹)=(𝑽2,𝑽2)(\bm{\Sigma},\bm{R})=(\bm{V}_{2}^{\prime},-\bm{V}_{2}^{\prime}) in 𝑴n\bm{M}_{n} in (14). Intuitively, E+(𝚺1)E_{+}(\bm{\Sigma}_{1}) and E(𝚺1)E_{-}(\bm{\Sigma}_{1}) are the limiting upper and lower edges of the bulk spectrum of ϕ1/2𝚺11/2(𝚺^𝚺0)𝚺11/2\phi^{-1/2}\bm{\Sigma}_{1}^{-1/2}\left(\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\right)\bm{\Sigma}_{1}^{-1/2}, and m~(z;𝚺1)\tilde{m}(z;\bm{\Sigma}_{1}) is the limiting Stieltjes transform of its bulk eigenvalues. For simplicity, we restrict our analysis to α=1\alpha=1 and |E+(𝚺1)|>|E(𝚺1)||E_{+}(\bm{\Sigma}_{1})|>|E_{-}(\bm{\Sigma}_{1})|.
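The universal bootstrap test for the family (54) is simple to carry out in practice: under the null, the data are replaced by Gaussian draws with the null covariance, and the upper-α quantile of the resulting operator norm statistics serves as the critical value. The sketch below is illustrative only; it assumes zero-mean data with uncentered sample covariance X^T X/n, and all function names are ours, not from the paper's implementation.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def op_stat(Sigma_hat, Sigma0, S1m):
    """T(Sigma_1) = ||Sigma_1^{-1/2} (Sigma_hat - Sigma_0) Sigma_1^{-1/2}||_op."""
    return np.linalg.norm(S1m @ (Sigma_hat - Sigma0) @ S1m, 2)

def universal_bootstrap_test(X, Sigma0, Sigma1, alpha=0.05, B=200, seed=0):
    """Reject H0: Sigma = Sigma0 when T(Sigma_1) exceeds the universal
    bootstrap quantile, obtained by replacing the data with Gaussian
    counterparts sharing the null covariance Sigma0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S1m = inv_sqrt(Sigma1)
    T = op_stat(X.T @ X / n, Sigma0, S1m)      # observed statistic
    L = np.linalg.cholesky(Sigma0)
    boot = np.empty(B)
    for b in range(B):
        G = rng.standard_normal((n, p)) @ L.T  # rows ~ N(0, Sigma0)
        boot[b] = op_stat(G.T @ G / n, Sigma0, S1m)
    q = np.quantile(boot, 1 - alpha)           # empirical upper-alpha quantile
    return T, q, T >= q
```

Choosing `Sigma1 = np.eye(p)` gives the statistic TT, while `Sigma1 = Sigma0` gives Roy's largest root TRoyT^{\text{Roy}}.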

Theorem 4.1.

Under Assumption 1 and the generalized spike structure (48), write 𝐃=diag((di)i=1k)\bm{D}^{\prime}=\text{diag}\left((d_{i}^{\prime})_{i=1}^{k}\right), 𝐕1=diag((vi)i=1k)\bm{V}_{1}^{\prime}=\text{diag}\left((v_{i}^{\prime})_{i=1}^{k}\right), and for each 1ik1\leq i\leq k define the threshold

κi=ϕ12m~(ϕ12E+;𝚺1)ϕ12viE+m~(ϕ12E+;𝚺1).\displaystyle\kappa_{i}=-\frac{\phi^{\frac{1}{2}}}{\tilde{m}(\phi^{-\frac{1}{2}}E_{+};\bm{\Sigma}_{1})}-\frac{\phi^{\frac{1}{2}}v_{i}^{\prime}}{E_{+}\tilde{m}(\phi^{-\frac{1}{2}}E_{+};\bm{\Sigma}_{1})}.

(i) If di>κid_{i}^{\prime}>\kappa_{i} for some 1ik1\leq i\leq k, then

(T(𝚺1)q^𝚺112𝚺0𝚺112ub,B(α))1,asn,B,\displaystyle\mathbb{P}\left(T(\bm{\Sigma}_{1})\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{1}^{-\frac{1}{2}}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-\frac{1}{2}}}(\alpha)\right)\to 1,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the power tends to 11.

(ii) If di<κid_{i}^{\prime}<\kappa_{i} for all 1ik1\leq i\leq k and T(𝚺1)=λ1(𝚺11/2(𝚺^𝚺0)𝚺11/2)T(\bm{\Sigma}_{1})=\lambda_{1}\left(\bm{\Sigma}_{1}^{-1/2}\left(\hat{\bm{\Sigma}}-\bm{\Sigma}_{0}\right)\bm{\Sigma}_{1}^{-1/2}\right), then

(T(𝚺1)q^𝚺11/2𝚺0𝚺11/2ub,B(α))α,asn,B,\displaystyle\mathbb{P}\left(T(\bm{\Sigma}_{1})\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{1}^{-1/2}\bm{\Sigma}_{0}\bm{\Sigma}_{1}^{-1/2}}(\alpha)\right)\to\alpha,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the asymptotic power equals the nominal level α\alpha, so the test is powerless in this setting.

Theorem 4.1 provides the power analysis of the test based on the statistic (54). Specifically, it identifies the phase transition points κi\kappa_{i}, which closely resemble the well-known Baik-Ben Arous-Péché (BBP) phase transition for the largest eigenvalues of the sample covariance matrix, see Baik, Ben Arous and Péché (2005); Baik and Silverstein (2006); Paul (2007). When a spike eigenvalue did_{i}^{\prime} exceeds its threshold κi\kappa_{i}, it separates from the support of the bulk eigenvalues, yielding full asymptotic power. Conversely, if every spike did_{i}^{\prime} falls below its threshold, distinguishing the alternative from the null using T(𝚺1)T(\bm{\Sigma}_{1}) becomes infeasible, as the spike and bulk eigenvalues are indistinguishably close. Note that Theorem 4.1 addresses only spikes in the largest eigenvalues, as analyzing spikes in the smallest eigenvalues is more involved. Since this power analysis serves as a preliminary illustration, we do not extend our investigation in that direction.

With Theorem 4.1, we can now deduce the power of TT and TRoyT^{\text{Roy}} as special cases.

Corollary 3.

Under the same assumptions as in Theorem 4.1, define 𝑫=diag((di)i=1k)\bm{D}=\text{diag}\left((d_{i})_{i=1}^{k}\right), 𝑽1=diag((vi)i=1k)\bm{V}_{1}=\text{diag}\left((v_{i})_{i=1}^{k}\right).

(i) If for some 1ik1\leq i\leq k, we have divi>ϕ1/2m~(ϕ1/2E+)vi(1+ϕ1/2E+m~(ϕ1/2E+))d_{i}-v_{i}>-\frac{\phi^{1/2}}{\tilde{m}(\phi^{-1/2}E_{+})}-v_{i}\left(1+\frac{\phi^{1/2}}{E_{+}\tilde{m}(\phi^{-1/2}E_{+})}\right), then

(Tq^𝚺0ub,B(α))1,asn,B,\displaystyle\mathbb{P}\left(T\geq\hat{q}^{\text{ub},B}_{\bm{\Sigma}_{0}}(\alpha)\right)\to 1,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the power goes to 11. Here m~(z)=m~(z;𝚺)\tilde{m}(z)=\tilde{m}(z;\bm{\Sigma}) satisfies E+m~(E+)<1E_{+}\tilde{m}(E_{+})<-1.

(ii) If for some 1ik1\leq i\leq k, we have divi>ϕ1/2vid_{i}-v_{i}>\phi^{1/2}v_{i}, then

(TRoyq^𝑰pub,B(α))1,asn,B,\displaystyle\mathbb{P}\left(T^{\text{Roy}}\geq\hat{q}^{\text{ub},B}_{\bm{I}_{p}}(\alpha)\right)\to 1,\ \text{as}\ n\to\infty,\ B\to\infty,

i.e. the power goes to 11.

We now compare the phase transition points in (i) and (ii) for TT and TRoyT^{\text{Roy}}, respectively. These points reveal distinct behaviors: since ϕ1/2E+m~(ϕ1/2E+)<1\phi^{-1/2}E_{+}\tilde{m}(\phi^{-1/2}E_{+})<-1, the phase transition point in (i) decreases with viv_{i}, whereas the phase transition point in (ii) increases with viv_{i}. This indicates that when spikes affect directions associated with larger eigenvalues, i.e. for large viv_{i}, the statistic TT achieves full power more readily than the normalized TRoyT^{\text{Roy}} owing to its lower phase transition point. Conversely, the normalized statistic TRoyT^{\text{Roy}} performs better for smaller viv_{i}. Corollary 3 thus quantifies this intuitive observation.

4.2 The generalized universal bootstrap

Given that neither TT nor TRoyT^{\text{Roy}} dominates across all settings, a natural approach is to combine the two statistics to enhance testing power; see TComT^{\text{Com}} in (9) as an example. In Section 5, simulations demonstrate that TComT^{\text{Com}} can outperform both TT and TRoyT^{\text{Roy}} in certain scenarios. However, conducting tests with TComT^{\text{Com}} requires calculating an appropriate threshold. While Theorem 3.5 provides thresholds for TT and TRoyT^{\text{Roy}} individually, it cannot be directly applied to TComT^{\text{Com}} due to the complex dependence between TT and TRoyT^{\text{Roy}}. To address this, we develop a generalized universality theorem that applies to a broad class of statistics involving extreme singular values.

We consider the statistic TExST^{\text{ExS}} as a general function of extreme singular values

TExS=f({(σk(𝚺2,112(𝚺^𝚺1,1)𝚺2,112))k=1k1,,(σk(𝚺2,M12(𝚺^𝚺1,M)𝚺2,M12))k=1kM}),\displaystyle T^{\text{ExS}}=f\bigg{(}\bigg{\{}\left(\sigma_{k}\left(\bm{\Sigma}_{2,1}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,1})\bm{\Sigma}_{2,1}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{1}},\cdots,\left(\sigma_{k}\left(\bm{\Sigma}_{2,M}^{-\frac{1}{2}}(\bm{\hat{\Sigma}}-\bm{\Sigma}_{1,M})\bm{\Sigma}_{2,M}^{-\frac{1}{2}}\right)\right)_{k=1}^{k_{M}}\bigg{\}}\bigg{)}, (55)

where ff is a measurable function, (𝚺1,m,𝚺2,m)(\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}) are matrices, and kmk_{m} are fixed integers for m=1,,Mm=1,\cdots,M. The universal bootstrapped version, TExS,ubT^{\text{ExS,ub}}, is defined by replacing 𝚺^\bm{\hat{\Sigma}} with 𝚺^ub\bm{\hat{\Sigma}}^{\text{ub}} in (55). We then define the empirical universal bootstrap threshold q^𝚺,{𝚺1,m,𝚺2,m}m=1MExS,ub,B(α)\hat{q}^{\text{ExS,ub,B}}_{\bm{\Sigma},\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M}}(\alpha) as the upper α\alpha-th quantile of BB i.i.d. copies TExS,ub,1,,TExS,ub,BT^{\text{ExS,ub,1}},\cdots,T^{\text{ExS,ub,B}} of TExS,ubT^{\text{ExS,ub}}. This setup enables the following generalized universal bootstrap consistency result.
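Preserving the dependence among the combined statistics is exactly what makes the generalized bootstrap work: every normalization is evaluated on the same bootstrapped Gaussian sample covariance. A minimal illustration of this idea (function names and the form of ff are ours, not the paper's code):

```python
import numpy as np

def extreme_svals(Sigma_hat, Sigma1, S2m, k):
    """Leading k singular values of Sigma_2^{-1/2} (Sigma_hat - Sigma_1) Sigma_2^{-1/2}."""
    M = S2m @ (Sigma_hat - Sigma1) @ S2m
    return np.linalg.svd(M, compute_uv=False)[:k]

def combined_quantile(Sigma0, n, f, norms, alpha=0.05, B=200, seed=0):
    """Upper-alpha quantile of T^ExS = f(...) under the universal bootstrap.
    `norms` is a list of triples (Sigma_1m, Sigma_2m^{-1/2}, k_m); crucially,
    every normalization is evaluated on the SAME Gaussian Sigma_hat^ub per
    replicate, so the dependence between the pieces is preserved."""
    rng = np.random.default_rng(seed)
    p = Sigma0.shape[0]
    L = np.linalg.cholesky(Sigma0)
    vals = np.empty(B)
    for b in range(B):
        G = rng.standard_normal((n, p)) @ L.T  # rows ~ N(0, Sigma0)
        S_ub = G.T @ G / n
        vals[b] = f([extreme_svals(S_ub, S1, S2m, k) for S1, S2m, k in norms])
    return np.quantile(vals, 1 - alpha)
```

For instance, `f = lambda parts: max(parts[0][0], parts[1][0])` with the two normalizations of TT and TRoyT^{\text{Roy}} yields a threshold for a combined statistic of maximum type.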

Theorem 4.2 (Generalized universal bootstrap consistency).

Under Assumption 1, given MM admissible pairs (𝚺2,m1/2𝚺𝚺2,m1/2,𝚺2,m1/2𝚺1,m𝚺2,m1/2)(\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}\bm{\Sigma}_{2,m}^{-1/2},\bm{\Sigma}_{2,m}^{-1/2}\bm{\Sigma}_{1,m}\bm{\Sigma}_{2,m}^{-1/2}) and fixed integers kmk_{m} for m=1,,Mm=1,\cdots,M, we have the uniform Gaussian approximation bound

ρn({𝚺1,m,𝚺2,m}m=1M)Cnδ0asn,\displaystyle\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})\leq Cn^{-\delta}\to 0\quad\text{as}\quad n\to\infty, (56)

for some constants CC, δ>0\delta>0, where ρn({𝚺1,m,𝚺2,m}m=1M)\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M}) is defined in (11).

Moreover, we have

sup0α1|(TExSq^𝚺,{𝚺1,m,𝚺2,m}m=1MExS,ub,B(α))α|ρn({𝚺1,m,𝚺2,m}m=1M)+B12.\displaystyle\sup_{0\leq\alpha\leq 1}\bigg{|}\mathbb{P}\bigg{(}T^{\text{ExS}}\geq\hat{q}^{\text{ExS,ub,B}}_{\bm{\Sigma},\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M}}(\alpha)\bigg{)}-\alpha\bigg{|}\lesssim\rho_{n}(\left\{\bm{\Sigma}_{1,m},\bm{\Sigma}_{2,m}\right\}_{m=1}^{M})+B^{-\frac{1}{2}}. (57)

Theorem 4.2 extends Theorem 3.5 by demonstrating that universal bootstrap consistency applies not only to TT and TRoyT^{\text{Roy}}, but also to combinations like TComT^{\text{Com}}, to the sum of the kk largest singular values of 𝚺^\bm{\hat{\Sigma}} proposed by Ke (2016), and to all statistics based on extreme singular values with arbitrary dependencies in the form of (55). This result reinforces our universal bootstrap principle: to approximate the distribution of statistics involving extreme eigenvalues, it suffices to substitute all random variables with their Gaussian counterparts, showcasing the practical advantage of universality.

4.3 Comparison with other norms

In this subsection, we compare statistics based on the operator norm with those based on the two widely used norms: the Frobenius norm and the supremum norm. Define the statistics

TF=𝚺12𝚺^𝚺12𝑰pF,Tsup=𝚺12𝚺^𝚺12𝑰psup.\displaystyle T^{\text{F}}=\left\|\bm{\Sigma}^{-\frac{1}{2}}\bm{\hat{\Sigma}}\bm{\Sigma}^{-\frac{1}{2}}-\bm{I}_{p}\right\|_{\text{\text{F}}},\ T^{\text{sup}}=\left\|\bm{\Sigma}^{-\frac{1}{2}}\bm{\hat{\Sigma}}\bm{\Sigma}^{-\frac{1}{2}}-\bm{I}_{p}\right\|_{\text{\text{sup}}}. (58)
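Both norms in (58) are computed directly from the standardized residual matrix. A small sketch, assuming zero-mean data with uncentered sample covariance and a hypothetical function name:

```python
import numpy as np

def frob_sup_stats(X, Sigma):
    """T^F and T^sup of (58) for zero-mean data X of shape (n, p)."""
    n, p = X.shape
    w, V = np.linalg.eigh(Sigma)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T        # Sigma^{-1/2}
    R = S_inv_half @ (X.T @ X / n) @ S_inv_half - np.eye(p)
    return np.linalg.norm(R, 'fro'), np.abs(R).max()  # Frobenius and sup norms
```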

While the universality Theorem 3.3 holds for operator norm-based statistics TT and TRoyT^{\text{Roy}}, we demonstrate that it does not extend to TFT^{\text{F}} or TsupT^{\text{sup}}. Theorem 3.3 specifies that the asymptotic distribution of TT and TRoyT^{\text{Roy}} depends solely on the first two moments of 𝑿\bm{X}, allowing for the construction of a universal bootstrap procedure. In contrast, the asymptotic distributions of TFT^{\text{F}} and TsupT^{\text{sup}} rely on the first four moments of 𝑿\bm{X}. We present the following modified assumptions to establish the asymptotic distribution of TFT^{\text{F}}.

Assumption 1′.

Suppose that 𝒁={Zij,i=1,,n;j=1,,p}\bm{Z}=\left\{Z_{ij},i=1,\cdots,n;j=1,\cdots,p\right\} are i.i.d. random variables with 𝔼[Zij]=0\mathbb{E}[Z_{ij}]=0, 𝔼[Zij2]=1\mathbb{E}[Z_{ij}^{2}]=1, 𝔼[Zij4]=ν4\mathbb{E}[Z_{ij}^{4}]=\nu_{4}. Assume that 𝔼[|Zij|6+ϵ]<\mathbb{E}[|Z_{ij}|^{6+\epsilon}]<\infty for some ϵ>0\epsilon>0.

Proposition 1.

Under Assumption 1′, if α=1\alpha=1 or α2\alpha\geq 2, we have

np(TF)21p(tr(𝚺^𝚺1))2(ν42)𝒩(0,4).\displaystyle\frac{n}{p}(T^{\text{F}})^{2}-\frac{1}{p}\left(\text{tr}(\bm{\hat{\Sigma}}\bm{\Sigma}^{-1})\right)^{2}-(\nu_{4}-2)\Rightarrow\mathcal{N}(0,4).

To facilitate the Gaussian approximation of TsupT^{\text{sup}}, we introduce the following sub-Gaussian assumption. This assumption is employed here for simplicity and can be extended to a more general, though more tedious, bound if needed.

Assumption 1′′.

Suppose that for i=1,,n;j1,j2=1,,pi=1,\cdots,n;\ j_{1},j_{2}=1,\cdots,p,

𝔼[exp(|Zij1Zij2|/Bn)]2,\displaystyle\mathbb{E}\left[\text{exp}(|Z_{ij_{1}}Z_{ij_{2}}|/B_{n})\right]\leq 2,

for some positive sequence BnB_{n}.

Assumption 2′′.

We have for i=1,,n;j1,j2=1,,pi=1,\cdots,n;\ j_{1},j_{2}=1,\cdots,p,

b121ni=1n𝔼[(Zij1Zij2σj1j2)2],1ni=1n𝔼[(Zij1Zij2σj1j2)4]Bn2b22\displaystyle b_{1}^{2}\leq\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big{[}(Z_{ij_{1}}Z_{ij_{2}}-\sigma_{j_{1}j_{2}})^{2}\big{]},\ \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big{[}(Z_{ij_{1}}Z_{ij_{2}}-\sigma_{j_{1}j_{2}})^{4}\big{]}\leq B_{n}^{2}b_{2}^{2}

for some positive numbers b1b_{1}, b2b_{2} and sequence BnB_{n} in Assumption 1′′.

Proposition 2.

If Assumptions 1′′, 2′′ hold, we have

supt0|(Tsupt)(𝑮supt)|(Bn2ln2(pn)n)14,\displaystyle\sup_{t\geq 0}\bigg{|}\mathbb{P}\bigg{(}T^{\text{sup}}\leq t\bigg{)}-\mathbb{P}\bigg{(}\|\bm{G}\|_{\text{sup}}\leq t\bigg{)}\bigg{|}\lesssim\left(\frac{B_{n}^{2}\text{ln}^{2}(pn)}{n}\right)^{\frac{1}{4}},

where 𝑮p×p\bm{G}\in\mathbb{R}^{p\times p} is a Gaussian random matrix with zero mean and covariance Cov(Gi1j1,Gi2j2)=1nk=1nCov(Zki1Zkj1,Zki2Zkj2)\text{Cov}(G_{i_{1}j_{1}},G_{i_{2}j_{2}})=\frac{1}{n}\sum_{k=1}^{n}\text{Cov}(Z_{ki_{1}}Z_{kj_{1}},Z_{ki_{2}}Z_{kj_{2}}), i1,i2,j1,j2=1,,pi_{1},i_{2},j_{1},j_{2}=1,\cdots,p.

Propositions 1 and 2 follow as direct corollaries of Qiu, Li and Yao (2023) and Chernozhuokov et al. (2022), and we omit the proofs here. These propositions demonstrate that statistics based on the Frobenius and supremum norms lack the second-moment universality enjoyed by the operator norm. Consequently, tests relying on the Frobenius and supremum norms require normalization through fourth-moment estimates of 𝑿\bm{X}, a process more complex than estimating the covariance structure itself. This complexity arises because the Frobenius and supremum norms reduce the covariance matrix to a vector, computing its L2L^{2} and LL^{\infty} norms, respectively. As a result, deriving the asymptotic distribution necessitates the second moment of the sample covariance, equivalent to the fourth moment of 𝑿\bm{X}. The operator norm thus emerges as an easier-to-implement option for covariance testing.

5 Numerical results

5.1 Simulation

In this subsection, we evaluate the empirical size and power of the proposed universal bootstrap for the operator norm-based statistic TT in (1) (Opn) and the combined statistic TComT^{\text{Com}} in (9) (Com). These are compared with the operator norm-based statistic TRoyT^{\text{Roy}} in (3) (Roy), the Frobenius norm-based linear spectral statistic TF,1T^{\text{F},1} from Qiu, Li and Yao (2023) (Lfn), the debiased Frobenius norm-based U-statistic TF,2T^{\text{F},2} from Cai and Ma (2013) (Ufn), and the supremum norm-based statistic TsupT^{\text{sup}} inspired by Cai and Jiang (2011); Cai, Liu and Xia (2013) (Supn). The expressions for these statistics are provided in Section 12 of the Supplement.

To evaluate the robustness of our method, we generate nn independent, non-identically distributed samples of pp-dimensional random vectors with zero means and covariance matrix 𝚺\bm{\Sigma}. We set sample sizes n=100n=100 and n=300n=300 and vary the dimension pp over 100100, 200200, 500500, 10001000, and 20002000 to cover both proportional and ultra-high-dimensional cases. For each configuration, we report the empirical rejection rate over 20002000 replications, with the universal bootstrap procedure using 10001000 resamplings. The significance level for all tests is fixed at 0.050.05. We consider three different structures for the null covariance matrix 𝚺0=(σ0,ij)p×p\bm{\Sigma}_{0}=(\sigma_{0,ij})_{p\times p}:

(a). Exponential decay. The elements σ0,ij=0.6|ij|\sigma_{0,ij}=0.6^{|i-j|}.

(b). Block diagonal. Let K=p/10K=\lfloor p/10\rfloor be the number of blocks, with each diagonal block being 0.55𝟏10𝟏10T+0.45𝑰100.55\bm{1}_{10}^{\ }\bm{1}_{10}^{T}+0.45\bm{I}_{10}.

(c). Signed sub-exponential decay. The elements σ0,ij=(1)i+j0.4|ij|0.5\sigma_{0,ij}=(-1)^{i+j}0.4^{|i-j|^{0.5}}.
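The three null structures follow directly from their entrywise formulas; a sketch of their generation (function names are ours; when pp is not a multiple of 10 in model (b), we leave the remainder block as the identity, which is our own assumption):

```python
import numpy as np

def exp_decay_cov(p, rho=0.6):
    """Model (a): sigma_ij = rho^{|i-j|}."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def block_diag_cov(p, b=10, off=0.55):
    """Model (b): K = floor(p/b) diagonal blocks off*1 1^T + (1-off)*I_b;
    any remainder rows/columns are left as the identity (an assumption)."""
    K = p // b
    S = np.eye(p)
    B = off * np.ones((b, b)) + (1.0 - off) * np.eye(b)
    for k in range(K):
        S[k * b:(k + 1) * b, k * b:(k + 1) * b] = B
    return S

def signed_subexp_cov(p, rho=0.4):
    """Model (c): sigma_ij = (-1)^{i+j} rho^{|i-j|^{0.5}}."""
    idx = np.arange(p)
    D = np.abs(idx[:, None] - idx[None, :]) ** 0.5
    sign = (-1.0) ** (idx[:, None] + idx[None, :])
    return sign * rho ** D
```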

These structures have been previously analyzed in Cai, Liu and Xia (2013); Zhu et al. (2017); Yu, Li and Xue (2024) for models (a) and (b), and in Cai, Liu and Xia (2013); Yu, Li and Xue (2024) for model (c), highlighting the broad applicability of our approach. To assess the empirical size, we examine the following distributions of 𝑿=𝚺12𝒁\bm{X}=\bm{\Sigma}^{\frac{1}{2}}\bm{Z}:

(1). Gaussian distribution: {Zij,i=1,,n;j=1,,p}\{Z_{ij},\ i=1,\cdots,n;j=1,\cdots,p\} are i.i.d. standard normal random variables.

(2). Uniform and t-distributions: {Zij,i=1,,n/2;j=1,,p}\{Z_{ij},\ i=1,\cdots,\lfloor n/2\rfloor;j=1,\cdots,p\} are i.i.d. normalized uniform random variables on [1,1][-1,1], while {Zij,i=n/2+1,,n;j=1,,p}\{Z_{ij},\ i=\lfloor n/2\rfloor+1,\cdots,n;j=1,\cdots,p\} are i.i.d. normalized t random variables with 1212 degrees of freedom.

(3). Gaussian and uniform distributions: {Zij,i=1,,n/2;j=1,,p}\{Z_{ij},\ i=1,\cdots,\lfloor n/2\rfloor;j=1,\cdots,p\} are i.i.d. standard normal random variables, while {Zij,i=n/2+1,,n;j=1,,p}\{Z_{ij},\ i=\lfloor n/2\rfloor+1,\cdots,n;j=1,\cdots,p\} are i.i.d. normalized uniform random variables on [1,1][-1,1].

(4). Gaussian and t-distributions: {Zij,i=1,,n/2;j=1,,p}\{Z_{ij},\ i=1,\cdots,\lfloor n/2\rfloor;j=1,\cdots,p\} are i.i.d. standard normal random variables, while {Zij,i=n/2+1,,n;j=1,,p}\{Z_{ij},\ i=\lfloor n/2\rfloor+1,\cdots,n;j=1,\cdots,p\} are i.i.d. normalized t random variables with 1212 degrees of freedom.

Covariance tests commonly consider Gaussian and uniform distributions, see Chen, Qiu and Zhang (2023), while the t-distribution with 1212 degrees of freedom is examined in Cai, Liu and Xia (2013); Zhu et al. (2017). All distributions are standardized to zero mean and unit variance. Except for the Gaussian baseline, the other cases share identical covariance matrices but differ in distribution, creating challenging yet crucial scenarios for testing covariance structures without imposing additional distributional assumptions. These cases reflect the broad applicability of our universality result.
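For reproducibility, the four standardized innovation schemes (1)-(4) can be generated as below (function name is ours); the scaling constants 3\sqrt{3} and 12/10\sqrt{12/10} standardize the uniform and t12t_{12} variables to unit variance:

```python
import numpy as np

def generate_Z(n, p, case, rng):
    """Generate the standardized innovation matrix Z for cases (1)-(4):
    1 Gaussian, 2 uniform/t12, 3 Gaussian/uniform, 4 Gaussian/t12.
    Each variable has mean 0 and variance 1."""
    half = n // 2
    unif = lambda m: rng.uniform(-1.0, 1.0, (m, p)) * np.sqrt(3.0)   # Var(U[-1,1]) = 1/3
    t12 = lambda m: rng.standard_t(12, (m, p)) / np.sqrt(12 / 10)    # Var(t_12) = 12/10
    gauss = lambda m: rng.standard_normal((m, p))
    if case == 1:
        return gauss(n)
    first, second = {2: (unif, t12), 3: (gauss, unif), 4: (gauss, t12)}[case]
    return np.vstack([first(half), second(n - half)])
```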

Table 1: The empirical sizes of the supremum, the Frobenius, and the operator norm tests at significance level 0.050.05, for data with the Gaussian distribution and the mixed uniform and t distributions, across combinations of the sample size nn, the dimension pp, and the covariance matrix 𝚺0\bm{\Sigma}_{0}.
𝚺0\bm{\Sigma}_{0} pp Exp. decay(0.6) Block diagonal Signed subExp. decay(0.4)
100 200 500 1000 2000 100 200 500 1000 2000 100 200 500 1000 2000
Gaussian, n=100
Supn 0.203 0.240 0.316 0.406 0.487 0.203 0.240 0.316 0.406 0.487 0.203 0.240 0.316 0.406 0.487
Lfn 0.049 0.050 0.049 0.050 0.055 0.049 0.050 0.049 0.050 0.055 0.049 0.050 0.049 0.050 0.055
Ufn 0.048 0.056 0.046 0.049 0.056 0.048 0.056 0.046 0.049 0.056 0.048 0.056 0.046 0.049 0.056
Roy 0.050 0.050 0.045 0.053 0.049 0.050 0.050 0.045 0.053 0.049 0.050 0.050 0.045 0.053 0.049
Opn 0.051 0.046 0.048 0.044 0.049 0.048 0.047 0.048 0.047 0.052 0.053 0.055 0.049 0.049 0.052
Com 0.050 0.046 0.046 0.047 0.049 0.050 0.046 0.047 0.045 0.050 0.055 0.054 0.048 0.052 0.051
Gaussian, n=300
Supn 0.073 0.066 0.071 0.070 0.076 0.073 0.066 0.071 0.070 0.076 0.073 0.066 0.071 0.070 0.076
Lfn 0.056 0.048 0.045 0.045 0.053 0.056 0.048 0.045 0.045 0.053 0.056 0.048 0.045 0.045 0.053
Ufn 0.053 0.048 0.044 0.044 0.057 0.053 0.048 0.044 0.044 0.057 0.053 0.048 0.044 0.044 0.057
Roy 0.048 0.049 0.050 0.052 0.051 0.048 0.049 0.050 0.052 0.051 0.048 0.049 0.050 0.052 0.051
Opn 0.057 0.052 0.047 0.051 0.052 0.052 0.050 0.052 0.052 0.049 0.042 0.046 0.054 0.049 0.054
Com 0.057 0.050 0.046 0.052 0.052 0.053 0.050 0.050 0.051 0.048 0.042 0.048 0.052 0.049 0.055
uniform and t(df=12), n=100
Supn 0.251 0.293 0.391 0.492 0.613 0.251 0.293 0.391 0.492 0.613 0.251 0.293 0.391 0.492 0.613
Lfn 0.057 0.050 0.053 0.052 0.062 0.057 0.050 0.053 0.052 0.062 0.057 0.050 0.053 0.052 0.062
Ufn 0.056 0.051 0.048 0.051 0.064 0.056 0.051 0.048 0.051 0.064 0.056 0.051 0.048 0.051 0.064
Roy 0.050 0.045 0.051 0.050 0.050 0.050 0.045 0.051 0.050 0.050 0.050 0.051 0.051 0.050 0.050
Opn 0.045 0.033 0.043 0.041 0.047 0.052 0.043 0.042 0.051 0.043 0.047 0.047 0.051 0.060 0.049
Com 0.045 0.033 0.044 0.041 0.050 0.051 0.044 0.041 0.055 0.046 0.049 0.047 0.046 0.063 0.055
uniform and t(df=12), n=300
Supn 0.093 0.089 0.098 0.109 0.125 0.093 0.089 0.098 0.109 0.125 0.093 0.089 0.098 0.109 0.125
Lfn 0.055 0.047 0.050 0.039 0.053 0.055 0.047 0.050 0.039 0.053 0.055 0.047 0.050 0.039 0.053
Ufn 0.056 0.044 0.049 0.040 0.056 0.056 0.044 0.049 0.040 0.056 0.056 0.044 0.049 0.040 0.056
Roy 0.050 0.046 0.038 0.053 0.054 0.050 0.046 0.038 0.053 0.054 0.050 0.046 0.038 0.053 0.054
Opn 0.040 0.051 0.049 0.051 0.050 0.042 0.053 0.051 0.047 0.051 0.046 0.042 0.036 0.042 0.061
Com 0.046 0.051 0.041 0.048 0.047 0.043 0.052 0.050 0.046 0.052 0.049 0.045 0.038 0.045 0.058

Table 1 reports the empirical sizes of the supremum, the Frobenius, and the operator norm tests at the significance level 0.050.05 across various sample sizes nn, dimensions pp, and covariances 𝚺0\bm{\Sigma}_{0}, for the Gaussian and the mixed uniform and t data-generating distributions (distributions (1) and (2)). For results on the Gaussian-uniform and Gaussian-t distributions (distributions (3) and (4)), see Table 2 in the Supplement, which demonstrates similar patterns. The three operator norm tests are performed using the proposed universal bootstrap procedure, while the supremum and Frobenius norm tests rely on their respective asymptotic distribution formulas. Both Table 1 and Table 2 in the Supplement show that the universal bootstrap effectively controls the size of all operator norm-based statistics at the nominal significance level across all tested scenarios. This approach performs well under both Gaussian and non-Gaussian distributions, for both i.i.d. and non-i.i.d. data, in proportional and ultra-high dimensional settings, and across various covariance structures. These findings confirm the universality and consistency of the universal bootstrap procedure, providing empirical support for our theoretical results in Theorems 3.5 and 4.2. Additionally, the Frobenius norm-based tests maintain appropriate size control, while the supremum norm-based test exhibits substantial size inflation at n=100n=100 and moderate inflation even at the larger sample size n=300n=300.

To evaluate the empirical power of these statistics, we consider a setting where 𝚺=𝚺0+𝚫\bm{\Sigma}=\bm{\Sigma}_{0}+\bm{\Delta}, with 𝚫\bm{\Delta} having a low rank. This setup corresponds to the power analysis presented in Theorem 4.1. As shown in Theorem 4.1, the statistic TT outperforms TRoyT^{\text{Roy}} when the eigenspaces of 𝚫\bm{\Delta} align with the eigenvectors of 𝚺0\bm{\Sigma}_{0} corresponding to larger eigenvalues. In contrast, TRoyT^{\text{Roy}} performs better when 𝚫\bm{\Delta} aligns with the smaller eigenvectors of 𝚺0\bm{\Sigma}_{0}. For a fair comparison, we define 𝚫=σ𝒖𝒖T/2+σ𝒗𝒗T/4\bm{\Delta}=\sigma\bm{u}\bm{u}^{T}/2+\sigma\bm{v}\bm{v}^{T}/4, where 𝒖\bm{u} is the fifth-largest eigenvector of 𝚺0\bm{\Sigma}_{0}, 𝒗\bm{v} is uniformly sampled on the unit sphere in p\mathbb{R}^{p}, and σ\sigma is the signal strength. We term this configuration the spike setting. For completeness, we also conduct a power analysis where 𝚫\bm{\Delta} has full rank by setting 𝚫=σ𝑰p\bm{\Delta}=\sigma\bm{I}_{p}, where σ\sigma represents the signal level. This configuration, termed the white noise setting, represents the covariance structure after adding white noise to the null covariance. Together, the spike and white noise settings cover both low- and full-rank deviations from the null covariance. For illustration, we consider sample sizes n=100n=100 and 300300, with dimension p=1000p=1000. Based on the universality results in Theorem 3.5 and empirical size performance, we conduct the power analysis using a Gaussian distribution for simplicity. Further empirical power comparisons under varying sample sizes and dimensions are provided in Table 1 of the Supplement.
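The two alternative constructions above can be sketched as follows (function names are ours; Σ0 can be any of the null structures described earlier):

```python
import numpy as np

def spike_alt(Sigma0, sigma, rng):
    """Spike setting: Sigma = Sigma0 + sigma*u u^T/2 + sigma*v v^T/4,
    with u the fifth-largest eigenvector of Sigma0 and v uniform on
    the unit sphere."""
    w, V = np.linalg.eigh(Sigma0)    # eigenvalues in ascending order
    u = V[:, -5]                     # eigenvector of the fifth-largest eigenvalue
    v = rng.standard_normal(Sigma0.shape[0])
    v /= np.linalg.norm(v)           # uniform direction on the sphere
    return Sigma0 + sigma * np.outer(u, u) / 2 + sigma * np.outer(v, v) / 4

def white_noise_alt(Sigma0, sigma):
    """White noise setting: Sigma = Sigma0 + sigma * I_p."""
    return Sigma0 + sigma * np.eye(Sigma0.shape[0])
```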

Figure 1: Empirical powers of the supremum, the Frobenius, and the proposed operator norm tests with respect to the signal level of the spike setting alternatives under three covariance structures, the sample size n=100n=100, 300300, dimension p=1000p=1000, and the Gaussian data with 20002000 replications.
Figure 2: Empirical powers of the supremum, the Frobenius, and the proposed operator norm tests with respect to the signal level of the white noise setting alternatives under three covariance structures, the sample size n=100n=100, 300300, dimension p=1000p=1000, and the Gaussian data with 20002000 replications.

Figure 1 illustrates the empirical power of the six test statistics across signal levels and null covariance structures under the spike setting, where the dashed line represents the significance level 0.050.05. Under this setting, the proposed operator norm-based statistics TT and TComT^{\text{Com}} exhibit similar power, while the two Frobenius norm-based statistics and the normalized operator norm statistic TRoyT^{\text{Roy}} perform comparably to one another. When n=100n=100, the operator norm-based statistics TT and TComT^{\text{Com}} maintain appropriate test size and achieve the highest power, approaching 11 quickly across all three covariance configurations. The normalized operator norm statistic TRoyT^{\text{Roy}} and the two Frobenius norm-based statistics TF,1T^{\text{F},1} and TF,2T^{\text{F},2} also control the test size but exhibit lower power, with TRoyT^{\text{Roy}} slightly ahead. The supremum norm-based statistic, however, exhibits significant size inflation and fails to detect signals, as its power curve does not increase with the signal level. The pattern is similar for n=300n=300, except that the power curves for the supremum norm-based statistic remain near the dashed line, indicating limited power. This provides empirical evidence for the advantage of operator norm-based statistics over the other norms in the spike setting, particularly supporting the proposed statistic TT over TRoyT^{\text{Roy}}. We further make comparisons under the white noise setting across various signal levels in Figure 2. In this setting, the combined statistic TComT^{\text{Com}} and the normalized statistic TRoyT^{\text{Roy}} outperform the other statistics. Because the white noise setting applies a uniform signal across eigenvector directions of 𝚺0\bm{\Sigma}_{0} with both large and small eigenvalues, the normalized statistic TRoyT^{\text{Roy}} achieves higher power than TT, consistent with the power analysis in Theorem 4.1.
Notably, the combined statistic TComT^{\text{Com}} performs robustly in both the spike and white noise settings, demonstrating enhanced power without size inflation, as supported by the generalized universal bootstrap consistency in Theorem 4.2. As in the spike setting, the Frobenius norm-based statistics display lower power, while the supremum norm-based statistic fails to control size. In summary, when the covariance structure is known, we recommend the statistic TT; when it is unknown, we prefer the more robust combined statistic TComT^{\text{Com}}.

5.2 Data application

We apply our test procedure to global annual mean near-surface air temperature data for the period 196020101960-2010, using the HadCRUT4 dataset (Morice et al. (2012)) and the Coupled Model Intercomparison Project Phase 5 (CMIP5, Taylor, Stouffer and Meehl (2012)), as detailed in Li et al. (2021). The global dataset includes monthly anomalies of near-surface air temperature across 5×55^{\circ}\times 5^{\circ} grid boxes. To reduce dimensionality, these grid boxes are aggregated into larger 40×3040^{\circ}\times 30^{\circ} boxes, resulting in S=54S=54 spatial grid boxes. To mitigate distribution shifts due to long-term trends, we divide the 196020101960-2010 period into five decades and analyze the data for each decade separately. The monthly data for each decade are averaged over five-year intervals, yielding a temporal dimension T=2T=2 for each ten-year period. To illustrate the rationale for the covariance test in this context, we first outline the modeling procedure commonly used in climatology. Following Li et al. (2021), one considers a high-dimensional linear regression with errors-in-variables

𝒀k=i=12𝑿k,iβk,i+ϵk,k=1,,5,\displaystyle\bm{Y}_{k}=\sum_{i=1}^{2}\bm{X}_{k,i}\beta_{k,i}+\bm{\epsilon}_{k},\ k=1,\cdots,5,

where 𝒀kL\bm{Y}_{k}\in\mathbb{R}^{L} is the vectorized observed mean temperature across spatial and temporal dimensions for the kk-th decade after 19601960, with dimension L=S×TL=S\times T. The vectors 𝑿k,1\bm{X}_{k,1}, 𝑿k,2,ϵkL\bm{X}_{k,2},\bm{\epsilon}_{k}\in\mathbb{R}^{L} represent the expected unobserved climate responses under the ANT and NAT forcings, and the unobserved noise, respectively. The coefficients βk,1\beta_{k,1}, βk,2\beta_{k,2} are unknown scaling factors of interest for statistical inference. To estimate βk,1\beta_{k,1}, βk,2\beta_{k,2}, in addition to the observed data 𝒀k\bm{Y}_{k}, we have nin_{i} noisy observed fingerprints 𝑿~k,i,j\tilde{\bm{X}}_{k,i,j} of the climate system

𝑿~k,i,j=𝑿k,i+ϵ~k,i,j,k=1,,5;j=1,,ni;i=1,2.\displaystyle\tilde{\bm{X}}_{k,i,j}=\bm{X}_{k,i}+\tilde{\bm{\epsilon}}_{k,i,j},\ k=1,\cdots,5;\ j=1,\cdots,n_{i};\ i=1,2.

In our dataset, n1=35n_{1}=35, n2=46n_{2}=46. For privacy, the data 𝑿~k,i,j\tilde{\bm{X}}_{k,i,j} are preprocessed by adding white noise, with the corresponding noise covariance added to the hypothesized matrix 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k} defined below. The climate models also provide N=223N=223 simulations ϵ^k,i\hat{\bm{\epsilon}}_{k,i}, k=1,,5;i=1,,Nk=1,\cdots,5;\ i=1,\cdots,N in this dataset, which are assumed to follow the same distribution as the unobserved noise ϵk\bm{\epsilon}_{k}. To efficiently estimate βk,1\beta_{k,1}, βk,2\beta_{k,2}, a typical assumption is that the natural variability simulated by the climate models matches the observed variability, i.e., the covariance of ϵ~k,i,j\tilde{\bm{\epsilon}}_{k,i,j}, k=1,,5;j=1,,ni;i=1,2k=1,\dots,5;\ j=1,\dots,n_{i};\ i=1,2, equals the covariance matrix 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k} of ϵk\bm{\epsilon}_{k}. Since ϵk\bm{\epsilon}_{k} is unobserved, a common choice for 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k} is the sample covariance of the simulations 𝚺ϵ,k=i=1Nϵ^k,iϵ^k,iT/N\bm{\Sigma}_{\bm{\epsilon},k}=\sum_{i=1}^{N}\hat{\bm{\epsilon}}_{k,i}\hat{\bm{\epsilon}}_{k,i}^{T}/N. Therefore, it is important to test the hypothesis

H0,k:Cov(ϵ~k,i,j)=𝚺ϵ,k,k=1,,5;j=1,,ni;i=1,2.\displaystyle H_{0,k}:\text{Cov}(\tilde{\bm{\epsilon}}_{k,i,j})=\bm{\Sigma}_{\bm{\epsilon},k},\ k=1,\cdots,5;\ j=1,\cdots,n_{i};\ i=1,2. (59)

As noted by Olonscheck and Notz (2017), the equivalence (59) is crucial for optimal fingerprint estimation in climate models. However, few studies have validated (59) with statistical evidence, which motivates a formal test of this hypothesis.

We construct the data matrix 𝑿~kn×p\tilde{\bm{X}}_{k}\in\mathbb{R}^{n\times p} by combining quantities 𝑿~k,1,j𝑿~¯k,1\tilde{\bm{X}}_{k,1,j}-\bar{\tilde{\bm{X}}}_{k,1} for j=1,,n1j=1,\dots,n_{1} and 𝑿~k,2,j𝑿~¯k,2\tilde{\bm{X}}_{k,2,j}-\bar{\tilde{\bm{X}}}_{k,2} for j=1,,n2j=1,\dots,n_{2}, where n=n1+n2=81n=n_{1}+n_{2}=81 and p=L=108p=L=108. Here 𝑿~¯k,i=j=1ni𝑿~k,i,j/ni\bar{\tilde{\bm{X}}}_{k,i}=\sum_{j=1}^{n_{i}}\tilde{\bm{X}}_{k,i,j}/n_{i}. We apply several statistics to the data matrix 𝑿~k\tilde{\bm{X}}_{k} with the hypothesized matrix 𝚺ϵ,k\bm{\Sigma}_{\bm{\epsilon},k}: the proposed operator norm-based statistics TT (Opn) and TComT^{\text{Com}} (Com), the operator norm-based statistics TRoyT^{\text{Roy}} (Roy), the Frobenius norm-based statistics TF,1T^{\text{F},1} (Lfn) and TF,2T^{\text{F},2} (Ufn), and the supremum norm-based statistic TsupT^{\text{sup}} (Supn). For comparison, we generate i.i.d. Gaussian samples 𝑿~k,j𝒩(𝟎,𝚺ϵ,k)\tilde{\bm{X}}_{k,j}^{\prime}\sim\mathcal{N}(\bm{0},\bm{\Sigma}_{\bm{\epsilon},k}) for j=1,,n;k=1,,5j=1,\dots,n;\ k=1,\dots,5 as a control group. The observed data 𝑿~k\tilde{\bm{X}}_{k} forms the observation group. The p-value results for these statistics are summarized in Table 2. The supremum norm-based statistic TsupT^{\text{sup}} fails to reject the observation group for all periods but rejects the control group in the periods 196019701960-1970, 197019801970-1980, and 198019901980-1990. The Frobenius norm-based statistics TF,1T^{\text{F},1} and TF,2T^{\text{F},2} fail to reject the observation group in the periods 197019801970-1980 and 199020001990-2000, and TF,2T^{\text{F},2} rejects the control group during 199020001990-2000. Only the operator norm-based statistics TT, TComT^{\text{Com}}, and TRoyT^{\text{Roy}} reject the null hypothesis in the observation group while not rejecting it in the control group for all years. 
This suggests that the commonly assumed hypothesis (59) should be rejected, and a more suitable assumption should be used to estimate \beta_{k,1} and \beta_{k,2} for all k.
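The test pipeline above can be sketched in code. The following is a minimal illustration, not the paper's actual procedure: `opnorm_stat` is a simplified stand-in for the operator norm-based statistic T, and `universal_bootstrap_pvalue` caricatures the universal bootstrap by drawing Gaussian samples with the hypothesized covariance, exploiting the universality idea that the null law of the statistic depends on the data only through \bm{\Sigma}_{\bm{\epsilon},k}. All function names are illustrative.

```python
import numpy as np

def combined_data_matrix(X1, X2):
    """Stack the two group-wise centered samples into one n x p data
    matrix, mirroring the construction of X_tilde_k with n = n1 + n2."""
    return np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])

def opnorm_stat(X, Sigma0):
    """Operator-norm deviation of the sample covariance from the
    hypothesized Sigma0 (a simplified stand-in for the statistic T)."""
    n = X.shape[0]
    S = X.T @ X / n
    return np.linalg.norm(S - Sigma0, ord=2)

def universal_bootstrap_pvalue(X, Sigma0, B=200, rng=None):
    """Approximate the null distribution of the statistic by resampling
    Gaussian data with covariance Sigma0, then return a bootstrap p-value."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    t_obs = opnorm_stat(X, Sigma0)
    # Cholesky factor of Sigma0 (small ridge for numerical safety).
    L = np.linalg.cholesky(Sigma0 + 1e-10 * np.eye(p))
    t_boot = np.empty(B)
    for b in range(B):
        Z = rng.standard_normal((n, p)) @ L.T
        t_boot[b] = opnorm_stat(Z - Z.mean(axis=0), Sigma0)
    return (1 + np.sum(t_boot >= t_obs)) / (B + 1)
```

For instance, data drawn with covariance 4\bm{I} tested against \bm{\Sigma}_{0}=\bm{I} yields a small p-value, while data generated under the null does not systematically reject, which is the pattern the control group is designed to check.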

Table 2: Estimated p-values of tests for covariances for the observation group and control group using the supremum norm-based statistic (Supn), the Frobenius norm-based statistics (Lfn, Ufn), and the operator norm-based statistics (Roy, Opn, Com), with sample size n=81 and dimension p=108.

                      observation group                      control group
year        Supn  Lfn   Ufn   Roy   Opn   Com      Supn  Lfn   Ufn   Roy   Opn   Com
1960-1970   .783  .006  .001  .000  .000  .000     .002  .566  .521  .805  .646  .734
1970-1980   .406  .146  .115  .000  .000  .000     .004  .230  .243  .401  .452  .427
1980-1990   .249  .001  .000  .000  .000  .000     .000  .419  .451  .638  .713  .677
1990-2000   .905  .634  .410  .000  .000  .000     .223  .072  .042  .445  .383  .414
2000-2010   .984  .002  .001  .000  .000  .000     .719  .368  .376  .244  .288  .268

In summary, the operator norm-based statistics demonstrate strong performance across various covariance structures and signal configurations, with the universal bootstrap ensuring the consistency of tests constructed from diverse combinations of operator norms. Numerical results align with the theoretical framework, indicating that the proposed tests exhibit robust power properties and that the universal bootstrap procedures maintain appropriate size control.

Funding

Fang Yao is the corresponding author. This research is partially supported by the National Key Research and Development Program of China (No. 2022YFA1003801), the National Natural Science Foundation of China (No. 12292981, 11931001), the LMAM, the Fundamental Research Funds for the Central Universities, Peking University, and LMEQF.

References

  • Adamczak, R., Litvak, A. E., Pajor, A. and Tomczak-Jaegermann, N. (2011). Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus. Mathématique 349, 195–200.
  • Bai, Z. and Yao, J. (2012). On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis 106, 167–177.
  • Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices.
  • Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis 97, 1382–1408.
  • Bao, Z., Pan, G. and Zhou, W. (2015). Universality for the largest eigenvalue of sample covariance matrices with general population.
  • Bianchi, P., Debbah, M., Maïda, M. and Najim, J. (2011). Performance of statistical tests for single-source detection using random matrix theory. IEEE Transactions on Information Theory 57, 2400–2419.
  • Bunea, F. and Xiao, L. (2015). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA.
  • Cai, T. T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices.
  • Cai, T., Liu, W. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association 108, 265–277.
  • Cai, T. T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society Series B: Statistical Methodology 76, 349–372.
  • Cai, T. T. and Ma, Z. (2013). Optimal hypothesis testing for high dimensional covariance matrices.
  • Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing.
  • Chen, S. X., Qiu, Y. and Zhang, S. (2023). Sharp optimality for high-dimensional covariance testing under sparse signals. The Annals of Statistics 51, 1921–1945.
  • Chen, S. X., Zhang, L.-X. and Zhong, P.-S. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association 105, 810–819.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions.
  • Chernozhukov, V., Chetverikov, D. and Koike, Y. (2023). Nearly optimal central limit theorem and bootstrap approximations in high dimensions. The Annals of Applied Probability 33, 2374–2425.
  • Chernozhuokov, V., Chetverikov, D., Kato, K. and Koike, Y. (2022). Improved central limit theorem and bootstrap approximations in high dimensions. The Annals of Statistics 50, 2562–2586.
  • Couillet, R., Debbah, M. and Silverstein, J. W. (2011). A deterministic equivalent for the analysis of correlated MIMO multiple access channels. IEEE Transactions on Information Theory 57, 3493–3514.
  • Ding, X., Hu, Y. and Wang, Z. (2024). Two sample test for covariance matrices in ultra-high dimension. Journal of the American Statistical Association, 1–12.
  • Ding, X. and Wang, Z. (2023). Global and local CLTs for linear spectral statistics of general sample covariance matrices when the dimension is much larger than the sample size with applications. arXiv preprint arXiv:2308.08646.
  • Ding, X. and Yang, F. (2018). A necessary and sufficient condition for edge universality at the largest singular values of covariance matrices. The Annals of Applied Probability 28, 1679–1738.
  • Ding, X. and Yang, F. (2022). Tracy-Widom distribution for heterogeneous Gram matrices with applications in signal detection. IEEE Transactions on Information Theory 68, 6682–6715.
  • Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics 7, 1–26.
  • El Karoui, N. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices.
  • El Karoui, N. and Purdom, E. (2019). The non-parametric bootstrap and spectral analysis in moderate and high-dimension. In The 22nd International Conference on Artificial Intelligence and Statistics, 2115–2124. PMLR.
  • Erdős, L., Knowles, A. and Yau, H.-T. (2013). Averaging fluctuations in resolvents of random band matrices. In Annales Henri Poincaré 14, 1837–1926. Springer.
  • Erdős, L., Yau, H.-T. and Yin, J. (2012). Rigidity of eigenvalues of generalized Wigner matrices. Advances in Mathematics 229, 1435–1515.
  • Fang, X. and Koike, Y. (2024). Large-dimensional central limit theorem with fourth-moment error bounds on convex sets and balls. The Annals of Applied Probability 34, 2065–2106.
  • Giessing, A. (2023). Gaussian and bootstrap approximations for suprema of empirical processes. arXiv preprint arXiv:2309.01307.
  • Han, F., Xu, S. and Zhou, W.-X. (2018). On Gaussian comparison inequality and its application to spectral analysis of large random matrices.
  • Hu, H. and Lu, Y. M. (2022). Universality laws for high-dimensional learning with random features. IEEE Transactions on Information Theory 69, 1932–1964.
  • Ji, H. C. and Park, J. (2021). Tracy-Widom limit for free sum of random matrices. arXiv preprint arXiv:2110.05147.
  • Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices.
  • Jiang, D. and Bai, Z. (2021). Generalized four moment theorem and an application to CLT for spiked eigenvalues of high-dimensional covariance matrices.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics 29, 295–327.
  • Johnstone, I. M. and Paul, D. (2018). PCA in high dimensions: an orientation. Proceedings of the IEEE 106, 1277–1292.
  • Karoui, N. E. (2003). On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n tend to infinity. arXiv preprint math/0309355.
  • Ke, Z. T. (2016). Detecting rare and weak spikes in large covariance matrices. arXiv preprint arXiv:1609.00883.
  • Knowles, A. and Yin, J. (2017). Anisotropic local laws for random matrices. Probability Theory and Related Fields 169, 257–352.
  • Koltchinskii, V. and Lounici, K. (2017). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli, 110–133.
  • Kong, W. and Valiant, G. (2017). Spectrum estimation from samples.
  • Ledoit, O. and Wolf, M. (2015). Spectrum estimation: a unified framework for covariance matrix estimation and PCA in large dimensions. Journal of Multivariate Analysis 139, 360–384.
  • Lee, J. O. and Schnelli, K. (2016). Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population.
  • Li, Y., Chen, K., Yan, J. and Zhang, X. (2021). Uncertainty in optimal fingerprinting is underestimated. Environmental Research Letters 16, 084043.
  • Lopes, M. E. (2022a). Improved rates of bootstrap approximation for the operator norm: a coordinate-free approach. arXiv preprint arXiv:2208.03050.
  • Lopes, M. E. (2022b). Central limit theorem and bootstrap approximation in high dimensions: near 1/\sqrt{n} rates via implicit smoothing. The Annals of Statistics 50, 2492–2513.
  • Lopes, M. E., Blandino, A. and Aue, A. (2019). Bootstrapping spectral statistics in high dimensions. Biometrika 106, 781–801.
  • Lopes, M. E., Erichson, N. B. and Mahoney, M. W. (2023). Bootstrapping the operator norm in high dimensions: error estimation for covariance matrices and sketching. Bernoulli 29, 428–450.
  • Lopes, M. E., Lin, Z. and Müller, H.-G. (2020). Bootstrapping max statistics in high dimensions: near-parametric rates under weak variance decay and application to functional and multinomial data.
  • Montanari, A. and Saeed, B. N. (2022). Universality of empirical risk minimization. In Conference on Learning Theory, 4310–4312. PMLR.
  • Morice, C. P., Kennedy, J. J., Rayner, N. A. and Jones, P. D. (2012). Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set. Journal of Geophysical Research: Atmospheres 117.
  • Olonscheck, D. and Notz, D. (2017). Consistently estimating internal climate variability from climate model simulations. Journal of Climate 30, 9555–9573.
  • Onatski, A., Moreira, M. J. and Hallin, M. (2013). Asymptotic power of sphericity tests for high-dimensional data.
  • Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 1617–1642.
  • Qiu, J., Li, Z. and Yao, J. (2023). Asymptotic normality for eigenvalue statistics of a general sample covariance matrix when p/n\to\infty and applications. The Annals of Statistics 51, 1427–1451.
  • Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology 4.
  • Taylor, K. E., Stouffer, R. J. and Meehl, G. A. (2012). An overview of CMIP5 and the experiment design. Bulletin of the American Meteorological Society 93, 485–498.
  • Tsukuma, H. (2016). Estimation of a high-dimensional covariance matrix with the Stein loss. Journal of Multivariate Analysis 148, 1–17.
  • Xu, M., Zhang, D. and Wu, W. B. (2019). Pearson's chi-squared statistics: approximation theory and beyond. Biometrika 106, 716–723.
  • Yao, J. and Lopes, M. E. (2021). Rates of bootstrap approximation for eigenvalues in high-dimensional PCA. arXiv preprint arXiv:2104.07328.
  • Yu, X., Li, D. and Xue, L. (2024). Fisher's combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association 119, 511–524.
  • Yu, L., Zhao, P. and Zhou, W. (2024). Testing the number of common factors by bootstrapped sample covariance matrix in high-dimensional factor models. Journal of the American Statistical Association, 1–12.
  • Zhai, A. (2018). A high-dimensional CLT in \mathcal{W}_{2} distance with near optimal convergence rate. Probability Theory and Related Fields 170, 821–845.
  • Zhou, H., Bai, Z. and Hu, J. (2023). The limiting spectral distribution of large-dimensional general information-plus-noise-type matrices. Journal of Theoretical Probability 36, 1203–1226.
  • Zhou, H., Hu, J., Bai, Z. and Silverstein, J. W. (2024). Analysis of the limiting spectral distribution of large-dimensional general information-plus-noise-type matrices. Journal of Theoretical Probability 37, 1199–1229.
  • Zhu, L., Lei, J., Devlin, B. and Roeder, K. (2017). Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes. The Annals of Applied Statistics 11, 1810.