Rank-adaptive covariance testing with applications to
genomics and neuroimaging
2Department of Statistics, University of Wisconsin - Madison, Madison, Wisconsin, U.S.A.
3Department of Psychology, University of Toronto, Toronto, ON, Canada
∗Equal contributions
Abstract
In biomedical studies, testing for differences in covariance offers scientific insights beyond mean differences, especially when differences are driven by complex joint behavior between features. However, when differences in joint behavior are weakly dispersed across many dimensions and arise from differences in low-rank structures within the data, as is often the case in genomics and neuroimaging, existing two-sample covariance testing methods may suffer from power loss. The Ky-Fan($k$) norm, defined as the sum of the top $k$ singular values, is a simple and intuitive matrix norm able to capture signals caused by differences in low-rank structures between matrices, but its statistical properties in hypothesis testing have not been well studied. In this paper, we investigate the behavior of the Ky-Fan($k$) norm in two-sample covariance testing. Ultimately, we propose a novel methodology, Rank-Adaptive Covariance Testing (RACT), which leverages differences in low-rank structures found in the covariance matrices of two groups in order to maximize power. RACT uses permutation for statistical inference, ensuring exact Type I error control. We validate RACT in simulation studies and evaluate its performance when testing for differences in gene expression networks between two types of lung cancer, as well as when testing for covariance heterogeneity in diffusion tensor imaging (DTI) data taken on two different scanner types.
Keywords: Ky-Fan($k$) norm; adaptive testing; permutation; two-sample covariance testing; genomics; neuroimaging.
1 Introduction
1.1 Two-sample covariance testing in biomedical data
In biomedical studies, comparing covariances often offers scientific insights beyond what is inferred by comparing means. In particular, it can help determine whether complex joint behavior differs between two groups of samples. In this paper, we present two specific applications of two-sample covariance testing within genomics and neuroimaging.
In genomics, gene expression networks, quantified by the covariance of expression levels of multiple genes, provide a better understanding of the genetic drivers of cellular behavior, particularly when the individual contributions from numerous genes are weak but their joint co-expression levels are strong (Eisen \BOthers., \APACyear1998; Stuart \BOthers., \APACyear2003). Testing for differences in gene expression networks between tissue types (e.g., tumor tissue and normal tissue), molecular subtypes of a cancer (e.g., basal and HER2 subtypes in breast cancer), or cancer types (e.g., breast cancer versus ovarian cancer) can be used to identify important genomic biomarkers of complex diseases via these co-expression levels.
In the second motivating example, we consider the ‘batch effect’ problem in genomics, microbiome, and neuroimaging, where non-biological variation is induced by collecting data from multiple labs/sites, platforms, preprocessing methods, or scanners. Within neuroimaging, a number of methods have been proposed to estimate and remove these effects (Hu \BOthers., \APACyear2023); they have the potential to increase the reliability of scientific findings through the larger sample sizes and more diverse groups of subjects that come from combining datasets. Recently, Chen \BOthers. (\APACyear2022), Zhang \BOthers. (\APACyear2023), and Zhang \BOthers. (\APACyear2024) have empirically observed batch effects reflected in the heterogeneity of covariances of observations taken at different sites or using different scanners. Despite these observed differences, there has been limited work on testing whether these observed covariance heterogeneities are statistically significant, and hence whether preprocessing the data to mitigate batch effects in covariances is justified.
1.2 Leveraging low-rank structure
In both genomics and neuroimaging, techniques which leverage the low-rank structure of the often high-dimensional data have been found useful in characterizing and understanding variation within the data. These low-rank structures are correspondingly reflected in the spiked structure of the singular values of the covariance matrix. Using tools from differential co-expression analysis (Chowdhury \BOthers., \APACyear2019), low-rank structures have been empirically observed in the differences between gene expression networks (Amar \BOthers., \APACyear2013; Meng \BOthers., \APACyear2018; Yu \BBA Luo, \APACyear2021). In addition, several model-based approaches in cancer genomics have demonstrated the utility of leveraging low-rank structures in studying differences between tumor types (Park \BBA Lock, \APACyear2020; Lock \BOthers., \APACyear2022). Likewise, the batch-effect-induced covariance heterogeneities in neuroimaging appear to be driven by differences in low-rank structures (Veronese \BOthers., \APACyear2019; Carmon \BOthers., \APACyear2020; Chen \BOthers., \APACyear2022; Zhang \BOthers., \APACyear2023). When these low-rank structures differ between two groups, a low-rank structure is expected to explain most of the difference in covariances between the groups. Therefore, there is potential utility in constructing statistics which leverage the low-rank structure of the difference in covariance matrices between two groups to improve statistical power.
1.3 Literature review
In both of the aforementioned application areas, a low-rank structure is likely to be useful in characterizing the differences in covariance matrices between populations; however, to date few two-sample covariance testing methods, which test $H_0: \Sigma_1 = \Sigma_2$, have been developed which are explicitly able to leverage this type of low-rank structure. Schott (\APACyear2007) and Li \BBA Chen (\APACyear2012) consider test statistics based on the squared Frobenius norm of $\hat\Sigma_1 - \hat\Sigma_2$. Srivastava \BBA Yanagihara (\APACyear2010) develop an estimator based on traces of the sample covariance matrices and of their squares. These tests which involve the Frobenius norm or trace can be seen as functions of all of the singular values of $\hat\Sigma_1 - \hat\Sigma_2$, or of $\hat\Sigma_1$ and $\hat\Sigma_2$, and hence do not exploit the data’s potential low-rank structure. Tests which are more powerful in specific low-rank settings include: Cai \BOthers. (\APACyear2013), a test powerful against sparse alternatives based on the maximum of standardized elementwise differences of the sample covariance matrices between two groups; Danaher \BOthers. (\APACyear2015), a biological-pathway-inspired test which uses the leading eigenvalues and traces of the sample covariance matrices; Zhu \BOthers. (\APACyear2017), which uses the sparse leading eigenvalue of $\hat\Sigma_1 - \hat\Sigma_2$; He \BBA Chen (\APACyear2018), whose test statistic is based on differences in superdiagonals between the sample covariance matrices and is particularly powerful when $\Sigma_1$ and $\Sigma_2$ have a bandable structure; and Ding \BOthers. (\APACyear2024), which compares a subset of eigenvalues between the two sample covariance matrices.
1.4 Our contributions
A selection of the two-sample methods listed in Section 1.3 leverage some form of low-rank structure in either the covariance matrices, or the difference in covariance matrices, between groups; however, each makes strong assumptions about the form of the low-rank structure. This can be problematic in the two application areas this paper examines, where it is unclear in advance what low-rank structures exist, let alone which low-rank structures to use in order to maximize power for two-sample covariance testing. In this paper we bridge this gap and propose a two-sample covariance testing method which adapts to the form of the low-rank structures within the data, without placing stringent prior assumptions on their form. Specifically, we construct a test statistic that is adaptive to the form of the low-rank structure found in the difference of two sample covariance matrices, and utilize a permutation scheme to ensure strict Type I error control in the finite-sample setting.
The remainder of the paper is organized as follows. In Section 2 the adaptive test statistic underlying RACT is presented, and its permutation procedure introduced. Section 3 investigates the asymptotic behavior of the individual test statistics included in RACT’s adaptive test statistic, and discusses the signal-to-noise trade-off of these individual test statistics. A simulation study is presented in Section 4, demonstrating RACT’s adaptivity to various forms of covariance differences. We then apply RACT to the two application areas of interest in Section 5, and compare its performance to a selection of competing two-sample covariance testing methods. Finally in Section 6 we discuss potential extensions, and limitations, of RACT.
2 Methodology
2.1 Notation and setup
Denote $x_{11}, \dots, x_{1n_1} \in \mathbb{R}^p$ and $x_{21}, \dots, x_{2n_2} \in \mathbb{R}^p$ as observed features from two groups of sample sizes $n_1$ and $n_2$, respectively, with equal population means (for practical purposes, as a preprocessing step the data can be centered using the sample mean of each group), and let $\Sigma_1, \Sigma_2$ and $\hat\Sigma_1, \hat\Sigma_2$ be the population and sample covariances of these groups. In addition, we define $n = n_1 + n_2$ to be the total number of samples and $\hat\Sigma$ to be the pooled sample covariance. In this section we test for differences in the covariance matrix; however, if one preferred to test for differences in the correlation matrix, this could be accomplished by redefining these quantities as population and sample correlation matrices. We define our null and alternative hypotheses as $H_0: \Sigma_1 = \Sigma_2$ versus $H_1: \Sigma_1 \neq \Sigma_2$.
2.2 Ky-Fan($k$) statistic with fixed $k$
Testing $H_0$ requires quantifying the difference between $\hat\Sigma_1$ and $\hat\Sigma_2$. As noted in Section 1.3, several existing two-sample covariance testing methods are based on test statistics which utilize the Frobenius norm or leading singular values to detect differences in covariance structures. However, using either of these test statistics individually would be limited against certain covariance differences and could lead to underpowered results. For example, a test statistic based on only a single singular value would be underpowered when several singular values drive the difference in covariance. On the other hand, the Frobenius norm $\|\hat\Sigma_1 - \hat\Sigma_2\|_F$ is well-suited when each entry of $\hat\Sigma_1 - \hat\Sigma_2$ has non-zero expectation, but it could be underpowered when the entries with non-zero expected values are sparse. Also, even when the signals are dense, considering that the Frobenius norm of a matrix is equivalent to the square root of the sum of squares of all of its singular values, it may be underpowered when $\Sigma_1 - \Sigma_2$ is low-rank.
For a fixed $k$, one way to characterize the difference between sample covariance matrices is through the Ky-Fan($k$) norm, giving the statistic

$$ T_k = \|\hat\Sigma_1 - \hat\Sigma_2\|_{(k)} = \sum_{i=1}^{k} \sigma_i(\hat\Sigma_1 - \hat\Sigma_2), \qquad (1) $$

where $1 \le k \le p$ and $\sigma_i(M)$ is the $i$th largest singular value of a matrix $M$. If $\Sigma_1 - \Sigma_2$ is low-rank, there will likely exist a $k$ such that this Ky-Fan($k$) norm captures most of the variation of the signal and, by excluding the bottom $p - k$ singular values, ignores noise introduced through finite-sample variability.
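As a concrete sketch, the statistic can be computed directly from the singular values of the difference of sample covariance matrices (Python/NumPy; function and variable names are illustrative, not from the paper's implementation):

```python
import numpy as np

def ky_fan_stat(X1, X2, k):
    """Ky-Fan(k) statistic: sum of the top-k singular values of the
    difference of the two sample covariance matrices (an illustrative
    sketch, not the paper's implementation)."""
    S1 = np.cov(X1, rowvar=False)   # rows are samples, columns are features
    S2 = np.cov(X2, rowvar=False)
    # Singular values of the symmetric difference, in descending order.
    sv = np.linalg.svd(S1 - S2, compute_uv=False)
    return sv[:k].sum()
```

For $k = 1$ this reduces to the operator norm of the difference, and for $k = p$ to its nuclear norm.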
2.3 Adaptive Ky-Fan statistic
Although $T_k$ is a well-motivated test statistic for a prespecified $k$, it is unclear in advance what value of $k$ will maximize its power. If $k$ is chosen too small, then $T_k$ may fail to capture signal which exists outside of the top $k$ singular values of $\hat\Sigma_1 - \hat\Sigma_2$. Alternatively, if $k$ is chosen too large, then some of the singular values included in $T_k$ will be noise, decreasing the signal-to-noise ratio of $T_k$. Section 3 further discusses this signal-to-noise trade-off in the asymptotic setting.
The problem of selecting the ‘optimal’ $k$ motivates the proposed method, rank-adaptive covariance testing (RACT), which considers the adaptive test statistic:

$$ T = \max_{k \in \mathcal{K}} \frac{T_k - \mu_k}{\sigma_{0,k}}, \qquad (2) $$

where $\mathcal{K} = \{1, \dots, K\}$ indexes the collection of Ky-Fan($k$) norms from $1$ to $K$, and $\mu_k$, $\sigma_{0,k}^2$ are the expected value and variance of $T_k$ under $H_0$.
As discussed in Section 3, for normal data in the asymptotic setting, we show that the signal-to-noise ratio of $T_k$ depends on $k$ and the unknown covariances $\Sigma_1$ and $\Sigma_2$. Therefore, it is difficult to ascertain in advance which $k$ will attain the maximum signal-to-noise ratio for $T_k$. By taking a maximum across different values of $k$ for an appropriately normalized $T_k$, under the alternative the behavior of $T$ will resemble the behavior of the signal-to-noise-maximizing $T_k$, and similarly the power of $T$ will be close to the power of this $T_k$ (our simulations in Section 4 reflect this adaptivity). By including a diverse set of norms in $\mathcal{K}$, an investigator is freed from having to make a potentially consequential decision as to what single matrix norm should be used for testing $H_0$, while at the same time potentially benefiting from the inclusion of norms which are sensitive to certain types of structures in $\Sigma_1 - \Sigma_2$.
The maximum $K$ can be chosen either as $p$, or as a smaller value that reflects prior knowledge of the data or computational considerations (a truncated SVD for the top $K$ singular values is cheaper than a full SVD). In this paper, we choose $K$ to be the smallest $k$ such that the variation of the pooled sample covariance $\hat\Sigma$ explained by its top $k$ singular values exceeds 80%. This is done as the bottom 20% of the singular values are unlikely to provide much of the signal for our test statistic in the high-dimensional setting, and by using observations from both groups we ensure $K$ is the same for all permutations of the data.
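The 80% rule can be sketched as a small helper acting on the pooled sample covariance (Python/NumPy; the helper name is illustrative):

```python
import numpy as np

def choose_K(S_pool, threshold=0.80):
    """Smallest K such that the top-K singular values of the pooled
    sample covariance S_pool explain more than `threshold` of the total
    singular-value mass (an illustrative sketch of the rule above)."""
    sv = np.linalg.svd(S_pool, compute_uv=False)   # descending order
    frac = np.cumsum(sv) / sv.sum()
    # First index whose cumulative fraction strictly exceeds the threshold.
    return int(np.searchsorted(frac, threshold, side="right") + 1)
```

For instance, a covariance with singular values (8.5, 1.0, 0.5) gives $K = 1$, since the leading singular value alone explains 85% of the total.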
2.4 Permutation testing
We use permutation both to estimate $\mu_k$ and $\sigma_{0,k}$ and to set a critical value for conducting hypothesis testing based on $T$. The use of permutation to set a critical value is attractive due to the dependencies inherent in the statistics $T_1, \dots, T_K$. Even if the asymptotic distribution of $T$ were known, the critical values from this distribution could be inaccurate for Type I error control at a finite $n$. We create $B$ permuted datasets (permuting subjects between the two groups) and take the empirical means and standard deviations of $T_k$ for all $k \in \mathcal{K}$ to estimate $\mu_k$ and $\sigma_{0,k}$. Then, using these same permuted datasets, we set the critical value at level $\alpha$ for testing $H_0$ as the $(1-\alpha)$th quantile of $T^{(1)}, \dots, T^{(B)}$, where each $T^{(b)}$ is calculated in the same way as in (2) except each $T_k$ is calculated using the $b$th permuted dataset. We then reject $H_0$ if $T$ is larger than this critical value.
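The full permutation scheme can be sketched end-to-end as follows (Python/NumPy; an illustrative sketch with assumed names, not the paper's implementation):

```python
import numpy as np

def ky_fan_profile(X1, X2, K):
    """T_k for k = 1..K: partial sums of the singular values of the
    difference of the two sample covariance matrices."""
    diff = np.cov(X1, rowvar=False) - np.cov(X2, rowvar=False)
    sv = np.linalg.svd(diff, compute_uv=False)
    return np.cumsum(sv[:K])

def ract_pvalue(X1, X2, K=5, B=500, seed=0):
    """Permutation sketch of the adaptive test: standardize each T_k by
    its permutation mean/sd, take the max over k, and compare against
    the same max statistic computed on each permuted dataset."""
    rng = np.random.default_rng(seed)
    n1 = X1.shape[0]
    Z = np.vstack([X1, X2])
    T_obs = ky_fan_profile(X1, X2, K)
    T_perm = np.empty((B, K))
    for b in range(B):
        idx = rng.permutation(Z.shape[0])        # relabel subjects
        T_perm[b] = ky_fan_profile(Z[idx[:n1]], Z[idx[n1:]], K)
    mu = T_perm.mean(axis=0)
    sd = T_perm.std(axis=0, ddof=1)
    stat = np.max((T_obs - mu) / sd)             # adaptive statistic, cf. (2)
    null = np.max((T_perm - mu) / sd, axis=1)    # permutation null of the max
    return (1 + np.sum(null >= stat)) / (1 + B)
```

Under the null the returned $p$-value is approximately uniform, while a strong covariance difference drives it toward its smallest attainable value $1/(1+B)$.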
Remark 1
While Section 3 suggests that for normally distributed data in the asymptotic setting these statistics are, for certain values of $k$, marginally normal, making the standardized $T_k$ comparable across different $k$, for finite sample sizes we observe these statistics often deviate substantially from normality, particularly for small $k$. We find that a minimum $p$-value approach is more robust in smaller samples, where the test statistic is defined by $T_{\min p} = \min_{k \in \mathcal{K}} p_k$, with $p_k$ the permutation $p$-value corresponding to $T_k$. Further details are provided in the Appendix.
3 Asymptotic analysis
Understanding the (asymptotic) properties of the Ky-Fan norms is necessary for obtaining deeper insights into the power under the alternative hypothesis. Assumptions on the underlying true distributions will be made to derive simple analytical forms below, but they are not required for Type I error control, because permutation effectively controls the Type I error at any finite sample size (Van der Vaart, \APACyear2000).
Throughout this section, we consider two-sample observations from the two groups and assume the sample-size ratios $n_1/n$ and $n_2/n$ are bounded away from zero as $n \to \infty$. We will next elucidate a trade-off between the signal and noise increments that determines an optimal choice of $k$ when using the Ky-Fan($k$) norms.
For ease of discussion, we next consider scenarios where the eigendecomposition of $\Delta = \Sigma_1 - \Sigma_2$ is unique, so that all of the sample Ky-Fan($k$) norms converge to well-defined limits. Suppose $\Delta = U \Lambda V^\top$ is the singular value decomposition with ordered singular values $\sigma_1 \ge \cdots \ge \sigma_p \ge 0$. By matrix properties, since $\Delta$ is symmetric, we can write the eigendecomposition $\Delta = \sum_{i=1}^{p} \lambda_i u_i u_i^\top$ with $\sigma_i = |\lambda_i|$ and $v_i = \operatorname{sign}(\lambda_i)\, u_i$ for $i = 1, \dots, p$.
Theorem 1

For $k = 1, \dots, p$, define

$$ D_k = \|\Sigma_1 - \Sigma_2\|_{(k)} = \sum_{i=1}^{k} \sigma_i. \qquad (3) $$

Let $T_k = \|\hat\Sigma_1 - \hat\Sigma_2\|_{(k)}$.

- (i) When $\sigma_k > 0$,

$$ \sqrt{n}\,\bigl(T_k - D_k\bigr) \xrightarrow{d} \mathcal{N}\bigl(0, \varsigma_k^2\bigr), $$

where $\xrightarrow{d}$ represents convergence in distribution.

- (ii) When $\sigma_k = 0$, $\sqrt{n}\,(T_k - D_k)$ converges in distribution to a functional of a random matrix $W$, where $W$ is jointly normal with mean zero and covariance determined by $\Sigma_1$, $\Sigma_2$, and the sample-size ratio.
By Theorem 1, under $H_1$, the statistics $T_k$ with $\sigma_k > 0$ are asymptotically jointly normal with asymptotic covariances $\varsigma_{kl}$, where we define $\varsigma_{kk} = \varsigma_k^2$ for $k = 1, \dots, p$. In a special case where the $u_i$'s are also eigenvectors of $\Sigma_1$ and $\Sigma_2$, we have $\varsigma_{kl} = \varsigma_k^2$ when $k \le l$, suggesting that the successive increments of the $T_k$'s are independent.
Remark 2
By the properties of the normal distribution, the limiting mean and variance of $T_k$ change with respect to $k$. This motivates us to consider the standardization used in Section 2.3. In this way, the Ky-Fan statistics are put on a standardized scale under $H_0$, so that taking the maximum yields a fair comparison across $k$.
Based on Theorem 1, we study the asymptotic power of the individual Ky-Fan statistic $T_k$. Let $q_{k,\alpha}$ represent the upper $\alpha$-level critical value of $T_k$ under $H_0$. For a given $k$, suppose we reject $H_0$ if and only if $T_k > q_{k,\alpha}$. Let $\mathbb{P}_{H_1}$ denote the probability measure under $H_1$. Then the test power is

$$ \mathbb{P}_{H_1}\bigl(T_k > q_{k,\alpha}\bigr). \qquad (4) $$

When $\sigma_k > 0$, by Part (i) of Theorem 1, (4) can be approximated by

$$ 1 - \Phi\!\left( \sqrt{n}\,\frac{q_{k,\alpha} - D_k}{\varsigma_k} \right), \qquad (5) $$

where $\Phi$ represents the cumulative distribution function of $\mathcal{N}(0,1)$. We define the signal-to-noise ratio as $\mathrm{SNR}_k = D_k / \varsigma_k$. When $n$ is large such that $q_{k,\alpha}$ is very small relative to $D_k$, the power formula (5) is dominated by $\sqrt{n}\,\mathrm{SNR}_k$, and a higher value of $\mathrm{SNR}_k$ indicates a larger value of (5), i.e., higher asymptotic power.
We next compare the asymptotic power of different Ky-Fan statistics by comparing their signal-to-noise ratios.
Proposition 2

For two indexes $k < l$, define

$$ \Delta_S = \frac{D_l - D_k}{D_k}, \qquad \Delta_N = \frac{\varsigma_l - \varsigma_k}{\varsigma_k}, $$

as the relative increments of signal and noise, respectively. We have $\mathrm{SNR}_l \ge \mathrm{SNR}_k$ if and only if $\Delta_S \ge \Delta_N$.
We next discuss the interpretation of Proposition 2. As an example, when $\sigma_{k+1} = 0$ and $l = k+1$, we have $\Delta_S = 0$ (no signal increment) whereas $\Delta_N > 0$ (a positive noise increment), so that $\mathrm{SNR}_{k+1} < \mathrm{SNR}_k$ by Proposition 2. This suggests that the test power/SNR of Ky-Fan($k+1$) would be smaller than that of Ky-Fan($k$); thus it is likely that including sample singular values beyond the rank of $\Sigma_1 - \Sigma_2$ would not help enhance the test power. Moreover, in this case, we anticipate that Ky-Fan($k$) might have higher power than the popular Frobenius norm statistic. When $\sigma_{k+1} > 0$, the condition $\Delta_S > \Delta_N$ can be interpreted as the relative signal increment being larger than the relative noise increment. This suggests that even when the signal has a positive increment, whether the test power can be enhanced depends on the trade-off between signal and noise increments. Such an intricate trade-off partially shows the difficulty of determining an optimal $k$ exactly in observational data, and partially justifies the proposed adaptive version which integrates different Ky-Fan statistics in Equation (2).
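The trade-off can be illustrated with a small Monte Carlo sketch (Python/NumPy; the populations, sample size, and replication count are toy values, and $D_k$ denotes the population Ky-Fan($k$) norm of the covariance difference with $T_k$ its sample counterpart). For a rank-2 difference, the signal $D_k$ stops growing after $k = 2$ while the sampling variability of $T_k$ continues to change, so the Monte Carlo SNR typically peaks near the true rank:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 20, 200, 300
# Toy populations: Sigma2 adds a rank-2 perturbation to Sigma1 = I,
# so the population signal D_k is flat for k >= 2.
V = np.linalg.qr(rng.standard_normal((p, 2)))[0]
Sigma1 = np.eye(p)
Sigma2 = np.eye(p) + 2.0 * V @ V.T

D = np.cumsum(np.linalg.svd(Sigma2 - Sigma1, compute_uv=False))  # signal D_k
L = np.linalg.cholesky(Sigma2)

T = np.empty((reps, p))
for i in range(reps):
    X1 = rng.standard_normal((n, p))          # sample from N(0, Sigma1)
    X2 = rng.standard_normal((n, p)) @ L.T    # sample from N(0, Sigma2)
    sv = np.linalg.svd(np.cov(X1, rowvar=False) - np.cov(X2, rowvar=False),
                       compute_uv=False)
    T[i] = np.cumsum(sv)                      # T_k for k = 1..p

snr = D / T.std(axis=0, ddof=1)   # Monte Carlo signal-to-noise per k
```

Plotting `snr` against $k$ for constructions like this gives an empirical picture of the signal-noise trade-off discussed above.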
4 Simulation study
4.1 Simulation setup
We conducted extensive simulation studies to better understand the performance of RACT. All simulations, unless otherwise noted, were run with the same sample sizes and dimension. For simulations related to the null hypothesis, the covariance matrix was the same for all observations; otherwise $\Sigma_1$ and $\Sigma_2$ represent the covariance matrices for group 1 and group 2. We simulated data from covariance matrices with changing low-rank and low-rank block structures, to reflect the block structures commonly found in biomedical data. In the data simulation settings S1-S3 below, we let $\Sigma_1 = \Sigma_0 + B_1 B_1^\top$ and $\Sigma_2 = \Sigma_0 + B_2 B_2^\top$, where $B_1$ and $B_2$ are low-rank matrices, so that $\Sigma_1 - \Sigma_2$ is low-rank. We define $r_1$ and $r_2$ as the ranks of $B_1$ and $B_2$; in our simulations the ranks are set so that $\Sigma_1 - \Sigma_2$ has rank at most 4 or at most 10. To randomly generate appropriately sized blocks of the low-rank components of the covariance matrices, we generated the columns of $B_1$ and $B_2$ independently using the first singular vectors from randomly generated matrices with i.i.d. standard normal entries. Figure 1 provides graphical illustrations of our simulation settings.

- S1. (LowRank): We set $\Sigma_1$ and $\Sigma_2$ to differ by a dense low-rank perturbation. Here, all elements of the covariance matrix experience a change with high probability.

- S2. (LowRankBlockSmall): We set the change to occur within a small block of the covariance matrix.

- S3. (LowRankBlockLarge): A similar setup to S2, except the changing block is large.

- S4. (OffDiagonal): Here $\Sigma_1$ has an equicorrelation structure, with unit variances on the diagonal and a common correlation off the diagonal. $\Sigma_2$ is equal to $\Sigma_1$ except that the covariances between two sets of dimensions are altered.
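An S4-style construction can be sketched as follows (Python/NumPy; the correlation, dimension, and altered block are assumed values for illustration, since the paper's exact constants are not reproduced here). Note that the resulting difference $\Sigma_1 - \Sigma_2$ has exactly two non-zero singular values, consistent with the two very large singular values noted in the results for this setting:

```python
import numpy as np

def equicorrelation(p, rho):
    """Equicorrelation matrix: unit variances, common correlation rho."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

# Illustrative S4-style setup (rho, p, and the block are assumed values).
p, rho = 100, 0.3
Sigma1 = equicorrelation(p, rho)
Sigma2 = Sigma1.copy()
Sigma2[:10, 10:20] = 0.0   # remove covariances between two sets of dimensions
Sigma2[10:20, :10] = 0.0
```

The difference matrix is non-zero only on one symmetric pair of off-diagonal blocks, each of rank one, which is what produces the two large singular values.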
To evaluate Type I error, we generated 10000 datasets and used 2000 permutations for each dataset. For power analysis we used 1000 datasets and 1000 permutations.
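The low-rank construction used in settings S1-S3 can be sketched as follows (Python/NumPy; the dimension, rank, and scaling are illustrative values rather than the paper's settings):

```python
import numpy as np

def low_rank_factor(p, r, rng):
    """p x r matrix whose columns are the first r left singular vectors
    of a random Gaussian matrix, mirroring the generation described in
    the setup above (illustrative, not the paper's exact code)."""
    G = rng.standard_normal((p, p))
    U, _, _ = np.linalg.svd(G)
    return U[:, :r]

rng = np.random.default_rng(1)
p, r, delta = 100, 4, 1.5               # assumed values, not the paper's
V = low_rank_factor(p, r, rng)
Sigma1 = np.eye(p)
Sigma2 = np.eye(p) + delta * V @ V.T    # rank-r difference between groups
```

Sampling multivariate normal data with these two covariances then gives datasets whose covariance difference has exactly $r$ non-zero singular values.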
4.2 Simulation results
4.2.1 Control of Type I error
RACT provided reliable Type I error control in all of the simulation setups, as expected from the permutation scheme. Across all simulation setups, RACT's empirical Type I error at the 0.05 level fell between 0.047 and 0.054.
4.2.2 Two-sample testing

The performance of RACT in two-sample testing allows us to understand how the power of individual norms drives the overall power of RACT. Figure 2 shows the power of RACT when specific single norms were used, as well as when multiple norms were included as described in Section 2.3. The individual norms we present are the operator norm (i.e., Ky-Fan(1)), Ky-Fan(4), Ky-Fan(10), and Ky-Fan(25), as they help demonstrate, in our simulation settings, the impact varying $k$ has on the power of the Ky-Fan($k$) norm. Recall that because RACT selects $K$ based on the pooled covariance matrix, RACT may not include all of the individual norms we compare it to in all simulations.
- S1. (LowRank): For this covariance structure, in the lower-rank setting RACT's power was similar to that of the Ky-Fan(4) norm, which we would expect to have high power as $\Sigma_1 - \Sigma_2$ is of rank at most 4. Increasing the rank led to an improvement in the performance of the higher Ky-Fan($k$) norms and a relative underperformance of the operator norm. Interestingly, in the higher-rank case the Ky-Fan(4) norm still performed quite strongly, reflecting the difficulty of ascertaining in advance the ideal set of Ky-Fan($k$) norms for RACT to include in the finite-sample setting.

- S2. (LowRankBlockSmall): As the change is relegated to a small block, we saw a deterioration in the performance of the operator norm and an increase in the performance of the higher Ky-Fan($k$) norms. This setting demonstrates the way in which a very small changing block is in some ways an adversarial setting for RACT. Here the norms we examined which provided the greatest individual power were the Ky-Fan(10) and Ky-Fan(25) norms. However, RACT's power was lower than these, as it suboptimally chose $K$ due to the size, and hence large singular values, associated with the unchanging block. For example, at the effect size associated with RACT's power equal to 0.5, $K$ was chosen to be 7.46 on average, implying that the higher-power Ky-Fan($k$) norms were often excluded from consideration.

- S3. (LowRankBlockLarge): Here we saw that when a large block experienced a change, the operator norm markedly underperformed. At the effect size associated with RACT's power equal to 0.5, on average RACT chose $K$ as 19.6 and 11.6 in the two rank settings, respectively. These choices of $K$ contributed to RACT's high power, as the Ky-Fan($k$) norms for large values of $k$ were particularly powerful.

- S4. (OffDiagonal): This setting produces two very large singular values in $\Sigma_1 - \Sigma_2$, and this is reflected in the strong performance of the operator norm. Notable was the very poor performance of the Ky-Fan(25) norm, an indication that including too many Ky-Fan($k$) norms may lead to a deterioration of performance.
One interesting result seen in S2 and S3, but not in S1, is the strong performance of the Ky-Fan(25) norm, despite the fact that the rank of $\Sigma_1 - \Sigma_2$ is at most 4 and 10 in the two rank settings, respectively. This can be attributed to the fact that for S2 and S3, in the finite-sample setting, the top singular values of $\hat\Sigma_1 - \hat\Sigma_2$ can be driven by the unchanging block of $\Sigma_1$ and $\Sigma_2$ for small effect sizes. This led the signal arising from the changing block to be excluded from the Ky-Fan($k$) norms for small values of $k$, reducing their power. This points to the difficulty of ascertaining the optimal $k$ in advance, as the interaction of the covariance structures, the effect size, and the sample size seems to determine which Ky-Fan($k$) norms produce the highest power.
5 Real data analysis
We examine the performance of RACT on two separate datasets and compare it to five competitors: SY (Srivastava \BBA Yanagihara, \APACyear2010), LC (Li \BBA Chen, \APACyear2012), CLX (Cai \BOthers., \APACyear2013), HC (He \BBA Chen, \APACyear2018), and DHW (Ding \BOthers., \APACyear2024). The competing methods are implemented using the R package UHDtst available at https://github.com/xcding1212/UHDtst. Since all of the competing methods are based on asymptotic results and have reported inflated Type I error rates at small sample sizes, we implement permutation-based versions of SY, LC, CLX, and HC for a fair comparison. We do not implement a permutation-based version of DHW given its high computational cost. For the hyperparameters of each method we follow the default implementation in UHDtst; namely, HC tests its default number of superdiagonals, and for DHW the data-splitting algorithm is repeated 100 times.
For both applications, we compare empirical Type I error rates by conducting two-sample tests when both samples are from the same group, and power when they are from different groups, for different sizes of subsamples from the full sample. When comparing RACT to other methods we select $K$ as described in Section 2.3, and when comparing the power of individual Ky-Fan($k$) norms relative to the power of RACT we fix $K$ for comparability across simulations. We present results where we conduct two-sample tests using the covariance matrix, as results when using correlation are broadly similar. We note that the Type I error and power of RACT are nearly identical for covariance and correlation, whereas there is more variability for the competing methods. We provide a visualization of the difference in covariance matrices between groups, and their low-rank approximations, for both datasets in Figure 3.

5.1 TCGA lung cancer data
The Cancer Genome Atlas (TCGA) contains 11,000 samples from 33 tumor types for a range of data modalities and has enabled a better understanding of the molecular changes which cancer causes (Hutter \BBA Zenklusen, \APACyear2018). Of the data modalities available from TCGA, we focus on gene expression, which serves as a vital link in understanding the mechanism through which DNA mutations can lead to cancer. The study of gene expression, recently aided by the development of new sequencing technologies such as RNA-Seq (Wang \BOthers., \APACyear2009), has led to a better understanding of the differences between cancer and normal cells (and therefore serves as a potential biomarker), and has helped in the identification of new disease subtypes (Gerstung \BOthers., \APACyear2015). Specifically, we analyze gene expression networks in two types of lung cancer: lung squamous cell carcinoma (LUSC), a common type of lung cancer (The Cancer Genome Atlas Research Network, \APACyear2012), and lung adenocarcinoma (LUAD), a leading cause of cancer death (The Cancer Genome Atlas Research Network, \APACyear2014). Gene expression for protein-coding genes for these tumors was downloaded using the BiocManager R package (Morgan \BBA Ramos, \APACyear2024), for which LUSC and LUAD tumor samples were available for 19962 genes. We log-transform the data and keep the genes which exhibit the most variability across both tumor types. Finally, within each tumor type, we regressed out age and sex from the data and then used the residuals for our evaluations.
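This preprocessing pipeline can be sketched as follows (Python/NumPy; the log2(x + 1) transform, the number of retained genes, and all names are assumptions for illustration, not the paper's exact choices):

```python
import numpy as np

def preprocess(expr, age, sex, n_top=1000):
    """Sketch of the preprocessing described above: log-transform,
    keep the n_top most variable genes, then regress out age and sex
    within a group and return the residuals. The log2(x + 1) transform
    and n_top are assumed values, not the paper's exact choices."""
    X = np.log2(expr + 1.0)
    keep = np.argsort(X.var(axis=0))[::-1][:n_top]      # most variable genes
    X = X[:, keep]
    Z = np.column_stack([np.ones(len(age)), age, sex])  # intercept, age, sex
    beta, *_ = np.linalg.lstsq(Z, X, rcond=None)        # per-gene regression
    return X - Z @ beta                                 # residuals
```

The returned residual matrix, computed separately within each tumor type, is what the two-sample covariance tests are applied to.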

Using all samples, we see in the difference of covariance matrices between tumor types some evidence of a low-rank structure: the first singular value and the sum of the first 93 singular values represent 16% and 80%, respectively, of the total sum of all singular values. This suggests that when testing the equality of these covariance matrices, much of the signal can be found in a limited number of low-rank structures. This is reflected in Figure 4, where we examine the empirical power of various Ky-Fan($k$) norms for a fixed subsample size. A rapid increase in power is seen as $k$ increases, before power peaks and decreases modestly thereafter. The adaptivity of RACT is demonstrated in that its empirical power, which takes into account multiple $k$, settles near the maximum power across all $k$.
For comparing RACT to other methods, we first note that for moderate subsample sizes of between 15 and 50 we see LC and HC exhibit slightly inflated Type I error of approximately 10%, while CLX and DHW experience more severe Type I error inflation. Hence we use permutation-based versions of LC, HC, and CLX, and exclude DHW from our power comparison. RACT's Type I error is controlled close to the nominal level, while SY is slightly conservative. Figure 4 shows RACT exhibits increased power compared to SY, LC, and CLX, and power comparable to that of HC.
5.2 SPINS diffusion tensor imaging data
The second dataset we analyze is from the Social Processes Initiative in Neurobiology of the Schizophrenia(s) (SPINS) group (Hawco \BOthers., \APACyear2021). This dataset consists of diffusion tensor imaging (DTI) measurements of fractional anisotropy (FA) and mean diffusivity (MD). DTI is a non-invasive technique which measures the diffusion of water molecules within neural tissue. MD relates to the amount of diffusion which is occurring, while FA is a summary statistic related to the architecture of the tissue being examined, influenced by how uniform diffusion is across different directions (O’Donnell \BBA Westin, \APACyear2011; Van Hecke \BOthers., \APACyear2016). DTI has been found useful in a clinical setting in providing biomarkers for amyotrophic lateral sclerosis, and as a diagnostic tool for Parkinson’s disease (Tae \BOthers., \APACyear2018). Zhang \BOthers. (\APACyear2023) investigated the biases that are introduced in DTI imaging in the SPINS study when measurements are taken at different sites or using different scanners, specifically inter-scanner covariance heterogeneity. In the SPINS study most sites began with General Electric 3T (GE) scanners, and all ended with Siemens Prisma 3T (SP) scanners. In the analysis below we provide further evidence of inter-scanner covariance heterogeneity in the SPINS study. Along the lines of Zhang \BOthers. (\APACyear2023), we use linear regression to remove the effects of age, age$^2$, gender, diagnosis, age $\times$ gender, and age $\times$ diagnosis, and then use the residuals to test for differences in the covariance matrix of observations for each scanner type. For both FA and MD, each observation has the same dimension, and we have observations from both GE and SP scanners.
An examination of the difference in covariance matrices using all samples reveals a low-rank structure for both FA and MD measurements. For FA, the first singular value and the sum of the first 19 singular values represent 30% and 80% of the total sum of all singular values. The low-rank structure is more pronounced for MD, where the first singular value represents 59% of the total sum, and the sum of the top 6 singular values represents 81% of the total sum. Using a fixed subsample size of 30 and 15 for FA and MD respectively, we see in Figure 4 that the power of the individual Ky-Fan($k$) norms is maximized at an intermediate $k$ for both FA and MD. Reflecting the lower-rank structure of the MD data, we see a less steep increase in power as $k$ increases relative to FA. Similar to the TCGA data, we see across both FA and MD datasets inflated Type I errors for LC, CLX, HC, and DHW; therefore we use permutation-based versions of LC, CLX, and HC and exclude DHW from the power analysis. RACT's Type I error control is accurate, while SY is very conservative (Type I error of 0%). Relative to the other methods we see results similar to those for the TCGA dataset, with HC once again performing most comparably to RACT.
5.3 Relative performance of competing methods
Across TCGA, SPINS FA, and SPINS MD, we note that RACT's best relative performance appears for the SPINS MD dataset, where its empirical power is strictly higher than the other methods' for subsample sizes up to 35. This potentially reflects the relatively lower-rank structure of the SPINS MD data, and correspondingly RACT's ability to leverage this structure better than other methods. It is also notable that RACT's performance is consistently strong across all datasets, whereas the performance of SY, CLX, and LC is more variable; this would be expected given RACT's adaptive test statistic, as compared to the test statistics of the other methods, which are more sensitive to the specific form the difference in covariance takes. The method performing most similarly to RACT is HC. However, HC's superdiagonal-based test statistic causes its performance to vary based on the ordering of the dimensions, and an arbitrary reordering of the dimensions of these datasets will produce different power results. This could particularly affect HC's results on SPINS FA, which we observe in Figure 3 to have several blocks along the diagonal of the difference in covariance matrices. This contrasts with RACT's test statistic, which is invariant to permutation of the dimensions.
6 Discussion
In this paper we propose a novel two-sample covariance testing method that improves power by leveraging the low-rank structures commonly found in genomics and neuroimaging data. Underlying RACT is the use of the Ky-Fan(k) matrix norm, which is novel in the setting of two-sample covariance testing. In Section 3 we investigate the asymptotic properties of the Ky-Fan(k) norm and discover a delicate signal-to-noise trade-off that emerges for different values of k. This trade-off is reflected in our power simulations, where the Ky-Fan(k) norm that maximizes power is seen to differ between scenarios.
RACT utilizes an adaptive test statistic composed of a series of individual Ky-Fan(k) norms. RACT is able to adapt to differences in low-rank structures by virtue of the fact that, for an appropriate k, the Ky-Fan(k) norm captures most of the signal found in these low-rank differences. Figures 2 and 4 show that the power of RACT generally approaches that of the best-performing individual Ky-Fan(k) norm. Notably, in Figure 2 we see the potential downside of setting k too high, as there are cases where the power of the Ky-Fan(25) norm trails substantially behind that of smaller Ky-Fan(k) norms. This demonstrates the importance of RACT’s adaptivity, as one would not be able to replicate its performance using only a single Ky-Fan(k) norm with large k. However, we also see in Figure 4 that in our real data applications the Ky-Fan(k) norms for very small k have substantially reduced power (and in Figure 2 the operator norm generally does not maximize power). To address this, one could select a lower bound such that only Ky-Fan(k) norms with k at or above that bound are included, potentially choosing the bound in a fashion similar to how the upper bound is selected in Section 2.3. Another potential extension would be to take a maximum over test statistics other than Ky-Fan(k) norms. These test statistics could be chosen to maximize power for the application at hand; for example, if the differences in covariances were expected to have a banded structure, then a test statistic similar to that of He and Chen (2018) could be included.
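The adaptive idea described above can be sketched as follows. Because the Ky-Fan(k) norm is monotone in k, a raw maximum over k would be trivial, so each per-k norm is standardized before the maximum is taken; this sketch is a simplified stand-in for, not the exact definition of, the RACT statistic, and all names are ours:

```python
import numpy as np

def ky_fan_norm(M, k):
    # Ky-Fan(k) norm: sum of the k largest singular values of M
    s = np.linalg.svd(M, compute_uv=False)
    return float(s[:k].sum())

def adaptive_stat(X, Y, ks, n_perm=200, seed=None):
    """Per-k Ky-Fan norms of the difference in sample covariances,
    standardized by their permutation mean/sd, combined via a maximum.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n1 = X.shape[0]
    Z = np.vstack([X, Y])

    def norms(A, B):
        D = np.cov(A, rowvar=False) - np.cov(B, rowvar=False)
        return np.array([ky_fan_norm(D, k) for k in ks])

    obs = norms(X, Y)
    perm = np.empty((n_perm, len(ks)))
    for b in range(n_perm):
        idx = rng.permutation(Z.shape[0])  # shuffle group labels
        perm[b] = norms(Z[idx[:n1]], Z[idx[n1:]])
    z = (obs - perm.mean(axis=0)) / perm.std(axis=0)
    return float(z.max())  # adaptive: best standardized k wins
```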
The applications to gene expression networks and batch effects in neuroimaging presented in Section 5 show that RACT exhibits strong power relative to competing two-sample covariance testing methods and attains power close to the maximum over individual Ky-Fan(k) norms in each case (Figure 4). We see in Figure 3 the distinct low-rank nature of the differences in the covariance matrices for each dataset, and note RACT’s consistently strong performance compared to other methods whose relative performance is more volatile between datasets.
Section 4 showed that the rank of the true covariance difference has a significant effect on the relative power of the individual Ky-Fan(k) norms. Generally, the Ky-Fan(k) norms that maximize power were those where k is close to the rank of the difference. In biomedical data analysis, techniques such as principal component analysis are often used for dimension reduction. With this in mind, the power-maximizing k of RACT could be used to help guide the selection of how many principal components to include in downstream analysis.
In Section 3 we present the asymptotic distribution of the Ky-Fan(k) norm for a fixed k. While the limiting distribution of the adaptive test statistic remains an open question, we are able to control the Type I error via permutation. While this lack of a closed-form distribution is a limitation, we also note that Section 5 shows that, with finite samples, permutation may be necessary for many other methods to control the Type I error. A further extension could investigate the performance of RACT in the asymptotic setting where the dimension is not fixed, which would be valuable given the high-dimensional nature of many biomedical problems.
Acknowledgements
We would like to thank the SPINS group for providing access to the brain imaging data from the SPINS study. DV was partially supported by a doctoral fellowship from the University of Toronto’s Data Sciences Institute. YH was partially supported by the Wisconsin Alumni Research Foundation. JYP was partially supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2022-04831), the University of Toronto’s Data Sciences Institute, the McLaughlin Centre, and the Connaught Fund. The computing resources were enabled in part by support provided by the University of Toronto and the Digital Research Alliance of Canada (alliancecan.ca).
References
- Amar, D., Safer, H., & Shamir, R. (2013). Dissection of regulatory networks that are altered in disease via differential co-expression. PLoS Computational Biology, 9(3), e1002955.
- Cai, T., Liu, W., & Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501), 265–277.
- Carmon, J., Heege, J., Necus, J. H., Owen, T. W., Pipa, G., Kaiser, M., … Wang, Y. (2020). Reliability and comparability of human brain structural covariance networks. NeuroImage, 220, 117104.
- Chen, A. A., Beer, J. C., Tustison, N. J., Cook, P. A., Shinohara, R. T., Shou, H., & Alzheimer’s Disease Neuroimaging Initiative. (2022). Mitigating site effects in covariance for machine learning in neuroimaging data. Human Brain Mapping, 43(4), 1179–1195.
- Chowdhury, H. A., Bhattacharyya, D. K., & Kalita, J. K. (2019). (Differential) co-expression analysis of gene expression: A survey of best practices. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17(4), 1154–1173.
- Danaher, P., Paul, D., & Wang, P. (2015). Covariance-based analyses of biological pathways. Biometrika, 102(3), 533–544.
- Ding, X., Hu, Y., & Wang, Z. (2024). Two sample test for covariance matrices in ultra-high dimension. Journal of the American Statistical Association, just-accepted, 1–22.
- Eisen, M. B., Spellman, P. T., Brown, P. O., & Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25), 14863–14868.
- Gerstung, M., Pellagatti, A., Malcovati, L., Giagounidis, A., Porta, M., Jädersten, M., … Boultwood, J. (2015). Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes. Nature Communications, 6, 5901.
- Hawco, C., Dickie, E., Herman, G., Turner, J., Argyan, M., Homan, P., … Voineskos, A. (2021). Social Processes Initiative in Neurobiology of the Schizophrenia(s) Traveling Human Phantoms. OpenNeuro.
- He, J., & Chen, S. X. (2018). High-dimensional two-sample covariance matrix testing via super-diagonals. Statistica Sinica, 28(4), 2671–2696.
- Hu, F., Chen, A. A., Horng, H., Bashyam, V., Davatzikos, C., Alexander-Bloch, A., … Shinohara, R. T. (2023). Image harmonization: A review of statistical and deep learning methods for removing batch effects and evaluation metrics for effective harmonization. NeuroImage, 274, 120125.
- Hutter, C., & Zenklusen, J. C. (2018). The Cancer Genome Atlas: Creating lasting value beyond its data. Cell, 173(2), 283–285.
- Li, J., & Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2), 908–940.
- Lock, E. F., Park, J. Y., & Hoadley, K. A. (2022). Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. The Annals of Applied Statistics, 16(1), 193.
- Meng, S., Liu, G., Su, L., Sun, L., Wu, D., Wang, L., & Zheng, Z. (2018). Functional clusters analysis and research based on differential coexpression networks. Biotechnology & Biotechnological Equipment, 32(1), 171–182.
- Morgan, M., & Ramos, M. (2024). BiocManager: Access the Bioconductor Project Package Repository. R package version 1.30.25. https://CRAN.R-project.org/package=BiocManager
- O’Donnell, L. J., & Westin, C.-F. (2011). An introduction to diffusion tensor image analysis. Neurosurgery Clinics, 22(2), 185–196.
- Pan, W., Kim, J., Zhang, Y., Shen, X., & Wei, P. (2014). A powerful and adaptive association test for rare variants. Genetics, 197(4), 1081–1095.
- Park, J. Y., & Lock, E. F. (2020). Integrative factorization of bidimensionally linked matrices. Biometrics, 76(1), 61–74.
- Schott, J. R. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis, 51(12), 6535–6542.
- Srivastava, M. S., & Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101(6), 1319–1329.
- Stuart, J. M., Segal, E., Koller, D., & Kim, S. K. (2003). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643), 249–255.
- Tae, W.-S., Ham, B.-J., Pyun, S.-B., Kang, S.-H., & Kim, B.-J. (2018). Current clinical applications of diffusion-tensor imaging in neurological disorders. Journal of Clinical Neurology, 14(2), 129–140.
- The Cancer Genome Atlas Research Network. (2012). Comprehensive genomic characterization of squamous cell lung cancers. Nature, 489(7417), 519.
- The Cancer Genome Atlas Research Network. (2014). Comprehensive molecular profiling of lung adenocarcinoma. Nature, 511(7511), 543.
- Van der Vaart, A. W. (2000). Asymptotic Statistics (Vol. 3). Cambridge University Press.
- Van Hecke, W., Emsell, L., & Sunaert, S. (2016). Diffusion Tensor Imaging: A Practical Handbook (Vol. 1). Springer.
- Veronese, M., Moro, L., Arcolin, M., Dipasquale, O., Rizzo, G., Expert, P., … Turkheimer, F. (2019). Covariance statistics and network analysis of brain PET imaging studies. Scientific Reports, 9(1), 2496.
- Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: A revolutionary tool for transcriptomics. Nature Reviews Genetics, 10(1), 57–63.
- Yu, J., & Luo, X. (2021). Recovering spatially-varying cell-specific gene co-expression networks for single-cell spatial expression data. Frontiers in Genetics, 12, 656637.
- Zhang, R., Chen, L., Oliver, L. D., Voineskos, A. N., & Park, J. Y. (2024). SAN: Mitigating spatial covariance heterogeneity in cortical thickness data collected from multiple scanners or sites. Human Brain Mapping, 45(7), e26692.
- Zhang, R., Oliver, L. D., Voineskos, A. N., & Park, J. Y. (2023). RELIEF: A structured multivariate approach for removal of latent inter-scanner effects. Imaging Neuroscience, 1, 1–16.
- Zhu, L., Lei, J., Devlin, B., & Roeder, K. (2017). Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes. The Annals of Applied Statistics, 11(3), 1810.
Appendix
Appendix A Proofs of Results in Section 3
A.1 Proof of Theorem 1
Let denote the th largest singular value of . By matrix properties, we can write the eigendecomposition with . Similarly, let denote the th largest singular value of , and we can write the eigendecomposition with . To prove Theorem 1, we first provide Lemma 3 below, which gives the joint limiting distribution of the eigenvalues of .
Lemma 3 (Limits of eigenvalues)
where the limiting covariance matrix with defined the same as in (3).
For , , and . For ,
Then we have
where is jointly normal with mean zero and covariance .
It remains to prove Lemma 3. Let and . Then we have
Write with , where i.i.d. stands for independent and identically distributed, and . By , we have , where
It follows that . By , and the central limit theorem, we have
where with
(6)
It remains to derive below.
We first consider the first term in (6); the second can be derived similarly. For simplicity of notation, let
Then , and
Applying similar analysis to the second term in (6), we can obtain
A.1.1 Proof of Proposition 2
By the definition of , if and only if
Appendix B Minimum p-value Method
Below we outline the minimum p-value approach from Section 2.4. For this approach we calculate the critical value threshold in a fashion similar to Pan et al. (2014): for each individual k we use the permuted datasets to calculate a p-value, and then take a minimum across all k to obtain the minimum p-value
where the statistic is calculated in the same way as in (1), except with the data in permuted order instead of the original order. We note that only one set of permutations is needed: once the statistic has been calculated for all k and all permutations, the minimum p-value for each permutation can be computed directly from these values without conducting a second set of permutations.
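Assuming the permuted statistics for each k are stored column-wise in a matrix, the reuse of a single set of permutations described above can be sketched as follows (a hypothetical helper, not the paper's implementation):

```python
import numpy as np

def min_p_threshold(perm_stats, alpha=0.05):
    """perm_stats: (B, K) array of permuted statistics, one column per k.
    Converts each permuted statistic to a within-column p-value, takes
    the minimum over k for each permutation, and returns the alpha-quantile
    of that minimum-p null distribution (reject if observed min-p <= it)."""
    B, K = perm_stats.shape
    # rank within each column: 0 = smallest statistic
    ranks = perm_stats.argsort(axis=0).argsort(axis=0)
    # larger statistic -> smaller p-value
    p_perm = (B - ranks) / B
    min_p = p_perm.min(axis=1)  # minimum p-value per permutation
    return float(np.quantile(min_p, alpha))
```

This avoids a second, nested layer of permutations: the same B permuted datasets supply both the per-k p-values and the reference distribution of their minimum.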