PC Adjusted Testing for Low Dimensional Parameters

Sohom Bhattacharya S. Bhattacharya Department of Operations Research and Financial Engineering
Princeton University
Princeton, NJ, 08540, US. [email protected] , Rounak Dey R. Dey Department of Biostatistics,
Harvard T.H. Chan School of Public Health
Boston, MA 02115, US [email protected] and Rajarshi Mukherjee R. Mukherjee Department of Biostatistics,
Harvard T.H. Chan School of Public Health
Boston, MA 02115, US [email protected]

Abstract.

In this paper we consider the effect of high dimensional Principal Component (PC) adjustments while inferring the effects of variables on outcomes. This problem is particularly motivated by applications in genetic association studies where one performs PC adjustment to account for population stratification. We consider simple statistical models to obtain asymptotically precise understanding of when such PC adjustments are supposed to work in terms of providing valid tests with controlled Type I errors. We also verify these results through a class of numerical experiments.

1. Introduction

Testing statistical hypotheses, where one needs to take special care of complications of high dimensional data structures is a staple in modern data science. Indeed, a quintessential example of this type problems arise naturally in the context of genetic association studies. Although such questions are not specific to the domain of genetic associate studies, the seminal paper of price2006principal has put forward a Principal Component (PC) adjustment based procedure to adjust for population stratification type confounding and has been applied across a vast canvas of omics type applications beyond Genome Wide Association Studies (GWAS). As another specific example, barfield2014accounting explores the applicability of the same principle in epigenome wide association studies. A major appeal of this method, that we will try lend a theoretical lens to in this paper, is availability of the EIGENSTRAT package of price2006principal that allows fast implementation of this principle and has therefore been immensely popular as a standard adjustment procedure to account for confounding bias in such studies.

In spite of the immense practical popularity of PC adjustment for high dimensional nuisance parameters, the most basic theoretical questions remain unanswered for this high popular and regularly used principle. For example, the most traditional statistical questions that arise as a part of planning a real data analysis pertains to sample size and power calculations. Interestingly, the answers to even such simple questions remain unexplored. Indeed, the two stage method involving a principal component adjustment before testing for genetic association, implies interesting statistical phenomenon in high dimensional problems where the number of genetic variants under study is large (patterson2006population) . In contrast to such implied caveats and potential pathologies of inconsistencies in high dimensional PCA (johnstone2009consistency), empirical success of the procedure in practical problems in turn go far towards making a strong case for the PC based adjustments in these applications (yang2014principal). This paper is aimed at providing first steps towards a theoretical formalism and reconciling these issues and thereby potentially resolve questions regarding when and why such PC-based adjustments are reliable and mathematically valid.

2. Setup

We consider data on $(Y_{i},\mathbf{X}_{i})_{i=1}^{n}\stackrel{{\scriptstyle\rm i.i.d.}}{{\sim}}\mathbb{P}$ where where $Y_{i}\in\mathbb{R}$ denotes an outcome of interest and $\mathbf{X}_{i}\in\mathbb{R}^{p}$ collects predictor variables under study. In many case, it is of interest to test the effect of specific variables/components in $\mathbf{X}$ on $Y$ while conditioning on other variables in $\mathbf{X}$ . More precisely, we will partition $\mathbf{X}$ as $\mathbf{X}=[A,\mathbf{W}]^{T}$ where $A\in\mathbb{R}$ is the variable whose effect on $Y$ is of interest and $\mathbf{W}$ collects confounders one wishes to adjust while decoding this relationship. Our main question of interest concerns a specific way of inferring the effect of $A$ on $Y$ – namely principal component adjustment for the confounding variables in $\mathbf{X}$ . We will study the precise asymptotic properties of this specific method through the lens of a hypothesis testing problem for the effect of $A$ on $Y$ . To obtain sharp asymptotics and precise answers, for our theoretical developments we will focus on the Gaussian linear regression set-up where {equs}Y_i=A_iδ+W_i^Tβ+ϵ_i, ϵ_i∼N(0,σ^2_y), and thereby consider the hypothesis testing problem for $\delta$ defined as {equs}H_0: δ=0 vs H_1(h):δ=hn. The choice of the alternative converging to $0$ at $1/\sqrt{n}$ rate is driven by the fact that the detection threshold for testing $\delta$ scales accordingly in the asymptotic regimes we will work with in this paper. Subsequently, with an abuse of notation that suppresses dependence on other parameters, we will denote the distribution $\mathbb{P}$ of $Y|A,\mathbf{W}$ corresponding to model (2) as $\mathbb{P}_{\delta}$ to underscore the main object of the hypothesis test i.e. $\delta$ . Finally, throughout we assume that $\|\boldsymbol{\beta}\|$ is bounded away from $0$ to reflect a scenario where the effect of $A$ on $Y$ experiences non-trivial confounding through $\mathbf{W}$ .

The standard test for (2) is the Generalized Likelihood Ratio Test (GLRT) – which for known $\sigma_{y}^{2}$ ¹¹1We will throughout assume this and show through numerical experiments that the results for unknown $\sigma^{2}_{y}$ should be qualitatively similar. reduces to the chi-square test with $1$ -degree of freedom. Although the same test has non-trivial local power for testing (2) even for growing dimension $p$ : (1) it is often believed that suitable dimension reduction techniques on the regression of $\mathbf{X}$ on $Y$ by relying on the intuition that dimension reduction techniques capture underlying structures of the problem (e.g. sparsity, low rank etc) and thereby yields more power for the testing problem compared to vanilla GLRT; and (2) performing GLRT at the scale of modern high dimensional problems such as GWAS is computationally expensive ( ${O}(\min\{p^{3},np^{2},n^{2}p\})$ ) and performing dimension reduction on $\mathbf{X}$ substantially reduces this burden. To account for these concerns, one method that has gained popularity is that of performing Principal Component Analysis (PCA) based dimension reduction on $\mathbf{X}$ which amounts to including “a few top principal components” in the regression (and can be computed through Fast-PCA type algoritms (galinsky2016fast)) of $Y$ on $\mathbf{X}$ (patterson2006population; price2006principal; barfield2014accounting). In this regard, for testing effect of $A$ on $Y$ , it is natural to consider PC adjustment on $\mathbf{W}$ only. However, one attractive piece of the EIGENSTRAT method introduced in price2006principal is the fact that it allows the investigator to only run PCA on the whole of $\mathbf{X}$ data matrix only once and including them in the regression analysis of $Y$ on any $X_{j}$ (and not a specific component $A$ as considered here) which is a component of $\mathbf{X}$ . More specifically, we shall study the statistical properties (Type I and Type II error of testing for $\delta$ in the model (2)) of testing for a specific component $A$ of $\mathbf{X}$ , our set up is in parallel to traditional GWAS analysis where is a standard practice to run PCA on the whole data matrix of $\mathbf{X}$ (instead on the matrix obtained by removing $A$ -observations from it) and include them in the analyses – instead of running a new PCA every time for testing for a new variable in $\mathbf{X}$ . Here we will explore both the cases when the PC is run on the whole of $\mathbf{X}$ (a case more relevant for GWAS type analyses (price2006principal)) and for the case when PC adjustments are performed on $\mathbf{W}$ (a case relevant in certain epigenetic studies where $A$ contains DNA methylation levels of a particular cytosine-phosphate-guanine (CpG) site of interest and one performs PCA on genetic variants $\mathbf{W}$ to account for population stratification barfield2014accounting. )

To initiate a theoretical study of this method, we begin by with setting up the necessary mathematical notation to describe the procedure. As discussed above, one standard approach in practice is to perform Principal Component (PC) analysis on the data matrix of $\mathbf{X}$ and only use the top principal component directions while adjusting for confounders in the regression of $Y$ on $A$ . Mathematically, such a formalism can be written as follows through the data vectors $\mathbf{Y}=(Y_{1},\ldots,Y_{n})^{T}$ , $\mathbf{A}=(A_{1},\ldots,A_{n})^{T}$ and the data matrix of covariates $\mathbb{X}=[\mathbf{X}_{1}:\cdots:\mathbf{X}_{n}]^{T}$ :

(i)

Given $k\in\mathbb{N}$ , let $\widetilde{{\mathbf{V}_{k,\mathrm{in}}}}$ denote the matrix composed of the first $k$ right singular vectors of $\mathbb{X}$ .
(ii)

Perform a linear regression of $\mathbf{Y}$ on $(\mathbf{A},\mathbb{X}\widetilde{{\mathbf{V}_{k,\mathrm{in}}}})$ to get a score or level of significance for $\delta$ .

Here the subscript $\mathrm{in}$ refers to the fact that a $A$ is a part of the variables in $\mathbf{X}$ on which the PC was performed. Mathematically, for any cut-off $t$ we consider the test that rejects for value of the likelihood ratio larger than $t$ as follows {equs}φ_k,in(t)=1(LR_k,in¿t), where {equs}LR_k,in=Y^TP_C(A_k, in)Y, A_k, in=(I-P_C(X~V_k,in))A Here for any matrix $M$ we use $\mathcal{C}(M)$ to denote its column space and $P_{\mathcal{C}(M)}$ to denote the projection matrix onto $\mathcal{C}(M)$ . An analogous can be constructed where the PC adjustment was performed only on $\mathbb{W}:=[\mathbf{W}_{1}:\cdots:\mathbf{W}_{n}]^{T}$ – which we shall refer to as {equs}φ_k,out(t):=1(LR_k,out¿t) with {equs}LR_k,out=Y^TP_C(A_k, out)Y, A_k,out=(I-P_C(W~V_k,out))A and $\widetilde{\mathbf{V}_{k,\mathrm{out}}}$ now collects the top right singular vectors of $\mathbb{W}$ .

This approach, similar in essence to Principal Component Regression (to be denoted by PCR hereafter) raises a few immediate questions:

(i)

What is the behavior of $\varphi_{k,\mathrm{\mathrm{ind}}}(t)$ , $\rm ind\in\{\mathrm{in},\mathrm{out}\}$ under $H_{0}$ and does there exist a distributional cut-off $t_{\alpha}$ such that the Type I error of $\varphi_{k,\mathrm{\rm ind}}(t_{\alpha})$ converges to $\alpha$ ?
(ii)

How to characterize the power function of $\varphi_{k,\mathrm{ind}}(t)$ , $\rm ind\in\{\mathrm{in},\mathrm{out}\}$ for any given $(k,t)$ ?

To understand these questions, we operate in a regime which allows $p$ to be larger than sample size but to scale proportionally with $n$ . In this asymptotic regime, the main results and contributions of this paper in view of the above questions are as follows:

(i)

Under a generalized spiked model for $\mathbf{W}$ (see Definition 2.2) we provide exact asymptotic results for testing (2) after PC adjustments. The precise description of the testing errors can be elaborated as follows.
1. (a)
  
  Our results show an interesting distinction between a fixed effects and random effects on $\boldsymbol{\beta}$ . In particular, we show that when $\boldsymbol{\beta}$ in (2) is considered fixed and unknown the Type I error of the test for (2) after PC adjustments converges to $1$ exponentially in sample size $n$ – even after including “enough” principal component direction in the regression. In contrast, when $\boldsymbol{\beta}$ has a mean zero random effect distribution the Type I error of the test is bounded away from both $1$ and the nominal desired level. (See Theorem 3.1 i.-ii.).
2. (b)
  
  For random effects on $\boldsymbol{\beta}$ , we derive the precise power function of the test for (2) after PC adjustments and show that without further assumptions it might not be possible to estimate the power function exactly to either correct for the Type I error inflation or perform sample size/power calculations (See Theorem 3.1 ii.).
(ii)

We show that, when $\mathbf{W}$ is derived from a mixture of mean-shift type distributions, a similar phenomenon persists in terms of the behavior of the test for (2) after PC adjustments. We further verify similar behavior through extensive simulations even when the distribution of $\mathbf{X}$ arises from mixture of discrete distributions. These simulations provide interesting insights into the subtleties and success of the highly popular EIGENSTRAT method of price2006principal (See Theorem 3.3).
(iii)

Under a classical spiked model (johnstone2001distribution) on $\mathbf{W}$ with diverging leading spike strengths, we derive a precise phase transition on the minimum spike strength and the angle between the $\boldsymbol{\beta}$ and leading population spikes required for the pathologies of the procedure $\varphi_{k,\mathrm{ind}}$ noted in points (i) and (ii) above to disappear. (See Theorem 3.4-3.7).

The rest of the paper, organized as follows, makes the above claims precise, rigorous, and elaborate. We first introduce the notation, definitions, and assumptions to set up the main building blocks of our results in Section 2.1. We divide the main theoretical results in Section 3 into three parts as follows. In Section 3.1 we present asymptotic results on the power function of $\varphi_{k,\rm ind},\rm ind\in\{\mathrm{in},\mathrm{out}\}$ for fixed spike/signal strength. Subsequently, we present the implications of these results in the context of GWAS type problems in Section 3.2.Thereafter, Section 3.3 is devoted to understanding the effect of diverging signal strength and thereby providing benchmarks to decide when and how such PC adjustment based testing procedures are valid in practice. Finally, in Section 4 we present detailed numerical experiments to not only verify our theoretical findings in simulation examples but also explore the continued validity of our analytical results beyond the specific conditions assumed for the theoretical derivations.

2.1. Technical Preparations

In this subsection we present definitions, assumptions and related discussions, and notation that will be used throughout the rest of the paper.

2.1.1. Definitions

To present our main results of this section we first need a few definitions to set up the notion of generalized spiked models. We start with the definition of empirical spectral measure of a Hermitian matrix.

Definition 2.1.

For a $p$ -dimensional Hermitian matrix $\Sigma_{p}$ with eigenvalues $\lambda_{1,p},\lambda_{2,p},\ldots,\lambda_{p,p}$ , the Empirical Spectral Distribution (ESD) $H_{p}$ is defined as the distribution function corresponding to the probability measure

\displaystyle dH_{p}(x)=\frac{1}{p}\sum_{i=1}^{p}{\delta_{\lambda_{i,p}}(x)},

where $\delta_{\lambda_{i,p}}(x)=1$ when $x=\lambda_{i,p}$ , otherwise $0$ . For a sequence $\{\Sigma_{p}\}_{p\geq 1}$ of deterministic Hermitian matrices, if the corresponding sequence $\{H_{p}\}_{p\geq 1}$ of ESDs converges weakly to a non-random probability distribution $H$ as $p\rightarrow\infty$ , then $H$ is defined as the Limiting Spectral Distribution (LSD) of the sequence $\{\Sigma_{p}\}_{p\geq 1}$ .

With this definition in hand we are ready to present the definition of generalized spiked models as proposed by bai2012gsp.

Definition 2.2.

A sequence $\{\Sigma_{p}\}_{p\geq 1}$ of deterministic Hermitian matrices is called a sequence of generalized spiked population matrices if the following hold.

(i)

$\Sigma_{p}$ can be written as,

$\Sigma_{p}=\begin{bmatrix}\Sigma^{(A)}_{m}&0\\ 0&\Sigma^{(B)}_{p-m}\end{bmatrix},$

where $\Sigma^{(A)}_{m}$ and $\Sigma^{(B)}_{p-m}$ are non-negative and non-random Hermitian matrices of dimensions $m\times m$ ( $m$ is finite and fixed) and $(p-m)\times(p-m)$ , respectively.
(ii)

The sequence $\{H_{p}\}_{p\geq 1}$ of ESDs corresponding to $\{\Sigma_{p}\}_{p\geq 1}$ converges weakly to a non-random probability distribution $H$ .
(iii)

Let $\Gamma_{H}$ be the support of $H$ , and let the sets of eigenvalues of the submatrices $\Sigma^{(A)}_{m}$ and $\Sigma^{(B)}_{p-m}$ be $\{\alpha_{1},\geq\ldots\geq\alpha_{m}\}$ and $\{\beta_{1,p},\geq\ldots\geq\beta_{p-m,p}\}$ , respectively. Then, $\alpha_{i}\notin\Gamma_{H}$ for $i=1,\ldots,m$ , and $\max_{1\leq i\leq p-m}{d\left(\beta_{i,p},\Gamma_{H}\right)}\rightarrow 0$ as $p\rightarrow\infty$ , where $d(x,S_{A})=\inf_{y\in S_{A}}|x-y|$ is a distance metric from a point $x$ to a set $S_{A}$ .

In this case, the eigenvalues of $\Sigma^{(A)}_{m}$ are called the generalized spikes, and the eigenvalues of $\Sigma^{(B)}_{p-m}$ are called the non-spikes. The distribution $H$ is same as the LSD of the sequence $\{\Sigma^{(B)}_{p-m}\}$ .

We note that the traditional spiked model (johnstone2001distribution) is a special case of the generalized spiked model where $H$ is the degenerate distribution at unity. Next, we will introduce the phase transition boundaries for the generalized spikes analogous to the phase transition boundary (baik2005phase) established in the spiked model.

Definition 2.3.

For $\alpha\notin\Gamma_{H},\alpha\neq 0$ , we define the following function:

\psi(\alpha)=\alpha+\gamma\alpha\int{\frac{\lambda dH(\lambda)}{\alpha-\lambda}}.

Then, a generalized spike $\alpha_{i}$ is called a “distant spike” if $\psi^{\prime}(\alpha_{i})>0$ , and “close spike” if $\psi^{\prime}(\alpha_{i})\leq 0$ , where $\psi^{\prime}$ is the first derivative of $\psi$ .

Depending on whether a generalized spike is a distant or a close spike, the asymptotic convergence of the corresponding sample eigenvalues differ according to Theorem 4.1 and Theorem 4.2 of (bai2012gsp). Therefore, the sample eigenvalues corresponding to the generalized spikes go through a phase transition at the boundaries where $\psi^{\prime}(\alpha)=0$ . Unlike in the spiked model where the phase transition happens only at the two boundaries $(1\pm\sqrt{\gamma})$ , the generalized spiked model can have more than two phase transition boundaries. The next assumption is intended to simplify our mathematical derivations by only allowing one phase transition boundary $\alpha^{*}>\sup{\Gamma_{H}}$ , and by assuming that all the generalized spikes have multiplicity one and are above that boundary (i.e., all distant spikes). The extension of our derivations to allow close spikes, and generalized spikes around multiple phase transition boundaries are similar but tedious, and hence omitted in this paper. Finally we define covariate data distributions we will working with throughout.

Definition 2.4.

We say that a data matrix $\widetilde{\mathbb{X}}_{n\times d}$ follows a generalized spiked distribution with $k^{*}$ spike eigenvalue-eigenvector pairs $\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}}$ if $\widetilde{\mathbb{X}}=Z\Lambda_{d}^{1/2}V_{d}$ , where $Z$ is an $n\times d$ random matrix with i.i.d. sub-gaussian elements such that $\mathbb{E}(Z_{ij})=0,\mathbb{E}(|Z_{ij}|^{2})=1$ , and $\Lambda_{d}^{1/2}$ is the poitive definite Hermitian square root of $\Lambda_{d}$ such that

(a)

$\{\Sigma_{d}\}_{d\geq 1}$ is a sequence of real symmetric positive definite matrices with spectral decomposition $\Sigma_{d}=V_{d}\Lambda_{d}V_{d}^{\top}$ , and $\Lambda_{d}$ satisfies the generalized spiked assumptions as outlined above (Definition 2.2) with $k^{*}$ (finite) generalized spikes. The eigenvalues and eigenvectors of $\Sigma_{d}$ are given by $\lambda_{1},\ldots,\lambda_{d}$ and $v_{1},\ldots,v_{d}$ , respectively.
(b)

The sequence $\{\|\Sigma_{d}\|\}$ of spectral norms is bounded.
(c)

$\lambda_{1}>\lambda_{2}>\ldots>\lambda_{k^{*}}>\sup{\Gamma_{H}}$ denote the generalized spikes of $\Sigma_{d}$ , and $\psi^{\prime}(\lambda_{i})>0$ for $i=1,\ldots,k^{*}$ .

In this case we denote $\widetilde{\mathbb{X}}\sim\mathrm{GSp}(\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}};\Gamma_{H};n,d)$

Some of our results are derived under the classical spiked model (johnstone2001distribution). To that end we will use the following definition.

Definition 2.5.

We say that a data matrix $\widetilde{\mathbb{X}}_{n\times d}$ follows a spiked distribution with $k^{*}$ spike eigenvalue-eigenvector pairs $\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}}$ if $\widetilde{\mathbb{X}}=Z\Sigma^{1/2}$ , where $Z$ is an $n\times d$ random matrix with i.i.d. sub-gaussian elements such that $\mathbb{E}(Z_{ij})=0,\mathbb{E}(|Z_{ij}|^{2})=1$ , and $\Sigma_{d}^{1/2}=I+\sum_{j=1}^{k^{*}}\lambda_{j}v_{j}v_{j}^{T}$ with $\lambda_{1}\geq\ldots\lambda_{k^{*}}>\sqrt{\gamma}$ where $\gamma=\lim p/n$ and $v_{1},\ldots,v_{k^{*}}$ are orthonormal vectors. In this case we denote $\widetilde{\mathbb{X}}\sim\mathrm{Sp}(\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}};n,d)$

2.1.2. Assumptions

Using the above definitions, we can now state our assumptions various subsets of which will be useful for our theoretical results.

Assumption 2.6.

(a)

$p\rightarrow\infty,n\rightarrow\infty,p/n\rightarrow\gamma\in(0,\infty)$ .
(b)

$\mathbb{W}\sim\mathrm{GSp}(\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}};\Gamma_{H};n,p)$ .
(b)’

$\mathbb{W}\sim\mathrm{Sp}(\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}};p)$ .
(c)

${\mathbb{X}}:=[\mathbf{A},\mathbb{W}]\sim\mathrm{GSp}(\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}};\Gamma_{H};n,p+1)$ .
(c)’

${\mathbb{X}}:=[\mathbf{A},\mathbb{W}]\sim\mathrm{Sp}(\left\{(\lambda_{j},v_{j})\right\}_{j=1}^{k^{*}};p+1)$ .

We will mostly assume the following dependence between $\mathbf{A}$ and $\mathbb{W}$ .

Assumption 2.7.

Let $\mathbf{A}=\mathbb{W}\boldsymbol{\theta}+\boldsymbol{\eta}$ with $\boldsymbol{\eta}\sim N(0,\sigma^{2}_{g}I)$ .

Although assumption 2.7 seems restrictive, we keep it here for keeping our arguments short and only for the precise analysis of the test $\varphi_{k,\mathrm{out}}$ – this assumption is often not needed either for analysis of $\varphi_{k,\mathrm{in}}$ or while demonstrating a lower bound on the Type I error of $\varphi_{k,\mathrm{out}}$ instead of deriving a precise limit . Indeed, this result is easy to extend to sub-Gaussian $A$ ’s and we additionally verify through extensive simulations (see Section 4) that the intuitions from our main theorems continue to be valid even without assuming the above specific conditional distribution of $A|\mathbf{X}$ . Finally we note that for the analysis in epigenetic studies (barfield2014accounting), where $A$ contains DNA methylation levels of a particular CpG site of interest and one performs PCA on genetic variants $\mathbf{W}$ to account for population stratification, assumption 2.7 is often reasonable. Our last assumption removes this restrictions in some of our results at the cost of obtaining bounds instead of precise limiting objects.

Assumption 2.8.

Let $A_{i}$ ’s be centered and sub-Gaussian with parameter $\sigma^{2}_{a}$ .

Assumption 2.8 will be used for deriving lower bounds on Type I error for tests based on $\mathrm{LR}_{k,\mathrm{out}}$ and obtain partial sharp limiting Type I error for tests based on $\mathrm{LR}_{k,\mathrm{in}}$ (see e.g. Theorem 3.2).

2.1.3. Notation

The results in this paper are mostly asymptotic (in $n$ ) in nature and thus requires some standard asymptotic notations. If $a_{n}$ and $b_{n}$ are two sequences of real numbers then $a_{n}\gg b_{n}$ (and $a_{n}\ll b_{n}$ ) implies that ${a_{n}}/{b_{n}}\rightarrow\infty$ (and ${a_{n}}/{b_{n}}\rightarrow 0$ ) as $n\rightarrow\infty$ , respectively. Similarly $a_{n}\gtrsim b_{n}$ (and $a_{n}\lesssim b_{n}$ ) implies that $\liminf_{n\rightarrow\infty}{{a_{n}}/{b_{n}}}=C$ for some $C\in(0,\infty]$ (and $\limsup_{n\rightarrow\infty}{{a_{n}}/{b_{n}}}=C$ for some $C\in[0,\infty)$ ). Alternatively, $a_{n}=o(b_{n})$ will also imply $a_{n}\ll b_{n}$ and $a_{n}=O(b_{n})$ will imply that $\limsup_{n\rightarrow\infty}\ a_{n}/b_{n}=C$ for some $C\in[0,\infty)$ ). If $C>0$ then we write $a_{n}=\Theta(b_{n})$ . If $a_{n}/b_{n}\rightarrow 1$ , then we say $a_{n}\sim b_{n}$ . A sequence of random variable $X_{n}$ is called $\Omega_{\mathbb{P}}$ , $\mathrm{a.s.}\ \mathbb{P}$ if $\exists C>0$ such that $\mathbb{P}(X_{n}>C)\rightarrow 1$ .

3. Main Results

The behavior of the tests $\varphi_{k,\rm ind}(t),\rm ind\in\{\mathrm{in},\mathrm{out}\}$ , crucially depends on the “strength” of principal components in the population model which in turn is reflective of the strength of population stratification under consideration for the GWAS studies. To capture this, we first present results for fixed “strength” of principal components in Section 3.1. Subsequently, in Section 3.3 we also present a sharp phase transition depending on spike strength in diverging spike strength of population principal components while considering the traditional spiked model (johnstone2001distribution). The implications of these results to GWAS type problems is discussed in Section 3.2 Finally, 3.2 also presents results when the studies come from a mixture of populations and thereby directly addressing population stratification type issues in GWAS.

3.1. Fixed Strength of Population PCs

We begin by stating and discussing the results for $\mathrm{LR}_{k,\mathrm{ind}}$ fixed stratification strength, i.e., $\lambda_{1}=O(1)$ whenever $\mathbb{X}$ or $\mathbb{W}$ follow a generalized spiked distribution with top spiked eigenvalue $\lambda_{1}$ .

Theorem 3.1.

Consider testing (2) under (2) using $\varphi_{k,\rm ind}(t)$ . Then the following hold with any fixed $k\geq k^{*}$ under Assumptions 2.6 (a), (b), and 2.7 for $\rm ind=\mathrm{out}$ and 2.6 (a) and (c) for $\rm ind=\mathrm{in}$ .

Suppose that $\boldsymbol{\beta},\boldsymbol{\theta}$ are fixed effects. Then for any fixed cut-off $t\in\mathbb{R}$ and fixed $M>0$ one has

		$\displaystyle\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup\limits_{\\|\boldsymbol{\beta}\\|,\\|\boldsymbol{\theta}\\|\leq M}\mathbb{P}_{\delta=0}\left(\mathrm{LR}_{k,\mathrm{out}}>t\right)=1,$
	and	$\displaystyle\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup\limits_{\\|\boldsymbol{\beta}\\|\leq M}\mathbb{P}_{\delta=0}\left(\mathrm{LR}_{k,\mathrm{in}}>t\right)=1.$

ii.

Assume that $\boldsymbol{\beta}\sim N(\mathbf{0},\sigma^{2}_{\beta}{I}/p)$ . Then $\exists$ function $\upsilon_{\mathrm{out}}\left(\cdot,\cdot\right):\mathbb{R}^{2}\rightarrow\mathbb{R}_{+}$ such that for any $h,t\in\mathbb{R}$ and $\delta=h_{1}/\sqrt{n}$ and fixed $h_{2}$ one has

		$\displaystyle\lim\mathbb{P}_{\delta=h/\sqrt{n}}(\mathrm{LR}_{k,\mathrm{out}}>t)=\upsilon_{\mathrm{out}}\left(t,h\right)\quad\forall\boldsymbol{\theta}\quad\text{s.t.}\quad\\|\boldsymbol{\theta}\\|\geq h_{2};$
	and	$\displaystyle\lim\mathbb{P}_{\delta=h/\sqrt{n}}(\mathrm{LR}_{k,\mathrm{in}}>t)=\upsilon_{\mathrm{in}}\left(t,h\right)$

The function $\upsilon_{\mathrm{out}}$ depends on $\{\gamma,\sigma^{2}_{y},\sigma^{2}_{\beta},k^{*},k,\Gamma_{H},\left\{\langle\mathbf{v}_{j},\boldsymbol{\theta}\rangle,\lambda_{j}\right\}_{j=1}^{k^{*}}\}$ and the function $\upsilon_{\mathrm{in}}$ depends on $\{\gamma,\sigma^{2}_{y},\sigma^{2}_{\beta},k^{*},k,\Gamma_{H},\left\{\lambda_{j}\right\}_{j=1}^{k^{*}}\}$ . ²²2the precise forms of the $\upsilon_{\rm ind}$ can be found in the proof where its dependence on the sequence of tuples $\{\langle\mathbf{v}_{j},\boldsymbol{\theta}\rangle,\{\lambda_{j}\}_{j=1}^{k^{*}}\}$ is suitably defined in a limiting sense. Throughout we shall implicitly assume that suitable quadratic forms of the type $\sum_{j=1}^{p}\psi(\lambda_{j})\boldsymbol{\zeta}^{T}v_{j}v_{j}^{T}\boldsymbol{\zeta}$ for certain deterministic functions $\psi$ and sequence of vectors $\boldsymbol{\zeta}$ of bounded norm, that arise in the proof of the theorem, exists. Otherwise, our results can be written more tediously in terms of upper and lower bounds on the power function involving liminf’s and limsup’s of similar quadratic forms. and is larger than $\alpha$ at $h=0,t=\chi^{2}_{1}(\alpha)$ (the $\alpha$ -quantile of $\chi^{2}_{1}$ ). A qualitatively similar result holds for either $\boldsymbol{\beta}$ non-random and $\boldsymbol{\theta}$ random or both $\boldsymbol{\beta},\boldsymbol{\theta}$ random effects ³³3with analogous but different limiting functions.

Remark 1.

Although some levels of Type I error is expected for fixed and low spike strength (patterson2006population), Theorem 3.1 implies that the Type I error can be arbitrarily close to $1$ for using any fixed cut-off – which automatically covers any distribution based cut-off. Indeed, $\mathrm{LR}_{k,\rm ind}$ diverges in this set-up and as our proof will suggest that this is not an artifact of specific adversarial choices of $\boldsymbol{\beta},\boldsymbol{\theta}$ but actually holds for uncountably many choices of $\boldsymbol{\beta},\boldsymbol{\theta}$ . Our proof also suggests that the Type I error is somewhat less pathological when $\boldsymbol{\beta}$ is orthogonal to the leading population eigenvectors – and thereby providing intuitive justification of the results in the random effects parts of Theorem 3.1 since mean $0$ random $\boldsymbol{\beta}$ is “on average orthogonal” to any fixed directions.

Remark 2.

In the notation $\mathbb{P}_{\delta}$ we have suppressed the dependence of $\mathbb{P}$ on $\boldsymbol{\beta},\boldsymbol{\theta}$ to not only maintain succinct notation but also to avoid confusion when one of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ is random. Consequently, in statements of the theorem whenever a supremum over $\boldsymbol{\beta}$ or $\boldsymbol{\theta}$ appears, it implicitly acknowledges the dependence of $\mathbb{P}_{\delta}$ on the respective parameters.

Remark 3.

We note that the results for $\mathrm{LR}_{k,\rm ind}$ does not involve a supremum over $\boldsymbol{\theta}$ . This is because assuming a spiked model on $\mathbf{X}$ does not allow a free choice of $\boldsymbol{\theta}$ since by representing $\boldsymbol{\theta}$ as the population least squares coefficients implies that $\boldsymbol{\theta}$ is a function of $\Sigma$ . Also, we assume $\|\boldsymbol{\theta}\|\geq h_{2}>0$ is bounded away from $0$ to reflect a scenario where the effect of $A$ on $Y$ experiences non-trivial confounding through $\mathbf{W}$ .

Remark 4.

The results above do not change if one changes the distribution random effects from Gaussian to other suitable sub-Gaussian distributions as long as the leading population eigenvectors $v_{1},\ldots,v_{k^{*}}$ are delocalized. However, the results can be quite specific to various possibilities of localizations and distributions of the random effects otherwise. We do not explore this aspect here in detail.

Remark 5.

We consider a simple application of Theorem 3.1. Consider the population covariance matrix be the classical spike model (johnstone2001distribution) i.e. $\Sigma=I+\lambda_{1}e_{1}e^{\top}_{1}$ , where $\lambda_{1}>1+\sqrt{\gamma}$ . By Lemma LABEL:lem:distribution_lr one has {equs}LR_k, out—A,W∼χ_1^2(^κ^2_out), ^κ^2_k,out=((βTW⊤+A⊤δ)(I-PC(W~Vk,out))A)2A⊤(I-PC(W~Vk,out))A. Note that,

\mathbb{P}_{\delta}(LR_{k,\mathrm{out}}>t_{\alpha})=\mathbb{E}\Big{(}\bar{\Phi}(t_{\alpha}-\hat{\kappa}_{k,\mathrm{out}}^{2})-\Phi(-t_{\alpha}-\hat{\kappa}_{k,\mathrm{out}}^{2})\Big{)}^{2},

where $\Phi$ is the standard normal cdf and $\bar{\Phi}=1-\Phi$ . Hence, it will be sufficient to derive asymptotic distribution of $\hat{\kappa}_{k,\mathrm{out}}^{2}$ followed by an application of uniform integrability principle to derive the asymptotic behavior of the likelihood ratio test.

Let $\boldsymbol{\beta}\sim N\left(0,\frac{1}{p}\sigma^{2}_{\beta}\right)$ and $\boldsymbol{\theta}:=e_{1}$ . We have shown, in (LABEL:eq:theta_fixed_ncp),

\displaystyle\hat{\kappa}^{2}_{k,\mathrm{out}}\Rightarrow\frac{\sigma^{2}_{\beta}(\sigma^{2}_{g}m_{1}+c_{4})}{c_{0}+\sigma^{2}_{g}}\chi^{2}(h^{2}(c_{0}+\sigma^{2}_{g})^{2}):=\mathcal{L},

where $c_{0}=\lim_{n}\sum_{j=k+1}^{n\wedge p}\hat{\lambda}_{j}\boldsymbol{\theta}^{T}\hat{v_{j}}\hat{v}_{j}^{T}\boldsymbol{\theta}$ , $c_{4}=\lim_{n}\frac{1}{np}\sum_{j=k+1}^{n\wedge p}\hat{d_{j}}^{4}\boldsymbol{\theta}^{T}\hat{v_{j}}\hat{v_{j}}^{T}\boldsymbol{\theta}$ , and $m_{1}=\lim_{n}\frac{1}{p}\sum_{j=k+1}^{n\wedge p}\hat{\lambda}_{j}$ . Now, $m_{1}=m_{1,\gamma}$ is the mean of Marčenko-Pastur distribution. To compute the precise values of $c_{0},c_{4}$ , note that the limiting spectral distribution of $\Sigma$ is $\delta_{1}$ , the non-random distribution which is degenerate at $1$ . Now we invoke Lemma LABEL:lem:spikes_non_spikes to obtain that $c_{0}=\sum_{j=1}^{n\wedge p}\phi_{1j}\boldsymbol{\theta}^{\top}v_{j}v^{\top}_{j}\boldsymbol{\theta}$ and $c_{4}=\frac{1}{\gamma}\sum_{j=1}^{n\wedge p}\phi_{2j}\boldsymbol{\theta}^{\top}v_{j}v^{\top}_{j}\boldsymbol{\theta}$ , where

	$\displaystyle\phi_{1j}$	$\displaystyle=\begin{cases}\gamma\frac{\lambda+1}{\lambda^{2}},\quad j=1\\ 1,\quad j>1,\end{cases}$
	$\displaystyle\phi_{2j}$	$\displaystyle=\begin{cases}\gamma\frac{\lambda+1}{\lambda^{2}}+\gamma^{2}\frac{(\lambda+1)^{2}}{\lambda^{3}},\quad j=1,\\ 1+\gamma,\quad j>1.\end{cases}$

Note that, $\Sigma=(1+\lambda_{1})e_{1}e^{\top}_{1}+\sum_{j=2}^{n\wedge p}e_{j}e^{\top}_{j}$ . So, $v_{j}=e_{j}$ , $c_{0}=\frac{\lambda+1}{\lambda^{2}}$ , and $c_{4}=\gamma\frac{\lambda+1}{\lambda^{2}}+\gamma^{2}\frac{(\lambda+1)^{2}}{\lambda^{3}}$ . Hence,

\mathbb{P}_{\delta}(LR_{k,\mathrm{out}}>t_{\alpha})\rightarrow\mathbb{E}\Big{(}\bar{\Phi}(t_{\alpha}-\mathcal{L})-\Phi(-t_{\alpha}-\mathcal{L})\Big{)}^{2}.

A few further remarks are in order regarding the assumptions and implications of Theorem 3.1 regarding the behavior of $\varphi_{k,\rm ind}(t)$ for $\rm ind\in\{\mathrm{in},\mathrm{out}\}$ . First, Theorem 3.1 (i) implies that under fixed effects model on the regression of $Y|A,\mathbf{X}$ and $A|\mathbf{X}$ one has pathological behavior of PC adjustment based tests in the sense that for any fixed cut-off $t$ , the LRT following PC adjustments has size converging to $1$ . In Section 4 we further present simulation results towards universality of this phenomenon by verifying that this is indeed not an artifact of the particular choice of regression of $A|\mathbf{X}$ in our analytical explorations. Similarly, Theorem 3.1 (ii) demonstrates that although the pathology of the size is somewhat diluted for random effects in the regressions of $Y|A,\mathbf{X}$ and $A|\mathbf{X}$ , the resulting size is still strictly above the desired level $\alpha$ while using the standard $\chi_{1}^{2}(\alpha)$ type cut-offs. Indeed, in view of the proportional asymptotic regime (i.e. $p/n\rightarrow\gamma\in(0,\infty)$ ) and associated inconsistency of PCA (johnstone2009consistency), one might expect some level of discrepancy in the size and power of the PC adjustment based test $\varphi_{k,\rm ind}$ considered – and Theorem 3.1 provides precise nature of this discrepancy. Subsequently, it is natural to ask whether one can recover the desired $\alpha$ -level by suitably inverting the asymptotic power function of the test $\varphi_{k,\rm ind}(t)$ . In this regard, we first focus on the case of random $\boldsymbol{\beta}$ and a close look at the formulae obtained for the power functions $\upsilon_{\rm ind}$ in Theorem 3.1 reveals their intricate dependence on $\left\{\langle\mathbf{v}_{j},\boldsymbol{\theta}\rangle,\lambda_{j}\right\}_{j=1}^{k^{*}}\}$ – which cannot in general be estimated consistently for generalized spiked models with fixed spike strengths cai2018rate; perry2018optimality. In some cases, when $\upsilon_{\rm ind}$ does not depend on the projection of $\boldsymbol{\theta}$ in the directions of population eigenvectors (see e.g. Remark 5 for the very special case of classical spiked model (johnstone2001distribution) and specific assumption on $\boldsymbol{\theta}$ ) estimation of the power function is possible – however in general this seems to be impossible without assuming further structure on $\boldsymbol{\theta}$ and population $\mathbf{v}_{j}$ ’s. In contrast when $\boldsymbol{\theta}$ is random, one has irrespective of $\boldsymbol{\beta}$ that the power function only depends on the $\Gamma_{H}$ (the population spectral distribution for the distribution of $\mathbf{X}$ or $\mathbf{W}$ ) and hence can be potentially estimated. To keep our discussions focused we do not further pursue this avenue here.

To further argue that the pathologies notes above are not an artifact of Assumption 2.7, we now present analogues of the results (in somewhat less precise form) from Theorem 3.1 for the case when $A|\mathbf{W}$ is not a linear regression. Indeed this is the case when $A$ is discrete (as is the situation for GWAS) and as a result Theorem 3.1 does not apply immediately. However as we will see that the qualitative nature of the problem conveyed by Theorem 3.1 continues to hold – with the difference being in the nature of preciseness at which we are able to pinpoint the accurate limit of the Type I error (as in Theorem 3.1 ii.) in the sense that we are only able to provide a lower bound instead of a precise limit.

Theorem 3.2.

Consider testing (2) under (2) using $\varphi_{k,\rm ind}(t)$ . Also assume that $k\geq k^{*}$ is fixed and Assumptions 2.6 (a) and 2.8 hold.

The following holds under Assumption 2.6 (b) for any $M>0$ .

\displaystyle\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup\limits_{\|\boldsymbol{\beta}\|\leq M}\mathbb{P}_{\delta=0}\left(\mathrm{LR}_{k,\mathrm{out}}>\chi_{1}^{2}(\alpha)\right)>\alpha.

ii.

Suppose $\boldsymbol{\beta}\sim N(0,\sigma^{2}_{\beta}I/p)$ and that Assumption 2.6(b) holds. Then

\displaystyle\lim\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k,\mathrm{out}}>\chi_{1}^{2}(\alpha))>\alpha.

iii.

Suppose that Assumption 2.6 (c) holds. Let $\boldsymbol{\beta}_{0}:=I_{-1}\boldsymbol{\beta}$ with $I_{-1}=[e_{2}:\ldots:e_{p+1}]$ and $c^{*}_{p}(\boldsymbol{\beta}_{0})=\sum_{j=1}^{p}\phi_{1j}\boldsymbol{\beta}^{\top}_{0}v_{j}v_{j}^{\top}e_{1}$ with $\phi_{1j}$ defined in Lemma LABEL:lem:spikes_non_spikes and $e_{j},j\geq 1$ canonical basis $\mathbb{R}^{p+1}$ .

(a)

If $\boldsymbol{\beta}$ is such that $\liminf|c^{*}_{p}(\boldsymbol{\beta}_{0})|>0$ then for any $t\in\mathbb{R}$

$\displaystyle\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\mathbb{P}_{\delta=0}\left(\mathrm{LR}_{k,\mathrm{in}}>t\right)=1.$

(b)

If $\boldsymbol{\beta}$ is such that $\limsup|c^{*}_{p}(\boldsymbol{\beta}_{0})|=0$ and $\liminf|\sqrt{p}c^{*}_{p}(\boldsymbol{\beta})|>0$ then

\displaystyle\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\mathbb{P}_{\delta=0}\left(\mathrm{LR}_{k,\mathrm{in}}>\chi_{1}^{2}(\alpha)\right)>\alpha.

(c)

If $\boldsymbol{\beta}$ is such that $\limsup|\sqrt{p}c^{*}_{p}(\boldsymbol{\beta}_{0})|=0$ then the following holds if $\liminf\lambda_{p}^{*}>0$ .

\displaystyle\limsup\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\mathbb{P}_{\delta=0}\left(\mathrm{LR}_{k,\mathrm{in}}>\chi_{1}^{2}(\alpha)\right)=\alpha.

iv.

Suppose $\boldsymbol{\beta}\sim N(0,\sigma^{2}_{\beta}I/p)$ and that Assumption 2.6(b) holds. Then

\displaystyle\lim\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k,\mathrm{in}}>\chi_{1}^{2}(\alpha))>\alpha.

A few remarks are in order regarding the nature of the results above. First, the result shows the necessity to at least modify the currently used cut-offs for testing using $\mathrm{LR}_{k,\mathrm{out}}$ . Although we believe and demonstrate in the simulation section that a result similar to Theorem 3.1 (i.e. Type I error converging to $1$ for any fixed cut-off) our proofs techniques do not naturally extend to the case of arbitrary $A$ . The reason we can still demonstrate an inflated Type I error while using classical cut-offs is because our proof essentially relies on the case where $A\perp\mathbf{W}$ and thereby is analogous to the case of $\boldsymbol{\beta}=0$ in the context of Theorem 3.1 – implying no confounding through of $A$ and $\mathbf{W}$ . However, the result for $\mathrm{LR}_{k,\mathrm{in}}$ can remain pathological owing to the fact of double use of $A$ in both as a regressor for $Y$ as well as the in the PC’s used in the same regression. In order to provide insight about the nature of the quantity driving the behavior of $\mathrm{LR}_{k,\mathrm{in}}$ , we elaborate using the special case classical spiked model with $\Sigma=I+\lambda vv^{T}$ . In this case the result depends on the localization structure of $v$ . First if $v\perp e_{1}$ then $c_{p}^{*}(\boldsymbol{\beta}_{0})=0$ for any $\boldsymbol{\beta}_{0}$ and hence a $\chi_{1}^{2}$ cut-off is valid for $\mathrm{LR}_{k,\mathrm{in}}$ . This is indeed expected since $v\perp e_{1}$ implies no confounding between $Y$ and $A$ through $\mathbf{W}$ . Suppose now $\liminf|\langle v,e_{1}\rangle|>0$ and $\liminf|\langle v,\boldsymbol{\beta}\rangle|>0$ then we land in the situation of part (a) of Theorem 3.2 iii. (a) since using the fact that $\boldsymbol{\beta}_{0}^{T}e_{1}=0$ in this case we have $\liminf|c_{p}^{*}(\boldsymbol{\beta}_{0})|=\liminf|(\phi_{11}-\phi_{12})\boldsymbol{\beta}_{0}^{T}vv^{T}e_{1}+\boldsymbol{\beta}_{0}^{T}e_{1}|=\liminf|(\phi_{11}-\phi_{12})\boldsymbol{\beta}_{0}^{T}vv^{T}e_{1}|>0$ . Finally if $|\langle v,e_{1}\rangle|=\Theta(1/\sqrt{p})$ we can appeal to 3.2 iii. (b) and a Type error inflation occurs while using traditional $\chi_{1}^{2}$ cut-off.

3.2. Implications for GWAS type Analyses

As mentioned in the introduction, the method analyzed above has especially gathered immense popularity in GWAS type analysis where adjusting for variables in $\mathbf{X}$ crucially adjusts for population stratification and unmeasured confounding (price2006principal). In particular, a classical problem of modern GWAS is that of testing for individual genetic variants while adjusting for other variants and environmental factors (visscher201710). In this regard, the issue of population stratification is one of the main challenges in modern genetic association studies. In particular, the presence of population sub-structure, (i.e. when data set consists of several sub-populations, different phenotypic means across strata in combination with varying allele frequencies across populations) can lead to confounding of the disease-gene relationship. Consequently, many commonly used association tests may be severely biased. Indeed, several methods exist to account for such population structure. The most commonly used ones among them can be organized around three broad themes – (a) the method of genomic control (devlin1999genomic) (this addresses the effects that unknown population-admixture might have on the variance of commonly used test test statistics); (b) methods that rely on a model and data on the additional genetic markers (pritchard2000inference) to infer the latent population structure and subsequently incorporates this model into down stream analysis; and (c) methods that use linear combinations of other genetic markers (which capture population sub-structure in their distribution) as covariates in the analysis using linear or logistic regression (these markers serve as surrogates for the underlying strata chen2003qualitative; zhang2003semiparametric; zhu2002association; price2006principal). Owing to the seminal paper of price2006principal, this third approach which we analyze here, have emerged as the most popular of these available methods.

The implications of Theorem 3.4 can be best understood through its intuitive connection to a simple population admixture model described as follows. Suppose that $\mathbf{X}_{i}=(X_{i1},\ldots,X_{ip})$ is such that there exists $S\subset\{1,\ldots,p\}$ coordinates where the distributions of $X_{ij},j\in S$ are a mixture between two population and for $j\in S^{c}$ the $X_{ij}$ ’s are same between the two populations. For studying GWAS type problems, under LD pruning (hartl1997principles; laird2011fundamentals) of the genotypes, one such situation can be idealized through independently having $X_{ij}\sim\frac{1}{2}\mathrm{Bin}(2,q_{1j})+\frac{1}{2}\mathrm{Bin}(2,q_{2j})$ with $q_{1j}=q_{2j}=q$ for $j\in S$ and $q_{1}=q_{1j}\neq q_{2j}=q_{2}$ for $j\in S^{c}$ . It is easy to check that in this case, $\mathrm{Var}(\mathbf{X})=\sigma^{2}I+(q_{1}-q_{2})^{2}|S|\mathbf{v}\mathbf{v}^{T}$ for some $\sigma^{2}>0$ and unit vector $\mathbf{v}$ . Consequently, in the language of spiked model formulation, the variance covariance matrix of $\mathbf{X}$ indeed follows a spiked model with spike strength proportional to $|S|$ . Theorem 3.4 pertains to the traditional spiked model as an intuitive simplification from the motivating mixture model described above and establishes precise phase transition for the performance of EIGENSTRAT type procedures based on population stratification strength – as captured by the leading spike strength. Indeed, this provides an useful benchmark to check the validity of using PC adjustment based testing for low dimensional parameters.

Our next result is targeted towards showing that we can translate the results from the generalized spike model set up to a more relevant case of mixture models on the rows of $\mathbb{X}$ . In particular, it is natural to assume such a mixture distribution for the genotypes in genetic association studies while considering population admixture models (price2006principal).

Theorem 3.3.

Consider testing (2) under (2) using $\varphi_{k,\rm ind}(t)$ . Then the same conclusion as of Theorem 3.1 holds with any fixed $k\geq 1$ under Assumptions 2.6 (a), and 2.7 for $\rm ind=\mathrm{out}$ when $\mathbf{W}\sim\sum_{i=1}^{L}w_{i}F(\boldsymbol{\mu}_{i},\Sigma)$ for any fixed $L$ with $w_{i}\geq 0$ , with $\sum_{i=1}^{L}w_{i}=1$ , $\sup_{i}\|\mu_{i}\|\leq 1$ , $\|\Sigma\|=O(1)$ where $F(\boldsymbol{\mu},\Sigma)$ is the distribution of a random vector $\Sigma^{1/2}\mathbf{Z}+\boldsymbol{\mu}$ where $\mathbf{Z}$ is a random vector with mean zero i.i.d. sub-Gaussian coordinates. When $\rm ind=\mathrm{in}$ , the same conclusion holds under Assumption 2.6 (a) when $\mathbf{X}\sim\sum_{i=1}^{L}w_{i}F(\boldsymbol{\mu}_{i},\Sigma)$ .

It is worth noting that mixture models on $\mathbf{X}$ implies spiked models on the variance covariance matrix of $\mathbf{X}$ and thereby the above result is somewhat intuitive given the validity of Theorem 3.1. Moreover, in our numerical experiments we shall additionally verify the universality of the results for non-mean-shift type mixture models on $\mathbf{X}$ (e.g. mixtures of binomial distributions on the coordinates of $\mathbf{X}$ as more natural candidates for modeling genotypes under admixture of populations under Hardy-Weinberg equilibrium).

3.3. Diverging Strength of Population PCs

Section 3.1 explored the necessity of potential caution while using EIGENSTRAT type procedure for PC adjustment testing of low-dimensional components. However, these results are not able to reconcile the empirical success and popularity of this procedure. Moreover, to an expert in the results of random matrix theory, the above results might suggest the main issue being the bias in the eigenvectors as for fixed spike strengths. Indeed, the fact that PC based analysis of genetic variations have been able to reveal known population structures (novembre2008genes), prompts one to believe that a diverging spike strength might be the main underlying recipe behind the success of the PC adjustment based testing procedures. However, as we shall see below, that simply diverging spike strength is not sufficient to explain away the pathologies noted above – and rather a specific signal strength can indeed be calibrated to pin point the regimes of success of EIGENSTRAT types procedures.

In this regard, we now present first results in this direction to derive necessary and sufficient conditions on the success of this procedure under the classical spiked model johnstone2001distribution with diverging spike strengths. In the regime of diverging signal strength, it turns out that we can additionally take into account the effect of the distances between $\boldsymbol{\beta},\boldsymbol{\theta}$ and the population spike eigenvectors. This seems to be a reasonable consideration to explore since using the sample PC directions as the regressors finds its natural analogue in the population when $\boldsymbol{\beta}$ belongs to (or is close to) the linear span of the population spiked eigenvectors. Therefore to introduce the results we will require the following natural notion of distance between vectors and sub-spaces. In particular, given a set $V$ and a fixed vector $\beta$ we define $d(\beta,V)=1-\frac{\|P_{V}(\boldsymbol{\beta})\|^{2}}{\|\boldsymbol{\beta}\|^{2}}$ (with $P_{V}$ denoting th orthogonal projection operator onto $V$ ) and $\mathcal{C}_{\tau}(V)=\{\beta:\|\beta\|\leq 1,d(\beta,V)\leq p^{-\tau}\}$ . Note that, if $\tau_{1}>\tau_{2}$ , $\mathcal{C}_{\tau_{1}}(V)\subseteq\mathcal{C}_{\tau_{2}}(V)$ and larger $\tau$ implies lesser angle of the vector $\boldsymbol{\beta}$ with elements of $V$ . Moreover, the results in this case for $\mathrm{LR}_{k,\rm ind}$ additionally depend on the nature of localization of the coordinates of the spikes. This is intuitively understandable since certain natures of localized coordinates implies a highly strong effect of coordinates of $\mathbf{W}$ on $A$ whereas delocalized eigenvectors imply otherwise. To avoid confusion, we therefore first present the results for $\mathrm{LR}_{k,\mathrm{out}}$ .

Theorem 3.4.

Consider testing (2) under (2) using $\varphi_{k,\mathrm{out}}(t)$ and suppose that Assumptions 2.6 (a), (b)’, and 2.7 hold. Further assume that $\lambda_{1},\lambda_{k^{*}}=\Theta(p^{\tau_{0}})$ with $\tau_{0}>0$ . Then the following hold for any $\tau>0$

Suppose that $\boldsymbol{\beta},\boldsymbol{\theta}$ are not random.

(a)

If $\min\{\tau_{0},\tau\}<1/2$ , then for any fixed cut-off $t\in\mathbb{R}$ and $\gamma<1$ ,

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup_{\boldsymbol{\beta},\boldsymbol{\theta}\in\mathcal{C}_{\tau}(S_{V_{k^{*}}})}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k^{*},\mathrm{out}}>t)=1.

(b)

If $\min\{\tau_{0},\tau\}>1/2$ , then for any $k\geq k^{*}$

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup_{\boldsymbol{\beta},\boldsymbol{\theta}\in\mathcal{C}_{\tau}(S_{V_{k^{*}}})}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k,\mathrm{out}}>\chi^{2}_{1}(\alpha))=\alpha.

ii.

Suppose $\boldsymbol{\beta}\sim N(0,\frac{1}{p}\sigma^{2}_{\beta}I_{p})$ . Then for any $\tau_{0}\geq 0$ and $\gamma<1$ one has

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup_{\boldsymbol{\theta}\in\mathcal{C}_{\tau}(S_{V_{k^{*}}})}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k^{\star},\mathrm{out}}>\chi^{2}_{1}(\alpha))>\alpha.

Further, if $\boldsymbol{\theta}\sim N(0,\frac{1}{p}\sigma^{2}_{\theta}I_{p})$ ,

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k^{\star},\mathrm{out}}>\chi^{2}_{1}(\alpha))>\alpha.

iii.

Assume $\boldsymbol{\theta}\sim N(0,\frac{1}{p}\sigma^{2}_{\theta}I_{p})$ but $\boldsymbol{\beta}$ is not random. If $\tau_{0}>0$ , then

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup_{\boldsymbol{\beta}\in\mathcal{C}_{\tau}(S_{V_{k^{*}}})}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k,\mathrm{out}}>\chi^{2}_{1}(\alpha))=\alpha.

Above $S_{V_{k^{*}}}$ refers to the subspace spanned by $v_{1},\ldots,v_{k^{*}}$ .

A few remarks are in order regarding the phase transitions implied by Theorem 3.4. First, it is worth noting that the phase transitions for the Type I error of $\varphi_{k,\mathrm{out}}$ depends not only whether $\boldsymbol{\theta},\boldsymbol{\beta}$ are random effects but also on the angle between $\boldsymbol{\theta},\boldsymbol{\beta}$ (whichever is fixed effects) and $\mathbf{v}_{1}$ . As for the range of $\tau$ (the parameter deciding the angle with $\mathbf{v}_{1}$ ), Theorem 3.4 only records the results for $\tau>0$ . Indeed, for $\tau=0$ , the spaces for $\boldsymbol{\beta},\boldsymbol{\theta}$ are rendered unrestricted. In this case, our next result demonstrates the lack of phase transition compared to the results in Theorem 3.4.

Proposition 3.5.

Consider testing (2) under (2) using $\varphi_{k,\mathrm{out}}(t)$ . Further assume that $\lambda_{1}=\Theta(p^{\tau_{0}})$ , $\tau_{0}>0$ and $\gamma<1$ . Then the following hold under Assumptions 2.6 (a), (b)’: {equs}lim inf_n→∞:p/n →γ¿0 sup_β,θ∈C_0(S_V_k^*) P_δ=0(LR_k^⋆,out¿χ^2_1(α))¿α. Further, when $\boldsymbol{\beta}\sim N(0,\frac{1}{p}\sigma^{2}_{\theta}I_{p})$ , the following holds under Assumptions 2.6 (a), (b)’: {equs}lim inf_n→∞:p/n →γ¿0 sup_θ∈C_0(S_V_k^*) P_δ=0(LR_k^⋆,out¿χ^2_1(α))¿α. If $\boldsymbol{\beta}\sim N(0,\frac{1}{p}\sigma^{2}_{\beta}I_{p})$ and $\boldsymbol{\theta}\sim N(0,\frac{1}{p}\sigma^{2}_{\theta}I_{p})$ , then

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k^{\star},\mathrm{out}}>\chi^{2}_{1}(\alpha))>\alpha.

However, if $\boldsymbol{\theta}\sim N(0,\frac{1}{p}\sigma^{2}_{\theta}I_{p})$ , for any $\gamma>0$ ,

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup_{\boldsymbol{\beta}\in\mathcal{C}_{\tau}(S_{V_{k^{*}}})}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k,\mathrm{out}}>\chi^{2}_{1}(\alpha))=\alpha.

Before proceeding to the analysis of $\mathrm{LR}_{k,\mathrm{in}}$ we present a partial result on the behavior of $\mathrm{LR}_{k,\mathrm{out}}$ under a regime which avoids the reliance on Assumption 2.7.

Proposition 3.6.

Consider testing (2) under (2) using $\varphi_{k,\mathrm{out}}(t)$ . Further assume that $\lambda_{1}=\Theta(p^{\tau_{0}})$ and $\tau,\tau_{0}>0$ and $\gamma<1$ . Then the following hold under Assumptions 2.6 (a), (b)’, and 2.8 whenever $\min\{\tau,\tau_{0}\}<\frac{1}{2}$ . {equs}lim inf_n→∞:p/n →γ¿0 sup_β∈C_τ(S_V_k^*) P_δ=0(LR_k^⋆,out¿χ^2_1(α))¿α. If $\boldsymbol{\beta}\sim N(0,\frac{1}{p}\sigma^{2}_{\beta}I_{p})$ . Then one has

\liminf\limits_{n\rightarrow\infty:\atop p/n\rightarrow\gamma>0}\sup_{\boldsymbol{\theta}\in\mathcal{C}_{\tau}(S_{V_{k^{*}}})}\mathbb{P}_{\delta=0}(\mathrm{LR}_{k^{\star},\mathrm{out}}>\chi^{2}_{1}(\alpha))>\alpha.

Our final result pertains to the behavior of $\mathrm{LR}_{k,\mathrm{in}}$ in the regime of divergence spikes. As noted above, this behavior is subtle and depends on the nature of localization of the coordinates of the leading eigenvectors of $\Sigma$ . Since the complete characterization of all the possible cases that can arise is beyond the scope of the paper, we focus on a simple special case of single spiked model below with completely delocalized spiked eigenvector.

Theorem 3.7.

Consider testing (2) under (2) using $\varphi_{k,\mathrm{in}}(t)$ and suppose that Assumptions 2.6 (a), (c)’, and 2.7 hold.

i.

Assume that $k^{*}=1$ , $\lambda_{1}=\Theta(p^{\tau_{0}})$ with $\tau_{0}>0$ and that $\inf_{j}|v_{1}(j)|,\sup_{j}|v_{1}(j)|=\Theta(1/\sqrt{p})$ . Then the following hold with any fixed $k\geq 1$ and fixed $M>0$ {equs}lim inf_n→∞:p/n →γ¿0 sup_β∈S_v_1:∥β∥≤M P_δ=0(LR_k,in¿χ^2_1(α))=α.
ii.

If $\boldsymbol{\beta}\sim N(0,\frac{\sigma^{2}_{\beta}}{p}I_{p})$ , {equs}lim inf_n→∞:p/n →γ¿0 P_δ=0(LR_1,in¿χ^2_1(α))¿α.

Indeed, for diverging spikes using $A$ twice, once in regression of $Y$ on $A$ and also while adjusting for confounding through PCA, is more beneficial compared to $\mathrm{LR}_{k,\mathrm{out}}$ . This is in contrast to existing intuition which only considers power of the procedures and ignores the Type I error considerations (mai2021understanding; listgarten2012improved; yang2014advantages).

4. Numerical Experiments

In this section, we present detailed numerical experiments to not only verify our theoretical results but also to provide additional evidence on the potential universality of the main narrative behind our results that persists beyond the working assumptions of our theoretical results.

We first present results on tests based on $\mathrm{LR}_{k,\mathrm{out}}$ . Our simulations are set up to explore the following features of PC adjustment based procedure for testing (2) under true underlying model (2) by varying the following features: (i) marginal distribution of $\mathbf{W}$ and associated strength of confounding as captured by the spiked strength in case of the generalized spiked model or a suitable notion of mixing strength for mixture models on $\mathbf{W}$ ; (ii) conditional distribution of $A|\mathbf{W}$ ; (iii) aspect ratio between the sample size and the number of confounding variables; (iv) the nature of the regression of $Y|A,\mathbf{W}$ (i.e. fixed or random effects as driven by the nature of $\boldsymbol{\beta}$ in model (2)); and (v) the nature of the regression of $A|\mathbf{W}$ (i.e. once again fixed or random effects as driven by the nature of $\boldsymbol{\theta}$ in model (2.7)). Below we first provide a description of the variations that different types of cases we will consider in this regard.

•
Distribution of $\mathbf{W}$ : We shall consider two broad sub-cases in this regard with further variations as follows.
1. (1)
  
  Spiked Model: In this case we will consider $\mathbf{W}_{i}\sim N(0,\Sigma)$ with $\Sigma=I+\sum_{j=1}^{k^{*}}\lambda_{j}v_{j}v_{j}^{T}$ with $\lambda_{1}\geq\ldots,\lambda_{k}^{*}>0$ . We shall consider $k^{*}=1$ , and $\lambda_{1}=p^{\tau_{0}}$ to regulate the spike strength, where $\tau_{0}\in\{0,0.05,0.10,\ldots,1\}$ .
2. (2)
  
  Mixture Model: We shall consider two types mixture models: (i) (Gaussian Mixture Model) $\mathbf{W}_{i}\sim\frac{1}{2}N(\boldsymbol{\mu}_{1},I)+\frac{1}{2}N(\boldsymbol{\mu}_{2},I)$ with $\boldsymbol{\mu}_{i}=(\mu_{i1},\ldots,\mu_{ip})$ for $i=1,2$ , and $\mu_{1j}=-\mu_{2j}=1$ for $j=1,\ldots,m$ and $\mu_{1j}=\mu_{2j}=0$ for $j\geq m+1$ ; and (ii) (Binomial Mixture Model) $\mathbf{W}_{i}=(W_{i1},\ldots,W_{ip})$ with $W_{ij}\sim\frac{1}{2}\mathrm{Bin}(2,q_{1j})+\frac{1}{2}\mathrm{Bin}(2,q_{2j})$ with $q_{1j}=0.3$ , $q_{2j}=0.7$ for $j=1,\ldots,m$ and $q_{1j}=q_{2j}=0.5$ for $j\geq m+1$ . For this model, we vary $m=\lceil{p^{\tau_{0}}\rceil}$ to regulate the strength of the population stratification, where $\tau_{0}\in\{0,0.05,0.10,\ldots,1\}$ , and $\lceil\cdot\rceil$ denotes the ceiling function.
•

Regression of $A|\mathbf{W}$ : We shall consider two cases: (i) (Continuous Exposure) $A_{i}\in\mathbb{R}$ with $A_{i}=\mathbf{W}_{i}^{T}\boldsymbol{\theta}+\eta_{i}$ where $\eta_{i}\sim N(0,\sigma_{a}^{2})$ ; and (ii) (Binomial Exposure) $A_{i}\sim\mathrm{Bin}(2,p_{i})$ with $p_{i}=\frac{\exp(\boldsymbol{\theta}^{T}\mathbf{W}_{i})}{1+\exp(\boldsymbol{\theta}^{T}\mathbf{W}_{i})}$ . We take $\sigma_{a}^{2}=1$ for our simulations.
•

Aspect Ratio: We shall consider two different aspect ratios $\gamma\in\{0.5,2\}$ between the sample size and the number of confounding variables in our simulations. In all our simulations, we shall let $p=1000$ (and thus the sample size $n\in\{500,2000\}$ ).
•

Regression Coefficients for $Y|A,\mathbf{W}$ : We shall consider both fixed and random effects for $\boldsymbol{\beta}$ in the model (2). When $\boldsymbol{\beta}$ is a fixed effect, we shall vary the angle/distance metric $d(\boldsymbol{\beta},S_{v_{1}})$ by setting $\boldsymbol{\beta}=av_{1}+\sqrt{1-a^{2}}v_{2}$ , where $v_{1}$ and $v_{2}$ are the eigenvectors corresponding to the largest and second largest eigenvalues of $\mathbb{E}(\mathbf{W}^{\top}\mathbf{W}/n)$ , and $a=1-p^{-\tau_{0}}$ . The specification of $\tau_{0}$ remains the same as described in the context of varying spike strength. When $\boldsymbol{\beta}$ is a random effect, $\boldsymbol{\beta}$ is drawn randomly from $N_{p}(0,1/pI)$ .
•

Regression Coefficients for $A,\mathbf{W}$ : We shall consider both fixed and random effects for $\boldsymbol{\theta}$ in the model (2.7). The specifications of $\boldsymbol{\theta}$ as fixed and random effects are exactly the same as described above for $\boldsymbol{\beta}$ .

We are now ready to present our numerical experiments which delves into combinations of the setups presented above. In particular, we shall consider three different combinations of the distributional assumptions of $\mathbf{W}$ and $A|\mathbf{W}$ , namely, (1) Spiked Model on $\mathbf{W}$ and continuous exposure $A$ ; (2) Gaussian mixture model on $\mathbf{W}$ and and continuous exposure $A$ ; and (3) Binomial mixture model on $\mathbf{W}$ and binomial exposure $A$ . In each of these cases and for all combinations of different aspect ratios and different specifications of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ , we simulate the outcomes under the null model (i.e. $\delta=0$ in (2)) by taking $\sigma^{2}_{y}=1$ . We performed 2000 replications of the dataset and tested $\varphi_{k,\mathrm{out}}(\chi_{1}^{2}(\alpha))$ on each dataset by taking $k=1$ and $\alpha=0.05$ . Finally, we plot the empirical type I error rate of the test across varying $\tau_{0}$ in Figure 1.

Refer to caption — (a) Spiked model, continuous exposure
( $\gamma=2$ )

In Figure 1, the left and the right columns present the results based on the two different aspect ratios, $\gamma=2$ and $0.5$ , respectively. From top to bottom, the three rows represent simulation setups (1), (2), and (3) as described above. In each of the plots, the black line corresponds to both $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ being fixed effects, the blue line corresponds to $\boldsymbol{\beta}$ being a fixed effect and $\boldsymbol{\theta}$ being a random effect, the red line corresponds to $\boldsymbol{\beta}$ being a random effect and $\boldsymbol{\theta}$ being a fixed effect, and the pink line corresponds to both $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ being random effects. The results verify the claims made in Theorem 3.4, as well as Theorems 3.1,3.3 when $\tau_{0}=0$ .

Next, we present results on tests based on $\mathrm{LR}_{k,in}$ . Below we provide a description of different aspects of the simulation procedure that we will consider.

•

Distribution of $\mathbf{X}$ : We shall consider a spiked model for $\mathbf{X}$ . In particular, we shall consider $\mathbf{X}_{i}\sim N(0,\Sigma)$ with $\Sigma=I+\sum_{j=1}^{k^{*}}\lambda_{j}v_{j}v_{j}^{T}$ with $\lambda_{1}\geq\ldots,\lambda_{k}^{*}>0$ . We shall set $k^{*}=1$ , and $\lambda_{1}=p^{\tau_{0}}$ to regulate the spike strength, where $\tau_{0}\in\{0,0.05,0.10,\ldots,1\}$ . We shall further generate $v_{1}$ by drawing randomly from $N_{p}(0,1/pI)$ and normalizing afterwards. Next, we shall partition $\mathbf{X}=\left[A:\mathbf{W}\right]$ , i.e., take the first column of $\mathbf{X}$ as $A$ and the rest as $\mathbf{W}$ .
•

Aspect Ratio: We shall consider two different aspect ratios $\gamma\in\{0.5,2\}$ between the sample size and the number of confounding variables in our simulations. In all our simulations, we shall let $p=1000$ (and thus the sample size $n\in\{500,2000\}$ .
•

Regression Coefficients for $Y|A,\mathbf{W}$ : We shall consider both fixed and random effects for $\boldsymbol{\beta}$ in the model (2). When $\boldsymbol{\beta}$ is a fixed effect, we shall set $\boldsymbol{\beta}=\Sigma_{(22)}^{-1}\Sigma_{(21)}$ , where $\Sigma_{(21)},\Sigma_{(22)}$ are defined based on the partition of $\Sigma$ ,

$\Sigma=\begin{bmatrix}\Sigma_{(11)}&\Sigma_{(12)}\\ \Sigma_{(21)}&\Sigma_{(22)}\end{bmatrix}.$

Here $\Sigma_{(11)}$ is a scalar. When $\boldsymbol{\beta}$ is a random effect, $\boldsymbol{\beta}$ is drawn randomly from $N_{p}(0,\frac{1}{p}I)$ .

For all combinations of different aspect ratios and different specifications of $\boldsymbol{\beta}$ , we simulate the outcomes under the null model (i.e. $\delta=0$ in (2)) by taking $\sigma^{2}_{y}=1$ . We performed 2000 replications of the dataset and tested $\varphi_{k,in}(\chi_{1}^{2}(\alpha))$ on each dataset by taking $k=1$ and $\alpha=0.05$ . Finally, we plot the empirical type I error rate of the test across varying $\tau_{0}$ in Figure 2.

Figures 2(a) and 2(b) present the results based on the two different aspect ratios, $\gamma=2$ and $0.5$ , respectively. In each of the plots, the black line corresponds to $\boldsymbol{\beta}$ being fixed effects, and the red line corresponds to $\boldsymbol{\beta}$ being random effects. The results verify the claims made in Theorem 3.7.

5. Discussions

In this paper we consider hypothesis testing of individual parameters in high dimensional linear regression set up under proportional asymptotics while adjusting for additional confounding through principal component analysis. This method is regularly employed in large-scale genetic and epigenetic association studies and in this paper, we take some initial steps to shed light on necessary and sufficient conditions for the validity of this procedure. Indeed, several interesting directions remain to be explored. A non-exhaustive list of such avenues is as follows: (i) What is a complete picture of phase transitions for $\mathrm{LR}_{k,\mathrm{in}}$ in diverging spiked models? (ii) How to analyze $\mathrm{LR}_{k,\rm ind}$ for more realistic genetic admixture models (price2006principal)? (iii) How to capture similar results when $A|\mathbf{W}$ follows a multiple logistic regression with Hardy-Weinberg equilibrium proportions depending through the logistic link on $\mathbf{W}$ ? (iv) How to perform analogous analysis for case-control studies? and (v) How to construct corrections to the cut-offs for $\mathrm{LR}_{k,\rm ind}$ in regimes where the Type-I error of the tests is inflated? We keep these and other important directions for future research endeavors.

6. Proofs of Main Results

Throughout our proofs whenever the context is clear, with an abuse of notation, we shall write both $\mathbb{W}$ and $\mathbb{X}$ as $\sum_{j=1}^{n\wedge p}\hat{d}_{j}\hat{u}_{j}\hat{v_{j}}^{T}$ and denote $\hat{\lambda_{j}}=\hat{d_{j}}/\sqrt{n}$ – and use them accordingly in the context of $\mathrm{LR}_{\mathrm{out}}$ and $\mathrm{LR}_{\mathrm{in}}$ respectively.

6.1. Proof of Theorem 3.1

We only prove the result for $\mathrm{LR}_{k,\mathrm{out}}$ and note that the proof for $\mathrm{LR}_{k,\mathrm{in}}$ follows by similar arguments by simply replacing $\boldsymbol{\beta},\boldsymbol{\theta}$ by $\widetilde{\boldsymbol{\beta}}:=\widetilde{I}_{-1}\boldsymbol{\beta}$ and $\widetilde{\boldsymbol{\theta}}:=\widetilde{I}_{-1}\boldsymbol{\theta}$ respectively with $\widetilde{I}_{-1}:=[e_{2}:\cdots:e_{p}]$ and $e_{j}$ denotes the $j^{\rm th}$ canonical basis of $\mathbb{R}^{p+1}$ .

We begin by noting that by Lemma LABEL:lem:distribution_lr one has {equs}LR_k, out—A,W∼χ_1^2(^κ^2_k,out), ^κ^2_k,out=((βTW⊤+A⊤δ)(I-PC(W~Vk,out))A)2A⊤(I-PC(W~Vk,out))A.

Since

\mathbb{P}_{\delta=0}(LR_{k,\mathrm{out}}>t_{\alpha})=\mathbb{E}\Big{(}\bar{\Phi}(t_{\alpha}-\hat{\kappa}_{k,\mathrm{out}}^{2})-\Phi(-t_{\alpha}-\hat{\kappa}_{k,\mathrm{out}}^{2})\Big{)}^{2},

where $\Phi$ is the standard normal cdf and $\bar{\Phi}=1-\Phi$ , it will be sufficient to derive asymptotic distribution of $\hat{\kappa}_{k,\mathrm{out}}^{2}$ followed by an application of uniform integrability principle to derive the asymptotic behavior of the likelihood ratio test.

Proof of Theorem 3.1 i.

The non-centrality parameter $\kappa^{2}_{k,\mathrm{out}}$ under $H_{0}$ is given by {equs} ^κ_k,out^2=(W β)^⊤P_C(A_k) (W β) =(β⊤W⊤(I-PC(W~Vk,out))A)2A⊤(I-PC(W~Vk,out))A:=T21T2. First note that the coordinates of $A_{i}$ are i.i.d with finite $\psi_{2}$ -norm (this follows from Assumption 2.7 and (2)). Therefore by Bernstein’s inequality, $\exists$ $C_{2},C_{2}^{\prime}>0$ such that $\|\mathbf{A}\|^{2}_{2}\leq C_{2}n$ w.p. $\geq 1-e^{-C_{2}^{\prime}n}$ . Hence,

\frac{T_{2}}{n}\leq\frac{\|\mathbf{A}\|^{2}_{2}}{n}\leq C_{2},

w.p. $\geq 1-e^{-C_{2}^{\prime}n}$ . Next we show that there exists a constant $C_{1}$ and a pair $\boldsymbol{\beta},\boldsymbol{\theta}$ such that with probability converging to $1$ one has $T_{1}^{2}\geq C_{2}n^{2}$ . To this end note that for any $\boldsymbol{\beta},\boldsymbol{\theta}$ , {equs}T_1=β^⊤W^⊤(I-P_C(W~V_k,out))Wθ+ β^⊤W