
These authors contributed equally to this work.

Estimating Broad Sense Heritability via Kernel Ridge Regression

Olivia Bley ([email protected]), Elizabeth Lei ([email protected]), Andy Zhou ([email protected]), and Xiaoxi Shen ([email protected], corresponding author)

[1] Department of Mathematics, Texas State University, 601 University Dr., San Marcos, TX 78666, U.S.A.
[2] San Marcos High School, 2601 Rattler Road, San Marcos, TX 78666, U.S.A.
[3] William P. Clements High School, 4200 Elkins Rd., Sugar Land, TX 77479, U.S.A.
Abstract

The broad-sense genetic heritability, which quantifies the total proportion of phenotypic variation in a population due to genetic factors, is crucial for understanding trait inheritance. While many existing methods focus on estimating narrow-sense heritability, which accounts only for additive genetic variation, this paper introduces a kernel ridge regression approach to estimate broad-sense heritability. We provide both upper and lower bounds for the estimator. The effectiveness of the proposed method was evaluated through extensive simulation studies using both synthetic genotypes and real genotype data from the 1000 Genomes Project. Additionally, the estimator was applied to data from the Alzheimer's Disease Neuroimaging Initiative to demonstrate its practical utility.

keywords:
Genetic Heritability, Kernel Ridge Regression, Spectrum, Asymptotic Behaviors

1 Introduction

Genetic heritability, which measures the proportion of phenotypic variance explained by genetic factors, is a crucial quantity that can enhance our understanding of the genetic mechanisms underlying complex phenotypes [9]. As noted by Zhu and Zhou [21], two types of heritability can be estimated. The first is broad-sense heritability ($H^{2}$), which evaluates the proportion of phenotypic variance explained by all genetic factors, including additive effects, dominance effects, and epistatic interactions. The second type, narrow-sense heritability ($h^{2}$), represents the proportion of phenotypic variance explained solely by additive genetic effects.

Since the success of Genome-Wide Association Studies (GWAS), hundreds of single nucleotide polymorphisms (SNPs) associated with complex traits have been discovered [4, 2]. However, SNPs identified through GWAS explain only a small fraction of the genetic variation for any given trait, a well-known issue referred to as “missing heritability” [10, 11]. One possible explanation is that each individual SNP contributes only a tiny effect to the trait; thus, many SNPs remain undetected as they fail to meet the stringent significance threshold in GWAS [20]. As a result, existing literature on narrow-sense heritability estimation often relies on modeling the joint effects of multiple SNPs, with linear mixed-effects models (LMMs) playing a key role in constructing narrow-sense heritability estimators. For example, [20] applied Restricted Maximum Likelihood (REML) multi-variance component analysis on large sample sizes to estimate heritability and uncover genetic architecture. This method demonstrated significant reductions in computation time and memory use compared to previous approaches. The BOLT-REML algorithm revealed that SNP heritability increases with higher-frequency SNPs, while rarer SNPs have a larger effect per allele but explain less variance per SNP. The study further characterized schizophrenia as having a highly polygenic architecture.

In line with this trend, numerous advancements have been made to improve LMM-based narrow-sense heritability estimators. For instance, [9] addressed three main challenges—scale, ascertainment, and SNP quality control—and demonstrated that genetic variation in disease liability, linked to common SNPs in linkage disequilibrium (LD), could be estimated from genome-wide association studies (GWAS) data. First, variance explained on the observed risk scale was transformed to an underlying liability scale. Second, the method accounted for ascertainment, which led to a much higher proportion of cases in the analyzed sample than in the general population. Third, the new approach minimized inflated heritability estimates by preventing artificial differences between case and control genotypes through rigorous GWAS quality control. This method was tested using simulated data and real GWAS data from three Wellcome Trust Case Control Consortium (WTCCC) datasets: Crohn’s disease, bipolar disorder, and type I diabetes. Additionally, Speed et al. [15] proposed a modified kinship matrix that weights SNPs based on local LD, significantly reducing bias in heritability estimates and enhancing estimation precision.

Several additional approaches have been developed to improve narrow-sense heritability estimation. For example, with the reduced cost of whole-genome sequencing and advances in data storage, biobank data has become accessible. [3] explored an estimator derived under a model that minimizes assumptions about genetic architecture, acknowledging that models with numerous assumptions risk bias due to the unknown nature of genetic architecture. They proposed a generalized random effects (GRE) model that makes minimal assumptions about genetic architecture. They found that the GRE estimator is nearly unbiased across all architectures compared to existing methods, with a maximum bias of around 2%. As anticipated, the GRE model demonstrated robust performance across architectures, with a runtime of $O(Np_{k}^{2})$ for chromosome $k$ with $p_{k}$ SNPs, taking approximately 3 hours and 60 GB of memory for 50,000 SNPs. Another direction involves estimating narrow-sense heritability using only summary data, such as the LD matrix. Although individual-level data can provide more precise estimates, its accessibility is often limited by data-sharing restrictions. Recently, Li et al. [8] proposed an iterative procedure called HEELS, which uses summary statistics to solve REML-score equations by adapting Henderson's algorithm for variance component estimation in LMMs. This approach improves the statistical efficiency of heritability estimators based on summary statistics, achieving performance comparable to that of REML.

Although accurately estimating narrow-sense heritability has received considerable attention in the literature, the true relationship between genetic components and phenotypic traits is unknown, and focusing solely on additive (linear) effects may result in a loss of information. Therefore, it is also important to estimate broad-sense heritability accurately. Mathematically, suppose the relationship between SNPs ($\boldsymbol{Z}$) and a phenotypic trait ($Y$) is given by $Y=g(\boldsymbol{Z})+\epsilon$, where $\epsilon$ is a random error. The broad-sense heritability can then be expressed as

H^{2} = \frac{\textrm{Var}[g(\boldsymbol{Z})]}{\textrm{Var}[g(\boldsymbol{Z})]+\textrm{Var}[\epsilon]}. (1)

If the function $g$ were known and i.i.d. data pairs $(\boldsymbol{Z}_{1},Y_{1}),(\boldsymbol{Z}_{2},Y_{2}),\ldots,(\boldsymbol{Z}_{n},Y_{n})$ were observed, a natural estimator for $\textrm{Var}[g(\boldsymbol{Z})]$ would be the sample variance $(n-1)^{-1}\sum_{i=1}^{n}(g(\boldsymbol{Z}_{i})-\bar{g})^{2}$, with $\bar{g}=n^{-1}\sum_{i=1}^{n}g(\boldsymbol{Z}_{i})$. Similarly, a natural estimator for $\textrm{Var}[\epsilon]$ is its sample counterpart, $n^{-1}\sum_{i=1}^{n}(Y_{i}-g(\boldsymbol{Z}_{i}))^{2}$. An estimator of $H^{2}$ can be obtained by replacing the variances with their respective estimators. However, since $g$ is unknown, it must also be estimated. In this paper, this estimation is achieved through kernel ridge regression.

The remainder of the paper is organized as follows: Section 2 provides the details of our proposed estimator for the broad-sense heritability as well as some theoretical properties of the estimator, followed by simulation studies whose results are presented in Section 3. A real data analysis applying the proposed method to data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) is given in Section 4, and Section 5 provides a discussion of the proposed method and future work.

Notations: Throughout the remainder of the paper, we use bold lower-case alphabetic and Greek letters to denote non-random vectors. Capitalized alphabetic and Greek letters denote random variables, while bold capitalized alphabetic and Greek letters denote matrices or random vectors. For a matrix $\boldsymbol{A}$, $\left\|\boldsymbol{A}\right\|_{op}$ denotes the operator norm, which is the largest singular value of $\boldsymbol{A}$, and $\left\|\boldsymbol{A}\right\|_{F}$ denotes the Frobenius norm. For a vector $\boldsymbol{a}$, $\left\|\boldsymbol{a}\right\|$ denotes the Euclidean norm. For a square matrix $\boldsymbol{A}$, $l_{1}(\boldsymbol{A})\geq l_{2}(\boldsymbol{A})\geq\cdots\geq l_{n}(\boldsymbol{A})$ denote the eigenvalues of $\boldsymbol{A}$.

2 Methodology

Consider the following semi-parametric model for the population:

Y = g(\boldsymbol{Z})+\epsilon, (2)

where $\boldsymbol{Z}\in\mathbb{R}^{p}$ is a vector containing $p$ SNPs. Suppose that $g\in\mathcal{H}_{K}$, with $\mathcal{H}_{K}$ being a reproducing kernel Hilbert space associated with the kernel function $K(\cdot,\cdot)$. Now, given the observed data pairs $\{(Y_{i},\boldsymbol{Z}_{i})\}_{i=1}^{n}$ from a random sample, the sample version of model (2) is given by

Y_{i} = g(\boldsymbol{Z}_{i})+\epsilon_{i},\quad i=1,\ldots,n. (3)

Here we assume that the $\epsilon_{i}$ are i.i.d. with mean zero and variance $\sigma_{\epsilon}^{2}$. Estimation of $g\in\mathcal{H}_{K}$ can be accomplished by minimizing the penalized least squares loss function

\hat{g} = \textrm{argmin}_{g\in\mathcal{H}_{K}}\frac{1}{n}\sum_{i=1}^{n}(Y_{i}-g(\boldsymbol{Z}_{i}))^{2}+\lambda_{n}\left\|g\right\|_{\mathcal{H}_{K}}^{2}. (4)

Thanks to the Representer Theorem [6, 7], the general solution for the nonparametric function $g(\cdot)$ in (4) can be expressed as

g(\cdot) = \sum_{i=1}^{n}\alpha_{i}K(\boldsymbol{Z}_{i},\cdot),

so that model (3), in matrix-vector form, can be expressed as

\boldsymbol{Y} = \boldsymbol{K}\boldsymbol{\alpha}+\boldsymbol{\epsilon}, (5)

where $\boldsymbol{K}=[K(\boldsymbol{Z}_{i},\boldsymbol{Z}_{j})]_{i,j=1}^{n}$ is the kernel matrix constructed from the observed data $\boldsymbol{Z}_{1},\ldots,\boldsymbol{Z}_{n}$, and the optimization problem (4) becomes

\hat{\boldsymbol{\alpha}} = \textrm{argmin}_{\boldsymbol{\alpha}}\frac{1}{n}(\boldsymbol{Y}-\boldsymbol{K}\boldsymbol{\alpha})^{T}(\boldsymbol{Y}-\boldsymbol{K}\boldsymbol{\alpha})+\lambda_{n}\boldsymbol{\alpha}^{T}\boldsymbol{K}\boldsymbol{\alpha}. (6)

Set

Q(\boldsymbol{\alpha}) = \frac{1}{n}(\boldsymbol{Y}-\boldsymbol{K}\boldsymbol{\alpha})^{T}(\boldsymbol{Y}-\boldsymbol{K}\boldsymbol{\alpha})+\lambda_{n}\boldsymbol{\alpha}^{T}\boldsymbol{K}\boldsymbol{\alpha},

the objective function in (6). Taking the derivative with respect to $\boldsymbol{\alpha}$ gives

\frac{\partial Q}{\partial\boldsymbol{\alpha}} = -\frac{2}{n}\boldsymbol{K}(\boldsymbol{Y}-\boldsymbol{K}\boldsymbol{\alpha})+2\lambda_{n}\boldsymbol{K}\boldsymbol{\alpha}.

Setting this derivative equal to $\boldsymbol{0}$ yields $\boldsymbol{K}[(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})\boldsymbol{\alpha}-\boldsymbol{Y}]=\boldsymbol{0}$, which is solved by

\hat{\boldsymbol{\alpha}} = (\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{Y}.

We estimate the broad-sense heritability using $\hat{\boldsymbol{\alpha}}$ as given above, together with $\hat{\sigma}_{\epsilon}^{2}=\frac{1}{n}(\boldsymbol{Y}-\boldsymbol{K}\hat{\boldsymbol{\alpha}})^{T}(\boldsymbol{Y}-\boldsymbol{K}\hat{\boldsymbol{\alpha}})$. The broad-sense heritability can then be estimated via

\hat{H}^{2} = \frac{\frac{1}{n-1}\hat{\boldsymbol{\alpha}}^{T}\boldsymbol{K}(\boldsymbol{I}_{n}-\frac{1}{n}\boldsymbol{J}_{n})\boldsymbol{K}\hat{\boldsymbol{\alpha}}}{\frac{1}{n-1}\hat{\boldsymbol{\alpha}}^{T}\boldsymbol{K}(\boldsymbol{I}_{n}-\frac{1}{n}\boldsymbol{J}_{n})\boldsymbol{K}\hat{\boldsymbol{\alpha}}+\hat{\sigma}_{\epsilon}^{2}}. (7)
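To make the estimator concrete, the following is a minimal sketch of (7) in Python, assuming NumPy and a precomputed kernel matrix; the function and variable names (estimate_h2, n_lam) are illustrative and not from the paper.

```python
import numpy as np

def estimate_h2(K, y, n_lam):
    """Sketch of the broad-sense heritability estimator (7).

    K     : (n, n) kernel matrix built from the SNP data
    y     : (n,) phenotype vector
    n_lam : the product n * lambda_n
    """
    n = len(y)
    alpha = np.linalg.solve(K + n_lam * np.eye(n), y)  # alpha_hat = (K + n*lam*I)^{-1} Y
    g_hat = K @ alpha                                  # fitted values K alpha_hat
    sigma_g2 = np.var(g_hat, ddof=1)                   # (n-1)^{-1} g_hat' (I - J/n) g_hat
    sigma_e2 = np.mean((y - g_hat) ** 2)               # n^{-1} ||Y - K alpha_hat||^2
    return sigma_g2 / (sigma_g2 + sigma_e2)

# Toy usage with a linear kernel K = p^{-1} Z Z^T and synthetic genotypes.
rng = np.random.default_rng(0)
Z = rng.integers(0, 3, size=(200, 50)).astype(float)   # additive genotypes in {0, 1, 2}
y = Z @ rng.normal(0.0, 0.1, size=50) + rng.normal(0.0, 0.5, size=200)
print(estimate_h2(Z @ Z.T / Z.shape[1], y, n_lam=1.0))
```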

Now, we will look at the theoretical properties of the proposed heritability estimator $\hat{H}^{2}$ by providing asymptotic upper and lower bounds. For simplicity, we denote $\boldsymbol{g}=[g(\boldsymbol{Z}_{1}),\ldots,g(\boldsymbol{Z}_{n})]^{T}$, $\bar{\boldsymbol{J}}_{n}=\frac{1}{n}\boldsymbol{J}_{n}$ and $\hat{\sigma}_{g}^{2}=\frac{1}{n-1}\hat{\boldsymbol{\alpha}}^{T}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}\hat{\boldsymbol{\alpha}}$. To start with, let us decompose $\hat{\sigma}_{g}^{2}$ and $\hat{\sigma}_{\epsilon}^{2}$ as follows:

\hat{\sigma}_{g}^{2} = \frac{1}{n-1}\boldsymbol{Y}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{Y}
= \frac{1}{n-1}(\boldsymbol{g}+\boldsymbol{\epsilon})^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}(\boldsymbol{g}+\boldsymbol{\epsilon})
= \frac{1}{n-1}(I_{1}^{g}+I_{2}^{g}+I_{3}^{g}),

where

I_{1}^{g} = \boldsymbol{g}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g},
I_{2}^{g} = \boldsymbol{\epsilon}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{\epsilon},
I_{3}^{g} = 2\boldsymbol{g}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{\epsilon}.

Similarly,

\hat{\sigma}_{\epsilon}^{2} = \frac{1}{n}(\boldsymbol{Y}-\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{Y})^{T}(\boldsymbol{Y}-\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{Y})
= \frac{1}{n}\boldsymbol{Y}^{T}(\boldsymbol{I}_{n}-\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1})(\boldsymbol{I}_{n}-\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1})\boldsymbol{Y}
= \frac{1}{n}n^{2}\lambda_{n}^{2}\boldsymbol{Y}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{Y}
= \frac{1}{n}n^{2}\lambda_{n}^{2}(\boldsymbol{g}+\boldsymbol{\epsilon})^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}(\boldsymbol{g}+\boldsymbol{\epsilon})
= \frac{1}{n}(I_{1}^{\epsilon}+I_{2}^{\epsilon}+I_{3}^{\epsilon}),

where

I_{1}^{\epsilon} = n^{2}\lambda_{n}^{2}\boldsymbol{g}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{g},
I_{2}^{\epsilon} = n^{2}\lambda_{n}^{2}\boldsymbol{\epsilon}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{\epsilon},
I_{3}^{\epsilon} = 2n^{2}\lambda_{n}^{2}\boldsymbol{g}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{\epsilon}.

Our first observation is that the cross-product terms $\frac{1}{n-1}I_{3}^{g}$ and $\frac{1}{n}I_{3}^{\epsilon}$ are asymptotically negligible under conditions (C1) and (C2). (C1) assumes that the random errors are sub-Gaussian random variables and (C2) requires that the underlying function $g$ is bounded. These assumptions are common in the literature.

  • (C1) (Sub-Gaussian Errors) $\epsilon_{1},\ldots,\epsilon_{n}$ are i.i.d. sub-Gaussian random variables with $\max_{i}\left\|\epsilon_{i}\right\|_{\psi_{2}}\leq C$ for some constant $C>0$.

  • (C2) (Bounded Signals) $\left\|g\right\|_{\infty}\leq M$ for some $M>0$.

Proposition 1.

Under Assumptions (C1) and (C2), given $\mathcal{Z}:=\sigma(\boldsymbol{Z}_{1},\ldots,\boldsymbol{Z}_{n})$,

\frac{1}{n-1}I_{3}^{g}=o_{p}(1),\qquad\frac{1}{n}I_{3}^{\epsilon}=o_{p}(1).

Since $I_{3}^{g}$ and $I_{3}^{\epsilon}$ are asymptotically negligible, the asymptotic performance of $\hat{H}^{2}$ is solely determined by the asymptotic behavior of $I_{i}^{g}$ and $I_{i}^{\epsilon}$, $i\in\{1,2\}$. We first focus on the asymptotic performance of $I_{2}^{g}$ and $I_{2}^{\epsilon}$.

Proposition 2.

Under Assumptions (C1) and (C2), given $\mathcal{Z}:=\sigma(\boldsymbol{Z}_{1},\ldots,\boldsymbol{Z}_{n})$,

\frac{1}{n-1}I_{2}^{g} = \frac{\sigma_{\epsilon}^{2}}{n-1}\textrm{tr}\left((\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\right)+o_{p}(1),
\frac{1}{n}I_{2}^{\epsilon} = \frac{\sigma_{\epsilon}^{2}}{n}\textrm{tr}\left(n^{2}\lambda_{n}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\right)+o_{p}(1).

For $I_{1}^{g}$ and $I_{1}^{\epsilon}$, since both terms are quadratic forms, they are trivially bounded below by 0. To provide nontrivial bounds for these quadratic forms, we further impose the following assumptions. Condition (C3) requires that $\boldsymbol{g}$ and $\boldsymbol{v}_{1}$ are not orthogonal, so that $\boldsymbol{g}$ does not lie in the null space of $\boldsymbol{K}$. Similarly, (C3) also requires that neither $\boldsymbol{v}_{1}$ nor $\boldsymbol{g}$ lies in the null space of $\boldsymbol{J}_{n}$. Condition (C4) requires that there is a sufficient gap between the largest and the second largest eigenvalues of the kernel matrix $\boldsymbol{K}$.

  • (C3) There exists a constant $c\in(0,1]$ such that

    |\boldsymbol{v}_{1}^{T}\boldsymbol{1}_{n}| \geq c\sqrt{n},
    |\boldsymbol{v}_{1}^{T}\boldsymbol{g}| \geq c\left\|\boldsymbol{g}\right\| \textrm{ a.s.},
    |\boldsymbol{1}_{n}^{T}\boldsymbol{g}| \geq c\sqrt{n}\left\|\boldsymbol{g}\right\| \textrm{ a.s.},

    where $\boldsymbol{v}_{1}$ is the eigenvector corresponding to the largest eigenvalue of $\boldsymbol{K}$.

  • (C4) (Spectral Gap) There exists a constant $\alpha\in(\max(2c^{2}-1,0),1]$ such that

    \frac{l_{1}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})} > \frac{\alpha+1-c^{2}}{c^{2}} \textrm{ a.s.},

    where $l_{2}(\boldsymbol{K})$ is the second largest eigenvalue of the kernel matrix $\boldsymbol{K}$.

Proposition 3.

Under Assumptions (C3) and (C4),

\left|\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g}\right| \geq \alpha\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\right)\sqrt{n}\left\|\boldsymbol{g}\right\|,

provided that $n\lambda_{n}\geq\frac{(\alpha+1-c^{2})l_{1}(\boldsymbol{K})l_{2}(\boldsymbol{K})}{c^{2}l_{1}(\boldsymbol{K})-(\alpha+1-c^{2})l_{2}(\boldsymbol{K})}$.

As one can see from Proposition 3, to obtain a nontrivial lower bound, besides conditions (C3) and (C4), the regularization parameter $\lambda_{n}$ must also be chosen properly. In addition, as a consequence of Proposition 3, the quantity $\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g})^{2}$ behaves similarly to $\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{g})^{2}$. This observation is formalized in the following proposition.

Proposition 4.

Under Assumptions (C3), (C4) and $n\lambda_{n}\geq(\alpha+1-c^{2})l_{1}(\boldsymbol{K})l_{2}(\boldsymbol{K})/[c^{2}l_{1}(\boldsymbol{K})-(\alpha+1-c^{2})l_{2}(\boldsymbol{K})]$,

\alpha^{2}\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\right)^{2} \leq \frac{\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g})^{2}}{\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{g})^{2}} \leq c^{-2}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}.

We are now ready to bound the asymptotically non-negligible terms in $\hat{\sigma}_{g}^{2}$ and $\hat{\sigma}_{\epsilon}^{2}$. The bounds are provided in the following two propositions.

Proposition 5.

Under Assumptions (C1)-(C4) and $n\lambda_{n}\geq(\alpha+1-c^{2})l_{1}(\boldsymbol{K})l_{2}(\boldsymbol{K})/[c^{2}l_{1}(\boldsymbol{K})-(\alpha+1-c^{2})l_{2}(\boldsymbol{K})]$,

\frac{nc^{2}}{n-1}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left[\frac{1}{n}\left\|\boldsymbol{g}\right\|^{2}-c^{-4}\left(\frac{1}{n}\boldsymbol{1}_{n}^{T}\boldsymbol{g}\right)^{2}\right]
\leq \frac{1}{n-1}I_{1}^{g}
\leq \frac{n}{n-1}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left[\frac{1}{n}\left\|\boldsymbol{g}\right\|^{2}-\alpha^{2}\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\frac{l_{1}(\boldsymbol{K})+n\lambda_{n}}{l_{1}(\boldsymbol{K})}\right)^{2}\left(\frac{1}{n}\boldsymbol{1}_{n}^{T}\boldsymbol{g}\right)^{2}\right],

and

\frac{\sigma_{\epsilon}^{2}}{n-1}\sum_{i=2}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}+o_{p}(1) \leq \frac{1}{n-1}I_{2}^{g} \leq \frac{(1-c^{2})\sigma_{\epsilon}^{2}}{n-1}\sum_{i=1}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}+o_{p}(1).
Proposition 6.

Under Assumptions (C1) and (C2), given $\mathcal{Z}=\sigma\{\boldsymbol{Z}_{1},\ldots,\boldsymbol{Z}_{n}\}$,

\frac{1}{n}\left(\frac{n\lambda_{n}}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2} \leq \frac{1}{n}I_{1}^{\epsilon} \leq \frac{1}{n}\left\|\boldsymbol{g}\right\|^{2},
\left(\frac{n\lambda_{n}}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\sigma_{\epsilon}^{2}+o_{p}(1) \leq \frac{1}{n}I_{2}^{\epsilon} \leq \sigma_{\epsilon}^{2}+o_{p}(1).

Putting everything together, we have

Theorem 7.

Under (C1)-(C4) and $n\lambda_{n}\geq(\alpha+1-c^{2})l_{1}(\boldsymbol{K})l_{2}(\boldsymbol{K})/[c^{2}l_{1}(\boldsymbol{K})-(\alpha+1-c^{2})l_{2}(\boldsymbol{K})]$,

\hat{\sigma}_{g}^{2} \geq c^{2}\tau_{1}^{2}\textrm{Var}[g(\boldsymbol{Z})]+(c^{2}\tau_{1}^{2}-c^{-4})(\mathbb{E}[g(\boldsymbol{Z})])^{2}+\sigma_{\epsilon}^{2}\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\textrm{d}F_{-1}(x)+o_{p}(1),
\hat{\sigma}_{g}^{2} \leq \tau_{1}^{2}\textrm{Var}[g(\boldsymbol{Z})]+(\tau_{1}^{2}-\alpha^{2}(\tau_{2}/\tau_{1})^{2})(\mathbb{E}[g(\boldsymbol{Z})])^{2}+(1-c)^{2}\sigma_{\epsilon}^{2}\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\textrm{d}F(x)+o_{p}(1),

and

\hat{\sigma}_{\epsilon}^{2} \geq (1-\tau_{1})^{2}\left(\textrm{Var}[g(\boldsymbol{Z})]+\sigma_{\epsilon}^{2}\right)+(1-\tau_{1})^{2}(\mathbb{E}[g(\boldsymbol{Z})])^{2}+o_{p}(1),
\hat{\sigma}_{\epsilon}^{2} \leq \textrm{Var}[g(\boldsymbol{Z})]+\sigma_{\epsilon}^{2}+(\mathbb{E}[g(\boldsymbol{Z})])^{2}+o_{p}(1),

where

\tau_{i}=\lim_{n\to\infty}\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}},\quad i=1,2,

$F(x)$ is the limiting empirical spectral distribution (ESD) of $\boldsymbol{K}$, and $F_{-1}(x)$ is the limiting ESD of $\boldsymbol{K}$ after removing its largest eigenvalue.
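In practice, the limiting quantities above can be approximated by their finite-sample analogues computed from the spectrum of $\boldsymbol{K}$. The following sketch (our construction, assuming NumPy) illustrates this:

```python
import numpy as np

def spectral_terms(K, n_lam):
    # Eigenvalues l_1 >= l_2 >= ... >= l_n of the symmetric kernel matrix K.
    l = np.sort(np.linalg.eigvalsh(K))[::-1]
    ratio = l / (l + n_lam)
    tau1, tau2 = ratio[0], ratio[1]         # finite-n versions of tau_1 and tau_2
    int_F = np.mean(ratio ** 2)             # ~ int (x / (x + n*lam))^2 dF(x)
    int_F_minus1 = np.mean(ratio[1:] ** 2)  # same integral, largest eigenvalue removed
    return tau1, tau2, int_F, int_F_minus1
```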

As a result, the bounds for $\hat{\sigma}_{\epsilon}^{2}$ and $\hat{\sigma}_{g}^{2}$ further provide upper and lower bounds for $\hat{\sigma}_{\epsilon}^{2}/\hat{\sigma}_{g}^{2}$, an important quantity for $\hat{H}^{2}$ since $\hat{H}^{2}=1/(1+\hat{\sigma}_{\epsilon}^{2}/\hat{\sigma}_{g}^{2})$. In addition, with a careful choice of the regularization parameter $\lambda_{n}$, it can be shown that the bounds provided cover the true value $\sigma_{\epsilon}^{2}/\sigma_{g}^{2}$.

Proposition 8.

Under the same conditions as in Theorem 7,

\frac{(1-\tau_{1})^{2}(\textrm{Var}[g(\boldsymbol{Z})]+\sigma_{\epsilon}^{2})+(1-\tau_{1})^{2}(\mathbb{E}[g(\boldsymbol{Z})])^{2}}{c^{2}\tau_{1}^{2}\textrm{Var}[g(\boldsymbol{Z})]+(\tau_{1}^{2}-\alpha^{2}(\tau_{2}/\tau_{1})^{2})(\mathbb{E}[g(\boldsymbol{Z})])^{2}+(1-c)^{2}\sigma_{\epsilon}^{2}\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\textrm{d}F(x)}
\leq \lim_{n\to\infty}\hat{\sigma}_{\epsilon}^{2}/\hat{\sigma}_{g}^{2}
\leq \frac{\textrm{Var}[g(\boldsymbol{Z})]+\sigma_{\epsilon}^{2}+(\mathbb{E}[g(\boldsymbol{Z})])^{2}}{c^{2}\tau_{1}^{2}\textrm{Var}[g(\boldsymbol{Z})]+(c^{2}\tau_{1}^{2}-c^{-4})(\mathbb{E}[g(\boldsymbol{Z})])^{2}+\sigma_{\epsilon}^{2}\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\textrm{d}F_{-1}(x)}.

Consequently, if the regularization parameter $\lambda_{n}$ is chosen such that

\frac{(\alpha+1-c^{2})l_{1}(\boldsymbol{K})l_{2}(\boldsymbol{K})}{c^{2}l_{1}(\boldsymbol{K})-(\alpha+1-c^{2})l_{2}(\boldsymbol{K})} \leq n\lambda_{n},
\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\textrm{d}F_{-1}(x) \leq \frac{\sigma_{g}^{2}}{\sigma_{\epsilon}^{2}}(1-c^{2}\tau_{1}^{2}),
\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\textrm{d}F(x) \geq \frac{1}{1-c^{2}}\left[\frac{\sigma_{g}^{4}}{\sigma_{\epsilon}^{4}}+(1-c^{2}\tau_{1}^{2})\frac{\sigma_{g}^{2}}{\sigma_{\epsilon}^{2}}+\left(\frac{\sigma_{g}^{2}}{\sigma_{\epsilon}^{4}}-\frac{\tau_{1}^{2}-\alpha^{2}(\tau_{2}/\tau_{1})^{2}}{\sigma_{\epsilon}^{2}}\right)(\mathbb{E}[g(\boldsymbol{Z})])^{2}\right],

then the asymptotic bounds for $\hat{\sigma}_{\epsilon}^{2}/\hat{\sigma}_{g}^{2}$ cover the true value $\sigma_{\epsilon}^{2}/\sigma_{g}^{2}$, where $\sigma_{g}^{2}=\textrm{Var}[g(\boldsymbol{Z})]$.

3 Simulation

To evaluate the performance of the proposed estimators, we started by creating populations, from which response variables were generated based on the following model:

Y_{i} = g(\boldsymbol{Z}_{i})+\epsilon_{i},\quad i=1,\ldots,N, (8)

where $N$ is the population size specified in the following subsections and the $\epsilon_{i}$ are i.i.d. normal random variables with mean 0 and standard deviation 0.5. The genetic variants $\boldsymbol{Z}_{i}\in\mathbb{R}^{p}$ were generated under two different scenarios: (1) genetic variants simulated based on Hardy-Weinberg equilibrium (HWE), and (2) genetic variants sampled from the 1000 Genomes Project [1]. Three underlying functions were used in the simulation studies:

  1. Linear function: $g(\boldsymbol{Z}_{i})=2\boldsymbol{Z}_{i}^{T}\boldsymbol{\beta}+5$.

  2. Quadratic function: $g(\boldsymbol{Z}_{i})=(\boldsymbol{Z}_{i}^{T}\boldsymbol{\beta})^{2}$.

  3. Trigonometric function: $g(\boldsymbol{Z}_{i})=\sin(\boldsymbol{Z}_{i}^{T}\boldsymbol{\beta})+2\boldsymbol{Z}_{i}^{T}\boldsymbol{\beta}$.

The genetic effect $\boldsymbol{\beta}\sim\mathcal{N}_{p}(\boldsymbol{0},\sigma_{g}^{2}\boldsymbol{I}_{p})$, and the parameter $\sigma_{g}$ was chosen so that the population broad-sense heritability lies between 0.2 and 0.8. In each scenario, we considered two cases: the low-dimensional case ($N>p$) and the high-dimensional case ($N<p$). We now explain in detail how the $\boldsymbol{Z}_{i}$'s were simulated in these two scenarios.

3.1 Scenario 1: Genetic Variants Simulated from HWE

In this scenario, we started by simulating the minor allele frequencies (MAFs) of the SNPs from the uniform distribution $\mathcal{U}[0.01,0.5]$. Then, Hardy-Weinberg equilibrium (HWE) was applied to simulate the additive genotypes. Specifically, the $j$th SNP with minor allele frequency $MAF_{j}$ takes the values 0, 1, and 2 with the following probabilities, respectively:

\mathbb{P}(SNP_{j}=0) = (1-MAF_{j})^{2},
\mathbb{P}(SNP_{j}=1) = 2MAF_{j}(1-MAF_{j}),
\mathbb{P}(SNP_{j}=2) = MAF_{j}^{2}.
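A minimal sketch of this genotype-generating step (assuming NumPy; the function name is ours): since each genotype is the sum of two independent alleles, each carried with probability $MAF_{j}$, drawing from a Binomial(2, $MAF_{j}$) distribution reproduces exactly the three probabilities above.

```python
import numpy as np

def simulate_hwe_genotypes(N, p, seed=None):
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.01, 0.5, size=p)                 # MAF_j ~ U[0.01, 0.5]
    # Binomial(2, m) puts mass (1-m)^2, 2m(1-m), m^2 on 0, 1, 2 (the HWE probabilities).
    Z = rng.binomial(2, maf, size=(N, p)).astype(float)
    return Z, maf

Z, maf = simulate_hwe_genotypes(N=1000, p=500, seed=0)
```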

The choices of $N$, $p$, and $\sigma_{g}$ under different simulation settings are summarized in Table 1.

Table 1: Choices of $N$, $p$ and $\sigma_{g}$ for the scenario where the genetic variants were simulated based on HWE.

              Low-dimensional (N > p)    High-dimensional (N < p)
              N     p     σ_g            N     p     σ_g
Linear        1000  500   0.02           500   1000  0.01
Quadratic     1000  500   0.03           500   1000  0.02
Trigonometric 1000  500   0.02           500   1000  0.05

3.2 Scenario 2: Genetic Variants Simulated from Real Genetic Data

To mimic the real structure of sequence data, the genetic variants in this scenario were sampled from real sequencing data on Chromosome 17: 7,344,328-8,344,327 from the 1000 Genomes Project. The distribution of minor allele frequencies (MAFs) of the SNPs in this region is heavily right-skewed and ranges from 0.046% to 49.954%. In this scenario, $p$ SNPs were randomly sampled from all SNPs in the sequencing data. The choices of $N$, $p$ and $\sigma_{g}$ are summarized in Table 2.

Table 2: Choices of $N$, $p$ and $\sigma_{g}$ for the scenario where the genetic variants were sampled from real genetic data.

              Low-dimensional (N > p)    High-dimensional (N < p)
              N     p     σ_g            N     p     σ_g
Linear        1092  500   0.02           1092  1500  0.01
Quadratic     1092  500   0.03           1092  1500  0.015
Trigonometric 1092  500   0.05           1092  1500  0.05

In both scenarios, the linear kernel matrix $\boldsymbol{K}=p^{-1}\boldsymbol{Z}\boldsymbol{Z}^{T}$, the polynomial kernel matrix $\boldsymbol{K}=(\boldsymbol{J}_{n}+p^{-1}\boldsymbol{Z}\boldsymbol{Z}^{T})^{\circ 2}$, where $\circ 2$ denotes elementwise squaring, and the Gaussian kernel $\boldsymbol{K}_{ij}=\exp\{-\frac{1}{2}\|\boldsymbol{Z}_{i}-\boldsymbol{Z}_{j}\|^{2}\}$ were used to estimate the heritability; a sketch of their construction is given below. For the regularization parameter $\lambda_{n}$ in the kernel ridge regression, we predefined a candidate set $\mathcal{C}=\{0.1,0.5,0.8,1,1.3,1.5,2,2.3,2.5,3,5\}$ for the values of $n\lambda_{n}$ and calculated the heritability estimator using each candidate. In the low-dimensional case, samples of sizes 600, 700, 800, 900, and 1000 were randomly selected from the corresponding population, while in the high-dimensional case, samples of sizes 100, 200, 300, 400, and 500 were randomly selected. We then calculated the heritability estimator for each sample size. This process was repeated 500 times.
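For concreteness, the three kernel matrices can be formed as follows (a sketch assuming NumPy, with Z the $n\times p$ genotype matrix; function names are ours):

```python
import numpy as np

def linear_kernel(Z):
    return Z @ Z.T / Z.shape[1]                           # K = p^{-1} Z Z^T

def polynomial_kernel(Z):
    n = Z.shape[0]
    return (np.ones((n, n)) + Z @ Z.T / Z.shape[1]) ** 2  # elementwise square of J_n + p^{-1} Z Z^T

def gaussian_kernel(Z):
    sq = np.sum(Z ** 2, axis=1)                           # K_ij = exp(-||Z_i - Z_j||^2 / 2)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-0.5 * np.maximum(d2, 0.0))             # clip tiny negatives from round-off
```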

Based on the results, the polynomial kernel provides the most accurate estimation of the underlying heritability across all settings when the value of $n\lambda_{n}$ is appropriately chosen. Table 3 and Table 4 summarize the mean heritability estimates and standard deviations across 500 repetitions when the underlying function is linear. The results for the other two functions used in the simulations are presented in Appendix B. Here, we only present tables for the three values of $n\lambda_{n}$ that yield the most accurate estimates for the polynomial kernel.

Table 3: Mean and standard deviation of the heritability estimators over 500 repetitions when the underlying function is linear, in the low-dimensional case. The true heritability is 0.769 for the HWE scenario and 0.774 for the 1000 Genomes Project scenario.
Hardy-Weinberg Equilibrium | 1000 Genomes Project
$n\lambda_{n}$ | $n\lambda_{n}$
Linear 2.3 2.5 3.0 1.5 2.0 2.3
600 0.034 (0.002) 0.031 (0.002) 0.025 (0.002) 0.054 (0.005) 0.042 (0.004) 0.037 (0.004)
700 0.030 (0.002) 0.027 (0.002) 0.022 (0.001) 0.047 (0.004) 0.037 (0.003) 0.033 (0.003)
800 0.024 (0.001) 0.022 (0.001) 0.018 (0.001) 0.040 (0.003) 0.031 (0.003) 0.028 (0.002)
900 0.017 (0.001) 0.016 (0.001) 0.013 (0.001) 0.031 (0.002) 0.025 (0.002) 0.022 (0.002)
1000 0.009 (0.000) 0.008 (0.000) 0.007 (0.000) 0.024 (0.002) 0.019 (0.001) 0.017 (0.001)
Polynomial 2.3 2.5 3 1.5 2.0 2.3
600 0.702 (0.011) 0.674 (0.012) 0.608 (0.013) 0.803 (0.014) 0.765 (0.016) 0.745 (0.017)
700 0.712 (0.009) 0.685 (0.009) 0.622 (0.011) 0.806 (0.012) 0.770 (0.013) 0.750 (0.014)
800 0.721 (0.007) 0.695 (0.008) 0.635 (0.008) 0.809 (0.010) 0.773 (0.011) 0.755 (0.010)
900 0.729 (0.005) 0.704 (0.006) 0.646 (0.006) 0.810 (0.007) 0.777 (0.008) 0.758 (0.009)
1000 0.736 (0.000) 0.713 (0.000) 0.656 (0.000) 0.811 (0.005) 0.780 (0.005) 0.762 (0.006)
Gaussian 2.3 2.5 3 1.5 2.0 2.3
600 0.265 (0.009) 0.257 (0.009) 0.242 (0.009) 0.423 (0.016) 0.412 (0.015) 0.408 (0.015)
700 0.271 (0.008) 0.262 (0.008) 0.246 (0.007) 0.426 (0.013) 0.413 (0.012) 0.409 (0.012)
800 0.275 (0.006) 0.266 (0.006) 0.250 (0.005) 0.428 (0.011) 0.414 (0.010) 0.409 (0.009)
900 0.281 (0.004) 0.270 (0.004) 0.253 (0.004) 0.430 (0.008) 0.416 (0.008) 0.411 (0.007)
1000 0.285 (0.000) 0.275 (0.000) 0.256 (0.000) 0.432 (0.005) 0.417 (0.005) 0.411 (0.005)
Table 4: Mean and standard deviation of the heritability estimators over 500 repetitions when the underlying function is linear, in the high-dimensional case. The true heritability is 0.594 for the HWE scenario and 0.719 for the 1000 Genomes Project scenario.
Hardy-Weinberg Equilibrium | 1000 Genomes Project
$n\lambda_{n}$ | $n\lambda_{n}$
Linear 2.0 2.3 2.5 1.3 1.5 2.0
100 0.014 (0.002) 0.012 (0.002) 0.010 (0.002) 0.056 (0.010) 0.041 (0.008) 0.034 (0.007)
200 0.018 (0.002) 0.014 (0.002) 0.013 (0.001) 0.055 (0.007) 0.039 (0.005) 0.034 (0.005)
300 0.017 (0.001) 0.014 (0.001) 0.012 (0.001) 0.052 (0.006) 0.038 (0.005) 0.033 (0.005)
400 0.012 (0.001) 0.010 (0.001) 0.009 (0.001) 0.049 (0.005) 0.036 (0.004) 0.031 (0.004)
500 0.004 (0.000) 0.004 (0.000) 0.003 (0.000) 0.044 (0.005) 0.033 (0.003) 0.028 (0.003)
Polynomial 2.0 2.3 2.5 1.3 1.5 2.0
100 0.682 (0.013) 0.618 (0.015) 0.576 (0.015) 0.766 (0.046) 0.688 (0.056) 0.643 (0.059)
200 0.700 (0.011) 0.640 (0.013) 0.601 (0.013) 0.779 (0.031) 0.708 (0.037) 0.670 (0.039)
300 0.708 (0.009) 0.650 (0.010) 0.614 (0.011) 0.783 (0.024) 0.717 (0.028) 0.683 (0.031)
400 0.714 (0.006) 0.658 (0.008) 0.624 (0.008) 0.785 (0.019) 0.722 (0.022) 0.689 (0.025)
500 0.719 (0.000) 0.665 (0.000) 0.631 (0.000) 0.787 (0.016) 0.727 (0.019) 0.695 (0.021)
Gaussian 2.0 2.3 2.5 1.3 1.5 2.0
100 0.100 (0.008) 0.079 (0.007) 0.068 (0.007) 0.427 (0.043) 0.408 (0.040) 0.400 (0.039)
200 0.117 (0.007) 0.095 (0.006) 0.083 (0.005) 0.438 (0.031) 0.423 (0.026) 0.415 (0.026)
300 0.127 (0.005) 0.103 (0.005) 0.091 (0.004) 0.440 (0.023) 0.425 (0.023) 0.422 (0.021)
400 0.134 (0.004) 0.110 (0.003) 0.097 (0.003) 0.441 (0.019) 0.428 (0.017) 0.423 (0.018)
500 0.141 (0.000) 0.116 (0.000) 0.103 (0.000) 0.442 (0.016) 0.429 (0.015) 0.424 (0.014)

4 Real Data Analysis

Alzheimer’s disease (AD) is the most common neurodegenerative disease and has substantial genetic components [5, 16]. AD primarily targets the hippocampus region of the brain, which plays an important role in learning and memory. In the early stages of AD, the hippocampus suffers from volume loss [17, 12]. Additionally, studies have shown that all hippocampal subregion volumes are highly heritable [19]. Accurately estimating genetic heritability can help us understand more about the genetic inheritance of AD and support the development of effective genetic treatments for the disease.

We applied our proposed genetic heritability estimator to whole genome sequencing data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). A total of 808 samples from the screening and baseline visits of the ADNI1 and ADNI2 studies included whole genome sequencing data. After retaining SNPs that were called or imputed with $r^{2}>0.9$, we performed quality control by removing all SNPs with MAF $<0.05$ and any SNPs that did not pass the Hardy-Weinberg equilibrium test ($p$-value $<10^{-6}$). After matching with individuals having phenotype information, 576 individuals remained for the analysis.

We began by regressing the natural logarithm of hippocampal volume on several common clinical covariates, including age, gender, education status, and the number of APOE $\varepsilon$4 alleles. The residuals were then used as the response variable in the kernel ridge regression. The three kernel matrices used in the simulation studies (the linear kernel, the polynomial kernel of degree 2, and the Gaussian kernel) were constructed based on the SNP matrix.

After applying our method to estimate the heritability based on the residuals using $n\lambda_{n}=0.8$, we obtained $\hat{H}^{2}=0.028$ for the linear kernel, $\hat{H}^{2}=0.115$ for the polynomial kernel, and $\hat{H}^{2}=0.024$ for the Gaussian kernel. As in the simulation studies, the polynomial kernel gives the best performance among the three kernel matrices. We hypothesize that these estimates are small because the clinical covariates, age in particular, have already explained much of the variation in the response.

5 Discussion

In this paper, we proposed an estimator for the broad-sense genetic heritability of continuous phenotypes using kernel ridge regression. Theoretically, we provided an upper bound and a lower bound for the heritability estimator under certain regularity conditions. Consequently, we demonstrated that with an appropriate choice of the regularization parameter $\lambda_{n}$ in the kernel ridge regression, the upper and lower bounds for the estimator encompass the underlying heritability. Simulation studies across various scenarios suggest that the proposed estimator can indeed capture the underlying heritability.

In the methodology section, we provided a general result on the bounds for our heritability estimator. It is worth noting that in some special cases, the bounds can be simpler. For instance, when $r:=\textrm{rank}(\boldsymbol{K})=o(n)$, it follows from the von Neumann trace inequality [13] that as $n\to\infty$,

\frac{1}{n-1}I_{2}^{g} = \frac{\sigma_{\epsilon}^{2}}{n-1}\textrm{tr}\left((\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\right)+o_{p}(1)
= \frac{\sigma_{\epsilon}^{2}}{n-1}\textrm{tr}\left(\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\right)+o_{p}(1)
\leq \frac{\sigma_{\epsilon}^{2}}{n-1}\sum_{i=1}^{n}l_{i}\left(\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}\right)l_{i}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})
= \frac{\sigma_{\epsilon}^{2}}{n-1}\sum_{i=1}^{r}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}+o_{p}(1)
\leq \frac{r\sigma_{\epsilon}^{2}}{n-1}+o_{p}(1)
= o_{p}(1).

Such a case occurs when the kernel matrix is rank deficient, which is generally true in the low-dimensional case. Consequently, $\hat{\sigma}_{g}^{2}=\frac{1}{n-1}I_{1}^{g}+o_{p}(1)$; in other words, the integral term involving the empirical spectral distribution of $\boldsymbol{K}$ vanishes.
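As a quick numerical illustration of this rank-deficient case (our own check, with arbitrary sizes): for the linear kernel with $p$ fixed and $n$ growing, $r=\textrm{rank}(\boldsymbol{K})\leq p$, so the trace term bounding $\frac{1}{n-1}I_{2}^{g}$ decays roughly like $p/(n-1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 50                                                   # fixed p, so r = rank(K) <= p = o(n)
for n in (500, 1000, 2000):
    Z = rng.integers(0, 3, size=(n, p)).astype(float)
    K = Z @ Z.T / p                                      # linear kernel, rank at most p
    l = np.linalg.eigvalsh(K)
    trace_term = np.sum((l / (l + 1.0)) ** 2) / (n - 1)  # with n * lambda_n = 1
    print(n, trace_term)                                 # shrinks roughly like p / (n - 1)
```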

Despite the popularity of the Gaussian kernel in practice, the results of the simulation studies and the real data analysis suggest that the polynomial kernel generally provides the best performance. In practice, for a given dataset, there is little prior knowledge about which kernel matrix to use. As a future project, we will explore using machine learning techniques, such as neural networks, to learn from the data and select the best kernel matrix automatically. Additionally, estimators obtained from kernel ridge regression are usually biased, and this bias may deteriorate the performance of the proposed estimator. Whether debiasing techniques can improve heritability estimation will also be part of our future work.

Acknowledgements

The authors would like to thank MathWorks at Texas State University for supporting this research.

Declarations

  • Funding: No funding was received for conducting this study.

  • Competing interests: The authors declare they have no financial interests. In terms of non-financial interests, Xiaoxi Shen has served as an editorial review board member for Frontiers in Systems Biology.

  • Ethics approval and consent to participate: Not applicable.

  • Consent for publication: Not applicable.

  • Data availability: The data used in the simulation studies can be found at https://github.com/SxxMichael/KRR-Heritability

  • Materials availability: Not applicable

  • Code availability: The code used to implement the methods can be found at https://github.com/SxxMichael/KRR-Heritability

  • Author contribution: Conceptualization: Olivia Bley, Elizabeth Lei, Andy Zhou; Methodology: Olivia Bley, Elizabeth Lei, Andy Zhou, Xiaoxi Shen; Formal analysis and investigation: Xiaoxi Shen; Writing - original draft preparation: Olivia Bley, Elizabeth Lei, Andy Zhou; Writing - review and editing: Olivia Bley, Elizabeth Lei, Andy Zhou, Xiaoxi Shen; Supervision: Xiaoxi Shen.


Appendix A Proofs

This appendix contains the proofs of the main results in Section 2 of the main text.

A.1 Proof of Proposition 1

Proof.

Under (C1), it follows from Theorem 2.6.3 in Vershynin [18] that for any $t>0$,

\mathbb{P}\left(\left.\left|\frac{1}{n-1}I_{3}^{g}\right|>2t\right|\mathcal{Z}\right) \lesssim \exp\left\{-\frac{c(n-1)^{2}t^{2}}{C^{2}\left\|(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g}\right\|^{2}}\right\}.

Since $\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n}$ is idempotent, it follows that $\left\|\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n}\right\|_{op}=1$ and hence

\left\|(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g}\right\|^{2}
\leq \left\|(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\right\|_{op}^{2}\left\|\boldsymbol{g}\right\|^{2}
\leq \left\|\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}\right\|_{op}^{2}\left\|\boldsymbol{g}\right\|^{2}
\leq \left(\max_{1\leq i\leq n}\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{4}nM^{2}
\leq nM^{2}=\mathcal{O}(n).

Therefore,

\mathbb{P}\left(\left.\left|\frac{1}{n-1}I_{3}^{g}\right|>2t\right|\mathcal{Z}\right) \lesssim \exp\{-\mathcal{O}(n)\}\to 0,

which implies that given $\mathcal{Z}$, $\frac{1}{n-1}I_{3}^{g}=o_{p}(1)$. Similarly,

\mathbb{P}\left(\left.\left|\frac{1}{n}I_{3}^{\epsilon}\right|>2t\right|\mathcal{Z}\right) \lesssim \exp\left\{-\frac{cn^{2}t^{2}}{C^{2}\left\|n^{2}\lambda_{n}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{g}\right\|^{2}}\right\}.

Note that

\left\|n^{2}\lambda_{n}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{g}\right\|^{2} \leq \left\|n^{2}\lambda_{n}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\right\|_{op}^{2}\left\|\boldsymbol{g}\right\|^{2}
\leq \left(\max_{1\leq i\leq n}\left(\frac{n\lambda_{n}}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\right)^{2}nM^{2}
\leq nM^{2}=\mathcal{O}(n).

Therefore,

\mathbb{P}\left(\left.\left|\frac{1}{n}I_{3}^{\epsilon}\right|>2t\right|\mathcal{Z}\right) \lesssim \exp\{-\mathcal{O}(n)\}\to 0,

which implies that given $\mathcal{Z}$, $\frac{1}{n}I_{3}^{\epsilon}=o_{p}(1)$. ∎

A.2 Proof of Proposition 2

Proof.

Since the $\epsilon_{i}$ are sub-Gaussian random variables by (C1), it follows from the Hanson-Wright inequality [14] that

\mathbb{P}\left(\left.\left|\frac{1}{n-1}(I_{2}^{g}-\mathbb{E}[I_{2}^{g}])\right|>t\right|\mathcal{Z}\right) \lesssim \exp\left\{-c\left[\frac{(n-1)^{2}t^{2}}{C^{4}\left\|\boldsymbol{\Gamma}_{1}\right\|_{F}^{2}}\wedge\frac{(n-1)t}{C^{2}\left\|\boldsymbol{\Gamma}_{1}\right\|_{op}}\right]\right\},

where $\boldsymbol{\Gamma}_{1}=(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}$. Now note that

\left\|\boldsymbol{\Gamma}_{1}\right\|_{op} \leq \left\|\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}\right\|_{op}\left\|\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n}\right\|_{op} = \left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}=\mathcal{O}(1),
\left\|\boldsymbol{\Gamma}_{1}\right\|_{F}^{2} = \textrm{tr}(\boldsymbol{\Gamma}_{1}^{2})\leq n\left\|\boldsymbol{\Gamma}_{1}\right\|_{op}^{2}=\mathcal{O}(n),

which implies that as $n\to\infty$,

\mathbb{P}\left(\left.\left|\frac{1}{n-1}\left(I_{2}^{g}-\mathbb{E}[I_{2}^{g}]\right)\right|>t\right|\mathcal{Z}\right) \lesssim \exp\left\{-\mathcal{O}(n)\right\}\to 0.

Therefore,

\frac{1}{n-1}I_{2}^{g} = \frac{1}{n-1}\mathbb{E}[I_{2}^{g}]+o_{p}(1)
= \frac{\sigma_{\epsilon}^{2}}{n-1}\textrm{tr}\left((\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\right)+o_{p}(1).

Similarly, we have

\mathbb{P}\left(\left.\left|\frac{1}{n}(I_{2}^{\epsilon}-\mathbb{E}[I_{2}^{\epsilon}])\right|>t\right|\mathcal{Z}\right) \lesssim \exp\left\{-c\left[\frac{n^{2}t^{2}}{C^{4}\left\|\boldsymbol{\Gamma}_{2}\right\|_{F}^{2}}\wedge\frac{nt}{C^{2}\left\|\boldsymbol{\Gamma}_{2}\right\|_{op}}\right]\right\},

where $\boldsymbol{\Gamma}_{2}=n^{2}\lambda_{n}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}$. Then note that

\left\|\boldsymbol{\Gamma}_{2}\right\|_{op} = \left(\frac{n\lambda_{n}}{l_{n}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}=\mathcal{O}(1),
\left\|\boldsymbol{\Gamma}_{2}\right\|_{F}^{2} = \textrm{tr}(\boldsymbol{\Gamma}_{2}^{2})\leq n\left\|\boldsymbol{\Gamma}_{2}\right\|_{op}^{2}=\mathcal{O}(n),

which implies that as $n\to\infty$,

\mathbb{P}\left(\left.\left|\frac{1}{n}\left(I_{2}^{\epsilon}-\mathbb{E}[I_{2}^{\epsilon}]\right)\right|>t\right|\mathcal{Z}\right) \lesssim \exp\left\{-\mathcal{O}(n)\right\}\to 0.

Therefore,

\frac{1}{n}I_{2}^{\epsilon} = \frac{1}{n}\mathbb{E}[I_{2}^{\epsilon}]+o_{p}(1)
= \frac{\sigma_{\epsilon}^{2}}{n}\textrm{tr}\left(n^{2}\lambda_{n}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\right)+o_{p}(1). ∎

A.3 Proof of Proposition 3

Proof.

Note that

\left|\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g}\right| = \left|\sum_{i=1}^{n}\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}(\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n})(\boldsymbol{v}_{i}^{T}\boldsymbol{g})\right|
= \left|\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}(\boldsymbol{v}_{1}^{T}\boldsymbol{1}_{n})(\boldsymbol{v}_{1}^{T}\boldsymbol{g})+\sum_{i\geq 2}\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}(\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n})(\boldsymbol{v}_{i}^{T}\boldsymbol{g})\right|
\geq \frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\left|\boldsymbol{v}_{1}^{T}\boldsymbol{1}_{n}\right|\left|\boldsymbol{v}_{1}^{T}\boldsymbol{g}\right|-\sum_{i\geq 2}\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\left|\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n}\right|\left|\boldsymbol{v}_{i}^{T}\boldsymbol{g}\right|
\geq \left(\frac{c^{2}l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}-(1-c^{2})\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\right)\sqrt{n}\left\|\boldsymbol{g}\right\|,

where the last inequality follows since, by assumption, $\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\left|\boldsymbol{v}_{1}^{T}\boldsymbol{1}_{n}\right|\left|\boldsymbol{v}_{1}^{T}\boldsymbol{g}\right| \geq \frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}c^{2}\sqrt{n}\left\|\boldsymbol{g}\right\|$ and

\sum_{i\geq 2}\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\left|\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n}\right|\left|\boldsymbol{v}_{i}^{T}\boldsymbol{g}\right| \leq \frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\sum_{i\geq 2}\left|\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n}\right|\left|\boldsymbol{v}_{i}^{T}\boldsymbol{g}\right|
\leq \frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\sqrt{\sum_{i\geq 2}(\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n})^{2}}\sqrt{\sum_{i\geq 2}(\boldsymbol{v}_{i}^{T}\boldsymbol{g})^{2}}
= \frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\sqrt{\sum_{i=1}^{n}(\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n})^{2}-(\boldsymbol{v}_{1}^{T}\boldsymbol{1}_{n})^{2}}\sqrt{\sum_{i=1}^{n}(\boldsymbol{v}_{i}^{T}\boldsymbol{g})^{2}-(\boldsymbol{v}_{1}^{T}\boldsymbol{g})^{2}}
\leq (1-c^{2})\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\sqrt{n}\left\|\boldsymbol{g}\right\|.

Based on the choice of $n\lambda_{n}$ and (C4), it follows that $\frac{c^{2}l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}-(1-c^{2})\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}} \geq \alpha\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\right)$, and the desired result then follows. ∎

A.4 Proof of Proposition 4

Proof.

First, for the term $(\boldsymbol{1}_{n}^{T}\boldsymbol{g})^{2}$ in the denominator, we have

c^{2}n\left\|\boldsymbol{g}\right\|^{2} \leq (\boldsymbol{1}_{n}^{T}\boldsymbol{g})^{2} = \boldsymbol{g}^{T}\boldsymbol{1}_{n}\boldsymbol{1}_{n}^{T}\boldsymbol{g} \leq \left\|\boldsymbol{1}_{n}\boldsymbol{1}_{n}^{T}\right\|_{op}\left\|\boldsymbol{g}\right\|^{2} = n\left\|\boldsymbol{g}\right\|^{2}.

On the other hand, we have

\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g})^{2} = \frac{1}{n}\boldsymbol{g}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}\boldsymbol{J}_{n}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g}
\leq \frac{1}{n}\left\|(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}\boldsymbol{J}_{n}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\right\|_{op}\left\|\boldsymbol{g}\right\|^{2}
\leq \frac{1}{n}\left\|\boldsymbol{J}_{n}\right\|_{op}\left\|\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}\right\|_{op}\left\|\boldsymbol{g}\right\|^{2}
= \left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2},

and from Proposition 3,

\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g})^{2} \geq \alpha^{2}\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2}.

Hence,

\alpha^{2}\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\right)^{2} \leq \frac{\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g})^{2}}{\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{g})^{2}} \leq c^{-2}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}. ∎

A.5 Proof of Proposition 5

Proof.

Note that

\begin{align*}
\frac{1}{n-1}I_{1}^{g} &= \frac{1}{n-1}\boldsymbol{g}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g}\\
&= \frac{1}{n-1}\left[\boldsymbol{g}^{T}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g}-\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g})^{2}\right]\\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}(\boldsymbol{v}_{i}^{T}\boldsymbol{g})^{2}-\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{g})^{2}\right].
\end{align*}

Since by (C3),

\begin{align*}
c^{2}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2} &\leq \left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}(\boldsymbol{v}_{1}^{T}\boldsymbol{g})^{2}\\
&\leq \sum_{i=1}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}(\boldsymbol{v}_{i}^{T}\boldsymbol{g})^{2}\\
&\leq \left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2},
\end{align*}

it then follows from Proposition 4 that

\begin{align*}
\frac{1}{n-1}I_{1}^{g} &\geq \frac{1}{n-1}\left[c^{2}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2}-c^{-2}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{g})^{2}\right]\\
&= \frac{nc^{2}}{n-1}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left[\frac{1}{n}\left\|\boldsymbol{g}\right\|^{2}-c^{-4}\left(\frac{1}{n}\boldsymbol{1}_{n}^{T}\boldsymbol{g}\right)^{2}\right],
\end{align*}

and

\begin{align*}
\frac{1}{n-1}I_{1}^{g} &\leq \frac{1}{n-1}\left[\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2}-\alpha^{2}\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\frac{1}{n}(\boldsymbol{1}_{n}^{T}\boldsymbol{g})^{2}\right]\\
&= \frac{n}{n-1}\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left[\frac{1}{n}\left\|\boldsymbol{g}\right\|^{2}-\alpha^{2}\left(\frac{l_{2}(\boldsymbol{K})}{l_{2}(\boldsymbol{K})+n\lambda_{n}}\cdot\frac{l_{1}(\boldsymbol{K})+n\lambda_{n}}{l_{1}(\boldsymbol{K})}\right)^{2}\left(\frac{1}{n}\boldsymbol{1}_{n}^{T}\boldsymbol{g}\right)^{2}\right].
\end{align*}

Similarly, as a consequence of Proposition 2, the upper and lower bounds of $\frac{1}{n-1}I_{2}^{g}$ can be obtained as follows:

\begin{align*}
\frac{1}{n-1}I_{2}^{g} &= \frac{\sigma_{\epsilon}^{2}}{n-1}\textrm{tr}\left((\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\boldsymbol{K}(\boldsymbol{I}_{n}-\bar{\boldsymbol{J}}_{n})\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-1}\right)+o_{p}(1)\\
&= \frac{\sigma_{\epsilon}^{2}}{n-1}\left[\textrm{tr}\left(\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}\right)-\frac{1}{n}\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}\boldsymbol{1}_{n}\right]+o_{p}(1)\\
&= \frac{\sigma_{\epsilon}^{2}}{n-1}\left[\sum_{i=1}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}-\frac{1}{n}\sum_{i=1}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}(\boldsymbol{v}_{i}^{T}\boldsymbol{1}_{n})^{2}\right]+o_{p}(1)\\
&\leq \frac{(1-c^{2})\sigma_{\epsilon}^{2}}{n-1}\sum_{i=1}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}+o_{p}(1),
\end{align*}

where the last inequality follows from (C3). On the other hand, since $\boldsymbol{1}_{n}^{T}\boldsymbol{K}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\boldsymbol{K}\boldsymbol{1}_{n}\leq n\left(\frac{l_{1}(\boldsymbol{K})}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}$, it then follows that

\[
\frac{1}{n-1}I_{2}^{g}\geq\frac{\sigma_{\epsilon}^{2}}{n-1}\sum_{i=2}^{n}\left(\frac{l_{i}(\boldsymbol{K})}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}+o_{p}(1).
\]
∎
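The first display in this proof rests on the eigen-expansion of $I_{1}^{g}$; the following hypothetical sketch checks that the matrix form and the spectral form agree numerically (all names and the toy setup are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, nlam = 150, 40, 2.0                  # nlam stands in for n*lambda_n
Z = rng.standard_normal((n, p))
K = Z @ Z.T / p                            # toy PSD kernel matrix
g = rng.standard_normal(n)
l, V = np.linalg.eigh(K)                   # eigenpairs of K
ones = np.ones(n)
Jbar = np.outer(ones, ones) / n            # centering component (1/n) 1_n 1_n^T

A = K @ np.linalg.solve(K + nlam * np.eye(n), g)   # K (K + n*lambda_n I)^{-1} g
I1g_matrix = A @ (np.eye(n) - Jbar) @ A
I1g_spectral = np.sum((l / (l + nlam))**2 * (V.T @ g)**2) - (ones @ A)**2 / n
print(np.isclose(I1g_matrix, I1g_spectral))        # True: the two forms agree
```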

A.6 Proof of Proposition 6

Proof.

Note that

\[
\frac{1}{n}I_{1}^{\epsilon}=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{n\lambda_{n}}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}(\boldsymbol{v}_{i}^{T}\boldsymbol{g})^{2},
\]

and since

\begin{align*}
\sum_{i=1}^{n}\left(\frac{n\lambda_{n}}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}(\boldsymbol{v}_{i}^{T}\boldsymbol{g})^{2} &\geq \left(\frac{n\lambda_{n}}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2},\\
\sum_{i=1}^{n}\left(\frac{n\lambda_{n}}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}(\boldsymbol{v}_{i}^{T}\boldsymbol{g})^{2} &\leq \left(\frac{n\lambda_{n}}{l_{n}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2}\leq\left\|\boldsymbol{g}\right\|^{2},
\end{align*}

it then follows that

\[
\frac{1}{n}\left(\frac{n\lambda_{n}}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2}\leq\frac{1}{n}I_{1}^{\epsilon}\leq\frac{1}{n}\left(\frac{n\lambda_{n}}{l_{n}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\left\|\boldsymbol{g}\right\|^{2}\leq\frac{1}{n}\left\|\boldsymbol{g}\right\|^{2}.
\]

Similarly, as a consequence of Proposition 2, we have

\begin{align*}
\frac{1}{n}I_{2}^{\epsilon} &= \frac{\sigma_{\epsilon}^{2}}{n}\textrm{tr}\left(n^{2}\lambda_{n}^{2}(\boldsymbol{K}+n\lambda_{n}\boldsymbol{I}_{n})^{-2}\right)+o_{p}(1)\\
&= \frac{\sigma_{\epsilon}^{2}}{n}\sum_{i=1}^{n}\left(\frac{n\lambda_{n}}{l_{i}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}+o_{p}(1)\\
&\leq \left(\frac{n\lambda_{n}}{l_{n}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\sigma_{\epsilon}^{2}+o_{p}(1)\\
&\leq \sigma_{\epsilon}^{2}+o_{p}(1),
\end{align*}

and

\[
\frac{1}{n}I_{2}^{\epsilon}\geq\left(\frac{n\lambda_{n}}{l_{1}(\boldsymbol{K})+n\lambda_{n}}\right)^{2}\sigma_{\epsilon}^{2}+o_{p}(1).
\]
∎
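The bounds on $\frac{1}{n}I_{1}^{\epsilon}$ above hold deterministically for any positive semi-definite $\boldsymbol{K}$, so they can be verified directly; a hypothetical sketch with an illustrative toy setup:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, nlam = 150, 40, 2.0                  # nlam stands in for n*lambda_n
Z = rng.standard_normal((n, p))
K = Z @ Z.T / p                            # toy PSD kernel matrix
g = rng.standard_normal(n)
l = np.linalg.eigvalsh(K)                  # ascending: l[0] = l_n(K), l[-1] = l_1(K)

r = nlam * np.linalg.solve(K + nlam * np.eye(n), g)  # n*lambda_n (K + n*lambda_n I)^{-1} g
I1eps = (r @ r) / n                        # (1/n) I_1^epsilon

lo = (nlam / (l[-1] + nlam))**2 * (g @ g) / n        # lower bound via l_1(K)
hi = (nlam / (l[0] + nlam))**2 * (g @ g) / n         # upper bound via l_n(K)
print(lo - 1e-12 <= I1eps <= hi + 1e-12 and hi <= (g @ g) / n + 1e-12)  # True
```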

A.7 Proof of Proposition 8

Proof.

The upper and lower bounds for $\hat{\sigma}_{\epsilon}^{2}/\hat{\sigma}_{g}^{2}$ follow directly from Theorem 7. For simplicity, denote the lower and upper bounds of $\hat{\sigma}_{\epsilon}^{2}/\hat{\sigma}_{g}^{2}$ by LB and UB, respectively. Since $c^{2}\tau_{1}^{2}-c^{-4}\leq 0$, we have

\[
UB\geq\frac{\sigma_{\epsilon}^{2}}{c^{2}\tau_{1}^{2}\sigma_{g}^{2}+\sigma_{\epsilon}^{2}\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\,\textrm{d}F_{-1}(x)},
\]

and based on the choice of $\lambda_{n}$, it follows that

\[
UB\geq\frac{\sigma_{\epsilon}^{2}}{c^{2}\tau_{1}^{2}\sigma_{g}^{2}+\sigma_{\epsilon}^{2}\int\left(\frac{x}{x+n\lambda_{n}}\right)^{2}\,\textrm{d}F_{-1}(x)}\geq\frac{\sigma_{\epsilon}^{2}}{\sigma_{g}^{2}}.
\]

Similarly, based on the choice of $\lambda_{n}$, we have

\[
\frac{\sigma_{\epsilon}^{2}}{\sigma_{g}^{2}}\geq(1-\tau_{1}^{2})^{-1}LB\geq LB.
\]
∎
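For concreteness, the quantity bounded throughout this appendix is the variance-component ratio of a kernel ridge regression fit. The sketch below shows one plug-in heritability estimator of this type; it assumes $\hat{\sigma}_{g}^{2}$ is the sample variance of the fitted values and $\hat{\sigma}_{\epsilon}^{2}$ the mean squared residual, which matches the structure of $I_{1}^{g}$, $I_{2}^{g}$, $I_{1}^{\epsilon}$, and $I_{2}^{\epsilon}$ above but is not guaranteed to match the paper's estimator exactly.

```python
import numpy as np

def krr_heritability(K, y, nlam):
    """Plug-in broad-sense heritability from a kernel ridge regression fit.

    Sketch only: assumes sigma_g^2 is the sample variance of the fitted
    values g_hat = K (K + n*lambda_n I)^{-1} y and sigma_eps^2 is the mean
    squared residual; the exact normalization in the paper may differ.
    """
    n = len(y)
    g_hat = K @ np.linalg.solve(K + nlam * np.eye(n), y)
    sigma_g2 = g_hat.var(ddof=1)           # (1/(n-1)) sum_i (g_hat_i - mean)^2
    sigma_eps2 = np.mean((y - g_hat)**2)   # mean squared residual
    return sigma_g2 / (sigma_g2 + sigma_eps2)

# Hypothetical usage with a linear kernel built from a genotype-like matrix Z
rng = np.random.default_rng(4)
n, p = 300, 100
Z = rng.standard_normal((n, p))
K = Z @ Z.T / p
y = Z @ rng.standard_normal(p) * 0.1 + rng.standard_normal(n)
print(krr_heritability(K, y, nlam=2.0))
```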

Appendix B Additional Simulation Results

Table 5: Mean and standard deviation of the heritability estimators over 500 repetitions when the underlying function is quadratic in the low-dimensional case. The true heritability is 0.573 under the HWE scenario and 0.811 under the 1000 Genomes Project scenario.
n    Hardy-Weinberg Equilibrium: $n\lambda_{n}$ = 2.3 / 2.5 / 3.0    1000 Genomes Project: $n\lambda_{n}$ = 0.5 / 0.8 / 1.0
Linear
600 0.125 (0.017) 0.110 (0.015) 0.082 (0.011) 0.836 (0.071) 0.756 (0.080) 0.706 (0.087)
700 0.122 (0.015) 0.106 (0.013) 0.081 (0.010) 0.823 (0.050) 0.749 (0.057) 0.698 (0.065)
800 0.117 (0.012) 0.104 (0.011) 0.080 (0.008) 0.807 (0.039) 0.733 (0.047) 0.685 (0.053)
900 0.112 (0.010) 0.102 (0.009) 0.079 (0.008) 0.792 (0.034) 0.712 (0.041) 0.670 (0.042)
1000 0.109 (0.009) 0.098 (0.008) 0.077 (0.007) 0.776 (0.028) 0.696 (0.034) 0.653 (0.037)
Polynomial
600 0.608 (0.028) 0.571 (0.029) 0.486 (0.030) 0.948 (0.030) 0.926 (0.039) 0.912 (0.044)
700 0.583 (0.024) 0.543 (0.024) 0.463 (0.026) 0.942 (0.023) 0.924 (0.026) 0.910 (0.030)
800 0.558 (0.023) 0.523 (0.022) 0.444 (0.022) 0.937 (0.017) 0.918 (0.022) 0.904 (0.026)
900 0.536 (0.019) 0.504 (0.019) 0.425 (0.021) 0.933 (0.015) 0.910 (0.019) 0.901 (0.021)
1000 0.517 (0.017) 0.482 (0.017) 0.410 (0.017) 0.929 (0.012) 0.907 (0.016) 0.896 (0.018)
Gaussian
600 0.072 (0.005) 0.062 (0.005) 0.045 (0.003) 0.649 (0.058) 0.450 (0.047) 0.352 (0.042)
700 0.072 (0.005) 0.062 (0.004) 0.045 (0.003) 0.650 (0.039) 0.458 (0.033) 0.361 (0.031)
800 0.071 (0.005) 0.062 (0.004) 0.045 (0.003) 0.652 (0.031) 0.463 (0.030) 0.368 (0.028)
900 0.071 (0.004) 0.062 (0.004) 0.045 (0.003) 0.653 (0.027) 0.464 (0.028) 0.372 (0.025)
1000 0.071 (0.004) 0.061 (0.003) 0.045 (0.003) 0.652 (0.023) 0.466 (0.024) 0.375 (0.023)
Table 6: Mean and standard deviation of the heritability estimators over 500 repetitions when the underlying function is quadratic in the high-dimensional case. The true heritability is 0.602 under the HWE scenario and 0.553 under the 1000 Genomes Project scenario.
n    Hardy-Weinberg Equilibrium: $n\lambda_{n}$ = 2.0 / 2.3 / 2.5    1000 Genomes Project: $n\lambda_{n}$ = 1.5 / 2.0 / 2.3
Linear
100 0.152 (0.014) 0.122 (0.015) 0.106 (0.010) 0.381 (0.077) 0.292 (0.072) 0.248 (0.069)
200 0.145 (0.011) 0.117 (0.009) 0.106 (0.010) 0.376 (0.059) 0.294 (0.053) 0.250 (0.055)
300 0.137 (0.009) 0.112 (0.007) 0.099 (0.007) 0.372 (0.046) 0.291 (0.044) 0.257 (0.042)
400 0.129 (0.006) 0.106 (0.005) 0.094 (0.005) 0.367 (0.040) 0.288 (0.036) 0.253 (0.035)
500 0.119 (9.45E-17) 0.099 (8.07E-17) 0.088 (7.76E-17) 0.356 (0.031) 0.283 (0.031) 0.251 (0.029)
Polynomial
100 0.679 (0.014) 0.618 (0.015) 0.580 (0.015) 0.770 (0.063) 0.705 (0.081) 0.661 (0.089)
200 0.666 (0.013) 0.605 (0.013) 0.567 (0.013) 0.764 (0.049) 0.701 (0.054) 0.658 (0.066)
300 0.656 (0.010) 0.594 (0.011) 0.556 (0.012) 0.761 (0.038) 0.695 (0.045) 0.664 (0.049)
400 0.640 (0.008) 0.583 (0.009) 0.546 (0.008) 0.755 (0.033) 0.692 (0.036) 0.656 (0.038)
500 0.635 (3.10E-16) 0.574 (3.39E-16) 0.537 (3.47E-16) 0.749 (0.026) 0.687 (0.031) 0.653 (0.032)
Gaussian
100 0.091 (0.004) 0.071 (0.003) 0.061 (0.003) 0.154 (0.023) 0.096 (0.016) 0.074 (0.012)
200 0.091 (0.004) 0.071 (0.003) 0.061 (0.003) 0.162 (0.020) 0.102 (0.014) 0.079 (0.011)
300 0.091 (0.003) 0.071 (0.003) 0.061 (0.002) 0.169 (0.018) 0.108 (0.013) 0.086 (0.010)
400 0.091 (0.003) 0.071 (0.002) 0.061 (0.002) 0.177 (0.017) 0.113 (0.012) 0.090 (0.010)
500 0.091 (1.38E-16) 0.071 (1.15E-16) 0.062 (1.13E-16) 0.179 (0.014) 0.118 (0.011) 0.095 (0.010)
Table 7: Mean and standard deviation of the heritability estimators over 500 repetitions when the underlying function is trigonometric in the low-dimensional case. The true heritability is 0.706 under the HWE scenario and 0.686 under the 1000 Genomes Project scenario.
n    Hardy-Weinberg Equilibrium: $n\lambda_{n}$ = 1.0 / 1.3 / 1.5    1000 Genomes Project: $n\lambda_{n}$ = 1.3 / 1.5 / 2.0
Linear
100 0.648 (0.050) 0.583 (0.055) 0.499 (0.064) 0.395 (0.021) 0.326 (0.020) 0.218 (0.015)
200 0.682 (0.035) 0.640 (0.037) 0.530 (0.047) 0.411 (0.018) 0.343 (0.017) 0.233 (0.013)
300 0.703 (0.026) 0.678 (0.026) 0.559 (0.033) 0.428 (0.015) 0.355 (0.015) 0.247 (0.012)
400 0.719 (0.020) 0.704 (0.020) 0.581 (0.028) 0.444 (0.011) 0.369 (0.010) 0.263 (0.008)
500 0.731 (0.017) 0.721 (0.016) 0.601 (0.023) 0.458 (0.000) 0.382 (0.000) 0.277 (0.000)
Polynomial
100 0.903 (0.021) 0.869 (0.023) 0.846 (0.037) 0.848 (0.009) 0.805 (0.011) 0.701 (0.014)
200 0.914 (0.014) 0.890 (0.016) 0.857 (0.024) 0.850 (0.007) 0.808 (0.009) 0.706 (0.016)
300 0.920 (0.010) 0.902 (0.010) 0.870 (0.015) 0.874 (0.006) 0.810 (0.007) 0.612 (0.010)
400 0.924 (0.007) 0.911 (0.007) 0.878 (0.012) 0.857 (0.004) 0.813 (0.005) 0.719 (0.006)
500 0.928 (0.006) 0.917 (0.006) 0.886 (0.009) 0.860 (0.000) 0.816 (0.000) 0.725 (0.000)
Gaussian
100 0.145 (0.021) 0.107 (0.017) 0.077 (0.013) 0.203 (0.001) 0.161 (0.001) 0.099 (0.000)
200 0.188 (0.020) 0.144 (0.018) 0.098 (0.014) 0.212 (0.001) 0.170 (0.000) 0.107 (0.000)
300 0.226 (0.019) 0.174 (0.018) 0.117 (0.009) 0.221 (0.001) 0.179 (0.000) 0.115 (0.000)
400 0.259 (0.019) 0.203 (0.018) 0.134 (0.009) 0.232 (0.001) 0.188 (0.000) 0.123 (0.000)
500 0.287 (0.017) 0.226 (0.017) 0.151 (0.009) 0.241 (0.000) 0.197 (0.000) 0.130 (0.000)
Table 8: Mean and standard deviation of the heritability estimators over 500 repetitions when the underlying function is trigonometric in the high-dimensional case. The true heritability is 0.724 under the HWE scenario and 0.767 under the 1000 Genomes Project scenario.
n    Hardy-Weinberg Equilibrium: $n\lambda_{n}$ = 1.3 / 1.5 / 2.0    1000 Genomes Project: $n\lambda_{n}$ = 1.3 / 1.5 / 2.0
Linear
100 0.544 (0.050) 0.475 (0.067) 0.365 (0.046) 0.384 (0.020) 0.323 (0.018) 0.216 (0.014)
200 0.581 (0.038) 0.508 (0.045) 0.404 (0.034) 0.396 (0.018) 0.338 (0.016) 0.230 (0.014)
300 0.606 (0.026) 0.532 (0.034) 0.437 (0.024) 0.409 (0.014) 0.352 (0.013) 0.245 (0.013)
400 0.624 (0.021) 0.556 (0.028) 0.459 (0.026) 0.419 (0.011) 0.366 (0.010) 0.257 (0.009)
500 0.642 (0.018) 0.574 (0.027) 0.479 (0.019) 0.430 (0.021) 0.380 (0.000) 0.270 (0.000)
Polynomial
100 0.857 (0.028) 0.831 (0.042) 0.777 (0.040) 0.843 (0.008) 0.803 (0.009) 0.699 (0.013)
200 0.869 (0.019) 0.848 (0.025) 0.800 (0.025) 0.843 (0.007) 0.805 (0.008) 0.703 (0.014)
300 0.879 (0.012) 0.858 (0.018) 0.817 (0.016) 0.845 (0.006) 0.808 (0.007) 0.708 (0.009)
400 0.885 (0.010) 0.869 (0.014) 0.827 (0.013) 0.846 (0.004) 0.811 (0.005) 0.712 (0.007)
500 0.891 (0.007) 0.876 (0.010) 0.836 (0.011) 0.847 (0.000) 0.815 (0.000) 0.717 (0.000)
Gaussian
100 0.078 (0.014) 0.054 (0.012) 0.032 (0.006) 0.204 (0.001) 0.161 (0.000) 0.098 (0.000)
200 0.103 (0.015) 0.064 (0.010) 0.0425 (0.006) 0.215 (0.001) 0.170 (0.000) 0.104 (0.000)
300 0.125 (0.015) 0.073 (0.009) 0.053 (0.006) 0.225 (0.001) 0.179 (0.000) 0.110 (0.000)
400 0.144 (0.014) 0.082 (0.008) 0.064 (0.007) 0.237 (0.000) 0.188 (0.000) 0.118 (0.000)
500 0.162 (0.014) 0.091 (0.008) 0.073 (0.007) 0.247 (0.000) 0.197 (0.000) 0.124 (0.000)

References

  • Altshuler et al. [2010] Altshuler, D.M., Lander, E.S., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T.J., Gabriel, S.B., Jaffe, D.B., Shefler, E., Sougnez, C.L., et al.: A map of human genome variation from population-scale sequencing. Nature 467(7319), 1061–1073 (2010)
  • Donnelly [2008] Donnelly, P.: Progress and challenges in genome-wide association studies in humans. Nature 456(7223), 728–731 (2008)
  • Hou et al. [2019] Hou, K., Burch, K.S., Majumdar, A., Shi, H., Mancuso, N., Wu, Y., Sankararaman, S., Pasaniuc, B.: Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nature Genetics 51(8), 1244–1251 (2019)
  • Hindorff et al. [2009] Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., Collins, F.S., Manolio, T.A.: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences 106(23), 9362–9367 (2009)
  • Karch et al. [2014] Karch, C.M., Cruchaga, C., Goate, A.M.: Alzheimer’s disease genetics: from the bench to the clinic. Neuron 83(1), 11–26 (2014)
  • Kimeldorf and Wahba [1970] Kimeldorf, G.S., Wahba, G.: A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics 41(2), 495–502 (1970)
  • Kimeldorf and Wahba [1971] Kimeldorf, G., Wahba, G.: Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications 33(1), 82–95 (1971)
  • Li et al. [2023] Li, H., Mazumder, R., Lin, X.: Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix. Nature Communications 14(1), 7954 (2023)
  • Lee et al. [2011] Lee, S.H., Wray, N.R., Goddard, M.E., Visscher, P.M.: Estimating missing heritability for disease from genome-wide association studies. The American Journal of Human Genetics 88(3), 294–305 (2011)
  • Maher [2008] Maher, B.: Personal genomes: The case of the missing heritability. Nature 456(7218), 18–21 (2008)
  • Manolio et al. [2009] Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., Hindorff, L.A., Hunter, D.J., McCarthy, M.I., Ramos, E.M., Cardon, L.R., Chakravarti, A., et al.: Finding the missing heritability of complex diseases. Nature 461(7265), 747–753 (2009)
  • Mu and Gage [2011] Mu, Y., Gage, F.H.: Adult hippocampal neurogenesis and its role in Alzheimer's disease. Molecular Neurodegeneration 6, 1–9 (2011)
  • Mirsky [1975] Mirsky, L.: A trace inequality of John von Neumann. Monatshefte für Mathematik 79(4), 303–306 (1975)
  • Rudelson and Vershynin [2013] Rudelson, M., Vershynin, R.: Hanson-Wright inequality and sub-Gaussian concentration. Electronic Communications in Probability 18, 1–9 (2013) https://doi.org/10.1214/ECP.v18-2865
  • Speed et al. [2012] Speed, D., Hemani, G., Johnson, M.R., Balding, D.J.: Improved heritability estimation from genome-wide SNPs. The American Journal of Human Genetics 91(6), 1011–1021 (2012)
  • Sims et al. [2020] Sims, R., Hill, M., Williams, J.: The multiplex model of the genetics of Alzheimer's disease. Nature Neuroscience 23(3), 311–322 (2020)
  • Schuff et al. [2009] Schuff, N., Woerner, N., Boreta, L., Kornfield, T., Shaw, L., Trojanowski, J., Thompson, P., Jack Jr, C., Weiner, M., Alzheimer's Disease Neuroimaging Initiative: MRI of hippocampal volume loss in early Alzheimer's disease in relation to APOE genotype and biomarkers. Brain 132(4), 1067–1077 (2009)
  • Vershynin [2018] Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press, Cambridge (2018)
  • Whelan et al. [2016] Whelan, C.D., Hibar, D.P., Van Velzen, L.S., Zannas, A.S., Carrillo-Roa, T., McMahon, K., Prasad, G., Kelly, S., Faskowitz, J., de Zubicaray, G., et al.: Heritability and reliability of automatically segmented human hippocampal formation subregions. Neuroimage 128, 125–137 (2016)
  • Yang et al. [2010] Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W., et al.: Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42(7), 565–569 (2010)
  • Zhu and Zhou [2020] Zhu, H., Zhou, X.: Statistical methods for snp heritability estimation and partition: A review. Computational and Structural Biotechnology Journal 18, 1557–1568 (2020)