
Theory of functional principal component analysis
for discretely observed data

Hang Zhou (Department of Statistics, University of California at Davis), Dongyi Wei (School of Mathematical Sciences, Peking University), and Fang Yao (School of Mathematical Sciences and Center for Statistical Science, Peking University)
Abstract

Functional data analysis is an important research field in statistics which treats data as random functions drawn from some infinite-dimensional functional space, and functional principal component analysis (FPCA) based on eigen-decomposition plays a central role in data reduction and representation. After nearly three decades of research, there remains a key problem unsolved, namely, the perturbation analysis of the covariance operator for a diverging number of eigencomponents obtained from noisy and discretely observed data. This is fundamental for studying models and methods based on FPCA, while there has been no substantial progress since the result of Hall, Müller and Wang (2006) for a fixed number of eigenfunction estimates. In this work, we aim to establish a unified theory for this problem, obtaining upper bounds for eigenfunctions with diverging indices in both the \mathcal{L}^{2} and supremum norms, and deriving the asymptotic distributions of eigenvalues for a wide range of sampling schemes. Our results provide insight into the phenomenon that the \mathcal{L}^{2} bound of eigenfunction estimates with diverging indices is minimax optimal, as if the curves were fully observed, and reveal the transition of convergence rates from nonparametric to parametric regimes in connection with sparse or dense sampling. We also develop a double truncation technique to handle the uniform convergence of estimated covariance and eigenfunctions. The technical arguments in this work are useful for handling the perturbation series with noisy and discretely observed functional data and can be applied to models involving inverse problems that use FPCA as regularization, such as functional linear regression.

MSC2020 subject classifications: 62R10, 62G20.

Keywords: eigen-decomposition, functional data, perturbation series, phase transition.

Fang Yao is the corresponding author.

1 Introduction

Modern data collection technologies have rapidly evolved, leading to the widespread emergence of functional data that have been extensively studied over the past few decades. Generally, functional data are considered stochastic processes that satisfy certain smoothness conditions, or realizations of random elements valued in a Hilbert space. These two perspectives highlight the essential characteristics of functional data, namely, smoothness and infinite dimensionality, which distinguish them from high-dimensional and vector-valued data. For a comprehensive treatment of functional data, we recommend the monographs by Ramsay and Silverman (2006), Ferraty and Vieu (2006), Horváth and Kokoszka (2012), and Hsing and Eubank (2015), among others.

Although functional data provide information over a continuum, which is often time or spatial location, in reality data are collected or observed discretely with measurement errors. For instance, we usually use n to denote the sample size, that is, the number of subjects corresponding to random functions, and N_i to denote the number of observations for the ith subject. Thanks to the smooth nature of functional data, having a large number of observations per subject is more of a blessing than a curse, in contrast to high-dimensional data (Hall, Müller and Wang, 2006; Zhang and Wang, 2016). There is extensive literature on nonparametric methods that address the smoothness of functional data, including kernel or local polynomial methods (Yao, Müller and Wang, 2005a; Hall, Müller and Wang, 2006; Zhang and Wang, 2016), and various types of spline methods (Rice and Wu, 2001; Yao and Lee, 2006; Paul and Peng, 2009; Cai and Yuan, 2011).

When employing a smoothing method, there are two typical strategies to consider. If the observed time points per subject are relatively dense, it is recommended to pre-smooth each curve before further analysis, as suggested by Ramsay and Silverman (2006) and Zhang and Chen (2007). However, if the sampling scheme is rather sparse, it is preferable to pool observations together from all subjects (Yao, Müller and Wang, 2005a). The choice of individual versus pooled information affects the convergence rates and phase transitions in estimating population quantities, such as mean and covariance functions. When N_i \gtrsim n^{5/4} and the tuning parameter is optimally chosen per subject, the estimated mean and covariance functions based on the reconstructed curves through pre-smoothing are \sqrt{n}-consistent, the so-called optimal parametric rate. On the other hand, by borrowing information from all subjects, the pooling method only requires N_i \gtrsim n^{1/4} for mean and covariance estimation to reach optimality (Cai and Yuan, 2010, 2011; Zhang and Wang, 2016), providing theoretical insight into the advantage of the pooling strategy.

However, estimating the mean and covariance functions does not account for the infinite dimensionality of functional data. Due to the non-invertibility of covariance operators for functional random objects, regularization is required in models that involve inverse issues with functional covariates, such as the functional linear model (Yao, Müller and Wang, 2005b; Hall and Horowitz, 2007; Yuan and Cai, 2010), functional generalized linear model (Müller and Stadtmüller, 2005; Dou, Pollard and Zhou, 2012), and functional Cox model (Qu, Wang and Wang, 2016). Truncation of the leading functional principal components (FPC) is a well-developed approach to addressing this inverse issue (Hall and Horowitz, 2007; Dou, Pollard and Zhou, 2012). In order to suppress the model bias, the number of principal components used in truncation should grow slowly with sample size. As a result, the convergence rate of the estimated eigenfunctions with diverging indices becomes a fundamental issue, which is not only important in its own right but also crucial for most models and methods involving functional principal components regularization.

For fully observed functional data, Hall and Horowitz (2007) obtained the optimal convergence rate j^2/n for the jth eigenfunction, which served as a cornerstone in establishing the optimal convergence rates in the functional linear model (Hall and Horowitz, 2007) and the functional generalized linear model (Dou, Pollard and Zhou, 2012). In the discretely observed case, stochastic bounds for a fixed number of eigenfunctions have been obtained by different methods. Using a local linear smoother, Hall, Müller and Wang (2006) showed that the \mathcal{L}^{2} rate of a fixed eigenfunction for finite N_i is O_P(n^{-4/5}). Under the reproducing kernel Hilbert space framework, Cai and Yuan (2010) claimed that eigenfunctions with fixed indices admit the same convergence rate as the covariance function, which is O_P((n/\log n)^{-4/5}). It is important to note that, although both results are one-dimensional nonparametric rates (differing at most by a factor of (\log n)^{4/5}), the methodologies and techniques used are completely disparate, and a detailed discussion can be found in Section 2. Additionally, Paul and Peng (2009) proposed a reduced rank model and studied its asymptotic properties under a particular setting. In Zhou, Yao and Zhang (2023), the authors studied the convergence rate for the functional linear model and obtained an improved bound for the eigenfunctions with diverging indices. However, this rate does not reach the optimal rate j^2/n for any sampling rate N_i. As explained in Section 2, while some bounds can be obtained for eigenfunctions with diverging indices, attaining an optimal bound presents a substantially greater challenge. The lack of such an optimal bound for eigenfunctions poses a considerable challenge in analyzing the standard and efficient plug-in estimator in the functional linear model (Hall and Horowitz, 2007). Consequently, Zhou, Yao and Zhang (2023) resorted to a complex sample-splitting strategy, which results in lower estimation efficiency. To the best of our knowledge, there has been no progress in obtaining the optimal convergence rate of eigenfunctions with diverging indices when the data are discretely observed with noise contamination.

The distinction between estimating a diverging number and a fixed number of eigenfunctions is rooted in the infinite-dimensional nature of functional data. Analyzing eigenfunctions with diverging indices presents challenges due to the decaying eigenvalues. For fully observed data, the cross-sectional sample covariance based on the true functions facilitates the application of perturbation results, as shown in prior work (Hall and Horowitz, 2007; Dou, Pollard and Zhou, 2012). This approach simplifies each term in the perturbation series to the principal component scores. However, when the trajectories are observed at discrete time points, this virtue no longer exists, leading to a summability issue arising from the estimation bias and decaying eigenvalues. This renders existing techniques invalid and remains an unresolved problem; see Section 2 for further elaboration.

This paper addresses the significant yet challenging task of estimating an increasing number of eigenfunctions from discretely observed functional data, and presents a unified theory. The contributions of this paper are at least threefold. First, we establish an \mathcal{L}^{2} bound for the eigenfunctions and the asymptotic normality of the eigenvalues with increasing indices, reflecting a transition from nonparametric to parametric regimes and encompassing a wide range from sparse to dense sampling. We show that when N_i reaches a magnitude of n^{1/4+\delta}, where \delta depends on the smoothness parameters of the underlying curves, the convergence rate becomes optimal as if the curves were fully observed. Second, we introduce a novel double truncation method that yields uniform convergence across the time domain, surmounting theoretical barriers in the existing literature. Through this approach, uniform convergence rates for the covariance and eigenfunctions are achieved under mild conditions across various sampling schemes. Notably, this includes the uniform convergence of eigenfunctions with increasing indices, which is new even in scenarios where data are fully observed. Third, we provide a new technical route for addressing the perturbation series of the functional covariance operator, bridging the gap between the “ideal” fully observed scenario and the noisy, discrete “real-world” context. These advanced techniques pave the way for their application in downstream FPCA-related analyses, and the achieved optimal rate of eigenpairs facilitates the extension of existing theoretical results for fully observed functional data to the discretely observed case.

The rest of the paper is organized as follows. In Section 2, we give a synopsis of covariance and eigencomponent estimation in functional data. We present the \mathcal{L}^{2} convergence of eigenfunctions in Section 3, and discuss the uniform convergence problem of functional data in Section 5. Asymptotic normality of eigenvalues is presented in Section 4. Section 6 provides an illustration of the phase transition phenomenon in eigenfunctions with synthetic data. The proof of Theorem 1 can be found in the Appendix, while the proofs of other theorems and lemmas are collected in the Supplementary Material.

In what follows, we denote by A_n = O_P(B_n) the relation that, for each \epsilon > 0, there exists a positive constant M such that \mathbb{P}(A_n \leq MB_n) \geq 1-\epsilon, and by A_n = o_P(B_n) the relation \mathbb{P}(A_n \geq \epsilon B_n) \rightarrow 0 as n \rightarrow \infty for each \epsilon > 0. A non-random sequence a_n is said to be O(1) if it is bounded. For any non-random sequence b_n, we say b_n = O(a_n) if b_n/a_n = O(1), and b_n = o(a_n) if b_n/a_n \rightarrow 0. The notation a_n \lesssim b_n indicates a_n \leq Cb_n for sufficiently large n and a positive constant C, and the relation \gtrsim is defined similarly. We write a_n \asymp b_n if a_n \lesssim b_n and b_n \lesssim a_n. For a \in \mathbb{R}, \lfloor a \rfloor denotes the largest integer less than or equal to a. For a function f \in \mathcal{L}^2[0,1], where \mathcal{L}^2[0,1] denotes the space of square-integrable functions on [0,1], \|f\|^2 denotes \int_{[0,1]} f(s)^2 \mathrm{d}s, and \|f\|_\infty denotes \sup_{s\in[0,1]}|f(s)|. For a function A(s,t) \in \mathcal{L}^2[0,1]^2, define \|A\|_{\mathrm{HS}}^2 = \iint_{[0,1]^2} A(s,t)^2 \mathrm{d}s\mathrm{d}t and \|A\|_{(j)}^2 = \int_{[0,1]}\{\int_{[0,1]} A(s,t)\phi_j(s)\mathrm{d}s\}^2 \mathrm{d}t, where \{\phi_j\}_{j=1}^\infty are the eigenfunctions of interest. We occasionally write \int pq and \iint Apq for \int p(u)q(u)\mathrm{d}u and \iint A(u,v)p(u)q(v)\mathrm{d}u\mathrm{d}v for brevity.

2 Eigen-estimation for discretely observed functional data

Let X(t) be a square integrable stochastic process on [0,1], and let X_i(t) be independent and identically distributed copies of X(t). The mean and covariance functions of X(t) are denoted by \mu(t) = \mathbb{E}\{X(t)\} and C(s,t) = \mathbb{E}[\{X(s)-\mu(s)\}\{X(t)-\mu(t)\}], respectively. According to Mercer’s Theorem (Indritz, 1963), C(s,t) has the spectral decomposition

C(s,t)=\sum_{k=1}^{\infty}\lambda_{k}\phi_{k}(s)\phi_{k}(t), (1)

where \lambda_1 > \lambda_2 > \cdots > 0 are eigenvalues and \{\phi_j\}_{j=1}^{\infty} are the corresponding eigenfunctions, which form a complete orthonormal system on \mathcal{L}^2[0,1]. For each i, the process X_i admits the so-called Karhunen-Loève expansion

X_{i}(t)=\mu(t)+\sum_{k=1}^{\infty}\xi_{ik}\phi_{k}(t), (2)

where \xi_{ik} = \int_0^1 \{X_i(t)-\mu(t)\}\phi_k(t)\mathrm{d}t are the functional principal component scores with zero mean and variance \lambda_k.

In practice, however, having each X_i(t) for all t \in [0,1] is only an idealization adopted to simplify theoretical analysis. Measurements are typically taken at N_i discrete time points with noise contamination. Specifically, the actual observations for each X_i are given by

\{(t_{ij},X_{ij}) \,|\, X_{ij}=X_{i}(t_{ij})+\varepsilon_{ij},\ j=1,\cdots,N_{i}\}, (3)

where \varepsilon_{ij} are random copies of \varepsilon, with \mathbb{E}(\varepsilon) = 0 and \mathrm{Var}(\varepsilon) = \sigma_X^2. We further assume the measurement errors \{\varepsilon_{ij}\}_{i,j} are independent of X_i.

Local linear regression is a popular smoothing technique in functional data analysis due to its attractive theoretical properties (Yao, Müller and Wang, 2005a; Hall, Müller and Wang, 2006; Li and Hsing, 2010; Zhang and Wang, 2016). The primary goal of this paper is to develop a unified theory for estimating a large number of eigenfunctions from discretely observed functional data. To maintain focus and avoid distractions, we assume that the mean function \mu(t) is known and set \mu(t) = 0 without loss of generality; the scenario involving an unknown mean function is discussed later in Section 3. We denote by \delta_{il_1l_2} = X_{il_1}X_{il_2} the raw covariance, and define v_i = \{nN_i(N_i-1)\}^{-1}. The local linear estimator of the covariance function is given by \hat{C}(s,t) = \hat{\beta}_0, where

(\hat{\beta}_{0},\hat{\beta}_{1},\hat{\beta}_{2})=\underset{\beta_{0},\beta_{1},\beta_{2}}{\operatorname{argmin}}\sum_{i=1}^{n}v_{i}\sum_{1\leq l_{1}\neq l_{2}\leq N_{i}}\left\{\delta_{il_{1}l_{2}}-\beta_{0}-\beta_{1}(t_{il_{1}}-s)-\beta_{2}(t_{il_{2}}-t)\right\}^{2}\frac{1}{h}\mathrm{K}\left(\frac{t_{il_{1}}-s}{h}\right)\frac{1}{h}\mathrm{K}\left(\frac{t_{il_{2}}-t}{h}\right), (4)

where \mathrm{K} is a symmetric, Lipschitz continuous density kernel on [-1,1] and h is the bandwidth tuning parameter. The estimated covariance function \hat{C}(s,t) can be expressed as an empirical version of the spectral decomposition in (1), i.e.,

\hat{C}(s,t)=\sum_{k=1}^{\infty}\hat{\lambda}_{k}\hat{\phi}_{k}(s)\hat{\phi}_{k}(t), (5)

where \hat{\lambda}_k and \hat{\phi}_k are estimators of \lambda_k and \phi_k, respectively. We assume that \langle\hat{\phi}_k,\phi_k\rangle \geq 0 for definiteness.
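To make the procedure concrete, the following minimal Python sketch (our own illustration, not the implementation used in the paper; the Epanechnikov kernel and the quadrature-based eigendecomposition are assumptions) evaluates the local linear estimator (4) on a grid and extracts eigenpairs as in (5).

```python
import numpy as np

def local_linear_cov(times, values, h, grid):
    """Pooled local linear covariance estimator (4), assuming a known zero mean.
    times/values: lists of per-subject arrays (t_i, X_i); grid: points in [0,1].
    Assumes h is large enough that every local window contains observations."""
    K = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)  # Epanechnikov kernel on [-1,1]
    n, G = len(times), len(grid)
    C_hat = np.empty((G, G))
    for a_idx, s in enumerate(grid):
        for b_idx, t in enumerate(grid):
            A = np.zeros((3, 3))
            r = np.zeros(3)
            for t_i, X_i in zip(times, values):
                N_i = len(t_i)
                v_i = 1.0 / (n * N_i * (N_i - 1))
                d1, d2 = t_i - s, t_i - t
                W = v_i * np.outer(K(d1 / h), K(d2 / h)) / h**2
                np.fill_diagonal(W, 0.0)                  # keep only pairs l1 != l2
                delta = np.outer(X_i, X_i)                # raw covariances X_{il1} X_{il2}
                D1, D2 = d1[:, None], d2[None, :]
                A += np.array([[W.sum(),        (W * D1).sum(),      (W * D2).sum()],
                               [(W * D1).sum(), (W * D1**2).sum(),   (W * D1 * D2).sum()],
                               [(W * D2).sum(), (W * D1 * D2).sum(), (W * D2**2).sum()]])
                r += np.array([(W * delta).sum(),
                               (W * D1 * delta).sum(),
                               (W * D2 * delta).sum()])
            C_hat[a_idx, b_idx] = np.linalg.solve(A, r)[0]  # beta_0 at (s, t)
    return C_hat

def eigen_decompose(C_hat, grid):
    """Eigenpairs (5) of the integral operator with kernel C_hat, via quadrature
    on an equally spaced grid."""
    dx = grid[1] - grid[0]
    C_sym = (C_hat + C_hat.T) / 2.0
    vals, vecs = np.linalg.eigh(C_sym * dx)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order] / np.sqrt(dx)      # L2-normalized on [0,1]
```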

Before delving into the theoretical details, we provide an overview of eigenfunction estimation in functional data analysis. We start with the resolvent series shown in Equation (6) and illustrate its application in statistical analysis,

\mathbb{E}(\|\hat{\phi}_{j}-\phi_{j}\|^{2})\asymp\sum_{k\neq j}\frac{\mathbb{E}[\{\iint(\hat{C}-C)\phi_{j}\phi_{k}\}^{2}]}{(\lambda_{k}-\lambda_{j})^{2}}. (6)

Such expansions can be found in Bosq (2000), Dou, Pollard and Zhou (2012), and Li and Hsing (2010); see Chapter 5 in Hsing and Eubank (2015) for details. Denote by \eta_j the eigengap of \lambda_j, that is, \eta_j := \min_{k\neq j}|\lambda_k - \lambda_j|. A basic rough bound for \mathbb{E}(\|\hat{\phi}_j-\phi_j\|^2) can be derived from Equation (6) and Bessel’s inequality,

\mathbb{E}(\|\hat{\phi}_{j}-\phi_{j}\|^{2})\leq\eta_{j}^{-2}\,\mathbb{E}(\|\hat{C}-C\|_{\mathrm{HS}}^{2}). (7)

However, this bound is suboptimal for two reasons. First, while \eta_j is bounded away from 0 for a fixed j, the bound implies that the eigenfunctions converge at the same rate as the covariance function. This is counterintuitive since integration usually brings extra smoothness (Cai and Hall, 2006), which typically results in the eigenfunction estimates converging at a faster rate than the two-dimensional kernel smoothing rate of \|\hat{C}-C\|_{\mathrm{HS}}^{2}. Second, for j that diverges with the sample size, the factor \eta_j^{-2} \rightarrow \infty in the bound cannot be improved. Assuming \lambda_j \asymp j^{-a}, the rate obtained by (7) is slower than j^{2a+2}/n, which is known to be suboptimal (Wahl, 2022). Therefore, to obtain a sharp rate for \mathbb{E}(\|\hat{\phi}_j-\phi_j\|^2), one should use the original perturbation series given by (6), rather than its approximation given by (7).

When each trajectory X_i(t) is fully observed for all t \in [0,1], the cross-sectional sample covariance \hat{C}(s,t) = n^{-1}\sum_{i=1}^{n}X_i(s)X_i(t) is a canonical estimator of C(s,t). Then, the numerator in each term of (6) can be reduced to the principal component scores under some mild assumptions, for example, \mathbb{E}[\{\iint(\hat{C}-C)\phi_j\phi_k\}^2] \lesssim n^{-1}\lambda_j\lambda_k (Hall and Horowitz, 2007; Dou, Pollard and Zhou, 2012). Subsequently, \mathbb{E}(\|\hat{\phi}_j-\phi_j\|^2) is bounded by (\lambda_j/n)\sum_{k\neq j}\lambda_k/(\lambda_k-\lambda_j)^2. With the common assumption of polynomial decay of the eigenvalues, the aforementioned summation is dominated by (\lambda_j/n)\sum_{\lfloor j/2\rfloor\leq k\leq 2j}\lambda_k/(\lambda_k-\lambda_j)^2, which is O(j^2/n) and optimal in the minimax sense (Wahl, 2022). See Lemma 7 in Dou, Pollard and Zhou (2012) for a detailed elaboration. This suggests that the convergence rate caused by the inverse issue can be captured by the summation over the set \{k \leq 2j\}, while the tail sum over \{k > 2j\} contributes only a smaller-order term.
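To see how the dominant block yields the j^2/n rate, note that the polynomial decay condition (Assumption 3 in Section 3) implies the eigengap bound |\lambda_k - \lambda_j| \gtrsim |k-j| j^{-a-1} and \lambda_k \asymp j^{-a} for \lfloor j/2\rfloor \leq k \leq 2j, so that, as a sketch of the standard computation,

\frac{\lambda_{j}}{n}\sum_{\lfloor j/2\rfloor\leq k\leq 2j,\,k\neq j}\frac{\lambda_{k}}{(\lambda_{k}-\lambda_{j})^{2}}\lesssim\frac{1}{n}\sum_{k\neq j}\frac{j^{-2a}\,j^{2a+2}}{|k-j|^{2}}\leq\frac{j^{2}}{n}\sum_{r=1}^{\infty}\frac{2}{r^{2}}\asymp\frac{j^{2}}{n}.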

However, we would like to emphasize that when it comes to discretely observed functional data, all the existing literature utilizing a bound similar to (7) excludes the case of diverging indices. For instance, the result in Cai and Yuan (2010) is simply a direct application of the bound in (7). Moreover, their one-dimensional rate is inherited from the covariance estimator, which is assumed to lie in a tensor product space that is smaller than the space \mathcal{L}^2[0,1]^2. On the other hand, the one-dimensional rates obtained by Hall, Müller and Wang (2006) and Li and Hsing (2010) rely on detailed calculations based on the approximation of the perturbation series in (6). However, these results are based on the assumption that \eta_j is bounded away from zero, which implies that j must be a fixed constant. This is inconsistent with the nonparametric nature of functional data models, which aim to approximate or regularize an infinite-dimensional process. Therefore, when dealing with discretely observed functional data, the key to obtaining a sharp bound for estimated eigenfunctions with diverging indices lies in effectively utilizing the perturbation series (6).

The main challenges arise from quantifying the summation in (6) without the fully observed sample covariance. For the pre-smoothing method, the reconstructed \hat{X}_i achieves \sqrt{n} convergence in the \mathcal{L}^2 sense when each N_i reaches a magnitude of n^{5/4}, and then the estimated covariance function \hat{C}(s,t) = n^{-1}\sum_{i=1}^{n}\hat{X}_i(s)\hat{X}_i(t) has the optimal rate \|\hat{C}-C\|_{\mathrm{HS}} = O_P(n^{-1/2}). However, this does not guarantee optimal convergence of a diverging number of eigenfunctions. The numerators in each term of (6) are no longer the principal component scores, and the complex form of this infinite summation makes it difficult to quantify when |\lambda_k-\lambda_j| \rightarrow 0. Similarly, the pooling method also encounters significant challenges in summing all \mathbb{E}[\{\iint(\hat{C}-C)\phi_j\phi_k\}^2] with respect to j and k. Specifically, the convergence rate of \|\hat{C}-C\|_{\mathrm{HS}}^2 should be a two-dimensional kernel smoothing rate with variance n^{-1}\{1+(Nh)^{-2}\} (Zhang and Wang, 2016). However, after being integrated twice, \mathbb{E}[\{\iint(\hat{C}-C)\phi_j\phi_k\}^2] has a degenerate kernel smoothing rate with variance n^{-1}. According to Bessel’s inequality, \mathbb{E}(\|\hat{C}-C\|_{\mathrm{HS}}^2) can be expressed as \sum_{j,k}\mathbb{E}[\{\iint(\hat{C}-C)\phi_j\phi_k\}^2]. However, due to estimation bias, one cannot directly sum all \mathbb{E}[\{\iint(\hat{C}-C)\phi_j\phi_k\}^2] over all j, k.

3 \mathcal{L}^{2} convergence of eigenfunction estimates

Based on the issues discussed above, we propose a novel technique for addressing the perturbation series (6) when dealing with discretely observed functional data. To this end, we make the following regularity assumptions.

Assumption 1.

There exists a positive constant c_0 such that \mathbb{E}(\xi_j^4) \leq c_0\lambda_j^2 for all j.

Assumption 2.

The second-order partial derivatives of C(s,t), namely \partial^2 C(s,t)/\partial s^2, \partial^2 C(s,t)/\partial t^2 and \partial^2 C(s,t)/\partial s\partial t, are bounded on [0,1]^2.

Assumption 3.

The eigenvalues \lambda_j are decreasing and satisfy j^{-a} \gtrsim \lambda_j \gtrsim \lambda_{j+1} + j^{-a-1} for some a > 1 and each j \geq 1.

Assumption 4.

For each j \in \mathbb{N}^{+}, the eigenfunction \phi_j satisfies \sup_{t\in[0,1]}|\phi_j(t)| = O(1) and

\sup_{t\in[0,1]}|\phi_{j}^{(k)}(t)|\lesssim j^{c/2}\sup_{t\in[0,1]}|\phi_{j}^{(k-1)}(t)|, \quad \text{for } k=1,2,

where c is a positive constant.

Assumption 5.

\mathbb{E}[\sup_{t\in[0,1]}|X(t)|^{2\alpha}] < \infty and \mathbb{E}(\varepsilon^{2\alpha}) < \infty for some \alpha > 1.

Assumptions 1 and 2 are widely adopted in the functional data literature related to smoothing (Yao, Müller and Wang, 2005a; Cai and Yuan, 2010; Zhang and Wang, 2016). The decay rate assumption on the eigenvalues provides a natural theoretical framework for justifying and analyzing the properties of functional principal components (Hall and Horowitz, 2007; Dou, Pollard and Zhou, 2012; Zhou, Yao and Zhang, 2023). Under exponentially decaying eigenvalues, the number of eigenfunctions that can be well estimated is limited to the order of \log n, which lacks practical interest. Consequently, we adopt the assumption of polynomial decay of the eigenvalues. To achieve quality estimates for a specific eigenfunction, its smoothness should be taken into account. Generally, the frequency of \phi_j is higher for larger j, which requires a smaller bandwidth to capture its local variation. Assumption 4 characterizes the frequency increment of a specific eigenfunction via the smoothness of its derivatives. For some commonly used bases, such as the Fourier, Legendre, and wavelet bases, c = 2. In Hall, Müller and Wang (2006), the authors assumed that \max_{1\leq j\leq r}\max_{s=0,1,2}\sup_{t\in[0,1]}|\phi_j^{(s)}(t)| \leq C, which is only achievable for a fixed r. Therefore, Assumption 4 can be interpreted as a generalization of this assumption. To analyze the convergence of eigenfunctions effectively, a uniform convergence rate of the covariance function is needed to handle the local linear estimator (4). Assumption 5 is the moment assumption required for the uniform convergence of the covariance function, as adopted in Li and Hsing (2010) and Zhang and Wang (2016).

For the observation time points \{t_{ij}\}_{i,j}, there are two typical types of designs: the random design, in which the design points are random samples from a distribution, and the regular design, where the observation points lie on a predetermined mesh grid. For the random design, the following assumption is commonly adopted (Yao, Müller and Wang, 2005a; Li and Hsing, 2010; Cai and Yuan, 2011; Zhang and Wang, 2016):

Assumption 6 (Random design).

The design points t_{ij}, which are independent of X and \varepsilon, are i.i.d. samples from a distribution on [0,1] whose density is bounded away from zero and infinity.

For the regular design, each sample path is observed on an equally spaced grid \{t_j\}_{j=1}^{N}, where N_1 = \cdots = N_n = N for all subjects. This longitudinal design is frequently encountered in various scientific experiments and has been studied in Cai and Yuan (2011). Assumption 7 guarantees a sufficient number of observations within the local window for the kernel smoothing method. Furthermore, Assumption 8 is needed for the Riemann sum approximation in the fixed regular design.

Assumption 7 (Fixed regular design).

The design points \{t_j\}_{j=1}^{N} are nonrandom, and there exist constants c_2 \geq c_1 > 0 such that for any intervals A, B \subseteq [0,1],

  • (a)

    c_1N|A|-1 \leq \sum_{j=1}^{N}\mathds{1}_{\{t_{j}\in A\}} \leq \max\{c_2N|A|,1\},

  • (b)

    c_1N^2|A||B|-1 \leq \sum_{l_1,l_2=1}^{N}\mathds{1}_{\{t_{l_1}\in A\}}\mathds{1}_{\{t_{l_2}\in B\}} \leq \max\{c_2N^2|A||B|,1\},

where |A| denotes the length of A.

Assumption 8.

\partial^2 C(s,t)/\partial s^2 and \partial^2 C(s,t)/\partial t^2 are continuously differentiable.

The following theorem is one of our main results. It characterizes the \mathcal{L}^{2} convergence of the estimated eigenfunctions with diverging indices for both the random design (Assumption 6) and the fixed regular design (Assumption 7).

Theorem 1.

Assume that Assumptions 1 to 4 hold, and further assume that Assumption 5 holds with \alpha > 3.

  • (a)

For the random design, under Assumption 6, for all j \leq m \in \mathbb{N}_{+} such that m^{2a+2}/n \rightarrow 0, m^{2a+2}/(n\bar{N}_2^2h^2) \rightarrow 0 and h^4\max\{m^{2a+2c}, m^{4a}\log n\} \lesssim 1,

\|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}\left\{1+\frac{j^{2a}}{\bar{N}_{2}^{2}}\right\}+\frac{j^{a}}{n\bar{N}_{2}h}\left(1+\frac{j^{a}}{\bar{N}_{2}}\right)+h^{4}j^{2c+2}\right), (8)

where \bar{N}_2 = (n^{-1}\sum_{i=1}^{n}N_i^{-2})^{-1/2}.

  • (b)

For the fixed regular design, under Assumptions 7 and 8, for all j \leq m \in \mathbb{N}_{+} such that m^{2a+2}/n \rightarrow 0, m^{2a+2}/(nN^2h^2) \rightarrow 0, Nh^a \gtrsim 1 and h^4\max\{m^{2a+2c}, m^{4a}\log n\} \lesssim 1,

\|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}+\frac{j^{a}}{nNh}+h^{4}j^{2c+2}\right). (9)

The integer m in Theorem 1 represents the maximum number of eigenfunctions that can be well estimated from the observed data using appropriate tuning parameters. It is important to note that in Theorem 1, m may diverge to infinity. The upper bound of m is a function of the sample size n, the sampling rate \bar{N}_2, the smoothing parameter h, and the decaying eigengap \eta_j, or equivalently a. As the frequency of \phi_j is higher for larger j, a smaller h is required to capture its local variations. If a is large, the eigengap \eta_j diminishes rapidly, posing a greater challenge in distinguishing between adjacent eigenvalues. Note that the assumptions on m in Theorem 1 encompass most downstream analyses that involve a functional covariate, such as functional linear regression as discussed in Hall and Horowitz (2007).

Theorem 1 provides a good illustration of both the infinite dimensionality and the smoothness nature of functional data. To better understand this result, note that j^2/n is the optimal rate in the fully observed case. The additional terms on the right-hand side of Equation (8) represent contamination introduced by discrete observations and measurement errors. In particular, the term with h^4 corresponds to the smoothing bias, while the term (n\bar{N}_2h)^{-1} reflects the variance typically associated with one-dimensional kernel smoothing. Terms including \bar{N}_2^{-1} arise from the discrete approximation, and the terms that involve j with positive powers are due to the decaying eigengaps associated with an increasing number of eigencomponents.

The first part of Theorem 1 provides a unified result for eigenfunction estimates under random design without imposing any restrictions on the sampling rate N_i. Similar to the phase transitions of mean and covariance functions studied in Cai and Yuan (2011) and Zhang and Wang (2016), Corollary 1 presents a systematic partition that ranges from “sparse” to “dense” schemes for eigenfunction estimation under the random design, which is meaningful for FPCA-based models and methods.

Corollary 1.

Suppose that Assumptions 1 to 6 hold and m \in \mathbb{N}_{+} satisfies the conditions in part (a) of Theorem 1. For each j \leq m, let h_{opt}(j) = (n\bar{N}_2)^{-1/5}j^{(a-2c-2)/5}(1+j^a/\bar{N}_2)^{1/5}. Then:

  1. (a)

    If \bar{N}_2 \gtrsim j^a,

    \|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}+\frac{j^{(4a+2c+2)/5}}{(n\bar{N}_{2})^{4/5}}\right).

    In addition, if \bar{N}_2 \geq n^{1/4}j^{a+c/2-2},

    \|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}\right).

  2. (b)

    If \bar{N}_2 = o(j^a),

    \|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2a+2}}{n\bar{N}_{2}^{2}}+\frac{j^{(8a+2c+2)/5}}{(n\bar{N}_{2}^{2})^{4/5}}\right).

Note that h_{opt}(j) is the optimal bandwidth for estimating a specific eigenfunction \phi_j. However, in practice, it suffices to estimate the covariance function just once using the optimal bandwidth associated with the highest-order eigenfunction \phi_{m_{\max}}. This ensures that both the subspace spanned by the first m_{\max} eigenfunctions and their corresponding projections are well estimated. When conducting downstream analysis with FPCA, the evaluation of error rates typically involves the term \sum_{j=1}^{m_{\max}}j^{-\beta}\|\hat{\phi}_j-\phi_j\|^2. Here m_{\max} denotes the maximum number of principal components used, and \beta varies across different scenarios, reflecting how the influence of each principal component on the error rate is weighted. For example, in the functional linear model, \beta > 0 represents the rate at which the Fourier coefficients of the regression function decay. By the first statement of Theorem 1,

\sum_{j=1}^{m_{\max}}j^{-\beta}\|\hat{\phi}_{j}-\phi_{j}\|^{2}=\frac{m_{\max}^{3-\beta}}{n}\left(1+\frac{m_{\max}^{2a}}{\bar{N}_{2}^{2}}\right)+\frac{m_{\max}^{a-\beta+1}}{n\bar{N}_{2}h}\left(1+\frac{m_{\max}^{a}}{\bar{N}_{2}}\right)+h^{4}m_{\max}^{2c+3-\beta}.

It can be seen that the optimal bandwidth h in the above equation aligns with h_{opt}(m_{\max}) for all \beta. This implies that our theoretical framework can be seamlessly adapted to scenarios where a single bandwidth is employed to attain the optimal convergence rate in the downstream analysis.
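As a small illustration, the bandwidth formula in Corollary 1 can be evaluated directly; the following Python helper (our own, hypothetical, with constants suppressed) computes h_{opt}(j), and smoothing once with h_{opt}(m_{\max}) then serves all j \leq m_{\max} as discussed above.

```python
def h_opt(j, n, N2_bar, a, c):
    """Rate-optimal bandwidth h_opt(j) from Corollary 1, up to constants."""
    return (n * N2_bar) ** (-1 / 5) * j ** ((a - 2 * c - 2) / 5) \
        * (1 + j ** a / N2_bar) ** (1 / 5)

# Example: n = 500 subjects, N2_bar = 20, polynomial decay a = 2,
# Fourier-type basis (c = 2); smooth once at the largest index m_max = 5.
h = h_opt(5, n=500, N2_bar=20.0, a=2.0, c=2.0)
```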

For the commonly used bases where c = 2, the convergence rate for the jth eigenfunction achieves optimality, as if the curves were fully observed, when \bar{N}_2 > n^{1/4}j^{a-1}. Keeping j fixed, the phase transition occurs at n^{1/4}, aligning with the results in Hall, Müller and Wang (2006) and Cai and Yuan (2010), as well as those for mean and covariance functions discussed in Zhang and Wang (2016). For n subjects, the maximum index of the eigenfunction that can be well estimated is less than m_{\max} := n^{1/(2a+2)}, which directly follows from the assumption m^{2a+2}/n \rightarrow 0. The phase transition in estimating \phi_{m_{\max}} occurs at n^{1/4+(a-1)/(2a+2)}. This can be interpreted from two aspects. On one hand, compared to mean and covariance estimation, more observations per subject are required to obtain optimal convergence for eigenfunctions with increasing indices, showing the challenges tied to infinite dimensionality and decaying eigenvalues. On the other hand, the fact that n^{1/4+(a-1)/(2a+2)} is only slightly larger than n^{1/4} justifies the merits of the pooling method and supports our intuition. When j is fixed and \bar{N}_2 is finite, the convergence rate obtained by Corollary 1 is (nh)^{-1}+h^4, which corresponds to a typical one-dimensional kernel smoothing rate and achieves optimality at h \asymp n^{-1/5}. This result aligns with those in Hall, Müller and Wang (2006) and is optimal in the minimax sense. When allowing j to diverge, the known lower bound j^2/n for fully observed data is attained by applying van Trees’ inequality to the special orthogonal group (Wahl, 2022). However, the argument presented in Wahl (2022) does not directly extend to the discrete case, and there are currently no available lower bounds for the eigenfunctions with diverging indices based on discrete observations, which remains an open problem that requires further investigation.

Comparing the results of this work with those in Zhou, Yao and Zhang (2023) is also of interest. The \mathcal{L}^{2} convergence rate for the jth eigenfunction obtained in Zhou, Yao and Zhang (2023) is

\frac{j^{a+2}}{n}\left(1+\frac{1}{Nh}\right)+h^{4}j^{2a+2c+2}. (10)

It is evident that the results in Zhou, Yao and Zhang (2023) never reach the optimal rate j^2/n for any sampling rate N. In contrast, Corollary 1 provides a systematic partition ranging from “sparse” to “dense” schemes for eigenfunction estimation, and the optimal rate j^2/n can be achieved when the sampling rate N_i exceeds the phase transition point. The optimal rate achieved here represents more than just a theoretical improvement over previous findings; it also carries substantial implications for downstream analysis. Further discussion can be found in Section 7.

The following corollary presents the phase transition of eigenfunctions for the fixed regular design. Note that for the fixed regular design, the number of distinct observed time points in an interval of length h is on the order of Nh, so Nh \gtrsim 1 is required to ensure there is at least one observation within the bandwidth window of the kernel smoother (Shao, Lin and Yao, 2022). Moreover, Nh^a \gtrsim 1 is required to eliminate the Riemann sum approximation bias. The condition Nh^a \gtrsim 1 parallels part (a) of Corollary 1 in the scenario of the random design, which is similar to mean and covariance estimation, where consistency can only be obtained in the dense case for the regular design (Shao, Lin and Yao, 2022).

Corollary 2.

Suppose that Assumptions 1 to 5, 7 and 8 hold, and m \in \mathbb{N}_{+} satisfies the conditions in part (b) of Theorem 1. For each j \leq m, let h_{opt}(j) = (nN)^{-1/5}j^{(a-2c-2)/5}. Then

\|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}+\frac{j^{(4a+2c+2)/5}}{(nN)^{4/5}}\right).

In addition, if N \geq n^{1/4}j^{a+c/2-2},

\|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}\right).

If the mean function \mu(t) is unknown, one can use a local linear smoother to fit \hat{\mu}(t) = \hat{\alpha}_0 with

(\hat{\alpha}_{0},\hat{\alpha}_{1})=\underset{\alpha_{0},\alpha_{1}}{\operatorname{argmin}}\ \frac{1}{n}\sum_{i=1}^{n}\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\{X_{ij}-\alpha_{0}-\alpha_{1}(t_{ij}-t)\}^{2}\frac{1}{h_{\mu}}\mathrm{K}\left(\frac{t_{ij}-t}{h_{\mu}}\right).

Then the covariance estimator \hat{C} is obtained by replacing the raw covariance \delta_{il_1l_2} in (4) with \hat{\delta}_{il_1l_2} = \{X_{il_1}-\hat{\mu}(t_{il_1})\}\{X_{il_2}-\hat{\mu}(t_{il_2})\}. The following corollary presents the convergence rate and phase transition for the case where \mu(t) is unknown. It should be noted that the Fourier coefficients of \mu(t) with respect to the eigenfunctions \phi_j generally do not exhibit a decaying trend. Therefore, to eliminate the estimation error caused by the mean estimation, an additional lower bound on N is necessary. This lower bound, N \gtrsim j^a, aligns with the partition described in Corollary 1 for the random design case.
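A minimal Python sketch of this pooled local linear mean estimator (our own illustration, with an assumed Epanechnikov kernel) follows; it solves the weighted least squares problem above in closed form at each grid point.

```python
import numpy as np

def local_linear_mean(times, values, h_mu, grid):
    """Pooled local linear mean estimator: the alpha_0 component of the
    weighted least squares fit, with weight 1/(n N_i) per observation."""
    K = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)  # Epanechnikov kernel
    n = len(times)
    mu_hat = np.empty(len(grid))
    for g, t in enumerate(grid):
        S0 = S1 = S2 = T0 = T1 = 0.0
        for t_i, X_i in zip(times, values):
            d = t_i - t
            w = K(d / h_mu) / (h_mu * n * len(t_i))
            S0 += w.sum(); S1 += (w * d).sum(); S2 += (w * d**2).sum()
            T0 += (w * X_i).sum(); T1 += (w * d * X_i).sum()
        mu_hat[g] = (S2 * T0 - S1 * T1) / (S0 * S2 - S1**2)  # alpha_0 at t
    return mu_hat
```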

Corollary 3.

Suppose that Assumptions 1 to 4 hold. Under either Assumption 6, or Assumptions 7 and 8, for all m \in \mathbb{N}_{+} satisfying m^{2a+2}/n \rightarrow 0, Nh^a \gtrsim 1 and h^4\max\{m^{2a+2c}, m^{4a}\log n\} \lesssim 1, and for each j \leq m,

\|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}+\frac{j^{(4a+2c+2)/5}}{(n\bar{N}_{2})^{4/5}}\right).

In addition, if \bar{N}_2 \geq n^{1/4}j^{a+c/2-2},

\|\hat{\phi}_{j}-\phi_{j}\|^{2}=O_{P}\left(\frac{j^{2}}{n}\right).

4 Asymptotic normality of eigenvalue estimates

The distribution of eigenvalues plays a crucial role in statistical learning and is of significant interest in the high-dimensional setting. Random matrix theory provides a systematic tool for deriving the distribution of eigenvalues of a square matrix (Anderson, Guionnet and Zeitouni, 2010; Pastur and Shcherbina, 2011), and has been successfully applied in various statistical problems, such as signal detection (Nadler, Penna and Garello, 2011; Onatski, 2009; Bianchi et al., 2011), spiked covariance models (Johnstone, 2001; Paul, 2007; El Karoui, 2007; Ding and Yang, 2021; Bao et al., 2022), and hypothesis testing (Bai et al., 2009; Chen and Qin, 2010; Zheng, 2012). For a comprehensive treatment of random matrix theory in statistics, we recommend the monograph by Bai and Silverstein (2010) and the review paper by Paul and Aue (2014).

Despite the success of random matrix theory in high-dimensional statistics, its application to functional data analysis is not straightforward due to the different structures of functional spaces. If the observations are taken at the same time points \{t_j\}_{j=1}^{N} for all i, one can obtain an estimator for \Sigma_T = \operatorname{Cov}(\tilde{X}_i,\tilde{X}_i)+\sigma_X^2 I_N, where \tilde{X}_i = (X_i(t_1),\ldots,X_i(t_N))^{\top}. Note that \tilde{X}_i can be regarded as a random vector; however, the adjacent elements in \tilde{X}_i become highly correlated as N increases due to the smooth nature of functional data. This correlation violates the independence assumption required in most random matrix theory settings.

In the context of functional data, the variables of interest become the eigenvalues of the covariance operator. However, the rough bound obtained by Weyl’s inequality, |\hat{\lambda}_k-\lambda_k| \leq \|\Delta\|_{\mathrm{HS}} with \Delta = \hat{C}-C, is suboptimal in two respects. First, |\hat{\lambda}_k-\lambda_k| should have a degenerate kernel smoothing rate with variance of order n^{-1/2}, whereas \|\Delta\|_{\mathrm{HS}} has a two-dimensional kernel smoothing rate with variance of order (n\bar{N}_2^2h^2)^{-1/2}. Second, due to the infinite dimensionality of functional data, the eigenvalues \{\lambda_k\}_k tend to zero as k \rightarrow \infty, so a general bound for all eigenvalues provides little information for those with larger indices. Although expansions and asymptotic normality have been studied for a fixed number of eigenvalues, as well as for those with diverging indices for fully observed functional data, the study of eigenvalues with a diverging index under the discrete sampling scheme remains limited.

In light of the aforementioned issues, we employ the perturbation technique outlined in previous sections to establish the asymptotic normality of eigenvalues with diverging indices, which holds broad implications for inference problems in functional data analysis. Before presenting our results, we introduce the following assumption, which is standard in FPCA for establishing asymptotic normality.

Assumption 9.

\mathbb{E}(\|X\|^6) < \infty and \mathbb{E}(\varepsilon^6) < \infty. For any sequence j_1,\ldots,j_4 \in \mathbb{N}_{+}, \mathbb{E}(\xi_{j_1}\xi_{j_2}\xi_{j_3}\xi_{j_4}) = 0 unless each index j_k is repeated.

Theorem 2.

Under Assumptions 1 to 4, 6 and 9, for all j \leq m \in \mathbb{N}_{+} satisfying m = o(n^{1/(2a+4)}), hm^a\log n \lesssim 1 and h^4m^{2a+2c} \lesssim 1,

\Sigma_{n}^{-\frac{1}{2}}\left(\frac{\hat{\lambda}_{j}-\lambda_{j}}{\lambda_{j}}-\mathrm{K}_{2}h^{2}\int\phi_{j}^{(2)}(u)\phi_{j}(u)\mathrm{d}u+o(h^{2})\right)\stackrel{d}{\longrightarrow}\mathcal{N}(0,1),

where

\Sigma_{n}=\frac{4!\,P_{0}(N)}{n}\frac{\mathbb{E}(\xi_{j}^{4})-\lambda_{j}^{2}}{\lambda_{j}^{2}}+\frac{4!\,P_{1}(N)}{n}\frac{\int\{C(u,u)+\sigma_{X}^{2}\}\phi_{j}^{2}(u)/f(u)\,\mathrm{d}u}{\lambda_{j}}+\frac{4P_{2}(N)}{n}\frac{1}{\lambda_{j}^{2}}\left(\left[\int\{C(u,u)+\sigma_{X}^{2}\}\frac{\phi_{j}^{2}(u)}{f(u)}\mathrm{d}u\right]^{2}-\iint C(u,v)^{2}\frac{\phi_{j}^{2}(u)\phi_{j}^{2}(v)}{f(u)f(v)}\mathrm{d}u\mathrm{d}v\right)

with \mathrm{K}_2 = \int u^2\mathrm{K}(u)\mathrm{d}u, f the design density in Assumption 6, and

P_{0}(N)=\frac{1}{n}\sum_{i=1}^{n}\frac{(N_{i}-2)(N_{i}-3)}{N_{i}(N_{i}-1)},\quad P_{1}(N)=\frac{1}{n}\sum_{i=1}^{n}\frac{N_{i}-2}{N_{i}(N_{i}-1)},\quad P_{2}(N)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{N_{i}(N_{i}-1)}.

Since \lambda_j approaches zero as j approaches infinity, we need to normalize the eigenvalues so that they can be compared on the same scale of variability. For a fixed j, (\hat{\lambda}_j-\lambda_j)/\lambda_j is \sqrt{n}-consistent when h is small. For diverging j, Corollary 4 presents three different types of asymptotic normality, depending on the magnitude of P_2(N).

Corollary 4.

Under the assumptions of Theorem 2,

  1. (a)

    If P_2(N)/\lambda_j^2 \rightarrow 0 and \sqrt{n}h^2\int\phi_j^{(2)}(u)\phi_j(u)\mathrm{d}u \rightarrow 0,

    \sqrt{n}\left(\frac{\hat{\lambda}_{j}-\lambda_{j}}{\lambda_{j}}\right)\stackrel{d}{\longrightarrow}\mathcal{N}\left(0,4!\,\frac{\mathbb{E}(\xi_{j}^{4})-\lambda_{j}^{2}}{\lambda_{j}^{2}}\right).

  2. (b)

    If P_2(N)/\lambda_j^2 \rightarrow C_3 for a positive constant C_3,

    \sqrt{n}\left(\frac{\hat{\lambda}_{j}-\lambda_{j}}{\lambda_{j}}-\mathrm{K}_{2}h^{2}\int\phi_{j}^{(2)}(u)\phi_{j}(u)\mathrm{d}u\right)\stackrel{d}{\longrightarrow}\mathcal{N}\Bigg(0,\ 4!\,\frac{\mathbb{E}(\xi_{j}^{4})-\lambda_{j}^{2}}{\lambda_{j}^{2}}+4!\sqrt{C_{3}}\int\{C(u,u)+\sigma_{X}^{2}\}\frac{\phi_{j}^{2}(u)}{f(u)}\mathrm{d}u+4C_{3}\left\{\left[\int\{C(u,u)+\sigma_{X}^{2}\}\frac{\phi_{j}^{2}(u)}{f(u)}\mathrm{d}u\right]^{2}-\iint C(u,v)^{2}\frac{\phi_{j}^{2}(u)\phi_{j}^{2}(v)}{f(u)f(v)}\mathrm{d}u\mathrm{d}v\right\}\Bigg).

  3. (c)

    If P_2(N)/\lambda_j^2 \rightarrow \infty,

    \sqrt{\frac{n\lambda_{j}^{2}}{P_{2}(N)}}\left(\frac{\hat{\lambda}_{j}-\lambda_{j}}{\lambda_{j}}-\mathrm{K}_{2}h^{2}\int\phi_{j}^{(2)}(u)\phi_{j}(u)\mathrm{d}u\right)\stackrel{d}{\longrightarrow}\mathcal{N}\left(0,\ 4\left\{\left[\int\{C(u,u)+\sigma_{X}^{2}\}\frac{\phi_{j}^{2}(u)}{f(u)}\mathrm{d}u\right]^{2}-\iint C(u,v)^{2}\frac{\phi_{j}^{2}(u)\phi_{j}^{2}(v)}{f(u)f(v)}\mathrm{d}u\mathrm{d}v\right\}\right).

Compared to the mean and covariance estimators, which are associated with one-dimensional and two-dimensional kernel smoothing rates respectively, the estimator of the eigenvalues exhibits a degenerate rate after being integrated twice. This implies that there is no trade-off between bias and variance in the bandwidth h, and the estimation bias can be considered negligible for small values of h. In this scenario, the phase transition is entirely determined by the relationship between P_2(N) and \lambda_j. Specifically, in the dense and ultra-dense cases where P_2(N)/\lambda_j^2 < \infty, the variance terms resulting from the discrete approximation are dominated by 1/\sqrt{n}, corresponding to cases (a) and (b) of Corollary 4. On the other hand, when each N_i is relatively small, the estimation variance \sqrt{n\lambda_j^2/P_2(N)} arising from discrete observations dominates, as outlined in case (c) of Corollary 4. The phase transition point j^a is the same as for the eigenfunctions in Corollaries 1 and 7, and for larger values of j, more observations are needed due to the vanishing eigengap.
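To illustrate how Theorem 2 and Corollary 4 might be used in practice, the following Python sketch (our own illustration; the plug-in estimate xi4_hat is assumed to come from estimated scores) computes P_0(N), P_1(N), P_2(N) and a normal confidence interval for \lambda_j in the regime of case (a), where the smoothing bias is negligible. Comparing P_2(N) with \hat{\lambda}_j^2 indicates which regime of Corollary 4 applies.

```python
import math
import numpy as np
from scipy.stats import norm

def P_moments(N):
    """P_0(N), P_1(N), P_2(N) from Theorem 2, for subject counts N_1, ..., N_n."""
    N = np.asarray(N, dtype=float)
    P0 = np.mean((N - 2) * (N - 3) / (N * (N - 1)))
    P1 = np.mean((N - 2) / (N * (N - 1)))
    P2 = np.mean(1.0 / (N * (N - 1)))
    return P0, P1, P2

def lambda_ci(lam_hat, xi4_hat, n, level=0.95):
    """Normal CI for lambda_j under Corollary 4(a): relative variance
    4!{E(xi_j^4) - lambda_j^2}/lambda_j^2, estimated by plug-in."""
    rel_var = math.factorial(4) * (xi4_hat - lam_hat**2) / lam_hat**2
    z = norm.ppf(0.5 + level / 2.0)
    half = z * lam_hat * math.sqrt(rel_var / n)
    return lam_hat - half, lam_hat + half
```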

Using similar arguments, we establish the \mathcal{L}^{2} convergence rate for the eigenvalues, which is useful for analyzing models involving functional principal component analysis, such as the plug-in estimator in Hall and Horowitz (2007).

Corollary 5.

Under Assumptions 1 to 4 and 6, for all j \leq m \in \mathbb{N}_{+} satisfying m = o(n^{1/(2a+4)}), hm^a\log n \lesssim 1 and h^4m^{2a+2c} \lesssim 1,

(\hat{\lambda}_{j}-\lambda_{j})^{2}=O_{P}\left(\frac{j^{-2a}}{n}\left\{P_{0}(N)+j^{2a}P_{2}(N)\right\}+h^{4}j^{2c-2a}\right).

5 Uniform convergence of covariance and eigenfunction estimates

Classical nonparametric regression with independent observations has yielded numerous results for the uniform convergence of kernel-type estimators (Bickel and Rosenblatt, 1973; Hardle, Janssen and Serfling, 1988; Härdle, 1989). For functional data with in-curve dependence, Yao, Müller and Wang (2005a) obtained a uniform bound for mean and covariance pooling estimates. More recently, Li and Hsing (2010) and Zhang and Wang (2016) have studied the strong uniform convergence of these estimators, showing that these rates depend on both the sample size and the number of observations per subject. However, uniform results for estimated eigenfunctions with diverging indices have not been obtained, even in the fully observed case.

Even for the covariance estimates, there remains a theoretical challenge in achieving uniform convergence for the covariance function under the dense and ultra-dense schemes. Specifically, to obtain uniform convergence with in-curve dependence in functional data analysis, a common approach involves employing the Bernstein inequality to obtain a uniform bound over a finite grid of the observation domain. This grid becomes increasingly dense as the sample size grows. The goal is then to demonstrate that the bound over the finite grid and the bound over [0,1] are asymptotically equivalent. This technique has been used by Li and Hsing (2010) and Zhang and Wang (2016) to achieve uniform convergence for mean and covariance estimators based on local linear smoothers. However, due to the lack of compactness in functional data, truncation of the observed data is necessary to apply the Bernstein inequality; that is, one works with X_{ij}\mathds{1}_{\{X_{ij}\leq A_n\}}, where A_n \rightarrow \infty as n \rightarrow \infty. The choice of the truncation sequence A_n should balance the trade-off between the estimation variance appearing in the Bernstein inequality and the bias resulting from truncation. Once the optimal A_n is chosen, it is essential to impose additional moment conditions on both X_i and \varepsilon_i to ensure that the truncation bias is negligible. For covariance estimation, the current state-of-the-art results (Zhang and Wang, 2016) require assuming that the 6/(1-\tau)th moments of X_i and \varepsilon_i are finite, where \tau is the sampling rate with \bar{N}_2 \asymp n^{\tau/4} for \tau \in [0,1). However, when \bar{N}_2 is close to n^{1/4}, the value of 6/(1-\tau) tends to infinity. Moreover, in the dense and ultra-dense sampling schemes where \bar{N}_2 \gtrsim n^{1/4}, the discrepancy between the truncated and the original estimators becomes dominant, preventing the achievement of optimal convergence rates with the current methods.

We first resolve this issue for the covariance estimates. We propose an additional truncation on the summation of random quantities after truncating X_{ij} at A_n, which achieves a sharp bound by applying the Bernstein inequality twice and allows for a larger A_n to reduce the first truncation bias. This novel double truncation technique enables one to obtain a unified result for all sampling schemes and to address the limitations of the existing literature.
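Schematically, and in our own illustrative notation (the thresholds A_n, B_n and the per-subject sums S_i(s,t) below are not the precise objects used in the proofs), the two truncation levels can be written as

X_{ij}=X_{ij}\mathds{1}_{\{|X_{ij}|\leq A_{n}\}}+X_{ij}\mathds{1}_{\{|X_{ij}|>A_{n}\}},\qquad S_{i}(s,t)=S_{i}(s,t)\mathds{1}_{\{|S_{i}(s,t)|\leq B_{n}\}}+S_{i}(s,t)\mathds{1}_{\{|S_{i}(s,t)|>B_{n}\}},

where S_i(s,t) denotes subject i’s weighted contribution to R_{pq}(s,t) after the first truncation; the Bernstein inequality is applied to the doubly truncated sum and again to control the exceedance events, so that A_n can be chosen larger and the first-level truncation bias reduced.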

Theorem 3.

Under Assumptions 1, 5 and 6,

  1. (a)

    For p, q = 0, 1, denote

    R_{pq}(s,t)=\sum_{i=1}^{n}v_{i}\sum_{l_{1}\neq l_{2}}^{N_{i}}\frac{1}{h^{2}}\mathrm{K}\left(\frac{t_{il_{1}}-s}{h}\right)\mathrm{K}\left(\frac{t_{il_{2}}-t}{h}\right)\left(\frac{t_{il_{1}}-s}{h}\right)^{p}\left(\frac{t_{il_{2}}-t}{h}\right)^{q}\delta_{il_{1}l_{2}}.

    Then

    \sup_{s,t\in[0,1]}|R_{pq}(s,t)-\mathbb{E}R_{pq}(s,t)|=O\left(\sqrt{\frac{\ln n}{n}}\left(1+\frac{1}{\bar{N}_{2}h}\right)+\left|\frac{\ln n}{n}\right|^{1-\frac{1}{\alpha}}\left|1+\frac{\ln n}{\bar{N}_{2}h}\right|^{2-\frac{2}{\alpha}}h^{-\frac{2}{\alpha}}\right)\ \text{a.s.} (11)
  2. (b)

    In addition, if Assumption 2 holds and \alpha > 3,

    • (1).

      When \bar{N}_2/(n/\log n)^{1/4} \rightarrow 0 and h \asymp (n\bar{N}_2^2/\log n)^{-1/6},

      \sup_{s,t\in[0,1]}|\hat{C}(s,t)-{C}(s,t)|\lesssim\left(\frac{\log n}{n\bar{N}_{2}^{2}}\right)^{\frac{1}{3}}\quad a.s.

    • (2).

      When \bar{N}_2/(n/\log n)^{1/4} \rightarrow C_1 for a positive constant C_1 and h \asymp (n/\log n)^{-1/4},

      \sup_{s,t\in[0,1]}|\hat{C}(s,t)-{C}(s,t)|\lesssim\sqrt{\frac{\log n}{n}}\quad a.s.

    • (3).

      When \bar{N}_2/(n/\log n)^{1/4} \rightarrow \infty, h = o\{(n/\log n)^{-1/4}\} and \bar{N}_2h \rightarrow \infty,

      \sup_{s,t\in[0,1]}|\hat{C}(s,t)-{C}(s,t)|\lesssim\sqrt{\frac{\log n}{n}}\quad a.s.

The first statement of Theorem 3 establishes the uniform convergence rate of the variance term of \hat{C}-C for all sampling rates \bar{N}_2. It is worth comparing our results with those in Li and Hsing (2010) and Zhang and Wang (2016). The bias term caused by double truncation, which appears in the second term on the right-hand side of (11), is smaller than those obtained in Li and Hsing (2010) and Zhang and Wang (2016). As a result, the second statement of Theorem 3 shows that the truncation bias is dominated by the main term if \alpha > 3, a condition that is much milder than the moment assumptions in Li and Hsing (2010) and Zhang and Wang (2016), thereby establishing the uniform convergence rate for both sparse and dense functional data. Moreover, when 1 < \alpha \leq 3, which corresponds to the case where X(t) or \varepsilon does not possess a finite sixth moment, the additional term caused by truncation becomes dominant. In summary, by introducing the double truncation technique, we resolve the aforementioned issues present in the original proofs of Li and Hsing (2010) and Zhang and Wang (2016), achieve the uniform convergence rate for the covariance estimator across all sampling schemes, and show that the optimal rates for dense functional data can be obtained as a special case. Having set the stage, we arrive at the following result that gives the uniform convergence for estimated eigenfunctions with diverging indices.

Theorem 4.

Under Assumptions 1 to 5, for all j \leq m satisfying m^{2a+2}/n = o(1), m^{2a+2}/(n\bar{N}_2^2h^2) = o(1), h^4m^{2a+2c} \lesssim 1 and hm^a\log n \lesssim 1,

\|\hat{\phi}_{j}-\phi_{j}\|_{\infty}=O\left(\frac{j}{\sqrt{n}}(\sqrt{\ln n}+\ln j)\left\{1+\frac{j^{a}}{\bar{N}_{2}}+\sqrt{\frac{j^{a-1}}{\bar{N}_{2}h}}\left(1+\sqrt{\frac{j^{a}}{\bar{N}_{2}}}\right)\right\}+j^{a}\left|\frac{\ln n}{n}\right|^{1-\frac{1}{\alpha}}\left|j^{1/2}+\frac{\ln n}{\bar{N}_{2}h}\right|^{1-\frac{1}{\alpha}}h^{-\frac{1}{\alpha}}+h^{2}j^{c+1}\log j\right). (12)

The contributions of Theorem 4 are two-fold. First, our result is the first to establish uniform convergence for eigenfunctions with diverging indices, providing a useful tool for the theoretical analysis of models involving FPCA and inverse issues. Second, the double truncation technique we introduced reduces the truncation bias, making our results applicable to all sampling schemes. For a fixed j, Corollary 6 below discusses the uniform convergence rate under different ranges of \bar{N}_2. When \alpha > 5/2, the truncation bias in Equation (12) is dominated by the first two terms for all scenarios of \bar{N}_2. If \bar{N}_2/(n/\log n)^{1/4} \rightarrow 0, \|\hat{\phi}_j-\phi_j\|_{\infty} admits a typical one-dimensional kernel smoothing rate that differs only by a \log n factor, consistent with the results in Hall, Müller and Wang (2006) and Li and Hsing (2010). In summary, the introduction of the double truncation technique facilitates the attainment of uniform convergence across all sampling schemes within functional data analysis. This advancement prompts more in-depth research, particularly in the analysis of non-compact data exhibiting in-curve dependence.

Corollary 6.

Under the assumptions of Theorem 4, if $j$ is fixed and $\alpha>5/2$:

  • (1).

    When $\bar{N}_2/(n/\log n)^{1/4}\rightarrow 0$ and $h\asymp(n\bar{N}_2/\log n)^{-1/5}$,

    \sup_{t\in[0,1]}|\hat{\phi}_{j}(t)-\phi_{j}(t)|=O\left(\left(\frac{\log n}{n\bar{N}_{2}}\right)^{\frac{2}{5}}\right)\quad\text{a.s.}
  • (2).

    When $\bar{N}_2/(n/\log n)^{1/4}\rightarrow C_2>0$ and $h\asymp(n/\log n)^{-1/4}$,

    \sup_{t\in[0,1]}|\hat{\phi}_{j}(t)-\phi_{j}(t)|=O\left(\sqrt{\frac{\log n}{n}}\right)\quad\text{a.s.}
  • (3).

    When $\bar{N}_2/(n/\log n)^{1/4}\rightarrow\infty$, $h=o\{(n/\log n)^{-1/4}\}$ and $\bar{N}_2h\rightarrow\infty$,

    \sup_{t\in[0,1]}|\hat{\phi}_{j}(t)-\phi_{j}(t)|=O\left(\sqrt{\frac{\log n}{n}}\right)\quad\text{a.s.}

The following corollary establishes the optimal uniform convergence rate for eigenfunctions with diverging indices under mild assumptions, a finding that is new even in the fully observed scenario. In contrast to the $\mathcal{L}^2$ convergence, the maximum number of eigenfunctions that can be well estimated under the $\|\cdot\|_\infty$ norm is slightly smaller and depends on the moment order characterized by $\alpha$. If $\alpha>5/2$, then $m_{\max}$ can increase as the sample size $n$ goes to infinity, and the phase transition point $\bar{N}_2\geq m_{\max}^a$ aligns with the $\mathcal{L}^2$ case.

Corollary 7.

Under the assumptions of Theorems 4 and 5, given $\alpha>5/2$, denote $m_{\max}=\min\{n^{\frac{1}{2a}},n^{\frac{\alpha-5/2}{2\alpha a-\alpha-1}}\}$. If $\bar{N}_2\geq m_{\max}^a$, $h^4m_{\max}^{2c}\leq n^{-1}$ and $\bar{N}_2h\geq m_{\max}$, then for all $j\leq m_{\max}$,

\sup_{t\in[0,1]}|\hat{\phi}_{j}(t)-\phi_{j}(t)|=O\left(\frac{j}{\sqrt{n}}(\sqrt{\ln n}+\ln j)\right)\quad\text{a.s.}

6 Numerical experiment

In this section, we carry out a numerical evaluation of the convergence rates of eigenfunctions. The underlying trajectories are generated as $X_i(t)=\sum_{j=1}^{50}\xi_{ij}\phi_j(t)$ for $i=1,\ldots,n$. The principal component scores are independently generated as $\xi_{ij}\sim N(0,j^{-2})$ for all $i$ and $j$. We define the eigenfunctions as $\phi_1(t)\equiv 1$ and, for $j\geq 2$, $\phi_j(t)=\sqrt{2}\cos(j\pi t)$. The actual observations are $X_{ij}=X_i(t_{ij})+\epsilon_{ij}$, with noise $\epsilon_{ij}$ following $\mathcal{N}(0,0.1^2)$ and time points $t_{ij}$ sampled from the uniform distribution $\mathrm{Unif}[0,1]$, for $j=1,\ldots,N$. Each setting is repeated for 200 Monte Carlo runs to mitigate the randomness of a single simulation.

When the phase transition occurs, our theory indicates a proportional relationship such that $\log(\|\hat{\phi}_j-\phi_j\|^2)\propto 2\log(j)$ for each fixed $n$ and $\log(\|\hat{\phi}_j-\phi_j\|^2)\propto-\log(n)$ for each fixed $j$. Figure 1 illustrates this phenomenon by showing that, as $N$ increases, the relationship between $\log(\|\hat{\phi}_j-\phi_j\|^2)$ and $\log(j)$ tends to be linear with a slope of $2$, indicating that the phase transition might occur around $N=50$. Figure 1 additionally offers a practical way of identifying the phase transition point of $N$, which is valuable in guiding both data collection and experimental design, contributing to more cost-effective data collection strategies. Similarly, Figure 2 shows that as $N$ increases, the relationship between $\log(\|\hat{\phi}_j-\phi_j\|^2)$ and $\log(n)$ also tends to follow a linear trend, but with a slope of $-1$.

Figure 1: Plot of $\log(\|\hat{\phi}_j-\phi_j\|^2)$ over the logarithm of the eigenfunction index. The sample size is $n=240$. The colored dashed lines correspond to different values of $N$. The solid red line represents the theoretical optimal value.
Figure 2: Plot of $\log(\|\hat{\phi}_j-\phi_j\|^2)$ for the first (left), second (middle) and third (right) eigenfunctions over $\log(n)$. The colored dashed lines correspond to different values of $N$. The solid red line represents the theoretical optimal value.

7 Conclusion and discussion

In this paper, we focus on the convergence rates of eigenfunctions with diverging indices for discretely observed functional data. We propose new techniques to handle the perturbation series and establish sharp bounds for eigenvalues and eigenfunctions across different convergence types. Additionally, we extend the “dense” and “sparse” partition, originally defined for mean and covariance estimation, to principal component analysis. Another notable contribution of this paper is the double truncation technique for handling uniform convergence. Existing results on uniform convergence for covariance estimation require a strong moment condition on $X(t)$ and are only applicable to sparse functional data where $N_i=o_p(n^{1/4})$. By employing the double truncation technique proposed in this paper, we establish an improved bound for the truncation bias, which ensures the uniform convergence of the covariance and eigenfunctions across all sampling schemes under mild conditions. These asymptotic properties play a direct role in various types of statistical inference involving functional data (Yao, Müller and Wang, 2005a; Li and Hsing, 2010).

Furthermore, the optimal rate achieved in this paper holds significant implications for downstream analysis. Since most functional regression models encounter inverse problems due to the infinite dimensionality of functional covariates, the convergence rates in this paper help extend existing theoretical findings in downstream analyses from fully observed functional data to various sampling designs. Consider the functional linear model $\mathbb{E}[Y_i|X_i]=\int\beta X_i$ with $\beta=\sum_{k=1}^{\infty}k^{-b}\phi_k$. Without a sharp bound for eigenfunctions, achieving optimal convergence for the standard and efficient plug-in estimator becomes challenging. Therefore, methods like approximated least squares and sample splitting, as discussed in Zhou, Yao and Zhang (2023), are necessary in the modeling phase. These complex methods require estimating principal component scores and do not efficiently utilize the information gained from pooling. As a result, the phase transition for the functional linear model obtained by Zhou, Yao and Zhang (2023) is $\max\{n^{\frac{2a+2}{a+2b}},n^{\frac{a+b}{a+2b}}\}$, which is significantly greater than $n^{1/2}$. In contrast, by using the new results in this paper, one can directly apply the plug-in estimator from Hall and Horowitz (2007) and achieve a phase transition of $n^{\frac{1}{4}+\frac{3a-2b}{4(a+2b)}}$ for the functional linear model. Additionally, for complex regression models like the functional generalized linear model or the functional Cox model, the methods developed in this paper could serve as a cornerstone for further exploration.

Proof of Theorem 1

Recall that $\hat{C}(s,t)=\hat{\beta}_0$ in the optimization problem (4), and $\hat{\beta}_0$ has the analytical form

\hat{\beta}_{0}(s,t)=\frac{R_{00}(s,t)I_{1}(s,t)+R_{10}(s,t)I_{2}(s,t)+R_{01}(s,t)I_{3}(s,t)}{S_{00}(s,t)I_{1}(s,t)+S_{10}(s,t)I_{2}(s,t)+S_{01}(s,t)I_{3}(s,t)},

where

S_{pq}(s,t)=\sum_{i=1}^{n}v_{i}\sum_{l_{1}\neq l_{2}}^{N_{i}}\frac{1}{h^{2}}\mathrm{K}\left(\frac{t_{il_{1}}-s}{h}\right)\mathrm{K}\left(\frac{t_{il_{2}}-t}{h}\right)\left(\frac{t_{il_{1}}-s}{h}\right)^{p}\left(\frac{t_{il_{2}}-t}{h}\right)^{q}
R_{pq}(s,t)=\sum_{i=1}^{n}v_{i}\sum_{l_{1}\neq l_{2}}^{N_{i}}\frac{1}{h^{2}}\mathrm{K}\left(\frac{t_{il_{1}}-s}{h}\right)\mathrm{K}\left(\frac{t_{il_{2}}-t}{h}\right)\left(\frac{t_{il_{1}}-s}{h}\right)^{p}\left(\frac{t_{il_{2}}-t}{h}\right)^{q}\delta_{il_{1}l_{2}}

for $p,q=0,1,2$ and

I_{1}(s,t)=S_{02}(s,t)S_{20}(s,t)-S_{11}^{2}(s,t),\quad I_{2}(s,t)=S_{01}(s,t)S_{11}(s,t)-S_{02}(s,t)S_{10}(s,t),
I_{3}(s,t)=S_{10}(s,t)S_{11}(s,t)-S_{20}(s,t)S_{01}(s,t).

Some calculations show that

\hat{\beta}_{1}(s,t)=\frac{1}{h}\,\frac{R_{00}(s,t)I_{2}(s,t)+R_{10}(s,t)J_{1}(s,t)+R_{01}(s,t)J_{3}(s,t)}{S_{00}(s,t)I_{1}(s,t)+S_{10}(s,t)I_{2}(s,t)+S_{01}(s,t)I_{3}(s,t)}

and

\hat{\beta}_{2}(s,t)=\frac{1}{h}\,\frac{R_{00}(s,t)I_{3}(s,t)+R_{10}(s,t)J_{3}(s,t)+R_{01}(s,t)J_{2}(s,t)}{S_{00}(s,t)I_{1}(s,t)+S_{10}(s,t)I_{2}(s,t)+S_{01}(s,t)I_{3}(s,t)},

where

J_{1}(s,t)=S_{00}(s,t)S_{02}(s,t)-S_{01}^{2}(s,t),\quad J_{2}(s,t)=S_{00}(s,t)S_{20}(s,t)-S_{10}^{2}(s,t),
J_{3}(s,t)=S_{10}(s,t)S_{01}(s,t)-S_{00}(s,t)S_{11}(s,t).

In the following, we omit the arguments $(s,t)$ in the functions $\hat{\beta}_1$, $\hat{\beta}_2$, $S_{pq}$, $R_{pq}$, $I_r$ and $J_r$ for $p,q=0,1,2$ and $r=1,2,3$ when there is no ambiguity. Simple calculations show that $\hat{C}(s,t)=(R_{00}-h\hat{\beta}_1S_{10}-h\hat{\beta}_2S_{01})/S_{00}$. We further denote

\tilde{C}_{0}(s,t)=\left\{R_{00}-C(s,t)S_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right\}\Big/\,f(s)f(t),

where $f(s)$ is the density function of $t_{ij}$ in the case of the random design, and $f(s)=1_{\{0\leq s\leq 1\}}$ in the case of the fixed regular design. The subsequent proof is structured into three steps for clarity.

Step 1: discrepancy between $\hat{C}(s,t)$ and $\tilde{C}_0$. The following lemma bounds the discrepancy between $\hat{C}(s,t)-C(s,t)$ and $\tilde{C}_0(s,t)$; its proof can be found in Section S3 of the supplement.

Lemma 1.

Under Assumptions 1, 2 and 5:

  • (a)

    For the random design, if Assumption 6 holds,

    \left\|\hat{C}(s,t)-C(s,t)-\tilde{C}_{0}(s,t)\right\|_{\mathrm{HS}}^{2}=O_{P}\left(h^{6}+\frac{h^{2}}{n}\left\{1+\frac{1}{\bar{N}^{2}_{2}h^{2}}\right\}\right). (13)
  • (b)

    For the fixed design, if Assumptions 7 and 8 hold,

    \left\|\hat{C}(s,t)-C(s,t)-\tilde{C}_{0}(s,t)\right\|_{\mathrm{HS}}^{2}=O_{P}\left(h^{6}+\frac{h^{2}}{n}\left\{1+\frac{1}{N^{2}h^{2}}\right\}\right). (14)

Step 2: bound the projection $\mathbb{E}([\iint\tilde{C}_0(s,t)\phi_j(s)\phi_k(t)\,\mathrm{d}s\,\mathrm{d}t]^2)$. In this part, we prove the following lemma, which provides expectation bounds for the projections of $\tilde{C}_0(s,t)$ with respect to $\phi_j$ and $\phi_k$.

Lemma 2.

Under Assumptions 1 to 4, $h^4j^{2a+2c}\lesssim 1$ and $hj^a\log n\lesssim 1$:

  • (a)

    If Assumption 6 holds, for all $1\leq k\leq 2j$,

    \mathbb{E}\left[\left\{\iint\tilde{C}_{0}(s,t)\phi_{j}(s)\phi_{k}(t)\mathrm{d}s\mathrm{d}t\right\}^{2}\right]\lesssim\frac{1}{n}\left(j^{-a}k^{-a}+\frac{j^{-a}+k^{-a}}{\bar{N}_{2}}+\frac{1}{\bar{N}^{2}_{2}}\right)+h^{4}k^{2c-2a}

    and

    \sum_{k=j+1}^{\infty}\mathbb{E}\left[\left\{\iint\tilde{C}_{0}(s,t)\phi_{j}(s)\phi_{k}(t)\mathrm{d}s\mathrm{d}t\right\}^{2}\right]\lesssim\frac{1}{n}\left(j^{1-2a}+\frac{h^{-1}j^{-a}+j^{1-a}}{\bar{N}_{2}}+\frac{1}{h\bar{N}^{2}_{2}}\right)+h^{4}j^{1+2c-2a}.
  • (b)

    If Assumptions 7 and 8 hold, for all $1\leq k\leq 2j$,

    \mathbb{E}\left[\left\{\iint\tilde{C}_{0}(s,t)\phi_{j}(s)\phi_{k}(t)\mathrm{d}s\mathrm{d}t\right\}^{2}\right]\lesssim\frac{1}{n}\left(j^{-a}k^{-a}+\frac{j^{-a}+k^{-a}}{N}+\frac{1}{N^{2}}\right)+h^{4}k^{2c-2a}

    and

    \sum_{k=j+1}^{\infty}\mathbb{E}\left[\left\{\iint\tilde{C}_{0}(s,t)\phi_{j}(s)\phi_{k}(t)\mathrm{d}s\mathrm{d}t\right\}^{2}\right]\lesssim\frac{1}{n}\left(j^{1-2a}+\frac{h^{-1}j^{-a}+j^{1-a}}{N}+\frac{1}{hN^{2}}\right)+h^{4}j^{1+2c-2a}.
Proof of Lemma 2.

We focus on the first statement of Lemma 2; the proof for the fixed regular design is similar and is deferred to the supplement. By the definition of $\tilde{C}_0(s,t)$, we need to bound the bias and variance terms of

\iint\left\{R_{00}-C(s,t)S_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right\}\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t. (15)

For the random design case, by calculations analogous to those in the proof of Theorem 3.2 in Zhang and Wang (2016), one has

\mathbb{E}\left[R_{00}-C(s,t)S_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right]
=\frac{h^{2}}{2}\mathrm{K}_{2}\frac{\partial^{2}C}{\partial s^{2}}(s,t)f(s)f(t)+\frac{h^{2}}{2}\mathrm{K}_{2}\frac{\partial^{2}C}{\partial t^{2}}(s,t)f(s)f(t)+O(h^{3}),

where $\mathrm{K}_2=\int u^2\mathrm{K}(u)\mathrm{d}u$. Thus, for the bias part of equation (15), for all $k\leq 2j$,

\left(\mathbb{E}\iint\left\{R_{00}-C(s,t)S_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right\}\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right)^{2} (16)
=\frac{h^{4}}{2}\mathrm{K}_{2}^{2}\left[\iint\sum_{r=1}^{\infty}\lambda_{r}\phi_{r}(s)\phi_{r}^{(2)}(t)\phi_{j}(s)\phi_{k}(t)\mathrm{d}s\mathrm{d}t\right]^{2}+o(j^{-2a}h^{4})
\leq\frac{h^{4}}{2}\mathrm{K}_{2}^{2}\lambda_{j}^{2}\|\phi_{j}\|^{2}_{\infty}+o(h^{4})=O(h^{4}k^{-2a}j^{2c}).

For the tail summation, similarly

\sum_{k\geq j}\left(\mathbb{E}\iint\left\{R_{00}-CS_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right\}\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right)^{2} (17)
\leq\frac{h^{4}}{2}\mathrm{K}_{2}^{2}\left\|\int\frac{\partial^{2}C}{\partial t^{2}}(t,s)\phi_{j}(s)\mathrm{d}s\right\|^{2}+o(j^{-2a}h^{4})
=O(h^{4}j^{1+2c-2a}).

For the variance of equation (15), note that

\mathbb{V}\mathrm{ar}\left(\iint\left\{R_{00}-CS_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right\}\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right) (18)
\leq\mathbb{E}\left[\left\{\iint R_{00}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right]+\mathbb{E}\left[\left\{\iint C(s,t)S_{00}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right]
+h^{2}\mathbb{E}\left[\left\{\iint\frac{\partial C}{\partial s}(s,t)S_{10}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right]
+h^{2}\mathbb{E}\left[\left\{\iint\frac{\partial C}{\partial t}(s,t)S_{01}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right].

We start with the first term on the right-hand side of equation (18). To simplify notation, we introduce the operator

\mathcal{T}_{h}f(x)=\frac{1}{h}\int\mathrm{K}\left(\frac{x-y}{h}\right)f(y)\mathrm{d}y.

With this notation,

\mathbb{E}\left[\left\{\iint R_{00}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right]=\mathbb{E}\left[\left\{\sum_{i=1}^{n}v_{i}\sum_{l_{1}\neq l_{2}}^{N_{i}}\delta_{il_{1}l_{2}}\mathcal{T}_{h}\frac{\phi_{j}}{f}(t_{il_{1}})\mathcal{T}_{h}\frac{\phi_{k}}{f}(t_{il_{2}})\right\}^{2}\right]
=\sum_{i=1}^{n}v_{i}^{2}\left\{4!\binom{N_{i}}{4}A_{i1}+3!\binom{N_{i}}{3}A_{i2}+2!\binom{N_{i}}{2}A_{i3}\right\}

with

A_{i1}=\mathbb{E}\left(\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{2}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}\right),
A_{i2}=2\mathbb{E}\left\{\left(\left\langle X_{i}f\mathcal{T}_{h}\frac{\phi_{j}}{f},X_{i}\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle+\sigma_{X}^{2}\left\langle\mathcal{T}_{h}\frac{\phi_{j}}{f},f\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle\right)\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle\right\}
+\mathbb{E}\left\{\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{2}\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{k}}{f}\right)^{2}\right\rangle\right\}
+\mathbb{E}\left\{\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{j}}{f}\right)^{2}\right\rangle\right\}
:=A_{i21}+A_{i22}+A_{i23},
A_{i3}=\mathbb{E}\left\{\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{j}}{f}\right)^{2}\right\rangle\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{k}}{f}\right)^{2}\right\rangle\right\}
+\mathbb{E}\left(\left\langle X_{i}\mathcal{T}_{h}\frac{\phi_{j}}{f},X_{i}f\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle+\sigma_{X}^{2}\left\langle\mathcal{T}_{h}\frac{\phi_{j}}{f},f\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle\right)^{2}
:=A_{i31}+A_{i32}.

By the Cauchy–Schwarz and AM–GM inequalities,

A_{i21}\leq\mathbb{E}\left\{\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{2}\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{k}}{f}\right)^{2}\right\rangle\right\}+\mathbb{E}\left\{\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{j}}{f}\right)^{2}\right\rangle\right\}=A_{i22}+A_{i23}.

Similarly $A_{i32}\leq A_{i31}$; thus $A_{i2}\leq 2\{A_{i22}+A_{i23}\}$ and $A_{i3}\leq 2A_{i31}$. Combining all of the above,

\mathbb{E}\left[\left\{\iint R_{00}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right]\lesssim\sum_{i=1}^{n}v_{i}^{2}\left\{4!\binom{N_{i}}{4}A_{i1}+3!\binom{N_{i}}{3}(A_{i22}+A_{i23})+2!\binom{N_{i}}{2}A_{i31}\right\}. (19)

The following lemma assists in bounding $A_{i1}$, $A_{i22}$, $A_{i23}$ and $A_{i31}$; its proof can be found in the supplement.

Lemma 3.

Under Assumptions 1 to 4 and $h^4j^{2c+2a}\lesssim 1$, $hj^a\log n\lesssim 1$, we have $\mathbb{E}(\langle X_if,\mathcal{T}_h\frac{\phi_k}{f}\rangle^4)\lesssim k^{-2a}$ for $1\leq k\leq 2j$ and

\mathbb{E}\left(\sum_{k>j}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}\right)^{2}\lesssim j^{2-2a}.

By Lemma 3 and the Cauchy–Schwarz inequality,

A_{i1}\leq\left(\mathbb{E}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{4}\mathbb{E}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{4}\right)^{1/2}\lesssim j^{-a}k^{-a}\quad\text{ and } (20)
\sum_{k>j}A_{i1}\leq\left(\mathbb{E}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{4}\mathbb{E}\left(\sum_{k>j}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}\right)^{2}\right)^{1/2}\lesssim j^{1-2a}.

For $A_{i22}$, by Lemma 3 and the Cauchy–Schwarz inequality,

A_{i22}\leq\left\|\frac{\phi_{k}}{f}\right\|_{\infty}^{2}\|f\|_{\infty}\left\{\mathbb{E}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{4}\mathbb{E}(\|X_{i}\|^{2}+\sigma_{X}^{2})^{2}\right\}^{1/2}\lesssim j^{-a}. (21)

For the summation $\sum_{k>j}A_{i22}$, note that, by Bessel's inequality,

\sum_{k=1}^{\infty}\left|\mathcal{T}_{h}\frac{\phi_{k}}{f}(x)\right|^{2}\leq\frac{1}{h^{2}}\int\left|\mathrm{K}\left(\frac{x-y}{h}\right)\right|^{2}\frac{1}{f^{2}(y)}\mathrm{d}y\lesssim h^{-1}\quad\text{ for all }x\in[0,1].

Thus,

\sum_{k=j}^{\infty}A_{i22}=\mathbb{E}\left\{\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{2}\sum_{k=j}^{\infty}\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{k}}{f}\right)^{2}\right\rangle\right\} (22)
\leq\|f\|_{\infty}\sum_{k=1}^{\infty}\left\|\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\|^{2}\mathbb{E}\left\{\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{2}(\|X_{i}\|^{2}+\sigma_{X}^{2})\right\}
\lesssim h^{-1}\left(\mathbb{E}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\rangle^{4}\right)^{1/2}\lesssim h^{-1}j^{-a}.

For $A_{i23}$, by Lemma 3,

A_{i23}\leq\left\|\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\|_{\infty}^{2}\|f\|_{\infty}\mathbb{E}\left\{\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}(\|X_{i}\|^{2}+\sigma_{X}^{2})\right\} (23)
\lesssim\left(\mathbb{E}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{4}\right)^{1/2}\lesssim k^{-a},\quad\forall\ 1\leq k\leq 2j,

and

\sum_{k>j}A_{i23}\leq\left\|\mathcal{T}_{h}\frac{\phi_{j}}{f}\right\|_{\infty}^{2}\|f\|_{\infty}\mathbb{E}\left[\sum_{k>j}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}(\|X_{i}\|^{2}+\sigma_{X}^{2})\right] (24)
\lesssim\left\{\mathbb{E}\left(\sum_{k>j}\left\langle X_{i}f,\mathcal{T}_{h}\frac{\phi_{k}}{f}\right\rangle^{2}\right)^{2}\mathbb{E}(\|X\|^{2}+\sigma_{X}^{2})^{2}\right\}^{\frac{1}{2}}\lesssim j^{1-a}.

For the last term $A_{i31}$, note that

A_{i31}=\mathbb{E}\left\{\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{j}}{f}\right)^{2}\right\rangle\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{k}}{f}\right)^{2}\right\rangle\right\}=O(1),

and

\sum_{k=j}^{\infty}A_{i31}\leq\mathbb{E}\left\{\left\langle(X_{i}^{2}+\sigma_{X}^{2})f,\left(\mathcal{T}_{h}\frac{\phi_{j}}{f}\right)^{2}\right\rangle(\|X_{i}\|^{2}\|f\|_{\infty}+\sigma_{X}^{2}\|f\|^{2})\frac{\|\mathrm{K}\|^{2}}{h}\right\} (25)
\lesssim h^{-1}\mathbb{E}(\|X_{i}\|^{2}\|f\|_{\infty}+\sigma_{X}^{2}\|f\|_{\infty})^{2}\lesssim h^{-1}.

Combining equations (20) to (25), for all $k\leq 2j$,

\mathbb{E}\left[\left\{\iint R_{00}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right]\lesssim\frac{1}{n}\left(j^{-a}k^{-a}+\frac{j^{-a}+k^{-a}}{\bar{N}_{2}}+\frac{1}{\bar{N}_{2}^{2}}\right)

and

\sum_{k>j}\mathbb{E}\left[\left\{\iint R_{00}(s,t)\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right\}^{2}\right]\lesssim\frac{1}{n}\left(j^{1-2a}+\frac{j^{-a}h^{-1}+j^{1-a}}{\bar{N}_{2}}+\frac{h^{-1}}{\bar{N}^{2}_{2}}\right).

By a similar analysis, the second term on the right-hand side of equation (18) has the same convergence rate as the first term. Under $h^4j^{2a+2c}\lesssim 1$ and $hj^a\log n\lesssim 1$, the last two terms on the right-hand side of equation (18) are dominated by the first two. The proof of the random design case is then completed by

\mathbb{E}\left(\iint\left\{R_{00}-C(s,t)S_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right\}\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right)^{2}
\lesssim\frac{1}{n}\left(j^{-a}k^{-a}+\frac{j^{-a}+k^{-a}}{\bar{N}_{2}}+\frac{1}{\bar{N}^{2}_{2}}\right)+h^{4}k^{-2a}j^{2c}

for all $k\leq 2j$ and

\sum_{k>j}\mathbb{E}\left(\iint\left\{R_{00}-C(s,t)S_{00}-h\frac{\partial C}{\partial s}(s,t)S_{10}-h\frac{\partial C}{\partial t}(s,t)S_{01}\right\}\frac{\phi_{j}(s)\phi_{k}(t)}{f(s)f(t)}\mathrm{d}s\mathrm{d}t\right)^{2}
\lesssim\frac{1}{n}\left(j^{1-2a}+\frac{j^{-a}h^{-1}+j^{1-a}}{\bar{N}_{2}}+\frac{h^{-1}}{\bar{N}^{2}_{2}}\right)+h^{4}j^{1-2a+2c}.

Step 3: perturbation series. By the proof of Theorem 5.1.8 in Hsing and Eubank (2015), for $j\leq m$ and on the set $\Omega_m(n,N,h)=\{\|\hat{C}-C\|_{\mathrm{HS}}\leq\eta_m/2\}$, we have the following expansion:

\hat{\phi}_{j}-\phi_{j}=\sum_{k\neq j}\frac{\int(\hat{C}-C)\phi_{j}\phi_{k}}{\lambda_{j}-\lambda_{k}}\phi_{k}+\sum_{k\neq j}\frac{\int(\hat{C}-C)(\hat{\phi}_{j}-\phi_{j})\phi_{k}}{\lambda_{j}-\lambda_{k}}\phi_{k} (26)
+\sum_{k\neq j}\sum_{s=1}^{\infty}\frac{(\lambda_{j}-\hat{\lambda}_{j})^{s}}{(\lambda_{j}-\lambda_{k})^{s+1}}\left\{\int(\hat{C}-C)\hat{\phi}_{j}\phi_{k}\right\}\phi_{k}+\left\{\int(\hat{\phi}_{j}-\phi_{j})\phi_{j}\right\}\phi_{j}.

This kind of expansion can also be found in Hall and Hosseini-Nasab (2006) and Li and Hsing (2010). Below, when we say that a bound is valid when $\Omega_m(n,N,h)$ holds, this should be interpreted as stating that the bound is valid for all realizations for which $\|\hat{C}-C\|_{\mathrm{HS}}\leq\eta_m/2$ (Hall and Horowitz, 2007). Under the assumptions $m^{2a+2}/n\rightarrow 0$, $m^{2a+2}/(n\bar{N}_2^2h^2)\rightarrow 0$ and $h^4\max\{m^{2a+2c},m^{4a}\log n\}\lesssim 1$, we have $\mathbb{P}(\Omega_m(n,N,h))\rightarrow 1$. Since $\lim_{D\rightarrow\infty}\limsup_{n\rightarrow\infty}P(\|\hat{\phi}_j-\phi_j\|>D\tau_n)=0$ implies $\|\hat{\phi}_j-\phi_j\|=O_P(\tau_n)$, the results in $O_P$ form that we want to prove relate only to probabilities of differences, and it suffices to work with bounds established on an event, such as $\Omega_m$, whose probability tends to one (Hall and Horowitz, 2007, Section 5.1).

We first show that $\mathbb{E}(\|\hat{\phi}_j-\phi_j\|^2)$ is dominated by the $\mathcal{L}^2$ norm of the first term on the right-hand side of equation (26). By Bessel's inequality, we see that

\mathbb{E}\left\|\sum_{k\neq j}\frac{\int(\hat{C}-C)(\hat{\phi}_{j}-\phi_{j})\phi_{k}}{\lambda_{j}-\lambda_{k}}\phi_{k}\right\|^{2}\leq\mathbb{E}\frac{\|\hat{C}-C\|_{\mathrm{HS}}^{2}\|\hat{\phi}_{j}-\phi_{j}\|^{2}}{(2\eta_{j})^{2}}<\frac{1}{16}\mathbb{E}\|\hat{\phi}_{j}-\phi_{j}\|^{2}, (27)

where the last inequality comes from the fact that $\eta_j^{-1}\|\hat{C}-C\|<1/2$ on $\Omega_m(n,N,h)$. Similarly,

\mathbb{E}\left\|\sum_{k\neq j}\sum_{s=1}^{\infty}\frac{(\lambda_{j}-\hat{\lambda}_{j})^{s}}{(\lambda_{j}-\lambda_{k})^{s+1}}\left\{\int(\hat{C}-C)\hat{\phi}_{j}\phi_{k}\right\}\phi_{k}\right\|^{2} (28)
=\mathbb{E}\sum_{k\neq j}\frac{(\lambda_{j}-\hat{\lambda}_{j})^{2}}{(\lambda_{j}-\lambda_{k})^{2}(\hat{\lambda}_{j}-\lambda_{k})^{2}}\left\{\int(\hat{C}-C)\hat{\phi}_{j}\phi_{k}\right\}^{2}
\leq 2\mathbb{E}\frac{\|\hat{C}-C\|_{\mathrm{HS}}^{2}}{(2\eta_{j}-\|\hat{C}-C\|_{\mathrm{HS}})^{2}}\left[\sum_{k\neq j}\frac{\left\{\int(\hat{C}-C)\phi_{j}\phi_{k}\right\}^{2}}{(\lambda_{j}-\lambda_{k})^{2}}+\sum_{k\neq j}\frac{\left\{\int(\hat{C}-C)(\hat{\phi}_{j}-\phi_{j})\phi_{k}\right\}^{2}}{(\lambda_{j}-\lambda_{k})^{2}}\right]
\leq\frac{8}{9}\mathbb{E}\left[\frac{\|\hat{C}-C\|_{\mathrm{HS}}^{2}}{\eta_{j}^{2}}\sum_{k\neq j}\frac{\left\{\int(\hat{C}-C)\phi_{j}\phi_{k}\right\}^{2}}{(\lambda_{j}-\lambda_{k})^{2}}+\frac{\|\hat{C}-C\|_{\mathrm{HS}}^{4}}{\eta_{j}^{4}}\|\hat{\phi}_{j}-\phi_{j}\|^{2}\right]
\leq\frac{2}{9}\mathbb{E}\sum_{k\neq j}\frac{\left\{\int(\hat{C}-C)\phi_{j}\phi_{k}\right\}^{2}}{(\lambda_{j}-\lambda_{k})^{2}}+\frac{1}{18}\mathbb{E}\|\hat{\phi}_{j}-\phi_{j}\|^{2}.

Combining (26) to (28) and the fact that $\|\{\int(\hat{\phi}_j-\phi_j)\phi_j\}\phi_j\|=\frac{1}{2}\|\hat{\phi}_j-\phi_j\|^2$, which follows since $\|\hat{\phi}_j\|=\|\phi_j\|=1$ implies $\int(\hat{\phi}_j-\phi_j)\phi_j=\langle\hat{\phi}_j,\phi_j\rangle-1=-\frac{1}{2}\|\hat{\phi}_j-\phi_j\|^2$, we conclude that $\mathbb{E}(\|\hat{\phi}_j-\phi_j\|^2)$ is dominated by the first term on the right-hand side of equation (26). Thus,

\mathbb{E}(\|\hat{\phi}_{j}-\phi_{j}\|^{2})\lesssim\sum_{k\neq j}\frac{\mathbb{E}\left[\left\{\int(\hat{C}-C-\tilde{C}_{0})\phi_{j}\phi_{k}\right\}^{2}\right]}{(\lambda_{j}-\lambda_{k})^{2}}+\sum_{k\neq j}\frac{\mathbb{E}\left\{\left(\int\tilde{C}_{0}\phi_{j}\phi_{k}\right)^{2}\right\}}{(\lambda_{j}-\lambda_{k})^{2}}. (29)

For the random design, under Assumption 6 and $hj^a\log n\lesssim 1$, by Bessel's inequality and (13), the first term on the right-hand side of equation (29) is bounded by

\sum_{k\neq j}\frac{\mathbb{E}\left\{\int(\hat{C}-C-\tilde{C}_{0})\phi_{j}\phi_{k}\right\}^{2}}{(\lambda_{j}-\lambda_{k})^{2}}\leq\frac{\left\|\hat{C}-C-\tilde{C}_{0}\right\|_{\mathrm{HS}}^{2}}{\eta_{j}^{2}}
=O_{P}\left(\frac{j^{2}}{n}\left\{1+\frac{j^{a}}{\bar{N}_{2}}+\frac{j^{2a}}{\bar{N}^{2}_{2}}\right\}\right)+o_{P}\left(h^{2}j^{2c+2}\right).

The proof of the first statement of Theorem 1 is completed by

\mathbb{E}(\|\hat{\phi}_{j}-\phi_{j}\|^{2})\lesssim\sum_{k\neq j}\frac{\mathbb{E}\left\{\left(\int\tilde{C}_{0}\phi_{j}\phi_{k}\right)^{2}\right\}}{(\lambda_{j}-\lambda_{k})^{2}}
=\sum_{{k\neq j},\,{k\leq 2j}}\frac{\mathbb{E}\left\{\left(\int\tilde{C}_{0}\phi_{j}\phi_{k}\right)^{2}\right\}}{(\lambda_{j}-\lambda_{k})^{2}}+\sum_{k>2j}\frac{\mathbb{E}\left\{\left(\int\tilde{C}_{0}\phi_{j}\phi_{k}\right)^{2}\right\}}{(\lambda_{j}-\lambda_{k})^{2}}
\lesssim h^{4}j^{2c+2}+\frac{j^{2}}{n}\left\{1+\frac{j^{a}}{\bar{N}_{2}}+\frac{j^{2a}}{\bar{N}^{2}_{2}}\right\}+h^{4}j^{2c+1}+\frac{1}{n}\left(j+\frac{j^{a}}{\bar{N}_{2}h}+\frac{j^{a+1}}{\bar{N}_{2}}+\frac{j^{2a}}{\bar{N}^{2}_{2}h}\right)
\lesssim\frac{j^{2}}{n}\left\{1+\frac{j^{a}}{\bar{N}_{2}}+\frac{j^{2a}}{\bar{N}_{2}^{2}}\right\}+\frac{j^{a}}{nh}\left(\frac{1}{\bar{N}_{2}}+\frac{j^{a}}{\bar{N}_{2}^{2}}\right)+h^{4}j^{2c+2},

where the second inequality comes from Lemma 7 in Dou, Pollard and Zhou (2012) and Lemma 2. The proof for the fixed regular design is similar, and thus we omit the details for brevity.

References

  • Anderson, G. W., Guionnet, A. and Zeitouni, O. (2010). An Introduction to Random Matrices 118. Cambridge University Press.
  • Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices 20. Springer.
  • Bai, Z., Jiang, D., Yao, J.-F. and Zheng, S. (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. Ann. Statist. 37 3822–3840.
  • Bao, Z., Ding, X., Wang, J. and Wang, K. (2022). Statistical inference for principal components of spiked covariance matrices. Ann. Statist. 50 1144–1169.
  • Bianchi, P., Debbah, M., Maïda, M. and Najim, J. (2011). Performance of statistical tests for single-source detection using random matrix theory. IEEE Trans. Inf. Theory 57 2400–2419.
  • Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Statist. 1071–1095.
  • Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications 149. Springer Science & Business Media.
  • Cai, T. T. and Hall, P. (2006). Prediction in functional linear regression. Ann. Statist. 34 2159–2179.
  • Cai, T. and Yuan, M. (2010). Nonparametric covariance function estimation for functional and longitudinal data. University of Pennsylvania and Georgia Institute of Technology.
  • Cai, T. T. and Yuan, M. (2011). Optimal estimation of the mean function based on discretely sampled functional data: Phase transition. Ann. Statist. 39 2330–2355.
  • Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808–835.
  • Ding, X. and Yang, F. (2021). Spiked separable covariance matrices and principal components. Ann. Statist. 49 1113–1138.
  • Dou, W. W., Pollard, D. and Zhou, H. H. (2012). Estimation in functional regression for general exponential families. Ann. Statist. 40 2421–2451.
  • El Karoui, N. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. Ann. Probab. 35 663–714.
  • Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Science & Business Media.
  • Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. Ann. Statist. 35 70–91.
  • Hall, P. and Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 109–126.
  • Hall, P., Müller, H.-G. and Wang, J.-L. (2006). Properties of principal component methods for functional and longitudinal data analysis. Ann. Statist. 1493–1517.
  • Härdle, W. (1989). Asymptotic maximal deviation of M-smoothers. J. Multivar. Anal. 29 163–179.
  • Härdle, W., Janssen, P. and Serfling, R. (1988). Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist. 1428–1449.
  • Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications 200. Springer Science & Business Media.
  • Hsing, T. and Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators 997. John Wiley & Sons.
  • Indritz, J. (1963). Methods in Analysis. Macmillan.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • Li, Y. and Hsing, T. (2010). Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Ann. Statist. 38 3321–3351.
  • Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Ann. Statist. 33 774–805.
  • Nadler, B., Penna, F. and Garello, R. (2011). Performance of eigenvalue-based signal detectors with known and unknown noise level. In 2011 IEEE Int. Conf. Commun. 1–5. IEEE.
  • Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica 77 1447–1479.
  • Pastur, L. A. and Shcherbina, M. (2011). Eigenvalue Distribution of Large Random Matrices 171. American Mathematical Soc.
  • Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Stat. Sinica 1617–1642.
  • Paul, D. and Aue, A. (2014). Random matrix theory in statistics: A review. J. Stat. Plan. Inference 150 1–29.
  • Paul, D. and Peng, J. (2009). Consistency of restricted maximum likelihood estimators of principal components. Ann. Statist. 37 1229–1271.
  • Qu, S., Wang, J.-L. and Wang, X. (2016). Optimal estimation for the functional Cox model. Ann. Statist. 44 1708–1738.
  • Ramsay, J. and Silverman, B. W. (2006). Functional Data Analysis. Springer Science & Business Media.
  • Rice, J. A. and Wu, C. O. (2001). Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57 253–259.
  • Shao, L., Lin, Z. and Yao, F. (2022). Intrinsic Riemannian functional data analysis for sparse longitudinal observations. Ann. Statist. 50 1696–1721.
  • Wahl, M. (2022). Lower bounds for invariant statistical models with applications to principal component analysis. Ann. Inst. Henri Poincaré Probab. Statist. 58 1565–1589.
  • Yao, F. and Lee, T. C. M. (2006). Penalized spline models for functional principal component analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 3–25.
  • Yao, F., Müller, H.-G. and Wang, J.-L. (2005a). Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 100 577–590.
  • Yao, F., Müller, H.-G. and Wang, J.-L. (2005b). Functional linear regression analysis for longitudinal data. Ann. Statist. 2873–2903.
  • Yuan, M. and Cai, T. T. (2010). A reproducing kernel Hilbert space approach to functional linear regression. Ann. Statist. 38 3412–3444.
  • Zhang, J.-T. and Chen, J. (2007). Statistical inferences for functional data. Ann. Statist. 35 1052–1079.
  • Zhang, X. and Wang, J.-L. (2016). From sparse to dense functional data and beyond. Ann. Statist. 44 2281–2321.
  • Zheng, S. (2012). Central limit theorems for linear spectral statistics of large dimensional F-matrices. Ann. Inst. Henri Poincaré Probab. Statist. 48 444–476.
  • Zhou, H., Yao, F. and Zhang, H. (2023). Functional linear regression for discretely observed data: from ideal to reality. Biometrika 110 381–393.