
Asymptotic properties of spiked eigenvalues and eigenvectors of signal-plus-noise matrices with their applications

Abstract

This paper considers a general low-rank signal-plus-noise model in high-dimensional settings. Specifically, we allow the noise to have a general covariance structure and the signal to be of the same magnitude as the noise. Our study focuses on various asymptotic properties of the spiked eigenvalues and eigenvectors. As applications, we propose a new criterion to estimate the number of clusters and investigate the properties of spectral clustering.



keywords: signal-plus-noise matrices; spiked eigenvalues and eigenvectors; deterministic equivalents; spectral clustering.

Xiaoyu Liu1, Yiming Liu1, Guangming Pan2, Lingyue Zhang3, Zhixiang Zhang4
1School of Economics, Jinan University
2 School of Physical and Mathematical Sciences, Nanyang Technological University
3School of Mathematical Science, Capital Normal University
4Department of Mathematics, Faculty of Science and Technology, University of Macau

1 Introduction

Consider a signal-plus-noise model with the form of

๐—n=๐€n+๐šบ1/2โ€‹๐–nโˆˆโ„pร—n,\displaystyle\mathbf{X}_{n}=\mathbf{A}_{n}+\bm{\Sigma}^{1/2}\mathbf{W}_{n}\in\mathbb{R}^{p\times n}, (1)

where \mathbf{A}_{n} is the signal matrix with finite rank, \mathbf{W}_{n} consists of i.i.d. random variables, and \bm{\Sigma} accounts for the covariance structure of the noise. Such a model is popular in many fields, including machine learning (Yang et al., 2016), matrix denoising (Nadakuditi, 2014) and signal processing (Vallet et al., 2012). When \bm{\Sigma} is an identity matrix, there has been a large amount of work on the eigenvalues and eigenvectors of such signal-plus-noise type matrices. To name a few, Loubaton and Vallet (2011) derived the almost sure limits of the eigenvalues, Ding (2020) obtained the limits and convergence rates of the leading eigenvalues and eigenvectors, and Bao et al. (2021) derived the distributions of the principal singular vectors and singular subspaces. When \bm{\Sigma} is a diagonal matrix, Hachem et al. (2013) investigated the limiting behavior of random bilinear forms of the sample covariance matrix under a separable model, which includes the case of \bm{\Sigma} being diagonal in (1). When the signal-to-noise ratio tends to infinity, i.e., the ratio of the spectral norm of the signal part to that of the noise part tends to infinity, Cape et al. (2019) considered the asymptotic properties of spiked eigenvectors under Model (1). By imposing Gaussianity on \mathbf{W}_{n}, Han et al. (2021) provided an eigen-selected spectral clustering method with theoretical justifications.
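For readers who wish to experiment with Model (1), the following minimal sketch (our own illustration, not code from any cited work; the function name and parameter choices are arbitrary) generates one realization with a rank-K signal, an AR(1)-type \bm{\Sigma} and noise entries of variance 1/n, matching the scaling imposed in Assumption 1 below.

```python
import numpy as np

def generate_signal_plus_noise(p, n, K=3, rho=0.2, signal_scale=5.0, seed=0):
    """Draw X_n = A_n + Sigma^{1/2} W_n as in Model (1).

    A_n has rank K (K distinct column means, scaled by 1/sqrt(n) as in (6)),
    Sigma is AR(1) with parameter rho, and W_n has i.i.d. N(0, 1/n) entries.
    """
    rng = np.random.default_rng(seed)
    # K sparse, well-separated mean vectors assigned to the n columns
    mus = np.zeros((p, K))
    mus[np.arange(K), np.arange(K)] = signal_scale
    labels = rng.integers(0, K, size=n)
    A = mus[:, labels] / np.sqrt(n)                      # p x n signal matrix of rank K
    # AR(1) covariance and its symmetric square root
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_half = evecs @ (np.sqrt(evals)[:, None] * evecs.T)
    # noise entries with variance 1/n (Assumption 1)
    W = rng.standard_normal((p, n)) / np.sqrt(n)
    return A + Sigma_half @ W, A, Sigma, labels
```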

However, the assumptions that \bm{\Sigma} is an identity or diagonal matrix, or that the signal-to-noise ratio tends to infinity, are restrictive and hard to verify in practice. In this paper, we aim to investigate the asymptotic properties of the eigenvalues of \mathbf{X}_{n}\mathbf{X}_{n}^{\top}, as well as both the left and right spiked singular vectors of \mathbf{X}_{n}, in the regime where p/n\rightarrow c>0, under mild regularity conditions on \bm{\Sigma} and \mathbf{A}_{n} and mild moment assumptions on \mathbf{W}_{n}. To the best of our knowledge, this is the first systematic study of the eigenvalues and eigenvectors of Model (1) under such mild conditions. Specifically, we consider

๐’n:=๐—nโ€‹๐—nโŠค=(๐€n+๐šบ1/2โ€‹๐–n)โ€‹(๐€n+๐šบ1/2โ€‹๐–n)โŠค\mathbf{S}_{n}:=\mathbf{X}_{n}\mathbf{X}_{n}^{\top}=(\mathbf{A}_{n}+\bm{\Sigma}^{1/2}\mathbf{W}_{n})(\mathbf{A}_{n}+\bm{\Sigma}^{1/2}\mathbf{W}_{n})^{\top} (2)

and

๐’~n:=๐—nโŠคโ€‹๐—n=(๐€n+๐šบ1/2โ€‹๐–n)โŠคโ€‹(๐€n+๐šบ1/2โ€‹๐–n).\displaystyle\tilde{\mathbf{S}}_{n}:=\mathbf{X}_{n}^{\top}\mathbf{X}_{n}=(\mathbf{A}_{n}+\bm{\Sigma}^{1/2}\mathbf{W}_{n})^{\top}(\mathbf{A}_{n}+\bm{\Sigma}^{1/2}\mathbf{W}_{n}). (3)

In order to obtain the asymptotic properties of spiked eigenvectors of ๐’n\mathbf{S}_{n} and ๐’~n\tilde{\mathbf{S}}_{n}, we analyze the quadratic forms involving the resolvents Qnโ€‹(z)Q_{n}(z) and Q~nโ€‹(z)\tilde{Q}_{n}(z) of matrices ๐’n\mathbf{S}_{n} and ๐’~n\tilde{\mathbf{S}}_{n} defined as

Qnโ€‹(z)=(๐’nโˆ’zโ€‹๐ˆ)โˆ’1\displaystyle Q_{n}(z)=(\mathbf{S}_{n}-z\mathbf{I})^{-1} (4)

and

Q~nโ€‹(z)=(๐’~nโˆ’zโ€‹๐ˆ)โˆ’1,\displaystyle\tilde{Q}_{n}(z)=(\tilde{\mathbf{S}}_{n}-z\mathbf{I})^{-1}, (5)

respectively, where z\in\mathcal{C}^{+} and \mathbf{I} refers to an identity matrix of comparable size. The study of the spiked eigenvalues leverages the main results developed in Liu et al. (2022).

To demonstrate the use of the theoretical results, we consider applications in spectral clustering. When each column of \mathbf{A}_{n} can only be chosen from a finite number of distinct unknown deterministic vectors, (1) can be regarded as a collection of samples generated from a mixture model. Thus, in vector form, the i-th column of Model (1) can be written as

\mathbf{x}_{i}=\mathbf{a}_{i}+\bm{\Sigma}^{1/2}\mathbf{w}_{i}\in\mathbb{R}^{p}, (6)

where \mathbf{a}_{i}=\bm{\mu}_{s}/\sqrt{n} for some s\in\{1,\ldots,K\} if i\in\mathcal{V}_{s}\subseteq\{1,\ldots,n\}. The normalizing constant \sqrt{n} in \mathbf{a}_{i} is introduced to be consistent with Assumption 1 below. Here \cup_{s=1}^{K}\mathcal{V}_{s}=\{1,\ldots,n\} and \mathcal{V}_{s}\cap\mathcal{V}_{t}=\emptyset for any s\neq t, and K refers to the number of different distributions (i.e., clusters) in the mixture model. Note that the labels are unknown in clustering problems. A large body of literature investigates mixture models. In statistics, Redner and Walker (1984) considered the clustering problem for the Gaussian mixture model in low-dimensional settings, while Cai et al. (2019) considered the high-dimensional case. Several classical clustering techniques have also been proposed in past decades; see, e.g., MacQueen et al. (1967), Bradley et al. (1999), Kaufman and Rousseeuw (1987), Maimon and Rokach (2005) and Duda and Hart (1973). In empirical economics, mixture models are used to introduce unobserved heterogeneity. An important example of this setup from the econometrics literature is Keane and Wolpin (1997), which investigated the clustering problem in labor markets. Such models also arise in the analysis of some classes of games with multiple Nash equilibria; see, for example, Berry and Tamer (2006), Chen et al. (2014) and others.

Our main theoretical contribution is to precisely characterize the first-order limits of the eigenvalues and eigenvectors of \mathbf{S}_{n} and \widetilde{\bf S}_{n}. Two observations based on our main theoretical results are somewhat surprising, as they overlap with findings in the literature obtained in different settings. The first is that the limits of the spiked eigenvalues of \mathbf{S}_{n} coincide with those of a sample covariance matrix without the signal part whose population covariance matrix is \mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma}; see the discussion below Theorem 2. The second is that the spiked right singular vectors of \mathbf{X}_{n} have an intrinsic block structure if \mathbf{A}_{n} contains a finite number of distinct deterministic factors, even for a moderate signal-to-noise ratio. Our Corollary 1 precisely quantifies the deviation of the right singular vectors from vectors whose entries have a group structure. This finding is highly relevant to spectral clustering, which has been extensively discussed in the literature. It is worth noting that many existing studies assume strong moment conditions on the noise and consider scenarios where the signal-to-noise ratio tends to infinity.

As applications, we propose a method to estimate the number of clusters by leveraging the asymptotic limits of sample eigenvalues. We also discuss how the theoretical results intuitively explain why the spiked eigenvectors have clustering power in the context of spectral clustering.

The remaining sections are organized as follows. In Section 2 we state the main results on the quantities in (4) and (5) under Model (1). Section 3 includes the applications to clustering and classification. In Section 4, we present numerical results regarding the applications mentioned in Section 3. All proofs are relegated to the Appendix and the supplementary materials.

Conventions: We use C to denote a generic large constant independent of n, whose value may change from line to line. a\wedge b=\min\{a,b\}; \mathbf{1} and \mathbf{I} refer to a vector with all entries equal to one and an identity matrix of comparable size, respectively. We let \|\cdot\| denote the Euclidean norm of a vector or the spectral norm of a matrix.

2 The main results

In this section, we mainly investigate the limits of the eigenvalues and eigenvectors of ๐’n\mathbf{S}_{n} and ๐’~n\widetilde{\bf S}_{n} defined in (2) and (3), respectively. We first impose some mild conditions on ๐–n\mathbf{W}_{n} for establishing the asymptotic limits of the eigenvalues and eigenvectors:

Assumption 1.

We assume that \mathbf{W}_{n}=(w_{ij}) is a p\times n matrix whose entries \{w_{ij}:1\leq i\leq p,1\leq j\leq n\} are independent real random variables satisfying

Eโ€‹wiโ€‹j=0,Eโ€‹|nโ€‹wiโ€‹j|2=1โ€‹ย andย Eโ€‹|nโ€‹wiโ€‹j|4โ‰คC.\mbox{E}w_{ij}=0,\quad\mbox{E}|\sqrt{n}w_{ij}|^{2}=1\text{ and }\mbox{E}|\sqrt{n}w_{ij}|^{4}\leq C. (7)

We consider the high dimensional setting specified by the following assumption.

Assumption 2.

p/nโ‰กcnโ†’cโˆˆ(0,โˆž)p/n\equiv c_{n}\rightarrow c\in(0,\infty).

Note that when \mathbf{A}_{n} has bounded rank, the limiting spectral distribution of \mathbf{X}_{n}\mathbf{X}_{n}^{\top} is the same as that of the model with \mathbf{A}_{n} set to a zero matrix. This follows directly from the rank inequality; see Theorem A.43 of Bai and Silverstein (2010). However, to investigate the limiting behavior of the spiked eigenvalues and eigenvectors under Model (1), more assumptions on \mathbf{A}_{n} and \bm{\Sigma} are required.

Assumption 3.

Let \mathbf{A}_{n} be a p\times n matrix with bounded spectral norm and finite rank K, and let \bm{\Sigma} be a symmetric matrix with bounded spectral norm. Let \mathbf{R}_{n}\equiv\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma}, and denote the spectral decomposition of \mathbf{R}_{n} by \mathbf{R}_{n}=\sum_{k=1}^{p}\gamma_{k}\xi_{k}\xi_{k}^{\top} with C\geq\gamma_{1}>\ldots>\gamma_{K}>\gamma_{K+1}\geq\ldots\geq\gamma_{p}.

Remark 1.

In this paper, we consider the case where the leading eigenvalues of \mathbf{R}_{n} are bounded; a similar strategy can be adapted to investigate the case of divergent spikes. Moreover, one can also allow K to tend to infinity at a slow rate, but we do not pursue this here.

The key technical tool is the deterministic equivalents of Q_{n} and \tilde{Q}_{n} in (4) and (5). We introduce them first, as they require weaker assumptions than the main results on spiked eigenvalues and eigenvectors and may be of independent interest. For any z\in\mathcal{C}^{+}, let \tilde{r}_{n}(z)\in\mathcal{C}^{+} be the unique solution to the equation

z=โˆ’1r~n+cnโ€‹โˆซtโ€‹dโ€‹F๐‘nโ€‹(t)1+tโ€‹r~n,\displaystyle z=-\frac{1}{\tilde{r}_{n}}+c_{n}\int\frac{tdF^{\mathbf{R}_{n}}(t)}{1+t\tilde{r}_{n}}, (8)

where F^{\mathbf{R}_{n}}(t) is the empirical spectral distribution of \mathbf{R}_{n}. Proposition 1 below provides the deterministic equivalents of Q_{n} and \tilde{Q}_{n}.
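Numerically, \tilde{r}_{n}(z) can be computed from the eigenvalues of \mathbf{R}_{n} by a simple fixed-point iteration on (8). The sketch below is our own illustration; convergence is not guaranteed by this naive scheme, but it typically works when \Im z is bounded away from zero, as required in Proposition 1 below.

```python
import numpy as np

def solve_r_tilde(z, eig_R, c_n, n_iter=1000, tol=1e-12):
    """Solve equation (8) for r_tilde_n(z) by fixed-point iteration.

    eig_R : eigenvalues (gamma_1, ..., gamma_p) of R_n, so the integral against
            F^{R_n} becomes the average (1/p) * sum_j gamma_j / (1 + gamma_j * r).
    Rearranging (8) gives r = 1 / (-z + c_n * (1/p) * sum_j gamma_j / (1 + gamma_j * r)),
    which is iterated from an initial point in the upper half plane.
    """
    eig_R = np.asarray(eig_R, dtype=float)
    r = 1j                                   # initial guess in C^+
    for _ in range(n_iter):
        integral = np.mean(eig_R / (1.0 + eig_R * r))
        r_new = 1.0 / (-z + c_n * integral)
        if abs(r_new - r) < tol:
            break
        r = r_new
    return r
```

For instance, solve_r_tilde(2.0 + 0.5j, eig_R, p / n), with eig_R the spectrum of \mathbf{R}_{n}, gives the value of \tilde{r}_{n}(z) entering (10) and (12).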

Proposition 1.

Suppose that Assumptions 1 to 3 are satisfied. Let (\mathbf{u}_{n})_{n\geq 1},(\mathbf{v}_{n})_{n\geq 1} be sequences of deterministic vectors of unit norm. Then for any z\in\mathcal{C}^{+} with \Im z bounded from below by a positive constant, we have

Eโ€‹|๐ฎnโŠคโ€‹(Q~nโ€‹(z)โˆ’R~nโ€‹(z))โ€‹๐ฎn|2=Oโ€‹(nโˆ’1),\displaystyle\mbox{E}|\mathbf{u}_{n}^{\top}(\tilde{Q}_{n}(z)-\tilde{R}_{n}(z))\mathbf{u}_{n}|^{2}=O(n^{-1}), (9)

where

R~nโ€‹(z)=r~nโ€‹(z)โ€‹๐ˆโˆ’(r~nโ€‹(z))2โ€‹๐€nโŠคโ€‹[๐ˆ+r~nโ€‹(z)โ€‹๐‘n]โˆ’1โ€‹๐€n;\tilde{R}_{n}(z)=\tilde{r}_{n}(z)\mathbf{I}-(\tilde{r}_{n}(z))^{2}\mathbf{A}_{n}^{\top}\left[\mathbf{I}+\tilde{r}_{n}(z)\mathbf{R}_{n}\right]^{-1}\mathbf{A}_{n}; (10)

and

Eโ€‹|๐ฏnโŠคโ€‹(Qnโ€‹(z)โˆ’Rnโ€‹(z))โ€‹๐ฏn|2=Oโ€‹(nโˆ’1),\mbox{E}|\mathbf{v}_{n}^{\top}(Q_{n}(z)-R_{n}(z))\mathbf{v}_{n}|^{2}=O(n^{-1}), (11)

where

Rnโ€‹(z)=(โˆ’zโ€‹๐ˆโˆ’zโ€‹r~nโ€‹๐‘n)โˆ’1.R_{n}(z)=\left(-z\mathbf{I}-z\tilde{r}_{n}\mathbf{R}_{n}\right)^{-1}. (12)
Remark 2.

The model we study is similar to that in Hachem et al. (2013), and the proof of Proposition 1 leverages the main result therein. The main difference between our Proposition 1 and their results is that we study the case of a general \bm{\Sigma}, while they consider a model with a separable variance profile where the noise part can be written as \mathbf{D}_{n}^{1/2}\mathbf{W}_{n}{\bf\tilde{D}}_{n}^{1/2} with \mathbf{D}_{n} and {\bf\tilde{D}}_{n} both diagonal matrices.

There are two features of Proposition 1 worth mentioning here. First, the deterministic equivalents of both Q_{n}(z) and \tilde{Q}_{n}(z) involve the quantity \tilde{r}_{n}, which is the Stieltjes transform of the generalized Marchenko-Pastur law; see Bai and Silverstein (2010) for instance. This is hidden in Hachem et al. (2013), as their results hold for general \mathbf{A}_{n} rather than \mathbf{A}_{n} of finite rank. Second, when \mathbf{X}_{n} has columns with the structure specified in (6), which is of statistical interest, especially in the context of spectral clustering, \tilde{R}_{n}(z) has a block structure since it is of the form c_{1}\mathbf{I}+c_{2}\mathbf{A}_{n}^{\top}M\mathbf{A}_{n} for some constants c_{1},c_{2} and some matrix M. This can also be inferred from the observation that \mbox{E}(\mathbf{X}_{n}^{\top}\mathbf{X}_{n}) can be written in such a form.
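To make this block structure explicit, write \mathbf{A}_{n}=\mathbf{N}\mathbf{H}^{\top} as in Section 3, where \mathbf{H}\in\mathbb{R}^{n\times K} is the cluster membership matrix. The following one-line identity (our elaboration) shows that the second term in (10) is constant within blocks:

\mathbf{A}_{n}^{\top}\left[\mathbf{I}+\tilde{r}_{n}(z)\mathbf{R}_{n}\right]^{-1}\mathbf{A}_{n}=\mathbf{H}\left(\mathbf{N}^{\top}\left[\mathbf{I}+\tilde{r}_{n}(z)\mathbf{R}_{n}\right]^{-1}\mathbf{N}\right)\mathbf{H}^{\top},

where the matrix in parentheses is only K\times K, so the (i,j) entry of this term depends on i and j only through the clusters to which they belong.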

Based on Proposition 1, we now first focus on the eigenvectors corresponding to the spiked eigenvalues of ๐‘n\mathbf{R}_{n}. The following assumption is needed.

Assumption 4.

Under the spectral decomposition of ๐‘n\mathbf{R}_{n} specified in Assumption 3, we assume that min1โ‰คiโ‰คKโก(ฮณiโˆ’ฮณi+1)>c0>0\min_{1\leq i\leq K}(\gamma_{i}-\gamma_{i+1})>c_{0}>0 for some constant c0c_{0} independent of pp and nn. For 1โ‰คkโ‰คK1\leq k\leq K, ฮณk\gamma_{k} satisfies

โˆซt2โ€‹dโ€‹Hโ€‹(t)(ฮณkโˆ’t)2<1c,\displaystyle\int\frac{t^{2}dH(t)}{(\gamma_{k}-t)^{2}}<\frac{1}{c}, (13)

where Hโ€‹(t)H(t) is the limiting spectral distribution of ๐‘n\mathbf{R}_{n}.

Remark 3.

Assumption 4, a variant of the condition given in Definition 4.1 of Bai and Yao (2012), ensures that the K largest eigenvalues of \mathbf{S}_{n} are simple spiked eigenvalues and that the gaps between adjacent spiked eigenvalues have a constant lower bound with probability tending to one.

Let \hat{\mathbf{v}}_{k}\in\mathbb{R}^{p} and \hat{\mathbf{u}}_{k}\in\mathbb{R}^{n} be the eigenvectors associated with the k-th largest (spiked) eigenvalue of \mathbf{S}_{n} and \tilde{\mathbf{S}}_{n}, respectively. The following theorem characterizes the asymptotic behaviour of \hat{\mathbf{v}}_{k} and \hat{\mathbf{u}}_{k}.

Theorem 1.

Under Assumptions 1 to 4 , for any 1โ‰คkโ‰คK1\leq k\leq K, and any sequences of deterministic unit vectors {๐ฏn}nโ‰ฅ1โˆˆโ„p\{\mathbf{v}_{n}\}_{n\geq 1}\in\mathbb{R}^{p} and {๐ฎn}nโ‰ฅ1โˆˆโ„n\{\mathbf{u}_{n}\}_{n\geq 1}\in\mathbb{R}^{n}, we have

  1. 1.
    |๐ฏnโŠคโ€‹๐ฏ^kโ€‹๐ฏ^kโŠคโ€‹๐ฏnโˆ’๐ฏnโŠคโ€‹๐kโ€‹๐ฏn|=OPโ€‹(1n),|\mathbf{v}_{n}^{\top}\hat{\mathbf{v}}_{k}\hat{\mathbf{v}}_{k}^{\top}\mathbf{v}_{n}-\mathbf{v}_{n}^{\top}\mathbf{P}_{k}\mathbf{v}_{n}|=O_{P}\left(\frac{1}{\sqrt{n}}\right), (14)

    where ๐k=โˆ‘j=1pckโ€‹(j)โ€‹ฮพjโ€‹ฮพjโŠค\mathbf{P}_{k}=\sum_{j=1}^{p}c_{k}(j)\xi_{j}\xi_{j}^{\top}, and {ckโ€‹(j)}\{c_{k}(j)\} are defined by

    ckโ€‹(j)={1โˆ’โˆ‘i=1,iโ‰ kp(ฮณkฮณiโˆ’ฮณkโˆ’ฯ‰kฮณiโˆ’ฯ‰k),j=kฮณkฮณjโˆ’ฮณkโˆ’ฯ‰kฮณjโˆ’ฯ‰k,jโ‰ kc_{k}(j)=\left\{\begin{array}[]{ll}\displaystyle 1-\sum_{i=1,i\neq k}^{p}\left(\frac{\gamma_{k}}{\gamma_{i}-\gamma_{k}}-\frac{\omega_{k}}{\gamma_{i}-\omega_{k}}\right),&j=k\\[10.0pt] \displaystyle\frac{\gamma_{k}}{\gamma_{j}-\gamma_{k}}-\frac{\omega_{k}}{\gamma_{j}-\omega_{k}},&j\neq k\\ \end{array}\right.

    and \omega_{1}\geq\omega_{2}\geq\cdots\geq\omega_{p} are the real solutions to the following equation in \omega:

    1pโ€‹โˆ‘i=1pฮณiฮณiโˆ’ฯ‰=1c.\frac{1}{p}\sum_{i=1}^{p}\frac{\gamma_{i}}{\gamma_{i}-\omega}=\frac{1}{c}. (15)
  2. 2.
    |๐ฎnโŠคโ€‹๐ฎ^kโ€‹๐ฎ^kโŠคโ€‹๐ฎnโˆ’ฮทkโ€‹๐ฎnโŠคโ€‹๐€nโŠคโ€‹ฮพkโ€‹ฮพkโŠคโ€‹๐€nโ€‹๐ฎnฮณk|=OPโ€‹(1n),\left|\mathbf{u}_{n}^{\top}\hat{\mathbf{u}}_{k}\hat{\mathbf{u}}_{k}^{\top}\mathbf{u}_{n}-\eta_{k}\frac{\mathbf{u}_{n}^{\top}\mathbf{A}_{n}^{\top}\xi_{k}\xi_{k}^{\top}\mathbf{A}_{n}\mathbf{u}_{n}}{\gamma_{k}}\right|=O_{P}\left(\frac{1}{\sqrt{n}}\right), (16)

    where

    ฮทk=(1โˆ’1nโ€‹โˆ‘i=1,iโ‰ kpฮณi2(ฮณkโˆ’ฮณi)2).\eta_{k}=\left(1-\frac{1}{n}\sum_{i=1,i\neq k}^{p}\frac{\gamma_{i}^{2}}{(\gamma_{k}-\gamma_{i})^{2}}\right). (17)
Remark 4.

It is worth mentioning that the first-order behaviour of the left spiked singular vectors of \mathbf{X}_{n} is the same as that of a sample covariance matrix formed from \mathbf{R}_{n}^{1/2}\mathbf{W}; see the main results in Mestre (2008b) and the simulation reported in Table 5 below. However, the behaviour of the right singular vectors is significantly different. Specifically, when the entries of \mathbf{W} are Gaussian, the matrix composed of the right singular vectors of \mathbf{R}_{n}^{1/2}\mathbf{W} is asymptotically Haar distributed. This contrasts with the second statement of Theorem 1.

In addition, it is noteworthy that when ๐šบ=๐ˆ\bm{\Sigma}=\mathbf{I}, the model reduces to the one studied in (Ding,, 2020; Bao etย al.,, 2021). In these studies, the results on the left and right singular vectors of ๐—n\mathbf{X}_{n} are observed to be symmetric due to the symmetry of the model structure. However, for a general ๐šบ\bm{\Sigma}, we cannot deduce the properties of the right singular vectors of ๐—n\mathbf{X}_{n} solely based on the properties of the left singular vectors, and vice versa. We further discuss the relationship between our results and those in Ding, (2020) below Theorem 2.
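The deterministic quantities appearing in Theorem 1 are straightforward to evaluate numerically from the spectrum of \mathbf{R}_{n}. The sketch below (ours; the function name is illustrative) locates the root \omega_{k} of (15) in the gap (\gamma_{k+1},\gamma_{k}) and then forms the weights c_{k}(j) and \eta_{k} in (17), assuming the eigenvalues are distinct and ordered as in Assumption 3.

```python
import numpy as np
from scipy.optimize import brentq

def theorem1_quantities(eig_R, n, K):
    """Compute omega_k (equation (15)), the weights c_k(j) and eta_k (equation (17))
    from the eigenvalues gamma_1 > ... > gamma_p of R_n, given in descending order."""
    gam = np.asarray(eig_R, dtype=float)
    p = gam.size
    c = p / n
    f = lambda w: np.mean(gam / (gam - w)) - 1.0 / c         # equation (15)
    omegas, c_weights, etas = [], [], []
    for k in range(K):                                       # 0-based index of the k-th spike
        eps = 1e-8 * (gam[k] - gam[k + 1])
        omega_k = brentq(f, gam[k + 1] + eps, gam[k] - eps)  # one root in each spectral gap
        omegas.append(omega_k)
        ck = np.empty(p)
        mask = np.arange(p) != k
        ck[mask] = gam[k] / (gam[mask] - gam[k]) - omega_k / (gam[mask] - omega_k)
        ck[k] = 1.0 - ck[mask].sum()                         # the diagonal weight c_k(k)
        c_weights.append(ck)
        etas.append(1.0 - np.sum(gam[mask] ** 2 / (gam[k] - gam[mask]) ** 2) / n)
    return omegas, c_weights, etas
```

With these quantities one can assemble \mathbf{P}_{k}=\sum_{j}c_{k}(j)\xi_{j}\xi_{j}^{\top} in (14) and the limiting projection \eta_{k}\gamma_{k}^{-1}\mathbf{u}_{n}^{\top}\mathbf{A}_{n}^{\top}\xi_{k}\xi_{k}^{\top}\mathbf{A}_{n}\mathbf{u}_{n} in (16).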

The asymptotic behaviour of the spiked eigenvalues is also considered, which requires some additional notation. Similar to Bai and Yao (2012), for a spiked eigenvalue \gamma outside the support of H with \gamma\neq 0, we define

ฯ†โ€‹(ฮณ)=zโ€‹(โˆ’1ฮณ)=ฮณโ€‹(1+cโ€‹โˆซtโ€‹dโ€‹Hโ€‹(t)ฮณโˆ’t),\varphi(\gamma)=z(-\frac{1}{\gamma})=\gamma\left(1+c\int\frac{t\mathrm{d}H(t)}{\gamma-t}\right), (18)

where z is regarded as the function defined in (8) with its domain extended to the real line. As defined in Bai and Yao (2012), a spiked eigenvalue \gamma is called a distant spike if \varphi^{\prime}(\gamma)>0, which coincides with condition (13) in Assumption 4, and a close spike if \varphi^{\prime}(\gamma)\leq 0. Note that \mathbf{S}_{n} and \tilde{\mathbf{S}}_{n} share the same nonzero eigenvalues, which we denote by \lambda_{1}\geq\ldots\geq\lambda_{p\wedge n}>0.

Theorem 2.

Under Assumptions 1 to 4, we have

ฮปkโ€‹โ†’๐‘ƒโ€‹ฯ†โ€‹(ฮณk).\lambda_{k}\overset{P}{\rightarrow}\varphi(\gamma_{k}). (19)

In particular, the above result still holds for the distant spiked eigenvalues with multiplicity larger than one. Moreover, for a nonspiked eigenvalue ฮปj\lambda_{j} with j/pโ†’qj/p\rightarrow q, we have

limnโ†’โˆžฮปj=ฮผ1โˆ’qa.s.\lim_{n\rightarrow\infty}\lambda_{j}=\mu_{1-q}~{}~{}~{}~{}\text{a.s.} (20)

uniformly holds in 0โ‰คqโ‰ค10\leq q\leq 1, where ฮผq\mu_{q} is the qq-quantile of Fc,HF^{c,H}, that is, ฮผq=inf{x:Fc,Hโ€‹(x)โ‰ฅq}\mu_{q}=\inf\{x:~{}F^{c,H}(x)\geq q\}.

Remark 5.

By the main results in Bai and Yao, (2012), one could obtain the limits of the spiked eigenvalues of ๐‘n1/2โ€‹๐–nโ€‹๐–nโŠคโ€‹๐‘n1/2\mathbf{R}_{n}^{1/2}\mathbf{W}_{n}\mathbf{W}_{n}^{\top}\mathbf{R}_{n}^{1/2}. Theorem 2 indicates that the asymptotic limits of the spiked eigenvalues of ๐—nโ€‹๐—nโŠค\mathbf{X}_{n}\mathbf{X}_{n}^{\top} are the same as those of ๐‘n1/2โ€‹๐–nโ€‹๐–nโŠคโ€‹๐‘n1/2\mathbf{R}_{n}^{1/2}\mathbf{W}_{n}\mathbf{W}_{n}^{\top}\mathbf{R}_{n}^{1/2}. See Table 5 below for an illustration.

Related work. Ding, (2020) investigated the limits of the spiked eigenvalues and eigenvectors of a signal-plus-noise model where ๐šบ=๐ˆ\bm{\Sigma}=\mathbf{I}. Bao etย al., (2021) further obtained the fluctuation of quadratic forms of left and right spiked eigenvectors of a signal-plus-noise model where ๐šบ=๐ˆ\bm{\Sigma}=\mathbf{I}. The model we considered includes both of these two models as special cases, and our results show that the source of sample spiked eigenvalues can be either from the spikes in the signal matrix ๐€\mathbf{A}, or spikes from ๐šบ\bm{\Sigma}, which is the covariance matrix of the noise part.

We verify that our main results match the corresponding parts of Ding (2020). As the first K eigenvectors of \mathbf{R}_{n}=\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\mathbf{I} are the same as the left singular vectors of \mathbf{A}_{n}, the singular value decomposition of \mathbf{A}_{n} can be written as \mathbf{A}_{n}=\sum_{i=1}^{K}d_{i}\xi_{i}\zeta_{i}^{\top}, where \xi_{i} is the eigenvector of \mathbf{R}_{n} associated with \gamma_{i}. Then \gamma_{k}=d_{k}^{2}+1 for k=1,\cdots,K, and \gamma_{k}=1 for (K+1)\leq k\leq p. Theorem 2 implies that

\lambda_{k}\overset{P}{\rightarrow}\varphi(\gamma_{k})=(d_{k}^{2}+1)(1+cd_{k}^{-2}).

By taking ๐ฎn=ฮถk\mathbf{u}_{n}=\zeta_{k} in (16), we find ๐ฎnโŠคโ€‹๐€nโŠคโ€‹ฮพk=dk\mathbf{u}_{n}^{\top}\mathbf{A}_{n}^{\top}\xi_{k}=d_{k}, and ฮทk=1โˆ’cnโ€‹dkโˆ’4+Oโ€‹(nโˆ’1)\eta_{k}=1-c_{n}d_{k}^{-4}+O(n^{-1}), thus

ฮถkโŠคโ€‹๐ฎ^kโ€‹๐ฎ^kโŠคโ€‹ฮถkโˆ’(dk4โˆ’cn)/[dk2โ€‹(1+dk2)]=OPโ€‹(nโˆ’1/2).\zeta_{k}^{\top}\hat{\mathbf{u}}_{k}\hat{\mathbf{u}}_{k}^{\top}\zeta_{k}-(d_{k}^{4}-c_{n})/[d_{k}^{2}(1+d_{k}^{2})]=O_{P}(n^{-1/2}).

These limits coincide with pโ€‹(dk)p(d_{k}) and a2โ€‹(dk)a_{2}(d_{k}) defined in (2.6) and (2.9) of Ding, (2020), respectively.

One may wonder whether the asymptotic distributions of the spiked eigenvalues and eigenvectors of \mathbf{X}_{n}\mathbf{X}_{n}^{\top} are the same as those of \mathbf{R}_{n}^{1/2}\mathbf{W}_{n}\mathbf{W}_{n}^{\top}\mathbf{R}_{n}^{1/2}, given that their first-order limits coincide. Several recent studies have investigated the latter model, including Jiang and Bai (2021), Zhang et al. (2022) and Bao et al. (2022). Through simulations, we observe different asymptotic variances between the two models, as indicated by Table 5.

The aforementioned theoretical results are all built on \mathbf{S}_{n} and \tilde{\mathbf{S}}_{n}, which are non-centered sample covariance matrices. In some situations, the centered versions are also of interest. Specifically, we consider the corresponding covariance matrices

๐’ยฏn=(๐—nโˆ’๐—ยฏn)โ€‹(๐—nโˆ’๐—ยฏn)โŠค,\bar{\mathbf{S}}_{n}=(\mathbf{X}_{n}-\bar{\mathbf{X}}_{n})(\mathbf{X}_{n}-\bar{\mathbf{X}}_{n})^{\top},

and

๐’ยฏ~n=(๐—nโˆ’๐—ยฏn)โŠคโ€‹(๐—nโˆ’๐—ยฏn),\tilde{\bar{\mathbf{S}}}_{n}=(\mathbf{X}_{n}-\bar{\mathbf{X}}_{n})^{\top}(\mathbf{X}_{n}-\bar{\mathbf{X}}_{n}),

where \bar{\mathbf{X}}_{n}=\bar{\mathbf{x}}_{n}\mathbf{1}^{\top} and \bar{\mathbf{x}}_{n}=\sum_{k=1}^{n}\mathbf{x}_{k}/n. Let \Phi=\mathbf{I}-\mathbf{1}\mathbf{1}^{\top}/n and denote the spectral decomposition of \bar{\mathbf{R}}_{n}=\mathbf{A}_{n}\Phi\mathbf{A}_{n}^{\top}+\bm{\Sigma} by \bar{\mathbf{R}}_{n}=\sum_{k=1}^{p}\bar{\gamma}_{k}\bar{\xi}_{k}\bar{\xi}_{k}^{\top}, where \bar{\gamma}_{1}>\ldots>\bar{\gamma}_{\bar{K}}>\bar{\gamma}_{\bar{K}+1}\geq\ldots\geq\bar{\gamma}_{p}. Here \bar{K} may equal K or K-1, depending on the case. Moreover, define the corresponding resolvents \bar{Q}_{n}(z) and \bar{\tilde{Q}}_{n}(z) of the matrices \bar{\mathbf{S}}_{n} and \bar{\tilde{\mathbf{S}}}_{n}, respectively:

Qยฏnโ€‹(z)=(๐’ยฏnโˆ’zโ€‹๐ˆ)โˆ’1,Q~ยฏnโ€‹(z)=(๐’~ยฏnโˆ’zโ€‹๐ˆ)โˆ’1.\displaystyle\bar{Q}_{n}(z)=(\bar{\mathbf{S}}_{n}-z\mathbf{I})^{-1},\quad\bar{\tilde{Q}}_{n}(z)=(\bar{\tilde{\mathbf{S}}}_{n}-z\mathbf{I})^{-1}.

With this notation, we establish the corresponding results for the centered sample covariance matrices:

Proposition 2.

Suppose that Assumptions 1 and 2 are satisfied, and that Assumption 3 holds with \mathbf{R}_{n} replaced by \bar{\mathbf{R}}_{n}. Then we have

|๐ฎnโŠคโ€‹(Q~ยฏnโ€‹(z)โˆ’D~โ€‹(z))โ€‹๐ฎn|=OPโ€‹(1/n),\displaystyle\left|\mathbf{u}_{n}^{\top}(\bar{\tilde{Q}}_{n}(z)-\tilde{D}(z))\mathbf{u}_{n}\right|=O_{P}(1/\sqrt{n}),
|๐ฏnโŠคโ€‹(Qยฏnโ€‹(z)โˆ’Dโ€‹(z))โ€‹๐ฏn|=OPโ€‹(1/n)\displaystyle\left|\mathbf{v}_{n}^{\top}(\bar{Q}_{n}(z)-{D}(z))\mathbf{v}_{n}\right|=O_{P}(1/\sqrt{n})

where

D~โ€‹(z)\displaystyle\tilde{D}(z) =r~nโ€‹(z)โ€‹ฮฆโˆ’rn~โ€‹(z)2โ€‹ฮฆโ€‹๐€nโŠคโ€‹(๐ˆ+r~nโ€‹(z)โ€‹๐‘ยฏn)โˆ’1โ€‹๐€nโ€‹ฮฆโˆ’zโˆ’1โ€‹nโˆ’1โ€‹๐Ÿ๐ŸโŠค,\displaystyle=\tilde{r}_{n}(z)\Phi-\tilde{r_{n}}(z)^{2}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}_{n}(z)\bar{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi-z^{-1}n^{-1}\mathbf{1}\mathbf{1}^{\top},
Dโ€‹(z)\displaystyle D(z) =(โˆ’zโˆ’zโ€‹r~nโ€‹๐‘ยฏn)โˆ’1.\displaystyle=(-z-z\tilde{r}_{n}\bar{\mathbf{R}}_{n})^{-1}.

Relying on Proposition 2, we also have the following conclusion for the spiked eigenvalues and the corresponding eigenvectors of ๐’ยฏn\bar{\mathbf{S}}_{n} and ๐’~ยฏn\bar{\tilde{\mathbf{S}}}_{n}.

Theorem 3.

Assume that the conditions of Proposition 2 are satisfied, with \mathbf{R}_{n} in Assumption 4 replaced by \bar{\mathbf{R}}_{n}. By replacing \mathbf{S}_{n}, \tilde{\mathbf{S}}_{n}, \mathbf{R}_{n} and the related quantities (e.g., \gamma_{k}) with their counterparts for \bar{\mathbf{S}}_{n}, \bar{\tilde{\mathbf{S}}}_{n} and \bar{\mathbf{R}}_{n}, the conclusions of Theorems 1 and 2 still hold.

3 Applications

In this section, based on the results in Section 2, we develop some potential applications. Spectral clustering is frequently used in data science, and the theoretical underpinnings of the method have received extensive interest in recent years; see, e.g., Couillet et al. (2016), Zhou and Amini (2019) and Löffler et al. (2021), among others. This section provides deeper insight into spectral clustering based on Model (6). Moreover, we also propose a new criterion to estimate the number of clusters. Recalling (6), for any i\in\mathcal{V}_{s} we have \mbox{E}\mathbf{x}_{i}=\bm{\mu}_{s}/\sqrt{n}, where s=1,\ldots,K. Let \mathbf{N}=[\bm{\mu}_{1},\ldots,\bm{\mu}_{K}]/\sqrt{n}\in\mathbb{R}^{p\times K}, \mathbf{H}=[\mathbf{h}_{1},\ldots,\mathbf{h}_{K}]\in\mathbb{R}^{n\times K} and \mathbf{h}_{s}=(\mathbf{h}_{s}(1),\ldots,\mathbf{h}_{s}(n))^{\top}\in\mathbb{R}^{n}, where \mathbf{h}_{s}(i)=1 if i\in\mathcal{V}_{s} and \mathbf{h}_{s}(i)=0 otherwise. In matrix form, write

๐—n=[๐ฑ1,โ€ฆ,๐ฑn]=๐๐‡โŠค+๐šบ1/2โ€‹๐–n\mathbf{X}_{n}=[\mathbf{x}_{1},\ldots,\mathbf{x}_{n}]=\mathbf{N}\mathbf{H}^{\top}+\bm{\Sigma}^{1/2}\mathbf{W}_{n}

Notice that

Eโ€‹(๐’~n)=๐‡๐โŠคโ€‹๐๐‡โŠค+trโ€‹๐šบnโ€‹๐ˆn.\mbox{E}(\tilde{\mathbf{S}}_{n})=\mathbf{H}\mathbf{N}^{\top}\mathbf{N}\mathbf{H}^{\top}+\frac{\mbox{tr}\bm{\Sigma}}{n}\mathbf{I}_{n}.

The block structure of \mbox{E}(\tilde{\mathbf{S}}_{n}) (apart from the diagonal) is similar to that of stochastic block models (SBMs). This motivates the use of spectral clustering for high-dimensional data with different means across groups.

To perform clustering, it is of interest to estimate the number of clusters, i.e., to estimate K. There exist plenty of approaches for estimating the number of clusters. To name a few, Thorndike (1953) proposed the Elbow method, which aims to minimize the within-group sum of squares (WSS); the Silhouette index (Rousseeuw, 1987) measures how similar an object is to its own cluster compared to other clusters and takes values in [-1,1]; Tibshirani et al. (2001) proposed a gap statistic to estimate the number of clusters. These methods either lack theoretical guarantees or have restrictions in computation or settings. Hence, we propose a theoretically guaranteed and easily implemented approach to estimate the number of clusters. Notice that under Model (6), the number of spiked eigenvalues of \mathbf{S}_{n} or \tilde{\mathbf{S}}_{n} equals the number of clusters provided the cluster means are linearly independent. The estimation of the number of spikes in different models has been discussed extensively in the literature, mostly under the setting \bm{\Sigma}=\mathbf{I}; see, e.g., Bai et al. (2018).

Motivated by the work of Bai etย al., (2018) and Theorem 2, we propose two criteria to estimate the number of clusters. Without loss of generality, we assume 0<c<10<c<1. Let

EDAk=โˆ’nโ€‹(ฮป1โˆ’ฮปk+1)+nโ€‹(pโˆ’kโˆ’1)โ€‹logโกฮธ~p,k+2โ€‹pโ€‹k,EDBk=โˆ’nโ€‹logโก(p)โ‹…(ฮป1โˆ’ฮปk+1)+nโ€‹(pโˆ’kโˆ’1)โ€‹logโกฮธ~p,k+(logโกn)โ€‹pโ€‹k,\begin{split}&\text{EDA}_{k}=-n(\lambda_{1}-\lambda_{k+1})+n(p-k-1)\log\widetilde{\theta}_{p,k}+2pk,\\ &\text{EDB}_{k}=-n\log(p)\cdot(\lambda_{1}-\lambda_{k+1})+n(p-k-1)\log\widetilde{\theta}_{p,k}+(\log n)pk,\end{split} (21)

where \widetilde{\theta}_{p,k}=\frac{1}{p-k-1}\sum_{i=k+1}^{p-1}\theta_{i}^{2} and \theta_{k}=\exp\{\lambda_{k}-\lambda_{k+1}\}, k=1,2,\ldots,p-1, with \lambda_{k} the eigenvalues of \mathbf{S}_{n} as in Theorem 2.

Remark 6.

The first two main terms aim to capture the difference between eigenvalues, and the third term is the penalty term for the number of unknown parameters in the model. The values of EDA and EDB are expected to reach a minimum when k=Kk=K. From (21), it can be seen that, as kk increases, the first and second terms decrease while the third term increases. For more discussion about (21) and the case of c>1c>1, one may refer to the supplementary material.
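A direct transcription of (21) into code, together with the minimizers \hat{K}_{\text{EDA}} and \hat{K}_{\text{EDB}} defined in (22)-(23) below, is sketched here (our own illustration; it assumes 0<c<1 so that \mathbf{S}_{n} has p positive eigenvalues, and the function name is illustrative).

```python
import numpy as np

def estimate_K(eigvals, n, w):
    """Estimate the number of clusters via the EDA / EDB criteria in (21)-(23).

    eigvals : the p eigenvalues lambda_1 >= ... >= lambda_p of S_n.
    w       : prespecified upper bound on the number of clusters, w = o(p).
    Returns (K_EDA, K_EDB)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = lam.size
    theta = np.exp(lam[:-1] - lam[1:])                   # theta_i = exp(lambda_i - lambda_{i+1})
    eda, edb = [], []
    for k in range(1, w + 1):
        theta_bar = np.mean(theta[k:p - 1] ** 2)         # tilde{theta}_{p,k}
        gap = lam[0] - lam[k]                            # lambda_1 - lambda_{k+1}
        second = n * (p - k - 1) * np.log(theta_bar)
        eda.append(-n * gap + second + 2 * p * k)        # last term 2pk is the penalty
        edb.append(-n * np.log(p) * gap + second + np.log(n) * p * k)
    return 1 + int(np.argmin(eda)), 1 + int(np.argmin(edb))
```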

We estimate the number of clusters by

K^EDA\displaystyle\hat{K}_{\text{EDA}} =\displaystyle= argโกmink=1,โ€ฆ,wโก1nโ€‹EDAk,\displaystyle\arg\min\limits_{k=1,\ldots,w}\frac{1}{n}\text{EDA}_{k}, (22)
K^EDB\displaystyle\hat{K}_{\text{EDB}} =\displaystyle= argโกmink=1,โ€ฆ,wโก1nโ€‹EDBk,\displaystyle\arg\min\limits_{k=1,\ldots,w}\frac{1}{n}\text{EDB}_{k}, (23)

where w is a prespecified upper bound on the number of clusters satisfying w=o(p). Note that under the conditions of Theorem 2, it follows that for k=1,2,\ldots,K-1,

ฮธkโ€‹โ†’๐‘โ€‹expโก{ฯ†โ€‹(ฮณk)โˆ’ฯ†โ€‹(ฮณk+1)},ฮธKโ€‹โ†’๐‘โ€‹expโก{ฯ†โ€‹(ฮณK)โˆ’ฮผ1},\theta_{k}\overset{p}{\rightarrow}\exp\{\varphi(\gamma_{k})-\varphi(\gamma_{k+1})\},~{}~{}\theta_{K}\overset{p}{\rightarrow}\exp\{\varphi(\gamma_{K})-\mu_{1}\}, (24)

where function ฯ†\varphi and ฮผ1\mu_{1} are defined in (18) and (20), respectively. For simplicity, denote the limit of ฮธk\theta_{k} by ฮพk\xi_{k} for k=1,โ€ฆ,Kk=1,\ldots,K. Define two sequences {as}s=2K\{a_{s}\}_{s=2}^{K} and {bs}s=2K\{b_{s}\}_{s=2}^{K} as follows

as=ฮพs2+logโกฮพsโˆ’2โ€‹cโˆ’1+as+1โ€‹ย andย โ€‹aK+1=0โ€‹ย forย โ€‹s=2,โ€ฆ,K,bs=ฮพs2+logโกpโ€‹logโกฮพsโˆ’cโ€‹logโกnโˆ’1+bs+1โ€‹ย andย โ€‹bK+1=0โ€‹ย forย โ€‹s=2,โ€ฆ,K.\begin{split}&a_{s}=\xi_{s}^{2}+\log\xi_{s}-2c-1+a_{s+1}\text{ and }a_{K+1}=0\text{ for }s=2,\ldots,K,\\ &b_{s}=\xi_{s}^{2}+{\log p}\log\xi_{s}-c\log n-1+b_{s+1}\text{ and }b_{K+1}=0\text{ for }s=2,\ldots,K.\end{split} (25)

We propose two gap conditions for EDA and EDB, respectively, i.e.,

mins=2,โ€ฆ,Kโกas\displaystyle\min\limits_{s=2,\ldots,K}a_{s} >\displaystyle> 0,\displaystyle 0, (26)
mins=2,โ€ฆ,Kโกbs\displaystyle\min\limits_{s=2,\ldots,K}b_{s} >\displaystyle> 0.\displaystyle 0. (27)
Remark 7.

The gap condition in Bai et al. (2018) was proposed for a population covariance matrix with distant spikes larger than one and all other eigenvalues equal to one. In contrast, the model studied in this paper imposes no restriction on the non-spiked eigenvalues, so the gap conditions in (26) and (27) are more easily satisfied and have a wider range of applications.

Note that Theorem 2 and (24) are obtained under the assumption that the leading eigenvalues are bounded. Here we also investigate the case where the leading eigenvalues tend to infinity.

Lemma 1.

In the same setup as Theorem 2, instead of assuming that \gamma_{1} is bounded, suppose that \gamma_{K}\rightarrow\infty as n\rightarrow\infty. Then, for any k=1,\ldots,K, we have

\lim_{n\rightarrow\infty}\lambda_{k}/\gamma_{k}=1\quad\text{a.s.}

Based on Theorem 2 and Lemma 1, we derive the consistency of K^EDA\hat{K}_{\text{EDA}} as follows.

Theorem 4.

Under conditions of Theorem 2, if the gap condition (26) does not hold, then K^EDA\hat{K}_{\text{EDA}} is not consistent; if the gap condition holds, then K^EDA\hat{K}_{\text{EDA}} is strongly consistent.
In particular, if ฮณK\gamma_{K} tends to infinity, then K^EDA\hat{K}_{\text{EDA}} is strongly consistent.

In Bai et al. (2018), BIC is consistent only when \lambda_{K}\rightarrow\infty at a rate faster than \log n, which makes BIC less capable of detecting signals. This is because BIC has a stricter penalty coefficient \log n compared with the penalty coefficient 2 in AIC. In the EDB criterion for selecting the number of clusters, we add the coefficient \log p to the first term so that the spikes do not need to be very large and only the corresponding gap condition for EDB is required. By a proof strategy analogous to that of Theorem 4, we obtain the consistency of EDB as follows.

Theorem 5.

Under the same setting of Theorem 4, if the gap condition (27) does not hold, then K^EDB\hat{K}_{\text{EDB}} is not consistent; if (27) holds, then K^EDB\hat{K}_{\text{EDB}} is strongly consistent. Moreover, if ฮณK\gamma_{K} tends to infinity, then K^EDB\hat{K}_{\text{EDB}} is strongly consistent.

Once an estimator of the number of clusters is available, we can conduct spectral clustering. Specifically, let the eigenvectors corresponding to the first \hat{K} eigenvalues of \tilde{\mathbf{S}}_{n} be \widehat{\mathbf{U}}=[\hat{\mathbf{u}}_{1},\ldots,\hat{\mathbf{u}}_{\hat{K}}]\in\mathbb{R}^{n\times\hat{K}}. We then apply the following K-means optimization to \widehat{\mathbf{U}}, i.e.,

\mathbf{U}^{*}=\arg\min_{U\in\mathcal{M}_{n,\hat{K}}}\|U-\widehat{\mathbf{U}}\|_{F}^{2}, (28)

where \mathcal{M}_{n,K}=\{U\in\mathbb{R}^{n\times K}:U\text{ has at most $K$ distinct rows}\}. Then we return \hat{\mathcal{V}}_{1},\ldots,\hat{\mathcal{V}}_{\hat{K}} as the index sets of the clusters. From (28), we see that spectral clustering is conducted on the obtained \widehat{\mathbf{U}}, and hence we study the properties of \widehat{\mathbf{U}}.
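In practice, the combinatorial optimization in (28) is carried out approximately by Lloyd's K-means algorithm. A minimal sketch of the pipeline (ours; scikit-learn's KMeans is used as a stand-in for the exact minimizer in (28)) is given below.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(X, K_hat, seed=0):
    """Spectral clustering as described around (28): take the eigenvectors of
    S_tilde_n = X^T X associated with its K_hat largest eigenvalues and run
    K-means on their rows."""
    S_tilde = X.T @ X                                    # the n x n matrix in (3)
    _, eigvecs = np.linalg.eigh(S_tilde)                 # eigenvalues in ascending order
    U_hat = eigvecs[:, -K_hat:]                          # hat{U} in R^{n x K_hat}
    km = KMeans(n_clusters=K_hat, n_init=10, random_state=seed).fit(U_hat)
    return km.labels_                                    # estimated cluster memberships
```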

Corollary 1.

Under the conditions of Theorem 1, among all deterministic unit vectors \mathbf{u}_{n}, \mathbf{u}^{*}=\mathbf{A}_{n}^{\top}\xi_{k}/\|\mathbf{A}_{n}^{\top}\xi_{k}\| maximizes the non-random term \gamma_{k}^{-1}\eta_{k}\mathbf{u}_{n}^{\top}\mathbf{A}_{n}^{\top}\xi_{k}\xi_{k}^{\top}\mathbf{A}_{n}\mathbf{u}_{n} in (16), and

โ€–(๐ฎ^kโŠคโ€‹๐ฎโˆ—)โ€‹๐ฎโˆ—โˆ’๐ฎ^kโ€–2=1โˆ’ฮทkโ€‹(1โˆ’ฮพkโŠคโ€‹๐šบโ€‹ฮพkฮณk)+OPโ€‹(1n).\left\|(\hat{\mathbf{u}}_{k}^{\top}\mathbf{u}^{*})\mathbf{u}^{*}-\hat{\mathbf{u}}_{k}\right\|^{2}=1-\eta_{k}\left(1-\frac{\xi_{k}^{\top}\bm{\Sigma}\xi_{k}}{\gamma_{k}}\right)+O_{P}\left(\frac{1}{\sqrt{n}}\right). (29)

Moreover, let ๐”^r\widehat{\mathbf{U}}_{r} be the eigenvectors corresponding to the largest rr eigenvalues of ๐’~n\tilde{\mathbf{S}}_{n}, where rโ‰คKr\leq K. For any deterministic ๐•r\mathbf{V}_{r} that contains rr column vectors of unit length, we have

infฮ›โˆˆโ„rร—rโ€–๐•rโ€‹ฮ›โˆ’๐”^rโ€–F2=rโˆ’trโ€‹(๐•rโŠคโ€‹๐”^rโ€‹๐”^rโŠคโ€‹๐•r)=rโˆ’trโ€‹(๐•rโŠคโ€‹๐€โŠคโ€‹๐Rโ€‹๐€๐•r)+OPโ€‹(1n),\inf_{\Lambda\in\mathbb{R}^{r\times r}}\|\mathbf{V}_{r}\Lambda-\hat{\mathbf{U}}_{r}\|_{F}^{2}=r-\mbox{tr}\left(\mathbf{V}_{r}^{\top}\hat{\mathbf{U}}_{r}\hat{\mathbf{U}}_{r}^{\top}\mathbf{V}_{r}\right)=r-\mbox{tr}\left(\mathbf{V}_{r}^{\top}\mathbf{A}^{\top}\mathbf{P}_{R}\mathbf{A}\mathbf{V}_{r}\right)+O_{P}\left(\frac{1}{\sqrt{n}}\right), (30)

where

๐R=โˆ‘k=1rฮทkฮณkโ€‹ฮพkโ€‹ฮพkโŠค.\mathbf{P}_{R}=\sum_{k=1}^{r}\frac{\eta_{k}}{\gamma_{k}}\xi_{k}\xi_{k}^{\top}.
Remark 8.

From Corollary 1, we see that if \gamma_{k} tends to infinity and \gamma_{i-1}/\gamma_{i}>1+\delta for 1\leq i\leq K, with \delta a positive constant independent of n, then \eta_{k}\rightarrow 1 and thus the right-hand side of (29) converges to zero in probability. Consequently, \hat{\mathbf{u}}_{k} is an asymptotically consistent estimator of \mathbf{A}_{n}^{\top}\xi_{k}/\|\mathbf{A}_{n}^{\top}\xi_{k}\|. Note that \mathbf{A}_{n}=\mathbf{N}\mathbf{H}^{\top}, which has K distinct columns representing the K different means. Hence, under mild conditions, there are K distinct rows in \widehat{\mathbf{U}}, and one can use it to identify the corresponding clusters. When \gamma_{k} is bounded, \hat{\mathbf{u}}_{k} is not a consistent estimator of the block-wise constant vector \mathbf{A}_{n}^{\top}\xi_{k}/\|\mathbf{A}_{n}^{\top}\xi_{k}\|\in\mathbb{R}^{n}. However, in this case, following the proof of Theorem 2.2 in Jin (2015), an elementary misclustering error rate for spectral clustering can also be obtained, which is a new observation based on the proposed results.

4 Simulation

In this section, we first evaluate the performance of the proposed criteria for estimating the number of clusters discussed in Section 3. Denote the sets of under-estimated, exactly estimated and over-estimated models by \mathcal{F}_{-},\mathcal{F}_{*} and \mathcal{F}_{+}, respectively, i.e.,

โ„ฑโˆ’={1,โ€ฆ,Kโˆ’1},โ„ฑโˆ—={K},โ„ฑ+={K+1,โ€ฆ,w}.\mathcal{F}_{-}=\{1,\ldots,K-1\},~{}~{}\mathcal{F}_{*}=\{K\},~{}~{}\mathcal{F}_{+}=\{K+1,\ldots,w\}.

The selection percentages corresponding to \mathcal{F}_{-},\mathcal{F}_{*} and \mathcal{F}_{+} are computed over 1000 repetitions. Suppose that the entries of \mathbf{W}_{n} are i.i.d. with the following distributions:

  • โ€ข

    Standard normal distribution: wi,jโˆผ๐’ฉโ€‹(0,1)w_{i,j}\sim\mathcal{N}(0,1).

  • โ€ข

    Standardized tt distribution with 8 degrees of freedom: wi,jโˆผt8/Varโ€‹(t8)w_{i,j}\sim t_{8}/\sqrt{\text{Var}(t_{8})}.

  • โ€ข

    Standardized Bernoulli distribution with probability 1/21/2: wi,jโˆผ(Bernoulliโ€‹(1,1/2)โˆ’1/2)/(1/2)w_{i,j}\sim(\text{Bernoulli}(1,1/2)-1/2)/(1/2).

  • โ€ข

    Standardized chi-square distribution with 3 degrees of freedom: wi,jโˆผ(ฯ‡2โ€‹(3)โˆ’3)/Varโ€‹(ฯ‡2โ€‹(3))=(ฯ‡2โ€‹(3)โˆ’3)/6w_{i,j}\sim(\chi^{2}(3)-3)/\sqrt{\text{Var}(\chi^{2}(3))}=(\chi^{2}(3)-3)/\sqrt{6}

For comparison, three other methods are also considered: the Average Silhouette Index (Rousseeuw, 1987), the Gap Statistic (Tibshirani et al., 2001) and BIC with degrees of freedom (David, 2020), denoted by ASI, GS and BICdf, respectively. This section considers the situation 0<c<1; the cases with c>1 are demonstrated in the supplementary material. Here we set c=1/3,1/2,3/4 and the largest number of possible clusters w=\lfloor 6\cdot n^{0.1}\rfloor. The cluster means and the covariance matrices are set as follows:

Case 1. Let ๐1=(5,0,โˆ’4,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{1}=(5,0,-4,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐2=(0,4,0,โˆ’6,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{2}=(0,4,0,-6,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐3=(0,โˆ’5,โˆ’5,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{3}=(0,-5,-5,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐4=(โˆ’6,0,0,6,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{4}=(-6,0,0,6,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, and ๐šบ=(ฯƒi,j)pร—p\bm{\Sigma}=(\sigma_{i,j})_{p\times p}, where ฯƒi,j=0.2|iโˆ’j|\sigma_{i,j}=0.2^{|i-j|}. Define

๐€n=(๐1,โ€ฆ,๐1โŸn1,๐2,โ€ฆ,๐2โŸn2,๐3,โ€ฆ,๐3โŸn3,๐4,โ€ฆ,๐4โŸn4),\mathbf{A}_{n}=\big{(}\underbrace{\bm{\mu}_{1},\ldots,\bm{\mu}_{1}}_{n_{1}},\underbrace{\bm{\mu}_{2},\ldots,\bm{\mu}_{2}}_{n_{2}},\underbrace{\bm{\mu}_{3},\ldots,\bm{\mu}_{3}}_{n_{3}},\underbrace{\bm{\mu}_{4},\ldots,\bm{\mu}_{4}}_{n_{4}}\big{)},

where n1=n3=0.3โ€‹n,n2=n4=0.2โ€‹nn_{1}=n_{3}=0.3n,~{}n_{2}=n_{4}=0.2n. Therefore, the true number of clusters is K=4K=4.

Case 2. Let ๐1=(3,0,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{1}=(3,0,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐2=(0,3,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{2}=(0,3,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐3=(0,0,3,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{3}=(0,0,3,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐šบ=๐ˆ\bm{\Sigma}=\mathbf{I}, where ๐ˆ\mathbf{I} is the identity matrix of size pp. Then,

๐€n=(๐1,โ€ฆ,๐1โŸn1,๐2,โ€ฆ,๐2โŸn2,๐3,โ€ฆ,๐3โŸn3),\mathbf{A}_{n}=\big{(}\underbrace{\bm{\mu}_{1},\ldots,\bm{\mu}_{1}}_{n_{1}},\underbrace{\bm{\mu}_{2},\ldots,\bm{\mu}_{2}}_{n_{2}},\underbrace{\bm{\mu}_{3},\ldots,\bm{\mu}_{3}}_{n_{3}}\big{)},

where n1=n2=0.3โ€‹n,n3=0.4โ€‹nn_{1}=n_{2}=0.3n,~{}n_{3}=0.4n. Therefore, the true number of clusters is K=3K=3.

Case 3. The same setting as in the above Case 2 with ๐šบ=(ฯƒi,j)pร—p\bm{\Sigma}=(\sigma_{i,j})_{p\times p} instead of ๐ˆ\mathbf{I}, where ฯƒi,j=0.2|iโˆ’j|\sigma_{i,j}=0.2^{|i-j|}.

The spikes in the above cases are bounded. We also consider a case of spikes with ฮณKโ†’โˆž\gamma_{K}\rightarrow\infty at a rate faster than logโกn\log n and ฮณ1=Oโ€‹(p)\gamma_{1}=O(p).

Case 4. Let \bm{\mu}_{1}=(2a,a,-a,a,1,\ldots,1)^{\top}\in\mathbb{R}^{p}, \bm{\mu}_{2}=(a,a,2a,-3a,1,\ldots,1)^{\top}\in\mathbb{R}^{p}, \bm{\mu}_{3}=(a,-2a,-a,a,1,\ldots,1)^{\top}\in\mathbb{R}^{p}, \bm{\mu}_{4}=(-2a,a,a,a,1,\ldots,1)^{\top}\in\mathbb{R}^{p}, where a=\sqrt{p/10}, and let the sample sizes of the clusters corresponding to these centers be n_{1}=n_{3}=0.3n, n_{2}=n_{4}=0.2n, so that the true number of clusters is K=4. Suppose \bm{\Sigma}=(\sigma_{i,j})_{p\times p} with \sigma_{i,j}=0.2^{|i-j|}.
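Putting the pieces together, the following sketch (ours) generates one replication of Case 1, applies the criteria in (22)-(23) and then clusters the columns; the 1/\sqrt{n} scaling of the means follows (6), and the helper functions estimate_K and spectral_cluster are those from the earlier sketches.

```python
import numpy as np

def case1_replication(p, n, seed=0):
    """One replication of Case 1: four sparse means, AR(1) noise with
    sigma_{ij} = 0.2^{|i-j|}, and group sizes (0.3, 0.2, 0.3, 0.2) * n."""
    rng = np.random.default_rng(seed)
    mus = np.zeros((p, 4))
    mus[[0, 2], 0] = [5, -4]                             # mu_1
    mus[[1, 3], 1] = [4, -6]                             # mu_2
    mus[[1, 2], 2] = [-5, -5]                            # mu_3
    mus[[0, 3], 3] = [-6, 6]                             # mu_4
    sizes = [int(0.3 * n), int(0.2 * n), int(0.3 * n), 0]
    sizes[3] = n - sum(sizes[:3])
    labels = np.repeat(np.arange(4), sizes)
    A = mus[:, labels] / np.sqrt(n)                      # scaling as in (6)
    Sigma = 0.2 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_half = evecs @ (np.sqrt(evals)[:, None] * evecs.T)
    X = A + Sigma_half @ rng.standard_normal((p, n)) / np.sqrt(n)
    return X, labels

# illustrative run with c = 1/3
p, n = 150, 450
X, labels = case1_replication(p, n)
lam = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]         # eigenvalues of S_n
w = int(np.floor(6 * n ** 0.1))                          # largest candidate number of clusters
K_eda, K_edb = estimate_K(lam, n, w)
clusters_hat = spectral_cluster(X, K_eda)
```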

Tables 1 to 4 report the percentages of under-estimation, exact estimation and over-estimation over 1000 replications. From the reported results, we see that the criteria based on EDA and EDB perform increasingly well as n and p become larger. When c=1/3, the probabilities of under-estimating the number of clusters are equal to 0, and they increase as c gets closer to 1. From (25), the larger c is, the harder the gap conditions are to satisfy. EDB generally outperforms EDA except in the case c=3/4 when p and n are large. It can be seen that when c=3/4, as n increases, the probability of \mathcal{F}_{-} under EDB becomes larger and is uniformly greater than that under EDA. This is because the coefficient in the penalty term of the EDB criterion is \log n, rather than the coefficient 2 in EDA, so the gap condition of EDB is stronger than that of EDA; that is, (27) is more difficult to satisfy than (26). The criteria based on EDA and EDB show the highest accuracy under the Bernoulli distribution, followed by the normal, t_{8} and \chi^{2}(3) distributions, the last of which has a relatively heavy right tail that may degrade the results.

Table 1: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 1
EDA EDB ASI GS BICdf EDA EDB ASI GS BICdf
cc nn ๐’ฉโ€‹(0,1)\mathcal{N}(0,1) t8t_{8}
13\frac{1}{3} 180180 โ„ฑโˆ’\mathcal{F}_{-} 0 0 69.169.1 32.732.7 1.11.1 0 0 68.468.4 31.431.4 0.90.9
โ„ฑโˆ—\mathcal{F}_{*} 59.859.8 83.483.4 30.730.7 60.360.3 67.767.7 57.457.4 78.578.5 3131 61.461.4 67.167.1
โ„ฑ+\mathcal{F}_{+} 40.240.2 16.616.6 0.20.2 77 31.231.2 42.642.6 21.521.5 0.60.6 7.27.2 3232
450450 โ„ฑโˆ’\mathcal{F}_{-} 0 0 7575 24.724.7 1.61.6 0 0 73.673.6 26.126.1 1.31.3
โ„ฑโˆ—\mathcal{F}_{*} 93.193.1 98.998.9 24.824.8 66.466.4 71.971.9 94.194.1 99.299.2 26.426.4 64.664.6 70.570.5
โ„ฑ+\mathcal{F}_{+} 6.96.9 1.11.1 0.20.2 8.98.9 26.526.5 5.95.9 0.80.8 0 9.39.3 28.228.2
12\frac{1}{2} 120120 โ„ฑโˆ’\mathcal{F}_{-} 0.20.2 0.90.9 70.970.9 32.532.5 9.59.5 0.80.8 1.31.3 70.770.7 32.732.7 88
โ„ฑโˆ—\mathcal{F}_{*} 68.868.8 83.683.6 28.528.5 61.161.1 67.267.2 67.467.4 81.781.7 28.628.6 61.661.6 66.166.1
โ„ฑ+\mathcal{F}_{+} 3131 15.515.5 0.60.6 6.46.4 23.323.3 31.831.8 1717 0.70.7 5.75.7 25.925.9
300300 โ„ฑโˆ’\mathcal{F}_{-} 0 0.30.3 74.874.8 25.725.7 18.218.2 0 0.40.4 72.272.2 24.824.8 1515
โ„ฑโˆ—\mathcal{F}_{*} 9696 99.199.1 24.824.8 67.467.4 68.368.3 97.497.4 99.399.3 27.627.6 66.266.2 69.769.7
โ„ฑ+\mathcal{F}_{+} 44 0.60.6 0.40.4 6.96.9 13.513.5 2.62.6 0.30.3 0.20.2 99 15.315.3
34\frac{3}{4} 8080 โ„ฑโˆ’\mathcal{F}_{-} 5.65.6 10.910.9 70.270.2 34.734.7 17.917.9 6.66.6 11.911.9 7070 34.834.8 19.319.3
โ„ฑโˆ—\mathcal{F}_{*} 72.572.5 77.977.9 28.628.6 59.859.8 66.666.6 68.768.7 7575 28.128.1 59.359.3 65.665.6
โ„ฑ+\mathcal{F}_{+} 21.921.9 11.211.2 1.21.2 5.55.5 15.515.5 24.724.7 13.113.1 1.91.9 5.95.9 15.115.1
200200 โ„ฑโˆ’\mathcal{F}_{-} 6.66.6 1515 76.176.1 29.629.6 30.930.9 99 17.917.9 75.175.1 2929 26.626.6
โ„ฑโˆ—\mathcal{F}_{*} 9191 84.684.6 23.523.5 6464 62.862.8 88.288.2 81.681.6 24.524.5 64.964.9 6666
โ„ฑ+\mathcal{F}_{+} 2.42.4 0.40.4 0.40.4 6.46.4 6.36.3 2.82.8 0.50.5 0.40.4 6.16.1 7.47.4
cc nn Bernoulli ฯ‡2โ€‹(3)\chi^{2}(3)
13\frac{1}{3} 180180 โ„ฑโˆ’\mathcal{F}_{-} 0 0 71.971.9 30.330.3 1.11.1 0 0 65.765.7 29.329.3 1.11.1
โ„ฑโˆ—\mathcal{F}_{*} 64.264.2 82.482.4 27.527.5 60.960.9 65.165.1 52.452.4 75.275.2 32.932.9 62.262.2 68.268.2
โ„ฑ+\mathcal{F}_{+} 35.835.8 17.617.6 0.60.6 8.88.8 33.833.8 47.647.6 24.824.8 1.41.4 8.58.5 30.730.7
450450 โ„ฑโˆ’\mathcal{F}_{-} 0 0 75.775.7 26.626.6 2.32.3 0 0 68.368.3 27.127.1 1.11.1
โ„ฑโˆ—\mathcal{F}_{*} 96.696.6 98.898.8 24.224.2 65.765.7 66.966.9 93.193.1 9999 31.331.3 6464 7070
โ„ฑ+\mathcal{F}_{+} 3.43.4 1.21.2 0.10.1 7.77.7 30.830.8 6.96.9 11 0.40.4 8.98.9 28.928.9
12\frac{1}{2} 120120 โ„ฑโˆ’\mathcal{F}_{-} 0.20.2 0.50.5 71.871.8 31.331.3 8.88.8 0.70.7 2.42.4 69.569.5 32.132.1 8.18.1
โ„ฑโˆ—\mathcal{F}_{*} 72.372.3 85.285.2 27.727.7 62.262.2 68.568.5 61.561.5 7777 2929 6262 68.568.5
โ„ฑ+\mathcal{F}_{+} 27.527.5 14.314.3 0.50.5 6.56.5 22.722.7 37.837.8 20.620.6 1.51.5 5.95.9 23.423.4
300300 โ„ฑโˆ’\mathcal{F}_{-} 0.10.1 0.10.1 7676 26.726.7 16.316.3 0.10.1 1.11.1 69.569.5 23.423.4 1313
โ„ฑโˆ—\mathcal{F}_{*} 97.597.5 99.499.4 23.923.9 64.364.3 71.571.5 96.696.6 98.298.2 30.230.2 67.767.7 70.170.1
โ„ฑ+\mathcal{F}_{+} 2.42.4 0.50.5 0.10.1 99 12.212.2 3.33.3 0.70.7 0.30.3 8.98.9 16.916.9
34\frac{3}{4} 8080 โ„ฑโˆ’\mathcal{F}_{-} 4.44.4 66 68.968.9 33.833.8 19.919.9 7.77.7 13.913.9 67.767.7 33.833.8 17.717.7
โ„ฑโˆ—\mathcal{F}_{*} 74.874.8 83.183.1 30.130.1 62.462.4 66.266.2 64.264.2 69.369.3 29.529.5 60.860.8 67.767.7
โ„ฑ+\mathcal{F}_{+} 20.820.8 10.910.9 11 3.83.8 13.913.9 28.128.1 16.816.8 2.82.8 5.45.4 14.614.6
200200 โ„ฑโˆ’\mathcal{F}_{-} 5.85.8 12.712.7 75.675.6 28.728.7 30.830.8 10.510.5 19.919.9 71.471.4 28.728.7 28.828.8
โ„ฑโˆ—\mathcal{F}_{*} 92.692.6 8787 24.224.2 65.465.4 63.963.9 86.486.4 79.979.9 2828 65.765.7 63.463.4
โ„ฑ+\mathcal{F}_{+} 1.61.6 0.30.3 0.20.2 5.95.9 5.35.3 3.13.1 0.20.2 0.60.6 5.65.6 7.87.8
Table 2: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 2
EDA EDB ASI GS BICdf EDA EDB ASI GS BICdf
cc nn ๐’ฉโ€‹(0,1)\mathcal{N}(0,1) t8t_{8}
13\frac{1}{3} 180180 โ„ฑโˆ’\mathcal{F}_{-} 0 0 6.16.1 64.864.8 34.734.7 0 0 5.35.3 80.280.2 34.834.8
โ„ฑโˆ—\mathcal{F}_{*} 80.480.4 95.395.3 93.993.9 35.235.2 59.859.8 78.478.4 91.191.1 94.294.2 19.819.8 59.359.3
โ„ฑ+\mathcal{F}_{+} 19.619.6 4.74.7 0 0 5.55.5 21.621.6 8.98.9 0.50.5 0 5.95.9
450450 โ„ฑโˆ’\mathcal{F}_{-} 0 0 3.43.4 41.941.9 98.598.5 0 0 4.34.3 72.572.5 98.398.3
โ„ฑโˆ—\mathcal{F}_{*} 99.499.4 100100 96.696.6 58.158.1 1.51.5 98.998.9 99.899.8 95.695.6 27.527.5 1.71.7
โ„ฑ+\mathcal{F}_{+} 0.60.6 0 0 0 0 1.11.1 0.20.2 0.10.1 0 0
12\frac{1}{2} 120120 โ„ฑโˆ’\mathcal{F}_{-} 0 0 11.211.2 97.197.1 9898 0 0 12.112.1 98.898.8 98.598.5
โ„ฑโˆ—\mathcal{F}_{*} 83.483.4 93.793.7 88.188.1 2.92.9 22 8282 91.391.3 86.186.1 1.21.2 1.51.5
โ„ฑ+\mathcal{F}_{+} 16.616.6 6.36.3 0.70.7 0 0 1818 8.78.7 1.81.8 0 0
300300 โ„ฑโˆ’\mathcal{F}_{-} 0 0 5.75.7 97.197.1 100100 0 0 7.97.9 98.698.6 100100
โ„ฑโˆ—\mathcal{F}_{*} 99.399.3 100100 94.394.3 2.92.9 0 9999 99.899.8 91.891.8 1.41.4 0
โ„ฑ+\mathcal{F}_{+} 0.70.7 0 0 0 0 11 0.20.2 0.30.3 0 0
34\frac{3}{4} 8080 โ„ฑโˆ’\mathcal{F}_{-} 1.11.1 2.82.8 22.122.1 100100 100100 1.31.3 2.72.7 21.521.5 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 86.986.9 91.191.1 76.176.1 0 0 82.482.4 86.686.6 73.673.6 0 0
โ„ฑ+\mathcal{F}_{+} 1212 6.16.1 1.81.8 0 0 16.316.3 10.710.7 4.94.9 0 0
200200 โ„ฑโˆ’\mathcal{F}_{-} 0.10.1 0.10.1 13.213.2 100100 100100 0 0.40.4 13.313.3 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 99.199.1 99.899.8 86.486.4 0 0 99.499.4 99.699.6 85.585.5 0 0
โ„ฑ+\mathcal{F}_{+} 0.80.8 0.10.1 0.40.4 0 0 0.60.6 0 1.21.2 0 0
cc nn Bernoulli ฯ‡2โ€‹(3)\chi^{2}(3)
13\frac{1}{3} 180180 โ„ฑโˆ’\mathcal{F}_{-} 0 0 2.82.8 23.823.8 33.833.8 0 0 7.47.4 88.488.4 33.833.8
โ„ฑโˆ—\mathcal{F}_{*} 84.684.6 94.394.3 97.197.1 76.276.2 62.462.4 71.971.9 87.787.7 86.786.7 11.611.6 60.860.8
โ„ฑ+\mathcal{F}_{+} 15.415.4 5.75.7 0.10.1 0 3.83.8 28.128.1 12.312.3 5.95.9 0 5.45.4
450450 โ„ฑโˆ’\mathcal{F}_{-} 0 0 1.11.1 8.58.5 98.598.5 0 0 4.14.1 85.685.6 98.998.9
โ„ฑโˆ—\mathcal{F}_{*} 99.599.5 100100 98.998.9 91.591.5 1.51.5 98.498.4 99.999.9 95.295.2 14.414.4 1.11.1
โ„ฑ+\mathcal{F}_{+} 0.50.5 0 0 0 0 1.61.6 0.10.1 0.70.7 0 0
12\frac{1}{2} 120120 โ„ฑโˆ’\mathcal{F}_{-} 0 0 1111 9191 99.699.6 0.10.1 0 13.313.3 99.699.6 98.398.3
โ„ฑโˆ—\mathcal{F}_{*} 88.388.3 95.495.4 8989 99 0.40.4 74.574.5 84.484.4 78.778.7 0.40.4 1.11.1
โ„ฑ+\mathcal{F}_{+} 11.711.7 4.64.6 0 0 0 25.425.4 15.615.6 88 0 0
300300 โ„ฑโˆ’\mathcal{F}_{-} 0 0 33 85.485.4 100100 0 0 9.29.2 99.899.8 98.398.3
โ„ฑโˆ—\mathcal{F}_{*} 99.899.8 100100 9797 14.614.6 0 99.599.5 100100 89.889.8 0.20.2 1.71.7
โ„ฑ+\mathcal{F}_{+} 0.20.2 0 0 0 0 0.50.5 0 11 0 0
34\frac{3}{4} 8080 โ„ฑโˆ’\mathcal{F}_{-} 0.40.4 11 23.523.5 100100 99.999.9 5.45.4 77 23.423.4 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 90.290.2 94.494.4 76.476.4 0 0.10.1 7575 79.679.6 58.158.1 0 0
โ„ฑ+\mathcal{F}_{+} 9.49.4 4.64.6 0.10.1 0 0 19.619.6 13.413.4 18.518.5 0 0
200200 โ„ฑโˆ’\mathcal{F}_{-} 0 0.20.2 7.77.7 100100 100100 0.30.3 1.11.1 20.120.1 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 99.799.7 99.899.8 92.292.2 0 0 99.199.1 98.998.9 76.476.4 0 0
โ„ฑ+\mathcal{F}_{+} 0.30.3 0 0.10.1 0 0 0.60.6 0 3.53.5 0 0
Table 3: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 3
EDA EDB ASI GS BICdf EDA EDB ASI GS BICdf
cc nn ๐’ฉโ€‹(0,1)\mathcal{N}(0,1) t8t_{8}
13\frac{1}{3} 180180 โ„ฑโˆ’\mathcal{F}_{-} 0 0 5.45.4 69.469.4 92.392.3 0 0 3.83.8 82.4 89.489.4
โ„ฑโˆ—\mathcal{F}_{*} 58.158.1 81.281.2 94.494.4 30.630.6 7.77.7 5858 78.678.6 9595 17.617.6 10.610.6
โ„ฑ+\mathcal{F}_{+} 41.941.9 18.818.8 0.20.2 0 0 4242 21.421.4 1.21.2 0 0
450450 โ„ฑโˆ’\mathcal{F}_{-} 0 0 2.72.7 60.960.9 99.299.2 0 0 3.53.5 77.677.6 9999
โ„ฑโˆ—\mathcal{F}_{*} 93.293.2 99.199.1 97.397.3 39.139.1 0.80.8 93.793.7 99.699.6 96.396.3 22.422.4 11
โ„ฑ+\mathcal{F}_{+} 6.86.8 0.90.9 0 0 0 6.36.3 0.40.4 0.20.2 0 0
12\frac{1}{2} 120120 โ„ฑโˆ’\mathcal{F}_{-} 0.30.3 0.40.4 9.39.3 98.698.6 98.898.8 0.10.1 0.70.7 11.911.9 99.899.8 99.399.3
โ„ฑโˆ—\mathcal{F}_{*} 69.669.6 82.282.2 89.689.6 1.41.4 1.21.2 68.168.1 80.680.6 8585 0.20.2 0.70.7
โ„ฑ+\mathcal{F}_{+} 30.130.1 17.417.4 1.11.1 0 0 31.831.8 18.718.7 3.13.1 0 0
300300 โ„ฑโˆ’\mathcal{F}_{-} 0 0 4.64.6 99.599.5 100100 0 0 8.28.2 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 95.995.9 99.299.2 95.395.3 0.50.5 0 96.796.7 99.699.6 91.391.3 0 0
โ„ฑ+\mathcal{F}_{+} 4.14.1 0.80.8 0.10.1 0 0 3.33.3 0.40.4 0.50.5 0 0
34\frac{3}{4} 8080 โ„ฑโˆ’\mathcal{F}_{-} 3.53.5 8.48.4 17.217.2 100100 100100 5.55.5 10.410.4 19.919.9 100100 99.999.9
โ„ฑโˆ—\mathcal{F}_{*} 72.772.7 79.679.6 7979 0 0 7171 73.373.3 72.772.7 0 0.10.1
โ„ฑ+\mathcal{F}_{+} 23.823.8 1212 3.83.8 0 0 23.523.5 16.316.3 7.47.4 0 0
200200 โ„ฑโˆ’\mathcal{F}_{-} 11 6.96.9 1111 100100 100100 2.12.1 8.48.4 14.214.2 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 96.296.2 92.892.8 88.288.2 0 0 9595 91.391.3 84.384.3 0 0
โ„ฑ+\mathcal{F}_{+} 2.82.8 0.30.3 0.80.8 0 0 2.92.9 0.30.3 1.51.5 0 0
cc nn Bernoulli ฯ‡2โ€‹(3)\chi^{2}(3)
13\frac{1}{3} 180180 โ„ฑโˆ’\mathcal{F}_{-} 0 0 2.12.1 39.639.6 90.290.2 0 0 77 90.290.2 87.687.6
โ„ฑโˆ—\mathcal{F}_{*} 61.261.2 82.482.4 97.897.8 60.460.4 9.89.8 51.851.8 75.175.1 87.687.6 9.89.8 12.412.4
โ„ฑ+\mathcal{F}_{+} 38.838.8 17.617.6 0.10.1 0 0 48.248.2 24.924.9 5.45.4 0 0
450450 โ„ฑโˆ’\mathcal{F}_{-} 0 0 0.50.5 36.736.7 99.399.3 0 0 4.14.1 89.389.3 9999
โ„ฑโˆ—\mathcal{F}_{*} 93.993.9 99.599.5 99.599.5 63.363.3 0.70.7 93.493.4 99.299.2 94.994.9 10.710.7 11
โ„ฑ+\mathcal{F}_{+} 6.16.1 0.50.5 0 0 0 6.66.6 0.80.8 11 0 0
12\frac{1}{2} 120120 โ„ฑโˆ’\mathcal{F}_{-} 0.30.3 0 7.17.1 9595 99.699.6 0.20.2 1.31.3 1515 99.899.8 98.298.2
โ„ฑโˆ—\mathcal{F}_{*} 71.871.8 87.587.5 92.792.7 55 0.40.4 59.359.3 74.674.6 75.575.5 0.20.2 1.81.8
โ„ฑ+\mathcal{F}_{+} 27.927.9 12.512.5 0.20.2 0 0 40.540.5 24.124.1 9.59.5 0 0
300300 โ„ฑโˆ’\mathcal{F}_{-} 0 0 1.61.6 96.996.9 100100 0 0.40.4 10.410.4 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 97.197.1 99.899.8 98.498.4 3.13.1 0 9595 9999 87.787.7 0 0
โ„ฑ+\mathcal{F}_{+} 2.92.9 0.20.2 0 0 0 55 0.60.6 1.91.9 0 0
34\frac{3}{4} 8080 โ„ฑโˆ’\mathcal{F}_{-} 3.23.2 5.75.7 18.318.3 100100 100100 10.110.1 15.115.1 23.223.2 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 7777 83.583.5 79.179.1 0 0 61.561.5 69.669.6 57.157.1 0 0
โ„ฑ+\mathcal{F}_{+} 19.819.8 10.810.8 2.62.6 0 0 28.428.4 15.315.3 19.619.6 0 0
200200 โ„ฑโˆ’\mathcal{F}_{-} 0.60.6 4.64.6 5.95.9 100100 100100 5.25.2 14.614.6 21.521.5 100100 100100
โ„ฑโˆ—\mathcal{F}_{*} 97.397.3 95.495.4 93.993.9 0 0 91.991.9 85.285.2 74.774.7 0 0
โ„ฑ+\mathcal{F}_{+} 2.12.1 0 0.20.2 0 0 2.92.9 0.20.2 3.8 0 0
Table 4: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 4
                 EDA   EDB   ASI   GS    BICdf |  EDA   EDB   ASI   GS    BICdf
c    n           N(0,1)                        |        t_8
1/3  180  F_-    0     0    70.4  25.1   0.2   |   0     0    70    28.1   0
          F_*   60.4  82.4  28.9  65.5  67.1   |  57.1  78.3  28.7  62.1  66.5
          F_+   39.6  17.6   0.7   9.4  32.7   |  42.9  21.7   1.3   9.8  33.5
     450  F_-    0     0    67.2  28.4   0     |   0     0    69.2  31.4   0.1
          F_*   94.8  99.6  29.8  61.2  62.7   |  93.5  99    27.8  58    63
          F_+    5.2   0.4   3    10.4  37.3   |   6.5   1     3    10.6  36.9
1/2  120  F_-    0     0    71.2  28.3   1.8   |   0     0    69.2  28.8   2.5
          F_*   70.4  83.1  27.3  63.4  70.8   |  62.9  83.4  29.8  63.4  66.2
          F_+   29.6  16.9   1.5   8.3  27.4   |  37.1  16.6   1     7.8  31.3
     300  F_-    0     0    66.7  28.3   0     |   0     0    65.5  26.7   0
          F_*   96.7  99.7  30.5  59.5  59.6   |  96.4  99.8  31.1  62.2  61.7
          F_+    3.3   0.3   2.8  12.2  40.4   |   3.6   0.2   3.4  11.1  38.3
3/4   80  F_-    0     0    67.1  30.6  13.9   |   0     0    69.5  36.2  13
          F_*   75    84.4  31.4  61.2  68.4   |  68.8  78.4  29.5  56.8  68.9
          F_+   25    15.6   1.5   8.2  17.7   |  31.2  21.6   1     7    18.1
     200  F_-    0     0    68.7  31.1   0.2   |   0     0    67.7  30.1   0
          F_*   97    99.7  28.1  58.8  63     |  96    99.3  29.5  59.8  60.1
          F_+    3     0.3   3.2  10.1  36.8   |   4     0.7   2.8  10.1  39.9
c    n           Bernoulli                     |        χ²(3)
1/3  180  F_-    0     0    67.5  24.3   0     |   0     0    66.4  24.2   0.1
          F_*   60.7  83.6  32.2  66.9  66.3   |  49.5  75.9  31.7  67.5  66.6
          F_+   39.3  16.4   0.3   8.8  33.7   |  50.5  24.1   1.9   8.1  33.3
     450  F_-    0     0    70.7  29.3   0     |   0     0    67.1  30.1   0
          F_*   95.1  99.4  26.6  58.3  62.4   |  91.9  99.4  29.3  56.6  60.9
          F_+    4.9   0.6   2.7  12.4  37.6   |   8.1   0.6   3.6  13.3  39.1
1/2  120  F_-    0     0    72.6  29.2   1.9   |   0     0    68.1  30.2   1.9
          F_*   72.6  85    25.6  63.6  66.7   |  62.2  74.7  29.3  61.5  66.6
          F_+   27.4  15     1.8   7.2  31.4   |  37.8  25.3   2.6   8.3  31.5
     300  F_-    0     0    69.3  28.9   0.1   |   0     0    66.9  29.3   0
          F_*   96.9  99.7  28    60.7  59.7   |  94.5  98.9  28.9  59.8  61.6
          F_+    3.1   0.3   2.7  10.4  40.2   |   5.5   1.1   4.2  10.9  38.4
3/4   80  F_-    0     0    70.9  31.7  14.4   |   0     0    65    35.5  15.1
          F_*   77.6  89.1  28.1  61.5  68.3   |  65.9  75.1  31.1  57.7  66
          F_+   22.4  10.9   1     6.8  17.3   |  34.1  24.9   3.9   6.8  18.9
     200  F_-    0     0    63.8  29.4   0.2   |   0     0    66.8  32.4   0.3
          F_*   97.7  99.7  32.4  60.1  61.2   |  96.1  99.2  30.2  57.2  59.1
          F_+    2.3   0.3   3.8  10.5  38.6   |   3.9   0.8   3    10.4  40.6

At the end of this section, we use a simple simulation to demonstrate the matching properties of the left spiked eigenvectors and spiked eigenvalues between a signal-plus-noise matrix and a sample covariance matrix, which have been discussed in Remarks 4 and 5. Let \mathbf{A}_{n}=U\Lambda V^{\top}\in\mathbb{R}^{p\times n}, where U has the two column vectors 2^{-1/2}(1,1,0,\cdots,0) and 2^{-1/2}(-1,1,0,\cdots,0), \Lambda=\mbox{diag}(3,2), V consists of the first two right singular vectors of a p\times n Gaussian matrix, and \bm{\Sigma}=(0.4^{|i-j|})+\mbox{diag}(0,0,6,0,\cdots,0)\in\mathbb{R}^{p\times p}. Let Model 1 be \mathbf{A}_{n}+\bm{\Sigma}^{1/2}\mathbf{W}_{n}, where \mathbf{W}_{n} consists of independent \mathcal{N}(0,1/n) entries, and let Model 2 be \mathbf{R}_{n}^{1/2}\mathbf{W}_{n}\mathbf{W}_{n}^{\top}\mathbf{R}_{n}^{1/2}, where \mathbf{R}_{n}=\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma} and \mathbf{W}_{n} is the same as in Model 1. There are three spiked eigenvalues satisfying Assumption 4. Table 5 reports the three largest eigenvalues of \mathbf{X}_{n}\mathbf{X}_{n}^{\top} and the associated eigenvector projections, averaged over 500 replications under Models 1 and 2, respectively.

Table 5: The first three eigenvalues and eigenvectors of \mathbf{X}_{n}\mathbf{X}_{n}^{\top}, where \mathbf{X}_{n} is generated by Models 1 and 2, averaged over 500 replications each, with v=(1,0,\cdots,0) (values in parentheses are standard deviations).
ฮป1\lambda_{1} ฮป2\lambda_{2} ฮป3\lambda_{3} (vโŠคโ€‹v^1)2(v^{\top}\hat{v}_{1})^{2} (vโŠคโ€‹v^2)2(v^{\top}\hat{v}_{2})^{2} (vโŠคโ€‹v^3)2(v^{\top}\hat{v}_{3})^{2}
๐—n=๐€n+๐šบ1/2โ€‹๐–n\mathbf{X}_{n}=\mathbf{A}_{n}+\bm{\Sigma}^{1/2}\mathbf{W}_{n} 11.122 7.574 5.238 0.447 0.040 0.377
(0.550) (0.633) (0.261) (0.052) (0.050) (0.053)
๐—n=๐‘n1/2โ€‹๐–nโ€‹(๐‘n=๐€nโ€‹๐€nโŠค+๐šบ)\mathbf{X}_{n}=\mathbf{R}_{n}^{1/2}\mathbf{W}_{n}(\mathbf{R}_{n}=\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma}) 11.114 7.583 5.212 0.444 0.047 0.371
(1.026) (0.640) (0.432) (0.089) (0.065) (0.080)

We observe that the first-order limits are almost the same for the two models. Moreover, their fluctuation behaviour may differ, as suggested by the different standard deviations in Table 5.
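
The comparison reported in Table 5 can be reproduced by a short simulation. The sketch below (NumPy) is only illustrative: the sizes p=100, n=200 and the number of replications are our own choices, since Table 5 does not report them, so the averages will not match the table exactly, although the two models should again give nearly identical first-order values.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 100, 200, 100      # illustrative sizes; Table 5 does not report p and n

def psd_sqrt(M):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

# signal A_n = U Lambda V^T with two spikes
U = np.zeros((p, 2))
U[:2, 0] = [1, 1]
U[:2, 1] = [-1, 1]
U /= np.sqrt(2)
Lam = np.diag([3.0, 2.0])
V = np.linalg.svd(rng.standard_normal((p, n)))[2][:2].T   # first two right singular vectors
A = U @ Lam @ V.T

# noise covariance Sigma = (0.4^{|i-j|}) + diag(0,0,6,0,...,0)
Sigma = 0.4 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
Sigma[2, 2] += 6.0
Sig_half = psd_sqrt(Sigma)
R_half = psd_sqrt(A @ A.T + Sigma)

v = np.zeros(p)
v[0] = 1.0
out = {"Model 1": [], "Model 2": []}
for _ in range(reps):
    W = rng.standard_normal((p, n)) / np.sqrt(n)           # i.i.d. N(0, 1/n) entries
    for name, X in (("Model 1", A + Sig_half @ W), ("Model 2", R_half @ W)):
        vals, vecs = np.linalg.eigh(X @ X.T)
        idx = np.argsort(vals)[::-1][:3]                   # three largest eigenvalues
        out[name].append(np.r_[vals[idx], (v @ vecs[:, idx]) ** 2])

for name, rows in out.items():
    print(name, np.round(np.mean(rows, axis=0), 3))        # eigenvalues and (v^T v_hat_k)^2
```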

5 Proofs of main results

In this section, we prove the main results in Sections 2 and 3. Proposition 1 plays an important role in the proof of Theorem 1. To prove Proposition 1, the following Proposition 3 is required, whose proof is provided in the supplementary material. In what follows, we sometimes omit the subscript "n", and we use the conjugate transpose ``*'' in place of the ordinary transpose ``\top''; the two coincide in the real case. In addition, the proofs of Theorems 2 and 4 and Corollary 1 are also provided.

Proposition 3.

Under the conditions of Proposition 1, for any deterministic unit vector \mathbf{u}_{n}\in\mathbb{R}^{n} and z\in\mathcal{C}^{+}, we have

Eโ€‹|๐ฎnโˆ—โ€‹(Q~nโ€‹(z)โˆ’T~โ€‹(z))โ€‹๐ฎn|2=Oโ€‹(nโˆ’1),\displaystyle\mbox{E}|\mathbf{u}_{n}^{*}(\tilde{Q}_{n}(z)-\tilde{T}(z))\mathbf{u}_{n}|^{2}=O(n^{-1}), (31)

where

Tโ€‹(z)=(โˆ’zโ€‹(๐ˆ+ฮด~โ€‹(z)โ€‹๐šบ)+11+ฮดโ€‹(z)โ€‹๐€๐€โˆ—)โˆ’1,T(z)=\left(-z(\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma})+\frac{1}{1+\delta(z)}\mathbf{A}\mathbf{A}^{*}\right)^{-1},
T~โ€‹(z)=(โˆ’zโ€‹(1+ฮดโ€‹(z))โ€‹๐ˆ+๐€โˆ—โ€‹(๐ˆ+ฮด~โ€‹(z)โ€‹๐šบ)โˆ’1โ€‹๐€)โˆ’1,\tilde{T}(z)=\left(-z(1+\delta(z))\mathbf{I}+\mathbf{A}^{*}(\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma})^{-1}\mathbf{A}\right)^{-1},

ฮดโ€‹(z)=1nโ€‹trโ€‹(๐šบโ€‹Tโ€‹(z))\delta(z)=\frac{1}{n}\mbox{tr}(\bm{\Sigma}T(z)) and ฮด~โ€‹(z)=1nโ€‹trโ€‹(T~โ€‹(z))\tilde{\delta}(z)=\frac{1}{n}\mbox{tr}(\tilde{T}(z)).
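
Although \delta(z) and \tilde{\delta}(z) are defined only implicitly through T(z) and \tilde{T}(z), they can be computed numerically by a fixed-point iteration, which also gives a quick empirical check of (31). The sketch below (NumPy) is only illustrative: the choices of \bm{\Sigma}, \mathbf{A}, z and the plain iteration scheme are ours, and we read \tilde{Q}_{n}(z) as the resolvent (\mathbf{X}_{n}^{*}\mathbf{X}_{n}-z\mathbf{I}_{n})^{-1} used in the main text.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 150, 300                 # illustrative sizes
z = 1.0 + 1.0j                  # a point in C^+

# illustrative Sigma (AR(1)-type) and a rank-one signal A with tr(AA^*) = O(1)
Sigma = 0.4 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
A = np.zeros((p, n))
A[0, :] = 2.0 / np.sqrt(n)

I_p, I_n = np.eye(p), np.eye(n)

def deterministic_equivalents(z, n_iter=500, tol=1e-10):
    """Plain fixed-point iteration for (delta, tilde_delta); damping may be
    needed when the imaginary part of z is small."""
    delta = tdelta = -1.0 / z
    for _ in range(n_iter):
        T = np.linalg.inv(-z * (I_p + tdelta * Sigma) + (A @ A.T) / (1 + delta))
        tT = np.linalg.inv(-z * (1 + delta) * I_n
                           + A.T @ np.linalg.inv(I_p + tdelta * Sigma) @ A)
        d_new, td_new = np.trace(Sigma @ T) / n, np.trace(tT) / n
        if abs(d_new - delta) + abs(td_new - tdelta) < tol:
            break
        delta, tdelta = d_new, td_new
    return delta, tdelta, tT

delta, tdelta, tT = deterministic_equivalents(z)

# one noise draw: compare u* Q~_n(z) u with u* T~(z) u
evals, evecs = np.linalg.eigh(Sigma)
Sig_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
W = rng.standard_normal((p, n)) / np.sqrt(n)
X = A + Sig_half @ W
u = np.ones(n) / np.sqrt(n)
Q_tilde = np.linalg.inv(X.T @ X - z * I_n)
print(u @ Q_tilde @ u, u @ tT @ u)    # the two bilinear forms should be close
```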

To give the theoretical justifications, we first introduce a necessary lemma.

Lemma 2.

(Woodbury matrix identity) Suppose that A\in\mathbb{R}^{n\times n} and D\in\mathbb{R}^{k\times k} are invertible, and let U\in\mathbb{R}^{n\times k} and V\in\mathbb{R}^{k\times n}. Then

(A+Uโ€‹Dโ€‹V)โˆ’1=Aโˆ’1โˆ’Aโˆ’1โ€‹Uโ€‹(Dโˆ’1+Vโ€‹Aโˆ’1โ€‹U)โˆ’1โ€‹Vโ€‹Aโˆ’1.(A+UDV)^{-1}=A^{-1}-A^{-1}U\left(D^{-1}+VA^{-1}U\right)^{-1}VA^{-1}.
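
As a quick numerical sanity check of Lemma 2, the two sides of the identity can be compared on randomly generated matrices; the dimensions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally shifted, hence well conditioned
D = rng.standard_normal((k, k)) + k * np.eye(k)
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

lhs = np.linalg.inv(A + U @ D @ V)
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(D) + V @ Ainv @ U) @ V @ Ainv
print(np.allclose(lhs, rhs))    # True up to floating-point error
```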

Now we start to prove Proposition 1.

Proof of Proposition 1.

Proof.

Recall that

Tโ€‹(z)=(โˆ’zโ€‹(๐ˆ+ฮด~โ€‹(z)โ€‹๐šบ)+11+ฮดโ€‹(z)โ€‹๐€๐€โˆ—)โˆ’1,T(z)=\left(-z(\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma})+\frac{1}{1+\delta(z)}\mathbf{A}\mathbf{A}^{*}\right)^{-1},
T~โ€‹(z)=(โˆ’zโ€‹(1+ฮดโ€‹(z))โ€‹๐ˆ+๐€โˆ—โ€‹(๐ˆ+ฮด~โ€‹(z)โ€‹๐šบ)โˆ’1โ€‹๐€)โˆ’1,\tilde{T}(z)=\left(-z(1+\delta(z))\mathbf{I}+\mathbf{A}^{*}(\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma})^{-1}\mathbf{A}\right)^{-1},

ฮดโ€‹(z)=1nโ€‹trโ€‹(๐šบโ€‹Tโ€‹(z))\delta(z)=\frac{1}{n}\mbox{tr}(\bm{\Sigma}T(z)) and ฮด~โ€‹(z)=1nโ€‹trโ€‹(T~โ€‹(z))\tilde{\delta}(z)=\frac{1}{n}\mbox{tr}(\tilde{T}(z)). Using the Woodbury matrix identity in Lemma 2, there is

T~โ€‹(z)=โˆ’1zโ€‹(1+ฮดโ€‹(z))โ€‹๐ˆโˆ’(โˆ’1zโ€‹(1+ฮดโ€‹(z)))2โ€‹๐€โˆ—โ€‹[๐ˆ+ฮด~โ€‹(z)โ€‹๐šบ+โˆ’1zโ€‹(1+ฮดโ€‹(z))โ€‹๐€๐€โˆ—]โˆ’1โ€‹๐€.\tilde{T}(z)=-\frac{1}{z(1+\delta(z))}\mathbf{I}-\left(-\frac{1}{z(1+\delta(z))}\right)^{2}\mathbf{A}^{*}\left[\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma}+\frac{-1}{z(1+\delta(z))}\mathbf{A}\mathbf{A}^{*}\right]^{-1}\mathbf{A}. (32)

To prove (9), let

ฮ”~โ€‹(z)=ฮด~โ€‹(z)โ€‹๐ˆโˆ’(ฮด~โ€‹(z))2โ€‹๐€โˆ—โ€‹[๐ˆ+ฮด~โ€‹(z)โ€‹(๐šบ+๐€๐€โˆ—)]โˆ’1โ€‹๐€.\tilde{\Delta}(z)=\tilde{\delta}(z)\mathbf{I}-(\tilde{\delta}(z))^{2}\mathbf{A}^{*}\left[\mathbf{I}+\tilde{\delta}(z)\left(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*}\right)\right]^{-1}\mathbf{A}.

There is

|๐ฎโˆ—โ€‹(T~โ€‹(z)โˆ’ฮ”~โ€‹(z))โ€‹๐ฎ|โ‰ค|โˆ’1zโ€‹(1+ฮดโ€‹(z))โˆ’ฮด~โ€‹(z)|โ€‹|๐ฎโˆ—โ€‹๐ฎ|\displaystyle\left|\mathbf{u}^{*}\left(\tilde{T}(z)-\tilde{\Delta}(z)\right)\mathbf{u}\right|\leq\left|-\frac{1}{z(1+\delta(z))}-\tilde{\delta}(z)\right||\mathbf{u}^{*}\mathbf{u}|
+|(โˆ’1zโ€‹(1+ฮดโ€‹(z)))2โˆ’ฮด~2โ€‹(z)|โ€‹|๐ฎโˆ—โ€‹(๐€โˆ—โ€‹[๐ˆ+ฮด~โ€‹(z)โ€‹(๐šบ+๐€๐€โˆ—)]โˆ’1โ€‹๐€)โ€‹๐ฎ|\displaystyle+\left|\left(-\frac{1}{z(1+\delta(z))}\right)^{2}-\tilde{\delta}^{2}(z)\right|\left|\mathbf{u}^{*}\left(\mathbf{A}^{*}\left[\mathbf{I}+\tilde{\delta}(z)\left(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*}\right)\right]^{-1}\mathbf{A}\right)\mathbf{u}\right|
+|(โˆ’1zโ€‹(1+ฮดโ€‹(z)))2||๐ฎโˆ—(๐€โˆ—[๐ˆ+ฮด~(z)(๐šบ+๐€๐€โˆ—)]โˆ’1๐€\displaystyle+\left|\left(-\frac{1}{z(1+\delta(z))}\right)^{2}\right|\Bigg{|}\mathbf{u}^{*}\Bigg{(}\mathbf{A}^{*}\left[\mathbf{I}+\tilde{\delta}(z)\left(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*}\right)\right]^{-1}\mathbf{A}
โˆ’๐€โˆ—[๐ˆ+ฮด~(z)๐šบ+โˆ’1zโ€‹(1+ฮดโ€‹(z))๐€๐€โˆ—]โˆ’1๐€)๐ฎ|.\displaystyle~{}-\mathbf{A}^{*}\left[\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma}+\frac{-1}{z(1+\delta(z))}\mathbf{A}\mathbf{A}^{*}\right]^{-1}\mathbf{A}\Bigg{)}\mathbf{u}\Bigg{|}. (33)

We first consider the convergence rate of

โˆ’1zโ€‹(1+ฮดโ€‹(z))โˆ’ฮด~โ€‹(z).-\frac{1}{z(1+\delta(z))}-\tilde{\delta}(z). (34)

By (32) there is

โˆ’1zโ€‹(1+ฮดโ€‹(z))โˆ’ฮด~โ€‹(z)=1nโ€‹(1zโ€‹(1+ฮดโ€‹(z)))2โ€‹trโ€‹๐€โˆ—โ€‹Tโ€‹(z)โ€‹๐€.\displaystyle-\frac{1}{z(1+\delta(z))}-\tilde{\delta}(z)=\frac{1}{n}\left(\frac{1}{z(1+\delta(z))}\right)^{2}\mbox{tr}\mathbf{A}^{*}T(z)\mathbf{A}. (35)

Proposition 2.2 in Hachem et al., (2007) yields \|T(z)\|\leq\frac{1}{\Im z}; see also Hachem et al., (2013). Moreover, by Lemma 2.3 of Silverstein and Bai, (1995), \|(\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma})^{-1}\|\leq\max(\frac{4}{\Im z},2). Combining these bounds with the fact that \mbox{tr}\mathbf{A}\mathbf{A}^{*}=O(1), we have

|โˆ’1zโ€‹(1+ฮดโ€‹(z))โˆ’ฮด~โ€‹(z)|=Oโ€‹(1nโ€‹(โ„‘โกz)3).|-\frac{1}{z(1+\delta(z))}-\tilde{\delta}(z)|=O\left(\frac{1}{n(\Im z)^{3}}\right).

Thus, a direct calculation shows that

\left|\mathbf{u}^{*}\left(\tilde{T}(z)-\tilde{\Delta}(z)\right)\mathbf{u}\right|=O\left(\frac{1}{n(\Im z)^{7}}\right). (36)

Next, let

R~โ€‹(z)=r~โ€‹(z)โ€‹๐ˆโˆ’(r~โ€‹(z))2โ€‹๐€โˆ—โ€‹[๐ˆ+r~โ€‹(z)โ€‹(๐šบ+๐€๐€โˆ—)]โˆ’1โ€‹๐€,\tilde{R}(z)=\tilde{r}(z)\mathbf{I}-(\tilde{r}(z))^{2}\mathbf{A}^{*}\left[\mathbf{I}+\tilde{r}(z)\left(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*}\right)\right]^{-1}\mathbf{A},

where r~โ€‹(z)\tilde{r}(z) in ๐’ž+\mathcal{C}^{+} solves the equation

z=โˆ’1r~โ€‹(z)+cnโ€‹โˆซtโ€‹dโ€‹H๐‘nโ€‹(t)1+tโ€‹r~โ€‹(z),\displaystyle z=-\frac{1}{\tilde{r}(z)}+c_{n}\int\frac{tdH^{\mathbf{R}_{n}}(t)}{1+t\tilde{r}(z)},

and H๐‘nโ€‹(t)H^{\mathbf{R}_{n}}(t) is the empirical spectral distribution of ๐‘n=๐šบ+๐€๐€โˆ—\mathbf{R}_{n}=\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*}. If we denote the right hand side of (35) by ฯ‰\omega, then (35) can be rewritten as

z=-\frac{1}{\tilde{\delta}}-z\delta+\omega_{1},

where \omega_{1}=\frac{1}{\tilde{\delta}}-\frac{1}{\tilde{\delta}+\omega}. We also let

Tโ€ฒโ€‹(z)=(โˆ’zโ€‹(๐ˆ+ฮด~โ€‹(z)โ€‹๐šบ)โˆ’zโ€‹ฮด~โ€‹(z)โ€‹๐€๐€โˆ—)โˆ’1.T^{\prime}(z)=\left(-z(\mathbf{I}+\tilde{\delta}(z)\bm{\Sigma})-z\tilde{\delta}(z)\mathbf{A}\mathbf{A}^{*}\right)^{-1}.

By the definition of ฮด\delta, this equation can be further written as

z=โˆ’1ฮด~โˆ’znโ€‹trโ€‹๐šบโ€‹T+ฯ‰1=โˆ’1ฮด~โˆ’znโ€‹trโ€‹(๐šบ+๐€๐€โˆ—)โ€‹T+znโ€‹trโ€‹๐€๐€โˆ—โ€‹T+ฯ‰1=โˆ’1ฮด~โˆ’znโ€‹trโ€‹(๐šบ+๐€๐€โˆ—)โ€‹Tโ€ฒ+znโ€‹trโ€‹(๐šบ+๐€๐€โˆ—)โ€‹(Tโ€ฒโˆ’T)+znโ€‹trโ€‹๐€๐€โˆ—โ€‹T+ฯ‰1=โˆ’1ฮด~+cnโ€‹โˆซtโ€‹dโ€‹H๐‘nโ€‹(t)1+tโ€‹ฮด~+ฯ‰2,\displaystyle\begin{aligned} z&=-\frac{1}{\tilde{\delta}}-\frac{z}{n}\mbox{tr}\bm{\Sigma}T+\omega_{1}\\ &=-\frac{1}{\tilde{\delta}}-\frac{z}{n}\mbox{tr}(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*})T+\frac{z}{n}\mbox{tr}\mathbf{A}\mathbf{A}^{*}T+\omega_{1}\\ &=-\frac{1}{\tilde{\delta}}-\frac{z}{n}\mbox{tr}(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*})T^{\prime}+\frac{z}{n}\mbox{tr}(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*})(T^{\prime}-T)+\frac{z}{n}\mbox{tr}\mathbf{A}\mathbf{A}^{*}T+\omega_{1}\\ &=-\frac{1}{\tilde{\delta}}+c_{n}\int\frac{tdH^{\mathbf{R}_{n}}(t)}{1+t\tilde{\delta}}+\omega_{2},\end{aligned} (37)

where \omega_{2}=\omega_{1}+\frac{z}{n}\mbox{tr}(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*})(T^{\prime}-T)+\frac{z}{n}\mbox{tr}\mathbf{A}\mathbf{A}^{*}T. We have |\omega_{1}|=O(\frac{1}{n(\Im z)^{5}}), |\frac{z}{n}\mbox{tr}(\bm{\Sigma}+\mathbf{A}\mathbf{A}^{*})(T^{\prime}-T)|=O(\frac{1}{n(\Im z)^{5}}), and |\frac{z}{n}\mbox{tr}\mathbf{A}\mathbf{A}^{*}T|=O(\frac{1}{n\Im z}). It then follows that |\omega_{2}|=O(\frac{1}{n(\Im z)^{5}}). With equations (8) and (37) at hand, we obtain

ฮด~โˆ’r~\displaystyle\tilde{\delta}-\tilde{r} =\displaystyle= (ฮด~โˆ’r~)โ€‹(ฮด~โ€‹r~โ€‹cnโ€‹โˆซt2โ€‹dโ€‹H๐‘nโ€‹(t)(1+tโ€‹r~)โ€‹(1+tโ€‹ฮด~))โˆ’ฮด~โ€‹r~โ€‹ฯ‰2.\displaystyle(\tilde{\delta}-\tilde{r})\left(\tilde{\delta}\tilde{r}c_{n}\int\frac{t^{2}dH^{\mathbf{R}_{n}}(t)}{(1+t\tilde{r})(1+t\tilde{\delta})}\right)-\tilde{\delta}\tilde{r}\omega_{2}.

Similar to (6.2.26) in Bai and Silverstein, (2010), we also have

|ฮด~โ€‹r~โ€‹cnโ€‹โˆซt2โ€‹dโ€‹H๐‘nโ€‹(t)(1+tโ€‹r~)โ€‹(1+tโ€‹ฮด~)|โ‰ค1โˆ’Cโ€‹(โ„‘โกz)2.\left|\tilde{\delta}\tilde{r}c_{n}\int\frac{t^{2}dH^{\mathbf{R}_{n}}(t)}{(1+t\tilde{r})(1+t\tilde{\delta})}\right|\leq 1-C(\Im z)^{2}.

Therefore, there is

|ฮด~โˆ’r~|=Oโ€‹(1nโ€‹(โ„‘โกz)7).|\tilde{\delta}-\tilde{r}|=O\left(\frac{1}{n(\Im z)^{7}}\right).

Using the same arguments as in (5), it follows that

|๐ฎโˆ—โ€‹(R~โ€‹(z)โˆ’ฮ”~โ€‹(z))โ€‹๐ฎ|=Oโ€‹(1nโ€‹(โ„‘โกz)11).\displaystyle|\mathbf{u}^{*}(\tilde{R}(z)-\tilde{\Delta}(z))\mathbf{u}|=O\left(\frac{1}{n(\Im z)^{11}}\right).

Then the conclusion follows. โˆŽ

To prove Theorem 1, we also need the separation of the spiked eigenvalues of ๐’n\mathbf{S}_{n}. Recall that ๐‘n=๐€nโ€‹๐€nโŠค+๐šบ=โˆ‘k=1pฮณkโ€‹ฮพkโ€‹ฮพkโŠค\mathbf{R}_{n}=\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma}=\sum_{k=1}^{p}\gamma_{k}\xi_{k}\xi_{k}^{\top}.

Lemma 3.

Under the assumptions of Theorem 1, for a_{k},b_{k} satisfying [-\tilde{r}(a_{k})^{-1},-\tilde{r}(b_{k})^{-1}]\subset(\gamma_{k+1},\gamma_{k}) for k=1,\ldots,K, where \tilde{r}(z) is given in (8), we have

๐โ€‹(ฮปk>bkโ€‹ย andย โ€‹ฮปk+1<ak)โ†’1โ€‹ย asย nโ†’โˆž,\mathbf{P}(\lambda_{k}>b_{k}\text{ and }\lambda_{k+1}<a_{k})\rightarrow 1\text{ as $n\rightarrow\infty$},

where ฮปk\lambda_{k} is the kk-th largest eigenvalue of ๐’n\mathbf{S}_{n}.

Proof.

Theorem 1 in Liu et al., (2022) shows that the conclusion holds under the additional assumption that \mathbf{A} contains a finite number of distinct columns. We extend it to general low-rank \mathbf{A} in two steps. The first step is to show that, for general low-rank \mathbf{A}, the conclusion holds when \mathbf{W} is Gaussian. The second step extends the result to general \mathbf{W} satisfying Assumption 1.

We begin with the first step. Assume that \mathbf{W} is Gaussian and that \mathbf{A} has the singular value decomposition \mathbf{U}_{1}\Lambda\mathbf{V}_{1}^{\top}, where \Lambda is a p\times n matrix whose first K main diagonal entries are the K nonzero singular values. Assume K=2 for simplicity; the case of general K is similar. Let \mathbf{V}_{2} be an orthogonal matrix whose first row has non-zero entries all equal to \sqrt{2/n} on the first n/2 coordinates, and whose second row has non-zero entries equal to \sqrt{2/n} on the last n/2 coordinates. Further define \mathbf{O}=\mathbf{V}_{1}\mathbf{V}_{2}. Then \mathbf{U}_{1}^{\top}\mathbf{X}\mathbf{O}=\Lambda\mathbf{V}_{2}+\mathbf{U}_{1}^{\top}\bm{\Sigma}^{1/2}\mathbf{W}\mathbf{O}\stackrel{d}{=}\Lambda\mathbf{V}_{2}+\mathbf{U}_{1}^{\top}\bm{\Sigma}^{1/2}\mathbf{W}, which is Model 1 in Liu et al., (2022), i.e., the columns of the signal part take only two distinct values. Therefore the conclusion of this lemma holds for \mathbf{U}_{1}^{\top}\mathbf{X}\mathbf{O}\mathbf{O}^{\top}\mathbf{X}^{\top}\mathbf{U}_{1}, and hence also for \mathbf{X}\mathbf{X}^{\top}, since \mathbf{U}_{1} and \mathbf{O} are both orthogonal matrices.

For the second step, we introduce a continuous interpolation matrix \mathbf{W}(t)=\sqrt{t}\mathbf{W}_{1}+\sqrt{1-t}\mathbf{W}_{0} for t\in[0,1], where \mathbf{W}_{0} is Gaussian, \mathbf{W}_{1} is general, and both satisfy the moment conditions in Assumption 1. Note that \mathbf{W}(t) satisfies Assumption 1 for any t\in[0,1]. Define \mathbf{X}(t) and \mathbf{S}(t) by replacing \mathbf{W} with \mathbf{W}(t). Denote the i-th largest singular value of a matrix M by \sigma_{i}(M). For any t_{1},t_{2}\in[0,1], we have

|ฮปiโ€‹(๐’โ€‹(t1))โˆ’ฮปiโ€‹(๐’โ€‹(t2))|\displaystyle|\lambda_{i}(\mathbf{S}(t_{1}))-\lambda_{i}(\mathbf{S}(t_{2}))| โ‰คCโ€ฒโ€‹|ฯƒiโ€‹(๐—โ€‹(t1))โˆ’ฯƒiโ€‹(๐—โ€‹(t2))|\displaystyle\leq C^{\prime}|\sigma_{i}(\mathbf{X}(t_{1}))-\sigma_{i}(\mathbf{X}(t_{2}))| (38)
โ‰คCโ€ฒโ€ฒโ€‹ฯƒ1โ€‹(๐–โ€‹(t1)โˆ’๐–โ€‹(t2))โ‰คCโ€ฒโ€ฒโ€ฒโ€‹|t1โˆ’t2|,\displaystyle\leq C^{\prime\prime}\sigma_{1}(\mathbf{W}(t_{1})-\mathbf{W}(t_{2}))\leq C^{\prime\prime\prime}\sqrt{|t_{1}-t_{2}|},

where C^{\prime},C^{\prime\prime},C^{\prime\prime\prime} are positive constants independent of n,t_{1},t_{2}. The first and third inequalities use the fact that \sigma_{1}(\mathbf{S}(t)) and \sigma_{1}(\mathbf{W}(t)) are bounded, and the second inequality uses Weyl's inequality. We can now conclude the exact separation from the continuity of the eigenvalues together with Proposition 1 in Liu et al., (2022). More specifically, let t_{j}=j/n. We know that \lambda_{k}(\mathbf{S}(0))>b_{k}, and Proposition 1 in Liu et al., (2022) implies that there are no eigenvalues of \mathbf{S}(t_{j}), j=1,\cdots,n, in [a_{k},b_{k}]. Therefore (38) implies \lambda_{k}(\mathbf{S}(1))>b_{k} with probability tending to one. ∎
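
The Lipschitz-type bound (38) can also be visualised directly: along the interpolation \mathbf{W}(t), the leading eigenvalues of \mathbf{S}(t) move continuously and therefore cannot jump across an interval [a_{k},b_{k}] that is free of eigenvalues. A minimal sketch (NumPy; the sizes, the identity noise covariance and the centred-exponential choice of \mathbf{W}_{1} are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, K = 80, 160, 2                                    # illustrative sizes
A = np.zeros((p, n))
A[0, :n // 2] = 3.0 / np.sqrt(n)                        # two distinct signal columns
A[1, n // 2:] = 2.0 / np.sqrt(n)

W0 = rng.standard_normal((p, n)) / np.sqrt(n)           # Gaussian W_0
W1 = (rng.exponential(1.0, (p, n)) - 1.0) / np.sqrt(n)  # centred non-Gaussian W_1

ts = np.linspace(0.0, 1.0, 21)
lam = []
for t in ts:
    W_t = np.sqrt(t) * W1 + np.sqrt(1.0 - t) * W0       # W(t) = sqrt(t) W_1 + sqrt(1-t) W_0
    X_t = A + W_t                                       # Sigma = I for simplicity
    lam.append(np.sort(np.linalg.eigvalsh(X_t @ X_t.T))[::-1][:K])
lam = np.array(lam)
print(np.max(np.abs(np.diff(lam, axis=0))))             # small increments along the path
```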

Proof of Theorem 1.

Proof.

We first prove (16). Define

โ„yโ€‹(k)={zโˆˆโ„‚:ฯƒ^1โ‰คโ„œโกzโ‰คฯƒ^2,|โ„‘โกz|โ‰คy},\mathbb{R}_{y}(k)=\{z\in\mathbb{C}:~{}\hat{\sigma}_{1}\leq\Re z\leq\hat{\sigma}_{2},~{}|\Im z|\leq y\},

where y>0 and [\hat{\sigma}_{1},\hat{\sigma}_{2}] encloses the sample eigenvalue \lambda_{k} of \mathbf{X}_{n}^{*}\mathbf{X}_{n} and excludes all other sample eigenvalues. The existence of \mathbb{R}_{y}(k) is guaranteed by Assumption 4. By the Cauchy integral formula, we have

12โ€‹ฯ€โ€‹iโ€‹โˆฎโˆ‚โ„yโˆ’โ€‹(k)๐ฎโˆ—โ€‹Q~nโ€‹(z)โ€‹๐ฎโ€‹๐‘‘z=๐ฎโˆ—โ€‹๐ฎ^kโ€‹๐ฎ^kโˆ—โ€‹๐ฎ:=r^k,\frac{1}{2\pi i}\oint_{\partial\mathbb{R}_{y}^{-}(k)}\mathbf{u}^{*}\tilde{Q}_{n}(z)\mathbf{u}dz=\mathbf{u}^{*}\hat{\mathbf{u}}_{k}\hat{\mathbf{u}}_{k}^{*}\mathbf{u}:=\hat{r}_{k}, (39)

where \mathbf{u} is any n\times 1 deterministic unit vector, and \partial\mathbb{R}_{y}^{-}(k) denotes the negatively oriented boundary of \mathbb{R}_{y}(k).

Lemma 4.

Under the assumptions of Theorem 1, we have

\left|\hat{r}_{k}-\frac{1}{2\pi i}\oint_{\partial\mathbb{R}_{y}^{-}(k)}\mathbf{u}^{*}\tilde{R}_{n}(z)\mathbf{u}dz\right|=O_{P}\left(\frac{1}{\sqrt{n}}\right),

where R~โ€‹(z)\tilde{R}(z) is defined in (10).

Proof.

The proof is in the same spirit as that of Proposition 1 in Mestre, 2008b. Since our result provides a convergence rate for the error, we use a slightly different argument based on the second moment of the term on the left-hand side. Define the event \Omega:=\{\hat{\sigma}_{1}+\delta<\hat{\lambda}_{k}<\hat{\sigma}_{2}-\delta\}, which holds with probability tending to one for some small \delta>0 independent of n. We have

Eโ€‹|โˆฎโˆ‚โ„yโˆ’โ€‹(k)(๐ฎnโˆ—โ€‹(Q~nโ€‹(z)โˆ’R~nโ€‹(z))โ€‹๐ฎn)โ€‹Iโ€‹(ฮฉ)โ€‹๐‘‘z|2\displaystyle\mbox{E}\left|\oint_{\partial\mathbb{R}_{y}^{-}(k)}\left(\mathbf{u}_{n}^{*}(\tilde{Q}_{n}(z)-\tilde{R}_{n}(z))\mathbf{u}_{n}\right)I(\Omega)dz\right|^{2} (40)
โ‰คCโ€‹โˆฎโˆ‚โ„yโˆ’โ€‹(k)Eโ€‹(|๐ฎnโˆ—โ€‹(Q~nโ€‹(z)โˆ’R~nโ€‹(z))โ€‹๐ฎn|2โ€‹Iโ€‹(ฮฉ))โ€‹|dโ€‹z|=Oโ€‹(nโˆ’1)\displaystyle\leq C\oint_{\partial\mathbb{R}_{y}^{-}(k)}\mbox{E}\left(|\mathbf{u}_{n}^{*}(\tilde{Q}_{n}(z)-\tilde{R}_{n}(z))\mathbf{u}_{n}|^{2}I(\Omega)\right)|dz|=O(n^{-1})

where the first step uses Hรถlderโ€™s inequality and the second step follows from (S.72) and (S.73). The conclusion follows from Chebyshevโ€™s inequality. โˆŽ

The above lemma reduces the proof to calculating the deterministic integral

F=12โ€‹ฯ€โ€‹iโ€‹โˆฎโˆ‚โ„yโˆ’โ€‹(k)๐ฎโˆ—โ€‹R~โ€‹(z)โ€‹๐ฎโ€‹๐‘‘z.F=\frac{1}{2\pi i}\oint_{\partial\mathbb{R}_{y}^{-}(k)}\mathbf{u}^{*}\tilde{R}(z)\mathbf{u}dz.

Let wโ€‹(z)=โˆ’1r~โ€‹(z)w(z)=-\frac{1}{\tilde{r}(z)}, where r~โ€‹(z)\tilde{r}(z) is introduced in Proposition 1. We find that wโ€‹(z)w(z) satisfies the following equation

z=wโ€‹(z)โ€‹(1โˆ’cโ€‹โˆซtโ€‹dโ€‹F๐‘nโ€‹(t)tโˆ’wโ€‹(z)),z=w(z)\left(1-c\int\frac{tdF^{\mathbf{R}_{n}}(t)}{t-w(z)}\right),

which is parallel to equation (24) in Mestre, 2008a . Thus, wโ€‹(z)w(z) satisfies all the properties listed in Proposition 2 in Mestre, 2008a . Write F=F1+F2F=F_{1}+F_{2}, where

F1=โˆ’12โ€‹ฯ€โ€‹iโ€‹๐ฎโˆ—โ€‹๐ฎโ€‹โˆฎTโˆ’โ€‹(k)1wโ€‹[1โˆ’1nโ€‹โˆ‘k=1p(ฮณkฮณkโˆ’w)2]โ€‹๐‘‘w,F_{1}=-\frac{1}{2\pi i}\mathbf{u}^{*}\mathbf{u}\oint_{T^{-}(k)}\frac{1}{w}\left[1-\frac{1}{n}\sum^{p}_{k=1}\left(\frac{\gamma_{k}}{\gamma_{k}-w}\right)^{2}\right]dw, (41)
F2=โˆ’12โ€‹ฯ€โ€‹iโ€‹โˆฎTโˆ’โ€‹(k)1wโ€‹๐ฎโˆ—โ€‹๐€โˆ—โ€‹โˆ‘k=1pฮพkโ€‹ฮพkโˆ—wโˆ’ฮณkโ€‹๐€๐ฎโ€‹[1โˆ’1nโ€‹โˆ‘k=1p(ฮณkฮณkโˆ’w)2]โ€‹dโ€‹w,F_{2}=-\frac{1}{2\pi i}\oint_{T^{-}(k)}\frac{1}{w}\mathbf{u}^{*}\mathbf{A}^{*}\sum_{k=1}^{p}\frac{\xi_{k}\xi_{k}^{*}}{w-\gamma_{k}}\mathbf{A}\mathbf{u}\left[1-\frac{1}{n}\sum^{p}_{k=1}\left(\frac{\gamma_{k}}{\gamma_{k}-w}\right)^{2}\right]dw, (42)

where T^{-}(k) is a negatively oriented simple closed curve that encloses \gamma_{k} and excludes all other population eigenvalues of \mathbf{R}_{n}. A direct calculation gives

F1=Rโ€‹eโ€‹sโ€‹(1wโ€‹[1โˆ’1nโ€‹โˆ‘k=1p(ฮณkฮณkโˆ’w)2],ฮณk)=1n.F_{1}=Res\left(\frac{1}{w}\left[1-\frac{1}{n}\sum^{p}_{k=1}\left(\frac{\gamma_{k}}{\gamma_{k}-w}\right)^{2}\right],\gamma_{k}\right)=\frac{1}{n}.

For F2F_{2}, we further decompose the integrand as

F2=โˆ’12โ€‹ฯ€โ€‹iโ€‹โˆฎTโˆ’โ€‹(k)(ฯ‡1โ€‹kโ€‹(w)+ฯ‡2โ€‹kโ€‹(w)+ฯ‡3โ€‹kโ€‹(w)+ฯ‡4โ€‹kโ€‹(w))โ€‹๐‘‘w,F_{2}=-\frac{1}{2\pi i}\oint_{T^{-}(k)}(\chi_{1k}(w)+\chi_{2k}(w)+\chi_{3k}(w)+\chi_{4k}(w))dw,

where

ฯ‡1โ€‹kโ€‹(w)\displaystyle\chi_{1k}(w) =\displaystyle= ๐ฎโˆ—โ€‹๐€โˆ—โ€‹ฮพkโ€‹ฮพkโˆ—โ€‹๐€๐ฎwโ€‹(wโˆ’ฮณk),ฯ‡2โ€‹kโ€‹(w)=โˆ’ฮณk2nโ€‹๐ฎโˆ—โ€‹๐€โˆ—โ€‹ฮพkโ€‹ฮพkโˆ—โ€‹๐€๐ฎwโ€‹(wโˆ’ฮณk)3\displaystyle\frac{\mathbf{u}^{*}\mathbf{A}^{*}\xi_{k}\xi_{k}^{*}\mathbf{A}\mathbf{u}}{w(w-\gamma_{k})},~{}\chi_{2k}(w)=-\frac{\gamma_{k}^{2}}{n}\frac{\mathbf{u}^{*}\mathbf{A}^{*}\xi_{k}\xi_{k}^{*}\mathbf{A}\mathbf{u}}{w(w-\gamma_{k})^{3}}
ฯ‡3โ€‹kโ€‹(w)\displaystyle\chi_{3k}(w) =\displaystyle= โˆ’๐ฎโˆ—โ€‹๐€โˆ—โ€‹ฮพkโ€‹ฮพkโˆ—โ€‹๐€๐ฎnโ€‹wโ€‹(wโˆ’ฮณk)โ€‹โˆ‘i=1,iโ‰ kp(ฮณiฮณiโˆ’w)2,\displaystyle-\frac{\mathbf{u}^{*}\mathbf{A}^{*}\xi_{k}\xi_{k}^{*}\mathbf{A}\mathbf{u}}{nw(w-\gamma_{k})}\sum_{i=1,i\neq k}^{p}\left(\frac{\gamma_{i}}{\gamma_{i}-w}\right)^{2},
ฯ‡4โ€‹kโ€‹(w)\displaystyle\chi_{4k}(w) =\displaystyle= โˆ’1nโ€‹wโ€‹๐ฎโˆ—โ€‹๐€โˆ—โ€‹โˆ‘i=1,iโ‰ kpฮพiโ€‹ฮพiโˆ—wโˆ’ฮณiโ€‹๐€๐ฎโ€‹ฮณk2(ฮณkโˆ’w)2.\displaystyle-\frac{1}{nw}\mathbf{u}^{*}\mathbf{A}^{*}\sum_{i=1,i\neq k}^{p}\frac{\xi_{i}\xi_{i}^{*}}{w-\gamma_{i}}\mathbf{A}\mathbf{u}\frac{\gamma_{k}^{2}}{(\gamma_{k}-w)^{2}}.

By direct calculation, we have

Rโ€‹eโ€‹sโ€‹(ฯ‡1โ€‹kโ€‹(w),ฮณk)\displaystyle Res(\chi_{1k}(w),\gamma_{k}) =\displaystyle= ๐ฎโˆ—โ€‹๐€โˆ—โ€‹ฮพkโ€‹ฮพkโˆ—โ€‹๐€๐ฎฮณk,Rโ€‹eโ€‹sโ€‹(ฯ‡2โ€‹kโ€‹(w),ฮณk)=โˆ’๐ฎโˆ—โ€‹๐€โˆ—โ€‹ฮพkโ€‹ฮพkโˆ—โ€‹๐€๐ฎnโ€‹ฮณk,\displaystyle\frac{\mathbf{u}^{*}\mathbf{A}^{*}\xi_{k}\xi_{k}^{*}\mathbf{A}\mathbf{u}}{\gamma_{k}},~{}Res(\chi_{2k}(w),\gamma_{k})=-\frac{\mathbf{u}^{*}\mathbf{A}^{*}\xi_{k}\xi_{k}^{*}\mathbf{A}\mathbf{u}}{n\gamma_{k}},
Rโ€‹eโ€‹sโ€‹(ฯ‡3โ€‹kโ€‹(w),ฮณk)\displaystyle Res(\chi_{3k}(w),\gamma_{k}) =\displaystyle= โˆ’๐ฎโˆ—โ€‹๐€โˆ—โ€‹ฮพkโ€‹ฮพkโˆ—โ€‹๐€๐ฎnโ€‹ฮณkโ€‹โˆ‘i=1,iโ‰ kp(ฮณiฮณiโˆ’ฮณk)2,\displaystyle-\frac{\mathbf{u}^{*}\mathbf{A}^{*}\xi_{k}\xi_{k}^{*}\mathbf{A}\mathbf{u}}{n\gamma_{k}}\sum_{i=1,i\neq k}^{p}\left(\frac{\gamma_{i}}{\gamma_{i}-\gamma_{k}}\right)^{2},
Rโ€‹eโ€‹sโ€‹(ฯ‡4โ€‹kโ€‹(w),ฮณk)\displaystyle Res(\chi_{4k}(w),\gamma_{k}) =\displaystyle= โˆ’1nโ€‹๐ฎโˆ—โ€‹๐€โˆ—โ€‹โˆ‘i=1,iโ‰ kฮพiโ€‹ฮพiโˆ—โ€‹(ฮณiโˆ’2โ€‹ฮณk)(ฮณkโˆ’ฮณi)2โ€‹๐€๐ฎ.\displaystyle-\frac{1}{n}\mathbf{u}^{*}\mathbf{A}^{*}\sum_{i=1,i\neq k}\frac{\xi_{i}\xi_{i}^{*}(\gamma_{i}-2\gamma_{k})}{(\gamma_{k}-\gamma_{i})^{2}}\mathbf{A}\mathbf{u}.

Therefore, we have

F=๐ฎโˆ—โ€‹๐€โˆ—โ€‹ฮพkโ€‹ฮพkโˆ—โ€‹๐€๐ฎฮณkโ€‹(1โˆ’1nโ€‹โˆ‘i=1,iโ‰ kฮณi2(ฮณkโˆ’ฮณi)2)+Oโ€‹(1n).F=\frac{\mathbf{u}^{*}\mathbf{A}^{*}\xi_{k}\xi_{k}^{*}\mathbf{A}\mathbf{u}}{\gamma_{k}}\left(1-\frac{1}{n}\sum_{i=1,i\neq k}\frac{\gamma_{i}^{2}}{(\gamma_{k}-\gamma_{i})^{2}}\right)+O\left(\frac{1}{n}\right).

Let ฮทk=(1โˆ’1nโ€‹โˆ‘i=1,iโ‰ kฮณi2(ฮณkโˆ’ฮณi)2)\eta_{k}=\left(1-\frac{1}{n}\sum_{i=1,i\neq k}\frac{\gamma_{i}^{2}}{(\gamma_{k}-\gamma_{i})^{2}}\right), and we conclude (16).

The first assertion can be obtained by an argument similar to the one leading to (40), and the calculation of the deterministic term is exactly the same as that of (22) in Mestre, 2008b, whose d_{k}(z) corresponds to (-\gamma_{k}z\tilde{r}(z)-z)^{-1} in our setting. ∎

Proof of Theorem 2.

Proof.

We first consider (19). Denote the support of H by \Gamma_{H}. Under Assumption 4, it is easy to see that \varphi^{\prime}(\gamma_{k})>0 for 1\leq k\leq K. By the continuity of \varphi^{\prime}, there exists \delta>0 such that

ฯ†โ€ฒโ€‹(x)>0,โˆ€xโˆˆ(ฮณkโˆ’ฮด,ฮณk+ฮด)\varphi^{\prime}(x)>0,\quad\forall x\in(\gamma_{k}-\delta,\gamma_{k}+\delta) (43)

and ฮณk+1<ฮณkโˆ’ฮด<ฮณk+ฮด<ฮณkโˆ’1\gamma_{k+1}<\gamma_{k}-\delta<\gamma_{k}+\delta<\gamma_{k-1} (by default, ฮณ0=โˆž\gamma_{0}=\infty). Then, we can find 0<ฮต<ฮด0<\varepsilon<\delta and ฮณkโˆ’ฮด<a<b<ฮณkโˆ’ฮต<ฮณk<ฮณk+ฮต<e<f<ฮณk+ฮด\gamma_{k}-\delta<a<b<\gamma_{k}-\varepsilon<\gamma_{k}<\gamma_{k}+\varepsilon<e<f<\gamma_{k}+\delta such that [a,b][a,b] and [e,f][e,f] are outside ฮ“H\Gamma_{H}. For ฮปโˆˆ[a,b]โˆช[e,f]\lambda\in[a,b]\cup[e,f], define

ฯ†nโ€‹(ฮป)=ฯ†cn,Hnโ€‹(ฮป)โ‰กฮป+ฮปโ€‹cnโ€‹โˆซtฮปโˆ’tโ€‹dF๐‘nโ€‹(t)=ฮป+ฮปโ€‹cnโ€‹(pโˆ’Kpโ€‹โˆซtฮปโˆ’tโ€‹dHnNonโ€‹(t)+1pโ€‹โˆ‘k=1Kฮณkฮปโˆ’ฮณk),\begin{split}\varphi_{n}(\lambda)&=\varphi^{c_{n},H_{n}}(\lambda)\equiv\lambda+\lambda c_{n}\int\frac{t}{\lambda-t}\mathrm{d}F^{\mathbf{R}_{n}}(t)\\ &=\lambda+\lambda c_{n}\left(\frac{p-K}{p}\int\frac{t}{\lambda-t}\mathrm{d}H_{n}^{\text{Non}}(t)+\frac{1}{p}\sum_{k=1}^{K}\frac{\gamma_{k}}{\lambda-\gamma_{k}}\right),\end{split}

where H_{n}^{\text{Non}}(t)=\frac{1}{p-K}\sum_{j=K+1}^{p}I_{[\gamma_{j},\infty)}(t) is the empirical spectral distribution of the non-spiked population eigenvalues. Then,

ฯ†nโ€‹(ฮป)โˆ’ฯ†โ€‹(ฮป)=cโ€‹ฮปโ€‹โˆซtฮปโˆ’tโ€‹dHnNonโ€‹(t)โˆ’cโ€‹ฮปโ€‹โˆซtโ€‹dโ€‹Hโ€‹(t)ฮปโˆ’t+cnpโ€‹ฮปโ€‹โˆ‘k=1Kฮณkฮปโˆ’ฮณk+(cnโ€‹pโˆ’Kpโˆ’c)โ€‹ฮปโ€‹โˆซtฮปโˆ’tโ€‹dHnNonโ€‹(t).\begin{split}\varphi_{n}(\lambda)-\varphi(\lambda)=&c\lambda\int\frac{t}{\lambda-t}\mathrm{d}H^{\text{Non}}_{n}(t)-c\lambda\int\frac{t\mathrm{d}H(t)}{\lambda-t}+\frac{c_{n}}{p}\lambda\sum_{k=1}^{K}\frac{\gamma_{k}}{\lambda-\gamma_{k}}\\ &+\left(c_{n}\frac{p-K}{p}-c\right)\lambda\int\frac{t}{\lambda-t}\mathrm{d}H^{\text{Non}}_{n}(t).\end{split} (44)

Observe that

infK+1โ‰คjโ‰คp,ฮปโˆˆ[a,b]โˆช[e,f]|ฮณjโˆ’ฮป|>0โ€‹ย andย โ€‹inf1โ‰คkโ‰คK,ฮปโˆˆ[a,b]โˆช[e,f]|ฮณkโˆ’ฮป|>0,\inf\limits_{K+1\leq j\leq p,\lambda\in[a,b]\cup[e,f]}|\gamma_{j}-\lambda|>0\text{ and }\inf\limits_{1\leq k\leq K,\lambda\in[a,b]\cup[e,f]}|\gamma_{k}-\lambda|>0,

so that the third and fourth terms on the right-hand side of (44) converge uniformly to zero as p\rightarrow\infty. The first term on the right-hand side of (44) converges pointwise to the second one, and both are continuous functions of \lambda. Since \{c\lambda\int\frac{t}{\lambda-t}\mathrm{d}H^{\text{Non}}_{n}(t)\} can be regarded as a monotone sequence of functions, the convergence is uniform by Dini's theorem. Thus, \varphi_{n} converges uniformly to \varphi on [a,b]\cup[e,f]. The proof of the uniform convergence of \varphi^{\prime}_{n}, which equals

ฯ†nโ€ฒโ€‹(ฮป)=1โˆ’cnโ€‹(pโˆ’Kpโ€‹โˆซt2โ€‹dโ€‹HnNonโ€‹(t)(ฮปโˆ’t)2+1pโ€‹โˆ‘k=1Kฮณk2(ฮปโˆ’ฮณk)2).\varphi^{\prime}_{n}(\lambda)=1-c_{n}\left(\frac{p-K}{p}\int\frac{t^{2}\mathrm{d}H^{\text{Non}}_{n}(t)}{(\lambda-t)^{2}}+\frac{1}{p}\sum_{k=1}^{K}\frac{\gamma_{k}^{2}}{(\lambda-\gamma_{k})^{2}}\right).

is analogous and omitted. Hence, by Theorem 4.2 of Silverstein and Choi, (1995), combining (43) and the uniform convergence of \varphi_{n},\varphi^{\prime}_{n} on [a,b]\cup[e,f], it follows that both [\varphi(a),\varphi(b)] and [\varphi(e),\varphi(f)] lie outside the support of \underline{F}^{c_{n},\mathbf{R}_{n}}. Then, using Lemma 3,

โ„™(ฮปk+1โ‰คฯ†(a)<ฯ†(b)โ‰คฮปk,for all largen)โ†’1,โ„™(ฮปkโ‰คฯ†(e)<ฯ†(f)โ‰คฮปkโˆ’1,for all largen)โ†’1.\begin{split}&\mathbb{P}(\lambda_{k+1}\leq\varphi(a)<\varphi(b)\leq\lambda_{k},~{}\text{for all large}~{}n)\rightarrow 1,\\ &\mathbb{P}(\lambda_{k}\leq\varphi(e)<\varphi(f)\leq\lambda_{k-1},~{}\text{for all large}~{}n)\rightarrow 1.\end{split}

Hence, with probability tending to one,

ฯ†โ€‹(b)โ‰คlim infnฮปk,lim supnฮปkโ‰คฯ†โ€‹(e),\varphi(b)\leq\liminf\limits_{n}\lambda_{k},\quad\limsup\limits_{n}\lambda_{k}\leq\varphi(e),

Finally, letting bโ†‘ฮณkb\uparrow\gamma_{k} and eโ†“ฮณke\downarrow\gamma_{k}, we have

ฯ†โ€‹(ฮณk)โ‰คlim infnฮปk,lim supnฮปkโ‰คฯ†โ€‹(ฮณk)in probability.\varphi(\gamma_{k})\leq\liminf\limits_{n}\lambda_{k},\quad\limsup\limits_{n}\lambda_{k}\leq\varphi(\gamma_{k})\quad\text{in probability.} (45)

From (45), we conclude that, in probability,

limnโ†’โˆžฮปk=ฯ†โ€‹(ฮณk),k=1,2,โ€ฆ,K.\lim\limits_{n\rightarrow\infty}\lambda_{k}=\varphi(\gamma_{k}),~{}k=1,2,\ldots,K.

Next we turn to the second assertion (20). Theorems 1.1 and 2.1 in Silverstein and Choi, (1995) show that \underline{F}^{c,H} has a continuous derivative on \mathbb{R}\setminus\{0\} given by the imaginary part of its Stieltjes transform \underline{m}, so that

Fc,Hโ€‹(x)=1cโ€‹ฯ€โ€‹โˆซฮ“Fยฏc,Hโ€‹โ‹‚(0,x]โ„‘โกmยฏโ€‹(t)โ€‹๐‘‘t.F^{c,H}(x)=\frac{1}{c\pi}\int_{\Gamma_{\underline{F}^{c,H}}\bigcap(0,x]}\Im\underline{m}(t){d}t.

Note that, for positive x\notin\Gamma_{F^{c,H}}, \Im\underline{m}(x)=0. Moreover, Lemma 3 indicates that no eigenvalues lie outside the support of the LSD of \mathbf{S}_{n} and \tilde{\mathbf{S}}_{n}, so there exist \lambda_{j},\lambda_{j+1} and 0<a<b\in\partial\Gamma_{F^{c,H}} satisfying

P(ฮปj+1โ‰คa<x<bโ‰คฮปj,for all largen)=1,Fc,H(a)=Fc,H(x)=Fc,H(b)โ‰œ1โˆ’q.P(\lambda_{j+1}\leq a<x<b\leq\lambda_{j},~{}\text{for all large}~{}n)=1,~{}~{}F^{c,H}(a)=F^{c,H}(x)=F^{c,H}(b)\triangleq 1-q.

Thus, with probability 1, for a<x<ba<x<b

F๐‘บnโ€‹(x)=1pโ€‹โˆ‘i=1pI[ฮปi,โˆž)โ€‹(x)=1โˆ’jpโ€‹โŸถ๐‘คโ€‹Fc,Hโ€‹(x).F^{\bm{S}_{n}}(x)=\frac{1}{p}\sum_{i=1}^{p}I_{[\lambda_{i},\infty)}(x)=1-\frac{j}{p}\overset{w}{\longrightarrow}F^{c,H}(x).

From the definition of quantile, we have

limnโ†’โˆžฮปj=a=ฮผ1โˆ’q,a.s..\lim_{n\rightarrow\infty}\lambda_{j}=a=\mu_{1-q},~{}~{}\text{a.s.}. (46)

When xโˆˆฮ“Fc,Hโˆ–{0}x\in\Gamma_{F^{c,H}}\setminus\{0\}, we can find ฮปr\lambda_{r} such that

limnโ†’โˆžฮปr=x,a.s..\lim_{n\rightarrow\infty}\lambda_{r}=x,~{}~{}\text{a.s.}. (47)

Therefore, with probability 1, there exists ฮด=oโ€‹(p)\delta=o(p) such that

F๐‘บnโ€‹(x)=1pโ€‹โˆ‘i=1pI[ฮปi,โˆž)โ€‹(x)=1โˆ’r+ฮดpโ€‹โŸถ๐‘คโ€‹Fc,Hโ€‹(x)โ‰œ1โˆ’q,F^{\bm{S}_{n}}(x)=\frac{1}{p}\sum_{i=1}^{p}I_{[\lambda_{i},\infty)}(x)=1-\frac{r+\delta}{p}\overset{w}{\longrightarrow}F^{c,H}(x)\triangleq 1-q, (48)

which yields ฮผ1โˆ’q=x\mu_{1-q}=x. Hence, (20) follows from (46)-(48). The conclusion follows. โˆŽ

Proof of Theorem 4.

Proof.

We first consider the case where k<K. Note that the criteria in (21) can also be expressed as

EDAk=โˆ’nโ€‹logโก(ฮธ1โ€‹โ‹ฏโ€‹ฮธk)+nโ€‹(pโˆ’kโˆ’1)โ€‹logโกฮธ~p,k+2โ€‹pโ€‹k,EDBk=โˆ’nโ€‹logโก(p)โ‹…logโก(ฮธ1โ€‹โ‹ฏโ€‹ฮธk)+nโ€‹(pโˆ’kโˆ’1)โ€‹logโกฮธ~p,k+(logโกn)โ€‹pโ€‹k.\begin{split}&\text{EDA}_{k}=-n\log(\theta_{1}\cdots\theta_{k})+n(p-k-1)\log\widetilde{\theta}_{p,k}+2pk,\\ &\text{EDB}_{k}=-n\log(p)\cdot\log(\theta_{1}\cdots\theta_{k})+n(p-k-1)\log\widetilde{\theta}_{p,k}+(\log n)pk.\end{split} (49)

From (49), write

1nโ€‹(EDAkโˆ’EDAK)=1nโ€‹โˆ‘i=k+1K(EDAiโˆ’1โˆ’EDAi)\displaystyle\frac{1}{n}\left(\text{EDA}_{k}-\text{EDA}_{K}\right)=\frac{1}{n}\sum_{i=k+1}^{K}\left(\text{EDA}_{i-1}-\text{EDA}_{i}\right) (50)
=\displaystyle= โˆ‘i=k+1K{logโกฮธi+(pโˆ’i)โ€‹logโก[1โˆ’1pโˆ’iโ€‹(1โˆ’ฮธi2ฮธ~p,i)]+logโกฮธ~p,iโˆ’2โ€‹pn}\displaystyle\sum_{i=k+1}^{K}\left\{\log\theta_{i}+(p-i)\log\left[1-\frac{1}{p-i}\left(1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{p,i}}\right)\right]+\log\widetilde{\theta}_{p,i}-2\frac{p}{n}\right\}
โˆผ\displaystyle\sim โˆ‘i=k+1K{ฮพi2ฮธ~p,i+logโกฮพi+logโกฮธ~p,iโˆ’1โˆ’2โ€‹c}.\displaystyle\sum_{i=k+1}^{K}\left\{\frac{\xi_{i}^{2}}{\widetilde{\theta}_{p,i}}+\log\xi_{i}+\log\widetilde{\theta}_{p,i}-1-2c\right\}.

If there are h=oโ€‹(p)h=o(p) bulks in ฮ“Fc,H\Gamma_{F^{c,H}}, from (20), we have

ฮธrj=Oโ€‹(1)โ€‹a.s.,rjโˆˆ{K+1,โ€ฆ,pโˆ’1},j=1,โ€ฆ,hโˆ’1,ฮธrโ†’1โ€‹a.s.,rโˆˆ๐•ƒโ‰œ{K+1,โ€ฆ,pโˆ’1}โˆ–{r1,โ€ฆ,rhโˆ’1}.\begin{split}&\theta_{r_{j}}=O(1)~{}~{}\text{a.s.},~{}r_{j}\in\{K+1,\ldots,p-1\},j=1,\ldots,h-1,\\ &\theta_{r}\rightarrow 1~{}~{}\text{a.s.},~{}r\in\mathbb{L}\triangleq\{K+1,\ldots,p-1\}\setminus\{r_{1},\ldots,r_{h-1}\}.\end{split} (51)

Combining this with (24), for i\in[k,K], we obtain

1โ‰คฮธ~p,i=1pโˆ’iโˆ’1โ€‹(ฮธi+12+โ‹ฏ+ฮธK2+ฮธK+12+โ‹ฏ+ฮธpโˆ’12)โ‰ค1pโˆ’iโˆ’1โ€‹((Kโˆ’i+hโˆ’1)โ€‹maxjโˆˆ{i+1,โ€ฆ,K,r1,โ€ฆ,rhโˆ’1}โกฮธj2+โˆ‘rโˆˆ๐•ƒฮธr2)โ†’1,\begin{split}1\leq\widetilde{\theta}_{p,i}=&\frac{1}{p-i-1}\left(\theta_{i+1}^{2}+\cdots+\theta_{K}^{2}+\theta_{K+1}^{2}+\cdots+\theta_{p-1}^{2}\right)\\ \leq&\frac{1}{p-i-1}\left((K-i+h-1)\max\limits_{j\in\{i+1,\ldots,K,r_{1},\ldots,r_{h-1}\}}\theta_{j}^{2}+\sum_{r\in\mathbb{L}}\theta_{r}^{2}\right)\rightarrow 1,\end{split}

as pโ†’โˆžp\rightarrow\infty. Thus, (50) is equivalent to

โˆ‘i=k+1K{ฮพi2+logโกฮพiโˆ’1โˆ’2โ€‹c}.\sum_{i=k+1}^{K}\left\{\xi_{i}^{2}+\log\xi_{i}-1-2c\right\}. (52)

If the gap condition (26) does not hold, (52) can be negative, so that K^EDA\hat{K}_{\text{EDA}} is not consistent. Otherwise, for k<Kk<K and sufficiently large pp, we have

1nโ€‹(EDAkโˆ’EDAK)>0.\frac{1}{n}\left(\text{EDA}_{k}-\text{EDA}_{K}\right)>0.

In other words,

K^EDA=argโกmink=1,โ€ฆ,Kโก1nโ€‹EDAk=K,a.s..\hat{K}_{\text{EDA}}=\arg\min\limits_{k=1,\ldots,K}\frac{1}{n}\text{EDA}_{k}=K,~{}a.s.. (53)

Next, consider the case that K<kโ‰คwK<k\leq w. It follows that

1nโ€‹(EDAkโˆ’EDAK)=1nโ€‹โˆ‘i=K+1k(EDAiโˆ’EDAiโˆ’1)\displaystyle\frac{1}{n}\left(\text{EDA}_{k}-\text{EDA}_{K}\right)=\frac{1}{n}\sum_{i=K+1}^{k}\left(\text{EDA}_{i}-\text{EDA}_{i-1}\right) (54)
=\displaystyle= โˆ‘i=K+1k{โˆ’logโกฮธiโˆ’(pโˆ’i)โ€‹logโก[1โˆ’1pโˆ’iโ€‹(1โˆ’ฮธi2ฮธ~p,i)]โˆ’logโกฮธ~p,i+2โ€‹pn}\displaystyle\sum_{i=K+1}^{k}\left\{-\log\theta_{i}-(p-i)\log\left[1-\frac{1}{p-i}\left(1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{p,i}}\right)\right]-\log\widetilde{\theta}_{p,i}+2\frac{p}{n}\right\}
โˆผ\displaystyle\sim โˆ‘i=K+1k{1โˆ’ฮธi2ฮธ~p,iโˆ’logโกฮธiโˆ’logโกฮธ~p,i+2โ€‹c}.\displaystyle\sum_{i=K+1}^{k}\left\{1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{p,i}}-\log\theta_{i}-\log\widetilde{\theta}_{p,i}+2c\right\}.

By (51), for i=K+1,โ€ฆ,wi=K+1,\ldots,w, we have

ฮธ~p,i=1pโˆ’iโˆ’1โ€‹โˆ‘j=i+1pโˆ’1ฮธj2โ†’1โ€‹a.s..\widetilde{\theta}_{p,i}=\frac{1}{p-i-1}\sum_{j=i+1}^{p-1}\theta_{j}^{2}\rightarrow 1~{}~{}~{}\text{a.s.}.

Hence, (54) is equivalent to 2โ€‹(kโˆ’K)โ€‹c>02(k-K)c>0, which follows from w=oโ€‹(p)w=o(p). Then,

K^EDA=argโกmink=K,โ€ฆ,wโก1nโ€‹EDAk=Ka.s.,\hat{K}_{\text{EDA}}=\arg\min\limits_{k=K,\ldots,w}\frac{1}{n}\text{EDA}_{k}=K~{}~{}~{}~{}\text{a.s.},

which, together with (53), yields conclusion (i).

If \lambda_{K}\rightarrow\infty, the proof for the case where K<k\leq w proceeds in the same manner as before and is not repeated here.

For k<K, from Lemma 1 and the second assertion of Theorem 2, we obtain

1nโ€‹(EDAkโˆ’EDAK)\displaystyle\frac{1}{n}\left(\text{EDA}_{k}-\text{EDA}_{K}\right) (55)
=\displaystyle= logโก(ฮธk+1โ€‹โ‹ฏโ€‹ฮธK)+(pโˆ’Kโˆ’1)โ€‹logโกฮธ~p,kฮธ~p,K+(Kโˆ’k)โ€‹logโกฮธ~p,kโˆ’2โ€‹(Kโˆ’k)โ€‹pn\displaystyle\log(\theta_{k+1}\cdots\theta_{K})+(p-K-1)\log\frac{\widetilde{\theta}_{p,k}}{\widetilde{\theta}_{p,K}}+(K-k)\log\widetilde{\theta}_{p,k}-2(K-k)\frac{p}{n}
=\displaystyle= ฮปk+1โˆ’ฮปK+1+(pโˆ’Kโˆ’1)โ€‹logโก[1+1pโˆ’kโˆ’1โ€‹(ฮธk+12+โ‹ฏ+ฮธK2ฮธ~p,Kโˆ’(Kโˆ’k))]\displaystyle\lambda_{k+1}-\lambda_{K+1}+(p-K-1)\log\left[1+\frac{1}{p-k-1}\left(\frac{\theta_{k+1}^{2}+\cdots+\theta_{K}^{2}}{\widetilde{\theta}_{p,K}}-(K-k)\right)\right]
+(Kโˆ’k)โ€‹logโกฮธ~p,kโˆ’2โ€‹(Kโˆ’k)โ€‹pn\displaystyle+(K-k)\log\widetilde{\theta}_{p,k}-2(K-k)\frac{p}{n}
โˆผ\displaystyle\sim ฮปk+1โˆ’ฮผ1+(pโˆ’Kโˆ’1)โ€‹logโก[1+1pโˆ’kโˆ’1โ€‹(ฮธk+12+โ‹ฏ+ฮธK2ฮธ~p,Kโˆ’(Kโˆ’k))]\displaystyle\lambda_{k+1}-\mu_{1}+(p-K-1)\log\left[1+\frac{1}{p-k-1}\left(\frac{\theta_{k+1}^{2}+\cdots+\theta_{K}^{2}}{\widetilde{\theta}_{p,K}}-(K-k)\right)\right]
+(Kโˆ’k)โ€‹logโกฮธ~p,kโˆ’2โ€‹(Kโˆ’k)โ€‹c\displaystyle+(K-k)\log\widetilde{\theta}_{p,k}-2(K-k)c
โ‰ฅ\displaystyle\geq ฮปk+1โˆ’ฮผ1+(pโˆ’Kโˆ’1)โ€‹logโก[1+1pโˆ’kโˆ’1โ€‹(ฮธK2ฮธ~p,Kโˆ’(Kโˆ’k))]\displaystyle\lambda_{k+1}-\mu_{1}+(p-K-1)\log\left[1+\frac{1}{p-k-1}\left(\frac{\theta_{K}^{2}}{\widetilde{\theta}_{p,K}}-(K-k)\right)\right]
+(Kโˆ’k)โ€‹logโกฮธ~p,kโˆ’2โ€‹(Kโˆ’k)โ€‹c\displaystyle+(K-k)\log\widetilde{\theta}_{p,k}-2(K-k)c

Since

1pโˆ’kโˆ’1โ€‹(ฮธK2ฮธ~p,Kโˆ’(Kโˆ’k))โˆผ1pโˆ’kโˆ’1โ€‹(expโก{2โ€‹(ฮปKโˆ’ฮผ1)}โˆ’(Kโˆ’k))>0,\frac{1}{p-k-1}\left(\frac{\theta_{K}^{2}}{\widetilde{\theta}_{p,K}}-(K-k)\right)\sim\frac{1}{p-k-1}\left(\exp\{2(\lambda_{K}-\mu_{1})\}-(K-k)\right)>0,

the second term of (55), and hence (55) itself, tends to infinity as p\rightarrow\infty. Hence the second assertion holds. ∎

Proof of Corollary 1.

Proof.

We first verify (29). From (16) we find that for any fixed unit vector ๐ฎโˆˆโ„p\mathbf{u}\in\mathbb{R}^{p},

inftโˆˆโ„โ€–tโ€‹๐ฎโˆ’๐ฎ^kโ€–2=1โˆ’๐ฎโŠคโ€‹๐ฎ^kโ€‹๐ฎ^kโŠคโ€‹๐ฎ=1โˆ’ฮทkโ€‹๐ฎโŠคโ€‹๐€nโŠคโ€‹ฮพkโ€‹ฮพkโŠคโ€‹๐€nโ€‹๐ฎฮณk+OPโ€‹(1n),\inf_{t\in\mathbb{R}}\|t\mathbf{u}-\hat{\mathbf{u}}_{k}\|^{2}=1-\mathbf{u}^{\top}\hat{\mathbf{u}}_{k}\hat{\mathbf{u}}_{k}^{\top}\mathbf{u}=1-\eta_{k}\frac{\mathbf{u}^{\top}\mathbf{A}_{n}^{\top}\xi_{k}\xi_{k}^{\top}\mathbf{A}_{n}\mathbf{u}}{\gamma_{k}}+O_{P}\left(\frac{1}{\sqrt{n}}\right), (56)

where the first step holds by taking t=๐ฎ^kโŠคโ€‹๐ฎt=\hat{\mathbf{u}}_{k}^{\top}\mathbf{u}. Note that ๐€nโŠคโ€‹ฮพkโ€‹ฮพkโŠคโ€‹๐€n\mathbf{A}_{n}^{\top}\xi_{k}\xi_{k}^{\top}\mathbf{A}_{n} is a rank one matrix and its eigenvector associated with the non-zero eigenvalue is ๐ฎโˆ—:=๐€nโŠคโ€‹ฮพk/โ€–๐€nโŠคโ€‹ฮพkโ€–\mathbf{u}^{*}:=\mathbf{A}_{n}^{\top}\xi_{k}/\|\mathbf{A}_{n}^{\top}\xi_{k}\|. Then (29) follows by substituting ๐ฎ=๐ฎโˆ—\mathbf{u}=\mathbf{u}^{*} into (56) and using the fact that (๐€nโ€‹๐€nโŠค+๐šบ)โ€‹ฮพk=ฮณkโ€‹ฮพk(\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma})\xi_{k}=\gamma_{k}\xi_{k}.

The second statement (30) follows by noting that \Lambda=\mathbf{V}_{r}^{\top}\hat{\mathbf{U}}_{r} minimizes \|\mathbf{V}_{r}\Lambda-\hat{\mathbf{U}}_{r}\|_{F}^{2}, and the minimum value is again obtained from (16). ∎

References

  • Bai etย al., (2007) Bai, Z., Miao, B., and Pan, G. (2007). On asymptotics of eigenvectors of large sample covariance matrix. The Annals of Probability, 35(4):1532โ€“1572.
  • Bai and Silverstein, (2010) Bai, Z. and Silverstein, J.ย W. (2010). Spectral analysis of large dimensional random matrices, volumeย 20. Springer.
  • Bai and Yao, (2012) Bai, Z. and Yao, J. (2012). On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis, 106:167โ€“177.
  • Bai et al., (2018) Bai, Z. D., Choi, K. P., and Fujikoshi, Y. (2018). Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis. The Annals of Statistics, 46(3):1050–1076.
  • Bao etย al., (2022) Bao, Z., Ding, X., Wang, J., and Wang, K. (2022). Statistical inference for principal components of spiked covariance matrices. The Annals of Statistics, 50(2):1144โ€“1169.
  • Bao etย al., (2021) Bao, Z., Ding, X., and Wang, K. (2021). Singular vector and singular subspace distribution for the matrix denoising model. The Annals of Statistics, 49(1):370โ€“392.
  • Berry and Tamer, (2006) Berry, S. and Tamer, E. (2006). Identification in models of oligopoly entry. Econometric Society Monographs, 42:46.
  • Bradley etย al., (1999) Bradley, P.ย S., Fayyad, U.ย M., and Mangasarian, O.ย L. (1999). Mathematical programming for data mining: Formulations and challenges. INFORMS Journal on Computing, 11(3):217โ€“238.
  • Cai etย al., (2019) Cai, T.ย T., Ma, J., and Zhang, L. (2019). Chime: Clustering of high-dimensional gaussian mixtures with em algorithm and its optimality. The Annals of Statistics, 47(3):1234โ€“1267.
  • Cape etย al., (2019) Cape, J., Tang, M., and Priebe, C.ย E. (2019). Signal-plus-noise matrix models: eigenvector deviations and fluctuations. Biometrika, 106(1):243โ€“250.
  • Chen etย al., (2014) Chen, X., Ponomareva, M., and Tamer, E. (2014). Likelihood inference in some finite mixture models. Journal of Econometrics, 182(1):87โ€“99.
  • Couillet etย al., (2016) Couillet, R., Benaych-Georges, F., etย al. (2016). Kernel spectral clustering of large dimensional data. Electronic Journal of Statistics, 10(1):1393โ€“1454.
  • David, (2020) David, P.ย H. (2020). Degrees of freedom and model selection for kk-means. Computational Statistics and Data Analysis, 149:1โ€“14.
  • Ding, (2020) Ding, X. (2020). High dimensional deformed rectangular matrices with applications in matrix denoising. Bernoulli, 26(1):387โ€“417.
  • Duda and Hart, (1973) Duda, R.ย O. and Hart, P.ย E. (1973). Pattern classification and scene analysis, volumeย 3. Wiley New York.
  • Hachem etย al., (2007) Hachem, W., Loubaton, P., and Najim, J. (2007). Deterministic equivalents for certain functionals of large random matrices. The Annals of Applied Probability, 17(3):875โ€“930.
  • Hachem etย al., (2013) Hachem, W., Loubaton, P., Najim, J., and Vallet, P. (2013). On bilinear forms based on the resolvent of large random matrices. In Annales de lโ€™IHP Probabilitรฉs et statistiques, volumeย 49, pages 36โ€“63.
  • Han etย al., (2021) Han, X., Tong, X., and Fan, Y. (2021). Eigen selection in spectral clustering: A theory-guided practice. Journal of the American Statistical Association, pages 1โ€“13.
  • Jiang and Bai, (2021) Jiang, D. and Bai, Z. (2021). Generalized four moment theorem and an application to clt for spiked eigenvalues of high-dimensional covariance matrices. Bernoulli, 27(1):274โ€“294.
  • Jin, (2015) Jin, J. (2015). Fast community detection by score. The Annals of Statistics, 43(1):57โ€“89.
  • Kaufman and Rousseeuw, (1987) Kaufman, L. and Rousseeuw, P. (1987). Clustering by means of medoids.
  • Keane and Wolpin, (1997) Keane, M.ย P. and Wolpin, K.ย I. (1997). The career decisions of young men. Journal of political Economy, 105(3):473โ€“522.
  • Liu etย al., (2022) Liu, Y., Liang, Y.-C., Pan, G., and Zhang, Z. (2022). Random or nonrandom signal in high-dimensional regimes. IEEE Transactions on Information Theory, 69(1):298โ€“315.
  • Lรถffler etย al., (2021) Lรถffler, M., Zhang, A.ย Y., and Zhou, H.ย H. (2021). Optimality of spectral clustering in the gaussian mixture model. The Annals of Statistics, 49(5):2506โ€“2530.
  • Loubaton and Vallet, (2011) Loubaton, P. and Vallet, P. (2011). Almost sure localization of the eigenvalues in a gaussian information plus noise model. application to the spiked models. Electronic Journal of Probability, 16:1934โ€“1959.
  • MacQueen etย al., (1967) MacQueen, J. etย al. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volumeย 1, pages 281โ€“297. Oakland, CA, USA.
  • Maimon and Rokach, (2005) Maimon, O. and Rokach, L. (2005). Data mining and knowledge discovery handbook.
  • (28) Mestre, X. (2008a). Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates. IEEE Transactions on Information Theory, 54(11):5113โ€“5129.
  • (29) Mestre, X. (2008b). On the asymptotic behavior of the sample estimates of eigenvalues and eigenvectors of covariance matrices. IEEE Transactions on Signal Processing, 56(11):5353โ€“5368.
  • Nadakuditi, (2014) Nadakuditi, R.ย R. (2014). Optshrink: An algorithm for improved low-rank signal matrix denoising by optimal, data-driven singular value shrinkage. IEEE Transactions on Information Theory, 60(5):3002โ€“3018.
  • Pan, (2014) Pan, G. (2014). Comparison between two types of large sample covariance matrices. In Annales de lโ€™IHP Probabilitรฉs et statistiques, volumeย 50, pages 655โ€“677.
  • Redner and Walker, (1984) Redner, R.ย A. and Walker, H.ย F. (1984). Mixture densities, maximum likelihood and the em algorithm. SIAM review, 26(2):195โ€“239.
  • Rousseeuw, (1987) Rousseeuw, P.ย J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Computational and Applied Mathematics, 20:53โ€“65.
  • Silverstein and Bai, (1995) Silverstein, J.ย W. and Bai, Z. (1995). On the empirical distribution of eigenvalues of a class of large dimensional random matrices. Journal of Multivariate analysis, 54(2):175โ€“192.
  • Silverstein and Choi, (1995) Silverstein, J.ย W. and Choi, S.-I. (1995). Analysis of the limiting spectral distribution of large dimensional random matrices. Journal of Multivariate Analysis, 54(2):295โ€“309.
  • Thorndike, (1953) Thorndike, R.ย L. (1953). Who belongs in the family? Psychometrika, 18(4):267โ€“276.
  • Tibshirani etย al., (2001) Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 63(2):411โ€“423.
  • Vallet etย al., (2012) Vallet, P., Loubaton, P., and Mestre, X. (2012). Improved subspace estimation for multivariate observations of high dimension: the deterministic signals case. IEEE Transactions on Information Theory, 58(2):1043โ€“1068.
  • Yang etย al., (2016) Yang, D., Ma, Z., and Buja, A. (2016). Rate optimal denoising of simultaneously sparse and low rank matrices. The Journal of Machine Learning Research, 17(1):3163โ€“3189.
  • Yin etย al., (1988) Yin, Y., Bai, Z.ย D., and Krishnaiah, P. (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probability Theory and Related Fields, 78:509โ€“521.
  • Zhang etย al., (2022) Zhang, Z., Zheng, S., Pan, G., and Zhong, P.-S. (2022). Asymptotic independence of spiked eigenvalues and linear spectral statistics for large sample covariance matrices. The Annals of Statistics, 50(4):2205โ€“2230.
  • Zhou and Amini, (2019) Zhou, Z. and Amini, A.ย A. (2019). Analysis of spectral clustering algorithms for community detection: the general bipartite setting. The Journal of Machine Learning Research, 20(1):1774โ€“1820.

Supplementary material on โ€œOn the asymptotic properties of spiked eigenvalues and eigenvectors of signal-plus-noise matrices with their applicationsโ€

The supplementary material provides an additional application, the criteria for estimating the number of clusters when c>1, the remaining proofs of the theoretical results, and some additional simulation studies.

6 S.1 The estimation of the number of clusters when c>1c>1.

In this section, we consider the case when p,nโ†’โˆžp,n\rightarrow\infty such that p/nโ†’cโˆˆ(1,โˆž)p/n\rightarrow c\in(1,\infty). Then the smallest (pโˆ’n)(p-n) eigenvalues of ๐‘บn\bm{S}_{n} are zero, that is,

ฮป1โ‰ฅฮป2โ‰ฅโ€ฆโ‰ฅฮปK>ฮปK+1โ‰ฅโ€ฆโ‰ฅฮปnโˆ’1โ‰ฅฮปnโ‰ฅฮปn+1=โ€ฆ=ฮปp=0.\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{K}>\lambda_{K+1}\geq\ldots\geq\lambda_{n-1}\geq\lambda_{n}\geq\lambda_{n+1}=\ldots=\lambda_{p}=0.

The modified criteria EDAยดk\acute{\text{EDA}}_{k} and EDBยดk\acute{\text{EDB}}_{k} for selecting the true number of clusters under c>1c>1 are obtained by replacing the second term in EDAk\text{EDA}_{k} and EDBk\text{EDB}_{k} with (nโˆ’kโˆ’1)โ€‹logโกฮธ~n,k(n-k-1)\log\widetilde{\theta}_{n,k}:

EDAยดk\displaystyle\acute{\text{EDA}}_{k} =\displaystyle= โˆ’nโ€‹(ฮป1โˆ’ฮปk+1)+nโ€‹(nโˆ’kโˆ’1)โ€‹logโกฮธ~n,k+2โ€‹pโ€‹k,\displaystyle-n(\lambda_{1}-\lambda_{k+1})+n(n-k-1)\log\widetilde{\theta}_{n,k}+2pk,
EDBยดk\displaystyle\acute{\text{EDB}}_{k} =\displaystyle= โˆ’nโ€‹logโก(p)โ‹…(ฮป1โˆ’ฮปk+1)+nโ€‹(nโˆ’kโˆ’1)โ€‹logโกฮธ~n,k+(logโกn)โ€‹pโ€‹k\displaystyle-n\log(p)\cdot(\lambda_{1}-\lambda_{k+1})+n(n-k-1)\log\widetilde{\theta}_{n,k}+(\log n)pk

where \theta_{k}=\exp\{\lambda_{k}-\lambda_{k+1}\}, k=1,2,\ldots,n-1, and \tilde{\theta}_{n,k}=\frac{1}{n-k-1}\sum_{i=k+1}^{n-1}\theta_{i}^{2}; we call these criteria pseudo-EDA and pseudo-EDB, respectively. Analogous to the case where 0<c<1, pseudo-EDA and pseudo-EDB select the number of clusters by

K^EDAยด\displaystyle\hat{K}_{\acute{\text{EDA}}} =\displaystyle= argโกmink=1,โ€ฆ,wโก1nโ€‹EDAยดk,\displaystyle\arg\min\limits_{k=1,\ldots,w}\frac{1}{n}\acute{\text{EDA}}_{k},
K^EDBยด\displaystyle\hat{K}_{\acute{\text{EDB}}} =\displaystyle= argโกmink=1,โ€ฆ,wโก1nโ€‹EDBยดk.\displaystyle\arg\min\limits_{k=1,\ldots,w}\frac{1}{n}\acute{\text{EDB}}_{k}.
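
For concreteness, the sketch below (NumPy; the function interface and the search range w are our own illustrative choices, beyond the formulas above) computes \acute{\text{EDA}}_{k} and \acute{\text{EDB}}_{k} from the ordered eigenvalues of \bm{S}_{n} and returns the two selected cluster numbers. Dividing by n, as in the definitions of \hat{K}_{\acute{\text{EDA}}} and \hat{K}_{\acute{\text{EDB}}}, does not change the argmin and is therefore omitted.

```python
import numpy as np

def pseudo_eda_edb(eigvals, p, n, w):
    """Select the number of clusters via pseudo-EDA and pseudo-EDB (case c > 1).

    eigvals : eigenvalues of S_n; only the n largest are used.
    Returns (K_hat_pseudo_EDA, K_hat_pseudo_EDB) over the range k = 1, ..., w.
    """
    lam = np.sort(np.asarray(eigvals))[::-1][:n]
    theta = np.exp(lam[:n - 1] - lam[1:n])          # theta_k = exp(lambda_k - lambda_{k+1})
    eda, edb = [], []
    for k in range(1, w + 1):
        theta_tilde = np.sum(theta[k:n - 1] ** 2) / (n - k - 1)   # tilde_theta_{n,k}
        common = n * (n - k - 1) * np.log(theta_tilde)
        eda.append(-n * (lam[0] - lam[k]) + common + 2 * p * k)
        edb.append(-n * np.log(p) * (lam[0] - lam[k]) + common + np.log(n) * p * k)
    return int(np.argmin(eda)) + 1, int(np.argmin(edb)) + 1
```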

The corresponding gap conditions for pseudo-EDA and pseudo-EDB stay the same as in (26) and (27), respectively. The following theorem shows that \hat{K}_{\acute{\text{EDA}}} and \hat{K}_{\acute{\text{EDB}}} possess properties similar to those of \hat{K}_{\text{EDA}} and \hat{K}_{\text{EDB}}.

Theorem 6.

Under the conditions of Theorems 4 and 5, we have the following consistency results for the estimators \hat{K}_{\acute{\text{EDA}}} and \hat{K}_{\acute{\text{EDB}}}.
(i) Suppose that ฮป1\lambda_{1} is bounded. If the gap conditions (26), (27) do not hold, then K^EDAยด\hat{K}_{\acute{\text{EDA}}} and K^EDBยด\hat{K}_{\acute{\text{EDB}}} are not consistent. If the gap conditions (26) and (27) hold, then K^EDAยด\hat{K}_{\acute{\text{EDA}}} and K^EDBยด\hat{K}_{\acute{\text{EDB}}} are strongly consistent.
(ii) Suppose that ฮปK\lambda_{K} tends to infinity. Then, K^EDAยด\hat{K}_{\acute{\text{EDA}}} and K^EDBยด\hat{K}_{\acute{\text{EDB}}} are strongly consistent.

Remark 9.

To illustrate EDA and EDB, one can refer to the example below, in which the true number of clusters is two.

Example. Let p=60, n=100, \bm{\Sigma} be the p\times p identity matrix, c=3/5, and \varpi_{1}=\ldots=\varpi_{p}=1. Suppose the means of the two clusters are \bm{\mu}_{1}=(2,0,0,\ldots,0)^{\top} and \bm{\mu}_{2}=(0,2,0,\ldots,0)^{\top}, with an equal number of observations in each cluster, that is, n_{1}=n_{2}=50. From Theorem 2, the limits of the first four eigenvalues of \bm{S}_{n} can be obtained as follows

ฮป1,ฮป2โ†’ฯ†โ€‹(3)=3.9โ€‹i.p.,ฮป3,ฮป4โ†’(1+3/5)2โ€‹a.s..\lambda_{1},~{}\lambda_{2}\rightarrow\varphi(3)=3.9~{}~{}\text{i.p.},\quad\quad\lambda_{3},~{}\lambda_{4}\rightarrow(1+\sqrt{3/5})^{2}~{}~{}\text{a.s.}. (S.57)

Then,

ฮธ1=expโก{ฮป1โˆ’ฮป2}โ†’1,ฮธ2=expโก{ฮป2โˆ’ฮป3}โ†’expโก{3.9โˆ’(1+3/5)2},ฮธ3,ฮธ4,โ€ฆ,ฮธpโˆ’1โ†’1,ฮธ~p,2=1pโˆ’3โ€‹โˆ‘i=3pโˆ’1ฮธi2โˆผ1,ฮธ~p,3โˆผ1,ฮธ~p,1=1pโˆ’2โ€‹โˆ‘i=2pโˆ’1ฮธi2=ฮธ22pโˆ’2+1pโˆ’2โ€‹โˆ‘i=3pโˆ’1ฮธi2โˆผexpโก{2โ€‹[3.9โˆ’(1+3/5)2]}pโˆ’2+pโˆ’3pโˆ’2.\begin{split}&\theta_{1}=\exp\{\lambda_{1}-\lambda_{2}\}\rightarrow 1,\quad\theta_{2}=\exp\{\lambda_{2}-\lambda_{3}\}\rightarrow\exp\{3.9-(1+\sqrt{3/5})^{2}\},\\ &\theta_{3},\theta_{4},\ldots,\theta_{p-1}\rightarrow 1,\quad\tilde{\theta}_{p,2}=\frac{1}{p-3}\sum_{i=3}^{p-1}\theta_{i}^{2}\sim 1,\quad\tilde{\theta}_{p,3}\sim 1,\\ &\tilde{\theta}_{p,1}=\frac{1}{p-2}\sum_{i=2}^{p-1}\theta_{i}^{2}=\frac{\theta_{2}^{2}}{p-2}+\frac{1}{p-2}\sum_{i=3}^{p-1}\theta_{i}^{2}\sim\frac{\exp\{2[3.9-(1+\sqrt{3/5})^{2}]\}}{p-2}+\frac{p-3}{p-2}.\end{split} (S.58)

Using (S.57) and (S.58), we have

1nโ€‹(EDA1โˆ’EDA2)โˆผ(pโˆ’2)โ€‹logโกฮธ~p,1+(ln,1โˆ’ln,3)โˆ’2โ€‹pnโ‰ˆ2.94>0,1nโ€‹(EDB1โˆ’EDB2)โˆผ(pโˆ’2)โ€‹logโกฮธ~p,1+logโก(p)โ€‹(ln,1โˆ’ln,3)โˆ’(logโกn)โ€‹pnโ‰ˆ3.7>0,\begin{split}&\frac{1}{n}(\text{EDA}_{1}-\text{EDA}_{2})\sim(p-2)\log\tilde{\theta}_{p,1}+(l_{n,1}-l_{n,3})-2\frac{p}{n}\approx 2.94>0,\\ &\frac{1}{n}(\text{EDB}_{1}-\text{EDB}_{2})\sim(p-2)\log\tilde{\theta}_{p,1}+\log(p)(l_{n,1}-l_{n,3})-(\log n)\frac{p}{n}\approx 3.7>0,\end{split} (S.59)

which means that EDA and EDB cannot lead to underestimation of the number of clusters, while the following expressions imply that they do not lead to overestimation either

1nโ€‹(EDA3โˆ’EDA2)โˆผ2โ€‹pn=1.2>0,1nโ€‹(EDB3โˆ’EDB2)โˆผ(logโกn)โ€‹pnโ‰ˆ2.76>0.\begin{split}&\frac{1}{n}(\text{EDA}_{3}-\text{EDA}_{2})\sim 2\frac{p}{n}=1.2>0,\\ &\frac{1}{n}(\text{EDB}_{3}-\text{EDB}_{2})\sim(\log n)\frac{p}{n}\approx 2.76>0.\end{split} (S.60)

From (S.59) and (S.60), it follows that both EDA and EDB are able to estimate the number of clusters accurately.
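
The numerical values appearing in (S.57)–(S.60) can be reproduced with a few lines; the sketch below (NumPy) evaluates \varphi(3), the bulk edge (1+\sqrt{3/5})^{2}, \tilde{\theta}_{p,1}, and the four differences, where l_{n,1}-l_{n,3} is interpreted as the limiting gap \varphi(3)-(1+\sqrt{3/5})^{2}.

```python
import numpy as np

p, n = 60, 100
c = p / n                                        # c = 3/5
gamma = 3.0                                      # spiked population eigenvalue

phi = gamma + c * gamma / (gamma - 1.0)          # phi(3) with H = delta_1: 3.9
edge = (1.0 + np.sqrt(c)) ** 2                   # right bulk edge (1 + sqrt(3/5))^2
gap = phi - edge                                 # limit of lambda_2 - lambda_3

theta2 = np.exp(gap)
theta_tilde_p1 = theta2 ** 2 / (p - 2) + (p - 3) / (p - 2)        # as in (S.58)

eda_diff = (p - 2) * np.log(theta_tilde_p1) + gap - 2 * p / n                         # approx 2.94
edb_diff = (p - 2) * np.log(theta_tilde_p1) + np.log(p) * gap - np.log(n) * p / n     # approx 3.7
print(round(phi, 2), round(eda_diff, 2), round(edb_diff, 2))
print(2 * p / n, round(np.log(n) * p / n, 2))    # 1.2 and approx 2.76, cf. (S.60)
```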

S.2.1 Additional simulations

We also consider the consistency properties of pseudo-EDA K^EDAยด\hat{K}_{\acute{\text{EDA}}} and pseudo-EDB K^EDBยด\hat{K}_{\acute{\text{EDB}}} when c=3/2c=3/2 and 33 under the following situations:

Case 5. Let ๐1=(5,0,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{1}=(5,0,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐2=(0,6,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{2}=(0,6,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐3=(โˆ’2,0,4,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{3}=(-2,0,4,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐šบ=(ฯƒi,j)pร—p\bm{\Sigma}=(\sigma_{i,j})_{p\times p}, where ฯƒi,j=0.2|iโˆ’j|\sigma_{i,j}=0.2^{|i-j|}. Then,

๐€n=(๐1,โ€ฆ,๐1โŸn1,๐2,โ€ฆ,๐2โŸn2,๐3,โ€ฆ,๐3โŸn3),\mathbf{A}_{n}=\big{(}\underbrace{\bm{\mu}_{1},\ldots,\bm{\mu}_{1}}_{n_{1}},\underbrace{\bm{\mu}_{2},\ldots,\bm{\mu}_{2}}_{n_{2}},\underbrace{\bm{\mu}_{3},\ldots,\bm{\mu}_{3}}_{n_{3}}\big{)},

where n1=n2=0.3โ€‹n,n3=0.4โ€‹nn_{1}=n_{2}=0.3n,~{}n_{3}=0.4n. Therefore, the true number of clusters is K=3K=3.

Case 6. Let ๐1=(4,0,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{1}=(4,0,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐2=(0,4,0,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{2}=(0,4,0,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐3=(0,0,4,0,โ€ฆ,0)โŠคโˆˆโ„p\bm{\mu}_{3}=(0,0,4,0,\ldots,0)^{\top}\in\mathbb{R}^{p}, ๐šบ=๐ˆ\bm{\Sigma}=\mathbf{I}. Then, ๐€n\mathbf{A}_{n} has the same form as above with n1=n2=0.3โ€‹n,n3=0.4โ€‹nn_{1}=n_{2}=0.3n,~{}n_{3}=0.4n. Therefore, the true number of clusters is K=3K=3.

Case 7. The same setting as in Case 6 with ๐ˆ\mathbf{I} replaced by ๐šบ=(ฯƒi,j)pร—p\bm{\Sigma}=(\sigma_{i,j})_{p\times p}, where ฯƒi,j=0.2|iโˆ’j|\sigma_{i,j}=0.2^{|i-j|}.

Case 8. The same setting as in Case 4 with a=\sqrt{n/10} instead of a=\sqrt{p/10}.

Generally, as can be seen from Table 6, when c=3/2, \hat{K}_{\acute{\text{EDA}}} and \hat{K}_{\acute{\text{EDB}}} perform better as p increases, especially \hat{K}_{\acute{\text{EDB}}}. As c increases (with p fixed and n reduced), from (25), the gap conditions of EDA and EDB become harder to satisfy. In particular, the gap condition of EDB is stricter than that of EDA when n(>20) and c are large. Therefore, pseudo-EDA performs better than pseudo-EDB at c=3. The other tables can be read similarly.
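
For reference, the following is a minimal sketch of the data-generating step behind these cases, illustrated for Case 5 with c=3/2; the helper name case5_sample and the use of a Cholesky factor in place of \bm{\Sigma}^{1/2} are our own choices, and N(0,1) noise stands in for the other (standardized) distributions. The selection criteria themselves are not reproduced here.

import numpy as np

def case5_sample(p, n, rng):
    mu = np.zeros((3, p))                 # cluster means of Case 5
    mu[0, 0] = 5.0
    mu[1, 1] = 6.0
    mu[2, 0], mu[2, 2] = -2.0, 4.0
    n1 = n2 = int(0.3 * n)
    sizes = (n1, n2, n - n1 - n2)         # n1 = n2 = 0.3n, n3 = 0.4n, so K = 3
    A = np.hstack([np.tile(mu[k][:, None], sizes[k]) for k in range(3)])  # signal matrix A_n

    idx = np.arange(p)
    Sigma = 0.2 ** np.abs(idx[:, None] - idx[None, :])   # sigma_{ij} = 0.2^{|i-j|}
    L = np.linalg.cholesky(Sigma)         # a square root of Sigma, used in place of Sigma^{1/2}
    W = rng.standard_normal((p, n))       # N(0,1) entries; the other laws are standardized analogously
    return A + L @ W                      # X_n = A_n + Sigma^{1/2} W_n

X = case5_sample(p=90, n=60, rng=np.random.default_rng(1))   # c = p/n = 3/2
print(X.shape)                                               # (90, 60)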

Table 6: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 5

                                     \mathcal{N}(0,1)                                      t_8
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0.5    0.6    5.6   21.6   94.5        0.2    0.3    5.4   28.3   95.1
             \mathcal{F}_{*}  89     93.2   91.9   78.4    5.5       84.4   90.1   89.8   71.7    4.9
             \mathcal{F}_{+}  10.5    6.2    2.5    0      0         15.4    9.6    4.8    0      0
       300   \mathcal{F}_{-}   0      0      4.9    6.3  100          0      0      5.2   10.9  100
             \mathcal{F}_{*} 100    100     94.2   93.7    0        100    100     93.2   89.1    0
             \mathcal{F}_{+}   0      0      0.9    0      0          0      0      1.6    0      0
  3     30   \mathcal{F}_{-}  15     15.5    6    100    100         13.9   14.9    7.4   99.9  100
             \mathcal{F}_{*}  73.7   74.4   89.8    0      0         71.8   71.7   86.6    0.1    0
             \mathcal{F}_{+}  11.3   10.1    4.2    0      0         14.3   13.4    6      0      0
       150   \mathcal{F}_{-}   8.3   19.2    3.7   98.9  100          8.8   19      7.4   99.6  100
             \mathcal{F}_{*}  91.7   80.8   94.8    1.1    0         91.2   81     90.3    0.4    0
             \mathcal{F}_{+}   0      0      1.5    0      0          0      0      2.3    0      0

                                     Bernoulli                                             \chi^{2}(3)
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0      0      5.7   12.1   95.2        0.4    1.3    4.7   36.4   94.3
             \mathcal{F}_{*}  89.4   93.6   92.8   87.9    4.8       81.2   85.8   87.7   63.4    5.7
             \mathcal{F}_{+}  10.6    6.4    1.5    0      0         18.4   12.9    7.6    0      0
       300   \mathcal{F}_{-}   0      0      4.1    3.6  100          0      0      7.2   12     99.9
             \mathcal{F}_{*} 100    100     95.7   96.4    0        100    100     90.5   88      0.1
             \mathcal{F}_{+}   0      0      0.2    0      0          0      0      2.3    0      0
  3     30   \mathcal{F}_{-}  12     12.7    6.3   99.8  100         17.1   17.9    6.7  100    100
             \mathcal{F}_{*}  79.1   79.6   91.1    0.2    0         64.6   65.4   80.5    0      0
             \mathcal{F}_{+}   8.9    7.7    2.6    0      0         18.3   16.7   12.8    0      0
       150   \mathcal{F}_{-}   7.6   16      3.6   97.3  100         14.2   25.5    8.6   99.9  100
             \mathcal{F}_{*}  92.4   84     96.3    2.7    0         85.8   74.5   86.2    0.1    0
             \mathcal{F}_{+}   0      0      0.1    0      0          0      0      5.2    0      0
Table 7: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 6

                                     \mathcal{N}(0,1)                                      t_8
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0      0      6.8  100    100          0      0      9.6  100    100
             \mathcal{F}_{*}  93.3   96.7   91.4    0      0         92.1   95.2   85.7    0      0
             \mathcal{F}_{+}   6.7    3.3    1.8    0      0          7.9    4.8    4.7    0      0
       300   \mathcal{F}_{-}   0      0      2.9  100    100          0      0      2.6  100    100
             \mathcal{F}_{*} 100    100     96.9    0      0         99.8  100     96.6    0      0
             \mathcal{F}_{+}   0      0      0.2    0      0          0.2    0      0.8    0      0
  3     30   \mathcal{F}_{-}  16.8   18.6   19.9  100    100         21.1   25.4   22.4  100    100
             \mathcal{F}_{*}  78.6   75.4   70.2    0      0         69.7   68.3   60.8    0      0
             \mathcal{F}_{+}   4.6    6      9.9    0      0          9.2    6.3   16.8    0      0
       150   \mathcal{F}_{-}   1.2   10      9.6  100    100          2.8   13.4   13.2  100    100
             \mathcal{F}_{*}  98.8   90     89.4    0      0         97.2   86.6   83.4    0      0
             \mathcal{F}_{+}   0      0      1      0      0          0      0      3.4    0      0

                                     Bernoulli                                             \chi^{2}(3)
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0      0      6.1  100    100          0.1    0.3   10.4  100    100
             \mathcal{F}_{*}  94.9   97.7   93.3    0      0         88.1   92.5   74.1    0      0
             \mathcal{F}_{+}   5.1    2.3    0.6    0      0         11.8    7.2   15.5    0      0
       300   \mathcal{F}_{-}   0      0      1.1   99.9  100          0      0      8.6  100    100
             \mathcal{F}_{*} 100    100     98.8    0.1    0        100    100     89.6    0      0
             \mathcal{F}_{+}   0      0      0.1    0      0          0      0      1.8    0      0
  3     30   \mathcal{F}_{-}  12.2   14.6   19    100    100         30.5   30.7   24.6  100    100
             \mathcal{F}_{*}  83.3   82.4   75.4    0      0         60.1   61.9   46      0      0
             \mathcal{F}_{+}   4.5    3      5.6    0      0          9.4    7.4   29.4    0      0
       150   \mathcal{F}_{-}   1.4    8.9    2.1  100    100          4.9   19.7   19.7  100    100
             \mathcal{F}_{*}  98.6   91.1   97.6    0      0         95.1   80.3   74.3    0      0
             \mathcal{F}_{+}   0      0      0.3    0      0          0      0      6      0      0
Table 8: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 7

                                     \mathcal{N}(0,1)                                      t_8
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0      0      5.7  100    100          0      0.1    7.8  100    100
             \mathcal{F}_{*}  89.5   93.5   91.7    0      0         86     90.8   86.8    0      0
             \mathcal{F}_{+}  10.5    6.5    2.6    0      0         14      9.1    5.4    0      0
       300   \mathcal{F}_{-}   0      0      2    100    100          0      0      2.6  100    100
             \mathcal{F}_{*} 100    100     97.7    0      0         99.9  100     96.3    0      0
             \mathcal{F}_{+}   0      0      0.3    0      0          0.1    0      1.1    0      0
  3     30   \mathcal{F}_{-}  24.5   26.4   17.8  100    100         24.4   26.6   19.9  100    100
             \mathcal{F}_{*}  67.3   66.7   65.7    0      0         64.9   64.2   61.2    0      0
             \mathcal{F}_{+}   8.2    6.9   16.5    0      0         10.7    9.2   18.9    0      0
       150   \mathcal{F}_{-}   9.4   34      9.8  100    100         10.2   38     14.7  100    100
             \mathcal{F}_{*}  90.6   66     88.3    0      0         89.8   62     81.5    0      0
             \mathcal{F}_{+}   0      0      1.9    0      0          0      0      3.8    0      0

                                     Bernoulli                                             \chi^{2}(3)
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0.1    0.2    5.1   99.7  100          0.3    0.6    9.1  100    100
             \mathcal{F}_{*}  89.4   93.3   93.9    0.3    0         82.6   87.4   76.8    0      0
             \mathcal{F}_{+}  10.5    6.5    1      0      0         17.1   12     14.1    0      0
       300   \mathcal{F}_{-}   0      0      1.4  100    100          0      0      7.9  100    100
             \mathcal{F}_{*} 100    100     98.5    0      0        100    100     90.1    0      0
             \mathcal{F}_{+}   0      0      0.1    0      0          0      0      2      0      0
  3     30   \mathcal{F}_{-}  20.7   21.9   16.2  100    100         30.9   33.1   25.8  100    100
             \mathcal{F}_{*}  71.5   71.1   73.7    0      0         57.4   56.2   42.4    0      0
             \mathcal{F}_{+}   7.8    7     10.1    0      0         11.7   10.7   31.8    0      0
       150   \mathcal{F}_{-}   6.9   29.9    2.4  100    100         14.4   45.7   18.6  100    100
             \mathcal{F}_{*}  93.1   70.1   97      0      0         85.6   54.3   73.5    0      0
             \mathcal{F}_{+}   0      0      0.6    0      0          0      0      7.9    0      0
Table 9: Selection percentages of EDA, EDB, ASI, GS and BICdf in Case 8

                                     \mathcal{N}(0,1)                                      t_8
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0      0     76.2   45.6   76.1        0      0     75.2   45.2   76.7
             \mathcal{F}_{*}  88.9   94     22     51.1   23.9       84.8   90     23.4   52.3   23.3
             \mathcal{F}_{+}  11.1    6      1.1    3.3    0         15.2   10      1.4    2.5    0
       300   \mathcal{F}_{-}   0      0     72.4   30.4    1.8        0      0     68.9   30.9    2.2
             \mathcal{F}_{*} 100    100     23.7   58.9   59.2       99.9  100     27.8   59.6   60.7
             \mathcal{F}_{+}   0      0      3.9   10.7   39          0.1    0      3.3    9.5   37.1
  3     30   \mathcal{F}_{-}  16.9   19     85.1  100    100         20.1   21.9   82.6  100    100
             \mathcal{F}_{*}  73.3   73.1   13.4    0      0         68.8   67.6   14.4    0      0
             \mathcal{F}_{+}   9.8    7.9    1.5    0      0         11.1   10.5    3      0      0
       150   \mathcal{F}_{-}   0      0     86     27.6   84.3        0      0     84.4   28.1   81.8
             \mathcal{F}_{*} 100    100     12.3   63.1   15.7       99.8  100     13.2   64.7   18.2
             \mathcal{F}_{+}   0      0      1.7    9.3    0          0.2    0      2.4    7.2    0

                                     Bernoulli                                             \chi^{2}(3)
  c     n                     EDA´   EDB´    ASI     GS  BICdf       EDA´   EDB´    ASI     GS  BICdf
 3/2    60   \mathcal{F}_{-}   0      0     78.9   38.3   79.2        0      0     72.9   48.6   78
             \mathcal{F}_{*}  89.8   93.2   20.3   57.7   20.8       78.6   86.9   24.5   47.9   21.8
             \mathcal{F}_{+}  10.2    6.8    0.8    4      0         21.4   13.1    2.6    3.5    0.2
       300   \mathcal{F}_{-}   0      0     71.3   32.3    1.9        0      0     69.9   32.7    1.8
             \mathcal{F}_{*} 100    100     26.1   58.2   59.7      100    100     26.8   56.1   60.1
             \mathcal{F}_{+}   0      0      2.6    9.5   38.4        0      0      3.3   11.2   38.1
  3     30   \mathcal{F}_{-}  15.7   14.5   85.3  100    100         24.2   21.3   78.4  100    100
             \mathcal{F}_{*}  77.4   78.4   14.3    0      0         58.6   64.9   16.9    0      0
             \mathcal{F}_{+}   6.9    7.1    0.4    0      0         17.2   13.8    4.7    0      0
       150   \mathcal{F}_{-}   0      0     84.6   25     84          0      0     81.8   31.3   82.1
             \mathcal{F}_{*}  99.9  100     12.4   65     16        100    100     15.1   59.6   17.9
             \mathcal{F}_{+}   0.1    0      3     10      0          0      0      3.1    9.1    0

S.2.2 Proofs of Lemma 1, Theorems 5 and 6

Proof of Lemma 1.

Proof.

For any matrix \bm{A}, denote by \sigma_{i}(\bm{A}) and \rho_{i}(\bm{A}) the i-th largest eigenvalue and singular value of \bm{A}, respectively. From the conditions in Theorem 2 and the main result of Yin et al., (1988), it follows that, with probability 1, as n\rightarrow\infty, for k=1,\ldots,K, there is a constant C such that

\left|l_{n,k}-\lambda_{k}\right| = \left|\sigma_{k}(\mathbf{X}_{n}\mathbf{X}_{n}^{\top})-\sigma_{k}(\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma})\right| (S.61)
\leq \left|\sigma_{k}(\mathbf{X}_{n}\mathbf{X}_{n}^{\top})-\sigma_{k}(\mathbf{A}_{n}\mathbf{A}_{n}^{\top})\right|+\left|\sigma_{1}(\bm{\Sigma})\right|
= \left|\rho_{k}^{2}(\mathbf{X}_{n})-\rho^{2}_{k}(\mathbf{A}_{n})\right|+\left|\sigma_{1}(\bm{\Sigma})\right|
\leq \left|\rho_{k}(\mathbf{X}_{n})+\rho_{k}(\mathbf{A}_{n})\right|\,\rho_{1}(\bm{\Sigma}^{1/2}\mathbf{W}_{n})+\left|\sigma_{1}(\bm{\Sigma})\right|
\leq (1+o(1))\sqrt{C}(1+\sqrt{c})\left|\rho_{k}(\mathbf{X}_{n})+\rho_{k}(\mathbf{A}_{n})\right|+C\quad\text{a.s.}

Since \gamma_{K}\rightarrow\infty, it follows that

\frac{\left|\rho_{k}(\mathbf{X}_{n})+\rho_{k}(\mathbf{A}_{n})\right|}{\left|\sigma_{k}(\mathbf{A}_{n}\mathbf{A}_{n}^{\top}+\bm{\Sigma})\right|}\leq\frac{2\left|\rho_{k}(\mathbf{A}_{n})\right|+\rho_{1}(\bm{\Sigma}^{1/2}\mathbf{W}_{n})}{\left|\sigma_{k}(\mathbf{A}_{n}\mathbf{A}_{n}^{\top})\right|}\leq\frac{C}{\left|\rho_{k}(\mathbf{A}_{n})\right|}\rightarrow 0. (S.62)

Dividing both sides of (S.61) by \gamma_{k} and using (S.62), we complete the proof. ∎

Proof of Theorem 5.

Proof.

The proof of Theorem 5 is identical to that of Theorem 4 and is hence omitted. ∎

Proof of Theorem 6.

Proof.

We sketch the proof here, which is quite similar to that of Theorem 4. For k<K, we have

1nโ€‹(EDAยดkโˆ’EDAยดK)=1nโ€‹โˆ‘i=k+1K(EDAยดiโˆ’1โˆ’EDAยดi)\displaystyle\frac{1}{n}\left(\acute{\text{EDA}}_{k}-\acute{\text{EDA}}_{K}\right)=\frac{1}{n}\sum_{i=k+1}^{K}\left(\acute{\text{EDA}}_{i-1}-\acute{\text{EDA}}_{i}\right)
=\displaystyle= โˆ‘i=k+1K{logโกฮธi+(nโˆ’i)โ€‹logโก[1โˆ’1nโˆ’iโ€‹(1โˆ’ฮธi2ฮธ~n,i)]+logโกฮธ~n,iโˆ’2โ€‹pn},\displaystyle\sum_{i=k+1}^{K}\left\{\log\theta_{i}+(n-i)\log\left[1-\frac{1}{n-i}\left(1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{n,i}}\right)\right]+\log\widetilde{\theta}_{n,i}-2\frac{p}{n}\right\},
1nโ€‹(EDBยดkโˆ’EDBยดK)=1nโ€‹โˆ‘i=k+1K(EDBยดiโˆ’1โˆ’EDBยดi)\displaystyle\frac{1}{n}\left(\acute{\text{EDB}}_{k}-\acute{\text{EDB}}_{K}\right)=\frac{1}{n}\sum_{i=k+1}^{K}\left(\acute{\text{EDB}}_{i-1}-\acute{\text{EDB}}_{i}\right)
=\displaystyle= โˆ‘i=k+1K{(logโกp)โ€‹(logโกฮธi)+(nโˆ’i)โ€‹logโก[1โˆ’1nโˆ’iโ€‹(1โˆ’ฮธi2ฮธ~n,i)]+logโกฮธ~n,iโˆ’(logโกn)โ€‹pn}.\displaystyle\sum_{i=k+1}^{K}\left\{(\log p)(\log\theta_{i})+(n-i)\log\left[1-\frac{1}{n-i}\left(1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{n,i}}\right)\right]+\log\widetilde{\theta}_{n,i}-(\log n)\frac{p}{n}\right\}.

According to the second assertion in Theorem 2 and the fact that \Gamma_{F^{c,H}} consists of h=o(p) bulks, (51) also holds. Thus,

ฮธ~n,iโˆผ1,i=2,โ€ฆ,K.\widetilde{\theta}_{n,i}\sim 1,~{}~{}i=2,\ldots,K.

When ฮณ1<โˆž\gamma_{1}<\infty, for i=2,โ€ฆ,Ki=2,\ldots,K, we have ฮธiโˆผฮพi\theta_{i}\sim\xi_{i} defined in (24). Hence, if the gap conditions (26) and (27) are satisfied, then

1nโ€‹(EDAยดkโˆ’EDAยดK)โˆผโˆ‘i=k+1K{ฮพi2+logโกฮพiโˆ’1โˆ’2โ€‹c}โ‰ฅ(Kโˆ’k)โ€‹mins=1,โ€ฆ,Kโกas>0,1nโ€‹(EDBยดkโˆ’EDBยดK)โˆผโˆ‘i=k+1K{ฮพi2+logโกฮพilogโกpโˆ’1โˆ’(logโกn)โ€‹c}โ‰ฅ(Kโˆ’k)โ€‹mins=1,โ€ฆ,Kโกbs>0.\begin{split}&\frac{1}{n}\left(\acute{\text{EDA}}_{k}-\acute{\text{EDA}}_{K}\right)\sim\sum_{i=k+1}^{K}\left\{\xi_{i}^{2}+\log\xi_{i}-1-2c\right\}\geq(K-k)\min\limits_{s=1,\ldots,K}a_{s}>0,\\ &\frac{1}{n}\left(\acute{\text{EDB}}_{k}-\acute{\text{EDB}}_{K}\right)\sim\sum_{i=k+1}^{K}\left\{\xi_{i}^{2}+\log\xi_{i}^{\log p}-1-(\log n)c\right\}\geq(K-k)\min\limits_{s=1,\ldots,K}b_{s}>0.\end{split}

When \gamma_{K}\rightarrow\infty, by a discussion similar to that for Theorem 4 with n in place of p, and without any gap conditions, we have

1nโ€‹(EDAยดkโˆ’EDAยดK)โ‰ฅฮณk+1โˆ’ฮผ1+(nโˆ’Kโˆ’1)โ€‹logโก[1+1nโˆ’kโˆ’1โ€‹(ฮธK2ฮธ~n,Kโˆ’(Kโˆ’k))]+(Kโˆ’k)โ€‹logโกฮธ~n,kโˆ’2โ€‹(Kโˆ’k)โ€‹cโ†’โˆž1nโ€‹(EDBยดkโˆ’EDBยดK)โ‰ฅ(logโกp)โ€‹(ฮณk+1โˆ’ฮผ1)+(nโˆ’Kโˆ’1)โ€‹logโก[1+1nโˆ’kโˆ’1โ€‹(ฮธK2ฮธ~n,Kโˆ’(Kโˆ’k))]+(Kโˆ’k)โ€‹logโกฮธ~n,kโˆ’(logโกn)โ€‹(Kโˆ’k)โ€‹cโ†’โˆž\begin{split}\frac{1}{n}\left(\acute{\text{EDA}}_{k}-\acute{\text{EDA}}_{K}\right)\geq&\gamma_{k+1}-\mu_{1}+(n-K-1)\log\left[1+\frac{1}{n-k-1}\left(\frac{\theta_{K}^{2}}{\widetilde{\theta}_{n,K}}-(K-k)\right)\right]\\ &+(K-k)\log\widetilde{\theta}_{n,k}-2(K-k)c\rightarrow\infty\\ \frac{1}{n}\left(\acute{\text{EDB}}_{k}-\acute{\text{EDB}}_{K}\right)\geq&(\log p)(\gamma_{k+1}-\mu_{1})+(n-K-1)\log\left[1+\frac{1}{n-k-1}\left(\frac{\theta_{K}^{2}}{\widetilde{\theta}_{n,K}}-(K-k)\right)\right]\\ &+(K-k)\log\widetilde{\theta}_{n,k}-(\log n)(K-k)c\rightarrow\infty\end{split}

Next, consider K<k\leq w=o(p). Analogously, we obtain

1nโ€‹(EDAkยดโˆ’EDAKยด)=โˆ‘i=K+1k{โˆ’logโกฮธiโˆ’(nโˆ’i)โ€‹logโก[1โˆ’1nโˆ’iโ€‹(1โˆ’ฮธi2ฮธ~n,i)]โˆ’logโกฮธ~n,i+2โ€‹pn}โˆผโˆ‘i=K+1k{1โˆ’ฮธi2ฮธ~n,iโˆ’logโกฮธiโˆ’logโกฮธ~n,i+2โ€‹c}โˆผ2โ€‹(kโˆ’K)โ€‹c>0,1nโ€‹(EDBkยดโˆ’EDBKยด)=โˆ‘i=K+1k{โˆ’(logโกp)โ€‹(logโกฮธi)โˆ’(nโˆ’i)โ€‹logโก[1โˆ’1nโˆ’iโ€‹(1โˆ’ฮธi2ฮธ~n,i)]โˆ’logโกฮธ~n,i+(logโกn)โ€‹pn}โˆผโˆ‘i=K+1k{1โˆ’ฮธi2ฮธ~n,iโˆ’(logโกp)โ€‹(logโกฮธi)โˆ’logโกฮธ~n,i+cโ€‹logโกn}โˆผ(kโˆ’K)โ€‹cโ€‹logโกn>0,\begin{split}&\frac{1}{n}\left(\acute{\text{EDA}_{k}}-\acute{\text{EDA}_{K}}\right)\\ =&\sum_{i=K+1}^{k}\left\{-\log\theta_{i}-(n-i)\log\left[1-\frac{1}{n-i}\left(1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{n,i}}\right)\right]-\log\widetilde{\theta}_{n,i}+2\frac{p}{n}\right\}\\ \sim&\sum_{i=K+1}^{k}\left\{1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{n,i}}-\log\theta_{i}-\log\widetilde{\theta}_{n,i}+2c\right\}\sim 2(k-K)c>0,\\ &\frac{1}{n}\left(\acute{\text{EDB}_{k}}-\acute{\text{EDB}_{K}}\right)\\ =&\sum_{i=K+1}^{k}\left\{-(\log p)(\log\theta_{i})-(n-i)\log\left[1-\frac{1}{n-i}\left(1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{n,i}}\right)\right]-\log\widetilde{\theta}_{n,i}+(\log n)\frac{p}{n}\right\}\\ \sim&\sum_{i=K+1}^{k}\left\{1-\frac{\theta_{i}^{2}}{\widetilde{\theta}_{n,i}}-(\log p)(\log\theta_{i})-\log\widetilde{\theta}_{n,i}+c\log n\right\}\sim(k-K)c\log n>0,\end{split}

which completes the proof. โˆŽ

S.3 Remaining proofs

Below are some lemmas that will be needed.

Lemma 5.

For an n\times n matrix A and n\times 1 vectors \mathbf{q},\mathbf{v} such that A and A+\mathbf{v}\mathbf{v}^{*} are invertible, we have

๐ชโˆ—โ€‹(A+๐ฏ๐ฏโˆ—)โˆ’1=๐ชโˆ—โ€‹Aโˆ’1โˆ’๐ชโˆ—โ€‹Aโˆ’1โ€‹๐ฏ1+๐ฏโˆ—โ€‹Aโˆ’1โ€‹๐ฏโ€‹๐ฏโˆ—โ€‹Aโˆ’1.\mathbf{q}^{*}\left(A+\mathbf{v}\mathbf{v}^{*}\right)^{-1}=\mathbf{q}^{*}A^{-1}-\frac{\mathbf{q}^{*}A^{-1}\mathbf{v}}{1+\mathbf{v}^{*}A^{-1}\mathbf{v}}\mathbf{v}^{*}A^{-1}.
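
As a quick numerical sanity check of this identity (in the real case, so that \mathbf{v}^{*}=\mathbf{v}^{\top}), one may run the following; the matrix and vectors are arbitrary and the names are ours.

import numpy as np

rng = np.random.default_rng(2)
m = 6
A = rng.standard_normal((m, m)) + m * np.eye(m)      # shifted so that A is safely invertible
q, v = rng.standard_normal(m), rng.standard_normal(m)

Ainv = np.linalg.inv(A)
lhs = q @ np.linalg.inv(A + np.outer(v, v))
rhs = q @ Ainv - (q @ Ainv @ v) / (1 + v @ Ainv @ v) * (v @ Ainv)
print(np.allclose(lhs, rhs))                         # True
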
Lemma 6.

Let B=(b_{ij})\in\mathbb{R}^{n\times n} with \|B\|=O(1) and \mathbf{x}=(x_{1},\ldots,x_{n})^{\top}, where the x_{i} are i.i.d. with \mbox{E}x_{i}=0 and \mbox{E}|x_{i}|^{2}=1. Then, for any q\geq 1,

Eโ€‹|๐ฑโˆ—โ€‹Bโ€‹๐ฑโˆ’trโ€‹B|qโ‰คCqโ€‹((Eโ€‹|x1|4โ€‹trโ€‹Bโ€‹Bโˆ—)q/2+Eโ€‹|x1|2โ€‹qโ€‹trโ€‹(Bโ€‹Bโˆ—)q/2).E\left|\mathbf{x}^{*}B\mathbf{x}-\mbox{tr}B\right|^{q}\leq C_{q}\left(\left(E\left|x_{1}\right|^{4}\mbox{tr}BB^{*}\right)^{q/2}+E\left|x_{1}\right|^{2q}\mbox{tr}\left(BB^{*}\right)^{q/2}\right).
Lemma 7.

(Burkholder inequality) Let {Xk}\{X_{k}\} be a complex martingale difference sequence with respect to the filtration โ„ฑk\mathcal{F}_{k}. For every qโ‰ฅ1q\geq 1, there exists Cq>0C_{q}>0 such that:

Eโ€‹|โˆ‘k=1nXk|2โ€‹qโ‰คCqโ€‹(Eโ€‹(โˆ‘k=1nEโ€‹(|Xk|2|โ„ฑkโˆ’1))q+โˆ‘k=1nEโ€‹|Xk|2โ€‹q).\mbox{E}\left|\sum^{n}_{k=1}X_{k}\right|^{2q}\leq C_{q}\left(\mbox{E}\left(\sum^{n}_{k=1}\mbox{E}(|X_{k}|^{2}|\mathcal{F}_{k-1})\right)^{q}+\sum^{n}_{k=1}\mbox{E}|X_{k}|^{2q}\right).

For simplicity, we drop the subscript “n”. Let \mathbf{X}=[\mathbf{x}_{1},\ldots,\mathbf{x}_{n}], \mathbf{x}_{i}=\mathbf{a}_{i}+\bm{\Sigma}^{1/2}\mathbf{w}_{i} and \mathbf{X}_{k}=\mathbf{X}-\mathbf{x}_{k}\mathbf{e}_{k}^{\top}, and define

Qkโ€‹(z)=(๐—kโ€‹๐—kโˆ—โˆ’zโ€‹๐ˆ)โˆ’1.Q_{k}(z)=(\mathbf{X}_{k}\mathbf{X}_{k}^{*}-z\mathbf{I})^{-1}.

Moreover, we introduce some basic notation and formulas. For k\times k invertible matrices A,B and a k-dimensional vector \mathbf{q}, we have

๐ชโˆ—โ€‹(B+๐ช๐ชโˆ—)โˆ’1=11+๐ชโˆ—โ€‹Bโˆ’1โ€‹๐ชโ€‹๐ชโˆ—โ€‹Bโˆ’1,\mathbf{q}^{*}\left(B+\mathbf{q}\mathbf{q}^{*}\right)^{-1}=\frac{1}{1+\mathbf{q}^{*}B^{-1}\mathbf{q}}\mathbf{q}^{*}B^{-1}, (S.65)
Aโˆ’1โˆ’Bโˆ’1=Bโˆ’1โ€‹(Bโˆ’A)โ€‹Aโˆ’1.A^{-1}-B^{-1}=B^{-1}(B-A)A^{-1}. (S.66)

Further, define

ฮฒk=11+๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑk,\beta_{k}=\frac{1}{1+\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}}, (S.67)
bk=11+trโ€‹(๐šบโ€‹Qkโ€‹(z))/n+๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐šk.b_{k}=\frac{1}{1+\mbox{tr}(\bm{\Sigma}Q_{k}(z))/n+\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{a}_{k}}. (S.68)

The following lemma is useful for establishing some moment bounds below:

Lemma 8.

For z\in\mathbb{C}_{+}, we have |\beta_{k}|\leq\frac{|z|}{|\Im z|}, |b_{k}|\leq\frac{|z|}{|\Im z|} and \|Q_{k}(z)\mathbf{X}_{k}\|\leq(\frac{1}{|\Im z|}+\frac{|z|}{|\Im z|^{2}})^{1/2}.

Proof.

We have

|zโˆ’1โ€‹ฮฒk|โ‰ค1โ„‘โก(z+zโ€‹๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑk)โ‰ค1โ„‘โกz,|z^{-1}\beta_{k}|\leq\frac{1}{\Im(z+z\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{x}_{k})}\leq\frac{1}{\Im z},

where the second step uses the fact that \Im(z\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{x}_{k})>0. Therefore |\beta_{k}|\leq|z|/\Im z. The bound for |b_{k}| is checked similarly. For the last bound, we have

โ€–Qkโ€‹(z)โ€‹๐—kโ€–\displaystyle\|Q_{k}(z)\mathbf{X}_{k}\| =\displaystyle= โ€–Qkโ€‹(z)โ€‹๐—kโ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€–12\displaystyle\|Q_{k}(z)\mathbf{X}_{k}\mathbf{X}_{k}^{*}Q_{k}(z)\|^{\frac{1}{2}}
=\displaystyle= โ€–Qkโ€‹(z)โ€‹(๐—kโ€‹๐—kโˆ—โˆ’zโ€‹๐ˆ+zโ€‹๐ˆ)โ€‹Qkโ€‹(z)โ€–1/2\displaystyle\|Q_{k}(z)(\mathbf{X}_{k}\mathbf{X}_{k}^{*}-z\mathbf{I}+z\mathbf{I})Q_{k}(z)\|^{1/2}
โ‰ค\displaystyle\leq โ€–Qkโ€‹(z)+zโ€‹Qkโ€‹(z)โ€‹Qkโ€‹(z)โ€–1/2\displaystyle\|Q_{k}(z)+zQ_{k}(z)Q_{k}(z)\|^{1/2}
โ‰ค\displaystyle\leq (1|โ„‘โกz|+|z||โ„‘โกz|2)1/2.\displaystyle(\frac{1}{|\Im z|}+\frac{|z|}{|\Im z|^{2}})^{1/2}.

โˆŽ

Proof of Proposition 3. We first truncate, recentralize and renormalize the entries of \mathbf{W} following the steps in Bai et al., (2007). Select \eta_{n}\to 0 satisfying \eta_{n}^{-4}\int_{\left\{\left|n^{1/2}W_{11}\right|\geq\eta_{n}n^{1/4}\right\}}\left|n^{1/2}W_{11}\right|^{4}\rightarrow 0. Let \hat{W}_{ij}=W_{ij}I\left(\left|W_{ij}\right|\leq\eta_{n}n^{-1/4}\right)-EW_{ij}I\left(\left|W_{ij}\right|\leq\eta_{n}n^{-1/4}\right), \tilde{\mathbf{W}}_{n}=\mathbf{W}_{n}-\hat{\mathbf{W}}_{n}, and \hat{\mathbf{X}}_{n}=\mathbf{A}_{n}+\bm{\Sigma}^{1/2}\hat{\mathbf{W}}_{n}, where \hat{\mathbf{W}}_{n}=\left(\hat{W}_{ij}\right). Let \sigma_{n}^{2}=E\left|\hat{W}_{11}\right|^{2} and \breve{\mathbf{X}}_{n}=\mathbf{A}_{n}+\sigma_{n}^{-1}\bm{\Sigma}^{1/2}\hat{\mathbf{W}}_{n}. Write \breve{Q}(z)=\left(\breve{\mathbf{X}}_{n}\breve{\mathbf{X}}_{n}^{*}-zI\right)^{-1}. Then, following the arguments used in the proof of Lemma 4 therein, we can show that \mbox{E}|\mathbf{v}^{*}(\breve{Q}(z)-Q(z))\mathbf{v}|^{2}=o(n^{-1}).
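
Schematically, and only as an illustration, this truncation step can be coded as follows, assuming the entries of \mathbf{W} have variance 1/n (so that n^{1/2}W_{ij} is standardized); the empirical mean and second moment stand in for the exact quantities EW_{ij}I(\cdot) and \sigma_{n}, and the particular \eta_{n} used below is only a placeholder choice.

import numpy as np

def truncate_recentralize_renormalize(W, eta_n):
    # W: p x n noise matrix with entries of variance 1/n, so V = sqrt(n) * W is standardized.
    n = W.shape[1]
    V = np.sqrt(n) * W
    V = V * (np.abs(V) <= eta_n * n ** 0.25)   # keep only |n^{1/2} W_ij| <= eta_n n^{1/4}
    V = V - V.mean()                           # empirical stand-in for subtracting E W_ij I(...)
    V = V / np.sqrt(np.mean(V ** 2))           # empirical stand-in for dividing by sigma_n
    return V / np.sqrt(n)                      # back to the original 1/n scaling

rng = np.random.default_rng(4)
p, n = 60, 100
W_hat = truncate_recentralize_renormalize(rng.standard_normal((p, n)) / np.sqrt(n), eta_n=n ** (-0.05))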

With this truncation and centralization, we have the following simple bound that can be checked using Lemma 6 and will be used frequently later:

Eโ€‹|๐ฐ1โˆ—โ€‹Aโ€‹๐ฐ1โˆ’nโˆ’1โ€‹trโ€‹A|qโ‰คCโ€‹(ฮทn2โ€‹qโˆ’4โ€‹nโˆ’q/2+nโˆ’q/2)โ‰คCโ€‹nโˆ’q/2\displaystyle\mbox{E}\left|\mathbf{w}_{1}^{*}A\mathbf{w}_{1}-n^{-1}\mbox{tr}A\right|^{q}\leq C\left(\eta_{n}^{2q-4}n^{-q/2}+n^{-q/2}\right)\leq Cn^{-q/2} (S.69)

With this bound, and

Eโ€‹|๐ฐ1โˆ—โ€‹๐ฏ|4โ‰คCโ€‹nโˆ’2,\mbox{E}|\mathbf{w}_{1}^{*}\mathbf{v}|^{4}\leq Cn^{-2}, (S.70)

for any deterministic unit-norm vector \mathbf{v}, we can obtain the following lemma without difficulty.

Lemma 9.

Let ฮ”k=๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโˆ’trโ€‹๐šบโ€‹Qkโ€‹(z)nโˆ’๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐šk\Delta_{k}=\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}-\frac{\mbox{tr}\bm{\Sigma}Q_{k}(z)}{n}-\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{a}_{k} and E๐ฐk\mbox{E}_{\mathbf{w}_{k}} be the conditional expectation with respect to the ฯƒ\sigma-field generated by {๐ฐl,lโ‰ k}\{\mathbf{w}_{l},l\neq k\}. Under Assumption 1 and Assumption 3, for 1โ‰คqโ‰ค41\leq q\leq 4, there is

Ewkโ€‹|ฮ”k|q=OPโ€‹(1nq/2โ€‹|โ„‘โกz|q).\mbox{E}_{w_{k}}\left|\Delta_{k}\right|^{q}=O_{P}\left(\frac{1}{n^{q/2}|\Im z|^{q}}\right).

A direct consequence is that

Ewkโ€‹|ฮฒkโˆ’bk|q=OPโ€‹(|z|qnq/2โ€‹|โ„‘โกz|3โ€‹q).\mbox{E}_{w_{k}}|\beta_{k}-b_{k}|^{q}=O_{P}\left(\frac{|z|^{q}}{n^{q/2}|\Im z|^{3q}}\right).
Proof of Proposition 3 (continued).

Using the Woodbury identity in Lemma 2, we obtain

Q~โ€‹(z)\displaystyle\tilde{Q}(z) =(โˆ’zโ€‹๐ˆ)โˆ’1โˆ’(โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—โˆ—โ€‹[๐ˆ+๐—โ€‹(โˆ’z)โˆ’1โ€‹๐—โˆ—]โˆ’1โ€‹๐—โ€‹(โˆ’zโ€‹๐ˆ)โˆ’1\displaystyle=(-z\mathbf{I})^{-1}-(-z\mathbf{I})^{-1}\mathbf{X}^{*}[\mathbf{I}+\mathbf{X}(-z)^{-1}\mathbf{X}^{*}]^{-1}\mathbf{X}(-z\mathbf{I})^{-1} (S.71)
=(โˆ’zโ€‹๐ˆ)โˆ’1+zโˆ’1โ€‹๐—โˆ—โ€‹(๐—๐—โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—.\displaystyle=(-z\mathbf{I})^{-1}+z^{-1}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}.
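
A quick numerical check of this resolvent identity, under the convention \tilde{Q}(z)=(\mathbf{X}^{*}\mathbf{X}-z\mathbf{I})^{-1} implicit in the Woodbury expansion above (a real \mathbf{X} is used, so that \mathbf{X}^{*}=\mathbf{X}^{\top}):

import numpy as np

rng = np.random.default_rng(3)
p, n, z = 5, 8, 1.0 + 0.5j
X = rng.standard_normal((p, n))

lhs = np.linalg.inv(X.T @ X - z * np.eye(n))                                   # (X^* X - z I)^{-1}
rhs = -np.eye(n) / z + X.T @ np.linalg.inv(X @ X.T - z * np.eye(p)) @ X / z    # right-hand side of (S.71)
print(np.allclose(lhs, rhs))                                                   # True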

To prove Proposition 3, it suffices to prove

Eโ€‹|๐ฎโˆ—โ€‹๐—โˆ—โ€‹(๐—๐—โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—๐ฎโˆ’Eโ€‹๐ฎโˆ—โ€‹๐—โˆ—โ€‹(๐—๐—โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—๐ฎ|2โ‰คCโ€‹nโˆ’1,\displaystyle\mbox{E}|\mathbf{u}^{*}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{u}-\mbox{E}\mathbf{u}^{*}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{u}|^{2}\leq Cn^{-1}, (S.72)

and

|Eโ€‹๐ฎโˆ—โ€‹๐—โˆ—โ€‹(๐—๐—โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—๐ฎโˆ’Eโ€‹๐ฎโˆ—โ€‹๐—0โˆ—โ€‹(๐—0โ€‹๐—0โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—0โ€‹๐ฎ|โ‰คCโ€‹1n,\displaystyle|\mbox{E}\mathbf{u}^{*}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{u}-\mbox{E}\mathbf{u}^{*}\mathbf{X}_{0}^{*}(\mathbf{X}_{0}\mathbf{X}_{0}^{*}-z\mathbf{I})^{-1}\mathbf{X}_{0}\mathbf{u}|\leq C\frac{1}{\sqrt{n}}, (S.73)

where \mathbf{u}=(u_{1},\ldots,u_{n})^{\top} is a fixed unit vector and \mathbf{X}_{0} denotes the model with \mathbf{W} replaced by a Gaussian matrix \mathbf{W}_{0}. By the singular value decomposition \bm{\Sigma}=UDU^{\top}, we then have

Eโ€‹๐ฎโˆ—โ€‹[(๐€+๐šบ1/2โ€‹๐–0)โˆ—โ€‹(๐€+๐šบ1/2โ€‹๐–0)โˆ’zโ€‹๐ˆ]โˆ’1โ€‹๐ฎ\displaystyle\mbox{E}\mathbf{u}^{*}\left[(\mathbf{A}+\bm{\Sigma}^{1/2}\mathbf{W}_{0})^{*}(\mathbf{A}+\bm{\Sigma}^{1/2}\mathbf{W}_{0})-z\mathbf{I}\right]^{-1}\mathbf{u}
=\displaystyle= Eโ€‹๐ฎโˆ—โ€‹[(๐€+Uโ€‹D1/2โ€‹UโŠคโ€‹๐–0)โˆ—โ€‹(๐€+Uโ€‹D1/2โ€‹UโŠคโ€‹๐–0)โˆ’zโ€‹๐ˆ]โˆ’1โ€‹๐ฎ\displaystyle\mbox{E}\mathbf{u}^{*}\left[(\mathbf{A}+UD^{1/2}U^{\top}\mathbf{W}_{0})^{*}(\mathbf{A}+UD^{1/2}U^{\top}\mathbf{W}_{0})-z\mathbf{I}\right]^{-1}\mathbf{u}
=\displaystyle= Eโ€‹๐ฎโˆ—โ€‹[(๐€+Uโ€‹D1/2โ€‹๐–0)โˆ—โ€‹(๐€+Uโ€‹D1/2โ€‹๐–0)โˆ’zโ€‹๐ˆ]โˆ’1โ€‹๐ฎ\displaystyle\mbox{E}\mathbf{u}^{*}\left[(\mathbf{A}+UD^{1/2}\mathbf{W}_{0})^{*}(\mathbf{A}+UD^{1/2}\mathbf{W}_{0})-z\mathbf{I}\right]^{-1}\mathbf{u}
=\displaystyle= Eโ€‹๐ฎโˆ—โ€‹[(Uโ€‹(UโŠคโ€‹๐€+D1/2โ€‹๐–0))โˆ—โ€‹(Uโ€‹(UโŠคโ€‹๐€+D1/2โ€‹๐–0))โˆ’zโ€‹๐ˆ]โˆ’1โ€‹๐ฎ\displaystyle\mbox{E}\mathbf{u}^{*}\left[\big{(}U(U^{\top}\mathbf{A}+D^{1/2}\mathbf{W}_{0})\big{)}^{*}\big{(}U(U^{\top}\mathbf{A}+D^{1/2}\mathbf{W}_{0})\big{)}-z\mathbf{I}\right]^{-1}\mathbf{u}
=\displaystyle= Eโ€‹๐ฎโˆ—โ€‹[(UโŠคโ€‹๐€+D1/2โ€‹๐–0)โˆ—โ€‹(UโŠคโ€‹๐€+D1/2โ€‹๐–0)โˆ’zโ€‹๐ˆ]โˆ’1โ€‹๐ฎ.\displaystyle\mbox{E}\mathbf{u}^{*}\left[\big{(}U^{\top}\mathbf{A}+D^{1/2}\mathbf{W}_{0}\big{)}^{*}\big{(}U^{\top}\mathbf{A}+D^{1/2}\mathbf{W}_{0}\big{)}-z\mathbf{I}\right]^{-1}\mathbf{u}.

Treating U^{\top}\mathbf{A} as the new \mathbf{A}, this fits the model in Hachem et al., (2013). Hence we have

Eโ€‹|๐ฎโˆ—โ€‹([(UโŠคโ€‹๐€+D1/2โ€‹๐–0)โˆ—โ€‹(UโŠคโ€‹๐€+D1/2โ€‹๐–0)โˆ’zโ€‹๐ˆ]โˆ’1โˆ’Tโ€ฒโ€‹(z))โ€‹๐ฎ|โ‰คCโ€‹1n,\mbox{E}\left|\mathbf{u}^{*}\left(\left[\big{(}U^{\top}\mathbf{A}+D^{1/2}\mathbf{W}_{0}\big{)}^{*}\big{(}U^{\top}\mathbf{A}+D^{1/2}\mathbf{W}_{0}\big{)}-z\mathbf{I}\right]^{-1}-T^{\prime}(z)\right)\mathbf{u}\right|\leq C\frac{1}{\sqrt{n}}, (S.74)

where T^{\prime}(z)=\left(-z(1+\delta(z))\mathbf{I}+\mathbf{A}^{*}U(\mathbf{I}+\tilde{\delta}(z)D)^{-1}U^{\top}\mathbf{A}\right)^{-1}=\tilde{T}(z). Moreover, combining Proposition 3.8 and Proposition 3.9 in Hachem et al., (2013), the conclusion follows.

Proof of (S.72): We first write the term in (S.72) as a sum of martingale differences. Using the two basic matrix identities (S.65) and (S.66), we obtain

๐ฎโˆ—โ€‹๐—โˆ—โ€‹(๐—๐—โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—๐ฎโˆ’๐ฎโˆ—โ€‹๐—kโˆ—โ€‹(๐—kโ€‹๐—kโˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—kโ€‹๐ฎ\displaystyle\mathbf{u}^{*}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{u}-\mathbf{u}^{*}\mathbf{X}_{k}^{*}(\mathbf{X}_{k}\mathbf{X}_{k}^{*}-z\mathbf{I})^{-1}\mathbf{X}_{k}\mathbf{u} (S.75)
=\displaystyle= ๐ฎโˆ—โ€‹(๐—โˆ—โˆ’๐—kโˆ—)โ€‹(๐—๐—โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—๐ฎ+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹(Qโ€‹(z)โˆ’Qkโ€‹(z))โ€‹๐—๐ฎ+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹(๐—โˆ’๐—k)โ€‹๐ฎ\displaystyle\mathbf{u}^{*}(\mathbf{X}^{*}-\mathbf{X}_{k}^{*})(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{u}+\mathbf{u}^{*}\mathbf{X}_{k}^{*}\big{(}Q(z)-Q_{k}(z)\big{)}\mathbf{X}\mathbf{u}+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)(\mathbf{X}-\mathbf{X}_{k})\mathbf{u}
=\displaystyle= ๐ฎโˆ—โ€‹๐žkโ€‹๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐—๐ฎโ€‹ฮฒkโˆ’๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐—๐ฎโ€‹ฮฒk+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎ\displaystyle\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{X}\mathbf{u}\beta_{k}-\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{X}\mathbf{u}\beta_{k}+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}
:=\displaystyle:= Akโˆ’Bk+Ck.\displaystyle A_{k}-B_{k}+C_{k}.

Denote by \mbox{E}_{k} the conditional expectation with respect to the \sigma-field generated by \{\mathbf{w}_{i},i\leq k\}. With the above expansion, it is equivalent to bounding

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹(Akโˆ’Bk+Ck)|2.\mbox{E}\left|\sum_{k=1}^{n}(\mbox{E}_{k}-\mbox{E}_{k-1})(A_{k}-B_{k}+C_{k})\right|^{2}.

We split AkA_{k} as:

Ak\displaystyle A_{k} =\displaystyle= ๐ฎโˆ—โ€‹๐žkโ€‹๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐—๐ฎโ€‹ฮฒk\displaystyle\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{X}\mathbf{u}\beta_{k}
=\displaystyle= ๐ฎโˆ—โ€‹๐žkโ€‹(๐škโˆ—+๐ฐkโˆ—โ€‹๐šบ1/2)โ€‹Qkโ€‹(z)โ€‹(๐—k+๐ฑkโ€‹๐žkโˆ—)โ€‹๐ฎโ€‹ฮฒk\displaystyle\mathbf{u}^{*}\mathbf{e}_{k}(\mathbf{a}_{k}^{*}+\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2})Q_{k}(z)(\mathbf{X}_{k}+\mathbf{x}_{k}\mathbf{e}_{k}^{*})\mathbf{u}\beta_{k}
=\displaystyle= ๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹ฮฒk+๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk\displaystyle\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}\beta_{k}+\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}
+๐ฎโˆ—โ€‹๐žkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹ฮฒk+๐ฎโˆ—โ€‹๐žkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk\displaystyle+\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}\beta_{k}+\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}
:=\displaystyle:= A1โ€‹k+A2โ€‹k+A3โ€‹k+A4โ€‹k.\displaystyle A_{1k}+A_{2k}+A_{3k}+A_{4k}.

To obtain the bound for the term involving AkA_{k}, we consider the bounds of A1โ€‹kA_{1k} to A4โ€‹kA_{4k}, respectively.

For A1โ€‹k=๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹ฮฒkA_{1k}=\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}\beta_{k}, we can decompose it as the sum of two components: ๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹bk\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}b_{k} and ๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹(ฮฒkโˆ’bk)\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}(\beta_{k}-b_{k}). Since (Ekโˆ’Ekโˆ’1)โ€‹๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹bk=0(E_{k}-E_{k-1})\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}b_{k}=0, we have

โˆ‘k=1nEkโˆ’1โ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A1โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}_{k-1}\left|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{1k}\right|^{2} โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1nEkโˆ’1โ€‹|๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹(ฮฒkโˆ’bk)|2\displaystyle C\sum_{k=1}^{n}\mbox{E}_{k-1}\left|\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}(\beta_{k}-b_{k})\right|^{2}
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1nEkโˆ’1โ€‹{|uk|2โ€‹โ€–Qkโ€‹(z)โ€‹๐—kโ€–2โ€‹Ewkโ€‹|ฮฒkโˆ’bk|2}\displaystyle C\sum_{k=1}^{n}\mbox{E}_{k-1}\left\{|u_{k}|^{2}\|Q_{k}(z)\mathbf{X}_{k}\|^{2}\mbox{E}_{w_{k}}|\beta_{k}-b_{k}|^{2}\right\}
โ‰ค\displaystyle\leq Cโ€‹nโˆ’1,\displaystyle Cn^{-1},

where u_{k} is the k-th coordinate of \mathbf{u}, and the third line uses Lemmas 8 and 9 together with \sum_{k=1}^{n}|u_{k}|^{2}=1.

Similarly,

โˆ‘k=1nEโ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A1โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{1k}|^{2} โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1nEโ€‹|๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹(ฮฒkโˆ’bk)|2\displaystyle C\sum_{k=1}^{n}\mbox{E}|\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}(\beta_{k}-b_{k})|^{2} (S.76)
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1n|uk|2โ€‹Eโ€‹(โ€–Qkโ€‹(z)โ€‹๐—kโ€–โ‹…|ฮฒkโˆ’bk|)2\displaystyle C\sum_{k=1}^{n}|u_{k}|^{2}\mbox{E}\left(\|Q_{k}(z)\mathbf{X}_{k}\|\cdot|\beta_{k}-b_{k}|\right)^{2} (S.77)
โ‰ค\displaystyle\leq Cโ€‹nโˆ’1.\displaystyle Cn^{-1}. (S.78)

Thus, applying the Burkholder inequality in Lemma 7, there is

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹A1โ€‹k|2โ‰คCโ€‹nโˆ’1.\mbox{E}\left|\sum^{n}_{k=1}(\mbox{E}_{k}-\mbox{E}_{k-1})A_{1k}\right|^{2}\leq Cn^{-1}. (S.79)

For A2โ€‹k=๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒkA_{2k}=\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}, by ๐ฑk=๐šk+๐šบ1/2โ€‹๐ฐk\mathbf{x}_{k}=\mathbf{a}_{k}+\bm{\Sigma}^{1/2}\mathbf{w}_{k} and using Lemma 8, there is

(Ekโˆ’Ekโˆ’1)โ€‹A2โ€‹k=(Ekโˆ’Ekโˆ’1)โ€‹[|uk|2โ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐škโ€‹(ฮฒkโˆ’bk)+|uk|2โ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹ฮฒk]\displaystyle(\mbox{E}_{k}-\mbox{E}_{k-1})A_{2k}=(\mbox{E}_{k}-\mbox{E}_{k-1})\left[|u_{k}|^{2}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{a}_{k}(\beta_{k}-b_{k})+|u_{k}|^{2}\mathbf{a}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\beta_{k}\right] (S.80)

Then

โˆ‘k=1nEkโˆ’1โ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A2โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}_{k-1}\left|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{2k}\right|^{2}
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1n|uk|4โ€‹Ekโˆ’1โ€‹|๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐škโ€‹(ฮฒkโˆ’bk)|2+Cโ€‹โˆ‘k=1n|uk|4โ€‹Ekโˆ’1โ€‹|๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹ฮฒk|2\displaystyle C\sum_{k=1}^{n}|u_{k}|^{4}\mbox{E}_{k-1}|\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{a}_{k}(\beta_{k}-b_{k})|^{2}+C\sum_{k=1}^{n}|u_{k}|^{4}\mbox{E}_{k-1}|\mathbf{a}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\beta_{k}|^{2}
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1n|uk|4โ€‹Ekโˆ’1โ€‹|ฮฒkโˆ’bk|2+Cโ€‹โˆ‘k=1n|uk|4โ€‹Ekโˆ’1โ€‹|๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐk|2\displaystyle C\sum_{k=1}^{n}|u_{k}|^{4}\mbox{E}_{k-1}|\beta_{k}-b_{k}|^{2}+C\sum_{k=1}^{n}|u_{k}|^{4}\mbox{E}_{k-1}|\mathbf{a}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}|^{2}
โ‰ค\displaystyle\leq Cโ€‹nโˆ’1\displaystyle Cn^{-1}

where the second step uses |ฮฒk|=Oโ€‹(1)|\beta_{k}|=O(1) and โ€–Qkโ€‹(z)โ€–=Oโ€‹(1)\|Q_{k}(z)\|=O(1), and the third step uses Lemma 9 and (S.70). By (S.80) and an argument similar to (S.76), it can also be checked that

โˆ‘k=1nEโ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A2โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{2k}|^{2} โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1nEโ€‹|A2โ€‹k|2โ‰คCโ€‹nโˆ’1.\displaystyle C\sum_{k=1}^{n}\mbox{E}|A_{2k}|^{2}\leq Cn^{-1}.

Therefore an application of the Burkholder inequality yields

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹A2โ€‹k|2โ‰คCโ€‹nโˆ’1.\mbox{E}\left|\sum^{n}_{k=1}(\mbox{E}_{k}-\mbox{E}_{k-1})A_{2k}\right|^{2}\leq Cn^{-1}.

For A3โ€‹k=๐ฎโˆ—โ€‹๐žkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹ฮฒkA_{3k}=\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}\beta_{k}, it can be handled following an argument similar to the one that leads to the bound for A2โ€‹kA_{2k}.

For A4โ€‹k=๐ฎโˆ—โ€‹๐žkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒkA_{4k}=\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}, there is

A4โ€‹k\displaystyle A_{4k} =\displaystyle= |uk|2โ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹(๐šบ1/2โ€‹๐ฐk+๐šk)โ€‹(ฮฒkโˆ’bk+bk)\displaystyle|u_{k}|^{2}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)(\bm{\Sigma}^{1/2}\mathbf{w}_{k}+\mathbf{a}_{k})(\beta_{k}-b_{k}+b_{k}) (S.81)
=\displaystyle= |uk|2โ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹(ฮฒkโˆ’bk)+|uk|2โ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹bk\displaystyle|u_{k}|^{2}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}(\beta_{k}-b_{k})+|u_{k}|^{2}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}b_{k}
+|uk|2โ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐škโ€‹ฮฒk\displaystyle+|u_{k}|^{2}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{a}_{k}\beta_{k}
:=\displaystyle:= A5โ€‹k+A6โ€‹k+A7โ€‹k.\displaystyle A_{5k}+A_{6k}+A_{7k}.

Now, we can continue to bound A5โ€‹kA_{5k} to A7โ€‹kA_{7k}. For A5โ€‹k=|uk|2โ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹(ฮฒkโˆ’bk)A_{5k}=|u_{k}|^{2}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}(\beta_{k}-b_{k}), there is

โˆ‘k=1nEkโˆ’1โ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A5โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}_{k-1}\left|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{5k}\right|^{2}
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1n|uk|4โ€‹Ekโˆ’1โ€‹{(E๐ฐkโ€‹|๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐk|4)1/2โ€‹(E๐ฐkโ€‹|ฮฒkโˆ’bk|4)1/2}\displaystyle C\sum_{k=1}^{n}|u_{k}|^{4}\mbox{E}_{k-1}\left\{\left(\mbox{E}_{\mathbf{w}_{k}}|\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}|^{4}\right)^{1/2}\left(\mbox{E}_{\mathbf{w}_{k}}|\beta_{k}-b_{k}|^{4}\right)^{1/2}\right\}
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1n|uk|4n,\displaystyle C\frac{\sum_{k=1}^{n}|u_{k}|^{4}}{n},

where we apply (S.69) and Lemma 9. Similarly,

โˆ‘k=1nEโ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A5โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{5k}|^{2} โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1nEโ€‹|A5โ€‹k|2\displaystyle C\sum_{k=1}^{n}\mbox{E}|A_{5k}|^{2}
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1n|uk|2โ€‹Eโ€‹{(E๐ฐkโ€‹|๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐk|4)1/2โ€‹(E๐ฐkโ€‹|ฮฒkโˆ’bk|4)1/2}\displaystyle C\sum_{k=1}^{n}|u_{k}|^{2}\mbox{E}\left\{\left(\mbox{E}_{\mathbf{w}_{k}}|\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}|^{4}\right)^{1/2}\left(\mbox{E}_{\mathbf{w}_{k}}|\beta_{k}-b_{k}|^{4}\right)^{1/2}\right\}
โ‰ค\displaystyle\leq Cโ€‹nโˆ’1.\displaystyle Cn^{-1}.

Thus, again using Lemma 7, we have

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹A5โ€‹k|2โ‰คCโ€‹nโˆ’1.\mbox{E}\left|\sum^{n}_{k=1}(\mbox{E}_{k}-\mbox{E}_{k-1})A_{5k}\right|^{2}\leq Cn^{-1}.

For A6โ€‹k=|uk|2โ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹bkA_{6k}=|u_{k}|^{2}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}b_{k}, there is

โˆ‘k=1nEkโˆ’1โ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A6โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}_{k-1}\left|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{6k}\right|^{2} =\displaystyle= โˆ‘k=1n|uk|4โ€‹Ekโˆ’1โ€‹{(E๐ฐkโ€‹|๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโˆ’1pโ€‹trโ€‹๐šบโ€‹Qโ€‹(z)|2)}\displaystyle\sum_{k=1}^{n}|u_{k}|^{4}\mbox{E}_{k-1}\left\{\left(\mbox{E}_{\mathbf{w}_{k}}|\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}-\frac{1}{p}\mbox{tr}\bm{\Sigma}Q(z)|^{2}\right)\right\}
โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1n|uk|4n\displaystyle C\frac{\sum_{k=1}^{n}|u_{k}|^{4}}{n}

and

โˆ‘k=1nEโ€‹|(Ekโˆ’Ekโˆ’1)โ€‹A6โ€‹k|2\displaystyle\sum_{k=1}^{n}\mbox{E}|(\mbox{E}_{k}-\mbox{E}_{k-1})A_{6k}|^{2} โ‰ค\displaystyle\leq Cโ€‹โˆ‘k=1nEโ€‹{|uk|2โ€‹|Ekโ€‹(๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโˆ’1pโ€‹trโ€‹๐šบโ€‹Qโ€‹(z))|2}\displaystyle C\sum_{k=1}^{n}\mbox{E}\left\{|u_{k}|^{2}\left|\mbox{E}_{k}(\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}-\frac{1}{p}\mbox{tr}\bm{\Sigma}Q(z))\right|^{2}\right\}
โ‰ค\displaystyle\leq Cโ€‹nโˆ’1.\displaystyle Cn^{-1}.

Thus, there is also

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹A6โ€‹k|2โ‰คCโ€‹nโˆ’1.\mbox{E}\left|\sum^{n}_{k=1}(\mbox{E}_{k}-\mbox{E}_{k-1})A_{6k}\right|^{2}\leq Cn^{-1}.

The term involving A_{7k}=\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{a}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k} can be bounded by an argument similar to that for A_{2k}. Combining the above discussions, we obtain

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹Ak|2โ‰คCโ€‹nโˆ’1.\mbox{E}\left|\sum^{n}_{k=1}(\mbox{E}_{k}-\mbox{E}_{k-1})A_{k}\right|^{2}\leq Cn^{-1}. (S.82)

Now, we consider BkB_{k} in (S.75). We split BkB_{k} into several components:

Bk\displaystyle B_{k} =\displaystyle= ๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐—๐ฎโ€‹ฮฒk\displaystyle\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{X}\mathbf{u}\beta_{k}
=\displaystyle= ๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐škโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹ฮฒk+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐škโˆ—โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹ฮฒk\displaystyle\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\mathbf{a}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}\beta_{k}+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{a}_{k}^{*}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}\beta_{k}
+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐škโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹ฮฒk+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹(ฮฒkโˆ’bk)\displaystyle+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\mathbf{a}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}\beta_{k}+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}(\beta_{k}-b_{k})
+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐—kโ€‹๐ฎโ€‹bk+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐škโ€‹๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk\displaystyle+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}Q_{k}(z)\mathbf{X}_{k}\mathbf{u}b_{k}+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\mathbf{a}_{k}\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}
+๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐ฑkโˆ—โ€‹Qkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk\displaystyle+\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{x}_{k}^{*}Q_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}
:=\displaystyle:= B1โ€‹k+B2โ€‹k+B3โ€‹k+B4โ€‹k+B5โ€‹k+B6โ€‹k+B7โ€‹k.\displaystyle B_{1k}+B_{2k}+B_{3k}+B_{4k}+B_{5k}+B_{6k}+B_{7k}.

We obtain bounds for B1โ€‹kB_{1k}, B2โ€‹kB_{2k} and B3โ€‹kB_{3k} by arguments similar to those leading to the bounds for A1โ€‹kA_{1k} and A2โ€‹kA_{2k}. For the terms B4โ€‹kB_{4k} and B5โ€‹kB_{5k}, we utilize (S.70). For B6โ€‹kB_{6k} we use ๐ฑk=๐šk+๐šบ1/2โ€‹๐ฐk\mathbf{x}_{k}=\mathbf{a}_{k}+\bm{\Sigma}^{1/2}\mathbf{w}_{k} to further decompose it into four components. For the component without ๐ฐk\mathbf{w}_{k}, it can be bounded following an argument similar to the one that leads to the bound for A1โ€‹kA_{1k}. For the component with one ๐ฐk\mathbf{w}_{k}, it can be handled similarly to A2โ€‹kA_{2k}. For the component involving the quadratic form ๐ฐkโŠคโ€‹๐šบ1/2โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐk\mathbf{w}_{k}^{\top}\bm{\Sigma}^{1/2}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}, we use arguments leading to the bound for A5โ€‹kA_{5k} and A6โ€‹kA_{6k}. For B7โ€‹kB_{7k}, it is similar to B6โ€‹kB_{6k}, and owing to the presence of ๐ฎโˆ—โ€‹๐—kโˆ—โ€‹Qkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐk\mathbf{u}^{*}\mathbf{X}_{k}^{*}Q_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}, which has a 44-th moment of Oโ€‹(nโˆ’2)O(n^{-2}), its analysis becomes even simpler. Therefore, we find

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹Bk|2โ‰คCโ€‹nโˆ’1.\mbox{E}\left|\sum^{n}_{k=1}(\mbox{E}_{k}-\mbox{E}_{k-1})B_{k}\right|^{2}\leq Cn^{-1}. (S.83)

Recalling the definition of CkC_{k} in (S.75), according to the analysis of AkA_{k}, we readily obtain that

Eโ€‹|โˆ‘k=1n(Ekโˆ’Ekโˆ’1)โ€‹Ck|2โ‰คCโ€‹nโˆ’1.\mbox{E}\left|\sum^{n}_{k=1}(\mbox{E}_{k}-\mbox{E}_{k-1})C_{k}\right|^{2}\leq Cn^{-1}. (S.84)

Combining the bounds in (S.82), (S.83), and (S.84), we obtain (S.72).

Proof of (S.73): We first define

Zk1\displaystyle Z_{k}^{1} =\displaystyle= โˆ‘i=1k๐ฑiโ€‹๐žiโˆ—+โˆ‘i=k+1n๐ฑi0โ€‹๐žiโˆ—\displaystyle\sum_{i=1}^{k}\mathbf{x}_{i}\mathbf{e}_{i}^{*}+\sum_{i=k+1}^{n}\mathbf{x}_{i}^{0}\mathbf{e}_{i}^{*}
Zk\displaystyle Z_{k} =\displaystyle= โˆ‘i=1kโˆ’1๐ฑiโ€‹๐žiโˆ—+โˆ‘i=k+1n๐ฑi0โ€‹๐žiโˆ—\displaystyle\sum_{i=1}^{k-1}\mathbf{x}_{i}\mathbf{e}_{i}^{*}+\sum_{i=k+1}^{n}\mathbf{x}_{i}^{0}\mathbf{e}_{i}^{*}
Zk0\displaystyle Z_{k}^{0} =\displaystyle= โˆ‘i=1kโˆ’1๐ฑiโ€‹๐žiโˆ—+โˆ‘i=kn๐ฑi0โ€‹๐žiโˆ—,\displaystyle\sum_{i=1}^{k-1}\mathbf{x}_{i}\mathbf{e}_{i}^{*}+\sum_{i=k}^{n}\mathbf{x}_{i}^{0}\mathbf{e}_{i}^{*},

where \mathbf{x}^{0}_{i}=\mathbf{a}_{i}+\bm{\Sigma}^{1/2}\mathbf{w}_{i}^{0}, and \mathbf{w}_{i}^{0} follows a normal distribution with mean \mathbf{0} and covariance matrix n^{-1}\mathbf{I}. Define

Gkโ€‹(z)=(Zkโ€‹Zkโˆ—โˆ’zโ€‹๐ˆ)โˆ’1,ฮฒk1=11+๐ฑkโˆ—โ€‹Gkโ€‹(z)โ€‹๐ฑk,ฮฒk0=11+๐ฑk0โˆ—โ€‹Gkโ€‹(z)โ€‹๐ฑk0.G_{k}(z)=(Z_{k}Z_{k}^{*}-z\mathbf{I})^{-1},~{}~{}~{}~{}\beta_{k}^{1}=\frac{1}{1+\mathbf{x}_{k}^{*}G_{k}(z)\mathbf{x}_{k}},~{}~{}~{}~{}\beta_{k}^{0}~{}=~{}\frac{1}{1+{\mathbf{x}_{k}^{0}}^{*}G_{k}(z)\mathbf{x}_{k}^{0}}. (S.85)

Write

Eโ€‹๐ฎโˆ—โ€‹๐—โˆ—โ€‹(๐—๐—โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—๐ฎโˆ’Eโ€‹๐ฎโˆ—โ€‹๐—0โˆ—โ€‹(๐—0โ€‹๐—0โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐—0โ€‹๐ฎ\displaystyle\mbox{E}\mathbf{u}^{*}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{u}-\mbox{E}\mathbf{u}^{*}\mathbf{X}_{0}^{*}(\mathbf{X}_{0}\mathbf{X}_{0}^{*}-z\mathbf{I})^{-1}\mathbf{X}_{0}\mathbf{u}
=\displaystyle= โˆ‘k=1nEโ€‹(๐ฎโˆ—โ€‹Zk1โˆ—โ€‹(Zk1โ€‹Zk1โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹Zk1โ€‹๐ฎโˆ’๐ฎโˆ—โ€‹Zkโˆ—โ€‹(Zkโ€‹Zkโˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹Zkโ€‹๐ฎ)\displaystyle\sum^{n}_{k=1}\mbox{E}\left(\mathbf{u}^{*}{Z_{k}^{1}}^{*}({Z_{k}^{1}}{Z_{k}^{1}}^{*}-z\mathbf{I})^{-1}{Z_{k}^{1}}\mathbf{u}-\mathbf{u}^{*}{Z_{k}}^{*}({Z_{k}}{Z_{k}}^{*}-z\mathbf{I})^{-1}{Z_{k}}\mathbf{u}\right)
โˆ’โˆ‘k=1nEโ€‹(๐ฎโˆ—โ€‹Zk0โˆ—โ€‹(Zk0โ€‹Zk0โˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹Zk0โ€‹๐ฎโˆ’๐ฎโˆ—โ€‹Zkโˆ—โ€‹(Zkโ€‹Zkโˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹Zkโ€‹๐ฎ)\displaystyle-\sum^{n}_{k=1}\mbox{E}\left(\mathbf{u}^{*}{Z_{k}^{0}}^{*}({Z_{k}^{0}}{Z_{k}^{0}}^{*}-z\mathbf{I})^{-1}{Z_{k}^{0}}\mathbf{u}-\mathbf{u}^{*}{Z_{k}}^{*}({Z_{k}}{Z_{k}}^{*}-z\mathbf{I})^{-1}{Z_{k}}\mathbf{u}\right)
:=\displaystyle:= โˆ‘k=1n[Eโ€‹(Ak1โˆ’Bk1+Ck1)โˆ’Eโ€‹(Ak0โˆ’Bk0+Ck0)],\displaystyle\sum_{k=1}^{n}\left[\mbox{E}\left(A_{k}^{1}-B_{k}^{1}+C_{k}^{1}\right)-\mbox{E}\left(A_{k}^{0}-B_{k}^{0}+C_{k}^{0}\right)\right],

where

A_{k}^{1}=\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{x}_{k}^{*}G_{k}Z_{k}^{1}\mathbf{u}\beta_{k}^{1},\quad B_{k}^{1}=\mathbf{u}^{*}Z_{k}^{*}G_{k}\mathbf{x}_{k}\mathbf{x}_{k}^{*}G_{k}Z_{k}^{1}\mathbf{u}\beta_{k}^{1},\quad C_{k}^{1}=\mathbf{u}^{*}Z_{k}^{*}G_{k}\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u},
A_{k}^{0}=\mathbf{u}^{*}\mathbf{e}_{k}{\mathbf{x}_{k}^{0}}^{*}G_{k}Z_{k}^{0}\mathbf{u}\beta_{k}^{0},\quad B_{k}^{0}=\mathbf{u}^{*}Z_{k}^{*}G_{k}\mathbf{x}_{k}^{0}{\mathbf{x}_{k}^{0}}^{*}G_{k}Z_{k}^{0}\mathbf{u}\beta_{k}^{0},\quad C_{k}^{0}=\mathbf{u}^{*}Z_{k}^{*}G_{k}\mathbf{x}_{k}^{0}\mathbf{e}_{k}^{*}\mathbf{u}.

Similar to A_{k},B_{k},C_{k} in (S.75), the terms A_{k}^{1},B_{k}^{1},C_{k}^{1},A_{k}^{0},B_{k}^{0},C_{k}^{0} can be further decomposed as before, where the superscripts "1" and "0" distinguish the general case from the Gaussian case. Since the procedure is similar to the one above, for simplicity we present two typical examples to illustrate the idea. First, consider A_{k}^{1}:

Ak1\displaystyle A_{k}^{1} =\displaystyle= ๐ฎโˆ—โ€‹๐žkโ€‹๐ฑkโˆ—โ€‹Gkโ€‹(z)โ€‹Zk1โ€‹๐ฎโ€‹ฮฒk1\displaystyle\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{x}_{k}^{*}G_{k}(z)Z_{k}^{1}\mathbf{u}\beta_{k}^{1}
=\displaystyle= ๐ฎโˆ—โ€‹๐žkโ€‹(๐škโˆ—+๐ฐkโˆ—โ€‹๐šบ1/2)โ€‹Gkโ€‹(z)โ€‹(Zk+๐ฑkโ€‹๐žkโˆ—)โ€‹๐ฎโ€‹ฮฒk1\displaystyle\mathbf{u}^{*}\mathbf{e}_{k}(\mathbf{a}_{k}^{*}+\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2})G_{k}(z)(Z_{k}+\mathbf{x}_{k}\mathbf{e}_{k}^{*})\mathbf{u}\beta_{k}^{1}
=\displaystyle= ๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1+๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk1\displaystyle\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}\beta_{k}^{1}+\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}G_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}
+๐ฎโˆ—โ€‹๐žkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1+๐ฎโˆ—โ€‹๐žkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Gkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk1\displaystyle+\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}\beta_{k}^{1}+\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}
:=\displaystyle:= A1โ€‹k1+A2โ€‹k1+A3โ€‹k1+A4โ€‹k1.\displaystyle A_{1k}^{1}+A_{2k}^{1}+A_{3k}^{1}+A_{4k}^{1}.

For A1โ€‹k1=๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1A_{1k}^{1}=\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}\beta_{k}^{1}, there is

|โˆ‘k=1nEโ€‹[๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹(ฮฒk1โˆ’bk)]|โ‰คโˆ‘k=1n(Eโ€‹|๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎ|2โ€‹Eโ€‹|ฮฒk1โˆ’bk|2)1/2โ‰คCnโ€‹โˆ‘k=1n|uk|โ‹…โ€–๐škโ€–โ‰คC2โ€‹nโ€‹โˆ‘k=1n(|uk|2+โ€–๐škโ€–2)โ‰คCn,\displaystyle\begin{aligned} \left|\sum^{n}_{k=1}\mbox{E}\left[\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}(\beta_{k}^{1}-b_{k})\right]\right|&\leq\sum^{n}_{k=1}\left(\mbox{E}|\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}|^{2}\mbox{E}|\beta_{k}^{1}-b_{k}|^{2}\right)^{1/2}\\ &\leq\frac{C}{\sqrt{n}}\sum_{k=1}^{n}|u_{k}|\cdot\|\mathbf{a}_{k}\|\leq\frac{C}{2\sqrt{n}}\sum_{k=1}^{n}(|u_{k}|^{2}+\|\mathbf{a}_{k}\|^{2})\leq\frac{C}{\sqrt{n}},\end{aligned}

where bkb_{k} is defined in (S.68), and in the second step we use Eโ€‹|ฮฒk1โˆ’bk|2=Oโ€‹(nโˆ’1)\mbox{E}|\beta_{k}^{1}-b_{k}|^{2}=O(n^{-1}). Thus, we have

โˆ‘k=1nEโ€‹A1โ€‹k1=โˆ‘k=1nEโ€‹๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹bk+Oโ€‹(1n).\sum^{n}_{k=1}\mbox{E}A_{1k}^{1}=\sum^{n}_{k=1}\mbox{E}\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}b_{k}+O\left(\frac{1}{\sqrt{n}}\right). (S.86)

Similarly, we also have โˆ‘k=1nEโ€‹A1โ€‹k0=โˆ‘k=1nEโ€‹๐ฎโˆ—โ€‹๐žkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹bk+Oโ€‹(1n)\sum^{n}_{k=1}\mbox{E}A_{1k}^{0}=\sum^{n}_{k=1}\mbox{E}\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}b_{k}+O(\frac{1}{\sqrt{n}}). For A3โ€‹k1A_{3k}^{1}, note that Eโ€‹๐ฎโˆ—โ€‹๐žkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹bk=0\mbox{E}\mathbf{u}^{*}\mathbf{e}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}b_{k}=0. Then, by Cauchyโ€“Schwarz inequality, write

โˆ‘k=1nEโ€‹[ukโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹(ฮฒk1โˆ’bk)]โ‰คCnโ€‹โˆ‘k=1n|uk|โ€‹(Eโ€‹๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐šบโ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎ)1/2โ‰คCn.\displaystyle\sum_{k=1}^{n}\mbox{E}\left[u_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}(\beta_{k}^{1}-b_{k})\right]\leq\frac{C}{n}\sum_{k=1}^{n}|u_{k}|(\mbox{E}\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}G_{k}(z)Z_{k}\mathbf{u})^{1/2}\leq\frac{C}{\sqrt{n}}. (S.87)

Similarly, it is easy to prove that |โˆ‘k=1nEโ€‹Ajโ€‹k1|=Oโ€‹(1n)\left|\sum^{n}_{k=1}\mbox{E}A_{jk}^{1}\right|=O(\frac{1}{\sqrt{n}}) for j=2,4j=2,4 by using Cauchyโ€“Schwarz inequality.

Next, consider B_{k}^{1}; we have

Bk1\displaystyle B_{k}^{1} =\displaystyle= ๐ฎโˆ—โ€‹Zkโˆ—โ€‹(Zkโ€‹Zkโˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹๐ฑkโ€‹๐ฑkโˆ—โ€‹(Zkโ€‹Zkโˆ—โˆ’zโ€‹๐ˆ)โˆ’1โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1\displaystyle\mathbf{u}^{*}Z_{k}^{*}(Z_{k}Z_{k}^{*}-z\mathbf{I})^{-1}\mathbf{x}_{k}\mathbf{x}_{k}^{*}(Z_{k}Z_{k}^{*}-z\mathbf{I})^{-1}Z_{k}\mathbf{u}\beta_{k}^{1}
=\displaystyle= ๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐škโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1+๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1\displaystyle\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}\beta_{k}^{1}+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}\beta_{k}^{1}
+๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐škโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1+๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹(ฮฒk1โˆ’bk)\displaystyle+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}\beta_{k}^{1}+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}(\beta_{k}^{1}-b_{k})
+๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐ฐkโˆ—โ€‹๐šบ1/2โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹bk+๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐škโ€‹๐ฑkโˆ—โ€‹Gkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk1\displaystyle+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}b_{k}+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{x}_{k}^{*}G_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}
+๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐ฑkโ€‹Gkโ€‹(z)โ€‹๐ฑkโ€‹๐žkโˆ—โ€‹๐ฎโ€‹ฮฒk1\displaystyle+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{x}_{k}G_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}
:=\displaystyle:= B1โ€‹k1+B2โ€‹k1+B3โ€‹k1+B4โ€‹k1+B5โ€‹k1+B6โ€‹k1+B7โ€‹k1.\displaystyle B_{1k}^{1}+B_{2k}^{1}+B_{3k}^{1}+B_{4k}^{1}+B_{5k}^{1}+B_{6k}^{1}+B_{7k}^{1}.

Similar to (S.86), we have

โˆ‘k=1nEโ€‹B1โ€‹k1=โˆ‘k=1nEโ€‹[๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐škโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹bk]+Oโ€‹(1n),\sum^{n}_{k=1}\mbox{E}B_{1k}^{1}=\sum^{n}_{k=1}\mbox{E}\left[\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}b_{k}\right]+O\left(\frac{1}{\sqrt{n}}\right),

and

โˆ‘k=1nEโ€‹B1โ€‹k0=โˆ‘k=1nEโ€‹[๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐škโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹bk]+Oโ€‹(1n).\sum^{n}_{k=1}\mbox{E}B_{1k}^{0}=\sum^{n}_{k=1}\mbox{E}\left[\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}b_{k}\right]+O\left(\frac{1}{\sqrt{n}}\right).

For B2โ€‹k1=๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹ฮฒk1B_{2k}^{1}=\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}\beta_{k}^{1}, we have

โˆ‘k=1nEโ€‹B2โ€‹k1=โˆ‘k=1nEโ€‹[๐ฎโˆ—โ€‹Zkโˆ—โ€‹Gkโ€‹(z)โ€‹๐šบ1/2โ€‹๐ฐkโ€‹๐škโˆ—โ€‹Gkโ€‹(z)โ€‹Zkโ€‹๐ฎโ€‹(ฮฒk1โˆ’bk)]=Oโ€‹(1n),\sum_{k=1}^{n}\mbox{E}B_{2k}^{1}=\sum_{k=1}^{n}\mbox{E}\left[\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{a}_{k}^{*}G_{k}(z)Z_{k}\mathbf{u}(\beta_{k}^{1}-b_{k})\right]=O\left(\frac{1}{\sqrt{n}}\right),

and for the same reason, the same bound holds for \sum_{k=1}^{n}\mbox{E}B_{2k}^{0}, \sum_{k=1}^{n}\mbox{E}B_{3k}^{1} and \sum_{k=1}^{n}\mbox{E}B_{3k}^{0}.

For B_{4k}^{1}=\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}(\beta_{k}^{1}-b_{k}), by Lemma 6 we have

\left|\sum_{k=1}^{n}\mbox{E}B_{4k}^{1}\right|\leq\sum_{k=1}^{n}\left[\mbox{E}|\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}|^{2}\,\mbox{E}|\beta_{k}^{1}-b_{k}|^{2}\right]^{1/2}
\leq\frac{C}{\sqrt{n}}\sum_{k=1}^{n}\Big[\mbox{E}|\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)Z_{k}\mathbf{u}\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}-n^{-1}\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}G_{k}(z)Z_{k}\mathbf{u}|^{2}
+\mbox{E}(\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}G_{k}(z)Z_{k}\mathbf{u}/n)^{2}\Big]^{1/2}=O\left(\frac{1}{\sqrt{n}}\right).

The sum $\sum_{k=1}^{n}\mbox{E}B_{4k}^{0}$ admits a bound of the same order $O(\frac{1}{\sqrt{n}})$.
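The bounds for the $B_{4k}$ and $B_{5k}$ terms rest on the usual first- and second-moment behaviour of quadratic forms in $\mathbf{w}_{k}$. As a reference point (presumably the content of the Lemma 6 invoked above), assuming that the entries of $\mathbf{w}_{k}$ are i.i.d. with mean zero, variance $1/n$ and bounded fourth moment, which is consistent with the $\mbox{tr}/n$ factors appearing here, one has, for a deterministic matrix $M$ (the case of $M$ independent of $\mathbf{w}_{k}$ follows by conditioning),

\mbox{E}\,\mathbf{w}_{k}^{*}M\mathbf{w}_{k}=\frac{\mbox{tr}M}{n},\qquad\mbox{E}\left|\mathbf{w}_{k}^{*}M\mathbf{w}_{k}-\frac{\mbox{tr}M}{n}\right|^{2}\leq\frac{C\,\mbox{tr}(MM^{*})}{n^{2}},

with $C$ depending only on the fourth moment of the entries; these are the facts used when $\mathbf{w}_{k}\mathbf{w}_{k}^{*}$ is replaced by its expectation $n^{-1}\mathbf{I}$ in the displays above.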

For $B_{5k}^{1}$ and $B_{5k}^{0}$, we have

\sum_{k=1}^{n}\mbox{E}B_{5k}^{1}=\sum_{k=1}^{n}\mbox{E}B_{5k}^{0}=\sum_{k=1}^{n}\frac{\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}G_{k}(z)Z_{k}\mathbf{u}}{n}.

For $B_{6k}^{1}=\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{x}_{k}^{*}G_{k}(z)\mathbf{x}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}$, it can be decomposed into

B_{6k}^{1}=\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{a}_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)\mathbf{a}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}
+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{a}_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{e}_{k}^{*}\mathbf{u}\beta_{k}^{1}+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{e}_{k}^{*}\mathbf{u}(\beta_{k}^{1}-b_{k})
+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{w}_{k}^{*}\bm{\Sigma}^{1/2}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{e}_{k}^{*}\mathbf{u}b_{k},

and it is readily verified that

\sum^{n}_{k=1}\mbox{E}B_{6k}^{1}=\sum^{n}_{k=1}\mbox{E}\left[\frac{\mbox{tr}[G_{k}(z)\bm{\Sigma}]}{n}\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}u_{k}b_{k}+\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\mathbf{a}_{k}\mathbf{a}_{k}^{*}G_{k}(z)\mathbf{a}_{k}u_{k}b_{k}\right]+O\left(\frac{1}{\sqrt{n}}\right)
=\sum^{n}_{k=1}\mbox{E}B_{6k}^{0}+O\left(\frac{1}{\sqrt{n}}\right).

Similarly, by decomposing $B_{7k}^{1}=\mathbf{u}^{*}Z_{k}^{*}G_{k}(z)\bm{\Sigma}^{1/2}\mathbf{w}_{k}\mathbf{x}_{k}^{*}G_{k}(z)\mathbf{x}_{k}u_{k}\beta_{k}^{1}$, one can prove that

\sum^{n}_{k=1}\mbox{E}B_{7k}^{1}=2\mathbf{a}_{k}^{*}G_{k}(z)\bm{\Sigma}G_{k}(z)Z_{k}^{*}\mathbf{u}u_{k}b_{k}+O\left(\frac{1}{\sqrt{n}}\right)=\sum^{n}_{k=1}\mbox{E}B_{7k}^{0}+O\left(\frac{1}{\sqrt{n}}\right). (S.88)

Therefore, combining the arguments above, (S.73) holds. ∎

Proof of Proposition 2. We first consider the bound for $\bar{\tilde{Q}}_{n}$. The strategy is to consider the difference between the quadratic form involving $\bar{\tilde{Q}}_{n}$ and the one involving $\tilde{Q}_{n}$, as studied in Proposition 1, and to derive the limits of such difference terms. The same approach applies to deriving the second bound, for $\bar{Q}_{n}$.

Similar to (S.71), by Lemma 2, we have

(\Phi\mathbf{X}^{*}\mathbf{X}\Phi-z\mathbf{I})^{-1}=(-z\mathbf{I})^{-1}+z^{-1}\Phi\mathbf{X}^{*}(\mathbf{X}\Phi\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi.
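For completeness, the identity above is the standard relation between the resolvents of $M^{*}M$ and $MM^{*}$ applied with $M=\mathbf{X}\Phi$; writing $\Phi$ for the centering projection $\mathbf{I}-n^{-1}\mathbf{1}\mathbf{1}^{\top}$ that is implicit in the rank-one manipulations below (so that $\Phi^{2}=\Phi$), one brief way to see it is

(\Phi\mathbf{X}^{*}\mathbf{X}\Phi-z\mathbf{I})^{-1}=(M^{*}M-z\mathbf{I})^{-1}=-z^{-1}\left[\mathbf{I}-M^{*}(MM^{*}-z\mathbf{I})^{-1}M\right]=(-z\mathbf{I})^{-1}+z^{-1}\Phi\mathbf{X}^{*}(\mathbf{X}\Phi\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi,

since $MM^{*}=\mathbf{X}\Phi^{2}\mathbf{X}^{*}=\mathbf{X}\Phi\mathbf{X}^{*}$; the middle equality can be checked by multiplying both sides by $M^{*}M-z\mathbf{I}$.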

By direct calculation, we find

\tilde{D}_{c}:=\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\Phi\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u}-\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u} (S.89)
=\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}n^{-1}\mathbf{X}\mathbf{1}\mathbf{1}^{*}\mathbf{X}^{*}(\mathbf{X}\Phi\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u}
=-\frac{n^{-1}\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{1}\mathbf{1}^{*}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u}}{1-n^{-1}\mathbf{1}^{*}\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\mathbf{1}}
=\frac{n^{-1}z^{2}\mathbf{u}^{*}\Phi(\mathbf{X}^{*}\mathbf{X}-z\mathbf{I})^{-1}\mathbf{1}\mathbf{1}^{*}(\mathbf{X}^{*}\mathbf{X}-z\mathbf{I})^{-1}\Phi\mathbf{u}}{zn^{-1}\mathbf{1}^{*}(\mathbf{X}^{*}\mathbf{X}-z\mathbf{I})^{-1}\mathbf{1}},

where the first step uses (S.66), the second step uses (S.65), and the third step uses (S.71) and $\mathbf{u}^{*}\Phi\mathbf{1}=0$. Following similar steps, we obtain

\tilde{L}_{c}:=\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\bar{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi\mathbf{u}-\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi\mathbf{u} (S.90)
=\frac{\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\mathbf{R})^{-1}n^{-1}\mathbf{A}_{n}\tilde{r}\mathbf{1}\mathbf{1}^{\top}\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\mathbf{R})^{-1}\mathbf{A}_{n}\Phi\mathbf{u}}{1-\tilde{r}n^{-1}\mathbf{1}^{\top}\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\mathbf{R})^{-1}\mathbf{A}_{n}\mathbf{1}}.
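The rank-one reductions in the last steps of (S.89) and (S.90) (and again in (S.92) below) rely on the Sherman--Morrison identity: for an invertible matrix $M$ and vectors $\mathbf{a},\mathbf{b}$ with $1-\mathbf{b}^{*}M^{-1}\mathbf{a}\neq 0$,

(M-\mathbf{a}\mathbf{b}^{*})^{-1}=M^{-1}+\frac{M^{-1}\mathbf{a}\mathbf{b}^{*}M^{-1}}{1-\mathbf{b}^{*}M^{-1}\mathbf{a}}.

In (S.89) the relevant perturbation is $\mathbf{X}\Phi\mathbf{X}^{*}=\mathbf{X}\mathbf{X}^{*}-n^{-1}\mathbf{X}\mathbf{1}\mathbf{1}^{*}\mathbf{X}^{*}$, so the identity is applied with $M=\mathbf{X}\mathbf{X}^{*}-z\mathbf{I}$, $\mathbf{a}=n^{-1}\mathbf{X}\mathbf{1}$ and $\mathbf{b}=\mathbf{X}\mathbf{1}$; in (S.90) it is applied to $\mathbf{I}+\tilde{r}\bar{\mathbf{R}}_{n}$, viewed as a rank-one perturbation of $\mathbf{I}+\tilde{r}\mathbf{R}_{n}$.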

Next we verify that $z^{-1}\tilde{D}_{c}-\tilde{L}_{c}$ is $O_{P}(n^{-1/2})$. By polarization, (9) still holds with different sequences of deterministic vectors on both sides of $\tilde{Q}_{n}(z)-\tilde{R}_{n}(z)$. This together with $\mathbf{u}^{*}\Phi\mathbf{1}=0$ yields

n^{-1/2}\mathbf{u}^{*}\Phi(\mathbf{X}^{*}\mathbf{X}-z\mathbf{I})^{-1}\mathbf{1}+n^{-1/2}\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\mathbf{R}_{n})^{-1}\mathbf{A}_{n}\mathbf{1}=O_{P}(n^{-1/2}).
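The polarization step used here is the standard recovery of bilinear forms from quadratic forms: for any matrix $M$ and vectors $\mathbf{x},\mathbf{y}$,

\mathbf{x}^{*}M\mathbf{y}=\frac{1}{4}\sum_{k=0}^{3}i^{-k}\,(\mathbf{x}+i^{k}\mathbf{y})^{*}M(\mathbf{x}+i^{k}\mathbf{y}),

so a bound that holds for quadratic forms of $\tilde{Q}_{n}(z)-\tilde{R}_{n}(z)$ uniformly over deterministic vectors with bounded norms extends to bilinear forms with different deterministic vectors on the two sides, which is the sense in which (9) is applied above.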

For the term in the denominator, we have

zn^{-1}\mathbf{1}^{*}(\mathbf{X}^{*}\mathbf{X}-z\mathbf{I})^{-1}\mathbf{1}-\left[z\tilde{r}-zn^{-1}\tilde{r}^{2}\mathbf{1}^{*}\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\mathbf{R})^{-1}\mathbf{A}_{n}\mathbf{1}\right]=O_{P}(n^{-1/2}).

With these two bounds, and by taking the difference of the final expressions in (S.89) and (S.90), respectively, we find that

z^{-1}\tilde{D}_{c}-\tilde{L}_{c}=z^{-1}\left[\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\Phi\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u}-\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u}\right] (S.91)
-\left[\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\bar{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi\mathbf{u}-\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi\mathbf{u}\right]=O_{P}(n^{-1/2}).

Since $(-z\mathbf{I})^{-1}\mathbf{u}^{*}\Phi\mathbf{u}+z^{-1}\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u}-\left[\tilde{r}\mathbf{u}^{*}\Phi\mathbf{u}-\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi\mathbf{u}\right]=O_{P}(n^{-1/2})$ by (9), we conclude from this and (S.91) that

(-z\mathbf{I})^{-1}\mathbf{u}^{*}\Phi\mathbf{u}+z^{-1}\mathbf{u}^{*}\Phi\mathbf{X}^{*}(\mathbf{X}\Phi\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{X}\Phi\mathbf{u}
-\left[\tilde{r}\mathbf{u}^{*}\Phi\mathbf{u}-\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\bar{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi\mathbf{u}\right]=O_{P}(n^{-1/2}).

Therefore,

\mathbf{u}^{*}(\Phi\mathbf{X}^{*}\mathbf{X}\Phi-z\mathbf{I})^{-1}\mathbf{u}-\left[\tilde{r}\mathbf{u}^{*}\Phi\mathbf{u}-\tilde{r}^{2}\mathbf{u}^{*}\Phi\mathbf{A}_{n}^{\top}(\mathbf{I}+\tilde{r}\bar{\mathbf{R}}_{n})^{-1}\mathbf{A}_{n}\Phi\mathbf{u}-z^{-1}\mathbf{u}^{*}n^{-1}\mathbf{1}\mathbf{1}^{\top}\mathbf{u}\right]=O_{P}(n^{-1/2}).

This establishes the bound for $\bar{\tilde{Q}}_{n}$.

Next, we prove the second bound, for $\bar{Q}_{n}(z)$. We have

D_{c}:=\mathbf{v}^{*}\left[(\mathbf{X}\Phi\mathbf{X}^{*}-z\mathbf{I})^{-1}-(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\right]\mathbf{v} (S.92)
=\frac{\mathbf{v}^{*}Q_{n}n^{-1}\mathbf{X}\mathbf{1}\mathbf{1}^{*}\mathbf{X}^{*}Q_{n}\mathbf{v}}{1-n^{-1}\mathbf{1}^{*}\mathbf{X}^{*}Q_{n}\mathbf{X}\mathbf{1}}=\frac{\mathbf{v}^{*}Q_{n}n^{-1}\mathbf{A}\mathbf{1}\mathbf{1}^{*}\mathbf{A}^{*}Q_{n}\mathbf{v}}{1-n^{-1}\mathbf{1}^{*}\mathbf{X}^{*}Q_{n}\mathbf{X}\mathbf{1}}+O_{P}(n^{-1/2}),

where in the last step we use $n^{-1/2}\mathbf{v}^{*}Q_{n}\mathbf{W}\mathbf{1}=O_{P}(n^{-1/2})$, which can be checked by following arguments similar to (3.7)-(3.12) of Pan, (2014). We also find

L_{c}:=(-z-z\tilde{r}\bar{\mathbf{R}}_{n})^{-1}-(-z-z\tilde{r}{\mathbf{R}}_{n})^{-1}
=\frac{-(-z-z\tilde{r}\mathbf{R})^{-1}z\tilde{r}n^{-1}\mathbf{A}\mathbf{1}\mathbf{1}^{\top}\mathbf{A}^{\top}(-z-z\tilde{r}\mathbf{R})^{-1}}{1-zn^{-1}\mathbf{1}^{*}\mathbf{A}^{\top}(-z-z\tilde{r}\mathbf{R})^{-1}\mathbf{A}\mathbf{1}}.

By (11), we have $n^{-1/2}\mathbf{v}^{*}Q_{n}\mathbf{A}\mathbf{1}-n^{-1/2}\mathbf{v}^{*}(-z-z\tilde{r}\mathbf{R})^{-1}\mathbf{A}\mathbf{1}=O_{P}(n^{-1/2})$. Since $1-n^{-1}\mathbf{1}^{*}\mathbf{X}^{*}Q\mathbf{X}\mathbf{1}=zn^{-1}\mathbf{1}^{*}\tilde{Q}_{n}\mathbf{1}$, according to (9), we find $1-n^{-1}\mathbf{1}^{*}\mathbf{X}^{*}Q\mathbf{X}\mathbf{1}+zn^{-1}\mathbf{1}^{*}\left[\tilde{r}\mathbf{I}+\tilde{r}^{2}\mathbf{A}^{\top}(\mathbf{I}+\tilde{r}(z)\mathbf{R}_{n})^{-1}\mathbf{A}\right]\mathbf{1}=O_{P}(n^{-1/2})$. Therefore, we obtain that $D_{c}-L_{c}=O_{P}(n^{-1/2})$. Combining this with the fact that $\mathbf{v}^{*}(\mathbf{X}\mathbf{X}^{*}-z\mathbf{I})^{-1}\mathbf{v}-\mathbf{v}^{*}(-z-z\tilde{r}{\mathbf{R}}_{n})^{-1}\mathbf{v}=O_{P}(n^{-1/2})$ concludes the proof. ∎