
Asymptotic locations of bounded and unbounded eigenvalues of sample correlation matrices of certain factor models
– application to a components retention rule

Yohji Akama  
The Mathematical Institute, Tohoku University,
Aramaki, Aoba, Sendai, 980-8578, Japan

Peng Tian
Laboratoire Jean Alexandre Dieudonné, Université Côte d’Azur,
28, Avenue Valrose, 06108 Nice Cedex 2, France
Abstract

Let the dimension $N$ of data and the sample size $T$ tend to $\infty$ with $N/T\to c>0$. The spectral properties of a sample correlation matrix $\mathbf{C}$ and a sample covariance matrix $\mathbf{S}$ are asymptotically equal whenever the population correlation matrix $\mathbf{R}$ is bounded (El Karoui, 2009). We demonstrate this also for general linear models with unbounded $\mathbf{R}$, by examining the behavior of the singular values of multiplicatively perturbed matrices. By this, we establish the following: given a factor model with idiosyncratic noise variance $\sigma^{2}$ and a rank-$r$ factor loading matrix $\mathbf{L}$ whose rows all have common Euclidean norm $L$, the $k$th largest eigenvalues $\lambda_{k}$ ($1\leq k\leq N$) of $\mathbf{C}$ satisfy almost surely: (1) $\lambda_{r}$ diverges, (2) $\lambda_{k}/s_{k}^{2}\to 1/(L^{2}+\sigma^{2})$ ($1\leq k\leq r$) for the $k$th largest singular value $s_{k}$ of $\mathbf{L}$, and (3) $\lambda_{r+1}\to(1-\rho)(1+\sqrt{c})^{2}$ for $\rho:=L^{2}/(L^{2}+\sigma^{2})$. Whenever $s_{r}$ is much larger than $\sqrt{\log N}$, the broken-stick rule (Frontier, 1976; Jackson, 1993), which estimates $\operatorname{rank}\mathbf{L}$ via a random partition (Holst, 1980) of $[0,1]$, tends to $r$ (a.s.). We also provide a natural factor model where the rule tends (a.s.) to the "essential rank" of $\mathbf{L}$, which is smaller than $\operatorname{rank}\mathbf{L}$.


Keywords: unbounded eigenvalues, the largest bounded eigenvalue, high-dimensional modeling, multiplicative matrix perturbation, correlation

1 Introduction

In multivariate analysis (Anderson, 2003; Fujikoshi et al., 2011; Muirhead, 2009), covariance matrices are important objects, for example in principal component analysis (PCA). Motivated by this, research on sample covariance matrices has a long history, and an abundance of results has been established, e.g., the various Wishart distributions.

Several well-known methods in multivariate analysis, however, become inefficient or even misleading when the data dimension $N$ is large. To deal with such large-dimensional data, a novel approach in asymptotic statistics has been developed in which the data dimension $N$ is no longer fixed but tends to infinity together with the sample size $T$. The features of this limiting regime are discussed in (Yao et al., 2015, Section 1.1) based on real datasets from portfolio management, climate surveys, speech analysis, face recognition, microarrays, signal detection, etc. In this paper, we assume a high-dimensional regime in which the ratio $N/T$ of dimension $N$ to sample size $T$ converges to a positive constant $c$.

Milestones in the study of the spectral properties of sample covariance matrices in this limiting regime were the discovery of the Marčenko-Pastur distribution, and the determination (Yin et al., 1988; Bai & Yin, 1993) of the largest and smallest eigenvalues of sample covariance matrices $\mathbf{S}$ for i.i.d. data under a finite fourth moment condition. Baik & Silverstein (2006) studied the asymptotic locations of the eigenvalues of $\mathbf{S}$ in a spiked eigenvalue model (Johnstone, 2001), but there the largest eigenvalues of the population covariance matrix $\bm{\Sigma}$ are bounded. We refer the reader to (Bai & Silverstein, 2010; Paul & Aue, 2014; Yao et al., 2015).

In some cases, for example when data are measured in different units, it is more appropriate to use sample correlation matrices (Jolliffe, 2002; Jolliffe & Cadima, 2016; Johnson & Wichern, 2007), because sample correlation matrices $\mathbf{C}$ are invariant under scaling and shifting. Moreover, the assumption of i.i.d. data is often untenable: practitioners frequently expect a nontrivial covariance structure in their data. However, the asymptotic spectral properties of a sample correlation matrix $\mathbf{C}$ in the high-dimensional regime have not been investigated as thoroughly as those of sample covariance matrices $\mathbf{S}$.

We review typical asymptotic results on the spectral properties of sample correlation matrices $\mathbf{C}$ here. In the i.i.d. case, under a finite fourth moment condition, Jiang (2004) showed that the extreme eigenvalues of the sample correlation matrix $\mathbf{C}$ converge almost surely to $(1\pm\sqrt{c})^{2}$, and Bai & Zhou (2008) showed that the limiting spectral distribution is the standard Marčenko-Pastur distribution under a weaker assumption.

In a class of spiked models, Morales-Jimenez et al. (2021) derived asymptotic first-order and distributional results for the spiked eigenvalues and eigenvectors of sample correlation matrices $\mathbf{C}$. More specifically, they found that the first-order spectral properties of sample correlation matrices $\mathbf{C}$ match those of sample covariance matrices $\mathbf{S}$, whilst their asymptotic distributions can differ significantly.

El Karoui (2009) revealed that the first-order asymptotic behavior of the spectrum of $\mathbf{C}$ is similar to that of $\mathbf{S}$ for unit-variance data of a general linear model, except that this similarity requires the boundedness of the population correlation matrix $\mathbf{R}$, a condition not met by some factor models.

The problem is that the boundedness of the population covariance matrix $\bm{\Sigma}$ is not always satisfied in econometrics, finance (Chamberlain & Rothschild, 1983; Bai & Ng, 2002), genomics, and stationary long-memory processes. One feature of unbounded population covariance matrices $\bm{\Sigma}$ is the consistency of the eigenvectors (Yata & Aoshima, 2013; Koltchinskii & Lounici, 2017; Wang & Fan, 2017), although the asymptotic location and the fluctuation of the largest unbounded eigenvalues of $\mathbf{C}$ are not yet available even for the equi-correlated normal population, which has the simplest unbounded population covariance matrix $\bm{\Sigma}$.

Unbounded covariance/correlation matrices in high-dimensional problems have been studied via a spiked model (Yata & Aoshima, 2013), a time-series model (Merlevède et al., 2019), or a factor model (Cai et al., 2020; Wang & Fan, 2017). Following the latter articles on unbounded sample covariance matrices, we study factor models for unbounded sample correlation matrices.

Our target model is a $K$-factor model

\mathbf{X}=\begin{bmatrix}\bm{\mu}&\dots&\bm{\mu}\end{bmatrix}+\mathbf{L}\mathbf{F}+\mathbf{\Lambda}\mathbf{\Psi}

with $\bm{\mu}\in\mathbb{R}^{N}$, $\mathbf{L}\in\mathbb{R}^{N\times K}$, $\mathbf{\Lambda}\in\mathbb{R}^{N\times N}$ deterministic, $K$ a fixed positive integer, and $\mathbf{F}=[f_{it}]\in\mathbb{R}^{K\times T}$ and $\mathbf{\Psi}=[\psi_{it}]\in\mathbb{R}^{N\times T}$ two random matrices. Here we assume that the entries of $\mathbf{F}$ (factors) are i.i.d. centered random variables with unit variance, and so are the entries of $\mathbf{\Psi}$ (noises). The entries of $\mathbf{F}$ are independent of the entries of $\mathbf{\Psi}$, but $f_{11}$ and $\psi_{11}$ are not necessarily identically distributed. All rows of $\bm{\Gamma}:=\begin{bmatrix}\mathbf{L}&\mathbf{\Lambda}\end{bmatrix}$ are nonzero.

Our fundamental result is Theorem 2.4: for every $K$-factor model, if the entries $\psi_{it}$ of the noise matrix $\mathbf{\Psi}$ have finite fourth moments and $\mathbf{\Lambda}$ is diagonal, or if there exists $\varepsilon>0$ such that $\operatorname{\mathbb{E}}\left(|\psi_{it}|^{4}(\log|\psi_{it}|)^{2+2\varepsilon}\right)<\infty$, then the eigenvalues of the sample correlation matrix $\mathbf{C}$ and those of the sample covariance matrix $\mathbf{S}$ with unit-variance data are asymptotically equal, which extends the result of El Karoui (2009).

In PCA, methods for estimating the number of factors in a sample have been studied, e.g., in econometrics (Bai & Ng, 2002; Lam & Yao, 2012; Aït-Sahalia & Xiu, 2017).

Meanwhile, many rules for the retention of principal components have been proposed in the literature (see, e.g., Jackson (1993)). The broken-stick (BS) rule (Frontier, 1976; Jackson, 1993) is a peculiar rule among these. The BS rule compares the spectral distribution of $\mathbf{C}$ with the distribution of the mean lengths of the subintervals obtained by a random partition of $[0,1]$ (Holst, 1980). Specifically, as the number of factors or significant principal components, the BS rule returns $i-1$, where $i$ is the smallest index in $[1,N]$ for which the $i$th largest eigenvalue of $\mathbf{C}$ does not exceed the sum $1/i+1/(i+1)+\cdots+1/N$. The BS rule thus depends on the number and growth rates of the unbounded eigenvalues of $\mathbf{C}$. The idea of the BS rule, originally coming from a species occupation model in ecology, has no evident relation to the distribution of the eigenvalues of $\mathbf{C}$.
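As a concrete illustration (our own sketch, not code from the paper), the BS rule as just described can be implemented in a few lines; `eigvals` is any vector of eigenvalues of a sample correlation matrix.

```python
import numpy as np

def broken_stick_rule(eigvals):
    """Broken-stick rule: return i - 1, where i is the smallest 1-based index
    such that the i-th largest eigenvalue does not exceed
    g_i = 1/i + 1/(i+1) + ... + 1/N, the mean broken-stick subinterval
    lengths of [0, 1] scaled by N."""
    lam = np.sort(np.asarray(eigvals, float))[::-1]
    N = lam.size
    # g[i] (0-based) = 1/(i+1) + ... + 1/N
    g = np.cumsum(1.0 / np.arange(N, 0, -1))[::-1]
    below = np.nonzero(lam <= g)[0]
    return int(below[0]) if below.size else N
```

For instance, with eigenvalues $(3, 0.6, 0.3, 0.1)$ of a $4\times 4$ correlation matrix, only the first eigenvalue exceeds its broken-stick mean length, so the rule retains one component.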

First, we study the asymptotic spectral properties of $\mathbf{C}$, through the fundamental theorem for $K$-factor models (Theorem 2.4) and techniques from random matrix theory, for two illustrative $K$-factor models generalizing the equi-correlated normal population. One of the two is a $K$-factor model such that $\mathbf{\Lambda}=\sigma\mathbf{I}_{N}$ and the rows of the factor loading matrix $\mathbf{L}$ have a common length. We call this a constant length factor loading model (CLFM). The other is a $K$-factor model such that the rows of $\mathbf{L}$ and the diagonal entries of $\mathbf{\Lambda}$ are convergent. This model was introduced in Akama (2023), and is called an asymptotic convergent factor model (ACFM) here. We establish that under certain mild conditions, the BS rule precisely matches the rank of the factor loading matrix $\mathbf{L}$ for a CLFM (see Theorem 2.7), but the essential rank of $\mathbf{L}$ for an ACFM (see Theorem 2.8).

Then we compute the BS rule and some modern factor number estimators, such as the adjusted correlation thresholding (Fan et al., 2022) and Bai-Ng's rule (Bai & Ng, 2002) based on an information criterion, for financial datasets and a biological dataset (Quadeer et al., 2014). The financial datasets were obtained by Fan et al. (2022) from the Fama-French 100 portfolios (Fama & French, 1993) by cleaning up outliers. The biological dataset is a binary multiple sequence alignment (MSA) of HCV genotype 1a (prevalent in North America), publicly available from the Los Alamos National Laboratory database (Kuiken et al., 2004).

The structure of this paper is as follows. Section 2 formally introduces the model and presents the main theorems, with proofs provided in Section A of the supplementary material. Numerical analysis is performed on actual financial and biological datasets in Section 3. Section 4 is the conclusion. Section B of the supplementary material includes several important lemmas derived from general random matrix theories.

2 Models and theoretical results

In this paper we consider the following model of data matrix:

Definition 2.1 ($K$-factor model).

Let $K,N,T$ be nonnegative integers. A $K$-factor model is a random matrix $\mathbf{X}\in\mathbb{R}^{N\times T}$ of the form

\mathbf{X}:=\mathbf{M}+\mathbf{L}\mathbf{F}+\mathbf{\Lambda}\mathbf{\Psi},

where

  1. the deterministic matrix $\mathbf{M}=\begin{bmatrix}\bm{\mu}&\dots&\bm{\mu}\end{bmatrix}\in\mathbb{R}^{N\times T}$ is the theoretical mean, with $\bm{\mu}\in\mathbb{R}^{N}$;

  2. the deterministic matrices $\mathbf{L}\in\mathbb{R}^{N\times K}$ and $\mathbf{\Lambda}\in\mathbb{R}^{N\times N}$ are called the factor loading matrix and the noise coefficient matrix, and the rows of $\mathbf{L}$ are called factor loading vectors;

  3. there are two independent sets of i.i.d. random variables $\{f_{it}\}_{1\leq i\leq K,t\geq 1}$ and $\{\psi_{it}\}_{i,t\geq 1}$ with mean zero and variance one, such that

     \mathbf{F}=[f_{it}]_{1\leq i\leq K,1\leq t\leq T},\quad\mathbf{\Psi}=[\psi_{it}]_{1\leq i\leq N,1\leq t\leq T}.

     The two random matrices $\mathbf{F}$ and $\mathbf{\Psi}$ are called the factor matrix and the idiosyncratic noise matrix, respectively.

We often write $\mathbf{X}$ in the compact form

\mathbf{X}=\mathbf{M}+\bm{\Gamma}\mathbf{Z}

with

\bm{\Gamma}=\begin{bmatrix}\mathbf{L}&\mathbf{\Lambda}\end{bmatrix}\quad\text{and}\quad\mathbf{Z}=\begin{bmatrix}\mathbf{F}\\ \mathbf{\Psi}\end{bmatrix}.

Let $\bm{x}_{j}$ be the $j$th column of $\mathbf{X}$. Then $\bm{x}_{1},\dots,\bm{x}_{T}$ are i.i.d. random vectors. We recall that the population covariance and correlation matrices are $[\operatorname{\mathbb{C}ov}(x_{i1},x_{k1})]_{N\times N}$ and $[\operatorname{\mathbb{C}orr}(x_{i1},x_{k1})]_{N\times N}$, respectively, where $x_{ij}$ is the $i$th component of the vector $\bm{x}_{j}$. Note that under Definition 2.1, the population covariance matrix is

\bm{\Sigma}:=\bm{\Gamma}\bm{\Gamma}^{\top}=\mathbf{L}\mathbf{L}^{\top}+\mathbf{\Lambda}\mathbf{\Lambda}^{\top},

and the population correlation matrix is

\mathbf{R}:=\bm{\Delta}^{-\frac{1}{2}}\bm{\Sigma}\bm{\Delta}^{-\frac{1}{2}},

where

\bm{\Delta}=\operatorname{diag}(\operatorname{\mathbb{V}}(x_{11}),\dots,\operatorname{\mathbb{V}}(x_{N1}))

is the diagonal matrix containing the variances of the components of $\bm{x}_{1}$. Note that $\operatorname{\mathbb{V}}(x_{k1})$ is just the squared Euclidean norm of the $k$th row vector $\bm{\gamma}^{k}$ of $\bm{\Gamma}$, and is also the $k$th diagonal element of $\bm{\Sigma}$. So we can write $\bm{\Delta}=\operatorname{diag}(\bm{\Sigma})$, where for a square matrix $\mathbf{A}$, $\operatorname{diag}(\mathbf{A})$ denotes the diagonal matrix with the same diagonal as $\mathbf{A}$.

The population covariance matrix $\bm{\Sigma}$ and correlation matrix $\mathbf{R}$ play important roles in multivariate statistics. However, they are not always available. In order to estimate them, we define the theoretically-centered sample covariance matrix $\tilde{\mathbf{S}}$ as

\tilde{\mathbf{S}}=\frac{1}{T}(\mathbf{X}-\mathbf{M})(\mathbf{X}-\mathbf{M})^{\top}. (1)

Let $\tilde{\mathbf{D}}=\operatorname{diag}(\tilde{\mathbf{S}})$. Then the theoretically-centered sample correlation matrix $\tilde{\mathbf{C}}$ is defined as (see, e.g., El Karoui (2009))

\tilde{\mathbf{C}}=\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{S}}\tilde{\mathbf{D}}^{-\frac{1}{2}}.

As the mean vector $\bm{\mu}$ is not always known either, we sometimes need to replace $\bm{\mu}$ by the sample mean

\bar{\bm{x}}=\frac{1}{T}\sum_{t=1}^{T}\bm{x}_{t},

and use it to define the data-centered sample covariance matrix

\mathbf{S}=\frac{1}{T-1}(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})^{\top},

where $\bar{\mathbf{X}}=\begin{bmatrix}\bar{\bm{x}}&\cdots&\bar{\bm{x}}\end{bmatrix}_{N\times T}$, and the data-centered sample correlation matrix

\mathbf{C}=\mathbf{D}^{-\frac{1}{2}}\mathbf{S}\mathbf{D}^{-\frac{1}{2}},

where $\mathbf{D}=\operatorname{diag}(\mathbf{S})$.
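The data-centered definitions above translate directly into code. The following is a minimal sketch (our own helper, not from the paper) computing $\mathbf{S}$ and $\mathbf{C}$ from an $N\times T$ data matrix whose columns are the observations.

```python
import numpy as np

def sample_cov_corr(X):
    """Data-centered sample covariance S = (X - Xbar)(X - Xbar)^T / (T - 1)
    and sample correlation C = D^{-1/2} S D^{-1/2} with D = diag(S),
    for an N x T data matrix X (columns = observations)."""
    N, T = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)   # subtract the sample mean of each row
    S = Xc @ Xc.T / (T - 1)
    d = np.sqrt(np.diag(S))                  # sample standard deviations of the rows
    C = S / np.outer(d, d)                   # D^{-1/2} S D^{-1/2}
    return S, C
```

By construction the diagonal of $\mathbf{C}$ is identically $1$, and $\mathbf{C}$ is invariant under scaling and shifting of the data, as noted in the introduction.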

In this paper we focus on sample correlation matrices. Note that if one row of $\bm{\Gamma}$ is identically $\bm{0}$, then the corresponding row of $\mathbf{X}$ is deterministically equal to the mean, and the correlation between this row and any other row is not defined. But it is easy to recognize such a row in the data and to eliminate it before further treatment. So we can assume:

  A1. The rows of $\bm{\Gamma}$ are nonzero: $\bm{\gamma}^{i}\neq\bm{0}$ for $i=1,\dots,N$.

We study the limiting locations of the eigenvalues of the sample correlation matrices $\mathbf{C}$ and $\tilde{\mathbf{C}}$ in the proportional limiting regime

N,T\to\infty,\quad\frac{N}{T}\to c>0,

which will be denoted as $N,T\to\infty$ for simplicity. In this regime, Theorem 1 in El Karoui (2009) related the spectral properties (limiting spectral distributions and limiting locations of individual eigenvalues) of $\mathbf{C}$ and $\tilde{\mathbf{C}}$ to those of the sample covariance matrices $\bm{\Delta}^{-\frac{1}{2}}\mathbf{S}\bm{\Delta}^{-\frac{1}{2}}$ and $\bm{\Delta}^{-\frac{1}{2}}\tilde{\mathbf{S}}\bm{\Delta}^{-\frac{1}{2}}$, on the condition that the spectral norm $\left|\!\left|\!\left|\bm{\Delta}^{-\frac{1}{2}}\bm{\Gamma}\right|\!\right|\!\right|$ of $\bm{\Delta}^{-\frac{1}{2}}\bm{\Gamma}$ is uniformly bounded. However, due to the presence of $\mathbf{L}$, this condition is not satisfied by some factor models. For example, consider the ENP studied in (Fan & Jiang, 2019; Akama & Husnaqilati, 2022; Akama, 2023) and defined in Definition 2.2 below. Then $\left|\!\left|\!\left|\bm{\Delta}^{-\frac{1}{2}}\bm{\Gamma}\right|\!\right|\!\right|=\sqrt{(L^{2}N+\sigma^{2})/(L^{2}+\sigma^{2})}$, which diverges to $\infty$ as $N\to\infty$.

Definition 2.2.

An equi-correlated normal population (ENP for short) is a $1$-factor model such that (1) $\mathbf{L}=[L\cdots L]^{\top}\in\mathbb{R}^{N}$ for some $L>0$; (2) $\mathbf{\Lambda}=\sigma\mathbf{I}$ for some $\sigma>0$; and (3) $f_{kt}$ ($1\leq k\leq K=1$) and $\psi_{it}$ ($1\leq i\leq N$, $1\leq t\leq T$) are independent standard normal random variables. For an ENP, we define $\rho=L^{2}/(L^{2}+\sigma^{2})$.

In this paper, writing $s_{i}(\mathbf{A})$ for the $i$th largest singular value of a matrix $\mathbf{A}$, we prove the following theorem in Section A:

Theorem 2.3.

Let $\mathbf{A}$ and $\mathbf{B}$ be complex matrices of order $m\times n$ and $n\times p$, respectively. Then

s_{m}\left(\mathbf{A}\right)s_{i}\left(\mathbf{B}\right)\leq s_{i}\left(\mathbf{A}\mathbf{B}\right)\leq s_{1}\left(\mathbf{A}\right)s_{i}\left(\mathbf{B}\right)\qquad(i\geq 1).

Thanks to this theorem, we generalize Theorem 1 in El Karoui (2009) to general $K$-factor models with possibly unbounded $\bm{\Delta}^{-\frac{1}{2}}\bm{\Gamma}$.

Before stating our first main result, we add some further assumptions.

  A2. Each random variable $\psi_{it}$ has a finite fourth moment:

      \operatorname{\mathbb{E}}\left(|\psi_{it}|^{4}\right)<\infty.

  A3. One of the following holds: (a) there exists $\varepsilon>0$ such that

      \operatorname{\mathbb{E}}\left(|\psi_{it}|^{4}(\log|\psi_{it}|)^{2+2\varepsilon}\right)<\infty,

      or (b) $\mathbf{\Lambda}$ is diagonal.

For two nonnegative sequences $\{a_{n}\}$ and $\{b_{n}\}$, by $a_{n}\sim b_{n}$ we mean that there is a positive sequence $\{\tau_{n}\in(0,1)\}$ such that $\lim_{n\to\infty}\tau_{n}=1$ and, for large enough $n$, $\tau_{n}a_{n}\leq b_{n}\leq\tau_{n}^{-1}a_{n}$. For a family of such sequences $\{a_{n}^{(i)}\}_{i}$ and $\{b_{n}^{(i)}\}_{i}$, we say that $a_{n}^{(i)}\sim b_{n}^{(i)}$ uniformly if there is $\{\tau_{n}\in(0,1)\}$ independent of $i$, with $\lim_{n\to\infty}\tau_{n}=1$, such that $\tau_{n}a_{n}^{(i)}\leq b_{n}^{(i)}\leq\tau_{n}^{-1}a_{n}^{(i)}$ holds for all $i$ and for large enough $n$.

Theorem 2.4.

For a $K$-factor model with $K\geq 0$ fixed, if A1-A3 hold, then as $N,T\to\infty$, $N/T\to c>0$, for any $i=1,2,\dots,N$, we have almost surely

\lambda_{i}(\mathbf{C})\sim\lambda_{i}(\bm{\Delta}^{-\frac{1}{2}}\mathbf{S}\bm{\Delta}^{-\frac{1}{2}}),\quad\lambda_{i}(\tilde{\mathbf{C}})\sim\lambda_{i}(\bm{\Delta}^{-\frac{1}{2}}\tilde{\mathbf{S}}\bm{\Delta}^{-\frac{1}{2}})

uniformly.

It should be noted that no boundedness condition on $\bm{\Delta}^{-1/2}\mathbf{\Lambda}$ is required. Therefore, when $\mathbf{L}=\bm{0}$, the theorem is applicable to a general linear model $\mathbf{X}=\mathbf{M}+\mathbf{\Lambda}\mathbf{\Psi}$, where $\mathbf{\Lambda}$ can be unbounded. Furthermore, when $\mathbf{L}\neq\bm{0}$, we accommodate different distributions for the factor and noise components; for example, the factors are allowed to have a heavy-tailed distribution, along with light-tailed noise.

Even with this general result in hand, it remains complex to determine the asymptotic locations of the eigenvalues of a sample covariance matrix in general. We content ourselves with some particular cases that extend an ENP.

Model example 1: Constant length factor loading model (CLFM).

Definition 2.5 (Constant length factor loading model).

By a constant length factor loading model (CLFM for short), we mean a $K$-factor model with $\mathbf{\Lambda}=\sigma\mathbf{I}$ for some $\sigma>0$, whose factor loading vectors have the same length; i.e., there is a constant $L\geq 0$ independent of $N$ and $T$ such that the $i$th row vector $\bm{\ell}^{i}$ ($1\leq i\leq N$) of $\mathbf{L}$ has length $\left\|\bm{\ell}^{i}\right\|=L$.

Note that a CLFM automatically satisfies A1 and A3(b). Moreover, the largest eigenvalue of $\mathbf{L}^{\top}\mathbf{L}$ has asymptotic tight order $N$:

\frac{L^{2}N}{K}\leq\lambda_{1}(\mathbf{L}^{\top}\mathbf{L})\leq L^{2}N. (2)

Indeed, letting $\bm{\ell}_{k}$ be the $k$th column of $\mathbf{L}$ for $1\leq k\leq K$, we have

\max_{1\leq k\leq K}\left\|\bm{\ell}_{k}\right\|^{2}\leq\lambda_{1}(\mathbf{L}^{\top}\mathbf{L})\leq\operatorname{trace}(\mathbf{L}^{\top}\mathbf{L})=L^{2}N\leq K\max_{1\leq k\leq K}\left\|\bm{\ell}_{k}\right\|^{2},

where the first inequality is due to the Courant-Fischer min-max theorem (Horn & Johnson, 2013, Theorem 4.2.6), and the equality between the trace and $L^{2}N$ follows from the fact that each row of $\mathbf{L}$ has length $L$.
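The two-sided bound (2) can be checked numerically. In this sketch (our own illustration), the loading matrix is hypothetical: its rows are random directions rescaled to the common length $L$.

```python
import numpy as np

rng = np.random.default_rng(7)
N, K, L = 200, 3, 2.0
G = rng.standard_normal((N, K))
Lmat = L * G / np.linalg.norm(G, axis=1, keepdims=True)  # every row has length L
lam1 = np.linalg.eigvalsh(Lmat.T @ Lmat).max()
# Bound (2): L^2 N / K <= lambda_1(L^T L) <= L^2 N.
assert L**2 * N / K <= lam1 <= L**2 * N + 1e-9
```

The lower bound holds because the largest of the $K$ eigenvalues is at least their average $\operatorname{trace}(\mathbf{L}^{\top}\mathbf{L})/K=L^{2}N/K$; the upper bound is the trace itself.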

Furthermore, we will consider the following assumption:

  A4. The rank of $\mathbf{L}$ is $r\geq 0$, and the $r$ nonzero eigenvalues of $\mathbf{L}^{\top}\mathbf{L}$ (if any) tend to infinity: for $k=1,\dots,r$,

      \lim_{N\to\infty}\lambda_{k}(\mathbf{L}^{\top}\mathbf{L})=\infty.
Theorem 2.6.

If a CLFM satisfies A2 and A4, then for $k=1,\ldots,r$,

\lim_{N,T\to\infty}\frac{\lambda_{k}\left(\mathbf{C}\right)}{\lambda_{k}\left(\mathbf{L}\mathbf{L}^{\top}\right)}=\lim_{N,T\to\infty}\frac{\lambda_{k}(\tilde{\mathbf{C}})}{\lambda_{k}\left(\mathbf{L}\mathbf{L}^{\top}\right)}=\frac{1}{L^{2}+\sigma^{2}}\qquad(a.s.), (3)

and

\lim_{N,T\to\infty}\lambda_{r+1}\left(\mathbf{C}\right)=\lim_{N,T\to\infty}\lambda_{r+1}(\tilde{\mathbf{C}})=\frac{\sigma^{2}(1+\sqrt{c})^{2}}{L^{2}+\sigma^{2}}\qquad(a.s.). (4)

If a given CLFM is an ENP, then Definition 2.2 implies $K=1$, which gives $\lambda_{1}\left(\mathbf{L}\mathbf{L}^{\top}\right)=L^{2}N$ by (2). Thus, by Theorem 2.6 (3), both $\lim_{N,T\to\infty}\lambda_{1}\left(\mathbf{C}\right)/N$ and $\lim_{N,T\to\infty}\lambda_{1}(\tilde{\mathbf{C}})/N$ are almost surely $\rho$, which is proved in Akama (2023). Theorem 2.6 (4) implies that $\lim_{N,T\to\infty}\lambda_{r+1}\left(\mathbf{C}\right)=\lim_{N,T\to\infty}\lambda_{r+1}(\tilde{\mathbf{C}})=(1-\rho)(1+\sqrt{c})^{2}$ almost surely.
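The ENP case lends itself to a small simulation (our own sketch, with modest $N,T$ and a fixed seed, so the finite-size values only approximate the limits): with $L=\sigma=1$, so $\rho=1/2$ and $c=1/2$, the ratio $\lambda_{1}(\mathbf{C})/N$ should be close to $\rho$, and $\lambda_{2}(\mathbf{C})$ close to $(1-\rho)(1+\sqrt{c})^{2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 400, 800                      # c = N/T = 0.5
L, sigma = 1.0, 1.0                  # rho = L^2 / (L^2 + sigma^2) = 0.5
f = rng.standard_normal(T)           # single factor time series (K = 1)
Psi = rng.standard_normal((N, T))    # idiosyncratic noise
X = L * np.outer(np.ones(N), f) + sigma * Psi

Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T / (T - 1)
d = np.sqrt(np.diag(S))
C = S / np.outer(d, d)               # sample correlation matrix

lam = np.sort(np.linalg.eigvalsh(C))[::-1]
rho, c = 0.5, N / T
print(lam[0] / N)                    # should be near rho = 0.5
print(lam[1])                        # should be near (1 - rho)(1 + sqrt(c))^2
```

The spiked eigenvalue grows linearly in $N$ while the rest of the spectrum stays at the scaled Marčenko-Pastur edge, which is exactly the separation that the broken-stick rule exploits.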

As a direct application of Theorem 2.6, we establish that the limits of the broken-stick rule $\operatorname{BS}(\mathbf{C})$ and $\operatorname{BS}(\tilde{\mathbf{C}})$ equal $r$.

Theorem 2.7.

If a CLFM satisfies A2, and if the smallest nonzero eigenvalue of $\mathbf{L}^{\top}\mathbf{L}$ (if there is one) is eventually larger than $(L^{2}+\sigma^{2})\log N$, i.e.,

\varliminf_{N\to\infty}\frac{\lambda_{r}(\mathbf{L}^{\top}\mathbf{L})}{\log N}>L^{2}+\sigma^{2}, (5)

then

\lim_{N,T\to\infty}\operatorname{BS}(\mathbf{C})=\lim_{N,T\to\infty}\operatorname{BS}(\tilde{\mathbf{C}})=r\qquad(a.s.).

Here we give a sufficient condition ensuring A4 or (5) above. If the rank of $\mathbf{L}$ is $r\leq K$, then by rearranging the columns, $\mathbf{L}$ can be written as

\mathbf{L}=\begin{bmatrix}\mathbf{L}_{1}&\mathbf{L}_{1}\mathbf{A}\end{bmatrix},

where $\mathbf{L}_{1}$ is an $N\times r$ matrix and $\mathbf{A}$ an $r\times(K-r)$ matrix. Then by the eigenvalue interlacing theorem (Horn & Johnson, 2013, Theorem 4.3.28),

\lambda_{r}(\mathbf{L}\mathbf{L}^{\top})\geq\lambda_{r}(\mathbf{L}_{1}\mathbf{L}_{1}^{\top}).

Let $\mathbf{B}=\operatorname{diag}(\left\|\bm{\ell}_{1}\right\|,\ldots,\left\|\bm{\ell}_{r}\right\|)$ and let

\cos(\bm{\ell}_{i},\bm{\ell}_{j})=\frac{\langle\bm{\ell}_{i},\bm{\ell}_{j}\rangle}{\left\|\bm{\ell}_{i}\right\|\left\|\bm{\ell}_{j}\right\|}

be the cosine of the angle between the two vectors $\bm{\ell}_{i},\bm{\ell}_{j}$. By virtue of $\mathbf{L}_{1}^{\top}\mathbf{L}_{1}=\mathbf{B}\left[\cos(\bm{\ell}_{i},\bm{\ell}_{j})\right]_{1\leq i,j\leq r}\mathbf{B}$, Theorem 2.3 implies

\lambda_{r}(\mathbf{L}_{1}\mathbf{L}_{1}^{\top})\geq\lambda_{r}\left(\left[\cos(\bm{\ell}_{i},\bm{\ell}_{j})\right]_{1\leq i,j\leq r}\right)\min_{1\leq j\leq r}\left\|\bm{\ell}_{j}\right\|^{2}.

By (Varah, 1975, Theorem 1),

\lambda_{r}\left(\left[\cos(\bm{\ell}_{i},\bm{\ell}_{j})\right]_{1\leq i,j\leq r}\right)\geq\min_{1\leq k\leq r}\left(1-\sum_{j\neq k}|\cos(\bm{\ell}_{j},\bm{\ell}_{k})|\right).

Therefore, for any positive sequence $\{b_{N}>0\}$,

\varliminf_{N\to\infty}\frac{\min_{1\leq k\leq r}\left(1-\sum_{j\neq k}|\cos(\bm{\ell}_{j},\bm{\ell}_{k})|\right)\min_{1\leq k\leq r}\left\|\bm{\ell}_{k}\right\|^{2}}{b_{N}}>1 (6)

is a sufficient condition for

\varliminf_{N\to\infty}\frac{\lambda_{r}(\mathbf{L}\mathbf{L}^{\top})}{b_{N}}>1.

This condition is noteworthy because the magnitude of $\left\|\bm{\ell}_{j}\right\|$ can be seen as the overall influence of the $j$th factor $\{f_{jt}\}_{t}$ on the dataset, whereas the normalized vector $\bm{\ell}_{j}/\left\|\bm{\ell}_{j}\right\|\in\mathbb{R}^{N}$ indicates the effects of the $j$th factor $\{f_{jt}\}_{t}$ across the $N$ covariates. By Condition (6), if the $r$ factors exert equally substantial influences on the dataset and their impact distributions are sufficiently close to orthogonal, then $\lambda_{r}(\mathbf{L}\mathbf{L}^{\top})$ tends to be large.
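The lower bound via (Varah, 1975) is a diagonal-dominance (Gershgorin-type) estimate, and can be sanity-checked numerically. In this sketch (our own illustration), the unit loading directions are hypothetical random vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 3
U = rng.standard_normal((r, r))
U /= np.linalg.norm(U, axis=0)     # unit columns: normalized loading directions
G = U.T @ U                         # r x r matrix of cosines, unit diagonal
# min_k (1 - sum_{j != k} |cos(l_j, l_k)|), the right-hand side of Varah's bound
row_bound = (1.0 - (np.abs(G).sum(axis=1) - 1.0)).min()
lam_min = np.linalg.eigvalsh(G).min()
assert lam_min >= row_bound - 1e-12  # diagonal-dominance lower bound
```

When the directions are close to orthogonal, the off-diagonal cosines are small, `row_bound` is close to 1, and the smallest eigenvalue of the cosine matrix stays bounded away from zero, which is exactly how Condition (6) forces $\lambda_{r}(\mathbf{L}\mathbf{L}^{\top})$ to be large.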

Model example 2: Asymptotic convergent factor model (ACFM).

We consider a $K$-factor model satisfying the following:

  A5. There is a sequence of $K$-dimensional row vectors $(\bm{\ell}^{k})_{k\geq 1}$ and a sequence of strictly positive numbers $(\sigma_{k})$ such that

      \mathbf{L}=\begin{bmatrix}(\bm{\ell}^{1})^{\top}&\dots&(\bm{\ell}^{N})^{\top}\end{bmatrix}^{\top},\quad\mathbf{\Lambda}=\operatorname{diag}(\sigma_{1},\dots,\sigma_{N}),

      and there are $\bm{\ell}\in\mathbb{R}^{K}$ and $\sigma>0$ such that

      \lim_{k\to\infty}\left\|\bm{\ell}^{k}-\bm{\ell}\right\|=0,\quad\lim_{k\to\infty}\sigma_{k}=\sigma.

This model was considered in Akama (2023), where the limiting spectral distribution of its sample correlation matrices was derived. The limits of the broken-stick rule $\operatorname{BS}(\mathbf{C})$ and $\operatorname{BS}(\tilde{\mathbf{C}})$ for this model are determined here.

Theorem 2.8.

If a $K$-factor model satisfies A1, A2 and A5, and if in addition

\sum_{k=1}^{N}\left\|\bm{\ell}^{k}-\bm{\ell}\right\|^{2}=o(\log N),

then

\lim_{N,T\to\infty}\operatorname{BS}(\mathbf{C})=\lim_{N,T\to\infty}\operatorname{BS}(\tilde{\mathbf{C}})=\mathrm{I}\left(\bm{\ell}\neq\bm{0}\right)\qquad(a.s.),

where $\mathrm{I}(\cdot)$ is the indicator function.

An equi-correlated normal population (ENP) is both a CLFM and an ACFM, but is not a model of Fan et al. (2022): the population covariance matrix of an ENP is $\Sigma=[\rho]+\operatorname{diag}(1-\rho)$ for some constant $\rho$, so an ENP does not satisfy condition C3 of Section 2 "High-Dimensional Factor Model" of (Fan et al., 2022). Conversely, by the definitions of CLFM and ACFM, some models of Fan et al. (2022) are neither CLFMs nor ACFMs.

The proofs of Theorems 2.3, 2.4, 2.6, 2.7 and 2.8 are provided in Section A.

3 Broken-stick rule for real datasets

We will check whether the following real datasets are generated by a CLFM or an ACFM, based on Theorems 2.6 and 2.8:

  1. the datasets Fan et al. (2022) obtained by cleaning outliers from the datasets of the daily excess returns (Jensen, 1968) of the Fama-French 100 portfolios (Fama & French, 1993, 2015; see Prof. French's data library http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/);

  2. a binary multiple sequence alignment (Quadeer et al., 2014), by the courtesy of Prof. Quadeer.

For the correlation matrices $\mathbf{C}$ of these datasets, we will compute the following quantities:

  1. the sample size $T$,

  2. $N/T$,

  3. $\lambda_{1}\left(\mathbf{C}\right)/N$ (cf. Theorem 2.6 (3)),

  4. the dimension $N$,

  5. the broken-stick rule $\operatorname{BS}(\mathbf{C})$,

  6. the adjusted correlation thresholding (Fan et al., 2022) $\operatorname{ACT}(\mathbf{C})$,

  7. Bai-Ng's rule based on an information criterion (Bai & Ng, 2002).

In a stock return dataset, $N$ (resp. $T$) denotes the number of companies (resp. the number of trading days minus 1). For $i,t$ ($1\leq i\leq N$, $1\leq t\leq T$), the $i$th row of the data matrix $\mathbf{X}$ corresponds to the $i$th company, and the $t$th column to the $t$th trading day. The factors are those of Fama-French, for instance.

3.1 Stock return time-series

Engle & Kelly (2012) proposed their Dynamic Equicorrelation to forecast economic time series. In asset pricing and portfolio management, Fama and French designed statistical models, namely the 3-factor model (Fama & French, 1993) and the 5-factor model (Fama & French, 2015), to describe stock returns.

Fan et al. (2022) computed their estimator ACT and confirmed the three Fama-French factors (Fama & French, 1993) for the following datasets of the daily excess returns of the 100 portfolios French chose.

  1. Dot-com bubble & recession (1998-01-02/2007-12-31). In this case $N=100$ and $T=2514$. ("Before the 2007-2008 financial crisis")

  2. After the Lehman shock (2010-01-04/2019-04-30). In this case $N=100$ and $T=2346$. ("After the 2007-2008 financial crisis")

Fama-French Portfolios | $T$ | $N/T$ | $\lambda_{1}(\mathbf{C})/N$ | $N$ | $\operatorname{BS}$ | $\operatorname{ACT}$ | Bai-Ng
1998/2007 (Dot-com bubble & recession) | 2514 | .0398 | .658 | 100 | 2 | 4 | 4
2010/2019 (After Lehman shock) | 2346 | .0426 | .806 | 100 | 1 | 3 | 5
Table 1: The stock return datasets of the two Fama-French portfolios (1998-01-02/2007-12-31, 2010-01-04/2019-04-30).
Figure 1: For 1998-01-02/2007-12-31 (left) and 2010-01-04/2019-04-30 (right), Fisher's z-transform of the equicorrelation coefficient is computed by GJR GARCH with Engle-Kelly's dynamic equicorrelation. The cyclical components (lower panels) are computed by the Hodrick-Prescott filter, and the trends are overlaid in the upper panels.

As we see in Figure 1, the equicorrelation coefficient is low during the dot-com bubble, owing to speculation in seemingly uncorrelated emergent stocks.

We have two large groups of less speculative companies:

  • the group of non-dot-com companies, during the dot-com bubble;

  • all companies, during the dot-com recession.

Figure 1 is produced by the combination of a few techniques.

  1.

    GJR GARCH with the correlation model being the Dynamic Equicorrelation (Engle & Kelly, 2012).

    This computes the time-series of the equicorrelation coefficient ϱ. Our program uses the dccmidas package (Candila, 2021) in R.

  2.

    Fisher’s z-transform 12log1+ϱ1ϱ\frac{1}{2}\log\frac{1+\varrho}{1-\varrho}.

    This bijection between (−1, 1) and (−∞, ∞) makes the movement of ϱ clearer. By the z(ECC) time-series, we mean the time-series of the z-transform of ϱ. Fisher’s z-transform is often asymptotically normally distributed. In Figure 1, the z(ECC) time-series are on the upper panels.
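    As a concrete sketch, the transform and its inverse take only a few lines (our own Python illustration; this is not the code behind Figure 1):

```python
import math

def fisher_z(rho):
    """Fisher's z-transform: a bijection from (-1, 1) onto the real line."""
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho))  # equals math.atanh(rho)

def inverse_fisher_z(z):
    """Inverse transform: maps the real line back onto (-1, 1)."""
    return math.tanh(z)
```

    Near ϱ = ±1 the transform stretches small movements of ϱ, which is why it makes the movement of ϱ clearer.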

  3.

    Hodrick-Prescott filter (Hodrick & Prescott, 1997) to compute the trend and the cyclical component of the z(ECC) time-series.

    From a given time-series 𝒚=[yt]1tTT\bm{y}=[y_{t}]_{{1\leq t\leq T}}\in{\mathbb{R}}^{T}, Hodrick-Prescott filter of parameter λ>0\lambda>0 extracts

    𝒙~=[x~t]1tT=argmin𝒙T(𝒚𝒙2+λ2𝒙2).\tilde{\bm{x}}=[\tilde{x}_{t}]_{{1\leq t\leq T}}=\operatorname*{arg\,min}_{\bm{x}\in{\mathbb{R}}^{T}}\left(\left\|\bm{y}-\bm{x}\right\|^{2}+\lambda\left\|\partial^{2}\bm{x}\right\|^{2}\right).

    Here ∂²𝒙 is the second difference [x_t − 2x_{t+1} + x_{t+2}]_{1≤t≤T−2} ∈ ℝ^{T−2} of 𝒙 = [x_t]_{1≤t≤T} ∈ ℝ^T. The x̃_t’s are weighted averages of y_1, …, y_T and satisfy ∑_{t=1}^T (y_t − x̃_t) = 0. As λ → ∞, 𝒙̃ tends to the least-squares straight-line fit of 𝒚, while 𝒙̃ = 𝒚 for λ = 0. 𝒙̃ is called a trend of the time-series 𝒚, and 𝒚 − 𝒙̃ a cyclical component of 𝒚.

    In Figure 1, the cyclical components are on the lower panels, and the trends are overlaid on the upper panels.

For the time-series of the equicorrelation coefficient based on Engle & Kelly (2012), Wang et al. (2020) applied a linear regression analysis and regarded the residual as the industry equi-correlation (IEC) of returns. They claimed that “We can see that the index displays prominent countercyclical behavior in that it always rises during the recession periods and declines during the expansion periods. This is expected because economic recessions lead to greater comovement among stock returns”. Fisher’s z-transformation emphasizes the movement of the time-series of the equicorrelation coefficient (ECC), and the Hodrick-Prescott filter refines the linear regression analysis of the ECC time-series.

As we see in Figure 1, the trend of the z(ECC) time-series during the Dot-com bubble & recession period (1998-01-02/2007-12-31) varies more than that of the z(ECC) time-series after the Lehman shock. One may suppose that investors speculated heavily in seemingly uncorrelated stocks during the Dot-com bubble.

3.2 Binary multiple sequence alignment

We consider the binary multiple sequence alignment (MSA for short) of an N-residue (site) protein with T sequences, where N=475 and T=2815. The dataset is courtesy of Quadeer. Quadeer et al. did “identify groups of coevolving residues within HCV nonstructural protein 3 (NS3) by analyzing diverse sequences of this protein using ideas from random matrix theory and associated methods” (Quadeer et al., 2014, p. 7628), and also found “Sequence analysis reveals three sectors of collectively evolving sites in NS3. …, there remained α=9 eigenvalues greater than λ^rnd_max, presumably representing intrinsic correlations” (Quadeer et al., 2014, p. 7631). They detected signals by a randomization of the data.

On the statistical model of Quadeer et al. (2014), Morales-Jimenez, one of the authors of Quadeer et al. (2018), commented “the majority of variables (protein positions in the genome) are essentially independent, and there are just some small groups of variables which are correlated, giving rise to the different spikes. These group of variables can be modeled with equi-correlation, but the size of these groups is modeled as fixed, i.e., not growing with the dimension of the protein. That leads to a non-divergent spiked model, like the one considered in our Stat Sinica paper” Morales-Jimenez et al. (2021).

Nonetheless, for the dataset of the binary MSA (Quadeer et al., 2014), the broken-stick rule and the adjusted correlation thresholding work well.

Dataset | T | N/T | λ1(𝐂)/N | N | BS | ACT | Bai-Ng
MSA | 2815 | 0.1687 | 0.0216 | 475 | 3 | 10 | 4
Table 2: The binary MSA dataset.
  • The number 3 of the sectors is detected by the broken-stick rule BS(𝐂)=3\operatorname{BS}(\mathbf{C})=3.

  • The number α=9 of eigenvalues greater than λ^rnd_max is close to ACT(𝐂)=10. Here ACT is studied under the proportional limiting regime N,T→∞, N/T→c>0.

To end this section about the broken-stick rule for the stock return datasets and the binary MSA dataset, we observe: Fama-French portfolio (Dot-com bubble & recession (1998-01-02/2007-12-31)), and the binary MSA dataset fit a CLFM with r=2r=2 but not an ACFM. The Fama-French portfolio (after Lehman shock (2010-01-04/2019-04-30)) fits both ACFM and a CLFM with r=1r=1.

In conclusion, the pair of a CLFM and the BS rule may be more useful than the pair of a model and ACT (Fan et al., 2022) for estimating the number of factors of the binary MSA dataset; the number of factors estimated by our approach is the same as that estimated by Quadeer et al. (2014). On the other hand, ACT may be useful for Fama-French portfolios, as it can recover the Fama-French factors.

In theory, to distinguish a CLFM with r=1 from an ACFM, one may think of 𝐃 = diag(D_11, …, D_NN). The diagonal entries D_ii are asymptotically equal to the corresponding entries of 𝚫 (their ratios converge almost surely to 1 by Lemma A.1), and the diagonal entries of 𝚫 are just the Euclidean norms of the rows of 𝚲.

Once we find a theoretically appropriate scaling and shifting of the D_ii’s that uncovers their concentration, we could decide whether a given dataset fits a CLFM, an ACFM, or the generalization.

4 Conclusion

We suppose the proportional limiting regime for factor models. We established a general theorem for the asymptotic equispectrality of 𝐂 and the naturally normalized sample covariance matrix for factor models. From it, we derived the following assertions: for the two introduced models (CLFM and ACFM), the limiting largest bounded eigenvalue and the limiting spectral distribution of the sample correlation matrix 𝐂 are scalings of those of 𝐂 in the i.i.d. case, with scaling constant 1−ρ. Here ρ is “a reduced correlation coefficient”. The largest eigenvalue of 𝐂 divided by the order N of 𝐂 converges almost surely to ρ.

Notably, for an ACFM, the BS rule computes not the rank of the factor loading matrix 𝐋 but the essential rank (=1) (Theorem 2.8). In other words, the deterministic decreasing vector 𝒚 = (∑_{j=k}^K (jK)^{−1})_{k=1}^K of the BS rule examines an intrinsic structure of 𝐋 for an ACFM. Moreover, 𝒚 ∈ ℝ^K is the descending list of the limits, as N→∞, of the eigenvalues of 𝐋𝐋^⊤ ∈ ℝ^{N×N} for a random factor loading matrix 𝐋 = [B_ik √(y_ik)]_{1≤i≤N, 1≤k≤K} where, for each i (1≤i≤N),

  • 𝒃i=(Bik)k=1K\bm{b}_{i}=(B_{ik})_{k=1}^{K} is a random vector of independent Rademacher variables,

  • 𝒚_i = (y_ik)_{k=1}^K is such that y_ik is the length of the kth longest subinterval of [0, 1] obtained from K−1 independent uniformly distributed separators;

and 𝒃1,,𝒃N,𝒚1,,𝒚N\bm{b}_{1},\ldots,\bm{b}_{N},\ \bm{y}_{1},\ldots,\bm{y}_{N} are independent. In this case, 𝐋\mathbf{L} does not satisfy the condition of an ACFM. A limit theory for the order statistics of the eigenvalues of a sample covariance/correlation matrix is awaited to analyze the BS rule.
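The deterministic vector 𝒚 and the stick-breaking in the second bullet can be sketched as follows (our own illustrative code; the function names are ours). By Holst (1980), y_k is exactly the expected length of the kth longest piece.

```python
import numpy as np

def broken_stick_benchmark(K):
    """Deterministic descending vector y_k = sum_{j=k}^K 1/(jK), k = 1..K."""
    return np.array([sum(1.0 / (j * K) for j in range(k, K + 1))
                     for k in range(1, K + 1)])

def random_stick_pieces(K, rng):
    """Lengths of the K subintervals of [0, 1] cut by K-1 uniform separators,
    sorted in descending order (the y_{i1} >= ... >= y_{iK} of the text)."""
    cuts = np.sort(rng.uniform(size=K - 1))
    pieces = np.diff(np.concatenate(([0.0], cuts, [1.0])))
    return np.sort(pieces)[::-1]
```

Both vectors are decreasing and their entries sum to 1.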

References

  • Aït-Sahalia & Xiu (2017) Aït-Sahalia, Y. & Xiu, D. (2017), ‘Using principal component analysis to estimate a high dimensional factor model with high-frequency data’, J. Econom. 201(2), 384–399.
  • Akama (2023) Akama, Y. (2023), ‘Correlation matrix of equi-correlated normal population: fluctuation of the largest eigenvalue, scaling of the bulk eigenvalues, and stock market’, Int. J. Theor. Appl. Finance 26, 2350006.
  • Akama & Husnaqilati (2022) Akama, Y. & Husnaqilati, A. (2022), ‘A dichotomous behavior of Guttman-Kaiser criterion from equi-correlated normal population’, J. Indones. Math. Soc. 28(3), 272–303.
  • Anderson (2003) Anderson, T. W. (2003), An introduction to multivariate statistical analysis, 3rd edn, Wiley.
  • Bai & Ng (2002) Bai, J. & Ng, S. (2002), ‘Determining the number of factors in approximate factor models’, Econometrica 70(1), 191–221.
  • Bai & Silverstein (2010) Bai, Z. D. & Silverstein, J. W. (2010), Spectral analysis of large dimensional random matrices, 2nd edn, Springer.
  • Bai & Yin (1993) Bai, Z. D. & Yin, Y. Q. (1993), ‘Limit of the smallest eigenvalue of a large dimensional sample covariance matrix’, Ann. Probab. 21(3), 1275–1294.
  • Bai & Zhou (2008) Bai, Z. & Zhou, W. (2008), ‘Large sample covariance matrices without independence structures in columns’, Stat. Sin. 18(2), 425–442.
    https://www.jstor.org/stable/24308489
  • Baik & Silverstein (2006) Baik, J. & Silverstein, J. W. (2006), ‘Eigenvalues of large sample covariance matrices of spiked population models’, J. Multivar. Anal. 97(6), 1382–1408.
  • Cai et al. (2020) Cai, T., Han, X. & Pan, G. (2020), ‘Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices’, Ann. Stat. 48(3), 1255–1280.
  • Candila (2021) Candila, V. (2021), Package ‘dccmidas’ (DCC Models with GARCH-MIDAS Specifications in the Univariate Step). R package version 0.1.0.
    https://CRAN.R-project.org/package=dccmidas
  • Chamberlain & Rothschild (1983) Chamberlain, G. & Rothschild, M. (1983), ‘Arbitrage, factor structure, and mean-variance analysis on large asset markets’, Econometrica 51(5), 1281–1304.
    http://www.jstor.org/stable/1912275
  • El Karoui (2009) El Karoui, N. (2009), ‘Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond’, Ann. Appl. Probab. 19(6), 2362–2405.
    http://www.jstor.org/stable/25662544
  • Engle & Kelly (2012) Engle, R. & Kelly, B. (2012), ‘Dynamic equicorrelation’, J. Bus. Econ. Stat. 30(2), 212–228.
  • Fama & French (1993) Fama, E. F. & French, K. R. (1993), ‘Common risk factors in the returns on stocks and bonds’, J. Financ. Econ. 33(1), 3–56.
  • Fama & French (2015) Fama, E. F. & French, K. R. (2015), ‘A five-factor asset pricing model’, J. Financ. Econ. 116(1), 1–22.
    https://www.sciencedirect.com/science/article/pii/S0304405X14002323
  • Fan et al. (2022) Fan, J., Guo, J. & Zheng, S. (2022), ‘Estimating number of factors by adjusted eigenvalues thresholding’, J. Am. Stat. Assoc. 117(538), 852–861.
    https://doi.org/10.1080/01621459.2020.1825448
  • Fan & Jiang (2019) Fan, J. & Jiang, T. (2019), ‘Largest entries of sample correlation matrices from equi-correlated normal populations’, Ann. Probab. 47(5), 3321–3374.
  • Frontier (1976) Frontier, S. (1976), ‘Étude de la décroissance des valeurs propres dans une analyse en composantes principales: Comparaison avec le modèle du bâton brisé’, J. Exp. Mar. Biol. Ecol. 25, 67–75.
  • Fujikoshi et al. (2011) Fujikoshi, Y., Ulyanov, V. V. & Shimizu, R. (2011), Multivariate statistics: High-dimensional and large-sample approximations, John Wiley & Sons.
  • Hodrick & Prescott (1997) Hodrick, R. J. & Prescott, E. C. (1997), ‘Postwar U.S. business cycles: An empirical investigation’, J. Money Credit Bank. 29(1), 1–16.
    http://www.jstor.org/stable/2953682
  • Holst (1980) Holst, L. (1980), ‘On the lengths of the pieces of a stick broken at random’, J. Appl. Prob. 17, 623–634.
  • Horn & Johnson (2013) Horn, R. A. & Johnson, C. R. (2013), Matrix analysis, 2nd edn, Cambridge University Press.
  • Jackson (1993) Jackson, D. A. (1993), ‘Stopping rules in principal components analysis: A comparison of heuristical and statistical approaches’, Ecology 74(8), 2204–2214.
    http://www.jstor.org/stable/1939574
  • Jensen (1968) Jensen, M. C. (1968), ‘The performance of mutual funds in the period 1945–1964’, J. Finance 23(2), 389–416.
    https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1968.tb00815.x
  • Jiang (2004) Jiang, T. (2004), ‘The limiting distributions of eigenvalues of sample correlation matrices’, Sankhyā: The Indian Journal of Statistics (2003-2007) 66(1), 35–48.
    http://www.jstor.org/stable/25053330
  • Johnson & Wichern (2007) Johnson, R. A. & Wichern, D. W. (2007), Applied Multivariate Statistical Analysis, 6th edn, Pearson Prentice Hall.
  • Johnstone (2001) Johnstone, I. M. (2001), ‘On the distribution of the largest eigenvalue in principal components analysis’, Ann. Stat. 29(2), 295–327.
  • Jolliffe (2002) Jolliffe, I. T. (2002), Principal Component Analysis, 2nd edn, Springer.
    https://link.springer.com/book/10.1007/b98835
  • Jolliffe & Cadima (2016) Jolliffe, I. T. & Cadima, J. (2016), ‘Principal component analysis: a review and recent developments’, Philos. Trans. Royal Soc. A 374, 20150202.
    http://doi.org/10.1098/rsta.2015.0202
  • Koltchinskii & Lounici (2017) Koltchinskii, V. & Lounici, K. (2017), ‘Concentration inequalities and moment bounds for sample covariance operators’, Bernoulli 23(1), 110–133.
    https://doi.org/10.3150/15-BEJ730
  • Kuiken et al. (2004) Kuiken, C., Yusim, K., Boykin, L. & Richardson, R. (2004), ‘The Los Alamos hepatitis C sequence database’, Bioinformatics 21(3), 379–384.
    https://doi.org/10.1093/bioinformatics/bth485
  • Lam & Yao (2012) Lam, C. & Yao, Q. (2012), ‘Factor modeling for high-dimensional time series: Inference for the number of factors’, Ann. Stat. 40(2), 694–726.
    https://doi.org/10.1214/12-AOS970
  • Ledoux (2001) Ledoux, M. (2001), The concentration of measure phenomenon, Vol. 89 of Mathematical Surveys and Monographs, American Mathematical Society.
    https://doi.org/10.1090/surv/089
  • Mallows (1991) Mallows, C. (1991), ‘Another comment on O’Cinneide’, Am. Stat. 45(3), 257.
  • Merlevède et al. (2019) Merlevède, F., Najim, J. & Tian, P. (2019), ‘Unbounded largest eigenvalue of large sample covariance matrices: Asymptotics, fluctuations and applications’, Linear Algebra Its Appl. 577, 317–359.
  • Morales-Jimenez et al. (2021) Morales-Jimenez, D., Johnstone, I. M., McKay, M. R. & Yang, J. (2021), ‘Asymptotics of eigenstructure of sample correlation matrices for high-dimensional spiked models’, Stat. Sin. 31(2), 571.
  • Muirhead (2009) Muirhead, R. J. (2009), Aspects of multivariate statistical theory, John Wiley & Sons.
  • Paul & Aue (2014) Paul, D. & Aue, A. (2014), ‘Random matrix theory in statistics: A review’, J. Stat. Plan. Inference 150, 1–29.
    https://www.sciencedirect.com/science/article/pii/S0378375813002280
  • Quadeer et al. (2014) Quadeer, A. A., Louie, R. H., Shekhar, K., Chakraborty, A. K., Hsing, I. & McKay, M. R. (2014), ‘Statistical linkage analysis of substitutions in patient-derived sequences of genotype 1a hepatitis C virus nonstructural protein 3 exposes targets for immunogen design’, J. Virol. 88(13), 7628–7644.
  • Quadeer et al. (2018) Quadeer, A. A., Morales-Jimenez, D. & McKay, M. R. (2018), ‘Co-evolution networks of HIV/HCV are modular with direct association to structure and function’, PLoS Computational Biology 14, 1–29.
  • Tomkins (1975) Tomkins, R. J. (1975), ‘On Conditional Medians’, Ann. Probab. 3(2), 375–379.
    https://doi.org/10.1214/aop/1176996411
  • Varah (1975) Varah, J. M. (1975), ‘A lower bound for the smallest singular value of a matrix’, Linear Algebra Its Appl. 11(1), 3–5.
  • Wang & Fan (2017) Wang, W. & Fan, J. (2017), ‘Asymptotics of empirical eigenstructure for high dimensional spiked covariance’, Ann. Stat. 45(3), 1342–1374.
    https://doi.org/10.1214/16-AOS1487
  • Wang et al. (2020) Wang, Y., Pan, Z., Wu, C. & Wu, W. (2020), ‘Industry equi-correlation: A powerful predictor of stock returns’, J. Empir. Finance 59, 1–24.
    https://www.sciencedirect.com/science/article/pii/S092753982030044X
  • Yao et al. (2015) Yao, J., Zheng, S. & Bai, Z. D. (2015), Sample covariance matrices and high-dimensional data analysis, Cambridge University Press.
    https://doi.org/10.1017/CBO9781107588080
  • Yata & Aoshima (2013) Yata, K. & Aoshima, M. (2013), ‘PCA consistency for the power spiked model in high-dimensional settings’, J. Multivar. Anal. 122, 334–354.
  • Yin et al. (1988) Yin, Y. Q., Bai, Z. D. & Krishnaiah, P. R. (1988), ‘On the limit of the largest eigenvalue of the large dimensional sample covariance matrix’, Probab. Theory Relat. Fields 78, 509–521.

Supplementary material for the manuscript “Asymptotic locations of bounded and unbounded eigenvalues of sample correlation matrices of certain factor models - application to a components retention rule”


This supplementary article proves Theorems 2.3, 2.4, 2.6, 2.7, and 2.8 of the main manuscript and collects some useful lemmas. The literature cited in this supplementary material is listed in the References of the main text.

Appendix A Proofs of theorems

A.1 Proof of Theorem 2.3

Proposition B.4 clearly implies the latter inequality

si(𝐀𝐁)s1(𝐀)si(𝐁).\displaystyle s_{i}\left(\mathbf{A}\mathbf{B}\right)\leq s_{1}\left(\mathbf{A}\right)s_{i}\left(\mathbf{B}\right). (7)

Now we prove the other inequality s_m(𝐀)s_i(𝐁) ≤ s_i(𝐁𝐀). If s_m(𝐀) = 0, then there is nothing to prove. We now assume that s_m(𝐀) > 0. Then necessarily m ≤ n and the Hermitian matrix 𝐀𝐀* is positive definite. Let 𝐀^+ be the Moore-Penrose inverse of 𝐀, defined as

𝐀+:=𝐀(𝐀𝐀)1.\mathbf{A}^{+}:=\mathbf{A}^{*}(\mathbf{A}\mathbf{A}^{*})^{-1}.

By (7) and 𝐀𝐀+=𝐈m\mathbf{A}\mathbf{A}^{+}=\mathbf{I}_{m},

si(𝐁)=si(𝐁𝐀𝐀+)si(𝐁𝐀)s1(𝐀+).\displaystyle s_{i}\left(\mathbf{B}\right)=s_{i}\left(\mathbf{B}\mathbf{A}\mathbf{A}^{+}\right)\leq s_{i}\left(\mathbf{B}\mathbf{A}\right)s_{1}\left(\mathbf{A}^{+}\right). (8)

Noticing that

s1(𝐀+)=λ1(𝐀(𝐀𝐀)2𝐀)=λ1((𝐀𝐀)1)=1sm(𝐀),s_{1}\left(\mathbf{A}^{+}\right)=\sqrt{\lambda_{1}(\mathbf{A}^{*}(\mathbf{A}\mathbf{A}^{*})^{-2}\mathbf{A})}=\sqrt{\lambda_{1}((\mathbf{A}\mathbf{A}^{*})^{-1})}=\frac{1}{s_{m}\left(\mathbf{A}\right)},

we obtain the desired inequality by multiplying both sides of (8) by s_m(𝐀).
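Both singular-value bounds in this proof admit a quick numerical sanity check, e.g. with a square invertible 𝐀 (so that s_m(𝐀) > 0; a sketch of ours, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 3.0 * np.eye(6)  # square, invertible (a.s.)
B = rng.standard_normal((6, 4))

sA = np.linalg.svd(A, compute_uv=False)    # descending: s_1(A) >= ... >= s_m(A)
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A @ B, compute_uv=False)

for i in range(len(sAB)):
    # s_m(A) s_i(B) <= s_i(AB) <= s_1(A) s_i(B)
    assert sA[-1] * sB[i] - 1e-8 <= sAB[i] <= sA[0] * sB[i] + 1e-8
```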

A.2 Proof of Theorem 2.4

Note that 𝐂=(𝐃1/2𝚫1/2)(𝚫1/2𝐒𝚫1/2)(𝚫1/2𝐃1/2)\mathbf{C}=(\mathbf{D}^{-1/2}\bm{\Delta}^{1/2})(\bm{\Delta}^{-1/2}\mathbf{S}\bm{\Delta}^{-1/2})(\bm{\Delta}^{1/2}\mathbf{D}^{-1/2}) and the corresponding formula for 𝐂~\tilde{\mathbf{C}}. By Theorem 2.3, we have only to prove that all the singular values of 𝐃1/2𝚫1/2\mathbf{D}^{-1/2}\bm{\Delta}^{1/2} and 𝐃~1/2𝚫1/2\tilde{\mathbf{D}}^{-1/2}\bm{\Delta}^{1/2} are concentrated near 11, as follows:

Lemma A.1 (Key).

Assume the same conditions as in Theorem 2.4; namely, suppose that we are given a K-factor model satisfying A1-A3. Then

limN,T|𝐃~𝚫1𝐈N|=limN,T|𝐃𝚫1𝐈N|=0\displaystyle\lim_{N,T\to\infty}\left|\!\left|\!\left|\tilde{\mathbf{D}}\bm{\Delta}^{-1}-\mathbf{I}_{N}\right|\!\right|\!\right|=\lim_{N,T\to\infty}\left|\!\left|\!\left|\mathbf{D}\bm{\Delta}^{-1}-\mathbf{I}_{N}\right|\!\right|\!\right|=0 (a.s.).\displaystyle(a.s.).

We now give the proof of this lemma. Recall that 𝐂 is invariant under scaling and shifting of the data. In the proof, we assume without loss of generality that 𝝁 = 𝟎 and that each row vector of 𝚪 has unit length: ‖𝜸^i‖ = 1 for i = 1, …, N. Then 𝚫 = 𝐈_N. Almost sure convergence in the limiting regime N, T → ∞, N/T → c is denoted by →^{a.s.}. The proof is divided into two parts: the first assumes A3(b), the second A3(a).

Proof under A3(b).

Let 𝚲=diag(σ1,,σN)\mathbf{\Lambda}=\operatorname{diag}(\sigma_{1},\dots,\sigma_{N}). We prove that |𝐃~𝐈|a.s.0\left|\!\left|\!\left|\tilde{\mathbf{D}}-\mathbf{I}\right|\!\right|\!\right|\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0. It suffices to confirm

limN,Tmax1iN|𝒙i2T1|\displaystyle\lim_{N,T\to\infty}\max_{{1\leq i\leq N}}\left|\frac{\left\|\bm{x}^{i}\right\|^{2}}{T}-1\right| =0\displaystyle=0 (a.s.).\displaystyle(a.s.). (9)

Let 𝐋=[ik]1iN,1kKN×K\mathbf{L}=[\ell_{ik}]_{{1\leq i\leq N},1\leq k\leq K}\in{\mathbb{R}}^{N\times K}. Because of 𝜸i=1\left\|\bm{\gamma}^{i}\right\|=1, we have

k=1Kik2+σi2=1.\displaystyle\sum_{k=1}^{K}\ell_{ik}^{2}+\sigma_{i}^{2}=1. (10)

Thus

𝒙i2T1\displaystyle\frac{\left\|\bm{x}^{i}\right\|^{2}}{T}-1 =(k=1Kik2t=1T(fkt2T1)+1k<kK2ikikt=1TfktfktT+k=1K2σiikt=1TfktψitT+σi2(t=1Tψit2T1)).\displaystyle=\left(\begin{aligned} \sum_{k=1}^{K}\ell_{ik}^{2}\sum_{t=1}^{T}\left(\frac{f_{kt}^{2}}{T}-1\right)+\sum_{1\leq k<k^{\prime}\leq K}2\ell_{ik}\ell_{ik^{\prime}}\sum_{t=1}^{T}\frac{f_{kt}{f}_{k^{\prime}t}}{T}\\ +\sum_{k=1}^{K}{2\sigma_{i}\ell_{ik}\sum_{t=1}^{T}}\frac{f_{kt}\psi_{it}}{T}+\sigma_{i}^{2}\left(\sum_{t=1}^{T}\frac{\psi_{it}^{2}}{T}-1\right)\end{aligned}\right).

Here max{ℓ_ik², 2|ℓ_ik ℓ_ik′|, 2|σ_i ℓ_ik|, σ_i²} ≤ 1 for i = 1, …, N and 1 ≤ k ≠ k′ ≤ K. As a result, max_{1≤i≤N} |‖𝒙^i‖²/T − 1| is at most

k=1K|t=1Tfkt2T1|+1k<kK|t=1TfktfktT|+k=1Kmax1iN|t=1TfktψitT|+max1iN|t=1Tψit2T1|.\sum_{k=1}^{K}\left|\sum_{t=1}^{T}{\displaystyle\frac{f_{kt}^{2}}{T}}-1\right|+\displaystyle\sum_{1\leq k<k^{\prime}\leq K}\left|\sum_{t=1}^{T}{\frac{f_{kt}{f}_{k^{\prime}t}}{T}}\right|+\sum_{k=1}^{K}\max_{1\leq i\leq N}\left|{\sum_{t=1}^{T}}\frac{f_{kt}\psi_{it}}{T}\right|+\max_{1\leq i\leq N}\left|\sum_{t=1}^{T}\frac{\psi_{it}^{2}}{T}-1\right|. (11)

Then, for each 1 ≤ k ≤ K and 1 ≤ k < k′ ≤ K, we can prove that each of the four terms of (11) tends to 0 almost surely, as follows:

  (i)

    {fkt2:t1}\set{f_{kt}^{2}\colon t\geq 1} is an array of i.i.d. random variables with mean 11. Use Lemma B.5 with α=1,β=0\alpha=1,\beta=0.

  (ii)

    {fktfkt:t1}\set{f_{kt}{f}_{k^{\prime}t}\colon t\geq 1} for kkk\neq k^{\prime} is an array of centered i.i.d. random variables, since fktf_{kt} and fk,tf_{k^{\prime},t} are independent and centered. Use Lemma B.5 with α=1,β=0\alpha=1,\beta=0.

  (iii)

    {f_kt ψ_it : 1≤i≤N, 1≤t≤T} is a double array of centered i.i.d. random variables, since f_kt and ψ_it are independent and centered. From 𝕍f_kt = 𝕍ψ_it = 1 by Definition 2.1, we derive

    𝔼|fktψit|2=𝔼|fkt|2𝔼|ψit|2<.\operatorname{\mathbb{E}}|f_{kt}\psi_{it}|^{2}=\operatorname{\mathbb{E}}|f_{kt}|^{2}\operatorname{\mathbb{E}}|\psi_{it}|^{2}<\infty.

    Use Lemma B.5 with α=1,β=1\alpha=1,\beta=1.

  (iv)

    {ψit2:1iN, 1tT}\set{\psi_{it}^{2}\colon{1\leq i\leq N},\,{1\leq t\leq T}} is a double array of i.i.d. random variables with mean 11 and 𝔼|ψit|4<\operatorname{\mathbb{E}}|\psi_{it}|^{4}<\infty by A2. Use Lemma B.5 with α=β=1\alpha=\beta=1.

Hence, (9) is confirmed.
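The expansion of ‖𝒙^i‖²/T − 1 used in this proof is plain algebra; it can be verified numerically, e.g. with K = 2 and a loading row normalized as in (10) (our own sketch; the concrete numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
K, T = 2, 200
f = rng.standard_normal((K, T))        # factor scores f_{kt}
psi = rng.standard_normal(T)           # idiosyncratic noise psi_{it}
l = np.array([0.6, 0.5])               # loadings ell_{i1}, ell_{i2}
sigma = np.sqrt(1.0 - np.sum(l ** 2))  # enforce sum_k ell_{ik}^2 + sigma_i^2 = 1
x = l @ f + sigma * psi                # the i-th row of the data

lhs = np.dot(x, x) / T - 1.0
rhs = (sum(l[k] ** 2 * (np.dot(f[k], f[k]) / T - 1.0) for k in range(K))
       + 2.0 * l[0] * l[1] * np.dot(f[0], f[1]) / T
       + sum(2.0 * sigma * l[k] * np.dot(f[k], psi) / T for k in range(K))
       + sigma ** 2 * (np.dot(psi, psi) / T - 1.0))
assert abs(lhs - rhs) < 1e-10          # the two sides agree up to rounding
```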

Now we prove that |𝐃𝐈|a.s.0\left|\!\left|\!\left|\mathbf{D}-\mathbf{I}\right|\!\right|\!\right|\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0 as N,T{N,T\to\infty}. It suffices to guarantee

limN,Tmax1iN|𝒙i𝒙¯i2T1|\displaystyle\lim_{{N,T\to\infty}}\max_{{1\leq i\leq N}}\left|\frac{\left\|{\bm{x}^{i}-{\bar{\bm{x}}}^{i}}\right\|^{2}}{T}-1\right| =0\displaystyle=0 (a.s.).\displaystyle(a.s.). (12)

Let x¯i=T1t=1Txit\bar{x}_{i}=T^{-1}\sum_{t=1}^{T}x_{it} be the sample average of the iith row of 𝐗\mathbf{X}. Then by 𝒙i𝒙¯i2=𝒙i2T|x¯i|2\left\|\bm{x}^{i}-{\bar{\bm{x}}}^{i}\right\|^{2}=\left\|\bm{x}^{i}\right\|^{2}-T\left|\bar{x}_{i}\right|^{2},

max1iN|𝒙i𝒙¯i2T1|max1iN|𝒙i2T1|+max1iN|x¯i|2.\displaystyle\max_{{{1\leq i\leq N}}}\left|\frac{\left\|\bm{x}^{i}-\bar{\bm{x}}^{i}\right\|^{2}}{T}-1\right|\leq\max_{{{1\leq i\leq N}}}\left|\frac{\left\|\bm{x}^{i}\right\|^{2}}{T}-1\right|+\max_{{{1\leq i\leq N}}}\left|\bar{x}_{i}\right|^{2}. (13)

The first term of the right side converges almost surely to 0, by (9).

As for the second term max1iN|x¯i|2\max_{{{1\leq i\leq N}}}\left|\bar{x}_{i}\right|^{2}, from the definition of x¯i\bar{x}_{i}, |ik|1|\ell_{ik}|\leq 1, and |σi|1|\sigma_{i}|\leq 1, we get

max1iN|x¯i|k=1K|t=1TfktT|+max1iN|t=1TψitT|.\displaystyle\max_{{1\leq i\leq N}}\left|\bar{x}_{i}\right|\leq\sum_{k=1}^{K}\left|\sum_{t=1}^{T}\frac{f_{kt}}{T}\right|+\max_{{1\leq i\leq N}}\left|\sum_{t=1}^{T}\frac{\psi_{it}}{T}\right|.

The first term of the right side converges almost surely to 0, as for each 1kK1\leq k\leq K, {fkt:t1}\set{f_{kt}\colon t\geq 1} is an array of centered i.i.d. random variables. The second term of the right side converges almost surely to 0 by Lemma B.5, since {ψit:1iN, 1tT}\set{\psi_{it}\colon{1\leq i\leq N},\,{1\leq t\leq T}} is a double array of i.i.d. random variables. Therefore the right side of (13) converges to 0 almost surely. Consequently, (12) is guaranteed. Q.E.D.
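A short simulation of ours illustrates (9) and (12) in the simplest pure-noise case (𝐋 = 𝟎, σ_i = 1, so 𝒙^i = 𝝍^i): the maximal deviation of the row norms is already small for moderate N and T:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 1000                        # proportional regime: N/T = c = 0.2
Psi = rng.standard_normal((N, T))       # double array of i.i.d. entries, variance 1
rows = Psi - Psi.mean(axis=1, keepdims=True)   # centered rows, as in (12)
max_dev = np.abs((Psi ** 2).sum(axis=1) / T - 1.0).max()       # for (9)
max_dev_c = np.abs((rows ** 2).sum(axis=1) / T - 1.0).max()    # for (12)
```

Both maxima are far below 1 already at this size, consistent with the almost sure convergence to 0 in the proportional regime.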

Proof under A3(a).

This part is inspired by the proof of Lemma 4 in El Karoui (2009). First, we truncate the random variables ψ_it. Let

𝐓N=[ψitI(|ψit|<T(logT)1+ε)]i,t\mathbf{T}_{N}=\left[\psi_{it}\mathrm{I}\left(|\psi_{it}|<\sqrt{\frac{T}{(\log T)^{1+\varepsilon}}}\right)\right]_{i,t}

be the N×TN\times T matrix with truncated entries, where ‘I\mathrm{I}’ is the indicator function and ε>0\varepsilon>0. Then we re-center the entries of 𝐓N\mathbf{T}_{N} by defining

𝚿^=𝐓NeT𝟏N×T,\hat{\mathbf{\Psi}}=\mathbf{T}_{N}-e_{T}\mathbf{1}_{N\times T},

where

eT=𝔼(ψitI(|ψit|<T/(logT)1+ε))e_{T}=\operatorname{\mathbb{E}}\left(\psi_{it}\mathrm{I}\left(|\psi_{it}|<\sqrt{T/(\log T)^{1+\varepsilon}}\right)\right)

and 𝟏_{N×T} is the N×T all-ones matrix. Let ψ̂_it denote the entries of 𝚿̂.

We will prove that, after replacing 𝚿 with 𝚿̂, the diagonal entries of 𝐃̃ and 𝐃 change only negligibly. By A3 and (El Karoui 2009, Lemma 2), almost surely 𝚿 = 𝐓_N for all large enough N; hence, almost surely, the truncation does not modify 𝐃 and 𝐃̃ for large enough N. Moreover, 𝐃 is invariant under translation, so the re-centering does not affect 𝐃. For 𝐃̃, because ψ_it is centered and has finite fourth moment by A2, we get

(T(logT)1+ε)3/2|eT|𝔼(|ψit|4I(|ψit|T(logT)1+ε)).\left(\frac{T}{(\log T)^{1+\varepsilon}}\right)^{3/2}|e_{T}|\ \leq\ \operatorname{\mathbb{E}}\left(|\psi_{it}|^{4}\mathrm{I}\left(|\psi_{it}|\geq\sqrt{\frac{T}{(\log T)^{1+\varepsilon}}}\right)\right).

Here the right side is O(1). Thus e_T = O(T^{−3/2}(log T)^{3(1+ε)/2}) = o(T^{−1}). In the limiting regime N, T → ∞, N/T → c,

|𝚿𝚿^|=eT|𝟏N×T|=eTNTa.s.0.\left|\!\left|\!\left|\mathbf{\Psi}-\hat{\mathbf{\Psi}}\right|\!\right|\!\right|=e_{T}\left|\!\left|\!\left|\mathbf{1}_{N\times T}\right|\!\right|\!\right|=e_{T}\sqrt{NT}\,\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0.

Let 𝒙^i\hat{\bm{x}}^{i} be the iith row of data after truncation and re-centering. As we have assumed 𝝁=𝟎\bm{\mu}=\bm{0} without loss of generality, we can write

[𝒙^1𝒙^N]=𝐋𝐅+𝚲𝚿^.\begin{bmatrix}\hat{\bm{x}}^{1}\\ \vdots\\ \hat{\bm{x}}^{N}\end{bmatrix}=\mathbf{L}\mathbf{F}+\mathbf{\Lambda}\hat{\mathbf{\Psi}}.

Let 𝐠i\mathbf{g}^{i} be the iith row vector of 𝚲\mathbf{\Lambda}. Since we assumed the length of each row vector of 𝚪\bm{\Gamma} is 1 at the beginning of this proof, we get i1\left\|\bm{\ell}^{i}\right\|\leq 1 and 𝐠i1\left\|\mathbf{g}^{i}\right\|\leq 1. Hence

max1iN|𝒙^i2𝒙i2T|\displaystyle\max_{1\leq i\leq N}\left|\frac{\left\|\hat{\bm{x}}^{i}\right\|^{2}-\left\|\bm{x}^{i}\right\|^{2}}{T}\right| 1Tmax1iN|2i𝐅(𝚿𝚿^)(𝐠i)|+1Tmax1iN|𝐠i(𝚿𝚿𝚿^𝚿^)(𝐠i)|\displaystyle\leq\frac{1}{T}\max_{1\leq i\leq N}\left|2\bm{\ell}^{i}\mathbf{F}(\mathbf{\Psi}-\hat{\mathbf{\Psi}})^{\top}{(\mathbf{g}^{i})}^{\top}\right|+\frac{1}{T}\max_{1\leq i\leq N}\left|\mathbf{g}^{i}(\mathbf{\Psi}\mathbf{\Psi}^{\top}-\hat{\mathbf{\Psi}}\hat{\mathbf{\Psi}}^{\top}){(\mathbf{g}^{i})}^{\top}\right|
2T|𝐅||𝚿𝚿^|+1T|𝚿𝚿^||𝚿|+1T|𝚿^||𝚿𝚿^|\displaystyle\leq\frac{2}{T}\left|\!\left|\!\left|\mathbf{F}\right|\!\right|\!\right|\left|\!\left|\!\left|\mathbf{\Psi}-\hat{\mathbf{\Psi}}\right|\!\right|\!\right|+\frac{1}{T}\left|\!\left|\!\left|\mathbf{\Psi}-\hat{\mathbf{\Psi}}\right|\!\right|\!\right|\left|\!\left|\!\left|\mathbf{\Psi}^{\top}\right|\!\right|\!\right|+\frac{1}{T}\left|\!\left|\!\left|\hat{\mathbf{\Psi}}\right|\!\right|\!\right|\left|\!\left|\!\left|\mathbf{\Psi}^{\top}-\hat{\mathbf{\Psi}}^{\top}\right|\!\right|\!\right|
a.s.0.\displaystyle\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0.

Now define the function di:KT×NT[0,)d_{i}:{\mathbb{R}}^{KT}\times{\mathbb{R}}^{NT}\xrightarrow{}[0,\,\infty) by

di(𝐅,𝚿):=i𝐅+𝐠i𝚿.d_{i}(\mathbf{F},\mathbf{\Psi}):=\left\|\bm{\ell}^{i}\mathbf{F}+\mathbf{g}^{i}\mathbf{\Psi}\right\|.

From the proof of (El Karoui 2009, Lemma 4), did_{i} is convex and 11-Lipschitz (as 𝜸i=1\left\|\bm{\gamma}^{i}\right\|=1). Let μf\mu_{f} and μψ^\mu_{{\hat{\psi}}} be the distribution of fktf_{kt} and ψ^it{{\hat{\psi}}}_{it}, respectively. Conditioning on 𝐅\mathbf{F} (i.e., for every fixed 𝐅\mathbf{F}), we apply (Ledoux 2001, Corollary 4.10, pp.77–78, Eq.(4.10)) on the variables ψ^it{{\hat{\psi}}}_{it}. By |ψ^it|T/(logT)1+ε|{{\hat{\psi}}}_{it}|\leq\sqrt{T/(\log T)^{1+\varepsilon}}, it holds that for all δ>0\delta>0,

NTI(|di(𝐅,𝚿^)T𝕄|𝐅(di(𝐅,𝚿^)T)|>δ)dμψ^NT(𝚿^)\displaystyle\int_{{\mathbb{R}}^{NT}}\mathrm{I}\left(\,\left|\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}-\operatorname{\mathbb{M}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)\right|>\delta\right)\mathrm{d}\mu_{{\hat{\psi}}}^{\otimes NT}(\hat{\mathbf{\Psi}}) 4exp(δ2(logT)1+ε16).\displaystyle\leq 4\exp\left(-\frac{\delta^{2}(\log T)^{1+\varepsilon}}{16}\right). (14)

Here 𝕄_{|𝐅} (𝔼_{|𝐅}, 𝕍_{|𝐅}, resp.) denotes a conditional median (the conditional expectation, the conditional variance, resp.) given 𝐅. By Tomkins (1975), 𝕄_{|𝐅}(d_i(𝐅,𝚿̂)/√T) is measurable, so we can integrate (14) over 𝐅 against μ_f^{⊗KT}. Then we get, for all δ>0,

𝐏(|di(𝐅,𝚿^)T𝕄|𝐅(di(𝐅,𝚿^)T)|>δ)\displaystyle\mathbf{P}\left(\left|\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}-\operatorname{\mathbb{M}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)\right|>\delta\right) 4exp(δ2(logT)1+ε16).\displaystyle\leq 4\exp\left(-\frac{\delta^{2}(\log T)^{1+\varepsilon}}{16}\right).

A union bound over 1iN{1\leq i\leq N} then gives

𝐏(max1iN|di(𝐅,𝚿^)T𝕄|𝐅(di(𝐅,𝚿^)T)|>δ)\displaystyle\mathbf{P}\left(\max_{1\leq i\leq N}\left|\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}-\operatorname{\mathbb{M}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)\right|>\delta\right) 4Nexp(δ2(logT)1+ε16).\displaystyle\leq 4N\exp\left(-\frac{\delta^{2}(\log T)^{1+\varepsilon}}{16}\right).

The Borel-Cantelli lemma yields

max1iN|di(𝐅,𝚿^)T𝕄|𝐅(di(𝐅,𝚿^)T)|a.s.0.\displaystyle\max_{1\leq i\leq N}\left|\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}-\operatorname{\mathbb{M}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)\right|\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0. (15)

On the other hand, by Mallows (1991) and by applying (Ledoux 2001, Proposition 1.9) to di(𝐅,𝚿^)/T{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}/{\sqrt{T}} together with inequality (14), we have

|𝕄|𝐅(di(𝐅,𝚿^)T)𝔼|𝐅(di(𝐅,𝚿^)T)|𝕍|𝐅(di(𝐅,𝚿^)T)C(logT)(1+ε)/2\displaystyle\left|\operatorname{\mathbb{M}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)-\operatorname{\mathbb{E}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)\right|\leq\sqrt{\operatorname{\mathbb{V}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)}\leq C(\log T)^{-(1+\varepsilon)/2} (16)

where CC is an absolute constant independent of ii. By (10) and the first two arguments (i, ii) just below (11), we can check

max1iN|1Ti𝐅2i2|a.s.0.\max_{1\leq i\leq N}\left|\frac{1}{T}\left\|\bm{\ell}^{i}\mathbf{F}\right\|^{2}-\left\|\bm{\ell}^{i}\right\|^{2}\right|\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0.

Moreover, by Lebesgue’s dominated convergence theorem, the assumption 𝕍ψ11=1\operatorname{\mathbb{V}}\psi_{11}=1 implies 𝕍ψ^111\operatorname{\mathbb{V}}{{\hat{\psi}}_{11}}\to 1. Hence

max1iN||(𝔼|𝐅di(𝐅,𝚿^)T)21|𝕍|𝐅di(𝐅,𝚿^)T|\displaystyle\max_{1\leq i\leq N}\left|\ \left|\left(\operatorname{\mathbb{E}_{|\mathbf{F}}}\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)^{2}-1\right|-\operatorname{\mathbb{V}_{|\mathbf{F}}}\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\ \right|
max1iN|𝕍|𝐅di(𝐅,𝚿^)T+(𝔼|𝐅di(𝐅,𝚿^)T)21|=max1iN|𝔼|𝐅di2(𝐅,𝚿^)T1|\displaystyle\leq\max_{1\leq i\leq N}\left|\operatorname{\mathbb{V}_{|\mathbf{F}}}\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}+\left(\operatorname{\mathbb{E}_{|\mathbf{F}}}\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)^{2}-1\right|=\max_{1\leq i\leq N}\left|\operatorname{\mathbb{E}_{|\mathbf{F}}}\frac{d_{i}^{2}(\mathbf{F},\hat{\mathbf{\Psi}})}{T}-1\right|
max1iN{|1Ti𝐅2i2|+|𝕍(ψ^11)1|𝐠i2}\displaystyle\leq\max_{1\leq i\leq N}\left\{\left|\frac{1}{T}\left\|\bm{\ell}^{i}\mathbf{F}\right\|^{2}-\left\|\bm{\ell}^{i}\right\|^{2}\right|+\left|\operatorname{\mathbb{V}}({\hat{\psi}}_{11})-1\right|\,\left\|\mathbf{g}^{i}\right\|^{2}\right\}
a.s.0.\displaystyle\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0.

As a result, by the above display and the second inequality in (16),

max1iN|𝔼|𝐅di(𝐅,𝚿^)T1|max1iN|(𝔼|𝐅di(𝐅,𝚿^)T)21|a.s.0.\displaystyle\max_{1\leq i\leq N}\left|\operatorname{\mathbb{E}_{|\mathbf{F}}}\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}-1\right|\leq\max_{1\leq i\leq N}\left|\left(\operatorname{\mathbb{E}_{|\mathbf{F}}}\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)^{2}-1\right|\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0.

Again by (16),

max1iN|𝕄|𝐅(di(𝐅,𝚿^)T)1|a.s.0,\displaystyle\max_{1\leq i\leq N}\left|\operatorname{\mathbb{M}_{|\mathbf{F}}}\left(\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}\right)-1\right|\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0,

which, combined with (15), yields

max1iN|di(𝐅,𝚿^)T1|a.s.0.\max_{1\leq i\leq N}\left|\frac{d_{i}(\mathbf{F},\hat{\mathbf{\Psi}})}{\sqrt{T}}-1\right|\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}0.

For 𝐃\mathbf{D}, we use arguments similar to those of El Karoui (2009), with the adaptations above; we omit the details. The proof is complete. Q.E.D.
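The uniform concentration of the normalized row norms proved above can be observed numerically. The following Python sketch is illustrative only (the sizes, seed, and loadings are our own choices, not from the paper): it simulates rows ℓ^i𝐅 + σψ^i with ‖ℓ^i‖² + σ² = 1 and prints the maximal deviation of d_i/√T from 1.

```python
import numpy as np

# Toy simulation (our own sizes/seed): rows x_i = l_i F + sigma * psi_i
# with ||l_i||^2 + sigma^2 = 1, so ||x_i|| / sqrt(T) should concentrate
# near 1 uniformly in i, as in the display above.
rng = np.random.default_rng(0)
K, N, T = 3, 200, 400
ell = rng.normal(size=(N, K))
ell /= np.linalg.norm(ell, axis=1, keepdims=True) * np.sqrt(2)  # ||l_i||^2 = 1/2
sigma = np.sqrt(0.5)                                            # ||l_i||^2 + sigma^2 = 1
F = rng.normal(size=(K, T))
Psi = rng.normal(size=(N, T))
X = ell @ F + sigma * Psi
d = np.linalg.norm(X, axis=1) / np.sqrt(T)   # d_i(F, Psi) / sqrt(T)
print(np.max(np.abs(d - 1.0)))               # small for these sizes
```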

A.3 Proof of Theorems 2.6, 2.7 and 2.8

To establish Theorem 2.6, by Theorem 2.3 and 𝚫=(L2+σ2)𝐈\bm{\Delta}=(L^{2}+\sigma^{2})\mathbf{I}, it is enough to determine the asymptotic locations of the largest eigenvalues of the sample covariance matrix 𝐒\mathbf{S} of a CLFM.

Theorem A.2.

If a CLFM satisfies A4, then for k=1,,rk=1,\dots,r,

limN,Tλk(𝐒~)λk(𝐋𝐋)=limN,Tλk(𝐒)λk(𝐋𝐋)=1\displaystyle\lim_{N,T\to\infty}\frac{\lambda_{k}({\tilde{\mathbf{S}}})}{\lambda_{k}(\mathbf{L}\mathbf{L}^{\top})}=\lim_{N,T\to\infty}\frac{\lambda_{k}\left(\mathbf{S}\right)}{\lambda_{k}(\mathbf{L}\mathbf{L}^{\top})}=1 (a.s.),\displaystyle(a.s.), (17)
limN,Tλr+1(𝐒~)=limN,Tλr+1(𝐒)=σ2(1+c)2\displaystyle\lim_{N,T\to\infty}\lambda_{r+1}({\tilde{\mathbf{S}}})=\lim_{N,T\to\infty}\lambda_{r+1}(\mathbf{S})=\sigma^{2}(1+\sqrt{c})^{2} (a.s.).\displaystyle(a.s.). (18)
Proof of (17).

Let 1kr1\leq k\leq r. For 𝐒~{\tilde{\mathbf{S}}}, by the definition (1) and the model setting, we observe

λk(𝐒~)=sk(1T𝐗)=1Tsk(𝐋𝐅+σ𝚿).\sqrt{\lambda_{k}({\tilde{\mathbf{S}}})}=s_{k}\left(\frac{1}{\sqrt{T}}\mathbf{X}\right)=\frac{1}{\sqrt{T}}s_{k}\left(\mathbf{L}\mathbf{F}+\sigma\mathbf{\Psi}\right).

By Corollary B.3,

|λk(𝐒~)1Tsk(𝐋𝐅)|σTs1(𝚿).\left|\sqrt{\lambda_{k}({\tilde{\mathbf{S}}})}-\frac{1}{\sqrt{T}}s_{k}\left(\mathbf{L}\mathbf{F}\right)\right|\leq\frac{\sigma}{\sqrt{T}}s_{1}\left(\mathbf{\Psi}\right).

Dividing the above inequality by sk(𝐋)s_{k}\left(\mathbf{L}\right) yields

|λk(𝐒~)λk(𝐋𝐋)1Tsk(𝐋𝐅)sk(𝐋)|σsk(𝐋)s1(𝚿T).\displaystyle\left|\sqrt{\frac{\lambda_{k}({\tilde{\mathbf{S}}})}{\lambda_{k}(\mathbf{L}\mathbf{L}^{\top})}}-\frac{1}{\sqrt{T}}\frac{s_{k}\left(\mathbf{L}\mathbf{F}\right)}{s_{k}\left(\mathbf{L}\right)}\right|\leq\frac{\sigma}{s_{k}\left(\mathbf{L}\right)}s_{1}\left(\frac{\mathbf{\Psi}}{\sqrt{T}}\right). (19)

Here limN,T1/sk(𝐋)=0\lim_{{N,T\to\infty}}1/s_{k}\left(\mathbf{L}\right)=0 by A4. For 𝚿N×T\mathbf{\Psi}\in{\mathbb{R}}^{N\times T}, limN,Ts1(T1/2𝚿)=1+c\lim_{N,T\to\infty}s_{1}\left(T^{-1/2}\mathbf{\Psi}\right)=1+\sqrt{c} (a.s.) by Proposition B.6 (1). Thus the right-hand side of (19), and hence its left-hand side, tends to 0 (a.s.).

Now we only need to verify that

1Tsk(𝐋𝐅)sk(𝐋)\displaystyle\frac{1}{\sqrt{T}}\frac{s_{k}\left(\mathbf{L}\mathbf{F}\right)}{s_{k}\left(\mathbf{L}\right)} a.s.1\displaystyle\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}1 (N,T).\displaystyle({N,T\to\infty}). (20)

By applying the strong law of large numbers to each element of T1𝐅𝐅K×KT^{-1}\mathbf{F}\mathbf{F}^{\top}\in{\mathbb{R}}^{K\times K}, we deduce

1T𝐅𝐅a.s.𝐈K,\displaystyle\frac{1}{T}\mathbf{F}\mathbf{F}^{\top}\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}\mathbf{I}_{K},

which means that s1(𝐅/T)a.s.1s_{1}\left(\mathbf{F}/\sqrt{T}\right)\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}1 and sK(𝐅/T)a.s.1s_{K}\left(\mathbf{F}/\sqrt{T}\right)\mathrel{\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}}1. Then, Theorem 2.3 verifies (20). Therefore, we have established limN,Tλk(𝐒~)/λk(𝐋𝐋)=1\lim_{N,T\to\infty}{\lambda_{k}({\tilde{\mathbf{S}}})}/{\lambda_{k}(\mathbf{L}\mathbf{L}^{\top})}=1 (a.s.) for 1kr{1\leq k\leq r} and K0K\neq 0.

From the above proof, we obtain a proof of the latter identity of (17) for 1kr{1\leq k\leq r} and K0K\neq 0, through the following modification. First note that 𝐗¯=(L2+σ2)1/2(𝐋𝐅¯+σ𝚿¯)\bar{\mathbf{X}}=(L^{2}+\sigma^{2})^{-1/2}(\mathbf{L}\bar{\mathbf{F}}+\sigma\bar{\mathbf{\Psi}}) where 𝐅¯=[T1t=1Tfkt]k,t\bar{\mathbf{F}}=\left[T^{-1}\sum_{t=1}^{T}f_{kt}\right]_{k,t} and 𝚿¯=[T1t=1Tψit]i,t\bar{\mathbf{\Psi}}=\left[T^{-1}\sum_{t=1}^{T}\psi_{it}\right]_{i,t}. Then, in the proof of limN,Tλk(𝐒~)/λk(𝐋𝐋)=1\lim_{N,T\to\infty}{\lambda_{k}({\tilde{\mathbf{S}}})}/{\lambda_{k}(\mathbf{L}\mathbf{L}^{\top})}=1 (a.s.), we use 𝐅𝐅¯\mathbf{F}-\bar{\mathbf{F}}, 𝚿𝚿¯\mathbf{\Psi}-\bar{\mathbf{\Psi}}, and Proposition B.6 (2), instead of 𝐅\mathbf{F}, 𝚿\mathbf{\Psi}, and Proposition B.6 (1). Q.E.D.

Proof of (18).

By Lemma B.7, we immediately have

lim¯N,Tλr+1(𝐒~)\displaystyle\varliminf_{N,T\to\infty}\lambda_{r+1}({\tilde{\mathbf{S}}}) σ2(1+c)2\displaystyle\geq\sigma^{2}(1+\sqrt{c})^{2} (a.s.).\displaystyle(a.s.).

Thus it remains to show that

lim¯N,Tλr+1(𝐒~)\displaystyle\varlimsup_{N,T\to\infty}\lambda_{r+1}({\tilde{\mathbf{S}}}) σ2(1+c)2\displaystyle\leq\sigma^{2}(1+\sqrt{c})^{2} (a.s.).\displaystyle(a.s.). (21)

When r=0r=0, (21) follows from Proposition B.6 (1). So we now assume r>0r>0. The matrix 𝐒~{\tilde{\mathbf{S}}} can be written as

𝐒~=1T(𝐋𝐅+σ𝚿)(𝐋𝐅+σ𝚿).{\tilde{\mathbf{S}}}=\frac{1}{T}(\mathbf{L}\mathbf{F}+\sigma\mathbf{\Psi})(\mathbf{L}\mathbf{F}+\sigma\mathbf{\Psi})^{\top}.

Thus using Proposition B.2 with i=0,j=ri=0,j=r, and noting that sr+1(𝐋𝐅)=0s_{r+1}(\mathbf{L}\mathbf{F})=0, we have

λr+1(𝐒~)=1Tsr+12(𝐋𝐅+σ𝚿)σ2Tλ1(𝚿𝚿).\lambda_{r+1}({\tilde{\mathbf{S}}})=\frac{1}{T}s^{2}_{r+1}(\mathbf{L}\mathbf{F}+\sigma\mathbf{\Psi})\leq\frac{\sigma^{2}}{T}\lambda_{1}(\mathbf{\Psi}\mathbf{\Psi}^{\top}).

Taking the limit superior as N,T{N,T\to\infty} and using Proposition B.6 (1), we conclude (21).

Finally, the proof of limN,Tλr+1(𝐒)=σ2(1+c)2\lim_{N,T\to\infty}\lambda_{r+1}\left(\mathbf{S}\right)=\sigma^{2}(1+\sqrt{c})^{2} (a.s.) is that of limN,Tλr+1(𝐒~)=σ2(1+c)2\lim_{N,T\to\infty}\lambda_{r+1}({\tilde{\mathbf{S}}})=\sigma^{2}(1+\sqrt{c})^{2} (a.s.), except that 𝐒~{\tilde{\mathbf{S}}}, 𝐅\mathbf{F}, 𝚿\mathbf{\Psi}, and Proposition B.6 (1) are superseded by 𝐒\mathbf{S}, 𝐅[T1t=1Tfkt]k,t\mathbf{F}-\left[T^{-1}\sum_{t=1}^{T}f_{kt}\right]_{k,t}, 𝚿[T1t=1Tψit]i,t\mathbf{\Psi}-\left[T^{-1}\sum_{t=1}^{T}\psi_{it}\right]_{i,t}, and Proposition B.6 (2), respectively. Q.E.D.
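Theorem A.2 can be illustrated by a small Monte-Carlo experiment. The sketch below is not part of the proof: the sizes, seed, and the ad hoc rank-2 loading matrix (orthogonal columns of diverging norm, an A4-type choice of our own) are illustrative assumptions. It compares λ_k of the uncentered sample covariance matrix with λ_k(𝐋𝐋^⊤) for k ≤ r, and λ_{r+1} with σ²(1+√c)².

```python
import numpy as np

# Monte-Carlo sanity check of Theorem A.2 (toy sizes and loadings of our choosing).
rng = np.random.default_rng(1)
N, T, r, sigma = 400, 800, 2, 1.0
c = N / T
L = np.zeros((N, r))                 # rank-2 loadings with diverging singular values
L[:, 0] = 3.0
L[: N // 2, 1] = 2.0
L[N // 2 :, 1] = -2.0                # the two columns are orthogonal
X = L @ rng.normal(size=(r, T)) + sigma * rng.normal(size=(N, T))
S = X @ X.T / T                      # uncentered sample covariance matrix
eig = np.sort(np.linalg.eigvalsh(S))[::-1]
pop = np.sort(np.linalg.eigvalsh(L @ L.T))[::-1]
print(eig[0] / pop[0], eig[1] / pop[1])        # both close to 1, cf. (17)
print(eig[r], sigma**2 * (1 + np.sqrt(c))**2)  # both close, cf. (18)
```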

Proof of Theorem 2.7.

It is well known that n=kNn1=logN+O(1)\sum_{n=k}^{N}n^{-1}=\log N+O(1) for any fixed k1k\geq 1. By Theorem 2.6 along with condition (5), it holds almost surely that λr+1(𝐂)=(1ρ)(1+c)2+o(1)\lambda_{r+1}\left(\mathbf{C}\right)=(1-\rho)(1+\sqrt{c})^{2}+o(1) and lim¯N(λr(𝐂)/log(N))>1\displaystyle\varliminf_{N\to\infty}(\lambda_{r}(\mathbf{C})/\log(N))>1. As a result, for sufficiently large NN, it follows almost surely that λk(𝐂)>n=kNn1\lambda_{k}\left(\mathbf{C}\right)>\sum_{n=k}^{N}n^{-1} (1kr)({1\leq k\leq r}), and λr+1(𝐂)n=r+1Nn1\lambda_{r+1}\left(\mathbf{C}\right)\leq\sum_{n=r+1}^{N}n^{-1}. Thus, limN,TBS(𝐂)=r\lim_{N,T\to\infty}\operatorname{BS}(\mathbf{C})=r (a.s.).

We can demonstrate limN,TBS(𝐂~)=r\lim_{N,T\to\infty}\operatorname{BS}(\tilde{\mathbf{C}})=r (a.s.) mutatis mutandis. Q.E.D.
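For concreteness, the broken-stick rule BS(𝐂) used above can be sketched in code. This is an illustrative implementation under our reading of the rule (the estimated rank is the number of leading eigenvalues of 𝐂 exceeding the thresholds Σ_{n=k}^N 1/n, using trace 𝐂 = N); the demo sizes and loadings are our own choices.

```python
import numpy as np

def broken_stick_rank(eigvals):
    """BS(C): number of leading eigenvalues of a sample correlation matrix
    exceeding the broken-stick thresholds sum_{n=k}^N 1/n (trace(C) = N)."""
    lam = np.sort(np.asarray(eigvals))[::-1]
    N = lam.size
    thr = np.cumsum(1.0 / np.arange(1, N + 1)[::-1])[::-1]  # thr[k-1] = sum_{n=k}^N 1/n
    k = 0
    while k < N and lam[k] > thr[k]:
        k += 1
    return k

# toy factor model with r = 2 strong factors (sizes and loadings are ours)
rng = np.random.default_rng(2)
N, T = 200, 400
L = np.zeros((N, 2))
L[:, 0] = 3.0
L[: N // 2, 1] = 2.0
L[N // 2 :, 1] = -2.0
X = L @ rng.normal(size=(2, T)) + rng.normal(size=(N, T))
C = np.corrcoef(X)
print(broken_stick_rank(np.linalg.eigvalsh(C)))  # 2
```

The while-loop stops at the first eigenvalue falling below its threshold, matching the consecutive-exceedance form of the rule used in the proof.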

Now we prove Theorem 2.8. For an asymptotic CLF, we establish the following result on the largest and the second largest eigenvalues of the sample correlation matrices 𝐂~\tilde{\mathbf{C}} and 𝐂\mathbf{C}, and then proceed in the same manner as in the proof of Theorem 2.7.

Theorem A.3.

Under the same conditions as Theorem 2.8, for 𝐌=𝐂,𝐂~\mathbf{M}=\mathbf{C},\tilde{\mathbf{C}}, the following hold almost surely:

limN,Tλ1(𝐌)N\displaystyle\lim_{{N,T\to\infty}}\frac{\lambda_{1}(\mathbf{M})}{N} =22+σ2,\displaystyle=\frac{\left\|\bm{\ell}\right\|^{2}}{\left\|\bm{\ell}\right\|^{2}+\sigma^{2}},\quad λ2(𝐌)\displaystyle\lambda_{2}(\mathbf{M}) =(1ρ)(1+c)2+o(log(N))\displaystyle=(1-\rho)(1+\sqrt{c})^{2}+o(\log(N)) (N,T).\displaystyle({N,T\to\infty}).
Proof.

After normalizing the kkth row of 𝐗\mathbf{X} with k2+σk2\sqrt{\left\|\bm{\ell}^{k}\right\|^{2}+\sigma_{k}^{2}}, we may assume without loss of generality that k2+σk2=1\left\|\bm{\ell}^{k}\right\|^{2}+\sigma_{k}^{2}=1 for each k[1,N]k\in[1,N]. Then 2+σ2=1\left\|\bm{\ell}\right\|^{2}+\sigma^{2}=1. For

𝐋0=[],\mathbf{L}_{0}=\begin{bmatrix}\bm{\ell}^{\top}&\dots&\bm{\ell}^{\top}\end{bmatrix}^{\top},

Corollary B.3 implies

|si(𝐋)si(𝐋0)||𝐋𝐋0|k=1Nk2=o(log1/2(N)).\displaystyle\left|s_{i}(\mathbf{L})-s_{i}(\mathbf{L}_{0})\right|\leq\left|\!\left|\!\left|\mathbf{L}-\mathbf{L}_{0}\right|\!\right|\!\right|\leq\sqrt{\sum_{k=1}^{N}\left\|\bm{\ell}^{k}-\bm{\ell}\right\|^{2}}=o(\log^{1/2}(N)). (22)

Using Theorem 2.4, Corollary B.3 and Theorem 2.3, we have, as N,T,N/Tc>0{N,T\to\infty},N/T\to c>0, almost surely,

λ1(𝐂~)N\displaystyle\sqrt{\frac{\lambda_{1}(\tilde{\mathbf{C}})}{N}} 1TNs1(𝐋𝐅+𝚲𝚿)=1TNs1(𝐋𝐅)+O(1TNs1(𝚲𝚿))\displaystyle\sim\frac{1}{\sqrt{TN}}s_{1}(\mathbf{L}\mathbf{F}+\mathbf{\Lambda}\mathbf{\Psi})=\frac{1}{\sqrt{TN}}s_{1}(\mathbf{L}\mathbf{F})+O(\frac{1}{\sqrt{TN}}s_{1}(\mathbf{\Lambda}\mathbf{\Psi}))
1Ns1(𝐋)+O(1N)1Ns1(𝐋0)+O(1N).\displaystyle\sim\frac{1}{\sqrt{N}}s_{1}(\mathbf{L})+O(\frac{1}{\sqrt{N}})\sim\frac{1}{\sqrt{N}}s_{1}(\mathbf{L}_{0})+O(\frac{1}{\sqrt{N}})\xrightarrow[]{}\left\|\bm{\ell}\right\|.

From s2(𝐋0)=0s_{2}(\mathbf{L}_{0})=0 and (22), we get s2(𝐋)=o(log1/2N)s_{2}(\mathbf{L})=o(\log^{1/2}N). Applying the same sequence of inequalities to the second largest eigenvalue of 𝐂~\tilde{\mathbf{C}}, we have

λ2(𝐂~)\displaystyle\sqrt{\lambda_{2}(\tilde{\mathbf{C}})} 1Ts2(𝐋𝐅+𝚲𝚿)=1Ts2(𝐋𝐅)+O(1Ts2(𝚲𝚿))\displaystyle\sim\frac{1}{\sqrt{T}}s_{2}(\mathbf{L}\mathbf{F}+\mathbf{\Lambda}\mathbf{\Psi})=\frac{1}{\sqrt{T}}s_{2}(\mathbf{L}\mathbf{F})+O(\frac{1}{\sqrt{T}}s_{2}(\mathbf{\Lambda}\mathbf{\Psi}))
s2(𝐋)+O(1)=o(log1/2N).\displaystyle\sim s_{2}(\mathbf{L})+O(1)=o(\log^{1/2}N).

For 𝐂\mathbf{C}, the proof is similar. Q.E.D.

Appendix B Some useful lemmas

The following inequality of Weyl is well known (Horn & Johnson 2013, Theorem 4.3.1).

Proposition B.1 (Weyl’s inequality).

Let 𝐍\mathbf{N} and 𝐇\mathbf{H} be N×NN\times N Hermitian matrices, and let 𝐌=𝐍+𝐇\mathbf{M}=\mathbf{N}+\mathbf{H}. Then

λj(𝐍)+λk(𝐇)λi(𝐌)λr(𝐍)+λs(𝐇)\displaystyle\lambda_{j}(\mathbf{N})+\lambda_{k}(\mathbf{H})\leq\lambda_{i}(\mathbf{M})\leq\lambda_{r}(\mathbf{N})+\lambda_{s}(\mathbf{H}) (r+s1ij+kN).\displaystyle(r+s-1\leq i\leq j+k-N).
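As a quick numerical sanity check of the upper bound (an illustration, not a proof; it suffices to verify the extremal case i = r + s − 1, since larger i follow from the monotonicity of ordered eigenvalues):

```python
import numpy as np

# Verify lambda_{r+s-1}(N + H) <= lambda_r(N) + lambda_s(H) on a random Hermitian pair.
rng = np.random.default_rng(3)
n = 6
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
B = rng.normal(size=(n, n)); B = (B + B.T) / 2
lam = lambda H: np.sort(np.linalg.eigvalsh(H))[::-1]   # decreasing order
lA, lB, lM = lam(A), lam(B), lam(A + B)
viol = max(lM[r + s - 2] - lA[r - 1] - lB[s - 1]       # 1-based i = r + s - 1
           for r in range(1, n + 1) for s in range(1, n + 2 - r))
print(viol <= 1e-10)   # True: every upper bound holds
```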

An analogous theorem on the singular values of matrices can be established.

Proposition B.2 ((Bai & Silverstein 2010, Theorem A.8)).

Let 𝐀\mathbf{A} and 𝐁\mathbf{B} be two p×np\times n complex matrices. Then for any nonnegative integers i,ji,j,

si+j+1(𝐀+𝐁)si+1(𝐀)+sj+1(𝐁).\displaystyle s_{i+j+1}\left(\mathbf{A}+\mathbf{B}\right)\leq s_{i+1}\left(\mathbf{A}\right)+s_{j+1}\left(\mathbf{B}\right).

The conjugate transpose of a complex matrix 𝐀\mathbf{A} is denoted by 𝐀\mathbf{A}^{*}.

Proof.

Apply the second inequality λi(𝐌)λr(𝐍)+λs(𝐇)\lambda_{i}(\mathbf{M})\leq\lambda_{r}(\mathbf{N})+\lambda_{s}(\mathbf{H}) of Proposition B.1 to Hermitian matrices 𝐍=[𝐀𝐀]\mathbf{N}=\begin{bmatrix}&\mathbf{A}\\ \mathbf{A}^{*}&\end{bmatrix} and 𝐇=[𝐁𝐁]\mathbf{H}=\begin{bmatrix}&\mathbf{B}\\ \mathbf{B}^{*}&\end{bmatrix}. Q.E.D.

Using this proposition, we establish the following perturbation inequality for singular values.

Corollary B.3.

Let 𝐀\mathbf{A} and 𝐁\mathbf{B} be the same as in Proposition B.2. Then for i1i\geq 1,

|si(𝐀+𝐁)si(𝐀)|s1(𝐁).\displaystyle|s_{i}\left(\mathbf{A}+\mathbf{B}\right)-s_{i}\left(\mathbf{A}\right)|\leq s_{1}\left(\mathbf{B}\right).
Proof.

Taking j=0j=0 in Proposition B.2 and replacing i+1i+1 by ii, we get si(𝐀+𝐁)si(𝐀)s1(𝐁).s_{i}\left(\mathbf{A}+\mathbf{B}\right)-s_{i}\left(\mathbf{A}\right)\leq s_{1}\left(\mathbf{B}\right). The reverse inequality follows by applying the same bound to si(𝐀+𝐁𝐁)si(𝐀+𝐁)s1(𝐁)s_{i}\left(\mathbf{A}+\mathbf{B}-\mathbf{B}\right)-s_{i}\left(\mathbf{A}+\mathbf{B}\right)\leq s_{1}\left(-\mathbf{B}\right) and noting that s1(𝐁)=s1(𝐁)s_{1}\left(-\mathbf{B}\right)=s_{1}\left(\mathbf{B}\right). Q.E.D.
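Corollary B.3 is easy to check on random matrices (an illustration of the bound, not a proof):

```python
import numpy as np

# Random check of |s_i(A + B) - s_i(A)| <= s_1(B) for all i.
rng = np.random.default_rng(4)
A = rng.normal(size=(5, 8))
B = rng.normal(size=(5, 8))
sv = lambda M: np.linalg.svd(M, compute_uv=False)   # singular values, decreasing
gap = np.abs(sv(A + B) - sv(A))
print(bool(np.all(gap <= sv(B)[0] + 1e-10)))        # True
```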

Proposition B.4 ((Bai & Silverstein 2010, Theorem A.10)).

Let 𝐀\mathbf{A} and 𝐁\mathbf{B} be complex matrices of order m×nm\times n and n×pn\times p. For any i,j0i,j\geq 0, we have

si+j+1(𝐀𝐁)si+1(𝐀)sj+1(𝐁).s_{i+j+1}\left(\mathbf{A}\mathbf{B}\right)\leq s_{i+1}\left(\mathbf{A}\right)s_{j+1}\left(\mathbf{B}\right).
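Proposition B.4 likewise admits a direct numerical check (illustrative only; in zero-based indexing the claim reads sAB[i+j] ≤ sA[i]·sB[j]):

```python
import numpy as np

# Random check of s_{i+j+1}(AB) <= s_{i+1}(A) s_{j+1}(B).
rng = np.random.default_rng(5)
A = rng.normal(size=(6, 7))
B = rng.normal(size=(7, 5))
sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A @ B, compute_uv=False)
ok = all(sAB[i + j] <= sA[i] * sB[j] + 1e-10
         for i in range(len(sAB)) for j in range(len(sAB) - i))
print(ok)   # True
```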

To obtain forms of the law of large numbers in the proportional limiting regime, we make use of the following:

Lemma B.5 ((Bai & Yin 1993, Lemma 2)).

Suppose that {yij:i,j1}\set{y_{ij}\colon\,i,j\geq 1} is a double array of i.i.d. random variables. Let α>1/2\alpha>{1}/{2}, β0\beta\geq 0 and Q>0Q>0 be constants. Then,

limnmaxiQnβ|j=1nyijmnα|=0(a.s.)\displaystyle\lim_{n\to\infty}\max_{i\leq Qn^{\beta}}\left|\sum_{j=1}^{n}\frac{y_{ij}-m}{n^{\alpha}}\right|=0\quad(a.s.)
\displaystyle\iff 𝔼|y11|1+βα<&m={𝔼y11,(α1),any,(α>1).\displaystyle\operatorname{\mathbb{E}}|y_{11}|^{\frac{1+\beta}{\alpha}}<\infty\;\&\;m=\begin{cases}\displaystyle\operatorname{\mathbb{E}}y_{11},&(\alpha\leq 1),\\ \displaystyle\text{\rm any},&(\alpha>1).\end{cases}

For all square matrices 𝐀\mathbf{A} and 𝐁\mathbf{B} of order nn, the characteristic polynomial of 𝐀𝐁\mathbf{A}\mathbf{B} equals that of 𝐁𝐀\mathbf{B}\mathbf{A}. More generally, for matrices 𝐀\mathbf{A} and 𝐁\mathbf{B} of sizes n×mn\times m and m×nm\times n, respectively, the matrix 𝐀𝐁\mathbf{A}\mathbf{B} has the same nonzero eigenvalues as 𝐁𝐀\mathbf{B}\mathbf{A}.
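This fact can be observed on a random rectangular pair; the sketch below (our own toy sizes) compares the eigenvalue moduli, which is robust to the ordering of complex eigenvalues.

```python
import numpy as np

# AB (4x4) and BA (7x7) share their nonzero eigenvalues; compare moduli.
rng = np.random.default_rng(6)
A = rng.normal(size=(4, 7))
B = rng.normal(size=(7, 4))
mAB = np.sort(np.abs(np.linalg.eigvals(A @ B)))[::-1]   # 4 moduli
mBA = np.sort(np.abs(np.linalg.eigvals(B @ A)))[::-1]   # 7 moduli; last 3 are ~ 0
print(np.allclose(mAB, mBA[:4]), mBA[4:].max() < 1e-8)  # True True
```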

In the following, the first assertion is due to (Yin et al. 1988, Theorem 3.1) and the second to (Jiang 2004, (2.7)).

Proposition B.6.

Let 𝐙=[zit]i,t(K+N)×T\mathbf{Z}=\left[z_{it}\right]_{i,t}\in{\mathbb{R}}^{(K+N)\times T} satisfy the following subarray condition: There is an infinite matrix 𝒵=[zit]i1,t1\mathcal{Z}=[z_{it}]_{i\geq 1,t\geq 1} such that

  • all the entries of 𝒵\mathcal{Z} are centered i.i.d. real random variables having unit variances and the finite fourth moments; and

  • for each N1N\geq 1, the matrix 𝐙=[zit]1iK+N, 1tT\mathbf{Z}=\left[z_{it}\right]_{1\leq i\leq K+N,\,{1\leq t\leq T}} is the top-left submatrix of 𝒵\mathcal{Z} where T=T(N)T=T(N).

Moreover, let 𝐙¯=[T1t=1Tzit]i,t(K+N)×T\bar{\mathbf{Z}}=\left[T^{-1}\sum_{t=1}^{T}z_{it}\right]_{i,t}\in{\mathbb{R}}^{(K+N)\times T}. Then

limN,Tλ1(1T𝐙𝐙)\displaystyle\lim_{N,T\to\infty}\lambda_{1}\left(\frac{1}{T}\mathbf{Z}\mathbf{Z}^{\top}\right) =(1+c)2\displaystyle=(1+\sqrt{c})^{2} (a.s.).\displaystyle(a.s.). (1)
limN,Ts1(1T(𝐙𝐙¯))\displaystyle\lim_{N,T\to\infty}s_{1}\left(\frac{1}{\sqrt{T}}(\mathbf{Z}-\bar{\mathbf{Z}})\right) =1+c\displaystyle=1+\sqrt{c} (a.s.).\displaystyle(a.s.). (2)
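Both limits in Proposition B.6 are visible already at moderate sizes. The simulation below is illustrative (sizes and seed are ours; K is absorbed into N for simplicity):

```python
import numpy as np

# Largest eigenvalue of (1/T) Z Z^T and largest singular value after row-centering.
rng = np.random.default_rng(7)
N, T = 500, 1000
c = N / T                                     # c = 1/2
Z = rng.normal(size=(N, T))
lam1 = np.linalg.eigvalsh(Z @ Z.T / T)[-1]
Zc = Z - Z.mean(axis=1, keepdims=True)        # Z minus its row means
s1 = np.linalg.svd(Zc / np.sqrt(T), compute_uv=False)[0]
print(lam1, (1 + np.sqrt(c))**2)              # both about 2.91
print(s1, 1 + np.sqrt(c))                     # both about 1.71
```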

By the empirical spectral measure of an N×NN\times N positive semi-definite matrix 𝐇\mathbf{H}, we mean the probability measure θ\theta defined by

θ(A)\displaystyle\theta(A) =#{i[1,N]:λi(𝐇)A}N\displaystyle=\frac{\#\set{i\in[1,\,N]\colon\,\lambda_{i}\left(\mathbf{H}\right)\in A}}{N} (A).\displaystyle(A\subseteq{\mathbb{R}}).

If the empirical spectral measure of 𝐇\mathbf{H} converges weakly to a probability measure ϑ\vartheta in a given limiting regime, we call ϑ\vartheta the limiting spectral measure of 𝐇\mathbf{H}.

The Marčenko-Pastur probability measure of index c>0c>0 and scale parameter s>0s>0 has the probability density function

p(x)={12πxcs(a+x)(xa)(axa+),0(otherwise)\displaystyle p(x)=\begin{cases}\frac{1}{2\pi xcs}\sqrt{(a_{+}-x)(x-a_{-})}&(a_{-}\leq x\leq a_{+}),\\ 0&(\mbox{otherwise})\end{cases}

with an additional point mass of value max{0,11/c}\max\{0,1-1/c\} at the origin x=0x=0, where a±=s(1±c)2a_{\pm}=s(1\pm\sqrt{c})^{2}. The value a+a_{+} is called the right edge.
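As a numerical sanity check of this normalization (for c ≤ 1 the density alone carries mass 1, so the point mass vanishes), one can integrate the density over its support; the parameters below are illustrative choices of ours.

```python
import numpy as np

def mp_density(x, c, s):
    """Marchenko-Pastur density of index c and scale s, supported on [a_-, a_+]."""
    a_m, a_p = s * (1 - np.sqrt(c))**2, s * (1 + np.sqrt(c))**2
    out = np.zeros_like(x)
    inside = (x > a_m) & (x < a_p)
    out[inside] = (np.sqrt((a_p - x[inside]) * (x[inside] - a_m))
                   / (2 * np.pi * c * s * x[inside]))
    return out

c, s = 0.5, 1.0
x = np.linspace(s * (1 - np.sqrt(c))**2, s * (1 + np.sqrt(c))**2, 200001)
p = mp_density(x, c, s)
mass = np.sum((p[1:] + p[:-1]) / 2 * np.diff(x))  # trapezoidal rule
print(mass)   # about 1: no point mass at 0 when c <= 1
```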

Lemma B.7.

In every CLFM, if N,T,N/Tc(0,){{N,T\to\infty,\ N/T\to c}}\in(0,\,\infty), then the limiting spectral measures of 𝐒\mathbf{S} and 𝐒~{\tilde{\mathbf{S}}} are both the Marčenko-Pastur probability measure of index cc and scale parameter σ2\sigma^{2}.

Proof.

This is proved in the same way as (Akama 2023, Section 7). Q.E.D.