
Exact Minimax Optimality of Spectral Methods in Phase Synchronization and Orthogonal Group Synchronization

Anderson Ye Zhang
 
University of Pennsylvania
Abstract

We study the performance of the spectral method for the phase synchronization problem with additive Gaussian noise and incomplete data. The spectral method utilizes the leading eigenvector of the data matrix followed by a normalization step. We prove that it achieves the minimax lower bound of the problem with a matching leading constant under a squared $\ell_2$ loss. This shows that the spectral method has the same performance as more sophisticated procedures, including maximum likelihood estimation, generalized power method, and semidefinite programming, as long as consistent parameter estimation is possible. To establish our result, we first introduce a novel choice of the population eigenvector, which enables us to establish the exact recovery of the spectral method when there is no additive noise. We then develop a new perturbation analysis toolkit for the leading eigenvector and show that it can be well approximated by its first-order approximation with a small $\ell_2$ error. We further extend our analysis to establish the exact minimax optimality of the spectral method for orthogonal group synchronization.

1 Introduction

We consider the phase synchronization problem with additive Gaussian noise and incomplete data [2, 5, 39, 18]. Let $z^*_1,\ldots,z^*_n \in \mathbb{C}_1$, where $\mathbb{C}_1 := \{x \in \mathbb{C} : |x| = 1\}$ is the set of all unit complex numbers. Each $z^*_j$ can then be written equivalently as $e^{i\theta^*_j}$ for some phase (or angle) $\theta^*_j \in [0, 2\pi)$. For each $1 \leq j < k \leq n$, the observation $X_{jk} \in \mathbb{C}$ is missing at random. Let $A_{jk} \in \{0,1\}$ and $X_{jk}$ satisfy

$$X_{jk} := \begin{cases} z_j^* \overline{z_k^*} + \sigma W_{jk}, & \text{if } A_{jk} = 1,\\ 0, & \text{if } A_{jk} = 0, \end{cases} \tag{1}$$

where $A_{jk} \sim \text{Bernoulli}(p)$ and $W_{jk} \sim \mathcal{CN}(0,1)$. That is, each $X_{jk}$ is missing with probability $1-p$, in which case it is recorded as $0$. If it is not missing, it is equal to $z_j^* \overline{z_k^*}$ plus an additive noise $\sigma W_{jk}$, where $W_{jk}$ follows the standard complex Gaussian distribution: ${\rm Re}(W_{jk}), {\rm Im}(W_{jk}) \sim \mathcal{N}(0, 1/2)$ independently. Each $A_{jk}$ is the indicator of whether $X_{jk}$ is observed. We assume all the random variables $\{A_{jk}\}_{1 \leq j < k \leq n}, \{W_{jk}\}_{1 \leq j < k \leq n}$ are independent of each other. The goal is to estimate the phase vector $z^* := (z_1^*, \ldots, z_n^*) \in \mathbb{C}_1^n$ from $\{A_{jk}\}_{1 \leq j < k \leq n}$ and $\{X_{jk}\}_{1 \leq j < k \leq n}$.
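As a concrete illustration, here is a minimal Python sketch of the observation model (1); the values of `n`, `p`, and `sigma` are illustrative choices, not ones taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 300, 0.5, 1.0          # illustrative values

theta = rng.uniform(0.0, 2 * np.pi, size=n)
z_star = np.exp(1j * theta)          # z*_j = e^{i theta*_j}

# Symmetric Bernoulli(p) mask A with zero diagonal.
A = np.triu((rng.random((n, n)) < p).astype(float), k=1)
A = A + A.T

# Standard complex Gaussian noise: real and imaginary parts are N(0, 1/2).
W = np.triu(rng.normal(0, np.sqrt(0.5), (n, n))
            + 1j * rng.normal(0, np.sqrt(0.5), (n, n)), k=1)
W = W + W.conj().T                   # Hermitian with zero diagonal

# Data matrix (2): X = A o (z* z*^H + sigma W).
X = A * (np.outer(z_star, z_star.conj()) + sigma * W)
```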

The observations can be seen as entries of a matrix $X \in \mathbb{C}^{n \times n}$ with $X_{jj} := 0$ and $X_{kj} := \overline{X_{jk}}$ for any $1 \leq j < k \leq n$. Define $A_{jj} := 0$ and $A_{kj} := A_{jk}$ for all $1 \leq j < k \leq n$. Then the matrix $A \in \{0,1\}^{n \times n}$ can be interpreted as the adjacency matrix of an Erdös-Rényi random graph with edge probability $p$. Define $W \in \mathbb{C}^{n \times n}$ such that $W_{jj} := 0$ and $W_{kj} := \overline{W_{jk}}$ for all $1 \leq j < k \leq n$. Then the matrices $A, W, X$ are all Hermitian and $X$ can be written equivalently as

$$X = A \circ \left(z^* z^{*\mathrm{H}} + \sigma W\right) = A \circ \left(z^* z^{*\mathrm{H}}\right) + \sigma A \circ W. \tag{2}$$

Note that $X$ can be seen as a noisy version of

$$\mathbb{E}X = p z^* z^{*\mathrm{H}} - p I_n \tag{3}$$

whose leading eigenvector is $z^*/\sqrt{n}$. This motivates the following spectral method [36, 18, 13]. Let $u \in \mathbb{C}^n$ be the leading eigenvector of $X$. Then the spectral estimator $\widehat{z} \in \mathbb{C}_1^n$ is defined as

$$\widehat{z}_j := \begin{cases} \frac{u_j}{|u_j|}, & \text{if } u_j \neq 0,\\ 1, & \text{if } u_j = 0, \end{cases} \tag{4}$$

for each $j \in [n]$, where each $u_j$ is normalized so that $\widehat{z}_j \in \mathbb{C}_1$. The performance of the spectral estimator can be quantified by the normalized squared $\ell_2$ loss

$$\ell(\widehat{z}, z^*) := \min_{a \in \mathbb{C}_1} \frac{1}{n} \sum_{j=1}^n \left|\widehat{z}_j - z_j^* a\right|^2, \tag{5}$$

where the minimum over $\mathbb{C}_1$ is due to the fact that $z^*_1, \ldots, z^*_n$ are identifiable only up to a phase.
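The estimator (4) and the loss (5) are straightforward to implement. Below is a minimal sketch continuing the simulation above; the closed-form minimizer over the global phase $a$ in (5) is a standard computation, not one carried out in the paper.

```python
def spectral_estimator(X):
    """Leading eigenvector of X followed by the entrywise normalization (4)."""
    u = np.linalg.eigh(X)[1][:, -1]   # eigh returns eigenvalues in ascending order
    mag = np.abs(u)
    return np.where(mag > 0, u / np.where(mag > 0, mag, 1.0), 1.0 + 0j)

def loss(z_hat, z_star):
    """Normalized squared l2 loss (5); the optimal phase is a = s/|s|
    with s = sum_j conj(z*_j) z_hat_j."""
    s = np.vdot(z_star, z_hat)
    a = s / np.abs(s) if np.abs(s) > 0 else 1.0
    return np.mean(np.abs(z_hat - z_star * a) ** 2)

z_hat = spectral_estimator(X)
print(loss(z_hat, z_star), sigma**2 / (2 * n * p))   # comparable in this regime
```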

The spectral estimator $\widehat{z}$ is simple and easy to implement. Regarding its theoretical performance, it was suggested in [18] that an upper bound $\ell(\widehat{z}, z^*) \leq C(\sigma^2 + 1)/(np)$ holds with high probability for some constant $C$. However, the minimax risk of the phase synchronization problem was established in [18] and has the following lower bound:

$$\inf_{z \in \mathbb{C}^n} \sup_{z^* \in \mathbb{C}_1^n} \mathbb{E}\,\ell(z, z^*) \geq (1 - o(1)) \frac{\sigma^2}{2np}. \tag{6}$$

To provably achieve the minimax risk, the spectral method is often used as an initialization for some more sophisticated procedures. For example, it was used to initialize a generalized power method (GPM) [7, 29, 31] in [18]. Nevertheless, numerically the performance of the spectral method is already very good and the improvement from GPM is often marginal. This raises the following questions about the performance of the spectral method: Can we derive a sharp upper bound? Does the spectral method already achieve the minimax risk or not?

In this paper, we provide complete answers to these questions. We carry out a sharp $\ell_2$ analysis of the performance of the spectral estimator $\widehat{z}$ and show that it achieves the minimax risk with the correct constant. Our main result is summarized below in Theorem 1 in asymptotic form. Its non-asymptotic version, given in Theorem 3, only requires $\frac{np}{\sigma^2}$ and $\frac{np}{\log n}$ to be greater than a certain constant. We note that in this paper, $p$ and $\sigma^2$ are not constants but functions of $n$. This dependence can be made explicit by writing $p_n$ and $\sigma^2_n$; however, for simplicity of notation and readability, we denote these by $p$ and $\sigma^2$ throughout the paper.

Theorem 1.

Assume $\frac{np}{\sigma^2} \rightarrow \infty$ and $\frac{np}{\log n} \rightarrow \infty$. There exists some $\delta = o(1)$ such that with high probability,

$$\ell(\widehat{z}, z^*) \leq (1 + \delta) \frac{\sigma^2}{2np}. \tag{7}$$

As a consequence, when $\sigma = 0$ (i.e., there is no additive noise), the spectral method recovers $z^*$ exactly (up to a phase) with high probability as long as $\frac{np}{\log n} \rightarrow \infty$.

Theorem 1 shows that $\widehat{z}$ is not only rate-optimal but also achieves the exact minimax risk with the correct leading constant in front of the optimal rate. The conditions in Theorem 1 are necessary for consistent estimation of $z^*$ in the phase synchronization problem. The condition $\frac{np}{\sigma^2} \rightarrow \infty$ is needed so that $z^*$ can be estimated with a vanishing error according to the minimax lower bound (6). The condition $\frac{np}{\log n} \rightarrow \infty$ allows $p$ to decrease as $n$ grows and is close to the condition $\frac{np}{\log n} > 1 + \epsilon$ required for $A$ to be connected; if $A$ has disjoint subgraphs, it is impossible to estimate $z^*$ up to a global phase. These two conditions are also needed in [18, 19] to establish the optimality of the MLE (maximum likelihood estimation), GPM, and SDP (semidefinite programming). Under these two conditions, [18] used the spectral method as an initialization for the GPM and showed that GPM achieves the minimax risk after $\log(1/\sigma^2)$ iterations. In contrast, Theorem 1 shows that the spectral method alone already achieves the minimax risk. This means that the spectral estimator is minimax optimal with the correct leading constant whenever consistent estimation is possible, and in this parameter regime it is as good as the MLE, GPM, and SDP.

There are two key novel components in establishing Theorem 1. The first is a new choice of the "population eigenvector", of which $u$ can be viewed as the sample counterpart obtained from the data. Due to (3) and the fact that $p z^* z^{*\mathrm{H}} = np (z^*/\sqrt{n})(z^*/\sqrt{n})^{\mathrm{H}}$ is rank-one with eigenvector $z^*/\sqrt{n}$, existing literature such as [18, 20, 27] treated $z^*/\sqrt{n}$ as the population eigenvector and studied the perturbation of $u$ with respect to $z^*/\sqrt{n}$. This seems natural but turns out to be unappealing, as it fails to explain why the spectral estimator is able to recover all phases exactly when $\sigma = 0$, i.e., when there is no additive noise. Instead, we denote by $\widecheck{u} \in \mathbb{R}^n$ the leading eigenvector of $A$ and regard $u^* \in \mathbb{C}^n$, defined as

$$u^* := z^* \circ \widecheck{u}, \tag{8}$$

i.e., $u^*_j = z^*_j \widecheck{u}_j$ for each $j \in [n]$, as the population eigenvector. Note that $u^*$ is random as it depends on the graph $A$. A careful analysis of $u^*$ reveals that it is the leading eigenvector of $A \circ z^* z^{*\mathrm{H}}$ (see Lemma 1). In addition, Proposition 1 shows that with high probability, $u^*_j/|u^*_j| = z^*_j$ for each $j \in [n]$, up to a global phase. Since $u$ equals $u^*$ when $\sigma = 0$, this successfully explains the exact recovery of the spectral method in the no-additive-noise case. Another advantage of viewing $u^*$ as the population eigenvector, instead of $z^*/\sqrt{n}$, is that intuitively $u^*$ is closer to $u$ than $z^*/\sqrt{n}$ is, because $A \circ z^* z^{*\mathrm{H}}$ is closer to the data matrix $X$ than $p z^* z^{*\mathrm{H}}$ is.
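This defining property of $u^*$ is easy to verify numerically; a quick check under the simulation sketched above (reusing `A` and `z_star`):

```python
# Verify the content of Lemma 1 numerically: u* = z* o u_check is an
# eigenvector of A o (z* z*^H), with eigenvalue the leading eigenvalue of A.
eigvals_A, eigvecs_A = np.linalg.eigh(A)
u_check = eigvecs_A[:, -1]                     # leading eigenvector of A
u_star = z_star * u_check                      # (8)

M = A * np.outer(z_star, z_star.conj())        # A o (z* z*^H)
print(np.allclose(M @ u_star, eigvals_A[-1] * u_star))   # True
```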

The second key component is a novel perturbation analysis for $u$. Classical matrix perturbation theory such as the Davis-Kahan theorem focuses on analyzing $\inf_{b \in \mathbb{C}_1} \|u - u^* b\|$. We go beyond it and show that $u$ can be well approximated by its first-order approximation $\widetilde{u}$, defined as

$$\widetilde{u} := \frac{X u^*}{\|X u^*\|}, \tag{9}$$

in the sense that the difference between these two vectors (up to a phase) has a small $\ell_2$ norm. Specifically, when $np \gtrsim \log n$ and $np \gtrsim \sigma^2$, we have

$$\inf_{b \in \mathbb{C}_1} \left\|u - \widetilde{u} b\right\| \lesssim \frac{\sigma^2 + \sigma}{np}, \tag{10}$$

with high probability (see Proposition 2). In fact, our perturbation analysis extends beyond the phase synchronization problem. What we establish is a general perturbation theory that can be applied to two arbitrary Hermitian matrices (see Lemma 2), which might be of independent interest.

With the help of these two key components, we then carry out an entrywise analysis of each $\widehat{z}_j = u_j/|u_j|$. Note that $u_j$ can be decomposed into $\widetilde{u}_j$ plus the difference between $u_j$ and $\widetilde{u}_j$ (up to a global phase). We can thus decompose the error of $\widehat{z}_j$ into two parts: one related to the estimation error of $\widetilde{u}_j/|\widetilde{u}_j|$, and the other related to the magnitude of the difference between $u_j$ and $\widetilde{u}_j$ (up to a global phase). Summing over all coordinates, the first part eventually leads to the minimax risk $(1 + o(1))\sigma^2/(2np)$ and the second part is essentially negligible due to (10), which leads to the exact minimax optimality of the spectral estimator.

Orthogonal Group Synchronization. The above analysis for phase synchronization can be extended to quantify the performance of the spectral method for orthogonal group synchronization, which concerns orthogonal matrices instead of phases. Let $d > 0$ be an integer. Define

$$\mathcal{O}(d) := \left\{U \in \mathbb{R}^{d \times d} : U U^{\mathrm{T}} = U^{\mathrm{T}} U = I_d\right\} \tag{11}$$

to be the set of all orthogonal matrices in $\mathbb{R}^{d \times d}$. Let $Z^*_1, \ldots, Z^*_n \in \mathcal{O}(d)$. Analogous to (1), we consider the problem with additive Gaussian noise and incomplete data [20, 26, 27]. For each $1 \leq j < k \leq n$, we observe $\mathcal{X}_{jk} := Z_j^* Z_k^{*\mathrm{T}} + \sigma \mathcal{W}_{jk} \in \mathbb{R}^{d \times d}$ when $A_{jk} = 1$, where $\mathcal{W}_{jk}$ follows the standard matrix Gaussian distribution. The goal is to recover $Z_1^*, \ldots, Z_n^*$ from $\{\mathcal{X}_{jk}\}_{1 \leq j < k \leq n}$ and $\{A_{jk}\}_{1 \leq j < k \leq n}$. This is known as orthogonal group synchronization (or $\mathcal{O}(d)$ synchronization).

The observations $\{\mathcal{X}_{jk}\}_{1 \leq j < k \leq n}$ can be seen as submatrices of an $nd \times nd$ matrix $\mathcal{X}$ with $\mathcal{X}_{jj} := 0$ and $\mathcal{X}_{kj} := \mathcal{X}_{jk}^{\mathrm{T}}$ for any $1 \leq j < k \leq n$. Then $\mathcal{X}$ is symmetric and can be seen as a noisy version of

$$\mathbb{E}\mathcal{X} = p Z^* Z^{*\mathrm{T}} - p I_{nd} \tag{12}$$

whose leading eigenspace is $Z^*/\sqrt{n}$, where $Z^* \in \mathcal{O}(d)^n$ is an $nd \times d$ matrix whose $j$th $d \times d$ submatrix is $Z^*_j$. Similar to phase synchronization, we have the following spectral method. Let $\lambda_1 \geq \ldots \geq \lambda_d$ be the largest $d$ eigenvalues of $\mathcal{X}$ and $u_1, \ldots, u_d \in \mathbb{R}^{nd}$ be their corresponding eigenvectors. Denote $U := (u_1, \ldots, u_d) \in \mathbb{R}^{nd \times d}$ as the eigenspace spanned by the top $d$ eigenvectors of $\mathcal{X}$. For each $j \in [n]$, denote $U_j \in \mathbb{R}^{d \times d}$ as its $j$th submatrix. Then the spectral estimator $\widehat{Z}_j \in \mathcal{O}(d)$ is defined as

$$\widehat{Z}_j := \begin{cases} \mathcal{P}(U_j), & \text{if } \det(U_j) \neq 0,\\ I_d, & \text{if } \det(U_j) = 0, \end{cases} \tag{13}$$

for each $j \in [n]$. Here the mapping $\mathcal{P} : \mathbb{R}^{d \times d} \rightarrow \mathcal{O}(d)$ is derived from the polar decomposition and serves as a normalization step for each $U_j$ so that $\widehat{Z}_j \in \mathcal{O}(d)$. Let $\widehat{Z} \in \mathcal{O}(d)^n$ be the $nd \times d$ matrix such that $\widehat{Z}_j$ is its $j$th submatrix for each $j \in [n]$. Then the performance of $\widehat{Z}$ can be quantified by a loss function $\ell^{\text{od}}(\widehat{Z}, Z^*)$ analogous to (5). The detailed definitions of $\mathcal{P}$ and $\ell^{\text{od}}$ are deferred to Section 3.

The spectral method $\widehat{Z}$ was used as an initialization in [20] for a variant of GPM to achieve the exact minimax risk $(1 + o(1))\frac{d(d-1)\sigma^2}{2np}$ for $d = O(1)$. To conduct a sharp analysis of its statistical performance, we extend our novel perturbation analysis from the leading eigenvector to the leading eigenspace. Recall that $\widecheck{u}$ is the leading eigenvector of $A$. Analogous to (8), we have a novel choice of the population eigenspace $U^*$, defined as

$$U^* := Z^* \circ (\widecheck{u} \otimes \mathds{1}_d), \tag{14}$$

and view $U$ as its sample counterpart. This differs from the existing literature [20, 40], which uses $Z^*/\sqrt{n}$ as the population eigenspace. Our choice of $U^*$ enables the establishment of the exact recovery of the spectral method when there is no additive noise (i.e., $\sigma = 0$), as seen in Proposition 3, and is closer to $U$ than $Z^*/\sqrt{n}$ is.

The first-order approximation of $U$ is a matrix determined by $\mathcal{X} U^*$, whose explicit expression will be given later in (22). We then show that $U$ can be well approximated by its first-order approximation, analogous to (9), with a remainder term of small $\ell_2$ norm (see Proposition 4). This is a consequence of a more general eigenspace perturbation theory (see Lemma 4) for two arbitrary symmetric matrices. Using the first-order approximation, we then carry out an entrywise analysis of $\widehat{Z}$. Our main result for the spectral method in $\mathcal{O}(d)$ synchronization is summarized below in Theorem 2. The non-asymptotic version is given in Theorem 4.

Theorem 2.

Assume $\frac{np}{\sigma^2} \rightarrow \infty$, $\frac{np}{\log n} \rightarrow \infty$, and $2 \leq d = O(1)$. There exists some $\delta = o(1)$ such that with high probability,

$$\ell^{\text{od}}(\widehat{Z}, Z^*) \leq (1 + \delta) \frac{d(d-1)\sigma^2}{2np}.$$

As a consequence, when $\sigma = 0$ (i.e., there is no additive noise), the spectral method recovers $Z^*$ exactly (up to an orthogonal matrix) with high probability as long as $\frac{np}{\log n} \rightarrow \infty$.

Theorem 2 shows that the spectral method $\widehat{Z}$ achieves exact minimax optimality, as it matches the minimax lower bound $(1 + o(1))\frac{d(d-1)\sigma^2}{2np}$ established in [20]. Similar to phase synchronization, the two conditions in Theorem 2 are needed so that consistent estimation of $Z^*$ is possible. They are also needed in [20] to achieve the minimax risk by a variant of GPM initialized with the spectral method, whereas Theorem 2 shows that in this parameter regime, the spectral method alone is already minimax optimal with the correct leading constant.

Related Literature. Synchronization is a fundamental problem in applied mathematics and statistics. Various methods have been studied for both phase synchronization and $\mathcal{O}(d)$ synchronization, including maximum likelihood estimation (MLE) [18, 39], GPM [39, 29, 18, 20, 26, 34], SDP [4, 37, 28, 16, 38, 19, 22], spectral methods [4, 37, 33, 8, 30, 17], and message passing [31, 25, 35]. The theoretical performance of spectral methods was investigated in [18, 20, 27], where crude error bounds under the $\ell_2$ or Frobenius norm were obtained. An $\ell_\infty$-type error bound for spectral methods was also given in [27].

Fine-grained perturbation analysis of eigenvectors has gained increasing attention in recent years for various low-rank matrix problems in machine learning and statistics. Existing literature has mostly focused on establishing $\ell_\infty$ bounds for eigenvectors [1, 12, 15] or $\ell_{2,\infty}$ bounds for eigenspaces [23, 11, 9, 3]. For instance, [1] developed $\ell_\infty$-type bounds for the difference between eigenvectors (or eigenspaces) and their first-order approximations. In this paper, we focus on developing sharp $\ell_2$-type perturbation bounds, where direct application of existing $\ell_\infty$-type results would result in extra logarithmic factors.

For the phase synchronization problem, [17, 21, 32] investigated variants of spectral methods based on Laplacian matrices. Instead of using the leading eigenvector of $X$ as in this paper, they utilize the eigenvector corresponding to the smallest eigenvalue of $D - X$ or $I_n - D^{-\frac{1}{2}} X D^{-\frac{1}{2}}$, where $D \in \mathbb{R}^{n \times n}$ is the degree matrix of $A$ with diagonal entries $D_{jj} := \sum_{k \neq j} A_{jk}$ and off-diagonal entries set to zero. These studies established upper bounds for the performance of their spectral methods applicable to general graphs $A$ and general additive noise $W$. Our focus, however, is on Erdös-Rényi random graphs with Gaussian noise. Under our setting, their results imply an upper bound of $\frac{C\sigma^2}{np}$, where $C$ is a constant significantly greater than $1$. In contrast, our work establishes a sharp upper bound with the correct leading constant $1/2$. Our analytical approach could potentially be extended to their methods to achieve the correct constant $1/2$.

The existing literature [18, 19, 20] explored the exact minimax risk in synchronization problems, focusing primarily on minimax lower bounds and on analyzing the MLE, GPM, and SDP. While our study shares a thematic resemblance with these prior efforts, it fundamentally diverges in both analysis and proof techniques. Previous studies hinge on contraction properties of the generalized power iteration (GPI), demonstrating the iterative reduction of the GPM error until an optimal error is achieved. This approach further interprets the MLE as a fixed point of the GPI and the SDP as an extension of the GPI in a higher-dimensional space, thereby establishing their optimality. In contrast, this paper employs a novel strategy specifically tailored to the spectral method. Instead of relying on the GPI framework, which proves inadequate for spectral analysis, we introduce a new perturbation toolkit designed for eigenvector analysis. This toolkit provides a precise characterization of eigenvector perturbation and leads to the optimality of the spectral method. It opens new avenues for research and application beyond synchronization problems.

Organization. We study phase synchronization in Section 2. We first establish the exact recovery of the spectral method in the no-additive-noise case in Section 2.1. Then in Section 2.2, we present our main technical tool for quantifying the distance between the leading eigenvector and its first-order approximation. We then carry out an entrywise analysis of the spectral method and obtain non-asymptotic sharp upper bounds in Section 2.3. Finally, we consider the extension to orthogonal group synchronization in Section 3. Proofs of results for phase synchronization are given in Section 5. Due to the page limit, we prove Lemma 4 in Section 6 and include the proofs of the other results for orthogonal group synchronization in the Appendix.

Notation. For any positive integer $n$, we write $[n] := \{1, 2, \ldots, n\}$ and $\mathds{1}_n := (1, 1, \ldots, 1) \in \mathbb{R}^n$. Denote $I_n$ as the $n \times n$ identity matrix and $J_n := \mathds{1}_n \mathds{1}_n^{\mathrm{T}} \in \mathbb{R}^{n \times n}$ as the $n \times n$ matrix with all entries equal to one. Given $a, b \in \mathbb{R}$, we denote $a \wedge b := \min\{a, b\}$ and $a \vee b := \max\{a, b\}$. For a complex number $x \in \mathbb{C}$, we use $\bar{x}$ for its complex conjugate, ${\rm Re}(x)$ for its real part, ${\rm Im}(x)$ for its imaginary part, and $|x|$ for its modulus. Denote $\mathbb{S}_n := \{x \in \mathbb{C}^n : \|x\| = 1\}$ as the set of all unit vectors in $\mathbb{C}^n$. For a complex vector $x = (x_j) \in \mathbb{C}^d$, we denote $\|x\| = (\sum_{j=1}^d |x_j|^2)^{1/2}$ as its Euclidean norm. For a complex matrix $B = (B_{jk}) \in \mathbb{C}^{d_1 \times d_2}$, we use $B^{\mathrm{H}} \in \mathbb{C}^{d_2 \times d_1}$ for its conjugate transpose, so that $B^{\mathrm{H}}_{jk} = \overline{B_{kj}}$. The Frobenius norm and the operator norm of $B$ are defined by $\|B\|_{\rm F} := (\sum_{j=1}^{d_1} \sum_{k=1}^{d_2} |B_{jk}|^2)^{1/2}$ and $\|B\| := \sup_{u \in \mathbb{C}^{d_1}, v \in \mathbb{C}^{d_2} : \|u\| = \|v\| = 1} u^{\mathrm{H}} B v$. We use $\text{Tr}(B)$ for the trace of a square matrix $B$. We denote $B_{j\cdot}$ as the $j$th row of $B$ and define $\|B\|_{2\rightarrow\infty} := \max_{j \in [d_1]} \|B_{j\cdot}\|$. The notation $\det(\cdot)$ and $\otimes$ are used for the determinant and the Kronecker product. For $U, V \in \mathbb{C}^{d_1 \times d_2}$, $U \circ V \in \mathbb{C}^{d_1 \times d_2}$ is the Hadamard product $U \circ V := (U_{jk} V_{jk})$. For any $B \in \mathbb{R}^{d_1 \times d_2}$, we denote $s_{\min}(B)$ as its smallest singular value. For two positive sequences $\{a_n\}$ and $\{b_n\}$, $a_n \lesssim b_n$ and $a_n = O(b_n)$ both mean $a_n \leq C b_n$ for some constant $C > 0$ independent of $n$. We also write $a_n = o(b_n)$ or $\frac{b_n}{a_n} \rightarrow \infty$ when $\limsup_n \frac{a_n}{b_n} = 0$. We use $\mathbb{I}\{\cdot\}$ for the indicator function. Define $\mathcal{O}(d_1, d_2) := \{V \in \mathbb{R}^{d_1 \times d_2} : V^{\mathrm{T}} V = I_{d_2}\}$ as the set of all $d_1 \times d_2$ matrices with orthonormal columns.

2 Phase Synchronization

2.1 No-additive-noise Case

We first study a special case where there is no additive noise (i.e., $\sigma = 0$). In this setting, the data matrix is $X = A \circ z^* z^{*\mathrm{H}}$. Despite the data still being missing at random, we are going to show that the spectral method is able to recover $z^*$ exactly, up to a phase.

Recall that $\widecheck{u}$ is the leading eigenvector of $A$ and $u^*$ is defined in (8). The following lemma points out the connection between $u^*$ and $A \circ z^* z^{*\mathrm{H}}$, as well as the connection between the eigenvalues of $A$ and those of $A \circ z^* z^{*\mathrm{H}}$.

Lemma 1.

The unit vector $u^*$ is the leading eigenvector of $A \circ z^* z^{*\mathrm{H}}$. That is, with $\lambda^*$ denoting the largest eigenvalue of $A \circ z^* z^{*\mathrm{H}}$, we have

$$\left(A \circ z^* z^{*\mathrm{H}}\right) u^* = \lambda^* u^*. \tag{15}$$

In addition, all the eigenvalues of $A$ are also eigenvalues of $A \circ z^* z^{*\mathrm{H}}$, and vice versa.

Since $X = A \circ z^* z^{*\mathrm{H}}$ in the no-additive-noise case, we have $u = u^*$. Note that $\widehat{z}_j = u_j/|u_j| = u^*_j/|u^*_j| = z^*_j \widecheck{u}_j / |z^*_j \widecheck{u}_j|$ for each $j \in [n]$. If $\widecheck{u}_j > 0$, we have $\widehat{z}_j = z_j^*$. If instead $\widecheck{u}_j < 0$, then $\widehat{z}_j = -z_j^*$. Hence, if all the coordinates of $\widecheck{u}$ are positive (or all negative), $\widehat{z}$ equals $z^*$ (or $-z^*$) exactly. The following proposition provides an $\ell_\infty$ control of the difference between $\widecheck{u}$ and $\mathds{1}_n/\sqrt{n}$, which are the leading eigenvectors of $A$ and $\mathbb{E}A$, respectively. The proof of (16) follows proofs of results in [1]. When the right-hand side of (16) is smaller than $1/\sqrt{n}$, it immediately establishes the exact recovery of $\widehat{z}$.

Proposition 1.

There exist some constants $C_1, C_2 > 0$ such that if $\frac{np}{\log n} > C_1$, we have

$$\min_{b \in \{1, -1\}} \max_{j \in [n]} \left|\widecheck{u}_j - \frac{1}{\sqrt{n}} b\right| \leq C_2 \left(\sqrt{\frac{\log n}{np}} + \frac{1}{\log(np)}\right) \frac{1}{\sqrt{n}}, \tag{16}$$

with probability at least $1 - 8n^{-10}$. As a result, if $\frac{np}{\log n} > \max\{C_1, 2C_2^2\}$, we have $\ell(\widehat{z}, z^*) = 0$ with probability at least $1 - 8n^{-10}$.
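As a quick numerical illustration of Proposition 1, reusing `A`, `z_star`, and the helper functions from the sketches in Section 1:

```python
# Noiseless case sigma = 0: X = A o (z* z*^H), so u = u* and the spectral
# estimator recovers z* exactly up to a global phase.
X0 = A * np.outer(z_star, z_star.conj())
print(loss(spectral_estimator(X0), z_star))    # 0 up to machine precision
```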

Lemma 1 and Proposition 1 together establish the exact recovery of $\widehat{z}$ in the special case where $\sigma = 0$, through the study of $u^*$. This provides a starting point for our analysis of the general case where $\sigma \neq 0$. From (2), the data matrix $X$ is a noisy version of $A \circ z^* z^{*\mathrm{H}}$ with additive noise $\sigma A \circ W$ that scales with $\sigma$. As a result, in the following sections, we view $u^*$ as the population eigenvector and $u$ as its sample counterpart when studying the performance of the spectral method.

2.2 First-order Approximation of The Leading Eigenvector

In this section, we provide a fine-grained perturbation analysis of the eigenvector $u$. Classical matrix perturbation theory, such as the Davis-Kahan theorem, can only give a crude upper bound for $\inf_{b \in \mathbb{C}_1} \|u - u^* b\|$, which turns out to be insufficient for deriving a sharp bound on $\ell(\widehat{z}, z^*)$. Instead, we develop a more powerful tool for the perturbation analysis of $u$ using its first-order approximation $\widetilde{u}$ defined in (9). In fact, our tool goes beyond the phase synchronization problem and can be applied to arbitrary Hermitian matrices.

Lemma 2.

Consider two Hermitian matrices $Y, Y^* \in \mathbb{C}^{n \times n}$. Let $\mu^*_1 \geq \mu^*_2 \geq \ldots \geq \mu^*_n$ be the eigenvalues of $Y^*$. Let $v^*$ (resp. $v$) be the eigenvector of $Y^*$ (resp. $Y$) corresponding to its largest eigenvalue. If $\|Y - Y^*\| \leq \min\{\mu_1^* - \mu_2^*, \mu_1^*\}/4$, we have

$$\inf_{b \in \mathbb{C}_1} \left\|v - \frac{Y v^*}{\|Y v^*\|} b\right\| \leq \frac{40\sqrt{2}}{9(\mu_1^* - \mu_2^*)} \left(\left(\frac{4}{\mu_1^* - \mu_2^*} + \frac{2}{\mu_1^*}\right) \|Y - Y^*\|^2 + \frac{\max\{|\mu_2^*|, |\mu_n^*|\}}{\mu_1^*} \|Y - Y^*\|\right).$$

In Lemma 2, there are two matrices $Y, Y^*$ whose leading eigenvectors are $v, v^*$, respectively. The lemma studies the $\ell_2$ difference between $v$ and $Y v^*/\|Y v^*\|$, up to a phase. Let $\mu_1$ be the largest eigenvalue of $Y$. The unit vector $Y v^*/\|Y v^*\|$ is interpreted as the first-order approximation of $v$, as $v$ can be decomposed into $v = Y v/\mu_1 = Y v^*/\mu_1 + Y(v - v^*)/\mu_1$, where the first term $Y v^*/\mu_1$ is proportional to $Y v^*/\|Y v^*\|$. If $Y^*$ is rank-one, meaning $\mu_2^* = \mu^*_n = 0$, the upper bound in Lemma 2 becomes $80\sqrt{2}\|Y - Y^*\|^2/(3\mu_1^{*2})$. Lemma 2 itself might be of independent interest and be useful in other low-rank matrix problems.

The key to Lemma 2 is the following equation. Since $\mu_1 v = Y v$ and $\|Y v^*\| \frac{Y v^*}{\|Y v^*\|} = Y v^*$, we can derive (see (29) in the proof of Lemma 2):

$$\mu_1^{-1} \|Y v^*\| (\mu_1 I_n - Y) \left(v - \frac{Y v^*}{\|Y v^*\|}\right) = Y(\mu_1^{-1} Y v^* - v^*).$$

Its left-hand side can be shown to be related to $\inf_{b \in \mathbb{C}_1} \|v - Y v^* b/\|Y v^*\|\|$. By carefully studying and upper bounding its right-hand side, which does not involve $v$, we derive Lemma 2.

Lemma 2 requires that the perturbation between $Y$ and $Y^*$ be small not only compared to the eigengap $\mu^*_1 - \mu^*_2$, but also compared to the leading eigenvalue $\mu^*_1$. A similar requirement is also needed in [1] to establish $\ell_\infty$ bounds for the difference between the eigenvector and its first-order approximation. In contrast, classical theory such as the Davis-Kahan theorem (see Lemma 5) only needs the perturbation to be small compared to the eigengap in order to bound $\inf_{b \in \mathbb{C}_1} \|v - v^* b\|$. A natural question is whether the bound in Lemma 2 can be modified to depend on the eigenvalues only through the eigengaps. It turns out this is not feasible, as the lemma deals with the distance between $v$ and its first-order approximation $Y v^*/\|Y v^*\|$, not the distance between $v$ and $v^*$ as in the Davis-Kahan theorem. To illustrate this, consider the following counterexample. Let $e_1, \ldots, e_n$ be the canonical basis of $\mathbb{R}^n$ and let $\delta > 0$. Define

$$Y^* := \text{diag}(0, -1, -1, \ldots, -1) \in \mathbb{R}^{n \times n}, \quad \text{and} \quad Y := Y^* + \delta(e_1 + e_2)(e_1 + e_2)^{\mathrm{T}}/2. \tag{17}$$

Then $\mu_1^* = 0$, $\mu_2^* = -1$, $\mu^*_1 - \mu^*_2 = 1$, $v^* = e_1$, $\|Y - Y^*\| = \delta$, and $Y v^*/\|Y v^*\| = (e_1 + e_2)/\sqrt{2}$. We can show that $v$ has the following explicit expression (see Appendix C for the detailed calculation):

$$v = \sqrt{\frac{1}{2}\left(1 + \frac{1}{\sqrt{1 + \delta^2}}\right)}\, e_1 + \sqrt{\frac{1}{2}\left(1 - \frac{1}{\sqrt{1 + \delta^2}}\right)}\, e_2. \tag{18}$$

When $\delta$ is sufficiently close to $0$, we have $v \approx v^*$. This is not surprising, as it is consistent with the bound from the Davis-Kahan theorem: the ratio between the perturbation and the eigengap is $\|Y - Y^*\|/(\mu^*_1 - \mu^*_2) = \delta \approx 0$. On the other hand, $\|v - Y v^*/\|Y v^*\|\| \approx \|e_1 - (e_1 + e_2)/\sqrt{2}\| = \sqrt{2 - \sqrt{2}} > 0$ no matter how small $\delta$ may be. As a result, in this counterexample, $Y v^*/\|Y v^*\|$ is not a good approximation of $v$ despite the arbitrarily small perturbation.
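The counterexample is easy to verify numerically; a minimal sketch (with an arbitrary small choice of $\delta$):

```python
import numpy as np

n, delta = 5, 1e-3
e = np.eye(n)
Y_star = np.diag([0.0] + [-1.0] * (n - 1))
Y = Y_star + delta * np.outer(e[0] + e[1], e[0] + e[1]) / 2

v = np.linalg.eigh(Y)[1][:, -1]
v = v * np.sign(v[0])                          # fix the sign convention
approx = (e[0] + e[1]) / np.sqrt(2)            # Y v* / ||Y v*||

print(np.linalg.norm(v - e[0]))                # ~ delta/2: v stays close to v*
print(np.linalg.norm(v - approx))              # ~ sqrt(2 - sqrt(2)) ~= 0.77
```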

Applying Lemma 2 to the phase synchronization, we have the following result.

Proposition 2.

There exist constants $C_1, C_2, C_3 > 0$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\inf_{b \in \mathbb{C}_1} \left\|u - \widetilde{u} b\right\| \leq C_3 \frac{\sigma^2 + \sigma}{np},$$

with probability at least $1 - 3n^{-10}$.

Proposition 2 shows that $u$ is well approximated by its first-order approximation $\widetilde{u}$ (up to a phase), with an approximation error of order at most $(\sigma^2 + \sigma)/(np)$. Note that we can show $\inf_{b \in \mathbb{C}_1} \|u - u^* b\|$ is of order $\sigma/\sqrt{np}$ using the Davis-Kahan theorem. This is much larger than the upper bound derived in Proposition 2, particularly when $np/\sigma^2$ is large. As a result, $\widetilde{u}$ provides a precise characterization of $u$ with negligible $\ell_2$ error.
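Under the simulation from Section 1, the gap between the two bounds can be observed directly; a sketch reusing `X`, `u_star`, `n`, `p`, and `sigma` from the earlier code:

```python
# Compare the first-order approximation error of Proposition 2 with the
# Davis-Kahan-type rate sigma/sqrt(np).
u = np.linalg.eigh(X)[1][:, -1]
u_tilde = X @ u_star
u_tilde = u_tilde / np.linalg.norm(u_tilde)    # (9)

s = np.vdot(u_tilde, u)                        # optimal aligning phase b = s/|s|
b = s / np.abs(s)
print(np.linalg.norm(u - u_tilde * b))         # small: order (sigma^2 + sigma)/(np)
print(sigma / np.sqrt(n * p))                  # much larger when np >> sigma^2
```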

2.3 Sharp $\ell_2$ Analysis of The Spectral Estimator

In this section, we conduct a sharp analysis of the performance of the spectral estimator $\widehat{z}$ using the first-order approximation $\widetilde{u}$ of the eigenvector $u$. According to Proposition 2, $u$ is close to $\widetilde{u}$ (up to a phase) with a small difference. Intuitively, then, $\widehat{z}$ should be close to its counterpart that uses $\widetilde{u}$ instead of $u$ in (4), up to a global phase. For each $j \in [n]$, the distance of $\widetilde{u}_j/|\widetilde{u}_j|$ from $z^*_j$ is essentially determined by $\overline{z^*_j} \widetilde{u}_j$. By the definition in (9), $\widetilde{u}_j$ is proportional to $[X u^*]_j$, the $j$th coordinate of $X u^*$. With (2), this leads to $\overline{z^*_j} \widetilde{u}_j \propto \lambda^* \overline{z^*_j} u^*_j + \sigma \sum_{k \neq j} A_{jk} W_{jk} \overline{z^*_j} u_k^*$. Here the first term $\lambda^* \overline{z^*_j} u^*_j$ can be interpreted as the signal, as it is related to the population quantity $u^*_j$, which gives the exact recovery of the spectral method in the no-additive-noise case in Proposition 1. Since $u^*$ is close to $z^*/\sqrt{n}$, the second term is approximately equal to $n^{-1/2} \sum_{k \neq j} A_{jk} W_{jk} \overline{z^*_j} z_k^*$. Its contribution to the estimation error is essentially determined by its imaginary part $n^{-1/2} {\rm Im}(\sum_{k \neq j} A_{jk} W_{jk} \overline{z^*_j} z_k^*)$, which can be interpreted as the main error term. Summing over all $j \in [n]$, the signals and the main error terms together lead to the minimax risk $\sigma^2/(2np)$. At the same time, the contributions of approximation errors such as $\inf_{b \in \mathbb{C}_1} \|u - \widetilde{u} b\|$ turn out to be negligible. This leads to the following theorem on the performance of the spectral estimator.

Theorem 3.

There exist constants $C_1, C_2, C_3 > 0$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\ell(\widehat{z}, z^*) \leq \left(1 + C_3\left(\left(\frac{\sigma^2}{np}\right)^{\frac{1}{4}} + \sqrt{\frac{\log n}{np}} + \frac{1}{\log(np)}\right)\right) \frac{\sigma^2}{2np},$$

with probability at least $1 - n^{-9} - \exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^2}\right)^{\frac{1}{4}}\right)$.

Theorem 3 is non-asymptotic, and its asymptotic version is presented in Theorem 1. It covers the no-additive-noise case (i.e., Proposition 1), as it implies that $\ell(\widehat{z}, z^*) = 0$ with high probability when $\sigma = 0$. Theorem 3 shows that $\ell(\widehat{z}, z^*)$ equals $\sigma^2/(2np)$ up to a factor determined by $(\sigma^2/(np))^{1/4}$, $\sqrt{\log n/(np)}$, and $1/\log(np)$. The first term is related to various approximation errors, including the one from Proposition 2. The second and third terms are derived from (16).

We can compare Theorem 3 with the existing result $\ell(\widehat{z}, z^*) \lesssim (\sigma^2 + 1)/(np)$ of [18]. There are two main improvements. First, we obtain the exact constant $1/2$ for the error term $\frac{\sigma^2}{np}$, which gives a more accurate characterization of the performance of the spectral estimator. Second, the $1/(np)$ error term in $(\sigma^2 + 1)/(np)$ no longer appears in Theorem 3. We further compare Theorem 3 with the minimax lower bound for the phase synchronization problem. The paper [18] proved that there exist constants $C_4, C_5 > 0$ such that if $\frac{np}{\sigma^2} \geq C_4$, we have

$$\inf_{z \in \mathbb{C}^n} \sup_{z^* \in \mathbb{C}_1^n} \mathbb{E}\,\ell(z, z^*) \geq \left(1 - C_5\left(\frac{\sigma^2}{np} + \frac{1}{n}\right)\right) \frac{\sigma^2}{2np}. \tag{19}$$

Compared with (19), the spectral estimator $\widehat{z}$ is exact minimax optimal, as it achieves not only the correct rate $\sigma^2/(np)$ but also the correct constant $1/2$. Under the parameter regime of Theorem 3, [18, 19] showed that the MLE, GPM (if properly initialized), and SDP achieve the exact minimax risk. Theorem 3 points out that the spectral method is as good as these methods.

3 Orthogonal Group Synchronization

In this section, we extend our analysis to the synchronization problem where the quantities of interest are orthogonal matrices instead of phases. The orthogonal group synchronization problem was briefly introduced in Section 1; here we provide more details.

Let $d > 0$ be an integer. Recall the definition of $\mathcal{O}(d)$ in (11) and that $Z^*_1, \ldots, Z^*_n \in \mathcal{O}(d)$. For each $1 \leq j < k \leq n$, the observation $\mathcal{X}_{jk} \in \mathbb{R}^{d \times d}$ is given by

$$\mathcal{X}_{jk} := \begin{cases} Z_j^* Z_k^{*\mathrm{T}} + \sigma \mathcal{W}_{jk}, & \text{if } A_{jk} = 1,\\ 0, & \text{if } A_{jk} = 0, \end{cases} \tag{20}$$

where $A_{jk} \sim \text{Bernoulli}(p)$ and $\mathcal{W}_{jk} \sim \mathcal{MN}(0, I_d, I_d)$, i.e., the standard matrix Gaussian distribution. (A random matrix $X$ follows the matrix Gaussian distribution $\mathcal{MN}(M, \Sigma, \Omega)$ if its density function is proportional to $\exp\left(-\frac{1}{2}\text{Tr}\left(\Omega^{-1}(X - M)^{\mathrm{T}} \Sigma^{-1}(X - M)\right)\right)$.) We assume $\{A_{jk}\}_{1 \leq j < k \leq n}, \{\mathcal{W}_{jk}\}_{1 \leq j < k \leq n}$ are all independent of each other. Similar to the phase synchronization problem, the observations are missing at random and carry additive Gaussian noise. The goal is to recover $Z_1^*, \ldots, Z_n^*$ from $\{\mathcal{X}_{jk}\}_{1 \leq j < k \leq n}$ and $\{A_{jk}\}_{1 \leq j < k \leq n}$.

The data matrix $\mathcal{X} \in \mathbb{R}^{nd \times nd}$ can be written equivalently in a way analogous to (2). Define $A_{jj} := 0$ and $A_{kj} := A_{jk}$ for all $1 \leq j < k \leq n$. Define $\mathcal{W} \in \mathbb{R}^{nd \times nd}$ such that $\mathcal{W}_{jj} := 0_{d \times d}$ and $\mathcal{W}_{kj} := \mathcal{W}_{jk}^{\mathrm{T}}$ for all $1 \leq j < k \leq n$. Then we have the expression:

$$\mathcal{X} = (A \otimes J_d) \circ (Z^* Z^{*\mathrm{T}} + \sigma \mathcal{W}) = (A \otimes J_d) \circ Z^* Z^{*\mathrm{T}} + \sigma (A \otimes J_d) \circ \mathcal{W}. \tag{21}$$

From (12), the data matrix $\mathcal{X}$ can be seen as a noisy version of $p Z^* Z^{*\mathrm{T}}$. Since the columns of $Z^*$ are orthogonal to each other, we have the eigendecomposition $p Z^* Z^{*\mathrm{T}} = np (Z^*/\sqrt{n})(Z^*/\sqrt{n})^{\mathrm{T}}$, where $Z^*/\sqrt{n} \in \mathcal{O}(nd, d)$. That is, $np$ is the only non-zero eigenvalue of $p Z^* Z^{*\mathrm{T}}$, with multiplicity $d$.

The definition of the spectral estimator $\widehat{Z}_1, \ldots, \widehat{Z}_n$ is given in (13). The mapping $\mathcal{P} : \mathbb{R}^{d \times d} \rightarrow \mathcal{O}(d)$ comes from the polar decomposition and is defined as follows. Any full-rank matrix $B \in \mathbb{R}^{d \times d}$ admits a singular value decomposition (SVD) $B = M D V^{\mathrm{T}}$ with $M, V \in \mathcal{O}(d)$ and $D$ diagonal. Its polar decomposition is then $B = (M V^{\mathrm{T}})(V D V^{\mathrm{T}})$, and $\mathcal{P}(B) := M V^{\mathrm{T}}$ is defined as its first factor.
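A minimal sketch of the map $\mathcal{P}$ via the SVD; the helper name `polar_factor` is ours. A standard fact worth noting is that $\mathcal{P}(B)$ is the orthogonal matrix closest to $B$ in Frobenius norm.

```python
import numpy as np

def polar_factor(B):
    """The map P from the polar decomposition: if B = M D V^T (SVD) is
    full-rank, return the orthogonal factor P(B) = M V^T."""
    M, _, Vt = np.linalg.svd(B)
    return M @ Vt
```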

Recall that $\widecheck{u}$ is the leading eigenvector of $A$ and that the population eigenspace $U^*$ is defined in (14). That is, $U^* \in \mathbb{R}^{nd \times d}$ and its $j$th submatrix is $U^*_j = \widecheck{u}_j Z^*_j \in \mathbb{R}^{d \times d}$ for each $j \in [n]$. Following the proof of Lemma 1, we can show that $U^*$ is the leading eigenspace of $(A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}$:

Lemma 3.

Denote $\lambda^*_1 \geq \lambda^*_2 \geq \ldots \geq \lambda^*_{nd}$ as the eigenvalues of $(A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}$. Then $\lambda^*_1 = \lambda^*_2 = \ldots = \lambda^*_d$, all equal to the leading eigenvalue of $A$. In addition, $\lambda^*_{d+1}$ is equal to the second largest eigenvalue of $A$. Furthermore, $U^*$ is the eigenspace of $(A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}$ corresponding to $\lambda^*_1$, i.e.,

$$((A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}) U^* = \lambda^*_1 U^*.$$

Following the proof of Proposition 1, and in particular using (16), we can further establish the exact recovery of $\widehat{Z}$, up to an orthogonal matrix, in the no-additive-noise case.

Proposition 3.

Consider the no-additive-noise case where $\sigma = 0$. There exists some constant $C_1 > 0$ such that if $\frac{np}{\log n} > C_1$, we have $\ell^{\text{od}}(\widehat{Z}, Z^*) = 0$ with probability at least $1 - 7n^{-10}$.

Similar to phase synchronization, we can study the first-order approximation of the eigenspace $U$. Denote $\Lambda := \text{diag}(\lambda_1, \ldots, \lambda_d) \in \mathbb{R}^{d \times d}$ as the diagonal matrix of the $d$ largest eigenvalues of $\mathcal{X}$. Then $U$ can be expressed as $U = \mathcal{X} U \Lambda^{-1}$. Define

$$\widetilde{U} := \mathop{\rm argmin}_{U' \in \mathcal{O}(nd, d)} \left\|U' - \mathcal{X} U^*\right\|_{\rm F}^2. \tag{22}$$

Then $\widetilde{U}$ is the projection of $\mathcal{X} U^*$ onto $\mathcal{O}(nd, d)$. This is similar to the definition of $\widetilde{u}$ in (9) for phase synchronization, where $\widetilde{u}$ is the projection of $X u^*$ onto the unit sphere. As a result, $\widetilde{U}$ can be regarded as the first-order approximation of $U$.

The following lemma provides an upper bound on the distance between the leading eigenspace and its first-order approximation for two arbitrary symmetric matrices. It extends Lemma 2, which only concerns the perturbation of a leading eigenvector. The proof of Lemma 4 follows that of Lemma 2 but is more involved, as it needs to deal with matrix multiplication, which is not commutative.

Lemma 4.

Consider two symmetric matrices $Y, Y^* \in \mathbb{R}^{n \times n}$. Let $\mu^*_1 \geq \mu^*_2 \geq \ldots \geq \mu^*_n$ be the eigenvalues of $Y^*$. Let $V^* \in \mathbb{R}^{n \times d}$ (resp. $V$) be the leading eigenspace of $Y^*$ (resp. $Y$) corresponding to its $d$ largest eigenvalues. Define $\widetilde{V} := \mathop{\rm argmin}_{V' \in \mathcal{O}(n, d)} \|V' - Y V^*\|_{\rm F}^2$. If $\|Y - Y^*\| \leq \min\{\mu_d^* - \mu_{d+1}^*, \mu_d^*\}/4$, we have

$$\inf_{O \in \mathcal{O}(d)} \left\|V - \widetilde{V} O\right\| \leq \frac{16\sqrt{2}}{3(\mu_d^* - \mu_{d+1}^*)\mu_d^*} \left(\frac{2\mu_1^*}{3(\mu_d^* - \mu_{d+1}^*)} + 1\right) \|Y - Y^*\|^2 + \frac{8\sqrt{2}}{3(\mu_d^* - \mu_{d+1}^*)\mu_d^*} \left(\frac{4\mu_1^*(\mu_1^* - \mu_d^*)}{\mu_d^* - \mu_{d+1}^*} + 2(\mu_1^* - \mu_d^*) + \max\{|\mu^*_{d+1}|, |\mu^*_n|\}\right) \|Y - Y^*\|.$$

Lemma 4 includes Lemma 2 as a special case when $d = 1$. For $d > 1$, if $\mu_1^* = \mu_d^*$, i.e., the largest $d$ eigenvalues of $Y^*$ are all equal, the upper bound in Lemma 4 simplifies to

$$\inf_{O \in \mathcal{O}(d)} \left\|V - \widetilde{V} O\right\| \lesssim \frac{1}{\mu_d^* - \mu_{d+1}^*} \left(\left(\frac{1}{\mu_d^* - \mu_{d+1}^*} + \frac{1}{\mu_d^*}\right) \|Y - Y^*\|^2 + \frac{\max\{|\mu^*_{d+1}|, |\mu^*_n|\}}{\mu_d^*} \|Y - Y^*\|\right),$$

which is similar in form to the upper bound in Lemma 2. This simplified bound can be used in the $\mathcal{O}(d)$ synchronization problem, since $\lambda_1^*$ is shown to be equal to $\lambda_d^*$ in Lemma 3. A direct application of it leads to the following proposition on the perturbation between $U$ and $\widetilde{U}$.

Proposition 4.

Assume $2 \leq d \leq C_0$ for some constant $C_0 > 0$. There exist constants $C_1, C_2, C_3 > 0$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\inf_{O \in \mathcal{O}(d)} \left\|U - \widetilde{U} O\right\| \leq C_3 \frac{\sigma^2 d + \sigma\sqrt{d}}{np},$$

with probability at least $1 - 6n^{-10}$.

When $d = 1$, Proposition 4 reduces to Proposition 2. With Proposition 4, we can carry out a sharp $\ell_2$ analysis of the performance of the spectral estimator $\widehat{Z}$ using $\widetilde{U}$. The loss function is defined analogously to (5) as

$$\ell^{\text{od}}(\widehat{Z}, Z^*) := \min_{O \in \mathcal{O}(d)} \frac{1}{n} \sum_{j=1}^n \left\|\widehat{Z}_j - Z_j^* O\right\|_{\rm F}^2.$$
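The minimization over $O \in \mathcal{O}(d)$ is an orthogonal Procrustes problem with a closed-form solution; a sketch using the `polar_factor` helper above, where `Z_hat` and `Z_star` are $nd \times d$ arrays stacking the blocks $\widehat{Z}_j$ and $Z^*_j$:

```python
def loss_od(Z_hat, Z_star, d):
    """Loss l^od: the optimal O is the polar factor of sum_j Z*_j^T Zhat_j."""
    n = Z_hat.shape[0] // d
    O = polar_factor(Z_star.T @ Z_hat)         # solves the Procrustes problem
    return np.linalg.norm(Z_hat - Z_star @ O) ** 2 / n
```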

With this definition, we have the following theorem, which parallels Theorem 3. Its asymptotic version is given in Theorem 2. The proof of Theorem 4 follows that of Theorem 3 but is more complicated due to the presence of the mapping $\mathcal{P}$ in the definition of the spectral method. To prove Theorem 4, note that for each $j \in [n]$, $\|\widehat{Z}_j - Z^*_j\|_{\rm F} = \|\mathcal{P}(U_j) - Z^*_j\|_{\rm F} = \|\mathcal{P}(Z_j^{*\mathrm{T}} U_j) - I_d\|_{\rm F}$, where $Z_j^{*\mathrm{T}} U_j$ can be approximated by $Z_j^{*\mathrm{T}} \widetilde{U}_j$ according to Proposition 4. The term $Z_j^{*\mathrm{T}} \widetilde{U}_j$ can be further expanded using (21) and Lemma 3, leading to $\sum_{k \neq j} A_{jk} Z_j^{*\mathrm{T}} \mathcal{W}_{jk} Z^*_k$ plus several approximation error terms. A careful analysis of $\sum_{k \neq j} A_{jk} Z_j^{*\mathrm{T}} \mathcal{W}_{jk} Z^*_k$ eventually leads to the minimax risk $d(d-1)\sigma^2/(2np)$, and all the other error terms turn out to be negligible.

Theorem 4.

Assume $2 \leq d \leq C_0$ for some constant $C_0 > 0$. There exist constants $C_1, C_2, C_3$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\ell^{\text{od}}(\widehat{Z}, Z^*) \leq \left(1 + C_3\left(\left(\frac{\sigma^2}{np}\right)^{\frac{1}{4}} + \sqrt{\frac{\log n}{np}} + \frac{1}{\log(np)}\right)\right) \frac{d(d-1)\sigma^2}{2np}$$

with probability at least $1 - n^{-9} - \exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^2}\right)^{\frac{1}{4}}\right)$.

We can compare the upper bound in Theorem 4 with existing results for $\mathcal{O}(d)$ synchronization. [20] derived an upper bound for the spectral method: $\ell^{\text{od}}(\widehat{Z}, Z^*) \lesssim d^4(1 + \sigma^2 d)/(np)$ with high probability. In comparison, our upper bound has the smaller factor $d(d-1)/2$ in front of $\sigma^2/(np)$. In addition, it does not have the $d^4/(np)$ error term. The paper [20] also established the minimax lower bound: when $2 \leq d \leq C_0$, there exist constants $C_4, C_5 > 0$ such that if $\frac{np}{\sigma^2} > C_4$, we have

$$\inf_{Z \in \mathcal{O}(d)^n} \sup_{Z^* \in \mathcal{O}(d)^n} \mathbb{E}\,\ell^{\text{od}}(Z, Z^*) \geq \left(1 - C_5\left(\frac{1}{n} + \frac{\sigma^2}{np}\right)\right) \frac{d(d-1)\sigma^2}{2np}.$$

Compared to the lower bound, the spectral estimator $\widehat{Z}$ is exact minimax optimal, as it achieves the correct rate with the correct constant $d(d-1)/2$ in front of the optimal rate $\sigma^2/(np)$.

4 Discussions

4.1 Comparison of Spectral Method and Other Methods

In synchronization problems, the spectral method offers computational advantages over alternative methods such as the MLE, SDP, and GPM. According to Theorem 1, the spectral method attains statistical optimality in the limit $\frac{np}{\sigma^2} \rightarrow \infty$, achieving the minimum possible risk. The performance of the spectral method in scenarios where $\frac{np}{\sigma^2}$ does not approach infinity, however, remains less understood.

Previous studies [22, 24] have explored the PCA method in Bayesian settings for synchronization problems with $p = 1$. Unlike the spectral method as defined in (4), PCA does not involve entrywise normalization but instead scales the leading eigenvector $u$ to minimize the mean squared error (MSE). These studies offer a comprehensive asymptotic analysis of the MSE of PCA and that of the Bayes-optimal estimator, demonstrating both methods' ability to achieve substantial accuracy when $\sigma^2$ is below a specific threshold. However, PCA tends to exhibit a higher MSE than the Bayes-optimal estimator. Furthermore, [22] indicates that the MSE of SDP falls between those of PCA and the Bayes-optimal estimator, leaning more toward the latter.

While Theorem 1 addresses the regime where $\frac{np}{\sigma^2} \rightarrow \infty$, Theorem 3 establishes an upper bound in scenarios where $\frac{np}{\sigma^2}$ merely exceeds a certain constant. This suggests a complex interplay between the performance of the spectral method and the ratio $\frac{\sigma^2}{np}$ in the constant-$\frac{np}{\sigma^2}$ regime. To better understand this relationship, we conducted numerical experiments using the spectral method, GPM, and SDP under various $\sigma^2$ levels. The GPM, initialized with the spectral estimator $\widehat{z}^{(0)}_{\text{GPM}} := \widehat{z}$, iteratively updates $\widehat{z}^{(t)}_{\text{GPM}} := f(X \widehat{z}^{(t-1)}_{\text{GPM}})$ for $t \geq 1$, where $f : \mathbb{C}^n \rightarrow \mathbb{C}_1^n$ is the entrywise normalization function defined by $[f(x)]_i := x_i/|x_i| \mathbb{I}\{x_i \neq 0\} + \mathbb{I}\{x_i = 0\}$ for any $x \in \mathbb{C}^n$. The SDP solves the convex program $\max_{Z \in \mathbb{C}^{n \times n} : Z = Z^{\mathrm{H}}, \text{diag}(Z) = I_n, Z \succeq 0} \text{Tr}(XZ)$ over complex positive-semidefinite Hermitian matrices with unit diagonal entries and can be initialized using the spectral method. We assessed their performance using the normalized squared $\ell_2$ loss (5).
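For reference, a minimal sketch of the GPM iteration just described, reusing `spectral_estimator` from Section 1 (the iteration count is an arbitrary choice):

```python
def gpm(X, z0, iters=50):
    """Generalized power method: z^(t) = f(X z^(t-1)), with f the entrywise
    normalization used in (4)."""
    z = z0.copy()
    for _ in range(iters):
        y = X @ z
        mag = np.abs(y)
        z = np.where(mag > 0, y / np.where(mag > 0, mag, 1.0), 1.0 + 0j)
    return z

z_gpm = gpm(X, spectral_estimator(X))
```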

Figure 1: Numerical results for the spectral method, GPM, and SDP in phase synchronization, with $n = 100$, $p = 0.5$, and $\sigma^2$ varying within $[0, 20]$. Left: error comparison measured by the normalized squared $\ell_2$ loss. Right: comparison of the high-order term in their errors.

Figure 1 summarizes the comparative performance of these methods. For small $\sigma^2$, the error rates of all methods approximate $\frac{\sigma^2}{2np}$. The left panel of the figure shows that as $\sigma^2$ increases, the error rates rise more steeply than $\frac{\sigma^2}{2np}$. As $\sigma^2$ continues to increase, the spectral method exhibits higher error rates, as expected, since the other two methods use the spectral method for initialization and enhance it through more complex procedures. For deeper insight into the numerical performance differences, we compare the high-order terms in their errors. Specifically, the normalized squared $\ell_2$ loss of each method can be expressed as $(1 + \delta)\frac{\sigma^2}{2np}$, where $\delta$ represents the high-order term. The right panel of Figure 1 compares $\delta$ for the three methods. It reveals that even at small $\sigma^2$ values, the spectral method's performance diverges from those of the other methods. This suggests that while $\delta$ diminishes to $0$ for all three methods as $\sigma^2$ decreases (thus achieving exact minimax optimality), the $\delta$ of the spectral method diminishes more slowly than those of the other two methods.

Deriving explicit expressions for these error rates would be insightful, yet it falls outside the scope of this paper and presents an avenue for future research.

4.2 Condition on pp

In the phase synchronization problem (1), observations are missing at random, forming an Erdös-Rényi random graph $A$ with edge probability $p$. The value of $p$ cannot be excessively small, as this could result in $A$ being disconnected, making accurate estimation of $z^*$ up to a global phase impossible. Theorem 1 assumes $\frac{np}{\log n} \rightarrow \infty$ to establish the exact minimax optimality of the spectral method. A less stringent condition, where $\frac{np}{\log n}$ merely exceeds a certain constant, is considered in Theorem 3. However, it is known that $A$ is connected with high probability when $\frac{np}{\log n} > 1 + \epsilon$ for any constant $\epsilon > 0$. This raises the question of how the spectral method performs when $\frac{np}{\log n}$ is a small constant.

Our analysis requires \frac{np}{\log n} to exceed a certain constant for several technical reasons. This condition ensures that the desired bounds hold for critical quantities such as \left\|{A-\mathbb{E}A}\right\| and \left\|{A\circ W}\right\|, which are essential for the \ell_{\infty} analysis in Proposition 1 and the \ell_{2} analysis of the first-order approximation in Proposition 2. Moreover, the proof of Theorem 3 leverages the \ell_{\infty} results from Proposition 1, leading to the \sqrt{\frac{\log n}{np}} factor in the theorem's upper bound; this factor requires \frac{np}{\log n} to approach infinity for the upper bound to asymptotically match the exact minimax risk. Obtaining precise bounds on the performance of the spectral method when \frac{np}{\log n} is a small constant would require going beyond our current analytic framework, a task we leave for future research.

4.3 Other Low-rank Problems

The synchronization problems investigated in this manuscript are part of a broader category of problems characterized by low-rank matrix structures disrupted by additive noise and incomplete data. The methodologies developed herein are applicable to a variety of related problems, such as matrix completion, principal component analysis, factor models, mixture models, and ranking from pairwise comparison data. A key observation is that many of these problems involve multiple sources of randomness, such as those arising from missing data and from additive noise. An effective approach, as demonstrated in this study, is to isolate these sources and evaluate their individual contributions to the overall estimation error. This strategy is exemplified in our analysis of synchronization problems, where we introduce a novel population eigenvector and eigenspace. Furthermore, Lemma 2 and Lemma 4 offer a general framework for the perturbation analysis of eigenvectors and eigenspaces.

On the other hand, synchronization problems are special in that their leading eigenvector or eigenspace is spread out. In the literature [12], the coherence of a unit eigenvector u with coordinates u_{1},\ldots,u_{n} is measured through n\max_{i\in[n]}|u_{i}|^{2}, which equals 1 for a perfectly delocalized vector and n for a fully localized one. In phase synchronization, the leading eigenvector of \mathbb{E}X in (3) has coordinates of uniformly equal magnitude 1/\sqrt{n}, so it is perfectly delocalized and its coherence is the smallest possible. By contrast, in many low-rank problems the eigenvectors are more coherent, which naturally factors into the theoretical analysis. Therefore, when extending the concepts and methodologies from this paper to other scenarios, it is crucial to track eigenvector coherence for a more precise and insightful analysis.
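As a small illustration, the following hedged snippet computes the coherence under the normalization used above (1 for a perfectly delocalized unit vector, n for a fully localized one); the helper name is ours:

import numpy as np

def coherence(u):
    # n * max_i |u_i|^2 for a unit-norm vector u
    u = u / np.linalg.norm(u)
    return len(u) * np.max(np.abs(u) ** 2)

n = 100
print(coherence(np.ones(n) / np.sqrt(n)))  # 1.0: delocalized, as in synchronization
print(coherence(np.eye(n)[:, 0]))          # 100.0: fully localized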

5 Proofs for Phase Synchronization

5.1 Proof of Lemma 2

We first present a variant of the Davis-Kahan Theorem [14] and an inequality relating \inf_{b\in\mathbb{C}_{1}}\left\|{x-yb}\right\| to \left\|{(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y}\right\|; both will be used in the proof of Lemma 2.

Lemma 5.

Let X,X~d×dX,\widetilde{X}\in\mathbb{C}^{d\times d} be two Hermitian matrices. Let λ1λ2λd\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{d} be the eigenvalues of XX. Consider any r[d]r\in[d]. Let Ud×rU\in\mathbb{C}^{d\times r} (resp. U~\widetilde{U}) be the eigenspace of XX (resp. X~\widetilde{X}) that includes its leading rr eigenvectors. Under the assumption that XX~<(λrλr+1)/4\|{X-\widetilde{X}}\|<(\lambda_{r}-\lambda_{r+1})/4, we have

(IUUH)U~4XX~3(λrλr+1).\displaystyle\left\|{(I-UU^{\mathrm{\scriptscriptstyle H}})\widetilde{U}}\right\|\leq\frac{4\left\|{X-\widetilde{X}}\right\|}{3(\lambda_{r}-\lambda_{r+1})}.
Lemma 6.

For any unit vectors x,ydx,y\in\mathbb{C}^{d}, we have infb1xyb2(IdxxH)y\inf_{b\in\mathbb{C}_{1}}\left\|{x-yb}\right\|\leq\sqrt{2}\left\|{(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y}\right\|.
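The inequality in Lemma 6 is easy to spot-check numerically; a hedged sketch with random complex unit vectors (names ours):

import numpy as np

rng = np.random.default_rng(6)
d = 20
x = rng.normal(size=d) + 1j * rng.normal(size=d)
x /= np.linalg.norm(x)
y = rng.normal(size=d) + 1j * rng.normal(size=d)
y /= np.linalg.norm(y)

s = np.vdot(x, y)                              # x^H y
b = np.conj(s) / abs(s)                        # minimizer of ||x - y b|| over C_1
lhs = np.linalg.norm(x - y * b)
rhs = np.sqrt(2) * np.linalg.norm(y - x * s)   # sqrt(2) * ||(I_d - x x^H) y||
print(lhs <= rhs)                              # True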

Proof of Lemma 2.

Denote μ1μn\mu_{1}\geq\ldots\geq\mu_{n} as the eigenvalues of YY. We first give some inequalities for the eigenvalues and Yv\|Yv^{*}\| that will be used later in the proof. By Weyl’s inequality, we have

max{|μ1μ1|,|μ2μ2|}YY.\displaystyle\max\left\{\left|\mu_{1}-\mu_{1}^{*}\right|,\left|\mu_{2}-\mu_{2}^{*}\right|\right\}\leq\left\|{Y-Y^{*}}\right\|.

Since YYmin{μ1μ2,μ1}/4\left\|{Y-Y^{*}}\right\|\leq\min\{\mu_{1}^{*}-\mu_{2}^{*},\mu_{1}^{*}\}/4 is assumed, we have

34μ1μ154μ1,μ1μ2μ1μ22,\displaystyle\frac{3}{4}\mu_{1}^{*}\leq\mu_{1}\leq\frac{5}{4}\mu_{1}^{*},\quad\mu_{1}-\mu_{2}\geq\frac{\mu_{1}^{*}-\mu_{2}^{*}}{2}, (23)

and

|μ1μ11|=|μ1μ1|μ1YYμ1YY4YY3μ1.\displaystyle\left|\frac{\mu_{1}^{*}}{\mu_{1}}-1\right|=\frac{\left|\mu_{1}^{*}-\mu_{1}\right|}{\mu_{1}}\leq\frac{\left\|{Y-Y^{*}}\right\|}{\mu_{1}^{*}-\left\|{Y-Y^{*}}\right\|}\leq\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{1}^{*}}. (24)

Regarding Yv\|Yv^{*}\|, using the decomposition

Y=Y+(YY)=μ1vvH+(Yμ1vvH)+(YY),\displaystyle Y=Y^{*}+(Y-Y^{*})=\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}+(Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}})+(Y-Y^{*}),

and its consequence

Yv=Yv+(YY)v=μ1v+(YY)v,\displaystyle Yv^{*}=Y^{*}v^{*}+(Y-Y^{*})v^{*}=\mu_{1}^{*}v^{*}+(Y-Y^{*})v^{*}, (25)

we have

Yvμ1YY3μ14.\displaystyle\left\|{Yv^{*}}\right\|\geq\mu_{1}^{*}-\left\|{Y-Y^{*}}\right\|\geq\frac{3\mu_{1}^{*}}{4}. (26)

We define vwidecheckn\widecheck{v}\in\mathbb{C}^{n} and v~𝕊n\widetilde{v}\in\mathbb{S}_{n} as

vwidecheck\displaystyle\widecheck{v} :=Yvμ1,\displaystyle:=\frac{Yv^{*}}{\mu_{1}}, (27)
v~\displaystyle\widetilde{v} :=YvYv.\displaystyle:=\frac{Yv^{*}}{\left\|{Yv^{*}}\right\|}. (28)

Then v~\widetilde{v} is the first-order approximation of vv, written equivalently as v~=vwidecheck/vwidecheck\widetilde{v}={\widecheck{v}}/{\left\|{\widecheck{v}}\right\|}. Note that with Yv>0\left\|{Yv^{*}}\right\|>0 as shown in (26), v~\widetilde{v} is well-defined.
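The accuracy of this first-order approximation is also easy to see numerically. The following hedged toy check (a real symmetric rank-one Y^{*} plus a small symmetric perturbation, so the global phase reduces to a sign; all names ours) shows that \widetilde{v} is far closer to v than v^{*} is:

import numpy as np

rng = np.random.default_rng(1)
n = 200
vstar = np.ones(n) / np.sqrt(n)                 # population eigenvector v*
Ystar = n * np.outer(vstar, vstar)              # rank-one Y* with mu_1* = n
E = rng.normal(size=(n, n))
E = (E + E.T) / np.sqrt(2 * n)                  # small symmetric noise
Y = Ystar + E

v = np.linalg.eigh(Y)[1][:, -1]                 # leading eigenvector v of Y
vtilde = Y @ vstar / np.linalg.norm(Y @ vstar)  # first-order approximation

def dist(a, b):
    # inf over the global sign, the real analogue of b in C_1
    return min(np.linalg.norm(a - b), np.linalg.norm(a + b))

print(dist(v, vstar))    # first-order error, roughly ||Y - Y*|| / mu_1*
print(dist(v, vtilde))   # much smaller, second order in ||Y - Y*||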

Since vv is the eigenvector of YY corresponding to μ1\mu_{1}, we have

μ1v\displaystyle\mu_{1}v =Yv,\displaystyle=Yv,
μ1v~\displaystyle\mu_{1}\widetilde{v} =Yv/vwidecheck.\displaystyle=Yv^{*}/\left\|{\widecheck{v}}\right\|.

Subtracting the second equation from the first one, we have

μ1(vv~)=Y(vvvwidecheck)=Y(vv~)+Y(v~vvwidecheck)=Y(vv~)+1vwidecheckY(vwidecheckv).\displaystyle\mu_{1}(v-\widetilde{v})=Y\left(v-\frac{v^{*}}{\left\|{\widecheck{v}}\right\|}\right)=Y(v-\widetilde{v})+Y\left(\widetilde{v}-\frac{v^{*}}{\left\|{\widecheck{v}}\right\|}\right)=Y(v-\widetilde{v})+\frac{1}{\left\|{\widecheck{v}}\right\|}Y(\widecheck{v}-v^{*}).

After rearranging, we have

vwidecheck(μ1InY)(vv~)=Y(vwidecheckv).\displaystyle\left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(v-\widetilde{v})=Y(\widecheck{v}-v^{*}). (29)

Since (\mu_{1}I_{n}-Y)v=0 and \mu_{1}I_{n}-Y is Hermitian, the range of \mu_{1}I_{n}-Y is orthogonal to v. As a result, \left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(v-\widetilde{v})=-\left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}. In addition, since the left-hand side of (29) is orthogonal to v, its right-hand side must be orthogonal to v as well. That is, Y(\widecheck{v}-v^{*})=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*}). Then (29) leads to

\displaystyle-\left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*}). (30)

Observe that 0μ1μ2μ1μn0\leq\mu_{1}-\mu_{2}\leq\ldots\leq\mu_{1}-\mu_{n} are the eigenvalues of μ1InY\mu_{1}I_{n}-Y. In particular, the eigenvector corresponding to 0 is vv. Since (InvvH)v~(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v} is orthogonal to vv, from (30) we have

vwidecheck(μ1μ2)(InvvH)v~vwidecheck(μ1InY)(InvvH)v~=(InvvH)Y(vwidecheckv).\displaystyle\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|\leq\left\|{\widecheck{v}}\right\|\left\|{(\mu_{1}I_{n}-Y)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|=\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|.

Hence,

(InvvH)v~1vwidecheck(μ1μ2)(InvvH)Y(vwidecheckv).\displaystyle\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|\leq\frac{1}{\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|. (31)

From Lemma 6, we have infb1vv~b2(InvvH)v~\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\|\leq\sqrt{2}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|. With this, (31) leads to

infb1vv~b2vwidecheck(μ1μ2)(InvvH)Y(vwidecheckv).\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\|\leq\frac{\sqrt{2}}{\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|. (32)

In the following, we are going to analyze (InvvH)Y(vwidecheckv){(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}. We have

(InvvH)Y(vwidecheckv)\displaystyle(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})
=(InvvH)Y(Yvμ1v)\displaystyle=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y\left(\frac{Yv^{*}}{\mu_{1}}-v^{*}\right)
=(InvvH)Y(μ1μ11)v+1μ1(InvvH)Y(YY)v\displaystyle=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)v^{*}+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y\left(Y-Y^{*}\right)v^{*}
=(μ1μ11)(InvvH)μ1v+(μ1μ11)(InvvH)(YY)v\displaystyle=\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\mu^{*}_{1}v^{*}+\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})v^{*}
+1μ1(InvvH)μ1vvH(YY)v+1μ1(InvvH)(Yμ1vvH)(YY)v\displaystyle\quad+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}\left(Y-Y^{*}\right)v^{*}+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}})\left(Y-Y^{*}\right)v^{*}
+1μ1(InvvH)(YY)(YY)v\displaystyle\quad+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})\left(Y-Y^{*}\right)v^{*}
=((μ1μ11)+1μ1vH(YY)v)μ1(InvvH)v\displaystyle=\left(\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)+\frac{1}{\mu_{1}}v^{*{\mathrm{\scriptscriptstyle H}}}\left(Y-Y^{*}\right)v^{*}\right)\mu^{*}_{1}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}
+(μ1μ11)(InvvH)(YY)v+1μ1(InvvH)(Yμ1vvH)(YY)v\displaystyle\quad+\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})v^{*}+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}})\left(Y-Y^{*}\right)v^{*}
+1μ1(InvvH)(YY)(YY)v.\displaystyle\quad+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})\left(Y-Y^{*}\right)v^{*}.

Hence,

(InvvH)Y(vwidecheckv)\displaystyle\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|
(|μ1μ11|+|vH(YY)v|μ1)μ1(InvvH)v+|μ1μ11|YY\displaystyle\leq\left(\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|+\frac{\left|v^{*{\mathrm{\scriptscriptstyle H}}}\left(Y-Y^{*}\right)v^{*}\right|}{\mu_{1}}\right)\mu_{1}^{*}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|+\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|\left\|{Y-Y^{*}}\right\|
+Yμ1vvHYYμ1+YY2μ1\displaystyle\quad+\frac{\left\|{Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}}\right\|\left\|{Y-Y^{*}}\right\|}{\mu_{1}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\mu_{1}}
(|μ1μ11|+YYμ1)μ1(InvvH)v+|μ1μ11|YY\displaystyle\leq\left(\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|+\frac{\left\|{Y-Y^{*}}\right\|}{\mu_{1}}\right)\mu_{1}^{*}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|+\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|\left\|{Y-Y^{*}}\right\|
\displaystyle\quad+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}\left\|{Y-Y^{*}}\right\|}{\mu_{1}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\mu_{1}},

where we use the fact that InvvH=1\left\|{I_{n}-vv^{\mathrm{\scriptscriptstyle H}}}\right\|=1 and Yμ1vvH=max{|μ2|,|μn|}\left\|{Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}}\right\|=\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}. Then together with (32), we have

infb1vv~b\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\| 2vwidecheck(μ1μ2)((|μ1μ11|+YYμ1)μ1(InvvH)v\displaystyle\leq\frac{\sqrt{2}}{\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})}\Bigg{(}\left(\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|+\frac{\left\|{Y-Y^{*}}\right\|}{\mu_{1}}\right)\mu_{1}^{*}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|
+|μ1μ11|YY+max{|μ2|,|μn|}YYμ1+YY2μ1).\displaystyle\quad+\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|\left\|{Y-Y^{*}}\right\|+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}\left\|{Y-Y^{*}}\right\|}{\mu_{1}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\mu_{1}}\Bigg{)}.

In the rest of the proof, we are going to simplify the display above. From (23) and (26), we have

vwidecheck=Yvμ135.\displaystyle\left\|{\widecheck{v}}\right\|=\frac{\left\|{Yv^{*}}\right\|}{\mu_{1}}\geq\frac{3}{5}.

Using Lemma 5 and the assumption YY(μ1μ2)/4\left\|{Y-Y^{*}}\right\|\leq(\mu_{1}^{*}-\mu_{2}^{*})/4, we have

(InvvH)v2YYμ1μ2.\displaystyle\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|\leq\frac{2\left\|{Y-Y^{*}}\right\|}{\mu_{1}^{*}-\mu_{2}^{*}}.

With the above results, together with (23) and (24), we have

infb1vv~b\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\|
235μ1μ22((4YY3μ1+YY34μ1)μ12YYμ1μ2+4YY3μ1YY\displaystyle\leq\frac{\sqrt{2}}{\frac{3}{5}\frac{\mu_{1}^{*}-\mu_{2}^{*}}{2}}\Bigg{(}\left(\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{1}^{*}}+\frac{\left\|{Y-Y^{*}}\right\|}{\frac{3}{4}\mu_{1}^{*}}\right)\mu_{1}^{*}\frac{2\left\|{Y-Y^{*}}\right\|}{\mu_{1}^{*}-\mu_{2}^{*}}+\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{1}^{*}}\left\|{Y-Y^{*}}\right\|
+max{|μ2|,|μn|}YY34μ1+YY234μ1)\displaystyle\quad+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}\left\|{Y-Y^{*}}\right\|}{\frac{3}{4}\mu_{1}^{*}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\frac{3}{4}\mu_{1}^{*}}\Bigg{)}
=1023(μ1μ2)((163(μ1μ2)+83μ1)YY2+4max{|μ2|,|μn|}3μ1YY)\displaystyle=\frac{10\sqrt{2}}{3(\mu_{1}^{*}-\mu_{2}^{*})}\left(\left(\frac{16}{3(\mu_{1}^{*}-\mu_{2}^{*})}+\frac{8}{3\mu_{1}^{*}}\right)\left\|{Y-Y^{*}}\right\|^{2}+\frac{4\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}}{3\mu_{1}^{*}}\left\|{Y-Y^{*}}\right\|\right)
\displaystyle\leq\frac{40\sqrt{2}}{9(\mu_{1}^{*}-\mu_{2}^{*})}\left(\left(\frac{4}{\mu_{1}^{*}-\mu_{2}^{*}}+\frac{2}{\mu_{1}^{*}}\right)\left\|{Y-Y^{*}}\right\|^{2}+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}}{\mu_{1}^{*}}\left\|{Y-Y^{*}}\right\|\right). ∎

5.2 Proofs of Lemma 1, Proposition 1, and Proposition 2

Proof of Lemma 1.

Denote λ\lambda^{\prime} as an eigenvalue of AA with its corresponding eigenvector uu^{\prime}. Then we have Au=λuAu^{\prime}=\lambda^{\prime}u^{\prime}. This can be equivalently written as

kjAjkuk=λuj,j[n].\displaystyle\sum_{k\neq j}A_{jk}u^{\prime}_{k}=\lambda^{\prime}u^{\prime}_{j},\forall j\in[n].

Multiplying by zj{z^{*}_{j}} on both sides, we have

kjAjkzjuk=kjAjkzjzk¯(zkuk)=λzjuj,j[n].\displaystyle\sum_{k\neq j}A_{jk}z^{*}_{j}u^{\prime}_{k}=\sum_{k\neq j}A_{jk}z^{*}_{j}\overline{z^{*}_{k}}(z^{*}_{k}u^{\prime}_{k})=\lambda^{\prime}z^{*}_{j}u^{\prime}_{j},\forall j\in[n].

That is, (AzzH)(zu)=λ(zu)(A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}})(z^{*}\circ u^{\prime})=\lambda^{\prime}(z^{*}\circ u^{\prime}). Hence, λ\lambda^{\prime} is an eigenvalue of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}} with the corresponding eigenvector zuz^{*}\circ u^{\prime}.

By the same argument, we can show each eigenvalue of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}} is also an eigenvalue of AA. As a result, since uwidecheck\widecheck{u} is the leading eigenvector of AA, zuwidecheckz^{*}\circ\widecheck{u} is the leading eigenvector of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}. ∎
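Lemma 1 can also be verified numerically, since A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}=\text{diag}(z^{*})A\,\text{diag}(z^{*})^{\mathrm{\scriptscriptstyle H}} is a unitary conjugation of A; a hedged sanity check (names ours):

import numpy as np

rng = np.random.default_rng(2)
n = 50
A = np.triu(rng.uniform(size=(n, n)) < 0.5, 1).astype(float)
A = A + A.T                                    # adjacency matrix with zero diagonal
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # arbitrary phase vector z*

M = A * np.outer(z, z.conj())                  # A o z* z*^H, Hermitian
print(np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(M))))  # ~ 1e-14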

Before proving Proposition 1 and Proposition 2, we first state some technical lemmas related to AA and WW.

Lemma 7.

The largest eigenvalue of 𝔼A\mathbb{E}A is (n1)p(n-1)p and the corresponding eigenvector is 𝟙n/n\mathds{1}_{n}/\sqrt{n}. The remaining eigenvalues of 𝔼A\mathbb{E}A are p-p with multiplicity n1n-1. Denote λλ2λn\lambda^{\prime}\geq\lambda^{\prime}_{2}\geq\ldots\geq\lambda^{\prime}_{n} as the eigenvalues of AA. We have

|λ(n1)p|,max2jn|λj+p|A𝔼A, and λλ2np2A𝔼A.\displaystyle|\lambda^{\prime}-(n-1)p|,\max_{2\leq j\leq n}|\lambda^{\prime}_{j}+p|\leq\left\|{A-\mathbb{E}A}\right\|,\text{ and }\lambda^{\prime}-\lambda^{\prime}_{2}\geq np-2\left\|{A-\mathbb{E}A}\right\|. (33)
Lemma 8.

There exist constants C1,C2>0C_{1},C_{2}>0 such that if nplogn>C1\frac{np}{\log n}>C_{1}, then we have

A𝔼AC2np,\displaystyle\left\|{A-\mathbb{E}A}\right\|\leq C_{2}\sqrt{np},
AWC2np,\displaystyle\left\|{A\circ W}\right\|\leq C_{2}\sqrt{np},
j[n]|Im(kjAjkWjkzj¯zk)|2n2p2(1+C2lognn),\displaystyle\sum_{j\in[n]}\left|{\rm Im}\left({\sum_{k\neq j}A_{jk}W_{jk}\overline{z_{j}^{*}}z^{*}_{k}}\right)\right|^{2}\leq\frac{n^{2}p}{2}\left(1+C_{2}\sqrt{\frac{\log n}{n}}\right),

with probability at least 13n101-3n^{-10}.

The first part of Proposition 1 (i.e., (16)) can be proved using Theorem 2.1 of [1]. The statement of that theorem is complicated, as it handles perturbations of general eigenspaces, whereas we only need the perturbation of the leading eigenvector. For easier reference, we present a simpler version of the theorem below.

Lemma 9 (A simpler version of Theorem 2.1 of [1]).

Consider two symmetric matrices Y,Yn×nY,Y^{*}\in\mathbb{R}^{n\times n}. Let the eigenvalues of YY^{*} be μ1μ2μn\mu_{1}^{*}\geq\mu_{2}^{*}\geq\ldots\geq\mu_{n}^{*}. Define Δ:=min{μ1μ2,μ1}\Delta^{*}:=\min\{\mu^{*}_{1}-\mu^{*}_{2},\mu^{*}_{1}\} and κ:=max{|μ1|,|μn|}/Δ\kappa:=\max\{|\mu^{*}_{1}|,|\mu^{*}_{n}|\}/\Delta^{*}. Let the leading eigenvector of YY (resp. YY^{*}) be vv (resp. vv^{*}). Assume the following conditions are satisfied for some γ0\gamma\geq 0 and some function ϕ:[0,+)[0,+)\phi:[0,+\infty)\rightarrow[0,+\infty):

  1. \left\|{Y^{*}}\right\|_{2\rightarrow\infty}\leq\gamma\Delta^{*}.

  2. For any m\in[n], \{Y_{jk}:j=m\text{ or }k=m\} are independent of \{Y_{jk}:j\neq m,k\neq m\}.

  3. 32\kappa\max\{\gamma,\phi(\gamma)\}\leq 1 and for some \delta_{0}\in(0,1), \mathbb{P}\left(\left\|{Y-Y^{*}}\right\|\leq\gamma\Delta^{*}\right)\geq 1-\delta_{0}.

  4. Suppose \phi(x) is continuous and non-decreasing in [0,+\infty) with \phi(0)=0, \phi(x)/x is non-increasing in [0,+\infty), and \delta_{1}\in(0,1). For any m\in[n] and w\in\mathbb{R}^{n},

     \mathbb{P}\left(\left|[Y-Y^{*}]_{m\cdot}w\right|\leq\Delta^{*}\left\|{w}\right\|_{\infty}\phi\left(\frac{\left\|{w}\right\|}{\sqrt{n}\left\|{w}\right\|_{\infty}}\right)\right)\geq 1-\frac{\delta_{1}}{n}.

Then with probability at least 1δ02δ11-\delta_{0}-2\delta_{1}, there exists some constant C>0C>0 and some b{1,1}b\in\{-1,1\} such that

vbYv/μ1\displaystyle\left\|{vb-Yv^{*}/\mu_{1}^{*}}\right\|_{\infty} C(κ(κ+ϕ(1))(γ+ϕ(γ))v+γY2/Δ).\displaystyle\leq C\left(\kappa(\kappa+\phi(1))(\gamma+\phi(\gamma))\left\|{v^{*}}\right\|_{\infty}+\gamma\left\|{Y^{*}}\right\|_{2\rightarrow\infty}/\Delta^{*}\right).

The following Lemma 10 provides two Bernstein-type concentration inequalities to be used in the proof of Proposition 1. The first one is the classical Bernstein inequality; see Section 2.8 of [6] for its proof. The second one is proved in Lemma 7 of [1].

Lemma 10.

Let B1,,BnB_{1},\ldots,B_{n} be real independent random variables such that maxj[n]|Bj|M\max_{j\in[n]}\left|B_{j}\right|\leq M for some M>0M>0. Then

(|j[n](Bj𝔼Bj)|t)2exp(12t2j[n]𝔼(Bj𝔼Bj)2+13Mt).\displaystyle\mathbb{P}\left(\left|\sum_{j\in[n]}(B_{j}-\mathbb{E}B_{j})\right|\geq t\right)\leq 2\exp\left(-\frac{\frac{1}{2}t^{2}}{\sum_{j\in[n]}\mathbb{E}(B_{j}-\mathbb{E}B_{j})^{2}+\frac{1}{3}Mt}\right).

Let wnw\in\mathbb{R}^{n} be a fixed vector and α0\alpha\geq 0. If {Bj}j[n]iidBernoulli(p)\{B_{j}\}_{j\in[n]}\stackrel{{\scriptstyle iid}}{{\sim}}\text{Bernoulli}(p), we have

(|j[n]wj(Bjp)|(2+α)np1log(nww)w)2exp(αnp).\displaystyle\mathbb{P}\left(\left|\sum_{j\in[n]}w_{j}(B_{j}-p)\right|\geq\frac{(2+\alpha)np}{1\vee\log\left(\frac{\sqrt{n}\left\|{w}\right\|_{\infty}}{\left\|{w}\right\|}\right)}\left\|{w}\right\|_{\infty}\right)\leq 2\exp(-\alpha np).
Proof of Proposition 1.

We use Lemma 9 to prove the first part of the proposition. Denote \mu_{1}^{*}\geq\mu_{2}^{*}\geq\ldots\geq\mu_{n}^{*} as the eigenvalues of \mathbb{E}A, and define \Delta^{*} and \kappa as in Lemma 9. From Lemma 7, we have \Delta^{*}=(n-1)p and \kappa=1, with \mathds{1}_{n}/\sqrt{n} being the leading eigenvector of \mathbb{E}A. Since \mathbb{E}A=pJ_{n}-pI_{n}, we have \|\mathbb{E}A\|_{2\rightarrow\infty}=\sqrt{n-1}\,p. By Lemma 8, there exist constants c_{1},c_{2}>1 such that if \frac{np}{\log n}>c_{1}, then \left\|{A-\mathbb{E}A}\right\|\leq c_{2}\sqrt{np} with probability at least 1-3n^{-10}. Define \gamma:=2c_{2}/\sqrt{np}, \delta_{0}:=2n^{-10}, and \phi(x):=3(1\vee\log(x^{-1}))^{-1}. The first assumption of Lemma 9 is then satisfied as long as c_{2}\geq 1. When \frac{np}{\log n} is greater than some sufficiently large constant, we have \phi(\gamma)\leq 8/\log(np), and the third assumption is satisfied. The second assumption holds as well, since the entries of A are jointly independent up to symmetry. For any m\in[n] and any w\in\mathbb{R}^{n}, since [A-\mathbb{E}A]_{m\cdot}w is a weighted sum of centered Bernoulli random variables, the second inequality of Lemma 10 can be applied to obtain

\displaystyle\mathbb{P}\left(\left|[A-\mathbb{E}A]_{m\cdot}w\right|>\Delta^{*}\|w\|_{\infty}\phi\left(\frac{\left\|{w}\right\|}{\sqrt{n}\left\|{w}\right\|_{\infty}}\right)\right)
\displaystyle\quad\leq\mathbb{P}\left(\left|[A-\mathbb{E}A]_{m\cdot}w\right|\geq\frac{2.5np}{1\vee\log\left(\frac{\sqrt{n}\left\|{w}\right\|_{\infty}}{\left\|{w}\right\|}\right)}\left\|{w}\right\|_{\infty}\right)\leq 2n^{-11},

when \frac{np}{\log n} is greater than some sufficiently large constant. Define \delta_{1}:=2n^{-10}. Then the last assumption of Lemma 9 is satisfied, and Lemma 9 leads to the conclusion that with probability at least 1-6n^{-10}, there exists some constant c_{3}>0 and some b\in\{-1,1\} such that

\displaystyle\left\|{\widecheck{u}b-\frac{1}{\mu_{1}^{*}\sqrt{n}}A\mathds{1}_{n}}\right\|_{\infty}\leq c_{3}\left(\kappa(\kappa+\phi(1))(\gamma+\phi(\gamma))\left\|{\frac{1}{\sqrt{n}}\mathds{1}_{n}}\right\|_{\infty}+\gamma\frac{\left\|{\mathbb{E}A}\right\|_{2\rightarrow\infty}}{\Delta^{*}}\right)
\displaystyle\quad\leq c_{3}\left((1+3)\left(\frac{2c_{2}}{\sqrt{np}}+\frac{8}{\log(np)}\right)\frac{1}{\sqrt{n}}+\frac{2c_{2}}{\sqrt{np}}\frac{\sqrt{n-1}\,p}{(n-1)p}\right)
\displaystyle\quad\leq\frac{c_{4}}{\log(np)}\frac{1}{\sqrt{n}},

for some constant c_{4}>0. Note that

1μ1nA𝟙n=1μ1n𝔼A𝟙n+1μ1n(A𝔼A)𝟙n=1n𝟙n+1(n1)pn(A𝔼A)𝟙n.\displaystyle\frac{1}{\mu_{1}^{*}\sqrt{n}}A\mathds{1}_{n}=\frac{1}{\mu_{1}^{*}\sqrt{n}}\mathbb{E}A\mathds{1}_{n}+\frac{1}{\mu_{1}^{*}\sqrt{n}}(A-\mathbb{E}A)\mathds{1}_{n}=\frac{1}{\sqrt{n}}\mathds{1}_{n}+\frac{1}{(n-1)p\sqrt{n}}(A-\mathbb{E}A)\mathds{1}_{n}.

Then we have

\displaystyle\left\|{\widecheck{u}b-\frac{1}{\sqrt{n}}\mathds{1}_{n}}\right\|_{\infty}\leq\frac{c_{4}}{\log(np)}\frac{1}{\sqrt{n}}+\frac{1}{(n-1)p}\left\|{\frac{1}{\sqrt{n}}(A-\mathbb{E}A)\mathds{1}_{n}}\right\|_{\infty}.

For any m\in[n], by the first inequality of Lemma 10, there exists some constant c_{5}>0 such that

\displaystyle\mathbb{P}\left(\left|[A-\mathbb{E}A]_{m\cdot}\mathds{1}_{n}\right|\geq c_{5}\sqrt{np\log n}\right)\leq 2\exp\left(-\frac{\frac{c_{5}^{2}}{2}np\log n}{(n-1)p(1-p)+\frac{c_{5}}{3}\sqrt{np\log n}}\right)
\displaystyle\quad\leq 2n^{-11}.

Together with a union bound, we have \mathbb{P}\left(\|(A-\mathbb{E}A)\mathds{1}_{n}\|_{\infty}\geq c_{5}\sqrt{np\log n}\right)\leq 2n^{-10}. Hence,

\displaystyle\left\|{\widecheck{u}b-\frac{1}{\sqrt{n}}\mathds{1}_{n}}\right\|_{\infty}\leq\frac{c_{4}}{\log(np)}\frac{1}{\sqrt{n}}+\frac{1}{(n-1)p}\frac{c_{5}\sqrt{np\log n}}{\sqrt{n}}\leq c_{6}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\frac{1}{\sqrt{n}},

for some constant c_{6}>0, with probability at least 1-8n^{-10}.

The second part of the proposition is an immediate consequence of the first part. If nplogn>max{C1,2C22}\frac{np}{\log n}>\max\left\{C_{1},2C_{2}^{2}\right\}, all the coordinates of uwidecheck\widecheck{u} have the same sign according to (16). From Lemma 1, we have u=uu=u^{*} as uu^{*} is the leading eigenvector of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}. If {uwidecheckj}j[n]\{\widecheck{u}_{j}\}_{j\in[n]} are all positive, we have

z^j=uj/|uj|=zjuwidecheckj/uwidecheckj=zj,\widehat{z}_{j}=u_{j}^{*}/|{u_{j}^{*}}|=z^{*}_{j}\widecheck{u}_{j}/\widecheck{u}_{j}=z^{*}_{j},

for each j[n]j\in[n]. That is, z^=z\widehat{z}=z^{*}. If {uwidecheckj}j[n]\{\widecheck{u}_{j}\}_{j\in[n]} are all negative, we then have z^=z\widehat{z}=-z^{*}. ∎

Proof of Proposition 2.

Recall \lambda^{*} is the largest eigenvalue of A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}. From Lemma 1, u^{*} is the corresponding eigenvector. Denote \lambda^{*}_{2}\geq\ldots\geq\lambda^{*}_{n} as its remaining eigenvalues. By Lemma 8, there exist constants c_{1},c_{2}>0 such that when \frac{np}{\log n}>c_{1}, we have \left\|{A-\mathbb{E}A}\right\|\leq c_{2}\sqrt{np} and \left\|{A\circ W}\right\|\leq c_{2}\sqrt{np} with probability at least 1-3n^{-10}. By Lemma 1 and Lemma 7, we have \lambda^{*}\geq(n-1)p-c_{2}\sqrt{np}, \max\{|\lambda^{*}_{2}|,|\lambda^{*}_{n}|\}\leq p+c_{2}\sqrt{np}, and \lambda^{*}-\lambda_{2}^{*}\geq np-2c_{2}\sqrt{np}. When \frac{np}{\log n} and \frac{np}{\sigma^{2}} are greater than some sufficiently large constant, we have 4\sigma\left\|{A\circ W}\right\|\leq np/2\leq\min\{\lambda^{*},\lambda^{*}-\lambda_{2}^{*}\}. Since X-A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}=\sigma A\circ W, a direct application of Lemma 2 leads to

infb1uu~b\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{u-\widetilde{u}b}\right\|
4029(λλ2)((4λλ2+2λ)σ2AW2+max{|λ2|,|λn|}σAWλ)\displaystyle\leq\frac{40\sqrt{2}}{9(\lambda^{*}-\lambda_{2}^{*})}\left(\left(\frac{4}{\lambda^{*}-\lambda_{2}^{*}}+\frac{2}{\lambda^{*}}\right)\sigma^{2}\left\|{A\circ W}\right\|^{2}+\frac{\max\{|\lambda^{*}_{2}|,|\lambda^{*}_{n}|\}\sigma\left\|{A\circ W}\right\|}{\lambda^{*}}\right)
4029np/2((4np/2+2np/2)c22σ2np+(p+c2np)c2σnpnp/2)\displaystyle\leq\frac{40\sqrt{2}}{9np/2}\left(\left(\frac{4}{np/2}+\frac{2}{np/2}\right)c_{2}^{2}\sigma^{2}np+\frac{(p+c_{2}\sqrt{np})c_{2}\sigma\sqrt{np}}{np/2}\right)
c3σ2+σnp,\displaystyle\leq c_{3}\frac{\sigma^{2}+\sigma}{np},

for some constant c3>0c_{3}>0. ∎

5.3 Proof of Theorem 3

We first state some technical lemmas that will be used in the proof of Theorem 3.

Lemma 11.

There exists some constant C1>0C_{1}>0 such that for any γ\gamma satisfying γ2npσ2C1\frac{\gamma^{2}np}{\sigma^{2}}\geq C_{1}, we have

j[n]𝕀{2σnp|kjAjkWjkzj¯zk|γ}4σ2γ2pexp(116γ2npσ2),\displaystyle\sum_{j\in[n]}{\mathbb{I}\left\{{\frac{2\sigma}{np}\left|{{\sum_{k\neq j}A_{jk}W_{jk}\overline{z_{j}^{*}}z^{*}_{k}}}\right|\geq\gamma}\right\}}\leq\frac{4\sigma^{2}}{\gamma^{2}p}\exp\left(-\frac{1}{16}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right),

holds with probability at least 1exp(132γ2npσ2)1-\exp\left(-\frac{1}{32}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right).

Lemma 12 (Lemma 10 and Lemma 11 of [18]).

For any xx\in\mathbb{C} such that Re(x)>0{\rm Re}(x)>0, |x|x|1||Im(x)Re(x)|\left|\frac{x}{|x|}-1\right|\leq\left|\frac{{\rm Im}(x)}{{\rm Re}(x)}\right|. For any x{0}x\in\mathbb{C}\setminus\{0\} and any y1y\in\mathbb{C}_{1}, we have |x|x|y|2|xy|\left|\frac{x}{\left|x\right|}-y\right|\leq 2\left|x-y\right|.
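Both elementary inequalities in Lemma 12 can be spot-checked numerically; a hedged sketch (names ours):

import numpy as np

rng = np.random.default_rng(3)
m = 10000
x = np.abs(rng.normal(size=m)) + 0.1 + 1j * rng.normal(size=m)    # Re(x) > 0
print(np.all(np.abs(x / np.abs(x) - 1) <= np.abs(x.imag / x.real) + 1e-12))

x2 = rng.normal(size=m) + 1j * rng.normal(size=m)                 # x != 0 a.s.
y = np.exp(1j * rng.uniform(0, 2 * np.pi, m))                     # y in C_1
print(np.all(np.abs(x2 / np.abs(x2) - y) <= 2 * np.abs(x2 - y) + 1e-12))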

Proof of Theorem 3.

Let b11b_{1}\in\mathbb{C}_{1} satisfy uu~b1=infa1uu~a\|{u-\widetilde{u}b_{1}}\|=\inf_{a\in\mathbb{C}_{1}}\left\|{u-\widetilde{u}a}\right\|. Denote δ:=uu~b1n\delta:=u-\widetilde{u}b_{1}\in\mathbb{C}^{n}. Recall uwidecheck\widecheck{u} is the leading eigenvector of AA. From Proposition 1, Proposition 2, and Lemma 8, there exist constants c1,c2>0c_{1},c_{2}>0 such that if nplogn,npσ2>c1\frac{np}{\log n},\frac{np}{\sigma^{2}}>c_{1}, we have

δ\displaystyle\left\|{\delta}\right\| c2σ2+σnp,\displaystyle\leq c_{2}\frac{\sigma^{2}+\sigma}{np}, (34)
maxj[n]|uwidecheckj1nb2|\displaystyle\max_{j\in[n]}\left|\widecheck{u}_{j}-\frac{1}{\sqrt{n}}b_{2}\right| c2(lognnp+1log(np))1n,\displaystyle\leq c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\frac{1}{\sqrt{n}}, (35)
A𝔼A\displaystyle\left\|{A-\mathbb{E}A}\right\| c2np,\displaystyle\leq c_{2}\sqrt{np}, (36)
AW\displaystyle\left\|{A\circ W}\right\| c2np,\displaystyle\leq c_{2}\sqrt{np}, (37)
j[n]|Im(kjAjkWjkzj¯zk)|2\displaystyle\sum_{j\in[n]}\left|{\rm Im}\left({\sum_{k\neq j}A_{jk}W_{jk}\overline{z_{j}^{*}}z^{*}_{k}}\right)\right|^{2} n2p2(1+c2lognn),\displaystyle\leq\frac{n^{2}p}{2}\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right), (38)

with probability at least 1n91-n^{-9}, for some b2{1,1}b_{2}\in\{-1,1\}.

From (35), when nplogn2c22\frac{np}{\log n}\geq 2c_{2}^{2}, uwidecheck\widecheck{u} is closer to 𝟙n/nb2\mathds{1}_{n}/\sqrt{n}b_{2} than to 𝟙n/nb2-\mathds{1}_{n}/\sqrt{n}b_{2} with respect to 2\ell_{2} norm. From Lemma 7, 𝟙n/n\mathds{1}_{n}/\sqrt{n} is the leading eigenvector of 𝔼A\mathbb{E}A. By Lemma 5 and Lemma 6, we have

uwidecheck𝟙n/nb22(I𝟙n𝟙nT/n)uwidecheck2A𝔼Anp2c2np.\displaystyle\left\|{\widecheck{u}-\mathds{1}_{n}/\sqrt{n}b_{2}}\right\|\leq\sqrt{2}\|(I-\mathds{1}_{n}\mathds{1}_{n}^{\mathrm{\scriptscriptstyle T}}/n)\widecheck{u}\|\leq\frac{2\left\|{A-\mathbb{E}A}\right\|}{np}\leq\frac{2c_{2}}{\sqrt{np}}.

Recall that uu^{*} is defined as zuwidecheckz^{*}\circ\widecheck{u} in (8). Define δ:=u1nzb2\delta^{*}:=u^{*}-\frac{1}{\sqrt{n}}z^{*}b_{2}. This yields

δ\displaystyle\left\|{\delta^{*}}\right\| =zuwidecheck1nz𝟙nb2=z(uwidecheck1n𝟙nb2)\displaystyle=\left\|{z^{*}\circ\widecheck{u}-\frac{1}{\sqrt{n}}z^{*}\circ\mathds{1}_{n}b_{2}}\right\|=\left\|{z^{*}\circ\left(\widecheck{u}-\frac{1}{\sqrt{n}}\mathds{1}_{n}b_{2}\right)}\right\|
=uwidecheck1n𝟙nb22c2np+2pnp.\displaystyle=\left\|{\widecheck{u}-\frac{1}{\sqrt{n}}\mathds{1}_{n}b_{2}}\right\|\leq\frac{2c_{2}\sqrt{np}+2p}{np}. (39)

By the definition of u~\widetilde{u} in (9), we can decompose uu into

u\displaystyle u =u~b1+δ=XuXub1+δ=b1Xu((AzzH)u+σ(AW)u)+δ\displaystyle=\widetilde{u}b_{1}+\delta=\frac{Xu^{*}}{\left\|{Xu^{*}}\right\|}b_{1}+\delta=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\left(A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}\right)u^{*}+\sigma\left(A\circ W\right)u^{*}\right)+\delta
=b1Xu(λu+σ(AW)u)+δ,\displaystyle=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\lambda^{*}u^{*}+\sigma\left(A\circ W\right)u^{*}\right)+\delta, (40)

where we use the fact that uu^{*} is the eigenvector of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}} corresponding to the eigenvalue λ\lambda^{*} by Lemma 1. With the definition of uu^{*} and also its approximation 1nzb2\frac{1}{\sqrt{n}}z^{*}b_{2}, (40) leads to

u=b1Xu(λ(zuwidecheck)+σ(AW)(1nzb2+δ))+δ.\displaystyle u=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\lambda^{*}(z^{*}\circ\widecheck{u})+\sigma\left(A\circ W\right)\left(\frac{1}{\sqrt{n}}z^{*}b_{2}+\delta^{*}\right)\right)+\delta.

For any j[n]j\in[n], denote [AW]j[A\circ W]_{j\cdot} as its jjth row. From the display above, we can express uju_{j} as

uj=b1Xu(λzjuwidecheckj+σnkjAjkWjkzkb2+σ[AW]jδ)+δj.\displaystyle u_{j}=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\lambda^{*}z^{*}_{j}\widecheck{u}_{j}+\frac{\sigma}{\sqrt{n}}\sum_{k\neq j}A_{jk}W_{jk}z^{*}_{k}b_{2}+\sigma[A\circ W]_{j\cdot}\delta^{*}\right)+\delta_{j}.

By (4), when uj0u_{j}\neq 0, we have

|z^jzjb1b2|\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right| =|b2b1zj¯z^j1|=|b2b1zj¯uj|uj|1|=|b2b1zj¯uj|b2b1zj¯uj|1|\displaystyle=\left|b_{2}\overline{b_{1}z_{j}^{*}}\widehat{z}_{j}-1\right|=\left|b_{2}\overline{b_{1}z_{j}^{*}}\frac{u_{j}}{\left|u_{j}\right|}-1\right|=\left|\frac{b_{2}\overline{b_{1}z_{j}^{*}}u_{j}}{\left|b_{2}\overline{b_{1}z_{j}^{*}}u_{j}\right|}-1\right|
=|Xuλb2b1zj¯uj|Xuλb2b1zj¯uj|1|\displaystyle=\left|\frac{\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}}{\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}\right|}-1\right| (41)

so the problem reduces to analyzing the quantity \frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}. With

ξj:=kjAjkWjkzj¯zk,\displaystyle\xi_{j}:=\sum_{k\neq j}A_{jk}W_{jk}\overline{z^{*}_{j}}z^{*}_{k},

we have

Xuλb2b1zj¯uj\displaystyle\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j} =b2uwidecheckj+σλnξj+σ[AW]jδb2zj¯λ+Xuλδjb2b1zj¯.\displaystyle={b_{2}}\widecheck{u}_{j}+\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}+\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}b_{2}\overline{z_{j}^{*}}}{\lambda^{*}}+\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}b_{2}\overline{b_{1}z_{j}^{*}}. (42)

Note that from (35), we have

b2uwidecheckj(1c2(lognnp+1log(np)))1n.\displaystyle b_{2}\widecheck{u}_{j}\geq\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{1}{\sqrt{n}}. (43)

Let 0<γ,ρ<1/80<\gamma,\rho<1/8 whose values will be given later. Consider the following two cases.

(1) If

|σλnξj|\displaystyle\left|\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}\right| γn,\displaystyle\leq\frac{\gamma}{\sqrt{n}}, (44)
|σ[AW]jδλ|\displaystyle\left|\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}}{\lambda^{*}}\right| ρn,\displaystyle\leq\frac{\rho}{\sqrt{n}}, (45)
|Xuλδj|\displaystyle\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}\right| ρn\displaystyle\leq\frac{\rho}{\sqrt{n}} (46)

all hold, then from (42) and (43), we have

Re(Xuλb2b1zj¯uj)(1c2(lognnp+1log(np))γ2ρ)1n,\displaystyle{\rm Re}\left(\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}\right)\geq\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)\frac{1}{\sqrt{n}},

which can be further lower bounded by 1/(2n)1/(2\sqrt{n}) for sufficiently large nplogn\frac{np}{\log n}. Therefore, uj0u_{j}\neq 0 in this case. Then by Lemma 12 and (41), we have

|z^jzjb1b2|\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|
|Im(σλnξj+σ[AW]jδb2zj¯λ+Xuλδjb2b1zj¯)|(1c2(lognnp+1log(np))γ2ρ)1n\displaystyle\leq\frac{\left|{\rm Im}\left(\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}+\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}b_{2}\overline{z_{j}^{*}}}{\lambda^{*}}+\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}b_{2}\overline{b_{1}z_{j}^{*}}\right)\right|}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)\frac{1}{\sqrt{n}}}
|Im(σλnξj)|(1c2(lognnp+1log(np))γ2ρ)1n+|σ[AW]jδλ|+|Xuλδj|12n\displaystyle\leq\frac{\left|{\rm Im}\left(\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}\right)\right|}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)\frac{1}{\sqrt{n}}}+\frac{\left|\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}}{\lambda^{*}}\right|+\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}\right|}{\frac{1}{2\sqrt{n}}}
=σλ|Im(ξj)|(1c2(lognnp+1log(np))γ2ρ)+2nσλ|[AW]jδ|+2nXuλ|δj|.\displaystyle=\frac{\frac{\sigma}{\lambda^{*}}\left|{\rm Im}\left(\xi_{j}\right)\right|}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)}+\frac{2\sqrt{n}\sigma}{\lambda^{*}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|+\frac{2\sqrt{n}\left\|{Xu^{*}}\right\|}{\lambda^{*}}\left|\delta_{j}\right|.

Note that for any x,yx,y\in\mathbb{R} and any η>0\eta>0, we have (x+y)2=x2+2(η1/2x)(η1/2y)+y2(1+η)x2+(1+η1)y2(x+y)^{2}=x^{2}+2(\eta^{1/2}x)(\eta^{-1/2}y)+y^{2}\leq(1+\eta)x^{2}+(1+\eta^{-1})y^{2}. We have

|z^jzjb1b2|2\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2} (1+η)σ2λ2|Im(ξj)|2(1c2(lognnp+1log(np))γ2ρ)2\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}
+(1+η1)8nσ2λ2|[AW]jδ|2+(1+η1)8nXu2λ2|δj|2,\displaystyle\quad+(1+\eta^{-1})\frac{8n\sigma^{2}}{\lambda^{*2}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+(1+\eta^{-1})\frac{8n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left|\delta_{j}\right|^{2},

where the value of η>0\eta>0 will be given later.

(2) If any one of (44)-(46) does not hold, we simply upper bound |z^jzjb1b2||\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}| by 2. Then this case can be written as

|z^jzjb1b2|2\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2}
4(𝕀{|σλnξj|>γn}+𝕀{|σ[AW]jδλ|>ρn}+𝕀{|Xuλδj|>ρn})\displaystyle\leq 4\left({\mathbb{I}\left\{{\left|\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}\right|>\frac{\gamma}{\sqrt{n}}}\right\}}+{\mathbb{I}\left\{{\left|\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}}{\lambda^{*}}\right|>\frac{\rho}{\sqrt{n}}}\right\}}+{\mathbb{I}\left\{{\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}\right|>\frac{\rho}{\sqrt{n}}}\right\}}\right)
4(𝕀{σ|ξj|γλ}+σ2n|[AW]jδ|2ρ2λ2+nXu2|δj|2ρ2λ2),\displaystyle\leq 4\left({\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}+\frac{\sigma^{2}n\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}}{\rho^{2}\lambda^{*2}}+\frac{n\left\|{Xu^{*}}\right\|^{2}\left|\delta_{j}\right|^{2}}{\rho^{2}\lambda^{*2}}\right),

where in the last inequality we use the fact 𝕀{xy}x2/y2{\mathbb{I}\left\{{x\geq y}\right\}}\leq x^{2}/y^{2} for any x,y>0x,y>0.

Combining the above two cases together, we have

|z^jzjb1b2|2\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2}
(1+η)σ2λ2|Im(ξj)|2(1c2(lognnp+1log(np))γ2ρ)2\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}
+(1+η1)8nσ2λ2|[AW]jδ|2+(1+η1)8nXu2λ2|δj|2\displaystyle\quad+(1+\eta^{-1})\frac{8n\sigma^{2}}{\lambda^{*2}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+(1+\eta^{-1})\frac{8n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left|\delta_{j}\right|^{2}
+4(𝕀{σ|ξj|γλ}+σ2n|[AW]jδ|2ρ2λ2+nXu2|δj|2ρ2λ2)\displaystyle\quad+4\left({\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}+\frac{\sigma^{2}n\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}}{\rho^{2}\lambda^{*2}}+\frac{n\left\|{Xu^{*}}\right\|^{2}\left|\delta_{j}\right|^{2}}{\rho^{2}\lambda^{*2}}\right)
(1+η)σ2λ2|Im(ξj)|2(1c2(lognnp+1log(np))γ2ρ)2+4𝕀{σ|ξj|γλ}\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}+4{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}
+8(1+η1+ρ2)nσ2λ2|[AW]jδ|2+8(1+η1+ρ2)nXu2λ2|δj|2.\displaystyle\quad+8(1+\eta^{-1}+\rho^{-2})\frac{n\sigma^{2}}{\lambda^{*2}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+8(1+\eta^{-1}+\rho^{-2})\frac{n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left|\delta_{j}\right|^{2}.

The display above holds for each j[n]j\in[n]. Summing over jj, we have

n(z^,z)\displaystyle n\ell(\widehat{z},z^{*})
j[n]|z^jzjb1b2|2\displaystyle\leq\sum_{j\in[n]}\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2}
(1+η)σ2λ2(1c2(lognnp+1log(np))γ2ρ)2j[n]|Im(ξj)|2+4j[n]𝕀{σ|ξj|γλ}\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}\sum_{j\in[n]}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}+4\sum_{j\in[n]}{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}
+8(1+η1+ρ2)nσ2λ2j[n]|[AW]jδ|2+8(1+η1+ρ2)nXu2λ2j[n]|δj|2\displaystyle\quad+8(1+\eta^{-1}+\rho^{-2})\frac{n\sigma^{2}}{\lambda^{*2}}\sum_{j\in[n]}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+8(1+\eta^{-1}+\rho^{-2})\frac{n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\sum_{j\in[n]}\left|\delta_{j}\right|^{2}
(1+η)σ2λ2(1c2(lognnp+1log(np))γ2ρ)2j[n]|Im(ξj)|2+4j[n]𝕀{σ|ξj|γλ}\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}\sum_{j\in[n]}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}+4\sum_{j\in[n]}{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}
+8(1+η1+ρ2)nσ2λ2AW2δ2+8(1+η1+ρ2)nXu2λ2δ2,\displaystyle\quad+8(1+\eta^{-1}+\rho^{-2})\frac{n\sigma^{2}}{\lambda^{*2}}\left\|{A\circ W}\right\|^{2}\left\|{\delta^{*}}\right\|^{2}+8(1+\eta^{-1}+\rho^{-2})\frac{n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left\|{\delta}\right\|^{2},

where in the last inequality, we use j[n]|[AW]jδ|2=(AW)δ2AW2δ2\sum_{j\in[n]}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}=\left\|{(A\circ W)\delta^{*}}\right\|^{2}\leq\left\|{A\circ W}\right\|^{2}\left\|{\delta^{*}}\right\|^{2}.

We are going to simplify the display above. From (34), (37), (39), and (38), we have upper bounds for δ,AW,\left\|{\delta}\right\|,\left\|{A\circ W}\right\|, δ\left\|{\delta^{*}}\right\|, and j[n]|Im(ξj)|2\sum_{j\in[n]}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}. Using (36), Lemma 1, and Lemma 7, we have λ(n1)pc2np\lambda^{*}\geq(n-1)p-c_{2}\sqrt{np} and a crude bound np/2λ2npnp/2\leq\lambda^{*}\leq 2np when nplogn\frac{np}{\log n} is greater than some sufficiently large constant. Due to the decomposition X=AzzH+σAWX=A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}+\sigma A\circ W and that (AzzH)u=λu(A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}})u^{*}=\lambda^{*}u^{*}, we have

\displaystyle\left\|{Xu^{*}}\right\|=\left\|{\lambda^{*}u^{*}+\sigma(A\circ W)u^{*}}\right\|\leq\lambda^{*}+\sigma\left\|{A\circ W}\right\|\leq np+c_{2}(1+\sigma)\sqrt{np}.

From Lemma 11, if γ\gamma satisfies γ2npσ2>c3\frac{\gamma^{2}np}{\sigma^{2}}>c_{3} for some constant c3>0c_{3}>0, we have

j[n]𝕀{σ|ξj|γλ}\displaystyle\sum_{j\in[n]}{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}} j[n]𝕀{2σnp|ξj|γ}4σ2γ2pexp(116γ2npσ2),\displaystyle\leq\sum_{j\in[n]}{\mathbb{I}\left\{{\frac{2\sigma}{np}\left|\xi_{j}\right|\geq{\gamma}}\right\}}\leq\frac{4\sigma^{2}}{\gamma^{2}p}\exp\left(-\frac{1}{16}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right),

holds with probability at least 1exp(132γ2npσ2)1-\exp\left(-\frac{1}{32}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right). When c3c_{3} is sufficiently large, we have

4σ2γ2npexp(116γ2npσ2)(σ2γ2np)3,\displaystyle\frac{4\sigma^{2}}{\gamma^{2}np}\exp\left(-\frac{1}{16}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right)\leq\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3},

which is due to the fact 4exp(x/16)1/x24\exp\left(-\sqrt{x}/16\right)\leq 1/x^{2} when xx0x\geq x_{0} for some large x0>0x_{0}>0.

Combining the above results together, we have

(z^,z)\displaystyle\ell(\widehat{z},z^{*}) (1+η)(11c21np1n)2(1c2(lognnp+1log(np))γ2ρ)2(1+c2lognn)σ22np+(σ2γ2np)3\displaystyle\leq\frac{(1+\eta)\left(\frac{1}{1-c_{2}\frac{1}{\sqrt{np}}-\frac{1}{n}}\right)^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right)\frac{\sigma^{2}}{2np}+\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3}
+32(1+η1+ρ2)c22(2c2np)2σ2np\displaystyle\quad+32(1+\eta^{-1}+\rho^{-2})c_{2}^{2}\left(\frac{2c_{2}}{\sqrt{np}}\right)^{2}\frac{\sigma^{2}}{np}
+128(1+η1+ρ2)(1+c22σ2np)c22σ4+σ2(np)2.\displaystyle\quad+128(1+\eta^{-1}+\rho^{-2})\left(1+\frac{c_{2}^{2}\sigma^{2}}{np}\right)c_{2}^{2}\frac{\sigma^{4}+\sigma^{2}}{(np)^{2}}.

Note that \frac{1}{(1-x)^{2}}\leq 1+16x for all 0\leq x\leq\frac{1}{2}. We thus have \left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{-2}\leq 1+16\left(c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)+\gamma+2\rho\right) and \left(1-c_{2}\frac{1}{\sqrt{np}}-\frac{1}{n}\right)^{-2}\leq 1+16\left(c_{2}\frac{1}{\sqrt{np}}+\frac{1}{n}\right), as long as \frac{np}{\log n} is greater than some sufficiently large constant. After rearrangement, there exists some constant c_{5}>0 such that

(z^,z)(1+c5(\displaystyle\ell(\widehat{z},z^{*})\leq\Bigg{(}1+c_{5}\Bigg{(} η+γ+ρ+lognnp+1log(np)+γ6(σ2np)2\displaystyle\eta+\gamma+\rho+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}+\gamma^{-6}\left(\frac{\sigma^{2}}{np}\right)^{2}
+(η1+ρ2)(1+σ2np)))σ22np.\displaystyle\quad+(\eta^{-1}+\rho^{-2})\left(\frac{1+\sigma^{2}}{np}\right)\Bigg{)}\Bigg{)}\frac{\sigma^{2}}{2np}.

We can choose γ2=σ2/(np)\gamma^{2}=\sqrt{{\sigma^{2}}/{(np)}} (then γ2npσ2>c3\frac{\gamma^{2}np}{\sigma^{2}}>c_{3} is guaranteed as long as npσ2>c32\frac{np}{\sigma^{2}}>c_{3}^{2}). We also set ρ2=(1+σ2)/np\rho^{2}=\sqrt{(1+\sigma^{2})/np} and let η=ρ2\eta=\rho^{2}. Then, there exists some constant c6>0c_{6}>0 such that

(z^,z)(1+c6((σ2np)14+lognnp+1log(np)))σ22np.\displaystyle\ell(\widehat{z},z^{*})\leq\left(1+c_{6}\left(\left(\frac{\sigma^{2}}{np}\right)^{\frac{1}{4}}+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{\sigma^{2}}{2np}.

This holds with probability at least 1n9exp(132(npσ2)14)1-n^{-9}-\exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^{2}}\right)^{\frac{1}{4}}\right). ∎

5.4 Proofs of Auxiliary Lemmas

Proof of Lemma 5.

Let \widetilde{\lambda}_{1}\geq\widetilde{\lambda}_{2}\geq\ldots\geq\widetilde{\lambda}_{d} be the eigenvalues of \widetilde{X}. By Weyl's inequality, we have |\widetilde{\lambda}_{r+1}-\lambda_{r+1}|\leq\|X-\widetilde{X}\|. Under the assumption \|{X-\widetilde{X}}\|<(\lambda_{r}-\lambda_{r+1})/4, we have

λrλ~r+1\displaystyle\lambda_{r}-\widetilde{\lambda}_{r+1} =λrλr+1+λr+1λ~r+1λrλr+1XX~>34(λrλr+1)>0.\displaystyle=\lambda_{r}-\lambda_{r+1}+\lambda_{r+1}-\widetilde{\lambda}_{r+1}\geq\lambda_{r}-\lambda_{r+1}-\left\|{X-\widetilde{X}}\right\|>\frac{3}{4}\left(\lambda_{r}-\lambda_{r+1}\right)>0.

Define

Θ(U,U~):=diag(cos1σ1,,cos1σr)r×r,\displaystyle\Theta(U,\widetilde{U}):=\text{diag}(\cos^{-1}\sigma_{1},\ldots,\cos^{-1}\sigma_{r})\in\mathbb{R}^{r\times r},

where σ1σ2σr\sigma_{1}\geq\sigma_{2}\geq\ldots\geq\sigma_{r} are singular values of UHU~U^{\mathrm{\scriptscriptstyle H}}\widetilde{U}. Since λrλ~r+1>0\lambda_{r}-\widetilde{\lambda}_{r+1}>0, by Davis-Kahan Theorem [14], we have

sinΘ(U,U~)XX~λrλ~r+14XX~3(λrλr+1).\displaystyle\left\|{\sin\Theta(U,\widetilde{U})}\right\|\leq\frac{\left\|{X-\widetilde{X}}\right\|}{\lambda_{r}-\widetilde{\lambda}_{r+1}}\leq\frac{4\left\|{X-\widetilde{X}}\right\|}{3(\lambda_{r}-\lambda_{r+1})}.

From [14], we also have sinΘ(U,U~)=(IUUH)U~\|{\sin\Theta(U,\widetilde{U})}\|=\|{(I-UU^{\mathrm{\scriptscriptstyle H}})\widetilde{U}}\|. The proof is complete. ∎
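A hedged numerical check of the bound in Lemma 5 on a random real symmetric instance (all names ours):

import numpy as np

rng = np.random.default_rng(4)
d, r = 100, 3
G = rng.normal(size=(d, d))
X = (G + G.T) / 2                                # symmetric X
lam, V = np.linalg.eigh(X)
lam, V = lam[::-1], V[:, ::-1]                   # eigenvalues in descending order
gap = lam[r - 1] - lam[r]                        # lambda_r - lambda_{r+1}

E = rng.normal(size=(d, d))
E = (E + E.T) / 2
E *= 0.1 * gap / np.linalg.norm(E, 2)            # ensures ||X - Xt|| < gap / 4
Xt = X + E
U = V[:, :r]                                     # leading r eigenvectors of X
Ut = np.linalg.eigh(Xt)[1][:, ::-1][:, :r]       # leading r eigenvectors of Xt

lhs = np.linalg.norm((np.eye(d) - U @ U.T) @ Ut, 2)
rhs = 4 * np.linalg.norm(E, 2) / (3 * gap)
print(lhs <= rhs)                                # True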

Proof of Lemma 6.

Since both xx and yy are unit vectors, we have

xyb2=2xHyb(yb)Hx=22Re(xHyb),b1.\displaystyle\left\|{x-yb}\right\|^{2}=2-x^{\mathrm{\scriptscriptstyle H}}yb-(yb)^{\mathrm{\scriptscriptstyle H}}x=2-2{\rm Re}(x^{\mathrm{\scriptscriptstyle H}}yb),\forall b\in\mathbb{C}_{1}. (47)

Therefore, when x^{\mathrm{\scriptscriptstyle H}}y=0, we have \left\|{x-yb}\right\|=\sqrt{2} for every b\in\mathbb{C}_{1}. In this case, we also have \left\|{(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y}\right\|=\left\|{y}\right\|=1, which proves the statement of the lemma in the x^{\mathrm{\scriptscriptstyle H}}y=0 case. When x^{\mathrm{\scriptscriptstyle H}}y\neq 0, the infimum over b in (47) is achieved at b=y^{\mathrm{\scriptscriptstyle H}}x/|y^{\mathrm{\scriptscriptstyle H}}x|. We then have

infb1xyb2\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{x-yb}\right\|^{2} =yxHy|xHy|x2=yxxHy+xxHyxHy|xHy|x2\displaystyle=\left\|{y-\frac{x^{\mathrm{\scriptscriptstyle H}}y}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}x}\right\|^{2}=\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y+xx^{\mathrm{\scriptscriptstyle H}}y-\frac{x^{\mathrm{\scriptscriptstyle H}}y}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}x}\right\|^{2}
=yxxHy2+(11|xHy|)(xHy)x2\displaystyle=\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}+\left\|{\left(1-\frac{1}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}\right)(x^{\mathrm{\scriptscriptstyle H}}y)x}\right\|^{2}
=yxxHy2+|11|xHy||2|xHy|2\displaystyle=\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}+\left|1-\frac{1}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}\right|^{2}\left|x^{\mathrm{\scriptscriptstyle H}}y\right|^{2}
yxxHy2+|1|xHy||2,\displaystyle\leq\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}+\left|1-\left|x^{\mathrm{\scriptscriptstyle H}}y\right|\right|^{2},

where we use the orthogonality between (IdxxH)y(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y and xx. With yxxHy2=1+xxHy22yHxxHy=1|xHy|2(1|xHy|)2\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}=1+\left\|{xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}-2y^{\mathrm{\scriptscriptstyle H}}xx^{\mathrm{\scriptscriptstyle H}}y=1-\left|x^{\mathrm{\scriptscriptstyle H}}y\right|^{2}\geq\left(1-\left|x^{\mathrm{\scriptscriptstyle H}}y\right|\right)^{2}, where the last inequality is due to 0|xHy|10\leq\left|x^{\mathrm{\scriptscriptstyle H}}y\right|\leq 1, the proof is complete. ∎

Proof of Lemma 7.

Note that \mathbb{E}A=pJ_{n}-pI_{n}. Then (\mathds{1}_{n}/\sqrt{n})^{\mathrm{\scriptscriptstyle T}}\mathbb{E}A(\mathds{1}_{n}/\sqrt{n})=(n-1)p, and for any unit vector u\in\mathbb{R}^{n} orthogonal to \mathds{1}_{n}/\sqrt{n}, we have u^{\mathrm{\scriptscriptstyle T}}\mathbb{E}Au=0-p\|u\|^{2}=-p. Hence, (n-1)p is the largest eigenvalue with \mathds{1}_{n}/\sqrt{n} being the corresponding eigenvector, and -p is the remaining eigenvalue with multiplicity n-1.

By Weyl’s inequality, we have |λ(n1)p|,max2jn|λj(p)|A𝔼A|\lambda^{\prime}-(n-1)p|,\max_{2\leq j\leq n}|\lambda^{\prime}_{j}-(-p)|\leq\left\|{A-\mathbb{E}A}\right\|, which leads to (33) after rearrangement. This completes the proof, with λ=λ\lambda^{*}=\lambda^{\prime} and λ2=λ2\lambda^{*}_{2}=\lambda^{\prime}_{2} by Lemma 1. ∎

Proof of Lemma 8.

The first two inequalities stem from Lemma 5 and Lemma 6 of [18], respectively. The third inequality is derived from Lemma 7 and (29) in [18]. ∎

Proof of Lemma 11.

It is proved in (31) of [18]. ∎

6 Proof of Lemma 4

Before the proof, we first state a technical lemma that is analogous to Lemma 6.

Lemma 13.

For any two matrices U,V𝒪(d1,d2)U,V\in\mathcal{O}(d_{1},d_{2}), we have

(Id1VVT)UinfO𝒪(d2)VUO2(Id1VVT)U.\displaystyle\left\|{(I_{d_{1}}-VV^{\mathrm{\scriptscriptstyle T}})U}\right\|\leq\inf_{O\in\mathcal{O}(d_{2})}\left\|{V-UO}\right\|\leq\sqrt{2}\left\|{(I_{d_{1}}-VV^{\mathrm{\scriptscriptstyle T}})U}\right\|.
Proof.

Let V_{\perp}\in\mathbb{R}^{d_{1}\times(d_{1}-d_{2})} be an orthogonal complement of V such that (V,V_{\perp})\in\mathcal{O}(d_{1}). From Lemma 1 of [10], we have \|U^{\mathrm{\scriptscriptstyle T}}V_{\perp}\|\leq\inf_{O\in\mathcal{O}(d_{2})}\left\|{V-UO}\right\|\leq\sqrt{2}\|U^{\mathrm{\scriptscriptstyle T}}V_{\perp}\|. The proof is complete with \|U^{\mathrm{\scriptscriptstyle T}}V_{\perp}\|=\|V_{\perp}V_{\perp}^{\mathrm{\scriptscriptstyle T}}U\|=\left\|{(I_{d_{1}}-VV^{\mathrm{\scriptscriptstyle T}})U}\right\|. ∎

Proof of Lemma 4.

We first give an explicit expression for the first-order approximation \widetilde{V}. Denote \mu_{1}\geq\ldots\geq\mu_{n} as the eigenvalues of Y. Let YV^{*}=GDN^{\mathrm{\scriptscriptstyle T}} be the SVD of YV^{*}, where G\in\mathcal{O}(n,d), N\in\mathcal{O}(d), and D\in\mathbb{R}^{d\times d} is a diagonal matrix with the singular values on its diagonal. Define M^{*}=\text{diag}(\mu_{1}^{*},\ldots,\mu^{*}_{d})\in\mathbb{R}^{d\times d}. Since

YV=YV+(YY)V=VM+(YY)V,\displaystyle YV^{*}=Y^{*}V^{*}+(Y-Y^{*})V^{*}=V^{*}M^{*}+(Y-Y^{*})V^{*}, (48)

we have

maxi[d]|Diiμi|(YY)VYY,\displaystyle\max_{i\in[d]}\left|D_{ii}-\mu^{*}_{i}\right|\leq\left\|{(Y-Y^{*})V^{*}}\right\|\leq\left\|{Y-Y^{*}}\right\|, (49)

by Weyl’s inequality. Under the assumption that YYmin{μdμd+1,μd}/4\left\|{Y-Y^{*}}\right\|\leq\min\{\mu_{d}^{*}-\mu_{d+1}^{*},\mu_{d}^{*}\}/4, we have {Dii}i[d]\{D_{ii}\}_{i\in[d]} all being positive. Note that

\displaystyle\widetilde{V}=\mathop{\rm argmin}_{V^{\prime}\in\mathcal{O}(n,d)}\left\|{V^{\prime}-YV^{*}}\right\|_{\rm F}^{2}=\mathop{\rm argmax}_{V^{\prime}\in\mathcal{O}(n,d)}\left\langle V^{\prime},YV^{*}\right\rangle
\displaystyle\quad=\mathop{\rm argmax}_{V^{\prime}\in\mathcal{O}(n,d)}\text{tr}\left(V^{\prime{\mathrm{\scriptscriptstyle T}}}GDN^{\mathrm{\scriptscriptstyle T}}\right)=\mathop{\rm argmax}_{V^{\prime}\in\mathcal{O}(n,d)}\left\langle G^{\mathrm{\scriptscriptstyle T}}V^{\prime}N,D\right\rangle.

Due to the fact that G,V𝒪(n,d)G,V^{\prime}\in\mathcal{O}(n,d), N𝒪(d)N\in\mathcal{O}(d), and the diagonal entries of DD are all positive, the maximum is achieved when GTVN=IdG^{\mathrm{\scriptscriptstyle T}}V^{\prime}N=I_{d}. This gives V~=GNT\widetilde{V}=GN^{\mathrm{\scriptscriptstyle T}} which can also be written as

V~=YVS,\displaystyle\widetilde{V}=YV^{*}S, (50)

where

S:=ND1NTd×d\displaystyle S:=ND^{-1}N^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{d\times d} (51)

can be seen as a linear operator and plays a similar role as 1/Xu1/\left\|{Xu^{*}}\right\| for u~=Xu/Xu\widetilde{u}=Xu^{*}/\left\|{Xu^{*}}\right\| in (9).
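In matrix form, \widetilde{V} is the orthogonal Procrustes projection of YV^{*} onto \mathcal{O}(n,d), and both \widetilde{V} and S come out of a single SVD; a hedged NumPy sketch (names ours):

import numpy as np

rng = np.random.default_rng(5)
n, d = 50, 4
Y = rng.normal(size=(n, n))
Y = (Y + Y.T) / 2                                    # symmetric Y
Vstar = np.linalg.qr(rng.normal(size=(n, d)))[0]     # some V* in O(n, d)

B = Y @ Vstar                                        # Y V*, full rank a.s.
G, svals, Nt = np.linalg.svd(B, full_matrices=False) # B = G diag(svals) N^T
Vtilde = G @ Nt                                      # closest element of O(n, d) to B
S = Nt.T @ np.diag(1.0 / svals) @ Nt                 # S = N D^{-1} N^T as in (51)
print(np.allclose(Vtilde, B @ S))                    # True: Vtilde = Y V* S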

Define M:=diag(μ1,μ2,,μd)d×dM:=\text{diag}(\mu_{1},\mu_{2},\ldots,\mu_{d})\in\mathbb{R}^{d\times d}. Then we have

VM\displaystyle VM =YV,\displaystyle=YV,
V~M\displaystyle\widetilde{V}M =YVSM,\displaystyle=YV^{*}SM,

and consequently,

(VV~)M=Y(VVSM)=Y(VV~)+Y(V~VSM).\displaystyle(V-\widetilde{V})M=Y(V-V^{*}SM)=Y(V-\widetilde{V})+Y(\widetilde{V}-V^{*}SM).

Note that (IVVT)Y=Y(IVVT)(I-VV^{\mathrm{\scriptscriptstyle T}})Y=Y(I-VV^{\mathrm{\scriptscriptstyle T}}) as VV is the leading eigenspace of YY. After rearranging, we have

YV~V~M=Y(V~VSM).\displaystyle Y\widetilde{V}-\widetilde{V}M=Y(\widetilde{V}-V^{*}SM).

Multiplying (IVVT)(I-VV^{\mathrm{\scriptscriptstyle T}}) on both sides, we have

Y(IVVT)V~(IVVT)V~M\displaystyle Y(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}-(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}M =(IVVT)YV~(IVVT)V~M\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y\widetilde{V}-(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}M
=(IVVT)Y(V~VSM),\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM}),

where the first equation is due to Y(IVVT)=(IVVT)YY(I-VV^{\mathrm{\scriptscriptstyle T}})=(I-VV^{\mathrm{\scriptscriptstyle T}})Y as VV is the leading eigenspace of YY. Note that for any xspan(IVVT)x\in\text{span}(I-VV^{\mathrm{\scriptscriptstyle T}}) and for any i[d]i\in[d], we have Yxμix(μiμd+1)x\left\|{Yx-\mu_{i}x}\right\|\geq(\mu_{i}-\mu_{d+1})\left\|{x}\right\|. Then we have

Y(IVVT)V~(IVVT)V~M(μdμd+1)(IVVT)V~.\displaystyle\left\|{Y(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}-(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}M}\right\|\geq(\mu_{d}-\mu_{d+1})\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}}\right\|.

As a result, we have

(IVVT)V~1μdμd+1(IVVT)Y(V~VSM),\displaystyle\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}}\right\|\leq\frac{1}{\mu_{d}-\mu_{d+1}}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})}\right\|, (52)

which is analogous to (31) in the proof of Lemma 2. By Lemma 13, we have

infO𝒪(d)VV~O2(IVVT)V~2μdμd+1(IVVT)Y(V~VSM).\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{V-\widetilde{V}O}\right\|\leq\sqrt{2}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}}\right\|\leq\frac{\sqrt{2}}{\mu_{d}-\mu_{d+1}}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})}\right\|. (53)

Next, we analyze (I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM}). Using (50), we have

(IVVT)Y(V~VSM)\displaystyle(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})
=(IVVT)Y(YVSVSM)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y\left(YV^{*}S-V^{*}SM\right)
=(IVVT)Y(VMS+(YY)VSVSM)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y\left(V^{*}M^{*}S+(Y-Y^{*})V^{*}S-V^{*}SM\right)
=(IVVT)YV(MSSM)+(IVVT)Y(YY)VS\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})YV^{*}\left(M^{*}S-SM\right)+(I-VV^{\mathrm{\scriptscriptstyle T}})Y(Y-Y^{*})V^{*}S
=(IVVT)(VM+(YY)V)(MSSM)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})\left(V^{*}M^{*}+(Y-Y^{*})V^{*}\right)\left(M^{*}S-SM\right)
+(IVVT)VMVT(YY)VS\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}}(Y-Y^{*})V^{*}S
+(IVVT)(YVMVT)(YY)VS+(IVVT)(YY)(YY)VS\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}})(Y-Y^{*})V^{*}S+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y-Y^{*})(Y-Y^{*})V^{*}S
=(IVVT)VM((MSSM)+VT(YY)VS)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}M^{*}\left(\left(M^{*}S-SM\right)+V^{*{\mathrm{\scriptscriptstyle T}}}(Y-Y^{*})V^{*}S\right)
+(IVVT)(YY)V(MSSM)\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y-Y^{*})V^{*}\left(M^{*}S-SM\right)
+(IVVT)(YVMVT)(YY)VS+(IVVT)(YY)(YY)VS,\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}})(Y-Y^{*})V^{*}S+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y-Y^{*})(Y-Y^{*})V^{*}S,

where in the second-to-last equality we use (48) and the decomposition Y=V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}}+(Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}})+(Y-Y^{*}). Hence, with \|Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}}\|=\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}, we have

(IVVT)Y(V~VSM)\displaystyle\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})}\right\|
μ1(IVVT)V(MSSM+YYS)\displaystyle\leq\mu_{1}^{*}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}}\right\|\left(\left\|{M^{*}S-SM}\right\|+\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|\right)
+YYMSSM+max{|μd+1|,|μn|}YYS+YY2S.\displaystyle\quad+\left\|{Y-Y^{*}}\right\|\left\|{M^{*}S-SM}\right\|+\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|+\left\|{Y-Y^{*}}\right\|^{2}\left\|{S}\right\|.

Then from (53), we have

infO𝒪(d)VV~O\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{V-\widetilde{V}O}\right\| 2μdμd+1(μ1(IVVT)V(MSSM+YYS)\displaystyle\leq\frac{\sqrt{2}}{\mu_{d}-\mu_{d+1}}\Bigg{(}\mu_{1}^{*}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}}\right\|\left(\left\|{M^{*}S-SM}\right\|+\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|\right)
+YYMSSM+max{|μd+1|,|μn|}YYS\displaystyle\quad+\left\|{Y-Y^{*}}\right\|\left\|{M^{*}S-SM}\right\|+\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|
+YY2S).\displaystyle\quad+\left\|{Y-Y^{*}}\right\|^{2}\left\|{S}\right\|\Bigg{)}.

In the rest of the proof, we are going to simplify the display above. By Weyl’s inequality, we have

maxi[n]|μiμi|YY.\displaystyle\max_{i\in[n]}\left|\mu_{i}-\mu_{i}^{*}\right|\leq\left\|{Y-Y^{*}}\right\|. (54)

Since YY(μdμd+1)/4\left\|{Y-Y^{*}}\right\|\leq(\mu_{d}^{*}-\mu_{d+1}^{*})/4 is assumed, we have

μdμd+1μdμd+12.\displaystyle\mu_{d}-\mu_{d+1}\geq\frac{\mu_{d}^{*}-\mu_{d+1}^{*}}{2}.

By this assumption and Lemma 5, we have

(IVVT)V2YYμdμd+1.\displaystyle\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}}\right\|\leq\frac{2\left\|{Y-Y^{*}}\right\|}{\mu^{*}_{d}-\mu^{*}_{d+1}}.
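Both (54) and the bound above are classical perturbation facts (Weyl's inequality and a Davis–Kahan-type bound), and they can be checked numerically on a synthetic symmetric instance; the sketch below (illustrative constants, ours) does so.

import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 3
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
mu_star = np.concatenate([[10.0, 9.5, 9.0], rng.uniform(-1, 1, n - d)])
Y_star = (Q * mu_star) @ Q.T                   # Y* with top-d eigengap about 8
E = rng.standard_normal((n, n)); E = 0.05 * (E + E.T)   # small perturbation
Y = Y_star + E

mu = np.sort(np.linalg.eigvalsh(Y))[::-1]
mus = np.sort(mu_star)[::-1]
assert np.max(np.abs(mu - mus)) <= np.linalg.norm(E, 2)        # Weyl, as in (54)

V = np.linalg.eigh(Y)[1][:, ::-1][:, :d]       # leading eigenspace of Y
V_star = Q[:, :d]                              # leading eigenspace of Y*
gap = mus[d - 1] - mus[d]
lhs = np.linalg.norm(V_star - V @ (V.T @ V_star), 2)           # ||(I - V V^T) V*||
assert lhs <= 2 * np.linalg.norm(E, 2) / gap                   # the Lemma 5 bound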

By (49) and the definition of SS in (51), we have

S=D11μdYY43μd.\displaystyle\left\|{S}\right\|=\left\|{D^{-1}}\right\|\leq\frac{1}{\mu_{d}^{*}-\left\|{Y-Y^{*}}\right\|}\leq\frac{4}{3\mu_{d}^{*}}.

In addition,

MSSM\displaystyle\left\|{M^{*}S-SM}\right\| MSSM+S(MM)\displaystyle\leq\left\|{M^{*}S-SM^{*}}\right\|+\left\|{S\left(M-M^{*}\right)}\right\|
(MμdId)S+S(μdIdM)+SMM\displaystyle\leq\left\|{(M^{*}-\mu_{d}^{*}I_{d})S+S(\mu_{d}^{*}I_{d}-M^{*})}\right\|+\left\|{S}\right\|\left\|{M-M^{*}}\right\|
S(2MμdId+MM)\displaystyle\leq\left\|{S}\right\|\left(2\left\|{M^{*}-\mu_{d}^{*}I_{d}}\right\|+\left\|{M-M^{*}}\right\|\right)
43μd(2(μ1μd)+YY),\displaystyle\leq\frac{4}{3\mu_{d}^{*}}\left(2(\mu_{1}^{*}-\mu_{d}^{*})+\left\|{Y-Y^{*}}\right\|\right),

where in the last inequality we use the fact MM=maxi[d]|μiμi|\left\|{M-M^{*}}\right\|=\max_{i\in[d]}\left|\mu_{i}-\mu_{i}^{*}\right| and (54). Combining all the results together, we have

\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{V-\widetilde{V}O}\right\|
\displaystyle\leq\frac{2\sqrt{2}}{\mu_{d}^{*}-\mu_{d+1}^{*}}\Bigg{(}\mu_{1}^{*}\frac{2\left\|{Y-Y^{*}}\right\|}{\mu^{*}_{d}-\mu^{*}_{d+1}}\left(\frac{4\left(2(\mu_{1}^{*}-\mu_{d}^{*})+\left\|{Y-Y^{*}}\right\|\right)}{3\mu_{d}^{*}}+\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{d}^{*}}\right)
\displaystyle\quad+\frac{4}{3\mu_{d}^{*}}\left(2(\mu_{1}^{*}-\mu_{d}^{*})+\left\|{Y-Y^{*}}\right\|\right)\left\|{Y-Y^{*}}\right\|+\frac{4\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\left\|{Y-Y^{*}}\right\|}{3\mu_{d}^{*}}
\displaystyle\quad+\frac{4\left\|{Y-Y^{*}}\right\|^{2}}{3\mu_{d}^{*}}\Bigg{)}
\displaystyle\leq\frac{16\sqrt{2}}{3\left(\mu_{d}^{*}-\mu_{d+1}^{*}\right)\mu_{d}^{*}}\left(\frac{2\mu_{1}^{*}}{3(\mu_{d}^{*}-\mu_{d+1}^{*})}+1\right)\left\|{Y-Y^{*}}\right\|^{2}
\displaystyle\quad+\frac{8\sqrt{2}}{3\left(\mu_{d}^{*}-\mu_{d+1}^{*}\right)\mu_{d}^{*}}\left(\frac{4\mu_{1}^{*}\left(\mu_{1}^{*}-\mu_{d}^{*}\right)}{\mu_{d}^{*}-\mu_{d+1}^{*}}+2(\mu_{1}^{*}-\mu_{d}^{*})+\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\right)\left\|{Y-Y^{*}}\right\|.
∎

References

  • [1] Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, and Yiqiao Zhong. Entrywise eigenvector analysis of random matrices with low expected rank. The Annals of Statistics, 48(3):1452, 2020.
  • [2] Emmanuel Abbe, Laurent Massoulié, Andrea Montanari, Allan Sly, and Nikhil Srivastava. Group synchronization on grids. Mathematical Statistics and Learning, 1(3):227–256, 2018.
  • [3] Joshua Agterberg, Zachary Lubberts, and Carey E Priebe. Entrywise estimation of singular vectors of low-rank matrices with heteroskedasticity and dependence. IEEE Transactions on Information Theory, 68(7):4618–4650, 2022.
  • [4] Mica Arie-Nachimson, Shahar Z Kovalsky, Ira Kemelmacher-Shlizerman, Amit Singer, and Ronen Basri. Global motion estimation from point matches. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 81–88. IEEE, 2012.
  • [5] Afonso S Bandeira, Nicolas Boumal, and Amit Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, 163(1-2):145–167, 2017.
  • [6] Stéphane Boucheron, Gábor Lugosi, and Olivier Bousquet. Concentration inequalities. In Summer school on machine learning, pages 208–240. Springer, 2003.
  • [7] Nicolas Boumal. Nonconvex phase synchronization. SIAM Journal on Optimization, 26(4):2355–2377, 2016.
  • [8] Nicolas Boumal, Amit Singer, and P-A Absil. Robust estimation of rotations from relative measurements by maximum likelihood. In 52nd IEEE Conference on Decision and Control, pages 1156–1161. IEEE, 2013.
  • [9] Changxiao Cai, Gen Li, Yuejie Chi, H Vincent Poor, and Yuxin Chen. Subspace estimation from unbalanced and incomplete data matrices: 2,\ell_{2,\infty} statistical guarantees. The Annals of Statistics, 49(2):944–967, 2021.
  • [10] T Tony Cai and Anru Zhang. Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. The Annals of Statistics, 46(1):60–89, 2018.
  • [11] Joshua Cape, Minh Tang, and Carey E Priebe. The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. The Annals of Statistics, 47(5):2405–2439, 2019.
  • [12] Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, et al. Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14(5):566–806, 2021.
  • [13] Mihai Cucuringu. Sync-Rank: Robust ranking, constrained ranking and rank aggregation via eigenvector and SDP synchronization. IEEE Transactions on Network Science and Engineering, 3(1):58–79, 2016.
  • [14] Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
  • [15] Jianqing Fan, Weichen Wang, and Yiqiao Zhong. An \ell_{\infty} eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research, 18(207):1–42, 2018.
  • [16] Yifeng Fan, Yuehaw Khoo, and Zhizhen Zhao. Joint community detection and rotational synchronization via semidefinite programming. SIAM Journal on Mathematics of Data Science, 4(3):1052–1081, 2022.
  • [17] Frank Filbir, Felix Krahmer, and Oleh Melnyk. On recovery guarantees for angular synchronization. Journal of Fourier Analysis and Applications, 27(2):31, 2021.
  • [18] Chao Gao and Anderson Y Zhang. Exact minimax estimation for phase synchronization. IEEE Transactions on Information Theory, 67(12):8236–8247, 2021.
  • [19] Chao Gao and Anderson Y Zhang. SDP achieves exact minimax optimality in phase synchronization. IEEE Transactions on Information Theory, 2022.
  • [20] Chao Gao and Anderson Y Zhang. Optimal orthogonal group synchronization and rotation group synchronization. Information and Inference: A Journal of the IMA, 12(2):591–632, 2023.
  • [21] Mark A Iwen, Brian Preskitt, Rayan Saab, and Aditya Viswanathan. Phase retrieval from local measurements: Improved robustness via eigenvector-based angular synchronization. Applied and Computational Harmonic Analysis, 48(1):415–444, 2020.
  • [22] Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi. Phase transitions in semidefinite relaxations. Proceedings of the National Academy of Sciences, 113(16):E2218–E2223, 2016.
  • [23] Lihua Lei. Unified 2\ell_{2\rightarrow\infty} eigenspace perturbation theory for symmetric random matrices. arXiv preprint arXiv:1909.04798, 2019.
  • [24] Marc Lelarge and Léo Miolane. Fundamental limits of symmetric low-rank matrix estimation. Probability Theory and Related Fields, 173:859–929, 2019.
  • [25] Gilad Lerman and Yunpeng Shi. Robust group synchronization via cycle-edge message passing. Foundations of Computational Mathematics, 22(6):1665–1741, 2022.
  • [26] Shuyang Ling. Improved performance guarantees for orthogonal group synchronization via generalized power method. SIAM Journal on Optimization, 32(2):1018–1048, 2022.
  • [27] Shuyang Ling. Near-optimal performance bounds for orthogonal and permutation group synchronization via spectral methods. Applied and Computational Harmonic Analysis, 60:20–52, 2022.
  • [28] Shuyang Ling. Solving orthogonal group synchronization via convex and low-rank optimization: Tightness and landscape analysis. Mathematical Programming, 200(1):589–628, 2023.
  • [29] Huikang Liu, Man-Chung Yue, and Anthony Man-Cho So. On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM Journal on Optimization, 27(4):2426–2446, 2017.
  • [30] Amelia Perry, Alexander S Wein, Afonso S Bandeira, and Ankur Moitra. Optimality and sub-optimality of PCA for spiked random matrices and synchronization. arXiv preprint arXiv:1609.05573, 2016.
  • [31] Amelia Perry, Alexander S Wein, Afonso S Bandeira, and Ankur Moitra. Message-passing algorithms for synchronization problems over compact groups. Communications on Pure and Applied Mathematics, 71(11):2275–2322, 2018.
  • [32] Brian P Preskitt. Phase retrieval from locally supported measurements. University of California, San Diego, 2018.
  • [33] Elad Romanov and Matan Gavish. The noise-sensitivity phase transition in spectral group synchronization over compact groups. Applied and Computational Harmonic Analysis, 49(3):935–970, 2020.
  • [34] Yanyao Shen, Qixing Huang, Nati Srebro, and Sujay Sanghavi. Normalized spectral map synchronization. Advances in Neural Information Processing Systems, 29, 2016.
  • [35] Yunpeng Shi and Gilad Lerman. Message passing least squares framework and its application to rotation synchronization. In International Conference on Machine Learning, pages 8796–8806. PMLR, 2020.
  • [36] Amit Singer. Angular synchronization by eigenvectors and semidefinite programming. Applied and Computational Harmonic Analysis, 30(1):20–36, 2011.
  • [37] Amit Singer and Yoel Shkolnisky. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. SIAM Journal on Imaging Sciences, 4(2):543–572, 2011.
  • [38] Lanhui Wang and Amit Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference: A Journal of the IMA, 2(2):145–193, 2013.
  • [39] Yiqiao Zhong and Nicolas Boumal. Near-optimal bounds for phase synchronization. SIAM Journal on Optimization, 28(2):989–1016, 2018.
  • [40] Linglingzhi Zhu, Jinxin Wang, and Anthony Man-Cho So. Orthogonal group synchronization with incomplete measurements: Error bounds and linear convergence of the generalized power method. arXiv preprint arXiv:2112.06556, 2021.

Appendix A Proofs of Lemma 3, Proposition 3, and Proposition 4

Proof of Lemma 3.

Similar to the proof of Lemma 1, we can show that each eigenvalue of A is also an eigenvalue of (A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}}, with multiplicity d. Conversely, each eigenvalue of (A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}} must be an eigenvalue of A. The details are omitted here. ∎
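As a numerical illustration of Lemma 3 (a hypothetical small instance; the sizes and random draws below are ours), the following sketch verifies that the spectrum of (A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}} consists of the eigenvalues of A, each repeated d times, using the factorization \text{blkdiag}(Z^{*}_{1},\ldots,Z^{*}_{n})(A\otimes I_{d})\text{blkdiag}(Z^{*}_{1},\ldots,Z^{*}_{n})^{\mathrm{\scriptscriptstyle T}}.

import numpy as np

rng = np.random.default_rng(3)
n, d = 12, 3
A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = A + A.T                                    # adjacency matrix, zero diagonal
Z = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]   # Z_1*, ..., Z_n*

# Block (i, j) of (A (x) J_d) o Z* Z*^T equals A_ij Z_i* Z_j*^T.
M = np.zeros((n * d, n * d))
for i in range(n):
    for j in range(n):
        M[i*d:(i+1)*d, j*d:(j+1)*d] = A[i, j] * Z[i] @ Z[j].T

eig_M = np.sort(np.linalg.eigvalsh(M))
eig_A = np.sort(np.repeat(np.linalg.eigvalsh(A), d))
assert np.allclose(eig_M, eig_A)               # spectrum of A, each with multiplicity d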

Proof of Proposition 3.

Since σ=0\sigma=0, we have U=UU=U^{*}. Then Z^j=𝒫(Uj)=𝒫(Uj)=𝒫(Zjuwidecheckj)\widehat{Z}_{j}=\mathcal{P}(U_{j})=\mathcal{P}(U^{*}_{j})=\mathcal{P}(Z^{*}_{j}\widecheck{u}_{j}). Since ZjZ^{*}_{j} is an orthogonal matrix, we have Z^j=Zjsign(uwidecheckj)\widehat{Z}_{j}=Z^{*}_{j}\text{sign}(\widecheck{u}_{j}). Then by (16), the proposition is proved by the same argument used to prove Proposition 1. ∎

Before proving Proposition 4, we state some properties of AA and 𝒲\mathcal{W}. The following lemma can be seen as an analog of Lemma 8.

Lemma 14.

There exist constants C1,C2>0C_{1},C_{2}>0 such that if nplogn>C1\frac{np}{\log n}>C_{1}, then we have

(AJd)𝒲C2dnp,\displaystyle\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq C_{2}\sqrt{dnp},
i=1nj[n]\{i}Aij(ZiT𝒲ijZjZjT𝒲jiZi)F22d(d1)n2p(1+C2lognn),\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\left(Z_{i}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ij}Z_{j}^{*}-Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ji}Z_{i}^{*}\right)}\right\|_{\rm F}^{2}\leq 2d(d-1)n^{2}p\left(1+C_{2}\sqrt{\frac{\log n}{n}}\right),
i=1nj[n]\{i}Aij𝒲ijZjF2d2n2p(1+C2lognn),\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\mathcal{W}_{ij}Z_{j}^{*}}\right\|_{\rm F}^{2}\leq d^{2}n^{2}p\left(1+C_{2}\sqrt{\frac{\log n}{n}}\right),

hold with probability at least 13n101-3n^{-10}.

Proof.

The first inequality is from Lemma 4.2 of [20]. The second and third inequalities follow from (59) and (60) of [20], respectively, combined with Lemma 4.3 of [20]. ∎

Proof of Proposition 4.

By Lemma 8 and Lemma 14, there exist constants c_{1},c_{2}>0 such that when \frac{np}{\log n}>c_{1}, we have \left\|{A-\mathbb{E}A}\right\|\leq c_{2}\sqrt{np} and \left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq c_{2}\sqrt{dnp} with probability at least 1-6n^{-10}. By Lemma 3 and Lemma 7, we have \lambda^{*}_{1}=\lambda^{*}_{d}\geq(n-1)p-c_{2}\sqrt{np}, \max\{|\lambda^{*}_{d+1}|,|\lambda^{*}_{n}|\}\leq p+c_{2}\sqrt{np}, and \lambda^{*}_{d}-\lambda_{d+1}^{*}\geq np-2c_{2}\sqrt{np}. Note that d is a constant. When \frac{np}{\log n} and \frac{np}{\sigma^{2}} are greater than some sufficiently large constant, 4\sigma\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq np/2\leq\min\{\lambda^{*}_{d},\lambda^{*}_{d}-\lambda_{d+1}^{*}\} is satisfied. Since \mathcal{X}-(A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}}=\sigma(A\otimes J_{d})\circ\mathcal{W}, a direct application of Lemma 4 leads to

infO𝒪(d)UU~O\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{U-\widetilde{U}O}\right\|
823(λ1λd+1)((43(λ1λd+1)+2λ1)σ2(AJd)𝒲2\displaystyle\leq\frac{8\sqrt{2}}{3(\lambda_{1}^{*}-\lambda_{d+1}^{*})}\Bigg{(}\left(\frac{4}{3(\lambda_{1}^{*}-\lambda_{d+1}^{*})}+\frac{2}{\lambda_{1}^{*}}\right)\sigma^{2}\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|^{2}
+max{|λd+1|,|λn|}λ1σ(AJd)𝒲)\displaystyle\quad+\frac{\max\{|\lambda^{*}_{d+1}|,|\lambda^{*}_{n}|\}}{\lambda_{1}^{*}}\sigma\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\Bigg{)}
\displaystyle\leq\frac{8\sqrt{2}}{3(np/2)}\left(\left(\frac{4}{3(np/2)}+\frac{2}{np/2}\right)\sigma^{2}c_{2}^{2}dnp+\frac{p+c_{2}\sqrt{np}}{np/2}\sigma c_{2}\sqrt{dnp}\right)
c3σ2d+σdnp,\displaystyle\leq c_{3}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np},

for some constant c3>0c_{3}>0. ∎

Appendix B Proof of Theorem 4

We first state two useful technical lemmas; they are analogs of Lemma 11 and Lemma 12, respectively. Lemma 15 is proved in (31) of [20].

Lemma 15.

There exists some constant C>0 such that for any \rho that satisfies \frac{\rho^{2}np}{d^{2}\sigma^{2}}\geq C, we have

i=1n𝕀{2σnpj[n]\{i}Aij𝒲ijZj>ρ}σ2ρ2pexp(ρ2npσ2),\displaystyle\sum_{i=1}^{n}\mathbb{I}\left\{\frac{2\sigma}{np}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\mathcal{W}_{ij}Z_{j}^{*}}\right\|>\rho\right\}\leq\frac{\sigma^{2}}{\rho^{2}p}\exp\left(-\sqrt{\frac{\rho^{2}np}{\sigma^{2}}}\right),

with probability at least 1exp(ρ2npσ2)1-\exp\left(-\sqrt{\frac{\rho^{2}np}{\sigma^{2}}}\right).

Lemma 16 (Lemma 2.1 of [20]).

Let X,X~d×dX,\widetilde{X}\in\mathbb{R}^{d\times d} be two matrices of full rank. Then,

𝒫(X)𝒫(X~)F2smin(X)+smin(X~)XX~F.\left\|{\mathcal{P}(X)-\mathcal{P}(\widetilde{X})}\right\|_{\rm F}\leq\frac{2}{s_{\min}(X)+s_{\min}(\widetilde{X})}\left\|{X-\widetilde{X}}\right\|_{\rm F}.
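For intuition, here is a minimal numpy sketch of the map \mathcal{P} (the polar factor from the SVD), with a numerical check of the Lipschitz bound of Lemma 16 and of the two properties of \mathcal{P} used in the proof of Theorem 4 below; the instance is synthetic and ours.

import numpy as np

def polar(X):
    """P(X) = G H^T from the SVD X = G S H^T."""
    G, _, Ht = np.linalg.svd(X)
    return G @ Ht

rng = np.random.default_rng(4)
d = 4
X = rng.standard_normal((d, d)) + 3 * np.eye(d)        # full rank with high probability
X_t = X + 0.2 * rng.standard_normal((d, d))

smin = lambda B: np.linalg.svd(B, compute_uv=False)[-1]
lhs = np.linalg.norm(polar(X) - polar(X_t), "fro")
rhs = 2.0 / (smin(X) + smin(X_t)) * np.linalg.norm(X - X_t, "fro")
assert lhs <= rhs + 1e-10                              # the Lemma 16 bound

# Two properties of P used in the proof of Theorem 4:
F = np.linalg.qr(rng.standard_normal((d, d)))[0]
assert np.allclose(polar(X @ F), polar(X) @ F)         # P(BF) = P(B) F
B = X @ X.T + np.eye(d)                                # symmetric positive-definite
assert np.allclose(polar(B), np.eye(d))                # P(B) = I_d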
Proof of Theorem 4.

Let O𝒪(d)O\in\mathcal{O}(d) satisfy UU~O=infO𝒪(d)UU~O\|U-\widetilde{U}O\|=\inf_{O^{\prime}\in\mathcal{O}(d)}\|{U-\widetilde{U}O^{\prime}}\|. Define Δ:=UU~Ond×d\Delta:=U-\widetilde{U}O\in\mathbb{R}^{nd\times d}. Recall uwidecheck\widecheck{u} is the leading eigenvector of AA. From Proposition 1, Proposition 4, Lemma 8, and Lemma 14, there exist constants c1,c2>0c_{1},c_{2}>0 such that if nplogn,npσ2>c1\frac{np}{\log n},\frac{np}{\sigma^{2}}>c_{1}, we have

Δ\displaystyle\left\|{\Delta}\right\| c2σ2d+σdnp,\displaystyle\leq c_{2}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np}, (55)
maxj[n]|uwidecheckj1nb2|\displaystyle\max_{j\in[n]}\left|\widecheck{u}_{j}-\frac{1}{\sqrt{n}}b_{2}\right| c2(lognnp+1log(np))1n,\displaystyle\leq c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\frac{1}{\sqrt{n}}, (56)
A𝔼A\displaystyle\left\|{A-\mathbb{E}A}\right\| c2np,\displaystyle\leq c_{2}\sqrt{np}, (57)
(AJd)𝒲\displaystyle\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\| c2npd,\displaystyle\leq c_{2}\sqrt{npd}, (58)
i=1nj[n]\{i}Aij(ZiT𝒲ijZjZjT𝒲jiZi)F2\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\left(Z_{i}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ij}Z_{j}^{*}-Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ji}Z_{i}^{*}\right)}\right\|_{\rm F}^{2} 2d(d1)n2p(1+c2lognn),\displaystyle\leq 2d(d-1)n^{2}p\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right), (59)
i=1nj[n]\{i}Aij𝒲ijZjF2\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\mathcal{W}_{ij}Z_{j}^{*}}\right\|_{\rm F}^{2} d2n2p(1+c2lognn),\displaystyle\leq d^{2}n^{2}p\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right), (60)

with probability at least 1n91-n^{-9}, for some b2{1,1}b_{2}\in\{-1,1\}. By Lemma 3 and Lemma 7, we have λ1=λd\lambda^{*}_{1}=\lambda^{*}_{d}, |λd(n1)p|c2np|\lambda_{d}^{*}-(n-1)p|\leq c_{2}\sqrt{np}, |λd+1|p+c2np\left|\lambda^{*}_{d+1}\right|\leq p+c_{2}\sqrt{np}, and λdλd+1np2c2np\lambda^{*}_{d}-\lambda_{d+1}^{*}\geq np-2c_{2}\sqrt{np}.

Using the same argument as for (50) and (51) in the proof of Lemma 4, we can obtain an explicit expression for \widetilde{U}. Recall the definition of \widetilde{U} in (22). Let \mathcal{X}U^{*}=GDN^{\mathrm{\scriptscriptstyle T}} be its SVD, where G\in\mathcal{O}(nd,d), N\in\mathcal{O}(d), and D\in\mathbb{R}^{d\times d} is a diagonal matrix containing the singular values. By the decomposition (21), we have

𝒳U=((AJd)ZZT)U+σ((AJd)𝒲)U=λ1U+σ((AJd)𝒲)U.\displaystyle\mathcal{X}U^{*}=((A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}})U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})U^{*}=\lambda_{1}^{*}U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})U^{*}. (61)

Since the diagonal entries of DD correspond to the leading singular values of 𝒳U\mathcal{X}U^{*}, Weyl’s inequality leads to maxj[d]|Djjλ1|σ(AJd)𝒲c2σdnp.\max_{j\in[d]}|D_{jj}-\lambda_{1}^{*}|\leq\sigma\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq c_{2}\sigma\sqrt{dnp}. Denote

t:=p+c2np+c2σdnp.\displaystyle t:=p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}. (62)

We then have

maxj[d]|Djjnp|p+t.\displaystyle\max_{j\in[d]}|D_{jj}-np|\leq p+t. (63)

When nplogn,npdσ2\frac{np}{\log n},\frac{np}{d\sigma^{2}} are greater than some sufficiently large constant, we have np/2λ1np/2\leq\lambda^{*}_{1} and np/2Djj3np/2np/2\leq D_{jj}\leq 3np/2 for all j[d]j\in[d]. As a consequence, all the diagonal entries of DD are positive. Then U~\widetilde{U} can be written as

U~=𝒳US,\displaystyle\widetilde{U}=\mathcal{X}U^{*}S,

where

S:=ND1NTd×d.\displaystyle S:=ND^{-1}N^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{d\times d}. (64)

Then (63) leads to

1npIdS=1npIdD11npt1np2t(np)2,\displaystyle\left\|{\frac{1}{np}I_{d}-S}\right\|=\left\|{\frac{1}{np}I_{d}-D^{-1}}\right\|\leq\frac{1}{np-t}-\frac{1}{np}\leq\frac{2t}{(np)^{2}}, (65)

and

S=D12np.\displaystyle\left\|{S}\right\|=\left\|{D^{-1}}\right\|\leq\frac{2}{np}. (66)

Using (61), we have the following decomposition for UU:

U=U~O+Δ=𝒳USO+Δ=(λ1U+σ((AJd)𝒲)U)SO+Δ.\displaystyle U=\widetilde{U}O+\Delta=\mathcal{X}U^{*}SO+\Delta=\left(\lambda_{1}^{*}U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})U^{*}\right)SO+\Delta.

Recall the definition of UU^{*} in (14). Define Δ:=U1nZb2\Delta^{*}:=U^{*}-\frac{1}{\sqrt{n}}Z^{*}b_{2}. When nplogn2c2\frac{np}{\log n}\geq 2c_{2}^{*}, by the same argument used to derive (39) as in the proof of Theorem 3, we have

\displaystyle\left\|{\Delta^{*}}\right\| =\left\|{Z^{*}\circ\left(\widecheck{u}\otimes\mathds{1}_{d}-\frac{1}{\sqrt{n}}\mathds{1}_{n}\otimes\mathds{1}_{d}b_{2}\right)}\right\|=\left\|{\widecheck{u}\otimes\mathds{1}_{d}-\frac{1}{\sqrt{n}}\mathds{1}_{n}\otimes\mathds{1}_{d}b_{2}}\right\|=\sqrt{d}\left\|{\widecheck{u}-\frac{1}{\sqrt{n}}\mathds{1}_{n}b_{2}}\right\|
\displaystyle\leq\frac{2c_{2}\sqrt{np}+2p}{np}\sqrt{d}. (67)

Then UU can be further decomposed into

U=(λ1U+σ((AJd)𝒲)(1nZb2+Δ))SO+Δ.\displaystyle U=\left(\lambda_{1}^{*}U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})\left(\frac{1}{\sqrt{n}}Z^{*}b_{2}+\Delta^{*}\right)\right)SO+\Delta.

For any j\in[n], let [(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\in\mathbb{R}^{d\times nd} denote the submatrix of (A\otimes J_{d})\circ\mathcal{W} formed by its ((j-1)d+1)th through (jd)th rows. Note that SO\in\mathbb{R}^{d\times d}. Then U_{j} admits the expression:

Uj\displaystyle U_{j} =(λ1Uj+σn[(AJd)𝒲]jZb2+σ[(AJd)𝒲]jΔ)SO+Δj\displaystyle=\left(\lambda_{1}^{*}U_{j}^{*}+\frac{\sigma}{\sqrt{n}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}Z^{*}b_{2}+\sigma[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)SO+\Delta_{j}
=(λ1Zjuwidecheckj+σnkjAjk𝒲jkZkb2+σ[(AJd)𝒲]jΔ)SO+Δj,\displaystyle=\left(\lambda_{1}^{*}Z_{j}^{*}\widecheck{u}_{j}+\frac{\sigma}{\sqrt{n}}\sum_{k\neq j}A_{jk}\mathcal{W}_{jk}Z^{*}_{k}b_{2}+\sigma[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)SO+\Delta_{j},

where \Delta_{j}\in\mathbb{R}^{d\times d} denotes the jth d\times d submatrix of \Delta, formed by its ((j-1)d+1)th through (jd)th rows.

Note that the mapping \mathcal{P} has the following properties. For any B\in\mathbb{R}^{d\times d} of full rank and any F\in\mathcal{O}(d), we have \mathcal{P}(BF)=\mathcal{P}(B)F. In addition, if B is symmetric and positive-definite, then \mathcal{P}(B)=I_{d}. Since we have shown that the diagonal entries of D are all lower bounded by np/2, (64) leads to \mathcal{P}(S)=I_{d}. Then

Z^jZjOb2F=𝒫(Uj)ZjOb2F=𝒫(ZjTUjOTb2)IdF.\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}=\left\|{\mathcal{P}(U_{j})-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}=\left\|{\mathcal{P}(Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2})-I_{d}}\right\|_{\rm F}.

We have

ZjTUjOTb2=(λ1uwidecheckjb2Id+σnΞj+σb2ZjT[(AJd)𝒲]jΔ)S+ZjTΔjOTb2\displaystyle Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}=\left(\lambda_{1}^{*}\widecheck{u}_{j}b_{2}I_{d}+\frac{\sigma}{\sqrt{n}}\Xi_{j}+\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)S+Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}

where

Ξj:=kjAjkZjT𝒲jkZk.\displaystyle\Xi_{j}:=\sum_{k\neq j}A_{jk}Z^{*{\mathrm{\scriptscriptstyle T}}}_{j}\mathcal{W}_{jk}Z^{*}_{k}.

Note that from (56), we have

b2uwidecheckj(1c2(lognnp+1log(np)))1n.\displaystyle b_{2}\widecheck{u}_{j}\geq\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{1}{\sqrt{n}}.

As long as nplogn\frac{np}{\log n} is greater than some sufficiently large constant, we have b2uwidecheckj12n.b_{2}\widecheck{u}_{j}\geq\frac{1}{2\sqrt{n}}. Since λ1\lambda_{1}^{*} is also positive, we have

ZjTUjOTb2λ1uwidecheckjb2=S+Tj\displaystyle\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}=S+T_{j} (68)

where TjT_{j} is defined as

Tj\displaystyle T_{j} :=1λ1uwidecheckjb2((σnΞj+σb2ZjT[(AJd)𝒲]jΔ)S+ZjTΔjOTb2)\displaystyle:=\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\left(\left(\frac{\sigma}{\sqrt{n}}\Xi_{j}+\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)S+Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}\right)
=1λ1uwidecheckjb2σnΞjS+σb2ZjT[(AJd)𝒲]jΔSλ1uwidecheckjb2+ZjTΔjOTb2λ1uwidecheckjb2.\displaystyle=\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{\sigma}{\sqrt{n}}\Xi_{j}S+\frac{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}+\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}.

As a consequence, when det(Uj)0\det(U_{j})\neq 0, we have

Z^jZjOb2F=𝒫(ZjTUjOTb2λ1uwidecheckjb2)IdF\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}=\left\|{\mathcal{P}\left(\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\right)-I_{d}}\right\|_{\rm F} =𝒫(S+Tj)IdF.\displaystyle=\left\|{\mathcal{P}\left(S+T_{j}\right)-I_{d}}\right\|_{\rm F}. (69)

Let 0<\gamma,\rho<1/8 be constants whose values will be determined later. To simplify \|\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}\|_{\rm F}, consider the following two cases.

(1) If

1λ1uwidecheckjb2σnΞjS\displaystyle\left\|{\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{\sigma}{\sqrt{n}}\Xi_{j}S}\right\| γnp\displaystyle\leq\frac{\gamma}{np} (70)
σb2ZjT[(AJd)𝒲]jΔSλ1uwidecheckjb2\displaystyle\left\|{\frac{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\| ρnp\displaystyle\leq\frac{\rho}{np}
ZjTΔjOTb2λ1uwidecheckjb2\displaystyle\left\|{\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\| ρnp\displaystyle\leq\frac{\rho}{np} (71)

all hold, then

smin(S+Tj)\displaystyle s_{\min}(S+T_{j}) smin(S)Tj=smin(D1)Tj=D111Tj\displaystyle\geq s_{\min}(S)-\left\|{T_{j}}\right\|=s_{\min}(D^{-1})-\left\|{T_{j}}\right\|=D_{11}^{-1}-\left\|{T_{j}}\right\|
D111γ+2ρnp,\displaystyle\geq D_{11}^{-1}-\frac{\gamma+2\rho}{np},

which is greater than 0 by (63). Together with (68), we have \det(U_{j})\neq 0. The same lower bound holds for s_{\min}(S+(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2). Moreover, since S is positive-definite with \lambda_{\min}(S)=D_{11}^{-1} and \left\|{(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2}\right\|\leq\left\|{T_{j}}\right\|\leq\frac{\gamma+2\rho}{np}<D_{11}^{-1}, the symmetric matrix S+(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2 is positive-definite, and hence \mathcal{P}(S+(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2)=I_{d}. By Lemma 16 and (69), we have

Z^jZjOb2F\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}
=𝒫(S+Tj)𝒫(S+Tj+TjT2)F\displaystyle=\left\|{\mathcal{P}\left(S+T_{j}\right)-\mathcal{P}\left(S+\frac{T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}}}{2}\right)}\right\|_{\rm F}
1(D111γ+2ρnp)TjTjT2F\displaystyle\leq\frac{1}{\left(D_{11}^{-1}-\frac{\gamma+2\rho}{np}\right)}\left\|{\frac{T_{j}-T_{j}^{\mathrm{\scriptscriptstyle T}}}{2}}\right\|_{\rm F}
1λ1uwidecheckjb212(D111γ+2ρnp)(σnΞjSSTΞjTF+2σb2ZjT[(AJd)𝒲]jΔSF\displaystyle\leq\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{1}{2\left(D_{11}^{-1}-\frac{\gamma+2\rho}{np}\right)}\Bigg{(}\frac{\sigma}{\sqrt{n}}\left\|{\Xi_{j}S-S^{\mathrm{\scriptscriptstyle T}}\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}+2\left\|{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}\right\|_{\rm F}
+2ZjTΔjOTb2F).\displaystyle\quad+2\left\|{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}\right\|_{\rm F}\Bigg{)}.

We can further simplify the first term in the display above. We have

ΞjSSTΞjTF\displaystyle\left\|{\Xi_{j}S-S^{\mathrm{\scriptscriptstyle T}}\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F} =1np(ΞjΞjT)Ξj(1npIdS)+(1npIdST)ΞjTF\displaystyle=\left\|{\frac{1}{np}\left(\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}\right)-\Xi_{j}\left(\frac{1}{np}I_{d}-S\right)+(\frac{1}{np}I_{d}-S^{\mathrm{\scriptscriptstyle T}})\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}
1npΞjΞjTF+21npIdSΞjF.\displaystyle\leq\frac{1}{np}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}+2\left\|{\frac{1}{np}I_{d}-S}\right\|\left\|{\Xi_{j}}\right\|_{\rm F}.

Using (65) and (66), we have

Z^jZjOb2F\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F} 1λ1uwidecheckjb212(D111γ+2ρnp)(σn1npΞjΞjTF+σnt(np)2ΞjF\displaystyle\leq\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{1}{2\left(D_{11}^{-1}-\frac{\gamma+2\rho}{np}\right)}\Bigg{(}\frac{\sigma}{\sqrt{n}}\frac{1}{np}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}+\frac{\sigma}{\sqrt{n}}\frac{t}{(np)^{2}}\left\|{\Xi_{j}}\right\|_{\rm F}
+4npσ[(AJd)𝒲]jΔF+2ΔjF).\displaystyle\quad+\frac{4}{np}\sigma\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}+2\left\|{\Delta_{j}}\right\|_{\rm F}\Bigg{)}.

Using the lower bounds for λ1\lambda_{1}^{*}, uwidecheckjb2\widecheck{u}_{j}b_{2}, and D111D_{11}^{-1}, as given at the beginning of this proof, we have

Z^jZjOb2F\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}
1(nppc2np)(1c2(lognnp+1log(np)))(1np+tγ+2ρnp)σ2npΞjΞjTF\displaystyle\leq\frac{1}{\left(np-p-c_{2}\sqrt{np}\right)\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)}\frac{\sigma}{2np}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}
+4σt(np)2ΞjF+16σnnp[(AJd)𝒲]jΔF+16nΔjF.\displaystyle\quad+\frac{4\sigma t}{(np)^{2}}\left\|{\Xi_{j}}\right\|_{\rm F}+\frac{16\sigma\sqrt{n}}{np}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}+16\sqrt{n}\left\|{\Delta_{j}}\right\|_{\rm F}.

Let \eta>0 be a constant whose value will be given later. By the same argument as used in the proof of Theorem 3, we have

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2σ24(np)2ΞjΞjTF2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}\frac{\sigma^{2}}{4(np)^{2}}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)4ΞjF2+3(1+η1)256σ2n(np)2[(AJd)𝒲]jΔF2\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+3(1+\eta^{-1})\frac{256\sigma^{2}n}{(np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+3(1+η1)64nΔjF2.\displaystyle\quad+3(1+\eta^{-1})64n\left\|{\Delta_{j}}\right\|_{\rm F}^{2}.

(2) If any one of (70)-(71) does not hold, we simply upper bound \|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\|_{\rm F} by 2\sqrt{d}. This case can then be written as

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
4d(𝕀{1λ1uwidecheckjb2σnΞjS>γnp}+𝕀{σb2ZjT[(AJd)𝒲]jΔSλ1uwidecheckjb2>ρnp}\displaystyle\leq 4d\Bigg{(}{\mathbb{I}\left\{{\left\|{\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{\sigma}{\sqrt{n}}\Xi_{j}S}\right\|>\frac{\gamma}{np}}\right\}}+{\mathbb{I}\left\{{\left\|{\frac{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\|>\frac{\rho}{np}}\right\}}
+𝕀{ZjTΔjOTb2λ1uwidecheckjb2>ρnp}).\displaystyle\quad+{\mathbb{I}\left\{{\left\|{\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\|>\frac{\rho}{np}}\right\}}\Bigg{)}.

Using (66), λ1np/2\lambda_{1}^{*}\geq np/2, and uwidecheckjb21/(2n)\widecheck{u}_{j}b_{2}\geq 1/(2\sqrt{n}), we have

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
4d(𝕀{8σΞjγnp}+𝕀{8nσ[(AJd)𝒲]jΔρnp}+𝕀{4nΔjρ})\displaystyle\leq 4d\left({\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}+{\mathbb{I}\left\{{8\sqrt{n}\sigma\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|\geq\rho np}\right\}}+{\mathbb{I}\left\{{4\sqrt{n}\left\|{\Delta_{j}}\right\|\geq\rho}\right\}}\right)
4d(𝕀{8σΞjγnp}+64σ2n(ρnp)2[(AJd)𝒲]jΔF2+16nρ2ΔjF2).\displaystyle\leq 4d\left({\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}+\frac{64\sigma^{2}n}{(\rho np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}+16n\rho^{-2}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}\right).

Combining the two cases, we have

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2σ24(np)2ΞjΞjTF2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}\frac{\sigma^{2}}{4(np)^{2}}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)4ΞjF2+3(1+η1)256σ2n(np)2[(AJd)𝒲]jΔF2\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+3(1+\eta^{-1})\frac{256\sigma^{2}n}{(np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+3(1+η1)64nΔjF2\displaystyle\quad+3(1+\eta^{-1})64n\left\|{\Delta_{j}}\right\|_{\rm F}^{2}
+4d(𝕀{8σΞjγnp}+64σ2n(ρnp)2[(AJd)𝒲]jΔF2+16nρ2ΔjF2)\displaystyle\quad+4d\left({\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}+\frac{64\sigma^{2}n}{(\rho np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}+16n\rho^{-2}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}\right)
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2σ24(np)2ΞjΞjTF2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}\frac{\sigma^{2}}{4(np)^{2}}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)4ΞjF2+4d𝕀{8σΞjγnp}\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+4d{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}
+256σ2n(np)2(3(1+η1)+dρ2)[(AJd)𝒲]jΔF2\displaystyle\quad+\frac{256\sigma^{2}n}{(np)^{2}}\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+64n(3(1+η1)+dρ2)ΔjF2.\displaystyle\quad+64n\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left\|{\Delta_{j}}\right\|_{\rm F}^{2}.

As a result, we have

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*})
1nj[n]Z^jZjOb2F2\displaystyle\leq\frac{1}{n}\sum_{j\in[n]}\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}
×σ24(np)21nj[n]ΞjΞjTF2\displaystyle\quad\times\frac{\sigma^{2}}{4(np)^{2}}\frac{1}{n}\sum_{j\in[n]}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)41nj[n]ΞjF2+4d1nj[n]𝕀{8σΞjγnp}\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\frac{1}{n}\sum_{j\in[n]}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+4d\frac{1}{n}\sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}
+256σ2(np)2(3(1+η1)+dρ2)j[n][(AJd)𝒲]jΔF2\displaystyle\quad+\frac{256\sigma^{2}}{(np)^{2}}\left(3(1+\eta^{-1})+d\rho^{-2}\right)\sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+64(3(1+η1)+dρ2)j[n]ΔjF2.\displaystyle\quad+64\left(3(1+\eta^{-1})+d\rho^{-2}\right)\sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}.

In the rest of the proof, we simplify the display above by upper bounding \sum_{j\in[n]}\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\|_{\rm F}^{2}, \sum_{j\in[n]}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}, \sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}, \sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}, and \sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2} in turn.

For j[n]ΞjΞjTF2\sum_{j\in[n]}\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\|_{\rm F}^{2} and j[n]ΞjF2\sum_{j\in[n]}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}, note that they are the left-hand sides of (59) and (60), respectively. Hence, they can be upper bounded by the right-hand sides of (59) and (60), respectively. For j[n]𝕀{8σΞjγnp}\sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}, according to Lemma 15, if γ2npd2σ2>c3\frac{\gamma^{2}np}{d^{2}\sigma^{2}}>c_{3} for some c3>0c_{3}>0, we have

j[n]𝕀{8σΞjγnp}\displaystyle\sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}} 16σ2γ2pexp(γ2np16σ2)\displaystyle\leq\frac{16\sigma^{2}}{\gamma^{2}p}\exp\left(-\sqrt{\frac{\gamma^{2}np}{16\sigma^{2}}}\right)

with probability at least 1exp(γ2np16σ2)1-\exp\left(-\sqrt{\frac{\gamma^{2}np}{16\sigma^{2}}}\right). When c3c_{3} is sufficiently large, it follows that

16σ2γ2npexp(γ2np16σ2)(σ2γ2np)3\displaystyle\frac{16\sigma^{2}}{\gamma^{2}np}\exp\left(-\sqrt{\frac{\gamma^{2}np}{16\sigma^{2}}}\right)\leq\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3}

by the same argument as in the proof of Theorem 3. For j[n][(AJd)𝒲]jΔF2\sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}, we have

\displaystyle\sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2} =\left\|{\left((A\otimes J_{d})\circ\mathcal{W}\right)\Delta^{*}}\right\|_{\rm F}^{2}
(AJd)𝒲2ΔF2\displaystyle\leq\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|^{2}\left\|{\Delta^{*}}\right\|_{\rm F}^{2}
d(AJd)𝒲2Δ2\displaystyle\leq d\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|^{2}\left\|{\Delta^{*}}\right\|^{2}
c2d(dnp2c2np+2pnpd)2,\displaystyle\leq c_{2}d\left(\sqrt{dnp}\frac{2c_{2}\sqrt{np}+2p}{np}\sqrt{d}\right)^{2},

where in the second to last inequality we use the fact that Δ\Delta^{*} is rank-dd and in the last inequality we use (67). For j[n]ΔjF2\sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}, we have j[n]ΔjF2=ΔF2dΔ2d(c2σ2d+σdnp)2\sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}=\left\|{\Delta}\right\|_{\rm F}^{2}\leq d\left\|{\Delta}\right\|^{2}\leq d\left(c_{2}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np}\right)^{2} where the last inequality is due to (55).

Using the above results, we have

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*})
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}
×σ24(np)22d(d1)np(1+c2lognn)\displaystyle\quad\times\frac{\sigma^{2}}{4(np)^{2}}2d(d-1)np\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)
+3(1+η1)16σ2t2(np)4d2np(1+c2lognn)+4d(σ2γ2np)3\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}d^{2}np\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)+4d\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3}
+256σ2(np)2(3(1+η1)+dρ2)c2d(dnp2c2np+2pnpd)2\displaystyle\quad+\frac{256\sigma^{2}}{(np)^{2}}\left(3(1+\eta^{-1})+d\rho^{-2}\right)c_{2}d\left(\sqrt{dnp}\frac{2c_{2}\sqrt{np}+2p}{np}\sqrt{d}\right)^{2}
+64(3(1+η1)+dρ2)d(c2σ2d+σdnp)2.\displaystyle\quad+64\left(3(1+\eta^{-1})+d\rho^{-2}\right)d\left(c_{2}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np}\right)^{2}.

Note that \frac{1}{(1-x)^{2}}\leq 1+16x for any 0\leq x\leq\frac{1}{2}. When \frac{np}{\log n} is greater than some sufficiently large constant, we have \left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{-2}\leq 1+16c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right) and \left(1-c_{2}\frac{1}{\sqrt{np}}-\frac{1}{n}\right)^{-2}\leq 1+16\left(c_{2}\frac{1}{\sqrt{np}}+\frac{1}{n}\right). When \frac{np}{d\sigma^{2}} is also greater than some sufficiently large constant, we have \left(\frac{np}{np+t}-\gamma-2\rho\right)^{-2}\leq 1+16\left(\frac{t}{np+t}+\gamma+2\rho\right)\leq 1+16\left(\frac{t}{np}+\gamma+2\rho\right)\leq 1+16\left(\frac{p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}}{np}+\gamma+2\rho\right), using the definition of t in (62). We then have

\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*})
\displaystyle\leq(1+\eta)\left(1+16\left(c_{2}\frac{1}{\sqrt{np}}+\frac{1}{n}\right)\right)\left(1+16c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\left(1+16\left(\frac{p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}}{np}+\gamma+2\rho\right)\right)
\displaystyle\quad\times\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)\frac{d(d-1)\sigma^{2}}{2np}
\displaystyle\quad+3(1+\eta^{-1})\left(\frac{p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}}{np}\right)^{2}\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)\frac{16}{np}\frac{d^{2}\sigma^{2}}{np}
\displaystyle\quad+4\gamma^{-6}\left(\frac{\sigma^{2}}{np}\right)^{2}\frac{d\sigma^{2}}{np}+256c_{2}\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left(\frac{2c_{2}}{\sqrt{np}}+\frac{2}{n\sqrt{np}}\right)^{2}\frac{d^{2}\sigma^{2}}{np}
\displaystyle\quad+64\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left(c_{2}\frac{\sigma\sqrt{d}+1}{\sqrt{np}}\right)^{2}\frac{d^{2}\sigma^{2}}{np}.

After rearrangement, there exists some constant c5>0c_{5}>0 such that

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*}) (1+c5(η+γ+ρ+lognnp+1log(np)+γ6(σ2np)2+dσ2np\displaystyle\leq\Bigg{(}1+c_{5}\Bigg{(}\eta+\gamma+\rho+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}+\gamma^{-6}\left(\frac{\sigma^{2}}{np}\right)^{2}+\sqrt{\frac{d\sigma^{2}}{np}}
+(η1+dρ2)(1+dσ2np)))d(d1)σ22np.\displaystyle\quad+\left(\eta^{-1}+d\rho^{-2}\right)\left(\frac{1+d\sigma^{2}}{np}\right)\Bigg{)}\Bigg{)}\frac{d(d-1)\sigma^{2}}{2np}.

We can take \gamma^{2}=\sqrt{d^{2}\sigma^{2}/np} (then \frac{\gamma^{2}np}{d^{2}\sigma^{2}}>c_{3} is guaranteed as long as \frac{np}{d^{2}\sigma^{2}}>c_{3}^{2}). We also take \rho^{2}=\sqrt{(d+d\sigma^{2})/np} and let \eta=\rho^{2}. These choices are guaranteed to be smaller than 1/8 when \frac{np}{d} and \frac{np}{d^{2}\sigma^{2}} are greater than some sufficiently large constant. Then, there exists some constant c_{6}>0 such that

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*}) (1+c5((d+dσ2np)12+(d2σ2np)14+(d+dσ2np)14+lognnp+1log(np)\displaystyle\leq\Bigg{(}1+c_{5}\Bigg{(}\left(\frac{d+d\sigma^{2}}{np}\right)^{\frac{1}{2}}+\left(\frac{d^{2}\sigma^{2}}{np}\right)^{\frac{1}{4}}+\left(\frac{d+d\sigma^{2}}{np}\right)^{\frac{1}{4}}+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}
+d3(σ2np)12+dσ2np+(1+d)npd+dσ2(1+dσ2np)))d(d1)σ22np\displaystyle\quad+d^{-3}\left(\frac{\sigma^{2}}{np}\right)^{\frac{1}{2}}+\sqrt{\frac{d\sigma^{2}}{np}}+(1+d)\sqrt{\frac{np}{d+d\sigma^{2}}}\left(\frac{1+d\sigma^{2}}{np}\right)\Bigg{)}\Bigg{)}\frac{d(d-1)\sigma^{2}}{2np}
(1+c6((d+d2σ2np)14+lognnp+1log(np)))d(d1)σ22np.\displaystyle\leq\left(1+c_{6}\left(\left(\frac{d+d^{2}\sigma^{2}}{np}\right)^{\frac{1}{4}}+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{d(d-1)\sigma^{2}}{2np}.

This holds with probability at least 1-n^{-9}-\exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^{2}}\right)^{\frac{1}{4}}\right). ∎
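To illustrate Theorem 4, the following simulation sketch (ours, with illustrative parameter choices; not the paper's code) runs the spectral method on data generated from the model and compares the empirical loss, computed with a single global alignment over \mathcal{O}(d) in the spirit of \ell^{\text{od}}, with the rate d(d-1)\sigma^{2}/(2np).

import numpy as np

rng = np.random.default_rng(5)
n, d, p, sigma = 300, 3, 0.5, 0.5              # illustrative parameters

def polar(B):
    G, _, Ht = np.linalg.svd(B)
    return G @ Ht

Z = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]   # ground truth
X = np.zeros((n * d, n * d))
for j in range(n):
    for k in range(j + 1, n):
        if rng.random() < p:                   # each block observed with probability p
            blk = Z[j] @ Z[k].T + sigma * rng.standard_normal((d, d))
            X[j*d:(j+1)*d, k*d:(k+1)*d] = blk
            X[k*d:(k+1)*d, j*d:(j+1)*d] = blk.T

U = np.linalg.eigh(X)[1][:, ::-1][:, :d]       # leading d-dimensional eigenspace
Z_hat = [polar(U[j*d:(j+1)*d, :]) for j in range(n)]

# Global alignment O minimizing sum_j ||Zhat_j - Z_j O||_F^2 over O(d).
O = polar(sum(Z[j].T @ Z_hat[j] for j in range(n)))
loss = np.mean([np.linalg.norm(Z_hat[j] - Z[j] @ O, "fro") ** 2 for j in range(n)])
print(loss, d * (d - 1) * sigma**2 / (2 * n * p))   # empirical loss vs minimax rate

For parameters in the regime of the theorem, the two printed numbers should be of comparable size, in line with the (1+o(1)) factor in the risk bound.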

Appendix C Calculation for (18)

Recall the definitions of YY^{*} and YY in (17). First, we are going to show vv, the leading eigenvector of YY, must be a linear combination of e1e_{1} and e2e_{2}. Note that for any unit vector x=(x1,,xn)Tnx=(x_{1},\ldots,x_{n})^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{n}, we have

xTYx\displaystyle x^{\mathrm{\scriptscriptstyle T}}Yx =xTYx+xT(YY)x=(2jnxj2)+δ2(x1+x2)2=1+x12+δ2(x1+x2)2.\displaystyle=x^{\mathrm{\scriptscriptstyle T}}Y^{*}x+x^{\mathrm{\scriptscriptstyle T}}(Y-Y^{*})x=\left(-\sum_{2\leq j\leq n}x_{j}^{2}\right)+\frac{\delta}{2}(x_{1}+x_{2})^{2}=-1+x_{1}^{2}+\frac{\delta}{2}(x_{1}+x_{2})^{2}.

If x maximizes the right-hand side over the unit sphere, then neither x_{1} nor x_{2} can be 0. In addition, x_{1}x_{2}\geq 0 and x_{1}^{2}+x_{2}^{2}=1 must be satisfied; otherwise the right-hand side could be made strictly larger. Hence, after a global sign change if necessary, we can write v=\alpha e_{1}+\sqrt{1-\alpha^{2}}e_{2} where \alpha\in(0,1). Since Yv=\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})e_{1}+\left(\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})-\sqrt{1-\alpha^{2}}\right)e_{2}, we have

αδ2(α+1α2)=1α2(δ2(α+1α2)1α2).\displaystyle\frac{\alpha}{\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})}=\frac{\sqrt{1-\alpha^{2}}}{\left(\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})-\sqrt{1-\alpha^{2}}\right)}.

After rearrangement, this gives δ(2α21)=2α1α2\delta(2\alpha^{2}-1)=2\alpha\sqrt{1-\alpha^{2}} which means α2>12\alpha^{2}>\frac{1}{2}. Squaring it yields the equation 4(1+δ2)α44(1+δ2)α2+δ2=04(1+\delta^{2})\alpha^{4}-4(1+\delta^{2})\alpha^{2}+\delta^{2}=0 whose solution is α2=12(1±11+δ2)\alpha^{2}=\frac{1}{2}\left(1\pm\frac{1}{\sqrt{1+\delta^{2}}}\right). Since α2>12\alpha^{2}>\frac{1}{2}, we have α2=12(1+11+δ2)\alpha^{2}=\frac{1}{2}\left(1+\frac{1}{\sqrt{1+\delta^{2}}}\right). Hence,

v=\displaystyle v= 12(1+11+δ2)e1+12(111+δ2)e2.\displaystyle\sqrt{\frac{1}{2}\left(1+\frac{1}{\sqrt{1+\delta^{2}}}\right)}e_{1}+\sqrt{\frac{1}{2}\left(1-\frac{1}{\sqrt{1+\delta^{2}}}\right)}e_{2}.

We can verify that this is indeed the eigenvector of Y corresponding to its leading eigenvalue \frac{1}{2}({\delta+\sqrt{1+\delta^{2}}-1}).
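The calculation can be double-checked numerically; below is a short sketch (ours, with a synthetic choice of \delta) confirming that v is an eigenvector of Y with eigenvalue \frac{1}{2}(\delta+\sqrt{1+\delta^{2}}-1), and that this eigenvalue is the leading one.

import numpy as np

n, delta = 8, 0.7                              # synthetic size and signal strength
Y_star = -np.diag(np.r_[0.0, np.ones(n - 1)])  # Y* = -diag(0, 1, ..., 1)
e12 = np.zeros(n); e12[:2] = 1.0
Y = Y_star + (delta / 2) * np.outer(e12, e12)  # Y - Y* = (delta/2)(e1+e2)(e1+e2)^T

s = 1.0 / np.sqrt(1 + delta**2)
v = np.zeros(n)
v[0] = np.sqrt((1 + s) / 2)
v[1] = np.sqrt((1 - s) / 2)
lam = (delta + np.sqrt(1 + delta**2) - 1) / 2

assert np.allclose(Y @ v, lam * v)             # v is an eigenvector with eigenvalue lam
assert np.isclose(np.linalg.eigvalsh(Y)[-1], lam)   # and lam is the leading eigenvalue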