
Exact Minimax Optimality of Spectral Methods in Phase Synchronization and Orthogonal Group Synchronization

Anderson Ye Zhang
 
University of Pennsylvania
Abstract

We study the performance of the spectral method for the phase synchronization problem with additive Gaussian noise and incomplete data. The spectral method utilizes the leading eigenvector of the data matrix followed by a normalization step. We prove that it achieves the minimax lower bound of the problem with a matching leading constant under a squared $\ell_2$ loss. This shows that the spectral method has the same performance as more sophisticated procedures, including maximum likelihood estimation, generalized power method, and semidefinite programming, as long as consistent parameter estimation is possible. To establish our result, we first introduce a novel choice of the population eigenvector, which enables us to establish the exact recovery of the spectral method when there is no additive noise. We then develop a new perturbation analysis toolkit for the leading eigenvector and show that it can be well approximated by its first-order approximation with a small $\ell_2$ error. We further extend our analysis to establish the exact minimax optimality of the spectral method for orthogonal group synchronization.

1 Introduction

We consider the phase synchronization problem with additive Gaussian noise and incomplete data [2, 5, 39, 18]. Let $z^*_1,\ldots,z^*_n \in \mathbb{C}_1$, where $\mathbb{C}_1 := \{x \in \mathbb{C} : |x| = 1\}$ is the set of all unit complex numbers. Each $z^*_j$ can then be written equivalently as $e^{i\theta^*_j}$ for some phase (or angle) $\theta^*_j \in [0, 2\pi)$. For each $1 \leq j < k \leq n$, the observation $X_{jk} \in \mathbb{C}$ is missing at random. Let $A_{jk} \in \{0,1\}$ and $X_{jk}$ satisfy

$$X_{jk} := \begin{cases} z_j^* \overline{z_k^*} + \sigma W_{jk}, & \text{if } A_{jk} = 1,\\ 0, & \text{if } A_{jk} = 0, \end{cases} \tag{1}$$

where $A_{jk} \sim \text{Bernoulli}(p)$ and $W_{jk} \sim \mathcal{CN}(0,1)$. That is, each $X_{jk}$ is missing with probability $1-p$, in which case it is recorded as $0$. If it is not missing, it is equal to $z_j^* \overline{z_k^*}$ plus an additive noise $\sigma W_{jk}$, where $W_{jk}$ follows the standard complex Gaussian distribution: ${\rm Re}(W_{jk}), {\rm Im}(W_{jk}) \sim \mathcal{N}(0, 1/2)$ independently. Each $A_{jk}$ is the indicator of whether $X_{jk}$ is observed. We assume all the random variables $\{A_{jk}\}_{1 \leq j < k \leq n}, \{W_{jk}\}_{1 \leq j < k \leq n}$ are independent of each other. The goal is to estimate the phase vector $z^* := (z_1^*, \ldots, z_n^*) \in \mathbb{C}_1^n$ from $\{A_{jk}\}_{1 \leq j < k \leq n}$ and $\{X_{jk}\}_{1 \leq j < k \leq n}$.
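As a concrete illustration, here is a minimal Python sketch of the observation model (1); the values of `n`, `p`, and `sigma` are illustrative choices, not ones taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 300, 0.5, 1.0          # illustrative values

theta = rng.uniform(0.0, 2 * np.pi, size=n)
z_star = np.exp(1j * theta)          # z*_j = e^{i theta*_j}

# Symmetric Bernoulli(p) mask A with zero diagonal.
A = np.triu((rng.random((n, n)) < p).astype(float), k=1)
A = A + A.T

# Standard complex Gaussian noise: real and imaginary parts are N(0, 1/2).
W = np.triu(rng.normal(0, np.sqrt(0.5), (n, n))
            + 1j * rng.normal(0, np.sqrt(0.5), (n, n)), k=1)
W = W + W.conj().T                   # Hermitian with zero diagonal

# Data matrix (2): X = A o (z* z*^H + sigma W).
X = A * (np.outer(z_star, z_star.conj()) + sigma * W)
```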

The observations can be seen as entries of a matrix $X \in \mathbb{C}^{n \times n}$ with $X_{jj} := 0$ and $X_{kj} := \overline{X_{jk}}$ for any $1 \leq j < k \leq n$. Define $A_{jj} := 0$ and $A_{kj} := A_{jk}$ for all $1 \leq j < k \leq n$. Then the matrix $A \in \{0,1\}^{n \times n}$ can be interpreted as the adjacency matrix of an Erdös-Rényi random graph with edge probability $p$. Define $W \in \mathbb{C}^{n \times n}$ such that $W_{jj} := 0$ and $W_{kj} := \overline{W_{jk}}$ for all $1 \leq j < k \leq n$. Then the matrices $A, W, X$ are all Hermitian and $X$ can be written equivalently as

$$X = A \circ \left(z^* z^{*\mathrm{H}} + \sigma W\right) = A \circ \left(z^* z^{*\mathrm{H}}\right) + \sigma A \circ W. \tag{2}$$

Note that $X$ can be seen as a noisy version of

$$\mathbb{E}X = p z^* z^{*\mathrm{H}} - p I_n \tag{3}$$

whose leading eigenvector is $z^*/\sqrt{n}$. This motivates the following spectral method [36, 18, 13]. Let $u \in \mathbb{C}^n$ be the leading eigenvector of $X$. Then the spectral estimator $\widehat{z} \in \mathbb{C}_1^n$ is defined as

$$\widehat{z}_j := \begin{cases} \frac{u_j}{|u_j|}, & \text{if } u_j \neq 0,\\ 1, & \text{if } u_j = 0, \end{cases} \tag{4}$$

for each $j \in [n]$, where each $u_j$ is normalized so that $\widehat{z}_j \in \mathbb{C}_1$. The performance of the spectral estimator can be quantified by the normalized squared $\ell_2$ loss

$$\ell(\widehat{z}, z^*) := \min_{a \in \mathbb{C}_1} \frac{1}{n} \sum_{j=1}^n \left|\widehat{z}_j - z_j^* a\right|^2, \tag{5}$$

where the minimum over $\mathbb{C}_1$ is due to the fact that $z^*_1, \ldots, z^*_n$ are identifiable only up to a phase.
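The estimator (4) and the loss (5) are straightforward to implement. Below is a minimal sketch continuing the simulation above; the closed-form minimizer over the global phase $a$ in (5) is a standard computation, not one carried out in the paper.

```python
def spectral_estimator(X):
    """Leading eigenvector of X followed by the entrywise normalization (4)."""
    u = np.linalg.eigh(X)[1][:, -1]   # eigh returns eigenvalues in ascending order
    mag = np.abs(u)
    return np.where(mag > 0, u / np.where(mag > 0, mag, 1.0), 1.0 + 0j)

def loss(z_hat, z_star):
    """Normalized squared l2 loss (5); the optimal phase is a = s/|s|
    with s = sum_j conj(z*_j) z_hat_j."""
    s = np.vdot(z_star, z_hat)
    a = s / np.abs(s) if np.abs(s) > 0 else 1.0
    return np.mean(np.abs(z_hat - z_star * a) ** 2)

z_hat = spectral_estimator(X)
print(loss(z_hat, z_star), sigma**2 / (2 * n * p))   # comparable in this regime
```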

The spectral estimator $\widehat{z}$ is simple and easy to implement. Regarding its theoretical performance, it was suggested in [18] that an upper bound $\ell(\widehat{z}, z^*) \leq C(\sigma^2 + 1)/(np)$ holds with high probability for some constant $C$. However, the minimax risk of the phase synchronization problem was established in [18] and has the following lower bound:

$$\inf_{z \in \mathbb{C}^n} \sup_{z^* \in \mathbb{C}_1^n} \mathbb{E}\,\ell(z, z^*) \geq (1 - o(1)) \frac{\sigma^2}{2np}. \tag{6}$$

To provably achieve the minimax risk, the spectral method is often used as an initialization for some more sophisticated procedures. For example, it was used to initialize a generalized power method (GPM) [7, 29, 31] in [18]. Nevertheless, numerically the performance of the spectral method is already very good and the improvement from GPM is often marginal. This raises the following questions about the performance of the spectral method: Can we derive a sharp upper bound? Does the spectral method already achieve the minimax risk or not?

In this paper, we provide complete answers to these questions. We carry out a sharp $\ell_2$ analysis of the performance of the spectral estimator $\widehat{z}$ and show that it achieves the minimax risk with the correct constant. Our main result is summarized below in Theorem 1 in asymptotic form. Its non-asymptotic version, given in Theorem 3, only requires $\frac{np}{\sigma^2}$ and $\frac{np}{\log n}$ to be greater than a certain constant. We note that in this paper, $p$ and $\sigma^2$ are not constants but functions of $n$. This dependence can be made explicit by writing $p_n$ and $\sigma^2_n$; however, for simplicity of notation and readability, we denote these by $p$ and $\sigma^2$ throughout the paper.

Theorem 1.

Assume $\frac{np}{\sigma^2} \rightarrow \infty$ and $\frac{np}{\log n} \rightarrow \infty$. There exists some $\delta = o(1)$ such that with high probability,

$$\ell(\widehat{z}, z^*) \leq (1 + \delta) \frac{\sigma^2}{2np}. \tag{7}$$

As a consequence, when $\sigma = 0$ (i.e., there is no additive noise), the spectral method recovers $z^*$ exactly (up to a phase) with high probability as long as $\frac{np}{\log n} \rightarrow \infty$.

Theorem 1 shows that $\widehat{z}$ is not only rate-optimal but also achieves the exact minimax risk with the correct leading constant in front of the optimal rate. The conditions in Theorem 1 are necessary for consistent estimation of $z^*$ in the phase synchronization problem. The condition $\frac{np}{\sigma^2} \rightarrow \infty$ is needed so that $z^*$ can be estimated with a vanishing error according to the minimax lower bound (6). The condition $\frac{np}{\log n} \rightarrow \infty$ allows $p$ to decrease as $n$ grows and is close to the condition $\frac{np}{\log n} > 1 + \epsilon$ required for $A$ to be connected; if $A$ has disjoint subgraphs, it is impossible to estimate $z^*$ up to a global phase. These two conditions are also needed in [18, 19] to establish the optimality of the MLE (maximum likelihood estimation), GPM, and SDP (semidefinite programming). Under these two conditions, [18] used the spectral method as an initialization for the GPM and showed that GPM achieves the minimax risk after $\log(1/\sigma^2)$ iterations. In contrast, Theorem 1 shows that the spectral method alone already achieves the minimax risk. This means that the spectral estimator is minimax optimal with the correct leading constant whenever consistent estimation is possible, and in this parameter regime it is as good as the MLE, GPM, and SDP.

There are two key novel components in establishing Theorem 1. The first is a new choice of the "population eigenvector", of which $u$ can be viewed as the sample counterpart obtained from the data. Due to (3) and the fact that $p z^* z^{*\mathrm{H}} = np (z^*/\sqrt{n})(z^*/\sqrt{n})^{\mathrm{H}}$ is rank-one with eigenvector $z^*/\sqrt{n}$, existing literature such as [18, 20, 27] treated $z^*/\sqrt{n}$ as the population eigenvector and studied the perturbation of $u$ with respect to $z^*/\sqrt{n}$. This seems natural but turns out to be unappealing, as it fails to explain why the spectral estimator is able to recover all phases exactly when $\sigma = 0$, i.e., when there is no additive noise. Instead, we denote by $\widecheck{u} \in \mathbb{R}^n$ the leading eigenvector of $A$ and regard $u^* \in \mathbb{C}^n$, defined as

$$u^* := z^* \circ \widecheck{u}, \tag{8}$$

i.e., $u^*_j = z^*_j \widecheck{u}_j$ for each $j \in [n]$, as the population eigenvector. Note that $u^*$ is random as it depends on the graph $A$. A careful analysis of $u^*$ reveals that it is the leading eigenvector of $A \circ z^* z^{*\mathrm{H}}$ (see Lemma 1). In addition, Proposition 1 shows that with high probability, $u^*_j/|u^*_j| = z^*_j$ for each $j \in [n]$, up to a global phase. Since $u$ equals $u^*$ when $\sigma = 0$, this successfully explains the exact recovery of the spectral method in the no-additive-noise case. Another advantage of viewing $u^*$ as the population eigenvector, instead of $z^*/\sqrt{n}$, is that intuitively $u^*$ is closer to $u$ than $z^*/\sqrt{n}$ is, because $A \circ z^* z^{*\mathrm{H}}$ is closer to the data matrix $X$ than $p z^* z^{*\mathrm{H}}$ is.
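This defining property of $u^*$ is easy to verify numerically; a quick check under the simulation sketched above (reusing `A` and `z_star`):

```python
# Verify the content of Lemma 1 numerically: u* = z* o u_check is an
# eigenvector of A o (z* z*^H), with eigenvalue the leading eigenvalue of A.
eigvals_A, eigvecs_A = np.linalg.eigh(A)
u_check = eigvecs_A[:, -1]                     # leading eigenvector of A
u_star = z_star * u_check                      # (8)

M = A * np.outer(z_star, z_star.conj())        # A o (z* z*^H)
print(np.allclose(M @ u_star, eigvals_A[-1] * u_star))   # True
```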

The second key component is a novel perturbation analysis for $u$. Classical matrix perturbation theory such as the Davis-Kahan theorem focuses on analyzing $\inf_{b \in \mathbb{C}_1} \|u - u^* b\|$. We go beyond it and show that $u$ can be well approximated by its first-order approximation $\widetilde{u}$, defined as

$$\widetilde{u} := \frac{X u^*}{\|X u^*\|}, \tag{9}$$

in the sense that the difference between these two vectors (up to a phase) has a small $\ell_2$ norm. Specifically, when $np \gtrsim \log n$ and $np \gtrsim \sigma^2$, we have

$$\inf_{b \in \mathbb{C}_1} \left\|u - \widetilde{u} b\right\| \lesssim \frac{\sigma^2 + \sigma}{np}, \tag{10}$$

with high probability (see Proposition 2). In fact, our perturbation analysis extends beyond the phase synchronization problem. What we establish is a general perturbation theory that can be applied to two arbitrary Hermitian matrices (see Lemma 2), which might be of independent interest.

With the help of these two key components, we then carry out an entrywise analysis of each $\widehat{z}_j = u_j/|u_j|$. Note that $u_j$ can be decomposed into $\widetilde{u}_j$ plus the difference between $u_j$ and $\widetilde{u}_j$ (up to a global phase). We can thus decompose the error of $\widehat{z}_j$ into two parts: one related to the estimation error of $\widetilde{u}_j/|\widetilde{u}_j|$, and the other related to the magnitude of the difference between $u_j$ and $\widetilde{u}_j$ (up to a global phase). Summing over all coordinates, the first part eventually leads to the minimax risk $(1 + o(1))\sigma^2/(2np)$ and the second part is essentially negligible due to (10), which leads to the exact minimax optimality of the spectral estimator.

Orthogonal Group Synchronization. The above analysis for phase synchronization can be extended to quantify the performance of the spectral method for orthogonal group synchronization, which concerns orthogonal matrices instead of phases. Let $d > 0$ be an integer. Define

$$\mathcal{O}(d) := \left\{U \in \mathbb{R}^{d \times d} : U U^{\mathrm{T}} = U^{\mathrm{T}} U = I_d\right\} \tag{11}$$

to be the set of all orthogonal matrices in $\mathbb{R}^{d \times d}$. Let $Z^*_1, \ldots, Z^*_n \in \mathcal{O}(d)$. Analogous to (1), we consider the problem with additive Gaussian noise and incomplete data [20, 26, 27]. For each $1 \leq j < k \leq n$, we observe $\mathcal{X}_{jk} := Z_j^* Z_k^{*\mathrm{T}} + \sigma \mathcal{W}_{jk} \in \mathbb{R}^{d \times d}$ when $A_{jk} = 1$, where $\mathcal{W}_{jk}$ follows the standard matrix Gaussian distribution. The goal is to recover $Z_1^*, \ldots, Z_n^*$ from $\{\mathcal{X}_{jk}\}_{1 \leq j < k \leq n}$ and $\{A_{jk}\}_{1 \leq j < k \leq n}$. This is known as orthogonal group synchronization (or $\mathcal{O}(d)$ synchronization).

The observations $\{\mathcal{X}_{jk}\}_{1 \leq j < k \leq n}$ can be seen as submatrices of an $nd \times nd$ matrix $\mathcal{X}$ with $\mathcal{X}_{jj} := 0$ and $\mathcal{X}_{kj} := \mathcal{X}_{jk}^{\mathrm{T}}$ for any $1 \leq j < k \leq n$. Then $\mathcal{X}$ is symmetric and can be seen as a noisy version of

$$\mathbb{E}\mathcal{X} = p Z^* Z^{*\mathrm{T}} - p I_{nd} \tag{12}$$

whose leading eigenspace is $Z^*/\sqrt{n}$, where $Z^* \in \mathcal{O}(d)^n$ is an $nd \times d$ matrix whose $j$th $d \times d$ submatrix is $Z^*_j$. Similar to phase synchronization, we have the following spectral method. Let $\lambda_1 \geq \ldots \geq \lambda_d$ be the largest $d$ eigenvalues of $\mathcal{X}$ and $u_1, \ldots, u_d \in \mathbb{R}^{nd}$ be their corresponding eigenvectors. Denote $U := (u_1, \ldots, u_d) \in \mathbb{R}^{nd \times d}$ as the eigenspace spanned by the top $d$ eigenvectors of $\mathcal{X}$. For each $j \in [n]$, denote $U_j \in \mathbb{R}^{d \times d}$ as its $j$th submatrix. Then the spectral estimator $\widehat{Z}_j \in \mathcal{O}(d)$ is defined as

$$\widehat{Z}_j := \begin{cases} \mathcal{P}(U_j), & \text{if } \det(U_j) \neq 0,\\ I_d, & \text{if } \det(U_j) = 0, \end{cases} \tag{13}$$

for each $j \in [n]$. Here the mapping $\mathcal{P} : \mathbb{R}^{d \times d} \rightarrow \mathcal{O}(d)$ is derived from the polar decomposition and serves as a normalization step for each $U_j$ so that $\widehat{Z}_j \in \mathcal{O}(d)$. Let $\widehat{Z} \in \mathcal{O}(d)^n$ be the $nd \times d$ matrix such that $\widehat{Z}_j$ is its $j$th submatrix for each $j \in [n]$. Then the performance of $\widehat{Z}$ can be quantified by a loss function $\ell^{\text{od}}(\widehat{Z}, Z^*)$ analogous to (5). The detailed definitions of $\mathcal{P}$ and $\ell^{\text{od}}$ are deferred to Section 3.

The spectral method $\widehat{Z}$ was used as an initialization in [20] for a variant of GPM to achieve the exact minimax risk $(1 + o(1))\frac{d(d-1)\sigma^2}{2np}$ for $d = O(1)$. To conduct a sharp analysis of its statistical performance, we extend our novel perturbation analysis from the leading eigenvector to the leading eigenspace. Recall that $\widecheck{u}$ is the leading eigenvector of $A$. Analogous to (8), we have a novel choice of the population eigenspace $U^*$, defined as

$$U^* := Z^* \circ (\widecheck{u} \otimes \mathds{1}_d), \tag{14}$$

and view $U$ as its sample counterpart. This differs from the existing literature [20, 40], which uses $Z^*/\sqrt{n}$ as the population eigenspace. Our choice of $U^*$ enables the establishment of the exact recovery of the spectral method when there is no additive noise (i.e., $\sigma = 0$), as seen in Proposition 3, and is closer to $U$ than $Z^*/\sqrt{n}$ is.

The first-order approximation of $U$ is a matrix determined by $\mathcal{X} U^*$, whose explicit expression will be given later in (22). We then show that $U$ can be well approximated by its first-order approximation, analogous to (9), with a remainder term of small $\ell_2$ norm (see Proposition 4). This is a consequence of a more general eigenspace perturbation theory (see Lemma 4) for two arbitrary symmetric matrices. Using the first-order approximation, we then carry out an entrywise analysis of $\widehat{Z}$. Our main result for the spectral method in $\mathcal{O}(d)$ synchronization is summarized below in Theorem 2. The non-asymptotic version is given in Theorem 4.

Theorem 2.

Assume $\frac{np}{\sigma^2} \rightarrow \infty$, $\frac{np}{\log n} \rightarrow \infty$, and $2 \leq d = O(1)$. There exists some $\delta = o(1)$ such that with high probability,

$$\ell^{\text{od}}(\widehat{Z}, Z^*) \leq (1 + \delta) \frac{d(d-1)\sigma^2}{2np}.$$

As a consequence, when $\sigma = 0$ (i.e., there is no additive noise), the spectral method recovers $Z^*$ exactly (up to an orthogonal matrix) with high probability as long as $\frac{np}{\log n} \rightarrow \infty$.

Theorem 2 shows that the spectral method $\widehat{Z}$ achieves exact minimax optimality, as it matches the minimax lower bound $(1 + o(1))\frac{d(d-1)\sigma^2}{2np}$ established in [20]. Similar to phase synchronization, the two conditions in Theorem 2 are needed so that consistent estimation of $Z^*$ is possible. They are also needed in [20] to achieve the minimax risk by a variant of GPM initialized with the spectral method, whereas Theorem 2 shows that in this parameter regime, the spectral method alone is already minimax optimal with the correct leading constant.

Related Literature. Synchronization is a fundamental problem in applied mathematics and statistics. Various methods have been studied for both phase synchronization and $\mathcal{O}(d)$ synchronization, including maximum likelihood estimation (MLE) [18, 39], GPM [39, 29, 18, 20, 26, 34], SDP [4, 37, 28, 16, 38, 19, 22], spectral methods [4, 37, 33, 8, 30, 17], and message passing [31, 25, 35]. The theoretical performance of spectral methods was investigated in [18, 20, 27], where crude error bounds under the $\ell_2$ or Frobenius norm were obtained. An $\ell_\infty$-type error bound for spectral methods was also given in [27].

Fine-grained perturbation analysis of eigenvectors has gained increasing attention in recent years for various low-rank matrix problems in machine learning and statistics. Existing literature has mostly focused on establishing $\ell_\infty$ bounds for eigenvectors [1, 12, 15] or $\ell_{2,\infty}$ bounds for eigenspaces [23, 11, 9, 3]. For instance, [1] developed $\ell_\infty$-type bounds for the difference between eigenvectors (or eigenspaces) and their first-order approximations. In this paper, we focus on developing sharp $\ell_2$-type perturbation bounds, where direct application of existing $\ell_\infty$-type results would result in extra logarithmic factors.

For the phase synchronization problem, [17, 21, 32] investigated variants of spectral methods based on Laplacian matrices. Instead of using the leading eigenvector of $X$ as in this paper, they utilize the eigenvector corresponding to the smallest eigenvalue of $D - X$ or $I_n - D^{-\frac{1}{2}} X D^{-\frac{1}{2}}$, where $D \in \mathbb{R}^{n \times n}$ is the degree matrix of $A$ with diagonal entries $D_{jj} := \sum_{k \neq j} A_{jk}$ and off-diagonal entries set to zero. These studies established upper bounds for the performance of their spectral methods applicable to general graphs $A$ and general additive noise $W$. Our focus, however, is on Erdös-Rényi random graphs with Gaussian noise. Under our setting, their results imply an upper bound of $\frac{C\sigma^2}{np}$, where $C$ is a constant significantly greater than $1$. In contrast, our work establishes a sharp upper bound with the correct leading constant $1/2$. Our analytical approach could potentially be extended to their methods to achieve the correct constant $1/2$.

The existing literature [18, 19, 20] explored the exact minimax risk in synchronization problems, focusing primarily on minimax lower bounds and on analyzing the MLE, GPM, and SDP. While our study shares a thematic resemblance with these prior efforts, it fundamentally diverges in both analysis and proof techniques. Previous studies hinge on contraction properties of the generalized power iteration (GPI), demonstrating the iterative reduction of the GPM error until an optimal error is achieved. This approach further interprets the MLE as a fixed point of the GPI and the SDP as an extension of the GPI in a higher-dimensional space, thereby establishing their optimality. In contrast, this paper employs a novel strategy specifically tailored to the spectral method. Instead of relying on the GPI framework, which proves inadequate for spectral analysis, we introduce a new perturbation toolkit designed for eigenvector analysis. This toolkit provides a precise characterization of eigenvector perturbation and leads to the optimality of the spectral method. It opens new avenues for research and application beyond synchronization problems.

Organization. We study phase synchronization in Section 2. We first establish the exact recovery of the spectral method in the no-additive-noise case in Section 2.1. Then in Section 2.2, we present our main technical tool for quantifying the distance between the leading eigenvector and its first-order approximation. We then carry out an entrywise analysis of the spectral method and obtain non-asymptotic sharp upper bounds in Section 2.3. Finally, we consider the extension to orthogonal group synchronization in Section 3. Proofs of results for phase synchronization are given in Section 5. Due to the page limit, we prove Lemma 4 in Section 6 and include the proofs of the other results for orthogonal group synchronization in the Appendix.

Notation. For any positive integer $n$, we write $[n] := \{1, 2, \ldots, n\}$ and $\mathds{1}_n := (1, 1, \ldots, 1) \in \mathbb{R}^n$. Denote $I_n$ as the $n \times n$ identity matrix and $J_n := \mathds{1}_n \mathds{1}_n^{\mathrm{T}} \in \mathbb{R}^{n \times n}$ as the $n \times n$ matrix with all entries equal to one. Given $a, b \in \mathbb{R}$, we denote $a \wedge b := \min\{a, b\}$ and $a \vee b := \max\{a, b\}$. For a complex number $x \in \mathbb{C}$, we use $\bar{x}$ for its complex conjugate, ${\rm Re}(x)$ for its real part, ${\rm Im}(x)$ for its imaginary part, and $|x|$ for its modulus. Denote $\mathbb{S}_n := \{x \in \mathbb{C}^n : \|x\| = 1\}$ as the set of all unit vectors in $\mathbb{C}^n$. For a complex vector $x = (x_j) \in \mathbb{C}^d$, we denote $\|x\| = (\sum_{j=1}^d |x_j|^2)^{1/2}$ as its Euclidean norm. For a complex matrix $B = (B_{jk}) \in \mathbb{C}^{d_1 \times d_2}$, we use $B^{\mathrm{H}} \in \mathbb{C}^{d_2 \times d_1}$ for its conjugate transpose, so that $B^{\mathrm{H}}_{jk} = \overline{B_{kj}}$. The Frobenius norm and the operator norm of $B$ are defined by $\|B\|_{\rm F} := (\sum_{j=1}^{d_1} \sum_{k=1}^{d_2} |B_{jk}|^2)^{1/2}$ and $\|B\| := \sup_{u \in \mathbb{C}^{d_1}, v \in \mathbb{C}^{d_2} : \|u\| = \|v\| = 1} u^{\mathrm{H}} B v$. We use $\text{Tr}(B)$ for the trace of a square matrix $B$. We denote $B_{j\cdot}$ as the $j$th row of $B$ and define $\|B\|_{2\rightarrow\infty} := \max_{j \in [d_1]} \|B_{j\cdot}\|$. The notation $\det(\cdot)$ and $\otimes$ are used for the determinant and the Kronecker product. For $U, V \in \mathbb{C}^{d_1 \times d_2}$, $U \circ V \in \mathbb{C}^{d_1 \times d_2}$ is the Hadamard product $U \circ V := (U_{jk} V_{jk})$. For any $B \in \mathbb{R}^{d_1 \times d_2}$, we denote $s_{\min}(B)$ as its smallest singular value. For two positive sequences $\{a_n\}$ and $\{b_n\}$, $a_n \lesssim b_n$ and $a_n = O(b_n)$ both mean $a_n \leq C b_n$ for some constant $C > 0$ independent of $n$. We also write $a_n = o(b_n)$ or $\frac{b_n}{a_n} \rightarrow \infty$ when $\limsup_n \frac{a_n}{b_n} = 0$. We use $\mathbb{I}\{\cdot\}$ for the indicator function. Define $\mathcal{O}(d_1, d_2) := \{V \in \mathbb{R}^{d_1 \times d_2} : V^{\mathrm{T}} V = I_{d_2}\}$ as the set of all $d_1 \times d_2$ matrices with orthonormal columns.

2 Phase Synchronization

2.1 No-additive-noise Case

We first study a special case where there is no additive noise (i.e., $\sigma = 0$). In this setting, the data matrix is $X = A \circ z^* z^{*\mathrm{H}}$. Despite the data still being missing at random, we are going to show that the spectral method is able to recover $z^*$ exactly, up to a phase.

Recall that $\widecheck{u}$ is the leading eigenvector of $A$ and $u^*$ is defined in (8). The following lemma points out the connection between $u^*$ and $A \circ z^* z^{*\mathrm{H}}$, as well as the connection between the eigenvalues of $A$ and those of $A \circ z^* z^{*\mathrm{H}}$.

Lemma 1.

The unit vector $u^*$ is the leading eigenvector of $A \circ z^* z^{*\mathrm{H}}$. That is, with $\lambda^*$ denoting the largest eigenvalue of $A \circ z^* z^{*\mathrm{H}}$, we have

$$\left(A \circ z^* z^{*\mathrm{H}}\right) u^* = \lambda^* u^*. \tag{15}$$

In addition, all the eigenvalues of $A$ are also eigenvalues of $A \circ z^* z^{*\mathrm{H}}$, and vice versa.

Since $X = A \circ z^* z^{*\mathrm{H}}$ in the no-additive-noise case, we have $u = u^*$. Note that $\widehat{z}_j = u_j/|u_j| = u^*_j/|u^*_j| = z^*_j \widecheck{u}_j / |z^*_j \widecheck{u}_j|$ for each $j \in [n]$. If $\widecheck{u}_j > 0$, we have $\widehat{z}_j = z_j^*$. If instead $\widecheck{u}_j < 0$, then $\widehat{z}_j = -z_j^*$. Hence, if all the coordinates of $\widecheck{u}$ are positive (or all negative), $\widehat{z}$ equals $z^*$ (or $-z^*$) exactly. The following proposition provides an $\ell_\infty$ control of the difference between $\widecheck{u}$ and $\mathds{1}_n/\sqrt{n}$, which are the leading eigenvectors of $A$ and $\mathbb{E}A$, respectively. The proof of (16) follows proofs of results in [1]. When the right-hand side of (16) is smaller than $1/\sqrt{n}$, it immediately establishes the exact recovery of $\widehat{z}$.

Proposition 1.

There exist some constants $C_1, C_2 > 0$ such that if $\frac{np}{\log n} > C_1$, we have

$$\min_{b \in \{1, -1\}} \max_{j \in [n]} \left|\widecheck{u}_j - \frac{1}{\sqrt{n}} b\right| \leq C_2 \left(\sqrt{\frac{\log n}{np}} + \frac{1}{\log(np)}\right) \frac{1}{\sqrt{n}}, \tag{16}$$

with probability at least $1 - 8n^{-10}$. As a result, if $\frac{np}{\log n} > \max\{C_1, 2C_2^2\}$, we have $\ell(\widehat{z}, z^*) = 0$ with probability at least $1 - 8n^{-10}$.
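As a quick numerical illustration of Proposition 1, reusing `A`, `z_star`, and the helper functions from the sketches in Section 1:

```python
# Noiseless case sigma = 0: X = A o (z* z*^H), so u = u* and the spectral
# estimator recovers z* exactly up to a global phase.
X0 = A * np.outer(z_star, z_star.conj())
print(loss(spectral_estimator(X0), z_star))    # 0 up to machine precision
```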

Lemma 1 and Proposition 1 together establish the exact recovery of $\widehat{z}$ in the special case where $\sigma = 0$, through the study of $u^*$. This provides a starting point for our analysis of the general case where $\sigma \neq 0$. From (2), the data matrix $X$ is a noisy version of $A \circ z^* z^{*\mathrm{H}}$ with additive noise $\sigma A \circ W$ that scales with $\sigma$. As a result, in the following sections, we view $u^*$ as the population eigenvector and $u$ as its sample counterpart when studying the performance of the spectral method.

2.2 First-order Approximation of The Leading Eigenvector

In this section, we provide a fine-grained perturbation analysis of the eigenvector $u$. Classical matrix perturbation theory, such as the Davis-Kahan theorem, can only give a crude upper bound for $\inf_{b \in \mathbb{C}_1} \|u - u^* b\|$, which turns out to be insufficient for deriving a sharp bound on $\ell(\widehat{z}, z^*)$. Instead, we develop a more powerful tool for the perturbation analysis of $u$ using its first-order approximation $\widetilde{u}$ defined in (9). In fact, our tool goes beyond the phase synchronization problem and can be applied to arbitrary Hermitian matrices.

Lemma 2.

Consider two Hermitian matrices $Y, Y^* \in \mathbb{C}^{n \times n}$. Let $\mu^*_1 \geq \mu^*_2 \geq \ldots \geq \mu^*_n$ be the eigenvalues of $Y^*$. Let $v^*$ (resp. $v$) be the eigenvector of $Y^*$ (resp. $Y$) corresponding to its largest eigenvalue. If $\|Y - Y^*\| \leq \min\{\mu_1^* - \mu_2^*, \mu_1^*\}/4$, we have

$$\inf_{b \in \mathbb{C}_1} \left\|v - \frac{Y v^*}{\|Y v^*\|} b\right\| \leq \frac{40\sqrt{2}}{9(\mu_1^* - \mu_2^*)} \left(\left(\frac{4}{\mu_1^* - \mu_2^*} + \frac{2}{\mu_1^*}\right) \|Y - Y^*\|^2 + \frac{\max\{|\mu_2^*|, |\mu_n^*|\}}{\mu_1^*} \|Y - Y^*\|\right).$$

In Lemma 2, there are two matrices $Y, Y^*$ whose leading eigenvectors are $v, v^*$, respectively. The lemma studies the $\ell_2$ difference between $v$ and $Y v^*/\|Y v^*\|$, up to a phase. Let $\mu_1$ be the largest eigenvalue of $Y$. The unit vector $Y v^*/\|Y v^*\|$ is interpreted as the first-order approximation of $v$, as $v$ can be decomposed into $v = Y v/\mu_1 = Y v^*/\mu_1 + Y(v - v^*)/\mu_1$, where the first term $Y v^*/\mu_1$ is proportional to $Y v^*/\|Y v^*\|$. If $Y^*$ is rank-one, meaning $\mu_2^* = \mu^*_n = 0$, the upper bound in Lemma 2 becomes $80\sqrt{2}\|Y - Y^*\|^2/(3\mu_1^{*2})$. Lemma 2 itself might be of independent interest and be useful in other low-rank matrix problems.

The key to Lemma 2 is the following equation. Since $\mu_1 v = Y v$ and $\|Y v^*\| \frac{Y v^*}{\|Y v^*\|} = Y v^*$, we can derive (see (29) in the proof of Lemma 2):

$$\mu_1^{-1} \|Y v^*\| (\mu_1 I_n - Y) \left(v - \frac{Y v^*}{\|Y v^*\|}\right) = Y(\mu_1^{-1} Y v^* - v^*).$$

Its left-hand side can be shown to be related to $\inf_{b \in \mathbb{C}_1} \|v - Y v^* b/\|Y v^*\|\|$. By carefully studying and upper bounding its right-hand side, which does not involve $v$, we derive Lemma 2.

Lemma 2 requires that the perturbation between $Y$ and $Y^*$ be small not only compared to the eigengap $\mu^*_1 - \mu^*_2$, but also compared to the leading eigenvalue $\mu^*_1$. A similar requirement is also needed in [1] to establish $\ell_\infty$ bounds for the difference between the eigenvector and its first-order approximation. In contrast, classical theory such as the Davis-Kahan theorem (see Lemma 5) only needs the perturbation to be small compared to the eigengap in order to bound $\inf_{b \in \mathbb{C}_1} \|v - v^* b\|$. A natural question is whether the bound in Lemma 2 can be modified to depend on the eigenvalues only through the eigengaps. It turns out this is not feasible, as the lemma deals with the distance between $v$ and its first-order approximation $Y v^*/\|Y v^*\|$, not the distance between $v$ and $v^*$ as in the Davis-Kahan theorem. To illustrate this, consider the following counterexample. Let $e_1, \ldots, e_n$ be the canonical basis of $\mathbb{R}^n$ and let $\delta > 0$. Define

$$Y^* := \text{diag}(0, -1, -1, \ldots, -1) \in \mathbb{R}^{n \times n}, \quad \text{and} \quad Y := Y^* + \delta(e_1 + e_2)(e_1 + e_2)^{\mathrm{T}}/2. \tag{17}$$

Then $\mu_1^* = 0$, $\mu_2^* = -1$, $\mu^*_1 - \mu^*_2 = 1$, $v^* = e_1$, $\|Y - Y^*\| = \delta$, and $Y v^*/\|Y v^*\| = (e_1 + e_2)/\sqrt{2}$. We can show that $v$ has the following explicit expression (see Appendix C for the detailed calculation):

$$v = \sqrt{\frac{1}{2}\left(1 + \frac{1}{\sqrt{1 + \delta^2}}\right)}\, e_1 + \sqrt{\frac{1}{2}\left(1 - \frac{1}{\sqrt{1 + \delta^2}}\right)}\, e_2. \tag{18}$$

When $\delta$ is sufficiently close to $0$, we have $v \approx v^*$. This is not surprising, as it is consistent with the bound from the Davis-Kahan theorem: the ratio between the perturbation and the eigengap is $\|Y - Y^*\|/(\mu^*_1 - \mu^*_2) = \delta \approx 0$. On the other hand, $\|v - Y v^*/\|Y v^*\|\| \approx \|e_1 - (e_1 + e_2)/\sqrt{2}\| = \sqrt{2 - \sqrt{2}} > 0$ no matter how small $\delta$ may be. As a result, in this counterexample, $Y v^*/\|Y v^*\|$ is not a good approximation of $v$ despite the arbitrarily small perturbation.
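The counterexample is easy to verify numerically; a minimal sketch (with an arbitrary small choice of $\delta$):

```python
import numpy as np

n, delta = 5, 1e-3
e = np.eye(n)
Y_star = np.diag([0.0] + [-1.0] * (n - 1))
Y = Y_star + delta * np.outer(e[0] + e[1], e[0] + e[1]) / 2

v = np.linalg.eigh(Y)[1][:, -1]
v = v * np.sign(v[0])                          # fix the sign convention
approx = (e[0] + e[1]) / np.sqrt(2)            # Y v* / ||Y v*||

print(np.linalg.norm(v - e[0]))                # ~ delta/2: v stays close to v*
print(np.linalg.norm(v - approx))              # ~ sqrt(2 - sqrt(2)) ~= 0.77
```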

Applying Lemma 2 to the phase synchronization, we have the following result.

Proposition 2.

There exist constants $C_1, C_2, C_3 > 0$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\inf_{b \in \mathbb{C}_1} \left\|u - \widetilde{u} b\right\| \leq C_3 \frac{\sigma^2 + \sigma}{np},$$

with probability at least $1 - 3n^{-10}$.

Proposition 2 shows that $u$ is well approximated by its first-order approximation $\widetilde{u}$ (up to a phase), with an approximation error of order at most $(\sigma^2 + \sigma)/(np)$. Note that we can show $\inf_{b \in \mathbb{C}_1} \|u - u^* b\|$ is of order $\sigma/\sqrt{np}$ using the Davis-Kahan theorem. This is much larger than the upper bound derived in Proposition 2, particularly when $np/\sigma^2$ is large. As a result, $\widetilde{u}$ provides a precise characterization of $u$ with negligible $\ell_2$ error.
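Under the simulation from Section 1, the gap between the two bounds can be observed directly; a sketch reusing `X`, `u_star`, `n`, `p`, and `sigma` from the earlier code:

```python
# Compare the first-order approximation error of Proposition 2 with the
# Davis-Kahan-type rate sigma/sqrt(np).
u = np.linalg.eigh(X)[1][:, -1]
u_tilde = X @ u_star
u_tilde = u_tilde / np.linalg.norm(u_tilde)    # (9)

s = np.vdot(u_tilde, u)                        # optimal aligning phase b = s/|s|
b = s / np.abs(s)
print(np.linalg.norm(u - u_tilde * b))         # small: order (sigma^2 + sigma)/(np)
print(sigma / np.sqrt(n * p))                  # much larger when np >> sigma^2
```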

2.3 Sharp $\ell_2$ Analysis of The Spectral Estimator

In this section, we conduct a sharp analysis of the performance of the spectral estimator $\widehat{z}$ using the first-order approximation $\widetilde{u}$ of the eigenvector $u$. According to Proposition 2, $u$ is close to $\widetilde{u}$ (up to a phase) with a small difference. Intuitively, then, $\widehat{z}$ should be close to its counterpart that uses $\widetilde{u}$ instead of $u$ in (4), up to a global phase. For each $j \in [n]$, the distance of $\widetilde{u}_j/|\widetilde{u}_j|$ from $z^*_j$ is essentially determined by $\overline{z^*_j} \widetilde{u}_j$. By the definition in (9), $\widetilde{u}_j$ is proportional to $[X u^*]_j$, the $j$th coordinate of $X u^*$. With (2), this leads to $\overline{z^*_j} \widetilde{u}_j \propto \lambda^* \overline{z^*_j} u^*_j + \sigma \sum_{k \neq j} A_{jk} W_{jk} \overline{z^*_j} u_k^*$. Here the first term $\lambda^* \overline{z^*_j} u^*_j$ can be interpreted as the signal, as it is related to the population quantity $u^*_j$, which gives the exact recovery of the spectral method in the no-additive-noise case in Proposition 1. Since $u^*$ is close to $z^*/\sqrt{n}$, the second term is approximately equal to $n^{-1/2} \sum_{k \neq j} A_{jk} W_{jk} \overline{z^*_j} z_k^*$. Its contribution to the estimation error is essentially determined by its imaginary part $n^{-1/2} {\rm Im}(\sum_{k \neq j} A_{jk} W_{jk} \overline{z^*_j} z_k^*)$, which can be interpreted as the main error term. Summing over all $j \in [n]$, the signals and the main error terms together lead to the minimax risk $\sigma^2/(2np)$. At the same time, the contributions of approximation errors such as $\inf_{b \in \mathbb{C}_1} \|u - \widetilde{u} b\|$ turn out to be negligible. This leads to the following theorem on the performance of the spectral estimator.

Theorem 3.

There exist constants $C_1, C_2, C_3 > 0$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\ell(\widehat{z}, z^*) \leq \left(1 + C_3\left(\left(\frac{\sigma^2}{np}\right)^{\frac{1}{4}} + \sqrt{\frac{\log n}{np}} + \frac{1}{\log(np)}\right)\right) \frac{\sigma^2}{2np},$$

with probability at least $1 - n^{-9} - \exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^2}\right)^{\frac{1}{4}}\right)$.

Theorem 3 is non-asymptotic, and its asymptotic version is presented in Theorem 1. It covers the no-additive-noise case (i.e., Proposition 1), as it implies that $\ell(\widehat{z}, z^*) = 0$ with high probability when $\sigma = 0$. Theorem 3 shows that $\ell(\widehat{z}, z^*)$ equals $\sigma^2/(2np)$ up to a factor determined by $(\sigma^2/(np))^{1/4}$, $\sqrt{\log n/(np)}$, and $1/\log(np)$. The first term is related to various approximation errors, including the one from Proposition 2. The second and third terms are derived from (16).

We can compare Theorem 3 with the existing result $\ell(\widehat{z}, z^*) \lesssim (\sigma^2 + 1)/(np)$ of [18]. There are two main improvements. First, we obtain the exact constant $1/2$ for the error term $\frac{\sigma^2}{np}$, which gives a more accurate characterization of the performance of the spectral estimator. Second, the $1/(np)$ error term in $(\sigma^2 + 1)/(np)$ no longer appears in Theorem 3. We further compare Theorem 3 with the minimax lower bound for the phase synchronization problem. The paper [18] proved that there exist constants $C_4, C_5 > 0$ such that if $\frac{np}{\sigma^2} \geq C_4$, we have

$$\inf_{z \in \mathbb{C}^n} \sup_{z^* \in \mathbb{C}_1^n} \mathbb{E}\,\ell(z, z^*) \geq \left(1 - C_5\left(\frac{\sigma^2}{np} + \frac{1}{n}\right)\right) \frac{\sigma^2}{2np}. \tag{19}$$

Compared with (19), the spectral estimator $\widehat{z}$ is exact minimax optimal, as it achieves not only the correct rate $\sigma^2/(np)$ but also the correct constant $1/2$. Under the parameter regime of Theorem 3, [18, 19] showed that the MLE, GPM (if properly initialized), and SDP achieve the exact minimax risk. Theorem 3 points out that the spectral method is as good as these methods.

3 Orthogonal Group Synchronization

In this section, we extend our analysis to the synchronization problem where the quantities of interest are orthogonal matrices instead of phases. The orthogonal group synchronization problem was briefly introduced in Section 1; here we provide more details.

Let $d > 0$ be an integer. Recall the definition of $\mathcal{O}(d)$ in (11) and that $Z^*_1, \ldots, Z^*_n \in \mathcal{O}(d)$. For each $1 \leq j < k \leq n$, the observation $\mathcal{X}_{jk} \in \mathbb{R}^{d \times d}$ is given by

$$\mathcal{X}_{jk} := \begin{cases} Z_j^* Z_k^{*\mathrm{T}} + \sigma \mathcal{W}_{jk}, & \text{if } A_{jk} = 1,\\ 0, & \text{if } A_{jk} = 0, \end{cases} \tag{20}$$

where $A_{jk} \sim \text{Bernoulli}(p)$ and $\mathcal{W}_{jk} \sim \mathcal{MN}(0, I_d, I_d)$, i.e., the standard matrix Gaussian distribution. (A random matrix $X$ follows the matrix Gaussian distribution $\mathcal{MN}(M, \Sigma, \Omega)$ if its density function is proportional to $\exp\left(-\frac{1}{2}\text{Tr}\left(\Omega^{-1}(X - M)^{\mathrm{T}} \Sigma^{-1}(X - M)\right)\right)$.) We assume $\{A_{jk}\}_{1 \leq j < k \leq n}, \{\mathcal{W}_{jk}\}_{1 \leq j < k \leq n}$ are all independent of each other. Similar to the phase synchronization problem, the observations are missing at random and carry additive Gaussian noise. The goal is to recover $Z_1^*, \ldots, Z_n^*$ from $\{\mathcal{X}_{jk}\}_{1 \leq j < k \leq n}$ and $\{A_{jk}\}_{1 \leq j < k \leq n}$.

The data matrix $\mathcal{X} \in \mathbb{R}^{nd \times nd}$ can be written equivalently in a way analogous to (2). Define $A_{jj} := 0$ and $A_{kj} := A_{jk}$ for all $1 \leq j < k \leq n$. Define $\mathcal{W} \in \mathbb{R}^{nd \times nd}$ such that $\mathcal{W}_{jj} := 0_{d \times d}$ and $\mathcal{W}_{kj} := \mathcal{W}_{jk}^{\mathrm{T}}$ for all $1 \leq j < k \leq n$. Then we have the expression:

$$\mathcal{X} = (A \otimes J_d) \circ (Z^* Z^{*\mathrm{T}} + \sigma \mathcal{W}) = (A \otimes J_d) \circ Z^* Z^{*\mathrm{T}} + \sigma (A \otimes J_d) \circ \mathcal{W}. \tag{21}$$

From (12), the data matrix $\mathcal{X}$ can be seen as a noisy version of $p Z^* Z^{*\mathrm{T}}$. Since the columns of $Z^*$ are orthogonal to each other, we have the eigendecomposition $p Z^* Z^{*\mathrm{T}} = np (Z^*/\sqrt{n})(Z^*/\sqrt{n})^{\mathrm{T}}$, where $Z^*/\sqrt{n} \in \mathcal{O}(nd, d)$. That is, $np$ is the only non-zero eigenvalue of $p Z^* Z^{*\mathrm{T}}$, with multiplicity $d$.

The definition of the spectral estimator $\widehat{Z}_1, \ldots, \widehat{Z}_n$ is given in (13). The mapping $\mathcal{P} : \mathbb{R}^{d \times d} \rightarrow \mathcal{O}(d)$ comes from the polar decomposition and is defined as follows. Any full-rank matrix $B \in \mathbb{R}^{d \times d}$ admits a singular value decomposition (SVD) $B = M D V^{\mathrm{T}}$ with $M, V \in \mathcal{O}(d)$ and $D$ diagonal. Its polar decomposition is then $B = (M V^{\mathrm{T}})(V D V^{\mathrm{T}})$, and $\mathcal{P}(B) := M V^{\mathrm{T}}$ is defined as its first factor.
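A minimal sketch of the map $\mathcal{P}$ via the SVD; the helper name `polar_factor` is ours. A standard fact worth noting is that $\mathcal{P}(B)$ is the orthogonal matrix closest to $B$ in Frobenius norm.

```python
import numpy as np

def polar_factor(B):
    """The map P from the polar decomposition: if B = M D V^T (SVD) is
    full-rank, return the orthogonal factor P(B) = M V^T."""
    M, _, Vt = np.linalg.svd(B)
    return M @ Vt
```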

Recall that $\widecheck{u}$ is the leading eigenvector of $A$ and that the population eigenspace $U^*$ is defined in (14). That is, $U^* \in \mathbb{R}^{nd \times d}$ and its $j$th submatrix is $U^*_j = \widecheck{u}_j Z^*_j \in \mathbb{R}^{d \times d}$ for each $j \in [n]$. Following the proof of Lemma 1, we can show that $U^*$ is the leading eigenspace of $(A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}$:

Lemma 3.

Denote $\lambda^*_1 \geq \lambda^*_2 \geq \ldots \geq \lambda^*_{nd}$ as the eigenvalues of $(A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}$. Then $\lambda^*_1 = \lambda^*_2 = \ldots = \lambda^*_d$, all equal to the leading eigenvalue of $A$. In addition, $\lambda^*_{d+1}$ is equal to the second largest eigenvalue of $A$. Furthermore, $U^*$ is the eigenspace of $(A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}$ corresponding to $\lambda^*_1$, i.e.,

$$((A \otimes J_d) \circ Z^* Z^{*\mathrm{T}}) U^* = \lambda^*_1 U^*.$$

Following the proof of Proposition 1, and in particular using (16), we can further establish the exact recovery of $\widehat{Z}$, up to an orthogonal matrix, in the no-additive-noise case.

Proposition 3.

Consider the no-additive-noise case where $\sigma = 0$. There exists some constant $C_1 > 0$ such that if $\frac{np}{\log n} > C_1$, we have $\ell^{\text{od}}(\widehat{Z}, Z^*) = 0$ with probability at least $1 - 7n^{-10}$.

Similar to phase synchronization, we can study the first-order approximation of the eigenspace $U$. Denote $\Lambda := \text{diag}(\lambda_1, \ldots, \lambda_d) \in \mathbb{R}^{d \times d}$ as the diagonal matrix of the $d$ largest eigenvalues of $\mathcal{X}$. Then $U$ can be expressed as $U = \mathcal{X} U \Lambda^{-1}$. Define

$$\widetilde{U} := \mathop{\rm argmin}_{U' \in \mathcal{O}(nd, d)} \left\|U' - \mathcal{X} U^*\right\|_{\rm F}^2. \tag{22}$$

Then $\widetilde{U}$ is the projection of $\mathcal{X} U^*$ onto $\mathcal{O}(nd, d)$. This is similar to the definition of $\widetilde{u}$ in (9) for phase synchronization, where $\widetilde{u}$ is the projection of $X u^*$ onto the unit sphere. As a result, $\widetilde{U}$ can be regarded as the first-order approximation of $U$.

The following lemma provides an upper bound on the distance between the leading eigenspace and its first-order approximation for two arbitrary symmetric matrices. It extends Lemma 2, which only concerns the perturbation of a leading eigenvector. The proof of Lemma 4 follows that of Lemma 2 but is more involved, as it needs to deal with matrix multiplication, which is not commutative.

Lemma 4.

Consider two symmetric matrices $Y, Y^* \in \mathbb{R}^{n \times n}$. Let $\mu^*_1 \geq \mu^*_2 \geq \ldots \geq \mu^*_n$ be the eigenvalues of $Y^*$. Let $V^* \in \mathbb{R}^{n \times d}$ (resp. $V$) be the leading eigenspace of $Y^*$ (resp. $Y$) corresponding to its $d$ largest eigenvalues. Define $\widetilde{V} := \mathop{\rm argmin}_{V' \in \mathcal{O}(n, d)} \|V' - Y V^*\|_{\rm F}^2$. If $\|Y - Y^*\| \leq \min\{\mu_d^* - \mu_{d+1}^*, \mu_d^*\}/4$, we have

$$\inf_{O \in \mathcal{O}(d)} \left\|V - \widetilde{V} O\right\| \leq \frac{16\sqrt{2}}{3(\mu_d^* - \mu_{d+1}^*)\mu_d^*} \left(\frac{2\mu_1^*}{3(\mu_d^* - \mu_{d+1}^*)} + 1\right) \|Y - Y^*\|^2 + \frac{8\sqrt{2}}{3(\mu_d^* - \mu_{d+1}^*)\mu_d^*} \left(\frac{4\mu_1^*(\mu_1^* - \mu_d^*)}{\mu_d^* - \mu_{d+1}^*} + 2(\mu_1^* - \mu_d^*) + \max\{|\mu^*_{d+1}|, |\mu^*_n|\}\right) \|Y - Y^*\|.$$

Lemma 4 includes Lemma 2 as a special case when $d = 1$. For $d > 1$, if $\mu_1^* = \mu_d^*$, i.e., the largest $d$ eigenvalues of $Y^*$ are all equal, the upper bound in Lemma 4 simplifies to

$$\inf_{O \in \mathcal{O}(d)} \left\|V - \widetilde{V} O\right\| \lesssim \frac{1}{\mu_d^* - \mu_{d+1}^*} \left(\left(\frac{1}{\mu_d^* - \mu_{d+1}^*} + \frac{1}{\mu_d^*}\right) \|Y - Y^*\|^2 + \frac{\max\{|\mu^*_{d+1}|, |\mu^*_n|\}}{\mu_d^*} \|Y - Y^*\|\right),$$

which is similar in form to the upper bound in Lemma 2. This simplified bound can be used in the $\mathcal{O}(d)$ synchronization problem, since $\lambda_1^*$ is shown to be equal to $\lambda_d^*$ in Lemma 3. A direct application of it leads to the following proposition on the perturbation between $U$ and $\widetilde{U}$.

Proposition 4.

Assume $2 \leq d \leq C_0$ for some constant $C_0 > 0$. There exist constants $C_1, C_2, C_3 > 0$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\inf_{O \in \mathcal{O}(d)} \left\|U - \widetilde{U} O\right\| \leq C_3 \frac{\sigma^2 d + \sigma\sqrt{d}}{np},$$

with probability at least $1 - 6n^{-10}$.

When $d = 1$, Proposition 4 reduces to Proposition 2. With Proposition 4, we can carry out a sharp $\ell_2$ analysis of the performance of the spectral estimator $\widehat{Z}$ using $\widetilde{U}$. The loss function is defined analogously to (5) as

$$\ell^{\text{od}}(\widehat{Z}, Z^*) := \min_{O \in \mathcal{O}(d)} \frac{1}{n} \sum_{j=1}^n \left\|\widehat{Z}_j - Z_j^* O\right\|_{\rm F}^2.$$
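The minimization over $O \in \mathcal{O}(d)$ is an orthogonal Procrustes problem with a closed-form solution; a sketch using the `polar_factor` helper above, where `Z_hat` and `Z_star` are $nd \times d$ arrays stacking the blocks $\widehat{Z}_j$ and $Z^*_j$:

```python
def loss_od(Z_hat, Z_star, d):
    """Loss l^od: the optimal O is the polar factor of sum_j Z*_j^T Zhat_j."""
    n = Z_hat.shape[0] // d
    O = polar_factor(Z_star.T @ Z_hat)         # solves the Procrustes problem
    return np.linalg.norm(Z_hat - Z_star @ O) ** 2 / n
```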

With this definition, we have the following theorem, which parallels Theorem 3. Its asymptotic version is given in Theorem 2. The proof of Theorem 4 follows that of Theorem 3 but is more complicated due to the presence of the mapping $\mathcal{P}$ in the definition of the spectral method. To prove Theorem 4, note that for each $j \in [n]$, $\|\widehat{Z}_j - Z^*_j\|_{\rm F} = \|\mathcal{P}(U_j) - Z^*_j\|_{\rm F} = \|\mathcal{P}(Z_j^{*\mathrm{T}} U_j) - I_d\|_{\rm F}$, where $Z_j^{*\mathrm{T}} U_j$ can be approximated by $Z_j^{*\mathrm{T}} \widetilde{U}_j$ according to Proposition 4. The term $Z_j^{*\mathrm{T}} \widetilde{U}_j$ can be further expanded using (21) and Lemma 3, leading to $\sum_{k \neq j} A_{jk} Z_j^{*\mathrm{T}} \mathcal{W}_{jk} Z^*_k$ plus several approximation error terms. A careful analysis of $\sum_{k \neq j} A_{jk} Z_j^{*\mathrm{T}} \mathcal{W}_{jk} Z^*_k$ eventually leads to the minimax risk $d(d-1)\sigma^2/(2np)$, and all the other error terms turn out to be negligible.

Theorem 4.

Assume $2 \leq d \leq C_0$ for some constant $C_0 > 0$. There exist constants $C_1, C_2, C_3$ such that if $\frac{np}{\log n} > C_1$ and $\frac{np}{\sigma^2} > C_2$, we have

$$\ell^{\text{od}}(\widehat{Z}, Z^*) \leq \left(1 + C_3\left(\left(\frac{\sigma^2}{np}\right)^{\frac{1}{4}} + \sqrt{\frac{\log n}{np}} + \frac{1}{\log(np)}\right)\right) \frac{d(d-1)\sigma^2}{2np}$$

with probability at least $1 - n^{-9} - \exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^2}\right)^{\frac{1}{4}}\right)$.

We can compare the upper bound in Theorem 4 with existing results for $\mathcal{O}(d)$ synchronization. [20] derived an upper bound for the spectral method: $\ell^{\text{od}}(\widehat{Z}, Z^*) \lesssim d^4(1 + \sigma^2 d)/(np)$ with high probability. In comparison, our upper bound has the smaller factor $d(d-1)/2$ in front of $\sigma^2/(np)$. In addition, it does not have the $d^4/(np)$ error term. The paper [20] also established the minimax lower bound: when $2 \leq d \leq C_0$, there exist constants $C_4, C_5 > 0$ such that if $\frac{np}{\sigma^2} > C_4$, we have

$$\inf_{Z \in \mathcal{O}(d)^n} \sup_{Z^* \in \mathcal{O}(d)^n} \mathbb{E}\,\ell^{\text{od}}(Z, Z^*) \geq \left(1 - C_5\left(\frac{1}{n} + \frac{\sigma^2}{np}\right)\right) \frac{d(d-1)\sigma^2}{2np}.$$

Compared to the lower bound, the spectral estimator $\widehat{Z}$ is exact minimax optimal, as it achieves the correct rate with the correct constant $d(d-1)/2$ in front of the optimal rate $\sigma^2/(np)$.

4 Discussions

4.1 Comparison of Spectral Method and Other Methods

In synchronization problems, the spectral method offers computational advantages over alternative methods such as the MLE, SDP, and GPM. According to Theorem 1, the spectral method attains statistical optimality in the limit $\frac{np}{\sigma^2} \rightarrow \infty$, achieving the minimum possible risk. The performance of the spectral method in scenarios where $\frac{np}{\sigma^2}$ does not approach infinity, however, remains less understood.

Previous studies [22, 24] have explored the PCA method in Bayesian settings for synchronization problems with $p = 1$. Unlike the spectral method as defined in (4), PCA does not involve entrywise normalization but instead scales the leading eigenvector $u$ to minimize the mean squared error (MSE). These studies offer a comprehensive asymptotic analysis of the MSE of PCA and that of the Bayes-optimal estimator, demonstrating both methods' ability to achieve substantial accuracy when $\sigma^2$ is below a specific threshold. However, PCA tends to exhibit a higher MSE than the Bayes-optimal estimator. Furthermore, [22] indicates that the MSE of SDP falls between those of PCA and the Bayes-optimal estimator, leaning more toward the latter.

While Theorem 1 addresses the regime where $\frac{np}{\sigma^2} \rightarrow \infty$, Theorem 3 establishes an upper bound in scenarios where $\frac{np}{\sigma^2}$ merely exceeds a certain constant. This suggests a complex interplay between the performance of the spectral method and the ratio $\frac{\sigma^2}{np}$ in the constant-$\frac{np}{\sigma^2}$ regime. To better understand this relationship, we conducted numerical experiments using the spectral method, GPM, and SDP under various $\sigma^2$ levels. The GPM, initialized with the spectral estimator $\widehat{z}^{(0)}_{\text{GPM}} := \widehat{z}$, iteratively updates $\widehat{z}^{(t)}_{\text{GPM}} := f(X \widehat{z}^{(t-1)}_{\text{GPM}})$ for $t \geq 1$, where $f : \mathbb{C}^n \rightarrow \mathbb{C}_1^n$ is the entrywise normalization function defined by $[f(x)]_i := x_i/|x_i| \mathbb{I}\{x_i \neq 0\} + \mathbb{I}\{x_i = 0\}$ for any $x \in \mathbb{C}^n$. The SDP solves the convex program $\max_{Z \in \mathbb{C}^{n \times n} : Z = Z^{\mathrm{H}}, \text{diag}(Z) = I_n, Z \succeq 0} \text{Tr}(XZ)$ over complex positive-semidefinite Hermitian matrices with unit diagonal entries and can be initialized using the spectral method. We assessed their performance using the normalized squared $\ell_2$ loss (5).
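For reference, a minimal sketch of the GPM iteration just described, reusing `spectral_estimator` from Section 1 (the iteration count is an arbitrary choice):

```python
def gpm(X, z0, iters=50):
    """Generalized power method: z^(t) = f(X z^(t-1)), with f the entrywise
    normalization used in (4)."""
    z = z0.copy()
    for _ in range(iters):
        y = X @ z
        mag = np.abs(y)
        z = np.where(mag > 0, y / np.where(mag > 0, mag, 1.0), 1.0 + 0j)
    return z

z_gpm = gpm(X, spectral_estimator(X))
```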

Figure 1: Numerical results for the spectral method, GPM, and SDP in phase synchronization, with $n = 100$, $p = 0.5$, and $\sigma^2$ varying within $[0, 20]$. Left: error comparison measured by the normalized squared $\ell_2$ loss. Right: comparison of the high-order term in their errors.

Figure 1 summarizes the comparative performance of these methods. For small $\sigma^2$, the error rates of all methods approximate $\frac{\sigma^2}{2np}$. The left panel of the figure shows that as $\sigma^2$ increases, the error rates rise more steeply than $\frac{\sigma^2}{2np}$. As $\sigma^2$ continues to increase, the spectral method exhibits higher error rates, as expected, since the other two methods use the spectral method for initialization and enhance it through more complex procedures. For deeper insight into the numerical performance differences, we compare the high-order terms in their errors. Specifically, the normalized squared $\ell_2$ loss of each method can be expressed as $(1 + \delta)\frac{\sigma^2}{2np}$, where $\delta$ represents the high-order term. The right panel of Figure 1 compares $\delta$ for the three methods. It reveals that even at small $\sigma^2$ values, the spectral method's performance diverges from those of the other methods. This suggests that while $\delta$ diminishes to $0$ for all three methods as $\sigma^2$ decreases (thus achieving exact minimax optimality), the $\delta$ of the spectral method diminishes more slowly than those of the other two methods.

Deriving explicit expressions for these error rates would be insightful, yet it falls outside the scope of this paper and presents an avenue for future research.

4.2 Condition on pp

In the phase synchronization problem (1), observations are missing at random, forming an Erdös-Rényi random graph $A$ with edge probability $p$. The value of $p$ cannot be excessively small, as this could result in $A$ being disconnected, making accurate estimation of $z^*$ up to a global phase impossible. Theorem 1 assumes $\frac{np}{\log n} \rightarrow \infty$ to establish the exact minimax optimality of the spectral method. A less stringent condition, where $\frac{np}{\log n}$ merely exceeds a certain constant, is considered in Theorem 3. However, it is known that $A$ is connected with high probability when $\frac{np}{\log n} > 1 + \epsilon$ for any constant $\epsilon > 0$. This raises the question of how the spectral method performs when $\frac{np}{\log n}$ is a small constant.

Our analysis requires \frac{np}{\log n} to exceed a certain constant for several technical reasons. This condition ensures that the desired bounds hold for critical quantities such as \left\|{A-\mathbb{E}A}\right\| and \left\|{A\circ W}\right\|, which are essential for the \ell_{\infty} analysis in Proposition 1 and the \ell_{2} analysis of the first-order approximation in Proposition 2. Moreover, the proof of Theorem 3 leverages the \ell_{\infty} results from Proposition 1, leading to the \sqrt{\frac{\log n}{np}} factor in the theorem's upper bound; this factor requires \frac{np}{\log n} to approach infinity for the upper bound to asymptotically match the exact minimax risk. Obtaining precise bounds on the performance of the spectral method when \frac{np}{\log n} is a small constant would require going beyond our current analytic framework, a task we leave for future research.

4.3 Other Low-rank Problems

The synchronization problems investigated in this manuscript are part of a broader category of problems characterized by low-rank matrix structures disrupted by additive noise and incomplete data. The methodologies developed herein are applicable to a variety of related problems, such as matrix completion, principal component analysis, factor models, mixture models, and ranking from pairwise comparison data. A key observation is that many of these problems involve multiple sources of randomness, such as those arising from missing data and from additive noise. An effective approach, as demonstrated in this study, is to isolate these sources and evaluate their individual contributions to the overall estimation error. This strategy is exemplified in our analysis of synchronization problems, where we introduce a novel population eigenvector and eigenspace. Furthermore, Lemma 2 and Lemma 4 offer a general framework for the perturbation analysis of eigenvectors and eigenspaces.

On the other hand, synchronization problems are special in that their leading eigenvector or eigenspace is spread out. In the literature [12], the coherence of a unit eigenvector u with coordinates u_{1},\ldots,u_{n} is measured through n\max_{i\in[n]}|u_{i}|^{2}, which equals 1 for a perfectly delocalized vector and n for a fully localized one. In phase synchronization, the leading eigenvector of \mathbb{E}X in (3) has coordinates of uniformly equal magnitude 1/\sqrt{n}, so it is perfectly delocalized and its coherence is the smallest possible. By contrast, in many low-rank problems the eigenvectors are more coherent, which naturally factors into the theoretical analysis. Therefore, when extending the concepts and methodologies from this paper to other scenarios, it is crucial to track eigenvector coherence for a more precise and insightful analysis.
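As a small illustration, the following hedged snippet computes the coherence under the normalization used above (1 for a perfectly delocalized unit vector, n for a fully localized one); the helper name is ours:

import numpy as np

def coherence(u):
    # n * max_i |u_i|^2 for a unit-norm vector u
    u = u / np.linalg.norm(u)
    return len(u) * np.max(np.abs(u) ** 2)

n = 100
print(coherence(np.ones(n) / np.sqrt(n)))  # 1.0: delocalized, as in synchronization
print(coherence(np.eye(n)[:, 0]))          # 100.0: fully localized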

5 Proofs for Phase Synchronization

5.1 Proof of Lemma 2

We first present a variant of the Davis-Kahan Theorem [14] and an inequality relating \inf_{b\in\mathbb{C}_{1}}\left\|{x-yb}\right\| to \left\|{(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y}\right\|; both will be used in the proof of Lemma 2.

Lemma 5.

Let X,X~d×dX,\widetilde{X}\in\mathbb{C}^{d\times d} be two Hermitian matrices. Let λ1λ2λd\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{d} be the eigenvalues of XX. Consider any r[d]r\in[d]. Let Ud×rU\in\mathbb{C}^{d\times r} (resp. U~\widetilde{U}) be the eigenspace of XX (resp. X~\widetilde{X}) that includes its leading rr eigenvectors. Under the assumption that XX~<(λrλr+1)/4\|{X-\widetilde{X}}\|<(\lambda_{r}-\lambda_{r+1})/4, we have

(IUUH)U~4XX~3(λrλr+1).\displaystyle\left\|{(I-UU^{\mathrm{\scriptscriptstyle H}})\widetilde{U}}\right\|\leq\frac{4\left\|{X-\widetilde{X}}\right\|}{3(\lambda_{r}-\lambda_{r+1})}.
Lemma 6.

For any unit vectors x,ydx,y\in\mathbb{C}^{d}, we have infb1xyb2(IdxxH)y\inf_{b\in\mathbb{C}_{1}}\left\|{x-yb}\right\|\leq\sqrt{2}\left\|{(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y}\right\|.
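The inequality in Lemma 6 is easy to spot-check numerically; a hedged sketch with random complex unit vectors (names ours):

import numpy as np

rng = np.random.default_rng(6)
d = 20
x = rng.normal(size=d) + 1j * rng.normal(size=d)
x /= np.linalg.norm(x)
y = rng.normal(size=d) + 1j * rng.normal(size=d)
y /= np.linalg.norm(y)

s = np.vdot(x, y)                              # x^H y
b = np.conj(s) / abs(s)                        # minimizer of ||x - y b|| over C_1
lhs = np.linalg.norm(x - y * b)
rhs = np.sqrt(2) * np.linalg.norm(y - x * s)   # sqrt(2) * ||(I_d - x x^H) y||
print(lhs <= rhs)                              # True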

Proof of Lemma 2.

Denote μ1μn\mu_{1}\geq\ldots\geq\mu_{n} as the eigenvalues of YY. We first give some inequalities for the eigenvalues and Yv\|Yv^{*}\| that will be used later in the proof. By Weyl’s inequality, we have

max{|μ1μ1|,|μ2μ2|}YY.\displaystyle\max\left\{\left|\mu_{1}-\mu_{1}^{*}\right|,\left|\mu_{2}-\mu_{2}^{*}\right|\right\}\leq\left\|{Y-Y^{*}}\right\|.

Since YYmin{μ1μ2,μ1}/4\left\|{Y-Y^{*}}\right\|\leq\min\{\mu_{1}^{*}-\mu_{2}^{*},\mu_{1}^{*}\}/4 is assumed, we have

34μ1μ154μ1,μ1μ2μ1μ22,\displaystyle\frac{3}{4}\mu_{1}^{*}\leq\mu_{1}\leq\frac{5}{4}\mu_{1}^{*},\quad\mu_{1}-\mu_{2}\geq\frac{\mu_{1}^{*}-\mu_{2}^{*}}{2}, (23)

and

|μ1μ11|=|μ1μ1|μ1YYμ1YY4YY3μ1.\displaystyle\left|\frac{\mu_{1}^{*}}{\mu_{1}}-1\right|=\frac{\left|\mu_{1}^{*}-\mu_{1}\right|}{\mu_{1}}\leq\frac{\left\|{Y-Y^{*}}\right\|}{\mu_{1}^{*}-\left\|{Y-Y^{*}}\right\|}\leq\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{1}^{*}}. (24)

Regarding Yv\|Yv^{*}\|, using the decomposition

Y=Y+(YY)=μ1vvH+(Yμ1vvH)+(YY),\displaystyle Y=Y^{*}+(Y-Y^{*})=\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}+(Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}})+(Y-Y^{*}),

and its consequence

Yv=Yv+(YY)v=μ1v+(YY)v,\displaystyle Yv^{*}=Y^{*}v^{*}+(Y-Y^{*})v^{*}=\mu_{1}^{*}v^{*}+(Y-Y^{*})v^{*}, (25)

we have

Yvμ1YY3μ14.\displaystyle\left\|{Yv^{*}}\right\|\geq\mu_{1}^{*}-\left\|{Y-Y^{*}}\right\|\geq\frac{3\mu_{1}^{*}}{4}. (26)

We define vwidecheckn\widecheck{v}\in\mathbb{C}^{n} and v~𝕊n\widetilde{v}\in\mathbb{S}_{n} as

vwidecheck\displaystyle\widecheck{v} :=Yvμ1,\displaystyle:=\frac{Yv^{*}}{\mu_{1}}, (27)
v~\displaystyle\widetilde{v} :=YvYv.\displaystyle:=\frac{Yv^{*}}{\left\|{Yv^{*}}\right\|}. (28)

Then v~\widetilde{v} is the first-order approximation of vv, written equivalently as v~=vwidecheck/vwidecheck\widetilde{v}={\widecheck{v}}/{\left\|{\widecheck{v}}\right\|}. Note that with Yv>0\left\|{Yv^{*}}\right\|>0 as shown in (26), v~\widetilde{v} is well-defined.
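The accuracy of this first-order approximation is also easy to see numerically. The following hedged toy check (a real symmetric rank-one Y^{*} plus a small symmetric perturbation, so the global phase reduces to a sign; all names ours) shows that \widetilde{v} is far closer to v than v^{*} is:

import numpy as np

rng = np.random.default_rng(1)
n = 200
vstar = np.ones(n) / np.sqrt(n)                 # population eigenvector v*
Ystar = n * np.outer(vstar, vstar)              # rank-one Y* with mu_1* = n
E = rng.normal(size=(n, n))
E = (E + E.T) / np.sqrt(2 * n)                  # small symmetric noise
Y = Ystar + E

v = np.linalg.eigh(Y)[1][:, -1]                 # leading eigenvector v of Y
vtilde = Y @ vstar / np.linalg.norm(Y @ vstar)  # first-order approximation

def dist(a, b):
    # inf over the global sign, the real analogue of b in C_1
    return min(np.linalg.norm(a - b), np.linalg.norm(a + b))

print(dist(v, vstar))    # first-order error, roughly ||Y - Y*|| / mu_1*
print(dist(v, vtilde))   # much smaller, second order in ||Y - Y*||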

Since vv is the eigenvector of YY corresponding to μ1\mu_{1}, we have

μ1v\displaystyle\mu_{1}v =Yv,\displaystyle=Yv,
μ1v~\displaystyle\mu_{1}\widetilde{v} =Yv/vwidecheck.\displaystyle=Yv^{*}/\left\|{\widecheck{v}}\right\|.

Subtracting the second equation from the first one, we have

μ1(vv~)=Y(vvvwidecheck)=Y(vv~)+Y(v~vvwidecheck)=Y(vv~)+1vwidecheckY(vwidecheckv).\displaystyle\mu_{1}(v-\widetilde{v})=Y\left(v-\frac{v^{*}}{\left\|{\widecheck{v}}\right\|}\right)=Y(v-\widetilde{v})+Y\left(\widetilde{v}-\frac{v^{*}}{\left\|{\widecheck{v}}\right\|}\right)=Y(v-\widetilde{v})+\frac{1}{\left\|{\widecheck{v}}\right\|}Y(\widecheck{v}-v^{*}).

After rearranging, we have

vwidecheck(μ1InY)(vv~)=Y(vwidecheckv).\displaystyle\left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(v-\widetilde{v})=Y(\widecheck{v}-v^{*}). (29)

Since (\mu_{1}I_{n}-Y)v=0 and \mu_{1}I_{n}-Y is Hermitian, the range of \mu_{1}I_{n}-Y is orthogonal to v. As a result, \left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(v-\widetilde{v})=-\left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}. In addition, since the left-hand side of (29) is orthogonal to v, its right-hand side must be orthogonal to v as well. That is, Y(\widecheck{v}-v^{*})=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*}). Then (29) leads to

\displaystyle-\left\|{\widecheck{v}}\right\|(\mu_{1}I_{n}-Y)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*}). (30)

Observe that 0μ1μ2μ1μn0\leq\mu_{1}-\mu_{2}\leq\ldots\leq\mu_{1}-\mu_{n} are the eigenvalues of μ1InY\mu_{1}I_{n}-Y. In particular, the eigenvector corresponding to 0 is vv. Since (InvvH)v~(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v} is orthogonal to vv, from (30) we have

vwidecheck(μ1μ2)(InvvH)v~vwidecheck(μ1InY)(InvvH)v~=(InvvH)Y(vwidecheckv).\displaystyle\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|\leq\left\|{\widecheck{v}}\right\|\left\|{(\mu_{1}I_{n}-Y)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|=\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|.

Hence,

(InvvH)v~1vwidecheck(μ1μ2)(InvvH)Y(vwidecheckv).\displaystyle\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|\leq\frac{1}{\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|. (31)

From Lemma 6, we have infb1vv~b2(InvvH)v~\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\|\leq\sqrt{2}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\widetilde{v}}\right\|. With this, (31) leads to

infb1vv~b2vwidecheck(μ1μ2)(InvvH)Y(vwidecheckv).\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\|\leq\frac{\sqrt{2}}{\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|. (32)

In the following, we are going to analyze (InvvH)Y(vwidecheckv){(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}. We have

(InvvH)Y(vwidecheckv)\displaystyle(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})
=(InvvH)Y(Yvμ1v)\displaystyle=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y\left(\frac{Yv^{*}}{\mu_{1}}-v^{*}\right)
=(InvvH)Y(μ1μ11)v+1μ1(InvvH)Y(YY)v\displaystyle=(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)v^{*}+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y\left(Y-Y^{*}\right)v^{*}
=(μ1μ11)(InvvH)μ1v+(μ1μ11)(InvvH)(YY)v\displaystyle=\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\mu^{*}_{1}v^{*}+\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})v^{*}
+1μ1(InvvH)μ1vvH(YY)v+1μ1(InvvH)(Yμ1vvH)(YY)v\displaystyle\quad+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}\left(Y-Y^{*}\right)v^{*}+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}})\left(Y-Y^{*}\right)v^{*}
+1μ1(InvvH)(YY)(YY)v\displaystyle\quad+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})\left(Y-Y^{*}\right)v^{*}
=((μ1μ11)+1μ1vH(YY)v)μ1(InvvH)v\displaystyle=\left(\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)+\frac{1}{\mu_{1}}v^{*{\mathrm{\scriptscriptstyle H}}}\left(Y-Y^{*}\right)v^{*}\right)\mu^{*}_{1}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}
+(μ1μ11)(InvvH)(YY)v+1μ1(InvvH)(Yμ1vvH)(YY)v\displaystyle\quad+\left(\frac{\mu^{*}_{1}}{\mu_{1}}-1\right)(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})v^{*}+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}})\left(Y-Y^{*}\right)v^{*}
+1μ1(InvvH)(YY)(YY)v.\displaystyle\quad+\frac{1}{\mu_{1}}(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})(Y-Y^{*})\left(Y-Y^{*}\right)v^{*}.

Hence,

(InvvH)Y(vwidecheckv)\displaystyle\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})Y(\widecheck{v}-v^{*})}\right\|
(|μ1μ11|+|vH(YY)v|μ1)μ1(InvvH)v+|μ1μ11|YY\displaystyle\leq\left(\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|+\frac{\left|v^{*{\mathrm{\scriptscriptstyle H}}}\left(Y-Y^{*}\right)v^{*}\right|}{\mu_{1}}\right)\mu_{1}^{*}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|+\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|\left\|{Y-Y^{*}}\right\|
+Yμ1vvHYYμ1+YY2μ1\displaystyle\quad+\frac{\left\|{Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}}\right\|\left\|{Y-Y^{*}}\right\|}{\mu_{1}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\mu_{1}}
(|μ1μ11|+YYμ1)μ1(InvvH)v+|μ1μ11|YY\displaystyle\leq\left(\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|+\frac{\left\|{Y-Y^{*}}\right\|}{\mu_{1}}\right)\mu_{1}^{*}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|+\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|\left\|{Y-Y^{*}}\right\|
\displaystyle\quad+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}\left\|{Y-Y^{*}}\right\|}{\mu_{1}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\mu_{1}},

where we use the fact that InvvH=1\left\|{I_{n}-vv^{\mathrm{\scriptscriptstyle H}}}\right\|=1 and Yμ1vvH=max{|μ2|,|μn|}\left\|{Y^{*}-\mu_{1}^{*}v^{*}v^{*{\mathrm{\scriptscriptstyle H}}}}\right\|=\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}. Then together with (32), we have

infb1vv~b\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\| 2vwidecheck(μ1μ2)((|μ1μ11|+YYμ1)μ1(InvvH)v\displaystyle\leq\frac{\sqrt{2}}{\left\|{\widecheck{v}}\right\|(\mu_{1}-\mu_{2})}\Bigg{(}\left(\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|+\frac{\left\|{Y-Y^{*}}\right\|}{\mu_{1}}\right)\mu_{1}^{*}\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|
+|μ1μ11|YY+max{|μ2|,|μn|}YYμ1+YY2μ1).\displaystyle\quad+\left|\frac{\mu^{*}_{1}}{\mu_{1}}-1\right|\left\|{Y-Y^{*}}\right\|+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}\left\|{Y-Y^{*}}\right\|}{\mu_{1}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\mu_{1}}\Bigg{)}.

In the rest of the proof, we are going to simplify the display above. From (23) and (26), we have

vwidecheck=Yvμ135.\displaystyle\left\|{\widecheck{v}}\right\|=\frac{\left\|{Yv^{*}}\right\|}{\mu_{1}}\geq\frac{3}{5}.

Using Lemma 5 and the assumption YY(μ1μ2)/4\left\|{Y-Y^{*}}\right\|\leq(\mu_{1}^{*}-\mu_{2}^{*})/4, we have

(InvvH)v2YYμ1μ2.\displaystyle\left\|{(I_{n}-vv^{\mathrm{\scriptscriptstyle H}})v^{*}}\right\|\leq\frac{2\left\|{Y-Y^{*}}\right\|}{\mu_{1}^{*}-\mu_{2}^{*}}.

With the above results, together with (23) and (24), we have

infb1vv~b\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{v-\widetilde{v}b}\right\|
235μ1μ22((4YY3μ1+YY34μ1)μ12YYμ1μ2+4YY3μ1YY\displaystyle\leq\frac{\sqrt{2}}{\frac{3}{5}\frac{\mu_{1}^{*}-\mu_{2}^{*}}{2}}\Bigg{(}\left(\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{1}^{*}}+\frac{\left\|{Y-Y^{*}}\right\|}{\frac{3}{4}\mu_{1}^{*}}\right)\mu_{1}^{*}\frac{2\left\|{Y-Y^{*}}\right\|}{\mu_{1}^{*}-\mu_{2}^{*}}+\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{1}^{*}}\left\|{Y-Y^{*}}\right\|
+max{|μ2|,|μn|}YY34μ1+YY234μ1)\displaystyle\quad+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}\left\|{Y-Y^{*}}\right\|}{\frac{3}{4}\mu_{1}^{*}}+\frac{\left\|{Y-Y^{*}}\right\|^{2}}{\frac{3}{4}\mu_{1}^{*}}\Bigg{)}
=1023(μ1μ2)((163(μ1μ2)+83μ1)YY2+4max{|μ2|,|μn|}3μ1YY)\displaystyle=\frac{10\sqrt{2}}{3(\mu_{1}^{*}-\mu_{2}^{*})}\left(\left(\frac{16}{3(\mu_{1}^{*}-\mu_{2}^{*})}+\frac{8}{3\mu_{1}^{*}}\right)\left\|{Y-Y^{*}}\right\|^{2}+\frac{4\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}}{3\mu_{1}^{*}}\left\|{Y-Y^{*}}\right\|\right)
\displaystyle\leq\frac{40\sqrt{2}}{9(\mu_{1}^{*}-\mu_{2}^{*})}\left(\left(\frac{4}{\mu_{1}^{*}-\mu_{2}^{*}}+\frac{2}{\mu_{1}^{*}}\right)\left\|{Y-Y^{*}}\right\|^{2}+\frac{\max\{|\mu_{2}^{*}|,|\mu_{n}^{*}|\}}{\mu_{1}^{*}}\left\|{Y-Y^{*}}\right\|\right). ∎

5.2 Proofs of Lemma 1, Proposition 1, and Proposition 2

Proof of Lemma 1.

Denote λ\lambda^{\prime} as an eigenvalue of AA with its corresponding eigenvector uu^{\prime}. Then we have Au=λuAu^{\prime}=\lambda^{\prime}u^{\prime}. This can be equivalently written as

kjAjkuk=λuj,j[n].\displaystyle\sum_{k\neq j}A_{jk}u^{\prime}_{k}=\lambda^{\prime}u^{\prime}_{j},\forall j\in[n].

Multiplying by zj{z^{*}_{j}} on both sides, we have

kjAjkzjuk=kjAjkzjzk¯(zkuk)=λzjuj,j[n].\displaystyle\sum_{k\neq j}A_{jk}z^{*}_{j}u^{\prime}_{k}=\sum_{k\neq j}A_{jk}z^{*}_{j}\overline{z^{*}_{k}}(z^{*}_{k}u^{\prime}_{k})=\lambda^{\prime}z^{*}_{j}u^{\prime}_{j},\forall j\in[n].

That is, (AzzH)(zu)=λ(zu)(A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}})(z^{*}\circ u^{\prime})=\lambda^{\prime}(z^{*}\circ u^{\prime}). Hence, λ\lambda^{\prime} is an eigenvalue of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}} with the corresponding eigenvector zuz^{*}\circ u^{\prime}.

By the same argument, we can show each eigenvalue of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}} is also an eigenvalue of AA. As a result, since uwidecheck\widecheck{u} is the leading eigenvector of AA, zuwidecheckz^{*}\circ\widecheck{u} is the leading eigenvector of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}. ∎
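Lemma 1 can also be verified numerically, since A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}=\text{diag}(z^{*})A\,\text{diag}(z^{*})^{\mathrm{\scriptscriptstyle H}} is a unitary conjugation of A; a hedged sanity check (names ours):

import numpy as np

rng = np.random.default_rng(2)
n = 50
A = np.triu(rng.uniform(size=(n, n)) < 0.5, 1).astype(float)
A = A + A.T                                    # adjacency matrix with zero diagonal
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # arbitrary phase vector z*

M = A * np.outer(z, z.conj())                  # A o z* z*^H, Hermitian
print(np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(M))))  # ~ 1e-14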

Before proving Proposition 1 and Proposition 2, we first state some technical lemmas related to AA and WW.

Lemma 7.

The largest eigenvalue of 𝔼A\mathbb{E}A is (n1)p(n-1)p and the corresponding eigenvector is 𝟙n/n\mathds{1}_{n}/\sqrt{n}. The remaining eigenvalues of 𝔼A\mathbb{E}A are p-p with multiplicity n1n-1. Denote λλ2λn\lambda^{\prime}\geq\lambda^{\prime}_{2}\geq\ldots\geq\lambda^{\prime}_{n} as the eigenvalues of AA. We have

|λ(n1)p|,max2jn|λj+p|A𝔼A, and λλ2np2A𝔼A.\displaystyle|\lambda^{\prime}-(n-1)p|,\max_{2\leq j\leq n}|\lambda^{\prime}_{j}+p|\leq\left\|{A-\mathbb{E}A}\right\|,\text{ and }\lambda^{\prime}-\lambda^{\prime}_{2}\geq np-2\left\|{A-\mathbb{E}A}\right\|. (33)
Lemma 8.

There exist constants C1,C2>0C_{1},C_{2}>0 such that if nplogn>C1\frac{np}{\log n}>C_{1}, then we have

A𝔼AC2np,\displaystyle\left\|{A-\mathbb{E}A}\right\|\leq C_{2}\sqrt{np},
AWC2np,\displaystyle\left\|{A\circ W}\right\|\leq C_{2}\sqrt{np},
j[n]|Im(kjAjkWjkzj¯zk)|2n2p2(1+C2lognn),\displaystyle\sum_{j\in[n]}\left|{\rm Im}\left({\sum_{k\neq j}A_{jk}W_{jk}\overline{z_{j}^{*}}z^{*}_{k}}\right)\right|^{2}\leq\frac{n^{2}p}{2}\left(1+C_{2}\sqrt{\frac{\log n}{n}}\right),

with probability at least 13n101-3n^{-10}.

The first part of Proposition 1 (i.e., (16)) can be proved using Theorem 2.1 of [1]. The statement of that theorem is complicated, as it handles perturbations of general eigenspaces, whereas we only need the perturbation of the leading eigenvector. For easier reference, we present a simpler version of the theorem below.

Lemma 9 (A simpler version of Theorem 2.1 of [1]).

Consider two symmetric matrices Y,Yn×nY,Y^{*}\in\mathbb{R}^{n\times n}. Let the eigenvalues of YY^{*} be μ1μ2μn\mu_{1}^{*}\geq\mu_{2}^{*}\geq\ldots\geq\mu_{n}^{*}. Define Δ:=min{μ1μ2,μ1}\Delta^{*}:=\min\{\mu^{*}_{1}-\mu^{*}_{2},\mu^{*}_{1}\} and κ:=max{|μ1|,|μn|}/Δ\kappa:=\max\{|\mu^{*}_{1}|,|\mu^{*}_{n}|\}/\Delta^{*}. Let the leading eigenvector of YY (resp. YY^{*}) be vv (resp. vv^{*}). Assume the following conditions are satisfied for some γ0\gamma\geq 0 and some function ϕ:[0,+)[0,+)\phi:[0,+\infty)\rightarrow[0,+\infty):

  1. \left\|{Y^{*}}\right\|_{2\rightarrow\infty}\leq\gamma\Delta^{*}.

  2. For any m\in[n], \{Y_{jk}:j=m\text{ or }k=m\} are independent of \{Y_{jk}:j\neq m,k\neq m\}.

  3. 32\kappa\max\{\gamma,\phi(\gamma)\}\leq 1 and for some \delta_{0}\in(0,1), \mathbb{P}\left(\left\|{Y-Y^{*}}\right\|\leq\gamma\Delta^{*}\right)\geq 1-\delta_{0}.

  4. Suppose \phi(x) is continuous and non-decreasing in [0,+\infty) with \phi(0)=0, \phi(x)/x is non-increasing in [0,+\infty), and \delta_{1}\in(0,1). For any m\in[n] and w\in\mathbb{R}^{n},

     \mathbb{P}\left(\left|[Y-Y^{*}]_{m\cdot}w\right|\leq\Delta^{*}\left\|{w}\right\|_{\infty}\phi\left(\frac{\left\|{w}\right\|}{\sqrt{n}\left\|{w}\right\|_{\infty}}\right)\right)\geq 1-\frac{\delta_{1}}{n}.

Then with probability at least 1δ02δ11-\delta_{0}-2\delta_{1}, there exists some constant C>0C>0 and some b{1,1}b\in\{-1,1\} such that

vbYv/μ1\displaystyle\left\|{vb-Yv^{*}/\mu_{1}^{*}}\right\|_{\infty} C(κ(κ+ϕ(1))(γ+ϕ(γ))v+γY2/Δ).\displaystyle\leq C\left(\kappa(\kappa+\phi(1))(\gamma+\phi(\gamma))\left\|{v^{*}}\right\|_{\infty}+\gamma\left\|{Y^{*}}\right\|_{2\rightarrow\infty}/\Delta^{*}\right).

The following Lemma 10 provides two Bernstein-type concentration inequalities to be used in the proof of Proposition 1. The first one is the classical Bernstein inequality; see Section 2.8 of [6] for its proof. The second one is proved in Lemma 7 of [1].

Lemma 10.

Let B1,,BnB_{1},\ldots,B_{n} be real independent random variables such that maxj[n]|Bj|M\max_{j\in[n]}\left|B_{j}\right|\leq M for some M>0M>0. Then

(|j[n](Bj𝔼Bj)|t)2exp(12t2j[n]𝔼(Bj𝔼Bj)2+13Mt).\displaystyle\mathbb{P}\left(\left|\sum_{j\in[n]}(B_{j}-\mathbb{E}B_{j})\right|\geq t\right)\leq 2\exp\left(-\frac{\frac{1}{2}t^{2}}{\sum_{j\in[n]}\mathbb{E}(B_{j}-\mathbb{E}B_{j})^{2}+\frac{1}{3}Mt}\right).

Let wnw\in\mathbb{R}^{n} be a fixed vector and α0\alpha\geq 0. If {Bj}j[n]iidBernoulli(p)\{B_{j}\}_{j\in[n]}\stackrel{{\scriptstyle iid}}{{\sim}}\text{Bernoulli}(p), we have

(|j[n]wj(Bjp)|(2+α)np1log(nww)w)2exp(αnp).\displaystyle\mathbb{P}\left(\left|\sum_{j\in[n]}w_{j}(B_{j}-p)\right|\geq\frac{(2+\alpha)np}{1\vee\log\left(\frac{\sqrt{n}\left\|{w}\right\|_{\infty}}{\left\|{w}\right\|}\right)}\left\|{w}\right\|_{\infty}\right)\leq 2\exp(-\alpha np).
Proof of Proposition 1.

We use Lemma 9 to prove the first part of the proposition. Denote \mu_{1}^{*}\geq\mu_{2}^{*}\geq\ldots\geq\mu_{n}^{*} as the eigenvalues of \mathbb{E}A, and define \Delta^{*} and \kappa as in Lemma 9. From Lemma 7, we have \Delta^{*}=(n-1)p and \kappa=1, with \mathds{1}_{n}/\sqrt{n} being the leading eigenvector of \mathbb{E}A. Since \mathbb{E}A=pJ_{n}-pI_{n}, we have \|\mathbb{E}A\|_{2\rightarrow\infty}=\sqrt{n-1}\,p. By Lemma 8, there exist constants c_{1},c_{2}>1 such that if \frac{np}{\log n}>c_{1}, then \left\|{A-\mathbb{E}A}\right\|\leq c_{2}\sqrt{np} with probability at least 1-3n^{-10}. Define \gamma:=2c_{2}/\sqrt{np}, \delta_{0}:=2n^{-10}, and \phi(x):=3(1\vee\log(x^{-1}))^{-1}. The first assumption of Lemma 9 is then satisfied as long as c_{2}\geq 1. When \frac{np}{\log n} is greater than some sufficiently large constant, we have \phi(\gamma)\leq 8/\log(np), and the third assumption is satisfied. The second assumption holds as well, since the entries of A are jointly independent up to symmetry. For any m\in[n] and any w\in\mathbb{R}^{n}, since [A-\mathbb{E}A]_{m\cdot}w is a weighted sum of centered Bernoulli random variables, the second inequality of Lemma 10 can be applied to obtain

\displaystyle\mathbb{P}\left(\left|[A-\mathbb{E}A]_{m\cdot}w\right|>\Delta^{*}\|w\|_{\infty}\phi\left(\frac{\left\|{w}\right\|}{\sqrt{n}\left\|{w}\right\|_{\infty}}\right)\right)
\displaystyle\quad\leq\mathbb{P}\left(\left|[A-\mathbb{E}A]_{m\cdot}w\right|\geq\frac{2.5np}{1\vee\log\left(\frac{\sqrt{n}\left\|{w}\right\|_{\infty}}{\left\|{w}\right\|}\right)}\left\|{w}\right\|_{\infty}\right)\leq 2n^{-11},

when \frac{np}{\log n} is greater than some sufficiently large constant. Define \delta_{1}:=2n^{-10}. Then the last assumption of Lemma 9 is satisfied, and Lemma 9 leads to the conclusion that with probability at least 1-6n^{-10}, there exists some constant c_{3}>0 and some b\in\{-1,1\} such that

\displaystyle\left\|{\widecheck{u}b-\frac{1}{\mu_{1}^{*}\sqrt{n}}A\mathds{1}_{n}}\right\|_{\infty}\leq c_{3}\left(\kappa(\kappa+\phi(1))(\gamma+\phi(\gamma))\left\|{\frac{1}{\sqrt{n}}\mathds{1}_{n}}\right\|_{\infty}+\gamma\frac{\left\|{\mathbb{E}A}\right\|_{2\rightarrow\infty}}{\Delta^{*}}\right)
\displaystyle\quad\leq c_{3}\left((1+3)\left(\frac{2c_{2}}{\sqrt{np}}+\frac{8}{\log(np)}\right)\frac{1}{\sqrt{n}}+\frac{2c_{2}}{\sqrt{np}}\frac{\sqrt{n-1}\,p}{(n-1)p}\right)
\displaystyle\quad\leq\frac{c_{4}}{\log(np)}\frac{1}{\sqrt{n}},

for some constant c_{4}>0. Note that

1μ1nA𝟙n=1μ1n𝔼A𝟙n+1μ1n(A𝔼A)𝟙n=1n𝟙n+1(n1)pn(A𝔼A)𝟙n.\displaystyle\frac{1}{\mu_{1}^{*}\sqrt{n}}A\mathds{1}_{n}=\frac{1}{\mu_{1}^{*}\sqrt{n}}\mathbb{E}A\mathds{1}_{n}+\frac{1}{\mu_{1}^{*}\sqrt{n}}(A-\mathbb{E}A)\mathds{1}_{n}=\frac{1}{\sqrt{n}}\mathds{1}_{n}+\frac{1}{(n-1)p\sqrt{n}}(A-\mathbb{E}A)\mathds{1}_{n}.

Then we have

\displaystyle\left\|{\widecheck{u}b-\frac{1}{\sqrt{n}}\mathds{1}_{n}}\right\|_{\infty}\leq\frac{c_{4}}{\log(np)}\frac{1}{\sqrt{n}}+\frac{1}{(n-1)p}\left\|{\frac{1}{\sqrt{n}}(A-\mathbb{E}A)\mathds{1}_{n}}\right\|_{\infty}.

For any m\in[n], by the first inequality of Lemma 10, there exists some constant c_{5}>0 such that

\displaystyle\mathbb{P}\left(\left|[A-\mathbb{E}A]_{m\cdot}\mathds{1}_{n}\right|\geq c_{5}\sqrt{np\log n}\right)\leq 2\exp\left(-\frac{\frac{c_{5}^{2}}{2}np\log n}{(n-1)p(1-p)+\frac{c_{5}}{3}\sqrt{np\log n}}\right)
\displaystyle\quad\leq 2n^{-11}.

Together with a union bound, we have \mathbb{P}\left(\|(A-\mathbb{E}A)\mathds{1}_{n}\|_{\infty}\geq c_{5}\sqrt{np\log n}\right)\leq 2n^{-10}. Hence,

\displaystyle\left\|{\widecheck{u}b-\frac{1}{\sqrt{n}}\mathds{1}_{n}}\right\|_{\infty}\leq\frac{c_{4}}{\log(np)}\frac{1}{\sqrt{n}}+\frac{1}{(n-1)p}\frac{c_{5}\sqrt{np\log n}}{\sqrt{n}}\leq c_{6}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\frac{1}{\sqrt{n}},

for some constant c_{6}>0, with probability at least 1-8n^{-10}.

The second part of the proposition is an immediate consequence of the first part. If nplogn>max{C1,2C22}\frac{np}{\log n}>\max\left\{C_{1},2C_{2}^{2}\right\}, all the coordinates of uwidecheck\widecheck{u} have the same sign according to (16). From Lemma 1, we have u=uu=u^{*} as uu^{*} is the leading eigenvector of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}. If {uwidecheckj}j[n]\{\widecheck{u}_{j}\}_{j\in[n]} are all positive, we have

z^j=uj/|uj|=zjuwidecheckj/uwidecheckj=zj,\widehat{z}_{j}=u_{j}^{*}/|{u_{j}^{*}}|=z^{*}_{j}\widecheck{u}_{j}/\widecheck{u}_{j}=z^{*}_{j},

for each j[n]j\in[n]. That is, z^=z\widehat{z}=z^{*}. If {uwidecheckj}j[n]\{\widecheck{u}_{j}\}_{j\in[n]} are all negative, we then have z^=z\widehat{z}=-z^{*}. ∎

Proof of Proposition 2.

Recall \lambda^{*} is the largest eigenvalue of A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}. From Lemma 1, u^{*} is the corresponding eigenvector. Denote \lambda^{*}_{2}\geq\ldots\geq\lambda^{*}_{n} as its remaining eigenvalues. By Lemma 8, there exist constants c_{1},c_{2}>0 such that when \frac{np}{\log n}>c_{1}, we have \left\|{A-\mathbb{E}A}\right\|\leq c_{2}\sqrt{np} and \left\|{A\circ W}\right\|\leq c_{2}\sqrt{np} with probability at least 1-3n^{-10}. By Lemma 1 and Lemma 7, we have \lambda^{*}\geq(n-1)p-c_{2}\sqrt{np}, \max\{|\lambda^{*}_{2}|,|\lambda^{*}_{n}|\}\leq p+c_{2}\sqrt{np}, and \lambda^{*}-\lambda_{2}^{*}\geq np-2c_{2}\sqrt{np}. When \frac{np}{\log n} and \frac{np}{\sigma^{2}} are greater than some sufficiently large constant, we have 4\sigma\left\|{A\circ W}\right\|\leq np/2\leq\min\{\lambda^{*},\lambda^{*}-\lambda_{2}^{*}\}. Since X-A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}=\sigma A\circ W, a direct application of Lemma 2 leads to

infb1uu~b\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{u-\widetilde{u}b}\right\|
4029(λλ2)((4λλ2+2λ)σ2AW2+max{|λ2|,|λn|}σAWλ)\displaystyle\leq\frac{40\sqrt{2}}{9(\lambda^{*}-\lambda_{2}^{*})}\left(\left(\frac{4}{\lambda^{*}-\lambda_{2}^{*}}+\frac{2}{\lambda^{*}}\right)\sigma^{2}\left\|{A\circ W}\right\|^{2}+\frac{\max\{|\lambda^{*}_{2}|,|\lambda^{*}_{n}|\}\sigma\left\|{A\circ W}\right\|}{\lambda^{*}}\right)
4029np/2((4np/2+2np/2)c22σ2np+(p+c2np)c2σnpnp/2)\displaystyle\leq\frac{40\sqrt{2}}{9np/2}\left(\left(\frac{4}{np/2}+\frac{2}{np/2}\right)c_{2}^{2}\sigma^{2}np+\frac{(p+c_{2}\sqrt{np})c_{2}\sigma\sqrt{np}}{np/2}\right)
c3σ2+σnp,\displaystyle\leq c_{3}\frac{\sigma^{2}+\sigma}{np},

for some constant c3>0c_{3}>0. ∎

5.3 Proof of Theorem 3

We first state some technical lemmas that will be used in the proof of Theorem 3.

Lemma 11.

There exists some constant C1>0C_{1}>0 such that for any γ\gamma satisfying γ2npσ2C1\frac{\gamma^{2}np}{\sigma^{2}}\geq C_{1}, we have

j[n]𝕀{2σnp|kjAjkWjkzj¯zk|γ}4σ2γ2pexp(116γ2npσ2),\displaystyle\sum_{j\in[n]}{\mathbb{I}\left\{{\frac{2\sigma}{np}\left|{{\sum_{k\neq j}A_{jk}W_{jk}\overline{z_{j}^{*}}z^{*}_{k}}}\right|\geq\gamma}\right\}}\leq\frac{4\sigma^{2}}{\gamma^{2}p}\exp\left(-\frac{1}{16}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right),

holds with probability at least 1exp(132γ2npσ2)1-\exp\left(-\frac{1}{32}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right).

Lemma 12 (Lemma 10 and Lemma 11 of [18]).

For any xx\in\mathbb{C} such that Re(x)>0{\rm Re}(x)>0, |x|x|1||Im(x)Re(x)|\left|\frac{x}{|x|}-1\right|\leq\left|\frac{{\rm Im}(x)}{{\rm Re}(x)}\right|. For any x{0}x\in\mathbb{C}\setminus\{0\} and any y1y\in\mathbb{C}_{1}, we have |x|x|y|2|xy|\left|\frac{x}{\left|x\right|}-y\right|\leq 2\left|x-y\right|.
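Both elementary inequalities in Lemma 12 can be spot-checked numerically; a hedged sketch (names ours):

import numpy as np

rng = np.random.default_rng(3)
m = 10000
x = np.abs(rng.normal(size=m)) + 0.1 + 1j * rng.normal(size=m)    # Re(x) > 0
print(np.all(np.abs(x / np.abs(x) - 1) <= np.abs(x.imag / x.real) + 1e-12))

x2 = rng.normal(size=m) + 1j * rng.normal(size=m)                 # x != 0 a.s.
y = np.exp(1j * rng.uniform(0, 2 * np.pi, m))                     # y in C_1
print(np.all(np.abs(x2 / np.abs(x2) - y) <= 2 * np.abs(x2 - y) + 1e-12))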

Proof of Theorem 3.

Let b11b_{1}\in\mathbb{C}_{1} satisfy uu~b1=infa1uu~a\|{u-\widetilde{u}b_{1}}\|=\inf_{a\in\mathbb{C}_{1}}\left\|{u-\widetilde{u}a}\right\|. Denote δ:=uu~b1n\delta:=u-\widetilde{u}b_{1}\in\mathbb{C}^{n}. Recall uwidecheck\widecheck{u} is the leading eigenvector of AA. From Proposition 1, Proposition 2, and Lemma 8, there exist constants c1,c2>0c_{1},c_{2}>0 such that if nplogn,npσ2>c1\frac{np}{\log n},\frac{np}{\sigma^{2}}>c_{1}, we have

δ\displaystyle\left\|{\delta}\right\| c2σ2+σnp,\displaystyle\leq c_{2}\frac{\sigma^{2}+\sigma}{np}, (34)
maxj[n]|uwidecheckj1nb2|\displaystyle\max_{j\in[n]}\left|\widecheck{u}_{j}-\frac{1}{\sqrt{n}}b_{2}\right| c2(lognnp+1log(np))1n,\displaystyle\leq c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\frac{1}{\sqrt{n}}, (35)
A𝔼A\displaystyle\left\|{A-\mathbb{E}A}\right\| c2np,\displaystyle\leq c_{2}\sqrt{np}, (36)
AW\displaystyle\left\|{A\circ W}\right\| c2np,\displaystyle\leq c_{2}\sqrt{np}, (37)
j[n]|Im(kjAjkWjkzj¯zk)|2\displaystyle\sum_{j\in[n]}\left|{\rm Im}\left({\sum_{k\neq j}A_{jk}W_{jk}\overline{z_{j}^{*}}z^{*}_{k}}\right)\right|^{2} n2p2(1+c2lognn),\displaystyle\leq\frac{n^{2}p}{2}\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right), (38)

with probability at least 1n91-n^{-9}, for some b2{1,1}b_{2}\in\{-1,1\}.

From (35), when nplogn2c22\frac{np}{\log n}\geq 2c_{2}^{2}, uwidecheck\widecheck{u} is closer to 𝟙n/nb2\mathds{1}_{n}/\sqrt{n}b_{2} than to 𝟙n/nb2-\mathds{1}_{n}/\sqrt{n}b_{2} with respect to 2\ell_{2} norm. From Lemma 7, 𝟙n/n\mathds{1}_{n}/\sqrt{n} is the leading eigenvector of 𝔼A\mathbb{E}A. By Lemma 5 and Lemma 6, we have

uwidecheck𝟙n/nb22(I𝟙n𝟙nT/n)uwidecheck2A𝔼Anp2c2np.\displaystyle\left\|{\widecheck{u}-\mathds{1}_{n}/\sqrt{n}b_{2}}\right\|\leq\sqrt{2}\|(I-\mathds{1}_{n}\mathds{1}_{n}^{\mathrm{\scriptscriptstyle T}}/n)\widecheck{u}\|\leq\frac{2\left\|{A-\mathbb{E}A}\right\|}{np}\leq\frac{2c_{2}}{\sqrt{np}}.

Recall that uu^{*} is defined as zuwidecheckz^{*}\circ\widecheck{u} in (8). Define δ:=u1nzb2\delta^{*}:=u^{*}-\frac{1}{\sqrt{n}}z^{*}b_{2}. This yields

δ\displaystyle\left\|{\delta^{*}}\right\| =zuwidecheck1nz𝟙nb2=z(uwidecheck1n𝟙nb2)\displaystyle=\left\|{z^{*}\circ\widecheck{u}-\frac{1}{\sqrt{n}}z^{*}\circ\mathds{1}_{n}b_{2}}\right\|=\left\|{z^{*}\circ\left(\widecheck{u}-\frac{1}{\sqrt{n}}\mathds{1}_{n}b_{2}\right)}\right\|
=uwidecheck1n𝟙nb22c2np+2pnp.\displaystyle=\left\|{\widecheck{u}-\frac{1}{\sqrt{n}}\mathds{1}_{n}b_{2}}\right\|\leq\frac{2c_{2}\sqrt{np}+2p}{np}. (39)

By the definition of u~\widetilde{u} in (9), we can decompose uu into

u\displaystyle u =u~b1+δ=XuXub1+δ=b1Xu((AzzH)u+σ(AW)u)+δ\displaystyle=\widetilde{u}b_{1}+\delta=\frac{Xu^{*}}{\left\|{Xu^{*}}\right\|}b_{1}+\delta=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\left(A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}\right)u^{*}+\sigma\left(A\circ W\right)u^{*}\right)+\delta
=b1Xu(λu+σ(AW)u)+δ,\displaystyle=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\lambda^{*}u^{*}+\sigma\left(A\circ W\right)u^{*}\right)+\delta, (40)

where we use the fact that uu^{*} is the eigenvector of AzzHA\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}} corresponding to the eigenvalue λ\lambda^{*} by Lemma 1. With the definition of uu^{*} and also its approximation 1nzb2\frac{1}{\sqrt{n}}z^{*}b_{2}, (40) leads to

u=b1Xu(λ(zuwidecheck)+σ(AW)(1nzb2+δ))+δ.\displaystyle u=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\lambda^{*}(z^{*}\circ\widecheck{u})+\sigma\left(A\circ W\right)\left(\frac{1}{\sqrt{n}}z^{*}b_{2}+\delta^{*}\right)\right)+\delta.

For any j[n]j\in[n], denote [AW]j[A\circ W]_{j\cdot} as its jjth row. From the display above, we can express uju_{j} as

uj=b1Xu(λzjuwidecheckj+σnkjAjkWjkzkb2+σ[AW]jδ)+δj.\displaystyle u_{j}=\frac{b_{1}}{\left\|{Xu^{*}}\right\|}\left(\lambda^{*}z^{*}_{j}\widecheck{u}_{j}+\frac{\sigma}{\sqrt{n}}\sum_{k\neq j}A_{jk}W_{jk}z^{*}_{k}b_{2}+\sigma[A\circ W]_{j\cdot}\delta^{*}\right)+\delta_{j}.

By (4), when uj0u_{j}\neq 0, we have

|z^jzjb1b2|\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right| =|b2b1zj¯z^j1|=|b2b1zj¯uj|uj|1|=|b2b1zj¯uj|b2b1zj¯uj|1|\displaystyle=\left|b_{2}\overline{b_{1}z_{j}^{*}}\widehat{z}_{j}-1\right|=\left|b_{2}\overline{b_{1}z_{j}^{*}}\frac{u_{j}}{\left|u_{j}\right|}-1\right|=\left|\frac{b_{2}\overline{b_{1}z_{j}^{*}}u_{j}}{\left|b_{2}\overline{b_{1}z_{j}^{*}}u_{j}\right|}-1\right|
=|Xuλb2b1zj¯uj|Xuλb2b1zj¯uj|1|\displaystyle=\left|\frac{\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}}{\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}\right|}-1\right| (41)

so the problem reduces to analyzing the quantity \frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}. With

ξj:=kjAjkWjkzj¯zk,\displaystyle\xi_{j}:=\sum_{k\neq j}A_{jk}W_{jk}\overline{z^{*}_{j}}z^{*}_{k},

we have

Xuλb2b1zj¯uj\displaystyle\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j} =b2uwidecheckj+σλnξj+σ[AW]jδb2zj¯λ+Xuλδjb2b1zj¯.\displaystyle={b_{2}}\widecheck{u}_{j}+\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}+\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}b_{2}\overline{z_{j}^{*}}}{\lambda^{*}}+\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}b_{2}\overline{b_{1}z_{j}^{*}}. (42)

Note that from (35), we have

b2uwidecheckj(1c2(lognnp+1log(np)))1n.\displaystyle b_{2}\widecheck{u}_{j}\geq\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{1}{\sqrt{n}}. (43)

Let 0<γ,ρ<1/80<\gamma,\rho<1/8 whose values will be given later. Consider the following two cases.

(1) If

|σλnξj|\displaystyle\left|\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}\right| γn,\displaystyle\leq\frac{\gamma}{\sqrt{n}}, (44)
|σ[AW]jδλ|\displaystyle\left|\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}}{\lambda^{*}}\right| ρn,\displaystyle\leq\frac{\rho}{\sqrt{n}}, (45)
|Xuλδj|\displaystyle\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}\right| ρn\displaystyle\leq\frac{\rho}{\sqrt{n}} (46)

all hold, then from (42) and (43), we have

Re(Xuλb2b1zj¯uj)(1c2(lognnp+1log(np))γ2ρ)1n,\displaystyle{\rm Re}\left(\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}b_{2}\overline{b_{1}z_{j}^{*}}u_{j}\right)\geq\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)\frac{1}{\sqrt{n}},

which can be further lower bounded by 1/(2n)1/(2\sqrt{n}) for sufficiently large nplogn\frac{np}{\log n}. Therefore, uj0u_{j}\neq 0 in this case. Then by Lemma 12 and (41), we have

|z^jzjb1b2|\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|
|Im(σλnξj+σ[AW]jδb2zj¯λ+Xuλδjb2b1zj¯)|(1c2(lognnp+1log(np))γ2ρ)1n\displaystyle\leq\frac{\left|{\rm Im}\left(\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}+\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}b_{2}\overline{z_{j}^{*}}}{\lambda^{*}}+\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}b_{2}\overline{b_{1}z_{j}^{*}}\right)\right|}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)\frac{1}{\sqrt{n}}}
|Im(σλnξj)|(1c2(lognnp+1log(np))γ2ρ)1n+|σ[AW]jδλ|+|Xuλδj|12n\displaystyle\leq\frac{\left|{\rm Im}\left(\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}\right)\right|}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)\frac{1}{\sqrt{n}}}+\frac{\left|\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}}{\lambda^{*}}\right|+\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}\right|}{\frac{1}{2\sqrt{n}}}
=σλ|Im(ξj)|(1c2(lognnp+1log(np))γ2ρ)+2nσλ|[AW]jδ|+2nXuλ|δj|.\displaystyle=\frac{\frac{\sigma}{\lambda^{*}}\left|{\rm Im}\left(\xi_{j}\right)\right|}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)}+\frac{2\sqrt{n}\sigma}{\lambda^{*}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|+\frac{2\sqrt{n}\left\|{Xu^{*}}\right\|}{\lambda^{*}}\left|\delta_{j}\right|.

Note that for any x,yx,y\in\mathbb{R} and any η>0\eta>0, we have (x+y)2=x2+2(η1/2x)(η1/2y)+y2(1+η)x2+(1+η1)y2(x+y)^{2}=x^{2}+2(\eta^{1/2}x)(\eta^{-1/2}y)+y^{2}\leq(1+\eta)x^{2}+(1+\eta^{-1})y^{2}. We have

|z^jzjb1b2|2\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2} (1+η)σ2λ2|Im(ξj)|2(1c2(lognnp+1log(np))γ2ρ)2\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}
+(1+η1)8nσ2λ2|[AW]jδ|2+(1+η1)8nXu2λ2|δj|2,\displaystyle\quad+(1+\eta^{-1})\frac{8n\sigma^{2}}{\lambda^{*2}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+(1+\eta^{-1})\frac{8n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left|\delta_{j}\right|^{2},

where the value of η>0\eta>0 will be given later.

(2) If any one of (44)-(46) does not hold, we simply upper bound |z^jzjb1b2||\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}| by 2. Then this case can be written as

|z^jzjb1b2|2\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2}
4(𝕀{|σλnξj|>γn}+𝕀{|σ[AW]jδλ|>ρn}+𝕀{|Xuλδj|>ρn})\displaystyle\leq 4\left({\mathbb{I}\left\{{\left|\frac{\sigma}{\lambda^{*}\sqrt{n}}\xi_{j}\right|>\frac{\gamma}{\sqrt{n}}}\right\}}+{\mathbb{I}\left\{{\left|\frac{\sigma[A\circ W]_{j\cdot}\delta^{*}}{\lambda^{*}}\right|>\frac{\rho}{\sqrt{n}}}\right\}}+{\mathbb{I}\left\{{\left|\frac{\left\|{Xu^{*}}\right\|}{\lambda^{*}}\delta_{j}\right|>\frac{\rho}{\sqrt{n}}}\right\}}\right)
4(𝕀{σ|ξj|γλ}+σ2n|[AW]jδ|2ρ2λ2+nXu2|δj|2ρ2λ2),\displaystyle\leq 4\left({\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}+\frac{\sigma^{2}n\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}}{\rho^{2}\lambda^{*2}}+\frac{n\left\|{Xu^{*}}\right\|^{2}\left|\delta_{j}\right|^{2}}{\rho^{2}\lambda^{*2}}\right),

where in the last inequality we use the fact 𝕀{xy}x2/y2{\mathbb{I}\left\{{x\geq y}\right\}}\leq x^{2}/y^{2} for any x,y>0x,y>0.

Combining the above two cases together, we have

|z^jzjb1b2|2\displaystyle\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2}
(1+η)σ2λ2|Im(ξj)|2(1c2(lognnp+1log(np))γ2ρ)2\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}
+(1+η1)8nσ2λ2|[AW]jδ|2+(1+η1)8nXu2λ2|δj|2\displaystyle\quad+(1+\eta^{-1})\frac{8n\sigma^{2}}{\lambda^{*2}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+(1+\eta^{-1})\frac{8n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left|\delta_{j}\right|^{2}
+4(𝕀{σ|ξj|γλ}+σ2n|[AW]jδ|2ρ2λ2+nXu2|δj|2ρ2λ2)\displaystyle\quad+4\left({\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}+\frac{\sigma^{2}n\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}}{\rho^{2}\lambda^{*2}}+\frac{n\left\|{Xu^{*}}\right\|^{2}\left|\delta_{j}\right|^{2}}{\rho^{2}\lambda^{*2}}\right)
(1+η)σ2λ2|Im(ξj)|2(1c2(lognnp+1log(np))γ2ρ)2+4𝕀{σ|ξj|γλ}\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}+4{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}
+8(1+η1+ρ2)nσ2λ2|[AW]jδ|2+8(1+η1+ρ2)nXu2λ2|δj|2.\displaystyle\quad+8(1+\eta^{-1}+\rho^{-2})\frac{n\sigma^{2}}{\lambda^{*2}}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+8(1+\eta^{-1}+\rho^{-2})\frac{n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left|\delta_{j}\right|^{2}.

The display above holds for each j[n]j\in[n]. Summing over jj, we have

n(z^,z)\displaystyle n\ell(\widehat{z},z^{*})
j[n]|z^jzjb1b2|2\displaystyle\leq\sum_{j\in[n]}\left|\widehat{z}_{j}-z_{j}^{*}b_{1}b_{2}\right|^{2}
(1+η)σ2λ2(1c2(lognnp+1log(np))γ2ρ)2j[n]|Im(ξj)|2+4j[n]𝕀{σ|ξj|γλ}\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}\sum_{j\in[n]}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}+4\sum_{j\in[n]}{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}
+8(1+η1+ρ2)nσ2λ2j[n]|[AW]jδ|2+8(1+η1+ρ2)nXu2λ2j[n]|δj|2\displaystyle\quad+8(1+\eta^{-1}+\rho^{-2})\frac{n\sigma^{2}}{\lambda^{*2}}\sum_{j\in[n]}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}+8(1+\eta^{-1}+\rho^{-2})\frac{n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\sum_{j\in[n]}\left|\delta_{j}\right|^{2}
(1+η)σ2λ2(1c2(lognnp+1log(np))γ2ρ)2j[n]|Im(ξj)|2+4j[n]𝕀{σ|ξj|γλ}\displaystyle\leq\frac{(1+\eta)\frac{\sigma^{2}}{\lambda^{*2}}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}\sum_{j\in[n]}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}+4\sum_{j\in[n]}{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}}
+8(1+η1+ρ2)nσ2λ2AW2δ2+8(1+η1+ρ2)nXu2λ2δ2,\displaystyle\quad+8(1+\eta^{-1}+\rho^{-2})\frac{n\sigma^{2}}{\lambda^{*2}}\left\|{A\circ W}\right\|^{2}\left\|{\delta^{*}}\right\|^{2}+8(1+\eta^{-1}+\rho^{-2})\frac{n\left\|{Xu^{*}}\right\|^{2}}{\lambda^{*2}}\left\|{\delta}\right\|^{2},

where in the last inequality, we use j[n]|[AW]jδ|2=(AW)δ2AW2δ2\sum_{j\in[n]}\left|[A\circ W]_{j\cdot}\delta^{*}\right|^{2}=\left\|{(A\circ W)\delta^{*}}\right\|^{2}\leq\left\|{A\circ W}\right\|^{2}\left\|{\delta^{*}}\right\|^{2}.

We are going to simplify the display above. From (34), (37), (39), and (38), we have upper bounds for δ,AW,\left\|{\delta}\right\|,\left\|{A\circ W}\right\|, δ\left\|{\delta^{*}}\right\|, and j[n]|Im(ξj)|2\sum_{j\in[n]}\left|{\rm Im}\left(\xi_{j}\right)\right|^{2}. Using (36), Lemma 1, and Lemma 7, we have λ(n1)pc2np\lambda^{*}\geq(n-1)p-c_{2}\sqrt{np} and a crude bound np/2λ2npnp/2\leq\lambda^{*}\leq 2np when nplogn\frac{np}{\log n} is greater than some sufficiently large constant. Due to the decomposition X=AzzH+σAWX=A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}}+\sigma A\circ W and that (AzzH)u=λu(A\circ z^{*}z^{*{\mathrm{\scriptscriptstyle H}}})u^{*}=\lambda^{*}u^{*}, we have

\displaystyle\left\|{Xu^{*}}\right\|=\left\|{\lambda^{*}u^{*}+\sigma(A\circ W)u^{*}}\right\|\leq\lambda^{*}+\sigma\left\|{A\circ W}\right\|\leq np+c_{2}(1+\sigma)\sqrt{np}.

From Lemma 11, if γ\gamma satisfies γ2npσ2>c3\frac{\gamma^{2}np}{\sigma^{2}}>c_{3} for some constant c3>0c_{3}>0, we have

j[n]𝕀{σ|ξj|γλ}\displaystyle\sum_{j\in[n]}{\mathbb{I}\left\{{\sigma\left|\xi_{j}\right|\geq{\gamma\lambda^{*}}}\right\}} j[n]𝕀{2σnp|ξj|γ}4σ2γ2pexp(116γ2npσ2),\displaystyle\leq\sum_{j\in[n]}{\mathbb{I}\left\{{\frac{2\sigma}{np}\left|\xi_{j}\right|\geq{\gamma}}\right\}}\leq\frac{4\sigma^{2}}{\gamma^{2}p}\exp\left(-\frac{1}{16}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right),

holds with probability at least 1exp(132γ2npσ2)1-\exp\left(-\frac{1}{32}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right). When c3c_{3} is sufficiently large, we have

4σ2γ2npexp(116γ2npσ2)(σ2γ2np)3,\displaystyle\frac{4\sigma^{2}}{\gamma^{2}np}\exp\left(-\frac{1}{16}\sqrt{\frac{\gamma^{2}np}{\sigma^{2}}}\right)\leq\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3},

which is due to the fact 4exp(x/16)1/x24\exp\left(-\sqrt{x}/16\right)\leq 1/x^{2} when xx0x\geq x_{0} for some large x0>0x_{0}>0.

Combining the above results together, we have

(z^,z)\displaystyle\ell(\widehat{z},z^{*}) (1+η)(11c21np1n)2(1c2(lognnp+1log(np))γ2ρ)2(1+c2lognn)σ22np+(σ2γ2np)3\displaystyle\leq\frac{(1+\eta)\left(\frac{1}{1-c_{2}\frac{1}{\sqrt{np}}-\frac{1}{n}}\right)^{2}}{\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{2}}\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right)\frac{\sigma^{2}}{2np}+\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3}
+32(1+η1+ρ2)c22(2c2np)2σ2np\displaystyle\quad+32(1+\eta^{-1}+\rho^{-2})c_{2}^{2}\left(\frac{2c_{2}}{\sqrt{np}}\right)^{2}\frac{\sigma^{2}}{np}
+128(1+η1+ρ2)(1+c22σ2np)c22σ4+σ2(np)2.\displaystyle\quad+128(1+\eta^{-1}+\rho^{-2})\left(1+\frac{c_{2}^{2}\sigma^{2}}{np}\right)c_{2}^{2}\frac{\sigma^{4}+\sigma^{2}}{(np)^{2}}.

Note that \frac{1}{(1-x)^{2}}\leq 1+16x for all 0\leq x\leq\frac{1}{2}. We thus have \left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)-\gamma-2\rho\right)^{-2}\leq 1+16\left(c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)+\gamma+2\rho\right) and \left(1-c_{2}\frac{1}{\sqrt{np}}-\frac{1}{n}\right)^{-2}\leq 1+16\left(c_{2}\frac{1}{\sqrt{np}}+\frac{1}{n}\right), as long as \frac{np}{\log n} is greater than some sufficiently large constant. After rearrangement, there exists some constant c_{5}>0 such that

(z^,z)(1+c5(\displaystyle\ell(\widehat{z},z^{*})\leq\Bigg{(}1+c_{5}\Bigg{(} η+γ+ρ+lognnp+1log(np)+γ6(σ2np)2\displaystyle\eta+\gamma+\rho+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}+\gamma^{-6}\left(\frac{\sigma^{2}}{np}\right)^{2}
+(η1+ρ2)(1+σ2np)))σ22np.\displaystyle\quad+(\eta^{-1}+\rho^{-2})\left(\frac{1+\sigma^{2}}{np}\right)\Bigg{)}\Bigg{)}\frac{\sigma^{2}}{2np}.

We can choose γ2=σ2/(np)\gamma^{2}=\sqrt{{\sigma^{2}}/{(np)}} (then γ2npσ2>c3\frac{\gamma^{2}np}{\sigma^{2}}>c_{3} is guaranteed as long as npσ2>c32\frac{np}{\sigma^{2}}>c_{3}^{2}). We also set ρ2=(1+σ2)/np\rho^{2}=\sqrt{(1+\sigma^{2})/np} and let η=ρ2\eta=\rho^{2}. Then, there exists some constant c6>0c_{6}>0 such that

(z^,z)(1+c6((σ2np)14+lognnp+1log(np)))σ22np.\displaystyle\ell(\widehat{z},z^{*})\leq\left(1+c_{6}\left(\left(\frac{\sigma^{2}}{np}\right)^{\frac{1}{4}}+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{\sigma^{2}}{2np}.

This holds with probability at least 1n9exp(132(npσ2)14)1-n^{-9}-\exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^{2}}\right)^{\frac{1}{4}}\right). ∎

5.4 Proofs of Auxiliary Lemmas

Proof of Lemma 5.

Let \widetilde{\lambda}_{1}\geq\widetilde{\lambda}_{2}\geq\ldots\geq\widetilde{\lambda}_{d} be the eigenvalues of \widetilde{X}. By Weyl's inequality, we have |\widetilde{\lambda}_{r+1}-\lambda_{r+1}|\leq\|X-\widetilde{X}\|. Under the assumption \|{X-\widetilde{X}}\|<(\lambda_{r}-\lambda_{r+1})/4, we have

λrλ~r+1\displaystyle\lambda_{r}-\widetilde{\lambda}_{r+1} =λrλr+1+λr+1λ~r+1λrλr+1XX~>34(λrλr+1)>0.\displaystyle=\lambda_{r}-\lambda_{r+1}+\lambda_{r+1}-\widetilde{\lambda}_{r+1}\geq\lambda_{r}-\lambda_{r+1}-\left\|{X-\widetilde{X}}\right\|>\frac{3}{4}\left(\lambda_{r}-\lambda_{r+1}\right)>0.

Define

Θ(U,U~):=diag(cos1σ1,,cos1σr)r×r,\displaystyle\Theta(U,\widetilde{U}):=\text{diag}(\cos^{-1}\sigma_{1},\ldots,\cos^{-1}\sigma_{r})\in\mathbb{R}^{r\times r},

where σ1σ2σr\sigma_{1}\geq\sigma_{2}\geq\ldots\geq\sigma_{r} are singular values of UHU~U^{\mathrm{\scriptscriptstyle H}}\widetilde{U}. Since λrλ~r+1>0\lambda_{r}-\widetilde{\lambda}_{r+1}>0, by Davis-Kahan Theorem [14], we have

sinΘ(U,U~)XX~λrλ~r+14XX~3(λrλr+1).\displaystyle\left\|{\sin\Theta(U,\widetilde{U})}\right\|\leq\frac{\left\|{X-\widetilde{X}}\right\|}{\lambda_{r}-\widetilde{\lambda}_{r+1}}\leq\frac{4\left\|{X-\widetilde{X}}\right\|}{3(\lambda_{r}-\lambda_{r+1})}.

From [14], we also have sinΘ(U,U~)=(IUUH)U~\|{\sin\Theta(U,\widetilde{U})}\|=\|{(I-UU^{\mathrm{\scriptscriptstyle H}})\widetilde{U}}\|. The proof is complete. ∎
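A hedged numerical check of the bound in Lemma 5 on a random real symmetric instance (all names ours):

import numpy as np

rng = np.random.default_rng(4)
d, r = 100, 3
G = rng.normal(size=(d, d))
X = (G + G.T) / 2                                # symmetric X
lam, V = np.linalg.eigh(X)
lam, V = lam[::-1], V[:, ::-1]                   # eigenvalues in descending order
gap = lam[r - 1] - lam[r]                        # lambda_r - lambda_{r+1}

E = rng.normal(size=(d, d))
E = (E + E.T) / 2
E *= 0.1 * gap / np.linalg.norm(E, 2)            # ensures ||X - Xt|| < gap / 4
Xt = X + E
U = V[:, :r]                                     # leading r eigenvectors of X
Ut = np.linalg.eigh(Xt)[1][:, ::-1][:, :r]       # leading r eigenvectors of Xt

lhs = np.linalg.norm((np.eye(d) - U @ U.T) @ Ut, 2)
rhs = 4 * np.linalg.norm(E, 2) / (3 * gap)
print(lhs <= rhs)                                # True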

Proof of Lemma 6.

Since both xx and yy are unit vectors, we have

xyb2=2xHyb(yb)Hx=22Re(xHyb),b1.\displaystyle\left\|{x-yb}\right\|^{2}=2-x^{\mathrm{\scriptscriptstyle H}}yb-(yb)^{\mathrm{\scriptscriptstyle H}}x=2-2{\rm Re}(x^{\mathrm{\scriptscriptstyle H}}yb),\forall b\in\mathbb{C}_{1}. (47)

Therefore, when x^{\mathrm{\scriptscriptstyle H}}y=0, we have \left\|{x-yb}\right\|=\sqrt{2} for every b\in\mathbb{C}_{1}. In this case, we also have \left\|{(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y}\right\|=\left\|{y}\right\|=1, which proves the statement of the lemma in the x^{\mathrm{\scriptscriptstyle H}}y=0 case. When x^{\mathrm{\scriptscriptstyle H}}y\neq 0, the infimum over b in (47) is achieved at b=y^{\mathrm{\scriptscriptstyle H}}x/|y^{\mathrm{\scriptscriptstyle H}}x|. We then have

infb1xyb2\displaystyle\inf_{b\in\mathbb{C}_{1}}\left\|{x-yb}\right\|^{2} =yxHy|xHy|x2=yxxHy+xxHyxHy|xHy|x2\displaystyle=\left\|{y-\frac{x^{\mathrm{\scriptscriptstyle H}}y}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}x}\right\|^{2}=\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y+xx^{\mathrm{\scriptscriptstyle H}}y-\frac{x^{\mathrm{\scriptscriptstyle H}}y}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}x}\right\|^{2}
=yxxHy2+(11|xHy|)(xHy)x2\displaystyle=\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}+\left\|{\left(1-\frac{1}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}\right)(x^{\mathrm{\scriptscriptstyle H}}y)x}\right\|^{2}
=yxxHy2+|11|xHy||2|xHy|2\displaystyle=\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}+\left|1-\frac{1}{\left|x^{\mathrm{\scriptscriptstyle H}}y\right|}\right|^{2}\left|x^{\mathrm{\scriptscriptstyle H}}y\right|^{2}
yxxHy2+|1|xHy||2,\displaystyle\leq\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}+\left|1-\left|x^{\mathrm{\scriptscriptstyle H}}y\right|\right|^{2},

where we use the orthogonality between (IdxxH)y(I_{d}-xx^{\mathrm{\scriptscriptstyle H}})y and xx. With yxxHy2=1+xxHy22yHxxHy=1|xHy|2(1|xHy|)2\left\|{y-xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}=1+\left\|{xx^{\mathrm{\scriptscriptstyle H}}y}\right\|^{2}-2y^{\mathrm{\scriptscriptstyle H}}xx^{\mathrm{\scriptscriptstyle H}}y=1-\left|x^{\mathrm{\scriptscriptstyle H}}y\right|^{2}\geq\left(1-\left|x^{\mathrm{\scriptscriptstyle H}}y\right|\right)^{2}, where the last inequality is due to 0|xHy|10\leq\left|x^{\mathrm{\scriptscriptstyle H}}y\right|\leq 1, the proof is complete. ∎

Proof of Lemma 7.

Note that \mathbb{E}A=pJ_{n}-pI_{n}. Then (\mathds{1}_{n}/\sqrt{n})^{\mathrm{\scriptscriptstyle T}}\mathbb{E}A(\mathds{1}_{n}/\sqrt{n})=(n-1)p, and for any unit vector u\in\mathbb{R}^{n} orthogonal to \mathds{1}_{n}/\sqrt{n}, we have u^{\mathrm{\scriptscriptstyle T}}\mathbb{E}Au=0-p\|u\|^{2}=-p. Hence, (n-1)p is the largest eigenvalue with \mathds{1}_{n}/\sqrt{n} being the corresponding eigenvector, and -p is the remaining eigenvalue with multiplicity n-1.

By Weyl’s inequality, we have |λ(n1)p|,max2jn|λj(p)|A𝔼A|\lambda^{\prime}-(n-1)p|,\max_{2\leq j\leq n}|\lambda^{\prime}_{j}-(-p)|\leq\left\|{A-\mathbb{E}A}\right\|, which leads to (33) after rearrangement. This completes the proof, with λ=λ\lambda^{*}=\lambda^{\prime} and λ2=λ2\lambda^{*}_{2}=\lambda^{\prime}_{2} by Lemma 1. ∎

Proof of Lemma 8.

The first two inequalities stem from Lemma 5 and Lemma 6 of [18], respectively. The third inequality is derived from Lemma 7 and (29) in [18]. ∎

Proof of Lemma 11.

It is proved in (31) of [18]. ∎

6 Proof of Lemma 4

Before the proof, we first state a technical lemma that is analogous to Lemma 6.

Lemma 13.

For any two matrices U,V𝒪(d1,d2)U,V\in\mathcal{O}(d_{1},d_{2}), we have

(Id1VVT)UinfO𝒪(d2)VUO2(Id1VVT)U.\displaystyle\left\|{(I_{d_{1}}-VV^{\mathrm{\scriptscriptstyle T}})U}\right\|\leq\inf_{O\in\mathcal{O}(d_{2})}\left\|{V-UO}\right\|\leq\sqrt{2}\left\|{(I_{d_{1}}-VV^{\mathrm{\scriptscriptstyle T}})U}\right\|.
Proof.

Let V_{\perp}\in\mathbb{R}^{d_{1}\times(d_{1}-d_{2})} be an orthogonal complement of V such that (V,V_{\perp})\in\mathcal{O}(d_{1}). From Lemma 1 of [10], we have \|U^{\mathrm{\scriptscriptstyle T}}V_{\perp}\|\leq\inf_{O\in\mathcal{O}(d_{2})}\left\|{V-UO}\right\|\leq\sqrt{2}\|U^{\mathrm{\scriptscriptstyle T}}V_{\perp}\|. The proof is complete with \|U^{\mathrm{\scriptscriptstyle T}}V_{\perp}\|=\|V_{\perp}V_{\perp}^{\mathrm{\scriptscriptstyle T}}U\|=\left\|{(I_{d_{1}}-VV^{\mathrm{\scriptscriptstyle T}})U}\right\|. ∎

Proof of Lemma 4.

We first give an explicit expression for the first-order approximation \widetilde{V}. Denote \mu_{1}\geq\ldots\geq\mu_{n} as the eigenvalues of Y. Let YV^{*}=GDN^{\mathrm{\scriptscriptstyle T}} be the SVD of YV^{*}, where G\in\mathcal{O}(n,d), N\in\mathcal{O}(d), and D\in\mathbb{R}^{d\times d} is a diagonal matrix with the singular values on its diagonal. Define M^{*}=\text{diag}(\mu_{1}^{*},\ldots,\mu^{*}_{d})\in\mathbb{R}^{d\times d}. Since

YV=YV+(YY)V=VM+(YY)V,\displaystyle YV^{*}=Y^{*}V^{*}+(Y-Y^{*})V^{*}=V^{*}M^{*}+(Y-Y^{*})V^{*}, (48)

we have

maxi[d]|Diiμi|(YY)VYY,\displaystyle\max_{i\in[d]}\left|D_{ii}-\mu^{*}_{i}\right|\leq\left\|{(Y-Y^{*})V^{*}}\right\|\leq\left\|{Y-Y^{*}}\right\|, (49)

by Weyl’s inequality. Under the assumption that YYmin{μdμd+1,μd}/4\left\|{Y-Y^{*}}\right\|\leq\min\{\mu_{d}^{*}-\mu_{d+1}^{*},\mu_{d}^{*}\}/4, we have {Dii}i[d]\{D_{ii}\}_{i\in[d]} all being positive. Note that

\displaystyle\widetilde{V}=\mathop{\rm argmin}_{V^{\prime}\in\mathcal{O}(n,d)}\left\|{V^{\prime}-YV^{*}}\right\|_{\rm F}^{2}=\mathop{\rm argmax}_{V^{\prime}\in\mathcal{O}(n,d)}\left\langle V^{\prime},YV^{*}\right\rangle
\displaystyle\quad=\mathop{\rm argmax}_{V^{\prime}\in\mathcal{O}(n,d)}\text{tr}\left(V^{\prime{\mathrm{\scriptscriptstyle T}}}GDN^{\mathrm{\scriptscriptstyle T}}\right)=\mathop{\rm argmax}_{V^{\prime}\in\mathcal{O}(n,d)}\left\langle G^{\mathrm{\scriptscriptstyle T}}V^{\prime}N,D\right\rangle.

Due to the fact that G,V𝒪(n,d)G,V^{\prime}\in\mathcal{O}(n,d), N𝒪(d)N\in\mathcal{O}(d), and the diagonal entries of DD are all positive, the maximum is achieved when GTVN=IdG^{\mathrm{\scriptscriptstyle T}}V^{\prime}N=I_{d}. This gives V~=GNT\widetilde{V}=GN^{\mathrm{\scriptscriptstyle T}} which can also be written as

V~=YVS,\displaystyle\widetilde{V}=YV^{*}S, (50)

where

S:=ND1NTd×d\displaystyle S:=ND^{-1}N^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{d\times d} (51)

can be seen as a linear operator and plays a similar role as 1/Xu1/\left\|{Xu^{*}}\right\| for u~=Xu/Xu\widetilde{u}=Xu^{*}/\left\|{Xu^{*}}\right\| in (9).
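In matrix form, \widetilde{V} is the orthogonal Procrustes projection of YV^{*} onto \mathcal{O}(n,d), and both \widetilde{V} and S come out of a single SVD; a hedged NumPy sketch (names ours):

import numpy as np

rng = np.random.default_rng(5)
n, d = 50, 4
Y = rng.normal(size=(n, n))
Y = (Y + Y.T) / 2                                    # symmetric Y
Vstar = np.linalg.qr(rng.normal(size=(n, d)))[0]     # some V* in O(n, d)

B = Y @ Vstar                                        # Y V*, full rank a.s.
G, svals, Nt = np.linalg.svd(B, full_matrices=False) # B = G diag(svals) N^T
Vtilde = G @ Nt                                      # closest element of O(n, d) to B
S = Nt.T @ np.diag(1.0 / svals) @ Nt                 # S = N D^{-1} N^T as in (51)
print(np.allclose(Vtilde, B @ S))                    # True: Vtilde = Y V* S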

Define M:=diag(μ1,μ2,,μd)d×dM:=\text{diag}(\mu_{1},\mu_{2},\ldots,\mu_{d})\in\mathbb{R}^{d\times d}. Then we have

VM\displaystyle VM =YV,\displaystyle=YV,
V~M\displaystyle\widetilde{V}M =YVSM,\displaystyle=YV^{*}SM,

and consequently,

(VV~)M=Y(VVSM)=Y(VV~)+Y(V~VSM).\displaystyle(V-\widetilde{V})M=Y(V-V^{*}SM)=Y(V-\widetilde{V})+Y(\widetilde{V}-V^{*}SM).

Note that (IVVT)Y=Y(IVVT)(I-VV^{\mathrm{\scriptscriptstyle T}})Y=Y(I-VV^{\mathrm{\scriptscriptstyle T}}) as VV is the leading eigenspace of YY. After rearranging, we have

YV~V~M=Y(V~VSM).\displaystyle Y\widetilde{V}-\widetilde{V}M=Y(\widetilde{V}-V^{*}SM).

Multiplying (IVVT)(I-VV^{\mathrm{\scriptscriptstyle T}}) on both sides, we have

Y(IVVT)V~(IVVT)V~M\displaystyle Y(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}-(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}M =(IVVT)YV~(IVVT)V~M\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y\widetilde{V}-(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}M
=(IVVT)Y(V~VSM),\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM}),

where the first equation is due to Y(IVVT)=(IVVT)YY(I-VV^{\mathrm{\scriptscriptstyle T}})=(I-VV^{\mathrm{\scriptscriptstyle T}})Y as VV is the leading eigenspace of YY. Note that for any xspan(IVVT)x\in\text{span}(I-VV^{\mathrm{\scriptscriptstyle T}}) and for any i[d]i\in[d], we have Yxμix(μiμd+1)x\left\|{Yx-\mu_{i}x}\right\|\geq(\mu_{i}-\mu_{d+1})\left\|{x}\right\|. Then we have

Y(IVVT)V~(IVVT)V~M(μdμd+1)(IVVT)V~.\displaystyle\left\|{Y(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}-(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}M}\right\|\geq(\mu_{d}-\mu_{d+1})\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}}\right\|.

As a result, we have

(IVVT)V~1μdμd+1(IVVT)Y(V~VSM),\displaystyle\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}}\right\|\leq\frac{1}{\mu_{d}-\mu_{d+1}}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})}\right\|, (52)

which is analogous to (31) in the proof of Lemma 2. By Lemma 13, we have

infO𝒪(d)VV~O2(IVVT)V~2μdμd+1(IVVT)Y(V~VSM).\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{V-\widetilde{V}O}\right\|\leq\sqrt{2}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})\widetilde{V}}\right\|\leq\frac{\sqrt{2}}{\mu_{d}-\mu_{d+1}}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})}\right\|. (53)

Next, we analyze (I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM}). Using (50), we have

(IVVT)Y(V~VSM)\displaystyle(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})
=(IVVT)Y(YVSVSM)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y\left(YV^{*}S-V^{*}SM\right)
=(IVVT)Y(VMS+(YY)VSVSM)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})Y\left(V^{*}M^{*}S+(Y-Y^{*})V^{*}S-V^{*}SM\right)
=(IVVT)YV(MSSM)+(IVVT)Y(YY)VS\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})YV^{*}\left(M^{*}S-SM\right)+(I-VV^{\mathrm{\scriptscriptstyle T}})Y(Y-Y^{*})V^{*}S
=(IVVT)(VM+(YY)V)(MSSM)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})\left(V^{*}M^{*}+(Y-Y^{*})V^{*}\right)\left(M^{*}S-SM\right)
+(IVVT)VMVT(YY)VS\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}}(Y-Y^{*})V^{*}S
+(IVVT)(YVMVT)(YY)VS+(IVVT)(YY)(YY)VS\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}})(Y-Y^{*})V^{*}S+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y-Y^{*})(Y-Y^{*})V^{*}S
=(IVVT)VM((MSSM)+VT(YY)VS)\displaystyle=(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}M^{*}\left(\left(M^{*}S-SM\right)+V^{*{\mathrm{\scriptscriptstyle T}}}(Y-Y^{*})V^{*}S\right)
+(IVVT)(YY)V(MSSM)\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y-Y^{*})V^{*}\left(M^{*}S-SM\right)
+(IVVT)(YVMVT)(YY)VS+(IVVT)(YY)(YY)VS,\displaystyle\quad+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}})(Y-Y^{*})V^{*}S+(I-VV^{\mathrm{\scriptscriptstyle T}})(Y-Y^{*})(Y-Y^{*})V^{*}S,

where in the second-to-last equality we use (48) and the decomposition Y=V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}}+(Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}})+(Y-Y^{*}). Hence, with \|Y^{*}-V^{*}M^{*}V^{*{\mathrm{\scriptscriptstyle T}}}\|=\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}, we have

(IVVT)Y(V~VSM)\displaystyle\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})Y({\widetilde{V}-V^{*}SM})}\right\|
μ1(IVVT)V(MSSM+YYS)\displaystyle\leq\mu_{1}^{*}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}}\right\|\left(\left\|{M^{*}S-SM}\right\|+\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|\right)
+YYMSSM+max{|μd+1|,|μn|}YYS+YY2S.\displaystyle\quad+\left\|{Y-Y^{*}}\right\|\left\|{M^{*}S-SM}\right\|+\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|+\left\|{Y-Y^{*}}\right\|^{2}\left\|{S}\right\|.

Then from (53), we have

infO𝒪(d)VV~O\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{V-\widetilde{V}O}\right\| 2μdμd+1(μ1(IVVT)V(MSSM+YYS)\displaystyle\leq\frac{\sqrt{2}}{\mu_{d}-\mu_{d+1}}\Bigg{(}\mu_{1}^{*}\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}}\right\|\left(\left\|{M^{*}S-SM}\right\|+\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|\right)
+YYMSSM+max{|μd+1|,|μn|}YYS\displaystyle\quad+\left\|{Y-Y^{*}}\right\|\left\|{M^{*}S-SM}\right\|+\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\left\|{Y-Y^{*}}\right\|\left\|{S}\right\|
+YY2S).\displaystyle\quad+\left\|{Y-Y^{*}}\right\|^{2}\left\|{S}\right\|\Bigg{)}.

In the rest of the proof, we are going to simplify the display above. By Weyl’s inequality, we have

maxi[n]|μiμi|YY.\displaystyle\max_{i\in[n]}\left|\mu_{i}-\mu_{i}^{*}\right|\leq\left\|{Y-Y^{*}}\right\|. (54)

Since YY(μdμd+1)/4\left\|{Y-Y^{*}}\right\|\leq(\mu_{d}^{*}-\mu_{d+1}^{*})/4 is assumed, we have

μdμd+1μdμd+12.\displaystyle\mu_{d}-\mu_{d+1}\geq\frac{\mu_{d}^{*}-\mu_{d+1}^{*}}{2}.

By this assumption and Lemma 5, we have

(IVVT)V2YYμdμd+1.\displaystyle\left\|{(I-VV^{\mathrm{\scriptscriptstyle T}})V^{*}}\right\|\leq\frac{2\left\|{Y-Y^{*}}\right\|}{\mu^{*}_{d}-\mu^{*}_{d+1}}.
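Both (54) and the bound above are classical perturbation facts (Weyl's inequality and a Davis–Kahan-type bound), and they can be checked numerically on a synthetic symmetric instance; the sketch below (illustrative constants, ours) does so.

import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 3
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
mu_star = np.concatenate([[10.0, 9.5, 9.0], rng.uniform(-1, 1, n - d)])
Y_star = (Q * mu_star) @ Q.T                   # Y* with top-d eigengap about 8
E = rng.standard_normal((n, n)); E = 0.05 * (E + E.T)   # small perturbation
Y = Y_star + E

mu = np.sort(np.linalg.eigvalsh(Y))[::-1]
mus = np.sort(mu_star)[::-1]
assert np.max(np.abs(mu - mus)) <= np.linalg.norm(E, 2)        # Weyl, as in (54)

V = np.linalg.eigh(Y)[1][:, ::-1][:, :d]       # leading eigenspace of Y
V_star = Q[:, :d]                              # leading eigenspace of Y*
gap = mus[d - 1] - mus[d]
lhs = np.linalg.norm(V_star - V @ (V.T @ V_star), 2)           # ||(I - V V^T) V*||
assert lhs <= 2 * np.linalg.norm(E, 2) / gap                   # the Lemma 5 bound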

By (49) and the definition of SS in (51), we have

S=D11μdYY43μd.\displaystyle\left\|{S}\right\|=\left\|{D^{-1}}\right\|\leq\frac{1}{\mu_{d}^{*}-\left\|{Y-Y^{*}}\right\|}\leq\frac{4}{3\mu_{d}^{*}}.

In addition,

MSSM\displaystyle\left\|{M^{*}S-SM}\right\| MSSM+S(MM)\displaystyle\leq\left\|{M^{*}S-SM^{*}}\right\|+\left\|{S\left(M-M^{*}\right)}\right\|
(MμdId)S+S(μdIdM)+SMM\displaystyle\leq\left\|{(M^{*}-\mu_{d}^{*}I_{d})S+S(\mu_{d}^{*}I_{d}-M^{*})}\right\|+\left\|{S}\right\|\left\|{M-M^{*}}\right\|
S(2MμdId+MM)\displaystyle\leq\left\|{S}\right\|\left(2\left\|{M^{*}-\mu_{d}^{*}I_{d}}\right\|+\left\|{M-M^{*}}\right\|\right)
43μd(2(μ1μd)+YY),\displaystyle\leq\frac{4}{3\mu_{d}^{*}}\left(2(\mu_{1}^{*}-\mu_{d}^{*})+\left\|{Y-Y^{*}}\right\|\right),

where in the last inequality we use the fact MM=maxi[d]|μiμi|\left\|{M-M^{*}}\right\|=\max_{i\in[d]}\left|\mu_{i}-\mu_{i}^{*}\right| and (54). Combining all the results together, we have

\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{V-\widetilde{V}O}\right\|
\displaystyle\leq\frac{2\sqrt{2}}{\mu_{d}^{*}-\mu_{d+1}^{*}}\Bigg{(}\mu_{1}^{*}\frac{2\left\|{Y-Y^{*}}\right\|}{\mu^{*}_{d}-\mu^{*}_{d+1}}\left(\frac{4\left(2(\mu_{1}^{*}-\mu_{d}^{*})+\left\|{Y-Y^{*}}\right\|\right)}{3\mu_{d}^{*}}+\frac{4\left\|{Y-Y^{*}}\right\|}{3\mu_{d}^{*}}\right)
\displaystyle\quad+\frac{4}{3\mu_{d}^{*}}\left(2(\mu_{1}^{*}-\mu_{d}^{*})+\left\|{Y-Y^{*}}\right\|\right)\left\|{Y-Y^{*}}\right\|+\frac{4\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\left\|{Y-Y^{*}}\right\|}{3\mu_{d}^{*}}
\displaystyle\quad+\frac{4\left\|{Y-Y^{*}}\right\|^{2}}{3\mu_{d}^{*}}\Bigg{)}
\displaystyle\leq\frac{16\sqrt{2}}{3\left(\mu_{d}^{*}-\mu_{d+1}^{*}\right)\mu_{d}^{*}}\left(\frac{2\mu_{1}^{*}}{3(\mu_{d}^{*}-\mu_{d+1}^{*})}+1\right)\left\|{Y-Y^{*}}\right\|^{2}
\displaystyle\quad+\frac{8\sqrt{2}}{3\left(\mu_{d}^{*}-\mu_{d+1}^{*}\right)\mu_{d}^{*}}\left(\frac{4\mu_{1}^{*}\left(\mu_{1}^{*}-\mu_{d}^{*}\right)}{\mu_{d}^{*}-\mu_{d+1}^{*}}+2(\mu_{1}^{*}-\mu_{d}^{*})+\max\{|\mu^{*}_{d+1}|,|\mu^{*}_{n}|\}\right)\left\|{Y-Y^{*}}\right\|.
∎

References

  • [1] Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, and Yiqiao Zhong. Entrywise eigenvector analysis of random matrices with low expected rank. The Annals of Statistics, 48(3):1452, 2020.
  • [2] Emmanuel Abbe, Laurent Massoulié, Andrea Montanari, Allan Sly, and Nikhil Srivastava. Group synchronization on grids. Mathematical Statistics and Learning, 1(3):227–256, 2018.
  • [3] Joshua Agterberg, Zachary Lubberts, and Carey E Priebe. Entrywise estimation of singular vectors of low-rank matrices with heteroskedasticity and dependence. IEEE Transactions on Information Theory, 68(7):4618–4650, 2022.
  • [4] Mica Arie-Nachimson, Shahar Z Kovalsky, Ira Kemelmacher-Shlizerman, Amit Singer, and Ronen Basri. Global motion estimation from point matches. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 81–88. IEEE, 2012.
  • [5] Afonso S Bandeira, Nicolas Boumal, and Amit Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, 163(1-2):145–167, 2017.
  • [6] Stéphane Boucheron, Gábor Lugosi, and Olivier Bousquet. Concentration inequalities. In Summer school on machine learning, pages 208–240. Springer, 2003.
  • [7] Nicolas Boumal. Nonconvex phase synchronization. SIAM Journal on Optimization, 26(4):2355–2377, 2016.
  • [8] Nicolas Boumal, Amit Singer, and P-A Absil. Robust estimation of rotations from relative measurements by maximum likelihood. In 52nd IEEE Conference on Decision and Control, pages 1156–1161. IEEE, 2013.
  • [9] Changxiao Cai, Gen Li, Yuejie Chi, H Vincent Poor, and Yuxin Chen. Subspace estimation from unbalanced and incomplete data matrices: 2,\ell_{2,\infty} statistical guarantees. The Annals of Statistics, 49(2):944–967, 2021.
  • [10] T Tony Cai and Anru Zhang. Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. The Annals of Statistics, 46(1):60–89, 2018.
  • [11] Joshua Cape, Minh Tang, and Carey E Priebe. The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. The Annals of Statistics, 47(5):2405–2439, 2019.
  • [12] Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, et al. Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14(5):566–806, 2021.
  • [13] Mihai Cucuringu. Sync-Rank: Robust ranking, constrained ranking and rank aggregation via eigenvector and SDP synchronization. IEEE Transactions on Network Science and Engineering, 3(1):58–79, 2016.
  • [14] Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
  • [15] Jianqing Fan, Weichen Wang, and Yiqiao Zhong. An \ell_{\infty} eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research, 18(207):1–42, 2018.
  • [16] Yifeng Fan, Yuehaw Khoo, and Zhizhen Zhao. Joint community detection and rotational synchronization via semidefinite programming. SIAM Journal on Mathematics of Data Science, 4(3):1052–1081, 2022.
  • [17] Frank Filbir, Felix Krahmer, and Oleh Melnyk. On recovery guarantees for angular synchronization. Journal of Fourier Analysis and Applications, 27(2):31, 2021.
  • [18] Chao Gao and Anderson Y Zhang. Exact minimax estimation for phase synchronization. IEEE Transactions on Information Theory, 67(12):8236–8247, 2021.
  • [19] Chao Gao and Anderson Y Zhang. SDP achieves exact minimax optimality in phase synchronization. IEEE Transactions on Information Theory, 2022.
  • [20] Chao Gao and Anderson Y Zhang. Optimal orthogonal group synchronization and rotation group synchronization. Information and Inference: A Journal of the IMA, 12(2):591–632, 2023.
  • [21] Mark A Iwen, Brian Preskitt, Rayan Saab, and Aditya Viswanathan. Phase retrieval from local measurements: Improved robustness via eigenvector-based angular synchronization. Applied and Computational Harmonic Analysis, 48(1):415–444, 2020.
  • [22] Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi. Phase transitions in semidefinite relaxations. Proceedings of the National Academy of Sciences, 113(16):E2218–E2223, 2016.
  • [23] Lihua Lei. Unified 2\ell_{2\rightarrow\infty} eigenspace perturbation theory for symmetric random matrices. arXiv preprint arXiv:1909.04798, 2019.
  • [24] Marc Lelarge and Léo Miolane. Fundamental limits of symmetric low-rank matrix estimation. Probability Theory and Related Fields, 173:859–929, 2019.
  • [25] Gilad Lerman and Yunpeng Shi. Robust group synchronization via cycle-edge message passing. Foundations of Computational Mathematics, 22(6):1665–1741, 2022.
  • [26] Shuyang Ling. Improved performance guarantees for orthogonal group synchronization via generalized power method. SIAM Journal on Optimization, 32(2):1018–1048, 2022.
  • [27] Shuyang Ling. Near-optimal performance bounds for orthogonal and permutation group synchronization via spectral methods. Applied and Computational Harmonic Analysis, 60:20–52, 2022.
  • [28] Shuyang Ling. Solving orthogonal group synchronization via convex and low-rank optimization: Tightness and landscape analysis. Mathematical Programming, 200(1):589–628, 2023.
  • [29] Huikang Liu, Man-Chung Yue, and Anthony Man-Cho So. On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM Journal on Optimization, 27(4):2426–2446, 2017.
  • [30] Amelia Perry, Alexander S Wein, Afonso S Bandeira, and Ankur Moitra. Optimality and sub-optimality of PCA for spiked random matrices and synchronization. arXiv preprint arXiv:1609.05573, 2016.
  • [31] Amelia Perry, Alexander S Wein, Afonso S Bandeira, and Ankur Moitra. Message-passing algorithms for synchronization problems over compact groups. Communications on Pure and Applied Mathematics, 71(11):2275–2322, 2018.
  • [32] Brian P Preskitt. Phase retrieval from locally supported measurements. University of California, San Diego, 2018.
  • [33] Elad Romanov and Matan Gavish. The noise-sensitivity phase transition in spectral group synchronization over compact groups. Applied and Computational Harmonic Analysis, 49(3):935–970, 2020.
  • [34] Yanyao Shen, Qixing Huang, Nati Srebro, and Sujay Sanghavi. Normalized spectral map synchronization. Advances in Neural Information Processing Systems, 29, 2016.
  • [35] Yunpeng Shi and Gilad Lerman. Message passing least squares framework and its application to rotation synchronization. In International Conference on Machine Learning, pages 8796–8806. PMLR, 2020.
  • [36] Amit Singer. Angular synchronization by eigenvectors and semidefinite programming. Applied and Computational Harmonic Analysis, 30(1):20–36, 2011.
  • [37] Amit Singer and Yoel Shkolnisky. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. SIAM Journal on Imaging Sciences, 4(2):543–572, 2011.
  • [38] Lanhui Wang and Amit Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference: A Journal of the IMA, 2(2):145–193, 2013.
  • [39] Yiqiao Zhong and Nicolas Boumal. Near-optimal bounds for phase synchronization. SIAM Journal on Optimization, 28(2):989–1016, 2018.
  • [40] Linglingzhi Zhu, Jinxin Wang, and Anthony Man-Cho So. Orthogonal group synchronization with incomplete measurements: Error bounds and linear convergence of the generalized power method. arXiv preprint arXiv:2112.06556, 2021.

Appendix A Proofs of Lemma 3, Proposition 3, and Proposition 4

Proof of Lemma 3.

Similar to the proof of Lemma 1, we can show that each eigenvalue of A is also an eigenvalue of (A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}}, with multiplicity d. Conversely, each eigenvalue of (A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}} must be an eigenvalue of A. The details are omitted here. ∎
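As a numerical illustration of Lemma 3 (a hypothetical small instance; the sizes and random draws below are ours), the following sketch verifies that the spectrum of (A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}} consists of the eigenvalues of A, each repeated d times, using the factorization \text{blkdiag}(Z^{*}_{1},\ldots,Z^{*}_{n})(A\otimes I_{d})\text{blkdiag}(Z^{*}_{1},\ldots,Z^{*}_{n})^{\mathrm{\scriptscriptstyle T}}.

import numpy as np

rng = np.random.default_rng(3)
n, d = 12, 3
A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = A + A.T                                    # adjacency matrix, zero diagonal
Z = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]   # Z_1*, ..., Z_n*

# Block (i, j) of (A (x) J_d) o Z* Z*^T equals A_ij Z_i* Z_j*^T.
M = np.zeros((n * d, n * d))
for i in range(n):
    for j in range(n):
        M[i*d:(i+1)*d, j*d:(j+1)*d] = A[i, j] * Z[i] @ Z[j].T

eig_M = np.sort(np.linalg.eigvalsh(M))
eig_A = np.sort(np.repeat(np.linalg.eigvalsh(A), d))
assert np.allclose(eig_M, eig_A)               # spectrum of A, each with multiplicity d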

Proof of Proposition 3.

Since σ=0\sigma=0, we have U=UU=U^{*}. Then Z^j=𝒫(Uj)=𝒫(Uj)=𝒫(Zjuwidecheckj)\widehat{Z}_{j}=\mathcal{P}(U_{j})=\mathcal{P}(U^{*}_{j})=\mathcal{P}(Z^{*}_{j}\widecheck{u}_{j}). Since ZjZ^{*}_{j} is an orthogonal matrix, we have Z^j=Zjsign(uwidecheckj)\widehat{Z}_{j}=Z^{*}_{j}\text{sign}(\widecheck{u}_{j}). Then by (16), the proposition is proved by the same argument used to prove Proposition 1. ∎

Before proving Proposition 4, we state some properties of AA and 𝒲\mathcal{W}. The following lemma can be seen as an analog of Lemma 8.

Lemma 14.

There exist constants C1,C2>0C_{1},C_{2}>0 such that if nplogn>C1\frac{np}{\log n}>C_{1}, then we have

(AJd)𝒲C2dnp,\displaystyle\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq C_{2}\sqrt{dnp},
i=1nj[n]\{i}Aij(ZiT𝒲ijZjZjT𝒲jiZi)F22d(d1)n2p(1+C2lognn),\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\left(Z_{i}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ij}Z_{j}^{*}-Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ji}Z_{i}^{*}\right)}\right\|_{\rm F}^{2}\leq 2d(d-1)n^{2}p\left(1+C_{2}\sqrt{\frac{\log n}{n}}\right),
i=1nj[n]\{i}Aij𝒲ijZjF2d2n2p(1+C2lognn),\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\mathcal{W}_{ij}Z_{j}^{*}}\right\|_{\rm F}^{2}\leq d^{2}n^{2}p\left(1+C_{2}\sqrt{\frac{\log n}{n}}\right),

hold with probability at least 13n101-3n^{-10}.

Proof.

The first inequality is from Lemma 4.2 of [20]. The second and third inequalities follow from (59) and (60) of [20], respectively, combined with Lemma 4.3 of [20]. ∎

Proof of Proposition 4.

By Lemma 8 and Lemma 14, there exist constants c_{1},c_{2}>0 such that when \frac{np}{\log n}>c_{1}, we have \left\|{A-\mathbb{E}A}\right\|\leq c_{2}\sqrt{np} and \left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq c_{2}\sqrt{dnp} with probability at least 1-6n^{-10}. By Lemma 3 and Lemma 7, we have \lambda^{*}_{1}=\lambda^{*}_{d}\geq(n-1)p-c_{2}\sqrt{np}, \max\{|\lambda^{*}_{d+1}|,|\lambda^{*}_{n}|\}\leq p+c_{2}\sqrt{np}, and \lambda^{*}_{d}-\lambda_{d+1}^{*}\geq np-2c_{2}\sqrt{np}. Note that d is a constant. When \frac{np}{\log n} and \frac{np}{\sigma^{2}} are greater than some sufficiently large constant, 4\sigma\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq np/2\leq\min\{\lambda^{*}_{d},\lambda^{*}_{d}-\lambda_{d+1}^{*}\} is satisfied. Since \mathcal{X}-(A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}}=\sigma(A\otimes J_{d})\circ\mathcal{W}, a direct application of Lemma 4 leads to

infO𝒪(d)UU~O\displaystyle\inf_{O\in\mathcal{O}(d)}\left\|{U-\widetilde{U}O}\right\|
823(λ1λd+1)((43(λ1λd+1)+2λ1)σ2(AJd)𝒲2\displaystyle\leq\frac{8\sqrt{2}}{3(\lambda_{1}^{*}-\lambda_{d+1}^{*})}\Bigg{(}\left(\frac{4}{3(\lambda_{1}^{*}-\lambda_{d+1}^{*})}+\frac{2}{\lambda_{1}^{*}}\right)\sigma^{2}\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|^{2}
+max{|λd+1|,|λn|}λ1σ(AJd)𝒲)\displaystyle\quad+\frac{\max\{|\lambda^{*}_{d+1}|,|\lambda^{*}_{n}|\}}{\lambda_{1}^{*}}\sigma\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\Bigg{)}
\displaystyle\leq\frac{8\sqrt{2}}{3(np/2)}\left(\left(\frac{4}{3(np/2)}+\frac{2}{np/2}\right)\sigma^{2}c_{2}^{2}dnp+\frac{p+c_{2}\sqrt{np}}{np/2}\sigma c_{2}\sqrt{dnp}\right)
c3σ2d+σdnp,\displaystyle\leq c_{3}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np},

for some constant c3>0c_{3}>0. ∎

Appendix B Proof of Theorem 4

We first state two useful technical lemmas; they are analogs of Lemma 11 and Lemma 12, respectively. Lemma 15 is proved in (31) of [20].

Lemma 15.

There exists some constant C>0 such that for any \rho that satisfies \frac{\rho^{2}np}{d^{2}\sigma^{2}}\geq C, we have

i=1n𝕀{2σnpj[n]\{i}Aij𝒲ijZj>ρ}σ2ρ2pexp(ρ2npσ2),\displaystyle\sum_{i=1}^{n}\mathbb{I}\left\{\frac{2\sigma}{np}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\mathcal{W}_{ij}Z_{j}^{*}}\right\|>\rho\right\}\leq\frac{\sigma^{2}}{\rho^{2}p}\exp\left(-\sqrt{\frac{\rho^{2}np}{\sigma^{2}}}\right),

with probability at least 1exp(ρ2npσ2)1-\exp\left(-\sqrt{\frac{\rho^{2}np}{\sigma^{2}}}\right).

Lemma 16 (Lemma 2.1 of [20]).

Let X,X~d×dX,\widetilde{X}\in\mathbb{R}^{d\times d} be two matrices of full rank. Then,

𝒫(X)𝒫(X~)F2smin(X)+smin(X~)XX~F.\left\|{\mathcal{P}(X)-\mathcal{P}(\widetilde{X})}\right\|_{\rm F}\leq\frac{2}{s_{\min}(X)+s_{\min}(\widetilde{X})}\left\|{X-\widetilde{X}}\right\|_{\rm F}.
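For intuition, here is a minimal numpy sketch of the map \mathcal{P} (the polar factor from the SVD), with a numerical check of the Lipschitz bound of Lemma 16 and of the two properties of \mathcal{P} used in the proof of Theorem 4 below; the instance is synthetic and ours.

import numpy as np

def polar(X):
    """P(X) = G H^T from the SVD X = G S H^T."""
    G, _, Ht = np.linalg.svd(X)
    return G @ Ht

rng = np.random.default_rng(4)
d = 4
X = rng.standard_normal((d, d)) + 3 * np.eye(d)        # full rank with high probability
X_t = X + 0.2 * rng.standard_normal((d, d))

smin = lambda B: np.linalg.svd(B, compute_uv=False)[-1]
lhs = np.linalg.norm(polar(X) - polar(X_t), "fro")
rhs = 2.0 / (smin(X) + smin(X_t)) * np.linalg.norm(X - X_t, "fro")
assert lhs <= rhs + 1e-10                              # the Lemma 16 bound

# Two properties of P used in the proof of Theorem 4:
F = np.linalg.qr(rng.standard_normal((d, d)))[0]
assert np.allclose(polar(X @ F), polar(X) @ F)         # P(BF) = P(B) F
B = X @ X.T + np.eye(d)                                # symmetric positive-definite
assert np.allclose(polar(B), np.eye(d))                # P(B) = I_d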
Proof of Theorem 4.

Let O𝒪(d)O\in\mathcal{O}(d) satisfy UU~O=infO𝒪(d)UU~O\|U-\widetilde{U}O\|=\inf_{O^{\prime}\in\mathcal{O}(d)}\|{U-\widetilde{U}O^{\prime}}\|. Define Δ:=UU~Ond×d\Delta:=U-\widetilde{U}O\in\mathbb{R}^{nd\times d}. Recall uwidecheck\widecheck{u} is the leading eigenvector of AA. From Proposition 1, Proposition 4, Lemma 8, and Lemma 14, there exist constants c1,c2>0c_{1},c_{2}>0 such that if nplogn,npσ2>c1\frac{np}{\log n},\frac{np}{\sigma^{2}}>c_{1}, we have

Δ\displaystyle\left\|{\Delta}\right\| c2σ2d+σdnp,\displaystyle\leq c_{2}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np}, (55)
maxj[n]|uwidecheckj1nb2|\displaystyle\max_{j\in[n]}\left|\widecheck{u}_{j}-\frac{1}{\sqrt{n}}b_{2}\right| c2(lognnp+1log(np))1n,\displaystyle\leq c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\frac{1}{\sqrt{n}}, (56)
A𝔼A\displaystyle\left\|{A-\mathbb{E}A}\right\| c2np,\displaystyle\leq c_{2}\sqrt{np}, (57)
(AJd)𝒲\displaystyle\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\| c2npd,\displaystyle\leq c_{2}\sqrt{npd}, (58)
i=1nj[n]\{i}Aij(ZiT𝒲ijZjZjT𝒲jiZi)F2\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\left(Z_{i}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ij}Z_{j}^{*}-Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\mathcal{W}_{ji}Z_{i}^{*}\right)}\right\|_{\rm F}^{2} 2d(d1)n2p(1+c2lognn),\displaystyle\leq 2d(d-1)n^{2}p\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right), (59)
i=1nj[n]\{i}Aij𝒲ijZjF2\displaystyle\sum_{i=1}^{n}\left\|{\sum_{j\in[n]\backslash\{i\}}A_{ij}\mathcal{W}_{ij}Z_{j}^{*}}\right\|_{\rm F}^{2} d2n2p(1+c2lognn),\displaystyle\leq d^{2}n^{2}p\left(1+c_{2}\sqrt{\frac{\log n}{n}}\right), (60)

with probability at least 1n91-n^{-9}, for some b2{1,1}b_{2}\in\{-1,1\}. By Lemma 3 and Lemma 7, we have λ1=λd\lambda^{*}_{1}=\lambda^{*}_{d}, |λd(n1)p|c2np|\lambda_{d}^{*}-(n-1)p|\leq c_{2}\sqrt{np}, |λd+1|p+c2np\left|\lambda^{*}_{d+1}\right|\leq p+c_{2}\sqrt{np}, and λdλd+1np2c2np\lambda^{*}_{d}-\lambda_{d+1}^{*}\geq np-2c_{2}\sqrt{np}.

Using the same argument as for (50) and (51) in the proof of Lemma 4, we can obtain an explicit expression for \widetilde{U}. Recall the definition of \widetilde{U} in (22). Let \mathcal{X}U^{*}=GDN^{\mathrm{\scriptscriptstyle T}} be its SVD, where G\in\mathcal{O}(nd,d), N\in\mathcal{O}(d), and D\in\mathbb{R}^{d\times d} is a diagonal matrix containing the singular values. By the decomposition (21), we have

𝒳U=((AJd)ZZT)U+σ((AJd)𝒲)U=λ1U+σ((AJd)𝒲)U.\displaystyle\mathcal{X}U^{*}=((A\otimes J_{d})\circ Z^{*}Z^{*{\mathrm{\scriptscriptstyle T}}})U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})U^{*}=\lambda_{1}^{*}U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})U^{*}. (61)

Since the diagonal entries of DD correspond to the leading singular values of 𝒳U\mathcal{X}U^{*}, Weyl’s inequality leads to maxj[d]|Djjλ1|σ(AJd)𝒲c2σdnp.\max_{j\in[d]}|D_{jj}-\lambda_{1}^{*}|\leq\sigma\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|\leq c_{2}\sigma\sqrt{dnp}. Denote

t:=p+c2np+c2σdnp.\displaystyle t:=p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}. (62)

We then have

maxj[d]|Djjnp|p+t.\displaystyle\max_{j\in[d]}|D_{jj}-np|\leq p+t. (63)

When nplogn,npdσ2\frac{np}{\log n},\frac{np}{d\sigma^{2}} are greater than some sufficiently large constant, we have np/2λ1np/2\leq\lambda^{*}_{1} and np/2Djj3np/2np/2\leq D_{jj}\leq 3np/2 for all j[d]j\in[d]. As a consequence, all the diagonal entries of DD are positive. Then U~\widetilde{U} can be written as

U~=𝒳US,\displaystyle\widetilde{U}=\mathcal{X}U^{*}S,

where

S:=ND1NTd×d.\displaystyle S:=ND^{-1}N^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{d\times d}. (64)

Then (63) leads to

1npIdS=1npIdD11npt1np2t(np)2,\displaystyle\left\|{\frac{1}{np}I_{d}-S}\right\|=\left\|{\frac{1}{np}I_{d}-D^{-1}}\right\|\leq\frac{1}{np-t}-\frac{1}{np}\leq\frac{2t}{(np)^{2}}, (65)

and

S=D12np.\displaystyle\left\|{S}\right\|=\left\|{D^{-1}}\right\|\leq\frac{2}{np}. (66)

Using (61), we have the following decomposition for UU:

U=U~O+Δ=𝒳USO+Δ=(λ1U+σ((AJd)𝒲)U)SO+Δ.\displaystyle U=\widetilde{U}O+\Delta=\mathcal{X}U^{*}SO+\Delta=\left(\lambda_{1}^{*}U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})U^{*}\right)SO+\Delta.

Recall the definition of UU^{*} in (14). Define Δ:=U1nZb2\Delta^{*}:=U^{*}-\frac{1}{\sqrt{n}}Z^{*}b_{2}. When nplogn2c2\frac{np}{\log n}\geq 2c_{2}^{*}, by the same argument used to derive (39) as in the proof of Theorem 3, we have

\displaystyle\left\|{\Delta^{*}}\right\| =\left\|{Z^{*}\circ\left(\widecheck{u}\otimes\mathds{1}_{d}-\frac{1}{\sqrt{n}}\mathds{1}_{n}\otimes\mathds{1}_{d}b_{2}\right)}\right\|=\left\|{\widecheck{u}\otimes\mathds{1}_{d}-\frac{1}{\sqrt{n}}\mathds{1}_{n}\otimes\mathds{1}_{d}b_{2}}\right\|=\sqrt{d}\left\|{\widecheck{u}-\frac{1}{\sqrt{n}}\mathds{1}_{n}b_{2}}\right\|
\displaystyle\leq\frac{2c_{2}\sqrt{np}+2p}{np}\sqrt{d}. (67)

Then UU can be further decomposed into

U=(λ1U+σ((AJd)𝒲)(1nZb2+Δ))SO+Δ.\displaystyle U=\left(\lambda_{1}^{*}U^{*}+\sigma((A\otimes J_{d})\circ\mathcal{W})\left(\frac{1}{\sqrt{n}}Z^{*}b_{2}+\Delta^{*}\right)\right)SO+\Delta.

For any j\in[n], let [(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\in\mathbb{R}^{d\times nd} denote the submatrix of (A\otimes J_{d})\circ\mathcal{W} formed by its ((j-1)d+1)th through (jd)th rows. Note that SO\in\mathbb{R}^{d\times d}. Then U_{j} admits the expression:

Uj\displaystyle U_{j} =(λ1Uj+σn[(AJd)𝒲]jZb2+σ[(AJd)𝒲]jΔ)SO+Δj\displaystyle=\left(\lambda_{1}^{*}U_{j}^{*}+\frac{\sigma}{\sqrt{n}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}Z^{*}b_{2}+\sigma[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)SO+\Delta_{j}
=(λ1Zjuwidecheckj+σnkjAjk𝒲jkZkb2+σ[(AJd)𝒲]jΔ)SO+Δj,\displaystyle=\left(\lambda_{1}^{*}Z_{j}^{*}\widecheck{u}_{j}+\frac{\sigma}{\sqrt{n}}\sum_{k\neq j}A_{jk}\mathcal{W}_{jk}Z^{*}_{k}b_{2}+\sigma[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)SO+\Delta_{j},

where \Delta_{j}\in\mathbb{R}^{d\times d} denotes the jth d\times d submatrix of \Delta, formed by its ((j-1)d+1)th through (jd)th rows.

Note that the mapping \mathcal{P} has the following properties. For any B\in\mathbb{R}^{d\times d} of full rank and any F\in\mathcal{O}(d), we have \mathcal{P}(BF)=\mathcal{P}(B)F. In addition, if B is symmetric and positive-definite, then \mathcal{P}(B)=I_{d}. Since we have shown that the diagonal entries of D are all lower bounded by np/2, (64) leads to \mathcal{P}(S)=I_{d}. Then

Z^jZjOb2F=𝒫(Uj)ZjOb2F=𝒫(ZjTUjOTb2)IdF.\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}=\left\|{\mathcal{P}(U_{j})-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}=\left\|{\mathcal{P}(Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2})-I_{d}}\right\|_{\rm F}.

We have

ZjTUjOTb2=(λ1uwidecheckjb2Id+σnΞj+σb2ZjT[(AJd)𝒲]jΔ)S+ZjTΔjOTb2\displaystyle Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}=\left(\lambda_{1}^{*}\widecheck{u}_{j}b_{2}I_{d}+\frac{\sigma}{\sqrt{n}}\Xi_{j}+\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)S+Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}

where

Ξj:=kjAjkZjT𝒲jkZk.\displaystyle\Xi_{j}:=\sum_{k\neq j}A_{jk}Z^{*{\mathrm{\scriptscriptstyle T}}}_{j}\mathcal{W}_{jk}Z^{*}_{k}.

Note that from (56), we have

b2uwidecheckj(1c2(lognnp+1log(np)))1n.\displaystyle b_{2}\widecheck{u}_{j}\geq\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{1}{\sqrt{n}}.

As long as nplogn\frac{np}{\log n} is greater than some sufficiently large constant, we have b2uwidecheckj12n.b_{2}\widecheck{u}_{j}\geq\frac{1}{2\sqrt{n}}. Since λ1\lambda_{1}^{*} is also positive, we have

ZjTUjOTb2λ1uwidecheckjb2=S+Tj\displaystyle\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}=S+T_{j} (68)

where TjT_{j} is defined as

Tj\displaystyle T_{j} :=1λ1uwidecheckjb2((σnΞj+σb2ZjT[(AJd)𝒲]jΔ)S+ZjTΔjOTb2)\displaystyle:=\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\left(\left(\frac{\sigma}{\sqrt{n}}\Xi_{j}+\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}\right)S+Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}\right)
=1λ1uwidecheckjb2σnΞjS+σb2ZjT[(AJd)𝒲]jΔSλ1uwidecheckjb2+ZjTΔjOTb2λ1uwidecheckjb2.\displaystyle=\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{\sigma}{\sqrt{n}}\Xi_{j}S+\frac{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}+\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}.

As a consequence, when det(Uj)0\det(U_{j})\neq 0, we have

Z^jZjOb2F=𝒫(ZjTUjOTb2λ1uwidecheckjb2)IdF\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}=\left\|{\mathcal{P}\left(\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}U_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\right)-I_{d}}\right\|_{\rm F} =𝒫(S+Tj)IdF.\displaystyle=\left\|{\mathcal{P}\left(S+T_{j}\right)-I_{d}}\right\|_{\rm F}. (69)

Let 0<\gamma,\rho<1/8 be constants whose values will be determined later. To simplify \|\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}\|_{\rm F}, consider the following two cases.

(1) If

1λ1uwidecheckjb2σnΞjS\displaystyle\left\|{\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{\sigma}{\sqrt{n}}\Xi_{j}S}\right\| γnp\displaystyle\leq\frac{\gamma}{np} (70)
σb2ZjT[(AJd)𝒲]jΔSλ1uwidecheckjb2\displaystyle\left\|{\frac{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\| ρnp\displaystyle\leq\frac{\rho}{np}
ZjTΔjOTb2λ1uwidecheckjb2\displaystyle\left\|{\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\| ρnp\displaystyle\leq\frac{\rho}{np} (71)

all hold, then

smin(S+Tj)\displaystyle s_{\min}(S+T_{j}) smin(S)Tj=smin(D1)Tj=D111Tj\displaystyle\geq s_{\min}(S)-\left\|{T_{j}}\right\|=s_{\min}(D^{-1})-\left\|{T_{j}}\right\|=D_{11}^{-1}-\left\|{T_{j}}\right\|
D111γ+2ρnp,\displaystyle\geq D_{11}^{-1}-\frac{\gamma+2\rho}{np},

which is greater than 0 by (63). Together with (68), we have \det(U_{j})\neq 0. The same lower bound holds for s_{\min}(S+(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2). Moreover, since S is positive-definite with \lambda_{\min}(S)=D_{11}^{-1} and \left\|{(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2}\right\|\leq\left\|{T_{j}}\right\|\leq\frac{\gamma+2\rho}{np}<D_{11}^{-1}, the symmetric matrix S+(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2 is positive-definite, and hence \mathcal{P}(S+(T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}})/2)=I_{d}. By Lemma 16 and (69), we have

Z^jZjOb2F\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}
=𝒫(S+Tj)𝒫(S+Tj+TjT2)F\displaystyle=\left\|{\mathcal{P}\left(S+T_{j}\right)-\mathcal{P}\left(S+\frac{T_{j}+T_{j}^{\mathrm{\scriptscriptstyle T}}}{2}\right)}\right\|_{\rm F}
1(D111γ+2ρnp)TjTjT2F\displaystyle\leq\frac{1}{\left(D_{11}^{-1}-\frac{\gamma+2\rho}{np}\right)}\left\|{\frac{T_{j}-T_{j}^{\mathrm{\scriptscriptstyle T}}}{2}}\right\|_{\rm F}
1λ1uwidecheckjb212(D111γ+2ρnp)(σnΞjSSTΞjTF+2σb2ZjT[(AJd)𝒲]jΔSF\displaystyle\leq\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{1}{2\left(D_{11}^{-1}-\frac{\gamma+2\rho}{np}\right)}\Bigg{(}\frac{\sigma}{\sqrt{n}}\left\|{\Xi_{j}S-S^{\mathrm{\scriptscriptstyle T}}\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}+2\left\|{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}\right\|_{\rm F}
+2ZjTΔjOTb2F).\displaystyle\quad+2\left\|{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}\right\|_{\rm F}\Bigg{)}.

We can further simplify the first term in the display above. We have

ΞjSSTΞjTF\displaystyle\left\|{\Xi_{j}S-S^{\mathrm{\scriptscriptstyle T}}\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F} =1np(ΞjΞjT)Ξj(1npIdS)+(1npIdST)ΞjTF\displaystyle=\left\|{\frac{1}{np}\left(\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}\right)-\Xi_{j}\left(\frac{1}{np}I_{d}-S\right)+(\frac{1}{np}I_{d}-S^{\mathrm{\scriptscriptstyle T}})\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}
1npΞjΞjTF+21npIdSΞjF.\displaystyle\leq\frac{1}{np}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}+2\left\|{\frac{1}{np}I_{d}-S}\right\|\left\|{\Xi_{j}}\right\|_{\rm F}.

Using (65) and (66), we have

Z^jZjOb2F\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F} 1λ1uwidecheckjb212(D111γ+2ρnp)(σn1npΞjΞjTF+σnt(np)2ΞjF\displaystyle\leq\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{1}{2\left(D_{11}^{-1}-\frac{\gamma+2\rho}{np}\right)}\Bigg{(}\frac{\sigma}{\sqrt{n}}\frac{1}{np}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}+\frac{\sigma}{\sqrt{n}}\frac{t}{(np)^{2}}\left\|{\Xi_{j}}\right\|_{\rm F}
+4npσ[(AJd)𝒲]jΔF+2ΔjF).\displaystyle\quad+\frac{4}{np}\sigma\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}+2\left\|{\Delta_{j}}\right\|_{\rm F}\Bigg{)}.

Using the lower bounds for λ1\lambda_{1}^{*}, uwidecheckjb2\widecheck{u}_{j}b_{2}, and D111D_{11}^{-1}, as given at the beginning of this proof, we have

Z^jZjOb2F\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}
1(nppc2np)(1c2(lognnp+1log(np)))(1np+tγ+2ρnp)σ2npΞjΞjTF\displaystyle\leq\frac{1}{\left(np-p-c_{2}\sqrt{np}\right)\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)}\frac{\sigma}{2np}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}
+4σt(np)2ΞjF+16σnnp[(AJd)𝒲]jΔF+16nΔjF.\displaystyle\quad+\frac{4\sigma t}{(np)^{2}}\left\|{\Xi_{j}}\right\|_{\rm F}+\frac{16\sigma\sqrt{n}}{np}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}+16\sqrt{n}\left\|{\Delta_{j}}\right\|_{\rm F}.

Let \eta>0 be a constant whose value will be given later. By the same argument as used in the proof of Theorem 3, we have

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2σ24(np)2ΞjΞjTF2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}\frac{\sigma^{2}}{4(np)^{2}}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)4ΞjF2+3(1+η1)256σ2n(np)2[(AJd)𝒲]jΔF2\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+3(1+\eta^{-1})\frac{256\sigma^{2}n}{(np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+3(1+η1)64nΔjF2.\displaystyle\quad+3(1+\eta^{-1})64n\left\|{\Delta_{j}}\right\|_{\rm F}^{2}.

(2) If any one of (70)-(71) does not hold, we simply upper bound \|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\|_{\rm F} by 2\sqrt{d}. This case can then be written as

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
4d(𝕀{1λ1uwidecheckjb2σnΞjS>γnp}+𝕀{σb2ZjT[(AJd)𝒲]jΔSλ1uwidecheckjb2>ρnp}\displaystyle\leq 4d\Bigg{(}{\mathbb{I}\left\{{\left\|{\frac{1}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}\frac{\sigma}{\sqrt{n}}\Xi_{j}S}\right\|>\frac{\gamma}{np}}\right\}}+{\mathbb{I}\left\{{\left\|{\frac{\sigma b_{2}Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}S}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\|>\frac{\rho}{np}}\right\}}
+𝕀{ZjTΔjOTb2λ1uwidecheckjb2>ρnp}).\displaystyle\quad+{\mathbb{I}\left\{{\left\|{\frac{Z_{j}^{*{\mathrm{\scriptscriptstyle T}}}\Delta_{j}O^{\mathrm{\scriptscriptstyle T}}b_{2}}{\lambda_{1}^{*}\widecheck{u}_{j}b_{2}}}\right\|>\frac{\rho}{np}}\right\}}\Bigg{)}.

Using (66), λ1np/2\lambda_{1}^{*}\geq np/2, and uwidecheckjb21/(2n)\widecheck{u}_{j}b_{2}\geq 1/(2\sqrt{n}), we have

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
4d(𝕀{8σΞjγnp}+𝕀{8nσ[(AJd)𝒲]jΔρnp}+𝕀{4nΔjρ})\displaystyle\leq 4d\left({\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}+{\mathbb{I}\left\{{8\sqrt{n}\sigma\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|\geq\rho np}\right\}}+{\mathbb{I}\left\{{4\sqrt{n}\left\|{\Delta_{j}}\right\|\geq\rho}\right\}}\right)
4d(𝕀{8σΞjγnp}+64σ2n(ρnp)2[(AJd)𝒲]jΔF2+16nρ2ΔjF2).\displaystyle\leq 4d\left({\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}+\frac{64\sigma^{2}n}{(\rho np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}+16n\rho^{-2}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}\right).

Combining the two cases, we have

Z^jZjOb2F2\displaystyle\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2σ24(np)2ΞjΞjTF2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}\frac{\sigma^{2}}{4(np)^{2}}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)4ΞjF2+3(1+η1)256σ2n(np)2[(AJd)𝒲]jΔF2\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+3(1+\eta^{-1})\frac{256\sigma^{2}n}{(np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+3(1+η1)64nΔjF2\displaystyle\quad+3(1+\eta^{-1})64n\left\|{\Delta_{j}}\right\|_{\rm F}^{2}
+4d(𝕀{8σΞjγnp}+64σ2n(ρnp)2[(AJd)𝒲]jΔF2+16nρ2ΔjF2)\displaystyle\quad+4d\left({\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}+\frac{64\sigma^{2}n}{(\rho np)^{2}}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}+16n\rho^{-2}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}\right)
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2σ24(np)2ΞjΞjTF2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}\frac{\sigma^{2}}{4(np)^{2}}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)4ΞjF2+4d𝕀{8σΞjγnp}\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+4d{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}
+256σ2n(np)2(3(1+η1)+dρ2)[(AJd)𝒲]jΔF2\displaystyle\quad+\frac{256\sigma^{2}n}{(np)^{2}}\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+64n(3(1+η1)+dρ2)ΔjF2.\displaystyle\quad+64n\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left\|{\Delta_{j}}\right\|_{\rm F}^{2}.

As a result, we have

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*})
1nj[n]Z^jZjOb2F2\displaystyle\leq\frac{1}{n}\sum_{j\in[n]}\left\|{\widehat{Z}_{j}-Z^{*}_{j}Ob_{2}}\right\|_{\rm F}^{2}
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}
×σ24(np)21nj[n]ΞjΞjTF2\displaystyle\quad\times\frac{\sigma^{2}}{4(np)^{2}}\frac{1}{n}\sum_{j\in[n]}\left\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\right\|_{\rm F}^{2}
+3(1+η1)16σ2t2(np)41nj[n]ΞjF2+4d1nj[n]𝕀{8σΞjγnp}\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}\frac{1}{n}\sum_{j\in[n]}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}+4d\frac{1}{n}\sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}
+256σ2(np)2(3(1+η1)+dρ2)j[n][(AJd)𝒲]jΔF2\displaystyle\quad+\frac{256\sigma^{2}}{(np)^{2}}\left(3(1+\eta^{-1})+d\rho^{-2}\right)\sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}
+64(3(1+η1)+dρ2)j[n]ΔjF2.\displaystyle\quad+64\left(3(1+\eta^{-1})+d\rho^{-2}\right)\sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}.

In the rest of the proof, we simplify the display above by upper bounding \sum_{j\in[n]}\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\|_{\rm F}^{2}, \sum_{j\in[n]}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}, \sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}, \sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}, and \sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2} in turn.

For j[n]ΞjΞjTF2\sum_{j\in[n]}\|{\Xi_{j}-\Xi_{j}^{\mathrm{\scriptscriptstyle T}}}\|_{\rm F}^{2} and j[n]ΞjF2\sum_{j\in[n]}\left\|{\Xi_{j}}\right\|_{\rm F}^{2}, note that they are the left-hand sides of (59) and (60), respectively. Hence, they can be upper bounded by the right-hand sides of (59) and (60), respectively. For j[n]𝕀{8σΞjγnp}\sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}}, according to Lemma 15, if γ2npd2σ2>c3\frac{\gamma^{2}np}{d^{2}\sigma^{2}}>c_{3} for some c3>0c_{3}>0, we have

j[n]𝕀{8σΞjγnp}\displaystyle\sum_{j\in[n]}{\mathbb{I}\left\{{8\sigma\left\|{\Xi_{j}}\right\|\geq\gamma np}\right\}} 16σ2γ2pexp(γ2np16σ2)\displaystyle\leq\frac{16\sigma^{2}}{\gamma^{2}p}\exp\left(-\sqrt{\frac{\gamma^{2}np}{16\sigma^{2}}}\right)

with probability at least 1exp(γ2np16σ2)1-\exp\left(-\sqrt{\frac{\gamma^{2}np}{16\sigma^{2}}}\right). When c3c_{3} is sufficiently large, it follows that

16σ2γ2npexp(γ2np16σ2)(σ2γ2np)3\displaystyle\frac{16\sigma^{2}}{\gamma^{2}np}\exp\left(-\sqrt{\frac{\gamma^{2}np}{16\sigma^{2}}}\right)\leq\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3}

by the same argument as in the proof of Theorem 3. For j[n][(AJd)𝒲]jΔF2\sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2}, we have

\displaystyle\sum_{j\in[n]}\left\|{[(A\otimes J_{d})\circ\mathcal{W}]_{j\cdot}\Delta^{*}}\right\|_{\rm F}^{2} =\left\|{\left((A\otimes J_{d})\circ\mathcal{W}\right)\Delta^{*}}\right\|_{\rm F}^{2}
(AJd)𝒲2ΔF2\displaystyle\leq\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|^{2}\left\|{\Delta^{*}}\right\|_{\rm F}^{2}
d(AJd)𝒲2Δ2\displaystyle\leq d\left\|{(A\otimes J_{d})\circ\mathcal{W}}\right\|^{2}\left\|{\Delta^{*}}\right\|^{2}
c2d(dnp2c2np+2pnpd)2,\displaystyle\leq c_{2}d\left(\sqrt{dnp}\frac{2c_{2}\sqrt{np}+2p}{np}\sqrt{d}\right)^{2},

where in the second to last inequality we use the fact that Δ\Delta^{*} is rank-dd and in the last inequality we use (67). For j[n]ΔjF2\sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}, we have j[n]ΔjF2=ΔF2dΔ2d(c2σ2d+σdnp)2\sum_{j\in[n]}\left\|{\Delta_{j}}\right\|_{\rm F}^{2}=\left\|{\Delta}\right\|_{\rm F}^{2}\leq d\left\|{\Delta}\right\|^{2}\leq d\left(c_{2}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np}\right)^{2} where the last inequality is due to (55).

Using the above results, we have

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*})
1+η(nppc2np)2(1c2(lognnp+1log(np)))2(1np+tγ+2ρnp)2\displaystyle\leq\frac{1+\eta}{\left(np-p-c_{2}\sqrt{np}\right)^{2}\left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{2}\left(\frac{1}{np+t}-\frac{\gamma+2\rho}{np}\right)^{2}}
×σ24(np)22d(d1)np(1+c2lognn)\displaystyle\quad\times\frac{\sigma^{2}}{4(np)^{2}}2d(d-1)np\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)
+3(1+η1)16σ2t2(np)4d2np(1+c2lognn)+4d(σ2γ2np)3\displaystyle\quad+3(1+\eta^{-1})\frac{16\sigma^{2}t^{2}}{(np)^{4}}d^{2}np\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)+4d\left(\frac{\sigma^{2}}{\gamma^{2}np}\right)^{3}
+256σ2(np)2(3(1+η1)+dρ2)c2d(dnp2c2np+2pnpd)2\displaystyle\quad+\frac{256\sigma^{2}}{(np)^{2}}\left(3(1+\eta^{-1})+d\rho^{-2}\right)c_{2}d\left(\sqrt{dnp}\frac{2c_{2}\sqrt{np}+2p}{np}\sqrt{d}\right)^{2}
+64(3(1+η1)+dρ2)d(c2σ2d+σdnp)2.\displaystyle\quad+64\left(3(1+\eta^{-1})+d\rho^{-2}\right)d\left(c_{2}\frac{\sigma^{2}d+\sigma\sqrt{d}}{np}\right)^{2}.

Note that \frac{1}{(1-x)^{2}}\leq 1+16x for any 0\leq x\leq\frac{1}{2}. When \frac{np}{\log n} is greater than some sufficiently large constant, we have \left(1-c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)^{-2}\leq 1+16c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right) and \left(1-c_{2}\frac{1}{\sqrt{np}}-\frac{1}{n}\right)^{-2}\leq 1+16\left(c_{2}\frac{1}{\sqrt{np}}+\frac{1}{n}\right). When \frac{np}{d\sigma^{2}} is also greater than some sufficiently large constant, we have \left(\frac{np}{np+t}-\gamma-2\rho\right)^{-2}\leq 1+16\left(\frac{t}{np+t}+\gamma+2\rho\right)\leq 1+16\left(\frac{t}{np}+\gamma+2\rho\right)\leq 1+16\left(\frac{p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}}{np}+\gamma+2\rho\right), using the definition of t in (62). We then have

\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*})
\displaystyle\leq(1+\eta)\left(1+16\left(c_{2}\frac{1}{\sqrt{np}}+\frac{1}{n}\right)\right)\left(1+16c_{2}\left(\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\left(1+16\left(\frac{p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}}{np}+\gamma+2\rho\right)\right)
\displaystyle\quad\times\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)\frac{d(d-1)\sigma^{2}}{2np}
\displaystyle\quad+3(1+\eta^{-1})\left(\frac{p+c_{2}\sqrt{np}+c_{2}\sigma\sqrt{dnp}}{np}\right)^{2}\left(1+c_{2}^{\prime}\sqrt{\frac{\log n}{n}}\right)\frac{16}{np}\frac{d^{2}\sigma^{2}}{np}
\displaystyle\quad+4\gamma^{-6}\left(\frac{\sigma^{2}}{np}\right)^{2}\frac{d\sigma^{2}}{np}+256c_{2}\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left(\frac{2c_{2}}{\sqrt{np}}+\frac{2}{n\sqrt{np}}\right)^{2}\frac{d^{2}\sigma^{2}}{np}
\displaystyle\quad+64\left(3(1+\eta^{-1})+d\rho^{-2}\right)\left(c_{2}\frac{\sigma\sqrt{d}+1}{\sqrt{np}}\right)^{2}\frac{d^{2}\sigma^{2}}{np}.

After rearrangement, there exists some constant c5>0c_{5}>0 such that

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*}) (1+c5(η+γ+ρ+lognnp+1log(np)+γ6(σ2np)2+dσ2np\displaystyle\leq\Bigg{(}1+c_{5}\Bigg{(}\eta+\gamma+\rho+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}+\gamma^{-6}\left(\frac{\sigma^{2}}{np}\right)^{2}+\sqrt{\frac{d\sigma^{2}}{np}}
+(η1+dρ2)(1+dσ2np)))d(d1)σ22np.\displaystyle\quad+\left(\eta^{-1}+d\rho^{-2}\right)\left(\frac{1+d\sigma^{2}}{np}\right)\Bigg{)}\Bigg{)}\frac{d(d-1)\sigma^{2}}{2np}.

We can take \gamma^{2}=\sqrt{d^{2}\sigma^{2}/np} (then \frac{\gamma^{2}np}{d^{2}\sigma^{2}}>c_{3} is guaranteed as long as \frac{np}{d^{2}\sigma^{2}}>c_{3}^{2}). We also take \rho^{2}=\sqrt{(d+d\sigma^{2})/np} and let \eta=\rho^{2}. These choices are guaranteed to be smaller than 1/8 when \frac{np}{d} and \frac{np}{d^{2}\sigma^{2}} are greater than some sufficiently large constant. Then, there exists some constant c_{6}>0 such that

od(Z^,Z)\displaystyle\ell^{\text{od}}(\widehat{Z},Z^{*}) (1+c5((d+dσ2np)12+(d2σ2np)14+(d+dσ2np)14+lognnp+1log(np)\displaystyle\leq\Bigg{(}1+c_{5}\Bigg{(}\left(\frac{d+d\sigma^{2}}{np}\right)^{\frac{1}{2}}+\left(\frac{d^{2}\sigma^{2}}{np}\right)^{\frac{1}{4}}+\left(\frac{d+d\sigma^{2}}{np}\right)^{\frac{1}{4}}+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}
+d3(σ2np)12+dσ2np+(1+d)npd+dσ2(1+dσ2np)))d(d1)σ22np\displaystyle\quad+d^{-3}\left(\frac{\sigma^{2}}{np}\right)^{\frac{1}{2}}+\sqrt{\frac{d\sigma^{2}}{np}}+(1+d)\sqrt{\frac{np}{d+d\sigma^{2}}}\left(\frac{1+d\sigma^{2}}{np}\right)\Bigg{)}\Bigg{)}\frac{d(d-1)\sigma^{2}}{2np}
(1+c6((d+d2σ2np)14+lognnp+1log(np)))d(d1)σ22np.\displaystyle\leq\left(1+c_{6}\left(\left(\frac{d+d^{2}\sigma^{2}}{np}\right)^{\frac{1}{4}}+\sqrt{\frac{\log n}{np}}+\frac{1}{\log(np)}\right)\right)\frac{d(d-1)\sigma^{2}}{2np}.

This holds with probability at least 1-n^{-9}-\exp\left(-\frac{1}{32}\left(\frac{np}{\sigma^{2}}\right)^{\frac{1}{4}}\right). ∎
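To illustrate Theorem 4, the following simulation sketch (ours, with illustrative parameter choices; not the paper's code) runs the spectral method on data generated from the model and compares the empirical loss, computed with a single global alignment over \mathcal{O}(d) in the spirit of \ell^{\text{od}}, with the rate d(d-1)\sigma^{2}/(2np).

import numpy as np

rng = np.random.default_rng(5)
n, d, p, sigma = 300, 3, 0.5, 0.5              # illustrative parameters

def polar(B):
    G, _, Ht = np.linalg.svd(B)
    return G @ Ht

Z = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]   # ground truth
X = np.zeros((n * d, n * d))
for j in range(n):
    for k in range(j + 1, n):
        if rng.random() < p:                   # each block observed with probability p
            blk = Z[j] @ Z[k].T + sigma * rng.standard_normal((d, d))
            X[j*d:(j+1)*d, k*d:(k+1)*d] = blk
            X[k*d:(k+1)*d, j*d:(j+1)*d] = blk.T

U = np.linalg.eigh(X)[1][:, ::-1][:, :d]       # leading d-dimensional eigenspace
Z_hat = [polar(U[j*d:(j+1)*d, :]) for j in range(n)]

# Global alignment O minimizing sum_j ||Zhat_j - Z_j O||_F^2 over O(d).
O = polar(sum(Z[j].T @ Z_hat[j] for j in range(n)))
loss = np.mean([np.linalg.norm(Z_hat[j] - Z[j] @ O, "fro") ** 2 for j in range(n)])
print(loss, d * (d - 1) * sigma**2 / (2 * n * p))   # empirical loss vs minimax rate

For parameters in the regime of the theorem, the two printed numbers should be of comparable size, in line with the (1+o(1)) factor in the risk bound.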

Appendix C Calculation for (18)

Recall the definitions of YY^{*} and YY in (17). First, we are going to show vv, the leading eigenvector of YY, must be a linear combination of e1e_{1} and e2e_{2}. Note that for any unit vector x=(x1,,xn)Tnx=(x_{1},\ldots,x_{n})^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{n}, we have

xTYx\displaystyle x^{\mathrm{\scriptscriptstyle T}}Yx =xTYx+xT(YY)x=(2jnxj2)+δ2(x1+x2)2=1+x12+δ2(x1+x2)2.\displaystyle=x^{\mathrm{\scriptscriptstyle T}}Y^{*}x+x^{\mathrm{\scriptscriptstyle T}}(Y-Y^{*})x=\left(-\sum_{2\leq j\leq n}x_{j}^{2}\right)+\frac{\delta}{2}(x_{1}+x_{2})^{2}=-1+x_{1}^{2}+\frac{\delta}{2}(x_{1}+x_{2})^{2}.

If x maximizes the right-hand side over the unit sphere, then neither x_{1} nor x_{2} can be 0. In addition, x_{1}x_{2}\geq 0 and x_{1}^{2}+x_{2}^{2}=1 must be satisfied; otherwise the right-hand side could be made strictly larger. Hence, after a global sign change if necessary, we can write v=\alpha e_{1}+\sqrt{1-\alpha^{2}}e_{2} where \alpha\in(0,1). Since Yv=\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})e_{1}+\left(\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})-\sqrt{1-\alpha^{2}}\right)e_{2}, we have

αδ2(α+1α2)=1α2(δ2(α+1α2)1α2).\displaystyle\frac{\alpha}{\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})}=\frac{\sqrt{1-\alpha^{2}}}{\left(\frac{\delta}{2}(\alpha+\sqrt{1-\alpha^{2}})-\sqrt{1-\alpha^{2}}\right)}.

After rearrangement, this gives δ(2α21)=2α1α2\delta(2\alpha^{2}-1)=2\alpha\sqrt{1-\alpha^{2}} which means α2>12\alpha^{2}>\frac{1}{2}. Squaring it yields the equation 4(1+δ2)α44(1+δ2)α2+δ2=04(1+\delta^{2})\alpha^{4}-4(1+\delta^{2})\alpha^{2}+\delta^{2}=0 whose solution is α2=12(1±11+δ2)\alpha^{2}=\frac{1}{2}\left(1\pm\frac{1}{\sqrt{1+\delta^{2}}}\right). Since α2>12\alpha^{2}>\frac{1}{2}, we have α2=12(1+11+δ2)\alpha^{2}=\frac{1}{2}\left(1+\frac{1}{\sqrt{1+\delta^{2}}}\right). Hence,

v=\displaystyle v= 12(1+11+δ2)e1+12(111+δ2)e2.\displaystyle\sqrt{\frac{1}{2}\left(1+\frac{1}{\sqrt{1+\delta^{2}}}\right)}e_{1}+\sqrt{\frac{1}{2}\left(1-\frac{1}{\sqrt{1+\delta^{2}}}\right)}e_{2}.

We can verify that this is indeed the eigenvector of Y corresponding to its leading eigenvalue \frac{1}{2}({\delta+\sqrt{1+\delta^{2}}-1}).
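The calculation can be double-checked numerically; below is a short sketch (ours, with a synthetic choice of \delta) confirming that v is an eigenvector of Y with eigenvalue \frac{1}{2}(\delta+\sqrt{1+\delta^{2}}-1), and that this eigenvalue is the leading one.

import numpy as np

n, delta = 8, 0.7                              # synthetic size and signal strength
Y_star = -np.diag(np.r_[0.0, np.ones(n - 1)])  # Y* = -diag(0, 1, ..., 1)
e12 = np.zeros(n); e12[:2] = 1.0
Y = Y_star + (delta / 2) * np.outer(e12, e12)  # Y - Y* = (delta/2)(e1+e2)(e1+e2)^T

s = 1.0 / np.sqrt(1 + delta**2)
v = np.zeros(n)
v[0] = np.sqrt((1 + s) / 2)
v[1] = np.sqrt((1 - s) / 2)
lam = (delta + np.sqrt(1 + delta**2) - 1) / 2

assert np.allclose(Y @ v, lam * v)             # v is an eigenvector with eigenvalue lam
assert np.isclose(np.linalg.eigvalsh(Y)[-1], lam)   # and lam is the leading eigenvalue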