
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections

Tong Deng, Tianming Wang
T. Deng and T. Wang are with the School of Mathematics, Southwestern University of Finance and Economics, Chengdu, Sichuan, China.
Abstract

We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to localize point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theory and nonconvex approaches to the robust principal component analysis (RPCA) problem, we propose an alternating projection based algorithm, further accelerated by a tangent space projection technique. For the proposed algorithm, if the outliers are sparse enough, we establish linear convergence of the reconstructed points to the original points after centering and rotation alignment. Numerical experiments verify the state-of-the-art performance of the proposed algorithm.

1 Introduction

Multi-dimensional scaling (MDS) refers to the localization of points from their pairwise distances, and it has applications in wireless communication, chemistry, computer graphics, and so on [7]. The properties of classic MDS have been well studied.

In reality, the measured pairwise distances may be incomplete, noisy, or, due to sensor malfunction, corrupted by outliers. In this paper, we consider the robust reconstruction of point locations from distances corrupted by outliers, i.e., the robust multi-dimensional scaling (RMDS) problem. As shown in [5], even a single outlier in the measured distances causes false information to spread to all the reconstructed point locations in the MDS calculation.

Suppose that we have a set of points {xi}i=1nr\{x_{i}\}_{i=1}^{n}\subseteq\mathbb{R}^{r}. Denote the squared Euclidean distance matrix (EDM) as DD^{\star}. The (i,j)(i,j)-th entry of DD^{\star} is equal to xixj22\|x_{i}-x_{j}\|_{2}^{2}. If the pairwise distances are corrupted by outliers, the observed EDM DD is equal to D+SD^{\star}+S^{\star} for some outlier matrix SS^{\star}. One can see that the considered problem may be solved by methods for robust principal component analysis (RPCA) [4, 13, 16, 3]. However, naively adopting an RPCA solver for the RMDS problem risks neglecting the inner structure of the EDM, and may result in degraded reconstruction performance [10].

To combat outliers in the EDM, [8] casts the problem as an unconstrained optimization, where the variables are the data matrix of size n×rn\times r and the outlier matrix of size n×nn\times n. The data fidelity term of the objective is the squared Frobenius norm of the difference between the observed distance matrix and the distance matrix formed from the data matrix plus the outlier matrix. The regularization term is the l1l_{1} penalty on the outlier matrix, and the optimization is carried out via majorization-minimization. The iterative scheme of [8] was later improved in [12], which replaces the squared Frobenius norm with more outlier-robust M-estimators. A more recent work [9] formulates the RMDS problem as a constrained optimization: besides the outlier matrix, another sought-after matrix is constrained to be the distance matrix formed by nn points of dimension at most rr. Compared to [8, 12], [9] shows improved performance, and weights and bounds on the entries of the distance matrix can also be handled easily. Departing from the optimization perspective of the aforementioned works, [1] proposes to first detect outliers via the broken triangle inequalities among the distances, and then compute a weighted MDS that excludes the detected corrupted distances. However, as shown in [9], such an approach is often not enough to yield satisfactory reconstruction performance.

Inspired by classic MDS theories, and state-of-the-art nonconvex solvers [13, 3] for the RPCA problem, we propose a nonconvex algorithm that alternates between estimating the outlier matrix and the Gram matrix that generates the outlier-free EDM. Our contributions can be summarized as follows.

  • For the proposed algorithm, we establish linear convergence of the reconstructed points to the original points after centering and rotation alignment, provided that the outliers are sparse enough. The relationship between the number of tolerable outliers and the properties of the original points is also made explicit in the theoretical guarantees.

  • We numerically verify the performance of the proposed algorithm in the considered outlier-only setting, and demonstrate its advantages compared to other state-of-the-art solvers in the noise-plus-outlier setting.

1.1 Problem Formulation and Assumptions

Some necessary notations are first introduced. We then describe our problem formulation and the assumptions used to derive the theoretical guarantees.

Notations. The set of symmetric matrices of size nn is denoted by 𝕊n×n\mathbb{S}^{n\times n}. The set of rotation matrices of size rr is denoted by 𝒪(r)\mathcal{O}(r), i.e., 𝒪(r)={Gr×r|GTG=Ir}\mathcal{O}(r)=\{G\in\mathbb{R}^{r\times r}~{}|~{}G^{T}G=I_{r}\}. Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n}, supp(Z)\operatorname{supp}(Z) is the indices of the nonzeros in ZZ, and diag(Z)n\operatorname{diag}(Z)\in\mathbb{R}^{n} is the vector that contains the diagonal entries of ZZ. 𝟏n\bm{1}\in\mathbb{R}^{n} denotes the all-one vector, and J=In1n𝟏𝟏TJ=I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T}. Zn×r\forall Z\in\mathbb{R}^{n\times r}, Z2,:=max1ineiTZ2.\|Z\|_{2,\infty}:=\max_{1\leq i\leq n}\|e_{i}^{T}Z\|_{2}. For any matrix MM, the spectral norm, the Frobenius norm, and the entrywise infinity norm of MM are denoted by M2\|M\|_{2}, MF\|M\|_{\mathrm{F}}, and M\|M\|_{\infty}, respectively.

Recall that the set of points is {xi}i=1nr\{x_{i}\}_{i=1}^{n}\subseteq\mathbb{R}^{r}, and the corresponding EDM is DD^{\star}. As it turns out, it is possible to recover a set of points that is centered at the origin and preserves the pairwise distances of {xi}i=1n\{x_{i}\}_{i=1}^{n}. Denote the data matrix as Xn×rX\in\mathbb{R}^{n\times r}, whose ii-th row is xiTx_{i}^{T}. Let c=1ni=1nxic=\frac{1}{n}\sum_{i=1}^{n}x_{i}, and then denote the centered data matrix as Xcn×rX_{c}\in\mathbb{R}^{n\times r}, whose ii-th row is (xic)T(x_{i}-c)^{T}. Define the operator 𝒜:𝕊n×n𝕊n×n\mathcal{A}:\mathbb{S}^{n\times n}\rightarrow\mathbb{S}^{n\times n} such that

𝒜(Z)=diag(Z)𝟏T+𝟏diag(Z)T2Z,Z𝕊n×n.\mathcal{A}(Z)=\text{diag}(Z)\bm{1}^{T}+\bm{1}\text{diag}(Z)^{T}-2Z,\quad\forall Z\in\mathbb{S}^{n\times n}. (1)

One can verify that D=𝒜(XXT)=𝒜(XcXcT)D^{\star}=\mathcal{A}(XX^{T})=\mathcal{A}(X_{c}X_{c}^{T}). Hence, in many applications, it often suffices to reconstruct XcX_{c} from DD^{\star}.
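As a concrete illustration, the operator 𝒜\mathcal{A} and the identity D=𝒜(XXT)=𝒜(XcXcT)D^{\star}=\mathcal{A}(XX^{T})=\mathcal{A}(X_{c}X_{c}^{T}) can be checked numerically; below is a minimal numpy sketch, with randomly generated points standing in for the data.

```python
import numpy as np

def edm_from_gram(Z):
    """The operator A in (1): diag(Z) 1^T + 1 diag(Z)^T - 2 Z."""
    d = np.diag(Z)
    return d[:, None] + d[None, :] - 2.0 * Z

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))       # illustrative points, not from the paper
Xc = X - X.mean(axis=0)               # centered data matrix
D1 = edm_from_gram(X @ X.T)
D2 = edm_from_gram(Xc @ Xc.T)
assert np.allclose(D1, D2)            # A(X X^T) = A(Xc Xc^T)
assert np.isclose(D1[1, 4], np.sum((X[1] - X[4]) ** 2))  # squared distances
```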

Denote L:=XcXcTL^{\star}:=X_{c}X_{c}^{T}, and define the operator :𝕊n×n𝕊n×n\mathcal{B}:\mathbb{S}^{n\times n}\rightarrow\mathbb{S}^{n\times n} such that

(Z)=12JZJ,Z𝕊n×n.\mathcal{B}(Z)=-\frac{1}{2}\cdot JZJ,\quad\forall Z\in\mathbb{S}^{n\times n}. (2)

Since L𝟏=0L^{\star}\bm{1}=0, by Lemma 5, (D)=(𝒜(L))=L\mathcal{B}(D^{\star})=\mathcal{B}(\mathcal{A}(L^{\star}))=L^{\star}. Furthermore, XcX_{c} can be reconstructed, after rotation alignment, from the eigen-decomposition of LL^{\star}. Without loss of generality, we assume that XcX_{c} has full column rank; then LL^{\star} is a rank-rr positive semi-definite matrix. Denote the eigen-decomposition of LL^{\star} as UΛ(U)TU^{\star}\Lambda^{\star}(U^{\star})^{T}, where Un×rU^{\star}\in\mathbb{R}^{n\times r}, Λ=diag(λ1,,λr)\Lambda^{\star}=\operatorname{diag}(\lambda_{1}^{\star},\cdots,\lambda_{r}^{\star}), and λ1λr>0\lambda_{1}^{\star}\geq\cdots\geq\lambda_{r}^{\star}>0. Then denote X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}. One can show that there exists a rotation matrix Q𝒪(r)Q^{\star}\in\mathcal{O}(r) such that Xc=XQX_{c}=X^{\star}Q^{\star}.
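The reconstruction route just described (double centering via \mathcal{B}, truncated eigen-decomposition, then X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}) is classical MDS; the following is a minimal numpy sketch, with random points as a stand-in for the data.

```python
import numpy as np

def classical_mds(D, r):
    """Recover centered points from a clean squared EDM via L = B(D) = -J D J / 2."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    L = -0.5 * J @ D @ J
    vals, vecs = np.linalg.eigh(L)                 # ascending eigenvalue order
    vals, vecs = vals[::-1][:r], vecs[:, ::-1][:, :r]
    return vecs * np.sqrt(np.maximum(vals, 0.0))   # X = U Lambda^{1/2}

rng = np.random.default_rng(1)
Xc = rng.standard_normal((8, 2))
Xc -= Xc.mean(axis=0)
D = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
X_rec = classical_mds(D, r=2)
# X_rec equals Xc up to an orthogonal transform: the Gram matrices coincide
assert np.allclose(X_rec @ X_rec.T, Xc @ Xc.T)
```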

In the considered setting, the observed EDM DD is equal to D+SD^{\star}+S^{\star} for some outlier matrix S𝕊n×nS^{\star}\in\mathbb{S}^{n\times n}. Our aim is to recover LL^{\star} (eventually, XcX_{c}) from D=𝒜(L)+SD=\mathcal{A}(L^{\star})+S^{\star}. To derive our theoretical guarantees, similar to the incoherence and sparsity assumptions commonly made in the RPCA literature [13, 3], we assume the following about LL^{\star} and SS^{\star}.

Assumption 1.

Suppose LL^{\star} has the eigen-decomposition UΛ(U)TU^{\star}\Lambda^{\star}(U^{\star})^{T}, where Un×rU^{\star}\in\mathbb{R}^{n\times r}, Λ=diag(λ1,,λr)\Lambda^{\star}=\operatorname{diag}(\lambda_{1}^{\star},\cdots,\lambda_{r}^{\star}), and λ1λr>0\lambda_{1}^{\star}\geq\cdots\geq\lambda_{r}^{\star}>0. We assume that LL^{\star} is μ\mu-incoherent (one can show that μ[1,nr]\mu\in[1,\frac{n}{r}]; typically, μ=O(1)\mu=O(1)), i.e.,

U2,μrn.\|U^{\star}\|_{2,\infty}\leq\sqrt{\frac{\mu r}{n}}.

From this assumption, one can immediately get

\|L^{\star}\|_{\infty}=\max_{i,j}|e_{i}^{T}U^{\star}\Lambda^{\star}(U^{\star})^{T}e_{j}|\leq\frac{\mu r}{n}\lambda_{1}^{\star}.

Assumption 2.

The outlier matrix S𝕊n×nS^{\star}\in\mathbb{S}^{n\times n} is α\alpha-sparse, i.e., it has no more than αn\alpha n nonzero entries per row (and column).

2 Algorithm

Our proposed algorithm, described in Algorithm 1, is inspired by classic MDS theories, and state-of-the-art nonconvex solvers [13, 3] for RPCA.

Algorithm 1 RMDS via Accelerated Alternating Projections (RMDS-AAP)
1:Inputs: EDM DD, target rank rr, threshold parameter ξ0>0\xi^{0}>0, and decay rate γ(0,1)\gamma\in(0,1).
2:Initialization: S0=𝒯ξ0(D)S^{0}=\mathcal{T}_{\xi^{0}}(D), L1=r+(DS0)L^{1}=\mathcal{H}_{r}^{+}\mathcal{B}(D-S^{0}).
3:for k=1,2,k=1,2,\cdots do
4:     Sk=𝒯ξk(D𝒜(Lk))S^{k}=\mathcal{T}_{\xi^{k}}(D-\mathcal{A}(L^{k})), where ξk=ξ0γk\xi^{k}=\xi^{0}\cdot\gamma^{k}
5:     Lk+1=r+𝒫Tk(DSk)L^{k+1}=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D-S^{k})
6:end for

At initialization, with the hard thresholding function 𝒯ξ(z)(ξ>0):\mathcal{T}_{\xi}(z)~{}(\xi>0):\mathbb{R}\rightarrow\mathbb{R} defined as

𝒯ξ(z)={z|z|>ξ0|z|ξ,\mathcal{T}_{\xi}(z)=\left\{\begin{array}[]{cc}z&|z|>\xi\\ 0&|z|\leq\xi\end{array}\right.,

large entries corrupted by outliers in DD are picked out. Then in the spirit of MDS, L1L^{1} is computed from DS0D-S^{0}. Here, the operator \mathcal{B} is defined as in (2), and Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n} with the eigen-decomposition UΛUTU\Lambda U^{T}, where U=[u1,,un]n×nU=[u_{1},\cdots,u_{n}]\in\mathbb{R}^{n\times n}, Λ=diag(λ1,,λn)\Lambda=\operatorname{diag}(\lambda_{1},\cdots,\lambda_{n}), and λ1λn\lambda_{1}\geq\cdots\geq\lambda_{n},

r+(Z)=i=1rmax{λi,0}uiuiT.\mathcal{H}_{r}^{+}(Z)=\sum_{i=1}^{r}\max\{\lambda_{i},0\}\cdot u_{i}u_{i}^{T}.

For later iterations (k1k\geq 1), the threshold parameter ξk\xi^{k} is adjusted by the decay rate γ\gamma to pick out the entries corrupted by outliers. Denoting the eigen-decomposition of LkL^{k} as UkΛk(Uk)TU^{k}\Lambda^{k}(U^{k})^{T}, where Ukn×rU^{k}\in\mathbb{R}^{n\times r}, the operator 𝒫Tk\mathcal{P}_{T^{k}}, defined as follows,

𝒫Tk(Z):=Uk(Uk)TZ+ZUk(Uk)TUk(Uk)TZUk(Uk)T,Z𝕊n×n,\mathcal{P}_{T^{k}}(Z):=U^{k}(U^{k})^{T}Z+ZU^{k}(U^{k})^{T}-U^{k}(U^{k})^{T}ZU^{k}(U^{k})^{T},~{}\forall Z\in\mathbb{S}^{n\times n}, (3)

is the projection onto the tangent space of the manifold of symmetric positive semi-definite matrices of rank rr at LkL^{k}. Applying such a tangent space projection before the partial eigen-decomposition has proven useful both in deriving theoretical guarantees and in reducing the computational cost [15, 3, 2].
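For concreteness, the projection (3) can be written in a few lines; the sketch below also checks that 𝒫T\mathcal{P}_{T} is idempotent, as a projection should be (random matrices stand in for the iterates).

```python
import numpy as np

def proj_tangent(U, Z):
    """P_T(Z) = UU^T Z + Z UU^T - UU^T Z UU^T, for U with orthonormal columns."""
    P = U @ U.T
    return P @ Z + Z @ P - P @ Z @ P

rng = np.random.default_rng(0)
n, r = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
Z = rng.standard_normal((n, n)); Z = Z + Z.T
PZ = proj_tangent(U, Z)
assert np.allclose(proj_tangent(U, PZ), PZ)   # idempotent
L = U @ np.diag(np.arange(1.0, r + 1)) @ U.T
assert np.allclose(proj_tangent(U, L), L)     # L itself lies in the tangent space
```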

For RMDS-AAP, the dominant cost is the computation of LkL^{k} in each iteration. At initialization, the flop count is about 5n2+O(n2r)5n^{2}+O(n^{2}r), where the hidden constant comes from the partial eigen-decomposition. For later iterations, with the help of the tangent space projection, the partial eigen-decomposition of an n×nn\times n matrix is reduced to several matrix-matrix multiplications that cost about 5n2+2n2r+8nr25n^{2}+2n^{2}r+8nr^{2} flops, a QR factorization of an n×rn\times r matrix, and a small eigen-decomposition of a 2r×2r2r\times 2r matrix. The total cost is about 5n2+2n2r+O(nr2)5n^{2}+2n^{2}r+O(nr^{2}) flops. For completeness, we include the implementation details in Appendix A. Compared to [3], the main difference comes from applying \mathcal{B} before the tangent space projection.
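A compact reference implementation of Algorithm 1 is sketched below. It is written for readability: each step recomputes a partial eigen-decomposition with numpy instead of the factored fast update of Appendix A, so it matches the iteration but not the flop counts above.

```python
import numpy as np

def rmds_aap(D, r, xi0, gamma, iters=50):
    """Readability-first sketch of Algorithm 1 (RMDS-AAP)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n

    def A(Z):                                   # operator (1)
        d = np.diag(Z)
        return d[:, None] + d[None, :] - 2.0 * Z

    def B(Z):                                   # operator (2)
        return -0.5 * J @ Z @ J

    def H_r_plus(Z):                            # keep top-r nonnegative eigenpairs
        vals, vecs = np.linalg.eigh(Z)
        vals, vecs = vals[::-1][:r], vecs[:, ::-1][:, :r]
        return (vecs * np.maximum(vals, 0.0)) @ vecs.T, vecs

    S = np.where(np.abs(D) > xi0, D, 0.0)       # S^0 = T_{xi^0}(D)
    L, U = H_r_plus(B(D - S))                   # L^1
    for k in range(1, iters + 1):
        xi = xi0 * gamma ** k                   # xi^k = xi^0 * gamma^k
        R = D - A(L)
        S = np.where(np.abs(R) > xi, R, 0.0)    # S^k = T_{xi^k}(D - A(L^k))
        P = U @ U.T                             # tangent space projection (3)
        Z = B(D - S)
        L, U = H_r_plus(P @ Z + Z @ P - P @ Z @ P)
    return L, S

# sanity check: with no outliers, the clean Gram matrix is a fixed point
rng = np.random.default_rng(0)
Xc = rng.standard_normal((20, 2)); Xc -= Xc.mean(axis=0)
L_star = Xc @ Xc.T
d = np.diag(L_star)
D_star = d[:, None] + d[None, :] - 2.0 * L_star
L, S = rmds_aap(D_star, r=2, xi0=1.2 * D_star.max(), gamma=0.5, iters=10)
assert np.allclose(L, L_star)
```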

For the theoretical guarantees of RMDS-AAP, we have derived the following results.

Theorem 1.

Suppose that RMDS-AAP is provided with ξ0\xi^{0} that satisfies Dξ03D\|D^{\star}\|_{\infty}\leq\xi^{0}\leq 3\|D^{\star}\|_{\infty}, and γ[13,1)\gamma\in[\frac{1}{3},1). Denote κ:=λ1λr\kappa:=\frac{\lambda_{1}^{\star}}{\lambda_{r}^{\star}}. If

α11624γμrκ2,\alpha\leq\frac{1}{1624}\cdot\frac{\gamma}{\mu r\kappa^{2}},

then for all k0k\geq 0, supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}),

SkS(4D)γk,\|S^{k}-S^{\star}\|_{\infty}\leq(4\|D^{\star}\|_{\infty})\gamma^{k},

and

Lk+1LD4γk+1.\|L^{k+1}-L^{\star}\|_{\infty}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k+1}.

Remark 1. The constant in the bound for α\alpha can be further optimized. The established bound O(1μrκ2)O(\frac{1}{\mu r\kappa^{2}}) is better than the bound O(1μr2κ3)O(\frac{1}{\mu r^{2}\kappa^{3}}) in [3] for the RPCA problem, and the bound O(1μ2r2κ2)O(\frac{1}{\mu^{2}r^{2}\kappa^{2}}) in [2] for the robust recovery of low-rank Hankel matrices, demonstrating the merit of our analysis. When κ=O(1)\kappa=O(1), it matches the optimal bound O(1μr)O(\frac{1}{\mu r}) in [13] for the RPCA problem.

Proposition 1.

Assume the same conditions as in Theorem 1. For all k0k\geq 0, further denote Xk+1=Uk+1(Λk+1)12X^{k+1}=U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}, and X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}. Suppose the rank-rr SVD of (X)TXk+1\left(X^{\star}\right)^{T}X^{k+1} is Yk+1Σ~k+1(Zk+1)TY^{k+1}\widetilde{\Sigma}^{k+1}(Z^{k+1})^{T}. Set Rk+1=Yk+1(Zk+1)TR^{k+1}=Y^{k+1}(Z^{k+1})^{T}. Then

Xk+1XRk+12,μrκλ1nγk+1.\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}\leq\sqrt{\frac{\mu r\kappa\lambda_{1}^{\star}}{n}}\gamma^{k+1}.

Remark 2. As mentioned in the problem formulation, Xc=XQX_{c}=X^{\star}Q^{\star} for some rotation matrix QQ^{\star}. Furthermore, one can show that the computed minimizer to

minG𝒪(r)Xk+1XcGF=minG𝒪(r)Xk+1(XQ)GF\min_{G\in\mathcal{O}(r)}\|X^{k+1}-X_{c}G\|_{F}=\min_{G\in\mathcal{O}(r)}\|X^{k+1}-(X^{\star}Q^{\star})G\|_{F}

is (Q)TRk+1(Q^{\star})^{T}R^{k+1}. Therefore, Xk+1XRk+1=Xk+1Xc(Q)TRk+1X^{k+1}-X^{\star}R^{k+1}=X^{k+1}-X_{c}(Q^{\star})^{T}R^{k+1}, and Proposition 1 actually establishes the linear convergence of the reconstructed points Xk+1X^{k+1} to XcX_{c} after the best rotation alignment by (Q)TRk+1(Q^{\star})^{T}R^{k+1}. Also, the contraction of error in the l2l_{2} norm is uniform for all the points.
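The rotation alignment appearing in Proposition 1 and Remark 2 is the classic orthogonal Procrustes solution, computed from an SVD. A minimal sketch, on randomly generated stand-in matrices:

```python
import numpy as np

def best_rotation(A, B):
    """Orthogonal G minimizing ||B - A G||_F, from the SVD of A^T B (Procrustes)."""
    Y, _, Zt = np.linalg.svd(A.T @ B)
    return Y @ Zt

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a stand-in orthogonal transform
G = best_rotation(X, X @ Q)
assert np.allclose(G, Q)                           # the planted transform is recovered
```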

3 Numerical Experiments

We perform tests on the plus sign example that appeared in [8, 9]. The tests are first conducted in the noiseless case, i.e., the distances are only corrupted by outliers, to verify our theoretical guarantees for RMDS-AAP. We then consider the more realistic case in which the distances are corrupted by both noise and outliers, and show the empirical performance of RMDS-AAP.

Noiseless Case. In this case, we consider a plus sign consisting of 101101 2D points {xi}i=1101\{x_{i}\}_{i=1}^{101}, which are centered at c=[66]Tc=[6~{}6]^{T} and have four end points at [196]T[-19~{}6]^{T}, [316]T[31~{}6]^{T}, [619]T[6~{}-19]^{T}, and [631]T[6~{}31]^{T}. The data matrix X101×2X\in\mathbb{R}^{101\times 2} has xiTx_{i}^{T} as its ii-th row, and Xc=X𝟏cTX_{c}=X-\bm{1}c^{T}. The ground truth Gram matrix L=XcXcT101×101L^{\star}=X_{c}X_{c}^{T}\in\mathbb{R}^{101\times 101} has incoherence parameter μ3\mu\approx 3 and condition number κ=1\kappa=1. To test the reconstruction performance of RMDS-AAP against outliers, mm out of the N:=101×1002=5050N:=\frac{101\times 100}{2}=5050 distances {dij:=xixj2|1j<in}\left\{d_{ij}:=\|x_{i}-x_{j}\|_{2}~{}|~{}1\leq j<i\leq n\right\} are randomly sampled, and each is corrupted by an outlier whose value is uniformly drawn from [040][0~{}40]. Denote the percentage of outliers as p:=mNp:=\frac{m}{N}. For p{0.05,0.1,0.15,,0.6}p\in\{0.05,0.1,0.15,\cdots,0.6\}, we test RMDS-AAP with different initial threshold values ξ0\xi^{0} and decay rates γ\gamma. The results, averaged over 1000 simulations, are reported in Fig. 1.

Figure 1: Performances of RMDS-AAP for the plus sign (101 points), where the distances are only corrupted by outliers.

In the left subfigure, the logarithms of the averaged errors in terms of log10(SkS)\log_{10}(\|S^{k}-S^{\star}\|_{\infty}), log10(Lk+1L)\log_{10}(\|L^{k+1}-L^{\star}\|_{\infty}), and log10(Xk+1XRk+1)\log_{10}(\|X^{k+1}-X^{\star}R^{k+1}\|_{\infty}) are plotted when p=0.05p=0.05, ξ0=1.2D\xi^{0}=1.2\|D^{\star}\|_{\infty}, and γ=0.5\gamma=0.5. One can see the linear convergence of the three terms, in agreement with Theorem 1 and Proposition 1. In the middle and right subfigures, the reconstruction of each simulation is considered successful if, at convergence,

Xk+1XRk+12,<0.01X2,.\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}<0.01\cdot\|X^{\star}\|_{2,\infty}.

In the middle subfigure, ξ0\xi^{0} is set to 1.2D1.2\|D^{\star}\|_{\infty}, and the success rates computed over the 1000 simulations are compared for γ{0.5,0.7,0.9}\gamma\in\{0.5,0.7,0.9\}. As predicted by Theorem 1, larger γ\gamma indeed admits successful reconstruction from more outliers. In the right subfigure, the success rates for γ=0.9\gamma=0.9 and ξ0{D,1.1D,,1.5D}\xi^{0}\in\{\|D^{\star}\|_{\infty},1.1\|D^{\star}\|_{\infty},\cdots,1.5\|D^{\star}\|_{\infty}\} are shown, where white indicates a success rate of 1 and black a success rate of 0. One can see that RMDS-AAP shows the desired robustness to the initial threshold value ξ0\xi^{0}.

Figure 2: Performance comparisons for the plus sign (25 points) with 4 anchor points, where the noisy distances are further corrupted by outliers. The error bars show the standard deviation values of the two methods.

Noisy Case. We then empirically test the reconstruction ability of RMDS-AAP when the distances are also noisy. Following the setup of [9], 25 points centered at c=[66]Tc=[6~{}6]^{T} form the plus sign, with 4 anchor points at [06]T[0~{}6]^{T}, [126]T[12~{}6]^{T}, [60]T[6~{}0]^{T}, and [612]T[6~{}12]^{T}. For 1j<in1\leq j<i\leq n, each dijd_{ij} is first perturbed by zero-mean Gaussian noise with variance σ2{0,0.1,0.2}\sigma^{2}\in\{0,0.1,0.2\}. Then m{15,30,45,60,75}m\in\{15,30,45,60,75\} out of the N:=25×2426=294N:=\frac{25\times 24}{2}-6=294 distances, discounting the 6 pairwise distances between the 4 anchor points, are randomly sampled, and each is independently corrupted by an outlier whose value is uniformly drawn from [020][0~{}20]. We use ξ0=1.2D\xi^{0}=1.2\|D^{\star}\|_{\infty} and γ=0.7\gamma=0.7 for RMDS-AAP. At convergence, a mapping 𝒯\mathcal{T}, consisting of a translation and a rotation, is constructed to find the best alignment between the reconstructed 4 anchor points and the original 4 anchor points. Denote Ω\Omega as the indices of the 4 anchor points, and denote the ii-th row of the Xk+1X^{k+1} at convergence as (xirec)T(x_{i}^{\text{rec}})^{T}. The error measure is computed over the other 21 points after alignment, i.e., RMSE:=iΩ𝒯(xirec)xi22/21.\text{RMSE}:=\sqrt{\sum_{i\notin\Omega}\|\mathcal{T}(x_{i}^{\text{rec}})-x_{i}\|_{2}^{2}/21}. Since FSMDS [9] shows the best performance on this example among RMDS [8], HQMMDS [12], and TMDS [1], we only compare RMDS-AAP with FSMDS. In each setting, the mean and standard deviation of the RMSE values of RMDS-AAP are computed over 1000 simulations. From Fig. 2, one can see that regardless of the noise strength, RMDS-AAP generally produces reconstructions with lower RMSE values, as well as smaller variations.
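An anchor-based alignment such as 𝒯\mathcal{T} can be realized by a Procrustes fit on the centered anchors plus a translation. The helper below is a hypothetical implementation of such a map (the names and the planted pose are illustrative, not from the paper); the RMSE computation mirrors the error measure above.

```python
import numpy as np

def rigid_align(anchors_rec, anchors_true):
    """Return T (translation + orthogonal map) fitting reconstructed anchors
    to the true anchors: T(p) = (p - c_rec) G + c_true, with G from Procrustes."""
    c_rec, c_true = anchors_rec.mean(axis=0), anchors_true.mean(axis=0)
    Y, _, Zt = np.linalg.svd((anchors_rec - c_rec).T @ (anchors_true - c_true))
    G = Y @ Zt
    return lambda P: (P - c_rec) @ G + c_true

rng = np.random.default_rng(5)
anchors = rng.standard_normal((4, 2))
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # unknown pose of the reconstruction
t = np.array([1.5, -2.0])
T = rigid_align(anchors @ Q + t, anchors)
others = rng.standard_normal((21, 2))
rmse = np.sqrt(np.mean(np.sum((T(others @ Q + t) - others) ** 2, axis=1)))
assert rmse < 1e-10                                # exact pose, so essentially zero error
```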

4 Proofs

We first collect some useful results in Section 4.1. Some of them are from the literature, and the proofs of the new ones are included in the Appendix. We then present the proof of Theorem 1, split into the initialization phase (Section 4.2) and the iteration phase (Section 4.3). The proof of Proposition 1 is presented in Section 4.4.

4.1 Useful Lemmas

Lemma 1 ([6, Lemma 19]).

Let L=UΛ(U)TL^{\star}=U^{\star}\Lambda^{\star}(U^{\star})^{T}, and L=r+(L+E)L=\mathcal{H}_{r}^{+}\left(L^{\star}+E\right) for some perturbation matrix E𝕊n×nE\in\mathbb{S}^{n\times n}. Denote the eigen-decomposition of LL as UΛUTU\Lambda U^{T}. Suppose the rank-rr SVD of (U)TU\left(U^{\star}\right)^{T}U is AΣBTA\Sigma B^{T}. Set G=ABTG=AB^{T} and Δ=UUG\Delta=U-U^{\star}G. If E212λr\|E\|_{2}\leq\frac{1}{2}\lambda_{r}^{\star}, then

LL\displaystyle\left\|L-L^{\star}\right\|_{\infty}\leq Δ2,(U2,+U2,)Λ2+(3+4κ)U2,2E2.\displaystyle\|\Delta\|_{2,\infty}\left(\|U\|_{2,\infty}+\|U^{\star}\|_{2,\infty}\right)\|\Lambda\|_{2}+(3+4\kappa)\left\|U^{\star}\right\|_{2,\infty}^{2}\|E\|_{2}.
Lemma 2 ([6, Lemma 1]).

Under the same conditions of Lemma 1, and with H:=(U)TUH:=(U^{\star})^{T}U,

ΛGGΛ2\displaystyle\left\|\Lambda^{\star}G-G\Lambda^{\star}\right\|_{2} (2+2λ1λrE2)E2,\displaystyle\leq\left(2+\frac{2\lambda_{1}^{\star}}{\lambda_{r}^{\star}-\|E\|_{2}}\right)\|E\|_{2},
ΛHGΛ2\displaystyle\left\|\Lambda^{\star}H-G\Lambda^{\star}\right\|_{2} (2+λ1λrE2)E2.\displaystyle\leq\left(2+\frac{\lambda_{1}^{\star}}{\lambda_{r}^{\star}-\|E\|_{2}}\right)\|E\|_{2}.
Lemma 3 ([11, Lemma 47]).

Assume the same conditions of Lemma 1. Denote X=U(Λ)12X=U(\Lambda)^{\frac{1}{2}}, and X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}. Suppose the rank-rr SVD of (X)TX\left(X^{\star}\right)^{T}X is YΣ~ZTY\widetilde{\Sigma}Z^{T}. Set R=YZTR=YZ^{T}. Then,

GR228κ32E2λr.\|G-R\|_{2}\leq 28\kappa^{\frac{3}{2}}\cdot\frac{\|E\|_{2}}{\lambda_{r}^{\star}}.
Lemma 4 ([13, Lemma 4]).

If S𝕊n×nS\in\mathbb{S}^{n\times n} is α\alpha-sparse, i.e., SS has no more than αn\alpha n nonzero entries per row (and column), then S2αnS\left\|S\right\|_{2}\leq\alpha n\cdot\left\|S\right\|_{\infty}.
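Lemma 4 is easy to illustrate numerically: for a symmetric matrix with at most kk nonzeros per row, the spectral norm is at most kk times the largest entry magnitude. A small sketch (the construction symmetrizes a k-per-row draw, so the result has at most 2k2k nonzeros per row, i.e., it is α\alpha-sparse with αn=2k\alpha n=2k):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 4                        # at most k nonzeros per row before symmetrizing
S = np.zeros((n, n))
for i in range(n):
    cols = rng.choice(n, size=k, replace=False)
    S[i, cols] = rng.uniform(-1.0, 1.0, size=k)
S = np.triu(S) + np.triu(S, 1).T    # symmetric, at most 2k nonzeros per row
per_row = np.max((S != 0).sum(axis=1))
assert per_row <= 2 * k
# Lemma 4 with alpha * n = per_row: ||S||_2 <= (alpha n) * ||S||_inf
assert np.linalg.norm(S, 2) <= per_row * np.max(np.abs(S)) + 1e-12
```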

Lemma 5.

The operator 𝒜\mathcal{A} satisfies:

  1. 1.

    Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n}, 𝒜(Z)4Z\|\mathcal{A}(Z)\|_{\infty}\leq 4\|Z\|_{\infty};

  2. 2.

    Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n} with Z𝟏=0Z\bm{1}=0, (𝒜(Z))=Z\mathcal{B}(\mathcal{A}(Z))=Z.

The proof of Lemma 5 is deferred to Appendix B.1.
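Both parts of Lemma 5 can also be checked numerically; a minimal sketch on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
J = np.eye(n) - np.ones((n, n)) / n

def A(Z):                            # the operator (1)
    d = np.diag(Z)
    return d[:, None] + d[None, :] - 2.0 * Z

def B(Z):                            # the operator (2)
    return -0.5 * J @ Z @ J

Z = rng.standard_normal((n, n)); Z = Z + Z.T
Zc = J @ Z @ J                       # enforce Zc @ 1 = 0
assert np.max(np.abs(A(Z))) <= 4.0 * np.max(np.abs(Z)) + 1e-12   # item 1
assert np.allclose(B(A(Zc)), Zc)                                 # item 2
```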

Lemma 6.

If 𝒜(Lk)Dξk\|\mathcal{A}(L^{k})-D^{\star}\|_{\infty}\leq\xi^{k}, and Sk=𝒯ξk(D𝒜(Lk))S^{k}=\mathcal{T}_{\xi^{k}}(D-\mathcal{A}(L^{k})), then supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}), and

SkS𝒜(Lk)D+ξk.\|S^{k}-S^{\star}\|_{\infty}\leq\|\mathcal{A}(L^{k})-D^{\star}\|_{\infty}+\xi^{k}.

The proof of Lemma 6 is deferred to Appendix B.2.

Lemma 7.

The matrix J=In1n𝟏𝟏TJ=I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T} satisfies:

  1. 1.

    J2=1\|J\|_{2}=1,

  2. 2.

    JU=UJU^{\star}=U^{\star},

  3. 3.

    JZ2,2Z2,,Zn×r\|JZ\|_{2,\infty}\leq 2\|Z\|_{2,\infty},~{}\forall Z\in\mathbb{R}^{n\times r}.

The proof of Lemma 7 is deferred to Appendix B.3.

Lemma 8 ([3, Lemma 6]).

Suppose LL is a rank-rr positive semi-definite matrix with the eigen-decomposition L=UΛ(U)TL=U\Lambda(U)^{T}, where Un×rU\in\mathbb{R}^{n\times r}. Let 𝒫T\mathcal{P}_{T} be the projection onto the tangent space of the manifold of symmetric positive semi-definite matrices of rank rr at LL, as defined in (3). Then,

(𝒫T)(L)2LL22λ1.\|(\mathcal{I}-\mathcal{P}_{T})(L^{\star})\|_{2}\leq\frac{\|L-L^{\star}\|_{2}^{2}}{\lambda_{1}^{\star}}.
Lemma 9 ([3, Lemma 8]).

Under the same conditions of Lemma 8,

𝒫T(Z)243Z2,Z𝕊n×n.\|\mathcal{P}_{T}(Z)\|_{2}\leq\frac{4}{3}\|Z\|_{2},~{}\forall Z\in\mathbb{S}^{n\times n}.

4.2 Proof for the Initialization Phase

Taking L0L^{0} to be the zero matrix, we have S0=𝒯ξ0(D𝒜(L0))S^{0}=\mathcal{T}_{\xi^{0}}(D-\mathcal{A}(L^{0})). Since

𝒜(L0)D=Dξ0,\|\mathcal{A}(L^{0})-D^{\star}\|_{\infty}=\|D^{\star}\|_{\infty}\leq\xi^{0},

by Lemma 6, supp(S0)supp(S)\text{supp}(S^{0})\subseteq\text{supp}(S^{\star}), and

S0SD+ξ04D16L16μrnλ1,\|S^{0}-S^{\star}\|_{\infty}\leq\|D^{\star}\|_{\infty}+\xi^{0}\leq 4\|D^{\star}\|_{\infty}\leq 16\|L^{\star}\|_{\infty}\leq 16\frac{\mu r}{n}\lambda_{1}^{\star},

where the third inequality follows from D=𝒜(L)D^{\star}=\mathcal{A}(L^{\star}) and Lemma 5. Denoting E0:=(S0S)E^{0}:=\mathcal{B}(S^{0}-S^{\star}),

E02=(S0S)2=\displaystyle\|E^{0}\|_{2}=\|\mathcal{B}(S^{0}-S^{\star})\|_{2}= 12J(S0S)J2\displaystyle\frac{1}{2}\|J(S^{0}-S^{\star})J\|_{2} (4)
\displaystyle\leq 12S0S2\displaystyle\frac{1}{2}\|S^{0}-S^{\star}\|_{2}
\displaystyle\leq αn2S0S\displaystyle\frac{\alpha n}{2}\|S^{0}-S^{\star}\|_{\infty}
\displaystyle\leq 2(αn)D8(αμr)λ1,\displaystyle 2(\alpha n)\|D^{\star}\|_{\infty}\leq 8(\alpha\mu r)\lambda_{1}^{\star},

where the first inequality follows from Lemma 7, and the second inequality follows from Lemma 4. Therefore E02\|E^{0}\|_{2} is no more than λr2\frac{\lambda_{r}^{\star}}{2} if α1161μrκ\alpha\leq\frac{1}{16}\cdot\frac{1}{\mu r\kappa}.

Suppose L1=r+(LE0)L^{1}=\mathcal{H}_{r}^{+}(L^{\star}-E^{0}) has the eigen-decomposition U1Λ1(U1)TU^{1}\Lambda^{1}(U^{1})^{T}, where U1n×rU^{1}\in\mathbb{R}^{n\times r}, and Λ1=diag(λ11,,λr1)\Lambda^{1}=\text{diag}(\lambda_{1}^{1},\cdots,\lambda_{r}^{1}). Lemma 1 is applicable once we have the row-wise bound of Δ1:=U1UG1\Delta^{1}:=U^{1}-U^{\star}G^{1}, where G1G^{1} is the best rotation matrix between U1U^{1} and UU^{\star} computed from the SVD of H1:=(U)TU1H^{1}:=(U^{\star})^{T}U^{1}. Consider i=1,,ni=1,\cdots,n.

eiTΔ1=\displaystyle e_{i}^{T}\Delta^{1}= eiTU1eiTUG1\displaystyle e_{i}^{T}U^{1}-e_{i}^{T}U^{\star}G^{1}
=\displaystyle= eiT(LE0)U1(Λ1)1eiTUG1\displaystyle e_{i}^{T}(L^{\star}-E^{0})U^{1}(\Lambda^{1})^{-1}-e_{i}^{T}U^{\star}G^{1}
=\displaystyle= eiTUΛ[(U)TU1(Λ1)1(Λ)1G1]eiTE0U1(Λ1)1\displaystyle e_{i}^{T}U^{\star}\Lambda^{\star}\left[(U^{\star})^{T}U^{1}(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}G^{1}\right]-e_{i}^{T}E^{0}U^{1}(\Lambda^{1})^{-1}
=\displaystyle= eiTUΛ[(U)TU1(Λ)1(Λ)1G1]T1\displaystyle\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}\left[(U^{\star})^{T}U^{1}(\Lambda^{\star})^{-1}-(\Lambda^{\star})^{-1}G^{1}\right]}_{T_{1}}
+eiTUΛ(U)TU1[(Λ1)1(Λ)1]T2eiTE0U1(Λ1)1T3.\displaystyle+\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}(U^{\star})^{T}U^{1}\left[(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}\right]}_{T_{2}}-\underbrace{e_{i}^{T}E^{0}U^{1}(\Lambda^{1})^{-1}}_{T_{3}}.

Bounding T1T_{1}.

T12=\displaystyle\|T_{1}\|_{2}= eiTU[Λ(U)TU1G1Λ](Λ)12\displaystyle\|e_{i}^{T}U^{\star}\left[\Lambda^{\star}(U^{\star})^{T}U^{1}-G^{1}\Lambda^{\star}\right](\Lambda^{\star})^{-1}\|_{2}
\displaystyle\leq eiTU2ΛH1G1Λ2(Λ)12\displaystyle\|e_{i}^{T}U^{\star}\|_{2}\cdot\|\Lambda^{\star}H^{1}-G^{1}\Lambda^{\star}\|_{2}\cdot\|(\Lambda^{\star})^{-1}\|_{2}
\displaystyle\leq μrn(2+2κ)E021λr,\displaystyle\sqrt{\frac{\mu r}{n}}\cdot(2+2\kappa)\|E^{0}\|_{2}\cdot\frac{1}{\lambda_{r}^{\star}},

where in the last inequality, Lemma 2 is applied with the bound E0212λr\|E^{0}\|_{2}\leq\frac{1}{2}\lambda_{r}^{\star}.

Bounding T2T_{2}. Due to Weyl’s inequality,

|λj1λj|E02,j=1,,r.|\lambda_{j}^{1}-\lambda_{j}^{\star}|\leq\|E^{0}\|_{2},~{}j=1,\cdots,r.
(Λ1)1(Λ)12=maxj=1,,r|1λj11λj|=maxj=1,,r|λj1λj|λj1λj2E02(λr)2,\|(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}\|_{2}=\max_{j=1,\cdots,r}\left|\frac{1}{\lambda_{j}^{1}}-\frac{1}{\lambda_{j}^{\star}}\right|=\max_{j=1,\cdots,r}\frac{|\lambda_{j}^{1}-\lambda_{j}^{\star}|}{\lambda_{j}^{1}\lambda_{j}^{\star}}\leq 2\frac{\|E^{0}\|_{2}}{(\lambda_{r}^{\star})^{2}},

where λr112λr\lambda_{r}^{1}\geq\frac{1}{2}\lambda_{r}^{\star} is used in the last inequality. Therefore,

T22\displaystyle\|T_{2}\|_{2}\leq eiTU2Λ2(U)TU12(Λ1)1(Λ)12μrn2κE02λr.\displaystyle\|e_{i}^{T}U^{\star}\|_{2}\cdot\|\Lambda^{\star}\|_{2}\cdot\|(U^{\star})^{T}U^{1}\|_{2}\cdot\|(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot 2\kappa\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}.

Bounding T3T_{3}.

T32eiTE0U12(Λ1)12eiTE0U122λr,\displaystyle\|T_{3}\|_{2}\leq\|e_{i}^{T}E^{0}U^{1}\|_{2}\cdot\|(\Lambda^{1})^{-1}\|_{2}\leq\|e_{i}^{T}E^{0}U^{1}\|_{2}\cdot\frac{2}{\lambda_{r}^{\star}},

therefore we just need to bound eiTE0U12\|e_{i}^{T}E^{0}U^{1}\|_{2}.

eiTE0U12eiTE0UG12+eiTE0(U1UG1)2=eiTE0U2+eiTE0Δ12.\displaystyle\|e_{i}^{T}E^{0}U^{1}\|_{2}\leq\|e_{i}^{T}E^{0}U^{\star}G^{1}\|_{2}+\|e_{i}^{T}E^{0}(U^{1}-U^{\star}G^{1})\|_{2}=\|e_{i}^{T}E^{0}U^{\star}\|_{2}+\|e_{i}^{T}E^{0}\Delta^{1}\|_{2}.

For the first term,

eiTE0U2=eiT(S0S)U2=\displaystyle\|e_{i}^{T}E^{0}U^{\star}\|_{2}=\|e_{i}^{T}\mathcal{B}(S^{0}-S^{\star})U^{\star}\|_{2}= 12eiTJ(S0S)JU2\displaystyle\frac{1}{2}\|e_{i}^{T}J(S^{0}-S^{\star})JU^{\star}\|_{2}
=\displaystyle= 12eiTJ(S0S)U2\displaystyle\frac{1}{2}\|e_{i}^{T}J(S^{0}-S^{\star})U^{\star}\|_{2}
\displaystyle\leq 12J(S0S)U2,\displaystyle\frac{1}{2}\|J(S^{0}-S^{\star})U^{\star}\|_{2,\infty}
\displaystyle\leq (S0S)U2,\displaystyle\|(S^{0}-S^{\star})U^{\star}\|_{2,\infty}
\displaystyle\leq (αn)S0Sμrn4(αn)Dμrn,\displaystyle(\alpha n)\|S^{0}-S^{\star}\|_{\infty}\cdot\sqrt{\frac{\mu r}{n}}\leq 4(\alpha n)\|D^{\star}\|_{\infty}\sqrt{\frac{\mu r}{n}},

where the third equality and the second inequality follow from Lemma 7, and the third inequality follows from the triangle inequality, the α\alpha-sparsity of S0SS^{0}-S^{\star}, and the row-wise bound on UU^{\star}.

The second term can be bounded similarly. Since

𝟏T(LE0)=𝟏T(L(S0S))=0,\bm{1}^{T}(L^{\star}-E^{0})=\bm{1}^{T}(L^{\star}-\mathcal{B}(S^{0}-S^{\star}))=0,

𝟏\bm{1} is in the null space of L1L^{1}. From 𝟏T(U1UG1)=0\bm{1}^{T}(U^{1}-U^{\star}G^{1})=0, we can get JΔ1=Δ1J\Delta^{1}=\Delta^{1}, and

eiTE0Δ12\displaystyle\|e_{i}^{T}E^{0}\Delta^{1}\|_{2}\leq (S0S)Δ12,\displaystyle\|(S^{0}-S^{\star})\Delta^{1}\|_{2,\infty}
\displaystyle\leq (αn)S0SΔ12,4(αn)DΔ12,.\displaystyle(\alpha n)\|S^{0}-S^{\star}\|_{\infty}\cdot\|\Delta^{1}\|_{2,\infty}\leq 4(\alpha n)\|D^{\star}\|_{\infty}\|\Delta^{1}\|_{2,\infty}.

Therefore,

T32\displaystyle\|T_{3}\|_{2}\leq 8(αn)Dλrμrn+8(αn)DλrΔ12,\displaystyle 8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\|\Delta^{1}\|_{2,\infty}
\displaystyle\leq 8(αn)Dλrμrn+32(αμrκ)Δ12,\displaystyle 8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+32(\alpha\mu r\kappa)\|\Delta^{1}\|_{2,\infty}
\displaystyle\leq 8(αn)Dλrμrn+12Δ12,\displaystyle 8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+\frac{1}{2}\|\Delta^{1}\|_{2,\infty}

where in the second inequality, the bound D4μrnλ1\|D^{\star}\|_{\infty}\leq 4\frac{\mu r}{n}\lambda_{1}^{\star} is used again, and the last inequality holds if α1641μrκ\alpha\leq\frac{1}{64}\cdot\frac{1}{\mu r\kappa}.

Combining the bounds of T1T_{1} to T3T_{3}, and taking the maximum with respect to ii,

Δ12,(2+2κ)E02λrμrn+2κE02λrμrn+8(αn)Dλrμrn+12Δ12,,\displaystyle\|\Delta^{1}\|_{2,\infty}\leq(2+2\kappa)\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+2\kappa\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+\frac{1}{2}\|\Delta^{1}\|_{2,\infty},

consequently,

Δ12,\displaystyle\|\Delta^{1}\|_{2,\infty}\leq [12κE02λr+16(αn)Dλr]μrn\displaystyle\left[12\kappa\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}+16(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\right]\sqrt{\frac{\mu r}{n}} (5)
\displaystyle\leq [24(ακn)Dλr+16(αn)Dλr]μrn\displaystyle\left[24(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}+16(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\right]\sqrt{\frac{\mu r}{n}}
\displaystyle\leq 40(ακn)Dλrμrn160(αμrκ2)μrnμrn\displaystyle 40(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\leq 160(\alpha\mu r\kappa^{2})\sqrt{\frac{\mu r}{n}}\leq\sqrt{\frac{\mu r}{n}}

where in the second inequality, the bound E022(αn)D\|E^{0}\|_{2}\leq 2(\alpha n)\|D^{\star}\|_{\infty} is used, and the last inequality holds if α11601μrκ2\alpha\leq\frac{1}{160}\cdot\frac{1}{\mu r\kappa^{2}}. As a result,

eiTU12eiTUG12+eiTΔ12=eiTU2+eiTΔ12,\|e_{i}^{T}U^{1}\|_{2}\leq\|e_{i}^{T}U^{\star}G^{1}\|_{2}+\|e_{i}^{T}\Delta^{1}\|_{2}=\|e_{i}^{T}U^{\star}\|_{2}+\|e_{i}^{T}\Delta^{1}\|_{2},

therefore U12,2μrn\|U^{1}\|_{2,\infty}\leq 2\sqrt{\frac{\mu r}{n}}. Also, under this bound of α\alpha we have

E022(αn)D8(αμr)λ1120λr.\|E^{0}\|_{2}\leq 2(\alpha n)\|D^{\star}\|_{\infty}\leq 8(\alpha\mu r)\lambda_{1}^{\star}\leq\frac{1}{20}\lambda_{r}^{\star}.

Now by Lemma 1,

L1L\displaystyle\left\|L^{1}-L^{\star}\right\|_{\infty}\leq Δ12,(U12,+U2,)Λ12+(3+4κ)U2,2E02\displaystyle\|\Delta^{1}\|_{2,\infty}\left(\|U^{1}\|_{2,\infty}+\|U^{\star}\|_{2,\infty}\right)\|\Lambda^{1}\|_{2}+(3+4\kappa)\left\|U^{\star}\right\|_{2,\infty}^{2}\|E^{0}\|_{2} (6)
\displaystyle\leq 40(ακn)Dλrμrn3μrn2120λ1+7κμrn2(αn)D\displaystyle 40(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\cdot 3\sqrt{\frac{\mu r}{n}}\cdot\frac{21}{20}\lambda_{1}^{\star}+7\kappa\frac{\mu r}{n}\cdot 2(\alpha n)\|D^{\star}\|_{\infty}
=\displaystyle= (126αμrκ2+14αμrκ)DD4γ\displaystyle(126\alpha\mu r\kappa^{2}+14\alpha\mu r\kappa)\|D^{\star}\|_{\infty}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma

if α1560γμrκ2\alpha\leq\frac{1}{560}\cdot\frac{\gamma}{\mu r\kappa^{2}}.

4.3 Proof for the Iteration Phase

For k1k\geq 1, our induction hypotheses are the following:

Ek12\displaystyle\|E^{k-1}\|_{2}\leq 4(αn)Dγk1,\displaystyle 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}, (7a)
Δk2,\displaystyle\|\Delta^{k}\|_{2,\infty}\leq 120(ακn)Dλrμrnγk1,\displaystyle 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k-1}, (7b)
LkL\displaystyle\|L^{k}-L^{\star}\|_{\infty}\leq D4γk.\displaystyle\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k}. (7c)

One can see from (4), (5), and (6) that the three bounds hold for k=1k=1. For any k1k\geq 1,

Ek1216(αμr)λ1120λr\|E^{k-1}\|_{2}\leq 16(\alpha\mu r)\lambda_{1}^{\star}\leq\frac{1}{20}\lambda_{r}^{\star}

when α13201μrκ\alpha\leq\frac{1}{320}\cdot\frac{1}{\mu r\kappa}; and

Δk2,480(αμrκ2)μrnμrn\|\Delta^{k}\|_{2,\infty}\leq 480(\alpha\mu r\kappa^{2})\sqrt{\frac{\mu r}{n}}\leq\sqrt{\frac{\mu r}{n}}

when α14801μrκ2\alpha\leq\frac{1}{480}\cdot\frac{1}{\mu r\kappa^{2}} so that Uk2,2μrn\|U^{k}\|_{2,\infty}\leq 2\sqrt{\frac{\mu r}{n}}.

We now prove the induction hypotheses for iteration (k+1)(k+1). With the bound LkLD4γk\|L^{k}-L^{\star}\|_{\infty}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k}, by Lemma 5,

𝒜(Lk)DDγkξk.\|\mathcal{A}(L^{k})-D^{\star}\|_{\infty}\leq\|D^{\star}\|_{\infty}\gamma^{k}\leq\xi^{k}.

Then by Lemma 6, supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}), and

SkSDγk+ξk(4D)γk.\|S^{k}-S^{\star}\|_{\infty}\leq\|D^{\star}\|_{\infty}\gamma^{k}+\xi^{k}\leq(4\|D^{\star}\|_{\infty})\gamma^{k}.

By Lemma 5, (𝒜(L))=L\mathcal{B}(\mathcal{A}(L^{\star}))=L^{\star}, and

Lk+1=r+𝒫Tk(DSk)=\displaystyle L^{k+1}=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D-S^{k})= r+𝒫Tk(D+SSk)\displaystyle\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D^{\star}+S^{\star}-S^{k})
=\displaystyle= r+𝒫Tk(𝒜(L)+SSk)=r+𝒫Tk(L+(SSk)).\displaystyle\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(\mathcal{A}(L^{\star})+S^{\star}-S^{k})=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}(L^{\star}+\mathcal{B}(S^{\star}-S^{k})).

Denoting

E^{k}:=(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}+\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star}),

so that

L^{k+1}=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}(L^{\star}+\mathcal{B}(S^{\star}-S^{k}))=\mathcal{H}_{r}^{+}(L^{\star}-E^{k}),

and denoting the eigen-decomposition of L^{k+1} as U^{k+1}\Lambda^{k+1}(U^{k+1})^{T}, we have

Ek2=\displaystyle\|E^{k}\|_{2}= (𝒫Tk)L+𝒫Tk(SkS)2\displaystyle\|(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}+\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star})\|_{2}
\displaystyle\leq LkL22λ1+43(SkS)2\displaystyle\frac{\|L^{k}-L^{\star}\|_{2}^{2}}{\lambda_{1}^{\star}}+\frac{4}{3}\|\mathcal{B}(S^{k}-S^{\star})\|_{2}
\displaystyle\leq Ek12λ1Ek12+23SkS2\displaystyle\frac{\|E^{k-1}\|_{2}}{\lambda_{1}^{\star}}\|E^{k-1}\|_{2}+\frac{2}{3}\|S^{k}-S^{\star}\|_{2}
\displaystyle\leq 4(αn)Dλ1γk14(αn)Dγk1+83(αn)Dγk\displaystyle 4(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{1}^{\star}}\gamma^{k-1}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}+\frac{8}{3}(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}
\displaystyle\leq 16(αμr)γk14(αn)Dγk1+83(αn)Dγk\displaystyle 16(\alpha\mu r)\gamma^{k-1}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}+\frac{8}{3}(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}
\displaystyle\leq γ34(αn)Dγk1+83(αn)Dγk=4(αn)Dγk,\displaystyle\frac{\gamma}{3}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}+\frac{8}{3}(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}=4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k},

where the first inequality follows from Lemma 8 and Lemma 9, the second inequality is due to Lemma 7 again, the third inequality follows from the induction hypothesis (7a), and the fifth inequality holds if α148γμr\alpha\leq\frac{1}{48}\cdot\frac{\gamma}{\mu r}. Then we proceed as in the base case to get

eiTΔk+1=\displaystyle e_{i}^{T}\Delta^{k+1}= eiTUΛ[(U)TUk+1(Λ)1(Λ)1Gk+1]T1\displaystyle\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}\left[(U^{\star})^{T}U^{k+1}(\Lambda^{\star})^{-1}-(\Lambda^{\star})^{-1}G^{k+1}\right]}_{T_{1}}
+eiTUΛ(U)TUk+1[(Λk+1)1(Λ)1]T2eiTEkUk+1(Λk+1)1T3,\displaystyle+\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}(U^{\star})^{T}U^{k+1}\left[(\Lambda^{k+1})^{-1}-(\Lambda^{\star})^{-1}\right]}_{T_{2}}-\underbrace{e_{i}^{T}E^{k}U^{k+1}(\Lambda^{k+1})^{-1}}_{T_{3}},

and the first two terms can be bounded similarly, i.e.,

T12μrn(2+2κ)Ek2λr,T22μrn2κEk2λr.\displaystyle\|T_{1}\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot(2+2\kappa)\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}},~{}\|T_{2}\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot 2\kappa\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}}.

Bounding T3T_{3}.

T32eiTEkUk+12(Λk+1)12eiTEk22λr,\displaystyle\|T_{3}\|_{2}\leq\|e_{i}^{T}E^{k}U^{k+1}\|_{2}\cdot\|(\Lambda^{k+1})^{-1}\|_{2}\leq\|e_{i}^{T}E^{k}\|_{2}\cdot\frac{2}{\lambda_{r}^{\star}},

while

eiTEk2eiT(𝒫Tk)L2B1+eiT𝒫Tk(SkS)2B2.\displaystyle\|e_{i}^{T}E^{k}\|_{2}\leq\underbrace{\|e_{i}^{T}(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}\|_{2}}_{B_{1}}+\underbrace{\|e_{i}^{T}\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star})\|_{2}}_{B_{2}}.

For the first term,

(𝒫Tk)L=\displaystyle(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}= (InUk(Uk)T)L(InUk(Uk)T)\displaystyle(I_{n}-U^{k}(U^{k})^{T})L^{\star}(I_{n}-U^{k}(U^{k})^{T})
=\displaystyle= (Uk(Uk)TU(U)T)(L)(InUk(Uk)T)\displaystyle(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})(-L^{\star})(I_{n}-U^{k}(U^{k})^{T})
=\displaystyle= (Uk(Uk)TU(U)T)(LkL)(InUk(Uk)T)\displaystyle(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})(L^{k}-L^{\star})(I_{n}-U^{k}(U^{k})^{T})

where the last equality follows from the fact that Lk(InUk(Uk)T)=0L^{k}(I_{n}-U^{k}(U^{k})^{T})=0. Therefore,

B1=\displaystyle B_{1}= eiT(Uk(Uk)TU(U)T)(LkL)(InUk(Uk)T)2\displaystyle\|e_{i}^{T}(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})(L^{k}-L^{\star})(I_{n}-U^{k}(U^{k})^{T})\|_{2}
\displaystyle\leq eiT(Uk(Uk)TU(U)T)2LkL2\displaystyle\|e_{i}^{T}(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})\|_{2}\cdot\|L^{k}-L^{\star}\|_{2}
\displaystyle\leq (eiTUk2+eiTU2)LkL23μrnEk12.\displaystyle(\|e_{i}^{T}U^{k}\|_{2}+\|e_{i}^{T}U^{\star}\|_{2})\|L^{k}-L^{\star}\|_{2}\leq 3\sqrt{\frac{\mu r}{n}}\cdot\|E^{k-1}\|_{2}.

For the second term,

B2=\displaystyle B_{2}= eiT𝒫Tk(SkS)2\displaystyle\|e_{i}^{T}\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star})\|_{2}
\displaystyle\leq eiTUk(Uk)T(SkS)(InUk(Uk)T)2B21+eiT(SkS)Uk(Uk)T2B22.\displaystyle\underbrace{\|e_{i}^{T}U^{k}(U^{k})^{T}\mathcal{B}(S^{k}-S^{\star})(I_{n}-U^{k}(U^{k})^{T})\|_{2}}_{B_{21}}+\underbrace{\|e_{i}^{T}\mathcal{B}(S^{k}-S^{\star})U^{k}(U^{k})^{T}\|_{2}}_{B_{22}}.
B21eiTUk2(SkS)22μrnαn2SkSμrn4(αn)Dγk,\displaystyle B_{21}\leq\|e_{i}^{T}U^{k}\|_{2}\cdot\|\mathcal{B}(S^{k}-S^{\star})\|_{2}\leq 2\sqrt{\frac{\mu r}{n}}\cdot\frac{\alpha n}{2}\|S^{k}-S^{\star}\|_{\infty}\leq\sqrt{\frac{\mu r}{n}}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k},
B22eiT(SkS)Uk2\displaystyle B_{22}\leq\|e_{i}^{T}\mathcal{B}(S^{k}-S^{\star})U^{k}\|_{2}\leq 12J(SkS)JUk2,\displaystyle\frac{1}{2}\|J(S^{k}-S^{\star})JU^{k}\|_{2,\infty}
=\displaystyle= 12J(SkS)Uk2,\displaystyle\frac{1}{2}\|J(S^{k}-S^{\star})U^{k}\|_{2,\infty}
\displaystyle\leq (SkS)Uk2,4(αn)Dγk2μrn.\displaystyle\|(S^{k}-S^{\star})U^{k}\|_{2,\infty}\leq 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}\cdot 2\sqrt{\frac{\mu r}{n}}.

Putting together,

T326Ek12λrμrn+24(αn)Dλrμrnγk.\displaystyle\|T_{3}\|_{2}\leq 6\frac{\|E^{k-1}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}.

Combining the bounds of T1T_{1} to T3T_{3}, and taking the maximum with respect to ii,

Δk+12,\displaystyle\|\Delta^{k+1}\|_{2,\infty}\leq 6κEk2λrμrn+6Ek12λrμrn+24(αn)Dλrμrnγk\displaystyle 6\kappa\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+6\frac{\|E^{k-1}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}
\displaystyle\leq [24(ακn)Dλrγk+24(αn)Dλrγk1+24(αn)Dλrγk]μrn\displaystyle\left[24(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k-1}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}\right]\sqrt{\frac{\mu r}{n}}
\displaystyle\leq [24(ακn)Dλrγk+72(αn)Dλrγk+24(αn)Dλrγk]μrn\displaystyle\left[24(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}+72(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}\right]\sqrt{\frac{\mu r}{n}}
\displaystyle\leq 120(ακn)Dλrμrnγk,\displaystyle 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k},

where the third inequality holds if γ13\gamma\geq\frac{1}{3}. By Lemma 1 again,

Lk+1L\displaystyle\left\|L^{k+1}-L^{\star}\right\|_{\infty}\leq Δk+12,(Uk+12,+U2,)Λk+12+(3+4κ)U2,2Ek2\displaystyle\|\Delta^{k+1}\|_{2,\infty}\left(\|U^{k+1}\|_{2,\infty}+\|U^{\star}\|_{2,\infty}\right)\|\Lambda^{k+1}\|_{2}+(3+4\kappa)\left\|U^{\star}\right\|_{2,\infty}^{2}\|E^{k}\|_{2}
\displaystyle\leq 120(ακn)Dλrμrnγk3μrn2120λ1+7κμrn(4αn)Dγk\displaystyle 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}\cdot 3\sqrt{\frac{\mu r}{n}}\cdot\frac{21}{20}\lambda_{1}^{\star}+7\kappa\frac{\mu r}{n}\cdot(4\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}
\displaystyle\leq 406(αμrκ2)DγkD4γk+1\displaystyle 406(\alpha\mu r\kappa^{2})\|D^{\star}\|_{\infty}\gamma^{k}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k+1}

if α11624γμrκ2\alpha\leq\frac{1}{1624}\cdot\frac{\gamma}{\mu r\kappa^{2}}.

4.4 Proof for Proposition 1

For k0k\geq 0,

Xk+1XRk+12,\displaystyle\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}
=\displaystyle= Uk+1(Λk+1)12U(Λ)12Rk+12,\displaystyle\|U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}R^{k+1}\|_{2,\infty}
\displaystyle\leq Uk+1(Λk+1)12UGk+1(Λk+1)122,I1+UGk+1(Λk+1)12UGk+1(Λ)122,I2\displaystyle\underbrace{\|U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}G^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}\|_{2,\infty}}_{I_{1}}+\underbrace{\|U^{\star}G^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}\|_{2,\infty}}_{I_{2}}
+UGk+1(Λ)12U(Λ)12Gk+12,I3+U(Λ)12Gk+1U(Λ)12Rk+12,I4.\displaystyle+\underbrace{\|U^{\star}G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2,\infty}}_{I_{3}}+\underbrace{\|U^{\star}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}R^{k+1}\|_{2,\infty}}_{I_{4}}.

Bounding I1I_{1}.

I1=\displaystyle I_{1}= maxieiT[Uk+1(Λk+1)12UGk+1(Λk+1)12]2\displaystyle\max_{i}\|e_{i}^{T}[U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}G^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}]\|_{2}
\displaystyle\leq Δk+12,(Λk+1)122\displaystyle\|\Delta^{k+1}\|_{2,\infty}\cdot\|(\Lambda^{k+1})^{\frac{1}{2}}\|_{2}
\displaystyle\leq 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}\cdot\left(\frac{21}{20}\lambda_{1}^{\star}\right)^{\frac{1}{2}}\leq 492(\alpha\mu r\kappa^{2})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Bounding I2I_{2}.

I2=maxieiTUGk+1[(Λk+1)12(Λ)12]2μrn(Λk+1)12(Λ)122.\displaystyle I_{2}=\max_{i}\|e_{i}^{T}U^{\star}G^{k+1}[(\Lambda^{k+1})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}]\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot\|(\Lambda^{k+1})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}\|_{2}.

Since

(Λk+1)12(Λ)122=maxj=1,,r|(λjk+1)12(λj)12|=maxj=1,,r|λjk+1λj|(λjk+1)12+(λj)12Ek2(λr)12,\displaystyle\|(\Lambda^{k+1})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}\|_{2}=\max_{j=1,\cdots,r}\left|(\lambda_{j}^{k+1})^{\frac{1}{2}}-(\lambda_{j}^{\star})^{\frac{1}{2}}\right|=\max_{j=1,\cdots,r}\frac{|\lambda_{j}^{k+1}-\lambda_{j}^{\star}|}{(\lambda_{j}^{k+1})^{\frac{1}{2}}+(\lambda_{j}^{\star})^{\frac{1}{2}}}\leq\frac{\|E^{k}\|_{2}}{(\lambda_{r}^{\star})^{\frac{1}{2}}},
I24(αn)D(λr)12μrnγk16(αμrκ12)μrn(λ1)12γk.I_{2}\leq 4(\alpha n)\frac{\|D^{\star}\|_{\infty}}{(\lambda_{r}^{\star})^{\frac{1}{2}}}\sqrt{\frac{\mu r}{n}}\cdot\gamma^{k}\leq 16(\alpha\mu r\kappa^{\frac{1}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Bounding I3I_{3}.

I3=maxieiTU[Gk+1(Λ)12(Λ)12Gk+1]2μrnGk+1(Λ)12(Λ)12Gk+12.\displaystyle I_{3}=\max_{i}\|e_{i}^{T}U^{\star}[G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}]\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot\|G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2}.
\|G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2}= \|(\Lambda^{\star})^{\frac{1}{2}}-(G^{k+1})^{T}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2}
\displaystyle\leq \frac{1}{2(\lambda_{r}^{\star})^{\frac{1}{2}}}\|\Lambda^{\star}-(G^{k+1})^{T}\Lambda^{\star}G^{k+1}\|_{2}
\displaystyle= \frac{1}{2(\lambda_{r}^{\star})^{\frac{1}{2}}}\|\Lambda^{\star}G^{k+1}-G^{k+1}\Lambda^{\star}\|_{2}
\displaystyle\leq \frac{1}{2(\lambda_{r}^{\star})^{\frac{1}{2}}}\cdot(2+4\kappa)\|E^{k}\|_{2}\leq 3\kappa\frac{\|E^{k}\|_{2}}{(\lambda_{r}^{\star})^{\frac{1}{2}}},

where the first inequality follows from [14, Lemma 2.1], since (\Lambda^{\star})^{\frac{1}{2}} and (G^{k+1})^{T}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1} are the matrix square roots of \Lambda^{\star} and (G^{k+1})^{T}\Lambda^{\star}G^{k+1}, respectively, and the second inequality follows from Lemma 2. Therefore,

I312(ακn)D(λr)12μrnγk48(αμrκ32)μrn(λ1)12γk.I_{3}\leq 12(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{(\lambda_{r}^{\star})^{\frac{1}{2}}}\sqrt{\frac{\mu r}{n}}\cdot\gamma^{k}\leq 48(\alpha\mu r\kappa^{\frac{3}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Bounding I4I_{4}.

I4=\displaystyle I_{4}= maxieiT[U(Λ)12Gk+1U(Λ)12Rk+1]2\displaystyle\max_{i}\|e_{i}^{T}[U^{\star}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}R^{k+1}]\|_{2}
\displaystyle\leq μrn(λ1)12Gk+1Rk+12\displaystyle\sqrt{\frac{\mu r}{n}}\cdot(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\|G^{k+1}-R^{k+1}\|_{2}
\displaystyle\leq μrn(λ1)1228κ32Ek2λr112(ακ2n)D(λr)12μrnγk448(αμrκ52)μrn(λ1)12γk.\displaystyle\sqrt{\frac{\mu r}{n}}\cdot(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot 28\kappa^{\frac{3}{2}}\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}}\leq 112(\alpha\kappa^{2}n)\frac{\|D^{\star}\|_{\infty}}{(\lambda_{r}^{\star})^{\frac{1}{2}}}\sqrt{\frac{\mu r}{n}}\cdot\gamma^{k}\leq 448(\alpha\mu r\kappa^{\frac{5}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Here the second inequality is due to Lemma 3.

Combining the bounds of I_{1} to I_{4},

Xk+1XRk+12,1004(αμrκ52)μrn(λ1)12γkμrκλ1nγk+1,\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}\leq 1004(\alpha\mu r\kappa^{\frac{5}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}\leq\sqrt{\frac{\mu r\kappa\lambda_{1}^{\star}}{n}}\gamma^{k+1},

where the last inequality follows from α11624γμrκ2\alpha\leq\frac{1}{1624}\cdot\frac{\gamma}{\mu r\kappa^{2}}.

5 Conclusion

In this paper, we propose an alternating projection based algorithm, further accelerated by the tangent space projection technique, for the RMDS problem. Under standard assumptions in the RPCA literature, we establish linear convergence of the proposed algorithm when the outliers are sparse enough. We have also numerically verified the performance of the proposed algorithm, with comparisons to other state-of-the-art solvers for RMDS.

To adapt to more realistic settings, it is of interest to extend the convergence proof to the case where the pairwise distances are corrupted by sub-Gaussian noise in addition to sparse outliers.

References

  • [1] L. Blouvshtein and D. Cohen-Or, Outlier detection for robust multi-dimensional scaling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41 (2019), pp. 2273–2279.
  • [2] H. Cai, J.-F. Cai, T. Wang, and G. Yin, Accelerated structured alternating projections for robust spectrally sparse signal recovery, IEEE Transactions on Signal Processing, 69 (2021), pp. 809–821.
  • [3] H. Cai, J.-F. Cai, and K. Wei, Accelerated alternating projections for robust principal component analysis, Journal of Machine Learning Research, 20 (2019), pp. 1–33.
  • [4] E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis?, Journal of the ACM, 58 (2011), pp. 1–37.
  • [5] L. Cayton and S. Dasgupta, Robust Euclidean embedding, in Proceedings of the 23rd International Conference on Machine Learning, ICML, 2006, pp. 169–176.
  • [6] L. Ding and Y. Chen, Leave-one-out approach for matrix completion: Primal and dual analysis, IEEE Transactions on Information Theory, 66 (2020), pp. 7274–7301.
  • [7] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli, Euclidean distance matrices: Essential theory, algorithms, and applications, IEEE Signal Processing Magazine, 32 (2015), pp. 12–30.
  • [8] P. A. Forero and G. B. Giannakis, Sparsity-exploiting robust multidimensional scaling, IEEE Transactions on Signal Processing, 60 (2012), pp. 4118–4134.
  • [9] L. Kong, C. Qi, and H.-D. Qi, Classical multidimensional scaling: A subspace perspective, over-denoising, and outlier detection, IEEE Transactions on Signal Processing, 67 (2019), pp. 3842–3857.
  • [10] X. Li, S. Ding, and Y. Li, Outlier suppression via non-convex robust PCA for efficient localization in wireless sensor networks, IEEE Sensors Journal, 17 (2017), pp. 7053–7063.
  • [11] C. Ma, K. Wang, Y. Chi, and Y. Chen, Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution, Foundations of Computational Mathematics, 20 (2019), pp. 451–632.
  • [12] F. D. Mandanas and C. L. Kotropoulos, Robust multidimensional scaling using a maximum correntropy criterion, IEEE Transactions on Signal Processing, 65 (2017), pp. 919–932.
  • [13] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, Non-convex robust PCA, in Advances in Neural Information Processing Systems, NIPS, 2014, pp. 1107–1115.
  • [14] B. A. Schmitt, Perturbation bounds for matrix square roots and pythagorean sums, Linear Algebra and its Applications, 174 (1992), pp. 215–227.
  • [15] K. Wei, J.-F. Cai, T. F. Chan, and S. Leung, Guarantees of Riemannian optimization for low rank matrix recovery, SIAM Journal on Matrix Analysis and Applications, 37 (2016), pp. 1198–1222.
  • [16] X. Yi, D. Park, Y. Chen, and C. Caramanis, Fast algorithms for robust PCA via gradient descent, in Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS, 2016, pp. 4159–4167.

Appendix A Implementation Details of the Computation of Lk+1L^{k+1}

First consider \mathcal{B}(D-S^{k})=-\frac{1}{2}J(D-S^{k})J. The computation of J(D-S^{k}) can be achieved by subtracting the average row vector from each row of (D-S^{k}). This step costs about 2n^{2} flops. (J(D-S^{k}))J can be computed in a similar fashion, by subtracting the average column vector from each column of J(D-S^{k}). Together with the final scaling by -\frac{1}{2}, the computation cost of \mathcal{B}(D-S^{k}) is about 5n^{2} flops.

Denote B:=(DSk)B:=\mathcal{B}(D-S^{k}). Note that (IUk(Uk)T)BUk=BUkUk(Uk)TBUk(I-U^{k}(U^{k})^{T})BU^{k}=BU^{k}-U^{k}(U^{k})^{T}BU^{k}, and the computation of Uk(Uk)TBUkU^{k}(U^{k})^{T}BU^{k} requires about 2n2r+4nr22n^{2}r+4nr^{2} flops. Therefore the total cost to form (IUk(Uk)T)BUk(I-U^{k}(U^{k})^{T})BU^{k} is about nr+2n2r+4nr2nr+2n^{2}r+4nr^{2} flops. Let (IUk(Uk)T)BUk=QR(I-U^{k}(U^{k})^{T})BU^{k}=QR be the QR decomposition, whose cost is O(nr2)O(nr^{2}) flops. Then,

𝒫Tk(B)\displaystyle\mathcal{P}_{T^{k}}(B) =Uk(Uk)TB+BUk(Uk)TUk(Uk)TBUk(Uk)T\displaystyle=U^{k}(U^{k})^{T}B+BU^{k}(U^{k})^{T}-U^{k}(U^{k})^{T}BU^{k}(U^{k})^{T}
=Uk(Uk)TB(IUk(Uk)T)+(IUk(Uk)T)BUk(Uk)T\displaystyle=U^{k}(U^{k})^{T}B(I-U^{k}(U^{k})^{T})+(I-U^{k}(U^{k})^{T})BU^{k}(U^{k})^{T}
+Uk(Uk)TBUk(Uk)T\displaystyle\quad+U^{k}(U^{k})^{T}BU^{k}(U^{k})^{T}
=UkRTQT+QR(Uk)T+Uk(Uk)TBUk(Uk)T\displaystyle=U^{k}R^{T}Q^{T}+QR(U^{k})^{T}+U^{k}(U^{k})^{T}BU^{k}(U^{k})^{T}
=[UkQ][(Uk)TBUkRTR0][(Uk)TQT]\displaystyle=[U^{k}~{}Q]\begin{bmatrix}(U^{k})^{T}BU^{k}&R^{T}\\ R&0\end{bmatrix}\begin{bmatrix}(U^{k})^{T}\\ Q^{T}\end{bmatrix}
:=[UkQ]M([UkQ])T.\displaystyle:=[U^{k}~{}Q]M([U^{k}~{}Q])^{T}.

Note that M\in\mathbb{S}^{2r\times 2r}; its eigen-decomposition, denoted as U\Lambda U^{T}, can be computed using O(r^{3}) flops. We may as well require that the diagonal entries of \Lambda are in descending order. Finally, since the columns of Q are orthogonal to the columns of U^{k}, the columns of [U^{k}~Q] are orthonormal, and we can get the eigen-decomposition of L^{k+1} by computing

Uk+1=[UkQ]U(:,1:r).U^{k+1}=\begin{bmatrix}U^{k}&Q\end{bmatrix}U_{(:,1:r)}. (8)

The cost of this step is about 4nr24nr^{2}. In summary, the computational cost of r+𝒫Tk(DSk)\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D-S^{k}) is about 5n2+2n2r+O(nr2)5n^{2}+2n^{2}r+O(nr^{2}) flops.
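The procedure above can be sketched in NumPy as follows. This is an illustrative implementation under our own naming (the function rmds_low_rank_update and its variable names are not from any released code), and it interprets \mathcal{H}_{r}^{+} as keeping the top-r eigenpairs with the eigenvalues floored at zero:

```python
import numpy as np

def rmds_low_rank_update(D, S_k, U_k, r):
    """One step L^{k+1} = H_r^+ P_{T^k} B(D - S^k), following Appendix A."""
    M0 = D - S_k
    # B(D - S^k) = -(1/2) J (D - S^k) J, without ever forming J:
    # subtract the average row from every row, then the average column
    # from every column, and scale by -1/2 (about 5n^2 flops in total).
    B = M0 - M0.mean(axis=0, keepdims=True)      # J (D - S^k)
    B = B - B.mean(axis=1, keepdims=True)        # (J (D - S^k)) J
    B *= -0.5

    BU = B @ U_k                                 # about 2n^2 r flops
    W = BU - U_k @ (U_k.T @ BU)                  # (I - U U^T) B U
    Q, R = np.linalg.qr(W)                       # O(n r^2) flops

    # P_{T^k}(B) = [U Q] M [U Q]^T with the small 2r x 2r core M.
    M = np.block([[U_k.T @ BU, R.T],
                  [R, np.zeros((r, r))]])
    vals, vecs = np.linalg.eigh(M)               # O(r^3) flops
    order = np.argsort(vals)[::-1]               # descending eigenvalues
    vals, vecs = vals[order], vecs[:, order]

    # H_r^+: keep the top-r eigenpairs, eigenvalues floored at zero.
    lam = np.maximum(vals[:r], 0.0)
    U_next = np.hstack([U_k, Q]) @ vecs[:, :r]   # cf. equation (8)
    return U_next, lam
```

As a consistency check, if D is the clean EDM of centered points and U^{k} already spans the true column space, one update reproduces the Gram matrix of the points.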

Appendix B Proofs for Some Useful Lemmas

B.1 Proof for Lemma 5

By the definition of operator 𝒜\mathcal{A},

𝒜(Z)=diag(Z)𝟏T+𝟏diag(Z)T2Z.\displaystyle\mathcal{A}(Z)=\text{diag}(Z)\bm{1}^{T}+\bm{1}\text{diag}(Z)^{T}-2Z.

Therefore [𝒜(Z)]ij=Zii+Zjj2Zij[\mathcal{A}(Z)]_{ij}=Z_{ii}+Z_{jj}-2Z_{ij},

𝒜(Z)Z+Z+2Z=4Z.\displaystyle\|\mathcal{A}(Z)\|_{\infty}\leq\|Z\|_{\infty}+\|Z\|_{\infty}+2\|Z\|_{\infty}=4\|Z\|_{\infty}.
(𝒜(Z))=\displaystyle\mathcal{B}(\mathcal{A}(Z))= 12J(diag(Z)𝟏T+𝟏diag(Z)T2Z)J\displaystyle-\frac{1}{2}J(\text{diag}(Z)\bm{1}^{T}+\bm{1}\text{diag}(Z)^{T}-2Z)J
=\displaystyle= 12Jdiag(Z)𝟏T(In1n𝟏𝟏T)12(In1n𝟏𝟏T)𝟏diag(Z)TJ+JZJ=JZJ,\displaystyle-\frac{1}{2}J\cdot\text{diag}(Z)\bm{1}^{T}(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})-\frac{1}{2}(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})\cdot\bm{1}\text{diag}(Z)^{T}J+JZJ=JZJ,

and if Z𝟏=0Z\bm{1}=0,

JZJ=JZ(In1n𝟏𝟏T)=JZ=(In1n𝟏𝟏T)Z=Z.\displaystyle JZJ=JZ(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})=JZ=(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})Z=Z.
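The identities in this proof can be spot-checked numerically. The snippet below is an illustration only, with synthetic data: it verifies \|\mathcal{A}(Z)\|_{\infty}\leq 4\|Z\|_{\infty} and \mathcal{B}(\mathcal{A}(Z))=JZJ=Z for a symmetric Z with Z\bm{1}=0:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix

# Build a symmetric Z with Z 1 = 0 by double centering.
Z = rng.standard_normal((n, n))
Z = (Z + Z.T) / 2
Z = J @ Z @ J

# A(Z) = diag(Z) 1^T + 1 diag(Z)^T - 2 Z, i.e. [A(Z)]_ij = Z_ii + Z_jj - 2 Z_ij.
A_Z = np.diag(Z)[:, None] + np.diag(Z)[None, :] - 2 * Z
B_of_A = -0.5 * J @ A_Z @ J           # B(A(Z))

assert np.abs(A_Z).max() <= 4 * np.abs(Z).max() + 1e-12
assert np.allclose(B_of_A, J @ Z @ J)  # B(A(Z)) = J Z J
assert np.allclose(B_of_A, Z)          # = Z, since Z 1 = 0
```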

B.2 Proof for Lemma 6

Denote Dk=𝒜(Lk)D^{k}=\mathcal{A}(L^{k}).

Sk=Tξk(D𝒜(Lk))=Tξk(S+DDk).\displaystyle S^{k}=T_{\xi^{k}}(D-\mathcal{A}(L^{k}))=T_{\xi^{k}}(S^{\star}+D^{\star}-D^{k}).

When Sij=0S_{ij}^{\star}=0, since DkDξk\|D^{k}-D^{\star}\|_{\infty}\leq\xi^{k},

Sijk=Tξk(DijDijk)=0,\displaystyle S_{ij}^{k}=T_{\xi^{k}}(D_{ij}^{\star}-D_{ij}^{k})=0,

and it follows that supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}). Next, there are two cases to consider.

  1. 1.

    (i,j)supp(S)supp(Sk)(i,j)\in\text{supp}(S^{\star})\setminus\text{supp}(S^{k}). In this case, |Sij+DijDijk|ξk|S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k}|\leq\xi^{k},

    |SijkSij|=|Sij||DijkDij|+ξk.\displaystyle|S_{ij}^{k}-S_{ij}^{\star}|=|S_{ij}^{\star}|\leq|D_{ij}^{k}-D_{ij}^{\star}|+\xi^{k}.
  2. 2.

    (i,j)supp(Sk)(i,j)\in\text{supp}(S^{k}). In this case, |Sij+DijDijk|>ξk|S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k}|>\xi^{k},

    Sijk=Tξk(Sij+DijDijk)=Sij+DijDijk,\displaystyle S_{ij}^{k}=T_{\xi^{k}}(S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k})=S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k},

    and we get the bound

    |SijkSij|=|DijkDij|ξk.\displaystyle|S_{ij}^{k}-S_{ij}^{\star}|=|D_{ij}^{k}-D_{ij}^{\star}|\leq\xi^{k}.
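Both cases can be illustrated with a small numerical example. The sketch below uses synthetic data and a hypothetical hard_threshold helper standing in for T_{\xi^{k}}; it checks the support inclusion and the resulting entrywise bound |S^{k}_{ij}-S^{\star}_{ij}|\leq|D^{k}_{ij}-D^{\star}_{ij}|+\xi^{k}\leq 2\xi^{k}:

```python
import numpy as np

def hard_threshold(M, xi):
    """T_xi: keep entries with |M_ij| > xi, zero out the rest."""
    return np.where(np.abs(M) > xi, M, 0.0)

rng = np.random.default_rng(2)
n, xi = 10, 0.1
D_star = rng.random((n, n))
D_k = D_star + rng.uniform(-xi, xi, (n, n))   # ||D^k - D^star||_inf <= xi

# Sparse outliers, large compared to the threshold.
S_star = np.zeros((n, n))
idx = rng.choice(n * n, size=5, replace=False)
S_star.flat[idx] = rng.uniform(1.0, 2.0, size=5)

S_k = hard_threshold(S_star + D_star - D_k, xi)

assert np.all((S_k != 0) <= (S_star != 0))    # supp(S^k) inside supp(S^star)
assert np.abs(S_k - S_star).max() <= 2 * xi   # entrywise error bound
```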

B.3 Proof for Lemma 7

It is easy to see that J21\|J\|_{2}\leq 1, and any unit vector orthogonal to 𝟏\bm{1} reaches the upper bound. For the second property, recall that L=UΛ(U)TL^{\star}=U^{\star}\Lambda^{\star}(U^{\star})^{T}, and L𝟏=0L^{\star}\bm{1}=0, therefore 𝟏\bm{1} is in the null space of LL^{\star}, and is orthogonal to each column of UU^{\star}. Hence,

𝟏TU=0,JU=U1n𝟏𝟏TU=U.\bm{1}^{T}U^{\star}=0,~{}JU^{\star}=U^{\star}-\frac{1}{n}\bm{1}\bm{1}^{T}U^{\star}=U^{\star}.

Finally, denote the i-th row of Z as z_{i}; then, by the triangle inequality,

eiT(JZ)2=ziz1++znn22Z2,.\|e_{i}^{T}(JZ)\|_{2}=\left\|z_{i}-\frac{z_{1}+\cdots+z_{n}}{n}\right\|_{2}\leq 2\|Z\|_{2,\infty}.
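The three properties of J in this proof can likewise be verified numerically (an illustration with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 12, 3
J = np.eye(n) - np.ones((n, n)) / n

# ||J||_2 <= 1, attained by any unit vector orthogonal to the all-ones vector.
assert np.linalg.norm(J, 2) <= 1 + 1e-12
v = np.zeros(n)
v[0], v[1] = 1, -1
v /= np.linalg.norm(v)                  # unit vector orthogonal to 1
assert np.isclose(np.linalg.norm(J @ v), 1.0)

# If the columns of U are orthogonal to 1 (as for U^star), then J U = U.
U = np.linalg.qr(J @ rng.standard_normal((n, r)))[0]
assert np.allclose(J @ U, U)

# Row-wise bound: ||e_i^T (J Z)||_2 <= 2 ||Z||_{2,inf}.
Z = rng.standard_normal((n, n))
lhs = np.linalg.norm(J @ Z, axis=1).max()
rhs = 2 * np.linalg.norm(Z, axis=1).max()
assert lhs <= rhs + 1e-12
```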