
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections

Tong Deng, Tianming Wang
T. Deng and T. Wang are with the School of Mathematics, Southwestern University of Finance and Economics, Chengdu, Sichuan, China.
Abstract

We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to localize point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theory and nonconvex approaches to the robust principal component analysis (RPCA) problem, we propose an alternating projection based algorithm, further accelerated by a tangent space projection technique. For the proposed algorithm, if the outliers are sparse enough, we establish linear convergence of the reconstructed points to the original points after centering and rotation alignment. Numerical experiments verify the state-of-the-art performance of the proposed algorithm.

1 Introduction

Multi-dimensional scaling (MDS) refers to the localization of points from their pairwise distances, and it has applications in wireless communication, chemistry, computer graphics, and so on [7]. The properties of classic MDS have been well studied.

In reality, the measured pairwise distances may be incomplete, noisy, or, due to sensor malfunction, corrupted by outliers. In this paper, we consider the robust reconstruction of point locations from distances corrupted by outliers, i.e., the robust multi-dimensional scaling (RMDS) problem. As shown in [5], even a single outlier in the measured distances causes false information to spread to all the reconstructed point locations in the MDS calculation.

Suppose that we have a set of points {xi}i=1nr\{x_{i}\}_{i=1}^{n}\subseteq\mathbb{R}^{r}. Denote the squared Euclidean distance matrix (EDM) as DD^{\star}. The (i,j)(i,j)-th entry of DD^{\star} is equal to xixj22\|x_{i}-x_{j}\|_{2}^{2}. If the pairwise distances are corrupted by outliers, the observed EDM DD is equal to D+SD^{\star}+S^{\star} for some outlier matrix SS^{\star}. One can see that the considered problem may be solved by methods for robust principal component analysis (RPCA) [4, 13, 16, 3]. However, naively adopting an RPCA solver for the RMDS problem risks neglecting the inner structure of the EDM, and may result in degraded reconstruction performance [10].

To combat outliers in the EDM, [8] casts the problem as an unconstrained optimization, where the variables are the data matrix of size n×rn\times r and the outlier matrix of size n×nn\times n. The data fidelity term of the objective is the squared Frobenius norm of the difference between the observed distance matrix and the distance matrix formed from the data matrix plus the outlier matrix. The regularization term is the l1l_{1} penalty on the outlier matrix, and the optimization is carried out via majorization-minimization. The iterative scheme of [8] was later improved in [12], which replaces the squared Frobenius norm with more outlier-robust M-estimators. A more recent work [9] formulates the RMDS problem as a constrained optimization: besides the outlier matrix, another sought-after matrix is constrained to be the distance matrix formed by nn points of dimension at most rr. Compared to [8, 12], [9] shows improved performance, and weights and bounds on the entries of the distance matrix can also be handled easily. Departing from the optimization perspective of the aforementioned works, [1] proposes to first detect outliers via the broken triangle inequalities among the distances, and then compute a weighted MDS that excludes the detected corrupted distances. However, as shown in [9], such an approach is often not enough to yield satisfactory reconstruction performance.

Inspired by classic MDS theories, and state-of-the-art nonconvex solvers [13, 3] for the RPCA problem, we propose a nonconvex algorithm that alternates between estimating the outlier matrix and the Gram matrix that generates the outlier-free EDM. Our contributions can be summarized as follows.

  • For the proposed algorithm, we establish linear convergence of the reconstructed points to the original points after centering and rotation alignment, provided that the outliers are sparse enough. The relationship between the number of tolerable outliers and the properties of the original points is also made explicit in the theoretical guarantees.

  • We numerically verify the performance of the proposed algorithm in the considered outlier-only setting, and demonstrate its advantages compared to other state-of-the-art solvers in the noise-plus-outlier setting.

1.1 Problem Formulation and Assumptions

Some necessary notations are first introduced. We then describe our problem formulation and the assumptions used to derive the theoretical guarantees.

Notations. The set of symmetric matrices of size nn is denoted by 𝕊n×n\mathbb{S}^{n\times n}. The set of rotation matrices of size rr is denoted by 𝒪(r)\mathcal{O}(r), i.e., 𝒪(r)={Gr×r|GTG=Ir}\mathcal{O}(r)=\{G\in\mathbb{R}^{r\times r}~{}|~{}G^{T}G=I_{r}\}. Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n}, supp(Z)\operatorname{supp}(Z) is the indices of the nonzeros in ZZ, and diag(Z)n\operatorname{diag}(Z)\in\mathbb{R}^{n} is the vector that contains the diagonal entries of ZZ. 𝟏n\bm{1}\in\mathbb{R}^{n} denotes the all-one vector, and J=In1n𝟏𝟏TJ=I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T}. Zn×r\forall Z\in\mathbb{R}^{n\times r}, Z2,:=max1ineiTZ2.\|Z\|_{2,\infty}:=\max_{1\leq i\leq n}\|e_{i}^{T}Z\|_{2}. For any matrix MM, the spectral norm, the Frobenius norm, and the entrywise infinity norm of MM are denoted by M2\|M\|_{2}, MF\|M\|_{\mathrm{F}}, and M\|M\|_{\infty}, respectively.

Recall that the set of points is {xi}i=1nr\{x_{i}\}_{i=1}^{n}\subseteq\mathbb{R}^{r}, and the corresponding EDM is DD^{\star}. As it turns out, it is possible to recover a set of points that is centered at the origin and preserves the pairwise distances of {xi}i=1n\{x_{i}\}_{i=1}^{n}. Denote the data matrix as Xn×rX\in\mathbb{R}^{n\times r}, whose ii-th row is xiTx_{i}^{T}. Let c=1ni=1nxic=\frac{1}{n}\sum_{i=1}^{n}x_{i}, and then denote the centered data matrix as Xcn×rX_{c}\in\mathbb{R}^{n\times r}, whose ii-th row is (xic)T(x_{i}-c)^{T}. Define the operator 𝒜:𝕊n×n𝕊n×n\mathcal{A}:\mathbb{S}^{n\times n}\rightarrow\mathbb{S}^{n\times n} such that

𝒜(Z)=diag(Z)𝟏T+𝟏diag(Z)T2Z,Z𝕊n×n.\mathcal{A}(Z)=\text{diag}(Z)\bm{1}^{T}+\bm{1}\text{diag}(Z)^{T}-2Z,\quad\forall Z\in\mathbb{S}^{n\times n}. (1)

One can verify that D=𝒜(XXT)=𝒜(XcXcT)D^{\star}=\mathcal{A}(XX^{T})=\mathcal{A}(X_{c}X_{c}^{T}). Hence, in many applications, it often suffices to reconstruct XcX_{c} from DD^{\star}.
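As a concrete illustration, the operator 𝒜\mathcal{A} and the identity D=𝒜(XXT)=𝒜(XcXcT)D^{\star}=\mathcal{A}(XX^{T})=\mathcal{A}(X_{c}X_{c}^{T}) can be checked numerically; below is a minimal numpy sketch, with randomly generated points standing in for the data.

```python
import numpy as np

def edm_from_gram(Z):
    """The operator A in (1): diag(Z) 1^T + 1 diag(Z)^T - 2 Z."""
    d = np.diag(Z)
    return d[:, None] + d[None, :] - 2.0 * Z

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))       # illustrative points, not from the paper
Xc = X - X.mean(axis=0)               # centered data matrix
D1 = edm_from_gram(X @ X.T)
D2 = edm_from_gram(Xc @ Xc.T)
assert np.allclose(D1, D2)            # A(X X^T) = A(Xc Xc^T)
assert np.isclose(D1[1, 4], np.sum((X[1] - X[4]) ** 2))  # squared distances
```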

Denote L:=XcXcTL^{\star}:=X_{c}X_{c}^{T}, and define the operator :𝕊n×n𝕊n×n\mathcal{B}:\mathbb{S}^{n\times n}\rightarrow\mathbb{S}^{n\times n} such that

(Z)=12JZJ,Z𝕊n×n.\mathcal{B}(Z)=-\frac{1}{2}\cdot JZJ,\quad\forall Z\in\mathbb{S}^{n\times n}. (2)

Since L𝟏=0L^{\star}\bm{1}=0, by Lemma 5, (D)=(𝒜(L))=L\mathcal{B}(D^{\star})=\mathcal{B}(\mathcal{A}(L^{\star}))=L^{\star}. Furthermore, XcX_{c} can be reconstructed, after rotation alignment, from the eigen-decomposition of LL^{\star}. Without loss of generality, we assume that XcX_{c} has full column rank; then LL^{\star} is a rank-rr positive semi-definite matrix. Denote the eigen-decomposition of LL^{\star} as UΛ(U)TU^{\star}\Lambda^{\star}(U^{\star})^{T}, where Un×rU^{\star}\in\mathbb{R}^{n\times r}, Λ=diag(λ1,,λr)\Lambda^{\star}=\operatorname{diag}(\lambda_{1}^{\star},\cdots,\lambda_{r}^{\star}), and λ1λr>0\lambda_{1}^{\star}\geq\cdots\geq\lambda_{r}^{\star}>0. Then denote X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}. One can show that there exists a rotation matrix Q𝒪(r)Q^{\star}\in\mathcal{O}(r) such that Xc=XQX_{c}=X^{\star}Q^{\star}.
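The reconstruction route just described (double centering via \mathcal{B}, truncated eigen-decomposition, then X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}) is classical MDS; the following is a minimal numpy sketch, with random points as a stand-in for the data.

```python
import numpy as np

def classical_mds(D, r):
    """Recover centered points from a clean squared EDM via L = B(D) = -J D J / 2."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    L = -0.5 * J @ D @ J
    vals, vecs = np.linalg.eigh(L)                 # ascending eigenvalue order
    vals, vecs = vals[::-1][:r], vecs[:, ::-1][:, :r]
    return vecs * np.sqrt(np.maximum(vals, 0.0))   # X = U Lambda^{1/2}

rng = np.random.default_rng(1)
Xc = rng.standard_normal((8, 2))
Xc -= Xc.mean(axis=0)
D = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
X_rec = classical_mds(D, r=2)
# X_rec equals Xc up to an orthogonal transform: the Gram matrices coincide
assert np.allclose(X_rec @ X_rec.T, Xc @ Xc.T)
```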

In the considered setting, the observed EDM DD is equal to D+SD^{\star}+S^{\star} for some outlier matrix S𝕊n×nS^{\star}\in\mathbb{S}^{n\times n}. Our aim is to recover LL^{\star} (eventually, XcX_{c}) from D=𝒜(L)+SD=\mathcal{A}(L^{\star})+S^{\star}. To derive our theoretical guarantees, similar to the incoherence and sparsity assumptions commonly made in the RPCA literature [13, 3], we assume the following about LL^{\star} and SS^{\star}.

Assumption 1.

Suppose LL^{\star} has the eigen-decomposition UΛ(U)TU^{\star}\Lambda^{\star}(U^{\star})^{T}, where Un×rU^{\star}\in\mathbb{R}^{n\times r}, Λ=diag(λ1,,λr)\Lambda^{\star}=\operatorname{diag}(\lambda_{1}^{\star},\cdots,\lambda_{r}^{\star}), and λ1λr>0\lambda_{1}^{\star}\geq\cdots\geq\lambda_{r}^{\star}>0. We assume that LL^{\star} is μ\mu-incoherent (one can show that μ[1,nr]\mu\in[1,\frac{n}{r}]; typically, μ=O(1)\mu=O(1)), i.e.,

U2,μrn.\|U^{\star}\|_{2,\infty}\leq\sqrt{\frac{\mu r}{n}}.

From this assumption, one can immediately get

\|L^{\star}\|_{\infty}=\max_{i,j}|e_{i}^{T}U^{\star}\Lambda^{\star}(U^{\star})^{T}e_{j}|\leq\frac{\mu r}{n}\lambda_{1}^{\star}.

Assumption 2.

The outlier matrix S𝕊n×nS^{\star}\in\mathbb{S}^{n\times n} is α\alpha-sparse, i.e., it has no more than αn\alpha n nonzero entries per row (and column).

2 Algorithm

Our proposed algorithm, described in Algorithm 1, is inspired by classic MDS theories, and state-of-the-art nonconvex solvers [13, 3] for RPCA.

Algorithm 1 RMDS via Accelerated Alternating Projections (RMDS-AAP)
1:Inputs: EDM DD, target rank rr, threshold parameter ξ0>0\xi^{0}>0, and decay rate γ(0,1)\gamma\in(0,1).
2:Initialization: S0=𝒯ξ0(D)S^{0}=\mathcal{T}_{\xi^{0}}(D), L1=r+(DS0)L^{1}=\mathcal{H}_{r}^{+}\mathcal{B}(D-S^{0}).
3:for k=1,2,k=1,2,\cdots do
4:     Sk=𝒯ξk(D𝒜(Lk))S^{k}=\mathcal{T}_{\xi^{k}}(D-\mathcal{A}(L^{k})), where ξk=ξ0γk\xi^{k}=\xi^{0}\cdot\gamma^{k}
5:     Lk+1=r+𝒫Tk(DSk)L^{k+1}=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D-S^{k})
6:end for

At initialization, with the hard thresholding function 𝒯ξ(z)(ξ>0):\mathcal{T}_{\xi}(z)~{}(\xi>0):\mathbb{R}\rightarrow\mathbb{R} defined as

𝒯ξ(z)={z|z|>ξ0|z|ξ,\mathcal{T}_{\xi}(z)=\left\{\begin{array}[]{cc}z&|z|>\xi\\ 0&|z|\leq\xi\end{array}\right.,

large entries corrupted by outliers in DD are picked out. Then in the spirit of MDS, L1L^{1} is computed from DS0D-S^{0}. Here, the operator \mathcal{B} is defined as in (2), and Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n} with the eigen-decomposition UΛUTU\Lambda U^{T}, where U=[u1,,un]n×nU=[u_{1},\cdots,u_{n}]\in\mathbb{R}^{n\times n}, Λ=diag(λ1,,λn)\Lambda=\operatorname{diag}(\lambda_{1},\cdots,\lambda_{n}), and λ1λn\lambda_{1}\geq\cdots\geq\lambda_{n},

r+(Z)=i=1rmax{λi,0}uiuiT.\mathcal{H}_{r}^{+}(Z)=\sum_{i=1}^{r}\max\{\lambda_{i},0\}\cdot u_{i}u_{i}^{T}.

For later iterations (k1k\geq 1), the threshold parameter ξk\xi^{k} is adjusted by the decay rate γ\gamma to pick out the entries corrupted by outliers. Denoting the eigen-decomposition of LkL^{k} as UkΛk(Uk)TU^{k}\Lambda^{k}(U^{k})^{T}, where Ukn×rU^{k}\in\mathbb{R}^{n\times r}, the operator 𝒫Tk\mathcal{P}_{T^{k}}, defined as follows,

𝒫Tk(Z):=Uk(Uk)TZ+ZUk(Uk)TUk(Uk)TZUk(Uk)T,Z𝕊n×n,\mathcal{P}_{T^{k}}(Z):=U^{k}(U^{k})^{T}Z+ZU^{k}(U^{k})^{T}-U^{k}(U^{k})^{T}ZU^{k}(U^{k})^{T},~{}\forall Z\in\mathbb{S}^{n\times n}, (3)

is the projection onto the tangent space of the manifold of symmetric positive semi-definite matrices of rank rr at LkL^{k}. Applying such a tangent space projection before the partial eigen-decomposition has proven useful both in deriving theoretical guarantees and in reducing the computational cost [15, 3, 2].
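For concreteness, the projection (3) can be written in a few lines; the sketch below also checks that 𝒫T\mathcal{P}_{T} is idempotent, as a projection should be (random matrices stand in for the iterates).

```python
import numpy as np

def proj_tangent(U, Z):
    """P_T(Z) = UU^T Z + Z UU^T - UU^T Z UU^T, for U with orthonormal columns."""
    P = U @ U.T
    return P @ Z + Z @ P - P @ Z @ P

rng = np.random.default_rng(0)
n, r = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
Z = rng.standard_normal((n, n)); Z = Z + Z.T
PZ = proj_tangent(U, Z)
assert np.allclose(proj_tangent(U, PZ), PZ)   # idempotent
L = U @ np.diag(np.arange(1.0, r + 1)) @ U.T
assert np.allclose(proj_tangent(U, L), L)     # L itself lies in the tangent space
```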

For RMDS-AAP, the dominant cost is the computation of LkL^{k} in each iteration. At initialization, the flop count is about 5n2+O(n2r)5n^{2}+O(n^{2}r), where the hidden constant comes from the partial eigen-decomposition. For later iterations, with the help of the tangent space projection, the partial eigen-decomposition of an n×nn\times n matrix is reduced to several matrix-matrix multiplications that cost about 5n2+2n2r+8nr25n^{2}+2n^{2}r+8nr^{2} flops, a QR factorization of an n×rn\times r matrix, and a small eigen-decomposition of a 2r×2r2r\times 2r matrix. The total cost is about 5n2+2n2r+O(nr2)5n^{2}+2n^{2}r+O(nr^{2}) flops. For completeness, we include the implementation details in Appendix A. Compared to [3], the main difference comes from applying \mathcal{B} before the tangent space projection.
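A compact reference implementation of Algorithm 1 is sketched below. It is written for readability: each step recomputes a partial eigen-decomposition with numpy instead of the factored fast update of Appendix A, so it matches the iteration but not the flop counts above.

```python
import numpy as np

def rmds_aap(D, r, xi0, gamma, iters=50):
    """Readability-first sketch of Algorithm 1 (RMDS-AAP)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n

    def A(Z):                                   # operator (1)
        d = np.diag(Z)
        return d[:, None] + d[None, :] - 2.0 * Z

    def B(Z):                                   # operator (2)
        return -0.5 * J @ Z @ J

    def H_r_plus(Z):                            # keep top-r nonnegative eigenpairs
        vals, vecs = np.linalg.eigh(Z)
        vals, vecs = vals[::-1][:r], vecs[:, ::-1][:, :r]
        return (vecs * np.maximum(vals, 0.0)) @ vecs.T, vecs

    S = np.where(np.abs(D) > xi0, D, 0.0)       # S^0 = T_{xi^0}(D)
    L, U = H_r_plus(B(D - S))                   # L^1
    for k in range(1, iters + 1):
        xi = xi0 * gamma ** k                   # xi^k = xi^0 * gamma^k
        R = D - A(L)
        S = np.where(np.abs(R) > xi, R, 0.0)    # S^k = T_{xi^k}(D - A(L^k))
        P = U @ U.T                             # tangent space projection (3)
        Z = B(D - S)
        L, U = H_r_plus(P @ Z + Z @ P - P @ Z @ P)
    return L, S

# sanity check: with no outliers, the clean Gram matrix is a fixed point
rng = np.random.default_rng(0)
Xc = rng.standard_normal((20, 2)); Xc -= Xc.mean(axis=0)
L_star = Xc @ Xc.T
d = np.diag(L_star)
D_star = d[:, None] + d[None, :] - 2.0 * L_star
L, S = rmds_aap(D_star, r=2, xi0=1.2 * D_star.max(), gamma=0.5, iters=10)
assert np.allclose(L, L_star)
```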

For the theoretical guarantees of RMDS-AAP, we have derived the following results.

Theorem 1.

Suppose that RMDS-AAP is provided with ξ0\xi^{0} that satisfies Dξ03D\|D^{\star}\|_{\infty}\leq\xi^{0}\leq 3\|D^{\star}\|_{\infty}, and γ[13,1)\gamma\in[\frac{1}{3},1). Denote κ:=λ1λr\kappa:=\frac{\lambda_{1}^{\star}}{\lambda_{r}^{\star}}. If

α11624γμrκ2,\alpha\leq\frac{1}{1624}\cdot\frac{\gamma}{\mu r\kappa^{2}},

then for all k0k\geq 0, supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}),

SkS(4D)γk,\|S^{k}-S^{\star}\|_{\infty}\leq(4\|D^{\star}\|_{\infty})\gamma^{k},

and

Lk+1LD4γk+1.\|L^{k+1}-L^{\star}\|_{\infty}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k+1}.

Remark 1. The constant in the bound for α\alpha can be further optimized. The established bound O(1μrκ2)O(\frac{1}{\mu r\kappa^{2}}) is better than the bound O(1μr2κ3)O(\frac{1}{\mu r^{2}\kappa^{3}}) in [3] for the RPCA problem, and the bound O(1μ2r2κ2)O(\frac{1}{\mu^{2}r^{2}\kappa^{2}}) in [2] for the robust recovery of low-rank Hankel matrices, demonstrating the merit of our analysis. When κ=O(1)\kappa=O(1), it matches the optimal bound O(1μr)O(\frac{1}{\mu r}) in [13] for the RPCA problem.

Proposition 1.

Assume the same conditions as in Theorem 1. For all k0k\geq 0, further denote Xk+1=Uk+1(Λk+1)12X^{k+1}=U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}, and X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}. Suppose the rank-rr SVD of (X)TXk+1\left(X^{\star}\right)^{T}X^{k+1} is Yk+1Σ~k+1(Zk+1)TY^{k+1}\widetilde{\Sigma}^{k+1}(Z^{k+1})^{T}. Set Rk+1=Yk+1(Zk+1)TR^{k+1}=Y^{k+1}(Z^{k+1})^{T}. Then

Xk+1XRk+12,μrκλ1nγk+1.\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}\leq\sqrt{\frac{\mu r\kappa\lambda_{1}^{\star}}{n}}\gamma^{k+1}.

Remark 2. As mentioned in the problem formulation, Xc=XQX_{c}=X^{\star}Q^{\star} for some rotation matrix QQ^{\star}. Furthermore, one can show that the computed minimizer to

minG𝒪(r)Xk+1XcGF=minG𝒪(r)Xk+1(XQ)GF\min_{G\in\mathcal{O}(r)}\|X^{k+1}-X_{c}G\|_{F}=\min_{G\in\mathcal{O}(r)}\|X^{k+1}-(X^{\star}Q^{\star})G\|_{F}

is (Q)TRk+1(Q^{\star})^{T}R^{k+1}. Therefore, Xk+1XRk+1=Xk+1Xc(Q)TRk+1X^{k+1}-X^{\star}R^{k+1}=X^{k+1}-X_{c}(Q^{\star})^{T}R^{k+1}, and Proposition 1 actually establishes the linear convergence of the reconstructed points Xk+1X^{k+1} to XcX_{c} after the best rotation alignment by (Q)TRk+1(Q^{\star})^{T}R^{k+1}. Also, the contraction of error in the l2l_{2} norm is uniform for all the points.
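The rotation alignment appearing in Proposition 1 and Remark 2 is the classic orthogonal Procrustes solution, computed from an SVD. A minimal sketch, on randomly generated stand-in matrices:

```python
import numpy as np

def best_rotation(A, B):
    """Orthogonal G minimizing ||B - A G||_F, from the SVD of A^T B (Procrustes)."""
    Y, _, Zt = np.linalg.svd(A.T @ B)
    return Y @ Zt

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a stand-in orthogonal transform
G = best_rotation(X, X @ Q)
assert np.allclose(G, Q)                           # the planted transform is recovered
```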

3 Numerical Experiments

We perform tests on the plus sign example that appeared in [8, 9]. The tests are first conducted in the noiseless case, i.e., the distances are only corrupted by outliers, to verify our theoretical guarantees for RMDS-AAP. We then consider the more realistic case in which the distances are corrupted by both noise and outliers, and show the empirical performance of RMDS-AAP.

Noiseless Case. In this case, we consider a plus sign consisting of 101101 2D points {xi}i=1101\{x_{i}\}_{i=1}^{101}, which are centered at c=[66]Tc=[6~{}6]^{T} and have four end points at [196]T[-19~{}6]^{T}, [316]T[31~{}6]^{T}, [619]T[6~{}-19]^{T}, and [631]T[6~{}31]^{T}. The data matrix X101×2X\in\mathbb{R}^{101\times 2} has xiTx_{i}^{T} as its ii-th row, and Xc=X𝟏cTX_{c}=X-\bm{1}c^{T}. The ground truth Gram matrix L=XcXcT101×101L^{\star}=X_{c}X_{c}^{T}\in\mathbb{R}^{101\times 101} has incoherence parameter μ3\mu\approx 3 and condition number κ=1\kappa=1. To test the reconstruction performance of RMDS-AAP against outliers, mm out of the N:=101×1002=5050N:=\frac{101\times 100}{2}=5050 distances {dij:=xixj2|1j<in}\left\{d_{ij}:=\|x_{i}-x_{j}\|_{2}~{}|~{}1\leq j<i\leq n\right\} are randomly sampled, and each is corrupted by an outlier whose value is uniformly drawn from [040][0~{}40]. Denote the percentage of outliers as p:=mNp:=\frac{m}{N}. For p{0.05,0.1,0.15,,0.6}p\in\{0.05,0.1,0.15,\cdots,0.6\}, we test RMDS-AAP with different initial threshold values ξ0\xi^{0} and decay rates γ\gamma. The results, averaged over 1000 simulations, are reported in Fig. 1.

Figure 1: Performances of RMDS-AAP for the plus sign (101 points), where the distances are only corrupted by outliers.

In the left subfigure, the logarithms of the averaged errors in terms of log10(SkS)\log_{10}(\|S^{k}-S^{\star}\|_{\infty}), log10(Lk+1L)\log_{10}(\|L^{k+1}-L^{\star}\|_{\infty}), and log10(Xk+1XRk+1)\log_{10}(\|X^{k+1}-X^{\star}R^{k+1}\|_{\infty}) are plotted when p=0.05p=0.05, ξ0=1.2D\xi^{0}=1.2\|D^{\star}\|_{\infty}, and γ=0.5\gamma=0.5. One can see the linear convergence of the three terms, in agreement with Theorem 1 and Proposition 1. In the middle and right subfigures, the reconstruction of each simulation is considered successful if, at convergence,

Xk+1XRk+12,<0.01X2,.\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}<0.01\cdot\|X^{\star}\|_{2,\infty}.

In the middle subfigure, ξ0\xi^{0} is set to 1.2D1.2\|D^{\star}\|_{\infty}, and the success rates computed over the 1000 simulations are compared for γ{0.5,0.7,0.9}\gamma\in\{0.5,0.7,0.9\}. As predicted by Theorem 1, larger γ\gamma indeed admits successful reconstruction from more outliers. In the right subfigure, the success rates for γ=0.9\gamma=0.9 and ξ0{D,1.1D,,1.5D}\xi^{0}\in\{\|D^{\star}\|_{\infty},1.1\|D^{\star}\|_{\infty},\cdots,1.5\|D^{\star}\|_{\infty}\} are shown, where white indicates a success rate of 1 and black a success rate of 0. One can see that RMDS-AAP shows the desired robustness to the initial threshold value ξ0\xi^{0}.

Figure 2: Performance comparisons for the plus sign (25 points) with 4 anchor points, where the noisy distances are further corrupted by outliers. The error bars show the standard deviation values of the two methods.

Noisy Case. We then empirically test the reconstruction ability of RMDS-AAP when the distances are also noisy. Following the setup of [9], 25 points centered at c=[66]Tc=[6~{}6]^{T} form the plus sign, with 4 anchor points at [06]T[0~{}6]^{T}, [126]T[12~{}6]^{T}, [60]T[6~{}0]^{T}, and [612]T[6~{}12]^{T}. For 1j<in1\leq j<i\leq n, each dijd_{ij} is first perturbed by zero-mean Gaussian noise with variance σ2{0,0.1,0.2}\sigma^{2}\in\{0,0.1,0.2\}. Then m{15,30,45,60,75}m\in\{15,30,45,60,75\} out of the N:=25×2426=294N:=\frac{25\times 24}{2}-6=294 distances, discounting the 6 pairwise distances between the 4 anchor points, are randomly sampled, and each is independently corrupted by an outlier whose value is uniformly drawn from [020][0~{}20]. We use ξ0=1.2D\xi^{0}=1.2\|D^{\star}\|_{\infty} and γ=0.7\gamma=0.7 for RMDS-AAP. At convergence, a mapping 𝒯\mathcal{T}, consisting of a translation and a rotation, is constructed to find the best alignment between the reconstructed 4 anchor points and the original 4 anchor points. Denote Ω\Omega as the indices of the 4 anchor points, and denote the ii-th row of the Xk+1X^{k+1} at convergence as (xirec)T(x_{i}^{\text{rec}})^{T}. The error measure is computed over the other 21 points after alignment, i.e., RMSE:=iΩ𝒯(xirec)xi22/21.\text{RMSE}:=\sqrt{\sum_{i\notin\Omega}\|\mathcal{T}(x_{i}^{\text{rec}})-x_{i}\|_{2}^{2}/21}. Since FSMDS [9] shows the best performance on this example among RMDS [8], HQMMDS [12], and TMDS [1], we only compare RMDS-AAP with FSMDS. In each setting, the mean and standard deviation of the RMSE values of RMDS-AAP are computed over 1000 simulations. From Fig. 2, one can see that regardless of the noise strength, RMDS-AAP generally produces reconstructions with lower RMSE values, as well as smaller variations.
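An anchor-based alignment such as 𝒯\mathcal{T} can be realized by a Procrustes fit on the centered anchors plus a translation. The helper below is a hypothetical implementation of such a map (the names and the planted pose are illustrative, not from the paper); the RMSE computation mirrors the error measure above.

```python
import numpy as np

def rigid_align(anchors_rec, anchors_true):
    """Return T (translation + orthogonal map) fitting reconstructed anchors
    to the true anchors: T(p) = (p - c_rec) G + c_true, with G from Procrustes."""
    c_rec, c_true = anchors_rec.mean(axis=0), anchors_true.mean(axis=0)
    Y, _, Zt = np.linalg.svd((anchors_rec - c_rec).T @ (anchors_true - c_true))
    G = Y @ Zt
    return lambda P: (P - c_rec) @ G + c_true

rng = np.random.default_rng(5)
anchors = rng.standard_normal((4, 2))
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # unknown pose of the reconstruction
t = np.array([1.5, -2.0])
T = rigid_align(anchors @ Q + t, anchors)
others = rng.standard_normal((21, 2))
rmse = np.sqrt(np.mean(np.sum((T(others @ Q + t) - others) ** 2, axis=1)))
assert rmse < 1e-10                                # exact pose, so essentially zero error
```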

4 Proofs

We first collect some useful results in Section 4.1. Some of them are from the literature, and the proofs of the new ones are included in the Appendix. We then present the proof of Theorem 1, split into the initialization phase (Section 4.2) and the iteration phase (Section 4.3). The proof of Proposition 1 is presented in Section 4.4.

4.1 Useful Lemmas

Lemma 1 ([6, Lemma 19]).

Let L=UΛ(U)TL^{\star}=U^{\star}\Lambda^{\star}(U^{\star})^{T}, and L=r+(L+E)L=\mathcal{H}_{r}^{+}\left(L^{\star}+E\right) for some perturbation matrix E𝕊n×nE\in\mathbb{S}^{n\times n}. Denote the eigen-decomposition of LL as UΛUTU\Lambda U^{T}. Suppose the rank-rr SVD of (U)TU\left(U^{\star}\right)^{T}U is AΣBTA\Sigma B^{T}. Set G=ABTG=AB^{T} and Δ=UUG\Delta=U-U^{\star}G. If E212λr\|E\|_{2}\leq\frac{1}{2}\lambda_{r}^{\star}, then

LL\displaystyle\left\|L-L^{\star}\right\|_{\infty}\leq Δ2,(U2,+U2,)Λ2+(3+4κ)U2,2E2.\displaystyle\|\Delta\|_{2,\infty}\left(\|U\|_{2,\infty}+\|U^{\star}\|_{2,\infty}\right)\|\Lambda\|_{2}+(3+4\kappa)\left\|U^{\star}\right\|_{2,\infty}^{2}\|E\|_{2}.
Lemma 2 ([6, Lemma 1]).

Under the same conditions of Lemma 1, and with H:=(U)TUH:=(U^{\star})^{T}U,

ΛGGΛ2\displaystyle\left\|\Lambda^{\star}G-G\Lambda^{\star}\right\|_{2} (2+2λ1λrE2)E2,\displaystyle\leq\left(2+\frac{2\lambda_{1}^{\star}}{\lambda_{r}^{\star}-\|E\|_{2}}\right)\|E\|_{2},
ΛHGΛ2\displaystyle\left\|\Lambda^{\star}H-G\Lambda^{\star}\right\|_{2} (2+λ1λrE2)E2.\displaystyle\leq\left(2+\frac{\lambda_{1}^{\star}}{\lambda_{r}^{\star}-\|E\|_{2}}\right)\|E\|_{2}.
Lemma 3 ([11, Lemma 47]).

Assume the same conditions of Lemma 1. Denote X=U(Λ)12X=U(\Lambda)^{\frac{1}{2}}, and X=U(Λ)12X^{\star}=U^{\star}(\Lambda^{\star})^{\frac{1}{2}}. Suppose the rank-rr SVD of (X)TX\left(X^{\star}\right)^{T}X is YΣ~ZTY\widetilde{\Sigma}Z^{T}. Set R=YZTR=YZ^{T}. Then,

GR228κ32E2λr.\|G-R\|_{2}\leq 28\kappa^{\frac{3}{2}}\cdot\frac{\|E\|_{2}}{\lambda_{r}^{\star}}.
Lemma 4 ([13, Lemma 4]).

If S𝕊n×nS\in\mathbb{S}^{n\times n} is α\alpha-sparse, i.e., SS has no more than αn\alpha n nonzero entries per row (and column), then S2αnS\left\|S\right\|_{2}\leq\alpha n\cdot\left\|S\right\|_{\infty}.
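Lemma 4 is easy to illustrate numerically: for a symmetric matrix with at most kk nonzeros per row, the spectral norm is at most kk times the largest entry magnitude. A small sketch (the construction symmetrizes a k-per-row draw, so the result has at most 2k2k nonzeros per row, i.e., it is α\alpha-sparse with αn=2k\alpha n=2k):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 4                        # at most k nonzeros per row before symmetrizing
S = np.zeros((n, n))
for i in range(n):
    cols = rng.choice(n, size=k, replace=False)
    S[i, cols] = rng.uniform(-1.0, 1.0, size=k)
S = np.triu(S) + np.triu(S, 1).T    # symmetric, at most 2k nonzeros per row
per_row = np.max((S != 0).sum(axis=1))
assert per_row <= 2 * k
# Lemma 4 with alpha * n = per_row: ||S||_2 <= (alpha n) * ||S||_inf
assert np.linalg.norm(S, 2) <= per_row * np.max(np.abs(S)) + 1e-12
```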

Lemma 5.

The operator 𝒜\mathcal{A} satisfies:

  1. 1.

    Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n}, 𝒜(Z)4Z\|\mathcal{A}(Z)\|_{\infty}\leq 4\|Z\|_{\infty};

  2. 2.

    Z𝕊n×n\forall Z\in\mathbb{S}^{n\times n} with Z𝟏=0Z\bm{1}=0, (𝒜(Z))=Z\mathcal{B}(\mathcal{A}(Z))=Z.

The proof of Lemma 5 is deferred to Appendix B.1.
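Both parts of Lemma 5 can also be checked numerically; a minimal sketch on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
J = np.eye(n) - np.ones((n, n)) / n

def A(Z):                            # the operator (1)
    d = np.diag(Z)
    return d[:, None] + d[None, :] - 2.0 * Z

def B(Z):                            # the operator (2)
    return -0.5 * J @ Z @ J

Z = rng.standard_normal((n, n)); Z = Z + Z.T
Zc = J @ Z @ J                       # enforce Zc @ 1 = 0
assert np.max(np.abs(A(Z))) <= 4.0 * np.max(np.abs(Z)) + 1e-12   # item 1
assert np.allclose(B(A(Zc)), Zc)                                 # item 2
```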

Lemma 6.

If 𝒜(Lk)Dξk\|\mathcal{A}(L^{k})-D^{\star}\|_{\infty}\leq\xi^{k}, and Sk=𝒯ξk(D𝒜(Lk))S^{k}=\mathcal{T}_{\xi^{k}}(D-\mathcal{A}(L^{k})), then supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}), and

SkS𝒜(Lk)D+ξk.\|S^{k}-S^{\star}\|_{\infty}\leq\|\mathcal{A}(L^{k})-D^{\star}\|_{\infty}+\xi^{k}.

The proof of Lemma 6 is deferred to Appendix B.2.

Lemma 7.

The matrix J=In1n𝟏𝟏TJ=I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T} satisfies:

  1. 1.

    J2=1\|J\|_{2}=1,

  2. 2.

    JU=UJU^{\star}=U^{\star},

  3. 3.

    JZ2,2Z2,,Zn×r\|JZ\|_{2,\infty}\leq 2\|Z\|_{2,\infty},~{}\forall Z\in\mathbb{R}^{n\times r}.

The proof of Lemma 7 is deferred to Appendix B.3.

Lemma 8 ([3, Lemma 6]).

Suppose LL is a rank-rr positive semi-definite matrix with the eigen-decomposition L=UΛ(U)TL=U\Lambda(U)^{T}, where Un×rU\in\mathbb{R}^{n\times r}. Let 𝒫T\mathcal{P}_{T} be the projection onto the tangent space of the manifold of symmetric positive semi-definite matrices of rank rr at LL, as defined in (3). Then,

(𝒫T)(L)2LL22λ1.\|(\mathcal{I}-\mathcal{P}_{T})(L^{\star})\|_{2}\leq\frac{\|L-L^{\star}\|_{2}^{2}}{\lambda_{1}^{\star}}.
Lemma 9 ([3, Lemma 8]).

Under the same conditions of Lemma 8,

𝒫T(Z)243Z2,Z𝕊n×n.\|\mathcal{P}_{T}(Z)\|_{2}\leq\frac{4}{3}\|Z\|_{2},~{}\forall Z\in\mathbb{S}^{n\times n}.

4.2 Proof for the Initialization Phase

Taking L0L^{0} to be the zero matrix, we have S0=𝒯ξ0(D𝒜(L0))S^{0}=\mathcal{T}_{\xi^{0}}(D-\mathcal{A}(L^{0})). Since

𝒜(L0)D=Dξ0,\|\mathcal{A}(L^{0})-D^{\star}\|_{\infty}=\|D^{\star}\|_{\infty}\leq\xi^{0},

by Lemma 6, supp(S0)supp(S)\text{supp}(S^{0})\subseteq\text{supp}(S^{\star}), and

S0SD+ξ04D16L16μrnλ1,\|S^{0}-S^{\star}\|_{\infty}\leq\|D^{\star}\|_{\infty}+\xi^{0}\leq 4\|D^{\star}\|_{\infty}\leq 16\|L^{\star}\|_{\infty}\leq 16\frac{\mu r}{n}\lambda_{1}^{\star},

where the third inequality follows from D=𝒜(L)D^{\star}=\mathcal{A}(L^{\star}) and Lemma 5. Denoting E0:=(S0S)E^{0}:=\mathcal{B}(S^{0}-S^{\star}),

E02=(S0S)2=\displaystyle\|E^{0}\|_{2}=\|\mathcal{B}(S^{0}-S^{\star})\|_{2}= 12J(S0S)J2\displaystyle\frac{1}{2}\|J(S^{0}-S^{\star})J\|_{2} (4)
\displaystyle\leq 12S0S2\displaystyle\frac{1}{2}\|S^{0}-S^{\star}\|_{2}
\displaystyle\leq αn2S0S\displaystyle\frac{\alpha n}{2}\|S^{0}-S^{\star}\|_{\infty}
\displaystyle\leq 2(αn)D8(αμr)λ1,\displaystyle 2(\alpha n)\|D^{\star}\|_{\infty}\leq 8(\alpha\mu r)\lambda_{1}^{\star},

where the first inequality follows from Lemma 7, and the second inequality follows from Lemma 4. Therefore E02\|E^{0}\|_{2} is no more than λr2\frac{\lambda_{r}^{\star}}{2} if α1161μrκ\alpha\leq\frac{1}{16}\cdot\frac{1}{\mu r\kappa}.

Suppose L1=r+(LE0)L^{1}=\mathcal{H}_{r}^{+}(L^{\star}-E^{0}) has the eigen-decomposition U1Λ1(U1)TU^{1}\Lambda^{1}(U^{1})^{T}, where U1n×rU^{1}\in\mathbb{R}^{n\times r}, and Λ1=diag(λ11,,λr1)\Lambda^{1}=\text{diag}(\lambda_{1}^{1},\cdots,\lambda_{r}^{1}). Lemma 1 is applicable once we have the row-wise bound of Δ1:=U1UG1\Delta^{1}:=U^{1}-U^{\star}G^{1}, where G1G^{1} is the best rotation matrix between U1U^{1} and UU^{\star} computed from the SVD of H1:=(U)TU1H^{1}:=(U^{\star})^{T}U^{1}. Consider i=1,,ni=1,\cdots,n.

eiTΔ1=\displaystyle e_{i}^{T}\Delta^{1}= eiTU1eiTUG1\displaystyle e_{i}^{T}U^{1}-e_{i}^{T}U^{\star}G^{1}
=\displaystyle= eiT(LE0)U1(Λ1)1eiTUG1\displaystyle e_{i}^{T}(L^{\star}-E^{0})U^{1}(\Lambda^{1})^{-1}-e_{i}^{T}U^{\star}G^{1}
=\displaystyle= eiTUΛ[(U)TU1(Λ1)1(Λ)1G1]eiTE0U1(Λ1)1\displaystyle e_{i}^{T}U^{\star}\Lambda^{\star}\left[(U^{\star})^{T}U^{1}(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}G^{1}\right]-e_{i}^{T}E^{0}U^{1}(\Lambda^{1})^{-1}
=\displaystyle= eiTUΛ[(U)TU1(Λ)1(Λ)1G1]T1\displaystyle\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}\left[(U^{\star})^{T}U^{1}(\Lambda^{\star})^{-1}-(\Lambda^{\star})^{-1}G^{1}\right]}_{T_{1}}
+eiTUΛ(U)TU1[(Λ1)1(Λ)1]T2eiTE0U1(Λ1)1T3.\displaystyle+\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}(U^{\star})^{T}U^{1}\left[(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}\right]}_{T_{2}}-\underbrace{e_{i}^{T}E^{0}U^{1}(\Lambda^{1})^{-1}}_{T_{3}}.

Bounding T1T_{1}.

T12=\displaystyle\|T_{1}\|_{2}= eiTU[Λ(U)TU1G1Λ](Λ)12\displaystyle\|e_{i}^{T}U^{\star}\left[\Lambda^{\star}(U^{\star})^{T}U^{1}-G^{1}\Lambda^{\star}\right](\Lambda^{\star})^{-1}\|_{2}
\displaystyle\leq eiTU2ΛH1G1Λ2(Λ)12\displaystyle\|e_{i}^{T}U^{\star}\|_{2}\cdot\|\Lambda^{\star}H^{1}-G^{1}\Lambda^{\star}\|_{2}\cdot\|(\Lambda^{\star})^{-1}\|_{2}
\displaystyle\leq μrn(2+2κ)E021λr,\displaystyle\sqrt{\frac{\mu r}{n}}\cdot(2+2\kappa)\|E^{0}\|_{2}\cdot\frac{1}{\lambda_{r}^{\star}},

where in the last inequality, Lemma 2 is applied with the bound E0212λr\|E^{0}\|_{2}\leq\frac{1}{2}\lambda_{r}^{\star}.

Bounding T2T_{2}. Due to Weyl’s inequality,

|λj1λj|E02,j=1,,r.|\lambda_{j}^{1}-\lambda_{j}^{\star}|\leq\|E^{0}\|_{2},~{}j=1,\cdots,r.
(Λ1)1(Λ)12=maxj=1,,r|1λj11λj|=maxj=1,,r|λj1λj|λj1λj2E02(λr)2,\|(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}\|_{2}=\max_{j=1,\cdots,r}\left|\frac{1}{\lambda_{j}^{1}}-\frac{1}{\lambda_{j}^{\star}}\right|=\max_{j=1,\cdots,r}\frac{|\lambda_{j}^{1}-\lambda_{j}^{\star}|}{\lambda_{j}^{1}\lambda_{j}^{\star}}\leq 2\frac{\|E^{0}\|_{2}}{(\lambda_{r}^{\star})^{2}},

where λr112λr\lambda_{r}^{1}\geq\frac{1}{2}\lambda_{r}^{\star} is used in the last inequality. Therefore,

T22\displaystyle\|T_{2}\|_{2}\leq eiTU2Λ2(U)TU12(Λ1)1(Λ)12μrn2κE02λr.\displaystyle\|e_{i}^{T}U^{\star}\|_{2}\cdot\|\Lambda^{\star}\|_{2}\cdot\|(U^{\star})^{T}U^{1}\|_{2}\cdot\|(\Lambda^{1})^{-1}-(\Lambda^{\star})^{-1}\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot 2\kappa\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}.

Bounding T3T_{3}.

T32eiTE0U12(Λ1)12eiTE0U122λr,\displaystyle\|T_{3}\|_{2}\leq\|e_{i}^{T}E^{0}U^{1}\|_{2}\cdot\|(\Lambda^{1})^{-1}\|_{2}\leq\|e_{i}^{T}E^{0}U^{1}\|_{2}\cdot\frac{2}{\lambda_{r}^{\star}},

therefore we just need to bound eiTE0U12\|e_{i}^{T}E^{0}U^{1}\|_{2}.

eiTE0U12eiTE0UG12+eiTE0(U1UG1)2=eiTE0U2+eiTE0Δ12.\displaystyle\|e_{i}^{T}E^{0}U^{1}\|_{2}\leq\|e_{i}^{T}E^{0}U^{\star}G^{1}\|_{2}+\|e_{i}^{T}E^{0}(U^{1}-U^{\star}G^{1})\|_{2}=\|e_{i}^{T}E^{0}U^{\star}\|_{2}+\|e_{i}^{T}E^{0}\Delta^{1}\|_{2}.

For the first term,

eiTE0U2=eiT(S0S)U2=\displaystyle\|e_{i}^{T}E^{0}U^{\star}\|_{2}=\|e_{i}^{T}\mathcal{B}(S^{0}-S^{\star})U^{\star}\|_{2}= 12eiTJ(S0S)JU2\displaystyle\frac{1}{2}\|e_{i}^{T}J(S^{0}-S^{\star})JU^{\star}\|_{2}
=\displaystyle= 12eiTJ(S0S)U2\displaystyle\frac{1}{2}\|e_{i}^{T}J(S^{0}-S^{\star})U^{\star}\|_{2}
\displaystyle\leq 12J(S0S)U2,\displaystyle\frac{1}{2}\|J(S^{0}-S^{\star})U^{\star}\|_{2,\infty}
\displaystyle\leq (S0S)U2,\displaystyle\|(S^{0}-S^{\star})U^{\star}\|_{2,\infty}
\displaystyle\leq (αn)S0Sμrn4(αn)Dμrn,\displaystyle(\alpha n)\|S^{0}-S^{\star}\|_{\infty}\cdot\sqrt{\frac{\mu r}{n}}\leq 4(\alpha n)\|D^{\star}\|_{\infty}\sqrt{\frac{\mu r}{n}},

where the third equality and the second inequality follow from Lemma 7, and the third inequality follows from the triangle inequality, the α\alpha-sparsity of S0SS^{0}-S^{\star}, and the row-wise bound on UU^{\star}.

The second term can be bounded similarly. Since

𝟏T(LE0)=𝟏T(L(S0S))=0,\bm{1}^{T}(L^{\star}-E^{0})=\bm{1}^{T}(L^{\star}-\mathcal{B}(S^{0}-S^{\star}))=0,

𝟏\bm{1} is in the null space of L1L^{1}. From 𝟏T(U1UG1)=0\bm{1}^{T}(U^{1}-U^{\star}G^{1})=0, we can get JΔ1=Δ1J\Delta^{1}=\Delta^{1}, and

eiTE0Δ12\displaystyle\|e_{i}^{T}E^{0}\Delta^{1}\|_{2}\leq (S0S)Δ12,\displaystyle\|(S^{0}-S^{\star})\Delta^{1}\|_{2,\infty}
\displaystyle\leq (αn)S0SΔ12,4(αn)DΔ12,.\displaystyle(\alpha n)\|S^{0}-S^{\star}\|_{\infty}\cdot\|\Delta^{1}\|_{2,\infty}\leq 4(\alpha n)\|D^{\star}\|_{\infty}\|\Delta^{1}\|_{2,\infty}.

Therefore,

T32\displaystyle\|T_{3}\|_{2}\leq 8(αn)Dλrμrn+8(αn)DλrΔ12,\displaystyle 8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\|\Delta^{1}\|_{2,\infty}
\displaystyle\leq 8(αn)Dλrμrn+32(αμrκ)Δ12,\displaystyle 8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+32(\alpha\mu r\kappa)\|\Delta^{1}\|_{2,\infty}
\displaystyle\leq 8(αn)Dλrμrn+12Δ12,\displaystyle 8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+\frac{1}{2}\|\Delta^{1}\|_{2,\infty}

where in the second inequality, the bound D4μrnλ1\|D^{\star}\|_{\infty}\leq 4\frac{\mu r}{n}\lambda_{1}^{\star} is used again, and the last inequality holds if α1641μrκ\alpha\leq\frac{1}{64}\cdot\frac{1}{\mu r\kappa}.

Combining the bounds of T1T_{1} to T3T_{3}, and taking the maximum with respect to ii,

Δ12,(2+2κ)E02λrμrn+2κE02λrμrn+8(αn)Dλrμrn+12Δ12,,\displaystyle\|\Delta^{1}\|_{2,\infty}\leq(2+2\kappa)\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+2\kappa\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+8(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+\frac{1}{2}\|\Delta^{1}\|_{2,\infty},

consequently,

Δ12,\displaystyle\|\Delta^{1}\|_{2,\infty}\leq [12κE02λr+16(αn)Dλr]μrn\displaystyle\left[12\kappa\frac{\|E^{0}\|_{2}}{\lambda_{r}^{\star}}+16(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\right]\sqrt{\frac{\mu r}{n}} (5)
\displaystyle\leq [24(ακn)Dλr+16(αn)Dλr]μrn\displaystyle\left[24(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}+16(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\right]\sqrt{\frac{\mu r}{n}}
\displaystyle\leq 40(ακn)Dλrμrn160(αμrκ2)μrnμrn\displaystyle 40(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\leq 160(\alpha\mu r\kappa^{2})\sqrt{\frac{\mu r}{n}}\leq\sqrt{\frac{\mu r}{n}}

where in the second inequality, the bound E022(αn)D\|E^{0}\|_{2}\leq 2(\alpha n)\|D^{\star}\|_{\infty} is used, and the last inequality holds if α11601μrκ2\alpha\leq\frac{1}{160}\cdot\frac{1}{\mu r\kappa^{2}}. As a result,

eiTU12eiTUG12+eiTΔ12=eiTU2+eiTΔ12,\|e_{i}^{T}U^{1}\|_{2}\leq\|e_{i}^{T}U^{\star}G^{1}\|_{2}+\|e_{i}^{T}\Delta^{1}\|_{2}=\|e_{i}^{T}U^{\star}\|_{2}+\|e_{i}^{T}\Delta^{1}\|_{2},

therefore U12,2μrn\|U^{1}\|_{2,\infty}\leq 2\sqrt{\frac{\mu r}{n}}. Also, under this bound of α\alpha we have

E022(αn)D8(αμr)λ1120λr.\|E^{0}\|_{2}\leq 2(\alpha n)\|D^{\star}\|_{\infty}\leq 8(\alpha\mu r)\lambda_{1}^{\star}\leq\frac{1}{20}\lambda_{r}^{\star}.

Now by Lemma 1,

L1L\displaystyle\left\|L^{1}-L^{\star}\right\|_{\infty}\leq Δ12,(U12,+U2,)Λ12+(3+4κ)U2,2E02\displaystyle\|\Delta^{1}\|_{2,\infty}\left(\|U^{1}\|_{2,\infty}+\|U^{\star}\|_{2,\infty}\right)\|\Lambda^{1}\|_{2}+(3+4\kappa)\left\|U^{\star}\right\|_{2,\infty}^{2}\|E^{0}\|_{2} (6)
\displaystyle\leq 40(ακn)Dλrμrn3μrn2120λ1+7κμrn2(αn)D\displaystyle 40(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\cdot 3\sqrt{\frac{\mu r}{n}}\cdot\frac{21}{20}\lambda_{1}^{\star}+7\kappa\frac{\mu r}{n}\cdot 2(\alpha n)\|D^{\star}\|_{\infty}
=\displaystyle= (126αμrκ2+14αμrκ)DD4γ\displaystyle(126\alpha\mu r\kappa^{2}+14\alpha\mu r\kappa)\|D^{\star}\|_{\infty}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma

if α1560γμrκ2\alpha\leq\frac{1}{560}\cdot\frac{\gamma}{\mu r\kappa^{2}}.

4.3 Proof for the Iteration Phase

For k1k\geq 1, our induction hypotheses are the following:

Ek12\displaystyle\|E^{k-1}\|_{2}\leq 4(αn)Dγk1,\displaystyle 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}, (7a)
Δk2,\displaystyle\|\Delta^{k}\|_{2,\infty}\leq 120(ακn)Dλrμrnγk1,\displaystyle 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k-1}, (7b)
LkL\displaystyle\|L^{k}-L^{\star}\|_{\infty}\leq D4γk.\displaystyle\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k}. (7c)

One can see from (4), (5), and (6) that the three bounds hold for k=1k=1. For any k1k\geq 1,

Ek1216(αμr)λ1120λr\|E^{k-1}\|_{2}\leq 16(\alpha\mu r)\lambda_{1}^{\star}\leq\frac{1}{20}\lambda_{r}^{\star}

when α13201μrκ\alpha\leq\frac{1}{320}\cdot\frac{1}{\mu r\kappa}; and

Δk2,480(αμrκ2)μrnμrn\|\Delta^{k}\|_{2,\infty}\leq 480(\alpha\mu r\kappa^{2})\sqrt{\frac{\mu r}{n}}\leq\sqrt{\frac{\mu r}{n}}

when α14801μrκ2\alpha\leq\frac{1}{480}\cdot\frac{1}{\mu r\kappa^{2}} so that Uk2,2μrn\|U^{k}\|_{2,\infty}\leq 2\sqrt{\frac{\mu r}{n}}.

We now prove the induction hypotheses for iteration (k+1)(k+1). With the bound LkLD4γk\|L^{k}-L^{\star}\|_{\infty}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k}, by Lemma 5,

𝒜(Lk)DDγkξk.\|\mathcal{A}(L^{k})-D^{\star}\|_{\infty}\leq\|D^{\star}\|_{\infty}\gamma^{k}\leq\xi^{k}.

Then by Lemma 6, supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}), and

SkSDγk+ξk(4D)γk.\|S^{k}-S^{\star}\|_{\infty}\leq\|D^{\star}\|_{\infty}\gamma^{k}+\xi^{k}\leq(4\|D^{\star}\|_{\infty})\gamma^{k}.

By Lemma 5, (𝒜(L))=L\mathcal{B}(\mathcal{A}(L^{\star}))=L^{\star}, and

Lk+1=r+𝒫Tk(DSk)=\displaystyle L^{k+1}=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D-S^{k})= r+𝒫Tk(D+SSk)\displaystyle\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D^{\star}+S^{\star}-S^{k})
=\displaystyle= r+𝒫Tk(𝒜(L)+SSk)=r+𝒫Tk(L+(SSk)).\displaystyle\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(\mathcal{A}(L^{\star})+S^{\star}-S^{k})=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}(L^{\star}+\mathcal{B}(S^{\star}-S^{k})).

Denoting

E^{k}:=(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}+\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star}),

so that

L^{k+1}=\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}(L^{\star}+\mathcal{B}(S^{\star}-S^{k}))=\mathcal{H}_{r}^{+}(L^{\star}-E^{k}),

and denoting the eigen-decomposition of L^{k+1} as U^{k+1}\Lambda^{k+1}(U^{k+1})^{T}, we have

Ek2=\displaystyle\|E^{k}\|_{2}= (𝒫Tk)L+𝒫Tk(SkS)2\displaystyle\|(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}+\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star})\|_{2}
\displaystyle\leq LkL22λ1+43(SkS)2\displaystyle\frac{\|L^{k}-L^{\star}\|_{2}^{2}}{\lambda_{1}^{\star}}+\frac{4}{3}\|\mathcal{B}(S^{k}-S^{\star})\|_{2}
\displaystyle\leq Ek12λ1Ek12+23SkS2\displaystyle\frac{\|E^{k-1}\|_{2}}{\lambda_{1}^{\star}}\|E^{k-1}\|_{2}+\frac{2}{3}\|S^{k}-S^{\star}\|_{2}
\displaystyle\leq 4(αn)Dλ1γk14(αn)Dγk1+83(αn)Dγk\displaystyle 4(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{1}^{\star}}\gamma^{k-1}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}+\frac{8}{3}(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}
\displaystyle\leq 16(αμr)γk14(αn)Dγk1+83(αn)Dγk\displaystyle 16(\alpha\mu r)\gamma^{k-1}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}+\frac{8}{3}(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}
\displaystyle\leq γ34(αn)Dγk1+83(αn)Dγk=4(αn)Dγk,\displaystyle\frac{\gamma}{3}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k-1}+\frac{8}{3}(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}=4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k},

where the first inequality follows from Lemma 8 and Lemma 9, the second inequality is due to Lemma 7 again, the third inequality follows from the induction hypothesis (7a), and the fifth inequality holds if α148γμr\alpha\leq\frac{1}{48}\cdot\frac{\gamma}{\mu r}. Then we proceed as in the base case to get

eiTΔk+1=\displaystyle e_{i}^{T}\Delta^{k+1}= eiTUΛ[(U)TUk+1(Λ)1(Λ)1Gk+1]T1\displaystyle\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}\left[(U^{\star})^{T}U^{k+1}(\Lambda^{\star})^{-1}-(\Lambda^{\star})^{-1}G^{k+1}\right]}_{T_{1}}
+eiTUΛ(U)TUk+1[(Λk+1)1(Λ)1]T2eiTEkUk+1(Λk+1)1T3,\displaystyle+\underbrace{e_{i}^{T}U^{\star}\Lambda^{\star}(U^{\star})^{T}U^{k+1}\left[(\Lambda^{k+1})^{-1}-(\Lambda^{\star})^{-1}\right]}_{T_{2}}-\underbrace{e_{i}^{T}E^{k}U^{k+1}(\Lambda^{k+1})^{-1}}_{T_{3}},

and the first two terms can be bounded similarly, i.e.,

T12μrn(2+2κ)Ek2λr,T22μrn2κEk2λr.\displaystyle\|T_{1}\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot(2+2\kappa)\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}},~{}\|T_{2}\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot 2\kappa\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}}.

Bounding T3T_{3}.

T32eiTEkUk+12(Λk+1)12eiTEk22λr,\displaystyle\|T_{3}\|_{2}\leq\|e_{i}^{T}E^{k}U^{k+1}\|_{2}\cdot\|(\Lambda^{k+1})^{-1}\|_{2}\leq\|e_{i}^{T}E^{k}\|_{2}\cdot\frac{2}{\lambda_{r}^{\star}},

while

eiTEk2eiT(𝒫Tk)L2B1+eiT𝒫Tk(SkS)2B2.\displaystyle\|e_{i}^{T}E^{k}\|_{2}\leq\underbrace{\|e_{i}^{T}(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}\|_{2}}_{B_{1}}+\underbrace{\|e_{i}^{T}\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star})\|_{2}}_{B_{2}}.

For the first term,

(𝒫Tk)L=\displaystyle(\mathcal{I}-\mathcal{P}_{T^{k}})L^{\star}= (InUk(Uk)T)L(InUk(Uk)T)\displaystyle(I_{n}-U^{k}(U^{k})^{T})L^{\star}(I_{n}-U^{k}(U^{k})^{T})
=\displaystyle= (Uk(Uk)TU(U)T)(L)(InUk(Uk)T)\displaystyle(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})(-L^{\star})(I_{n}-U^{k}(U^{k})^{T})
=\displaystyle= (Uk(Uk)TU(U)T)(LkL)(InUk(Uk)T)\displaystyle(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})(L^{k}-L^{\star})(I_{n}-U^{k}(U^{k})^{T})

where the last equality follows from the fact that Lk(InUk(Uk)T)=0L^{k}(I_{n}-U^{k}(U^{k})^{T})=0. Therefore,

B1=\displaystyle B_{1}= eiT(Uk(Uk)TU(U)T)(LkL)(InUk(Uk)T)2\displaystyle\|e_{i}^{T}(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})(L^{k}-L^{\star})(I_{n}-U^{k}(U^{k})^{T})\|_{2}
\displaystyle\leq eiT(Uk(Uk)TU(U)T)2LkL2\displaystyle\|e_{i}^{T}(U^{k}(U^{k})^{T}-U^{\star}(U^{\star})^{T})\|_{2}\cdot\|L^{k}-L^{\star}\|_{2}
\displaystyle\leq (eiTUk2+eiTU2)LkL23μrnEk12.\displaystyle(\|e_{i}^{T}U^{k}\|_{2}+\|e_{i}^{T}U^{\star}\|_{2})\|L^{k}-L^{\star}\|_{2}\leq 3\sqrt{\frac{\mu r}{n}}\cdot\|E^{k-1}\|_{2}.

For the second term,

B2=\displaystyle B_{2}= eiT𝒫Tk(SkS)2\displaystyle\|e_{i}^{T}\mathcal{P}_{T^{k}}\mathcal{B}(S^{k}-S^{\star})\|_{2}
\displaystyle\leq eiTUk(Uk)T(SkS)(InUk(Uk)T)2B21+eiT(SkS)Uk(Uk)T2B22.\displaystyle\underbrace{\|e_{i}^{T}U^{k}(U^{k})^{T}\mathcal{B}(S^{k}-S^{\star})(I_{n}-U^{k}(U^{k})^{T})\|_{2}}_{B_{21}}+\underbrace{\|e_{i}^{T}\mathcal{B}(S^{k}-S^{\star})U^{k}(U^{k})^{T}\|_{2}}_{B_{22}}.
B21eiTUk2(SkS)22μrnαn2SkSμrn4(αn)Dγk,\displaystyle B_{21}\leq\|e_{i}^{T}U^{k}\|_{2}\cdot\|\mathcal{B}(S^{k}-S^{\star})\|_{2}\leq 2\sqrt{\frac{\mu r}{n}}\cdot\frac{\alpha n}{2}\|S^{k}-S^{\star}\|_{\infty}\leq\sqrt{\frac{\mu r}{n}}\cdot 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k},
B22eiT(SkS)Uk2\displaystyle B_{22}\leq\|e_{i}^{T}\mathcal{B}(S^{k}-S^{\star})U^{k}\|_{2}\leq 12J(SkS)JUk2,\displaystyle\frac{1}{2}\|J(S^{k}-S^{\star})JU^{k}\|_{2,\infty}
=\displaystyle= 12J(SkS)Uk2,\displaystyle\frac{1}{2}\|J(S^{k}-S^{\star})U^{k}\|_{2,\infty}
\displaystyle\leq (SkS)Uk2,4(αn)Dγk2μrn.\displaystyle\|(S^{k}-S^{\star})U^{k}\|_{2,\infty}\leq 4(\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}\cdot 2\sqrt{\frac{\mu r}{n}}.

Putting together,

T326Ek12λrμrn+24(αn)Dλrμrnγk.\displaystyle\|T_{3}\|_{2}\leq 6\frac{\|E^{k-1}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}.

Combining the bounds of T1T_{1} to T3T_{3}, and taking the maximum with respect to ii,

Δk+12,\displaystyle\|\Delta^{k+1}\|_{2,\infty}\leq 6κEk2λrμrn+6Ek12λrμrn+24(αn)Dλrμrnγk\displaystyle 6\kappa\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+6\frac{\|E^{k-1}\|_{2}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}
\displaystyle\leq [24(ακn)Dλrγk+24(αn)Dλrγk1+24(αn)Dλrγk]μrn\displaystyle\left[24(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k-1}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}\right]\sqrt{\frac{\mu r}{n}}
\displaystyle\leq [24(ακn)Dλrγk+72(αn)Dλrγk+24(αn)Dλrγk]μrn\displaystyle\left[24(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}+72(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}+24(\alpha n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\gamma^{k}\right]\sqrt{\frac{\mu r}{n}}
\displaystyle\leq 120(ακn)Dλrμrnγk,\displaystyle 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k},

where the third inequality holds if γ13\gamma\geq\frac{1}{3}. By Lemma 1 again,

Lk+1L\displaystyle\left\|L^{k+1}-L^{\star}\right\|_{\infty}\leq Δk+12,(Uk+12,+U2,)Λk+12+(3+4κ)U2,2Ek2\displaystyle\|\Delta^{k+1}\|_{2,\infty}\left(\|U^{k+1}\|_{2,\infty}+\|U^{\star}\|_{2,\infty}\right)\|\Lambda^{k+1}\|_{2}+(3+4\kappa)\left\|U^{\star}\right\|_{2,\infty}^{2}\|E^{k}\|_{2}
\displaystyle\leq 120(ακn)Dλrμrnγk3μrn2120λ1+7κμrn(4αn)Dγk\displaystyle 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}\cdot 3\sqrt{\frac{\mu r}{n}}\cdot\frac{21}{20}\lambda_{1}^{\star}+7\kappa\frac{\mu r}{n}\cdot(4\alpha n)\|D^{\star}\|_{\infty}\gamma^{k}
\displaystyle\leq 406(αμrκ2)DγkD4γk+1\displaystyle 406(\alpha\mu r\kappa^{2})\|D^{\star}\|_{\infty}\gamma^{k}\leq\frac{\|D^{\star}\|_{\infty}}{4}\gamma^{k+1}

if α11624γμrκ2\alpha\leq\frac{1}{1624}\cdot\frac{\gamma}{\mu r\kappa^{2}}.

4.4 Proof for Proposition 1

For k0k\geq 0,

Xk+1XRk+12,\displaystyle\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}
=\displaystyle= Uk+1(Λk+1)12U(Λ)12Rk+12,\displaystyle\|U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}R^{k+1}\|_{2,\infty}
\displaystyle\leq Uk+1(Λk+1)12UGk+1(Λk+1)122,I1+UGk+1(Λk+1)12UGk+1(Λ)122,I2\displaystyle\underbrace{\|U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}G^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}\|_{2,\infty}}_{I_{1}}+\underbrace{\|U^{\star}G^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}\|_{2,\infty}}_{I_{2}}
+UGk+1(Λ)12U(Λ)12Gk+12,I3+U(Λ)12Gk+1U(Λ)12Rk+12,I4.\displaystyle+\underbrace{\|U^{\star}G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2,\infty}}_{I_{3}}+\underbrace{\|U^{\star}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}R^{k+1}\|_{2,\infty}}_{I_{4}}.

Bounding I1I_{1}.

I1=\displaystyle I_{1}= maxieiT[Uk+1(Λk+1)12UGk+1(Λk+1)12]2\displaystyle\max_{i}\|e_{i}^{T}[U^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}-U^{\star}G^{k+1}(\Lambda^{k+1})^{\frac{1}{2}}]\|_{2}
\displaystyle\leq Δk+12,(Λk+1)122\displaystyle\|\Delta^{k+1}\|_{2,\infty}\cdot\|(\Lambda^{k+1})^{\frac{1}{2}}\|_{2}
\displaystyle\leq 120(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{\lambda_{r}^{\star}}\sqrt{\frac{\mu r}{n}}\gamma^{k}\cdot\left(\frac{21}{20}\lambda_{1}^{\star}\right)^{\frac{1}{2}}\leq 492(\alpha\mu r\kappa^{2})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Bounding I2I_{2}.

I2=maxieiTUGk+1[(Λk+1)12(Λ)12]2μrn(Λk+1)12(Λ)122.\displaystyle I_{2}=\max_{i}\|e_{i}^{T}U^{\star}G^{k+1}[(\Lambda^{k+1})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}]\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot\|(\Lambda^{k+1})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}\|_{2}.

Since

(Λk+1)12(Λ)122=maxj=1,,r|(λjk+1)12(λj)12|=maxj=1,,r|λjk+1λj|(λjk+1)12+(λj)12Ek2(λr)12,\displaystyle\|(\Lambda^{k+1})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}\|_{2}=\max_{j=1,\cdots,r}\left|(\lambda_{j}^{k+1})^{\frac{1}{2}}-(\lambda_{j}^{\star})^{\frac{1}{2}}\right|=\max_{j=1,\cdots,r}\frac{|\lambda_{j}^{k+1}-\lambda_{j}^{\star}|}{(\lambda_{j}^{k+1})^{\frac{1}{2}}+(\lambda_{j}^{\star})^{\frac{1}{2}}}\leq\frac{\|E^{k}\|_{2}}{(\lambda_{r}^{\star})^{\frac{1}{2}}},
I24(αn)D(λr)12μrnγk16(αμrκ12)μrn(λ1)12γk.I_{2}\leq 4(\alpha n)\frac{\|D^{\star}\|_{\infty}}{(\lambda_{r}^{\star})^{\frac{1}{2}}}\sqrt{\frac{\mu r}{n}}\cdot\gamma^{k}\leq 16(\alpha\mu r\kappa^{\frac{1}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Bounding I3I_{3}.

I3=maxieiTU[Gk+1(Λ)12(Λ)12Gk+1]2μrnGk+1(Λ)12(Λ)12Gk+12.\displaystyle I_{3}=\max_{i}\|e_{i}^{T}U^{\star}[G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}]\|_{2}\leq\sqrt{\frac{\mu r}{n}}\cdot\|G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2}.
\|G^{k+1}(\Lambda^{\star})^{\frac{1}{2}}-(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2}= \|(\Lambda^{\star})^{\frac{1}{2}}-(G^{k+1})^{T}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}\|_{2}
\displaystyle\leq \frac{1}{2(\lambda_{r}^{\star})^{\frac{1}{2}}}\|\Lambda^{\star}-(G^{k+1})^{T}\Lambda^{\star}G^{k+1}\|_{2}
\displaystyle= \frac{1}{2(\lambda_{r}^{\star})^{\frac{1}{2}}}\|\Lambda^{\star}G^{k+1}-G^{k+1}\Lambda^{\star}\|_{2}
\displaystyle\leq \frac{1}{2(\lambda_{r}^{\star})^{\frac{1}{2}}}\cdot(2+4\kappa)\|E^{k}\|_{2}\leq 3\kappa\frac{\|E^{k}\|_{2}}{(\lambda_{r}^{\star})^{\frac{1}{2}}},

where the first inequality follows from [14, Lemma 2.1], since (\Lambda^{\star})^{\frac{1}{2}} and (G^{k+1})^{T}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1} are the matrix square roots of \Lambda^{\star} and (G^{k+1})^{T}\Lambda^{\star}G^{k+1}, respectively, and the second inequality follows from Lemma 2. Therefore,

I312(ακn)D(λr)12μrnγk48(αμrκ32)μrn(λ1)12γk.I_{3}\leq 12(\alpha\kappa n)\frac{\|D^{\star}\|_{\infty}}{(\lambda_{r}^{\star})^{\frac{1}{2}}}\sqrt{\frac{\mu r}{n}}\cdot\gamma^{k}\leq 48(\alpha\mu r\kappa^{\frac{3}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Bounding I4I_{4}.

I4=\displaystyle I_{4}= maxieiT[U(Λ)12Gk+1U(Λ)12Rk+1]2\displaystyle\max_{i}\|e_{i}^{T}[U^{\star}(\Lambda^{\star})^{\frac{1}{2}}G^{k+1}-U^{\star}(\Lambda^{\star})^{\frac{1}{2}}R^{k+1}]\|_{2}
\displaystyle\leq μrn(λ1)12Gk+1Rk+12\displaystyle\sqrt{\frac{\mu r}{n}}\cdot(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\|G^{k+1}-R^{k+1}\|_{2}
\displaystyle\leq μrn(λ1)1228κ32Ek2λr112(ακ2n)D(λr)12μrnγk448(αμrκ52)μrn(λ1)12γk.\displaystyle\sqrt{\frac{\mu r}{n}}\cdot(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot 28\kappa^{\frac{3}{2}}\frac{\|E^{k}\|_{2}}{\lambda_{r}^{\star}}\leq 112(\alpha\kappa^{2}n)\frac{\|D^{\star}\|_{\infty}}{(\lambda_{r}^{\star})^{\frac{1}{2}}}\sqrt{\frac{\mu r}{n}}\cdot\gamma^{k}\leq 448(\alpha\mu r\kappa^{\frac{5}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}.

Here the second inequality is due to Lemma 3.

Combining the bounds of I_{1} to I_{4},

Xk+1XRk+12,1004(αμrκ52)μrn(λ1)12γkμrκλ1nγk+1,\|X^{k+1}-X^{\star}R^{k+1}\|_{2,\infty}\leq 1004(\alpha\mu r\kappa^{\frac{5}{2}})\sqrt{\frac{\mu r}{n}}(\lambda_{1}^{\star})^{\frac{1}{2}}\cdot\gamma^{k}\leq\sqrt{\frac{\mu r\kappa\lambda_{1}^{\star}}{n}}\gamma^{k+1},

where the last inequality follows from α11624γμrκ2\alpha\leq\frac{1}{1624}\cdot\frac{\gamma}{\mu r\kappa^{2}}.

5 Conclusion

In this paper, we propose an alternating projection based algorithm, further accelerated by the tangent space projection technique, for the RMDS problem. Under standard assumptions in the RPCA literature, we establish linear convergence of the proposed algorithm when the outliers are sparse enough. We have also numerically verified the performance of the proposed algorithm, with comparisons to other state-of-the-art solvers for RMDS.

To adapt to more realistic settings, it is of interest to extend the convergence proof to the case where the pairwise distances are corrupted by sub-Gaussian noise in addition to sparse outliers.

References

  • [1] L. Blouvshtein and D. Cohen-Or, Outlier detection for robust multi-dimensional scaling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41 (2019), pp. 2273–2279.
  • [2] H. Cai, J.-F. Cai, T. Wang, and G. Yin, Accelerated structured alternating projections for robust spectrally sparse signal recovery, IEEE Transactions on Signal Processing, 69 (2021), pp. 809–821.
  • [3] H. Cai, J.-F. Cai, and K. Wei, Accelerated alternating projections for robust principal component analysis, Journal of Machine Learning Research, 20 (2019), pp. 1–33.
  • [4] E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis?, Journal of the ACM, 58 (2011), pp. 1–37.
  • [5] L. Cayton and S. Dasgupta, Robust Euclidean embedding, in Proceedings of the 23rd International Conference on Machine Learning, ICML, 2006, pp. 169–176.
  • [6] L. Ding and Y. Chen, Leave-one-out approach for matrix completion: Primal and dual analysis, IEEE Transactions on Information Theory, 66 (2020), pp. 7274–7301.
  • [7] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli, Euclidean distance matrices: Essential theory, algorithms, and applications, IEEE Signal Processing Magazine, 32 (2015), pp. 12–30.
  • [8] P. A. Forero and G. B. Giannakis, Sparsity-exploiting robust multidimensional scaling, IEEE Transactions on Signal Processing, 60 (2012), pp. 4118–4134.
  • [9] L. Kong, C. Qi, and H.-D. Qi, Classical multidimensional scaling: A subspace perspective, over-denoising, and outlier detection, IEEE Transactions on Signal Processing, 67 (2019), pp. 3842–3857.
  • [10] X. Li, S. Ding, and Y. Li, Outlier suppression via non-convex robust PCA for efficient localization in wireless sensor networks, IEEE Sensors Journal, 17 (2017), pp. 7053–7063.
  • [11] C. Ma, K. Wang, Y. Chi, and Y. Chen, Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution, Foundations of Computational Mathematics, 20 (2019), pp. 451–632.
  • [12] F. D. Mandanas and C. L. Kotropoulos, Robust multidimensional scaling using a maximum correntropy criterion, IEEE Transactions on Signal Processing, 65 (2017), pp. 919–932.
  • [13] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, Non-convex robust PCA, in Advances in Neural Information Processing Systems, NIPS, 2014, pp. 1107–1115.
  • [14] B. A. Schmitt, Perturbation bounds for matrix square roots and pythagorean sums, Linear Algebra and its Applications, 174 (1992), pp. 215–227.
  • [15] K. Wei, J.-F. Cai, T. F. Chan, and S. Leung, Guarantees of Riemannian optimization for low rank matrix recovery, SIAM Journal on Matrix Analysis and Applications, 37 (2016), pp. 1198–1222.
  • [16] X. Yi, D. Park, Y. Chen, and C. Caramanis, Fast algorithms for robust PCA via gradient descent, in Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS, 2016, pp. 4159–4167.

Appendix A Implementation Details of the Computation of Lk+1L^{k+1}

First consider \mathcal{B}(D-S^{k})=-\frac{1}{2}J(D-S^{k})J. The computation of J(D-S^{k}) can be achieved by subtracting the average row vector from each row of (D-S^{k}). This step costs about 2n^{2} flops. (J(D-S^{k}))J can be computed in a similar fashion, by subtracting the average column vector from each column of J(D-S^{k}). Together with the final scaling by -\frac{1}{2}, the computation cost of \mathcal{B}(D-S^{k}) is about 5n^{2} flops.

Denote B:=(DSk)B:=\mathcal{B}(D-S^{k}). Note that (IUk(Uk)T)BUk=BUkUk(Uk)TBUk(I-U^{k}(U^{k})^{T})BU^{k}=BU^{k}-U^{k}(U^{k})^{T}BU^{k}, and the computation of Uk(Uk)TBUkU^{k}(U^{k})^{T}BU^{k} requires about 2n2r+4nr22n^{2}r+4nr^{2} flops. Therefore the total cost to form (IUk(Uk)T)BUk(I-U^{k}(U^{k})^{T})BU^{k} is about nr+2n2r+4nr2nr+2n^{2}r+4nr^{2} flops. Let (IUk(Uk)T)BUk=QR(I-U^{k}(U^{k})^{T})BU^{k}=QR be the QR decomposition, whose cost is O(nr2)O(nr^{2}) flops. Then,

𝒫Tk(B)\displaystyle\mathcal{P}_{T^{k}}(B) =Uk(Uk)TB+BUk(Uk)TUk(Uk)TBUk(Uk)T\displaystyle=U^{k}(U^{k})^{T}B+BU^{k}(U^{k})^{T}-U^{k}(U^{k})^{T}BU^{k}(U^{k})^{T}
=Uk(Uk)TB(IUk(Uk)T)+(IUk(Uk)T)BUk(Uk)T\displaystyle=U^{k}(U^{k})^{T}B(I-U^{k}(U^{k})^{T})+(I-U^{k}(U^{k})^{T})BU^{k}(U^{k})^{T}
+Uk(Uk)TBUk(Uk)T\displaystyle\quad+U^{k}(U^{k})^{T}BU^{k}(U^{k})^{T}
=UkRTQT+QR(Uk)T+Uk(Uk)TBUk(Uk)T\displaystyle=U^{k}R^{T}Q^{T}+QR(U^{k})^{T}+U^{k}(U^{k})^{T}BU^{k}(U^{k})^{T}
=[UkQ][(Uk)TBUkRTR0][(Uk)TQT]\displaystyle=[U^{k}~{}Q]\begin{bmatrix}(U^{k})^{T}BU^{k}&R^{T}\\ R&0\end{bmatrix}\begin{bmatrix}(U^{k})^{T}\\ Q^{T}\end{bmatrix}
:=[UkQ]M([UkQ])T.\displaystyle:=[U^{k}~{}Q]M([U^{k}~{}Q])^{T}.

Note that M\in\mathbb{S}^{2r\times 2r}; its eigen-decomposition, denoted as U\Lambda U^{T}, can be computed using O(r^{3}) flops. We may as well require that the diagonal entries of \Lambda are in descending order. Finally, since the columns of Q are orthogonal to the columns of U^{k}, the columns of [U^{k}~Q] are orthonormal, and we can get the eigen-decomposition of L^{k+1} by computing

Uk+1=[UkQ]U(:,1:r).U^{k+1}=\begin{bmatrix}U^{k}&Q\end{bmatrix}U_{(:,1:r)}. (8)

The cost of this step is about 4nr24nr^{2}. In summary, the computational cost of r+𝒫Tk(DSk)\mathcal{H}_{r}^{+}\mathcal{P}_{T^{k}}\mathcal{B}(D-S^{k}) is about 5n2+2n2r+O(nr2)5n^{2}+2n^{2}r+O(nr^{2}) flops.
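The procedure above can be sketched in NumPy as follows. This is an illustrative implementation under our own naming (the function rmds_low_rank_update and its variable names are not from any released code), and it interprets \mathcal{H}_{r}^{+} as keeping the top-r eigenpairs with the eigenvalues floored at zero:

```python
import numpy as np

def rmds_low_rank_update(D, S_k, U_k, r):
    """One step L^{k+1} = H_r^+ P_{T^k} B(D - S^k), following Appendix A."""
    M0 = D - S_k
    # B(D - S^k) = -(1/2) J (D - S^k) J, without ever forming J:
    # subtract the average row from every row, then the average column
    # from every column, and scale by -1/2 (about 5n^2 flops in total).
    B = M0 - M0.mean(axis=0, keepdims=True)      # J (D - S^k)
    B = B - B.mean(axis=1, keepdims=True)        # (J (D - S^k)) J
    B *= -0.5

    BU = B @ U_k                                 # about 2n^2 r flops
    W = BU - U_k @ (U_k.T @ BU)                  # (I - U U^T) B U
    Q, R = np.linalg.qr(W)                       # O(n r^2) flops

    # P_{T^k}(B) = [U Q] M [U Q]^T with the small 2r x 2r core M.
    M = np.block([[U_k.T @ BU, R.T],
                  [R, np.zeros((r, r))]])
    vals, vecs = np.linalg.eigh(M)               # O(r^3) flops
    order = np.argsort(vals)[::-1]               # descending eigenvalues
    vals, vecs = vals[order], vecs[:, order]

    # H_r^+: keep the top-r eigenpairs, eigenvalues floored at zero.
    lam = np.maximum(vals[:r], 0.0)
    U_next = np.hstack([U_k, Q]) @ vecs[:, :r]   # cf. equation (8)
    return U_next, lam
```

As a consistency check, if D is the clean EDM of centered points and U^{k} already spans the true column space, one update reproduces the Gram matrix of the points.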

Appendix B Proofs for Some Useful Lemmas

B.1 Proof for Lemma 5

By the definition of operator 𝒜\mathcal{A},

𝒜(Z)=diag(Z)𝟏T+𝟏diag(Z)T2Z.\displaystyle\mathcal{A}(Z)=\text{diag}(Z)\bm{1}^{T}+\bm{1}\text{diag}(Z)^{T}-2Z.

Therefore [𝒜(Z)]ij=Zii+Zjj2Zij[\mathcal{A}(Z)]_{ij}=Z_{ii}+Z_{jj}-2Z_{ij},

𝒜(Z)Z+Z+2Z=4Z.\displaystyle\|\mathcal{A}(Z)\|_{\infty}\leq\|Z\|_{\infty}+\|Z\|_{\infty}+2\|Z\|_{\infty}=4\|Z\|_{\infty}.
(𝒜(Z))=\displaystyle\mathcal{B}(\mathcal{A}(Z))= 12J(diag(Z)𝟏T+𝟏diag(Z)T2Z)J\displaystyle-\frac{1}{2}J(\text{diag}(Z)\bm{1}^{T}+\bm{1}\text{diag}(Z)^{T}-2Z)J
=\displaystyle= 12Jdiag(Z)𝟏T(In1n𝟏𝟏T)12(In1n𝟏𝟏T)𝟏diag(Z)TJ+JZJ=JZJ,\displaystyle-\frac{1}{2}J\cdot\text{diag}(Z)\bm{1}^{T}(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})-\frac{1}{2}(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})\cdot\bm{1}\text{diag}(Z)^{T}J+JZJ=JZJ,

and if Z𝟏=0Z\bm{1}=0,

JZJ=JZ(In1n𝟏𝟏T)=JZ=(In1n𝟏𝟏T)Z=Z.\displaystyle JZJ=JZ(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})=JZ=(I_{n}-\frac{1}{n}\bm{1}\bm{1}^{T})Z=Z.
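The identities in this proof can be spot-checked numerically. The snippet below is an illustration only, with synthetic data: it verifies \|\mathcal{A}(Z)\|_{\infty}\leq 4\|Z\|_{\infty} and \mathcal{B}(\mathcal{A}(Z))=JZJ=Z for a symmetric Z with Z\bm{1}=0:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix

# Build a symmetric Z with Z 1 = 0 by double centering.
Z = rng.standard_normal((n, n))
Z = (Z + Z.T) / 2
Z = J @ Z @ J

# A(Z) = diag(Z) 1^T + 1 diag(Z)^T - 2 Z, i.e. [A(Z)]_ij = Z_ii + Z_jj - 2 Z_ij.
A_Z = np.diag(Z)[:, None] + np.diag(Z)[None, :] - 2 * Z
B_of_A = -0.5 * J @ A_Z @ J           # B(A(Z))

assert np.abs(A_Z).max() <= 4 * np.abs(Z).max() + 1e-12
assert np.allclose(B_of_A, J @ Z @ J)  # B(A(Z)) = J Z J
assert np.allclose(B_of_A, Z)          # = Z, since Z 1 = 0
```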

B.2 Proof for Lemma 6

Denote Dk=𝒜(Lk)D^{k}=\mathcal{A}(L^{k}).

Sk=Tξk(D𝒜(Lk))=Tξk(S+DDk).\displaystyle S^{k}=T_{\xi^{k}}(D-\mathcal{A}(L^{k}))=T_{\xi^{k}}(S^{\star}+D^{\star}-D^{k}).

When Sij=0S_{ij}^{\star}=0, since DkDξk\|D^{k}-D^{\star}\|_{\infty}\leq\xi^{k},

Sijk=Tξk(DijDijk)=0,\displaystyle S_{ij}^{k}=T_{\xi^{k}}(D_{ij}^{\star}-D_{ij}^{k})=0,

and it follows that supp(Sk)supp(S)\text{supp}(S^{k})\subseteq\text{supp}(S^{\star}). Next, there are two cases to consider.

  1. 1.

    (i,j)supp(S)supp(Sk)(i,j)\in\text{supp}(S^{\star})\setminus\text{supp}(S^{k}). In this case, |Sij+DijDijk|ξk|S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k}|\leq\xi^{k},

    |SijkSij|=|Sij||DijkDij|+ξk.\displaystyle|S_{ij}^{k}-S_{ij}^{\star}|=|S_{ij}^{\star}|\leq|D_{ij}^{k}-D_{ij}^{\star}|+\xi^{k}.
  2. 2.

    (i,j)supp(Sk)(i,j)\in\text{supp}(S^{k}). In this case, |Sij+DijDijk|>ξk|S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k}|>\xi^{k},

    Sijk=Tξk(Sij+DijDijk)=Sij+DijDijk,\displaystyle S_{ij}^{k}=T_{\xi^{k}}(S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k})=S_{ij}^{\star}+D_{ij}^{\star}-D_{ij}^{k},

    and we get the bound

    |SijkSij|=|DijkDij|ξk.\displaystyle|S_{ij}^{k}-S_{ij}^{\star}|=|D_{ij}^{k}-D_{ij}^{\star}|\leq\xi^{k}.
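Both cases can be illustrated with a small numerical example. The sketch below uses synthetic data and a hypothetical hard_threshold helper standing in for T_{\xi^{k}}; it checks the support inclusion and the resulting entrywise bound |S^{k}_{ij}-S^{\star}_{ij}|\leq|D^{k}_{ij}-D^{\star}_{ij}|+\xi^{k}\leq 2\xi^{k}:

```python
import numpy as np

def hard_threshold(M, xi):
    """T_xi: keep entries with |M_ij| > xi, zero out the rest."""
    return np.where(np.abs(M) > xi, M, 0.0)

rng = np.random.default_rng(2)
n, xi = 10, 0.1
D_star = rng.random((n, n))
D_k = D_star + rng.uniform(-xi, xi, (n, n))   # ||D^k - D^star||_inf <= xi

# Sparse outliers, large compared to the threshold.
S_star = np.zeros((n, n))
idx = rng.choice(n * n, size=5, replace=False)
S_star.flat[idx] = rng.uniform(1.0, 2.0, size=5)

S_k = hard_threshold(S_star + D_star - D_k, xi)

assert np.all((S_k != 0) <= (S_star != 0))    # supp(S^k) inside supp(S^star)
assert np.abs(S_k - S_star).max() <= 2 * xi   # entrywise error bound
```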

B.3 Proof for Lemma 7

It is easy to see that J21\|J\|_{2}\leq 1, and any unit vector orthogonal to 𝟏\bm{1} reaches the upper bound. For the second property, recall that L=UΛ(U)TL^{\star}=U^{\star}\Lambda^{\star}(U^{\star})^{T}, and L𝟏=0L^{\star}\bm{1}=0, therefore 𝟏\bm{1} is in the null space of LL^{\star}, and is orthogonal to each column of UU^{\star}. Hence,

𝟏TU=0,JU=U1n𝟏𝟏TU=U.\bm{1}^{T}U^{\star}=0,~{}JU^{\star}=U^{\star}-\frac{1}{n}\bm{1}\bm{1}^{T}U^{\star}=U^{\star}.

Finally, denote the i-th row of Z as z_{i}; then, by the triangle inequality,

eiT(JZ)2=ziz1++znn22Z2,.\|e_{i}^{T}(JZ)\|_{2}=\left\|z_{i}-\frac{z_{1}+\cdots+z_{n}}{n}\right\|_{2}\leq 2\|Z\|_{2,\infty}.
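The three properties of J in this proof can likewise be verified numerically (an illustration with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 12, 3
J = np.eye(n) - np.ones((n, n)) / n

# ||J||_2 <= 1, attained by any unit vector orthogonal to the all-ones vector.
assert np.linalg.norm(J, 2) <= 1 + 1e-12
v = np.zeros(n)
v[0], v[1] = 1, -1
v /= np.linalg.norm(v)                  # unit vector orthogonal to 1
assert np.isclose(np.linalg.norm(J @ v), 1.0)

# If the columns of U are orthogonal to 1 (as for U^star), then J U = U.
U = np.linalg.qr(J @ rng.standard_normal((n, r)))[0]
assert np.allclose(J @ U, U)

# Row-wise bound: ||e_i^T (J Z)||_2 <= 2 ||Z||_{2,inf}.
Z = rng.standard_normal((n, n))
lhs = np.linalg.norm(J @ Z, axis=1).max()
rhs = 2 * np.linalg.norm(Z, axis=1).max()
assert lhs <= rhs + 1e-12
```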