Robust Multi-Dimensional Scaling via Accelerated Alternating Projections
Abstract
We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to recover point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theory, and by nonconvex approaches to the robust principal component analysis (RPCA) problem, we propose an alternating projection based algorithm that is further accelerated by the tangent space projection technique. For the proposed algorithm, if the outliers are sparse enough, we establish linear convergence of the reconstructed points to the original points after centering and rotation alignment. Numerical experiments verify the state-of-the-art performance of the proposed algorithm.
1 Introduction
Multi-dimensional scaling (MDS) refers to the localization of points from their pairwise distances, and it has applications in wireless communication, chemistry, computer graphics, and so on [7]. The properties of classic MDS have been well studied.
In reality, the measured pairwise distances may be incomplete, noisy, or, due to sensor malfunction, corrupted by outliers. In this paper, we consider the robust reconstruction of the point locations from distances corrupted by outliers, i.e., the robust multi-dimensional scaling (RMDS) problem. As shown in [5], even a single outlier in the measured distances can cause false information to spread to all the reconstructed point locations in the MDS calculation.
Suppose that we have a set of points . Denote the squared Euclidean distance matrix (EDM) as . The -th entry of is equal to . If the pairwise distances are corrupted by outliers, the observed EDM is equal to for some outlier matrix . One can see that the considered problem may be solved by methods for robust principal component analysis (RPCA) [4, 13, 16, 3]. However, naively adopting an RPCA solver for the RMDS problem risks neglecting the inner structure of the EDM, and may result in degraded reconstruction performance [10].
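To make the measurement model concrete, the following is a minimal sketch of how a squared EDM is formed from a point configuration and then corrupted by a sparse symmetric outlier matrix. The function names, sizes, and outlier magnitudes are illustrative assumptions, not part of the paper's setup.

```python
import numpy as np

def squared_edm(X):
    """Squared Euclidean distance matrix from points stored as rows of X."""
    sq = np.sum(X**2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * X @ X.T

rng = np.random.default_rng(0)
n, r = 20, 2
X = rng.standard_normal((n, r))
D = squared_edm(X)

# Corrupt a few randomly chosen symmetric off-diagonal entries with outliers.
S = np.zeros((n, n))
iu = np.triu_indices(n, k=1)
idx = rng.choice(iu[0].size, size=5, replace=False)
rows, cols = iu[0][idx], iu[1][idx]
S[rows, cols] = S[cols, rows] = rng.uniform(0.0, 10.0, size=5)
D_obs = D + S  # observed (corrupted) EDM
```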
To combat outliers in the EDM, [8] casts the problem as an unconstrained optimization, where the variables are the data matrix of size , and the outlier matrix of size . The data fidelity term of the objective function is the squared Frobenius norm of the difference between the observed distance matrix and the distance matrix formed by the data matrix plus the outlier matrix. The regularization term of the objective function is the penalty on the outlier matrix. The optimization is conducted using the majorization-minimization approach. Later, [12] improves the iterative scheme of [8] by replacing the squared Frobenius norm with more outlier-robust M-estimators. A more recent work [9] formulates the RMDS problem as a constrained optimization. Besides the outlier matrix, another sought-after matrix is constrained to be the distance matrix formed by points of dimension less than or equal to . Compared to [8, 12], [9] shows improved performance. Weights and bounds on the entries of the distance matrix can also be handled easily in [9]. Different from the optimization perspectives of the aforementioned works, [1] proposes to first detect the outliers via violated triangle inequalities among the distances, and then compute a weighted MDS without the detected corrupted distances. However, as shown in [9], such an approach is often not enough to yield satisfactory reconstruction performance.
Inspired by classic MDS theories, and state-of-the-art nonconvex solvers [13, 3] for the RPCA problem, we propose a nonconvex algorithm that alternates between finding the outlier matrix and the Gram matrix that generates the outlier-free EDM. Our contributions can be summarized as follows.
- For the proposed algorithm, we establish linear convergence of the reconstructed points to the original points after centering and rotation alignment, provided that the outliers are sparse enough. The relationship between the amount of tolerable outliers and the properties of the original points is also made explicit in the theoretical guarantees.
- We numerically verify the performance of the proposed algorithm in the considered outlier-only setting, and demonstrate its advantages over other state-of-the-art solvers in the noise-plus-outlier setting.
1.1 Problem Formulation and Assumptions
Some necessary notations are first introduced. We then describe our problem formulation and the assumptions used to derive the theoretical guarantees.
Notations. The set of symmetric matrices of size is denoted by . The set of rotation matrices of size is denoted by , i.e., . For , denotes the set of indices of the nonzero entries of , and denotes the vector containing the diagonal entries of . denotes the all-one vector, and . For any matrix , the spectral norm, the Frobenius norm, and the entrywise infinity norm of are denoted by , , and , respectively.
Recall that the set of points is , and the corresponding EDM is . As it turns out, it is possible to recover another set of points that is centered at zero and preserves the pairwise distances of . Denote the data matrix as , whose -th row is . Let , and denote the centered data matrix as , whose -th row is . Define the operator such that
(1)
One can verify that . Hence, in many applications, it often suffices to reconstruct from .
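As a quick sanity check that centering does not change the pairwise distances (and hence that reconstructing the centered configuration suffices), the following sketch assumes the centering amounts to subtracting the centroid from every row; the function and variable names are hypothetical.

```python
import numpy as np

def squared_edm(X):
    """Squared Euclidean distance matrix from points stored as rows of X."""
    sq = np.sum(X**2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * X @ X.T

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 2))

# Centering: subtract the mean row, i.e., X_c = J X with J = I - (1/n) 1 1^T.
Xc = X - X.mean(axis=0, keepdims=True)

# Pairwise distances are translation invariant, so the two EDMs coincide.
assert np.allclose(squared_edm(X), squared_edm(Xc))
```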
Denote , and define the operator such that
(2)
Since , by Lemma 5, . Furthermore, can be reconstructed, after rotation alignment, from the eigen-decomposition of . Without loss of generality, we assume that is of full column rank; then is a rank- positive semi-definite matrix. Denote the eigen-decomposition of as , where , , and . Then denote . One can show that there exists a rotation matrix such that .
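The reconstruction route described above is classical MDS. Assuming the operator in (2) is the standard double-centering map and that the points are recovered from the top eigenpairs of the resulting Gram matrix, a minimal sketch looks as follows; all names are illustrative.

```python
import numpy as np

def double_center(D):
    """Classical MDS operator: tau(D) = -0.5 * J D J with J = I - (1/n) 1 1^T."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

def classical_mds(D, r):
    """Recover r-dimensional points (up to rotation/reflection) from a clean EDM."""
    G = double_center(D)                  # Gram matrix of the centered points
    w, V = np.linalg.eigh(G)              # eigenvalues in ascending order
    w, V = w[::-1][:r], V[:, ::-1][:, :r] # keep the r largest eigenpairs
    return V * np.sqrt(np.maximum(w, 0.0))

# Example: the reconstruction matches the centered points up to an orthogonal transform.
rng = np.random.default_rng(2)
X = rng.standard_normal((15, 2))
Xc = X - X.mean(axis=0)
sq = np.sum(X**2, axis=1)
D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
Y = classical_mds(D, 2)
# Gram matrices agree even though Y may differ from Xc by a rotation/reflection.
assert np.allclose(Y @ Y.T, Xc @ Xc.T, atol=1e-6)
```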
In the considered setting, the observed EDM is equal to for some outlier matrix . Our aim is to recover (eventually, ) from . To derive our theoretical guarantees, similar to the incoherence and sparsity assumptions commonly made in the RPCA literature [13, 3], we assume the following about and .
Assumption 1.
Suppose has the eigen-decomposition , where , , and . We assume that is -incoherent (one can show that ; typically ), i.e.,
From this assumption, one can immediately get
Assumption 2.
The outlier matrix is -sparse, i.e., it has no more than nonzero entries per row (and column).
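For illustration, the two quantities appearing in Assumptions 1 and 2 can be computed as below, assuming the standard RPCA-style definition of incoherence (the exact constants used in the paper may differ); the function names are hypothetical.

```python
import numpy as np

def incoherence(U):
    """Incoherence parameter of an orthonormal basis U (n x r), in the standard
    RPCA-style sense: mu = (n / r) * max_i ||e_i^T U||_2^2."""
    n, r = U.shape
    return (n / r) * np.max(np.sum(U**2, axis=1))

def sparsity_per_row(S):
    """Maximum number of nonzero entries in any row (or column) of a symmetric S."""
    return int(np.max(np.count_nonzero(S, axis=1)))
```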
2 Algorithm
Our proposed algorithm, described in Algorithm 1, is inspired by classic MDS theories, and state-of-the-art nonconvex solvers [13, 3] for RPCA.
At initialization, with the hard thresholding function defined as
large entries corrupted by outliers in are picked out. Then in the spirit of MDS, is computed from . Here, the operator is defined as in (2), and with the eigen-decomposition , where , , and ,
For later iterations (), the threshold parameter is adjusted by the decay rate to pick out the entries with outliers. Denote the eigen-decomposition of as , where . The operator , defined as follows,
(3)
is the projection onto the tangent space, at , of the manifold of symmetric positive semi-definite matrices of rank . Applying such a tangent space projection before the partial eigen-decomposition has proven useful both in deriving theoretical guarantees and in reducing the computational cost [15, 3, 2].
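A minimal sketch of this tangent space projection, written for a symmetric matrix and an orthonormal basis of the current column space, is given below; it follows the standard formula used in the low-rank literature [15, 3], and the names are illustrative.

```python
import numpy as np

def tangent_projection(M, U):
    """Projection of the symmetric matrix M onto the tangent space, at a rank-r
    PSD matrix with orthonormal column-space basis U (n x r), of the manifold of
    rank-r symmetric PSD matrices:  U U^T M + M U U^T - U U^T M U U^T."""
    UtM = U.T @ M                       # r x n
    UUtM = U @ UtM                      # U U^T M
    return UUtM + UUtM.T - U @ (UtM @ U) @ U.T
```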
For RMDS-AAP, the dominant cost is the computation of in each iteration. At initialization, the cost is about flops, where the hidden constant comes from the partial eigen-decomposition. For later iterations, with the help of the tangent space projection, the partial eigen-decomposition of a matrix is reduced to several matrix-matrix multiplications that cost about flops, a QR factorization of a matrix, and a small eigen-decomposition of a matrix. The total cost is about flops. For completeness, we include the implementation details in Appendix A. Compared to [3], the main difference comes from applying before the tangent space projection.
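To convey the overall structure, the following is a simplified alternating scheme in the spirit of Algorithm 1: hard-threshold the residual to estimate the outliers, then double-center the cleaned EDM and truncate to rank r. It deliberately omits the tangent space acceleration, and the threshold schedule and all names are assumptions rather than the exact RMDS-AAP updates.

```python
import numpy as np

def gram_to_edm(G):
    """EDM generated by a Gram matrix: D_ij = G_ii + G_jj - 2 G_ij."""
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2.0 * G

def simple_alternating_rmds(D_obs, r, zeta0, gamma, n_iter=50):
    """Simplified alternating scheme (illustrative only; the tangent space
    acceleration of RMDS-AAP is omitted): alternate between hard-thresholding
    the residual to estimate outliers and projecting the double-centered
    remainder to rank r."""
    n = D_obs.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    D_fit = np.zeros((n, n))
    for k in range(n_iter):
        zeta = zeta0 * gamma**k                          # decaying threshold
        resid = D_obs - D_fit
        S = np.where(np.abs(resid) > zeta, resid, 0.0)   # estimated outliers
        G = -0.5 * J @ (D_obs - S) @ J                   # double centering
        w, V = np.linalg.eigh(G)
        w, V = w[::-1][:r], V[:, ::-1][:, :r]            # rank-r truncation
        G = (V * np.maximum(w, 0.0)) @ V.T
        D_fit = gram_to_edm(G)
    return G, S
```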
For the theoretical guarantees of RMDS-AAP, we have derived the following results.
Theorem 1.
Suppose that RMDS-AAP is provided with that satisfies , and . Denote . If
then for , ,
and
Remark 1. The constant in the bound for can be further optimized. The established bound is better than the corresponding one in [3] for the RPCA problem, and the one in [2] for the robust recovery of low-rank Hankel matrices, showing the merits of our analysis. When , the bound matches the optimal one in [13] for the RPCA problem.
Proposition 1.
Assume the same conditions of Theorem 1. For , further denote , and . Suppose the rank- SVD of is . Set . Then
Remark 2. As mentioned in the problem formulation, for some rotation matrix . Furthermore, one can show that the computed minimizer to
is . Therefore, , and Proposition 1 actually establishes the linear convergence of the reconstructed points to after the best rotation alignment by . Also, the contraction of error in the norm is uniform for all the points.
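The best rotation alignment mentioned in Remark 2 is the classical orthogonal Procrustes problem. A minimal sketch of its SVD-based solution is given below, with a sign correction to enforce a proper rotation; it assumes reconstructed and reference point sets stored as rows, and the names are illustrative.

```python
import numpy as np

def best_rotation(Y_hat, Y):
    """Orthogonal Procrustes: the rotation R minimizing ||Y_hat @ R - Y||_F."""
    U, _, Vt = np.linalg.svd(Y_hat.T @ Y)
    R = U @ Vt
    if np.linalg.det(R) < 0:        # enforce a proper rotation (det = +1)
        U[:, -1] *= -1
        R = U @ Vt
    return R
```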
3 Numerical Experiments
We perform tests on the plus sign example that appeared in [8, 9]. The tests are first conducted for the noiseless case, i.e., the distances are only corrupted by outliers, to verify our theoretical guarantees for RMDS-AAP. We then consider the more realistic case, i.e., the distances are corrupted by both noise and outliers, and show the empirical performances of RMDS-AAP.
Noiseless Case. In this case, we consider a plus sign consisting of 2D points , which are centered at , and have four end points at , , , and . The data matrix has as its -th row, and . The ground truth Gram matrix has incoherence parameter and condition number . To test the reconstruction performance of RMDS-AAP against outliers, out of the distances are randomly sampled, and an outlier whose value is uniformly selected within is added to each of them. Denote the percentage of outliers as . For , we test RMDS-AAP with different initial threshold values and decay rate values . The results, averaged over 1000 simulations, are reported in Fig. 1.
[Figure 1: averaged errors versus iteration (left), success rates for different outlier percentages (middle), and success rates over the grid of initial threshold and decay rate values (right).]
In the left subfigure, the logarithms of the averaged errors in terms of , , and are plotted when , , and . One can see the linear convergence of the three terms, in agreement with Theorem 1 and Proposition 1. In the middle and right subfigures, the reconstruction of each simulation is considered successful if, at convergence,
In the middle subfigure, is selected as , and the success rates computed over the 1000 simulations are compared for . As predicted by Theorem 1, one can see that larger indeed admits successful reconstruction from more outliers. In the right subfigure, the success rates when and are shown, where the white color indicates a success rate of 1 and the black color indicates a success rate of 0. One can see that RMDS-AAP shows the desired robustness to the initial threshold value .
[Figure 2: means and standard deviations of the RMSE values for RMDS-AAP and FSMDS under different noise and outlier settings.]
Noisy Case. We then empirically test the reconstruction ability of RMDS-AAP when the distances are also noisy. Following the setup of [9], 25 points centered at form the plus sign, with 4 anchor points at , , , and . For , each is first perturbed by zero-mean Gaussian noise with variance . Then out of the distances, discounting the 6 pairwise distances between the 4 anchor points, are randomly sampled, and each is independently corrupted by an outlier whose value is uniformly selected within . We use and for RMDS-AAP. At convergence, a linear mapping , consisting of a translation and a rotation, is constructed to find the best alignment between the reconstructed 4 anchor points and the original 4 anchor points. Denote as the indices of the 4 anchor points, and denote the -th row of at convergence as . The error measure is computed at the other 21 points after alignment, i.e., . Since FSMDS [9] shows the best performance in this example, compared to RMDS [8], HQMMDS [12], and TMDS [1], we only compare RMDS-AAP with FSMDS. In each setting, the mean and standard deviation of the RMSE values of RMDS-AAP are computed over 1000 simulations. From Fig. 2, one can see that, regardless of the noise strength, RMDS-AAP generally produces reconstructions with lower RMSE values as well as smaller variations.
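For reference, a sketch of the anchor-based alignment and RMSE evaluation described above is given below: it fits a translation plus rotation on the anchor points via a Procrustes step and then evaluates the error on the remaining points. The function names and the exact fitting procedure are assumptions, not necessarily the construction used in [9] or in our experiments.

```python
import numpy as np

def align_to_anchors(P_hat, P_true, anchor_idx):
    """Fit a translation + rotation mapping the reconstructed anchor points onto
    the true anchors (Procrustes on the anchors), then apply it to all points."""
    A_hat = P_hat[anchor_idx] - P_hat[anchor_idx].mean(axis=0)
    A_true = P_true[anchor_idx] - P_true[anchor_idx].mean(axis=0)
    U, _, Vt = np.linalg.svd(A_hat.T @ A_true)
    R = U @ Vt
    if np.linalg.det(R) < 0:                 # enforce a proper rotation
        U[:, -1] *= -1
        R = U @ Vt
    t = P_true[anchor_idx].mean(axis=0) - P_hat[anchor_idx].mean(axis=0) @ R
    return P_hat @ R + t

def rmse_excluding_anchors(P_hat_aligned, P_true, anchor_idx):
    """RMSE over the non-anchor points after alignment."""
    mask = np.ones(P_true.shape[0], dtype=bool)
    mask[anchor_idx] = False
    err = np.linalg.norm(P_hat_aligned[mask] - P_true[mask], axis=1)
    return np.sqrt(np.mean(err**2))
```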
4 Proofs
We first collect some useful results in Section 4.1. Some of them are from the literature, and the proofs of the new ones are included in the Appendix. We then present the proof of Theorem 1, split into the proof for the initialization phase and the proof for the iteration phase, in Section 4.2 and Section 4.3, respectively. The proof of Proposition 1 is presented in Section 4.4.
4.1 Useful Lemmas
Lemma 1 ([6, Lemma 19]).
Let , and for some perturbation matrix . Denote the eigen-decomposition of as . Suppose the rank- SVD of is . Set and . If , then
Lemma 3 ([11, Lemma 47]).
Assume the same conditions of Lemma 1. Denote , and . Suppose the rank- SVD of is . Set . Then,
Lemma 4 ([13, Lemma 4]).
If is -sparse, i.e., has no more than nonzero entries per row (and column), then .
Lemma 5.
The operator satisfies:
1. , ;
2. with , .
Lemma 6.
If , and , then , and
Lemma 7.
The matrix satisfies:
1. ,
2. ,
3. .
4.2 Proof for the Initialization Phase
Considering as the zero matrix, . Since
by Lemma 6, , and
where the third inequality follows from and Lemma 5. Denoting ,
(4)
where the first inequality follows from Lemma 7, and the second inequality follows from Lemma 4. Therefore is no more than if .
Suppose has the eigen-decomposition , where , and . Lemma 1 is applicable once we have the row-wise bound of , where is the best rotation matrix between and computed from the SVD of . Consider .
Bounding . Due to Weyl’s inequality,
where is used in the last inequality. Therefore,
Bounding .
therefore we just need to bound .
For the first term,
where the third equality and the second inequality follow from Lemma 7, and the third inequality follows from the triangle inequality and the row-wise bound of .
The second term can be bounded similarly. Since
is in the null space of . From , we can get , and
Therefore,
where in the second inequality, the bound is used again, and the last inequality holds if .
Combining the bounds of to , and taking the maximum with respect to ,
consequently,
(5)
where in the second inequality, the bound is used, and the last inequality holds if . As a result,
therefore . Also, under this bound of we have
Now by Lemma 1,
(6)
if .
4.3 Proof for the Iteration Phase
For , our induction hypotheses are the following:
(7a)
(7b)
(7c)
One can see from (4), (5), and (6) that the three bounds hold for . For any ,
when ; and
when so that .
Now we prove for iteration . With the bound , by Lemma 5,
Then by Lemma 6, , and
By Lemma 5, , and
Denoting
and the eigen-decomposition of as .
where the first inequality follows from Lemma 8 and Lemma 9, the second inequality is due to Lemma 7 again, the third inequality follows from the induction hypothesis (7a), and the fifth inequality holds if . Then we proceed as in the base case to get
and the first two terms can be bounded similarly, i.e.,
Bounding .
while
For the first term,
where the last equality follows from the fact that . Therefore,
For the second term,
Putting together,
Combining the bounds of to , and taking the maximum with respect to ,
where the third inequality holds if . By Lemma 1 again,
if .
4.4 Proof for Proposition 1
For ,
Bounding .
Bounding .
Since
Bounding .
where the first inequality follows from [14, Lemma 2.1], since and are the matrix square roots of and , respectively; and the second inequality follows from Lemma 2. Therefore,
Combining the bound of to ,
where the last inequality follows from .
5 Conclusion
In this paper, we propose an alternating projection based algorithm, further accelerated by the tangent space projection technique, for the RMDS problem. Under standard assumptions in the RPCA literature, we establish linear convergence of the proposed algorithm when the outliers are sparse enough. We also numerically verify the performance of the proposed algorithm, with comparisons to other state-of-the-art solvers for RMDS.
To adapt to more realistic settings, it is of interest to incorporate noise into the convergence analysis, so as to handle the case where the pairwise distances are corrupted by sub-Gaussian noise in addition to sparse outliers.
References
- [1] L. Blouvshtein and D. Cohen-Or, Outlier detection for robust multi-dimensional scaling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41 (2019), pp. 2273–2279.
- [2] H. Cai, J.-F. Cai, T. Wang, and G. Yin, Accelerated structured alternating projections for robust spectrally sparse signal recovery, IEEE Transactions on Signal Processing, 69 (2021), pp. 809–821.
- [3] H. Cai, J.-F. Cai, and K. Wei, Accelerated alternating projections for robust principal component analysis, Journal of Machine Learning Research, 20 (2019), pp. 1–33.
- [4] E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis?, Journal of the ACM, 58 (2011), pp. 1–37.
- [5] L. Cayton and S. Dasgupta, Robust Euclidean embedding, in Proceedings of the 23rd International Conference on Machine Learning, ICML, 2006, pp. 169–176.
- [6] L. Ding and Y. Chen, Leave-one-out approach for matrix completion: Primal and dual analysis, IEEE Transactions on Information Theory, 66 (2020), pp. 7274–7301.
- [7] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli, Euclidean distance matrices: Essential theory, algorithms, and applications, IEEE Signal Processing Magazine, 32 (2015), pp. 12–30.
- [8] P. A. Forero and G. B. Giannakis, Sparsity-exploiting robust multidimensional scaling, IEEE Transactions on Signal Processing, 60 (2012), pp. 4118–4134.
- [9] L. Kong, C. Qi, and H.-D. Qi, Classical multidimensional scaling: A subspace perspective, over-denoising, and outlier detection, IEEE Transactions on Signal Processing, 67 (2019), pp. 3842–3857.
- [10] X. Li, S. Ding, and Y. Li, Outlier suppression via non-convex robust PCA for efficient localization in wireless sensor networks, IEEE Sensors Journal, 17 (2017), pp. 7053–7063.
- [11] C. Ma, K. Wang, Y. Chi, and Y. Chen, Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution, Foundations of Computational Mathematics, 20 (2019), pp. 451–632.
- [12] F. D. Mandanas and C. L. Kotropoulos, Robust multidimensional scaling using a maximum correntropy criterion, IEEE Transactions on Signal Processing, 65 (2017), pp. 919–932.
- [13] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, Non-convex robust PCA, in Advances in Neural Information Processing Systems, NIPS, 2014, pp. 1107–1115.
- [14] B. A. Schmitt, Perturbation bounds for matrix square roots and pythagorean sums, Linear Algebra and its Applications, 174 (1992), pp. 215–227.
- [15] K. Wei, J.-F. Cai, T. F. Chan, and S. Leung, Guarantees of Riemannian optimization for low rank matrix recovery, SIAM Journal on Matrix Analysis and Applications, 37 (2016), pp. 1198–1222.
- [16] X. Yi, D. Park, Y. Chen, and C. Caramanis, Fast algorithms for robust PCA via gradient descent, in Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS, 2016, pp. 4159–4167.
Appendix A Implementation Details of the Computation of
First consider . The computation of can be achieved by subtracting from each row of its averaged row vector. This step costs about flops. can be computed in a similar fashion, by subtracting from each column of its averaged column vector. Therefore, the computation cost of is about flops.
Denote . Note that , and the computation of requires about flops. Therefore the total cost to form is about flops. Let be the QR decomposition, whose cost is flops. Then,
Note that , so the eigen-decomposition of , denoted as , can be computed using flops. We may as well require that the diagonal entries of are in descending order. Finally, since the columns of are orthogonal to the columns of , is an orthogonal matrix, and we can get the eigen-decomposition of by computing
(8) |
The cost of this step is about . In summary, the computational cost of is about flops.
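For completeness, the following sketch illustrates how the eigen-decomposition of the tangent space projection can be assembled from thin matrix products, a QR factorization, and a small eigen-decomposition, in the manner described above. The variable names are illustrative, and the sketch assumes the residual block has full column rank.

```python
import numpy as np

def tangent_space_eig(M, U, r):
    """Top-r eigenpairs of the tangent space projection of the symmetric matrix M
    at an orthonormal basis U (n x r), using only thin matrix products, one QR
    factorization, and a (2r x 2r) eigen-decomposition."""
    A = U.T @ M @ U                      # r x r block
    B = M @ U - U @ A                    # (I - U U^T) M U, n x r
    Q, Rb = np.linalg.qr(B)              # columns of Q are orthogonal to those of U
    # P_T(M) = [U Q] [[A, Rb^T], [Rb, 0]] [U Q]^T
    C = np.block([[A, Rb.T], [Rb, np.zeros((r, r))]])
    w, W = np.linalg.eigh(C)
    order = np.argsort(w)[::-1][:r]      # keep the r largest eigenvalues
    V = np.hstack([U, Q]) @ W[:, order]  # corresponding eigenvectors of P_T(M)
    return w[order], V
```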
Appendix B Proofs for Some Useful Lemmas
B.1 Proof for Lemma 5
By the definition of operator ,
Therefore ,
and if ,
B.2 Proof for Lemma 6
Denote .
When , since ,
and it follows that . Next, there are two cases to consider.
1. . In this case, ,
2. . In this case, ,
and we get the bound
B.3 Proof for Lemma 7
It is easy to see that , and any unit vector orthogonal to reaches the upper bound. For the second property, recall that , and , therefore is in the null space of , and is orthogonal to each column of . Hence,
Finally, denote the -th row of as ,