Sharp Restricted Isometry Property Bounds for Low-rank Matrix Recovery Problems with Corrupted Measurements
Abstract
In this paper, we study a general low-rank matrix recovery problem with linear measurements corrupted by some noise. The objective is to understand under what conditions on the restricted isometry property (RIP) of the problem local search methods can find the ground truth with a small error. By analyzing the landscape of the non-convex problem, we first propose a global guarantee on the maximum distance between an arbitrary local minimizer and the ground truth under the assumption that the RIP constant is smaller than . We show that this distance shrinks to zero as the intensity of the noise reduces. Our new guarantee is sharp in terms of the RIP constant and is much stronger than the existing results. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. Next, we prove the strict saddle property, which guarantees the global convergence of the perturbed gradient descent method in polynomial time. The developed results demonstrate how the noise intensity and the RIP constant of the problem affect the landscape of the problem.
1 Introduction
Low-rank matrix recovery problems arise in various applications, such as matrix completion (Candès and Recht 2009; Recht, Fazel, and Parrilo 2010), phase synchronization/retrieval (Singer 2011; Boumal 2016; Shechtman et al. 2015), robust PCA (Ge, Jin, and Zheng 2017), and several others (Chen and Chi 2018; Chi, Lu, and Chen 2019). In this paper, we study a class of low-rank matrix recovery problems, where the goal is to recover a symmetric and positive semidefinite ground truth matrix with from certain linear measurements corrupted by noise. This problem can be formulated as the following optimization problem:
(1)
Here, is a linear operator whose action on a matrix is given by
where are called sensing matrices. In addition, represents the noise-free measurement of the ground truth and comes from an arbitrary probability distribution. Note that only the noisy measurement is available to the user, and is indeed unknown. In other words, from a problem-solving perspective, the random variable is hidden from the user, and it is modeled explicitly here only for the sake of analysis.
The modeling of the noise in this problem is critical for many practical applications in which the influence of the noise cannot be ignored. For example, state estimation is a major data analytic problem for the operation of power grids, which can be modeled as matrix sensing (Jin et al. 2019b). In this problem, each measurement comes from a physical device, and the noise in our formulation models not only sensory noise but also the mismatch between the true model of the system and the one used by the power operator, changes in the measurements due to cyber-attacks, mechanical faults, etc. In other words, by a noisy problem we mean a learning problem in which part of the data is wrong for various reasons, and, as in this case, it is unrealistic to assume that the measurements are noise-free.
Although it may be possible to solve the problem (1) based on convex relaxations (Candès and Recht 2009; Recht, Fazel, and Parrilo 2010; Candès and Tao 2010), the computational complexity associated with solving a semidefinite program presents a major challenge for large-scale problems. A more scalable approach is to use the Burer–Monteiro factorization (Burer and Monteiro 2003) by expressing as with , which leads to the following equivalent formulation of the original problem (1):
(2)
The unconstrained problem (2) is often solved by local search methods such as gradient descent. Since the objective function in (2) is non-convex, local search algorithms may converge to a local minimizer, leading to a suboptimal or plainly wrong solution. Hence, it is desirable to provide guarantees on the maximum distance between these local minimizers and the ground truth . This problem will be addressed in this paper.
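To make the factorized formulation concrete, the following minimal sketch runs plain gradient descent on it, assuming the common least-squares loss f(U) = (1/2) Σᵢ (⟨Aᵢ, UUᵀ⟩ − bᵢ)²; the scaling of the objective, the step size, and the initialization scale are illustrative assumptions rather than the paper's exact choices in (2).

```python
import numpy as np

def gradient_descent_matrix_sensing(A_list, b, n, r, step=1e-2, iters=2000, seed=0):
    """Plain gradient descent on f(U) = 0.5 * sum_i (<A_i, U U^T> - b_i)^2.

    A_list : list of m sensing matrices of shape (n, n).
    b      : array of m noisy measurements.
    The 0.5 scaling and the small random initialization are conventions assumed
    here; the paper's exact formulation of (2) may differ by constants.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n, r))        # small random initialization
    for _ in range(iters):
        X = U @ U.T
        residual = np.array([np.sum(A_i * X) for A_i in A_list]) - b
        # grad f(U) = sum_i residual_i * (A_i + A_i^T) @ U
        G = sum(r_i * (A_i + A_i.T) for r_i, A_i in zip(residual, A_list)) @ U
        U -= step * G
    return U
```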
Related Works
The special noiseless case of the problem (2) can be obtained by setting . In this case, any solution with is a global minimizer of the problem (2). Many previous works, such as Ge, Jin, and Zheng (2017); Bhojanapalli, Neyshabur, and Srebro (2016); Park et al. (2017); Zhang et al. (2018b); Zhu et al. (2018); Zhang, Sojoudi, and Lavaei (2019); Zhang and Zhang (2020); Bi and Lavaei (2020); Ha, Liu, and Barber (2020); Zhu et al. (2021); Zhang (2021), focused on proving that the problem has no spurious (non-global) local minimizers under the assumption of the restricted isometry property (RIP). Moreover, as demonstrated in previous works such as Ge, Jin, and Zheng (2017), the techniques developed under the RIP condition can be adapted to show that other low-rank matrix recovery problems, such as matrix completion under an incoherence condition and robust PCA, also have a benign landscape. The RIP condition is equivalent to the restricted strong convexity and smoothness property used in Wang, Zhang, and Gu (2017); Park et al. (2018); Zhu et al. (2021), and its formal definition is given below.
Definition 1.
The linear operator $\mathcal{A}$ is said to satisfy the $\delta$-RIP$_{2r}$ property for some constant $\delta \in [0, 1)$ if the inequality
$$(1-\delta)\,\|M\|_F^2 \;\le\; \|\mathcal{A}(M)\|_2^2 \;\le\; (1+\delta)\,\|M\|_F^2$$
holds for all matrices $M$ with $\operatorname{rank}(M) \le 2r$.
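Computing the RIP constant of a given operator exactly is intractable in general, but Definition 1 can be probed empirically. The sketch below samples random matrices of rank at most 2r and records the worst observed deviation of the squared measurement norm from the squared Frobenius norm, which certifies only a lower bound on the constant; the Gaussian measurement ensemble and its 1/√m scaling are assumptions for the example.

```python
import numpy as np

def empirical_rip_lower_bound(A_list, r, trials=500, seed=0):
    """Monte Carlo lower bound on the RIP constant delta_{2r} of the operator
    A(M) = (<A_1, M>, ..., <A_m, M>).

    Random symmetric matrices of rank at most 2r are sampled, normalized to unit
    Frobenius norm, and the largest observed deviation | ||A(M)||_2^2 - 1 | is
    returned.  This only certifies a lower bound; the true constant is a
    supremum over all matrices of rank at most 2r.
    """
    rng = np.random.default_rng(seed)
    n = A_list[0].shape[0]
    worst = 0.0
    for _ in range(trials):
        V1 = rng.standard_normal((n, r))
        V2 = rng.standard_normal((n, r))
        M = V1 @ V2.T + V2 @ V1.T            # symmetric, rank at most 2r
        M /= np.linalg.norm(M, "fro")
        deviation = abs(sum(np.sum(A_i * M) ** 2 for A_i in A_list) - 1.0)
        worst = max(worst, deviation)
    return worst

# Example: Gaussian sensing matrices scaled so that E ||A(M)||_2^2 = ||M||_F^2.
n, r, m = 20, 2, 600
rng = np.random.default_rng(1)
A_list = [rng.standard_normal((n, n)) / np.sqrt(m) for _ in range(m)]
print("Empirical lower bound on delta_{2r}:", empirical_rip_lower_bound(A_list, r))
```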
In the recent paper by Zhang (2021), the author developed a sharp bound on the absence of spurious local minima for the noiseless case of problem (2), which says that the problem has no spurious local minima if the measurement operator satisfies the - property with . This result is tight since there is a known counterexample (Zhang et al. 2018a) having spurious local minima under .
For the noisy problem, the relation is unlikely to be satisfied, where denotes a global minimizer of problem (2). However, in this situation, should be close to the ground truth if the noise is small. As a generalization of the above-mentioned results for the noiseless problem, it is natural to study whether all local minimizers, including the global minimizers, are close to the ground truth under the RIP assumption. One such result is presented in Bhojanapalli, Neyshabur, and Srebro (2016) and given below.
Theorem 1 (from Theorem 3.1 in Bhojanapalli, Neyshabur, and Srebro (2016)).
Suppose that and has the -RIP4r property with . Then, with probability at least , any local minimizer of problem (2) satisfies the inequality
Theorem 31 in Ge, Jin, and Zheng (2017) further improves the above result by replacing the - property with the - property. Li et al. (2020) studies a similar noisy low-rank matrix recovery problem with an $\ell_1$-norm loss.
As the above comparison shows, there is an evident gap between the state-of-the-art results for the noiseless and noisy problems. The result for the noiseless problem only requires the RIP constant , but the one for the noisy problem requires no matter how small the noise is. This gap will be addressed in this paper by showing that a major generalization of Theorem 1 holds for the noisy problem under the same RIP assumption as in the sharp bound for the noiseless problem.
Notations
In this paper, refers to the identity matrix of size . The notation means that is a symmetric and positive semidefinite matrix. denotes the -th largest singular value of a matrix , and denotes the -th largest eigenvalue of . denotes the Euclidean norm of a vector , while and denote the Frobenius norm and induced norm of a matrix , respectively. is defined to be for two matrices and of the same size. The Kronecker product between and is denoted as . For a matrix , is the usual vectorization operation by stacking the columns of the matrix into a vector. For a vector , converts to a square matrix and converts to a symmetric matrix, i.e., and , where is the unique matrix satisfying . Finally, refers to the multivariate Gaussian distribution with mean and covariance .
2 Main Results
We first present the global guarantee on the local minimizers of the problem (2). To simplify the notation, we use a matrix representation of the measurement operator as follows:
Then, for every matrix .
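As a concrete illustration, the sketch below assembles a matrix representation of the operator, assuming the standard construction in which the i-th row is vec(Aᵢ)ᵀ so that applying the operator amounts to multiplying by vec(M); the column-major vectorization convention is an assumption, since the paper's exact convention is not reproduced above.

```python
import numpy as np

def operator_matrix(A_list):
    """Stack vec(A_i)^T as rows, so that A(M) = A_mat @ vec(M).

    vec() here is column-major (Fortran order); the paper's convention is
    assumed to agree.
    """
    return np.stack([A_i.flatten(order="F") for A_i in A_list])

def apply_operator(A_mat, M):
    """Evaluate the measurement operator through its matrix representation."""
    return A_mat @ M.flatten(order="F")

# Quick consistency check against the direct definition <A_i, M>.
rng = np.random.default_rng(0)
n, m = 5, 7
A_list = [rng.standard_normal((n, n)) for _ in range(m)]
M = rng.standard_normal((n, n))
A_mat = operator_matrix(A_list)
direct = np.array([np.sum(A_i * M) for A_i in A_list])
assert np.allclose(apply_operator(A_mat, M), direct)
```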
Theorem 2.
Assume that the linear operator satisfies the - property with . For every , with probability at least , either of the following two inequalities
(3a)
(3b)
holds for every local minimizer of problem (2).
Note that two upper bounds on the distance can be obtained for any local minimizer by solving the two quadratic-like inequalities (3a) and (3b), and the larger bound needs to be used because only one of the two inequalities is guaranteed to hold. The explicit upper bound solved from Theorem 2 is given in Appendix A. The reason for the existence of two inequalities in Theorem 2 is the split of its proof into two cases. The first case is when the -th smallest singular value of is small, and the second case is the opposite, which are respectively handled by Lemma 2 and Lemma 3.
Theorem 2 is a major extension of the existing sharp result stating that the noiseless problem has no spurious local minima under the same assumption of the - property with . The reason is that in the case when the noise is equal to zero, one can choose an arbitrarily small in Theorem 2 to conclude from the inequalities (3a) and (3b) that for every local minimizer . Moreover, when the RIP constant further decreases from , the upper bound on will also decrease, which means that a local minimizer found by local search methods will be closer to the ground truth . This suggests that the RIP condition is able to not only guarantee the absence of spurious local minima as shown in the previous literature but also mitigate the influence of the noise in the measurements.
Compared with existing results such as Theorem 1, our new result has two advantages. First, by improving the RIP constant from to , one can apply the results on the location of spurious local minima to a much broader class of problems, which often helps reduce the number of measurements. For example, in the case when the measurements are given by random Gaussian matrices, it is proven in Candès and Recht (2009) that achieving the - property requires a number of measurements on the order of . By improving the RIP constant in the bound, we can significantly reduce the number of measurements while still keeping a benign landscape. In applications such as learning for energy networks, there is a fundamental limit on the number of measurements that can be collected due to the physics of the problem (Jin et al. 2021), and a better RIP bound helps address the number of measurements needed to reliably solve the problem. Second, Theorem 1 only bounds the probability of having all spurious solutions in one fixed ball around the ground truth of radius , and this fixed ball could be large depending on whether the noise level is fixed or scales with the problem. In contrast, Theorem 2 considers this probability for an arbitrary value of . By allowing a flexible , our work not only improves the RIP constant but also makes it possible to compute the probability of having all spurious solutions in any given ball.
In the special case of rank , the conditions (3a) and (3b) in Theorem 2 can be replaced with the simpler condition shown below. Its proof is highly similar to that of Lemma 3 and is omitted for brevity.
Theorem 3.
Consider the case and assume that the linear operator satisfies the - property with . For every , with probability at least , every local minimizer of problem (2) satisfies
(4)
In the case when the RIP constant is not less than , it is not possible to achieve a global guarantee similar to Theorem 2 or Theorem 3 since it is known that the problem may have a spurious solution even in the noiseless case. Instead, we turn to local guarantees by showing that every local minimizer of problem (2) is either close to the ground truth or far away from it in terms of the distance .
Theorem 4.
Assume that the linear operator satisfies the -RIP2r property for some . Consider arbitrary constants and such that
Every local minimizer of problem (2) satisfying
(5)
will also satisfy
(6)
with probability at least , where
The upper bounds in (5) and (6) define an outer ball and an inner ball centered at the ground truth . Theorem 4 states that there is no local minimizer in the ring between the two balls. As approaches zero, the inner ball shrinks to the ground truth. This theorem shows that bad local minimizers are located outside the outer ball. Note that the problem could be highly non-convex when is close to 1, while this theorem shows a benign landscape in a local neighborhood of the solution. Furthermore, all the theorems in this section are applicable to arbitrary noise models since they make no explicit use of the probability distribution of the noise. The only required information is the probability , which can be computed or bounded when the probability distribution of the noise is given as illustrated in Section 4.
The results presented above are all about the location of the local minimizers. They do not directly lead to the global convergence of local search methods with a fast convergence rate. To provide performance guarantees for local search methods, the next theorem establishes a stronger property for the landscape of the noisy problem that is usually called the strict saddle property in the literature, which essentially says that all approximate second-order critical points are close to the ground truth.
Theorem 5.
Assume that the linear operator satisfies the - property with . For every and , with probability at least , either of the following two inequalities
(7a)
(7b)
holds for every matrix satisfying
Note that this property is not proven in the literature for even in the noiseless case, and thus our result generalizes the existing ones even in this scenario. On the other hand, it is proven by Jin et al. (2017) that the perturbed gradient descent method with an arbitrary initialization will find a solution satisfying the requirements in Theorem 5 with high probability in a polynomial number of iterations. By Theorem 5, will be close to the ground truth if and are chosen to be relatively small.
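A minimal sketch of perturbed gradient descent in the spirit of Jin et al. (2017) is given below: whenever the gradient becomes small, a random perturbation drawn from a small ball is added so that the iterates can escape saddle points. The objective is passed in as a generic gradient oracle, and the step size, perturbation radius, and trigger threshold are illustrative placeholders rather than the tuned constants of that paper.

```python
import numpy as np

def perturbed_gradient_descent(grad, U0, step=1e-2, radius=1e-2,
                               grad_tol=1e-3, wait=50, iters=5000, seed=0):
    """Gradient descent with occasional random perturbations (Jin et al. 2017 style).

    grad : callable returning the gradient at U (same shape as U).
    When the gradient norm stays below grad_tol, a perturbation drawn from a
    Frobenius ball of the given radius is added, at most once per `wait`
    iterations.  All constants are illustrative, not the theoretical ones.
    """
    rng = np.random.default_rng(seed)
    U = U0.copy()
    last_perturb = -wait
    for t in range(iters):
        G = grad(U)
        if np.linalg.norm(G) < grad_tol and t - last_perturb >= wait:
            noise = rng.standard_normal(U.shape)
            U = U + radius * noise / np.linalg.norm(noise)
            last_perturb = t
        else:
            U = U - step * G
    return U
```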
Table 1 briefly summarizes our results in comparison with the existing literature.
| Paper | Noise | RIP Assumption | Rank | Convergence |
|---|---|---|---|---|
| Bhojanapalli, Neyshabur, and Srebro (2016) | Isotropic Gaussian | | rank | N/A |
| Zhang, Sojoudi, and Lavaei (2019) | Noiseless | | rank 1 | N/A |
| Zhang (2021) | Noiseless | | rank | N/A |
| Ours | Finite Variance | | rank | Polynomial |
3 Proofs of Main Results
Before presenting the proofs, we first compute the gradient and the Hessian of the objective function of the problem (2):
where
and is the matrix satisfying
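For concreteness, the sketch below spells out these derivatives for the standard least-squares objective f(U) = (1/2)‖A(UUᵀ) − b‖²; the (1/2) scaling and the symmetrization of the sensing matrices are notational assumptions, and the Hessian is given as a Hessian-vector product, which is what is needed to check the second-order conditions appearing in Lemma 1.

```python
import numpy as np

def residual(A_list, b, U):
    """r_i = <A_i, U U^T> - b_i for each sensing matrix A_i."""
    X = U @ U.T
    return np.array([np.sum(A_i * X) for A_i in A_list]) - b

def grad_f(A_list, b, U):
    """Gradient of f(U) = 0.5 * ||A(U U^T) - b||^2 (scaling assumed)."""
    res = residual(A_list, b, U)
    return sum(r_i * (A_i + A_i.T) for r_i, A_i in zip(res, A_list)) @ U

def hess_f_vec(A_list, b, U, V):
    """Hessian-vector product nabla^2 f(U)[V] for the same objective,
    useful for estimating the smallest eigenvalue of the Hessian."""
    res = residual(A_list, b, U)
    out = np.zeros_like(U)
    for r_i, A_i in zip(res, A_list):
        S_i = A_i + A_i.T
        d_i = np.sum(A_i * (U @ V.T + V @ U.T))   # directional derivative of r_i
        out += d_i * (S_i @ U) + r_i * (S_i @ V)
    return out
```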
The first step in the proofs is to derive necessary conditions for a matrix to be an approximate second-order critical point, which depend on the linear operator , the noise , the solution , and the parameter characterizing how close is to a true second-order critical point.
Lemma 1.
Given , assume that satisfies
Then, it must satisfy the following inequalities:
(8a)
(8b)
where .
Proof.
If is a local minimizer of the problem (2), Lemma 1 shows that satisfies the inequalities (8a) and (8b) with . Similarly, Theorem 2 can also be regarded as a special case of Theorem 5 with . The proofs of these two theorems consist of inspecting two cases. The following lemma deals with the first case in which is an approximate second-order critical point with being close to zero.
Lemma 2.
Proof.
Let and be a unit eigenvector of corresponding to its minimum eigenvalue, i.e.,
In addition, let be a singular vector of such that
Let . Then, and (8b) implies that
(9)
On the other hand,
in which the second-to-last inequality is due to (9) and the last inequality is due to (8a). Furthermore, the right-hand side of the above inequality can be relaxed as
which leads to the inequality (7a). ∎
The remaining case with
will be handled in the following lemma using a different method.
Lemma 3.
Assume that the linear operator satisfies the - property with . Given and arbitrary constants and , the inequalities
and will together imply the inequality (7b).
The proofs of both Lemma 3 and the local guarantee in Theorem 4 generalize the proof of the absence of spurious local minima for the noiseless problem in Zhang and Zhang (2020); Zhang (2021). Our innovation here is to develop new techniques to analyze approximate optimality conditions for the solutions because, unlike in the noiseless problem, the local minimizers of the noisy one are only approximate second-order critical points of the distance function . For a fixed solution and noise , one can find an operator satisfying the - property with the smallest possible such that and satisfy the necessary conditions stated in Lemma 1. Let be the RIP constant of this measurement operator in the worst-case scenario. Then, if in Lemma 3 is a solution of the current problem with the linear operator satisfying the - property, it holds that , which further leads to an upper bound on the distance .
To compute defined above, let and solve the following optimization problem whose optimal value is :
(10)
s.t.
Note that a matrix is said to satisfy the - property if
holds for every matrix with and . Obviously, for a linear operator , satisfies the - property if and only if satisfies the - property.
However, since problem (10) is non-convex due to the RIP constraint, we instead solve the following convex reformulation:
(11)
Lemma 14 in Bi and Lavaei (2020) proves that problem (10) and problem (11) have the same optimal value. The remaining step in the proof of Lemma 3 is to solve the optimization problem (11) for given , and . The complete proof of Lemma 3 is lengthy and deferred to Appendix B. Finally, Theorem 2 and Theorem 5 are direct consequences of Lemma 2 and Lemma 3. The proof of Theorem 3 is very similar to that of Lemma 3 and is also given in Appendix B.
Now, we turn to the proof of the local guarantee in Theorem 4. The following existing result will be useful.
Lemma 4 (from Lemma 14 in Zhang, Sojoudi, and Lavaei (2019)).
Given , the rank-2 matrix has two possibly nonzero eigenvalues
Here, is the angle between and .
Proof of Theorem 4.
First, we relax the optimization problem (11) by dropping the constraint related to the second-order necessary optimality condition. This gives rise to the optimization problem
(12)
To further simplify the problem (12), one can replace its decision variable with and introduce the following optimization problem:
(13)
Given any feasible solution to (12), the tuple
is a feasible solution to problem (13). Therefore, if the optimal value of (12) is denoted as and the optimal value of (13) is denoted as , then it holds that
(14)
in which the last inequality is implied by as shown above. To prove the inequality (6), we need to bound from above, which can be achieved by finding a feasible solution to the dual problem of (13) given below:
(15)
For any matrix satisfying , we have , and it has been shown in the proof of Lemma 19 in Bi and Lavaei (2020) that there exists satisfying the inequalities
(16a)
(16b)
where is the angle between and . Note that (16b) holds only in the exact-parameterized regime, i.e., the case with , since the derivation of (16b) utilized Lemma 5.4 of Tu et al. (2016), which does not hold when . This is the main reason why our result cannot be directly generalized to the overparameterized case. Now, define
and then decompose as with and . Then, it is easy to verify that defined as
forms a feasible solution to the dual problem (15) with the objective value
(17)
Furthermore, implies that . By the Wielandt–Hoffman theorem,
Thus, using the above two inequalities and inequality (16a), we have
(18)
Next, according to Lemma 4, one can write
Substituting the above two equations and (18) into the dual objective value (17), one can obtain
which together with (14) implies that
The inequality (6) can then be proved by combining the above inequality and (16b) under the probabilistic event that . ∎
4 Numerical Illustration
Next, we empirically study the developed probabilistic guarantees and illustrate the distance between any local minimizer and the ground truth as well as the value of the RIP constant required of the linear operator .
Before delving into the numerical illustration, note that the probability used in both Theorem 2 and Theorem 4 can be bounded from below by the probability with . The latter probability can be easily estimated when the probability distribution of the noise is given. As an example, in the simplest case when is sampled from an isotropic Gaussian distribution, i.e., , the random variable follows the chi-square distribution and one can apply the Chernoff bound to obtain
After solving the minimization problem in the above equation, we obtain
More generally, if is a -sub-Gaussian vector, then applying Lemma 1 in Jin et al. (2019a) leads to
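To illustrate how this probability term can be evaluated in practice, the sketch below computes P(‖e‖₂ ≤ ε) exactly through the chi-square distribution for isotropic Gaussian noise and also by Monte Carlo for a generic noise sampler; the variance parameter, dimension, and threshold used in the example are placeholders, not the values from the experiments.

```python
import numpy as np
from scipy import stats

def gaussian_norm_prob(eps, sigma, m):
    """P(||e||_2 <= eps) for e ~ N(0, sigma^2 I_m):
    ||e||_2^2 / sigma^2 follows a chi-square distribution with m degrees of freedom."""
    return stats.chi2.cdf((eps / sigma) ** 2, df=m)

def monte_carlo_norm_prob(sample_noise, eps, trials=100000, seed=0):
    """Monte Carlo estimate of P(||e||_2 <= eps) for an arbitrary noise sampler."""
    rng = np.random.default_rng(seed)
    hits = sum(np.linalg.norm(sample_noise(rng)) <= eps for _ in range(trials))
    return hits / trials

# Example with placeholder values sigma = 0.1, m = 100, eps = 1.2.
m, sigma, eps = 100, 0.1, 1.2
print(gaussian_norm_prob(eps, sigma, m))
print(monte_carlo_norm_prob(lambda rng: sigma * rng.standard_normal(m), eps))
```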
For numerical illustration, assume that , and , while the noise is a -sub-Gaussian vector. We also assume that the ground truth is of rank 5 with the largest eigenvalue being 1.5 and the smallest eigenvalue being 1.
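As a reproducibility aid, the following self-contained sketch generates a synthetic instance in the spirit of the setup above: a rank-5 PSD ground truth with eigenvalues between 1 and 1.5, Gaussian sensing matrices, and Gaussian (hence sub-Gaussian) noise, followed by gradient descent on the factorized problem. The dimensions, noise scale, step size, and iteration count are placeholders, since the exact experimental values are not reproduced here.

```python
import numpy as np

# Placeholder problem sizes and noise level; the exact values used in the paper
# are not reproduced here.
n, r, m, noise_scale = 30, 5, 1500, 0.01
rng = np.random.default_rng(0)

# Rank-5 PSD ground truth with eigenvalues spread between 1 and 1.5.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
M_star = (Q * np.linspace(1.0, 1.5, r)) @ Q.T

# Gaussian sensing matrices (scaled toward an isometry) and noisy measurements.
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
e = noise_scale * rng.standard_normal(m)
b = np.einsum("kij,ij->k", A, M_star) + e

# Gradient descent on f(U) = 0.5 * ||A(U U^T) - b||^2 from a small random start.
S = A + np.transpose(A, (0, 2, 1))           # symmetrized sensing matrices
U = 0.1 * rng.standard_normal((n, r))
for _ in range(3000):
    res = np.einsum("kij,ij->k", A, U @ U.T) - b
    U -= 0.1 * np.einsum("k,kij->ij", res, S) @ U

print("Recovery error ||U U^T - M*||_F =", np.linalg.norm(U @ U.T - M_star, "fro"))
```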
First, we explore the two inequalities (3a) and (3b) in Theorem 2 to obtain two upper bounds on , where denotes any arbitrary (worst) local minimizer. Figure 1 gives the contour plots of the two upper bounds on , which hold with the given probability on the -axis and the given RIP constant from 0 to on the -axis. While the final bound on is often determined by the inequality (3b), the inequality (3a) is needed theoretically to deal with the case when has a singular value close to 0.
Furthermore, a tighter bound on the success probability can be derived by calculating the exact probability for an explicit distribution on . Figure 2 is obtained in this way under the same assumptions as those for Figure 1 except that is isotropic Gaussian with the same parameter in the sub-Gaussian assumption. Compared with Figure 1, the shape is similar, but the bound is tighter.
Next, we illustrate the bounds given by Theorem 2 and Theorem 4. Figure 3 shows the contour plots of the largest RIP constant under which each local minimizer (satisfying the inequality (5) when Theorem 4 is applied) is guaranteed to lie within a certain neighborhood of the ground truth (measured via the distance on the -axis) with a given probability on the -axis, as implied by the respective global and local guarantees. Figure 3 clearly shows how a smaller RIP constant leads to a tighter bound on the distance with a higher probability. In addition, the local guarantee generally requires a looser RIP assumption since it still holds even when . However, as the parameter in Theorem 4 increases, the local bound degrades quickly, sometimes becoming worse than the global bound, as illustrated in Figure 3(d).
Moreover, in our experiments, we also tried different values of the problem parameters and . They all yield similar results, which are included in Appendix C for completeness.
5 Conclusion
In this paper, we develop global and local analyses of the locations of the local minima of the low-rank matrix recovery problem with noisy linear measurements. Unlike the existing results, the probability distribution of the noise is arbitrary and the RIP constant of the problem is allowed to take any value. The developed results encompass the state-of-the-art results on the non-existence of spurious solutions in the noiseless case. Furthermore, we prove the strict saddle property, which guarantees the global convergence of the perturbed gradient descent method in polynomial time. Our analyses show how the value of the RIP constant and the intensity of the noise affect the landscape of the non-convex learning problem and the locations of the local minima relative to the ground truth. Future research directions include extending our results to the cases in which the matrices are asymmetric or the measurements are nonlinear, as well as to the overparameterized regime in which is less than .
References
- Bhojanapalli, Neyshabur, and Srebro (2016) Bhojanapalli, S.; Neyshabur, B.; and Srebro, N. 2016. Global Optimality of Local Search for Low Rank Matrix Recovery. In Advances in Neural Information Processing Systems, volume 29.
- Bi and Lavaei (2020) Bi, Y.; and Lavaei, J. 2020. Global and Local Analyses of Nonlinear Low-Rank Matrix Recovery Problems. ArXiv:2010.04349.
- Boumal (2016) Boumal, N. 2016. Nonconvex phase synchronization. SIAM Journal on Optimization, 26(4): 2355–2377.
- Burer and Monteiro (2003) Burer, S.; and Monteiro, R. D. 2003. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2): 329–357.
- Candès and Recht (2009) Candès, E. J.; and Recht, B. 2009. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6): 717–772.
- Candès and Tao (2010) Candès, E. J.; and Tao, T. 2010. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5): 2053–2080.
- Chen and Chi (2018) Chen, Y.; and Chi, Y. 2018. Harnessing structures in big data via guaranteed low-rank matrix estimation: Recent theory and fast algorithms via convex and nonconvex optimization. IEEE Signal Processing Magazine, 35(4): 14–31.
- Chi, Lu, and Chen (2019) Chi, Y.; Lu, Y. M.; and Chen, Y. 2019. Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Transactions on Signal Processing, 67(20): 5239–5269.
- Ge, Jin, and Zheng (2017) Ge, R.; Jin, C.; and Zheng, Y. 2017. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, 1233–1242.
- Ha, Liu, and Barber (2020) Ha, W.; Liu, H.; and Barber, R. F. 2020. An Equivalence Between Critical Points for Rank Constraints Versus Low-Rank Factorizations. SIAM Journal on Optimization, 30(4): 2927–2955.
- Jin et al. (2017) Jin, C.; Ge, R.; Netrapalli, P.; Kakade, S. M.; and Jordan, M. I. 2017. How to Escape Saddle Points Efficiently. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, 1724–1732.
- Jin et al. (2019a) Jin, C.; Netrapalli, P.; Ge, R.; Kakade, S. M.; and Jordan, M. I. 2019a. A short note on concentration inequalities for random vectors with subGaussian norm. ArXiv:1902.03736.
- Jin et al. (2021) Jin, M.; Lavaei, J.; Sojoudi, S.; and Baldick, R. 2021. Boundary Defense Against Cyber Threat for Power System State Estimation. IEEE Transactions on Information Forensics and Security, 16: 1752–1767.
- Jin et al. (2019b) Jin, M.; Molybog, I.; Mohammadi-Ghazi, R.; and Lavaei, J. 2019b. Towards robust and scalable power system state estimation. In 2019 IEEE 58th Conference on Decision and Control (CDC), 3245–3252. IEEE.
- Li et al. (2020) Li, X.; Zhu, Z.; Man-Cho So, A.; and Vidal, R. 2020. Nonconvex robust low-rank matrix recovery. SIAM Journal on Optimization, 30(1): 660–686.
- Park et al. (2018) Park, D.; Kyrillidis, A.; Caramanis, C.; and Sanghavi, S. 2018. Finding low-rank solutions via nonconvex matrix factorization, efficiently and provably. SIAM Journal on Imaging Sciences, 11(4): 2165–2204.
- Park et al. (2017) Park, D.; Kyrillidis, A.; Carmanis, C.; and Sanghavi, S. 2017. Non-square Matrix Sensing Without Spurious Local Minima via the Burer–Monteiro Approach. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, 65–74.
- Recht, Fazel, and Parrilo (2010) Recht, B.; Fazel, M.; and Parrilo, P. A. 2010. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3): 471–501.
- Shechtman et al. (2015) Shechtman, Y.; Eldar, Y. C.; Cohen, O.; Chapman, H. N.; Miao, J.; and Segev, M. 2015. Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Processing Magazine, 32(3): 87–109.
- Singer (2011) Singer, A. 2011. Angular synchronization by eigenvectors and semidefinite programming. Applied and Computational Harmonic Analysis, 30(1): 20–36.
- Tu et al. (2016) Tu, S.; Boczar, R.; Simchowitz, M.; Soltanolkotabi, M.; and Recht, B. 2016. Low-Rank Solutions of Linear Matrix Equations via Procrustes Flow. In Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, 964–973.
- Wang, Zhang, and Gu (2017) Wang, L.; Zhang, X.; and Gu, Q. 2017. A unified computational and statistical framework for nonconvex low-rank matrix estimation. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, 981–990.
- Zhang and Zhang (2020) Zhang, G.; and Zhang, R. Y. 2020. How Many Samples Is a Good Initial Point Worth in Low-Rank Matrix Recovery? In Advances in Neural Information Processing Systems, volume 33, 12583–12592.
- Zhang (2021) Zhang, R. Y. 2021. Sharp Global Guarantees for Nonconvex Low-Rank Matrix Recovery in the Overparameterized Regime. ArXiv:2104.10790.
- Zhang et al. (2018a) Zhang, R. Y.; Josz, C.; Sojoudi, S.; and Lavaei, J. 2018a. How Much Restricted Isometry Is Needed in Nonconvex Matrix Recovery? In Advances in Neural Information Processing Systems, volume 31.
- Zhang, Sojoudi, and Lavaei (2019) Zhang, R. Y.; Sojoudi, S.; and Lavaei, J. 2019. Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery. Journal of Machine Learning Research, 20(114): 1–34.
- Zhang et al. (2018b) Zhang, X.; Wang, L.; Yu, Y.; and Gu, Q. 2018b. A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 5862–5871.
- Zhu et al. (2018) Zhu, Z.; Li, Q.; Tang, G.; and Wakin, M. B. 2018. Global Optimality in Low-Rank Matrix Optimization. IEEE Transactions on Signal Processing, 66(13): 3614–3628.
- Zhu et al. (2021) Zhu, Z.; Li, Q.; Tang, G.; and Wakin, M. B. 2021. The global optimization geometry of low-rank matrix optimization. IEEE Transactions on Information Theory, 67(2): 1308–1331.
Acknowledgments
This work was supported by grants from AFOSR, ARO, ONR, and NSF.
Appendix A Remark on Theorem 2
Appendix B Proofs of Lemma 3 and Theorem 3
Proof of Lemma 3.
Let be a matrix satisfying . Similar to the proof of Theorem 4, we introduce an optimization problem as follows:
(19)
where its optimal value satisfies the inequality
(20)
In the remaining part, we will prove the following upper bound on :
(21)
The inequality (7b) is a consequence of (20), (21) and the inequality
The proof of the upper bound (21) can be completed by finding a feasible solution to the dual problem of (19):
(22)
Before describing the choice of the dual feasible solution, we need to represent the error vector in a different form. Let be the orthogonal projection matrix onto the range of , and be the orthogonal projection matrix onto the orthogonal complement of the range of . Then, can be decomposed as , and there exists a matrix such that . Note that
Thus, if we choose
(23)
then it can be verified that
Moreover, we have
(24)
in which the first inequality is due to
Assume first that . The other case will be handled at the end of this proof. In the case when , we also have . Otherwise, the inequality (24) and the assumption imply that . The orthogonality and the definition of in (23) then give rise to
The first equation above implies that is invertible since has full column rank, which contradicts . Now, define the unit vectors
Then, and
(25)
with
(26)
We first describe our choices of the dual variables and (which will be scaled later). Let
with orthogonal matrices and diagonal matrices such that . Fix a constant that is to be determined and define
with defined in (23) and
Here, is the elementary matrix of size with the -entry being . By our construction, , which implies that
(27)
with
(28)
In addition,
(29)
and
Therefore,
which together with (25) implies that
(30)
Next, the inequality (24) and the assumption on imply that
(31)
and similarly
(32)
Define
and decompose as in which both and . Let be the angle between and . By Lemma 4, we have
Now, one can verify that defined as
forms a feasible solution to the dual problem (22) whose objective value is equal to
Substituting (27), (29), (30), (31) and (32) into the above equation, we obtain
Choosing the best value of the parameter to minimize the right-hand side of the above inequality leads to
with
Here, and are defined in (26) and (28), respectively. In the proof of Theorem 1.2 in Zhang (2021), it is shown that for every with , which gives the upper bound (21).
Finally, we still need to deal with the case when . In this case, we know that with defined in (23). Then, it is easy to check that defined as
forms a feasible solution to the dual problem (22) whose objective value is . By the inequality (24), we have
Hence, the upper bound (21) still holds in this case. ∎
Proof of Theorem 3.
The proof of Theorem 3 is similar to the above proof of Lemma 3 in the situation with , and we only emphasize the differences here. In the case when , after constructing the feasible solution to the dual problem (22), we have
(33)
Note that in the rank-1 case, one can write and
in which the last inequality is due to (24). Substituting (27), (29), (30) and the above inequality into (33) and choosing an appropriate as shown in the proof of Lemma 3, we obtain
which implies inequality (4) under the probabilistic event that .
Appendix C Additional Numerical Illustration
(Figures for additional values of the problem parameters appear here.)