On the Sample Complexity and Optimization Landscape for Quadratic Feasibility Problems
Abstract
We consider the problem of recovering a complex vector $\mathbf{x} \in \mathbb{C}^n$ from quadratic measurements of the form $\mathbf{x}^H A_i \mathbf{x} = y_i$. This problem, known as quadratic feasibility, encompasses the well-known phase retrieval problem and has applications in a wide range of important areas, including power system state estimation and X-ray crystallography. In general, not only is the quadratic feasibility problem NP-hard to solve, but it may in fact be unidentifiable. In this paper, we establish conditions under which this problem becomes identifiable, and further prove isometry properties in the case when the matrices $A_i$ are Hermitian matrices sampled from a complex Gaussian distribution. Moreover, we explore a nonconvex optimization formulation of this problem, and establish salient features of the associated optimization landscape that enable gradient algorithms with an arbitrary initialization to converge to a globally optimal point with high probability. Our results also reveal sample complexity requirements for successfully identifying a feasible solution in these contexts.
I Introduction
††This work is supported in part by the National Science Foundation under Grant No. OAC-1934766 and CCF-1717391.
Finding a solution to a system of quadratic equations is an important problem with a wide range of applications. It arises in areas such as power system state estimation [1], phase retrieval [2, 3, 4, 5], X-ray crystallography [6], the turnpike problem [7], and unlabeled distance geometry problems [8, 9], among others. Such problems are often reduced to a quadratic feasibility problem, where one is concerned with finding a feasible vector $\mathbf{x}$ that conforms to a set of quadratic observations of the form $\mathbf{x}^H A_i \mathbf{x} = y_i$ with respect to a set of measurement matrices $\{A_i\}_{i=1}^m$. Formally, it can be cast as:
find $\mathbf{x} \in \mathbb{C}^n$ such that $\mathbf{x}^H A_i \mathbf{x} = y_i$ for all $i \in [m]$.   (P1)
The quadratic feasibility problem is an instance of quadratically constrained quadratic programs (QCQPs) [10], which have enjoyed a long and rich research history dating back to 1941 [11]. Given their broad applicability to critical problems, research in QCQPs continues to be of active interest [10, 12, 13, 14]. Unfortunately, it is known that solving QCQPs is an NP-hard problem [15]. This, combined with the lack of tractable duality properties [16], has made it hard to establish a sound theoretical framework for understanding the solutions and computing them. However, an extremely productive line of research has instead considered subclasses of QCQPs that are both practically relevant and amenable to analysis. In this paper, we take a similar approach and identify an important subclass of QCQPs with a broad range of applications. In particular, we analyze the quadratic feasibility problem, establish conditions under which such problems are identifiable, and then show that these conditions are in fact sufficient for the efficient computation of their solutions.
We start by considering quadratic functions $q_i(\mathbf{x}) = \mathbf{x}^H A_i \mathbf{x}$, where $\mathbf{x} \in \mathbb{C}^n$ and $A_1, \ldots, A_m$ are Hermitian matrices. We focus on their ability to generate injective maps up to a phase factor (note that the quadratic functions are invariant to phase shifts). We establish a relationship between injectivity and isometry and show that, in real-world scenarios, it is not difficult for a set of quadratic measurements to possess such an isometry property, by establishing that this holds with very high probability when the matrices $A_i$ are complex Gaussian random matrices.
After establishing injectivity (and hence the identifiability) of the problem, we consider the question of computationally tractable approaches to actually find a feasible solution. Toward this end, a natural approach is to optimize the appropriate $\ell_2$-loss. Unfortunately, this turns out to be a nonconvex problem that is NP-hard to solve in general. However, we show that under the same conditions required to establish injectivity, the landscape of this optimization problem is well-behaved. This allows us to establish that any gradient based algorithm with an arbitrary initialization can recover a globally optimal solution almost surely.
The rest of the paper is organized as follows. Section II highlights the main results of this work. We discuss related work in Section III. In Section IV, we establish and analyze isometry/identifiability properties of the mapping $\mathcal{A}$ when the measurement matrices are complex Gaussian. Finally, Section V casts the problem as a quadratic loss minimization problem (suitable for efficient algorithms) and establishes favorable properties of the loss landscape that allow one to find a solution using gradient based methods with arbitrary initial points.
II Main Results
Before stating the main results of the paper, we introduce some notation that will be used throughout.
II-A Notation
For any $m \in \mathbb{N}$, we write $[m]$ to denote the set $\{1, \ldots, m\}$. We let $\mathbb{C}^n$ and $\mathbb{R}^n$ denote the $n$-dimensional complex and real vector spaces, respectively. Unless otherwise stated, bold letters such as $\mathbf{x}$ indicate vectors in $\mathbb{C}^n$; $\Re(\mathbf{x})$ and $\Im(\mathbf{x})$ denote the real and the imaginary part of the vector $\mathbf{x}$, respectively. We denote the complex conjugate of $\mathbf{x}$ by $\bar{\mathbf{x}}$. Capital letters such as $A$ denote matrices in $\mathbb{C}^{n \times n}$. The symbol $\mathrm{i}$ (without serif) indicates the complex square root of $-1$ (we will use $i$ to indicate an indexing variable). We let $\mathbb{H}^{p,q}$ denote the set of all Hermitian matrices having $p$ non-negative eigenvalues and $q$ negative eigenvalues, where $p + q = n$. The set $\mathbb{H}^n$ denotes the set of all $n \times n$ Hermitian matrices. We write $A^T$ and $A^H$ to denote, respectively, the transpose and the Hermitian transpose (transpose conjugate) of a matrix $A$. We use $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^H \mathbf{x}$ to denote the inner vector product in the complex space. The symmetric outer product, denoted by $\mathbf{x} \odot \mathbf{y}$, is defined as $\mathbf{x} \odot \mathbf{y} = \frac{1}{2}\left(\mathbf{x}\mathbf{y}^H + \mathbf{y}\mathbf{x}^H\right)$. Finally, we will let $\sim$ denote the following equivalence relation on $\mathbb{C}^n$: $\mathbf{x} \sim \mathbf{y}$ if and only if $\mathbf{x} = e^{\mathrm{i}\phi}\mathbf{y}$ for some $\phi \in [0, 2\pi)$. We will write $\mathbb{C}^n/\!\sim$ to denote the associated quotient space. Given a set of matrices $\{A_i\}_{i=1}^m \subset \mathbb{H}^n$, we will let $\mathcal{A}$ denote the following mapping from $\mathbb{C}^n/\!\sim$ to $\mathbb{R}^m$:
$\mathcal{A}(\mathbf{x}) = \left( \mathbf{x}^H A_1 \mathbf{x}, \; \ldots, \; \mathbf{x}^H A_m \mathbf{x} \right).$   (1)
While technically $\mathcal{A}$ operates on the equivalence classes in $\mathbb{C}^n/\!\sim$, we will abuse the notation slightly and think of $\mathcal{A}$ as operating on the elements of $\mathbb{C}^n$.
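To make the measurement model concrete, the following NumPy sketch implements a map of the form (1) and checks its phase invariance. The sampling scheme for the Hermitian matrices here is our own illustrative choice, not the model analyzed later in the paper:

```python
import numpy as np

def quadratic_map(A_list, x):
    """Evaluate x -> (x^H A_1 x, ..., x^H A_m x).
    For Hermitian A_i each output is real, so we return the real parts."""
    return np.array([np.real(np.vdot(x, A @ x)) for A in A_list])

rng = np.random.default_rng(0)
n, m = 4, 6
# Illustrative Hermitian matrices obtained by symmetrizing complex Gaussians.
A_list = [(G + G.conj().T) / 2 for G in
          (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
           for _ in range(m))]
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

y1 = quadratic_map(A_list, x)
y2 = quadratic_map(A_list, np.exp(1j * 0.7) * x)
assert np.allclose(y1, y2)  # the map cannot distinguish x from e^{i*phi} x
```

This is exactly the phase ambiguity that motivates working on the quotient space: the map carries no information about the global phase of its argument.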
II-B Main Results
We consider the quadratic feasibility problem (P1) with a complex decision vector $\mathbf{x} \in \mathbb{C}^n$, Hermitian matrices $A_i$, and real numbers $y_i$ for all $i \in [m]$. In order to understand the properties of this problem, we need a coherent distance metric that is analytically tractable. Toward this end, we will use the following:
$d(\mathbf{x}, \mathbf{y}) := \min_{\phi \in [0, 2\pi)} \left\| \mathbf{x} - e^{\mathrm{i}\phi} \mathbf{y} \right\|_2.$   (2)
Notice that this distance metric is invariant under phase shifts, and it has been used to prove crucial robustness results (albeit only in the context of phase retrieval); see, e.g., [17, 18, 3].
Our first main result (Theorem 1) states that the mapping $\mathcal{A}$ is a near-isometry when the matrices $A_i$ are chosen from a complex Gaussian distribution. A sketch of the statement is provided below.
Theorem 1 (sketch).
Let $\{A_i\}_{i=1}^m$ be a set of complex Gaussian random matrices. Suppose that the number of measurements satisfies $m \ge C n$, for a large enough constant $C$. Then, with high probability, the following relation holds: for some constants $0 < a \le b$ and for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$,
$a \, d(\mathbf{x}, \mathbf{y}) \;\le\; \left\| \mathcal{A}(\mathbf{x}) - \mathcal{A}(\mathbf{y}) \right\|_2 \;\le\; b \, d(\mathbf{x}, \mathbf{y}).$
In other words, $\mathcal{A}$ nearly preserves distances with respect to the distance measure defined in (2). Therefore, when $\mathbf{x}$ and $\mathbf{y}$ are distinct (modulo a phase), the corresponding measurements are also distinct; that is, the measurement model defined by $\mathcal{A}$ is identifiable. The formal statement, along with the full proof, is presented in Theorem 1 in Section IV.
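The near-isometry can be probed numerically. The sketch below (an illustration only, not part of the proof) samples Hermitian matrices by symmetrizing complex Gaussians and checks that the measurement-difference energy concentrates; the $(G + G^H)/2$ sampling scheme and the normalization by the Frobenius norm of the outer-product difference are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def herm_gauss(n, rng):
    """Hermitian matrix with Gaussian entries via symmetrization (assumed model)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

n, m = 5, 400
A_list = [herm_gauss(n, rng) for _ in range(m)]

ratios = []
for _ in range(50):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    # measurement differences: x^H A_i x - y^H A_i y = trace(A_i (xx^H - yy^H))
    diff = np.array([np.real(np.vdot(x, A @ x) - np.vdot(y, A @ y)) for A in A_list])
    delta = np.outer(x, x.conj()) - np.outer(y, y.conj())
    ratios.append(np.linalg.norm(diff) / (np.sqrt(m) * np.linalg.norm(delta)))

# With m >> n the normalized ratios cluster tightly around a common constant.
assert 0.5 < min(ratios) and max(ratios) < 2.0
```

The tight clustering of the ratios for many random pairs is the empirical face of the sandwich inequality in the theorem sketch.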
Having established that (P1) has a uniquely identifiable solution (up to a phase ambiguity), we next turn our attention to finding a feasible solution in a computationally efficient manner. To this end, one may consider recasting the quadratic feasibility problem as a minimization problem for the $\ell_2$-loss function, as follows:
$\min_{\mathbf{x} \in \mathbb{C}^n} \; f(\mathbf{x}) := \frac{1}{m} \sum_{i=1}^{m} \left( \mathbf{x}^H A_i \mathbf{x} - y_i \right)^2.$   (P2)
Unfortunately, this optimization problem is nonconvex and, in general, one may not expect a gradient based method to converge to a global minimum. However, our next main result states that, with high probability, the optimization landscape of $f$ is in fact amenable to gradient based methods. Moreover, we establish this result under the same condition required for the problem to be identifiable, namely, that the measurement matrices are drawn from the complex Gaussian distribution. We now provide a sketch of our second main result.
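As an illustration of this claim, the following sketch runs plain gradient descent (in the Wirtinger sense) on an $\ell_2$-loss of this form from a random initialization; the $1/m$ normalization, sampling scheme, step-size rule, and problem sizes are our choices for the demonstration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 60

def herm(G):
    return (G + G.conj().T) / 2  # symmetrize to obtain a Hermitian matrix

A_list = [herm(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
          for _ in range(m)]
x_star = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)
y = np.array([np.real(np.vdot(x_star, A @ x_star)) for A in A_list])

def loss(x):
    r = np.array([np.real(np.vdot(x, A @ x)) for A in A_list]) - y
    return np.mean(r ** 2)

def wirtinger_grad(x):
    # For Hermitian A_i, the Wirtinger gradient (w.r.t. conj(x)) of
    # (1/m) * sum_i (x^H A_i x - y_i)^2 is (2/m) * sum_i r_i * A_i x.
    r = np.array([np.real(np.vdot(x, A @ x)) for A in A_list]) - y
    return (2.0 / m) * sum(ri * (A @ x) for ri, A in zip(r, A_list))

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # arbitrary initialization
f0 = f_x = loss(x)
for _ in range(2000):
    g = wirtinger_grad(x)
    t = 0.5
    # backtracking: halve the step until the loss does not increase
    while loss(x - t * g) > f_x and t > 1e-12:
        t /= 2
    x = x - t * g
    f_x = loss(x)

assert f_x < 1e-2 * f0  # the loss collapses far below its initial value
```

The point of the benign-landscape result is precisely that no careful initialization is needed here, in contrast to initialization-dependent analyses.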
Theorem 2 (sketch).
This theorem states that, provided the measurement matrices are Gaussian, with high probability the minimization problem (P2) has no spurious local minima, and any saddle point of the function is strict in the sense that $f$ has a strictly negative curvature at such a point. The latter property, called the strict saddle property, is defined in Section V, where we also provide a formal statement of Theorem 2. Finally, based on the properties established here about the loss landscape, we establish that a solution to problem (P2) can be obtained by applying a gradient based algorithm (from an arbitrary initial point), since such an algorithm is unlikely to converge to a saddle point.
III Related work
QCQPs have enjoyed a lot of attention over the last century. However, due to the limitations of the duality properties of QCQPs [19], a significant fraction of research has focused predominantly on heuristic approaches to their solution [20, 21, 22]. Recently, an ADMM-based method was proposed in [23], with an asymptotic convergence result based on the duality properties of QCQPs. Our results in this paper bring new insights to this area by analyzing a subset of QCQPs, namely, the quadratic feasibility problems.
The quadratic feasibility problem (P1) arises in many applications, including phase retrieval [2] and power system state estimation [1]. Phase retrieval in and of itself finds applications in a wide variety of fields, such as imaging, optics, quantum tomography, and audio signal processing, with a wide literature including [4, 2, 24, 5]. In [3], an approximate isometry property was established for the phase retrieval problem, but the bounds therein are not strong enough to provide RIP-like guarantees. In this paper, we improve these bounds to establish isometry results for a large class of problems, and we provide RIP-type bounds for the case when the matrices $A_i$ are complex Gaussian random matrices.
A feasibility problem is often cast as a minimization problem with a suitably chosen loss function. Even with a nonconvex objective, gradient based methods have proven to work for phase retrieval [24, 5, 2], matrix factorization [25, 26], and robust linear regression [27]. The work in [28] established landscape properties for the phase retrieval problem, which sheds light on the success of gradient based methods in solving it. In this work, we extend these results to a wider class of problems along with additional insights into the problem properties. In [29], it was shown that many nonconvex loss functions have specific landscape properties that allow gradient based algorithms to recover a globally optimal solution without any additional information. One unfortunately cannot readily transport those results to our setting, mainly due to the significant differences between the real and complex vector spaces. For instance, a quadratic feasibility problem in $\mathbb{R}^n$ has only two isolated local minima, while it has a continuum of minima in $\mathbb{C}^n$.
The authors in [30] provide lower bounds on the minimum number of independent measurements required for successful recovery in the quadratic feasibility problem. More recently, [31] showed that the quadratic feasibility problem can be solved, with high probability, by gradient descent, provided a good initialization is used. In contrast, the current work takes a parallel track by analyzing the landscape of the associated $\ell_2$-loss function. In particular, for the $\ell_2$-loss function, we prove that all local minima are global and all saddle points are strict. Thus, our results enable gradient based algorithms with arbitrary initialization to recover the solution of the quadratic feasibility problem.
IV Properties of the quadratic mapping
As a first step towards establishing our main results, we start by characterizing when the quadratic mapping $\mathcal{A}$ defined in (1) is in fact an injective mapping. Notice that the injectivity of the mapping is equivalent to the problem being identifiable (and hence solvable). It is also worth noting that, in the context of the phase retrieval problem, when $\mathcal{A}$ is injective, it is said to possess the phase retrievability property [30].
Lemma 1.
The following statements are equivalent:
1. The nonlinear map $\mathcal{A}$ is injective.
2. There exist constants $0 < c \le C$ such that, for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$,
$c \left\| \mathbf{x}\mathbf{x}^H - \mathbf{y}\mathbf{y}^H \right\|_F^2 \;\le\; \sum_{i=1}^{m} \left| \left\langle A_i, \, \mathbf{x}\mathbf{x}^H - \mathbf{y}\mathbf{y}^H \right\rangle \right|^2 \;\le\; C \left\| \mathbf{x}\mathbf{x}^H - \mathbf{y}\mathbf{y}^H \right\|_F^2.$   (3)
We refer the reader to Appendix VI-A for a detailed proof. Lemma 1 characterizes the injectivity property of $\mathcal{A}$ in terms of the action of the matrices $A_i$ on the outer-product difference $\mathbf{x}\mathbf{x}^H - \mathbf{y}\mathbf{y}^H$. In particular, it says that the mapping is injective if and only if the matrices do not distort this outer product too much. We use this characterization to obtain a more tractable condition that allows us to establish the injectivity of $\mathcal{A}$ in the case of Gaussian measurement matrices (forthcoming in Lemma 2).
Our tractable condition is based on what we call the stability of the mapping. Informally speaking, the mapping $\mathcal{A}$ is stable if, for any given $\epsilon > 0$, there exists $\delta > 0$ such that $d(\mathbf{x}, \mathbf{y}) \le \epsilon$ whenever the vectors satisfy $\|\mathcal{A}(\mathbf{x}) - \mathcal{A}(\mathbf{y})\|_2 \le \delta$. Such stability properties have been considered in the specific context of phase retrieval; see, e.g., [4, 3, 32, 33].
Definition 1 ($(a, b)$-stability).
Consider the mapping $\mathcal{A}$ defined in (1). We say that $\mathcal{A}$ is $(a, b)$-stable in a metric $d$ on $\mathbb{C}^n/\!\sim$, with $0 < a \le b$, if the following relation holds for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$:
$a \, d(\mathbf{x}, \mathbf{y}) \;\le\; \left\| \mathcal{A}(\mathbf{x}) - \mathcal{A}(\mathbf{y}) \right\|_2 \;\le\; b \, d(\mathbf{x}, \mathbf{y}).$   (4)
The constants $a$ and $b$ depend on the choice of the metric $d$. Throughout the paper, we will work with the metric as defined in (2). Note that our definition of $(a, b)$-stability sandwiches the norm of the measurement differences, which is in contrast to the stability concept considered for phase retrieval in [4, 32]. The ratio of the constants can also be thought of as a condition number, thereby allowing one to quantify the quality of the map; the higher the ratio between $a$ and $b$, the better the ability of the mapping to distinguish between two distinct inputs.
With this definition of stability in place, the next lemma states that the map is injective if and only if it is stable.
Lemma 2.
The mapping $\mathcal{A}$ is injective if and only if it is $(a, b)$-stable for some constants $0 < a \le b$.
Please refer to Appendix VI-B for a proof of the lemma. This result demonstrates the usefulness of both our choice of the metric $d$ from (2) and our definition of stability (Definition 1). As we will see in what follows, Lemma 2 allows one to assess the conditions under which the measurement model implied by the mapping $\mathcal{A}$ is identifiable.
Next, we turn our attention to the question of when one can establish the above stability condition, and thereby establish the identifiability of the underlying measurement model. To do so, we take our cues from the compressive sensing and phase retrieval literature, and we assume that the Hermitian matrices in the set $\{A_i\}_{i=1}^m$ are sampled from a complex Gaussian distribution. More precisely, we assume that each entry in the upper triangle (including the diagonal) of $A_i$ is drawn independently from a Gaussian distribution; the remaining entries are then determined by the fact that the matrices are Hermitian.
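A direct implementation of this sampling scheme might look as follows; the exact variances of the diagonal and off-diagonal entries are assumptions here, since only the independence structure and Hermitian symmetry are specified above:

```python
import numpy as np

def sample_hermitian_gaussian(n, rng):
    """Draw a Hermitian matrix entrywise: upper-triangular entries independent
    (complex off the diagonal), the lower triangle fixed by A = A^H.
    The diagonal must be real for A to be Hermitian; standard-normal
    variances are assumed for illustration."""
    A = np.zeros((n, n), dtype=complex)
    for j in range(n):
        A[j, j] = rng.standard_normal()          # real diagonal entry
        for k in range(j + 1, n):
            A[j, k] = rng.standard_normal() + 1j * rng.standard_normal()
            A[k, j] = np.conj(A[j, k])           # enforce Hermitian symmetry
    return A

rng = np.random.default_rng(3)
A = sample_hermitian_gaussian(6, rng)
assert np.allclose(A, A.conj().T)                # A is Hermitian
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
assert abs(np.vdot(x, A @ x).imag) < 1e-12       # so x^H A x is real
```

The final check makes explicit why Hermitian measurement matrices pair naturally with real-valued observations $y_i$.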
We first observe that, when $A_i$ is chosen as described above, the expected values of the measurement differences admit closed-form expressions for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$ and all $i \in [m]$ (see Appendix VII-A for more details). Our next result shows that these quantities are actually concentrated around their expected values.
Lemma 3.
Let $\{A_i\}_{i=1}^m$ be a set of complex Hermitian Gaussian random matrices for the measurement model given by (P1). Then, given $t > 0$ and vectors $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$, there are constants $c_1, c_2 > 0$ such that
(5)
Please see Appendix VII-A for a proof of Lemma 3. We would like to draw attention to the fact that proving such a concentration result requires careful attention to the complex domain in which we operate; our proof techniques may be of independent interest. Notice that this concentration result only holds for a fixed pair of vectors. We, however, need the concentration result to hold uniformly over all pairs of vectors in $\mathbb{C}^n$. For this, we will next adapt a standard covering argument as follows; we use $\mathbb{S}^{n-1}$ to denote the unit sphere in $\mathbb{C}^n$.
Lemma 4.
Given $\epsilon \in (0, 1/2)$, let $\mathcal{N}_\epsilon$ be the smallest collection of $n$-dimensional balls of radius $\epsilon$ whose union covers the sphere $\mathbb{S}^{n-1}$. Then, for any matrix $A \in \mathbb{H}^n$, we have
$\sup_{\mathbf{x} \in \mathbb{S}^{n-1}} \left| \mathbf{x}^H A \mathbf{x} \right| \;\le\; \frac{1}{1 - 2\epsilon} \, \sup_{\mathbf{x} \in \mathcal{N}_\epsilon} \left| \mathbf{x}^H A \mathbf{x} \right|.$
We refer the reader to Appendix VII-B for a proof of this lemma. This argument is not new and has found application in several results; see e.g., [31, 34, 35].
We are finally ready to state and prove our first main result.
Theorem 1.
Let $\{A_i\}_{i=1}^m$ be a set of complex Gaussian random matrices, and assume the number of measurements satisfies $m \ge C n$ for a large enough constant $C$. Then, for any given $\delta \in (0, 1)$, there exist constants $a$ and $b$ with $0 < a \le b$ such that, with probability at least $1 - \delta$, the mapping $\mathcal{A}$ is $(a, b)$-stable with respect to the metric $d$ in (2).
Proof.
Consider $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$. If $\mathbf{x} \sim \mathbf{y}$, then $d(\mathbf{x}, \mathbf{y}) = 0$ and $\mathcal{A}(\mathbf{x}) = \mathcal{A}(\mathbf{y})$, and the result holds trivially. Therefore, in the sequel, we assume that the vectors are distinct, implying that $d(\mathbf{x}, \mathbf{y}) > 0$.
From Lemma 3, for a given pair $\mathbf{x}, \mathbf{y}$, we have
According to [34], for any $\epsilon > 0$, we have the following upper bound on the size of the covering: $|\mathcal{N}_\epsilon| \le (1 + 2/\epsilon)^{2n}$. Therefore, for a given pair $\mathbf{x}, \mathbf{y}$, by Lemma 3 and the preceding union bound, we have that
where are the same constants as in Lemma 3. This implies that
Now, observe that
Therefore,
By applying the covering result of Lemma 4 to each matrix $A_i$, averaging the resulting relations over $i \in [m]$, and using
we obtain
Thus, we can conclude that
Similarly, we can prove that
Letting be such that and letting , we can see that the following relation holds with probability at least : for all ,
where the constants are given by
Notice that we essentially have a choice of the values of $\epsilon$ and $\delta$. The closer they are to 0, the stronger the stability result. However, this also implies that $m$, the number of measurements, needs to be larger. ∎
V Non-convex loss reformulation and the Optimization Landscape
In the preceding section, we established conditions under which the mapping $\mathcal{A}$ represents an identifiable measurement model. We next consider determining a feasible solution by applying efficient methods to the nonconvex optimization-based reformulation of the feasibility problem, as given in (P2). As noted earlier, in general, gradient-based approaches need not converge to a global minimum of the $\ell_2$-loss minimization problem (P2). Lately, nonconvex optimization has received considerable attention. In particular, methods like SGD and other gradient based methods have been shown to be astonishingly successful in converging to global minima in many nonconvex problems [36, 37, 38]. Arguably, the reason for this is that the optimization landscapes of these (somewhat well-behaved) nonconvex problems enjoy advantageous properties that are crucial to the empirical success of gradient based methods [39, 40, 28]. The work in [28] proves that the $\ell_2$-loss function for the phase retrieval problem enjoys such properties: all local minima are global, and each saddle point has a strictly negative curvature. This work adds to this rich body of work by demonstrating similarly advantageous properties of the optimization landscape for the quadratic feasibility problem. Before we proceed, we observe that, since the $\ell_2$-loss function is real-valued and hence not complex-differentiable on $\mathbb{C}^n$, it is challenging to address problem (P2) in the standard way. In what follows, we instead use techniques from Wirtinger calculus [2]. Our first step is to define the notion of a strict saddle function.
Definition 2.
Let $\alpha, \gamma, \zeta$ be positive scalars. A function $f$ is said to be an $(\alpha, \gamma, \zeta)$-strict saddle function if, for any $\mathbf{x}$, at least one of the following statements is true:
1. $\|\nabla f(\mathbf{x})\| \ge \alpha$;
2. $\mathbf{u}^H \nabla^2 f(\mathbf{x}) \, \mathbf{u} \le -\gamma$ for some unit vector $\mathbf{u}$;
3. $\mathbf{x}$ is $\zeta$-close to a local minimum, i.e., $d(\mathbf{x}, \mathbf{x}') \le \zeta$ for some $\mathbf{x}'$ satisfying $\nabla f(\mathbf{x}') = 0$ and $\nabla^2 f(\mathbf{x}') \succeq 0$.
Intuitively, this says that every point $\mathbf{x}$ either has a large gradient or a direction of strictly negative curvature (conditions 1 and 2), or is close to a local optimum. A line of recent work [41, 42, 27] has explored the efficacy of gradient based methods in finding a local optimum of functions satisfying Definition 2.
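The trichotomy can be checked numerically for simple functions. The sketch below classifies points of a toy real function using hypothetical thresholds; the names alpha and gamma follow the definition, and the closeness check of condition 3 is only reported, not tested, since it requires knowing the minimizers:

```python
import numpy as np

def strict_saddle_status(grad, hess, x, alpha, gamma):
    """Classify a point per the strict-saddle trichotomy: large gradient,
    a direction of curvature <= -gamma, or (by elimination) proximity to a
    local minimum."""
    if np.linalg.norm(grad(x)) >= alpha:
        return "large gradient"
    if np.linalg.eigvalsh(hess(x)).min() <= -gamma:
        return "strict saddle"
    return "near a local minimum (third condition)"

# Toy example: f(x) = x_0^2 - x_1^2 has a strict saddle at the origin.
grad = lambda x: np.array([2 * x[0], -2 * x[1]])
hess = lambda x: np.diag([2.0, -2.0])

assert strict_saddle_status(grad, hess, np.zeros(2),
                            alpha=0.1, gamma=0.5) == "strict saddle"
assert strict_saddle_status(grad, hess, np.array([1.0, 0.0]),
                            alpha=0.1, gamma=0.5) == "large gradient"
```

The practical payoff of the definition is exactly this case analysis: gradient methods make progress in the first case and escape in the second, so they can only accumulate near points of the third kind.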
We analyze the optimization landscape of (P2) when our measurement matrices are Hermitian and complex Gaussian, and show that, with high probability, every local minimum is in fact global (up to the equivalence relation $\sim$). Our next main result states that the function $f$ in (P2) is strict saddle.
Theorem 2.
Let $\{A_i\}_{i=1}^m$ be a set of complex Gaussian random matrices, and let $m \ge C n$ for some constant $C$. Let the scalars $y_i$ characterizing the objective function of problem (P2) be generated by quadratic measurements of an unknown vector $\mathbf{x}^* \in \mathbb{C}^n$. Then, for any given $\delta \in (0, 1)$, there exist positive constants $\alpha$, $\gamma$, and $\zeta$ such that the following statements hold with probability at least $1 - \delta$:
1) The function $f$ is $(\alpha, \gamma, \zeta)$-strict saddle, and
2) every local minimum $\hat{\mathbf{x}}$ of $f$ satisfies $d(\hat{\mathbf{x}}, \mathbf{x}^*) = 0$.
Recall that the distance metric $d$ is defined on the quotient space $\mathbb{C}^n/\!\sim$; therefore, the second statement above says that every local minimum coincides with $\mathbf{x}^*$ up to a global phase. We refer the reader to Appendix VIII-C for the details, but we give a brief idea of the proof here.
Proof sketch.
Notice that, to show that the function $f$ in (P2) is a strict saddle function, it suffices to consider only the points $\mathbf{x}$ at which the gradient is small (otherwise, condition 1 of Definition 2 is satisfied). For all such points, we analyze the behavior of the Hessian and establish that there exists a direction $\mathbf{u}$ such that the following inequality holds
$\mathbf{u}^H \nabla^2 f(\mathbf{x}) \, \mathbf{u} \;\le\; -c \left\| \mathbf{x}\mathbf{x}^H - \mathbf{x}^* (\mathbf{x}^*)^H \right\|_F^2,$   (6)
where $c > 0$ and $\|\cdot\|_F$ is the standard Frobenius norm. By the equivalence of finite-dimensional norms, the norm term on the right side is positive if and only if $\mathbf{x} \not\sim \mathbf{x}^*$. This of course implies that, whenever $\mathbf{x} \not\sim \mathbf{x}^*$, there is a direction in which the Hessian has a strictly negative curvature, and hence such a point cannot be an optimum. In other words, we can conclude that: (1) all local minima satisfy $\mathbf{x} \sim \mathbf{x}^*$, and (2) all saddle points have a strictly negative curvature.
This concludes the proof. ∎
Finally, we remark that the properties of the optimization landscape that we have established allow one to use any gradient based iterative method to find a global optimum of problem (P2), and hence a solution to the quadratic feasibility problem. Furthermore, our results above also imply that a gradient method with an arbitrary initial point would work, which is in sharp contrast with existing works such as [31]. Formally, we have the following result.
Corollary 1.
Consider a gradient method applied to minimize the function $f$ in (P2). Then, for an arbitrary initial point, the method converges to a global minimum of the loss function associated with the quadratic feasibility problem.
Given the landscape properties derived in Theorem 2, this result follows in a straightforward manner, for instance, from Theorem 4.1 in [41]. We remark that the broad flow of ideas in our proof of Theorem 2 bears similarities to those in papers such as [29, 43]. However, to the best of our knowledge, the present paper is the first to derive such results in the complex domain.
References
- [1] G. Wang, A. S. Zamzam, G. B. Giannakis, and N. D. Sidiropoulos, “Power system state estimation via feasible point pursuit,” in Signal and Information Processing (GlobalSIP), 2016 IEEE Global Conference on. IEEE, 2016, pp. 773–777.
- [2] E. J. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval via wirtinger flow: Theory and algorithms,” IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015.
- [3] E. J. Candes, T. Strohmer, and V. Voroninski, “Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming,” Communications on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1241–1274, 2013.
- [4] Y. C. Eldar and S. Mendelson, “Phase retrieval: Stability and recovery guarantees,” Applied and Computational Harmonic Analysis, vol. 36, no. 3, pp. 473–494, 2014.
- [5] R. Balan and D. Zou, “Phase retrieval using lipschitz continuous maps,” arXiv preprint arXiv:1403.2301, 2014.
- [6] J. Drenth, Principles of protein X-ray crystallography. Springer Science & Business Media, 2007.
- [7] T. Dakic, On the turnpike problem. Simon Fraser University BC, Canada, 2000.
- [8] P. M. Duxbury, L. Granlund, S. Gujarathi, P. Juhas, and S. J. Billinge, “The unassigned distance geometry problem,” Discrete Applied Mathematics, vol. 204, pp. 117–132, 2016.
- [9] S. Huang and I. Dokmanić, “Reconstructing point sets from distance distributions,” arXiv preprint arXiv:1804.02465, 2018.
- [10] J. Park and S. Boyd, “General heuristics for nonconvex quadratically constrained quadratic programming,” arXiv preprint arXiv:1703.07870, 2017.
- [11] L. L. Dines, “On the mapping of quadratic forms,” Bulletin of the American Mathematical Society, vol. 47, no. 6, pp. 494–498, 1941.
- [12] A. Beck and D. Pan, “A branch and bound algorithm for nonconvex quadratic optimization with ball and linear constraints,” Journal of Global Optimization, vol. 69, no. 2, pp. 309–342, 2017.
- [13] A. Beck, “Convexity properties associated with nonconvex quadratic matrix functions and applications to quadratic programming,” Journal of Optimization Theory and Applications, vol. 142, no. 1, pp. 1–29, 2009.
- [14] A. Beck and Y. C. Eldar, “Strong duality in nonconvex quadratic optimization with two quadratic constraints,” SIAM Journal on Optimization, vol. 17, no. 3, pp. 844–860, 2006.
- [15] S. Sahni, “Computationally related problems,” SIAM Journal on Computing, vol. 3, no. 4, pp. 262–279, 1974.
- [16] I. Pólik and T. Terlaky, “A survey of the s-lemma,” SIAM review, vol. 49, no. 3, pp. 371–418, 2007.
- [17] A. S. Bandeira, J. Cahill, D. G. Mixon, and A. A. Nelson, “Saving phase: Injectivity and stability for phase retrieval,” Applied and Computational Harmonic Analysis, vol. 37, no. 1, pp. 106–125, 2014.
- [18] R. Balan and Y. Wang, “Invertibility and robustness of phaseless reconstruction,” Applied and Computational Harmonic Analysis, vol. 38, no. 3, pp. 469–488, 2015.
- [19] A. I. Barvinok, “Problems of distance geometry and convex properties of quadratic maps,” Discrete & Computational Geometry, vol. 13, no. 2, pp. 189–202, 1995.
- [20] A. Konar and N. D. Sidiropoulos, “Fast approximation algorithms for a class of non-convex qcqp problems using first-order methods,” IEEE Transactions on Signal Processing, vol. 65, no. 13, pp. 3494–3509, 2017.
- [21] ——, “First-order methods for fast feasibility pursuit of non-convex qcqps,” IEEE Transactions on Signal Processing, vol. 65, no. 22, pp. 5927–5941, 2017.
- [22] A. Konar, “Non-convex quadratically constrained quadratic programming: Hidden convexity, scalable approximation and applications,” 2017.
- [23] K. Huang and N. D. Sidiropoulos, “Consensus-admm for general quadratically constrained quadratic programming,” Transactions on Signal Processing, vol. 64, no. 20, pp. 5297–5310, 2016.
- [24] Y. S. Tan and R. Vershynin, “Phase retrieval via randomized kaczmarz: theoretical guarantees,” Information and Inference: A Journal of the IMA, 2017.
- [25] S. Bhojanapalli, A. Kyrillidis, and S. Sanghavi, “Dropping convexity for faster semi-definite optimization,” in Conference on Learning Theory, 2016, pp. 530–582.
- [26] C. Jin, S. M. Kakade, and P. Netrapalli, “Provable efficient online matrix completion via non-convex stochastic gradient descent,” in Advances in Neural Information Processing Systems, 2016, pp. 4520–4528.
- [27] P. Jain, P. Kar et al., “Non-convex optimization for machine learning,” Foundations and Trends® in Machine Learning, vol. 10, no. 3-4, pp. 142–336, 2017.
- [28] J. Sun, Q. Qu, and J. Wright, “A geometric analysis of phase retrieval,” Foundations of Computational Mathematics, vol. 18, no. 5, pp. 1131–1198, 2018.
- [29] R. Ge, C. Jin, and Y. Zheng, “No spurious local minima in nonconvex low rank problems: A unified geometric analysis,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017, pp. 1233–1242.
- [30] Y. Wang and Z. Xu, “Generalized phase retrieval: measurement number, matrix recovery and beyond,” Applied and Computational Harmonic Analysis, 2017.
- [31] S. Huang, S. Gupta, and I. Dokmanić, “Solving complex quadratic systems with full-rank random matrices,” arXiv preprint arXiv:1902.05612, 2019.
- [32] J. C. Duchi and F. Ruan, “Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval,” arXiv preprint arXiv:1705.02356, 2017.
- [33] Y. Ye and S. Zhang, “New results on quadratic minimization,” SIAM Journal on Optimization, vol. 14, no. 1, pp. 245–267, 2003.
- [34] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” arXiv preprint arXiv:1011.3027, 2010.
- [35] M. W. Meckes, “Concentration of norms and eigenvalues of random matrices,” Journal of Functional Analysis, vol. 211, no. 2, pp. 508–524, 2004.
- [36] S. S. Du, J. D. Lee, H. Li, L. Wang, and X. Zhai, “Gradient descent finds global minima of deep neural networks,” arXiv preprint arXiv:1811.03804, 2018.
- [37] Y. Chen, Y. Chi, J. Fan, and C. Ma, “Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval,” arXiv preprint arXiv:1803.07726, 2018.
- [38] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, “How to escape saddle points efficiently,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017, pp. 1724–1732.
- [39] A. Bower, L. Jain, and L. Balzano, “The landscape of non-convex quadratic feasibility,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 3974–3978.
- [40] D. Davis, D. Drusvyatskiy, and C. Paquette, “The nonsmooth landscape of phase retrieval,” arXiv preprint arXiv:1711.03247, 2017.
- [41] J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, “Gradient descent converges to minimizers,” arXiv preprint arXiv:1602.04915, 2016.
- [42] J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan, and B. Recht, “First-order methods almost always avoid saddle points,” arXiv preprint arXiv:1710.07406, 2017.
- [43] S. Bhojanapalli, B. Neyshabur, and N. Srebro, “Global optimality of local search for low rank matrix recovery,” in Advances in Neural Information Processing Systems, 2016, pp. 3873–3881.
- [44] R. Balan, “Reconstruction of signals from magnitudes of redundant representations: The complex case,” Foundations of Computational Mathematics, vol. 16, no. 3, pp. 677–721, 2016.
- [45] D. P. Shunmugaraj, “Complex differentiation and cauchy-riemann equations.” [Online]. Available: http://home.iitk.ac.in/~psraj/mth102/lecture_notes/comp2.pdf
- [46] J. Cahill, P. G. Casazza, J. Peterson, and L. Woodland, “Phase retrieval by projections,” arXiv preprint arXiv:1305.6226, 2013.
VI Appendix A: Injectivity
VI-A Proof of Lemma 1
Lemma 1.
The following statements are equivalent:
1. The nonlinear map $\mathcal{A}$ is injective.
2. There exist constants $0 < c \le C$ such that, for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$,
$c \left\| \mathbf{x}\mathbf{x}^H - \mathbf{y}\mathbf{y}^H \right\|_F^2 \;\le\; \sum_{i=1}^{m} \left| \left\langle A_i, \, \mathbf{x}\mathbf{x}^H - \mathbf{y}\mathbf{y}^H \right\rangle \right|^2 \;\le\; C \left\| \mathbf{x}\mathbf{x}^H - \mathbf{y}\mathbf{y}^H \right\|_F^2.$
Proof.
The following result from [30] is crucial to the proof.
Theorem (Theorem 2.1, [30]).
Let $\{A_i\}_{i=1}^m \subset \mathbb{H}^n$. The following statements are equivalent:
1. The mapping $\mathcal{A}$ has the phase retrieval property.
2. There exists no nonzero pair of vectors $\mathbf{u}, \mathbf{v} \in \mathbb{C}^n$, with $\mathbf{u} \neq \mathbf{0}$, such that $\mathrm{Re}\left( \mathbf{v}^H A_i \mathbf{u} \right) = 0$ for all $i \in [m]$.
For the mapping $\mathcal{A}$ to be injective, the following should hold:
Hence, for $\mathbf{x} \sim \mathbf{y}$, we have $\mathcal{A}(\mathbf{x}) = \mathcal{A}(\mathbf{y})$. It can be verified that, for any such pair, the measurement map satisfies the following transformation:
(7)
Thus,
We define the lower bound and upper bound as below.
The underlying set is compact; hence the constants exist.
From the statement of the theorem above, it is clear that the condition cannot be satisfied for all $i \in [m]$ if the vectors satisfy equation (7).
Suppose the mapping is not injective. Then there exist vectors $\mathbf{x} \not\sim \mathbf{y}$ such that their measurements coincide. Thus $\mathcal{A}(\mathbf{x}) = \mathcal{A}(\mathbf{y})$, but $d(\mathbf{x}, \mathbf{y}) > 0$. Thus the bounds fail, and hence the negation follows. ∎
VI-B Proof of Lemma 2
Lemma 2.
The mapping $\mathcal{A}$ is injective if and only if it is $(a, b)$-stable for some constants $0 < a \le b$.
VII Appendix B: High probability bounds
VII-A Proof of Lemma 3
Lemma 3.
Let $\{A_i\}_{i=1}^m$ be a set of complex Hermitian Gaussian random matrices for the measurement model given by (P1). Then, given $t > 0$ and vectors $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$, there are constants $c_1, c_2 > 0$ such that
Proof.
A matrix $A$ is a complex Hermitian Gaussian random matrix if:
-
1.
, .
-
2.
, , .
Let $\{A_i\}_{i=1}^m$ be a set of complex Hermitian Gaussian random matrices. Define the random variable $Y$ as follows:
The expectation of $Y$ can be evaluated as
(8)
For every matrix $A_i$, we can split the entire summation in (8) into the following four sets:
-
1.
-
2.
-
3.
-
4.
Calculating the expectation of the sum of the elements in each individual set:
1. Set B:
2. Set C; note that for every Hermitian matrix ,
3. Set D:
Notice that
Thus,
Since both the real and imaginary parts are independent, we can conclude,
4. Set E:
All the elements in are independent,
In conclusion,
From Lemma 3.9 of [44], note that , where denotes the trace of a matrix. Since is a Hermitian matrix, . Hence we can finally state,
Next we focus on obtaining concentration bounds. Just as with the expectation, we evaluate the behaviour of the deviation over each individual set.
1. Set B:
Note that is a centered subexponential random variable.
2. Set C:
Again, note that is a centered subexponential random variable.
3. Set D:
Note that . This makes it easier to argue that is a centered subexponential random variable.
4. Set E:
Since for are independent, it can be easily seen that is a centered subexponential random variable.
Take . We then have the Bernstein-type inequality [34]:
for some constants .
We introduce the normalized variable ,
where .
Note that is the distance metric defined in (2). Hence we can rewrite the high-probability result more concisely as,
∎
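As a sanity check on the expectation computation above: under the GUE-style convention (an assumption about the normalization), the quadratic form at a unit vector is a sum of independent Gaussians with total variance one, so it is exactly a standard real Gaussian and its second moment is 1. This is easy to verify empirically (a sketch under that assumed convention):

```python
import numpy as np

def hermitian_gaussian(n, rng):
    # GUE-style convention, as assumed throughout these sketches
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2)

rng = np.random.default_rng(1)
n, m = 6, 20000

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                        # unit-norm test vector

# quadratic-form samples x^* A x (real, since A is Hermitian)
q = np.array([(x.conj() @ hermitian_gaussian(n, rng) @ x).real for _ in range(m)])

# x^* A x is a sum of independent Gaussians with total variance ||x||^4 = 1
assert abs(q.mean()) < 0.05
assert abs((q ** 2).mean() - 1.0) < 0.1
```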
VII-B Proof of Lemma 4
Lemma 4.
Given , let be the smallest collection of -dimensional balls of radius whose union covers the sphere . Then, for any matrix , we have
Proof.
In the proof, we relate the supremum of over to its supremum over .
Since covers , , such that .
Hence , such that,
where denotes the spectral norm of the matrix , i.e.,
We conclude,
And,
Taking the supremum,
∎
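The inequality chain in this proof rests on the fact that the quadratic form is Lipschitz on the unit sphere with constant at most twice the spectral norm, since z^*Az - w^*Aw = z^*A(z - w) + (z - w)^*Aw. A numerical sanity check of this bound (a sketch; the test matrix uses the GUE-style convention assumed in the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
A = (G + G.conj().T) / np.sqrt(2)     # Hermitian test matrix (assumed convention)
spec = np.linalg.norm(A, 2)           # spectral norm ||A||

def unit(v):
    return v / np.linalg.norm(v)

for _ in range(1000):
    z = unit(rng.standard_normal(n) + 1j * rng.standard_normal(n))
    w = unit(rng.standard_normal(n) + 1j * rng.standard_normal(n))
    lhs = abs((z.conj() @ A @ z).real - (w.conj() @ A @ w).real)
    # |z^*Az - w^*Aw| <= |z^*A(z-w)| + |(z-w)^*Aw| <= 2 ||A|| ||z-w||
    assert lhs <= 2 * spec * np.linalg.norm(z - w) + 1e-9
```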
VIII Appendix C : Nonconvex landscape
VIII-A Supporting Lemmas
The following intermediate results are crucial in proving Theorem 2.
Lemma 5.
Let be a set of Hermitian Gaussian random matrices. Then with probability ,
where
Proof.
Let be a complex Hermitian Gaussian random matrix, i.e.,
1. , .
2. , , .
Define the random variable ,
For any , split the entire summation into the following four sets:
1. (set A)
2. (set B)
3. (set C)
4. (set D)
We calculate the expectation of the sum of the elements in each individual set:
1. For set A,
2. For set B,
3. For set C, since the matrix is Hermitian,
Notice that
Since . Thus,
4. For set D, since all the elements are independent of each other, we have,
Hence we can conclude,
We now focus on obtaining concentration bounds, evaluating the behaviour of the elements in set D:
We can see that the above is a centered subexponential random variable. Hence, using a Bernstein-type inequality [34], we can say that
where , is the subexponential norm. Thus we can argue that,
where,
∎
Throughout the rest of the paper, define such that for any .
Lemma 6.
For any ,
Proof.
∎
Lemma 7.
The vectors and are such that Im, where
Proof.
We know from Lemma 3.7 [44] that such that,
It can easily be verified that a few such pairs are given by
Next, we focus on . We argue that such that Im( . To this end, consider the following:
Im can only vanish if where is the angle between the two vectors, i.e. is such that . Next we prove that where,
Consider the following argument,
It can be seen easily that the is achieved when . Thus we have proved that . ∎
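The phase-alignment facts used in Lemma 7 can be checked directly. Assuming the metric in (2) is dist(x, z) = min over theta of ||x - e^{i theta} z|| (an assumption about the stripped definition), the optimal phase is minus the argument of x^*z, and at the aligned point the inner product is real and nonnegative, so its imaginary part vanishes. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

ip = x.conj() @ z                     # inner product x^* z
theta_star = -np.angle(ip)            # closed-form optimal alignment phase
d_closed = np.sqrt(np.linalg.norm(x) ** 2 + np.linalg.norm(z) ** 2 - 2 * abs(ip))

# brute-force minimisation over a fine grid of phases agrees with the closed form
thetas = np.linspace(0.0, 2 * np.pi, 20000)
d_grid = min(np.linalg.norm(x - np.exp(1j * t) * z) for t in thetas)
assert abs(d_closed - d_grid) < 1e-2

# at the aligned point the inner product is real and nonnegative
z_aligned = np.exp(1j * theta_star) * z
assert abs((x.conj() @ z_aligned).imag) < 1e-9
assert (x.conj() @ z_aligned).real >= 0.0
```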
Lemma 8.
Let . Then,
where
Proof.
Note,
The minimum can only be achieved at a point where . Further notice that the following relation holds,
(9) |
Hence we can see that,
We know that for any matrix , Tr,
Note that,
Thus we can conclude,
Since , it is real-valued.
From Lemma 7, it can be seen that . Hence we can say,
∎
VIII-B Wirtinger Calculus
We use standard arguments from Wirtinger calculus [30] to prove Theorem 2. The basic intuition is to view the loss function (P2) as a function of two real variables in rather than as a single complex variable in . This workaround is required when analysing real-valued functions of complex variables, since such functions are generally not complex-differentiable in the sense of the Cauchy-Riemann equations [45]. Hence we map the function to , and instead of analysing the properties of , we analyse the properties of .
We first introduce the mapped function and the corresponding expressions for and
For the gradient we have,
For the Hessian , we have
The following can be verified easily,
(10) |
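The Wirtinger expressions above are what make first-order methods on (P2) well defined. The sketch below runs plain gradient descent with the conjugate Wirtinger gradient on an assumed squared loss f(z) = (1/m) sum_i (z^*A_i z - y_i)^2, whose conjugate Wirtinger gradient is (2/m) sum_i (z^*A_i z - y_i) A_i z; this is our reconstruction of the objective, not necessarily the exact form of (P2):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 80

def hermitian_gaussian(n, rng):
    # GUE-style sampling convention (an assumption)
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2)

A = np.stack([hermitian_gaussian(n, rng) for _ in range(m)])
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                                     # ground-truth signal
y = np.real(np.einsum('i,kij,j->k', x.conj(), A, x))       # quadratic measurements

def loss(z):
    r = np.real(np.einsum('i,kij,j->k', z.conj(), A, z)) - y
    return (r ** 2).mean()

def wirtinger_grad(z):
    # conjugate Wirtinger gradient of the assumed loss: (2/m) sum_k r_k A_k z
    r = np.real(np.einsum('i,kij,j->k', z.conj(), A, z)) - y
    return 2.0 * np.einsum('k,kij,j->i', r, A, z) / m

z = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # arbitrary init
z /= np.linalg.norm(z)
loss0 = loss(z)
for _ in range(5000):
    z = z - 1e-3 * wirtinger_grad(z)

assert loss(z) < loss0    # descent from an arbitrary initialization
```

With the strict-saddle property established in Theorem 2, such first-order iterations avoid saddle points from almost every initialization; the sketch only asserts descent, since a finite iteration count need not reach the global minimum.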
VIII-C Proof of Theorem 2
Theorem 2.
Let be a set of complex Gaussian random matrices, and let for some constant . Let the scalars characterizing the objective function of problem (P2) be generated by quadratic measurements of an unknown vector . Then, for any given , there exist positive constants and such that the following statements hold with probability at least :
1) The function f is -strict saddle, and
2) Every local minimum of satisfies
Proof.
Notice that,
Using (9), we can reorganize,
Adding and subtracting , reorganizing,
Adding and subtracting , reorganizing,
Using equation (10) from Section VIII-B,
Overall, we can conclude that,
Using Lemma 3 and Lemma 5, we can conclude that with probability greater than ,
(11) |
For any , multiple choices of the constants and can satisfy Theorem 2.
Given , we can bound so that the mapping is -stable for some small . If the current vector is not close to , i.e., , then for sufficiently large we have
A particular choice of satisfying the above condition is ; then
Hence we can conclude that the function satisfies at least one of the following:
•
•
For the direction vector ,
•
Thus there exist constants such that the function is strict saddle.
Following up on equation (11), the only possible way that the Hessian is if . Hence we can conclude that all local minima, i.e., all such that , have to satisfy and hence satisfy , which makes a solution of problem (P1).
∎
IX Appendix D : Applications
IX-A Power system state estimation problem
Apart from being a broader class of problems encompassing phase retrieval, the problem setup (P1) also has applications in power system engineering. Given a network of buses and transmission lines, the goal is to estimate the complex voltages across all buses from a subset of noisy power and voltage-magnitude measurements. In the AC power model, these measurements depend quadratically on the voltage values to be determined. Let be the set of measurements and be the corresponding bus admittance matrices. Then the problem boils down to the estimation problem
find | |||
s.t. |
where is Gaussian noise associated with the readings [21]. For details on the problem setup, please refer to [1].
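The quadratic dependence can be made concrete: the complex power injected at bus k is S_k = v_k (Yv)_k^* = v^*(Y^H e_k e_k^T)v, so the active power P_k = Re(S_k) is the quadratic form v^*M_k v with the Hermitian matrix M_k = (Y^H e_k e_k^T + e_k e_k^T Y)/2. A sketch with a hypothetical 3-bus admittance matrix (illustrative values only; see [1] for the actual model):

```python
import numpy as np

# Hypothetical 3-bus admittance matrix (illustrative values only)
Y = np.array([[ 2 - 4j, -1 + 2j, -1 + 2j],
              [-1 + 2j,  2 - 4j, -1 + 2j],
              [-1 + 2j, -1 + 2j,  2 - 4j]])
v = np.array([1.0, 0.98 * np.exp(-0.05j), 1.01 * np.exp(0.03j)])  # bus voltages

def active_power_matrix(Y, k):
    """Hermitian M_k with v^* M_k v = P_k, the active power injected at bus k."""
    n = Y.shape[0]
    e = np.zeros((n, 1)); e[k] = 1.0
    N = Y.conj().T @ e @ e.T
    return (N + N.conj().T) / 2.0

for k in range(3):
    Mk = active_power_matrix(Y, k)
    direct = (v[k] * np.conj(Y @ v)[k]).real   # P_k from the power-flow equations
    quad = (v.conj() @ Mk @ v).real
    assert np.isclose(direct, quad)
```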
IX-B Fusion Phase retrieval
Let be a set of subspaces of . Fusion phase retrieval deals with the problem of recovering , up to a phase ambiguity, from measurements of the form , where are the projection operators onto the subspaces. The initial results on this problem, concerning the conditions on the subspaces and the minimum number of subspaces required for successful recovery up to phase ambiguity, appeared in [46].
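These measurements fit (P1) directly: since each projector P_i is Hermitian and idempotent, ||P_i x||^2 = x^*P_i x, i.e. one may take A_i = P_i. A minimal sketch with random subspaces (an illustration, not the specific constructions analysed in [46]):

```python
import numpy as np

rng = np.random.default_rng(5)
n, dim = 6, 2

def random_projector(n, dim, rng):
    """Orthogonal projector onto a random dim-dimensional subspace of C^n."""
    B = rng.standard_normal((n, dim)) + 1j * rng.standard_normal((n, dim))
    Q, _ = np.linalg.qr(B)          # orthonormal basis for the subspace
    return Q @ Q.conj().T

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
for _ in range(8):
    P = random_projector(n, dim, rng)
    assert np.allclose(P, P.conj().T) and np.allclose(P @ P, P)
    # the fusion measurement ||P x||^2 equals the (P1) quadratic form x^* P x
    assert np.isclose(np.linalg.norm(P @ x) ** 2, (x.conj() @ P @ x).real)
```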
IX-C X-ray crystallography
In X-ray crystallography, especially in crystal twinning [6], the measurements are obtained with orthogonal matrices, which can again be handled by our setup.
In the worst case, a quadratic feasibility problem can be NP-hard, which makes the setup (P1) we address all the more interesting: we can establish properties of a subclass of quadratic feasibility problems and provide a provably convergent algorithm for it. This question resonates closely with the many applications of quadratic feasibility discussed above. In this paper we have considered only the noiseless system; an extension to the noisy setting is left for future work.