
On the Sample Complexity and Optimization Landscape for Quadratic Feasibility Problems

Parth K. Thaker, Gautam Dasarathy, and Angelia Nedić
Electrical, Computer and Energy Engineering
Arizona State University, Tempe, USA
Email: {pkthaker, gautamd, angelia.nedich}@asu.edu
Abstract

We consider the problem of recovering a complex vector $\mathbf{x}\in\mathbb{C}^{n}$ from $m$ quadratic measurements $\{\langle A_{i}\mathbf{x},\mathbf{x}\rangle\}_{i=1}^{m}$. This problem, known as quadratic feasibility, encompasses the well-known phase retrieval problem and has applications in a wide range of important areas, including power system state estimation and x-ray crystallography. In general, not only is the quadratic feasibility problem NP-hard to solve, but it may in fact be unidentifiable. In this paper, we establish conditions under which this problem becomes identifiable, and further prove isometry properties in the case when the matrices $\{A_{i}\}_{i=1}^{m}$ are Hermitian matrices sampled from a complex Gaussian distribution. Moreover, we explore a nonconvex optimization formulation of this problem, and establish salient features of the associated optimization landscape that enable gradient algorithms with an arbitrary initialization to converge to a globally optimal point with high probability. Our results also reveal sample complexity requirements for successfully identifying a feasible solution in these contexts.

I Introduction

This work is supported in part by the National Science Foundation under Grant No. OAC-1934766 and CCF-1717391

Finding a solution to a system of quadratic equations is an important problem with a wide range of applications. It arises in areas such as power system state estimation [1], phase retrieval [2, 3, 4, 5], x-ray crystallography [6], the turnpike problem [7], and unlabeled distance geometry problems [8, 9], among others. Such problems are often reduced to a quadratic feasibility problem, where one is concerned with finding a feasible vector $\mathbf{x}$ that conforms to a set of quadratic observations of the form $\{\langle A_{i}\mathbf{x},\mathbf{x}\rangle\}_{i=1}^{m}$ with respect to a set $\{A_{i}\}_{i=1}^{m}$ of measurement matrices. Formally, it can be cast as:

\text{find } \mathbf{x} \text{ such that } \langle A_{i}\mathbf{x},\mathbf{x}\rangle=c_{i},\quad i=1,2,\dots,m. \qquad (P1)

The quadratic feasibility problem is an instance of quadratically constrained quadratic programs (QCQPs) [10], which have enjoyed a long and rich research history dating back to 1941 [11]. Given their broad applicability to critical problems, research in QCQPs continues to be of active interest [10, 12, 13, 14]. Unfortunately, it is known that solving QCQPs is an NP-hard problem [15]. This, combined with the lack of tractable duality properties [16], has made it hard to establish a sound theoretical framework for understanding the solutions and computing them. However, an extremely productive line of research has instead considered subclasses of QCQPs that are both practically relevant and can be analyzed. In this paper, we take a similar approach and identify an important subclass of QCQPs that has a broad range of applications. In particular, we analyze the quadratic feasibility problem, establish conditions under which such problems are identifiable, and then show that these conditions are in fact sufficient for the efficient computation of their solutions.

We start by considering quadratic functions $\{\langle A_{i}\mathbf{x},\mathbf{x}\rangle\}_{i=1}^{m}$, where $\mathbf{x}\in\mathbb{C}^{n}$ and the $A_{i}\in\mathbb{C}^{n\times n}$ are Hermitian matrices. We focus on their ability to generate maps that are injective up to a phase factor (note that the quadratic functions $\{\langle A_{i}\mathbf{x},\mathbf{x}\rangle\}_{i=1}^{m}$ are invariant to phase shifts). We establish a relationship between injectivity and isometry, and show that in real-world scenarios it is not difficult for a set of quadratic measurements $\{\langle A_{i}\mathbf{x},\mathbf{x}\rangle\}_{i=1}^{m}$ to possess such an isometry property, by establishing that it holds with very high probability when the matrices $\{A_{i}\}_{i=1}^{m}$ are complex Gaussian random matrices.

After establishing injectivity (and hence the identifiability) of the problem, we consider the question of computationally tractable approaches to actually find a feasible solution. Toward this end, a natural approach is to minimize the appropriate $\ell_{2}$-loss. Unfortunately, this turns out to be a nonconvex problem that is NP-hard to solve in general. However, we show that, under the same conditions required to establish injectivity, the landscape of this optimization problem is well-behaved. This allows us to establish that any gradient based algorithm with an arbitrary initialization can recover a globally optimal solution almost surely.

The rest of the paper is organized as follows. Section II highlights the main results of this work. We discuss related work in Section III. In Section IV we establish and analyze isometry/identifiability properties of the mapping $\{\langle A_{i}\mathbf{x},\mathbf{x}\rangle\}$ when the measurement matrices are complex Gaussian. Finally, Section V casts the problem as a quadratic loss minimization problem (suitable for efficient algorithms) and establishes favorable properties of the loss landscape that allow one to find a solution using gradient-based methods with arbitrary initial points.

II Main Results

Before stating the main results, we introduce some notation that is used throughout the paper.

II-A Notation

For any $r\in\mathbb{N}$, we write $[r]$ to denote the set $\{1,2,\dots,r\}$. We let $\mathbb{C}^{n}$ and $\mathbb{R}^{n}$ denote the $n$-dimensional complex and real vector spaces, respectively. Unless otherwise stated, bold letters such as $\mathbf{x}$ indicate vectors in $\mathbb{C}^{n}$; $\mathbf{x}_{\mathbb{R}}$ and $\mathbf{x}_{\mathbb{C}}$ denote the real and the imaginary part of the vector $\mathbf{x}$, respectively. We denote the complex conjugate of $\mathbf{x}$ by $\bar{\mathbf{x}}$. Capital letters such as $X$ denote matrices in $\mathbb{C}^{n\times n}$. The symbol $\mathsf{i}$ (without serif) indicates the complex square root of $-1$ (we use $i$ to indicate an indexing variable). We let $\mathcal{S}^{a,b}(\mathbb{R}^{n\times n})$ denote the set of all matrices $X\in\mathbb{R}^{n\times n}$ having $a$ non-negative eigenvalues and $b$ negative eigenvalues, where $a+b=n$. The set $\mathbf{H}_{n}(\mathbb{C})$ denotes the set of all $n\times n$ Hermitian matrices. We write $A^{\top}$ and $A^{*}$ to denote, respectively, the transpose and the Hermitian transpose (conjugate transpose) of a matrix $A$. We use $\langle\cdot,\cdot\rangle$ to denote the inner product in the complex space. The symmetric outer product, denoted by $[[\cdot,\cdot]]$, is defined as $[[\mathbf{u},\mathbf{v}]]=\mathbf{u}\mathbf{v}^{*}+\mathbf{v}\mathbf{u}^{*}$. Finally, we let $\sim$ denote the following equivalence relation on $\mathbb{C}^{n}$: $\mathbf{x}\sim\mathbf{y}$ if and only if $\mathbf{x}=c\mathbf{y}$ for some $c\in\mathbb{C}$ with $|c|=1$. We write $\widehat{\mathbb{C}}^{n}\triangleq\mathbb{C}^{n}/\sim$ to denote the associated quotient space. Given a set of matrices $\mathcal{A}=\{A_{i}\}_{i=1}^{m}\subset\mathbf{H}_{n}(\mathbb{C})$, we let $\mathcal{M}_{\mathcal{A}}$ denote the following mapping from $\widehat{\mathbb{C}}^{n}$ to $\mathbb{C}^{m}$:

\mathcal{M}_{\mathcal{A}}(\mathbf{x})=\bigl(\langle A_{1}\mathbf{x},\mathbf{x}\rangle,\langle A_{2}\mathbf{x},\mathbf{x}\rangle,\dots,\langle A_{m}\mathbf{x},\mathbf{x}\rangle\bigr). \qquad (1)

While $\mathcal{M}_{\mathcal{A}}$ technically operates on the equivalence classes in $\widehat{\mathbb{C}}^{n}$, we slightly abuse notation and think of $\mathcal{M}_{\mathcal{A}}$ as operating on the elements of $\mathbb{C}^{n}$.
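To make the mapping concrete, the following is a minimal NumPy sketch (our own illustration, not code from the paper) of evaluating $\mathcal{M}_{\mathcal{A}}(\mathbf{x})$ for a list of Hermitian matrices; the helper name quadratic_measurements is hypothetical.

import numpy as np

def quadratic_measurements(A_list, x):
    # M_A(x) = (<A_1 x, x>, ..., <A_m x, x>); each entry is real when A_i is Hermitian,
    # and the whole vector is unchanged if x is replaced by exp(1j*phi) * x.
    return np.array([np.vdot(x, A @ x).real for A in A_list])

In particular, quadratic_measurements(A_list, np.exp(1j*0.3) * x) agrees with quadratic_measurements(A_list, x) up to floating-point error, which is exactly the phase invariance noted above.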

II-B Main Results

We consider the quadratic feasibility problem (P1) with a complex decision vector $\mathbf{x}\in\mathbb{C}^{n}$, Hermitian matrices $A_{i}\in\mathbb{C}^{n\times n}$, and real numbers $c_{i}\in\mathbb{R}$ for all $i\in[m]$. In order to understand the properties of this problem, we need a coherent distance metric that is analytically tractable. Toward this end, we use the following:

d(\mathbf{x},\mathbf{y})=\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}\quad\text{for any }\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}. \qquad (2)

Notice that this distance metric is invariant under phase shifts, and it has been used to prove crucial robustness results for the phase retrieval problem (however, only in $\mathbb{R}^{n}$); see, e.g., [17, 18, 3].
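As a quick illustration (our own, hedged), the metric in (2) can be computed directly and its phase invariance checked numerically; the helper name d below is hypothetical.

import numpy as np

def d(x, y):
    # d(x, y) = || x x* - y y* ||_F; invariant under x -> exp(1j*phi) x and y -> exp(1j*psi) y.
    return np.linalg.norm(np.outer(x, x.conj()) - np.outer(y, y.conj()), ord="fro")

# For any phase phi, d(x, np.exp(1j * phi) * x) is zero up to floating-point error.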

Our first main result (Theorem 1) states that the mapping $\mathcal{M}_{\mathcal{A}}$ is a near-isometry when the matrices $A_{i}$ are chosen from a complex Gaussian distribution. A sketch of the statement is provided below.

Theorem 1 (sketch).

Let $\mathcal{A}=\{A_{i}\}_{i=1}^{m}$ be a set of complex Gaussian random matrices. Suppose that the number $m$ of measurements satisfies $m>Cn$ for a large enough $C>0$. Then, with high probability, the following relation holds: for some constants $\alpha,\beta>0$ and for all $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$,

\alpha\, d(\mathbf{x},\mathbf{y})\leq\|\mathcal{M}_{\mathcal{A}}(\mathbf{x})-\mathcal{M}_{\mathcal{A}}(\mathbf{y})\|_{2}\leq\beta\, d(\mathbf{x},\mathbf{y}).

In other words, $\mathcal{M}_{\mathcal{A}}$ nearly preserves distances with respect to the distance measure defined in (2). Therefore, when $\mathbf{x}$ and $\mathbf{y}$ are distinct, the corresponding measurements are also distinct; that is, the measurement model defined by $\mathcal{M}_{\mathcal{A}}$ is identifiable. The formal statement, along with the full proof, is presented in Theorem 1 in Section IV.

Having established that (P1) has a uniquely identifiable solution (up to a phase ambiguity), we next turn our attention to finding a feasible solution in a computationally efficient manner. To this end, one may recast the quadratic feasibility problem as the minimization of an $\ell_{2}$-loss function, as follows:

\min_{\mathbf{x}\in\mathbb{C}^{n}}f(\mathbf{x}),\qquad f(\mathbf{x})\triangleq\frac{1}{m}\sum_{i=1}^{m}\left|\langle A_{i}\mathbf{x},\mathbf{x}\rangle-c_{i}\right|^{2}. \qquad (P2)

Unfortunately, this optimization problem is nonconvex and, in general, one may not expect any gradient based method to converge to a global minimum. However, our next main result states that, with high probability, the optimization landscape of (P2) is in fact amenable to gradient based methods! Moreover, we establish this result under the same condition required for the problem to be identifiable, namely, that the measurement matrices are drawn from the complex Gaussian distribution. We now provide a sketch of our second main result.
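For concreteness, here is a minimal sketch (our own, not the paper's code) of the loss in (P2); the function name quadratic_loss is hypothetical.

import numpy as np

def quadratic_loss(A_list, c, x):
    # f(x) = (1/m) * sum_i |<A_i x, x> - c_i|^2, with real c_i and Hermitian A_i.
    m = len(A_list)
    residuals = np.array([np.vdot(x, A @ x).real - ci for A, ci in zip(A_list, c)])
    return np.sum(residuals ** 2) / m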

Theorem 2 (sketch).

Let $\mathcal{A}=\{A_{i}\}_{i=1}^{m}$ be a set of complex Gaussian random matrices and let $\mathbf{z}$ be a global optimizer of (P2). Then, with high probability, the following hold:

1. $\mathcal{M}_{\mathcal{A}}(\mathbf{w})=\mathcal{M}_{\mathcal{A}}(\mathbf{z})$ for all local minima $\mathbf{w}$ of (P2).

2. The function $f$ in (P2) has the strict saddle property.

This theorem states that, provided the measurement matrices are Gaussian, with high probability the minimization problem (P2) has no spurious local minima, and any saddle point of the function $f$ is strict in the sense that $f$ has a strictly negative curvature at such a point. The latter property, called the strict saddle property, is defined in Section V, where we also provide a formal statement of Theorem 2. Finally, based on the properties established here about the loss landscape, we establish that a solution to problem (P2) can be obtained by applying a gradient based algorithm (from an arbitrary initial point), since such an algorithm is unlikely to converge to a saddle point.

III Related work

QCQPs have enjoyed a lot of attention over the last century. However, due to the limitations of the duality properties of QCQPs [19], a significant fraction of research has focused predominantly on heuristic approaches to their solution [20, 21, 22]. Recently, an ADMM-based method was proposed in [23] with an asymptotic convergence result based on the duality properties of QCQPs. Our results in this paper bring new insights to this area by analyzing a subset of QCQPs, namely, quadratic feasibility problems.

The quadratic feasibility problem (P1) arises in many applications, including phase retrieval [2] and power system state estimation [1]. Phase retrieval in and of itself finds applications in a wide variety of fields, such as imaging, optics, quantum tomography, and audio signal processing, with a wide literature including [4, 2, 24, 5]. In [3], an approximate $\ell_{1}$ isometry property was established for the phase retrieval problem, but the bounds therein are not strong enough to provide RIP-like guarantees. In this paper, we improve these bounds to establish isometry results for a large class of problems and provide RIP-type bounds for the case when $\{A_{i}\}_{i=1}^{m}$ are complex Gaussian random matrices.

A feasibility problem is often cast as a minimization problem with a suitably chosen loss function. Even with a nonconvex objective, gradient based methods have proven to work for phase retrieval [24, 5, 2], matrix factorization [25, 26], and robust linear regression [27]. The work in [28] established landscape properties for the phase retrieval problem, which sheds light on the success of gradient based methods in solving that problem. In this work, we extend these results to a wider class of problems, along with additional insights into the problem properties. In [29], it was shown that many nonconvex loss functions have specific landscape properties that allow gradient based algorithms to recover a globally optimal solution without any additional information. Unfortunately, one cannot readily transport those results to our setting, mainly due to the significant differences between the real and complex vector spaces. For instance, a quadratic feasibility problem in $\mathbb{R}^{n}$ has only two isolated local minima, while it has a continuum of minima in $\mathbb{C}^{n}$.

The authors in [30] provide lower bounds on the minimum number of independent measurements required for successful recovery in the quadratic feasibility problem. More recently, [31] showed that the quadratic feasibility problem can be solved, with high probability, by gradient descent provided that a good initialization is used. In contrast, the current work takes a parallel track by analyzing the landscape of the associated $\ell_{2}$-loss function. In particular, for the $\ell_{2}$-loss function, we prove that all local minima are global and all saddle points are strict. Thus, our results enable gradient based algorithms with arbitrary initialization to recover the solution of the quadratic feasibility problem.

IV Properties of the quadratic mapping

As a first step toward establishing our main results, we start by characterizing when the quadratic mapping $\mathcal{M}_{\mathcal{A}}$ defined in (1) is in fact an injective mapping. Notice that the injectivity of the mapping is equivalent to the problem being identifiable (and hence solvable). It is also worth noting that, in the context of the phase retrieval problem, when $\mathcal{M}_{\mathcal{A}}$ is injective it is said to possess the phase retrievability property [30].

Lemma 1.

The following statements are equivalent:

1. The nonlinear map $\mathcal{M}_{\mathcal{A}}:\widehat{\mathbb{C}}^{n}\rightarrow\mathbb{C}^{m}$ is injective.

2. There exist constants $\alpha,\beta>0$ such that, for all $\mathbf{u},\mathbf{v}\in\mathbb{C}^{n}$,

\beta\|[[\mathbf{u},\mathbf{v}]]\|^{2}_{F}\geq\sum_{i=1}^{m}|\langle A_{i},[[\mathbf{u},\mathbf{v}]]\rangle|^{2}\geq\alpha\|[[\mathbf{u},\mathbf{v}]]\|^{2}_{F}. \qquad (3)

We refer the reader to Appendix VI-A for a detailed proof. Lemma 1 characterizes the injectivity of $\mathcal{M}_{\mathcal{A}}$ in terms of the action of the matrices $A_{i}$ on the outer product $[[\mathbf{u},\mathbf{v}]]$. In particular, it says that the mapping $\mathcal{M}_{\mathcal{A}}$ is injective if and only if the matrices $A_{i}$ do not distort the outer product $[[\mathbf{u},\mathbf{v}]]$ too much. We use this characterization to obtain a more tractable condition that allows us to establish the injectivity of $\mathcal{M}_{\mathcal{A}}$ in the case of Gaussian measurement matrices (forthcoming in Lemma 2).

Our tractable condition is based on what we call the stability of the mapping. Informally speaking, the mapping $\mathcal{M}_{\mathcal{A}}$ is stable if for any given $\epsilon>0$ there exists $\delta>0$ such that $\|\mathcal{M}_{\mathcal{A}}(\mathbf{x})-\mathcal{M}_{\mathcal{A}}(\mathbf{y})\|\geq\delta$ whenever the vectors $\mathbf{x},\mathbf{y}\in\widehat{\mathbb{C}}^{n}$ satisfy $d(\mathbf{x},\mathbf{y})\geq\epsilon$. Such stability properties have been considered in the specific context of phase retrieval; see, e.g., [4, 3, 32, 33].

Definition 1 ($(\alpha,\beta)$-stability).

Consider the mapping $\mathcal{M}_{\mathcal{A}}$ defined in (1). We say that $\mathcal{M}_{\mathcal{A}}$ is $(\alpha,\beta)$-stable with respect to a metric $d$ on $\mathbb{C}^{n}$, with $0<\alpha\leq\beta$, if the following relation holds for all $\mathbf{x}_{1},\mathbf{x}_{2}\in\mathbb{C}^{n}$:

\alpha\, d(\mathbf{x}_{1},\mathbf{x}_{2})\leq\|\mathcal{M}_{\mathcal{A}}(\mathbf{x}_{1})-\mathcal{M}_{\mathcal{A}}(\mathbf{x}_{2})\|_{2}\leq\beta\, d(\mathbf{x}_{1},\mathbf{x}_{2}). \qquad (4)

The constants $\alpha,\beta$ depend on the choice of the metric $d(\cdot,\cdot)$. Throughout the paper, we work with the metric $d$ defined in (2). Note that our definition of $(\alpha,\beta)$-stability sandwiches the norm of the measurement differences, which is in contrast to the stability concept considered for phase retrieval in [4, 32]. The ratio $\alpha/\beta$ can also be thought of as a condition number, thereby allowing one to quantify the quality of the map; the closer this ratio is to one, the better the ability of the mapping to distinguish between two distinct inputs.

With this definition of stability in place, the next lemma states that the map $\mathcal{M}_{\mathcal{A}}$ is injective if and only if it is stable.

Lemma 2.

The mapping $\mathcal{M}_{\mathcal{A}}$ is injective if and only if it is $(\alpha,\beta)$-stable for some constants $0<\alpha\leq\beta$.

Please refer to Appendix VI-B for a proof of the lemma. This result demonstrates the usefulness of both our choice of the metric $d(\cdot,\cdot)$ in (2) and our definition of stability (Definition 1). As we will see in what follows, Lemma 2 allows one to assess the conditions under which the measurement model implied by the mapping $\mathcal{M}_{\mathcal{A}}$ is identifiable.

Next, we turn our attention to the question of when one can establish the above stability condition, and thereby the identifiability of the underlying measurement model. To do so, we take our cues from the compressive sensing and phase retrieval literature and assume that the Hermitian matrices in the set $\mathcal{A}=\{A_{i}\}_{i=1}^{m}$ are sampled from a complex Gaussian distribution. More precisely, the entries in the upper triangle of each $A_{i}$ are drawn independently, with the diagonal entries distributed as $\mathcal{N}(0,1)$ and the strictly upper-triangular entries having independent real and imaginary parts, each distributed as $\mathcal{N}(0,1/2)$; the remaining entries are determined by the fact that the matrices $A_{i}$ are Hermitian (see Appendix VII-A for the precise definition).

We first observe that, when $\mathcal{A}$ is chosen as described above, $\mathbb{E}\left[\left|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle\right|^{2}\right]=d(\mathbf{x},\mathbf{y})^{2}$ for all $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$ and all $i\in[m]$ (see Appendix VII-A for more details). Our next result shows that these quantities are in fact concentrated around their expected values.
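The following short Monte-Carlo sketch (our own check, using the Hermitian Gaussian model with $\sigma^{2}=1$ from Appendix VII-A) illustrates the identity $\mathbb{E}\bigl[|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}\bigr]=d(\mathbf{x},\mathbf{y})^{2}$; all function names are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def hermitian_gaussian(n):
    # Real N(0,1) diagonal; strictly upper-triangular entries complex Gaussian with
    # independent real/imaginary parts of variance 1/2; Hermitian completion below.
    diag = rng.standard_normal(n)
    off = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    U = np.triu(off, k=1)
    return U + U.conj().T + np.diag(diag)

n = 5
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
D = np.outer(x, x.conj()) - np.outer(y, y.conj())
samples = [abs(np.vdot(hermitian_gaussian(n), D)) ** 2 for _ in range(20000)]
print(np.mean(samples), np.linalg.norm(D, "fro") ** 2)  # the two numbers should be close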

Lemma 3.

Let $\mathcal{A}=\{A_{i}\}_{i=1}^{m}$ be a set of complex Hermitian Gaussian random matrices for the measurement model given by (P1). Then, given $\epsilon>0$ and vectors $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$, there are constants $c,d>0$ such that

\mathbb{P}\left(\left|\frac{1}{m}\sum_{i=1}^{m}|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}-d(\mathbf{x},\mathbf{y})^{2}\right|\geq\epsilon\, d(\mathbf{x},\mathbf{y})^{2}\right)\leq d e^{-cm\epsilon}. \qquad (5)

Please see Appendix VII-A for a proof of Lemma 3. We would like to draw attention to the fact that proving such a concentration result requires careful attention to the fact that we are operating in the complex domain; our proof techniques may be of independent interest. Notice that this concentration result holds only for a fixed pair of vectors. We, however, need the concentration to hold uniformly over all pairs of vectors in $\mathbb{C}^{n}$. For this, we next adapt a standard covering argument as follows; we use $S^{n-1}$ to denote the unit sphere in $\mathbb{C}^{n}$.

Lemma 4.

Given $\delta>0$, let $\mathcal{N}_{\delta}$ be the smallest collection of $n$-dimensional balls of radius $\delta$ whose union covers the sphere $S^{n-1}$. Then, for any matrix $A\in\mathbb{C}^{n\times n}$, we have

(1-2\delta)\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|\leq\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in\mathcal{N}_{\delta}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|\leq(1+2\delta)\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|.

We refer the reader to Appendix VII-B for a proof of this lemma. This argument is not new and has found application in several results; see e.g., [31, 34, 35].

We are finally ready to state and prove our first main result.

Theorem 1.

Let $\mathcal{A}=\{A_{i}\}_{i=1}^{m}$ be a set of complex Gaussian random matrices. Then, for any given $\xi\in(0,1)$, there exist constants $C,c_{0},d_{0}>0$ and $\beta\geq\alpha>0$ such that, whenever the number of measurements satisfies $m>Cn$, the following relation holds for all $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$ with probability at least $1-\xi$:

\alpha\, d(\mathbf{x},\mathbf{y})\leq\|\mathcal{M}_{\mathcal{A}}(\mathbf{x})-\mathcal{M}_{\mathcal{A}}(\mathbf{y})\|_{2}\leq\beta\, d(\mathbf{x},\mathbf{y}).
Proof.

Consider $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$. If $\mathbf{x}\sim\mathbf{y}$, then $d(\mathbf{x},\mathbf{y})=0$ and $\mathcal{M}_{\mathcal{A}}(\mathbf{x})=\mathcal{M}_{\mathcal{A}}(\mathbf{y})$, and the result holds trivially. Therefore, in the sequel, we assume that the vectors are distinct in $\widehat{\mathbb{C}}^{n}$, so that $d(\mathbf{x},\mathbf{y})>0$.

From Lemma 3, for a given $\epsilon>0$ we have

\mathbb{P}\left(\left|\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}-1\right|\geq\epsilon\right)\leq d e^{-cm\epsilon}.

According to [34], for any $\delta>0$ we have the following upper bound on the size of the covering: $|\mathcal{N}_{\delta}|\leq\left(\frac{12}{\delta}\right)^{n}$. Therefore, for given $\epsilon,\delta>0$, by Lemma 3 and a union bound over $\mathcal{N}_{\delta}$, we have that

\mathbb{P}\left(\sup_{\mathbf{x},\mathbf{y}\in\mathcal{N}_{\delta}}\left|\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}-1\right|>\epsilon\right)\leq d e^{-cm\epsilon}\left(\frac{12}{\delta}\right)^{n},

where $d,c$ are the same constants as in Lemma 3. This implies that

\mathbb{P}\left(\sup_{\mathbf{x},\mathbf{y}\in\mathcal{N}_{\delta}}\left|\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}-1\right|\leq\epsilon\right)\geq 1-d e^{-cm\epsilon}\left(\frac{12}{\delta}\right)^{n}.

Now, observe that

\sup_{\mathbf{x},\mathbf{y}\in\mathcal{N}_{\delta}}\left|\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}-1\right|\geq\sup_{\mathbf{x},\mathbf{y}\in\mathcal{N}_{\delta}}\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}-1.

Therefore,

\mathbb{P}\left(\sup_{\mathbf{x},\mathbf{y}\in\mathcal{N}_{\delta}}\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}-1\leq\epsilon\right)\geq 1-d e^{-cm\epsilon}\left(\frac{12}{\delta}\right)^{n}.

By applying the covering result of Lemma 4 to each matrix $A_{i}$, averaging the resulting relations over $i\in[m]$, and using

\sup_{\mathbf{x},\mathbf{y}\in\mathcal{N}_{\delta}}d(\mathbf{x},\mathbf{y})\leq(1+2\delta)\sup_{\mathbf{x},\mathbf{y}\in S^{n-1}}d(\mathbf{x},\mathbf{y}),

we obtain

\sup_{\mathbf{x},\mathbf{y}\in\mathcal{N}_{\delta}}\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}\geq\frac{(1-2\delta)^{2}}{(1+2\delta)^{2}}\sup_{\mathbf{x},\mathbf{y}\in S^{n-1}}\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}.

Thus, we can conclude that

\mathbb{P}\left(\sup_{\mathbf{x},\mathbf{y}\in S^{n-1}}\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}\leq\frac{(1+2\delta)^{2}(1+\epsilon)}{(1-2\delta)^{2}}\right)\geq 1-d e^{-cm\epsilon}\left(\frac{12}{\delta}\right)^{n}.

Similarly, we can prove that

\mathbb{P}\left(\inf_{\mathbf{x},\mathbf{y}\in S^{n-1}}\frac{1}{m}\sum_{i=1}^{m}\frac{|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{d(\mathbf{x},\mathbf{y})^{2}}\geq\frac{(1-2\delta)^{2}(1-\epsilon)}{(1+2\delta)^{2}}\right)\geq 1-d e^{-cm\epsilon}\left(\frac{12}{\delta}\right)^{n}.

Letting $C$ be such that $C>\frac{\log 12d-\log\delta\xi}{c\epsilon}$ and letting $m\geq Cn$, we see that the following relation holds with probability at least $1-\xi$: for all $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$,

\beta\, d(\mathbf{x},\mathbf{y})\geq\|\mathcal{M}_{\mathcal{A}}(\mathbf{x})-\mathcal{M}_{\mathcal{A}}(\mathbf{y})\|_{2}\geq\alpha\, d(\mathbf{x},\mathbf{y}),

where $\alpha,\beta$ are given by

\alpha\triangleq\frac{(1-2\delta)^{2}(1-\epsilon)}{(1+2\delta)^{2}},\qquad\beta\triangleq\frac{(1+2\delta)^{2}(1+\epsilon)}{(1-2\delta)^{2}}.

Notice that we essentially have a choice of the values of $\delta$ and $\epsilon$. The closer they are to $0$, the stronger the stability result; however, this also means that the number of measurements $m$ needs to be larger. ∎
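For completeness, the choice of $C$ in the proof above can be checked directly (our own short verification of the bound used there): the failure probability is at most $\xi$ whenever

d e^{-cm\epsilon}\left(\frac{12}{\delta}\right)^{n}\leq\xi\iff cm\epsilon\geq n\log\frac{12}{\delta}+\log\frac{d}{\xi},

which holds for $m\geq Cn$ with $C>\frac{1}{c\epsilon}\log\frac{12d}{\delta\xi}$, using $n\geq 1$ and assuming $d\geq 1$ and $\xi<1$.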

V Non-convex loss reformulation and the Optimization Landscape

In the preceding section, we established conditions under which the mapping $\mathcal{M}_{\mathcal{A}}$ represents an identifiable measurement model. We next consider determining a feasible solution by applying efficient methods to the nonconvex optimization-based reformulation of the feasibility problem, as given in (P2). As noted earlier, in general, gradient-based approaches need not converge to a global minimum of the $\ell_{2}$-loss minimization problem (P2). Lately, nonconvex optimization has received considerable attention. In particular, methods like SGD and other gradient based methods have been shown to be astonishingly successful in converging to global minima in many nonconvex problems [36, 37, 38]. Arguably, the reason for this is that the optimization landscapes of these (somewhat well-behaved) nonconvex problems enjoy some advantageous properties, which are crucial to the empirical success of gradient based methods [39, 40, 28]. The work in [28] proves that the $\ell_{2}$-loss function for the phase retrieval problem enjoys properties such as all local minima being global and each saddle point having a strictly negative curvature. This work adds to this rich body of work by demonstrating similarly advantageous properties of the optimization landscape for the quadratic feasibility problem. Before we proceed, we observe that, since the $\ell_{2}$-loss function is not differentiable in the complex space $\mathbb{C}^{n}$, it is challenging to address problem (P2) in the standard way. In what follows, we instead use techniques from Wirtinger calculus [2]. Our first step is to define the notion of a strict saddle function.

Definition 2.

Let $\beta,\gamma,\zeta$ be positive scalars. A function $f$ is said to be a $(\beta,\zeta,\gamma)$-strict saddle function if, for any $\mathbf{x}\in\mathbb{C}^{n}$, at least one of the following statements is true:

1. $\|\nabla f(\mathbf{x})\|\geq\beta$;

2. $\langle\nabla^{2}f(\mathbf{x})\mathbf{z},\mathbf{z}\rangle\leq-\zeta$ for some $\mathbf{z}\in\mathbb{C}^{n}$;

3. $\mathbf{x}$ is $\gamma$-close to a local minimum, i.e., $d(\mathbf{x},\mathbf{w})\leq\gamma$ for some $\mathbf{w}$ satisfying $\nabla f(\mathbf{w})=0$ and $\nabla^{2}f(\mathbf{w})\succeq 0$.

Intuitively, this implies that every $\mathbf{x}\in\mathbb{C}^{n}$ either violates an optimality condition (condition 1 or 2) or is close to a local optimum. A line of recent work [41, 42, 27] has explored the efficacy of gradient based methods in finding a local optimum of functions satisfying Definition 2.
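As a rough numerical companion (our own illustration, not part of the paper's analysis), one can probe the three conditions of Definition 2 at a given point by viewing $f$ as a function of the stacked real vector $[\mathrm{Re}(\mathbf{x});\mathrm{Im}(\mathbf{x})]$, which detects the same stationary points and negative-curvature directions up to constant factors; all names below are hypothetical and the finite differences are deliberately crude.

import numpy as np

def f_real(v, A_list, c):
    # The loss in (P2) viewed as a function of the stacked real/imaginary parts of x.
    n = len(v) // 2
    x = v[:n] + 1j * v[n:]
    r = np.array([np.vdot(x, A @ x).real - ci for A, ci in zip(A_list, c)])
    return np.sum(r ** 2) / len(A_list)

def grad_hess_fd(v, A_list, c, h=1e-4):
    # Central finite differences; enough to inspect the landscape at a single point.
    dim = len(v)
    g = np.zeros(dim)
    H = np.zeros((dim, dim))
    for i in range(dim):
        ei = np.zeros(dim); ei[i] = h
        g[i] = (f_real(v + ei, A_list, c) - f_real(v - ei, A_list, c)) / (2 * h)
        for j in range(dim):
            ej = np.zeros(dim); ej[j] = h
            H[i, j] = (f_real(v + ei + ej, A_list, c) - f_real(v + ei - ej, A_list, c)
                       - f_real(v - ei + ej, A_list, c) + f_real(v - ei - ej, A_list, c)) / (4 * h * h)
    return g, H

# At a candidate point v: either np.linalg.norm(g) is large (condition 1), or
# np.linalg.eigvalsh(H).min() is sufficiently negative (condition 2), or v is near a local minimum.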

We analyze the optimization landscape of (P2) when our measurement matrices are Hermitian complex Gaussian, and show that with high probability every local minimum is in fact global (up to the equivalence relation $\sim$). Our next main result states that the function $f$ in (P2) is strict saddle.

Theorem 2.

Let $\{A_{i}\}_{i=1}^{m}$ be a set of complex $n\times n$ Gaussian random matrices, and let $m>Cn$ for some constant $C>0$. Let the scalars $\{c_{i}\}_{i=1}^{m}$ characterizing the objective function $f$ of problem (P2) be generated by quadratic measurements of an unknown vector $\mathbf{z}$. Then, for any given $\xi\in(0,1)$, there exist positive constants $\beta,\gamma$, and $\zeta$ such that the following statements hold with probability at least $1-\xi$:

1) The function $f$ is $(\beta,\zeta,\gamma)$-strict saddle, and

2) every local minimum $\mathbf{w}$ of $f$ satisfies $d(\mathbf{w},\mathbf{z})=0$.

Recall that the distance metric $d$ is defined on the quotient space $\widehat{\mathbb{C}}^{n}$; therefore, the second statement above says that $\mathbf{w}\sim\mathbf{z}$. We refer the reader to Appendix VIII-C for the details, but we give a brief idea of the proof here.

Proof sketch.

Notice that to show that the function $f$ in (P2) is a strict saddle function, it suffices to consider only the points $\mathbf{x}\in\mathbb{C}^{n}$ such that $\left\|\nabla f(\mathbf{x})\right\|<\beta$ (otherwise, condition 1 of Definition 2 is satisfied). For all such points, we analyze the behavior of the Hessian and establish that there exists a direction $\Delta\in\mathbb{C}^{n}$ such that the following inequality holds:

\langle\nabla^{2}f(\mathbf{x})\Delta,\Delta\rangle\leq-c_{0}\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}^{2}, \qquad (6)

where $c_{0}>0$ and $\left\|\cdot\right\|_{F}$ is the standard Frobenius norm. Since $d(\mathbf{x},\mathbf{z})=\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}$, the right-hand side of (6) is strictly negative if and only if $d(\mathbf{x},\mathbf{z})>0$. This of course implies that whenever $d(\mathbf{x},\mathbf{z})>0$, there is a direction in which the Hessian has a strictly negative curvature, and hence such a point cannot be a local minimum. In other words, we can conclude that: (1) all local minima satisfy $d(\mathbf{x},\mathbf{z})=0$, and (2) all saddle points have a strictly negative curvature.

This concludes the proof. ∎

Finally, we remark that the properties of the optimization landscape that we have established allow one to use any gradient based iterative method to find a global optimum of problem (P2), and hence a solution to the quadratic feasibility problem. Furthermore, our results also imply that a gradient method with an arbitrary initial point would work, which is in sharp contrast with existing works such as [31]. Formally, we have the following result.

Corollary 1.

Consider a gradient method applied to minimize the function $f$ in (P2). Then, for an arbitrary initial point, the method converges to a global minimum of the loss function $f$ associated with the quadratic feasibility problem.

Given the landscape properties we have derived in Theorem 2, this result follows in a straightforward manner, for instance, from Theorem 4.1 in [41]. We would like to remark here that the broad flow of ideas in our proof of Theorem 2 bears similarities to those in papers like [29, 43]. However, to the best of our knowledge, the present paper is the first to derive such results in the complex domain.
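A minimal sketch (our own, with hypothetical names and step size, and using the Wirtinger-calculus gradient convention of [2], under which the descent direction for Hermitian $A_{i}$ is proportional to $\frac{2}{m}\sum_{i}(\langle A_{i}\mathbf{x},\mathbf{x}\rangle-c_{i})A_{i}\mathbf{x}$) of such a gradient method with an arbitrary initial point is given below.

import numpy as np

def gradient_descent_P2(A_list, c, n, steps=5000, step_size=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # arbitrary initialization
    m = len(A_list)
    for _ in range(steps):
        r = np.array([np.vdot(x, A @ x).real - ci for A, ci in zip(A_list, c)])
        grad = (2.0 / m) * sum(ri * (A @ x) for ri, A in zip(r, A_list))
        x = x - step_size * grad
    return x  # any solution is determined only up to a global phase

Since any limit point is determined only up to the equivalence $\sim$, success should be measured with the metric $d$ from (2) rather than the Euclidean distance.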

References

  • [1] G. Wang, A. S. Zamzam, G. B. Giannakis, and N. D. Sidiropoulos, “Power system state estimation via feasible point pursuit,” in Signal and Information Processing (GlobalSIP), 2016 IEEE Global Conference on.   IEEE, 2016, pp. 773–777.
  • [2] E. J. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval via wirtinger flow: Theory and algorithms,” IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015.
  • [3] E. J. Candes, T. Strohmer, and V. Voroninski, “Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming,” Communications on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1241–1274, 2013.
  • [4] Y. C. Eldar and S. Mendelson, “Phase retrieval: Stability and recovery guarantees,” Applied and Computational Harmonic Analysis, vol. 36, no. 3, pp. 473–494, 2014.
  • [5] R. Balan and D. Zou, “Phase retrieval using lipschitz continuous maps,” arXiv preprint arXiv:1403.2301, 2014.
  • [6] J. Drenth, Principles of protein X-ray crystallography.   Springer Science & Business Media, 2007.
  • [7] T. Dakic, On the turnpike problem.   Simon Fraser University BC, Canada, 2000.
  • [8] P. M. Duxbury, L. Granlund, S. Gujarathi, P. Juhas, and S. J. Billinge, “The unassigned distance geometry problem,” Discrete Applied Mathematics, vol. 204, pp. 117–132, 2016.
  • [9] S. Huang and I. Dokmanić, “Reconstructing point sets from distance distributions,” arXiv preprint arXiv:1804.02465, 2018.
  • [10] J. Park and S. Boyd, “General heuristics for nonconvex quadratically constrained quadratic programming,” arXiv preprint arXiv:1703.07870, 2017.
  • [11] L. L. Dines, “On the mapping of quadratic forms,” Bulletin of the American Mathematical Society, vol. 47, no. 6, pp. 494–498, 1941.
  • [12] A. Beck and D. Pan, “A branch and bound algorithm for nonconvex quadratic optimization with ball and linear constraints,” Journal of Global Optimization, vol. 69, no. 2, pp. 309–342, 2017.
  • [13] A. Beck, “Convexity properties associated with nonconvex quadratic matrix functions and applications to quadratic programming,” Journal of Optimization Theory and Applications, vol. 142, no. 1, pp. 1–29, 2009.
  • [14] A. Beck and Y. C. Eldar, “Strong duality in nonconvex quadratic optimization with two quadratic constraints,” SIAM Journal on Optimization, vol. 17, no. 3, pp. 844–860, 2006.
  • [15] S. Sahni, “Computationally related problems,” SIAM Journal on Computing, vol. 3, no. 4, pp. 262–279, 1974.
  • [16] I. Pólik and T. Terlaky, “A survey of the s-lemma,” SIAM review, vol. 49, no. 3, pp. 371–418, 2007.
  • [17] A. S. Bandeira, J. Cahill, D. G. Mixon, and A. A. Nelson, “Saving phase: Injectivity and stability for phase retrieval,” Applied and Computational Harmonic Analysis, vol. 37, no. 1, pp. 106–125, 2014.
  • [18] R. Balan and Y. Wang, “Invertibility and robustness of phaseless reconstruction,” Applied and Computational Harmonic Analysis, vol. 38, no. 3, pp. 469–488, 2015.
  • [19] A. I. Barvinok, “Problems of distance geometry and convex properties of quadratic maps,” Discrete & Computational Geometry, vol. 13, no. 2, pp. 189–202, 1995.
  • [20] A. Konar and N. D. Sidiropoulos, “Fast approximation algorithms for a class of non-convex qcqp problems using first-order methods,” IEEE Transactions on Signal Processing, vol. 65, no. 13, pp. 3494–3509.
  • [21] ——, “First-order methods for fast feasibility pursuit of non-convex qcqps,” IEEE Transactions on Signal Processing, vol. 65, no. 22, pp. 5927–5941.
  • [22] A. Konar, “Non-convex quadratically constrained quadratic programming: Hidden convexity, scalable approximation and applications,” 2017.
  • [23] K. Huang and N. D. Sidiropoulos, “Consensus-admm for general quadratically constrained quadratic programming,” Transactions on Signal Processing, vol. 64, no. 20, pp. 5297–5310, 2016.
  • [24] Y. S. Tan and R. Vershynin, “Phase retrieval via randomized kaczmarz: theoretical guarantees,” Information and Inference: A Journal of the IMA, 2017.
  • [25] S. Bhojanapalli, A. Kyrillidis, and S. Sanghavi, “Dropping convexity for faster semi-definite optimization,” in Conference on Learning Theory, 2016, pp. 530–582.
  • [26] C. Jin, S. M. Kakade, and P. Netrapalli, “Provable efficient online matrix completion via non-convex stochastic gradient descent,” in Advances in Neural Information Processing Systems, 2016, pp. 4520–4528.
  • [27] P. Jain, P. Kar et al., “Non-convex optimization for machine learning,” Foundations and Trends® in Machine Learning, vol. 10, no. 3-4, pp. 142–336, 2017.
  • [28] J. Sun, Q. Qu, and J. Wright, “A geometric analysis of phase retrieval,” Foundations of Computational Mathematics, vol. 18, no. 5, pp. 1131–1198, 2018.
  • [29] R. Ge, C. Jin, and Y. Zheng, “No spurious local minima in nonconvex low rank problems: A unified geometric analysis,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70.   JMLR. org, 2017, pp. 1233–1242.
  • [30] Y. Wang and Z. Xu, “Generalized phase retrieval: measurement number, matrix recovery and beyond,” Applied and Computational Harmonic Analysis, 2017.
  • [31] S. Huang, S. Gupta, and I. Dokmanić, “Solving complex quadratic systems with full-rank random matrices,” arXiv preprint arXiv:1902.05612, 2019.
  • [32] J. C. Duchi and F. Ruan, “Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval,” arXiv preprint arXiv:1705.02356, 2017.
  • [33] Y. Ye and S. Zhang, “New results on quadratic minimization,” SIAM Journal on Optimization, vol. 14, no. 1, pp. 245–267, 2003.
  • [34] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” arXiv preprint arXiv:1011.3027, 2010.
  • [35] M. W. Meckes, “Concentration of norms and eigenvalues of random matrices,” Journal of Functional Analysis, vol. 211, no. 2, pp. 508–524, 2004.
  • [36] S. S. Du, J. D. Lee, H. Li, L. Wang, and X. Zhai, “Gradient descent finds global minima of deep neural networks,” arXiv preprint arXiv:1811.03804, 2018.
  • [37] Y. Chen, Y. Chi, J. Fan, and C. Ma, “Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval,” arXiv preprint arXiv:1803.07726, 2018.
  • [38] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, “How to escape saddle points efficiently,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70.   JMLR. org, 2017, pp. 1724–1732.
  • [39] A. Bower, L. Jain, and L. Balzano, “The landscape of non-convex quadratic feasibility,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2018, pp. 3974–3978.
  • [40] D. Davis, D. Drusvyatskiy, and C. Paquette, “The nonsmooth landscape of phase retrieval,” arXiv preprint arXiv:1711.03247, 2017.
  • [41] J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, “Gradient descent converges to minimizers,” arXiv preprint arXiv:1602.04915, 2016.
  • [42] J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan, and B. Recht, “First-order methods almost always avoid saddle points,” arXiv preprint arXiv:1710.07406, 2017.
  • [43] S. Bhojanapalli, B. Neyshabur, and N. Srebro, “Global optimality of local search for low rank matrix recovery,” in Advances in Neural Information Processing Systems, 2016, pp. 3873–3881.
  • [44] R. Balan, “Reconstruction of signals from magnitudes of redundant representations: The complex case,” Foundations of Computational Mathematics, vol. 16, no. 3, pp. 677–721, 2016.
  • [45] D. P. Shunmugaraj, “Complex differentiation and cauchy reimann equations.” [Online]. Available: http://home.iitk.ac.in/~psraj/mth102/lecture_notes/comp2.pdf
  • [46] J. Cahill, P. G. Casazza, J. Peterson, and L. Woodland, “Phase retrieval by projections,” arXiv preprint arXiv:1305.6226, 2013.

VI Appendix A : Injectivity

VI-A Proof of Lemma 1

Lemma 1.

The following statements are equivalent:

1. The nonlinear map $\mathcal{M}_{\mathcal{A}}:\widehat{\mathbb{C}}^{n}\rightarrow\mathbb{C}^{m}$ is injective.

2. There exist constants $\alpha,\beta>0$ such that, for all $\mathbf{u},\mathbf{v}\in\mathbb{C}^{n}$,

\beta\|[[\mathbf{u},\mathbf{v}]]\|^{2}_{F}\geq\sum_{i=1}^{m}|\langle A_{i},[[\mathbf{u},\mathbf{v}]]\rangle|^{2}\geq\alpha\|[[\mathbf{u},\mathbf{v}]]\|^{2}_{F}.
Proof.

($1\Rightarrow 2$)

The following result from [30] will be crucial.

Theorem (Theorem 2.1, [30]).

Let $\mathcal{A}=\{A_{i}\}_{i=1}^{m}\subset\mathbf{H}_{n}(\mathbb{C})$. The following statements are equivalent:

1. For the given $\mathcal{A}=\{A_{i}\}_{i=1}^{m}$, the mapping $\mathcal{M}_{\mathcal{A}}$ has the phase retrieval property.

2. There exist no nonzero vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{n}$ with $\mathbf{u}\neq\mathsf{i}c\mathbf{v}$, $c\in\mathbb{R}$, such that $\mathrm{Re}(\langle A_{j}\mathbf{u},\mathbf{v}\rangle)=0$ for all $1\leq j\leq m$.

For the mapping $\mathcal{M}_{\mathcal{A}}$ to be injective, the following should hold:

\mathcal{M}_{\mathcal{A}}(\mathbf{x})=\mathcal{M}_{\mathcal{A}}(\mathbf{y})\quad\text{if and only if}\quad\mathbf{x}\sim\mathbf{y}.

Hence, for $\mathbf{x}\nsim\mathbf{y}$, $\mathcal{M}_{\mathcal{A}}(\mathbf{x})\neq\mathcal{M}_{\mathcal{A}}(\mathbf{y})$. It can be verified that, for any $\phi\in[0,2\pi]$, the vectors $\mathbf{u}=\frac{1}{\sqrt{2}}(\mathbf{x}-e^{\mathsf{i}\phi}\mathbf{y})$ and $\mathbf{v}=\frac{1}{\sqrt{2}}(\mathbf{x}+e^{\mathsf{i}\phi}\mathbf{y})$ satisfy the transformation

\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}=\mathbf{u}\mathbf{v}^{*}+\mathbf{v}\mathbf{u}^{*}=[[\mathbf{u},\mathbf{v}]]. \qquad (7)

Thus,

\|\mathcal{M}_{\mathcal{A}}(\mathbf{x})-\mathcal{M}_{\mathcal{A}}(\mathbf{y})\|_{2}^{2}=\sum_{i=1}^{m}\left|\langle A_{i},[[\mathbf{u},\mathbf{v}]]\rangle\right|^{2}.

We define the lower bound $\alpha$ and upper bound $\beta$ as follows:

\alpha:=\min_{T\in S^{1,1},\,\|T\|_{F}=1}\sum_{i=1}^{m}|\langle A_{i},T\rangle|^{2},\qquad\beta:=\max_{T\in S^{1,1},\,\|T\|_{F}=1}\sum_{i=1}^{m}|\langle A_{i},T\rangle|^{2}.

The set $\{T\in S^{1,1}:\|T\|_{F}=1\}$ is compact; hence, the constants $\alpha$ and $\beta$ exist.

From the statement of Theorem 2.1 in [30], it follows that $\langle A_{i},[[\mathbf{u},\mathbf{v}]]\rangle=\langle A_{i},\mathbf{u}\mathbf{v}^{*}\rangle+\langle A_{i},\mathbf{v}\mathbf{u}^{*}\rangle$, which is proportional to $\mathrm{Re}(\langle A_{i}\mathbf{v},\mathbf{u}\rangle)$, cannot vanish for all $i\in[m]$ when $\mathbf{x}\nsim\mathbf{y}$ and $\mathbf{u},\mathbf{v}$ satisfy equation (7); hence $\alpha>0$ and (3) holds.

($1\Leftarrow 2$)

Instead of proving $2\Rightarrow 1$ directly, we argue the contrapositive: if statement 1 fails, then statement 2 fails.

Suppose the mapping 𝒜\mathcal{M}_{\mathcal{A}} is not injective.

Then there exist $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$ such that

\mathbf{x}\nsim\mathbf{y}\quad\text{and}\quad\mathcal{M}_{\mathcal{A}}(\mathbf{y})=\mathcal{M}_{\mathcal{A}}(\mathbf{x}).

Thus $\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}\neq 0$ but $\|\mathcal{M}_{\mathcal{A}}(\mathbf{y})-\mathcal{M}_{\mathcal{A}}(\mathbf{x})\|_{2}=0$. Hence no $\alpha>0$ can satisfy (3), and the negation follows. ∎

VI-B Proof of Lemma 2

Lemma 2.

The mapping $\mathcal{M}_{\mathcal{A}}$ is injective if and only if it is $(\alpha,\beta)$-stable for some constants $0<\alpha\leq\beta$.

Proof.

In order to prove the lemma, we examine the properties of the ratio

V(\mathbf{x},\mathbf{y})=\frac{\sum_{i=1}^{m}|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}}{\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}}.

The $(\alpha,\beta)$-stability of the mapping $\mathcal{M}_{\mathcal{A}}$ then follows directly from Lemma 1 and the existence of $\mathbf{u},\mathbf{v}\in\mathbb{C}^{n}$ satisfying equation (7). ∎

VII Appendix B : High probability bounds

VII-A Proof of Lemma 3

Lemma 3.

Let $\mathcal{A}=\{A_{i}\}_{i=1}^{m}$ be a set of complex Hermitian Gaussian random matrices for the measurement model given by (P1). Then, given $\epsilon>0$ and vectors $\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}$, there are constants $c,d>0$ such that

\mathbb{P}\left(\left|\frac{1}{m}\sum_{i=1}^{m}|\langle A_{i},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}-d(\mathbf{x},\mathbf{y})^{2}\right|\geq\epsilon\, d(\mathbf{x},\mathbf{y})^{2}\right)\leq d e^{-cm\epsilon}.
Proof.

A matrix $A\in\mathbb{C}^{n\times n}$ is a complex Hermitian Gaussian random matrix if:

1. For all $i$: $a_{ii}\sim\mathcal{N}(0,\sigma^{2})$.

2. For all $i\neq j$: $a_{ij}\sim\mathcal{N}\left(0,\frac{\sigma^{2}}{2}\right)+\mathsf{i}\,\mathcal{N}\left(0,\frac{\sigma^{2}}{2}\right)$.

Here the entries $\{a_{ij}\}_{i\leq j}$ are mutually independent and $a_{ji}=\bar{a}_{ij}$.

Let $\{A_{d}\}_{d=1}^{m}$ be a set of complex Hermitian Gaussian random matrices, and define the random variable $Y$ by

Y=\frac{1}{m}\sum_{d=1}^{m}|\langle A_{d},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}=\frac{1}{m}\sum_{d=1}^{m}\left(\sum_{i,j}a_{ij}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})\right)\left(\sum_{k,l}a_{kl}(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right).

The expectation of $Y$ can be evaluated as

\mathbb{E}[Y]=\mathbb{E}\left(\frac{1}{m}\sum_{d=1}^{m}\left(\sum_{i,j}a_{ij}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})\right)\left(\sum_{k,l}a_{kl}(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)\right). \qquad (8)

For every matrix $A_{d}$, we can split the summation in (8) into the following four index sets:

1. $B:=\{(i,j,k,l)\,:\,i=j=k=l\}$;

2. $C:=\{(i,j,k,l)\,:\,i=l,\ j=k\}\cap B^{C}$;

3. $D:=\{(i,j,k,l)\,:\,i=k,\ j=l\}\cap B^{C}$;

4. $E:=(B\cup C\cup D)^{C}$, i.e., all remaining index tuples.

We calculate the expectation of the sum of the terms in each individual set:

1. Set $B$:

\mathbb{E}\left(\sum_{(i,j,k,l)\in B}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=\mathbb{E}\left(\sum_{i=1}^{n}|a_{ii}|^{2}(|x_{i}|^{2}-|y_{i}|^{2})^{2}\right)=\sigma^{2}\sum_{i=1}^{n}\left(|x_{i}|^{4}+|y_{i}|^{4}-2|x_{i}|^{2}|y_{i}|^{2}\right).
2. Set $C$ (note that for every Hermitian matrix $A_{d}$, $a_{ij}=\bar{a}_{ji}$):

\mathbb{E}\left(\sum_{(i,j,k,l)\in C}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=\mathbb{E}\left(\sum_{i\neq j}|a_{ij}|^{2}\left(|x_{i}|^{2}|x_{j}|^{2}-y_{i}\bar{y}_{j}x_{j}\bar{x}_{i}-x_{i}\bar{x}_{j}y_{j}\bar{y}_{i}+|y_{i}|^{2}|y_{j}|^{2}\right)\right)=\sigma^{2}\sum_{i\neq j}\left(|x_{i}|^{2}|x_{j}|^{2}-y_{i}\bar{y}_{j}x_{j}\bar{x}_{i}-x_{i}\bar{x}_{j}y_{j}\bar{y}_{i}+|y_{i}|^{2}|y_{j}|^{2}\right).
3. Set $D$:

\mathbb{E}\left(\sum_{(i,j,k,l)\in D}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=\mathbb{E}\left(\sum_{i\neq j}(a_{ij})^{2}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})^{2}\right)=0.

Indeed, writing $a_{ij}=a_{ij}^{r}+\mathsf{i}\,a_{ij}^{i}$, we have $(a_{ij})^{2}=(a_{ij}^{r})^{2}-(a_{ij}^{i})^{2}+2\mathsf{i}\,a_{ij}^{r}a_{ij}^{i}$, and since the real and imaginary parts are independent, zero mean, and of equal variance, $\mathbb{E}\left[(a_{ij})^{2}\right]=0$.

4. Set $E$: all the factors $a_{ij},a_{kl}$ with $(i,j,k,l)\in E$ are independent and zero mean, so

\mathbb{E}\left(\sum_{(i,j,k,l)\in E}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=0.

In conclusion,

\mathbb{E}[Y]=\sigma^{2}\left(\Bigl(\sum_{i=1}^{n}|x_{i}|^{2}\Bigr)^{2}+\Bigl(\sum_{i=1}^{n}|y_{i}|^{2}\Bigr)^{2}-2\sum_{i=1}^{n}|x_{i}|^{2}|y_{i}|^{2}-\sum_{i\neq j}y_{i}\bar{y}_{j}x_{j}\bar{x}_{i}-\sum_{i\neq j}y_{j}\bar{y}_{i}x_{i}\bar{x}_{j}\right)=\sigma^{2}\left[\|\mathbf{x}\|_{2}^{4}+\|\mathbf{y}\|_{2}^{4}-2|\langle\mathbf{x},\mathbf{y}\rangle|^{2}\right].

From Lemma 3.9 in [44], note that $\mathrm{tr}\{(\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*})^{2}\}=\|\mathbf{x}\|_{2}^{4}+\|\mathbf{y}\|_{2}^{4}-2|\langle\mathbf{x},\mathbf{y}\rangle|^{2}$, where $\mathrm{tr}\{\cdot\}$ denotes the trace of a matrix. Since $\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}$ is a Hermitian matrix, $\mathrm{tr}\{(\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*})^{2}\}=\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}$. Hence, with $\sigma^{2}=1$, we can finally state that

\mathbb{E}[Y]=\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}.

Next, we focus on obtaining concentration bounds. Just as with the expectation $\mathbb{E}[Y]$, we evaluate the behaviour of the deviation over each of the index sets $B$, $C$, $D$, and $E$.

1. Set $B$:

\sum_{(i,j,k,l)\in B}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})-\mathbb{E}\left(\sum_{(i,j,k,l)\in B}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=\sum_{i=1}^{n}\left(|a_{ii}|^{2}-\sigma^{2}\right)\left(|x_{i}|^{4}+|y_{i}|^{4}-2|x_{i}|^{2}|y_{i}|^{2}\right).

Note that for all $i\in[n]$, $|a_{ii}|^{2}-\sigma^{2}$ is a centered subexponential random variable.

2. Set $C$:

\sum_{(i,j,k,l)\in C}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})-\mathbb{E}\left(\sum_{(i,j,k,l)\in C}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=\sum_{i\neq j}\left(|a_{ij}|^{2}-\sigma^{2}\right)\left(|x_{i}|^{2}|x_{j}|^{2}-y_{i}\bar{y}_{j}x_{j}\bar{x}_{i}-x_{i}\bar{x}_{j}y_{j}\bar{y}_{i}+|y_{i}|^{2}|y_{j}|^{2}\right).

Again, note that for all $i\neq j$, $|a_{ij}|^{2}-\sigma^{2}$ is a centered subexponential random variable.

3. Set $D$:

\sum_{(i,j,k,l)\in D}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})-\mathbb{E}\left(\sum_{(i,j,k,l)\in D}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=\sum_{i\neq j}(a_{ij})^{2}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})^{2}.

Since $(a_{ij})^{2}=(a_{ij}^{r})^{2}-(a_{ij}^{i})^{2}+2\mathsf{i}\,a_{ij}^{r}a_{ij}^{i}$, it is straightforward to verify that for all $i\neq j$, $(a_{ij})^{2}$ is a centered subexponential random variable.

4. Set $E$:

\sum_{(i,j,k,l)\in E}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})-\mathbb{E}\left(\sum_{(i,j,k,l)\in E}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l})\right)=\sum_{(i,j,k,l)\in E}a_{ij}a_{kl}(x_{i}\bar{x}_{j}-y_{i}\bar{y}_{j})(x_{k}\bar{x}_{l}-y_{k}\bar{y}_{l}).

Since $a_{ij}$ and $a_{kl}$ are independent for $(i,j,k,l)\in E$, the product $a_{ij}a_{kl}$ is a centered subexponential random variable for every $(i,j,k,l)\in E$.

Take $\sigma^{2}=1$. We then have the following Bernstein-type inequality [34]:

\mathbb{P}\left(\left|\frac{1}{m}\sum_{d=1}^{m}|\langle A_{d},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}-\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}\right|\leq t\right)\geq 1-c_{0}\exp\left(-c_{1}m\min\left\{\frac{t^{2}}{K_{4}^{2}\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{2}^{4}},\frac{t}{K_{4}\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{\infty}^{2}}\right\}\right)

for some constants c0,c1>0c_{0},c_{1}>0.

We introduce the normalized variable $\epsilon=\frac{t}{\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}}$, so that

\mathbb{P}\left(\left|\frac{1}{m}\sum_{d=1}^{m}|\langle A_{d},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}-\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}\right|\geq\epsilon\|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}\right)\leq c_{0}\exp\bigl(-c_{1}mE(\epsilon)\bigr),

where E(ϵ):=min{ϵ2K2,ϵK}E(\epsilon):=\min\left\{\frac{\epsilon^{2}}{K^{2}},\frac{\epsilon}{K}\right\}.

Note that \|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F} is the distance metric d(\cdot,\cdot) defined in (2). Hence we can rewrite the high-probability result more concisely as

\displaystyle\mathbb{P}\left(\left|\frac{1}{m}\sum_{d=1}^{m}|\langle A_{d},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2}-d(\mathbf{x},\mathbf{y})^{2}\right|\geq\epsilon d(\mathbf{x},\mathbf{y})^{2}\right)\leq c_{0}\exp\left(-c_{1}mE(\epsilon)\right)
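This concentration can also be illustrated numerically. The following is a minimal Python sketch (illustrative only, not part of the analysis): it samples m Hermitian Gaussian matrices with \sigma^{2}=1 (real \mathcal{N}(0,1) diagonal entries and off-diagonal entries with independent \mathcal{N}(0,1/2) real and imaginary parts, mirrored to keep the matrix Hermitian), takes \langle A,M\rangle=\mathrm{Tr}(A^{*}M), and compares the empirical average of |\langle A_{d},\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|^{2} with \|\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\|_{F}^{2}. The choices of n, m, and the random seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def hermitian_gaussian(n, rng):
    # Hermitian Gaussian matrix with sigma^2 = 1: diagonal ~ N(0, 1),
    # off-diagonal ~ N(0, 1/2) + i N(0, 1/2), mirrored so that A = A^*.
    U = rng.normal(scale=np.sqrt(0.5), size=(n, n))
    V = rng.normal(scale=np.sqrt(0.5), size=(n, n))
    A = np.triu(U + 1j * V, k=1)
    A = A + A.conj().T
    A[np.diag_indices(n)] = rng.normal(size=n)
    return A

n, m = 20, 5000
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = rng.normal(size=n) + 1j * rng.normal(size=n)
M = np.outer(x, x.conj()) - np.outer(y, y.conj())

# Empirical average of |<A_d, M>|^2 over m independent draws, with <A, M> = Tr(A* M).
vals = np.array([np.vdot(hermitian_gaussian(n, rng), M).real for _ in range(m)])
empirical = np.mean(vals ** 2)
target = np.linalg.norm(M, 'fro') ** 2
print(empirical / target)   # close to 1, up to a Monte Carlo error of a few percent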

VII-B Proof of Lemma 4

Lemma 4.

Given \delta>0, let \mathcal{N}_{\delta}\subset S^{n-1} be a smallest \delta-net of the sphere S^{n-1}, i.e., a smallest set of points such that every point of S^{n-1} lies within distance \delta of some point of \mathcal{N}_{\delta}. Then, for any Hermitian matrix A\in\mathbb{C}^{n\times n}, we have

(12δ)sup𝐱1,𝐱2Sn1|A,𝐱1𝐱1𝐱2𝐱2|sup𝐱1,𝐱2𝒩δ|A,𝐱1𝐱1𝐱2𝐱2|(1+2δ)sup𝐱1,𝐱2Sn1|A,𝐱1𝐱1𝐱2𝐱2|.\displaystyle(1-2\delta)\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|\leq\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in\mathcal{N}_{\delta}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|\leq(1+2\delta)\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|.
Proof.

In the proof, we relate the supremum of |\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle| over \mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1} to its supremum over \mathbf{x}_{1},\mathbf{x}_{2}\in\mathcal{N}_{\delta}.

Since 𝒩δ\mathcal{N}_{\delta} covers Sn1S^{n-1}, 𝐱Sn1\forall\mathbf{x}\in S^{n-1}, 𝐮𝒩δ\exists\mathbf{u}\in\mathcal{N}_{\delta} such that 𝐱𝐮δ\|\mathbf{x}-\mathbf{u}\|\leq\delta.

Hence 𝐱1,𝐱2Sn1\forall\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}, 𝐲1,𝐲2𝒩δ\exists\mathbf{y}_{1},\mathbf{y}_{2}\in\mathcal{N}_{\delta} such that,

\displaystyle\ |\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle-\langle A,\mathbf{y}_{1}\mathbf{y}_{1}^{*}-\mathbf{y}_{2}\mathbf{y}_{2}^{*}\rangle|
\displaystyle\ \leq|\langle A\mathbf{x}_{1},\mathbf{x}_{1}\rangle-\langle A\mathbf{y}_{1},\mathbf{y}_{1}\rangle|+|\langle A\mathbf{x}_{2},\mathbf{x}_{2}\rangle-\langle A\mathbf{y}_{2},\mathbf{y}_{2}\rangle|
\displaystyle\ =|\langle A\mathbf{x}_{1},\mathbf{x}_{1}\rangle-\langle A\mathbf{x}_{1},\mathbf{y}_{1}\rangle+\langle A\mathbf{x}_{1},\mathbf{y}_{1}\rangle-\langle A\mathbf{y}_{1},\mathbf{y}_{1}\rangle|+|\langle A\mathbf{x}_{2},\mathbf{x}_{2}\rangle-\langle A\mathbf{x}_{2},\mathbf{y}_{2}\rangle+\langle A\mathbf{x}_{2},\mathbf{y}_{2}\rangle-\langle A\mathbf{y}_{2},\mathbf{y}_{2}\rangle|
\displaystyle\ =|\langle A\mathbf{x}_{1},\mathbf{x}_{1}-\mathbf{y}_{1}\rangle+\langle A(\mathbf{x}_{1}-\mathbf{y}_{1}),\mathbf{y}_{1}\rangle|+|\langle A\mathbf{x}_{2},\mathbf{x}_{2}-\mathbf{y}_{2}\rangle+\langle A(\mathbf{x}_{2}-\mathbf{y}_{2}),\mathbf{y}_{2}\rangle|
\displaystyle\ \leq 2\|A\|\|\mathbf{x}_{1}-\mathbf{y}_{1}\|+2\|A\|\|\mathbf{x}_{2}-\mathbf{y}_{2}\|
\displaystyle\ \leq 4\delta\|A\|

where A\|A\| denotes the spectral norm of the matrix AA, i.e.,

A=sup𝐱Sn1|A𝐱,𝐱|=12sup𝐱Sn1|A𝐱,𝐱|+12sup𝐲Sn1|A𝐲,𝐲|=12sup𝐱,𝐲Sn1|A,𝐱𝐱𝐲𝐲|\left\|A\right\|=\sup_{\mathbf{x}\in S^{n-1}}|\langle A\mathbf{x},\mathbf{x}\rangle|=\frac{1}{2}\sup_{\mathbf{x}\in S^{n-1}}|\langle A\mathbf{x},\mathbf{x}\rangle|+\frac{1}{2}\sup_{\mathbf{y}\in S^{n-1}}|\langle A\mathbf{y},\mathbf{y}\rangle|=\frac{1}{2}\sup_{\mathbf{x},\mathbf{y}\in S^{n-1}}|\langle A,\mathbf{x}\mathbf{x}^{*}-\mathbf{y}\mathbf{y}^{*}\rangle|

We conclude,

|A,𝐱1𝐱1𝐱2𝐱2||A,𝐲1𝐲1𝐲2𝐲2|4δA\displaystyle\ |\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|-|\langle A,\mathbf{y}_{1}\mathbf{y}_{1}^{*}-\mathbf{y}_{2}\mathbf{y}_{2}^{*}\rangle|\leq 4\delta\|A\|
|A,𝐲1𝐲1𝐲2𝐲2||A,𝐱1𝐱1𝐱2𝐱2|4δA\displaystyle\ |\langle A,\mathbf{y}_{1}\mathbf{y}_{1}^{*}-\mathbf{y}_{2}\mathbf{y}_{2}^{*}\rangle|\geq|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|-4\delta\|A\|

And,

|A,𝐲1𝐲1𝐲2𝐲2||A,𝐱1𝐱1𝐱2𝐱2|4δA\displaystyle\ |\langle A,\mathbf{y}_{1}\mathbf{y}_{1}^{*}-\mathbf{y}_{2}\mathbf{y}_{2}^{*}\rangle|-|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|\leq 4\delta\|A\|
|A,𝐲1𝐲1𝐲2𝐲2||A,𝐱1𝐱1𝐱2𝐱2|+4δA\displaystyle\ |\langle A,\mathbf{y}_{1}\mathbf{y}_{1}^{*}-\mathbf{y}_{2}\mathbf{y}_{2}^{*}\rangle|\leq|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|+4\delta\|A\|

Taking supremum,

\displaystyle\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in\mathcal{N}_{\delta}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle| \displaystyle\ \geq\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|-4\delta\|A\|
\displaystyle\ =(2-4\delta)\|A\|
\displaystyle\ =(1-2\delta)\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|
\displaystyle\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in\mathcal{N}_{\delta}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle| \displaystyle\ \leq\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|+4\delta\|A\|
\displaystyle\ =(2+4\delta)\|A\|
\displaystyle\ =(1+2\delta)\sup_{\mathbf{x}_{1},\mathbf{x}_{2}\in S^{n-1}}|\langle A,\mathbf{x}_{1}\mathbf{x}_{1}^{*}-\mathbf{x}_{2}\mathbf{x}_{2}^{*}\rangle|

VIII Appendix C : Non-convex landscape

VIII-A Supporting Lemmas

The following intermediate results are crucial in proving Theorem 2.

Lemma 5.

Let \{A_{d}\}_{d=1}^{m} be a set of Hermitian Gaussian random matrices. Then, with probability at least 1-4e^{-cmD(\epsilon)},

|1md=1mAd,ΔΔ¯Ad,𝐱𝐱¯Ad,Δ𝐱¯Ad,𝐱Δ¯|ϵ\displaystyle\left|\frac{1}{m}\sum_{d=1}^{m}\langle A_{d},\Delta\bar{\Delta}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle-\langle A_{d},\Delta\bar{\mathbf{x}}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\Delta}^{\top}\rangle\right|\leq\epsilon

where

D(ϵ):=min{ϵ24K42Δ24𝐱24,ϵ2K4Δ2𝐱2}D(\epsilon):=\min\left\{\frac{\epsilon^{2}}{4K_{4}^{2}\|\Delta\|_{2}^{4}\|\mathbf{x}\|_{2}^{4}},\frac{\epsilon}{2K_{4}\|\Delta\|_{\infty}^{2}\|\mathbf{x}\|_{\infty}^{2}}\right\}
Proof.

Let An×nA\in\mathbb{C}^{n\times n} be a complex Hermitian Gaussian random matrix, i.e.

  1. \forall~i, a_{ii}\sim\mathcal{N}(0,\sigma^{2}).

  2. \forall~i,j, i\neq j, a_{ij}\sim\mathcal{N}(0,\frac{\sigma^{2}}{2})+i\mathcal{N}(0,\frac{\sigma^{2}}{2}).

Define the random variable YY,

Y\displaystyle Y =1md=1mAd,ΔΔ¯Ad,𝐱𝐱¯Ad,Δ𝐱¯Ad,𝐱Δ¯\displaystyle\ =\frac{1}{m}\sum_{d=1}^{m}\langle A_{d},\Delta\bar{\Delta}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle-\langle A_{d},\Delta\bar{\mathbf{x}}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\Delta}^{\top}\rangle
=1md=1m(ijaijdΔiΔ¯j)(klakld𝐱k𝐱¯l)(ijaijd𝐱iΔ¯j)(klakldΔk𝐱¯l)\displaystyle\ =\frac{1}{m}\sum_{d=1}^{m}\left(\sum_{ij}a_{ij}^{d}\Delta_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}^{d}\mathbf{x}_{k}\bar{\mathbf{x}}_{l}\right)-\left(\sum_{ij}a_{ij}^{d}\mathbf{x}_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}^{d}\Delta_{k}\bar{\mathbf{x}}_{l}\right)

For any A_{d}, split the summation over (i,j,k,l)\in[1,n]^{4} into the following four sets:

  1. A:=\{(i,j,k,l)\,|\,i=j=k=l\}

  2. B:=\{(i,j,k,l)\,|\,i=k,j=l\}\cap A^{C}

  3. C:=\{(i,j,k,l)\,|\,i=l,j=k\}\cap A^{C}

  4. D:=[1,n]^{4}\cap A^{C}\cap B^{C}\cap C^{C}

We calculate the expectation of the sum over each set individually:

  1. For set A,

    𝔼[(ijaijΔiΔ¯j)(klakl𝐱k𝐱¯l)(ijaij𝐱iΔ¯j)(klaklΔk𝐱¯l)]\displaystyle\ \mathbb{E}\left[\left(\sum_{ij}a_{ij}\Delta_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}\mathbf{x}_{k}\bar{\mathbf{x}}_{l}\right)-\left(\sum_{ij}a_{ij}\mathbf{x}_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}\Delta_{k}\bar{\mathbf{x}}_{l}\right)\right]
    \displaystyle\ =\mathbb{E}\left[a_{ii}^{2}\Delta_{i}\bar{\Delta}_{i}\mathbf{x}_{i}\bar{\mathbf{x}}_{i}-a_{ii}^{2}\Delta_{i}\bar{\Delta}_{i}\mathbf{x}_{i}\bar{\mathbf{x}}_{i}\right]=\mathbb{E}[0]=0
  2. For set B,

    𝔼[(ijaijΔiΔ¯j)(klakl𝐱k𝐱¯l)(ijaij𝐱iΔ¯j)(klaklΔk𝐱¯l)]\displaystyle\ \mathbb{E}\left[\left(\sum_{ij}a_{ij}\Delta_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}\mathbf{x}_{k}\bar{\mathbf{x}}_{l}\right)-\left(\sum_{ij}a_{ij}\mathbf{x}_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}\Delta_{k}\bar{\mathbf{x}}_{l}\right)\right]
    =𝔼[aij2ΔiΔ¯j𝐱i𝐱¯jaij2ΔiΔ¯j𝐱i𝐱¯j]=𝔼[0]=0\displaystyle\ =\mathbb{E}\left[a_{ij}^{2}\Delta_{i}\bar{\Delta}_{j}\mathbf{x}_{i}\bar{\mathbf{x}}_{j}-a_{ij}^{2}\Delta_{i}\bar{\Delta}_{j}\mathbf{x}_{i}\bar{\mathbf{x}}_{j}\right]=\mathbb{E}[0]=0
  3. For set C, since the matrix A is Hermitian, a_{ij}=\bar{a}_{ji},

    𝔼[(ijaijΔiΔ¯j)(klakl𝐱k𝐱¯l)(ijaij𝐱iΔ¯j)(klaklΔk𝐱¯l)]\displaystyle\ \mathbb{E}\left[\left(\sum_{ij}a_{ij}\Delta_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}\mathbf{x}_{k}\bar{\mathbf{x}}_{l}\right)-\left(\sum_{ij}a_{ij}\mathbf{x}_{i}\bar{\Delta}_{j}\right)\left(\sum_{kl}a_{kl}\Delta_{k}\bar{\mathbf{x}}_{l}\right)\right]
    \displaystyle\ =\mathbb{E}\left[|a_{ij}|^{2}\Delta_{i}\bar{\Delta}_{j}\mathbf{x}_{j}\bar{\mathbf{x}}_{i}-|a_{ij}|^{2}\Delta_{j}\bar{\Delta}_{j}\mathbf{x}_{i}\bar{\mathbf{x}}_{i}\right]

    Pairing the terms indexed by (i,j) and (j,i), notice that for all i\neq j,

    =|aji|2ΔjΔ¯i𝐱i𝐱¯j+|aij|2ΔiΔ¯j𝐱j𝐱¯i|aji|2ΔiΔ¯i𝐱j𝐱¯j|aij|2ΔjΔ¯j𝐱i𝐱¯i\displaystyle=|a_{ji}|^{2}\Delta_{j}\bar{\Delta}_{i}\mathbf{x}_{i}\bar{\mathbf{x}}_{j}+|a_{ij}|^{2}\Delta_{i}\bar{\Delta}_{j}\mathbf{x}_{j}\bar{\mathbf{x}}_{i}-|a_{ji}|^{2}\Delta_{i}\bar{\Delta}_{i}\mathbf{x}_{j}\bar{\mathbf{x}}_{j}-|a_{ij}|^{2}\Delta_{j}\bar{\Delta}_{j}\mathbf{x}_{i}\bar{\mathbf{x}}_{i}

    Since |a_{ij}|^{2}=|a_{ji}|^{2}, this becomes

    =|aji|2[ΔjΔ¯i𝐱i𝐱¯j+ΔiΔ¯j𝐱j𝐱¯iΔiΔ¯i𝐱j𝐱¯jΔjΔ¯j𝐱i𝐱¯i]\displaystyle\ =|a_{ji}|^{2}\left[\Delta_{j}\bar{\Delta}_{i}\mathbf{x}_{i}\bar{\mathbf{x}}_{j}+\Delta_{i}\bar{\Delta}_{j}\mathbf{x}_{j}\bar{\mathbf{x}}_{i}-\Delta_{i}\bar{\Delta}_{i}\mathbf{x}_{j}\bar{\mathbf{x}}_{j}-\Delta_{j}\bar{\Delta}_{j}\mathbf{x}_{i}\bar{\mathbf{x}}_{i}\right]
    \displaystyle\ =|a_{ji}|^{2}\left[\Delta_{j}\bar{\Delta}_{i}\mathbf{x}_{i}\bar{\mathbf{x}}_{j}+\Delta_{i}\bar{\Delta}_{j}\mathbf{x}_{j}\bar{\mathbf{x}}_{i}-|\Delta_{i}|^{2}|\mathbf{x}_{j}|^{2}-|\Delta_{j}|^{2}|\mathbf{x}_{i}|^{2}\right]\leq 0
  4. For set D, the index pattern ensures that a_{ij} and a_{kl} are independent of each other, and hence,

    𝔼(aijaklΔ¯j𝐱¯l(Δi𝐱kΔk𝐱i))=0\displaystyle\mathbb{E}\left(\sum a_{ij}a_{kl}\bar{\Delta}_{j}\bar{\mathbf{x}}_{l}\left(\Delta_{i}\mathbf{x}_{k}-\Delta_{k}\mathbf{x}_{i}\right)\right)=0

Combining the four sets, we can conclude that

\mathbb{E}\left[\frac{1}{m}\sum_{d=1}^{m}\left(\langle A_{d},\Delta\bar{\Delta}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle-\langle A_{d},\Delta\bar{\mathbf{x}}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\Delta}^{\top}\rangle\right)\right]\leq 0

We now focus on obtaining concentration bounds, evaluating the behaviour of the sum over the elements of set D:

1md=1m((i,j,k,l)DaijdakldΔ¯j𝐱¯l(Δi𝐱kΔk𝐱i))𝔼(1md=1m(i,j,k,l)DaijdakldΔ¯j𝐱¯l(Δi𝐱kΔk𝐱i))\displaystyle\frac{1}{m}\sum_{d=1}^{m}\left(\sum_{(i,j,k,l)\in D}a_{ij}^{d}a_{kl}^{d}\bar{\Delta}_{j}\bar{\mathbf{x}}_{l}\left(\Delta_{i}\mathbf{x}_{k}-\Delta_{k}\mathbf{x}_{i}\right)\right)-\mathbb{E}\left(\frac{1}{m}\sum_{d=1}^{m}\sum_{{(i,j,k,l)\in D}}a_{ij}^{d}a_{kl}^{d}\bar{\Delta}_{j}\bar{\mathbf{x}}_{l}\left(\Delta_{i}\mathbf{x}_{k}-\Delta_{k}\mathbf{x}_{i}\right)\right)
\displaystyle=\frac{1}{m}\sum_{d=1}^{m}\sum_{(i,j,k,l)\in D}a_{ij}^{d}a_{kl}^{d}\bar{\Delta}_{j}\bar{\mathbf{x}}_{l}\left(\Delta_{i}\mathbf{x}_{k}-\Delta_{k}\mathbf{x}_{i}\right)

Each summand a_{ij}^{d}a_{kl}^{d}\bar{\Delta}_{j}\bar{\mathbf{x}}_{l}\left(\Delta_{i}\mathbf{x}_{k}-\Delta_{k}\mathbf{x}_{i}\right) is a centered subexponential random variable. Hence, using a Bernstein-type inequality [34], we can say that,

Pr(|1md=1m(i,j,k,l)DaijdakldΔ¯j𝐱¯l(Δi𝐱kΔk𝐱i)|t)\displaystyle Pr\left(\left|\frac{1}{m}\sum_{d=1}^{m}\sum_{(i,j,k,l)\in D}a_{ij}^{d}a_{kl}^{d}\bar{\Delta}_{j}\bar{\mathbf{x}}_{l}\left(\Delta_{i}\mathbf{x}_{k}-\Delta_{k}\mathbf{x}_{i}\right)\right|\geq t\right)
4exp(cmmin{t24K42Δ24𝐱24,t2K4Δ2𝐱2})\displaystyle\leq 4\text{exp}\left(-cm\min\left\{\frac{t^{2}}{4K_{4}^{2}\|\Delta\|_{2}^{4}\|\mathbf{x}\|_{2}^{4}},\frac{t}{2K_{4}\|\Delta\|_{\infty}^{2}\|\mathbf{x}\|_{\infty}^{2}}\right\}\right)

where K_{4}:=\max\left\{\big\|\sum_{(i,j,k,l)\in D}\mathrm{Re}(\bar{a}_{ij}a_{kl}^{d})\big\|_{\psi_{1}},\ \big\|\sum_{(i,j,k,l)\in D}\mathrm{Im}(\bar{a}_{ij}a_{kl}^{d})\big\|_{\psi_{1}}\right\} is the associated subexponential (\psi_{1}) norm. Thus we can argue that,

\displaystyle\mathbb{P}\left(\left|\frac{1}{m}\sum_{d=1}^{m}\langle A_{d},\Delta\bar{\Delta}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle-\langle A_{d},\Delta\bar{\mathbf{x}}^{\top}\rangle\langle A_{d},\mathbf{x}\bar{\Delta}^{\top}\rangle\right|\geq t\right)\leq 4\exp\left(-cmD(t)\right)

where,

D(t):=min{t24K42Δ24𝐱24,t2K4Δ2𝐱2}D(t):=\min\left\{\frac{t^{2}}{4K_{4}^{2}\|\Delta\|_{2}^{4}\|\mathbf{x}\|_{2}^{4}},\frac{t}{2K_{4}\|\Delta\|_{\infty}^{2}\|\mathbf{x}\|_{\infty}^{2}}\right\}

Throughout the rest of the paper, define \Delta=\mathbf{x}-e^{i\phi}\mathbf{z} such that \phi=\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}-e^{i\theta}\mathbf{z}\|_{2} for any \mathbf{x},\mathbf{z}\in\mathbb{C}^{n}.

Lemma 6.

For any \mathbf{x}\in\mathbb{C}^{n},

𝐱𝐱F2=𝐱24\|\mathbf{x}\mathbf{x}^{*}\|_{F}^{2}=\|\mathbf{x}\|_{2}^{4}
Proof.
𝐱𝐱F2\displaystyle\|\mathbf{x}\mathbf{x}^{*}\|_{F}^{2} =i,j=1n|xix¯j|2=i,j=1n(xix¯j)(xix¯j)=i,j=1nxjx¯ixix¯j\displaystyle\ =\sum_{i,j=1}^{n}|x_{i}\bar{x}_{j}|^{2}=\sum_{i,j=1}^{n}(x_{i}\bar{x}_{j})^{*}(x_{i}\bar{x}_{j})=\sum_{i,j=1}^{n}x_{j}\bar{x}_{i}x_{i}\bar{x}_{j}
=i,j=1n|xj|2|xi|2=(i=1n|xi|2)2=𝐱24\displaystyle\ =\sum_{i,j=1}^{n}|x_{j}|^{2}|x_{i}|^{2}=(\sum_{i=1}^{n}|x_{i}|^{2})^{2}=\|\mathbf{x}\|_{2}^{4}
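As a quick sanity check (illustrative only, not part of the proof), Lemma 6 can be verified numerically for a random complex vector; the dimension and seed below are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8) + 1j * rng.normal(size=8)
lhs = np.linalg.norm(np.outer(x, x.conj()), 'fro') ** 2   # ||x x*||_F^2
rhs = np.linalg.norm(x) ** 4                              # ||x||_2^4
print(np.isclose(lhs, rhs))                               # True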

Lemma 7.

The vectors \mathbf{x}-e^{i\phi}\mathbf{z} and \mathbf{x}+e^{i\phi}\mathbf{z} satisfy \mathrm{Im}(\langle\mathbf{x}-e^{i\phi}\mathbf{z},\mathbf{x}+e^{i\phi}\mathbf{z}\rangle)=0, where \phi=\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}-e^{i\theta}\mathbf{z}\|_{2}.

Proof.

We know from Lemma 3.7 [44] that x,zn\forall\textbf{x},\textbf{z}\in\mathbb{C}^{n} u,vn\exists\textbf{u},\textbf{v}\in\mathbb{C}^{n} such that,

xxzz=uv+vu=[[u,v]]\textbf{x}\textbf{x}^{*}-\textbf{z}\textbf{z}^{*}=\textbf{u}\textbf{v}^{*}+\textbf{v}\textbf{u}^{*}=[[\textbf{u},\textbf{v}]]

It can be easily verified that one family of such pairs \mathbf{u},\mathbf{v} (up to scaling) is given by

u=xeiθz,v=x+eiθz,θ[0,2π]\textbf{u}=\textbf{x}-e^{i\theta}\textbf{z},~{}~{}~{}\textbf{v}=\textbf{x}+e^{i\theta}\textbf{z},~{}~{}~{}\forall\theta\in[0,2\pi]

Next, we focus on \langle\mathbf{u},\mathbf{v}\rangle. We argue that there exists \theta such that \mathrm{Im}(\langle\mathbf{x}-e^{i\theta}\mathbf{z},\mathbf{x}+e^{i\theta}\mathbf{z}\rangle)=0. To this end, consider the following:

𝐮,𝐯\displaystyle\langle\mathbf{u},\mathbf{v}\rangle =𝐮T𝐯¯\displaystyle\ =\mathbf{u}^{T}\bar{\mathbf{v}}
=(xeiθz)T(x+eiθz)¯\displaystyle\ =\left(\textbf{x}-e^{i\theta}\textbf{z}\right)^{T}\overline{(\textbf{x}+e^{i\theta}\textbf{z})}
=(xeiθz)T(x¯+eiθz¯)\displaystyle\ =\left(\textbf{x}-e^{i\theta}\textbf{z}\right)^{T}\left(\bar{\textbf{x}}+e^{-i\theta}\bar{\textbf{z}}\right)
=𝐱,𝐱eiθ𝐳,𝐱+eiθ𝐱,𝐳𝐳,𝐳\displaystyle\ =\langle\mathbf{x},\mathbf{x}\rangle-e^{i\theta}\langle\mathbf{z},\mathbf{x}\rangle+e^{-i\theta}\langle\mathbf{x},\mathbf{z}\rangle-\langle\mathbf{z},\mathbf{z}\rangle
=𝐱,𝐱𝐳,𝐳2iIm(eiθ𝐳,𝐱)\displaystyle\ =\langle\mathbf{x},\mathbf{x}\rangle-\langle\mathbf{z},\mathbf{z}\rangle-2i\text{Im}(e^{i\theta}\langle\mathbf{z},\mathbf{x}\rangle)

\mathrm{Im}(e^{i\theta}\langle\mathbf{z},\mathbf{x}\rangle) can vanish only if \theta=\omega (mod \pi), where \omega\in[0,2\pi] is the phase of \langle\mathbf{x},\mathbf{z}\rangle, i.e., \omega is such that \langle\mathbf{x},\mathbf{z}\rangle=e^{i\omega}|\langle\mathbf{x},\mathbf{z}\rangle|. Next we prove that \omega=\phi, where

ϕ=argminθ[0,2π]𝐱eiθ𝐳2\phi=\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}-e^{i\theta}\mathbf{z}\|_{2}

Consider the following argument,

argminθ[0,2π]𝐱eiθ𝐳22\displaystyle\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}-e^{i\theta}\mathbf{z}\|_{2}^{2} =argminθ[0,2π](𝐱eiθ𝐳)(𝐱eiθ𝐳)\displaystyle\ =\underset{\theta\in[0,2\pi]}{\arg\min}\left(\mathbf{x}-e^{i\theta}\mathbf{z}\right)^{*}\left(\mathbf{x}-e^{i\theta}\mathbf{z}\right)
=argminθ[0,2π]𝐱2+𝐳2eiθ𝐳𝐱eiθ𝐱𝐳\displaystyle\ =\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}\|^{2}+\|\mathbf{z}\|^{2}-e^{-i\theta}\mathbf{z}^{*}\mathbf{x}-e^{i\theta}\mathbf{x}^{*}\mathbf{z}
=argminθ[0,2π]𝐱2+𝐳22Re(eiθ𝐱,𝐳)\displaystyle\ =\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}\|^{2}+\|\mathbf{z}\|^{2}-2Re(e^{-i\theta}\langle\mathbf{x},\mathbf{z}\rangle)
\displaystyle\ =\underset{\theta\in[0,2\pi]}{\arg\max}\ Re(e^{-i\theta}\langle\mathbf{x},\mathbf{z}\rangle)

It can be seen easily that the argmaxθ[0,2π]Re(eiθ𝐱,𝐳)\underset{\theta\in[0,2\pi]}{\arg\max}Re(e^{-i\theta}\langle\mathbf{x},\mathbf{z}\rangle) is achieved when θ=ω\theta=\omega. Thus we have proved that ϕ=ω\phi=\omega. ∎
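This alignment property can be spot-checked numerically (an illustrative sketch only; the minimizing phase is found by a brute-force grid search, and np.vdot(u, v) computes the sum of \bar{u}_{i}v_{i}, whose imaginary part vanishes exactly when \mathrm{Im}(\langle\mathbf{u},\mathbf{v}\rangle) above does).

import numpy as np

rng = np.random.default_rng(2)
n = 6
x = rng.normal(size=n) + 1j * rng.normal(size=n)
z = rng.normal(size=n) + 1j * rng.normal(size=n)

# Find phi = argmin_theta ||x - e^{i theta} z||_2 by a fine grid search over theta.
thetas = np.linspace(0.0, 2.0 * np.pi, 200001)
dists = np.linalg.norm(x[None, :] - np.exp(1j * thetas)[:, None] * z[None, :], axis=1)
phi = thetas[np.argmin(dists)]

u = x - np.exp(1j * phi) * z
v = x + np.exp(1j * phi) * z
print(np.vdot(u, v).imag)   # approximately 0, limited only by the grid resolution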

Lemma 8.

Let 𝐱,𝐳n\mathbf{x},\mathbf{z}\in\mathbb{C}^{n}. Then,

\|(\mathbf{x}-e^{i\phi}\mathbf{z})(\mathbf{x}-e^{i\phi}\mathbf{z})^{*}\|_{F}^{2}\leq 2\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}^{2}

where ϕ=argminθ[0,2π]𝐱eiθ𝐳\phi=\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}-e^{i\theta}\mathbf{z}\|

Proof.

Note,

argminθ[0,2π]𝐱eiθ𝐳2\displaystyle\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}-e^{i\theta}\mathbf{z}\|^{2} =argminθ[0,2π](𝐱eiθ𝐳)(𝐱eiθ𝐳)\displaystyle\ =\underset{\theta\in[0,2\pi]}{\arg\min}\left(\mathbf{x}-e^{i\theta}\mathbf{z}\right)^{*}\left(\mathbf{x}-e^{i\theta}\mathbf{z}\right)
=argminθ[0,2π]𝐱2+𝐳2eiθ𝐳𝐱eiθ𝐱𝐳\displaystyle\ =\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}\|^{2}+\|\mathbf{z}\|^{2}-e^{-i\theta}\mathbf{z}^{*}\mathbf{x}-e^{i\theta}\mathbf{x}^{*}\mathbf{z}
=argminθ[0,2π]𝐱2+𝐳22Re(𝐱,eiθ𝐳)\displaystyle\ =\underset{\theta\in[0,2\pi]}{\arg\min}\|\mathbf{x}\|^{2}+\|\mathbf{z}\|^{2}-2Re(\langle\mathbf{x},e^{i\theta}\mathbf{z}\rangle)

The minimum can only be achieved at a point where Re(𝐱(eiϕ𝐳))0Re(\mathbf{x}^{*}(e^{i\phi}\mathbf{z}))\geq 0. Further notice that the following relation holds,

𝐱𝐱𝐳𝐳+ΔΔ=𝐱Δ+Δ𝐱\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}+\Delta\Delta^{*}=\mathbf{x}\Delta^{*}+\Delta\mathbf{x}^{*} (9)

Hence we can see that,

𝐱𝐱𝐳𝐳F2\displaystyle\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}^{2} =𝐱Δ+Δ𝐱ΔΔF2\displaystyle\ =\|\mathbf{x}\Delta^{*}+\Delta\mathbf{x}^{*}-\Delta\Delta^{*}\|_{F}^{2}

We know that for any matrix AA, AF2=\|A\|_{F}^{2}=Tr(AHA)(A^{H}A),

𝐱𝐱𝐳𝐳F2\displaystyle\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}^{2} =Tr((𝐱Δ+Δ𝐱ΔΔ)(𝐱Δ+Δ𝐱ΔΔ))\displaystyle\ =Tr\left((\mathbf{x}\Delta^{*}+\Delta\mathbf{x}^{*}-\Delta\Delta^{*})^{*}(\mathbf{x}\Delta^{*}+\Delta\mathbf{x}^{*}-\Delta\Delta^{*})\right)
=(𝐱ΔF2+(𝐱,Δ)2+(Δ,𝐱)2+Δ𝐱F22𝐱,ΔΔF22Δ,𝐱ΔF2+ΔΔF2)\displaystyle\ =\left(\|\mathbf{x}\Delta^{*}\|_{F}^{2}+(\langle\mathbf{x},\Delta\rangle)^{2}+(\langle\Delta,\mathbf{x}\rangle)^{2}+\|\Delta\mathbf{x}^{*}\|_{F}^{2}-2\langle\mathbf{x},\Delta\rangle\|\Delta\|_{F}^{2}-2\langle\Delta,\mathbf{x}\rangle\|\Delta\|_{F}^{2}+\|\Delta\Delta^{*}\|_{F}^{2}\right)

Note that,

\displaystyle(\langle\mathbf{x},\Delta\rangle)^{2}+(\langle\Delta,\mathbf{x}\rangle)^{2} =(\langle\mathbf{x},\Delta\rangle)^{2}+\overline{(\langle\mathbf{x},\Delta\rangle)^{2}} =2Re\left((\langle\mathbf{x},\Delta\rangle)^{2}\right)
𝐱,ΔΔF2+Δ,𝐱ΔF2\displaystyle\langle\mathbf{x},\Delta\rangle\|\Delta\|_{F}^{2}+\langle\Delta,\mathbf{x}\rangle\|\Delta\|_{F}^{2} =𝐱,ΔΔF2+𝐱,Δ¯ΔF2\displaystyle=\langle\mathbf{x},\Delta\rangle\|\Delta\|_{F}^{2}+\overline{\langle\mathbf{x},\Delta\rangle}\|\Delta\|_{F}^{2} =2Re(𝐱,Δ)ΔF2\displaystyle=2Re(\langle\mathbf{x},\Delta\rangle)\|\Delta\|_{F}^{2}

Thus we can conclude,

\displaystyle\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}^{2} =\left(2\|\mathbf{x}\|_{2}^{2}\|\Delta\|_{2}^{2}+2Re((\langle\mathbf{x},\Delta\rangle)^{2})-4Re(\langle\mathbf{x},\Delta\rangle)\|\Delta\|_{F}^{2}+\|\Delta\Delta^{*}\|_{F}^{2}\right)
=(2𝐱𝐱ΔΔ+2Re((𝐱,Δ)2)4Re(𝐱ΔΔΔ)+ΔΔF2)\displaystyle\ =\left(2\mathbf{x}^{*}\mathbf{x}\Delta^{*}\Delta+2Re((\langle\mathbf{x},\Delta\rangle)^{2})-4Re(\mathbf{x}^{*}\Delta\Delta^{*}\Delta)+\|\Delta\Delta^{*}\|_{F}^{2}\right)

Since \mathbf{x}^{*}\mathbf{x}\Delta^{*}\Delta=\|\mathbf{x}\|_{2}^{2}\|\Delta\|_{2}^{2}, it is a real value.

\displaystyle\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}^{2} =2\mathbf{x}^{*}\left(\mathbf{x}-\Delta\right)\Delta^{*}\Delta+2Re((\langle\mathbf{x},\Delta\rangle)^{2})-2Re(\mathbf{x}^{*}\Delta\Delta^{*}\Delta)+\|\Delta\Delta^{*}\|_{F}^{2}
\displaystyle\ =2\mathbf{x}^{*}\left(\mathbf{x}-\Delta\right)\Delta^{*}\Delta+(\langle\mathbf{x},\Delta\rangle)^{2}+(\langle\Delta,\mathbf{x}\rangle)^{2}-\langle\mathbf{x},\Delta\rangle\|\Delta\|_{F}^{2}-\langle\Delta,\mathbf{x}\rangle\|\Delta\|_{F}^{2}+\|\Delta\Delta^{*}\|_{F}^{2}
\displaystyle\ =2\mathbf{x}^{*}\left(\mathbf{x}-\Delta\right)\Delta^{*}\Delta+\left(\langle\mathbf{x},\Delta\rangle-\frac{1}{2}\langle\Delta,\Delta\rangle\right)^{2}+\left(\langle\Delta,\mathbf{x}\rangle-\frac{1}{2}\langle\Delta,\Delta\rangle\right)^{2}+\frac{1}{2}\|\Delta\Delta^{*}\|_{F}^{2}
\displaystyle\ =2\mathbf{x}^{*}\left(\mathbf{x}-\Delta\right)\Delta^{*}\Delta+2Re\left(\left(\langle\Delta,\mathbf{x}-\frac{1}{2}\Delta\rangle\right)^{2}\right)+\frac{1}{2}\|\Delta\Delta^{*}\|_{F}^{2}
\displaystyle\ =2\mathbf{x}^{*}e^{i\phi}\mathbf{z}\,\Delta^{*}\Delta+\frac{1}{2}Re\left(\left(\langle\Delta,\mathbf{x}+e^{i\phi}\mathbf{z}\rangle\right)^{2}\right)+\frac{1}{2}\|\Delta\Delta^{*}\|_{F}^{2}

From Lemma 7, \mathrm{Im}\left(\langle\Delta,\mathbf{x}+e^{i\phi}\mathbf{z}\rangle\right)=0, so the middle term equals \frac{1}{2}\left(\langle\Delta,\mathbf{x}+e^{i\phi}\mathbf{z}\rangle\right)^{2}\geq 0; moreover, by Lemma 7 together with the observation above that Re(\mathbf{x}^{*}(e^{i\phi}\mathbf{z}))\geq 0, the quantity \mathbf{x}^{*}e^{i\phi}\mathbf{z} is real and nonnegative, so the first term is also nonnegative. Hence we can say,

\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|_{F}^{2}\geq\frac{1}{2}\|\Delta\Delta^{*}\|_{F}^{2}
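The inequality of Lemma 8 can likewise be spot-checked numerically (illustrative only; here the aligning phase is computed in closed form as \phi=-\arg(\mathbf{x}^{*}\mathbf{z}), which is the minimizer identified in Lemma 7, and the dimension and seed are arbitrary).

import numpy as np

rng = np.random.default_rng(3)
n = 6
x = rng.normal(size=n) + 1j * rng.normal(size=n)
z = rng.normal(size=n) + 1j * rng.normal(size=n)

phi = -np.angle(np.vdot(x, z))        # np.vdot(x, z) = x^* z; phi minimizes ||x - e^{i theta} z||_2
delta = x - np.exp(1j * phi) * z

lhs = np.linalg.norm(np.outer(delta, delta.conj()), 'fro') ** 2                      # ||Delta Delta^*||_F^2
rhs = 2 * np.linalg.norm(np.outer(x, x.conj()) - np.outer(z, z.conj()), 'fro') ** 2  # 2 ||x x^* - z z^*||_F^2
print(lhs <= rhs)                     # True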

VIII-B Wirtinger Calculus

We use standard arguments from Wirtinger calculus [30] to prove Theorem 2. The basic idea is to view the \ell_{2}-loss function f in (P2) as a function of the pair (\mathbf{x},\bar{\mathbf{x}}) rather than as a function of the single complex variable \mathbf{x}\in\mathbb{C}^{n}. This workaround is needed because a real-valued function of a complex variable is in general not complex-differentiable in the sense of the Cauchy–Riemann equations [45]. Hence we map the function f:\mathbb{C}^{n}\rightarrow\mathbb{R} to a function g of the pair (\mathbf{x},\bar{\mathbf{x}}) and, instead of analysing the properties of \nabla^{2}f, we analyse the properties of \nabla^{2}g.

We first introduce the mapped function g and the corresponding expressions for \nabla g and \nabla^{2}g:

f(𝐱)=g(𝐱,𝐱¯)\displaystyle f(\mathbf{x})=g(\mathbf{x},\bar{\mathbf{x}}) =1mi=1mgi(𝐱,𝐱¯)\displaystyle\ =\frac{1}{m}\sum_{i=1}^{m}g_{i}(\mathbf{x},\bar{\mathbf{x}})
\displaystyle\ =\frac{1}{m}\sum_{i=1}^{m}|\bar{\mathbf{x}}^{\top}A_{i}\mathbf{x}-c_{i}|^{2}

For the gradient g\nabla g we have,

\displaystyle\nabla g(\mathbf{x},\bar{\mathbf{x}})=\frac{1}{m}\sum_{i=1}^{m}\begin{bmatrix}(\bar{\mathbf{x}}^{\top}A_{i}\mathbf{x}-c_{i})A_{i}\mathbf{x}\\ (\bar{\mathbf{x}}^{\top}A_{i}\mathbf{x}-c_{i})A_{i}\bar{\mathbf{x}}\end{bmatrix}

For the Hessian \nabla^{2}g, we have

2g(𝐱,𝐱¯)=1mi=1m[(2𝐱¯Ai𝐱ci)Ai(Ai𝐱)(Ai𝐱)(Ai𝐱¯)(Ai𝐱¯)(2𝐱¯Ai𝐱ci)Ai]\nabla^{2}g(\mathbf{x},\bar{\mathbf{x}})=\frac{1}{m}\sum_{i=1}^{m}\begin{bmatrix}(2\bar{\mathbf{x}}^{\top}A_{i}\mathbf{x}-c_{i})A_{i}&(A_{i}\mathbf{x})(A_{i}\mathbf{x})^{\top}\\ (A_{i}\bar{\mathbf{x}})(A_{i}\bar{\mathbf{x}})^{\top}&(2\bar{\mathbf{x}}^{\top}A_{i}\mathbf{x}-c_{i})A_{i}\end{bmatrix}

The following can be verified easily,

g(𝐱),[ΔΔ¯]\displaystyle\langle\nabla g(\mathbf{x}),\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}\rangle =1mi=1mAi,𝐱𝐱¯𝐳(𝐳¯)Ai,𝐱Δ¯+Δ𝐱¯\displaystyle\ =\frac{1}{m}\sum_{i=1}^{m}\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\Delta}^{\top}+\Delta\bar{\mathbf{x}}^{\top}\rangle
\displaystyle\ =\frac{1}{m}\sum_{i=1}^{m}\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}+\Delta\bar{\Delta}^{\top}\rangle (10)
[ΔΔ¯]2g(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix} =1mi=1m[ΔΔ¯]2gi(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\ =\frac{1}{m}\sum_{i=1}^{m}\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g_{i}(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}
\displaystyle\ =\frac{1}{m}\sum_{i=1}^{m}(2\mathbf{x}^{\top}A_{i}\bar{\mathbf{x}}-c_{i})(\bar{\Delta}^{\top}A_{i}\Delta+\Delta^{\top}A_{i}\bar{\Delta})+\left((\Delta^{\top}A_{i}\bar{\mathbf{x}})^{2}+(\bar{\Delta}^{\top}A_{i}\mathbf{x})^{2}\right)
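To make the Wirtinger-calculus viewpoint concrete, the following is a minimal gradient-descent sketch on the empirical \ell_{2} loss of (P2) (illustrative only: the step-size rule, the factor-of-2 convention in the gradient with respect to \bar{\mathbf{x}}, the problem sizes, the normalization of \mathbf{z}, and the random initialization are assumptions made for the sketch, not choices prescribed by the analysis).

import numpy as np

rng = np.random.default_rng(4)
n, m = 10, 80

def hermitian_gaussian(n, rng):
    # Hermitian Gaussian measurement matrix (sigma^2 = 1).
    U = rng.normal(scale=np.sqrt(0.5), size=(n, n))
    V = rng.normal(scale=np.sqrt(0.5), size=(n, n))
    A = np.triu(U + 1j * V, k=1)
    A = A + A.conj().T
    A[np.diag_indices(n)] = rng.normal(size=n)
    return A

A = [hermitian_gaussian(n, rng) for _ in range(m)]
z = rng.normal(size=n) + 1j * rng.normal(size=n)
z /= np.linalg.norm(z)                                   # ground truth (normalized for convenience)
c = np.array([np.vdot(z, Ai @ z).real for Ai in A])      # quadratic measurements c_i = z^* A_i z

def residuals(x):
    return np.array([np.vdot(x, Ai @ x).real for Ai in A]) - c

def loss(x):
    return np.mean(residuals(x) ** 2)

def wirtinger_grad(x):
    # 2 (x^* A_i x - c_i) A_i x averaged over i: the gradient of the loss with
    # respect to conj(x), up to the constant-factor convention for Wirtinger derivatives.
    r = residuals(x)
    return 2.0 * np.mean([ri * (Ai @ x) for ri, Ai in zip(r, A)], axis=0)

x = rng.normal(size=n) + 1j * rng.normal(size=n)          # arbitrary (non-spectral) initialization
x /= np.linalg.norm(x)
eta = 0.1
for _ in range(10000):
    step = x - eta * wirtinger_grad(x)
    if loss(step) > loss(x):
        eta *= 0.5                                        # crude backtracking on the step size
        continue
    x = step

# Success is measured up to a global phase via d(x, z) = ||x x^* - z z^*||_F.
d = np.linalg.norm(np.outer(x, x.conj()) - np.outer(z, z.conj()), 'fro')
print(loss(x), d)   # both should be small if the iterate has reached a global minimizer

Consistent with the landscape result in Theorem 2 below, on random instances with m a moderate multiple of n such a plain first-order scheme with arbitrary initialization typically reaches a point with d(\mathbf{x},\mathbf{z})\approx 0.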

VIII-C Proof of Theorem 2

Theorem 2.

Let {Ai}i=1m\{A_{i}\}_{i=1}^{m} be a set of complex n×nn\times n Gaussian random matrices, and let m>Cnm>Cn for some constant C>0C>0. Let the scalars {ci}i=1m\{c_{i}\}_{i=1}^{m} characterizing the objective function ff of problem (P2) be generated by quadratic measurements of an unknown vector 𝐳\mathbf{z}. Then, for any given ξ(0,1)\xi\in(0,1), there exist positive constants β,γ,\beta,\gamma, and ζ\zeta such that the following statements hold with probability at least 1ξ1-\xi:

  • 1)

    The function f is (β,ζ,γ)(\beta,\zeta,\gamma)-strict saddle, and

  • 2)

    Every local minimum 𝐰\mathbf{w} of ff satisfies d(𝐰,𝐳)=0d(\mathbf{w},\mathbf{z})=0

Proof.

Notice that,

\displaystyle(\Delta^{\top}A\bar{\mathbf{x}})^{2}+(\bar{\Delta}^{\top}A\mathbf{x})^{2} =\left(\langle A,\mathbf{x}\bar{\Delta}^{\top}+\Delta\bar{\mathbf{x}}^{\top}\rangle\right)^{2}-2(\Delta^{\top}A\bar{\mathbf{x}})(\bar{\Delta}^{\top}A\mathbf{x})
=(A,𝐱𝐱¯𝐳(𝐳¯)+ΔΔ¯)22(A,Δ𝐱¯)(A,𝐱Δ¯)\displaystyle\ =\left(\langle A,\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}+\Delta\bar{\Delta}^{\top}\rangle\right)^{2}-2(\langle A,\Delta\bar{\mathbf{x}}^{\top}\rangle)(\langle A,\mathbf{x}\bar{\Delta}^{\top}\rangle)

Using (9), we can reorganize,

[ΔΔ¯]2gi(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\ \begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g_{i}(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}
=Ai,2𝐱𝐱¯𝐳(𝐳¯)Ai,2ΔΔ¯+(A,𝐱𝐱¯𝐳(𝐳¯)+ΔΔ¯)22(A,Δ𝐱¯)(A,𝐱Δ¯)\displaystyle\ =\langle A_{i},2\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},2\Delta\bar{\Delta}^{\top}\rangle+\left(\langle A,\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}+\Delta\bar{\Delta}^{\top}\rangle\right)^{2}-2(\langle A,\Delta\bar{\mathbf{x}}^{\top}\rangle)(\langle A,\mathbf{x}\bar{\Delta}^{\top}\rangle)
=2(Ai,2𝐱𝐱¯𝐳(𝐳¯)Ai,ΔΔ¯)+Ai,ΔΔ¯Ai,ΔΔ¯+𝐱𝐱¯𝐳(𝐳¯)\displaystyle\ =2\left(\langle A_{i},2\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\right)+\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}+\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle
+Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,ΔΔ¯+𝐱𝐱¯𝐳(𝐳¯)2(A,Δ𝐱¯)(A,𝐱Δ¯)\displaystyle\ +\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}+\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle-2(\langle A,\Delta\bar{\mathbf{x}}^{\top}\rangle)(\langle A,\mathbf{x}\bar{\Delta}^{\top}\rangle)
=2Ai,ΔΔ¯Ai,𝐱𝐱¯+2(Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,ΔΔ¯)+Ai,ΔΔ¯Ai,𝐱𝐱¯𝐳(𝐳¯)\displaystyle\ =2\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle+2\left(\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\right)+\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle
+Ai,ΔΔ¯Ai,ΔΔ¯+Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,ΔΔ¯+𝐱𝐱¯𝐳(𝐳¯)2(A,Δ𝐱¯)(A,𝐱Δ¯)\displaystyle\ +\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle+\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}+\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle-2(\langle A,\Delta\bar{\mathbf{x}}^{\top}\rangle)(\langle A,\mathbf{x}\bar{\Delta}^{\top}\rangle)

Adding and subtracting 2Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,𝐱𝐱¯𝐳(𝐳¯)2\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle, reorganizing,

[ΔΔ¯]2gi(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\ \begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g_{i}(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}
=2Ai,ΔΔ¯Ai,𝐱𝐱¯+2(Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,ΔΔ¯+𝐱𝐱¯𝐳(𝐳¯))\displaystyle\ =2\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle+2\left(\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}+\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\right)
2Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,𝐱𝐱¯𝐳(𝐳¯)+Ai,ΔΔ¯Ai,𝐱𝐱¯𝐳(𝐳¯)\displaystyle\ -2\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle+\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle
+Ai,ΔΔ¯Ai,ΔΔ¯+Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,ΔΔ¯+𝐱𝐱¯𝐳(𝐳¯)2(A,Δ𝐱¯)(A,𝐱Δ¯)\displaystyle\ +\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle+\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}+\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle-2(\langle A,\Delta\bar{\mathbf{x}}^{\top}\rangle)(\langle A,\mathbf{x}\bar{\Delta}^{\top}\rangle)

Adding and subtracting Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,𝐱𝐱¯𝐳(𝐳¯)\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle, reorganizing,

[ΔΔ¯]2gi(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\ \begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g_{i}(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}
=2Ai,ΔΔ¯Ai,𝐱𝐱¯2(A,Δ𝐱¯)(A,𝐱Δ¯)3Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,𝐱𝐱¯𝐳(𝐳¯)\displaystyle\ =2\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle-2(\langle A,\Delta\bar{\mathbf{x}}^{\top}\rangle)(\langle A,\mathbf{x}\bar{\Delta}^{\top}\rangle)-3\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle
+Ai,ΔΔ¯Ai,ΔΔ¯+4Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,ΔΔ¯+𝐱𝐱¯𝐳(𝐳¯)\displaystyle\ +\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle+4\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}+\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle

Using equation (10),

[ΔΔ¯]2gi(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\ \begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g_{i}(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}
=2Ai,ΔΔ¯Ai,𝐱𝐱¯2(A,Δ𝐱¯)(A,𝐱Δ¯)3Ai,𝐱𝐱¯𝐳(𝐳¯)Ai,𝐱𝐱¯𝐳(𝐳¯)\displaystyle\ =2\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle-2(\langle A,\Delta\bar{\mathbf{x}}^{\top}\rangle)(\langle A,\mathbf{x}\bar{\Delta}^{\top}\rangle)-3\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle
+Ai,ΔΔ¯Ai,ΔΔ¯+4gi(𝐱,𝐱¯),[ΔΔ¯]\displaystyle\ +\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle+4\langle\nabla g_{i}(\mathbf{x},\bar{\mathbf{x}}),\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}\rangle

Overall, we can conclude that,

[ΔΔ¯]2g(𝐱,𝐱¯)[ΔΔ¯]=1mi=1m[ΔΔ¯]2gi(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\ \begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}=\frac{1}{m}\sum_{i=1}^{m}\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g_{i}(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}
\displaystyle\ =\frac{2}{m}\sum_{i=1}^{m}\left(\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}\rangle-\langle A_{i},\Delta\bar{\mathbf{x}}^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\Delta}^{\top}\rangle\right)-\frac{3}{m}\sum_{i=1}^{m}\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle\langle A_{i},\mathbf{x}\bar{\mathbf{x}}^{\top}-\mathbf{z}(\bar{\mathbf{z}})^{\top}\rangle
+1mi=1mAi,ΔΔ¯Ai,ΔΔ¯+4mi=1mgi(𝐱,𝐱¯),[ΔΔ¯]\displaystyle\ +\frac{1}{m}\sum_{i=1}^{m}\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle\langle A_{i},\Delta\bar{\Delta}^{\top}\rangle+\frac{4}{m}\sum_{i=1}^{m}\langle\nabla g_{i}(\mathbf{x},\bar{\mathbf{x}}),\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}\rangle

Using Lemma 3 and Lemma 5, we can conclude that with probability greater than 1c1ec2mmin{D(t),E(ϵ)}1-c_{1}e^{-c_{2}m\min\{D(t),E(\epsilon)\}},

[ΔΔ¯]2g(𝐱,𝐱¯)[ΔΔ¯]\displaystyle\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix} t+4δΔ2+βΔΔF23α𝐱𝐱𝐳(𝐳)F2\displaystyle\ \leq t+4\delta\|\Delta\|_{2}+\beta\|\Delta\Delta^{*}\|_{F}^{2}-3\alpha\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}(\mathbf{z})^{*}\|_{F}^{2}
t+4δΔ2+2β𝐱𝐱𝐳(𝐳)F23α𝐱𝐱𝐳(𝐳)F2\displaystyle\ \leq t+4\delta\|\Delta\|_{2}+2\beta\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}(\mathbf{z})^{*}\|_{F}^{2}-3\alpha\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}(\mathbf{z})^{*}\|_{F}^{2}
(2β3α)𝐱𝐱𝐳(𝐳)F2+4δΔ2+t\displaystyle\ \leq\left(2\beta-3\alpha\right)\|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}(\mathbf{z})^{*}\|_{F}^{2}+4\delta\|\Delta\|_{2}+t (11)

where c_{1},c_{2}>0 are constants that can be computed from Lemma 3 and Lemma 5.

For any given \xi\in(0,1), there are multiple choices of the constants \beta,\zeta, and \gamma that satisfy Theorem 2.

Given \xi, we can choose m=O(n) such that the mapping \mathcal{M}_{\mathcal{A}} is (1-c,1+c)-stable for some small c>0. If the current vector \mathbf{x} is not close to \mathbf{z}, in the sense that \|\Delta\|\geq C_{0}\delta for a sufficiently large C_{0}>0, then we have

[ΔΔ¯]2g(𝐱,𝐱¯)[ΔΔ¯](1+5c)C02δ2+4C0δ20\displaystyle\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}\leq(-1+5c)C_{0}^{2}\delta^{2}+4C_{0}\delta^{2}\leq 0

A particular pair (c,C_{0}) satisfying the above condition is c=\frac{1}{20},C_{0}=10, for which

\displaystyle\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}\leq(-1+5c)C_{0}^{2}\delta^{2}+4C_{0}\delta^{2}=-35\delta^{2}<0

Hence we can conclude that the function f(\mathbf{x})=g(\mathbf{x},\overline{\mathbf{x}}) satisfies at least one of the following:

  • g(𝐱)δ\|\nabla g(\mathbf{x})\|\geq\delta

  • For the direction vector Δ\Delta,

    [ΔΔ¯]2g(𝐱,𝐱¯)[ΔΔ¯](1+5c)C02δ2+4C0δ2\displaystyle\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}^{*}\nabla^{2}g(\mathbf{x},\bar{\mathbf{x}})\begin{bmatrix}\Delta\\ \bar{\Delta}\end{bmatrix}\leq(-1+5c)C_{0}^{2}\delta^{2}+4C_{0}\delta^{2}
  • d(𝐱,𝐳)C0δd(\mathbf{x},\mathbf{z})\leq C_{0}\delta

Thus there exist constants \beta,\zeta,\gamma>0 such that the function f(\mathbf{x}) is (\beta,\zeta,\gamma)-strict saddle.

Following up on inequality (11), the only way the Hessian can satisfy \nabla^{2}g(\mathbf{x},\overline{\mathbf{x}})\succeq 0 is if \|\mathbf{x}\mathbf{x}^{*}-\mathbf{z}\mathbf{z}^{*}\|=0. Hence all local minima, i.e., all \mathbf{w} such that \nabla^{2}g(\mathbf{w},\overline{\mathbf{w}})\succeq 0, must satisfy \|\mathbf{w}\mathbf{w}^{*}-\mathbf{z}\mathbf{z}^{*}\|=0, and therefore \mathbf{z}\sim\mathbf{w}, which makes \mathbf{w} a solution of problem (P1). ∎

IX Appendix D : Applications

IX-A Power system state estimation problem

Apart from being a broader class of problems encompassing phase retrieval, the problem setup (P1) also has applications in power system engineering. Given a network of buses and transmission lines, the goal is to estimate complex voltages across all buses from a subset of noisy power and voltage magnitude measurements. In the AC power model, these measurements are quadratically dependent on the voltage values to be determined. Let {ci}i=1m\{c_{i}\}_{i=1}^{m} be the set of measurements and {Ai}i=1m\{A_{i}\}_{i=1}^{m} be the corresponding bus admittance value matrices. Then the problem boils down to an estimation problem

find 𝐱\displaystyle\ \mathbf{x}
s.t. ci=𝐱Ai𝐱+νii=1,2,,m.\displaystyle\ c_{i}=\mathbf{x}^{*}A_{i}\mathbf{x}+\nu_{i}~{}~{}~{}\forall i=1,2,\dots,m.

where \nu_{i}\sim\mathcal{N}(0,\sigma_{i}^{2}) is Gaussian noise associated with the readings [21]. For details on the problem setup, please refer to [1].
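A purely schematic sketch of this measurement model is given below (illustrative only: the matrices A_{i} are random Hermitian placeholders standing in for the matrices induced by the actual bus admittance data, and the noise level is arbitrary).

import numpy as np

rng = np.random.default_rng(5)
n, m = 6, 20                      # number of buses and number of measurements (illustrative)

# Placeholder Hermitian measurement matrices; in practice these are determined
# by the network's bus admittance data.
A = []
for _ in range(m):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    A.append((G + G.conj().T) / 2)

v = rng.normal(size=n) + 1j * rng.normal(size=n)    # complex bus voltages to be estimated
sigma = 0.01
c = np.array([np.vdot(v, Ai @ v).real + sigma * rng.normal() for Ai in A])
# The state-estimation task is to recover v (up to the model's inherent ambiguities)
# from the pairs (A_i, c_i): a noisy instance of problem (P1).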

IX-B Fusion Phase retrieval

Let \{W_{i}\}_{i=1}^{m} be a set of subspaces of \mathbb{R}^{n} (or \mathbb{C}^{n}). Fusion phase retrieval deals with the problem of recovering \mathbf{x}, up to a global phase ambiguity, from measurements of the form \{\|P_{i}\mathbf{x}\|\}_{i=1}^{m}, where P_{i} are the projection operators onto the subspaces W_{i}. The initial results on this problem, concerning the conditions on the subspaces and the minimum number of subspaces required for successful recovery of \mathbf{x} up to a phase ambiguity, appear in [46].
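Such measurements fit the template of (P1): when P_{i} is an orthogonal projection, \|P_{i}\mathbf{x}\|^{2}=\langle P_{i}\mathbf{x},\mathbf{x}\rangle, so one may take A_{i}=P_{i}. A small illustrative check (the ambient and subspace dimensions below are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
n, k = 8, 3

# Orthogonal projection onto a random k-dimensional subspace of C^n.
B = rng.normal(size=(n, k)) + 1j * rng.normal(size=(n, k))
Q, _ = np.linalg.qr(B)             # orthonormal basis of the subspace
P = Q @ Q.conj().T                 # Hermitian and idempotent: P^2 = P

x = rng.normal(size=n) + 1j * rng.normal(size=n)
lhs = np.linalg.norm(P @ x) ** 2
rhs = np.vdot(x, P @ x).real       # <P x, x>, i.e. a quadratic measurement with A_i = P
print(np.isclose(lhs, rhs))        # True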

IX-C X-ray crystallography

In X-ray crystallography, especially in crystal twinning [6], the measurements are obtained with projection matrices Q_{i} satisfying Q_{i}^{2}=Q_{i}, which again fits our setup.

In the worst case, a quadratic feasibility problem is NP-hard, which makes the setup (P1) all the more interesting: we can establish properties of a subclass of quadratic feasibility problems and provide provably convergent algorithms for it. This question resonates closely with many of the applications of quadratic feasibility discussed above. In this paper we have considered only the noiseless setting; the analysis can later be extended to the noisy setting.