
T. Zhang
University of Central Florida, Department of Mathematics
E-mail: [email protected]

F. Yu
University of Central Florida, Department of Mathematics
E-mail: [email protected]

Phase retrieval of complex-valued objects via a randomized Kaczmarz method

Teng Zhang and Feng Yu
(Received: date / Accepted: date)
Abstract

This paper investigates the convergence of the randomized Kaczmarz algorithm for the problem of phase retrieval of complex-valued objects. While this algorithm has been studied for the real-valued case in 10.1093/imaiai/iay005 , its generalization to the complex-valued case is nontrivial and has been left as a conjecture. This paper applies a different approach by establishing a connection between the convergence of the algorithm and the convexity of an objective function. Based on this connection, it demonstrates that when the sensing vectors are sampled uniformly from the unit sphere in $\mathcal{C}^{n}$ and the number of sensing vectors $m$ satisfies $m>O(n\log n)$ as $n,m\rightarrow\infty$, this algorithm with a good initialization achieves linear convergence to the solution with high probability. The method can be applied to other statistical models of sensing vectors as well: a similar convergence result is established for the unitary model, where the sensing vectors are drawn from the columns of random orthogonal matrices.

1 Introduction

This article concerns the following phase retrieval problem: let $\mathbf{z}\in\mathcal{C}^{n}$ be an unknown vector; given $m$ known sensing vectors $\{\mathbf{a}_{i}\}_{i=1}^{m}\subset\mathcal{C}^{n}$ and the observations

y_{i}=|\mathbf{a}_{i}^{*}\mathbf{z}|,\quad i=1,2,\cdots,m, (1)

can we reconstruct $\mathbf{z}$ from the observations $\{y_{i}\}_{i=1}^{m}$?

Many algorithms have been proposed for this problem, including approaches based on convex relaxation to semidefinite optimization Chai2011 ; Candes_PhaseLift ; Waldspurger2015 ; Candes2014 ; Gross2015 , convex relaxation to linear programs pmlr-v54-bahmani17a ; Goldstein2016 ; Hand2016 ; Hand20162 ; NIPS2018_8082 , nonconvex approaches based on Wirtinger flow, i.e., gradient flow in the complex setting Candes7029630 ; NIPS2015_5743 ; Zhang:2016:PNP:3045390.3045499 ; NIPS2016_6319 ; cai2016 ; NIPS2016_6061 ; Soltanolkotabi2017 ; Chen2018 ; 7541725 , the alternating minimization (Gerchberg-Saxton) algorithm and its variants Gerchberg72 ; Fienup78 ; Fienup82 ; Bauschke03 ; Waldspurger2016 ; Netrapalli7130654 ; Zhang2017 ; zhang2020 , and algorithms based on Douglas-Rachford splitting doi:10.1137/18M1170364 . This work investigates the randomized Kaczmarz method, which is simple to implement and has shown competitive performance in simulations. The algorithm was first proposed by Wei in Wei_2015 , where it was shown to perform comparably with the state-of-the-art Wirtinger flow methods when the sensing vectors are drawn from real or complex Gaussian distributions, or when they follow the unitary model or the coded diffraction pattern (CDP) model. That work also includes a preliminary convergence analysis, which was improved by Tan and Vershynin in 10.1093/imaiai/iay005 : they show that the algorithm is successful when there are as many Gaussian measurements as the dimension, up to a constant factor, and the initialization is in a “basin of linear convergence”. However, their results only apply when the signal $\mathbf{z}$ and the measurement vectors $\{\mathbf{a}_{i}\}_{i=1}^{m}$ are real-valued. As discussed in (10.1093/imaiai/iay005, , Section 7.2), there is no straightforward generalization of their technique from the real-valued case to the complex-valued case. In a related work tan2019online , Tan and Vershynin show that constant step size online stochastic gradient descent (SGD) converges from arbitrary initializations for a non-smooth, non-convex amplitude squared loss objective, and this online SGD is strongly reminiscent of the randomized Kaczmarz algorithm from numerical analysis. However, the analysis in tan2019online is still restricted to the real-valued setting.

1.1 Randomized Kaczmarz algorithm for solving linear systems

The Kaczmarz method kaczmarz1379 is an iterative algorithm for solving a system of linear equations

\mathbf{a}_{i}^{*}\mathbf{x}=b_{i},\quad i=1,\cdots,m. (2)

In the $k$-th iteration, one of the $m$ linear equations is selected and the new estimate $\mathbf{x}^{(k+1)}$ is obtained by projecting the current estimate $\mathbf{x}^{(k)}$ onto the hyperplane given by the solution set of that equation. The deterministic version of the Kaczmarz method usually selects the equations in a cyclic manner, while the randomized Kaczmarz method selects them randomly. When the randomized Kaczmarz method picks the $i$-th equation with probability proportional to $\|\mathbf{a}_{i}\|^{2}$, it has been shown in Strohmer2008 to converge linearly in expectation at the rate $1-\kappa(\mathbf{A})^{-2}$, where $\kappa(\mathbf{A})=\|\mathbf{A}\|_{F}\|\mathbf{A}^{\dagger}\|$ is the scaled condition number of the matrix $\mathbf{A}\in\mathcal{C}^{m\times n}$ whose rows are $\mathbf{a}_{1}^{*},\cdots,\mathbf{a}_{m}^{*}$. For additional analysis of this method and its variants, we refer the readers to NEEDELL2014199 ; Needell2016 .
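For illustration, the following minimal NumPy sketch (not taken from the cited papers; the problem sizes and seed are arbitrary) implements this randomized Kaczmarz iteration for a consistent linear system, with rows selected with probability proportional to $\|\mathbf{a}_{i}\|^{2}$:

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, n_iters, rng):
    """Randomized Kaczmarz for a_i^* x = b_i; rows of A are a_i^*."""
    m, n = A.shape
    row_norms2 = np.sum(np.abs(A) ** 2, axis=1)
    probs = row_norms2 / row_norms2.sum()          # pick row i with prob ~ ||a_i||^2
    x = x0.astype(complex)
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)
        r = b[i] - A[i] @ x                        # residual of the i-th equation
        x = x + (r / row_norms2[i]) * A[i].conj()  # project onto {x : a_i^* x = b_i}
    return x

rng = np.random.default_rng(0)
m, n = 300, 20
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b, np.zeros(n), 3000, rng)
print(np.linalg.norm(x_hat - x_true))              # should be close to zero
```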

1.2 Randomized Kaczmarz algorithm for phase retrieval

The randomized Kaczmarz algorithm generalizes naturally to the phase retrieval problem (1). While the solution set of each equation is no longer a hyperplane, the projection onto it still has an explicit formula. Let $\mathcal{A}_{i}=\{\mathbf{z}:y_{i}=|\mathbf{a}_{i}^{*}\mathbf{z}|\}$; then

P_{\mathcal{A}_{i}}(\mathbf{x})=\mathbf{x}-\left(1-\frac{y_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\right)\frac{\mathbf{a}_{i}\mathbf{a}_{i}^{*}}{\|\mathbf{a}_{i}\|^{2}}\mathbf{x}. (3)

At the $k$-th iteration, a randomized Kaczmarz update projects the current estimate onto the nearest point in $\mathcal{A}_{r(k)}$, where $r(k)$ is chosen uniformly at random from $\{1,\cdots,m\}$, so the algorithm can be written as

\mathbf{x}^{(k+1)}=P_{\mathcal{A}_{r(k)}}(\mathbf{x}^{(k)}). (4)
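As an illustration only (a minimal sketch, not the authors' implementation; the sensing model, dimensions, and initialization radius below are chosen just for the example), the following NumPy code runs the update (4) with the projection (3) on synthetic data, with sensing vectors drawn uniformly from the unit sphere in $\mathcal{C}^{n}$ and an initialization close to the true signal:

```python
import numpy as np

def kaczmarz_phase_retrieval(A, y, x0, n_iters, rng):
    """Randomized Kaczmarz for y_i = |a_i^* x|; rows of A are a_i^* with ||a_i|| = 1."""
    m, _ = A.shape
    x = x0.astype(complex)
    for _ in range(n_iters):
        i = rng.integers(m)                    # r(k) uniform on {1, ..., m}
        inner = A[i] @ x                       # a_i^* x (assumed nonzero)
        x = x - (1.0 - y[i] / np.abs(inner)) * inner * A[i].conj()   # projection (3)
    return x

rng = np.random.default_rng(1)
n, m = 32, 10 * 32
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # ||a_i|| = 1, directions uniform on the sphere
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z /= np.linalg.norm(z)
y = np.abs(A @ z)
x0 = z + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # good initialization
x = kaczmarz_phase_retrieval(A, y, x0, 50 * n, rng)
print(np.linalg.norm(x0 - z), np.linalg.norm(x - z))   # the error should shrink markedly
```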

1.3 Contribution and Main Result

The main contribution of this paper is a guarantee on the linear convergence of the randomized Kaczmarz algorithm, as follows:

  • First, this paper establishes a deterministic condition such that the algorithm converges linearly with high probability. Intuitively, the condition requires that an objective function is strongly convex in a neighborhood around the true signal 𝐳\mathbf{z}.

  • Second, this paper proves that when the sensing vectors are sampled uniformly from a unit sphere in 𝒞n\mathcal{C}^{n}, and the number of sensing vectors mm satisfies m>O(nlogn)m>O(n\log n) as m,nm,n\rightarrow\infty, the deterministic condition is satisfied with high probability. A similar result is also obtained for the unitary model, where the sensing vectors are from the columns of random orthogonal matrices.

This paper generalizes the result of Tan and Vershynin in 10.1093/imaiai/iay005 from the real-valued case to the complex-valued case. The generalization is not straightforward, and the approach used to obtain the deterministic condition is very different in this work. In comparison, since the phases can only be $1$ or $-1$ in the real-valued case, 10.1093/imaiai/iay005 divides all sensing vectors into “good measurements” with correct phases and “bad measurements” with possibly incorrect phases, and controls the total influence of the bad measurements. However, as remarked in (10.1093/imaiai/iay005, , Section 7.2), this method does not work in the complex-valued case: the phases are no longer $\pm 1$ and each measurement contributes an error that scales with the phase difference, so we can no longer simply sum up the influence of bad measurements as in (10.1093/imaiai/iay005, , Lemma 2.1).

We remark that if the $\mathbf{a}_{i}$ are scaled such that $\|\mathbf{a}_{i}\|=1$ for all $1\leq i\leq m$, then the update formula (4) can also be viewed as stochastic gradient descent applied to $\frac{1}{m}\sum_{i=1}^{m}\big(|\mathbf{a}_{i}^{*}\mathbf{x}|-y_{i}\big)^{2}$, with step size chosen to be $1$. In this sense, our work is related to bassily2018exponential , in which the convergence of stochastic gradient descent for generic objective functions is studied, and both their work and this work are based on the convexity of the objective function. However, we remark that their result cannot be directly applied here, since it assumes a specific step size that depends on the smoothness constant and the Polyak-Lojasiewicz condition of the objective function, neither of which is clear for this objective function.
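To make the step-size-one claim explicit, here is a short supporting calculation (our sketch, using one standard Wirtinger-derivative convention from the Wirtinger flow literature, in which the gradient step is $\mathbf{x}\leftarrow\mathbf{x}-\mu\,\partial f_{i}/\partial\bar{\mathbf{x}}$): for $f_{i}(\mathbf{x})=\big(|\mathbf{a}_{i}^{*}\mathbf{x}|-y_{i}\big)^{2}$ and $\|\mathbf{a}_{i}\|=1$,
\[
\frac{\partial f_{i}}{\partial\bar{\mathbf{x}}}
=2\big(|\mathbf{a}_{i}^{*}\mathbf{x}|-y_{i}\big)\cdot\frac{\mathbf{a}_{i}\mathbf{a}_{i}^{*}\mathbf{x}}{2|\mathbf{a}_{i}^{*}\mathbf{x}|}
=\Big(1-\frac{y_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\Big)\mathbf{a}_{i}\mathbf{a}_{i}^{*}\mathbf{x},
\]
so the SGD step with $\mu=1$ on the $i$-th summand coincides with the projection (3).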

1.4 Notation

Throughout the paper, $C$ and $c$ are absolute constants that do not depend on $m,n$ and can change from line to line. We also implicitly assume that $m,n$ are sufficiently large; for example, we write $m>n+10$ when $m>n\log n$ is assumed. $\mathrm{Re}(x)$ represents the real component of a complex number $x$. For a set $\mathcal{S}$, $|\mathcal{S}|$ represents the cardinality of the set.

2 Main Result

We first present the main contribution of this paper, as well as a sketch of the proof and some discussion.

Theorem 1

(a) Assume that the sensing vectors $\{\mathbf{a}_{i}\}_{i=1}^{m}$ are i.i.d. sampled from the uniform distribution on the unit sphere in $\mathcal{C}^{n}$, and that in each iteration the randomized Kaczmarz algorithm picks each equation with probability $1/m$. Then there exist absolute constants $C_{0},c_{0},L$ that do not depend on $m,n$ such that if $m\geq C_{0}n\log n$ as $m,n\rightarrow\infty$, then for all $\|\mathbf{x}^{(0)}-\mathbf{z}\|\leq c_{0}\sqrt{\delta_{1}}\|\mathbf{z}\|$ and $\epsilon>0$, we have

\Pr\left(\|\mathbf{x}^{(k)}-\mathbf{z}\|^{2}\leq\epsilon\|\mathbf{x}^{(0)}-\mathbf{z}\|^{2}\right)\geq 1-\delta_{1}-\frac{\left(1-\frac{L}{n}\right)^{k}}{\epsilon\left(1-\delta_{1}\right)}-C\exp(-cn). (5)

(b) For the unitary model, in which $m=Kn$ for some integer $K$ and, for any $1\leq k\leq K$, $[\mathbf{a}_{(k-1)n+1},\cdots,\mathbf{a}_{kn}]\in\mathcal{C}^{n\times n}$ is a random orthogonal matrix in $\mathcal{C}^{n\times n}$, there exist constants $C_{0},c_{0},L$ such that if $m\geq C_{0}n\log n$ and $\sqrt{n}>\log^{2}m$, then (5) also holds as $m,n\rightarrow\infty$.

This theorem shows linear convergence as follows: if $\delta_{1}<\frac{1}{2}$ and $k\geq\log(\epsilon\delta_{2}/2)/\log(1-L/n)$, then with probability at least $1-\delta_{1}-\delta_{2}-C\exp(-cn)$, we have $\|\mathbf{x}^{(k)}-\mathbf{z}\|^{2}\leq\epsilon\|\mathbf{x}^{(0)}-\mathbf{z}\|^{2}$. If we let $\delta_{1}=\delta_{2}=\delta/2$, the number of iterations needed to achieve accuracy $\epsilon$ with probability $1-\delta$ is of the order $O\left(n\log\frac{1}{\epsilon\delta}\right)$.
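For completeness, the elementary calculation behind this iteration count is as follows: since $\delta_{1}<\frac{1}{2}$, the middle error term in (5) is at most $\delta_{2}$ as soon as
\[
\frac{(1-L/n)^{k}}{\epsilon(1-\delta_{1})}\leq\frac{2(1-L/n)^{k}}{\epsilon}\leq\delta_{2},
\]
i.e., as soon as $k\log(1-L/n)\leq\log(\epsilon\delta_{2}/2)$, which holds for $k\geq\log(\epsilon\delta_{2}/2)/\log(1-L/n)$; since $-\log(1-L/n)\geq L/n$, it suffices to take $k\geq\frac{n}{L}\log\frac{2}{\epsilon\delta_{2}}=O\big(n\log\frac{1}{\epsilon\delta_{2}}\big)$.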

The proof of Theorem 1 is divided into three steps. The first step establishes a condition, (7), under which the algorithm converges with high probability; this condition describes the regularity of an objective function in a local neighborhood around $\mathbf{z}$. The second step establishes an explicit lower bound on the key parameter $L$ in the condition (7). In the third step, we apply tools from random matrix theory and measure concentration to analyze the explicit formula for $L$ and show that it can be chosen as a constant as $n,m\rightarrow\infty$. The three steps for part (a) are described in Sections 3, 4, and 5 respectively, and the third step for part (b) is described in Section 6. The main results of these sections are summarized in Theorems 2, 3, 4, and 5. Combining these theorems, we obtain the proof of Theorem 1.

Proof (Proof of Theorem 1)

WLOG we assume $\|\mathbf{z}\|=1$ for the rest of the paper. In the statement of Theorem 4, $\alpha$ in (16) can be chosen such that $\alpha>1$ and $\frac{6.6}{\alpha-1}<\frac{c_{1}}{72}$, and $c_{0}$ in (16) can be chosen such that $(2+4\alpha)\cdot 4\sqrt{2}Cc_{0}\alpha<\frac{c_{1}}{72}$. Since, when $\alpha$ is large, $c_{0}$ can be chosen of order $1/\alpha^{2}$ so that $c_{0}\alpha$ is of order $1/\alpha$, there exists a choice of $(\alpha,c_{0})$ such that the assumption $2c_{0}\alpha<1$ in Theorem 4 holds. Then Theorem 3 and Theorem 4 imply that $L=\frac{c_{1}}{72}$ satisfies the assumption (7). With this assumption satisfied, Theorem 2 implies Theorem 1(a). The proof of Theorem 1(b) is similar to the proof of (a), with Theorem 4 replaced by Theorem 5.

3 Step 1: Convergence under a deterministic condition

This section connects the convergence of the randomized Kaczmarz algorithm with the function

f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^{m}\Big(|\mathbf{a}_{i}^{*}\mathbf{x}|-y_{i}\Big)^{2} (6)

and its directional derivatives defined by

f^{\prime}_{\mathbf{v}}(\mathbf{x})=\lim_{t\rightarrow 0^{+}}\frac{f(\mathbf{x}+t\mathbf{v})-f(\mathbf{x})}{t}=\frac{1}{m}\sum_{i=1}^{m}\Big(1-\frac{y_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\Big)\big(\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{x}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{x}\mathbf{v}^{*}\mathbf{a}_{i}\big).
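As a quick sanity check (an illustrative sketch, not part of the analysis; dimensions and seed are arbitrary), this directional-derivative formula can be verified against a finite-difference quotient:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 40
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)       # rows are a_i^* with ||a_i|| = 1
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A @ z)

def f(x):
    return np.mean((np.abs(A @ x) - y) ** 2)

def f_dir(x, v):
    # (1/m) sum_i (1 - y_i/|a_i^* x|) (a_i^* v x^* a_i + a_i^* x v^* a_i)
    ax, av = A @ x, A @ v
    return np.mean((1 - y / np.abs(ax)) * 2 * np.real(av * np.conj(ax)))

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
t = 1e-6
print((f(x + t * v) - f(x)) / t, f_dir(x, v))        # the two values should agree closely
```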

The result of this section depends on the following local regularity assumption on ff:

f(\mathbf{x})+f^{\prime}_{\mathbf{z}-\mathbf{x}}(\mathbf{x})+\frac{L}{n}\|\mathbf{z}-\mathbf{x}\|^{2}\leq f(\mathbf{z}),\quad\text{for all $\mathbf{x}$ such that $\|\mathbf{x}-\mathbf{z}\|\leq c_{0}$.} (7)

We remark that this inequality has the same form as the one defining strong convexity, so this assumption is related to the strong convexity of $f(\mathbf{x})$ in the local neighborhood $\mathbf{x}\in B(\mathbf{z},c_{0})$. However, it is slightly less restrictive in the sense that it only requires (7) for the fixed $\mathbf{z}$ (if $\mathbf{z}$ were replaced by an arbitrary $\mathbf{y}\in B(\mathbf{z},c_{0})$, the condition would be equivalent to strong convexity).

We also remark that this objective function (6) has been studied in NIPS2016_6319 ; 8049465 ; tan2019online and its local regularity property has been studied in NIPS2016_6319 ; 8049465 . However, these works study a different regularity assumption in (NIPS2016_6319, , (12)) and (8049465, , (39)), which can be written as:

f𝐱𝐳(𝐱)μ2f(𝐱)2+λ2𝐱𝐳2.f^{\prime}_{\mathbf{x}-\mathbf{z}}(\mathbf{x})\geq\frac{\mu}{2}\|f^{\prime}(\mathbf{x})\|^{2}+\frac{\lambda}{2}\|\mathbf{x}-\mathbf{z}\|^{2}.

In addition, similar to 10.1093/imaiai/iay005 , these works only theoretically analyze the real-valued setting.

The main result of this section is summarized as follows. It states that under the assumption (7), the algorithm converges linearly with high probability.

Theorem 2

Assume that $\|\mathbf{z}\|=1$ and $\|\mathbf{a}_{i}\|=1$ for all $1\leq i\leq m$, that (7) holds, that the randomized Kaczmarz algorithm picks each equation with probability $1/m$, and that the algorithm is initialized such that $\|\mathbf{x}^{(0)}-\mathbf{z}\|\leq c_{0}\sqrt{\delta_{1}}$ for some $0\leq\delta_{1}\leq 1$. Then for any $\epsilon>0$,

\Pr\left(\|\mathbf{x}^{(k)}-\mathbf{z}\|^{2}\leq\epsilon\|\mathbf{x}^{(0)}-\mathbf{z}\|^{2}\right)\geq 1-\delta_{1}-\frac{\left(1-\frac{L}{n}\right)^{k}}{\epsilon\left(1-\delta_{1}\right)}. (8)
Proof

Let $P$ be the random mapping $P_{\mathcal{A}_{i}}$ where $i$ is uniformly sampled from $\{1,\cdots,m\}$. Applying the projection formula (3) with the assumption $\|\mathbf{a}_{i}\|=1$ for all $1\leq i\leq m$, we obtain

\begin{aligned}
\operatorname{\mathbb{E}}_{P}\|P\mathbf{x}-\mathbf{z}\|^{2}&=\operatorname{\mathbb{E}}_{i\sim\{1,\cdots,m\}}\Big\|\mathbf{x}-\Big(1-\frac{y_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\Big)\mathbf{a}_{i}\mathbf{a}_{i}^{*}\mathbf{x}-\mathbf{z}\Big\|^{2}\\
&=\operatorname{\mathbb{E}}_{i}\left[\Big(1-\frac{y_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\Big)^{2}|\mathbf{a}_{i}^{*}\mathbf{x}|^{2}-2\,\mathrm{Re}\left(\Big(1-\frac{y_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\Big)\mathbf{x}^{*}\mathbf{a}_{i}\mathbf{a}_{i}^{*}(\mathbf{x}-\mathbf{z})\right)\right]+\|\mathbf{z}-\mathbf{x}\|^{2}\\
&=\operatorname{\mathbb{E}}_{i}\Big[(|\mathbf{a}_{i}^{*}\mathbf{x}|-y_{i})^{2}\Big]+2\operatorname{\mathbb{E}}_{i}\left[\mathrm{Re}\left(\Big(1-\frac{y_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\Big)\mathbf{x}^{*}\mathbf{a}_{i}\mathbf{a}_{i}^{*}(\mathbf{z}-\mathbf{x})\right)\right]+\|\mathbf{z}-\mathbf{x}\|^{2}\\
&=f(\mathbf{x})+f^{\prime}_{\mathbf{z}-\mathbf{x}}(\mathbf{x})+\|\mathbf{z}-\mathbf{x}\|^{2}\leq\Big(1-\frac{L}{n}\Big)\|\mathbf{z}-\mathbf{x}\|^{2},
\end{aligned}

where the last inequality applies the assumption (7) together with $f(\mathbf{z})=0$, and the last equality uses the fact that $y+\bar{y}=2\mathrm{Re}(y)$ for a scalar $y$ (a fact that we will apply repeatedly later).

The rest of the proof follows the argument in (10.1093/imaiai/iay005, , Section 3). Let $\tau=\min\{k:\|\mathbf{x}^{(k)}-\mathbf{z}\|>c_{0}\}$; then, following the proof of (10.1093/imaiai/iay005, , Theorem 3.1), we have $P(\tau<\infty)\leq\left(\frac{c_{0}\sqrt{\delta_{1}}}{c_{0}}\right)^{2}=\delta_{1}$. Following the proof of (10.1093/imaiai/iay005, , Corollary 3.2), (8) is proved.
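The identity $\operatorname{\mathbb{E}}_{P}\|P\mathbf{x}-\mathbf{z}\|^{2}=f(\mathbf{x})+f^{\prime}_{\mathbf{z}-\mathbf{x}}(\mathbf{x})+\|\mathbf{z}-\mathbf{x}\|^{2}$ used above holds exactly for any fixed data with $\|\mathbf{a}_{i}\|=1$; a small numerical check (an illustrative sketch, with arbitrary dimensions and seed) is:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 30
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)        # ||a_i|| = 1, rows are a_i^*
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z /= np.linalg.norm(z)
y = np.abs(A @ z)
x = z + 0.3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

ax = A @ x
# E_P ||P x - z||^2 = average over i of ||P_{A_i}(x) - z||^2
Px = x[None, :] - ((1 - y / np.abs(ax)) * ax)[:, None] * A.conj()
lhs = np.mean(np.sum(np.abs(Px - z[None, :]) ** 2, axis=1))
# f(x) + f'_{z-x}(x) + ||z - x||^2
f_val = np.mean((np.abs(ax) - y) ** 2)
ad = A @ (z - x)
f_dir = np.mean((1 - y / np.abs(ax)) * 2 * np.real(ad * np.conj(ax)))
rhs = f_val + f_dir + np.linalg.norm(z - x) ** 2
print(lhs, rhs)                                       # the two values should coincide
```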

4 The property of an implicit objective function

In this section, we will give an explicit formula for LL defined in (7). The formula will be based on a few additional definitions as follows. Let fi(𝐱)=(|𝐚i𝐳||𝐚i𝐱|)2f_{i}(\mathbf{x})=({|\mathbf{a}_{i}^{*}\mathbf{z}|}-{|\mathbf{a}_{i}^{*}\mathbf{x}|})^{2}, and define the first and the second directional derivatives of fif_{i} of direction 𝐯\mathbf{v} at 𝐱\mathbf{x} by

f^{\prime}_{i,\mathbf{v}}(\mathbf{x})=\lim_{t\rightarrow 0^{+}}\frac{f_{i}(\mathbf{x}+t\mathbf{v})-f_{i}(\mathbf{x})}{t}, (9)
f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{x})=\lim_{t\rightarrow 0^{+}}\frac{f^{\prime}_{i,\mathbf{v}}(\mathbf{x}+t\mathbf{v})-f^{\prime}_{i,\mathbf{v}}(\mathbf{x})}{t}. (10)

It can be shown that the directional derivatives have explicit expressions

fi,𝐯(𝐱)\displaystyle f^{\prime}_{i,\mathbf{v}}(\mathbf{x}) =𝐚i𝐱𝐯𝐚i+𝐚i𝐯𝐱𝐚i|𝐚i𝐳|𝐚i𝐱𝐯𝐚i+𝐚i𝐯𝐱𝐚i|𝐚i𝐱|,\displaystyle=\mathbf{a}_{i}^{*}\mathbf{x}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{x}^{*}\mathbf{a}_{i}-|\mathbf{a}_{i}^{*}\mathbf{z}|\frac{\mathbf{a}_{i}^{*}\mathbf{x}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{x}^{*}\mathbf{a}_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|},
fi,𝐯′′(𝐱)\displaystyle f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{x}) =2𝐚i𝐯𝐯𝐚i|𝐚i𝐳|2𝐚i𝐯𝐯𝐚i|𝐚i𝐱|+|𝐚i𝐳|(𝐚i𝐱𝐯𝐚i+𝐚i𝐯𝐱𝐚i)22|𝐚i𝐱|3.\displaystyle=2\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{v}^{*}\mathbf{a}_{i}-|\mathbf{a}_{i}^{*}\mathbf{z}|\frac{2\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{v}^{*}\mathbf{a}_{i}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}+|\mathbf{a}_{i}^{*}\mathbf{z}|\frac{(\mathbf{a}_{i}^{*}\mathbf{x}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{x}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{x}|^{3}}.

In addition, since $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^{m}f_{i}(\mathbf{x})$, the first and the second directional derivatives of $f(\mathbf{x})$ are $f^{\prime}_{\mathbf{v}}(\mathbf{x})=\frac{1}{m}\sum_{i=1}^{m}f^{\prime}_{i,\mathbf{v}}(\mathbf{x})$ and $f^{\prime\prime}_{\mathbf{v}}(\mathbf{x})=\frac{1}{m}\sum_{i=1}^{m}f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{x})$.

Theorem 3

For any 𝐯,𝐳𝒞n\mathbf{v},\mathbf{z}\in\mathcal{C}^{n} with 𝐯=𝐳=1\|\mathbf{v}\|=\|\mathbf{z}\|=1 and β>0\beta>0, define

𝒮(𝐯,β)={1im:β|𝐚i𝐯||𝐚i𝐳|}\mathcal{S}(\mathbf{v},\beta)=\{1\leq i\leq m:\beta|\mathbf{a}_{i}^{*}\mathbf{v}|\geq|\mathbf{a}_{i}^{*}\mathbf{z}|\} (11)

then for any α>1\alpha>1,

L=\frac{n}{m}\min_{\|\mathbf{v}\|=1}\left\{\frac{m}{2}f^{\prime\prime}_{\mathbf{v}}(\mathbf{z})-\frac{6}{\alpha-1}\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}-(2+4\alpha)\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\right\}
=\frac{n}{m}\min_{\|\mathbf{v}\|=1}\left\{\frac{1}{2}\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}-\frac{6}{\alpha-1}\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}-(2+4\alpha)\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\right\} (12)

satisfies the assumption (7).

Proof (Proof of Theorem 3)

We will first prove (7) with LL defined in (12) when 𝐱𝐳=c0\|\mathbf{x}-\mathbf{z}\|=c_{0}. For this case, there exists 𝐯=𝐱𝐳𝐱𝐳\mathbf{v}=\frac{\mathbf{x}-\mathbf{z}}{\|\mathbf{x}-\mathbf{z}\|} such that 𝐱=𝐳+c0𝐯\mathbf{x}=\mathbf{z}+c_{0}\mathbf{v} and 𝐯=1\|\mathbf{v}\|=1. For any i𝒮(𝐯,c0α)i\not\in\mathcal{S}(\mathbf{v},c_{0}\alpha), we have |𝐚i𝐳|c0α|𝐚i𝐯||\mathbf{a}_{i}^{*}\mathbf{z}|\geq c_{0}\alpha|\mathbf{a}_{i}^{*}\mathbf{v}| and the triangle inequality implies

|𝐚i(𝐱𝐳)||𝐚i𝐳|1α,||𝐚i𝐳||𝐚i𝐱|1|1α1,|𝐚i𝐳||𝐚i𝐱|αα1.\frac{|\mathbf{a}_{i}^{*}(\mathbf{x}-\mathbf{z})|}{|\mathbf{a}_{i}^{*}\mathbf{z}|}\leq\frac{1}{\alpha},\,\,\,\left|\frac{|\mathbf{a}_{i}^{*}\mathbf{z}|}{|\mathbf{a}_{i}^{*}\mathbf{x}|}-1\right|\leq\frac{1}{\alpha-1},\,\,\,\frac{|\mathbf{a}_{i}^{*}\mathbf{z}|}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\leq\frac{\alpha}{\alpha-1}.

Applying Lemma II.13 from zhang2020 ,

|𝐚i𝐱|𝐚i𝐱|𝐚i𝐳|𝐚i𝐳||2min(|𝐚i(𝐱𝐳)||𝐚i𝐳|,1)=2α.\left|\frac{\mathbf{a}_{i}^{*}\mathbf{x}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}-\frac{\mathbf{a}_{i}^{*}\mathbf{z}}{|\mathbf{a}_{i}^{*}\mathbf{z}|}\right|\leq 2\min\Big{(}\frac{|\mathbf{a}_{i}^{*}(\mathbf{x}-\mathbf{z})|}{|\mathbf{a}_{i}^{*}\mathbf{z}|},1\Big{)}=\frac{2}{\alpha}.

Applying these inequalities, we have that for i𝒮(𝐯,c0α)i\not\in\mathcal{S}(\mathbf{v},c_{0}\alpha),

|fi,𝐯′′(𝐱)fi,𝐯′′(𝐳)|2||𝐚i𝐳||𝐚i𝐱|1|𝐚i𝐯𝐯𝐚i+||𝐚i𝐳||𝐚i𝐱|1|(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2\displaystyle|f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{x})-f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{z})|\leq 2\left|\frac{|\mathbf{a}_{i}^{*}\mathbf{z}|}{|\mathbf{a}_{i}^{*}\mathbf{x}|}-1\right|\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{v}^{*}\mathbf{a}_{i}+\left|\frac{|\mathbf{a}_{i}^{*}\mathbf{z}|}{|\mathbf{a}_{i}^{*}\mathbf{x}|}-1\right|\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\!\mathbf{a}_{i}\!+\!\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\!\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}
+|𝐚i𝐳|2|𝐚i𝐱|[(𝐚i𝐱|𝐚i𝐱|𝐯𝐚i+𝐚i𝐯𝐱𝐚i|𝐱𝐚i|)2(𝐚i𝐳|𝐚i𝐳|𝐯𝐚i+𝐚i𝐯𝐳𝐚i|𝐳𝐚i|)2]\displaystyle+\frac{|\mathbf{a}_{i}^{*}\mathbf{z}|}{2|\mathbf{a}_{i}^{*}\mathbf{x}|}\Big{[}\big{(}\frac{\mathbf{a}_{i}^{*}\mathbf{x}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\frac{\mathbf{x}^{*}\mathbf{a}_{i}}{|\mathbf{x}^{*}\mathbf{a}_{i}|}\big{)}^{2}-\big{(}\frac{\mathbf{a}_{i}^{*}\mathbf{z}}{|\mathbf{a}_{i}^{*}\mathbf{z}|}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\frac{\mathbf{z}^{*}\mathbf{a}_{i}}{|\mathbf{z}^{*}\mathbf{a}_{i}|}\big{)}^{2}\Big{]}
\displaystyle\leq 2α1𝐚i𝐯𝐯𝐚i+1α1(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2\displaystyle\frac{2}{\alpha-1}\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{v}^{*}\mathbf{a}_{i}+\frac{1}{\alpha-1}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}
+|𝐚i𝐳|2|𝐚i𝐱|[(𝐚i𝐱|𝐚i𝐱|𝐯𝐚i+𝐚i𝐯𝐱𝐚i|𝐱𝐚i|)(𝐚i𝐳|𝐚i𝐳|𝐯𝐚i+𝐚i𝐯𝐳𝐚i|𝐳𝐚i|)]\displaystyle+\frac{|\mathbf{a}_{i}^{*}\mathbf{z}|}{2|\mathbf{a}_{i}^{*}\mathbf{x}|}\left[\big{(}\frac{\mathbf{a}_{i}^{*}\mathbf{x}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\frac{\mathbf{x}^{*}\mathbf{a}_{i}}{|\mathbf{x}^{*}\mathbf{a}_{i}|}\big{)}-\big{(}\frac{\mathbf{a}_{i}^{*}\mathbf{z}}{|\mathbf{a}_{i}^{*}\mathbf{z}|}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\frac{\mathbf{z}^{*}\mathbf{a}_{i}}{|\mathbf{z}^{*}\mathbf{a}_{i}|}\big{)}\right]
[(𝐚i𝐱|𝐚i𝐱|𝐯𝐚i+𝐚i𝐯𝐱𝐚i|𝐱𝐚i|)+(𝐚i𝐳|𝐚i𝐳|𝐯𝐚i+𝐚i𝐯𝐳𝐚i|𝐳𝐚i|)]\displaystyle\left[\big{(}\frac{\mathbf{a}_{i}^{*}\mathbf{x}}{|\mathbf{a}_{i}^{*}\mathbf{x}|}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\frac{\mathbf{x}^{*}\mathbf{a}_{i}}{|\mathbf{x}^{*}\mathbf{a}_{i}|}\big{)}+\big{(}\frac{\mathbf{a}_{i}^{*}\mathbf{z}}{|\mathbf{a}_{i}^{*}\mathbf{z}|}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\frac{\mathbf{z}^{*}\mathbf{a}_{i}}{|\mathbf{z}^{*}\mathbf{a}_{i}|}\big{)}\right]
\displaystyle\leq 2α1|𝐚i𝐯|2+2α1|𝐚i𝐯|2+α2(α1)[4α|𝐚i𝐯|][4|𝐚i𝐯|]12α1|𝐚i𝐯|2,\displaystyle\frac{2}{\alpha-1}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}+\frac{2}{\alpha-1}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}+\frac{\alpha}{2(\alpha-1)}\left[\frac{4}{\alpha}|\mathbf{a}_{i}^{*}\mathbf{v}|\right]\Big{[}4|\mathbf{a}_{i}^{*}\mathbf{v}|\Big{]}\leq\frac{12}{\alpha-1}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}, (13)

where the intermediate inequalities use the fact |𝐚i𝐳𝐯𝐚i||𝐚i𝐳||𝐚i𝐯|.|\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}|\leq|\mathbf{a}_{i}^{*}\mathbf{z}||\mathbf{a}_{i}^{*}\mathbf{v}|.

For any i𝒮(𝐯,c0α)i\in\mathcal{S}(\mathbf{v},c_{0}\alpha) and any 0tc00\leq t\leq c_{0}, we have

|𝐚i(𝐳+t𝐯)|𝐚i(𝐳+t𝐯)|𝐚i(𝐳+c0𝐯)|𝐚i(𝐳+c0𝐯)||2\left|\frac{\mathbf{a}_{i}^{*}(\mathbf{z}+t\mathbf{v})}{|\mathbf{a}_{i}^{*}(\mathbf{z}+t\mathbf{v})|}-\frac{\mathbf{a}_{i}^{*}(\mathbf{z}+c_{0}\mathbf{v})}{|\mathbf{a}_{i}^{*}(\mathbf{z}+c_{0}\mathbf{v})|}\right|\leq 2

and

|f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+t\mathbf{v})-f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+c_{0}\mathbf{v})|
=\left|2(t-c_{0})|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}-2|\mathbf{a}_{i}^{*}\mathbf{z}|\mathrm{Re}\left(\left(\frac{\mathbf{a}_{i}^{*}(\mathbf{z}+t\mathbf{v})}{|\mathbf{a}_{i}^{*}(\mathbf{z}+t\mathbf{v})|}-\frac{\mathbf{a}_{i}^{*}(\mathbf{z}+c_{0}\mathbf{v})}{|\mathbf{a}_{i}^{*}(\mathbf{z}+c_{0}\mathbf{v})|}\right)\mathbf{v}^{*}\mathbf{a}_{i}\right)\right|
\leq 2(c_{0}-t)|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}+4|\mathbf{a}_{i}^{*}\mathbf{z}||\mathbf{a}_{i}^{*}\mathbf{v}|\leq(2(c_{0}-t)+4c_{0}\alpha)|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}. (14)

Combining the two cases studied in (13) and (14), we have

m[f(𝐱)f(𝐳)+f𝐱𝐳(𝐱)]=m[f(𝐳+c0𝐯)f(𝐳)c0f𝐯(𝐳+c0𝐯)]\displaystyle m[f(\mathbf{x})-f(\mathbf{z})+f^{\prime}_{\mathbf{x}-\mathbf{z}}(\mathbf{x})]=m[f(\mathbf{z}+c_{0}\mathbf{v})-f(\mathbf{z})-c_{0}f^{\prime}_{\mathbf{v}}(\mathbf{z}+c_{0}\mathbf{v})]
=\displaystyle= mt=0c0[f𝐯(𝐳+t𝐯)f𝐯(𝐳+c0𝐯)]dt\displaystyle m\int_{t=0}^{c_{0}}\Big{[}f^{\prime}_{\mathbf{v}}(\mathbf{z}+t\mathbf{v})-f^{\prime}_{\mathbf{v}}(\mathbf{z}+c_{0}\mathbf{v})\Big{]}{\,\mathrm{d}}t
=\displaystyle= i𝒮(𝐯,c0α)t=0c0[fi,𝐯(𝐳+t𝐯)fi,𝐯(𝐳+c0𝐯)]dt\displaystyle\sum_{i\not\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\int_{t=0}^{c_{0}}\Big{[}f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+t\mathbf{v})-f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+c_{0}\mathbf{v})\Big{]}{\,\mathrm{d}}t
+i𝒮(𝐯,c0α)t=0c0[fi,𝐯(𝐳+t𝐯)fi,𝐯(𝐳+c0𝐯)]dt\displaystyle+\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\int_{t=0}^{c_{0}}\Big{[}f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+t\mathbf{v})-f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+c_{0}\mathbf{v})\Big{]}{\,\mathrm{d}}t
=\displaystyle= i𝒮(𝐯,c0α)t=0c0a=tc0fi,𝐯′′(𝐳+a𝐯)dadt+i𝒮(𝐯,c0α)t=0c0[fi,𝐯(𝐳+t𝐯)fi,𝐯(𝐳+c0𝐯)]dt\displaystyle\!-\!\!\!\!\!\!\sum_{i\not\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\!\!\!\!\!\int_{t=0}^{c_{0}}\!\!\int_{a=t}^{c_{0}}\!\!f_{i,\mathbf{v}}^{\prime\prime}(\mathbf{z}+a\mathbf{v}){\,\mathrm{d}}a{\,\mathrm{d}}t+\!\!\!\!\!\!\!\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\!\!\!\!\!\int_{t=0}^{c_{0}}\!\!\Big{[}f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+t\mathbf{v})-f_{i,\mathbf{v}}^{\prime}(\mathbf{z}+c_{0}\mathbf{v})\Big{]}{\,\mathrm{d}}t
\displaystyle\leq i𝒮(𝐯,c0α)t=0c0a=tc0[fi,𝐯′′(𝐳)12α1|𝐚i𝐯|2]dadt\displaystyle-\sum_{i\not\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\int_{t=0}^{c_{0}}\int_{a=t}^{c_{0}}\Big{[}f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{z})-\frac{12}{\alpha-1}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\Big{]}{\,\mathrm{d}}a{\,\mathrm{d}}t
+\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\int_{t=0}^{c_{0}}\Big[(2(c_{0}-t)+4c_{0}\alpha)|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\Big]{\,\mathrm{d}}t
=\displaystyle= c02{i𝒮(𝐯,c0α)[fi,𝐯′′(𝐳)26α1|𝐚i𝐯|2]+i𝒮(𝐯,c0α)(1+4α)|𝐚i𝐯|2}\displaystyle c_{0}^{2}\left\{-\sum_{i\not\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\Big{[}\frac{f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{z})}{2}-\frac{6}{\alpha-1}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\Big{]}+\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}(1+4\alpha)|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\right\}
\displaystyle\leq c02{i=1m[fi,𝐯′′(𝐳)26α1|𝐚i𝐯|2]+i𝒮(𝐯,c0α)fi,𝐯′′(𝐳)2+i𝒮(𝐯,c0α)(2+4α)|𝐚i𝐯|2}\displaystyle c_{0}^{2}\left\{-\sum_{i=1}^{m}\Big{[}\frac{f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{z})}{2}-\frac{6}{\alpha-1}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\Big{]}\!+\!\!\!\!\!\!\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\frac{f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{z})}{2}+\!\!\!\!\!\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\!\!\!\!\!(2+4\alpha)|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\right\}
\displaystyle\leq 𝐱𝐳2{m2f𝐯′′(𝐳)+i=1m6α1|𝐚i𝐯|2+i𝒮(𝐯,c0α)(2+4α)|𝐚i𝐯|2},\displaystyle{\|\mathbf{x}-\mathbf{z}\|^{2}}\left\{-\frac{m}{2}f^{\prime\prime}_{\mathbf{v}}(\mathbf{z})+\sum_{i=1}^{m}\frac{6}{\alpha-1}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}+\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}(2+4\alpha)|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\right\},

where the first inequality follows from (13) and (14), and the last inequality applies the observation that $|f^{\prime\prime}_{i,\mathbf{v}}(\mathbf{z})|=\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\leq 2|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}$. Recalling the definition of $L$ in (7), we have proved (7) with $L$ defined in (12) when $\|\mathbf{x}-\mathbf{z}\|=c_{0}$.

When 𝐱𝐳<c0\|\mathbf{x}-\mathbf{z}\|<c_{0}, applying the same procedure we can show that (7) holds with LL defined by

nmmin𝐯=1{m2f𝐯′′(𝐳)6α1i=1m|𝐚i𝐯|2(2+4α)i𝒮(𝐯,𝐱𝐳α)|𝐚i𝐯|2}.\frac{n}{m}\min_{\|\mathbf{v}\|=1}\left\{\frac{m}{2}f^{\prime\prime}_{\mathbf{v}}(\mathbf{z})-\frac{6}{\alpha-1}\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}-(2+4\alpha)\sum_{i\in\mathcal{S}(\mathbf{v},\|\mathbf{x}-\mathbf{z}\|\alpha)}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\right\}. (15)

By definition, $\|\mathbf{x}-\mathbf{z}\|<c_{0}$ implies that $\mathcal{S}(\mathbf{v},\|\mathbf{x}-\mathbf{z}\|\alpha)\subseteq\mathcal{S}(\mathbf{v},c_{0}\alpha)$. As a result, the expression in (15) is greater than or equal to $L$ defined in (12), so (7) with $L$ defined in (12) also holds when $\|\mathbf{x}-\mathbf{z}\|<c_{0}$, and Theorem 3 is proved.

5 The Gaussian model

In this section, we will show that under the Gaussian model, LL defined in (7) and (12) can be estimated and has a lower bound.

Theorem 4

Assume that the sensing vectors $\{\mathbf{a}_{i}\}_{i=1}^{m}$ are selected uniformly and independently from the unit sphere in $\mathcal{C}^{n}$ and that $2c_{0}\alpha<1$. Then there exists $C_{0}$ that does not depend on $n$ and $m$ such that when $m>C_{0}n\log n$, with probability at least $1-Cn\exp(-cn)$, $L$ defined in (12) has the lower bound

Lc1246.6α1(2+4α)C42c0α.L\geq\frac{c_{1}}{24}-\frac{6.6}{\alpha-1}-(2+4\alpha)C4\sqrt{2}c_{0}\alpha. (16)
Proof

The proof is based on bounding each component in (12) separately in Lemma 2, Lemma 3, and Lemma 4. Combining these estimations with δ=exp(nlogn)\delta=\exp(-n\log n) in Lemma 2 and β=2c0α\beta=2c_{0}\alpha in Lemma 4, Theorem 4 is proved.

The following is a restatement of (vershynin2010introduction, , Lemma 5.2). While the original setting is real-valued, it is easy to generalize it to the complex-valued setting by treating 𝒞n\mathcal{C}^{n} as 2n\mathbb{R}^{2n}.

Lemma 1 (Covering numbers of the sphere)

There exists an ϵ\epsilon-net over the unit sphere in 𝒞n\mathcal{C}^{n} equipped with the Euclidean metric, with at most (1+2ϵ)2n(1+\frac{2}{\epsilon})^{2n} points.

The following lemma bounds the second term in (12). In fact, this term is the squared operator norm of 𝐀\mathbf{A} and has been well studied, and the following result follows from (10.1093/imaiai/iay005, , Lemma 5.8).

Lemma 2 (The bound on the second term in (12))

If mC(n+log(1/δ))m\geq C(n+\sqrt{\log(1/\delta)}), then for any δ>0\delta>0,

\Pr\left(\max_{\|\mathbf{v}\|=1}\frac{1}{m}\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\leq\frac{1.1}{n}\right)\geq 1-\delta.
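Since $\max_{\|\mathbf{v}\|=1}\frac{1}{m}\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}=\|\mathbf{A}\|^{2}/m$ for the matrix $\mathbf{A}$ with rows $\mathbf{a}_{i}^{*}$, the bound can be sanity-checked numerically (an illustrative sketch; note that the absolute constant $C$ in the lemma must be fairly large for the threshold $1.1/n$, so a large $m/n$ is used below):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
m = 1000 * n                                   # m/n large enough for the 1.1/n threshold
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # rows uniform on the unit sphere of C^n
val = np.linalg.norm(A, 2) ** 2 / m            # max_{||v||=1} (1/m) sum_i |a_i^* v|^2
print(val * n)                                 # close to 1 and, for this m/n, below 1.1
```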

The following lemma bounds the first term in (12). The proof is based on an ϵ\epsilon-net argument. The proof is deferred to Section 5.1.

Lemma 3 (The bound on the first term in (12))

Assume that $\|\mathbf{z}\|=1$ and that $\{\mathbf{a}_{i}\}_{i=1}^{m}$ are selected uniformly and independently from the unit sphere in $\mathbb{C}^{n}$. Then there exists $c_{1}>0$ such that

\Pr\left(\min_{\|\mathbf{v}\|=1}\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq\frac{c_{1}m}{24n}\right)\geq 1-\left(\exp\left(-\frac{m}{576}\right)+\exp\left(-\frac{m}{8}\right)\right)\left(1+\frac{192n}{c_{1}}\right)^{2n}. (17)

The following lemma bounds the last term in (12). We remark that it shares some similarities with the estimation in (10.1093/imaiai/iay005, , Theorem 5.7), in the sense that both estimations depend on a “wedge”. However, the “wedge” 𝒮(𝐯,β/2)\mathcal{S}(\mathbf{v},\beta/2) in the complex setting is more complicated and the argument based on VC theory in 10.1093/imaiai/iay005 does not apply. Instead, the proof of Lemma 4 consists of two parts: first, we control the size of 𝒮(𝐯,β/2)\mathcal{S}(\mathbf{v},\beta/2) based on an ϵ\epsilon-net argument. Then we can invoke the metric entropy/chaining argument in 10.1093/imaiai/iay005 . The proof is deferred to Section 5.2.

Lemma 4 (The bound on the third term in (12))

Assume $0<\beta<1$ and $m\geq\max(n,\frac{n\log n}{2\beta^{2}})$; then there exists $C$ that does not depend on $n,m,\beta$ such that

Pr(max𝐯=1i𝒮(𝐯,β/2)|𝐚i𝐯|2C2βmn)12nexp(n)2exp(mβ42)(1+2n)2n.\Pr\!\left(\!\max_{\|\mathbf{v}\|=1}\!\!\!\!\sum_{i\in\mathcal{S}(\mathbf{v},\beta/2)}\!\!\!\!|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\leq C\sqrt{2}\beta\frac{m}{n}\!\right)\!\!\geq 1-2n\exp(-n)-2\exp\!\Big{(}\!-\frac{m\beta^{4}}{2}\!\Big{)}(1+2n)^{2n}.

5.1 Proof of Lemma 3

Proof

The proof will be based on an ϵ\epsilon-net argument. In the first step, we will show that for any 𝐯\mathbf{v} with 𝐯=1\|\mathbf{v}\|=1, the event in the LHS of (17) happens with high probability. Second, we will establish a perturbation bound on 𝐯\mathbf{v} when the perturbation is smaller than ϵ\epsilon. Then a standard ϵ\epsilon-net argument will be applied.

First, we note that

(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2=(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2PSp(𝐯,𝐳)(𝐚i)2PSp(𝐯,𝐳)(𝐚i)2\displaystyle\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}=\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|^{2}}\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|^{2} (18)
=\displaystyle= (𝐛i𝐳^𝐯^𝐛i+𝐛i𝐯^𝐳^𝐛i)22|𝐛i𝐳^|2PSp(𝐯,𝐳)(𝐚i)2,\displaystyle\frac{(\mathbf{b}_{i}^{*}\hat{\mathbf{z}}\hat{\mathbf{v}}^{*}\mathbf{b}_{i}+\mathbf{b}_{i}^{*}\hat{\mathbf{v}}\hat{\mathbf{z}}^{*}\mathbf{b}_{i})^{2}}{2|\mathbf{b}_{i}^{*}\hat{\mathbf{z}}|^{2}}\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|^{2},

where 𝐛i,𝐳^,𝐯^𝒞2\mathbf{b}_{i},\hat{\mathbf{z}},\hat{\mathbf{v}}\in\mathcal{C}^{2} are defined by 𝐛i=PSp(𝐯,𝐳)(𝐚i)PSp(𝐯,𝐳)(𝐚i)\mathbf{b}_{i}=\frac{P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})}{\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|}, 𝐳^=PSp(𝐯,𝐳)𝐳\hat{\mathbf{z}}=P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}\mathbf{z}, and 𝐯^=PSp(𝐯,𝐳)𝐯\hat{\mathbf{v}}=P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}\mathbf{v}. Here 𝐛i\mathbf{b}_{i} is sampled uniformly and independently from a unit circle in 𝒞2\mathcal{C}^{2}, and it is independent of PSp(𝐯,𝐳)(𝐚i)\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|.

Considering that PSp(𝐯,𝐳)(𝐚i)P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i}) is a projection of a random unit vector from 𝒞n\mathcal{C}^{n} to 𝒞2\mathcal{C}^{2}, there exists c1>0c_{1}>0 such that (in fact, one can verify it with c1=0.8c_{1}=0.8)

Pr(PSp(𝐯,𝐳)(𝐚i)2c1n)34.\Pr\left(\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|^{2}\geq\frac{c_{1}}{n}\right)\geq\frac{3}{4}.

Then Hoeffding’s inequality suggests that

P(i=1mI(PSp(𝐯,𝐳)(𝐚i)2c1n)m2)1exp(m8).P\left(\sum_{i=1}^{m}I\left(\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|^{2}\geq\frac{c_{1}}{n}\right)\geq\frac{m}{2}\right)\geq 1-\exp\Big{(}-\frac{m}{8}\Big{)}. (19)

For the component (𝐛i𝐳^𝐯^𝐛i+𝐛i𝐯^𝐳^𝐛i)22|𝐛i𝐳^|2\frac{(\mathbf{b}_{i}^{*}\hat{\mathbf{z}}\hat{\mathbf{v}}^{*}\mathbf{b}_{i}+\mathbf{b}_{i}^{*}\hat{\mathbf{v}}\hat{\mathbf{z}}^{*}\mathbf{b}_{i})^{2}}{2|\mathbf{b}_{i}^{*}\hat{\mathbf{z}}|^{2}}, note that 𝐛i\mathbf{b}_{i} is distributed uniformly on a circle in 𝒞2\mathcal{C}^{2}, so WLOG we may assume that 𝐳^=[1,0]\hat{\mathbf{z}}=[1,0] and 𝐯^=[cosθ,sinθ]\hat{\mathbf{v}}=[\cos\theta,\sin\theta], and then

\mathbf{b}_{i}^{*}\hat{\mathbf{z}}\hat{\mathbf{v}}^{*}\mathbf{b}_{i}+\mathbf{b}_{i}^{*}\hat{\mathbf{v}}\hat{\mathbf{z}}^{*}\mathbf{b}_{i}=2\mathrm{Re}(\mathbf{b}_{i}^{*}\hat{\mathbf{z}}\hat{\mathbf{v}}^{*}\mathbf{b}_{i})=2\mathrm{Re}(\cos\theta|\mathbf{b}_{i,1}|^{2}+\sin\theta\mathbf{b}_{i,1}^{*}\mathbf{b}_{i,2})=2\cos\theta|\mathbf{b}_{i,1}|^{2}+2\sin\theta\,\mathrm{Re}(\mathbf{b}_{i,1}^{*}\mathbf{b}_{i,2}),

and applying |𝐛i𝐳^|2=|𝐛i,1|2|\mathbf{b}_{i}^{*}\hat{\mathbf{z}}|^{2}=|\mathbf{b}_{i,1}|^{2},

\frac{(\mathbf{b}_{i}^{*}\hat{\mathbf{z}}\hat{\mathbf{v}}^{*}\mathbf{b}_{i}+\mathbf{b}_{i}^{*}\hat{\mathbf{v}}\hat{\mathbf{z}}^{*}\mathbf{b}_{i})^{2}}{2|\mathbf{b}_{i}^{*}\hat{\mathbf{z}}|^{2}}=2\cos^{2}\theta|\mathbf{b}_{i,1}|^{2}+4\cos\theta\sin\theta\,\mathrm{Re}(\mathbf{b}_{i,1}^{*}\mathbf{b}_{i,2})+2\sin^{2}\theta\frac{(\mathrm{Re}(\mathbf{b}_{i,1}^{*}\mathbf{b}_{i,2}))^{2}}{|\mathbf{b}_{i,1}|^{2}}. (20)

By the symmetry of the distribution of 𝐛i,2\mathbf{b}_{i,2}, when 𝐛i,1\mathbf{b}_{i,1} is fixed, 𝔼𝐛i,2Re(𝐛i,1𝐛i,2)=0\operatorname{\mathbb{E}}_{\mathbf{b}_{i,2}}{\mathrm{Re}(\mathbf{b}_{i,1}^{*}\mathbf{b}_{i,2})}=0 and 𝔼𝐛i,2(Re(𝐛i,1𝐛i,2))2|𝐛i,1|2=12𝔼𝐛i,2|𝐛i,1𝐛i,2|2|𝐛i,1|2=12𝔼𝐛i,2|𝐛i,2|2=12(1|𝐛i,1|2)\operatorname{\mathbb{E}}_{\mathbf{b}_{i,2}}\frac{(\mathrm{Re}(\mathbf{b}_{i,1}^{*}\mathbf{b}_{i,2}))^{2}}{|\mathbf{b}_{i,1}|^{2}}=\frac{1}{2}\operatorname{\mathbb{E}}_{\mathbf{b}_{i,2}}\frac{|\mathbf{b}_{i,1}^{*}\mathbf{b}_{i,2}|^{2}}{|\mathbf{b}_{i,1}|^{2}}=\frac{1}{2}\operatorname{\mathbb{E}}_{\mathbf{b}_{i,2}}|\mathbf{b}_{i,2}|^{2}=\frac{1}{2}(1-|\mathbf{b}_{i,1}|^{2}). Combining it with 𝔼|𝐛i,1|2=1/2\operatorname{\mathbb{E}}|\mathbf{b}_{i,1}|^{2}=1/2 and (20), we have

\operatorname{\mathbb{E}}\left[\frac{(\mathbf{b}_{i}^{*}\hat{\mathbf{z}}\hat{\mathbf{v}}^{*}\mathbf{b}_{i}+\mathbf{b}_{i}^{*}\hat{\mathbf{v}}\hat{\mathbf{z}}^{*}\mathbf{b}_{i})^{2}}{2|\mathbf{b}_{i}^{*}\hat{\mathbf{z}}|^{2}}\right]=\cos^{2}\theta+\frac{1}{2}\sin^{2}\theta\geq\frac{1}{2}\geq\frac{1}{4}.

Since for each 1im1\leq i\leq m,

(𝐛i𝐳^𝐯^𝐛i+𝐛i𝐯^𝐳^𝐛i)22|𝐛i𝐳^|22|𝐛i𝐯^|22,\frac{(\mathbf{b}_{i}^{*}\hat{\mathbf{z}}\hat{\mathbf{v}}^{*}\mathbf{b}_{i}+\mathbf{b}_{i}^{*}\hat{\mathbf{v}}\hat{\mathbf{z}}^{*}\mathbf{b}_{i})^{2}}{2|\mathbf{b}_{i}^{*}\hat{\mathbf{z}}|^{2}}\leq 2|\mathbf{b}_{i}^{*}\hat{\mathbf{v}}|^{2}\leq 2,

When the event in (19) holds,

there exists a set \mathcal{I} of size m/2m/2 such that for all ii\in\mathcal{I}, PSp(𝐯,𝐳)(𝐚i)2c1n\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|^{2}\geq\frac{c_{1}}{n}. (21)

Hoeffding’s inequality then gives the estimation

Pr(2mi(𝐛i𝐳𝐯𝐛i+𝐛i𝐯𝐳𝐛i)22|𝐛i𝐳|214t)1exp(mt24).\Pr\left(\frac{2}{m}\sum_{i\in\mathcal{I}}\frac{(\mathbf{b}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{b}_{i}+\mathbf{b}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{b}_{i})^{2}}{2|\mathbf{b}_{i}^{*}\mathbf{z}|^{2}}\geq\frac{1}{4}-t\right)\geq 1-\exp\left(-\frac{mt^{2}}{4}\right). (22)

Combining (18), (21), and (22),

Pr(2mi(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2(14t)c1n)1exp(mt24)exp(m8),\Pr\left(\frac{2}{m}\sum_{i\in\mathcal{I}}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq\Big{(}\frac{1}{4}-t\Big{)}\frac{c_{1}}{n}\right)\geq 1-\exp\left(-\frac{mt^{2}}{4}\right)-\exp\Big{(}-\frac{m}{8}\Big{)}, (23)

which leads to

Pr(i=1m(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2(14t)c1m2n)1exp(mt24)exp(m8).\Pr\left(\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq\Big{(}\frac{1}{4}-t\Big{)}\frac{c_{1}m}{2n}\right)\geq 1-\exp\left(-\frac{mt^{2}}{4}\right)-\exp\Big{(}-\frac{m}{8}\Big{)}. (24)

Second, we will establish a perturbation bound on 𝐯\mathbf{v}. For any 𝐯\mathbf{v}^{\prime} such that 𝐯𝐯ϵ\|\mathbf{v}^{\prime}-\mathbf{v}\|\leq\epsilon,

i=1m(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2i=1m(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2\displaystyle\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}-\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{\prime*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}^{\prime}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}
=\displaystyle= 4i=1mRe(𝐚i𝐳(𝐯+𝐯)𝐚i)Re(𝐚i𝐳(𝐯𝐯)𝐚i)2|𝐚i𝐳|24ϵi=1m𝐚i2=4mϵ.\displaystyle 4\sum_{i=1}^{m}\frac{\mathrm{Re}(\mathbf{a}_{i}^{*}\mathbf{z}(\mathbf{v}+\mathbf{v}^{\prime})^{*}\mathbf{a}_{i})\mathrm{Re}(\mathbf{a}_{i}^{*}\mathbf{z}(\mathbf{v}-\mathbf{v}^{\prime})^{*}\mathbf{a}_{i})}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\leq 4\epsilon\sum_{i=1}^{m}\|\mathbf{a}_{i}\|^{2}=4m\epsilon.

Combining it with (24), Lemma 1, and a standard ϵ\epsilon-net argument, we have

\Pr\left(\min_{\|\mathbf{v}\|=1}\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq\frac{c_{1}m}{2n}\Big(\frac{1}{4}-t\Big)-4m\epsilon\right)\geq 1-\left(\exp\left(-\frac{mt^{2}}{4}\right)+\exp\left(-\frac{m}{8}\right)\right)\left(1+\frac{2}{\epsilon}\right)^{2n}.

Setting $t=\frac{1}{12}$ and $\epsilon=\frac{c_{1}}{96n}$, (17) and Lemma 3 are proved.
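The claim used in the proof above, that $\Pr\big(\|P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}(\mathbf{a}_{i})\|^{2}\geq c_{1}/n\big)\geq\frac{3}{4}$ holds with $c_{1}=0.8$, can also be checked by simulation (an illustrative sketch; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 50, 100000
a = rng.standard_normal((trials, n)) + 1j * rng.standard_normal((trials, n))
a /= np.linalg.norm(a, axis=1, keepdims=True)   # uniform on the unit sphere of C^n
# by rotation invariance, take Sp(v, z) to be the span of the first two coordinate vectors
proj_norm2 = np.sum(np.abs(a[:, :2]) ** 2, axis=1)
print(np.mean(proj_norm2 >= 0.8 / n))           # roughly 0.8, comfortably above 3/4
```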

5.2 Proof of Lemma 4

In this section, we will first present a few lemmas and their proof, and then prove Lemma 4 in the end. In particular, we will estimate the expected value of |𝒮(𝐯,β)||\mathcal{S}(\mathbf{v},\beta)| in Lemma 5; then we will investigate the perturbation of |𝒮(𝐯,β)||\mathcal{S}(\mathbf{v},\beta)| when 𝐯\mathbf{v} is perturbed in Lemma 6. Next, an ϵ\epsilon-net argument will be used to give a uniform upper bound on |𝒮(𝐯,β)||\mathcal{S}(\mathbf{v},\beta)| for all 𝐯\mathbf{v}. Combining this uniform upper bound and Lemma 7, Lemma 4 is proved.

Lemma 5

Fix 𝐯\mathbf{v} with 𝐯=1\|\mathbf{v}\|=1 and assume β1\beta\geq 1, then for each 1im1\leq i\leq m,

Pr{|𝐚i𝐳|β|𝐚i𝐯|}β21+β2.\Pr\{|\mathbf{a}_{i}^{*}\mathbf{z}|\leq\beta|\mathbf{a}_{i}^{*}\mathbf{v}|\}\leq\frac{\beta^{2}}{1+\beta^{2}}.
Proof (Proof of Lemma 5)

First, we will show that it is sufficient to prove the case 𝐯𝐳\mathbf{v}\perp\mathbf{z}. Assume 𝐯⟂̸𝐳\mathbf{v}\not\perp\mathbf{z}, then there exists 𝐮\mathbf{u} such that 𝐮=1\|\mathbf{u}\|=1, 𝐮𝐳\mathbf{u}\perp\mathbf{z}, 𝐯Sp(𝐮,𝐳)\mathbf{v}\in\mathrm{Sp}(\mathbf{u},\mathbf{z}). WLOG assume that 𝐯=eiη1cosθ𝐳+eiη2sinθ𝐮\mathbf{v}=e^{i\eta_{1}}\cos\theta\mathbf{z}+e^{i\eta_{2}}\sin\theta\mathbf{u}, then we have |𝐚i𝐯|2=cos2θ|𝐚i𝐳|2+sin2θ|𝐚i𝐮|2|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}=\cos^{2}\theta|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}+\sin^{2}\theta|\mathbf{a}_{i}^{*}\mathbf{u}|^{2} and

|𝐚i𝐯|2|𝐚i𝐳|2=cos2θ|𝐚i𝐳|2+sin2θ|𝐚i𝐮|2|𝐚i𝐳|2=cos2θ+sin2θ|𝐚i𝐮|2|𝐚i𝐳|2.\frac{|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}}{|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}=\frac{\cos^{2}\theta|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}+\sin^{2}\theta|\mathbf{a}_{i}^{*}\mathbf{u}|^{2}}{|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}=\cos^{2}\theta+\sin^{2}\theta\frac{|\mathbf{a}_{i}^{*}\mathbf{u}|^{2}}{|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}. (25)

Assuming that i𝒮(𝐯,β)i\in\mathcal{S}(\mathbf{v},\beta), then |𝐚i𝐯|2|𝐚i𝐳|21β21\frac{|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}}{|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq\frac{1}{\beta^{2}}\geq 1 and (25) implies

|𝐚i𝐮|2|𝐚i𝐳|2|𝐚i𝐯|2|𝐚i𝐳|2,\frac{|\mathbf{a}_{i}^{*}\mathbf{u}|^{2}}{|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq\frac{|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}}{|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}},

so ii is also in 𝒮(𝐮,β)\mathcal{S}(\mathbf{u},\beta). Therefore, 𝒮(𝐯,β)𝒮(𝐮,β)\mathcal{S}(\mathbf{v},\beta)\subseteq\mathcal{S}(\mathbf{u},\beta), and it is sufficient to prove Lemma 5 for the case 𝐯𝐳\mathbf{v}\perp\mathbf{z}.

Note that Pr{|𝐚i𝐳|β|𝐚i𝐯|}\Pr\{|\mathbf{a}_{i}^{*}\mathbf{z}|\leq\beta|\mathbf{a}_{i}^{*}\mathbf{v}|\} does not change with the scaling of 𝐚i\mathbf{a}_{i}, so we may assume that 𝐚i\mathbf{a}_{i} are sampled from complex Gaussian distribution CN(0,2𝐈)CN(0,\sqrt{2}\mathbf{I}), i.e., each real and imaginary component are sampled from N(0,1)N(0,1). With 𝐯𝐳\mathbf{v}\perp\mathbf{z},

\Pr\left(|\mathbf{a}_{i}^{*}\mathbf{z}|\leq\beta|\mathbf{a}_{i}^{*}\mathbf{v}|\right)=\Pr\left(\sqrt{g_{1}^{2}+g_{2}^{2}}\leq\beta\sqrt{g_{3}^{2}+g_{4}^{2}}\right),

assuming that g1,g2,g3,g4g_{1},g_{2},g_{3},g_{4} are i.i.d. sampled from N(0,1)N(0,1). By calculation, both g12+g22\sqrt{g_{1}^{2}+g_{2}^{2}} and g32+g42\sqrt{g_{3}^{2}+g_{4}^{2}} have the probability density function xex2/2xe^{-x^{2}/2} with x0x\geq 0. Therefore,

\Pr\left(\sqrt{g_{1}^{2}+g_{2}^{2}}\leq\beta\sqrt{g_{3}^{2}+g_{4}^{2}}\right)=\int_{x=0}^{\infty}xe^{-x^{2}/2}\int_{y=\frac{x}{\beta}}^{\infty}ye^{-y^{2}/2}{\,\mathrm{d}}y{\,\mathrm{d}}x
=\displaystyle= x=0xex2/2ex2/2β2dx=β21+β2,\displaystyle\int_{x=0}^{\infty}xe^{-x^{2}/2}e^{-x^{2}/2\beta^{2}}{\,\mathrm{d}}x=\frac{\beta^{2}}{1+\beta^{2}},

and Lemma 5 is then proved.
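For $\mathbf{v}\perp\mathbf{z}$ the bound above is in fact an equality, which can be checked by simulation (an illustrative sketch with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials, beta = 6, 200000, 0.7
A = rng.standard_normal((trials, n)) + 1j * rng.standard_normal((trials, n))  # scaling is irrelevant
z = np.zeros(n, dtype=complex); z[0] = 1.0
v = np.zeros(n, dtype=complex); v[1] = 1.0        # v orthogonal to z
freq = np.mean(np.abs(A @ z) <= beta * np.abs(A @ v))
print(freq, beta ** 2 / (1 + beta ** 2))          # the two values should be close
```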

Lemma 6

For any 𝐯,𝐯𝒞n\mathbf{v},\mathbf{v}^{\prime}\in\mathcal{C}^{n} with 𝐯𝐯ϵ\|\mathbf{v}-\mathbf{v}^{\prime}\|\leq\epsilon, we have

|\mathcal{S}(\mathbf{v},\beta)|\leq\Big|\mathcal{S}\Big(\mathbf{v}^{\prime},\frac{\beta}{1-\epsilon c_{2}}\Big)\Big|+|\{1\leq i\leq m:\|\mathbf{a}_{i}\|/|\mathbf{a}_{i}^{*}\mathbf{v}|\geq c_{2}\}|.
Proof (Proof of Lemma 6)

If i𝒮(𝐯,β)i\in\mathcal{S}(\mathbf{v},\beta) and 𝐚i/|𝐚i𝐯|c2\|\mathbf{a}_{i}^{*}\|/|\mathbf{a}_{i}^{*}\mathbf{v}|\leq c_{2}, then

|𝐚i𝐯||𝐚i𝐯|+ϵ𝐚i|𝐚i𝐯|+ϵc2|𝐚i𝐯|,|\mathbf{a}_{i}^{*}\mathbf{v}|\leq|\mathbf{a}_{i}^{*}\mathbf{v}^{\prime}|+\epsilon\|\mathbf{a}_{i}^{*}\|\leq|\mathbf{a}_{i}^{*}\mathbf{v}^{\prime}|+\epsilon c_{2}|\mathbf{a}_{i}^{*}\mathbf{v}|,

so 𝐚i𝐳β|𝐚i𝐯|β1ϵc2|𝐚i𝐯|\|\mathbf{a}_{i}^{*}\mathbf{z}\|\leq\beta|\mathbf{a}_{i}^{*}\mathbf{v}|\leq\frac{\beta}{1-\epsilon c_{2}}|\mathbf{a}_{i}^{*}\mathbf{v}^{\prime}|.

The following is a restatement of (10.1093/imaiai/iay005, , Theorem 5.7). While the statement is proved for the real-valued case, it can be generalized to the complex-valued case by treating the real component and the imaginary component separately.

Lemma 7

Let $0<\delta<1/2$ and $0<c_{3}<1$, and suppose $m\geq\max(n,\log(1/\delta)/c_{3})$; then with probability at least $1-2\delta$, for any set $\mathcal{S}\subseteq\{1,\cdots,m\}$ with $|\mathcal{S}|\leq c_{3}m$, we have

i𝒮𝐚i𝐚iCc3mn.\Big{\|}\sum_{i\in\mathcal{S}}\mathbf{a}_{i}\mathbf{a}_{i}^{*}\Big{\|}\leq C\sqrt{c_{3}}\frac{m}{n}.
Proof (Proof of Lemma 4)

To investigate the probability that $\|\mathbf{a}_{i}\|/|\mathbf{a}_{i}^{*}\mathbf{v}|\geq c_{2}$, WLOG we may rescale $\mathbf{a}_{i}\in\mathcal{C}^{n}$ so that the real and imaginary parts of each entry of $\mathbf{a}_{i}$ are i.i.d. sampled from $N(0,1)$. Then $|\mathbf{a}_{i}^{*}\mathbf{v}|$ has the p.d.f. $f(x)=xe^{-x^{2}/2}$ for $x\geq 0$, and

Pr(|𝐚i𝐯|t)=exp(t2/2).\Pr(|\mathbf{a}_{i}^{*}\mathbf{v}|\geq t)=\exp(-t^{2}/2). (26)

In addition, $\|\mathbf{a}_{i}\|^{2}$ has the same distribution as a $\chi^{2}$ random variable with $2n$ degrees of freedom. As a result, the tail bound for the $\chi^{2}$ distribution (wainwright2019high, , Example 2.11) implies (using the one-sided inequality with $t=1$ and with $n$ replaced by $2n$ in their notation)

Pr(𝐚i24n)exp(n/4).\Pr(\|\mathbf{a}_{i}\|^{2}\geq 4n)\leq\exp(-n/4). (27)

Combining (26) with t=2nc2t=\frac{2\sqrt{n}}{c_{2}} and (27), we have

Pr(𝐚i/|𝐚i𝐯|c2)1exp(2n/c22)+exp(n/4).\Pr(\|\mathbf{a}_{i}^{*}\|/|\mathbf{a}_{i}^{*}\mathbf{v}|\geq c_{2})\leq 1-\exp(-2n/c_{2}^{2})+\exp(-n/4).

Combining it with Lemma 5, we have that for each 1im1\leq i\leq m,

Pr(𝐚i|𝐚i𝐯|c2)1exp(2nc22)+exp(n4).\Pr\left(\frac{\|\mathbf{a}_{i}\|}{|\mathbf{a}_{i}^{*}\mathbf{v}|}\geq c_{2}\right)\leq 1-\exp\Big{(}-\frac{2n}{c_{2}^{2}}\Big{)}+\exp\Big{(}-\frac{n}{4}\Big{)}.

Applying Hoeffding inequality, for each 𝐯\mathbf{v},

Pr(|{1im:𝐚i|𝐚i𝐯|c2}|m(1exp(2nc22)+exp(n4)+t))\displaystyle\Pr\left(\Big{|}\Big{\{}1\leq i\leq m:\frac{\|\mathbf{a}_{i}\|}{|\mathbf{a}_{i}^{*}\mathbf{v}|}\geq c_{2}\Big{\}}\Big{|}\geq m\Big{(}1-\exp\Big{(}-\frac{2n}{c_{2}^{2}})+\exp\Big{(}-\frac{n}{4}\Big{)}+t\Big{)}\right)
\displaystyle\leq exp(2mt2).\displaystyle\exp(-2mt^{2}). (28)

Similarly, applying Hoeffding inequality to Lemma 5, for each 𝐯\mathbf{v},

Pr(|𝒮(𝐯,β)}|m(β21+β2+t))exp(2mt2).\displaystyle\Pr\left(\Big{|}\mathcal{S}(\mathbf{v},\beta)\Big{\}}\Big{|}\geq m\Big{(}\frac{\beta^{2}}{1+\beta^{2}}+t\Big{)}\right)\leq\exp(-2mt^{2}). (29)

Combining (28), (29) with c2=1/2ϵc_{2}=1/2\epsilon, then the standard ϵ\epsilon-net argument, Lemma 1, and Lemma 6 imply

Pr(max𝐯=1|𝒮(𝐯,β2)|m(1exp(8nϵ2)+exp(n4)+β21+β2+2t))\displaystyle\Pr\left(\max_{\|\mathbf{v}\|=1}\Big{|}\mathcal{S}\Big{(}\mathbf{v},\frac{\beta}{2}\Big{)}\Big{|}\geq m\Big{(}1-\exp(-8n\epsilon^{2})+\exp\big{(}-\frac{n}{4}\big{)}+\frac{\beta^{2}}{1+\beta^{2}}+2t\Big{)}\right)
\displaystyle\leq 2exp(2mt2)(1+2/ϵ)2n.\displaystyle 2\exp(-2mt^{2})(1+2/\epsilon)^{2n}. (30)

Let ϵ=1/n\epsilon=1/n, η=exp(nlogn)\eta=\exp(-n\log n), t=β2/2t=\beta^{2}/2, c3=2β2c_{3}=2\beta^{2}, note that 1exp(8nϵ2)+exp(n4)01-\exp(-8n\epsilon^{2})+\exp\big{(}-\frac{n}{4}\big{)}\rightarrow 0 as nn\rightarrow\infty, Lemma 4 is then proved using Lemma 7 and the fact that i𝒮𝐚i𝐚ii𝒮|𝐚i𝐯|2\Big{\|}\sum_{i\in\mathcal{S}}\mathbf{a}_{i}\mathbf{a}_{i}^{*}\Big{\|}\geq\sum_{i\in\mathcal{S}}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}.

6 The unitary model

In this section, we will show that under the unitary model, $L$ defined in (7) and (12) can be estimated and admits a lower bound, as follows:

Theorem 5

Under the unitary model, there exist $c_{0}$ and $\alpha$ such that $2c_{0}\alpha<1$ and

L=nmmin𝐯=1{12i=1m(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|26α1i=1m|𝐚i𝐯|2(2+4α)i𝒮(𝐯,c0α)|𝐚i𝐯|2}.L=\frac{n}{m}\min_{\|\mathbf{v}\|=1}\left\{\!\!\frac{1}{2}\!\sum_{i=1}^{m}\!\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\!\mathbf{a}_{i}\!\!+\!\!\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\!\mathbf{a}_{i}\!)^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\!-\!\frac{6}{\alpha\!-\!1}\!\!\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\!-\!(2\!+\!4\alpha)\!\!\!\!\!\!\!\!\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\!\!\!\!\!\!\!\!|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\!\!\right\}. (31)

is bounded from below with high probability, that is, L>cL>c for some c>0c>0 with probability 1Cnexp(Cn)1-Cn\exp(-Cn).

We will control each of the three terms in (31) separately. For the term $\frac{6}{\alpha-1}\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}$, the definition of the unitary model implies $\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}=K=\frac{m}{n}$ for every unit vector $\mathbf{v}$, so this term equals $\frac{6}{\alpha-1}\frac{m}{n}$. Assume that for some $c_{4}>0$,

\Pr\left\{\min_{\|\mathbf{v}\|=1}\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq\frac{c_{4}}{2}\frac{m}{n}\right\}\geq 1-C\exp(-Cm+n\log n), (32)

and that there exists c8>0c_{8}>0 such that for any β\beta,

Pr{max𝐯=1i𝒮(𝐯,β)|𝐚i𝐯|2c8βmn}1Cexp(Cm+nlogn),\Pr\left\{\max_{\|\mathbf{v}\|=1}\sum_{i\in\mathcal{S}(\mathbf{v},\beta)}\!\!|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\leq c_{8}\beta\frac{m}{n}\!\!\right\}\geq 1-C\exp(-Cm+n\log n), (33)

then we can choose α\alpha and c0c_{0} such that 2c0α<12c_{0}\alpha<1 and L>cL>c for some c>0c>0 with probability 1Cnexp(Cn)1-Cn\exp(-Cn). That is, Theorem 5 is proved.
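The identity $\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}=K=m/n$ used above follows because each block of $n$ sensing vectors forms an orthonormal basis of $\mathcal{C}^{n}$; a small sketch of the unitary model (for illustration only; exact Haar sampling would adjust the QR factor's phases, which does not affect this identity) is:

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 16, 6
m = K * n
# unitary model: K independent random unitary matrices; their columns are the a_i's
blocks = [np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))[0]
          for _ in range(K)]
A = np.concatenate([U.conj().T for U in blocks], axis=0)   # rows of A are a_i^*
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v /= np.linalg.norm(v)
print(np.sum(np.abs(A @ v) ** 2), m / n)                   # both equal K = m/n
```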

6.1 Proof of (32)

For the component min𝐯=1i=1m(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|2\min_{\|\mathbf{v}\|=1}\sum_{i=1}^{m}\!\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\!\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\!\mathbf{a}_{i}\!)^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}, we will use a two-step procedure: First, we will show that for any fixed 𝐯𝒞n\mathbf{v}\in\mathcal{C}^{n} with 𝐯=1\|\mathbf{v}\|=1, there exists some c4>0c_{4}>0 such that

\Pr\left\{\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\geq c_{4}\frac{m}{n}\right\}\geq 1-C\exp(-Cm). (34)

Second, we will apply an ϵ\epsilon-net argument to the set {𝐯𝒞n:𝐯=1}\{\mathbf{v}\in\mathcal{C}^{n}:\|\mathbf{v}\|=1\}.

To prove (34), considering that the expression only depends on the inner products 𝐚i𝐯\mathbf{a}_{i}^{*}\mathbf{v} and 𝐚i𝐳\mathbf{a}_{i}^{*}\mathbf{z}, it is equivalent to work with 𝐚~i=PSp(𝐯,𝐳)𝐚i\tilde{\mathbf{a}}_{i}=P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}\mathbf{a}_{i}, 𝐯~=PSp(𝐯,𝐳)𝐯\tilde{\mathbf{v}}=P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}\mathbf{v}, and 𝐳~=PSp(𝐯,𝐳)𝐳\tilde{\mathbf{z}}=P_{\mathrm{Sp}(\mathbf{v},\mathbf{z})}\mathbf{z}, that is, 𝐚~i,𝐯~,𝐳~𝒞2\tilde{\mathbf{a}}_{i},\tilde{\mathbf{v}},\tilde{\mathbf{z}}\in\mathcal{C}^{2} are obtained by projecting these vectors to the subspace spanned by 𝐯\mathbf{v} and 𝐳\mathbf{z}. Then for any 1kK1\leq k\leq K, [𝐚~(k1)n+1,,𝐚~kn]𝒞n×2[\tilde{\mathbf{a}}_{(k-1)n+1},\cdots,\tilde{\mathbf{a}}_{kn}]\in\mathcal{C}^{n\times 2} is a random orthogonal matrix in 𝒞n×2\mathcal{C}^{n\times 2}. While {𝐚~i}i=1m\{\tilde{\mathbf{a}}_{i}\}_{i=1}^{m} are not independently distributed, their correlation is weak and can be decomposed as follows: For 1kK1\leq k\leq K, let 𝐒k\mathbf{S}_{k} be a random matrix that represents the covariance of a Gaussian matrix of n×2n\times 2, then {𝐛i}i=1m\{\mathbf{b}_{i}\}_{i=1}^{m} defined by 𝐛i=𝐚~i𝐒(i1)/n+1\mathbf{b}_{i}=\tilde{\mathbf{a}}_{i}\sqrt{\mathbf{S}_{\lfloor(i-1)/n\rfloor+1}} are i.i.d. sampled from N(0,𝐈2×2)N(0,\mathbf{I}_{2\times 2}). Let tt be a parameter such that

Pr{max1kK𝐒k/n𝐈t}1/2.\Pr\{\max_{1\leq k\leq K}\|\sqrt{\mathbf{S}_{k}}/\sqrt{n}-\mathbf{I}\|\leq t\}\geq 1/2. (35)

By the property of the singular values of a random Gaussian matrix (Szarek2001, Theorem 2.13) and a union bound over K=mnK=\frac{m}{n} matrices {𝐒k}k=1K\{\mathbf{S}_{k}\}_{k=1}^{K}, we may let t=clogmnnt=\frac{c\log\frac{m}{n}}{\sqrt{n}}, which converges to zero as n,mn,m\rightarrow\infty since n>log2m\sqrt{n}>\log^{2}m. We remark that

when the event in (35) holds, 𝐚~i1n𝐛it𝐚~i\|\tilde{\mathbf{a}}_{i}-\frac{1}{\sqrt{n}}\mathbf{b}_{i}\|\leq t\|\tilde{\mathbf{a}}_{i}\| for all 1im1\leq i\leq m. (36)
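
The decoupling above can be checked numerically: if \mathbf{G}\in\mathcal{C}^{n\times 2} has i.i.d. standard Gaussian entries and \mathbf{S}=\mathbf{G}^{*}\mathbf{G}, then \mathbf{G}\mathbf{S}^{-1/2} has orthonormal columns (playing the role of the projected block [\tilde{\mathbf{a}}_{(k-1)n+1},\cdots,\tilde{\mathbf{a}}_{kn}]^{*}) and \|\sqrt{\mathbf{S}}/\sqrt{n}-\mathbf{I}\| is small, consistent with (35) and (36). The sketch below assumes the normalization \mathrm{E}|\mathbf{G}_{jl}|^{2}=1; the helper sqrt_herm and the sample sizes are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 500, 20                                  # block size and number of blocks (illustrative)

def sqrt_herm(S):
    # Hermitian square root via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

devs = []
for _ in range(K):
    # rows of G play the role of b_i^* within one block of n sensing vectors
    G = (rng.standard_normal((n, 2)) + 1j * rng.standard_normal((n, 2))) / np.sqrt(2)
    S = G.conj().T @ G                          # 2x2 matrix with E[S] = n * I
    Q = G @ np.linalg.inv(sqrt_herm(S))         # rows play the role of the projected a_i^*
    assert np.allclose(Q.conj().T @ Q, np.eye(2), atol=1e-8)   # orthonormal columns
    devs.append(np.linalg.norm(sqrt_herm(S) / np.sqrt(n) - np.eye(2), 2))

print("max_k ||sqrt(S_k)/sqrt(n) - I|| =", max(devs))   # decays like 1/sqrt(n), cf. (35)
```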

Let \mathcal{I}_{1}=\{1\leq i\leq m:\|\tilde{\mathbf{a}}_{i}\|\geq c_{5}/\sqrt{n},\ |\tilde{\mathbf{a}}_{i}^{*}\mathbf{z}|\geq c_{6}\|\tilde{\mathbf{a}}_{i}\|,\ |\tilde{\mathbf{a}}_{i}^{*}\mathbf{v}|\geq c_{6}\|\tilde{\mathbf{a}}_{i}\|,\ \frac{\mathrm{Re}(\tilde{\mathbf{a}}_{i}^{*}\tilde{\mathbf{z}}\tilde{\mathbf{v}}^{*}\tilde{\mathbf{a}}_{i})}{|\tilde{\mathbf{v}}^{*}\tilde{\mathbf{a}}_{i}||\tilde{\mathbf{z}}^{*}\tilde{\mathbf{a}}_{i}|}\geq c_{7}\}, and let \mathcal{I}_{2}=\{1\leq i\leq m:\|\mathbf{b}_{i}\|\geq c_{5}(1-t),\ |\mathbf{b}_{i}^{*}\tilde{\mathbf{z}}|\geq(c_{6}-t)\|\mathbf{b}_{i}\|,\ |\mathbf{b}_{i}^{*}\tilde{\mathbf{v}}|\geq(c_{6}-t)\|\mathbf{b}_{i}\|,\ \frac{\mathrm{Re}(\mathbf{b}_{i}^{*}\tilde{\mathbf{z}}\tilde{\mathbf{v}}^{*}\mathbf{b}_{i})}{|\tilde{\mathbf{v}}^{*}\mathbf{b}_{i}||\tilde{\mathbf{z}}^{*}\mathbf{b}_{i}|}\geq c_{7}-3t/c_{6}\}. By the definitions and (36), when the event in (35) holds, \bar{\mathcal{I}}_{1}\subseteq\bar{\mathcal{I}}_{2}. Since the event in (35) is independent of \{\tilde{\mathbf{a}}_{i}\}_{i=1}^{m}, for any \alpha>0,

\Pr\{|\bar{\mathcal{I}}_{2}|\geq\alpha m\}\geq\Pr\{|\bar{\mathcal{I}}_{1}|\geq\alpha m\}\Pr\Big\{\max_{1\leq k\leq K}\|\sqrt{\mathbf{S}_{k}}/\sqrt{n}-\mathbf{I}\|\leq t\Big\}\geq\frac{1}{2}\Pr\{|\bar{\mathcal{I}}_{1}|\geq\alpha m\}.

On the other hand, clearly we can choose c5,c6,c7,0<α<1c_{5},c_{6},c_{7},0<\alpha<1 such that

Pr{|¯2|αm}=Pr{|2|(1α)m}Cexp(Cm),\Pr\{|\bar{\mathcal{I}}_{2}|\geq\alpha m\}=\Pr\{|{\mathcal{I}}_{2}|\leq(1-\alpha)m\}\leq C\exp(-Cm),

which implies that

Pr{|1|(1α)m}=Pr{|¯1|αm}2Pr{|¯2|αm}Cexp(Cm).\Pr\{|{\mathcal{I}}_{1}|\leq(1-\alpha)m\}=\Pr\{|\bar{\mathcal{I}}_{1}|\geq\alpha m\}\leq 2\Pr\{|\bar{\mathcal{I}}_{2}|\geq\alpha m\}\leq C\exp(-Cm).

As a result, with probability 1-C\exp(-Cm) we have |\mathcal{I}_{1}|\geq(1-\alpha)m, and therefore

i=1m(𝐚~i𝐳~𝐯~𝐚~i+𝐚~i𝐯~𝐳~𝐚~i)22|𝐚~i𝐳|2(1α)c52c62c72mn,\sum_{i=1}^{m}\!\frac{(\tilde{\mathbf{a}}_{i}^{*}\tilde{\mathbf{z}}\tilde{\mathbf{v}}^{*}\!\tilde{\mathbf{a}}_{i}\!\!+\!\!\tilde{\mathbf{a}}_{i}^{*}\tilde{\mathbf{v}}\tilde{\mathbf{z}}^{*}\!\tilde{\mathbf{a}}_{i}\!)^{2}}{2|\tilde{\mathbf{a}}_{i}^{*}\mathbf{z}|^{2}}\geq(1-\alpha)c_{5}^{2}c_{6}^{2}c_{7}^{2}\frac{m}{n},

and (34) is proved with c4=(1α)c52c62c72c_{4}=(1-\alpha)c_{5}^{2}c_{6}^{2}c_{7}^{2}.
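
The choice of constants in this step can be illustrated with a short Monte Carlo computation: for i.i.d. Gaussian \mathbf{b}_{i}\in\mathcal{C}^{2}, the probability that a single index satisfies conditions of the type defining \mathcal{I}_{2} is bounded away from zero once c_{5},c_{6},c_{7} are small, which is what allows \alpha to be chosen so that \Pr\{|\mathcal{I}_{2}|\leq(1-\alpha)m\}\leq C\exp(-Cm). The constants, the test pair \tilde{\mathbf{z}},\tilde{\mathbf{v}}, and the sample size below are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000                                 # Monte Carlo samples of a single b in C^2
c5, c6, c7 = 0.3, 0.3, 0.3                  # hypothetical small constants

# a generic (non-degenerate) unit-norm pair z_tilde, v_tilde in C^2
z_t = np.array([1.0, 0.0], dtype=complex)
v_t = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)

b = (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))) / np.sqrt(2)
bz, bv = b.conj() @ z_t, b.conj() @ v_t     # b^* z_tilde and b^* v_tilde
nb = np.linalg.norm(b, axis=1)

good = ((nb >= c5) & (np.abs(bz) >= c6 * nb) & (np.abs(bv) >= c6 * nb)
        & (np.real(bz * np.conj(bv)) >= c7 * np.abs(bz) * np.abs(bv)))
print("per-index probability of the I_2-type conditions:", good.mean())
# if 1 - alpha is below this probability, Hoeffding/Chernoff gives
# Pr{ |I_2| <= (1 - alpha) m } <= exp(-c m) for some c > 0
```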

It remains to apply a standard \epsilon-net argument: noting that for unit vectors \mathbf{v} and \mathbf{v}^{\prime},

\sum_{i=1}^{m}\Big{|}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}-\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{\prime*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}^{\prime}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\Big{|}\leq 4\sum_{i=1}^{m}\|\mathbf{a}_{i}\|^{2}\|\mathbf{v}-\mathbf{v}^{\prime}\|=4m\|\mathbf{v}-\mathbf{v}^{\prime}\|

and using \epsilon=\frac{c_{4}}{8n}, we obtain (32).
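
For concreteness, the source of the n\log n term in the exponent (and hence of the sampling requirement m>O(n\log n) discussed in Section 7.1) can be traced through this union bound: by a standard volume argument (see, e.g., Vershynin2012 ), the unit sphere of \mathcal{C}^{n}, viewed as the sphere of \mathbb{R}^{2n}, admits an \epsilon-net \mathcal{N}_{\epsilon} with |\mathcal{N}_{\epsilon}|\leq(1+2/\epsilon)^{2n}. Applying (34) at every point of the net and taking a union bound gives a failure probability of at most

C\exp(-Cm)\cdot(1+2/\epsilon)^{2n}\leq C\exp\big(-Cm+2n\log(1+2/\epsilon)\big)\leq C\exp(-Cm+Cn\log n)

whenever 1/\epsilon is polynomial in n, which is the form of the bound in (32); the constants here are illustrative rather than optimized.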

6.2 Proof of (33)

The proof of (33) is similar to that of (32): the bound will first be proved for a fixed \mathbf{v}, and then an \epsilon-net argument will be used. For any fixed \mathbf{v}, we use the vectors \tilde{\mathbf{a}}_{i} and \mathbf{b}_{i} defined in the proof of (32).

For any fixed \mathbf{v}, the first part of the proof of Lemma 5 implies that it is sufficient to assume \mathbf{v}\perp\mathbf{z} and \tilde{\mathbf{v}}\perp\tilde{\mathbf{z}}. When i\in\mathcal{S}(\mathbf{v},\beta), we have \beta|\tilde{\mathbf{a}}_{i}^{*}\tilde{\mathbf{v}}|\geq|\tilde{\mathbf{a}}_{i}^{*}\tilde{\mathbf{z}}|. When the event in (35) holds, (36) implies that |\mathbf{b}_{i}^{*}\tilde{\mathbf{v}}|/|\mathbf{b}_{i}^{*}\tilde{\mathbf{z}}|\geq\frac{1-t}{\beta+t}, an event that happens with probability at most (\frac{\beta+t}{1-t})^{2} by Lemma 5.
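
The quadratic dependence on \beta in this step can be checked empirically: for a Gaussian vector in \mathcal{C}^{2} and an orthonormal pair \tilde{\mathbf{z}}\perp\tilde{\mathbf{v}}, the coordinates \mathbf{b}^{*}\tilde{\mathbf{z}} and \mathbf{b}^{*}\tilde{\mathbf{v}} are independent, and the probability that |\mathbf{b}^{*}\tilde{\mathbf{z}}|\leq\beta|\mathbf{b}^{*}\tilde{\mathbf{v}}| is approximately \beta^{2} for small \beta. This is a sketch under the assumption that Lemma 5 asserts a bound of this form; the exact statement appears earlier in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500_000
b = (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))) / np.sqrt(2)
# |b^* z_tilde| and |b^* v_tilde| for z_tilde = e_1 and v_tilde = e_2
bz, bv = np.abs(b[:, 0]), np.abs(b[:, 1])
for beta in (0.05, 0.1, 0.2):
    print(beta, "empirical:", np.mean(bz <= beta * bv), " beta^2:", beta ** 2)
```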

Applying Hoeffding’s inequality for subgaussian distributions, there exists \gamma_{0}>0 such that with probability at most C\exp(-Cm), \sum_{1\leq i\leq m:|\mathbf{b}_{i}^{*}\tilde{\mathbf{v}}|/|\mathbf{b}_{i}^{*}\tilde{\mathbf{z}}|\geq\frac{1-t}{\beta+t}}|\mathbf{b}_{i}^{*}\tilde{\mathbf{v}}|^{2}\geq(\frac{\beta+t}{1-t})^{2}\gamma_{0}m. By the same argument as in the proof of (34) and the bound \big||\mathbf{b}_{i}^{*}\tilde{\mathbf{v}}/\sqrt{n}|^{2}-|\tilde{\mathbf{a}}_{i}^{*}\tilde{\mathbf{v}}|^{2}\big|\leq t^{2}\|\tilde{\mathbf{a}}_{i}\|^{2}, we have that with probability at most 2C\exp(-Cm),

i𝒮(𝐯,β)|𝐚~i𝐯|2(β+t1t)2γ0mn+t2mn.\sum_{i\in\mathcal{S}(\mathbf{v},\beta)}|\tilde{\mathbf{a}}_{i}^{*}\mathbf{v}|^{2}\geq(\frac{\beta+t}{1-t})^{2}\gamma_{0}\frac{m}{n}+t^{2}\frac{m}{n}.

Combining this with an \epsilon-net argument with \epsilon=c/n and with Lemma 6 (applied with c_{2}=n/(2c)), and noting that

\sum_{1\leq i\leq m:\|\mathbf{a}_{i}\|/|\mathbf{a}_{i}^{*}\mathbf{v}|\geq c_{2}}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\leq\sum_{1\leq i\leq m:\|\mathbf{a}_{i}\|/|\mathbf{a}_{i}^{*}\mathbf{v}|\geq c_{2}}\|\mathbf{a}_{i}\|^{2}/c_{2}^{2}\leq m/c_{2}^{2},

(33) is proved.
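
As with (32), the bound (33) can be checked numerically under the unitary model for a randomly drawn \mathbf{v} (rather than the maximizer), again taking \mathcal{S}(\mathbf{v},\beta)=\{i:|\mathbf{a}_{i}^{*}\mathbf{z}|\leq\beta|\mathbf{a}_{i}^{*}\mathbf{v}|\} as in the earlier sketch; the observed sums are well below \beta m/n, consistent with (33).

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 50, 40
m = n * K

def haar_unitary(n):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

A = np.hstack([haar_unitary(n) for _ in range(K)]).T     # rows are the sensing vectors a_i
z = rng.standard_normal(n) + 1j * rng.standard_normal(n); z /= np.linalg.norm(z)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n); v /= np.linalg.norm(v)
Az, Av = np.abs(A.conj() @ z), np.abs(A.conj() @ v)

for beta in (0.1, 0.3, 0.5):
    S = Az <= beta * Av                                  # S(v, beta)
    print(beta, "sum over S(v, beta):", np.sum(Av[S] ** 2), " beta * m / n:", beta * m / n)
```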

7 Discussion

7.1 Comparison with existing analysis of real-valued objects

This section compares the analysis in this paper with the analysis of the same algorithm for real-valued objects by Tan and Vershynin 10.1093/imaiai/iay005 , since both works establish a deterministic condition for convergence and then verify this condition under a probabilistic model.

First, the deterministic condition in 10.1093/imaiai/iay005 can be rewritten as follows: there exists \theta such that for all “wedges of angle \theta” \mathcal{W} in \mathbb{R}^{n},

1mλmin(i=1m𝐚i𝐚iT4𝐚i𝒲𝐚i𝐚iT)cn,\frac{1}{m}\lambda_{\min}\Big{(}\sum_{i=1}^{m}\mathbf{a}_{i}\mathbf{a}_{i}^{T}-4\sum_{\mathbf{a}_{i}\in\mathcal{W}}\mathbf{a}_{i}\mathbf{a}_{i}^{T}\Big{)}\geq\frac{c}{n}, (37)

and here a wedge of angle \theta is the region of the sphere between two hemispheres whose normal vectors make an angle of \theta.

In comparison, combining Theorems 2 and 3, the deterministic result in this paper requires the existence of some c0>0c_{0}>0 and α>1\alpha>1 such that

min𝐯=1{12i=1m(𝐚i𝐳𝐯𝐚i+𝐚i𝐯𝐳𝐚i)22|𝐚i𝐳|26α1i=1m|𝐚i𝐯|2(2+4α)i𝒮(𝐯,c0α)|𝐚i𝐯|2}cn.\min_{\|\mathbf{v}\|=1}\!\left\{\!\frac{1}{2}\!\sum_{i=1}^{m}\!\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\!\mathbf{a}_{i}\!+\!\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\!\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}\!-\!\frac{6}{\alpha\!-\!1}\!\!\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\!-\!(2\!+\!4\alpha\!)\!\!\!\!\!\!\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}\!\!\!\!\!\!|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}\!\right\}\!\geq\!\frac{c}{n}. (38)

The term \sum_{i=1}^{m}\mathbf{a}_{i}\mathbf{a}_{i}^{T} in (37) is comparable to the terms \frac{1}{2}\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}-\frac{6}{\alpha-1}\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2} in (38). Under the real-valued setting, the latter simplifies to \left(1-\frac{6}{\alpha-1}\right)\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}, and minimizing it over all \|\mathbf{v}\|=1 gives the smallest eigenvalue of \sum_{i=1}^{m}\mathbf{a}_{i}\mathbf{a}_{i}^{T}, up to the factor 1-\frac{6}{\alpha-1}.
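
For the reader's convenience, the real-valued simplification mentioned above is the following one-line identity: when \mathbf{a}_{i},\mathbf{v},\mathbf{z}\in\mathbb{R}^{n},

\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}=\frac{(2\,\mathbf{a}_{i}^{T}\mathbf{z}\,\mathbf{a}_{i}^{T}\mathbf{v})^{2}}{2(\mathbf{a}_{i}^{T}\mathbf{z})^{2}}=2(\mathbf{a}_{i}^{T}\mathbf{v})^{2},

so that \frac{1}{2}\sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}=\sum_{i=1}^{m}(\mathbf{a}_{i}^{T}\mathbf{v})^{2}=\mathbf{v}^{T}\Big(\sum_{i=1}^{m}\mathbf{a}_{i}\mathbf{a}_{i}^{T}\Big)\mathbf{v}.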

The term 𝐚i𝒲𝐚i𝐚iT\sum_{\mathbf{a}_{i}\in\mathcal{W}}\mathbf{a}_{i}\mathbf{a}_{i}^{T} in (37) and the set 𝒲\mathcal{W} are also comparable to the term (2+4α)i𝒮(𝐯,c0α)|𝐚i𝐯|2(2+4\alpha)\sum_{i\in\mathcal{S}(\mathbf{v},c_{0}\alpha)}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2} in (38) and the set 𝒮(𝐯,c0α)\mathcal{S}(\mathbf{v},c_{0}\alpha). In fact, the set 𝒮(𝐯,c0α)\mathcal{S}(\mathbf{v},c_{0}\alpha) also has the “wedge” shape under the real-valued setting, and both works attempt to show that the number of sensing vectors in the set is small.

Second, the probabilistic analyses in the two works also share connections, which is natural given the similarities between the deterministic conditions. However, 10.1093/imaiai/iay005 achieves the bound m=O(n), which is a logarithmic factor better than the bound m=O(n\log n) in Theorem 1. Looking into the analysis of both works, the extra \log n factor comes from the estimation of \sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}} in Lemma 3 and the estimation of the size of \mathcal{S}(\mathbf{v},c_{0}\alpha) in Lemma 4, where simple \epsilon-net arguments are used. In the real-valued setting of 10.1093/imaiai/iay005 , \sum_{i=1}^{m}\frac{(\mathbf{a}_{i}^{*}\mathbf{z}\mathbf{v}^{*}\mathbf{a}_{i}+\mathbf{a}_{i}^{*}\mathbf{v}\mathbf{z}^{*}\mathbf{a}_{i})^{2}}{2|\mathbf{a}_{i}^{*}\mathbf{z}|^{2}}=2\sum_{i=1}^{m}|\mathbf{a}_{i}^{*}\mathbf{v}|^{2}, so a standard result on the smallest eigenvalue of \sum_{i=1}^{m}\mathbf{a}_{i}\mathbf{a}_{i}^{*} can be used, and the number of sensing vectors in the set \mathcal{W} is uniformly bounded by applying VC theory; these arguments do not have natural generalizations to the complex-valued setting, and the \epsilon-net argument used instead gives an additional \log n factor. It would be interesting to investigate whether there exist more careful arguments for the complex-valued setting such that the \log n factor could be removed.

Finally, while there are similarities between this work and 10.1093/imaiai/iay005 , the fundamental difference lies in the argument for the deterministic condition in Theorems 2 and 3, which relates the convergence of the randomized Kaczmarz algorithm to the local convexity of an objective function. In comparison, the straightforward calculation in 10.1093/imaiai/iay005 is based on the fact that there are only two possible phases, \pm 1, in the real-valued setting.

7.2 Initialization

Theorem 1 requires an initialization such that \|\mathbf{x}^{(0)}-\mathbf{z}\|\leq c_{0}\sqrt{\delta_{1}}. Many schemes have been proposed for obtaining a good initialization (10.1093/imaiai/iay005, Section B). For example, we may use the truncated spectral method, which sets \hat{\mathbf{x}}^{(0)}=\lambda_{0}\tilde{\mathbf{x}}, where \lambda_{0}=\sqrt{\frac{1}{m}\sum_{i=1}^{m}y_{i}^{2}} and \tilde{\mathbf{x}} is the leading eigenvector of Y=\frac{1}{m}\sum_{i=1}^{m}y_{i}^{2}\mathbf{a}_{i}\mathbf{a}_{i}^{*}I(y_{i}\leq 3\lambda_{0}). Following the analysis in (10.1093/imaiai/iay005, Section B), one can show that the requirement on the initialization \|\hat{\mathbf{x}}^{(0)}-\mathbf{z}\|\leq c_{0}\sqrt{\delta_{1}}\|\mathbf{z}\| holds with probability at least 1-\delta as long as m\geq C(\log(1/\delta)+n)/(c_{0}^{2}\delta_{1}).
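
A minimal sketch of this truncated spectral initialization is given below for complex Gaussian sensing vectors, the setting in which the scaling of \lambda_{0} in the display above is the natural one; for the unit-sphere or unitary models the normalization constants would have to be rescaled, and the problem sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 100, 1200                                   # illustrative sizes

A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A.conj() @ z)                           # observations y_i = |a_i^* z|

lam0 = np.sqrt(np.mean(y ** 2))                    # estimates ||z|| under this normalization
mask = y <= 3 * lam0                               # truncation I(y_i <= 3 * lam0)
Y = (A[mask].T * y[mask] ** 2) @ A[mask].conj() / m    # (1/m) sum y_i^2 a_i a_i^*
w, V = np.linalg.eigh(Y)
x0 = lam0 * V[:, -1]                               # leading eigenvector, rescaled by lam0

phase = np.vdot(z, x0); phase /= abs(phase)        # align the global phase for comparison
print("relative error of x0:", np.linalg.norm(x0 - phase * z) / np.linalg.norm(z))
```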

However, considering that this construction of the initialization depends on the isotropy of the distribution of the sensing vectors, it is still interesting to investigate whether the randomized Kaczmarz algorithm works with random initialization, similar to the results in tan2019online ; zhang2020 , and we leave this as a possible future direction.
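
For completeness, the iteration whose convergence is analyzed in this paper can be sketched as follows; this follows the update form used in Wei_2015 and 10.1093/imaiai/iay005 (a Kaczmarz projection onto the solution set of |\mathbf{a}_{i}^{*}\mathbf{x}|=y_{i} for a randomly selected index i), with complex Gaussian sensing vectors and an initialization placed near the solution for illustration, as required by Theorem 1. Replacing the starting point by a random vector gives a simple way to experiment with the random-initialization question raised above.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 100, 1200
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A.conj() @ z)                              # observations y_i = |a_i^* z|

def dist(x, z):                                       # distance up to a global phase
    c = np.vdot(z, x)
    return np.linalg.norm(x - (c / abs(c)) * z)

# initialization near the solution (e.g., from a spectral method), per Theorem 1
x = z + 0.1 * np.linalg.norm(z) * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2 * n)

for k in range(20 * m):                               # 20 passes over the data
    i = rng.integers(m)                               # index sampled uniformly at random
    r = np.vdot(A[i], x)                              # a_i^* x
    phase = r / abs(r) if abs(r) > 0 else 1.0
    # Kaczmarz step for the linear equation a_i^* x = y_i * phase(a_i^* x)
    x = x + ((y[i] * phase - r) / np.linalg.norm(A[i]) ** 2) * A[i]
    if (k + 1) % (5 * m) == 0:
        print("pass", (k + 1) // m, "relative error:", dist(x, z) / np.linalg.norm(z))
```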

7.3 Other probabilistic models

Sections 5 and 6 verify the deterministic condition in Theorems 2 and 3 when the sensing vectors are sampled i.i.d. from the uniform distribution on the sphere or generated from the unitary model. However, there exist many other models for generating sensing vectors, such as the coded diffraction pattern (CDP) model in Wei_2015 , and it would be interesting to see whether the deterministic condition in Theorems 2 and 3 could be verified for more generic models.
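
As an example of such a model, one common version of the CDP model generates the sensing vectors as Fourier vectors modulated by random masks; a minimal sketch is given below, assuming masks with i.i.d. entries uniform on \{1,-1,i,-i\} (one standard variant; the precise mask distribution used in Wei_2015 may differ).

```python
import numpy as np

rng = np.random.default_rng(7)
n, L = 128, 8                                    # signal length and number of masks (illustrative)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

masks = rng.choice(np.array([1, -1, 1j, -1j]), size=(L, n))

# y[l, k] = |a_{l,k}^* z| with a_{l,k}[j] = conj(masks[l, j]) * exp(2j * np.pi * j * k / n),
# i.e., each sensing vector is a discrete Fourier vector modulated by a random mask
y = np.abs(np.fft.fft(masks * z, axis=1))
print("number of measurements m = L * n =", y.size)
```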

8 Summary

This paper justifies the convergence of the randomized Kaczmarz algorithm for phase retrieval of complex-valued objects. Specifically, the paper first establishes a deterministic condition for its convergence, and then demonstrates that when the sensing vectors are sampled uniformly from a unit sphere in 𝒞n\mathcal{C}^{n} and the number of sensing vectors mm satisfies m>O(nlogn)m>O(n\log n) as n,mn,m\rightarrow\infty, then this deterministic condition holds with high probability.

References

  • (1) Bahmani, S., Romberg, J.: Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation. In: A. Singh, J. Zhu (eds.) Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 54, pp. 252–260. PMLR, Fort Lauderdale, FL, USA (2017). URL http://proceedings.mlr.press/v54/bahmani17a.html
  • (2) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of sgd in non-convex over-parametrized learning (2018)
  • (3) Bauschke, H.H., Combettes, P.L., Luke, D.R.: Hybrid projection–reflection method for phase retrieval. J. Opt. Soc. Am. A 20(6), 1025–1034 (2003). DOI 10.1364/JOSAA.20.001025. URL http://josaa.osa.org/abstract.cfm?URI=josaa-20-6-1025
  • (4) Cai, T.T., Li, X., Ma, Z.: Optimal rates of convergence for noisy sparse phase retrieval via thresholded wirtinger flow. Ann. Statist. 44(5), 2221–2251 (2016). DOI 10.1214/16-AOS1443. URL http://dx.doi.org/10.1214/16-AOS1443
  • (5) Candès, E.J., Li, X.: Solving quadratic equations via phaselift when there are about as many equations as unknowns. Foundations of Computational Mathematics 14(5), 1017–1026 (2014). DOI 10.1007/s10208-013-9162-z. URL http://dx.doi.org/10.1007/s10208-013-9162-z
  • (6) Candes, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory 61(4), 1985–2007 (2015). DOI 10.1109/TIT.2015.2399924
  • (7) Candes, E.J., Strohmer, T., Voroninski, V.: Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics 66(8), 1241–1274 (2013). DOI 10.1002/cpa.21432. URL http://dx.doi.org/10.1002/cpa.21432
  • (8) Chai, A., Moscoso, M., Papanicolaou, G.: Array imaging using intensity-only measurements. Inverse Problems 27(1), 015005 (2011). URL http://stacks.iop.org/0266-5611/27/i=1/a=015005
  • (9) Chen, Y., Candes, E.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. In: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, R. Garnett (eds.) Advances in Neural Information Processing Systems 28, pp. 739–747. Curran Associates, Inc. (2015). URL http://papers.nips.cc/paper/5743-solving-random-quadratic-systems-of-equations-is-nearly-as-easy-as-solving-linear-systems.pdf
  • (10) Chen, Y., Chi, Y., Fan, J., Ma, C.: Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval. Mathematical Programming 176(1), 5–37 (2019). DOI 10.1007/s10107-019-01363-6. URL https://doi.org/10.1007/s10107-019-01363-6
  • (11) Elser, V., Lan, T.Y., Bendory, T.: Benchmark problems for phase retrieval. SIAM Journal on Imaging Sciences 11(4), 2429–2455 (2018). DOI 10.1137/18M1170364. URL https://doi.org/10.1137/18M1170364
  • (12) Fienup, J.R.: Reconstruction of an object from the modulus of its fourier transform. Opt. Lett. 3(1), 27–29 (1978). DOI 10.1364/OL.3.000027. URL http://ol.osa.org/abstract.cfm?URI=ol-3-1-27
  • (13) Fienup, J.R.: Phase retrieval algorithms: a comparison. Appl. Opt. 21(15), 2758–2769 (1982). DOI 10.1364/AO.21.002758. URL http://ao.osa.org/abstract.cfm?URI=ao-21-15-2758
  • (14) Gerchberg, R.W., Saxton, W.O.: A practical algorithm for the determination of the phase from image and diffraction plane pictures. Optik (Jena) 35, 237+ (1972)
  • (15) Goldstein, T., Studer, C.: Phasemax: Convex phase retrieval via basis pursuit. IEEE Transactions on Information Theory 64(4), 2675–2689 (2018)
  • (16) Gross, D., Krahmer, F., Kueng, R.: A partial derandomization of phaselift using spherical designs. Journal of Fourier Analysis and Applications 21(2), 229–266 (2015). DOI 10.1007/s00041-014-9361-2. URL http://dx.doi.org/10.1007/s00041-014-9361-2
  • (17) Hand, P., Voroninski, V.: Corruption robust phase retrieval via linear programming. CoRR abs/1612.03547 (2016). URL http://arxiv.org/abs/1612.03547
  • (18) Hand, P., Voroninski, V.: An elementary proof of convex phase retrieval in the natural parameter space via the linear program phasemax. CoRR abs/1611.03935 (2016). URL http://arxiv.org/abs/1611.03935
  • (19) Kaczmarz, S.: Angenäherte auflösung von systemen linearer gleichungen. Bull. Int. Acad. Pol. Sci. Lett. Class. Sci. Math. Nat A(35), 355–57 (1937)
  • (20) Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent, weighted sampling, and the randomized kaczmarz algorithm. Mathematical Programming 155(1), 549–573 (2016). DOI 10.1007/s10107-015-0864-7. URL https://doi.org/10.1007/s10107-015-0864-7
  • (21) Needell, D., Tropp, J.A.: Paved with good intentions: Analysis of a randomized block kaczmarz method. Linear Algebra and its Applications 441, 199 – 221 (2014). DOI https://doi.org/10.1016/j.laa.2012.12.022. URL http://www.sciencedirect.com/science/article/pii/S0024379513000098. Special Issue on Sparse Approximate Solution of Linear Systems
  • (22) Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. IEEE Transactions on Signal Processing 63(18), 4814–4826 (2015). DOI 10.1109/TSP.2015.2448516
  • (23) Salehi, F., Abbasi, E., Hassibi, B.: Learning without the phase: Regularized phasemax achieves optimal sample complexity. In: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (eds.) Advances in Neural Information Processing Systems 31, pp. 8654–8665. Curran Associates, Inc. (2018). URL http://papers.nips.cc/paper/8082-learning-without-the-phase-regularized-phasemax-achieves-optimal-sample-complexity.pdf
  • (24) Soltanolkotabi, M.: Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization. IEEE Transactions on Information Theory 65(4), 2374–2400 (2019)
  • (25) Strohmer, T., Vershynin, R.: A randomized kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications 15(2), 262 (2008). DOI 10.1007/s00041-008-9030-4. URL https://doi.org/10.1007/s00041-008-9030-4
  • (26) Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2379–2383 (2016). DOI 10.1109/ISIT.2016.7541725
  • (27) Tan, Y.S., Vershynin, R.: Phase retrieval via randomized Kaczmarz: theoretical guarantees. Information and Inference: A Journal of the IMA 8(1), 97–123 (2018). DOI 10.1093/imaiai/iay005. URL https://doi.org/10.1093/imaiai/iay005
  • (28) Tan, Y.S., Vershynin, R.: Online stochastic gradient descent with arbitrary initialization solves non-smooth, non-convex phase retrieval (2019)
  • (29) Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices, p. 210–268. Cambridge University Press (2012). DOI 10.1017/CBO9780511794308.006
  • (30) Wainwright, M.: High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press (2019). URL https://books.google.com/books?id=IluHDwAAQBAJ
  • (31) Waldspurger, I.: Phase retrieval with random gaussian sensing vectors by alternating projections. IEEE Transactions on Information Theory 64(5), 3301–3312 (2018). DOI 10.1109/TIT.2018.2800663
  • (32) Waldspurger, I., d’Aspremont, A., Mallat, S.: Phase recovery, maxcut and complex semidefinite programming. Mathematical Programming 149(1), 47–81 (2015). DOI 10.1007/s10107-013-0738-9. URL http://dx.doi.org/10.1007/s10107-013-0738-9
  • (33) Wang, G., Giannakis, G.: Solving random systems of quadratic equations via truncated generalized gradient flow. In: D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, R. Garnett (eds.) Advances in Neural Information Processing Systems 29, pp. 568–576. Curran Associates, Inc. (2016). URL http://papers.nips.cc/paper/6061-solving-random-systems-of-quadratic-equations-via-truncated-generalized-gradient-flow.pdf
  • (34) Wang, G., Giannakis, G.B., Eldar, Y.C.: Solving systems of random quadratic equations via truncated amplitude flow. IEEE Transactions on Information Theory 64(2), 773–794 (2018)
  • (35) Wei, K.: Solving systems of phaseless equations via kaczmarz methods: a proof of concept study. Inverse Problems 31(12), 125008 (2015). DOI 10.1088/0266-5611/31/12/125008. URL https://doi.org/10.1088%2F0266-5611%2F31%2F12%2F125008
  • (36) Zhang, H., Chi, Y., Liang, Y.: Provable non-convex phase retrieval with outliers: Median truncated wirtinger flow. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, pp. 1022–1031. JMLR.org (2016). URL http://dl.acm.org/citation.cfm?id=3045390.3045499
  • (37) Zhang, H., Liang, Y.: Reshaped wirtinger flow for solving quadratic system of equations. In: D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, R. Garnett (eds.) Advances in Neural Information Processing Systems 29, pp. 2622–2630. Curran Associates, Inc. (2016). URL http://papers.nips.cc/paper/6319-reshaped-wirtinger-flow-for-solving-quadratic-system-of-equations.pdf
  • (38) Zhang, T.: Phase retrieval by alternating minimization with random initialization. IEEE Transactions on Information Theory pp. 1–1 (2020)
  • (39) Zhang, T.: Phase retrieval using alternating minimization in a batch setting. Applied and Computational Harmonic Analysis 49(1), 279 – 295 (2020). DOI https://doi.org/10.1016/j.acha.2019.02.001. URL http://www.sciencedirect.com/science/article/pii/S1063520319300181