Local saddles of relaxed averaged alternating reflections algorithms on phase retrieval

Pengwen Chen Applied mathematics, National Chung Hsing University, Taiwan [email protected]

Abstract

Phase retrieval can be expressed as a non-convex constrained optimization problem to identify one phase minimizer one a torus. Many iterative transform techniques have been proposed to identify the minimizer, e.g., relaxed averaged alternating reflections(RAAR) algorithms. In this paper, we present one optimization viewpoint on the RAAR algorithm. RAAR algorithm is one alternating direction method of multipliers(ADMM) with one penalty parameter. Pairing with multipliers (dual vectors), phase vectors on the primal space are lifted to higher dimensional vectors, RAAR algorithm is one continuation algorithm, which searches for local saddles in the primal-dual space. The dual iteration approximates one gradient ascent flow, which drives the corresponding local minimizers in a positive-definite Hessian region. Altering penalty parameters, the RAAR avoids the stagnation of these corresponding local minimizers in the primal space and thus screens out many stationary points corresponding to non-local minimizers.

Keywords: Phase retrieval, relaxed averaged alternating reflections, alternating direction method of multipliers, Nash equilibrium, local saddles

1 Introduction

Phase retrieval has recently attracted attentions in the mathematics community (see one review [Shechtman2015] and references therein). The problem of phase retrieval is motivated by the inability of photo detectors to directly measure the phase of an electromagnetic wave at frequencies of THz (terahertz) and higher. The problem of phase retrieval aims to reconstruct an unknown object $x_{0}\in\IC^{n}$ from its magnitude measurement data $b=|A^{*}x_{0}|$ , where $A\in\IC^{n\times N}$ represents some isometric matrix and $A^{*}$ represents the Hermitian adjoint of $A$ . Introduce one non-convex $N$ -dimensional torus associated with its normalized torus $\cZ:=\left\{z\in\IC^{N}:|z|=b\right\},\;\cU:=\left\{u\in\IC^{N}:|u|=1\right\}.$ The whole problem is equivalent to reconstructing the missing phase information $u$ and the unknown object $x=x_{0}$ via solving the constrained least squares problem \beqq min_x∈\IC^n, —u—=1 { ∥b⊙u-A^* x∥^2: u∈\IC^N}= min_z∈\cZ ∥A_⊥z∥^2, \eeqqwhere $A_{\bot}\in\IC^{(N-n)\times N}$ is an isometric matrix with unitary matrix $[A^{*},A_{\bot}^{*}]$ ,

[A^{*},A_{\bot}^{*}][A^{*},A_{\bot}^{*}]^{*}=A_{\bot}^{*}A_{\bot}+A^{*}A=I.

Here, $b\odot u$ represents the component-wise multiplication between two vectors $b,u$ , respectively. The isometric condition is not very restrictive in applications, since Fourier transforms are commonly applied in phase retrieval. Even for non-Fourier transforms, we can still obtain equivalent problems via a QR-factorization, see [Chen2017]. Let $\cU_{*}$ denote the set in $\cU$ consisting of all the local minimizers of (1). A vector $z_{*}\in\cZ$ minimizes (1) is called a global solution. In the noiseless measurement case, $A_{\bot}z_{*}=0$ or $z_{*}=A^{*}x_{*}=b\odot u_{*}$ for some $u_{*}\in\cU$ and some $x_{*}\in\IC^{n}$ . Numerically, it is a nontrivial task to obtain a global minimizer on the non-convex torus. The error reduction method is one traditional method [Gerchberg1972], which could produce a local solution of poor quality for (1), if no proper initialization is taken. During last decades, researchers propose various spectral initialization algorithms to overcome this challenge[NetrapalliEtAl2013Phase, ChenCandes2015Solving, CandesEtAl2015Phase, NIPS2016_83adc922, Chen2017, Chen2017a, LuoEtAl2018Optimal, LuLi2017Phase, Mondelli2019, duchi2019solving]. On the other hand, phase retrieval can be also tackled by another class of algorithms, including the well-known hybrid input-output algorithm(HIO)[Fienup1982, Fienup2013], the hybrid projection–reflection method[Bauschke2003], Fourier Douglas-Rachford algorithm (FDR)[CHEN2018665], alternating direction methods[Wen_2012] and relaxed averaged alternating reflections(RAAR) algorithms[Luke_2004]. An important feature of these algorithms is the empirical ability to avoid local minima and converge to a global mini-mum for noise-free oversampled diffraction patterns. For instance, the empirical study of FDR indicates the disappearance of the stagnation at poor local solutions under sufficiently many random masks. A limit point of FDR is a global solution in (1) and the limit point with appropriate spectral gap conditions reconstructs the phase retrieval solution [CHEN2018665]. Traditional convergence study on Douglas-Rachford splitting algorithm [Eckstein1992, He2015] heavily relies on the convexity assumption. Noise-free measurement is a strict requirement for HIO and FDR, which motivates the proposal of relaxed averaged alternating reflections algorithm [Luke_2004, Li]. Let $\cA,\cB$ denote the sets $Range(A^{*})$ and $\cZ$ , respectively. Let $P_{\cA}$ and $P_{\cB}$ denote the projector on $\cA$ and $\cB$ , respectively. Let $R_{\cA},R_{\cB}$ denote the reflectors corresponding to $\cA,\cB$ . With one parameter $\beta\in(0,1)$ relaxing the original feasibility problem (the intersection of $\cA$ and $\cB$ ), the $\beta$ -RAAR algorithm [Luke_2004] is defined as the iterations $\left\{S^{k}(w):k=1,2,\ldots\right\}$ for some initialization $w\in\IC^{N}$ ,

$\displaystyle S(w)$	$\displaystyle=$	$\displaystyle\beta\cdot\frac{1}{2}(R_{\cA}R_{\cB}+I)w+(1-\beta)P_{\cB}w$	(1)
	$\displaystyle=$	$\displaystyle\frac{\beta}{2}\{(2A^{*}A-I)(2b\odot\frac{w}{\|w\|}-w)+w\}+(1-\beta)b\odot\frac{w}{\|w\|}$	(2)
	$\displaystyle=$	$\displaystyle{\beta}w+(1-2{\beta})b\odot\frac{w}{\|w\|}+{\beta}A^{*}A(2b\odot\frac{w}{\|w\|}-w).$	(3)

Fourier Douglas-Rachford algorithm can be deemed as an extreme case of $\beta$ -RAAR family with $\beta=1$ . As RAAR converges to a fixed point $w$ , we could retrieve the phase information $u=w/|w|$ for (1). Any $u$ in $\cU_{*}$ yields a fixed point $w$ . Empirically, RAAR fixed points can produce local solutions of high quality, if a large value is properly chosen for $\beta$ , as reported in [Wen_2012, Li].

In this work, we disclose the relation between RAAR and the local minimizers $z$ in (1). As HIO can be reformulated as one alternating direction method of multipliers in [Wen_2012], we identify RAAR as one ADMM with penalty parameter $1/\beta^{\prime}=(1-\beta)/\beta$ applied to the constrained optimization problem in (1), e.g., Theorem. 2.3. This perspective links $\beta$ -RAAR with a small parameter $\beta$ to multiplier methods with large penalty ${\beta^{\prime}}^{-1}$ . It is known in optimization that convergence of a multiplier method relies on a sufficiently large penalty (e.g., see Prop. 2.7 in [Bertsekas1996]). From this perspective, it is not surprising that the convergence of RAAR to its fixed point also requires a large penalty parameter. Actually, large penalty has been employed to ensure various ADMM iterations converging to stationary points [Hong2016, li2015global, wang2019global]. For instance, ADMM [wang2019global] is applied to solve the minimization of nonconvex nonsmooth functions. Global convergence to a stationary point can be established, when sufficiently large penalty parameters are used.

Saddle plays a fundamental role in the theory and the application of convex optimization[Bert], in particular, the convergence of ADMM, e.g., [Boyd]. For the application on phase retrieval, Sun et al[Sun] conduct saddle analysis on a quatradic objective function of Gaussian measurements. The geometric analysis shows that with high probability the global solution is the one local minimizer, when $N/n$ is sufficiently large. Most of critical points are saddles at actually. We believe that saddle analysis is also one key ingredient in explaining the avoidance of undesired critical points for the Lagrangian of RAAR. To some extent, promising empirical performance of non-convex ADMM conveys the impression that saddles exist in the Lagrangian function, which is not evident in the context of phase retrieval. This is a motivation of the current study. Recently, researchers have been cognizant of the importance of saddle structure in non-convex optimization research. Analysis of critical points in non-concave-convex problems leads to many interesting results in various applications. For instance, Lee et al. used a dynamical system approach to show that many gradient-descent algorithms almost surely converge to local minimizers with random initialization, even though they can get stuck at critical points theoretically [Lee2016, NIPS2017_f79921bb, jin2017escape]. The terminology “local saddle” is a crucial concept in understanding the min-max algorithm employed in modern machine learning research, e.g., gradient descent-ascent algorithms in generative adversarial networks (GANs)[goodfellow2020generative] and multi-agent reinforcement learning[omidshafiei2017deep]. With proper Hessian adjustment, [Adolphs2018] and [Daskalakis2018] proposed novel saddle algorithms to escape undesired critical points and to reach local saddles of min-max problems almost surely with random initialization. Jin et al. [Jin2020] proposed one non-symmetric definition of local saddles to address one basic question, “ what is a proper definition of local optima for the local saddle?” Later, Dai and Zhang gave saddle analysis on the constraint minimization problems [Dai2020].

Our study starts with one characterization of all the fixed point of RAAR algorithms in Theorem 2.2. These fixed points are critical points of (1). By varying $\beta$ , some of the fixed points become “local saddles” of a concave-nonconvex function $F$ , \beqq max_λmin_z {F(z,λ; β):=(β2 ∥A_⊥(z-λ)∥^2-12 ∥λ∥^2 ), z∈\cZ, λ∈\IC^N}. \eeqqTo characterize RAAR iterates, we investigate saddles in (1) lying in a high dimensional primal-dual space. Our study aims to answer a few intuitive questions, whether these local dimensional critical points on $\cZ$ in the primal space can be lift to local saddles of (1) in a primal-dual space under some spectral gap condition, and how the ADMM iterates avoid or converge to these local saddles under a proper penalty parameter? The line of thought motivates the current study on local saddles of (1). Unfortunately, the definition of local saddles in [Jin2020] can not be employed to analyze the RAAR convergence, since the objective function in phase retrieval shares phase invariance, see Remark LABEL:alpha-79.

The main goal of the present work is to establish an optimization view to illustrate the convergence of RAAR, and show by analysis and numerics, under the framework for phase retrieval with coded diffraction patterns in [Fannjiang_2012], RAAR has a basin of attraction at a local saddle $(z_{*},\lambda_{*})$ . For noiseless measurement, $z_{*}=A^{*}x_{0}$ is a strictly local minimizer of (1). In practice, numerical stagnation of RAAR on noiseless measurements disappears under sufficient large $\beta$ values. Specifically, Theorem 2.3 shows that RAAR is actually one ADMM to solve the constrained problem in (1). Based on this identification, Theorem LABEL:Main1 show that each limit of RAAR iterates can be viewed as a “local saddle” of $\max\min F$ in (1).

The rest of the paper is organized as follows. In section 2, we examine the fixed point condition of RAAR algorithm. By identifying RAAR as ADMM, we disclose the concave-non-convex function for the dynamics of RAAR, which provides a continuation viewpoint on RAAR iteration. In section LABEL:sec:3, we present one proper definition for local saddles and show the existence of local saddles for oversampled coded diffraction patterns. In section LABEL:sec:4, we show the convergence of RAAR to a local saddle under a sufficiently large parameter. Last, we provide experiments to illustrate the behavior of RAAR, (i)comparison experiments between RAAR and Douglas Rachford splitting proposed in [fannjiang2020fixed]; (ii) applications of RAAR on coded diffraction patterns.

2 RAAR algorithms

2.1 Critical points

The following gives the first order optimality of the problem in (1). This is a special case of Prop. LABEL:convex1 with $\lambda=0$ . We skip the proof. {prop} Let $z_{0}=b\odot u_{0}$ be a local minimizer of the problem in (1). Let $K_{z_{0}}^{\bot}:=\Re(\diag(\bar{u}_{0})A_{\bot}^{*}A_{\bot}\diag(u_{0}))$ . Then the first-order optimality condition is \beqq q_0:=z_0^-1⊙(A_⊥^* A_⊥z_0)∈\IR^N, \eeqqand the second-order necessary condition is that for all $\xi\in\IR^{N}$ , \beqq ξ^⊤(K_z_0^⊥-\diag(q_0))ξ≥0. \eeqq

{rem}

Once a local solution $z$ is obtained, the unknown object of phase retrieval in (1) can be estimated by $x=Az$ . On the other hand, using $I=A^{*}A+A_{\bot}^{*}A_{\bot}$ , we can express the first order condition as \beqqu^-1⊙(A^*A(b⊙u))=b⊙(1-q_0)∈\IR^N.\eeqqUsing $\xi=e_{i}$ canonical vectors of $\IR^{N}$ , we have a componentwise lower bound on $1-q_{0}$ from (2.1): $b^{-1}\odot(1-q_{0})\geq\|Ae_{i}\|^{2}\geq 0$ . In general, there exists many local minimizers on $\cZ$ , satisfying (2.1) and (2.1).

2.2 Fixed point conditions of RAAR

We begin with fixed point conditions of $\beta$ -RAAR iterations in (3). For each $\beta\in(0,1)$ , introduce one auxiliary parameter $\beta^{\prime}\in(0,\infty)$ defined by \beqq β=β’ 1+β’, i.e., β’=β1-β. \eeqqWe shall show the reduction in the cardinality of fixed points under with a small penalty parameter. {theorem} Consider the application of the $\beta$ -RAAR algorithm on the constrained problem in (1). Write $w\in\IC^{N}$ in polar form $w=u\odot|w|$ . For each $\beta\in(0,1)$ , let ${\beta^{\prime}}={\beta}/(1-{\beta})$ and \beqq c:= (1-1-ββ) b+1-ββ —w—∈\IR^N. \eeqqThen $w$ is a fixed point of $\beta$ -RAAR, if and only if $w$ satisfies the phase condition \beqq A^*A(b⊙u)=c⊙u, \eeqqand the magnitude condition, \beqq —w—=β’ c+ (1-β’)b≥0, i.e., c≥(1-β’^-1) b. \eeqqIn particular, for ${\beta}\in[1/2,1)$ , we have $c\geq 0$ from (2.2). Observe that the inequality in (2.2) ensures the well-defined magnitude vector $|w|$ . Hence, the fixed points are critical points of (1).

{proof}

Rearranging (3), we obtain the fixed point condition of RAAR, \beqq ((1-β)—w—-(1-2β)b)⊙w—w—=β A^*A { (2b-—w—)⊙w—w—}. \eeqqEquivalently, taking the projections $A^{*}A$ and $I-A^{*}A$ on (2.2) yield

	$\displaystyle A^{*}A\left\{(b-\|w\|)\odot\frac{w}{\|w\|}\right\}=0,$		(4)
	$\displaystyle(I-A^{*}A)\left\{b\odot\frac{w}{\|w\|}\right\}={\beta^{\prime}}^{-1}\left\{(b-\|w\|)\odot\frac{w}{\|w\|}\right\}.$		(5)

For the only-if part, let $w$ be a fixed point of RAAR with (4,5). With the definition of $c$ , (4) gives \beqq A^*A ((c-b)⊙u)=β’A^*A((—w—-b)⊙u)=0, \eeqqand (5) gives \beqqA_⊥^* A_⊥(c⊙w—w—)=A_⊥^* A_⊥({ b-β’^-1 (b-—w—)}⊙w—w—)=0,\eeqqwhich implies $c\odot u$ in the range of $A^{*}$ . Together with (2.2), we have (2.2). Also, (2.2) is the result of the non-negativeness of $|w|$ in (2.2).

To verify the if-part, we need to show that $w$ constructed from a phase vector $u\in\IC^{N}$ satisfying (2.2) and a magnitude vector $|w|$ satisfying (2.2) meets (4,5). From (2.2) and (2.2), we have (4), i.e., \beqqA^*A((b-—w—)⊙u)=A^*A{ β’(b-c)⊙u}=β’{c⊙u-c⊙u}=0. \eeqqWith the aid of (4, 2.2), the fixed point condition in (5) is ensured by the computation: \beqq(I-A^*A){(b-β’^-1(b-—w—))⊙u} =(I-A^*A){c⊙u}=0. \eeqq

Finally, the condition in (2.2) is identical to the first optimality condition in (2.1). Hence, the fixed points must be critical points of (1). $\square$ Theorem 2.2 indicates that each fixed point $w$ can be re-parameterized by $(u,\beta)$ satisfying (2.2) and (2.2). The condition in (2.2) always hold for ${\beta^{\prime}}$ sufficiently small.

{rem}

The first order optimality in (2.1) yields that the phase condition in (2.2) is actually the critical point condition of $u\in\cU_{*}$ in (1). Fix one critical point $u\in\cU_{*}$ and let $c$ be the corresponding vector given from (2.2). From Theorem 2.2, $w$ given from the polar form $w=u\odot|w|$ with (2.2) is a fixed point of $\beta$ -RAAR, if $\beta$ satisfies the condition in (2.2). To further examine (2.2), we parameterize the fixed point $w$ by $(u,\beta)$ . Let $b^{-1}\odot K^{\bot}b$ denote the threshold vector, where

K:=\Re(\diag(\bar{u})A^{*}A\diag(u)),\;K^{\bot}:=I-K.

The fixed point condition in (2.2) indicates that $(u,\beta_{1})$ gives a fixed point of $\beta_{1}$ -RAAR with any ${\beta}_{1}\in(0,{\beta})$ . That is, the corresponding parameter $(\beta^{\prime})^{-1}$ must exceed the threshold vector, \beqq (β’)^-1=1-ββ≥b^-1⊙(K^⊥b). \eeqqSince $\beta^{\prime}=\beta/(1-\beta)$ can be viewed as one penalty parameter in the associated Lagrangian in (2.3), we call (2.2) the penalty-threshold condition of RAAR fixed points. In general, the cardinality of RAAR fixed points decreases under a large parameter $\beta$ . See Fig. 1.

For ${\beta}=1$ , RAAR reduces to FDR, whose fixed point $w$ satisfies $\|A_{\bot}(b\odot w/|w|)\|=0$ and thus $\|A(b\odot w/|w|)\|=\|b\|$ . When phase retrieval has uniqueness property, $A(b\odot w/|w|)$ gives the reconstruction. On the other hand, for $\beta=1/2$ , (3) gives \beqq S(w)=A^*A (b⊙w—w—)+12(I-A^*A) w. \eeqqSuppose a RAAR initialization is chosen from the range of $A^{*}$ . The second term in (2.2) always vanishes and thus RAAR iterations reduce to alternating projection iterations(AP) in [Chen2017]. From this perspective, one can regard $\beta$ -RAAR as one family of algorithms interpolating AP and FDR, varying $\beta$ from $1/2$ to $1$ .

\includegraphics

[width=0.4]beta_fig.eps

Figure 1: Illustration of the penalty-threshold condition of RAAR fixed points associated with critical points

u_{1},u_{2},u_{3},u_{*}\in\cU_{*}

of (1). The set of fixed points of

\beta

-RAAR is a collection of line segments parameterized by

(u,\beta)\in\cU_{*}\times(0,1)

. As

\beta

gets larger, the cardinality of intersections (i.e., the fixed points) decreases. Critical points associated with

\beta=0.5

are

u_{1},u_{3}

and

u_{*}

. The global minimizer

u_{*}

is the only associated critical point, if

\beta\in(0.9,1)

is used.

2.3 Alternative directions method of multipliers

Next, we present one relation between RAAR and the alternating direction method of multipliers (ADMM). The alternating direction method of multipliers was originally introduced in the 1970s[Glowinski1975, Gabay1976] and can be regarded as an approximation of the augmented Lagrangian method, whose primal update step is replaced by one iteration of the alternating minimization. Although ADMM is classified as one first-order method, practically ADMM could produce a solution with modest accuracy within a reasonable amount of time. Due to the algorithm simplicity, nowadays this approach is popular in many applications, in particular, applications of nonsmooth optimalication. See [Boyd, yang2009, Deka2019] and the references therein.

Use the standard inner product

\langle x,y\rangle:=\Re(x^{*}y),\;\forall x,y\in\IC^{N}.

To solve the problem in (1), introduce one auxiliary variable $y\in\IC^{N}$ with one constraint $y=z$ and one associated multiplier $\lambda\in\IC^{N}$ , and form the Lagrangian function with some parameter $\beta^{\prime}>0$ in (2.2), \beqq β’2 ∥A_⊥y∥^2+¡ λ, y-z¿+12∥y-z∥^2, y∈\IC^N, z∈\cZ. \eeqqEquivalently, we have the augmented Lagrangian function, \beqq L(y,z, λ):= 12 ∥A_⊥y∥^2+ β’^-1¡ λ, y-z¿+12β’∥y-z∥^2, \eeqqwhen we identify $1/\beta^{\prime}$ as a penalty parameter. To solve $(y,z,\lambda)$ , ADMM starts with some initialization $z_{1}\in\cZ$ and $\lambda_{1}\in\IC^{N}$ and generates the sequence $\left\{(y_{k},z_{k},\lambda_{k}):k=1,2,3,\ldots\right\}$ with stepsize $s>0$ , according to rules,

	$\displaystyle y_{k+1}=arg\min_{y}L(y,z_{k},\lambda_{k}),$
	$\displaystyle z_{k+1}=arg\min_{\|z\|=b}L(y_{k+1},z,\lambda_{k}),$
	$\displaystyle\lambda_{k+1}=\lambda_{k}+s\nabla_{\lambda}L(y_{k+1},z_{k+1},\lambda)$

Introducing one projection operator on $\cZ$ , $[w]_{\cZ}:=w/|w|\odot b$ for $w\in\IC^{N}$ . Algebraic computation yields

	$\displaystyle y_{k+1}=(I+{\beta^{\prime}}A_{\bot}^{}A_{\bot})^{-1}(z_{k}-\lambda_{k})=(I-{\beta}A_{\bot}^{}A_{\bot})(z_{k}-\lambda_{k}),$		(6)
	$\displaystyle z_{k+1}=[y_{k+1}+\lambda_{k}]_{\cZ},$		(7)
	$\displaystyle\lambda_{k+1}=\lambda_{k}+s(y_{k+1}-z_{k+1}).$		(8)

From the $y$ -update in (6), one reconstruction $x$ for the unknown object in (1) can be computed by \beqq x=Ay=A(I-βA_⊥^* A_⊥)(z-λ)=A(z-λ). \eeqq

Theorem 2.3 indicates that RAAR is actually an ADMM with proper initialization applied to the problem in (1). In general, the step size $s$ of the dual vector $\lambda$ should be chosen properly to ensure the convergence. The following shows the relation between RAAR and the ADMM with $s=1$ . Hence, we shall focus $s=1$ in this paper.

{theorem}

Consider one $\beta$ -RAAR iteration $\{{w_{0},w_{1},}\ldots,\}$ with nonzero initialization ${w_{0}}\in\IC^{N}$ . Let $\lambda_{1}=A_{\bot}^{*}A_{\bot}w_{0},\;z_{1}=[w_{0}]_{\cZ}.$ Generate one ADMM sequence $\left\{(y_{k+1},z_{k+1},\lambda_{k+1}):k={1,2,}\ldots\right\}$ with dual step size $s=1$ , according to (6, 7, 8) with initialization $\lambda_{1},z_{1}$ . Construct a sequence $\left\{w^{\prime}_{k}:k=1,2,\ldots\right\}$ from $(y_{k},z_{k},\lambda_{k})$ , \beqq w’_k:=y_k+1+λ_k=(I-βA_⊥^* A_⊥) z_k+βA_⊥^* A_⊥λ_k. \eeqqThen the sequence $\left\{w^{\prime}_{k}:k=1,2,\ldots\right\}$ is exactly the $\beta$ -RAAR sequence, i.e., $w^{\prime}_{k}=w_{k}$ for all $k\geq 1$ .

{proof}

Use induction. For $k=1$ , we have

w_{1}^{\prime}=(I-\beta A_{\bot}^{*}A_{\bot})z_{1}+\beta A_{\bot}^{*}A_{\bot}\lambda_{1}=(I-\beta A_{\bot}^{*}A_{\bot})[w_{0}]_{\cZ}+\beta A_{\bot}^{*}A_{\bot}w_{0}=w_{1}.

Suppose $w_{k}^{\prime}=w_{k}$ for $k\geq 1$ . From (2.3) and (7), we have \beqq A_⊥^* A_⊥w’_k=A_⊥^* A_⊥((1-β) z