A New Inexact Proximal Linear Algorithm with Adaptive Stopping Criteria for Robust Phase Retrieval
Abstract
This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm in which the subproblem is solved inexactly. Our main contributions are two adaptive stopping criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and real datasets, we demonstrate that our methods are much more efficient than existing methods, such as the original proximal linear algorithm and the subgradient method.
Keywords— Robust Phase Retrieval, Nonconvex and Nonsmooth Optimization, Proximal Linear Algorithm, Complexity, Sharpness.
1 Introduction
Phase retrieval aims to recover a signal from intensity-based or magnitude-based measurements. It finds various applications in different fields, including X-ray crystallography [1], optics [2], array imaging and high-power coherent diffractive imaging [3], astronomy [4], and microscopy [5]. Mathematically, phase retrieval tries to find the true signal vector $\bar{x}$ in $\mathbb{R}^n$ or $\mathbb{C}^n$ from a set of magnitude measurements:
$b_i = (a_i^\top \bar{x})^2, \quad i = 1, \ldots, m, \qquad (1)$
where $a_i \in \mathbb{R}^n$ are the known sensing vectors and $b_i \in \mathbb{R}$ are the observed measurements. Directly solving the equations leads to an NP-hard problem [6], and nonconvex algorithms based on different designs of objective functions have been well studied in the literature, including Wirtinger flow [7], truncated Wirtinger flow [8], truncated amplitude flow [9], reshaped Wirtinger flow [10], etc.
In this paper, we focus on the robust phase retrieval (RPR) problem [11], which considers the case where the measurements contain noise due to measurement errors of equipment. That is,
$b_i = (a_i^\top \bar{x})^2 + \xi_i, \quad i = 1, \ldots, m, \qquad (2)$
in which $a_i$ and $\bar{x}$ are as in (1) and $\xi_i$ denotes a random noise term. [11] proposed to formulate RPR as the following optimization problem:
$\min_{x \in \mathbb{R}^n} \; f(x) := \frac{1}{m} \sum_{i=1}^{m} \left| (a_i^\top x)^2 - b_i \right|. \qquad (3)$
It is demonstrated in [11] that using (3) for RPR yields better recoverability than the median truncated Wirtinger flow algorithm [12] based on the $\ell_2$-loss.
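To make the formulation concrete, the following short sketch (in Python/NumPy) evaluates the robust phase retrieval loss, assuming the standard $\ell_1$-type objective $f(x)=\frac{1}{m}\sum_{i=1}^m |(a_i^\top x)^2 - b_i|$ from [11]; the array names `A`, `b`, and `x` are illustrative.

```python
import numpy as np

def rpr_objective(x, A, b):
    """Robust phase retrieval loss f(x) = (1/m) * sum_i |(a_i^T x)^2 - b_i|.

    A : (m, n) array whose rows are the sensing vectors a_i.
    b : (m,)  array of (possibly corrupted) squared-magnitude measurements.
    x : (n,)  candidate signal.
    """
    residuals = (A @ x) ** 2 - b      # c(x): element-wise squared fit minus measurements
    return np.abs(residuals).mean()   # nonsmooth l1-type average
```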
Solving (3) is challenging because it is a nonconvex and nonsmooth optimization problem. In [13], the authors proposed a subgradient method to solve it. This method requires geometrically decaying step sizes, and it is unclear how to schedule such step sizes in practice. [11] proposed to use the proximal linear (PL) algorithm to solve (3). For ease of presentation, we rewrite (3) as
$\min_{x \in \mathbb{R}^n} \; F(x) := h(c(x)), \qquad (4)$
where $h(y) := \frac{1}{m}\|y\|_1$, $A := [a_1, \ldots, a_m]^\top \in \mathbb{R}^{m \times n}$, $b := (b_1, \ldots, b_m)^\top$, and $c(x) := (Ax)^{\circ 2} - b$ is a smooth map, in which $(\cdot)^{\circ 2}$ denotes the element-wise square. One typical iteration of the PL algorithm is
$x_{k+1} \approx \operatorname*{argmin}_{x \in \mathbb{R}^n} \; F(x; x_k) + \frac{1}{2t} \|x - x_k\|^2, \qquad (5)$
where $t > 0$ is the step size,
$F(x; y) := h\big( c(y) + \nabla c(y)(x - y) \big), \qquad (6)$
$\nabla c(y) := 2\,\mathrm{diag}(Ay)\, A \qquad (7)$
denotes the Jacobian of $c$, and "$\approx$" means that the subproblem is solved inexactly. The subproblem (5) is convex and can be solved by various methods, such as the proximal operator graph splitting (POGS) algorithm used in [11]. The PL method has drawn a lot of attention recently. It has been studied by [14, 15, 16] and applied to many important applications such as RPR [11], robust matrix recovery [17, 18], and sparse spectral clustering [19]. The subproblem (5) is usually solved inexactly for practical reasons. As pointed out in [11], the PL method implemented in [11] is much slower than the median truncated Wirtinger flow method. We found that this is mainly due to their stopping criterion for solving the subproblem (5), which unnecessarily solves (5) to very high accuracy in the early stage of the algorithm. Moreover, we found that the POGS algorithm used in [11] is ineffective in solving the subproblem (5). In this paper, we propose adaptive stopping criteria for inexactly solving (5) with the fast iterative shrinkage-thresholding algorithm (FISTA) [20, 21, 22]. We found that our new inexact PL (IPL) with the adaptive stopping criteria greatly outperforms existing implementations of PL methods [11] for solving RPR (4).
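To illustrate the model minimized in (5), the sketch below evaluates the convex proximal linear model at a trial point, assuming the forms of $h$, $c$, and $\nabla c$ written above (a sketch for orientation, not the paper's implementation); it uses the structure $\nabla c(x_k)d = 2\,(Ax_k)\odot(Ad)$ to avoid forming the Jacobian explicitly.

```python
import numpy as np

def pl_model_value(x, x_k, A, b, t):
    """Value of the convex model in subproblem (5):
        (1/m) * || c(x_k) + J(x_k)(x - x_k) ||_1 + (1/(2t)) * ||x - x_k||^2,
    with c(x) = (Ax)^2 - b and Jacobian J(x_k) = 2 diag(A x_k) A."""
    m = A.shape[0]
    c_k = (A @ x_k) ** 2 - b                       # c(x_k)
    lin = c_k + 2.0 * (A @ x_k) * (A @ (x - x_k))  # linearization of c at x_k
    return np.abs(lin).sum() / m + np.sum((x - x_k) ** 2) / (2.0 * t)
```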
Our Contributions. In this paper, we propose two new adaptive stopping criteria for inexactly solving (5). The first one ensures that (5) is solved to a relatively low accuracy:
(LACC) | (8) | |||
and the second one ensures that (5) is solved to a relatively high accuracy:
(HACC) | (9) | |||
Here, the constants appearing in (LACC) and (HACC) are given algorithm parameters. Similar to the proximal bundle method [23] for nonsmooth convex problems, (LACC) and (HACC) are designed to ensure sufficient decrease of the objective function for the nonsmooth and nonconvex RPR problem. Note that both (LACC) and (HACC) can only be used conceptually because the exact optimal value of the subproblem (5) is not available. Later we will propose more practical stopping criteria that guarantee (LACC) and (HACC). The connections of our approach to existing work are listed below.
- (a) Our (LACC) condition coincides with the inexact stopping criterion proposed in [24, 25, 26] for the proximal gradient method. In these papers, the authors focus on a different optimization problem
$\min_{x \in \mathbb{R}^n} \; f(x) + g(x),$
in which $f$ is a smooth function and $g$ is a proper, convex, and lower semi-continuous function. One typical iteration of their algorithms can be written as
$\bar{x}_k \approx \operatorname*{argmin}_{x} \; \nabla f(x_k)^\top (x - x_k) + \frac{1}{2} (x - x_k)^\top H_k (x - x_k) + g(x), \qquad (10a)$
$x_{k+1} = x_k + \alpha_k (\bar{x}_k - x_k), \qquad (10b)$
where $H_k$ is a positive semi-definite matrix and $\alpha_k$ is a step size. The stopping criterion for inexactly solving (10a) proposed in [24, 25, 26] is
(11)
where the constants are given parameters. We note that this is the same as our (LACC). Therefore, our (LACC) is essentially an extension of (11) from the proximal gradient method to the proximal linear method.
- (b) To the best of our knowledge, our (HACC) criterion is new and serves as a good alternative to (LACC). From our numerical experiments, we found that (HACC) works comparably with (LACC), and we believe that it can be useful for other applications.
- (c) We analyze the overall complexity and the local convergence of our IPL algorithm for solving RPR under the sharpness condition. To the best of our knowledge, this is the first time such results have been obtained under the sharpness condition.
- (d) We propose to solve (5) inexactly using FISTA [20, 21, 22], with easily verifiable stopping conditions that guarantee (LACC) and (HACC). Through extensive numerical experiments, we demonstrate that our IPL with the new stopping criteria significantly outperforms existing algorithms for solving RPR.
Organization. The rest of this paper is organized as follows. In Section 2, we propose the main framework of our inexact proximal linear algorithm with two new adaptive stopping criteria for the subproblem. We establish its iteration complexity for obtaining an $\epsilon$-stationary point and its local convergence under the sharpness condition. Connections with some existing methods are also discussed. In Section 3, we discuss how to adapt FISTA to solve the subproblem inexactly. We also establish the overall complexity, i.e., the total number of FISTA iterations, required to obtain an $\epsilon$-optimal solution under the sharpness condition. In Section 4, we present numerical results on both synthetic and real datasets to demonstrate the advantage of the proposed methods over some existing methods. The proofs of all theorems and lemmas are given in Section 5. Finally, we give some concluding remarks in Section 6.
2 IPL and Its Convergence Analysis
In this section, we introduce our IPL algorithm for solving the RPR (4) with the inexact stopping criteria (LACC) and (HACC) for the subproblem (5) and analyze its convergence. We will discuss the FISTA for solving (5) that guarantees (LACC) and (HACC) in the next section.
We first follow [15] to introduce some notation. Let
where is defined in (7). We will also use the notation
A meta-algorithm of our IPL is summarized in Algorithm 1. We again emphasize that Algorithm 1 cannot be implemented directly because the exact optimal value of the subproblem (5) is not available, and we will discuss practical versions of it in the next section.
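For orientation, the following hypothetical Python outline sketches the meta-structure we have in mind for Algorithm 1; `solve_subproblem` is a placeholder for any inner routine that returns an approximate minimizer of (5) satisfying an implementable surrogate of (LACC) or (HACC) (Section 3 uses FISTA with a duality-gap test), and the outer test uses the scaled step length as a simple computable stationarity proxy (cf. Definition 1 below).

```python
import numpy as np

def ipl(x0, solve_subproblem, t, eps, max_outer_iters=100):
    """Conceptual outline of the IPL meta-algorithm (illustrative only).

    solve_subproblem(x_k) returns an approximate minimizer of the PL
    subproblem (5) built at x_k, terminating once the chosen (low- or
    high-accuracy) inner stopping test holds."""
    x = x0
    for _ in range(max_outer_iters):
        x_new = solve_subproblem(x)               # inexact solve of subproblem (5)
        if np.linalg.norm(x_new - x) / t <= eps:  # stop when the scaled step is small
            return x_new
        x = x_new
    return x
```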
2.1 Convergence under General Settings
In this subsection, we analyze the convergence rate of IPL (Algorithm 1) for obtaining an $\epsilon$-stationary point of (4) under general settings where the sharpness condition may not hold. We use the definition of an $\epsilon$-stationary point introduced in [15].
Definition 1.
We call $x$ an $\epsilon$-stationary point of (4) if the following inequality holds:
$\|\mathcal{G}_t(x)\| \le \epsilon, \qquad (12)$
where $\mathcal{G}_t(x)$ is the proximal gradient, which is defined as
$\mathcal{G}_t(x) := \frac{1}{t}\Big( x - \operatorname*{argmin}_{y \in \mathbb{R}^n} \big\{ F(y; x) + \tfrac{1}{2t}\|y - x\|^2 \big\} \Big). \qquad (13)$
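For intuition, the stationarity measure in (12)-(13) can be evaluated numerically given any routine that (approximately) solves the PL subproblem built at $x$; the sketch below is illustrative, and `subproblem_argmin` is a hypothetical placeholder rather than a routine defined in this paper.

```python
import numpy as np

def prox_gradient_norm(x, t, subproblem_argmin):
    """Stationarity measure ||G_t(x)|| from (12)-(13): G_t(x) = (x - x_hat) / t,
    where x_hat minimizes the convex PL model (5) built at x.
    `subproblem_argmin(x, t)` is a placeholder for any such solver."""
    x_hat = subproblem_argmin(x, t)
    return np.linalg.norm(x - x_hat) / t
```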
Our convergence rate result of Algorithm 1 is given in Theorem 1, and the proof is given in Section 5.
Theorem 1.
Denote . For Algorithm 1 with , the following conclusion holds.
-
(a)
When (LACC) holds with for any , we can find an -stationary point in
iterations for any .
-
(b)
When (HACC) holds with for any , we can find an -stationary point in
iterations for any .
Theorem 1 shows that IPL finds an $\epsilon$-stationary point within the above number of main iterations under the adaptive IPL stopping conditions. Moreover, this matches the best known convergence rate for PL established in [15]. We point out that we only require the two adaptive stopping criteria for the subproblem, whereas [15] requires either solving the subproblem (5) exactly (see their Proposition 3) or using pre-determined subproblem accuracy conditions (see their Theorem 5.2).
2.2 Local Convergence under Sharpness Assumption
In this subsection, we analyze the local convergence of IPL (Algorithm 1) to the global optimal solution under the sharpness condition.
Assumption 1 (Sharpness).
There exists a constant such that the following inequality holds for any :
(14) |
where .
[11] proved that the sharpness condition (Assumption 1) is satisfied by the RPR (4) with high probability under certain mild conditions.
Another assumption is about the closeness between the initial point and the optimal solution, which can be guaranteed by the modified spectral initialization (see Algorithm 3 in [11]) with high probability under some mild conditions.
Assumption 2.
We now define the -optimal solution to RPR (4).
Definition 2.
We call an -optimal solution to RPR (4), if .
Now we are ready to show in Theorem 2 that, in terms of main iteration number, (LACC) leads to local linear convergence and (HACC) leads to local quadratic convergence.
Theorem 2.
2.3 Related Work
There are two closely related works that need to be discussed here. [11] studied the PL algorithm for solving RPR (4) and established its local quadratic convergence under the sharpness condition. But their theoretical analysis requires the subproblem (5) to be solved exactly. In practice, [11] proposed to use POGS [27], which is a variant of the alternating direction method of multipliers (ADMM), to solve (5) inexactly. However, they did not provide any convergence analysis for the algorithm when the subproblem (5) is solved inexactly by POGS. [15] also considered solving (4) for obtaining an $\epsilon$-stationary point as defined in Definition 1. (The authors of [15] actually considered solving the more general problem $\min_x g(x) + h(c(x))$; here, for simplicity, we assume that $g \equiv 0$, and this does not affect the discussion.) Indeed, several algorithms were proposed and analyzed in [15]. In particular, a practical algorithm proposed in [15] uses FISTA [20, 21, 22] to inexactly solve
which is the dual problem of a smoothed version of (5). Here $h^*$ denotes the Fenchel conjugate of $h$, and $h_\mu$ is the Moreau envelope of $h$, which is defined as
$h_\mu(z) := \min_{w} \Big\{ h(w) + \frac{1}{2\mu} \|w - z\|^2 \Big\}.$
[15] proposed to terminate the FISTA when , and then update by . The authors established the overall complexity of this algorithm for suitably chosen parameters and . Compared to [15], we use adaptive stopping criteria and provide a better convergence rate based on Assumption 1.
3 FISTA for Solving the Subproblem Inexactly
In this section, we propose to use the FISTA to inexactly solve (5) with more practical stopping criteria that guarantee (LACC) and (HACC). Therefore, the convergence results (Theorems 1 and 2) in Section 2 still apply here.
For simplicity, we fix the outer iteration index $k$ throughout this section and rewrite (5) in terms of the displacement $d := x - x_k$ as follows:
$\min_{d \in \mathbb{R}^n} \; \phi(d) := \frac{1}{m} \big\| J d + c \big\|_1 + \frac{1}{2t} \|d\|^2, \qquad (17)$
where we denote $J := \nabla c(x_k)$ and $c := c(x_k)$. As a result, (LACC) and (HACC) can be rewritten respectively as
(18) |
and
(19) |
In IPL, we set $x_{k+1} := x_k + d$, where $d$ satisfies either (18) or (19). The dual problem of (17) is
$\max_{\lambda \in \mathbb{R}^m,\; \|\lambda\|_\infty \le 1/m} \; \psi(\lambda) := \lambda^\top c - \frac{t}{2} \|J^\top \lambda\|^2. \qquad (20)$
From weak duality, we know that
$\psi(\lambda) \;\le\; \min_{d} \phi(d) \;\le\; \phi(d) \quad \text{for any feasible } \lambda \text{ and any } d. \qquad (21)$
Therefore, $\psi(\lambda)$ can serve as a lower bound for the optimal value of (17), and we can obtain verifiable stopping criteria that are sufficient conditions for (18) and (19). This leads to our inexact FISTA for solving (17), which is summarized in Algorithm 2.
(22) | ||||
(LACC-FISTA) | ||||
(23) | ||||
(HACC-FISTA) | ||||
(24) |
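As a concrete illustration of the inner solver, the sketch below applies FISTA to the dual (20) of the subproblem (17), assuming the primal and dual forms written above: it maximizes $\psi(\lambda) = \lambda^\top c - \frac{t}{2}\|J^\top\lambda\|^2$ over the box $\|\lambda\|_\infty \le 1/m$, recovers the primal step $d = -tJ^\top\lambda$, and monitors the duality gap $\phi(d) - \psi(\lambda)$, which upper-bounds the primal suboptimality by (21). The simple gap tolerance used here is a stand-in for the (LACC-FISTA)/(HACC-FISTA) tests, not the paper's exact criteria.

```python
import numpy as np

def fista_dual_subproblem(J, c, t, gap_tol, max_iters=1000):
    """Projected FISTA on the dual (20): max  psi(lam) = lam^T c - (t/2)||J^T lam||^2
    over ||lam||_inf <= 1/m.  Primal recovery: d(lam) = -t * J^T lam.
    Stops when the duality gap phi(d) - psi(lam) <= gap_tol (simplified surrogate
    for the (LACC-FISTA)/(HACC-FISTA) tests)."""
    m = J.shape[0]
    L = t * np.linalg.norm(J, 2) ** 2            # Lipschitz constant of grad psi
    lam = np.zeros(m)
    y = lam.copy()
    theta = 1.0
    d = np.zeros(J.shape[1])
    for _ in range(max_iters):
        grad = c - t * (J @ (J.T @ y))                          # gradient of psi at y
        lam_new = np.clip(y + grad / L, -1.0 / m, 1.0 / m)      # ascent step + box projection
        theta_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2))
        y = lam_new + (theta - 1.0) / theta_new * (lam_new - lam)
        lam, theta = lam_new, theta_new
        d = -t * (J.T @ lam)                                    # primal candidate
        phi = np.abs(J @ d + c).sum() / m + np.dot(d, d) / (2.0 * t)
        psi = lam @ c - 0.5 * t * np.dot(J.T @ lam, J.T @ lam)
        if phi - psi <= gap_tol:                                # gap certifies accuracy
            break
    return d, lam
```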
Remark 1.
Here we remark on the step size in (22). It can be chosen as for some or chosen by the Armijo backtracking line search. More specifically, suppose that we have an initial step size . Given the step size , denote
and
can be selected as
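As mentioned in Remark 1, the step size in (22) can also be chosen by an Armijo-type backtracking line search. The sketch below is a generic version of such a scheme for the concave dual objective; the exact update rule and constants used in the paper are not reproduced here, so the parameters `L0` and `eta` should be read as illustrative assumptions.

```python
import numpy as np

def backtrack_step(y, grad_psi, psi_val, psi_fn, proj, L0=1.0, eta=2.0):
    """Generic Armijo/backtracking rule for a projected ascent step on a concave
    function psi with unknown gradient Lipschitz constant.
    Increases the trial constant L until the standard quadratic lower bound holds
    at the projected trial point, then returns (trial point, L)."""
    L = L0
    while True:
        z = proj(y + grad_psi / L)               # trial ascent step followed by projection
        diff = z - y
        # sufficient-increase test: psi(z) >= psi(y) + <grad, z - y> - (L/2)||z - y||^2
        if psi_fn(z) >= psi_val + grad_psi @ diff - 0.5 * L * (diff @ diff):
            return z, L
        L *= eta                                  # shrink the step (grow L) and retry
```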
We now discuss the overall complexity of the IPL (Algorithm 1) with the subproblem (5) solved inexactly by FISTA (Algorithm 2). For ease of presentation, we denote this algorithm as IPL+FISTA. We assume that IPL is terminated after iterations, when an -optimal solution is found, i.e., . We use to denote the number of iterations of FISTA when it is called in the -th iteration (getting from ) of IPL. The overall complexity of IPL+FISTA for obtaining an -optimal solution is thus given by , which equals the times that we call (22). Now we are ready to give the overall complexity of IPL+FISTA in Theorem 3.
Theorem 3.
Let , and suppose that Assumption 1 holds.
- (a)
-
(b)
(High Accuracy) For the overall complexity of Algorithm 1, when using Algorithm 2 with option (24) with , there exist positive constants , and with and such that if , then
(26) Here only depend on and , depends on and they will be specified later in the proof. Note that (26) implies that the worst-case overall complexity might be higher than – see the explanation below for more details.
Theorem 3 shows that under the (LACC-FISTA) stopping criterion, we need the stated number of iterations to find an $\epsilon$-optimal solution, and under the (HACC-FISTA) stopping criterion, we obtain the same rate along a countable positive sequence of $\epsilon$'s that decreases to zero. Theorem 3 provides better theoretical rates than those in [15]. Moreover, the results in Theorem 3 concern convergence to an $\epsilon$-optimal solution, while the results in [15] concern convergence to an $\epsilon$-stationary point. Our results require the sharpness condition, which was not assumed in [15].
For (b) in Theorem 3, we can only find a countable sequence of diminishing $\epsilon$'s along which to show the rate; we cannot show the rate for an arbitrary fixed $\epsilon$. This is because of the local quadratic convergence under (HACC) shown in (b) of Theorem 2. For instance, if our initial point satisfies , then under (HACC), the quadratic convergence result in Theorem 2 (b) implies that for some constant when is sufficiently small. Therefore, IPL finds an $\epsilon$-optimal solution within only one main iteration. However, Lemma 15 (b) indicates that . Therefore, it is possible that the overall complexity becomes , which is higher than .
4 Numerical Experiments
In this section, we conduct numerical experiments to compare our IPL method with existing methods for solving the RPR problem (4). Readers can find the code and datasets to replicate the experiments in this section via https://github.com/zhengzhongpku/IPL-code-share. The algorithms that we test include the following ones.
- (i) PL: the original proximal linear algorithm proposed by [11], where the subproblem (5) is solved by POGS [27]. POGS terminates when both the primal residual and the dual residual are small enough. In their code, the authors of [11] implemented a two-stage trick that uses a relatively larger tolerance in early iterations and a smaller tolerance in later iterations to terminate POGS. In our comparison, we use all the default parameters set by the authors in their code. (The code of [11] can be downloaded from https://web.stanford.edu/~jduchi/projects/phase-retrieval-code.tgz.)
- (ii) Subgradient: the subgradient method of [13] with geometrically decaying step sizes.
- (iii) IPL-FISTA-Low and IPL-FISTA-High: our IPL (Algorithm 1) with the subproblem (5) solved inexactly by FISTA (Algorithm 2) under the low-accuracy and high-accuracy stopping criteria, respectively.
The initial point for all the tested algorithms is generated by the spectral initialization given in Algorithm 3 of [11]. All the code is run on a server with an Intel Xeon E5-2650v4 CPU (2.2GHz). Each task is limited to a single core; no multi-threading is used.
4.1 Synthetic Data
We generate synthetic data in the same manner as [11]. Specifically, the sensing vectors $a_i$ are drawn independently from the standard normal distribution. The entries of the true signal $\bar{x}$ are drawn randomly from a discrete Bernoulli distribution. We denote by $S$ the set of corrupted measurements, where $S$ is generated by random sampling without replacement from $\{1, \ldots, m\}$. The noise terms $\xi_i$ for these samples in (2) are independently drawn from a Cauchy distribution, which means that
where the scale parameter is the sample median of the noiseless measurements. For a given threshold $\tau > 0$, we call an algorithm successful if it returns an $x$ such that the relative error satisfies
$\frac{\min\{\|x - \bar{x}\|, \|x + \bar{x}\|\}}{\|\bar{x}\|} \le \tau. \qquad (28)$
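The generation procedure and the success criterion above can be summarized by the following sketch; the sign-valued true signal and the median-scaled Cauchy corruption are illustrative assumptions and may differ in detail from the exact protocol of [11] that we follow.

```python
import numpy as np

def make_synthetic_rpr(n, m, corrupt_frac, seed=None):
    """Synthetic RPR data in the spirit of Section 4.1 (illustrative sketch).
    a_i ~ N(0, I_n); the signal entries are random signs; a random fraction of
    the measurements is corrupted by Cauchy noise scaled by the clean median."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    x_true = rng.choice([-1.0, 1.0], size=n)
    b = (A @ x_true) ** 2
    S = rng.choice(m, size=int(corrupt_frac * m), replace=False)  # corrupted indices
    b[S] += np.median(b) * rng.standard_cauchy(len(S))
    return A, b, x_true

def relative_error(x, x_true):
    """Sign-invariant relative error used in the success criterion (28)."""
    return min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)) / np.linalg.norm(x_true)
```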
For each combination of and , we randomly generate 50 instances according to the above procedure, and we report the success rate of the algorithm among the 50 instances. For IPL-FISTA-Low and IPL-FISTA-High, we set For the subgradient method, we set which is one of the suggested choices of in [13]. Moreover, [13] did not specify how to choose , and we set as we found that this choice gave good performance. Since we found that in most cases, the relative error given by PL is in the level of , we set in (28) for PL. In our comparison, we first run PL using the default settings of the code provided by the authors of [11]. If the returned satisfies (28) with , then we claim that PL is successful, and we terminate IPL and Subgradient method once they found an iterate with a smaller objective value than the one given by PL. If the iterate returned by PL does not satisfy (28) with , then we claim that PL failed, and we then terminate IPL and Subgradient method when . The CPU time is only reported for the successful cases for PL. The cost of computing the spectral norm to obtain is included in the total CPU time of PL and IPL.
The simulation results corresponding to and are shown in Figures 1 and 2, where the horizontal axis corresponds to different values of $m$ since $n$ is fixed. From both Figures 1 and 2, we can see that the four algorithms have similar success rates, but the total CPU time of IPL-FISTA-Low and IPL-FISTA-High, which includes the cost of computing the spectral norm, is significantly less than that of the other methods.


4.2 Image Recovery
In this section, we compare the four candidate algorithms on images in a similar manner as [11]. In particular, suppose we have an RGB image array; we construct the signal by stacking its color channels into a vector. Let $H$ be the Hadamard matrix and let $S$ be a random diagonal matrix whose diagonal elements are independently drawn from the discrete uniform distribution on $\{\pm 1\}$. We then build the measurement matrix from blocks of the form $HS$. The advantage of such a mapping is that it mimics the fast Fourier transform, and computing a matrix-vector product only takes $O(n \log n)$ time. We first examine the numerical comparisons on a real RNA nanoparticles image (https://visualsonline.cancer.gov/details.cfm?imageid=11167), as shown in Figure 3, and we follow the code of [11] for the experiments on a sub-image. We set the noise in the same way as for the synthetic datasets. For each combination of dataset parameters, we run 50 replicates by generating 50 different random sign matrices and test all the candidate algorithms. We define success in the same way as for the synthetic datasets. For a replicate, if PL succeeds, the CPU time is the time needed to reach (28).
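The following sketch illustrates how such a Hadamard-based sensing operator achieves $O(n \log n)$ matrix-vector products via the fast Walsh-Hadamard transform; the normalization and the number of sign patterns are assumptions made for illustration and may differ from the construction in the code of [11].

```python
import numpy as np

def fwht(v):
    """Fast Walsh-Hadamard transform of a length-2^k vector (returns a new array),
    computed in O(n log n) time."""
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def hadamard_measurements(x, signs):
    """Apply the stacked operator whose blocks are H S_j, with H the Hadamard matrix
    and S_j a diagonal matrix of random +/-1 signs (illustrative construction).
    `signs` has shape (k, n), giving m = k * n measurements."""
    return np.concatenate([fwht(s * x) for s in signs])

# Example usage (illustrative; n must be a power of two):
# n = 1024
# x = np.random.standard_normal(n)
# signs = np.random.choice([-1.0, 1.0], size=(3, n))
# b = hadamard_measurements(x, signs) ** 2   # squared-magnitude measurements
```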

Table 1 reports the median CPU time (in seconds) of the candidate algorithms based on two tolerances. We only show the results for this combination of image and measurement parameters because other choices give similar results. It is noted from Table 1 that PL can only reach a relative error at the looser tolerance, and IPL-FISTA is much more efficient than the PL and subgradient methods.
Algorithm | Median CPU time (looser tolerance) | Median CPU time (tighter tolerance)
PL | 3166.93 | NA
Subgradient | 88.14 | 659.64
IPL-FISTA-Low | 6.01 | 218.14
IPL-FISTA-High | 48.56 | 175.60
Additional experimental results comparing the CPU time of the four candidate algorithms are reported in Table 2. We provide four images. The experiments use the same settings as above, and the CPU time based on ten replications is reported in the form "median (interquartile range)". The subgradient method, IPL-FISTA-Low, and IPL-FISTA-High are terminated with the tolerance . We can see that PL cannot terminate within 10 hours, and IPL-FISTA still enjoys the best efficiency.
Image | RNA | Hubble (https://www.nasa.gov/image-feature/goddard/2017/hubble-hones-in-on-a-hypergiants-home)
PL | |
Subgradient | 4.39 (0.24) | 4.72 (0.27)
IPL-FISTA-Low | 1.65 (0.41) | 2.07 (0.14)
IPL-FISTA-High | 1.30 (0.08) | 1.26 (0.06)
Image | James Webb (https://www.nasa.gov/webbfirstimages) | Penn State (https://www.britannica.com/topic/Pennsylvania-State-University)
PL | |
Subgradient | 1.86 (0.14) | 4.84 (0.66)
IPL-FISTA-Low | 0.90 (0.21) | 2.23 (0.16)
IPL-FISTA-High | 0.56 (0.09) | 1.41 (0.21)
5 Proofs
5.1 Proof of Theorem 1
Before proceeding, we first present some lemmas.
Lemma 1 (Weak Convexity).
(Discussion of Condition C2 in [11]) The following inequalities hold for any , and .
(29) | |||
(30) |
Lemma 2 (see, e.g., equation (5.2) in [15]).
The following inequality holds for any and
(31) |
The following lemma studies one iteration of our IPL algorithm (as summarized in Algorithm 1).
Lemma 3.
Proof of Lemma 3.
(a). Letting and in (31), we have
where the second inequality follows from (8) and (30). This proves (32) in Lemma 3(a).
(b). When (9) holds, since is -strongly convex, we have
(34) |
Let , . From (34) we have
(35) |
and
(36) |
where the first inequality is from the Cauchy-Schwarz inequality, and the second inequality follows from (35) and the fact that . Therefore, from (34) and (36) we have
which, together with (31), yields
Therefore, the proof of Lemma 3 is complete. ∎
Now we are ready to prove Theorem 1.
Proof of Theorem 1.
We will prove (a) and (b) together. Both (32) and (33) indicate that generated by Algorithm 1 satisfies
in which is a constant that for LACC and for HACC. Letting , we have
Then, our IPL (Algorithm 1) finds an -stationary point to RPR (4) in
iterations. Therefore, the proof of both (a) and (b) in Theorem 1 is complete. ∎
5.2 Proof of Theorem 2
To prove Theorem 2, we need the following lemmas.
Lemma 4.
(Part of the Proof for Theorem 1 in [11]) Let . For any , we have
(37) |
which also holds if we replace by .
Proof of Lemma 4.
We have
(38) |
where the first inequality follows from the Cauchy-Schwarz inequality and the second one is from the convexity of . We then have
where the first and the last inequalities are from (29), and the second inequality is from the strong convexity of . Combining this inequality with (38) yields (37). It is easy to find that all the proofs still hold if we replace with . Therefore, the proof of Lemma 4 is complete. ∎
Lemma 5 (One-step progress).
Proof of Lemma 5.
(a). When the low accuracy condition (8) holds, from (30) we have
which, combining with (37), yields
Discarding the term , we get
Since (37) also holds when is replaced by , we also have
This proves (39). (40) holds because of Assumption 1. Hence, it proves part (a) of Lemma 5.
(b). When the high accuracy condition (9) holds, we have
where the second inequality is from the Cauchy-Schwarz inequality. Combining with (37), we have
Similarly, replacing by , we have
Combining the above two inequalities with Assumption 1 yields (41), which proves part (b) of Lemma 5.
Therefore, the proof of Lemma 5 is complete. ∎
Now we are ready to give the proof of Theorem 2.
Proof of Theorem 2.
(a). We will prove that for any ,
(42) |
by induction, which immediately leads to the conclusion of (a) by Assumption 1. First, (42) clearly holds for . Now we assume that it holds for . For , since (15) holds, we have
Using (40) directly proves that (42) holds when is replaced by . This proves part (a) of Theorem 2.
(b). Note that (41) is equivalent to
(43) |
Since (16) holds, from (43) we know that
Plugging this back into (43) and working recursively, we get
Then, , which together with (43) proves the quadratic convergence in part (b) of Theorem 2.
Therefore, the proof of Theorem 2 is complete. ∎
5.3 Proof of Theorem 3
Throughout this subsection, we assume that the assumptions in Theorem 3 hold. That is, we assume , in Algorithm 2, and Assumption 1 holds. To prove Theorem 3, we first present some lemmas.
Lemma 6 (Local Lipschitz Constant of ).
It holds that
Proof of Lemma 6.
Lemma 7.
Proof of Lemma 7.
Lemma 8 (Bound of ).
For any , if , then
Proof of Lemma 8.
Since , we have and The desired result follows by using . Therefore, the proof of Lemma 8 is complete. ∎
Next, we provide Lemmas 9 and 10 to show the convergence rate for solving the subproblem with Algorithm 2. These results can be used for both conditions for (a) and (b).
Lemma 10 (Theorem 4 in [28]).
For Algorithm 2, there exist universal positive constants such that, when , it holds
(45) |
Proof of Lemma 10.
We now define some constants.
and where , , , and are universal positive constants mentioned in Lemma 10.
We now formally state the sufficiency of (23) and (24) for (8) and (9) in Lemma 11 so that we can use the linear and quadratic convergence rate for main iterations.
Proof of Lemma 11.
Note that , and . The conclusion immediately holds because and
which is from strong duality. Therefore, the proof of Lemma 11 is complete. ∎
Based on Lemma 11, we prove some properties induced by conditions of Theorem 3. In particular, Lemma 12 gives the ones induced by part (a) of Theorem 3, and Lemma 13 gives the ones induced by part (b) of Theorem 3.
Lemma 12.
Lemma 13.
Assume Assumption 1 and (9) hold for any with some . If , then for any , (46) holds, and the following two inequalities hold
(49) | ||||
(50) |
Proof of Lemma 13.
Based on (41), we have
(51) |
We prove (49) by induction. When , noticing that , by (51), we have , and therefore (49) holds for . We now assume that (49) holds for for . We then have and using (51), we have , which implies that (49) holds for . This completes the proof for (49), which immediately indicates (50) for any . This indicates that . By Lemma 7, , which indicates that (46) holds for any . Therefore, the proof of Lemma 13 is complete. ∎
We can notice that a common property under conditions for (a) or (b) is that (46) holds for any , based on which, we give Lemma 14 to show another common property.
Proof of Lemma 14.
For fixed , we define and . Therefore and satisfy (8) (with replaced by and replaced by ) with . Hence we have and
(54) |
where the inequality is from (40) with . Moreover, from Assumption 1 we have
where the second inequality can be obtained by applying (40). Thus, we have . So, based on Lemma 6 and Assumption 1,
(55) |
where the second inequality is from (54), and the last inequality is from Lemma 6. (55) implies that
This proves (52). To prove (53), without loss of generality, we assume that . Using (29) and the fact that and noticing that is the minimizer of , we have
where the second inequality is due to (29), and the last inequality is due to Assumption 1. Using the fact that based on Assumption 1, we have
which proves (53). Therefore, the proof of Lemma 14 is complete. ∎
Now, we are ready to give an upper bound for .
Lemma 15.
Assume that Assumption 1 holds. For Algorithm 2 under either options (23) with or (24) with , for any , if (46) holds, the following statements hold.
-
(a)
(Low Accuracy) When using option (23), we have
(56) -
(b)
(High Accuracy) When using option (24), we have
(57)
Here, and are the constants in Lemma 10 and , and
Proof of Lemma 15 .
(a). By choosing
(58) |
we have
(59) |
which, together with (45), yields that
(60) |
where the last inequality is from (53). From (60) we have
where the last inequality is due to (21), and this gives (23). Therefore, (23) should have already been satisfied when (58) holds. This proves (56) in Lemma 15(a).
(b). By choosing
(61) |
we have
(62) |
which, together with (45), yields that
(63) |
where the last inequality is from (52). Denote , which indicates that . We have
which follows from the fact that is -strongly convex and (21). From (63), we have So, by the Cauchy-Schwarz inequality, we have
Together with (63), (24) should have already been satisfied when (61) holds. This proves (57) in Lemma 15(b).
Therefore, the proof of Lemma 15 is complete. ∎
Next, we are ready to present the proof of Theorem 3.
Proof of Theorem 3.
We now prove the desired result. If , then and (25) holds. If , we denote as the smallest index such that and . From Lemma 6 we have
(65) | ||||
Here we explain how these inequalities were obtained. The first inequality follows from Lemma 6 and (47). Since (46) holds, (40) holds for any and it implies the second inequality in (65). The third inequality in (65) follows from Assumption 1 and the last inequality in (65) follows from the fact that .
From (65) we know that , it holds that
(66) |
Therefore, we have
where the first inequality follows from (64) and the second inequality follows from (66). This completes the proof of Theorem 3(a).
(b). Lemma 8 and (50) indicate that . Together with (46), (49), (57), ,
(67) |
which follows from our choice of such that . Next, we prove the conclusion. We adopt the same definition of as in (a) and pick so that . From (51) we have
(68) |
(69) |
Finally, for any , we have
Here, the first inequality follows from (67) and (69), and the second inequality follows from the fact that
This completes the proof of Theorem 3(b).
Therefore, the proof of Theorem 3 is complete. ∎
6 Conclusion
In this paper, we proposed a new inexact proximal linear algorithm for solving the robust phase retrieval problem. Our contribution lies in the two adaptive stopping criteria for the subproblem in the proximal linear algorithm. We showed that the iteration complexity of our inexact proximal linear algorithm is in the same order as the exact proximal linear algorithm. Under the sharpness condition, we are able to prove the local convergence of the proposed method. Moreover, we discussed how to use the FISTA for solving the subproblem in our inexact proximal linear algorithm, and analyzed the total oracle complexity for obtaining an -optimal solution under the sharpness condition. Numerical results demonstrated the superior performance of the proposed methods over some existing methods.
References
- [1] J. Miao, P. Charalambous, J. Kirz, and D. Sayre, “Extending the methodology of x-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens,” Nature, vol. 400, no. 6742, pp. 342–344, 1999.
- [2] R. P. Millane, “Phase retrieval in crystallography and optics,” Journal of the Optical Society of America A, vol. 7, no. 3, pp. 394–411, 1990.
- [3] A. Chai, M. Moscoso, and G. Papanicolaou, “Array imaging using intensity-only measurements,” Inverse Problems, vol. 27, no. 1, p. 015005, 2010.
- [4] C. Fienup and J. Dainty, “Phase retrieval and image reconstruction for astronomy,” Image Recovery: Theory and Application, vol. 231, p. 275, 1987.
- [5] J. Miao, T. Ishikawa, Q. Shen, and T. Earnest, “Extending x-ray crystallography to allow the imaging of noncrystalline materials, cells, and single protein complexes,” Annual Review of Physical Chemistry, vol. 59, pp. 387–410, 2008.
- [6] M. Fickus, D. G. Mixon, A. A. Nelson, and Y. Wang, “Phase retrieval from very few measurements,” Linear Algebra and its Applications, vol. 449, pp. 475–499, 2014.
- [7] E. J. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval via wirtinger flow: Theory and algorithms,” IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015.
- [8] Y. Chen and E. J. Candès, “Solving random quadratic systems of equations is nearly as easy as solving linear systems,” Communications on Pure and Applied Mathematics, vol. 70, no. 5, pp. 822–883, 2017.
- [9] G. Wang, G. B. Giannakis, and Y. C. Eldar, “Solving systems of random quadratic equations via truncated amplitude flow,” IEEE Transactions on Information Theory, vol. 64, no. 2, pp. 773–794, 2017.
- [10] H. Zhang, Y. Zhou, Y. Liang, and Y. Chi, “A nonconvex approach for phase retrieval: Reshaped wirtinger flow and incremental algorithms,” Journal of Machine Learning Research, vol. 18, 2017.
- [11] J. C. Duchi and F. Ruan, “Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval,” Information and Inference: A Journal of the IMA, vol. 8, no. 3, pp. 471–529, 2019.
- [12] H. Zhang, Y. Chi, and Y. Liang, “Provable non-convex phase retrieval with outliers: Median truncated wirtinger flow,” in International Conference on Machine Learning. PMLR, 2016, pp. 1022–1031.
- [13] D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette, “Subgradient methods for sharp weakly convex functions,” Journal of Optimization Theory and Applications, vol. 179, no. 3, pp. 962–982, 2018.
- [14] A. S. Lewis and S. J. Wright, “A proximal method for composite minimization,” Mathematical Programming, vol. 158, no. 1, pp. 501–546, 2016.
- [15] D. Drusvyatskiy and C. Paquette, “Efficiency of minimizing compositions of convex functions and smooth maps,” Mathematical Programming, vol. 178, no. 1, pp. 503–558, 2019.
- [16] J. C. Duchi and F. Ruan, “Stochastic methods for composite and weakly convex optimization problems,” SIAM Journal on Optimization, vol. 28, no. 4, pp. 3229–3259, 2018.
- [17] V. Charisopoulos, Y. Chen, D. Davis, M. Díaz, L. Ding, and D. Drusvyatskiy, “Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence,” Foundations of Computational Mathematics, pp. 1–89, 2021.
- [18] V. Charisopoulos, D. Davis, M. Díaz, and D. Drusvyatskiy, “Composite optimization for robust blind deconvolution,” arXiv preprint arXiv:1901.01624, 2019.
- [19] Z. Wang, B. Liu, S. Chen, S. Ma, L. Xue, and H. Zhao, “A manifold proximal linear method for sparse spectral clustering with application to single-cell RNA sequencing data analysis,” INFORMS Journal on Optimization, vol. 4, no. 2, pp. 200–214, 2022.
- [20] Y. Nesterov, “On an approach to the construction of optimal methods of minimization of smooth convex functions,” Ekonomika i Matematicheskie Metody, vol. 24, no. 3, pp. 509–517, 1988.
- [21] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
- [22] Y. Nesterov, “Gradient methods for minimizing composite functions,” Mathematical Programming, vol. 140, no. 1, pp. 125–161, 2013.
- [23] M. Díaz and B. Grimmer, “Optimal convergence rates for the proximal bundle method,” SIAM Journal on Optimization, vol. 33, no. 2, pp. 424–454, 2023.
- [24] S. Bonettini, I. Loris, F. Porta, and M. Prato, “Variable metric inexact line-search-based methods for nonsmooth optimization,” SIAM Journal on Optimization, vol. 26, no. 2, pp. 891–921, 2016.
- [25] C.-p. Lee and S. J. Wright, “Inexact successive quadratic approximation for regularized optimization,” Computational Optimization and Applications, vol. 72, no. 3, pp. 641–674, 2019.
- [26] ——, “Inexact variable metric stochastic block-coordinate descent for regularized optimization,” Journal of Optimization Theory and Applications, vol. 185, no. 1, pp. 151–187, 2020.
- [27] N. Parikh and S. Boyd, “Block splitting for distributed optimization,” Mathematical Programming Computation, vol. 6, no. 1, pp. 77–102, 2014.
- [28] C. Dünner, S. Forte, M. Takác, and M. Jaggi, “Primal-dual rates and certificates,” in International Conference on Machine Learning. PMLR, 2016, pp. 783–792.