On the inexact scaled gradient projection method
Abstract
The purpose of this paper is to present an inexact version of the scaled gradient projection method on a convex set, which is inexact in two senses. First, an inexact projection onto the feasible set is computed, allowing for an appropriate relative error tolerance. Second, an inexact non-monotone line search scheme is employed to compute a step size which defines the next iterate. It is shown that the proposed method has asymptotic convergence properties and iteration-complexity bounds similar to those of the usual scaled gradient projection method employing monotone line searches.
Keywords: Scaled gradient projection method, feasible inexact projection, constrained convex optimization.
AMS subject classification: 49J52, 49M15, 65H10, 90C30.
1 Introduction
This paper is devoted to the study of the scaled gradient projection (SGP) method with non-monotone line search for solving constrained convex optimization problems of the form
$\min \{ f(x) : x \in C\},$  (1)
where $C$ is a closed and convex subset of $\mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{R}$ is a continuously differentiable function. Denote by $f^*$ the optimal value of (1); its solution set will be assumed to be nonempty unless the contrary is explicitly stated. Problem (1) is a basic problem of constrained optimization, which appears very often in various areas, including finance, machine learning, control theory, and signal processing; see, for example, [20, 21, 35, 46, 50, 61]. In recent problems from most of these areas, the datasets are large or high-dimensional, and their solutions need to be approximated quickly with reasonable accuracy. It is well known that the SGP method with non-monotone line search is among those suitable for this task, as will be explained below.
The gradient projection method is what first comes to mind when we start from the ideas of classic optimization methods in an attempt to deal with problem (1). In fact, this method is one of the oldest known optimization methods for solving (1), and the study of its convergence properties goes back to the works of Goldstein [39] and Levitin and Polyak [49]. After these works, several variants of it have appeared over the years, resulting in a vast literature on the subject, including [10, 11, 12, 33, 35, 40, 47, 56, 67]. Additional references on this subject can be found in the recent review [17] and references therein. Among all the variants of the gradient projection method, the scaled version has received special attention due to the flexibility it provides in efficient implementations of the method; see [13, 5, 16, 18, 19]. In addition, its simplicity and easy implementation have attracted the attention of the scientific community working on optimization over the years. This method usually uses only first-order derivatives, which makes it very stable from a numerical point of view and therefore quite suitable for solving large-scale optimization problems; see [52, 53, 61, 62]. At each iteration, the SGP method moves along the direction of the negative scaled gradient, and then projects the obtained point onto the constraint set. The current iterate and this projection define a feasible descent direction, and a line search in this direction is performed to define the next iterate. It is worth mentioning that the performance of the SGP method is strongly related to each of the steps we have just mentioned. In fact, the scaling matrix and the step size towards the negative scaled gradient are freely selected in order to improve the performance of the SGP method, but without increasing the cost of each iteration. Strategies for choosing both have their origin in the study of the gradient method for unconstrained optimization; papers addressing these issues include, but are not limited to, [7, 18, 26, 27, 29, 36, 69, 25, 49]. More details about selecting step sizes and scaling matrices can be found in the recent review [17] and references therein.
In this paper, we are particularly interested in the main stages that make up the SGP method, namely, the projection calculation and the line search employed. It is well known that most of the computational burden of each iteration of the SGP method lies in the calculation of the projection. Indeed, the projection calculation requires, at each iteration, the solution of a quadratic problem restricted to the feasible set, which can lead to a substantial increase in the cost per iteration if the number of unknowns is large. For this reason, it may not be justified to carry out exact projections when the iterates are far from the solution of the problem. In order to reduce the computational effort spent on projections, inexact procedures that become more and more accurate when approaching the solution have been proposed, resulting in more efficient methods; see for example [13, 16, 38, 42, 60, 64, 57]. On the other hand, non-monotone line searches can improve the probability of finding a global optimal solution, in addition to potentially improving the speed of convergence of the method as a whole; see for example [24, 55, 63]. The concept of non-monotone line search, which we will use here as a synonym for inexact line search, was first proposed in [45], and later a new non-monotone search was proposed in [68]. After these papers, other non-monotone searches appeared; see for example [3, 51]. In [59], an interesting general framework for non-monotone line searches was proposed, and more recently modifications of it have been presented in [43, 44].
The purpose of the present paper is to present an inexact version of the SGP method, which is inexact in two senses. First, using a version of the scheme introduced in [13] and also a variation of the one that appeared in [64, Example 1], the inexact projection onto the feasible set is computed allowing an appropriate relative error tolerance. Second, using the conceptual inexact scheme for the line search introduced in [44, 59], a step size is computed to define the next iterate. More specifically, we initially show that the feasible inexact projection of [13] provides greater latitude than the projection of [64, Example 1]. In the first convergence result presented, we show that the SGP method using the projection proposed in [13] preserves the same partial convergence result as the classic method, that is, we prove that every accumulation point of the sequence generated by the SGP method is stationary for problem (1). Then, considering the inexact projection of [64, Example 1], and under mild assumptions, we establish full asymptotic convergence results and some complexity bounds. The analysis of the method is carried out using the general non-monotone line search scheme introduced in [44]. In this way, the proposed method can be adapted to several line searches and, in particular, allows obtaining several known versions of the SGP method as particular instances, including [10, 13, 47, 66]. Except for the particular case in which we assume that the SGP method employs the non-monotone line search introduced in [45], all other asymptotic convergence and complexity results are obtained without any assumption of compactness of the sub-level sets of the objective function. Finally, it is worth mentioning that the complexity results obtained for the SGP method with a general non-monotone line search are the same as in the classic case when the usual Armijo search is employed, namely, the complexity bound for finding $\epsilon$-stationary points of problem (1) is established and, under convexity of $f$, the corresponding rate to find an $\epsilon$-optimal functional value is derived.
In Section 2, some notation and basic results used throughout the paper are presented. In particular, Section 2.1 is devoted to recalling the concept of relative feasible inexact projection, and some new properties of this concept are presented. Section 3 describes the SGP method with a general non-monotone line search, and some particular instances of it are presented. Partial asymptotic convergence results are presented in Section 4. Section 5 presents a full convergence result and iteration-complexity bounds. Some numerical experiments are provided in Section 6. Finally, some concluding remarks are made in Section 7.
2 Preliminaries and basic results
In this section, we introduce some notation and results used throughout our presentation. First, we consider the index set $\mathbb{N}$, the usual inner product $\langle\cdot,\cdot\rangle$ in $\mathbb{R}^n$, and the associated Euclidean norm $\|\cdot\|$. Let $f:\mathbb{R}^n\to\mathbb{R}$ be a differentiable function and $C\subset\mathbb{R}^n$. The gradient of $f$ is said to be Lipschitz continuous on $C$ with constant $L>0$ if $\|\nabla f(x)-\nabla f(y)\|\leq L\|x-y\|$, for all $x, y\in C$. Combining this definition with the fundamental theorem of calculus, we obtain the following result, whose proof can be found in [12, Proposition A.24].
Lemma 1.
Let $f:\mathbb{R}^n\to\mathbb{R}$ be a differentiable function and $C\subset\mathbb{R}^n$. Assume that $\nabla f$ is Lipschitz continuous on $C$ with constant $L>0$. Then, $f(y) \leq f(x) + \langle \nabla f(x), y-x\rangle + \frac{L}{2}\|y-x\|^2$, for all $x, y \in C$.
Assume that $C$ is a convex set. The function $f$ is said to be convex on $C$ if $f(y) \geq f(x) + \langle \nabla f(x), y - x\rangle$, for all $x, y \in C$. We recall that a point $\bar{x}\in C$ is a stationary point for problem (1) if
$\langle \nabla f(\bar{x}), x - \bar{x}\rangle \geq 0, \quad \mbox{for all } x \in C.$  (2)
Consequently, if $f$ is a convex function on $C$, then (2) implies that $\bar{x}$ is a solution of problem (1). We end this section with some useful concepts for the analysis of the sequence generated by the scaled gradient method; for more details, see [23]. For that, let $D$ be a positive definite matrix and $\|\cdot\|_D$ be the norm defined by
$\|x\|_D := \sqrt{\langle Dx, x\rangle}, \quad x \in \mathbb{R}^n.$  (3)
For a fixed constant $\mu \geq 1$, denote by $\mathcal{D}_\mu$ the set of symmetric positive definite matrices with all eigenvalues contained in the interval $[1/\mu, \mu]$. The set $\mathcal{D}_\mu$ is compact. Moreover, for each $D \in \mathcal{D}_\mu$, it follows that $D^{-1}$ also belongs to $\mathcal{D}_\mu$. Furthermore, due to $\frac{1}{\mu}\|x\|^2 \leq \langle Dx, x\rangle \leq \mu\|x\|^2$, by (3), we obtain
$\frac{1}{\sqrt{\mu}}\,\|x\| \leq \|x\|_D \leq \sqrt{\mu}\,\|x\|, \quad \mbox{for all } x \in \mathbb{R}^n \mbox{ and } D \in \mathcal{D}_\mu.$  (4)
Definition 1.
Let be a sequence in and be a sequence in . The sequence is said to be quasi-Fejér convergent to a set with respect to if, for all , there exists a sequence such that , , and , for all .
The main property of a quasi-Fejér convergent sequence is stated in the next result. Its proof can be found in [23] but, for the sake of completeness, we include it here.
Theorem 2.
Let be a sequence in and be a sequence in . If is quasi-Fejér convergent to a nonempty set with respect to , then is bounded. Furthermore, if a cluster point of belongs to , then .
Proof.
Take . Definition 1 implies that , for all . Thus, using the first inequality in (4), we conclude that , for all . Therefore, combining the two previous inequalities, we conclude that is bounded. Let be a cluster point of and be a subsequence of such that . Take . Since and , there exists such that and such that , for all . Hence, using the first inequality in (4) and taking into account that , for all , we have for all . Hence, using the second inequality in (4), we conclude that for all . Therefore, . ∎
2.1 Relative feasible inexact projections
In this section, we recall two concepts of relative feasible inexact projections onto a closed and convex set, and also present some new properties of them which will be used throughout the paper. These concepts of inexact projections were introduced seeking to make the subproblem of computing the projections on the feasible set more efficient; see for example [13, 60, 64]. Before presenting the inexact projection concept that we will use, let us first recall the concept of exact projection with respect to a given norm. For that, throughout this section . The exact projection of the point onto with respect to the norm , denoted by , is defined by
(5) |
The next result characterizes the exact projection; its proof can be found in [8, Theorem 3.14].
Lemma 3.
Let . Then, if and only if and , for all
Remark 1.
In the following, we recall the concept of a feasible inexact projection with respect to relative to a fixed point.
Definition 2.
The feasible inexact projection mapping, with respect to the norm , onto relative to a point and forcing parameter , denoted by , is the set-valued mapping defined as follows
(6) |
Each point is called a feasible inexact projection, with respect to the norm , of onto relative to and forcing parameter .
In the following, we show that the definition given above is nothing more than a reformulation of the concept of relative feasible inexact projection with respect to introduced in [13].
Remark 2.
Let , and be a positive definite matrix. Consider the quadratic function defined by . Thus, letting be the norm defined by (3), some algebraic manipulations show that
(7) |
Hence, (7) and (5) imply that . Let . Thus, by using (7), after some calculations, we can see that the following inexactness condition introduced in [13],
is equivalent to finding such that , which corresponds to condition (6) in Definition 2.
The concept of feasible inexact projection in Definition 2 provides more latitude than the usual concept of exact projection (5). The next remark makes this more precise.
Remark 3.
Lemma 4.
Let , and . Then, there hold
Proof.
Let . Since , using (6) we have , which is equivalent to the desired inequality. ∎
Next, we recall a second concept of relative feasible inexact projection onto a closed convex set, see [2, 28]. The definition is as follows.
Definition 3.
The feasible inexact projection mapping, with respect to the norm , onto relative to and forcing parameter , denoted by , is the set-valued mapping defined as follows
(8) |
Each point is called a feasible inexact projection, with respect to the norm , of onto relative to and forcing parameter .
The concept of feasible inexact projection in Definition 3 also provides more latitude than the usual concept of exact projection. Next, we present some remarks about this concept.
Remark 4.
Lemma 5.
Let , , and . Then, there hold
for all and .
Proof.
Lemma 6.
Let , , and . If and , then
Proof.
Remark 5.
Under the conditions of Lemma 6, there exists and such that . Indeed, let , , and , then
Since is the exact projection of , we have . Combining this inequality with the last equality and Definition 2, we conclude that . Now, letting with , after some algebraic manipulations we have
Thus, it follows from Lemma 3 that . Hence, taking we conclude that . Therefore, considering that , the statement follows.
It follows from Remark 5 that, in general, . However, whenever is a bounded set, we will show that for each fixed there exist such that . For that, we first need the next lemma.
Lemma 7.
Let , and . Assume that is a bounded set and take
(10) |
where denotes the diameter of . Then, .
Proof.
Take satisfying (10) and such that . For all , we have
Using Lemma 3, we have . Thus, the last equality becomes
By using the Cauchy–Schwarz inequality, we conclude from the last inequality that
Since and , the last inequality implies that
(11) |
On the other hand, if satisfies (10) then
hence
Since , we have and we can conclude that
It follows from (11) that
(12) |
Using again that and the triangle inequality, we have
Hence, taking into account (12), we conclude that . Therefore, it follows from Definition 3 that . ∎
Proposition 8.
Let , and assume that is a bounded set. Then, for each , there exist such that .
Proof.
Next, we present some important properties of inexact projections, which will be useful in the sequel.
Lemma 9.
Proof.
Since , applying Lemma 4 with , , , and , we conclude, after some algebraic manipulations, that
Substituting in the left-hand side of the last inequality, some manipulations yield the inequality of item . To prove item , we first assume that is stationary for problem (1). In this case, (2) implies that . Hence, due to , item implies
Since and , the last inequality yields . Therefore, . Conversely, if , then Definition 2 implies that
Hence, . Considering that we have
Thus, since the exact projection with respect to the norm is unique and , we have . Hence, is the solution of the constrained optimization problem , which, taking into account that , implies (2). Therefore, is a stationary point for problem (1). Finally, to prove item , take a nonstationary point for problem (1). Thus, by item , and taking into account that , we conclude that . Since , and , it follows from item that , and the first sentence is proved. Finally, note that the second sentence is the contrapositive of the first sentence. ∎
Finally, it is worth mentioning that Definitions 2 and 3, introduced respectively in [13] and [28], are relative inexact concepts, while the concept introduced in [60, 64] is absolute.
2.1.1 Practical computation of inexact projections
In this section, for a given and , we discuss how to calculate a point belonging to or . We recall that Lemma 6 implies that has more latitude than , i.e., .
We begin our discussion by showing how a point can be calculated without knowing the point . Considering that this discussion has already been covered in [13, Section 3, Algorithm 3.1], we will limit ourselves to giving a general idea of how this task is carried out; see also [16, Section 5.1]. The idea is to use an external procedure capable of computing two sequences and satisfying the following conditions
(13) |
In this case, if , then we have . Hence, given an arbitrary , the second condition in (13) implies that there exists such that
Moreover, by using the last condition in (13), we conclude that there exists such that
(14) |
which using the inequality in (13) yields . Hence, Definition 2 implies that . Therefore, (14) can be used as a stopping criterion to compute a feasible inexact projection, with respect to the norm , of onto relative to and forcing parameter . For instance, it follows from [13, Theorem 3.2, Lemma 3.1] that such sequences and satisfying (13) can be computed by using Dykstra’s algorithm [22, 32], whenever is the identity matrix and the set , where are closed and convex sets and the exact projection is easy to obtain, for all .
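To make the preceding discussion concrete, the following sketch illustrates a Dykstra-type cycle for a feasible set given as an intersection of convex sets, in the Euclidean case in which the scaling matrix is the identity. It is only a minimal illustration in Python (the routine name, the argument `projections`, and the simple relative-change test are assumptions; the latter merely stands in for the verifiable stopping criterion (14) discussed above), not the authors' implementation.

```python
import numpy as np

def dykstra_inexact_projection(v, projections, tol=1e-8, max_cycles=1000):
    """Approximate the Euclidean projection of v onto an intersection of convex sets.

    `projections` is a list of functions, each returning the exact projection onto
    one of the convex sets C_1, ..., C_p whose intersection is the feasible set C.
    """
    x = v.copy()
    corrections = [np.zeros_like(v) for _ in projections]  # Dykstra correction terms
    for _ in range(max_cycles):
        x_prev = x.copy()
        for i, proj in enumerate(projections):
            y = proj(x + corrections[i])             # project the corrected point
            corrections[i] = x + corrections[i] - y  # update the correction term
            x = y
        # simple relative-change test; a practical code would use criterion (14)
        if np.linalg.norm(x - x_prev) <= tol * max(1.0, np.linalg.norm(v)):
            break
    # the returned point belongs to the last set projected onto; guaranteeing
    # feasibility for the whole intersection requires the extra care taken in [13]
    return x
```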
We end this section by discussing how to compute a point . For that, we apply the classical Frank-Wolfe method, also known as the conditional gradient method, to minimize the function over the constraint set with a suitable stopping criterion depending on and ; see [9, 48]. To state the method, we assume the existence of a linear optimization oracle (or simply LO oracle) capable of minimizing linear functions over the constraint set , which is assumed to be compact. The Frank-Wolfe method is formally stated as follows.
- Input: , , and .
- Step 0. Let and set .
- Step 1. Use a LO oracle to compute an optimal solution and the optimal value as (15). If , then define and stop.
- Step 2. Compute and as (16). Set , and go to Step 1.
- Output: .
Let us describe the main features of Algorithm 1, i.e., the Frank-Wolfe method applied to the problem . In this case, (15) is equivalent to . Since is convex, we have , for all . Define and . Letting in the last inequality, we obtain , which implies that whenever . Thus, we conclude that , for all . Therefore, if Algorithm 1 computes satisfying , then the method terminates. Otherwise, it computes the step size using exact minimization. Since , and is convex, we conclude from (16) that ; thus Algorithm 1 generates a sequence in . Finally, (15) implies that , for all . Considering that [9, Proposition A.2] implies that and taking into account the stopping criterion , we conclude that the output of Algorithm 1 is a feasible inexact projection, i.e., , for all .
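As an illustration of the scheme just described, the sketch below applies the Frank-Wolfe iteration to the Euclidean projection problem, assuming an LO oracle `lo_oracle(g)` that returns a minimizer over the compact set $C$ of the linear function induced by `g`, and a starting point `x0` in $C$. The names and the scalar tolerance `tol` are illustrative placeholders for the quantities appearing in Algorithm 1; the duality gap computed at each iteration upper bounds the optimality gap, which is exactly what certifies that the output is a feasible inexact projection.

```python
import numpy as np

def frank_wolfe_projection(v, lo_oracle, x0, tol=1e-6, max_iters=1000):
    """Frank-Wolfe sketch for min_{x in C} 0.5*||x - v||^2 with an LO oracle."""
    x = x0.copy()
    for _ in range(max_iters):
        grad = x - v                                  # gradient of the quadratic objective
        z = lo_oracle(grad)                           # linear minimization step, as in (15)
        gap = float(np.dot(grad.ravel(), (x - z).ravel()))
        if gap <= tol:                                # certified: obj(x) - obj* <= gap <= tol
            break
        d = z - x
        alpha = -float(np.dot(grad.ravel(), d.ravel())) / float(np.dot(d.ravel(), d.ravel()))
        alpha = min(1.0, max(0.0, alpha))             # exact line search on [0, 1], as in (16)
        x = x + alpha * d
    return x                                          # feasible inexact projection of v onto C
```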
3 Inexact scaled gradient method
The aim of this section is to present an inexact version of the scaled gradient method (SGM), whose inexactness is in two distinct senses. First, we use a version of the inexactness scheme introduced in [13], and also a variation of the one that appeared in [64], to compute an inexact projection onto the feasible set allowing an appropriate relative error tolerance. Second, using the conceptual scheme for non-monotone line searches introduced in [43, 59], a step size is computed to define the next iterate. The statement of the conceptual algorithm is as follows.
- Step 0. Choose , , , and . Let , and set .
- Step 1. Choose positive real numbers and , and a positive definite matrix such that (17). Compute as any feasible inexact projection with respect to the norm of onto relative to with forcing parameter , i.e., (18). If , then stop declaring convergence.
- Step 2. Set . If (19) then , define the next iterate as (20) and go to Step 3. Otherwise, choose , set , and repeat test (19).
- Step 3. Take and choose satisfying (21). Set and go to Step 1.
Let us describe the main features of Algorithm 2. In Step 1, we first choose , , and . Then, by using some (inner) procedure, such as those specified in Section 2.1, we compute as any feasible inexact projection of onto the feasible set relative to the previous iterate with forcing parameter . If , then Lemma 9(ii) implies that is a solution of problem (1). Otherwise, and Lemma 9(i) implies that is a descent direction of at , i.e., . Hence, in Step 2, we employ a non-monotone line search with tolerance parameter to compute a step size , and the next iterate is computed as in (20). Finally, due to (19) and , we have . Therefore, the next tolerance parameter can be chosen satisfying (21) in Step 3, completing the iteration.
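The overall flow of Algorithm 2 can be summarized by the following sketch. It is a simplified instance rather than a transcription of the method: the scaling matrix is the identity, the trial step is held fixed, the non-monotonicity tolerance is kept at zero (the Armijo instance discussed below), and `inexact_proj` is a placeholder for any routine returning a feasible inexact projection as in Step 1; the parameter values are illustrative only.

```python
import numpy as np

def inexact_sgp(f, grad_f, inexact_proj, x0, alpha=1.0, sigma=1e-4, tau=0.5,
                gamma=0.8, tol=1e-6, max_iters=1000):
    """Monotone (Armijo) instance of the inexact scaled gradient projection loop."""
    x = x0.copy()
    for _ in range(max_iters):
        g = grad_f(x)
        w = inexact_proj(x - alpha * g, x, gamma)   # Step 1: feasible inexact projection
        d = w - x                                   # feasible descent direction
        if np.linalg.norm(d) <= tol:
            break                                   # stationarity declared (Step 1 stop test)
        fx = f(x)
        gtd = float(np.dot(g.ravel(), d.ravel()))   # negative, since d is a descent direction
        lam = 1.0
        for _ in range(50):                         # Step 2: backtracking line search
            if f(x + lam * d) <= fx + sigma * lam * gtd:
                break
            lam *= tau
        x = x + lam * d                             # Step 3 would also update the tolerance here
    return x
```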
It is worth mentioning that the conditions in (17) allow combining several strategies for choosing the step sizes and the matrices to accelerate the performance of the classical gradient method. Strategies for choosing the step sizes and the matrices have their origin in the study of the gradient method for unconstrained optimization; papers dealing with this issue include, but are not limited to, [7, 27, 29, 36, 69], see also [18, 25, 26, 49]. More details about selecting step sizes and matrices can be found in the recent review [17] and references therein.
Below, we present some particular instances of the parameter and the non-monotonicity tolerance parameter in Step 3.
1. Armijo line search
2. Max-type line search
The earliest non-monotone line search strategy was proposed in [45]. Let be an integer parameter. In an iteration , this strategy requires a step size satisfying
(22) where and . To simplify the notation, we define . In order to identify (22) as a particular instance of (19), we set
(23) Parameters and in (23) satisfy the corresponding conditions in Algorithm 2, i.e., and (with ) satisfy (21). In fact, the definition of implies that and hence . Due to , it follows from (19) that . Since , we conclude that . Hence, owing to , we obtain . Moreover, (21) is equivalent to , which in turn, taking into account that , is equivalent to the second inequality in (23). Thus, (22) is a particular instance of (19) with and defined in (23). Therefore, Algorithm 2 has as a particular instance the inexact projected version of the scaled gradient method employing the non-monotone line search (22). This version has been considered in [13]; see also [19, 65].
3. Average-type line search
Let us first recall the definition of the sequence of “cost updates” that characterizes the non-monotone line search proposed in [68]. Let , and . Choose and set
(24) Some algebraic manipulations show that the sequence defined in (24) is equivalent to
(25) Since (21) is equivalent to , it follows from (25) that letting and , Algorithm 2 becomes the inexact projected version of the scaled gradient method employing the non-monotone line search proposed in [68]. Finally, considering that and , the first equality in (24) implies that . In this case, due to , we can take in Step 3. For gradient projection methods employing the non-monotone Average-type line search see, for example, [6, 34, 66]. A small computational sketch of the Max-type and Average-type reference values is given after this list.
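As referenced above, the two non-monotone reference values can be tracked with a few lines of code. The sketch below is illustrative only: the class name and the default values of `M` and `eta` are assumptions, not values prescribed in this paper. With `mode='max'` the reference is the maximum of the most recent objective values, as in the Max-type search (22); with `mode='avg'` it is the weighted average of [68], updated by the recursion (24); both reduce to the monotone Armijo reference when `M = 1` or `eta = 0`.

```python
from collections import deque

class NonmonotoneReference:
    """Reference value for the Max-type and Average-type non-monotone line searches."""

    def __init__(self, f0, mode='max', M=10, eta=0.85):
        self.mode, self.eta = mode, eta
        self.history = deque([f0], maxlen=M)  # last objective values (Max-type)
        self.C, self.Q = f0, 1.0              # running average and its weight (Average-type)

    def reference(self):
        return max(self.history) if self.mode == 'max' else self.C

    def update(self, f_new):
        if self.mode == 'max':
            self.history.append(f_new)        # forget values older than M iterations
        else:
            Q_new = self.eta * self.Q + 1.0
            self.C = (self.eta * self.Q * self.C + f_new) / Q_new  # recursion (24)
            self.Q = Q_new
```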
4 Partial asymptotic convergence analysis
The goal of this section is to present a partial convergence result for the sequence generated by Algorithm 2, namely, we will prove that every cluster point of is stationary for problem (1). For that, we state a result that is contained in the proof of [43, Theorem 4].
Lemma 10.
There holds , for all . As a consequence, the sequence is non-increasing.
Next, we present our first convergence result. It is worth noting that, just as in the classical projected gradient method, we do not need to assume that has a bounded sub-level set.
Proposition 11.
Proof.
First, assume that is finite. In this case, according to Step 1, there exists such that , where , and . Therefore, applying Lemma 9(ii) with , and , we conclude that is stationary for problem (1). Now, assume that is infinite. Let be a cluster point of and be a subsequence of such that . Since is closed and , we have . Moreover, owing to , we have . Hence, considering that and Lemma 10 implies that is non-increasing, we conclude that . On the other hand, due to , where , Definition 2 implies
(26) |
Considering that and are bounded, , converges to and is continuous, the last inequality together with Remark 1 and (4) implies that is also bounded. Thus, we can assume without loss of generality that . In addition, taking into account that for all , applying Lemma 9(i) with , , and , we obtain that , for all . Therefore, (19) and (20) imply that
(27) |
Now, due to , for all , we can also assume without loss of generality that . Therefore, owing to and , taking the limit in (27) along the subsequences , and yields . We have two possibilities: or . If , then . Now, we assume that . In this case, for all large enough, there exists such that
(28) |
On the other hand, by the mean value theorem, there exists such that
Combining this equality with (28), and taking into account that , we have
for large enough. Since , it follows that . Then, dividing both sides of the above inequality by and taking limits as goes to , we conclude that . Hence, due to , we obtain . We recall that , for all , which taking limit as goes to yields . Hence, we also have . Therefore, for any of the two possibilities, or , we have . On the other hand, since and are bounded, we also assume without loss of generality that and . Thus, since Remark 1 implies that
and considering that , , , , taking limit in (26), we conclude that
where . Hence, Definition 2 implies that , where . Therefore, due to , we can apply the second sentence in Lemma 9(iii) with , and , to conclude that is stationary for problem (1). ∎
The tolerance parameter that controls the non-monotonicity of the line search must become smaller and smaller as the sequence tends to a stationary point. The next corollary presents a general condition for this property; its proof can be found in [43, Theorem 4].
Corollary 12.
If , then . Consequently, .
The Armijo and the non-monotone Average-type line searches discussed in Section 3 satisfy the assumption of Corollary 12, i.e., . However, for the non-monotone Max-type line search, we can only guarantee that . Hence, we cannot apply Corollary 12 to conclude that . In the next proposition, we deal with this case separately.
Proposition 13.
Proof.
First of all, note that , where and . Thus, applying Lemma 9(i) with , , and , we obtain
(29) |
On the other hand, due to , Lemma 10 implies that is non-increasing and . Hence, we have and, as a consequence, converges. Note that is an integer such that
(30) |
Since , (22) implies that
for all . In view of be convergent, for all , and taking into account that , the last inequality together (29) implies that
(31) |
We proceed to prove that . For that, set . First, we prove by induction that, for all , the following two equalities hold
(32) |
where we are considering . Assume that . Since , the first equality in (32) follows from (31). Hence, . Since is compact and is uniformly continuous on , we have , which again using that implies the second equality in (32). Assume that (32) holds for . Again, due to , (22) implies that
A similar argument to that used to obtain (31) yields . Thus, the first equality in (32) holds for , which implies . Again, the uniform continuity of on gives
which shows that the second equality in (32) holds for . From (30) and , we obtain . Thus, taking into account that
it follows from the first inequality in (32) that . Hence, due to being uniformly continuous on and being convergent, we conclude that
and, considering that , the desired result follows. ∎
Remark 7.
Let be bounded and be generated by Algorithm 2 with the non-monotone line search (22) with . Then, combining Propositions 11 and 13, we conclude that is either finite, terminating at a stationary point of problem (1), or infinite, in which case every cluster point of is stationary for problem (1). Therefore, we have an alternative proof of the result obtained in [13, Theorem 2.1].
5 Full asymptotic convergence and complexity analysis
The purpose of this section is twofold. We will first prove, under suitable assumptions, the full convergence of the sequence and then we will present iteration-complexity bounds for it. To this end, we need to be more restrictive both with respect to the inexact projection in (18) and to the tolerance parameter that controls the non-monotonicity of the line search used in (19). More precisely, we assume that in Step 1 of Algorithm 2:
- A1. For all , we take with .
It is worth recalling that, taking the parameter , it follows from Lemma 6 that . In addition, we also assume that in Step 2 of Algorithm 2:
- A2. For all , we take such that .
It follows from Corollary 12 that the Armijo and the non-monotone Average-type line searches discussed in Section 3 satisfy Assumption A2.
5.1 Full asymptotic convergence analysis
In this section, we prove the full convergence of the sequence satisfying A1 and A2. We begin by establishing a basic inequality for . To simplify the notation, we define the constant
(33) |
Lemma 14.
For each , there holds
(34) |
Proof.
We know that , for all and . Thus, using (20), we have
(35) |
On the other hand, since with , it follows from Definition 3, with , , , , , and , that
Hence, after some algebraic manipulations in the last inequality, we have
Combining the last inequality with (35), we conclude that
(36) |
Since and , we have . Thus, it follows from (36) that
Thus, considering that and taking into account (19), we conclude that
(37) |
for all . On the other hand, applying Lemma 9(iii) with , , , , and , we obtain , for all . Therefore, it follows from (19) and (20) that , for all . Hence, due to , we have
Therefore, (34) follows from the combination of the last inequality with (33) and (37). ∎
To proceed with the analysis of the behavior of the sequence , we define the following auxiliary set
Corollary 15.
Assume that is a convex function. If , then converges to a stationary point of problem (1).
Proof.
Let . Since is convex, we have , for all . Thus, , for all . Using Lemma 14 and taking into account that and , we obtain
Defining , we have , for all . On the other hand, summing with and using Corollary 12, we have
Hence, . Thus, it follows from Definition 1 that is quasi-Fejér convergent to with respect to the sequence . Since is nonempty, it follows from Theorem 2 that is bounded, and therefore it has cluster points. Let be a cluster point of and be a subsequence of such that . Considering that is continuous and , we have . On the other hand, Lemma 10 implies that is non-increasing. Thus, . Hence, , and Theorem 2 implies that converges to . The conclusion is obtained by using Proposition 11. ∎
Theorem 16.
If is a convex function and has no cluster points, then , , and .
Proof.
Since has no cluster points, then . Assume by contradiction that . Thus, there exists , such that for all . Therefore, . Using Corollary 15, we obtain that is convergent, contradicting that . Therefore, . Now, we claim that . If , the claim holds. Assume by contradiction that . Thus, there exists such that , for all . Hence, . Using Corollary 15, we have that is convergent, contradicting again and concluding the proof. ∎
Corollary 17.
If is a convex function and has at least one cluster point, then converges to a stationary point of problem (1).
Proof.
Theorem 18.
Assume that is a convex function and . Then, converges to an optimal solution of problem (1).
5.2 Iteration-complexity bound
In this section, we present some iteration-complexity bounds related to the sequence generated by Algorithm 2. For that, besides assuming A1 and A2, we also need the following assumption.
- A3. The gradient of is Lipschitz continuous with constant .
To simplify the notation, we define the following positive constant
(38) |
Lemma 19.
The step size in Algorithm 2 satisfies .
Proof.
First, we assume that . In this case, we have and the required inequality holds. Now, we assume that . Thus, it follows from (19) that there exists such that
(39) |
Considering that we are under assumption A3, we apply Lemma 1 to obtain
(40) |
Hence, the combination of (39) with (40) yields
(41) |
On the other hand, with , where . Thus, applying Lemma 9(i) with , , and , we obtain
Hence, considering that and , the last inequality implies
The combination of the last inequality with (41) yields
Thus, since , we obtain and the proof is concluded. ∎
Considering that , it follows from Lemma 9(ii) that if , then the point is stationary for problem (1). Since , the quantity can be seen as a measure of stationarity of the point . In the next theorem, we present an iteration-complexity bound for this quantity, which is a constrained inexact version of [43, Theorem 1].
Theorem 20.
Let be defined in (38). Then, for every , the following inequality holds
Proof.
Since with , where , applying Lemma 9(i) with , , and , and taking into account that and , we obtain
The definition of and (19) imply . The combination of the last two inequalities together with Lemma 19 yields
Hence, performing the sum of the above inequality for , we conclude that
which implies the desired result. ∎
The next theorem presents an iteration-complexity bound for the sequence when is convex.
Theorem 21.
Let be a convex function on . Then, for every , there holds
Proof.
Using the first inequality in (17) and Lemma 19, we have , for all . We also know from the convexity of that , for all . Thus, applying Lemma 14 with , after some algebraic manipulations, we conclude
Hence, performing the sum of the above inequality for , we obtain
Thus, , which implies the desired inequality. ∎
We end this section with some results regarding the number of function evaluations performed by Algorithm 2. Note that the computational cost associated with each (outer) iteration involves a gradient evaluation, the computation of an (inexact) projection, and evaluations of at different trial points. Thus, we must consider the function evaluations at the rejected trial points.
Lemma 22.
Let be the number of function evaluations after iterations of Algorithm 2. Then,
Proof.
Theorem 23.
Theorem 24.
Let be a convex function on . For a given , the number of function evaluations performed by Algorithm 2 is, at most,
to compute such that .
6 Numerical experiments
This section presents some numerical experiments in order to illustrate the potential advantages of considering inexact schemes in the SPG method. We will discuss inexactness associated with both the projection onto the feasible set and the line search procedure.
Given two matrices and , with , and , we consider the matrix function given by:
(42) |
which combines a least squares term with a Rosenbrock-type function. Throughout this section, stands for the -element of the matrix and denotes the Frobenius matrix norm, i.e., where the inner product is given by . The test problems consist of minimizing in (42) subject to two different feasible sets, as described below. We point out that interesting applications in many areas emerge as constrained least squares matrix problems, see [13] and references therein. In turn, the Rosenbrock term was added in order to make the problems more challenging.
- Problem I:
where is the cone of symmetric and diagonally dominant real matrices with positive diagonal, i.e.,
and are given matrices, and means that for all . The feasible set of Problem I was considered, for example, in the numerical tests of [13].
- Problem II:
It is easy to see that the feasible set of Problem I is a closed and convex set and the feasible set of Problem II is a compact and convex set. As discussed in Section 2.1.1, Dykstra's alternating algorithm and the Frank-Wolfe algorithm can be used to calculate inexact projections. The choice of the most appropriate method depends on the structure of the feasible set under consideration. For Problem I, we used Dykstra's algorithm described in [13]; see also [58]. In this case, , where
and the projection of a given onto consists of cycles of projections onto the convex sets . Here an iteration of Dykstra's algorithm should be understood as a complete cycle of projections onto all sets and onto the box . Recall that this scheme provides an inexact projection as in Definition 2. Now consider Problem II. It is well known that calculating an exact projection onto the spectrahedron (i.e., onto the feasible set of Problem II) requires a complete spectral decomposition, which can be prohibitive, especially in the large-scale case. In contrast, the computational cost of an iteration of the Frank-Wolfe algorithm described in Algorithm 1 is associated with an extreme eigenpair computation, see, for example, [48]. Unfortunately, despite its low cost per iteration, the Frank-Wolfe algorithm suffers from a slow convergence rate. Thus, we considered a variant of the Frank-Wolfe algorithm proposed in [4], which improves the convergence rate and the total time complexity of the classical Frank-Wolfe method. This algorithm, specialized for the projection problem over the spectrahedron, is carefully described in [1]. Without attempting to go into details, it replaces the top eigenpair computation in Frank-Wolfe with a top- (with ) eigenpair computation, where is an algorithmic parameter automatically selected. The total number of computed eigenpairs can be used to measure the computational effort to calculate projections. We recall that a Frank-Wolfe type scheme provides an inexact projection as in Definition 3.
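To illustrate the cost per iteration mentioned above, a linear optimization oracle over the spectrahedron can be written in a few lines (a minimal Python illustration, not the implementation used in the experiments). The minimum of a linear function over the set of symmetric positive semidefinite matrices with unit trace is attained at a rank-one matrix built from an eigenvector associated with the smallest eigenvalue of the cost matrix, so each oracle call amounts to one extreme eigenpair computation; the full `eigh` factorization below is used only for simplicity, whereas a large-scale code would rely on an iterative extreme-eigenpair solver.

```python
import numpy as np

def lo_oracle_spectrahedron(G):
    """Minimize <G, X> over the spectrahedron {X symmetric, X >= 0, trace(X) = 1}."""
    Gs = 0.5 * (G + G.T)                   # symmetrize for numerical safety
    eigvals, eigvecs = np.linalg.eigh(Gs)  # full decomposition, for illustration only
    u = eigvecs[:, 0]                      # unit eigenvector of the smallest eigenvalue
    return np.outer(u, u)                  # rank-one minimizer u u^T
```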
We notice that Problems I and II can be seen as particular instances of problem (1) in which the number of variables is . This means that they can be solved by using Algorithm 2. We are especially interested in the spectral gradient version [14] of the SPG method, which is often associated with large-scale problems [15]. For this, we implemented Algorithm 2 considering for all , and, for ,
where , , , and . We set , , , and . Parameter was chosen according to the line search used (see Section 3), while parameter depends on the inexact projection scheme considered.
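The spectral choice of the trial step is not reproduced above; for illustration, the following sketch shows the classical safeguarded Barzilai–Borwein step that underlies spectral gradient versions such as [14]. The function name, the BB formula used, and the safeguard bounds are assumptions made here, not necessarily the exact rule adopted in the experiments.

```python
import numpy as np

def spectral_steplength(x, x_prev, g, g_prev, alpha_min=1e-10, alpha_max=1e10):
    """Safeguarded Barzilai-Borwein (spectral) trial step <s,s>/<s,y>."""
    s = (x - x_prev).ravel()
    y = (g - g_prev).ravel()
    sty = float(np.dot(s, y))
    if sty <= 0.0:
        return alpha_max                    # non-positive curvature: take the largest step
    return min(alpha_max, max(alpha_min, float(np.dot(s, s)) / sty))
```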
In the line search scheme (Step 2 of Algorithm 2), if a step size is not accepted, then is calculated using one-dimensional quadratic interpolation employing the safeguard when the minimum of the one-dimensional quadratic lies outside , see, for example, [54, Section 3.5]. Concerning the stopping criterion, all runs were stopped at an iterate declaring convergence if
where is as in (18). Our codes are written in Matlab and are freely available at https://github.com/maxlemes/InexProj-SGM. All experiments were run on a macOS 10.15.7 with 3.7GHz Intel Core i5 processor and 8GB of RAM.
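Concerning the backtracking rule described in the previous paragraph, a minimal sketch of one-dimensional quadratic interpolation with a safeguard is as follows; the safeguard interval used here is an assumption, since the text only states that a safeguard is applied when the minimizer of the quadratic falls outside an acceptable range.

```python
def quadratic_interpolation_step(phi0, dphi0, lam, phi_lam, low=0.1, high=0.9):
    """New trial step from a rejected one by one-dimensional quadratic interpolation.

    phi0 = f(x_k), dphi0 = <grad f(x_k), d_k> < 0, and phi_lam = f(x_k + lam*d_k)
    for the rejected trial step lam; the result is safeguarded to [low*lam, high*lam].
    """
    denom = 2.0 * (phi_lam - phi0 - dphi0 * lam)
    if denom <= 0.0:
        return 0.5 * lam                      # interpolant not convex: fall back to bisection
    lam_new = -dphi0 * lam * lam / denom      # minimizer of the interpolating quadratic
    return min(max(lam_new, low * lam), high * lam)
```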
6.1 Influence of the inexact projection
We begin the numerical experiments by checking the influence of the forcing parameters that control the degree of inexactness of the projections on the performance of the SPG method. In this first battery of tests, we used the Armijo line search, see Section 3.
We generated 10 instances of Problem I using , , and . The matrices and were randomly generated with elements belonging to . We set and as in [13]. For each instance, the starting point was randomly generated with elements belonging to , then it was redefined as and its diagonal elements were again redefined as , ensuring a feasible starting point. Figure 1 shows the average number of iterations, the average number of Dykstra's iterations, and the average CPU time in seconds needed to reach the solution for different choices of , namely, , , , , , , , , , and for all . Remember that smaller values of imply more inexact projections. As expected, the number of iterations of the SPG method tended to increase as decreased, see Figure 1(a). On the other hand, the computational cost of an outer iteration (which can be measured by the number of Dykstra's iterations) tends to decrease when considering smaller values of . This suggests a trade-off, controlled by the parameter , between the number of iterations and the cost per iteration. Figure 1(b) shows that values of close to 0.8 showed better results, which is in line with the experiments reported in [13]. Finally, as can be seen in Figure 1(c), the CPU time was shown to be directly proportional to the number of Dykstra's iterations.
Figure 1: (a) average number of iterations, (b) average number of Dykstra's iterations, and (c) average CPU time, for different choices of the forcing parameter (Problem I).
Although Algorithm 2 is given only in terms of the parameter , we will directly consider the parameter for Problem II, in which inexact projections are computed according to Definition 3. We randomly generated 10 instances of Problem II with , , and . Matrices and were obtained similarly to Problem I. In turn, a starting point was randomly generated with elements in the interval , then it was redefined to be , resulting in a feasible initial guess. Figure 2 shows the average number of iterations, the average number of computed eigenpairs, and the average CPU time in seconds needed to reach the solution for different constant choices of ranging from to . Now, higher values of imply more inexact projections. Note that, for appropriate choices of , the adopted values of fulfill Assumption A1 of Section 5. Concerning the number of iterations, as can be seen in Figure 2(a), the SPG method was not very sensitive to the choice of the parameter . Hence, since higher values of imply cheaper iterations, the number of computed eigenpairs and the CPU time were shown to be inversely proportional to , see Figures 2(b)–(c). Thus, our experiments suggest that the best value for seems to be .
Figure 2: (a) average number of iterations, (b) average number of computed eigenpairs, and (c) average CPU time, for different choices of the forcing parameter (Problem II).
6.2 Influence of the line search scheme
The following experiments compare the performance of the SPG method with different strategies for computing the step sizes. We considered the Armijo, the Average-type, and the Max-type line searches discussed in Section 3. Based on our numerical experience, we employed the fixed value for the Average-type line search and for the Max-type line search. According to the results of the previous section, we used the fixed forcing parameters and to compute inexact projections for Problems I and II, respectively.
We randomly generated 100 instances of each problem as described in Section 6.1. The dimension of the problems and the parameter in (42) were also taken arbitrarily. For Problem I, we chose and , whereas for Problem II, we chose and . In both cases, we set . We compare the strategies with respect to the number of function evaluations, the number of (outer) iterations, the total computational effort to calculate projections (measured by the number of Dykstra's iterations and computed eigenpairs for Problems I and II, respectively), and the CPU time. The results are shown in Figures 3 and 4 for Problems I and II, respectively, using performance profiles [31].
For Problem I, with regard to the number of function evaluations, the SPG method with the Average-type line search was the most efficient among the tested strategies. Somewhat surprisingly, in this set of test problems, the Armijo strategy was better than the Max-type line search, see Figure 3(a). On the other hand, as can be seen in Figure 3(b), the Armijo strategy required fewer iterations than the non-monotone strategies. As expected, this was reflected in the number of Dykstra's iterations and the CPU time, see Figures 3(c)–(d). We can conclude that, with respect to the last two criteria, the Armijo and Average-type strategies had similar performances, both superior to the Max-type strategy.
Now, concerning Problem II, Figure 4 shows that the non-monotone strategies outperformed the Armijo strategy in all the comparative criteria considered. Again, the Average-type strategy seems to be superior to the Max-type strategy.
Figure 3: performance profiles for Problem I with respect to (a) number of function evaluations, (b) number of (outer) iterations, (c) number of Dykstra's iterations, and (d) CPU time.
Figure 4: performance profiles for Problem II with respect to (a) number of function evaluations, (b) number of (outer) iterations, (c) number of computed eigenpairs, and (d) CPU time.
From all the above experiments, we conclude that the non-monotone line searches tend to require fewer objective function evaluations. However, this does not necessarily mean computational savings, since there may be an increase in the number of iterations. In this case, optimal efficiency of the algorithm comes from a compromise between those two conflicting tendencies. Overall, the use of non-monotone line search techniques is mainly justified when the computational effort of an iteration is associated with the cost of evaluating the objective function.
7 Conclusions
In this paper, we study the SGP method for solving constrained convex optimization problems employing inexact projections onto the feasible set and a general non-monotone line search. We expect that this paper will contribute to the development of research in this field, mainly for solving large-scale problems in which the computational effort of an iteration is associated with the projections onto the feasible set and the cost of evaluating the objective function. Indeed, the idea of using inexactness in the projection as well as in the line search, instead of their exact counterparts, is particularly interesting from a computational point of view. In particular, it is noteworthy that the Frank-Wolfe method has a low computational cost per iteration, resulting in high computational performance on different classes of compact sets, see [37, 48]. An issue that deserves attention is the search for new efficient methods, such as the Frank-Wolfe and Dykstra methods, that generate inexact projections.
References
- [1] A. A. Aguiar, O. P. Ferreira, and L. F. Prudente. Inexact gradient projection method with relative error tolerance. arXiv preprint arXiv:2101.11146, 2021.
- [2] A. A. Aguiar, O. P. Ferreira, and L. F. Prudente. Subgradient method with feasible inexact projections for constrained convex optimization problems. Optimization, 0(0):1–23, 2021, https://doi.org/10.1080/02331934.2021.1902520.
- [3] M. Ahookhosh, K. Amini, and S. Bahrami. A class of nonmonotone Armijo-type line search method for unconstrained optimization. Optimization, 61(4):387–404, 2012.
- [4] Z. Allen-Zhu, E. Hazan, W. Hu, and Y. Li. Linear convergence of a Frank-Wolfe type algorithm over trace-norm balls. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 6192–6201, Red Hook, NY, USA, 2017. Curran Associates Inc.
- [5] R. Andreani, E. G. Birgin, J. M. Martínez, and J. Yuan. Spectral projected gradient and variable metric methods for optimization with linear inequalities. IMA J. Numer. Anal., 25(2):221–252, 04 2005, https://academic.oup.com/imajna/article-pdf/25/2/221/2090233/drh020.pdf.
- [6] A. Auslender, P. J. S. Silva, and M. Teboulle. Nonmonotone projected gradient methods based on barrier and Euclidean distances. Comput. Optim. Appl., 38(3):305–327, 2007.
- [7] J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA J. Numer. Anal., 8(1):141–148, 1988.
- [8] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer Publishing Company, Incorporated, 1st edition, 2011.
- [9] A. Beck and M. Teboulle. A conditional gradient method with linear rate of convergence for solving convex linear systems. Math. Methods Oper. Res., 59(2):235–247, 2004.
- [10] J. Y. Bello Cruz and L. R. Lucambio Pérez. Convergence of a projected gradient method variant for quasiconvex objectives. Nonlinear Anal., 73(9):2917–2922, 2010.
- [11] D. P. Bertsekas. On the Goldstein-Levitin-Polyak gradient projection method. IEEE Trans. Automat. Control, 21(2):174–184, 1976.
- [12] D. P. Bertsekas. Nonlinear programming. Athena Scientific Optimization and Computation Series. Athena Scientific, Belmont, MA, second edition, 1999.
- [13] E. G. Birgin, J. M. Martínez, and M. Raydan. Inexact spectral projected gradient methods on convex sets. IMA J. Numer. Anal., 23(4):539–559, 2003.
- [14] E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim., 10(4):1196–1211, 2000, https://doi.org/10.1137/S1052623497330963.
- [15] E. G. Birgin, J. M. Martínez, and M. Raydan. Spectral projected gradient methods: Review and perspectives. J. Stat. Softw., 60(3):1–21, 2014.
- [16] S. Bonettini, I. Loris, F. Porta, and M. Prato. Variable metric inexact line-search-based methods for nonsmooth optimization. SIAM J. Optim., 26(2):891–921, 2016.
- [17] S. Bonettini, F. Porta, M. Prato, S. Rebegoldi, V. Ruggiero, and L. Zanni. Recent advances in variable metric first-order methods. In Computational Methods for Inverse Problems in Imaging, pages 1–31. Springer, 2019.
- [18] S. Bonettini and M. Prato. New convergence results for the scaled gradient projection method. Inverse Problems, 31(9):095008, 20, 2015.
- [19] S. Bonettini, R. Zanella, and L. Zanni. A scaled gradient projection method for constrained image deblurring. Inverse Problems, 25(1):015002, 23, 2009.
- [20] L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Rev., 60(2):223–311, 2018.
- [21] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. Society for Industrial and Applied Mathematics, 1994, https://epubs.siam.org/doi/pdf/10.1137/1.9781611970777.
- [22] J. P. Boyle and R. L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In Advances in order restricted statistical inference (Iowa City, Iowa, 1985), volume 37 of Lect. Notes Stat., pages 28–47. Springer, Berlin, 1986.
- [23] P. L. Combettes and B. C. Vũ. Variable metric quasi-Fejér monotonicity. Nonlinear Anal., 78:17–31, 2013.
- [24] Y. H. Dai. On the nonmonotone line search. J. Optim. Theory Appl., 112(2):315–330, 2002.
- [25] Y.-H. Dai and R. Fletcher. Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming. Numer. Math., 100(1):21–47, 2005.
- [26] Y.-H. Dai and R. Fletcher. New algorithms for singly linearly constrained quadratic programs subject to lower and upper bounds. Math. Program., 106(3, Ser. A):403–421, 2006.
- [27] Y.-H. Dai, W. W. Hager, K. Schittkowski, and H. Zhang. The cyclic Barzilai-Borwein method for unconstrained optimization. IMA J. Numer. Anal., 26(3):604–627, 2006.
- [28] F. R. de Oliveira, O. P. Ferreira, and G. N. Silva. Newton’s method with feasible inexact projections for solving constrained generalized equations. Comput. Optim. Appl., 72(1):159–177, 2019.
- [29] D. di Serafino, V. Ruggiero, G. Toraldo, and L. Zanni. On the steplength selection in gradient methods for unconstrained optimization. Appl. Math. Comput., 318:176–195, 2018.
- [30] R. Díaz Millán, O. P. Ferreira, and L. F. Prudente. Alternating conditional gradient method for convex feasibility problems. arXiv e-prints, page arXiv:1912.04247, Dec 2019, 1912.04247.
- [31] E. D. Dolan and J. J. Moré. Benchmarking optimization software with performance profiles. Math. Program., 91(2):201–213, 2002.
- [32] R. L. Dykstra. An algorithm for restricted least squares regression. J. Amer. Statist. Assoc., 78(384):837–842, 1983.
- [33] J. Fan, L. Wang, and A. Yan. An inexact projected gradient method for sparsity-constrained quadratic measurements regression. Asia-Pac. J. Oper. Res., 36(2):1940008, 21, 2019.
- [34] N. S. Fazzio and M. L. Schuverdt. Convergence analysis of a nonmonotone projected gradient method for multiobjective optimization problems. Optim. Lett., 13(6):1365–1379, 2019.
- [35] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Topics Signal Process., 1(4):586–597, Dec 2007.
- [36] A. Friedlander, J. M. Martínez, B. Molina, and M. Raydan. Gradient method with retards and generalizations. SIAM J. Numer. Anal., 36(1):275–289, 1999.
- [37] D. Garber and E. Hazan. Faster rates for the Frank-Wolfe method over strongly-convex sets. Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, pages 541–549, 2015.
- [38] M. Golbabaee and M. E. Davies. Inexact gradient projection and fast data driven compressed sensing. IEEE Trans. Inform. Theory, 64(10):6707–6721, 2018.
- [39] A. A. Goldstein. Convex programming in Hilbert space. Bull. Amer. Math. Soc., 70:709–710, 1964.
- [40] P. Gong, K. Gai, and C. Zhang. Efficient euclidean projections via piecewise root finding and its application in gradient projection. Neurocomputing, 74(17):2754 – 2766, 2011.
- [41] D. S. Gonçalves, M. A. Gomes-Ruggiero, and C. Lavor. A projected gradient method for optimization over density matrices. Optim. Methods Softw., 31(2):328–341, 2016, https://doi.org/10.1080/10556788.2015.1082105.
- [42] D. S. Gonçalves, M. L. N. Gonçalves, and T. C. Menezes. Inexact variable metric method for convex-constrained optimization problems. Optimization, 0(0):1–19, 2021, https://doi.org/10.1080/02331934.2021.1887181.
- [43] G. N. Grapiglia and E. W. Sachs. On the worst-case evaluation complexity of non-monotone line search algorithms. Comput. Optim. Appl., 68(3):555–577, 2017.
- [44] G. N. Grapiglia and E. W. Sachs. A generalized worst-case complexity analysis for non-monotone line searches. Numer. Algorithms, 87(2):779–796, Jun 2021.
- [45] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal., 23(4):707–716, 1986.
- [46] N. J. Higham. Computing the nearest correlation matrix—a problem from finance. IMA J. Numer. Anal., 22(3):329–343, 2002.
- [47] A. N. Iusem. On the convergence properties of the projected gradient method for convex optimization. Comput. Appl. Math., 22(1):37–52, 2003.
- [48] M. Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 427–435, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR.
- [49] E. Levitin and B. Polyak. Constrained minimization methods. USSR Comput. Math. Math. Phys., 6(5):1 – 50, 1966.
- [50] G. Ma, Y. Hu, and H. Gao. An accelerated momentum based gradient projection method for image deblurring. In 2015 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), pages 1–4, 2015.
- [51] J. Mo, C. Liu, and S. Yan. A nonmonotone trust region method based on nonincreasing technique of weighted average of the successive function values. J. Comput. Appl. Math., 209(1):97–108, 2007.
- [52] J. J. Moré. On the performance of algorithms for large-scale bound constrained problems. In Large-scale numerical optimization (Ithaca, NY, 1989), pages 32–45. SIAM, Philadelphia, PA, 1990.
- [53] Y. Nesterov and A. Nemirovski. On first-order algorithms for $\ell_1$/nuclear norm minimization. Acta Numer., 22:509–575, 2013.
- [54] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.
- [55] E. R. Panier and A. L. Tits. Avoiding the Maratos effect by means of a nonmonotone line search. I. General constrained problems. SIAM J. Numer. Anal., 28(4):1183–1195, 1991.
- [56] A. Patrascu and I. Necoara. On the convergence of inexact projection primal first-order methods for convex minimization. IEEE Trans. Automat. Control, 63(10):3317–3329, 2018.
- [57] J. Rasch and A. Chambolle. Inexact first-order primal-dual algorithms. Comput. Optim. Appl., 76(2):381–430, 2020.
- [58] M. Raydan and P. Tarazaga. Primal and polar approach for computing the symmetric diagonally dominant projection. Numer. Linear Algebra Appl., 9(5):333–345, 2002, https://onlinelibrary.wiley.com/doi/pdf/10.1002/nla.277.
- [59] E. W. Sachs and S. M. Sachs. Nonmonotone line searches for optimization algorithms. Control Cybernet., 40(4):1059–1075, 2011.
- [60] S. Salzo and S. Villa. Inexact and accelerated proximal point algorithms. J. Convex Anal., 19(4):1167–1192, 2012.
- [61] S. Sra, S. Nowozin, and S. Wright. Optimization for Machine Learning. Neural information processing series. MIT Press, 2012.
- [62] J. Tang, M. Golbabaee, and M. E. Davies. Gradient projection iterative sketch for large-scale constrained least-squares. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3377–3386, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
- [63] P. L. Toint. An assessment of nonmonotone linesearch techniques for unconstrained optimization. SIAM J. Sci. Comput., 17(3):725–739, 1996.
- [64] S. Villa, S. Salzo, L. Baldassarre, and A. Verri. Accelerated and inexact forward-backward algorithms. SIAM J. Optim., 23(3):1607–1633, 2013.
- [65] C. Wang, Q. Liu, and X. Yang. Convergence properties of nonmonotone spectral projected gradient methods. J. Comput. Appl. Math., 182(1):51–66, 2005.
- [66] X. Yan, K. Wang, and H. He. On the convergence rate of scaled gradient projection method. Optimization, 67(9):1365–1376, 2018.
- [67] F. Zhang, H. Wang, J. Wang, and K. Yang. Inexact primal–dual gradient projection methods for nonlinear optimization on convex set. Optimization, 69(10):2339–2365, 2020, https://doi.org/10.1080/02331934.2019.1696338.
- [68] H. Zhang and W. W. Hager. A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim., 14(4):1043–1056, 2004.
- [69] B. Zhou, L. Gao, and Y.-H. Dai. Gradient methods with adaptive step-sizes. Comput. Optim. Appl., 35(1):69–86, 2006.