Max L.N. Gonçalves · Jefferson G. Melo
Institute of Mathematics and Statistics, Federal University of Goiás, Campus II, Caixa Postal 131, CEP 74001-970, Goiânia-GO, Brazil.
[email protected] and [email protected]
A Relative Inexact Proximal Gradient Method With an Explicit Linesearch
Abstract
This paper presents and investigates an inexact proximal gradient method for solving composite convex optimization problems whose objective function is the sum of a differentiable convex function with full domain and a nondifferentiable convex function. We introduce an explicit linesearch strategy that requires only a relative inexact solution of the proximal subproblem per iteration. We prove the convergence of the sequence generated by our scheme and establish its iteration complexity, considering both the functional values and a residual associated with first-order stationary solutions. Additionally, we provide numerical experiments to illustrate the practical efficacy of our method.
Keywords: Linesearch · Iteration complexity · Nonsmooth and convex optimization problem · Proximal gradient method · Relative error rule
MSC: 65K05 · 90C25 · 90C30
1 Introduction
In this paper, our focus is on addressing nonsmooth convex optimization problems characterized by the following formulation:
\[
\min_{x \in \mathcal{E}} \; F(x) := f(x) + g(x), \tag{1}
\]
where $\mathcal{E}$ represents a finite-dimensional Euclidean space, $g : \mathcal{E} \to (-\infty, +\infty]$ is a nonsmooth, proper, and lower semicontinuous convex function, and $f : \mathcal{E} \to \mathbb{R}$ is a continuously differentiable and convex function. Throughout the paper, we denote the optimal value and the solution set of problem (1) by $F^*$ and $S^*$, respectively. From now on, we assume that $S^* \neq \emptyset$.
Proximal gradient methods effectively solve optimization problems such as in (1). The main step in the proximal gradient method involves evaluating the proximal operator defined by:
\[
\operatorname{prox}_{\alpha g}(z) := \operatorname*{argmin}_{x \in \mathcal{E}} \left\{ g(x) + \frac{1}{2\alpha}\|x - z\|^2 \right\}, \qquad \alpha > 0, \tag{2}
\]
where the norm $\|\cdot\|$ is induced by the inner product $\langle \cdot, \cdot \rangle$ of $\mathcal{E}$ as $\|x\| = \sqrt{\langle x, x \rangle}$. The proximal operator is well known to be a firmly nonexpansive operator with full domain. These useful properties, together with the descent property of the gradient step, establish the foundation for the convergence and complexity analysis of proximal gradient iterations.
Proximal Gradient Method (PGM)
Let $x^0 \in \operatorname{dom} g$. For $k = 0, 1, \ldots$, compute $\alpha_k > 0$ and
\[
y^k = \operatorname{prox}_{\alpha_k g}\!\big(x^k - \alpha_k \nabla f(x^k)\big). \tag{3}
\]
Choose $\beta_k \in (0, 1]$ and compute
\[
x^{k+1} = x^k + \beta_k \, (y^k - x^k). \tag{4}
\]
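For concreteness, the following is a minimal Python sketch (not taken from the paper) of the PGM iteration (3)-(4) for the model instance $f(x) = \tfrac12\|Ax - b\|^2$, $g(x) = \lambda\|x\|_1$, in which the proximal operator (2) reduces to soft-thresholding; the constant choices $\alpha_k = 1/L$ and $\beta_k = 1$ are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(A, b, lam, iters=500):
    """Exact PGM (3)-(4) for min 0.5*||Ax - b||^2 + lam*||x||_1,
    with constant stepsizes alpha_k = 1/L and beta_k = 1 (illustrative choices)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    alpha = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                            # grad f(x)
        y = soft_threshold(x - alpha * grad, alpha * lam)   # step (3)
        x = x + 1.0 * (y - x)                               # step (4), beta_k = 1
    return x

# Small usage example on synthetic sparse-regression data
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat = proximal_gradient(A, b, lam=0.1)
```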
The coefficients $\alpha_k$ and $\beta_k$, referred to as stepsizes, can be determined by backtracking linesearch procedures. Such strategies are essential whenever the global $L$-Lipschitz continuity of $\nabla f$ fails or even when computing an acceptable upper bound for $L$ is challenging. This situation is often encountered in numerous applications; for instance, in inverse problems based on Banach norms Bredies:2009 ; Schuster:2012 or Bregman distances such as the Kullback-Leibler divergence Salzo:2014 ; Bonettini:2012 ; BelloCruz:2016a . Moreover, even when $L$ is known, linesearches may allow for longer steps toward the solution by using local information at every iteration.
There are several possible choices for these stepsizes, each impacting the algorithm's performance in different ways; see, for instance, Salzo:2017a ; BelloCruz:2016a ; BelloCruz:2016c ; Bello-Cruz:2021 . It is important to note that, in order to compute the stepsize $\alpha_k$ by a backtracking linesearch at each iteration $k$, the proximal operator may need to be evaluated multiple times within the procedure. Conversely, the stepsize $\beta_k$ can be selected by evaluating the proximal operator only once per iteration. In this context, we use the term explicit linesearch for a backtracking procedure that determines $\beta_k$ after setting $\alpha_k$ to a constant for all $k$. These explicit strategies are particularly advantageous in cases where evaluating the proximal operator is challenging. Indeed, the function $g$ is often complex enough that the corresponding proximal operator lacks an analytical solution, and an ad-hoc algorithm must then be employed to evaluate the proximal operator inexactly. For instance, consider $\mathcal{E} = \mathbb{R}^n$ and let $g$ be a penalty combining an $\ell_1$ term with pairwise maxima of the coordinates, a form encountered in sparse regression problems with automatic feature grouping, as discussed in Zhong:2012 . Similarly, in the context of matrix factorization, consider the CUR-like factorization optimization problem Barre:2022 ; JMLR:v12:mairal11a , where the goal is to approximate a matrix $A$ by $AXA$ with $X$ having sparse rows and columns. In this case, $g$ is given by
\[
g(X) = \lambda_{\mathrm{row}} \sum_{i} \|X_{i,:}\|_\infty + \lambda_{\mathrm{col}} \sum_{j} \|X_{:,j}\|_\infty,
\]
which will be considered in the numerical illustrations of this paper. For further examples and discussions, see BelloCruz:2017a ; Bello-Cruz:2023 ; Salzo:2012 ; Villa:2013 ; Schmidt:2011 ; Jiang:2012 ; Millan:2019 ; Barre:2022 . Of course, there are rare cases in which the exact analytical solution of the proximal operator is available, such as when $g$ is the $\ell_1$-norm or the indicator function of a simple closed convex set; see, for instance, Beck:2009 ; Combettes:2005 .
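To illustrate why a regularizer of this row/column type has no closed-form proximal operator, the snippet below evaluates a function of the form displayed above; the weights lam_row and lam_col are hypothetical values, not parameters used in the paper.

```python
import numpy as np

def cur_regularizer(X, lam_row=1.0, lam_col=1.0):
    """Row/column sup-norm regularizer:
    lam_row * sum_i ||X[i, :]||_inf + lam_col * sum_j ||X[:, j]||_inf.
    Each term alone has a simple proximal operator, but their sum couples rows
    and columns, so the proximal operator of the sum is not available in closed form."""
    row_part = np.abs(X).max(axis=1).sum()
    col_part = np.abs(X).max(axis=0).sum()
    return lam_row * row_part + lam_col * col_part

X = np.arange(12, dtype=float).reshape(3, 4)
print(cur_regularizer(X))
```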
Consequently, in practice, the evaluation of the proximal operator is often done inexactly. A basic inexactness criterion is to approximately evaluate the proximal operator using an exogenous sequence of error tolerances which, in general, must be summable in order to guarantee the convergence of the proximal iterations. Such a diminishing sequence is chosen a priori, without using any information that may become available along the iterations. Usually, the restrictive summability condition forces the solution of the proximal subproblem to be increasingly accurate, often more than necessary; see, for instance, Jiang:2012 ; Salzo:2012 ; Aujol:2015 ; Villa:2013 ; Schmidt:2011 . In the past two decades, relative error criteria have been considered as an effective and practical way of controlling the inexactness in solving proximal subproblems of several algorithms, including FISTA, ADMM, augmented Lagrangian, and Douglas-Rachford methods; refer to Eckstein:2013 ; Eckstein:2018 ; Alves:2020 ; Adona:2019 ; Adona:2020 for examples. Relative error criteria are often easy to verify in practice and have the advantage of exploiting the information available at a specific iteration, being, therefore, an interesting alternative to the aforementioned exogenous sequences.
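The toy snippet below contrasts the two philosophies: an exogenous summable tolerance prescribed a priori versus a relative tolerance tied to the progress observed at the current iterate. Both rules and all constants are illustrative assumptions and are not the criterion (11) introduced later.

```python
import numpy as np

def summable_tolerance(k, eps0=1.0):
    """Exogenous, a priori tolerance: summable in k (here eps0 / (k+1)^2),
    forcing the subproblem to be solved ever more accurately."""
    return eps0 / (k + 1) ** 2

def relative_tolerance(x, y, tau=0.1):
    """Relative tolerance: proportional to the observed progress ||y - x||^2,
    so it adapts to information available at the current iteration."""
    return tau * np.linalg.norm(y - x) ** 2

# The same subproblem error may pass the relative rule while failing the
# (increasingly demanding) summable rule at a late iteration.
x, y, err, k = np.array([1.0, 0.0]), np.array([0.6, 0.1]), 1e-3, 200
print(err <= summable_tolerance(k))        # False: 1e-3 > 1/201^2
print(err <= relative_tolerance(x, y))     # True:  1e-3 <= 0.1 * ||y - x||^2
```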
In this paper, we propose and analyze an inexact proximal gradient method for solving problem (1). We propose a novel relative inexactness criterion for solving the proximal subproblems which, in some sense, resembles the ideas of the relative error criteria introduced in Monteiro:2010 ; Solodov:1999a , but adds some new elements in order to control the objective function at the inexact solution. The proposed scheme requires only one inexact solution of the proximal subproblem per iteration, and the stepsizes are computed through a relaxed explicit linesearch procedure that takes into account the residuals obtained in the proximal subproblem and enables the iteration to address non-Lipschitz optimization problems effectively. We show that the sequence generated by our method converges to a solution of problem (1). Moreover, we establish its iteration complexity in terms of both the function values and the residuals associated with an approximate stationary solution. We also present some numerical experiments to illustrate the performance of the proposed method.
The paper is structured as follows: Section 2 presents definitions, basic facts, and auxiliary results. In Section 3, we introduce the inexact proximal gradient method with an explicit linesearch (IPG-ELS) and establish some of its fundamental properties. Section 4 establishes the iteration-complexity bounds of the IPG-ELS method in terms of functional values and a residual associated with the stationary condition for problem (1). Additionally, the full convergence of the sequence generated by the IPG-ELS method is discussed in this section. Some numerical experiments illustrating the behavior of the proposed scheme are reported in Section 5. Finally, concluding remarks are provided in Section 6.
2 Preliminaries
In this section, we present some preliminary material and notations that will be used throughout this paper.
Let $h : \mathcal{E} \to (-\infty, +\infty]$ be a proper, lower semicontinuous, and convex function. For a given $\varepsilon \geq 0$, the $\varepsilon$-subdifferential of $h$ at $x \in \operatorname{dom} h$, denoted by $\partial_\varepsilon h(x)$, is defined as
\[
\partial_\varepsilon h(x) := \left\{ v \in \mathcal{E} : h(y) \geq h(x) + \langle v, y - x\rangle - \varepsilon, \;\; \forall\, y \in \mathcal{E} \right\}. \tag{5}
\]
When $x \notin \operatorname{dom} h$, we define $\partial_\varepsilon h(x) := \emptyset$. Any element $v \in \partial_\varepsilon h(x)$ is called an $\varepsilon$-subgradient. If $\varepsilon = 0$, then $\partial_0 h(x)$ becomes the classical subdifferential of $h$ at $x$, denoted by $\partial h(x)$. It follows immediately from (5) that, for every $u \in \partial_{\varepsilon_1} h(x)$ and $v \in \partial_{\varepsilon_2} h(y)$,
\[
\langle u - v, x - y \rangle \geq -(\varepsilon_1 + \varepsilon_2). \tag{6}
\]
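As a small numerical illustration of definition (5), the following sketch checks the defining inequality of the $\varepsilon$-subdifferential on a grid for $h(x) = |x|$; the grid test is only for intuition and is, of course, not a proof of membership.

```python
import numpy as np

def is_eps_subgradient(h, x, v, eps, grid):
    """Check h(y) >= h(x) + v*(y - x) - eps for all y in the grid,
    i.e., the defining inequality (5) of the eps-subdifferential (1-D case)."""
    return all(h(y) >= h(x) + v * (y - x) - eps for y in grid)

h = abs                                   # h(x) = |x|
grid = np.linspace(-5, 5, 2001)
print(is_eps_subgradient(h, 1.0, 1.0, 0.0, grid))   # True:  1 is a subgradient of |.| at x = 1
print(is_eps_subgradient(h, 1.0, 0.5, 0.0, grid))   # False: 0.5 is not a (0-)subgradient at x = 1
print(is_eps_subgradient(h, 1.0, 0.5, 0.5, grid))   # True:  0.5 is a 0.5-subgradient at x = 1
```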
We present two useful auxiliary results for the $\varepsilon$-subdifferential. The first one is the closedness of its graph and the second is the so-called transportation formula; see Propositions 4.1.1 and 4.2.2 of Hiriart-Urruty:1993 .
Proposition 1 (Closed Graph Property)
Let $(x^k, v^k, \varepsilon_k)$ be a sequence converging to $(\bar x, \bar v, \bar\varepsilon)$ with $v^k \in \partial_{\varepsilon_k} h(x^k)$ for all $k$. Then, $\bar v \in \partial_{\bar\varepsilon} h(\bar x)$.
Proposition 2 (Transportation Formula)
With $x$ and $y$ in $\operatorname{dom} h$, let $v \in \partial h(x)$. Then $v \in \partial_{\varepsilon} h(y)$, where $\varepsilon := h(y) - h(x) - \langle v, y - x\rangle \geq 0$.
We now introduce a concept of approximate stationary solution to problem (1). First, note that $\bar x$ is a solution to problem (1) if and only if $0 \in \nabla f(\bar x) + \partial g(\bar x)$. The concept of approximate stationary solution relaxes this inclusion by introducing a residual $v$ and enlarging $\partial g$ using the $\varepsilon$-subdifferential $\partial_\varepsilon g$.
Definition 1 ($\rho$-Approximate Stationary Solution)
Given $\rho > 0$, an element $\bar x \in \operatorname{dom} g$ is said to be a $\rho$-approximate stationary solution to problem (1) with a residual pair $(v, \varepsilon)$ if
\[
v \in \nabla f(\bar x) + \partial_\varepsilon g(\bar x) \quad \text{and} \quad \max\{\|v\|, \varepsilon\} \leq \rho. \tag{7}
\]
Next, we recall the definition of quasi-Fejér convergence, which is an important and well-known tool for establishing full convergence of gradient and proximal point type methods.
Definition 2 (Quasi-Fejér Convergence)
Let $U$ be a nonempty subset of $\mathcal{E}$. A sequence $(x^k)$ is said to be quasi-Fejér convergent to $U$ if and only if, for every $y \in U$, there exists a summable sequence $(\delta_k) \subset [0, +\infty)$ such that
\[
\|x^{k+1} - y\|^2 \leq \|x^k - y\|^2 + \delta_k, \qquad \forall\, k \geq 0. \tag{8}
\]
The following result presents the main properties of quasi-Fejér convergent sequences; see Bauschke:1996 .
Lemma 1 (Quasi-Fejér Convergence Properties)
If $(x^k)$ is quasi-Fejér convergent to $U$, then the following statements hold:
(a) the sequence $(x^k)$ is bounded;
(b) if a cluster point $\bar x$ of $(x^k)$ belongs to $U$, then $(x^k)$ is convergent to $\bar x$.
We conclude the section with a basic inequality that will be used in the next sections.
Lemma 2
For any , we have .
3 Inexact Proximal Gradient Method
In this section, we introduce the inexact proximal gradient method with an explicit linesearch and establish some basic properties.
Recall that, given $x^k \in \operatorname{dom} g$ and $\alpha > 0$, the exact solution of the proximal gradient subproblem (3) with $\alpha_k = \alpha$ consists of finding $y^k$ such that
\[
y^k = \operatorname{prox}_{\alpha g}\!\big(x^k - \alpha \nabla f(x^k)\big).
\]
This proximal subproblem is equivalent to the following monotone inclusion problem:
\[
0 \in \partial g(y^k) + \nabla f(x^k) + \frac{1}{\alpha}\,(y^k - x^k). \tag{9}
\]
The $k$-th iteration of the inexact proximal gradient method, which incorporates an explicit linesearch as proposed below, begins by relaxing the above inclusion. This is achieved by finding an approximate solution $y^k$, along with a residual pair $(v^k, \varepsilon_k)$, satisfying a relaxed version of inclusion (9) that
adheres to a specific mixed relative error criterion. This criterion is designed to keep the residual pair small while also controlling the functional value at the approximate solution and the angle between the residual and the step, in addition to guiding a search direction that involves the intermediate iterate $y^k$ and the residual $v^k$. If the stopping criterion is not satisfied, we use a novel linesearch procedure, designed with the triple $(y^k, v^k, \varepsilon_k)$ in mind, to find a suitable stepsize $\beta_k$. This stepsize is used to move from the current point $x^k$ along this direction, generating the next point $x^{k+1}$.
In the following, we formally describe the proposed method, which combines a relative inexact computation of the proximal subproblem with an explicit linesearch.
Inexact Proximal Gradient Method with an Explicit Linesearch (IPG-ELS)
Initialization Step. Let $x^0 \in \operatorname{dom} g$, a stepsize $\alpha > 0$, a backtracking factor $\theta \in (0, 1)$, and the remaining tolerance parameters be given. Set $k = 0$.
Step 2 (Inexact Proximal Subproblem). Compute a triple $(y^k, v^k, \varepsilon_k)$ satisfying
(10)
(11)
Step 3 (Stopping Criterion). If the prescribed residual quantity vanishes, then stop.
Step 4 (Linesearch Procedure). Set $\beta = 1$. If
(12)
holds, then set $\beta_k = \beta$, define the next iterate $x^{k+1}$, set $k \leftarrow k + 1$, and go to Step 2. Otherwise, set $\beta \leftarrow \theta \beta$ and verify (12) again.
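The following Python sketch shows only the control flow of the method above; it is not the authors' implementation. The callables inexact_prox and accept are placeholders for the relative error criterion (10)-(11) and the linesearch test (12), respectively, and the demo at the end plugs in deliberately simple stand-ins just to exercise the loop.

```python
import numpy as np

def ipg_els_skeleton(x0, inexact_prox, accept, theta=0.5, tol=1e-8, max_iter=1000):
    """Schematic outer loop of an explicit-linesearch inexact proximal gradient method.

    inexact_prox(x): assumed to return a triple (y, v, eps), an approximate solution
        of the proximal subproblem with its residual pair, per criteria (10)-(11).
    accept(x, y, v, eps, beta): assumed to implement the linesearch test (12).
    Both are placeholders; the stopping test below is a schematic stand-in."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y, v, eps = inexact_prox(x)                       # Step 2
        if (np.linalg.norm(y - x) <= tol                  # Step 3 (schematic)
                and np.linalg.norm(v) <= tol and eps <= tol):
            return x
        beta = 1.0                                        # Step 4: backtrack on beta only;
        while not accept(x, y, v, eps, beta):             # the prox is NOT re-evaluated here
            beta *= theta                                 # assumed to terminate (cf. Lemma 5)
        # In the paper the direction also involves the residual; we simply move towards y.
        x = x + beta * (y - x)
    return x

# Tiny smoke test with an exact "prox" (g = 0) and an Armijo-type acceptance test,
# purely to exercise the control flow above (not the paper's criteria).
f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
demo_prox = lambda x: (x - 0.9 * grad(x), np.zeros_like(x), 0.0)
demo_accept = lambda x, y, v, eps, beta: (
    f(x + beta * (y - x)) <= f(x) + 1e-4 * beta * np.dot(grad(x), y - x)
)
print(ipg_els_skeleton(np.ones(3), demo_prox, demo_accept))
```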
Remark 1 (The Inexact Proximal Subproblem)
Note that the inexact proximal gradient subproblem (10) is equivalent to the exact proximal gradient subproblem (9) whenever the residual pair $(v^k, \varepsilon_k)$ is zero. Moreover, it is possible to always set $v^k$ to zero and still obtain a novel scheme that differs from previous ones based on summable error tolerances. Indeed, under this assumption, the inexact proximal subproblem (10)-(11) can be rewritten as:
(13)
(14)
It is worth noting that, for the above subproblem, no summability assumption is made on the tolerance sequence $(\varepsilon_k)$ to achieve the convergence and complexity results of the proposed method, as will be shown later. Furthermore, inclusion (10) and inequality (11) guarantee that the iterates remain in the domain of $g$. Additionally, if $g$ is the indicator function of a nonempty closed convex set $C$, then (13)-(14) can be fulfilled by finding a point $y^k \in C$ such that
(15)
For example, if $C$ is bounded, one can use the conditional gradient (CondG) method, a.k.a. the Frank-Wolfe method Frank:1956 ; Jaggi:2013 , to obtain such a point. Indeed, the $t$-th step of the CondG method, applied to solve the projection problem underlying (15), first finds a minimizer of a linear function over $C$ and then takes a convex combination of it with the current iterate. If the computed point satisfies the stopping criterion of the CondG method at some iteration, then (15) will hold.
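As an illustration of this remark, the sketch below applies the CondG (Frank-Wolfe) method to the projection-type problem min_{u in C} ½||u − z||² when C is an ℓ1-ball, whose linear minimization oracle is trivial; the ball radius, the 2/(t+2) stepsize, and the duality-gap stopping rule are standard illustrative choices, not specifics from the paper.

```python
import numpy as np

def lmo_l1_ball(grad, radius):
    """Linear minimization oracle: argmin_{u in C} <grad, u> over the l1-ball C."""
    i = np.argmax(np.abs(grad))
    u = np.zeros_like(grad)
    u[i] = -radius * np.sign(grad[i])
    return u

def condg_projection(z, radius, iters=200):
    """Conditional gradient (Frank-Wolfe) method for min_{u in C} 0.5*||u - z||^2,
    with C the l1-ball of the given radius and the standard 2/(t+2) stepsize."""
    u = np.zeros_like(z)                 # feasible starting point
    for t in range(iters):
        grad = u - z                     # gradient of the objective at u
        s = lmo_l1_ball(grad, radius)    # Frank-Wolfe vertex
        gap = np.dot(grad, u - s)        # duality gap; a natural stopping residual
        if gap <= 1e-10:
            break
        u = u + (2.0 / (t + 2.0)) * (s - u)
    return u

z = np.array([3.0, -0.5, 0.2])
print(condg_projection(z, radius=1.0))   # approximates the projection of z onto the l1-ball
```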
Remark 2 (The Explicit Linesearch)
Note that the explicit linesearch proposed in Step 4 of the IPG-ELS method is related to the one proposed in BelloCruz:2016a when the residual pair $(v^k, \varepsilon_k)$ is zero. Moreover, note that the linesearch does not require evaluating the inexact proximal subproblem inside the inner loop of Step 4.
We introduce a useful lemma that plays a crucial role in analyzing the stopping criterion of the IPG-ELS method.
Lemma 3 (Iteration Inequality Condition)
The following inequality holds for every iteration .
(16) |
Proof It follows from (10) that
Now using (5) together with the fact that , we have
which is equivalent to
(17) |
On the other hand, considering the definitions of and , we observe that (11) is equivalent to
which, combined with (17), yields the desired inequality (16). ∎
The following result demonstrates that the termination criterion of the IPG-ELS method, as specified in Step 3, is satisfied only when a solution to problem (1) is identified.
Lemma 4 (Stopping at a Solution)
The IPG-ELS method terminates at the -th iteration if and only if is a solution to problem (1).
Proof Assume that the IPG-ELS method stops at the -th iteration. In view of Step 3, we have . Hence, it follows from (16) that
Since the parameters involved are positive, we obtain that both residuals vanish. Hence, in view of (10), we conclude that $x^k$ is a solution of problem (1).
Assume now that $x^k$ is a solution of problem (1). Thus, it follows from (10) that the corresponding inclusion holds. So, the $\varepsilon$-monotonicity in (6) implies
which is equivalent to
(18) |
Hence, it follows from (16) that
Since , , , and , we conclude that and . Hence, from (11), we also have . Therefore, the IPG-ELS method stops at the -th iteration. ∎
In the following, we establish the well-definedness of the linesearch procedure in Step 4 of the IPG-ELS method.
Lemma 5 (Finite Linesearch Termination)
The linesearch procedure in Step 4 stops after a finite number of steps.
Proof In view of Step 3, we have . Assume, for the sake of contradiction, that the linesearch procedure does not terminate after a finite number of steps. Thus, for all , we have
or, equivalently,
Given that is differentiable, the left-hand side of the above inequality approaches zero as , leading to the conclusion that
This implies contradicting the assumption that . ∎
The subsequent analysis investigates the linesearch procedure introduced in Step 4 of the IPG-ELS method, assuming that the gradient of the function , , is Lipschitz continuous. We aim to establish an upper bound for the number of iterations required by the linesearch procedure in Step 4.
Lemma 6 (Lipschitz Condition and Linesearch Complexity)
Proof Since is -Lipschitz continuous, for any , we have
Hence, if , we conclude that
using Lemma 2 in the last inequality. Since and , we have that (12) holds, thereby proving the first statement of the lemma. The last statement follows from the first one, given that the natural number , defined in (19), satisfies . ∎
This lemma provides a sufficient condition ensuring that the stepsizes generated by our linesearch are bounded away from zero. Specifically, if $\nabla f$ is $L$-Lipschitz continuous, then the stepsizes $\beta_k$ produced through the linesearch (12) are guaranteed to be bounded below by a positive constant $\beta_{\min}$, i.e., $\beta_k \geq \beta_{\min}$ for all $k$. Moreover, it is possible to relax the global Lipschitz condition to a local one, such as $\nabla f$ being locally Lipschitz continuous around any solution of problem (1), as was done in Proposition 5.4(ii) of BelloCruz:2016a .
4 Convergence and Complexity Analyses of the IPG-ELS Method
In this section, we focus on analyzing the convergence properties of the IPG-ELS method. We establish the full convergence of the generated sequence and its iteration complexity in terms of functional values and a residual associated with an approximate stationary solution, as defined in Definition 1.
We begin this section by presenting a result that is fundamental for establishing the convergence and the iteration complexity of the IPG-ELS method.
Proposition 3 (Key Inequality for the IPG-ELS Method)
For every and , we have
(20) |
Additionally, the functional value sequence is decreasing and convergent, and .
Proof Let and . In view of the inexact relative inclusion (10), we have , and hence the definition of in (5) implies that
(21) |
Since is convex, we have . Adding the above two inequalities, using , and simplifying the resulting expression, we obtain
Combining the above inequality with the identity
we have
or, equivalently,
Since and is convex, we have . Hence, combining the last two inequalities and the fact that , we get
or, equivalently,
Now, using the linesearch procedure of Step 4, we obtain
It follows from the above inequality and (11) that
(22) |
On the other hand, using the identity and the strong convexity of , we have
which implies
Therefore, the proof of (20) follows by combining the latter inequality with (22). The last statement of the proposition follows immediately from (20) with that
(23) |
So, is decreasing and convergent because it is bounded from below by . Moreover, the last inequality implies that . ∎
Next, we establish the convergence rate of the functional values sequence .
Theorem 4.1 (Convergence Rate of the IPG-ELS Method)
Let $(x^k)$ and $(y^k)$ be generated by the IPG-ELS method. Assume that there exists $\beta_{\min} > 0$ such that $\beta_k \geq \beta_{\min}$ for all $k$. Then, for all $k \geq 1$, we have
(24)
Proof For any and , it follows from Proposition 3 with and that
Summing the above inequality over , we have
(25) |
Since in view of the last statement of Proposition 3, we have for any , it follows from (4) that
Since is arbitrary, the proof of (24) follows. ∎
Remark 3 (Complexity of -Approximate Solution)
Theorem 4.2 (Complexity of -Approximate Stationary Solution)
Consider , , , and generated by the IPG-ELS method and define , for every . Then, we have
(26) |
Additionally, if , and is Lipschitz continuous on , then, given a tolerance , the IPG-ELS method generates an -approximate stationary solution to problem (1) with residues , in the sense of Definition 1, in at most iterations.
Proof The first inclusion in (26) follows immediately from (10) and the definition of , whereas the second inclusion in (26) follows from the definitions of and .
Now let be the projection of onto and let . As it was observed in Lemma 6, the Lipschitz continuity of implies that there exists such that for all . It follows from Proposition 3 with and that
(27) |
for all . Summing the above inequality over , and using that , we have
Hence, since , we see that there exists such that
(28) |
On the other hand, since , it follows from (16) that, for every ,
where the last inequality is due to the Cauchy–Schwarz inequality and the fact that for any and , in particular, with , , and . Hence, we have
(29) |
Now, from the definition of , (29), the Cauchy-Schwarz inequality and the Lipschitz continuity of , we have, for every ,
(30) |
Moreover, it follows from (29) that
(31) |
Hence, it follows from (28), (4), and (31) with and that
which in turn implies that
(32) |
Thus, the last statement of the theorem follows from (32) and the first inclusion in (26). ∎
We end this section by establishing the full convergence of the sequence to a solution of problem (1). The proof is based on the quasi-Fejér convergence of the sequence to the set , as defined in Definition 2.
Theorem 4.3 (Convergence for the IPG-ELS Method)
The sequence generated by the IPG-ELS method converges to a point in .
Proof By employing Proposition 3 at , we have
(33) |
We set , and we will prove that is a summable sequence. In fact,
This together with (33) tells us that the sequence is quasi-Fejér convergent to via Definition 2. By Lemma 1(a), the sequence is bounded. Let be a cluster point of . Hence there exists a subsequence of converging to .
Now we proceed by considering the two possible cases:
Case 1. The sequence does not converge to , i.e., there exist some and a subsequence of (without relabeling) such that
(34) |
By using Proposition 3 with , we get
Summing for , the above inequality implies that
By taking and using the fact that and (34), we obtain that
which together with (34) establishes that Since is lower semicontinuous on , it follows from the last equality that
which yields and thus .
Case 2. . Define and
(35) |
where . It follows from the definition of the linesearch that
(36) |
It follows from the convexity of , the fact that , and the positiveness of the term that
which, together with (35), yields
Now it follows from Lemma 2 that
Hence,
which, due to the positiveness of , yields
(37) |
Note that which, combined with the last statement of Proposition 3, gives us that as . Since is continuous, we have as . From (37), we derive that
(38) |
therefore, and . Additionally, we can use (11) to show that . Thus, is also an accumulation point of the sequence , and without loss of generality, we can assume that as . Moreover, we have that
(39) |
Now, using (11), we obtain
Furthermore, since converges to as indicated by (38), by applying the triangle inequality we obtain
which implies, via (38) and (39), that also converges to . Consequently, the convergence of to and to , combined with the closedness of the graph of in Proposition 1, gives us that . This is equivalent to stating that .
In all the cases considered above, we have shown that $\bar x$, an accumulation point of the sequence $(x^k)$, belongs to $S^*$. Lemma 1(b) then implies that $(x^k)$ converges to an optimal solution in $S^*$. ∎
5 Numerical Experiments
In this section, we investigate the numerical behavior of the IPG-ELS method in solving the CUR-like factorization optimization problem JMLR:v12:mairal11a . Given a matrix $A \in \mathbb{R}^{n \times p}$, the objective is to find a matrix $X \in \mathbb{R}^{p \times n}$ with sparse rows and columns such that $AXA$ approximates $A$. This problem can be formulated as the following splitting optimization problem:
\[
\min_{X \in \mathbb{R}^{p \times n}} \; \frac{1}{2}\|AXA - A\|_F^2 + \lambda_{\mathrm{row}} \sum_{i} \|X_{i,:}\|_\infty + \lambda_{\mathrm{col}} \sum_{j} \|X_{:,j}\|_\infty,
\]
where $\|\cdot\|_F$ denotes the Frobenius norm, and $X_{i,:}$ and $X_{:,j}$ denote the $i$-th row and $j$-th column of $X$, respectively. This problem is a special case of problem (1) with
\[
f(X) = \frac{1}{2}\|AXA - A\|_F^2, \qquad g(X) = \lambda_{\mathrm{row}} \sum_{i} \|X_{i,:}\|_\infty + \lambda_{\mathrm{col}} \sum_{j} \|X_{:,j}\|_\infty.
\]
In this case, the gradient of $f$ is given by $\nabla f(X) = A^{\top}(AXA - A)A^{\top}$ and is Lipschitz continuous with a constant depending on $A$ (reported as Lips in the tables below). However, the proximal operator of $g$ does not have a closed-form solution. Following the approach in Schmidt:2011 (see also Barre:2022 ; Zhou:2022 ), a triple satisfying (10)-(11) can be obtained using the Dykstra-like algorithm (Bauschke:2008c , Theorem 3.3), applied to the proximal subproblem
\[
\min_{X \in \mathbb{R}^{p \times n}} \; g_1(X) + g_2(X) + \frac{1}{2\alpha}\|X - Z^k\|_F^2, \tag{40}
\]
where $g_1(X) := \lambda_{\mathrm{row}} \sum_{i} \|X_{i,:}\|_\infty$, $g_2(X) := \lambda_{\mathrm{col}} \sum_{j} \|X_{:,j}\|_\infty$, and $Z^k := X^k - \alpha \nabla f(X^k)$. Starting from suitable initial points, the Dykstra-like algorithm alternates proximal steps with respect to $g_1$ and $g_2$ and generates sequences
\[
\text{(41)}
\]
that converge to the exact solution of (40); the proximal operators of $g_1$ and $g_2$ appearing in (41) are separable across rows and columns, respectively, and can be evaluated in closed form. Using the definition of the iterates in (41), we obtain a subgradient inclusion for one of the two terms, which, combined with Proposition 2 (the transportation formula), yields an $\varepsilon$-subgradient inclusion at the current inner iterate. On the other hand, the definition of the remaining iterates in (41) yields an analogous inclusion for the other term. Combining these two inclusions with the definition of the residual pair, we obtain a candidate triple for the inexact criterion. Therefore, the Dykstra-like algorithm is stopped as soon as this triple satisfies the relative error criterion (10)-(11).
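A self-contained sketch of a Dykstra-like splitting of this kind is given below, assuming the splitting g = g1 + g2 into row and column ℓ∞ terms described above; the ℓ∞ proximal operators are evaluated through the Moreau decomposition (projection onto an ℓ1-ball). The parameters, the fixed inner iteration count, and the absorption of the stepsize α into the weights are illustrative simplifications, and the relative stopping test (11) used in the actual experiments is omitted.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of a vector v onto the l1-ball of the given radius."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(v) + 1) > (css - radius))[0][-1]
    tau = (css[k] - radius) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_inf_norm(v, lam):
    """prox of lam*||.||_inf via the Moreau decomposition: v - P_{lam * l1-ball}(v)."""
    return v - project_l1_ball(v, lam)

def prox_row_inf(X, lam):
    """prox of lam * sum_i ||X[i, :]||_inf (separable across rows)."""
    return np.vstack([prox_inf_norm(row, lam) for row in X])

def prox_col_inf(X, lam):
    """prox of lam * sum_j ||X[:, j]||_inf (separable across columns)."""
    return prox_row_inf(X.T, lam).T

def dykstra_prox(Z, lam_row, lam_col, iters=100):
    """Dykstra-like splitting for prox_{g1 + g2}(Z) with
    g1 = lam_row * (sum of row inf-norms) and g2 = lam_col * (sum of column inf-norms)."""
    X, P, Q = Z.copy(), np.zeros_like(Z), np.zeros_like(Z)
    for _ in range(iters):
        Y = prox_col_inf(X + P, lam_col)
        P = X + P - Y
        X = prox_row_inf(Y + Q, lam_row)
        Q = Y + Q - X
    return X

Z = np.arange(12, dtype=float).reshape(3, 4)
print(dykstra_prox(Z, lam_row=0.5, lam_col=0.5))
```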
Since the IPG-ELS method integrates two effective strategies, namely (i) permitting the subproblem to be solved inexactly under a relative approximation criterion, and (ii) employing an explicit linesearch procedure that computes the proximal operator only once per iteration, our primary goal is to demonstrate that, in certain cases, the proposed method surpasses proximal gradient variants that employ only one of these strategies. Consequently, we compare the new algorithm with two alternative schemes: an exact version of the IPG-ELS method, denoted by PG-ELS, which corresponds to IPG-ELS with an (essentially) exact solution of the proximal subproblem replacing the inexact criterion in (11); and an IPG method with a fixed stepsize, corresponding to (Millan:2019 , Algorithm 2), where the stepsize is determined by the Lipschitz constant $L$ of $\nabla f$. The latter algorithm is denoted by IPG-FixStep and is defined as
where and satisfy
The initialization parameters for the IPG-ELS method and the starting point were the same in all tests. The IPG-ELS method was executed for a fixed number of outer iterations to establish a baseline functional value, and the other two algorithms were terminated as soon as they reached this baseline. The algorithms were evaluated on six datasets from Cano:2005 ; Guyon:2004 ; Team:2008 : Colon tumor, heart disease, central nervous system (CNS), lung cancer-Michigan, Secom, and Cina0.
Each matrix was normalized to have unit Frobenius norm, with an additional step of centering each column. Subsequently, the resulting matrices were multiplied by a constant, which plays a crucial role in controlling the Lipschitz constant of the gradient of $f$. The experiments were conducted in Python on a machine equipped with a 3.5 GHz Dual-Core Intel Core i5 processor and 16 GB of 2400 MHz DDR4 memory.
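For reference, a small sketch of the preprocessing just described is given below; the ordering of the centering and scaling steps and the value of the multiplicative constant c are assumptions made for illustration.

```python
import numpy as np

def preprocess(A, c=1.0):
    """Center each column, scale the matrix to unit Frobenius norm,
    and multiply by a constant c that controls the Lipschitz constant
    of the gradient of the smooth part."""
    A = A - A.mean(axis=0, keepdims=True)   # center each column
    A = A / np.linalg.norm(A, 'fro')        # unit Frobenius norm
    return c * A

rng = np.random.default_rng(0)
A = preprocess(rng.standard_normal((60, 80)), c=10.0)
print(np.linalg.norm(A, 'fro'))             # equals c after preprocessing
```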
In Tables 1 and 2, we report the Lipschitz constant of the gradient of $f$ (denoted as Lips), the final objective value, the number of outer iterations (O-IT), the number of inner iterations (I-IT), the number of linesearch iterations (LS-IT), and the total running time in seconds (Time). The results indicate that, in terms of CPU time, the IPG-ELS method outperforms the other two methods. This efficiency can be attributed to two main factors: (i) the PG-ELS method requires a significantly higher number of inner iterations to solve the proximal subproblem “exactly”, and (ii) the IPG-FixStep method employs a small stepsize in the gradient step.
Table 1: Results for the Colon Tumor, Heart Disease, and CNS datasets.

Problem | Lips | Method | Obj. value | O-IT | I-IT | LS-IT | Time
---|---|---|---|---|---|---|---
Colon Tumor | 41.58 | IPG-ELS | 1.1056 | 101 | 195 | 318 | 28.88
 | | PG-ELS | 1.1056 | 104 | 751 | 324 | 36.19
 | | IPG-FixStep | 1.1056 | 1450 | 1451 | - | 171.72
Colon Tumor | 665.32 | IPG-ELS | 2.3647 | 101 | 101 | 823 | 32.74
 | | PG-ELS | 2.3623 | 128 | 455 | 1038 | 44.02
 | | IPG-FixStep | 2.3647 | 1103 | 1104 | - | 97.72
Colon Tumor | 5133.69 | IPG-ELS | 5.8989 | 101 | 101 | 1134 | 110.86
 | | PG-ELS | 5.8832 | 119 | 328 | 1324 | 116.26
 | | IPG-FixStep | 5.8981 | 586 | 587 | - | 138.92
Heart Disease | 77.12 | IPG-ELS | 0.1732 | 101 | 178 | 539 | 1.46
 | | PG-ELS | 0.1732 | 119 | 876 | 603 | 4.28
 | | IPG-FixStep | 0.1732 | 487 | 488 | - | 3.52
Heart Disease | 1233.99 | IPG-ELS | 0.3129 | 101 | 101 | 903 | 0.85
 | | PG-ELS | 0.3119 | 89 | 368 | 788 | 1.71
 | | IPG-FixStep | 0.3520 | 2001 | 2001 | - | 11.85
Heart Disease | 9521.59 | IPG-ELS | 0.4995 | 101 | 101 | 1222 | 0.99
 | | PG-ELS | 0.4992 | 227 | 622 | 2690 | 5.46
 | | IPG-FixStep | 0.7242 | 2001 | 2001 | - | 43.92
CNS | 41.95 | IPG-ELS | 0.9519 | 101 | 182 | 341 | 364.44
 | | PG-ELS | 0.9518 | 119 | 714 | 396 | 501.59
 | | IPG-FixStep | 0.9519 | 1217 | 1218 | - | 1675.05
CNS | 671.17 | IPG-ELS | 2.1153 | 101 | 101 | 768 | 544.91
 | | PG-ELS | 2.1119 | 148 | 725 | 1110 | 878.39
 | | IPG-FixStep | 2.2193 | 2001 | 2001 | - | 2749.17
CNS | 5178.80 | IPG-ELS | 6.0665 | 101 | 101 | 1130 | 1089.70
 | | PG-ELS | 6.0613 | 102 | 343 | 1138 | 1169.25
 | | IPG-FixStep | 6.0655 | 775 | 776 | - | 1677.93
Table 2: Results for the Lung cancer, Secom, and Cina0 datasets.

Problem | Lips | Method | Obj. value | O-IT | I-IT | LS-IT | Time
---|---|---|---|---|---|---|---
Lung cancer | 52.58 | IPG-ELS | 0.8985 | 101 | 179 | 443 | 422.07
 | | PG-ELS | 0.8984 | 105 | 837 | 457 | 521.46
 | | IPG-FixStep | 0.8985 | 678 | 679 | - | 962.80
Lung cancer | 841.22 | IPG-ELS | 2.7632 | 101 | 101 | 845 | 591.92
 | | PG-ELS | 2.7623 | 131 | 768 | 1085 | 851.75
 | | IPG-FixStep | 2.7631 | 588 | 589 | - | 838.30
Lung cancer | 2658.70 | IPG-ELS | 3.6391 | 101 | 101 | 992 | 696.61
 | | PG-ELS | 3.6378 | 161 | 740 | 1574 | 1173.05
 | | IPG-FixStep | 3.8819 | 2001 | 2001 | - | 2969.58
Secom | 45.78 | IPG-ELS | 0.6438 | 101 | 175 | 373 | 66.64
 | | PG-ELS | 0.6438 | 99 | 9247 | 360 | 694.81
 | | IPG-FixStep | 0.6438 | 857 | 858 | - | 225.29
Secom | 732.50 | IPG-ELS | 0.8587 | 101 | 101 | 779 | 82.81
 | | PG-ELS | 0.8586 | 102 | 4931 | 795 | 423.80
 | | IPG-FixStep | 0.8586 | 1662 | 1663 | - | 440.13
Secom | 5652.06 | IPG-ELS | 1.6981 | 101 | 101 | 1138 | 104.54
 | | PG-ELS | 1.6899 | 125 | 1346 | 1388 | 215.08
 | | IPG-FixStep | 1.6977 | 801 | 802 | - | 211.93
Cina0 | 68.39 | IPG-ELS | 0.7972 | 101 | 245 | 487 | 3041.60
 | | PG-ELS | 0.7972 | 104 | 1490 | 483 | 5005.84
 | | IPG-FixStep | 0.7972 | 608 | 790 | - | 5530.44
Cina0 | 527.70 | IPG-ELS | 1.2817 | 101 | 250 | 838 | 3925.72
 | | PG-ELS | 1.2817 | 94 | 1789 | 693 | 5712.33
 | | IPG-FixStep | 1.2817 | 1010 | 1011 | - | 8741.06
Cina0 | 8443.20 | IPG-ELS | 3.6531 | 101 | 104 | 1126 | 3791.25
 | | PG-ELS | 3.6530 | 284 | 3083 | 3061 | 15338.18
 | | IPG-FixStep | 3.8493 | 2001 | 2001 | - | 17104.23
6 Conclusions
In this work, we present an inexact proximal gradient method for solving composite convex optimization problems. The method features a novel explicit linesearch combined with a relative-error inexact solution of the proximal subproblem. Our approach is primarily designed for splitting problems in which the objective function is the sum of a differentiable and a nondifferentiable convex function and the analytical computation of the proximal operator is not available. Notably, the convergence of the proposed method is established without assuming Lipschitz continuity of the gradient of the smooth function. The method addresses the need for a balance between computational efficiency and the accuracy of solving the proximal subproblem, a common challenge in practice.
We have confirmed the convergence and iteration complexity of our method, validating its theoretical soundness and practical utility. Numerical experiments demonstrate its applicability and efficiency. Our method maintains convergence rates while efficiently managing relative inexact solutions of the proximal operator. The numerical results indicate that the proposed method competes effectively with both the exact proximal gradient method and the inexact proximal gradient method with a fixed stepsize.
Acknowledgements.
YBC was partially supported by the NSF Grant DMS-2307328 and by an internal grant from NIU. MLNG was partially supported by CNPq Grants 405349/2021-1 and 304133/2021-3. JGM was partially supported by CNPq Grant 312223/2022-6.

References
- (1) Adona, V. A., Gonçalves, M. L. N., and Melo, J. G. A Partially Inexact Proximal Alternating Direction Method of Multipliers and Its Iteration-Complexity Analysis. Journal of Optimization Theory and Applications 182, 2 (Aug. 2019), 640–666.
- (2) Adona, V. A., Gonçalves, M. L. N., and Melo, J. G. An inexact proximal generalized alternating direction method of multipliers. Computational Optimization and Applications 76, 3 (July 2020), 621–647.
- (3) Alves, M. M., Eckstein, J., Geremia, M., and Melo, J. G. Relative-error inertial-relaxed inexact versions of Douglas-Rachford and ADMM splitting algorithms. Computational Optimization and Applications 75, 2 (Mar. 2020), 389–422.
- (4) Aujol, J.-F., and Dossal, Ch. Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM Journal on Optimization 25, 4 (2015), 2408–2433.
- (5) Barré, M., Taylor, A., and Bach, F. A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives. Open Journal of Mathematical Optimization 3 (2022), 1–15.
- (6) Bauschke, H. H., and Borwein, J. M. On Projection Algorithms for Solving Convex Feasibility Problems. SIAM Review 38, 3 (1996), 367–426.
- (7) Bauschke, H. H., and Combettes, P. L. A dykstra-like algorithm for two monotone operators. Pacific J. Optim. 4, 3 (2008), 382–391.
- (8) Beck, A., and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2, 1 (2009), 183–202.
- (9) Bello Cruz, J. Y. On Proximal Subgradient Splitting Method for Minimizing the sum of two Nonsmooth Convex Functions. Set-Valued and Variational Analysis 25, 2 (June 2017), 245–263.
- (10) Bello Cruz, J. Y., and de Oliveira, W. On weak and strong convergence of the projected gradient method for convex optimization in real Hilbert spaces. Numerical Functional Analysis and Optimization. An International Journal 37, 2 (2016), 129–144.
- (11) Bello Cruz, J. Y., and Nghia, T. T. A. On the convergence of the forward–backward splitting method with linesearches. Optimization Methods & Software 31, 6 (2016), 1209–1238.
- (12) Bello-Cruz, Y., Gonçalves, M. L. N., and Krislock, N. On FISTA with a relative error rule. Computational Optimization and Applications 84, 2 (Mar. 2023), 295–318.
- (13) Bello-Cruz, Y., Li, G., and Nghia, T. T. A. On the Linear Convergence of Forward–Backward Splitting Method: Part I—Convergence Analysis. Journal of Optimization Theory and Applications 188, 2 (Feb. 2021), 378–401.
- (14) Bonettini, S., and Ruggiero, V. On the convergence of primal-dual hybrid gradient algorithms for total variation image restoration. Journal of Mathematical Imaging and Vision 44, 3 (2012), 236–253.
- (15) Bredies, K. A forward-backward splitting algorithm for the minimization of non-smooth convex functionals in Banach space. Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data 25, 1 (2009), 015005, 20.
- (16) Cano, A., Masegosa, A., and Moral, S. ELVIRA biomedical data set repository. https://leo.ugr.es/elvira/DBCRepository/, 2005.
- (17) Combettes, P. L., and Wajs, V. R. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation 4, 4 (2005), 1168–1200.
- (18) Eckstein, J., and Silva, P. J. S. A practical relative error criterion for augmented Lagrangians. Mathematical Programming 141, 1 (Oct. 2013), 319–348.
- (19) Eckstein, J., and Yao, W. Approximate ADMM algorithms derived from Lagrangian splitting. Computational Optimization and Applications 68, 2 (Nov. 2017), 363–405.
- (20) Eckstein, J., and Yao, W. Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Mathematical Programming 170, 2 (Aug. 2018), 417–444.
- (21) Frank, M., and Wolfe, P. An algorithm for quadratic programming. Naval Research Logistics Quarterly 3, 1-2 (1956), 95–110.
- (22) Guyon, I. UCI machine learning repository, 2004.
- (23) Hiriart-Urruty, J.-B., and Lemaréchal, C. Convex Analysis and Minimization Algorithms II, vol. 306 of Grundlehren Der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1993.
- (24) Jaggi, M. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In Proceedings of the 30th International Conference on Machine Learning (Feb. 2013), PMLR, pp. 427–435.
- (25) Jiang, K., Sun, D., and Toh, K.-c. An Inexact Accelerated Proximal Gradient Method for Large Scale Linearly Constrained Convex SDP. SIAM Journal on Optimization 22, 3 (Jan. 2012), 1042–1064.
- (26) Mairal, J., Jenatton, R., Obozinski, G., and Bach, F. Convex and network flow optimization for structured sparsity. Journal of Machine Learning Research 12, 81 (2011), 2681–2720.
- (27) Millán, R. D., and Machado, M. P. Inexact proximal ε-subgradient methods for composite convex optimization problems. Journal of Global Optimization 75, 4 (Dec. 2019), 1029–1060.
- (28) Monteiro, R. D. C., and Svaiter, B. F. On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM Journal on Optimization 20, 6 (2010), 2755–2787.
- (29) Salzo, S. The variable metric forward-backward splitting algorithm under mild differentiability assumptions. SIAM Journal on Optimization 27, 4 (2017), 2153–2181.
- (30) Salzo, S., Masecchia, S., Verri, A., and Barla, A. Alternating proximal regularized dictionary learning. Neural Computation 26, 12 (Dec. 2014), 2855–2895.
- (31) Salzo, S., and Villa, S. Inexact and accelerated proximal point algorithms. J. Convex Anal. 19, 4 (2012), 1167–1192.
- (32) Schmidt, M., Roux, N. L., and Bach, F. Convergence rates of inexact proximal-gradient methods for convex optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems (Red Hook, NY, USA, Dec. 2011), NIPS’11, Curran Associates Inc., pp. 1458–1466.
- (33) Schuster, T., Kaltenbacher, B., Hofmann, B., and Kazimierski, K. S. Regularization Methods in Banach Spaces. In Regularization Methods in Banach Spaces. De Gruyter, July 2012.
- (34) Solodov, M. V., and Svaiter, B. F. A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Analysis 7, 4 (1999), 323–345.
- (35) Team, C. A marketing dataset. Available at http://www.causality.inf.ethz.ch/data/CINA.html, 2008.
- (36) Villa, S., Salzo, S., Baldassarre, L., and Verri, A. Accelerated and inexact forward-backward algorithms. SIAM Journal on Optimization 23, 3 (2013), 1607–1633.
- (37) Zhong, L. W., and Kwok, J. T. Efficient Sparse Modeling With Automatic Feature Grouping. IEEE Transactions on Neural Networks and Learning Systems 23, 9 (Sept. 2012), 1436–1447.
- (38) Zhou, Q., and Pan, S. J. Convergence Analysis of Linear Coupling with Inexact Proximal Operator. In Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence (Aug. 2022), PMLR, pp. 2394–2403.