
Yunier Bello-Cruz, Cassandra Mohr
Department of Mathematical Sciences, Northern Illinois University. Watson Hall 330, DeKalb, IL, USA - 60115.
[email protected] and [email protected].

Max L.N. Gonçalves, Jefferson G. Melo
Institute of Mathematics and Statistics, Federal University of Goias, Campus II - Caixa Postal 131, CEP 74001-970, Goiânia-GO, Brazil.
[email protected] and [email protected]

A Relative Inexact Proximal Gradient Method With an Explicit Linesearch

Yunier Bello-Cruz    Max L.N. Gonçalves    Jefferson G. Melo    Cassandra Mohr
(Received: date / Accepted: date)
Abstract

This paper presents and investigates an inexact proximal gradient method for solving composite convex optimization problems whose objective function is the sum of a differentiable convex function with full domain and a non-differentiable convex function. We introduce an explicit linesearch strategy that requires only a relative inexact solution of the proximal subproblem per iteration. We prove the convergence of the sequence generated by our scheme and establish its iteration complexity, considering both the functional values and a residual associated with first-order stationary solutions. Additionally, we provide numerical experiments to illustrate the practical efficacy of our method.

Keywords:
Linesearch · Iteration complexity · Nonsmooth and convex optimization problem · Proximal gradient method · Relative error rule
MSC:
65K05 90C25 90C30

1 Introduction

In this paper, our focus is on addressing nonsmooth convex optimization problems characterized by the following formulation:

minF(x):=f(x)+g(x)subject tox𝔼,\min\,F(x):=f(x)+g(x)\quad\text{subject to}\quad x\in\mathbbm{E}, (1)

where 𝔼\mathbbm{E} represents a finite-dimensional Euclidean space. The function g:𝔼¯:={+}g:\mathbbm{E}\to\overline{\mathbbm{R}}:=\mathbbm{R}\cup\{+\infty\} is a nonsmooth, proper, and lower semicontinuous convex function. The function f:𝔼f:\mathbbm{E}\to\mathbbm{R} is a continuously differentiable and convex function. Throughout the paper, we denote the optimal value and solution set of problem (1) by FF_{*} and SS_{*}, respectively. From now on, we assume that SS_{*}\neq\emptyset.

Proximal gradient methods effectively solve optimization problems such as in (1). The main step in the proximal gradient method involves evaluating the proximal operator 𝐩𝐫𝐨𝐱g:𝔼𝐝𝐨𝐦g:={x𝔼g(x)<+}{\bf prox}_{g}:\mathbbm{E}\to{\bf dom\,}g:=\{x\in\mathbbm{E}\mid g(x)<+\infty\} defined by:

𝐩𝐫𝐨𝐱g(y):=argminx𝔼{g(x)+12xy2},{\bf prox}_{g}(y):=\underset{x\in\mathbbm{E}}{\arg\min}\;\left\{g(x)+\frac{1}{2}\left\|x-y\right\|^{2}\right\}, (2)

where the norm, \|\cdot\|, is induced by the inner product of 𝔼\mathbbm{E}, ,\langle\cdot,\cdot\rangle, as :=,\|\cdot\|:=\sqrt{\langle\cdot,\cdot\rangle}. The proximal operator is well known to be a full-domain, firmly nonexpansive operator. These useful properties, together with the descent property of the gradient step, establish the foundation for the convergence and complexity analyses of proximal gradient iterations.
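For instance, when g=\lambda\|\cdot\|_{1} with \lambda>0, the proximal operator reduces to the componentwise soft-thresholding map. The following minimal NumPy sketch (illustrative only; the function name and test values are ours) evaluates it:

```python
import numpy as np

def prox_l1(y, lam):
    """Proximal operator of lam*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Small usage example with lam = 0.5
y = np.array([1.2, -0.3, 0.7])
print(prox_l1(y, 0.5))  # approximately [0.7, 0., 0.2]
```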

Proximal Gradient Method (PGM) Let x0𝐝𝐨𝐦gx_{0}\in{\bf dom\,}g. Compute γk>0\gamma_{k}>0 and x¯k:=𝐩𝐫𝐨𝐱γkg(xkγkf(xk)).\bar{x}_{k}:={\bf prox}_{\gamma_{k}g}\left(x_{k}-\gamma_{k}\nabla f(x_{k})\right). (3) Choose βk(0,1]\beta_{k}\in(0,1] and compute xk+1:=xk+βk(x¯kxk).x_{k+1}:=x_{k}+\beta_{k}(\bar{x}_{k}-x_{k}). (4)
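For concreteness, the following Python sketch implements PGM as stated above for the case where the proximal operator of γg\gamma g is available in closed form; the constant stepsizes gamma and beta, the iteration count, and the small ISTA-style usage example are illustrative assumptions (the paper instead discusses linesearch strategies for choosing the stepsizes).

```python
import numpy as np

def proximal_gradient(x0, grad_f, prox_g, gamma=1.0, beta=1.0, max_iter=100):
    """Minimal PGM sketch: x_bar = prox_{gamma g}(x - gamma*grad_f(x)); x <- x + beta*(x_bar - x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_bar = prox_g(x - gamma * grad_f(x), gamma)  # proximal step (3)
        x = x + beta * (x_bar - x)                    # relaxation step (4)
    return x

# Illustrative use with f(x) = 0.5*||A x - b||^2 and g(x) = lam*||x||_1 (an ISTA-type iteration)
A = np.array([[1.0, 0.0], [0.0, 2.0]]); b = np.array([1.0, 1.0]); lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda z, gamma: np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)
x_approx = proximal_gradient(np.zeros(2), grad_f, prox_g, gamma=0.2)
```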

The coefficients γk\gamma_{k} and βk\beta_{k}, referred to as stepsizes, can be determined based on backtracking linesearch procedures. Such strategies are essential whenever the global LL-Lipschitz continuity of f\nabla f fails or even when computing an acceptable upper bound for LL is challenging. This situation is often encountered in numerous applications; for instance, in inverse problems based on Banach norms Bredies:2009 ; Schuster:2012 or Bregman distances such as the Kullback-Leibler divergence Salzo:2014 ; Bonettini:2012 ; BelloCruz:2016a . Moreover, even when LL is known, linesearches may allow for longer steps toward the solution by using local information at every iteration.

There are several possible choices for these stepsizes, each impacting the algorithm’s performance in different ways; see, for instance, Salzo:2017a ; BelloCruz:2016a ; BelloCruz:2016c ; Bello-Cruz:2021 . It is important to note that in order to compute the stepsize γk\gamma_{k} using a backtracking linesearch at each iteration kk, the proximal operator may need to be evaluated multiple times within the procedure. Conversely, the stepsize βk\beta_{k} can be selected by evaluating the proximal operator only once per iteration. In this context, we use the term explicit linesearch for a backtracking procedure that determines βk\beta_{k} after setting γk\gamma_{k} as a constant for all kk. These explicit strategies are particularly advantageous in cases where evaluating the proximal operator is challenging. The function gg is often complex enough that the corresponding proximal operator lacks an analytical solution. In such cases, an ad-hoc algorithm must be employed to evaluate the proximal operator inexactly. For instance, consider 𝔼=n\mathbb{E}=\mathbbm{R}^{n}, and let g:ng:\mathbbm{R}^{n}\to\mathbbm{R} be defined as

g(x)=x1+λi<jmax{|xi|,|xj|},g(x)=\|x\|_{1}+\lambda\sum_{i<j}\max\{|x_{i}|,|x_{j}|\},

with λ>0\lambda>0, a form encountered in sparse regression problems with grouping, as discussed in Zhong:2012 . Similarly, in the context of matrix factorization, consider 𝔼=n×m\mathbb{E}=\mathbbm{R}^{n\times m} for the CUR-like factorization optimization problem Barre:2022 ; JMLR:v12:mairal11a , where the goal is to approximate a matrix Wm×nW\in\mathbbm{R}^{m\times n} with Xn×mX\in\mathbbm{R}^{n\times m} having sparse rows and columns. In this case, g:n×mg:\mathbbm{R}^{n\times m}\to\mathbbm{R} is given by

g(X)=λrowi=1nX(i)2+λcolj=1mX(j)2,g(X)=\lambda_{\text{row}}\sum_{i=1}^{n}\|X^{(i)}\|_{2}+\lambda_{\text{col}}\sum_{j=1}^{m}\|X_{(j)}\|_{2},

which will be considered in the numerical illustrations of this paper. For further examples and discussions, see BelloCruz:2017a ; Bello-Cruz:2023 ; Salzo:2012 ; Villa:2013 ; Schmidt:2011 ; Jiang:2012 ; Millan:2019 ; Barre:2022 . Of course, there are rare cases when the exact analytical solution of the proximal operator is available, such as when g:ng:\mathbbm{R}^{n}\to\mathbbm{R} is given by g(x)=x1g(x)=\|x\|_{1} or when gg is the indicator function of a simple closed convex set; see, for instance, Beck:2009 ; Combettes:2005 .

Consequently, in practice, the evaluation of the proximal operator is often done inexactly. A basic inexactness criterion is to approximately evaluate the proximal operator using an exogenous sequence of error tolerances which, in general, must be summable in order to guarantee the convergence of the proximal iterations. Such a diminishing sequence is a priori chosen without using any information that may be available along the iterations. Usually, the restrictive summability condition forces the solution of the proximal subproblem to be increasingly accurate, often more than necessary; see, for instance, Jiang:2012 ; Salzo:2012 ; Aujol:2015 ; Villa:2013 ; Schmidt:2011 . In the past two decades, relative error criteria have been considered as an effective and practical way of controlling the inexactness in solving proximal subproblems of several algorithms, including FISTA, ADMM, augmented Lagrangian, and Douglas-Rachford methods; refer to Eckstein:2013 ; Eckstein:2018 ; Alves:2020 ; Adona:2019 ; Adona:2020 for examples. Relative error criteria are often easy to verify in practice and have the advantage of exploiting the information available at a specific iteration, being, therefore, an interesting alternative to the aforementioned exogenous sequences.

In this paper, we propose and analyze an inexact proximal gradient method for solving problem (1). We propose a novel relative inexactness criterion for solving the proximal subproblems, which in some sense resembles the ideas of relative error criteria introduced in Monteiro:2010 ; Solodov:1999a , but adds some new elements in order to control the objective function on the inexact solution. The proposed scheme requires only one inexact solution of the proximal subproblem per iteration, and the stepsizes are computed through a relaxed explicit linesearch procedure that takes into account the residuals obtained in the proximal subproblem and enables the iteration to address non-Lipschitz optimization problems effectively. We show that the sequence generated by our method converges to a solution of problem (1). Moreover, we establish its iteration-complexity in terms of both the function values and the residuals associated with an approximate stationary solution. We also present some numerical experiments to illustrate the performance of the proposed method.

The paper is structured as follows: Section 2 presents definitions, basic facts, and auxiliary results. In Section 3, we introduce the inexact proximal gradient method with an explicit linesearch (IPG-ELS) and establish some of its fundamental properties. Section 4 establishes the iteration-complexity bounds of the IPG-ELS method in terms of functional values and a residual associated with the stationary condition for problem (1). Additionally, the full convergence of the sequence generated by the IPG-ELS method is discussed in this section. Some numerical experiments illustrating the behavior of the proposed scheme are reported in Section 5. Finally, concluding remarks are provided in Section 6.

2 Preliminary

In this section, we present some preliminary material and notations that will be used throughout this paper.

Let g:𝔼¯g:\mathbbm{E}\to\overline{\mathbbm{R}} be a proper, lower semicontinuous, and convex function. For a given ε0\varepsilon\geq 0, the ε\varepsilon-subdifferential of gg at x𝐝𝐨𝐦g={x𝔼g(x)<+}x\in{\bf dom\,}g=\{x\in\mathbbm{E}\mid g(x)<+\infty\}, denoted by εg(x)\partial_{\varepsilon}g(x), is defined as

εg(x):={v𝔼g(y)g(x)+v,yxε,y𝔼}.\partial_{\varepsilon}g(x):=\{v\in\mathbbm{E}\mid g(y)\geq g(x)+\langle v,y-x\rangle-\varepsilon,\ \forall y\in\mathbbm{E}\}. (5)

When x𝐝𝐨𝐦gx\notin{\bf dom\,}g, we define εg(x)=\partial_{\varepsilon}g(x)=\emptyset. Any element vεg(x)v\in\partial_{\varepsilon}g(x) is called an ε\varepsilon-subgradient. If ε=0\varepsilon=0, then 0g(x)\partial_{0}g(x) becomes the classical subdifferential of gg at xx, denoted by g(x)\partial g(x). It follows immediately from (5) that

vεg(y),ug(x)impliesvu,yxε.v\in\partial_{\varepsilon}g(y),\;u\in\partial g(x)\quad\text{implies}\quad\langle v-u,y-x\rangle\geq-\varepsilon. (6)

We present two useful auxiliary results for εg\partial_{\varepsilon}g. The first one is the closedness of the graph of εg\partial_{\varepsilon}g and the second is the so-called transportation formula; see Propositions 4.1.1 and 4.2.2 of Hiriart-Urruty:1993 .

Proposition 1 (Closed Graph Property)

Let (εk,xk,vk)k+×𝔼×𝔼(\varepsilon_{k},x_{k},v_{k})_{k\in\mathbbm{N}}\subseteq\mathbbm{R}_{+}\times\mathbbm{E}\times\mathbbm{E} be a sequence converging to (ε,x,v)(\varepsilon,x,v) with vkεkg(xk)v_{k}\in\partial_{\varepsilon_{k}}g(x_{k}) for all kk\in\mathbbm{N}. Then, vεg(x)v\in\partial_{\varepsilon}g(x).

Proposition 2 (Transportation Formula)

With xx and yy in 𝐝𝐨𝐦g,{\bf dom\,}g, let vg(y)v\in\partial g(y). Then v\in\partial_{\epsilon}g(x), where ϵ=g(x)g(y)v,xy0\epsilon=g(x)-g(y)-\langle{v},{x-y}\rangle\geq 0.

We now introduce a concept of approximate stationary solution to problem (1). First, note that x¯\bar{x} is a solution to problem (1) if and only if 0f(x¯)+g(x¯)0\in\nabla f(\bar{x})+\partial g(\bar{x}). The concept of approximate stationary solution relaxes this inclusion by introducing a residual vv and enlarging g\partial g using εg\partial_{\varepsilon}g.

Definition 1 (η\eta-Approximate Stationary Solution)

Given η>0\eta>0, an element x~𝐝𝐨𝐦g\tilde{x}\in{\bf dom\,}g is said to be an η\eta-approximate stationary solution to problem (1) with a residual pair (v,ε)(v,\varepsilon) if

vf(x~)+εg(x~),max{v,ε}η.v\in\nabla f(\tilde{x})+\partial_{\varepsilon}g(\tilde{x}),\quad\max\{\|v\|,\varepsilon\}\leq\eta. (7)

Next, we recall the definition of quasi-Fejér convergence, which is an important and well-known tool for establishing full convergence of gradient and proximal point type methods.

Definition 2 (Quasi-Fejér Convergence)

Let SS be a nonempty subset of 𝔼\mathbbm{E}. A sequence (xk)k𝔼(x_{k})_{k\in\mathbbm{N}}\subseteq\mathbbm{E} is said to be quasi-Fejér convergent to SS if and only if, for every xSx\in S, there exists a summable sequence (δk)k+(\delta_{k})_{k\in\mathbbm{N}}\subseteq\mathbbm{R}_{+} such that

xk+1x2xkx2+δk,k.\|x_{k+1}-x\|^{2}\leq\|x_{k}-x\|^{2}+\delta_{k},\qquad\forall k\in\mathbbm{N}. (8)

The following result presents the main properties of quasi-Fejér convergent sequences; see Theorem 2.6 of Bauschke:1996 .

Lemma 1 (Quasi-Fejér Convergence Properties)

If (xk)k(x_{k})_{k\in\mathbbm{N}} is quasi-Fejér convergent to SS, then the following statements hold:

(a) The sequence (xk)k(x_{k})_{k\in\mathbbm{N}} is bounded;

(b) If a cluster point x¯\bar{x} of (xk)k(x_{k})_{k\in\mathbbm{N}} belongs to SS, then (xk)k(x_{k})_{k\in\mathbbm{N}} is convergent to x¯\bar{x}.

We conclude the section with a basic inequality that will be used in the next sections.

Lemma 2

For any x,y𝔼x,y\in\mathbbm{E}, we have x+y22x2+2y2\|x+y\|^{2}\leq 2\|x\|^{2}+2\|y\|^{2}.

3 Inexact Proximal Gradient Method

In this section, we introduce the inexact proximal gradient method with an explicit linesearch and establish some basic properties.

Recall that, given xk𝔼x_{k}\in\mathbbm{E}, the exact solution of the proximal gradient subproblem (3) with γk=1\gamma_{k}=1 consists of finding x¯k\bar{x}_{k} such that

x¯k=argminx𝔼{f(xk),xxk+g(x)+12xxk2}.\bar{x}_{k}={\rm arg}\!\min_{x\in\mathbbm{E}}\left\{\left\langle\nabla f(x_{k}),x-x_{k}\right\rangle+g(x)+\frac{1}{2}\|x-x_{k}\|^{2}\right\}.

This proximal subproblem is equivalent to the following monotone inclusion problem:

0f(xk)+g(x¯k)+x¯kxk.0\in\nabla f(x_{k})+\partial g(\bar{x}_{k})+\bar{x}_{k}-x_{k}. (9)

The kk-th iteration of the inexact proximal gradient method, which incorporates an explicit linesearch as proposed below, begins by first relaxing the above inclusion. This is achieved by finding an approximate solution x~k\tilde{x}_{k} along with a residual pair (vk,εk)(v_{k},\varepsilon_{k}) satisfying

vkf(xk)+εkg(x~k)+x~kxk,v_{k}\in\nabla f(x_{k})+\partial_{\varepsilon_{k}}g(\tilde{x}_{k})+\tilde{x}_{k}-x_{k},

and adheres to a specific mixed relative error criterion. This criterion is designed to ensure that the residual pair remains small, while also controlling the functional value of gg and the angle between f(xk)\nabla f(x_{k}) and vkv_{k}, in addition to guiding a search direction that involves the intermediate iteration x~k\tilde{x}_{k} and the residual vkv_{k}. If the stopping criterion x~k=xk\tilde{x}_{k}=x_{k} is not satisfied, we use a novel linesearch procedure, designed with the triple (x~k,vk,εk)(\tilde{x}_{k},v_{k},\varepsilon_{k}) in mind, to find a suitable stepsize βk\beta_{k}. This stepsize is used to move from the current point xkx_{k} along the direction ykxky_{k}-x_{k}, where yk=x~kvky_{k}=\tilde{x}_{k}-v_{k}, generating the next point xk+1x_{k+1}.

In the following, we formally describe the proposed method, which combines a relative inexact computation of the proximal subproblem with an explicit linesearch.

Inexact Proximal Gradient Method with an Explicit Linesearch (IPG-ELS)

Step 1 (Initialization). Let x0𝐝𝐨𝐦gx_{0}\in{\bf dom\,}g, τ(0,1]\tau\in(0,1], θ(0,1)\theta\in(0,1), γ1>1\gamma_{1}>1, γ21\gamma_{2}\geq 1, and α[0,1τ]\alpha\in[0,1-\tau]. Set k=0k=0.

Step 2 (Inexact Proximal Subproblem). Compute a triple (x~k,vk,εk)(\tilde{x}_{k},v_{k},\varepsilon_{k}) satisfying

vkf(xk)+εkg(x~k)+x~kxk,\displaystyle v_{k}\in\nabla f(x_{k})+\partial_{\varepsilon_{k}}g(\tilde{x}_{k})+\tilde{x}_{k}-x_{k}, (10)

g(x~kvk)g(x~k)f(xk),vk+(1+γ1)2vk2+(1+γ2)εk(1τα)2xkx~k2;\displaystyle g(\tilde{x}_{k}-v_{k})-g(\tilde{x}_{k})-\langle\nabla f(x_{k}),v_{k}\rangle+\frac{(1+\gamma_{1})}{2}\|v_{k}\|^{2}+(1+\gamma_{2})\varepsilon_{k}\leq\frac{(1-\tau-\alpha)}{2}\|x_{k}-\tilde{x}_{k}\|^{2}; (11)

Step 3 (Stopping Criterion). If x~k=xk\tilde{x}_{k}=x_{k}, then stop.

Step 4 (Linesearch Procedure). Set β=1\beta=1 and yk:=x~kvky_{k}:=\tilde{x}_{k}-v_{k}. If

\dstyf(xk+β(ykxk))f(xk)+βf(xk),ykxk+βτ2xkx~k2+γ12βvk2+βγ2εk,\dsty f\left(x_{k}+\beta(y_{k}-x_{k})\right)\leq f(x_{k})+\beta\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle+\frac{\beta\tau}{2}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{\gamma_{1}}{2}\beta\|v_{k}\|^{2}+\beta\gamma_{2}\varepsilon_{k}, (12)

then set βk=β\beta_{k}=\beta, xk+1=xk+βk(ykxk)x_{k+1}=x_{k}+\beta_{k}(y_{k}-x_{k}) and k:=k+1k:=k+1, and go to Step 2. Otherwise, set β=θβ\beta=\theta\beta, and verify (12).
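To fix ideas, the following Python sketch mirrors Steps 2-4 of the scheme above. The routine inexact_prox_subproblem is an assumed problem-dependent oracle returning a triple (x~k,vk,εk)(\tilde{x}_{k},v_{k},\varepsilon_{k}) satisfying the relative criterion (10)-(11) (for instance, via the Dykstra-like procedure of Section 5); the parameter defaults and iteration caps are illustrative.

```python
import numpy as np

def ipg_els(x0, f, grad_f, inexact_prox_subproblem,
            tau=0.8, theta=0.5, gamma1=1.1, gamma2=1.1, max_iter=100, tol=1e-10):
    """Schematic sketch of the IPG-ELS iteration (Steps 2-4).

    inexact_prox_subproblem(x, grad_fx) is an assumed oracle returning
    (x_tilde, v, eps) that satisfies the relative error criterion (10)-(11)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gfx = grad_f(x)
        x_tilde, v, eps = inexact_prox_subproblem(x, gfx)       # Step 2
        if np.linalg.norm(x - x_tilde) <= tol:                  # Step 3: stop near a solution
            break
        y = x_tilde - v                                         # y_k = x_tilde_k - v_k
        d = y - x
        fx = f(x)
        slope = float(np.sum(gfx * d))                          # <grad f(x_k), y_k - x_k>
        extra = (0.5 * tau * np.linalg.norm(x - x_tilde) ** 2
                 + 0.5 * gamma1 * np.linalg.norm(v) ** 2 + gamma2 * eps)
        beta = 1.0
        for _ in range(100):                                    # Step 4: backtracking (finite by Lemma 5)
            if f(x + beta * d) <= fx + beta * (slope + extra):  # acceptance test (12)
                break
            beta *= theta
        x = x + beta * d                                        # x_{k+1} = x_k + beta_k (y_k - x_k)
    return x
```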

Remark 1 (The Inexact Proximal Subproblem)

Note that the inexact proximal gradient subproblem (10) is equivalent to the exact proximal gradient subproblem (9) whenever vk=0v_{k}=0 and εk=0\varepsilon_{k}=0. Moreover, it is possible to always set vkv_{k} to 0 and still introduce a novel scheme that differs from previous approaches based on summable error tolerances. Indeed, under this assumption, the inexact proximal subproblem (10)-(11) can be rewritten as:

0f(xk)+εkg(x~k)+x~kxk,\displaystyle 0\in\nabla f(x_{k})+\partial_{\varepsilon_{k}}g(\tilde{x}_{k})+\tilde{x}_{k}-x_{k}, (13)
εk(1τα)2(1+γ2)xkx~k2.\displaystyle\varepsilon_{k}\leq\frac{(1-\tau-\alpha)}{2(1+\gamma_{2})}\|x_{k}-\tilde{x}_{k}\|^{2}. (14)

It is worth noting that for the above subproblem, no summability assumption is made on the tolerance sequence (εk)k(\varepsilon_{k})_{k\in\mathbbm{N}} to achieve the convergence and complexity of the proposed method, as will be shown later. Furthermore, inclusion (10) and inequality (11) guarantee that x~k\tilde{x}_{k} and x~kvk\tilde{x}_{k}-v_{k} are in the 𝐝𝐨𝐦g{\bf dom\,}g. Additionally, if gg is the indicator function of a nonempty closed convex set CC, then (13)–(14) can be fulfilled by finding a point x~kC\tilde{x}_{k}\in C such that

xkf(xk)x~k,yx~kεk:=(1τα)2(1+γ2)xkx~k2,yC.\langle x_{k}-\nabla f(x_{k})-\tilde{x}_{k},y-\tilde{x}_{k}\rangle\leq\varepsilon_{k}:=\frac{(1-\tau-\alpha)}{2(1+\gamma_{2})}\|x_{k}-\tilde{x}_{k}\|^{2},\quad\forall y\in C. (15)

For example, if CC is bounded, one can use the conditional gradient (CondG) method, a.k.a. the Frank-Wolfe method Frank:1956 ; Jaggi:2013 , to obtain such a point x~kC\tilde{x}_{k}\in C. Indeed, given ztCz_{t}\in C, the tt-th step of the CondG method, applied to solve the projection problem minyCyxk+f(xk)2/2\min_{y\in C}\|y-x_{k}+\nabla f(x_{k})\|^{2}/2, first finds wtw_{t} as a minimizer of the linear function ztxk+f(xk),zt\langle{z_{t}-x_{k}+\nabla f(x_{k})},{\cdot-z_{t}}\rangle over CC and then sets zt+1=(1αt)zt+αtwtz_{t+1}=(1-\alpha_{t})z_{t}+\alpha_{t}w_{t} for some αt(0,1)\alpha_{t}\in(0,1). If, at an iteration \ell, the computed point zz_{\ell} satisfies the stopping criterion zxk+f(xk),wzεk\langle{z_{\ell}-x_{k}+\nabla f(x_{k})},{w_{\ell}-z_{\ell}}\rangle\geq-\varepsilon_{k} for the CondG method, then x~k:=z\tilde{x}_{k}:=z_{\ell} and (15) will hold.
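As an illustration of this remark, the sketch below runs CondG steps for the projection subproblem until the inner stopping test associated with (15) holds. The linear minimization oracle lmo over CC and the classical stepsize rule \alpha_{t}=2/(t+2) are assumptions made for the example (the text only requires \alpha_{t}\in(0,1)).

```python
import numpy as np

def condg_inexact_projection(x_k, grad_fx, lmo, c_tol, z0, max_iter=1000):
    """CondG (Frank-Wolfe) sketch for min_{y in C} 0.5*||y - (x_k - grad_fx)||^2.

    lmo(c) is an assumed linear minimization oracle returning argmin_{w in C} <c, w>;
    c_tol stands for the constant (1 - tau - alpha)/(2*(1 + gamma2)) appearing in (15)."""
    z = np.asarray(z0, dtype=float)
    for t in range(max_iter):
        c = z - x_k + grad_fx                         # gradient of the projection objective at z
        w = lmo(c)                                    # linear minimization step over C
        eps_k = c_tol * np.linalg.norm(x_k - z) ** 2  # adaptive tolerance from (15)
        if float(np.sum(c * (w - z))) >= -eps_k:      # inner stopping test of the remark
            break                                     # z plays the role of x_tilde_k
        z = z + (2.0 / (t + 2.0)) * (w - z)           # z_{t+1} = (1 - alpha_t) z_t + alpha_t w_t
    return z, eps_k
```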

Remark 2 (The Explicit Linesearch)

Note that the explicit linesearch proposed in Step 4 of the IPG-ELS is related to the one proposed in BelloCruz:2016a when vk=0v_{k}=0 and εk=0\varepsilon_{k}=0. Moreover, note that the linesearch does not require solving the inexact proximal subproblem inside the inner loop of Step 4.

We introduce a useful lemma that plays a crucial role in analyzing the stopping criterion of the IPG-ELS method.

Lemma 3 (Iteration Inequality Condition)

The following inequality holds for every iteration kk.

x~kxk,vk+(γ11)2vk2+γ2εk(1τα)2xkx~k2.\left\langle\tilde{x}_{k}-x_{k},v_{k}\right\rangle+\frac{(\gamma_{1}-1)}{2}\|v_{k}\|^{2}+\gamma_{2}\varepsilon_{k}\leq\frac{(1-\tau-\alpha)}{2}\|x_{k}-\tilde{x}_{k}\|^{2}. (16)

Proof It follows from (10) that

vk+xkx~kf(xk)εkg(x~k).v_{k}+{x}_{k}-\tilde{x}_{k}-\nabla f({x}_{k})\in\partial_{\varepsilon_{k}}g(\tilde{x}_{k}).

Now using (5) together with the fact that yk=x~kvky_{k}=\tilde{x}_{k}-v_{k}, we have

g(yk)g(x~k)\displaystyle g(y_{k})-g(\tilde{x}_{k}) vk+xkx~kf(xk),ykx~kεk\displaystyle\geq\left\langle v_{k}+{x}_{k}-\tilde{x}_{k}-\nabla f({x}_{k}),y_{k}-\tilde{x}_{k}\right\rangle-\varepsilon_{k}
=vk+xkx~kf(xk),vkεk,\displaystyle=\left\langle v_{k}+{x}_{k}-\tilde{x}_{k}-\nabla f({x}_{k}),-v_{k}\right\rangle-\varepsilon_{k},

which is equivalent to

Γk:=g(yk)g(x~k)f(xk),vk\displaystyle\Gamma_{k}:=g(y_{k})-g(\tilde{x}_{k})-\langle\nabla f({x}_{k}),v_{k}\rangle vk2+x~kxk,vkεk.\displaystyle\geq-\|v_{k}\|^{2}+\langle\tilde{x}_{k}-x_{k},v_{k}\rangle-\varepsilon_{k}. (17)

On the other hand, considering the definitions of Γk\Gamma_{k} and yky_{k}, we observe that (11) is equivalent to

Γk+(1+γ1)2vk2+(1+γ2)εk(1τα)2xkx~k2,\Gamma_{k}+\frac{(1+\gamma_{1})}{2}\|v_{k}\|^{2}+(1+\gamma_{2})\varepsilon_{k}\leq\frac{(1-\tau-\alpha)}{2}\|x_{k}-\tilde{x}_{k}\|^{2},

which, combined with (17), yields the desired inequality (16). ∎

The following result demonstrates that the termination criterion of the IPG-ELS method, as specified in Step 3, is satisfied only when a solution to problem (1) is identified.

Lemma 4 (Stopping at a Solution)

The IPG-ELS method terminates at the kk-th iteration if and only if xkx_{k} is a solution to problem (1).

Proof Assume that the IPG-ELS method stops at the kk-th iteration. In view of Step 3, we have x~k=xk\tilde{x}_{k}=x_{k}. Hence, it follows from (16) that

(γ11)2vk2+γ2εk0.\frac{(\gamma_{1}-1)}{2}\|v_{k}\|^{2}+\gamma_{2}\varepsilon_{k}\leq 0.

Since γ1>1\gamma_{1}>1 and γ21\gamma_{2}\geq 1, we obtain vk=0v_{k}=0 and εk=0\varepsilon_{k}=0. Hence, in view of (10), we get that 0f(xk)+g(xk),0\in\nabla f(x_{k})+\partial g(x_{k}), concluding that xkx_{k} is a solution of problem (1).

Assume now that xkx_{k} is a solution of problem (1). Thus, f(xk)g(xk).-\nabla f(x_{k})\in\partial g(x_{k}). It follows from (10) that vk+xkx~kf(xk)εkg(x~k)v_{k}+x_{k}-\tilde{x}_{k}-\nabla f(x_{k})\in\partial_{\varepsilon_{k}}g(\tilde{x}_{k}). So, the ε\varepsilon-monotonicity of εg\partial_{\varepsilon}g in (6) implies

vk+xkx~k,x~kxkεk,\langle v_{k}+x_{k}-\tilde{x}_{k},\tilde{x}_{k}-x_{k}\rangle\geq-\varepsilon_{k},

which is equivalent to

vk,x~kxkxkx~k2εk.\langle v_{k},\tilde{x}_{k}-x_{k}\rangle\geq\|x_{k}-\tilde{x}_{k}\|^{2}-\varepsilon_{k}. (18)

Hence, it follows from (16) that

(1+τ+α)2xkx~k2+(γ11)2vk2+(γ21)εk0.\frac{(1+\tau+\alpha)}{2}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{(\gamma_{1}-1)}{2}\|v_{k}\|^{2}+(\gamma_{2}-1)\varepsilon_{k}\leq 0.

Since γ1>1\gamma_{1}>1, γ21\gamma_{2}\geq 1, α0\alpha\geq 0, and τ>0\tau>0, we conclude that xk=x~kx_{k}=\tilde{x}_{k} and vk=0v_{k}=0. Hence, from (11), we also have εk=0\varepsilon_{k}=0. Therefore, the IPG-ELS method stops at the kk-th iteration. ∎

In the following, we establish the well-definedness of the linesearch procedure in Step 4 of the IPG-ELS method.

Lemma 5 (Finite Linesearch Termination)

The linesearch procedure in Step 4 stops after a finite number of steps.

Proof In view of Step 3, we have xkx~kx_{k}\neq\tilde{x}_{k}. Assume, for the sake of contradiction, that the linesearch procedure does not terminate after a finite number of steps. Thus, for all β{1,θ,θ2,}\beta\in\{1,\theta,\theta^{2},\ldots\}, we have

f(xk+β(ykxk))>f(xk)+βf(xk),ykxk+βτ2xkx~k2+γ12βvk2+βγ2εk,f\left(x_{k}+\beta(y_{k}-x_{k})\right)>f(x_{k})+\beta\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle+\frac{\beta\tau}{2}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{\gamma_{1}}{2}\beta\|v_{k}\|^{2}+\beta\gamma_{2}\varepsilon_{k},

or, equivalently,

f(xk+β(ykxk))f(xk)βf(xk),ykxk>τ2xkx~k2+γ12vk2+γ2εk.\frac{f\left(x_{k}+\beta(y_{k}-x_{k})\right)-f(x_{k})}{\beta}-\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle>\frac{\tau}{2}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{\gamma_{1}}{2}\|v_{k}\|^{2}+\gamma_{2}\varepsilon_{k}.

Given that ff is differentiable, the left-hand side of the above inequality approaches zero as β0\beta\downarrow 0, leading to the conclusion that

0τ2xkx~k2+γ12vk2+γ2εk.\displaystyle 0\geq\frac{\tau}{2}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{\gamma_{1}}{2}\|v_{k}\|^{2}+\gamma_{2}\varepsilon_{k}.

This implies xk=x~kx_{k}=\tilde{x}_{k} contradicting the assumption that xkx~kx_{k}\neq\tilde{x}_{k}. ∎

The subsequent analysis investigates the linesearch procedure introduced in Step 4 of the IPG-ELS method, assuming that the gradient of the function ff, f\nabla f, is Lipschitz continuous. We aim to establish an upper bound for the number of iterations required by the linesearch procedure in Step 4.

Lemma 6 (Lipschitz Condition and Linesearch Complexity)

Assume that ff has an LL-Lipschitz continuous gradient and that xkx_{k} is not a solution to problem (1). Then, any βτ/(2L)\beta\leq\tau/(2L) satisfies (12). As a consequence, the linesearch procedure in Step 4 of the IPG-ELS method stops in at most

:=ln(min{τ/(2L),1})ln(θ)\ell:=\left\lceil\frac{\ln(\min\{\tau/(2L),1\})}{\ln(\theta)}\right\rceil (19)

iterations.

Proof Since f\nabla f is LL-Lipschitz continuous, for any β>0\beta>0, we have

f(xk+β(ykxk))f(xk)βf(xk),ykxk\displaystyle f(x_{k}+\beta(y_{k}-x_{k}))-f(x_{k})-\beta\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle Lβ22ykxk2.\displaystyle\leq\frac{L\beta^{2}}{2}\|y_{k}-x_{k}\|^{2}.

Hence, if βτ/(2L)\beta\leq\tau/(2L), we conclude that

f(xk+β(ykxk))f(xk)βf(xk),ykxk\displaystyle f(x_{k}+\beta(y_{k}-x_{k}))-f(x_{k})-\beta\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle τβ4ykxk2=τβ4xkx~k+vk2\displaystyle\leq\frac{\tau\beta}{4}\|y_{k}-x_{k}\|^{2}=\frac{\tau\beta}{4}\|x_{k}-\tilde{x}_{k}+v_{k}\|^{2}
βτ2xkx~k2+τ2βvk2,\displaystyle\leq\frac{\beta\tau}{2}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{\tau}{2}\beta\|v_{k}\|^{2},

using Lemma 2 in the last inequality. Since γ1>1>τ\gamma_{1}>1>\tau and γ20\gamma_{2}\geq 0, we have that (12) holds, thereby proving the first statement of the lemma. The last statement follows from the first one, given that the natural number \ell, defined in (19), satisfies β:=θmin{τ/(2L),1}\beta_{\ell}:=\theta^{\ell}\leq\min\{\tau/(2L),1\}. ∎
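As a worked instance of (19), take the parameters used later in the numerical experiments, \tau=0.8 and \theta=0.5, together with the smallest Lipschitz constant reported in Table 1, L\approx 41.58. Then

\ell=\left\lceil\frac{\ln\left(\min\{0.8/83.16,\,1\}\right)}{\ln(0.5)}\right\rceil=\left\lceil\frac{\ln(0.00962)}{\ln(0.5)}\right\rceil=\lceil 6.70\rceil=7,

so the linesearch procedure stops within at most seven trials at any iteration of that instance.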

This lemma provides a sufficient condition ensuring that the stepsizes generated by our linesearch are bounded away from zero. Specifically, if f\nabla f is LL-Lipschitz, then the stepsizes βk\beta_{k}, produced through the linesearch (12), are guaranteed to be bounded below by a positive constant β>0\beta>0, i.e., βkβ>0\beta_{k}\geq\beta>0 for all kk\in\mathbb{N}. Moreover, it is possible to relax the global Lipschitz condition to a local one, such as f\nabla f being locally Lipschitz continuous around any solution of problem (1), as was done in Proposition 5.4(ii) of BelloCruz:2016a .

4 Convergence and Complexity Analyses of the IPG-ELS Method

In this section, we focus on analyzing the convergence properties of the IPG-ELS method. We establish the convergence and its iteration complexity in terms of functional values and a residual associated with an approximate stationary solution, as defined in Definition 1.

We begin this section by presenting a result that is fundamental for establishing the convergence and the iteration complexity of the IPG-ELS method.

Proposition 3 (Key Inequality for the IPG-ELS Method)

For every x𝐝𝐨𝐦gx\in{\bf dom\,}\,g and kk\in\mathbbm{N}, we have

2βk[F(xk)F(x)]+2(F(xk+1)F(xk))xkx2xk+1x2αβkxkx~k2.2\beta_{k}[F(x_{k})-F(x)]+2({F(x_{k+1})-F(x_{k})})\leq\|x_{k}-x\|^{2}-\|x_{k+1}-x\|^{2}-{\alpha\beta_{k}}\|x_{k}-\tilde{x}_{k}\|^{2}. (20)

Additionally, the functional value sequence (F(xk))k(F(x_{k}))_{k\in\mathbbm{N}} is decreasing and convergent, and \sum_{k=0}^{\infty}\|x_{k}-x_{k+1}\|^{2}<+\infty.

Proof Let x𝐝𝐨𝐦gx\in{\bf dom\,}\,g and kk\in\mathbb{N}. In view of the inexact relative inclusion (10), we have vk+xkx~kf(xk)εkg(x~k)v_{k}+{x}_{k}-\tilde{x}_{k}-\nabla f({x}_{k})\in\partial_{\varepsilon_{k}}g(\tilde{x}_{k}), and hence the definition of εg\partial_{\varepsilon}g in (5) implies that

g(x)g(x~k)+vk+xkx~kf(xk),xx~kεk.g(x)\geq g(\tilde{x}_{k})+\langle v_{k}+{x}_{k}-\tilde{x}_{k}-\nabla f({x}_{k}),x-\tilde{x}_{k}\rangle-\varepsilon_{k}. (21)

Since ff is convex, we have f(x)f(xk)f(xk),xxkf(x)-f(x_{k})\geq\langle\nabla f(x_{k}),x-x_{k}\rangle. Adding the above two inequalities, using f+g=Ff+g=F, and simplifying the resulting expression, we obtain

F(x)F(xk)g(x~k)g(xk)+f(xk),x~kxkxkx~k,x~kx+vk,xx~kεk.F(x)-F(x_{k})\geq g(\tilde{x}_{k})-g(x_{k})+\langle\nabla f(x_{k}),\tilde{x}_{k}-x_{k}\rangle-\langle{{x}_{k}-\tilde{x}_{k}},{\tilde{x}_{k}-x}\rangle+\langle{v_{k}},{x-\tilde{x}_{k}}\rangle-\varepsilon_{k}.

Combining the above inequality with the identity

xkx~k,x~kx=12[xkx~k2+x~kx2xkx2],-\langle{{x}_{k}-\tilde{x}_{k}},{\tilde{x}_{k}-x}\rangle=\frac{1}{2}\left[\|x_{k}-\tilde{x}_{k}\|^{2}+\|\tilde{x}_{k}-x\|^{2}-\|x_{k}-x\|^{2}\right],

we have

F(x)F(xk)\displaystyle F(x)-F(x_{k})\geq g(x~k)g(xk)+f(xk),x~kxk+12xkx~k2\displaystyle\;g(\tilde{x}_{k})-g(x_{k})+\langle\nabla f(x_{k}),\tilde{x}_{k}-x_{k}\rangle+\frac{1}{2}\|x_{k}-\tilde{x}_{k}\|^{2}
+12[x~kx2xkx2]+vk,xx~kεk,\displaystyle+\frac{1}{2}\left[\|\tilde{x}_{k}-x\|^{2}-\|x_{k}-x\|^{2}\right]+\langle{v_{k}},{x-\tilde{x}_{k}}\rangle-\varepsilon_{k},

or, equivalently,

F(x)F(xk)+F(xk)F(xk+1)βk\displaystyle F(x)-F(x_{k})+\frac{F(x_{k})-F(x_{k+1})}{\beta_{k}}\geq g(xk)g(xk+1)βk+g(x~k)g(xk)+f(xk),x~kyk\displaystyle\frac{g(x_{k})-g(x_{k+1})}{\beta_{k}}+g(\tilde{x}_{k})-g(x_{k})+\langle\nabla f(x_{k}),\tilde{x}_{k}-y_{k}\rangle
+f(xk)f(xk+1)βk+f(xk),ykxk+12xkx~k2\displaystyle+\frac{f(x_{k})-f(x_{k+1})}{\beta_{k}}+\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle+\frac{1}{2}\|x_{k}-\tilde{x}_{k}\|^{2}
+12[x(x~kvk)2xkx2]12vk2εk.\displaystyle+\frac{1}{2}\left[\|x-(\tilde{x}_{k}-v_{k})\|^{2}-\|x_{k}-x\|^{2}\right]-\frac{1}{2}\|v_{k}\|^{2}-\varepsilon_{k}.

Since xk+1=xk+βk(ykxk)x_{k+1}=x_{k}+\beta_{k}(y_{k}-x_{k}) and gg is convex, we have g(xk+1)g(xk)βk(g(yk)g(xk))g(x_{k+1})-g(x_{k})\leq\beta_{k}(g(y_{k})-g(x_{k})). Hence, combining the last two inequalities and the fact that yk=x~kvky_{k}=\tilde{x}_{k}-v_{k}, we get

F(x)F(xk)+F(xk)F(xk+1)βk\displaystyle F(x)-F(x_{k})+\frac{F(x_{k})-F(x_{k+1})}{\beta_{k}}\geq 12[xyk2xkx2]\displaystyle\frac{1}{2}\left[\|x-y_{k}\|^{2}-\|x_{k}-x\|^{2}\right]
+f(xk)f(xk+1)βk+f(xk),ykxk+12xkx~k2\displaystyle+\frac{f(x_{k})-f(x_{k+1})}{\beta_{k}}+\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle+\frac{1}{2}\|x_{k}-\tilde{x}_{k}\|^{2}
+g(x~k)g(yk)+f(xk),x~kyk12vk2εk.\displaystyle+g(\tilde{x}_{k})-g(y_{k})+\langle\nabla f(x_{k}),\tilde{x}_{k}-y_{k}\rangle-\frac{1}{2}\|v_{k}\|^{2}-\varepsilon_{k}.

or, equivalently,

F(x)F(xk)\displaystyle F(x)-F(x_{k}) +F(xk)F(xk+1)βk12[xyk2xkx2]+α2xkx~k2\displaystyle+\frac{F(x_{k})-F(x_{k+1})}{\beta_{k}}\geq\frac{1}{2}\left[\|x-y_{k}\|^{2}-\|x_{k}-x\|^{2}\right]+\frac{\alpha}{2}\|x_{k}-\tilde{x}_{k}\|^{2}
+f(xk)f(xk+1)βk+f(xk),ykxk+τ2xkx~k2+γ12vk2+γ2εk\displaystyle+\frac{f(x_{k})-f(x_{k+1})}{\beta_{k}}+\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle+\frac{\tau}{2}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{\gamma_{1}}{2}\|v_{k}\|^{2}+\gamma_{2}\varepsilon_{k}
+g(x~k)g(yk)+f(xk),x~kyk(1+γ1)2vk2(1+γ2)εk\displaystyle+g(\tilde{x}_{k})-g(y_{k})+\langle\nabla f(x_{k}),\tilde{x}_{k}-y_{k}\rangle-\frac{(1+\gamma_{1})}{2}\|v_{k}\|^{2}-(1+\gamma_{2})\varepsilon_{k}
+(1τα)2xkx~k2.\displaystyle+\frac{(1-\tau-\alpha)}{2}\|x_{k}-\tilde{x}_{k}\|^{2}.

Now, using the linesearch procedure of Step 4, we obtain

F(x)F(xk)+F(xk)F(xk+1)βk\displaystyle F(x)-F(x_{k})+\frac{F(x_{k})-F(x_{k+1})}{\beta_{k}}\geq 12[xyk2xkx2]+α2xkx~k2\displaystyle\frac{1}{2}\left[\|x-y_{k}\|^{2}-\|x_{k}-x\|^{2}\right]+\frac{\alpha}{2}\|x_{k}-\tilde{x}_{k}\|^{2}
+g(x~k)g(yk)+f(xk),x~kyk\displaystyle+g(\tilde{x}_{k})-g(y_{k})+\langle\nabla f(x_{k}),\tilde{x}_{k}-y_{k}\rangle
(1+γ1)2vk2(1+γ2)εk+(1τα)2xkx~k2.\displaystyle-\frac{(1+\gamma_{1})}{2}\|v_{k}\|^{2}-(1+\gamma_{2})\varepsilon_{k}+\frac{(1-\tau-\alpha)}{2}\|x_{k}-\tilde{x}_{k}\|^{2}.

It follows from the above inequality and (11) that

βk[F(x)F(xk)]+F(xk)F(xk+1)βk2[xyk2xkx2]+αβk2xkx~k2.\beta_{k}[F(x)-F(x_{k})]+{F(x_{k})-F(x_{k+1})}\geq\frac{\beta_{k}}{2}\left[\|x-y_{k}\|^{2}-\|x_{k}-x\|^{2}\right]+\frac{\alpha\beta_{k}}{2}\|x_{k}-\tilde{x}_{k}\|^{2}. (22)

On the other hand, using the identity xk+1x=(1βk)(xkx)+βk(ykx)x_{k+1}-x=(1-\beta_{k})(x_{k}-x)+\beta_{k}(y_{k}-x) and the strong convexity of 2\|\cdot\|^{2}, we have

xk+1x2(1βk)xkx2+βkykx2(1βk)βkxkyk2,\|x_{k+1}-x\|^{2}\leq(1-\beta_{k})\|x_{k}-x\|^{2}+\beta_{k}\|y_{k}-x\|^{2}-(1-\beta_{k})\beta_{k}\|x_{k}-y_{k}\|^{2},

which implies

βk(ykx2xkx2)xk+1x2xkx2.\beta_{k}\left(\|y_{k}-x\|^{2}-\|x_{k}-x\|^{2}\right)\geq\|x_{k+1}-x\|^{2}-\|x_{k}-x\|^{2}.

Therefore, the proof of (20) follows by combining the latter inequality with (22). For the last statement of the proposition, note that (20) with x=xkx=x_{k} immediately yields

2(F(xk+1)F(xk))xk+1xk2<0.2({F(x_{k+1})-F(x_{k})})\leq-\|x_{k+1}-x_{k}\|^{2}<0. (23)

So, (F(xk))k(F(x_{k}))_{k\in\mathbbm{N}} is decreasing and convergent because it is bounded from below by F(x)F(x_{*}). Moreover, the last inequality implies that \sum_{k=0}^{\infty}\|x_{k}-x_{k+1}\|^{2}<+\infty. ∎

Next, we establish the convergence rate of the functional values sequence (F(xk))k(F(x_{k}))_{k\in\mathbbm{N}}.

Theorem 4.1 (Convergence Rate of the IPG-ELS Method)

Let (xk)k(x_{k})_{k\in\mathbbm{N}} and (βk)k(\beta_{k})_{k\in\mathbbm{N}} be generated by the IPG-ELS method. Assume that there exists β>0\beta>0 such that βkβ\beta_{k}\geq\beta for all kk\in\mathbbm{N}. Then, for all kk\in\mathbbm{N}, we have

F(xk)Fdist(x0,S)2+2(F(x0)F)2β(k+1).F(x_{k})-F_{*}\leq\frac{{\rm dist}(x_{0},S_{*})^{2}+2\left(F(x_{0})-F_{*}\right)}{2\beta(k+1)}. (24)

Proof For any \ell\in\mathbbm{N} and xSx_{*}\in S_{*}, it follows from Proposition 3 with k=k=\ell and x=xx=x_{*} that

F(x)F(x)12β[xx2x+1x2+2(F(x)F(x+1))]F(x_{\ell})-F(x_{*})\leq\frac{1}{2\beta_{\ell}}\left[\|x_{\ell}-x_{*}\|^{2}-\|x_{\ell+1}-x_{*}\|^{2}+2\left(F(x_{\ell})-F(x_{\ell+1})\right)\right]

Summing the above inequality over =0,1,,k\ell=0,1,\ldots,k, we have

=0k(F(x)F(x))\displaystyle\sum_{\ell=0}^{k}\left(F(x_{\ell})-F(x_{*})\right) 12β[x0x2xk+1x2+2(F(x0)F(xk+1))]\displaystyle\leq\frac{1}{2\beta}\left[\|x_{0}-x_{*}\|^{2}-\|x_{k+1}-x_{*}\|^{2}+2\left(F(x_{0})-F(x_{k+1})\right)\right]
12β[x0x2+2(F(x0)F(x))].\displaystyle\leq\frac{1}{2\beta}\left[\|x_{0}-x_{*}\|^{2}+2\left(F(x_{0})-F(x_{*})\right)\right]. (25)

Since, in view of the last statement of Proposition 3, we have F(x+1)F(x)F(x_{\ell+1})\leq F(x_{\ell}) for any \ell\in\mathbbm{N}, it follows from (25) that

(k+1)(F(xk)F(x))12β(x0x2+2[F(x0)F(x)]).(k+1)\left(F(x_{k})-F(x_{*})\right)\leq\frac{1}{2\beta}\left(\|x_{0}-x_{*}\|^{2}+2\left[F(x_{0})-F(x_{*})\right]\right).

Since xSx_{*}\in S_{*} is arbitrary, the proof of (24) follows. ∎

Remark 3 (Complexity of η\eta-Approximate Solution)

It follows from Theorem 4.1 that every accumulation point of the sequence (xk)k(x_{k})_{k\in\mathbbm{N}} is a solution to problem (1). Moreover, given any η>0\eta>0, in at most k=\mathcal{O}(1/\eta) iterations, the IPG-ELS method generates an η\eta-approximate solution xkx_{k} to problem (1), in the sense that F(xk)FηF(x_{k})-F_{*}\leq\eta.

Theorem 4.2 (Complexity of η\eta-Approximate Stationary Solution)

Consider (xk)k(x_{k})_{k\in\mathbbm{N}}, (x~k)k(\tilde{x}_{k})_{k\in\mathbbm{N}}, (vk)k(v_{k})_{k\in\mathbbm{N}}, and (εk)k(\varepsilon_{k})_{k\in\mathbbm{N}} generated by the IPG-ELS method and define wk:=vk+xkx~k+f(x~k)f(xk)w_{k}:=v_{k}+x_{k}-\tilde{x}_{k}+\nabla f(\tilde{x}_{k})-\nabla f(x_{k}), for every kk\in\mathbbm{N}. Then, we have

wkf(x~k)+εkg(x~k)εkF(x~k),k.w_{k}\in\nabla f(\tilde{x}_{k})+\partial_{\varepsilon_{k}}g(\tilde{x}_{k})\subseteq\partial_{\varepsilon_{k}}F(\tilde{x}_{k}),\qquad\forall k\in\mathbbm{N}. (26)

Additionally, if α(0,1τ)\alpha\in(0,1-\tau), and f\nabla f is Lipschitz continuous on 𝐝𝐨𝐦g{\bf dom\,}g, then, given a tolerance η>0\eta>0, the IPG-ELS method generates an η\eta-approximate stationary solution x~k\tilde{x}_{k} to problem (1) with residues (wk,εk)(w_{k},\varepsilon_{k}), in the sense of Definition 1, in at most k=O(1/η2)k=O\left(1/\eta^{2}\right) iterations.

Proof The first inclusion in (26) follows immediately from (10) and the definition of wkw_{k}, whereas the second inclusion in (26) follows from the definitions of FF and εF\partial_{\varepsilon}F.

Now let xx_{*} be the projection of x0x_{0} onto SS_{*} and let d0:=x0xd_{0}:=\|x_{0}-x_{*}\|. As observed in Lemma 6, the Lipschitz continuity of f\nabla f implies that there exists β>0\beta>0 such that βkβ\beta_{k}\geq\beta for all kk. It follows from Proposition 3 with k=k=\ell\in\mathbbm{N} and x=xSx=x_{*}\in S_{*} that

αβxx~2(xx2x+1x2+2[F(x)F(x+1)])\alpha\beta\|x_{\ell}-\tilde{x}_{\ell}\|^{2}\leq\left(\|x_{\ell}-x_{*}\|^{2}-\|x_{\ell+1}-x_{*}\|^{2}+2\left[F(x_{\ell})-F(x_{\ell+1})\right]\right) (27)

for all \ell\in\mathbbm{N}. Summing the above inequality over =0,1,,k\ell=0,1,\ldots,k, and using that xSx_{*}\in S_{*}, we have

αβ=0kxx~2x0x2+2[F(x0)F(x)].\alpha\beta\sum_{\ell=0}^{k}\|x_{\ell}-\tilde{x}_{\ell}\|^{2}\leq\|x_{0}-x_{*}\|^{2}+2\left[F(x_{0})-F(x_{*})\right].

Hence, since d0=x0xd_{0}=\|x_{0}-x_{*}\|, we see that there exists kk{\ell_{k}}\leq k such that

x~kxk2d02+2[F(x0)F(x)]αβ(k+1).\|\tilde{x}_{{\ell_{k}}}-x_{{\ell_{k}}}\|^{2}\leq\frac{d_{0}^{2}+2[F(x_{0})-F(x_{*})]}{\alpha\beta(k+1)}. (28)

On the other hand, since τ,α>0\tau,\alpha>0, it follows from (16) that, for every \ell\in\mathbbm{N},

(γ11)2v2+γ2ε\displaystyle\frac{(\gamma_{1}-1)}{2}\|v_{\ell}\|^{2}+\gamma_{2}\varepsilon_{\ell} xx~22+xx~,v\displaystyle\leq\frac{\|x_{\ell}-\tilde{x}_{\ell}\|^{2}}{2}+\langle x_{\ell}-\tilde{x}_{\ell},v_{\ell}\rangle
xx~22+xx~2γ11+(γ11)4v2,\displaystyle\leq\frac{\|x_{\ell}-\tilde{x}_{\ell}\|^{2}}{2}+\frac{\|x_{\ell}-\tilde{x}_{\ell}\|^{2}}{\gamma_{1}-1}+\frac{(\gamma_{1}-1)}{4}\|v_{\ell}\|^{2},

where the last inequality is due to Cauchy-Schwarz inequality and the fact that absa2/2+b2/(2s)ab\leq sa^{2}/2+b^{2}/(2s) for any a,ba,b\in\mathbbm{R} and s>0s>0, in particular, with a=va=\|v_{\ell}\|, b=x~xb=\|\tilde{x}_{\ell}-x_{\ell}\|, and s=(γ11)/2s=(\gamma_{1}-1)/2. Hence, we have

(γ11)4v2+γ2ε(γ1+12(γ11))x~x2,.\frac{(\gamma_{1}-1)}{4}\|v_{\ell}\|^{2}+\gamma_{2}\varepsilon_{\ell}\leq\left(\frac{\gamma_{1}+1}{2(\gamma_{1}-1)}\right)\|\tilde{x}_{\ell}-x_{\ell}\|^{2},\qquad\forall\ell\in\mathbbm{N}. (29)

Now, from the definition of ww_{\ell}, (29), the triangle inequality, and the Lipschitz continuity of f\nabla f, we have, for every \ell\in\mathbbm{N},

w\displaystyle\|w_{\ell}\| v+xx~+f(x)f(x~)\displaystyle\leq\|v_{\ell}\|+\|{x}_{\ell}-\tilde{x}_{\ell}\|+\|\nabla f(x_{\ell})-\nabla f(\tilde{x}_{\ell})\|
[2(γ1+1)γ11+1+L]xx~.\displaystyle\leq\left[\frac{\sqrt{2(\gamma_{1}+1)}}{\gamma_{1}-1}+1+L\right]\|x_{\ell}-\tilde{x}_{\ell}\|. (30)

Moreover, it follows from (29) that

ε(γ1+12γ2(γ11))xx~2.\varepsilon_{\ell}\leq\left(\frac{\gamma_{1}+1}{2\gamma_{2}(\gamma_{1}-1)}\right)\|x_{\ell}-\tilde{x}_{\ell}\|^{2}. (31)

Hence, it follows from (28), (30), and (31) with =k\ell=\ell_{k} and m0:=d02+2(F(x0)F(x))m_{0}:=d_{0}^{2}+2(F(x_{0})-F(x_{*})) that

\|w_{\ell_{k}}\|\leq\left[\frac{\sqrt{2(\gamma_{1}+1)}}{\gamma_{1}-1}+1+L\right]\frac{\sqrt{m_{0}}}{\sqrt{\alpha\beta(k+1)}},\qquad\varepsilon_{\ell_{k}}\leq\left(\frac{\gamma_{1}+1}{2\gamma_{2}(\gamma_{1}-1)}\right)\frac{m_{0}}{\alpha\beta(k+1)},

which in turn implies that

\|w_{\ell_{k}}\|=\mathcal{O}(1/\sqrt{k}),\quad\varepsilon_{\ell_{k}}=\mathcal{O}(1/k). (32)

Thus, the last statement of the theorem follows from (32) and the first inclusion in (26). ∎

We end this section by establishing the full convergence of the sequence (xk)k(x_{k})_{k\in\mathbbm{N}} to a solution of problem (1). The proof is based on the quasi-Fejér convergence of the sequence (xk)k(x_{k})_{k\in\mathbbm{N}} to the set SS_{*}, as defined in Definition 2.

Theorem 4.3 (Convergence for the IPG-ELS Method)

The sequence (xk)k(x_{k})_{k\in\mathbbm{N}} generated by the IPG-ELS method converges to a point in SS_{*}.

Proof By employing Proposition 3 at x=xS𝐝𝐨𝐦gx=x_{*}\in S_{*}\subseteq{\bf dom\,}g, we have

xk+1x2xkx2+2[F(xk)F(xk+1)]for allk.\|x_{k+1}-x_{*}\|^{2}\leq\|x_{k}-x_{*}\|^{2}+2\left[F(x_{k})-F(x_{k+1})\right]\quad\mbox{for all}\quad k\in\mathbbm{N}. (33)

We set δk:=2[F(xk)F(xk+1)]0\delta_{k}:=2\left[F(x_{k})-F(x_{k+1})\right]\geq 0, and we will prove that (δk)k(\delta_{k})_{k\in\mathbbm{N}} is a summable sequence. In fact,

k=0δk=\displaystyle\sum_{k=0}^{\infty}\delta_{k}= 2k=0[F(xk)F(xk+1)]2[F(x0)limkF(xk+1)]\displaystyle 2\sum_{k=0}^{\infty}\Big{[}F(x_{k})-F(x_{k+1})\Big{]}\leq 2\Big{[}F(x_{0})-\lim_{k\to\infty}F(x_{k+1})\Big{]}
\displaystyle\leq 2[F(x0)F(x)]<+.\displaystyle 2\Big{[}F(x_{0})-F(x_{*})\Big{]}<+\infty.

This together with (33) tells us that the sequence (xk)k(x_{k})_{k\in\mathbbm{N}} is quasi-Fejér convergent to SS_{*} via Definition 2. By Lemma 1(a), the sequence (xk)k(x_{k})_{k\in\mathbbm{N}} is bounded. Let x¯\bar{x} be a cluster point of (xk)k(x_{k})_{k\in\mathbbm{N}}. Hence there exists a subsequence (xk)k(x_{\ell_{k}})_{k\in\mathbbm{N}} of (xk)k(x_{k})_{k\in\mathbbm{N}} converging to x¯\bar{x}.

Now we proceed by considering the two possible cases:

Case 1. The sequence (βk)k\left(\beta_{\ell_{k}}\right)_{k\in\mathbbm{N}} does not converge to 0, i.e., there exist some β>0\beta>0 and a subsequence of (βk)k\left(\beta_{\ell_{k}}\right)_{k\in\mathbbm{N}} (without relabeling) such that

βkβ,k.\beta_{\ell_{k}}\geq\beta,\quad\forall\,k\in\mathbbm{N}. (34)

By using Proposition 3 with x=xSx=x_{*}\in S_{*}, we get

βk[F(xk)F(x)]\displaystyle\beta_{k}\left[F(x_{k})-F(x_{*})\right]\leq 12(xkx2xk+1x2)+F(xk)F(xk+1).\displaystyle\frac{1}{2}(\|x_{k}-x_{*}\|^{2}-\|x_{k+1}-x_{*}\|^{2})+F(x_{k})-F(x_{k+1}).

Summing for k=0,1,,mk=0,1,\ldots,m, the above inequality implies that

k=0mβk[F(xk)F(x)]\displaystyle\sum_{k=0}^{m}\beta_{k}\left[F(x_{k})-F(x_{*})\right]\leq 12(x0x2xm+1x2)+F(x0)F(xm+1)\displaystyle\frac{1}{2}(\|x_{0}-x_{*}\|^{2}-\|x_{m+1}-x_{*}\|^{2})+F(x_{0})-F(x_{m+1})
\displaystyle\leq 12x0x2+F(x0)F(x).\displaystyle\frac{1}{2}\|x_{0}-x_{*}\|^{2}+F(x_{0})-F(x_{*}).

By taking m+m\to+\infty and using the fact that F(xk)F(x)F(x_{k})\geq F(x_{*}) and (34), we obtain that

βk=0[F(xk)F(x)]k=0βk[F(xk)F(x)]k=0βk[F(xk)F(x)]<+,\beta\sum_{k=0}^{\infty}\left[F(x_{\ell_{k}})-F(x_{*})\right]\leq\sum_{k=0}^{\infty}\beta_{\ell_{k}}\left[F(x_{\ell_{k}})-F(x_{*})\right]\leq\sum_{k=0}^{\infty}\beta_{k}\left[F(x_{k})-F(x_{*})\right]<+\infty,

which together with (34) establishes that \dstylimkF(xk)=F(x).\dsty\lim_{k\rightarrow\infty}\,F(x_{\ell_{k}})=F(x_{*}). Since FF is lower semicontinuous on 𝐝𝐨𝐦g{\bf dom\,}g, it follows from the last equality that

F(x)F(x¯)lim infkF(xk)=limkF(xk)=F(x),F(x_{*})\leq F(\bar{x})\leq\liminf_{k\rightarrow\infty}F(x_{\ell_{k}})=\lim_{k\rightarrow\infty}F(x_{\ell_{k}})=F(x_{*}),

which yields F(x¯)=F(x)F(\bar{x})=F(x_{*}) and thus x¯S\bar{x}\in S_{*}.

Case 2. \dstylimkβk=0\dsty\lim_{k\rightarrow\infty}\beta_{\ell_{k}}=0. Define \dstyβ^k:=βkθ>0\dsty\hat{\beta}_{k}:=\frac{\beta_{k}}{\theta}>0 and

x^k+1:=xk+β^k(ykxk)=(1β^k)xk+β^kyk,\hat{x}_{k+1}:=x_{k}+\hat{\beta}_{k}(y_{k}-x_{k})=(1-\hat{\beta}_{k})x_{k}+\hat{\beta}_{k}y_{k}, (35)

where yk=x~kvky_{k}=\tilde{x}_{k}-v_{k}. It follows from the definition of the linesearch that

f(x^k+1)>f(xk)+β^kf(xk),ykxk+τ2β^kxkx~k2+γ12β^kvk2+β^kγ2εk.f(\hat{x}_{k+1})>f(x_{k})+\hat{\beta}_{k}\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle+\frac{\tau}{2}\hat{\beta}_{k}\|x_{k}-\tilde{x}_{k}\|^{2}+\frac{\gamma_{1}}{2}\hat{\beta}_{k}\|v_{k}\|^{2}+\hat{\beta}_{k}\gamma_{2}\varepsilon_{k}. (36)

It follows from the convexity of ff, the fact that γ1>1>τ\gamma_{1}>1>\tau, and the positiveness of the term β^kγ2εk\hat{\beta}_{k}\gamma_{2}\varepsilon_{k} that

f(x^k+1),x^k+1xkf(x^k+1)f(xk)>β^kf(xk),ykxk+β^kτ2(xkx~k2+vk2)\displaystyle\langle\nabla f(\hat{x}_{k+1}),\hat{x}_{k+1}-x_{k}\rangle\geq f(\hat{x}_{k+1})-f(x_{k})>\hat{\beta}_{k}\langle\nabla f(x_{k}),y_{k}-x_{k}\rangle+\hat{\beta}_{k}\frac{\tau}{2}\left(\|x_{k}-\tilde{x}_{k}\|^{2}+\|v_{k}\|^{2}\right)

which, together with (35), yields

β^kτ2(xkx~k2+vk2)\displaystyle\hat{\beta}_{k}\frac{\tau}{2}\left(\|x_{k}-\tilde{x}_{k}\|^{2}+\|v_{k}\|^{2}\right) <β^kf(x^k+1)f(xk),ykxk\displaystyle<\hat{\beta}_{k}\langle\nabla f(\hat{x}_{k+1})-\nabla f(x_{k}),y_{k}-x_{k}\rangle
β^kf(x^k+1)f(xk)ykxk\displaystyle\leq\hat{\beta}_{k}\|\nabla f(\hat{x}_{k+1})-\nabla f(x_{k})\|\|y_{k}-x_{k}\|
=β^kf(x^k+1)f(xk)xkx~k+vk.\displaystyle=\hat{\beta}_{k}\|\nabla f(\hat{x}_{k+1})-\nabla f(x_{k})\|\|x_{k}-\tilde{x}_{k}+v_{k}\|.

Now it follows from Lemma 2 that

xkx~k+vk22(xkx~k2+vk2).\|x_{k}-\tilde{x}_{k}+v_{k}\|^{2}\leq 2\left(\|x_{k}-\tilde{x}_{k}\|^{2}+\|v_{k}\|^{2}\right).

Hence,

β^kτ2(xkx~k2+vk2)<β^kf(x^k+1)f(xk)2(xkx~k2+vk2)12,\hat{\beta}_{k}\frac{\tau}{2}\left(\|x_{k}-\tilde{x}_{k}\|^{2}+\|v_{k}\|^{2}\right)<\hat{\beta}_{k}\|\nabla f(\hat{x}_{k+1})-\nabla f(x_{k})\|\cdot\sqrt{2}\left(\|x_{k}-\tilde{x}_{k}\|^{2}+\|v_{k}\|^{2}\right)^{\frac{1}{2}},

which, due to the positiveness of xkx~k2+vk2\|x_{k}-\tilde{x}_{k}\|^{2}+\|v_{k}\|^{2}, yields

τ24(xkx~k2+vk2)12f(x^k+1)f(xk).\frac{\tau\sqrt{2}}{4}\left(\|x_{k}-\tilde{x}_{k}\|^{2}+\|v_{k}\|^{2}\right)^{\frac{1}{2}}\leq\|\nabla f(\hat{x}_{k+1})-\nabla f(x_{k})\|. (37)

Note that x^k+1xk=β^k(ykxk)=β^kβkxk+1xk=1θxk+1xk,\|\hat{x}_{k+1}-x_{k}\|=\|\hat{\beta}_{k}(y_{k}-x_{k})\|=\frac{\hat{\beta}_{k}}{\beta_{k}}\|x_{k+1}-x_{k}\|=\frac{1}{\theta}\|x_{k+1}-x_{k}\|, which, combined with the last statement of Proposition 3, gives us that x^k+1xk0\|\hat{x}_{\ell_{k}+1}-x_{\ell_{k}}\|\to 0 as k+k\to+\infty. Since f\nabla f is continuous, we have f(x^k+1)f(xk)0\|\nabla f(\hat{x}_{\ell_{k}+1})-\nabla f(x_{\ell_{k}})\|\to 0 as k+k\to+\infty. Hence, from (37), we derive that

limk+xkx~k2+vk20,\lim_{k\rightarrow+\infty}\,\|x_{\ell_{k}}-\tilde{x}_{\ell_{k}}\|^{2}+\|v_{\ell_{k}}\|^{2}\leq 0, (38)

therefore, limk+xkx~k=0\lim_{k\rightarrow+\infty}\,\|x_{\ell_{k}}-\tilde{x}_{\ell_{k}}\|=0 and limk+vk=0\lim_{k\rightarrow+\infty}\|v_{\ell_{k}}\|=0. Additionally, we can use (11) to show that limk+εk=0\lim_{k\rightarrow+\infty}\varepsilon_{\ell_{k}}=0. Thus, x¯\bar{x} is also an accumulation point of the sequence (x~k)k(\tilde{x}_{k})_{k\in\mathbb{N}}, and without loss of generality, we can assume that x~kx¯\tilde{x}_{\ell_{k}}\to\bar{x} as k+k\to+\infty. Moreover, we have that

limk+f(xk)f(x~k)=0.\lim_{k\rightarrow+\infty}\|\nabla f(x_{\ell_{k}})-\nabla f(\tilde{x}_{\ell_{k}})\|=0. (39)

Now, using (10), we obtain

wk:=xkx~k+vk+f(x~k)f(xk)f(x~k)+εkg(x~k)εkF(x~k).w_{k}:=x_{k}-\tilde{x}_{k}+v_{k}+\nabla f(\tilde{x}_{k})-\nabla f(x_{k})\in\nabla f(\tilde{x}_{k})+\partial_{\varepsilon_{k}}g(\tilde{x}_{k})\subseteq\partial_{\varepsilon_{k}}F(\tilde{x}_{k}).

Furthermore, since vkv_{\ell_{k}} converges to 0 as indicated by (38), and by applying the triangular inequality, we obtain

wk=xkx~k+vk+f(x~k)f(xk)xkx~k+vk+f(x~k)f(xk),\|w_{k}\|=\|x_{k}-\tilde{x}_{k}+v_{k}+\nabla f(\tilde{x}_{k})-\nabla f(x_{k})\|\leq\|x_{k}-\tilde{x}_{k}\|+\|v_{k}\|+\|\nabla f(\tilde{x}_{k})-\nabla f(x_{k})\|,

which implies, via (38) and (39), that wkεkF(x~k)w_{\ell_{k}}\in\partial_{\varepsilon_{\ell_{k}}}F(\tilde{x}_{\ell_{k}}) also converges to 0. Consequently, the convergence of x~k\tilde{x}_{\ell_{k}} to x¯\bar{x} and εk\varepsilon_{\ell_{k}} to 0, combined with the closedness of the graph of F\partial F in Proposition 1, gives us that 0F(x¯)0\in\partial F(\bar{x}). This is equivalent to stating that x¯S\bar{x}\in S_{*}.

In all the cases considered above, we have shown that x¯\bar{x}, an accumulation point of the sequence (xk)k(x_{k})_{k\in\mathbb{N}}, belongs to SS_{*}. Hence, Lemma 1(b) implies that (xk)k(x_{k})_{k\in\mathbb{N}} converges to an optimal solution in SS_{*}. ∎

5 Numerical Experiments

In this section, we investigate the numerical behavior of the IPG-ELS method in solving the CUR-like factorization optimization problem JMLR:v12:mairal11a . Consider 𝔼=n×m\mathbbm{E}=\mathbbm{R}^{n\times m}. Given a matrix Wm×nW\in\mathbbm{R}^{m\times n}, the objective is to find a matrix Xn×mX\in\mathbbm{R}^{n\times m} with sparse rows and columns such that WXWWXW approximates WW. This problem can be formulated as the following splitting optimization problem:

minXn×m12WWXWF2+λrowi=1nX(i)2+λcolj=1mX(j)2,\min_{X\in\mathbbm{R}^{n\times m}}\frac{1}{2}\|W-WXW\|_{F}^{2}+\lambda_{\text{row}}\sum_{i=1}^{n}\|X^{(i)}\|_{2}+\lambda_{\text{col}}\sum_{j=1}^{m}\|X_{(j)}\|_{2},

where F\|\cdot\|_{F} denotes the Frobenius norm, and X(i)X^{(i)} and X(j)X_{(j)} denote the ii-th row and jj-th column of XX, respectively. This problem is a special case of problem (1) with

f(X):=12WWXWF2,g(X):=λrowi=1nX(i)2+λcolj=1mX(j)2.f(X):=\frac{1}{2}\|W-WXW\|_{F}^{2},\quad g(X):=\lambda_{\text{row}}\sum_{i=1}^{n}\|X^{(i)}\|_{2}+\lambda_{\text{col}}\sum_{j=1}^{m}\|X_{(j)}\|_{2}.

In this case, the gradient of ff is given by \nabla f(X)=-W^{T}(W-WXW)W^{T} and is Lipschitz continuous with constant L=WTW2L=\|W^{T}W\|^{2}. However, the proximal operator of gg does not have a closed-form solution. Following the approach in Schmidt:2011 (see also Barre:2022 ; Zhou:2022 ), a triple (X~k,Vk,εk)(\tilde{X}_{k},V_{k},\varepsilon_{k}) satisfying (10)-(11) can be obtained using the Dykstra-like algorithm (Bauschke:2008c, Theorem 3.3) applied to the proximal subproblem

minXn×m12XZF2+λrowi=1nX(i)2+λcolj=1mX(j)2,\min_{X\in\mathbbm{R}^{n\times m}}\frac{1}{2}\|X-Z\|^{2}_{F}+{\lambda_{\rm row}}\sum_{i=1}^{n}\|X^{(i)}\|_{2}+{\lambda_{\rm col}}\sum_{j=1}^{m}\|X_{(j)}\|_{2}, (40)

where Z:=X_{k}-\nabla f(X_{k}). Starting from the initial points Z_{0}=Z, P_{0}=0, Q_{0}=0, the Dykstra-like algorithm generates the sequences

{Y=𝐩𝐫𝐨𝐱g1(Z+P)P+1=Z+PYand{Z+1=𝐩𝐫𝐨𝐱g2(Y+Q)Q+1=Y+QZ+1,\left\{\begin{array}[]{l}Y_{\ell}={\bf prox}_{g_{1}}(Z_{\ell}+P_{\ell})\\ P_{\ell+1}=Z_{\ell}+P_{\ell}-Y_{\ell}\end{array}\right.\quad\mbox{and}\quad\left\{\begin{array}[]{l}Z_{\ell+1}={\bf prox}_{g_{2}}(Y_{\ell}+Q_{\ell})\\ Q_{\ell+1}=Y_{\ell}+Q_{\ell}-Z_{\ell+1},\end{array}\right. (41)

where g1(X)=λcolj=1mX(j)2g_{1}(X)=\lambda_{\rm col}\sum_{j=1}^{m}\|X_{(j)}\|_{2} and g2(X)=λrowi=1nX(i)2g_{2}(X)=\lambda_{\rm row}\sum_{i=1}^{n}\|X^{(i)}\|_{2}. Recall that:

𝐩𝐫𝐨𝐱g1(X)(j)=max{1λcolX(j)2,0}X(j),for j=1,,m{\bf prox}_{g_{1}}(X)_{(j)}=\max\left\{1-\frac{\lambda_{\text{col}}}{\|X_{(j)}\|_{2}},0\right\}X_{(j)},\;\text{for }\quad j=1,\ldots,m

and

𝐩𝐫𝐨𝐱g2(X)(i)=max{1λrowX(i)2,0}X(i),for i=1,,n.{\bf prox}_{g_{2}}(X)^{(i)}=\max\left\{1-\frac{\lambda_{\text{row}}}{\|X^{(i)}\|_{2}},0\right\}X^{(i)},\;\text{for }\quad i=1,\ldots,n.
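A direct NumPy transcription of these two formulas (a minimal sketch, with a small guard against division by zero for all-zero rows or columns; the function names are ours) is:

```python
import numpy as np

def prox_col_group(X, lam_col):
    """Prox of g1(X) = lam_col * sum_j ||X_(j)||_2: shrink each column of X."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)                  # column norms, shape (1, m)
    scale = np.maximum(1.0 - lam_col / np.maximum(norms, 1e-16), 0.0)
    return scale * X

def prox_row_group(X, lam_row):
    """Prox of g2(X) = lam_row * sum_i ||X^(i)||_2: shrink each row of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)                  # row norms, shape (n, 1)
    scale = np.maximum(1.0 - lam_row / np.maximum(norms, 1e-16), 0.0)
    return scale * X
```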

Using now the definition of YY_{\ell}, as presented in (41), we have that Z+PYg1(Y)Z_{\ell}+P_{\ell}-Y_{\ell}\in\partial g_{1}(Y_{\ell}). This observation, combined with Proposition 2, yields

Z+PYϵg1(Z+1),whereϵ=g1(Z+1)g1(Y)Z+PY,Z+1Y0.Z_{\ell}+P_{\ell}-Y_{\ell}\in\partial_{\epsilon_{\ell}}g_{1}(Z_{\ell+1}),\quad\mbox{where}\quad\epsilon_{\ell}=g_{1}(Z_{\ell+1})-g_{1}(Y_{\ell})-\langle{Z_{\ell}+P_{\ell}-Y_{\ell}},{Z_{\ell+1}-Y_{\ell}}\rangle\geq 0.

On the other hand, it follows from the definition of Z+1Z_{\ell+1} in (41) that

Y+QZ+1g2(Z+1).Y_{\ell}+Q_{\ell}-Z_{\ell+1}\in\partial g_{2}(Z_{\ell+1}).

Thus, it follows from the last two inclusions and the definition of ZZ that

V_{\ell}:=Z_{\ell}+P_{\ell}+Q_{\ell}-Z\in\partial_{\epsilon_{\ell}}g(Z_{\ell+1})+Z_{\ell+1}-X_{k}+\nabla f(X_{k}).

Therefore, if the Dykstra-like algorithm is stopped when

g(Z_{\ell}-V_{\ell})-g(Z_{\ell})-\langle\nabla f(X_{k}),V_{\ell}\rangle+\frac{(1+\gamma_{1})}{2}\|V_{\ell}\|^{2}+(1+\gamma_{2})\varepsilon_{\ell}\leq\frac{(1-\tau-\alpha)}{2}\|X_{k}-Z_{\ell}\|^{2},

then the triple (X~k,Vk,εk):=(Z,V,ϵ)(\tilde{X}_{k},V_{k},\varepsilon_{k}):=(Z_{\ell},V_{\ell},\epsilon_{\ell}) satisfies (10)-(11).
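Building on the prox_col_group and prox_row_group sketches above, a minimal sketch of this inner solver could read as follows. For simplicity, the relative-error test and the returned point are evaluated at the updated iterate Z_{\ell+1}, the point at which the inclusion above is stated; this indexing choice, the gradient formula, and the iteration cap are ours.

```python
import numpy as np

def frob_inner(A, B):
    """Frobenius inner product <A, B>_F."""
    return float(np.sum(A * B))

def inexact_prox_dykstra(X_k, W, lam_row, lam_col,
                         tau=0.8, alpha=0.01, gamma1=1.1, gamma2=1.1, max_inner=5000):
    """Dykstra-like inner solver sketch for subproblem (40), returning a candidate
    triple (X_tilde, V, eps) for the relative criterion (10)-(11)."""
    g1 = lambda M: lam_col * np.linalg.norm(M, axis=0).sum()        # column-group term of g
    g_val = lambda M: lam_row * np.linalg.norm(M, axis=1).sum() + g1(M)
    grad_fX = W.T @ (W @ X_k @ W - W) @ W.T                         # gradient of f at X_k
    Z_target = X_k - grad_fX                                        # the point Z in (40)
    Z, P, Q = Z_target.copy(), np.zeros_like(X_k), np.zeros_like(X_k)
    X_t, V, eps = Z, np.zeros_like(X_k), 0.0
    for _ in range(max_inner):
        Y = prox_col_group(Z + P, lam_col)                          # Y_l = prox_{g1}(Z_l + P_l)
        P_new = Z + P - Y                                           # P_{l+1}
        Z_new = prox_row_group(Y + Q, lam_row)                      # Z_{l+1} = prox_{g2}(Y_l + Q_l)
        Q_new = Y + Q - Z_new                                       # Q_{l+1}
        eps = g1(Z_new) - g1(Y) - frob_inner(Z + P - Y, Z_new - Y)  # transportation residual (>= 0)
        V = Z + P + Q - Z_target                                    # candidate residual V_l
        X_t = Z_new                                                 # candidate x_tilde_k
        lhs = (g_val(X_t - V) - g_val(X_t) - frob_inner(grad_fX, V)
               + 0.5 * (1 + gamma1) * frob_inner(V, V) + (1 + gamma2) * eps)
        if lhs <= 0.5 * (1 - tau - alpha) * frob_inner(X_k - X_t, X_k - X_t):
            break                                                   # relative test of type (11) holds
        Z, P, Q = Z_new, P_new, Q_new
    return X_t, V, eps
```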

Considering that the IPG-ELS method integrates two effective strategies: (i) permitting the subproblem to be solved inexactly to meet a relative approximation criterion, and (ii) employing an explicit linesearch procedure that computes the proximal operator only once per iteration, our primary goal is to demonstrate that, in certain cases, the proposed method surpasses the proximal gradient method that employs only one of these strategies. Consequently, we compare the new algorithm with two alternative schemes: an exact version of the IPG-ELS method, denoted by PG-ELS, which corresponds to IPG-ELS with γ1=γ2=0\gamma_{1}=\gamma_{2}=0, θ=0.5\theta=0.5, τ=1\tau=1, and max{Vk,ϵk}1012\max\{\|V_{k}\|,\epsilon_{k}\}\leq 10^{-12}, replacing the inexact criterion in (11); and an IPG method with a fixed stepsize, corresponding to (Millan:2019, Algorithm 2) with αk=1/L\alpha_{k}=1/L and σ=0.9\sigma=0.9, where LL is the Lipschitz constant of f\nabla f. This algorithm is denoted by IPG-FixStep and is defined as

Xk+1=X~kVk,k0,X_{k+1}=\tilde{X}_{k}-V_{k},\quad\forall k\geq 0,

where X~k\tilde{X}_{k} and VkV_{k} satisfy

LVkεkg(X~k)+L(X~kXk)+f(Xk),Vk2+2εkL0.9XkX~k2.LV_{k}\in\partial_{\varepsilon_{k}}g(\tilde{X}_{k})+L(\tilde{X}_{k}-X_{k})+\nabla f(X_{k}),\quad\|V_{k}\|^{2}+\frac{2\varepsilon_{k}}{L}\leq 0.9\|X_{k}-\tilde{X}_{k}\|^{2}.

The initialization parameters for the IPG-ELS method were set as τ=0.8\tau=0.8, θ=0.5\theta=0.5, γ1=γ2=1.1\gamma_{1}=\gamma_{2}=1.1, and α=0.01\alpha=0.01. For all tests, we initialized X0=0n×mX_{0}=0\in\mathbbm{R}^{n\times m}, and set λrow=λcol=0.01\lambda_{\text{row}}=\lambda_{\text{col}}=0.01. The IPG-ELS method was executed for 101101 outer iterations to establish a baseline performance metric, F:=F(X101)F_{*}:=F(X_{101}). The other two algorithms were terminated as soon as F(xk)FF(x_{k})\leq F_{*}. The algorithms were evaluated on six datasets from Cano:2005 ; Guyon:2004 ; Team:2008 : Colon tumor (62×200062\times 2000), heart disease (303×14303\times 14), central nervous system (CNS) (60×712960\times 7129), lung cancer-Michigan (96×712996\times 7129), Secom (1567×5901567\times 590), and Cina0 (132×16033132\times 16033).

Each matrix WW was normalized to have a unit Frobenius norm, with an additional step of centering each column. Subsequently, the resulting matrices were multiplied by a constant, which plays a crucial role in controlling the Lipschitz constant of the gradient of ff. The experiments were conducted in Python on a machine equipped with a 3.5 GHz Dual-Core Intel Core i5 processor and 16 GB of 2400 MHz DDR4 memory.
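A possible preprocessing sketch for this setup is given below; the ordering of the centering and normalization steps and the scaling knob scale are assumptions made for illustration, and LL is computed as \|W^{T}W\|^{2} assuming the spectral norm.

```python
import numpy as np

def preprocess(W_raw, scale=1.0):
    """Center each column, normalize to unit Frobenius norm, then rescale by `scale`."""
    W = W_raw - W_raw.mean(axis=0, keepdims=True)  # center each column
    W = W / np.linalg.norm(W, 'fro')               # unit Frobenius norm
    W = scale * W                                  # larger scale -> larger Lipschitz constant
    L = np.linalg.norm(W.T @ W, 2) ** 2            # L = ||W^T W||^2 (spectral norm)
    return W, L
```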

In Tables 1 and 2, we report the Lipschitz constant of the gradient of ff (denoted as Lips), the final functional value F(Xk)F(X_{k}), the number of outer iterations (O-IT), the number of inner iterations (I-IT), the number of linesearch iterations (LS-IT), and the total running time in seconds (Time). The results indicate that, in terms of CPU times, the IPG-ELS method outperforms the other two methods. This efficiency can be attributed to two main factors: (i) the PG-ELS method requires a significantly higher number of inner iterations to solve the proximal subproblem “exactly”, and (ii) the IPG-FixStep method employs a small stepsize of 1/L1/L in the gradient step.

Problem Lips Method F(X_{k}) O-IT I-IT LS-IT Time
Colon Tumor 41.58 IPG-ELS 1.1056 101 195 318 28.88
PG-ELS 1.1056 104 751 324 36.19
IPG-FixStep 1.1056 1450 1451 - 171.72
Colon Tumor 665.32 IPG-ELS 2.3647 101 101 823 32.74
PG-ELS 2.3623 128 455 1038 44.02
IPG-FixStep 2.3647 1103 1104 - 97.72
Colon Tumor 5133.69 IPG-ELS 5.8989 101 101 1134 110.86
PG-ELS 5.8832 119 328 1324 116.26
IPG-FixStep 5.8981 586 587 - 138.92
Heart Disease 77.12 IPG-ELS 0.1732 101 178 539 1.46
PG-ELS 0.1732 119 876 603 4.28
IPG-FixStep 0.1732 487 488 - 3.52
Heart Disease 1233.99 IPG-ELS 0.3129 101 101 903 0.85
PG-ELS 0.3119 89 368 788 1.71
IPG-FixStep 0.3520 2001 2001 - 11.85
Heart Disease 9521.59 IPG-ELS 0.4995 101 101 1222 0.99
PG-ELS 0.4992 227 622 2690 5.46
IPG-FixStep 0.7242 2001 2001 - 43.92
CNS 41.95 IPG-ELS 0.9519 101 182 341 364.44
PG-ELS 0.9518 119 714 396 501.59
IPG-FixStep 0.9519 1217 1218 - 1675.05
CNS 671.17 IPG-ELS 2.1153 101 101 768 544.91
PG-ELS 2.1119 148 725 1110 878.39
IPG-FixStep 2.2193 2001 2001 - 2749.17
CNS 5178.80 IPG-ELS 6.0665 101 101 1130 1089.70
PG-ELS 6.0613 102 343 1138 1169.25
IPG-FixStep 6.0655 775 776 - 1677.93
Table 1: Performance of the IPG-ELS, PG-ELS and IPG-FixStep methods on 3 data sets.
Problem Lips Method F(X_{k}) O-IT I-IT LS-IT Time
Lung cancer 52.58 IPG-ELS 0.8985 101 179 443 422.07
PG-ELS 0.8984 105 837 457 521.46
IPG-FixStep 0.8985 678 679 - 962.80
Lung cancer 841.22 IPG-ELS 2.7632 101 101 845 591.92
PG-ELS 2.7623 131 768 1085 851.75
IPG-FixStep 2.7631 588 589 - 838.30
Lung cancer 2658.70 IPG-ELS 3.6391 101 101 992 696.61
PG-ELS 3.6378 161 740 1574 1173.05
IPG-FixStep 3.8819 2001 2001 - 2969.58
Secom 45.78 IPG-ELS 0.6438 101 175 373 66.64
PG-ELS 0.6438 99 9247 360 694.81
IPG-FixStep 0.6438 857 858 - 225.29
Secom 732.50 IPG-ELS 0.8587 101 101 779 82.81
PG-ELS 0.8586 102 4931 795 423.80
IPG-FixStep 0.8586 1662 1663 - 440.13
Secom 5652.06 IPG-ELS 1.6981 101 101 1138 104.54
PG-ELS 1.6899 125 1346 1388 215.08
IPG-FixStep 1.6977 801 802 - 211.93
Cina0 68.39 IPG-ELS 0.7972 101 245 487 3041.60
PG-ELS 0.7972 104 1490 483 5005.84
IPG-FixStep 0.7972 608 790 - 5530.44
Cina0 527.70 IPG-ELS 1.2817 101 250 838 3925.72
PG-ELS 1.2817 94 1789 693 5712.33
IPG-FixStep 1.2817 1010 1011 - 8741.06
Cina0 8443.20 IPG-ELS 3.6531 101 104 1126 3791.25
PG-ELS 3.6530 284 3083 3061 15338.18
IPG-FixStep 3.8493 2001 2001 - 17104.23
Table 2: Performance of the IPG-ELS, PG-ELS and IPG-FixStep methods on 3 data sets.

6 Conclusions

In this work, we present an inexact proximal gradient method for solving composite convex optimization problems. The method features a novel explicit linesearch that accepts a relative-error inexact solution of the proximal subproblem at each iteration. Our approach is primarily designed for splitting problems in which the objective function is the sum of a differentiable and a nondifferentiable convex function and the proximal operator cannot be computed analytically. Notably, the convergence of the proposed method is established without assuming Lipschitz continuity of the gradient of the smooth function. The method thus balances computational efficiency against the accuracy with which the proximal subproblem is solved, a common challenge in practice.

We established the convergence and iteration complexity of our method, supporting both its theoretical soundness and its practical utility. Numerical experiments demonstrate its applicability and efficiency: the method retains the expected convergence rates while efficiently handling relative-error inexact evaluations of the proximal operator. The numerical results indicate that the proposed method competes effectively with both the exact proximal gradient method and the inexact proximal gradient method with a fixed stepsize.

Acknowledgements.
YBC was partially supported by the NSF Grant DMS-2307328, and by an internal grant from NIU. MLNG was partially supported by CNPq Grants 405349/2021-1 and 304133/2021-3. JGM was partially supported by CNPq Grant 312223/2022-6.

References

  • (1) Adona, V. A., Gonçalves, M. L. N., and Melo, J. G. A Partially Inexact Proximal Alternating Direction Method of Multipliers and Its Iteration-Complexity Analysis. Journal of Optimization Theory and Applications 182, 2 (Aug. 2019), 640–666.
  • (2) Adona, V. A., Gonçalves, M. L. N., and Melo, J. G. An inexact proximal generalized alternating direction method of multipliers. Computational Optimization and Applications 76, 3 (July 2020), 621–647.
  • (3) Alves, M. M., Eckstein, J., Geremia, M., and Melo, J. G. Relative-error inertial-relaxed inexact versions of Douglas-Rachford and ADMM splitting algorithms. Computational Optimization and Applications 75, 2 (Mar. 2020), 389–422.
  • (4) Aujol, J.-F., and Dossal, Ch. Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM Journal on Optimization 25, 4 (2015), 2408–2433.
  • (5) Barré, M., Taylor, A., and Bach, F. A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives. Open Journal of Mathematical Optimization 3 (2022), 1–15.
  • (6) Bauschke, H. H., and Borwein, J. M. On Projection Algorithms for Solving Convex Feasibility Problems. SIAM Review 38, 3 (1996), 367–426.
  • (7) Bauschke, H. H., and Combettes, P. L. A Dykstra-like algorithm for two monotone operators. Pacific J. Optim. 4, 3 (2008), 382–391.
  • (8) Beck, A., and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2, 1 (2009), 183–202.
  • (9) Bello Cruz, J. Y. On Proximal Subgradient Splitting Method for Minimizing the sum of two Nonsmooth Convex Functions. Set-Valued and Variational Analysis 25, 2 (June 2017), 245–263.
  • (10) Bello Cruz, J. Y., and de Oliveira, W. On weak and strong convergence of the projected gradient method for convex optimization in real Hilbert spaces. Numerical Functional Analysis and Optimization 37, 2 (2016), 129–144.
  • (11) Bello Cruz, J. Y., and Nghia, T. T. A. On the convergence of the forward–backward splitting method with linesearches. Optimization Methods & Software 31, 6 (2016), 1209–1238.
  • (12) Bello-Cruz, Y., Gonçalves, M. L. N., and Krislock, N. On FISTA with a relative error rule. Computational Optimization and Applications 84, 2 (Mar. 2023), 295–318.
  • (13) Bello-Cruz, Y., Li, G., and Nghia, T. T. A. On the Linear Convergence of Forward–Backward Splitting Method: Part I—Convergence Analysis. Journal of Optimization Theory and Applications 188, 2 (Feb. 2021), 378–401.
  • (14) Bonettini, S., and Ruggiero, V. On the convergence of primal-dual hybrid gradient algorithms for total variation image restoration. Journal of Mathematical Imaging and Vision 44, 3 (2012), 236–253.
  • (15) Bredies, K. A forward-backward splitting algorithm for the minimization of non-smooth convex functionals in Banach space. Inverse Problems 25, 1 (2009), 015005, 20.
  • (16) Cano, A., Masegosa, A., and Moral, S. ELVIRA biomedical data set repository. https://leo.ugr.es/elvira/DBCRepository/, 2005.
  • (17) Combettes, P. L., and Wajs, V. R. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation 4, 4 (2005), 1168–1200.
  • (18) Eckstein, J., and Silva, P. J. S. A practical relative error criterion for augmented Lagrangians. Mathematical Programming 141, 1 (Oct. 2013), 319–348.
  • (19) Eckstein, J., and Yao, W. Approximate ADMM algorithms derived from Lagrangian splitting. Computational Optimization and Applications 68, 2 (Nov. 2017), 363–405.
  • (20) Eckstein, J., and Yao, W. Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Mathematical Programming 170, 2 (Aug. 2018), 417–444.
  • (21) Frank, M., and Wolfe, P. An algorithm for quadratic programming. Naval Research Logistics Quarterly 3, 1-2 (1956), 95–110.
  • (22) Guyon, I. UCI machine learning repository, 2004.
  • (23) Hiriart-Urruty, J.-B., and Lemaréchal, C. Convex Analysis and Minimization Algorithms II, vol. 306 of Grundlehren Der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1993.
  • (24) Jaggi, M. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In Proceedings of the 30th International Conference on Machine Learning (Feb. 2013), PMLR, pp. 427–435.
  • (25) Jiang, K., Sun, D., and Toh, K.-c. An Inexact Accelerated Proximal Gradient Method for Large Scale Linearly Constrained Convex SDP. SIAM Journal on Optimization 22, 3 (Jan. 2012), 1042–1064.
  • (26) Mairal, J., Jenatton, R., Obozinski, G., and Bach, F. Convex and network flow optimization for structured sparsity. Journal of Machine Learning Research 12, 81 (2011), 2681–2720.
  • (27) Millán, R. D., and Machado, M. P. Inexact proximal ε-subgradient methods for composite convex optimization problems. Journal of Global Optimization 75, 4 (Dec. 2019), 1029–1060.
  • (28) Monteiro, R. D. C., and Svaiter, B. F. On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM Journal on Optimization 20, 6 (2010), 2755–2787.
  • (29) Salzo, S. The variable metric forward-backward splitting algorithm under mild differentiability assumptions. SIAM Journal on Optimization 27, 4 (2017), 2153–2181.
  • (30) Salzo, S., Masecchia, S., Verri, A., and Barla, A. Alternating proximal regularized dictionary learning. Neural Computation 26, 12 (Dec. 2014), 2855–2895.
  • (31) Salzo, S., and Villa, S. Inexact and accelerated proximal point algorithms. J. Convex Anal. 19, 4 (2012), 1167–1192.
  • (32) Schmidt, M., Roux, N. L., and Bach, F. Convergence rates of inexact proximal-gradient methods for convex optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems (Red Hook, NY, USA, Dec. 2011), NIPS’11, Curran Associates Inc., pp. 1458–1466.
  • (33) Schuster, T., Kaltenbacher, B., Hofmann, B., and Kazimierski, K. S. Regularization Methods in Banach Spaces. De Gruyter, July 2012.
  • (34) Solodov, M. V., and Svaiter, B. F. A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Analysis 7, 4 (1999), 323–345.
  • (35) Team, C. A marketing dataset. http://www.causality.inf.ethz.ch/data/CINA.html, 2008.
  • (36) Villa, S., Salzo, S., Baldassarre, L., and Verri, A. Accelerated and inexact forward-backward algorithms. SIAM Journal on Optimization 23, 3 (2013), 1607–1633.
  • (37) Zhong, L. W., and Kwok, J. T. Efficient Sparse Modeling With Automatic Feature Grouping. IEEE Transactions on Neural Networks and Learning Systems 23, 9 (Sept. 2012), 1436–1447.
  • (38) Zhou, Q., and Pan, S. J. Convergence Analysis of Linear Coupling with Inexact Proximal Operator. In Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence (Aug. 2022), PMLR, pp. 2394–2403.