A mirror inertial forward-reflected-backward splitting: Global convergence and linesearch extension beyond convexity and Lipschitz smoothness

This work was supported by NSERC Discovery Grants and the JSPS KAKENHI grant number JP21K17710.
Abstract
This work investigates a Bregman and inertial extension of the forward-reflected-backward algorithm [Y. Malitsky and M. Tam, SIAM J. Optim., 30 (2020), pp. 1451–1472] applied to structured nonconvex minimization problems under relative smoothness. To this end, the proposed algorithm hinges on two key features: taking inertial steps in the dual space, and allowing for possibly negative inertial values. Our analysis begins by studying an associated envelope function that takes inertial terms into account through a novel product space formulation. Such a construction substantially differs from similar objects in the literature and could offer new insights for extensions of splitting algorithms. Global convergence and rates are obtained by appealing to the generalized concave Kurdyka-Łojasiewicz (KL) property, which allows us to describe a sharp upper bound on the total length of iterates. Finally, a linesearch extension is given to enhance the proposed method.
Keywords. Nonsmooth nonconvex optimization, forward-reflected-backward splitting, inertia, Bregman distance, relative smoothness.
AMS subject classifications. 90C26, 49J52, 49J53.
1 Introduction
Consider the following composite minimization problem

(P)    $\operatorname*{minimize}_{x \in \operatorname{cl} C} \; f(x) + g(x),$

where $C \subseteq \mathbb{R}^n$ is a nonempty open and convex set with closure $\operatorname{cl} C$, $f$ is differentiable on $C$, and $g$ is proper and lower semicontinuous (lsc). For notational brevity, we define $\varphi := f + g + \delta_{\operatorname{cl} C}$, with $\delta_{\operatorname{cl} C}$ denoting the indicator function of the set $\operatorname{cl} C$, namely, $\delta_{\operatorname{cl} C}(x) = 0$ if $x \in \operatorname{cl} C$ and $\delta_{\operatorname{cl} C}(x) = \infty$ otherwise. By doing so, problem (P) can equivalently be cast as the “unconstrained” minimization of $\varphi$ over $\mathbb{R}^n$.
Note that (P) is beyond the scope of traditional first-order methods that require global Lipschitz continuity of $\nabla f$ and the consequential descent lemma [9, Prop. A.24]; see, e.g., [3, 31, 24, 20, 21, 19] for such algorithms. To resolve this issue, a Lipschitz-like convexity condition was introduced in the seminal work [5], furnishing a descent lemma beyond the aforementioned setting. This notion was later referred to as relative smoothness (see Definition 2.1) and has played a central role in extending splitting algorithms to the setting of (P); see, e.g., [10, 13, 16, 23, 29, 35].
The goal of this paper is to propose a Bregman inertial forward-reflected-backward method FRB (Algorithm 1) for solving (P), which, roughly speaking, iterates
$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Bigl\{ g(x) + \langle 2\nabla f(x^k) - \nabla f(x^{k-1}), x\rangle + \tfrac{1}{\gamma}\bigl(h(x) - \langle \nabla h(x^k) + \alpha(\nabla h(x^k) - \nabla h(x^{k-1})), x\rangle\bigr) \Bigr\},$$
where $\gamma > 0$ is the stepsize, $\alpha$ is the inertial parameter, and $h$ is the kernel. In the convex case, the above scheme reduces to the inertial forward-reflected-backward method proposed in [25] when $h = \frac12\|\cdot\|^2$, which is, however, not applicable to (P) due to its assumption on Lipschitz continuity of $\nabla f$.
A fundamental tool in our analysis is the FRB-envelope (see Definition 3.4), which is the value function associated with the parametric minimization of a “model” of (P); see Section 3.1. The term “envelope” is borrowed from the celebrated Moreau envelope [28] and its relation with the proximal operator. Indeed, there has been renewed interest in employing an associated envelope function to study convergence of splitting methods, such as forward-backward splitting [38, 1], Douglas-Rachford splitting [37, 39], the alternating minimization algorithm [34], as well as the splitting scheme of Davis and Yin [22]. The aforementioned works share one common theme: regularity properties of the associated envelope function are used for further enhancements and deeper algorithmic insights.
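For reference, the classical construction that these envelopes generalize is the Moreau envelope and proximal mapping of a proper lsc function $g$ with parameter $\gamma > 0$:
$$g^{\gamma}(x) := \inf_{z \in \mathbb{R}^n} \Bigl\{ g(z) + \tfrac{1}{2\gamma}\|z - x\|^2 \Bigr\}, \qquad \operatorname{prox}_{\gamma g}(x) := \operatorname*{arg\,min}_{z \in \mathbb{R}^n} \Bigl\{ g(z) + \tfrac{1}{2\gamma}\|z - x\|^2 \Bigr\}.$$
Under prox-boundedness, $g^{\gamma}$ is real-valued and continuous while sharing infimum and minimizers with $g$; the FRB-envelope of Section 3.2 plays the analogous role for the model underlying FRB.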
Pursuing the same pattern, in addition to studying global convergence of the proposed algorithm in Section 4 by means of the FRB-envelope, we will showcase in Section 5 how it offers a versatile globalization framework for fast local methods in the full generality of Assumption I. This framework reveals the necessity of interleaving noninertial trial steps in the globalization strategy, an observation that led to a substantial change in the linesearch strategy with respect to related works.
A major departure of our analysis from previous work is that we consider an envelope function with two independent variables, allowing us to take inertial terms into account. In this regard, we believe that our methodology is appealing in its own right, as it can be instrumental for deriving inertial extensions of other splitting methods. Another notable feature of this work is that, as one shall see in Section 3.4, a nonpositive inertial parameter is required for the sake of convergence under relative smoothness. This result, although more pessimistic, aligns with the recent work [14] regarding the impossibility of an accelerated Bregman forward-backward method under the same assumption; see Remark 3.7 for a detailed discussion. Our work differs from the analysis carried out in [40], which also deals with an inertial forward-reflected-backward algorithm using Bregman metrics but is still limited to the Lipschitz smoothness assumption. The game changer that enables us to cope with relative smoothness is taking the inertial step in the dual space, that is, interpolating applications of $\nabla h$ (cf. step 4 of Algorithm 1), whence the name, inspired by [8], mirror inertial forward-reflected-backward splitting (FRB).
The rest of the paper is structured as follows. In the remainder of the section we formally define the problem setting and the proposed algorithm, and in Section 2 we discuss some preliminary material and notational conventions. Section 3 introduces the FRB-envelope and an associated merit function; the proof of the main result therein is deferred to Appendix A. The convergence analysis is carried out in Section 4, and finally Section 5 presents the FRB-based globalization framework. Section 6 draws some concluding remarks.
1.1 Problem setting and proposed algorithm
Throughout, we fix a Legendre kernel $h$ with $C = \operatorname{int} \operatorname{dom} h$, namely, a proper, lsc, and strictly convex function that is 1-coercive and essentially smooth, i.e., such that $\|\nabla h(x^k)\| \to \infty$ for every sequence $(x^k)_{k \in \mathbb{N}} \subseteq C$ converging to a boundary point of $\operatorname{dom} h$. We will consider the following iterative scheme for addressing problem (P), where $\mathrm{D}_h$ denotes the Bregman distance induced by $h$, defined as

(1.1)    $\mathrm{D}_h(x, y) := \begin{cases} h(x) - h(y) - \langle \nabla h(y), x - y\rangle & \text{if } y \in C, \\ \infty & \text{otherwise.} \end{cases}$
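To make the scheme concrete, the following is a minimal Python sketch of the iteration displayed in the introduction, specialized to the Euclidean kernel $h = \frac12\|\cdot\|^2$, for which $\nabla h$ is the identity and the backward step is a classical proximal step. The function names and the test problem are our own illustration, not the paper's Algorithm 1 listing; for a general kernel the last step becomes the Bregman proximal subproblem $\operatorname{arg\,min}_x\{\gamma g(x) + \langle \gamma v^k - z^k, x\rangle + h(x)\}$.

```python
import numpy as np

def frb_euclidean(grad_f, prox_g, x0, gamma, alpha, iters=200):
    """Illustrative mirror inertial FRB iteration for h = 0.5*||x||^2.

    For this kernel grad h is the identity, so the dual-space interpolation
    z^k = grad_h(x^k) + alpha*(grad_h(x^k) - grad_h(x^{k-1})) reduces to
    ordinary extrapolation, and the backward step is prox_{gamma*g}(z - gamma*v).
    """
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        v = 2.0 * grad_f(x) - grad_f(x_prev)   # forward-reflected gradient term
        z = x + alpha * (x - x_prev)           # inertial step (dual space = primal here)
        x_prev, x = x, prox_g(z - gamma * v, gamma)
    return x

# Usage on min 0.5*||A x - b||^2 + ||x||_1 with a (negative) inertial value.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda y, t: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)  # soft-threshold
L = np.linalg.norm(A.T @ A, 2)                 # Lipschitz modulus of grad f
x_sol = frb_euclidean(grad_f, prox_g, np.zeros(2), gamma=0.2 / L, alpha=-0.1)
```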
Note that Algorithm 1 takes its inertial step in the dual space, hence the abbreviation FRB. We will work under the following assumptions.
Assumption I.
The following hold in problem (P):
1. $f$ is smooth relative to $h$ (see Section 2.2).
2. $g$ is proper and lsc.
3. $\inf \varphi > -\infty$.
4. For any $\gamma > 0$ small enough and any reference points in $C$, the set of minimizers defining the $x$-update of Algorithm 1 is contained in $C$.
As will be made explicit in Lemma 3.2, Assumption 4 is a requirement ensuring that Algorithm 1 is well defined. Note that, in general, the minimizers therein form a (possibly empty) subset of $\operatorname{cl} C$; Assumption 4 thus only excludes points on the boundary of $C$. This standard requirement is trivially satisfied when $\operatorname{dom} h$ is open (in which case $\operatorname{dom} h = C$), or more generally when constraint qualifications enabling a subdifferential calculus rule on the boundary are met, as is the case when $g$ is convex.
Remark 1.1 (constraint qualifications for Assumption 4).
If $g + \delta_{\operatorname{cl} C}$ is proper and lsc, Assumption 4 is satisfied if the subdifferential sum rule holds everywhere (this condition being automatically guaranteed at all points outside the boundary of $C$, $\partial h$ having empty values outside $\operatorname{dom} h$ and being a singleton in its interior). Indeed, optimality of a minimizer implies nonemptiness of the subdifferential of the sum there, with the inclusion holding by [33, Cor. 10.9] and implying nonemptiness of $\partial h$, which by essential smoothness of $h$ can only happen in $C$; see Section 2.1 for the definitions of subdifferentials. ∎
Regardless, it will be shown in Lemma 2.4 that existence of minimizers is guaranteed for small enough values of $\gamma$, which will then be linked in Lemma 3.2 to the well definedness of Algorithm 1.
2 Preliminaries
2.1 Notation
The extended-real line is denoted by $\overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$. The positive and negative parts of $r \in \mathbb{R}$ are respectively defined as $[r]_+ := \max\{r, 0\}$ and $[r]_- := \max\{-r, 0\}$, so that $r = [r]_+ - [r]_-$.
The distance of a point $x$ to a nonempty set $S \subseteq \mathbb{R}^n$ is given by $\operatorname{dist}(x, S) := \inf_{z \in S} \|z - x\|$. The interior, closure, and boundary of $S$ are respectively denoted as $\operatorname{int} S$, $\operatorname{cl} S$, and $\operatorname{bdry} S$. The indicator function of $S$ is defined as $\delta_S(x) = 0$ if $x \in S$ and $\delta_S(x) = \infty$ otherwise.
A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is proper if $f \not\equiv \infty$ and $f > -\infty$, in which case its domain is defined as the set $\operatorname{dom} f := \{x : f(x) < \infty\}$. For $\beta \in \mathbb{R}$, $[f \le \beta]$ denotes the $\beta$-sublevel set of $f$; $[f \ge \beta]$ and alike are defined accordingly. We say that $f$ is level bounded (or coercive) if $\lim_{\|x\|\to\infty} f(x) = \infty$, and 1-coercive if $\lim_{\|x\|\to\infty} f(x)/\|x\| = \infty$. A point $x_\star$ is a local minimum for $f$ if $f(x) \ge f(x_\star)$ holds for all $x$ in a neighborhood of $x_\star$. If the inequality can be strengthened to $f(x) \ge f(x_\star) + \tfrac{\mu}{2}\|x - x_\star\|^2$ for some $\mu > 0$, then $x_\star$ is a strong local minimum. The convex conjugate of $f$ is denoted as $f^*(y) := \sup_x \{\langle y, x\rangle - f(x)\}$. Given $x \in \operatorname{dom} f$, $\partial f(x)$ denotes the Mordukhovich (limiting) subdifferential of $f$ at $x$, given by
$$\partial f(x) := \bigl\{ v : \exists (x^k, v^k) \to (x, v) \text{ with } v^k \in \hat\partial f(x^k) \text{ and } f(x^k) \to f(x) \bigr\},$$
and $\hat\partial f(x)$ is the set of regular subgradients of $f$ at $x$, namely vectors $v$ such that $\liminf_{z \to x,\, z \ne x} \frac{f(z) - f(x) - \langle v, z - x\rangle}{\|z - x\|} \ge 0$; see, e.g., [33, 27]. $\mathcal{C}^k(E)$ is the set of functions which are $k$ times continuously differentiable on $E$. We write $\mathcal{C}^k$ if $E$ is clear from context. The notation $T : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ indicates a set-valued mapping, whose domain and graph are respectively defined as $\operatorname{dom} T := \{x : T(x) \ne \emptyset\}$ and $\operatorname{gph} T := \{(x, y) : y \in T(x)\}$.
2.2 Relative smoothness and weak convexity
In the following definition, as well as throughout the entire paper, we follow the extended-real convention $\infty - \infty = \infty$ to resolve possible ill definitions of differences of extended-real-valued functions. Similarly, we will adopt the convention that $\frac{0}{0} = 0$.
Definition 2.1 (relative smoothness).
We say that an lsc function $f$ is smooth relative to $h$ if $f$ is differentiable on $C$ and there exists a constant $L_f \ge 0$ such that $L_f h - f$ and $L_f h + f$ are convex functions on $C$. We may alternatively say that $f$ is $L_f$-smooth relative to $h$ to make the smoothness modulus explicit.
Note that the constant $L_f$ may be loose. For instance, if $f$ is convex, then $L h + f$ is convex for any $L \ge 0$. This motivates us to consider one-sided conditions and treat $f$ and $-f$ separately.
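As a one-dimensional illustration (our own, in the spirit of the quartic problems considered in [10]), take $f(x) = \frac14 x^4$ and the kernel $h(x) = \frac14 x^4 + \frac12 x^2$: then
$$h - f = \tfrac12 x^2 \qquad\text{and}\qquad h + f = \tfrac12 x^4 + \tfrac12 x^2$$
are both convex, so $f$ is 1-smooth relative to $h$, even though $\nabla f$ is not globally Lipschitz ($f''(x) = 3x^2$ is unbounded). Moreover, since $f$ is convex, one side of the bound is loose: $L h + f$ is convex for every $L \ge 0$, as remarked above.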
Definition 2.2 (relative weak convexity).
We say that an lsc function $f$ is weakly convex relative to $h$ if there exists a (possibly negative) constant $\sigma_f$ such that $f - \sigma_f h$ is a convex function. We may alternatively say that $f$ is $\sigma_f$-weakly convex relative to $h$ to make the weak convexity modulus explicit.
Note that relative smoothness of $f$ is equivalent to relative weak convexity of both $f$ and $-f$. Indeed, if $f$ is $L_f$-smooth relative to $h$, then both $f$ and $-f$ are $(-L_f)$-weakly convex. Conversely, if $f$ and $-f$ are $\sigma_f$- and $\sigma_{-f}$-weakly convex relative to $h$, respectively, then $f$ (as well as $-f$) is $L_f$-smooth relative to $h$ with

(2.1)    $L_f = \max\{-\sigma_f,\, -\sigma_{-f}\}.$
The relative weak convexity moduli $\sigma_f$ and $\sigma_{-f}$ will be henceforth adopted when referring to Assumption 1. It will be convenient to normalize these quantities into pure numbers

(2.2)    $p_f := \sigma_f / L_f \qquad\text{and}\qquad p_{-f} := \sigma_{-f} / L_f.$
To make all definitions and implications well posed, we will ignore the uninteresting case in which $f$ is affine, in which case $f$ is $L$-smooth relative to $h$ for any $L \ge 0$. The comment below will be instrumental in Section 3.4.
Remark 2.3.
Invoking (2.1) and (2.2) yields that

(2.3)    $-2 \le p_f + p_{-f} \le 0 \qquad\text{and}\qquad p_f, p_{-f} \in [-1, 1],$

where the second inequality owes to the fact that, by definition, both $f - \sigma_f h$ and $-f - \sigma_{-f} h$ are convex functions, and therefore so is their sum $-(\sigma_f + \sigma_{-f}) h$. Thus, whenever $f$ is convex (resp. concave), one can take $\sigma_f = 0$ (resp. $\sigma_{-f} = 0$), and by virtue of the inclusion in (2.3) it directly follows that $p_{-f} \le 0$ (resp. $p_f \le 0$). ∎
We now turn to a lemma that guarantees well definedness of Algorithm 1.
Lemma 2.4 (relative prox-boundedness).
Suppose that Assumption I holds. Then, the set of minimizers as in Assumption 4 is nonempty for any $\gamma \in (0, 1/L_f)$ and any reference points in $C$. In other words, the problem is prox-bounded relative to $h$ with threshold $1/L_f$ [18, Def. 2.3].

Proof.
Recall that a proper convex function admits an affine minorant; see, e.g., [7, Cor. 16.18]. Applying this to the convex function $L_f h - f$ yields $f \le L_f h + \langle a, \cdot\rangle + b$ on $\operatorname{cl} C$ for some $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, so that $g \ge \inf\varphi - f \ge \inf\varphi - L_f h - \langle a, \cdot\rangle - b$ on $\operatorname{cl} C$. It then follows that the minimand is 1-coercive, being lower bounded by $(\tfrac1\gamma - L_f) h$ plus an affine function, with $h$ 1-coercive. ∎
We point out that the content of this subsection is a generalization of the well-known equivalence between smoothness relative to the Euclidean kernel and Lipschitz differentiability, a fact that will be invoked in Section 3 and whose proof is given next for the sake of completeness.
Lemma 2.5 (Lipschitz smoothness from weak convexity).
Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable. Then, the following are equivalent:
1. there exist $\sigma_f, \sigma_{-f} \in \mathbb{R}$ such that both $f - \tfrac{\sigma_f}{2}\|\cdot\|^2$ and $-f - \tfrac{\sigma_{-f}}{2}\|\cdot\|^2$ are proper, convex and lsc;
2. there exist $\sigma_f, \sigma_{-f} \in \mathbb{R}$ such that for all $x, y \in \mathbb{R}^n$ it holds that $\sigma_f \|x - y\|^2 \le \langle \nabla f(x) - \nabla f(y), x - y\rangle \le -\sigma_{-f}\|x - y\|^2$;
3. $\nabla f$ is $\ell_f$-Lipschitz continuous for some $\ell_f \ge 0$.
In particular, 1 and/or 2 imply 3 with $\ell_f = \max\{-\sigma_f, -\sigma_{-f}\}$, and conversely 3 implies 1 and 2 with $\sigma_f = \sigma_{-f} = -\ell_f$.
Proof.
2 ⇒ 3: Invoking again [33, Ex. 12.28(c)] yields that both $f - \tfrac{\sigma_f}{2}\|\cdot\|^2$ and $-f - \tfrac{\sigma_{-f}}{2}\|\cdot\|^2$ are lower-$\mathcal{C}^1$ in the sense of [33, Def. 10.29], and continuous differentiability then follows from [33, Prop. 10.30]. Thus, in assertion 2, denoting $q := f - \tfrac{\sigma_f}{2}\|\cdot\|^2$, the function $q$ satisfies
$$0 \le \langle \nabla q(x) - \nabla q(y), x - y\rangle \le (-\sigma_f - \sigma_{-f})\|x - y\|^2 \quad\text{for all } x, y.$$
It then follows from [30, Thm. 2.1.5] that $q$ is convex and $(-\sigma_f - \sigma_{-f})$-Lipschitz differentiable, and that
$$\|\nabla q(x) - \nabla q(y)\|^2 \le (-\sigma_f - \sigma_{-f})\,\langle \nabla q(x) - \nabla q(y), x - y\rangle.$$
Using the definition of $q$ and expanding the squares yields $\|\nabla f(x) - \nabla f(y)\| \le \max\{-\sigma_f, -\sigma_{-f}\}\|x - y\|$, proving that $\nabla f$ is $\ell_f$-Lipschitz continuous.
3 Algorithmic analysis toolbox
In the literature, convergence analysis for nonconvex splitting algorithms typically revolves around the identification of a ‘Lyapunov potential’, namely, a lower bounded function that decreases its value along the iterates. In this section we will pursue this direction. We will actually go one step further in identifying a function which, in addition to serving as Lyapunov potential, is also continuous. This property, whose utility will be showcased in Section 5, will come as the result of a parametric minimization, as discussed in the following two subsections.
In what follows, to simplify the discussion we introduce

(3.1)    $h_\gamma := h - \gamma f,$

and observe that $h_\gamma$ too is a Legendre kernel for $\gamma$ small enough.
Lemma 3.1 ([1, Thm. 4.1]).
Suppose that Assumption 1 holds. Then, for every $\gamma \in (0, 1/L_f)$ the function $h_\gamma = h - \gamma f$ is a Legendre kernel with $\operatorname{dom} h_\gamma = \operatorname{dom} h$.
We will also (ab)use the notation $\mathrm{D}_\psi$ of the Bregman distance (1.1) for functions $\psi$ differentiable on $C$ that are not necessarily convex, thereby possibly having negative values. This notational abuse is justified by the fact that all algebraic identities of the Bregman distance used in the manuscript (e.g., the three-point identity [12, Lem. 3.1]) are valid regardless of whether $\psi$ is convex or not, and will overall yield a major simplification of the math.
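For ease of reference, the three-point identity of [12] reads: for every $x \in \operatorname{dom} h$ and $y, z \in C$,
$$\mathrm{D}_h(x, z) - \mathrm{D}_h(x, y) - \mathrm{D}_h(y, z) = \langle \nabla h(z) - \nabla h(y),\, y - x\rangle.$$
As is immediate from expanding the definitions, the identity only uses differentiability of $h$ at $y$ and $z$, and not convexity, which is precisely what licenses the notational abuse above.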
3.1 Parametric minimization model
As a first step towards the desired goals, as well as to considerably simplify the discussion, we begin by observing that the FRB-update is the result of a parametric minimization. Namely, by introducing the “model”

(3.2a)    $\mathcal{M}_{\gamma,\alpha}(x; y, y^-) := f(y) + \langle 2\nabla f(y) - \nabla f(y^-),\, x - y\rangle + g(x) + \tfrac1\gamma \mathrm{D}_h(x, y) - \tfrac\alpha\gamma \langle \nabla h(y) - \nabla h(y^-),\, x - y\rangle$
(3.2b)    $\phantom{\mathcal{M}_{\gamma,\alpha}(x; y, y^-) :}= f(y) + \langle 2\nabla f(y) - \nabla f(y^-),\, x - y\rangle + g(x) + \tfrac1\gamma \bigl( h(x) - h(y) - \langle \nabla h(y) + \alpha(\nabla h(y) - \nabla h(y^-)),\, x - y\rangle \bigr),$

observe that the $x$-update in FRB can be compactly expressed as

(3.3a)    $x^{k+1} \in \mathrm{T}_{\gamma,\alpha}(x^k, x^{k-1}),$

where $\mathrm{T}_{\gamma,\alpha} : C \times C \rightrightarrows \mathbb{R}^n$, defined by

(3.3b)    $\mathrm{T}_{\gamma,\alpha}(y, y^-) := \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \mathcal{M}_{\gamma,\alpha}(x; y, y^-),$
is the FRB-operator with stepsize $\gamma$ and inertial parameter $\alpha$. The fact that $\mathrm{T}_{\gamma,\alpha}$ maps pairs in $C \times C$ to subsets of $C$ is a consequence of Assumption 4, as we are about to formalize in Item 3. Note that many models can be defined giving rise to the same operator, and all these differ by additive terms which are constant with respect to $x$. Among these, the one given in (3.2b) reflects the tangency condition $\mathcal{M}_{\gamma,\alpha}(y; y, y) = \varphi(y)$ for every $y \in C$. A consequence of this fact and other basic properties are summarized next.
Lemma 3.2 (basic properties of the model $\mathcal{M}_{\gamma,\alpha}$ and the operator $\mathrm{T}_{\gamma,\alpha}$).
Suppose that Assumption I holds, and let $\gamma \in (0, 1/L_f)$ and $\alpha \in \mathbb{R}$ be fixed. The following hold:
1. $\mathcal{M}_{\gamma,\alpha}(y; y, y) = \varphi(y)$ for all $y \in C$.
2. $\mathcal{M}_{\gamma,\alpha}$ is level bounded in $x$ locally uniformly in $(y, y^-)$ on $C \times C$ (namely, $\bigcup_{(y,y^-) \in K} \{x : \mathcal{M}_{\gamma,\alpha}(x; y, y^-) \le r\}$ is bounded for any compact $K \subseteq C \times C$ and $r \in \mathbb{R}$).
3. $\mathrm{T}_{\gamma,\alpha}$ is locally bounded and osc (being defined on $C \times C$, osc and local boundedness are meant relative to $C \times C$: namely, $\operatorname{gph} \mathrm{T}_{\gamma,\alpha}$ is closed relative to $C \times C \times \mathbb{R}^n$, and $\mathrm{T}_{\gamma,\alpha}(K)$ is bounded for any compact $K \subseteq C \times C$), and $\mathrm{T}_{\gamma,\alpha}(y, y^-)$ is a nonempty and compact subset of $C$ for any $(y, y^-) \in C \times C$.
4. $\nabla f(\bar x) - 2\nabla f(y) + \nabla f(y^-) + \tfrac1\gamma\bigl(\nabla h(y) - \nabla h(\bar x) + \alpha(\nabla h(y) - \nabla h(y^-))\bigr) \in \hat\partial\varphi(\bar x)$ for any $(y, y^-) \in C \times C$ and $\bar x \in \mathrm{T}_{\gamma,\alpha}(y, y^-)$.
5. If $\bar x \in \mathrm{T}_{\gamma,\alpha}(\bar x, \bar x)$, then $0 \in \hat\partial\varphi(\bar x)$ and $\mathcal{M}_{\gamma,\alpha}(x; \bar x, \bar x) \ge \varphi(\bar x)$ for every $x$.
Proof.
We start by observing that Lemma 2.4 ensures that the set $\mathrm{T}_{\gamma,\alpha}(y, y^-)$ is nonempty for any $(y, y^-) \in C \times C$; this follows by considering the expression (3.2a) of the model and by observing that, for any $(y, y^-) \in C \times C$, it differs from the minimand in Assumption 4 by terms affine in $x$. For the same reason, it then follows from Assumption 4 that $\mathrm{T}_{\gamma,\alpha}(y, y^-) \subseteq C$.
Remark 3.3 (inertial effect).
Letting $\tilde f := f + \tfrac{c}{\gamma} h$ and $\tilde g := g - \tfrac{c}{\gamma} h$ for some $c \in \mathbb{R}$ gives an alternative decomposition of $\varphi$ which still complies with Assumption I. Relative to this decomposition, it is easy to verify that the corresponding model is related to the original one in (3.2a), and in particular FRB steps with the respective parameters coincide. The effect of inertia can then be explained as a redistribution of multiples of $h$ among $f$ and $g$ in the problem formulation. ∎
3.2 The FRB-envelope
Having defined the model $\mathcal{M}_{\gamma,\alpha}$ and its solution mapping $\mathrm{T}_{\gamma,\alpha}$ resulting from parametric minimization, we now introduce the associated value function, which we name FRB-envelope.
Definition 3.4 (FRB-envelope).
The envelope function associated to FRB with stepsize $\gamma$ and inertia $\alpha$ is defined as

(3.4)    $\varphi^{\rm FRB}_{\gamma,\alpha}(y, y^-) := \inf_{x \in \mathbb{R}^n} \mathcal{M}_{\gamma,\alpha}(x; y, y^-).$
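As a sanity check of the value-function construction, the snippet below evaluates (3.4) numerically in one dimension for the model form (3.2a) with Euclidean kernel; the brute-force inner minimization and the specific $f$, $g$ are our own illustrative choices.

```python
import numpy as np

# 1-D illustration of the FRB-envelope (3.4) with Euclidean kernel h(x) = x^2/2,
# for which D_h(x, y) = (x - y)^2/2 and grad h is the identity.
f  = lambda x: 0.5 * x**2          # smooth part
df = lambda x: x                   # its derivative
g  = lambda x: np.abs(x)           # nonsmooth part

def model(x, y, y_minus, gamma, alpha):
    """Model (3.2a): linearization of f at y, plus g, Bregman and inertial terms."""
    linearization = f(y) + (2.0 * df(y) - df(y_minus)) * (x - y)
    bregman = (x - y) ** 2 / (2.0 * gamma)
    inertial = -(alpha / gamma) * (y - y_minus) * (x - y)
    return linearization + g(x) + bregman + inertial

def frb_envelope(y, y_minus, gamma, alpha, grid=np.linspace(-5.0, 5.0, 20001)):
    """Value function (3.4), computed by brute force over a fine grid."""
    return model(grid, y, y_minus, gamma, alpha).min()

# Continuity in (y, y^-) and tightness at a fixed point: at y = y^- = 0 the
# inner minimizer is x = 0, so the envelope equals phi(0) = f(0) + g(0) = 0.
print(frb_envelope(0.8, 1.0, gamma=0.5, alpha=-0.1))
print(frb_envelope(0.0, 0.0, gamma=0.5, alpha=-0.1))
```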
Lemma 3.5 (basic properties of $\varphi^{\rm FRB}_{\gamma,\alpha}$).
Suppose that Assumption I holds. Then, for any $\gamma \in (0, 1/L_f)$ and $\alpha \in \mathbb{R}$ the following hold:
1. $\varphi^{\rm FRB}_{\gamma,\alpha}$ is (real-valued and) continuous on $C \times C$; in fact, it is locally Lipschitz provided that $f$ and $h$ are $\mathcal{C}^2$ on $C$.
2. For any $(y, y^-) \in C \times C$ and $\bar x \in \mathrm{T}_{\gamma,\alpha}(y, y^-)$, $\varphi^{\rm FRB}_{\gamma,\alpha}(y, y^-) = \mathcal{M}_{\gamma,\alpha}(\bar x; y, y^-)$.
3. $\varphi^{\rm FRB}_{\gamma,\alpha}(y, y) \le \varphi(y)$ for any $y \in C$.
Proof.
1: In light of the uniform level boundedness asserted in Item 2, continuity of $\varphi^{\rm FRB}_{\gamma,\alpha}$ follows from [33, Thm. 1.17(c)] by observing that the mapping $(y, y^-) \mapsto \mathcal{M}_{\gamma,\alpha}(x; y, y^-)$ is continuous for every $x$; in fact, when $f$ and $h$ are both $\mathcal{C}^2$ on $C$, its gradient exists and is continuous with respect to all its arguments, which together with local boundedness of $\mathrm{T}_{\gamma,\alpha}$, cf. Item 3, gives that $\varphi^{\rm FRB}_{\gamma,\alpha}$ is a lower-$\mathcal{C}^1$ function in the sense of [33, Def. 10.29], and in particular locally Lipschitz continuous by virtue of [33, Thm.s 10.31 and 9.2].
3.3 Establishing a merit function
We now work towards establishing a merit function for FRB, starting from a comparison of the values of $\varphi^{\rm FRB}_{\gamma,\alpha}$ at the pairs $(x^{k+1}, x^k)$ and $(x^k, x^{k-1})$, with $x^{k+1} \in \mathrm{T}_{\gamma,\alpha}(x^k, x^{k-1})$. Owing to Item 3, we have

(3.5)

From here two separate cases can be considered, each yielding surprisingly different results. The watershed lies in whether the bracketed term is positive or not: one case will result in a very straightforward convergence analysis in the full generality of Assumption I, while the other will necessitate an additional Lipschitz differentiability requirement. The convergence analysis in both cases revolves around the identification of a constant $c$ determining a lower bounded merit function

(3.6)    $\mathcal{L}_{\gamma,\alpha}(y, y^-) := \varphi^{\rm FRB}_{\gamma,\alpha}(y, y^-) + c\, \mathrm{D}_{\tilde h}(y^-, y).$

The difference between the two cases is determined by the function $\tilde h$ appearing in the last Bregman operator $\mathrm{D}_{\tilde h}$, having $\tilde h = h$ in the former case and $\tilde h$ augmented by a multiple of $\mathrm{j}$ in the latter, where $\ell_f$ is a Lipschitz constant for $\nabla f$ and

(3.7)    $\mathrm{j} := \tfrac12 \|\cdot\|^2$
is the squared Euclidean norm. The two cases are stated in the next theorem, which constitutes the main result of this section. Special and worst-case scenarios leading to simplified statements will be given in Section 3.4. In what follows, patterning the normalization of $(\sigma_f, \sigma_{-f})$ into $(p_f, p_{-f})$ detailed in Section 2.2, we also introduce the scaled stepsize

(3.8)    $\lambda := \gamma L_f,$

which as a result of the convergence analysis will be confined to a subinterval of $(0, 1)$.
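For orientation, note that in the Euclidean noninertial specialization $h = \mathrm{j}$ and $\alpha = 0$ (so that $L_f$ can be taken as the Lipschitz modulus $\ell_f$ of $\nabla f$), the stepsize condition of the original method [25] reads
$$\gamma \in \Bigl(0, \tfrac{1}{2\ell_f}\Bigr), \qquad\text{i.e.,}\qquad \lambda = \gamma\,\ell_f \in \bigl(0, \tfrac12\bigr),$$
which gives a sense of the scale of the normalization (3.8).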
Theorem 3.6.
Suppose that Assumption I holds and consider one of the following scenarios:
1. either  is convex (i.e., ) and , in which case ;
2. or $f$ is $\ell_f$-Lipschitz differentiable, $h$ is strongly convex, and , in which case .
Then, for $\mathcal{L}_{\gamma,\alpha}$ as in (3.6) the following assertions hold:
1. For every $(y, y^-) \in C \times C$ and $x \in \mathrm{T}_{\gamma,\alpha}(y, y^-)$, inequalities (3.9a) and (3.9b) hold.
2. $\inf \mathcal{L}_{\gamma,\alpha} = \inf \varphi$, and $\mathcal{L}_{\gamma,\alpha}$ is level bounded iff so is $\varphi$.
The proof of this result is detailed in the dedicated Appendix A; before that, let us draw some comments. As clarified in the statement of scenario 1, the convexity required therein can be enforced by suitably choosing $\gamma$ and $\alpha$ without imposing additional requirements on the problem. However, an unusual yet reasonable condition on the inertial parameter may be necessary.
Remark 3.7.
In order to furnish scenario 1, we shall see soon that $\alpha \le 0$ may be required; see Section 3.4. Such an assumption, although more pessimistic, coincides with a recent conjecture by Dragomir et al. [14, §4.5.3], which states that inertial methods with nonadaptive coefficients fail to converge in the relative smoothness setting, and provides an alternative perspective on the same matter through the lens of the convexity required in scenario 1. ∎
Unlike scenario 1, however, additional assumptions are needed for the Lipschitz differentiable case of scenario 2. This is because the requirement is equivalent to smoothness relative to the Euclidean Bregman kernel $\mathrm{j}$, while Assumption I prescribes bounds only relative to $h$. For simplicity, for functions $\psi_1, \psi_2$ we write $\psi_1 \succeq \psi_2$ to indicate that $\psi_1 - \psi_2$ is convex.
Remark 3.8.
Under Assumption I, one has that $f$ is Lipschitz differentiable with modulus $\ell_f$ under either one of the following conditions:
1. either $\nabla h$ is $\ell_h$-Lipschitz, in which case $\ell_f = L_f \ell_h$,
2. or $\nabla f$ is $\ell_f$-Lipschitz, in which case .
Recalling that , the second condition is tautological. In case $\nabla h$ is $\ell_h$-Lipschitz, Lemma 2.5 yields that
 and that consequently
$\ell_f = L_f \ell_h$
is a Lipschitz modulus for $\nabla f$. ∎
3.4 Simplified bounds
In this section we provide bounds that only discern whether $f$ is convex, concave, or neither of the above. As discussed in Remark 2.3, these cases can be recovered by suitable combinations of the coefficients $p_f, p_{-f}$, and thus lead to easier, though possibly looser, bounds compared to those in Theorem 3.6. We will also avail ourselves of the estimates of $\ell_f$ in Remark 3.8 to discuss the cases in which $\nabla f$ is Lipschitz continuous.
Without distinguishing between upper and lower relative bounds, whenever $f$ is $L_f$-smooth relative to $h$ as in Assumption I one can consider $\sigma_f = \sigma_{-f} = -L_f$ or, equivalently, $p_f = p_{-f} = -1$. Plugging these values into Theorem 3.6 yields the following.
Corollary 3.9 (worst-case bounds).
Suppose that Assumption I holds. All the claims of Theorem 3.6 hold when $\gamma$, $\alpha$ and $c$ are such that
1. either  and , in which case ;
2. or $h$ is strongly convex and Lipschitz differentiable, and , in which case ;
3. or $h$ is strongly convex, $\nabla f$ is Lipschitz, and , in which case .

Proof.
We will discuss only the bounds on $\gamma$, as those on $c$ will automatically follow from the fact that . Setting $p_f = p_{-f} = -1$ in Theorem 3.6, one has:
When $f$ is convex, $p_f = 0$ can be considered, resulting in the following bounds.
Corollary 3.10 (bounds when $f$ is convex).
Suppose that Assumption I holds and that $f$ is convex. All the claims of Theorem 3.6 remain valid if $\gamma$, $\alpha$ and $c$ are such that
1. either  and , in which case ;
2. or $h$ is strongly convex and Lipschitz differentiable, in which case ;
3. or $h$ is strongly convex, $\nabla f$ is Lipschitz, , and , in which case .
Similarly, when $f$ is concave (that is, $-f$ is convex), then $p_{-f} = 0$ can be considered, resulting in the following. The proof is omitted, as it uses the same arguments as in the previous results.
Corollary 3.11 (bounds when $f$ is concave).
4 Convergence analysis
In this section we study the behavior of sequences generated by FRB. Although some basic convergence results can be derived in the full generality of Assumption I, establishing local optimality guarantees of the limit point(s) will ultimately require an additional full domain assumption.
Assumption II.
Function $h$ has full domain, that is, $\operatorname{dom} h = \mathbb{R}^n$.
Assumption II is standard for nonconvex splitting algorithms in the relatively smooth setting. To the best of our knowledge, the question of whether this requirement can be removed remains open; see, e.g., [35] and the references therein.
4.1 Function value convergence
We begin with the convergence of the merit function value.
Theorem 4.1 (function value convergence of FRB).
Let $(x^k)_{k \in \mathbb{N}}$ be a sequence generated by FRB (Algorithm 1) in the setting of Theorem 3.6. Then:
1. It holds that
(4.1)
Then,  and  as $k \to \infty$ for some .
2. If $\varphi$ is level bounded, then $(x^k)_{k \in \mathbb{N}}$ is bounded.
3. Suppose that Assumption II holds, and let $\Omega$ be the set of limit points of $(x^k)_{k \in \mathbb{N}}$. Then, $\varphi$ is constant on $\Omega$ with value , and for every $x_\star \in \Omega$ it holds that $0 \in \hat\partial\varphi(x_\star)$ and .
Proof.
3: Suppose that a subsequence $(x^{k_j})_{j \in \mathbb{N}}$ converges to a point $x_\star \in \Omega$; then so do the shifted subsequences by Item 1 and [6, Prop. 2.2(iii)]. Since $x^{k_j+1} \in \mathrm{T}_{\gamma,\alpha}(x^{k_j}, x^{k_j-1})$, by passing to the limit, osc of $\mathrm{T}_{\gamma,\alpha}$ (Item 3) implies that $x_\star \in \mathrm{T}_{\gamma,\alpha}(x_\star, x_\star)$. Invoking Item 5 yields the stationarity condition $0 \in \hat\partial\varphi(x_\star)$. Moreover, by continuity of $\varphi^{\rm FRB}_{\gamma,\alpha}$ one has
where the last equality follows from Item 2, owing to the inclusion  (and the fact that $\mathrm{D}_\psi(x, x) = 0$ for any differentiable function $\psi$). From the arbitrariness of $x_\star \in \Omega$ we conclude that $\varphi$ is constant on $\Omega$. ∎
It is now possible to demonstrate the necessity of some of the bounds on the stepsize that were discussed in Section 3.4, by showing that the difference of consecutive iterates may otherwise fail to vanish. Note that, for $\alpha = 0$, the following counterexample constitutes a tightness certificate for the bound derived in [42] in the noninertial Euclidean case.
Example 4.2.
The bound is tight even in the Euclidean case. To see this, consider and for a fixed let . Then, one has and . For , it is easy to see that
(with ). If , then is a sequence generated by FRB for which . ∎
As a consequence of Item 1, the condition  is satisfied in finitely many iterations for any tolerance $\varepsilon > 0$. While this could be used as a termination criterion, in the generality of Assumptions I and II there is no guarantee on the relaxed stationarity measure $\operatorname{dist}(0, \hat\partial\varphi(x^{k+1}))$, which through Item 4 can only be estimated as

(4.2)
On the other hand, in accounting for possibly unbounded sequences, additional assumptions are needed for the condition to be met in finitely many iterations.
Lemma 4.3 (termination criteria).
Suppose that Assumption II holds, and let $(x^k)_{k \in \mathbb{N}}$ be a sequence generated by FRB (Algorithm 1) in the setting of Theorem 3.6. If
1. either $\varphi$ is level bounded,
2. or $h$ is uniformly convex (equivalently, $h^*$ is uniformly smooth),
then the quantity in (4.2) vanishes as $k \to \infty$. Thus, for any $\varepsilon > 0$ the condition  is satisfied for all $k$ large enough and guarantees $\varepsilon$-stationarity of $x^{k+1}$.
Proof.
The implication between the condition and $\varepsilon$-stationarity of $x^{k+1}$ has already been discussed. If $\varphi$ is level bounded, then Theorem 4.1 implies that $(x^k)_{k \in \mathbb{N}}$ is bounded, and the claim holds by continuity of $\nabla f$ and $\nabla h$. In case $h$ is uniformly convex, this being equivalent to uniform smoothness of $h^*$ as shown in [4, Cor. 2.8], the vanishing of the Bregman distances implies through [32, Prop. 4.13(IV)] that $\|x^{k+1} - x^k\| \to 0$. In turn, by relative smoothness necessarily the gradient differences vanish as well, overall proving that the quantity in (4.2) vanishes. ∎
4.2 Global convergence
In this subsection, we work towards the global sequential convergence of FRB. To this end, we introduce a key concept which will be useful soon. For $\eta \in (0, \infty]$, denote by $\Phi_\eta$ the class of functions $\phi : [0, \eta) \to [0, \infty)$ satisfying the following: (i) $\phi$ is right-continuous at $0$ with $\phi(0) = 0$; (ii) $\phi$ is strictly increasing on $[0, \eta)$.
Definition 4.4 ([41, Def. 5]).
Let $F : \mathbb{R}^n \to \overline{\mathbb{R}}$ be proper and lsc. Let $\bar x \in \operatorname{dom} \partial F$ and $\eta \in (0, \infty]$, and let $\Omega \subseteq \operatorname{dom} \partial F$ be a nonempty subset.
1. We say that $F$ has the pointwise generalized concave Kurdyka-Łojasiewicz (KL) property at $\bar x$ if there exist a neighborhood $U \ni \bar x$ and a concave $\phi \in \Phi_\eta$ such that for all $x \in U$ with $F(\bar x) < F(x) < F(\bar x) + \eta$,
$$\phi'_-\bigl(F(x) - F(\bar x)\bigr) \cdot \operatorname{dist}\bigl(0, \partial F(x)\bigr) \ge 1,$$
where $\phi'_-$ denotes the left derivative. Moreover, $F$ is a generalized concave KL function if it has the generalized concave KL property at every $\bar x \in \operatorname{dom} \partial F$.
2. Suppose that $F$ is constant on $\Omega$, with value $F_\Omega$. We say $F$ has the setwise (we shall omit adjectives “pointwise” and “setwise” whenever there is no ambiguity) generalized concave KL property on $\Omega$ if there exist $\varepsilon > 0$, $\eta \in (0, \infty]$ and a concave $\phi \in \Phi_\eta$ such that for every $x$ with $\operatorname{dist}(x, \Omega) < \varepsilon$ and $F_\Omega < F(x) < F_\Omega + \eta$,
$$\phi'_-\bigl(F(x) - F_\Omega\bigr) \cdot \operatorname{dist}\bigl(0, \partial F(x)\bigr) \ge 1.$$
For a subset $S \subseteq \mathbb{R}^n$ and $\varepsilon > 0$, define $S_\varepsilon := \{x \in \mathbb{R}^n : \operatorname{dist}(x, S) < \varepsilon\}$.
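As a simple illustration of Definition 4.4 (our own example, not from [41]): for $F(x) = |x|^p$ with $p > 1$ and $\bar x = 0$, one has $\partial F(x) = \{p\,\operatorname{sign}(x)|x|^{p-1}\}$ for $x \ne 0$, and the concave function $\phi(t) = t^{1/p} \in \Phi_\infty$ satisfies
$$\phi'_-\bigl(F(x) - F(0)\bigr) \cdot \operatorname{dist}\bigl(0, \partial F(x)\bigr) = \tfrac1p |x|^{1-p} \cdot p\,|x|^{p-1} = 1,$$
so the KL inequality holds with equality; in the terminology of Section 4.3, $F$ has KL exponent $\theta = 1 - 1/p$ at the origin.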
Lemma 4.5 ([41, Lem. 4.4]).
Let $F$ be proper and lsc, and let $\eta \in (0, \infty]$. Let $\Omega$ be a nonempty compact set on which $F$ is constant. Then the following hold:
1. Suppose that $F$ satisfies the pointwise generalized concave KL property at each $\bar x \in \Omega$. Then there exist $\varepsilon > 0$ and a concave $\phi \in \Phi_\eta$ such that $F$ has the setwise generalized concave KL property on $\Omega$ with respect to $\varepsilon$, $\eta$, and $\phi$.
2. Set  and define  by
Then the function  defined by  with  is well defined and belongs to $\Phi_\eta$. It has the setwise generalized concave KL property on $\Omega$ with respect to $\varepsilon$, $\eta$, and . Moreover,
We say that it is the exact modulus of the setwise generalized concave KL property of $F$ on $\Omega$ with respect to $\varepsilon$ and $\eta$.
In the remainder of the section, we will make use of the norm on the product space $\mathbb{R}^n \times \mathbb{R}^n$ defined as $\|(x, y)\| := \sqrt{\|x\|^2 + \|y\|^2}$.
Theorem 4.6 (sequential convergence of FRB).
Suppose that Assumption II holds, and let $(x^k)_{k \in \mathbb{N}}$ be a sequence generated by FRB (Algorithm 1) in the setting of Theorem 3.6. Define $z^k := (x^k, x^{k-1})$ and let $\Omega$ be the set of limit points of $(z^k)_{k \in \mathbb{N}}$. Define
Assume in addition the following:
1. $\varphi$ is level bounded.
2. $f$ and $h$ are twice continuously differentiable and $\nabla^2 h$ is positive definite.
3. The merit function satisfies the generalized concave KL property on $\Omega$ with respect to  and .
Then $(x^k)_{k \in \mathbb{N}}$ has finite length and there exists $x_\star$ with $0 \in \hat\partial\varphi(x_\star)$ such that $x^k \to x_\star$ as $k \to \infty$. To be specific, there exists $\bar k \in \mathbb{N}$ such that for all $k \ge \bar k$

(4.3)

where  is some constant,  is the strong convexity modulus of $h$ on , the closed ball in which $(x^k)_{k \in \mathbb{N}}$ lies, and  is the exact modulus of the generalized concave KL property associated with  produced by Item 2 of Lemma 4.5.
Proof.
Set  for simplicity. Then , decreasingly, and  as  by invoking Theorem 4.1. Assume without loss of generality that , otherwise we would have  due to Item 1, from which the desired result readily follows by simple induction. Thus . Appealing to Item 3 and Item 1 yields that the merit function is constantly equal to its limit value on the compact set . In turn, all conditions in Item 2 are satisfied, which implies that for
(4.4)
Define
Then and , where , , , and . Applying subdifferential calculus to yields that
which together with Item 4 entails that . In turn, summing the aforementioned bounds on and gives
(4.5)
where .
Finally, we show that is convergent. For simplicity, define . Then, combining (4.4) and (4.5) yields
where the second inequality is implied by concavity of , the third one follows from (4.1), and the fourth one holds because is the strong convexity modulus of on . Hence,
(4.6)
Summing (4.6) from  to an arbitrary index and passing to the limit justifies (4.3). A similar procedure shows that the sequence is Cauchy, which together with Item 3 entails the rest of the statement. ∎
Remark 4.7.
The corollary below states that semialgebraic functions satisfy the assumptions in Theorem 4.6.
Corollary 4.8.
Suppose that Assumption II holds, and let $(x^k)_{k \in \mathbb{N}}$ be a sequence generated by FRB (Algorithm 1) in the setting of Theorem 3.6. Assume in addition that
1. $\varphi$ is level bounded,
2. $f$ and $h$ are twice continuously differentiable and $\nabla^2 h$ is positive definite, and
3. $f$, $g$, and $h$ are semialgebraic.
Then $(x^k)_{k \in \mathbb{N}}$ has finite length and there exists $x_\star$ with $0 \in \hat\partial\varphi(x_\star)$ such that $x^k \to x_\star$.
4.3 Convergence rates
Having established convergence of FRB, we now turn to its rate. Recall that a function $F$ is said to have KL exponent $\theta \in [0, 1)$ if it satisfies the generalized concave KL property (recall Definition 4.4) with a desingularizing function of the form $\phi(t) = c\, t^{1-\theta}$ for some $c > 0$.
Theorem 4.9 (function value and sequential convergence rate).
Suppose that all the assumptions in Theorem 4.6 are satisfied, and follow the notation therein. Define . Assume in addition that  has KL exponent $\theta$ at . Then the following hold:
1. If $\theta = 0$, then  and  after finitely many steps.
2. If $\theta \in (0, 1/2]$, then there exist  and  such that for sufficiently large $k$,
3. If $\theta \in (1/2, 1)$, then there exist  such that for sufficiently large $k$,
Proof.
Assume without loss of generality that  has a desingularizing function $\phi(t) = c\, t^{1-\theta}$ and let . We claim that

(4.7)

which will be justified at the end of this proof. It is routine to see that the desired sequential rate is implied by the function value rate through (4.7); see, e.g., [40, Thm. 5.3]. Therefore, it suffices to prove the convergence rate of the function values.
Recall from Item 1 that is a decreasing sequence converging to . Then invoking the KL exponent assumption yields , which together with (4.5) implies that
(4.8)
Appealing again to Item 1 gives
where the last inequality is implied by (4.8). Then applying [11, Lem. 10] justifies the desired convergence rate of .
Finally, we show that (4.7) holds. Invoking Assumptions 2 and 1 entails , thus . In turn,
where the second inequality follows from (4.3), the third one holds due to the fact that and the monotonicity of , and the last one is implied by the KL exponent assumption and Lemma 4.5, as claimed. ∎
5 Globalizing fast local methods with FRB
With some due care, the method can be enhanced in the context of the continuous-Lyapunov descent (CLyD) framework [36, §4], so as to globalize fast local methods using the same oracle as FRB. Methods of quasi-Newton type constitute a convenient class of candidate local methods. A prototypical use case hinges on the inclusion encoding necessary optimality conditions, cf. Item 5, so that fast update directions can be retrieved based on the associated root-finding problem; see, e.g., [15, §7] and [17]. Regardless, while the globalization framework is flexible enough to accommodate any update direction while retaining the convergence of FRB, it promotes the directions triggering fast local convergence, as will be demonstrated in Theorem 5.3.
Set  and  with the largest  in  such that
(5.1)
The algorithmic framework revolves around two key facts: continuity of the merit function and its decrease after FRB steps with $\alpha = 0$. (The reason for introducing an auxiliary variable will be discussed after Theorem 5.1.) Thus, not only is the merit value smaller after such a step at the current point, but also at sufficiently close points, thereby ensuring that, by gradually pushing the tentative fast update towards the safeguard, a decrease of the merit function is eventually achieved.
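In pseudocode terms, the interleaving logic can be sketched as follows; this is a schematic sketch with illustrative names only, and the actual safeguard, auxiliary variable, and acceptance test are those of Algorithm 2 and (5.1).

```python
def globalized_frb(frb_step, merit, fast_direction, x0, max_iter=100, tol=1e-8):
    """Schematic CLyD-style globalization of a fast local method with FRB.

    frb_step(x, x_prev): one noninertial (alpha = 0) FRB step, used as safeguard.
    merit(x, x_prev):    continuous merit function (the role played by (3.6)).
    fast_direction(x):   candidate update direction, e.g. quasi-Newton.
    """
    x_prev, x = x0, x0
    for _ in range(max_iter):
        x_bar = frb_step(x, x_prev)               # noninertial trial (safeguard)
        target = merit(x_bar, x)                  # reference merit value
        d = fast_direction(x_bar)                 # fast candidate direction
        tau = 1.0                                 # unit stepsize tried FIRST (Thm 5.3)
        while merit(x_bar + tau * d, x_bar) > target and tau > 1e-12:
            tau *= 0.5                            # push candidate towards safeguard
        x_prev, x = x, (x_bar + tau * d if tau > 1e-12 else x_bar)
        if abs(merit(x, x_prev) - target) <= tol: # heuristic stop, illustration only
            return x
    return x

# 1-D toy usage: minimize (x - 1)^2 with a gradient-step safeguard.
frb = lambda x, xp: x - 0.4 * (2 * (x - 1))       # stand-in for an FRB step
mer = lambda x, xp: (x - 1) ** 2                  # stand-in merit
newton = lambda x: -(2 * (x - 1)) / 2.0           # Newton direction
print(globalized_frb(frb, mer, newton, x0=5.0))
```

This fact is formalized next.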
Theorem 5.1 (well definedness and asymptotic analysis of Algorithm 2).
Suppose that Assumption I and the bounds on $\gamma$ and $\alpha$ as in Theorem 3.6 are satisfied. Then, the following hold for the iterates generated by Algorithm 2:
1. The linesearch terminates in finitely many backtrackings, so that Algorithm 2 is well defined.
2. , and in particular .
3. If $\varphi$ is coercive, then the sequence remains bounded.
4. Suppose that Assumption II holds, and let $\Omega$ be the set of limit points of the iterates. Then, $\varphi$ is constant on $\Omega$ with value , and for every $x_\star \in \Omega$ it holds that $0 \in \hat\partial\varphi(x_\star)$ and .
5. Under the assumptions of Lemma 4.3, for any $\varepsilon > 0$ the condition  holds for all $k$ large enough and guarantees that .
Proof.
It should be noted that Theorem 5.1 remains valid even with the simpler choice . The apparently wasteful choice of  as in Algorithm 2 is instead of paramount importance in promoting “good” directions by enabling acceptance of the unit stepsize. This is in sharp contrast with known issues of other nonsmooth globalization strategies, where convergence is severely hampered (the Maratos effect [26]; see also [17, §6.2]). A quality assessment of the direction is provided by the following notion.
Definition 5.2 (superlinearly convergent directions [15, Eq. (7.5.2)]).
Relative to a sequence $(x^k)_{k \in \mathbb{N}}$ converging to a point $x_\star$, we say that $(d^k)_{k \in \mathbb{N}}$ are superlinearly convergent directions if
$$\lim_{k \to \infty} \frac{\|x^k + d^k - x_\star\|}{\|x^k - x_\star\|} = 0.$$
Theorem 5.3 (acceptance of unit stepsize).
Suppose that Assumptions I and II and the bounds on $\gamma$ and $\alpha$ as in Theorem 3.6 are satisfied. Suppose further that  with , and that the sequence converges to a strong local minimum $x_\star$ for $\varphi$ satisfying . (Although it is only the inclusion  which is guaranteed for any limit point in the full generality of Assumptions I and II (cf. Item 4), additionally requiring single-valuedness is a negligible extra assumption, as can be inferred from Item 5; see [1, §3.1] and [36, §2.4] for a detailed discussion.) If the directions are superlinearly convergent with respect to the sequence, then eventually the unit stepsize is always accepted at step 6, and the algorithm reduces to the local method and converges at superlinear rate.
Proof.
Let $x_\star$ be the limit point of the sequence. Since , as well, and strong local minimality of $x_\star$ implies that  holds for some  and all $k$ large enough. In turn, for large $k$ it holds that
 for some  (since ), and by Young's inequality  for every . By choosing  one obtains that
(5.2)
where . On the other hand, since  for any , by definition of the envelope (cf. (3.4) with $\alpha = 0$) it follows that
(5.3)
holds for some  and all $k$ large enough, where the last inequality uses the fact that  (hence  too). Combined with (5.2) and superlinearity of the directions we obtain that
(5.4)
Observe that for any , eventually it holds that . In fact, the first inequality owes to (3.9b), and the second one holds for $k$ large enough because of local minimality of $x_\star$ and the fact that
 by osc of the FRB-operator (cf. Item 3). Then, for $k$ large enough so that  also holds, we have
 so that Theorem 3.6 yields
. Since  and , eventually , hence
, implying that the first attempt with the unit stepsize passes the linesearch condition (since, when , one has ). In particular, the sequence eventually reduces to the local method, and thus converges superlinearly. ∎
The core of the proof hinges on showing that the quantity in (5.4) vanishes. The same argument cannot be achieved by a naive linesearch prescribing . Indeed,
 where the first inequality owes to local minimality of $x_\star$ and the fact that , the second one to the expression (3.2a) of the model and the definition of the envelope, cf. (3.4), and the big-$O$ estimates to local smoothness of $f$ and $h$. In particular, . Observing that (5.2) is still valid, denoting  so that , one then has
 where “$\simeq$” denotes equality up to vanishing terms and  are some constants due to [15, Lem. 7.5.7]. In this process, the information about superlinearity of the directions is lost, and the vanishing of the quantity in (5.4) cannot be established. Instead, as is apparent from (5.3), the choice made at step 6 guarantees that on the first trial step the merit coincides with a Bregman Moreau envelope, a function more tightly connected to the cost $\varphi$.
6 Conclusions
This work contributes a mirror inertial forward-reflected-backward splitting algorithm (FRB) and its linesearch enhancement, extending the forward-reflected-backward method proposed in [25] to the nonconvex and relatively smooth setting. We have shown that the proposed algorithms enjoy pleasant properties akin to other splitting methods in the same setting. However, our methodology deviates from tradition through the FRB-envelope, an envelope function defined on a product space that takes inertial terms into account, which, to the best of our knowledge, is the first of its kind and could thus be instrumental for future research. This approach also requires the inertial parameter to be nonpositive, which is consistent with a recent result [14] regarding the impossibility of accelerated non-Euclidean algorithms under relative smoothness. It would thus be tempting to see whether an explicit example can be constructed proving the sharpness of such a restrictive assumption. It is also worth applying our technique to other two-stage splitting methods, such as Tseng's method, to obtain similar extensions.
Appendix A Proof of Theorem 3.6
A.1 Convex case
Lemma A.1.
Suppose that Assumption I holds and let $\gamma$, $\alpha$ and $c$ be such that  is a convex function. Then, for every $(y, y^-) \in C \times C$ and $x \in \mathrm{T}_{\gamma,\alpha}(y, y^-)$

(A.1)

If  too is convex, then $\mathcal{L}_{\gamma,\alpha}$ has the same infimum as $\varphi$, and is level bounded iff so is $\varphi$.
Proof.
All the claimed inequalities follow from (3.5) together with the fact that , and that  too when  is convex. When both  and  are convex, Item 2 implies that

(A.2)

proving that $\inf \mathcal{L}_{\gamma,\alpha} \ge \inf \varphi$. The converse inequality follows from Item 3 by observing that $\mathcal{L}_{\gamma,\alpha}(y, y) \le \varphi(y)$ for every $y \in C$.
To conclude, suppose that $\varphi$ is not level bounded, and consider an unbounded sequence  such that , for some . Then, it follows from (A.2) that , where . From local boundedness of $\mathrm{T}_{\gamma,\alpha}$ (Item 3) it follows that  too is unbounded, showing that $\mathcal{L}_{\gamma,\alpha}$ is not level bounded either. The converse holds by observing that , cf. Item 3. ∎
In the setting of scenario 1, inequality (A.1) can equivalently be written in terms of  as
 where the second inequality owes to the fact that , since  is convex, having
 the coefficient of  being null by definition of , and  being convex by definition of the relative weak convexity modulus , cf. Definition 2.2. This proves (3.9a); inequality (3.9b) follows similarly by observing that
 so that
. In turn, Item 2 follows from the same arguments as in the proof of Lemma A.1.
A.2 Lipschitz differentiable case
Lemma A.2.
In addition to Assumption I, suppose that $f$ is $\ell_f$-Lipschitz differentiable for some $\ell_f \ge 0$. Then, for every $(y, y^-) \in C \times C$ and $x \in \mathrm{T}_{\gamma,\alpha}(y, y^-)$

(A.3)

If  is a convex function, then $\mathcal{L}_{\gamma,\alpha}$ has the same infimum as $\varphi$, and is level bounded iff so is $\varphi$.
We will pattern the arguments of the previous section, observing that inequality (A.3) can equivalently be written in terms of  as
. Once again, the fact that  owes to convexity of , having
, altogether proving (3.9a). Similarly, inequality (3.9b) follows by observing that , so that
. The assertion of Item 2 once again follows from the same arguments as in the proof of Lemma A.2.
References
- [1] M. Ahookhosh, A. Themelis, and P. Patrinos. A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: Superlinear convergence to nonisolated local minima. SIAM Journal on Optimization, 31(1):653–685, 2021.
- [2] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2):438–457, 2010.
- [3] H. Attouch, J. Bolte, and B.F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming, 137(1):91–129, 2013.
- [4] D. Azé and J. Penot. Uniformly convex and uniformly smooth convex functions. Annales de la Faculté des sciences de Toulouse : Mathématiques, Ser. 6, 4(4):705–730, 1995.
- [5] H.H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications. Mathematics of Operations Research, 42(2):330–348, 2017.
- [6] H.H. Bauschke and P.L. Combettes. Iterating Bregman retractions. SIAM Journal on Optimization, 13(4):1159–1173, 2003.
- [7] H.H. Bauschke and P.L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert spaces. CMS Books in Mathematics. Springer, 2017.
- [8] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
- [9] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 2016.
- [10] J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3):2131–2151, 2018.
- [11] R.I. Boţ and D. Nguyen. The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Mathematics of Operations Research, 45(2):682–712, 2020.
- [12] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.
- [13] R. Dragomir, A. d’Aspremont, and J. Bolte. Quartic first-order methods for low-rank minimization. Journal of Optimization Theory and Applications, 189(2):341–363, 2021.
- [14] R. Dragomir, A.B. Taylor, A. d’Aspremont, and J. Bolte. Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 194(1):41–83, 2022.
- [15] F. Facchinei and J. Pang. Finite-dimensional Variational Inequalities and Complementarity Problems, volume II. Springer, 2003.
- [16] F. Hanzely, P. Richtarik, and L. Xiao. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. Computational Optimization and Applications, 79(2):405–440, 2021.
- [17] A.F. Izmailov and M.V. Solodov. Newton-type Methods for Optimization and Variational Problems. Springer, 2014.
- [18] C. Kan and W. Song. The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Analysis: Theory, Methods & Applications, 75(3):1385 – 1399, 2012.
- [19] G. Li, T. Liu, and T.K. Pong. Peaceman–Rachford splitting for a class of nonconvex optimization problems. Computational Optimization and Applications, 68(2):407–436, Nov 2017.
- [20] G. Li and T.K. Pong. Global convergence of splitting methods for nonconvex composite optimization. SIAM Journal on Optimization, 25(4):2434–2460, 2015.
- [21] G. Li and T.K. Pong. Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Mathematical Programming, 159(1):371–401, Sep 2016.
- [22] Y. Liu and W. Yin. An envelope for Davis–Yin splitting and strict saddle-point avoidance. Journal of Optimization Theory and Applications, 181(2):567–587, 2019.
- [23] H. Lu, R.M. Freund, and Y. Nesterov. Relatively smooth convex optimization by first-order methods, and applications. SIAM Journal on Optimization, 28(1):333–354, 2018.
- [24] J. Mairal. Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM Journal on Optimization, 25(2):829–855, 2015.
- [25] Y. Malitsky and M.K. Tam. A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM Journal on Optimization, 30(2):1451–1472, 2020.
- [26] N. Maratos. Exact penalty function algorithms for finite dimensional and control optimization problems. PhD thesis, Imperial College London (University of London), 1978.
- [27] B. Mordukhovich. Variational Analysis and Applications, volume 30. Springer, 2018.
- [28] J. Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France, 93:273–299, 1965.
- [29] Y. Nesterov. Implementable tensor methods in unconstrained convex optimization. Mathematical Programming, pages 1–27, 2019.
- [30] Y. Nesterov. Lectures on Convex Optimization, volume 137. Springer, 2018.
- [31] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.
- [32] D. Reem, S. Reich, and A. De Pierro. Re-examination of Bregman functions and new properties of their divergences. Optimization, 68(1):279–348, 2019.
- [33] R.T. Rockafellar and R.J. Wets. Variational Analysis, volume 317. Springer, 2011.
- [34] L. Stella, A. Themelis, and P. Patrinos. Newton-type alternating minimization algorithm for convex optimization. IEEE Transactions on Automatic Control, 2018.
- [35] M. Teboulle. A simplified view of first order methods for optimization. Mathematical Programming, pages 1–30, 2018.
- [36] A. Themelis. Proximal Algorithms for Structured Nonconvex Optimization. PhD thesis, KU Leuven, 12 2018.
- [37] A. Themelis and P. Patrinos. Douglas–Rachford splitting and ADMM for nonconvex optimization: Tight convergence results. SIAM Journal on Optimization, 30(1):149–181, 2020.
- [38] A. Themelis, L. Stella, and P. Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization, 28(3):2274–2303, 2018.
- [39] A. Themelis, L. Stella, and P. Patrinos. Douglas-Rachford splitting and ADMM for nonconvex optimization: Accelerated and Newton-type algorithms. Computational Optimization and Applications, 82:395–440, 2022.
- [40] X. Wang and Z. Wang. A Bregman inertial forward-reflected-backward method for nonconvex minimization. arXiv:2207.01170, 2022.
- [41] X. Wang and Z. Wang. The exact modulus of the generalized concave Kurdyka-Łojasiewicz property. Mathematics of Operations Research, 47(4):2765–2783, 2022.
- [42] X. Wang and Z. Wang. Malitsky-Tam forward-reflected-backward splitting method for nonconvex minimization problems. Computational Optimization and Applications, 82(2):441–463, 2022.