
A mirror inertial forward-reflected-backward splitting: Global convergence and linesearch extension beyond convexity and Lipschitz smoothness

This work was supported by NSERC Discovery Grants and the JSPS KAKENHI grant number JP21K17710.

Ziyuan Wang¹, Andreas Themelis², Hongjia Ou², Xianfu Wang¹

¹ Department of Mathematics, Irving K. Barber Faculty of Science, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mails: [email protected], [email protected]
² Kyushu University, Faculty of Information Science and Electrical Engineering (ISEE), 744 Motooka, Nishi-ku, 819-0395 Fukuoka, Japan. E-mails: [email protected], [email protected]
Abstract

This work investigates a Bregman and inertial extension of the forward-reflected-backward algorithm [Y. Malitsky and M. Tam, SIAM J. Optim., 30 (2020), pp. 1451–1472] applied to structured nonconvex minimization problems under relative smoothness. To this end, the proposed algorithm hinges on two key features: taking inertial steps in the dual space, and allowing for possibly negative inertial values. Our analysis begins with the study of an associated envelope function that takes inertial terms into account through a novel product space formulation. Such a construction differs substantially from similar objects in the literature and could offer new insights for extensions of splitting algorithms. Global convergence and rates are obtained by appealing to the generalized concave Kurdyka-Łojasiewicz (KL) property, which allows us to describe a sharp upper bound on the total length of iterates. Finally, a linesearch extension is given to enhance the proposed method.

Keywords. Nonsmooth nonconvex optimization · forward-reflected-backward splitting · inertia · Bregman distance · relative smoothness.

AMS subject classifications. 90C26 · 49J52 · 49J53.

1 Introduction

Consider the following composite minimization problem

$$\operatorname*{minimize}_{x\in\mathbb{R}^n}\ \varphi(x) \coloneqq f(x)+g(x) \quad \text{subject to } x\in\overline{C},$$ (P)

where $C$ is a nonempty open and convex set with closure $\overline{C}$, $f:\mathbb{R}^n\to\overline{\mathbb{R}}\coloneqq\mathbb{R}\cup\{\pm\infty\}$ is differentiable on $\operatorname{int}\operatorname{dom} f\neq\emptyset$, and $g:\mathbb{R}^n\to\overline{\mathbb{R}}$ is proper and lower semicontinuous (lsc). For notational brevity, we define $\varphi_{\overline{C}}\coloneqq\varphi+\delta_{\overline{C}}$, with $\delta_X$ denoting the indicator function of a set $X\subseteq\mathbb{R}^n$, namely such that $\delta_X(x)=0$ if $x\in X$ and $\infty$ otherwise. By doing so, problem (P) can equivalently be cast as the “unconstrained” minimization

$$\operatorname*{minimize}_{x\in\mathbb{R}^n}\ \varphi_{\overline{C}}(x).$$

Note that (P) is beyond the scope of traditional first-order methods that require global Lipschitz continuity of $\nabla f$ and the consequential descent lemma [9, Prop. A.24]; see, e.g., [3, 31, 24, 20, 21, 19] for such algorithms. To resolve this issue, Lipschitz-like convexity was introduced in the seminal work [5], furnishing a descent lemma beyond the aforementioned setting. This notion was then referred to as relative smoothness (see 2.1) and has played a central role in extending splitting algorithms to the setting of (P); see, e.g., [10, 13, 16, 23, 29, 35].

The goal of this paper is to propose a Bregman inertial forward-reflected-backward method, $i^*$FRB (Algorithm 1), for solving (P), which, roughly speaking, iterates

$$x^{k+1} \in (\nabla h+\gamma\partial g)^{-1}\bigl(\nabla h(x^k) + \beta(\nabla h(x^k)-\nabla h(x^{k-1})) - \gamma(2\nabla f(x^k)-\nabla f(x^{k-1}))\bigr),$$

where $\gamma>0$ is the stepsize, $\beta$ is the inertial parameter, and $h$ is the kernel. In the convex case, the above scheme reduces to the inertial forward-reflected-backward method proposed in [25] when $h=(1/2)\|\cdot\|^2$; the method of [25] is not applicable to (P) due to its assumption of Lipschitz continuity of $\nabla f$.
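To make the Euclidean specialization concrete, the following is a minimal numerical sketch, not the implementation studied in this paper: with $h=(1/2)\|\cdot\|^2$ the dual-space inertia coincides with primal inertia and the update is a proximal iteration. The lasso-type instance ($A$, $b$, $\lambda$, stepsize, and inertia values) is an illustrative assumption of ours.

```python
import numpy as np

# With h = (1/2)||.||^2 the scheme above reads
#   x^{k+1} = prox_{γg}( x^k + β(x^k - x^{k-1}) - γ(2∇f(x^k) - ∇f(x^{k-1})) ).
# Illustrative (assumed) problem: f(x) = 0.5||Ax - b||^2, g = λ||.||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1

grad_f = lambda x: A.T @ (A @ x - b)
# prox of t*λ||.||_1 is componentwise soft-thresholding
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

L = np.linalg.norm(A, 2) ** 2      # Lipschitz modulus of ∇f
gamma, beta = 1.0 / (5 * L), -0.1  # conservative stepsize, nonpositive inertia

x_prev = x = np.zeros(10)
for _ in range(3000):
    x_new = prox_g(x + beta * (x - x_prev) - gamma * (2 * grad_f(x) - grad_f(x_prev)), gamma)
    x_prev, x = x, x_new

# stationarity residual: fixed points satisfy x = prox_{γg}(x - γ∇f(x))
res = np.linalg.norm(x - prox_g(x - gamma * grad_f(x), gamma))
print(res)
```

On this strongly convex toy instance the residual decays to essentially machine precision, which serves as a quick correctness check of the update formula.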

A fundamental tool in our analysis is the $i^*$FRB-envelope (see 3.4), which is the value function associated to the parametric minimization of a “model” of (P); see Section 3.1. The term “envelope” is borrowed from the celebrated Moreau envelope [28] and its relation with the proximal operator. Indeed, there has been renewed interest in employing an associated envelope function to study convergence of splitting methods, such as forward-backward splitting [38, 1], Douglas-Rachford splitting [37, 39], the alternating minimization algorithm [34], as well as the splitting scheme of Davis and Yin [22]. The aforementioned works share one common theme: regularity properties of the associated envelope function are used for further enhancements and deeper algorithmic insights.

Pursuing the same pattern, in addition to studying global convergence of the proposed algorithm in Section 4 by means of the $i^*$FRB-envelope, we will showcase in Section 5 how it offers a versatile globalization framework for fast local methods in the full generality of Assumption I. This framework reveals the necessity of interleaving noninertial trial steps in the globalization strategy, an observation that led to a substantial change in the linesearch strategy with respect to related works.

A major departure of our analysis from previous work is that we consider an envelope function with two independent variables, which allows us to take inertial terms into account. In this regard, we believe that our methodology is appealing in its own right, as it can be instrumental for deriving inertial extensions of other splitting methods. Another notable feature of this work is that, as one shall see in Section 3.4, a nonpositive inertial parameter is required for the sake of convergence under relative smoothness. This result, although more pessimistic, aligns with the recent work [14] on the impossibility of an accelerated Bregman forward-backward method under the same assumption; see 3.7 for a detailed discussion. Our work differs from the analysis carried out in [40], which also deals with an inertial forward-reflected-backward algorithm using Bregman metrics but is still limited to the Lipschitz smoothness assumption. The game changer that enables us to cope with relative smoothness is taking the inertial step in the dual space, that is, interpolating after application of $\nabla h$ (cf. step 4 of Algorithm 1), whence the name, inspired by [8], mirror inertial forward-reflected-backward splitting ($i^*$FRB).

The rest of the paper is structured as follows. In the remainder of this section we formally define the problem setting and the proposed algorithm, and in Section 2 we discuss some preliminary material and notational conventions. Section 3 introduces the $i^*$FRB-envelope and an associated merit function; the proof of the main result therein is deferred to Appendix A. The convergence analysis is carried out in Section 4, and Section 5 presents the $i^*$FRB-based globalization framework. Finally, Section 6 draws some concluding remarks.

1.1 Problem setting and proposed algorithm

Throughout, we fix a Legendre kernel $h:\mathbb{R}^n\to\overline{\mathbb{R}}$ with $\operatorname{dom}\nabla h = \operatorname{int}\operatorname{dom} h = C$, namely, a proper, lsc, and strictly convex function that is 1-coercive and essentially smooth, i.e., such that $\|\nabla h(x_k)\|\to\infty$ for every sequence $(x_k)_{k\in\mathbb{N}}\subset C$ converging to a boundary point of $C$. We will consider the following iterative scheme for addressing problem (P), where $\operatorname{D}_h:\mathbb{R}^n\times\mathbb{R}^n\to\overline{\mathbb{R}}$ denotes the Bregman distance induced by $h$, defined as

$$\operatorname{D}_h(x,y) \coloneqq \begin{cases} h(x)-h(y)-\langle\nabla h(y),\,x-y\rangle & \text{if } y\in C,\\ \infty & \text{otherwise.}\end{cases}$$ (1.1)
Algorithm 1 Mirror inertial forward-reflected-backward ($i^*$FRB)
1:
2:
3: Set $y^k$ such that $\nabla h(y^k) = \nabla h(x^k) - \gamma\bigl(\nabla f(x^k)-\nabla f(x^{k-1})\bigr)$
4: Choose $x^{k+1} \in \operatorname*{arg\,min}_{w\in\mathbb{R}^n}\bigl\{g(w) + \langle w,\,\nabla f(x^k) - \tfrac{\beta}{\gamma}\bigl(\nabla h(x^k)-\nabla h(x^{k-1})\bigr)\rangle + \tfrac{1}{\gamma}\operatorname{D}_h(w,y^k)\bigr\}$
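As a hands-on illustration of how steps 3 and 4 interact, the sketch below runs the scheme with $g=0$ and the Boltzmann-Shannon entropy kernel, for which the two steps collapse into a single dual-space update. The kernel choice, the toy objective, and all parameter values are illustrative assumptions of ours, not taken from the paper.

```python
import numpy as np

# Sketch of Algorithm 1 with g = 0 and the Boltzmann-Shannon entropy
# h(x) = Σ_i (x_i log x_i - x_i) on C = (0,∞)^n, so ∇h = log and (∇h)^{-1} = exp.
# With g = 0, steps 3 and 4 collapse into the single dual-space update
#   ∇h(x^{k+1}) = ∇h(x^k) + β(∇h(x^k) - ∇h(x^{k-1})) - γ(2∇f(x^k) - ∇f(x^{k-1})).
# Toy objective (our choice): f(x) = Σ_i (x_i log(x_i/a_i) - x_i + a_i), which is
# 1-smooth relative to h and minimized at x = a.
a = np.array([0.2, 1.0, 3.0])
grad_f = lambda x: np.log(x / a)

gamma, beta = 0.4, -0.1  # γ < 1 = 1/L_{f,h}, nonpositive inertia
x_prev = x = np.ones(3)
for _ in range(300):
    dual = (np.log(x) + beta * (np.log(x) - np.log(x_prev))
            - gamma * (2 * grad_f(x) - grad_f(x_prev)))
    x_prev, x = x, np.exp(dual)

print(np.max(np.abs(x - a)))  # iterates stay in C = (0,∞)^n and approach a
```

Note that the mirror step keeps every iterate strictly positive by construction, which is exactly the role of the kernel: the constraint $x\in\overline{C}$ is never enforced explicitly.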

Note that Algorithm 1 takes the inertial step in the dual space, hence the abbreviation $i^*$FRB. We will work under the following assumptions.

Assumption I.

The following hold in problem (P):

  1. $f:\mathbb{R}^n\to\overline{\mathbb{R}}$ is smooth relative to $h$ (see Section 2.2).

  2. $g:\mathbb{R}^n\to\overline{\mathbb{R}}$ is proper and lsc.

  3. $\inf\varphi_{\overline{C}} > -\infty$.

  4. For any $v\in\mathbb{R}^n$ and $\gamma>0$, $\operatorname*{arg\,min}\{\gamma g + h - \langle v,\cdot\rangle\} \subseteq C$.

As will be made explicit in 3.2, Assumption 4 is a requirement ensuring that Algorithm 1 is well defined. Note that, in general, the minimizers therein form a (possibly empty) subset of $\operatorname{dom} h\cap\operatorname{dom} g$; Assumption 4 thus only excludes points on the boundary of $\operatorname{dom} h$. This standard requirement is trivially satisfied when $\operatorname{dom} h$ is open, or more generally when constraint qualifications enabling a subdifferential calculus rule on the boundary are met, as is the case when $g$ is convex.

Remark 1.1 (constraint qualifications for Assumption 4).

If $g$ is proper and lsc, Assumption 4 is satisfied if $\partial^\infty g\cap\bigl(-\partial^\infty h\bigr)\subseteq\{0\}$ holds everywhere (this condition being automatically guaranteed at all points outside the boundary of $C$, since $\partial^\infty h$ is empty outside $\operatorname{dom} h$ and equals $\{0\}$ in its interior). Indeed, optimality of $\bar x\in\operatorname*{arg\,min}\{\gamma g + h - \langle v,\cdot\rangle\}$ implies that $v\in\partial[\gamma g+h](\bar x)\subseteq\gamma\partial g(\bar x)+\partial h(\bar x)$, with the inclusion holding by [33, Cor. 10.9] and implying nonemptiness of $\partial h(\bar x)$; see Section 2.1 for the definitions of subdifferentials. ∎

Regardless, it will be shown in 2.4 that existence of minimizers is guaranteed for small enough values of $\gamma$, which will then be linked in 3.2 to the well-definedness of Algorithm 1.

2 Preliminaries

2.1 Notation

The extended-real line is denoted by $\overline{\mathbb{R}}\coloneqq\mathbb{R}\cup\{\pm\infty\}$. The positive and negative parts of $r\in\mathbb{R}$ are respectively defined as $[r]_+\coloneqq\max\{0,r\}$ and $[r]_-\coloneqq\max\{0,-r\}$, so that $r=[r]_+-[r]_-$.

The distance of a point $x\in\mathbb{R}^n$ to a nonempty set $S\subseteq\mathbb{R}^n$ is given by $\operatorname{dist}(x,S)=\inf_{z\in S}\|z-x\|$. The interior, closure, and boundary of $S$ are respectively denoted by $\operatorname{int} S$, $\overline{S}$, and $\operatorname{bdry} S=\overline{S}\setminus\operatorname{int} S$. The indicator function of $S$ is $\delta_S:\mathbb{R}^n\to\overline{\mathbb{R}}$, defined as $\delta_S(x)=0$ if $x\in S$ and $\infty$ otherwise.

A function $f:\mathbb{R}^n\to\overline{\mathbb{R}}$ is proper if $f\not\equiv\infty$ and $f>-\infty$, in which case its domain is defined as the set $\operatorname{dom} f\coloneqq\{x\in\mathbb{R}^n \mid f(x)<\infty\}$. For $\alpha\in\mathbb{R}$, $[f\leq\alpha]\coloneqq\{x\in\mathbb{R}^n \mid f(x)\leq\alpha\}$ denotes the $\alpha$-sublevel set of $f$; $[\alpha\leq f\leq\beta]$ with $\alpha,\beta\in\mathbb{R}$ is defined accordingly. We say that $f$ is level bounded (or coercive) if $\liminf_{\|x\|\to\infty} f(x)=\infty$, and 1-coercive if $\lim_{\|x\|\to\infty} f(x)/\|x\|=\infty$. A point $x_\star\in\operatorname{dom} f$ is a local minimum of $f$ if $f(x)\geq f(x_\star)$ holds for all $x$ in a neighborhood of $x_\star$. If the inequality can be strengthened to $f(x)\geq f(x_\star)+\tfrac{\mu}{2}\|x-x_\star\|^2$ for some $\mu>0$, then $x_\star$ is a strong local minimum. The convex conjugate of $f$ is denoted by $f^*\coloneqq\sup_z\{\langle\cdot,z\rangle-f(z)\}$. Given $x\in\operatorname{dom} f$, $\partial f(x)$ denotes the Mordukhovich (limiting) subdifferential of $f$ at $x$, given by

$$\partial f(x) \coloneqq \bigl\{v\in\mathbb{R}^n \mid \exists\,(x^k,v^k)_{k\in\mathbb{N}} \text{ s.t. } x^k\to x,\ f(x^k)\to f(x),\ \hat\partial f(x^k)\ni v^k\to v\bigr\},$$

and $\hat\partial f(x)$ is the set of regular subgradients of $f$ at $x$, namely vectors $v\in\mathbb{R}^n$ such that $\liminf_{z\to x,\,z\neq x} \frac{f(z)-f(x)-\langle v,\,z-x\rangle}{\|z-x\|} \geq 0$; see, e.g., [33, 27]. $\mathcal{C}^k(\mathcal{U})$ is the set of functions $\mathcal{U}\to\mathbb{R}$ that are $k$ times continuously differentiable; we write $\mathcal{C}^k$ if $\mathcal{U}$ is clear from context. The notation $T:\mathbb{R}^n\rightrightarrows\mathbb{R}^n$ indicates a set-valued mapping, whose domain and graph are respectively defined as $\operatorname{dom} T=\{x\in\mathbb{R}^n \mid T(x)\neq\emptyset\}$ and $\operatorname{gph} T=\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n \mid y\in T(x)\}$.

2.2 Relative smoothness and weak convexity

In the following definition, as well as throughout the entire paper, we follow the extended-real convention $\infty-\infty=\infty$ to resolve possibly ill-defined differences of extended-real-valued functions. Similarly, we adopt the convention $1/0=\infty$.

Definition 2.1 (relative smoothness).

We say that an lsc function $f:\mathbb{R}^n\to\overline{\mathbb{R}}$ is smooth relative to $h$ if $\operatorname{dom} f\supseteq\operatorname{dom} h$ and there exists a constant $L_{f,h}\geq 0$ such that $L_{f,h}h\pm f$ are convex functions on $\operatorname{int}\operatorname{dom} h$. We may alternatively say that $f$ is $L_{f,h}$-smooth relative to $h$ to make the smoothness modulus $L_{f,h}$ explicit.

Note that the constant $L_{f,h}$ may be loose. For instance, if $f$ is convex, then $Lh+f$ is convex for any $L\geq 0$. This motivates us to consider one-sided conditions and treat $f$ and $-f$ separately.
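The definition can be checked numerically in one dimension through second derivatives. The sketch below uses an example assumed by us (a standard one in the relative-smoothness literature): $f(x)=x^4/4$ has no globally Lipschitz gradient, yet $Lh\pm f$ is convex on the whole line for $h(x)=x^4/4+x^2/2$ and $L=1$, since $Lh''\pm f''\geq 0$ everywhere.

```python
import numpy as np

# Sanity check of relative smoothness via curvature: f(x) = x^4/4 is not
# Lipschitz-smooth globally, but it is 1-smooth relative to the (assumed,
# illustrative) kernel h(x) = x^4/4 + x^2/2, because L h'' ± f'' ≥ 0 for L = 1.
f2 = lambda x: 3.0 * x**2        # f''(x)
h2 = lambda x: 3.0 * x**2 + 1.0  # h''(x)
L = 1.0

xs = np.linspace(-100.0, 100.0, 100001)
lo_minus = np.min(L * h2(xs) - f2(xs))  # curvature of L h - f on the grid
lo_plus = np.min(L * h2(xs) + f2(xs))   # curvature of L h + f on the grid
print(lo_minus, lo_plus)
```

Both minima are bounded away from zero on the grid, consistent with convexity of $Lh\pm f$; the Euclidean kernel $h(x)=x^2/2$ would fail this test for any finite $L$.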

Definition 2.2 (relative weak convexity).

We say that an lsc function $f:\mathbb{R}^n\to\overline{\mathbb{R}}$ is weakly convex relative to $h$ if there exists a (possibly negative) constant $\sigma_{f,h}\in\mathbb{R}$ such that $f-\sigma_{f,h}h$ is a convex function. We may alternatively say that $f$ is $\sigma_{f,h}$-weakly convex relative to $h$ to make the weak convexity modulus $\sigma_{f,h}$ explicit.

Note that relative smoothness of $f$ is equivalent to relative weak convexity of both $f$ and $-f$. Indeed, if $f$ is $L_{f,h}$-smooth relative to $h$, then both $f$ and $-f$ are $(-L_{f,h})$-weakly convex relative to $h$. Conversely, if $f$ and $-f$ are $\sigma_{f,h}$- and $\sigma_{-f,h}$-weakly convex relative to $h$, respectively, then $f$ (as well as $-f$) is $L_{f,h}$-smooth relative to $h$ with

$$L_{f,h} = \max\{|\sigma_{f,h}|,\,|\sigma_{-f,h}|\}.$$ (2.1)

The relative weak convexity moduli $\sigma_{\pm f,h}$ will henceforth be adopted when referring to Assumption 1. It will be convenient to normalize these quantities into the pure numbers

$$p_{\pm f,h} \coloneqq \tfrac{\sigma_{\pm f,h}}{L_{f,h}} \in [-1,1].$$ (2.2)

To make all definitions and implications well posed, we will ignore the uninteresting case in which $f$ is affine, for which $L_{f,h}=0$ for any $h$. The comment below will be instrumental in Section 3.4.

Remark 2.3.

Invoking (2.1) and (2.2) yields that

$$-2 \leq p_{f,h}+p_{-f,h} \leq 0 \quad\text{and}\quad -1 \in \{p_{f,h},\,p_{-f,h}\},$$ (2.3)

where the second inequality owes to the fact that, by definition, both $f-\sigma_{f,h}h$ and $-f-\sigma_{-f,h}h$ are convex functions, and therefore so is their sum $-(\sigma_{f,h}+\sigma_{-f,h})h = -L_{f,h}(p_{f,h}+p_{-f,h})h$. Thus, whenever $f$ is convex (resp. concave), one can take $p_{f,h}=0$ (resp. $p_{-f,h}=0$), and by virtue of the inclusion in (2.3) it directly follows that $p_{-f,h}=-1$ (resp. $p_{f,h}=-1$). ∎

We now turn to a lemma that guarantees well definedness of Algorithm 1.

Lemma 2.4 (relative prox-boundedness).

Suppose that Assumption I holds. Then, the set $\operatorname*{arg\,min}\{\gamma g + h - \langle v,\cdot\rangle\}$ as in Assumption 4 is nonempty for any $v\in\mathbb{R}^n$ and $0<\gamma<1/[\sigma_{-f,h}]_-$. In other words, $g$ is prox-bounded relative to $h$ with threshold $\gamma_{g,h}\geq 1/[\sigma_{-f,h}]_-$ [18, Def. 2.3].

Proof.

Recall that a proper convex function admits an affine minorant; see, e.g., [7, Cor. 16.18]. It then follows that $\gamma g+h$ is 1-coercive by observing that it equals

$$\gamma g + h \;=\; \gamma\varphi_{\overline{C}} - \gamma f + h \;=\; \gamma\underbrace{\varphi_{\overline{C}}}_{\geq\,\inf\varphi_{\overline{C}}} \;+\; \gamma\underbrace{\bigl(-f+[\sigma_{-f,h}]_- h\bigr)}_{\text{convex}} \;+\; \underbrace{\bigl(1-\gamma[\sigma_{-f,h}]_-\bigr)}_{>0}\overbrace{h}^{\text{1-coercive}}$$

on $\overline{C}$, and $\infty$ on $\mathbb{R}^n\setminus\overline{C}$. ∎

We point out that the content of this subsection is a (well-known) generalization of the (well-known) equivalence between smoothness relative to the Euclidean kernel $\mathrm{j}\coloneqq(1/2)\|\cdot\|^2$ and Lipschitz differentiability, a fact that will be invoked in Section 3 and whose proof is given next for the sake of completeness.

Lemma 2.5 (Lipschitz smoothness from weak convexity).

Let $\mathrm{j}\coloneqq(1/2)\|\cdot\|^2$. Then, for any $F:\mathbb{R}^n\to\overline{\mathbb{R}}$ the following are equivalent:

  1. there exist $\sigma_{\pm F}\in\mathbb{R}$ such that both $F-\sigma_F\,\mathrm{j}$ and $-F-\sigma_{-F}\,\mathrm{j}$ are proper, convex and lsc;

  2. $\operatorname{dom}\partial F=\mathbb{R}^n$, and there exist $\sigma_{\pm F}\in\mathbb{R}$ such that for all $(x_i,v_i)\in\operatorname{gph}\partial F$, $i=1,2$, it holds that $\sigma_F\|x_1-x_2\|^2 \leq \langle v_1-v_2,\,x_1-x_2\rangle \leq -\sigma_{-F}\|x_1-x_2\|^2$;

  3. $F$ is $L_F$-Lipschitz differentiable for some $L_F\geq 0$, i.e., $\nabla F$ is $L_F$-Lipschitz continuous.

In particular, 1 and/or 2 imply 3 with $L_F=\max\{|\sigma_F|,|\sigma_{-F}|\}$, and conversely 3 implies 1 and 2 with $\sigma_{\pm F}=-L_F$.

Proof.

  • 1 ⇔ 2: Follows from [33, Ex.s 12.28(b),(c)].

  • 2 ⇒ 3: Invoking again [33, Ex. 12.28(c)] yields that both $F$ and $-F$ are lower-$\mathcal{C}^2$ in the sense of [33, Def. 10.29], and continuous differentiability then follows from [33, Prop. 10.30]. Thus, $v_i=\nabla F(x_i)$ in assertion 2 and, denoting $L_F=\max\{|\sigma_F|,|\sigma_{-F}|\}$, the function $\psi\coloneqq L_F\,\mathrm{j}+F$ satisfies

    $$0 \;\leq\; \langle\nabla\psi(x_1)-\nabla\psi(x_2),\,x_1-x_2\rangle \;\leq\; 2L_F\|x_1-x_2\|^2.$$

    It then follows from [30, Thm. 2.1.5] that $\psi$ is convex and $2L_F$-Lipschitz differentiable, and that

    $$\tfrac{1}{2L_F}\|\nabla\psi(x_1)-\nabla\psi(x_2)\|^2 \;\leq\; \langle\nabla\psi(x_1)-\nabla\psi(x_2),\,x_1-x_2\rangle.$$

    Using the definition of $\psi$ and expanding the squares yields $\tfrac{1}{2L_F}\|\nabla F(x_1)-\nabla F(x_2)\|^2 \leq \tfrac{L_F}{2}\|x_1-x_2\|^2$, proving that $\nabla F$ is $L_F$-Lipschitz continuous.

  • 3 ⇒ 1: From the quadratic upper bound [9, Prop. A.24] it follows that

    $$\pm F(x_2) \;\geq\; \pm F(x_1) \pm \langle\nabla F(x_1),\,x_2-x_1\rangle - \tfrac{L_F}{2}\|x_2-x_1\|^2,$$

    or, equivalently,

    $$(L_F\,\mathrm{j}\pm F)(x_2) \;\geq\; (L_F\,\mathrm{j}\pm F)(x_1) + \langle\nabla(L_F\,\mathrm{j}\pm F)(x_1),\,x_2-x_1\rangle.$$

    This proves convexity of $L_F\,\mathrm{j}\pm F$, whence the claim by taking $\sigma_{\pm F}=-L_F$. ∎
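The equivalence can be observed numerically. The following sketch uses $F=\sin$ on the line, an illustrative choice of ours: $\nabla F=\cos$ is 1-Lipschitz, and correspondingly the monotonicity bounds of assertion 2 hold with $\sigma_F=\sigma_{-F}=-1$.

```python
import numpy as np

# Numerical illustration of the lemma for F = sin (our illustrative choice):
# ∇F = cos is 1-Lipschitz, so the bounds of assertion 2 hold with σ_{±F} = -1:
#   -(x1 - x2)^2 ≤ (cos(x1) - cos(x2))(x1 - x2) ≤ (x1 - x2)^2.
rng = np.random.default_rng(2)
x1, x2 = rng.uniform(-10.0, 10.0, (2, 100000))
inner = (np.cos(x1) - np.cos(x2)) * (x1 - x2)
bound = (x1 - x2) ** 2
ok_lower = bool(np.all(inner >= -bound - 1e-9))  # small slack for rounding
ok_upper = bool(np.all(inner <= bound + 1e-9))
print(ok_lower, ok_upper)
```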

3 Algorithmic analysis toolbox

In the literature, convergence analysis for nonconvex splitting algorithms typically revolves around the identification of a ‘Lyapunov potential’, namely, a lower bounded function that decreases its value along the iterates. In this section we will pursue this direction. We will actually go one step further in identifying a function which, additionally to serving as Lyapunov potential, is also continuous. This property, whose utility will be showcased in Section 5, will come as the result of a parametric minimization, as discussed in the following two subsections.

In what follows, to simplify the discussion we introduce

$$\hat h \coloneqq \tfrac{1}{\gamma}h - f \quad\text{and}\quad \hat f_\beta \coloneqq f - \tfrac{\beta}{\gamma}h,$$ (3.1)

and observe that $\hat h$ too is a Legendre kernel, for $\gamma$ small enough.

Lemma 3.1 ([1, Thm. 4.1]).

Suppose that Assumption 1 holds. Then, for every $\gamma<1/[\sigma_{-f,h}]_-$ the function $\hat h$ is a Legendre kernel with $\operatorname{dom}\hat h=\operatorname{dom} h$.

We will also (ab)use the notation $\operatorname{D}_\psi$ of the Bregman distance for functions $\psi:\mathbb{R}^n\to\overline{\mathbb{R}}$ differentiable on $C$ that are not necessarily convex, thereby possibly having $\operatorname{D}_\psi\not\geq 0$. This notational abuse is justified by the fact that all algebraic identities of the Bregman distance used in the manuscript (e.g., the three-point identity [12, Lem. 3.1]) are valid regardless of whether $\psi$ is convex, and it will overall yield a major simplification of the math.
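The point of the abuse is easy to see numerically: for a nonconvex $\psi$ the generalized "distance" $\operatorname{D}_\psi$ can be negative, yet purely algebraic identities such as the three-point identity survive. The function $\psi$ below is an illustrative nonconvex choice of ours.

```python
import numpy as np

# For nonconvex ψ the "Bregman distance" D_ψ(x,y) = ψ(x) - ψ(y) - <∇ψ(y), x-y>
# may be negative, yet the three-point identity
#   D_ψ(x,z) = D_ψ(x,y) + D_ψ(y,z) + <∇ψ(y) - ∇ψ(z), x - y>
# holds for any differentiable ψ, convex or not.
psi = lambda x: np.sum(np.cos(x))  # nonconvex (our illustrative choice)
grad = lambda x: -np.sin(x)
D = lambda x, y: psi(x) - psi(y) - grad(y) @ (x - y)

rng = np.random.default_rng(1)
x, y, z = 2.0 * rng.standard_normal((3, 4))
gap = abs(D(x, z) - (D(x, y) + D(y, z) + (grad(y) - grad(z)) @ (x - y)))
neg = D(np.full(4, np.pi), np.zeros(4))  # = -8: the "distance" can be negative
print(gap, neg)
```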

3.1 Parametric minimization model

As a first step towards the desired goals, as well as to considerably simplify the discussion, we begin by observing that the $i^*$FRB-update is the result of a parametric minimization. Namely, introducing the “model”

$$\mathcal{M}^{h\text{-frb}}_{\gamma,\beta}(w;x,x^-) = \varphi(w) + \operatorname{D}_{\hat h}(w,x) + \langle w-x,\,\nabla\hat f_\beta(x)-\nabla\hat f_\beta(x^-)\rangle$$ (3.2a)
$$\phantom{\mathcal{M}^{h\text{-frb}}_{\gamma,\beta}(w;x,x^-)} = \varphi(w) + \operatorname{D}_{\hat h-\hat f_\beta}(w,x) + \operatorname{D}_{\hat f_\beta}(w,x^-) - \operatorname{D}_{\hat f_\beta}(x,x^-),$$ (3.2b)

observe that the $x$-update in $i^*$FRB can be compactly expressed as

$$x^{k+1} \in \operatorname{T}^{h\text{-frb}}_{\gamma,\beta}(x^k,x^{k-1}),$$ (3.3a)

where the $i^*$FRB-operator $\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}: C\times C\rightrightarrows C$ with stepsize $\gamma$ and inertial parameter $\beta$ is defined by

$$\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}(x,x^-) \coloneqq \operatorname*{arg\,min}_{w\in\mathbb{R}^n} \mathcal{M}^{h\text{-frb}}_{\gamma,\beta}(w;x,x^-).$$ (3.3b)

The fact that $\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}$ maps pairs in $C\times C$ to subsets of $C$ is a consequence of Assumption 4, as we are about to formalize in Item 3. Note that many models $\mathcal{M}^{h\text{-frb}}_{\gamma,\beta}$ can be defined giving rise to the same $\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}$, and all of these differ by additive terms that are constant with respect to $w$. Among them, the one given in (3.2b) reflects the tangency condition $\mathcal{M}^{h\text{-frb}}_{\gamma,\beta}(x;x,x^-)=\varphi(x)=\varphi_{\overline{C}}(x)$ for every $x,x^-\in C$. A consequence of this fact and other basic properties are summarized next.

Lemma 3.2 (basic properties of the model $\mathcal{M}^{h\text{-frb}}_{\gamma,\beta}$ and the operator $\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}$).

Suppose that Assumption I holds, and let $\gamma<1/[\sigma_{-f,h}]_-$ and $\beta\in\mathbb{R}$ be fixed. The following hold:

  1. $\mathcal{M}^{h\text{-frb}}_{\gamma,\beta}(x;x,x^-)=\varphi_{\overline{C}}(x)$ for all $x,x^-\in C$.

  2. $\mathcal{M}^{h\text{-frb}}_{\gamma,\beta}(w;x,x^-)$ is level bounded in $w$ locally uniformly in $(x,x^-)$ on $C\times C$ (namely, $\bigcup_{(x,x^-)\in\mathcal{V}}\{w\in\mathbb{R}^n \mid \mathcal{M}^{h\text{-frb}}_{\gamma,\beta}(w;x,x^-)\leq\alpha\}$ is bounded for any compact $\mathcal{V}\subset C\times C$ and $\alpha\in\mathbb{R}$).

  3. $\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}$ is locally bounded and osc ($\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}$ being defined on $C\times C$, osc and local boundedness are meant relative to $C\times C$: namely, $\operatorname{gph}\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}$ is closed relative to $C\times C\times\mathbb{R}^n$, and $\bigcup_{(x,x^-)\in\mathcal{V}}\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}(x,x^-)$ is bounded for any compact $\mathcal{V}\subset C\times C$), and $\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}(x,x^-)$ is a nonempty and compact subset of $C$ for any $x,x^-\in C$.

  4. $\nabla\hat h(x)-\nabla\hat h(\bar x)-\nabla\hat f_\beta(x)+\nabla\hat f_\beta(x^-) \in \hat\partial\varphi(\bar x)$ for any $x,x^-\in C$ and $\bar x\in\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}(x,x^-)$.

  5. If $x\in\operatorname{T}^{h\text{-frb}}_{\gamma,\beta}(x,x)$, then $0\in\hat\partial\varphi(x)$ and $\operatorname{T}^{h\text{-frb}}_{\gamma',\beta}(x,x)=\{x\}$ for every $\gamma'\in(0,\gamma)$.

Proof.

We start by observing that 2.4 ensures that the set Tγ,βh-frb(x,x)\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}) is nonempty for any x,xCx,x^{-}\in C; this follows by considering the expression (3.2a) of the model, by observing that, for any xCx\in C, φ+Dh^(,x)=g+1γhh^(x)h^(x),x\varphi+\operatorname{D}_{\hat{h}}({}\cdot{},x){}={}g+\tfrac{1}{\gamma}h-\hat{h}(x)-\langle{}{\nabla}\hat{h}(x){},{}{}\cdot{}-x{}\rangle. For the same reason, it then follows from Assumption 4 that T(x,x)CT(x,x^{-})\subset C.

  • 1  Apparent, by considering w=xw=x in (3.2b).

  • 2 & 3  The first assertion owes to the fact that h^\hat{h} is 1-coercive by 3.1 and that both h^\hat{h} and f^β{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta} are continuous on CC, so that for any compact 𝒱\varmathbbRn×\varmathbbRn\mathcal{V}\subset\varmathbb R^{n}\times\varmathbb R^{n} one has that limwinf(x,x)𝒱γ,βh-frb(w;x,x)=\lim_{\|w\|\to\infty}\operatorname*{inf}_{(x,x^{-})\in\mathcal{V}}\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(w;x,x^{-})=\infty, as is apparent from (3.2a). In turn, the second assertion follows from [33, Thm. 1.17].

  • 4  Follows from the optimality conditions of x¯argminγ,βh-frb(;x,x)\bar{x}\in\operatorname*{arg\,min}\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}({}\cdot{};x,x^{-}), for x¯C\bar{x}\in C by assumption and thus the calculus rule of [33, Ex. 8.8(c)] applies (having h^\hat{h} smooth around x¯C\bar{x}\in C).

  • 5  That 0^φ(x)0\in\hat{\partial}\varphi(x) follows from assertion 4, and the other claim from [1, Lem. 3.6] by observing that Tγ,βh-frb(x,x)=argmin{φ+Dh^(,x)}\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x)=\operatorname*{arg\,min}\{\varphi+\operatorname{D}_{\hat{h}}({}\cdot{},x)\} for any γ>0\gamma>0 and β\varmathbbR\beta\in\varmathbb R. ∎

Remark 3.3 (inertial effect).

Letting f~=f+ch\tilde{f}=f+ch and g~=gch\tilde{g}=g-ch for some c\varmathbbRc\in\varmathbb R, f~+g~\tilde{f}+\tilde{g} gives an alternative decomposition of φ\varphi which still complies with I, having σ±f~,h=σ±f,h±c\sigma_{\pm\tilde{f},h}=\sigma_{\pm f,h}\pm c. Relative to this decomposition, it is easy to verify that the corresponding model ~h-frb\tilde{\mathcal{M}}^{h\text{-\sc frb}} is related to the original one in (3.2a) as

~γ~,β~h-frb=γ,βh-frbwith{γ=γ~1γ~cβ=β~γ~c1γ~c{γ~=γ1+γcβ~=β+γc1+γc,{\tilde{\mathcal{M}}_{\tilde{\gamma}\!,\,\tilde{\beta}}^{h\text{-\sc frb}}}{}={}\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}\quad\text{with}\quad{\mathopen{}\left\{\begin{array}[]{@{}l@{}l@{}}\gamma={}&\frac{\tilde{\gamma}}{1-\tilde{\gamma}c}\\[4.0pt] \beta={}&\frac{\tilde{\beta}-\tilde{\gamma}c}{1-\tilde{\gamma}c}\end{array}\right.\mathclose{}}\leavevmode\nobreak\ \leavevmode\nobreak\ \Leftrightarrow\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ {\mathopen{}\left\{\begin{array}[]{@{}l@{}l@{}}\tilde{\gamma}={}&\frac{\gamma}{1+\gamma c}\\[4.0pt] \tilde{\beta}={}&\frac{\beta+\gamma c}{1+\gamma c},\end{array}\right.\mathclose{}}

and in particular ii^{*}FRB steps with the respective parameters coincide. The effect of inertia can then be explained as a redistribution of multiples of hh among ff and gg in the problem formulation, having γ,βh-frb=~γ1β,0h-frb\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}{}={}{\tilde{\mathcal{M}}_{\frac{\gamma}{1-\beta}\,\!,\,\!0}^{h\text{-\sc frb}}} for any γ>0\gamma>0 and β<1\beta<1. ∎
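The reparametrization above amounts to elementary algebra and is easy to check numerically. The sketch below (helper names and sample values are ours, not from the paper) verifies that the two parameter changes are mutually inverse and that choosing c = −β/γ absorbs the inertia, giving β̃ = 0 and γ̃ = γ/(1−β):

```python
def to_tilde(gamma, beta, c):
    # (gamma, beta) -> (tilde gamma, tilde beta) for the shifted
    # decomposition f~ = f + c*h, g~ = g - c*h of Remark 3.3
    return gamma / (1 + gamma * c), (beta + gamma * c) / (1 + gamma * c)

def from_tilde(gamma_t, beta_t, c):
    # inverse change of parameters
    return gamma_t / (1 - gamma_t * c), (beta_t - gamma_t * c) / (1 - gamma_t * c)

gamma, beta, c = 0.3, -0.2, 0.5                    # sample values, 1 + gamma*c > 0
gamma_t, beta_t = to_tilde(gamma, beta, c)
gamma_2, beta_2 = from_tilde(gamma_t, beta_t, c)   # round trip recovers (gamma, beta)

# c = -beta/gamma removes inertia: tilde beta = 0, tilde gamma = gamma/(1-beta)
gamma_0, beta_0 = to_tilde(gamma, beta, -beta / gamma)
```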

3.2 The ii^{*}FRB-envelope

Having defined the model γ,βh-frb\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} and its solution mapping Tγ,βh-frb\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} resulting from parametric minimization, we now introduce the associated value function, which we name the ii^{*}FRB-envelope.

Definition 3.4 (ii^{*}FRB-envelope).

The envelope function associated to ii^{*}FRB with stepsize γ<1/[σf,h]\gamma<1/[\sigma_{f,h}]_{-} and inertia β\varmathbbR\beta\in\varmathbb R is ϕγ,βh-frb:C×C\varmathbbR\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}:C\times C\rightarrow\varmathbb R defined as

ϕγ,βh-frb(x,x)infw\varmathbbRnγ,βh-frb(w;x,x).\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}\coloneqq{}\operatorname*{inf}_{w\in\varmathbb R^{n}}\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(w;x,x^{-}). (3.4)
Lemma 3.5 (basic properties of ϕγ,βh-frb\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}).

Suppose that I holds. Then, for any γ<1/[σf,h]\gamma<1/[\sigma_{f,h}]_{-} and β\varmathbbR\beta\in\varmathbb R the following hold:

  1. 1.

    ϕγ,βh-frb\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}} is (real-valued and) continuous on C×CC\times C; in fact, it is locally Lipschitz provided that f,h𝒞2(C)f,h\in\mathcal{C}^{2}(C).

  2. 2.

    For any x,xCx,x^{-}\in C and x¯Tγ,βh-frb(x,x)\bar{x}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-})

    ϕγ,βh-frb(x,x)=γ,βh-frb(x¯;x,x)=φ(x¯)+Dh^f^β(x¯,x)+Df^β(x¯,x)Df^β(x,x).\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}={}\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x};x,x^{-}){}={}\varphi(\bar{x}){}+{}\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x){}+{}\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x^{-}){}-{}\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(x,x^{-}).
  3. 3.

    ϕγ,βh-frb(x,x)φ(x)\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}\leq{}\varphi(x) for any x,xCx,x^{-}\in C.

Proof.
  • 1  In light of the uniform level boundedness asserted in Item 2, continuity of ϕγ,βh-frb\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}} follows from [33, Thm. 1.17(c)] by observing that the mapping (x,x)γ,βh-frb(w;x,x)(x,x^{-})\mapsto\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(w;x,x^{-}) is continuous for every ww; in fact, when ff and hh are both 𝒞2\mathcal{C}^{2} on CC, its gradient exists and is continuous with respect to all its arguments, which together with local boundedness of Tγ,βh-frb\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}, cf. Item 3, gives that ϕγ,βh-frb-\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}} is a lower-𝒞1\mathcal{C}^{1} function in the sense of [33, Def. 10.29], and in particular locally Lipschitz continuous by virtue of [33, Thm.s 10.31 and 9.2].

  • 2 & 3  The identity follows by definition, cf. (3.4) and (3.3b). The inequality follows by considering w=xw=x in (3.4) and (3.2b). ∎

3.3 Establishing a merit function

We now work towards establishing a merit function for ii^{*}FRB, starting from comparing the values of ϕγ,βh-frb(x¯,x)\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x},x) and ϕγ,βh-frb(x,x)\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}), with x¯Tγ,βh-frb(x,x)\bar{x}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}). Owing to Item 3, we have

ϕγ,βh-frb(x¯,x)φ(x¯)=\displaystyle\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x},x){}\leq{}\varphi(\bar{x}){}={} φC¯(x¯)\displaystyle\varphi_{\overline{C}}(\bar{x})
=\displaystyle{}={} ϕγ,βh-frb(x,x)Dh^f^β(x¯,x)Df^β(x¯,x)+Df^β(x,x).\displaystyle\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x){}-{}\vphantom{\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x^{-})}\smash{\overbracket{\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x^{-})}}{}+{}\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(x,x^{-}). (3.5)

From here two separate cases can be considered, each yielding surprisingly different results. The watershed lies in whether the bracketed term is positive or not: one case will result in a very straightforward convergence analysis in the full generality of I, while the other will necessitate an additional Lipschitz differentiability requirement. The convergence analysis in both cases revolves around the identification of a constant c>0c>0 determining a lower bounded merit function

γ,βh-frb=ϕγ,βh-frb+c2γDh+Dξ.\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}=\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\tfrac{c}{2\gamma}\operatorname{D}_{h}+\operatorname{D}_{\xi}. (3.6)

The difference between the two cases is determined by function ξ\xi appearing in the last Bregman operator Dξ\operatorname{D}_{\xi}, having ξ=f^β\xi=\vphantom{f}\smash{\hat{f}}_{\!\beta} in the former case and ξ=Lf^β𝒿\xi=L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j} in the latter, where Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}} is a Lipschitz constant for f^β{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta} and

𝒿122\mathcal{j}\coloneqq\tfrac{1}{2}\|{}\cdot{}\|^{2} (3.7)

is the squared Euclidean norm. The two cases are stated in the next theorem, which constitutes the main result of this section. Special and worst-case scenarios leading to simplified statements will be given in Section 3.4. In what follows, mirroring the normalization of σ±f,h\sigma_{\pm f,h} into p±f,hp_{\pm f,h} detailed in Section 2.2, we also introduce the scaled stepsize

αγLf,h,\alpha\coloneqq\gamma L_{f,h}, (3.8)

which, as a result of the convergence analysis, will be confined to the interval (0,1)(0,1).

Theorem 3.6.

Suppose that I holds and consider one of the following scenarios:

  1. 1

    either f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} is convex (i.e., αpf,hβ0\alpha p_{f,h}-\beta\geq 0) and β>(1+3αpf,h)/2\beta{}>{}-(1+3\alpha p_{-f,h})/2, in which case

    ξf^βandc1+2β+3αpf,h>0,\xi\coloneqq\vphantom{f}\smash{\hat{f}}_{\!\beta}\quad\text{and}\quad c{}\coloneqq{}1+2\beta+3\alpha p_{-f,h}>0,
  2. 2

    or f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} is Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}-Lipschitz differentiable, hh is σh\sigma_{h}-strongly convex, and

    c(1+αpf,h)σh2γLf^β>0,c{}\coloneqq{}(1+\alpha p_{-f,h})\sigma_{h}{}-{}2\gamma L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}{}>{}0,

    in which case ξLf^β𝒿\xi\coloneqq L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}.

Then, for γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} as in (3.6) the following assertions hold:

  1. 1.
    For every x,xCx,x^{-}\in C and x¯Tγ,βh-frb(x,x)\bar{x}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}),
    γ,βh-frb(x¯,x)\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x},x){}\leq{} γ,βh-frb(x,x)c2γDh(x¯,x)c2γDh(x,x)\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(\bar{x},x){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-}) (3.9a)
    and
    φC¯(x¯)\displaystyle\varphi_{\overline{C}}(\bar{x}){}\leq{} γ,βh-frb(x,x)cγDh(x¯,x)c2γDh(x,x).\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\tfrac{c}{\gamma}\operatorname{D}_{h}(\bar{x},x){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-}). (3.9b)
  2. 2.

    infγ,βh-frb=infφC¯\operatorname*{inf}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}=\operatorname*{inf}\varphi_{\overline{C}}, and γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} is level bounded iff so is φC¯\varphi_{\overline{C}}.

The proof of this result is detailed in the dedicated Appendix A; before that, let us make a few comments. As clarified in the statement of 1, convexity of f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} can be enforced by suitably choosing γ\gamma and β\beta without imposing additional requirements on the problem. However, an unusual yet reasonable condition on the inertial parameter β\beta may be necessary.

Remark 3.7.

To fulfill 1, we shall soon see that β0\beta\leq 0 may be required; see Section 3.4. Such an assumption, although more pessimistic, is consistent with a recent conjecture by Dragomir et al. [14, §4.5.3], which states that inertial methods with nonadaptive coefficients fail to converge in the relative smoothness setting, and provides an alternative perspective on the same matter through the lens of the convexity of f^β\vphantom{f}\smash{\hat{f}}_{\!\beta}. ∎

Unlike 1, however, additional assumptions are needed for the Lipschitz differentiable case of 2. This is because the requirement is equivalent to smoothness relative to the Euclidean Bregman kernel 𝒿\mathcal{j}, while I prescribes bounds only relative to hh. For simplicity, for functions ψ1,ψ2:\psi_{1},\psi_{2}:\varmathbb R^{n}\rightarrow\overline{\varmathbb R} we write

ψ1ψ2\psi_{1}\preceq\psi_{2}

to indicate that ψ2ψ1\psi_{2}-\psi_{1} is convex.

Remark 3.8.

Under I, one has that f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} is Lipschitz-differentiable with modulus Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}} under either one of the following conditions:

  1. 1

either h{\nabla}h is LhL_{h}-Lipschitz, in which case Lf^β=Lhγmax{βαpf,h,βαpf,h}L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}{}={}\tfrac{L_{h}}{\gamma}\max\{\beta-\alpha p_{f,h},\,-\beta-\alpha p_{-f,h}\},

  2. 2

    or β=0\beta=0 and f{\nabla}f is LfL_{f}-Lipschitz, in which case Lf^β=LfL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}=L_{f}.

Recalling that f^β=fβγh\vphantom{f}\smash{\hat{f}}_{\!\beta}=f-\frac{\beta}{\gamma}h, the second condition is tautological. In case h{\nabla}h is LhL_{h}-Lipschitz, 2.5 yields that

Lhγ[βαpf,h]+𝒿αpf,hβγhf^βαpf,h+βγhLhγ[βαpf,h]+𝒿,-\tfrac{L_{h}}{\gamma}[\beta-\alpha p_{f,h}]_{+}\mathcal{j}{}\preceq{}\tfrac{\alpha p_{f,h}-\beta}{\gamma}h{}\preceq{}\vphantom{f}\smash{\hat{f}}_{\!\beta}{}\preceq{}-\tfrac{\alpha p_{-f,h}+\beta}{\gamma}h{}\preceq{}\tfrac{L_{h}}{\gamma}[-\beta-\alpha p_{-f,h}]_{+}\mathcal{j},

and that consequently

Lf^βLhγmax{[βαpf,h]+,[βαpf,h]+}=Lhγmax{βαpf,h,βαpf,h}L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}{}\coloneqq{}\tfrac{L_{h}}{\gamma}\max\{[\beta-\alpha p_{f,h}]_{+},\,[-\beta-\alpha p_{-f,h}]_{+}\}{}={}\tfrac{L_{h}}{\gamma}\max\{\beta-\alpha p_{f,h},\,-\beta-\alpha p_{-f,h}\}

is a Lipschitz modulus for f^β{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}. ∎
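As a quick numerical sanity check (the helper below is ours, not from the paper), note that in the worst case p_{f,h} = p_{−f,h} = −1 the modulus of item 1 collapses to (L_h/γ)(α+|β|), the value used later in Corollary 3.9:

```python
def lip_modulus(L_h, gamma, alpha, beta, p_f, p_mf):
    # Remark 3.8, item 1: a Lipschitz modulus for the gradient of f-hat_beta
    # when the gradient of h is L_h-Lipschitz
    return (L_h / gamma) * max(beta - alpha * p_f, -beta - alpha * p_mf)

L_h, gamma, alpha, beta = 2.0, 0.1, 0.3, -0.25
worst = lip_modulus(L_h, gamma, alpha, beta, p_f=-1, p_mf=-1)
# worst case: max(beta + alpha, -beta + alpha) = alpha + |beta|
```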

3.4 Simplified bounds

In this section we provide bounds that only discern whether ff is convex, concave, or neither of the above. As discussed in 2.3, these cases can be recovered by suitable combinations of the coefficients p±f,h{0,±1}p_{\pm f,h}\in\{0,\pm 1\}, and thus lead to easier, though possibly looser, bounds compared to those in 3.6. We will also avail ourselves of the estimates of Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}} in 3.8 to discuss the cases in which f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} is Lipschitz differentiable.

Without distinguishing between upper and lower relative bounds, whenever ff is Lf,hL_{f,h}-smooth relative to hh as in I one can consider σ±f,h=Lf,h\sigma_{\pm f,h}=-L_{f,h} or, equivalently, pf,h=pf,h=1p_{f,h}=p_{-f,h}=-1. Plugging these values into 3.6 yields the following.

Corollary 3.9 (worst-case bounds).

Suppose that I holds. All the claims of 3.6 hold when γ>0\gamma>0, β\varmathbbR\beta\in\varmathbb R and c>0c>0 are such that

  1. 1

    either 1/2<β<0-1/2<\beta<0 and γ(1/Lf,h)min{β,(1+2βc)/3},\gamma\leq(1/L_{f,h})\min\{-\beta,\,(1+2\beta-c)/3\}, in which case ξ=f^β\xi=\vphantom{f}\smash{\hat{f}}_{\!\beta};

  2. 2

    or hh is σh\sigma_{h}-strongly convex and LhL_{h}-Lipschitz differentiable, |β|<σh/(2Lh)|\beta|<\sigma_{h}/(2L_{h}) and γ(1/Lf,h)[(σh2Lh|β|c)/(σh+2Lh)],\gamma\leq(1/L_{f,h})[(\sigma_{h}-2L_{h}|\beta|-c)/(\sigma_{h}+2L_{h})], in which case ξ=(Lh/γ)(α+|β|)𝒿\xi=(L_{h}/\gamma)(\alpha+|\beta|)\mathcal{j};

  3. 3

    or hh is σh\sigma_{h}-strongly convex, f{\nabla}f is LfL_{f}-Lipschitz, β=0\beta=0 and γ(σhc)/(σhLf,h+2Lf),\gamma\leq(\sigma_{h}-c)/(\sigma_{h}L_{f,h}+2L_{f}), in which case ξ=Lf𝒿\xi=L_{f}\mathcal{j}.

Proof.

We will discuss only the bounds on γ\gamma as those on β\beta will automatically follow from the fact that γ>0\gamma>0. Setting p±f,h=1p_{\pm f,h}=-1 in 3.6, one has:

  • 1  From 1 we obtain 0<c=1+2β3α0{}<{}c{}={}1+2\beta-3\alpha and (13α)/2<βα-(1-3\alpha)/2{}<{}\beta{}\leq{}-\alpha. By replacing the first equality with “\leq”, that is, by possibly allowing looser values of cc, the bound on γ=α/Lf,h\gamma=\alpha/L_{f,h} as in 1 is obtained.

  • 2 & 3  The two subcases refer to the corresponding items in 3.8. We shall show only the first one, as the second is a trivial adaptation. The value of Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}} as in 1 reduces to Lf^β=(Lh/γ)(α+|β|)L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}=(L_{h}/\gamma)(\alpha+|\beta|). Plugged into 2, this yields 0<c=(1α)σh2Lh(α+|β|)0{}<{}c{}={}(1-\alpha)\sigma_{h}{}-{}2L_{h}(\alpha+|\beta|), implying that γ\gamma is bounded as in 2. ∎
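The stepsize bookkeeping of the corollary can be packaged in a few lines. The sketch below (hypothetical helper names, sample constants) evaluates the admissible bounds of items 1 and 2 and checks that they are positive under the stated restrictions on β and c:

```python
def gamma_bound_1(L_fh, beta, c):
    # Corollary 3.9, item 1: requires -1/2 < beta < 0 and 0 < c < 1 + 2*beta
    return (1.0 / L_fh) * min(-beta, (1 + 2 * beta - c) / 3)

def gamma_bound_2(L_fh, sigma_h, L_h, beta, c):
    # Corollary 3.9, item 2: requires |beta| < sigma_h/(2*L_h) and c small enough
    return (1.0 / L_fh) * (sigma_h - 2 * L_h * abs(beta) - c) / (sigma_h + 2 * L_h)

g1 = gamma_bound_1(L_fh=2.0, beta=-0.1, c=0.1)
g2 = gamma_bound_2(L_fh=2.0, sigma_h=1.0, L_h=1.5, beta=0.2, c=0.1)
```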

When ff is convex, σf,h=0\sigma_{f,h}=0 can be considered resulting in pf,h=0p_{f,h}=0 and pf,h=1p_{-f,h}=-1.

Corollary 3.10 (bounds when ff is convex).

Suppose that I holds and that ff is convex. All the claims of 3.6 remain valid if γ>0\gamma>0, β\varmathbbR\beta\in\varmathbb R and c>0c>0 are such that

  1. 1

    either 1/2<β0-1/2<\beta\leq 0 and γ(1/Lf,h)[(1+2βc)/3],\gamma\leq(1/L_{f,h})[(1+2\beta-c)/3], in which case ξ=f^β\xi=\vphantom{f}\smash{\hat{f}}_{\!\beta};

  2. 2

    or hh is σh\sigma_{h}-strongly convex and LhL_{h}-Lipschitz differentiable,

    |β|<σh2Lhandγ1Lf,hmin{σh+2Lhβcσh+2Lh,σh2Lhβcσh},|\beta|<\tfrac{\sigma_{h}}{2L_{h}}\quad\text{and}\quad\gamma\leq\tfrac{1}{L_{f,h}}\min\{\tfrac{\sigma_{h}+2L_{h}\beta-c}{\sigma_{h}+2L_{h}},\,\tfrac{\sigma_{h}-2L_{h}\beta-c}{\sigma_{h}}\},

    in which case ξ=(Lh/γ)max{β,αβ}𝒿\xi=(L_{h}/\gamma)\max\{\beta,\,\alpha-\beta\}\mathcal{j};

  3. 3

    or hh is σh\sigma_{h}-strongly convex, f{\nabla}f is LfL_{f}-Lipschitz, β=0\beta=0, and γ(σhc)/(σhLf,h+2Lf),\gamma\leq(\sigma_{h}-c)/(\sigma_{h}L_{f,h}+2L_{f}), in which case ξ=Lf𝒿\xi=L_{f}\mathcal{j}.

Proof.

As motivated in the proof of 3.9, it suffices to prove the bounds on γ\gamma. Similarly, the proof of assertion 3 is an easy adaptation of that of 2. Setting pf,h=0p_{f,h}=0 and pf,h=1p_{-f,h}=-1 in 3.6, one has:

  • 1  From 1 we obtain 0<c=1+2β3α0{}<{}c{}={}1+2\beta-3\alpha and (13α)/2<β0-(1-3\alpha)/2{}<{}\beta{}\leq{}0. Again by possibly allowing looser values of cc, the bound on γ=α/Lf,h\gamma=\alpha/L_{f,h} as in 1 is obtained.

  • 2  The value of Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}} as in 1 reduces to Lhγmax{β,αβ}\frac{L_{h}}{\gamma}\max\{\beta,\,\alpha-\beta\}. Plugged into 2, this yields 0<c=(1α)σh2Lhmax{β,αβ}0{}<{}c{}={}(1-\alpha)\sigma_{h}{}-{}2L_{h}\max\{\beta,\,\alpha-\beta\}, which can be loosened as

    {c(1α)σh2Lhβc(1α)σh2Lh(αβ).{\mathopen{}\left\{\begin{array}[]{@{}l@{}l@{}}c{}\leq{}(1-\alpha)\sigma_{h}{}-{}2L_{h}\beta\\ c{}\leq{}(1-\alpha)\sigma_{h}{}-{}2L_{h}(\alpha-\beta).\end{array}\right.\mathclose{}}

    In terms of γ=α/Lf,h\gamma=\alpha/L_{f,h}, this results in the bound of assertion 2. ∎

Similarly, when ff is concave (that is, f-f is convex), σf,h=0\sigma_{-f,h}=0 can be considered, resulting in pf,h=1p_{f,h}=-1 and pf,h=0p_{-f,h}=0. The proof is omitted, as it uses the same arguments as in the previous results.

Corollary 3.11 (bounds when ff is concave).

Suppose that I holds and that ff is concave. All the claims of 3.6 remain valid if γ>0\gamma>0, β\varmathbbR\beta\in\varmathbb R and c>0c>0 are such that

  1. 1

    either (c1)/2β<0(c-1)/2\leq\beta<0 and γβ/Lf,h,\gamma\leq-\beta/L_{f,h}, in which case ξ=f^β\xi=\vphantom{f}\smash{\hat{f}}_{\!\beta};

  2. 2

    or hh is σh\sigma_{h}-strongly convex and LhL_{h}-Lipschitz differentiable,

    cσh2Lhβ<σh2Lhandγ1Lf,hσh2Lhβc2Lh,\tfrac{c-\sigma_{h}}{2L_{h}}\leq\beta<\tfrac{\sigma_{h}}{2L_{h}}\quad\text{and}\quad\gamma\leq\tfrac{1}{L_{f,h}}\tfrac{\sigma_{h}-2L_{h}\beta-c}{2L_{h}},

in which case ξ=Lhγmax{α+β,β}𝒿\xi=\frac{L_{h}}{\gamma}\max\{\alpha+\beta,\,-\beta\}\mathcal{j};

  3. 3

    or hh is σh\sigma_{h}-strongly convex, ff is LfL_{f}-Lipschitz differentiable, β=0\beta=0 and γ(σhc)/(2Lf),\gamma\leq(\sigma_{h}-c)/(2L_{f}), in which case ξ=Lf𝒿\xi=L_{f}\mathcal{j}.

4 Convergence analysis

In this section we study the behavior of sequences generated by ii^{*}FRB. Although some basic convergence results can be derived in the full generality of I, establishing local optimality guarantees of the limit point(s) will ultimately require an additional full domain assumption.

Assumption II.

Function hh has full domain, that is, C=\varmathbbRnC=\varmathbb R^{n}.

II is standard for nonconvex splitting algorithms in the relative smoothness setting. To the best of our knowledge, the question of whether this requirement can be removed remains open; see, e.g., [35] and the references therein.

4.1 Function value convergence

We begin with the convergence of the merit function values.

Theorem 4.1 (function value convergence of ii^{*}FRB).

Let (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} be a sequence generated by ii^{*}FRB (Algorithm 1) in the setting of 3.6. Then,

  1. 1.

    It holds that

    γ,βh-frb(xk+1,xk)γ,βh-frb(xk,xk1)c2γDh(xk+1,xk)c2γDh(xk,xk1).\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}{\mathopen{}\left(x^{k+1},x^{k}\right)\mathclose{}}\leq\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}{\mathopen{}\left(x^{k},x^{k-1}\right)\mathclose{}}-\tfrac{c}{2\gamma}\operatorname{D}_{h}{\mathopen{}\left(x^{k+1},x^{k}\right)\mathclose{}}-\tfrac{c}{2\gamma}\operatorname{D}_{h}{\mathopen{}\left(x^{k},x^{k-1}\right)\mathclose{}}. (4.1)

    Then, k=0Dh(xk,xk1)<\sum_{k=0}^{\infty}\operatorname{D}_{h}{\mathopen{}\left(x^{k},x^{k-1}\right)\mathclose{}}<\infty and γ,βh-frb(xk,xk1)φ\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},x^{k-1})\to\varphi^{\star} as kk\to\infty for some φinfφC¯\varphi^{\star}\geq\operatorname*{inf}\varphi_{\overline{C}}.

  2. 2.

    If φC¯\varphi_{\overline{C}} is level bounded, then (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} is bounded.

  3. 3.

    Suppose that II holds, and let ω\omega be the set of limit points of (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N}. Then, φ\varphi is constant on ω\omega with value φ\varphi^{\star}, and for every xωx^{\star}\in\omega it holds that xTγ,βh-frb(x,x)x^{\star}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star}) and 0^φ(x)0\in\hat{\partial}\varphi(x^{\star}).

Proof.
  • 1  Recall from 3.6 that (4.1) holds and that infγ,βh-frb=infφC¯>\operatorname*{inf}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}=\operatorname*{inf}\varphi_{\overline{C}}>-\infty, from which the convergence of (γ,βh-frb(xk,xk1))k\varmathbbN(\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},x^{k-1}))_{k\in\varmathbb N} readily follows. In turn, a telescoping argument on (4.1) shows that k\varmathbbNDh(xk,xk1)<\sum_{k\in\varmathbb N}\operatorname{D}_{h}(x^{k},x^{k-1})<\infty.

  • 2  It follows from Item 1 that γ,βh-frb(xk+1,xk)γ,βh-frb(x0,x1)\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}{\mathopen{}\left(x^{k+1},x^{k}\right)\mathclose{}}\leq\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{0},x^{-1}) holds for every kk. Then boundedness of (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} is implied by level boundedness of γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}; see Item 2.

  • 3  Suppose that a subsequence (xkj)j\varmathbbN(x^{k_{j}})_{j\in\varmathbb N} converges to a point xx^{\star}; then so do (xkj±1)j\varmathbbN(x^{k_{j}\pm 1})_{j\in\varmathbb N} by Item 1 and [6, Prop. 2.2(iii)]. Since xkj+1Tγ,βh-frb(xkj,xkj1)x^{k_{j}+1}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k_{j}},x^{k_{j}-1}), passing to the limit and invoking outer semicontinuity of Tγ,βh-frb\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} (Item 3) implies that xTγ,βh-frb(x,x)x^{\star}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star}). Invoking Item 5 yields the stationarity condition 0^φ(x)0\in\hat{\partial}\varphi(x^{\star}). Moreover, by continuity of γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} one has

    φ=(def)limkγ,βh-frb(xk,xk1)=γ,βh-frb(x,x)=φ(x),\varphi^{\star}{}\mathrel{{}\mathop{=}\limits^{\text{\clap{\tiny(def)}}}}{}\lim_{k\to\infty}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}{\mathopen{}\left(x^{k},x^{k-1}\right)\mathclose{}}{}={}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star}){}={}\varphi(x^{\star}),

where the last equality follows from Item 2, owing to the inclusion xTγ,βh-frb(x,x)x^{\star}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star}) (and the fact that Dψ(x,x)=0\operatorname{D}_{\psi}(x,x)=0 for any differentiable function ψ\psi). Since xωx^{\star}\in\omega was arbitrary, we conclude that φφ\varphi\equiv\varphi^{\star} on ω\omega. ∎
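The summability claim in item 1 rests only on telescoping the sufficient decrease (4.1) against the lower bound on the merit function. A toy illustration with synthetic merit values (and a simplified one-gap version of (4.1), chosen to hold with equality):

```python
gamma, c = 0.5, 0.8
# synthetic merit values: nonincreasing, bounded below by inf L = 0
L = [1.0 / (k + 1) for k in range(10_000)]
# gaps defined so that L[k+1] = L[k] - (c / (2*gamma)) * D[k]
D = [(2 * gamma / c) * (L[k] - L[k + 1]) for k in range(len(L) - 1)]
# partial sums telescope: sum(D) = (2*gamma/c) * (L[0] - L[-1]) <= (2*gamma/c) * L[0]
total = sum(D)
```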

It is now possible to demonstrate the necessity of some of the bounds on the stepsize discussed in Section 3.4, by showing that Dh(xk+1,xk)\operatorname{D}_{h}(x^{k+1},x^{k}) may otherwise fail to vanish. Note that, for β=0\beta=0, the following counterexample constitutes a tightness certificate for the bound γ<1/(3Lf)\gamma<1/(3L_{f}) derived in [42] in the noninertial Euclidean case.

Example 4.2.

The bound α=γLf,h<(1+2β)/3\alpha=\gamma L_{f,h}<(1+2\beta)/3 is tight even in the Euclidean case. To see this, consider g=δ{±1}g=\operatorname{\delta}_{\{\pm 1\}} and for a fixed L>0L>0 let f(x)=Lh(x)=L2x2f(x)=Lh(x)=\frac{L}{2}x^{2}. Then, one has Lf,h=σf,h=LL_{f,h}=\sigma_{f,h}=L and σf,h=L\sigma_{-f,h}=-L. For γ<1/L=1/[σf,h]\gamma<1/L=1/[\sigma_{-f,h}]_{-}, it is easy to see that

Tγ,βh-frb(x,x)=sgn(h^(x)f^β(x)+f^β(x))=sgn((12α+β)x+(αβ)x)\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}={}\operatorname{sgn}\bigl{(}{\nabla}\hat{h}(x)-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x)+{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{-})\bigr{)}{}={}\operatorname{sgn}\bigl{(}(1-2\alpha+\beta)x+(\alpha-\beta)x^{-}\bigr{)}

(with sgn0{±1}\operatorname{sgn}0\coloneqq\{\pm 1\}). If α(1+2β)/3\alpha\geq(1+2\beta)/3, then ((1)k)k\varmathbbN((-1)^{k})_{k\in\varmathbb N} is a sequence generated by ii^{*}FRB for which Dh(xk+1,xk)2↛0\operatorname{D}_{h}(x^{k+1},x^{k})\equiv 2\not\to 0. ∎
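The dynamics of the example are easy to reproduce. The sketch below simulates the closed-form update with β = 0, breaking the tie sgn 0 = {±1} arbitrarily: above the threshold α = 1/3 the iterates keep alternating between ±1, while below it they settle at a vertex:

```python
def step(x, x_prev, alpha, beta):
    # closed-form i*FRB update of Example 4.2:
    # sgn((1 - 2*alpha + beta)*x + (alpha - beta)*x_prev)
    s = (1 - 2 * alpha + beta) * x + (alpha - beta) * x_prev
    return 1.0 if s >= 0 else -1.0   # break the tie sgn 0 = {±1} as +1

def last_two(alpha, beta, n=20):
    x_prev, x = 1.0, -1.0
    for _ in range(n):
        x_prev, x = x, step(x, x_prev, alpha, beta)
    return x_prev, x

osc = last_two(alpha=0.4, beta=0.0)   # alpha >= 1/3: oscillation persists
fix = last_two(alpha=0.2, beta=0.0)   # alpha < 1/3: iterates settle
```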

As a consequence of Item 1, the condition Dh(xk+1,xk)ε\operatorname{D}_{h}(x^{k+1},x^{k})\leq\varepsilon is satisfied within finitely many iterations for any tolerance ε>0\varepsilon>0. While this could be used as a termination criterion, in the generality of Assumptions I and II there is no guarantee on the relaxed stationarity measure dist(0,^φ(xk+1))\operatorname{dist}(0,\hat{\partial}\varphi(x^{k+1})), which through Item 4 can only be estimated as

dist(0,^φ(xk+1))vk+1withvk+1h^(xk)h^(xk+1)f^β(xk)+f^β(xk1).\operatorname{dist}(0,\hat{\partial}\varphi(x^{k+1})){}\leq{}\|v^{k+1}\|\leavevmode\nobreak\ \leavevmode\nobreak\ \text{with}\leavevmode\nobreak\ \leavevmode\nobreak\ v^{k+1}\coloneqq{\nabla}\hat{h}(x^{k})-{\nabla}\hat{h}(x^{k+1})-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k})+{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k-1}). (4.2)

On the other hand, to account for possibly unbounded sequences, additional assumptions are needed for the condition vk+1ε\|v^{k+1}\|\leq\varepsilon to be met within finitely many iterations.
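Computing the residual in (4.2) only requires the two gradient oracles. A minimal sketch (with placeholder one-dimensional gradients, since ĥ and f̂_β are problem data not fixed here); note that the residual vanishes identically whenever three consecutive iterates coincide, regardless of the gradients:

```python
import math

def residual(grad_h_hat, grad_f_beta, x_next, x, x_prev):
    # v^{k+1} of (4.2); its magnitude bounds dist(0, ∂φ(x^{k+1})) from above
    return (grad_h_hat(x) - grad_h_hat(x_next)
            - grad_f_beta(x) + grad_f_beta(x_prev))

# placeholder one-dimensional gradient oracles (illustrative only)
g_h = lambda x: 2.0 * x + math.sin(x)
g_f = lambda x: 0.5 * math.cos(x)

v_fixed = residual(g_h, g_f, 1.3, 1.3, 1.3)   # equal iterates: residual is 0
```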

Lemma 4.3 (termination criteria).

Suppose that II holds, and let (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} be a sequence generated by ii^{*}FRB (Algorithm 1) in the setting of 3.6. If

  1. 1

    either φ\varphi is level bounded,

  2. 2

    or hh^{\ast} is uniformly convex (equivalently, hh is uniformly smooth),

then, for vk+1v^{k+1} as in (4.2) it holds that vk+10v^{k+1}\to 0. Thus, for any ε>0\varepsilon>0 the condition vk+1ε\|v^{k+1}\|\leq\varepsilon is satisfied for all kk large enough and guarantees dist(0,^φ(xk+1))ε\operatorname{dist}(0,\hat{\partial}\varphi(x^{k+1}))\leq\varepsilon.

Proof.

That vk+1ε\|v^{k+1}\|\leq\varepsilon implies ε\varepsilon-stationarity of xk+1x^{k+1} has already been discussed. If φ\varphi is level bounded, then 4.1 implies that (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} is bounded and vk+10v^{k+1}\to 0 holds by continuity of h^{\nabla}\hat{h} and f^β{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}. In case hh^{\ast} is uniformly convex, this being equivalent to uniform smoothness of hh as shown in [4, Cor. 2.8], the vanishing of Dh(h(xk),h(xk+1))=Dh(xk+1,xk)\operatorname{D}_{h^{\ast}}({\nabla}h(x^{k}),{\nabla}h(x^{k+1}))=\operatorname{D}_{h}(x^{k+1},x^{k}) implies through [32, Prop. 4.13(IV)] that h(xk)h(xk+1)0\|{\nabla}h(x^{k})-{\nabla}h(x^{k+1})\|\to 0. In turn, by relative smoothness, necessarily f^β(xk+1)f^β(xk)0\|{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k+1})-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k})\|\to 0 as well, overall proving that vk+10v^{k+1}\to 0. ∎

4.2 Global convergence

In this subsection, we work towards the global sequential convergence of ii^{*}FRB. To this end, we introduce a key concept that will soon be useful. For η(0,]\eta\in(0,\infty], denote by Ψη\Psi_{\eta} the class of functions ψ:[0,η)\varmathbbR+\psi:[0,\eta)\rightarrow\varmathbb R_{+} satisfying the following: (i) ψ\psi is right-continuous at t=0t=0 with ψ(0)=0\psi(0)=0; (ii) ψ\psi is strictly increasing on [0,η)[0,\eta).

Definition 4.4 ([41, Def. 5]).

Let f:\varmathbbRnf:\varmathbb R^{n}\rightarrow\overline{\varmathbb R} be proper and lsc. Let x¯domf\bar{x}\in\operatorname{dom}\partial f and μ\varmathbbR\mu\in\varmathbb R, and let VdomfV\subseteq\operatorname{dom}\partial f be a nonempty subset.

  1. 1.

    We say that ff has the pointwise generalized concave Kurdyka-Łojasiewicz (KL) property at x¯domf\bar{x}\in\operatorname{dom}\partial f, if there exist a neighborhood Ux¯U\ni\bar{x}, η(0,]\eta\in(0,\infty] and a concave ψΨη\psi\in\Psi_{\eta}, such that for all xU[0<ff(x¯)<η]x\in U\cap[0<f-f(\bar{x})<\eta],

    ψ(f(x)f(x¯))dist(0,f(x))1,\psi^{\prime}_{-}\bigl{(}f(x)-f(\bar{x})\bigr{)}\cdot\operatorname{dist}\bigl{(}0,\partial f(x)\bigr{)}\geq 1,

    where ψ\psi_{-}^{\prime} denotes the left derivative. Moreover, ff is a generalized concave KL function if it has the generalized concave KL property at every xdomfx\in\operatorname{dom}\partial f.

  2. 2.

Suppose that f(x)=μf(x)=\mu on VV. We say ff has the setwise generalized concave KL property on VV (the adjectives “pointwise” and “setwise” shall be omitted whenever there is no ambiguity) if there exist UVU\supset V, η(0,]\eta\in(0,\infty] and a concave ψΨη\psi\in\Psi_{\eta} such that for every xU[0<fμ<η]x\in U\cap[0<f-\mu<\eta],

    ψ(f(x)μ)dist(0,f(x))1.\psi^{\prime}_{-}\bigl{(}f(x)-\mu\bigr{)}\cdot\operatorname{dist}\bigl{(}0,\partial f(x)\bigr{)}\geq 1.
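A concrete instance may help: for f(x) = x² with x̄ = 0, the concave choice ψ(t) = √t desingularizes f, since ψ′(x²)·dist(0,∂f(x)) = (1/(2|x|))·|2x| = 1. The sketch below (a sample-point check, not a proof) verifies the inequality of item 1:

```python
import math

f = lambda x: x * x                        # critical point at 0, f(0) = 0
df = lambda x: 2.0 * x                     # here the subdifferential is {2x}
psi_prime = lambda t: 0.5 / math.sqrt(t)   # psi(t) = sqrt(t): concave, psi(0) = 0

# check psi'_-(f(x) - f(0)) * dist(0, df(x)) >= 1 on sample points with f(x) > 0
samples = [k / 100 for k in range(-100, 101) if k != 0]
kl_holds = all(psi_prime(f(x)) * abs(df(x)) >= 1.0 - 1e-12 for x in samples)
```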

For a subset Ω\varmathbbRn\Omega\subseteq\varmathbb R^{n}, define (ε>0)(\forall\varepsilon>0) Ωε={x\varmathbbRndist(x,Ω)<ε}\Omega_{\varepsilon}={\mathopen{}\left\{x\in\varmathbb R^{n}{}\mathrel{\mid}{}\operatorname{dist}(x,\Omega)<\varepsilon\right\}\mathclose{}}.

Lemma 4.5 ([41, Lem. 4.4]).

Let f:\varmathbbRnf:\varmathbb R^{n}\rightarrow\overline{\varmathbb R} be proper and lsc, and let μ\varmathbbR\mu\in\varmathbb R. Let Ωdomf\Omega\subseteq\operatorname{dom}\partial f be a nonempty compact set on which f(x)=μf(x)=\mu for all xΩx\in\Omega. Then the following hold:

  1. 1.

    Suppose that ff satisfies the pointwise generalized concave KL property at each xΩx\in\Omega. Then there exist ε>0,η(0,]\varepsilon>0,\eta\in(0,\infty] and ψΨη\psi\in\Psi_{\eta} such that ff has the setwise generalized concave KL property on Ω\Omega with respect to U=ΩεU=\Omega_{\varepsilon}, η\eta and ψ\psi.

  2. 2.

    Set U=ΩεU=\Omega_{\varepsilon} and define h:(0,η)\varmathbbR+h:(0,\eta)\rightarrow\varmathbb R_{+} by

    h(s)=sup{dist1(0,f(x))xU[sfμ<η]}.h(s)=\sup{\mathopen{}\left\{\operatorname{dist}^{-1}{\mathopen{}\left(0,\partial f(x)\right)\mathclose{}}{}\mathrel{\mid}{}x\in U\cap[s\leq f-\mu<\eta]\right\}\mathclose{}}.

    Then the function ψ~:[0,η)\varmathbbR+\tilde{\psi}:[0,\eta)\rightarrow\varmathbb R_{+} defined by (0<t<η)ψ~(t)=0th(s)𝑑s(\forall 0<t<\eta)\leavevmode\nobreak\ \tilde{\psi}(t)=\int_{0}^{t}h(s)ds with ψ~(0)=0\tilde{\psi}(0)=0 is well defined and belongs to Ψη\Psi_{\eta}. The function ff has the setwise generalized concave KL property on Ω\Omega with respect to UU, η\eta and ψ~\tilde{\psi}. Moreover,

ψ~=inf{ψΨηψ is a concave desingularizing function of f on Ω with respect to U and η}.\tilde{\psi}{}={}\operatorname*{inf}\{\psi\in\Psi_{\eta}\mid\psi\text{ is a concave desingularizing function of }f\text{ on }\Omega\text{ with respect to }U\text{ and }\eta\}.

We say that ψ~\tilde{\psi} is the exact modulus of the setwise generalized concave KL property of ff on Ω\Omega with respect to UU and η\eta.

In the remainder of the section, we will make use of the norm ||||||{|\kern-0.86108pt|\kern-0.86108pt|{}\cdot{}|\kern-0.86108pt|\kern-0.86108pt|} on the product space \varmathbbRn×\varmathbbRn\varmathbb R^{n}\times\varmathbb R^{n} defined as |(x,y)|=x+y{|\kern-0.86108pt|\kern-0.86108pt|(x,y)|\kern-0.86108pt|\kern-0.86108pt|}=\|x\|+\|y\|.

Theorem 4.6 (sequential convergence of ii^{*}FRB).

Suppose that II holds, and let (x^{k})_{k\in\varmathbb N} be a sequence generated by i^{*}FRB (Algorithm 1) in the setting of 3.6. Define (\forall k\in\varmathbb N) z^{k}=(x^{k+1},x^{k},x^{k-1}) and let \omega(z^{0}) be the set of limit points of (z^{k})_{k\in\varmathbb N}. Define

(ω,x,x\varmathbbRn)γ,βh-frb(ω,x,x)=γ,βh-frb(ω;x,x)+c2γDh(x,x)+Dξ(x,x).(\forall\omega,x,x^{-}\in\varmathbb R^{n})\leavevmode\nobreak\ \mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(\omega,x,x^{-})=\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\omega;x,x^{-})+\tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-})+D_{\xi}(x,x^{-}).

Assume in addition the following:

  1.

    φ\varphi is level bounded.

  2.

    f,hf,h are twice continuously differentiable and 2h\nabla^{2}h is positive definite.

  3.

    (\exists r>0)\leavevmode\nobreak\ (\exists\eta\in(0,\infty]) \mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}} satisfies the generalized concave KL property on \Omega=\omega(z^{0}) with respect to U=\Omega_{r} and \eta.

Then k=0xk+1xk<\sum_{k=0}^{\infty}\|{}x^{k+1}-x^{k}{}\|<\infty and there exists xx^{\star} with 0^φ(x)0\in\hat{\partial}\varphi(x^{\star}) such that xkxx^{k}\to x^{\star} as kk\to\infty. To be specific, there exists k0\varmathbbNk_{0}\in\varmathbb N such that for all lk0+1l\geq k_{0}+1

k=0xk+1xkk=0lxk+1xk+4γMcσψ~(γ,βh-frb(xl,xl1)φ),\sum_{k=0}^{\infty}\|{}x^{k+1}-x^{k}{}\|{}\leq{}\sum_{k=0}^{l}\|{}x^{k+1}-x^{k}{}\|{}+{}\tfrac{4\gamma M}{c\sigma}\tilde{\psi}{\mathopen{}\left(\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{l},x^{l-1})-\varphi^{\star}\right)\mathclose{}}, (4.3)

where M>0M>0 is some constant, σ>0\sigma>0 is the strong convexity modulus of hh on \mathcal{B}, the closed ball in which (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} lies, and ψ~\tilde{\psi} is the exact modulus of the generalized concave KL property associated with γ,βh-frb\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}} produced by Item 2.

Proof.

Set (\forall k\in\varmathbb N) \delta_{k}=\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(z^{k}) for simplicity. Then \delta_{k}=\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},x^{k-1}), \delta_{k} decreases to \varphi^{\star}, and \operatorname{dist}(z^{k},\Omega)\to 0 as k\to\infty by invoking 4.1. Assume without loss of generality that (\forall k\in\varmathbb N) \delta_{k}>\varphi^{\star}; otherwise we would have (\exists k_{0}\in\varmathbb N) x^{k_{0}}=x^{k_{0}+1} due to Item 1, from which the desired result readily follows by simple induction. Thus (\exists k_{0}\in\varmathbb N) (\forall k\geq k_{0}) z^{k}\in\Omega_{r}\cap[0<\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}-\varphi^{\star}<\eta]. Appealing to Item 3 and Item 1 yields that \mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}} is constant, equal to \varphi^{\star}, on the compact set \omega(z^{0}). In turn, all conditions in Item 2 are satisfied, which implies that for k\geq k_{0}

1(ψ~)(δkφ)dist(0,γ,βh-frb(zk)).1\leq(\tilde{\psi})^{\prime}_{-}{\mathopen{}\left(\delta_{k}-\varphi^{\star}\right)\mathclose{}}\operatorname{dist}{\mathopen{}\left(0,\partial\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(z^{k})\right)\mathclose{}}. (4.4)

Define (k\varmathbbN)(\forall k\in\varmathbb N)

uk=\displaystyle u^{k}={} 2h^(xk)(xkxk+1)+2f^β(xk)(xk+1xk)+c2γ(h(xk)h(xk1))\displaystyle\nabla^{2}\hat{h}(x^{k})(x^{k}-x^{k+1})+\nabla^{2}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k})(x^{k+1}-x^{k})+\tfrac{c}{2\gamma}\bigl{(}{\nabla}h(x^{k})-{\nabla}h(x^{k-1})\bigr{)}
+f^β(xk1)f^β(xk)+ξ(xk)ξ(xk1),\displaystyle+{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k-1})-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k})+{\nabla}\xi(x^{k})-{\nabla}\xi(x^{k-1}),
vk=\displaystyle v^{k}={} 2f^β(xk1)(xkxk+1)+c2γ2h(xk1)(xk1xk)+2ξ(xk1)(xk1xk).\displaystyle\nabla^{2}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k-1})(x^{k}-x^{k+1})+\tfrac{c}{2\gamma}\nabla^{2}h(x^{k-1})(x^{k-1}-x^{k})+\nabla^{2}\xi(x^{k-1})(x^{k-1}-x^{k}).

Then \|u^{k}\|\leq(M_{1}+M_{2})\|x^{k+1}-x^{k}\|+(M_{1}+(c/2\gamma)M_{3}+M_{4})\|x^{k}-x^{k-1}\| and \|v^{k}\|\leq M_{1}\|x^{k+1}-x^{k}\|+((c/2\gamma)M_{3}+M_{4})\|x^{k}-x^{k-1}\|, where M_{1}=\sup\|\nabla^{2}\vphantom{f}\smash{\hat{f}}_{\!\beta}(\mathcal{B})\|, M_{2}=\sup\|\nabla^{2}\hat{h}(\mathcal{B})\|, M_{3}=\sup\|\nabla^{2}h(\mathcal{B})\|, and M_{4}=\sup\|\nabla^{2}\xi(\mathcal{B})\|. Applying subdifferential calculus to \partial\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(z^{k}) yields that

γ,βh-frb(zk)=(φ(xk+1)+h^(xk+1)h^(xk)+f^β(xk)f^β(xk1))×{uk}×{vk},\partial\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(z^{k}){}={}\bigl{(}\partial\varphi(x^{k+1})+{\nabla}\hat{h}(x^{k+1})-{\nabla}\hat{h}(x^{k})+{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k})-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k-1})\bigr{)}\times\{u^{k}\}\times\{v^{k}\},

which together with Item 4 entails that (0,uk,vk)γ,βh-frb(zk)(0,u^{k},v^{k})\in\partial\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(z^{k}). In turn, summing the aforementioned bounds on uk\|{}u^{k}{}\| and vk\|{}v^{k}{}\| gives

dist(0,γ,βh-frb(zk))M|(xk+1xk,xkxk1)|,\operatorname{dist}{\mathopen{}\left(0,\partial\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(z^{k})\right)\mathclose{}}\leq M{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}, (4.5)

where M=max{2M1+M2,M1+(c/γ)M3+2M4}M=\max{\mathopen{}\left\{2M_{1}+M_{2},M_{1}+(c/\gamma)M_{3}+2M_{4}{}\mathrel{\mid}{}{}\right\}\mathclose{}}.

Finally, we show that (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} is convergent. For simplicity, define (k,l\varmathbbN)(\forall k,l\in\varmathbb N) Δk,l=ψ~(δkφ)ψ~(δlφ)\Delta_{k,l}=\tilde{\psi}{\mathopen{}\left(\delta_{k}-\varphi^{\star}\right)\mathclose{}}-\tilde{\psi}{\mathopen{}\left(\delta_{l}-\varphi^{\star}\right)\mathclose{}}. Then, combining (4.4) and (4.5) yields

1\displaystyle 1 M(ψ~)(δkφ)|(xk+1xk,xkxk1)|MΔk,k+1δkδk+1|(xk+1xk,xkxk1)|\displaystyle{}\leq{}M(\tilde{\psi})^{\prime}_{-}(\delta_{k}-\varphi^{\star}){|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}{}\leq{}\tfrac{M\Delta_{k,k+1}}{\delta_{k}-\delta_{k+1}}{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}
2γMΔk,k+1c(Dh(xk+1,xk)+Dh(xk,xk1))|(xk+1xk,xkxk1)|\displaystyle{}\leq{}\frac{2\gamma M\Delta_{k,k+1}}{c{\mathopen{}\left(\operatorname{D}_{h}(x^{k+1},x^{k})+\operatorname{D}_{h}(x^{k},x^{k-1})\right)\mathclose{}}}{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}
2γMΔk,k+1|(xk+1xk,xkxk1)|cσxk+1xk2+cσxkxk124γMΔk,k+1cσ|(xk+1xk,xkxk1)|,\displaystyle{}\leq{}\frac{2\gamma M\Delta_{k,k+1}{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}}{c\sigma\|{}x^{k+1}-x^{k}{}\|^{2}+c\sigma\|{}x^{k}-x^{k-1}{}\|^{2}}\leq\frac{4\gamma M\Delta_{k,k+1}}{c\sigma{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}},

where the second inequality is implied by concavity of ψ~\tilde{\psi}, the third one follows from (4.1), and the fourth one holds because σ>0\sigma>0 is the strong convexity modulus of hh on \mathcal{B}. Hence,

|(xk+1xk,xkxk1)|4γMcσΔk,k+1.{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}\leq\tfrac{4\gamma M}{c\sigma}\Delta_{k,k+1}. (4.6)

Summing (4.6) from k=k0k=k_{0} to an arbitrary lk0+1l\geq k_{0}+1 and passing ll to infinity justifies (4.3). A similar procedure shows that (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} is Cauchy, which together with Item 3 entails the rest of the statement. ∎

Remark 4.7.

Note that ψ~\tilde{\psi} is the smallest concave desingularizing function associated with γ,βh-frb\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}. Therefore (4.3) is the sharpest upper bound on k=0xk+1xk\sum_{k=0}^{\infty}\|{}x^{k+1}-x^{k}{}\| produced by the usual KL convergence framework. We refer readers to [10, §6] for a summary of such a framework. ∎

The corollary below states that semialgebraic functions satisfy the assumptions in 4.6.

Corollary 4.8.

Suppose that II holds, and let (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} be a sequence generated by ii^{*}FRB (Algorithm 1) in the setting of 3.6. Assume in addition that

  1.

    φ\varphi is level-bounded,

  2.

    f,hf,h are twice continuously differentiable and 2h\nabla^{2}h is positive definite, and

  3.

    φ,h\varphi,h are semialgebraic.

Then k=0xk+1xk<\sum_{k=0}^{\infty}\|{}x^{k+1}-x^{k}{}\|<\infty and there exists xx^{\star} with 0^φ(x)0\in\hat{\partial}\varphi(x^{\star}) such that xkxx^{k}\to x^{\star}.

Proof.

Item 2 entails that (x^{k})_{k\in\varmathbb N} is bounded. Note that the class of semialgebraic functions is closed under summation, and that semialgebraic functions satisfy the generalized concave KL property; see, e.g., [2, §4.3] and [41, §2.2]. Hence \mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}} defined in 4.6 satisfies it as well. Applying 4.6 completes the proof. ∎

4.3 Convergence rates

Having established convergence of ii^{*}FRB, we now turn to its rate. Recall that a function is said to have KL exponent θ[0,1)\theta\in[0,1) if it satisfies the generalized concave KL property (recall 4.4) and there exists a desingularizing function of the form ψ(t)=ct1θ\psi(t)=ct^{1-\theta} for some c>0c>0.

Theorem 4.9 (function value and sequential convergence rate).

Suppose that all the assumptions in 4.6 are satisfied, and follow the notation therein. Define (k\varmathbbN)(\forall k\in\varmathbb N) ek=γ,βh-frb(xk+1,xk)φe_{k}=\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k+1},x^{k})-\varphi^{\star}. Assume in addition that γ,βh-frb\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}} has KL exponent θ[0,1)\theta\in[0,1) at (x,x,x)(x^{\star},x^{\star},x^{\star}). Then the following hold:

  1.

    If \theta=0, then e_{k}\to 0 and x^{k}\to x^{\star} in finitely many steps.

  2.

    If θ(0,1/2]\theta\in(0,1/2], then there exist c1,c^1>0c_{1},\hat{c}_{1}>0 and Q1,Q^1[0,1)Q_{1},\hat{Q}_{1}\in[0,1) such that for kk sufficiently large,

    ekc^1Q^1k and xkxc1Q1k.e_{k}\leq\hat{c}_{1}\hat{Q}_{1}^{k}\text{ and }\|{}x^{k}-x^{\star}{}\|\leq c_{1}Q_{1}^{k}.
  3.

    If θ(1/2,1)\theta\in(1/2,1), then there exist c2,c^2>0c_{2},\hat{c}_{2}>0 such that for kk sufficiently large,

    ekc^2k12θ1 and xkxc2k1θ2θ1.e_{k}\leq\hat{c}_{2}k^{-\frac{1}{2\theta-1}}\text{ and }\|{}x^{k}-x^{\star}{}\|\leq c_{2}k^{-\frac{1-\theta}{2\theta-1}}.
Proof.

Assume without loss of generality that γ,βh-frb\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}} has a desingularizing function ψ(t)=t1θ/(1θ)\psi(t)=t^{1-\theta}/(1-\theta) and let (k\varmathbbN)(\forall k\in\varmathbb N) δk=i=kxi+1xi\delta_{k}=\sum_{i=k}^{\infty}\|{}x^{i+1}-x^{i}{}\|. We claim that

(kk0)δk4γM(1θ)cσek11θ+22γcσek1,(\forall k\geq k_{0})\leavevmode\nobreak\ \delta_{k}\leq\tfrac{4\gamma M}{(1-\theta)c\sigma}e_{k-1}^{1-\theta}+2\sqrt{\tfrac{2\gamma}{c\sigma}}\sqrt{e_{k-1}}, (4.7)

which will be justified at the end of this proof. The desired sequential rate then follows from that of (e_{k}) through (4.7); see, e.g., [40, Thm. 5.3]. It therefore suffices to establish the convergence rate of (e_{k}).

Recall from Item 1 that (ek)(e_{k}) is a decreasing sequence converging to 0. Then invoking the KL exponent assumption yields ek1θ=[γ,βh-frb(xk+1,xk,xk1)φ]θdist(0,γ,βh-frb(xk+1,xk,xk1))e_{k-1}^{\theta}=[\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(x^{k+1},x^{k},x^{k-1})-\varphi^{\star}]^{\theta}\leq\operatorname{dist}(0,\partial\mathcal{F}_{\gamma,\beta}^{h\text{\sc-frb}}(x^{k+1},x^{k},x^{k-1})), which together with (4.5) implies that

ekθek1θM|(xk+1xk,xkxk1)|.e_{k}^{\theta}\leq e_{k-1}^{\theta}\leq M{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}. (4.8)

Appealing again to Item 1 gives

ek1ek\displaystyle e_{k-1}-e_{k} =γ,βh-frb(xk,xk1)γ,βh-frb(xk+1,xk)\displaystyle=\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},x^{k-1})-\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k+1},x^{k})
c2γ[Dh(xk+1,xk)+Dh(xk,xk1)]\displaystyle\geq\tfrac{c}{2\gamma}{\mathopen{}\left[\operatorname{D}_{h}(x^{k+1},x^{k})+\operatorname{D}_{h}(x^{k},x^{k-1})\right]\mathclose{}}
cσ8γ|(xk+1xk,xkxk1)|2\displaystyle\geq\tfrac{c\sigma}{8\gamma}{|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}^{2}
cσ8γM2ek2θ,\displaystyle\geq\tfrac{c\sigma}{8\gamma M^{2}}e_{k}^{2\theta},

where the last inequality is implied by (4.8). Then applying [11, Lem. 10] justifies the desired convergence rate of (ek)(e_{k}).

Finally, we show that (4.7) holds. Invoking Assumptions 2 and 1 entails {|\kern-0.86108pt|\kern-0.86108pt|(x^{k+1}-x^{k},x^{k}-x^{k-1})|\kern-0.86108pt|\kern-0.86108pt|}^{2}\leq(4/\sigma)[\operatorname{D}_{h}(x^{k+1},x^{k})+\operatorname{D}_{h}(x^{k},x^{k-1})]\leq(8\gamma/c\sigma)(e_{k-1}-e_{k})\leq(8\gamma/c\sigma)e_{k-1}, and thus \|x^{k}-x^{k-1}\|\leq 2\sqrt{(2\gamma/c\sigma)e_{k-1}}. In turn,

δk\displaystyle\delta_{k} δk+xkxk14γMcσψ~(ek)+22γcσek1\displaystyle\leq\delta_{k}+\|{}x^{k}-x^{k-1}{}\|\leq\tfrac{4\gamma M}{c\sigma}\tilde{\psi}(e_{k})+2\sqrt{\tfrac{2\gamma}{c\sigma}}\sqrt{e_{k-1}}
4γMcσψ~(ek1)+22γcσek14γM(1θ)cσek11θ+22γcσek1,\displaystyle\leq\tfrac{4\gamma M}{c\sigma}\tilde{\psi}(e_{k-1})+2\sqrt{\tfrac{2\gamma}{c\sigma}}\sqrt{e_{k-1}}\leq\tfrac{4\gamma M}{(1-\theta)c\sigma}e_{k-1}^{1-\theta}+2\sqrt{\tfrac{2\gamma}{c\sigma}}\sqrt{e_{k-1}},

where the second inequality follows from (4.3), the third one holds due to the fact that ekek1e_{k}\leq e_{k-1} and the monotonicity of ψ~\tilde{\psi}, and the last one is implied by the KL exponent assumption and Lemma 4.5, as claimed. ∎
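The dichotomy between items 2 and 3 can be observed numerically from the key recursion e_{k-1}-e_{k}\geq\tfrac{c\sigma}{8\gamma M^{2}}e_{k}^{2\theta} derived above. The following Python sketch (a toy surrogate of ours, with the constant \kappa fixed arbitrarily) iterates the borderline dynamics e_{k-1}-e_{k}=\kappa e_{k}^{2\theta} and exhibits geometric decay for \theta=1/2 versus the sublinear rate O(k^{-1/(2\theta-1)}) for \theta\in(1/2,1).

```python
def next_e(e_prev, theta, kappa=1.0):
    # solve e + kappa * e^(2*theta) = e_prev for e in [0, e_prev] by bisection
    lo, hi = 0.0, e_prev
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid + kappa * mid ** (2 * theta) > e_prev:
            hi = mid
        else:
            lo = mid
    return lo

def run(theta, n, e0=1.0):
    e = [e0]
    for _ in range(n):
        e.append(next_e(e[-1], theta))
    return e

lin = run(0.5, 50)      # theta = 1/2: e_k = e0 / (1 + kappa)^k, geometric decay
sub = run(0.75, 2000)   # theta = 3/4: e_k decays like k^(-1/(2*theta - 1)) = k^(-2)
```

With \kappa=1 and \theta=1/2 each step exactly halves e_{k}, while for \theta=3/4 the ratio e_{2k}/e_{k} approaches 1/4, consistent with the k^{-2} regime of item 3.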

5 Globalizing fast local methods with ii^{*}FRB

With some due care, the method can be enhanced in the context of the continuous-Lyapunov descent (CLyD) framework [36, §4], so as to globalize fast local methods x^{+}=x+d by using the same oracle as i^{*}FRB. Methods of quasi-Newton type constitute a convenient class of candidate local methods. A prototypical use case hinges on the inclusion x\in\operatorname{S}(x)\coloneqq\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x) encoding necessary optimality conditions, cf. Item 5, so that fast update directions d can be retrieved from the root-finding problem 0\in({\rm id}-\operatorname{S})(x); see, e.g., [15, §7] and [17]. While the globalization framework is flexible enough to accommodate arbitrary update directions and still retain the convergence guarantees of i^{*}FRB, it promotes directions that trigger fast local convergence, as will be demonstrated in 5.3.

Algorithm 2 ii^{*}FRB linesearch extension
1:
2:
3:
4:Compute x¯kTγ,βh-frb(xk,yk1)\bar{x}^{k}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}) \triangleright γ,βh-frb(x¯k,xk)γ,βh-frb(xk,yk1)\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x}^{k},x^{k})\leq\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}) c2γDh(x¯k,xk)c2γDh(xk,yk1){}-\frac{c}{2\gamma}\operatorname{D}_{h}(\bar{x}^{k},x^{k})-\frac{c}{2\gamma}\operatorname{D}_{h}(x^{k},y^{k-1})
5:Choose an update direction dkd^{k} at xkx^{k}
6:
Set  yk=xk+τkdky^{k}=x^{k}+\tau_{k}d^{k}  and  xk+1=(1τk)x¯k+τk(xk+dk)x^{k+1}=(1-\tau_{k})\bar{x}^{k}+\tau_{k}(x^{k}+d^{k})
with \tau_{k} the largest in \{1,1/2,\dots\} such that
γ,βh-frb(xk+1,yk)γ,βh-frb(xk,yk1)δc2γ(Dh(x¯k,xk)+Dh(xk,yk1))\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k+1},y^{k}){}\leq{}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}){}-{}\tfrac{\delta c}{2\gamma}{\mathopen{}\left(\operatorname{D}_{h}(\bar{x}^{k},x^{k}){}+{}\operatorname{D}_{h}(x^{k},y^{k-1})\right)\mathclose{}} (5.1)

The algorithmic framework revolves around two key facts: continuity of γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} and its decrease after ii^{*}FRB steps (x,y)(x¯,x)(x,y^{-})\mapsto(\bar{x},x) with x¯Tγ,βh-frb(x,y)\bar{x}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,y^{-}). (The reason for introducing an auxiliary variable yy will be discussed after 5.1.) Thus, not only is γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} smaller than γ,βh-frb(x,y)\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,y^{-}) at (x¯,x)(\bar{x},x), but also at sufficiently close points, thereby ensuring that by gradually pushing the tentative fast update towards the safeguard (x¯,x)(\bar{x},x) a decrease on γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} is eventually achieved. This fact is formalized next.
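The backtracking mechanism of step 6 can be sketched in Python as follows. Here a generic merit function `L` stands in for \mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}, the sufficient-decrease threshold mirrors (5.1), and all numerical choices (the toy quadratic merit, the Euclidean Bregman distance, \gamma, c, \delta, and the deliberately poor direction d) are illustrative assumptions of ours.

```python
def linesearch_step(L, D_h, x, y_prev, x_bar, d, delta=0.5, c=1.0, gamma=0.5):
    # decrease target mirroring (5.1): delta*c/(2*gamma) * (D(xbar,x) + D(x,y_prev))
    drop = delta * c / (2 * gamma) * (D_h(x_bar, x) + D_h(x, y_prev))
    tau = 1.0
    while True:
        y_new = x + tau * d                       # y^k = x^k + tau_k d^k
        x_new = (1 - tau) * x_bar + tau * (x + d)  # x^{k+1}
        if L(x_new, y_new) <= L(x, y_prev) - drop:
            return x_new, y_new, tau
        tau /= 2  # next trial stepsize in {1, 1/2, 1/4, ...}

# toy 1-dimensional instance (ours): merit L and Euclidean Bregman distance
L = lambda x, y: x * x + (x - y) ** 2
D = lambda a, b: 0.5 * (a - b) ** 2

# safeguard x_bar = 0.5 contracts toward the minimizer 0; d is a bad direction,
# so the loop must backtrack several times before (5.1)-style decrease holds
x_new, y_new, tau = linesearch_step(L, D, x=1.0, y_prev=1.0, x_bar=0.5, d=10.0)
```

As in Item 1 of 5.1, the loop always terminates: as \tau\searrow 0 the trial pair tends to the safeguard (\bar{x},x), where strict decrease holds.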

Theorem 5.1 (well definedness and asymptotic analysis of Algorithm 2).

Suppose that I and the bounds on γ\gamma and β\beta as in 3.6 are satisfied. Then, the following hold for the iterates generated by Algorithm 2:

  1.

    Regardless of what the selected update direction dkd^{k} is, the linesearch at step 6 always succeeds: either x¯k=xk=yk1\bar{x}^{k}=x^{k}=y^{k-1} holds, in which case 0^φ(x¯k)0\in\hat{\partial}\varphi(\bar{x}^{k}), or there exists τ¯k>0\bar{\tau}_{k}>0 such that (5.1) holds for every τkτ¯k\tau_{k}\leq\bar{\tau}_{k}.

  2.

    k\varmathbbNDh(x¯k,xk)<\sum_{k\in\varmathbb N}\operatorname{D}_{h}(\bar{x}^{k},x^{k})<\infty, and in particular Dh(x¯k,xk)0\operatorname{D}_{h}(\bar{x}^{k},x^{k})\to 0.

  3.

    If φC¯\varphi_{\overline{C}} is coercive, then the sequence (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} remains bounded.

  4.

    Suppose that II holds, and let ω\omega be the set of limit points of (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N}. Then, φ\varphi is constant on ω\omega with value φ\varphi^{\star}, and for every xωx^{\star}\in\omega it holds that xTγ,βh-frb(x,x)x^{\star}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star}) and 0^φ(x)0\in\hat{\partial}\varphi(x^{\star}).

  5.

    Under the assumptions of 4.3, for any \varepsilon>0 the condition \|{\nabla}\hat{h}(x^{k})-{\nabla}\hat{h}(\bar{x}^{k})-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k})+{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(y^{k-1})\|\leq\varepsilon is satisfied for all k large enough, and it guarantees that \operatorname{dist}(0,\hat{\partial}\varphi(\bar{x}^{k}))\leq\varepsilon.

Proof.
  • 1  Stationarity of \bar{x}^{k} when \bar{x}^{k}=x^{k}=y^{k-1} follows from Item 5. Otherwise, let x_{\tau}^{k+1}\coloneqq(1-\tau)\bar{x}^{k}+\tau(x^{k}+d^{k}) and y_{\tau}^{k}\coloneqq x^{k}+\tau d^{k}, and note that (x_{\tau}^{k+1},y_{\tau}^{k})\to(\bar{x}^{k},x^{k}) as \tau\searrow 0. By continuity of \mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} we thus have

    \lim_{\tau\searrow 0}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x_{\tau}^{k+1},y_{\tau}^{k})=\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x}^{k},x^{k})\leq\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1})-\tfrac{c}{2\gamma}{\mathopen{}\left(\operatorname{D}_{h}(\bar{x}^{k},x^{k})+\operatorname{D}_{h}(x^{k},y^{k-1})\right)\mathclose{}},
    where the inequality owes to 3.6. Since at least one among \operatorname{D}_{h}(\bar{x}^{k},x^{k}) and \operatorname{D}_{h}(x^{k},y^{k-1}) is strictly positive, \delta\in(0,1), and c>0, the right-hand side is strictly smaller than
    \mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1})-\tfrac{\delta c}{2\gamma}{\mathopen{}\left(\operatorname{D}_{h}(\bar{x}^{k},x^{k})+\operatorname{D}_{h}(x^{k},y^{k-1})\right)\mathclose{}}.

    It then follows that there exists τ¯k>0\bar{\tau}_{k}>0 such that

    γ,βh-frb(xτk+1,yτk)γ,βh-frb(xk,yk1)δc2γ(Dh(x¯k,xk)+Dh(xk,yk1))\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x_{\tau}^{k+1},y_{\tau}^{k}){}\leq{}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}){}-{}\tfrac{\delta c}{2\gamma}{\mathopen{}\left(\operatorname{D}_{h}(\bar{x}^{k},x^{k}){}+{}\operatorname{D}_{h}(x^{k},y^{k-1})\right)\mathclose{}}

    holds for every τ(0,τ¯k]\tau\in(0,\bar{\tau}_{k}], which is the claim.

  • 2 & 3  These follow by telescoping (5.1), since \operatorname*{inf}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}=\operatorname*{inf}\varphi>-\infty by Item 2, and from the fact that \mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1})\leq\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{0},y^{-1}).

  • 4 & 5  Having shown the validity of (5.1), the proof follows from the same arguments as those of Items 3 and 4.3. ∎

It should be noted that 5.1 remains valid even with the simpler choice y^{k}=x^{k}. The apparently wasteful choice of y^{k} in Algorithm 2 is instead of paramount importance in promoting “good” directions by enabling acceptance of the unit stepsize. This is in sharp contrast with known issues of other nonsmooth globalization strategies, where convergence is severely hampered (the Maratos effect [26]; see also [17, §6.2]). A quality assessment of the direction is provided by the following notion.

Definition 5.2 (superlinearly convergent directions [15, Eq. (7.5.2)]).

Relative to a sequence (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} converging to a point xx^{\star} we say that (dk)k\varmathbbN(d^{k})_{k\in\varmathbb N} are superlinearly convergent directions if

limkxk+dkxxkx=0.\lim_{k\to\infty}\frac{\|x^{k}+d^{k}-x^{\star}\|}{\|x^{k}-x^{\star}\|}=0.
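As a sanity check of this definition on a toy instance of ours (not from the paper), Newton directions on the strongly convex smooth function f(x)=x^{2}+x^{4} are superlinearly convergent to x^{\star}=0: a direct computation gives \|x+d-x^{\star}\|/\|x-x^{\star}\|=8x^{2}/(2+12x^{2})\to 0. The Python sketch below verifies that the ratios vanish.

```python
def newton_dir(x):
    # Newton direction d = -f'(x)/f''(x) for the toy function f(x) = x^2 + x^4
    return -(2 * x + 4 * x ** 3) / (2 + 12 * x ** 2)

x_star = 0.0          # unique (strong) minimizer of f
x = 1.0
ratios = []
for _ in range(6):
    d = newton_dir(x)
    ratios.append(abs(x + d - x_star) / abs(x - x_star))
    x += d
```

The ratios decrease monotonically toward zero, which is precisely the superlinear-directions property required by Theorem 5.3 for eventual acceptance of the unit stepsize.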
Theorem 5.3 (acceptance of unit stepsize).

Suppose that I and II and the bounds on \gamma and \beta as in 3.6 are satisfied. Suppose further that f,h\in\mathcal{C}^{2} with \nabla^{2}h\succ 0, and that (x^{k})_{k\in\varmathbb N} converges to a strong local minimum x^{\star} for \varphi satisfying \operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star})=\{x^{\star}\}.4Although it is only the inclusion x^{\star}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star}) which is guaranteed for any limit point in the full generality of I and II (cf. Item 4), additionally requiring single-valuedness is a negligible extra assumption, as can be inferred from Item 5; see [1, §3.1] and [36, §2.4] for a detailed discussion. If the directions (d^{k})_{k\in\varmathbb N} are superlinearly convergent with respect to (x^{k})_{k\in\varmathbb N}, then eventually \tau_{k}=1 is always accepted at step 6, and the algorithm reduces to the local method x^{k+1}=x^{k}+d^{k} and converges at superlinear rate.

Proof.

Let φ=φ(x)\varphi^{\star}=\varphi(x^{\star}) be the limit point of (γ,βh-frb(xk+1,yk))k\varmathbbN(\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k+1},y^{k}))_{k\in\varmathbb N}. Since xkxx^{k}\to x^{\star}, x¯kx\bar{x}^{k}\to x^{\star} as well, and strong local minimality of xx^{\star} implies that φ(x¯k)φσ2x¯kx2\varphi(\bar{x}^{k})-\varphi^{\star}\geq\frac{\sigma}{2}\|\bar{x}^{k}-x^{\star}\|^{2} holds for some σ>0\sigma>0 and all kk large enough. In turn, for kk large it holds that

γ,βh-frb(xk,yk1)(3.9b)\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}){}\mathrel{{}\mathop{\geq}\limits^{\vphantom{\text{\scriptsize\clap{\begin{tabular}[b]{c}\eqref{eq:L:geq}\end{tabular}}}}\smash{\raisebox{1.5pt}{\text{\scriptsize\clap{\begin{tabular}[b]{c}\eqref{eq:L:geq}\end{tabular}}}}}}}{} φ(x¯k)+cγDh(x¯k,xk)+c2γDh(xk,yk1)\displaystyle\varphi(\bar{x}^{k}){}+{}\tfrac{c}{\gamma}\operatorname{D}_{h}(\bar{x}^{k},x^{k}){}+{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(x^{k},y^{k-1})
\displaystyle{}\geq{} φ+σ2x¯kx2+r2x¯kxk2\displaystyle\varphi^{\star}{}+{}\tfrac{\sigma}{2}\|\bar{x}^{k}-x^{\star}\|^{2}{}+{}\tfrac{r}{2}\|\bar{x}^{k}-x^{k}\|^{2}
for some r>0r>0 (since 2h0\nabla^{2}h\succ 0), and by Young’s inequality
\displaystyle{}\geq{} φ+σ2x¯kx2+r2(11+ϵxkx21ϵx¯kx2)\displaystyle\varphi^{\star}{}+{}\tfrac{\sigma}{2}\|\bar{x}^{k}-x^{\star}\|^{2}{}+{}\tfrac{r}{2}{\mathopen{}\left(\tfrac{1}{1+\epsilon}\|x^{k}-x^{\star}\|^{2}{}-{}\tfrac{1}{\epsilon}\|\bar{x}^{k}-x^{\star}\|^{2}\right)\mathclose{}}

for every ϵ>0\epsilon>0. By choosing ϵ=r/σ\epsilon=r/\sigma one obtains that

γ,βh-frb(xk,yk1)φμ2xkx2for k large enough,\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1})-\varphi^{\star}\geq\tfrac{\mu}{2}\|x^{k}-x^{\star}\|^{2}\quad\text{for $k$ large enough,} (5.2)

where μσr/(σ+r)>0\mu\coloneqq\sigma r/(\sigma+r)>0. On the other hand, since γ,βh-frb(x,x)=ϕγ,βh-frb(x,x)\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x)=\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x) for any xx, by definition of ϕγ,βh-frb\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}} (cf. (3.4) with x=xx^{-}=x) it follows that

γ,βh-frb(xk+dk,xk+dk)=\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k}+d^{k},x^{k}+d^{k}){}={} ϕγ,βh-frb(xk+dk,xk+dk)\displaystyle\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k}+d^{k},x^{k}+d^{k}) (5.3)
\displaystyle{}\leq{} γ,βh-frb(x;xk+dk,xk+dk)=φ+Dh^(x,xk+dk)\displaystyle\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star};x^{k}+d^{k},x^{k}+d^{k}){}={}\varphi^{\star}+\operatorname{D}_{\hat{h}}(x^{\star},x^{k}+d^{k})
\displaystyle{}\leq{} φ+L2xk+dkx2\displaystyle\varphi^{\star}+\tfrac{L}{2}\|x^{k}+d^{k}-x^{\star}\|^{2}

holds for some L>0L>0 and all kk large enough, where the last inequality uses the fact that f,h𝒞2f,h\in\mathcal{C}^{2} (hence h^𝒞2\hat{h}\in\mathcal{C}^{2} too). Combined with (5.2) and superlinearity of the directions (dk)k\varmathbbN(d^{k})_{k\in\varmathbb N} we obtain that

εkγ,βh-frb(xk+dk,xk+dk)φγ,βh-frb(xk,yk1)φ0as k.\varepsilon_{k}{}\coloneqq{}\frac{\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k}+d^{k},x^{k}+d^{k})-\varphi^{\star}}{\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1})-\varphi^{\star}}{}\to{}0\quad\text{as $k\to\infty$.} (5.4)

Observe that for any zkTγ,βh-frb(x¯k,xk)z^{k}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x}^{k},x^{k}) eventually it holds that γ,βh-frb(x¯k,xk)φ(zk)φ\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x}^{k},x^{k})\geq\varphi(z^{k})\geq\varphi^{\star}. In fact, the first inequality owes to (3.9b), and the second one holds for kk large enough because of local minimality of xx^{\star} for φ\varphi and the fact that

\lim_{k\to\infty}z^{k}{}\in{}\limsup_{k\to\infty}\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x}^{k},x^{k}){}\subseteq{}\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{\star},x^{\star}){}={}\{x^{\star}\}

by osc of Tγ,βh-frb\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} (cf. Item 3). Then, for kk large enough so that εk1\varepsilon_{k}\leq 1 also holds, we have

γ,βh-frb(xk+dk,xk+dk)γ,βh-frb(xk,yk1)=\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k}+d^{k},x^{k}+d^{k})-\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}){}={} (εk1)(γ,βh-frb(xk,yk1)φ)\displaystyle(\varepsilon_{k}-1){\mathopen{}\left(\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1})-\varphi^{\star}\right)\mathclose{}}
\displaystyle{}\leq{} (εk1)(γ,βh-frb(xk,yk1)φ(zk))\displaystyle(\varepsilon_{k}-1){\mathopen{}\left(\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1})-\varphi(z^{k})\right)\mathclose{}}
for x¯kTγ,βh-frb(xk,yk1)\bar{x}^{k}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}), so that 3.6 yields
\displaystyle{}\leq{} (εk1)c2γ(Dh(x¯k,xk)+Dh(xk,yk1)).\displaystyle\tfrac{(\varepsilon_{k}-1)c}{2\gamma}{\mathopen{}\left(\operatorname{D}_{h}(\bar{x}^{k},x^{k})+\operatorname{D}_{h}(x^{k},y^{k-1})\right)\mathclose{}}.

Since εk0\varepsilon_{k}\to 0 and δ(0,1)\delta\in(0,1), eventually εk1δ\varepsilon_{k}-1\leq-\delta, hence

γ,βh-frb(xk+dk,xk+dk)γ,βh-frb(xk,yk1)δc2γDh(x¯k,xk)δc2γDh(xk,yk1),\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k}+d^{k},x^{k}+d^{k}){}\leq{}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},y^{k-1}){}-{}\tfrac{\delta c}{2\gamma}\operatorname{D}_{h}(\bar{x}^{k},x^{k}){}-{}\tfrac{\delta c}{2\gamma}\operatorname{D}_{h}(x^{k},y^{k-1}),

implying that the first attempt with τk=1\tau_{k}=1 passes the linesearch condition (since, when τk=1\tau_{k}=1, one has xk+1=yk=xk+dkx^{k+1}=y^{k}=x^{k}+d^{k}). In particular, the sequence (xk)k\varmathbbN(x^{k})_{k\in\varmathbb N} eventually reduces to xk+1=xk+dkx^{k+1}=x^{k}+d^{k}, and thus converges superlinearly. ∎

The core of the proof hinges on showing that \varepsilon_{k} as in (5.4) vanishes. The same argument does not go through with a naive linesearch prescribing y^{k}=x^{k}. Indeed,

0ϕγ,βh-frb(xk+dk,xk)φ(x)Dh^(x,xk+dk)O(xk+dkx2)+xxkdk,f^β(xk+dk)f^β(xk)O(xkx),0{}\leq{}\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k}+d^{k},x^{k}){}-{}\varphi(x^{\star})\\ {}\leq{}\underbracket{\operatorname{D}_{\hat{h}}(x^{\star},x^{k}+d^{k})}_{O(\|x^{k}+d^{k}-x^{\star}\|^{2})}{}+{}\langle{}x^{\star}-x^{k}-d^{k}{},{}\underbracket{{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k}+d^{k})-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{k})}_{O(\|x^{k}-x^{\star}\|)}{}\rangle,

where the first inequality owes to local minimality of xx^{\star} and the fact that xk+dkxx^{k}+d^{k}\to x^{\star}, the second one from the expression (3.2a) of γ,βh-frb\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} and the definition of the envelope, cf. (3.4), and the big-OO estimates from local smoothness of hh and ff. In particular, ϕγ,βh-frb(xk+dk,xk)φ(x)=o(xkx2)\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k}+d^{k},x^{k}){}-{}\varphi(x^{\star}){}={}o(\|x^{k}-x^{\star}\|^{2}). Observing that (5.2) is still valid, denoting ξ^ξ+(c/2γ)h\hat{\xi}\coloneqq\xi+(c/2\gamma)h so that γ,βh-frb=ϕγ,βh-frb+Dξ^\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}=\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\hat{\xi}} one then has

εkDξ^(xk+dk,xk)γ,βh-frb(xk,xk1)φcdk2γ,βh-frb(xk,xk1)φc′′xkx2γ,βh-frb(xk,xk1)φ,\varepsilon_{k}{}\approx{}\frac{\operatorname{D}_{\hat{\xi}}(x^{k}+d^{k},x^{k})}{\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},x^{k-1})-\varphi^{\star}}{}\approx{}\frac{c^{\prime}\|d^{k}\|^{2}}{\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},x^{k-1})-\varphi^{\star}}{}\approx{}\frac{c^{\prime\prime}\|x^{k}-x^{\star}\|^{2}}{\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x^{k},x^{k-1})-\varphi^{\star}},

where “\approx” denotes equality up to vanishing terms and c^{\prime},c^{\prime\prime}>0 are constants due to [15, Lem. 7.5.7]. In this process, the superlinearity of (d^{k})_{k\in\varmathbb N} is lost, and the vanishing of \varepsilon_{k} cannot be established. Instead, as is apparent from (5.3), the choice of y^{k} as in step 6 guarantees that on the first trial step \mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} coincides with a Bregman-Moreau envelope, a function more tightly connected to the cost \varphi.

6 Conclusions

This work contributes a mirror inertial forward-reflected-backward splitting algorithm (i^{*}FRB) and its linesearch enhancement, extending the forward-reflected-backward method proposed in [25] to the nonconvex and relatively smooth setting. We have shown that the proposed algorithms enjoy pleasant properties akin to other splitting methods in the same setting. However, our methodology deviates from tradition through the i^{*}FRB-envelope, an envelope function defined on a product space that takes inertial terms into account, which, to the best of our knowledge, is the first of its kind and could thus be instrumental for future research. This approach also requires the inertial parameter to be negative, which is consistent with a recent result [14] regarding the impossibility of accelerated non-Euclidean algorithms under relative smoothness. It is thus tempting to ask whether an explicit example can be constructed to prove the sharpness of such a restrictive assumption. It would also be worthwhile to apply our technique to other two-stage splitting methods, such as Tseng’s method, to obtain similar extensions.

Appendix A Proof of 3.6

A.1 Convex f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} case

Lemma A.1.

Suppose that Assumption I holds and let γ>0\gamma>0 and β\varmathbbR\beta\in\varmathbb R be such that f^βfβγh\vphantom{f}\smash{\hat{f}}_{\!\beta}\coloneqq f-\frac{\beta}{\gamma}h is a convex function. Then, for every x,xCx,x^{-}\in C and x¯Tγ,βh-frb(x,x)\bar{x}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-})

(ϕγ,βh-frb+Df^β)(x¯,x)\displaystyle{\mathopen{}\left(\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\right)\mathclose{}}(\bar{x},x){}\leq{} (ϕγ,βh-frb+Df^β)(x,x)Dh^2f^β(x¯,x).\displaystyle{\mathopen{}\left(\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\right)\mathclose{}}(x,x^{-}){}-{}\operatorname{D}_{\hat{h}-2\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x). (A.1)

If h^f^β\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta} too is convex, then ϕγ,βh-frb+Df^β\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}} has the same infimum as φC¯\varphi_{\overline{C}}, and is level bounded if and only if so is φC¯\varphi_{\overline{C}}.

Proof.

All the claimed inequalities follow from (3.5) together with the fact that Df^β0\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\geq 0, and that Dh^f^β0\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}\geq 0 too when h^f^β\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta} is convex. When both f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} and h^f^β\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta} are convex, Item 2 implies that

ϕγ,βh-frb(y,y)+Df^β(y,y)φ(y¯)infφC¯y,yC,y¯Tγ,βh-frb(y,y),\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(y,y^{-})+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(y,y^{-})\geq\varphi(\bar{y})\geq\operatorname*{inf}\varphi_{\overline{C}}\quad\forall y,y^{-}\in C,\ \bar{y}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(y,y^{-}), (A.2)

proving that inf(ϕγ,βh-frb+Df^β)infφC¯\operatorname*{inf}(\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}})\geq\operatorname*{inf}\varphi_{\overline{C}}. The converse inequality follows from Item 3 by observing that (ϕγ,βh-frb+Df^β)(x,x)=ϕγ,βh-frb(x,x)φ(x)(\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}})(x,x)=\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x)\leq\varphi(x).

To conclude, suppose that ϕγ,βh-frb+Df^β\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}} is not level bounded, and consider an unbounded sequence (xk,xk)k\varmathbbN(x_{k},x_{k}^{-})_{k\in\varmathbb N} such that ϕγ,βh-frb(xk,xk)+Df^β(xk,xk)\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x_{k},x_{k}^{-})+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(x_{k},x_{k}^{-})\leq\ell, for some \varmathbbR\ell\in\varmathbb R. Then, it follows from (A.2) that φ(x¯k)\varphi(\bar{x}_{k})\leq\ell, where x¯kTγ,βh-frb(xk,xk)\bar{x}_{k}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x_{k},x_{k}^{-}). From local boundedness of Tγ,βh-frb\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} (Item 3) it follows that (x¯k)k\varmathbbN(\bar{x}_{k})_{k\in\varmathbb N} too is unbounded, showing that φ\varphi is not level bounded either. The converse holds by observing that (ϕγ,βh-frb+Df^β)(x,x)=ϕγ,βh-frb(x,x)φC¯(x)(\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}})(x,x)=\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x)\leq\varphi_{\overline{C}}(x), cf. Item 3. ∎
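For intuition on the operator T_{γ,β}^{h-frb} appearing above, the following is a minimal Euclidean sketch (assumptions: h = ½‖·‖², so that ∇h = id and dual inertia coincides with primal inertia; g ≡ 0, so the Bregman proximal step is trivial; the quadratic f, step size γ, and inertial β below are illustrative choices, not the paper's tuning). With β = 0 the update is exactly the forward-reflected-backward step of [25]; β < 0 mirrors the sign restriction discussed in this paper.

```python
import numpy as np

# Euclidean sketch (h = 0.5||.||^2, g = 0; all parameters illustrative):
#   x^{k+1} = x^k + beta (x^k - x^{k-1}) - gamma (2 grad f(x^k) - grad f(x^{k-1}))

A = np.diag([1.0, 2.0])                  # f(x) = 0.5 x'Ax - b'x, so grad f = Ax - b
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b

gamma, beta = 0.1, -0.1                  # small step (gamma < 1/(2L), L = 2 here), negative inertia
x_prev = x = np.array([5.0, -3.0])
for _ in range(500):
    x, x_prev = x + beta * (x - x_prev) - gamma * (2 * grad_f(x) - grad_f(x_prev)), x

x_star = np.linalg.solve(A, b)           # unique minimizer of f
print("distance to minimizer:", np.linalg.norm(x - x_star))
```

On this strongly convex toy problem the iterates contract linearly to the minimizer; the sketch is only meant to make the fixed-point operator concrete, not to reproduce the paper's nonconvex analysis.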

In the setting of 1, inequality (A.1) can equivalently be written in terms of γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} as

γ,βh-frb(x¯,x)\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x},x){}\leq{} γ,βh-frb(x,x)Dh^2f^βcγh(x¯,x)c2γDh(x¯,x)c2γDh(x,x)\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\operatorname{D}_{\hat{h}-2\vphantom{f}\smash{\hat{f}}_{\!\beta}-\frac{c}{\gamma}h}(\bar{x},x){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(\bar{x},x){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-})
\displaystyle{}\leq{} γ,βh-frb(x,x)c2γDh(x¯,x)c2γDh(x,x),\displaystyle\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(\bar{x},x){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-}),

where the second inequality owes to the fact that Dh^2f^βcγh0\operatorname{D}_{\hat{h}-2\vphantom{f}\smash{\hat{f}}_{\!\beta}-\frac{c}{\gamma}h}\geq 0, since h^2f^βcγh\hat{h}-2\vphantom{f}\smash{\hat{f}}_{\!\beta}-\frac{c}{\gamma}h is convex, having

h^2f^βcγh=1+2βcγh3f=1+2β+3αpf,hcγ=0h+2(fσf,hh),\hat{h}-2\vphantom{f}\smash{\hat{f}}_{\!\beta}-\tfrac{c}{\gamma}h{}={}\tfrac{1+2\beta-c}{\gamma}h-3f{}={}\vphantom{\tfrac{1+2\beta+3\alpha p_{-f,h}-c}{\gamma}}\smash{\overbracket{\tfrac{1+2\beta+3\alpha p_{-f,h}-c}{\gamma}}^{=0}}h+2(-f-\sigma_{-f,h}h),

the coefficient of hh being null by definition of cc, and fσf,hh-f-\sigma_{-f,h}h being convex by definition of the relative weak convexity modulus σf,h\sigma_{-f,h}, cf. 2.2. This proves (3.9a); inequality (3.9b) follows similarly by observing that

0Dh^2f^βcγh=Dh^f^βDf^βcγDhDh^f^βcγDh,0{}\leq{}\operatorname{D}_{\hat{h}-2\vphantom{f}\smash{\hat{f}}_{\!\beta}-\frac{c}{\gamma}h}{}={}\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}-\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}-\tfrac{c}{\gamma}\operatorname{D}_{h}{}\leq{}\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}-\tfrac{c}{\gamma}\operatorname{D}_{h},

so that

\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}) = \varphi(\bar{x}) + \overbracket{\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x)}^{\geq\frac{c}{\gamma}\operatorname{D}_{h}} + \overbracket{\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x^{-})}^{\geq 0} + \tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-}).

In turn, Item 2 follows from the same arguments as in the proof of Lemma A.1.
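The manipulations in this subsection rest on two elementary facts about Bregman distances: D_φ is linear in its reference function φ (which justifies decompositions such as D_{ĥ−2f̂_β−(c/γ)h} = D_{ĥ−f̂_β} − D_{f̂_β} − (c/γ)D_h), and D_φ ≥ 0 whenever φ is convex. A quick numeric sanity check with toy reference functions (the choices of φ and ψ below are assumptions for illustration only):

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - grad_f(y) @ (x - y)

phi = lambda x: 0.5 * np.sum(x**2);  grad_phi = lambda x: x        # convex
psi = lambda x: np.sum(x**4);        grad_psi = lambda x: 4 * x**3  # convex
c = 0.3
comb      = lambda x: phi(x) - c * psi(x)
grad_comb = lambda x: grad_phi(x) - c * grad_psi(x)

rng = np.random.default_rng(1)
x, y = rng.standard_normal(4), rng.standard_normal(4)

lhs = bregman(comb, grad_comb, x, y)
rhs = bregman(phi, grad_phi, x, y) - c * bregman(psi, grad_psi, x, y)
assert np.isclose(lhs, rhs)                 # linearity in the reference function
assert bregman(phi, grad_phi, x, y) >= 0    # convex reference => D >= 0
assert bregman(psi, grad_psi, x, y) >= 0
print("Bregman linearity and nonnegativity verified")
```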

A.2 Lipschitz differentiable f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} case

Lemma A.2.

In addition to Assumption I, suppose that f^β\vphantom{f}\smash{\hat{f}}_{\!\beta} is Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}-Lipschitz differentiable for some Lf^β0L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\geq 0. Then, for every x,xCx,x^{-}\in C and x¯Tγ,βh-frb(x,x)\bar{x}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-})

(ϕγ,βh-frb+DLf^β𝒿)(x¯,x)(ϕγ,βh-frb+DLf^β𝒿)(x,x)Dh^2Lf^β𝒿(x¯,x).{\mathopen{}\left(\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}}\right)\mathclose{}}(\bar{x},x){}\leq{}{\mathopen{}\left(\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}}\right)\mathclose{}}(x,x^{-}){}-{}\operatorname{D}_{\hat{h}-2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}}(\bar{x},x). (A.3)

If h^Lf^β𝒿\hat{h}-L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j} is a convex function, then ϕγ,βh-frb+DLf^β𝒿\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}+\operatorname{D}_{L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}} has the same infimum as φC¯\varphi_{\overline{C}}, and is level bounded if and only if so is φC¯\varphi_{\overline{C}}.

Proof.

By means of the three-point identity, that is, by using (3.2a) in place of (3.2b), inequality (3.5) can equivalently be written as

ϕγ,βh-frb(x¯,x)φC¯(x¯)=\displaystyle\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x},x){}\leq{}\varphi_{\overline{C}}(\bar{x}){}={} ϕγ,βh-frb(x,x)Dh^(x¯,x)x¯x,f^β(x)f^β(x)\displaystyle\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\operatorname{D}_{\hat{h}}(\bar{x},x){}-{}\langle{}\bar{x}-x{},{}{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x)-{\nabla}\vphantom{f}\smash{\hat{f}}_{\!\beta}(x^{-}){}\rangle
which, by Young’s inequality applied to the inner product together with Lf^βL_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}-Lipschitz differentiability, yields
\displaystyle{}\leq{} ϕγ,βh-frb(x,x)Dh^(x¯,x)+Lf^β2x¯x2+Lf^β2xx2.\displaystyle\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\operatorname{D}_{\hat{h}}(\bar{x},x){}+{}\tfrac{L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}}{2}\|\bar{x}-x\|^{2}{}+{}\tfrac{L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}}{2}\|x-x^{-}\|^{2}.

Rearranging and using the fact that D𝒿(x,y)=12xy2\operatorname{D}_{\mathcal{j}}(x,y)=\frac{1}{2}\|x-y\|^{2} yields the claimed inequality.

Under convexity of h^Lf^β𝒿\hat{h}-L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}, we may draw the same conclusions as in the proof of Lemma A.1 by observing that the same application of Young’s inequality above, combined with the expression (3.2a) of γ,βh-frb\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}, gives

ϕγ,βh-frb(y,y)=γ,βh-frb(y¯;y,y)φ(y¯)+Dh^(y¯,y)Lf^β2y¯y2Lf^β2yy2,\phi_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(y,y^{-}){}={}\mathcal{M}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{y};y,y^{-}){}\geq{}\varphi(\bar{y}){}+{}\operatorname{D}_{\hat{h}}(\bar{y},y){}-{}\tfrac{L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}}{2}\|\bar{y}-y\|^{2}{}-{}\tfrac{L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}}{2}\|y-y^{-}\|^{2},

holding for any y,yCy,y^{-}\in C and y¯Tγ,βh-frb(y,y)\bar{y}\in\operatorname{T}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(y,y^{-}). ∎
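The key estimate in the proof above is the weighted Young inequality ⟨a, b⟩ ≤ (L/2)‖a‖² + (1/(2L))‖b‖², applied with a = x̄ − x and b = ∇f̂_β(x) − ∇f̂_β(x⁻) and followed by the Lipschitz bound ‖b‖ ≤ L_{f̂_β}‖x − x⁻‖. The inequality itself is elementary, since the gap equals ½‖√L a − b/√L‖² ≥ 0; a quick random check (the sampling below is purely illustrative):

```python
import numpy as np

# Weighted Young inequality: for any L > 0,
#   <a, b> <= (L/2)||a||^2 + (1/(2L))||b||^2,
# because the gap equals 0.5 || sqrt(L) a - b / sqrt(L) ||^2 >= 0.
# Combined with ||b|| <= L ||x - x^-|| (Lipschitz gradients), this produces
# the two (L/2)-weighted quadratic terms appearing in the proof.

rng = np.random.default_rng(2)
L = 3.0
for _ in range(1000):
    a, b = rng.standard_normal(6), rng.standard_normal(6)
    gap = 0.5 * L * (a @ a) + (b @ b) / (2 * L) - a @ b
    assert gap >= -1e-12      # nonnegative up to floating-point noise
print("weighted Young inequality holds on 1000 random samples")
```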

Patterning the arguments of the previous subsection, we observe that inequality (A.3) can equivalently be written in terms of γ,βh-frb\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}} as

γ,βh-frb(x¯,x)γ,βh-frb(x,x)Dh^2Lf^β𝒿cγh0(x¯,x)c2γDh(x¯,x)c2γDh(x,x).\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(\bar{x},x){}\leq{}\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}){}-{}\vphantom{\operatorname{D}_{\hat{h}-2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}-\frac{c}{\gamma}h}}\smash{\overbracket{\operatorname{D}_{\hat{h}-2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}-\frac{c}{\gamma}h}}^{\geq 0}}(\bar{x},x){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(\bar{x},x){}-{}\tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-}).

Once again, the fact that Dh^2Lf^β𝒿cγh0\operatorname{D}_{\hat{h}-2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}-\frac{c}{\gamma}h}\geq 0 owes to convexity of h^2Lf^β𝒿cγh\hat{h}-2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}-\frac{c}{\gamma}h, having

h^2Lf^β𝒿cγh=\displaystyle\hat{h}-2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}-\tfrac{c}{\gamma}h{}={} 1cγhf2Lf^β𝒿\displaystyle\tfrac{1-c}{\gamma}h{}-{}f{}-{}2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}\mathcal{j}
=\displaystyle{}={} 1+γσf,hcγ=2Lf^β/σh0(hσh𝒿convex)+(fσf,hhconvex)+(1+γσf,hcγσh2Lf^β=0)𝒿\displaystyle\overbracket{\vphantom{\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}^{=2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}/\sigma_{h}\geq 0}(\vphantom{\vphantom{\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}h-\sigma_{h}\mathcal{j}}\smash{\overbracket{\vphantom{\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}h-\sigma_{h}\mathcal{j}}^{\text{convex}}}){}+{}(\vphantom{\vphantom{\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}-f-\sigma_{-f,h}h}\smash{\overbracket{\vphantom{\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}-f-\sigma_{-f,h}h}^{\text{convex}}}){}+{}\bigl{(}\vphantom{\vphantom{\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}\sigma_{h}{}-{}2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}}\smash{\overbracket{\vphantom{\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}}\tfrac{1+\gamma\sigma_{-f,h}-c}{\gamma}\sigma_{h}{}-{}2L_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}}^{=0}}\bigr{)}\mathcal{j}

altogether proving (3.9a). Similarly, inequality (3.9b) follows by observing that 0Dh^2f^βcγh=Dh^f^βDf^βcγDhDh^f^βcγDh0{}\leq{}\operatorname{D}_{\hat{h}-2\vphantom{f}\smash{\hat{f}}_{\!\beta}-\frac{c}{\gamma}h}{}={}\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}-\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}-\frac{c}{\gamma}\operatorname{D}_{h}{}\leq{}\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}-\frac{c}{\gamma}\operatorname{D}_{h}, so that

\mathcal{L}_{\gamma\!,\,\beta}^{h\text{-\sc frb}}(x,x^{-}) = \varphi(\bar{x}) + \overbracket{\operatorname{D}_{\hat{h}-\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x)}^{\geq\frac{c}{\gamma}\operatorname{D}_{h}} + \overbracket{\operatorname{D}_{\vphantom{f}\smash{\hat{f}}_{\!\beta}}(\bar{x},x^{-})}^{\geq 0} + \tfrac{c}{2\gamma}\operatorname{D}_{h}(x,x^{-}).

The assertion of Item 2 once again follows from the same arguments as in the proof of Lemma A.2.

References

  • [1] M. Ahookhosh, A. Themelis, and P. Patrinos. A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: Superlinear convergence to nonisolated local minima. SIAM Journal on Optimization, 31(1):653–685, 2021.
  • [2] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2):438–457, 2010.
  • [3] H. Attouch, J. Bolte, and B.F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming, 137(1):91–129, 2013.
  • [4] D. Azé and J. Penot. Uniformly convex and uniformly smooth convex functions. Annales de la Faculté des sciences de Toulouse : Mathématiques, Ser. 6, 4(4):705–730, 1995.
  • [5] H.H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications. Mathematics of Operations Research, 42(2):330–348, 2017.
  • [6] H.H. Bauschke and P.L. Combettes. Iterating Bregman retractions. SIAM Journal on Optimization, 13(4):1159–1173, 2003.
  • [7] H.H. Bauschke and P.L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert spaces. CMS Books in Mathematics. Springer, 2017.
  • [8] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
  • [9] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 2016.
  • [10] J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3):2131–2151, 2018.
  • [11] R.I. Boţ and D. Nguyen. The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Mathematics of Operations Research, 45(2):682–712, 2020.
  • [12] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.
  • [13] R. Dragomir, A. d’Aspremont, and J. Bolte. Quartic first-order methods for low-rank minimization. Journal of Optimization Theory and Applications, 189(2):341–363, 2021.
  • [14] R. Dragomir, A.B. Taylor, A. d’Aspremont, and J. Bolte. Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 194(1):41–83, 2022.
  • [15] F. Facchinei and J. Pang. Finite-dimensional Variational Inequalities and Complementarity Problems, volume II. Springer, 2003.
  • [16] F. Hanzely, P. Richtarik, and L. Xiao. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. Computational Optimization and Applications, 79(2):405–440, 2021.
  • [17] A.F. Izmailov and M.V. Solodov. Newton-type Methods for Optimization and Variational Problems. Springer, 2014.
  • [18] C. Kan and W. Song. The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Analysis: Theory, Methods & Applications, 75(3):1385 – 1399, 2012.
  • [19] G. Li, T. Liu, and T.K. Pong. Peaceman–Rachford splitting for a class of nonconvex optimization problems. Computational Optimization and Applications, 68(2):407–436, 2017.
  • [20] G. Li and T.K. Pong. Global convergence of splitting methods for nonconvex composite optimization. SIAM Journal on Optimization, 25(4):2434–2460, 2015.
  • [21] G. Li and T.K. Pong. Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Mathematical Programming, 159(1):371–401, 2016.
  • [22] Y. Liu and W. Yin. An envelope for Davis–Yin splitting and strict saddle-point avoidance. Journal of Optimization Theory and Applications, 181(2):567–587, 2019.
  • [23] H. Lu, R.M. Freund, and Y. Nesterov. Relatively smooth convex optimization by first-order methods, and applications. SIAM Journal on Optimization, 28(1):333–354, 2018.
  • [24] J. Mairal. Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM Journal on Optimization, 25(2):829–855, 2015.
  • [25] Y. Malitsky and M.K. Tam. A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM Journal on Optimization, 30(2):1451–1472, 2020.
  • [26] N. Maratos. Exact penalty function algorithms for finite dimensional and control optimization problems. PhD thesis, Imperial College London (University of London), 1978.
  • [27] B. Mordukhovich. Variational Analysis and Applications, volume 30. Springer, 2018.
  • [28] J. Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France, 93:273–299, 1965.
  • [29] Y. Nesterov. Implementable tensor methods in unconstrained convex optimization. Mathematical Programming, pages 1–27, 2019.
  • [30] Y. Nesterov. Lectures on Convex Optimization, volume 137. Springer, 2018.
  • [31] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.
  • [32] D. Reem, S. Reich, and A. De Pierro. Re-examination of Bregman functions and new properties of their divergences. Optimization, 68(1):279–348, 2019.
  • [33] R.T. Rockafellar and R.J. Wets. Variational Analysis, volume 317. Springer, 2011.
  • [34] L. Stella, A. Themelis, and P. Patrinos. Newton-type alternating minimization algorithm for convex optimization. IEEE Transactions on Automatic Control, 2018.
  • [35] M. Teboulle. A simplified view of first order methods for optimization. Mathematical Programming, pages 1–30, 2018.
  • [36] A. Themelis. Proximal Algorithms for Structured Nonconvex Optimization. PhD thesis, KU Leuven, 12 2018.
  • [37] A. Themelis and P. Patrinos. Douglas–Rachford splitting and ADMM for nonconvex optimization: Tight convergence results. SIAM Journal on Optimization, 30(1):149–181, 2020.
  • [38] A. Themelis, L. Stella, and P. Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization, 28(3):2274–2303, 2018.
  • [39] A. Themelis, L. Stella, and P. Patrinos. Douglas-Rachford splitting and ADMM for nonconvex optimization: Accelerated and Newton-type algorithms. Computational Optimization and Applications, 82:395–440, 2022.
  • [40] X. Wang and Z. Wang. A Bregman inertial forward-reflected-backward method for nonconvex minimization. arXiv:2207.01170, 2022.
  • [41] X. Wang and Z. Wang. The exact modulus of the generalized concave Kurdyka-Łojasiewicz property. Mathematics of Operations Research, 47(4):2765–2783, 2022.
  • [42] X. Wang and Z. Wang. Malitsky-Tam forward-reflected-backward splitting method for nonconvex minimization problems. Computational Optimization and Applications, 82(2):441–463, 2022.