
Jan Harold Alcantara
Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo City, Tokyo, 103-0027, Japan
email: [email protected]

Felipe Atenas
School of Mathematics and Statistics, The University of Melbourne, 813 Swanston St, Parkville, Melbourne, VIC, 3052, Australia
email: [email protected]

A relaxed version of Ryu’s three-operator splitting method for structured nonconvex optimization

Jan Harold Alcantara and Felipe Atenas
Abstract

In this work, we propose a modification of Ryu’s splitting algorithm for minimizing the sum of three functions, where two of them are convex with Lipschitz continuous gradients, and the third is an arbitrary proper closed function that is not necessarily convex. The modification is essential to facilitate the convergence analysis, particularly in establishing a sufficient descent property for an associated envelope function. This envelope, tailored to the proposed method, is an extension of the well-known Moreau envelope. Notably, the original Ryu splitting algorithm is recovered as a limiting case of our proposal. The results show that the descent property holds as long as the stepsizes remain sufficiently small. Leveraging this result, we prove global subsequential convergence to critical points of the nonconvex objective.

1 Introduction

Modern optimization applications often exhibit a structured form that is well-suited to the application of decomposition techniques. Operator splitting methods have emerged as powerful tools for efficiently solving complex optimization problems. Instances of operator splitting methods include the forward-backward splitting method Passty79 and the Douglas-Rachford splitting method DR ; LM . These algorithmic schemes have been pivotal in the development of techniques for image reconstruction, signal processing, and machine learning combettes2007douglas ; cai2010split ; combettes2011proximal ; boyd2011distributed ; glowinski2017splitting .

The aforementioned methods were originally designed to solve maximal monotone inclusion problems with a two-block structure. In the context of optimization problems, this translates to minimizing the sum of two convex functions. Several generalizations to three-block problems have been proposed in the literature, such as the Davis-Yin operator splitting method davis2017three , Ryu's three-operator splitting method ryu2020uniqueness , and the Malitsky-Tam operator splitting method MT23 (which is also applicable to $n$-block problems). All these methods, in the context of optimization, have provable convergence guarantees for convex optimization problems.

The literature is relatively scarce for methods that solve nonconvex optimization problems involving the sum of three functions. Notable contributions in this direction include the extension of Davis-Yin splitting to nonconvex problems analyzed in bian2021three , the extension of Davis-Yin to four-operator splitting investigated in ALT24 , and an extension of Douglas-Rachford splitting examined in AT25 (which also works for $n$ operators). In this work, we propose a modification of Ryu's splitting method to solve nonconvex three-block optimization problems with a specific structure.

Consider the optimization problem

\min_{x\in\mathbb{R}^{d}}~\varphi(x)\coloneqq f_{1}(x)+f_{2}(x)+f_{3}(x), (1)

where $f_{i}:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ for $i=1,2,3$. Given $z_{1}^{0},z_{2}^{0}\in\mathbb{R}^{d}$, $\gamma>0$ and $\lambda>0$, the (relaxed) Ryu's three-operator splitting method ryu2020uniqueness is given by the following iteration: for $k\geq 1$,

\left\{\begin{array}{rcl}
x_{1}^{k}&=&\operatorname{prox}_{\gamma f_{1}}(z_{1}^{k})\\
x_{2}^{k}&=&\operatorname{prox}_{\gamma f_{2}}(z_{2}^{k}+x_{1}^{k})\\
x_{3}^{k}&\in&\operatorname{prox}_{\gamma f_{3}}\big(x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k}\big)\\
\begin{pmatrix}z_{1}^{k+1}\\ z_{2}^{k+1}\end{pmatrix}&=&\begin{pmatrix}z_{1}^{k}\\ z_{2}^{k}\end{pmatrix}+\lambda\begin{pmatrix}x_{3}^{k}-x_{1}^{k}\\ x_{3}^{k}-x_{2}^{k}\end{pmatrix},
\end{array}\right. (2)

where $\operatorname{prox}_{\gamma f}$ denotes the proximal mapping of $\gamma f$ defined in (4) below.

When the functions $f_{1},f_{2}$ and $f_{3}$ in problem (1) are convex, (ryu2020uniqueness, Theorem 4) states the convergence of the method to minimizers of problem (1). Our objective in this work is to investigate whether the convergence guarantees can be extended to a nonconvex setting.

In our convergence analysis, we adopt the “envelope technique”, a strategy that has been employed in the analysis of several splitting algorithms for nonconvex problems themelis2018forward ; themelis2020douglas ; ALT24 ; AT25 ; Atenas25 . A key challenge, however, lies in the fact that the standard form of Ryu's splitting algorithm is not directly amenable to this type of analysis. To address this, we consider the following relaxed variant of Ryu's method: given $z_{1}^{0},z_{2}^{0}\in\mathbb{R}^{d}$, $\alpha>0$, $\gamma>0$ and $\lambda>0$, we define the iterates for all $k\geq 1$ as follows:

\left\{\begin{array}{rcl}
x_{1}^{k}&=&\operatorname{prox}_{\gamma f_{1}}(z_{1}^{k})\\
x_{2}^{k}&=&\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}\!\left(\frac{z_{2}^{k}}{\alpha}+x_{1}^{k}\right)\\
x_{3}^{k}&\in&\operatorname{prox}_{\gamma f_{3}}\big(x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k}\big)\\
\begin{pmatrix}z_{1}^{k+1}\\ z_{2}^{k+1}\end{pmatrix}&=&\begin{pmatrix}z_{1}^{k}\\ z_{2}^{k}\end{pmatrix}+\lambda\begin{pmatrix}x_{3}^{k}-x_{1}^{k}\\ x_{3}^{k}-x_{2}^{k}\end{pmatrix}.
\end{array}\right. (3)

In particular, we modify the $f_{2}$-proximal step in (2) to obtain (3). When $\alpha=1$, this reduces to Ryu's splitting algorithm.
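For concreteness, the following is a minimal Python sketch of one pass of iteration (3). The proximal oracles `prox_f1`, `prox_f2`, `prox_f3` are hypothetical user-supplied callables of the form `prox(v, t)` returning an element of $\operatorname{prox}_{tf}(v)$; for a nonconvex $f_{3}$ the proximal map may be set-valued, in which case any selection will do.

```python
def relaxed_ryu_step(z1, z2, prox_f1, prox_f2, prox_f3, gamma, alpha, lam):
    """One pass of the relaxed Ryu iteration (3).

    prox_fi(v, t) is assumed to return an element of prox_{t f_i}(v);
    z1, z2 are numpy arrays (or any vectors supporting +, -, *).
    """
    x1 = prox_f1(z1, gamma)                        # x1 = prox_{gamma f1}(z1)
    x2 = prox_f2(z2 / alpha + x1, gamma / alpha)   # x2 = prox_{(gamma/alpha) f2}(z2/alpha + x1)
    x3 = prox_f3(x1 - z1 + x2 - z2, gamma)         # x3 in prox_{gamma f3}(x1 - z1 + x2 - z2)
    z1_next = z1 + lam * (x3 - x1)
    z2_next = z2 + lam * (x3 - x2)
    return x1, x2, x3, z1_next, z2_next
```

Setting `alpha = 1` in this sketch recovers the original iteration (2).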

Throughout this paper, we adopt the following blanket assumptions.

Assumption 1 (Blanket assumption)

Suppose the following conditions are satisfied.

(a) For $i=1,2$, the function $f_{i}:\mathbb{R}^{d}\to\mathbb{R}$ is a convex $L_{i}$-smooth function, that is, $\nabla f_{i}$ is globally Lipschitz continuous with modulus $L_{i}>0$.

(b) The function $f_{3}:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ is proper and lower semicontinuous (lsc for short).

(c) Problem (1) has a nonempty set of solutions.

Remark 1 (On the simplified nonconvex setting)

The convexity in Assumption 1(a) can be relaxed. However, we impose it in our setting to simplify the analysis. Similar results can be obtained when $f_{1}$ and $f_{2}$ are merely $L_{1}$- and $L_{2}$-smooth, respectively, by employing a strategy similar to that in themelis2018forward ; bian2021three .

This paper is organized as follows. In Section 2, we introduce the notation and some variational analysis results we use throughout this paper. In Section 3, we define the merit function that serves as the foundation of our analysis, the relaxed Ryu envelope, and establish key properties relevant to the convergence analysis of the proposed method. Section 4 is dedicated to investigating the convergence properties of the method (3). In particular, we show that the defined envelope satisfies a sufficient decrease condition, and then we exploit this property to prove that all cluster points of the generated sequence are critical points of the objective function of problem (1). Finally, in Section 5 we comment on some ongoing works and future research directions.

2 Preliminaries and notation

Throughout this paper, $\langle\cdot,\cdot\rangle$ denotes an inner product in $\mathbb{R}^{d}$, and $\|\cdot\|$ its induced norm. We shall make use of the following technical result.

Lemma 1

For all $a,b,c\in\mathbb{R}^{d}$, it holds that

\|a-b\|^{2}-\|a-c\|^{2}=-\|b-c\|^{2}+2\langle c-b,a-b\rangle.

A function $\varphi:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ is called proper when its domain, the set $\operatorname{dom}(\varphi)=\{x\in\mathbb{R}^{d}:\varphi(x)<+\infty\}$, is nonempty. We say $\varphi$ is lsc if at any $x\in\operatorname{dom}(\varphi)$, $\varphi(x)\leq\liminf_{y\to x}\varphi(y)$, and it is convex if for all $x,y\in\operatorname{dom}(\varphi)$ and all $\beta\in(0,1)$, $\varphi(\beta x+(1-\beta)y)\leq\beta\varphi(x)+(1-\beta)\varphi(y)$. A function $\Phi:\mathbb{R}^{d}\to\mathbb{R}$ is said to be $L$-smooth if it is differentiable and its gradient is $L$-Lipschitz continuous, that is, for all $x,y\in\mathbb{R}^{d}$, $\|\nabla\Phi(x)-\nabla\Phi(y)\|\leq L\|x-y\|$. In the next result, we recall some properties of $L$-smooth functions that will be important in our analysis (see, for instance, (Beck17, Theorem 5.8)).

Lemma 2

Suppose $f:\mathbb{R}^{d}\to\mathbb{R}$ is $L$-smooth. Then,

(\forall x,y\in\mathbb{R}^{d})\quad |f(x)-f(y)-\langle\nabla f(y),x-y\rangle|\leq\dfrac{L}{2}\|y-x\|^{2}.

If, in addition, $f$ is convex, then for all $x,y\in\mathbb{R}^{d}$,

f(x)\geq f(y)+\langle\nabla f(y),x-y\rangle+\dfrac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}\quad\text{and}\quad\langle\nabla f(x)-\nabla f(y),x-y\rangle\geq\dfrac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}.

For a proper function $\varphi:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ and $x\in\operatorname{dom}(\varphi)$, we denote by $\hat{\partial}\varphi(x)$ the Fréchet (or regular) subdifferential of $\varphi$ at $x$, defined as

\hat{\partial}\varphi(x)=\left\{v\in\mathbb{R}^{d}:\liminf_{y\to x}\frac{\varphi(y)-\varphi(x)-\langle v,y-x\rangle}{\|y-x\|}\geq 0\right\}.

The limiting (or general) subdifferential of $\varphi$ at $x$, denoted $\partial\varphi(x)$, is defined as

\partial\varphi(x)=\limsup_{y\xrightarrow[\varphi]{}x}\hat{\partial}\varphi(y),

where $y\xrightarrow[\varphi]{}x$ denotes convergence in the attentive sense, that is, $y\to x$ and $\varphi(y)\to\varphi(x)$. When $\varphi$ is smooth, $\hat{\partial}\varphi(x)=\partial\varphi(x)=\{\nabla\varphi(x)\}$. If $\varphi$ is proper, lsc, and convex, then the Fréchet and limiting subdifferentials coincide with the subdifferential of convex analysis, namely,

\{v\in\mathbb{R}^{d}:(\forall y\in\mathbb{R}^{d})\;\varphi(y)\geq\varphi(x)+\langle v,y-x\rangle\}.

Of particular interest are the zeros of the subdifferential operator. We say $\bar{x}\in\mathbb{R}^{d}$ is a critical point of $\varphi$ if $0\in\partial\varphi(\bar{x})$. If $\varphi$ is convex, critical points are exactly the global minimizers of $\varphi$.

Remark 2 (Critical points of the sum)

Under Assumption 1 and for $\varphi$ defined in (1), in view of (Rockafellar_Wets_2009, Exercise 8.8(c)), $\bar{x}$ is a critical point of $\varphi$ if

0\in\nabla f_{1}(\bar{x})+\nabla f_{2}(\bar{x})+\partial f_{3}(\bar{x}).

Given a function $\varphi:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$, a parameter $\gamma>0$, and a point $z\in\mathbb{R}^{d}$, the proximal operator of $\gamma\varphi$ at $z$ is defined as

\operatorname{prox}_{\gamma\varphi}(z)\coloneqq\operatorname*{argmin}_{y\in\mathbb{R}^{d}}~\varphi(y)+\frac{1}{2\gamma}\|y-z\|^{2}. (4)

The associated optimal value function is known as the Moreau envelope, defined as follows:

\varphi_{\gamma}^{\texttt{Moreau}}(z)\coloneqq\min_{y\in\mathbb{R}^{d}}~\varphi(y)+\frac{1}{2\gamma}\|y-z\|^{2}.

We say $\varphi$ is prox-bounded if, for some $\gamma>0$, $\varphi(\cdot)+\frac{1}{2\gamma}\|\cdot\|^{2}$ is bounded from below. The supremum $\gamma_{\varphi}>0$ of such parameters $\gamma$ is called the threshold of prox-boundedness. If $\varphi$ is a proper lsc prox-bounded function with threshold $\gamma_{\varphi}>0$, then for any $\gamma\in(0,\gamma_{\varphi})$, $\operatorname{prox}_{\gamma\varphi}$ is nonempty- and compact-valued, and $\varphi_{\gamma}^{\texttt{Moreau}}$ is finite-valued (Rockafellar_Wets_2009, Theorem 1.25). In particular, if $\varphi$ is a proper lsc convex function, then $\gamma_{\varphi}=+\infty$ and $\operatorname{prox}_{\gamma\varphi}$ is a single-valued mapping (Rockafellar_Wets_2009, Theorem 12.12, Theorem 12.17).
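As a concrete illustration not taken from the paper, for $\varphi=\|\cdot\|_{1}$ the proximal operator in (4) is componentwise soft-thresholding and the Moreau envelope is a Huber-type smoothing of the $\ell_{1}$ norm; a minimal sketch:

```python
import numpy as np

def prox_l1(z, gamma):
    """prox_{gamma ||.||_1}(z): componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def moreau_env_l1(z, gamma):
    """Moreau envelope of ||.||_1 with parameter gamma, evaluated at z."""
    y = prox_l1(z, gamma)
    return np.sum(np.abs(y)) + np.sum((y - z) ** 2) / (2.0 * gamma)

z = np.array([1.5, -0.2, 0.7])
print(prox_l1(z, 0.5))        # soft-thresholded vector: [1.0, -0.0, 0.2]
print(moreau_env_l1(z, 0.5))  # finite value, smooth as a function of z
```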

Remark 3 (On the well-definedness of proximal operations)

Under Assumption 1, $\operatorname{prox}_{\gamma f_{1}}$ and $\operatorname{prox}_{\gamma f_{2}}$ are single-valued since $f_{1}$ and $f_{2}$ are proper lsc convex, while $\operatorname{prox}_{\gamma f_{3}}$ is well-defined for $\gamma<\frac{1}{L_{1}+L_{2}}$, as $f_{3}$ is prox-bounded with threshold at least $\frac{1}{L_{1}+L_{2}}$ from (themelis2020douglas, Remark 3.1).

3 An envelope for Ryu’s splitting method

Our convergence analysis for (3) under Assumption 1 builds on the approach in ALT24 ; AT25 ; themelis2020douglas , which utilizes an envelope function, akin to the Moreau envelope, well-suited to the corresponding iterative method. We begin our analysis by motivating such a merit function.

Proposition 1

Suppose that Assumption 1 holds, and let $\alpha>0$ and $(z_{1}^{k},z_{2}^{k})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$. Then $x_{3}^{k}$ defined in (3) solves the following minimization problem

\min_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\sum_{i=1}^{2}\left[f_{i}(x_{i}^{k})+\langle y-x_{i}^{k},\nabla f_{i}(x_{i}^{k})\rangle+\dfrac{1}{2\gamma_{i}}\|y-x_{i}^{k}\|^{2}\right]\right\},

where $\gamma_{1}\coloneqq\frac{\gamma}{\alpha}$ and $\gamma_{2}\coloneqq\frac{\gamma}{1-\alpha}$. (Footnote 1: We adopt the convention that $\frac{c}{0}=\infty$ and $\frac{d}{\infty}=0$ for any $c>0$ and $d\in\mathbb{R}$. Hence, when $\alpha=1$, $\gamma_{2}=\infty$ and the term $\frac{1}{2\gamma_{i}}\|y-x_{i}^{k}\|^{2}$ vanishes when $i=2$.)

Proof

From the $x_{3}$-update in (3), we have

x_{3}^{k}\in\operatorname*{argmin}_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\dfrac{1}{2\gamma}\|y-(x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k})\|^{2}\right\}. (5)

Meanwhile, the first-order optimality conditions of the $x_{1}$-update and the $x_{2}$-update yield, respectively,

z_{1}^{k} = \gamma\nabla f_{1}(x_{1}^{k})+x_{1}^{k}, (6)
z_{2}^{k} = \gamma\nabla f_{2}(x_{2}^{k})+\alpha x_{2}^{k}-\alpha x_{1}^{k}. (7)

Hence,

x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k} = x_{1}^{k}-\left(\gamma\nabla f_{1}(x_{1}^{k})+x_{1}^{k}\right)+x_{2}^{k}-\left(\gamma\nabla f_{2}(x_{2}^{k})+\alpha x_{2}^{k}-\alpha x_{1}^{k}\right)
= \alpha x_{1}^{k}-\gamma\nabla f_{1}(x_{1}^{k})+(1-\alpha)x_{2}^{k}-\gamma\nabla f_{2}(x_{2}^{k})
= \alpha\left(x_{1}^{k}-\frac{\gamma}{\alpha}\nabla f_{1}(x_{1}^{k})\right)+(1-\alpha)\left(x_{2}^{k}-\frac{\gamma}{1-\alpha}\nabla f_{2}(x_{2}^{k})\right),

which, combined with (5), yields

x_{3}^{k}\in\operatorname*{argmin}_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\dfrac{1}{2\gamma}\left\|y-\sum_{i=1}^{2}\alpha_{i}\left(x_{i}^{k}-\gamma_{i}\nabla f_{i}(x_{i}^{k})\right)\right\|^{2}\right\} (8)

for $\alpha_{1}=\alpha$ and $\alpha_{2}=1-\alpha$. Following the calculations in (AT25, Theorem 5.5), by expanding the squared norm and dropping and adding terms that are constant in $y$ (such as $\frac{1}{2\gamma}\left\|\sum_{i=1}^{2}\alpha_{i}(x_{i}^{k}-\gamma_{i}\nabla f_{i}(x_{i}^{k}))\right\|^{2}$ and $\sum_{i=1}^{2}f_{i}(x_{i}^{k})$), which does not change the set of minimizers, we get the desired result. ∎

In view of Proposition 1, we define the relaxed Ryu envelope (RRE), for all $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ and with $x_{1},x_{2}$ given in terms of $(z_{1},z_{2})$ by (10) below, as

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\coloneqq\min_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\sum_{i=1}^{2}\left[f_{i}(x_{i})+\langle y-x_{i},\nabla f_{i}(x_{i})\rangle+\dfrac{1}{2\gamma_{i}}\|y-x_{i}\|^{2}\right]\right\}. (9)

Meanwhile, the corresponding set-valued iteration operator associated with (3) is given by

T_{\gamma}^{\texttt{Ryu}}:(z_{1},z_{2})\mapsto(z_{1}+\lambda(x_{3}-x_{1}),\,z_{2}+\lambda(x_{3}-x_{2})),

where

\left\{\begin{array}{rcl}
x_{1}&=&\operatorname{prox}_{\gamma f_{1}}(z_{1})\\
x_{2}&=&\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}\!\left(\frac{z_{2}}{\alpha}+x_{1}\right)\\
x_{3}&\in&\operatorname{prox}_{\gamma f_{3}}\big(x_{1}-z_{1}+x_{2}-z_{2}\big).
\end{array}\right. (10)

In view of Remark 3, note that $T_{\gamma}^{\texttt{Ryu}}$ is nonempty- and compact-valued for any $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ provided that $\gamma<\frac{1}{L_{1}+L_{2}}$.
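As an implementation-oriented aside (a sketch under the same hypothetical oracle interface as before, now also assuming function-value oracles `f1`, `f2`, `f3` and gradient oracles `grad_f1`, `grad_f2`), the RRE in (9) can be evaluated at a given $(z_{1},z_{2})$ by computing $(x_{1},x_{2},x_{3})$ from (10) and plugging $y=x_{3}$ into the objective of (9), since by Proposition 1 this $y$ attains the minimum:

```python
def rre_value(z1, z2, oracles, gamma, alpha):
    """Evaluate the relaxed Ryu envelope (9) at (z1, z2).

    `oracles` is assumed to bundle f1, f2, f3 (function values),
    grad_f1, grad_f2 (gradients), and prox_f1, prox_f2, prox_f3
    (proximal maps called as prox(v, t)); points are numpy arrays.
    """
    gamma1, gamma2 = gamma / alpha, gamma / (1.0 - alpha)
    x1 = oracles.prox_f1(z1, gamma)
    x2 = oracles.prox_f2(z2 / alpha + x1, gamma / alpha)
    x3 = oracles.prox_f3(x1 - z1 + x2 - z2, gamma)   # y = x3 attains the min in (9)
    val = oracles.f3(x3)
    for xi, fi, grad_fi, gi in [(x1, oracles.f1, oracles.grad_f1, gamma1),
                                (x2, oracles.f2, oracles.grad_f2, gamma2)]:
        d = x3 - xi
        val += fi(xi) + d @ grad_fi(xi) + (d @ d) / (2.0 * gi)
    return val
```

Monitoring this quantity along the iterates of (3) is one way to observe, numerically, the sufficient decrease property established in Theorem 4.1 below.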

Remark 4 (Relationship between the Moreau envelope and the RRE)

Observe that from (8), by expanding and completing the square, we have

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})=(f_{3})_{\gamma}^{\texttt{Moreau}}\left(\sum_{i=1}^{2}\alpha_{i}(x_{i}-\gamma_{i}\nabla f_{i}(x_{i}))\right)-\frac{1}{2\gamma}\left\|\sum_{i=1}^{2}\alpha_{i}(x_{i}-\gamma_{i}\nabla f_{i}(x_{i}))\right\|^{2}+\sum_{i=1}^{2}\left[f_{i}(x_{i})-\langle x_{i},\nabla f_{i}(x_{i})\rangle+\frac{1}{2\gamma_{i}}\|x_{i}\|^{2}\right].

We now establish some properties of the envelope function. The following result states that the RRE inherits continuity properties of the Moreau envelope (cf. (themelis2018forward, Proposition 4.2) and (themelis2020douglas, Proposition 3.2)).

Proposition 2 (Continuity of RRE)

The RRE is a real-valued and locally Lipschitz continuous function.

Proof

Since $\operatorname{prox}_{\gamma f_{1}}$ and $\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}$ are nonexpansive (Beck17, Theorem 6.42(b)), the maps $z_{1}\mapsto x_{1}$ and $(z_{1},z_{2})\mapsto x_{2}$ defined in (10) are (globally) Lipschitz continuous. Furthermore, from Assumption 1, $\nabla f_{i}$ is (globally) Lipschitz continuous for $i=1,2$. Then, as the Moreau envelope is locally Lipschitz continuous (Rockafellar_Wets_2009, Example 10.32), the conclusion follows from Remark 4. ∎

Next, we show some sandwich-type bounds relating the RRE and the original objective function in problem (1) (cf. (themelis2018forward, Proposition 4.3) and (themelis2020douglas, Proposition 3.3)).

Proposition 3

Suppose Assumption 1 holds, $\gamma\in(0,\frac{1}{L_{1}+L_{2}})$, and $\lambda,\alpha>0$. Given $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$, consider $(x_{1},x_{2},x_{3})\in\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{R}^{d}$ defined by (10). Then the following hold:

(i) For $\gamma_{1}=\frac{\gamma}{\alpha}$ and $\gamma_{2}=\frac{\gamma}{1-\alpha}$,

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\leq\min\left\{\varphi(x_{1})+\frac{1}{2}\left(L_{2}+\frac{1}{\gamma_{2}}\right)\|x_{1}-x_{2}\|^{2},\;\varphi(x_{2})+\frac{1}{2}\left(L_{1}+\frac{1}{\gamma_{1}}\right)\|x_{1}-x_{2}\|^{2}\right\}.

(ii) $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\geq\varphi(x_{3})+\frac{\alpha-\gamma L_{1}}{2\gamma}\|x_{3}-x_{1}\|^{2}+\frac{(1-\alpha)-\gamma L_{2}}{2\gamma}\|x_{3}-x_{2}\|^{2}$.

(iii) If $\alpha\in(0,1)$ and $\gamma\leq\min\left\{\frac{\alpha}{L_{1}},\frac{1-\alpha}{L_{2}}\right\}$, then $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\geq\varphi(x_{3})$.

Proof

Take $y=x_{1}$ in (9) and apply Lemma 2 to $f=f_{2}$ to obtain

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}) \leq f_{3}(x_{1})+f_{1}(x_{1})+f_{2}(x_{2})+\langle\nabla f_{2}(x_{2}),x_{1}-x_{2}\rangle+\frac{1}{2\gamma_{2}}\|x_{1}-x_{2}\|^{2} \leq \varphi(x_{1})+\dfrac{1}{2}\left(L_{2}+\frac{1}{\gamma_{2}}\right)\|x_{1}-x_{2}\|^{2}.

Similarly, taking $y=x_{2}$ in (9) and applying Lemma 2 to $f=f_{1}$ yields

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}) \leq \varphi(x_{2})+\dfrac{1}{2}\left(L_{1}+\frac{1}{\gamma_{1}}\right)\|x_{2}-x_{1}\|^{2}.

From these, we immediately get (i). Moreover, since $y=x_{3}$ attains the minimum in (9),

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}) = f_{3}(x_{3})+\sum_{i=1}^{2}\left[f_{i}(x_{i})+\langle\nabla f_{i}(x_{i}),x_{3}-x_{i}\rangle+\dfrac{1}{2\gamma_{i}}\|x_{3}-x_{i}\|^{2}\right]
\geq f_{3}(x_{3})+\sum_{i=1}^{2}\left[f_{i}(x_{3})-\dfrac{L_{i}}{2}\|x_{3}-x_{i}\|^{2}+\dfrac{1}{2\gamma_{i}}\|x_{3}-x_{i}\|^{2}\right]
= \varphi(x_{3})+\frac{\alpha-\gamma L_{1}}{2\gamma}\|x_{3}-x_{1}\|^{2}+\frac{(1-\alpha)-\gamma L_{2}}{2\gamma}\|x_{3}-x_{2}\|^{2},

where the inequality holds by Lemma 2 and the last equality holds by plugging in $\gamma_{1}=\frac{\gamma}{\alpha}$ and $\gamma_{2}=\frac{\gamma}{1-\alpha}$. This completes the proof of (ii). Finally, part (iii) immediately follows from (ii). ∎

Note that the iteration in (3) is designed to find a fixed point of the relaxed Ryu splitting operator $T_{\gamma}^{\texttt{Ryu}}$, that is, a point in the set

\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}\coloneqq\left\{(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}:(z_{1},z_{2})\in T_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\right\}.

In the next proposition, we establish a connection between such fixed points and the notion of criticality, as commonly used in optimization.

Proposition 4

Suppose Assumption 1 holds, and let $\gamma\in(0,\frac{1}{L_{1}+L_{2}})$ and $\alpha,\lambda>0$. Then, $(\bar{z}_{1},\bar{z}_{2})\in\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}$ if and only if $\bar{x}_{1}=\bar{x}_{2}=\bar{x}_{3}\eqqcolon\bar{x}$, where

\left\{\begin{array}{rcl}
\bar{x}_{1}&=&\operatorname{prox}_{\gamma f_{1}}(\bar{z}_{1})\\
\bar{x}_{2}&=&\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}\!\left(\frac{\bar{z}_{2}}{\alpha}+\bar{x}_{1}\right)\\
\bar{x}_{3}&\in&\operatorname{prox}_{\gamma f_{3}}\big(\bar{x}_{1}-\bar{z}_{1}+\bar{x}_{2}-\bar{z}_{2}\big).
\end{array}\right. (11)

Furthermore, such $\bar{x}$ is a critical point of (1), and $\varphi(\bar{x})=\varphi_{\gamma}^{\texttt{Ryu}}(\bar{z}_{1},\bar{z}_{2})$. In particular, if $\alpha\in(0,1)$ and $\gamma\leq\min\left\{\frac{\alpha}{L_{1}},\frac{1-\alpha}{L_{2}}\right\}$, then

\min\varphi=\min\varphi_{\gamma}^{\texttt{Ryu}}.
Proof

It is straightforward from the definition of $T_{\gamma}^{\texttt{Ryu}}$ that $(\bar{z}_{1},\bar{z}_{2})\in\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}$ is equivalent to having $\bar{x}\coloneqq\bar{x}_{1}=\bar{x}_{2}=\bar{x}_{3}$ satisfying the conditions in (11). Hence, it suffices to prove that such $\bar{x}$ is, in this case, always a critical point of $\varphi$. Evaluating the first-order optimality conditions of the $x_{3}$-step, namely,

0\in\gamma\partial f_{3}(x_{3})+x_{3}-x_{1}+z_{1}-x_{2}+z_{2} (12)

at $(x_{1},x_{2},x_{3})=(\bar{x},\bar{x},\bar{x})$ and $z_{i}=\bar{z}_{i}$ for $i=1,2$, yields

0\in\gamma\partial f_{3}(\bar{x})+\bar{z}_{1}-\bar{x}+\bar{z}_{2}.

Adding this inclusion with (6) and (7) using the substitutions $x_{1}^{k}\leftarrow\bar{x}$ and $x_{2}^{k}\leftarrow\bar{x}$, we obtain

0\in\gamma\big(\nabla f_{1}(\bar{x})+\nabla f_{2}(\bar{x})+\partial f_{3}(\bar{x})\big),

that is, $0\in\partial\varphi(\bar{x})$ by Remark 2.

Furthermore, given $(\bar{z}_{1},\bar{z}_{2})\in\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}$, Proposition 3(i)&(ii) imply that $\varphi(\bar{x})=\varphi_{\gamma}^{\texttt{Ryu}}(\bar{z}_{1},\bar{z}_{2})$. Finally, in view of Proposition 3(iii), for any $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ and $\bar{x}\in\operatorname*{argmin}\varphi$,

\varphi_{\gamma}^{\texttt{Ryu}}(\bar{z}_{1},\bar{z}_{2})=\varphi(\bar{x})\leq\varphi(x_{3})\leq\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}).

Hence, $(\bar{z}_{1},\bar{z}_{2})\in\operatorname*{argmin}\varphi_{\gamma}^{\texttt{Ryu}}$ and $\min\varphi=\min\varphi_{\gamma}^{\texttt{Ryu}}$. ∎

At this point, we have built the necessary tools regarding the RRE for the analysis of convergence of the proposed relaxed Ryu splitting method. In the next section, we show subsequential convergence of the method under standard assumptions in the literature.

4 Convergence of modified Ryu’s three-operator splitting method via envelopes

To establish the (subsequential) convergence of the iterative method in (3), we follow the approach introduced in themelis2020douglas for the Douglas–Rachford splitting method. Specifically, the core argument relies on a sufficient decrease property satisfied by the RRE.

4.1 Sufficient decrease property for RRE

We first prove three technical lemmas that we will use in the main result of this section. We use the following notation: for $i=1,2$,

\Delta x_{i}^{k}=x_{i}^{k+1}-x_{i}^{k},\qquad \Delta g_{i}^{k}=\nabla f_{i}(x_{i}^{k+1})-\nabla f_{i}(x_{i}^{k}),\qquad \Delta z_{i}^{k}=z_{i}^{k+1}-z_{i}^{k}. (13)
Lemma 3

Under Assumption 1, the sequences $(x_{i}^{k})_{k}$ for $i=1,2,3$, and $(z_{i}^{k})_{k}$ for $i=1,2$, generated by (3) satisfy:

\|x_{3}^{k}-x_{1}^{k}\|^{2}-\|x_{3}^{k}-x_{1}^{k+1}\|^{2}=\left(\dfrac{2}{\lambda}-1\right)\|\Delta x_{1}^{k}\|^{2}+\dfrac{2\gamma}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{1}^{k}\rangle (14)

and

\|x_{3}^{k}-x_{2}^{k}\|^{2}-\|x_{3}^{k}-x_{2}^{k+1}\|^{2}=\left(\dfrac{2\alpha}{\lambda}-1\right)\|\Delta x_{2}^{k}\|^{2}+\dfrac{2\gamma}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle-\dfrac{2\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta x_{1}^{k}\rangle. (15)
Proof

Using the notation in (13), we have from (6) and (7) that

\Delta z_{1}^{k} = \gamma\Delta g_{1}^{k}+\Delta x_{1}^{k}, (16)
\Delta z_{2}^{k} = \gamma\Delta g_{2}^{k}+\alpha\Delta x_{2}^{k}-\alpha\Delta x_{1}^{k}. (17)

On the other hand, Lemma 1 and the $(z_{1},z_{2})$-update in (3) yield

\|x_{3}^{k}-x_{i}^{k}\|^{2}-\|x_{3}^{k}-x_{i}^{k+1}\|^{2}=-\|\Delta x_{i}^{k}\|^{2}+2\langle\Delta x_{i}^{k},x_{3}^{k}-x_{i}^{k}\rangle=-\|\Delta x_{i}^{k}\|^{2}+\dfrac{2}{\lambda}\langle\Delta x_{i}^{k},\Delta z_{i}^{k}\rangle.

Together with (16) and (17), we get (14) and (15), respectively. ∎

Lemma 4

Under Assumption 1, the sequences $(x_{i}^{k})_{k}$ for $i=1,2,3$, and $(z_{i}^{k})_{k}$ for $i=1,2$, generated by (3) satisfy:

\sum_{i=1}^{2}\left[f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})-\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle+\langle\nabla f_{i}(x_{i}^{k}),x_{3}^{k}-x_{i}^{k}\rangle\right]
\geq\sum_{i=1}^{2}\left(\dfrac{1}{2L_{i}}-\dfrac{\gamma}{\lambda}\right)\|\Delta g_{i}^{k}\|^{2}-\dfrac{1}{\lambda}\langle\Delta g_{1}^{k},\Delta x_{1}^{k}\rangle-\dfrac{\alpha}{\lambda}\langle\Delta g_{2}^{k},\Delta x_{2}^{k}\rangle+\dfrac{\alpha}{\lambda}\langle\Delta g_{2}^{k},\Delta x_{1}^{k}\rangle.
Proof

For $i=1,2$, we have from Lemma 2 and the $(z_{1},z_{2})$-update rule in (3) that

f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})-\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle+\langle\nabla f_{i}(x_{i}^{k}),x_{3}^{k}-x_{i}^{k}\rangle
= f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})-\langle\nabla f_{i}(x_{i}^{k+1}),x_{i}^{k}-x_{i}^{k+1}\rangle-\langle\Delta g_{i}^{k},x_{3}^{k}-x_{i}^{k}\rangle
\geq \dfrac{1}{2L_{i}}\|\Delta g_{i}^{k}\|^{2}-\langle\Delta g_{i}^{k},x_{3}^{k}-x_{i}^{k}\rangle
= \dfrac{1}{2L_{i}}\|\Delta g_{i}^{k}\|^{2}-\dfrac{1}{\lambda}\langle\Delta g_{i}^{k},\Delta z_{i}^{k}\rangle. (18)

Meanwhile, we have from (16) that

\langle\Delta g_{1}^{k},\Delta z_{1}^{k}\rangle=\gamma\|\Delta g_{1}^{k}\|^{2}+\langle\Delta g_{1}^{k},\Delta x_{1}^{k}\rangle. (19)

On the other hand, (17) yields

\langle\Delta g_{2}^{k},\Delta z_{2}^{k}\rangle=\gamma\|\Delta g_{2}^{k}\|^{2}+\alpha\langle\Delta g_{2}^{k},\Delta x_{2}^{k}\rangle-\alpha\langle\Delta g_{2}^{k},\Delta x_{1}^{k}\rangle. (20)

Combining (18), (19) and (20) gives the desired inequality. ∎

Lemma 5

Let $\lambda\in(0,2)$ and let $\underline{\alpha}\coloneqq\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$. Then the following hold:

(i) $\underline{\alpha}\in\left(\frac{\lambda}{2},1\right)$.

(ii) The interval $\left(\frac{\alpha}{2\alpha-\lambda},\frac{2-\lambda}{1-\alpha}\right)$ is nonempty for any $\alpha\in(\underline{\alpha},1)$.

(iii) For $\epsilon_{1},\epsilon_{2}>0$ and $\alpha\in(\underline{\alpha},1)$, define the constants

\bar{\gamma}_{1}\coloneqq\dfrac{\lambda}{2L_{2}}-\dfrac{\alpha}{2\epsilon_{2}},\qquad \bar{\gamma}_{2}\coloneqq\dfrac{\alpha(2-\lambda-(1-\alpha)\epsilon_{1})}{\alpha\epsilon_{2}+2(1-\alpha)L_{1}},\qquad \bar{\gamma}_{3}\coloneqq\dfrac{(1-\alpha)(\epsilon_{1}(2\alpha-\lambda)-\alpha)}{2\alpha L_{2}\epsilon_{1}}, (21)

and the intervals

I_{1}\coloneqq\left(\frac{\alpha}{2\alpha-\lambda},\frac{2-\lambda}{1-\alpha}\right)\quad\text{and}\quad I_{2}\coloneqq\left(\frac{\alpha L_{2}}{\lambda},+\infty\right). (22)

If $\epsilon_{j}\in I_{j}$ for $j=1,2$, then $\bar{\gamma}_{i}$ is strictly positive for $i=1,2,3$.

Proof

These results follow from straightforward calculations. ∎
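For the reader's convenience, here is a sketch of the computation behind item (ii); by item (i), $\alpha>\underline{\alpha}>\frac{\lambda}{2}$, so $2\alpha-\lambda>0$ and both endpoints of the interval are positive. The interval is nonempty if and only if

\frac{\alpha}{2\alpha-\lambda}<\frac{2-\lambda}{1-\alpha}
\iff \alpha(1-\alpha)<(2-\lambda)(2\alpha-\lambda)
\iff \alpha^{2}+(3-2\lambda)\alpha+\lambda^{2}-2\lambda>0,

and the larger root of the quadratic in $\alpha$ on the right-hand side is precisely $\underline{\alpha}=\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$, so the last inequality holds for every $\alpha\in(\underline{\alpha},1)$.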

With these lemmas in place, we are now ready to present the first main result of this paper. We establish the sufficient descent property of the RRE, provided that the stepsize is chosen sufficiently small. In particular, we restrict the stepsize $\gamma$ to the interval

\Gamma\coloneqq\left(0,\min\{\bar{\gamma}_{0},\bar{\gamma}_{1}\}\right]\cap\left(0,\min\left\{\bar{\gamma}_{2},\bar{\gamma}_{3},\frac{1}{L_{1}+L_{2}}\right\}\right), (23)

where $\bar{\gamma}_{0}\coloneqq\frac{\lambda}{2L_{1}}$ and $\bar{\gamma}_{i}$ is given by (21) for $i=1,2,3$, with $\epsilon_{j}$ taken from $I_{j}$ given in (22) for $j=1,2$.
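To make the stepsize restriction concrete, the following Python sketch (an illustrative helper, not part of the paper's algorithm) computes $\bar{\gamma}_{0},\dots,\bar{\gamma}_{3}$ and the resulting upper end of $\Gamma$ from (21)–(23), with default choices of $\epsilon_{1}\in I_{1}$ and $\epsilon_{2}\in I_{2}$:

```python
import math

def stepsize_bounds(L1, L2, lam, alpha, eps1=None, eps2=None):
    """Compute gamma_bar_0..3 from (21) and the upper end of Gamma in (23).

    eps1 is assumed to lie in I1 = (alpha/(2*alpha - lam), (2 - lam)/(1 - alpha))
    and eps2 in I2 = (alpha*L2/lam, +inf); the defaults pick one such value.
    """
    alpha_lower = (2 * lam - 3 + math.sqrt(9 - 4 * lam)) / 2
    assert 0 < lam < 2 and alpha_lower < alpha < 1
    if eps1 is None:
        eps1 = 0.5 * (alpha / (2 * alpha - lam) + (2 - lam) / (1 - alpha))  # midpoint of I1
    if eps2 is None:
        eps2 = 2 * alpha * L2 / lam                                         # a point of I2
    g0 = lam / (2 * L1)
    g1 = lam / (2 * L2) - alpha / (2 * eps2)
    g2 = alpha * (2 - lam - (1 - alpha) * eps1) / (alpha * eps2 + 2 * (1 - alpha) * L1)
    g3 = (1 - alpha) * (eps1 * (2 * alpha - lam) - alpha) / (2 * alpha * L2 * eps1)
    # gamma must satisfy gamma <= min(g0, g1) and gamma < min(g2, g3, 1/(L1 + L2))
    return (g0, g1, g2, g3), min(g0, g1, g2, g3, 1.0 / (L1 + L2))
```

For instance, $L_{1}=L_{2}=1$, $\lambda=1$ and $\alpha=0.8$ give a small but positive admissible $\gamma$ with the default choices above.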

Theorem 4.1 (RRE sufficient descent)

Suppose that Assumption 1 holds. Let $\lambda\in(0,2)$ and $\alpha\in(\underline{\alpha},1)$, where $\underline{\alpha}\coloneqq\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$. Let $\gamma\in\Gamma$, where $\Gamma$ is given in (23). For the sequences $(z_{i}^{k})_{k}$, $i=1,2$, generated by (3), there exists $M=M(\gamma)>0$ such that for all $k\geq 1$,

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})\geq\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k+1},z_{2}^{k+1})+M\left(\|z_{1}^{k+1}-z_{1}^{k}\|^{2}+\|z_{2}^{k+1}-z_{2}^{k}\|^{2}\right).
Proof

From the definition of the relaxed Ryu envelope, we have

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k+1},z_{2}^{k+1})\leq f_{3}(x_{3}^{k})+\sum_{i=1}^{2}\left(f_{i}(x_{i}^{k+1})+\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle+\dfrac{1}{2\gamma_{i}}\|x_{3}^{k}-x_{i}^{k+1}\|^{2}\right).

Thus, since $y=x_{3}^{k}$ attains the minimum in the definition of $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})$, we have

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1})
\geq \sum_{i=1}^{2}\left(f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})+\langle\nabla f_{i}(x_{i}^{k}),x_{3}^{k}-x_{i}^{k}\rangle-\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle\right)
+\sum_{i=1}^{2}\dfrac{1}{2\gamma_{i}}\left(\|x_{3}^{k}-x_{i}^{k}\|^{2}-\|x_{3}^{k}-x_{i}^{k+1}\|^{2}\right).

Using Lemmas 3 and 4, we obtain

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1})
\geq \left(\dfrac{1}{2L_{1}}-\dfrac{\gamma}{\lambda}\right)\|\Delta g_{1}^{k}\|^{2}+\left(\dfrac{1}{2L_{2}}-\dfrac{\gamma}{\lambda}\right)\|\Delta g_{2}^{k}\|^{2}
+\left(\dfrac{\alpha}{\gamma\lambda}-\dfrac{\alpha}{2\gamma}\right)\|\Delta x_{1}^{k}\|^{2}+\left(\dfrac{\alpha(1-\alpha)}{\gamma\lambda}-\dfrac{1-\alpha}{2\gamma}\right)\|\Delta x_{2}^{k}\|^{2}
+\dfrac{\alpha-1}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{1}^{k}\rangle+\dfrac{1-2\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle
-\dfrac{(1-\alpha)\alpha}{\gamma\lambda}\langle\Delta x_{2}^{k},\Delta x_{1}^{k}\rangle+\dfrac{\alpha}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{2}^{k}\rangle.

Since $\alpha<1$, we have

\dfrac{\alpha-1}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{1}^{k}\rangle \geq \dfrac{(\alpha-1)L_{1}}{\lambda}\|\Delta x_{1}^{k}\|^{2},
\dfrac{1-2\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle = \dfrac{1-\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle-\dfrac{\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle \geq 0-\dfrac{\alpha L_{2}}{\lambda}\|\Delta x_{2}^{k}\|^{2},

where the first inequality holds by the $L_{1}$-smoothness of $f_{1}$ (together with the Cauchy–Schwarz inequality), while the second holds by the convexity and $L_{2}$-smoothness of $f_{2}$. Moreover, by Young's inequality, we have

-\dfrac{(1-\alpha)\alpha}{\gamma\lambda}\langle\Delta x_{2}^{k},\Delta x_{1}^{k}\rangle \geq -\dfrac{(1-\alpha)\alpha}{\gamma\lambda}\left(\dfrac{\|\Delta x_{2}^{k}\|^{2}}{2\epsilon_{1}}+\dfrac{\epsilon_{1}\|\Delta x_{1}^{k}\|^{2}}{2}\right),
\dfrac{\alpha}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{2}^{k}\rangle \geq -\dfrac{\alpha}{\lambda}\|\Delta x_{1}^{k}\|\,\|\Delta g_{2}^{k}\| \geq -\dfrac{\alpha}{\lambda}\left(\dfrac{\|\Delta g_{2}^{k}\|^{2}}{2\epsilon_{2}}+\dfrac{\epsilon_{2}\|\Delta x_{1}^{k}\|^{2}}{2}\right),

where $\epsilon_{1},\epsilon_{2}>0$ are arbitrary. With these, we obtain

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1}) \geq C_{0}\|\Delta g_{1}^{k}\|^{2}+C_{1}\|\Delta g_{2}^{k}\|^{2}+C_{2}\|\Delta x_{1}^{k}\|^{2}+C_{3}\|\Delta x_{2}^{k}\|^{2}, (24)

where the constants $C_{i}$ for $i=0,1,2,3$ are given by

C_{0} = \dfrac{1}{2L_{1}}-\dfrac{\gamma}{\lambda}
C_{1} = \dfrac{1}{2L_{2}}-\dfrac{\gamma}{\lambda}-\dfrac{\alpha}{2\lambda\epsilon_{2}}
C_{2} = \dfrac{\alpha}{\gamma\lambda}-\dfrac{\alpha}{2\gamma}+\dfrac{(\alpha-1)L_{1}}{\lambda}-\dfrac{(1-\alpha)\alpha\epsilon_{1}}{2\gamma\lambda}-\dfrac{\alpha\epsilon_{2}}{2\lambda}
C_{3} = \dfrac{\alpha(1-\alpha)}{\gamma\lambda}-\dfrac{1-\alpha}{2\gamma}-\dfrac{\alpha L_{2}}{\lambda}-\dfrac{(1-\alpha)\alpha}{2\gamma\lambda\epsilon_{1}}.

Choosing $\epsilon_{j}\in I_{j}$ for $j=1,2$, it is not difficult to compute that $C_{0},C_{1}\geq 0$ since $\gamma\leq\min\{\bar{\gamma}_{0},\bar{\gamma}_{1}\}$. In addition, $C_{2},C_{3}>0$ since $\gamma<\min\{\bar{\gamma}_{2},\bar{\gamma}_{3}\}$. Hence, for the given stepsize $\gamma$, we obtain from (24) that

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1})\geq C_{2}\|\Delta x_{1}^{k}\|^{2}+C_{3}\|\Delta x_{2}^{k}\|^{2}. (25)

Meanwhile, from (16), (17) and the $L_{i}$-smoothness of $f_{i}$ for $i=1,2$, it follows that

\|\Delta z_{1}^{k}\|^{2}\leq(1+L_{1}\gamma)^{2}\|\Delta x_{1}^{k}\|^{2}\qquad\text{and}\qquad\|\Delta z_{2}^{k}\|^{2}\leq 2(\alpha+\gamma L_{2})^{2}\|\Delta x_{2}^{k}\|^{2}+2\alpha^{2}\|\Delta x_{1}^{k}\|^{2}.

Defining

C_{4}=\min\{C_{2},C_{3}\},\qquad C_{5}=\max\{2\alpha^{2}+(1+L_{1}\gamma)^{2},\,2(\alpha+L_{2}\gamma)^{2}\},

we obtain from (25) that

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k+1},z_{2}^{k+1}) \geq C_{4}(\|\Delta x_{1}^{k}\|^{2}+\|\Delta x_{2}^{k}\|^{2}) \geq \dfrac{C_{4}}{C_{5}}(\|\Delta z_{1}^{k}\|^{2}+\|\Delta z_{2}^{k}\|^{2}).

The proof concludes by setting $M=C_{4}/C_{5}$. ∎

Theorem 4.1 suggests that our modification of Ryu’s three-operator splitting method behaves like a descent method for the suitably defined RRE. Hence, one could expect this method to converge. In the next section, we formalize this idea.

4.2 Convergence properties of the modified Ryu’s algorithm

The convergence analysis of the method in (3) relies on a particular relationship between the RRE and a Lagrangian associated with a reformulation of problem (1), which we proceed to define. By duplicating variables, problem (1) can be reformulated as follows

\min_{x_{1},x_{2},x_{3}\in\mathbb{R}^{d}} f_{1}(x_{1})+f_{2}(x_{2})+f_{3}(x_{3})\quad\text{ s.t. } x_{1}=x_{3},\; x_{2}=x_{3}.

We define the following augmented Lagrangian associated with this problem reformulation:

\mathcal{L}_{\beta_{1},\beta_{2}}(x_{1},x_{2},x_{3},\mu_{1},\mu_{2})=f_{3}(x_{3})+\sum_{i=1}^{2}\left(f_{i}(x_{i})+\langle\mu_{i},x_{i}-x_{3}\rangle+\dfrac{\beta_{i}}{2}\|x_{i}-x_{3}\|^{2}\right),

where, for $i=1,2$, $\mu_{i}\in\mathbb{R}^{d}$ is a Lagrange multiplier associated with the constraint $x_{i}=x_{3}$. The next result states the aforementioned relationship between the RRE and the augmented Lagrangian.

Lemma 6

Suppose Assumption 1 holds. Let $\gamma\in(0,\frac{1}{L_{1}+L_{2}})$ and $\lambda,\alpha>0$. Denoting

\xi^{k}=\big(x_{1}^{k},x_{2}^{k},x_{3}^{k},\gamma^{-1}(x_{1}^{k}-z_{1}^{k}),\gamma^{-1}(\alpha(x_{2}^{k}-x_{1}^{k})-z_{2}^{k})\big),

then

(\forall k\geq 1)\quad \varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})=\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(\xi^{k}).
Proof

From (6)–(7), $\nabla f_{1}(x_{1}^{k})=\gamma^{-1}(z_{1}^{k}-x_{1}^{k})$ and $\nabla f_{2}(x_{2}^{k})=\gamma^{-1}(z_{2}^{k}-\alpha(x_{2}^{k}-x_{1}^{k}))$. Substituting these identities into (9), and recalling that the minimum in (9) is attained at $y=x_{3}^{k}$, yields the result. ∎

We are now ready to state the second main result of this paper. We follow the approach taken in themelis2020douglas ; Atenas25 .

Theorem 4.2 (Subsequential convergence of nonconvex Ryu’s three-operator splitting)

Suppose that Assumption 1 holds. Let $\lambda\in(0,2)$ and $\alpha\in(\underline{\alpha},1)$, where $\underline{\alpha}\coloneqq\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$. Let $\gamma\in\Gamma$ be such that $\gamma\leq\min\left\{\frac{\alpha}{L_{1}},\frac{1-\alpha}{L_{2}}\right\}$, where $\Gamma$ is given in (23). Then, for any sequence $((x_{1}^{k},x_{2}^{k},x_{3}^{k}),(z_{1}^{k},z_{2}^{k}))_{k}$ generated by algorithm (3), the following hold:

(i) $x_{i}^{k+1}-x_{i}^{k}\to 0$ for $i=1,2,3$, and $z_{i}^{k+1}-z_{i}^{k}\to 0$ and $x_{3}^{k}-x_{i}^{k}\to 0$ for $i=1,2$.

If, in addition, $((x_{1}^{k},x_{2}^{k},x_{3}^{k}),(z_{1}^{k},z_{2}^{k}))_{k}$ is bounded, then

(ii) for any cluster point $x^{*}$ of $(x_{3}^{k})_{k}$, $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})\to\varphi(x^{*})$ and $f_{1}(x_{1}^{k})+f_{2}(x_{2}^{k})+f_{3}(x_{3}^{k})\to\varphi(x^{*})$;

(iii) all cluster points of the sequences $(x_{i}^{k})_{k}$, for $i=1,2,3$, coincide and are critical points of problem (1).

Proof

By Assumption 1(c), $\min\varphi>-\infty$, so Proposition 4 implies that $(\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k}))_{k}$ is bounded below, while Theorem 4.1 shows it is non-increasing; thus it converges to some $\varphi^{\star}_{\gamma}\in\mathbb{R}$. As a consequence, for $i=1,2$, Theorem 4.1 yields $z_{i}^{k+1}-z_{i}^{k}\to 0$, and the $z_{i}^{k}$-update rule in (3) gives $x_{3}^{k}-x_{i}^{k}\to 0$. Since $f_{1}$ is convex, $\operatorname{prox}_{\gamma f_{1}}$ is nonexpansive. In particular,

\|x_{1}^{k+1}-x_{1}^{k}\|\leq\|z_{1}^{k+1}-z_{1}^{k}\|,

so that $x_{1}^{k+1}-x_{1}^{k}\to 0$. Likewise, $\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}$ is nonexpansive, yielding

\|x_{2}^{k+1}-x_{2}^{k}\|\leq\dfrac{1}{\alpha}\|z_{2}^{k+1}-z_{2}^{k}\|+\|x_{1}^{k+1}-x_{1}^{k}\|,

and thus $x_{2}^{k+1}-x_{2}^{k}\to 0$. Moreover, as

x_{3}^{k+1}-x_{3}^{k}=x_{3}^{k+1}-x_{2}^{k+1}+x_{2}^{k+1}-x_{2}^{k}+x_{2}^{k}-x_{3}^{k},

then $x_{3}^{k+1}-x_{3}^{k}\to 0$ as well. This proves (i). Next, observe that $x_{3}^{k}-x_{1}^{k}\to 0$ and $x_{3}^{k}-x_{2}^{k}\to 0$ imply that the sequences $(x_{1}^{k})_{k}$, $(x_{2}^{k})_{k}$ and $(x_{3}^{k})_{k}$ have the same cluster points. Let $x^{*}$ be a cluster point of $(x_{3}^{k})_{k}$ and $(z_{1}^{*},z_{2}^{*})$ be a cluster point of $(z_{1}^{k},z_{2}^{k})_{k}$, and let $x_{i}^{k_{j}}\to x^{*}$ for $i=1,2,3$ and $z_{i}^{k_{j}}\to z_{i}^{*}$ for $i=1,2$ along a common subsequence. Note that from the continuity of the proximal operators, $x^{*}=\operatorname{prox}_{\gamma f_{1}}(z_{1}^{*})$ and $x^{*}=\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}(\frac{z_{2}^{*}}{\alpha}+x^{*})$. Then

\varphi(x^{*}) \leq \liminf_{j}\varphi(x_{3}^{k_{j}}) \leq \limsup_{j}\varphi(x_{3}^{k_{j}}) \leq \limsup_{j}\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k_{j}},z_{2}^{k_{j}}) = \varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{*},z_{2}^{*}) \leq \varphi(x^{*}),

where the first inequality holds since $\varphi$ is lsc, the third by Proposition 3(ii), the equality by Proposition 2, and the last inequality by Proposition 3(i).

Hence, $\varphi^{\star}_{\gamma}=\lim_{k}\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})=\lim_{j}\varphi(x_{3}^{k_{j}})=\varphi(x^{*})$. Furthermore, from Lemma 6, $\lim_{k}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(\xi^{k})=\varphi(x^{*})$. Since $(x_{1}^{k})_{k}$, $(x_{2}^{k})_{k}$, $(z_{1}^{k})_{k}$ and $(z_{2}^{k})_{k}$ are bounded, part (i) implies $\lim_{k}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(\xi^{k})=\lim_{k}\left[f_{1}(x_{1}^{k})+f_{2}(x_{2}^{k})+f_{3}(x_{3}^{k})\right]$, from which (ii) follows. Earlier in the proof we already showed the first part of item (iii). It remains to prove that $x^{*}$ is a critical point. A simple computation shows that for any $g_{3}\in\partial f_{3}(x_{3})$,

\begin{pmatrix}\nabla f_{1}(x_{1})+\mu_{1}+\dfrac{1}{\gamma_{1}}(x_{1}-x_{3})\\ \nabla f_{2}(x_{2})+\mu_{2}+\dfrac{1}{\gamma_{2}}(x_{2}-x_{3})\\ g_{3}-\mu_{1}-\mu_{2}+\dfrac{1}{\gamma_{1}}(x_{3}-x_{1})+\dfrac{1}{\gamma_{2}}(x_{3}-x_{2})\end{pmatrix}\in\hat{\partial}_{(x_{1},x_{2},x_{3})}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(x_{1},x_{2},x_{3},\mu_{1},\mu_{2}).

Hence, taking $x_{i}=x_{i}^{k_{j}}$ for $i=1,2,3$,

\mu_{1}^{k_{j}}=\gamma^{-1}(x_{1}^{k_{j}}-z_{1}^{k_{j}})\quad\text{ and }\quad\mu_{2}^{k_{j}}=\gamma^{-1}(\alpha(x_{2}^{k_{j}}-x_{1}^{k_{j}})-z_{2}^{k_{j}}).

Then, in view of (6), (7), and (12), we get

\begin{pmatrix}\dfrac{1}{\gamma_{1}}(x_{1}^{k_{j}}-x_{3}^{k_{j}})\\ \dfrac{1}{\gamma_{2}}(x_{2}^{k_{j}}-x_{3}^{k_{j}})\\ 0\end{pmatrix}\in\hat{\partial}_{(x_{1},x_{2},x_{3})}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(x_{1}^{k_{j}},x_{2}^{k_{j}},x_{3}^{k_{j}},\mu_{1}^{k_{j}},\mu_{2}^{k_{j}}).

Taking the limit as $j\to\infty$, since $x_{i}^{k_{j}}\to x^{*}$ for $i=1,2,3$ and $z_{i}^{k_{j}}\to z_{i}^{*}$ for $i=1,2$, item (i) then implies

\begin{pmatrix}0\\ 0\\ 0\end{pmatrix}\in\partial_{(x_{1},x_{2},x_{3})}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(x^{*},x^{*},x^{*},\mu_{1}^{*},\mu_{2}^{*}),

where $\mu_{1}^{*}=\gamma^{-1}(x^{*}-z_{1}^{*})$ and $\mu_{2}^{*}=-\gamma^{-1}z_{2}^{*}$. In turn, this inclusion is equivalent to the following conditions:

0 = \nabla f_{1}(x^{*})+\gamma^{-1}(x^{*}-z_{1}^{*}),\qquad 0 = \nabla f_{2}(x^{*})-\gamma^{-1}z_{2}^{*},\qquad 0\in\partial f_{3}(x^{*})-\gamma^{-1}(x^{*}-z_{1}^{*}-z_{2}^{*}).

Adding these relations yields $0\in\nabla f_{1}(x^{*})+\nabla f_{2}(x^{*})+\partial f_{3}(x^{*})=\partial\varphi(x^{*})$, concluding the proof. ∎

Remark 5 (Boundedness of the generated sequences)

Using an argument similar to (themelis2018forward, Theorem 3.4(iii)), we can show that if $\varphi$ has bounded level sets, then $\varphi_{\gamma}^{\texttt{Ryu}}$ also has bounded level sets. Since, for an appropriately chosen stepsize, $(\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k}))_{k}$ is a non-increasing sequence as shown in Theorem 4.1, we have $(z_{1}^{k},z_{2}^{k})\in\{(z_{1},z_{2}):\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\leq\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{0},z_{2}^{0})\}$ for all $k\geq 1$, and thus $(z_{1}^{k},z_{2}^{k})_{k}$ is bounded. Moreover, boundedness of $(x_{1}^{k},x_{2}^{k})_{k}$ follows from the continuity properties of $\operatorname{prox}_{\gamma f_{1}}$ and $\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}$, and boundedness of $(x_{3}^{k})_{k}$ is a consequence of Theorem 4.2(i).

Remark 6 (Global convergence)

In Theorem 4.2, we establish subsequential convergence of the relaxed variant of Ryu’s three-operator splitting method in a specific nonconvex setting. A natural question is whether the method converges globally. This can be affirmatively addressed by adopting the now standard approach based on the Kurdyka–Łojasiewicz (KL) inequality attouch2013convergence , or alternatively, by leveraging the subdifferential-based error bound technique used in Atenas25 within the unifying framework proposed in atenas2023unified . Under the same set of assumptions, one can also obtain local linear convergence rates.

5 Conclusion

We defined a Moreau-type envelope tailored to the algorithmic scheme in (3); the core of the analysis relies on the sufficient decrease property shown in Theorem 4.1. Observe that our analysis does not cover the limiting case $\alpha=1$ (corresponding to the original method proposed by Ryu in (2)), as it would make the stepsize interval in (23) empty. Nevertheless, when $\alpha$ is sufficiently close to 1, we can still guarantee global subsequential convergence for sufficiently small stepsizes. A full treatment of the limiting case $\alpha=1$ requires a more refined analysis, which is the subject of ongoing work by the authors.

Acknowledgments

The authors thank the mathematical research institute MATRIX in Australia, where part of this research was performed. JHA's visit to MATRIX was supported in part by the MATRIX-Simons Travel Grant. The research of FA was supported in part by Australian Research Council grant DP230101749.

References

  • [1] J. H. Alcantara, C.-p. Lee, and A. Takeda. A four-operator splitting algorithm for nonconvex and nonsmooth optimization. arXiv preprint arXiv:2406.16025, 2024.
  • [2] J. H. Alcantara and A. Takeda. Douglas-Rachford algorithm for nonmonotone multioperator inclusion problems. arXiv preprint arXiv:2501.02752, 2025.
  • [3] F. Atenas. Understanding the Douglas–Rachford splitting method through the lenses of Moreau-type envelopes. Computational Optimization and Applications, 90:1–30, 2025.
  • [4] F. Atenas, C. Sagastizábal, P. J. Silva, and M. Solodov. A unified analysis of descent sequences in weakly convex optimization, including convergence rates for bundle methods. SIAM Journal on Optimization, 33(1):89–115, 2023.
  • [5] H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming, 137(1):91–129, 2013.
  • [6] A. Beck. First-Order Methods in Optimization. SIAM - Society for Industrial and Applied Mathematics, Philadelphia, PA, United States, 2017.
  • [7] F. Bian and X. Zhang. A three-operator splitting algorithm for nonconvex sparsity regularization. SIAM Journal on Scientific Computing, 43(4):A2809–A2839, 2021.
  • [8] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
  • [9] J.-F. Cai, S. Osher, and Z. Shen. Split Bregman methods and frame based image restoration. Multiscale Modeling & Simulation, 8(2):337–369, 2010.
  • [10] P. L. Combettes and J.-C. Pesquet. A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1(4):564–574, 2007.
  • [11] P. L. Combettes and J.-C. Pesquet. Proximal Splitting Methods in Signal Processing, pages 185–212. Springer New York, New York, NY, 2011.
  • [12] D. Davis and W. Yin. A three-operator splitting scheme and its optimization applications. Set-Valued and Variational Analysis, 25:829–858, 2017.
  • [13] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society, 82(2):421–439, 1956.
  • [14] R. Glowinski, S. Osher, and W. Yin. Splitting Methods in Communication, Imaging, Science, and Engineering. Scientific Computation. Springer International Publishing, Cham, 2017.
  • [15] P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.
  • [16] Y. Malitsky and M. K. Tam. Resolvent splitting for sums of monotone operators with minimal lifting. Mathematical Programming, 201(1):231–262, 2023.
  • [17] G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979.
  • [18] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften. Springer Verlag Berlin, Berlin, 3rd printing edition, 2009.
  • [19] E. K. Ryu. Uniqueness of DRS as the 2 operator resolvent-splitting and impossibility of 3 operator resolvent-splitting. Mathematical Programming, 182:233–273, 2020.
  • [20] A. Themelis and P. Patrinos. Douglas–Rachford splitting and ADMM for nonconvex optimization: Tight convergence results. SIAM Journal on Optimization, 30(1):149–181, 2020.
  • [21] A. Themelis, L. Stella, and P. Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization, 28(3):2274–2303, 2018.