
Jan Harold Alcantara
Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo City, Tokyo, 103-0027, Japan
email: [email protected]

Felipe Atenas
School of Mathematics and Statistics, The University of Melbourne, 813 Swanston St, Parkville, Melbourne, VIC, 3052, Australia
email: [email protected]

A relaxed version of Ryu’s three-operator splitting method for structured nonconvex optimization

Jan Harold Alcantara and Felipe Atenas
Abstract

In this work, we propose a modification of Ryu’s splitting algorithm for minimizing the sum of three functions, where two of them are convex with Lipschitz continuous gradients, and the third is an arbitrary proper closed function that is not necessarily convex. The modification is essential to facilitate the convergence analysis, particularly in establishing a sufficient descent property for an associated envelope function. This envelope, tailored to the proposed method, is an extension of the well-known Moreau envelope. Notably, the original Ryu splitting algorithm is recovered as a limiting case of our proposal. The results show that the descent property holds as long as the stepsizes remain sufficiently small. Leveraging this result, we prove global subsequential convergence to critical points of the nonconvex objective.

1 Introduction

Modern optimization applications often exhibit a structured form that is well-suited to the application of decomposition techniques. Operator splitting methods have emerged as powerful tools for efficiently solving complex optimization problems. Instances of operator splitting methods include the forward-backward splitting method Passty79 and the Douglas-Rachford splitting method DR ; LM . These algorithmic schemes have been pivotal in the development of techniques for image reconstruction, signal processing, and machine learning combettes2007douglas ; cai2010split ; combettes2011proximal ; boyd2011distributed ; glowinski2017splitting .

The aforementioned methods were originally designed to solve maximal monotone inclusion problems with a two-block structure. In the context of optimization problems, this translates to minimizing the sum of two convex functions. Several generalizations to three-block problems have been proposed in the literature, such as the Davis-Yin operator splitting method davis2017three , Ryu's three-operator splitting method ryu2020uniqueness , and the Malitsky-Tam operator splitting method MT23 (which is also applicable to $n$-block problems). All these methods, in the context of optimization, have provable convergence guarantees for convex optimization problems.

The literature is relatively scarce for methods that solve nonconvex optimization problems involving the sum of three functions. Notable contributions in this direction include the extension of Davis-Yin splitting to nonconvex problems analyzed in bian2021three , the extension of Davis-Yin to four-operator splitting investigated in ALT24 , and an extension of Douglas-Rachford splitting examined in AT25 (which also works for $n$ operators). In this work, we propose a modification of Ryu's splitting method to solve nonconvex three-block optimization problems with a specific structure.

Consider the optimization problem

\min_{x\in\mathbb{R}^{d}}~\varphi(x)\coloneqq f_{1}(x)+f_{2}(x)+f_{3}(x), (1)

where $f_{i}:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ for $i=1,2,3$. Given $z_{1}^{0},z_{2}^{0}\in\mathbb{R}^{d}$, $\gamma>0$ and $\lambda>0$, the (relaxed) Ryu's three-operator splitting method ryu2020uniqueness is given by the following iteration: for $k\geq 1$,

\left\{\begin{array}{rcl}
x_{1}^{k}&=&\operatorname{prox}_{\gamma f_{1}}(z_{1}^{k})\\
x_{2}^{k}&=&\operatorname{prox}_{\gamma f_{2}}(z_{2}^{k}+x_{1}^{k})\\
x_{3}^{k}&\in&\operatorname{prox}_{\gamma f_{3}}\big(x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k}\big)\\
\begin{pmatrix}z_{1}^{k+1}\\ z_{2}^{k+1}\end{pmatrix}&=&\begin{pmatrix}z_{1}^{k}\\ z_{2}^{k}\end{pmatrix}+\lambda\begin{pmatrix}x_{3}^{k}-x_{1}^{k}\\ x_{3}^{k}-x_{2}^{k}\end{pmatrix},
\end{array}\right. (2)

where $\operatorname{prox}_{\gamma f}$ denotes the proximal mapping of $\gamma f$ defined in (4) below.

When the functions $f_{1},f_{2}$ and $f_{3}$ in problem (1) are convex, (ryu2020uniqueness, Theorem 4) states the convergence of the method to minimizers of problem (1). Our objective in this work is to investigate whether the convergence guarantees can be extended to a nonconvex setting.

In our convergence analysis, we adopt the “envelope technique”, a strategy that has been employed in the analysis of several splitting algorithms for nonconvex problems themelis2018forward ; themelis2020douglas ; ALT24 ; AT25 ; Atenas25 . A key challenge, however, lies in the fact that the standard form of Ryu's splitting algorithm is not directly amenable to this type of analysis. To address this, we consider the following relaxed variant of Ryu's method: given $z_{1}^{0},z_{2}^{0}\in\mathbb{R}^{d}$, $\alpha>0$, $\gamma>0$ and $\lambda>0$, we define the iterates for all $k\geq 1$ as follows:

\left\{\begin{array}{rcl}
x_{1}^{k}&=&\operatorname{prox}_{\gamma f_{1}}(z_{1}^{k})\\
x_{2}^{k}&=&\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}\!\left(\frac{z_{2}^{k}}{\alpha}+x_{1}^{k}\right)\\
x_{3}^{k}&\in&\operatorname{prox}_{\gamma f_{3}}\big(x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k}\big)\\
\begin{pmatrix}z_{1}^{k+1}\\ z_{2}^{k+1}\end{pmatrix}&=&\begin{pmatrix}z_{1}^{k}\\ z_{2}^{k}\end{pmatrix}+\lambda\begin{pmatrix}x_{3}^{k}-x_{1}^{k}\\ x_{3}^{k}-x_{2}^{k}\end{pmatrix}.
\end{array}\right. (3)

In particular, we modify the $f_{2}$-proximal step in (2) to obtain (3). When $\alpha=1$, this reduces to Ryu's splitting algorithm.
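For concreteness, the following is a minimal Python sketch of one pass of iteration (3). The proximal oracles `prox_f1`, `prox_f2`, `prox_f3` are hypothetical user-supplied callables of the form `prox(v, t)` returning an element of $\operatorname{prox}_{tf}(v)$; for a nonconvex $f_{3}$ the proximal map may be set-valued, in which case any selection will do.

```python
def relaxed_ryu_step(z1, z2, prox_f1, prox_f2, prox_f3, gamma, alpha, lam):
    """One pass of the relaxed Ryu iteration (3).

    prox_fi(v, t) is assumed to return an element of prox_{t f_i}(v);
    z1, z2 are numpy arrays (or any vectors supporting +, -, *).
    """
    x1 = prox_f1(z1, gamma)                        # x1 = prox_{gamma f1}(z1)
    x2 = prox_f2(z2 / alpha + x1, gamma / alpha)   # x2 = prox_{(gamma/alpha) f2}(z2/alpha + x1)
    x3 = prox_f3(x1 - z1 + x2 - z2, gamma)         # x3 in prox_{gamma f3}(x1 - z1 + x2 - z2)
    z1_next = z1 + lam * (x3 - x1)
    z2_next = z2 + lam * (x3 - x2)
    return x1, x2, x3, z1_next, z2_next
```

Setting `alpha = 1` in this sketch recovers the original iteration (2).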

Throughout this paper, we adopt the following blanket assumptions.

Assumption 1 (Blanket assumption)

Suppose the following conditions are satisfied.

(a) For $i=1,2$, the function $f_{i}:\mathbb{R}^{d}\to\mathbb{R}$ is a convex $L_{i}$-smooth function, that is, $\nabla f_{i}$ is globally Lipschitz continuous with modulus $L_{i}>0$.

(b) The function $f_{3}:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ is proper and lower semicontinuous (lsc for short).

(c) Problem (1) has a nonempty set of solutions.

Remark 1 (On the simplified nonconvex setting)

The convexity in Assumption 1(a) can be relaxed. However, we impose it in our setting to simplify the analysis. Similar results can be obtained when $f_{1}$ and $f_{2}$ are merely $L_{1}$- and $L_{2}$-smooth, respectively, by employing a strategy similar to that in themelis2018forward ; bian2021three .

This paper is organized as follows. In Section 2, we introduce the notation and some variational analysis results we use throughout this paper. In Section 3, we define the merit function that serves as the foundation of our analysis, the relaxed Ryu envelope, and establish key properties relevant to the convergence analysis of the proposed method. Section 4 is dedicated to investigating the convergence properties of the method (3). In particular, we show that the defined envelope satisfies a sufficient decrease condition, and then we exploit this property to prove that all cluster points of the generated sequence are critical points of the objective function of problem (1). Finally, in Section 5 we comment on some ongoing works and future research directions.

2 Preliminaries and notation

Throughout this paper, $\langle\cdot,\cdot\rangle$ denotes an inner product in $\mathbb{R}^{d}$, and $\|\cdot\|$ its induced norm. We shall make use of the following technical result.

Lemma 1

For all $a,b,c\in\mathbb{R}^{d}$, it holds that

\|a-b\|^{2}-\|a-c\|^{2}=-\|b-c\|^{2}+2\langle c-b,a-b\rangle.

A function $\varphi:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ is called proper when its domain, the set $\operatorname{dom}(\varphi)=\{x\in\mathbb{R}^{d}:\varphi(x)<+\infty\}$, is nonempty. We say $\varphi$ is lsc if at any $x\in\operatorname{dom}(\varphi)$, $\varphi(x)\leq\liminf_{y\to x}\varphi(y)$, and it is convex if for all $x,y\in\operatorname{dom}(\varphi)$ and all $\beta\in(0,1)$, $\varphi(\beta x+(1-\beta)y)\leq\beta\varphi(x)+(1-\beta)\varphi(y)$. A function $\Phi:\mathbb{R}^{d}\to\mathbb{R}$ is said to be $L$-smooth if it is differentiable and its gradient is $L$-Lipschitz continuous, that is, for all $x,y\in\mathbb{R}^{d}$, $\|\nabla\Phi(x)-\nabla\Phi(y)\|\leq L\|x-y\|$. In the next result, we recall some properties of $L$-smooth functions that will be important in our analysis (see, for instance, (Beck17, Theorem 5.8)).

Lemma 2

Suppose $f:\mathbb{R}^{d}\to\mathbb{R}$ is $L$-smooth. Then,

(\forall x,y\in\mathbb{R}^{d})\quad |f(x)-f(y)-\langle\nabla f(y),x-y\rangle|\leq\dfrac{L}{2}\|y-x\|^{2}.

If, in addition, $f$ is convex, then for all $x,y\in\mathbb{R}^{d}$,

f(x)\geq f(y)+\langle\nabla f(y),x-y\rangle+\dfrac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}\quad\text{and}\quad\langle\nabla f(x)-\nabla f(y),x-y\rangle\geq\dfrac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}.

For a proper function $\varphi:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$ and $x\in\operatorname{dom}(\varphi)$, we denote by $\hat{\partial}\varphi(x)$ the Fréchet (or regular) subdifferential of $\varphi$ at $x$, defined as

\hat{\partial}\varphi(x)=\left\{v\in\mathbb{R}^{d}:\liminf_{y\to x}\frac{\varphi(y)-\varphi(x)-\langle v,y-x\rangle}{\|y-x\|}\geq 0\right\}.

The limiting (or general) subdifferential of $\varphi$ at $x$, denoted $\partial\varphi(x)$, is defined as

\partial\varphi(x)=\limsup_{y\xrightarrow[\varphi]{}x}\hat{\partial}\varphi(y),

where $y\xrightarrow[\varphi]{}x$ denotes convergence in the attentive sense, that is, $y\to x$ and $\varphi(y)\to\varphi(x)$. When $\varphi$ is smooth, $\hat{\partial}\varphi(x)=\partial\varphi(x)=\{\nabla\varphi(x)\}$. If $\varphi$ is proper, lsc, and convex, then the Fréchet and limiting subdifferentials coincide with the subdifferential of convex analysis, namely,

\{v\in\mathbb{R}^{d}:(\forall y\in\mathbb{R}^{d})\;\varphi(y)\geq\varphi(x)+\langle v,y-x\rangle\}.

Of particular interest are the zeros of the subdifferential operator. We say $\bar{x}\in\mathbb{R}^{d}$ is a critical point of $\varphi$ if $0\in\partial\varphi(\bar{x})$. If $\varphi$ is convex, critical points are exactly the global minimizers of $\varphi$.

Remark 2 (Critical points of the sum)

Under Assumption 1 and for $\varphi$ defined in (1), in view of (Rockafellar_Wets_2009, Exercise 8.8(c)), $\bar{x}$ is a critical point of $\varphi$ if

0\in\nabla f_{1}(\bar{x})+\nabla f_{2}(\bar{x})+\partial f_{3}(\bar{x}).

Given a function $\varphi:\mathbb{R}^{d}\to\mathbb{R}\cup\{+\infty\}$, a parameter $\gamma>0$, and a point $z\in\mathbb{R}^{d}$, the proximal operator of $\gamma\varphi$ at $z$ is defined as

\operatorname{prox}_{\gamma\varphi}(z)\coloneqq\operatorname*{argmin}_{y\in\mathbb{R}^{d}}~\varphi(y)+\frac{1}{2\gamma}\|y-z\|^{2}. (4)

The associated optimal value function is known as the Moreau envelope, defined as follows:

\varphi_{\gamma}^{\texttt{Moreau}}(z)\coloneqq\min_{y\in\mathbb{R}^{d}}~\varphi(y)+\frac{1}{2\gamma}\|y-z\|^{2}.

We say $\varphi$ is prox-bounded if, for some $\gamma>0$, $\varphi(\cdot)+\frac{1}{2\gamma}\|\cdot\|^{2}$ is bounded from below. The supremum $\gamma_{\varphi}>0$ of such parameters $\gamma$ is called the threshold of prox-boundedness. If $\varphi$ is a proper lsc prox-bounded function with threshold $\gamma_{\varphi}>0$, then for any $\gamma\in(0,\gamma_{\varphi})$, $\operatorname{prox}_{\gamma\varphi}$ is nonempty- and compact-valued, and $\varphi_{\gamma}^{\texttt{Moreau}}$ is finite-valued (Rockafellar_Wets_2009, Theorem 1.25). In particular, if $\varphi$ is a proper lsc convex function, then $\gamma_{\varphi}=+\infty$ and $\operatorname{prox}_{\gamma\varphi}$ is a single-valued mapping (Rockafellar_Wets_2009, Theorem 12.12, Theorem 12.17).
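As a concrete illustration not taken from the paper, for $\varphi=\|\cdot\|_{1}$ the proximal operator in (4) is componentwise soft-thresholding and the Moreau envelope is a Huber-type smoothing of the $\ell_{1}$ norm; a minimal sketch:

```python
import numpy as np

def prox_l1(z, gamma):
    """prox_{gamma ||.||_1}(z): componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def moreau_env_l1(z, gamma):
    """Moreau envelope of ||.||_1 with parameter gamma, evaluated at z."""
    y = prox_l1(z, gamma)
    return np.sum(np.abs(y)) + np.sum((y - z) ** 2) / (2.0 * gamma)

z = np.array([1.5, -0.2, 0.7])
print(prox_l1(z, 0.5))        # soft-thresholded vector: [1.0, -0.0, 0.2]
print(moreau_env_l1(z, 0.5))  # finite value, smooth as a function of z
```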

Remark 3 (On the well-definedness of proximal operations)

Under Assumption 1, $\operatorname{prox}_{\gamma f_{1}}$ and $\operatorname{prox}_{\gamma f_{2}}$ are single-valued since $f_{1}$ and $f_{2}$ are proper lsc convex, while $\operatorname{prox}_{\gamma f_{3}}$ is well-defined for $\gamma<\frac{1}{L_{1}+L_{2}}$, as $f_{3}$ is prox-bounded with threshold at least $\frac{1}{L_{1}+L_{2}}$ from (themelis2020douglas, Remark 3.1).

3 An envelope for Ryu’s splitting method

Our convergence analysis for (3) under Assumption 1 builds on the approach in ALT24 ; AT25 ; themelis2020douglas , which utilizes an envelope function, akin to the Moreau envelope, well-suited to the corresponding iterative method. We begin our analysis by motivating such a merit function.

Proposition 1

Suppose that Assumption 1 holds, and let $\alpha>0$ and $(z_{1}^{k},z_{2}^{k})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$. Then $x_{3}^{k}$ defined in (3) solves the following minimization problem

\min_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\sum_{i=1}^{2}\left[f_{i}(x_{i}^{k})+\langle y-x_{i}^{k},\nabla f_{i}(x_{i}^{k})\rangle+\dfrac{1}{2\gamma_{i}}\|y-x_{i}^{k}\|^{2}\right]\right\},

where $\gamma_{1}\coloneqq\frac{\gamma}{\alpha}$ and $\gamma_{2}\coloneqq\frac{\gamma}{1-\alpha}$. (Footnote 1: We adopt the convention that $\frac{c}{0}=\infty$ and $\frac{d}{\infty}=0$ for any $c>0$ and $d\in\mathbb{R}$. Hence, when $\alpha=1$, $\gamma_{2}=\infty$ and the term $\frac{1}{2\gamma_{i}}\|y-x_{i}^{k}\|^{2}$ vanishes when $i=2$.)

Proof

From the $x_{3}$-update in (3), we have

x_{3}^{k}\in\operatorname*{argmin}_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\dfrac{1}{2\gamma}\|y-(x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k})\|^{2}\right\}. (5)

Meanwhile, the first-order optimality conditions of the $x_{1}$-update and the $x_{2}$-update yield, respectively,

z_{1}^{k} = \gamma\nabla f_{1}(x_{1}^{k})+x_{1}^{k}, (6)
z_{2}^{k} = \gamma\nabla f_{2}(x_{2}^{k})+\alpha x_{2}^{k}-\alpha x_{1}^{k}. (7)

Hence,

x_{1}^{k}-z_{1}^{k}+x_{2}^{k}-z_{2}^{k} = x_{1}^{k}-\left(\gamma\nabla f_{1}(x_{1}^{k})+x_{1}^{k}\right)+x_{2}^{k}-\left(\gamma\nabla f_{2}(x_{2}^{k})+\alpha x_{2}^{k}-\alpha x_{1}^{k}\right)
= \alpha x_{1}^{k}-\gamma\nabla f_{1}(x_{1}^{k})+(1-\alpha)x_{2}^{k}-\gamma\nabla f_{2}(x_{2}^{k})
= \alpha\left(x_{1}^{k}-\frac{\gamma}{\alpha}\nabla f_{1}(x_{1}^{k})\right)+(1-\alpha)\left(x_{2}^{k}-\frac{\gamma}{1-\alpha}\nabla f_{2}(x_{2}^{k})\right),

which, combined with (5), yields

x_{3}^{k}\in\operatorname*{argmin}_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\dfrac{1}{2\gamma}\left\|y-\sum_{i=1}^{2}\alpha_{i}\left(x_{i}^{k}-\gamma_{i}\nabla f_{i}(x_{i}^{k})\right)\right\|^{2}\right\} (8)

for $\alpha_{1}=\alpha$ and $\alpha_{2}=1-\alpha$. Following the calculations in (AT25, Theorem 5.5), by expanding the squared norm and dropping and adding terms that are constant in $y$ (such as $\frac{1}{2\gamma}\left\|\sum_{i=1}^{2}\alpha_{i}(x_{i}^{k}-\gamma_{i}\nabla f_{i}(x_{i}^{k}))\right\|^{2}$ and $\sum_{i=1}^{2}f_{i}(x_{i}^{k})$), which does not change the set of minimizers, we get the desired result. ∎

In view of Proposition 1, we define the relaxed Ryu envelope (RRE), for all $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ and with $x_{1},x_{2}$ given in terms of $(z_{1},z_{2})$ by (10) below, as

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\coloneqq\min_{y\in\mathbb{R}^{d}}\left\{f_{3}(y)+\sum_{i=1}^{2}\left[f_{i}(x_{i})+\langle y-x_{i},\nabla f_{i}(x_{i})\rangle+\dfrac{1}{2\gamma_{i}}\|y-x_{i}\|^{2}\right]\right\}. (9)

Meanwhile, the corresponding set-valued iteration operator associated with (3) is given by

T_{\gamma}^{\texttt{Ryu}}:(z_{1},z_{2})\mapsto(z_{1}+\lambda(x_{3}-x_{1}),\,z_{2}+\lambda(x_{3}-x_{2})),

where

\left\{\begin{array}{rcl}
x_{1}&=&\operatorname{prox}_{\gamma f_{1}}(z_{1})\\
x_{2}&=&\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}\!\left(\frac{z_{2}}{\alpha}+x_{1}\right)\\
x_{3}&\in&\operatorname{prox}_{\gamma f_{3}}\big(x_{1}-z_{1}+x_{2}-z_{2}\big).
\end{array}\right. (10)

In view of Remark 3, note that $T_{\gamma}^{\texttt{Ryu}}$ is nonempty- and compact-valued for any $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ provided that $\gamma<\frac{1}{L_{1}+L_{2}}$.
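As an implementation-oriented aside (a sketch under the same hypothetical oracle interface as before, now also assuming function-value oracles `f1`, `f2`, `f3` and gradient oracles `grad_f1`, `grad_f2`), the RRE in (9) can be evaluated at a given $(z_{1},z_{2})$ by computing $(x_{1},x_{2},x_{3})$ from (10) and plugging $y=x_{3}$ into the objective of (9), since by Proposition 1 this $y$ attains the minimum:

```python
def rre_value(z1, z2, oracles, gamma, alpha):
    """Evaluate the relaxed Ryu envelope (9) at (z1, z2).

    `oracles` is assumed to bundle f1, f2, f3 (function values),
    grad_f1, grad_f2 (gradients), and prox_f1, prox_f2, prox_f3
    (proximal maps called as prox(v, t)); points are numpy arrays.
    """
    gamma1, gamma2 = gamma / alpha, gamma / (1.0 - alpha)
    x1 = oracles.prox_f1(z1, gamma)
    x2 = oracles.prox_f2(z2 / alpha + x1, gamma / alpha)
    x3 = oracles.prox_f3(x1 - z1 + x2 - z2, gamma)   # y = x3 attains the min in (9)
    val = oracles.f3(x3)
    for xi, fi, grad_fi, gi in [(x1, oracles.f1, oracles.grad_f1, gamma1),
                                (x2, oracles.f2, oracles.grad_f2, gamma2)]:
        d = x3 - xi
        val += fi(xi) + d @ grad_fi(xi) + (d @ d) / (2.0 * gi)
    return val
```

Monitoring this quantity along the iterates of (3) is one way to observe, numerically, the sufficient decrease property established in Theorem 4.1 below.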

Remark 4 (Relationship between the Moreau envelope and the RRE)

Observe that from (8), by expanding and completing the square, we have

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})=(f_{3})_{\gamma}^{\texttt{Moreau}}\left(\sum_{i=1}^{2}\alpha_{i}(x_{i}-\gamma_{i}\nabla f_{i}(x_{i}))\right)-\frac{1}{2\gamma}\left\|\sum_{i=1}^{2}\alpha_{i}(x_{i}-\gamma_{i}\nabla f_{i}(x_{i}))\right\|^{2}+\sum_{i=1}^{2}\left[f_{i}(x_{i})-\langle x_{i},\nabla f_{i}(x_{i})\rangle+\frac{1}{2\gamma_{i}}\|x_{i}\|^{2}\right].

We now establish some properties of the envelope function. The following result states that the RRE inherits continuity properties of the Moreau envelope (cf. (themelis2018forward, Proposition 4.2) and (themelis2020douglas, Proposition 3.2)).

Proposition 2 (Continuity of RRE)

The RRE is a real-valued and locally Lipschitz continuous function.

Proof

Since $\operatorname{prox}_{\gamma f_{1}}$ and $\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}$ are nonexpansive (Beck17, Theorem 6.42(b)), the maps $z_{1}\mapsto x_{1}$ and $(z_{1},z_{2})\mapsto x_{2}$ defined in (10) are (globally) Lipschitz continuous. Furthermore, from Assumption 1, $\nabla f_{i}$ is (globally) Lipschitz continuous for $i=1,2$. Then, as the Moreau envelope is locally Lipschitz continuous (Rockafellar_Wets_2009, Example 10.32), the conclusion follows from Remark 4. ∎

Next, we show some sandwich-type bounds relating the RRE and the original objective function in problem (1) (cf. (themelis2018forward, Proposition 4.3) and (themelis2020douglas, Proposition 3.3)).

Proposition 3

Suppose Assumption 1 holds, $\gamma\in(0,\frac{1}{L_{1}+L_{2}})$, and $\lambda,\alpha>0$. Given $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$, consider $(x_{1},x_{2},x_{3})\in\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{R}^{d}$ defined by (10). Then the following hold:

(i) For $\gamma_{1}=\frac{\gamma}{\alpha}$ and $\gamma_{2}=\frac{\gamma}{1-\alpha}$,

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\leq\min\left\{\varphi(x_{1})+\frac{1}{2}\left(L_{2}+\frac{1}{\gamma_{2}}\right)\|x_{1}-x_{2}\|^{2},\;\varphi(x_{2})+\frac{1}{2}\left(L_{1}+\frac{1}{\gamma_{1}}\right)\|x_{1}-x_{2}\|^{2}\right\}.

(ii) $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\geq\varphi(x_{3})+\frac{\alpha-\gamma L_{1}}{2\gamma}\|x_{3}-x_{1}\|^{2}+\frac{(1-\alpha)-\gamma L_{2}}{2\gamma}\|x_{3}-x_{2}\|^{2}$.

(iii) If $\alpha\in(0,1)$ and $\gamma\leq\min\left\{\frac{\alpha}{L_{1}},\frac{1-\alpha}{L_{2}}\right\}$, then $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\geq\varphi(x_{3})$.

Proof

Take $y=x_{1}$ in (9) and apply Lemma 2 to $f=f_{2}$ to obtain

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}) \leq f_{3}(x_{1})+f_{1}(x_{1})+f_{2}(x_{2})+\langle\nabla f_{2}(x_{2}),x_{1}-x_{2}\rangle+\frac{1}{2\gamma_{2}}\|x_{1}-x_{2}\|^{2} \leq \varphi(x_{1})+\dfrac{1}{2}\left(L_{2}+\frac{1}{\gamma_{2}}\right)\|x_{1}-x_{2}\|^{2}.

Similarly, taking $y=x_{2}$ in (9) and applying Lemma 2 to $f=f_{1}$ yields

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}) \leq \varphi(x_{2})+\dfrac{1}{2}\left(L_{1}+\frac{1}{\gamma_{1}}\right)\|x_{2}-x_{1}\|^{2}.

From these, we immediately get (i). Moreover, since $y=x_{3}$ attains the minimum in (9),

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}) = f_{3}(x_{3})+\sum_{i=1}^{2}\left[f_{i}(x_{i})+\langle\nabla f_{i}(x_{i}),x_{3}-x_{i}\rangle+\dfrac{1}{2\gamma_{i}}\|x_{3}-x_{i}\|^{2}\right]
\geq f_{3}(x_{3})+\sum_{i=1}^{2}\left[f_{i}(x_{3})-\dfrac{L_{i}}{2}\|x_{3}-x_{i}\|^{2}+\dfrac{1}{2\gamma_{i}}\|x_{3}-x_{i}\|^{2}\right]
= \varphi(x_{3})+\frac{\alpha-\gamma L_{1}}{2\gamma}\|x_{3}-x_{1}\|^{2}+\frac{(1-\alpha)-\gamma L_{2}}{2\gamma}\|x_{3}-x_{2}\|^{2},

where the inequality holds by Lemma 2 and the last equality holds by plugging in $\gamma_{1}=\frac{\gamma}{\alpha}$ and $\gamma_{2}=\frac{\gamma}{1-\alpha}$. This completes the proof of (ii). Finally, part (iii) immediately follows from (ii). ∎

Note that the iteration in (3) is designed to find a fixed point of the relaxed Ryu splitting operator $T_{\gamma}^{\texttt{Ryu}}$, that is, a point in the set

\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}\coloneqq\left\{(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}:(z_{1},z_{2})\in T_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\right\}.

In the next proposition, we establish a connection between such fixed points and the notion of criticality, as commonly used in optimization.

Proposition 4

Suppose Assumption 1 holds, and let $\gamma\in(0,\frac{1}{L_{1}+L_{2}})$ and $\alpha,\lambda>0$. Then, $(\bar{z}_{1},\bar{z}_{2})\in\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}$ if and only if $\bar{x}_{1}=\bar{x}_{2}=\bar{x}_{3}\eqqcolon\bar{x}$, where

\left\{\begin{array}{rcl}
\bar{x}_{1}&=&\operatorname{prox}_{\gamma f_{1}}(\bar{z}_{1})\\
\bar{x}_{2}&=&\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}\!\left(\frac{\bar{z}_{2}}{\alpha}+\bar{x}_{1}\right)\\
\bar{x}_{3}&\in&\operatorname{prox}_{\gamma f_{3}}\big(\bar{x}_{1}-\bar{z}_{1}+\bar{x}_{2}-\bar{z}_{2}\big).
\end{array}\right. (11)

Furthermore, such $\bar{x}$ is a critical point of (1), and $\varphi(\bar{x})=\varphi_{\gamma}^{\texttt{Ryu}}(\bar{z}_{1},\bar{z}_{2})$. In particular, if $\alpha\in(0,1)$ and $\gamma\leq\min\left\{\frac{\alpha}{L_{1}},\frac{1-\alpha}{L_{2}}\right\}$, then

\min\varphi=\min\varphi_{\gamma}^{\texttt{Ryu}}.
Proof

It is straightforward from the definition of $T_{\gamma}^{\texttt{Ryu}}$ that $(\bar{z}_{1},\bar{z}_{2})\in\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}$ is equivalent to having $\bar{x}\coloneqq\bar{x}_{1}=\bar{x}_{2}=\bar{x}_{3}$ satisfying the conditions in (11). Hence, it suffices to prove that such $\bar{x}$ is, in this case, always a critical point of $\varphi$. Evaluating the first-order optimality conditions of the $x_{3}$-step, namely,

0\in\gamma\partial f_{3}(x_{3})+x_{3}-x_{1}+z_{1}-x_{2}+z_{2} (12)

at $(x_{1},x_{2},x_{3})=(\bar{x},\bar{x},\bar{x})$ and $z_{i}=\bar{z}_{i}$ for $i=1,2$, yields

0\in\gamma\partial f_{3}(\bar{x})+\bar{z}_{1}-\bar{x}+\bar{z}_{2}.

Adding this inclusion with (6) and (7) using the substitutions $x_{1}^{k}\leftarrow\bar{x}$ and $x_{2}^{k}\leftarrow\bar{x}$, we obtain

0\in\gamma\big(\nabla f_{1}(\bar{x})+\nabla f_{2}(\bar{x})+\partial f_{3}(\bar{x})\big),

that is, $0\in\partial\varphi(\bar{x})$ by Remark 2.

Furthermore, given $(\bar{z}_{1},\bar{z}_{2})\in\operatorname{Fix}T_{\gamma}^{\texttt{Ryu}}$, Proposition 3(i)&(ii) imply that $\varphi(\bar{x})=\varphi_{\gamma}^{\texttt{Ryu}}(\bar{z}_{1},\bar{z}_{2})$. Finally, in view of Proposition 3(iii), for any $(z_{1},z_{2})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ and $\bar{x}\in\operatorname*{argmin}\varphi$,

\varphi_{\gamma}^{\texttt{Ryu}}(\bar{z}_{1},\bar{z}_{2})=\varphi(\bar{x})\leq\varphi(x_{3})\leq\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2}).

Hence, $(\bar{z}_{1},\bar{z}_{2})\in\operatorname*{argmin}\varphi_{\gamma}^{\texttt{Ryu}}$ and $\min\varphi=\min\varphi_{\gamma}^{\texttt{Ryu}}$. ∎

At this point, we have built the necessary tools regarding the RRE for the analysis of convergence of the proposed relaxed Ryu splitting method. In the next section, we show subsequential convergence of the method under standard assumptions in the literature.

4 Convergence of modified Ryu’s three-operator splitting method via envelopes

To establish the (subsequential) convergence of the iterative method in (3), we follow the approach introduced in themelis2020douglas for the Douglas–Rachford splitting method. Specifically, the core argument relies on a sufficient decrease property satisfied by the RRE.

4.1 Sufficient decrease property for RRE

We first prove three technical lemmas that we will use in the main result of this section. We use the following notation: for $i=1,2$,

\Delta x_{i}^{k}=x_{i}^{k+1}-x_{i}^{k},\qquad \Delta g_{i}^{k}=\nabla f_{i}(x_{i}^{k+1})-\nabla f_{i}(x_{i}^{k}),\qquad \Delta z_{i}^{k}=z_{i}^{k+1}-z_{i}^{k}. (13)
Lemma 3

Under Assumption 1, the sequences $(x_{i}^{k})_{k}$ for $i=1,2,3$, and $(z_{i}^{k})_{k}$ for $i=1,2$, generated by (3) satisfy:

\|x_{3}^{k}-x_{1}^{k}\|^{2}-\|x_{3}^{k}-x_{1}^{k+1}\|^{2}=\left(\dfrac{2}{\lambda}-1\right)\|\Delta x_{1}^{k}\|^{2}+\dfrac{2\gamma}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{1}^{k}\rangle (14)

and

\|x_{3}^{k}-x_{2}^{k}\|^{2}-\|x_{3}^{k}-x_{2}^{k+1}\|^{2}=\left(\dfrac{2\alpha}{\lambda}-1\right)\|\Delta x_{2}^{k}\|^{2}+\dfrac{2\gamma}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle-\dfrac{2\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta x_{1}^{k}\rangle. (15)
Proof

Using the notation in (13), we have from (6) and (7) that

\Delta z_{1}^{k} = \gamma\Delta g_{1}^{k}+\Delta x_{1}^{k}, (16)
\Delta z_{2}^{k} = \gamma\Delta g_{2}^{k}+\alpha\Delta x_{2}^{k}-\alpha\Delta x_{1}^{k}. (17)

On the other hand, Lemma 1 and the $(z_{1},z_{2})$-update in (3) yield

\|x_{3}^{k}-x_{i}^{k}\|^{2}-\|x_{3}^{k}-x_{i}^{k+1}\|^{2}=-\|\Delta x_{i}^{k}\|^{2}+2\langle\Delta x_{i}^{k},x_{3}^{k}-x_{i}^{k}\rangle=-\|\Delta x_{i}^{k}\|^{2}+\dfrac{2}{\lambda}\langle\Delta x_{i}^{k},\Delta z_{i}^{k}\rangle.

Together with (16) and (17), we get (14) and (15), respectively. ∎

Lemma 4

Under Assumption 1, the sequences $(x_{i}^{k})_{k}$ for $i=1,2,3$, and $(z_{i}^{k})_{k}$ for $i=1,2$, generated by (3) satisfy:

\sum_{i=1}^{2}\left[f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})-\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle+\langle\nabla f_{i}(x_{i}^{k}),x_{3}^{k}-x_{i}^{k}\rangle\right]
\geq\sum_{i=1}^{2}\left(\dfrac{1}{2L_{i}}-\dfrac{\gamma}{\lambda}\right)\|\Delta g_{i}^{k}\|^{2}-\dfrac{1}{\lambda}\langle\Delta g_{1}^{k},\Delta x_{1}^{k}\rangle-\dfrac{\alpha}{\lambda}\langle\Delta g_{2}^{k},\Delta x_{2}^{k}\rangle+\dfrac{\alpha}{\lambda}\langle\Delta g_{2}^{k},\Delta x_{1}^{k}\rangle.
Proof

For $i=1,2$, we have from Lemma 2 and the $(z_{1},z_{2})$-update rule in (3) that

f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})-\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle+\langle\nabla f_{i}(x_{i}^{k}),x_{3}^{k}-x_{i}^{k}\rangle
= f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})-\langle\nabla f_{i}(x_{i}^{k+1}),x_{i}^{k}-x_{i}^{k+1}\rangle-\langle\Delta g_{i}^{k},x_{3}^{k}-x_{i}^{k}\rangle
\geq \dfrac{1}{2L_{i}}\|\Delta g_{i}^{k}\|^{2}-\langle\Delta g_{i}^{k},x_{3}^{k}-x_{i}^{k}\rangle
= \dfrac{1}{2L_{i}}\|\Delta g_{i}^{k}\|^{2}-\dfrac{1}{\lambda}\langle\Delta g_{i}^{k},\Delta z_{i}^{k}\rangle. (18)

Meanwhile, we have from (16) that

\langle\Delta g_{1}^{k},\Delta z_{1}^{k}\rangle=\gamma\|\Delta g_{1}^{k}\|^{2}+\langle\Delta g_{1}^{k},\Delta x_{1}^{k}\rangle. (19)

On the other hand, (17) yields

\langle\Delta g_{2}^{k},\Delta z_{2}^{k}\rangle=\gamma\|\Delta g_{2}^{k}\|^{2}+\alpha\langle\Delta g_{2}^{k},\Delta x_{2}^{k}\rangle-\alpha\langle\Delta g_{2}^{k},\Delta x_{1}^{k}\rangle. (20)

Combining (18), (19) and (20) gives the desired inequality. ∎

Lemma 5

Let $\lambda\in(0,2)$ and let $\underline{\alpha}\coloneqq\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$. Then the following hold:

(i) $\underline{\alpha}\in\left(\frac{\lambda}{2},1\right)$.

(ii) The interval $\left(\frac{\alpha}{2\alpha-\lambda},\frac{2-\lambda}{1-\alpha}\right)$ is nonempty for any $\alpha\in(\underline{\alpha},1)$.

(iii) For $\epsilon_{1},\epsilon_{2}>0$ and $\alpha\in(\underline{\alpha},1)$, define the constants

\bar{\gamma}_{1}\coloneqq\dfrac{\lambda}{2L_{2}}-\dfrac{\alpha}{2\epsilon_{2}},\qquad \bar{\gamma}_{2}\coloneqq\dfrac{\alpha(2-\lambda-(1-\alpha)\epsilon_{1})}{\alpha\epsilon_{2}+2(1-\alpha)L_{1}},\qquad \bar{\gamma}_{3}\coloneqq\dfrac{(1-\alpha)(\epsilon_{1}(2\alpha-\lambda)-\alpha)}{2\alpha L_{2}\epsilon_{1}}, (21)

and the intervals

I_{1}\coloneqq\left(\frac{\alpha}{2\alpha-\lambda},\frac{2-\lambda}{1-\alpha}\right)\quad\text{and}\quad I_{2}\coloneqq\left(\frac{\alpha L_{2}}{\lambda},+\infty\right). (22)

If $\epsilon_{j}\in I_{j}$ for $j=1,2$, then $\bar{\gamma}_{i}$ is strictly positive for $i=1,2,3$.

Proof

These results follow from straightforward calculations. ∎
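For the reader's convenience, here is a sketch of the computation behind item (ii); by item (i), $\alpha>\underline{\alpha}>\frac{\lambda}{2}$, so $2\alpha-\lambda>0$ and both endpoints of the interval are positive. The interval is nonempty if and only if

\frac{\alpha}{2\alpha-\lambda}<\frac{2-\lambda}{1-\alpha}
\iff \alpha(1-\alpha)<(2-\lambda)(2\alpha-\lambda)
\iff \alpha^{2}+(3-2\lambda)\alpha+\lambda^{2}-2\lambda>0,

and the larger root of the quadratic in $\alpha$ on the right-hand side is precisely $\underline{\alpha}=\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$, so the last inequality holds for every $\alpha\in(\underline{\alpha},1)$.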

With these lemmas in place, we are now ready to present the first main result of this paper. We establish the sufficient descent property of the RRE, provided that the stepsize is chosen sufficiently small. In particular, we restrict the stepsize $\gamma$ to the interval

\Gamma\coloneqq\left(0,\min\{\bar{\gamma}_{0},\bar{\gamma}_{1}\}\right]\cap\left(0,\min\left\{\bar{\gamma}_{2},\bar{\gamma}_{3},\frac{1}{L_{1}+L_{2}}\right\}\right), (23)

where $\bar{\gamma}_{0}\coloneqq\frac{\lambda}{2L_{1}}$ and $\bar{\gamma}_{i}$ is given by (21) for $i=1,2,3$, with $\epsilon_{j}$ taken from $I_{j}$ given in (22) for $j=1,2$.
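To make the stepsize restriction concrete, the following Python sketch (an illustrative helper, not part of the paper's algorithm) computes $\bar{\gamma}_{0},\dots,\bar{\gamma}_{3}$ and the resulting upper end of $\Gamma$ from (21)–(23), with default choices of $\epsilon_{1}\in I_{1}$ and $\epsilon_{2}\in I_{2}$:

```python
import math

def stepsize_bounds(L1, L2, lam, alpha, eps1=None, eps2=None):
    """Compute gamma_bar_0..3 from (21) and the upper end of Gamma in (23).

    eps1 is assumed to lie in I1 = (alpha/(2*alpha - lam), (2 - lam)/(1 - alpha))
    and eps2 in I2 = (alpha*L2/lam, +inf); the defaults pick one such value.
    """
    alpha_lower = (2 * lam - 3 + math.sqrt(9 - 4 * lam)) / 2
    assert 0 < lam < 2 and alpha_lower < alpha < 1
    if eps1 is None:
        eps1 = 0.5 * (alpha / (2 * alpha - lam) + (2 - lam) / (1 - alpha))  # midpoint of I1
    if eps2 is None:
        eps2 = 2 * alpha * L2 / lam                                         # a point of I2
    g0 = lam / (2 * L1)
    g1 = lam / (2 * L2) - alpha / (2 * eps2)
    g2 = alpha * (2 - lam - (1 - alpha) * eps1) / (alpha * eps2 + 2 * (1 - alpha) * L1)
    g3 = (1 - alpha) * (eps1 * (2 * alpha - lam) - alpha) / (2 * alpha * L2 * eps1)
    # gamma must satisfy gamma <= min(g0, g1) and gamma < min(g2, g3, 1/(L1 + L2))
    return (g0, g1, g2, g3), min(g0, g1, g2, g3, 1.0 / (L1 + L2))
```

For instance, $L_{1}=L_{2}=1$, $\lambda=1$ and $\alpha=0.8$ give a small but positive admissible $\gamma$ with the default choices above.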

Theorem 4.1 (RRE sufficient descent)

Suppose that Assumption 1 holds. Let $\lambda\in(0,2)$ and $\alpha\in(\underline{\alpha},1)$, where $\underline{\alpha}\coloneqq\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$. Let $\gamma\in\Gamma$, where $\Gamma$ is given in (23). For the sequences $(z_{i}^{k})_{k}$, $i=1,2$, generated by (3), there exists $M=M(\gamma)>0$ such that for all $k\geq 1$,

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})\geq\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k+1},z_{2}^{k+1})+M\left(\|z_{1}^{k+1}-z_{1}^{k}\|^{2}+\|z_{2}^{k+1}-z_{2}^{k}\|^{2}\right).
Proof

From the definition of the relaxed Ryu envelope, we have

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k+1},z_{2}^{k+1})\leq f_{3}(x_{3}^{k})+\sum_{i=1}^{2}\left(f_{i}(x_{i}^{k+1})+\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle+\dfrac{1}{2\gamma_{i}}\|x_{3}^{k}-x_{i}^{k+1}\|^{2}\right).

Thus, since $y=x_{3}^{k}$ attains the minimum in the definition of $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})$, we have

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1})
\geq \sum_{i=1}^{2}\left(f_{i}(x_{i}^{k})-f_{i}(x_{i}^{k+1})+\langle\nabla f_{i}(x_{i}^{k}),x_{3}^{k}-x_{i}^{k}\rangle-\langle\nabla f_{i}(x_{i}^{k+1}),x_{3}^{k}-x_{i}^{k+1}\rangle\right)
+\sum_{i=1}^{2}\dfrac{1}{2\gamma_{i}}\left(\|x_{3}^{k}-x_{i}^{k}\|^{2}-\|x_{3}^{k}-x_{i}^{k+1}\|^{2}\right).

Using Lemmas 3 and 4, we obtain

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1})
\geq \left(\dfrac{1}{2L_{1}}-\dfrac{\gamma}{\lambda}\right)\|\Delta g_{1}^{k}\|^{2}+\left(\dfrac{1}{2L_{2}}-\dfrac{\gamma}{\lambda}\right)\|\Delta g_{2}^{k}\|^{2}
+\left(\dfrac{\alpha}{\gamma\lambda}-\dfrac{\alpha}{2\gamma}\right)\|\Delta x_{1}^{k}\|^{2}+\left(\dfrac{\alpha(1-\alpha)}{\gamma\lambda}-\dfrac{1-\alpha}{2\gamma}\right)\|\Delta x_{2}^{k}\|^{2}
+\dfrac{\alpha-1}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{1}^{k}\rangle+\dfrac{1-2\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle
-\dfrac{(1-\alpha)\alpha}{\gamma\lambda}\langle\Delta x_{2}^{k},\Delta x_{1}^{k}\rangle+\dfrac{\alpha}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{2}^{k}\rangle.

Since $\alpha<1$, we have

\dfrac{\alpha-1}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{1}^{k}\rangle \geq \dfrac{(\alpha-1)L_{1}}{\lambda}\|\Delta x_{1}^{k}\|^{2},
\dfrac{1-2\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle = \dfrac{1-\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle-\dfrac{\alpha}{\lambda}\langle\Delta x_{2}^{k},\Delta g_{2}^{k}\rangle \geq 0-\dfrac{\alpha L_{2}}{\lambda}\|\Delta x_{2}^{k}\|^{2},

where the first inequality holds by the $L_{1}$-smoothness of $f_{1}$ (together with the Cauchy–Schwarz inequality), while the second holds by the convexity and $L_{2}$-smoothness of $f_{2}$. Moreover, by Young's inequality, we have

-\dfrac{(1-\alpha)\alpha}{\gamma\lambda}\langle\Delta x_{2}^{k},\Delta x_{1}^{k}\rangle \geq -\dfrac{(1-\alpha)\alpha}{\gamma\lambda}\left(\dfrac{\|\Delta x_{2}^{k}\|^{2}}{2\epsilon_{1}}+\dfrac{\epsilon_{1}\|\Delta x_{1}^{k}\|^{2}}{2}\right),
\dfrac{\alpha}{\lambda}\langle\Delta x_{1}^{k},\Delta g_{2}^{k}\rangle \geq -\dfrac{\alpha}{\lambda}\|\Delta x_{1}^{k}\|\,\|\Delta g_{2}^{k}\| \geq -\dfrac{\alpha}{\lambda}\left(\dfrac{\|\Delta g_{2}^{k}\|^{2}}{2\epsilon_{2}}+\dfrac{\epsilon_{2}\|\Delta x_{1}^{k}\|^{2}}{2}\right),

where $\epsilon_{1},\epsilon_{2}>0$ are arbitrary. With these, we obtain

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1}) \geq C_{0}\|\Delta g_{1}^{k}\|^{2}+C_{1}\|\Delta g_{2}^{k}\|^{2}+C_{2}\|\Delta x_{1}^{k}\|^{2}+C_{3}\|\Delta x_{2}^{k}\|^{2}, (24)

where the constants $C_{i}$ for $i=0,1,2,3$ are given by

C_{0} = \dfrac{1}{2L_{1}}-\dfrac{\gamma}{\lambda}
C_{1} = \dfrac{1}{2L_{2}}-\dfrac{\gamma}{\lambda}-\dfrac{\alpha}{2\lambda\epsilon_{2}}
C_{2} = \dfrac{\alpha}{\gamma\lambda}-\dfrac{\alpha}{2\gamma}+\dfrac{(\alpha-1)L_{1}}{\lambda}-\dfrac{(1-\alpha)\alpha\epsilon_{1}}{2\gamma\lambda}-\dfrac{\alpha\epsilon_{2}}{2\lambda}
C_{3} = \dfrac{\alpha(1-\alpha)}{\gamma\lambda}-\dfrac{1-\alpha}{2\gamma}-\dfrac{\alpha L_{2}}{\lambda}-\dfrac{(1-\alpha)\alpha}{2\gamma\lambda\epsilon_{1}}.

Choosing $\epsilon_{j}\in I_{j}$ for $j=1,2$, it is not difficult to compute that $C_{0},C_{1}\geq 0$ since $\gamma\leq\min\{\bar{\gamma}_{0},\bar{\gamma}_{1}\}$. In addition, $C_{2},C_{3}>0$ since $\gamma<\min\{\bar{\gamma}_{2},\bar{\gamma}_{3}\}$. Hence, for the given stepsize $\gamma$, we obtain from (24) that

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi^{\texttt{Ryu}}_{\gamma}(z_{1}^{k+1},z_{2}^{k+1})\geq C_{2}\|\Delta x_{1}^{k}\|^{2}+C_{3}\|\Delta x_{2}^{k}\|^{2}. (25)

Meanwhile, from (16), (17) and the $L_{i}$-smoothness of $f_{i}$ for $i=1,2$, it follows that

\|\Delta z_{1}^{k}\|^{2}\leq(1+L_{1}\gamma)^{2}\|\Delta x_{1}^{k}\|^{2}\qquad\text{and}\qquad\|\Delta z_{2}^{k}\|^{2}\leq 2(\alpha+\gamma L_{2})^{2}\|\Delta x_{2}^{k}\|^{2}+2\alpha^{2}\|\Delta x_{1}^{k}\|^{2}.

Defining

C_{4}=\min\{C_{2},C_{3}\},\qquad C_{5}=\max\{2\alpha^{2}+(1+L_{1}\gamma)^{2},\,2(\alpha+L_{2}\gamma)^{2}\},

we obtain from (25) that

\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})-\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k+1},z_{2}^{k+1}) \geq C_{4}(\|\Delta x_{1}^{k}\|^{2}+\|\Delta x_{2}^{k}\|^{2}) \geq \dfrac{C_{4}}{C_{5}}(\|\Delta z_{1}^{k}\|^{2}+\|\Delta z_{2}^{k}\|^{2}).

The proof concludes by setting $M=C_{4}/C_{5}$. ∎

Theorem 4.1 suggests that our modification of Ryu’s three-operator splitting method behaves like a descent method for the suitably defined RRE. Hence, one could expect this method to converge. In the next section, we formalize this idea.

4.2 Convergence properties of the modified Ryu’s algorithm

The convergence analysis of the method in (3) relies on a particular relationship between the RRE and a Lagrangian associated with a reformulation of problem (1), which we proceed to define. By duplicating variables, problem (1) can be reformulated as follows

\min_{x_{1},x_{2},x_{3}\in\mathbb{R}^{d}} f_{1}(x_{1})+f_{2}(x_{2})+f_{3}(x_{3})\quad\text{ s.t. } x_{1}=x_{3},\; x_{2}=x_{3}.

We define the following augmented Lagrangian associated with this problem reformulation:

\mathcal{L}_{\beta_{1},\beta_{2}}(x_{1},x_{2},x_{3},\mu_{1},\mu_{2})=f_{3}(x_{3})+\sum_{i=1}^{2}\left(f_{i}(x_{i})+\langle\mu_{i},x_{i}-x_{3}\rangle+\dfrac{\beta_{i}}{2}\|x_{i}-x_{3}\|^{2}\right),

where, for $i=1,2$, $\mu_{i}\in\mathbb{R}^{d}$ is a Lagrange multiplier associated with the constraint $x_{i}=x_{3}$. The next result states the aforementioned relationship between the RRE and the augmented Lagrangian.

Lemma 6

Suppose Assumption 1 holds. Let $\gamma\in(0,\frac{1}{L_{1}+L_{2}})$ and $\lambda,\alpha>0$. Denoting

\xi^{k}=\big(x_{1}^{k},x_{2}^{k},x_{3}^{k},\gamma^{-1}(x_{1}^{k}-z_{1}^{k}),\gamma^{-1}(\alpha(x_{2}^{k}-x_{1}^{k})-z_{2}^{k})\big),

then

(\forall k\geq 1)\quad \varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})=\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(\xi^{k}).
Proof

From (6)–(7), $\nabla f_{1}(x_{1}^{k})=\gamma^{-1}(z_{1}^{k}-x_{1}^{k})$ and $\nabla f_{2}(x_{2}^{k})=\gamma^{-1}(z_{2}^{k}-\alpha(x_{2}^{k}-x_{1}^{k}))$. Substituting these identities into (9), and recalling that the minimum in (9) is attained at $y=x_{3}^{k}$, yields the result. ∎

We are now ready to state the second main result of this paper. We follow the approach taken in themelis2020douglas ; Atenas25 .

Theorem 4.2 (Subsequential convergence of nonconvex Ryu’s three-operator splitting)

Suppose that Assumption 1 holds. Let $\lambda\in(0,2)$ and $\alpha\in(\underline{\alpha},1)$, where $\underline{\alpha}\coloneqq\frac{2\lambda-3+\sqrt{9-4\lambda}}{2}$. Let $\gamma\in\Gamma$ be such that $\gamma\leq\min\left\{\frac{\alpha}{L_{1}},\frac{1-\alpha}{L_{2}}\right\}$, where $\Gamma$ is given in (23). Then, for any sequence $((x_{1}^{k},x_{2}^{k},x_{3}^{k}),(z_{1}^{k},z_{2}^{k}))_{k}$ generated by algorithm (3), the following hold:

(i) $x_{i}^{k+1}-x_{i}^{k}\to 0$ for $i=1,2,3$, and $z_{i}^{k+1}-z_{i}^{k}\to 0$ and $x_{3}^{k}-x_{i}^{k}\to 0$ for $i=1,2$.

If, in addition, $((x_{1}^{k},x_{2}^{k},x_{3}^{k}),(z_{1}^{k},z_{2}^{k}))_{k}$ is bounded, then

(ii) for any cluster point $x^{*}$ of $(x_{3}^{k})_{k}$, $\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})\to\varphi(x^{*})$ and $f_{1}(x_{1}^{k})+f_{2}(x_{2}^{k})+f_{3}(x_{3}^{k})\to\varphi(x^{*})$;

(iii) all cluster points of the sequences $(x_{i}^{k})_{k}$, for $i=1,2,3$, coincide and are critical points of problem (1).

Proof

By Assumption 1(c), $\min\varphi>-\infty$, so Proposition 4 implies that $(\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k}))_{k}$ is bounded below, while Theorem 4.1 shows it is non-increasing; thus it converges to some $\varphi^{\star}_{\gamma}\in\mathbb{R}$. As a consequence, for $i=1,2$, Theorem 4.1 yields $z_{i}^{k+1}-z_{i}^{k}\to 0$, and the $z_{i}^{k}$-update rule in (3) gives $x_{3}^{k}-x_{i}^{k}\to 0$. Since $f_{1}$ is convex, $\operatorname{prox}_{\gamma f_{1}}$ is nonexpansive. In particular,

\|x_{1}^{k+1}-x_{1}^{k}\|\leq\|z_{1}^{k+1}-z_{1}^{k}\|,

so that $x_{1}^{k+1}-x_{1}^{k}\to 0$. Likewise, $\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}$ is nonexpansive, yielding

\|x_{2}^{k+1}-x_{2}^{k}\|\leq\dfrac{1}{\alpha}\|z_{2}^{k+1}-z_{2}^{k}\|+\|x_{1}^{k+1}-x_{1}^{k}\|,

and thus $x_{2}^{k+1}-x_{2}^{k}\to 0$. Moreover, as

x_{3}^{k+1}-x_{3}^{k}=x_{3}^{k+1}-x_{2}^{k+1}+x_{2}^{k+1}-x_{2}^{k}+x_{2}^{k}-x_{3}^{k},

then $x_{3}^{k+1}-x_{3}^{k}\to 0$ as well. This proves (i). Next, observe that $x_{3}^{k}-x_{1}^{k}\to 0$ and $x_{3}^{k}-x_{2}^{k}\to 0$ imply that the sequences $(x_{1}^{k})_{k}$, $(x_{2}^{k})_{k}$ and $(x_{3}^{k})_{k}$ have the same cluster points. Let $x^{*}$ be a cluster point of $(x_{3}^{k})_{k}$ and $(z_{1}^{*},z_{2}^{*})$ be a cluster point of $(z_{1}^{k},z_{2}^{k})_{k}$, and let $x_{i}^{k_{j}}\to x^{*}$ for $i=1,2,3$ and $z_{i}^{k_{j}}\to z_{i}^{*}$ for $i=1,2$ along a common subsequence. Note that from the continuity of the proximal operators, $x^{*}=\operatorname{prox}_{\gamma f_{1}}(z_{1}^{*})$ and $x^{*}=\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}(\frac{z_{2}^{*}}{\alpha}+x^{*})$. Then

\varphi(x^{*}) \leq \liminf_{j}\varphi(x_{3}^{k_{j}}) \leq \limsup_{j}\varphi(x_{3}^{k_{j}}) \leq \limsup_{j}\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k_{j}},z_{2}^{k_{j}}) = \varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{*},z_{2}^{*}) \leq \varphi(x^{*}),

where the first inequality holds since $\varphi$ is lsc, the third by Proposition 3(ii), the equality by Proposition 2, and the last inequality by Proposition 3(i).

Hence, $\varphi^{\star}_{\gamma}=\lim_{k}\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k})=\lim_{j}\varphi(x_{3}^{k_{j}})=\varphi(x^{*})$. Furthermore, from Lemma 6, $\lim_{k}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(\xi^{k})=\varphi(x^{*})$. Since $(x_{1}^{k})_{k}$, $(x_{2}^{k})_{k}$, $(z_{1}^{k})_{k}$ and $(z_{2}^{k})_{k}$ are bounded, part (i) implies $\lim_{k}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(\xi^{k})=\lim_{k}\left[f_{1}(x_{1}^{k})+f_{2}(x_{2}^{k})+f_{3}(x_{3}^{k})\right]$, from which (ii) follows. Earlier in the proof we already showed the first part of item (iii). It remains to prove that $x^{*}$ is a critical point. A simple computation shows that for any $g_{3}\in\partial f_{3}(x_{3})$,

\begin{pmatrix}\nabla f_{1}(x_{1})+\mu_{1}+\dfrac{1}{\gamma_{1}}(x_{1}-x_{3})\\ \nabla f_{2}(x_{2})+\mu_{2}+\dfrac{1}{\gamma_{2}}(x_{2}-x_{3})\\ g_{3}-\mu_{1}-\mu_{2}+\dfrac{1}{\gamma_{1}}(x_{3}-x_{1})+\dfrac{1}{\gamma_{2}}(x_{3}-x_{2})\end{pmatrix}\in\hat{\partial}_{(x_{1},x_{2},x_{3})}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(x_{1},x_{2},x_{3},\mu_{1},\mu_{2}).

Hence, taking $x_{i}=x_{i}^{k_{j}}$ for $i=1,2,3$,

\mu_{1}^{k_{j}}=\gamma^{-1}(x_{1}^{k_{j}}-z_{1}^{k_{j}})\quad\text{ and }\quad\mu_{2}^{k_{j}}=\gamma^{-1}(\alpha(x_{2}^{k_{j}}-x_{1}^{k_{j}})-z_{2}^{k_{j}}).

Then, in view of (6), (7), and (12), we get

\begin{pmatrix}\dfrac{1}{\gamma_{1}}(x_{1}^{k_{j}}-x_{3}^{k_{j}})\\ \dfrac{1}{\gamma_{2}}(x_{2}^{k_{j}}-x_{3}^{k_{j}})\\ 0\end{pmatrix}\in\hat{\partial}_{(x_{1},x_{2},x_{3})}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(x_{1}^{k_{j}},x_{2}^{k_{j}},x_{3}^{k_{j}},\mu_{1}^{k_{j}},\mu_{2}^{k_{j}}).

Taking the limit as $j\to\infty$, since $x_{i}^{k_{j}}\to x^{*}$ for $i=1,2,3$ and $z_{i}^{k_{j}}\to z_{i}^{*}$ for $i=1,2$, item (i) then implies

\begin{pmatrix}0\\ 0\\ 0\end{pmatrix}\in\partial_{(x_{1},x_{2},x_{3})}\mathcal{L}_{\frac{1}{\gamma_{1}},\frac{1}{\gamma_{2}}}(x^{*},x^{*},x^{*},\mu_{1}^{*},\mu_{2}^{*}),

where $\mu_{1}^{*}=\gamma^{-1}(x^{*}-z_{1}^{*})$ and $\mu_{2}^{*}=-\gamma^{-1}z_{2}^{*}$. In turn, this inclusion is equivalent to the following conditions:

0 = \nabla f_{1}(x^{*})+\gamma^{-1}(x^{*}-z_{1}^{*}),\qquad 0 = \nabla f_{2}(x^{*})-\gamma^{-1}z_{2}^{*},\qquad 0\in\partial f_{3}(x^{*})-\gamma^{-1}(x^{*}-z_{1}^{*}-z_{2}^{*}).

Adding these relations yields $0\in\nabla f_{1}(x^{*})+\nabla f_{2}(x^{*})+\partial f_{3}(x^{*})=\partial\varphi(x^{*})$, concluding the proof. ∎

Remark 5 (Boundedness of the generated sequences)

Using an argument similar to (themelis2018forward, Theorem 3.4(iii)), we can show that if $\varphi$ has bounded level sets, then $\varphi_{\gamma}^{\texttt{Ryu}}$ also has bounded level sets. Since, for an appropriately chosen stepsize, $(\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{k},z_{2}^{k}))_{k}$ is a non-increasing sequence as shown in Theorem 4.1, we have $(z_{1}^{k},z_{2}^{k})\in\{(z_{1},z_{2}):\varphi_{\gamma}^{\texttt{Ryu}}(z_{1},z_{2})\leq\varphi_{\gamma}^{\texttt{Ryu}}(z_{1}^{0},z_{2}^{0})\}$ for all $k\geq 1$, and thus $(z_{1}^{k},z_{2}^{k})_{k}$ is bounded. Moreover, boundedness of $(x_{1}^{k},x_{2}^{k})_{k}$ follows from the continuity properties of $\operatorname{prox}_{\gamma f_{1}}$ and $\operatorname{prox}_{\frac{\gamma}{\alpha}f_{2}}$, and boundedness of $(x_{3}^{k})_{k}$ is a consequence of Theorem 4.2(i).

Remark 6 (Global convergence)

In Theorem 4.2, we establish subsequential convergence of the relaxed variant of Ryu’s three-operator splitting method in a specific nonconvex setting. A natural question is whether the method converges globally. This can be affirmatively addressed by adopting the now standard approach based on the Kurdyka–Łojasiewicz (KL) inequality attouch2013convergence , or alternatively, by leveraging the subdifferential-based error bound technique used in Atenas25 within the unifying framework proposed in atenas2023unified . Under the same set of assumptions, one can also obtain local linear convergence rates.

5 Conclusion

We defined a Moreau-type envelope tailored to the algorithmic scheme in (3); the core of the analysis relies on the sufficient decrease property shown in Theorem 4.1. Observe that our analysis does not cover the limiting case $\alpha=1$ (corresponding to the original method proposed by Ryu in (2)), as it would make the stepsize interval in (23) empty. Nevertheless, when $\alpha$ is sufficiently close to 1, we can still guarantee global subsequential convergence for sufficiently small stepsizes. A full treatment of the limiting case $\alpha=1$ requires a more refined analysis, which is the subject of ongoing work by the authors.

Acknowledgments

The authors thank the mathematical research institute MATRIX in Australia, where part of this research was performed. JHA's visit to MATRIX was supported in part by the MATRIX-Simons Travel Grant. The research of FA was supported in part by Australian Research Council grant DP230101749.

References

  • [1] J. H. Alcantara, C.-p. Lee, and A. Takeda. A four-operator splitting algorithm for nonconvex and nonsmooth optimization. arXiv preprint arXiv:2406.16025, 2024.
  • [2] J. H. Alcantara and A. Takeda. Douglas-Rachford algorithm for nonmonotone multioperator inclusion problems. arXiv preprint arXiv:2501.02752, 2025.
  • [3] F. Atenas. Understanding the Douglas–Rachford splitting method through the lenses of Moreau-type envelopes. Computational Optimization and Applications, 90:1–30, 2025.
  • [4] F. Atenas, C. Sagastizábal, P. J. Silva, and M. Solodov. A unified analysis of descent sequences in weakly convex optimization, including convergence rates for bundle methods. SIAM Journal on Optimization, 33(1):89–115, 2023.
  • [5] H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming, 137(1):91–129, 2013.
  • [6] A. Beck. First-Order Methods in Optimization. SIAM - Society for Industrial and Applied Mathematics, Philadelphia, PA, United States, 2017.
  • [7] F. Bian and X. Zhang. A three-operator splitting algorithm for nonconvex sparsity regularization. SIAM Journal on Scientific Computing, 43(4):A2809–A2839, 2021.
  • [8] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
  • [9] J.-F. Cai, S. Osher, and Z. Shen. Split Bregman methods and frame based image restoration. Multiscale Modeling & Simulation, 8(2):337–369, 2010.
  • [10] P. L. Combettes and J.-C. Pesquet. A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1(4):564–574, 2007.
  • [11] P. L. Combettes and J.-C. Pesquet. Proximal Splitting Methods in Signal Processing, pages 185–212. Springer New York, New York, NY, 2011.
  • [12] D. Davis and W. Yin. A three-operator splitting scheme and its optimization applications. Set-Valued and Variational Analysis, 25:829–858, 2017.
  • [13] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society, 82(2):421–439, 1956.
  • [14] R. Glowinski, S. Osher, and W. Yin. Splitting Methods in Communication, Imaging, Science, and Engineering. Scientific Computation. Springer International Publishing, Cham, 2017.
  • [15] P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.
  • [16] Y. Malitsky and M. K. Tam. Resolvent splitting for sums of monotone operators with minimal lifting. Mathematical Programming, 201(1):231–262, 2023.
  • [17] G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979.
  • [18] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften. Springer Verlag Berlin, Berlin, 3rd printing edition, 2009.
  • [19] E. K. Ryu. Uniqueness of DRS as the 2 operator resolvent-splitting and impossibility of 3 operator resolvent-splitting. Mathematical Programming, 182:233–273, 2020.
  • [20] A. Themelis and P. Patrinos. Douglas–Rachford splitting and ADMM for nonconvex optimization: Tight convergence results. SIAM Journal on Optimization, 30(1):149–181, 2020.
  • [21] A. Themelis, L. Stella, and P. Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization, 28(3):2274–2303, 2018.