
A Tseng-type stochastic forward-backward algorithm for monotone inclusions

Van Dung Nguyen$^{a}$ and Nguyen The Vinh$^{a}$

$^{a}$Department of Mathematical Analysis,
University of Transport and Communications, 3 Cau Giay Street, Hanoi, Vietnam
[email protected], [email protected]
CONTACT: N. T. Vinh. Email: [email protected]
Abstract

In this paper, we propose a stochastic version of the classical Tseng forward-backward-forward method with an inertial term for solving monotone inclusions given by the sum of a maximal monotone operator and a single-valued monotone operator in real Hilbert spaces. We obtain almost sure convergence in the general case and the rate $\mathcal{O}(1/n)$ in expectation in the strongly monotone case. Furthermore, we derive an $\mathcal{O}(1/n)$ convergence rate of the primal-dual gap for saddle point problems.

Keywords: Forward-backward Splitting Algorithm, Monotone Inclusions, Tseng’s Method, Stochastic Algorithm, Convergence Rate.

Mathematics Subject Classifications (2010):

1 Introduction

In this paper, we study the following inclusion problem:

find $x^{*}\in\mathcal{H}$ such that $0\in Ax^{*}+Bx^{*}$, (1.1)

where $\mathcal{H}$ is a separable real Hilbert space, $A:\mathcal{H}\to 2^{\mathcal{H}}$ is a maximal monotone operator and $B:\mathcal{H}\to\mathcal{H}$ is a monotone operator. The solution set of (1.1) is denoted by $(A+B)^{-1}(0)$.

This problem plays an important role in many fields, such as equilibrium problems, fixed point problems, variational inequalities, and composite minimization problems; see, for example, [1, 9, 2]. More precisely, many problems in signal processing, computer vision and machine learning can be modeled in this form; see [19, 33, 3] and the references therein. For solving problem (1.1), the so-called forward-backward splitting method is given as follows:

$x_{n+1}=(I+\lambda A)^{-1}(x_{n}-\lambda Bx_{n}),$ (1.2)

where $\lambda>0$.
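
To fix ideas, iteration (1.2) is straightforward to implement once the resolvent of $A$ is available. The following minimal sketch (our illustration, not taken from the paper) runs (1.2) for the special case $A=\partial(\tau\|\cdot\|_{1})$ and $B=\nabla\,\frac{1}{2}\|M\cdot-b\|^{2}$, where the resolvent of $A$ is the soft-thresholding operator; the names, data and step size are placeholder assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    # Resolvent (I + t*A)^{-1} for A = subdifferential of ||.||_1,
    # i.e. the proximity operator of t*||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(M, b, tau, lam, n_iter=500):
    # Iteration (1.2): forward (gradient) step on B, backward (resolvent) step on A.
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ x - b)                       # B x_n
        x = soft_threshold(x - lam * grad, lam * tau)  # (I + lam*A)^{-1}
    return x
```

In this example the step size `lam` should be taken small enough (below $1/\|M\|^{2}$ is safe) for the scheme to converge.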

The forward-backward splitting algorithm for monotone inclusion problems was first introduced by Lions and Mercier [28]. In the same work, other splitting methods, such as the Peaceman–Rachford algorithm [31] and the Douglas–Rachford algorithm [22], were developed to find the zeros of the sum of two maximal monotone operators. Since then, this approach has been studied extensively in the literature; see, for instance, [41, 10, 14, 6, 13, 25, 40] and the references therein. Recently, stochastic versions of splitting algorithms for monotone inclusions have been proposed, for example the stochastic forward-backward splitting method [37, 36], the stochastic Douglas–Rachford splitting method [11], the stochastic reflected forward-backward splitting method [29] and the stochastic primal-dual method [38]; see also [19, 4, 32] and applications to stochastic optimization [37, 17] and machine learning [35, 24].

Motivated and inspired by the algorithms in [41, 37, 36, 42, 29], we introduce a new stochastic splitting algorithm for inclusion problems and establish its convergence and convergence rate.

The rest of the paper is organized as follows. After collecting some definitions and basic results in Section 2, we prove in Section 3 the almost sure convergence in the general case and the strong convergence, along with the convergence rate, in the strongly monotone case.

In Section 4, we apply the proposed algorithm to the convex-concave saddle point problem.

2 Preliminaries

Let $\mathcal{H}$ be a separable real Hilbert space endowed with the inner product $\left\langle{\cdot}\mid{\cdot}\right\rangle$ and the associated norm $\|\cdot\|$. When $(x_{n})_{n\in\mathbb{N}}$ is a sequence in $\mathcal{H}$, we denote strong convergence of $(x_{n})$ to $x\in\mathcal{H}$ by $x_{n}\to x$ and weak convergence by $x_{n}\rightharpoonup x$.

We recall some well-known definitions.

Definition 2.1

Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a set-valued mapping with nonempty values.

  1. (1)

    $A$ is said to be monotone if for all $x,y\in\mathcal{H}$, $u\in Ax$ and $v\in Ay$, the following inequality holds: $\left\langle{u-v}\mid{x-y}\right\rangle\geq 0$.

  2. (2)

    $A$ is said to be maximally monotone if it is monotone and if, for any $(x,u)\in\mathcal{H}\times\mathcal{H}$, $\left\langle{u-v}\mid{x-y}\right\rangle\geq 0$ for every $(y,v)\in\operatorname{gra}A=\{(x,y):y\in Ax\}$ (the graph of the mapping $A$) implies that $u\in Ax$.

  3. (3)

    We say that $A$ is $\phi_{A}$-uniformly monotone if there exists an increasing function $\phi_{A}\colon\left[0,\infty\right[\to\left[0,\infty\right]$ that vanishes only at $0$ such that

    $\big(\forall(x,u),(y,v)\in\operatorname{gra}A\big)\quad\left\langle{x-y}\mid{u-v}\right\rangle\geq\phi_{A}(\|y-x\|).$ (2.1)

    If $\phi_{A}=\nu_{A}|\cdot|^{2}$ for some $\nu_{A}\in\left]0,\infty\right[$, then we say that $A$ is $\nu_{A}$-strongly monotone.

  4. (4)

    Let $\operatorname{Id}$ denote the identity operator on $\mathcal{H}$ and $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a maximal monotone operator. For each $\lambda>0$, the resolvent mapping $J^{A}_{\lambda}:\mathcal{H}\to\mathcal{H}$ associated with $A$ is defined by

    $J^{A}_{\lambda}(x):=(\operatorname{Id}+\lambda A)^{-1}(x)\quad\forall x\in\mathcal{H}.$ (2.2)
Definition 2.2

A mapping $T:\mathcal{H}\to\mathcal{H}$ is said to be

  1. (1)

    firmly nonexpansive if

    $\|Tx-Ty\|^{2}\leq\left\langle{Tx-Ty}\mid{x-y}\right\rangle\quad\forall x,y\in\mathcal{H},$

    or equivalently

    $\|Tx-Ty\|^{2}\leq\|x-y\|^{2}-\|(I-T)x-(I-T)y\|^{2}\quad\forall x,y\in\mathcal{H};$

  2. (2)

    $L$-Lipschitz continuous with $L>0$ if

    $\|Tx-Ty\|\leq L\|x-y\|\quad\forall x,y\in\mathcal{H}.$ (2.3)

Let $\Gamma_{0}(\mathcal{H})$ be the class of proper lower semicontinuous convex functions from $\mathcal{H}$ to $\left]-\infty,+\infty\right]$.

Definition 2.3

For $f\in\Gamma_{0}(\mathcal{H})$:

  1. (1)

    $\operatorname{dom}f=\{x\in\mathcal{H}:\ f(x)<+\infty\}$. The subdifferential of $f$ at $x\in\operatorname{dom}f$ is

    $\partial f(x)=\{u\in\mathcal{H}:\ \forall z\in\operatorname{dom}f:\ f(z)\geq f(x)+\left\langle{u}\mid{z-x}\right\rangle\}.$

  2. (2)

    The proximity operator of $f$ is

    $\operatorname{prox}_{f}:\mathcal{H}\to\mathcal{H}:x\mapsto\underset{y\in\mathcal{H}}{\operatorname{argmin}}\big(f(y)+\tfrac{1}{2}\|x-y\|^{2}\big).$ (2.4)

  3. (3)

    The conjugate function of $f$ is

    $f^{*}:a\mapsto\underset{x\in\mathcal{H}}{\sup}\big(\left\langle{a}\mid{x}\right\rangle-f(x)\big).$ (2.5)

  4. (4)

    The infimal convolution of two functions $\ell$ and $g$ from $\mathcal{H}$ to $\left]-\infty,+\infty\right]$ is

    $\ell\,\square\,g\colon x\mapsto\inf_{y\in\mathcal{H}}(\ell(y)+g(x-y)).$ (2.6)

Note that $\operatorname{prox}_{f}=J_{\partial f}$ and

$(\forall f\in\Gamma_{0}(\mathcal{H}))\quad(\partial f)^{-1}=\partial f^{*}.$ (2.7)
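
As an aside that is useful when implementing the schemes below (a standard identity, not specific to this paper), the proximity operator of the conjugate can always be computed from that of $f$ via Moreau's decomposition:

$(\forall\lambda>0)(\forall x\in\mathcal{H})\quad\operatorname{prox}_{\lambda f^{*}}(x)=x-\lambda\operatorname{prox}_{\lambda^{-1}f}(x/\lambda).$

In particular, the operator $\operatorname{prox}_{\lambda_{n}g^{*}}$ appearing in Section 4 is available whenever the proximity operator of $g$ is.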

We now recall some results which are needed in the sequel.

Lemma 2.4

([39]) Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a set-valued maximal monotone mapping and $\lambda>0$. Then the domain of the resolvent of $A$ is the whole space, that is $D(J^{A}_{\lambda})=\mathcal{H}$; in addition, $J^{A}_{\lambda}$ is a single-valued and firmly nonexpansive mapping.

Lemma 2.5

([8], Lemma 2.4) Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a maximal monotone mapping and $B:\mathcal{H}\to\mathcal{H}$ be a Lipschitz continuous and monotone mapping. Then the mapping $A+B$ is maximal monotone.

Following [27], let $(\Omega,\mathcal{F},\mathsf{P})$ be a probability space. An $\mathcal{H}$-valued random variable is a measurable function $X:\Omega\to\mathcal{H}$, where $\mathcal{H}$ is endowed with the Borel $\sigma$-algebra. We denote by $\sigma(X)$ the $\sigma$-field generated by $X$. The expectation of a random variable $X$ is denoted by $\mathsf{E}[X]$. The conditional expectation of $X$ given a $\sigma$-field $\mathcal{A}\subset\mathcal{F}$ is denoted by $\mathsf{E}[X|\mathcal{A}]$. An $\mathcal{H}$-valued random process is a sequence $(x_{n})$ of $\mathcal{H}$-valued random variables. The abbreviation a.s. stands for 'almost surely'.

Lemma 2.6

([34, Theorem 1]) Let $(\mathcal{F}_{n})_{n\in\mathbb{N}}$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$, and let $(z_{n})_{n\in\mathbb{N}},\ (\beta_{n})_{n\in\mathbb{N}},\ (\theta_{n})_{n\in\mathbb{N}}$ and $(\gamma_{n})_{n\in\mathbb{N}}$ be $[0,+\infty]$-valued random sequences such that, for every $n\in\mathbb{N}$, $z_{n},\ \beta_{n},\ \theta_{n}$ and $\gamma_{n}$ are $\mathcal{F}_{n}$-measurable. Suppose that $\sum_{n\in\mathbb{N}}\gamma_{n}<+\infty,\ \sum_{n\in\mathbb{N}}\beta_{n}<+\infty$ a.s. and

$(\forall n\in\mathbb{N})\quad\mathsf{E}[z_{n+1}|\mathcal{F}_{n}]\leq(1+\gamma_{n})z_{n}+\beta_{n}-\theta_{n}$ a.s.

Then $(z_{n})_{n\in\mathbb{N}}$ converges a.s. and $(\theta_{n})_{n\in\mathbb{N}}$ is summable a.s.

According to the proof of [17, Proposition 2.3], we have the following lemma.

Lemma 2.7

Let $C$ be a nonempty closed subset of $\mathcal{H}$ and let $(x_{n})_{n\in\mathbb{N}}$ be an $\mathcal{H}$-valued random process. Suppose that, for every $x\in C$, $(\|x_{n+1}-x\|)_{n\in\mathbb{N}}$ converges a.s. Suppose that the set of weak sequential cluster points of $(x_{n})_{n\in\mathbb{N}}$ is a subset of $C$ a.s. Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly a.s. to a $C$-valued random vector.

3 Main results

In this section, we propose a novel stochastic forward-backward-forward algorithm for solving problem (1.1) and analyse its convergence behaviour. Unless otherwise specified, we assume that the following assumptions are satisfied from now on.

Assumption 3.1

In what follows we suppose the following assumptions on $A$ and $B$:

  1. (A1)

    The mapping $B:\mathcal{H}\to\mathcal{H}$ is $L$-Lipschitz continuous and monotone;

  2. (A2)

    The set-valued mapping $A:\mathcal{H}\to 2^{\mathcal{H}}$ is maximal monotone;

  3. (A3)

    The solution set $\mathcal{P}=\operatorname{zer}(A+B)=(A+B)^{-1}(0)\neq\emptyset$.

The algorithm is designed as follows.

Algorithm 3.2

Step 0: (Initialization) Choose $\theta\in[0,1]$. Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a positive sequence and let $(\epsilon_{n})_{n\in\mathbb{N}}\subset[0,+\infty)$ satisfy

$\sum_{n=0}^{+\infty}\epsilon_{n}<+\infty.$ (3.1)

Let $x_{-1},\ x_{0}$ be $\mathcal{H}$-valued, square-integrable random variables and set $n=0$.
Step 1: Given $x_{n-1},\ x_{n}$ ($n\geq 0$), choose $\alpha_{n}$ such that

$\alpha_{n}=\begin{cases}\min\big\{\frac{\epsilon_{n}}{\|x_{n}-x_{n-1}\|},\theta\big\}&\text{if $x_{n}\neq x_{n-1}$,}\\ \theta&\text{if $x_{n}=x_{n-1}$.}\end{cases}$ (3.3)

Let $r_{n}$ be a random vector. Compute

$w_{n}=x_{n}+\alpha_{n}(x_{n}-x_{n-1}),$
$y_{n}=(I+\lambda_{n}A)^{-1}(w_{n}-\lambda_{n}r_{n}).$

Step 2: Let $s_{n}$ be an unbiased estimator of $By_{n}$, i.e., $\mathsf{E}[s_{n}|\mathcal{F}_{n}]=By_{n}$. Calculate the next iterate as

$x_{n+1}=y_{n}-\lambda_{n}(s_{n}-r_{n}),$ (3.4)

where $\mathcal{F}_{n}=\sigma(x_{-1},x_{0},r_{0},x_{1},r_{1},\ldots,x_{n},r_{n})$.
Set $n:=n+1$ and return to Step 1.
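
For concreteness, a full run of Algorithm 3.2 can be sketched as follows. This is our illustrative translation: `resolvent`, `r_oracle` and `s_oracle` are user-supplied placeholders for $(I+\lambda_{n}A)^{-1}$, the estimate $r_{n}$ of $Bw_{n}$ and the unbiased estimate $s_{n}$ of $By_{n}$, respectively, and `lam`, `eps` stand for the sequences $(\lambda_{n})$, $(\epsilon_{n})$.

```python
import numpy as np

def stochastic_fbf(x0, resolvent, r_oracle, s_oracle, lam, eps,
                   theta=0.5, n_iter=1000):
    x_prev, x = x0.copy(), x0.copy()
    for n in range(n_iter):
        d = np.linalg.norm(x - x_prev)
        alpha = theta if d == 0 else min(eps[n] / d, theta)  # rule (3.3)
        w = x + alpha * (x - x_prev)                         # inertial point w_n
        r = r_oracle(w)                                      # estimate of B w_n
        y = resolvent(lam[n], w - lam[n] * r)                # backward step
        s = s_oracle(y)                                      # unbiased estimate of B y_n
        x_prev, x = x, y - lam[n] * (s - r)                  # correction step (3.4)
    return x
```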

Remark 3.3

Some remarks on the algorithm are in order.

  1. (1)

    Algorithm 3.2 extends the forward-backward-forward splitting method in [41], which is stated in the deterministic setting. In our setting, we do not need the cocoercivity condition used in [17, 18, 36].

  2. (2)

    When $\alpha_{n}=0$, Algorithm 3.2 reduces to (3.2) in [42]. However, the conditions for convergence are different from those in [42].

  3. (3)

    In [20, 21], for solving (1.1), the authors designed stochastic forward-backward-forward splitting methods which require a large number of samples in each iteration. Our results also differ from those in [20, 21].

  4. (4)

    Evidently, we have from (3.3) that

    $\alpha_{n}\|x_{n}-x_{n-1}\|\leq\epsilon_{n}.$ (3.5)
Lemma 3.4

Let $(x_{n})$ be generated by Algorithm 3.2. Then, for any $p\in\mathcal{P}$,

$\begin{aligned}\|x_{n+1}-p\|^{2}&\leq\|w_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\big(\|s_{n}-By_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}\big)\\ &\quad+2\lambda_{n}^{2}\big(\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle\big)\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle+2\lambda_{n}\left\langle{By_{n}-s_{n}}\mid{y_{n}-p}\right\rangle.\end{aligned}$ (3.6)

Proof. We have

$\begin{aligned}\|x_{n+1}-p\|^{2}&=\|y_{n}-\lambda_{n}(s_{n}-r_{n})-p\|^{2}\\ &=\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})+w_{n}-p\|^{2}\\ &=\|w_{n}-p\|^{2}+\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})\|^{2}+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{w_{n}-p}\right\rangle\\ &=\|w_{n}-p\|^{2}+\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})\|^{2}+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{w_{n}-y_{n}}\right\rangle\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{y_{n}-p}\right\rangle\\ &=\|w_{n}-p\|^{2}+\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})\|^{2}-2\|w_{n}-y_{n}\|^{2}+2\lambda_{n}\left\langle{s_{n}-r_{n}}\mid{y_{n}-w_{n}}\right\rangle\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{y_{n}-p}\right\rangle\\ &=\|w_{n}-p\|^{2}-\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\|s_{n}-r_{n}\|^{2}\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle+2\lambda_{n}\left\langle{By_{n}-s_{n}}\mid{y_{n}-p}\right\rangle.\end{aligned}$ (3.7)

Note that

$\begin{aligned}\|s_{n}-r_{n}\|^{2}&=\|s_{n}-By_{n}+By_{n}-r_{n}\|^{2}\\ &=\|s_{n}-By_{n}\|^{2}+2\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\|By_{n}-Bw_{n}+Bw_{n}-r_{n}\|^{2}\\ &=\|s_{n}-By_{n}\|^{2}+2\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\|By_{n}-Bw_{n}\|^{2}+\|Bw_{n}-r_{n}\|^{2}\\ &\quad+2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle\\ &\leq\|s_{n}-By_{n}\|^{2}+2\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+L^{2}\|y_{n}-w_{n}\|^{2}+\|Bw_{n}-r_{n}\|^{2}\\ &\quad+2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle.\end{aligned}$ (3.8)

By combining (3.7) and (3.8) we obtain

$\begin{aligned}\|x_{n+1}-p\|^{2}&\leq\|w_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\big(\|s_{n}-By_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}\big)\\ &\quad+2\lambda_{n}^{2}\big(\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle\big)\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle+2\lambda_{n}\left\langle{By_{n}-s_{n}}\mid{y_{n}-p}\right\rangle.\end{aligned}$

The proof is complete. $\square$

Theorem 3.5

Let $(x_{n})_{n\in\mathbb{N}}$ be generated by Algorithm 3.2. The following hold.

  1. (i)

    Assume that $(\lambda_{n})_{n\in\mathbb{N}}$ is a sequence in $\left]\epsilon,\frac{1-\epsilon}{L}\right[$ and that the following conditions are satisfied for $\mathcal{F}_{n}=\sigma(x_{-1},x_{0},r_{0},x_{1},r_{1},\ldots,x_{n},r_{n})$:

    $\sum_{n\in\mathbb{N}}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty$ a.s. and $\sum_{n\in\mathbb{N}}\|r_{n}-Bw_{n}\|^{2}<+\infty$ a.s. (3.9)

    Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly a.s. to a random variable $\overline{x}\colon\Omega\to\operatorname{zer}(A+B)$.

  2. (ii)

    Suppose that $A$ or $B$ is uniformly monotone. Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,+\infty\right[$ such that $(\lambda_{n})_{n\in\mathbb{N}}\in\ell_{2}(\mathbb{N})\backslash\ell_{1}(\mathbb{N})$ and

    $\sum_{n\in\mathbb{N}}\lambda^{2}_{n}\|r_{n}-Bw_{n}\|^{2}<\infty$ a.s. and $\sum_{n\in\mathbb{N}}\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty$ a.s., (3.10)

    where $(\forall p\in\left]0,\infty\right[)\;\ell_{p}(\mathbb{N})=\big\{(\lambda_{n})_{n\in\mathbb{N}}\;|\;(\forall n\in\mathbb{N})\;\lambda_{n}\in\mathbb{R},\ \sum_{n\in\mathbb{N}}|\lambda_{n}|^{p}<+\infty\big\}$. Then $(x_{n})_{n\in\mathbb{N}}$ converges strongly a.s. to the unique solution $\overline{x}$.

Proof. Taking the conditional expectation given $\mathcal{F}_{n}$ on both sides of (3.6) in Lemma 3.4 and using $\mathsf{E}[s_{n}|\mathcal{F}_{n}]=By_{n}$, we get

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq\|w_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]+\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}\\ &\quad+2\lambda_{n}^{2}\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle.\end{aligned}$ (3.11)

Since $y_{n}=(I+\lambda_{n}A)^{-1}(w_{n}-\lambda_{n}r_{n})$, we obtain

$\frac{w_{n}-y_{n}}{\lambda_{n}}-r_{n}\in Ay_{n},$

which is equivalent to

$\frac{w_{n}-y_{n}}{\lambda_{n}}-r_{n}+By_{n}\in(A+B)y_{n}.$

Since $0\in(A+B)p$, using the uniform monotonicity of $A+B$ (with modulus $\phi$; the choice $\phi=0$ covers the merely monotone case), we get

$\left\langle{\frac{w_{n}-y_{n}}{\lambda_{n}}-r_{n}+By_{n}}\mid{y_{n}-p}\right\rangle\geq\phi(\|y_{n}-p\|),$ (3.12)

which implies

$\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle\leq-\lambda_{n}\phi(\|y_{n}-p\|).$ (3.13)

Using (3.5) and the Cauchy-Schwarz inequality, we estimate the term $\|w_{n}-p\|^{2}$ in (3.11) as follows:

$\begin{aligned}\|w_{n}-p\|^{2}&=\|x_{n}+\alpha_{n}(x_{n}-x_{n-1})-p\|^{2}\\ &=\|x_{n}-p\|^{2}+2\alpha_{n}\left\langle{x_{n}-p}\mid{x_{n}-x_{n-1}}\right\rangle+\alpha_{n}^{2}\|x_{n}-x_{n-1}\|^{2}\\ &\leq\|x_{n}-p\|^{2}+2\alpha_{n}\|x_{n}-p\|\|x_{n}-x_{n-1}\|+\alpha_{n}^{2}\|x_{n}-x_{n-1}\|^{2}\\ &\leq\|x_{n}-p\|^{2}+2\epsilon_{n}\|x_{n}-p\|+\epsilon_{n}^{2}\\ &\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}+\epsilon_{n}^{2}+\epsilon_{n},\end{aligned}$ (3.14)

where the last step uses $2\|x_{n}-p\|\leq 1+\|x_{n}-p\|^{2}$.

Therefore, from (3.11), using (3.13) and (3.14), we derive

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}+2\lambda_{n}^{2}\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle-2\lambda_{n}\phi(\|y_{n}-p\|)+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.15)

(i) In the general case, i.e. $\phi=0$, we have

$\begin{aligned}2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle&\leq 2\|By_{n}-Bw_{n}\|\|Bw_{n}-r_{n}\|\\ &\leq\frac{\epsilon}{1-\epsilon}\|By_{n}-Bw_{n}\|^{2}+\frac{1-\epsilon}{\epsilon}\|r_{n}-Bw_{n}\|^{2}\\ &\leq\frac{\epsilon}{1-\epsilon}L^{2}\|y_{n}-w_{n}\|^{2}+\frac{1-\epsilon}{\epsilon}\|r_{n}-Bw_{n}\|^{2}.\end{aligned}$ (3.16)

Hence, (3.15) implies that

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-\big(1-\lambda_{n}^{2}L^{2}(1+\tfrac{\epsilon}{1-\epsilon})\big)\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+\lambda_{n}^{2}(1+\tfrac{1-\epsilon}{\epsilon})\|r_{n}-Bw_{n}\|^{2}+\epsilon_{n}^{2}+\epsilon_{n}\\ &\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-\epsilon\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]+\frac{\lambda_{n}^{2}}{\epsilon}\|r_{n}-Bw_{n}\|^{2}\\ &\quad+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.17)

Since $\sum_{n=1}^{\infty}\epsilon_{n}<\infty$, we also have $\sum_{n=1}^{\infty}\epsilon_{n}^{2}<\infty$. Therefore, using the conditions of the theorem and Lemma 2.6, (3.17) implies that

$\|x_{n}-p\|$ converges and $\|w_{n}-y_{n}\|\to 0$ a.s.

Consequently,

$\begin{aligned}\|x_{n}-y_{n}\|&\leq\|x_{n}-w_{n}\|+\|w_{n}-y_{n}\|\\ &\leq\alpha_{n}\|x_{n}-x_{n-1}\|+\|w_{n}-y_{n}\|\to 0.\end{aligned}$ (3.18)

Let us set

$z_{n}=(I+\lambda_{n}A)^{-1}(w_{n}-\lambda_{n}Bw_{n}).$ (3.19)

Then, since $J_{\lambda_{n}A}$ is nonexpansive, we have

$\|y_{n}-z_{n}\|\leq\lambda_{n}\|Bw_{n}-r_{n}\|\to 0$ a.s. (3.20)

Hence

$\|w_{n}-z_{n}\|\leq\|w_{n}-y_{n}\|+\|y_{n}-z_{n}\|\to 0$ a.s. (3.21)

Let $x^{*}$ be a weak cluster point of $(x_{n})_{n\in\mathbb{N}}$. Then there exists a subsequence $(x_{n_{k}})_{k\in\mathbb{N}}$ which converges weakly to $x^{*}$ a.s. By (3.18), $y_{n_{k}}\rightharpoonup x^{*}$ a.s., and hence, by (3.20), $z_{n_{k}}\rightharpoonup x^{*}$. Since $z_{n_{k}}=(I+\lambda_{n_{k}}A)^{-1}(w_{n_{k}}-\lambda_{n_{k}}Bw_{n_{k}})$, we have

$\frac{w_{n_{k}}-z_{n_{k}}}{\lambda_{n_{k}}}-Bw_{n_{k}}+Bz_{n_{k}}\in(A+B)z_{n_{k}}.$ (3.22)

Since $B$ is $L$-Lipschitz and $(\lambda_{n})_{n\in\mathbb{N}}$ is bounded away from $0$, it follows that

$\frac{w_{n_{k}}-z_{n_{k}}}{\lambda_{n_{k}}}-Bw_{n_{k}}+Bz_{n_{k}}\to 0$ a.s. (3.23)

By [2, Corollary 25.5], the sum $A+B$ is maximally monotone and hence its graph is closed in $\mathcal{H}^{\mathrm{weak}}\times\mathcal{H}^{\mathrm{strong}}$ [2, Proposition 20.38]. Therefore, $0\in(A+B)x^{*}$ a.s., that is, $x^{*}\in\mathcal{P}$ a.s. By Lemma 2.7, the sequence $(x_{n})_{n\in\mathbb{N}}$ converges weakly to some $\bar{x}\in\mathcal{P}$ a.s., which proves (i).

(ii) Now suppose that $A+B$ is uniformly monotone. We rewrite (3.15) as

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}+2\lambda_{n}^{2}\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle-2\lambda_{n}\phi(\|y_{n}-p\|)\\ &\quad+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.24)

We have

$\begin{aligned}2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle&\leq\|By_{n}-Bw_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}\\ &\leq L^{2}\|y_{n}-w_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}.\end{aligned}$ (3.25)

Using (3.25) in (3.24), we obtain

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-2\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+2\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}-2\lambda_{n}\phi(\|y_{n}-p\|)+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.26)

From $\sum_{n\in\mathbb{N}}\lambda_{n}^{2}<+\infty$, we derive $\lim_{n\to+\infty}\lambda_{n}=0$. Moreover,

$\begin{cases}\sum_{n\in\mathbb{N}}\epsilon_{n}<+\infty,\\ \sum_{n\in\mathbb{N}}\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty\ \ \text{a.s.},\\ \sum_{n\in\mathbb{N}}\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}<+\infty\ \ \text{a.s.}\end{cases}$ (3.27)

Therefore (3.26), (3.27) and Lemma 2.6 imply that

$\|x_{n}-p\|$ converges and $\sum_{n\in\mathbb{N}}\lambda_{n}\phi(\|y_{n}-p\|)<+\infty$ a.s. (3.28)

Since $\sum_{n\in\mathbb{N}}\lambda_{n}=\infty$, it follows from $\sum_{n\in\mathbb{N}}\lambda_{n}\phi(\|y_{n}-p\|)<+\infty$ that $\liminf_{n\to\infty}\phi(\|y_{n}-p\|)=0$. Thus, there exists a subsequence $(n_{k})_{k\in\mathbb{N}}$ such that $\|y_{n_{k}}-p\|\to 0$. It then follows from (3.18) that $\|x_{n_{k}}-p\|\to 0$ a.s. Since $\|x_{n}-p\|$ converges a.s., we infer that $\|x_{n}-p\|\to 0$ a.s. This completes the proof. $\square$

Remark 3.6

With respect to Theorem 3.5, we observe the following.

  1. (1)

    The conditions in (3.10) are satisfied if the sequences $(\|r_{n}-Bw_{n}\|^{2})_{n\in\mathbb{N}}$ and $(\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}])_{n\in\mathbb{N}}$ are bounded.

  2. (2)

    Theorem 3.5 removes assumption (iii) of Theorem 3.2 in [37], i.e. $\sup_{n\in\mathbb{N}}\|x_{n}-x_{n-1}\|^{2}<+\infty$ and $\sum_{n\in\mathbb{N}}\alpha_{n}<+\infty$.

  3. (3)

    The algorithm (3.2) of [42] is a particular case of our algorithm with $\alpha_{n}=0$. The condition (3.9), i.e. $\sum_{n\in\mathbb{N}}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty$ and $\sum_{n\in\mathbb{N}}\|r_{n}-Bw_{n}\|^{2}<+\infty$, is weaker than the conditions $\sum_{n\in\mathbb{N}}\sqrt{\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]}<+\infty$ and $\sum_{n\in\mathbb{N}}\sqrt{\|r_{n}-Bw_{n}\|^{2}}<+\infty$ of Theorem 3.1 in [42].

  4. (4)

    In the general case, i.e. $A$ maximally monotone and $B$ monotone and Lipschitz, the algorithms proposed in [20, 21] require a large number of samples in each iteration, all of which are unbiased estimates. In contrast, Algorithm 3.2 only requires $s_{n}$ to be an unbiased estimate. The range of the step size $\lambda_{n}$ in Theorem 3.5 is also wider than that in Theorem 1 of [20]. When $r_{n}$ and $s_{n}$ are averages of samples as in [20, 21], we obtain the same results as there.

For the convergence rate in the uniformly monotone case, we define the function

$\varphi_{c}:\ ]0,+\infty[\to\mathbb{R}:\ t\mapsto\begin{cases}\frac{t^{c}-1}{c}&\text{if}\ c\neq 0,\\ \log t&\text{if}\ c=0.\end{cases}$ (3.29)

The following lemma establishes a nonasymptotic bound for numerical sequences satisfying a given recursive inequality. Its proof is similar to that of Lemma 3.1 in [36].

Lemma 3.7

Let $\alpha\in]\frac{1}{2},1]$ and $\beta>1$. Let $a,b\in]0,+\infty[$ with $a\leq\beta$. Set $\alpha_{n}=\frac{a}{n^{\alpha}}$ and $\beta_{n}=\frac{b}{n^{\beta}}$. Let $(s_{n})_{n\in\mathbb{N}}$ be a sequence such that

$(\forall n\in\mathbb{N})\ \ 0\leq s_{n+1}\leq(1-\alpha_{n})s_{n}+\beta_{n}.$ (3.30)

Let $n_{0}$ be the smallest integer such that $\alpha_{n}<1$ for every $n\geq n_{0}>1$, and set $t=1-2^{\alpha-1}\geq 0$. Then for every $n\geq 2n_{0}$, the following hold.

  1. (i)

    If $\alpha=1$, then

    $s_{n+1}\leq s_{n_{0}}\big(\frac{n_{0}}{n+1}\big)^{a}+\frac{b}{(n+1)^{a}}\big(1+\frac{1}{n_{0}}\big)^{a}\varphi_{a+1-\beta}(n).$ (3.31)

  2. (ii)

    If $1/2<\alpha<1$, then

    $s_{n+1}\leq\bigg(b\varphi_{1-\beta}(n)+s_{n_{0}}\exp\bigg(\frac{an_{0}^{1-\alpha}}{1-\alpha}\bigg)\bigg)\exp\bigg(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\bigg)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}.$ (3.32)

Proof. Recall the definition of $\varphi_{c}$ in (3.29). Note that $\varphi_{c}$ is an increasing function and, for $\delta\geq 0$ and $2\leq m\leq n$,

$\varphi_{1-\delta}(n+1)-\varphi_{1-\delta}(m)\leq\sum_{k=m}^{n}k^{-\delta}\leq\varphi_{1-\delta}(n).$ (3.33)

We have

$s_{n+1}\leq s_{n_{0}}\prod_{k=n_{0}}^{n}(1-\alpha_{k})+\sum_{k=n_{0}}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}.$ (3.34)

Let us estimate the first term on the right-hand side of (3.34). Using (3.33) and the inequality $1-x\leq e^{-x}\ \forall x\in\mathbb{R}$, we have

$\prod_{k=n_{0}}^{n}(1-\alpha_{k})=\prod_{k=n_{0}}^{n}(1-ak^{-\alpha})\leq e^{-a\sum_{k=n_{0}}^{n}k^{-\alpha}}\leq\begin{cases}\big(\frac{n_{0}}{n+1}\big)^{a}&\text{if}\ \alpha=1,\\ \exp\big(\frac{a}{1-\alpha}(n_{0}^{1-\alpha}-(n+1)^{1-\alpha})\big)&\text{if}\ \frac{1}{2}<\alpha<1.\end{cases}$ (3.35)

To estimate the second term on the right-hand side of (3.34), consider first the case $\alpha=1$. We have

$\begin{aligned}\sum_{k=n_{0}}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}&\leq\sum_{k=n_{0}}^{n}e^{-a\sum_{i=k+1}^{n}i^{-\alpha}}\beta_{k}\\ &\leq\sum_{k=n_{0}}^{n}\big(\frac{k+1}{n+1}\big)^{a}\frac{b}{k^{\beta}}=\frac{b}{(n+1)^{a}}\sum_{k=n_{0}}^{n}(1+\tfrac{1}{k})^{a}k^{a-\beta}\\ &\leq\frac{b}{(n+1)^{a}}\big(1+\frac{1}{n_{0}}\big)^{a}\varphi_{a+1-\beta}(n).\end{aligned}$ (3.36)

From (3.34), using (3.35) and (3.36), for $\alpha=1$ we get

$s_{n+1}\leq s_{n_{0}}\big(\frac{n_{0}}{n+1}\big)^{a}+\frac{b}{(n+1)^{a}}\big(1+\frac{1}{n_{0}}\big)^{a}\varphi_{a+1-\beta}(n).$ (3.37)

We next estimate the second term on the right-hand side of (3.34) in the case $1/2<\alpha<1$. Let $m\in\mathbb{N}$ be such that $n_{0}\leq n/2\leq m+1\leq(n+1)/2$. We have

$\begin{aligned}\sum_{k=n_{0}}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}&=\sum_{k=n_{0}}^{m}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}+\sum_{k=m+1}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}\\ &\leq\exp\big(-\sum_{i=m+1}^{n}\alpha_{i}\big)\sum_{k=n_{0}}^{m}\beta_{k}+\frac{b}{a\,m^{\beta-\alpha}}\sum_{k=m+1}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\alpha_{k}\\ &\leq b\exp\big(-a\sum_{i=m+1}^{n}i^{-\alpha}\big)\sum_{k=n_{0}}^{m}k^{-\beta}+\frac{b}{a\,m^{\beta-\alpha}}\sum_{k=m+1}^{n}\big(\prod_{i=k+1}^{n}(1-\alpha_{i})-\prod_{i=k}^{n}(1-\alpha_{i})\big)\\ &\leq b\exp\big(\frac{a}{1-\alpha}((m+1)^{1-\alpha}-(n+1)^{1-\alpha})\big)\varphi_{1-\beta}(n)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}\\ &\leq b\exp\big(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\big)\varphi_{1-\beta}(n)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}.\end{aligned}$ (3.38)

Combining (3.35) and (3.38), for $1/2<\alpha<1$, we have

$\begin{aligned}s_{n+1}&\leq s_{n_{0}}\exp\big(\frac{a}{1-\alpha}(n_{0}^{1-\alpha}-(n+1)^{1-\alpha})\big)+b\exp\big(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\big)\varphi_{1-\beta}(n)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}\\ &\leq\bigg(b\varphi_{1-\beta}(n)+s_{n_{0}}\exp\bigg(\frac{an_{0}^{1-\alpha}}{1-\alpha}\bigg)\bigg)\exp\bigg(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\bigg)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}.\end{aligned}$ (3.39)

$\square$
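As a quick numerical sanity check of case (i) (our illustration, not part of the proof), one can iterate the recursion (3.30) with $\alpha=1$, $a=2$, $b=1$, $\beta=2$; since $a>\beta-1$, the bound (3.31) predicts $s_{n}=\mathcal{O}(n^{1-\beta})=\mathcal{O}(1/n)$, and indeed the product $n\,s_{n}$ stabilizes:

```python
def run_recursion(n_max, a=2.0, b=1.0, beta=2.0):
    # Iterate (3.30) with alpha = 1, starting where a/n < 1.
    s = 1.0
    for n in range(3, n_max + 1):
        s = (1.0 - a / n) * s + b / n ** beta
    return s

for n_max in (10**3, 10**4, 10**5):
    s = run_recursion(n_max)
    print(n_max, s, n_max * s)  # n * s_n approaches a constant
```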

Theorem 3.8

Suppose that $A$ or $B$ is $\mu$-strongly monotone. For $\alpha\in]1/2,1]$ and $a>0$, define

$(\forall n\in\mathbb{N})\quad\lambda_{n}=\frac{4a}{\mu n^{\alpha}}.$ (3.40)

Suppose that there exist constants $c$ and $\theta>1$ such that

$(\forall n\in\mathbb{N})\;\mathsf{E}[(2\|r_{n}-Bw_{n}\|^{2}+\|s_{n}-By_{n}\|^{2})|\mathcal{F}_{n}]\leq c$ a.s. (3.41)

and $\epsilon_{n}=\mathcal{O}(n^{-\theta})$. Set $\beta=\min\{2\alpha,\theta\}$ and assume that $a\leq\beta$. Then

$\mathsf{E}[\|x_{n}-p\|^{2}]=\begin{cases}\mathcal{O}(n^{\alpha-\beta})&\text{if}\ 1/2<\alpha<1,\\ \mathcal{O}(n^{-a})+\mathcal{O}(n^{1-\beta})&\text{if}\ \alpha=1,\ a\neq\beta-1,\\ \mathcal{O}(n^{-a})+\mathcal{O}(\frac{\ln n}{n^{a}})&\text{if}\ \alpha=1,\ a=\beta-1.\end{cases}$ (3.42)

Proof. Using the strong monotonicity of $A+B$ in (3.26), we obtain

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-2\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+2\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}-\lambda_{n}\mu\|y_{n}-p\|^{2}+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.43)

Using the Cauchy-Schwarz inequality and (3.5), we have

$\begin{aligned}\|x_{n}-p\|^{2}&\leq 2\big(\|x_{n}-y_{n}\|^{2}+\|y_{n}-p\|^{2}\big)\\ &\leq 4\big(\|x_{n}-w_{n}\|^{2}+\|w_{n}-y_{n}\|^{2}\big)+2\|y_{n}-p\|^{2}\\ &\leq 4\epsilon_{n}^{2}+4\|w_{n}-y_{n}\|^{2}+2\|y_{n}-p\|^{2},\end{aligned}$ (3.44)

which is equivalent to

$\|y_{n}-p\|^{2}\geq\frac{\|x_{n}-p\|^{2}}{2}-2\epsilon_{n}^{2}-2\|w_{n}-y_{n}\|^{2}.$ (3.45)

Hence (3.43) implies that

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n}-\tfrac{\lambda_{n}\mu}{2})\|x_{n}-p\|^{2}-(1-2\lambda_{n}^{2}L^{2}-2\lambda_{n}\mu)\|w_{n}-y_{n}\|^{2}\\ &\quad+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]+2\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}+2\lambda_{n}\mu\epsilon_{n}^{2}+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.46)

Since $\lambda_{n}\to 0$ and $\epsilon_{n}=\mathcal{O}(n^{-\theta})$ with $\theta>1$, there exists $n_{0}\in\mathbb{N}$ such that for all $n\geq n_{0}$

$\begin{cases}\epsilon_{n}\leq\frac{\lambda_{n}\mu}{4},\\ 1-2\lambda_{n}^{2}L^{2}-2\lambda_{n}\mu\geq 0,\\ 2\lambda_{n}\mu\epsilon_{n}^{2}+\epsilon_{n}^{2}+\epsilon_{n}\leq 2\epsilon_{n}.\end{cases}$ (3.47)

Therefore, taking the full expectation in (3.46) and using (3.41), for $n\geq n_{0}$ we have

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}]&\leq(1-\tfrac{\lambda_{n}\mu}{4})\mathsf{E}[\|x_{n}-p\|^{2}]+c\lambda_{n}^{2}+2\epsilon_{n}\\ &=(1-an^{-\alpha})\mathsf{E}[\|x_{n}-p\|^{2}]+\frac{16ca^{2}}{\mu^{2}}n^{-2\alpha}+2\epsilon_{n}.\end{aligned}$ (3.48)

By the definitions of $\theta$ and $\beta$, there exist $n_{1}\in\mathbb{N}$ and $b>0$ such that for all $n\geq n_{1}$

$\mathsf{E}[\|x_{n+1}-p\|^{2}]\leq(1-an^{-\alpha})\mathsf{E}[\|x_{n}-p\|^{2}]+bn^{-\beta}.$ (3.49)

Using Lemma 3.7, we obtain the following.

In the case $1/2<\alpha<1$, (3.32) yields $\mathsf{E}[\|x_{n}-p\|^{2}]=\mathcal{O}(n^{\alpha-\beta}).$

In the case $\alpha=1$, (3.31) and (3.29) yield

$\mathsf{E}[\|x_{n}-p\|^{2}]=\begin{cases}\mathcal{O}(n^{-a})+\mathcal{O}(n^{1-\beta})&\text{if}\ a\neq\beta-1,\\ \mathcal{O}(n^{-a})+\mathcal{O}(\frac{\ln n}{n^{a}})&\text{if}\ a=\beta-1,\end{cases}$ (3.50)

which proves the desired result. $\square$

Remark 3.9

Here are some remarks.

  1. (1)

    The strong almost sure convergence of the iterates is obtained under condition (3.10).

  2. (2)

    It follows from (3.42) that the best rate $\mathcal{O}(1/n)$ is obtained with $\alpha=1$, $\theta\geq 2$ and $a>1$. This result is similar to Theorem 3.4 (iii) of [36]. Note that we do not require $B$ to be cocoercive as in [36].

  3. (3)

    The rate $\mathcal{O}(1/n)$ is faster than the rate $\mathcal{O}(\log n/n)$ in [29].

  4. (4)

    In [20], the authors proved linear convergence of $\mathsf{E}\|x_{n}-p\|^{2}$. However, as mentioned above, their algorithm requires a large number of samples in each iteration, and the oracle complexity is still $\mathcal{O}(1/\epsilon)$, which equals the complexity in Theorem 3.8.

From Theorems 3.5 and 3.8, we have the following corollary.

Corollary 3.10

Let $f\in\Gamma_{0}(\mathcal{H})$ and let $h\colon\mathcal{H}\to\mathbb{R}$ be a convex differentiable function with $L$-Lipschitz continuous gradient, given in expectation form $h(x)=\mathsf{E}_{\xi}[H(x,\xi)]$. Here $\xi$ is a random vector whose probability distribution is supported on a set $\Omega\subset\mathbb{R}^{m}$, and $H\colon\mathcal{H}\times\Omega\to\mathbb{R}$ is convex with respect to the variable $x$. The problem is to

$\underset{x\in\mathcal{H}}{\text{minimize}}\;f(x)+h(x),$ (3.51)

under the following assumptions:

  1. (i)

    $\operatorname{zer}(\partial f+\nabla h)\neq\varnothing$.

  2. (ii)

    It is possible to obtain independent and identically distributed (i.i.d.) samples $(\xi_{n})_{n\in\mathbb{N}}$, $(\xi_{n}^{\prime})_{n\in\mathbb{N}}$ of $\xi$.

  3. (iii)

    $\mathsf{E}[\nabla H(x,\xi)]=\nabla h(x)$ for all $x\in\mathcal{H}$.

Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,+\infty\right[$ and let $x_{-1},x_{0}$ be in $\mathcal{H}$. Define

$(\forall n\in\mathbb{N})\quad\left\lfloor\begin{array}{l}w_{n}=x_{n}+\alpha_{n}(x_{n}-x_{n-1}),\\ y_{n}=\operatorname{prox}_{\lambda_{n}f}\big(w_{n}-\lambda_{n}\nabla H(w_{n},\xi_{n})\big),\\ x_{n+1}=y_{n}-\lambda_{n}\big(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla H(w_{n},\xi_{n})\big),\end{array}\right.$ (3.52)

where $(\alpha_{n})_{n\in\mathbb{N}}$ is defined as in Algorithm 3.2. Denote $\mathcal{F}_{n}=\sigma(\xi_{0},\xi_{0}^{\prime},\ldots,\xi_{n-1},\xi_{n-1}^{\prime},\xi_{n})$. Then the following hold.

  1. (i)

    Suppose that $f$ is $\mu$-strongly monotone ($\mu\in\left]0,+\infty\right[$) and that there exists a constant $c$ such that

    $\mathsf{E}[(\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}+\|\nabla H(w_{n},\xi_{n})-\nabla h(w_{n})\|^{2})|\mathcal{F}_{n}]\leq c$ a.s. (3.53)

    If $\epsilon_{n}=\mathcal{O}(n^{-2})$, then for $\lambda_{n}=\frac{8}{\mu n}$ for all $n\in\mathbb{N}$, we obtain

    $\mathsf{E}\left[\|x_{n}-\overline{x}\|^{2}\right]=\mathcal{O}(1/n),$ (3.54)

    where $\overline{x}$ is the unique solution of (3.51).

  2. (ii)

    If $f$ is not strongly monotone, let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]\epsilon,\frac{1-\epsilon}{L}\right[$ and assume that

    $\begin{cases}\sum_{n\in\mathbb{N}}\mathsf{E}[\|\nabla H(w_{n},\xi_{n})-\nabla h(w_{n})\|^{2}|\mathcal{F}_{n}]<+\infty\ \ \text{a.s.},\\ \sum_{n\in\mathbb{N}}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}|\mathcal{F}_{n}]<+\infty\ \ \text{a.s.}\end{cases}$ (3.55)

    Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly a.s. to a random variable $\overline{x}\colon\Omega\to\operatorname{zer}(\partial f+\nabla h)$.

Proof. The conclusions follow from Theorems 3.5 and 3.8 applied with

$A=\partial f,\ B=\nabla h,\;\text{and}\;(\forall n\in\mathbb{N})\;r_{n}=\nabla H(w_{n},\xi_{n}),\ s_{n}=\nabla H(y_{n},\xi_{n}^{\prime}).$ (3.56)

$\square$
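To make (3.52) concrete, the following sketch (our illustration; the data model, the choice $f=\tau\|\cdot\|_{1}$ and all parameter values are assumptions, not taken from the paper) instantiates the scheme for $h(x)=\mathsf{E}_{\xi}\big[\tfrac{1}{2}(\langle a_{\xi}\mid x\rangle-b_{\xi})^{2}\big]$ with single-sample gradient estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, tau, theta = 20, 0.1, 0.5
x_true = np.zeros(dim); x_true[:3] = 1.0

def sample_grad(x):
    # One draw of grad H(x, xi): the gradient at a single random data point.
    a = rng.standard_normal(dim)
    b = a @ x_true + 0.01 * rng.standard_normal()
    return (a @ x - b) * a

def prox_l1(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x_prev = x = np.zeros(dim)
for n in range(1, 20001):
    lam, eps = 1.0 / n, 1.0 / n ** 2       # lambda_n ~ 1/n, eps_n = O(n^{-2})
    d = np.linalg.norm(x - x_prev)
    alpha = theta if d == 0 else min(eps / d, theta)
    w = x + alpha * (x - x_prev)
    r = sample_grad(w)                      # grad H(w_n, xi_n)
    y = prox_l1(w - lam * r, lam * tau)
    s = sample_grad(y)                      # grad H(y_n, xi_n')
    x_prev, x = x, y - lam * (s - r)
print(np.round(x, 2))                       # the support {0, 1, 2} should emerge
```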

4 Saddle point problem

Now we study the primal-dual problem first investigated in [15]. This structured primal-dual framework covers a wide class of convex optimization problems and has found many applications in image processing and machine learning [15, 16, 26, 12, 30]. Exploiting the duality structure of this framework, we design a new stochastic primal-dual splitting method and study the ergodic convergence of the primal-dual gap.

Problem 4.1

Let $\mathcal{H}$ and $\mathcal{G}$ be separable real Hilbert spaces. Let $f\in\Gamma_{0}(\mathcal{H})$, $g\in\Gamma_{0}(\mathcal{G})$ and let $h\colon\mathcal{H}\to\mathbb{R}$ be a convex differentiable function with $L_{h}$-Lipschitz continuous gradient, given in expectation form $h(x)=\mathsf{E}_{\xi}[H(x,\xi)]$. Here $\xi$ is a random vector whose probability distribution $P$ is supported on a set $\Omega_{p}\subset\mathbb{R}^{m}$, and $H\colon\mathcal{H}\times\Omega_{p}\to\mathbb{R}$ is convex with respect to the variable $x$. Let $\ell\colon\mathcal{G}\to\mathbb{R}$ be a convex differentiable function with $L_{\ell}$-Lipschitz continuous gradient, given in expectation form $\ell(v)=\mathsf{E}_{\zeta}[L(v,\zeta)]$. Here $\zeta$ is a random vector whose probability distribution is supported on a set $\Omega_{D}\subset\mathbb{R}^{d}$, and $L\colon\mathcal{G}\times\Omega_{D}\to\mathbb{R}$ is convex with respect to the variable $v$. Let $K\colon\mathcal{H}\to\mathcal{G}$ be a bounded linear operator. The primal problem is to

$\underset{x\in\mathcal{H}}{\text{minimize}}\;h(x)+(\ell^{*}\,\square\,g)(Kx)+f(x),$ (4.1)

and the dual problem is to

$\underset{v\in\mathcal{G}}{\text{minimize}}\;(h+f)^{*}(-K^{*}v)+g^{*}(v)+\ell(v),$ (4.2)

under the following assumptions:

  1. (i)

    There exists a point $(x^{\star},v^{\star})\in\mathcal{H}\times\mathcal{G}$ such that the primal-dual gap function defined by

    $G:\mathcal{H}\times\mathcal{G}\to\mathbb{R}\cup\{-\infty,+\infty\}:\ (x,v)\mapsto h(x)+f(x)+\left\langle{Kx}\mid{v}\right\rangle-g^{*}(v)-\ell(v)$ (4.3)

    satisfies the saddle point condition

    $\big(\forall x\in\mathcal{H}\big)\big(\forall v\in\mathcal{G}\big)\;G(x^{\star},v)\leq G(x^{\star},v^{\star})\leq G(x,v^{\star}).$ (4.4)

  2. (ii)

    It is possible to obtain independent and identically distributed (i.i.d.) samples $(\xi_{n},\zeta_{n})_{n\in\mathbb{N}}$ and $(\xi_{n}^{\prime},\zeta_{n}^{\prime})_{n\in\mathbb{N}}$ of $(\xi,\zeta)$.

  3. (iii)

    For all $x\in\mathcal{H}$ and $v\in\mathcal{G}$, we have $\mathsf{E}_{(\xi,\zeta)}[(\nabla H(x,\xi),\nabla L(v,\zeta))]=(\nabla h(x),\nabla\ell(v))$.

Using the standard technique as in [15], we derive from (3.52) the following stochastic primal-dual splitting method, Algorithm 4.2, for solving Problem 4.1. The weak almost sure convergence and the convergence in expectation of the resulting algorithm follow easily from Corollary 3.10, and hence we omit them here.

Algorithm 4.2

Choose $\theta\in[0,1]$. Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a positive sequence and let $(\epsilon_{n})_{n\in\mathbb{N}}\subset[0,+\infty)$ satisfy

$\sum_{n=0}^{+\infty}\epsilon_{n}<+\infty.$

Let $(x_{-1},x_{0})\in\mathcal{H}^{2}$ and $(v_{-1},v_{0})\in\mathcal{G}^{2}$. Iterate

$\operatorname{For}\;n=0,1,\ldots\quad\left\lfloor\begin{array}{l}\alpha_{n}=\begin{cases}\theta&\text{if $x_{n}=x_{n-1}$,}\\ \min\big\{\theta,\frac{\epsilon_{n}}{\|x_{n}-x_{n-1}\|}\big\}&\text{if $x_{n}\neq x_{n-1}$,}\end{cases}\\ w_{n}=x_{n}+\alpha_{n}(x_{n}-x_{n-1}),\\ u_{n}=v_{n}+\alpha_{n}(v_{n}-v_{n-1}),\\ y_{n}=\operatorname{prox}_{\lambda_{n}f}(w_{n}-\lambda_{n}\nabla H(w_{n},\xi_{n})-\lambda_{n}K^{*}u_{n}),\\ z_{n}=\operatorname{prox}_{\lambda_{n}g^{*}}(u_{n}-\lambda_{n}\nabla L(u_{n},\zeta_{n})+\lambda_{n}Kw_{n}),\\ v_{n+1}=z_{n}-\lambda_{n}(\nabla L(z_{n},\zeta_{n}^{\prime})-\nabla L(u_{n},\zeta_{n}))+\lambda_{n}K(y_{n}-w_{n}),\\ x_{n+1}=y_{n}-\lambda_{n}(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla H(w_{n},\xi_{n}))-\lambda_{n}K^{*}(z_{n}-u_{n}).\end{array}\right.$ (4.5)
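
One iteration of (4.5) translates directly into code. The sketch below is our schematic rendering: `prox_f`, `prox_gstar`, the stochastic gradient oracles `gH`, `gL` and the matrix `K` are user-supplied placeholders, and each call to `gH`/`gL` is understood to draw a fresh sample (so the second calls correspond to $\xi_{n}^{\prime}$, $\zeta_{n}^{\prime}$).

```python
import numpy as np

def pd_step(x, x_prev, v, v_prev, lam, eps, theta,
            prox_f, prox_gstar, gH, gL, K):
    d = np.linalg.norm(x - x_prev)
    alpha = theta if d == 0 else min(theta, eps / d)
    w = x + alpha * (x - x_prev)                  # primal inertial point
    u = v + alpha * (v - v_prev)                  # dual inertial point
    rH, rL = gH(w), gL(u)                         # grad H(w_n, xi_n), grad L(u_n, zeta_n)
    y = prox_f(lam, w - lam * rH - lam * (K.T @ u))
    z = prox_gstar(lam, u - lam * rL + lam * (K @ w))
    v_new = z - lam * (gL(z) - rL) + lam * (K @ (y - w))    # uses zeta_n'
    x_new = y - lam * (gH(y) - rH) - lam * (K.T @ (z - u))  # uses xi_n'
    return x_new, x, v_new, v, y, z
```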

For simplicity, set $\mu=\max\{L_{h},L_{\ell}\}$ and define the following notation in the product space $\mathcal{H}\times\mathcal{G}$, where the scalar product and the associated norm are defined in the usual manner:

$\begin{cases}\mathsf{x}=(x,v),\quad\mathsf{w}_{n}=(w_{n},u_{n}),\quad\mathsf{x}_{n}=(x_{n},v_{n}),\quad\mathsf{y}_{n}=(y_{n},z_{n}),\\ \mathsf{r}_{n}=(\nabla H(w_{n},\xi_{n}),\nabla L(u_{n},\zeta_{n})),\\ \mathsf{s}_{n}=(\nabla H(y_{n},\xi_{n}^{\prime}),\nabla L(z_{n},\zeta_{n}^{\prime})),\\ \nabla\boldsymbol{h}(\mathsf{w}_{n})=(\nabla h(w_{n}),\nabla\ell(u_{n})),\quad\nabla\boldsymbol{h}(\mathsf{y}_{n})=(\nabla h(y_{n}),\nabla\ell(z_{n})).\end{cases}$ (4.6)
Theorem 4.3

Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,\frac{1}{\sqrt{1+\epsilon}(\mu+\|K\|)}\right[$ ($\epsilon>0$) such that

$C=\sum_{n\in\mathbb{N}}\Big(\lambda_{n}^{2}\mathsf{E}\|\mathsf{s}_{n}-\nabla\boldsymbol{h}(\mathsf{y}_{n})\|^{2}+(1+\tfrac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}\|\mathsf{r}_{n}-\nabla\boldsymbol{h}(\mathsf{w}_{n})\|^{2}\Big)<\infty.$ (4.7)

For every $N\in\mathbb{N}$, define

$\hat{y}_{N}=\bigg(\sum_{n=0}^{N}\lambda_{n}y_{n}\bigg)\bigg/\bigg(\sum_{n=0}^{N}\lambda_{n}\bigg)\quad\text{and}\quad\hat{z}_{N}=\bigg(\sum_{n=0}^{N}\lambda_{n}z_{n}\bigg)\bigg/\bigg(\sum_{n=0}^{N}\lambda_{n}\bigg).$ (4.8)

Then the following holds:

$\mathsf{E}[G(\hat{y}_{N},v)-G(x,\hat{z}_{N})]\leq\bigg(\frac{1}{2}(1+ST)\big(\|(x_{0},v_{0})-(x,v)\|^{2}+C\big)\bigg)\bigg/\bigg(\sum_{n=0}^{N}\lambda_{n}\bigg),$ (4.9)

where $S=\sum_{n\in\mathbb{N}}\epsilon_{n}$ and $T=\prod_{n=0}^{+\infty}(1+\epsilon_{n}).$
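
In implementations, the weighted ergodic averages (4.8) can be maintained online without storing the whole trajectory; a small helper (our illustration) follows.

```python
class ErgodicAverage:
    # Maintains (sum_n lam_n * y_n) / (sum_n lam_n) online, as in (4.8).
    def __init__(self):
        self.weighted_sum = None
        self.total_weight = 0.0

    def update(self, y, lam):
        self.weighted_sum = lam * y if self.weighted_sum is None \
            else self.weighted_sum + lam * y
        self.total_weight += lam
        return self.weighted_sum / self.total_weight
```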

Proof. Since $\ell$ is convex, we get

$\ell(z_{n})\leq\ell(v)+\left\langle{\nabla\ell(z_{n})}\mid{z_{n}-v}\right\rangle.$ (4.10)

From (4.5), we have

$-(z_{n}-u_{n}+\lambda_{n}\nabla L(u_{n},\zeta_{n})-\lambda_{n}Kw_{n})\in\lambda_{n}\partial g^{*}(z_{n}),$ (4.11)

and hence, using the convexity of $g^{*}$,

$g^{*}(v)-g^{*}(z_{n})\geq\frac{1}{\lambda_{n}}\left\langle{z_{n}-v}\mid{z_{n}-u_{n}+\lambda_{n}\nabla L(u_{n},\zeta_{n})-\lambda_{n}Kw_{n}}\right\rangle.$ (4.12)

Therefore, we derive from (4.10), (4.12) and (4.3) that

$\begin{aligned}G(y_{n},v)-G(y_{n},z_{n})&=\left\langle{Ky_{n}}\mid{v-z_{n}}\right\rangle-g^{*}(v)+g^{*}(z_{n})-\ell(v)+\ell(z_{n})\\ &\leq\left\langle{Ky_{n}}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}+\lambda_{n}\nabla L(u_{n},\zeta_{n})-\lambda_{n}Kw_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})}\mid{z_{n}-v}\right\rangle\\ &=\left\langle{K(y_{n}-w_{n})}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})-\nabla L(u_{n},\zeta_{n})}\mid{z_{n}-v}\right\rangle\\ &=\left\langle{K(y_{n}-w_{n})-\nabla L(z_{n},\zeta_{n}^{\prime})+\nabla L(u_{n},\zeta_{n})}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})-\nabla L(z_{n},\zeta_{n}^{\prime})}\mid{z_{n}-v}\right\rangle\\ &=\left\langle{\frac{v_{n+1}-z_{n}}{\lambda_{n}}}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})-\nabla L(z_{n},\zeta_{n}^{\prime})}\mid{z_{n}-v}\right\rangle.\end{aligned}$ (4.13)

In the same way, we have

$h(y_{n})-h(x)\leq\left\langle{\nabla h(y_{n})}\mid{y_{n}-x}\right\rangle,$ (4.14)

and

$-(y_{n}-w_{n}+\lambda_{n}\nabla H(w_{n},\xi_{n})+\lambda_{n}K^{*}u_{n})\in\lambda_{n}\partial f(y_{n}),$ (4.15)

which implies

$f(y_{n})-f(x)\leq\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}+\lambda_{n}\nabla H(w_{n},\xi_{n})+\lambda_{n}K^{*}u_{n}}\right\rangle.$ (4.16)

In turn, we have

$\begin{aligned}G(y_{n},z_{n})-G(x,z_{n})&=h(y_{n})-h(x)+\left\langle{K(y_{n}-x)}\mid{z_{n}}\right\rangle+f(y_{n})-f(x)\\ &\leq\left\langle{\nabla h(y_{n})}\mid{y_{n}-x}\right\rangle+\left\langle{K(y_{n}-x)}\mid{z_{n}}\right\rangle\\ &\quad+\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}+\lambda_{n}\nabla H(w_{n},\xi_{n})+\lambda_{n}K^{*}u_{n}}\right\rangle\\ &=\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}}\right\rangle+\left\langle{x-y_{n}}\mid{\nabla H(w_{n},\xi_{n})+K^{*}(u_{n}-z_{n})-\nabla H(y_{n},\xi_{n}^{\prime})}\right\rangle\\ &\quad+\left\langle{\nabla h(y_{n})-\nabla H(y_{n},\xi_{n}^{\prime})}\mid{y_{n}-x}\right\rangle\\ &=\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}}\right\rangle+\left\langle{x-y_{n}}\mid{\frac{x_{n+1}-y_{n}}{\lambda_{n}}}\right\rangle\\ &\quad+\left\langle{\nabla h(y_{n})-\nabla H(y_{n},\xi_{n}^{\prime})}\mid{y_{n}-x}\right\rangle.\end{aligned}$ (4.17)

It follows from (4.13) and (4.17) that

$G(y_{n},v)-G(x,z_{n})\leq\frac{1}{\lambda_{n}}\big(\left\langle{\mathsf{x}-\mathsf{y}_{n}}\mid{\mathsf{y}_{n}-\mathsf{w}_{n}}\right\rangle+\left\langle{\mathsf{x}-\mathsf{y}_{n}}\mid{\mathsf{x}_{n+1}-\mathsf{y}_{n}}\right\rangle\big)+\left\langle{\nabla\boldsymbol{h}(\mathsf{y}_{n})-\mathsf{s}_{n}}\mid{\mathsf{y}_{n}-\mathsf{x}}\right\rangle,$ (4.18)

which is equivalent to

$\begin{aligned}2\lambda_{n}\big(G(y_{n},v)-G(x,z_{n})\big)&\leq\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\|\mathsf{y}_{n}-\mathsf{x}\|^{2}-\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}-\big(\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}-\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}-\|\mathsf{y}_{n}-\mathsf{x}\|^{2}\big)\\ &\quad+2\lambda_{n}\left\langle{\nabla\boldsymbol{h}(\mathsf{y}_{n})-\mathsf{s}_{n}}\mid{\mathsf{y}_{n}-\mathsf{x}}\right\rangle\\ &=\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}-\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}+\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}+2\lambda_{n}\left\langle{\nabla\boldsymbol{h}(\mathsf{y}_{n})-\mathsf{s}_{n}}\mid{\mathsf{y}_{n}-\mathsf{x}}\right\rangle,\end{aligned}$ (4.19)

which implies

$\begin{aligned}2\lambda_{n}\mathsf{E}\big[G(y_{n},v)-G(x,z_{n})\big|\mathcal{F}_{n}\big]&\leq\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\mathsf{E}[\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}|\mathcal{F}_{n}]-\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}\\ &\quad+\mathsf{E}[\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}|\mathcal{F}_{n}],\end{aligned}$ (4.20)

where $\mathcal{F}_{n}=\sigma(\xi_{0},\zeta_{0},\xi_{0}^{\prime},\zeta_{0}^{\prime},\ldots,\xi_{n-1},\zeta_{n-1},\xi_{n-1}^{\prime},\zeta_{n-1}^{\prime},\xi_{n},\zeta_{n}).$
Taking the expectation of both sides of the above inequality, we get

$2\lambda_{n}\mathsf{E}\big[G(y_{n},v)-G(x,z_{n})\big]\leq\mathsf{E}\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}-\mathsf{E}\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}+\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}.$ (4.21)

Now we estimate the first and the last terms on the right-hand side of (4.21). For the first term, using (3.5), we have

$\begin{aligned}\|\mathsf{w}_{n}-\mathsf{x}\|^{2}&=\|\mathsf{x}_{n}+\alpha_{n}(\mathsf{x}_{n}-\mathsf{x}_{n-1})-\mathsf{x}\|^{2}=\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\alpha_{n}^{2}\|\mathsf{x}_{n}-\mathsf{x}_{n-1}\|^{2}+2\alpha_{n}\left\langle{\mathsf{x}_{n}-\mathsf{x}}\mid{\mathsf{x}_{n}-\mathsf{x}_{n-1}}\right\rangle\\ &\leq\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\epsilon_{n}^{2}+2\epsilon_{n}\|\mathsf{x}_{n}-\mathsf{x}\|\\ &\leq(1+\epsilon_{n})\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (4.22)

For the last term of (4.21),

$\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}=\|x_{n+1}-y_{n}\|^{2}+\|v_{n+1}-z_{n}\|^{2}.$ (4.23)

From (4.5), we get

$\begin{aligned}\|x_{n+1}-y_{n}\|^{2}&=\|\lambda_{n}(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla H(w_{n},\xi_{n}))+\lambda_{n}K^{*}(z_{n}-u_{n})\|^{2}\\ &=\lambda_{n}^{2}\|(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n}))+\nabla h(y_{n})-\nabla H(w_{n},\xi_{n})+K^{*}(z_{n}-u_{n})\|^{2}\\ &=\lambda_{n}^{2}\bigg(\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}+\|\nabla h(y_{n})-\nabla H(w_{n},\xi_{n})+K^{*}(z_{n}-u_{n})\|^{2}\\ &\quad+2\left\langle{\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})}\mid{\nabla h(y_{n})-\nabla H(w_{n},\xi_{n})+K^{*}(z_{n}-u_{n})}\right\rangle\bigg),\end{aligned}$ (4.24)

which implies that

$\begin{aligned}\mathsf{E}\|x_{n+1}-y_{n}\|^{2}&\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}]\\ &\quad+\lambda_{n}^{2}\mathsf{E}[\|\nabla h(w_{n})-\nabla H(w_{n},\xi_{n})+\nabla h(y_{n})-\nabla h(w_{n})+K^{*}(z_{n}-u_{n})\|^{2}]\\ &\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}]+(1+\tfrac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}[\|\nabla h(w_{n})-\nabla H(w_{n},\xi_{n})\|^{2}]\\ &\quad+(1+\epsilon)\lambda_{n}^{2}\mathsf{E}[\|\nabla h(y_{n})-\nabla h(w_{n})+K^{*}(z_{n}-u_{n})\|^{2}].\end{aligned}$ (4.25)

The last term on the right-hand side of (4.25) can be bounded as

$\begin{aligned}\|\nabla h(y_{n})-\nabla h(w_{n})+K^{*}(z_{n}-u_{n})\|^{2}&=\|\nabla h(y_{n})-\nabla h(w_{n})\|^{2}+\|K^{*}(z_{n}-u_{n})\|^{2}\\ &\quad+2\left\langle{\nabla h(y_{n})-\nabla h(w_{n})}\mid{K^{*}(z_{n}-u_{n})}\right\rangle\\ &\leq L_{h}^{2}\|y_{n}-w_{n}\|^{2}+\|K\|^{2}\|z_{n}-u_{n}\|^{2}+2L_{h}\|K\|\|y_{n}-w_{n}\|\|z_{n}-u_{n}\|.\end{aligned}$ (4.26)

From (4.25) and (4.26), we get

𝖤xn+1\displaystyle\mathsf{E}\|x_{n+1} yn2λn2𝖤[H(yn,ξn)h(yn)2]+(1+1ϵ)λn2𝖤[h(wn)H(wn,ξn)2]\displaystyle-y_{n}\|^{2}\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}]+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}[\|\nabla h(w_{n})-\nabla H(w_{n},\xi_{n})\|^{2}]
+(1+ϵ)λn2𝖤[Lh2ynwn2+K2znun2+2LhKynwnznun].\displaystyle\ +(1+\epsilon)\lambda_{n}^{2}\mathsf{E}[L_{h}^{2}\|y_{n}-w_{n}\|^{2}+\|K\|^{2}\|z_{n}-u_{n}\|^{2}+2L_{h}\|K\|\|y_{n}-w_{n}\|\|z_{n}-u_{n}\|]. (4.27)

Arguing exactly as for (4.27), we have

𝖤vn+1\displaystyle\mathsf{E}\|v_{n+1} zn2λn2𝖤[L(zn,ζn)(zn)2]+(1+1ϵ)λn2𝖤[(un)L(un,ζn)2]\displaystyle-z_{n}\|^{2}\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla L(z_{n},\zeta_{n}^{\prime})-\nabla\ell(z_{n})\|^{2}]+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}[\|\nabla\ell(u_{n})-\nabla L(u_{n},\zeta_{n})\|^{2}]
+(1+ϵ)λn2𝖤[L2znun2+K2ynwn2+2LKznunynwn].\displaystyle\ +(1+\epsilon)\lambda_{n}^{2}\mathsf{E}[L_{\ell}^{2}\|z_{n}-u_{n}\|^{2}+\|K\|^{2}\|y_{n}-w_{n}\|^{2}+2L_{\ell}\|K\|\|z_{n}-u_{n}\|\|y_{n}-w_{n}\|]. (4.28)

Combining (4.27) and (4.28), we derive

𝖤𝗑n+1𝗒n2\displaystyle\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2} λn2𝖤𝗌n𝒉(𝗒n)2+(1+1ϵ)λn2𝖤rn𝒉(𝗐n)2\displaystyle\leq\lambda_{n}^{2}\mathsf{E}\|\mathsf{s}_{n}-\nabla\boldsymbol{h}(\mathsf{y}_{n})\|^{2}+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}\|r_{n}-\nabla\boldsymbol{h}(\mathsf{w}_{n})\|^{2}
\displaystyle\ +(1+\epsilon)(\mu+\|K\|)^{2}\lambda_{n}^{2}\mathsf{E}\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}. (4.29)
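To see how (4.27) and (4.28) combine into (4.29), set a=\|y_{n}-w_{n}\| and b=\|z_{n}-u_{n}\|, so that \|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}=a^{2}+b^{2}, and assume, as the constant suggests, that \mu\geq\max\{L_{h},L_{\ell}\}. Then

\[
L_{h}^{2}a^{2}+L_{\ell}^{2}b^{2}+\|K\|^{2}(a^{2}+b^{2})+2\|K\|(L_{h}+L_{\ell})ab\leq(\mu^{2}+\|K\|^{2})(a^{2}+b^{2})+4\mu\|K\|ab\leq(\mu+\|K\|)^{2}(a^{2}+b^{2}),
\]

where the last step uses 2ab\leq a^{2}+b^{2}; the variance terms of (4.27) and (4.28) merge into the product-space quantities involving \mathsf{s}_{n} and r_{n}.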

From (4.21), (4.22) and (4.29), using the condition on \lambda_{n}, namely \lambda_{n}\leq\frac{1}{\sqrt{1+\epsilon}(\mu+\|K\|)}, we get

2λn𝖤[G(yn,v)G(x,zn)]\displaystyle 2\lambda_{n}\mathsf{E}\big{[}G(y_{n},v)-G(x,z_{n})\big{]} (1+ϵn)𝖤𝗑n𝗑2𝖤𝗑n+1𝗑2+cn,\displaystyle\leq(1+\epsilon_{n})\mathsf{E}\|\mathsf{x}_{n}-\mathsf{x}\|^{2}-\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}+c_{n}, (4.30)

where c_{n}=\lambda_{n}^{2}\mathsf{E}\|\mathsf{s}_{n}-\nabla\boldsymbol{h}(\mathsf{y}_{n})\|^{2}+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}\|r_{n}-\nabla\boldsymbol{h}(\mathsf{w}_{n})\|^{2}. Since the left-hand side of (4.30) is nonnegative whenever (x,v) is a saddle point of G, it follows from (4.30) that

𝖤𝗑n+1𝗑2(1+ϵn)𝖤𝗑n𝗑2+cn.\displaystyle\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}\leq(1+\epsilon_{n})\mathsf{E}\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+c_{n}. (4.31)

Iterating this inequality down to n=0, we obtain

𝖤𝗑n+1𝗑2\displaystyle\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2} i=0n(1+ϵi)𝗑0𝗑2+i=0nj=i+1n(1+ϵj)ci\displaystyle\leq\prod_{i=0}^{n}(1+\epsilon_{i})\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+\sum\limits_{i=0}^{n}\prod_{j=i+1}^{n}(1+\epsilon_{j})c_{i}
T𝗑0𝗑2+Tncn=T𝗑0𝗑2+TC.\displaystyle\leq T\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+T\sum\limits_{n\in\mathbb{N}}c_{n}=T\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+TC. (4.32)
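Here T, S and C are presumably the constants fixed in Theorem 4.3: with S=\sum_{n\in\mathbb{N}}\epsilon_{n}<\infty, the inequality 1+t\leq e^{t} yields a uniform bound on the products,

\[
\prod_{i=0}^{n}(1+\epsilon_{i})\leq\exp\Big(\sum_{i=0}^{n}\epsilon_{i}\Big)\leq e^{S}=:T<\infty,
\]

while C=\sum_{n\in\mathbb{N}}c_{n} is finite under the summability assumptions on the step sizes and the variances of the stochastic gradients.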

Summing (4.30) from n=0 to n=N, telescoping, and discarding the nonpositive term -\mathsf{E}\|\mathsf{x}_{N+1}-\mathsf{x}\|^{2}, we have

\displaystyle 2\sum\limits_{n=0}^{N}\lambda_{n}\mathsf{E}\big{[}G(y_{n},v)-G(x,z_{n})\big{]} \leq\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+\sum\limits_{n=0}^{N}\epsilon_{n}\mathsf{E}\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\sum\limits_{n=0}^{N}c_{n}
𝗑0𝗑2+S(T𝗑0𝗑2+TC)+C\displaystyle\leq\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+S(T\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+TC)+C
=(1+TS)(𝗑0𝗑2+C).\displaystyle=(1+TS)(\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+C). (4.33)

Using the convexity-concavity of GG, we obtain the desired result.     
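For completeness, here is a sketch of the omitted averaging step, assuming Theorem 4.3 is stated for the weighted ergodic iterates \bar{y}_{N}=\big(\sum_{n=0}^{N}\lambda_{n}\big)^{-1}\sum_{n=0}^{N}\lambda_{n}y_{n} and the analogously defined \bar{z}_{N}. Since G(\cdot,v) is convex and G(x,\cdot) is concave, Jensen's inequality and (4.33) give

\[
\mathsf{E}\big[G(\bar{y}_{N},v)-G(x,\bar{z}_{N})\big]\leq\frac{\sum_{n=0}^{N}\lambda_{n}\mathsf{E}\big[G(y_{n},v)-G(x,z_{n})\big]}{\sum_{n=0}^{N}\lambda_{n}}\leq\frac{(1+TS)\big(\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+C\big)}{2\sum_{n=0}^{N}\lambda_{n}},
\]

which is of order \mathcal{O}(1/N) whenever the step sizes are bounded away from zero.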

Remark 4.4

Here are some remarks.

  1. (1)

    The upper bound on (\lambda_{n})_{n\in\mathbb{N}} in Theorem 4.3 can equivalently be written as \dfrac{1-\epsilon^{\prime}}{\mu+\|K\|} with \epsilon^{\prime}\in\,]0,1[. Indeed, \epsilon<\frac{1}{(1-\epsilon^{\prime})^{2}}-1 is equivalent to \sqrt{1+\epsilon}<\frac{1}{1-\epsilon^{\prime}}, so every \lambda_{n}\leq\frac{1-\epsilon^{\prime}}{\mu+\|K\|} satisfies \lambda_{n}<\frac{1}{\sqrt{1+\epsilon}(\mu+\|K\|)}.

  2. (2)

    In the deterministic setting, the convergence rate of the primal-dual gap for structured convex optimization problems involving infimal convolutions was also investigated in [5, 7]. Moreover, in the deterministic case with \alpha_{n}=0 for all n\in\mathbb{N}, our result recovers the one in [7]. The convergence rate \mathcal{O}(1/N) of the primal-dual gap was also obtained in [23].

  3. (3)

    In the stochastic setting, our result achieves the same convergence rate of the primal-dual gap as the ones in [38, 29].
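To make the scheme under study concrete, the following is a minimal numerical sketch (ours, not the authors' code; the toy problem, all identifiers, and the parameter choices are illustrative assumptions) of an inertial stochastic Tseng forward-backward-forward iteration for 0\in Ax+Bx, with A the normal cone of a box, whose resolvent is a projection, and B an affine monotone operator accessed through unbiased noisy samples. With a constant step size and a fixed noise level the iterates only settle in a noise-dominated neighborhood of a solution; the summable-variance conditions of the theorems above are what yield exact convergence.

import numpy as np

rng = np.random.default_rng(0)

# Toy instance of 0 in Ax + Bx:
#   B x = M x + q with M + M^T positive semidefinite, hence B monotone;
#   A = normal cone of the box [-1, 1]^d, whose resolvent is the projection.
d = 5
P = rng.standard_normal((d, d))
M = P - P.T + 0.1 * np.eye(d)   # skew part plus a small symmetric part
q = rng.standard_normal(d)
lip = np.linalg.norm(M, 2)      # Lipschitz constant of B

def B_sample(x, sigma=0.05):
    """Unbiased stochastic oracle: E[B_sample(x)] = M x + q."""
    return M @ x + q + sigma * rng.standard_normal(d)

def J_lamA(x):
    """Resolvent of A = normal cone of [-1, 1]^d: the projection onto the box."""
    return np.clip(x, -1.0, 1.0)

lam = 0.9 / lip                 # step size strictly below 1/L, as in Tseng's method
x_prev = x = np.zeros(d)
for n in range(5000):
    alpha = 1.0 / (n + 2) ** 2                   # summable inertial weights
    w = x + alpha * (x - x_prev)                 # inertial extrapolation
    bw = B_sample(w)                             # sample of B at w, reused below
    y = J_lamA(w - lam * bw)                     # forward-backward step
    x_prev, x = x, y - lam * (B_sample(y) - bw)  # correction with a fresh sample at y

# Residual of the fixed-point characterization x = J_lamA(x - lam * Bx).
res = np.linalg.norm(x - J_lamA(x - lam * (M @ x + q)))
print(f"fixed-point residual after 5000 iterations: {res:.3e}")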

References

  • [1] K. Aoyama, Y. Kimura, and W. Takahashi. Maximal monotone operators and maximal monotone functions for equilibrium problems. Journal of Convex Analysis, 15:395–409, 2008.
  • [2] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, 2017.
  • [3] A. Beck and M. Teboulle. Gradient-based algorithms with applications to signal recovery problems. In: Y. Eldar and D. Palomar (eds), Convex Optimization in Signal Processing and Communications. Cambridge University Press, 2009.
  • [4] P. Bianchi, W. Hachem, and F. Iutzeler. A stochastic coordinate descent primal-dual algorithm and applications to large-scale composite optimization. http://arxiv.org/abs/1407.0898.
  • [5] R. I. Bot and E. R. Csetnek. On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems. Optimization, 64:5–23, 2015.
  • [6] R. I. Bot and E. R. Csetnek. An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algor., 71:519–540, 2016.
  • [7] R. I. Bot and C. Hendrich. Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization. J. Math. Imaging Vis., 49:551–568, 2014.
  • [8] H. Brézis. Opérateurs maximaux monotones. North-Holland Math. Stud., 5:19–51, 1973.
  • [9] H. Brézis. Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. North-Holland Publishing Co., Amsterdam-London; American Elsevier Publishing Co., Inc., New York, 1973.
  • [10] V. Cevher and B. C. Vu. A reflected forward-backward splitting method for monotone inclusions involving Lipschitzian operators. Set-Valued Var. Anal., 29:163–174, 2021.
  • [11] V. Cevher, B. C. Vu, and A. Yurtsever. Stochastic forward Douglas–Rachford splitting method for monotone inclusions. In: P. Giselsson and A. Rantzer (eds), Large-Scale and Distributed Optimization. Lecture Notes in Mathematics, 2227, 2018.
  • [12] A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta Numer., 25:161–319, 2016.
  • [13] W. Cholamjiak, P. Cholamjiak, and S. Suantai. An inertial forward–backward splitting method for solving inclusion problems in Hilbert spaces. J. Fixed Point Theory Appl., 20:42, 2018.
  • [14] P. L. Combettes. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization, 53:475–504, 2004.
  • [15] P. L. Combettes and J. C. Pesquet. Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal., 20:307–330, 2012.
  • [16] P. L. Combettes and J. C. Pesquet. Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optim. Appl., 49:185–212, 2011.
  • [17] P. L. Combettes and J. C. Pesquet. Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim., 25:1221–1248, 2015.
  • [18] P. L. Combettes and J. C. Pesquet. Stochastic approximations and perturbations in forward-backward splitting for monotone operators. Pure Appl. Funct. Anal., 1:13–37, 2016.
  • [19] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul., 4:1168–1200, 2005.
  • [20] S. Cui and V. Shanbhag. Variance-reduced splitting schemes for monotone stochastic generalized equations. https://arxiv.org/pdf/2008.11348.pdf.
  • [21] S. Cui, V. Shanbhag, M. Staudigl, and P. T. Vuong. Stochastic relaxed inertial forward-backward-forward splitting for monotone inclusions in Hilbert spaces. https://arxiv.org/pdf/2107.10335.pdf.
  • [22] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc., 82:421–439, 1956.
  • [23] Y. Drori, S. Sabach, and M. Teboulle. A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett., 43:209–214, 2015.
  • [24] J. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res., 10:2899–2934, 2009.
  • [25] J. Eckstein and D. P. Bertsekas. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program., 55:293–318, 1992.
  • [26] N. Komodakis and J. C. Pesquet. Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Process. Mag., 32:31–54, 2015.
  • [27] M. Ledoux and M. Talagrand. Probability in Banach spaces: isoperimetry and processes. Springer, New York, 1991.
  • [28] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal., 16(6):964–979, 1979.
  • [29] V. D. Nguyen and B. C. Vu. Convergence analysis of the stochastic reflected forward-backward splitting algorithm. Optim. Lett., 2022, https://link.springer.com/article/10.1007/s11590-021-01844-8.
  • [30] H. Ouyang, N. He, L. Tran, and A. Gray. Stochastic alternating direction method of multipliers. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 2013.
  • [31] D. W. Peaceman and H. H. Rachford. The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math., 3:28–41, 1955.
  • [32] J. C. Pesquet and A. Repetti. A class of randomized primal-dual algorithms for distributed optimization. J. Nonlinear Convex Anal., 2015.
  • [33] H. Raguet, J. Fadili, and G. Peyré. A generalized forward-backward splitting. SIAM J. Imaging Sci., 6(3):1199–1226, 2013.
  • [34] H. Robbins and D. Siegmund. A convergence theorem for nonnegative almost supermartingales and some applications. In: J. S. Rustagi (ed), Optimizing Methods in Statistics, Academic Press, pages 233–257, 1971.
  • [35] L. Rosasco, S. Villa, and B. C. Vu. Convergence of stochastic proximal gradient. http://arxiv.org/abs/1403.5074, 2014.
  • [36] L. Rosasco, S. Villa, and B. C. Vu. Stochastic forward-backward splitting for monotone inclusions. J. Optim. Theory Appl., 169:388–406, 2016.
  • [37] L. Rosasco, S. Villa, and B. C. Vu. A stochastic inertial forward-backward splitting algorithm for multivariate monotone inclusions. Optimization, 65:1293–1314, 2016.
  • [38] L. Rosasco, S. Villa, and B. C. Vu. A first-order stochastic primal-dual algorithm with correction step. Numer. Funct. Anal. Optim., 38(5):602–626, 2017.
  • [39] S. Takahashi, W. Takahashi, and M. Toyoda. Strong convergence theorems for maximal monotone operators with nonlinear mappings in Hilbert spaces. J. Optim. Theory Appl., 147:27–41, 2010.
  • [40] D. V. Thong and N. T. Vinh. Inertial methods for fixed point problems and zero point problems of the sum of two monotone mappings. Optimization, 68(5):1037–1072, 2019.
  • [41] P. Tseng. A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim., 38(2):431–446, 2000.
  • [42] B. C. Vu. Almost sure convergence of the forward-backward-forward splitting algorithm. Optim. Lett., 10:781–803, 2016.