
A Tseng-type stochastic forward-backward algorithm for monotone inclusions

Van Dung Nguyen$^{a}$ and Nguyen The Vinh$^{a}$

$^{a}$Department of Mathematical Analysis,
University of Transport and Communications, 3 Cau Giay Street, Hanoi, Vietnam
[email protected], [email protected]
CONTACT: N. T. Vinh. Email: [email protected]
Abstract

In this paper, we propose a stochastic version of the classical Tseng forward-backward-forward method with an inertial term for solving monotone inclusions given by the sum of a maximal monotone operator and a single-valued monotone operator in real Hilbert spaces. We obtain almost sure convergence in the general case and the rate $\mathcal{O}(1/n)$ in expectation in the strongly monotone case. Furthermore, we derive an $\mathcal{O}(1/n)$ convergence rate of the primal-dual gap for saddle point problems.

Keywords: Forward-backward Splitting Algorithm, Monotone Inclusions, Tseng’s Method, Stochastic Algorithm, Convergence Rate.

Mathematics Subject Classifications (2010):

1 Introduction

In this paper, we study the following inclusion problem:

find $x^{*}\in\mathcal{H}$ such that $0\in Ax^{*}+Bx^{*}$, (1.1)

where $\mathcal{H}$ is a separable real Hilbert space, $A:\mathcal{H}\to 2^{\mathcal{H}}$ is a maximal monotone operator and $B:\mathcal{H}\to\mathcal{H}$ is a monotone operator. The solution set of (1.1) is denoted by $(A+B)^{-1}(0)$.

This problem plays an important role in many fields, such as equilibrium problems, fixed point problems, variational inequalities, and composite minimization problems; see, for example, [1, 9, 2]. More precisely, many problems in signal processing, computer vision and machine learning can be modeled in this form; see [19, 33, 3] and the references therein. For solving problem (1.1), the so-called forward-backward splitting method is given as follows:

$x_{n+1}=(I+\lambda A)^{-1}(x_{n}-\lambda Bx_{n}),$ (1.2)

where $\lambda>0$.
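
To fix ideas, iteration (1.2) is straightforward to implement once the resolvent of $A$ is available. The following minimal sketch (our illustration, not taken from the paper) runs (1.2) for the special case $A=\partial(\tau\|\cdot\|_{1})$ and $B=\nabla\,\frac{1}{2}\|M\cdot-b\|^{2}$, where the resolvent of $A$ is the soft-thresholding operator; the names, data and step size are placeholder assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    # Resolvent (I + t*A)^{-1} for A = subdifferential of ||.||_1,
    # i.e. the proximity operator of t*||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(M, b, tau, lam, n_iter=500):
    # Iteration (1.2): forward (gradient) step on B, backward (resolvent) step on A.
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ x - b)                       # B x_n
        x = soft_threshold(x - lam * grad, lam * tau)  # (I + lam*A)^{-1}
    return x
```

In this example the step size `lam` should be taken small enough (below $1/\|M\|^{2}$ is safe) for the scheme to converge.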

The forward-backward splitting algorithm for monotone inclusion problems was first introduced by Lions and Mercier [28]. In the same work, other splitting methods, such as the Peaceman–Rachford algorithm [31] and the Douglas–Rachford algorithm [22], were developed to find the zeros of the sum of two maximal monotone operators. Since then, this approach has been studied extensively in the literature; see, for instance, [41, 10, 14, 6, 13, 25, 40] and the references therein. Recently, stochastic versions of splitting algorithms for monotone inclusions have been proposed, for example the stochastic forward-backward splitting method [37, 36], the stochastic Douglas–Rachford splitting method [11], the stochastic reflected forward-backward splitting method [29] and the stochastic primal-dual method [38]; see also [19, 4, 32] and applications to stochastic optimization [37, 17] and machine learning [35, 24].

Motivated and inspired by the algorithms in [41, 37, 36, 42, 29], we introduce a new stochastic splitting algorithm for inclusion problems and establish its convergence and convergence rate.

The rest of the paper is organized as follows. After collecting some definitions and basic results in Section 2, we prove in Section 3 the almost sure convergence in the general case and the strong convergence, along with the convergence rate, in the strongly monotone case.

In Section 4, we apply the proposed algorithm to the convex-concave saddle point problem.

2 Preliminaries

Let $\mathcal{H}$ be a separable real Hilbert space endowed with the inner product $\left\langle{\cdot}\mid{\cdot}\right\rangle$ and the associated norm $\|\cdot\|$. When $(x_{n})_{n\in\mathbb{N}}$ is a sequence in $\mathcal{H}$, we denote strong convergence of $(x_{n})$ to $x\in\mathcal{H}$ by $x_{n}\to x$ and weak convergence by $x_{n}\rightharpoonup x$.

We recall some well-known definitions.

Definition 2.1

Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a set-valued mapping with nonempty values.

  1. (1)

    $A$ is said to be monotone if for all $x,y\in\mathcal{H}$, $u\in Ax$ and $v\in Ay$, the following inequality holds: $\left\langle{u-v}\mid{x-y}\right\rangle\geq 0$.

  2. (2)

    $A$ is said to be maximally monotone if it is monotone and if, for any $(x,u)\in\mathcal{H}\times\mathcal{H}$, $\left\langle{u-v}\mid{x-y}\right\rangle\geq 0$ for every $(y,v)\in\operatorname{gra}A=\{(x,y):y\in Ax\}$ (the graph of the mapping $A$) implies that $u\in Ax$.

  3. (3)

    We say that $A$ is $\phi_{A}$-uniformly monotone if there exists an increasing function $\phi_{A}\colon\left[0,\infty\right[\to\left[0,\infty\right]$ that vanishes only at $0$ such that

    $\big(\forall(x,u),(y,v)\in\operatorname{gra}A\big)\quad\left\langle{x-y}\mid{u-v}\right\rangle\geq\phi_{A}(\|y-x\|).$ (2.1)

    If $\phi_{A}=\nu_{A}|\cdot|^{2}$ for some $\nu_{A}\in\left]0,\infty\right[$, then we say that $A$ is $\nu_{A}$-strongly monotone.

  4. (4)

    Let $\operatorname{Id}$ denote the identity operator on $\mathcal{H}$ and $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a maximal monotone operator. For each $\lambda>0$, the resolvent mapping $J^{A}_{\lambda}:\mathcal{H}\to\mathcal{H}$ associated with $A$ is defined by

    $J^{A}_{\lambda}(x):=(\operatorname{Id}+\lambda A)^{-1}(x)\quad\forall x\in\mathcal{H}.$ (2.2)
Definition 2.2

A mapping $T:\mathcal{H}\to\mathcal{H}$ is said to be

  1. (1)

    firmly nonexpansive if

    $\|Tx-Ty\|^{2}\leq\left\langle{Tx-Ty}\mid{x-y}\right\rangle\quad\forall x,y\in\mathcal{H},$

    or equivalently

    $\|Tx-Ty\|^{2}\leq\|x-y\|^{2}-\|(I-T)x-(I-T)y\|^{2}\quad\forall x,y\in\mathcal{H};$

  2. (2)

    $L$-Lipschitz continuous with $L>0$ if

    $\|Tx-Ty\|\leq L\|x-y\|\quad\forall x,y\in\mathcal{H}.$ (2.3)

Let $\Gamma_{0}(\mathcal{H})$ be the class of proper lower semicontinuous convex functions from $\mathcal{H}$ to $\left]-\infty,+\infty\right]$.

Definition 2.3

For $f\in\Gamma_{0}(\mathcal{H})$:

  1. (1)

    $\operatorname{dom}f=\{x\in\mathcal{H}:\ f(x)<+\infty\}$. The subdifferential of $f$ at $x\in\operatorname{dom}f$ is

    $\partial f(x)=\{u\in\mathcal{H}:\ \forall z\in\operatorname{dom}f:\ f(z)\geq f(x)+\left\langle{u}\mid{z-x}\right\rangle\}.$

  2. (2)

    The proximity operator of $f$ is

    $\operatorname{prox}_{f}:\mathcal{H}\to\mathcal{H}:x\mapsto\underset{y\in\mathcal{H}}{\operatorname{argmin}}\big(f(y)+\tfrac{1}{2}\|x-y\|^{2}\big).$ (2.4)

  3. (3)

    The conjugate function of $f$ is

    $f^{*}:a\mapsto\underset{x\in\mathcal{H}}{\sup}\big(\left\langle{a}\mid{x}\right\rangle-f(x)\big).$ (2.5)

  4. (4)

    The infimal convolution of two functions $\ell$ and $g$ from $\mathcal{H}$ to $\left]-\infty,+\infty\right]$ is

    $\ell\,\square\,g\colon x\mapsto\inf_{y\in\mathcal{H}}(\ell(y)+g(x-y)).$ (2.6)

Note that $\operatorname{prox}_{f}=J_{\partial f}$ and

$(\forall f\in\Gamma_{0}(\mathcal{H}))\quad(\partial f)^{-1}=\partial f^{*}.$ (2.7)
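
As an aside that is useful when implementing the schemes below (a standard identity, not specific to this paper), the proximity operator of the conjugate can always be computed from that of $f$ via Moreau's decomposition:

$(\forall\lambda>0)(\forall x\in\mathcal{H})\quad\operatorname{prox}_{\lambda f^{*}}(x)=x-\lambda\operatorname{prox}_{\lambda^{-1}f}(x/\lambda).$

In particular, the operator $\operatorname{prox}_{\lambda_{n}g^{*}}$ appearing in Section 4 is available whenever the proximity operator of $g$ is.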

We now recall some results which are needed in the sequel.

Lemma 2.4

([39]) Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a set-valued maximal monotone mapping and $\lambda>0$. Then the domain of the resolvent of $A$ is the whole space, that is $D(J^{A}_{\lambda})=\mathcal{H}$; in addition, $J^{A}_{\lambda}$ is a single-valued and firmly nonexpansive mapping.

Lemma 2.5

([8], Lemma 2.4) Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a maximal monotone mapping and $B:\mathcal{H}\to\mathcal{H}$ be a Lipschitz continuous and monotone mapping. Then the mapping $A+B$ is maximal monotone.

Following [27], let $(\Omega,\mathcal{F},\mathsf{P})$ be a probability space. An $\mathcal{H}$-valued random variable is a measurable function $X:\Omega\to\mathcal{H}$, where $\mathcal{H}$ is endowed with the Borel $\sigma$-algebra. We denote by $\sigma(X)$ the $\sigma$-field generated by $X$. The expectation of a random variable $X$ is denoted by $\mathsf{E}[X]$. The conditional expectation of $X$ given a $\sigma$-field $\mathcal{A}\subset\mathcal{F}$ is denoted by $\mathsf{E}[X|\mathcal{A}]$. An $\mathcal{H}$-valued random process is a sequence $(x_{n})$ of $\mathcal{H}$-valued random variables. The abbreviation a.s. stands for 'almost surely'.

Lemma 2.6

([34, Theorem 1]) Let $(\mathcal{F}_{n})_{n\in\mathbb{N}}$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$, and let $(z_{n})_{n\in\mathbb{N}},\ (\beta_{n})_{n\in\mathbb{N}},\ (\theta_{n})_{n\in\mathbb{N}}$ and $(\gamma_{n})_{n\in\mathbb{N}}$ be $[0,+\infty]$-valued random sequences such that, for every $n\in\mathbb{N}$, $z_{n},\ \beta_{n},\ \theta_{n}$ and $\gamma_{n}$ are $\mathcal{F}_{n}$-measurable. Suppose that $\sum_{n\in\mathbb{N}}\gamma_{n}<+\infty,\ \sum_{n\in\mathbb{N}}\beta_{n}<+\infty$ a.s. and

$(\forall n\in\mathbb{N})\quad\mathsf{E}[z_{n+1}|\mathcal{F}_{n}]\leq(1+\gamma_{n})z_{n}+\beta_{n}-\theta_{n}$ a.s.

Then $(z_{n})_{n\in\mathbb{N}}$ converges a.s. and $(\theta_{n})_{n\in\mathbb{N}}$ is summable a.s.

According to the proof of [17, Proposition 2.3], we have the following lemma.

Lemma 2.7

Let $C$ be a nonempty closed subset of $\mathcal{H}$ and let $(x_{n})_{n\in\mathbb{N}}$ be an $\mathcal{H}$-valued random process. Suppose that, for every $x\in C$, $(\|x_{n+1}-x\|)_{n\in\mathbb{N}}$ converges a.s. Suppose that the set of weak sequential cluster points of $(x_{n})_{n\in\mathbb{N}}$ is a subset of $C$ a.s. Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly a.s. to a $C$-valued random vector.

3 Main results

In this section, we propose a novel stochastic forward-backward-forward algorithm for solving problem (1.1) and analyse its convergence behaviour. Unless otherwise specified, we assume that the following assumptions are satisfied from now on.

Assumption 3.1

In what follows we suppose the following assumptions on $A$ and $B$:

  1. (A1)

    The mapping $B:\mathcal{H}\to\mathcal{H}$ is $L$-Lipschitz continuous and monotone;

  2. (A2)

    The set-valued mapping $A:\mathcal{H}\to 2^{\mathcal{H}}$ is maximal monotone;

  3. (A3)

    The solution set $\mathcal{P}=\operatorname{zer}(A+B)=(A+B)^{-1}(0)\neq\emptyset$.

The algorithm is designed as follows.

Algorithm 3.2

Step 0: (Initialization) Choose $\theta\in[0,1]$. Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a positive sequence and let $(\epsilon_{n})_{n\in\mathbb{N}}\subset[0,+\infty)$ satisfy

$\sum_{n=0}^{+\infty}\epsilon_{n}<+\infty.$ (3.1)

Let $x_{-1},\ x_{0}$ be $\mathcal{H}$-valued, square-integrable random variables and set $n=0$.
Step 1: Given $x_{n-1},\ x_{n}$ ($n\geq 0$), choose $\alpha_{n}$ such that

$\alpha_{n}=\begin{cases}\min\big\{\frac{\epsilon_{n}}{\|x_{n}-x_{n-1}\|},\theta\big\}&\text{if $x_{n}\neq x_{n-1}$,}\\ \theta&\text{if $x_{n}=x_{n-1}$.}\end{cases}$ (3.3)

Let $r_{n}$ be a random vector. Compute

$w_{n}=x_{n}+\alpha_{n}(x_{n}-x_{n-1}),$
$y_{n}=(I+\lambda_{n}A)^{-1}(w_{n}-\lambda_{n}r_{n}).$

Step 2: Let $s_{n}$ be an unbiased estimator of $By_{n}$, i.e., $\mathsf{E}[s_{n}|\mathcal{F}_{n}]=By_{n}$. Calculate the next iterate as

$x_{n+1}=y_{n}-\lambda_{n}(s_{n}-r_{n}),$ (3.4)

where $\mathcal{F}_{n}=\sigma(x_{-1},x_{0},r_{0},x_{1},r_{1},\ldots,x_{n},r_{n})$.
Set $n:=n+1$ and return to Step 1.
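
For concreteness, a full run of Algorithm 3.2 can be sketched as follows. This is our illustrative translation: `resolvent`, `r_oracle` and `s_oracle` are user-supplied placeholders for $(I+\lambda_{n}A)^{-1}$, the estimate $r_{n}$ of $Bw_{n}$ and the unbiased estimate $s_{n}$ of $By_{n}$, respectively, and `lam`, `eps` stand for the sequences $(\lambda_{n})$, $(\epsilon_{n})$.

```python
import numpy as np

def stochastic_fbf(x0, resolvent, r_oracle, s_oracle, lam, eps,
                   theta=0.5, n_iter=1000):
    x_prev, x = x0.copy(), x0.copy()
    for n in range(n_iter):
        d = np.linalg.norm(x - x_prev)
        alpha = theta if d == 0 else min(eps[n] / d, theta)  # rule (3.3)
        w = x + alpha * (x - x_prev)                         # inertial point w_n
        r = r_oracle(w)                                      # estimate of B w_n
        y = resolvent(lam[n], w - lam[n] * r)                # backward step
        s = s_oracle(y)                                      # unbiased estimate of B y_n
        x_prev, x = x, y - lam[n] * (s - r)                  # correction step (3.4)
    return x
```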

Remark 3.3

Some remarks on the algorithm are in order.

  1. (1)

    Algorithm 3.2 extends the forward-backward-forward splitting method in [41], which is stated in the deterministic setting. In our setting, we do not need the cocoercivity condition used in [17, 18, 36].

  2. (2)

    When $\alpha_{n}=0$, Algorithm 3.2 reduces to (3.2) in [42]. However, the conditions for convergence are different from those in [42].

  3. (3)

    In [20, 21], for solving (1.1), the authors designed stochastic forward-backward-forward splitting methods which require a large number of samples in each iteration. Our results also differ from those in [20, 21].

  4. (4)

    Evidently, we have from (3.3) that

    $\alpha_{n}\|x_{n}-x_{n-1}\|\leq\epsilon_{n}.$ (3.5)
Lemma 3.4

Let $(x_{n})$ be generated by Algorithm 3.2. Then, for any $p\in\mathcal{P}$,

$\begin{aligned}\|x_{n+1}-p\|^{2}&\leq\|w_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\big(\|s_{n}-By_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}\big)\\ &\quad+2\lambda_{n}^{2}\big(\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle\big)\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle+2\lambda_{n}\left\langle{By_{n}-s_{n}}\mid{y_{n}-p}\right\rangle.\end{aligned}$ (3.6)

Proof. We have

$\begin{aligned}\|x_{n+1}-p\|^{2}&=\|y_{n}-\lambda_{n}(s_{n}-r_{n})-p\|^{2}\\ &=\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})+w_{n}-p\|^{2}\\ &=\|w_{n}-p\|^{2}+\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})\|^{2}+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{w_{n}-p}\right\rangle\\ &=\|w_{n}-p\|^{2}+\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})\|^{2}+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{w_{n}-y_{n}}\right\rangle\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{y_{n}-p}\right\rangle\\ &=\|w_{n}-p\|^{2}+\|y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})\|^{2}-2\|w_{n}-y_{n}\|^{2}+2\lambda_{n}\left\langle{s_{n}-r_{n}}\mid{y_{n}-w_{n}}\right\rangle\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(s_{n}-r_{n})}\mid{y_{n}-p}\right\rangle\\ &=\|w_{n}-p\|^{2}-\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\|s_{n}-r_{n}\|^{2}\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle+2\lambda_{n}\left\langle{By_{n}-s_{n}}\mid{y_{n}-p}\right\rangle.\end{aligned}$ (3.7)

Note that

$\begin{aligned}\|s_{n}-r_{n}\|^{2}&=\|s_{n}-By_{n}+By_{n}-r_{n}\|^{2}\\ &=\|s_{n}-By_{n}\|^{2}+2\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\|By_{n}-Bw_{n}+Bw_{n}-r_{n}\|^{2}\\ &=\|s_{n}-By_{n}\|^{2}+2\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\|By_{n}-Bw_{n}\|^{2}+\|Bw_{n}-r_{n}\|^{2}\\ &\quad+2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle\\ &\leq\|s_{n}-By_{n}\|^{2}+2\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+L^{2}\|y_{n}-w_{n}\|^{2}+\|Bw_{n}-r_{n}\|^{2}\\ &\quad+2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle.\end{aligned}$ (3.8)

By combining (3.7) and (3.8) we obtain

$\begin{aligned}\|x_{n+1}-p\|^{2}&\leq\|w_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\big(\|s_{n}-By_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}\big)\\ &\quad+2\lambda_{n}^{2}\big(\left\langle{s_{n}-By_{n}}\mid{By_{n}-r_{n}}\right\rangle+\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle\big)\\ &\quad+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle+2\lambda_{n}\left\langle{By_{n}-s_{n}}\mid{y_{n}-p}\right\rangle.\end{aligned}$

The proof is complete. $\square$

Theorem 3.5

Let $(x_{n})_{n\in\mathbb{N}}$ be generated by Algorithm 3.2. The following hold.

  1. (i)

    Assume that $(\lambda_{n})_{n\in\mathbb{N}}$ is a sequence in $\left]\epsilon,\frac{1-\epsilon}{L}\right[$ and that the following conditions are satisfied for $\mathcal{F}_{n}=\sigma(x_{-1},x_{0},r_{0},x_{1},r_{1},\ldots,x_{n},r_{n})$:

    $\sum_{n\in\mathbb{N}}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty$ a.s. and $\sum_{n\in\mathbb{N}}\|r_{n}-Bw_{n}\|^{2}<+\infty$ a.s. (3.9)

    Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly a.s. to a random variable $\overline{x}\colon\Omega\to\operatorname{zer}(A+B)$.

  2. (ii)

    Suppose that $A$ or $B$ is uniformly monotone. Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,+\infty\right[$ such that $(\lambda_{n})_{n\in\mathbb{N}}\in\ell_{2}(\mathbb{N})\backslash\ell_{1}(\mathbb{N})$ and

    $\sum_{n\in\mathbb{N}}\lambda^{2}_{n}\|r_{n}-Bw_{n}\|^{2}<\infty$ a.s. and $\sum_{n\in\mathbb{N}}\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty$ a.s., (3.10)

    where $(\forall p\in\left]0,\infty\right[)\;\ell_{p}(\mathbb{N})=\big\{(\lambda_{n})_{n\in\mathbb{N}}\;|\;(\forall n\in\mathbb{N})\;\lambda_{n}\in\mathbb{R},\ \sum_{n\in\mathbb{N}}|\lambda_{n}|^{p}<+\infty\big\}$. Then $(x_{n})_{n\in\mathbb{N}}$ converges strongly a.s. to the unique solution $\overline{x}$.

Proof. Taking the conditional expectation given $\mathcal{F}_{n}$ on both sides of (3.6) in Lemma 3.4 and using $\mathsf{E}[s_{n}|\mathcal{F}_{n}]=By_{n}$, we get

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq\|w_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]+\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}\\ &\quad+2\lambda_{n}^{2}\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle+2\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle.\end{aligned}$ (3.11)

Since $y_{n}=(I+\lambda_{n}A)^{-1}(w_{n}-\lambda_{n}r_{n})$, we obtain

$\frac{w_{n}-y_{n}}{\lambda_{n}}-r_{n}\in Ay_{n},$

which is equivalent to

$\frac{w_{n}-y_{n}}{\lambda_{n}}-r_{n}+By_{n}\in(A+B)y_{n}.$

Since $0\in(A+B)p$, using the uniform monotonicity of $A+B$ (with modulus $\phi$; the choice $\phi=0$ covers the merely monotone case), we get

$\left\langle{\frac{w_{n}-y_{n}}{\lambda_{n}}-r_{n}+By_{n}}\mid{y_{n}-p}\right\rangle\geq\phi(\|y_{n}-p\|),$ (3.12)

which implies

$\left\langle{y_{n}-w_{n}-\lambda_{n}(By_{n}-r_{n})}\mid{y_{n}-p}\right\rangle\leq-\lambda_{n}\phi(\|y_{n}-p\|).$ (3.13)

Using (3.5) and the Cauchy-Schwarz inequality, we estimate the term $\|w_{n}-p\|^{2}$ in (3.11) as follows:

$\begin{aligned}\|w_{n}-p\|^{2}&=\|x_{n}+\alpha_{n}(x_{n}-x_{n-1})-p\|^{2}\\ &=\|x_{n}-p\|^{2}+2\alpha_{n}\left\langle{x_{n}-p}\mid{x_{n}-x_{n-1}}\right\rangle+\alpha_{n}^{2}\|x_{n}-x_{n-1}\|^{2}\\ &\leq\|x_{n}-p\|^{2}+2\alpha_{n}\|x_{n}-p\|\|x_{n}-x_{n-1}\|+\alpha_{n}^{2}\|x_{n}-x_{n-1}\|^{2}\\ &\leq\|x_{n}-p\|^{2}+2\epsilon_{n}\|x_{n}-p\|+\epsilon_{n}^{2}\\ &\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}+\epsilon_{n}^{2}+\epsilon_{n},\end{aligned}$ (3.14)

where the last step uses $2\|x_{n}-p\|\leq 1+\|x_{n}-p\|^{2}$.

Therefore, from (3.11), using (3.13) and (3.14), we derive

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}+2\lambda_{n}^{2}\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle-2\lambda_{n}\phi(\|y_{n}-p\|)+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.15)

(i) In the general case, i.e. $\phi=0$, we have

$\begin{aligned}2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle&\leq 2\|By_{n}-Bw_{n}\|\|Bw_{n}-r_{n}\|\\ &\leq\frac{\epsilon}{1-\epsilon}\|By_{n}-Bw_{n}\|^{2}+\frac{1-\epsilon}{\epsilon}\|r_{n}-Bw_{n}\|^{2}\\ &\leq\frac{\epsilon}{1-\epsilon}L^{2}\|y_{n}-w_{n}\|^{2}+\frac{1-\epsilon}{\epsilon}\|r_{n}-Bw_{n}\|^{2}.\end{aligned}$ (3.16)

Hence, (3.15) implies that

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-\big(1-\lambda_{n}^{2}L^{2}(1+\tfrac{\epsilon}{1-\epsilon})\big)\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+\lambda_{n}^{2}(1+\tfrac{1-\epsilon}{\epsilon})\|r_{n}-Bw_{n}\|^{2}+\epsilon_{n}^{2}+\epsilon_{n}\\ &\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-\epsilon\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]+\frac{\lambda_{n}^{2}}{\epsilon}\|r_{n}-Bw_{n}\|^{2}\\ &\quad+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.17)

Since $\sum_{n=1}^{\infty}\epsilon_{n}<\infty$, we also have $\sum_{n=1}^{\infty}\epsilon_{n}^{2}<\infty$. Therefore, using the conditions of the theorem and Lemma 2.6, (3.17) implies that

$\|x_{n}-p\|$ converges and $\|w_{n}-y_{n}\|\to 0$ a.s.

Consequently,

$\begin{aligned}\|x_{n}-y_{n}\|&\leq\|x_{n}-w_{n}\|+\|w_{n}-y_{n}\|\\ &\leq\alpha_{n}\|x_{n}-x_{n-1}\|+\|w_{n}-y_{n}\|\to 0.\end{aligned}$ (3.18)

Let us set

$z_{n}=(I+\lambda_{n}A)^{-1}(w_{n}-\lambda_{n}Bw_{n}).$ (3.19)

Then, since $J_{\lambda_{n}A}$ is nonexpansive, we have

$\|y_{n}-z_{n}\|\leq\lambda_{n}\|Bw_{n}-r_{n}\|\to 0$ a.s. (3.20)

Hence

$\|w_{n}-z_{n}\|\leq\|w_{n}-y_{n}\|+\|y_{n}-z_{n}\|\to 0$ a.s. (3.21)

Let $x^{*}$ be a weak cluster point of $(x_{n})_{n\in\mathbb{N}}$. Then there exists a subsequence $(x_{n_{k}})_{k\in\mathbb{N}}$ which converges weakly to $x^{*}$ a.s. By (3.18), $y_{n_{k}}\rightharpoonup x^{*}$ a.s., and hence, by (3.20), $z_{n_{k}}\rightharpoonup x^{*}$. Since $z_{n_{k}}=(I+\lambda_{n_{k}}A)^{-1}(w_{n_{k}}-\lambda_{n_{k}}Bw_{n_{k}})$, we have

$\frac{w_{n_{k}}-z_{n_{k}}}{\lambda_{n_{k}}}-Bw_{n_{k}}+Bz_{n_{k}}\in(A+B)z_{n_{k}}.$ (3.22)

Since $B$ is $L$-Lipschitz and $(\lambda_{n})_{n\in\mathbb{N}}$ is bounded away from $0$, it follows that

$\frac{w_{n_{k}}-z_{n_{k}}}{\lambda_{n_{k}}}-Bw_{n_{k}}+Bz_{n_{k}}\to 0$ a.s. (3.23)

By [2, Corollary 25.5], the sum $A+B$ is maximally monotone and hence its graph is closed in $\mathcal{H}^{\mathrm{weak}}\times\mathcal{H}^{\mathrm{strong}}$ [2, Proposition 20.38]. Therefore, $0\in(A+B)x^{*}$ a.s., that is, $x^{*}\in\mathcal{P}$ a.s. By Lemma 2.7, the sequence $(x_{n})_{n\in\mathbb{N}}$ converges weakly to some $\bar{x}\in\mathcal{P}$ a.s., which proves (i).

(ii) Now suppose that $A+B$ is uniformly monotone. We rewrite (3.15) as

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}+2\lambda_{n}^{2}\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle-2\lambda_{n}\phi(\|y_{n}-p\|)\\ &\quad+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.24)

We have

$\begin{aligned}2\left\langle{By_{n}-Bw_{n}}\mid{Bw_{n}-r_{n}}\right\rangle&\leq\|By_{n}-Bw_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}\\ &\leq L^{2}\|y_{n}-w_{n}\|^{2}+\|r_{n}-Bw_{n}\|^{2}.\end{aligned}$ (3.25)

Using (3.25) in (3.24), we obtain

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-2\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+2\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}-2\lambda_{n}\phi(\|y_{n}-p\|)+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.26)

From $\sum_{n\in\mathbb{N}}\lambda_{n}^{2}<+\infty$, we derive $\lim_{n\to+\infty}\lambda_{n}=0$. Moreover,

$\begin{cases}\sum_{n\in\mathbb{N}}\epsilon_{n}<+\infty,\\ \sum_{n\in\mathbb{N}}\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty\ \ \text{a.s.},\\ \sum_{n\in\mathbb{N}}\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}<+\infty\ \ \text{a.s.}\end{cases}$ (3.27)

Therefore (3.26), (3.27) and Lemma 2.6 imply that

$\|x_{n}-p\|$ converges and $\sum_{n\in\mathbb{N}}\lambda_{n}\phi(\|y_{n}-p\|)<+\infty$ a.s. (3.28)

Since $\sum_{n\in\mathbb{N}}\lambda_{n}=\infty$, it follows from $\sum_{n\in\mathbb{N}}\lambda_{n}\phi(\|y_{n}-p\|)<+\infty$ that $\liminf_{n\to\infty}\phi(\|y_{n}-p\|)=0$. Thus, there exists a subsequence $(n_{k})_{k\in\mathbb{N}}$ such that $\|y_{n_{k}}-p\|\to 0$. It then follows from (3.18) that $\|x_{n_{k}}-p\|\to 0$ a.s. Since $\|x_{n}-p\|$ converges a.s., we infer that $\|x_{n}-p\|\to 0$ a.s. This completes the proof. $\square$

Remark 3.6

With respect to Theorem 3.5, we observe the following.

  1. (1)

    The conditions in (3.10) are satisfied if the sequences $(\|r_{n}-Bw_{n}\|^{2})_{n\in\mathbb{N}}$ and $(\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}])_{n\in\mathbb{N}}$ are bounded.

  2. (2)

    Theorem 3.5 removes assumption (iii) of Theorem 3.2 in [37], i.e. $\sup_{n\in\mathbb{N}}\|x_{n}-x_{n-1}\|^{2}<+\infty$ and $\sum_{n\in\mathbb{N}}\alpha_{n}<+\infty$.

  3. (3)

    The algorithm (3.2) of [42] is a particular case of our algorithm with $\alpha_{n}=0$. The condition (3.9), i.e. $\sum_{n\in\mathbb{N}}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]<+\infty$ and $\sum_{n\in\mathbb{N}}\|r_{n}-Bw_{n}\|^{2}<+\infty$, is weaker than the conditions $\sum_{n\in\mathbb{N}}\sqrt{\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]}<+\infty$ and $\sum_{n\in\mathbb{N}}\sqrt{\|r_{n}-Bw_{n}\|^{2}}<+\infty$ of Theorem 3.1 in [42].

  4. (4)

    In the general case, i.e. $A$ maximally monotone and $B$ monotone and Lipschitz, the algorithms proposed in [20, 21] require a large number of samples in each iteration, all of which are unbiased estimates. In contrast, Algorithm 3.2 only requires $s_{n}$ to be an unbiased estimate. The range of the step size $\lambda_{n}$ in Theorem 3.5 is also wider than that in Theorem 1 of [20]. When $r_{n}$ and $s_{n}$ are averages of samples as in [20, 21], we obtain the same results as there.

For the convergence rate in the uniformly monotone case, we define the function

$\varphi_{c}:\ ]0,+\infty[\to\mathbb{R}:\ t\mapsto\begin{cases}\frac{t^{c}-1}{c}&\text{if}\ c\neq 0,\\ \log t&\text{if}\ c=0.\end{cases}$ (3.29)

The following lemma establishes a nonasymptotic bound for numerical sequences satisfying a given recursive inequality. Its proof is similar to that of Lemma 3.1 in [36].

Lemma 3.7

Let $\alpha\in]\frac{1}{2},1]$ and $\beta>1$. Let $a,b\in]0,+\infty[$ with $a\leq\beta$. Set $\alpha_{n}=\frac{a}{n^{\alpha}}$ and $\beta_{n}=\frac{b}{n^{\beta}}$. Let $(s_{n})_{n\in\mathbb{N}}$ be a sequence such that

$(\forall n\in\mathbb{N})\ \ 0\leq s_{n+1}\leq(1-\alpha_{n})s_{n}+\beta_{n}.$ (3.30)

Let $n_{0}$ be the smallest integer such that $\alpha_{n}<1$ for every $n\geq n_{0}>1$, and set $t=1-2^{\alpha-1}\geq 0$. Then for every $n\geq 2n_{0}$, the following hold.

  1. (i)

    If $\alpha=1$, then

    $s_{n+1}\leq s_{n_{0}}\big(\frac{n_{0}}{n+1}\big)^{a}+\frac{b}{(n+1)^{a}}\big(1+\frac{1}{n_{0}}\big)^{a}\varphi_{a+1-\beta}(n).$ (3.31)

  2. (ii)

    If $1/2<\alpha<1$, then

    $s_{n+1}\leq\bigg(b\varphi_{1-\beta}(n)+s_{n_{0}}\exp\bigg(\frac{an_{0}^{1-\alpha}}{1-\alpha}\bigg)\bigg)\exp\bigg(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\bigg)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}.$ (3.32)

Proof. Recall the definition of $\varphi_{c}$ in (3.29). Note that $\varphi_{c}$ is an increasing function and, for $\delta\geq 0$ and $2\leq m\leq n$,

$\varphi_{1-\delta}(n+1)-\varphi_{1-\delta}(m)\leq\sum_{k=m}^{n}k^{-\delta}\leq\varphi_{1-\delta}(n).$ (3.33)

We have

$s_{n+1}\leq s_{n_{0}}\prod_{k=n_{0}}^{n}(1-\alpha_{k})+\sum_{k=n_{0}}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}.$ (3.34)

Let us estimate the first term on the right-hand side of (3.34). Using (3.33) and the inequality $1-x\leq e^{-x}\ \forall x\in\mathbb{R}$, we have

$\prod_{k=n_{0}}^{n}(1-\alpha_{k})=\prod_{k=n_{0}}^{n}(1-ak^{-\alpha})\leq e^{-a\sum_{k=n_{0}}^{n}k^{-\alpha}}\leq\begin{cases}\big(\frac{n_{0}}{n+1}\big)^{a}&\text{if}\ \alpha=1,\\ \exp\big(\frac{a}{1-\alpha}(n_{0}^{1-\alpha}-(n+1)^{1-\alpha})\big)&\text{if}\ \frac{1}{2}<\alpha<1.\end{cases}$ (3.35)

To estimate the second term on the right-hand side of (3.34), consider first the case $\alpha=1$. We have

$\begin{aligned}\sum_{k=n_{0}}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}&\leq\sum_{k=n_{0}}^{n}e^{-a\sum_{i=k+1}^{n}i^{-\alpha}}\beta_{k}\\ &\leq\sum_{k=n_{0}}^{n}\big(\frac{k+1}{n+1}\big)^{a}\frac{b}{k^{\beta}}=\frac{b}{(n+1)^{a}}\sum_{k=n_{0}}^{n}(1+\tfrac{1}{k})^{a}k^{a-\beta}\\ &\leq\frac{b}{(n+1)^{a}}\big(1+\frac{1}{n_{0}}\big)^{a}\varphi_{a+1-\beta}(n).\end{aligned}$ (3.36)

From (3.34), using (3.35) and (3.36), for $\alpha=1$ we get

$s_{n+1}\leq s_{n_{0}}\big(\frac{n_{0}}{n+1}\big)^{a}+\frac{b}{(n+1)^{a}}\big(1+\frac{1}{n_{0}}\big)^{a}\varphi_{a+1-\beta}(n).$ (3.37)

We next estimate the second term on the right-hand side of (3.34) in the case $1/2<\alpha<1$. Let $m\in\mathbb{N}$ be such that $n_{0}\leq n/2\leq m+1\leq(n+1)/2$. We have

$\begin{aligned}\sum_{k=n_{0}}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}&=\sum_{k=n_{0}}^{m}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}+\sum_{k=m+1}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\beta_{k}\\ &\leq\exp\big(-\sum_{i=m+1}^{n}\alpha_{i}\big)\sum_{k=n_{0}}^{m}\beta_{k}+\frac{b}{a\,m^{\beta-\alpha}}\sum_{k=m+1}^{n}\prod_{i=k+1}^{n}(1-\alpha_{i})\alpha_{k}\\ &\leq b\exp\big(-a\sum_{i=m+1}^{n}i^{-\alpha}\big)\sum_{k=n_{0}}^{m}k^{-\beta}+\frac{b}{a\,m^{\beta-\alpha}}\sum_{k=m+1}^{n}\big(\prod_{i=k+1}^{n}(1-\alpha_{i})-\prod_{i=k}^{n}(1-\alpha_{i})\big)\\ &\leq b\exp\big(\frac{a}{1-\alpha}((m+1)^{1-\alpha}-(n+1)^{1-\alpha})\big)\varphi_{1-\beta}(n)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}\\ &\leq b\exp\big(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\big)\varphi_{1-\beta}(n)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}.\end{aligned}$ (3.38)

Combining (3.35) and (3.38), for $1/2<\alpha<1$, we have

$\begin{aligned}s_{n+1}&\leq s_{n_{0}}\exp\big(\frac{a}{1-\alpha}(n_{0}^{1-\alpha}-(n+1)^{1-\alpha})\big)+b\exp\big(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\big)\varphi_{1-\beta}(n)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}\\ &\leq\bigg(b\varphi_{1-\beta}(n)+s_{n_{0}}\exp\bigg(\frac{an_{0}^{1-\alpha}}{1-\alpha}\bigg)\bigg)\exp\bigg(\frac{-at(n+1)^{1-\alpha}}{1-\alpha}\bigg)+\frac{b2^{\beta-\alpha}}{a(n-2)^{\beta-\alpha}}.\end{aligned}$ (3.39)

$\square$
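As a quick numerical sanity check of case (i) (our illustration, not part of the proof), one can iterate the recursion (3.30) with $\alpha=1$, $a=2$, $b=1$, $\beta=2$; since $a>\beta-1$, the bound (3.31) predicts $s_{n}=\mathcal{O}(n^{1-\beta})=\mathcal{O}(1/n)$, and indeed the product $n\,s_{n}$ stabilizes:

```python
def run_recursion(n_max, a=2.0, b=1.0, beta=2.0):
    # Iterate (3.30) with alpha = 1, starting where a/n < 1.
    s = 1.0
    for n in range(3, n_max + 1):
        s = (1.0 - a / n) * s + b / n ** beta
    return s

for n_max in (10**3, 10**4, 10**5):
    s = run_recursion(n_max)
    print(n_max, s, n_max * s)  # n * s_n approaches a constant
```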

Theorem 3.8

Suppose that $A$ or $B$ is $\mu$-strongly monotone. For $\alpha\in]1/2,1]$ and $a>0$, define

$(\forall n\in\mathbb{N})\quad\lambda_{n}=\frac{4a}{\mu n^{\alpha}}.$ (3.40)

Suppose that there exist constants $c$ and $\theta>1$ such that

$(\forall n\in\mathbb{N})\;\mathsf{E}[(2\|r_{n}-Bw_{n}\|^{2}+\|s_{n}-By_{n}\|^{2})|\mathcal{F}_{n}]\leq c$ a.s. (3.41)

and $\epsilon_{n}=\mathcal{O}(n^{-\theta})$. Set $\beta=\min\{2\alpha,\theta\}$ and assume that $a\leq\beta$. Then

$\mathsf{E}[\|x_{n}-p\|^{2}]=\begin{cases}\mathcal{O}(n^{\alpha-\beta})&\text{if}\ 1/2<\alpha<1,\\ \mathcal{O}(n^{-a})+\mathcal{O}(n^{1-\beta})&\text{if}\ \alpha=1,\ a\neq\beta-1,\\ \mathcal{O}(n^{-a})+\mathcal{O}(\frac{\ln n}{n^{a}})&\text{if}\ \alpha=1,\ a=\beta-1.\end{cases}$ (3.42)

Proof. Using the strong monotonicity of $A+B$ in (3.26), we obtain

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n})\|x_{n}-p\|^{2}-(1-2\lambda_{n}^{2}L^{2})\|w_{n}-y_{n}\|^{2}+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]\\ &\quad+2\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}-\lambda_{n}\mu\|y_{n}-p\|^{2}+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.43)

Using the Cauchy-Schwarz inequality and (3.5), we have

$\begin{aligned}\|x_{n}-p\|^{2}&\leq 2\big(\|x_{n}-y_{n}\|^{2}+\|y_{n}-p\|^{2}\big)\\ &\leq 4\big(\|x_{n}-w_{n}\|^{2}+\|w_{n}-y_{n}\|^{2}\big)+2\|y_{n}-p\|^{2}\\ &\leq 4\epsilon_{n}^{2}+4\|w_{n}-y_{n}\|^{2}+2\|y_{n}-p\|^{2},\end{aligned}$ (3.44)

which is equivalent to

$\|y_{n}-p\|^{2}\geq\frac{\|x_{n}-p\|^{2}}{2}-2\epsilon_{n}^{2}-2\|w_{n}-y_{n}\|^{2}.$ (3.45)

Hence (3.43) implies that

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}|\mathcal{F}_{n}]&\leq(1+\epsilon_{n}-\tfrac{\lambda_{n}\mu}{2})\|x_{n}-p\|^{2}-(1-2\lambda_{n}^{2}L^{2}-2\lambda_{n}\mu)\|w_{n}-y_{n}\|^{2}\\ &\quad+\lambda_{n}^{2}\mathsf{E}[\|s_{n}-By_{n}\|^{2}|\mathcal{F}_{n}]+2\lambda_{n}^{2}\|r_{n}-Bw_{n}\|^{2}+2\lambda_{n}\mu\epsilon_{n}^{2}+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (3.46)

Since $\lambda_{n}\to 0$ and $\epsilon_{n}=\mathcal{O}(n^{-\theta})$ with $\theta>1$, there exists $n_{0}\in\mathbb{N}$ such that for all $n\geq n_{0}$

$\begin{cases}\epsilon_{n}\leq\frac{\lambda_{n}\mu}{4},\\ 1-2\lambda_{n}^{2}L^{2}-2\lambda_{n}\mu\geq 0,\\ 2\lambda_{n}\mu\epsilon_{n}^{2}+\epsilon_{n}^{2}+\epsilon_{n}\leq 2\epsilon_{n}.\end{cases}$ (3.47)

Therefore, taking the full expectation in (3.46) and using (3.41), for $n\geq n_{0}$ we have

$\begin{aligned}\mathsf{E}[\|x_{n+1}-p\|^{2}]&\leq(1-\tfrac{\lambda_{n}\mu}{4})\mathsf{E}[\|x_{n}-p\|^{2}]+c\lambda_{n}^{2}+2\epsilon_{n}\\ &=(1-an^{-\alpha})\mathsf{E}[\|x_{n}-p\|^{2}]+\frac{16ca^{2}}{\mu^{2}}n^{-2\alpha}+2\epsilon_{n}.\end{aligned}$ (3.48)

By the definitions of $\theta$ and $\beta$, there exist $n_{1}\in\mathbb{N}$ and $b>0$ such that for all $n\geq n_{1}$

$\mathsf{E}[\|x_{n+1}-p\|^{2}]\leq(1-an^{-\alpha})\mathsf{E}[\|x_{n}-p\|^{2}]+bn^{-\beta}.$ (3.49)

Using Lemma 3.7, we obtain the following.

In the case $1/2<\alpha<1$, (3.32) yields $\mathsf{E}[\|x_{n}-p\|^{2}]=\mathcal{O}(n^{\alpha-\beta}).$

In the case $\alpha=1$, (3.31) and (3.29) yield

$\mathsf{E}[\|x_{n}-p\|^{2}]=\begin{cases}\mathcal{O}(n^{-a})+\mathcal{O}(n^{1-\beta})&\text{if}\ a\neq\beta-1,\\ \mathcal{O}(n^{-a})+\mathcal{O}(\frac{\ln n}{n^{a}})&\text{if}\ a=\beta-1,\end{cases}$ (3.50)

which proves the desired result. $\square$

Remark 3.9

Here are some remarks.

  1. (1)

    The strong almost sure convergence of the iterates is obtained under condition (3.10).

  2. (2)

    It follows from (3.42) that the best rate $\mathcal{O}(1/n)$ is obtained with $\alpha=1$, $\theta\geq 2$ and $a>1$. This result is similar to Theorem 3.4 (iii) of [36]. Note that we do not require $B$ to be cocoercive as in [36].

  3. (3)

    The rate $\mathcal{O}(1/n)$ is faster than the rate $\mathcal{O}(\log n/n)$ in [29].

  4. (4)

    In [20], the authors proved linear convergence of $\mathsf{E}\|x_{n}-p\|^{2}$. However, as mentioned above, their algorithm requires a large number of samples in each iteration, and the oracle complexity is still $\mathcal{O}(1/\epsilon)$, which equals the complexity in Theorem 3.8.

From Theorems 3.5 and 3.8, we have the following corollary.

Corollary 3.10

Let $f\in\Gamma_{0}(\mathcal{H})$ and let $h\colon\mathcal{H}\to\mathbb{R}$ be a convex differentiable function with $L$-Lipschitz continuous gradient, given in expectation form $h(x)=\mathsf{E}_{\xi}[H(x,\xi)]$. Here $\xi$ is a random vector whose probability distribution is supported on a set $\Omega\subset\mathbb{R}^{m}$, and $H\colon\mathcal{H}\times\Omega\to\mathbb{R}$ is convex with respect to the variable $x$. The problem is to

$\underset{x\in\mathcal{H}}{\text{minimize}}\;f(x)+h(x),$ (3.51)

under the following assumptions:

  1. (i)

    $\operatorname{zer}(\partial f+\nabla h)\neq\varnothing$.

  2. (ii)

    It is possible to obtain independent and identically distributed (i.i.d.) samples $(\xi_{n})_{n\in\mathbb{N}}$, $(\xi_{n}^{\prime})_{n\in\mathbb{N}}$ of $\xi$.

  3. (iii)

    $\mathsf{E}[\nabla H(x,\xi)]=\nabla h(x)$ for all $x\in\mathcal{H}$.

Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,+\infty\right[$ and let $x_{-1},x_{0}$ be in $\mathcal{H}$. Define

$(\forall n\in\mathbb{N})\quad\left\lfloor\begin{array}{l}w_{n}=x_{n}+\alpha_{n}(x_{n}-x_{n-1}),\\ y_{n}=\operatorname{prox}_{\lambda_{n}f}\big(w_{n}-\lambda_{n}\nabla H(w_{n},\xi_{n})\big),\\ x_{n+1}=y_{n}-\lambda_{n}\big(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla H(w_{n},\xi_{n})\big),\end{array}\right.$ (3.52)

where $(\alpha_{n})_{n\in\mathbb{N}}$ is defined as in Algorithm 3.2. Denote $\mathcal{F}_{n}=\sigma(\xi_{0},\xi_{0}^{\prime},\ldots,\xi_{n-1},\xi_{n-1}^{\prime},\xi_{n})$. Then the following hold.

  1. (i)

    Suppose that $f$ is $\mu$-strongly monotone ($\mu\in\left]0,+\infty\right[$) and that there exists a constant $c$ such that

    $\mathsf{E}[(\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}+\|\nabla H(w_{n},\xi_{n})-\nabla h(w_{n})\|^{2})|\mathcal{F}_{n}]\leq c$ a.s. (3.53)

    If $\epsilon_{n}=\mathcal{O}(n^{-2})$, then for $\lambda_{n}=\frac{8}{\mu n}$ for all $n\in\mathbb{N}$, we obtain

    $\mathsf{E}\left[\|x_{n}-\overline{x}\|^{2}\right]=\mathcal{O}(1/n),$ (3.54)

    where $\overline{x}$ is the unique solution of (3.51).

  2. (ii)

    If $f$ is not strongly monotone, let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]\epsilon,\frac{1-\epsilon}{L}\right[$ and assume that

    $\begin{cases}\sum_{n\in\mathbb{N}}\mathsf{E}[\|\nabla H(w_{n},\xi_{n})-\nabla h(w_{n})\|^{2}|\mathcal{F}_{n}]<+\infty\ \ \text{a.s.},\\ \sum_{n\in\mathbb{N}}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}|\mathcal{F}_{n}]<+\infty\ \ \text{a.s.}\end{cases}$ (3.55)

    Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly a.s. to a random variable $\overline{x}\colon\Omega\to\operatorname{zer}(\partial f+\nabla h)$.

Proof. The conclusions follow from Theorems 3.5 and 3.8 applied with

$A=\partial f,\ B=\nabla h,\;\text{and}\;(\forall n\in\mathbb{N})\;r_{n}=\nabla H(w_{n},\xi_{n}),\ s_{n}=\nabla H(y_{n},\xi_{n}^{\prime}).$ (3.56)

$\square$
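To make (3.52) concrete, the following sketch (our illustration; the data model, the choice $f=\tau\|\cdot\|_{1}$ and all parameter values are assumptions, not taken from the paper) instantiates the scheme for $h(x)=\mathsf{E}_{\xi}\big[\tfrac{1}{2}(\langle a_{\xi}\mid x\rangle-b_{\xi})^{2}\big]$ with single-sample gradient estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, tau, theta = 20, 0.1, 0.5
x_true = np.zeros(dim); x_true[:3] = 1.0

def sample_grad(x):
    # One draw of grad H(x, xi): the gradient at a single random data point.
    a = rng.standard_normal(dim)
    b = a @ x_true + 0.01 * rng.standard_normal()
    return (a @ x - b) * a

def prox_l1(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x_prev = x = np.zeros(dim)
for n in range(1, 20001):
    lam, eps = 1.0 / n, 1.0 / n ** 2       # lambda_n ~ 1/n, eps_n = O(n^{-2})
    d = np.linalg.norm(x - x_prev)
    alpha = theta if d == 0 else min(eps / d, theta)
    w = x + alpha * (x - x_prev)
    r = sample_grad(w)                      # grad H(w_n, xi_n)
    y = prox_l1(w - lam * r, lam * tau)
    s = sample_grad(y)                      # grad H(y_n, xi_n')
    x_prev, x = x, y - lam * (s - r)
print(np.round(x, 2))                       # the support {0, 1, 2} should emerge
```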

4 Saddle point problem

Now we study the primal-dual problem first investigated in [15]. This structured primal-dual framework covers a wide class of convex optimization problems and has found many applications in image processing and machine learning [15, 16, 26, 12, 30]. Exploiting the duality structure of this framework, we design a new stochastic primal-dual splitting method and study the ergodic convergence of the primal-dual gap.

Problem 4.1

Let $\mathcal{H}$ and $\mathcal{G}$ be separable real Hilbert spaces. Let $f\in\Gamma_{0}(\mathcal{H})$, $g\in\Gamma_{0}(\mathcal{G})$ and let $h\colon\mathcal{H}\to\mathbb{R}$ be a convex differentiable function with $L_{h}$-Lipschitz continuous gradient, given in expectation form $h(x)=\mathsf{E}_{\xi}[H(x,\xi)]$. Here $\xi$ is a random vector whose probability distribution $P$ is supported on a set $\Omega_{p}\subset\mathbb{R}^{m}$, and $H\colon\mathcal{H}\times\Omega_{p}\to\mathbb{R}$ is convex with respect to the variable $x$. Let $\ell\colon\mathcal{G}\to\mathbb{R}$ be a convex differentiable function with $L_{\ell}$-Lipschitz continuous gradient, given in expectation form $\ell(v)=\mathsf{E}_{\zeta}[L(v,\zeta)]$. Here $\zeta$ is a random vector whose probability distribution is supported on a set $\Omega_{D}\subset\mathbb{R}^{d}$, and $L\colon\mathcal{G}\times\Omega_{D}\to\mathbb{R}$ is convex with respect to the variable $v$. Let $K\colon\mathcal{H}\to\mathcal{G}$ be a bounded linear operator. The primal problem is to

$\underset{x\in\mathcal{H}}{\text{minimize}}\;h(x)+(\ell^{*}\,\square\,g)(Kx)+f(x),$ (4.1)

and the dual problem is to

$\underset{v\in\mathcal{G}}{\text{minimize}}\;(h+f)^{*}(-K^{*}v)+g^{*}(v)+\ell(v),$ (4.2)

under the following assumptions:

  1. (i)

    There exists a point $(x^{\star},v^{\star})\in\mathcal{H}\times\mathcal{G}$ such that the primal-dual gap function defined by

    $G:\mathcal{H}\times\mathcal{G}\to\mathbb{R}\cup\{-\infty,+\infty\}:\ (x,v)\mapsto h(x)+f(x)+\left\langle{Kx}\mid{v}\right\rangle-g^{*}(v)-\ell(v)$ (4.3)

    satisfies the saddle point condition

    $\big(\forall x\in\mathcal{H}\big)\big(\forall v\in\mathcal{G}\big)\;G(x^{\star},v)\leq G(x^{\star},v^{\star})\leq G(x,v^{\star}).$ (4.4)

  2. (ii)

    It is possible to obtain independent and identically distributed (i.i.d.) samples $(\xi_{n},\zeta_{n})_{n\in\mathbb{N}}$ and $(\xi_{n}^{\prime},\zeta_{n}^{\prime})_{n\in\mathbb{N}}$ of $(\xi,\zeta)$.

  3. (iii)

    For all $x\in\mathcal{H}$ and $v\in\mathcal{G}$, we have $\mathsf{E}_{(\xi,\zeta)}[(\nabla H(x,\xi),\nabla L(v,\zeta))]=(\nabla h(x),\nabla\ell(v))$.

Using the standard technique as in [15], we derive from (3.52) the following stochastic primal-dual splitting method, Algorithm 4.2, for solving Problem 4.1. The weak almost sure convergence and the convergence in expectation of the resulting algorithm follow easily from Corollary 3.10, and hence we omit them here.

Algorithm 4.2

Choose $\theta\in[0,1]$. Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a positive sequence and let $(\epsilon_{n})_{n\in\mathbb{N}}\subset[0,+\infty)$ satisfy

$\sum_{n=0}^{+\infty}\epsilon_{n}<+\infty.$

Let $(x_{-1},x_{0})\in\mathcal{H}^{2}$ and $(v_{-1},v_{0})\in\mathcal{G}^{2}$. Iterate

$\operatorname{For}\;n=0,1,\ldots\quad\left\lfloor\begin{array}{l}\alpha_{n}=\begin{cases}\theta&\text{if $x_{n}=x_{n-1}$,}\\ \min\big\{\theta,\frac{\epsilon_{n}}{\|x_{n}-x_{n-1}\|}\big\}&\text{if $x_{n}\neq x_{n-1}$,}\end{cases}\\ w_{n}=x_{n}+\alpha_{n}(x_{n}-x_{n-1}),\\ u_{n}=v_{n}+\alpha_{n}(v_{n}-v_{n-1}),\\ y_{n}=\operatorname{prox}_{\lambda_{n}f}(w_{n}-\lambda_{n}\nabla H(w_{n},\xi_{n})-\lambda_{n}K^{*}u_{n}),\\ z_{n}=\operatorname{prox}_{\lambda_{n}g^{*}}(u_{n}-\lambda_{n}\nabla L(u_{n},\zeta_{n})+\lambda_{n}Kw_{n}),\\ v_{n+1}=z_{n}-\lambda_{n}(\nabla L(z_{n},\zeta_{n}^{\prime})-\nabla L(u_{n},\zeta_{n}))+\lambda_{n}K(y_{n}-w_{n}),\\ x_{n+1}=y_{n}-\lambda_{n}(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla H(w_{n},\xi_{n}))-\lambda_{n}K^{*}(z_{n}-u_{n}).\end{array}\right.$ (4.5)
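
One iteration of (4.5) translates directly into code. The sketch below is our schematic rendering: `prox_f`, `prox_gstar`, the stochastic gradient oracles `gH`, `gL` and the matrix `K` are user-supplied placeholders, and each call to `gH`/`gL` is understood to draw a fresh sample (so the second calls correspond to $\xi_{n}^{\prime}$, $\zeta_{n}^{\prime}$).

```python
import numpy as np

def pd_step(x, x_prev, v, v_prev, lam, eps, theta,
            prox_f, prox_gstar, gH, gL, K):
    d = np.linalg.norm(x - x_prev)
    alpha = theta if d == 0 else min(theta, eps / d)
    w = x + alpha * (x - x_prev)                  # primal inertial point
    u = v + alpha * (v - v_prev)                  # dual inertial point
    rH, rL = gH(w), gL(u)                         # grad H(w_n, xi_n), grad L(u_n, zeta_n)
    y = prox_f(lam, w - lam * rH - lam * (K.T @ u))
    z = prox_gstar(lam, u - lam * rL + lam * (K @ w))
    v_new = z - lam * (gL(z) - rL) + lam * (K @ (y - w))    # uses zeta_n'
    x_new = y - lam * (gH(y) - rH) - lam * (K.T @ (z - u))  # uses xi_n'
    return x_new, x, v_new, v, y, z
```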

For simplicity, set $\mu=\max\{L_{h},L_{\ell}\}$ and define the following notation in the product space $\mathcal{H}\times\mathcal{G}$, where the scalar product and the associated norm are defined in the usual manner:

$\begin{cases}\mathsf{x}=(x,v),\quad\mathsf{w}_{n}=(w_{n},u_{n}),\quad\mathsf{x}_{n}=(x_{n},v_{n}),\quad\mathsf{y}_{n}=(y_{n},z_{n}),\\ \mathsf{r}_{n}=(\nabla H(w_{n},\xi_{n}),\nabla L(u_{n},\zeta_{n})),\\ \mathsf{s}_{n}=(\nabla H(y_{n},\xi_{n}^{\prime}),\nabla L(z_{n},\zeta_{n}^{\prime})),\\ \nabla\boldsymbol{h}(\mathsf{w}_{n})=(\nabla h(w_{n}),\nabla\ell(u_{n})),\quad\nabla\boldsymbol{h}(\mathsf{y}_{n})=(\nabla h(y_{n}),\nabla\ell(z_{n})).\end{cases}$ (4.6)
Theorem 4.3

Let $(\lambda_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,\frac{1}{\sqrt{1+\epsilon}(\mu+\|K\|)}\right[$ ($\epsilon>0$) such that

$C=\sum_{n\in\mathbb{N}}\Big(\lambda_{n}^{2}\mathsf{E}\|\mathsf{s}_{n}-\nabla\boldsymbol{h}(\mathsf{y}_{n})\|^{2}+(1+\tfrac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}\|\mathsf{r}_{n}-\nabla\boldsymbol{h}(\mathsf{w}_{n})\|^{2}\Big)<\infty.$ (4.7)

For every $N\in\mathbb{N}$, define

$\hat{y}_{N}=\bigg(\sum_{n=0}^{N}\lambda_{n}y_{n}\bigg)\bigg/\bigg(\sum_{n=0}^{N}\lambda_{n}\bigg)\quad\text{and}\quad\hat{z}_{N}=\bigg(\sum_{n=0}^{N}\lambda_{n}z_{n}\bigg)\bigg/\bigg(\sum_{n=0}^{N}\lambda_{n}\bigg).$ (4.8)

Then the following holds:

$\mathsf{E}[G(\hat{y}_{N},v)-G(x,\hat{z}_{N})]\leq\bigg(\frac{1}{2}(1+ST)\big(\|(x_{0},v_{0})-(x,v)\|^{2}+C\big)\bigg)\bigg/\bigg(\sum_{n=0}^{N}\lambda_{n}\bigg),$ (4.9)

where $S=\sum_{n\in\mathbb{N}}\epsilon_{n}$ and $T=\prod_{n=0}^{+\infty}(1+\epsilon_{n}).$
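
In implementations, the weighted ergodic averages (4.8) can be maintained online without storing the whole trajectory; a small helper (our illustration) follows.

```python
class ErgodicAverage:
    # Maintains (sum_n lam_n * y_n) / (sum_n lam_n) online, as in (4.8).
    def __init__(self):
        self.weighted_sum = None
        self.total_weight = 0.0

    def update(self, y, lam):
        self.weighted_sum = lam * y if self.weighted_sum is None \
            else self.weighted_sum + lam * y
        self.total_weight += lam
        return self.weighted_sum / self.total_weight
```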

Proof. Since $\ell$ is convex, we get

$\ell(z_{n})\leq\ell(v)+\left\langle{\nabla\ell(z_{n})}\mid{z_{n}-v}\right\rangle.$ (4.10)

From (4.5), we have

$-(z_{n}-u_{n}+\lambda_{n}\nabla L(u_{n},\zeta_{n})-\lambda_{n}Kw_{n})\in\lambda_{n}\partial g^{*}(z_{n}),$ (4.11)

and hence, using the convexity of $g^{*}$,

$g^{*}(v)-g^{*}(z_{n})\geq\frac{1}{\lambda_{n}}\left\langle{z_{n}-v}\mid{z_{n}-u_{n}+\lambda_{n}\nabla L(u_{n},\zeta_{n})-\lambda_{n}Kw_{n}}\right\rangle.$ (4.12)

Therefore, we derive from (4.10), (4.12) and (4.3) that

$\begin{aligned}G(y_{n},v)-G(y_{n},z_{n})&=\left\langle{Ky_{n}}\mid{v-z_{n}}\right\rangle-g^{*}(v)+g^{*}(z_{n})-\ell(v)+\ell(z_{n})\\ &\leq\left\langle{Ky_{n}}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}+\lambda_{n}\nabla L(u_{n},\zeta_{n})-\lambda_{n}Kw_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})}\mid{z_{n}-v}\right\rangle\\ &=\left\langle{K(y_{n}-w_{n})}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})-\nabla L(u_{n},\zeta_{n})}\mid{z_{n}-v}\right\rangle\\ &=\left\langle{K(y_{n}-w_{n})-\nabla L(z_{n},\zeta_{n}^{\prime})+\nabla L(u_{n},\zeta_{n})}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})-\nabla L(z_{n},\zeta_{n}^{\prime})}\mid{z_{n}-v}\right\rangle\\ &=\left\langle{\frac{v_{n+1}-z_{n}}{\lambda_{n}}}\mid{v-z_{n}}\right\rangle+\frac{1}{\lambda_{n}}\left\langle{v-z_{n}}\mid{z_{n}-u_{n}}\right\rangle\\ &\quad+\left\langle{\nabla\ell(z_{n})-\nabla L(z_{n},\zeta_{n}^{\prime})}\mid{z_{n}-v}\right\rangle.\end{aligned}$ (4.13)

In the same way, we have

$h(y_{n})-h(x)\leq\left\langle{\nabla h(y_{n})}\mid{y_{n}-x}\right\rangle,$ (4.14)

and

$-(y_{n}-w_{n}+\lambda_{n}\nabla H(w_{n},\xi_{n})+\lambda_{n}K^{*}u_{n})\in\lambda_{n}\partial f(y_{n}),$ (4.15)

which implies

$f(y_{n})-f(x)\leq\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}+\lambda_{n}\nabla H(w_{n},\xi_{n})+\lambda_{n}K^{*}u_{n}}\right\rangle.$ (4.16)

In turn, we have

$\begin{aligned}G(y_{n},z_{n})-G(x,z_{n})&=h(y_{n})-h(x)+\left\langle{K(y_{n}-x)}\mid{z_{n}}\right\rangle+f(y_{n})-f(x)\\ &\leq\left\langle{\nabla h(y_{n})}\mid{y_{n}-x}\right\rangle+\left\langle{K(y_{n}-x)}\mid{z_{n}}\right\rangle\\ &\quad+\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}+\lambda_{n}\nabla H(w_{n},\xi_{n})+\lambda_{n}K^{*}u_{n}}\right\rangle\\ &=\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}}\right\rangle+\left\langle{x-y_{n}}\mid{\nabla H(w_{n},\xi_{n})+K^{*}(u_{n}-z_{n})-\nabla H(y_{n},\xi_{n}^{\prime})}\right\rangle\\ &\quad+\left\langle{\nabla h(y_{n})-\nabla H(y_{n},\xi_{n}^{\prime})}\mid{y_{n}-x}\right\rangle\\ &=\frac{1}{\lambda_{n}}\left\langle{x-y_{n}}\mid{y_{n}-w_{n}}\right\rangle+\left\langle{x-y_{n}}\mid{\frac{x_{n+1}-y_{n}}{\lambda_{n}}}\right\rangle\\ &\quad+\left\langle{\nabla h(y_{n})-\nabla H(y_{n},\xi_{n}^{\prime})}\mid{y_{n}-x}\right\rangle.\end{aligned}$ (4.17)

It follows from (4.13) and (4.17) that

$G(y_{n},v)-G(x,z_{n})\leq\frac{1}{\lambda_{n}}\big(\left\langle{\mathsf{x}-\mathsf{y}_{n}}\mid{\mathsf{y}_{n}-\mathsf{w}_{n}}\right\rangle+\left\langle{\mathsf{x}-\mathsf{y}_{n}}\mid{\mathsf{x}_{n+1}-\mathsf{y}_{n}}\right\rangle\big)+\left\langle{\nabla\boldsymbol{h}(\mathsf{y}_{n})-\mathsf{s}_{n}}\mid{\mathsf{y}_{n}-\mathsf{x}}\right\rangle,$ (4.18)

which is equivalent to

$\begin{aligned}2\lambda_{n}\big(G(y_{n},v)-G(x,z_{n})\big)&\leq\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\|\mathsf{y}_{n}-\mathsf{x}\|^{2}-\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}-\big(\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}-\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}-\|\mathsf{y}_{n}-\mathsf{x}\|^{2}\big)\\ &\quad+2\lambda_{n}\left\langle{\nabla\boldsymbol{h}(\mathsf{y}_{n})-\mathsf{s}_{n}}\mid{\mathsf{y}_{n}-\mathsf{x}}\right\rangle\\ &=\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}-\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}+\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}+2\lambda_{n}\left\langle{\nabla\boldsymbol{h}(\mathsf{y}_{n})-\mathsf{s}_{n}}\mid{\mathsf{y}_{n}-\mathsf{x}}\right\rangle,\end{aligned}$ (4.19)

which implies

$\begin{aligned}2\lambda_{n}\mathsf{E}\big[G(y_{n},v)-G(x,z_{n})\big|\mathcal{F}_{n}\big]&\leq\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\mathsf{E}[\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}|\mathcal{F}_{n}]-\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}\\ &\quad+\mathsf{E}[\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}|\mathcal{F}_{n}],\end{aligned}$ (4.20)

where $\mathcal{F}_{n}=\sigma(\xi_{0},\zeta_{0},\xi_{0}^{\prime},\zeta_{0}^{\prime},\ldots,\xi_{n-1},\zeta_{n-1},\xi_{n-1}^{\prime},\zeta_{n-1}^{\prime},\xi_{n},\zeta_{n}).$
Taking the expectation of both sides of the above inequality, we get

$2\lambda_{n}\mathsf{E}\big[G(y_{n},v)-G(x,z_{n})\big]\leq\mathsf{E}\|\mathsf{w}_{n}-\mathsf{x}\|^{2}-\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}-\mathsf{E}\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}+\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}.$ (4.21)

Now we estimate the first and the last terms on the right-hand side of (4.21). For the first term, using (3.5), we have

$\begin{aligned}\|\mathsf{w}_{n}-\mathsf{x}\|^{2}&=\|\mathsf{x}_{n}+\alpha_{n}(\mathsf{x}_{n}-\mathsf{x}_{n-1})-\mathsf{x}\|^{2}=\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\alpha_{n}^{2}\|\mathsf{x}_{n}-\mathsf{x}_{n-1}\|^{2}+2\alpha_{n}\left\langle{\mathsf{x}_{n}-\mathsf{x}}\mid{\mathsf{x}_{n}-\mathsf{x}_{n-1}}\right\rangle\\ &\leq\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\epsilon_{n}^{2}+2\epsilon_{n}\|\mathsf{x}_{n}-\mathsf{x}\|\\ &\leq(1+\epsilon_{n})\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\epsilon_{n}^{2}+\epsilon_{n}.\end{aligned}$ (4.22)

For the last term of (4.21),

$\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2}=\|x_{n+1}-y_{n}\|^{2}+\|v_{n+1}-z_{n}\|^{2}.$ (4.23)

From (4.5), we get

$\begin{aligned}\|x_{n+1}-y_{n}\|^{2}&=\|\lambda_{n}(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla H(w_{n},\xi_{n}))+\lambda_{n}K^{*}(z_{n}-u_{n})\|^{2}\\ &=\lambda_{n}^{2}\|(\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n}))+\nabla h(y_{n})-\nabla H(w_{n},\xi_{n})+K^{*}(z_{n}-u_{n})\|^{2}\\ &=\lambda_{n}^{2}\bigg(\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}+\|\nabla h(y_{n})-\nabla H(w_{n},\xi_{n})+K^{*}(z_{n}-u_{n})\|^{2}\\ &\quad+2\left\langle{\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})}\mid{\nabla h(y_{n})-\nabla H(w_{n},\xi_{n})+K^{*}(z_{n}-u_{n})}\right\rangle\bigg),\end{aligned}$ (4.24)

which implies that

$\begin{aligned}\mathsf{E}\|x_{n+1}-y_{n}\|^{2}&\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}]\\ &\quad+\lambda_{n}^{2}\mathsf{E}[\|\nabla h(w_{n})-\nabla H(w_{n},\xi_{n})+\nabla h(y_{n})-\nabla h(w_{n})+K^{*}(z_{n}-u_{n})\|^{2}]\\ &\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}]+(1+\tfrac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}[\|\nabla h(w_{n})-\nabla H(w_{n},\xi_{n})\|^{2}]\\ &\quad+(1+\epsilon)\lambda_{n}^{2}\mathsf{E}[\|\nabla h(y_{n})-\nabla h(w_{n})+K^{*}(z_{n}-u_{n})\|^{2}].\end{aligned}$ (4.25)

The last term on the right-hand side of (4.25) can be bounded as

$\begin{aligned}\|\nabla h(y_{n})-\nabla h(w_{n})+K^{*}(z_{n}-u_{n})\|^{2}&=\|\nabla h(y_{n})-\nabla h(w_{n})\|^{2}+\|K^{*}(z_{n}-u_{n})\|^{2}\\ &\quad+2\left\langle{\nabla h(y_{n})-\nabla h(w_{n})}\mid{K^{*}(z_{n}-u_{n})}\right\rangle\\ &\leq L_{h}^{2}\|y_{n}-w_{n}\|^{2}+\|K\|^{2}\|z_{n}-u_{n}\|^{2}+2L_{h}\|K\|\|y_{n}-w_{n}\|\|z_{n}-u_{n}\|.\end{aligned}$ (4.26)

From (4.25) and (4.26), we get

𝖤xn+1\displaystyle\mathsf{E}\|x_{n+1} yn2λn2𝖤[H(yn,ξn)h(yn)2]+(1+1ϵ)λn2𝖤[h(wn)H(wn,ξn)2]\displaystyle-y_{n}\|^{2}\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla H(y_{n},\xi_{n}^{\prime})-\nabla h(y_{n})\|^{2}]+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}[\|\nabla h(w_{n})-\nabla H(w_{n},\xi_{n})\|^{2}]
+(1+ϵ)λn2𝖤[Lh2ynwn2+K2znun2+2LhKynwnznun].\displaystyle\ +(1+\epsilon)\lambda_{n}^{2}\mathsf{E}[L_{h}^{2}\|y_{n}-w_{n}\|^{2}+\|K\|^{2}\|z_{n}-u_{n}\|^{2}+2L_{h}\|K\|\|y_{n}-w_{n}\|\|z_{n}-u_{n}\|]. (4.27)

Arguing exactly as for (4.27), we have

𝖤vn+1\displaystyle\mathsf{E}\|v_{n+1} zn2λn2𝖤[L(zn,ζn)(zn)2]+(1+1ϵ)λn2𝖤[(un)L(un,ζn)2]\displaystyle-z_{n}\|^{2}\leq\lambda_{n}^{2}\mathsf{E}[\|\nabla L(z_{n},\zeta_{n}^{\prime})-\nabla\ell(z_{n})\|^{2}]+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}[\|\nabla\ell(u_{n})-\nabla L(u_{n},\zeta_{n})\|^{2}]
+(1+ϵ)λn2𝖤[L2znun2+K2ynwn2+2LKznunynwn].\displaystyle\ +(1+\epsilon)\lambda_{n}^{2}\mathsf{E}[L_{\ell}^{2}\|z_{n}-u_{n}\|^{2}+\|K\|^{2}\|y_{n}-w_{n}\|^{2}+2L_{\ell}\|K\|\|z_{n}-u_{n}\|\|y_{n}-w_{n}\|]. (4.28)

Combining (4.27) and (4.28), we derive

𝖤𝗑n+1𝗒n2\displaystyle\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{y}_{n}\|^{2} λn2𝖤𝗌n𝒉(𝗒n)2+(1+1ϵ)λn2𝖤rn𝒉(𝗐n)2\displaystyle\leq\lambda_{n}^{2}\mathsf{E}\|\mathsf{s}_{n}-\nabla\boldsymbol{h}(\mathsf{y}_{n})\|^{2}+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}\|r_{n}-\nabla\boldsymbol{h}(\mathsf{w}_{n})\|^{2}
\displaystyle\ +(1+\epsilon)(\mu+\|K\|)^{2}\lambda_{n}^{2}\mathsf{E}\|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}. (4.29)
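To see how (4.27) and (4.28) combine into (4.29), set a=\|y_{n}-w_{n}\| and b=\|z_{n}-u_{n}\|, so that \|\mathsf{y}_{n}-\mathsf{w}_{n}\|^{2}=a^{2}+b^{2}, and assume, as the constant suggests, that \mu\geq\max\{L_{h},L_{\ell}\}. Then

\[
L_{h}^{2}a^{2}+L_{\ell}^{2}b^{2}+\|K\|^{2}(a^{2}+b^{2})+2\|K\|(L_{h}+L_{\ell})ab\leq(\mu^{2}+\|K\|^{2})(a^{2}+b^{2})+4\mu\|K\|ab\leq(\mu+\|K\|)^{2}(a^{2}+b^{2}),
\]

where the last step uses 2ab\leq a^{2}+b^{2}; the variance terms of (4.27) and (4.28) merge into the product-space quantities involving \mathsf{s}_{n} and r_{n}.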

From (4.21), (4.22) and (4.29), using the condition on \lambda_{n}, namely \lambda_{n}\leq\frac{1}{\sqrt{1+\epsilon}(\mu+\|K\|)}, we get

2λn𝖤[G(yn,v)G(x,zn)]\displaystyle 2\lambda_{n}\mathsf{E}\big{[}G(y_{n},v)-G(x,z_{n})\big{]} (1+ϵn)𝖤𝗑n𝗑2𝖤𝗑n+1𝗑2+cn,\displaystyle\leq(1+\epsilon_{n})\mathsf{E}\|\mathsf{x}_{n}-\mathsf{x}\|^{2}-\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}+c_{n}, (4.30)

where c_{n}=\lambda_{n}^{2}\mathsf{E}\|\mathsf{s}_{n}-\nabla\boldsymbol{h}(\mathsf{y}_{n})\|^{2}+(1+\frac{1}{\epsilon})\lambda_{n}^{2}\mathsf{E}\|r_{n}-\nabla\boldsymbol{h}(\mathsf{w}_{n})\|^{2}. Since the left-hand side of (4.30) is nonnegative whenever (x,v) is a saddle point of G, it follows from (4.30) that

𝖤𝗑n+1𝗑2(1+ϵn)𝖤𝗑n𝗑2+cn.\displaystyle\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2}\leq(1+\epsilon_{n})\mathsf{E}\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+c_{n}. (4.31)

Iterating this inequality down to n=0, we obtain

𝖤𝗑n+1𝗑2\displaystyle\mathsf{E}\|\mathsf{x}_{n+1}-\mathsf{x}\|^{2} i=0n(1+ϵi)𝗑0𝗑2+i=0nj=i+1n(1+ϵj)ci\displaystyle\leq\prod_{i=0}^{n}(1+\epsilon_{i})\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+\sum\limits_{i=0}^{n}\prod_{j=i+1}^{n}(1+\epsilon_{j})c_{i}
T𝗑0𝗑2+Tncn=T𝗑0𝗑2+TC.\displaystyle\leq T\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+T\sum\limits_{n\in\mathbb{N}}c_{n}=T\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+TC. (4.32)
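Here T, S and C are presumably the constants fixed in Theorem 4.3: with S=\sum_{n\in\mathbb{N}}\epsilon_{n}<\infty, the inequality 1+t\leq e^{t} yields a uniform bound on the products,

\[
\prod_{i=0}^{n}(1+\epsilon_{i})\leq\exp\Big(\sum_{i=0}^{n}\epsilon_{i}\Big)\leq e^{S}=:T<\infty,
\]

while C=\sum_{n\in\mathbb{N}}c_{n} is finite under the summability assumptions on the step sizes and the variances of the stochastic gradients.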

Summing (4.30) from n=0 to n=N, telescoping, and discarding the nonpositive term -\mathsf{E}\|\mathsf{x}_{N+1}-\mathsf{x}\|^{2}, we have

\displaystyle 2\sum\limits_{n=0}^{N}\lambda_{n}\mathsf{E}\big{[}G(y_{n},v)-G(x,z_{n})\big{]} \leq\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+\sum\limits_{n=0}^{N}\epsilon_{n}\mathsf{E}\|\mathsf{x}_{n}-\mathsf{x}\|^{2}+\sum\limits_{n=0}^{N}c_{n}
𝗑0𝗑2+S(T𝗑0𝗑2+TC)+C\displaystyle\leq\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+S(T\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+TC)+C
=(1+TS)(𝗑0𝗑2+C).\displaystyle=(1+TS)(\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+C). (4.33)

Using the convexity-concavity of GG, we obtain the desired result.     
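For completeness, here is a sketch of the omitted averaging step, assuming Theorem 4.3 is stated for the weighted ergodic iterates \bar{y}_{N}=\big(\sum_{n=0}^{N}\lambda_{n}\big)^{-1}\sum_{n=0}^{N}\lambda_{n}y_{n} and the analogously defined \bar{z}_{N}. Since G(\cdot,v) is convex and G(x,\cdot) is concave, Jensen's inequality and (4.33) give

\[
\mathsf{E}\big[G(\bar{y}_{N},v)-G(x,\bar{z}_{N})\big]\leq\frac{\sum_{n=0}^{N}\lambda_{n}\mathsf{E}\big[G(y_{n},v)-G(x,z_{n})\big]}{\sum_{n=0}^{N}\lambda_{n}}\leq\frac{(1+TS)\big(\|\mathsf{x}_{0}-\mathsf{x}\|^{2}+C\big)}{2\sum_{n=0}^{N}\lambda_{n}},
\]

which is of order \mathcal{O}(1/N) whenever the step sizes are bounded away from zero.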

Remark 4.4

Here are some remarks.

  1. (1)

    The upper bound on (\lambda_{n})_{n\in\mathbb{N}} in Theorem 4.3 can equivalently be written as \dfrac{1-\epsilon^{\prime}}{\mu+\|K\|} with \epsilon^{\prime}\in\,]0,1[. Indeed, \epsilon<\frac{1}{(1-\epsilon^{\prime})^{2}}-1 is equivalent to \sqrt{1+\epsilon}<\frac{1}{1-\epsilon^{\prime}}, so every \lambda_{n}\leq\frac{1-\epsilon^{\prime}}{\mu+\|K\|} satisfies \lambda_{n}<\frac{1}{\sqrt{1+\epsilon}(\mu+\|K\|)}.

  2. (2)

    In the deterministic setting, the convergence rate of the primal-dual gap for structured convex optimization problems involving infimal convolutions was also investigated in [5, 7]. Moreover, in the deterministic case with \alpha_{n}=0 for all n\in\mathbb{N}, our result recovers the one in [7]. The convergence rate \mathcal{O}(1/N) of the primal-dual gap was also obtained in [23].

  3. (3)

    In the stochastic setting, our result achieves the same convergence rate of the primal-dual gap as the ones in [38, 29].
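To make the scheme under study concrete, the following is a minimal numerical sketch (ours, not the authors' code; the toy problem, all identifiers, and the parameter choices are illustrative assumptions) of an inertial stochastic Tseng forward-backward-forward iteration for 0\in Ax+Bx, with A the normal cone of a box, whose resolvent is a projection, and B an affine monotone operator accessed through unbiased noisy samples. With a constant step size and a fixed noise level the iterates only settle in a noise-dominated neighborhood of a solution; the summable-variance conditions of the theorems above are what yield exact convergence.

import numpy as np

rng = np.random.default_rng(0)

# Toy instance of 0 in Ax + Bx:
#   B x = M x + q with M + M^T positive semidefinite, hence B monotone;
#   A = normal cone of the box [-1, 1]^d, whose resolvent is the projection.
d = 5
P = rng.standard_normal((d, d))
M = P - P.T + 0.1 * np.eye(d)   # skew part plus a small symmetric part
q = rng.standard_normal(d)
lip = np.linalg.norm(M, 2)      # Lipschitz constant of B

def B_sample(x, sigma=0.05):
    """Unbiased stochastic oracle: E[B_sample(x)] = M x + q."""
    return M @ x + q + sigma * rng.standard_normal(d)

def J_lamA(x):
    """Resolvent of A = normal cone of [-1, 1]^d: the projection onto the box."""
    return np.clip(x, -1.0, 1.0)

lam = 0.9 / lip                 # step size strictly below 1/L, as in Tseng's method
x_prev = x = np.zeros(d)
for n in range(5000):
    alpha = 1.0 / (n + 2) ** 2                   # summable inertial weights
    w = x + alpha * (x - x_prev)                 # inertial extrapolation
    bw = B_sample(w)                             # sample of B at w, reused below
    y = J_lamA(w - lam * bw)                     # forward-backward step
    x_prev, x = x, y - lam * (B_sample(y) - bw)  # correction with a fresh sample at y

# Residual of the fixed-point characterization x = J_lamA(x - lam * Bx).
res = np.linalg.norm(x - J_lamA(x - lam * (M @ x + q)))
print(f"fixed-point residual after 5000 iterations: {res:.3e}")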

References

  • [1] K. Aoyama, Y. Kimura, and W. Takahashi. Maximal monotone operators and maximal monotone functions for equilibrium problems. Journal of Convex Analysis, 15:395–409, 2008.
  • [2] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, 2017.
  • [3] A. Beck and M. Teboulle. Gradient-based algorithms with applications to signal recovery problems. In: Y. Eldar and D. Palomar (eds), Convex Optimization in Signal Processing and Communications. Cambridge University Press, 2009.
  • [4] P. Bianchi, W. Hachem, and F. Iutzeler. A stochastic coordinate descent primal-dual algorithm and applications to large-scale composite optimization. http://arxiv.org/abs/1407.0898.
  • [5] R. I. Bot and E. R. Csetnek. On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems. Optimization, 64:5–23, 2015.
  • [6] R. I. Bot and E. R. Csetnek. An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algor., 71:519–540, 2016.
  • [7] R. I. Bot and C. Hendrich. Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization. J. Math. Imaging Vis., 49:551–568, 2014.
  • [8] H. Brézis. Opérateurs maximaux monotones. North-Holland Math. Stud., 5:19–51, 1973.
  • [9] H. Brézis. Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. North-Holland Publishing Co., Amsterdam-London; American Elsevier Publishing Co., Inc., New York, 1973.
  • [10] V. Cevher and B. C. Vu. A reflected forward-backward splitting method for monotone inclusions involving Lipschitzian operators. Set-Valued Var. Anal., 29:163–174, 2021.
  • [11] V. Cevher, B. C. Vu, and A. Yurtsever. Stochastic forward Douglas–Rachford splitting method for monotone inclusions. In: P. Giselsson and A. Rantzer (eds), Large-Scale and Distributed Optimization. Lecture Notes in Mathematics, 2227, 2018.
  • [12] A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta Numer., 25:161–319, 2016.
  • [13] W. Cholamjiak, P. Cholamjiak, and S. Suantai. An inertial forward–backward splitting method for solving inclusion problems in Hilbert spaces. J. Fixed Point Theory Appl., 20:42, 2018.
  • [14] P. L. Combettes. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization, 53:475–504, 2004.
  • [15] P. L. Combettes and J. C. Pesquet. Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal., 20:307–330, 2012.
  • [16] P. L. Combettes and J. C. Pesquet. Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optim. Appl., 49:185–212, 2011.
  • [17] P. L. Combettes and J. C. Pesquet. Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim., 25:1221–1248, 2015.
  • [18] P. L. Combettes and J. C. Pesquet. Stochastic approximations and perturbations in forward-backward splitting for monotone operators. Pure Appl. Funct. Anal., 1:13–37, 2016.
  • [19] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul., 4:1168–1200, 2005.
  • [20] S. Cui and V. Shanbhag. Variance-reduced splitting schemes for monotone stochastic generalized equations. https://arxiv.org/pdf/2008.11348.pdf.
  • [21] S. Cui, V. Shanbhag, M. Staudigl, and P. T. Vuong. Stochastic relaxed inertial forward-backward-forward splitting for monotone inclusions in Hilbert spaces. https://arxiv.org/pdf/2107.10335.pdf.
  • [22] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc., 82:421–439, 1956.
  • [23] Y. Drori, S. Sabach, and M. Teboulle. A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett., 43:209–214, 2015.
  • [24] J. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res., 10:2899–2934, 2009.
  • [25] J. Eckstein and D. P. Bertsekas. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program., 55:293–318, 1992.
  • [26] N. Komodakis and J. C. Pesquet. Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Process. Mag., 32:31–54, 2015.
  • [27] M. Ledoux and M. Talagrand. Probability in Banach spaces: isoperimetry and processes. Springer, New York, 1991.
  • [28] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal., 16(6):964–979, 1979.
  • [29] V. D. Nguyen and B. C. Vu. Convergence analysis of the stochastic reflected forward-backward splitting algorithm. Optim. Lett., 2022, https://link.springer.com/article/10.1007/s11590-021-01844-8.
  • [30] H. Ouyang, N. He, L. Tran, and A. Gray. Stochastic alternating direction method of multipliers. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 2013.
  • [31] D. W. Peaceman and H. H. Rachford. The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math., 3:28–41, 1955.
  • [32] J. C. Pesquet and A. Repetti. A class of randomized primal-dual algorithms for distributed optimization. J. Nonlinear Convex Anal., 2015.
  • [33] H. Raguet, J. Fadili, and G. Peyré. A generalized forward-backward splitting. SIAM J. Imaging Sci., 6(3):1199–1226, 2013.
  • [34] H. Robbins and D. Siegmund. A convergence theorem for nonnegative almost supermartingales and some applications. In: J. S. Rustagi (ed), Optimizing Methods in Statistics, Academic Press, pages 233–257, 1971.
  • [35] L. Rosasco, S. Villa, and B. C. Vu. Convergence of stochastic proximal gradient. http://arxiv.org/abs/1403.5074, 2014.
  • [36] L. Rosasco, S. Villa, and B. C. Vu. Stochastic forward-backward splitting for monotone inclusions. J. Optim. Theory Appl., 169:388–406, 2016.
  • [37] L. Rosasco, S. Villa, and B. C. Vu. A stochastic inertial forward-backward splitting algorithm for multivariate monotone inclusions. Optimization, 65:1293–1314, 2016.
  • [38] L. Rosasco, S. Villa, and B. C. Vu. A first-order stochastic primal-dual algorithm with correction step. Numer. Funct. Anal. Optim., 38(5):602–626, 2017.
  • [39] S. Takahashi, W. Takahashi, and M. Toyoda. Strong convergence theorems for maximal monotone operators with nonlinear mappings in Hilbert spaces. J. Optim. Theory Appl., 147:27–41, 2010.
  • [40] D. V. Thong and N. T. Vinh. Inertial methods for fixed point problems and zero point problems of the sum of two monotone mappings. Optimization, 68(5):1037–1072, 2019.
  • [41] P. Tseng. A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim., 38(2):431–446, 2000.
  • [42] B. C. Vu. Almost sure convergence of the forward-backward-forward splitting algorithm. Optim. Lett., 10:781–803, 2016.