1 Introduction
In this paper, we study the following inclusion problem:
(1.1)
where is a separable real Hilbert space, is a maximal monotone operator and is a monotone operator. The solution set of (1.1) is denoted by .
This problem plays an important role in many fields, such as equilibrium problems, fixed point problems, variational inequalities, and composite minimization problems; see, for example, [1, 9, 2]. More precisely, many problems in signal processing, computer vision, and machine learning can be modeled in this form; see [19, 33, 3] and the references therein. For solving problem (1.1), the so-called forward-backward splitting method is given as follows:
(1.2)
where .
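For intuition, the following is a minimal numerical sketch of a forward-backward iteration in the special case where the set-valued operator is the subdifferential of an l1 term (so its resolvent is soft-thresholding) and the single-valued operator is the gradient of a smooth least-squares term; the data, the step size, and the names grad_f and prox_g are illustrative assumptions, not objects from this paper.

```python
import numpy as np

# Illustrative data (an assumption): the inclusion 0 in (dg + grad f)(x) is the
# optimality condition of min_x f(x) + g(x) with f(x) = 0.5*||M x - b||^2 and
# g(x) = lam*||x||_1, so the resolvent of dg is soft-thresholding.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
lam = 0.1

def grad_f(x):                        # single-valued, Lipschitz, monotone part
    return M.T @ (M @ x - b)

def prox_g(x, gamma):                 # resolvent of gamma*dg (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - gamma * lam, 0.0)

step = 1.0 / np.linalg.norm(M, 2) ** 2   # constant step below 1/L, assumed admissible
x = np.zeros(20)
for _ in range(500):
    x = prox_g(x - step * grad_f(x), step)   # forward step, then backward step
```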
The forward-backward splitting algorithm for monotone inclusion problems was first introduced by Lions and Mercier [28]. In their work, other splitting methods, such as the Peaceman–Rachford algorithm [31] and the Douglas–Rachford algorithm [22], were developed to find the zeros of the sum of two maximal monotone operators.
Since then, it has been studied and reported extensively in the literature; see, for instance, [41, 10, 14, 6, 13, 25, 40] and the references therein.
Recently, stochastic versions of splitting algorithms for monotone inclusions have been proposed, for example the stochastic forward-backward splitting method [37, 36], the stochastic Douglas–Rachford splitting method [11], the stochastic reflected forward-backward splitting method [29], and the stochastic primal-dual method [38]; see also [19, 4, 32] and applications to stochastic optimization [37, 17] and machine learning [35, 24].
Motivated and inspired by the algorithms in [41, 37, 36, 42, 29], we introduce a new stochastic splitting algorithm for inclusion problems. The convergence and the convergence rate of the proposed algorithm are established.
The rest of the paper is organized as follows. After collecting some definitions and basic results in Section 2, we prove in Section 3 the almost sure convergence in the general case, as well as the strong convergence and the convergence rate in the strongly monotone case.
In Section 4, we apply the proposed algorithm to the convex-concave saddle point problem.
3 Main results
In this section, we propose a novel stochastic forward-backward-forward algorithm for solving problem (1.1) and analyse its convergence behaviour. Unless otherwise specified, we assume that the following assumptions are satisfied from now on.
Assumption 3.1
In what follows, we suppose that the following assumptions hold for and :
-
(A1)
The mapping is -Lipschitz continuous and monotone;
-
(A2)
The set-valued mapping is maximal monotone.
-
(A3)
The solution set .
The algorithm is designed as follows.
Algorithm 3.2
Step 0: (Initialization) Choose . Let be a positive sequence, satisfying
(3.1)
Let be -valued, square-integrable random variables and set .
Step 1: Given (), choose such that
(3.3)
Let be a random vector.
Compute
Step 2: Let be an unbiased estimator of , i.e., . Calculate the next iterate as
(3.4)
where .
Let and return to Step 1.
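As a rough illustration, the sketch below implements one common stochastic variant of Tseng's forward-backward-forward update: a stochastic forward step, a backward (resolvent) step, and a correcting stochastic forward step. The operators, the estimator B_sample, and the step-size sequence are assumptions made for the sketch and need not coincide with the exact updates of Algorithm 3.2.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 20

def prox_g(x, gamma):
    # resolvent of gamma*A with A = d(lam*||.||_1); an assumed, convenient choice
    lam = 0.1
    return np.sign(x) * np.maximum(np.abs(x) - gamma * lam, 0.0)

def B_sample(x):
    # unbiased stochastic estimate of Bx; here B is an assumed affine monotone map
    # (1-Lipschitz) and the noise is additive, zero-mean Gaussian
    return x - 1.0 + 0.05 * rng.standard_normal(dim)

x = np.zeros(dim)
for n in range(1, 2001):
    gamma = 1.0 / (n + 10)            # assumed decreasing step sizes
    r = B_sample(x)                   # first (stochastic) forward step
    y = prox_g(x - gamma * r, gamma)  # backward step: resolvent of gamma*A
    s = B_sample(y)                   # second (stochastic) forward step at y
    x = y - gamma * (s - r)           # forward-backward-forward correction step
```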
Lemma 3.4
Let be generated by Algorithm 3.2; then the following holds:
(3.6)
for any .
Proof. We have
(3.7)
Note that
(3.8)
By combining (3.7) and (3.8), we obtain (3.6).
Theorem 3.5
Let be generated by Algorithm 3.2. The following hold:
-
(i)
Assume that is a sequence in and that the following conditions are satisfied for
(3.9)
Then converges weakly to a random variable a.s.
-
(ii)
Suppose that or is uniformly monotone. Let be a sequence in such that and
(3.10)
where
Then converges strongly to the unique solution a.s.
Proof. From Lemma 3.4, taking the conditional expectation given on both sides of (3.6) and using , we get
(3.11)
Since , we obtain
which is equivalent to
We have ; using the uniform monotonicity of , we get
(3.12)
which implies
(3.13)
Using (3.5) and the Cauchy–Schwarz inequality, we estimate the term in (3.11) as follows:
(3.14)
Therefore, from (3.11), using (3.13) and (3.14), we derive
(3.15)
(i) In the general case, i.e., . We have
(3.16)
Hence, (3) implies that
(3.17)
We have that which implies . Therefore, using the conditions in Theorem 3.5 and Lemma 2.6, (3) implies that
We have
(3.18)
Let us set
(3.19)
Then, since is nonexpansive, we have
(3.20)
Hence
(3.21)
Let be a weak cluster point of .
Then, there exists a subsequence which converges weakly to a.s. By (3.18), a.s.
It follows from that .
Since , we have
(3.22)
Since is -Lipschitz and is bounded away from , it follows that
(3.23)
By [2, Corollary 25.5], the sum is maximally monotone and hence its graph is closed in [2, Proposition 20.38].
Therefore, a.s., that is, a.s. By Lemma 2.7, the sequence converges weakly to a.s., and the proof is completed.
(ii) In the case where is uniformly monotone.
We rewrite (3) as
(3.24)
We have
(3.25)
Using (3.25), we derive from (3.24) that
(3.26)
From , we derive . We have that
(3.27)
Therefore, (3) and Lemma 2.6 imply
(3.28)
Since , it follows from that . Thus, there exists a subsequence such that . It follows from (3.18) that a.s. Therefore, we infer that a.s. This completes the proof.
For the convergence rate in the uniformly monotone case, we define the function
(3.29)
The following lemma establishes a non-asymptotic bound for numerical sequences satisfying a given recursive inequality. The proof is similar to that of Lemma 3.1 in [36].
Lemma 3.7
Let . Let Set . Let be a sequence such that
(3.30)
Let be the smallest integer such that, for every , it holds ; set . Then, for every , the following hold:
-
(i)
If , we get
(3.31)
-
(ii)
If , we have
(3.32)
Proof. We recall the definition of in (3.29). Note that is an increasing function and, for , , we get:
(3.33)
We have
(3.34)
Let us estimate the first term on the right-hand side of (3.34). Using (3.33) and the inequality , we have
(3.35)
To estimate the second term on the right-hand side of (3.34), let us first consider the case . We have
(3.36)
From (3.34), using (3.35) and (3.36), for , we get
(3.37)
We next estimate the second term on the right-hand side of (3.34) in the case . Let be such that . We have
(3.38)
Combining (3) and (3.38), for , we have
(3.39)
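To illustrate the flavour of Lemma 3.7, the short script below tracks an assumed recursion of the general type the lemma addresses, namely a_{n+1} <= (1 - c/(n+1)) a_n + C/(n+1)^2 with c > 1, and reports n*a_n, which stabilizes in accordance with an O(1/n) bound; the constants and the recursion itself are illustrative assumptions rather than the exact inequality (3.30).

```python
# Illustrative check: sequences satisfying a_{n+1} <= (1 - c/(n+1)) a_n + C/(n+1)^2
# with c > 1 decay like O(1/n), which is the regime Lemma 3.7 quantifies.
# The constants c, C and the starting value are assumptions for this sketch.
c, C = 2.0, 5.0
a = 1.0
for n in range(1, 10_001):
    a = (1.0 - c / (n + 1)) * a + C / (n + 1) ** 2
    if n in (10, 100, 1000, 10_000):
        print(f"n = {n:>6d}   a_n = {a:.3e}   n * a_n = {n * a:.3f}")
```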
Theorem 3.8
Suppose that or is -strongly monotone. For , define
(3.40)
Suppose that there exist constants and such that
(3.41)
and . Set and assume that . Then
(3.42)
Proof. Using the strong monotonicity of , we rewrite (3) as
(3.43)
Using the Cauchy–Schwarz inequality, we have
(3.44)
which is equivalent to
(3.45)
Hence (3) implies that
(3.46)
We have that there exists such that
(3.47)
Therefore (3) implies that for , we have
(3.48)
From the definition of , there exist and such that , and we get
(3.49)
Using Lemma 3.7, we obtain the following.
In the case , from (3.32), we have
In the case , from (3.31) and (3.29), we get
(3.50)
which proves the desired result.
From Theorem 3.5 and Theorem 3.8, we have the following Corollary:
Corollary 3.10
Let and be a convex differentiable function with -Lipschitz continuous gradient, given in expectation form . In the expectation, is a random vector whose probability distribution is supported on a set , and is a convex function with respect to the variable . The problem is to
(3.51)
under the following assumptions:
-
(i)
.
-
(ii)
It is possible to obtain independent and identically distributed (i.i.d.) samples , of .
-
(iii)
for .
Let be a sequence in . Let be in . Define
(3.52)
where is defined as in Algorithm 3.2. Denote .
Then, the following hold.
-
(i)
If is -strongly monotone () and there exists a constant such that
(3.53)
Assume that . Then, for , we obtain
(3.54)
where is the unique solution to (3.51).
-
(ii)
If is not strongly monotone, let be a sequence in . Assume that
(3.55)
Then converges weakly to a random variable a.s.
Proof. The conclusions follow from Theorems 3.5 and 3.8, where
(3.56)
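To make the setting of Corollary 3.10 concrete, the sketch below runs a stochastic forward-backward-forward-type iteration on an assumed instance: an expected least-squares loss plus an l1 term, with unbiased gradient estimates formed from i.i.d. mini-batches. The data model, the estimator grad_estimate, and the step sizes are assumptions made for illustration and are not taken from (3.52).

```python
import numpy as np

rng = np.random.default_rng(2)
dim, batch = 10, 8
w_true = rng.standard_normal(dim)    # assumed ground-truth parameter
lam = 0.05

def grad_estimate(x):
    # unbiased estimate of grad f(x) with f(x) = 0.5*E|<a, x> - y|^2, built from
    # an i.i.d. mini-batch of samples (a, y); this data model is an assumption
    A = rng.standard_normal((batch, dim))
    y = A @ w_true + 0.1 * rng.standard_normal(batch)
    return A.T @ (A @ x - y) / batch

def prox_g(x, gamma):                # prox of gamma*lam*||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - gamma * lam, 0.0)

x = np.zeros(dim)
for n in range(1, 5001):
    gamma = 0.5 / (n + 50)           # assumed decreasing step sizes
    r = grad_estimate(x)             # stochastic forward step
    y = prox_g(x - gamma * r, gamma) # backward (proximal) step
    x = y - gamma * (grad_estimate(y) - r)   # second forward (correction) step
```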
4 Saddle point problem
Now, we study the primal-dual problem which was first investigated in [15]. This typical structured primal-dual framework covers a wide class of convex optimization problems, and it has found many applications in image processing and machine learning [15, 16, 26, 12, 30]. Based on the duality nature of this framework, we design a new stochastic primal-dual splitting method and study the ergodic convergence of the primal-dual gap.
Problem 4.1
Let and be separable real Hilbert spaces.
Let , and be a convex differentiable function with -Lipschitz continuous gradient, given in expectation form . In the expectation, is a random vector whose probability distribution is supported on a set , and is a convex function with respect to the variable .
Let be a convex differentiable function with -Lipschitz continuous gradient, given in expectation form . In the expectation, is a random vector whose probability distribution is supported on a set , and is a convex function with respect to the variable . Let be a bounded linear operator.
The primal problem is to
(4.1)
and the dual problem is to
(4.2)
under the following assumptions:
-
(i)
There exists a point such that the primal-dual gap function defined by
(4.3)
verifies the following condition:
(4.4)
-
(ii)
It is possible to obtain independent and identically distributed (i.i.d.) samples and of .
-
(iii)
, we have .
Using the standard technique as in [15], we derive from (3.52) the following stochastic primal-dual splitting method, Algorithm 4.2, for solving Problem 4.1. The weak almost sure convergence and the convergence in expectation of the resulting algorithm follow easily from Corollary 3.10, and hence we omit them here.
Algorithm 4.2
Choose . Let be a positive sequence, satisfying
Let and .
Iterate
(4.5)
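As a schematic companion to Algorithm 4.2, the loop below performs a stochastic primal-dual update in the spirit of [15] on an assumed toy saddle-point problem: the smooth terms are accessed through stochastic gradient estimates, and a nonsmooth primal term through its proximal operator. The instance, the names grad_f_est and grad_h_est, and the constant step sizes are illustrative assumptions and do not reproduce the exact update (4.5).

```python
import numpy as np

rng = np.random.default_rng(3)
dx, dy = 15, 10
K = rng.standard_normal((dy, dx)) / np.sqrt(dx)   # assumed bounded linear operator
mu_xi = rng.standard_normal(dx)                    # assumed data means
mu_eta = rng.standard_normal(dy)
lam = 0.05

def grad_f_est(x):   # unbiased estimate of grad f(x), f(x) = 0.5*E||x - xi||^2
    return x - (mu_xi + 0.1 * rng.standard_normal(dx))

def grad_h_est(y):   # unbiased estimate of grad h(y), h(y) = 0.5*E||y - eta||^2
    return y - (mu_eta + 0.1 * rng.standard_normal(dy))

def prox_l1(x, t):   # prox of t*lam*||.||_1 on the primal variable
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

tau = sigma = 0.9 / (np.linalg.norm(K, 2) + 1.0)   # assumed admissible step sizes
x, y = np.zeros(dx), np.zeros(dy)
for k in range(2000):
    x_new = prox_l1(x - tau * (grad_f_est(x) + K.T @ y), tau)  # stochastic primal step
    x_bar = 2 * x_new - x                                      # primal extrapolation
    y = y + sigma * (K @ x_bar - grad_h_est(y))                # stochastic dual step
    x = x_new
```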
For simplicity, set and let us define some notation on the space , where the scalar product and the associated norm are defined in the usual manner:
(4.6)
Theorem 4.3
Let be a sequence in such that
(4.7)
For every , define
(4.8)
Then the following holds:
(4.9)
where
and
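The ergodic iterates defined in (4.8) are typically step-size-weighted averages of the iterates; a small helper of this assumed form is sketched below (the weighting scheme is an assumption made for illustration).

```python
import numpy as np

def ergodic_average(iterates, gammas):
    """Step-size-weighted ergodic average, an assumed form for the averaged
    iterates in (4.8): x_bar_N = (sum_k gamma_k * x_k) / (sum_k gamma_k)."""
    iterates = np.asarray(iterates, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    return (gammas[:, None] * iterates).sum(axis=0) / gammas.sum()
```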
Proof. Since is convex, we get
(4.10)
From (4.5), we have
(4.11)
and hence, using the convexity of ,
(4.12)
Therefore, we derive from (4.10), (4.12) and (4.3) that
(4.13)
In the same way, we have
(4.14)
and
(4.15)
which implies
(4.16)
In turn, we have
(4.17)
It follows from (4.13) and (4.17) that
(4.18)
which is equivalent to
(4.19)
which implies
(4.20)
where
Taking the expectation on both sides of the above inequality, we get
(4.21)
Now, we estimate the first and the last terms on the right-hand side of (4.21). For the first term, using (3.5), we have
(4.22)
For the last term of (4.21), we have
(4.23)
From (4.5), we get
(4.24)
which implies that
(4.25)
The last term on the right-hand side of (4.25) can be bounded as
(4.26)
From (4.25) and (4.26), we get
(4.27)
Similarly to (4.27), we have
(4.28)
Combining (4.27) and (4.28), we derive
(4.29)
From (4.21), (4) and (4.29), using the condition on , i.e., , we get
(4.30)
where . It follows from (4.30) that
(4.31)
Applying the above inequality times, we obtain
(4.32)
Summing (4.30) from to , we have
(4.33)
Using the convexity-concavity of , we obtain the desired result.