Stochastic forward-backward-half forward splitting algorithm with variance reduction
Abstract
In this paper, we present a stochastic forward-backward-half forward splitting algorithm with variance reduction for solving the structured monotone inclusion problem given by the sum of a maximally monotone operator, a maximally monotone and Lipschitz continuous operator, and a cocoercive operator. By defining a Lyapunov function, we establish the almost sure convergence of the proposed algorithm, and we obtain linear convergence when one of the maximally monotone operators is strongly monotone. Numerical examples are provided to show the performance of the proposed algorithm.
Key words: Variance reduction; Forward-backward-half forward splitting algorithm; Monotone inclusion problem; Almost sure convergence; Strong monotonicity; Linear convergence.
1 Introduction and preliminaries
In this paper, we consider the structured monotone inclusion problem, which is to find $x \in \mathcal{H}$ such that

$$0 \in Ax + Bx + Cx, \qquad (1)$$

where $A\colon \mathcal{H} \rightrightarrows \mathcal{H}$ is a maximally monotone operator, $B\colon \mathcal{H} \to \mathcal{H}$ is a maximally monotone and $L_B$-Lipschitz operator, and $C\colon \mathcal{H} \to \mathcal{H}$ is a $\beta$-cocoercive operator. Problem (1) arises in various applications such as optimization problems [4, 7], variational inequalities [1], deep learning [3] and image deblurring [22].
Numerous iterative algorithms for solving (1) have been presented and analyzed; see, for instance, [6, 7, 11, 12, 13, 14, 15, 19, 21, 22] and the references therein. In particular, Briceño-Arias and Davis [4] first proposed the forward-backward-half forward (FBHF) splitting algorithm

$$z_k = J_{\gamma A}\bigl(x_k - \gamma(Bx_k + Cx_k)\bigr), \qquad x_{k+1} = P_X\bigl(z_k + \gamma(Bx_k - Bz_k)\bigr), \qquad (2)$$

where $\gamma$ is the step-size (restricted to an interval determined by $\beta$ and $L_B$), $J_{\gamma A} = (\mathrm{Id} + \gamma A)^{-1}$ is the resolvent of $\gamma A$, $P_X$ is the projection onto $X$, and $X$ is a nonempty closed convex subset of $\mathcal{H}$ containing a solution of problem (1). They obtained the weak convergence of the method (2) in a real Hilbert space.
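To fix ideas, here is a minimal Python/NumPy sketch of one FBHF iteration (2). The function and argument names (`fbhf_step`, `resolvent_A`, `proj_X`) are our own; the resolvent, the operators $B$ and $C$, and the projection are assumed to be supplied by the user.

```python
def fbhf_step(x, gamma, resolvent_A, B, C, proj_X):
    """One iteration of the FBHF splitting (2) on NumPy vectors.

    resolvent_A(v, gamma) should return J_{gamma A}(v) = (I + gamma A)^{-1} v,
    B and C are the single-valued operators in problem (1),
    and proj_X is the projection onto the closed convex set X.
    """
    # forward step on B and C, backward (resolvent) step on A
    z = resolvent_A(x - gamma * (B(x) + C(x)), gamma)
    # half forward step: correct with B only, then project onto X
    return proj_X(z + gamma * (B(x) - B(z)))
```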
In many cases, monotone inclusion problems have a finite sum structure. Examples include empirical risk minimization, which is ubiquitous in machine learning [10], and nonlinear constrained optimization problems [4]. Finite sum saddle-point problems and finite sum variational inequalities can also be transformed into monotone inclusion problems [20]. Given the effectiveness of variance-reduced algorithms for minimizing finite sums of functions, a natural idea is to use similar algorithms to solve the more general finite sum monotone inclusion problem.
Now, we detail our problem setting. Suppose that the maximally monotone operator $B$ in (1) has the finite sum representation $B = \frac{1}{n}\sum_{i=1}^{n} B_i$, where each $B_i$ is $L_i$-Lipschitz and $B$ is $L_B$-Lipschitz and $\bar L$-Lipschitz in mean. Then problem (1) can be written in the following form:

$$0 \in Ax + \frac{1}{n}\sum_{i=1}^{n} B_i x + Cx.$$

It might be the case that the constants $L_i$ are easy to compute, but $L_B$ is not. In this case, $\frac{1}{n}\sum_{i=1}^{n} L_i$ gives the most natural upper bound on $L_B$. On the other hand, the cost of computing $Bx$ is rather expensive when $n$ is very large.
Throughout this paper, we assume access to a stochastic oracle $\tilde B_\xi$ that is unbiased, i.e., $\mathbb{E}_\xi[\tilde B_\xi x] = Bx$ for all $x \in \mathcal{H}$, and we consider utilizing the stochastic oracle in place of $B$ in the half forward step of (2), which yields a lower cost per iteration. The two simplest stochastic oracles are the following (a code sketch is given after the list):

- (i) Uniform sampling: $\tilde B_\xi = B_\xi$ with $\mathbb{P}(\xi = i) = \frac{1}{n}$. In this case, $\bar L^2 = \frac{1}{n}\sum_{i=1}^{n} L_i^2$.

- (ii) Importance sampling: $\tilde B_\xi = \frac{1}{n p_\xi} B_\xi$ with $\mathbb{P}(\xi = i) = p_i = \frac{L_i}{\sum_{j=1}^{n} L_j}$. In this case, $\bar L = \frac{1}{n}\sum_{i=1}^{n} L_i$.
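The two oracles can be realized as follows; this is a minimal sketch under our own naming, with `B_list` holding the maps $B_i$ and `L` their Lipschitz constants.

```python
import numpy as np

def uniform_oracle(B_list, rng=None):
    """Uniform sampling: return B_i with probability 1/n (unbiased for B)."""
    rng = rng or np.random.default_rng()
    return B_list[rng.integers(len(B_list))]    # here Lbar^2 = (1/n) sum_i L_i^2

def importance_oracle(B_list, L, rng=None):
    """Importance sampling: draw i with probability p_i = L_i / sum_j L_j and
    return the reweighted map (1/(n p_i)) B_i, which keeps the oracle unbiased."""
    rng = rng or np.random.default_rng()
    n = len(B_list)
    p = np.asarray(L, dtype=float) / np.sum(L)
    i = rng.choice(n, p=p)
    return lambda x: B_list[i](x) / (n * p[i])  # here Lbar = (1/n) sum_i L_i
```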
Recently, Kovalev et al. [9] proposed a loopless variant of SVRG [8], which removes the outer loop present in SVRG and uses a probabilistic update of the full gradient instead. Later, Alacaoglu and Malitsky [1] proposed a loopless version of the extragradient method with variance reduction for solving variational inequalities. They also applied the same idea to the forward-backward-forward (FBF) splitting algorithm, introduced by Tseng [18], for solving the two-operator monotone inclusion problem

$$0 \in Ax + Bx,$$

where $A$ and $B$ are maximally monotone operators and $B$ is single-valued with a stochastic oracle $\tilde B_\xi$ that is unbiased and $\bar L$-Lipschitz in mean. They proved the almost sure convergence of the FBF splitting algorithm with variance reduction when $B_i$ is continuous for every $i$. However, if one extends the FBF splitting algorithm with variance reduction to solve problem (1), the cocoercive operator $C$ is required to admit a finite sum structure as well, since it must be folded into the Lipschitz part $B + C$.
In this paper, we propose a stochastic forward-backward-half forward splitting algorithm with variance reduction (VRFBHF for short). Under mild assumptions, we establish the almost sure convergence of the sequence generated by our algorithm. The Lyapunov analysis of the proposed algorithm is based on the monotonicity inequalities of $A$ and $B$ and the cocoercivity inequality of $C$. Furthermore, we obtain linear convergence when $A$ or $B$ is strongly monotone. Numerical experiments are conducted to demonstrate the efficacy of the proposed algorithm.
In the following, we recall some definitions and known results that will be helpful for the subsequent analysis.
Throughout this paper, $\mathcal{H}$ is a $d$-dimensional Euclidean space with inner product $\langle\cdot,\cdot\rangle$ and induced norm $\|\cdot\|$. The set of nonnegative integers is denoted by $\mathbb{N}$. A probability mass function supported on $\{1,\dots,n\}$ is denoted by $Q$.
Definition 1.1.
([2, Definition 20.1 and Definition 20.20])
A set-valued mapping $A\colon \mathcal{H} \rightrightarrows \mathcal{H}$ is characterized by its graph

$$\operatorname{gra} A = \{(x,u) \in \mathcal{H}\times\mathcal{H} : u \in Ax\}.$$

A set-valued mapping $A$ is said to be (i) monotone if $\langle x - y, u - v\rangle \ge 0$ for all $(x,u), (y,v) \in \operatorname{gra} A$; (ii) maximally monotone if there exists no monotone operator $B$ such that $\operatorname{gra} B$ properly contains $\operatorname{gra} A$, i.e., for every $(x,u) \in \mathcal{H}\times\mathcal{H}$,

$$(x,u) \in \operatorname{gra} A \iff \langle x - y, u - v\rangle \ge 0 \quad \forall (y,v) \in \operatorname{gra} A.$$
Definition 1.2.
An operator $T\colon \mathcal{H} \to \mathcal{H}$ is said to be (i) $L$-Lipschitz continuous if there exists a constant $L > 0$ such that

$$\|Tx - Ty\| \le L\|x - y\| \quad \forall x, y \in \mathcal{H};$$

(ii) $\beta$-cocoercive if there exists a constant $\beta > 0$ such that

$$\langle Tx - Ty, x - y\rangle \ge \beta\|Tx - Ty\|^2 \quad \forall x, y \in \mathcal{H}.$$

By the Cauchy–Schwarz inequality, a $\beta$-cocoercive operator is $\frac{1}{\beta}$-Lipschitz continuous: indeed, $\beta\|Tx - Ty\|^2 \le \langle Tx - Ty, x - y\rangle \le \|Tx - Ty\|\,\|x - y\|$.
Lemma 1.3.
([2, Proposition 20.38]) Let $A\colon \mathcal{H} \rightrightarrows \mathcal{H}$ be maximally monotone, where $\mathcal{H}$ is a finite-dimensional Euclidean space. Then $\operatorname{gra} A$ is closed in $\mathcal{H}\times\mathcal{H}$, i.e., for every sequence $(x_k, u_k)$ in $\operatorname{gra} A$ and every $(x,u) \in \mathcal{H}\times\mathcal{H}$, if $x_k \to x$ and $u_k \to u$, then $(x,u) \in \operatorname{gra} A$.
Lemma 1.4.
([17, Theorem 1]) Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$ be a sequence of sub-$\sigma$-algebras of $\mathcal{F}$. For each $k \in \mathbb{N}$, let $V_k$, $a_k$, $b_k$ and $c_k$ be nonnegative $\mathcal{F}_k$-measurable random variables such that

$$\mathbb{E}[V_{k+1} \mid \mathcal{F}_k] \le (1 + a_k)V_k + b_k - c_k, \qquad \sum_{k} a_k < \infty, \qquad \sum_{k} b_k < \infty \quad \text{almost surely};$$

then $\lim_{k\to\infty} V_k$ exists and $\sum_{k} c_k < \infty$ almost surely.
The paper is organized as follows. In Section 2, we introduce the stochastic forward-backward-half forward splitting algorithm with variance reduction to solve the problem (1), and show the almost sure and linear convergence of the proposed algorithm. Finally, we present the numerical experiments in Section 3.
2 Main Results
In the sequel, we assume that the following conditions are satisfied:
Assumption 2.1.
- (i) The operator $A\colon \mathcal{H} \rightrightarrows \mathcal{H}$ is maximally monotone;

- (ii) The operator $B\colon \mathcal{H} \to \mathcal{H}$ is single-valued, monotone and $L_B$-Lipschitz;

- (iii) The operator $B$ has a stochastic oracle $\tilde B_\xi$ that is unbiased, $\mathbb{E}_\xi[\tilde B_\xi x] = Bx$ for all $x \in \mathcal{H}$, and $\bar L$-Lipschitz in mean:

$$\mathbb{E}_\xi\|\tilde B_\xi u - \tilde B_\xi v\|^2 \le \bar L^2\|u - v\|^2 \quad \forall u, v \in \mathcal{H}; \qquad (3)$$

- (iv) $C\colon \mathcal{H} \to \mathcal{H}$ is $\beta$-cocoercive;

- (v) The solution set of problem (1), denoted by $S$, is nonempty.
We now present the stochastic forward-backward-half forward splitting algorithm with variance reduction to solve the problem (1).
Algorithm 2.2.
(VRFBHF)
1: Input: probability $p \in (0,1]$, probability distribution $Q$, step-size $\gamma > 0$, averaging parameter $\alpha \in (0,1)$. Let $x_0 = w_0 \in \mathcal{H}$.
2: for $k = 0, 1, 2, \dots$ do
3:   $\bar x_k = \alpha x_k + (1-\alpha) w_k$
4:   $z_k = J_{\gamma A}\bigl(\bar x_k - \gamma(B w_k + C\bar x_k)\bigr)$
5:   Draw an index $\xi_k$ according to $Q$
6:   $x_{k+1} = z_k + \gamma\bigl(\tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k\bigr)$
7:   $w_{k+1} = \begin{cases} x_{k+1} & \text{with probability } p,\\ w_k & \text{with probability } 1 - p\end{cases}$
8: end for
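For illustration, a compact Python/NumPy sketch of Algorithm 2.2 with the uniform oracle $\tilde B_\xi = B_\xi$ follows. All names are ours, `resolvent_A` is as in the FBHF sketch above, and the step-size conditions of Theorem 2.1 below are assumed to hold; this is a sketch of the scheme as reconstructed here, not the authors' reference code.

```python
import numpy as np

def vrfbhf(x0, gamma, alpha, p, resolvent_A, B_list, C, n_iters, rng=None):
    """Sketch of VRFBHF (Algorithm 2.2) with uniform sampling over B_list."""
    rng = rng or np.random.default_rng()
    n = len(B_list)
    B = lambda v: sum(Bi(v) for Bi in B_list) / n  # full operator B = (1/n) sum_i B_i
    x, w = x0.copy(), x0.copy()
    Bw = B(w)                      # full evaluation, refreshed only when w changes
    for _ in range(n_iters):
        x_bar = alpha * x + (1 - alpha) * w                        # step 3
        z = resolvent_A(x_bar - gamma * (Bw + C(x_bar)), gamma)    # step 4
        i = rng.integers(n)                                        # step 5
        x = z + gamma * (B_list[i](w) - B_list[i](z))              # step 6
        if rng.random() < p:                                       # step 7
            w = x.copy()
            Bw = B(w)
    return x
```

Note that the full operator $B$ is evaluated only when the reference point $w$ is refreshed, which happens with probability $p$ per iteration; this is the source of the per-iteration savings discussed in Remark 2.4.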
Remark 2.3.
Algorithm 2.2 is a rather general algorithm, and it is new to the literature. We review how Algorithm 2.2 relates to previous work. Algorithm 2.2 becomes the forward-backward-forward algorithm with variance reduction in [1] if $C = 0$. Algorithm 2.2 reduces to loopless SVRG in [9] if $A = 0$, $C = 0$ and $B = \nabla f$, where $f = \frac{1}{n}\sum_{i=1}^{n} f_i$ and $f_i$ is the loss of the model on data point $i$.
Remark 2.4.
We have two sources of randomness at each iteration: the index $\xi_k$, which is used for updating $x_{k+1}$, and the reference point $w_{k+1}$, which is updated in each iteration by the iterate $x_{k+1}$ with probability $p$, or left unchanged with probability $1 - p$. Intuitively, we wish to keep $p$ small to lower the cost per iteration. Moreover, different from the FBHF splitting algorithm (2), we use the parameter $\alpha$ to add a step computing the averaged point $\bar x_k$, which means that we assign some weight to past iterates.
2.1 Almost sure convergence
In this subsection, we establish the almost sure convergence of Algorithm 2.2. We use the following notations for conditional expectations: $\mathbb{E}_k[\,\cdot\,] = \mathbb{E}[\,\cdot \mid \mathcal{F}_k]$, where $\mathcal{F}_k$ is the $\sigma$-algebra generated by all randomness up to (and including) $x_k$ and $w_k$, and $\mathbb{E}_{\xi_k}[\,\cdot\,] = \mathbb{E}[\,\cdot \mid \mathcal{F}_k, \xi_k]$, which in addition conditions on the index drawn in step 5.

For the iterates $x_k$ and $w_k$ generated by Algorithm 2.2 and a fixed $x^\star \in S$, we define the Lyapunov function

$$V_k = \alpha\|x_k - x^\star\|^2 + \frac{1-\alpha}{p}\|w_k - x^\star\|^2,$$

which helps to establish the almost sure convergence of the proposed algorithm.
Theorem 2.1. Suppose that Assumption 2.1 holds, let $\alpha \in (0,1)$ and $p \in (0,1]$, and let the step-size $\gamma > 0$ satisfy $\frac{\gamma}{2\beta} + 2\gamma^2\bar L^2 < 1$ and $2\alpha\gamma^2\bar L^2 < 1 - \alpha$. Then, for every $x^\star \in S$,

$$\mathbb{E}_k V_{k+1} \le V_k - \Bigl(1 - \frac{\gamma}{2\beta} - 2\gamma^2\bar L^2\Bigr)\|\bar x_k - z_k\|^2 - \alpha\bigl(1 - \alpha - 2\alpha\gamma^2\bar L^2\bigr)\|x_k - w_k\|^2, \qquad (4)$$

and the sequence $(x_k)_{k\in\mathbb{N}}$ generated by Algorithm 2.2 converges almost surely to a point in $S$.
Proof.
Since $x^\star \in S$, we have

$$-Bx^\star - Cx^\star \in Ax^\star. \qquad (5)$$

Step 4 in Algorithm 2.2 is equivalent to the inclusion

$$\frac{1}{\gamma}(\bar x_k - z_k) - Bw_k - C\bar x_k \in Az_k. \qquad (6)$$

Combining (5), (6) and the monotonicity of $A$, we have

$$\langle z_k - x^\star, \bar x_k - z_k\rangle \ge \gamma\langle z_k - x^\star, Bw_k - Bx^\star\rangle + \gamma\langle z_k - x^\star, C\bar x_k - Cx^\star\rangle.$$

Then from step 6 in Algorithm 2.2, we obtain

$$\|x_{k+1} - x^\star\|^2 = \|z_k - x^\star\|^2 + 2\gamma\langle z_k - x^\star, \tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k\rangle + \gamma^2\|\tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k\|^2. \qquad (7)$$

By the definition of $\bar x_k$ and the identity $2\langle z_k - x^\star, \bar x_k - z_k\rangle = \|\bar x_k - x^\star\|^2 - \|z_k - x^\star\|^2 - \|\bar x_k - z_k\|^2$, we have

$$\|z_k - x^\star\|^2 \le \|\bar x_k - x^\star\|^2 - \|\bar x_k - z_k\|^2 - 2\gamma\langle z_k - x^\star, Bw_k - Bx^\star\rangle - 2\gamma\langle z_k - x^\star, C\bar x_k - Cx^\star\rangle. \qquad (8)$$

By the $\beta$-cocoercivity of $C$ and Young's inequality $\langle z_k - \bar x_k, Cx^\star - C\bar x_k\rangle \le \frac{1}{4\varepsilon}\|z_k - \bar x_k\|^2 + \varepsilon\|C\bar x_k - Cx^\star\|^2$, valid for all $\varepsilon > 0$ and applied with $\varepsilon = \beta$, we get

$$-2\gamma\langle z_k - x^\star, C\bar x_k - Cx^\star\rangle \le \frac{\gamma}{2\beta}\|\bar x_k - z_k\|^2. \qquad (9)$$

We use (8) and (9) in (7) to obtain

$$\|x_{k+1} - x^\star\|^2 \le \|\bar x_k - x^\star\|^2 - \Bigl(1 - \frac{\gamma}{2\beta}\Bigr)\|\bar x_k - z_k\|^2 + 2\gamma\langle z_k - x^\star, \tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k - Bw_k + Bx^\star\rangle + \gamma^2\|\tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k\|^2. \qquad (10)$$
Taking the conditional expectation $\mathbb{E}_k$ on (10) and using

$$\mathbb{E}_k[\tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k] = Bw_k - Bz_k,$$

we obtain

$$\mathbb{E}_k\|x_{k+1} - x^\star\|^2 \le \|\bar x_k - x^\star\|^2 - \Bigl(1 - \frac{\gamma}{2\beta}\Bigr)\|\bar x_k - z_k\|^2 + 2\gamma\langle z_k - x^\star, Bx^\star - Bz_k\rangle + \gamma^2\,\mathbb{E}_k\|\tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k\|^2.$$

By the monotonicity of $B$ we have

$$\langle z_k - x^\star, Bx^\star - Bz_k\rangle \le 0. \qquad (11)$$

Combining the definition of $\tilde B_{\xi_k}$ and (3), we have

$$\mathbb{E}_k\|\tilde B_{\xi_k} w_k - \tilde B_{\xi_k} z_k\|^2 \le \bar L^2\|w_k - z_k\|^2.$$

Therefore,

$$\mathbb{E}_k\|x_{k+1} - x^\star\|^2 \le \|\bar x_k - x^\star\|^2 - \Bigl(1 - \frac{\gamma}{2\beta}\Bigr)\|\bar x_k - z_k\|^2 + \gamma^2\bar L^2\|w_k - z_k\|^2. \qquad (12)$$

On the other hand, the definitions of $w_{k+1}$ and of $\mathbb{E}_{\xi_k}$ yield that

$$\mathbb{E}_{\xi_k}\|w_{k+1} - x^\star\|^2 = p\|x_{k+1} - x^\star\|^2 + (1 - p)\|w_k - x^\star\|^2. \qquad (13)$$

Then, applying to (13) the tower property $\mathbb{E}_k[\mathbb{E}_{\xi_k}[\,\cdot\,]] = \mathbb{E}_k[\,\cdot\,]$, we have

$$\mathbb{E}_k\|w_{k+1} - x^\star\|^2 = p\,\mathbb{E}_k\|x_{k+1} - x^\star\|^2 + (1 - p)\|w_k - x^\star\|^2. \qquad (14)$$

Combining (12), (14), the identity $\|\bar x_k - x^\star\|^2 = \alpha\|x_k - x^\star\|^2 + (1 - \alpha)\|w_k - x^\star\|^2 - \alpha(1 - \alpha)\|x_k - w_k\|^2$ and the bound $\|w_k - z_k\|^2 \le 2\alpha^2\|x_k - w_k\|^2 + 2\|\bar x_k - z_k\|^2$, we get

$$\mathbb{E}_k V_{k+1} = \mathbb{E}_k\|x_{k+1} - x^\star\|^2 + \frac{(1 - \alpha)(1 - p)}{p}\|w_k - x^\star\|^2 \le V_k - \Bigl(1 - \frac{\gamma}{2\beta} - 2\gamma^2\bar L^2\Bigr)\|\bar x_k - z_k\|^2 - \alpha\bigl(1 - \alpha - 2\alpha\gamma^2\bar L^2\bigr)\|x_k - w_k\|^2. \qquad (15)$$

Thus, the inequality (4) holds by $\frac{\gamma}{2\beta} + 2\gamma^2\bar L^2 < 1$ and $2\alpha\gamma^2\bar L^2 < 1 - \alpha$.
Next, we show the almost sure convergence of the sequence generated by Algorithm 2.2. By Lemma 1.4 applied to (4), $(V_k)$ converges almost surely and $\|\bar x_k - z_k\|^2 + \|x_k - w_k\|^2$ converges to $0$ almost surely. Then, applying Proposition 2.3 in [5], we can construct a set $\hat\Omega$ with $P(\hat\Omega) = 1$ such that for every $\omega \in \hat\Omega$ and every $x^\star \in S$,

$$\lim_{k\to\infty}\Bigl(\alpha\|x_k(\omega) - x^\star\|^2 + \frac{1 - \alpha}{p}\|w_k(\omega) - x^\star\|^2\Bigr) \ \text{exists}, \qquad (16)$$

which implies that the sequence $(x_k(\omega))$ is bounded. Let $\tilde\Omega \subseteq \hat\Omega$ be the probability $1$ set such that, for every $\omega \in \tilde\Omega$, $\bar x_k(\omega) - z_k(\omega) \to 0$ and $x_k(\omega) - w_k(\omega) \to 0$, which implies $\bar x_k(\omega) - x_k(\omega) = (1 - \alpha)(w_k(\omega) - x_k(\omega)) \to 0$. Pick $\omega \in \tilde\Omega$ and let $(x_{k_j}(\omega))$ be a convergent subsequence of the bounded sequence $(x_k(\omega))$, say, without loss of generality, $x_{k_j}(\omega) \to x'$ as $j \to \infty$. From the above limits, it follows that $\bar x_{k_j}(\omega) \to x'$, $z_{k_j}(\omega) \to x'$ and $w_{k_j}(\omega) \to x'$ as $j \to \infty$. Then, according to (6), we can get

$$\frac{1}{\gamma}\bigl(\bar x_{k_j}(\omega) - z_{k_j}(\omega)\bigr) + Bz_{k_j}(\omega) - Bw_{k_j}(\omega) + Cz_{k_j}(\omega) - C\bar x_{k_j}(\omega) \in (A + B + C)\,z_{k_j}(\omega).$$

We know that $B$ is $L_B$-Lipschitz and $C$ is $\frac{1}{\beta}$-Lipschitz. Therefore,

$$\frac{1}{\gamma}\bigl(\bar x_{k_j}(\omega) - z_{k_j}(\omega)\bigr) + Bz_{k_j}(\omega) - Bw_{k_j}(\omega) + Cz_{k_j}(\omega) - C\bar x_{k_j}(\omega) \to 0 \quad \text{as } j \to \infty.$$

Furthermore, based on the assumption that the operator $B$ has a full domain, we have that $A + B$ is maximally monotone. Since $C$ is cocoercive, hence maximally monotone with full domain, one concludes that $A + B + C$ is maximally monotone. By Lemma 1.3, $0 \in (A + B + C)x'$, i.e., $x' \in S$.

Hence, all cluster points of $(x_k(\omega))$ belong to $S$, and we have shown that at least one subsequence of $(x_k(\omega))$ converges to a point $x' \in S$. Combining this with (16) applied to $x^\star = x'$, we deduce that $\lim_{k\to\infty}\|x_k(\omega) - x'\|$ exists and consequently $x_k(\omega) \to x'$. This shows that $(x_k)$ converges almost surely to a point in $S$. ∎
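As a quick numerical sanity check of the descent inequality (4), the following self-contained Python snippet builds a toy instance of our own choosing ($A = 0$, so $J_{\gamma A} = \mathrm{Id}$; skew-linear $B_i$, hence monotone and Lipschitz; $C = c\,\mathrm{Id}$, which is $\frac{1}{c}$-cocoercive; and $x^\star = 0$) and compares $\mathbb{E}_k V_{k+1}$, computed exactly by enumerating the sampled index and the coin flip, with $V_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, c = 5, 4, 1.0
# skew-symmetric matrices: each B_i = S_i is monotone and ||S_i||-Lipschitz
S = [(Q - Q.T) / 2 for Q in (rng.standard_normal((d, d)) for _ in range(n))]
B = lambda v: sum(Si @ v for Si in S) / n
Lbar = np.sqrt(sum(np.linalg.norm(Si, 2) ** 2 for Si in S) / n)  # uniform oracle
beta, alpha, p = 1.0 / c, 0.5, 0.5
# step-size meeting gamma/(2 beta) + 2 gamma^2 Lbar^2 < 1 and 2 alpha gamma^2 Lbar^2 < 1 - alpha
g1 = (-1 / (2 * beta) + np.sqrt(1 / (4 * beta**2) + 8 * Lbar**2)) / (4 * Lbar**2)
g2 = np.sqrt((1 - alpha) / (2 * alpha * Lbar**2))
gamma = 0.9 * min(g1, g2)

def step(x, w, i, refresh):
    x_bar = alpha * x + (1 - alpha) * w
    z = x_bar - gamma * (B(w) + c * x_bar)     # J_{gamma A} = Id since A = 0
    x1 = z + gamma * (S[i] @ w - S[i] @ z)
    return x1, (x1 if refresh else w)

V = lambda x, w: alpha * x @ x + (1 - alpha) / p * (w @ w)   # Lyapunov, x* = 0
x, w = rng.standard_normal(d), rng.standard_normal(d)
EV_next = sum(p * V(*step(x, w, i, True)) + (1 - p) * V(*step(x, w, i, False))
              for i in range(n)) / n
print(EV_next <= V(x, w))   # (4) predicts True
```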
2.2 Linear convergence
In this subsection, we show the linear convergence of Algorithm 2.2 for solving the structured monotone inclusion problem (1) when the operator $B$ is $\mu$-strongly monotone. Assuming instead that the operator $A$ is strongly monotone also leads to a linear convergence result, and the proof is similar.
Theorem 2.2. Suppose that Assumption 2.1 holds and that, in addition, $B$ is $\mu$-strongly monotone. Let $\alpha$, $p$ and $\gamma$ be as in Theorem 2.1, with $\gamma$ moreover small enough that $1 - \frac{\gamma}{2\beta} - 2\gamma^2\bar L^2 \ge 4\mu\gamma^3\bar L^2$ and $1 - \alpha - 2\alpha\gamma^2\bar L^2 \ge 4\mu\gamma^3\bar L^2\alpha$. Then

$$\mathbb{E} V_k \le \Bigl(1 - \frac{p\gamma\mu}{1 + \gamma\mu}\Bigr)^k V_0. \qquad (17)$$
Proof.
If $B$ is $\mu$-strongly monotone, then (11) becomes

$$\langle z_k - x^\star, Bx^\star - Bz_k\rangle \le -\mu\|z_k - x^\star\|^2.$$

We continue as in the proof of Theorem 2.1 to obtain, instead of (15),

$$\mathbb{E}_k V_{k+1} \le V_k - 2\gamma\mu\|z_k - x^\star\|^2 - \Bigl(1 - \frac{\gamma}{2\beta} - 2\gamma^2\bar L^2\Bigr)\|\bar x_k - z_k\|^2 - \alpha\bigl(1 - \alpha - 2\alpha\gamma^2\bar L^2\bigr)\|x_k - w_k\|^2. \qquad (18)$$

By the inequality $\|a + b\|^2 \le 2\|a\|^2 + 2\|b\|^2$, step 6 and (3), we have

$$\mathbb{E}_k\|x_{k+1} - x^\star\|^2 \le 2\|z_k - x^\star\|^2 + 2\gamma^2\bar L^2\|w_k - z_k\|^2. \qquad (19)$$

Combining (18), (19) and $\|w_k - z_k\|^2 \le 2\alpha^2\|x_k - w_k\|^2 + 2\|\bar x_k - z_k\|^2$, we get

$$\mathbb{E}_k V_{k+1} \le V_k - \gamma\mu\,\mathbb{E}_k\|x_{k+1} - x^\star\|^2, \qquad (20)$$

where the last inequality is obtained by $1 - \frac{\gamma}{2\beta} - 2\gamma^2\bar L^2 \ge 4\mu\gamma^3\bar L^2$ and $1 - \alpha - 2\alpha\gamma^2\bar L^2 \ge 4\mu\gamma^3\bar L^2\alpha$. Similar to the derivation of (15), we have

$$\mathbb{E}_k\|x_{k+1} - x^\star\|^2 = \mathbb{E}_k V_{k+1} - \frac{(1 - \alpha)(1 - p)}{p}\|w_k - x^\star\|^2. \qquad (21)$$

Putting (21) into (20) and recalling that $\frac{1 - \alpha}{p}\|w_k - x^\star\|^2 \le V_k$, we have

$$(1 + \gamma\mu)\,\mathbb{E}_k V_{k+1} \le V_k + \gamma\mu\,\frac{(1 - \alpha)(1 - p)}{p}\|w_k - x^\star\|^2 \le \bigl(1 + \gamma\mu(1 - p)\bigr)V_k, \qquad (22)$$

where the last inequality comes from $\frac{(1 - \alpha)(1 - p)}{p}\|w_k - x^\star\|^2 \le (1 - p)V_k$. Then, using $\frac{1 + \gamma\mu(1 - p)}{1 + \gamma\mu} = 1 - \frac{p\gamma\mu}{1 + \gamma\mu}$ and taking the full expectation on (22), we have

$$\mathbb{E} V_{k+1} \le \Bigl(1 - \frac{p\gamma\mu}{1 + \gamma\mu}\Bigr)\mathbb{E} V_k.$$

Iterating this inequality, we obtain

$$\mathbb{E} V_k \le \Bigl(1 - \frac{p\gamma\mu}{1 + \gamma\mu}\Bigr)^k V_0,$$

showing (17). ∎
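To make the contraction factor in (17) concrete, here is a small worked instance with sample parameter values of our own:

```latex
% illustrative evaluation of the rate in (17); the values gamma*mu = 0.1, p = 1/2 are ours
\gamma\mu = 0.1,\ p = \tfrac12
\quad\Longrightarrow\quad
1 - \frac{p\,\gamma\mu}{1+\gamma\mu} = 1 - \frac{0.05}{1.1} \approx 0.9545,
\qquad\text{so}\qquad
\mathbb{E}V_k \le (0.9546)^{k}\, V_0 .
```

Halving the update probability $p$ (to lower the per-iteration cost) correspondingly moves the factor toward $1$, so the number of iterations needed to reach a fixed accuracy roughly scales as $1/p$.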
3 Numerical Simulations
In this section, we compare Algorithm 2.2 (VRFBHF) with the FBHF splitting algorithm (2). All codes were written in MATLAB R2018b and run on a desktop PC with an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz and 8.00 GB of RAM. Consider the nonlinear constrained optimization problem of the form
$$\min_{x\in\mathbb{R}^d}\ f(x) + g(x) \quad \text{subject to} \quad h_i(x) \le 0, \ i = 1, \dots, m, \qquad (23)$$

where $g\colon \mathbb{R}^d \to (-\infty, +\infty]$ is a proper, convex and lower semi-continuous function, $f$ and $h_i$ (for every $i = 1,\dots,m$) are convex and differentiable, and $\nabla f$ is $\frac{1}{\beta}$-Lipschitz. The solution to the optimization problem (23) can be found via the saddle points of the Lagrangian

$$\mathcal{L}(x, u) = f(x) + g(x) + \sum_{i=1}^{m} u_i h_i(x) - \iota_{\mathbb{R}^m_+}(u),$$

where $\iota_{\mathbb{R}^m_+}$ is the indicator function of $\mathbb{R}^m_+$. Under some standard qualification conditions, the solution to the optimization problem (23) can be found by solving the monotone inclusion [4, 16]: find $(x, u) \in \mathbb{R}^d\times\mathbb{R}^m$ such that

$$0 \in \underbrace{\partial g(x) \times N_{\mathbb{R}^m_+}(u)}_{A(x,u)} + \underbrace{\Bigl(\sum_{i=1}^{m} u_i \nabla h_i(x),\ -h(x)\Bigr)}_{B(x,u)} + \underbrace{\bigl(\nabla f(x),\ 0\bigr)}_{C(x,u)}, \qquad (24)$$

where $h = (h_1, \dots, h_m)$, $X$ is a nonempty closed convex set modeling the prior information on the solution, $A$ is maximally monotone, $C$ is $\beta$-cocoercive, and $B$ is nonlinear, monotone and continuous. In light of the structure of $B$, we can rewrite $B$ as $B = \frac{1}{m}\sum_{i=1}^{m} B_i$, where $B_i(x, u) = \bigl(m\, u_i \nabla h_i(x),\ -m\, h_i(x) e_i\bigr)$ for every $i$, with $e_i$ the $i$-th standard basis vector of $\mathbb{R}^m$. In the numerical results listed in the following table, "Iter" denotes the number of iterations.
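The decomposition of the saddle operator $B$ just given is one natural choice (the garbled formula in the source may differ only in scaling); a sketch of $B_i$ acting on a pair $(x, u)$, with user-supplied callables `grad_h(x, i)` and `h(x, i)` (our names), is:

```python
import numpy as np

def make_Bi(grad_h, h, m, i):
    """B_i for the saddle operator B(x,u) = (sum_j u_j grad h_j(x), -h(x)),
    decomposed as B = (1/m) sum_i B_i."""
    def Bi(x, u):
        gx = m * u[i] * grad_h(x, i)   # primal block: m * u_i * grad h_i(x)
        gu = np.zeros(m)
        gu[i] = -m * h(x, i)           # dual block: -m * h_i(x) * e_i
        return gx, gu
    return Bi
```

Averaging over $i$ indeed recovers $B$: the primal blocks average to $\sum_i u_i \nabla h_i(x)$ and the dual blocks to $-h(x)$.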
Example 3.1.
Let $g$ be a proper, convex and lower semi-continuous function, let the constraints $h_i$ ($i = 1, \dots, m$) be affine, $h(x) = Mx - b$ with $M$ being an $m\times d$ real matrix and $b \in \mathbb{R}^m$, and let $f$ be convex and differentiable with $\frac{1}{\beta}$-Lipschitz gradient. Then the operators in (24) become

$$A(x,u) = \partial g(x) \times N_{\mathbb{R}^m_+}(u), \quad B(x,u) = \bigl(M^\top u,\ b - Mx\bigr), \quad C(x,u) = \bigl(\nabla f(x),\ 0\bigr), \qquad (25)$$

where $N_{\mathbb{R}^m_+}$ denotes the normal cone to $\mathbb{R}^m_+$. It is easy to see that the operator $A$ is a maximally monotone operator, $C$ is a $\beta$-cocoercive operator, and $B$ is an $L_B$-Lipschitz operator with $L_B = \|M\|$. According to the structure of the operator $B$, we rewrite $B$ as $B = \frac{1}{m}\sum_{i=1}^{m} B_i$, where $B_i(x,u) = \bigl(m\,u_i M_i^\top,\ m(b_i - M_i x)e_i\bigr)$ and $M_i$ denotes the $i$-th row of $M$, for every $i$. For uniform sampling, the stochastic oracle is $\tilde B_\xi = B_\xi$ with $\mathbb{P}(\xi = i) = \frac{1}{m}$ and $\bar L^2 = \frac{1}{m}\sum_{i=1}^{m} L_i^2$, where $L_i = m\|M_i\|$.

In the numerical tests, the data $M$, $b$ and the initial value $x_0 = w_0$ are all randomly generated. In VRFBHF, we use uniform sampling and choose the probability $p$, the averaging parameter $\alpha$ and the step-size $\gamma$ within the ranges allowed by Theorem 2.1; in FBHF, we take the step-size as suggested in [4]. We use the norm of the difference between two successive iterates falling below a prescribed tolerance as the stopping criterion.
[Figure 1: Performance of VRFBHF for different choices of the probability $p$.]
Figure 1 illustrates the behavior of VRFBHF for different choices of $p$, from which one can observe how the choice of $p$ affects both the oscillation of the iterates and the speed at which the stopping criterion is reached. Next, we test eight problem sizes and randomly generate 10 instances for each size. The average number of iterations and the average CPU time over the 10 instances are listed in Table 1. It is observed from Table 1 that VRFBHF needs remarkably less CPU time and fewer iterations than the FBHF splitting algorithm (2).
Table 1: Average number of iterations ("Iter") and CPU time (in seconds) over 10 random instances for each problem size.

| Problem size | Iter (VRFBHF) | Iter (FBHF) | CPU time (VRFBHF) | CPU time (FBHF) |
|---|---|---|---|---|
| 1000 × 500 | 75.8 | 3147 | 1.0531 | 15.4125 |
| 1000 × 750 | 92.3 | 1363 | 1.1047 | 8.8984 |
| 1000 × 1000 | 45.8 | 1933 | 0.8344 | 24.825 |
| 1000 × 2000 | 16.4 | 1731 | 1.5344 | 77.2984 |
| 2000 × 1000 | 96.2 | 1058 | 2.4609 | 28.775 |
| 2000 × 1500 | 73.8 | 1020 | 8.8984 | 55.3953 |
| 2000 × 2000 | 16.9 | 2022 | 14.6687 | 151.3359 |
| 2000 × 2500 | 26.6 | 1268 | 22.8078 | 136.9688 |
References
- [1] Alacaoglu, A., Malitsky, Y.: Stochastic variance reduction for variational inequality methods. Proc. Mach. Learn. Res. 178, 1–39 (2022)
- [2] Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, New York (2017)
- [3] Banert, S., Rudzusika, J., Öktem, O., Adler, J.: Accelerated forward-backward optimization using deep learning. https://arxiv.org/abs/2105.05210
- [4] Briceño-Arias, L.M., Davis, D.: Forward-backward-half forward algorithm for solving monotone inclusions. SIAM J. Optim. 28(4), 2839–2871 (2018)
- [5] Combettes, P.L., Pesquet, J.-C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2), 1221–1248 (2015)
- [6] Combettes, P.L., Pesquet, J.-C.: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20, 307–330 (2012)
- [7] Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25, 829–858 (2017)
- [8] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 26, 315–323 (2013)
- [9] Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. Proc. Mach. Learn. Res. 117, 1–17 (2020)
- [10] Liu, J.C., Xu, L.L., Shen, S.H., Ling, Q.: An accelerated variance reducing stochastic method with Douglas-Rachford splitting. Mach. Learn. 108, 859–878 (2019)
- [11] Latafat, P., Patrinos, P.: Asymmetric forward-backward-adjoint splitting for solving monotone inclusions involving three operators. Comput. Optim. Appl. 68, 57–93 (2017)
- [12] Malitsky, Y., Tam, M.K.: A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM J. Optim. 30(2), 1451–1472 (2020)
- [13] Rieger, J., Tam, M.K.: Backward-forward-reflected-backward splitting for three operator monotone inclusions. Appl. Math. Comput. 381 (2020)
- [14] Ryu, E.K.: Uniqueness of DRS as the 2 operator resolvent-splitting and impossibility of 3 operator resolvent-splitting. Math. Program. 182, 233–273 (2020)
- [15] Ryu, E.K., Vũ, B.C.: Finding the forward-Douglas–Rachford-forward method. J. Optim. Theory Appl. 184, 858–876 (2020)
- [16] Rockafellar, R.T.: Monotone operators associated with saddle-functions and minimax problems. In: Browder, F.E. (ed.) Nonlinear Functional Analysis, Proc. Sympos. Pure Math. 18, 241–250 (1970)
- [17] Robbins, H., Siegmund, D.: A convergence theorem for non-negative almost supermartingales and some applications. In: Optimizing Methods in Statistics, pp. 233–257 (1971)
- [18] Tseng, P.: A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38(2), 431–446 (2000)
- [19] Yu, H., Zong, C., Tang, Y.: An outer reflected forward-backward splitting algorithm for solving monotone inclusions. https://arxiv.org/abs/2009.12493 (2020)
- [20] Zhang, X., Haskell, W.B., Ye, Z.S.: A unifying framework for variance-reduced algorithms for finding zeroes of monotone operators. J. Mach. Learn. Res. 23(60), 1–44 (2022)
- [21] Zong, C., Tang, Y., Cho, Y.J.: Convergence analysis of an inexact three-operator splitting algorithm. Symmetry 10(11), 563 (2018)
- [22] Zong, C., Tang, Y., Zhang, G.: An accelerated forward-backward-half forward splitting algorithm for monotone inclusion with applications to image restoration. Optimization (2022). DOI: 10.1080/02331934.2022.2107926