
Yu You
Shanghai Jiao Tong University, School of Mathematical Sciences, China
Email: [email protected]

PGA-based Predictor-Corrector Algorithms for Monotone Generalized Variational Inequality

Yu You
(Received: date / Accepted: date)
Abstract

In this paper, we consider the monotone generalized variational inequality (MGVI) in which the monotone operator is Lipschitz continuous. Inspired by the extragradient method MR451121 and the projection and contraction algorithms MR1418264; MR1297058 for the monotone variational inequality (MVI), we propose a class of PGA-based Predictor-Corrector algorithms for MGVI. A significant characteristic of our algorithms is that, for separable multi-block convex optimization problems, they are well suited to parallel computation. Numerical simulations on several sparsity recovery models show the wide applicability and effectiveness of the proposed methods.

Keywords:
Monotone generalized variational inequality · Lipschitz continuity · PGA-based Predictor-Corrector algorithms · Parallel computation

1 Introduction

Let θ: ℝ^n → (-∞,∞] be a closed proper convex function, and let F: ℝ^n → ℝ^n be a vector-valued continuous mapping. The generalized variational inequality (GVI) MR686212; MR1673631 takes the following form:

x^{*}\in\mathbb{R}^{n},\quad \theta(x)-\theta(x^{*})+(x-x^{*})^{T}F(x^{*})\geq 0,\quad \forall x\in\mathbb{R}^{n}.

Here we adopt the convention that ∞ - ∞ = ∞. In this paper we focus on the case where F is monotone, i.e., (F(x) - F(y))^T (x - y) ≥ 0 for all x, y ∈ ℝ^n; the corresponding GVI is then called a monotone generalized variational inequality (MGVI). Throughout this paper, we make the following additional assumptions on MGVI(θ,F):

Assumption 1
  1. (a)

    The monotone operator FF is Lipschitz continuous with Lipschitz constant L>0L>0.

  2. (b)

    The solution set of MGVI(θ,F)\text{MGVI}(\theta,F), denoted as Ω\Omega^{*}, is nonempty.

Various convex optimization problems can be formulated as MGVIs, such as the lasso problem, the basis pursuit problem MR1854649, the basis pursuit denoising problem MR2481111, and the Dantzig selector MR2382651, to name a few. Moreover, MGVI contains the classical monotone variational inequality (MVI): setting θ = δ_C, the indicator function of C (equal to 0 if x ∈ C and ∞ otherwise), with C ⊆ ℝ^n a nonempty closed convex set, MGVI reduces to the MVI

x^{*}\in\mathcal{C},\quad (x-x^{*})^{T}F(x^{*})\geq 0,\quad \forall x\in\mathcal{C}.

For solving MVI(θ,F,C), the extragradient method MR451121 can be applied. Specifically, at iteration k, the predictor step uses the projection operator to generate a predictor x̃^k = P_C(x^k - β^k F(x^k)), where β^k is selected to satisfy a given condition; the corrector step then sets x^{k+1} = P_C(x^k - β^k F(x̃^k)). This method yields a sequence converging to a solution of MVI(θ,F,C). On the other hand, He MR1418264; MR1297058 proposed a class of projection and contraction methods for MVI(θ,F,C), which also belong to the class of Predictor-Corrector algorithms. The predictor is the same as in the extragradient method, x̃^k = P_C(x^k - β^k F(x^k)), while the corrector step is based on the following three fundamental inequalities for constructing different profitable directions:

(\tilde{x}^{k}-x^{*})^{T}\beta^{k}F(x^{*})\geq 0, (1)
(\tilde{x}^{k}-x^{*})^{T}([x^{k}-\beta^{k}F(x^{k})]-\tilde{x}^{k})\geq 0, (2)
(\tilde{x}^{k}-x^{*})^{T}(\beta^{k}F(\tilde{x}^{k})-\beta^{k}F(x^{*}))\geq 0, (3)

where x^* is any solution of MVI(θ,F,C) and β^k > 0. Numerical simulations in MR1418264; MR1297058 show that He's methods generally outperform the extragradient method.

Inspired by both of these works for MVI(θ,F,C), we generalize their ideas to MGVI(θ,F). Firstly, to generalize the extragradient method to MGVI(θ,F), at iteration k the predictor step uses the proximity operator to produce a predictor x̃^k = Prox_{β^k θ}(x^k - β^k F(x^k)), where β^k is selected to meet a given condition; the corrector step then sets x^{k+1} = Prox_{β^k θ}(x^k - β^k F(x̃^k)). The resulting algorithm can be viewed as a natural generalization of the extragradient method (EM) to MGVI(θ,F), and we therefore denote it by GEM. The only difference between GEM and EM is the operator used in the predictor and corrector steps. Surprisingly, this modification does not change the key inequality in the convergence analysis of GEM compared with that of EM. Secondly, to generalize He's projection and contraction algorithms, we again use the proximity operator to form the predictor x̃^k = Prox_{β^k θ}(x^k - β^k F(x^k)), and we likewise obtain three fundamental inequalities as follows:

\beta^{k}\theta(\tilde{x}^{k})-\beta^{k}\theta(x^{*})+(\tilde{x}^{k}-x^{*})^{T}\beta^{k}F(x^{*})\geq 0, (4)
\beta^{k}\theta(x^{*})-\beta^{k}\theta(\tilde{x}^{k})+(\tilde{x}^{k}-x^{*})^{T}([x^{k}-\beta^{k}F(x^{k})]-\tilde{x}^{k})\geq 0, (5)
(\tilde{x}^{k}-x^{*})^{T}(\beta^{k}F(\tilde{x}^{k})-\beta^{k}F(x^{*}))\geq 0, (6)

where x^* is any solution of MGVI(θ,F) and β^k > 0. The leftmost terms of (4) and (5) are opposite to each other, so they cancel when (4) and (5), or (4), (5) and (6), are added together. This yields the same result as adding (1) and (2), or (1), (2) and (3), which is exactly what He uses in the corrector step to construct the projection and contraction algorithms for MVI(θ,F,C). Consequently, the corrector methodologies developed by He for MVI(θ,F,C) carry over directly, which is an inspiring result: the only difference between He's algorithms for MVI(θ,F,C) and our proposed algorithms for MGVI(θ,F) lies in the slightly different predictors. It should be mentioned that He et al. MR2902659; MR3049509; MR3128749; MR2500836 developed diverse PPA-based contraction methods for MGVI(θ,F); one can also refer to MR3618786 for a review of the works of He et al. on MVI(θ,F,C) and MGVI(θ,F). Although their PPA-based contraction algorithms possess contraction properties, the various corrector strategies they developed for MVI(θ,F,C) are not used there, so our proposed algorithms for MGVI(θ,F) differ from theirs. Part of our contribution in this paper is to build a direct link between their earlier works on MVI(θ,F,C) and their later works on MGVI(θ,F). The above two generalizations can both be treated as Predictor-Corrector algorithms, and both use the proximity operator to make predictors; thus we call them PGA-based Predictor-Corrector algorithms.

Note that for our algorithms to be implementable in practice, the proximity operator of θ should be easy to compute; the reader can refer to MR3616647; MR3719240; MR2858838 for a wide variety of computable proximity operators. Finally, we mention that for structured convex optimization problems with a separable objective function and linear constraints, our proposed algorithms are well suited to parallel computation, which cannot be achieved by many ADMM-related algorithms. This is a significant characteristic of our algorithms, as it reduces computation time through parallel computation, especially for large-scale multi-block separable problems.

The rest of this paper is organized as follows. In section 2, we recall some notations and preliminaries. Then in section 3, we will introduce the first class of our proposed PGA-based Predictor-Corrector algorithms, which is a natural generalization of extragradient method. Next in section 4, the second class of our algorithms will be introduced. In section 5, we will discuss and compare our proposed algorithms with ADMM-related algorithms with respect to a concrete convex optimization problem. Finally in section 6, we test our proposed algorithms for different sparsity recovery models.

2 Notations and preliminaries

Let n\mathbb{R}^{n} denote the nn-dimensional Euclidean space with standard inner product ,\langle\cdot,\cdot\rangle, i.e., x,y=xTy\langle x,y\rangle=x^{T}y for any x,ynx,y\in\mathbb{R}^{n}, and \|\cdot\| denote the Euclidean norm. Given a function f:n(,]f:\mathbb{R}^{n}\rightarrow(-\infty,\infty], the set

domf:={xn:f(x)<}\text{dom}f:=\{x\in\mathbb{R}^{n}:f(x)<\infty\}

denotes its effective domain. If domf\text{dom}f\neq\emptyset, then ff is called a proper function (or proper). The set

epif:={(x,t):f(x)t,xn,t}\text{epi}f:=\{(x,t):f(x)\leq t,x\in\mathbb{R}^{n},t\in\mathbb{R}\}

denotes the epigraph of ff, and ff is closed (resp. convex) if epif\text{epi}f is closed (resp. convex). The function ff is called a closed proper convex function if it is proper, closed and convex.

Now let f: ℝ^n → (-∞,∞] be a closed proper convex function. The proximity operator of f is defined by

\text{Prox}_{f}: x \mapsto \operatorname*{argmin}\Big\{f(y)+\frac{1}{2}\|x-y\|^{2}: y\in\mathbb{R}^{n}\Big\},

where the minimizer is unique. If f = δ_C with C a nonempty closed convex set, then the proximity operator reduces to the usual projection operator P_C.
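As a concrete illustration, the following minimal MATLAB sketch (our own illustration, not part of the paper's code; the handle names prox_l1 and proj_box are hypothetical) evaluates two commonly used proximity operators: soft thresholding for f = t‖·‖_1, and the projection onto a box, which is the proximity operator of the corresponding indicator function.

% Soft thresholding: Prox_{t*||.||_1}(x) = sign(x).*max(|x|-t,0)
prox_l1  = @(x,t) sign(x).*max(abs(x)-t,0);
% Indicator of the box [l,u]: its proximity operator is the projection (clamping)
proj_box = @(x,l,u) min(max(x,l),u);
x  = [-3; 0.2; 1.5];
p1 = prox_l1(x,1)       % returns [-2; 0; 0.5]
p2 = proj_box(x,-1,1)   % returns [-1; 0.2; 1]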

Lemma 1 (MR3719240 )

Let f:n(,]f:\mathbb{R}^{n}\rightarrow(-\infty,\infty] be a closed proper convex function. Then the next three conditions are equivalent:

  1. 1.

    p=Proxf(x)p=\text{Prox}_{f}(x).

  2. 2.

    xpf(p)x-p\in\partial f(p).

  3. 3.

    For all wnw\in\mathbb{R}^{n}, it holds that

    (xp)T(pw)f(p)f(w).(x-p)^{T}(p-w)\geq f(p)-f(w).

For any β > 0, let e(x,β) := x - Prox_{βθ}(x - βF(x)). The next lemma shows that x^* ∈ ℝ^n is a solution of MGVI(θ,F) if and only if e(x^*,β) = 0.

Lemma 2

For any β > 0, x^* ∈ ℝ^n is a solution of MGVI(θ,F) if and only if e(x^*,β) = 0.

Proof

If x^* is a solution of MGVI(θ,F), then the definition of MGVI(θ,F) implies -F(x^*) ∈ ∂θ(x^*), which is the same as [x^* - βF(x^*)] - x^* ∈ ∂(βθ)(x^*); Lemma 1 then implies that

\text{Prox}_{\beta\theta}(x^{*}-\beta F(x^{*}))=x^{*},

and thus e(x^*,β) = 0. Conversely, if e(x^*,β) = 0, then Prox_{βθ}(x^* - βF(x^*)) = x^*, and Lemma 1 yields -F(x^*) ∈ ∂θ(x^*), so x^* is a solution of MGVI(θ,F). ∎

Thus, solving MGVI(θ,F) reduces to finding a zero of e(x,β). Hence, for a given β > 0, ‖e(x,β)‖ can be treated as an error measure. Next we show that ‖e(x,β)‖ is non-decreasing in β, while ‖e(x,β)‖/β is non-increasing in β; this shows that e(x,β) can be used in a stopping criterion when implementing the algorithms. The next result follows from a simple modification of the notation in the proof of Theorem 10.9 in MR3719240.

Lemma 3 (MR3719240 )

For all xnx\in\mathbb{R}^{n}, and β2β1>0\beta_{2}\geq\beta_{1}>0, it holds that

e(x,β2)e(x,β1)\|e(x,\beta_{2})\|\geq\|e(x,\beta_{1})\| (7)

and

e(x,β2)β2e(x,β1)β1.\frac{\|e(x,\beta_{2})\|}{\beta_{2}}\leq\frac{\|e(x,\beta_{1})\|}{\beta_{1}}. (8)

Before ending this part, to simplify the convergence analysis of our proposed algorithms, we recall the definition of a Fejér monotone sequence and its related properties.

Definition 1 (Fejér monotone sequence)

Let 𝒟 ⊆ ℝ^n be nonempty, and let {x^k}_k be a sequence in ℝ^n. Then {x^k}_k is Fejér monotone with respect to 𝒟 if for all k and all x ∈ 𝒟 it holds that

xk+1xxkx.\|x^{k+1}-x\|\leq\|x^{k}-x\|.
Lemma 4 (MR3616647 )

Let 𝒟n\mathcal{D}\subseteq\mathbb{R}^{n} be nonempty, and let {xk}k\{x^{k}\}_{k} be a Fejér monotone sequence with respect to 𝒟\mathcal{D}, then the following hold

  • {xk}k\{x^{k}\}_{k} is bounded;

  • If any subsequence of {xk}k\{x^{k}\}_{k} converges to a point in 𝒟\mathcal{D}, then {xk}k\{x^{k}\}_{k} converges to a point in 𝒟\mathcal{D}.

3 Generalized Extragradient Method for MGVI(θ,F)\text{MGVI}(\theta,F)

In this section, we introduce the first class of our PGA-based Predictor-Corrector algorithms for solving MGVI(θ,F). The motivation is the extragradient method (EM) MR451121, which is designed for solving MVI(θ,F,C). Our algorithm for MGVI(θ,F) can be viewed as a natural generalization of EM and is denoted GEM. The difference of GEM from EM is that the proximity operator is used in both the predictor and corrector steps, whereas EM uses the projection operator. We first state GEM for MGVI(θ,F) in Algorithm 1 and then establish its convergence.

Input: x0nx^{0}\in\mathbb{R}^{n}, and ν(0,1)\nu\in(0,1).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: Selecting βk>0\beta^{k}>0 such that βkF(xk)F(x~k)νxkx~k\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\| where x~k=Proxβkθ(xkβkF(xk))\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(x^{k})) ;
4      Corrector: Setting xk+1=Proxβkθ(xkβkF(x~k))x^{k+1}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(\tilde{x}^{k})).
5 end for
6
Algorithm 1 GEM for MGVI(θ,F)\text{MGVI}(\theta,F)
Remark 1

Under item (a) of Assumption 1, we conclude that

βkF(xk)F(x~k)νxkx~k\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\|

whenever βkνL\beta^{k}\leq\frac{\nu}{L}.
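As a concrete illustration, here is a minimal MATLAB sketch of GEM with the constant step β = ν/L, applied to the choices θ(x) = λ‖x‖_1 and F(x) = A^T(Ax - b), so that L = ‖A^T A‖_2; the problem data below are our own illustrative assumptions, not the paper's test instance.

rng(0); m = 40; n = 60; A = randn(m,n); b = randn(m,1); lam = 1;   % illustrative data
F    = @(x) A'*(A*x - b);                         % monotone, Lipschitz with L = ||A'*A||_2
prox = @(x,t) sign(x).*max(abs(x) - t*lam, 0);    % Prox_{t*lam*||.||_1}
L = norm(A'*A, 2); nu = 0.9; beta = nu/L;
x = ones(n,1);
for k = 1:5000
    xt = prox(x - beta*F(x), beta);               % predictor
    x  = prox(x - beta*F(xt), beta);              % corrector
    if norm(x - prox(x - F(x), 1), inf) < 1e-6    % ||e(x,1)||_inf as stopping residual
        break;
    end
end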

In order to prove the convergence of GEM, we need the next propositions.

Proposition 1

Let x,x~,ynx,\tilde{x},y\in\mathbb{R}^{n}, β>0\beta>0, and let

T~β(x):=Proxβθ(xβF(x~)),\widetilde{T}_{\beta}(x):=\text{Prox}_{\beta\theta}(x-\beta F(\tilde{x})),

then the following inequality holds

T~β(x)y2xy2xT~β(x)2+2β(yT~β(x))TF(x~)+2β(θ(y)θ(T~β(x))).\|\widetilde{T}_{\beta}(x)-y\|^{2}\leq\|x-y\|^{2}-\|x-\widetilde{T}_{\beta}(x)\|^{2}+2\beta(y-\widetilde{T}_{\beta}(x))^{T}F(\tilde{x})+2\beta(\theta(y)-\theta(\widetilde{T}_{\beta}(x))).
Proof

Consider the function

ϕ(u)=(ux)TF(x~)+θ(u)+12βux2.\phi(u)=(u-x)^{T}F(\tilde{x})+\theta(u)+\frac{1}{2\beta}\|u-x\|^{2}.

Because T̃_β(x) = argmin{φ(u): u ∈ ℝ^n} and φ is a (1/β)-strongly convex closed proper function, it follows from Theorem 5.25 in MR3719240 that

ϕ(y)ϕ(T~β(x))12βyT~β(x)2,\phi(y)-\phi(\widetilde{T}_{\beta}(x))\geq\frac{1}{2\beta}\|y-\widetilde{T}_{\beta}(x)\|^{2}, (9)

on the other hand, one obtains that

ϕ(y)ϕ(T~β(x))=(yT~β(x))TF(x~)+θ(y)θ(T~β(x))+12β(xy2xT~β(x)2).\begin{array}[]{ll}\phi(y)-\phi(\widetilde{T}_{\beta}(x))&=(y-\widetilde{T}_{\beta}(x))^{T}F(\tilde{x})+\theta(y)-\theta(\widetilde{T}_{\beta}(x))\\ &\quad+\frac{1}{2\beta}(\|x-y\|^{2}-\|x-\widetilde{T}_{\beta}(x)\|^{2}).\end{array} (10)

Combining (9) and (10), we obtain the targeted inequality

T~β(x)y2xy2xT~β(x)2+2β(yT~β(x))TF(x~)+2β(θ(y)θ(T~β(x))).\|\widetilde{T}_{\beta}(x)-y\|^{2}\leq\|x-y\|^{2}-\|x-\widetilde{T}_{\beta}(x)\|^{2}+2\beta(y-\widetilde{T}_{\beta}(x))^{T}F(\tilde{x})+2\beta(\theta(y)-\theta(\widetilde{T}_{\beta}(x))).

Next, we prove the key inequality for the convergence analysis of GEM, which is based on Proposition 1.

Proposition 2

Let {xk}k\{x^{k}\}_{k} be the sequence generated by GEM (Algorithm 1), and let xx^{*} be any solution of MGVI(θ,F)\text{MGVI}(\theta,F), then the next inequality holds

xk+1x2xkx2(1ν2)xkx~k2.\|x^{k+1}-x^{*}\|^{2}\leq\|x^{k}-x^{*}\|^{2}-(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}.
Proof

Applying the inequality of Proposition 1 with y = x^*, x = x^k, x̃ = x̃^k, β = β^k, and noting that x^{k+1} = T̃_{β^k}(x^k), we obtain

xk+1x2xkx2xkxk+12+2βk(xxk+1)TF(x~k)+2βk(θ(x)θ(xk+1)).\begin{array}[]{ll}\|x^{k+1}-x^{*}\|^{2}\leq&\|x^{k}-x^{*}\|^{2}-\|x^{k}-x^{k+1}\|^{2}\\ &+2\beta^{k}(x^{*}-x^{k+1})^{T}F(\tilde{x}^{k})+2\beta^{k}(\theta(x^{*})-\theta(x^{k+1})).\end{array} (11)

On the other hand, -F(x^*) ∈ ∂θ(x^*) together with the monotonicity of F implies that

2βkθ(x~k)2βkθ(x)+2βk(x~kx)TF(x~k)0.2\beta^{k}\theta(\tilde{x}^{k})-2\beta^{k}\theta(x^{*})+2\beta^{k}(\tilde{x}^{k}-x^{*})^{T}F(\tilde{x}^{k})\geq 0. (12)

Adding inequality (12) to the right side of inequality (11), then it follows that

\begin{array}{ll}\|x^{k+1}-x^{*}\|^{2}&\leq\|x^{k}-x^{*}\|^{2}-\|x^{k}-x^{k+1}\|^{2}+2\beta^{k}(\tilde{x}^{k}-x^{k+1})^{T}F(\tilde{x}^{k})+2\beta^{k}(\theta(\tilde{x}^{k})-\theta(x^{k+1}))\\ &=\|x^{k}-x^{*}\|^{2}-\|x^{k}-\tilde{x}^{k}\|^{2}-\|\tilde{x}^{k}-x^{k+1}\|^{2}-2(x^{k+1}-\tilde{x}^{k})^{T}(\tilde{x}^{k}-x^{k}+\beta^{k}F(\tilde{x}^{k}))+2\beta^{k}(\theta(\tilde{x}^{k})-\theta(x^{k+1})),\end{array} (13)

where the second equality follows from

xkxk+12=xkx~k2+x~kxk+12+2(xkx~k)T(x~kxk+1).\|x^{k}-x^{k+1}\|^{2}=\|x^{k}-\tilde{x}^{k}\|^{2}+\|\tilde{x}^{k}-x^{k+1}\|^{2}+2(x^{k}-\tilde{x}^{k})^{T}(\tilde{x}^{k}-x^{k+1}).

Moreover, x̃^k = Prox_{β^k θ}(x^k - β^k F(x^k)) together with Lemma 1 implies that

2βkθ(xk+1)2βkθ(x~k)+2(x~kxk+1)T(xkβkF(xk)x~k)0.2\beta^{k}\theta(x^{k+1})-2\beta^{k}\theta(\tilde{x}^{k})+2(\tilde{x}^{k}-x^{k+1})^{T}(x^{k}-\beta^{k}F(x^{k})-\tilde{x}^{k})\geq 0. (14)

Then adding (14) to the right side of (13), it follows that

xk+1x2xkx2xkx~k2x~kxk+12+2βk(x~kxk+1)T(F(x~k)F(xk)).\begin{array}[]{ll}\|x^{k+1}-x^{*}\|^{2}&\leq\|x^{k}-x^{*}\|^{2}-\|x^{k}-\tilde{x}^{k}\|^{2}-\|\tilde{x}^{k}-x^{k+1}\|^{2}\\ &\quad+2\beta^{k}(\tilde{x}^{k}-x^{k+1})^{T}(F(\tilde{x}^{k})-F(x^{k})).\end{array} (15)

Considering also that

2βk(x~kxk+1)T(F(x~k)F(xk))x~kxk+12+βk(F(x~k)F(xk))2,2\beta^{k}(\tilde{x}^{k}-x^{k+1})^{T}(F(\tilde{x}^{k})-F(x^{k}))\leq\|\tilde{x}^{k}-x^{k+1}\|^{2}+\|\beta^{k}(F(\tilde{x}^{k})-F(x^{k}))\|^{2}, (16)

then combining (15) and (16), and also taking into account that

βk(F(x~k)F(xk))2ν2x~kxk2,\|\beta^{k}(F(\tilde{x}^{k})-F(x^{k}))\|^{2}\leq\nu^{2}\|\tilde{x}^{k}-x^{k}\|^{2},

we conclude that

xk+1x2xkx2(1ν2)xkx~k2.\|x^{k+1}-x^{*}\|^{2}\leq\|x^{k}-x^{*}\|^{2}-(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}.

Theorem 3.1 (Convergence of GEM for MGVI(θ,F)\text{MGVI}(\theta,F))

Let {xk}k\{x^{k}\}_{k} be the sequence generated by GEM, and let {βk}k\{\beta^{k}\}_{k} be bounded such that inf{βk}>0\inf\{\beta^{k}\}>0. Then {xk}k\{x^{k}\}_{k} converges to a solution of MGVI(θ,F)\text{MGVI}(\theta,F).

Proof

One sees from proposition 2 that for all kk, and any xΩx^{*}\in\Omega^{*}, it holds that

xk+1x2xkx2(1ν2)xkx~k2.\|x^{k+1}-x^{*}\|^{2}\leq\|x^{k}-x^{*}\|^{2}-(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}.

This implies that {x^k}_k is Fejér monotone with respect to Ω^*. Summing the above inequality from k = 0 to n, rearranging, and letting n → ∞, we obtain

k=0(1ν2)xkx~k2x0x2.\sum_{k=0}^{\infty}(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}\leq\|x_{0}-x^{*}\|^{2}.

Thus Σ_{k=0}^∞ ‖x^k - x̃^k‖^2 < ∞, which implies ‖x^k - x̃^k‖ → 0, and hence {x^k}_k and {x̃^k}_k have the same limit points. Let y be any such common limit point, so that x^{k_j} → y and x̃^{k_j} → y along a subsequence. Because {β^k}_k is bounded and inf{β^k} > 0, without loss of generality we may assume β^{k_j} → β^* with β^* > 0. Since x̃^{k_j} = Prox_{β^{k_j}θ}(x^{k_j} - β^{k_j}F(x^{k_j})), it follows from Lemma 1 that for any x ∈ ℝ^n,

βkjθ(x)βkjθ(x~kj)+(xx~kj)T(x~kj[xkjβkjF(xkj)])0,\beta^{k_{j}}\theta(x)-\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})+(x-\tilde{x}^{k_{j}})^{T}(\tilde{x}^{k_{j}}-[x^{k_{j}}-\beta^{k_{j}}F(x^{k_{j}})])\geq 0,

thus

\limsup_{j\to\infty}\big\{\beta^{k_{j}}\theta(x)-\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})+(x-\tilde{x}^{k_{j}})^{T}(\tilde{x}^{k_{j}}-[x^{k_{j}}-\beta^{k_{j}}F(x^{k_{j}})])\big\}=\beta^{*}\theta(x)-\liminf_{j\to\infty}\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})+(x-y)^{T}\beta^{*}F(y)\geq 0. (17)

On the other hand, θ is a closed proper convex function and hence lower semicontinuous, so θ(y) ≤ liminf θ(x̃^{k_j}). Therefore

lim infβkjθ(x~kj)βθ(y).-\liminf\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})\leq-\beta^{*}\theta(y). (18)

Then combining (17) and (18), and taking into account that β^* > 0, we conclude that

θ(x)θ(y)+(xy)TF(y)0,\theta(x)-\theta(y)+(x-y)^{T}F(y)\geq 0,

for any x ∈ ℝ^n. Thus y is a solution of MGVI(θ,F), i.e., y ∈ Ω^*. Finally, since {x^k}_k is Fejér monotone with respect to Ω^* and every limit point y of {x^k}_k belongs to Ω^*, Lemma 4 implies that {x^k}_k converges to a solution of MGVI(θ,F). ∎

Remark 2

To guarantee the convergence of GEM (Algorithm 1), {β^k}_k is required to be bounded with inf{β^k} > 0. One can simply take β^k ≡ β with β ≤ ν/L. However, in some applications L cannot be computed explicitly, or it is estimated conservatively; the latter leads to a small β, which usually results in slow convergence. One can therefore adopt an adaptive rule for choosing β^k at each iteration.

4 Proximity and Contraction Algorithms for MGVI(θ,F)\text{MGVI}(\theta,F)

In this section, we introduce the second class of our PGA-based Predictor-Corrector algorithms for MGVI(θ,F). We are inspired by He's works for MVI(θ,F,C) in MR1418264; MR1297058, where the projection operator is used to make the predictor, whereas our algorithms for MGVI(θ,F) use the proximity operator. The corrector strategies developed by He for MVI(θ,F,C) can be inherited by our algorithms, which is an inspiring result. Since He's algorithms for MVI(θ,F,C) are called projection and contraction algorithms, and our algorithms rely on He's corrector strategies and also enjoy contraction properties, we name them proximity and contraction algorithms. In the sequel we employ only part of the corrector strategies developed by He; the reader may refer to MR1418264; MR1297058 for further corrector strategies.

Firstly, we introduce three fundamental inequalities, which are key for constructing our algorithms for MGVI(θ,F)\text{MGVI}(\theta,F).

Proposition 3 (Three fundamental inequalities)

Let xx^{*} be any solution of MGVI(θ,F)\text{MGVI}(\theta,F), and for any xnx\in\mathbb{R}^{n} and β>0\beta>0, let x~=Proxβθ(xβF(x))\tilde{x}=\text{Prox}_{\beta\theta}(x-\beta F(x)), then the following three inequalities hold:

βθ(x~)βθ(x)+(x~x)TβF(x)0,\beta\theta(\tilde{x})-\beta\theta(x^{*})+(\tilde{x}-x^{*})^{T}\beta F(x^{*})\geq 0, (FI1)
βθ(x)βθ(x~)+(x~x)T([xβF(x)]x~)0,\beta\theta(x^{*})-\beta\theta(\tilde{x})+(\tilde{x}-x^{*})^{T}([x-\beta F(x)]-\tilde{x})\geq 0, (FI2)
(x~x)T(βF(x~)βF(x))0.\displaystyle(\tilde{x}-x^{*})^{T}(\beta F(\tilde{x})-\beta F(x^{*}))\geq 0. (FI3)
Proof

The first inequality (FI1) follows from the definition of MGVI(θ,F)\text{MGVI}(\theta,F). The final inequality (FI3) holds because FF is monotone over n\mathbb{R}^{n}. Now let us prove the second inequality (FI2). Because x~=Proxβθ(xβF(x))\tilde{x}=\text{Prox}_{\beta\theta}(x-\beta F(x)), then it follows from Lemma 1 that for all yny\in\mathbb{R}^{n}, it holds that

βθ(y)βθ(x~)+(yx~)T(xβF(x)x~),\beta\theta(y)\geq\beta\theta(\tilde{x})+(y-\tilde{x})^{T}(x-\beta F(x)-\tilde{x}),

Replacing y by x^* and rearranging, we obtain (FI2). ∎

We observe that the leftmost terms of (FI1) and (FI2) are opposite to each other, so they cancel when (FI1) and (FI2), or (FI1), (FI2) and (FI3), are added together. This observation makes it possible to directly employ the corrector strategies developed by He in MR1418264; MR1297058 for MVI(θ,F,C).

4.1 Proximity and contraction algorithms based on (FI1) and (FI2)

In this part, we consider MGVI(θ,F)\text{MGVI}(\theta,F) with F(x)=Mx+qF(x)=Mx+q. Then (FI1) and (FI2) take the next forms

βθ(x~)βθ(x)+(x~x)Tβ(Mx+q)0,\beta\theta(\tilde{x})-\beta\theta(x^{*})+(\tilde{x}-x^{*})^{T}\beta(Mx^{*}+q)\geq 0,

and

βθ(x)βθ(x~)+(x~x)T([xβ(Mx+q)]x~)0.\beta\theta(x^{*})-\beta\theta(\tilde{x})+(\tilde{x}-x^{*})^{T}([x-\beta(Mx+q)]-\tilde{x})\geq 0.

Adding the above two inequalities together, one obtains that

(x-x^{*})^{T}(I+\beta M^{T})(x-\tilde{x})\geq(x-x^{*})^{T}\beta M(x-x^{*})+\|x-\tilde{x}\|^{2}\geq\|x-\tilde{x}\|^{2}, (19)

where the last inequality holds because M is positive semi-definite (M is not required to be symmetric), i.e., (x-y)^T M (x-y) ≥ 0 for all x, y ∈ ℝ^n. Replacing x, x̃, β above by x^k, x̃^k, β^k, respectively, and setting

ϕ(xk,x~k,βk)\displaystyle\phi(x^{k},\tilde{x}^{k},\beta^{k}) :=xkx~k2,\displaystyle:=\|x^{k}-\tilde{x}^{k}\|^{2},
d(xk,x~k,βk)\displaystyle d(x^{k},\tilde{x}^{k},\beta^{k}) :=(I+βkMT)(xkx~k),\displaystyle:=(I+\beta^{k}M^{T})(x^{k}-\tilde{x}^{k}),

then it follows that (xkx)Td(xk,x~k,βk)ϕ(xk,x~k,βk)(x^{k}-x^{*})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})\geq\phi(x^{k},\tilde{x}^{k},\beta^{k}). Next, we use

xk+1=xkαd(xk,x~k,βk)x^{k+1}=x^{k}-\alpha d(x^{k},\tilde{x}^{k},\beta^{k})

to obtain the next iterate; then one sees that

ϑk(α)\displaystyle\vartheta_{k}(\alpha) :=xkx2xk+1x2\displaystyle:=\|x^{k}-x^{*}\|^{2}-\|x^{k+1}-x^{*}\|^{2}
=2α(xkx)Td(xk,x~k,βk)α2d(xk,x~k,βk)2\displaystyle=2\alpha(x^{k}-x^{*})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}
2αϕ(xk,x~k,βk)α2d(xk,x~k,βk)2:=qk(α).\displaystyle\geq 2\alpha\phi(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}:=q_{k}(\alpha).

Note that q_k(α) attains its maximum at α_k^* = φ(x^k,x̃^k,β^k)/‖d(x^k,x̃^k,β^k)‖^2; thus, setting x^{k+1} = x^k - γα_k^* d(x^k,x̃^k,β^k) with γ ∈ (0,2), one sees that

xk+1x2\displaystyle\|x^{k+1}-x^{*}\|^{2} xkx2γ(2γ)αkϕ(xk,x~k,βk)\displaystyle\leq\|x^{k}-x^{*}\|^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\phi(x^{k},\tilde{x}^{k},\beta^{k})
=xkx2γ(2γ)αkxkx~k2.\displaystyle=\|x^{k}-x^{*}\|^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\|x^{k}-\tilde{x}^{k}\|^{2}. (20)

Note that α_k^* = φ(x^k,x̃^k,β^k)/‖d(x^k,x̃^k,β^k)‖^2 ≥ 1/‖I+β^k M^T‖_2^2. Thus, the boundedness of {β^k}_k implies that inf{α_k^*}_k > 0.

The aforementioned algorithm, denoted as PGAa1\text{PGA}_{a_{1}}, is summarized in Algorithm 2.

Input: x0nx^{0}\in\mathbb{R}^{n}, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: Selecting βk>0\beta_{k}>0, and x~k=Proxβkθ(xkβkF(xk))\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(x^{k})) ;
4      Corrector: Setting d(xk,x~k,βk)=(I+βkMT)(xkx~k)d(x^{k},\tilde{x}^{k},\beta^{k})=(I+\beta^{k}M^{T})(x^{k}-\tilde{x}^{k}), and computing αk=xkx~k2d(xk,x~k,βk)2\alpha_{k}^{*}=\frac{\|x^{k}-\tilde{x}^{k}\|^{2}}{\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}}.
5      Setting xk+1=xkγαkd(xk,x~k,βk)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k},\beta^{k}).
6 end for
7
Algorithm 2 PGAa1\text{PGA}_{a_{1}} for MGVI(θ,F)\text{MGVI}(\theta,F)
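For concreteness, the following minimal MATLAB sketch runs PGA_{a_1} with a constant β on the illustrative choices F(x) = Mx + q with M = C^T C (positive semi-definite) and θ(x) = λ‖x‖_1; all data and names here are our own assumptions, not the paper's test instance.

rng(1); n = 50; C = randn(30,n); M = C'*C; q = randn(n,1); lam = 1;   % illustrative data
prox  = @(x,t) sign(x).*max(abs(x) - t*lam, 0);   % Prox_{t*lam*||.||_1}
beta  = 1/norm(M,2); gamma = 1.8;                 % constant beta > 0, gamma in (0,2)
x = zeros(n,1);
for k = 1:5000
    xt = prox(x - beta*(M*x + q), beta);          % predictor
    if norm(x - xt) < 1e-10, break; end           % x is (approximately) a solution, cf. Lemma 2
    d     = (x - xt) + beta*(M'*(x - xt));        % d = (I + beta*M')*(x - xt)
    alpha = norm(x - xt)^2 / norm(d)^2;           % alpha_k^*
    x     = x - gamma*alpha*d;                    % corrector
end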

Next, we construct another proximity and contraction algorithm, based on the additional assumption that M is symmetric positive semi-definite. In this case, set φ(x^k,x̃^k) := ‖x^k - x̃^k‖^2, d(x^k,x̃^k) := x^k - x̃^k, G := I + βM with a fixed β > 0, and x^{k+1} = x^k - α d(x^k,x̃^k); then one sees from (19) that (x^k - x^*)^T G d(x^k,x̃^k) ≥ φ(x^k,x̃^k). Letting ϑ_k(α) := ‖x^k - x^*‖_G^2 - ‖x^{k+1} - x^*‖_G^2, one sees that

\vartheta_{k}(\alpha)=2\alpha(x^{k}-x^{*})^{T}Gd(x^{k},\tilde{x}^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}\geq 2\alpha\phi(x^{k},\tilde{x}^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}=:q_{k}(\alpha).

Note that qk(α)q_{k}(\alpha) obtains its maximum at αk=ϕ(xk,x~k)d(xk,x~k)G2\alpha_{k}^{*}=\frac{\phi(x^{k},\tilde{x}^{k})}{\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}}, thus by setting xk+1=xkγαkd(xk,x~k)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k}), one sees that

\|x^{k+1}-x^{*}\|_{G}^{2}\leq\|x^{k}-x^{*}\|_{G}^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\phi(x^{k},\tilde{x}^{k})=\|x^{k}-x^{*}\|_{G}^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\|x^{k}-\tilde{x}^{k}\|^{2}.

Note that αk=ϕ(xk,x~k)d(xk,x~k)G21λmax(G)\alpha_{k}^{*}=\frac{\phi(x^{k},\tilde{x}^{k})}{\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}}\geq\frac{1}{\lambda_{\text{max}}(G)}.

The above method for symmetric MM, denoted as PGAa2\text{PGA}_{a_{2}}, is summarized below in Algorithm 3.

Input: x0nx^{0}\in\mathbb{R}^{n}, β>0\beta>0, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: x~k=Proxβθ(xkβF(xk))\tilde{x}^{k}=\text{Prox}_{\beta\theta}(x^{k}-\beta F(x^{k})) ;
4      Corrector: Setting d(xk,x~k)=xkx~kd(x^{k},\tilde{x}^{k})=x^{k}-\tilde{x}^{k}, and computing αk=xkx~k2d(xk,x~k)G2\alpha_{k}^{*}=\frac{\|x^{k}-\tilde{x}^{k}\|^{2}}{\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}},
5      Setting xk+1=xkγαkd(xk,x~k)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k}).
6 end for
7
Algorithm 3 PGAa2\text{PGA}_{a_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F)
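A minimal MATLAB sketch of PGA_{a_2}, under the additional assumption that M is symmetric positive semi-definite; the data below, M = C^T C and θ = λ‖·‖_1, are our own illustration.

rng(2); n = 50; C = randn(30,n); M = C'*C; q = randn(n,1); lam = 1;   % illustrative data
prox  = @(x,t) sign(x).*max(abs(x) - t*lam, 0);
beta  = 1/norm(M,2); gamma = 1.8; G = eye(n) + beta*M;   % G = I + beta*M
x = zeros(n,1);
for k = 1:5000
    xt = prox(x - beta*(M*x + q), beta);   % predictor
    d  = x - xt;
    if norm(d) < 1e-10, break; end
    alpha = (d'*d) / (d'*(G*d));           % ||x - xt||^2 / ||d||_G^2
    x = x - gamma*alpha*d;                 % corrector
end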
Theorem 4.1 (Convergence of PGAa1\text{PGA}_{a_{1}} and PGAa2\text{PGA}_{a_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F))

Let {xk}k\{x^{k}\}_{k} and {yk}k\{y^{k}\}_{k} be the sequences generated by PGAa1\text{PGA}_{a_{1}} and PGAa2\text{PGA}_{a_{2}}, respectively, and for PGAa1\text{PGA}_{a_{1}}, let {βk}k\{\beta^{k}\}_{k} be bounded with inf{βk}>0\inf\{\beta_{k}\}>0. Then {xk}k\{x^{k}\}_{k} (resp. {yk}k)\{y^{k}\}_{k}) converges to a solution of the corresponding MGVI(θ,F)\text{MGVI}(\theta,F).

Proof

We will just prove the convergence of PGAa1\text{PGA}_{a_{1}} since the proof of the convergence of PGAa2\text{PGA}_{a_{2}} follows the similar argument.

Let x^* ∈ Ω^* be any solution of MGVI(θ,F). Inequality (20) implies that {x^k}_k is Fejér monotone with respect to Ω^*; moreover, summing it from k = 0 to n, rearranging, and letting n → ∞, we obtain

k=0γ(2γ)αkxkx~k2x0x2.\sum_{k=0}^{\infty}\gamma(2-\gamma)\alpha_{k}^{*}\|x^{k}-\tilde{x}^{k}\|^{2}\leq\|x_{0}-x^{*}\|^{2}.

Because inf{α_k^*} > 0, the above inequality implies ‖x^k - x̃^k‖ → 0, from which we conclude that {x^k}_k and {x̃^k}_k have the same limit points. An argument similar to that in Theorem 3.1 then implies that {x^k}_k converges to a solution of MGVI(θ,F). ∎

Remark 3

To guarantee that PGA_{a_1} (Algorithm 2) converges, {β^k}_k is required to be bounded with inf{β^k} > 0. The adaptive rule

βkMT(xkx~k)=𝒪(xkx~k)\|\beta^{k}M^{T}(x^{k}-\tilde{x}^{k})\|=\mathcal{O}(\|x^{k}-\tilde{x}^{k}\|)

was suggested by He for MVI(θ,F,C) and can also be applied to our algorithms. For PGA_{a_2}, where β^k ≡ β, convergence is guaranteed for any positive constant β; one reasonable choice is β‖M‖_2 ≈ 1.

4.2 Proximity and contraction algorithms based on (FI1), (FI2) and (FI3)

In this part, we consider the general case where F is monotone and Lipschitz continuous, as in Assumption 1. Adding (FI1), (FI2) and (FI3) together, one obtains that

(x~x)T{(xx~)β(F(x)F(x~))}0,(\tilde{x}-x^{*})^{T}\{(x-\tilde{x})-\beta(F(x)-F(\tilde{x}))\}\geq 0,

thus it holds that

(xx)T{(xx~)β(F(x)F(x~))}(xx~)T{(xx~)β(F(x)F(x~))}.(x-x^{*})^{T}\{(x-\tilde{x})-\beta(F(x)-F(\tilde{x}))\}\geq(x-\tilde{x})^{T}\{(x-\tilde{x})-\beta(F(x)-F(\tilde{x}))\}. (21)

Replacing the above xx, x~\tilde{x}, β\beta by xkx^{k}, x~k\tilde{x}^{k}, βk\beta^{k} respectively, and setting

ϕ(xk,x~k,βk)\displaystyle\phi(x^{k},\tilde{x}^{k},\beta^{k}) :=(xkx~k)T{(xkx~k)βk(F(xk)F(x~k))},\displaystyle:=(x^{k}-\tilde{x}^{k})^{T}\{(x^{k}-\tilde{x}^{k})-\beta^{k}(F(x^{k})-F(\tilde{x}^{k}))\},
d(xk,x~k,βk)\displaystyle d(x^{k},\tilde{x}^{k},\beta^{k}) :=(xkx~k)βk(F(xk)F(x~k)),\displaystyle:=(x^{k}-\tilde{x}^{k})-\beta^{k}(F(x^{k})-F(\tilde{x}^{k})),

then one sees that (x^k - x^*)^T d(x^k,x̃^k,β^k) ≥ φ(x^k,x̃^k,β^k). For a fixed ν ∈ (0,1), since F is Lipschitz continuous with constant L > 0, some β^k can be selected such that

βkF(xk)F(x~k)νxkx~k,\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\|,

thus ϕ(xk,x~k,βk)(1ν)xkx~k2\phi(x^{k},\tilde{x}^{k},\beta^{k})\geq(1-\nu)\|x^{k}-\tilde{x}^{k}\|^{2}. By setting xk+1=xkαd(xk,x~k,βk)x^{k+1}=x^{k}-\alpha d(x^{k},\tilde{x}^{k},\beta^{k}), one sees that

ϑk(α)\displaystyle\vartheta_{k}(\alpha) :=xkx2xk+1x2\displaystyle:=\|x^{k}-x^{*}\|^{2}-\|x^{k+1}-x^{*}\|^{2}
=2α(xkx)Td(xk,x~k,βk)α2d(xk,x~k,βk)2\displaystyle=2\alpha(x^{k}-x^{*})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}
2αϕ(xk,x~k,βk)α2d(xk,x~k,βk)2:=qk(α).\displaystyle\geq 2\alpha\phi(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}:=q_{k}(\alpha).

Note that qk(α)q_{k}(\alpha) obtains its maximum at αk=ϕ(xk,x~k,βk)d(xk,x~k,βk)2\alpha_{k}^{*}=\frac{\phi(x^{k},\tilde{x}^{k},\beta^{k})}{\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}}, and one also sees that

\begin{array}{ll}2\phi(x^{k},\tilde{x}^{k},\beta^{k})-\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}&=2(x^{k}-\tilde{x}^{k})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})-\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}\\ &=d(x^{k},\tilde{x}^{k},\beta^{k})^{T}\{2(x^{k}-\tilde{x}^{k})-d(x^{k},\tilde{x}^{k},\beta^{k})\}\\ &=\|x^{k}-\tilde{x}^{k}\|^{2}-(\beta^{k})^{2}\|F(x^{k})-F(\tilde{x}^{k})\|^{2}\\ &\geq(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2},\end{array}

thus αk>12\alpha_{k}^{*}>\frac{1}{2} for all kk. Setting xk+1=xkγαkd(xk,x~k,βk)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k},\beta^{k}), then one obtains that

xk+1x2\displaystyle\|x^{k+1}-x^{*}\|^{2} xkx2(2γ)γαkϕ(xk,x~k,βk)\displaystyle\leq\|x^{k}-x^{*}\|^{2}-(2-\gamma)\gamma\alpha_{k}^{*}\phi(x^{k},\tilde{x}^{k},\beta^{k})
xkx2(2γ)γαk(1ν)xkx~k2.\displaystyle\leq\|x^{k}-x^{*}\|^{2}-(2-\gamma)\gamma\alpha_{k}^{*}(1-\nu)\|x^{k}-\tilde{x}^{k}\|^{2}.

The aforementioned algorithm, denoted as PGAb1\text{PGA}_{b_{1}}, is summarized in Algorithm 4.

Input: x0nx^{0}\in\mathbb{R}^{n}, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: Selecting βk>0\beta^{k}>0 such that βkF(xk)F(x~k)νxkx~k\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\| where x~k=Proxβkθ(xkβkF(xk))\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(x^{k})) ;
4      Corrector: Setting d(xk,x~k,βk)=(xkx~k)βk(F(xk)F(x~k))d(x^{k},\tilde{x}^{k},\beta^{k})=(x^{k}-\tilde{x}^{k})-\beta^{k}(F(x^{k})-F(\tilde{x}^{k})), and computing αk=(xkx~k)Td(xk,x~k,βk)d(xk,x~k,βk)2\alpha_{k}^{*}=\frac{(x^{k}-\tilde{x}^{k})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})}{\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}},
5      Setting xk+1=xkγαkd(xk,x~k,βk)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k},\beta^{k}).
6 end for
7
Algorithm 4 PGAb1\text{PGA}_{b_{1}} for MGVI(θ,F)\text{MGVI}(\theta,F)
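A minimal MATLAB sketch of PGA_{b_1} with a simple backtracking rule for β^k; this adaptive rule, as well as the data F(x) = A^T(Ax - b) and θ = λ‖·‖_1, are illustrative assumptions of ours rather than the paper's exact scheme.

rng(3); m = 40; n = 60; A = randn(m,n); b = randn(m,1); lam = 1;   % illustrative data
F    = @(x) A'*(A*x - b);
prox = @(x,t) sign(x).*max(abs(x) - t*lam, 0);
nu = 0.9; gamma = 1.8; beta = 1;
x = ones(n,1);
for k = 1:5000
    xt = prox(x - beta*F(x), beta);                   % predictor
    while beta*norm(F(x) - F(xt)) > nu*norm(x - xt)   % shrink beta until the condition holds
        beta = 0.5*beta;
        xt = prox(x - beta*F(x), beta);
    end
    if norm(x - xt) < 1e-10, break; end
    d     = (x - xt) - beta*(F(x) - F(xt));
    alpha = ((x - xt)'*d) / (d'*d);
    x     = x - gamma*alpha*d;                        % corrector
    beta  = 1.5*beta;                                 % tentatively enlarge beta
end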

Finally, we distinguish a special case where F(x)=Ax+cF(x)=Ax+c with AA being a symmetric positive semi-definite matrix, and cnc\in\mathbb{R}^{n} being a constant vector. In this case, inequality (21) takes the next form

(xx)T(IβA)(xx~)(xx~)T(IβA)(xx~).(x-x^{*})^{T}(I-\beta A)(x-\tilde{x})\geq(x-\tilde{x})^{T}(I-\beta A)(x-\tilde{x}).

Setting G := I - βA with β a fixed positive constant smaller than 1/λ_max(A), G is a symmetric positive definite matrix. Replacing x and x̃ above by x^k and x̃^k, respectively, one sees that

(xkx)TG(xkx~k)(xkx~k)TG(xkx~k).(x^{k}-x^{*})^{T}G(x^{k}-\tilde{x}^{k})\geq(x^{k}-\tilde{x}^{k})^{T}G(x^{k}-\tilde{x}^{k}).

Let xk+1=xkγ(xkx~k)x^{k+1}=x^{k}-\gamma(x^{k}-\tilde{x}^{k}), thus it follows that

xk+1xG2xkxG2(2γ)γxkx~kG2.\|x^{k+1}-x^{*}\|_{G}^{2}\leq\|x^{k}-x^{*}\|_{G}^{2}-(2-\gamma)\gamma\|x^{k}-\tilde{x}^{k}\|_{G}^{2}.

The above method for symmetric A is denoted PGA_{b_2}; this corrector strategy was not mentioned by He, but it can easily be constructed from (21). We summarize it below in Algorithm 5.

Input: x0nx^{0}\in\mathbb{R}^{n}, β>0\beta>0, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: x~k=Proxβθ(xkβF(xk))\tilde{x}^{k}=\text{Prox}_{\beta\theta}(x^{k}-\beta F(x^{k})) ;
4      Corrector: Setting d(xk,x~k)=xkx~kd(x^{k},\tilde{x}^{k})=x^{k}-\tilde{x}^{k},
5      Setting xk+1=xkγd(xk,x~k)x^{k+1}=x^{k}-\gamma d(x^{k},\tilde{x}^{k}).
6 end for
7
Algorithm 5 PGAb2\text{PGA}_{b_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F)

The convergence of PGA_{b_1} and PGA_{b_2} can be proved by essentially the same argument as in Theorem 4.1.

Theorem 4.2 (Convergence of PGAb1\text{PGA}_{b_{1}} and PGAb2\text{PGA}_{b_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F))

Let {xk}k\{x^{k}\}_{k} and {yk}k\{y^{k}\}_{k} be the sequences generated by PGAb1PGA_{b_{1}} and PGAb2PGA_{b_{2}}, respectively, and for PGAb1PGA_{b_{1}}, let {βk}k\{\beta^{k}\}_{k} be bounded with inf{βk}>0\inf\{\beta_{k}\}>0. Then {xk}k\{x^{k}\}_{k} (resp. {yk}k)\{y^{k}\}_{k}) converges to a solution of the corresponding MGVI(θ,F)\text{MGVI}(\theta,F).

Remark 4

In the description of PGA_{a_1}, PGA_{a_2} and PGA_{b_1}, we have neglected the possibility that x^k = x̃^k. This rarely happens in practice; moreover, if x^k = x̃^k, then x^k is already a solution of MGVI(θ,F) by Lemma 2, so when implementing these algorithms one may first check whether x^k = x̃^k and terminate in that case.

5 A Discussion about Our Algorithms

In the preceding sections we introduced our algorithms for MGVI(θ,F); in this section we illustrate their characteristics on the following separable two-block convex optimization model:

minθ1(x)+θ2(y)s.t.Ax+By=c,\begin{split}\min\quad&\theta_{1}(x)+\theta_{2}(y)\\ \text{s.t.}\quad&Ax+By=c,\\ \end{split} (22)

where Am×nA\in\mathbb{R}^{m\times n}, Bm×qB\in\mathbb{R}^{m\times q}, cmc\in\mathbb{R}^{m}, θ1:n(,]\theta_{1}:\mathbb{R}^{n}\rightarrow(-\infty,\infty] and θ2:q(,]\theta_{2}:\mathbb{R}^{q}\rightarrow(-\infty,\infty] are two closed proper convex functions.

The well-known alternating direction method of multipliers (ADMM) MR724072 is very popular for problem (22); the reader can refer to BoydADMM and the references therein for efficient applications of ADMM in a variety of fields. The procedure of ADMM is shown below in Algorithm 6.

1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, and ρ>0\rho>0.
General step: for k=0,1,k=0,1,\cdots execute the following:
  1. (a)

    xk+1argmin{θ1(x)+ρ2Ax+Bykc+1ρλk2:xn}x^{k+1}\in\operatorname*{argmin}\{\theta_{1}(x)+\frac{\rho}{2}\|Ax+By^{k}-c+\frac{1}{\rho}\lambda^{k}\|^{2}:x\in\mathbb{R}^{n}\};

  2. (b)

    yk+1argmin{θ2(y)+ρ2Axk+1+Byc+1ρλk2:yq}y^{k+1}\in\operatorname*{argmin}\{\theta_{2}(y)+\frac{\rho}{2}\|Ax^{k+1}+By-c+\frac{1}{\rho}\lambda^{k}\|^{2}:y\in\mathbb{R}^{q}\};

  3. (c)

    λk+1=λk+ρ(Axk+1+Byk+1c)\lambda^{k+1}=\lambda^{k}+\rho(Ax^{k+1}+By^{k+1}-c).

Algorithm 6 ADMM for problem (22)

Note that computing x^{k+1} and y^{k+1} in the general step of ADMM may be much more involved than computing the proximity operators of θ1 and θ2, so several variants of ADMM have been proposed to address this. One of them is the alternating direction proximal method of multipliers (AD-PMM) MR3719240; a special case of AD-PMM is the alternating direction linearized proximal method of multipliers (AD-LPMM), which only requires the computation of the proximity operators of θ1 and θ2. AD-LPMM for problem (22) is shown in Algorithm 7.

1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, ρ>0\rho>0, αρλmax(ATA)\alpha\geq\rho\lambda_{\max}(A^{T}A), βρλmax(BTB)\beta\geq\rho\lambda_{\max}(B^{T}B).
General step: for k=0,1,k=0,1,\cdots execute the following:
  1. (a)

    xk+1=Prox1αθ1[xkραAT(Axk+Bykc+1ρλk)]x^{k+1}=\text{Prox}_{\frac{1}{\alpha}\theta_{1}}[x^{k}-\frac{\rho}{\alpha}A^{T}(Ax^{k}+By^{k}-c+\frac{1}{\rho}\lambda^{k})];

  2. (b)

    yk+1=Prox1βθ2[ykρβBT(Axk+1+Bykc+1ρλk)]y^{k+1}=\text{Prox}_{\frac{1}{\beta}\theta_{2}}[y^{k}-\frac{\rho}{\beta}B^{T}(Ax^{k+1}+By^{k}-c+\frac{1}{\rho}\lambda^{k})];

  3. (c)

    λk+1=λk+ρ(Axk+1+Byk+1c)\lambda^{k+1}=\lambda^{k}+\rho(Ax^{k+1}+By^{k+1}-c).

Algorithm 7 AD-LPMM for problem (22)

Next, we introduce our algorithms for problem (22). Consider the Lagrangian function of problem (22):

L(x,y,λ)=θ1(x)+θ2(y)λT(Ax+Byc).L(x,y,\lambda)=\theta_{1}(x)+\theta_{2}(y)-\lambda^{T}(Ax+By-c).

Then (x^*, y^*, λ^*) ∈ ℝ^n × ℝ^q × ℝ^m is a saddle point of problem (22) if and only if

L(x^{*},y^{*},\lambda)\leq L(x^{*},y^{*},\lambda^{*})\leq L(x,y,\lambda^{*}),\quad \forall\,\lambda\in\mathbb{R}^{m},\ \forall\,(x,y)\in\mathbb{R}^{n}\times\mathbb{R}^{q},

which is equivalent to

\Big[\theta_{1}(x)+\theta_{2}(y)\Big]-\Big[\theta_{1}(x^{*})+\theta_{2}(y^{*})\Big]+\Bigg[\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}-\begin{pmatrix}x^{*}\\ y^{*}\\ \lambda^{*}\end{pmatrix}\Bigg]^{T}\Bigg[\begin{pmatrix}0&0&-A^{T}\\ 0&0&-B^{T}\\ A&B&0\end{pmatrix}\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ 0\\ -c\end{pmatrix}\Bigg]\geq 0

for all (x,y,λ) ∈ ℝ^n × ℝ^q × ℝ^m. This fits MGVI(θ,F) with

\theta(x,y,\lambda)=\theta_{1}(x)+\theta_{2}(y),\qquad F(x,y,\lambda)=\begin{pmatrix}0&0&-A^{T}\\ 0&0&-B^{T}\\ A&B&0\end{pmatrix}\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ 0\\ -c\end{pmatrix}.

For the theory of saddle points, one can see, e.g., MR2184037. We only need to keep in mind that if (x^*, y^*, λ^*) is a saddle point of problem (22), then (x^*, y^*) is a minimizer of problem (22).
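For concreteness, here is a minimal MATLAB sketch of how the operator F of this MGVI can be assembled for given A, B, c (dimensions and variable names are our own illustration). Since the block matrix below is skew-symmetric, F is monotone, and the predictor step of our algorithms only needs this F together with the proximity operators of θ1 and θ2.

m = 5; n = 8; q = 6;
A = randn(m,n); B = randn(m,q); c = randn(m,1);
% Block operator of the MGVI reformulation; note Mblk' = -Mblk (skew-symmetric),
% so (w1-w2)'*(F(w1)-F(w2)) = 0 and F is monotone.
Mblk = [zeros(n,n), zeros(n,q), -A';
        zeros(q,n), zeros(q,q), -B';
        A,          B,          zeros(m,m)];
qvec = [zeros(n,1); zeros(q,1); -c];
F = @(w) Mblk*w + qvec;             % w = [x; y; lambda]
w0 = zeros(n+q+m,1); Fw0 = F(w0);   % example evaluation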

Note that PGA_{a_2} and PGA_{b_2} are designed for specific forms of F and cannot be applied to this problem. Moreover, owing to the special form of F in problem (22), the resulting PGA_{a_1} and PGA_{b_1} differ only in the computation of α_k^* when the same self-adaptive rule is employed. For these reasons, we first present GEM and PGA_{a_1} in Algorithms 8 and 9, respectively, where a self-adaptive rule for {β^k}_k is also included. The computation of α_k^* in PGA_{b_1} for problem (22) is given in Algorithm 10.

1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, β0>0\beta^{0}>0, and ν,μ(0,1)\nu,\mu\in(0,1) with μ<ν\mu<\nu.
2for k=0,1,k=0,1,\cdots do
3      
4      x~k=Proxβkθ1(xk+βkATλk);\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta_{1}}(x^{k}+\beta^{k}A^{T}\lambda^{k});
5      y~k=Proxβkθ2(yk+βkBTλk)\tilde{y}^{k}=\text{Prox}_{\beta^{k}\theta_{2}}(y^{k}+\beta^{k}B^{T}\lambda^{k});
6      λ~k=λkβk(Axk+Bykc)\tilde{\lambda}^{k}=\lambda^{k}-\beta^{k}(Ax^{k}+By^{k}-c);
7      rk=βk(AT(λkλ~k)BT(λkλ~k)A(xkx~k)+B(yky~k))/(xkx~kyky~kλkλ~k);r_{k}=\beta^{k}\Bigg{\|}\begin{pmatrix}A^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ B^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})\end{pmatrix}\Bigg{\|}\Bigg{/}\Bigg{\|}\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\\ \end{pmatrix}\Bigg{\|};
8       if rk>νr_{k}>\nu then
9             βk:=23βkmin{1,1rk}\beta^{k}:=\frac{2}{3}*\beta^{k}\min\{1,\frac{1}{r_{k}}\};
10            Goto line 3.
11       end
12      
13      xk+1=Proxβkθ1(xk+βkATλ~k);x^{k+1}=\text{Prox}_{\beta^{k}\theta_{1}}(x^{k}+\beta^{k}A^{T}\tilde{\lambda}^{k});
14      yk+1=Proxβkθ2(yk+βkBTλ~k)y^{k+1}=\text{Prox}_{\beta^{k}\theta_{2}}(y^{k}+\beta^{k}B^{T}\tilde{\lambda}^{k});
15      λk+1=λkβk(Ax~k+By~kc)\lambda^{k+1}=\lambda^{k}-\beta^{k}(A\tilde{x}^{k}+B\tilde{y}^{k}-c);
16       if rkμr_{k}\leq\mu then
17             βk+1:=1.5βk\beta^{k+1}:=1.5*\beta^{k};
18      else
19             βk+1:=βk\beta^{k+1}:=\beta^{k}.
20       end
21      
22 end for
Algorithm 8 GEM for problem (22)
1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, β0>0\beta^{0}>0, ν(0,1)\nu\in(0,1), and γ(0,2)\gamma\in(0,2).
2for k=0,1,k=0,1,\cdots do
3       x~k=Proxβkθ1(xk+βkATλk);\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta_{1}}(x^{k}+\beta^{k}A^{T}\lambda^{k});
4      y~k=Proxβkθ2(yk+βkBTλk);\tilde{y}^{k}=\text{Prox}_{\beta^{k}\theta_{2}}(y^{k}+\beta^{k}B^{T}\lambda^{k});
5      λ~k=λkβk(Axk+Bykc);\tilde{\lambda}^{k}=\lambda^{k}-\beta^{k}(Ax^{k}+By^{k}-c);
6      rk=βk(AT(λkλ~k)BT(λkλ~k)A(xkx~k)+B(yky~k))/(xkx~kyky~kλkλ~k);r_{k}=\beta^{k}\Bigg{\|}\begin{pmatrix}A^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ B^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})\end{pmatrix}\Bigg{\|}\Bigg{/}\Bigg{\|}\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\\ \end{pmatrix}\Bigg{\|};
7      if rk>νr_{k}>\nu then
8             βk:=23βkmin{1,1rk}\beta^{k}:=\frac{2}{3}*\beta^{k}\min\{1,\frac{1}{r_{k}}\};
9            Goto line 3.
10       end if
11      
12      αk=(xkx~kyky~kλkλ~k)2/([xkx~k]+βk[AT(λkλ~k)][yky~k]+βk[BT(λkλ~k)][λkλ~k]βk[A(xkx~k)+B(yky~k)])2\alpha_{k}^{*}=\Bigg{\|}\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\end{pmatrix}\Bigg{\|}^{2}\Bigg{/}\Bigg{\|}\begin{pmatrix}[x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\\ \end{pmatrix}\Bigg{\|}^{2};
13      xk+1=xkγαk([xkx~k]+βk[AT(λkλ~k)])x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}\left([x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\right);
14      yk+1=ykγαk([yky~k]+βk[BT(λkλ~k)])y^{k+1}=y^{k}-\gamma\alpha_{k}^{*}\left([y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\right);
15      λk+1=λkγαk([λkλ~k]βk[A(xkx~k)+B(yky~k)])\lambda^{k+1}=\lambda^{k}-\gamma\alpha_{k}^{*}\left([\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\right);
16       if rkμr_{k}\leq\mu then
17             βk+1:=1.5βk\beta^{k+1}:=1.5*\beta^{k};
18      else
19             βk+1:=βk\beta^{k+1}:=\beta^{k}.
20       end
21      
22 end for
23
Algorithm 9 PGAa1\text{PGA}_{a_{1}} for problem (22)
1 \cdots
2 \alpha_{k}^{*}=\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\end{pmatrix}^{T}\begin{pmatrix}[x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\end{pmatrix}\Bigg{/}\Bigg{\|}\begin{pmatrix}[x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\end{pmatrix}\Bigg{\|}^{2};
3 \cdots
Algorithm 10 PGAb1\text{PGA}_{b_{1}} for problem (22)

In the preceding part of this section we have introduced five algorithms for problem (22). Let us now take a closer look at them. Firstly, ADMM (Algorithm 6) may not be implementable in practice because, compared with computing the proximity operators of θ1 and θ2, it may be much harder to solve the two convex subproblems

\min\{\theta_{1}(x)+\tfrac{\rho}{2}\|Ax+By^{k}-c+\tfrac{1}{\rho}\lambda^{k}\|^{2}:x\in\mathbb{R}^{n}\},\qquad \min\{\theta_{2}(y)+\tfrac{\rho}{2}\|Ax^{k+1}+By-c+\tfrac{1}{\rho}\lambda^{k}\|^{2}:y\in\mathbb{R}^{q}\}.

Next, let us compare AD-LPMM (Algorithm 7) with the three algorithms proposed by us. Firstly, at iteration k, AD-LPMM computes (x^{k+1}, y^{k+1}, λ^{k+1}) sequentially, while our three algorithms compute the blocks of (x̃^k, ỹ^k, λ̃^k) and of (x^{k+1}, y^{k+1}, λ^{k+1}) in parallel; this is a significant characteristic of our algorithms, reducing computation time via parallel computation, especially for large-scale multi-block separable problems (see the sketch below). Secondly, although all four algorithms use the proximity operators of θ1 and θ2, AD-LPMM involves more computation when updating x and y because it must form A^T(Ax^k + By^k - c + λ^k/ρ) and B^T(Ax^{k+1} + By^k - c + λ^k/ρ), whereas our algorithms only require A^T λ or B^T λ. Finally, AD-LPMM additionally needs the maximal eigenvalues of A^T A and B^T B in advance, which is not necessary for our algorithms. Comparing our algorithms PGA_{a_1}, PGA_{b_1} and GEM with each other, the predictor steps are identical: all of them use the proximity operators of θ1 and θ2 to form the predictors. In the corrector step, GEM again applies the proximity operators to obtain the next iterate (x^{k+1}, y^{k+1}, λ^{k+1}), while the other two algorithms involve only cheap matrix-vector products and additions. Thus, when evaluating the proximity operators of θ1 and θ2 requires iterative techniques, the corrector step of GEM is more expensive than that of PGA_{a_1} and PGA_{b_1}. Finally, we point out that the value of α_k^* in PGA_{b_1} is less than or equal to its counterpart in PGA_{a_1}.
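A minimal MATLAB sketch of one (parallelizable) predictor step of Algorithms 8 and 9 for problem (22), with the illustrative choices θ1 = ‖·‖_1 and θ2 = (1/2)‖·‖^2 (our own assumptions). Each of the three blocks uses only (x^k, y^k, λ^k), so on a multicore machine they could be dispatched independently, e.g., via parfeval from the Parallel Computing Toolbox.

m = 5; n = 8; q = 6; A = randn(m,n); B = randn(m,q); c = randn(m,1);   % illustrative data
prox1 = @(v,t) sign(v).*max(abs(v)-t,0);    % Prox_{t*theta1}, theta1 = ||.||_1
prox2 = @(v,t) v/(1+t);                     % Prox_{t*theta2}, theta2 = 0.5*||.||^2
xk = zeros(n,1); yk = zeros(q,1); lamk = zeros(m,1); betak = 0.1;
xt   = prox1(xk + betak*(A'*lamk), betak);             % block 1
yt   = prox2(yk + betak*(B'*lamk), betak);             % block 2
lamt = lamk - betak*(A*xk + B*yk - c);                 % block 3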

6 Numerical Simulations

In this section, we report numerical results for our proposed algorithms. The algorithms are implemented in MATLAB, and the experiments are performed on a laptop with an Intel i5-6200U 2.30 GHz CPU and 8 GB of RAM.

6.1 First Example

Consider the following l1-regularized least squares problem

minxnf(x):=12Axb2+λx1,\min_{x\in\mathbb{R}^{n}}f(x):=\frac{1}{2}\|Ax-b\|^{2}+\lambda\|x\|_{1}, (23)

where Am×nA\in\mathbb{R}^{m\times n}, bmb\in\mathbb{R}^{m}, and λ>0\lambda>0. This problem fits MGVI(θ,F)\text{MGVI}(\theta,F) with

θ(x)=λx1,F(x)=AT(Axb).\begin{array}[]{l}\theta(x)=\lambda\|x\|_{1},\\ F(x)=A^{T}(Ax-b).\end{array}

We generate an instance of the problem with λ = 1 and A ∈ ℝ^{1000×1100} to compare the iterative shrinkage-thresholding algorithm (ISTA) with our proposed algorithms GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2}. The entries of A are generated independently from the standard normal distribution; the "true" vector x_true is a sparse vector with 20 non-zero entries, generated by the following MATLAB commands:

x_true = zeros(n,1);
x_true(3:8:80) = 1;
x_true(7:8:80) = -1;

and we then set b := A*x_true. The initial point for all algorithms is x = e, the vector of all ones. The stopping criterion for all of them is

xkProxθ(xkF(xk))<106.\|x^{k}-\text{Prox}_{\theta}(x^{k}-F(x^{k}))\|_{\infty}<10^{-6}.

The trends of the objective value f for ISTA, GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2} on the l1-regularized least squares problem are shown in Fig. 1. The number of iterations and the CPU time for each of them are reported in Table 1. All of them recover the "true" vector.

Figure 1: Trends of f for ISTA, GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2} on the l1-regularized least squares problem.
Table 1: The number of iterations and CPU time of ISTA, GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2} on the l1-regularized least squares problem.
Algorithm | No. of iterations | CPU time (seconds)
ISTA | 1739 | 5.36
GEM | 1682 | 11.10
PGA_{a_1} | 1816 | 24.93
PGA_{a_2} | 822 | 4.20
PGA_{b_1} | 1157 | 7.31
PGA_{b_2} | 1085 | 3.38

6.2 Second Example

Consider the following basis pursuit problem

minx1s.t.Ax=b,\begin{split}\min\quad&\|x\|_{1}\\ \text{s.t.}\quad&Ax=b,\\ \end{split} (24)

where A ∈ ℝ^{m×n} and b ∈ ℝ^m. This problem matches MGVI(θ,F) with θ(x,λ) = ‖x‖_1 and F(x,\lambda)=\begin{pmatrix}0&-A^{T}\\ A&0\end{pmatrix}\begin{pmatrix}x\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ -b\end{pmatrix}. We use the same data as in the preceding example to compare AD-LPMM with our algorithms GEM, PGA_{a_1}, PGA_{b_1}. The initial point for all algorithms is x = e, the vector of all ones, and λ = 0, the vector of all zeros. The stopping criterion for all algorithms except AD-LPMM is ‖(x^k,λ^k) - Prox_θ((x^k,λ^k) - F(x^k,λ^k))‖_∞ < 10^{-6}, while that for AD-LPMM is ‖(x^{k+1},λ^{k+1}) - (x^k,λ^k)‖_∞ < 10^{-6}. The results of AD-LPMM, GEM, PGA_{a_1}, PGA_{b_1} on the basis pursuit problem, namely the number of iterations and the CPU time, are reported in Table 2. All of these algorithms recover the "true" vector.

Table 2: The number of iterations and CPU time of AD-LPMM, GEM, PGA_{a_1}, PGA_{b_1} on the basis pursuit problem.
Algorithm | No. of iterations | CPU time (seconds)
AD-LPMM | 1773 | 18.65
GEM | 105 | 4.65
PGA_{a_1} | 225 | 7.58
PGA_{b_1} | 226 | 7.05

7 Conclusion

In this paper, we propose several PGA-based Predictor-Corrector algorithms for solving MGVI(θ,F). Our work is inspired by the extragradient method and by the projection and contraction algorithms for MVI(θ,F,C). Our algorithms have a simple form, and a significant characteristic for separable multi-block convex optimization problems is that they are well suited to parallel computation, which allows them to be applied in large-scale settings. Preliminary simulations on sparsity recovery models show their effectiveness compared with some existing algorithms.

Acknowledgements.
My special thanks should go to my friend, Kuan Lu, who has put considerable time and effort into his comments on the draft.

References

  • (1) Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, second edn. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, Cham (2017). DOI 10.1007/978-3-319-48311-5. URL https://doi.org/10.1007/978-3-319-48311-5. With a foreword by Hédy Attouch
  • (2) Beck, A.: First-order methods in optimization, MOS-SIAM Series on Optimization, vol. 25. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA (2017). DOI 10.1137/1.9781611974997.ch1. URL https://doi.org/10.1137/1.9781611974997.ch1
  • (3) Bertsekas, D.P.: Convex analysis and optimization. Athena Scientific, Belmont, MA (2003). With Angelia Nedić and Asuman E. Ozdaglar
  • (4) Boyd, S., Parikh, N., Chu, E., Borja, P., Jonathan, E.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1–122 (2011)
  • (5) Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009). DOI 10.1137/060657704. URL https://doi.org/10.1137/060657704
  • (6) Candès, E., Tao, T.: Rejoinder: “The Dantzig selector: statistical estimation when pp is much larger than nn” [Ann. Statist. 35 (2007), no. 6, 2313–2351; mr2382644]. Ann. Statist. 35(6), 2392–2404 (2007). DOI 10.1214/009053607000000532. URL https://doi.org/10.1214/009053607000000532
  • (7) Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM Rev. 43(1), 129–159 (2001). DOI 10.1137/S003614450037906X. URL https://doi.org/10.1137/S003614450037906X. Reprinted from SIAM J. Sci. Comput. 20 (1998), no. 1, 33–61 (electronic) [ MR1639094 (99h:94013)]
  • (8) Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-point algorithms for inverse problems in science and engineering, Springer Optim. Appl., vol. 49, pp. 185–212. Springer, New York (2011). DOI 10.1007/978-1-4419-9569-8˙10. URL https://doi.org/10.1007/978-1-4419-9569-8_10
  • (9) Fang, S.C., Peterson, E.L.: Generalized variational inequalities. J. Optim. Theory Appl. 38(3), 363–383 (1982). DOI 10.1007/BF00935344. URL https://doi.org/10.1007/BF00935344
  • (10) Fortin, M., Glowinski, R.: Augmented Lagrangian methods, Studies in Mathematics and its Applications, vol. 15. North-Holland Publishing Co., Amsterdam (1983). Applications to the numerical solution of boundary value problems, Translated from the French by B. Hunt and D. C. Spicer
  • (11) He, B.: A class of projection and contraction methods for monotone variational inequalities. Appl. Math. Optim. 35(1), 69–76 (1997). DOI 10.1007/s002459900037. URL https://doi.org/10.1007/s002459900037
  • (12) He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012). DOI 10.1137/100814494. URL https://doi.org/10.1137/100814494
  • (13) He, B., Yuan, X.: Linearized alternating direction method of multipliers with Gaussian back substitution for separable convex programming. Numer. Algebra Control Optim. 3(2), 247–260 (2013). DOI 10.3934/naco.2013.3.247. URL https://doi.org/10.3934/naco.2013.3.247
  • (14) He, B., Yuan, X., Zhang, W.: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56(3), 559–572 (2013). DOI 10.1007/s10589-013-9564-5. URL https://doi.org/10.1007/s10589-013-9564-5
  • (15) He, B.S.: A new method for a class of linear variational inequalities. Math. Programming 66(2, Ser. A), 137–144 (1994). DOI 10.1007/BF01581141. URL https://doi.org/10.1007/BF01581141
  • (16) He, B.S.: From the projection and contraction methods for variational inequalities to the splitting contraction methods for convex optimization. Numer. Math. J. Chinese Univ. 38(1), 74–96 (2016)
  • (17) He, B.S., Fu, X.L., Jiang, Z.K.: Proximal-point algorithm using a linear proximal term. J. Optim. Theory Appl. 141(2), 299–319 (2009). DOI 10.1007/s10957-008-9493-0. URL https://doi.org/10.1007/s10957-008-9493-0
  • (18) Korpelevič, G.M.: An extragradient method for finding saddle points and for other problems. Èkonom. i Mat. Metody 12(4), 747–756 (1976)
  • (19) Patriksson, M.: Nonlinear programming and variational inequality problems, Applied Optimization, vol. 23. Kluwer Academic Publishers, Dordrecht (1999). DOI 10.1007/978-1-4757-2991-7. URL https://doi.org/10.1007/978-1-4757-2991-7. A unified approach