
Yu You
Shanghai Jiao Tong University, School of Mathematical Sciences, China
Email: [email protected]

PGA-based Predictor-Corrector Algorithms for Monotone Generalized Variational Inequality

Yu You
(Received: date / Accepted: date)
Abstract

In this paper, we consider the monotone generalized variational inequality (MGVI) in which the monotone operator is Lipschitz continuous. Inspired by the extragradient method MR451121 and the projection and contraction algorithms MR1418264; MR1297058 for the monotone variational inequality (MVI), we propose a class of PGA-based Predictor-Corrector algorithms for MGVI. A significant characteristic of our algorithms is that, for separable multi-block convex optimization problems, they are well suited to parallel computation. Numerical simulations on several sparsity recovery models show the wide applicability and effectiveness of the proposed methods.

Keywords:
Monotone generalized variational inequality · Lipschitz continuity · PGA-based Predictor-Corrector algorithms · Parallel computation

1 Introduction

Let θ: ℝ^n → (-∞,∞] be a closed proper convex function, and let F: ℝ^n → ℝ^n be a vector-valued continuous mapping. The generalized variational inequality (GVI) MR686212; MR1673631 takes the following form:

x^{*}\in\mathbb{R}^{n},\quad \theta(x)-\theta(x^{*})+(x-x^{*})^{T}F(x^{*})\geq 0,\quad \forall x\in\mathbb{R}^{n}.

Here we adopt the convention that ∞ - ∞ = ∞. In this paper we focus on the case where F is monotone, i.e., (F(x) - F(y))^T (x - y) ≥ 0 for all x, y ∈ ℝ^n; the corresponding GVI is then called a monotone generalized variational inequality (MGVI). Throughout this paper, we make the following additional assumptions on MGVI(θ,F):

Assumption 1
  1. (a)

    The monotone operator FF is Lipschitz continuous with Lipschitz constant L>0L>0.

  2. (b)

    The solution set of MGVI(θ,F)\text{MGVI}(\theta,F), denoted as Ω\Omega^{*}, is nonempty.

Various convex optimization problems can be formulated as MGVIs, such as the lasso problem, the basis pursuit problem MR1854649, the basis pursuit denoising problem MR2481111, and the Dantzig selector MR2382651, to name a few. Moreover, MGVI contains the classical monotone variational inequality (MVI): setting θ = δ_C, the indicator function of C (equal to 0 if x ∈ C and ∞ otherwise), with C ⊆ ℝ^n a nonempty closed convex set, MGVI reduces to the MVI

x^{*}\in\mathcal{C},\quad (x-x^{*})^{T}F(x^{*})\geq 0,\quad \forall x\in\mathcal{C}.

For solving MVI(θ,F,C), the extragradient method MR451121 can be applied. Specifically, at iteration k, the predictor step uses the projection operator to generate a predictor x̃^k = P_C(x^k - β^k F(x^k)), where β^k is selected to satisfy a given condition; the corrector step then sets x^{k+1} = P_C(x^k - β^k F(x̃^k)). This method yields a sequence converging to a solution of MVI(θ,F,C). On the other hand, He MR1418264; MR1297058 proposed a class of projection and contraction methods for MVI(θ,F,C), which also belong to the class of Predictor-Corrector algorithms. The predictor is the same as in the extragradient method, x̃^k = P_C(x^k - β^k F(x^k)), while the corrector step is based on the following three fundamental inequalities for constructing different profitable directions:

(\tilde{x}^{k}-x^{*})^{T}\beta^{k}F(x^{*})\geq 0, (1)
(\tilde{x}^{k}-x^{*})^{T}([x^{k}-\beta^{k}F(x^{k})]-\tilde{x}^{k})\geq 0, (2)
(\tilde{x}^{k}-x^{*})^{T}(\beta^{k}F(\tilde{x}^{k})-\beta^{k}F(x^{*}))\geq 0, (3)

where x^* is any solution of MVI(θ,F,C) and β^k > 0. Numerical simulations in MR1418264; MR1297058 show that He's methods generally outperform the extragradient method.

Inspired by both of these works for MVI(θ,F,C), we generalize their ideas to MGVI(θ,F). Firstly, to generalize the extragradient method to MGVI(θ,F), at iteration k the predictor step uses the proximity operator to produce a predictor x̃^k = Prox_{β^k θ}(x^k - β^k F(x^k)), where β^k is selected to meet a given condition; the corrector step then sets x^{k+1} = Prox_{β^k θ}(x^k - β^k F(x̃^k)). The resulting algorithm can be viewed as a natural generalization of the extragradient method (EM) to MGVI(θ,F), and we therefore denote it by GEM. The only difference between GEM and EM is the operator used in the predictor and corrector steps. Surprisingly, this modification does not change the key inequality in the convergence analysis of GEM compared with that of EM. Secondly, to generalize He's projection and contraction algorithms, we again use the proximity operator to form the predictor x̃^k = Prox_{β^k θ}(x^k - β^k F(x^k)), and we likewise obtain three fundamental inequalities as follows:

\beta^{k}\theta(\tilde{x}^{k})-\beta^{k}\theta(x^{*})+(\tilde{x}^{k}-x^{*})^{T}\beta^{k}F(x^{*})\geq 0, (4)
\beta^{k}\theta(x^{*})-\beta^{k}\theta(\tilde{x}^{k})+(\tilde{x}^{k}-x^{*})^{T}([x^{k}-\beta^{k}F(x^{k})]-\tilde{x}^{k})\geq 0, (5)
(\tilde{x}^{k}-x^{*})^{T}(\beta^{k}F(\tilde{x}^{k})-\beta^{k}F(x^{*}))\geq 0, (6)

where x^* is any solution of MGVI(θ,F) and β^k > 0. The leftmost terms of (4) and (5) are opposite to each other, so they cancel when (4) and (5), or (4), (5) and (6), are added together. This yields the same result as adding (1) and (2), or (1), (2) and (3), which is exactly what He uses in the corrector step to construct the projection and contraction algorithms for MVI(θ,F,C). Consequently, the corrector methodologies developed by He for MVI(θ,F,C) carry over directly, which is an inspiring result: the only difference between He's algorithms for MVI(θ,F,C) and our proposed algorithms for MGVI(θ,F) lies in the slightly different predictors. It should be mentioned that He et al. MR2902659; MR3049509; MR3128749; MR2500836 developed diverse PPA-based contraction methods for MGVI(θ,F); one can also refer to MR3618786 for a review of the works of He et al. on MVI(θ,F,C) and MGVI(θ,F). Although their PPA-based contraction algorithms possess contraction properties, the various corrector strategies they developed for MVI(θ,F,C) are not used there, so our proposed algorithms for MGVI(θ,F) differ from theirs. Part of our contribution in this paper is to build a direct link between their earlier works on MVI(θ,F,C) and their later works on MGVI(θ,F). The above two generalizations can both be treated as Predictor-Corrector algorithms, and both use the proximity operator to make predictors; thus we call them PGA-based Predictor-Corrector algorithms.

Note that for our algorithms to be implementable in practice, the proximity operator of θ should be easy to compute; the reader can refer to MR3616647; MR3719240; MR2858838 for a wide variety of computable proximity operators. Finally, we mention that for structured convex optimization problems with a separable objective function and linear constraints, our proposed algorithms are well suited to parallel computation, which cannot be achieved by many ADMM-related algorithms. This is a significant characteristic of our algorithms, as it reduces computation time through parallel computation, especially for large-scale multi-block separable problems.

The rest of this paper is organized as follows. In section 2, we recall some notations and preliminaries. Then in section 3, we will introduce the first class of our proposed PGA-based Predictor-Corrector algorithms, which is a natural generalization of extragradient method. Next in section 4, the second class of our algorithms will be introduced. In section 5, we will discuss and compare our proposed algorithms with ADMM-related algorithms with respect to a concrete convex optimization problem. Finally in section 6, we test our proposed algorithms for different sparsity recovery models.

2 Notations and preliminaries

Let n\mathbb{R}^{n} denote the nn-dimensional Euclidean space with standard inner product ,\langle\cdot,\cdot\rangle, i.e., x,y=xTy\langle x,y\rangle=x^{T}y for any x,ynx,y\in\mathbb{R}^{n}, and \|\cdot\| denote the Euclidean norm. Given a function f:n(,]f:\mathbb{R}^{n}\rightarrow(-\infty,\infty], the set

domf:={xn:f(x)<}\text{dom}f:=\{x\in\mathbb{R}^{n}:f(x)<\infty\}

denotes its effective domain. If domf\text{dom}f\neq\emptyset, then ff is called a proper function (or proper). The set

epif:={(x,t):f(x)t,xn,t}\text{epi}f:=\{(x,t):f(x)\leq t,x\in\mathbb{R}^{n},t\in\mathbb{R}\}

denotes the epigraph of ff, and ff is closed (resp. convex) if epif\text{epi}f is closed (resp. convex). The function ff is called a closed proper convex function if it is proper, closed and convex.

Now let f: ℝ^n → (-∞,∞] be a closed proper convex function. The proximity operator of f is defined by

\text{Prox}_{f}: x \mapsto \operatorname*{argmin}\Big\{f(y)+\frac{1}{2}\|x-y\|^{2}: y\in\mathbb{R}^{n}\Big\},

where the minimizer is unique. If f = δ_C with C a nonempty closed convex set, then the proximity operator reduces to the usual projection operator P_C.
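As a concrete illustration, the following minimal MATLAB sketch (our own illustration, not part of the paper's code; the handle names prox_l1 and proj_box are hypothetical) evaluates two commonly used proximity operators: soft thresholding for f = t‖·‖_1, and the projection onto a box, which is the proximity operator of the corresponding indicator function.

% Soft thresholding: Prox_{t*||.||_1}(x) = sign(x).*max(|x|-t,0)
prox_l1  = @(x,t) sign(x).*max(abs(x)-t,0);
% Indicator of the box [l,u]: its proximity operator is the projection (clamping)
proj_box = @(x,l,u) min(max(x,l),u);
x  = [-3; 0.2; 1.5];
p1 = prox_l1(x,1)       % returns [-2; 0; 0.5]
p2 = proj_box(x,-1,1)   % returns [-1; 0.2; 1]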

Lemma 1 (MR3719240 )

Let f:n(,]f:\mathbb{R}^{n}\rightarrow(-\infty,\infty] be a closed proper convex function. Then the next three conditions are equivalent:

  1. 1.

    p=Proxf(x)p=\text{Prox}_{f}(x).

  2. 2.

    xpf(p)x-p\in\partial f(p).

  3. 3.

    For all wnw\in\mathbb{R}^{n}, it holds that

    (xp)T(pw)f(p)f(w).(x-p)^{T}(p-w)\geq f(p)-f(w).

For any β > 0, let e(x,β) := x - Prox_{βθ}(x - βF(x)). The next lemma shows that x^* ∈ ℝ^n is a solution of MGVI(θ,F) if and only if e(x^*,β) = 0.

Lemma 2

For any β > 0, x^* ∈ ℝ^n is a solution of MGVI(θ,F) if and only if e(x^*,β) = 0.

Proof

If x^* is a solution of MGVI(θ,F), then the definition of MGVI(θ,F) implies -F(x^*) ∈ ∂θ(x^*), which is the same as [x^* - βF(x^*)] - x^* ∈ ∂(βθ)(x^*); Lemma 1 then implies that

\text{Prox}_{\beta\theta}(x^{*}-\beta F(x^{*}))=x^{*},

and thus e(x^*,β) = 0. Conversely, if e(x^*,β) = 0, then Prox_{βθ}(x^* - βF(x^*)) = x^*, and Lemma 1 yields -F(x^*) ∈ ∂θ(x^*), so x^* is a solution of MGVI(θ,F). ∎

Thus, solving MGVI(θ,F) reduces to finding a zero of e(x,β). Hence, for a given β > 0, ‖e(x,β)‖ can be treated as an error measure. Next we show that ‖e(x,β)‖ is non-decreasing in β, while ‖e(x,β)‖/β is non-increasing in β; this shows that e(x,β) can be used in a stopping criterion when implementing the algorithms. The next result follows from a simple modification of the notation in the proof of Theorem 10.9 in MR3719240.

Lemma 3 (MR3719240 )

For all xnx\in\mathbb{R}^{n}, and β2β1>0\beta_{2}\geq\beta_{1}>0, it holds that

e(x,β2)e(x,β1)\|e(x,\beta_{2})\|\geq\|e(x,\beta_{1})\| (7)

and

e(x,β2)β2e(x,β1)β1.\frac{\|e(x,\beta_{2})\|}{\beta_{2}}\leq\frac{\|e(x,\beta_{1})\|}{\beta_{1}}. (8)

Before ending this part, to simplify the convergence analysis of our proposed algorithms, we recall the definition of a Fejér monotone sequence and its related properties.

Definition 1 (Fejér monotone sequence)

Let 𝒟 ⊆ ℝ^n be nonempty, and let {x^k}_k be a sequence in ℝ^n. Then {x^k}_k is Fejér monotone with respect to 𝒟 if for all k and all x ∈ 𝒟 it holds that

xk+1xxkx.\|x^{k+1}-x\|\leq\|x^{k}-x\|.
Lemma 4 (MR3616647 )

Let 𝒟n\mathcal{D}\subseteq\mathbb{R}^{n} be nonempty, and let {xk}k\{x^{k}\}_{k} be a Fejér monotone sequence with respect to 𝒟\mathcal{D}, then the following hold

  • {xk}k\{x^{k}\}_{k} is bounded;

  • If any subsequence of {xk}k\{x^{k}\}_{k} converges to a point in 𝒟\mathcal{D}, then {xk}k\{x^{k}\}_{k} converges to a point in 𝒟\mathcal{D}.

3 Generalized Extragradient Method for MGVI(θ,F)\text{MGVI}(\theta,F)

In this section, we introduce the first class of our PGA-based Predictor-Corrector algorithms for solving MGVI(θ,F). The motivation is the extragradient method (EM) MR451121, which is designed for solving MVI(θ,F,C). Our algorithm for MGVI(θ,F) can be viewed as a natural generalization of EM and is denoted GEM. The difference of GEM from EM is that the proximity operator is used in both the predictor and corrector steps, whereas EM uses the projection operator. We first state GEM for MGVI(θ,F) in Algorithm 1 and then establish its convergence.

Input: x0nx^{0}\in\mathbb{R}^{n}, and ν(0,1)\nu\in(0,1).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: Selecting βk>0\beta^{k}>0 such that βkF(xk)F(x~k)νxkx~k\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\| where x~k=Proxβkθ(xkβkF(xk))\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(x^{k})) ;
4      Corrector: Setting xk+1=Proxβkθ(xkβkF(x~k))x^{k+1}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(\tilde{x}^{k})).
5 end for
6
Algorithm 1 GEM for MGVI(θ,F)\text{MGVI}(\theta,F)
Remark 1

Under item (a) of Assumption 1, we conclude that

βkF(xk)F(x~k)νxkx~k\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\|

whenever βkνL\beta^{k}\leq\frac{\nu}{L}.
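As a concrete illustration, here is a minimal MATLAB sketch of GEM with the constant step β = ν/L, applied to the choices θ(x) = λ‖x‖_1 and F(x) = A^T(Ax - b), so that L = ‖A^T A‖_2; the problem data below are our own illustrative assumptions, not the paper's test instance.

rng(0); m = 40; n = 60; A = randn(m,n); b = randn(m,1); lam = 1;   % illustrative data
F    = @(x) A'*(A*x - b);                         % monotone, Lipschitz with L = ||A'*A||_2
prox = @(x,t) sign(x).*max(abs(x) - t*lam, 0);    % Prox_{t*lam*||.||_1}
L = norm(A'*A, 2); nu = 0.9; beta = nu/L;
x = ones(n,1);
for k = 1:5000
    xt = prox(x - beta*F(x), beta);               % predictor
    x  = prox(x - beta*F(xt), beta);              % corrector
    if norm(x - prox(x - F(x), 1), inf) < 1e-6    % ||e(x,1)||_inf as stopping residual
        break;
    end
end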

In order to prove the convergence of GEM, we need the next propositions.

Proposition 1

Let x,x~,ynx,\tilde{x},y\in\mathbb{R}^{n}, β>0\beta>0, and let

T~β(x):=Proxβθ(xβF(x~)),\widetilde{T}_{\beta}(x):=\text{Prox}_{\beta\theta}(x-\beta F(\tilde{x})),

then the following inequality holds

T~β(x)y2xy2xT~β(x)2+2β(yT~β(x))TF(x~)+2β(θ(y)θ(T~β(x))).\|\widetilde{T}_{\beta}(x)-y\|^{2}\leq\|x-y\|^{2}-\|x-\widetilde{T}_{\beta}(x)\|^{2}+2\beta(y-\widetilde{T}_{\beta}(x))^{T}F(\tilde{x})+2\beta(\theta(y)-\theta(\widetilde{T}_{\beta}(x))).
Proof

Consider the function

ϕ(u)=(ux)TF(x~)+θ(u)+12βux2.\phi(u)=(u-x)^{T}F(\tilde{x})+\theta(u)+\frac{1}{2\beta}\|u-x\|^{2}.

Because T̃_β(x) = argmin{φ(u): u ∈ ℝ^n} and φ is a (1/β)-strongly convex closed proper function, it follows from Theorem 5.25 in MR3719240 that

ϕ(y)ϕ(T~β(x))12βyT~β(x)2,\phi(y)-\phi(\widetilde{T}_{\beta}(x))\geq\frac{1}{2\beta}\|y-\widetilde{T}_{\beta}(x)\|^{2}, (9)

on the other hand, one obtains that

ϕ(y)ϕ(T~β(x))=(yT~β(x))TF(x~)+θ(y)θ(T~β(x))+12β(xy2xT~β(x)2).\begin{array}[]{ll}\phi(y)-\phi(\widetilde{T}_{\beta}(x))&=(y-\widetilde{T}_{\beta}(x))^{T}F(\tilde{x})+\theta(y)-\theta(\widetilde{T}_{\beta}(x))\\ &\quad+\frac{1}{2\beta}(\|x-y\|^{2}-\|x-\widetilde{T}_{\beta}(x)\|^{2}).\end{array} (10)

Combining (9) and (10), we obtain the targeted inequality

T~β(x)y2xy2xT~β(x)2+2β(yT~β(x))TF(x~)+2β(θ(y)θ(T~β(x))).\|\widetilde{T}_{\beta}(x)-y\|^{2}\leq\|x-y\|^{2}-\|x-\widetilde{T}_{\beta}(x)\|^{2}+2\beta(y-\widetilde{T}_{\beta}(x))^{T}F(\tilde{x})+2\beta(\theta(y)-\theta(\widetilde{T}_{\beta}(x))).

Next, we prove the key inequality for the convergence analysis of GEM, which is based on Proposition 1.

Proposition 2

Let {xk}k\{x^{k}\}_{k} be the sequence generated by GEM (Algorithm 1), and let xx^{*} be any solution of MGVI(θ,F)\text{MGVI}(\theta,F), then the next inequality holds

xk+1x2xkx2(1ν2)xkx~k2.\|x^{k+1}-x^{*}\|^{2}\leq\|x^{k}-x^{*}\|^{2}-(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}.
Proof

Applying the inequality of Proposition 1 with y = x^*, x = x^k, x̃ = x̃^k, β = β^k, and noting that x^{k+1} = T̃_{β^k}(x^k), we obtain

xk+1x2xkx2xkxk+12+2βk(xxk+1)TF(x~k)+2βk(θ(x)θ(xk+1)).\begin{array}[]{ll}\|x^{k+1}-x^{*}\|^{2}\leq&\|x^{k}-x^{*}\|^{2}-\|x^{k}-x^{k+1}\|^{2}\\ &+2\beta^{k}(x^{*}-x^{k+1})^{T}F(\tilde{x}^{k})+2\beta^{k}(\theta(x^{*})-\theta(x^{k+1})).\end{array} (11)

On the other hand, -F(x^*) ∈ ∂θ(x^*) together with the monotonicity of F implies that

2βkθ(x~k)2βkθ(x)+2βk(x~kx)TF(x~k)0.2\beta^{k}\theta(\tilde{x}^{k})-2\beta^{k}\theta(x^{*})+2\beta^{k}(\tilde{x}^{k}-x^{*})^{T}F(\tilde{x}^{k})\geq 0. (12)

Adding inequality (12) to the right side of inequality (11), then it follows that

\begin{array}{ll}\|x^{k+1}-x^{*}\|^{2}&\leq\|x^{k}-x^{*}\|^{2}-\|x^{k}-x^{k+1}\|^{2}+2\beta^{k}(\tilde{x}^{k}-x^{k+1})^{T}F(\tilde{x}^{k})+2\beta^{k}(\theta(\tilde{x}^{k})-\theta(x^{k+1}))\\ &=\|x^{k}-x^{*}\|^{2}-\|x^{k}-\tilde{x}^{k}\|^{2}-\|\tilde{x}^{k}-x^{k+1}\|^{2}-2(x^{k+1}-\tilde{x}^{k})^{T}(\tilde{x}^{k}-x^{k}+\beta^{k}F(\tilde{x}^{k}))+2\beta^{k}(\theta(\tilde{x}^{k})-\theta(x^{k+1})),\end{array} (13)

where the second equality follows from

xkxk+12=xkx~k2+x~kxk+12+2(xkx~k)T(x~kxk+1).\|x^{k}-x^{k+1}\|^{2}=\|x^{k}-\tilde{x}^{k}\|^{2}+\|\tilde{x}^{k}-x^{k+1}\|^{2}+2(x^{k}-\tilde{x}^{k})^{T}(\tilde{x}^{k}-x^{k+1}).

Moreover, x̃^k = Prox_{β^k θ}(x^k - β^k F(x^k)) together with Lemma 1 implies that

2βkθ(xk+1)2βkθ(x~k)+2(x~kxk+1)T(xkβkF(xk)x~k)0.2\beta^{k}\theta(x^{k+1})-2\beta^{k}\theta(\tilde{x}^{k})+2(\tilde{x}^{k}-x^{k+1})^{T}(x^{k}-\beta^{k}F(x^{k})-\tilde{x}^{k})\geq 0. (14)

Then adding (14) to the right side of (13), it follows that

xk+1x2xkx2xkx~k2x~kxk+12+2βk(x~kxk+1)T(F(x~k)F(xk)).\begin{array}[]{ll}\|x^{k+1}-x^{*}\|^{2}&\leq\|x^{k}-x^{*}\|^{2}-\|x^{k}-\tilde{x}^{k}\|^{2}-\|\tilde{x}^{k}-x^{k+1}\|^{2}\\ &\quad+2\beta^{k}(\tilde{x}^{k}-x^{k+1})^{T}(F(\tilde{x}^{k})-F(x^{k})).\end{array} (15)

Considering also that

2βk(x~kxk+1)T(F(x~k)F(xk))x~kxk+12+βk(F(x~k)F(xk))2,2\beta^{k}(\tilde{x}^{k}-x^{k+1})^{T}(F(\tilde{x}^{k})-F(x^{k}))\leq\|\tilde{x}^{k}-x^{k+1}\|^{2}+\|\beta^{k}(F(\tilde{x}^{k})-F(x^{k}))\|^{2}, (16)

then combining (15) and (16), and also taking into account that

βk(F(x~k)F(xk))2ν2x~kxk2,\|\beta^{k}(F(\tilde{x}^{k})-F(x^{k}))\|^{2}\leq\nu^{2}\|\tilde{x}^{k}-x^{k}\|^{2},

we conclude that

xk+1x2xkx2(1ν2)xkx~k2.\|x^{k+1}-x^{*}\|^{2}\leq\|x^{k}-x^{*}\|^{2}-(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}.

Theorem 3.1 (Convergence of GEM for MGVI(θ,F)\text{MGVI}(\theta,F))

Let {xk}k\{x^{k}\}_{k} be the sequence generated by GEM, and let {βk}k\{\beta^{k}\}_{k} be bounded such that inf{βk}>0\inf\{\beta^{k}\}>0. Then {xk}k\{x^{k}\}_{k} converges to a solution of MGVI(θ,F)\text{MGVI}(\theta,F).

Proof

One sees from proposition 2 that for all kk, and any xΩx^{*}\in\Omega^{*}, it holds that

xk+1x2xkx2(1ν2)xkx~k2.\|x^{k+1}-x^{*}\|^{2}\leq\|x^{k}-x^{*}\|^{2}-(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}.

This implies that {x^k}_k is Fejér monotone with respect to Ω^*. Summing the above inequality from k = 0 to n, rearranging, and letting n → ∞, we obtain

k=0(1ν2)xkx~k2x0x2.\sum_{k=0}^{\infty}(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2}\leq\|x_{0}-x^{*}\|^{2}.

Thus Σ_{k=0}^∞ ‖x^k - x̃^k‖^2 < ∞, which implies ‖x^k - x̃^k‖ → 0, and hence {x^k}_k and {x̃^k}_k have the same limit points. Let y be any such common limit point, so that x^{k_j} → y and x̃^{k_j} → y along a subsequence. Because {β^k}_k is bounded and inf{β^k} > 0, without loss of generality we may assume β^{k_j} → β^* with β^* > 0. Since x̃^{k_j} = Prox_{β^{k_j}θ}(x^{k_j} - β^{k_j}F(x^{k_j})), it follows from Lemma 1 that for any x ∈ ℝ^n,

βkjθ(x)βkjθ(x~kj)+(xx~kj)T(x~kj[xkjβkjF(xkj)])0,\beta^{k_{j}}\theta(x)-\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})+(x-\tilde{x}^{k_{j}})^{T}(\tilde{x}^{k_{j}}-[x^{k_{j}}-\beta^{k_{j}}F(x^{k_{j}})])\geq 0,

thus

\limsup_{j\to\infty}\big\{\beta^{k_{j}}\theta(x)-\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})+(x-\tilde{x}^{k_{j}})^{T}(\tilde{x}^{k_{j}}-[x^{k_{j}}-\beta^{k_{j}}F(x^{k_{j}})])\big\}=\beta^{*}\theta(x)-\liminf_{j\to\infty}\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})+(x-y)^{T}\beta^{*}F(y)\geq 0. (17)

On the other hand, θ is a closed proper convex function and hence lower semicontinuous, so θ(y) ≤ liminf θ(x̃^{k_j}). Therefore

lim infβkjθ(x~kj)βθ(y).-\liminf\beta^{k_{j}}\theta(\tilde{x}^{k_{j}})\leq-\beta^{*}\theta(y). (18)

Then combining (17) and (18), and taking into account that β^* > 0, we conclude that

θ(x)θ(y)+(xy)TF(y)0,\theta(x)-\theta(y)+(x-y)^{T}F(y)\geq 0,

for any x ∈ ℝ^n. Thus y is a solution of MGVI(θ,F), i.e., y ∈ Ω^*. Finally, since {x^k}_k is Fejér monotone with respect to Ω^* and every limit point y of {x^k}_k belongs to Ω^*, Lemma 4 implies that {x^k}_k converges to a solution of MGVI(θ,F). ∎

Remark 2

To guarantee the convergence of GEM (Algorithm 1), {β^k}_k is required to be bounded with inf{β^k} > 0. One can simply take β^k ≡ β with β ≤ ν/L. However, in some applications L cannot be computed explicitly, or it is estimated conservatively; the latter leads to a small β, which usually results in slow convergence. One can therefore adopt an adaptive rule for choosing β^k at each iteration.

4 Proximity and Contraction Algorithms for MGVI(θ,F)\text{MGVI}(\theta,F)

In this section, we introduce the second class of our PGA-based Predictor-Corrector algorithms for MGVI(θ,F). We are inspired by He's works for MVI(θ,F,C) in MR1418264; MR1297058, where the projection operator is used to make the predictor, whereas our algorithms for MGVI(θ,F) use the proximity operator. The corrector strategies developed by He for MVI(θ,F,C) can be inherited by our algorithms, which is an inspiring result. Since He's algorithms for MVI(θ,F,C) are called projection and contraction algorithms, and our algorithms rely on He's corrector strategies and also enjoy contraction properties, we name them proximity and contraction algorithms. In the sequel we employ only part of the corrector strategies developed by He; the reader may refer to MR1418264; MR1297058 for further corrector strategies.

Firstly, we introduce three fundamental inequalities, which are key for constructing our algorithms for MGVI(θ,F)\text{MGVI}(\theta,F).

Proposition 3 (Three fundamental inequalities)

Let xx^{*} be any solution of MGVI(θ,F)\text{MGVI}(\theta,F), and for any xnx\in\mathbb{R}^{n} and β>0\beta>0, let x~=Proxβθ(xβF(x))\tilde{x}=\text{Prox}_{\beta\theta}(x-\beta F(x)), then the following three inequalities hold:

βθ(x~)βθ(x)+(x~x)TβF(x)0,\beta\theta(\tilde{x})-\beta\theta(x^{*})+(\tilde{x}-x^{*})^{T}\beta F(x^{*})\geq 0, (FI1)
βθ(x)βθ(x~)+(x~x)T([xβF(x)]x~)0,\beta\theta(x^{*})-\beta\theta(\tilde{x})+(\tilde{x}-x^{*})^{T}([x-\beta F(x)]-\tilde{x})\geq 0, (FI2)
(x~x)T(βF(x~)βF(x))0.\displaystyle(\tilde{x}-x^{*})^{T}(\beta F(\tilde{x})-\beta F(x^{*}))\geq 0. (FI3)
Proof

The first inequality (FI1) follows from the definition of MGVI(θ,F)\text{MGVI}(\theta,F). The final inequality (FI3) holds because FF is monotone over n\mathbb{R}^{n}. Now let us prove the second inequality (FI2). Because x~=Proxβθ(xβF(x))\tilde{x}=\text{Prox}_{\beta\theta}(x-\beta F(x)), then it follows from Lemma 1 that for all yny\in\mathbb{R}^{n}, it holds that

βθ(y)βθ(x~)+(yx~)T(xβF(x)x~),\beta\theta(y)\geq\beta\theta(\tilde{x})+(y-\tilde{x})^{T}(x-\beta F(x)-\tilde{x}),

Replacing y by x^* and rearranging, we obtain (FI2). ∎

We observe that the leftmost terms of (FI1) and (FI2) are opposite to each other, so they cancel when (FI1) and (FI2), or (FI1), (FI2) and (FI3), are added together. This observation makes it possible to directly employ the corrector strategies developed by He in MR1418264; MR1297058 for MVI(θ,F,C).

4.1 Proximity and contraction algorithms based on (FI1) and (FI2)

In this part, we consider MGVI(θ,F)\text{MGVI}(\theta,F) with F(x)=Mx+qF(x)=Mx+q. Then (FI1) and (FI2) take the next forms

βθ(x~)βθ(x)+(x~x)Tβ(Mx+q)0,\beta\theta(\tilde{x})-\beta\theta(x^{*})+(\tilde{x}-x^{*})^{T}\beta(Mx^{*}+q)\geq 0,

and

βθ(x)βθ(x~)+(x~x)T([xβ(Mx+q)]x~)0.\beta\theta(x^{*})-\beta\theta(\tilde{x})+(\tilde{x}-x^{*})^{T}([x-\beta(Mx+q)]-\tilde{x})\geq 0.

Adding the above two inequalities together, one obtains that

(x-x^{*})^{T}(I+\beta M^{T})(x-\tilde{x})\geq(x-x^{*})^{T}\beta M(x-x^{*})+\|x-\tilde{x}\|^{2}\geq\|x-\tilde{x}\|^{2}, (19)

where the last inequality holds because M is positive semi-definite (M is not required to be symmetric), i.e., (x-y)^T M (x-y) ≥ 0 for all x, y ∈ ℝ^n. Replacing x, x̃, β above by x^k, x̃^k, β^k, respectively, and setting

ϕ(xk,x~k,βk)\displaystyle\phi(x^{k},\tilde{x}^{k},\beta^{k}) :=xkx~k2,\displaystyle:=\|x^{k}-\tilde{x}^{k}\|^{2},
d(xk,x~k,βk)\displaystyle d(x^{k},\tilde{x}^{k},\beta^{k}) :=(I+βkMT)(xkx~k),\displaystyle:=(I+\beta^{k}M^{T})(x^{k}-\tilde{x}^{k}),

then it follows that (xkx)Td(xk,x~k,βk)ϕ(xk,x~k,βk)(x^{k}-x^{*})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})\geq\phi(x^{k},\tilde{x}^{k},\beta^{k}). Next, we use

xk+1=xkαd(xk,x~k,βk)x^{k+1}=x^{k}-\alpha d(x^{k},\tilde{x}^{k},\beta^{k})

to obtain the next iterate; then one sees that

ϑk(α)\displaystyle\vartheta_{k}(\alpha) :=xkx2xk+1x2\displaystyle:=\|x^{k}-x^{*}\|^{2}-\|x^{k+1}-x^{*}\|^{2}
=2α(xkx)Td(xk,x~k,βk)α2d(xk,x~k,βk)2\displaystyle=2\alpha(x^{k}-x^{*})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}
2αϕ(xk,x~k,βk)α2d(xk,x~k,βk)2:=qk(α).\displaystyle\geq 2\alpha\phi(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}:=q_{k}(\alpha).

Note that q_k(α) attains its maximum at α_k^* = φ(x^k,x̃^k,β^k)/‖d(x^k,x̃^k,β^k)‖^2; thus, setting x^{k+1} = x^k - γα_k^* d(x^k,x̃^k,β^k) with γ ∈ (0,2), one sees that

xk+1x2\displaystyle\|x^{k+1}-x^{*}\|^{2} xkx2γ(2γ)αkϕ(xk,x~k,βk)\displaystyle\leq\|x^{k}-x^{*}\|^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\phi(x^{k},\tilde{x}^{k},\beta^{k})
=xkx2γ(2γ)αkxkx~k2.\displaystyle=\|x^{k}-x^{*}\|^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\|x^{k}-\tilde{x}^{k}\|^{2}. (20)

Note that α_k^* = φ(x^k,x̃^k,β^k)/‖d(x^k,x̃^k,β^k)‖^2 ≥ 1/‖I+β^k M^T‖_2^2. Thus, the boundedness of {β^k}_k implies that inf{α_k^*}_k > 0.

The aforementioned algorithm, denoted as PGAa1\text{PGA}_{a_{1}}, is summarized in Algorithm 2.

Input: x0nx^{0}\in\mathbb{R}^{n}, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: Selecting βk>0\beta_{k}>0, and x~k=Proxβkθ(xkβkF(xk))\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(x^{k})) ;
4      Corrector: Setting d(xk,x~k,βk)=(I+βkMT)(xkx~k)d(x^{k},\tilde{x}^{k},\beta^{k})=(I+\beta^{k}M^{T})(x^{k}-\tilde{x}^{k}), and computing αk=xkx~k2d(xk,x~k,βk)2\alpha_{k}^{*}=\frac{\|x^{k}-\tilde{x}^{k}\|^{2}}{\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}}.
5      Setting xk+1=xkγαkd(xk,x~k,βk)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k},\beta^{k}).
6 end for
7
Algorithm 2 PGAa1\text{PGA}_{a_{1}} for MGVI(θ,F)\text{MGVI}(\theta,F)
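For concreteness, the following minimal MATLAB sketch runs PGA_{a_1} with a constant β on the illustrative choices F(x) = Mx + q with M = C^T C (positive semi-definite) and θ(x) = λ‖x‖_1; all data and names here are our own assumptions, not the paper's test instance.

rng(1); n = 50; C = randn(30,n); M = C'*C; q = randn(n,1); lam = 1;   % illustrative data
prox  = @(x,t) sign(x).*max(abs(x) - t*lam, 0);   % Prox_{t*lam*||.||_1}
beta  = 1/norm(M,2); gamma = 1.8;                 % constant beta > 0, gamma in (0,2)
x = zeros(n,1);
for k = 1:5000
    xt = prox(x - beta*(M*x + q), beta);          % predictor
    if norm(x - xt) < 1e-10, break; end           % x is (approximately) a solution, cf. Lemma 2
    d     = (x - xt) + beta*(M'*(x - xt));        % d = (I + beta*M')*(x - xt)
    alpha = norm(x - xt)^2 / norm(d)^2;           % alpha_k^*
    x     = x - gamma*alpha*d;                    % corrector
end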

Next, we construct another proximity and contraction algorithm, based on the additional assumption that M is symmetric positive semi-definite. In this case, set φ(x^k,x̃^k) := ‖x^k - x̃^k‖^2, d(x^k,x̃^k) := x^k - x̃^k, G := I + βM with a fixed β > 0, and x^{k+1} = x^k - α d(x^k,x̃^k); then one sees from (19) that (x^k - x^*)^T G d(x^k,x̃^k) ≥ φ(x^k,x̃^k). Letting ϑ_k(α) := ‖x^k - x^*‖_G^2 - ‖x^{k+1} - x^*‖_G^2, one sees that

\vartheta_{k}(\alpha)=2\alpha(x^{k}-x^{*})^{T}Gd(x^{k},\tilde{x}^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}\geq 2\alpha\phi(x^{k},\tilde{x}^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}=:q_{k}(\alpha).

Note that qk(α)q_{k}(\alpha) obtains its maximum at αk=ϕ(xk,x~k)d(xk,x~k)G2\alpha_{k}^{*}=\frac{\phi(x^{k},\tilde{x}^{k})}{\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}}, thus by setting xk+1=xkγαkd(xk,x~k)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k}), one sees that

\|x^{k+1}-x^{*}\|_{G}^{2}\leq\|x^{k}-x^{*}\|_{G}^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\phi(x^{k},\tilde{x}^{k})=\|x^{k}-x^{*}\|_{G}^{2}-\gamma(2-\gamma)\alpha_{k}^{*}\|x^{k}-\tilde{x}^{k}\|^{2}.

Note that αk=ϕ(xk,x~k)d(xk,x~k)G21λmax(G)\alpha_{k}^{*}=\frac{\phi(x^{k},\tilde{x}^{k})}{\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}}\geq\frac{1}{\lambda_{\text{max}}(G)}.

The above method for symmetric MM, denoted as PGAa2\text{PGA}_{a_{2}}, is summarized below in Algorithm 3.

Input: x0nx^{0}\in\mathbb{R}^{n}, β>0\beta>0, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: x~k=Proxβθ(xkβF(xk))\tilde{x}^{k}=\text{Prox}_{\beta\theta}(x^{k}-\beta F(x^{k})) ;
4      Corrector: Setting d(xk,x~k)=xkx~kd(x^{k},\tilde{x}^{k})=x^{k}-\tilde{x}^{k}, and computing αk=xkx~k2d(xk,x~k)G2\alpha_{k}^{*}=\frac{\|x^{k}-\tilde{x}^{k}\|^{2}}{\|d(x^{k},\tilde{x}^{k})\|_{G}^{2}},
5      Setting xk+1=xkγαkd(xk,x~k)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k}).
6 end for
7
Algorithm 3 PGAa2\text{PGA}_{a_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F)
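A minimal MATLAB sketch of PGA_{a_2}, under the additional assumption that M is symmetric positive semi-definite; the data below, M = C^T C and θ = λ‖·‖_1, are our own illustration.

rng(2); n = 50; C = randn(30,n); M = C'*C; q = randn(n,1); lam = 1;   % illustrative data
prox  = @(x,t) sign(x).*max(abs(x) - t*lam, 0);
beta  = 1/norm(M,2); gamma = 1.8; G = eye(n) + beta*M;   % G = I + beta*M
x = zeros(n,1);
for k = 1:5000
    xt = prox(x - beta*(M*x + q), beta);   % predictor
    d  = x - xt;
    if norm(d) < 1e-10, break; end
    alpha = (d'*d) / (d'*(G*d));           % ||x - xt||^2 / ||d||_G^2
    x = x - gamma*alpha*d;                 % corrector
end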
Theorem 4.1 (Convergence of PGAa1\text{PGA}_{a_{1}} and PGAa2\text{PGA}_{a_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F))

Let {xk}k\{x^{k}\}_{k} and {yk}k\{y^{k}\}_{k} be the sequences generated by PGAa1\text{PGA}_{a_{1}} and PGAa2\text{PGA}_{a_{2}}, respectively, and for PGAa1\text{PGA}_{a_{1}}, let {βk}k\{\beta^{k}\}_{k} be bounded with inf{βk}>0\inf\{\beta_{k}\}>0. Then {xk}k\{x^{k}\}_{k} (resp. {yk}k)\{y^{k}\}_{k}) converges to a solution of the corresponding MGVI(θ,F)\text{MGVI}(\theta,F).

Proof

We will just prove the convergence of PGAa1\text{PGA}_{a_{1}} since the proof of the convergence of PGAa2\text{PGA}_{a_{2}} follows the similar argument.

Let x^* ∈ Ω^* be any solution of MGVI(θ,F). Inequality (20) implies that {x^k}_k is Fejér monotone with respect to Ω^*; moreover, summing it from k = 0 to n, rearranging, and letting n → ∞, we obtain

k=0γ(2γ)αkxkx~k2x0x2.\sum_{k=0}^{\infty}\gamma(2-\gamma)\alpha_{k}^{*}\|x^{k}-\tilde{x}^{k}\|^{2}\leq\|x_{0}-x^{*}\|^{2}.

Because inf{α_k^*} > 0, the above inequality implies ‖x^k - x̃^k‖ → 0, from which we conclude that {x^k}_k and {x̃^k}_k have the same limit points. An argument similar to that in Theorem 3.1 then implies that {x^k}_k converges to a solution of MGVI(θ,F). ∎

Remark 3

To guarantee that PGA_{a_1} (Algorithm 2) converges, {β^k}_k is required to be bounded with inf{β^k} > 0. The adaptive rule

βkMT(xkx~k)=𝒪(xkx~k)\|\beta^{k}M^{T}(x^{k}-\tilde{x}^{k})\|=\mathcal{O}(\|x^{k}-\tilde{x}^{k}\|)

was suggested by He for MVI(θ,F,C) and can also be applied to our algorithms. For PGA_{a_2}, where β^k ≡ β, convergence is guaranteed for any positive constant β; one reasonable choice is β‖M‖_2 ≈ 1.

4.2 Proximity and contraction algorithms based on (FI1), (FI2) and (FI3)

In this part, we consider the general case where F is monotone and Lipschitz continuous, as in Assumption 1. Adding (FI1), (FI2) and (FI3) together, one obtains that

(x~x)T{(xx~)β(F(x)F(x~))}0,(\tilde{x}-x^{*})^{T}\{(x-\tilde{x})-\beta(F(x)-F(\tilde{x}))\}\geq 0,

thus it holds that

(xx)T{(xx~)β(F(x)F(x~))}(xx~)T{(xx~)β(F(x)F(x~))}.(x-x^{*})^{T}\{(x-\tilde{x})-\beta(F(x)-F(\tilde{x}))\}\geq(x-\tilde{x})^{T}\{(x-\tilde{x})-\beta(F(x)-F(\tilde{x}))\}. (21)

Replacing the above xx, x~\tilde{x}, β\beta by xkx^{k}, x~k\tilde{x}^{k}, βk\beta^{k} respectively, and setting

ϕ(xk,x~k,βk)\displaystyle\phi(x^{k},\tilde{x}^{k},\beta^{k}) :=(xkx~k)T{(xkx~k)βk(F(xk)F(x~k))},\displaystyle:=(x^{k}-\tilde{x}^{k})^{T}\{(x^{k}-\tilde{x}^{k})-\beta^{k}(F(x^{k})-F(\tilde{x}^{k}))\},
d(xk,x~k,βk)\displaystyle d(x^{k},\tilde{x}^{k},\beta^{k}) :=(xkx~k)βk(F(xk)F(x~k)),\displaystyle:=(x^{k}-\tilde{x}^{k})-\beta^{k}(F(x^{k})-F(\tilde{x}^{k})),

then one sees that (x^k - x^*)^T d(x^k,x̃^k,β^k) ≥ φ(x^k,x̃^k,β^k). For a fixed ν ∈ (0,1), since F is Lipschitz continuous with constant L > 0, some β^k can be selected such that

βkF(xk)F(x~k)νxkx~k,\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\|,

thus ϕ(xk,x~k,βk)(1ν)xkx~k2\phi(x^{k},\tilde{x}^{k},\beta^{k})\geq(1-\nu)\|x^{k}-\tilde{x}^{k}\|^{2}. By setting xk+1=xkαd(xk,x~k,βk)x^{k+1}=x^{k}-\alpha d(x^{k},\tilde{x}^{k},\beta^{k}), one sees that

ϑk(α)\displaystyle\vartheta_{k}(\alpha) :=xkx2xk+1x2\displaystyle:=\|x^{k}-x^{*}\|^{2}-\|x^{k+1}-x^{*}\|^{2}
=2α(xkx)Td(xk,x~k,βk)α2d(xk,x~k,βk)2\displaystyle=2\alpha(x^{k}-x^{*})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}
2αϕ(xk,x~k,βk)α2d(xk,x~k,βk)2:=qk(α).\displaystyle\geq 2\alpha\phi(x^{k},\tilde{x}^{k},\beta^{k})-\alpha^{2}\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}:=q_{k}(\alpha).

Note that qk(α)q_{k}(\alpha) obtains its maximum at αk=ϕ(xk,x~k,βk)d(xk,x~k,βk)2\alpha_{k}^{*}=\frac{\phi(x^{k},\tilde{x}^{k},\beta^{k})}{\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}}, and one also sees that

\begin{array}{ll}2\phi(x^{k},\tilde{x}^{k},\beta^{k})-\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}&=2(x^{k}-\tilde{x}^{k})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})-\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}\\ &=d(x^{k},\tilde{x}^{k},\beta^{k})^{T}\{2(x^{k}-\tilde{x}^{k})-d(x^{k},\tilde{x}^{k},\beta^{k})\}\\ &=\|x^{k}-\tilde{x}^{k}\|^{2}-(\beta^{k})^{2}\|F(x^{k})-F(\tilde{x}^{k})\|^{2}\\ &\geq(1-\nu^{2})\|x^{k}-\tilde{x}^{k}\|^{2},\end{array}

thus αk>12\alpha_{k}^{*}>\frac{1}{2} for all kk. Setting xk+1=xkγαkd(xk,x~k,βk)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k},\beta^{k}), then one obtains that

xk+1x2\displaystyle\|x^{k+1}-x^{*}\|^{2} xkx2(2γ)γαkϕ(xk,x~k,βk)\displaystyle\leq\|x^{k}-x^{*}\|^{2}-(2-\gamma)\gamma\alpha_{k}^{*}\phi(x^{k},\tilde{x}^{k},\beta^{k})
xkx2(2γ)γαk(1ν)xkx~k2.\displaystyle\leq\|x^{k}-x^{*}\|^{2}-(2-\gamma)\gamma\alpha_{k}^{*}(1-\nu)\|x^{k}-\tilde{x}^{k}\|^{2}.

The aforementioned algorithm, denoted as PGAb1\text{PGA}_{b_{1}}, is summarized in Algorithm 4.

Input: x0nx^{0}\in\mathbb{R}^{n}, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: Selecting βk>0\beta^{k}>0 such that βkF(xk)F(x~k)νxkx~k\beta^{k}\|F(x^{k})-F(\tilde{x}^{k})\|\leq\nu\|x^{k}-\tilde{x}^{k}\| where x~k=Proxβkθ(xkβkF(xk))\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta}(x^{k}-\beta^{k}F(x^{k})) ;
4      Corrector: Setting d(xk,x~k,βk)=(xkx~k)βk(F(xk)F(x~k))d(x^{k},\tilde{x}^{k},\beta^{k})=(x^{k}-\tilde{x}^{k})-\beta^{k}(F(x^{k})-F(\tilde{x}^{k})), and computing αk=(xkx~k)Td(xk,x~k,βk)d(xk,x~k,βk)2\alpha_{k}^{*}=\frac{(x^{k}-\tilde{x}^{k})^{T}d(x^{k},\tilde{x}^{k},\beta^{k})}{\|d(x^{k},\tilde{x}^{k},\beta^{k})\|^{2}},
5      Setting xk+1=xkγαkd(xk,x~k,βk)x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}d(x^{k},\tilde{x}^{k},\beta^{k}).
6 end for
7
Algorithm 4 PGAb1\text{PGA}_{b_{1}} for MGVI(θ,F)\text{MGVI}(\theta,F)
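A minimal MATLAB sketch of PGA_{b_1} with a simple backtracking rule for β^k; this adaptive rule, as well as the data F(x) = A^T(Ax - b) and θ = λ‖·‖_1, are illustrative assumptions of ours rather than the paper's exact scheme.

rng(3); m = 40; n = 60; A = randn(m,n); b = randn(m,1); lam = 1;   % illustrative data
F    = @(x) A'*(A*x - b);
prox = @(x,t) sign(x).*max(abs(x) - t*lam, 0);
nu = 0.9; gamma = 1.8; beta = 1;
x = ones(n,1);
for k = 1:5000
    xt = prox(x - beta*F(x), beta);                   % predictor
    while beta*norm(F(x) - F(xt)) > nu*norm(x - xt)   % shrink beta until the condition holds
        beta = 0.5*beta;
        xt = prox(x - beta*F(x), beta);
    end
    if norm(x - xt) < 1e-10, break; end
    d     = (x - xt) - beta*(F(x) - F(xt));
    alpha = ((x - xt)'*d) / (d'*d);
    x     = x - gamma*alpha*d;                        % corrector
    beta  = 1.5*beta;                                 % tentatively enlarge beta
end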

Finally, we distinguish a special case where F(x)=Ax+cF(x)=Ax+c with AA being a symmetric positive semi-definite matrix, and cnc\in\mathbb{R}^{n} being a constant vector. In this case, inequality (21) takes the next form

(xx)T(IβA)(xx~)(xx~)T(IβA)(xx~).(x-x^{*})^{T}(I-\beta A)(x-\tilde{x})\geq(x-\tilde{x})^{T}(I-\beta A)(x-\tilde{x}).

Setting G := I - βA with β a fixed positive constant smaller than 1/λ_max(A), G is a symmetric positive definite matrix. Replacing x and x̃ above by x^k and x̃^k, respectively, one sees that

(xkx)TG(xkx~k)(xkx~k)TG(xkx~k).(x^{k}-x^{*})^{T}G(x^{k}-\tilde{x}^{k})\geq(x^{k}-\tilde{x}^{k})^{T}G(x^{k}-\tilde{x}^{k}).

Let xk+1=xkγ(xkx~k)x^{k+1}=x^{k}-\gamma(x^{k}-\tilde{x}^{k}), thus it follows that

xk+1xG2xkxG2(2γ)γxkx~kG2.\|x^{k+1}-x^{*}\|_{G}^{2}\leq\|x^{k}-x^{*}\|_{G}^{2}-(2-\gamma)\gamma\|x^{k}-\tilde{x}^{k}\|_{G}^{2}.

The above method for symmetric A is denoted PGA_{b_2}; this corrector strategy was not mentioned by He, but it can easily be constructed from (21). We summarize it below in Algorithm 5.

Input: x0nx^{0}\in\mathbb{R}^{n}, β>0\beta>0, and γ(0,2)\gamma\in(0,2).
1
2for k=0,1,k=0,1,\cdots do
3       Predictor: x~k=Proxβθ(xkβF(xk))\tilde{x}^{k}=\text{Prox}_{\beta\theta}(x^{k}-\beta F(x^{k})) ;
4      Corrector: Setting d(xk,x~k)=xkx~kd(x^{k},\tilde{x}^{k})=x^{k}-\tilde{x}^{k},
5      Setting xk+1=xkγd(xk,x~k)x^{k+1}=x^{k}-\gamma d(x^{k},\tilde{x}^{k}).
6 end for
7
Algorithm 5 PGAb2\text{PGA}_{b_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F)

The convergence of PGA_{b_1} and PGA_{b_2} can be proved by essentially the same argument as in Theorem 4.1.

Theorem 4.2 (Convergence of PGAb1\text{PGA}_{b_{1}} and PGAb2\text{PGA}_{b_{2}} for MGVI(θ,F)\text{MGVI}(\theta,F))

Let {xk}k\{x^{k}\}_{k} and {yk}k\{y^{k}\}_{k} be the sequences generated by PGAb1PGA_{b_{1}} and PGAb2PGA_{b_{2}}, respectively, and for PGAb1PGA_{b_{1}}, let {βk}k\{\beta^{k}\}_{k} be bounded with inf{βk}>0\inf\{\beta_{k}\}>0. Then {xk}k\{x^{k}\}_{k} (resp. {yk}k)\{y^{k}\}_{k}) converges to a solution of the corresponding MGVI(θ,F)\text{MGVI}(\theta,F).

Remark 4

In the description of PGA_{a_1}, PGA_{a_2} and PGA_{b_1}, we have neglected the possibility that x^k = x̃^k. This rarely happens in practice; moreover, if x^k = x̃^k, then x^k is already a solution of MGVI(θ,F) by Lemma 2, so when implementing these algorithms one may first check whether x^k = x̃^k and terminate in that case.

5 A Discussion about Our Algorithms

In the preceding sections we introduced our algorithms for MGVI(θ,F); in this section we illustrate their characteristics on the following separable two-block convex optimization model:

minθ1(x)+θ2(y)s.t.Ax+By=c,\begin{split}\min\quad&\theta_{1}(x)+\theta_{2}(y)\\ \text{s.t.}\quad&Ax+By=c,\\ \end{split} (22)

where Am×nA\in\mathbb{R}^{m\times n}, Bm×qB\in\mathbb{R}^{m\times q}, cmc\in\mathbb{R}^{m}, θ1:n(,]\theta_{1}:\mathbb{R}^{n}\rightarrow(-\infty,\infty] and θ2:q(,]\theta_{2}:\mathbb{R}^{q}\rightarrow(-\infty,\infty] are two closed proper convex functions.

The well-known alternating direction method of multipliers (ADMM) MR724072 is very popular for problem (22); the reader can refer to BoydADMM and the references therein for efficient applications of ADMM in a variety of fields. The procedure of ADMM is shown below in Algorithm 6.

1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, and ρ>0\rho>0.
General step: for k=0,1,k=0,1,\cdots execute the following:
  1. (a)

    xk+1argmin{θ1(x)+ρ2Ax+Bykc+1ρλk2:xn}x^{k+1}\in\operatorname*{argmin}\{\theta_{1}(x)+\frac{\rho}{2}\|Ax+By^{k}-c+\frac{1}{\rho}\lambda^{k}\|^{2}:x\in\mathbb{R}^{n}\};

  2. (b)

    yk+1argmin{θ2(y)+ρ2Axk+1+Byc+1ρλk2:yq}y^{k+1}\in\operatorname*{argmin}\{\theta_{2}(y)+\frac{\rho}{2}\|Ax^{k+1}+By-c+\frac{1}{\rho}\lambda^{k}\|^{2}:y\in\mathbb{R}^{q}\};

  3. (c)

    λk+1=λk+ρ(Axk+1+Byk+1c)\lambda^{k+1}=\lambda^{k}+\rho(Ax^{k+1}+By^{k+1}-c).

Algorithm 6 ADMM for problem (22)

Note that computing x^{k+1} and y^{k+1} in the general step of ADMM may be much more involved than computing the proximity operators of θ1 and θ2, so several variants of ADMM have been proposed to address this. One of them is the alternating direction proximal method of multipliers (AD-PMM) MR3719240; a special case of AD-PMM is the alternating direction linearized proximal method of multipliers (AD-LPMM), which only requires the computation of the proximity operators of θ1 and θ2. AD-LPMM for problem (22) is shown in Algorithm 7.

1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, ρ>0\rho>0, αρλmax(ATA)\alpha\geq\rho\lambda_{\max}(A^{T}A), βρλmax(BTB)\beta\geq\rho\lambda_{\max}(B^{T}B).
General step: for k=0,1,k=0,1,\cdots execute the following:
  1. (a)

    xk+1=Prox1αθ1[xkραAT(Axk+Bykc+1ρλk)]x^{k+1}=\text{Prox}_{\frac{1}{\alpha}\theta_{1}}[x^{k}-\frac{\rho}{\alpha}A^{T}(Ax^{k}+By^{k}-c+\frac{1}{\rho}\lambda^{k})];

  2. (b)

    yk+1=Prox1βθ2[ykρβBT(Axk+1+Bykc+1ρλk)]y^{k+1}=\text{Prox}_{\frac{1}{\beta}\theta_{2}}[y^{k}-\frac{\rho}{\beta}B^{T}(Ax^{k+1}+By^{k}-c+\frac{1}{\rho}\lambda^{k})];

  3. (c)

    λk+1=λk+ρ(Axk+1+Byk+1c)\lambda^{k+1}=\lambda^{k}+\rho(Ax^{k+1}+By^{k+1}-c).

Algorithm 7 AD-LPMM for problem (22)

Next, we introduce our algorithms for problem (22). Consider the Lagrangian function of problem (22):

L(x,y,λ)=θ1(x)+θ2(y)λT(Ax+Byc).L(x,y,\lambda)=\theta_{1}(x)+\theta_{2}(y)-\lambda^{T}(Ax+By-c).

Then (x^*, y^*, λ^*) ∈ ℝ^n × ℝ^q × ℝ^m is a saddle point of problem (22) if and only if

L(x^{*},y^{*},\lambda)\leq L(x^{*},y^{*},\lambda^{*})\leq L(x,y,\lambda^{*}),\quad \forall\,\lambda\in\mathbb{R}^{m},\ \forall\,(x,y)\in\mathbb{R}^{n}\times\mathbb{R}^{q},

which is equivalent to

\Big[\theta_{1}(x)+\theta_{2}(y)\Big]-\Big[\theta_{1}(x^{*})+\theta_{2}(y^{*})\Big]+\Bigg[\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}-\begin{pmatrix}x^{*}\\ y^{*}\\ \lambda^{*}\end{pmatrix}\Bigg]^{T}\Bigg[\begin{pmatrix}0&0&-A^{T}\\ 0&0&-B^{T}\\ A&B&0\end{pmatrix}\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ 0\\ -c\end{pmatrix}\Bigg]\geq 0

for all (x,y,λ) ∈ ℝ^n × ℝ^q × ℝ^m. This fits MGVI(θ,F) with

\theta(x,y,\lambda)=\theta_{1}(x)+\theta_{2}(y),\qquad F(x,y,\lambda)=\begin{pmatrix}0&0&-A^{T}\\ 0&0&-B^{T}\\ A&B&0\end{pmatrix}\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ 0\\ -c\end{pmatrix}.

For the theory of saddle points, one can see, e.g., MR2184037. We only need to keep in mind that if (x^*, y^*, λ^*) is a saddle point of problem (22), then (x^*, y^*) is a minimizer of problem (22).
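For concreteness, here is a minimal MATLAB sketch of how the operator F of this MGVI can be assembled for given A, B, c (dimensions and variable names are our own illustration). Since the block matrix below is skew-symmetric, F is monotone, and the predictor step of our algorithms only needs this F together with the proximity operators of θ1 and θ2.

m = 5; n = 8; q = 6;
A = randn(m,n); B = randn(m,q); c = randn(m,1);
% Block operator of the MGVI reformulation; note Mblk' = -Mblk (skew-symmetric),
% so (w1-w2)'*(F(w1)-F(w2)) = 0 and F is monotone.
Mblk = [zeros(n,n), zeros(n,q), -A';
        zeros(q,n), zeros(q,q), -B';
        A,          B,          zeros(m,m)];
qvec = [zeros(n,1); zeros(q,1); -c];
F = @(w) Mblk*w + qvec;             % w = [x; y; lambda]
w0 = zeros(n+q+m,1); Fw0 = F(w0);   % example evaluation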

Note that PGA_{a_2} and PGA_{b_2} are designed for specific forms of F and cannot be applied to this problem. Moreover, owing to the special form of F in problem (22), the resulting PGA_{a_1} and PGA_{b_1} differ only in the computation of α_k^* when the same self-adaptive rule is employed. For these reasons, we first present GEM and PGA_{a_1} in Algorithms 8 and 9, respectively, where a self-adaptive rule for {β^k}_k is also included. The computation of α_k^* in PGA_{b_1} for problem (22) is given in Algorithm 10.

1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, β0>0\beta^{0}>0, and ν,μ(0,1)\nu,\mu\in(0,1) with μ<ν\mu<\nu.
2for k=0,1,k=0,1,\cdots do
3      
4      x~k=Proxβkθ1(xk+βkATλk);\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta_{1}}(x^{k}+\beta^{k}A^{T}\lambda^{k});
5      y~k=Proxβkθ2(yk+βkBTλk)\tilde{y}^{k}=\text{Prox}_{\beta^{k}\theta_{2}}(y^{k}+\beta^{k}B^{T}\lambda^{k});
6      λ~k=λkβk(Axk+Bykc)\tilde{\lambda}^{k}=\lambda^{k}-\beta^{k}(Ax^{k}+By^{k}-c);
7      rk=βk(AT(λkλ~k)BT(λkλ~k)A(xkx~k)+B(yky~k))/(xkx~kyky~kλkλ~k);r_{k}=\beta^{k}\Bigg{\|}\begin{pmatrix}A^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ B^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})\end{pmatrix}\Bigg{\|}\Bigg{/}\Bigg{\|}\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\\ \end{pmatrix}\Bigg{\|};
8       if rk>νr_{k}>\nu then
9             βk:=23βkmin{1,1rk}\beta^{k}:=\frac{2}{3}*\beta^{k}\min\{1,\frac{1}{r_{k}}\};
10            Goto line 3.
11       end
12      
13      xk+1=Proxβkθ1(xk+βkATλ~k);x^{k+1}=\text{Prox}_{\beta^{k}\theta_{1}}(x^{k}+\beta^{k}A^{T}\tilde{\lambda}^{k});
14      yk+1=Proxβkθ2(yk+βkBTλ~k)y^{k+1}=\text{Prox}_{\beta^{k}\theta_{2}}(y^{k}+\beta^{k}B^{T}\tilde{\lambda}^{k});
15      λk+1=λkβk(Ax~k+By~kc)\lambda^{k+1}=\lambda^{k}-\beta^{k}(A\tilde{x}^{k}+B\tilde{y}^{k}-c);
16       if rkμr_{k}\leq\mu then
17             βk+1:=1.5βk\beta^{k+1}:=1.5*\beta^{k};
18      else
19             βk+1:=βk\beta^{k+1}:=\beta^{k}.
20       end
21      
22 end for
Algorithm 8 GEM for problem (22)
1 Initialization: x0nx^{0}\in\mathbb{R}^{n}, y0qy^{0}\in\mathbb{R}^{q}, λ0m\lambda^{0}\in\mathbb{R}^{m}, β0>0\beta^{0}>0, ν(0,1)\nu\in(0,1), and γ(0,2)\gamma\in(0,2).
2for k=0,1,k=0,1,\cdots do
3       x~k=Proxβkθ1(xk+βkATλk);\tilde{x}^{k}=\text{Prox}_{\beta^{k}\theta_{1}}(x^{k}+\beta^{k}A^{T}\lambda^{k});
4      y~k=Proxβkθ2(yk+βkBTλk);\tilde{y}^{k}=\text{Prox}_{\beta^{k}\theta_{2}}(y^{k}+\beta^{k}B^{T}\lambda^{k});
5      λ~k=λkβk(Axk+Bykc);\tilde{\lambda}^{k}=\lambda^{k}-\beta^{k}(Ax^{k}+By^{k}-c);
6      rk=βk(AT(λkλ~k)BT(λkλ~k)A(xkx~k)+B(yky~k))/(xkx~kyky~kλkλ~k);r_{k}=\beta^{k}\Bigg{\|}\begin{pmatrix}A^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ B^{T}(\lambda^{k}-\tilde{\lambda}^{k})\\ A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})\end{pmatrix}\Bigg{\|}\Bigg{/}\Bigg{\|}\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\\ \end{pmatrix}\Bigg{\|};
7      if rk>νr_{k}>\nu then
8             βk:=23βkmin{1,1rk}\beta^{k}:=\frac{2}{3}*\beta^{k}\min\{1,\frac{1}{r_{k}}\};
9            Goto line 3.
10       end if
11      
12      αk=(xkx~kyky~kλkλ~k)2/([xkx~k]+βk[AT(λkλ~k)][yky~k]+βk[BT(λkλ~k)][λkλ~k]βk[A(xkx~k)+B(yky~k)])2\alpha_{k}^{*}=\Bigg{\|}\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\end{pmatrix}\Bigg{\|}^{2}\Bigg{/}\Bigg{\|}\begin{pmatrix}[x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\\ \end{pmatrix}\Bigg{\|}^{2};
13      xk+1=xkγαk([xkx~k]+βk[AT(λkλ~k)])x^{k+1}=x^{k}-\gamma\alpha_{k}^{*}\left([x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\right);
14      yk+1=ykγαk([yky~k]+βk[BT(λkλ~k)])y^{k+1}=y^{k}-\gamma\alpha_{k}^{*}\left([y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\right);
15      λk+1=λkγαk([λkλ~k]βk[A(xkx~k)+B(yky~k)])\lambda^{k+1}=\lambda^{k}-\gamma\alpha_{k}^{*}\left([\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\right);
16       if rkμr_{k}\leq\mu then
17             βk+1:=1.5βk\beta^{k+1}:=1.5*\beta^{k};
18      else
19             βk+1:=βk\beta^{k+1}:=\beta^{k}.
20       end
21      
22 end for
23
Algorithm 9 PGAa1\text{PGA}_{a_{1}} for problem (22)
1 \cdots
2 \alpha_{k}^{*}=\begin{pmatrix}x^{k}-\tilde{x}^{k}\\ y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\end{pmatrix}^{T}\begin{pmatrix}[x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\end{pmatrix}\Bigg{/}\Bigg{\|}\begin{pmatrix}[x^{k}-\tilde{x}^{k}]+\beta^{k}[A^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [y^{k}-\tilde{y}^{k}]+\beta^{k}[B^{T}(\lambda^{k}-\tilde{\lambda}^{k})]\\ [\lambda^{k}-\tilde{\lambda}^{k}]-\beta^{k}[A(x^{k}-\tilde{x}^{k})+B(y^{k}-\tilde{y}^{k})]\end{pmatrix}\Bigg{\|}^{2};
3 \cdots
Algorithm 10 PGAb1\text{PGA}_{b_{1}} for problem (22)

In the preceding part of this section we have introduced five algorithms for problem (22). Let us now take a closer look at them. Firstly, ADMM (Algorithm 6) may not be implementable in practice because, compared with computing the proximity operators of θ1 and θ2, it may be much harder to solve the two convex subproblems

\min\{\theta_{1}(x)+\tfrac{\rho}{2}\|Ax+By^{k}-c+\tfrac{1}{\rho}\lambda^{k}\|^{2}:x\in\mathbb{R}^{n}\},\qquad \min\{\theta_{2}(y)+\tfrac{\rho}{2}\|Ax^{k+1}+By-c+\tfrac{1}{\rho}\lambda^{k}\|^{2}:y\in\mathbb{R}^{q}\}.

Next, let us compare AD-LPMM (Algorithm 7) with the three algorithms proposed by us. Firstly, at iteration k, AD-LPMM computes (x^{k+1}, y^{k+1}, λ^{k+1}) sequentially, while our three algorithms compute the blocks of (x̃^k, ỹ^k, λ̃^k) and of (x^{k+1}, y^{k+1}, λ^{k+1}) in parallel; this is a significant characteristic of our algorithms, reducing computation time via parallel computation, especially for large-scale multi-block separable problems (see the sketch below). Secondly, although all four algorithms use the proximity operators of θ1 and θ2, AD-LPMM involves more computation when updating x and y because it must form A^T(Ax^k + By^k - c + λ^k/ρ) and B^T(Ax^{k+1} + By^k - c + λ^k/ρ), whereas our algorithms only require A^T λ or B^T λ. Finally, AD-LPMM additionally needs the maximal eigenvalues of A^T A and B^T B in advance, which is not necessary for our algorithms. Comparing our algorithms PGA_{a_1}, PGA_{b_1} and GEM with each other, the predictor steps are identical: all of them use the proximity operators of θ1 and θ2 to form the predictors. In the corrector step, GEM again applies the proximity operators to obtain the next iterate (x^{k+1}, y^{k+1}, λ^{k+1}), while the other two algorithms involve only cheap matrix-vector products and additions. Thus, when evaluating the proximity operators of θ1 and θ2 requires iterative techniques, the corrector step of GEM is more expensive than that of PGA_{a_1} and PGA_{b_1}. Finally, we point out that the value of α_k^* in PGA_{b_1} is less than or equal to its counterpart in PGA_{a_1}.
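A minimal MATLAB sketch of one (parallelizable) predictor step of Algorithms 8 and 9 for problem (22), with the illustrative choices θ1 = ‖·‖_1 and θ2 = (1/2)‖·‖^2 (our own assumptions). Each of the three blocks uses only (x^k, y^k, λ^k), so on a multicore machine they could be dispatched independently, e.g., via parfeval from the Parallel Computing Toolbox.

m = 5; n = 8; q = 6; A = randn(m,n); B = randn(m,q); c = randn(m,1);   % illustrative data
prox1 = @(v,t) sign(v).*max(abs(v)-t,0);    % Prox_{t*theta1}, theta1 = ||.||_1
prox2 = @(v,t) v/(1+t);                     % Prox_{t*theta2}, theta2 = 0.5*||.||^2
xk = zeros(n,1); yk = zeros(q,1); lamk = zeros(m,1); betak = 0.1;
xt   = prox1(xk + betak*(A'*lamk), betak);             % block 1
yt   = prox2(yk + betak*(B'*lamk), betak);             % block 2
lamt = lamk - betak*(A*xk + B*yk - c);                 % block 3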

6 Numerical Simulations

In this section, we report numerical results for our proposed algorithms. The algorithms are implemented in MATLAB, and the experiments are performed on a laptop with an Intel i5-6200U 2.30 GHz CPU and 8 GB of RAM.

6.1 First Example

Consider the following l1-regularized least squares problem

minxnf(x):=12Axb2+λx1,\min_{x\in\mathbb{R}^{n}}f(x):=\frac{1}{2}\|Ax-b\|^{2}+\lambda\|x\|_{1}, (23)

where Am×nA\in\mathbb{R}^{m\times n}, bmb\in\mathbb{R}^{m}, and λ>0\lambda>0. This problem fits MGVI(θ,F)\text{MGVI}(\theta,F) with

θ(x)=λx1,F(x)=AT(Axb).\begin{array}[]{l}\theta(x)=\lambda\|x\|_{1},\\ F(x)=A^{T}(Ax-b).\end{array}

We generate an instance of the problem with λ = 1 and A ∈ ℝ^{1000×1100} to compare the iterative shrinkage-thresholding algorithm (ISTA) with our proposed algorithms GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2}. The entries of A are generated independently from the standard normal distribution; the "true" vector x_true is a sparse vector with 20 non-zero entries, generated by the following MATLAB commands:

x_true = zeros(n,1);
x_true(3:8:80) = 1;
x_true(7:8:80) = -1;

and we then set b := A*x_true. The initial point for all algorithms is x = e, the vector of all ones. The stopping criterion for all of them is

xkProxθ(xkF(xk))<106.\|x^{k}-\text{Prox}_{\theta}(x^{k}-F(x^{k}))\|_{\infty}<10^{-6}.

The trends of the objective value f for ISTA, GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2} on the l1-regularized least squares problem are shown in Fig. 1. The number of iterations and the CPU time for each of them are reported in Table 1. All of them recover the "true" vector.

Figure 1: Trends of f for ISTA, GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2} on the l1-regularized least squares problem.
Table 1: The number of iterations and CPU time of ISTA, GEM, PGA_{a_1}, PGA_{a_2}, PGA_{b_1}, PGA_{b_2} on the l1-regularized least squares problem.
Algorithm | No. of iterations | CPU time (seconds)
ISTA | 1739 | 5.36
GEM | 1682 | 11.10
PGA_{a_1} | 1816 | 24.93
PGA_{a_2} | 822 | 4.20
PGA_{b_1} | 1157 | 7.31
PGA_{b_2} | 1085 | 3.38

6.2 Second Example

Consider the following basis pursuit problem

minx1s.t.Ax=b,\begin{split}\min\quad&\|x\|_{1}\\ \text{s.t.}\quad&Ax=b,\\ \end{split} (24)

where A ∈ ℝ^{m×n} and b ∈ ℝ^m. This problem matches MGVI(θ,F) with θ(x,λ) = ‖x‖_1 and F(x,\lambda)=\begin{pmatrix}0&-A^{T}\\ A&0\end{pmatrix}\begin{pmatrix}x\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ -b\end{pmatrix}. We use the same data as in the preceding example to compare AD-LPMM with our algorithms GEM, PGA_{a_1}, PGA_{b_1}. The initial point for all algorithms is x = e, the vector of all ones, and λ = 0, the vector of all zeros. The stopping criterion for all algorithms except AD-LPMM is ‖(x^k,λ^k) - Prox_θ((x^k,λ^k) - F(x^k,λ^k))‖_∞ < 10^{-6}, while that for AD-LPMM is ‖(x^{k+1},λ^{k+1}) - (x^k,λ^k)‖_∞ < 10^{-6}. The results of AD-LPMM, GEM, PGA_{a_1}, PGA_{b_1} on the basis pursuit problem, namely the number of iterations and the CPU time, are reported in Table 2. All of these algorithms recover the "true" vector.

Table 2: The number of iterations and CPU time of AD-LPMM, GEM, PGA_{a_1}, PGA_{b_1} on the basis pursuit problem.
Algorithm | No. of iterations | CPU time (seconds)
AD-LPMM | 1773 | 18.65
GEM | 105 | 4.65
PGA_{a_1} | 225 | 7.58
PGA_{b_1} | 226 | 7.05

7 Conclusion

In this paper, we propose several PGA-based Predictor-Corrector algorithms for solving MGVI(θ,F). Our work is inspired by the extragradient method and by the projection and contraction algorithms for MVI(θ,F,C). Our algorithms have a simple form, and a significant characteristic for separable multi-block convex optimization problems is that they are well suited to parallel computation, which allows them to be applied in large-scale settings. Preliminary simulations on sparsity recovery models show their effectiveness compared with some existing algorithms.

Acknowledgements.
My special thanks should go to my friend, Kuan Lu, who has put considerable time and effort into his comments on the draft.

References

  • (1) Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, second edn. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, Cham (2017). DOI 10.1007/978-3-319-48311-5. URL https://doi.org/10.1007/978-3-319-48311-5. With a foreword by Hédy Attouch
  • (2) Beck, A.: First-order methods in optimization, MOS-SIAM Series on Optimization, vol. 25. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA (2017). DOI 10.1137/1.9781611974997.ch1. URL https://doi.org/10.1137/1.9781611974997.ch1
  • (3) Bertsekas, D.P.: Convex analysis and optimization. Athena Scientific, Belmont, MA (2003). With Angelia Nedić and Asuman E. Ozdaglar
  • (4) Boyd, S., Parikh, N., Chu, E., Borja, P., Jonathan, E.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1–122 (2011)
  • (5) Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009). DOI 10.1137/060657704. URL https://doi.org/10.1137/060657704
  • (6) Candès, E., Tao, T.: Rejoinder: “The Dantzig selector: statistical estimation when pp is much larger than nn” [Ann. Statist. 35 (2007), no. 6, 2313–2351; mr2382644]. Ann. Statist. 35(6), 2392–2404 (2007). DOI 10.1214/009053607000000532. URL https://doi.org/10.1214/009053607000000532
  • (7) Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM Rev. 43(1), 129–159 (2001). DOI 10.1137/S003614450037906X. URL https://doi.org/10.1137/S003614450037906X. Reprinted from SIAM J. Sci. Comput. 20 (1998), no. 1, 33–61 (electronic) [ MR1639094 (99h:94013)]
  • (8) Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-point algorithms for inverse problems in science and engineering, Springer Optim. Appl., vol. 49, pp. 185–212. Springer, New York (2011). DOI 10.1007/978-1-4419-9569-8˙10. URL https://doi.org/10.1007/978-1-4419-9569-8_10
  • (9) Fang, S.C., Peterson, E.L.: Generalized variational inequalities. J. Optim. Theory Appl. 38(3), 363–383 (1982). DOI 10.1007/BF00935344. URL https://doi.org/10.1007/BF00935344
  • (10) Fortin, M., Glowinski, R.: Augmented Lagrangian methods, Studies in Mathematics and its Applications, vol. 15. North-Holland Publishing Co., Amsterdam (1983). Applications to the numerical solution of boundary value problems, Translated from the French by B. Hunt and D. C. Spicer
  • (11) He, B.: A class of projection and contraction methods for monotone variational inequalities. Appl. Math. Optim. 35(1), 69–76 (1997). DOI 10.1007/s002459900037. URL https://doi.org/10.1007/s002459900037
  • (12) He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012). DOI 10.1137/100814494. URL https://doi.org/10.1137/100814494
  • (13) He, B., Yuan, X.: Linearized alternating direction method of multipliers with Gaussian back substitution for separable convex programming. Numer. Algebra Control Optim. 3(2), 247–260 (2013). DOI 10.3934/naco.2013.3.247. URL https://doi.org/10.3934/naco.2013.3.247
  • (14) He, B., Yuan, X., Zhang, W.: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56(3), 559–572 (2013). DOI 10.1007/s10589-013-9564-5. URL https://doi.org/10.1007/s10589-013-9564-5
  • (15) He, B.S.: A new method for a class of linear variational inequalities. Math. Programming 66(2, Ser. A), 137–144 (1994). DOI 10.1007/BF01581141. URL https://doi.org/10.1007/BF01581141
  • (16) He, B.S.: From the projection and contraction methods for variational inequalities to the splitting contraction methods for convex optimization. Numer. Math. J. Chinese Univ. 38(1), 74–96 (2016)
  • (17) He, B.S., Fu, X.L., Jiang, Z.K.: Proximal-point algorithm using a linear proximal term. J. Optim. Theory Appl. 141(2), 299–319 (2009). DOI 10.1007/s10957-008-9493-0. URL https://doi.org/10.1007/s10957-008-9493-0
  • (18) Korpelevič, G.M.: An extragradient method for finding saddle points and for other problems. Èkonom. i Mat. Metody 12(4), 747–756 (1976)
  • (19) Patriksson, M.: Nonlinear programming and variational inequality problems, Applied Optimization, vol. 23. Kluwer Academic Publishers, Dordrecht (1999). DOI 10.1007/978-1-4757-2991-7. URL https://doi.org/10.1007/978-1-4757-2991-7. A unified approach