
An adaptive linearized alternating direction multiplier method with a relaxation step for convex programming

Boran Wang ([email protected]), College of Science, Minzu University of China, Beijing 100081, China
Abstract

The alternating direction method of multipliers (ADMM) is a powerful technique for solving convex optimization problems. When a subproblem is difficult to solve exactly, it is useful to handle it by introducing a proximal term. When the proximal matrix is positive definite the algorithm converges, but the resulting iteration step size is small. Recent studies have revealed that the proximal matrix may be chosen indefinite. In this paper, we present an adaptive linearized alternating direction multiplier method with a relaxation step, combining the relaxation step with an adaptive technique. The novelty of the method is that it uses the information at the current iterate to select the proximal matrix dynamically, which enlarges the iteration step size and speeds up the convergence of the algorithm. We prove the global convergence of the algorithm theoretically and illustrate its effectiveness with numerical experiments.

keywords:
variational inequality, adaptive linearized ADMM, relaxation step, global convergence.

1 Introduction

This paper considers the following separable convex optimization problem with linear constraints:

\min\left\{f\left(x\right)+g\left(y\right)\,|\,Ax+By=b,\ x\in X,\ y\in Y\right\} (1)

where f:R^{n_{1}}\to R and g:R^{n_{2}}\to R are convex functions (not necessarily smooth), A\in R^{m\times n_{1}}, B\in R^{m\times n_{2}}, b\in R^{m}, and X\subseteq R^{n_{1}}, Y\subseteq R^{n_{2}} are closed convex sets. Problem (1) has been widely used in areas such as image processing [1, 2], statistical learning [3] and communication networks [4]. There exist numerous effective methods for addressing problem (1), including the primal-dual method, the augmented Lagrangian method, and the alternating direction method of multipliers (ADMM). Notably, ADMM has garnered significant attention in recent years due to its simplicity and efficiency, and many scholars have introduced effective variants of it tailored to practical challenges.

ADMM, first put forward by Gabay and Mercier [5] and Glowinski and Marrocco [6], is widely recognized as an effective way to solve (1); for the progress of research on the method, we refer to [7]. The method has a fast convergence speed and strong convergence properties, making it a crucial tool for separable convex optimization: it decomposes a large problem into two or more small-scale subproblems and then solves each subproblem iteratively, which enhances solution speed. Its direct, adaptable, and practical nature makes it particularly effective for convex optimization problems with a separable structure. The method can be described as follows:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (2)

where L_{\beta}\left(x,y,\lambda\right) is the augmented Lagrangian function of (1):

Lβ(x,y,λ)=f(x)+g(y)λT(Ax+Byb)+β2Ax+Byb22.L_{\beta}\left(x,y,\lambda\right)=f\left(x\right)+g\left(y\right)-\lambda^{T}\left(Ax+By-b\right)+\frac{\beta}{2}\left\|Ax+By-b\right\|_{2}^{2}.

where \lambda\in R^{m} represents the Lagrange multiplier, and \beta>0 represents the penalty parameter. For simplicity, the penalty parameter \beta is fixed in our discussion.
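For readers who prefer pseudocode, scheme (2) can be sketched in a few lines of Python. This is only an illustrative sketch (not code from this paper); the subproblem solvers x_solver and y_solver are assumed to be supplied by the user, since their form depends on f, g, A and B.

```python
import numpy as np

def admm(x_solver, y_solver, A, B, b, beta, x0, y0, lam0, max_iter=500, tol=1e-6):
    """Generic ADMM scheme (2).

    x_solver(y, lam) and y_solver(x, lam) are assumed to return the minimizers
    of the augmented Lagrangian L_beta over x and y, respectively.
    """
    x, y, lam = x0, y0, lam0
    for _ in range(max_iter):
        x = x_solver(y, lam)            # x^{k+1} = argmin_x L_beta(x, y^k, lambda^k)
        y = y_solver(x, lam)            # y^{k+1} = argmin_y L_beta(x^{k+1}, y, lambda^k)
        residual = A @ x + B @ y - b
        lam = lam - beta * residual     # lambda^{k+1} = lambda^k - beta (A x^{k+1} + B y^{k+1} - b)
        if np.linalg.norm(residual) < tol:
            break
    return x, y, lam
```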

A significant focus in the research literature on the alternating direction multiplier method is the exploration of efficient strategies for solving its subproblems. In certain scenarios, the functions f\left(x\right) and g\left(y\right), or the coefficient matrices A and B, may exhibit special properties and structures. Leveraging these characteristics, researchers can extend the framework presented in (2) to devise algorithms tailored to specific applications while upholding theoretical convergence guarantees. This principle underscores the critical role of effectively applying the method across various fields and applications. To delve deeper into this point, let us examine the y-subproblem outlined in (2). As previously noted, the function g(y), the matrix B, and the set Y are pivotal in addressing the y-subproblem. Typically, when the function, matrix, and set are in a generic form, the solution to (2) can be obtained using straightforward iterative techniques; we refer to [8, 9, 10]. However, in practical scenarios, the function, matrix, and set associated with the y-subproblem may possess distinct characteristics, necessitating more efficient solution methods. When the matrix B is not an identity matrix, the y-subproblem in (2) can be reformulated as follows:

yk+1=argminy{g(y)+β2By+(Axk+1b1βλk)2|yY}y^{k+1}=arg\min_{y}\left\{g\left(y\right)+\frac{\beta}{2}\left\|By+\left(Ax^{k+1}-b-\frac{1}{\beta}\lambda^{k}\right)\right\|^{2}|y\in Y\right\} (3)

Linearizing the quadratic term \left\|By+\left(Ax^{k+1}-b-\frac{1}{\beta}\lambda^{k}\right)\right\|^{2} in (3) yields:

yk+1=argminy{g(y)+r2y(yk+1rqk)2|yY}.y^{k+1}=arg\min_{y}\left\{g\left(y\right)+\frac{r}{2}\left\|y-\left(y^{k}+\frac{1}{r}q^{k}\right)\right\|^{2}|y\in Y\right\}. (4)

where

qk=BT(λkβ(Axk+1+Bykb)).q^{k}=B^{T}\left(\lambda^{k}-\beta\left(Ax^{k+1}+By^{k}-b\right)\right). (5)

and r>0 is a constant. Thus we obtain the linearized alternating direction multiplier method [11]:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+r2y(yk+1rqk)2|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{r}{2}\left\|y-\left(y^{k}+\frac{1}{r}q^{k}\right)\right\|^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (6)

The linearized alternating direction multiplier method is widely used in compressed sensing [12], image processing [13, 14], etc.
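As an illustration of the updates (4)-(6), the linearized y-step amounts to a single proximal step of g. The sketch below is ours, not code from the paper, and it assumes a callable prox_g(v, t) returning the minimizer of g(y) + ||y - v||^2 / (2t).

```python
import numpy as np

def linearized_y_update(prox_g, A, B, b, x_new, y_k, lam_k, beta, r):
    """Linearized y-update (4)-(5): q^k = B^T(lambda^k - beta(A x^{k+1} + B y^k - b)),
    then y^{k+1} = prox_{g, 1/r}(y^k + q^k / r)."""
    q = B.T @ (lam_k - beta * (A @ x_new + B @ y_k - b))
    return prox_g(y_k + q / r, 1.0 / r)
```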

A more general alternating direction multiplier method[15] can be written as follows:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+12yykD2|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{1}{2}\left\|y-y^{k}\right\|_{D}^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (7)

where D\in R^{n_{2}\times n_{2}} is positive definite. Thus, the linearized alternating direction multiplier method can be viewed as a special case of the general alternating direction multiplier method in which the positive definite proximal matrix is chosen as

D=rI_{n_{2}}-\beta B^{T}B\quad and\quad r>\beta\left\|B^{T}B\right\| (8)

Since in many practical applications only one subproblem needs to be linearized, in this paper we only consider linearizing the y-subproblem in (7).

In order to improve the applicability of the linearized alternating direction multiplier method, a number of improved variants [16, 17, 18, 19, 20] have been proposed.

The work [21] proposes an indefinite proximal alternating direction multiplier method with the following iteration format:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+12yykD02|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{1}{2}\left\|y-y^{k}\right\|_{D_{0}}^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (9)

where

D_{0}=\tau rI_{n_{2}}-\beta B^{T}B,\quad 0.8\leq\tau<1\quad and\quad r>\beta\left\|B^{T}B\right\|.

He et al. proposed an optimal linearized ADMM in the literature [22] with the iterative steps:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+12yykD2|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{1}{2}\left\|y-y^{k}\right\|_{D}^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (10)

where

D=\tau rI-\beta B^{T}B,\quad 0.75<\tau<1\quad and\quad r>\beta\left\|B^{T}B\right\|

It is easy to see that the matrix D is not necessarily positive semidefinite. In that paper the authors prove that 0.75 is the optimal lower bound for \tau.

Another important issue for ADMM is to design an accelerated version of the original ADMM by slightly modifying it through a simple relaxation scheme. Note that ADMM is closely related to the proximal point algorithm (PPA) [23]. For the PPA, the convergence rate is improved if an additional over-relaxation step is applied to the basic variables; see Tao and Yuan [24] for a relaxed variant of the PPA and a proof of its linear convergence. Gu and Yang [25] further proved the optimal linear convergence rate of the relaxed PPA under regularity conditions. As Boyd et al. observed in [3], the execution of ADMM relies on the inputs \left(y^{k},\lambda^{k}\right) and does not require x^{k} at all. Thus, x plays the role of an intermediate variable in (2), while \left(y^{k},\lambda^{k}\right) are the basic variables. It is therefore natural to ask whether one can include a relaxation step for the basic variables \left(y^{k},\lambda^{k}\right) in the ADMM scheme (2) to obtain a faster ADMM-type method. This idea leads to:

{xk+1=argminx{f(x)xTATλk+β2Ax+Bykb2}y^k=argminy{g(y)yTBTλk+β2Axk+1+Byb2}λ^k=λkβ(Axk+1+By^kb)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{f\left(x\right)-x^{T}A^{T}\lambda^{k}+\frac{\beta}{2}\left\|Ax+By^{k}-b\right\|^{2}\right\}\\ \hat{y}^{k}&=arg\min_{y}\left\{g\left(y\right)-y^{T}B^{T}\lambda^{k}+\frac{\beta}{2}\left\|Ax^{k+1}+By-b\right\|^{2}\right\}\\ \hat{\lambda}^{k}&=\lambda^{k}-\beta\left(Ax^{k+1}+B\hat{y}^{k}-b\right)\end{aligned}\right. (11)
(yk+1λk+1)=(ykλk)σ(yky^kλkλ^k)\begin{pmatrix}y^{k+1}\\ \lambda^{k+1}\end{pmatrix}=\begin{pmatrix}y^{k}\\ \lambda^{k}\end{pmatrix}-\sigma\begin{pmatrix}y^{k}-\hat{y}^{k}\\ \lambda^{k}-\hat{\lambda}^{k}\end{pmatrix} (12)

where the relaxation factor \sigma\in\left(0,2\right). Based on the above discussion, in this study we introduce an adaptive linearized alternating direction multiplier method with a relaxation step, which incorporates both a relaxation step and an adaptive technique. Our approach adopts a unified variational inequality framework and establishes the global convergence of the method through theoretical analysis. By solving the Lasso problem, we demonstrate the algorithm's good numerical performance. These findings advance optimization algorithms and offer more efficient tools and methodologies for addressing practical challenges.

The rest of the paper is organized as follows. Section 2 gives some preliminaries; Section 3 gives iterative steps and proof of convergence of the algorithm; Section 4 gives the numerical experiments; and Section 5 draws the conclusions.

2 Preliminaries

We present some preliminaries that will be used in convergence analysis.

In this article, the symbol \left\|\cdot\right\| denotes the Euclidean norm \left\|\cdot\right\|_{2}, and \left\|x\right\|_{D}^{2}:=x^{T}Dx denotes the matrix-weighted norm, where D\in R^{n\times n} is a symmetric positive definite matrix and x\in R^{n}. When D is not positive definite, we still use the same notation.

2.1 Characterization of variational inequalities

Define the Lagrangian function corresponding to problem (1) as:

L(x,y,λ)=f(x)+g(y)λT(Ax+Byb)L\left(x,y,\lambda\right)=f\left(x\right)+g\left(y\right)-\lambda^{T}\left(Ax+By-b\right) (13)

In (13), (x,y) and \lambda denote the primal and dual variables, respectively.

If (x,y,λ)X×Y×Rm\left(x^{\ast},y^{\ast},\lambda^{\ast}\right)\in X\times Y\times R^{m} satisfies the following inequality:

L(x,y,λ)L(x,y,λ)L(x,y,λ),(x,y,λ)X×Y×Rm=Ω.L\left(x^{\ast},y^{\ast},\lambda\right)\leq L\left(x^{\ast},y^{\ast},\lambda^{\ast}\right)\leq L\left(x,y,\lambda^{\ast}\right),\quad\forall\left(x,y,\lambda\right)\in X\times Y\times R^{m}=\Omega.

then (x,y,λ)\left(x^{\ast},y^{\ast},\lambda^{\ast}\right) is called a saddle point of L(x,y,λ)L\left(x,y,\lambda\right).
The above inequality is equivalent to the following variational inequality form:

\begin{cases}x^{\ast}\in X,\quad f\left(x\right)-f\left(x^{\ast}\right)+\left(x-x^{\ast}\right)^{T}\left(-A^{T}\lambda^{\ast}\right)\geq 0,\quad\forall x\in X.\\ y^{\ast}\in Y,\quad g\left(y\right)-g\left(y^{\ast}\right)+\left(y-y^{\ast}\right)^{T}\left(-B^{T}\lambda^{\ast}\right)\geq 0,\quad\forall y\in Y.\\ \lambda^{\ast}\in R^{m},\qquad\left(\lambda-\lambda^{\ast}\right)^{T}\left(Ax^{\ast}+By^{\ast}-b\right)\geq 0,\quad\forall\lambda\in R^{m}.\end{cases} (14)

The above variational inequality can be written in the following form:

VI(Ω,F,θ):wΩ,θ(u)θ(u)+(ww)TF(w)0,wΩ.VI\left(\Omega,F,\theta\right):\quad w^{\ast}\in\Omega,\quad\theta\left(u\right)-\theta\left(u^{\ast}\right)+\left(w-w^{\ast}\right)^{T}F\left(w^{\ast}\right)\geq 0,\quad\forall w\in\Omega. (15)

where

\theta\left(u\right)=f\left(x\right)+g\left(y\right),\quad u=\begin{pmatrix}x\\ y\end{pmatrix},\quad v=\begin{pmatrix}y\\ \lambda\end{pmatrix},\quad w=\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix},\quad F\left(w\right)=\begin{pmatrix}-A^{T}\lambda\\ -B^{T}\lambda\\ Ax+By-b\end{pmatrix} (16)

Owing to:

(w1w2)T(F(w1)F(w2))0,w1,w2Ω.\left(w_{1}-w_{2}\right)^{T}\left(F\left(w_{1}\right)-F\left(w_{2}\right)\right)\geq 0,\quad\forall w_{1},w_{2}\in\Omega.

So, F is a monotone operator.
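In fact, F is affine with a skew-symmetric linear part,

F\left(w\right)=\begin{pmatrix}O&O&-A^{T}\\ O&O&-B^{T}\\ A&B&O\end{pmatrix}\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ 0\\ -b\end{pmatrix},

so \left(w_{1}-w_{2}\right)^{T}\left(F\left(w_{1}\right)-F\left(w_{2}\right)\right)=0 for all w_{1},w_{2}\in\Omega, and the monotonicity inequality above holds with equality.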
In this manner, problem (1) is reformulated as the variational inequality (15). We denote the solution set of VI\left(\Omega,F,\theta\right) by \Omega^{\ast}, which is assumed to be nonempty.

2.2 Some notation

For the convenience of the proof, first define some auxiliary variables and matrices. Let

Qk+1=(τkrβIn2OB1βIm)andM=(σIn2OσβBσIm).Q_{k+1}=\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ -B&\frac{1}{\beta}I_{m}\end{pmatrix}\qquad and\qquad M=\begin{pmatrix}\sigma I_{n_{2}}&O\\ -\sigma\beta B&\sigma I_{m}\end{pmatrix}. (17)
Hk+1=1σ(τkrβIn2OO1βIm).H_{k+1}=\frac{1}{\sigma}\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ O&\frac{1}{\beta}I_{m}\end{pmatrix}. (18)
w~k=(x~ky~kλ~k)=(xk+1y^kλkβ(Axk+1+Bykb)).\tilde{w}^{k}=\begin{pmatrix}\tilde{x}^{k}\\ \tilde{y}^{k}\\ \tilde{\lambda}^{k}\end{pmatrix}=\begin{pmatrix}x^{k+1}\\ \hat{y}^{k}\\ \lambda^{k}-\beta\left(Ax^{k+1}+By^{k}-b\right)\end{pmatrix}. (19)
v~k=(y~kλ~k)=(y^kλkβ(Axk+1+Bykb))\tilde{v}^{k}=\begin{pmatrix}\tilde{y}^{k}\\ \tilde{\lambda}^{k}\end{pmatrix}=\begin{pmatrix}\hat{y}^{k}\\ \lambda^{k}-\beta\left(Ax^{k+1}+By^{k}-b\right)\end{pmatrix} (20)

Lemma 2.1. The matrices Q_{k+1}, H_{k+1} and M defined in (17) and (18) satisfy:

Q_{k+1}=H_{k+1}M\quad and\quad H_{k+1}\succeq 0. (21)

Proof: Both relations in (21) can be verified by direct computation:
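H_{k+1}M=\frac{1}{\sigma}\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ O&\frac{1}{\beta}I_{m}\end{pmatrix}\begin{pmatrix}\sigma I_{n_{2}}&O\\ -\sigma\beta B&\sigma I_{m}\end{pmatrix}=\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ -B&\frac{1}{\beta}I_{m}\end{pmatrix}=Q_{k+1},

and H_{k+1}\succeq 0 follows from \tau_{k},r,\beta,\sigma>0.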
Lemma 2.2. (Robbins-Siegmund theorem [26]) Let a^{k},b^{k},c^{k} and d^{k} be non-negative sequences satisfying:

ak+1(1+bk)ak+ckdk,k=0,1,2a^{k+1}\leq\left(1+b^{k}\right)a^{k}+c^{k}-d^{k},\forall k=0,1,2\dots (22)

If \sum_{k=0}^{+\infty}b^{k}<+\infty and \sum_{k=0}^{+\infty}c^{k}<+\infty, then \lim_{k\to\infty}a^{k} exists and is finite, and \sum_{k=0}^{+\infty}d^{k}<+\infty.
Definition 2.1. Let f\left(\cdot\right) be a (possibly nonsmooth) convex function on a convex set \Omega. If the following inequality holds:

f\left(u\right)\geq f\left(v\right)+\phi^{T}\left(u-v\right),\quad\forall u\in\Omega.

then \phi is said to be a subgradient of f\left(\cdot\right) at v\in\Omega. The set of all subgradients is called the subdifferential of f\left(\cdot\right) at v\in\Omega, denoted \partial{f\left(v\right)}.
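For example, for f\left(v\right)=\left|v\right| on R, \partial f\left(v\right)=\left\{\operatorname{sign}\left(v\right)\right\} for v\neq 0 and \partial f\left(0\right)=\left[-1,1\right]; this is exactly the structure exploited by the soft-thresholding (shrink) operator used in Section 4.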

2.3 Stopping criterion

In the context of the two-block separable convex optimization problem (1), we investigate the optimality conditions for each subproblem during the iterative process of the alternating direction multiplier method.
According to the optimality condition theorem, if x^{\ast}, y^{\ast} are optimal solutions of the convex optimization problem (1) and \lambda^{\ast} is the corresponding Lagrange multiplier, then the following conditions are satisfied:

0f(x)ATλ0\in\partial{f\left(x^{\ast}\right)}-A^{T}\lambda^{\ast} (23)
0g(y)BTλ0\in\partial{g\left(y^{\ast}\right)}-B^{T}\lambda^{\ast} (24)
Ax+By=bAx^{\ast}+By^{\ast}=b (25)

In this context, condition (25) is referred to as the primal feasibility condition, while conditions (23) and (24) are the dual feasibility conditions.
According to the optimality condition for the y-subproblem in (2), we have:

0\displaystyle 0 g(yk+1)BTλk+βBT(Axk+1+Byk+1b)\displaystyle\in\partial{g\left(y^{k+1}\right)}-B^{T}\lambda^{k}+\beta B^{T}\left(Ax^{k+1}+By^{k+1}-b\right) (26)
=g(yk+1)BTλk+1.\displaystyle=\partial{g\left(y^{k+1}\right)}-B^{T}\lambda^{k+1}.

According to the optimality condition for the x-subproblem in (2), we have:

0f(xk+1)ATλk+βAT(Axk+1+Bykb).0\in\partial{f\left(x^{k+1}\right)}-A^{T}\lambda^{k}+\beta A^{T}\left(Ax^{k+1}+By^{k}-b\right). (27)

From the definition of \lambda^{k+1} in (2), the above relation is equivalent to:

0\displaystyle 0 f(xk+1)AT(λkβ(Axk+1+Byk+1b)βB(ykyk+1))\displaystyle\in\partial{f\left(x^{k+1}\right)}-A^{T}\left(\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)-\beta B\left(y^{k}-y^{k+1}\right)\right) (28)
=f(xk+1)ATλk+1+βATB(ykyk+1).\displaystyle=\partial{f\left(x^{k+1}\right)}-A^{T}\lambda^{k+1}+\beta A^{T}B\left(y^{k}-y^{k+1}\right).

(28) is equivalent to:

βATB(yk+1yk)f(xk+1)ATλk+1.\beta A^{T}B\left(y^{k+1}-y^{k}\right)\in\partial{f\left(x^{k+1}\right)}-A^{T}\lambda^{k+1}. (29)

Comparing (29) with condition (23), the additional term is \beta A^{T}B\left(y^{k+1}-y^{k}\right). Thus, to verify dual feasibility, it suffices to examine the residual \beta A^{T}B\left(y^{k+1}-y^{k}\right).

In summary, to ascertain the convergence of the alternating direction multiplier method, it suffices to verify that the two residuals p^{k+1} and d^{k+1} are sufficiently small, where

p^{k+1}=\left\|Ax^{k+1}+By^{k+1}-b\right\|,\qquad d^{k+1}=\left\|\beta A^{T}B\left(y^{k+1}-y^{k}\right)\right\|.
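In an implementation, the two residuals can be monitored with a few lines of code; the following sketch (ours, using NumPy) assumes A, B, b and the iterates are available as arrays.

```python
import numpy as np

def residuals(A, B, b, x_new, y_new, y_old, beta):
    """Primal residual p^{k+1} and dual residual d^{k+1} for the stopping test."""
    p = np.linalg.norm(A @ x_new + B @ y_new - b)          # primal feasibility
    d = np.linalg.norm(beta * A.T @ B @ (y_new - y_old))   # dual feasibility
    return p, d
```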

3 Algorithm and convergence analysis

3.1 New algorithms

Suppose that f:R^{n}\to R\cup\left\{+\infty\right\} and g:R^{n}\to R\cup\left\{+\infty\right\} are proper, closed, convex functions.
The main iterative steps of the algorithm are:  
Algorithm 1. An adaptive linearized alternating direction multiplier method with a relaxation step.
  Set up: \Omega=X\times Y\times R^{m},\ \beta>0,\ \tau_{k}>0,\ p^{k}\geq 0,\ d^{k}\geq 0,\ r=\left\|B^{T}B\right\|,\ \sigma\in\left(0,2\right),\ \varepsilon\in\left(0,2-\sigma\right),\ \gamma>1,\ \Upsilon>1,\ \rho>1,\ \epsilon^{pri}>0,\ \epsilon^{dual}>0. Choose parameter sequences \left\{\eta_{k}\right\} and \left\{s_{k}\right\} with \sum_{k=0}^{\infty}\eta_{k}<+\infty and \sum_{k=0}^{\infty}s_{k}<+\infty.
Step 0. Input: w^{0}=\left(x^{0},y^{0},\lambda^{0}\right)\in\Omega,\ \beta,\sigma,\tau_{0},\tau_{min},p^{0},d^{0},\gamma,\Upsilon,\varepsilon,r,\rho,\epsilon^{pri},\epsilon^{dual}. Set k=0.
Step 1. Compute w^{k+1}=\left(x^{k+1},y^{k+1},\lambda^{k+1}\right)\in X\times Y\times R^{m}:

{xk+1=argminx{f(x)(λk)TAx+β2Ax+Bykb2}y^k=argminy{g(y)(λk)TBy+β2Axk+1+Byb2+12yykDk2}λ^k=λkβ(Axk+1+By^kb)\begin{cases}x^{k+1}=arg\min_{x}\left\{f\left(x\right)-\left(\lambda^{k}\right)^{T}Ax+\frac{\beta}{2}\left\|Ax+By^{k}-b\right\|^{2}\right\}\\ \hat{y}^{k}=arg\min_{y}\left\{g\left(y\right)-\left(\lambda^{k}\right)^{T}By+\frac{\beta}{2}\left\|Ax^{k+1}+By-b\right\|^{2}\right.\\ \left.\qquad\qquad\qquad\qquad\qquad+\frac{1}{2}\left\|y-y^{k}\right\|_{D_{k}}^{2}\right\}\\ \hat{\lambda}^{k}=\lambda^{k}-\beta\left(Ax^{k+1}+B\hat{y}^{k}-b\right)\end{cases} (30)

where D_{k}=\tau_{k}rI_{n_{2}}-\beta B^{T}B.

{yk+1=ykσ(yky^k)λk+1=λkσ(λkλ^k)\begin{cases}y^{k+1}=y^{k}-\sigma\left(y^{k}-\hat{y}^{k}\right)\\ \lambda^{k+1}=\lambda^{k}-\sigma\left(\lambda^{k}-\hat{\lambda}^{k}\right)\end{cases} (31)

where \sigma\in\left(0,2\right).
Step 2. If either of the following conditions holds:

Condition 1. \Theta_{1}^{k}>\Theta_{2}^{k}. (32)
Condition 2. y^{k+1}=y^{k}.

where \Theta_{1}^{k}=\left(2-\sigma\right)\tau_{k}r\left\|y^{k}-y^{k+1}\right\|^{2} and \Theta_{2}^{k}=\frac{1}{\varepsilon}\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2},
then go to Step 3; otherwise set \tau_{k}=\gamma\ast\tau_{k}\left(\gamma>1\right) and return to Step 1.
Step 3. If \Theta_{1}^{k}-\Theta_{2}^{k}\geq\Upsilon\Theta_{2}^{k}, set t_{k+1}=\max\left\{\frac{\tau_{k}}{1+\eta_{k+1}},\tau_{min}\right\}; otherwise set t_{k+1}=\tau_{k}.
Step 4. If either of the following conditions holds:

Condition 1. p^{k+1}>\left(1+s_{k}\right)p^{k}.
Condition 2. d^{k+1}>\left(1+s_{k}\right)d^{k}.

where p^{k+1}=\left\|Ax^{k+1}+By^{k+1}-b\right\| and d^{k+1}=\left\|\beta A^{T}B\left(y^{k+1}-y^{k}\right)\right\|,
then set \tau_{k+1}=\rho\ast t_{k+1}\left(\rho>1\right); otherwise set \tau_{k+1}=t_{k+1}.
Step 5. If the stopping criterion

\left\|Ax^{k+1}+By^{k+1}-b\right\|\leq\epsilon^{pri}\quad and\quad\left\|\beta A^{T}B\left(y^{k+1}-y^{k}\right)\right\|\leq\epsilon^{dual} (33)

is satisfied, go to Step 6; otherwise set k=k+1 and return to Step 1.
Step 6. Output: x^{k+1},y^{k+1},f\left(x^{k+1}\right)+g\left(y^{k+1}\right).

 
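To illustrate how the steps above fit together, the following Python sketch gives one possible implementation of Algorithm 1. It is a sketch rather than the authors' code: x_solver and prox_g are assumed callables for the x-subproblem and the proximal mapping of g, the proximal weight \tau_{k}r\beta follows the scaling used in the analysis of Section 3 (it coincides with D_{k} in Step 1 when \beta=1), and the defaults loosely mirror the parameter choices reported in Section 4.

```python
import numpy as np

def adaptive_linearized_admm(x_solver, prox_g, A, B, b, beta, y0, lam0,
                             sigma=0.9, tau0=0.75, tau_min=0.01, gamma=1.2,
                             rho=3.0, Upsilon=2.0, max_iter=500,
                             eps_pri=1e-4, eps_dual=1e-4):
    """Sketch of Algorithm 1 (adaptive linearized ADMM with a relaxation step).

    Assumed callables:
      x_solver(y, lam) -> argmin_x L_beta(x, y, lam)
      prox_g(v, t)     -> argmin_y g(y) + ||y - v||^2 / (2 t)
    """
    r = np.linalg.norm(B.T @ B, 2)                      # r = ||B^T B||
    eps = 1.0 / (1.0 / (2.0 - sigma) + 0.1)             # epsilon in (0, 2 - sigma), as in Section 4
    y, lam, tau = y0, lam0, tau0
    p_old = d_old = 100.0                               # p^0 = d^0, as in Section 4
    for k in range(max_iter):
        eta_k = 0.25 * min(1.0, 1.0 / max(1, k) ** 2)   # one summable choice of eta_k
        s_k = 2.0 * min(1.0, 1.0 / max(1, k) ** 2)      # one summable choice of s_k
        while True:                                     # Steps 1-2
            x = x_solver(y, lam)
            q = B.T @ (lam - beta * (A @ x + B @ y - b))
            w = tau * r * beta
            y_hat = prox_g(y + q / w, 1.0 / w)          # D_k-proximal y-subproblem as a prox step
            lam_hat = lam - beta * (A @ x + B @ y_hat - b)
            y_new = y - sigma * (y - y_hat)             # relaxation step (31)
            lam_new = lam - sigma * (lam - lam_hat)
            theta1 = (2.0 - sigma) * tau * r * np.linalg.norm(y - y_new) ** 2
            theta2 = np.linalg.norm(B @ (y - y_new)) ** 2 / eps
            if theta1 > theta2 or np.allclose(y_new, y):
                break                                   # Step 2 holds
            tau *= gamma                                # otherwise enlarge tau and redo Step 1
        # Step 3: cautiously decrease tau when the condition holds with a margin
        t_next = max(tau / (1.0 + eta_k), tau_min) if theta1 - theta2 >= Upsilon * theta2 else tau
        # Step 4: safeguard tau using the primal and dual residuals
        p = np.linalg.norm(A @ x + B @ y_new - b)
        d = np.linalg.norm(beta * A.T @ B @ (y_new - y))
        tau = rho * t_next if (p > (1 + s_k) * p_old or d > (1 + s_k) * d_old) else t_next
        p_old, d_old = p, d
        y, lam = y_new, lam_new
        if p <= eps_pri and d <= eps_dual:              # Step 5: stopping criterion (33)
            break
    return x, y, lam
```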

3.2 Global convergence analysis

In this section, we prove the global convergence of the proposed method. Before proceeding, we need the following lemmas.
Lemma 3.1. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1 and \left\{\tilde{w}^{k}\right\} be defined in (19). Then, for any \tilde{w}^{k}\in\Omega, we have:

θ(u)θ(u~k)+(ww~k)TF(w~k)(vv~k)TQk+1(vkv~k),wΩ.\theta\left(u\right)-\theta\left(\tilde{u}^{k}\right)+\left(w-\tilde{w}^{k}\right)^{T}F\left(\tilde{w}^{k}\right)\geq\left(v-\tilde{v}^{k}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right),\quad\forall w\in\Omega. (34)

Proof: By the optimality condition for the x-subproblem in Algorithm 1, for any \tilde{x}^{k}\in X we have:

f(x)f(x~k)+(xx~k)T{ATλk+βAT(Ax~k+Bykb)}0,xX.f\left(x\right)-f\left(\tilde{x}^{k}\right)+\left(x-\tilde{x}^{k}\right)^{T}\left\{-A^{T}\lambda^{k}+\beta A^{T}\left(A\tilde{x}^{k}+By^{k}-b\right)\right\}\geq 0,\quad\forall x\in X.

Since \tilde{\lambda}^{k}=\lambda^{k}-\beta\left(A\tilde{x}^{k}+By^{k}-b\right) by (19), it follows that:

f(x)f(x~k)+(xx~k)T(ATλ~k)0.xX.f\left(x\right)-f\left(\tilde{x}^{k}\right)+\left(x-\tilde{x}^{k}\right)^{T}\left(-A^{T}\tilde{\lambda}^{k}\right)\geq 0.\quad\forall x\in X. (35)

From the optimality condition for the y-subproblem in Algorithm 1, for any \tilde{y}^{k}\in Y we have:

g(y)g(y~k)+(yy~k)T{BTλk+βBT(Ax~k+By~kb)g\left(y\right)-g\left(\tilde{y}^{k}\right)+\left(y-\tilde{y}^{k}\right)^{T}\left\{-B^{T}\lambda^{k}+\beta B^{T}\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right)\right.
+(τkrIn2βBTB)(y~kyk)}0,yY.\left.+\left(\tau_{k}rI_{n_{2}}-\beta B^{T}B\right)\left(\tilde{y}^{k}-y^{k}\right)\right\}\geq 0,\quad\forall y\in Y.

Again by the definition of \tilde{\lambda}^{k}, the above inequality can be written as:

g(y)g(y~k)+(yy~k)T{BTλ~k+τkrβ(y~kyk)}0,yY.g\left(y\right)-g\left(\tilde{y}^{k}\right)+\left(y-\tilde{y}^{k}\right)^{T}\left\{-B^{T}\tilde{\lambda}^{k}+\tau_{k}r\beta\left(\tilde{y}^{k}-y^{k}\right)\right\}\geq 0,\qquad\forall y\in Y. (36)

By the definition of \tilde{\lambda}^{k} in (19), we have:

\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right)-B\left(\tilde{y}^{k}-y^{k}\right)+\frac{1}{\beta}\left(\tilde{\lambda}^{k}-\lambda^{k}\right)=0.

The above equation can be written as:

λ~kRm,(λλ~k)T{Ax~k+By~kbB(y~kyk)+1β(λ~kλk)}0,λRm.\begin{split}\tilde{\lambda}^{k}\in R^{m},\qquad\left(\lambda-\tilde{\lambda}^{k}\right)^{T}\left\{A\tilde{x}^{k}+B\tilde{y}^{k}-b-B\left(\tilde{y}^{k}-y^{k}\right)\right.\\ \left.+\frac{1}{\beta}\left(\tilde{\lambda}^{k}-\lambda^{k}\right)\right\}\geq 0,\quad\forall\lambda\in R^{m}.\end{split} (37)

Combining (35), (36), and (37) with the notation Q_{k+1} proves the lemma.
Lemma 3.2. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1 and \left\{\tilde{w}^{k}\right\} be defined in (19). Then:

vkvk+1=M(vkv~k)v^{k}-v^{k+1}=M\left(v^{k}-\tilde{v}^{k}\right) (38)

where MM is defined in (17).
Proof: This follows from (31) and the definitions of \tilde{\lambda}^{k} and \hat{\lambda}^{k}:
\lambda^{k+1}=\lambda^{k}-\sigma\left(\lambda^{k}-\hat{\lambda}^{k}\right)
=\lambda^{k}-\sigma\beta\left(A\tilde{x}^{k}+B\hat{y}^{k}-b\right)
=\lambda^{k}-\sigma\left[\beta\left(A\tilde{x}^{k}+By^{k}-b\right)-\beta B\left(y^{k}-\hat{y}^{k}\right)\right]
=\lambda^{k}-\sigma\left(\lambda^{k}-\tilde{\lambda}^{k}\right)+\sigma\beta B\left(y^{k}-\tilde{y}^{k}\right).
Combining this with (31), we obtain:

\begin{pmatrix}y^{k+1}\\ \lambda^{k+1}\end{pmatrix}=\begin{pmatrix}y^{k}\\ \lambda^{k}\end{pmatrix}-\begin{pmatrix}\sigma I_{n_{2}}&O\\ -\sigma\beta B&\sigma I_{m}\end{pmatrix}\begin{pmatrix}y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\end{pmatrix}.

Therefore, equation (38) holds.

It is clear from (38) and (21):

(vv~k)TQk+1(vkv~k)\displaystyle\left(v-\tilde{v}^{k}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right) =(vv~k)THk+1M(vkv~k)\displaystyle=\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}M\left(v^{k}-\tilde{v}^{k}\right)
=(vv~k)THk+1(vkvk+1).\displaystyle=\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right).

Lemma 3.3. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1 and \left\{\tilde{w}^{k}\right\} be defined in (19). Then, we have:

(vv~k)THk+1(vkvk+1)=12(vvk+1Hk+12vvkHk+12)+12vkv~kGk+12,vΩ.\begin{split}\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right)&=\frac{1}{2}\left(\left\|v-v^{k+1}\right\|_{H_{k+1}}^{2}-\left\|v-v^{k}\right\|_{H_{k+1}}^{2}\right)\\ &+\frac{1}{2}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2},\qquad\forall v\in\Omega.\end{split} (39)

where the matrix G_{k+1}=Q_{k+1}^{T}+Q_{k+1}-M^{T}H_{k+1}M.
Proof: Using the identity:

(ab)THk+1(cd)=12(adHk+12acHk+12)+12(cbHk+12dbHk+12).\begin{split}\left(a-b\right)^{T}H_{k+1}\left(c-d\right)&=\frac{1}{2}\left(\left\|a-d\right\|_{H_{k+1}}^{2}-\left\|a-c\right\|_{H_{k+1}}^{2}\right)\\ &+\frac{1}{2}\left(\left\|c-b\right\|_{H_{k+1}}^{2}-\left\|d-b\right\|_{H_{k+1}}^{2}\right).\end{split} (40)

Setting a=v, b=\tilde{v}^{k}, c=v^{k}, d=v^{k+1} in (40), we obtain:

(vv~k)THk+1(vkvk+1)=12(vvk+1Hk+12vvkHk+12)+12(vkv~kHk+12vk+1v~kHk+12).\begin{split}\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right)&=\frac{1}{2}\left(\left\|v-v^{k+1}\right\|_{H_{k+1}}^{2}-\left\|v-v^{k}\right\|_{H_{k+1}}^{2}\right)\\ &+\frac{1}{2}\left(\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|v^{k+1}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}\right).\end{split} (41)

For the last term in (41), we have:

\begin{split}&\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|v^{k+1}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}\\ &=\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|\left(v^{k}-\tilde{v}^{k}\right)-\left(v^{k}-v^{k+1}\right)\right\|_{H_{k+1}}^{2}\\ &\stackrel{{\scriptstyle\left(38\right)}}{{=}}\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|\left(v^{k}-\tilde{v}^{k}\right)-M\left(v^{k}-\tilde{v}^{k}\right)\right\|_{H_{k+1}}^{2}\\ &=2\left(v^{k}-\tilde{v}^{k}\right)^{T}H_{k+1}M\left(v^{k}-\tilde{v}^{k}\right)-\left(v^{k}-\tilde{v}^{k}\right)^{T}M^{T}H_{k+1}M\left(v^{k}-\tilde{v}^{k}\right)\\ &=\left(v^{k}-\tilde{v}^{k}\right)^{T}\left(Q_{k+1}^{T}+Q_{k+1}-M^{T}H_{k+1}M\right)\left(v^{k}-\tilde{v}^{k}\right).\end{split} (42)

Combining (41) and (42), (39) is proved.

Now, let us examine the matrix G_{k+1}. Utilizing (21), we derive:

Gk+1=Qk+1T+Qk+1MTHk+1M=Qk+1T+Qk+1MTQk+1=(2τkrβIn2BTB2βIm)(σIn2σβBTOσIm)(τkrβIn2OB1βIm)=(2τkrβIn2BTB2βIm)(στkrβIn2+σβBTBσBTσBσβIm)=((2σ)τkrβIn2σβBTB(σ1)BT(σ1)B(2σ)βIm).\begin{split}G_{k+1}&=Q_{k+1}^{T}+Q_{k+1}-M^{T}H_{k+1}M\\ &=Q_{k+1}^{T}+Q_{k+1}-M^{T}Q_{k+1}\\ &=\begin{pmatrix}2\tau_{k}r\beta I_{n_{2}}&-B^{T}\\ -B&\frac{2}{\beta}I_{m}\end{pmatrix}-\begin{pmatrix}\sigma I_{n_{2}}&-\sigma\beta B^{T}\\ O&\sigma I_{m}\end{pmatrix}\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ -B&\frac{1}{\beta}I_{m}\end{pmatrix}\\ &=\begin{pmatrix}2\tau_{k}r\beta I_{n_{2}}&-B^{T}\\ -B&\frac{2}{\beta}I_{m}\end{pmatrix}-\begin{pmatrix}\sigma\tau_{k}r\beta I_{n_{2}}+\sigma\beta B^{T}B&-\sigma B^{T}\\ -\sigma B&\frac{\sigma}{\beta}I_{m}\end{pmatrix}\\ &=\begin{pmatrix}\left(2-\sigma\right)\tau_{k}r\beta I_{n_{2}}-\sigma\beta B^{T}B&\left(\sigma-1\right)B^{T}\\ \left(\sigma-1\right)B&\frac{\left(2-\sigma\right)}{\beta}I_{m}\end{pmatrix}.\end{split} (43)

To simplify the proof, we record some properties of the parameter sequence.
From the iterative format of Algorithm 1, \frac{1}{1+\eta_{k}}\tau_{k}\leq\tau_{k+1}\leq\left(1+\xi_{k}\right)\tau_{k}; in particular, the sequence generated by Algorithm 1 satisfies \tau_{k}\subset\left[\tau_{min},\tau_{max}\right].
Suppose the (k+1)-th step ultimately accepts the parameter in Step 4 as \tau_{k+1}=\rho\gamma^{m_{k}}\tau_{k}, where m_{k} is an integer, and let \xi_{k}:=\rho\gamma^{m_{k}}-1, that is, 1+\xi_{k}=\rho\gamma^{m_{k}}.
Then:

\tau_{k+1}\geq\frac{1+\xi_{k}}{1+\eta_{k}}\tau_{k}\geq\cdots\geq\frac{\prod_{i=1}^{k}\left(1+\xi_{i}\right)}{\prod_{i=1}^{k}\left(1+\eta_{i}\right)}\tau_{1}\geq\frac{\prod_{i=1}^{k}\left(1+\xi_{i}\right)}{\prod_{i=1}^{\infty}\left(1+\eta_{i}\right)}\tau_{0}.

Since \sum_{k=1}^{\infty}\eta_{k}<+\infty, we have \prod_{i=1}^{\infty}\left(1+\eta_{i}\right)<+\infty.
Since \tau_{k+1}\leq\tau_{max}, letting k\to+\infty gives \prod_{i=1}^{\infty}\left(1+\xi_{i}\right)<+\infty, that is, \sum_{i=1}^{\infty}\xi_{i}<+\infty.
Theorem 3.1. Let the sequence {wk}\left\{w^{k}\right\} be the iterative sequence generated by Algorithm 1 and {w~k}\left\{\tilde{w}^{k}\right\} be defined in (19). Then, we have:

vk+1vHk+12(1+ξk)vkvHk21σ2{βykyk+1Tk+12+(2σ)εβλkλk+12}.\begin{split}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}&\leq\left(1+\xi_{k}\right)\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-\varepsilon}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}.\end{split} (44)

where ε(0,2σ)\varepsilon\in\left(0,2-\sigma\right) and Tk+10.T_{k+1}\succ 0.
Proof: In the following, we will further investigate the vkv~kGk+12\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2} term and show how it can be bounded.
Due to Gk+1=((2σ)τkrβIn2σβBTB(σ1)BT(σ1)B(2σ)βIm)G_{k+1}=\begin{pmatrix}\left(2-\sigma\right)\tau_{k}r\beta I_{n_{2}}-\sigma\beta B^{T}B&\left(\sigma-1\right)B^{T}\\ \left(\sigma-1\right)B&\frac{\left(2-\sigma\right)}{\beta}I_{m}\end{pmatrix} and v=(yλ),v=\begin{pmatrix}y\\ \lambda\end{pmatrix}, We have:

vkv~kGk+12=(2σ)τkrβyky~k2σβB(yky~k)2+2σβλkλ~k2+2(σ1)(λkλ~k)TB(yky~k)=(2σ)τkrβyky~k2+(2σ)βAx~k+By~kb2+2β(Ax~k+By~kb)TB(yky~k)\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&=\left(2-\sigma\right)\tau_{k}r\beta\left\|y^{k}-\tilde{y}^{k}\right\|^{2}-\sigma\beta\left\|B\left(y^{k}-\tilde{y}^{k}\right)\right\|^{2}\\ &+\frac{2-\sigma}{\beta}\left\|\lambda^{k}-\tilde{\lambda}^{k}\right\|^{2}+2\left(\sigma-1\right)\left(\lambda^{k}-\tilde{\lambda}^{k}\right)^{T}B\left(y^{k}-\tilde{y}^{k}\right)\\ &=\left(2-\sigma\right)\tau_{k}r\beta\left\|y^{k}-\tilde{y}^{k}\right\|^{2}+\left(2-\sigma\right)\beta\left\|A\tilde{x}^{k}+B\tilde{y}^{k}-b\right\|^{2}\\ &+2\beta\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right)^{T}B\left(y^{k}-\tilde{y}^{k}\right)\end{split} (45)

Also because λ^k=λkβ(Ax~k+By~kb)\hat{\lambda}^{k}=\lambda^{k}-\beta\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right) and yky~k=1σ(ykyk+1).y^{k}-\tilde{y}^{k}=\frac{1}{\sigma}\left(y^{k}-y^{k+1}\right).
Therefore, we have:

vkv~kGk+12=(2σ)σ2τkrβykyk+12+(2σ)βλkλ^k2+2σ(λkλ^k)TB(ykyk+1).\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&=\frac{\left(2-\sigma\right)}{\sigma^{2}}\tau_{k}r\beta\left\|y^{k}-y^{k+1}\right\|^{2}+\frac{\left(2-\sigma\right)}{\beta}\left\|\lambda^{k}-\hat{\lambda}^{k}\right\|^{2}\\ &+\frac{2}{\sigma}\left(\lambda^{k}-\hat{\lambda}^{k}\right)^{T}B\left(y^{k}-y^{k+1}\right).\end{split} (46)

It is clear from (31):

λkλ^k=1σ(λkλk+1).\lambda^{k}-\hat{\lambda}^{k}=\frac{1}{\sigma}\left(\lambda^{k}-\lambda^{k+1}\right).

Substituting the above equation into (46) gives:

vkv~kGk+12=(2σ)σ2τkrβykyk+12+(2σ)σ2βλkλk+12+2σ2(λkλk+1)TB(ykyk+1).\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&=\frac{\left(2-\sigma\right)}{\sigma^{2}}\tau_{k}r\beta\left\|y^{k}-y^{k+1}\right\|^{2}+\frac{\left(2-\sigma\right)}{\sigma^{2}\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\\ &+\frac{2}{\sigma^{2}}\left(\lambda^{k}-\lambda^{k+1}\right)^{T}B\left(y^{k}-y^{k+1}\right).\end{split} (47)

In the following, we estimate \left(\lambda^{k}-\lambda^{k+1}\right)^{T}B\left(y^{k}-y^{k+1}\right).
From the Cauchy-Schwarz inequality, for any \delta>0 we have:

(λkλk+1)T(BykByk+1)δβλkλk+1214δβB(ykyk+1)2.\left(\lambda^{k}-\lambda^{k+1}\right)^{T}\left(By^{k}-By^{k+1}\right)\geq-\frac{\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}-\frac{1}{4\delta}\beta\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2}. (48)

Substituting (48) into (47) shows that:

vkv~kGk+121σ2[(2σ)τkrβykyk+1212δβB(ykyk+1)2+(2σ)2δβλkλk+12].\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&\geq\frac{1}{\sigma^{2}}\left[\left(2-\sigma\right)\tau_{k}r\beta\left\|y^{k}-y^{k+1}\right\|^{2}-\frac{1}{2\delta}\beta\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right].\end{split} (49)

Owing to:

(w1w2)T(F(w1)F(w2))0,w1,w2Ω.\left(w_{1}-w_{2}\right)^{T}\left(F\left(w_{1}\right)-F\left(w_{2}\right)\right)\geq 0,\quad\forall w_{1},w_{2}\in\Omega. (50)

It follows from (50) and (34):

θ(u)θ(u~k)+(ww~k)TF(w)(vv~k)TQk+1(vkv~k),wΩ.\theta\left(u\right)-\theta\left(\tilde{u}^{k}\right)+\left(w-\tilde{w}^{k}\right)^{T}F\left(w\right)\geq\left(v-\tilde{v}^{k}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right),\quad\forall w\in\Omega.

Setting w=w^{\ast} in the above inequality and combining with (15) shows that:

(v~kv)TQk+1(vkv~k)0,wΩ.\left(\tilde{v}^{k}-v^{\ast}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right)\geq 0,\quad\forall w\in\Omega. (51)

It is clear from (21) and (38):

(v~kv)TQk+1(vkv~k)=(v~kv)THk+1(vkvk+1).\left(\tilde{v}^{k}-v^{\ast}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right)=\left(\tilde{v}^{k}-v^{\ast}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right). (52)

It is clear from (39):

(v~kv)THk+1(vkvk+1)=12(vkvHk+12vk+1vHk+12)12vkv~kGk+12.\begin{split}\left(\tilde{v}^{k}-v^{\ast}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right)&=\frac{1}{2}\left(\left\|v^{k}-v^{\ast}\right\|_{H_{k+1}}^{2}-\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}\right)\\ &-\frac{1}{2}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}.\end{split} (53)

It follows from (51), (52) and (53):

vk+1vHk+12vkvHk+12vkv~kGk+12.\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}\leq\left\|v^{k}-v^{\ast}\right\|_{H_{k+1}}^{2}-\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}. (54)

Substituting (49) into (54) gives:

vk+1vHk+12vkvHk+121σ2{βykyk+1Tk+12+(2σ)2δβλkλk+12}vkvHk2+(τkτk1)rβσyky21σ2{βykyk+1Tk+12+(2σ)2δβλkλk+12}(1+ξk)vkvHk21σ2{βykyk+1Tk+12+(2σ)2δβλkλk+12}\begin{split}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}&\leq\left\|v^{k}-v^{\ast}\right\|_{H_{k+1}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}\\ &\leq\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}+\left(\tau_{k}-\tau_{k-1}\right)\frac{r\beta}{\sigma}\left\|y^{k}-y^{\ast}\right\|^{2}\\ &-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}\\ &\leq\left(1+\xi_{k}\right)\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}\end{split} (55)

where T_{k+1}=\left(2-\sigma\right)\tau_{k}rI_{n_{2}}-\frac{1}{2\delta}B^{T}B.
For \lim_{k\to\infty}\left(v^{k}-v^{k+1}\right)=0 to hold, it is sufficient that \delta=\frac{\varepsilon}{2} and T_{k+1}\succ 0.
With this choice of \delta, positivity of T_{k+1} along the direction y^{k}-y^{k+1} amounts to \left(2-\sigma\right)\tau_{k}r\left\|y^{k}-y^{k+1}\right\|^{2}>\frac{1}{\varepsilon}\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2}, which is exactly Condition 1 in Step 2.
So (55) can be written as:

\begin{split}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}&\leq\left(1+\xi_{k}\right)\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-\varepsilon}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}.\end{split}

Therefore, Theorem 3.1 is proved.
Theorem 3.2. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1. For any point w^{\ast}\in\Omega^{\ast}, we have:
(a) \lim_{k\to\infty}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2} exists and \lim_{k\to\infty}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}<+\infty.
(b) \lim_{k\to\infty}\left\|y^{k+1}-y^{k}\right\|=0 and \lim_{k\to\infty}\left\|\lambda^{k+1}-\lambda^{k}\right\|=0.
Proof: In Lemma 2.2, let

a^{k}=\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2},\qquad b^{k}=\xi_{k},\qquad c^{k}=0,
d^{k}=\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}+\frac{\left(2-\sigma\right)-\varepsilon}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}.

By Lemma 2.2, (a) and (b) hold.
Theorem 3.3. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1. Then \left\{w^{k}\right\} converges to a point w^{\infty}\in\Omega^{\ast}.
Proof: It follows from Theorem 3.2 that \lim_{k\to\infty}\left\|v^{k+1}-v^{k}\right\|=0.
Combining (38) with the non-singularity of M gives \lim_{k\to\infty}\left\|v^{k}-\tilde{v}^{k}\right\|=0.
Since the sequence \left\{\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}\right\} is bounded and \tau_{k}\geq\tau_{min}>0, for any fixed v^{\ast}\in\Omega^{\ast} the quantity \left\|v^{k+1}-v^{\ast}\right\| is bounded.
That is, the sequence \left\{v^{k}\right\} is bounded.
By the triangle inequality,

\left\|\tilde{v}^{k}-v^{\ast}\right\|\leq\left\|v^{k}-\tilde{v}^{k}\right\|+\left\|v^{k}-v^{\ast}\right\|.

Hence \left\|\tilde{v}^{k}-v^{\ast}\right\| is bounded, so the sequence \left\{\tilde{v}^{k}\right\} is also bounded; it therefore has a cluster point v^{\infty}, and there exists a subsequence \left\{\tilde{v}^{k_{j}}\right\} of \left\{\tilde{v}^{k}\right\} converging to v^{\infty}.
From \tilde{\lambda}^{k}=\lambda^{k}-\beta\left(A\tilde{x}^{k}+By^{k}-b\right) in (19), we have:

A\tilde{x}^{k_{j}}=\frac{1}{\beta}\left(\lambda^{k_{j}}-\tilde{\lambda}^{k_{j}}\right)-\left(By^{k_{j}}-b\right).

Since the matrix A has full column rank, it follows that the sequence \left\{\tilde{x}^{k_{j}}\right\} also converges.
Set \lim_{k_{j}\to\infty}\tilde{x}^{k_{j}}=x^{\infty}; then the subsequence \left\{\tilde{w}^{k_{j}}\right\} converges to w^{\infty}.
Setting k=k_{j} in (34), we have, for \tilde{w}^{k_{j}}\in\Omega,

\theta\left(u\right)-\theta\left(\tilde{u}^{k_{j}}\right)+\left(w-\tilde{w}^{k_{j}}\right)^{T}F\left(\tilde{w}^{k_{j}}\right)\geq\left(v-\tilde{v}^{k_{j}}\right)^{T}Q_{k_{j}+1}\left(v^{k_{j}}-\tilde{v}^{k_{j}}\right),\quad\forall w\in\Omega.

Letting k_{j}\to\infty in the above inequality, it follows that:

w^{\infty}\in\Omega,\quad\theta\left(u\right)-\theta\left(u^{\infty}\right)+\left(w-w^{\infty}\right)^{T}F\left(w^{\infty}\right)\geq 0,\quad\forall w\in\Omega. (56)

(56) shows that w^{\infty}\in\Omega^{\ast}, i.e., w^{\infty} is a solution of the variational inequality VI\left(\Omega,F,\theta\right); therefore the sequence \left\{w^{k}\right\} converges to w^{\infty}.

4 Numerical experiments

The numerical performance of Algorithm 1 is demonstrated by solving the Lasso problem and comparing Algorithm 1 with the optimal linearized ADMM (OLADMM) [22]. All experiments were conducted in Matlab R2016a on a laptop with 4 GB of RAM.

The Lasso problem [12] in statistics is as follows:

minx,y12xb22+ιy1s.t.x=Ay\min_{x,y}\frac{1}{2}\left\|x-b\right\|_{2}^{2}+\iota\left\|y\right\|_{1}\qquad s.t.\quad x=Ay (57)

where A\in R^{m\times n},\ b\in R^{m},\ x\in R^{n},\ y\in R^{n}.
Applying Algorithm 1 to (57) yields the following updates.

The xx-subproblem is an unconstrained convex quadratic minimization problem, and its unique solution can be expressed as follows:

xk+1=11+β(b+λk+βAyk)x^{k+1}=\frac{1}{1+\beta}\left(b+\lambda^{k}+\beta Ay^{k}\right)

The y-subproblem is an \ell_{1}+\ell_{2} minimization problem that can be solved using the soft-thresholding (shrink) operator:

\begin{split}&\hat{y}^{k}=shrink\left\{y^{k}-\frac{1}{\tau_{k}r\beta}A^{T}\left[\lambda^{k}-\beta\left(x^{k+1}-Ay^{k}\right)\right],\frac{\iota}{\tau_{k}r\beta}\right\}.\\ &y^{k+1}=y^{k}-\sigma\left(y^{k}-\hat{y}^{k}\right).\end{split}

The solution to the λ\lambda-subproblem is:

λ^k=λkβ(xk+1Ay^k).λk+1=λkσ(λkλ^k).\begin{split}&\hat{\lambda}^{k}=\lambda^{k}-\beta\left(x^{k+1}-A\hat{y}^{k}\right).\\ &\lambda^{k+1}=\lambda^{k}-\sigma\left(\lambda^{k}-\hat{\lambda}^{k}\right).\end{split}
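Putting the three closed-form updates together, one iteration of Algorithm 1 for (57) can be sketched as follows. This is our sketch, not the authors' code: shrink is the componentwise soft-thresholding operator, and the y-step uses the \tau_{k}r\beta scaling written above.

```python
import numpy as np

def shrink(v, kappa):
    """Componentwise soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_iteration(A, b, x, y, lam, beta, tau, r, sigma, iota):
    """One iteration of Algorithm 1 specialized to the Lasso model (57)."""
    x_new = (b + lam + beta * A @ y) / (1.0 + beta)          # x-subproblem (closed form)
    step = tau * r * beta                                    # proximal weight of the y-step
    y_hat = shrink(y - A.T @ (lam - beta * (x_new - A @ y)) / step, iota / step)
    lam_hat = lam - beta * (x_new - A @ y_hat)
    y_new = y - sigma * (y - y_hat)                          # relaxation step
    lam_new = lam - sigma * (lam - lam_hat)
    return x_new, y_new, lam_new
```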

In this paper, numerical experiments are conducted to compare the performance of Algorithm 1 with OLADMM.
The parameters of each algorithm are set as follows:
Algorithm 1: r=\left\|A^{T}A\right\|, \iota=0.1\left\|A^{T}b\right\|_{\infty}, \tau_{0}=0.75, \tau_{min}=0.01, \gamma=1.2, \sigma=0.9, \beta=1, \rho=3, p^{0}=100, d^{0}=100, \frac{1}{\varepsilon}=\frac{1}{2-\sigma}+0.1, \eta_{k}=0.25\min\left\{1,\frac{1}{\left(\max\left\{1,k-l\right\}\right)^{2}}\right\}, s_{k}=2\min\left\{1,\frac{1}{\left(\max\left\{1,k-l\right\}\right)^{2}}\right\}, where l is the dimension of \lambda.
OLADMM: r=\left\|A^{T}A\right\|,\ \tau=0.75,\ \beta=1,\ \iota=0.1\left\|A^{T}b\right\|_{\infty}.
The stopping criterion is:

pk+1=xk+1Ayk+1<εpri,anddk+1=βA(yk+1yk)<εdual.\left\|p^{k+1}\right\|=\left\|x^{k+1}-Ay^{k+1}\right\|<\varepsilon^{pri},\quad and\quad\left\|d^{k+1}\right\|=\left\|\beta A\left(y^{k+1}-y^{k}\right)\right\|<\varepsilon^{dual}.

where

\varepsilon^{pri}=\sqrt{n}\varepsilon^{abs}+\varepsilon^{rel}\max\left\{\left\|x^{k+1}\right\|,\left\|Ay^{k+1}\right\|\right\}

and

\varepsilon^{dual}=\sqrt{n}\varepsilon^{abs}+\varepsilon^{rel}\left\|y^{k+1}\right\|.

with \varepsilon^{abs} and \varepsilon^{rel} set to 10^{-4} and 10^{-2}, respectively.
For a matrix A of given dimension m\times n, we generate the data randomly as follows:
p=\frac{1}{n}, x^{0}=sprandn\left(n,1,p\right), A=randn\left(m,n\right), b=A\ast x^{0}+sqrt\left(0.001\right)\ast randn\left(m,1\right). The initial point is \left(y^{0},\lambda^{0}\right)=\left(0,0\right).
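For reproducibility, the same test data can be generated in Python as follows (our translation of the MATLAB commands above; the sparsity pattern is drawn with a Bernoulli mask, which approximates sprandn).

```python
import numpy as np

def generate_lasso_data(m, n, seed=0):
    """Random Lasso test data, mirroring the MATLAB commands above."""
    rng = np.random.default_rng(seed)
    p = 1.0 / n
    mask = rng.random(n) < p                      # sparsity pattern, approx. like sprandn(n,1,p)
    x0 = rng.standard_normal(n) * mask            # sparse Gaussian ground truth
    A = rng.standard_normal((m, n))               # randn(m,n)
    b = A @ x0 + np.sqrt(0.001) * rng.standard_normal(m)
    return A, b, x0
```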

In order to observe the effect of different values of \sigma on the experiments, we ran the algorithm with

\sigma\in\left\{0.1,0.2,0.3,\cdots,1.7,1.8,1.9\right\}.

The results show that the performance is satisfactory when \sigma\in\left[0.6,1.4\right]. In this paper, we take \sigma=0.9.

The experimental results of the two algorithms are shown below.
Table 1. Comparison between Algorithm 1 and OLADMM for (57).

                        OLADMM                                  Algorithm 1
 m      n        Iter.  CPU(s)  ||p^k||  ||q^k||        Iter.  CPU(s)  ||p^k||  ||q^k||
 1000   1500     16     3.23    0.0700   0.0701         11     2.38    0.0414   0.0446
 1500   1500     13     2.81    0.0650   0.0660         10     2.53    0.0350   0.0500
 1500   3000     17     7.03    0.0658   0.0659         10     6.18    0.0448   0.0599
 2000   3000     14     7.16    0.0759   0.0764         10     6.57    0.0243   0.0399
 3000   3000     13     7.80    0.0572   0.0582         9      7.40    0.0377   0.0695
 3000   5000     15     24.96   0.0685   0.0688         10     23.44   0.0312   0.0485
 4000   5000     13     25.01   0.0664   0.0674         10     24.69   0.0365   0.0549
 5000   5000     12     26.34   0.0614   0.0636         9      25.73   0.0435   0.0739

Table 1 presents the number of iterations and the total computation time needed by OLADMM and Algorithm 1. It is evident that Algorithm 1 consistently outperforms OLADMM: it requires fewer iterations and less time to meet the termination condition. Consequently, Algorithm 1 is more efficient than OLADMM, which also provides support for our convergence analysis.

5 Conclusion

In this study, we introduce an adaptive linearized alternating direction multiplier method with a relaxation step that incorporates both a relaxation step and an adaptive technique. Our approach adopts a unified framework of variational inequalities and establishes the global convergence of this adaptive linearized alternating direction multiplier method with a relaxation step through theoretical analysis. Through the resolution of the Lasso problem, we demonstrate the algorithm’s outstanding numerical performance. These findings will propel advancements in optimization algorithms and offer more efficient tools and methodologies for addressing practical challenges.

Conflict of interest: The authors declare that they have no conflict of interest.

References

  • \bibcommenthead
  • Tao et al. [2009] Tao, M., Yang, J., He, B.: Alternating direction algorithms for total variation deconvolution in image reconstruction. TR0918, Department of Mathematics, Nanjing University (2009)
  • Ng et al. [2010] Ng, M.K., Weiss, P., Yuan, X.: Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM journal on Scientific Computing 32(5), 2710–2736 (2010)
  • Boyd et al. [2011] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3(1), 1–122 (2011)
  • Combettes and Pesquet [2007] Combettes, P.L., Pesquet, J.-C.: A douglas–rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing 1(4), 564–574 (2007)
  • Gabay and Mercier [1976] Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications 2(1), 17–40 (1976)
  • Glowinski and Marrocco [1974] Glowinski, R., Marrocco, A.: Analyse numerique du champ magnetique d’un alternateur par elements finis et sur-relaxation ponctuelle non lineaire. Computer Methods in Applied Mechanics & Engineering 3(1), 55–85 (1974)
  • He [2018] He, B.: My 20 years research on alternating directions method of multipliers. Oper. Res. Trans 22(1), 1–31 (2018)
  • Eckstein and Yao [2017] Eckstein, J., Yao, W.: Approximate admm algorithms derived from lagrangian splitting. Computational Optimization and Applications 68, 363–405 (2017)
  • He et al. [2002] He, B., Liao, L.-Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Mathematical Programming 92, 103–118 (2002)
  • Ng et al. [2011] Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM Journal on Scientific Computing 33(4), 1643–1668 (2011)
  • Chan et al. [2012] Chan, R.H., Tao, M., Yuan, X.: Linearized alternating direction method of multipliers for constrained linear least-squares problem. East Asian Journal on Applied Mathematics 2(4), 326–341 (2012)
  • Tibshirani [1996] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288 (1996)
  • Recht et al. [2010] Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review 52(3), 471–501 (2010)
  • Tao and Yuan [2011] Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM Journal on Optimization 21(1), 57–81 (2011)
  • Fang et al. [2015] Fang, E.X., He, B., Liu, H., Yuan, X.: Generalized alternating direction method of multipliers: new theoretical insights and applications. Mathematical programming computation 7(2), 149–187 (2015)
  • Woo and Yun [2013] Woo, H., Yun, S.: Proximal linearized alternating direction method for multiplicative denoising. SIAM Journal on Scientific Computing 35(2), 336–358 (2013)
  • He and Yuan [2013] He, B., Yuan, X.: Linearized alternating direction method of multipliers with Gaussian back substitution for separable convex programming. Numerical Algebra, Control and Optimization 3(2), 247–260 (2013)
  • Ouyang et al. [2015] Ouyang, Y., Chen, Y., Lan, G., Pasiliao Jr, E.: An accelerated linearized alternating direction method of multipliers. SIAM Journal on Imaging Sciences 8(1), 644–681 (2015)
  • Chan et al. [2012] Chan, R.H., Tao, M., Yuan, X.: Linearized alternating direction method of multipliers for constrained linear least-squares problem. East Asian Journal on Applied Mathematics 2(4), 326–341 (2012)
  • Wang and Yuan [2012] Wang, X., Yuan, X.: The linearized alternating direction method of multipliers for dantzig selector. SIAM Journal on Scientific Computing 34(5), 2792–2811 (2012)
  • He et al. [2016] He, B., Ma, F., Yuan, X.: Linearized alternating direction method of multipliers via positive-indefinite proximal regularization for convex programming. Available at https://optimization-online.org (2016)
  • He et al. [2020] He, B., Ma, F., Yuan, X.: Optimally linearizing the alternating direction method of multipliers for convex programming. Computational Optimization and Applications 75(2), 361–388 (2020)
  • He [2015] He, B.: Ppa-like contraction methods for convex optimization: a framework using variational inequality approach. Journal of the Operations Research Society of China 3, 391–420 (2015)
  • Tao and Yuan [2018] Tao, M., Yuan, X.: On the optimal linear convergence rate of a generalized proximal point algorithm. Journal of Scientific Computing 74, 826–850 (2018)
  • Gu and Yang [2019] Gu, G., Yang, J.: On the optimal linear convergence factor of the relaxed proximal point algorithm for monotone inclusion problems. arXiv preprint arXiv:1905.04537 (2019)
  • Robbins and Siegmund [1971] Robbins, H., Siegmund, D.: A convergence theorem for nonnegative almost supermartingales and some applications. Optimizing Methods in Statistics, 233–257 (1971)