
An adaptive linearized alternating direction multiplier method with a relaxation step for convex programming

Boran Wang ([email protected]), College of Science, Minzu University of China, Beijing 100081, China
Abstract

The alternating direction method of multipliers (ADMM) is a powerful technique for solving convex optimization problems. When a subproblem is difficult to solve exactly, it is useful to handle it by introducing a proximal term. When the proximal matrix is positive definite the algorithm converges, but the resulting iteration step size is small. Recent studies have revealed that the proximal matrix may be chosen indefinite. In this paper, we present an adaptive linearized alternating direction multiplier method with a relaxation step, combining the relaxation step with an adaptive technique. The novelty of the method is that it uses the information at the current iterate to select the proximal matrix dynamically, which enlarges the iteration step size and speeds up the convergence of the algorithm. We prove the global convergence of the algorithm theoretically and illustrate its effectiveness with numerical experiments.

keywords:
variational inequality, adaptive linearized ADMM, relaxation step, global convergence.

1 Introduction

This paper considers the following separable convex optimization problem with linear constraints:

\min\left\{f\left(x\right)+g\left(y\right)\,|\,Ax+By=b,\ x\in X,\ y\in Y\right\} (1)

where f:R^{n_{1}}\to R and g:R^{n_{2}}\to R are convex functions (not necessarily smooth), A\in R^{m\times n_{1}}, B\in R^{m\times n_{2}}, b\in R^{m}, and X\subseteq R^{n_{1}}, Y\subseteq R^{n_{2}} are closed convex sets. Problem (1) has been widely used in areas such as image processing [1, 2], statistical learning [3] and communication networks [4]. There exist numerous effective methods for addressing problem (1), including the primal-dual method, the augmented Lagrangian method, and the alternating direction method of multipliers (ADMM). Notably, ADMM has garnered significant attention in recent years due to its simplicity and efficiency, and many scholars have introduced effective variants of it tailored to practical challenges.

ADMM, first put forward by Gabay and Mercier [5] and Glowinski and Marrocco [6], is widely recognized as an effective way to solve (1); for the progress of research on the method, we refer to [7]. The method has a fast convergence speed and strong convergence properties, making it a crucial tool for separable convex optimization: it decomposes a large problem into two or more small-scale subproblems and then solves each subproblem iteratively, which enhances solution speed. Its direct, adaptable, and practical nature makes it particularly effective for convex optimization problems with a separable structure. The method can be described as follows:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (2)

where L_{\beta}\left(x,y,\lambda\right) is the augmented Lagrangian function of (1):

Lβ(x,y,λ)=f(x)+g(y)λT(Ax+Byb)+β2Ax+Byb22.L_{\beta}\left(x,y,\lambda\right)=f\left(x\right)+g\left(y\right)-\lambda^{T}\left(Ax+By-b\right)+\frac{\beta}{2}\left\|Ax+By-b\right\|_{2}^{2}.

where \lambda\in R^{m} represents the Lagrange multiplier, and \beta>0 represents the penalty parameter. For simplicity, the penalty parameter \beta is fixed in our discussion.
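For readers who prefer pseudocode, scheme (2) can be sketched in a few lines of Python. This is only an illustrative sketch (not code from this paper); the subproblem solvers x_solver and y_solver are assumed to be supplied by the user, since their form depends on f, g, A and B.

```python
import numpy as np

def admm(x_solver, y_solver, A, B, b, beta, x0, y0, lam0, max_iter=500, tol=1e-6):
    """Generic ADMM scheme (2).

    x_solver(y, lam) and y_solver(x, lam) are assumed to return the minimizers
    of the augmented Lagrangian L_beta over x and y, respectively.
    """
    x, y, lam = x0, y0, lam0
    for _ in range(max_iter):
        x = x_solver(y, lam)            # x^{k+1} = argmin_x L_beta(x, y^k, lambda^k)
        y = y_solver(x, lam)            # y^{k+1} = argmin_y L_beta(x^{k+1}, y, lambda^k)
        residual = A @ x + B @ y - b
        lam = lam - beta * residual     # lambda^{k+1} = lambda^k - beta (A x^{k+1} + B y^{k+1} - b)
        if np.linalg.norm(residual) < tol:
            break
    return x, y, lam
```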

A significant focus in the research literature on the alternating direction multiplier method is the exploration of efficient strategies for solving its subproblems. In certain scenarios, the functions f\left(x\right) and g\left(y\right), or the coefficient matrices A and B, may exhibit special properties and structures. Leveraging these characteristics, researchers can extend the framework presented in (2) to devise algorithms tailored to specific applications while upholding theoretical convergence guarantees. This principle underscores the critical role of effectively applying the method across various fields and applications. To delve deeper into this point, let us examine the y-subproblem outlined in (2). As previously noted, the function g(y), the matrix B, and the set Y are pivotal in addressing the y-subproblem. Typically, when the function, matrix, and set are in a generic form, the solution to (2) can be obtained using straightforward iterative techniques; we refer to [8, 9, 10]. However, in practical scenarios, the function, matrix, and set associated with the y-subproblem may possess distinct characteristics, necessitating more efficient solution methods. When the matrix B is not an identity matrix, the y-subproblem in (2) can be reformulated as follows:

yk+1=argminy{g(y)+β2By+(Axk+1b1βλk)2|yY}y^{k+1}=arg\min_{y}\left\{g\left(y\right)+\frac{\beta}{2}\left\|By+\left(Ax^{k+1}-b-\frac{1}{\beta}\lambda^{k}\right)\right\|^{2}|y\in Y\right\} (3)

Linearizing the quadratic term \left\|By+\left(Ax^{k+1}-b-\frac{1}{\beta}\lambda^{k}\right)\right\|^{2} in (3) yields:

yk+1=argminy{g(y)+r2y(yk+1rqk)2|yY}.y^{k+1}=arg\min_{y}\left\{g\left(y\right)+\frac{r}{2}\left\|y-\left(y^{k}+\frac{1}{r}q^{k}\right)\right\|^{2}|y\in Y\right\}. (4)

where

qk=BT(λkβ(Axk+1+Bykb)).q^{k}=B^{T}\left(\lambda^{k}-\beta\left(Ax^{k+1}+By^{k}-b\right)\right). (5)

and r>0 is a constant. Thus we obtain the linearized alternating direction multiplier method [11]:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+r2y(yk+1rqk)2|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{r}{2}\left\|y-\left(y^{k}+\frac{1}{r}q^{k}\right)\right\|^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (6)

The linearized alternating direction multiplier method is widely used in compressed sensing [12], image processing [13, 14], etc.
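As an illustration of the updates (4)-(6), the linearized y-step amounts to a single proximal step of g. The sketch below is ours, not code from the paper, and it assumes a callable prox_g(v, t) returning the minimizer of g(y) + ||y - v||^2 / (2t).

```python
import numpy as np

def linearized_y_update(prox_g, A, B, b, x_new, y_k, lam_k, beta, r):
    """Linearized y-update (4)-(5): q^k = B^T(lambda^k - beta(A x^{k+1} + B y^k - b)),
    then y^{k+1} = prox_{g, 1/r}(y^k + q^k / r)."""
    q = B.T @ (lam_k - beta * (A @ x_new + B @ y_k - b))
    return prox_g(y_k + q / r, 1.0 / r)
```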

A more general alternating direction multiplier method[15] can be written as follows:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+12yykD2|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{1}{2}\left\|y-y^{k}\right\|_{D}^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (7)

where D\in R^{n_{2}\times n_{2}} is positive definite. Thus, the linearized alternating direction multiplier method can be viewed as a special case of the general alternating direction multiplier method in which the positive definite proximal matrix is chosen as

D=rI_{n_{2}}-\beta B^{T}B\quad and\quad r>\beta\left\|B^{T}B\right\| (8)

Since in many practical applications only one subproblem needs to be linearized, in this paper we only consider linearizing the y-subproblem in (7).

In order to improve the applicability of the linearized alternating direction multiplier method, a number of improved variants [16, 17, 18, 19, 20] have been proposed.

The work [21] proposes an indefinite proximal alternating direction multiplier method with the following iteration format:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+12yykD02|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{1}{2}\left\|y-y^{k}\right\|_{D_{0}}^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (9)

where

D_{0}=\tau rI_{n_{2}}-\beta B^{T}B,\quad 0.8\leq\tau<1\quad and\quad r>\beta\left\|B^{T}B\right\|.

He et al. proposed an optimal linearized ADMM in the literature [22] with the iterative steps:

{xk+1=argminx{Lβ(x,yk,λk)|xX}yk+1=argminy{Lβ(xk+1,y,λk)+12yykD2|yY}λk+1=λkβ(Axk+1+Byk+1b)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{L_{\beta}\left(x,y^{k},\lambda^{k}\right)|x\in X\right\}\\ y^{k+1}&=arg\min_{y}\left\{L_{\beta}\left(x^{k+1},y,\lambda^{k}\right)+\frac{1}{2}\left\|y-y^{k}\right\|_{D}^{2}|y\in Y\right\}\\ \lambda^{k+1}&=\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)\end{aligned}\right. (10)

where

D=\tau rI-\beta B^{T}B,\quad 0.75<\tau<1\quad and\quad r>\beta\left\|B^{T}B\right\|

It is easy to see that the matrix D is not necessarily positive semidefinite. In that paper the authors prove that 0.75 is the optimal lower bound for \tau.

Another important issue for ADMM is to design an accelerated version of the original ADMM by slightly modifying it through a simple relaxation scheme. Note that ADMM is closely related to the proximal point algorithm (PPA) [23]. For the PPA, the convergence rate is improved if an additional over-relaxation step is applied to the basic variables; see Tao and Yuan [24] for a relaxed variant of the PPA and a proof of its linear convergence. Gu and Yang [25] further proved the optimal linear convergence rate of the relaxed PPA under regularity conditions. As Boyd et al. observed in [3], the execution of ADMM relies on the inputs \left(y^{k},\lambda^{k}\right) and does not require x^{k} at all. Thus, x plays the role of an intermediate variable in (2), while \left(y^{k},\lambda^{k}\right) are the basic variables. It is therefore natural to ask whether one can include a relaxation step for the basic variables \left(y^{k},\lambda^{k}\right) in the ADMM scheme (2) to obtain a faster ADMM-type method. This idea leads to:

{xk+1=argminx{f(x)xTATλk+β2Ax+Bykb2}y^k=argminy{g(y)yTBTλk+β2Axk+1+Byb2}λ^k=λkβ(Axk+1+By^kb)\left\{\begin{aligned} x^{k+1}&=arg\min_{x}\left\{f\left(x\right)-x^{T}A^{T}\lambda^{k}+\frac{\beta}{2}\left\|Ax+By^{k}-b\right\|^{2}\right\}\\ \hat{y}^{k}&=arg\min_{y}\left\{g\left(y\right)-y^{T}B^{T}\lambda^{k}+\frac{\beta}{2}\left\|Ax^{k+1}+By-b\right\|^{2}\right\}\\ \hat{\lambda}^{k}&=\lambda^{k}-\beta\left(Ax^{k+1}+B\hat{y}^{k}-b\right)\end{aligned}\right. (11)
(yk+1λk+1)=(ykλk)σ(yky^kλkλ^k)\begin{pmatrix}y^{k+1}\\ \lambda^{k+1}\end{pmatrix}=\begin{pmatrix}y^{k}\\ \lambda^{k}\end{pmatrix}-\sigma\begin{pmatrix}y^{k}-\hat{y}^{k}\\ \lambda^{k}-\hat{\lambda}^{k}\end{pmatrix} (12)

where the relaxation factor \sigma\in\left(0,2\right). Based on the above discussion, in this study we introduce an adaptive linearized alternating direction multiplier method with a relaxation step, which incorporates both a relaxation step and an adaptive technique. Our approach adopts a unified variational inequality framework and establishes the global convergence of the method through theoretical analysis. By solving the Lasso problem, we demonstrate the algorithm's good numerical performance. These findings advance optimization algorithms and offer more efficient tools and methodologies for addressing practical challenges.

The rest of the paper is organized as follows. Section 2 gives some preliminaries; Section 3 gives iterative steps and proof of convergence of the algorithm; Section 4 gives the numerical experiments; and Section 5 draws the conclusions.

2 Preliminaries

We present some preliminaries that will be used in convergence analysis.

In this article, the symbol \left\|\cdot\right\| denotes the Euclidean norm \left\|\cdot\right\|_{2}, and \left\|x\right\|_{D}^{2}:=x^{T}Dx denotes the matrix-weighted norm, where D\in R^{n\times n} is a symmetric positive definite matrix and x\in R^{n}. When D is not positive definite, we still use the same notation.

2.1 Characterization of variational inequalities

Define the Lagrangian function corresponding to problem (1) as:

L(x,y,λ)=f(x)+g(y)λT(Ax+Byb)L\left(x,y,\lambda\right)=f\left(x\right)+g\left(y\right)-\lambda^{T}\left(Ax+By-b\right) (13)

In (13), (x,y) and \lambda denote the primal and dual variables, respectively.

If (x,y,λ)X×Y×Rm\left(x^{\ast},y^{\ast},\lambda^{\ast}\right)\in X\times Y\times R^{m} satisfies the following inequality:

L(x,y,λ)L(x,y,λ)L(x,y,λ),(x,y,λ)X×Y×Rm=Ω.L\left(x^{\ast},y^{\ast},\lambda\right)\leq L\left(x^{\ast},y^{\ast},\lambda^{\ast}\right)\leq L\left(x,y,\lambda^{\ast}\right),\quad\forall\left(x,y,\lambda\right)\in X\times Y\times R^{m}=\Omega.

then (x,y,λ)\left(x^{\ast},y^{\ast},\lambda^{\ast}\right) is called a saddle point of L(x,y,λ)L\left(x,y,\lambda\right).
The above inequality is equivalent to the following variational inequality form:

\begin{cases}x^{\ast}\in X,\quad f\left(x\right)-f\left(x^{\ast}\right)+\left(x-x^{\ast}\right)^{T}\left(-A^{T}\lambda^{\ast}\right)\geq 0,\quad\forall x\in X.\\ y^{\ast}\in Y,\quad g\left(y\right)-g\left(y^{\ast}\right)+\left(y-y^{\ast}\right)^{T}\left(-B^{T}\lambda^{\ast}\right)\geq 0,\quad\forall y\in Y.\\ \lambda^{\ast}\in R^{m},\qquad\left(\lambda-\lambda^{\ast}\right)^{T}\left(Ax^{\ast}+By^{\ast}-b\right)\geq 0,\quad\forall\lambda\in R^{m}.\end{cases} (14)

The above variational inequality can be written in the following form:

VI(Ω,F,θ):wΩ,θ(u)θ(u)+(ww)TF(w)0,wΩ.VI\left(\Omega,F,\theta\right):\quad w^{\ast}\in\Omega,\quad\theta\left(u\right)-\theta\left(u^{\ast}\right)+\left(w-w^{\ast}\right)^{T}F\left(w^{\ast}\right)\geq 0,\quad\forall w\in\Omega. (15)

where

\theta\left(u\right)=f\left(x\right)+g\left(y\right),\quad u=\begin{pmatrix}x\\ y\end{pmatrix},\quad v=\begin{pmatrix}y\\ \lambda\end{pmatrix},\quad w=\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix},\quad F\left(w\right)=\begin{pmatrix}-A^{T}\lambda\\ -B^{T}\lambda\\ Ax+By-b\end{pmatrix} (16)

Owing to:

(w1w2)T(F(w1)F(w2))0,w1,w2Ω.\left(w_{1}-w_{2}\right)^{T}\left(F\left(w_{1}\right)-F\left(w_{2}\right)\right)\geq 0,\quad\forall w_{1},w_{2}\in\Omega.

So, F is a monotone operator.
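In fact, F is affine with a skew-symmetric linear part,

F\left(w\right)=\begin{pmatrix}O&O&-A^{T}\\ O&O&-B^{T}\\ A&B&O\end{pmatrix}\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}+\begin{pmatrix}0\\ 0\\ -b\end{pmatrix},

so \left(w_{1}-w_{2}\right)^{T}\left(F\left(w_{1}\right)-F\left(w_{2}\right)\right)=0 for all w_{1},w_{2}\in\Omega, and the monotonicity inequality above holds with equality.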
In this manner, problem (1) is reformulated as the variational inequality (15). We denote the solution set of VI\left(\Omega,F,\theta\right) by \Omega^{\ast}, which is assumed to be nonempty.

2.2 Some notation

For the convenience of the proof, first define some auxiliary variables and matrices. Let

Qk+1=(τkrβIn2OB1βIm)andM=(σIn2OσβBσIm).Q_{k+1}=\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ -B&\frac{1}{\beta}I_{m}\end{pmatrix}\qquad and\qquad M=\begin{pmatrix}\sigma I_{n_{2}}&O\\ -\sigma\beta B&\sigma I_{m}\end{pmatrix}. (17)
Hk+1=1σ(τkrβIn2OO1βIm).H_{k+1}=\frac{1}{\sigma}\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ O&\frac{1}{\beta}I_{m}\end{pmatrix}. (18)
w~k=(x~ky~kλ~k)=(xk+1y^kλkβ(Axk+1+Bykb)).\tilde{w}^{k}=\begin{pmatrix}\tilde{x}^{k}\\ \tilde{y}^{k}\\ \tilde{\lambda}^{k}\end{pmatrix}=\begin{pmatrix}x^{k+1}\\ \hat{y}^{k}\\ \lambda^{k}-\beta\left(Ax^{k+1}+By^{k}-b\right)\end{pmatrix}. (19)
v~k=(y~kλ~k)=(y^kλkβ(Axk+1+Bykb))\tilde{v}^{k}=\begin{pmatrix}\tilde{y}^{k}\\ \tilde{\lambda}^{k}\end{pmatrix}=\begin{pmatrix}\hat{y}^{k}\\ \lambda^{k}-\beta\left(Ax^{k+1}+By^{k}-b\right)\end{pmatrix} (20)

Lemma 2.1. The matrices Q_{k+1}, H_{k+1} and M defined in (17) and (18) satisfy:

Q_{k+1}=H_{k+1}M\quad and\quad H_{k+1}\succeq 0. (21)

Proof: Both relations in (21) can be verified by direct computation:
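H_{k+1}M=\frac{1}{\sigma}\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ O&\frac{1}{\beta}I_{m}\end{pmatrix}\begin{pmatrix}\sigma I_{n_{2}}&O\\ -\sigma\beta B&\sigma I_{m}\end{pmatrix}=\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ -B&\frac{1}{\beta}I_{m}\end{pmatrix}=Q_{k+1},

and H_{k+1}\succeq 0 follows from \tau_{k},r,\beta,\sigma>0.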
Lemma 2.2. (Robbins-Siegmund theorem [26]) Let a^{k},b^{k},c^{k} and d^{k} be non-negative sequences satisfying:

ak+1(1+bk)ak+ckdk,k=0,1,2a^{k+1}\leq\left(1+b^{k}\right)a^{k}+c^{k}-d^{k},\forall k=0,1,2\dots (22)

If \sum_{k=0}^{+\infty}b^{k}<+\infty and \sum_{k=0}^{+\infty}c^{k}<+\infty, then \lim_{k\to\infty}a^{k} exists and is finite, and \sum_{k=0}^{+\infty}d^{k}<+\infty.
Definition 2.1. Let f\left(\cdot\right) be a (possibly nonsmooth) convex function on a convex set \Omega. If the following inequality holds:

f\left(u\right)\geq f\left(v\right)+\phi^{T}\left(u-v\right),\quad\forall u\in\Omega.

then \phi is said to be a subgradient of f\left(\cdot\right) at v\in\Omega. The set of all subgradients is called the subdifferential of f\left(\cdot\right) at v\in\Omega, denoted \partial{f\left(v\right)}.
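For example, for f\left(v\right)=\left|v\right| on R, \partial f\left(v\right)=\left\{\operatorname{sign}\left(v\right)\right\} for v\neq 0 and \partial f\left(0\right)=\left[-1,1\right]; this is exactly the structure exploited by the soft-thresholding (shrink) operator used in Section 4.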

2.3 Stopping criterion

In the context of the two-block separable convex optimization problem (1), we investigate the optimality conditions for each subproblem during the iterative process of the alternating direction multiplier method.
According to the optimality condition theorem, if x^{\ast}, y^{\ast} are optimal solutions of the convex optimization problem (1) and \lambda^{\ast} is the corresponding Lagrange multiplier, then the following conditions are satisfied:

0f(x)ATλ0\in\partial{f\left(x^{\ast}\right)}-A^{T}\lambda^{\ast} (23)
0g(y)BTλ0\in\partial{g\left(y^{\ast}\right)}-B^{T}\lambda^{\ast} (24)
Ax+By=bAx^{\ast}+By^{\ast}=b (25)

In this context, condition (25) is referred to as the primal feasibility condition, while conditions (23) and (24) are the dual feasibility conditions.
According to the optimality condition for the y-subproblem in (2), we have:

0\displaystyle 0 g(yk+1)BTλk+βBT(Axk+1+Byk+1b)\displaystyle\in\partial{g\left(y^{k+1}\right)}-B^{T}\lambda^{k}+\beta B^{T}\left(Ax^{k+1}+By^{k+1}-b\right) (26)
=g(yk+1)BTλk+1.\displaystyle=\partial{g\left(y^{k+1}\right)}-B^{T}\lambda^{k+1}.

According to the optimality condition for the x-subproblem in (2), we have:

0f(xk+1)ATλk+βAT(Axk+1+Bykb).0\in\partial{f\left(x^{k+1}\right)}-A^{T}\lambda^{k}+\beta A^{T}\left(Ax^{k+1}+By^{k}-b\right). (27)

From the definition of \lambda^{k+1} in (2), the above relation is equivalent to:

0\displaystyle 0 f(xk+1)AT(λkβ(Axk+1+Byk+1b)βB(ykyk+1))\displaystyle\in\partial{f\left(x^{k+1}\right)}-A^{T}\left(\lambda^{k}-\beta\left(Ax^{k+1}+By^{k+1}-b\right)-\beta B\left(y^{k}-y^{k+1}\right)\right) (28)
=f(xk+1)ATλk+1+βATB(ykyk+1).\displaystyle=\partial{f\left(x^{k+1}\right)}-A^{T}\lambda^{k+1}+\beta A^{T}B\left(y^{k}-y^{k+1}\right).

(28) is equivalent to:

βATB(yk+1yk)f(xk+1)ATλk+1.\beta A^{T}B\left(y^{k+1}-y^{k}\right)\in\partial{f\left(x^{k+1}\right)}-A^{T}\lambda^{k+1}. (29)

Comparing (29) with condition (23), the additional term is \beta A^{T}B\left(y^{k+1}-y^{k}\right). Thus, to verify dual feasibility, it suffices to examine the residual \beta A^{T}B\left(y^{k+1}-y^{k}\right).

In summary, to ascertain the convergence of the alternating direction multiplier method, it suffices to verify that the two residuals p^{k+1} and d^{k+1} are sufficiently small, where

p^{k+1}=\left\|Ax^{k+1}+By^{k+1}-b\right\|,\qquad d^{k+1}=\left\|\beta A^{T}B\left(y^{k+1}-y^{k}\right)\right\|.
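In an implementation, the two residuals can be monitored with a few lines of code; the following sketch (ours, using NumPy) assumes A, B, b and the iterates are available as arrays.

```python
import numpy as np

def residuals(A, B, b, x_new, y_new, y_old, beta):
    """Primal residual p^{k+1} and dual residual d^{k+1} for the stopping test."""
    p = np.linalg.norm(A @ x_new + B @ y_new - b)          # primal feasibility
    d = np.linalg.norm(beta * A.T @ B @ (y_new - y_old))   # dual feasibility
    return p, d
```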

3 Algorithm and convergence analysis

3.1 New algorithms

Suppose that f:R^{n}\to R\cup\left\{+\infty\right\} and g:R^{n}\to R\cup\left\{+\infty\right\} are proper, closed, convex functions.
The main iterative steps of the algorithm are:  
Algorithm 1. An adaptive linearized alternating direction multiplier method with a relaxation step.
  Set up: \Omega=X\times Y\times R^{m},\ \beta>0,\ \tau_{k}>0,\ p^{k}\geq 0,\ d^{k}\geq 0,\ r=\left\|B^{T}B\right\|,\ \sigma\in\left(0,2\right),\ \varepsilon\in\left(0,2-\sigma\right),\ \gamma>1,\ \Upsilon>1,\ \rho>1,\ \epsilon^{pri}>0,\ \epsilon^{dual}>0. Choose parameter sequences \left\{\eta_{k}\right\} and \left\{s_{k}\right\} with \sum_{k=0}^{\infty}\eta_{k}<+\infty and \sum_{k=0}^{\infty}s_{k}<+\infty.
Step 0. Input: w^{0}=\left(x^{0},y^{0},\lambda^{0}\right)\in\Omega,\ \beta,\sigma,\tau_{0},\tau_{min},p^{0},d^{0},\gamma,\Upsilon,\varepsilon,r,\rho,\epsilon^{pri},\epsilon^{dual}. Set k=0.
Step 1. Compute w^{k+1}=\left(x^{k+1},y^{k+1},\lambda^{k+1}\right)\in X\times Y\times R^{m}:

{xk+1=argminx{f(x)(λk)TAx+β2Ax+Bykb2}y^k=argminy{g(y)(λk)TBy+β2Axk+1+Byb2+12yykDk2}λ^k=λkβ(Axk+1+By^kb)\begin{cases}x^{k+1}=arg\min_{x}\left\{f\left(x\right)-\left(\lambda^{k}\right)^{T}Ax+\frac{\beta}{2}\left\|Ax+By^{k}-b\right\|^{2}\right\}\\ \hat{y}^{k}=arg\min_{y}\left\{g\left(y\right)-\left(\lambda^{k}\right)^{T}By+\frac{\beta}{2}\left\|Ax^{k+1}+By-b\right\|^{2}\right.\\ \left.\qquad\qquad\qquad\qquad\qquad+\frac{1}{2}\left\|y-y^{k}\right\|_{D_{k}}^{2}\right\}\\ \hat{\lambda}^{k}=\lambda^{k}-\beta\left(Ax^{k+1}+B\hat{y}^{k}-b\right)\end{cases} (30)

where D_{k}=\tau_{k}rI_{n_{2}}-\beta B^{T}B.

{yk+1=ykσ(yky^k)λk+1=λkσ(λkλ^k)\begin{cases}y^{k+1}=y^{k}-\sigma\left(y^{k}-\hat{y}^{k}\right)\\ \lambda^{k+1}=\lambda^{k}-\sigma\left(\lambda^{k}-\hat{\lambda}^{k}\right)\end{cases} (31)

where \sigma\in\left(0,2\right).
Step 2. If either of the following conditions holds:

Condition 1. \Theta_{1}^{k}>\Theta_{2}^{k}. (32)
Condition 2. y^{k+1}=y^{k}.

where \Theta_{1}^{k}=\left(2-\sigma\right)\tau_{k}r\left\|y^{k}-y^{k+1}\right\|^{2} and \Theta_{2}^{k}=\frac{1}{\varepsilon}\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2},
then go to Step 3; otherwise set \tau_{k}=\gamma\ast\tau_{k}\left(\gamma>1\right) and return to Step 1.
Step 3. If \Theta_{1}^{k}-\Theta_{2}^{k}\geq\Upsilon\Theta_{2}^{k}, set t_{k+1}=\max\left\{\frac{\tau_{k}}{1+\eta_{k+1}},\tau_{min}\right\}; otherwise set t_{k+1}=\tau_{k}.
Step 4. If either of the following conditions holds:

Condition 1. p^{k+1}>\left(1+s_{k}\right)p^{k}.
Condition 2. d^{k+1}>\left(1+s_{k}\right)d^{k}.

where p^{k+1}=\left\|Ax^{k+1}+By^{k+1}-b\right\| and d^{k+1}=\left\|\beta A^{T}B\left(y^{k+1}-y^{k}\right)\right\|,
then set \tau_{k+1}=\rho\ast t_{k+1}\left(\rho>1\right); otherwise set \tau_{k+1}=t_{k+1}.
Step 5. If the stopping criterion

\left\|Ax^{k+1}+By^{k+1}-b\right\|\leq\epsilon^{pri}\quad and\quad\left\|\beta A^{T}B\left(y^{k+1}-y^{k}\right)\right\|\leq\epsilon^{dual} (33)

is satisfied, go to Step 6; otherwise set k=k+1 and return to Step 1.
Step 6. Output: x^{k+1},y^{k+1},f\left(x^{k+1}\right)+g\left(y^{k+1}\right).

 
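To illustrate how the steps above fit together, the following Python sketch gives one possible implementation of Algorithm 1. It is a sketch rather than the authors' code: x_solver and prox_g are assumed callables for the x-subproblem and the proximal mapping of g, the proximal weight \tau_{k}r\beta follows the scaling used in the analysis of Section 3 (it coincides with D_{k} in Step 1 when \beta=1), and the defaults loosely mirror the parameter choices reported in Section 4.

```python
import numpy as np

def adaptive_linearized_admm(x_solver, prox_g, A, B, b, beta, y0, lam0,
                             sigma=0.9, tau0=0.75, tau_min=0.01, gamma=1.2,
                             rho=3.0, Upsilon=2.0, max_iter=500,
                             eps_pri=1e-4, eps_dual=1e-4):
    """Sketch of Algorithm 1 (adaptive linearized ADMM with a relaxation step).

    Assumed callables:
      x_solver(y, lam) -> argmin_x L_beta(x, y, lam)
      prox_g(v, t)     -> argmin_y g(y) + ||y - v||^2 / (2 t)
    """
    r = np.linalg.norm(B.T @ B, 2)                      # r = ||B^T B||
    eps = 1.0 / (1.0 / (2.0 - sigma) + 0.1)             # epsilon in (0, 2 - sigma), as in Section 4
    y, lam, tau = y0, lam0, tau0
    p_old = d_old = 100.0                               # p^0 = d^0, as in Section 4
    for k in range(max_iter):
        eta_k = 0.25 * min(1.0, 1.0 / max(1, k) ** 2)   # one summable choice of eta_k
        s_k = 2.0 * min(1.0, 1.0 / max(1, k) ** 2)      # one summable choice of s_k
        while True:                                     # Steps 1-2
            x = x_solver(y, lam)
            q = B.T @ (lam - beta * (A @ x + B @ y - b))
            w = tau * r * beta
            y_hat = prox_g(y + q / w, 1.0 / w)          # D_k-proximal y-subproblem as a prox step
            lam_hat = lam - beta * (A @ x + B @ y_hat - b)
            y_new = y - sigma * (y - y_hat)             # relaxation step (31)
            lam_new = lam - sigma * (lam - lam_hat)
            theta1 = (2.0 - sigma) * tau * r * np.linalg.norm(y - y_new) ** 2
            theta2 = np.linalg.norm(B @ (y - y_new)) ** 2 / eps
            if theta1 > theta2 or np.allclose(y_new, y):
                break                                   # Step 2 holds
            tau *= gamma                                # otherwise enlarge tau and redo Step 1
        # Step 3: cautiously decrease tau when the condition holds with a margin
        t_next = max(tau / (1.0 + eta_k), tau_min) if theta1 - theta2 >= Upsilon * theta2 else tau
        # Step 4: safeguard tau using the primal and dual residuals
        p = np.linalg.norm(A @ x + B @ y_new - b)
        d = np.linalg.norm(beta * A.T @ B @ (y_new - y))
        tau = rho * t_next if (p > (1 + s_k) * p_old or d > (1 + s_k) * d_old) else t_next
        p_old, d_old = p, d
        y, lam = y_new, lam_new
        if p <= eps_pri and d <= eps_dual:              # Step 5: stopping criterion (33)
            break
    return x, y, lam
```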

3.2 Global convergence analysis

In this section, we prove the global convergence of the proposed method. Before proceeding, we need the following lemmas.
Lemma 3.1. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1 and \left\{\tilde{w}^{k}\right\} be defined in (19). Then, for any \tilde{w}^{k}\in\Omega, we have:

θ(u)θ(u~k)+(ww~k)TF(w~k)(vv~k)TQk+1(vkv~k),wΩ.\theta\left(u\right)-\theta\left(\tilde{u}^{k}\right)+\left(w-\tilde{w}^{k}\right)^{T}F\left(\tilde{w}^{k}\right)\geq\left(v-\tilde{v}^{k}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right),\quad\forall w\in\Omega. (34)

Proof: By the optimality condition for the x-subproblem in Algorithm 1, for any \tilde{x}^{k}\in X we have:

f(x)f(x~k)+(xx~k)T{ATλk+βAT(Ax~k+Bykb)}0,xX.f\left(x\right)-f\left(\tilde{x}^{k}\right)+\left(x-\tilde{x}^{k}\right)^{T}\left\{-A^{T}\lambda^{k}+\beta A^{T}\left(A\tilde{x}^{k}+By^{k}-b\right)\right\}\geq 0,\quad\forall x\in X.

Since \tilde{\lambda}^{k}=\lambda^{k}-\beta\left(A\tilde{x}^{k}+By^{k}-b\right) by (19), it follows that:

f(x)f(x~k)+(xx~k)T(ATλ~k)0.xX.f\left(x\right)-f\left(\tilde{x}^{k}\right)+\left(x-\tilde{x}^{k}\right)^{T}\left(-A^{T}\tilde{\lambda}^{k}\right)\geq 0.\quad\forall x\in X. (35)

From the optimality condition for the y-subproblem in Algorithm 1, for any \tilde{y}^{k}\in Y we have:

g(y)g(y~k)+(yy~k)T{BTλk+βBT(Ax~k+By~kb)g\left(y\right)-g\left(\tilde{y}^{k}\right)+\left(y-\tilde{y}^{k}\right)^{T}\left\{-B^{T}\lambda^{k}+\beta B^{T}\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right)\right.
+(τkrIn2βBTB)(y~kyk)}0,yY.\left.+\left(\tau_{k}rI_{n_{2}}-\beta B^{T}B\right)\left(\tilde{y}^{k}-y^{k}\right)\right\}\geq 0,\quad\forall y\in Y.

Again by the definition of \tilde{\lambda}^{k}, the above inequality can be written as:

g(y)g(y~k)+(yy~k)T{BTλ~k+τkrβ(y~kyk)}0,yY.g\left(y\right)-g\left(\tilde{y}^{k}\right)+\left(y-\tilde{y}^{k}\right)^{T}\left\{-B^{T}\tilde{\lambda}^{k}+\tau_{k}r\beta\left(\tilde{y}^{k}-y^{k}\right)\right\}\geq 0,\qquad\forall y\in Y. (36)

By the definition of \tilde{\lambda}^{k} in (19), we have:

\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right)-B\left(\tilde{y}^{k}-y^{k}\right)+\frac{1}{\beta}\left(\tilde{\lambda}^{k}-\lambda^{k}\right)=0.

The above equation can be written as:

λ~kRm,(λλ~k)T{Ax~k+By~kbB(y~kyk)+1β(λ~kλk)}0,λRm.\begin{split}\tilde{\lambda}^{k}\in R^{m},\qquad\left(\lambda-\tilde{\lambda}^{k}\right)^{T}\left\{A\tilde{x}^{k}+B\tilde{y}^{k}-b-B\left(\tilde{y}^{k}-y^{k}\right)\right.\\ \left.+\frac{1}{\beta}\left(\tilde{\lambda}^{k}-\lambda^{k}\right)\right\}\geq 0,\quad\forall\lambda\in R^{m}.\end{split} (37)

Combining (35), (36), and (37) with the notation Q_{k+1} proves the lemma.
Lemma 3.2. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1 and \left\{\tilde{w}^{k}\right\} be defined in (19). Then:

vkvk+1=M(vkv~k)v^{k}-v^{k+1}=M\left(v^{k}-\tilde{v}^{k}\right) (38)

where MM is defined in (17).
Proof: This follows from (31) and the definitions of \tilde{\lambda}^{k} and \hat{\lambda}^{k}:
\lambda^{k+1}=\lambda^{k}-\sigma\left(\lambda^{k}-\hat{\lambda}^{k}\right)
=\lambda^{k}-\sigma\beta\left(A\tilde{x}^{k}+B\hat{y}^{k}-b\right)
=\lambda^{k}-\sigma\left[\beta\left(A\tilde{x}^{k}+By^{k}-b\right)-\beta B\left(y^{k}-\hat{y}^{k}\right)\right]
=\lambda^{k}-\sigma\left(\lambda^{k}-\tilde{\lambda}^{k}\right)+\sigma\beta B\left(y^{k}-\tilde{y}^{k}\right).
Combining this with (31), we obtain:

\begin{pmatrix}y^{k+1}\\ \lambda^{k+1}\end{pmatrix}=\begin{pmatrix}y^{k}\\ \lambda^{k}\end{pmatrix}-\begin{pmatrix}\sigma I_{n_{2}}&O\\ -\sigma\beta B&\sigma I_{m}\end{pmatrix}\begin{pmatrix}y^{k}-\tilde{y}^{k}\\ \lambda^{k}-\tilde{\lambda}^{k}\end{pmatrix}.

Therefore, equation (38) holds.

It is clear from (38) and (21):

(vv~k)TQk+1(vkv~k)\displaystyle\left(v-\tilde{v}^{k}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right) =(vv~k)THk+1M(vkv~k)\displaystyle=\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}M\left(v^{k}-\tilde{v}^{k}\right)
=(vv~k)THk+1(vkvk+1).\displaystyle=\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right).

Lemma 3.3. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1 and \left\{\tilde{w}^{k}\right\} be defined in (19). Then, we have:

(vv~k)THk+1(vkvk+1)=12(vvk+1Hk+12vvkHk+12)+12vkv~kGk+12,vΩ.\begin{split}\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right)&=\frac{1}{2}\left(\left\|v-v^{k+1}\right\|_{H_{k+1}}^{2}-\left\|v-v^{k}\right\|_{H_{k+1}}^{2}\right)\\ &+\frac{1}{2}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2},\qquad\forall v\in\Omega.\end{split} (39)

where the matrix G_{k+1}=Q_{k+1}^{T}+Q_{k+1}-M^{T}H_{k+1}M.
Proof: Using the identity:

(ab)THk+1(cd)=12(adHk+12acHk+12)+12(cbHk+12dbHk+12).\begin{split}\left(a-b\right)^{T}H_{k+1}\left(c-d\right)&=\frac{1}{2}\left(\left\|a-d\right\|_{H_{k+1}}^{2}-\left\|a-c\right\|_{H_{k+1}}^{2}\right)\\ &+\frac{1}{2}\left(\left\|c-b\right\|_{H_{k+1}}^{2}-\left\|d-b\right\|_{H_{k+1}}^{2}\right).\end{split} (40)

Setting a=v, b=\tilde{v}^{k}, c=v^{k}, d=v^{k+1} in (40), we obtain:

(vv~k)THk+1(vkvk+1)=12(vvk+1Hk+12vvkHk+12)+12(vkv~kHk+12vk+1v~kHk+12).\begin{split}\left(v-\tilde{v}^{k}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right)&=\frac{1}{2}\left(\left\|v-v^{k+1}\right\|_{H_{k+1}}^{2}-\left\|v-v^{k}\right\|_{H_{k+1}}^{2}\right)\\ &+\frac{1}{2}\left(\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|v^{k+1}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}\right).\end{split} (41)

For the last term in (41), we have:

\begin{split}&\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|v^{k+1}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}\\ &=\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|\left(v^{k}-\tilde{v}^{k}\right)-\left(v^{k}-v^{k+1}\right)\right\|_{H_{k+1}}^{2}\\ &\stackrel{{\scriptstyle\left(38\right)}}{{=}}\left\|v^{k}-\tilde{v}^{k}\right\|_{H_{k+1}}^{2}-\left\|\left(v^{k}-\tilde{v}^{k}\right)-M\left(v^{k}-\tilde{v}^{k}\right)\right\|_{H_{k+1}}^{2}\\ &=2\left(v^{k}-\tilde{v}^{k}\right)^{T}H_{k+1}M\left(v^{k}-\tilde{v}^{k}\right)-\left(v^{k}-\tilde{v}^{k}\right)^{T}M^{T}H_{k+1}M\left(v^{k}-\tilde{v}^{k}\right)\\ &=\left(v^{k}-\tilde{v}^{k}\right)^{T}\left(Q_{k+1}^{T}+Q_{k+1}-M^{T}H_{k+1}M\right)\left(v^{k}-\tilde{v}^{k}\right).\end{split} (42)

Combining (41) and (42), (39) is proved.

Now, let us examine the matrix G_{k+1}. Utilizing (21), we derive:

Gk+1=Qk+1T+Qk+1MTHk+1M=Qk+1T+Qk+1MTQk+1=(2τkrβIn2BTB2βIm)(σIn2σβBTOσIm)(τkrβIn2OB1βIm)=(2τkrβIn2BTB2βIm)(στkrβIn2+σβBTBσBTσBσβIm)=((2σ)τkrβIn2σβBTB(σ1)BT(σ1)B(2σ)βIm).\begin{split}G_{k+1}&=Q_{k+1}^{T}+Q_{k+1}-M^{T}H_{k+1}M\\ &=Q_{k+1}^{T}+Q_{k+1}-M^{T}Q_{k+1}\\ &=\begin{pmatrix}2\tau_{k}r\beta I_{n_{2}}&-B^{T}\\ -B&\frac{2}{\beta}I_{m}\end{pmatrix}-\begin{pmatrix}\sigma I_{n_{2}}&-\sigma\beta B^{T}\\ O&\sigma I_{m}\end{pmatrix}\begin{pmatrix}\tau_{k}r\beta I_{n_{2}}&O\\ -B&\frac{1}{\beta}I_{m}\end{pmatrix}\\ &=\begin{pmatrix}2\tau_{k}r\beta I_{n_{2}}&-B^{T}\\ -B&\frac{2}{\beta}I_{m}\end{pmatrix}-\begin{pmatrix}\sigma\tau_{k}r\beta I_{n_{2}}+\sigma\beta B^{T}B&-\sigma B^{T}\\ -\sigma B&\frac{\sigma}{\beta}I_{m}\end{pmatrix}\\ &=\begin{pmatrix}\left(2-\sigma\right)\tau_{k}r\beta I_{n_{2}}-\sigma\beta B^{T}B&\left(\sigma-1\right)B^{T}\\ \left(\sigma-1\right)B&\frac{\left(2-\sigma\right)}{\beta}I_{m}\end{pmatrix}.\end{split} (43)

To simplify the proof, we record some properties of the parameter sequence.
From the iterative format of Algorithm 1, \frac{1}{1+\eta_{k}}\tau_{k}\leq\tau_{k+1}\leq\left(1+\xi_{k}\right)\tau_{k}; in particular, the sequence generated by Algorithm 1 satisfies \tau_{k}\subset\left[\tau_{min},\tau_{max}\right].
Suppose the (k+1)-th step ultimately accepts the parameter in Step 4 as \tau_{k+1}=\rho\gamma^{m_{k}}\tau_{k}, where m_{k} is an integer, and let \xi_{k}:=\rho\gamma^{m_{k}}-1, that is, 1+\xi_{k}=\rho\gamma^{m_{k}}.
Then:

\tau_{k+1}\geq\frac{1+\xi_{k}}{1+\eta_{k}}\tau_{k}\geq\cdots\geq\frac{\prod_{i=1}^{k}\left(1+\xi_{i}\right)}{\prod_{i=1}^{k}\left(1+\eta_{i}\right)}\tau_{1}\geq\frac{\prod_{i=1}^{k}\left(1+\xi_{i}\right)}{\prod_{i=1}^{\infty}\left(1+\eta_{i}\right)}\tau_{0}.

Since \sum_{k=1}^{\infty}\eta_{k}<+\infty, we have \prod_{i=1}^{\infty}\left(1+\eta_{i}\right)<+\infty.
Since \tau_{k+1}\leq\tau_{max}, letting k\to+\infty gives \prod_{i=1}^{\infty}\left(1+\xi_{i}\right)<+\infty, that is, \sum_{i=1}^{\infty}\xi_{i}<+\infty.
Theorem 3.1. Let the sequence {wk}\left\{w^{k}\right\} be the iterative sequence generated by Algorithm 1 and {w~k}\left\{\tilde{w}^{k}\right\} be defined in (19). Then, we have:

vk+1vHk+12(1+ξk)vkvHk21σ2{βykyk+1Tk+12+(2σ)εβλkλk+12}.\begin{split}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}&\leq\left(1+\xi_{k}\right)\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-\varepsilon}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}.\end{split} (44)

where ε(0,2σ)\varepsilon\in\left(0,2-\sigma\right) and Tk+10.T_{k+1}\succ 0.
Proof: In the following, we will further investigate the vkv~kGk+12\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2} term and show how it can be bounded.
Due to Gk+1=((2σ)τkrβIn2σβBTB(σ1)BT(σ1)B(2σ)βIm)G_{k+1}=\begin{pmatrix}\left(2-\sigma\right)\tau_{k}r\beta I_{n_{2}}-\sigma\beta B^{T}B&\left(\sigma-1\right)B^{T}\\ \left(\sigma-1\right)B&\frac{\left(2-\sigma\right)}{\beta}I_{m}\end{pmatrix} and v=(yλ),v=\begin{pmatrix}y\\ \lambda\end{pmatrix}, We have:

vkv~kGk+12=(2σ)τkrβyky~k2σβB(yky~k)2+2σβλkλ~k2+2(σ1)(λkλ~k)TB(yky~k)=(2σ)τkrβyky~k2+(2σ)βAx~k+By~kb2+2β(Ax~k+By~kb)TB(yky~k)\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&=\left(2-\sigma\right)\tau_{k}r\beta\left\|y^{k}-\tilde{y}^{k}\right\|^{2}-\sigma\beta\left\|B\left(y^{k}-\tilde{y}^{k}\right)\right\|^{2}\\ &+\frac{2-\sigma}{\beta}\left\|\lambda^{k}-\tilde{\lambda}^{k}\right\|^{2}+2\left(\sigma-1\right)\left(\lambda^{k}-\tilde{\lambda}^{k}\right)^{T}B\left(y^{k}-\tilde{y}^{k}\right)\\ &=\left(2-\sigma\right)\tau_{k}r\beta\left\|y^{k}-\tilde{y}^{k}\right\|^{2}+\left(2-\sigma\right)\beta\left\|A\tilde{x}^{k}+B\tilde{y}^{k}-b\right\|^{2}\\ &+2\beta\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right)^{T}B\left(y^{k}-\tilde{y}^{k}\right)\end{split} (45)

Also because λ^k=λkβ(Ax~k+By~kb)\hat{\lambda}^{k}=\lambda^{k}-\beta\left(A\tilde{x}^{k}+B\tilde{y}^{k}-b\right) and yky~k=1σ(ykyk+1).y^{k}-\tilde{y}^{k}=\frac{1}{\sigma}\left(y^{k}-y^{k+1}\right).
Therefore, we have:

vkv~kGk+12=(2σ)σ2τkrβykyk+12+(2σ)βλkλ^k2+2σ(λkλ^k)TB(ykyk+1).\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&=\frac{\left(2-\sigma\right)}{\sigma^{2}}\tau_{k}r\beta\left\|y^{k}-y^{k+1}\right\|^{2}+\frac{\left(2-\sigma\right)}{\beta}\left\|\lambda^{k}-\hat{\lambda}^{k}\right\|^{2}\\ &+\frac{2}{\sigma}\left(\lambda^{k}-\hat{\lambda}^{k}\right)^{T}B\left(y^{k}-y^{k+1}\right).\end{split} (46)

It is clear from (31):

λkλ^k=1σ(λkλk+1).\lambda^{k}-\hat{\lambda}^{k}=\frac{1}{\sigma}\left(\lambda^{k}-\lambda^{k+1}\right).

Substituting the above equation into (46) gives:

vkv~kGk+12=(2σ)σ2τkrβykyk+12+(2σ)σ2βλkλk+12+2σ2(λkλk+1)TB(ykyk+1).\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&=\frac{\left(2-\sigma\right)}{\sigma^{2}}\tau_{k}r\beta\left\|y^{k}-y^{k+1}\right\|^{2}+\frac{\left(2-\sigma\right)}{\sigma^{2}\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\\ &+\frac{2}{\sigma^{2}}\left(\lambda^{k}-\lambda^{k+1}\right)^{T}B\left(y^{k}-y^{k+1}\right).\end{split} (47)

In the following, we estimate \left(\lambda^{k}-\lambda^{k+1}\right)^{T}B\left(y^{k}-y^{k+1}\right).
From the Cauchy-Schwarz inequality, for any \delta>0 we have:

(λkλk+1)T(BykByk+1)δβλkλk+1214δβB(ykyk+1)2.\left(\lambda^{k}-\lambda^{k+1}\right)^{T}\left(By^{k}-By^{k+1}\right)\geq-\frac{\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}-\frac{1}{4\delta}\beta\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2}. (48)

Substituting (48) into (47) shows that:

vkv~kGk+121σ2[(2σ)τkrβykyk+1212δβB(ykyk+1)2+(2σ)2δβλkλk+12].\begin{split}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}&\geq\frac{1}{\sigma^{2}}\left[\left(2-\sigma\right)\tau_{k}r\beta\left\|y^{k}-y^{k+1}\right\|^{2}-\frac{1}{2\delta}\beta\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right].\end{split} (49)

Owing to:

(w1w2)T(F(w1)F(w2))0,w1,w2Ω.\left(w_{1}-w_{2}\right)^{T}\left(F\left(w_{1}\right)-F\left(w_{2}\right)\right)\geq 0,\quad\forall w_{1},w_{2}\in\Omega. (50)

It follows from (50) and (34):

θ(u)θ(u~k)+(ww~k)TF(w)(vv~k)TQk+1(vkv~k),wΩ.\theta\left(u\right)-\theta\left(\tilde{u}^{k}\right)+\left(w-\tilde{w}^{k}\right)^{T}F\left(w\right)\geq\left(v-\tilde{v}^{k}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right),\quad\forall w\in\Omega.

Setting w=w^{\ast} in the above inequality and combining with (15) shows that:

(v~kv)TQk+1(vkv~k)0,wΩ.\left(\tilde{v}^{k}-v^{\ast}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right)\geq 0,\quad\forall w\in\Omega. (51)

It is clear from (21) and (38):

(v~kv)TQk+1(vkv~k)=(v~kv)THk+1(vkvk+1).\left(\tilde{v}^{k}-v^{\ast}\right)^{T}Q_{k+1}\left(v^{k}-\tilde{v}^{k}\right)=\left(\tilde{v}^{k}-v^{\ast}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right). (52)

It is clear from (39):

(v~kv)THk+1(vkvk+1)=12(vkvHk+12vk+1vHk+12)12vkv~kGk+12.\begin{split}\left(\tilde{v}^{k}-v^{\ast}\right)^{T}H_{k+1}\left(v^{k}-v^{k+1}\right)&=\frac{1}{2}\left(\left\|v^{k}-v^{\ast}\right\|_{H_{k+1}}^{2}-\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}\right)\\ &-\frac{1}{2}\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}.\end{split} (53)

It follows from (51), (52) and (53):

vk+1vHk+12vkvHk+12vkv~kGk+12.\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}\leq\left\|v^{k}-v^{\ast}\right\|_{H_{k+1}}^{2}-\left\|v^{k}-\tilde{v}^{k}\right\|_{G_{k+1}}^{2}. (54)

Substituting (49) into (54) gives:

vk+1vHk+12vkvHk+121σ2{βykyk+1Tk+12+(2σ)2δβλkλk+12}vkvHk2+(τkτk1)rβσyky21σ2{βykyk+1Tk+12+(2σ)2δβλkλk+12}(1+ξk)vkvHk21σ2{βykyk+1Tk+12+(2σ)2δβλkλk+12}\begin{split}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}&\leq\left\|v^{k}-v^{\ast}\right\|_{H_{k+1}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}\\ &\leq\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}+\left(\tau_{k}-\tau_{k-1}\right)\frac{r\beta}{\sigma}\left\|y^{k}-y^{\ast}\right\|^{2}\\ &-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}\\ &\leq\left(1+\xi_{k}\right)\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-2\delta}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}\end{split} (55)

where T_{k+1}=\left(2-\sigma\right)\tau_{k}rI_{n_{2}}-\frac{1}{2\delta}B^{T}B.
For \lim_{k\to\infty}\left(v^{k}-v^{k+1}\right)=0 to hold, it is sufficient that \delta=\frac{\varepsilon}{2} and T_{k+1}\succ 0.
With this choice of \delta, positivity of T_{k+1} along the direction y^{k}-y^{k+1} amounts to \left(2-\sigma\right)\tau_{k}r\left\|y^{k}-y^{k+1}\right\|^{2}>\frac{1}{\varepsilon}\left\|B\left(y^{k}-y^{k+1}\right)\right\|^{2}, which is exactly Condition 1 in Step 2.
So (55) can be written as:

\begin{split}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}&\leq\left(1+\xi_{k}\right)\left\|v^{k}-v^{\ast}\right\|_{H_{k}}^{2}-\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}\right.\\ &\left.+\frac{\left(2-\sigma\right)-\varepsilon}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}.\end{split}

Therefore, Theorem 3.1 is proved.
Theorem 3.2. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1. For any point w^{\ast}\in\Omega^{\ast}, we have:
(a) \lim_{k\to\infty}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2} exists and \lim_{k\to\infty}\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}<+\infty.
(b) \lim_{k\to\infty}\left\|y^{k+1}-y^{k}\right\|=0 and \lim_{k\to\infty}\left\|\lambda^{k+1}-\lambda^{k}\right\|=0.
Proof: In Lemma 2.2, let

a^{k}=\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2},\qquad b^{k}=\xi_{k},\qquad c^{k}=0,
d^{k}=\frac{1}{\sigma^{2}}\left\{\beta\left\|y^{k}-y^{k+1}\right\|_{T_{k+1}}^{2}+\frac{\left(2-\sigma\right)-\varepsilon}{\beta}\left\|\lambda^{k}-\lambda^{k+1}\right\|^{2}\right\}.

By Lemma 2.2, (a) and (b) hold.
Theorem 3.3. Let \left\{w^{k}\right\} be the sequence generated by Algorithm 1. Then \left\{w^{k}\right\} converges to a point w^{\infty}\in\Omega^{\ast}.
Proof: It follows from Theorem 3.2 that \lim_{k\to\infty}\left\|v^{k+1}-v^{k}\right\|=0.
Combining (38) with the non-singularity of M gives \lim_{k\to\infty}\left\|v^{k}-\tilde{v}^{k}\right\|=0.
Since the sequence \left\{\left\|v^{k+1}-v^{\ast}\right\|_{H_{k+1}}^{2}\right\} is bounded and \tau_{k}\geq\tau_{min}>0, for any fixed v^{\ast}\in\Omega^{\ast} the quantity \left\|v^{k+1}-v^{\ast}\right\| is bounded.
That is, the sequence \left\{v^{k}\right\} is bounded.
By the triangle inequality,

\left\|\tilde{v}^{k}-v^{\ast}\right\|\leq\left\|v^{k}-\tilde{v}^{k}\right\|+\left\|v^{k}-v^{\ast}\right\|.

Hence \left\|\tilde{v}^{k}-v^{\ast}\right\| is bounded, so the sequence \left\{\tilde{v}^{k}\right\} is also bounded; it therefore has a cluster point v^{\infty}, and there exists a subsequence \left\{\tilde{v}^{k_{j}}\right\} of \left\{\tilde{v}^{k}\right\} converging to v^{\infty}.
From \tilde{\lambda}^{k}=\lambda^{k}-\beta\left(A\tilde{x}^{k}+By^{k}-b\right) in (19), we have:

A\tilde{x}^{k_{j}}=\frac{1}{\beta}\left(\lambda^{k_{j}}-\tilde{\lambda}^{k_{j}}\right)-\left(By^{k_{j}}-b\right).

Since the matrix A has full column rank, it follows that the sequence \left\{\tilde{x}^{k_{j}}\right\} also converges.
Set \lim_{k_{j}\to\infty}\tilde{x}^{k_{j}}=x^{\infty}; then the subsequence \left\{\tilde{w}^{k_{j}}\right\} converges to w^{\infty}.
Setting k=k_{j} in (34), we have, for \tilde{w}^{k_{j}}\in\Omega,

\theta\left(u\right)-\theta\left(\tilde{u}^{k_{j}}\right)+\left(w-\tilde{w}^{k_{j}}\right)^{T}F\left(\tilde{w}^{k_{j}}\right)\geq\left(v-\tilde{v}^{k_{j}}\right)^{T}Q_{k_{j}+1}\left(v^{k_{j}}-\tilde{v}^{k_{j}}\right),\quad\forall w\in\Omega.

Letting k_{j}\to\infty in the above inequality, it follows that:

w^{\infty}\in\Omega,\quad\theta\left(u\right)-\theta\left(u^{\infty}\right)+\left(w-w^{\infty}\right)^{T}F\left(w^{\infty}\right)\geq 0,\quad\forall w\in\Omega. (56)

(56) shows that w^{\infty}\in\Omega^{\ast}, i.e., w^{\infty} is a solution of the variational inequality VI\left(\Omega,F,\theta\right); therefore the sequence \left\{w^{k}\right\} converges to w^{\infty}.

4 Numerical experiments

The numerical performance of Algorithm 1 is demonstrated by solving the Lasso problem and comparing Algorithm 1 with the optimal linearized ADMM (OLADMM) [22]. All experiments were conducted in Matlab R2016a on a laptop with 4 GB of RAM.

The Lasso problem [12] in statistics is as follows:

minx,y12xb22+ιy1s.t.x=Ay\min_{x,y}\frac{1}{2}\left\|x-b\right\|_{2}^{2}+\iota\left\|y\right\|_{1}\qquad s.t.\quad x=Ay (57)

where A\in R^{m\times n},\ b\in R^{m},\ x\in R^{n},\ y\in R^{n}.
Applying Algorithm 1 to (57) yields the following updates.

The xx-subproblem is an unconstrained convex quadratic minimization problem, and its unique solution can be expressed as follows:

xk+1=11+β(b+λk+βAyk)x^{k+1}=\frac{1}{1+\beta}\left(b+\lambda^{k}+\beta Ay^{k}\right)

The y-subproblem is an \ell_{1}+\ell_{2} minimization problem that can be solved using the soft-thresholding (shrink) operator:

\begin{split}&\hat{y}^{k}=shrink\left\{y^{k}-\frac{1}{\tau_{k}r\beta}A^{T}\left[\lambda^{k}-\beta\left(x^{k+1}-Ay^{k}\right)\right],\frac{\iota}{\tau_{k}r\beta}\right\}.\\ &y^{k+1}=y^{k}-\sigma\left(y^{k}-\hat{y}^{k}\right).\end{split}

The solution to the λ\lambda-subproblem is:

λ^k=λkβ(xk+1Ay^k).λk+1=λkσ(λkλ^k).\begin{split}&\hat{\lambda}^{k}=\lambda^{k}-\beta\left(x^{k+1}-A\hat{y}^{k}\right).\\ &\lambda^{k+1}=\lambda^{k}-\sigma\left(\lambda^{k}-\hat{\lambda}^{k}\right).\end{split}
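Putting the three closed-form updates together, one iteration of Algorithm 1 for (57) can be sketched as follows. This is our sketch, not the authors' code: shrink is the componentwise soft-thresholding operator, and the y-step uses the \tau_{k}r\beta scaling written above.

```python
import numpy as np

def shrink(v, kappa):
    """Componentwise soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_iteration(A, b, x, y, lam, beta, tau, r, sigma, iota):
    """One iteration of Algorithm 1 specialized to the Lasso model (57)."""
    x_new = (b + lam + beta * A @ y) / (1.0 + beta)          # x-subproblem (closed form)
    step = tau * r * beta                                    # proximal weight of the y-step
    y_hat = shrink(y - A.T @ (lam - beta * (x_new - A @ y)) / step, iota / step)
    lam_hat = lam - beta * (x_new - A @ y_hat)
    y_new = y - sigma * (y - y_hat)                          # relaxation step
    lam_new = lam - sigma * (lam - lam_hat)
    return x_new, y_new, lam_new
```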

In this paper, numerical experiments are conducted to compare the performance of Algorithm 1 with OLADMM.
The parameters of each algorithm are set as follows:
Algorithm 1: r=\left\|A^{T}A\right\|, \iota=0.1\left\|A^{T}b\right\|_{\infty}, \tau_{0}=0.75, \tau_{min}=0.01, \gamma=1.2, \sigma=0.9, \beta=1, \rho=3, p^{0}=100, d^{0}=100, \frac{1}{\varepsilon}=\frac{1}{2-\sigma}+0.1, \eta_{k}=0.25\min\left\{1,\frac{1}{\left(\max\left\{1,k-l\right\}\right)^{2}}\right\}, s_{k}=2\min\left\{1,\frac{1}{\left(\max\left\{1,k-l\right\}\right)^{2}}\right\}, where l is the dimension of \lambda.
OLADMM: r=\left\|A^{T}A\right\|,\ \tau=0.75,\ \beta=1,\ \iota=0.1\left\|A^{T}b\right\|_{\infty}.
The stopping criterion is:

pk+1=xk+1Ayk+1<εpri,anddk+1=βA(yk+1yk)<εdual.\left\|p^{k+1}\right\|=\left\|x^{k+1}-Ay^{k+1}\right\|<\varepsilon^{pri},\quad and\quad\left\|d^{k+1}\right\|=\left\|\beta A\left(y^{k+1}-y^{k}\right)\right\|<\varepsilon^{dual}.

where

\varepsilon^{pri}=\sqrt{n}\varepsilon^{abs}+\varepsilon^{rel}\max\left\{\left\|x^{k+1}\right\|,\left\|Ay^{k+1}\right\|\right\}

and

\varepsilon^{dual}=\sqrt{n}\varepsilon^{abs}+\varepsilon^{rel}\left\|y^{k+1}\right\|.

with \varepsilon^{abs} and \varepsilon^{rel} set to 10^{-4} and 10^{-2}, respectively.
For a matrix A of given dimension m\times n, we generate the data randomly as follows:
p=\frac{1}{n}, x^{0}=sprandn\left(n,1,p\right), A=randn\left(m,n\right), b=A\ast x^{0}+sqrt\left(0.001\right)\ast randn\left(m,1\right). The initial point is \left(y^{0},\lambda^{0}\right)=\left(0,0\right).
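For reproducibility, the same test data can be generated in Python as follows (our translation of the MATLAB commands above; the sparsity pattern is drawn with a Bernoulli mask, which approximates sprandn).

```python
import numpy as np

def generate_lasso_data(m, n, seed=0):
    """Random Lasso test data, mirroring the MATLAB commands above."""
    rng = np.random.default_rng(seed)
    p = 1.0 / n
    mask = rng.random(n) < p                      # sparsity pattern, approx. like sprandn(n,1,p)
    x0 = rng.standard_normal(n) * mask            # sparse Gaussian ground truth
    A = rng.standard_normal((m, n))               # randn(m,n)
    b = A @ x0 + np.sqrt(0.001) * rng.standard_normal(m)
    return A, b, x0
```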

In order to observe the effect of different values of \sigma on the experiments, we ran the algorithm with

\sigma\in\left\{0.1,0.2,0.3,\cdots,1.7,1.8,1.9\right\}.

The results show that the performance is satisfactory when \sigma\in\left[0.6,1.4\right]. In this paper, we take \sigma=0.9.

The experimental results of the two algorithms are shown below.
Table 1. Comparison between Algorithm 1 and OLADMM for (57).

                        OLADMM                                  Algorithm 1
 m      n        Iter.  CPU(s)  ||p^k||  ||q^k||        Iter.  CPU(s)  ||p^k||  ||q^k||
 1000   1500     16     3.23    0.0700   0.0701         11     2.38    0.0414   0.0446
 1500   1500     13     2.81    0.0650   0.0660         10     2.53    0.0350   0.0500
 1500   3000     17     7.03    0.0658   0.0659         10     6.18    0.0448   0.0599
 2000   3000     14     7.16    0.0759   0.0764         10     6.57    0.0243   0.0399
 3000   3000     13     7.80    0.0572   0.0582         9      7.40    0.0377   0.0695
 3000   5000     15     24.96   0.0685   0.0688         10     23.44   0.0312   0.0485
 4000   5000     13     25.01   0.0664   0.0674         10     24.69   0.0365   0.0549
 5000   5000     12     26.34   0.0614   0.0636         9      25.73   0.0435   0.0739

Table 1 presents the number of iterations and the total computation time needed by OLADMM and Algorithm 1. It is evident that Algorithm 1 consistently outperforms OLADMM: it requires fewer iterations and less time to meet the termination condition. Consequently, Algorithm 1 is more efficient than OLADMM, which also provides support for our convergence analysis.

5 Conclusion

In this study, we introduce an adaptive linearized alternating direction multiplier method with a relaxation step that incorporates both a relaxation step and an adaptive technique. Our approach adopts a unified framework of variational inequalities and establishes the global convergence of this adaptive linearized alternating direction multiplier method with a relaxation step through theoretical analysis. Through the resolution of the Lasso problem, we demonstrate the algorithm’s outstanding numerical performance. These findings will propel advancements in optimization algorithms and offer more efficient tools and methodologies for addressing practical challenges.

Conflict of interest: The authors declare that they have no conflict of interest.

References

  • \bibcommenthead
  • Tao et al. [2009] Tao, M., Yang, J., He, B.: Alternating direction algorithms for total variation deconvolution in image reconstruction. TR0918, Department of Mathematics, Nanjing University (2009)
  • Ng et al. [2010] Ng, M.K., Weiss, P., Yuan, X.: Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM journal on Scientific Computing 32(5), 2710–2736 (2010)
  • Boyd et al. [2011] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3(1), 1–122 (2011)
  • Combettes and Pesquet [2007] Combettes, P.L., Pesquet, J.-C.: A douglas–rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing 1(4), 564–574 (2007)
  • Gabay and Mercier [1976] Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications 2(1), 17–40 (1976)
  • Glowinski and Marrocco [1974] Glowinski, R., Marrocco, A.: Analyse numerique du champ magnetique d’un alternateur par elements finis et sur-relaxation ponctuelle non lineaire. Computer Methods in Applied Mechanics & Engineering 3(1), 55–85 (1974)
  • He [2018] He, B.: My 20 years research on alternating directions method of multipliers. Oper. Res. Trans 22(1), 1–31 (2018)
  • Eckstein and Yao [2017] Eckstein, J., Yao, W.: Approximate admm algorithms derived from lagrangian splitting. Computational Optimization and Applications 68, 363–405 (2017)
  • He et al. [2002] He, B., Liao, L.-Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Mathematical Programming 92, 103–118 (2002)
  • Ng et al. [2011] Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM Journal on Scientific Computing 33(4), 1643–1668 (2011)
  • Chan et al. [2012] Chan, R.H., Tao, M., Yuan, X.: Linearized alternating direction method of multipliers for constrained linear least-squares problem. East Asian Journal on Applied Mathematics 2(4), 326–341 (2012)
  • Tibshirani [1996] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288 (1996)
  • Recht et al. [2010] Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review 52(3), 471–501 (2010)
  • Tao and Yuan [2011] Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM Journal on Optimization 21(1), 57–81 (2011)
  • Fang et al. [2015] Fang, E.X., He, B., Liu, H., Yuan, X.: Generalized alternating direction method of multipliers: new theoretical insights and applications. Mathematical programming computation 7(2), 149–187 (2015)
  • Woo and Yun [2013] Woo, H., Yun, S.: Proximal linearized alternating direction method for multiplicative denoising. SIAM Journal on Scientific Computing 35(2), 336–358 (2013)
  • He and Yuan [2013] He, B., Yuan, X.: Linearized alternating direction method of multipliers with Gaussian back substitution for separable convex programming. Numerical Algebra, Control and Optimization 3(2), 247–260 (2013)
  • Ouyang et al. [2015] Ouyang, Y., Chen, Y., Lan, G., Pasiliao Jr, E.: An accelerated linearized alternating direction method of multipliers. SIAM Journal on Imaging Sciences 8(1), 644–681 (2015)
  • Chan et al. [2012] Chan, R.H., Tao, M., Yuan, X.: Linearized alternating direction method of multipliers for constrained linear least-squares problem. East Asian Journal on Applied Mathematics 2(4), 326–341 (2012)
  • Wang and Yuan [2012] Wang, X., Yuan, X.: The linearized alternating direction method of multipliers for dantzig selector. SIAM Journal on Scientific Computing 34(5), 2792–2811 (2012)
  • He et al. [2016] He, B., Ma, F., Yuan, X.: Linearized alternating direction method of multipliers via positive-indefinite proximal regularization for convex programming. Available at https://optimization-online.org (2016)
  • He et al. [2020] He, B., Ma, F., Yuan, X.: Optimally linearizing the alternating direction method of multipliers for convex programming. Computational Optimization and Applications 75(2), 361–388 (2020)
  • He [2015] He, B.: Ppa-like contraction methods for convex optimization: a framework using variational inequality approach. Journal of the Operations Research Society of China 3, 391–420 (2015)
  • Tao and Yuan [2018] Tao, M., Yuan, X.: On the optimal linear convergence rate of a generalized proximal point algorithm. Journal of Scientific Computing 74, 826–850 (2018)
  • Gu and Yang [2019] Gu, G., Yang, J.: On the optimal linear convergence factor of the relaxed proximal point algorithm for monotone inclusion problems. arXiv preprint arXiv:1905.04537 (2019)
  • Robbins and Siegmund [1971] Robbins, H., Siegmund, D.: A convergence theorem for nonnegative almost supermartingales and some applications. Optimizing Methods in Statistics, 233–257 (1971)