An adaptive linearized alternating direction method of multipliers with a relaxation step for convex programming
Abstract
The alternating direction method of multipliers (ADMM) is a powerful technique for solving convex optimization problems. When difficult subproblems are encountered in practice, it is useful to handle them by introducing proximal terms. When the proximal matrix is positive definite, the algorithm converges, but the iteration step size becomes small. Recent studies have revealed that the proximal matrix need not be positive definite. In this paper, we present an adaptive linearized ADMM with a relaxation step, combining a relaxation step with an adaptive technique. The novelty of the method is to use the information of the current iterate to dynamically select the proximal matrix, enlarge the iteration step size, and accelerate the convergence of the algorithm. We prove the global convergence of the algorithm theoretically and illustrate its effectiveness with numerical experiments.
keywords:
variational inequality, adaptive linearized ADMM, relaxation step, global convergence.

1 Introduction
This paper considers the following separable convex optimization problem with linear constraints:
\[
\min_{x,\,y} \ f(x) + g(y) \quad \text{s.t.} \quad Ax + By = b, \quad x \in \mathcal{X}, \ y \in \mathcal{Y}, \tag{1}
\]
where $f$ and $g$ are convex functions (but not necessarily smooth), $A$ and $B$ are given coefficient matrices, $b$ is a given vector, and $\mathcal{X}$ and $\mathcal{Y}$ are closed convex sets. Problem (1) has recently been widely used in several areas such as image processing [1, 2], statistical learning [3] and communication networks [4]. There exist numerous effective methods for addressing problem (1), including the primal-dual method, the augmented Lagrangian method, and the alternating direction method of multipliers (ADMM). Notably, ADMM has garnered significant attention in recent years due to its simplicity and efficiency, and many scholars have introduced effective variants of the algorithm tailored to practical challenges.
ADMM, first proposed by Gabay and Mercier [5] and Glowinski and Marrocco [6], is widely recognized as an effective way to solve (1); for the progress of research on ADMM, we refer to [7]. The method has a fast convergence speed and strong convergence properties, making it a crucial tool for separable convex optimization problems. It decomposes a large problem into two or more small-scale subproblems and then solves each subproblem iteratively, which enhances solution speed. Its direct, adaptable, and practical nature is particularly effective for convex optimization problems with a separable structure. ADMM can be described as follows:
\[
\begin{cases}
x^{k+1} = \arg\min_{x \in \mathcal{X}} \ \mathcal{L}_{\beta}(x, y^{k}, \lambda^{k}), \\[2pt]
y^{k+1} = \arg\min_{y \in \mathcal{Y}} \ \mathcal{L}_{\beta}(x^{k+1}, y, \lambda^{k}), \\[2pt]
\lambda^{k+1} = \lambda^{k} - \beta\,(A x^{k+1} + B y^{k+1} - b),
\end{cases} \tag{2}
\]
where $\mathcal{L}_{\beta}$ is the augmented Lagrangian function of (1):
\[
\mathcal{L}_{\beta}(x, y, \lambda) = f(x) + g(y) - \lambda^{\top}(Ax + By - b) + \frac{\beta}{2}\,\|Ax + By - b\|^{2},
\]
where $\lambda$ represents the Lagrange multiplier and $\beta > 0$ represents the penalty parameter. For simplicity, the penalty parameter $\beta$ is fixed in our discussion.
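To make the abstract scheme (2) concrete, the following is a minimal Python sketch of the generic ADMM loop; the subproblem solvers `argmin_x` and `argmin_y` are user-supplied placeholders, and all names are illustrative rather than part of the paper.

```python
import numpy as np

def admm(argmin_x, argmin_y, A, B, b, beta, x0, y0, lam0, max_iter=500, tol=1e-6):
    """Generic ADMM loop for min f(x)+g(y) s.t. Ax+By=b, cf. scheme (2).

    argmin_x(y, lam) and argmin_y(x, lam) are user-supplied solvers of the two
    augmented-Lagrangian subproblems; they are placeholders in this sketch.
    """
    x, y, lam = x0, y0, lam0
    for k in range(max_iter):
        x = argmin_x(y, lam)                      # x-subproblem
        y = argmin_y(x, lam)                      # y-subproblem
        r = A @ x + B @ y - b                     # primal residual
        lam = lam - beta * r                      # multiplier update
        if np.linalg.norm(r) < tol:
            break
    return x, y, lam
```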
A significant focus in the literature on ADMM is the exploration of efficient strategies for solving its subproblems. In certain scenarios, the functions $f$ and $g$, or the coefficient matrices $A$ and $B$, may exhibit special properties and structures. Leveraging these characteristics, researchers can extend the framework in (2) to devise algorithms tailored to specific applications while preserving theoretical convergence guarantees. This principle underscores the importance of applying ADMM effectively across various fields and applications. To delve deeper into this point, consider the $y$-subproblem in (2). As previously noted, the function $g$, the matrix $B$, and the set $\mathcal{Y}$ are pivotal in addressing the $y$-subproblem. Typically, when the function, matrix, and set are in a generic form, the subproblems in (2) can be solved by straightforward iterative techniques; we refer to [8, 9, 10]. However, in practical scenarios, the function, matrix, and set associated with the $y$-subproblem may possess distinct characteristics, necessitating more efficient solution methods. When the matrix $B$ is not an identity matrix, the $y$-subproblem in (2) can be reformulated as follows:
\[
y^{k+1} = \arg\min_{y \in \mathcal{Y}} \left\{ g(y) + \frac{\beta}{2}\,\Bigl\| A x^{k+1} + B y - b - \frac{1}{\beta}\lambda^{k} \Bigr\|^{2} \right\}. \tag{3}
\]
Linearizing the quadratic term in (3) yields:
\[
y^{k+1} = \arg\min_{y \in \mathcal{Y}} \left\{ g(y) + \frac{r}{2}\,\Bigl\| y - y^{k} + \frac{1}{r}\, q^{k} \Bigr\|^{2} \right\}, \tag{4}
\]
where
\[
q^{k} = B^{\top}\bigl[\beta\,(A x^{k+1} + B y^{k} - b) - \lambda^{k}\bigr], \tag{5}
\]
where $r > 0$ is a constant. Thus we obtain the linearized ADMM [11]:
\[
\begin{cases}
x^{k+1} = \arg\min_{x \in \mathcal{X}} \ \mathcal{L}_{\beta}(x, y^{k}, \lambda^{k}), \\[2pt]
y^{k+1} = \arg\min_{y \in \mathcal{Y}} \left\{ g(y) + \frac{r}{2}\,\bigl\| y - y^{k} + \frac{1}{r}\, q^{k} \bigr\|^{2} \right\}, \\[2pt]
\lambda^{k+1} = \lambda^{k} - \beta\,(A x^{k+1} + B y^{k+1} - b).
\end{cases} \tag{6}
\]
The linearized ADMM is widely used in compressed sensing [12], image processing [13, 14], and related applications.
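As an illustration of the linearized update (4)–(6), the sketch below performs a single y-step; `prox_g` stands for the proximal operator of g and is an assumed placeholder, and the condition on r mentioned in the comment corresponds to the classical positive semidefinite choice (8).

```python
import numpy as np

def linearized_y_step(prox_g, A, B, b, x_new, y, lam, beta, r):
    """One y-update of the linearized ADMM (6).

    prox_g(v, t) should return argmin_y { g(y) + (1/(2t)) * ||y - v||^2 };
    it is a placeholder for the proximal operator of g. Choosing
    r >= beta * ||B^T B|| keeps the classical proximal matrix r*I - beta*B^T B
    positive semidefinite.
    """
    q = B.T @ (beta * (A @ x_new + B @ y - b) - lam)   # gradient of the quadratic at y^k
    return prox_g(y - q / r, 1.0 / r)                  # proximal (linearized) step
```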
A more general ADMM scheme [15] can be written as follows:
\[
\begin{cases}
x^{k+1} = \arg\min_{x \in \mathcal{X}} \ \mathcal{L}_{\beta}(x, y^{k}, \lambda^{k}), \\[2pt]
y^{k+1} = \arg\min_{y \in \mathcal{Y}} \left\{ \mathcal{L}_{\beta}(x^{k+1}, y, \lambda^{k}) + \frac{1}{2}\,\| y - y^{k} \|_{D}^{2} \right\}, \\[2pt]
\lambda^{k+1} = \lambda^{k} - \beta\,(A x^{k+1} + B y^{k+1} - b),
\end{cases} \tag{7}
\]
where $D$ is a positive definite matrix. Thus, the linearized ADMM can be viewed as a special case of the general scheme (7) with the positive definite proximal matrix
\[
D = rI - \beta B^{\top} B, \qquad r > \beta\,\| B^{\top} B \|. \tag{8}
\]
Since in many practical applications only one subproblem needs to be linearized, in this paper we only consider the case of linearizing the $y$-subproblem in (7).
In order to improve the applicability of the linearized ADMM, a number of improved linearized ADMM variants [16, 17, 18, 19, 20] have been proposed.
The work [21] proposes an indefinite proximal ADMM with the following iterative scheme:
\[
\begin{cases}
x^{k+1} = \arg\min_{x \in \mathcal{X}} \ \mathcal{L}_{\beta}(x, y^{k}, \lambda^{k}), \\[2pt]
y^{k+1} = \arg\min_{y \in \mathcal{Y}} \left\{ \mathcal{L}_{\beta}(x^{k+1}, y, \lambda^{k}) + \frac{1}{2}\,\| y - y^{k} \|_{D_{\tau}}^{2} \right\}, \\[2pt]
\lambda^{k+1} = \lambda^{k} - \beta\,(A x^{k+1} + B y^{k+1} - b),
\end{cases} \tag{9}
\]
where $D_{\tau} = \tau r I - \beta B^{\top} B$ with $\tau \in (0, 1)$ and $r > \beta\,\|B^{\top}B\|$.
He et al. [22] proposed an optimal linearized ADMM with the following iterative steps:
\[
\begin{cases}
x^{k+1} = \arg\min_{x \in \mathcal{X}} \ \mathcal{L}_{\beta}(x, y^{k}, \lambda^{k}), \\[2pt]
y^{k+1} = \arg\min_{y \in \mathcal{Y}} \left\{ \mathcal{L}_{\beta}(x^{k+1}, y, \lambda^{k}) + \frac{1}{2}\,\| y - y^{k} \|_{D_{\tau}}^{2} \right\}, \\[2pt]
\lambda^{k+1} = \lambda^{k} - \beta\,(A x^{k+1} + B y^{k+1} - b),
\end{cases} \tag{10}
\]
where $D_{\tau} = \tau r I - \beta B^{\top} B$ with $\tau > 0.75$ and $r > \beta\,\|B^{\top}B\|$.
It is easy to see that the matrix $D_{\tau}$ is not necessarily positive semidefinite. In that paper, the authors prove that 0.75 is the optimal lower bound for $\tau$.
Another important issue for ADMM is to design an accelerated version of the original ADMM by slightly modifying it through a simple relaxation scheme. Note that ADMM is closely related to the proximal point algorithm (PPA) [23]. For the PPA, the convergence rate is improved if an additional over-relaxation step is applied to the basic variables; see Tao and Yuan [24] for a relaxed variant of the PPA and a proof of its linear convergence. Gu and Yang [25] further proved the optimal linear convergence rate of the relaxed PPA under regularity conditions. As Boyd et al. showed in [3], the execution of ADMM relies only on the input $(y^{k}, \lambda^{k})$ and does not require $x^{k}$ at all. Thus, $x$ plays the role of an intermediate variable in (2), while $(y, \lambda)$ is the basic variable. It is therefore natural to ask whether a relaxation step on the basic variables can be incorporated into the ADMM scheme (2) to obtain a faster ADMM-type method. This idea leads to:
\[
\begin{cases}
x^{k+1} = \arg\min_{x \in \mathcal{X}} \ \mathcal{L}_{\beta}(x, y^{k}, \lambda^{k}), \\[2pt]
\tilde{y}^{k} = \arg\min_{y \in \mathcal{Y}} \ \mathcal{L}_{\beta}(x^{k+1}, y, \lambda^{k}), \\[2pt]
\tilde{\lambda}^{k} = \lambda^{k} - \beta\,(A x^{k+1} + B \tilde{y}^{k} - b),
\end{cases} \tag{11}
\]
\[
\begin{pmatrix} y^{k+1} \\ \lambda^{k+1} \end{pmatrix}
= \begin{pmatrix} y^{k} \\ \lambda^{k} \end{pmatrix}
- \gamma \begin{pmatrix} y^{k} - \tilde{y}^{k} \\ \lambda^{k} - \tilde{\lambda}^{k} \end{pmatrix}, \tag{12}
\]
where $\gamma \in (0, 2)$ is the relaxation factor. Based on the above discussion, in this paper we introduce an adaptive linearized ADMM with a relaxation step, which incorporates both a relaxation step and an adaptive technique. Our approach adopts a unified variational inequality framework and establishes the global convergence of the proposed method through theoretical analysis. By solving the Lasso problem, we demonstrate the algorithm's favorable numerical performance. These findings advance the study of optimization algorithms and offer more efficient tools for addressing practical problems.
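A minimal sketch of the relaxation step (12) on the basic variables, assuming the prediction step (11) has produced the pair denoted here by `y_tilde`, `lam_tilde`; setting `gamma = 1` recovers the plain ADMM update.

```python
def relaxation_step(y, lam, y_tilde, lam_tilde, gamma):
    """Relaxation step (12) on the basic variables (y, lambda).

    (y_tilde, lam_tilde) is the output of the prediction step (11);
    gamma is the relaxation factor, assumed to lie in (0, 2).
    """
    y_new = y - gamma * (y - y_tilde)
    lam_new = lam - gamma * (lam - lam_tilde)
    return y_new, lam_new
```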
The rest of the paper is organized as follows. Section 2 gives some preliminaries; Section 3 gives iterative steps and proof of convergence of the algorithm; Section 4 gives the numerical experiments; and Section 5 draws the conclusions.
2 Preliminaries
We present some preliminaries that will be used in convergence analysis.
In this article, $\|\cdot\|$ denotes the Euclidean norm, and $\|x\|_{G}^{2} = x^{\top} G x$ denotes the $G$-norm of a vector $x$, where $G$ is a symmetric positive definite matrix. When $G$ is not positive definite, we still use the same notation.
2.1 Characterization via variational inequalities
Define the Lagrangian function corresponding to problem (1) as:
\[
L(x, y, \lambda) = f(x) + g(y) - \lambda^{\top}(Ax + By - b). \tag{13}
\]
In (13), $(x, y)$ and $\lambda$ denote the primal and dual variables, respectively.
If $(x^{*}, y^{*}, \lambda^{*})$ satisfies the following inequalities:
\[
L(x^{*}, y^{*}, \lambda) \;\le\; L(x^{*}, y^{*}, \lambda^{*}) \;\le\; L(x, y, \lambda^{*}), \qquad \forall\, x \in \mathcal{X},\ y \in \mathcal{Y},\ \lambda \in \mathbb{R}^{m},
\]
then $(x^{*}, y^{*}, \lambda^{*})$ is called a saddle point of $L(x, y, \lambda)$.
The above inequality is equivalent to the following variational inequality form:
\[
\begin{cases}
x^{*} \in \mathcal{X}, & f(x) - f(x^{*}) + (x - x^{*})^{\top}(-A^{\top}\lambda^{*}) \ge 0, \quad \forall\, x \in \mathcal{X}, \\[2pt]
y^{*} \in \mathcal{Y}, & g(y) - g(y^{*}) + (y - y^{*})^{\top}(-B^{\top}\lambda^{*}) \ge 0, \quad \forall\, y \in \mathcal{Y}, \\[2pt]
\lambda^{*} \in \mathbb{R}^{m}, & (\lambda - \lambda^{*})^{\top}(A x^{*} + B y^{*} - b) \ge 0, \quad \forall\, \lambda \in \mathbb{R}^{m}.
\end{cases} \tag{14}
\]
The above variational inequality can be written in the following form:
\[
w^{*} \in \Omega, \qquad \theta(u) - \theta(u^{*}) + (w - w^{*})^{\top} F(w^{*}) \ge 0, \qquad \forall\, w \in \Omega, \tag{15}
\]
where
\[
u = \begin{pmatrix} x \\ y \end{pmatrix}, \quad
w = \begin{pmatrix} x \\ y \\ \lambda \end{pmatrix}, \quad
\theta(u) = f(x) + g(y), \quad
F(w) = \begin{pmatrix} -A^{\top}\lambda \\ -B^{\top}\lambda \\ Ax + By - b \end{pmatrix}, \quad
\Omega = \mathcal{X} \times \mathcal{Y} \times \mathbb{R}^{m}. \tag{16}
\]
Owing to
\[
(w_{1} - w_{2})^{\top}\bigl(F(w_{1}) - F(w_{2})\bigr) = 0, \qquad \forall\, w_{1}, w_{2} \in \Omega,
\]
$F$ is a monotone operator.
In this manner, problem (1) is reformulated as the variational inequality (15)–(16). We denote the solution set of (15) by $\Omega^{*}$ and assume that it is nonempty.
2.2 Some notation
For the convenience of the proof, we first define some auxiliary variables and matrices. Let
(17) |
(18) |
(19) |
(20) |
Lemma 2.1. The and defined in (17) and (18) satisfy:
(21) |
Proof: (21) holds by direct verification.
Lemma 2.2 (Robbins–Siegmund theorem [26]). Let $\{a_{k}\}$, $\{b_{k}\}$, $\{c_{k}\}$ and $\{\eta_{k}\}$ be non-negative sequences satisfying
\[
a_{k+1} \le (1 + \eta_{k})\, a_{k} - c_{k} + b_{k}, \qquad k = 0, 1, 2, \ldots \tag{22}
\]
If $\sum_{k=0}^{\infty} \eta_{k} < \infty$ and $\sum_{k=0}^{\infty} b_{k} < \infty$, then $\lim_{k \to \infty} a_{k}$ exists and is finite, and moreover $\sum_{k=0}^{\infty} c_{k} < \infty$.
Definition 2.1. Let $\theta$ be a (possibly nonsmooth) convex function on a convex set $\mathcal{C}$. If a vector $\xi$ satisfies
\[
\theta(z) \ge \theta(z_{0}) + \xi^{\top}(z - z_{0}), \qquad \forall\, z \in \mathcal{C},
\]
then $\xi$ is said to be a subgradient of $\theta$ at $z_{0}$. The set of all subgradients of $\theta$ at $z_{0}$ is called the subdifferential of $\theta$ at $z_{0}$, denoted $\partial \theta(z_{0})$.
2.3 Stopping criterion
In the context of the two-block separable convex optimization problem (1), we investigate the optimality conditions for each subproblem during the iterative process of the alternating direction method of multipliers.
According to the optimality conditions, if $(x^{*}, y^{*})$ is an optimal solution of the convex optimization problem (1) and $\lambda^{*}$ is the corresponding Lagrange multiplier, then the following conditions hold:
\[
0 \in \partial f(x^{*}) - A^{\top} \lambda^{*}, \tag{23}
\]
\[
0 \in \partial g(y^{*}) - B^{\top} \lambda^{*}, \tag{24}
\]
\[
A x^{*} + B y^{*} - b = 0. \tag{25}
\]
Condition (25) is referred to as the primal feasibility condition, while conditions (23) and (24) are the dual feasibility conditions.
According to the optimality condition for the $y$-subproblem in (2), we have:
\[
0 \in \partial g(y^{k+1}) - B^{\top}\lambda^{k} + \beta B^{\top}(A x^{k+1} + B y^{k+1} - b)
  = \partial g(y^{k+1}) - B^{\top}\lambda^{k+1}, \tag{26}
\]
so the dual feasibility condition (24) is satisfied at every iteration.
According to the optimality condition for the $x$-subproblem in (2), we have:
\[
0 \in \partial f(x^{k+1}) - A^{\top}\lambda^{k} + \beta A^{\top}(A x^{k+1} + B y^{k} - b). \tag{27}
\]
From the definition of $\lambda^{k+1}$ in (2), the above relation can be rewritten as:
\[
0 \in \partial f(x^{k+1}) - A^{\top}\bigl[\lambda^{k} - \beta(A x^{k+1} + B y^{k+1} - b)\bigr] + \beta A^{\top} B\,(y^{k} - y^{k+1}). \tag{28}
\]
(28) is equivalent to:
\[
\beta A^{\top} B\,(y^{k+1} - y^{k}) \in \partial f(x^{k+1}) - A^{\top}\lambda^{k+1}. \tag{29}
\]
Comparing (29) with condition (23), the additional term is $\beta A^{\top} B\,(y^{k+1} - y^{k})$. Thus, to verify dual feasibility, it suffices to examine the residual $s^{k+1} = \beta A^{\top} B\,(y^{k+1} - y^{k})$.
In summary, to ascertain the convergence of ADMM, it suffices to check that the two residuals $r^{k+1}$ and $s^{k+1}$ are sufficiently small, where
\[
r^{k+1} = A x^{k+1} + B y^{k+1} - b, \qquad s^{k+1} = \beta A^{\top} B\,(y^{k+1} - y^{k}).
\]
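A minimal sketch of this stopping check, assuming the residual definitions reconstructed above and user-chosen tolerances:

```python
import numpy as np

def admm_stopping_check(A, B, b, beta, x_new, y_new, y_old, eps_pri, eps_dual):
    """Check the ADMM stopping criterion via the primal and dual residuals.

    r = A x^{k+1} + B y^{k+1} - b            (primal feasibility violation)
    s = beta * A^T B (y^{k+1} - y^k)         (dual feasibility violation)
    """
    r = A @ x_new + B @ y_new - b
    s = beta * (A.T @ (B @ (y_new - y_old)))
    return np.linalg.norm(r) <= eps_pri and np.linalg.norm(s) <= eps_dual
```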
3 Algorithm and convergence analysis
3.1 New algorithms
Suppose that $f$ and $g$ are proper, closed, convex functions.
The main iterative steps of the algorithm are:
Algorithm 1. An adaptive linearized alternating direction method of multipliers with a relaxation step.
Set up: choose parameter sequence and , where and
Step 0.
Input: set up
Step 1. Calculate:
(30) |
where
(31) |
where
Step 2. If any one of the following conditions holds:
(32) | |||
where ,
then go to Step 3; otherwise, return to Step 1.
Step 3. If
then set Otherwise
Step 4. If any one of the following conditions holds:
where
then set Otherwise
Step 5. If the stopping criterion is satisfied, go to Step 6,
where the stopping criterion is:
(33) |
Otherwise, set $k := k + 1$ and return to Step 1.
Step 6. Output:
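Since the adaptive rules of Steps 2–4 depend on quantities defined in (30)–(32) that are not reproduced here, the following is only a structural sketch in the spirit of Algorithm 1: the x-update and the linearized proximal y-update follow (6), the relaxation step follows (12), and `update_r` is a placeholder for the paper's adaptive selection of the proximal parameter; all remaining details are assumptions.

```python
import numpy as np

def adaptive_linearized_admm_relaxed(argmin_x, prox_g, update_r, A, B, b, beta,
                                     r0, gamma, x0, y0, lam0,
                                     max_iter=500, tol=1e-6):
    """Structural sketch only; the paper-specific adaptive rules are delegated
    to the placeholder update_r and are NOT reproduced here.

    argmin_x(y, lam): user-supplied solver of the x-subproblem.
    prox_g(v, t):     proximal operator of g (linearized y-step, cf. (6)).
    update_r(r, k):   placeholder for the adaptive proximal parameter rule.
    gamma:            relaxation factor of the relaxation step (12), assumed in (0, 2).
    """
    x, y, lam, r = x0, y0, lam0, r0
    for k in range(max_iter):
        y_old = y
        # Prediction: x-update and linearized proximal y-update (assumed form).
        x = argmin_x(y, lam)
        q = B.T @ (beta * (A @ x + B @ y - b) - lam)
        y_tilde = prox_g(y - q / r, 1.0 / r)
        lam_tilde = lam - beta * (A @ x + B @ y_tilde - b)
        # Relaxation step on the basic variables (y, lambda), cf. (12).
        y = y - gamma * (y - y_tilde)
        lam = lam - gamma * (lam - lam_tilde)
        # Adaptive selection of the proximal parameter (paper-specific, omitted).
        r = update_r(r, k)
        # Stopping test in the spirit of Section 2.3.
        res_p = np.linalg.norm(A @ x + B @ y - b)
        res_d = beta * np.linalg.norm(A.T @ (B @ (y - y_old)))
        if max(res_p, res_d) < tol:
            break
    return x, y, lam
```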
3.2 Global convergence analysis
In this section, we prove the global convergence of the proposed method. Before proceeding, we need the following lemmas.
Lemma 3.1. Let the sequence be the iterative sequence generated by Algorithm 1, and let be defined in (19). Then, for any , we have:
(34) |
Proof: By the optimality condition for the subproblem in Algorithm 1, for any , we have:
Using the definition of in (19), we obtain:
(35) |
From the optimality condition for the -subproblem in Algorithm 1, it follows that for any , we have:
Also by the definition of , the above equation can be written as:
(36) |
By the definition of in (19), we have:
The above equation can be written as:
(37) |
The lemma follows by combining (35), (36) and (37) together with the notation .
Lemma 3.2. Let the sequence be the iterative sequence generated by Algorithm 1 and be defined in (19), then:
(38) |
where is defined in (17).
Proof: This follows from (31) and the definitions of and :
This can be seen in combination with (31):
Therefore, equation (38) holds.
It is clear from (38) and (21):
Lemma 3.3. Let the sequence be the iterative sequence generated by Algorithm 1 and be defined in (19). Then, we have:
(39) |
where the matrix
Proof: Using the equation:
(40) |
In (40), let , , , . Then we obtain:
(41) |
For the last term in (41), we have:
(42) |
Now, let us examine the properties of the matrix . Utilizing (21), we can derive:
(43) |
To simplify the proof, we give some properties of the parameters as follows:
From the iterative format of Algorithm 1, the sequence in Algorithm 1 satisfies:
Let the th step ultimately satisfy the parameters of Step 4 as , where is an integer.
Let that is
then:
By the parameter settings, we know that
Letting gives , that is:
Theorem 3.1. Let the sequence be the iterative sequence generated by Algorithm 1 and be defined in (19). Then, we have:
(44) |
where and
Proof: In the following, we will further investigate the term and show how it can be bounded.
Since and , we have:
(45) |
Also because and
Therefore, we have:
(46) |
It is clear from (31):
Substituting the above equation into (46) gives:
(47) |
In the following, we estimate .
From the inequality , for , we have:
(48) |
Substituting (48) into (47) shows that:
(49) |
Owing to:
(50) |
It follows from (50) and (34):
Letting in the above equation and combining with (15), we obtain:
(51) |
It is clear from (21) and (38):
(52) |
It is clear from (39):
(53) |
It follows from (51), (52) and (53):
(54) |
Substituting (49) into (54) gives:
(55) |
where
For to hold, it is sufficient that: and
is equivalent to
So (55) can be written as:
Therefore, Theorem 3.1 is proved.
Theorem 3.2. Let the sequence be the iterative sequence generated by Algorithm 1. Taking any point in , we have:
(a). exists and
(b). and
Proof: Let
By Lemma 2.2, (a) and (b) hold.
Theorem 3.3. Let the sequence be the iterative sequence generated by Algorithm 1, then converges to a point .
Proof: It follows from Theorem 3.2:
It follows from combining (38) and the non-singularity of :
Since the sequence is bounded, it follows that for any fixed , the sequence is bounded.
Because , it follows that is bounded.
Clearly, the sequence is also bounded, so there exists a cluster point such that some subsequence of the sequence converges to it.
From in (19), we have:
Since the matrix $A$ has full column rank, it follows that the sequence converges.
Set then there exists a subsequence converging to .
Let in (34), we know that
,
Letting in the above relation, it is clear that:
(56) |
(56) shows that is a solution of the variational inequality (15), so the sequence converges to .
4 Numerical experiments
In this section, we present the numerical performance of Algorithm 1 by solving the Lasso problem and comparing it with the optimal linearized ADMM (OLADMM) [22]. All experiments were conducted in Matlab R2016a on a laptop with 4 GB of RAM.
The Lasso problem [12] in statistics is as follows:
\[
\min_{x} \ \mu\,\| x \|_{1} + \frac{1}{2}\,\| A x - b \|^{2}, \tag{57}
\]
where $\mu > 0$ is the regularization parameter, and $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$ are given data.
Then applying Algorithm 1 to (57), we obtain:
The -subproblem is an unconstrained convex quadratic minimization problem, and its unique solution can be expressed as follows:
The -subproblem is a minimization problem that can be solved using the soft-threshold operator:
The solution to the -subproblem is:
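For reference, a minimal sketch of the two subproblem solvers for a generic two-block splitting of (57): a linear solve for the smooth quadratic block and componentwise soft-thresholding for the ℓ1 block. The variable names and the exact form of the quadratic subproblem are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding operator: proximal mapping of t*||.||_1 at v."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def quadratic_block_solve(A, b, beta, rhs):
    """Solve an unconstrained quadratic subproblem of the illustrative form
    min_z 0.5*||A z - b||^2 + (beta/2)*||z||^2 - rhs^T z,
    whose optimality condition is the linear system below."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + beta * np.eye(n), A.T @ b + rhs)
```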
Numerical experiments are conducted to compare the performance of Algorithm 1 with OLADMM.
Set the parameters of each algorithm as follows:
Algorithm 1: , ,,, ,, ,, , , ,
,
where = the dimension of .
.
The stopping criteria are:
where
and
with and set to be and
For a matrix $A$ of given dimension, we generate the data randomly as follows:
The initial point is .
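The paper's exact data-generation scheme is not reproduced above; the sketch below shows a typical random setup for Lasso test problems (Gaussian measurement matrix, sparse ground truth, small additive noise), offered purely as an illustrative assumption.

```python
import numpy as np

def generate_lasso_data(m, n, density=0.1, noise_std=1e-3, seed=0):
    """Illustrative random Lasso instance (assumed setup, not the paper's exact one)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    support = rng.choice(n, size=max(1, int(density * n)), replace=False)
    x_true[support] = rng.standard_normal(support.size)
    b = A @ x_true + noise_std * rng.standard_normal(m)
    return A, b, x_true
```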
In order to observe the effect of different values of on the experiments, we chose to carry out the experiments; the results show that when , the results are satisfactory. In this paper, we take .
The experimental results of the two algorithms are shown below:
Table 1. Comparison between Algorithm 1 and OLADMM for (57).

matrix       | OLADMM Iter. | OLADMM CPU(s) | Algorithm 1 Iter. | Algorithm 1 CPU(s)
1000 × 1500  | 16           | 3.23          | 11                | 2.38
1500 × 1500  | 13           | 2.81          | 10                | 2.53
1500 × 3000  | 17           | 7.03          | 10                | 6.18
2000 × 3000  | 14           | 7.16          | 10                | 6.57
3000 × 3000  | 13           | 7.80          | 9                 | 7.40
3000 × 5000  | 15           | 24.96         | 10                | 23.44
4000 × 5000  | 13           | 25.01         | 10                | 24.69
5000 × 5000  | 12           | 26.34         | 9                 | 25.73
Table 1 reports the number of iterations and the total computation time required by OLADMM and Algorithm 1. Algorithm 1 consistently requires fewer iterations and less time to meet the termination condition, and is therefore more efficient than OLADMM. These results also support our convergence analysis.
5 Conclusion
In this paper, we have introduced an adaptive linearized ADMM with a relaxation step, which combines a relaxation step with an adaptive technique for selecting the proximal matrix. Within a unified variational inequality framework, we established the global convergence of the proposed method. Numerical experiments on the Lasso problem demonstrate the algorithm's favorable performance. These results offer more efficient tools and methodologies for addressing practical problems.
Conflict of interest: The authors declare that they have no conflict of interest.
References
- Tao et al. [2009] Tao, M., Yang, J., He, B.: Alternating direction algorithms for total variation deconvolution in image reconstruction. TR0918, Department of Mathematics, Nanjing University (2009)
- Ng et al. [2010] Ng, M.K., Weiss, P., Yuan, X.: Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM journal on Scientific Computing 32(5), 2710–2736 (2010)
- Boyd et al. [2011] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3(1), 1–122 (2011)
- Combettes and Pesquet [2007] Combettes, P.L., Pesquet, J.-C.: A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing 1(4), 564–574 (2007)
- Gabay and Mercier [1976] Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications 2(1), 17–40 (1976)
- Glowinski and Marrocco [1974] Glowinski, R., Marrocco, A.: Analyse numerique du champ magnetique d’un alternateur par elements finis et sur-relaxation ponctuelle non lineaire. Computer Methods in Applied Mechanics & Engineering 3(1), 55–85 (1974)
- He [2018] He, B.: My 20 years research on alternating directions method of multipliers. Oper. Res. Trans 22(1), 1–31 (2018)
- Eckstein and Yao [2017] Eckstein, J., Yao, W.: Approximate ADMM algorithms derived from Lagrangian splitting. Computational Optimization and Applications 68, 363–405 (2017)
- He et al. [2002] He, B., Liao, L.-Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Mathematical Programming 92, 103–118 (2002)
- Ng et al. [2011] Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM Journal on Scientific Computing 33(4), 1643–1668 (2011)
- Chan et al. [2012] Chan, R.H., Tao, M., Yuan, X.: Linearized alternating direction method of multipliers for constrained linear least-squares problem. East Asian Journal on Applied Mathematics 2(4), 326–341 (2012)
- Tibshirani [1996] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288 (1996)
- Recht et al. [2010] Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review 52(3), 471–501 (2010)
- Tao and Yuan [2011] Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM Journal on Optimization 21(1), 57–81 (2011)
- Fang et al. [2015] Fang, E.X., He, B., Liu, H., Yuan, X.: Generalized alternating direction method of multipliers: new theoretical insights and applications. Mathematical programming computation 7(2), 149–187 (2015)
- Woo and Yun [2013] Woo, H., Yun, S.: Proximal linearized alternating direction method for multiplicative denoising. SIAM Journal on Scientific Computing 35(2), 336–358 (2013)
- He and Yuan [2013] He, B., Yuan, X.: Linearized alternating direction method of multipliers with Gaussian back substitution for separable convex programming. Numerical Algebra, Control and Optimization 3(2), 247–260 (2013)
- Ouyang et al. [2015] Ouyang, Y., Chen, Y., Lan, G., Pasiliao Jr, E.: An accelerated linearized alternating direction method of multipliers. SIAM Journal on Imaging Sciences 8(1), 644–681 (2015)
- Chan et al. [2012] Chan, R.H., Tao, M., Yuan, X.: Linearized alternating direction method of multipliers for constrained linear least-squares problem. East Asian Journal on Applied Mathematics 2(4), 326–341 (2012)
- Wang and Yuan [2012] Wang, X., Yuan, X.: The linearized alternating direction method of multipliers for dantzig selector. SIAM Journal on Scientific Computing 34(5), 2792–2811 (2012)
- He et al. [2016] He, B., Ma, F., Yuan, X.: Linearized alternating direction method of multipliers via positive-indefinite proximal regularization for convex programming. Available at Optimization Online (2016). https://optimization-online.org
- He et al. [2020] He, B., Ma, F., Yuan, X.: Optimally linearizing the alternating direction method of multipliers for convex programming. Computational Optimization and Applications 75(2), 361–388 (2020)
- He [2015] He, B.: PPA-like contraction methods for convex optimization: a framework using the variational inequality approach. Journal of the Operations Research Society of China 3, 391–420 (2015)
- Tao and Yuan [2018] Tao, M., Yuan, X.: On the optimal linear convergence rate of a generalized proximal point algorithm. Journal of Scientific Computing 74, 826–850 (2018)
- Gu and Yang [2019] Gu, G., Yang, J.: On the optimal linear convergence factor of the relaxed proximal point algorithm for monotone inclusion problems. arXiv preprint arXiv:1905.04537 (2019)
- Robbins and Siegmund [1971] Robbins, H., Siegmund, D.: A convergence theorem for non-negative almost supermartingales and some applications. Optimizing Methods in Statistics, 233–257 (1971)