A Modified Distributed Optimization Algorithm via Alternating Direction Method of Multipliers
Abstract
The Alternating Direction Method of Multipliers (ADMM) is widely adopted for solving distributed optimization problems (DOPs).
In this paper, a modified ADMM algorithm is proposed for unconstrained distributed optimization problems. Our algorithm allows agents to update their local states and dual variables in a completely distributed and parallel manner by adding extra designed terms to the sequential ADMM of [22].
Moreover, the updating rules and the storage scheme for the variables are illustrated.
It is shown that all agents can reach a consensus by asymptotically converging to the optimal solution.
Besides, the global cost function converges to the optimal value at a rate of $O(1/k)$.
A numerical simulation is given to show the effectiveness of the proposed algorithm.
Keywords: ADMM, distributed optimization, parallel algorithm.
1 Introduction
Optimization in networked multi-agent systems has widespread applications including wireless sensor networks [1], the internet of things [2], distributed privacy preservation [3], distributed Nash equilibrium seeking [4, 5] and machine learning [6]. Such a problem was initially handled in a centralized manner [7, 8, 9], which means that a centralized fusion center is required to collect and process the information of all the agents through an underlying communication network. Clearly, tremendous communication and computation resources need to be consumed to implement centralized optimization methods [10]. For instance, in linear regression problems, it is often prohibitive to gather all the data from different applications of interest [9]. In economic dispatch (ED) problems [11, 12], a centralized manner cannot satisfy the plug-and-play demand.
To remove the mentioned limitations, the distributed optimization problem (DOP) is considered. A common DOP is formulated as follows,
$$\min_{x} \ \sum_{i=1}^{N} f_i(x) \qquad (1)$$
where $x$ is the global variable to be optimized and $f_i$ is the local cost function privately known by agent $i$, while unknown to any other agent $j \neq i$. In each agent, an estimate of the optimal solution, which is also known as the state, is generated by applying an iterative algorithm. Then the states often need to be transmitted among networked agents for the purpose of reaching a consensus as the algorithm runs. Finally, the global optimization problem (1) can be solved through collaboration among different agents.
To solve the DOP, two main classes of distributed optimization algorithms have been proposed, namely subgradient-based methods and dual-based methods. In the former class, a subgradient computation needs to be conducted in each iteration [13]. Each agent updates its state with the local subgradient information and states collected from other agents. The subgradient-based methods have been developed for various scenarios [14, 15, 16, 17, 18, 19]. In the dual-based methods, a Lagrangian-related function needs to be constructed in the update steps. The Alternating Direction Method of Multipliers (ADMM) is a well-known dual-based method, of which the early works can be traced back to [20, 21, 35]. By treating the states as primal variables, ADMM minimizes an augmented Lagrangian function and updates both primal and dual variables alternately. In recent years, ADMM has become a popular way to solve DOPs in various areas. Compared with the subgradient-based methods, ADMM-based methods can achieve a faster convergence rate [7, 25] and improved robustness against Gaussian noise [37].
It is worth noting that for some early works on ADMM-based algorithms developed to solve optimization problems in multi-agent systems, the utilization of centralized fusion units cannot be avoided [7]. In [23], an ADMM-based optimization algorithm named Jacobi-Proximal ADMM is proposed. It adds a proximal term for each primal variable and a damping parameter for the update step of the dual variable. Though the method shows satisfactory performance, the dual variables need to be handled in a centralized manner. To avoid the need for centralized data processing, some distributed ADMM-based algorithms have been proposed. A sequential ADMM is proposed in [22], in which each agent updates its states by following a predefined order. Both primal and dual variables are computed in a distributed manner. However, the predefined updating order may cost more computational resources, which makes it not well suited for large-scale systems. A novel algorithm named parallel ADMM is proposed in [8]. It duplicates the dual variables for each edge and doubles the dimension of the edge-node incidence matrix to enable parallelism, hence the algorithm may need extra storage space.
In this paper, the sequential ADMM algorithm in [22] is modified by adding extra terms to the primal variable updating rule. The primal variables from different iterations are then adopted to update the dual variable of each agent. The main features of the newly presented algorithm are summarized as follows:
1) The presented algorithm is fully distributed, in the sense that a central unit for data collection and processing is not needed for solving the optimization problem. The update of both primal and dual variables for each agent only depends on its local and neighboring information.
2) The algorithm is suitable for parallel computing. By modifying the existing algorithm in [22], each agent updates its primal variable in a parallel manner rather than in a preset sequential order.
The remainder of our paper is organized as follows. In Section II, the considered unconstrained DOP is presented and the sequential ADMM algorithm in [22] is reviewed. In Section III, the modified ADMM for solving Problem (1) in a distributed and parallel manner is developed. In Section IV, numerical simulation results are provided to validate the effectiveness of the proposed method, followed by the conclusion drawn in Section V.
Notation: For a column vector $x$, $x^{\top}$ denotes the transpose of $x$. $\|x\|$ denotes the standard Euclidean norm, $\|x\| = \sqrt{x^{\top}x}$. For a matrix $A$, $A^{\top}$ denotes the transpose of $A$, $[A]_{ij}$ denotes the entry located on the $i$th row and $j$th column of $A$, and $[A]_{:,j}$ and $[A]_{i,:}$ denote the $j$th column and $i$th row of $A$, respectively. $\mathbf{0}$ represents a matrix of all zeros. $\partial f(x)$ represents the set of all subgradients of the function $f$ at $x$. For a nonempty and finite set $S$, $|S|$ denotes the cardinality of the set.
2 Problem Statement
In this paper, we shall revise the distributed ADMM algorithm in [22], such that it can be implemented in a parallel manner. Before we present the details of our proposed algorithm, some necessary preliminaries are provided in this section.
2.1 Problem Statement
In this paper, an unconstrained distributed optimization problem, as shown in (1), will be considered. To be more specific, $N$ agents indexed by $i \in \{1,\dots,N\}$ aim to solve the optimization problem collaboratively under a fixed undirected graph $G = (V, E)$, where $V$ and $E$ denote the set of nodes and the set of edges connecting two distinct agents, respectively. Suppose that there is at most one edge $e_{ij}$ connecting agents $i$ and $j$; we prescribe that the first subscript and the second subscript satisfy $i < j$. For example, if agent 1 and agent 2 are connected, then $e_{12}$ is the sole edge between them.
Let us recall the basic idea of solving a distributed optimization problem [7, 13, 22]
$$\min_{x} \ \sum_{i=1}^{N} f_i(x) \qquad (2)$$
where $x$ is the global variable to be optimized and $f_i$ is the local cost function stored in agent $i$.
The goal of Problem (2) is to find the minimizer $x^{\ast}$, such that the function $\sum_{i=1}^{N} f_i(x)$ is minimized at $x = x^{\ast}$. To develop a distributed algorithm for solving Problem (2), this problem often has to be reformulated to accommodate the distributed setting. Normally, each agent $i$ holds an estimate of $x^{\ast}$, denoted by $x_i$. As the optimization algorithm proceeds, each agent $i$, for $i \in \{1,\dots,N\}$, utilizes its local estimate and the information received from its neighbors up to the current iteration to update its local estimate for the next iteration. After sufficiently many iterations, the estimates of all agents tend to reach a consensual value. The form of Problem (2) can thus be changed as follows,
$$\min_{x_1,\dots,x_N} \ \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \ x_i = x_j, \ \forall e_{ij} \in E \qquad (3)$$
The linear constraint enforces the consensus mentioned above; Problem (3) and Problem (2) have the same solution.
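As a quick sanity check of this equivalence, consider hypothetical quadratic local costs $f_i(x) = (x - a_i)^2$ (illustrative only, not the paper's cost functions). The consensus-constrained reformulation recovers the same minimizer as the original single-variable problem:

```python
import numpy as np

# Hypothetical local costs f_i(x) = (x - a_i)^2, for illustration only.
a = np.array([1.0, 2.0, 3.0, 4.0])

# Problem (2): minimize sum_i f_i(x) over one global variable x.
# For these quadratics the minimizer is the average of the a_i.
x_star = a.mean()

# Problem (3): one copy x_i per agent, constrained to agree on every edge.
# Under the consensus constraint the copies collapse to a single value, so a
# brute-force search over consensual values recovers the same minimizer.
grid = np.linspace(0.0, 5.0, 50001)
objective = ((grid[None, :] - a[:, None]) ** 2).sum(axis=0)
x_consensus = grid[objective.argmin()]

assert abs(x_consensus - x_star) < 1e-3
print(x_star)  # 2.5
```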
2.2 Distributed Sequential ADMM Algorithm [22]
To solve Problem (3), a distributed sequential ADMM algorithm is presented in [22]. Firstly, the updating rules of agent $i$ in the $k$th iteration are developed as follows,
(4)
(5)
where $\rho$ is a positive penalty parameter, $\lambda_{ij}$ denotes the dual variable associated with the constraint on edge $e_{ij}$, and $N_i$ is the set of neighbors of agent $i$. Furthermore, the subsets of neighbors $P_i = \{j \in N_i : j < i\}$ and $S_i = \{j \in N_i : j > i\}$ are called the predecessors and successors of agent $i$, respectively. Take the sample graph of 4 agents in Fig. 1 as an example.
The detailed procedure of the distributed sequential ADMM algorithm developed in [22] is provided below.
Remark 1
The distributed sequential ADMM algorithm in [22] is developed based on the standard ADMM [7]. The distributed feature of the algorithm can be observed from the updating rules (4)-(5). More specifically, only locally available information received from its neighbors is needed to update both the primal and the dual variables of each agent. In other words, a central node is not necessary for data collection and processing to implement the algorithm.
Nevertheless, the updating process in (4) is calculated in a sequential order from agent to agent. Each agent needs to collect its predecessors' information obtained in the current iteration to update its own local state in that iteration. It implies that the updating processes for the entire group must be operated from the first agent to the last in a sequential order, as depicted in Fig. 2. Hence, the sequential ADMM algorithm may not be suitable for solving distributed optimization problems in large-scale multi-agent systems, since the total waiting time increases to an unacceptable level when the number of agents is large.
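The waiting-time argument can be made concrete with a small scheduling sketch (hypothetical unit-time updates, not a model of any specific hardware): because agent $i$ must wait for all its predecessors within every iteration, the sequential scheme needs $N$ rounds per iteration while a parallel scheme needs one.

```python
def rounds_needed(num_agents: int, num_iterations: int, parallel: bool) -> int:
    """Count unit-time rounds when each agent's update takes one round.

    Sequential: within every iteration, agent i waits for agents 1..i-1,
    so one iteration costs num_agents rounds.
    Parallel: all agents update simultaneously, one round per iteration.
    """
    rounds = 0
    for _ in range(num_iterations):
        rounds += 1 if parallel else num_agents
    return rounds

# For 100 agents and 50 iterations the gap is a factor of N = 100.
assert rounds_needed(100, 50, parallel=False) == 5000
assert rounds_needed(100, 50, parallel=True) == 50
```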

3 A Modified Distributed Parallel ADMM algorithm
To remove the limitation of distributed sequential ADMM algorithm [22] as summarized in Section II, a new distributed ADMM algorithm is presented in this section. We first present the updating rules of each agent in the th iteration:
(6)
(7)
where $\rho$ is a positive penalty parameter and the additional coefficient is a constant restricted to a prescribed range. The predecessor and successor sets $P_i$ and $S_i$ are defined the same as in Section II. $\lambda_{ij}$ denotes the dual variable associated with the constraint on edge $e_{ij}$. Notice that, according to the subscripts, the dual variables are stored in different agents in the two algorithms. The detailed procedure of the newly presented distributed ADMM algorithm is summarized below.
Remark 2
Note that the definitions of the predecessors and successors in the distributed sequential ADMM algorithm of [22] and in our newly proposed algorithm are mathematically the same.
The proposed algorithm changes the sequential updating manner to a parallel one. As observed from (6), no terms related to the current iteration of other agents are needed to update the local state. Hence all the agents can update both primal and dual variables simultaneously, rather than waiting for their predecessors to complete their updating procedures. To ease understanding of the major difference between Algorithms 1 and 2, the parallel updating manner for the considered group of 4 agents is plotted in Fig. 3.
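Since the exact updates (6)-(7) are not reproduced above, the parallel idea can be illustrated with the global-variable consensus ADMM of [7] (a stand-in for, not the same as, the proposed algorithm): every agent's primal and dual update within an iteration uses only its own data, so all updates can run simultaneously. Hypothetical quadratic costs $f_i(x) = \tfrac{1}{2}(x - a_i)^2$ are assumed:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical data, f_i(x) = 0.5*(x - a_i)**2
rho = 1.0                             # penalty parameter
N = len(a)

x = np.zeros(N)   # primal variables, one per agent
u = np.zeros(N)   # scaled dual variables, one per agent
z = 0.0           # consensus variable

for _ in range(200):
    # x-update: each agent solves argmin_x f_i(x) + (rho/2)(x - z + u_i)^2.
    # For quadratic f_i this has the closed form below; it uses only the
    # agent's own data, so all N updates can run in parallel.
    x = (a + rho * (z - u)) / (1.0 + rho)
    z = np.mean(x + u)                # consensus (averaging) step
    u = u + x - z                     # dual update, also fully parallel

assert np.allclose(x, a.mean(), atol=1e-6)
print(x)  # all entries approach 2.5
```

With strongly convex quadratics the iterates contract geometrically, so 200 iterations is far more than enough for the agents to agree on the minimizer of the summed cost.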
4 Convergence Analysis
In this section, we shall firstly show that all the agents’ states will converge to a unique optimal solution as the number of iterations approaches infinity. Then the convergence rate will be analyzed.
Denote the global cost function by $F(\mathbf{x}) = \sum_{i=1}^{N} f_i(x_i)$, where $\mathbf{x} = [x_1, \dots, x_N]^{\top}$, and the optimal solution vector by $\mathbf{x}^{\ast}$.
Let the vector $\lambda$ be the Lagrange multiplier vector. For clarity, $\lambda$ is stacked by the dual variables $\lambda_{ij}$ with subscript pairs arranged in ascending lexicographical order. Next, an edge-node incidence matrix $A$ as defined in [22] is reintroduced to describe the existence of edges between two distinct agents for a given graph. Suppose that edge $e_{ij}$ is the $q$th element of the ordered edge set; it corresponds to the $q$th row of $A$ (denoted by $[A]_{q,:}$). Note that the elements in $[A]_{q,:}$ satisfy $[A]_{qi} = 1$ and $[A]_{qj} = -1$, and all other entries are $0$. For example, with the edges ordered as above, the edge-node incidence matrix of Fig. 1 is shown as follows,
(8)
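The construction of $A$ can be sketched for a hypothetical 4-agent graph (the actual edge set of Fig. 1 is not reproduced here): each edge $e_{ij}$ with $i < j$ contributes one row with $+1$ in column $i$ and $-1$ in column $j$, so that $A\mathbf{x} = \mathbf{0}$ encodes $x_i = x_j$ on every edge.

```python
import numpy as np

def incidence_matrix(num_nodes, edges):
    """Edge-node incidence matrix: row q has +1 at i and -1 at j for edge (i, j).

    Edges are 1-indexed pairs sorted lexicographically, matching the ordering
    used to stack the dual variables.
    """
    A = np.zeros((len(edges), num_nodes))
    for q, (i, j) in enumerate(sorted(edges)):
        assert i < j, "subscripts are prescribed to satisfy i < j"
        A[q, i - 1] = 1.0
        A[q, j - 1] = -1.0
    return A

# Hypothetical edge set for a 4-agent graph (illustration only).
A = incidence_matrix(4, [(1, 2), (2, 3), (2, 4), (3, 4)])

# A x = 0 holds exactly when all entries of x agree (connected graph).
x_consensus = np.full(4, 7.0)
assert np.allclose(A @ x_consensus, 0.0)
# Every row sums to zero by construction.
assert np.allclose(A.sum(axis=1), 0.0)
```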
Now Problem (3) can be rewritten in the following compact form,
$$\min_{\mathbf{x}} \ F(\mathbf{x}) \quad \text{s.t.} \ A\mathbf{x} = \mathbf{0} \qquad (9)$$
Next, two auxiliary matrices are introduced for further illustration.
Denote the Lagrangian function of Problem (1) by $L(\mathbf{x}, \lambda)$; next, some typical assumptions are imposed.
Assumption 1: Each cost function $f_i$ is closed, proper and convex.
The next assumption states the existence of the saddle point.
Assumption 2: The Lagrangian function has a saddle point, i.e., there exists a solution-multiplier pair $(\mathbf{x}^{\ast}, \lambda^{\ast})$ with
(10)
The following lemma shows that Problem (3) can be transformed into a variational inequality, based on which we can complete the proof of optimality; this step is not presented in [22]. The details are demonstrated in this section.
Before establishing the main theorem, we first present three important lemmas, beginning with the following preliminary result.
Lemma 1 [23]: Problem (3) is equivalent to the following variational inequality,
(11)
where the operator and the constraint set are defined accordingly, and the starred point is the solution of the variational inequality (11).
The classification of neighbors in Section II and Section III serves the convergence proof of our algorithm and, without loss of generality, has a special property, shown in the next lemma. This property plays a crucial role in the convergence analysis. The proof of this lemma is provided in the Appendix.
Lemma 2: In an undirected graph with $N$ nodes, the following equation holds:
(12)
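The exact statement (12) is not reproduced above, but one counting identity of this kind (each edge $e_{ij}$ with $i < j$ contributes agent $i$ to $P_j$ and agent $j$ to $S_i$, so the predecessor and successor counts both sum to $|E|$) can be checked numerically; the graph below is hypothetical:

```python
import itertools, random

random.seed(0)
N = 8
# A random undirected graph on N nodes (hypothetical, for illustration).
edges = {(i, j) for i, j in itertools.combinations(range(1, N + 1), 2)
         if random.random() < 0.4}

# Predecessors/successors induced by the index ordering on neighbors.
P = {i: {j for j in range(1, N + 1) if (j, i) in edges} for i in range(1, N + 1)}
S = {i: {j for j in range(1, N + 1) if (i, j) in edges} for i in range(1, N + 1)}

# Each edge yields exactly one predecessor relation and one successor relation.
assert sum(len(P[i]) for i in P) == len(edges)
assert sum(len(S[i]) for i in S) == len(edges)
```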
Next a basic lemma for convergence analysis is established.
Lemma 3: The following inequality holds at every iteration:
(13)
Proof: We define as follows,
(14)
Then the gradient of is
(15)
where the equation holds by utilizing (7) and the relation above. Then, denoting the combined quantity accordingly, the stated identities hold, and the subgradient relation gives the following inequality:
(16)
By substituting with (4) we obtain
(17)
Note that the following relations hold:
(18)
(19)
(20)
(21)
where relations (19) and (20) hold due to the matrix relation .
The following theorem illustrates the main result in our algorithm that all agents’ states converge to the optimal point.
Theorem 1: The state generated by Algorithm 2 in each agent converges to the optimal solution of Problem (1), and the global cost function converges to the optimal value, i.e.,
$$\lim_{k\to\infty} x_i^{k} = x^{\ast}, \quad \forall i \in \{1,\dots,N\} \qquad (22)$$
$$\lim_{k\to\infty} F(\mathbf{x}^{k}) = F^{\ast} \qquad (23)$$
where $x^{\ast}$ and $F^{\ast}$ denote the optimal solution and the optimal value, respectively.
Proof: According to the relation derived from (7) and the optimality of the starred solution, the following relations hold:
(24)
(25)
(26)
(27)
Then the following relation holds:
(28)
Recalling (4), substituting the appropriate point and rearranging the terms, the following inequality is established:
(29)
where in the second inequality we use the saddle-point relation of the Lagrangian function.
Then rearranging and summing the preceding relation over the iterations yields
(31)
Inequality (4) implies the sequence is bounded, because the right-hand side is bounded and the left-hand side of (4) is nonnegative. Thus the following equation holds for any arbitrary constant.
(32)
Equation (4) implies that the states of the agents converge to the same point as the number of iterations goes to infinity. We now show that this point is the optimal point of the mentioned optimization Problem (1).
It is trivial to obtain that and from and . By summing up (4) over , for , and , we obtain that
(33)
Notice that the optimal solution vector satisfies , which implies the optimal solution is the consensual value for all , i.e., . For relation (4), now we let , then denote and . Then for arbitrary , inequality (4) can be written as
(34)
By applying Lemma 2, (4) can be simplified as
(35)
Relation (35) indicates that, for sufficiently large iteration counts, the limit vector is the solution to (11). Recall that Lemma 1 implies the solution of the preceding variational inequality is equivalent to the solution of our primary optimization problem. Using this lemma and the foregoing arguments, the proof of Theorem 1 is complete.
Remark 3
The term with the tunable coefficient in (6) is the standard form of our algorithm. It can also be proved that without that coefficient the algorithm still satisfies Theorem 1, but it shows worse simulation performance.
The next theorem reveals the convergence rate of our algorithm, which matches the rate reported in [22, 10].
Theorem 2: Algorithm 2 converges at a rate of $O(1/k)$.
Denote , summing up from to we have
(37)
Let . Since is convex, we have . Then we obtain
(38)
Finally, by rearranging (4), we obtain
(39)
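For reference, the standard shape of such an ergodic $O(1/K)$ bound (as established for the sequential scheme in [22]; the constant below is schematic, not the paper's exact expression) reads:

```latex
\bar{\mathbf{x}}^{K} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}^{k}, \qquad
F(\bar{\mathbf{x}}^{K}) - F^{\ast} \;\le\; \frac{C\!\left(\mathbf{x}^{0}, \lambda^{0}, \rho\right)}{K},
```

where, by convexity of $F$ (Jensen's inequality) and telescoping the per-iteration inequality, the constant $C$ collects the finite initial primal and dual distances to a saddle point; hence the objective error at the averaged iterate decays at rate $O(1/K)$.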
5 Simulation
5.1 Experiment Setup
A distributed sensing problem is considered with a network of 9 nodes shown in Fig. 4. For an unknown signal $x$ (one-dimensional for convenience), each agent $i$ measures $x$ via $y_i = M_i x + e_i$, where $M_i$ and $e_i$ are data following the standard Gaussian distribution. Then a consensus least-squares problem is to be solved as follows,
$$\min_{\{x_i\}} \ \sum_{i=1}^{9} \left(y_i - M_i x_i\right)^2 \quad \text{s.t.} \ x_i = x_j, \ \forall e_{ij} \in E \qquad (40)$$
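The measurement model can be sketched as follows (a hypothetical instantiation assuming the scalar model $y_i = M_i x + e_i$ suggested by the setup; the paper's actual data are not reproduced), together with the centralized least-squares baseline that the distributed algorithms should approach:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 9                                   # network of 9 sensing nodes
x_true = rng.standard_normal()          # unknown one-dimensional signal

M = rng.standard_normal(N)              # per-agent measurement coefficients
e = rng.standard_normal(N)              # standard Gaussian measurement noise
y = M * x_true + e                      # local measurements y_i = M_i x + e_i

# Centralized least squares: minimize sum_i (y_i - M_i x)^2 over scalar x.
# Closed form: x_hat = (sum_i M_i y_i) / (sum_i M_i^2).
x_hat = (M @ y) / (M @ M)

# Sanity check against the generic solver.
x_lstsq, *_ = np.linalg.lstsq(M.reshape(-1, 1), y, rcond=None)
assert np.isclose(x_hat, x_lstsq[0])
```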
We record the residual at every iteration. Both primal and dual variables are initialized to zero. The penalty parameter is chosen to be 1. For comparison, some classical distributed optimization algorithms are also considered, such as JADMM in [23] and DSM in [13]. The former is another distributed ADMM-based algorithm with a parallel manner, and the latter is one of the best-known subgradient-based algorithms.
5.2 JADMM algorithm
We reformulate the typical form of JADMM in [23] as follows,
(41)
and we let the penalty parameter match the aforementioned notation.
5.3 DSM algorithm
A well-known algorithm named distributed subgradient method (DSM)[13] is shown as follows,
$$x_i^{k+1} = \sum_{j=1}^{N} w_{ij}\, x_j^{k} - \alpha_k\, g_i^{k} \qquad (42)$$
where the coefficients $w_{ij}$ before the states denote the weights, $\alpha_k$ is the diminishing stepsize used in the simulation, and $g_i^{k}$ is a subgradient of $f_i$ at $x_i^{k}$.
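A minimal sketch of rule (42) on hypothetical quadratic costs $f_i(x) = \tfrac{1}{2}(x - a_i)^2$ over a 4-node path graph, with Metropolis-style doubly stochastic weights and stepsize $\alpha_k = 1/k$ (both choices are illustrative assumptions, not the paper's simulation setup):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])      # hypothetical local data; optimum is mean(a)

# Doubly stochastic weight matrix for the path graph 1-2-3-4 (Metropolis weights).
W = np.array([[2/3, 1/3, 0.0, 0.0],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [0.0, 0.0, 1/3, 2/3]])

x = np.zeros(4)
for k in range(1, 5001):
    grad = x - a                         # subgradient of f_i at x_i (here a gradient)
    x = W @ x - (1.0 / k) * grad         # DSM update (42): mix with neighbors, then step

# Subgradient methods with diminishing stepsizes converge, but slowly,
# which is the behavior the comparison in the paper highlights.
assert np.allclose(x, a.mean(), atol=0.05)
```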
5.4 Simulation Performance
All the simulations are run in MATLAB R2018b with the convex optimization toolbox (CVX) developed at Stanford. The experimental results are depicted in Figs. 5-8.
Fig. 5 depicts that the original Problem (2) converges to the optimal value under Algorithm 2. The state updating records are given in Fig. 6, which shows that the states reach a consensual optimal value. Fig. 7 shows that our proposed algorithm achieves good performance and is the fastest among the compared algorithms. Fig. 8 illustrates the effect of varying the adjustable parameter and shows that a higher value may cause worse performance.
6 Conclusion
In this paper, we improve the sequential distributed ADMM algorithm in [22] and propose a new ADMM algorithm that makes distributed ADMM parallel, without adding more parameters or extra procedures, and we fully prove optimality both for the global cost function and for the variables stored in the agents. We also give a numerical example to verify the feasibility and effectiveness of our results. Our results show performance similar to JPADMM in the given example. Future work includes adding a privacy-preservation property to the proposed method and considering more general scenarios, such as constrained distributed optimization problems in smart grids.
7 APPENDIX
7.1 Proof of Lemma 2
Proof: We prove this lemma by mathematical induction. For the base case, it is easy to verify that the relation is true. Assume relation (12) holds for the case of $N$ nodes. Then, for $N+1$ nodes, we add one node with index $N+1$ to the graph. Without loss of generality, suppose that certain agents are the neighbors of agent $N+1$. In the meantime, the successor count increases by 1 for each agent belonging to the neighbor set of agent $N+1$, i.e., the following relations hold:
(43)
It is implied by (43) that relation (12) is true for the case of $N+1$ nodes. The proof of this lemma is complete.
References
- [1] Y. Zhang, Y. Lou, Y. Hong, and L. Xie, “Distributed projection-based algorithms for source localization in wireless sensor networks,” IEEE Trans. Wirel. Commun., vol. 15, no. 6, Jun. 2015.
- [2] H. Wu, and W. Wang, “A game theory based collaborative security detection method for internet of things systems,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 6, pp. 1432-1445, Jun. 2018.
- [3] C. Zhang, M. Ahmad, and Y. Wang, “ADMM based privacy-preserving decentralized optimization,” IEEE Trans. Inf. Forensics Security, vol. 14, no. 3, pp. 565-580, Mar. 2019.
- [4] F. Salehisadaghiani, W. Shi, L. Pavel, “Distributed Nash equilibrium seeking under partial-decision information via the alternating direction method of multipliers,” Automatica, vol. 103, pp. 27-35, 2019.
- [5] F. Salehisadaghiani, L. Pavel, “Distributed Nash equilibrium seeking: A gossip-based algorithm,” Automatica, vol. 72, pp. 209–216, 2016.
- [6] C. Cortes, and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
- [7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
- [8] J. Yan et al., “Parallel alternating direction method of multipliers,” Inf. Sci., vol. 507, pp. 185–196, 2020.
- [9] J. A. Bazerque, G. Mateos, G. B. Giannakis, “Distributed lasso for in-network linear regression,” in Proc. IEEE ICASSP, Dallas, TX, USA, 2010, pp. 2978-2981.
- [10] T. Yang et al., “A survey of distributed optimization,” Annu. Rev. Control, vol. 47, pp. 278–305, 2019.
- [11] F. Guo et al., “Hierarchical decentralized optimization architecture for economic dispatch: A New Approach for Large-Scale Power System” IEEE Trans. Ind. Inform., vol. 14, no. 2, pp. 523-534, Feb. 2018.
- [12] F. Guo, C. Wen, J. Mao, G. Li, and Y. -D. Song, “A distributed hierarchical algorithm for multi-cluster constrained optimization,” Automatica, vol. 77, pp. 230–238, 2017.
- [13] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Trans. Automat. Control, vol. 54, no. 1, pp. 48–61, Jan. 2009.
- [14] A. Nedic, A. Ozdaglar, and P. A. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE Trans. Automat. Control, vol. 55, no. 4, pp. 922–938, Apr. 2010.
- [15] J. Xu, S. Zhu, Y. C. Soh, and L. Xie, “Convergence of asynchronous distributed gradient methods over stochastic networks,” IEEE Trans. Automat. Control, vol. 63, no. 2, pp.434-448 Feb. 2018.
- [16] J. C. Duchi, A. Agarwal, M. J. Wainwright, “Dual averaging for distributed optimization: convergence analysis and network scaling,” IEEE Trans. Automat. Control, vol. 57, no. 3, pp.592-606, Mar. 2012.
- [17] C. Xi and U. A. Khan, “DEXTRA: A fast algorithm for optimization over directed graphs,” IEEE Trans. Automat. Control, vol. 62, no. 10, pp. 4980–4993, Feb. 2017.
- [18] W. Shi, Q. Ling, G. Wu, and W. Yin, “Extra: An exact first-order algorithm for decentralized consensus optimization,” SIAM J. Optim., vol. 25, no. 2, pp. 944–966, Nov. 2015.
- [19] S. Pu, W. Shi, J. Xu, and A. Nedic, “Push-pull gradient methods for distributed optimization in networks,” IEEE Trans. Automat. Control, vol. 66, no. 1, pp. 1-16, Jan. 2021.
- [20] H. H. Bauschke, and J. M. Borwein, “Dykstra’s alternating projection algorithm for two sets,” J. Approx. Theory, vol. 79, no. 3, pp. 418– 443, 1994.
- [21] P. L. Combettes, and J. C. Pesquet, “A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery,” IEEE J. Sel. Top. Signal Process, vol. 1, no. 4, pp. 564–574, 2007.
- [22] E. Wei and A. Ozdaglar, “Distributed alternating direction method of multipliers,” in Proc. 51st IEEE Conf. Decis. Control, Dec. 2012, pp. 5445–5450.
- [23] W. Deng, M. Lai, Z. Peng, and W. Yin, “Parallel multi-block ADMM with convergence,” J. Sci. Comput., vol. 71, no. 2, pp. 712–736, 2017.
- [25] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, “On the linear convergence of the ADMM in decentralized consensus optimization,” IEEE Trans. Signal Process., vol. 62, no. 7, pp. 1750–1761, Apr. 2014.
- [26] R. A. Horn, C. R. Johnson, “Positive and nonnegative matrices,” in Matrix Analysis, 2nd ed. Cambridge University Press, Cambridge, UK, 1985, ch. 8, pp. 517-553.
- [29] V. S. Mai and E. H. Abed, “Distributed optimization over weighted directed graphs using row stochastic matrix,” in American Control Conference, pp. 7165–7170, 2016.
- [30] Y. Wang, W. Yin, J. Zeng, “Global convergence of ADMM in nonconvex nonsmooth optimization,” J. Sci. Comput., pp. 1–35, 2015.
- [31] E. Wei and A. Ozdaglar, “On the convergence of asynchronous distributed alternating direction method of multipliers.” in Proc. IEEE CDC, Maui, HI, USA, Dec. 10-13, 2012, pp. 5445–5450.
- [32] A. Nedic and A. Olshevsky, “Distributed optimization over time-varying directed graphs,” IEEE Trans. Automat. Control, vol. 60, no. 3, pp. 601–615, Mar. 2015.
- [33] A. Nedic and A. Olshevsky, “Stochastic gradient-push for strongly convex functions on time-varying directed graphs,” IEEE Trans. Automat. Control, vol. 61, no. 12, pp. 3936–3947, Dec. 2016.
- [34] C. Xi, V. S. Mai, R. Xin, E. H. Abed, and U. A. Khan, “Linear convergence in optimization over directed graphs with row-stochastic matrices,” IEEE Trans. Automat. Control, vol. 63, no. 10, pp. 3558–3565, Oct. 2018.
- [35] R. Zhang and J. T. Kwok, “Asynchronous distributed ADMM for consensus optimization,” in Proc. Int. Conf. Mach. Learn., Jun. 2014, pp. 1701–1709.
- [36] B. S. He, H. K. Xu, and X. M. Yuan, “On the proximal jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM,” J. Sci. Comput., vol. 66, no. 3, pp. 1204–1217, 2016.
- [37] A. Simonetto, G. Leus, “Distributed maximum likelihood sensor network localization,” IEEE Trans. Signal Process., vol. 62, no. 6, pp. 1424–1437, Mar. 2014.