Primal-Dual $\epsilon$-Subgradient Method for Distributed Optimization

(This work was supported by the National Natural Science Foundation of China under Grant 61973043.)
Abstract: This paper studies the distributed optimization problem when the objective functions might be nondifferentiable and subject to heterogeneous set constraints. Unlike existing subgradient methods, we focus on the case when the exact subgradients of the local objective functions cannot be accessed by the agents. To solve this problem, we propose a projected primal-dual dynamics using only the objective functions' approximate subgradients. We first prove that the formulated optimization problem can generally be solved with an error depending upon the accuracy of the available subgradients. Then, we show the exact solvability of this distributed optimization problem when the accumulated approximation error of inexact subgradients is not too large. After that, we also give a novel componentwise normalized variant to improve the transient behavior of the convergent sequence. The effectiveness of our algorithms is verified by a numerical example.
Keywords: Distributed optimization, $\epsilon$-subgradient, constrained optimization, primal-dual dynamics
1 Introduction
The last decade has witnessed considerable interest in distributed optimization problems due to their numerous applications in signal processing, control, and machine learning. To solve such problems, subgradient information of the objective functions has been widely used due to its cheap iteration cost and well-established convergence properties [1, 2].
Note that most of these subgradient-based results assume the availability of the local cost functions' exact subgradients. In many circumstances, the subgradient is computed by solving an auxiliary optimization problem, as shown in [3, 4, 5]. In practice, we are often only able to solve these subproblems approximately. Hence, in that context, numerical methods for the original optimization problem are provided with only inexact subgradient information. This leads us to investigate the solvability of the distributed optimization problem using inexact subgradient information.
A closely related topic is the inexact augmented Lagrangian method. As surveyed in [6], this method has been extensively extended to distributed settings in various ways under the assumption that the primal variables are only obtained in an approximate sense. Nevertheless, most of these results still require exact gradient or subgradient information of the local objective functions at each given estimate. It is thus interesting to ask whether the primal-dual method is still effective when only inexact gradient/subgradient information is available.
In this paper, we focus on a typical distributed consensus optimization problem for a sum of convex objective functions subject to heterogeneous set constraints. Although this problem has been partially studied by gradient/subgradient methods in [7, 8, 9, 10, 11, 12, 13, 14], its solvability using only inexact subgradient information has not yet been addressed.
To solve this problem, we first convert it into a distributed saddle point seeking problem and present a projected primal-dual $\epsilon$-subgradient dynamics to handle both the distributedness and the constraints. When the objective functions are smooth with exact gradients, the proposed algorithms reduce to the primal-dual dynamics considered in [10, 8]. Then, we discuss the convergence properties of the proposed algorithm with diminishing step sizes and its suboptimality depending on the accuracy of the available subgradients. In particular, we show that if the accumulated error resulting from the subgradient inexactness is not too large, the proposed algorithm under certain diminishing step sizes will drive all the estimates of the agents to reach a consensus about an optimal solution to the global optimization problem. To our knowledge, this might be the first attempt to solve the formulated distributed optimization problem using only inexact subgradients of the local objective functions. To improve the transient performance of our preceding designs, we further propose a novel componentwise normalized step size as in [14]. As a byproduct, this normalized step size removes the widely used subgradient boundedness assumption in the literature [7, 8].
The rest of this paper is organized as follows. We first give some preliminaries in Section 2 and then introduce the formulation of our problem in Section 3. Main results are presented in Section 4. After that, we give a numerical example in Section 5 to show the effectiveness of our design. Finally, some concluding remarks are given in Section 6.
2 Preliminaries
In this section, we first give some preliminaries about graph theory and convex analysis.
2.1 Graph Theory
Let $\mathbb{R}^n$ be the $n$-dimensional Euclidean space and $\mathbb{R}^{n\times m}$ be the set of all $n\times m$ real matrices. $\mathbf{1}_n$ (or $\mathbf{0}_n$) denotes an $n$-dimensional all-one (or all-zero) column vector and $\mathbf{1}_{n\times m}$ (or $\mathbf{0}_{n\times m}$) the all-one (or all-zero) matrix. $\operatorname{col}(a_1,\dots,a_N) \triangleq [a_1^\top,\dots,a_N^\top]^\top$ for column vectors $a_1,\dots,a_N$. For a vector $x$ (or matrix $A$), $\|x\|$ (or $\|A\|$) denotes its Euclidean (or spectral) norm.
A weighted (undirected) graph is defined as $\mathcal{G} = (\mathcal{N}, \mathcal{E}, \mathcal{A})$, where $\mathcal{N} = \{1,\dots,N\}$ is the set of nodes, $\mathcal{E}\subset\mathcal{N}\times\mathcal{N}$ is the set of edges, and $\mathcal{A}\in\mathbb{R}^{N\times N}$ is a weighted adjacency matrix. $(i,\,j)\in\mathcal{E}$ denotes an edge leaving from node $i$ and entering node $j$. The weighted adjacency matrix of this graph is described by $\mathcal{A} = [a_{ij}]$, where $a_{ii} = 0$ and $a_{ij}\ge 0$ ($a_{ij} > 0$ if and only if there is an edge between nodes $i$ and $j$). The neighbor set of agent $i$ is defined as $\mathcal{N}_i = \{j \mid (j,\,i)\in\mathcal{E}\}$ for $i\in\mathcal{N}$. A path in graph $\mathcal{G}$ is an alternating sequence $i_1 e_1 i_2 e_2 \cdots e_{k-1} i_k$ of nodes $i_l$ and edges $e_l = (i_l,\,i_{l+1})\in\mathcal{E}$ for $l = 1,\dots,k-1$. If there exists a path from node $i$ to node $j$, then node $j$ is said to be reachable from node $i$. The Laplacian of graph $\mathcal{G}$ is defined as $L \triangleq D - \mathcal{A}$, where $D = \operatorname{diag}(d_1,\dots,d_N)$ and $d_i = \sum_{j=1}^N a_{ij}$. It can be found that the Laplacian $L$ is symmetric and positive semi-definite. Denote its ordered eigenvalues as $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N$. The eigenvector corresponding to $\lambda_1 = 0$ is the all-one vector $\mathbf{1}_N$. Moreover, $\lambda_2 > 0$ if and only if this graph is connected.
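As a quick check of these notions, the following Python sketch builds the Laplacian of a small weighted graph and tests connectivity through $\lambda_2$; the four-node path graph here is an arbitrary illustration, not a graph used later in the paper.

```python
import numpy as np

# Adjacency matrix of a 4-node undirected path graph with unit weights
# (an arbitrary example for illustration only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # graph Laplacian

eigvals = np.sort(np.linalg.eigvalsh(L))  # ordered eigenvalues of the symmetric L
print(eigvals[0])             # ~0, with eigenvector 1_N
print(eigvals[1] > 1e-9)      # True iff the graph is connected
```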
2.2 Convex Analysis
For each set $\mathcal{S}\subset\mathbb{R}^n$, the indicator function is denoted by $I_{\mathcal{S}}$ with $I_{\mathcal{S}}(x) = 0$ for any $x\in\mathcal{S}$ and $I_{\mathcal{S}}(x) = +\infty$ for any $x\notin\mathcal{S}$. A set $\mathcal{S}$ is said to be convex if $(1-\theta)x_1 + \theta x_2\in\mathcal{S}$ for any $x_1,\,x_2\in\mathcal{S}$ and $0 < \theta < 1$. For a closed convex set $\mathcal{S}$, the projection operator is defined as $P_{\mathcal{S}}(x) = \operatorname{argmin}_{y\in\mathcal{S}}\|x - y\|$. The projection operator is non-expansive in the sense that $\|P_{\mathcal{S}}(x) - P_{\mathcal{S}}(y)\| \le \|x - y\|$ for any $x,\,y\in\mathbb{R}^n$. For any $x\in\mathbb{R}^n$ and $y\in\mathcal{S}$, it holds $(x - P_{\mathcal{S}}(x))^\top (y - P_{\mathcal{S}}(x)) \le 0$.
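For instance, for a box set the projection reduces to componentwise clipping; the minimal sketch below (the box bounds and test points are illustrative assumptions) also verifies the non-expansiveness property numerically.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi] (componentwise clipping)."""
    return np.clip(x, lo, hi)

rng = np.random.default_rng(0)
lo, hi = -np.ones(3), np.ones(3)            # illustrative box bounds
x, y = rng.normal(size=3) * 5, rng.normal(size=3) * 5

# Non-expansiveness: ||P(x) - P(y)|| <= ||x - y||
lhs = np.linalg.norm(project_box(x, lo, hi) - project_box(y, lo, hi))
rhs = np.linalg.norm(x - y)
assert lhs <= rhs + 1e-12
```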
For a given function $f\colon\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$, we denote by $\operatorname{dom} f$ the domain of $f$. We always assume that $\operatorname{dom} f$ is nonempty and that $\operatorname{dom} f = \mathbb{R}^n$ if not specified. We say $f$ is convex if its domain is convex and $f((1-\theta)x_1 + \theta x_2) \le (1-\theta)f(x_1) + \theta f(x_2)$ holds for all $x_1,\,x_2\in\operatorname{dom} f$ and $0 < \theta < 1$. If this inequality is strict in the sense that the equation holds only if $x_1 = x_2$, the function is called strictly convex. A function $f$ is called closed and convex on a convex set $\mathcal{S}$ if its constrained epigraph $\{(x,\,w)\in\mathcal{S}\times\mathbb{R} \mid f(x)\le w\}$ is a closed convex set. If $\mathcal{S} = \operatorname{dom} f$, we call $f$ a closed convex function.
A vector-valued function $\phi\colon\mathbb{R}^n\to\mathbb{R}^m$ is Lipschitz with constant $\ell > 0$ (or simply $\ell$-Lipschitz) if we have
$$\|\phi(x) - \phi(y)\| \le \ell\,\|x - y\|, \quad \forall x,\,y\in\mathbb{R}^n.$$
Let us consider a function $\phi(x,\,y)$ defined on $X\times Y$, where $X$ and $Y$ are nonempty subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. A pair of vectors $x^*\in X$ and $y^*\in Y$ is called a saddle point of $\phi$ on $X\times Y$ if $\phi(x^*,\,y) \le \phi(x^*,\,y^*) \le \phi(x,\,y^*)$ holds for any $x\in X$ and $y\in Y$.
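For example, the function $\phi(x,\,y) = x^2 - y^2$ on $\mathbb{R}\times\mathbb{R}$ admits the saddle point $(x^*,\,y^*) = (0,\,0)$, since $\phi(0,\,y) = -y^2 \le 0 = \phi(0,\,0) \le x^2 = \phi(x,\,0)$ for any $x$ and $y$.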
3 Problem Formulation
In this paper, we focus on solving the following constrained optimization problem by a network of $N$ agents:
$$\min_{x\in\mathbb{R}^n}\ \tilde f(x) \triangleq \sum_{i=1}^N f_i(x), \quad \text{s.t.}\ x\in X_0 \triangleq \bigcap_{i=1}^N X_i. \tag{1}$$
Here the function $f_i\colon\mathbb{R}^n\to\mathbb{R}$ and the set $X_i\subset\mathbb{R}^n$ are private to agent $i$ for each $i = 1,\dots,N$ and cannot be shared with others.
To ensure its solvability, the following assumption is made.
Assumption 1
For each $i\in\mathcal{N}$, the function $f_i$ is convex, the set $X_i$ is convex and closed, and the intersection $X_0$ is nonempty.
Note that $f_i$ might be nondifferentiable under this assumption. Denote the minimal value of problem (1) by $\tilde f^*$ and the optimal solution set by $X^*$, i.e., $\tilde f^* = \min_{x\in X_0}\tilde f(x)$ and $X^* = \{x\in X_0 \mid \tilde f(x) = \tilde f^*\}$. As usual, we assume $\tilde f^*$ is finite and the set $X^*$ is nonempty. To cooperatively address the optimization problem (1) in a distributed manner, we use a weighted undirected graph $\mathcal{G} = (\mathcal{N},\,\mathcal{E},\,\mathcal{A})$ to describe the information sharing relationships with node set $\mathcal{N}$, edge set $\mathcal{E}$, and weight matrix $\mathcal{A} = [a_{ij}]$. Here $a_{ij} > 0$ means agents $i$ and $j$ can communicate with each other.
Assumption 2
Graph $\mathcal{G}$ is connected.
Suppose agent $i$ maintains an estimate $x_i(k)\in\mathbb{R}^n$ of the optimal solution to (1) with other (possible) auxiliary variables. Agents exchange these variables through the communication network described by $\mathcal{G}$ and perform some updates at given discrete-time instants $k = 0,\,1,\,2,\,\dots$. Then, the distributed optimization problem in this paper is formulated to find an update rule of $x_i(k)$ for agent $i$ using only its own and neighboring information such that $\lim_{k\to\infty}\|x_i(k) - x_j(k)\| = 0$ and $\lim_{k\to\infty}\tilde f(x_i(k)) = \tilde f^*$ for any $i,\,j\in\mathcal{N}$. If possible, we expect that all the estimates will converge to an optimal solution to problem (1).
As stated above, this problem has been intensively studied in the literature [1, 2]. However, most existing designs require the exact subgradients of the local objective functions in constructing effective distributed algorithms. In this paper, we are interested in the solvability of the formulated distributed constrained optimization problem (1) working with inexact subgradients of the local objective functions. For this purpose, we adopt the notion of $\epsilon$-subgradient to describe such inexactness as in [15] and assume that an $\epsilon$-subgradient of $f_i$ can be easily computed at any given point.
Definition 1
For a convex function $f\colon\mathbb{R}^n\to\mathbb{R}$ and a scalar $\epsilon\ge 0$, a vector $g\in\mathbb{R}^n$ is said to be an $\epsilon$-subgradient of $f$ at $x\in\mathbb{R}^n$ if
$$f(y) \ge f(x) + g^\top(y - x) - \epsilon, \quad \forall y\in\mathbb{R}^n.$$
Denote by $\partial_\epsilon f(x)$ the $\epsilon$-subdifferential of $f$ at $x$, which is the set of all $\epsilon$-subgradients of $f$ at $x$. The set $\partial_\epsilon f(x)$ is nonempty and convex for any $\epsilon\ge 0$ due to the convexity of $f$. Moreover, $\partial_0 f(x)$ coincides with the subdifferential $\partial f(x)$ of $f$ at $x$.
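As a concrete illustration, the following sketch numerically checks the defining inequality for $f(x) = |x|$ on a grid; the point, accuracy, and candidate scalars are illustrative assumptions.

```python
import numpy as np

def is_eps_subgradient(f, g, x, eps, test_points):
    """Check f(y) >= f(x) + g*(y - x) - eps over a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - eps - 1e-12 for y in test_points)

f = abs                       # f(x) = |x|
x, eps = 2.0, 0.5             # assumed point and accuracy
ys = np.linspace(-10.0, 10.0, 2001)

print(is_eps_subgradient(f, 1.0, x, eps, ys))                  # True: exact subgradient
print(is_eps_subgradient(f, 1.0 - eps / x, x, eps, ys))        # True: inexact but admissible
print(is_eps_subgradient(f, 1.0 - eps / x - 0.1, x, eps, ys))  # False: too inexact
```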
In the next section, we will convert our problem into a saddle-point seeking problem and develop a projected primal-dual $\epsilon$-subgradient method with a rigorous solvability analysis.
4 Main Result
To begin with, we rewrite problem (1) into an alternative form as that in [10, 16]:
$$\begin{aligned}
\min_{\boldsymbol{x}\in\boldsymbol{X}}\ & f(\boldsymbol{x}) \triangleq \sum_{i=1}^N f_i(x_i)\\
\text{s.t.}\ & \boldsymbol{L}\boldsymbol{x} = \boldsymbol{0},
\end{aligned} \tag{2}$$
where $\boldsymbol{x} = \operatorname{col}(x_1,\dots,x_N)$, $\boldsymbol{X} = X_1\times\dots\times X_N$, $\boldsymbol{L} = L\otimes I_n$, and $L$ is the Laplacian of graph $\mathcal{G}$. Note that $\boldsymbol{L}$ is symmetric and positive semi-definite with the ordered eigenvalues of $L$ satisfying $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_N$ under Assumption 2 by Theorem 2.8 in [17]. Hence the constraint $\boldsymbol{L}\boldsymbol{x} = \boldsymbol{0}$ is equivalent to the consensus condition $x_1 = \dots = x_N$, so problems (1) and (2) share the same optimal value.
Consider the augmented Lagrangian function of problem (2):
$$\mathcal{L}(\boldsymbol{x},\,\boldsymbol{\lambda}) = f(\boldsymbol{x}) + \boldsymbol{\lambda}^\top\boldsymbol{L}\boldsymbol{x} + \frac{1}{2}\,\boldsymbol{x}^\top\boldsymbol{L}\boldsymbol{x} \tag{3}$$
with the multiplier $\boldsymbol{\lambda} = \operatorname{col}(\lambda_1,\dots,\lambda_N)\in\mathbb{R}^{nN}$. By Proposition 3.4.1 in [3], if $\mathcal{L}$ has a saddle point $(\boldsymbol{x}^*,\,\boldsymbol{\lambda}^*)$ in $\boldsymbol{X}\times\mathbb{R}^{nN}$, then $\boldsymbol{x}^*$ must be an optimal solution to problem (2), which in turn provides an optimal solution to (1). Since Slater's condition holds under Assumption 1, such saddle points indeed exist by virtue of Theorems 3.34 and 4.7 in [18]. Thus, it suffices for us to seek a saddle point of $\mathcal{L}$ in $\boldsymbol{X}\times\mathbb{R}^{nN}$.
Following this conversion, many solvability results on problem (1) have been presented when the exact gradient or subgradient of $f_i$ is available, e.g., [19, 20, 16, 10, 21, 14]. However, whether and how $\epsilon$-subgradient algorithms can be derived has not been discussed yet. To this end, we are motivated by the aforementioned saddle-point seeking designs and present the following dynamics:
$$\begin{aligned}
x_i(k+1) &= P_{X_i}\Big[x_i(k) - \alpha(k)\Big(g_i(k) + \sum_{j\in\mathcal{N}_i}a_{ij}\big(\lambda_i(k)-\lambda_j(k)\big) + \sum_{j\in\mathcal{N}_i}a_{ij}\big(x_i(k)-x_j(k)\big)\Big)\Big],\\
\lambda_i(k+1) &= \lambda_i(k) + \alpha(k)\sum_{j\in\mathcal{N}_i}a_{ij}\big(x_i(k)-x_j(k)\big),
\end{aligned} \tag{4}$$
where $g_i(k)\in\partial_{\epsilon_i(k)}f_i(x_i(k))$, $x_i(0)\in X_i$, and $\lambda_i(0) = \mathbf{0}_n$, with the step size $\alpha(k) > 0$ and error $\epsilon_i(k)\ge 0$ to be specified later. It can be taken as a constrained version of the algorithms in [19, 20]. Different from similar primal-dual designs in [10, 16], we do not require the differentiability of the objective functions or their exact gradients.
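To make the update concrete, here is a minimal Python sketch of one iteration of (4) under the reconstruction above; the oracles `eps_subgrad` and `project` as well as the adjacency matrix are placeholders to be supplied by the user.

```python
import numpy as np

def pdsm_step(x, lam, k, alpha, eps_subgrad, project, A):
    """One step of the projected primal-dual eps-subgradient iteration (4).

    x, lam      : (N, n) arrays stacking the agents' primal estimates and multipliers
    alpha       : callable k -> step size alpha(k)
    eps_subgrad : callable (i, x_i) -> an eps-subgradient of f_i at x_i
    project     : callable (i, v) -> projection of v onto X_i
    A           : (N, N) weighted adjacency matrix of the communication graph
    """
    deg = A.sum(axis=1)
    Lx = deg[:, None] * x - A @ x        # Laplacian acting on the primal stack
    Llam = deg[:, None] * lam - A @ lam  # Laplacian acting on the dual stack
    x_new = np.stack([
        project(i, x[i] - alpha(k) * (eps_subgrad(i, x[i]) + Llam[i] + Lx[i]))
        for i in range(x.shape[0])
    ])
    lam_new = lam + alpha(k) * Lx        # dual ascent along the consensus error
    return x_new, lam_new
```

Stacking the agents row-wise turns the Laplacian products into one-line operations, while each projection and subgradient call stays local to its agent, matching the information structure of (4).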
Letting $\boldsymbol{x}(k) = \operatorname{col}(x_1(k),\dots,x_N(k))$ and $\boldsymbol{\lambda}(k) = \operatorname{col}(\lambda_1(k),\dots,\lambda_N(k))$, we can put (4) into a compact form:
$$\begin{aligned}
\boldsymbol{x}(k+1) &= P_{\boldsymbol{X}}\big[\boldsymbol{x}(k) - \alpha(k)\big(\boldsymbol{g}(k) + \boldsymbol{L}\boldsymbol{\lambda}(k) + \boldsymbol{L}\boldsymbol{x}(k)\big)\big],\\
\boldsymbol{\lambda}(k+1) &= \boldsymbol{\lambda}(k) + \alpha(k)\,\boldsymbol{L}\boldsymbol{x}(k),
\end{aligned} \tag{5}$$
with $\boldsymbol{g}(k) = \operatorname{col}(g_1(k),\dots,g_N(k))$. It can be further rewritten as follows:
$$\boldsymbol{z}(k+1) = P_{\Omega}\big[\boldsymbol{z}(k) - \alpha(k)\,\boldsymbol{\Phi}(k)\big], \tag{6}$$
where $\boldsymbol{z}(k) = \operatorname{col}(\boldsymbol{x}(k),\,\boldsymbol{\lambda}(k))$, $\Omega = \boldsymbol{X}\times\mathbb{R}^{nN}$, and
$$\boldsymbol{\Phi}(k) = \begin{bmatrix}\boldsymbol{g}(k) + \boldsymbol{L}\boldsymbol{\lambda}(k) + \boldsymbol{L}\boldsymbol{x}(k)\\ -\boldsymbol{L}\boldsymbol{x}(k)\end{bmatrix}.$$
To establish the effectiveness of this algorithm, another assumption is made as follows.
Assumption 3
For each $i\in\mathcal{N}$, the $\epsilon$-subgradient sequence $\{g_i(k)\}$ is uniformly bounded, i.e., there exists a scalar $C > 0$ such that $\|g_i(k)\|\le C$ for all $k\ge 0$.
This assumption is temporarily made for simplicity as in [22, 8] and will be removed later by some novel step sizes. Suppose $(\boldsymbol{x}^*,\,\boldsymbol{\lambda}^*)$ is a saddle point of $\mathcal{L}$ in $\boldsymbol{X}\times\mathbb{R}^{nN}$ and denote $\boldsymbol{z}^* = \operatorname{col}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}^*)$. Here is a key lemma under Assumption 3.
Lemma 1

Suppose Assumptions 1–3 hold. Then, along the iteration (4), it holds that $\mathcal{L}(\boldsymbol{x}(k),\,\boldsymbol{\lambda}^*) - \mathcal{L}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}(k)) \ge 0$ and
$$\|\boldsymbol{z}(k+1)-\boldsymbol{z}^*\|^2 \le \|\boldsymbol{z}(k)-\boldsymbol{z}^*\|^2 - 2\alpha(k)\big[\mathcal{L}(\boldsymbol{x}(k),\,\boldsymbol{\lambda}^*) - \mathcal{L}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}(k))\big] + 2\alpha(k)\,\epsilon(k) + C_0\,\alpha^2(k) \tag{7}$$
for some constant $C_0 > 0$, where $\epsilon(k) \triangleq \sum_{i=1}^N\epsilon_i(k)$.
Proof. By the lemma conditions, $(\boldsymbol{x}^*,\,\boldsymbol{\lambda}^*)$ is a saddle point of $\mathcal{L}$. Then, $\mathcal{L}(\boldsymbol{x}(k),\,\boldsymbol{\lambda}^*) \ge \mathcal{L}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}^*) \ge \mathcal{L}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}(k))$ can be easily verified by the definition of saddle points, which gives the first claim.
Next, we consider the evolution of $\|\boldsymbol{z}(k)-\boldsymbol{z}^*\|^2$ with respect to $k$. Under the iteration (6) and by the non-expansiveness of the projection operator, it follows then
$$\|\boldsymbol{z}(k+1)-\boldsymbol{z}^*\|^2 \le \|\boldsymbol{z}(k)-\boldsymbol{z}^*\|^2 - 2\alpha(k)\,\boldsymbol{\Phi}(k)^\top\big(\boldsymbol{z}(k)-\boldsymbol{z}^*\big) + \alpha^2(k)\,\|\boldsymbol{\Phi}(k)\|^2. \tag{8}$$
By the properties of the saddle point and the $\epsilon$-subgradient, we have
$$\boldsymbol{\Phi}(k)^\top\big(\boldsymbol{z}(k)-\boldsymbol{z}^*\big) \ge \mathcal{L}(\boldsymbol{x}(k),\,\boldsymbol{\lambda}^*) - \mathcal{L}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}(k)) - \epsilon(k) \tag{9}$$
with $\epsilon(k) = \sum_{i=1}^N\epsilon_i(k)$. Under Assumption 3, there must be a constant $C_0 > 0$ such that
$$\|\boldsymbol{\Phi}(k)\|^2 \le C_0. \tag{10}$$
Putting all the inequalities (8)–(10) together, we have
$$\|\boldsymbol{z}(k+1)-\boldsymbol{z}^*\|^2 \le \|\boldsymbol{z}(k)-\boldsymbol{z}^*\|^2 - 2\alpha(k)\big[\mathcal{L}(\boldsymbol{x}(k),\,\boldsymbol{\lambda}^*) - \mathcal{L}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}(k))\big] + 2\alpha(k)\,\epsilon(k) + C_0\,\alpha^2(k),$$
which is exactly the expected inequality (7).
When the exact subgradient is available (i.e., $\epsilon_i(k)\equiv 0$), the inequality (7) can be simplified into the well-known supermartingale inequality ensuring the convergence of $\{\boldsymbol{z}(k)\}$ towards $\boldsymbol{z}^*$ as shown in [3] if $\alpha(k)$ is chosen to satisfy
$$\sum_{k=0}^{\infty}\alpha(k) = \infty, \qquad \sum_{k=0}^{\infty}\alpha^2(k) < \infty. \tag{11}$$
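For example, the harmonic step size $\alpha(k) = a/(k+1)$ with any $a > 0$ satisfies (11), since $\sum_{k=0}^{\infty} a/(k+1)$ diverges while $\sum_{k=0}^{\infty} a^2/(k+1)^2 = a^2\pi^2/6$ is finite.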
However, the inexactness of the available subgradients deteriorates this property, and the expected convergence might fail when we use only $\epsilon$-subgradients in the iteration (4).
Let us denote $\Delta(k) \triangleq \mathcal{L}(\boldsymbol{x}(k),\,\boldsymbol{\lambda}^*) - \mathcal{L}(\boldsymbol{x}^*,\,\boldsymbol{\lambda}(k))$ and take a closer look at the inequality (7). Note that $\Delta(k)$ consists of two parts, i.e., the discrepancy of function values and the violation of constraints (in terms of consensus error) of the iterative sequence. It can be taken as a measure of suboptimality of the iterative sequence. In other words, we can determine an upper bound for $\liminf_{k\to\infty}\Delta(k)$ to evaluate the effectiveness of algorithm (4).
We first consider the case when $\epsilon_i(k)$ is a constant.
Theorem 1

Suppose Assumptions 1–3 hold, $\epsilon_i(k) \equiv \epsilon_i \ge 0$ for each $i\in\mathcal{N}$, and the step size $\alpha(k)$ satisfies the condition (11). Then, along the iteration (4), we have
$$\liminf_{k\to\infty}\,\Delta(k) \le \epsilon \tag{12}$$
with $\epsilon \triangleq \sum_{i=1}^N\epsilon_i$.
Proof. To prove this theorem, we only have to show that $\liminf_{k\to\infty}\Delta(k) \le \epsilon$. If the inequality does not hold, there must exist a $\delta > 0$ and a sufficiently large integer $k_1$ such that $\Delta(k) \ge \epsilon + 2\delta$ for all $k \ge k_1$. By (11), we have $\lim_{k\to\infty}\alpha(k) = 0$. Thus, there must exist an integer $k_2 \ge k_1$ such that $C_0\,\alpha(k) \le 2\delta$ for all $k \ge k_2$. Bringing these conditions together, one can strengthen inequality (7) for all $k \ge k_2$ as follows:
$$\|\boldsymbol{z}(k+1)-\boldsymbol{z}^*\|^2 \le \|\boldsymbol{z}(k)-\boldsymbol{z}^*\|^2 - 2\alpha(k)\,\delta.$$
Summing up its both sides from $k_2$ to $k$ gives
$$\|\boldsymbol{z}(k+1)-\boldsymbol{z}^*\|^2 \le \|\boldsymbol{z}(k_2)-\boldsymbol{z}^*\|^2 - 2\delta\sum_{s=k_2}^{k}\alpha(s).$$
Note that $\|\boldsymbol{z}(k+1)-\boldsymbol{z}^*\|^2 \ge 0$ for any $k$. Under the condition (11), $\sum_{s=k_2}^{k}\alpha(s)$ grows unboundedly, so the right-hand side becomes negative, which cannot hold for a sufficiently large $k$. We obtain a contradiction and complete the proof.
Remark 1
According to Theorem 1, one can generally obtain a suboptimal solution to problem (1) using inexact subgradients. If we are interested in an exact solution, it is required to ensure $\epsilon = 0$. For the very special case when $\epsilon_i(k)\equiv 0$, this shows the effectiveness of our algorithm (4) in solving the formulated problem (1) with exact subgradients. This observation is consistent with the existing subgradient methods in [20, 11, 12, 13, 14].
With Theorem 1, it is natural for us to enforce some stronger condition on the errors for a better convergence performance of the entire sequence $\{\boldsymbol{z}(k)\}$. Along this line, we provide another theorem supposing the accumulated error of subgradient inexactness is not too large.
Theorem 2

Suppose Assumptions 1–3 hold and the step size $\alpha(k)$ satisfies the condition (11). If the accumulated error of subgradient inexactness satisfies
$$\sum_{k=0}^{\infty}\alpha(k)\,\epsilon(k) < \infty, \tag{13}$$
then, along the iteration (4), the following statements hold:
1) the sequence $\{\boldsymbol{z}(k)\}$ is uniformly bounded;
2) $\lim_{k\to\infty}\boldsymbol{L}\boldsymbol{x}(k) = \boldsymbol{0}$ and $\lim_{k\to\infty}f(\boldsymbol{x}(k)) = f(\boldsymbol{x}^*)$;
3) every limit point of $\{\boldsymbol{x}(k)\}$ is of the form $\mathbf{1}_N\otimes\bar{x}$ with $\bar{x}$ an optimal solution to problem (1);
4) there exists an optimal solution $\bar{x}$ to problem (1) such that $\lim_{k\to\infty}x_i(k) = \bar{x}$ for each $i\in\mathcal{N}$.
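For example, with the harmonic step size $\alpha(k) = 1/(k+1)$, any error sequence satisfying $\epsilon(k) = O\big(1/(k+1)^{\gamma}\big)$ for some $\gamma > 0$ meets condition (13), since $\sum_{k=0}^{\infty}(k+1)^{-(1+\gamma)} < \infty$, whereas a nonvanishing constant error violates (13) because the harmonic series diverges.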
Proof. Note that $\Delta(k) \ge 0$ by Lemma 1 and $\sum_{k=0}^{\infty}\alpha(k)\,\epsilon(k) < \infty$ under the theorem conditions. Applying Lemma 5.31 in [23] to the inequality (7), we can obtain the convergence of $\{\|\boldsymbol{z}(k)-\boldsymbol{z}^*\|\}$ and
$$\sum_{k=0}^{\infty}\alpha(k)\,\Delta(k) < \infty. \tag{14}$$
Thus, the sequence $\{\boldsymbol{z}(k)\}$ must be uniformly bounded, which proves item 1). From the continuity of $\mathcal{L}$, it must be $\ell$-Lipschitz with respect to $\boldsymbol{z}$ on this bounded set for some constant $\ell > 0$. Since $\|\boldsymbol{z}(k+1)-\boldsymbol{z}(k)\| \le \alpha(k)\,\|\boldsymbol{\Phi}(k)\| \le \sqrt{C_0}\,\alpha(k)$, it follows then
$$|\Delta(k+1) - \Delta(k)| \le 2\ell\,\|\boldsymbol{z}(k+1)-\boldsymbol{z}(k)\| \le 2\ell\sqrt{C_0}\,\alpha(k). \tag{15}$$
Jointly using (14), (15), and $\sum_{k=0}^{\infty}\alpha(k) = \infty$, we resort to Proposition 2 in [24] and conclude that $\lim_{k\to\infty}\Delta(k) = 0$. Recalling the expression of $\mathcal{L}$, we have
$$\Delta(k) = \big[f(\boldsymbol{x}(k)) + \boldsymbol{\lambda}^{*\top}\boldsymbol{L}\boldsymbol{x}(k) - f(\boldsymbol{x}^*)\big] + \frac{1}{2}\,\boldsymbol{x}(k)^\top\boldsymbol{L}\boldsymbol{x}(k),$$
where the bracketed term is nonnegative by the saddle point property and the quadratic term is nonnegative by the positive semidefiniteness of $\boldsymbol{L}$. Hence $\lim_{k\to\infty}\boldsymbol{x}(k)^\top\boldsymbol{L}\boldsymbol{x}(k) = 0$ and $\lim_{k\to\infty}\big[f(\boldsymbol{x}(k)) + \boldsymbol{\lambda}^{*\top}\boldsymbol{L}\boldsymbol{x}(k)\big] = f(\boldsymbol{x}^*)$. Note that $\boldsymbol{L}$ is positive semidefinite with $0$ as a simple eigenvalue of $L$ under Assumption 2. It follows then $\lim_{k\to\infty}\boldsymbol{L}\boldsymbol{x}(k) = \boldsymbol{0}$ and thus $\lim_{k\to\infty}f(\boldsymbol{x}(k)) = f(\boldsymbol{x}^*)$, which proves item 2).

Due to the uniform boundedness of the sequence $\{\boldsymbol{x}(k)\}$ by item 1), there must be a convergent subsequence of $\{\boldsymbol{x}(k)\}$. We denote its limit by $\bar{\boldsymbol{x}}$. Then, it should satisfy $\boldsymbol{L}\bar{\boldsymbol{x}} = \boldsymbol{0}$ and $f(\bar{\boldsymbol{x}}) = f(\boldsymbol{x}^*)$ by item 2). In other words, $\bar{\boldsymbol{x}}$ is an optimal solution to (2). By Assumption 2, one can conclude that there exists some $\bar{x}$ such that $\bar{\boldsymbol{x}} = \mathbf{1}_N\otimes\bar{x}$. Note that $\tilde f(\bar{x}) = \tilde f^*$, i.e., $\bar{x}$ is an exact optimal solution to problem (1). This proves item 3).

Let $\bar{\boldsymbol{z}}$ be the limit of a convergent subsequence of $\{\boldsymbol{z}(k)\}$. By item 3) and the saddle point property, $\bar{\boldsymbol{z}}$ is itself a saddle point of $\mathcal{L}$, hence Lemma 1 holds with $\boldsymbol{z}^*$ replaced by $\bar{\boldsymbol{z}}$ and $\lim_{k\to\infty}\|\boldsymbol{z}(k)-\bar{\boldsymbol{z}}\|$ exists. Then all convergent subsequences of $\{\boldsymbol{z}(k)\}$ have the same limit $\bar{\boldsymbol{z}}$. This combined with the boundedness of $\{\boldsymbol{z}(k)\}$ implies item 4) and completes the proof.
Remark 2
This theorem specifies a nontrivial case when our distributed optimization problem (1) can be exactly solved even using only inexact subgradient information of the local objective functions. This observation is consistent with the centralized results in [5]. Compared with similar primal-dual results in [19, 25, 21, 10, 16, 14], this algorithm further allows us to consider nonsmooth objective functions with only approximate subgradients.
It is known that normalization might improve the transient performance of the algorithms to avoid overshoots at the starting phase. However, conventional normalization techniques often involve some global information and cannot be directly implemented in distributed settings. We here present a novel componentwise normalized version of algorithm (4) as follows:
$$\begin{aligned}
x_i(k+1) &= P_{X_i}\Big[x_i(k) - \alpha(k)\,\Lambda(k)^{-1}\phi_i(k)\Big],\\
\lambda_i(k+1) &= \lambda_i(k) + \alpha(k)\,\Lambda(k)^{-1}\psi_i(k),
\end{aligned} \tag{16}$$
where $\phi_i(k) = g_i(k) + \sum_{j\in\mathcal{N}_i}a_{ij}\big(\lambda_i(k)-\lambda_j(k)\big) + \sum_{j\in\mathcal{N}_i}a_{ij}\big(x_i(k)-x_j(k)\big)$ and $\psi_i(k) = \sum_{j\in\mathcal{N}_i}a_{ij}\big(x_i(k)-x_j(k)\big)$ are the update directions in (4), and $\Lambda(k) = \operatorname{diag}\big(c+m^1(k),\dots,c+m^n(k)\big)$ with $m^l(k) = \max_{j\in\mathcal{N}}|\phi_j^l(k)|$ obtained by a $D$-step max-consensus subiteration. Here the integer $D \ge d$ with $d$ the diameter of graph $\mathcal{G}$, and $c > 0$ is any given constant. Since $d$ or an upper bound of it can be computed by distributed rules [26], this normalized algorithm is implementable in a fully distributed manner by embedding the max-consensus subiteration.
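A minimal Python sketch of this normalization, assuming the reconstruction of (16) above: each agent runs $D$ rounds of max-consensus on the absolute values of the update directions and divides componentwise by $c$ plus the agreed maximum. The names `neighbors`, `D`, and `c` mirror the quantities in the text.

```python
import numpy as np

def max_consensus(v, neighbors, D):
    """Run D rounds of componentwise max-consensus over the graph.

    v : (N, n) array, one row per agent. After D >= diam(G) rounds, every
    row equals the componentwise maximum over the whole network [27].
    """
    for _ in range(D):
        v = np.stack([np.max(v[[i] + list(neighbors[i])], axis=0)
                      for i in range(v.shape[0])])
    return v

def normalize_directions(phi, neighbors, D, c):
    """Componentwise normalization of the agents' update directions phi."""
    m = max_consensus(np.abs(phi), neighbors, D)  # agreed componentwise maxima
    return phi / (c + m)                          # every entry now lies in (-1, 1)
```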
Here is a corollary stating that the normalized algorithm (16) retains all the established properties in Theorem 2.
Corollary 1

Suppose Assumptions 1 and 2 hold, the step size $\alpha(k)$ satisfies the condition (11), and the errors satisfy the condition (13). Then, the estimates generated by the normalized iteration (16) retain items 1)–4) of Theorem 2.
Proof. The proof is similar to that of Theorem 2. First, we recall Theorem 4.1 in [27] and conclude that all agents obtain the exact maxima $m^l(k)$ after the $D$-step max-consensus subiteration. Thus, we only have to consider the following system:
$$\boldsymbol{z}(k+1) = P_{\Omega}\big[\boldsymbol{z}(k) - \alpha(k)\,\bar{\Lambda}(k)^{-1}\boldsymbol{\Phi}(k)\big]$$
with $\bar{\Lambda}(k)$ collecting the componentwise normalization factors $c + m^l(k)$. For this new system, we can establish a similar inequality as (7) for $\|\boldsymbol{z}(k)-\boldsymbol{z}^*\|^2$. Note that each component of $\bar{\Lambda}(k)^{-1}\boldsymbol{\Phi}(k)$ is bounded by $1$ in magnitude, so no subgradient boundedness assumption is needed. Recalling the fact (9) and the condition (13), one can conclude the convergence of $\{\|\boldsymbol{z}(k)-\boldsymbol{z}^*\|\}$ and $\sum_{k=0}^{\infty}\alpha(k)\,\Delta(k) < \infty$ according to Lemma 5.31 in [23]. Then, $\{\boldsymbol{z}(k)\}$ is uniformly bounded, which implies (by the local boundedness of subgradients of convex functions over bounded sets) that there must be a small enough constant $\hat{c} > 0$ such that $\bar{\Lambda}(k)^{-1} \succeq \hat{c}\,I$. We use Proposition 2 in [24] again and obtain that $\lim_{k\to\infty}\Delta(k) = 0$. Then, items 3) and 4) can be easily verified following a similar procedure as in Theorem 2. The proof is thus complete.
Remark 3
Compared with the conventional normalized step sizes in [18, 28], the proposed componentwise normalized step size can be taken as their distributed extension, and the iterative sequence generated by (16) might have a better transient behavior than that generated by (4). Interestingly, the widely used subgradient boundedness assumption (i.e., Assumption 3) is also removed as a byproduct, which might be favorable in distributed scenarios.
5 Simulation
In this section, we consider a LASSO (least absolute shrinkage and selection operator) regression problem to verify the effectiveness of our algorithms:
$$\min_{x\in\mathbb{R}^n}\ \sum_{i=1}^N\Big(\frac{1}{2}\,\|A_i x - b_i\|^2 + \rho\,\|x\|_1\Big),$$
where $\rho > 0$ is the regularization parameter, the pair $(A_i,\,b_i)$ is an estimate only known by agent $i$, and $X_i$ is the constrained set. In this case, we let $f_i(x) = \frac{1}{2}\|A_i x - b_i\|^2 + \rho\,\|x\|_1$ for each $i\in\mathcal{N}$ and put it into the form of problem (1). Distributed subgradient algorithms have been developed to solve this problem when the exact subdifferential $\partial f_i(x)$ is available, e.g., [12]. Although $\partial f_i(x)$ and a subgradient in it can be easily calculated, we use this example to show the effectiveness of our algorithm only using its $\epsilon$-subdifferential instead of the exact one.
According to the definition of $\epsilon$-subgradient, for the scalar function $|x|$ we have
$$\partial_\epsilon |x| = \begin{cases} \big[\,1 - \frac{\epsilon}{x},\ 1\,\big], & x > \frac{\epsilon}{2},\\[2pt] [-1,\ 1], & |x| \le \frac{\epsilon}{2},\\[2pt] \big[-1,\ -1 + \frac{\epsilon}{|x|}\,\big], & x < -\frac{\epsilon}{2}, \end{cases}$$
and an $\epsilon$-subgradient of $f_i$ can then be obtained by adding the exact gradient of the quadratic part to componentwise $\epsilon$-subgradients of the $\ell_1$ part.
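As an illustration, the sketch below assembles an $\epsilon$-subgradient of the local LASSO cost from the case formula above; the specific endpoint choices and the even error split across components are illustrative assumptions, not necessarily the selections used in the reported simulations.

```python
import numpy as np

def eps_subgrad_abs(t, eps):
    """Pick an eps-subgradient of |.| at t from the case formula above."""
    if t > eps / 2:
        return 1.0 - eps / t            # an admissible inexact choice
    elif t < -eps / 2:
        return -1.0 + eps / abs(t)
    else:
        return 0.0                      # any value in [-1, 1] is admissible here

def eps_subgrad_lasso(A_i, b_i, rho, x, eps):
    """An eps-subgradient of f_i(x) = 0.5*||A_i x - b_i||^2 + rho*||x||_1.

    The smooth part contributes its exact gradient; splitting eps evenly
    across the n components of the l1 part keeps the total error below eps.
    """
    smooth = A_i.T @ (A_i @ x - b_i)
    eps_per_comp = eps / (rho * x.size)
    l1 = np.array([eps_subgrad_abs(t, eps_per_comp) for t in x])
    return smooth + rho * l1
```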
For simulations, we fix the problem dimensions and the regularization parameter $\rho$. The communication graph is given as Fig. 1 with unity weights. To make the problem more interesting, we assume it is subject to set constraints specified by the sets $X_i$. Assumptions 1 and 2 can be confirmed. To ensure the condition (13), we choose diminishing step sizes $\alpha(k)$ and errors $\epsilon_i(k)$ satisfying $\sum_{k=0}^{\infty}\alpha(k)\,\epsilon(k) < \infty$. Then, the considered problem can be solved by our proposed distributed primal-dual $\epsilon$-subgradient method (PDSM) (4) and the normalized primal-dual $\epsilon$-subgradient method (NPDSM) (16) according to Theorem 2 and Corollary 1. In simulations, the $\epsilon$-subgradient at each iteration is selected from $\partial_\epsilon|\cdot|$ according to the three cases displayed above.
Simulation results for (4) and (16) are shown in Figs. 2–3, where all agents' primal variables are observed to converge to the global optimal solution while the dual variables remain bounded and convergent. This verifies the effectiveness of our algorithms. Moreover, one can find that although the normalized step size in (16) might slow down the convergence speed compared with (4), the resultant transient performance of the primal and dual variables is much improved, with fewer and weaker oscillations. For a clear comparison, we also examine the residual errors of both algorithms; their profiles are shown in Fig. 4. From this, we can also confirm the improvement of transient performance by the proposed componentwise normalized step size.
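For reproducibility, a hypothetical driver along these lines ties the earlier sketches together; the random placeholder data, box constraints, and harmonic sequences are all assumptions and not the paper's actual setup. It assumes `pdsm_step` and `eps_subgrad_lasso` from the sketches above are in scope.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 4, 3
A_data = [rng.normal(size=(5, n)) for _ in range(N)]       # placeholder LASSO data
b_data = [rng.normal(size=5) for _ in range(N)]
rho = 0.1
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
                [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)   # path graph, unity weights

x = np.zeros((N, n))
lam = np.zeros((N, n))
alpha = lambda k: 1.0 / (k + 1)   # satisfies (11)
eps = lambda k: 1.0 / (k + 1)     # makes sum_k alpha(k)*eps(k) finite, cf. (13)

for k in range(5000):
    x, lam = pdsm_step(
        x, lam, k, alpha,
        eps_subgrad=lambda i, xi: eps_subgrad_lasso(A_data[i], b_data[i], rho, xi, eps(k)),
        project=lambda i, v: np.clip(v, -10.0, 10.0),       # box sets X_i = [-10, 10]^n
        A=adj)

print(np.max(np.abs(x - x.mean(axis=0))))  # consensus error across agents
```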
6 Conclusion
In this paper, we have attempted to solve a distributed constrained optimization problem with inexact subgradient information of the local objective functions. We have developed a projected primal-dual dynamics using only $\epsilon$-subgradients and discussed its convergence properties. In particular, we have shown the exact solvability of this problem if the accumulated error introduced by subgradient inexactness is not too large. We have also presented a novel distributed normalized step size to improve the transient performance of our algorithms. It is interesting to consider more general graphs in future work.
References
- [1] Nedić A, Liu J, Distributed optimization for control, Annual Review of Control, Robotics, and Autonomous Systems, 2018, 1: 77–103.
- [2] Yang T, Yi X, Wu J, et al., A survey of distributed optimization, Annual Reviews in Control, 2019, 47: 278–305.
- [3] Bertsekas D, Convex Optimization Algorithms, Athena Scientific, Belmont, 2015.
- [4] Devolder O, Glineur F, Nesterov Y, First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, 2014, 146(1-2): 37–75.
- [5] Kiwiel K, Convergence of approximate and incremental subgradient methods for convex optimization, SIAM Journal on Optimization, 2004, 14(3): 807–840.
- [6] Jakovetić D, Bajović D, Xavier J, Moura J, Primal–Dual Methods for Large-Scale and Distributed Convex Optimization and Data Analytics, Proceedings of the IEEE, 2020, 108(11): 1923–1938.
- [7] Nedić A, Ozdaglar A, Distributed subgradient methods for multi-agent optimization, IEEE Transactions on Automatic Control, 2009, 54(1):48–61.
- [8] Jakovetić D, Moura J, Xavier J, Linear convergence rate of a class of distributed augmented Lagrangian algorithms, IEEE Transactions on Automatic Control, 2014, 60(4): 922–936.
- [9] Yi P, Hong Y, Liu F, Distributed gradient algorithm for constrained optimization with application to load sharing in power systems, Systems & Control Letters, 2015, 83: 45–52.
- [10] Lei J, Chen H, Fang H, Primal–dual algorithm for distributed constrained optimization, Systems & Control Letters, 2016, 96:110–117.
- [11] Xi C, Khan U, Distributed subgradient projection algorithm over directed graphs, IEEE Transactions on Automatic Control, 2016, 62(8):3986–3992.
- [12] Liu S, Qiu Z, Xie L, Convergence rate analysis of distributed optimization with projected subgradient algorithm, Automatica, 2017, 83:162–169.
- [13] Zeng X, Yi P, Hong Y, Distributed continuous-time algorithm for constrained convex optimizations via nonsmooth analysis approach, IEEE Transactions on Automatic Control, 2017, 62(10): 5227–5233.
- [14] Zhu K, Zhu H, Tang Y, On the Boundedness of Subgradients in Distributed Optimization, Proceedings of 39th Chinese Control Conference CCC, Shenyang, 2020, 4912–4917.
- [15] Polyak B, Introduction to Optimization, Optimization Software Inc., New York, 1987.
- [16] Liu Q, Yang S, Hong Y, Constrained consensus algorithms with fixed step size for distributed convex optimization over multiagent networks, IEEE Transactions on Automatic Control, 2017, 62(8): 4259–4265.
- [17] Mesbahi M, Egerstedt M, Graph Theoretic Methods in Multiagent Networks, Princeton University Press, Princeton, 2010.
- [18] Ruszczynski A, Nonlinear Optimization, Princeton University Press, Princeton, 2011.
- [19] Wang J, Elia N, Control approach to distributed optimization, Proceedings of 48th Annual Allerton Conference on Communication, Control, and Computing, Monticello, 2010, 557–561.
- [20] Gharesifard B, Cortés J, Distributed continuous-time convex optimization on weight-balanced digraphs, IEEE Transactions on Automatic Control, 2014, 59(3): 781–786.
- [21] Kia S, Cortés J, Martínez S, Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication, Automatica, 2015, 55: 254–264.
- [22] Nedić A, Ozdaglar A, Subgradient methods for saddle-point problems, Journal of Optimization Theory & Applications, 2009, 142(1): 205–228.
- [23] Bauschke H, Combettes P, Convex Analysis and Monotone Operator Theory in Hilbert Spaces (2nd ed.), Springer, Cham, 2017.
- [24] Alber Y, Iusem A, Solodov M, On the projected subgradient method for nonsmooth convex optimization in a Hilbert space, Mathematical Programming, 1998, 81(1): 23–35.
- [25] Jakovetić D, Xavier J, Moura J, Fast distributed gradient methods, IEEE Transactions on Automatic Control, 2014, 59(5): 1131–1146.
- [26] Oliva G, Setola R, Hadjicostis C, Distributed finite-time average-consensus with limited computational and storage capability, IEEE Transactions on Control of Network Systems, 2016, 4(2): 380–391.
- [27] Nejad B, Attia S, Raisch J, Max-consensus in a max-plus algebraic setting: The case of fixed communication topologies, Proceedings of 2009 XXII International Symposium on Information, Communication and Automation Technologies, Sarajevo, 2009, 1–7.
- [28] Boyd S, Mutapcic A. Subgradient Methods, Notes for EE364b, Stanford University, 2008.