Distributed Optimization with Projection-free Dynamics
Abstract
We consider continuous-time dynamics for distributed optimization with set constraints. To avoid the computational complexity of projection-based dynamics, which must solve a general quadratic subproblem for each projection, we propose a distributed projection-free dynamics by employing the Frank-Wolfe method, also known as the conditional gradient algorithm. The method finds a feasible descent direction by solving a linear optimization instead of a quadratic one. To make the approach applicable over weight-balanced digraphs, we design one dynamics for the consensus of local decision variables and another for auxiliary variables that track the global gradient. We then prove the convergence of the dynamical systems to the optimal solution, and provide detailed numerical comparisons with both projection-based dynamics and other distributed projection-free algorithms. We also derive a distributed discrete-time scheme following the ideas of the proposed dynamics and establish its convergence rate.
Index Terms:
distributed optimization, Frank-Wolfe method, projection-free algorithm, constraint.
I Introduction
Distributed optimization and its applications have attracted a large amount of research attention in the past decade. Under multi-agent frameworks, the global objective function consists of agents’ local objective functions, and each agent shares limited amounts of information with neighbors through the networks to achieve an optimal solution. Both discrete-time algorithms [1, 2, 3, 4] and continuous-time algorithms [5, 6, 7, 8, 9, 10] are extensively developed for solving distributed problems.
Among continuous-time algorithms, projection-based dynamics have been widely adopted to solve distributed optimization with constraints, building on well-developed theory in nonlinear optimization [11, 12]. Various projection-based dynamics have been designed with techniques from dynamical systems and control theory. For example, [6] proposed a proportional-integral protocol to solve distributed constrained optimization, while [7] proposed distributed dynamics whose projection maps are with respect to tangent cones. However, projection-based design implies that each agent must solve a quadratic subproblem whenever a variable needs its nearest point in a set. When the constraints have complex structure, the computational cost of these quadratic subproblems discourages agents from employing projection-based approaches, particularly for high-dimensional constraints.
Meanwhile, though the primal-dual method is another common tool for distributed constrained optimization [4, 13, 14, 15, 16], its difficulties and challenges stem from the high dimension of the algorithm, since the dimension of the dual dynamics is as high as that of the constraints. This brings extra burden in local storage, algorithm iteration, and distributed communication. Furthermore, practical constraints are usually sparse, which means substantial computational resources may be spent iterating the additional dual dynamics. Nowadays, for optimization with large-scale data or constraints, such as sparse PCA, structural SVM, and matrix completion, researchers are eagerly seeking efficient and specialized tools for optimization with high-dimensional constraints, rather than common approaches like projection-based or primal-dual algorithms.
Fortunately, the well-known Frank-Wolfe (FW) method [17], also known as the conditional gradient method, provides a helpful idea. Briefly speaking, the FW method uses a linearized function to approximate the objective function and derives a feasible descent direction by solving an optimization problem with a linear objective. Thanks to linear programming toolboxes, the feasible descent direction can be computed efficiently when the constraints are high-dimensional polyhedrons, which can often serve as approximations of general convex sets [18]. This process avoids general projection operations and the corresponding quadratic subproblems in algorithm iterations. In fact, the FW method suits various sparse optimization problems with atomic constraints [19], among which the linear constraint is the most comprehensible one. Though proposed as early as 1956 [17], the FW method has regained much attention in recent years due to its computational advantage for optimization with large-scale sparse constraints, covering wide applications such as image processing in machine learning and data analysis in bioinformatics [21, 22, 23, 20]. Also, there have been extensive investigations of the FW method since then, such as rate analysis over strongly convex sets in [24], online learning over networks in [25], and quantized FW for lower communication in [26].
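As a concrete, minimal illustration of the FW idea (a centralized sketch under assumed data, not the distributed algorithm proposed in this paper), the following Python snippet minimizes a simple quadratic over the unit $\ell_1$-ball, where the linear subproblem is solved in closed form at a vertex; the objective, radius, and the step-size rule $2/(k+2)$ are illustrative assumptions.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over {||s||_1 <= radius}:
    the minimizer of <grad, s> is a signed vertex, found in one pass."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

# Centralized Frank-Wolfe on f(x) = 0.5 * ||x - b||^2 (assumed data).
b = np.array([0.6, -0.2, 0.1])       # optimum lies inside the ball
x = np.zeros(3)
for k in range(300):
    grad = x - b                     # gradient of the quadratic
    s = lmo_l1_ball(grad)            # feasible descent vertex
    gamma = 2.0 / (k + 2)            # classical diminishing step
    x = x + gamma * (s - x)          # convex combination stays feasible
```

Note that each iterate is a convex combination of feasible points, so no projection is ever needed.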
Motivated by the above, we aim to design a projection-free dynamics based on the FW method for solving distributed constrained optimization in this paper. Agents have their own local objective functions and need to achieve the optimal solution via communicating with neighbors over networks. The main contributions are as follows.
Contributions
Inspired by the early work [27], we design a novel FW-based distributed dynamics for agents to solve the constrained optimization. Compared with the dynamics in [6, 7], a feasible descent direction is derived by solving an optimization problem with a linear objective. Hence, the dynamics avoids the complicated quadratic subproblems caused by projection operations onto set constraints, which leads to a projection-free dynamics. Our distributed projection-free dynamics also induces a novel discrete-time format, where averaging consensus is employed both to ensure the consensus of local decision variables and to help auxiliary variables track the global gradient. This differs from the mechanism in the decentralized discrete-time FW algorithm of [28]. Hence, we develop a novel convergence analysis with Lyapunov theory and comparison theorems, and provide an analysis of the convergence rate. Moreover, compared with the projected dynamics in [6, 7] and the discrete-time FW algorithm in [28], our distributed projection-free dynamics applies over communication networks described by weight-balanced digraphs.
Paper Organization
The organization of this paper is given as follows. Section II formulates the distributed constrained optimization with basic assumptions, and presents a projection-free algorithm with the corresponding discretization. Then Section III reports the main results, including the consensus of decision variables, global gradient tracking and convergence of the continuous-time algorithm, as well as the convergence rate of the induced discrete scheme. Finally, Section IV shows numerical examples with a comparison to the existing algorithms, while Section V provides some concluding remarks and discussions on future work.
Notations
Here we give some necessary notations used in this paper. Denote $\mathbb{R}^n$ ($\mathbb{R}^{n\times m}$) as the set of $n$-dimensional real column vectors ($n$-by-$m$ real matrices), $\mathbf{1}_n$ (or $\mathbf{0}_n$) as the $n$-dimensional column vector with all elements equal to one (or zero), and $A \otimes B$ as the Kronecker product of matrices $A$ and $B$. In addition, take $\|\cdot\|$ as the Euclidean norm of vectors, and $\|\cdot\|_F$ as the Frobenius norm of real matrices, defined by $\|M\|_F = \sqrt{\sum_{i,j} M_{ij}^2}$.
II Distributed Projection-free Dynamics
In this section, we formulate the constrained distributed optimization, and then propose the distributed projection-free dynamics, as well as a discrete scheme derived from the continuous-time algorithm.
II-A Problem Formulation
Consider $N$ agents indexed by $i \in \mathcal{I} = \{1, \dots, N\}$. For agent $i$, there is a local cost function $f_i : \mathbb{R}^n \to \mathbb{R}$ on the feasible set $\Omega \subseteq \mathbb{R}^n$. The global cost function is $f(x) = \sum_{i=1}^{N} f_i(x)$.

All agents aim to solve the constrained optimization:

$$\min_{x \in \Omega} \; f(x) = \sum_{i=1}^{N} f_i(x). \tag{1}$$

In a multi-agent network, the $i$th agent controls a local decision variable $x_i$ to search the optimal solution of (1). Also, the information of local cost functions is regarded as private knowledge. The agents communicate with their neighbors through a network described by a digraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of nodes (regarded as agents here) and $\mathcal{E}$ is the set of edges. $\mathcal{A} = [a_{ij}]$ is the adjacency matrix with $a_{ij} > 0$ if and only if $(j, i) \in \mathcal{E}$, which means that agent $j$ can send information to agent $i$, and $a_{ij} = 0$, otherwise. The Laplacian matrix is $\mathcal{L} = \mathcal{D} - \mathcal{A}$, where $\mathcal{D}$ is diagonal with $\mathcal{D}_{ii} = \sum_{j=1}^{N} a_{ij}$, for any $i$. A digraph is strongly connected if there exists at least one directed path between any pair of vertices, and is weight-balanced if $\sum_{j=1}^{N} a_{ij} = \sum_{j=1}^{N} a_{ji}$ for all $i \in \mathcal{V}$.
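The graph conditions above can be checked numerically. The sketch below builds the Laplacian of a hypothetical directed 3-cycle with unit weights (an assumed example) and verifies weight balance: in-degree equals out-degree at every node, equivalently the all-ones vector is a left null vector of the Laplacian.

```python
import numpy as np

# Assumed adjacency matrix of a directed 3-cycle with unit weights,
# under the convention a_ij > 0 iff agent j can send information to agent i.
A = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

D = np.diag(A.sum(axis=1))   # diagonal in-degree matrix
Lap = D - A                  # graph Laplacian

in_deg = A.sum(axis=1)       # total weight received by each node
out_deg = A.sum(axis=0)      # total weight sent by each node
balanced = np.allclose(in_deg, out_deg)           # weight-balanced?
left_null = np.allclose(np.ones(3) @ Lap, 0.0)    # 1^T Lap = 0?
```

Row sums of a Laplacian are always zero; weight balance is the extra property that the column sums vanish as well.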
In addition, the gradient of a differentiable function $h$ is $L$-Lipschitz on a convex set $\Omega$ with a constant $L > 0$, if

$$\|\nabla h(x) - \nabla h(y)\| \le L \|x - y\|, \quad \forall\, x, y \in \Omega.$$

Also, for convex $h$, the above is equivalent to the following quadratic upper bound:

$$h(y) \le h(x) + \nabla h(x)^\top (y - x) + \frac{L}{2}\|y - x\|^2, \quad \forall\, x, y \in \Omega.$$
Then we consider solving distributed optimization (1) under the following assumptions.
Assumption 1
• The feasible set $\Omega$ is convex, compact and nonempty.

• For each $i$, the local cost $f_i$ is convex and differentiable, and $\nabla f_i$ is $L$-Lipschitz on $\Omega$.

• The digraph $\mathcal{G}$ is strongly connected and weight-balanced.
II-B Distributed Projection-free Dynamics
To solve the distributed optimization (1), we intend to explore a projection-free approach following the FW method. In fact, continuous-time algorithms provide evolved dynamics, which help reveal how each variable achieves the optimal solution with given directions. Also, the analysis techniques in modern calculus and nonlinear systems may lead to comprehensive results with mild assumptions in continuous-time schemes. Hence, we firstly propose a novel projection-free dynamics with the FW method in the following Algorithm 1, which differs from the projection-based continuous-time algorithms in [6, 7].
Initialization:
, , , .
Flows renewal:
where is a positive time-varying parameter satisfying
Algorithm 1 is distributed, since the dynamics of the $i$th agent is influenced only by the local values of its own and its neighbors' variables. Specifically, the $i$th agent uses its local decision variable to estimate the optimal solution, and its local linear-subproblem solution as a conditional gradient. Since each agent is merely capable of calculating its own local gradient rather than the global gradient, the auxiliary variable operates two processes simultaneously: one is to incorporate agent $i$'s local gradient, and the other is to achieve consensus with neighbors' local gradients in order to estimate the global gradient. In fact, the gradient tracking method in [30, 31] motivates our algorithm design. Although the time-varying parameter seems to be global, its value is easy to determine for all agents by merely selecting a common decreasing function of time. This is analogous to how other FW-based works handle such parameters [28, 25].
II-C Discretization
So far, we have proposed a distributed projection-free dynamics to avoid solving the complex subproblems caused by projections. Following the instructive ideas behind the above distributed continuous-time Algorithm 1, we now derive an implementable discretization and compare it with the decentralized discrete-time FW algorithm in [28]. To this end, we present the discretization of Algorithm 1 by adopting the notations in [28]. To remain consistent with [28], set the network as undirected and connected, and the adjacency matrix as symmetric and doubly stochastic. Let be a fixed step-size in the discretization, and denote . Take the weighted average of agent 's neighbors in the network by
and similarly,
Then, consider the ODE in Algorithm 1
The corresponding difference equation is
Therefore, the discretization of Algorithm 1 gives
(2) |
For clarification, the discrete-time FW in [28] is given as
(3) |
The discretization above reveals that the major difference between (2) and (3) lies in the update protocol of the gradient-tracking variables. In (2), each agent uses both its neighbors' gradient values and its own gradient renewal to track the global gradient. In (3), each agent first gathers the average of its neighbors' decision variables to compute the local gradient, and then averages its neighbors' gradient values again to estimate the global gradient. In other words, the consensus format in (3) requires agents to sequentially collect their neighbors' decision variables and auxiliary variables, which results in more communication burden and data storage. Considering that constructing a reliable communication link is time-consuming and energy-consuming in practice, (2) overcomes this weak point by letting agents collect both variables simultaneously. Since the mechanism of Algorithm 1 differs from that in [28], we need to explore new tools for analysis. In addition, we apply Algorithm 1 over weight-balanced digraphs as a generalization of the undirected connected graphs in [28], which brings correspondingly more technical challenges.
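To make the simultaneous-collection mechanism concrete, here is a hedged Python sketch in the spirit of scheme (2): agents mix decision variables and gradient-tracking variables in the same round, take a local FW step from the tracked gradient, and renew the tracker with their own gradient increment. The quadratic costs, mixing matrix, constraint set, and step-size below are illustrative assumptions rather than the paper's exact update.

```python
import numpy as np

# Illustrative setup: 3 agents with f_i(x) = 0.5*||x - b_i||^2,
# feasible set = unit l1-ball, doubly stochastic mixing matrix W.
B = np.array([[0.5, -0.4], [0.1, 0.2], [0.0, -0.1]])   # local targets b_i
W = np.full((3, 3), 1.0 / 3.0)                          # perfect mixing

def lmo(grad):
    # argmin of <grad, s> over the unit l1-ball: a signed vertex
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -np.sign(grad[i])
    return s

X = np.zeros((3, 2))                  # decision variables x_i
G = X - B                             # gradients of f_i at x_i
Y = G.copy()                          # gradient-tracking variables y_i

for k in range(500):
    gamma = 2.0 / (k + 2)
    Xbar = W @ X                      # mix neighbors' decisions
    S = np.vstack([lmo(Y[i]) for i in range(3)])
    Xnew = Xbar + gamma * (S - Xbar)  # FW step along local direction
    Gnew = Xnew - B
    Y = W @ Y + Gnew - G              # mix trackers + local gradient renewal
    X, G = Xnew, Gnew
```

Under this assumed data the unconstrained minimizer of the sum, the mean of the $b_i$, lies inside the ball, so all agents should approach it while staying feasible without any projection.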
III Convergence Analysis
III-A Convergence of Algorithm 1
For convenience of statements in the following, we first provide some compact notations here. Take
and . Also, define the profile of gradients
Equivalently, Algorithm 1 can be expressed in the following compact form
(4) |
where with
Hereupon, we can derive the following results for the convergence guarantees of Algorithm 1.
Theorem 1
The following two lemmas are necessary for the analysis of Algorithm 1, whose proofs can be found in the appendix.
Lemma 1
Under Assumption 1, if for all , then for all and for all , .
Lemma 2
Given scalars , , and , if , , and
then .
In this way, we give the proof of Theorem 1.
Proof.
i). It follows from Lemma 1 that . Moreover, since is chosen from , it implies that is bounded. Because as , we have
Thus, the dynamics for decision variables in Algorithm 1 can be written as
(5) |
where . According to the existing results in [32, Section 4.3], all decision variables in (5) reach consensus, i.e.,
which yields the result.
ii). Set , and let us investigate
Considering the orthogonal decomposition in the subspace and its complementary space , for , define
and
where
and . Since , clearly, we further obtain that
and
Due to the weight-balanced digraph ,
Together with the initial condition for , we can verify that
Hence,
or equivalently,
It follows from that . Therefore,
which indicates that
Furthermore, let us investigate the function
and consider its derivative along Algorithm 1 in the following.
where . By Assumption 1, the digraph is strongly connected and weight-balanced, which yields
Hence,
where is the smallest positive eigenvalue of , and the last inequality follows from the fact and Rayleigh quotient theorem [33, Page 234]. Moreover,
(6) | ||||
It follows from Assumption 1 that is -Lipschitz on , which leads to the boundedness of . Thus,
So far, since achieves consensus and as , we have
which indicates that
Recall the properties that and . Therefore, we learn from Lemma 2 that (III-A) yields
which implies that all variables track the dynamical global gradient.
iii). Suppose that is an optimal solution to problem (1) and denote . Take
Consider the following function
Clearly, . Then we investigate its derivative.
where the last equality holds since . Then,
Recall the derivation of , or equivalently,
which implies that,
On this basis, take , and thus,
By the convexity of the cost functions,
Meanwhile, it follows from that
Since , there exists a constant such that and
Therefore,
Recall that we have already proved
as well as
Moreover, the positive parameter satisfies
Hence, by Lemma 2 again, we have
III-B Convergence Rate of Discretization
To compare our discretization (2) with the discrete-time algorithm (3) in [28], we provide theoretical guarantees on the convergence rate of discretization (2). Similarly, here we need some new notations to describe the discrete scheme. Take , and analogously,
Also, denote the averaged global gradients by

Set as the second largest eigenvalue of the adjacency matrix . Select as the smallest positive integer such that
(7) |
The existence of is guaranteed because the adjacency matrix is symmetric and doubly stochastic [28]. Then we have the following result on the convergence rate of discrete algorithm (2).
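As a small numerical check of the role of this eigenvalue (a generic sketch with an assumed 4-agent lazy ring, not the paper's exact condition (7)), one averaging step with a symmetric doubly stochastic matrix contracts the consensus error by a factor of the second largest eigenvalue magnitude:

```python
import numpy as np

# Assumed symmetric, doubly stochastic mixing matrix of a 4-agent lazy ring.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
lam2 = eigs[1]                       # second largest eigenvalue magnitude

# Averaging contracts disagreement: ||W x - xbar|| <= lam2 * ||x - xbar||.
x = np.array([1.0, -1.0, 2.0, 0.0])
xbar = x.mean()                      # preserved, since 1^T W = 1^T
err_before = np.linalg.norm(x - xbar)
err_after = np.linalg.norm(W @ x - xbar)
```

Connectivity of the underlying graph is what keeps this factor strictly below one.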
Theorem 2
Under Assumption 1, set , , and as the diameter of . For any , there exist constants
and
such that
i). all decision variables achieve consensus, i.e.,

ii). all variables track the global gradient, i.e.,

iii). the average converges to an optimal solution with a sublinear rate, i.e.,
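For context, the sublinear rate in item iii) can be compared with the classical centralized Frank-Wolfe guarantee (a standard recap under the usual step-size $\gamma_k = 2/(k+2)$, not the paper's exact bound):

$$
f(x_k) - f(x^\star) \;\le\; \frac{2 C_f}{k+2},
\qquad C_f \le L\,\operatorname{diam}(\Omega)^2,
$$

where $C_f$ is the curvature constant of the objective over the compact feasible set $\Omega$, and $L$ is the Lipschitz constant of the gradient.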
Proof.
i). Recall the compactness of ; hence this naturally holds for . For induction, assume
for some . Then, we can verify that
where and the last inequality follows from (7). Thus, we have the conclusion according to induction.
ii). Similarly, we can verify this for . For induction, assume
for some . Then we can verify that
(8) |
The first term in (8) satisfies
The second term in (8) satisfies
Note that
So far, we have
Next, the third term in (8) satisfies
To sum up, we obtain that
where the last inequality follows from (7) and the definitions of constants and . Hence, we get the conclusion according to induction.
iii). Take
It follows from (2) that
Since is -Lipschitz and is compact,
(9) |
For , because ,
Thus, take
Together with the optimality of and the convexity of ,
(10) |
In addition,
(11) |
When , it follows from the conclusions obtained in i) and ii) that (III-B) becomes
(12) |
By substituting (III-B) and (12) into (9), we further have
which yields
Consequently, by [34, Lemma 4], we have convergence rate for , that is,
This completes the proof.
IV Numerical Examples

Tab. I: Average running time of solving the subproblems (linear or quadratic programs).

| dimensions | n=16 | n=64 | n=256 | n=1024 | n=4096 |
| --- | --- | --- | --- | --- | --- |
| DCPf (msec) | 11.5 | 12.0 | 12.6 | 13.3 | 14.1 |
| DeFW (msec) | 11.5 | 12.1 | 12.8 | 13.1 | 14.4 |
| DCPb-Zeng (msec) | 8.7 | 13.2 | 19.8 | 27.9 | 40.7 |
| DCPb-Yang (msec) | 8.7 | 13.1 | 19.5 | 28.4 | 43.0 |



In this section, we carry out some experiments to illustrate the effectiveness of our approach. We first take a simple example with only four agents and two-dimensional decision variables to illustrate the trajectories of Algorithm 1. Set the local cost functions as

The feasible set is

The initial locations are as follows,

Under this circumstance, the global optimal solution should be the origin of the Euclidean plane. We employ a directed ring graph as the communication network. Fig.1 shows the trajectories in the plane. The boundaries of the feasible set are in black, while the trajectories of the four agents' decision variables are shown in different colors.
Next, we show the effectiveness of our distributed projection-free dynamics by comparisons. The number of agents is increased to . As shown in [19], the FW method works better than projection-based algorithms when the vertices on the boundary of the constraint set are easy to find in high-dimensional decision spaces. Thus, along with quadratic local cost functions, we set the constraint set as . Then we choose different dimensions of the decision variables and compare our distributed continuous-time projection-free algorithm (DCPf) with two distributed continuous-time projection-based algorithms, (DCPb-Yang) by [6] and (DCPb-Zeng) by [7], and a decentralized discrete-time FW algorithm (DeFW) by [28]. Since all these distributed algorithms are suitable for undirected graphs, we set an undirected ring graph as the communication network.
We set the dimensions of the decision variables as powers of two. In Fig.2, the -axis shows the real running time (CPU time) in seconds, while the -axis shows the optimal solution error of each algorithm. We learn from Fig.2 that as the dimension increases, the real running time (CPU time) of the projection-based algorithms becomes obviously longer than that of the projection-free ones, because searching for vertices on the boundary of a high-dimensional constraint set (solving a linear program) is faster than calculating a projection onto it (solving a quadratic program). Moreover, we observe from Fig.2 that our DCPf is not inferior to DeFW over connected undirected graphs.
Furthermore, in Tab.I, we list the average real running time of solving the subproblems, i.e., the linear or quadratic programs. When the dimension is low, solving linear programs may take more time than solving quadratic programs over such constraint sets. However, as the dimension increases, solving the quadratic programs becomes difficult, while the time for solving the linear programs remains almost the same. This conforms with the advantage of projection-free approaches.
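The contrast between the two kinds of subproblems can be reproduced in miniature. The sketch below (an illustrative assumption; the experiment above uses a different constraint set) takes the unit simplex, where the FW linear oracle is a single argmin over coordinates, while the Euclidean projection requires a sort-based threshold search:

```python
import numpy as np

def lmo_simplex(grad):
    """Linear oracle over the unit simplex: a single argmin (O(n))."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def project_simplex(v):
    """Euclidean projection onto the unit simplex (O(n log n) sort)."""
    u = np.sort(v)[::-1]                 # sort descending
    css = np.cumsum(u) - 1.0             # shifted cumulative sums
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u > css / idx)[0][-1]
    theta = css[rho] / (rho + 1.0)       # water-filling threshold
    return np.maximum(v - theta, 0.0)
```

On the simplex, the oracle costs O(n) against O(n log n) for the projection; for general polyhedra the gap widens to a linear program against a quadratic program.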
V Discussions
In this paper, we developed a novel projection-free dynamics for solving distributed optimization with constraints. By employing the Frank-Wolfe method, we found a feasible descent direction by solving an optimization problem with a linear objective, which avoids solving the high-dimensional subproblems arising in projection-based algorithms. In the distributed dynamics, the decision variables achieve consensus while the local auxiliary variables track the global gradient. Then we analyzed the convergence of the projection-free dynamics via Lyapunov theory. As for the induced discrete-time format, we proved its convergence rate. Finally, we provided comparative illustrations to show the efficiency of our algorithm.
In the future, we may focus on some potential challenges to improve the results in this paper. First, whether fixed constants or self-adaptive settings are suitable for the parameter in the algorithm. Second, more advanced approaches should be investigated for directly analyzing the convergence rate of the continuous-time algorithm itself. Third, whether more general formulations of distributed constrained optimization, such as resource allocation problems with coupled constraints, can adopt analogous projection-free algorithms. Finally, owing to the advantages of the FW method itself, such projection-free approaches fit a class of constrained optimization problems well, including polyhedrons, unit simplices, and unit squares. However, the approach is not always efficient for all kinds of constraints, and how to adapt similar ideas to handle diverse constraints remains a challenging but interesting task. Perhaps mixed-projection schemes for distributed optimization with different constraints will be developed in the near future.
Appendix
Proof of Lemma 1
For a convex set and , denote the normal cone to at by
and the tangent cone to at by
Let as the projection on at point , which yields that
Consider for and some . Since , for , are also selected from , we have
and similarly,
Hence, it follows from the dynamics of Algorithm 1 that
which leads to . On this basis, consider the energy function as
Its derivative along the dynamics of Algorithm 1 is
where the last inequality holds because normal cones and tangent cones are orthogonal. Since , we have . Together with and , we have at any time . This reveals that once all variables are located within for some , they will not escape. Therefore, recalling for , we complete the proof.
Proof of Lemma 2
Let us denote the function
which implies
Recall that and
After multiplying both sides of the inequality above by , we can further obtain that
Then we integrate the above on the segment by the Comparison Lemma in [11], which leads to the following inequality
Hereupon, we can give a detailed discussion based on the above result. On the one hand, if
then eventually. On the other hand, if
then it follows from L'Hôpital's rule that
which completes the proof.
References
- [1] A. Nedic, A. Ozdaglar, and P. A. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE Transactions on Automatic Control, vol. 55, no. 4, pp. 922–938, 2010.
- [2] D. Yuan, S. Xu, H. Zhao, and L. Rong, “Distributed dual averaging method for multi-agent optimization with quantized communication,” Systems & Control Letters, vol. 61, no. 11, pp. 1053–1061, 2012.
- [3] X. Li, G. Feng, and L. Xie, “Distributed proximal algorithms for multi-agent optimization with coupled inequality constraints,” IEEE Transactions on Automatic Control, 2020.
- [4] D. Yuan, D. W. Ho, and S. Xu, “Regularized primal–dual subgradient method for distributed constrained optimization,” IEEE Transactions on Cybernetics, vol. 46, no. 9, pp. 2109–2118, 2015.
- [5] X. Wang, Y. Hong, and H. Ji, “Distributed optimization for a class of nonlinear multiagent systems with disturbance rejection,” IEEE Transactions on Cybernetics, vol. 46, no. 7, pp. 1655–1666, 2015.
- [6] S. Yang, Q. Liu, and J. Wang, “A multi-agent system with a proportional-integral protocol for distributed constrained optimization,” IEEE Transactions on Automatic Control, vol. 62, no. 7, pp. 3461–3467, 2016.
- [7] X. Zeng, P. Yi, and Y. Hong, “Distributed continuous-time algorithm for constrained convex optimizations via nonsmooth analysis approach,” IEEE Transactions on Automatic Control, vol. 62, no. 10, pp. 5227–5233, 2016.
- [8] X. Yi, L. Yao, T. Yang, J. George, and K. H. Johansson, “Distributed optimization for second-order multi-agent systems with dynamic event-triggered communication,” in Conference on Decision and Control. IEEE, 2018, pp. 3397–3402.
- [9] X. Shi, J. Cao, G. Wen, and M. Perc, “Finite-time consensus of opinion dynamics and its applications to distributed optimization over digraph,” IEEE Transactions on Cybernetics, vol. 49, no. 10, pp. 3767–3779, 2018.
- [10] K. Lu, G. Jing, and L. Wang, “Distributed algorithms for searching generalized nash equilibrium of noncooperative games,” IEEE Transactions on Cybernetics, vol. 49, no. 6, pp. 2362–2371, 2018.
- [11] H. K. Khalil and J. W. Grizzle, Nonlinear Systems. Prentice Hall Upper Saddle River, NJ, 2002, vol. 3.
- [12] A. Ruszczynski, Nonlinear Optimization. Princeton University Press, 2011.
- [13] P. Yi, Y. Hong, and F. Liu, “Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and application to economic dispatch of power systems,” Automatica, vol. 74, pp. 259–269, 2016.
- [14] Y. Zhu, W. Yu, G. Wen, and G. Chen, “Projected primal–dual dynamics for distributed constrained nonsmooth convex optimization,” IEEE Transactions on Cybernetics, vol. 50, no. 4, pp. 1776–1782, 2018.
- [15] T. Yang, X. Yi, J. Wu, Y. Yuan, D. Wu, Z. Meng, Y. Hong, H. Wang, Z. Lin, and K. H. Johansson, “A survey of distributed optimization,” Annual Reviews in Control, vol. 47, pp. 278–305, 2019.
- [16] S. Liang, X. Zeng, G. Chen, and Y. Hong, “Distributed sub-optimal resource allocation via a projected form of singular perturbation,” Automatica, vol. 121, p. 109180, 2020.
- [17] M. Frank and P. Wolfe, “An algorithm for quadratic programming,” Naval Research Logistics Quarterly, vol. 3, no. 1-2, pp. 95–110, 1956.
- [18] G. Chen, Y. Ming, Y. Hong, and P. Yi, “Distributed algorithm for -generalized nash equilibria with uncertain coupled constraints,” Automatica, vol. 123, p. 109313.
- [19] M. Jaggi, “Revisiting Frank-Wolfe: Projection-free sparse convex optimization,” in International Conference on Machine Learning, 2013, pp. 427–435.
- [20] X. Jiang, X. Zeng, L. Xie, J. Sun, and J. Chen, “Distributed stochastic projection-free solver for constrained optimization,” arXiv preprint arXiv:2204.10605, 2022.
- [21] Z. Cui, H. Chang, S. Shan, and X. Chen, “Generalized unsupervised manifold alignment,” Advances in Neural Information Processing Systems, vol. 27, pp. 2429–2437, 2014.
- [22] L. Chapel, M. Z. Alaya, and G. Gasso, “Partial optimal tranport with applications on positive-unlabeled learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 2903–2913, 2020.
- [23] X. Zhang and S. Mahadevan, “A bio-inspired approach to traffic network equilibrium assignment problem,” IEEE Transactions on Cybernetics, vol. 48, no. 4, pp. 1304–1315, 2017.
- [24] D. Garber and E. Hazan, “Faster rates for the Frank-Wolfe method over strongly-convex sets,” in International Conference on Machine Learning, 2015, pp. 541–549.
- [25] W. Zhang, P. Zhao, W. Zhu, S. C. Hoi, and T. Zhang, “Projection-free distributed online learning in networks,” in International Conference on Machine Learning, 2017, pp. 4054–4062.
- [26] M. Zhang, L. Chen, A. Mokhtari, H. Hassani, and A. Karbasi, “Quantized Frank-Wolfe: Faster optimization, lower communication, and projection free,” in International Conference on Artificial Intelligence and Statistics, 2020, pp. 3696–3706.
- [27] M. Jacimovic and A. Geary, “A continuous conditional gradient method,” Yugoslav Journal of Operations Research, vol. 9, no. 2, pp. 169–182, 1999.
- [28] H.-T. Wai, J. Lafond, A. Scaglione, and E. Moulines, “Decentralized Frank-Wolfe algorithm for convex and nonconvex problems,” IEEE Transactions on Automatic Control, vol. 62, no. 11, pp. 5522–5537, 2017.
- [29] B. Gharesifard and J. Cortés, “Distributed continuous-time convex optimization on weight-balanced digraphs,” IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 781–786, 2013.
- [30] A. Nedic, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017.
- [31] S. Pu and A. Nedić, “Distributed stochastic gradient tracking methods,” Mathematical Programming, pp. 1–49, 2020.
- [32] Z. Qiu, S. Liu, and L. Xie, “Distributed constrained optimal consensus of multi-agent systems,” Automatica, vol. 68, pp. 209–215, 2016.
- [33] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 2012.
- [34] A. Rivet and A. Souloumiac, “Introduction to optimization,” in Optimization Software, Publications Division. Citeseer, 1987.