Distributed Optimization of Clique-Wise Coupled Problems via Three-Operator Splitting
Abstract
This study explores distributed optimization problems with clique-wise coupling via operator splitting and how this framework can be utilized for performance analysis and enhancement. The framework extends beyond conventional pairwise coupled problems (e.g., consensus optimization) and is applicable to broader examples. To this end, we first introduce a new distributed optimization algorithm by leveraging a clique-based matrix and the Davis-Yin splitting (DYS), a versatile three-operator splitting method. We then demonstrate that this approach sheds new light on conventional algorithms in the following way: (i) Existing algorithms (NIDS, Exact diffusion, diffusion, and our previous work) can be derived from our proposed method; (ii) We present a new mixing matrix based on clique-wise coupling, which surfaces when deriving the NIDS, and prove that it has a preferable eigenvalue distribution enabling fast consensus; (iii) These observations yield a new linear convergence rate for the NIDS with non-smooth objective functions. Remarkably, our linear rate is the first established for the general DYS with a projection onto a subspace; to our knowledge, this case is not covered by any prior results. Finally, numerical examples showcase the efficacy of our proposed approach.
1 Introduction
The last two decades have witnessed significant advances in distributed optimization. In the literature, a huge body of existing studies has been dedicated to pairwise coupled optimization problems, where every coupling of variables comprises two agents’ decision variables corresponding to the communication path (edge) between the two. Representative examples are consensus optimization [22, 28, 38, 20, 1, 33] and formation control [26]. Moreover, problems with globally coupled linear constraints [21] also belong to this class because their dual problems reduce to pairwise coupled consensus optimization.
To handle wider applications that involve complex coupling beyond edges, we address a more generic class of distributed optimization—clique-wise coupled optimization problems. This class has been introduced in our recent works [30, 31] with an emphasis on its generalization aspect. In this note, we elucidate additional benefits of this class of problems for performance enhancement and analysis via a new algorithm based on a three-operator splitting [13]. This class of problems is formulated as follows: Consider a multi-agent system with agents over a time-invariant undirected graph with and an edge set . Let represent the dimensional decision variable of agent . Then, the following is called a clique-wise coupled optimization problem:
(2) |
where for all , is -smooth and convex, and is proper, closed, and convex. For all , is -smooth and convex, and is proper, closed, and convex. For the set , let denote . Here, the set represents a clique, i.e., a complete subgraph in [7]. The set is a subset of , where is the index set of all the cliques in . For example, in the undirected graph in Fig. 1, we have and .
[Fig. 1: an example undirected graph and its cliques; (a) pairwise (edge) coupling, (b) a clique coupling three nodes.]
As mentioned above, an immediate benefit of Problem (2) is that it can handle variable couplings of more than two agents. As illustrated in Fig. 1, the cliques in (b) can deal with the coupling of three nodes , unlike the edges in (a). Indeed, Problem (2) always contains conventional pairwise coupled optimization problems since nodes and edges are also cliques. Beyond pairwise coupled problems, other possible applications include, for example, (i) clique-wise coupled linear constraints [30, 31, 19] (e.g., resource allocation in Section 6), (ii) sparse SDP [29] (e.g., distributed design of decentralized controllers [42], sensor network localization [2], etc.), (iii) regularization accounting for network structures (e.g., network lasso [15]), and (iv) approximation of trace norm minimization problems (e.g., multi-task learning [40], robust PCA [9], etc.).
This note addresses Problem (2) using the Davis-Yin Splitting (DYS) and reveals that the notion of clique-wise coupling is beneficial for analyzing and improving convergence performance. The DYS is a versatile three-operator splitting scheme that generalizes basic operator-splitting methods (e.g., the forward-backward and Douglas-Rachford splittings). Firstly, we reformulate Problem (2) by introducing a matrix called the clique-wise duplication (CD) matrix. This matrix lifts Problem (2) to a tractable separated form for algorithm design. Then, applying the DYS, we derive the proposed algorithm called the clique-based distributed DYS (CD-DYS). Subsequently, we demonstrate that the CD-DYS generalizes several existing algorithms, encompassing the celebrated NIDS [20]. Then, we analyze a new mixing matrix that naturally comes up in deriving the NIDS and show a preferable distribution of its eigenvalues. Moreover, we present a new linear convergence rate for the NIDS with non-smooth terms by proving a more general linear rate for the DYS with a projection onto a subspace under strong convexity of the smooth term. Finally, numerical examples illustrate the effectiveness of the proposed approach.
Our contributions can be summarized as follows. (i) We propose a new distributed algorithm, CD-DYS, for Problem (2), which is applicable to broad examples ranging from optimization to control and learning problems; (ii) Our investigation of consensus optimization as a clique-wise coupled problem reveals that several conventional distributed optimization methods, including the NIDS [20], can be derived from the proposed CD-DYS method, which leads to a new linear convergence rate for the NIDS with non-smooth objective functions. This linear rate admits larger stepsizes than those in [1, 33]. It is worth mentioning that our linear convergence result is the first established for the general DYS with an indicator function of a linear image space, which does not follow from the prior works [13, 18, 37, 12] since indicator functions are neither smooth nor strongly convex; (iii) Numerical examples demonstrate that our proposed approach outperforms [20] and [21]. In particular, the superiority over the standard NIDS [20] is attributed to a novel mixing matrix obtained from our proposed method, which realizes a preferable eigenvalue distribution for fast consensus; we also provide theoretical evidence for this. Note that one can construct this matrix without global information and use it for other consensus-based algorithms.
The remainder of this note is organized as follows. Section 2 provides preliminaries. Section 3 presents the definition of the CD matrix and its analysis including a new mixing matrix. In Section 4, we propose new distributed algorithms based on the DYS. In Section 5, we analyze the proposed methods for consensus optimization and show a new linear convergence result. Section 6 presents numerical experiments. Section 7 provides the proof of the convergence rate. Finally, Section 8 concludes this note.
2 Preliminaries
We here prepare several important notions.
Notations
Throughout this note, we use the following notations. Let denote the identity matrix in . We omit the subscript of when the dimension is obvious. Let be the zero matrix. Let . For , and represent the stacked vector in ascending order obtained from vectors , and we use the same notation to express stacked matrices. Let with denote the diagonal matrix whose th diagonal entry is . Similarly, and represent the block diagonal matrix. For a symmetric matrix , let with the inner product , and we simply write for . Let and be the largest and smallest eigenvalues of , respectively. For a differentiable function and , we write . We simply use when it is obvious. For a proper, closed, and convex function and , the proximal operator of with is represented by , and we denote for . Let represent the indicator function of , i.e., for and for . The projection onto a closed convex set with a metric is represented by , and we write for .
Graph theory
Consider a graph with a node set and an edge set consisting of pairs of different nodes . Note that throughout this note, we consider undirected graphs and do not distinguish and for each . For and , let be the neighbor set of node over , defined as .
For an undirected graph , consider a set . The set is called a clique of if the subgraph induced by is complete [7]. We define as the set of indices of all the cliques in . For , the set represents a subset of . If a clique is not contained in any other clique, it is said to be maximal. Let be the set of indices of all the maximal cliques in . For the edge set , let be the index set of all the edges. For and , we define as the index set of all cliques in containing . Similarly, represents . For each , , and ,
(3) |
holds [26]. Note that agent can independently obtain from the undirected subgraph induced by .
Operator splitting
Consider
(5) |
where is an -smooth convex function, and are proper, closed, and convex functions. For this problem, the following versatile algorithm, called the (variable metric) Davis-Yin splitting (DYS), has been proposed in [13]:
(9) |
where is a positive definite symmetric matrix. Note that the case of corresponds to the standard DYS. This algorithm reduces to the Douglas-Rachford splitting when and to forward-backward splitting when . We have the following basic result, which states that and converge to a solution to (5) under an appropriate . For further convergence results, see [13, 24, 3, 18, 37, 12] and Subsection 5.2.
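To make the update concrete, the following is a minimal sketch of the standard (non-variable-metric) DYS iteration for a problem of the form (5), written with generic callables. The function names, the stepsize choice, and the toy problem are illustrative assumptions, not taken from the note.

```python
import numpy as np

def davis_yin(grad_f, prox_g, prox_h, z0, gamma, num_iters=500):
    """Minimal sketch of the standard Davis-Yin splitting for min f + g + h:
    f smooth (gradient available), g and h proximable; prox is called as prox(v, gamma)."""
    z = z0.copy()
    for _ in range(num_iters):
        x_g = prox_g(z, gamma)                                   # backward step on g
        x_h = prox_h(2 * x_g - z - gamma * grad_f(x_g), gamma)   # forward step on f, backward on h
        z = z + x_h - x_g                                        # governing-sequence update
    return x_h

# Toy usage: f(x) = 0.5*||x - a||^2, g = indicator of the nonnegative orthant, h = lam*||x||_1.
a, lam = np.array([1.0, -2.0, 0.5]), 0.1
sol = davis_yin(
    grad_f=lambda x: x - a,
    prox_g=lambda v, g: np.maximum(v, 0.0),
    prox_h=lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * lam, 0.0),
    z0=np.zeros(3), gamma=1.0)
print(np.round(sol, 3))   # approximately [0.9, 0.0, 0.4]
```

Setting the prox of either nonsmooth term to the identity recovers the forward-backward splitting, and dropping the gradient term recovers the Douglas-Rachford splitting, as noted above.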
3 Clique-Wise Duplication Matrix
This section presents the definition and properties of the CD matrix that allows us to leverage operator splitting techniques for Problem (2) in a distributed fashion. (Note that the matrix itself is not new; the same or similar ideas can be found in other existing papers, e.g., SDP [29, 42] and generalized Nash equilibrium seeking [6].) We also present a new mixing matrix with the matrix , showing a preferable distribution of eigenvalues.
3.1 Fundamentals
The definition and essential properties of the CD matrix are presented in what follows.
First, we assume the non-emptiness of . If this assumption is not satisfied, we can alternatively consider a subgraph induced by the node set to .
Assumption 1.
For all , holds.
Then, the definition of the CD matrix is given as follows. Here, for each is the size of in Problem (2), and we define
Definition 1.
For and cliques of graph , the Clique-wise Duplication (CD) matrix is defined as
(10) |
where and for each .
The CD matrix can be interpreted as follows. For , holds since . Hence, the CD matrix generates copies of with respect to the cliques .
The following lemma provides the fundamental properties of the CD matrix. Now, let the matrix be
(11) |
for . This matrix fulfills for and .
Lemma 2.
Under Assumption 1, the followings hold.
(a) is column full rank.
(b) .
(c) For with ,
(12)
Using the CD matrix and (3), we can compute, in a distributed fashion, the least squares solution of , i.e.,
(13) |
and the projection of onto as
(14) |
Example 1.
Consider with and . Let and with and . Then, we obtain , , and , which ensures Assumption 1. For this system, the CD matrix is given by , where . We then obtain and for . Moreover, , and
for any vector with and , which can be computed in a distributed fashion.
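For concreteness, the following is a small numerical sketch of our reading of Definition 1 and of the computations (13)–(14): the CD matrix is built as a stacked clique-selection matrix, the least-squares step averages the clique-wise copies of each agent's variable, and the projection duplicates the averages again. The graph, clique list, and names are illustrative assumptions.

```python
import numpy as np

# Hypothetical small instance: 4 agents with scalar states, cliques given as index lists.
cliques = [[0, 1, 2], [2, 3]]
n = 4

def cd_matrix(cliques, n):
    """Build the clique-wise duplication (CD) matrix as a stacked selection matrix:
    each block copies the variables of the agents in one clique (our reading of Def. 1)."""
    blocks = []
    for C in cliques:
        E = np.zeros((len(C), n))
        for row, i in enumerate(C):
            E[row, i] = 1.0
        blocks.append(E)
    return np.vstack(blocks)

Xi = cd_matrix(cliques, n)

# Least-squares solution of Xi @ x = y, cf. (13): average the clique-wise copies of each agent.
y = np.arange(Xi.shape[0], dtype=float)        # one stacked entry per (clique, agent) pair
x_ls = np.linalg.solve(Xi.T @ Xi, Xi.T @ y)    # Xi.T @ Xi is diagonal (copy multiplicities)

# Projection of y onto the range of Xi, cf. (14): duplicate the averaged values again.
y_proj = Xi @ x_ls

# Each agent only needs the copies held by the cliques containing it,
# so both steps can be evaluated locally over the cliques (cf. (3)).
```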
3.2 Useful properties
Here, we provide useful properties of the CD matrix .
The following result shows that the gradient and proximal operator with can be computed in a distributed fashion. Here, th block of is represented by from Lemma 2.
Proposition 1.
Let . Then, under Assumption 1, the following equations hold.
(a) Let be a proper, closed, and convex function for each . Define as . Let . Then,
(15)
(b) Let , where for each . Then,
(16)
(c) Let be a differentiable function. Then,
(17)
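One ingredient behind Proposition 1 is the standard fact that the proximal operator of a block-separable function splits into per-block proximal operators, which is what makes the clique-wise evaluation distributed. A minimal sketch of this blockwise evaluation, with soft-thresholding as an illustrative per-block prox, is given below; the block contents are arbitrary.

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1, used here only as an example per-block prox."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_separable(y_blocks, gamma, prox_blocks):
    """Prox of a block-separable function G(y) = sum_l g_l(y_l):
    apply each block's prox independently (cf. Proposition 1a)."""
    return [prox(y, gamma) for y, prox in zip(y_blocks, prox_blocks)]

# Two clique blocks of different sizes.
y_blocks = [np.array([1.5, -0.2, 0.7]), np.array([-2.0, 0.1])]
prox_blocks = [lambda v, g: soft_threshold(v, g)] * len(y_blocks)
print(prox_separable(y_blocks, gamma=0.5, prox_blocks=prox_blocks))
```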
3.3 A mixing matrix
Using the CD matrix and the matrices in Proposition 1b, we can obtain a positive semidefinite and doubly stochastic matrix that will be used in Section 5. Thanks to the definition, in (18) can be constructed only from local information (i.e., ). Note that this matrix can be viewed as a special case of the clique-based projection in [30] and Appendix F for the consensus constraint, i.e., for in (39). We here pose the following assumption; it is not restrictive and is satisfied, e.g., for , , .
Assumption 2.
For , .
The matrix and its basic properties are given as follows.
Proposition 3.
Proof.
The right stochasticity is proved as . Using the definition of in Definition 1, the left stochasticity is also verified as
from and . Next,
holds. Then, we obtain (19). Moreover, follows directly from the Gershgorin disk theorem [8]. Additionally, from the firm nonexpansiveness of the clique-based projection (see Proposition 6), we obtain for any , which gives .
Finally, by the assumption of , the associated graph of is equal to . Therefore, the eigenvalue of is simple when is connected (see [8]). ∎
Now, we state the following proposition for in (18), implying that enables fast consensus. This is because, by the Gershgorin disk theorem [8], all the eigenvalues smaller than are likely to be smaller than those of other popular positive semidefinite mixing matrices. A numerical example of the eigenvalues and a sketch of Proposition 4's implication are illustrated in Fig. 2.
Proposition 4.
Suppose Assumption 1. For undirected graph , consider the matrix in (18) with . Let , where with Laplacian matrix of and . (The matrix is defined as for and for , and otherwise; in [32], with is said to be the max-degree weight.) Similarly, let with obtained by the Metropolis–Hastings weights in [32]. ( is defined as for and for , and otherwise.) Then, for all , we have
(20) |
Proof.
When , holds for , and for becomes a singleton with , as is just the set of indices of edges that include . Then, for we get . Hence, recalling the definitions of and for , we have and , respectively. Therefore, since all entries of for are larger than those of and , and these matrices are doubly stochastic, we get (20). ∎
Remark 1.
In Fig. 2a, we use not but , which also realizes smaller eigenvalues. Likewise, even when , can have smaller eigenvalues than and .
[Fig. 2: a numerical example of the eigenvalue distributions of the mixing matrices, illustrating Proposition 4.]
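As a companion to Proposition 4, the snippet below assembles the two reference mixing matrices it mentions, one common form of the max-degree weights and the Metropolis–Hastings weights of [32], and sorts their eigenvalues on a small example graph. Since (18) is not reproduced above, the clique-based matrix itself is omitted, but it would be compared in exactly the same way. The graph and the scalings are illustrative assumptions (the exact definitions in the footnotes are elided).

```python
import numpy as np

# Illustrative graph: a 6-node prism (two triangles joined by a perfect matching).
n = 6
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4), (2, 5)]

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
L = np.diag(deg) - A                              # graph Laplacian

# One common form of the max-degree weights: W = I - L / (d_max + 1).
W_md = np.eye(n) - L / (deg.max() + 1.0)

# Metropolis-Hastings weights [32]: w_ij = 1 / (1 + max(d_i, d_j)) on edges,
# with the diagonal chosen so that each row sums to one.
W_mh = np.zeros((n, n))
for i, j in edges:
    W_mh[i, j] = W_mh[j, i] = 1.0 / (1.0 + max(deg[i], deg[j]))
np.fill_diagonal(W_mh, 1.0 - W_mh.sum(axis=1))

# Proposition 4 compares the eigenvalue distributions; the clique-based matrix of (18)
# would be inspected in exactly the same way.
for name, W in [("max-degree", W_md), ("Metropolis-Hastings", W_mh)]:
    print(name, np.round(np.sort(np.linalg.eigvalsh(W)), 3))
```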
4 Solution to Clique-Wise Coupled Problems via Operator Splitting
This section presents our proposed algorithms for Problem (2) with the CD matrix and DYS in (9) with the metrics of and . We now assume the following.
Assumption 3.
Problem (2) has an optimal solution.
In what follows, the functions , , , and represent
(21) | ||||
(22) |
4.1 Algorithm description
We give the distributed optimization algorithm in Algorithm 1, the clique-based distributed Davis-Yin splitting (CD-DYS) algorithm. This algorithm is distributed thanks to (3). By Lemma 2, the aggregated form of this algorithm is as follows:
(28) |
where , , , and . By Lemma 1, this algorithm converges to the optimal point under .
This algorithm can be derived in the following way. From (13), for , we can reformulate Problem (2) into the form of (5) as follows:
(31) |
For Problem (31), Proposition 1 tells us that the function is proximable for proximable , and the proximal operator can be computed in a distributed fashion. Accordingly, we can directly apply DYS in (9) with to (31). From Proposition 1, setting gives the distributed algorithm in Algorithm 1 (or (28)).
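Schematically, the derivation applies the DYS to the lifted problem (31), in which one nonsmooth term is the indicator of the range of the CD matrix (whose prox is the projection (14), i.e., clique-wise averaging followed by duplication) and the smooth and remaining nonsmooth terms act on the lifted variable. The following centralized sketch illustrates this structure; the ordering of the two prox steps and all names are our assumptions, not the exact form of Algorithm 1.

```python
import numpy as np

def cd_dys_sketch(Xi, grad_F, prox_H, z0, gamma, iters=300):
    """Centralized sketch of DYS applied to a lifted problem of the form (31):
    minimize F(y) + H(y) over y constrained to range(Xi), with the range constraint
    handled as an indicator function whose prox is the projection (14)."""
    pinv = np.linalg.solve(Xi.T @ Xi, Xi.T)     # Xi^T Xi is diagonal for a duplication matrix
    project = lambda y: Xi @ (pinv @ y)         # projection onto range(Xi): average, then re-duplicate
    z = z0.copy()
    for _ in range(iters):
        y_half = project(z)                                           # prox of the indicator term
        y = prox_H(2 * y_half - z - gamma * grad_F(y_half), gamma)    # clique-wise prox of H
        z = z + y - y_half
    return pinv @ y_half                        # agent-wise variables: average the clique copies
```

Because the projection reduces to clique-wise averaging and the prox of H splits across cliques (Proposition 1), every step of this sketch can be carried out with local exchanges only, which is what Algorithm 1 exploits.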
4.2 Variable metric version
Applying instead the variable metric DYS in (9) with from Proposition 1 to Problem (31) gives the following algorithm:
(36) |
where we use Propositions 1b and 2. As summarized in Fig. 3, it will turn out in Section 5 that this algorithm has an interesting connection to other methods through in (18). By Lemma 1, a sufficient condition for convergence is .
5 Revisit of Consensus Optimization as A Clique-Wise Coupled Problem
This section is dedicated to a detailed analysis of the proposed methods in Section 4 in light of consensus optimization, presenting a renewed perspective. We here demonstrate the relationship summarized in Fig. 3 by showing the most important part: Algorithm (36) generalizes the NIDS in [20]. Our analysis reveals that matrix in (18) naturally arises in the NIDS. This fact and Proposition 4 imply that the proposed algorithm enables fast convergence [20] beyond standard mixing matrices. Furthermore, we present a new linear convergence rate for the NIDS with a non-smooth term based on its DYS structure. The linear rate follows from a more general result for the DYS (9).
We here consider a special case of Problem (2) given by
(38) |
When and
(39) |
this problem is called a consensus optimization problem, which we discuss here. Notice that according to [26], holds under the connectivity of graph and Assumptions 1 and 2.
5.1 CD-DYS as generalized NIDS
NIDS algorithm
First, the NIDS algorithm [20] for consensus optimization for is as follows:
(42) |
with arbitrary and . The matrix is a positive semidefinite doubly stochastic mixing matrix. A standard choice of that requires no global information is in Proposition 4. To make less conservative, [20] suggests that some global information is necessary (e.g., the value of ).
Analysis
We here present the following proposition, stating that the proposed algorithm in (36) yields the NIDS algorithm with in (18), which can achieve fast convergence as shown in Proposition 4 and Fig. 2. Note that we show the case of for simplicity.
Proposition 5.
Proof.
5.2 Linear convergence of the NIDS with
This subsection presents a linear convergence rate of the NIDS via the CD-DYS. We first present a new linear convergence result for the general DYS for Problem (5) (not limited to the CD-DYS) when is strongly convex and is the indicator function of a linear image space. As indicator functions are neither smooth nor strongly convex, our result cannot be derived from prior linear convergence results such as those in [13, 12, 18, 37]. The proof is presented in Section 7.
Theorem 1.
Consider the variable metric DYS in (9) for for Problem (5). Let and be the optimal values of , , and . Suppose that is -Lipschitz continuous, is -strongly convex, with a column full-rank matrix , and is proper, closed, and convex. Set a stepsize , where . Pick any start point and set . Then it holds that
where with , , and with .
Since for in (39), Theorem 1 yields the following linear rate for the NIDS with . Although [1, 33] have addressed this case, our result below admits larger stepsizes due to the arbitrariness of .
Theorem 2.
Proof.
This theorem follows from Proposition 5 and Theorem 1. Note that while the -strong convexity of for guarantees only the convexity of ( is strongly convex over ; recall our formulation in (31)), we can treat just as a -strongly convex function and directly apply the same arguments as in Theorem 1 because both in Algorithm (36) and always belong to . The linear convergence of follows from the first line of Algorithm (36). ∎
6 Numerical Experiments
This section presents two numerical examples: resource allocation and consensus optimization.
Clique-wise resource allocation
First, we consider a resource allocation problem for the network of agents in [30, Fig. 1] with four communities modeled by cliques and suppose that a resource constraint is imposed on each clique. The clique parameters are given as and . Let for be the amount of resources of agent . For and , we set with and generated by the uniform distribution with , with , for , with and generated by the uniform distribution with , and .
We here compare the proposed method in (28) (or Algorithm 1) with the algorithm of Liang et al. [21] for two different stepsizes (). The method in [21] is a distributed algorithm for globally coupled constraints using the gradient of the cost function. Notice that the dual decomposition technique cannot be directly used here since we have to minimize .
The simulation result is plotted in Fig. 4. The proposed CD-DYS converges much faster than Liang et al. [21], whereas that with fails to converge. This difference is rooted in the fact that the CD-DYS exploits the community structure of the problem and admits larger stepsizes. Note that to obtain the largest admissible stepsize for Liang et al., one has to know an upper bound of the norm of the dual variables.
Consensus optimization via NIDS
We next consider the norm regularized consensus optimization problem (38) for agents over an undirected graph . Here, we set and for . Here, , , and . Each entry of and is generated by the standard normal distribution.
We here conduct simulations of the NIDS in (42) for with and , with , , and , where . The last is introduced in [20] as a less conservative choice using global information . The stepsize is assigned as with
The simulation result is reported in Fig. 5, which plots the relative objective residual , where . It can be observed that the case of with exhibits the fastest convergence, reaching high accuracy in about 60 iterations and outperforming without using global information. While the case of with is slower than , it is still superior to and , as implied by Proposition 4.
[Fig. 4: convergence comparison for the clique-wise resource allocation problem.]
[Fig. 5: relative objective residual for consensus optimization via the NIDS.]
7 Proof of Theorem 1
Our proof is based on the following trick (See Theorem D.6 in [13]): If for some , and
(44) |
then , so
Thus,
(45) |
which provides a linear convergence rate. In the following, we derive an inequality of the form in (44) with , , and some constants .
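Spelling out the elided inequalities, the argument has the following generic shape; the constants below are placeholders standing in for those appearing in (44).

```latex
% Generic sandwich argument behind (44)-(45); c_1, c_2 > 0 are placeholder constants.
\[
  c_1\,\|z^{k+1}-z^\star\|^2 \;\le\; E_k \;\le\; c_2\,\|z^{k}-z^\star\|^2
  \quad\Longrightarrow\quad
  \|z^{k+1}-z^\star\|^2 \;\le\; \tfrac{c_2}{c_1}\,\|z^{k}-z^\star\|^2 .
\]
% Iterating the right-hand inequality yields the geometric (linear) rate
\[
  \|z^{k}-z^\star\|^2 \;\le\; \Bigl(\tfrac{c_2}{c_1}\Bigr)^{k}\,\|z^{0}-z^\star\|^2 ,
  \qquad \text{which contracts whenever } c_2 < c_1 .
\]
```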
We first prepare a key inclusion for establishing the desired rate. We suppose that , , and are not optimal without loss of generality. For , we have , which leads to
since . Note that this holds for all thanks to the initialization. Then we have
and thus we get by [4, Prop. 16.44]. This inclusion allows us to remove the assumption of smoothness or strong convexity for or in [13, 18, 37, 12] because one can evaluate only with .
We derive the lower side of the inequality in (44) as follows. By the smoothness and strong convexity of , for , [13, Proposition D.4] provides
Then, using , is lower bounded as
(46) |
where , , , and . Here, the term in (7) comes from the definition of and the relationship obtained by Jensen’s inequality for the terms of and .
Next, utilizing a similar calculation to the proof of [13, Proposition D.4] (see the proof of the second upper bound) and the inclusion , we derive the upper-bound of as follows. Here, we set , and let and satisfy and , respectively:
(47) |
where the last line follows from .
8 Conclusion
This note addressed distributed optimization of clique-wise coupled problems via operator splitting. First, we defined the CD matrix and a new mixing matrix and analyzed their properties. Then, using the CD matrix, we presented the CD-DYS algorithm via the Davis-Yin splitting (DYS). Subsequently, its connection to consensus optimization methods such as the NIDS was also analyzed. Moreover, we presented a new linear convergence rate not only for the NIDS with non-smooth terms but also for the general DYS with a projection onto a subspace. Finally, we demonstrated the effectiveness of the approach via numerical examples.
References
- [1] Sulaiman A Alghunaim, Ernest K Ryu, Kun Yuan, and Ali H Sayed. Decentralized proximal gradient algorithms with linear convergence rates. IEEE Transactions on Automatic Control, 66(6):2787–2794, 2020.
- [2] Miguel F Anjos and Jean B Lasserre. Handbook on semidefinite, conic and polynomial optimization, volume 166. Springer Science & Business Media, 2011.
- [3] Francisco J Aragón-Artacho and David Torregrosa-Belén. A direct proof of convergence of Davis–Yin splitting algorithm allowing larger stepsizes. Set-Valued and Variational Analysis, 30(3):1011–1029, 2022.
- [4] Heinz H Bauschke and Patrick L Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2017.
- [5] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
- [6] Mattia Bianchi and Sergio Grammatico. The end: Estimation network design for games under partial-decision information. IEEE Transactions on Control of Network Systems, 2024.
- [7] Béla Bollobás. Modern Graph Theory, volume 184. Springer Science & Business Media, 1998.
- [8] Francesco Bullo. Lectures on Network Systems, volume 1. Kindle Direct Publishing Seattle, DC, USA, 2020.
- [9] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.
- [10] Jianshu Chen and Ali H Sayed. Diffusion adaptation strategies for distributed optimization and learning over networks. IEEE Transactions on Signal Processing, 60(8):4289–4305, 2012.
- [11] Laurent Condat, Daichi Kitahara, Andrés Contreras, and Akira Hirabayashi. Proximal splitting algorithms for convex optimization: A tour of recent advances, with new twists. SIAM Review, 65(2):375–435, 2023.
- [12] Laurent Condat and Peter Richtárik. Randprox: Primal-dual optimization algorithms with randomized proximal updates. arXiv preprint arXiv:2207.12891, 2022.
- [13] Damek Davis and Wotao Yin. A three-operator splitting scheme and its optimization applications. Set-Valued and Variational Analysis, 25:829–858, 2017.
- [14] Mituhiro Fukuda, Masakazu Kojima, Kazuo Murota, and Kazuhide Nakata. Exploiting sparsity in semidefinite programming via matrix completion I: General framework. SIAM Journal on Optimization, 11(3):647–674, 2001.
- [15] David Hallac, Jure Leskovec, and Stephen Boyd. Network lasso: Clustering and optimization in large graphs. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 387–396, 2015.
- [16] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69:169–192, 2007.
- [17] Puya Latafat, Nikolaos M Freris, and Panagiotis Patrinos. A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization. IEEE Transactions on Automatic Control, 64(10):4050–4065, 2019.
- [18] Jongmin Lee, Soheun Yi, and Ernest K Ryu. Convergence analyses of Davis-Yin splitting via scaled relative graphs. arXiv preprint arXiv:2207.04015, 2022.
- [19] Huaqing Li, Enbing Su, Chengbo Wang, Jiawei Liu, Zuqing Zheng, Zheng Wang, and Dawen Xia. A primal-dual forward-backward splitting algorithm for distributed convex optimization. IEEE Transactions on Emerging Topics in Computational Intelligence, 2021.
- [20] Zhi Li, Wei Shi, and Ming Yan. A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates. IEEE Transactions on Signal Processing, 67(17):4494–4506, 2019.
- [21] Shu Liang, George Yin, et al. Distributed smooth convex optimization with coupled constraints. IEEE Transactions on Automatic Control, 65(1):347–353, 2019.
- [22] Angelia Nedić and Asuman Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.
- [23] Yurii Evgen’evich Nesterov. A method of solving a convex programming problem with convergence rate . In Doklady Akademii Nauk, volume 269, pages 543–547. Russian Academy of Sciences, 1983.
- [24] Ernest K Ryu and Wotao Yin. Large-scale Convex Optimization: Algorithms & Analyses via Monotone Operators. Cambridge University Press, 2022.
- [25] Kazunori Sakurama. Unified formulation of multiagent coordination with relative measurements. IEEE Transactions on Automatic Control, 66(9):4101–4116, 2020.
- [26] Kazunori Sakurama and Toshiharu Sugie. Generalized coordination of multi-robot systems. Foundations and Trends® in Systems and Control, 9(1):1–170, 2022.
- [27] Ali H Sayed. Diffusion adaptation over networks. In Academic Press Library in Signal Processing, volume 3, pages 323–453. Elsevier, 2014.
- [28] Wei Shi, Qing Ling, Gang Wu, and Wotao Yin. A proximal gradient algorithm for decentralized composite optimization. IEEE Transactions on Signal Processing, 63(22):6013–6023, 2015.
- [29] Lieven Vandenberghe and Martin S Andersen. Chordal graphs and semidefinite optimization. Foundations and Trends® in Optimization, 1(4):241–433, 2015.
- [30] Yuto Watanabe and Kazunori Sakurama. Accelerated distributed projected gradient descent for convex optimization with clique-wise coupled constraints. In Proceedings of the 22nd IFAC World Congress, 2023.
- [31] Yuto Watanabe and Kazunori Sakurama. Distributed optimization of clique-wise coupled problems. In 2023 62nd IEEE Conference on Decision and Control (CDC), pages 296–302. IEEE, 2023.
- [32] Lin Xiao, Stephen Boyd, and Sanjay Lall. Distributed average consensus with time-varying metropolis weights. Automatica, 1:1–4, 2006.
- [33] Jinming Xu, Ye Tian, Ying Sun, and Gesualdo Scutari. Distributed algorithms for composite optimization: Unified framework and convergence analysis. IEEE Transactions on Signal Processing, 69:3555–3570, 2021.
- [34] Isao Yamada. The hybrid steepest descent method for the variational inequality problem over the intersection of fixed point sets of nonexpansive mappings. Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications, 8:473–504, 2001.
- [35] Isao Yamada, Nobuhiko Ogura, and Nobuyasu Shirakawa. A numerically robust hybrid steepest descent method for the convexly constrained generalized inverse problems. Contemporary Mathematics, 313:269–305, 2002.
- [36] Ming Yan. A new primal–dual algorithm for minimizing the sum of three functions with a linear operator. Journal of Scientific Computing, 76:1698–1717, 2018.
- [37] Soheun Yi and Ernest K Ryu. Convergence analyses of Davis-Yin splitting via scaled relative graphs II: Convex optimization problems. arXiv preprint arXiv:2211.15604, 2022.
- [38] Kun Yuan, Bicheng Ying, Xiaochuan Zhao, and Ali H Sayed. Exact diffusion for distributed optimization and learning—part I: Algorithm development. IEEE Transactions on Signal Processing, 67(3):708–723, 2018.
- [39] Kun Yuan, Bicheng Ying, Xiaochuan Zhao, and Ali H Sayed. Exact diffusion for distributed optimization and learning—part II: Convergence analysis. IEEE Transactions on Signal Processing, 67(3):724–739, 2018.
- [40] Yu Zhang and Qiang Yang. An overview of multi-task learning. National Science Review, 5(1):30–43, 2018.
- [41] Yang Zheng, Giovanni Fantuzzi, and Antonis Papachristodoulou. Chordal and factor-width decompositions for scalable semidefinite and polynomial optimization. Annual Reviews in Control, 52:243–279, 2021.
- [42] Yang Zheng, Maryam Kamgarpour, Aivar Sootla, and Antonis Papachristodoulou. Distributed design for decentralized control using chordal decomposition and ADMM. IEEE Transactions on Control of Network Systems, 7(2):614–626, 2019.
Appendix A Application examples
In addition to the examples in Section 6, we here present various application examples of Problem (2).
Formation control
A formation control problem aims to steer the positions of robots to a desired configuration and has been actively investigated for the past two decades. For a multi-agent system over undirected graph , the most basic formulation of this problem is
(49) |
where is the position of agent , and is the desired relative position from to . By assigning , one can obtain the desired configuration via the proposed CD-DYS. Note that one can also deal with various constraints in the clique-wise coupled framework, e.g., an agent-wise constraint and a pairwise distance constraint . In addition, the proposed framework also allows us to achieve the desired formation in a distributed manner even for linear multi-agent systems, as shown in [17], and in the case where each agent has no access to the global coordinate and can only use information via relative measurements, as shown in [25, 26].
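Since (49) is not reproduced above, the sketch below assumes the common edge-wise objective that penalizes the squared mismatch between each relative position and its desired offset, and runs plain gradient descent on it; the graph, the offsets, and the stepsize are illustrative.

```python
import numpy as np

# Assumed form of the formation objective (our reading of (49)):
#   minimize  sum over edges (i, j) of ||x_i - x_j - d_ij||^2 .
d = {(0, 1): np.array([1.0, 0.0]),
     (1, 2): np.array([-0.5, 1.0]),
     (2, 0): np.array([-0.5, -1.0])}   # consistent offsets: they sum to zero around the cycle

x = np.random.randn(3, 2)              # initial positions of 3 robots in the plane
alpha = 0.1
for _ in range(200):
    grad = np.zeros_like(x)
    for (i, j), dij in d.items():
        r = x[i] - x[j] - dij
        grad[i] += 2 * r               # gradient with respect to x_i
        grad[j] -= 2 * r               # gradient with respect to x_j
    x -= alpha * grad

# The relative positions x[i] - x[j] now match d_ij up to a common translation.
```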
Network Lasso
The network lasso is an optimization-based machine-learning technique accounting for network structures. For a multi-agent system over graph , a network lasso problem [15] is given as follows:
(51) |
where and for . This problem can be seen as a special case of Problem (2). Owing to the second term in (51), neighboring nodes are more likely to form a cluster, i.e., to take close values. Applications of the Network Lasso include the estimation of home prices [15], where there is a spatial interdependence among houses’ prices.
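For reference, the standard network lasso objective from [15], consistent with the description above, reads as follows; the symbols (local losses, edge weights, and regularization parameter) are our notation since (51) is not reproduced here.

```latex
% Standard network lasso formulation [15]; notation is illustrative.
\[
  \min_{x_1,\dots,x_N} \;\; \sum_{i=1}^{N} f_i(x_i)
  \;+\; \lambda \sum_{(i,j)\in\mathcal{E}} w_{ij}\,\| x_i - x_j \|_2 .
\]
% The sum-of-norms term encourages neighboring nodes to take identical values,
% which induces the clustering behavior mentioned above.
```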
Sparse semidefinite programming

Semidefinite programming via chordal graph decomposition has been actively studied not only in optimization [29, 14] but also in control [41] as an efficient and scalable computation scheme exploiting the sparsity of matrices that naturally arises from underlying networked structures of problems. This type of problem can also be solved in a distributed manner based on the framework of clique-wise coupling.
Consider a multi-agent system over and the following standard semidefinite programming
(54) |
where we consider that agent possesses and th column of . Here, represents the set of positive semidefinite matrices. This problem cannot be solved in a distributed manner by standard algorithms due to the undecomposable constraint . Nevertheless, if have the sparsity with respect to and graph is chordal, [29, 41] show that this problem can be equivalently transformed into the following decomposed form with smaller positive semidefinite constraints:
(57) |
Moreover, when in (57) can be rewritten as
(58) |
with and some matrices with the sparsity with respect to , we can reformulate Problem (57) into a clique-wise coupled problem in (2) by introducing auxiliary variables. For example, for the system with in Fig. 6, which is a chordal graph with maximal cliques and , Problem (57) with (58) reduces to
(62) |
where , , are the entries of , , and , respectively. Hence, by decomposing the constraint in (62) into clique-wise coupled constraints by using the auxiliary variables and as
(63) | ||||
(64) | ||||
(65) |
where represents the entry of , we can obtain an equivalent clique-wise coupled problem in the following with , , , where represents th column of the matrix :
(68) |
Semidefinite programming in the form of (57) with (58) arises in practical problems, e.g., distributed design of decentralized controllers for linear networked systems [42] and sensor network localization [2]. Note that one can extend the discussion above to higher dimensional vectors and block-partitioned matrices , as shown in [41].
Approximating trace norm minimization problems
Trace norm minimization is a powerful technique in machine learning and computer vision that can obtain a low-rank matrix representing the underlying structure of the data. Its applications include the Robust PCA (RPCA) and multi-task learning problems.
For example, we can relax an RPCA problem to a clique-wise coupled problem as follows. Consider a data matrix . Then, a standard form of RPCA is formulated as follows:
(69) |
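For reference, the standard convex RPCA formulation from [9], which matches the surrounding description and is our assumption of what (69) contains, is:

```latex
% Standard convex RPCA formulation [9]; the symbols L, S, M are our notation.
\[
  \min_{L,\,S} \;\; \|L\|_* + \lambda \|S\|_1
  \quad \text{subject to} \quad L + S = M ,
\]
% where M is the data matrix, the trace (nuclear) norm promotes a low-rank component L,
% and the entrywise l1 norm promotes a sparse outlier component S.
```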
By solving this problem, we can decompose a data matrix into two components: a low-rank matrix representing the underlying structure of the data and a sparse matrix capturing the outliers or noise. Consider that for a multi-agent system with agents and , agent possesses the matrix . Then the robust PCA problem with the clique-based relaxation is formulated as follows:
(72) |
where and for . Here, and correspond to and in Problem (2). Although Problem (72) involves relaxation, one can still realize a low-rank matrix by solving it.
Appendix B Proof of Lemma 2
(a) We prove the statement by contradiction. Assume that the CD matrix is not column full rank. Then, there exists a vector with such that . This yields for and all . Hence, we obtain for all from Assumption 1. This contradicts the assumption.
(c) It holds that . Hence, we obtain (12).
Appendix C Proof of Proposition 1
(a) For , there exists some such that . Then, we obtain
Therefore, we obtain (15). Note that the last line can be verified by considering the optimality condition.
(b) This can be proved in the same way as Proposition 1a with an easy modification from the definition of .
(c) By the chain rule, we have , which gives (17).
Appendix D Proof of Proposition 2
Appendix E Connection of the CD-DYS to other distributed optimization algorithms
We here present a comprehensive analysis of the diagram in Fig. 3 by deriving the CPGD algorithm in [30] and Appendix F.
Exact diffusion and Diffusion algorithms
Over undirected graphs, the exact diffusion algorithm is just a special case of the NIDS. In the case of for all , the NIDS reduces to the Exact diffusion [38, 39], which is given as follows:
(73) |
This can be rewritten as follows:
(76) |
Those algorithms exactly converge to an optimal solution under mild conditions. Note that the Exact diffusion is also valid for directed networks and non-doubly stochastic . For details, see [38, 39].
The diffusion algorithm [27, 10] is an early distributed optimization algorithm, given as
(78) |
This algorithm is obtained from the NIDS for and the Exact diffusion approximating in the second line of (76). Notice that conditions on in (78) are not equivalent to (73) and (76) (see [27, 10, 24, 38, 39]). Although its convergence is inexact over constant , its simple structure allows us to easily apply it to stochastic and online setups.
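The two recursions can be sketched as follows. The adapt-then-combine diffusion form and the adapt-correct-combine exact diffusion form below follow the descriptions in [27, 10] and [38, 39], respectively, but the combination matrix and the initialization are stated as assumptions rather than taken from (73), (76), or (78).

```python
import numpy as np

def diffusion(W, grads, x0, alpha, iters=500):
    """Adapt-then-combine diffusion sketch: each agent takes a local gradient step,
    then the network mixes the intermediate iterates with the matrix W.
    x0 is an (n_agents, dim) array; grads is a list of per-agent gradient callables."""
    x = x0.copy()
    for _ in range(iters):
        psi = x - alpha * np.vstack([g(x[i]) for i, g in enumerate(grads)])
        x = W @ psi
    return x

def exact_diffusion(W, grads, x0, alpha, iters=500):
    """Exact diffusion sketch in the adapt-correct-combine form of [38, 39];
    the combination matrix (I + W)/2 and the initialization are assumptions."""
    W_bar = 0.5 * (np.eye(W.shape[0]) + W)
    x, psi_prev = x0.copy(), x0.copy()
    for _ in range(iters):
        psi = x - alpha * np.vstack([g(x[i]) for i, g in enumerate(grads)])
        phi = psi + x - psi_prev        # correction step removing the fixed-stepsize bias
        x = W_bar @ phi
        psi_prev = psi
    return x
```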
CPGD as a generalization of the diffusion algorithm
Invoking the relationship between NIDS/Exact diffusion and diffusion algorithms, we derive a diffusion-like algorithm from the variable metric CD-DYS in (36) for
(80) |
where is a closed convex set and not limited to (39). The derived algorithm will be formalized as the clique-based projected gradient descent (CPGD) in Appendix F.
We derive the diffusion-like algorithm as follows. From , we have and . Accordingly, the variable metric CD-DYS in (36) reduces to
By using of the form in (76), we get
(81) | ||||
with from . In consensus optimization, it can be observed from the previous subsection that boils down to a linear map and satisfies because we have
for in (39), as shown in (5.1). Therefore, recalling that the diffusion algorithm (78) can be viewed as (76) with , we can obtain the following diffusion-like algorithm (CPGD) from (81) by the similar approximation for the second line of (81):
(83) |
with defined as . Note that the operator , which will be defined as the clique-based projection in Appendix F, is equal to the doubly stochastic matrix in Proposition 3 for in (39).
Appendix F Clique-based projected gradient descent (CPGD) algorithm
We here formalize the generalization of the diffusion algorithm (CPGD) in (83). We provide a detailed convergence analysis, which guarantees exact convergence under diminishing step sizes and gives an inexact convergence rate for fixed ones. Moreover, we provide Nesterov's acceleration and an improved convergence rate.
This section highlights the well-behavedness of clique-wise coupling, which enables theoretical and algorithmic properties similar to those of consensus optimization (the diffusion algorithm).
F.1 Clique-based Projected Gradient Descent (CPGD)
For this problem, the CPGD is given as follows:
(84) |
where is the clique-based projection for
(85) |
, , and is a step size. The clique-based projection is defined as follows.
Definition 2.
The clique-based projection can be represented as
The clique-based projection has many favorable operator-theoretic properties as follows.
Proposition 6.
Proof.
See Appendix F.3. ∎
The convergence properties of the CPGD over various step sizes are presented as follows. Note that, like the DGD and diffusion methods for consensus optimization, the CPGD with fixed step sizes does not converge exactly to an optimal solution.
Theorem 3.
Consider Problem (38) with closed convex sets . Consider the CPGD algorithm in (84). Suppose Assumptions 1–3.
(a) Let a positive sequence satisfy , , and (for example, satisfies these conditions). Assume that is bounded. Then, for any and any , converges to an optimal solution .
(b) Let a positive sequence satisfy , , and (for example, and satisfy these conditions). Additionally, assume that is strongly convex. Then converges to the unique optimal solution for any and any .
(c) Let for any . Let be
(87) with
(88) Then, for any and ,
(89) holds for .
Proof.
Remark 3.
Using in (88), another expression of the clique-based projection is obtained as follows.
Proposition 7.
Consider the function in (88). Then, it holds for any that
(90) |
Proof.
From Proposition 7, we can interpret the CPGD as a variant of the proximal gradient descent [24, 11, 5] since the clique-based projection can be represented as a proximal operator (see (90)). In this sense, the CPGD is a generalization of the conventional projected gradient descent (PGD). When is complete, the CPGD coincides with the PGD because and hold for complete graphs.
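A minimal sketch of the CPGD iteration is given below. It is based on our reading of the clique-based projection as an average of per-clique projections; the precise weights of Definition 2 are not reproduced above, so this form, together with all names, is an assumption.

```python
import numpy as np

def clique_based_projection(x, cliques, clique_projs):
    """Assumed form of the clique-based projection: each clique projects its members'
    stacked variables onto its local constraint set, and each agent averages the
    results over the cliques containing it."""
    acc = [np.zeros_like(xi) for xi in x]
    count = np.zeros(len(x))
    for C, proj in zip(cliques, clique_projs):
        local = proj(np.stack([x[i] for i in C]))   # projection of the clique block
        for row, i in enumerate(C):
            acc[i] += local[row]
            count[i] += 1
    return [acc[i] / count[i] for i in range(len(x))]

def cpgd(x0, grads, cliques, clique_projs, alpha, iters=300):
    """CPGD sketch: a local gradient step followed by the clique-based projection (cf. (84))."""
    x = [xi.copy() for xi in x0]
    for _ in range(iters):
        x = [xi - alpha * g(xi) for xi, g in zip(x, grads)]
        x = clique_based_projection(x, cliques, clique_projs)
    return x
```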
F.2 Nesterov’s acceleration
The CPGD with fixed step sizes can be accelerated up to the inexact convergence rate of with Nesterov’s acceleration [23, 5]. The accelerated CPGD (ACPGD) is given as follows:
(91) |
where and with . This algorithm can also be implemented in a distributed manner.
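The following is a sketch of the accelerated scheme, assuming the standard momentum sequence of [5, 23]; the exact momentum parameters of (91) are not reproduced above, and the projection callable stands in for the clique-based projection.

```python
import numpy as np

def accelerated_pgd(x0, grad, project, alpha, iters=300):
    """Sketch of an accelerated projected scheme in the spirit of (91), assuming the
    FISTA-style momentum t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2 from [5, 23]."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x = project(y - alpha * grad(y))                # projected gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)     # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev
```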
The convergence rate is proved as follows.
Theorem 4.
Proof.
See Appendix F.4. ∎
F.3 Proof of Proposition 6
As a preliminary, we present important properties of the function in (88) for in (85) as follows. Note that the function in (88) is convex because of the convexity of each .
Proof.
If for , we obtain for all , which yields because of (85). Conversely, if , then we have for all . Thus, holds. ∎
Proposition 9.
The function in (88) is a -smooth function, i.e., its gradient is -Lipschitzian.
Proof.
With this in mind, we prove Proposition 6 as follows.
a) From Jensen’s inequality and the quasinonexpansiveness of convex projection operators [4], the following inequality holds for any :
(93) |
Thus, we obtain .
b) holds because holds for any and all . In the following, we prove the converse inclusion . Let . Then, it suffices to show . From , we obtain for all . In addition, from Jensen’s inequality and the quasinonexpansiveness of convex projection operators [4], we have
Thus, from the equality condition of Jensen’s inequality, we obtain for all for all . Then, we have for all . Therefore, since , we have . Thus, holds from Proposition 8.
c) For a non-empty closed convex set in (85) and , there exists such that . Hence, for , , and , we have because
where the last line follows from the projection theorem (see Theorem 3.16 in [4]). Thus, by Jensen’s inequality and the nonexpansiveness of [4], for any and , we obtain Hence, for any and .
F.4 Proof of Theorems 3c and 4
Here, we show the proofs of Theorems 3c and 4. These proofs are based on the convergence theorems for the ISTA and FISTA (Theorems 3.1 and 4.4 in [5]), respectively.
Supporting Lemmas
Before proceeding to prove the theorems, we show some inequalities corresponding to those obtained from Lemma 2.3 in [5], which is a key to proving the convergence theorems. Note that a differentiable function is convex if and only if
(94) |
holds for any . If is -smooth and convex,
(95) | ||||
(96) |
hold for any . For details, see textbooks on convex theory, e.g., Theorem 18.15 in [4].
In preparation for showing lemmas, let and with in (88). Additionally, for , we define with some as
(97) |
For in (97), the following inequalities hold.
Proposition 10.
Assume that is -smooth and convex. Let . Then,
(98) |
holds for any .
Proof.
Proposition 11.
Let with some . Then, it holds that
(99) |
Proof.
With this in mind, we consider the following update rule with and some :
(100) |
In addition, we define as
(101) |
with in (97). By , can be rewritten as .
Remarkably, in (101) satisfies the following lemma.
Lemma 3.
Consider the sequence generated by (F.4). Then,
(102) |
Proof.
In light of -smoothness of and , we obtain Hence, adding to both sides yields (102). ∎
For and an optimal , we present the following lemma.
Lemma 5.
For , it holds that
(104) |
Proof of Theorem 3c
In this proof, assume that for all . Then, holds and the algorithm in (F.4) coincides with the CPGD with for all .