On the convergence of the distributed proximal point algorithm
Abstract.
In this work, we establish convergence results for the distributed proximal point algorithm (DPPA) for distributed optimization problems. We consider the problem on the whole domain $\mathbb{R}^d$ and find a general condition on the stepsize and the cost functions such that the DPPA is stable. We prove that the DPPA with stepsize $\alpha > 0$ converges exponentially to an $O(\alpha)$-neighborhood of the optimizer. Our result clearly explains the advantage of the DPPA over the distributed gradient descent algorithm (DGD) with respect to convergence and stability. We also provide numerical tests supporting the theoretical results.
Key words and phrases:
Proximal point method, Distributed optimization, Distributed gradient algorithm
2010 Mathematics Subject Classification: Primary 90C25, 68Q25
1. Introduction
This work considers the distributed optimization problem
(1.1) $\min_{x \in \mathbb{R}^d} f(x) := \sum_{i=1}^n f_i(x),$
where $n$ denotes the number of agents in the network, and $f_i : \mathbb{R}^d \to \mathbb{R}$ is a differentiable local cost known only to agent $i$ for each $1 \le i \le n$. In recent years, extensive research has been conducted on distributed optimization due to its relevance in various applications, including wireless sensor networks [1, 24], multi-agent control [4, 22, 5], smart grids [9, 10], and machine learning [11, 21, 30, 3, 27].
Numerous studies have focused on solving distributed optimization problems on networks. Relevant research works include [17, 18, 25, 26], and the references therein. A key algorithm in this field is the distributed gradient descent algorithm (DGD), introduced in [17]. Additionally, there exist various versions of distributed optimization algorithms such as EXTRA [25] and decentralized gradient tracking [18, 20, 28]. Lately, there has been growing interest in designing communication-efficient algorithms for distributed optimization [23, 8, 12, 2].
The distributed proximal point algorithm (DPPA) was proposed in [15, 16] as a distributed counterpart of the proximal point method, analogous to the relation between the distributed gradient descent algorithm and the gradient descent method. The works [15, 16] established the asymptotic convergence of the DPPA under the assumptions of a compact domain for each local cost and a decreasing step size. The work [14] designed the DPPA on a directed graph and proved a convergence estimate with an explicit rate when the step size is chosen suitably decreasing and each cost function has a compact domain.
It is well known that the proximal point method is more stable than the gradient descent method for large choices of the stepsize. This fact suggests that the DPPA may also be more stable than the DGD, as mentioned in the previous works [13, 14, 16]. In this work, we provide convergence results for the DPPA in the case of the entire domain and a constant stepsize. Comparing these results with the convergence result [29] of the DGD, we find that the DPPA is more stable than the DGD when the stepsize is large.
The DPPA for the problem (1.1) is described as follows:
(1.2) $x_i(t+1) = \operatorname*{argmin}_{x \in \mathbb{R}^d} \Big[ f_i(x) + \frac{1}{2\alpha} \Big\| x - \sum_{j=1}^n w_{ij} x_j(t) \Big\|^2 \Big],$
where $w_{ij}$ denotes the weight for the communication between agents $i$ and $j$ in the network described by an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Each node $i \in \mathcal{V}$ represents an agent, and each edge $\{i, j\} \in \mathcal{E}$ means that agent $i$ can send messages to agent $j$ and vice versa. We consider a graph satisfying the following assumption.
Assumption 1.
The communication graph $\mathcal{G}$ is undirected and connected, i.e., there exists a path between any two agents.
We define the mixing matrix $W = (w_{ij}) \in \mathbb{R}^{n \times n}$ as follows. A nonnegative weight $w_{ij}$ is given for each pair of agents, where $w_{ij} > 0$ if $\{i, j\} \in \mathcal{E}$ and $w_{ij} = 0$ if $\{i, j\} \notin \mathcal{E}$ with $i \neq j$. In this paper, we make the following assumption on the mixing matrix $W$.
Assumption 2.
The mixing matrix $W$ is doubly stochastic, i.e., $W \mathbf{1} = \mathbf{1}$ and $\mathbf{1}^\top W = \mathbf{1}^\top$, where $\mathbf{1} = (1, \dots, 1)^\top \in \mathbb{R}^n$. In addition, $w_{ii} > 0$ for all $1 \le i \le n$.
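Before turning to the stability results, we note that the update (1.2) is straightforward to implement once $W$ is fixed. The following is a minimal Python sketch of one DPPA iteration for quadratic local costs $f_i(x) = \|A_i x - b_i\|^2$, for which the proximal step has a closed form; the names `X`, `A`, `b`, `W`, `alpha` are illustrative and not taken from the paper.

```python
import numpy as np

def dppa_step(X, A, b, W, alpha):
    """One DPPA iteration (1.2) for quadratic costs f_i(x) = ||A_i x - b_i||^2.

    X: (n, d) array whose row i holds agent i's iterate x_i(t).
    A: list of (m, d) arrays; b: list of (m,) arrays; W: (n, n) mixing matrix.
    """
    n, d = X.shape
    U = W @ X  # communication step: u_i = sum_j w_ij x_j(t)
    X_new = np.empty_like(X)
    for i in range(n):
        # Proximal step: argmin_x ||A_i x - b_i||^2 + (1/(2 alpha)) ||x - u_i||^2,
        # whose optimality condition is (2 alpha A_i^T A_i + I) x = u_i + 2 alpha A_i^T b_i.
        M = 2.0 * alpha * A[i].T @ A[i] + np.eye(d)
        X_new[i] = np.linalg.solve(M, U[i] + 2.0 * alpha * A[i].T @ b[i])
    return X_new
```

For non-quadratic costs the proximal subproblem would be solved by an inner routine; the implicit nature of this step is the source of the stability discussed below.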
In the following result, we establish that the sequence $\{x_i(t)\}$ is stable, i.e., uniformly bounded for $t \ge 0$, under suitable conditions.
Theorem 1.1.
Suppose that Assumptions 1-2 hold, and assume that one of the following conditions holds:
-
(1)
The following function
(1.3) $\tilde{f}_\alpha(\mathbf{x}) = \sum_{i=1}^n f_i(x_i) + \frac{1}{2\alpha} \operatorname{tr}\big( \mathbf{x}^\top (I - W) \mathbf{x} \big)$
is bounded below and has an optimal solution $\tilde{\mathbf{x}}$.
-
(2)
Each function $f_i$ is $L$-smooth and $f$ is $\mu$-strongly convex. In addition, the stepsize $\alpha > 0$ satisfies the following inequality
(1.4) where $\beta$ is the spectral norm of the matrix $W - \frac{1}{n} \mathbf{1} \mathbf{1}^\top$.
Then the sequence $\{x_i(t)\}$ of the DPPA with stepsize $\alpha$ is uniformly bounded for $t \ge 0$.
We now compare the stability result with that of the DGD described by
$x_i(t+1) = \sum_{j=1}^n w_{ij} x_j(t) - \alpha \nabla f_i(x_i(t)).$
Let $\lambda_n(W)$ be the smallest eigenvalue of $W$. Then it is known from [29] that the DGD is stable if the stepsize satisfies
$\alpha \le \frac{1 + \lambda_n(W)}{L},$
provided that each cost function $f_i$ is convex and $L$-smooth.
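For comparison with the proximal sketch above, the DGD update admits an equally short sketch under the same illustrative quadratic setup, where the gradient is $\nabla f_i(x) = 2 A_i^\top (A_i x - b_i)$:

```python
import numpy as np

def dgd_step(X, A, b, W, alpha):
    """One DGD iteration: x_i(t+1) = sum_j w_ij x_j(t) - alpha * grad f_i(x_i(t))."""
    n = X.shape[0]
    G = np.stack([2.0 * A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n)])
    return W @ X - alpha * G
```

The only difference from `dppa_step` is that the gradient is evaluated explicitly at the current iterate rather than implicitly at the new one, which is why the DGD requires the stepsize restriction above.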
| | Condition on the costs | Stepsize | Paper |
| DGD | Each $f_i$ is convex and $L$-smooth | $\alpha \le \frac{1 + \lambda_n(W)}{L}$ | [29] |
| DPPA | $\tilde{f}_\alpha$ is bounded below and has an optimizer | Any $\alpha > 0$ | This work |
Table 1 summarizes the conditions on the stepsize for the stability of the DGD and the DPPA. We observe that $\tilde{f}_\alpha$ defined in (1.3) is bounded below for any $\alpha > 0$ when each $f_i$ is convex. Therefore, condition (1) of Theorem 1.1 is less restrictive than the convexity of each $f_i$ required for the stability result [29] of the DGD. Hence Theorem 1.1 shows that the range of stable stepsizes $\alpha$ is much wider for the DPPA than for the DGD.
For the convergence analysis of the DPPA, we assume the uniform boundedness property.
Assumption 3.
The sequence $\{x_i(t)\}$ is uniformly bounded for $t \ge 0$, i.e., there is $R > 0$ such that $\|x_i(t)\| \le R$ for all $1 \le i \le n$ and $t \ge 0$.
Although this property is guaranteed for a broad class of functions by Theorem 1.1, we formulate it as an assumption for the sake of simplicity in the statement of the convergence result. We also consider the following assumption.
Assumption 4.
The aggregate cost function $f$ is $\mu$-strongly convex for some $\mu > 0$ and each cost function $f_i$ is $L_i$-smooth for each $1 \le i \le n$, i.e.,
$\langle \nabla f(x) - \nabla f(y),\, x - y \rangle \ge \mu \|x - y\|^2 \quad \textrm{and} \quad \|\nabla f_i(x) - \nabla f_i(y)\| \le L_i \|x - y\|$
for all $x, y \in \mathbb{R}^d$. We set $L = \max_{1 \le i \le n} L_i$.
Under this assumption, there exists a unique optimizer $x_* \in \mathbb{R}^d$ of (1.1). We let $\bar{x}(t) = \frac{1}{n} \sum_{i=1}^n x_i(t)$. Also we regard each $x_i(t)$ and $\nabla f_i(x_i(t))$ as a row vector in $\mathbb{R}^{1 \times d}$, and define the variables $\mathbf{x}(t)$ and $\nabla F(\mathbf{x}(t))$ by
(1.5) $\mathbf{x}(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix} \in \mathbb{R}^{n \times d}, \qquad \nabla F(\mathbf{x}(t)) = \begin{pmatrix} \nabla f_1(x_1(t)) \\ \vdots \\ \nabla f_n(x_n(t)) \end{pmatrix} \in \mathbb{R}^{n \times d}.$
We let $\bar{\mathbf{x}}(t) = \mathbf{1} \bar{x}(t) \in \mathbb{R}^{n \times d}$.
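In the notation of (1.5), the iterate matrix stacks the agents' row vectors; the following minimal sketch (names illustrative) computes the average $\bar{x}(t)$ and the consensus residual $\mathbf{x}(t) - \bar{\mathbf{x}}(t)$ that appear throughout the analysis.

```python
import numpy as np

def average_and_residual(X):
    """X: (n, d) array whose i-th row is x_i(t), as in (1.5)."""
    x_bar = X.mean(axis=0)   # the network average \bar{x}(t)
    residual = X - x_bar     # rows x_i(t) - \bar{x}(t), i.e., x(t) - 1 \bar{x}(t)
    return x_bar, np.linalg.norm(residual)
```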
We show in the following result that the DPPA converges exponentially to an $O(\alpha)$-neighborhood of the optimizer $x_*$.
Theorem 1.2.
Suppose that Assumptions 1-4 hold and $\alpha > 0$. Then we have
(1.6)
and
Once the DGD is known to be stable, the linear convergence of the DGD is proved when the stepsize $\alpha$ is suitably small; we refer to [29, 7] for the details. As for the DPPA, we note that there is no restriction on the stepsize for the linear convergence obtained in Theorem 1.2. Table 2 compares the conditions on the stepsize of the DGD and the DPPA for the convergence of the algorithms.
2. Sequential estimates
In this section, we derive two sequential inequalities for the quantities $\|\bar{x}(t) - x_*\|$ and $\|\mathbf{x}(t) - \bar{\mathbf{x}}(t)\|$, which are the main ingredients for the stability and convergence analysis of the DPPA.
Proposition 2.1.
Assume that $f$ is $\mu$-strongly convex. Then, for any stepsize $\alpha > 0$, the sequence $\{\bar{x}(t)\}$ satisfies the following inequality
(2.1)
for all $t \ge 0$.
Proof.
From the minimality condition of (1.2), it follows that
(2.2) $\nabla f_i(x_i(t+1)) + \frac{1}{\alpha} \Big( x_i(t+1) - \sum_{j=1}^n w_{ij} x_j(t) \Big) = 0.$
We reformulate this as
(2.3) $x_i(t+1) = \sum_{j=1}^n w_{ij} x_j(t) - \alpha \nabla f_i(x_i(t+1)).$
By averaging this for $1 \le i \le n$ and using the double stochasticity of $W$, we get
(2.4) $\bar{x}(t+1) = \bar{x}(t) - \frac{\alpha}{n} \sum_{i=1}^n \nabla f_i(x_i(t+1)).$
Using this and the fact that $\nabla f(x_*) = \sum_{i=1}^n \nabla f_i(x_*) = 0$, we find
(2.5) $\bar{x}(t+1) - x_* = \bar{x}(t) - x_* - \frac{\alpha}{n} \sum_{i=1}^n \big( \nabla f_i(x_i(t+1)) - \nabla f_i(x_*) \big).$
Since $f$ is assumed to be $\mu$-strongly convex, we have
and
Combining these estimates, we get
Using this estimate in (2.5) and applying the triangle inequality, we get
where we used the Cauchy-Schwarz inequality in the last step. This gives the desired inequality. ∎
Next we will derive a bound of $\|\mathbf{x}(t+1) - \bar{\mathbf{x}}(t+1)\|$ in terms of $\|\mathbf{x}(t) - \bar{\mathbf{x}}(t)\|$. For this we will use the following result (see [19, Lemma 1]).
Lemma 2.2.
Let Assumptions 1-2 hold, and let $\beta$ denote the spectral norm of $W - \frac{1}{n} \mathbf{1} \mathbf{1}^\top$. Then $\beta < 1$ and $\| W \mathbf{y} - \bar{\mathbf{y}} \| \le \beta \| \mathbf{y} - \bar{\mathbf{y}} \|$ for any $\mathbf{y} \in \mathbb{R}^{n \times d}$, where $\bar{\mathbf{y}} = \frac{1}{n} \mathbf{1} \mathbf{1}^\top \mathbf{y}$.
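Numerically, the constant $\beta$ in Lemma 2.2 is easy to compute; the following sketch (with illustrative names) returns the spectral norm of $W - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ for any mixing matrix satisfying Assumptions 1-2.

```python
import numpy as np

def mixing_contraction_factor(W):
    """Spectral norm of W - (1/n) 1 1^T, i.e., the constant beta in Lemma 2.2."""
    n = W.shape[0]
    J = np.ones((n, n)) / n
    return np.linalg.norm(W - J, 2)
```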
In the following result, we find an estimate of $\|\mathbf{x}(t+1) - \bar{\mathbf{x}}(t+1)\|$ in terms of $\|\mathbf{x}(t) - \bar{\mathbf{x}}(t)\|$ and $\|\nabla F(\mathbf{x}(t+1))\|$.
Proposition 2.3.
Suppose that each $f_i$ is $L_i$-smooth. Then the sequence $\{\mathbf{x}(t)\}$ satisfies the following inequality
(2.6)
for all $t \ge 0$.
Proof.
We may write (2.3) and (2.4) in the following way
$\mathbf{x}(t+1) = W \mathbf{x}(t) - \alpha \nabla F(\mathbf{x}(t+1))$
and
$\bar{\mathbf{x}}(t+1) = \bar{\mathbf{x}}(t) - \alpha \overline{\nabla F}(\mathbf{x}(t+1)),$
where we have let $\overline{\nabla F}(\mathbf{x}(t+1)) = \frac{1}{n} \mathbf{1} \mathbf{1}^\top \nabla F(\mathbf{x}(t+1))$. Combining the above equalities, we find
$\mathbf{x}(t+1) - \bar{\mathbf{x}}(t+1) = \big( W \mathbf{x}(t) - \bar{\mathbf{x}}(t) \big) - \alpha \big( \nabla F(\mathbf{x}(t+1)) - \overline{\nabla F}(\mathbf{x}(t+1)) \big).$
By applying the triangle inequality and Lemma 2.2, we deduce
(2.7) $\|\mathbf{x}(t+1) - \bar{\mathbf{x}}(t+1)\| \le \beta \|\mathbf{x}(t) - \bar{\mathbf{x}}(t)\| + \alpha \big\| \nabla F(\mathbf{x}(t+1)) - \overline{\nabla F}(\mathbf{x}(t+1)) \big\|.$
Using the fact that the spectral radius of the matrix $I - \frac{1}{n} \mathbf{1} \mathbf{1}^\top$ is one, we obtain
$\big\| \nabla F(\mathbf{x}(t+1)) - \overline{\nabla F}(\mathbf{x}(t+1)) \big\| \le \| \nabla F(\mathbf{x}(t+1)) \|.$
Inserting this into (2.7) we obtain
which is the desired inequality. ∎
3. Boundedness of the sequence
We prove the uniform boundedness result of Theorem 1.1 under the two conditions separately below.
Proof of Theorem 1.1.
Assume the first condition of Theorem 1.1. We claim the following inequality
(3.1)
The minimizer $x_i(t+1)$ of (1.2) satisfies
Combining this with (2.2) gives
From this we have
Summing up this for $t \ge 0$, we get
which proves the inequality (3.1). It gives the following bound
Hence the sequences $\{\mathbf{x}(t)\}$ and $\{\bar{x}(t)\}$ are uniformly bounded.
Next we assume the second condition of Theorem 1.1. Then we claim that the sequences $\{\bar{x}(t)\}$ and $\{\mathbf{x}(t)\}$ satisfy
$\|\bar{x}(t) - x_*\| \le R \quad \textrm{and} \quad \|\mathbf{x}(t) - \bar{\mathbf{x}}(t)\| \le R \quad \textrm{for all } t \ge 0,$
where $R > 0$ is defined by
(3.2)
We argue by induction to prove this claim. First, we note that the bounds hold for $t = 0$ by the definition of $R$. Next we assume that
(3.3) $\|\bar{x}(t) - x_*\| \le R \quad \textrm{and} \quad \|\mathbf{x}(t) - \bar{\mathbf{x}}(t)\| \le R$
for some $t \ge 0$. Then, we use these bounds in (2.1) to find
which gives $\|\bar{x}(t+1) - x_*\| \le R$. Next we recall the estimates (2.1) and (2.6) with $t$ replaced by $t+1$ as
and
where we used the above bounds in the last inequality. Combining these estimates with (3.3) gives
This gives $\|\mathbf{x}(t+1) - \bar{\mathbf{x}}(t+1)\| \le R$ provided that
This holds true for $R$ defined in (3.2) and $\alpha$ satisfying
which is the same as the inequality (1.4) in condition (2). This completes the induction, and so the claim is proved.
The proof is done.
∎
4. Convergence result
In this section, we prove the main convergence result, Theorem 1.2, for the DPPA.
Proof of Theorem 1.2.
By Proposition 2.1 and Proposition 2.3 we have the following inequalities
(4.1)
and
Using Assumption 3 and the smoothness of each $f_i$, we have
for all $t \ge 0$. Using this iteratively, we get
Putting this inequality into (4.1) leads to
(4.2)
To analyze this sequential estimate, we consider two positive sequences satisfying
and
where the initial values are chosen suitably. It then follows from (4.2) that the comparison holds for all $t \ge 0$.
We estimate the first sequence as follows:
Next we estimate the second sequence as
Combining the above estimates gives the desired estimates (1.6), which completes the proof of Theorem 1.2. ∎
5. Numerical tests
This section gives numerical experimental results for the DPPA. We consider the cost functions
$f_i(x) = \| A_i x - b_i \|^2,$
where $n$ is the number of agents and, for each $1 \le i \le n$, $A_i$ is an $m \times d$ matrix whose elements are chosen randomly following the normal distribution $N(0,1)$. Also we set $b_i \in \mathbb{R}^m$, each element of which is generated from the normal distribution $N(0,1)$. We choose fixed values of $n$, $m$, and $d$ for the experiment.
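A reproduction sketch of this setup follows; the paper's exact sizes $n$, $m$, $d$ are not recorded above, so the values below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 10, 5, 4  # illustrative sizes; the paper's choices are elided above
A = [rng.standard_normal((m, d)) for _ in range(n)]   # entries ~ N(0, 1)
b = [rng.standard_normal(m) for _ in range(n)]        # entries ~ N(0, 1)
```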
For the communication matrix $W$, we link each pair of agents with probability $0.4$, and define the weights $w_{ij}$ accordingly.
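The paper's exact weight formula does not survive above; a standard choice consistent with Assumptions 1-2 (symmetric, doubly stochastic, positive diagonal) is the Metropolis rule, sketched below as an assumption rather than as the paper's definition. It continues the sketch above.

```python
import numpy as np

def build_mixing_matrix(n, p=0.4, rng=None):
    """Erdos-Renyi graph with edge probability p and Metropolis weights."""
    rng = rng or np.random.default_rng(1)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    adj = upper | upper.T
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[np.diag_indices(n)] = 1.0 - W.sum(axis=1)  # rows sum to one; W is symmetric
    return W

W = build_mixing_matrix(n)  # in practice, re-sample until the graph is connected
```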
We define the following value
$\alpha_c = \frac{1 + \lambda_n(W)}{L},$
where $\lambda_n(W)$ is the smallest eigenvalue of $W$ and $L = \max_{1 \le i \le n} L_i$ is the constant for the smoothness property of $f_i$ for $1 \le i \le n$. In our experiment, these constants are computed numerically, which determines the value of $\alpha_c$.
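Continuing the sketch, the constants can be computed numerically; the threshold below is the DGD stability bound $(1+\lambda_n(W))/L$ from [29] as recalled in Section 1.

```python
# Smoothness constants of f_i(x) = ||A_i x - b_i||^2 and strong convexity of the sum.
L_i = [2.0 * np.linalg.eigvalsh(Ai.T @ Ai)[-1] for Ai in A]
L = max(L_i)
mu = 2.0 * np.linalg.eigvalsh(sum(Ai.T @ Ai for Ai in A))[0]
lam_min_W = np.linalg.eigvalsh(W)[0]           # smallest eigenvalue of W
alpha_c = (1.0 + lam_min_W) / L                # DGD stability threshold from [29]
```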
We perform the numerical simulation for the DPPA and the DGD with stepsizes chosen below and above the threshold $\alpha_c$. Figure 1 indicates the graphs of the error with respect to $t$. The result shows that both the DPPA and the DGD are stable for the stepsize below $\alpha_c$, as the theoretical results of Table 1 guarantee. On the other hand, if we choose a stepsize above $\alpha_c$, then the DGD becomes unstable while the DPPA is still stable. This supports the result of Theorem 1.1.
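A sketch of this stability experiment, reusing `dppa_step` and `dgd_step` from Section 1; the stepsizes are illustrative stand-ins on either side of $\alpha_c$, not the paper's (elided) choices.

```python
# Closed-form optimizer of sum_i ||A_i x - b_i||^2, used to measure the error.
H = sum(Ai.T @ Ai for Ai in A)
g = sum(Ai.T @ bi for Ai, bi in zip(A, b))
x_star = np.linalg.solve(H, g)

def run(step, alpha, T=500):
    X = np.zeros((n, d))
    errs = []
    for _ in range(T):
        X = step(X, A, b, W, alpha)
        errs.append(np.linalg.norm(X - x_star))
    return errs

for alpha in (0.5 * alpha_c, 2.0 * alpha_c):  # below and above the DGD threshold
    err_dppa = run(dppa_step, alpha)
    err_dgd = run(dgd_step, alpha)
    print(f"alpha={alpha:.3f}: DPPA final {err_dppa[-1]:.2e}, DGD final {err_dgd[-1]:.2e}")
```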
Figure 1. The error of the DPPA and the DGD with respect to $t$ for stepsizes below and above the threshold $\alpha_c$.
Next we perform the numerical test for the DPPA with various stepsizes $\alpha$. We measure the error, and the graphs are given in Figure 2. The result shows that the error decreases exponentially down to an $O(\alpha)$ value, which supports the convergence result of Theorem 1.2.
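The stepsize sweep can likewise be sketched as follows, reusing `run` from the previous snippet; by Theorem 1.2, the limiting error floor should scale like $O(\alpha)$.

```python
for alpha in (0.02, 0.05, 0.1):                # illustrative stepsizes
    floor = run(dppa_step, alpha, T=2000)[-1]  # error after the exponential phase
    print(f"alpha={alpha}: error floor ~ {floor:.2e}")
```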
Figure 2. The error of the DPPA with respect to $t$ for various stepsizes $\alpha$.
Acknowledgments
The author was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government No. NRF-2016R1A5A1008055 and No. NRF-2021R1F1A1059671.
References
- [1] J. A. Bazerque, G.B. Giannakis, Distributed spectrum sensing for cognitive radio networks by exploiting sparsity. IEEE Trans. Signal Process. 58(3), 1847–1862 (2010)
- [2] A.S. Berahas, R. Bollapragada, N.S. Keskar, E. Wei, Balancing communication and computation in distributed optimization. IEEE Trans. Autom. Control 64(8), 3141–3155 (2018)
- [3] L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, vol. 60, no. 2, pp. 223–311, 2018.
- [4] F. Bullo, J. Cortes, S. Martinez, Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms, vol. 27. Princeton University Press, Princeton (2009)
- [5] Y. Cao, W. Yu, W. Ren, and G. Chen, An overview of recent progress in the study of distributed multi-agent coordination, IEEE Trans Ind. Informat., 9 (2013), pp. 427–438
- [6] W. Choi, D. Kim, S. Yun, Convergence results of a nested decentralized gradient method for non-strongly convex problems. J. Optim. Theory Appl. 195 (2022), no. 1, 172–204.
- [7] W. Choi, J. Kim, On the convergence of decentralized gradient descent with diminishing stepsize, revisited, arXiv:2203.09079.
- [8] A. I. Chen and A. Ozdaglar, A fast distributed proximal-gradient method, in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on. IEEE, 2012, pp. 601–608.
- [9] G. B. Giannakis, V. Kekatos, N. Gatsis, S.-J. Kim, H. Zhu, and B. Wollenberg, Monitoring and optimization for power grids: A signal processing perspective, IEEE Signal Processing Mag., 30 (2013), pp. 107–128.
- [10] V. Kekatos and G. B. Giannakis, Distributed robust power system state estimation, IEEE Trans. Power Syst., 28 (2013), pp. 1617–1626.
- [11] P. A. Forero, A. Cano, and G. B. Giannakis, Consensus-based distributed support vector machines, Journal of Machine Learning Research, vol. 11, pp. 1663–1707, 2010.
- [12] D. Jakovetic, J. Xavier, and J. M. F. Moura, Fast Distributed Gradient Methods, IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1131–1146, 2014.
- [13] X. Li, G. Feng, L. Xie, Distributed proximal algorithms for multiagent optimization with coupled inequality constraints. IEEE Trans. Automat. Control 66 (2021), no. 3, 1223–1230.
- [14] X. Li, G. Feng, L. Xie, Distributed Proximal Point Algorithm for Constrained Optimization over Unbalanced Graphs. In 2019 IEEE 15th International Conference on Control and Automation (ICCA) (pp. 824-829).
- [15] K. Margellos, A. Falsone, S. Garatti, and M. Prandini, Proximal minimization based distributed convex optimization, in 2016 American Control Conference (ACC), July 2016, pp. 2466–2471.
- [16] K. Margellos, A. Falsone, S. Garatti, and M. Prandini, Distributed constrained optimization and consensus in uncertain networks via proximal minimization, IEEE Trans. Autom. Control, vol. 63, no. 5, pp. 1372–1387, May 2018.
- [17] A. Nedić and A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
- [18] A. Nedić, A. Olshevsky, and W. Shi, Achieving geometric convergence for distributed optimization over time-varying graphs, SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017.
- [19] S. Pu and A. Nedić, Distributed stochastic gradient tracking methods, Math. Program., pp. 1–49, 2018.
- [20] G. Qu and N. Li, Harnessing smoothness to accelerate distributed optimization, IEEE Transactions on Control of Network Systems, vol. 5, no. 3, pp. 1245–1260, 2018.
- [21] H. Raja and W. U. Bajwa, Cloud K-SVD: A collaborative dictionary learning algorithm for big, distributed data, IEEE Transactions on Signal Processing, vol. 64, no. 1, pp. 173–188, Jan. 2016.
- [22] W. Ren, Consensus based formation control strategies for multi-vehicle systems, in Proceedings of the American Control Conference, 2006, pp. 4237–4242.
- [23] A. H. Sayed, Diffusion adaptation over networks. Academic Press Library in Signal Processing, 2013, vol. 3.
- [24] I.D. Schizas, G.B. Giannakis, S.I. Roumeliotis, A. Ribeiro, Consensus in ad hoc WSNs with noisy links—part II: distributed estimation and smoothing of random signals. IEEE Trans. Signal Process. 56(4), 1650–1666 (2008)
- [25] W. Shi, Q. Ling, G. Wu, W. Yin, Extra: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
- [26] W. Shi, Q. Ling, K. Yuan, G. Wu, W. Yin, On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 62(7), 1750–1761 (2014)
- [27] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning, in Proceedings of the IEEE Allerton Conference on Communication, Control, and Computing, IEEE, New York, 2012, pp. 1543–1550.
- [28] R. Xin and U. A. Khan, A linear algorithm for optimization over directed graphs with geometric convergence, IEEE Control Systems Letters, vol. 2, no. 3, pp. 315–320, 2018.
- [29] K. Yuan, Q. Ling, and W. Yin, On the convergence of decentralized gradient descent, SIAM Journal on Optimization, vol. 26, no. 3, pp. 1835–1854, Sep. 2016.
- [30] T. Yang, X. Yi, J. Wu, Y. Yuan, D. Wu, Z. Meng, Y. Hong, H. Wang, Z. Lin, and K. H. Johansson, A survey of distributed optimization, Annual Reviews in Control, vol. 47, pp. 278–305, 2019.