Distributed Optimization Method Based On Optimal Control
Abstract
In this paper, a novel distributed optimization framework is proposed. The key idea is to convert the optimization problem into an optimal control problem, in which the objective of each agent is to design the current control input that minimizes its original objective function together with the update size at the future time instant. In contrast to existing distributed methods for minimizing a sum of convex objective functions over multiple agents, we present a distributed optimization algorithm for multi-agent systems based on results from the maximum principle. Moreover, the convergence and the superlinear convergence rate are analyzed rigorously.
I INTRODUCTION
Optimization problems, which involve maximizing or minimizing an objective function, are ubiquitous in science and engineering [1]. Traditional centralized optimization requires a central node to collect and process the information received from all other agents. With the increasing penetration of networked control systems, the division of labor and cooperation within multi-agent systems play an increasingly important role in daily life. Distributed optimization problems arise accordingly, in which several autonomous agents collectively try to achieve a global objective. Compared with centralized optimization, distributed algorithms also have the potential to respect the privacy of data, measurements, cost functions, and constraints, which is increasingly important in practical applications such as intelligent traffic systems [2], electric power systems [3], formation control [4], and resource allocation [5].
Distributed optimization has gained renewed interest, and various distributed algorithms have been proposed in the literature. Dual decomposition, an early class of distributed optimization algorithms presented in [6], typically relies on a carefully chosen step size to ensure convergence. Owing to the inexpensive computation of (sub)gradients, gradient descent is also a popular tool for distributed optimization. [7] considered an unconstrained distributed computation model for minimizing a sum of convex objective functions over multiple agents, where each agent performs a consensus step and then a descent step along the local (sub)gradient direction of its own convex objective function. For constrained distributed optimization, [8] focused on cooperative control problems where the values of the agents are constrained to lie in closed convex sets and developed corresponding distributed algorithms. To accelerate the convergence of (sub)gradient methods, Newton-type distributed algorithms have been considered [9, 10, 11], which achieve a superlinear convergence rate. In particular, [12] proposed a distributed adaptive Newton algorithm with a global quadratic convergence rate, in which each node of a peer-to-peer network minimizes a finite sum of objective functions by communicating with its neighboring nodes. The procedure proposed in [13] lets agents distributedly compute and sequentially update an approximate Newton-Raphson direction by means of suitable average consensus ratios. However, as pointed out in [14], although subgradient and Newton methods are powerful and widely used in distributed optimization, the choice of step size and the possible singularity of the Hessian matrix are crucial to these algorithms.
Motivated by [14], a novel distributed optimization algorithm is proposed from the viewpoint of optimal control. To be specific, the control input of each agent is the update size of each iteration. The objective of each agent is to design the current control input so as to minimize its own objective function together with the update size at the future time instant. Through a simple transformation, the sum of the original objective functions of all agents is constructed. Based on the maximum principle, the distributed optimization method is derived, in contrast to existing distributed methods for minimizing a sum of convex objective functions. To show the superiority of the algorithm, a superlinear convergence rate is proved.
The outline of this paper is given below. In Section II, the optimization problem is reformulated as an optimal control problem. The distributed optimization algorithm based on the maximum principle is proposed in Section III. The convergence rate is analyzed in Section IV. Section V concludes the paper.
Notation: Throughout the paper, $A^{\top}$ stands for the transposition of matrix $A$. $\mathbb{R}^n$ denotes the $n$-dimensional real vector space. $I$ is a unit matrix with appropriate dimension. For a symmetric matrix $A$, $A > 0$ ($A \geq 0$) means that $A$ is a positive definite (positive semi-definite) matrix. $\rho(A)$ represents the spectral radius of $A$. $\|\cdot\|$ denotes the 2-norm of vectors and the induced norm of matrices. $\mathbb{N}_{a:b}$ represents all integers from integer $a$ to integer $b$.
II PROBLEM FORMULATION
II-A Graph theory
The communication graph is denoted by a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{A})$, where $\mathcal{V} = \{1, \dots, N\}$ is the set of vertices (nodes), $N$ is the number of agents satisfying $N \geq 2$, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges, and the weighted matrix $\mathcal{A} = [a_{ij}]$ is a non-negative matrix of adjacency weights, with $a_{ij} > 0$ if and only if $(j, i) \in \mathcal{E}$. The graph is said to be balanced if, for every agent $i$, the sums of the interaction weights into and out of agent $i$ are equal, i.e., $\sum_{j=1}^{N} a_{ij} = \sum_{j=1}^{N} a_{ji}$. Moreover, $N_i$ is used to represent the neighbor set of agent $i$. For the topology $\mathcal{G}$, a path of length $r$ from node $i_1$ to node $i_{r+1}$ is a sequence of $r+1$ distinct nodes such that $(i_s, i_{s+1}) \in \mathcal{E}$ for $s = 1, \dots, r$. If there exists a path between any two nodes, then $\mathcal{G}$ is said to be strongly connected. Here we make the following assumption for the communication graph.
Assumption 1
The directed graph $\mathcal{G}$ is balanced and strongly connected.
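As a concrete illustration, the following is a minimal numpy sketch (the helper names and the example weights are our own, not from the paper) for checking Assumption 1 numerically: it tests whether a weighted adjacency matrix is balanced, tests strong connectivity of the induced digraph, and builds the weighted Laplacian used later.

```python
import numpy as np

def is_balanced(A, tol=1e-9):
    """Balanced digraph: each node's total in-weights equal its out-weights."""
    return np.allclose(A.sum(axis=0), A.sum(axis=1), atol=tol)

def is_strongly_connected(A):
    """Strong connectivity via boolean reachability (repeated squaring)."""
    N = A.shape[0]
    reach = ((A > 0) | np.eye(N, dtype=bool)).astype(float)
    for _ in range(max(1, int(np.ceil(np.log2(N))) + 1)):
        reach = ((reach @ reach) > 0).astype(float)  # doubles reachable path length
    return bool((reach > 0).all())

def laplacian(A):
    """Weighted graph Laplacian L = D - A, with D = diag of row sums."""
    return np.diag(A.sum(axis=1)) - A

# A balanced, strongly connected directed cycle on three agents.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
print(is_balanced(A), is_strongly_connected(A))   # True True
```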
II-B Distributed optimization
Consider a multi-agent system (MAS) consisting of $N$ agents, labeled by the set $\mathcal{V} = \{1, \dots, N\}$, where agents communicate their local state information with their neighbors via the directed graph $\mathcal{G}$. Only agent $i$ knows the function $f_i : \mathbb{R}^n \to \mathbb{R}$, which is twice continuously differentiable and possibly non-convex. The traditional unconstrained optimization problem is given by
$\min_{x \in \mathbb{R}^n} \; \sum_{i=1}^{N} f_i(x).$  (1)
Different from the traditional distributed optimization method, we transform the task of finding a solution to problem (1) into updating the state sequence within an optimal control problem. To be specific, consider the following discrete-time linear time-invariant system for each agent
$x_i(k+1) = x_i(k) + u_i(k),$  (2)
where $x_i(k) \in \mathbb{R}^n$ is the state with the initial value $x_i(0)$, and $u_i(k)$ is the control input, which needs to be further specified later. The subscript $i$ represents the $i$th agent, $i \in \{1, \dots, N\}$.
Correspondingly, the cost function of each agent satisfies
(3)
where the associated weighting matrices in (3) are positive semi-definite and positive definite, respectively.
Remark 1
Compared with the traditional method, we convert the optimization problem into an optimal control problem, where the objective of each agent is to design the current control input minimizing the sum of its original objective function and the update size at the future time instant. On the one hand, the problem studied in this paper includes the traditional single-agent results: if we let $N = 1$, the system (2) reduces to $x(k+1) = x(k) + u(k)$. On the other hand, among the existing optimization algorithms for handling problem (1), there is a typical distributed first-order gradient descent algorithm [16]:
$x_i(k+1) = \sum_{j=1}^{N} w_{ij}\, x_j(k) - \alpha(k)\, \nabla f_i\big(x_i(k)\big),$  (4)
where $w_{ij}$ denotes the $(i,j)$ entry of a doubly stochastic weight matrix and $\alpha(k) > 0$ is the step size. Note that in convex optimization it has been proven that, using the distributed algorithm (4), the global optimal solution of problem (1) can be obtained. However, distributed algorithm (4) has limitations in terms of selecting the step size. The step size in (4) should theoretically satisfy the following two conditions:
$\sum_{k=0}^{\infty} \alpha(k) = \infty,$  (5)

$\sum_{k=0}^{\infty} \alpha(k)^2 < \infty.$  (6)
In practical applications, the common practice is to start with a small constant step size. Alternatively, even if a time-varying step size is chosen to satisfy the two conditions (5)-(6), the results obtained from the distributed algorithm (4) are evidently influenced by the choice of $\alpha(k)$.
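For illustration, here is a minimal numpy sketch of the consensus-plus-gradient update (4) with a diminishing step size satisfying (5)-(6); the toy objectives and the weight matrix are our own assumptions, not from the paper.

```python
import numpy as np

def distributed_gradient_descent(grads, W, x0, iters=500):
    """Sketch of update (4): x_i(k+1) = sum_j W[i,j] x_j(k) - alpha(k) grad f_i(x_i(k)),
    with the diminishing step size alpha(k) = 1/(k+1), which satisfies (5)-(6)."""
    x = x0.copy()                      # x has shape (N, n): one row per agent
    for k in range(iters):
        alpha = 1.0 / (k + 1)          # sum alpha = inf, sum alpha^2 < inf
        g = np.stack([grads[i](x[i]) for i in range(len(grads))])
        x = W @ x - alpha * g          # consensus step, then gradient step
    return x

# Toy example: two agents, f_1(x) = (x-1)^2, f_2(x) = (x+3)^2.
# Their sum is minimized at x* = -1.
grads = [lambda x: 2 * (x - 1.0), lambda x: 2 * (x + 3.0)]
W = np.array([[0.5, 0.5], [0.5, 0.5]])        # doubly stochastic weights
x = distributed_gradient_descent(grads, W, x0=np.array([[5.0], [-5.0]]))
print(x)   # both rows approach -1
```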
Based on the discussion above, we are in a position to derive the distributed optimization algorithm from the viewpoint of optimal control. For convenience of the subsequent discussion, some notation is introduced below.
Let $\delta_i(k)$ denote the consensus error of the $i$th agent, i.e., $\delta_i(k) = \sum_{j \in N_i} a_{ij}\big(x_i(k) - x_j(k)\big)$. In this way, we have
$\delta(k) = (L \otimes I)\, x(k)$, where $\delta(k)$ is the column vector composed of $\delta_1(k), \dots, \delta_N(k)$, $x(k)$ is the column vector composed of $x_1(k), \dots, x_N(k)$, and $L$ is the weighted Laplacian matrix of $\mathcal{G}$. To this end, (2) and (3) can be written in the stacked forms
$x(k+1) = x(k) + u(k)$  (7)
and
(8)
respectively. The optimal control problem to be solved in this section is stated as follows.
Problem: Find the optimal control minimizing the long-term cost (8) subject to (7).
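To make the stacked notation concrete, the following numpy sketch (with an assumed three-agent cycle graph and two-dimensional states) verifies that stacking the per-agent consensus errors $\delta_i(k)$ reproduces $\delta(k) = (L \otimes I)\, x(k)$.

```python
import numpy as np

# Sketch of the stacked notation: with x = col(x_1, ..., x_N) and L the
# weighted graph Laplacian, the stacked consensus errors satisfy
# delta = (L kron I_n) x, assuming delta_i = sum_j a_ij (x_i - x_j).
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])           # a balanced directed cycle on 3 agents
L = np.diag(A.sum(axis=1)) - A         # Laplacian L = D - A
n = 2                                   # dimension of each agent's state
x = np.arange(3 * n, dtype=float)      # stacked state col(x_1, x_2, x_3)
delta = np.kron(L, np.eye(n)) @ x      # stacked consensus error

# Entry-wise check against the per-agent definition:
xs = x.reshape(3, n)
delta_1 = A[0, 1] * (xs[0] - xs[1]) + A[0, 2] * (xs[0] - xs[2])
print(np.allclose(delta[:n], delta_1))  # True
```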
Remark 2
In our recent work [14], we developed a centralized optimization algorithm, which motivates the design of the system and cost function (7)-(8). Naturally, we consider the possibility of designing a distributed algorithm based on the centralized one via the maximum principle. It should be noted that if $f_i = 0$, (8) reduces to the cost function of the consensus problem based on optimal control in our recent work [17].
III Distributed Optimization Method Using Optimal Control
In this section, we use optimal control theory to solve the Problem, which inspires an optimization algorithm that can be implemented in a distributed manner. Additionally, a variant of the algorithm is proposed to balance the number of iterations and communications.
III-A Algorithm Development
Following [18], applying Pontryagin's maximum principle to the system (7) with the cost function (8), the following costate equation and equilibrium condition are obtained:
(9)

(10)
with the corresponding terminal value of the costate.
To derive the analytical solution from the FBDEs (7), (9)-(10), define the Riccati equation
(11)
with the corresponding terminal value.
The analytical expression of the controller is given below.
Lemma 1
If the Riccati equation (11) admits a solution such that the associated gain matrix is invertible, then the control satisfies
(12)
where the gain matrices are defined separately for the intermediate and terminal time instants. Moreover, the costate is derived as
(13)
Proof:
At the terminal time, substituting the terminal value into (9), there holds
(14)
i.e., the control satisfies (12) at the terminal time. Accordingly, substituting (14) into (10), the costate is calculated as
Assume that (13) is established at time $k+1$; we will show that it also holds at time $k$. Substituting (13) at time $k+1$ into (9), one has
(15)
i.e., the control is such that
(16)

This completes the proof. ∎
Remark 3
It should be noted that the analytical solution obtained in Lemma 1 contains future information, i.e., it depends on quantities at future time instants, which is difficult to realize in practice; thus, a modified algorithm is presented.
Before proposing an algorithm based on the optimal control obtained in Lemma 1, we first show an interesting phenomenon: the controller (12) derived from optimal control theory utilizes the average value of the derivatives of the agents' local functions, referred to as the "average gradient" hereafter. That is to say, the gain matrix in (12) has the property of taking the average value, which is proved below. It should be noted that distributed optimization algorithms that employ the average gradient have been explored in [19]. Our derived results theoretically demonstrate the rationality and correctness of adopting the average gradient. The property of the gain matrix defined in Lemma 1 is given below.
Lemma 2
The gain matrix defined in Lemma 1 satisfies the averaging property; that is, it acts on the stacked local gradients by taking their network-wide average.
Proof:
If the Riccati equation (11) admits a stable solution as the horizon tends to infinity, then we adopt this stable solution. In order to prove the averaging property, the following system is constructed first
(17)
In this case, the key point is to prove the stability of (17). Utilizing the properties of the solution of the Riccati equation, (17) can be rewritten as
(18)
where $L$ is the weighted Laplacian matrix.
According to [20], since the eigenvalues of the Laplacian matrix $L$ have non-negative real parts, the eigenvalues of $I + L$ have real parts greater than or equal to one. Therefore, the eigenvalues of the inverse of $I + L$ lie between zero and one, indicating that the above system (17) is stable.
As $k \to \infty$, the steady state of the system should satisfy the corresponding fixed-point equation, which simplifies to $Lz = 0$. Referring to the conclusion of [20], on a balanced and strongly connected graph the solution of $Lz = 0$ is the consensus state $z_1 = \cdots = z_N$, so the steady state takes the average value, i.e., the averaging property holds. This completes the proof. ∎
Lemma 2 indicates that (12) utilizes the average gradient. For the sake of simplifying notation, we will denote the average gradient by $\bar{g}(k)$, i.e., $\bar{g}(k) = \frac{1}{N}\sum_{i=1}^{N} \nabla f_i\big(x_i(k)\big)$.
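The averaging mechanism behind Lemma 2 can be mimicked numerically: with doubly stochastic mixing weights on a connected graph, repeated local mixing drives every agent's copy of the local gradients to the network-wide average $\bar{g}(k)$. A minimal sketch, with the weight matrix as our own assumption:

```python
import numpy as np

def average_via_consensus(values, W, iters=200):
    """With doubly stochastic weights W on a strongly connected, balanced
    graph, v(k+1) = W v(k) drives every row of v to the network-wide
    average -- the mechanism behind the average gradient in Lemma 2."""
    v = values.copy()                  # shape (N, n): local gradients
    for _ in range(iters):
        v = W @ v                      # each agent mixes with its neighbors
    return v                           # every row approx. the average

local_grads = np.array([[4.0], [0.0], [-1.0]])
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])    # doubly stochastic mixing matrix
print(average_via_consensus(local_grads, W))  # each row -> 1.0
```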
III-B Distributed Optimization Algorithm
In this subsection, we discuss the distributed optimization algorithms derived from Lemma 1, which include an algorithm with a central computing node and a decentralized algorithm.
Lemma 3
Under the conditions of Lemma 1 and based on the optimal control obtained from it, there exists an iterative algorithm
(19)
where $\bar{g}(k)$ is the average gradient and the gain matrices are constructed from the stable solution of (11).
Proof:
The controller (12) exhibits non-causality because the future average gradient is used. Fortunately, the problem of utilizing future information has been previously addressed in our research; we refer to [14] for the details. In accordance with the methodology presented in [14], the controller (12) is transformed into the form depicted in (19). ∎
Remark 4
The connection between algorithm (19) and our previous work presented in [14] is discussed here. On the one hand, if each agent does not have a local function to optimize, i.e., $f_i = 0$, it is evident that the gradient-related terms in (19) vanish. That is to say, (19) transforms into the following form:
(20)
In this case, (20) corresponds to the form of the controller from Lemma 1 in [17] when the weighting matrix is the identity matrix. On the other hand, if we only consider a centralized optimization problem, or if the states of all agents coincide during the iteration process, then the consensus error vanishes and the related terms do not appear. Moreover, (19) degenerates into the following form:
(21)
In this case, (21) corresponds to Algorithm I in the previous work [14], where the two gain terms correspond to the (average) gradient and the Hessian matrix, respectively.
Based on optimal control theory, we have derived a distributed optimization algorithm (19) with a central computing node (in computer science, this node is called a server [21], and we use this expression in Algorithm 1). The algorithm presented in Algorithm 1 is called the Distributed Optimal Control Method with a Central Computing Node (DOCMC). Now, we summarize the DOCMC algorithm from the perspective of each agent.
Utilizing the state errors with neighbors can significantly enhance the computational stability of the server. In fact, if all agents adopt the same state, then (21) is utilized in DOCMC. However, when the states of the agents cannot be guaranteed to be consistent due to communication processes or the influence of individual agents, relying solely on (21) may not be sufficient to achieve the optimization goals. The consensus error term ensures that each agent can achieve the optimization goals even when the states are not consistent (whether due to initially different states or deviations during the optimization process), as the algorithm incorporates an optimal consensus feedback control based on the state errors.
Next, we focus on the fully distributed optimization algorithm. Firstly, reconsidering the DOCMC algorithm, the matrix inverse in (19) is a primary reason why the algorithm requires a server. Motivated by our recent work [22], which replaced the inverse matrix with an adjustable scalar, the inverse matrix in the DOCMC algorithm is replaced by an adjustable scalar in the following discussion. Based on the relationship in (19), we propose the following algorithm
(22)
where second-order information is used in algorithm (22) on account of the optimal control framework. We refer to the above algorithm as the Distributed Optimization Algorithm based on Optimal Control (DOAOC).
Remark 5
Now the connection between algorithm (22) and our previous work in [23] is discussed. If we only consider a centralized optimization problem, or if the states of all agents coincide during the iteration process, then the consensus error vanishes and the related terms do not appear. In this case, (22) degenerates into the following form:
(23)
which is exactly Algorithm II in the previous work [23].
Now, a distributed optimization algorithm for each agent to handle problem (1) based on the DOAOC algorithm is given; see Algorithm 2 for details. In the proposed distributed algorithm, as described in Steps 3 and 4, each agent only communicates with its neighbors. Step 5 initializes the loop by computing the initial iterate. Different from [22], Step 7 only requires each agent to perform its calculation independently.
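Since Algorithm 2 itself is not reproduced here, the following is only a hypothetical skeleton of the per-agent loop described above; the actual DOAOC update rule (22) is abstracted behind the `update` callback, and all names are our own assumptions.

```python
def agent_loop(i, x0, neighbors, exchange, consensus_avg,
               grad_i, hess_i, update, iters):
    """Hypothetical per-agent skeleton: neighbor-only communication
    (Steps 3-4), consensus estimates of the average gradient/Hessian,
    and an independent local update (Step 7)."""
    x = x0
    for k in range(iters):
        states = exchange(i, x, neighbors)       # Steps 3-4: local communication
        g_avg = consensus_avg(i, grad_i(x))      # estimate the average gradient
        H_avg = consensus_avg(i, hess_i(x))      # estimate the average Hessian
        x = update(x, states, g_avg, H_avg)      # Step 7: DOAOC update (22)
    return x
```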
III-C Discussion of the DOCMC and DOAOC Algorithms
In this subsection, we discuss the compelling characteristics of DOCMC and DOAOC and some simplification techniques in practical use, which are summarized as follows:
• Many distributed optimization methods commonly used in machine learning, including frameworks like TensorFlow, employ distributed algorithms with a central computing node. When considering a star (master-slave) network structure [21] in the DOCMC algorithm (where the consensus error does not occur, as each agent independently exchanges information with the server), simplification can be achieved using (21), in which the gain terms directly represent the average derivative and the average Hessian.
• To obtain the average gradient, Step 3 in Algorithm 2 is necessary. In practical applications, after evaluating the performance of the agents, a suitable consensus protocol can be chosen: if agents have ample memory, memory-intensive consensus protocols like the DSF algorithm [24] can be employed; if agents have limited memory, a finite-time average consensus protocol [25] can be used; and in cases where agent performance is poor, traditional consensus protocols [7], [20] can still be applied. The average Hessian in the DOAOC algorithm can be obtained not only through the above consensus protocols but can also be approximated by first-order derivative differences, as sketched below. In fact, in our recent work [23], Algorithm III utilizes a first-order derivative difference method to obtain the Hessian matrix.
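As a sketch of the first-order derivative difference mentioned above (the function names and the test function are our own assumptions), the Hessian can be approximated column by column from gradient evaluations only:

```python
import numpy as np

def hessian_from_gradients(grad, x, eps=1e-6):
    """Approximate the Hessian by first-order derivative differences:
    column j is (grad f(x + eps*e_j) - grad f(x)) / eps.
    Assumes only gradient oracle access."""
    n = x.size
    g0 = grad(x)
    H = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (grad(x + e) - g0) / eps
    return 0.5 * (H + H.T)             # symmetrize the estimate

# Check on f(x) = x1^2 + 3*x1*x2 + 5*x2^2, whose Hessian is [[2,3],[3,10]].
grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0] + 10*x[1]])
print(hessian_from_gradients(grad, np.array([1.0, -2.0])))
```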
IV CONVERGENCE ANALYSIS
In this section, we conduct a convergence analysis of the proposed algorithms. The superlinear convergence rate demonstrates the superiority of our optimal control-based algorithms. The following assumption is instrumental in drawing our conclusions.
Assumption 2
The local objective functions $f_i$ are twice continuously differentiable, and there exist constants $0 < m \leq M$ such that for any $x \in \mathbb{R}^n$, $m I \leq \nabla^2 f_i(x) \leq M I$.
Remark 6
Let $x^{\star}$ be the minimum point of problem (1). Now we can proceed with the convergence analysis of our algorithms.
For the DOCMC algorithm, we can compactly write the update iteration (19) as
(24)
Lemma 4
Under the conditions of Assumption 2, the DOCMC algorithm converges if the weighted matrix is selected appropriately.
Proof:
Firstly, we prove that the consensus error converges to zero.
(25)
Recalling the conditions on the weighted matrix, it is obtained immediately from [27] that every eigenvalue of the corresponding iteration matrix is positive and less than 1; accordingly, its spectral radius is less than 1. It should be noted that this iteration matrix is symmetric, which can be proved as in [23]. Therefore, it satisfies the properties required in [20]. According to [19], we then have
(26)
This means that the consensus error converges to zero asymptotically. Next, we show that the agents' common state converges to the minimum point $x^{\star}$. Using the consensus of the states in the update iteration (24), we have
(27)
which is consistent with the form of Algorithm 1 in [23]. Thus, there holds
(28)
More details can be found in the proof of Lemma 2 in [23]. The proof is completed. ∎
Now we are in a position to discuss the convergence rate of DOCMC algorithm.
Theorem 1
Under the conditions of Assumption 2, when the sequence generated by the DOCMC algorithm is convergent, there exists a scalar such that
(29)

(30)
that is to say, the DOCMC algorithm is superlinearly convergent.
Proof:
Denote
(31)
It is evident from (31) that
(32)

(33)
From (24), one has
(34)
Noting that the weighting matrix is positive definite, it admits a square-root factorization, so the iteration matrix is similar to a diagonal matrix. From the resulting eigenvalue bounds, it immediately follows that
(35)
By defining the scalar accordingly, (30) is obtained. The proof is completed.
∎
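Although the intermediate quantities of Theorem 1 are omitted here, the defining feature of a superlinear rate is easy to observe numerically: the ratio of successive errors tends to zero, unlike the roughly constant ratio of a linearly convergent method. A small self-contained sketch on an assumed scalar objective:

```python
import numpy as np

# For a superlinearly convergent sequence the successive error ratio
# ||x_{k+1} - x*|| / ||x_k - x*|| tends to zero, whereas a linearly
# convergent method keeps the ratio bounded away from zero.
f_grad = lambda x: np.exp(x) - 2.0     # f(x) = e^x - 2x, minimizer x* = ln 2
f_hess = lambda x: np.exp(x)
x_star = np.log(2.0)

x_newton, x_gd = 2.0, 2.0
for k in range(6):
    e_n, e_g = abs(x_newton - x_star), abs(x_gd - x_star)
    x_newton -= f_grad(x_newton) / f_hess(x_newton)   # Newton-type step
    x_gd -= 0.1 * f_grad(x_gd)                        # fixed-step gradient descent
    print(k, abs(x_newton - x_star) / e_n, abs(x_gd - x_star) / e_g)
# the first ratio column collapses toward 0; the second stays near a constant
```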
Now, the superlinear convergence of the DOAOC algorithm is proved.
Theorem 2
Under the conditions of Assumption 2, selecting the adjustable scalar and the weighted matrix appropriately, when the sequence generated by the DOAOC algorithm converges, there exists a scalar such that
(36)
That is to say, the DOAOC algorithm is superlinearly convergent.
Proof:
Following from a direct derivation, it holds that
(37)

(38)
With the corresponding error notation, it follows from (23) that
(39)
Since the error terms vanish asymptotically, the ratio of successive errors tends to zero. By defining the scalar accordingly, the superlinear convergence of the DOAOC algorithm is obtained. The proof is completed. ∎
V CONCLUSIONS
This paper has proposed distributed second-order optimization algorithms with global superlinear convergence from the viewpoint of optimal control theory. Second-order information is successfully integrated into distributed optimization without requiring the inversion of the Hessian matrix. A connection has been revealed between the proposed algorithms and traditional optimization methods. The convergence analysis of the algorithms has also been provided.
References
- [1] A. D. Belegundu and T. R. Chandrupatla. Optimization Concepts and Applications in Engineering. Cambridge University Press, 2011.
- [2] R. Eini and S. Abdelwahed. Distributed model predictive control for intelligent traffic systems. International Conference on Internet of Things (iThings), 909-915, 2019.
- [3] D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei. A survey of distributed optimization and control algorithms for electric power systems. IEEE Transactions on Smart Grid, 8(6): 2941-2962, 2017.
- [4] A. Olshevsky. Efficient Information Aggregation for Distributed Control and Signal Processing. Ph.D. dissertation, MIT, Cambridge, MA, 2010.
- [5] Y. Xue, B. Li, and K. Nahrstedt. Optimal resource allocation in wireless ad hoc networks: a price-based approach. IEEE Transactions on Mobile Computing, 5(4): 347-364, 2006.
- [6] P. Wan and M. D. Lemmon. Event-triggered distributed optimization in sensor networks. International Conference on Information Processing in Sensor Networks, 49-60, 2009.
- [7] A. Nedić and A. Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1): 48-61, 2009.
- [8] A. Nedić, A. Ozdaglar, and P. A. Parrilo. Constrained consensus and optimization in multi-agent networks. IEEE Transactions on Automatic Control, 55(4): 922-938, 2010.
- [9] R. Islamov, X. Qian, and P. Richtárik. Distributed second order methods with fast rates and compressed communication. Proceedings of the 38th International Conference on Machine Learning, 139: 4617-4628, 2021.
- [10] E. Wei, A. Ozdaglar, and A. Jadbabaie. A distributed Newton method for network utility maximization-I: Algorithm. IEEE Transactions on Automatic Control, 58(9): 2162-2175, 2013.
- [11] E. Wei, A. Ozdaglar, and A. Jadbabaie. A distributed Newton method for network utility maximization-II: Convergence. IEEE Transactions on Automatic Control, 58(9): 2176-2188, 2013.
- [12] J. Zhang, K. You, and T. Başar. Distributed adaptive Newton methods with global superlinear convergence. Automatica, 138: 110156, 2022.
- [13] D. Varagnolo, F. Zanella, A. Cenedese, G. Pillonetto, and L. Schenato. Newton-Raphson consensus for distributed convex optimization. IEEE Transactions on Automatic Control, 61(4): 994-1009, 2016.
- [14] H. Zhang and H. Wang. Optimization methods rooting in optimal control. arXiv preprint arXiv:2312.01334, 2024.
- [15] V. D. Blondel, J. M. Hendrickx, A. Olshevsky, and J. N. Tsitsiklis. Convergence in multiagent coordination, consensus, and flocking. Proceedings of the 44th IEEE Conference on Decision and Control, 2996-3000, 2005.
- [16] S. Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
- [17] L. Zhang, J. Xu, H. Zhang, and L. Xie. Distributed optimal control and application to consensus of multi-agent systems. arXiv preprint arXiv:2309.12577, 2023.
- [18] H. Zhang, H. Wang, and L. Li. Adapted and causal maximum principle and analytical solution to optimal control for stochastic multiplicative-noise systems with multiple input-delays. Proceedings of the 51st IEEE Conference on Decision and Control, Maui, HI, USA, 2122-2127, 2012.
- [19] G. Qu and N. Li. Harnessing smoothness to accelerate distributed optimization. IEEE Transactions on Control of Network Systems, 5(3): 1245-1260, 2017.
- [20] R. Olfati-Saber and R. M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9): 1520-1533, 2004.
- [21] S. Soori, K. Mishchenko, A. Mokhtari, M. M. Dehnavi, and M. Gürbüzbalaban. DAve-QN: A distributed averaged quasi-Newton method with local superlinear convergence rate. International Conference on Artificial Intelligence and Statistics, 1965-1976, 2020.
- [22] Y. Xu, Z. Guo, K. Lu, and H. Zhang. Distributed optimization algorithm with superlinear convergence rate. arXiv preprint arXiv:2409.12392, 2024.
- [23] H. Wang, Y. Xu, Z. Guo, and H. Zhang. Superlinear optimization algorithms. arXiv preprint arXiv:2403.11115, 2024.
- [24] J. Zhang, K. You, and T. Başar. Distributed adaptive Newton methods with global superlinear convergence. Automatica, 138: 110156, 2022.
- [25] T. Charalambous and C. N. Hadjicostis. Laplacian-based matrix design for finite-time average consensus in digraphs. 2018 IEEE Conference on Decision and Control (CDC), 3654-3659, 2018.
- [26] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- [27] S. G. Wang, M. X. Wu, and Z. Z. Jia. Matrix Inequalities. Chinese Science Press, Beijing, 2006.