Near-Optimal Control Strategy in Leader-Follower Networks: A Case Study for Linear Quadratic Mean-Field Teams
Abstract
In this paper, a decentralized stochastic control system consisting of one leader and many homogeneous followers is studied. The leader and followers are coupled in both dynamics and cost, where the dynamics are linear and the cost function is quadratic in the states and actions of the leader and followers. The objective of the leader and followers is to reach consensus while minimizing their communication and energy costs. The leader knows its local state, and each follower knows its local state and the state of the leader. The number of links required to implement this decentralized information structure is equal to the number of followers, which is the minimum number of links for a leader-follower communication graph to be connected. In the leaderless special case, no links are required among the followers, i.e., the communication graph need not even be connected. We propose a near-optimal control strategy that converges to the optimal solution as the number of followers increases. One of the salient features of the proposed solution is that it provides a design scheme in which the convergence rate as well as the collective behavior of the followers can be shaped by choosing appropriate cost functions. In addition, the computational complexity of the proposed solution does not depend on the number of followers. Furthermore, the proposed strategy can be computed in a distributed manner, where the leader solves one Riccati equation and each follower solves two Riccati equations to calculate their strategies. Two numerical examples are provided to demonstrate the effectiveness of the results in the control of multi-agent systems.
Proceedings of IEEE Conference on Decision and Control, 2018.
I Introduction
The control of multi-agent systems with a leader-follower structure has attracted much interest in the past two decades due to its wide range of applications in various fields of science and engineering. Such applications include vehicle formation [1], sensor networks [2], surveillance using a team of unmanned aerial vehicles (UAVs) [3] and flocking [4, 5], to name only a few. In this type of problem, a group of agents (called followers) is to track another agent (called the leader) while certain performance objectives are achieved. Different performance measures, such as minimum energy, fuel or time, are considered in the literature. In achieving these objectives, limited communication and computation resources are two main challenges that need to be overcome.
To address the challenges outlined in the previous paragraph, the following two problems have recently been investigated in the literature on consensus control protocols: (i) how the states of the followers can reach the state of the leader under communication constraints (the distributed control problem), where consensus is reached under appropriate linear strategies for properly connected communication graphs [6, 7, 8]; and (ii) how the states of the followers can reach the state of the leader with minimum energy consumption (the optimal control problem), where the optimal control strategy for a quadratic performance index with linear dynamics under a centralized information structure is a linear feedback rule obtained from the solution of the celebrated Riccati equation [9]. Combining the above two objectives, however, is quite challenging, as it leads to a decentralized optimal control problem wherein the optimal control law is not necessarily linear [10]. Furthermore, since the dimension of the matrices in the network model increases with the number of followers, the optimal control law may be intractable for a network of large size. In this paper, we consider a decentralized optimal control problem with a large number of followers.
For dynamically decoupled followers, and also for the case of a leaderless multi-agent system, [11, 12] use the control inverse optimality approach to compute optimal distributed control strategies for special classes of communication graphs. The authors in [13] consider a large number of homogeneous followers and determine the optimal strategy by solving two coupled Riccati equations. In contrast, this paper studies a leader-follower multi-agent network with coupled dynamics under a directed communication graph in which there is a direct link from the leader to each follower. In the leaderless special case, the communication graph is not required to be connected. When the initial states of the followers are independently and identically distributed, a near-optimal strategy is proposed for a large number of homogeneous followers by solving two decoupled Riccati equations, using the mean-field team approach introduced in [14] and showcased in [15, 16, 17, 18].
The remainder of this paper is organized as follows. The problem is formulated in the next section, where the main contributions of the work are also outlined. Then in Section III, some important assumptions are presented and the control strategy is derived. Two numerical examples are provided in Section IV and finally the paper is concluded in Section V.
II Problem Formulation
II-A Notation
In this paper, $\mathbb{N}$ and $\mathbb{R}$ denote the sets of natural and real numbers, respectively. The short-hand notation $x_{1:t}$ is used to denote the vector $(x_1, \ldots, x_t)$, $t \in \mathbb{N}$. For any $n \in \mathbb{N}$, $\mathbb{N}_n$ is the finite set of integers $\{1, \ldots, n\}$. $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix and $\operatorname{var}(\cdot)$ the covariance of a random vector.
II-B Dynamics
Consider a multi-agent system consisting of one leader and $n$ followers. Denote by $x^0_t$, $u^0_t$ and $w^0_t$, $t \in \mathbb{N}$, the state, action, and noise of the leader at time $t$, respectively. In addition, let $x^i_t$, $u^i_t$, and $w^i_t$ denote the state, action, and noise of follower $i \in \mathbb{N}_n$ at time $t$, analogously. The state of the leader evolves as follows:
(1)
where $\bar{x}_t := \frac{1}{n}\sum_{i=1}^{n} x^i_t$ is the average of the states of the followers at time $t$, and will hereafter be called the mean-field [14]. Similarly, the state of each follower evolves as follows:
(2)
Let $T \in \mathbb{N}$ be the control horizon. It is assumed that the primitive random variables (the initial states of all agents and the noise processes $w^0_t, w^1_t, \ldots, w^n_t$, $t \in \{1, \ldots, T\}$) are defined on a common probability space and are mutually independent.
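Since the display equations (1)-(2) did not survive extraction, the following minimal simulation sketch assumes the standard linear mean-field form suggested by the surrounding text: each agent's state is driven by its own state and action, the mean-field $\bar{x}_t$, and local noise. All matrix names (A0, B0, D0 for the leader; A, B, D for the followers) and numerical values are illustrative assumptions, not the paper's data.

```python
# Hedged sketch of dynamics of the form (1)-(2); all names and values are
# illustrative assumptions. Leader:   x0' = A0 x0 + B0 u0 + D0 xbar + w0.
#                           Follower: xi' = A  xi + B  ui + D  xbar + wi.
import numpy as np

rng = np.random.default_rng(0)
n, T, dx, du = 100, 50, 2, 1                 # followers, horizon, dimensions

A0 = A = np.array([[1.0, 0.1], [0.0, 1.0]])  # assumed double-integrator-like
B0 = B = np.array([[0.0], [0.1]])
D0 = D = 0.05 * np.eye(dx)                   # coupling through the mean-field

x0 = rng.normal(size=dx)                     # leader state
x = rng.normal(size=(n, dx))                 # follower states, i.i.d. draws

for t in range(T):
    xbar = x.mean(axis=0)                    # mean-field: average follower state
    u0 = np.zeros(du)                        # open loop here; see (17)-(18)
    u = np.zeros((n, du))
    x0_next = A0 @ x0 + B0 @ u0 + D0 @ xbar + 0.01 * rng.normal(size=dx)
    x = x @ A.T + u @ B.T + D @ xbar + 0.01 * rng.normal(size=(n, dx))
    x0 = x0_next
```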
II-C Information structure
At time $t$, the leader observes its state $x^0_t$ and chooses its action according to a control law $g^0_t$, i.e.,
(3)
In addition, for any $i \in \mathbb{N}_n$, follower $i$ observes its state $x^i_t$ as well as the state of the leader $x^0_t$ at time $t$, and decides its action as follows:
(4)
where $g^i_t$ is the control law of follower $i$. Note that the control actions (3) and (4) have a non-classical decentralized information structure.
Remark 1
The number of links required to implement the above information structure is $n$ (one link from the leader to each follower), which is the minimum possible number of links for a leader-follower network to be connected. In addition, no information about the states of the followers is communicated; hence, the privacy of the followers is preserved. □
Remark 2
The set of all control laws of the leader and followers is called the strategy of the network. The objective of the followers is to track the leader in an energy-efficient manner. To this end, the following cost function is defined:
(5)
where the expectation is taken with respect to the probability measures induced by the choice of strategy, and the weighting matrices in (5) are symmetric with appropriate dimensions. It is to be noted that the rate of convergence of the followers to the leader depends on the state and action weighting matrices. Moreover, the collective behavior of the followers is shaped by the matrix weighting deviations from the mean-field.
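Because the display form of (5) was lost in extraction, the sketch below evaluates one representative consensus-type quadratic stage cost in its spirit: followers are penalized for deviating from the leader and from the mean-field, and all agents pay for control energy. The weight names (Q0, R0, Q, P, R) and the exact grouping of terms are illustrative assumptions.

```python
# A representative quadratic stage cost in the spirit of (5); the weight
# names and the grouping of terms are assumptions, not the paper's notation.
import numpy as np

def stage_cost(x0, u0, x, u, Q0, R0, Q, P, R):
    """Leader (x0, u0) and n followers (rows of x, u)."""
    n = x.shape[0]
    xbar = x.mean(axis=0)                         # mean-field
    cost = x0 @ Q0 @ x0 + u0 @ R0 @ u0            # leader terms
    e_leader = x - x0                             # tracking errors w.r.t. leader
    e_mean = x - xbar                             # deviations from the mean-field
    cost += np.einsum('id,de,ie->', e_leader, Q, e_leader) / n
    cost += np.einsum('id,de,ie->', e_mean, P, e_mean) / n
    cost += np.einsum('id,de,ie->', u, R, u) / n  # followers' control energy
    return cost
```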
Problem 1
Find a strategy for the leader and followers that minimizes the cost function (5), subject to the dynamics (1)-(2) and the information structure (3)-(4).
Remark 3
Notice that if the matrices coupling the agents to the leader in the dynamics and cost are zero, Problem 1 reduces to the optimal control of a leaderless multi-agent network. In that case, $x^0_t$ represents the desired reference trajectory, and as noted before, the followers do not share anything with each other once they receive the reference trajectory information $x^0_t$, according to (4). □
II-D Main challenges and contributions
There are two main challenges in finding a solution to Problem 1. The first is the non-classical information structure, as the optimal strategy under this type of information structure is not necessarily linear [10]. The second is the curse of dimensionality: the matrices in Problem 1 are fully dense, and their dimension increases with the number of followers.
In their previous work [15], the authors show that if the mean-field is available to the leader and followers, then the optimal solution is unique and linear. However, collecting and sharing the mean-field among all agents is not cost-efficient in general, especially when the number of followers is large. It is shown in the next section that, when the number of followers is large enough, forgoing such information sharing has a negligible effect on the performance of the network.
III Main Results
In this section, we propose a strategy, evaluate its performance against the optimal performance, and show that the difference between the two converges to zero at rate $\mathcal{O}(1/n)$. For clarity of notation, the states and actions under the optimal strategy are denoted distinctly from those under the proposed strategy. Therefore, from (1) and (2), the dynamics of the leader and followers at time $t$ under the optimal strategy are given by
(7)
(8)
where the corresponding mean-field is defined analogously. Similarly to [15], define the following matrices:
(9)
(10)
Assumption 1
The matrices weighting the states in (9) and (10) are positive semi-definite, and the matrices weighting the actions are positive definite.
Define two decoupled Riccati equations such that, for any $t \in \{1, \ldots, T\}$,
(11)
(12)
where the coefficients are constructed from the matrices defined in (9) and (10). According to [14, 15], the optimal performance is obtained under the following linear strategies:
(13)
(14)
where the gain matrices can be found using the following formulas:
(15)
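The display forms of (11)-(15) were lost in extraction; the sketch below shows the standard finite-horizon backward Riccati recursion that each of the two decoupled equations takes, with generic matrices (A, B, Q, R) standing in for the augmented matrices built from (9)-(10).

```python
# Sketch of a finite-horizon backward Riccati recursion of the generic form
# taken by (11) and (12); A, B, Q, R are placeholders for the augmented
# matrices constructed from (9)-(10).
import numpy as np

def riccati_gains(A, B, Q, R, T):
    """Backward recursion; returns time-ordered feedback gains."""
    P = Q.copy()                               # terminal condition
    gains = []
    for _ in range(T - 1):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)    # K = (R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ (A - B @ K)          # Riccati update
        gains.append(K)
    return gains[::-1]
```

In the proposed scheme, the leader runs one such recursion and each follower runs two, with matrix dimensions that do not grow with $n$; this is the source of the scalability claimed in the abstract.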
III-A Solution of Problem 1
The following standard assumptions are imposed.
Assumption 2
The initial states of the followers are i.i.d. with mean $\mu_x$ and finite covariance matrix $\Sigma_x$. In addition, the local noises of the followers are i.i.d. with zero mean and finite covariance matrix $\Sigma_w$.
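This assumption drives the near-optimality rate: the empirical mean of the followers' i.i.d. initial states concentrates around $\mu_x$ at rate $1/n$ in variance. As a one-line worked note (a direct consequence of Assumption 2, using the symbols introduced above):

$\operatorname{var}(\bar{x}_1) = \operatorname{var}\big(\tfrac{1}{n}\sum_{i=1}^{n} x^i_1\big) = \tfrac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(x^i_1) = \tfrac{1}{n}\Sigma_x,$

and similarly the average of the followers' noises has covariance $\Sigma_w / n$, which is consistent with the $\mathcal{O}(1/n)$ performance gap established below.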
Assumption 3
The matrices in the dynamics (1)-(2) and the cost (5) do not depend on the number of followers $n$.
Define a stochastic process $\{z_t\}$, initialized at the mean of the followers' initial states (see Corollary 1 for the case where this mean is unknown), such that for any $t \in \{1, \ldots, T-1\}$:
(16)
Note that the leader and followers can compute $z_t$ under the information structures (3) and (4). Given the matrix gains defined in (15), the following strategies are proposed:
(17)
(18)
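The display forms of (16)-(18) did not survive extraction. The sketch below shows the structure the text describes: a shared deterministic proxy $z_t$ for the mean-field, computable by every agent from the leader's state alone, combined with linear feedback using the Riccati gains. The decomposition of the follower's action into a relative term and a mean-field term is typical of mean-field teams [14, 15]; the gain names (K, Kbar, K0, L0, L) and the exact z-update are illustrative assumptions.

```python
# Structural sketch of (16)-(18); gain names and the exact z-update are
# illustrative assumptions, not the paper's formulas.
import numpy as np

def z_update(z, x0, A, B, D, K, Kbar, L):
    """Proxy mean-field, cf. (16): propagate the expected closed-loop
    follower dynamics, driven only by z and the leader's observed state."""
    u_mf = -(K + Kbar) @ z - L @ x0           # expected follower action
    return A @ z + B @ u_mf + D @ z

def leader_action(x0, z, K0, L0):
    """Leader strategy, cf. (17): linear feedback in (x0, z)."""
    return -K0 @ x0 - L0 @ z

def follower_action(xi, x0, z, K, Kbar, L):
    """Follower strategy, cf. (18): a relative term K (xi - z) plus
    mean-field and leader terms; only local information is used."""
    return -K @ (xi - z) - Kbar @ z - L @ x0
```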
At any time $t$, define the following relative errors between the variables under the proposed strategy and their counterparts under the optimal strategy:
(19)
Lemma 1
The relative errors defined in (19) evolve according to linear dynamics driven by the primitive random variables, as derived below. □
Proof
(22)
Also, it results from (1) and (17) that
(23)
(24)
where the term introduced in (24) is given by:
(25)
Substituting (25) in (24) yields:
(26)
In addition, from (2) and (18), one arrives at:
(27)
where the corresponding term in (27) is as follows:
(28)
From (27) and (28), it results that:
(29)
Equations (19), (22) and (23) lead to:
(30)
Moreover, it results from (16), (19) and (26) that:
(31)
(32)
As a consequence of (16), (19) and (29), the following equation is obtained:
(33)
and this completes the proof. ■
Now, for any follower $i \in \mathbb{N}_n$, define the following variables at time $t$:
(34)
Lemma 2
At any time $t$, the variables defined in (34) satisfy two identities, used below to simplify the cost expansion. □
Proof
Lemma 3
The difference between the performance of the proposed strategy (17)-(18) and the optimal performance is a quadratic function of the relative errors defined in (19). □
Proof
From the cost function (5), we have
(38)
The above equation can be re-written in terms of the variables defined in (34) as follows:
(39)
On the other hand, by definition the following relations hold:
(40)
By incorporating (40) in (39), it results that
(41)
According to Lemma 2, the above equation simplifies to
(42)
Using the definition of relative errors in (19), one concludes that:
(43)
It is implied from (42) and (43) that:
(44)
Expand (44) as follows:
(49)–(89)
The second, fifth and eighth terms on the right-hand side of (49) are zero from Lemma 1 and Assumption 2, on noting that the leader's state $x^0_t$ and the process $z_t$ are completely known under the information structures (3) and (4). This completes the proof. ■
Theorem 1
Let Assumptions 1 and 2 hold. Then,
(93)
(97)
where the weighting matrices solve the following Lyapunov equation for any $t \in \{1, \ldots, T\}$:
(98)
□
Proof
According to Lemma 3, the performance discrepancy is a quadratic function of the relative errors, and from Lemma 1, the relative errors have linear dynamics. Therefore, the discrepancy can be regarded as the quadratic cost of an uncontrolled linear system (i.e., a system with no control action). Thus, from the standard results in linear systems [9], it can be expressed in terms of the Lyapunov equation (98) and the covariance matrices of the initial relative errors and noises, as:
(99)
and
(100)
■
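The following sketch spells out the computation the proof appeals to: for an uncontrolled linear error system $e_{t+1} = F e_t + v_t$ with stage weight $G$, the expected quadratic cost is obtained from a backward Lyapunov recursion of the type (98), fed by the covariances of the initial error and the noise. The names F, G, S are generic placeholders, not the paper's notation.

```python
# Cost of an uncontrolled linear system via a backward Lyapunov recursion,
# cf. (98)-(100); F, G and the variable names are generic placeholders.
import numpy as np

def uncontrolled_cost(F, G, cov_e1, cov_v, T):
    """E[sum_{t=1}^T e_t' G e_t] for e_{t+1} = F e_t + v_t, zero-mean e_1."""
    S = {T: G.copy()}                          # terminal condition S_T = G
    for t in range(T - 1, 0, -1):
        S[t] = G + F.T @ S[t + 1] @ F          # Lyapunov recursion, cf. (98)
    cost = np.trace(S[1] @ cov_e1)             # contribution of the initial error
    cost += sum(np.trace(S[t + 1] @ cov_v) for t in range(1, T))
    return cost
```

Since Assumption 2 makes the relevant error covariances scale like $1/n$ (see the note after Assumption 2), the value returned here is consistent with the $\mathcal{O}(1/n)$ decay claimed in Theorem 2.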
Theorem 2
Let Assumptions 1–3 hold. Then the difference between the performance of the proposed strategy and the optimal performance converges to zero at rate $\mathcal{O}(1/n)$ as the number of followers $n$ increases. □
Proof
Corollary 1
For the special case of a leaderless multi-agent network (see Remark 3), it follows from Theorem 2 that strategy (18) steers all the followers to the initial mean as $n$ grows to infinity. In addition, if the initial mean is not known, it can be replaced by its expectation, i.e., $z_1 = \mu_x$, and the resulting strategy (18) steers all the followers to consensus on the initial mean as $n$ grows to infinity, due to the strong law of large numbers. □
IV Numerical Examples
Example 1: Consider a multi-agent network with one leader and $n = 1000$ followers, where the initial state of the leader is fixed and the initial states of the followers are chosen as uniformly distributed random variables over a given interval. Let the dynamics of the leader and followers be described by (1) and (2), respectively, with the following parameters:
(103)–(105)
The network objective is to minimize the cost function (5). The leader solves the Riccati equation (12) to obtain its gains, and determines its control action according to strategy (17) using its local state as well as $z_t$. It is to be noted that $z_t$ is obtained at any time $t$ from the recursion (16). In addition, for any $i \in \mathbb{N}_n$, follower $i$ solves the two Riccati equations (11) and (12) to find its gains, and then computes its control action based on (18) using its local state $x^i_t$, the state of the leader $x^0_t$, and the variable $z_t$. The results are depicted in Figure 1, where the thick curve represents the state of the leader, and the thin curves are the states of the followers (to avoid a cluttered figure, only 100 followers are chosen, randomly, to display their states). It can be observed from this figure that the states of all agents (which are, in fact, the positions of the agents) converge under the proposed control strategy, as expected, and hence consensus is achieved asymptotically.
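Since the numerical data in (103)-(105) did not survive extraction, the following self-contained sketch reproduces the *structure* of Example 1 under stated assumptions: scalar dynamics, unit weights, and a simplified stand-in for strategy (17)-(18) in which the leader regulates toward the proxy $z_t$ and each follower tracks the leader. All numerical values are illustrative.

```python
# Hedged reproduction sketch for Example 1; dynamics, weights, initial
# conditions, and the simplified control laws are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, T = 1000, 30
A = B = Q = R = np.eye(1)                       # scalar dynamics/weights, assumed

def riccati_gains(A, B, Q, R, T):
    P, gains = Q.copy(), []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

Kf = riccati_gains(A, B, Q, R, T)               # follower gains, cf. (11)
Kl = riccati_gains(A, B, Q, R, T)               # leader gains, cf. (12)

x0 = np.array([5.0])                            # leader initial state, assumed
x = rng.uniform(-1.0, 1.0, size=(n, 1))         # followers i.i.d. uniform
z = np.zeros(1)                                 # proxy mean-field, z_1 = E[x^i_1]

for t in range(T):
    u0 = -Kl[t] @ (x0 - z)                      # leader regulates toward the proxy
    u = -(x - x0) @ Kf[t].T                     # followers track the leader
    z = (A - B @ Kf[t]) @ z + (B @ Kf[t]) @ x0  # expected closed-loop update
    x0 = A @ x0 + B @ u0
    x = x @ A.T + u @ B.T

print("spread of final follower states:", float(x.max() - x.min()))
print("gap between leader and followers:", float(abs(x.mean() - x0[0])))
```

Running this sketch shows the follower spread and the leader-follower gap both shrinking geometrically, mirroring the asymptotic consensus reported in Figure 1.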
Figure 1: State trajectories in Example 1; the thick curve is the leader and the thin curves are 100 randomly selected followers.
The next example demonstrates the efficacy of the results obtained in this work, for the special case of leaderless multi-agent networks.
Example 2: Consider a multi-agent system consisting of $n = 100$ agents that are to track a constant reference trajectory. The following parameters are used in the simulation:
(106)–(109)
Similarly to Example 1, each follower computes its control action according to (18). It is to be noted that in the leaderless case, the agents do not communicate, as discussed in Remark 3.
Figure 2: State trajectories of the agents in Example 2.
V Conclusions
A mean-field approach to the decentralized control of a leader-follower multi-agent network with a single leader is presented in this paper, where the states of the leader and followers are coupled in both the dynamics and the cost. A near-optimal strategy for a non-classical information structure is proposed, obtained by solving two decoupled Riccati equations whose matrix dimensions are independent of the number of followers. This means that the proposed method is not only distributed but also scalable. It is shown that the performance of the proposed solution converges to the optimal performance at a rate inversely proportional to the number of followers. The effectiveness of the results is verified by simulation for two multi-agent settings with 1000 and 100 followers.
As suggestions for future research, one can extend the results to the cases of infinite horizon, multiple leaders, heterogeneous followers, and weighted cost functions, under standard assumptions in mean-field teams [14]. The approach is robust in the sense that the failure of a small number of followers has a negligible impact on the mean-field when the population of the network is sufficiently large.
References
- [1] J. A. Fax and R. M. Murray, “Information flow and cooperative control of vehicle formations,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1465–1476, 2004.
- [2] P. Rawat, K. D. Singh, H. Chaouchi, and J. M. Bonnin, “Wireless sensor networks: a survey on recent developments and potential synergies,” The Journal of Supercomputing, vol. 68, no. 1, pp. 1–48, 2014.
- [3] L. Gupta, R. Jain, and G. Vaszkun, “Survey of important issues in UAV communication networks,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1123–1152, 2016.
- [4] C. W. Reynolds, “Flocks, herds and schools: A distributed behavioral model,” SIGGRAPH Comput. Graph., vol. 21, no. 4, pp. 25–34, 1987.
- [5] R. Olfati-Saber, “Flocking for multi-agent dynamic systems: algorithms and theory,” IEEE Transactions on Automatic Control, vol. 51, no. 3, pp. 401–420, 2006.
- [6] A. Jadbabaie, J. Lin, and A. S. Morse, “Coordination of groups of mobile autonomous agents using nearest neighbor rules,” IEEE Transactions on Automatic Control, vol. 48, no. 6, pp. 988–1001, 2003.
- [7] W. Ren, R. W. Beard, and E. M. Atkins, “Information consensus in multivehicle cooperative control,” IEEE Control Systems, vol. 27, no. 2, pp. 71–82, 2007.
- [8] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.
- [9] P. E. Caines, Linear stochastic systems. John Wiley and Sons, Inc., 1987.
- [10] H. Witsenhausen, “A counterexample in stochastic optimum control,” SIAM Journal on Control and Optimization, vol. 6, pp. 131–147, 1968.
- [11] K. H. Movric and F. L. Lewis, “Cooperative optimal control for multi-agent systems on directed graph topologies,” IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 769–774, 2014.
- [12] Y. Cao and W. Ren, “Optimal linear-consensus algorithms: An LQR perspective,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 40, no. 3, pp. 819–830, 2010.
- [13] F. Borrelli and T. Keviczky, “Distributed LQR design for identical dynamically decoupled systems,” IEEE Transactions on Automatic Control, vol. 53, no. 8, pp. 1901–1912, 2008.
- [14] J. Arabneydi, “New concepts in team theory: Mean field teams and reinforcement learning,” Ph.D. dissertation, Dep. of Electrical and Computer Engineering, McGill University, Montreal, Canada, 2016.
- [15] J. Arabneydi, M. Baharloo, and A. G. Aghdam, “Optimal distributed control for leader-follower networks: A scalable design,” in Proceedings of the 31st IEEE Canadian Conference on Electrical and Computer Engineering, 2018, pp. 1–4.
- [16] J. Arabneydi and A. G. Aghdam, “A certainty equivalence result in team-optimal control of mean-field coupled Markov chains,” in Proceedings of the 56th IEEE Conference on Decision and Control, 2017, pp. 3125–3130.
- [17] ——, “Optimal dynamic pricing for binary demands in smart grids: A fair and privacy-preserving strategy,” in Proceedings of American Control Conference, 2018, pp. 5368–5373.
- [18] J. Arabneydi and A. Mahajan, “Team-optimal solution of finite number of mean-field coupled LQG subsystems,” in Proceedings of the 54th IEEE Conference on Decision and Control, 2015, pp. 5308–5313.