Data-Based Receding Horizon Control of Linear Network Systems
This work was supported by ARO-W911NF-18-1-0213.
Abstract
We propose a distributed data-based predictive control scheme to stabilize a network system described by linear dynamics. Agents cooperate to predict the future system evolution without knowledge of the dynamics, relying instead on learning a data-based representation from a single sample trajectory. We employ this representation to reformulate the finite-horizon Linear Quadratic Regulator problem as a network optimization with separable objective functions and locally expressible constraints. We show that the controller resulting from approximately solving this problem using a distributed optimization algorithm in a receding horizon manner is stabilizing. We validate our results through numerical simulations.
Index Terms:
Data-based control, network systems, predictive control of linear systems
I Introduction
With the growing complexity of engineering systems, data-based methods in control theory are becoming increasingly popular, particularly for systems where it is too difficult to develop models from first principles and parameter identification is impractical or too costly. An important class of such systems is network systems, which arise in many applications such as neuroscience, power systems, traffic management, and robotics. Without a system model, agents must use sampled data to characterize the network behavior. However, the decentralized nature of the system means that agents only have access to information that can be measured locally, and must coordinate with one another to predict the network response and decide their control actions. These observations motivate the focus here on distributed data-based control of network systems with linear dynamics.
Literature Review
Distributed control of network systems is a burgeoning area of research, see e.g., [1, 2, 3] and references therein. In general, designing optimal controllers for network systems is an NP-hard problem, but under certain conditions optimal distributed controllers for linear systems can be obtained as the solution to a convex program [4]. When these conditions do not hold, suboptimal controllers can be obtained by convex relaxations [5, 6] or convex restrictions [7] of the original problem. Although these methods produce distributed controllers, the computation of the controller itself is typically done offline, in a centralized manner, and requires knowledge of the underlying system model. Reinforcement learning (RL) is an increasingly popular approach for controlling robots [8] and multi-agent systems [9]. However, RL approaches typically require a very large number of samples to perform effectively [10] and their complexity makes it difficult to get stability, safety, and robustness guarantees as is standard with other control approaches. For applications where safety assurances are required, model predictive control (MPC) is widely used since performance and safety constraints can be directly incorporated into an optimization problem that is solved online. Several distributed MPC formulations are available for multi-agent systems where the dynamics of the agents are coupled, such as [11, 12] where each agent implements a control policy minimizing its own objective while accounting for network interactions locally, or [13] where agents cooperate to minimize a system-wide objective using a network optimization algorithm. Data-based approaches to predictive control have also been proposed. System identification [14] is often leveraged to learn a parameterized model which can then be used with any of the MPC formulations previously mentioned. Methods for implementing a controller directly from sampled data without any intermediate identification also exist. 
The fundamental lemma from behavioral systems theory [15], which characterizes system trajectories from a single sample trajectory, has recently gained attention in the area of data-based control [16, 17, 18], and has been used for predictive control in the recently developed DeePC framework [19, 20]. Our work here extends the DeePC framework to network systems where each node only has partial access to the data.
Statement of Contributions
We develop distributed data-based feedback controllers for network systems.1 A group of agents whose states evolve according to unknown coupled linear dynamics each have access to their own state and those of their neighbors in some sample trajectory. Their collective objective is to drive the network state to the origin while minimizing a quadratic objective function without knowledge of the system dynamics. The approach we use computes the control policy online and in a distributed manner by extending the DeePC formalism to the network case. Building upon the fundamental lemma, we introduce a new distributed, data-based representation of possible network trajectories. We use this representation to pose the control synthesis as a network optimization problem, without state or input constraints, in terms of the data available to each agent. We show that this optimization problem is equivalent to the standard finite-horizon Linear Quadratic Regulation (LQR) problem and introduce a primal-dual method along with a suboptimality certificate to allow agents to cooperatively find an approximate solution. Finally, we show that the controller that results from implementing the distributed solver in a receding horizon manner is stabilizing.
1Throughout the paper, we make use of the following notation. Given integers, with , let . Let be an undirected graph with nodes, where and . The neighbors of are . Given and a vector , we denote . For with , we let . For positive semidefinite , we denote . For , we denote by its Moore-Penrose pseudoinverse. The Hankel matrix of a signal with block rows is the matrix . Given two signals and , let be the signal where for , and for .
II Preliminaries
We briefly recall here basic concepts on the identifiability of Linear Time-Invariant (LTI) systems from data. Given $T, L \in \mathbb{Z}_{>0}$ with $L \le T$, a signal $u : [0, T-1] \to \mathbb{R}^m$ is persistently exciting of order $L$ if its Hankel matrix with $L$ block rows has full row rank, $\operatorname{rank}(\mathcal{H}_L(u)) = Lm$. Informally, this means that any arbitrary signal of length $L$ can be described as a linear combination of windows of width $L$ in the signal $u$. A necessary condition for persistence of excitation is $T \ge (m+1)L - 1$.
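Since persistency of excitation reduces to a rank condition on a Hankel matrix, it can be checked numerically. The following sketch is illustrative only (the signal shapes and names are generic, not the paper's notation):

```python
import numpy as np

def hankel_blocks(u, L):
    """Block-Hankel matrix of the signal u (shape T x m) with L block rows:
    column j stacks the window u[j], ..., u[j+L-1]."""
    return np.column_stack([u[j:j + L].reshape(-1) for j in range(len(u) - L + 1)])

def is_persistently_exciting(u, L):
    """u is persistently exciting of order L iff its Hankel matrix with
    L block rows has full row rank L*m."""
    H = hankel_blocks(u, L)
    return np.linalg.matrix_rank(H) == H.shape[0]
```

A generic random signal of sufficient length passes the test, while a constant signal fails it for any order above one.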
Lemma II.1.
(Fundamental Lemma [15]): Consider the LTI system $x^+ = Ax + Bu$, with $(A, B)$ controllable. Let $u^d : [0, T-1] \to \mathbb{R}^m$, $x^d : [0, T-1] \to \mathbb{R}^n$ be sequences such that $(u^d, x^d)$ is a trajectory of the system and $u^d$ is persistently exciting of order $L + n$. Then for any pair $u : [0, L-1] \to \mathbb{R}^m$, $x : [0, L-1] \to \mathbb{R}^n$, $(u, x)$ is a trajectory of the system if and only if there exists $g \in \mathbb{R}^{T-L+1}$ such that $\begin{bmatrix} \mathcal{H}_L(u^d) \\ \mathcal{H}_L(x^d) \end{bmatrix} g = \begin{bmatrix} u \\ x \end{bmatrix}$.
Lemma II.1 is stated here in state-space form, even though the result was originally presented in the language of behavioral systems theory. The result states that all trajectories of a controllable LTI system can be characterized by a single trajectory if the corresponding input is persistently exciting of sufficiently high order, obviating the need for a model or parameter estimation when designing a controller.
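Lemma II.1 can be verified numerically on a toy system: with a persistently exciting input of sufficiently high order, any fresh trajectory should be reproducible as a linear combination of the columns of the stacked input/state Hankel matrices. The system matrices below are an assumed illustrative choice, not taken from the paper.

```python
import numpy as np

def hankel(w, L):
    """Block-Hankel matrix of the signal w (shape T x q) with L block rows."""
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(len(w) - L + 1)])

# Illustrative controllable pair (assumed for this sketch)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])

def simulate(x0, u):
    """States x_0, ..., x_{T-1} of x^+ = A x + B u from initial state x0."""
    x = [np.asarray(x0, dtype=float)]
    for uk in u:
        x.append(A @ x[-1] + B @ uk)
    return np.array(x[:-1])

rng = np.random.default_rng(1)
T, L, n = 60, 4, 2
ud = rng.standard_normal((T, 1))   # generically persistently exciting of order L + n
xd = simulate(np.zeros(2), ud)     # single sample trajectory

# A fresh trajectory, from a different initial state and input
u_new = rng.standard_normal((L, 1))
x_new = simulate([0.5, -0.3], u_new)

# The fresh trajectory must be a linear combination of the Hankel columns
H = np.vstack([hankel(ud, L), hankel(xd, L)])
w_new = np.concatenate([u_new.reshape(-1), x_new.reshape(-1)])
g, *_ = np.linalg.lstsq(H, w_new, rcond=None)
residual = np.linalg.norm(H @ g - w_new)
```

The least-squares residual is zero up to machine precision, confirming that the single sample trajectory characterizes the fresh one without access to $A$ or $B$.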
III Problem Formulation
Consider a network system described by an undirected graph with nodes. Each node corresponds to an agent with sensing, communication, and computation capabilities. Each edge corresponds to both a physical coupling and a communication link between the corresponding agents. A subset of the nodes , with , also have actuation capabilities via inputs . The system dynamics are then
$x_i^+ = \sum_{j \in \mathcal{N}_i \cup \{i\}} A_{ij} x_j + B_i u_i \qquad (1)$
where , and , with the input term absent for unactuated nodes. Let and and define and . Let $A$ and $B$ be matrices so that (1) takes the compact form $x^+ = Ax + Bu$.
To each node , we associate an objective of the form when and otherwise. Here, each is positive semidefinite, each is positive definite, is a system trajectory, and is the time horizon of trajectories being considered.
Each node wants to drive its state to the origin while minimizing and satisfying the constraints. The resulting network objective function is the sum of the objective functions across the nodes. Letting and , this objective can be written as
so that . If the system starts from , the agents’ goal can be formulated as the network optimization problem:
$$\min_{x,\,u} \;\; J(x, u) \qquad \text{subject to} \qquad x(k+1) = A x(k) + B u(k) \ \ \text{for } k \in \{0, \dots, \bar{T}-1\}, \quad x(0) = x_0, \quad x(\bar{T}) = 0 \qquad (2)$$
Note that the agents’ decisions on their control inputs are coupled through the constraints. Since , if (2) is feasible, its optimal state and input trajectories are unique.
A key aspect of this paper is that we consider scenarios where the system matrices and are unknown to the network. Instead, we assume that, for a set of given input sequences , the corresponding state trajectories are available, and each node has access to its own state trajectory as well as those of its neighbors. Actuated nodes also have access to their own input , but this is unknown to its neighbors . Our aim is to synthesize a control policy that can be implemented by each node in a distributed way with data available to it. The resulting controller should stabilize the system to the origin while minimizing .
IV Data-Based Representation for Optimization
Here, we introduce a data-based representation of system trajectories that is employed to pose a network optimization problem equivalent to (2). Throughout this section, we let , be sequences such that is a trajectory of (1). Let
for each and . Then is the data available to each node. Let , be arbitrary sequences where . Define and
Let for , and otherwise. We define to be the matrix consisting of all ones and zeros such that and .
IV-A Data-Based Representation of Network Trajectories
Lemma II.1 states conditions under which the behavior of the system can be described completely by the Hankel matrix of the sampled data. Here we extend Lemma II.1 to the setting of a network system to build a data-based representation of network trajectories using the Hankel matrices of the data available to each agent, . We show that under certain conditions the image of is the set of all possible trajectories of node .
Proposition IV.1.
(Sufficiency of Data-Based Image Representation): If for each there exists with , then is a trajectory of (1).
Proof.
Writing so for all , , it follows that so for ,
By a similar computation, we can show that for each ,
which is consistent with (1). ∎
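The distributed representation can be illustrated on an assumed two-node toy network (not from the paper): node 1 is unactuated, node 2 is actuated, each builds a Hankel matrix only from the data it can measure locally (its own state, its neighbor's state, and its own input), and the local window of a genuine network trajectory lies in the image of each local Hankel matrix.

```python
import numpy as np

def hankel(w, L):
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(len(w) - L + 1)])

# Assumed two-node toy network: node 1 unactuated, node 2 actuated
A = np.array([[0.8, 0.2], [0.1, 0.7]])
B = np.array([[0.0], [1.0]])

rng = np.random.default_rng(4)
T, L = 80, 5
ud = rng.standard_normal((T, 1))
xd = np.zeros((T, 2))
for k in range(T - 1):
    xd[k + 1] = A @ xd[k] + B @ ud[k]

# Local Hankel matrices: each node stacks only what it can measure
H1 = hankel(xd, L)                           # node 1: (x1, x2)
H2 = hankel(np.column_stack([xd, ud]), L)    # node 2: (x1, x2, u2)

# A fresh network trajectory
u_new = rng.standard_normal((L, 1))
x_new = np.zeros((L, 2))
x_new[0] = [1.0, -0.5]
for k in range(L - 1):
    x_new[k + 1] = A @ x_new[k] + B @ u_new[k]

# Its local windows lie in the image of the corresponding local Hankel matrices
w1 = x_new.reshape(-1)
w2 = np.column_stack([x_new, u_new]).reshape(-1)
g1, *_ = np.linalg.lstsq(H1, w1, rcond=None)
g2, *_ = np.linalg.lstsq(H2, w2, rcond=None)
r1 = np.linalg.norm(H1 @ g1 - w1)
r2 = np.linalg.norm(H2 @ g2 - w2)
```

Both residuals vanish up to numerical precision, so each node can explain its local view of the trajectory from its own data alone; consistency of the overlapping components across neighbors is what ties the local pieces into a single network trajectory.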
Next we identify conditions for the converse of the above result to hold, i.e., when the Hankel matrices of all the agents characterize all possible network trajectories.
Proposition IV.2.
(Necessity of Data-Based Image Representation): If is controllable, is a trajectory of (1) and either
(i) is persistently exciting of order ;
(ii) is persistently exciting of order for each , and is persistently exciting of order for each ;
then for all there exists such that .
Proof.
In the case of (i) we simply apply Lemma II.1 to obtain where , and note that for all , , so the result follows by letting . For case (ii), we think of for as an input to node . Letting , where , and defining
we have
Let be arbitrary, and such that the th block component is . Since is controllable, there exists an input such that the corresponding state trajectory with has . Note that if , then is the state trajectory corresponding to the input , and and . Likewise, if , is the state trajectory corresponding to the input , and and . Hence is controllable for all and the result follows from Lemma II.1. ∎
Remark IV.3.
(Feasibility of Identifiability Conditions): Proposition IV.2 gives conditions under which the data is rich enough to characterize all possible trajectories of the system. Condition (i) concerns the input sequence, , and guarantees a priori the identifiability of the system from data. This condition is generically true in the sense that the set of sequences which are not persistently exciting of order (even though for all , is) has zero Lebesgue measure. In general, it is difficult to verify condition (i) in a distributed manner. On the other hand, it is straightforward to verify condition (ii) using only information available to the individual agents. However, this verification must be done in an ad hoc manner, after the input has been applied to the system. While the condition is sufficient for identifiability, there are systems where, for all inputs , the resulting trajectory will never satisfy it.
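The local verification of condition (ii) amounts to each node stacking the signals it measures (the neighboring states playing the role of inputs to its subsystem, together with its own input if actuated) and checking the rank of the resulting Hankel matrix. The sketch below is schematic: the exact excitation orders required by Proposition IV.2 are stated in the paper's (omitted) notation, so here the order is simply passed in as a parameter.

```python
import numpy as np

def hankel_blocks(w, L):
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(len(w) - L + 1)])

def node_pe_check(neighbor_states, u_i, order):
    """Local persistency-of-excitation test for one node: stack the locally
    measured signals (neighbor states, plus the node's own input if it is
    actuated) and check that the Hankel matrix with `order` block rows has
    full row rank. `u_i` is None for unactuated nodes."""
    w = neighbor_states if u_i is None else np.column_stack([neighbor_states, u_i])
    H = hankel_blocks(w, order)
    return np.linalg.matrix_rank(H) == H.shape[0]
```

Each node runs this test on its own measurements after the experiment, which is the ad hoc, a posteriori character of condition (ii) noted in the remark.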
IV-B Equivalent Network Optimization Problem
Here, we build on the data-based image representation of network trajectories in a distributed fashion to pose a network optimization problem that can be solved with the data available to each agent, which is equivalent to an LQR problem with a time horizon of . Each node can use this representation along with past states and inputs to predict future trajectories, assuming that the hypotheses of Proposition IV.2 are satisfied. Formally, let and let and be sequences such that is a long trajectory of the system. In the network optimization we introduce below, we optimize over system trajectories of length , constrained so the first samples of and are and , respectively. This plays a similar role to the initial condition constraint in (2).
For each node , define
(3)
Consider the following problem
(4)
subject to
Although (4) does not necessarily have a unique optimizer, any optimizer of (4) is such that and are the optimal input and state sequences of (2), as formalized next.
Proposition IV.4.
We omit the proof for space reasons, but note that it is the analogue of Theorem 5.1 and Corollary 5.2 in [19] for the case of network systems once one invokes Propositions IV.1 and IV.2 above. Unlike the original network optimization problem (2), for which agents lack knowledge of the system matrices , , the network optimization problem (4) can be solved in a distributed way with the information available to them. The structure of the problem (aggregate objective functions plus locally expressible constraints) makes it amenable to a variety of distributed optimization algorithms, see e.g., [21, 22]. In Section V-B below, we employ a primal-dual dynamic to find asymptotically a solution of (4) in a distributed way.
Remark IV.5.
(Scalability of Network Optimization): As the number of nodes in the network increases, so does the state dimension; hence more data is required in order to maintain persistency of excitation. A necessary condition is . Assuming that , , we have . The decision variable for each node is when and otherwise. The size of is . However, using the distributed optimization algorithm of Section V-B, agent only needs to send messages of size to agent .
V Distributed Data-Based Predictive Control
Here we introduce a distributed data-based predictive control scheme to stabilize the system (1) to the origin, as described in Section III. To do this, we solve the network optimization problem (4) in a receding horizon manner with and updated every time step based on the system's current state. The control scheme is summarized in Algorithm 1.
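The receding horizon loop of Algorithm 1 can be sketched end-to-end on a toy two-state system. This is an assumed, centralized, unconstrained instance (a zero terminal state standing in for the paper's terminal condition, and an exact KKT solve in place of the distributed primal-dual iteration): at each step the controller matches the most recent samples, solves the data-based horizon problem, and applies the first predicted input.

```python
import numpy as np

def hankel(w, L):
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(len(w) - L + 1)])

# Toy unstable, controllable pair (illustrative; not the paper's system)
A = np.array([[1.1, 0.3], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])

rng = np.random.default_rng(3)
T, Tini, N = 120, 2, 8
L = Tini + N

# Offline: one sample trajectory under a random (persistently exciting) input
ud = rng.standard_normal((T, 1))
xd = np.zeros((T, 2))
for k in range(T - 1):
    xd[k + 1] = A @ xd[k] + B @ ud[k]
Hu, Hx = hankel(ud, L), hankel(xd, L)

# Quadratic cost on the N future samples only
Q, R = np.eye(2), 0.01 * np.eye(1)
mask = np.diag([0.0] * Tini + [1.0] * N)
M = Hu.T @ np.kron(mask, R) @ Hu + Hx.T @ np.kron(mask, Q) @ Hx

def control_step(u_past, x_past):
    """Match the Tini most recent samples, force the terminal state to zero,
    minimize the quadratic cost; lstsq handles the degenerate KKT system."""
    E = np.vstack([Hu[:Tini], Hx[:2 * Tini], Hx[-2:]])
    c = np.concatenate([u_past.reshape(-1), x_past.reshape(-1), np.zeros(2)])
    n_g = Hu.shape[1]
    KKT = np.block([[2 * M, E.T], [E, np.zeros((E.shape[0], E.shape[0]))]])
    sol, *_ = np.linalg.lstsq(KKT, np.concatenate([np.zeros(n_g), c]), rcond=None)
    return (Hu @ sol[:n_g]).reshape(L, 1)[Tini]   # first predicted future input

# Online: warm up with zero input, then run the receding horizon loop
x = np.array([2.0, -1.0])
past_u, past_x = np.zeros((Tini, 1)), np.zeros((Tini, 2))
for k in range(Tini):
    past_x[k] = x
    x = A @ x
x0_norm = np.linalg.norm(x)
for t in range(60):
    u0 = control_step(past_u, past_x)
    past_u = np.vstack([past_u[1:], u0.reshape(1, 1)])
    past_x = np.vstack([past_x[1:], x.reshape(1, 2)])
    x = A @ x + B @ u0
```

The closed-loop state is driven to (numerical) zero even though the controller never uses $A$, $B$ after data collection; the plant matrices appear only to simulate the true system.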
The rest of the section proceeds by first showing that the controller resulting from Algorithm 1 is stabilizing even when the network optimization (4) is solved only approximately; and then introducing a particular distributed solver for (4) along with a suboptimality certificate to check, in a distributed manner, the stopping condition in Step 4 of Algorithm 1.
V-A Stability Analysis of Closed-Loop System
In the rest of the paper, we rely on the following assumption.
Assumption V.1.
Since the system is controllable (cf. (i)), a sufficient condition for guaranteeing the feasibility in (ii) is . In fact, in that case, there exists a trajectory such that . It follows that is feasible for (2), so by Proposition IV.4, there exists some such that is feasible for (4).
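The feasibility argument rests on controllability: any initial state can be steered to the origin in at most $n$ steps, so a trajectory ending at zero always exists once the horizon is long enough. A quick numerical check on an assumed toy pair:

```python
import numpy as np

# Assumed controllable toy pair (not the paper's system)
A = np.array([[1.2, 0.4], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
n = 2

# Controllability matrix ordered so column k multiplies u_k in
#   x_n = A^n x0 + A^{n-1} B u_0 + ... + B u_{n-1}
C = np.column_stack([np.linalg.matrix_power(A, n - 1 - k) @ B for k in range(n)])
x0 = np.array([1.0, -2.0])
u_stack, *_ = np.linalg.lstsq(C, -np.linalg.matrix_power(A, n) @ x0, rcond=None)

# Applying the computed inputs drives the state to the origin in n steps
x = x0.copy()
for k in range(n):
    x = A @ x + B @ u_stack[k:k + 1]
```

Since $C$ has full rank for a controllable pair, the linear system always admits a solution, and the resulting trajectory padded with zeros is feasible for any longer horizon.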
Under Assumption V.1, the closed-loop system with the controller corresponding to a receding horizon implementation of (2) is globally exponentially stable, cf. [23, Theorem 12.2]. Unlike [19], we do not assume we have access to the exact solution of (4) since distributed optimization algorithms typically only converge asymptotically to the true optimizer and must be terminated in finite time. Here we show that Algorithm 1 still stabilizes the system when the tolerance is sufficiently small.
Theorem V.2.
Proof.
Let be the feedback corresponding to a receding horizon implementation of (2). Consider the system where . Let , where is an optimizer of (2). Because (4) is nondegenerate, is continuously differentiable, cf. [23, Theorem 6.9], and the system is input-to-state stable (ISS) with being an ISS-Lyapunov function satisfying for constants , cf. [24, Theorem 1] (although the result is stated there for systems where is Schur stable, the same reasoning is valid when there are no state or input constraints and is unstable). Because the system without disturbances is exponentially stable [25], it follows by [26, proof of Lemma 3.5] that the gain function is linear, so there exist and a class function such that
for all . Let be a trajectory of the closed-loop dynamics of (1) with the controller described by Algorithm 1 where and define . It follows that . Note that , where and so it follows that We claim that for all , there exists such that whenever . The case when follows by observing that and there exists such that for all . If the claim holds for some , then for all ,
so by choosing such that for all , then for all and the claim follows by induction. Hence To show global Lyapunov stability, let be arbitrary, and suppose that is chosen so that and . Then for all ,
If , then . It follows by induction on that for all . ∎
V-B Primal-Dual Solver for Network Optimization
In this section we introduce a method for solving the optimization problem (4) in a distributed way. We let
and . Note , where for and otherwise. Problem (4) can be written as
(5)
subject to
for suitable , , , and , with . The Lagrangian of (5) is
If is an optimizer of the dual problem, then the pair is a (min-max) saddle point of , meaning that for all and . The saddle-point property of the Lagrangian suggests that the primal-dual flow, which descends along the gradient of the primal variable and ascends along the gradient of the dual variable,
$\dot{z} = -\nabla_z \mathcal{L}(z, \lambda), \qquad \dot{\lambda} = \nabla_\lambda \mathcal{L}(z, \lambda) \qquad (6)$
can be used to compute the optimizer. Here, is the matrix such that . By [22, Corollary 4.5], the flow converges asymptotically to a saddle point of . This procedure is fully distributed, since the flow equations in (6) can be computed with the information available to each agent or its direct neighbors. In particular, if , then the message agent shares with agent consists of , which is (cf. Remark IV.5).
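The saddle-point flow can be illustrated on a small equality-constrained quadratic program (an assumed, well-conditioned toy instance, not the matrices of (5)): a forward-Euler discretization descends in the primal variable, ascends in the dual, and converges to the KKT point.

```python
import numpy as np

# Toy equality-constrained QP:  min_z 0.5 z'Pz + q'z  s.t.  A z = b
# (well-conditioned instance chosen so the Euler-discretized flow converges)
P = 2.0 * np.eye(4)
q = np.array([1.0, 0.0, -1.0, 0.0])
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
b = np.array([1.0, -1.0])

# Saddle-point (primal-dual) dynamics, forward-Euler step h:
#   z   <- z   - h * (P z + q + A' lam)   (descend in the primal)
#   lam <- lam + h * (A z - b)            (ascend in the dual)
z, lam, h = np.zeros(4), np.zeros(2), 0.1
for _ in range(5000):
    z_next = z - h * (P @ z + q + A.T @ lam)
    lam = lam + h * (A @ z - b)
    z = z_next

# Compare with the KKT solution of the QP
KKT = np.block([[P, A.T], [A, np.zeros((2, 2))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
err = np.linalg.norm(np.concatenate([z, lam]) - sol)
```

In the distributed setting, each agent would hold the components of the primal and dual variables associated with its own constraints, so every update above uses only neighbor information; the step size must be chosen small enough for the discretized flow to remain stable.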
We conclude by providing a certificate that can be used to verify the stopping condition of Step 4 in Algorithm 1.
Proposition V.3.
Proof.
The set of saddle points of is . Since the optimal input and state trajectories are unique, all saddle points share the property that, for each , the components of their equal . Given an arbitrary , we have for all , and hence . The set of saddle points can also be described as , for any . Therefore, . Since is the orthogonal projection onto ,
and therefore,
∎
The suboptimality certificate can be checked in a fully distributed manner using information locally available to each agent provided that is known. Because depends only on the objective and constraints , which in turn come from the sample trajectory , it can be computed offline. Finally, it is possible for each agent to compute a bound on using the fact that, for , one has for all . It follows that for all ,
for , so each agent can compute a bound on using data available to itself and its neighbors.
Figure 1: Closed-loop state trajectories and number of iterations per time step for (a) the Newman-Watts-Strogatz network and (b) the star network.
V-C Numerical Simulations
We simulate the proposed distributed data-based predictive controller on a Newman-Watts-Strogatz network [27] and a star network. In each case, and are chosen at random so that is controllable. The input sequence is chosen as , where is a matrix so that is marginally stable (the data does not need to be generated from a stable system, but this is done to avoid numerical issues), and is a Gaussian white noise process. We use Proposition IV.2 to ensure that the data is informative enough for data-driven control. In both cases, condition (i) is satisfied. Condition (ii) fails for the Newman-Watts-Strogatz network, but is satisfied by the star network (cf. Remark IV.3). We integrate the primal-dual flow (6), using the stopping condition in Proposition V.3 to terminate it. Fig. 1 shows the closed-loop state trajectories and the number of iterations at each time step for different values of . For the Newman-Watts-Strogatz network, cf. Fig. 1(a), the time horizon is . For the star network, cf. Fig. 1(b), the time horizon is , but the optimization at each time step is still feasible. In both cases, the distributed data-based predictive controller better approximates the exact MPC for smaller values of at the cost of more iterations per time step.
VI Conclusions and Future Work
We have introduced a distributed data-based predictive controller for stabilizing network systems with linear dynamics and unknown system matrices. Instead of building a dynamic model, agents learn a non-parametric representation based on a single trajectory and use it to implement a controller as the solution of a network optimization problem, solved in a receding horizon manner and in a distributed way. Future work will explicitly quantify the tolerance in terms of the available data and study ways to construct a terminal cost, without knowledge of the underlying model, that guarantees stability when the stabilizing terminal constraint is omitted. We plan to extend the results to cases where there are constraints on the state and input, characterize the robustness properties of the introduced control scheme, investigate ways of improving its scalability, and consider more general scenarios, including the presence of noise in the data, inputs not persistently exciting of sufficiently high order, and partial observations of the network state. We also plan to explore improvements to the primal-dual flow to solve the optimization problem with fewer iterations and less communication between the agents.
References
- [1] F. Bullo, J. Cortés, and S. Martinez, Distributed Control of Robotic Networks. Applied Mathematics Series, Princeton University Press, 2009.
- [2] M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks. Applied Mathematics Series, Princeton University Press, 2010.
- [3] R. R. Negenborn and J. M. Maestre, “Distributed model predictive control: An overview and roadmap of future research opportunities,” IEEE Control Systems, vol. 34, no. 4, pp. 87–97, 2014.
- [4] M. Rotkowitz and S. Lall, “A characterization of convex problems in decentralized control,” IEEE Transactions on Automatic Control, vol. 51, no. 2, pp. 274–286, 2006.
- [5] F. Lin, M. Fardad, and M. R. Jovanovic, “Design of optimal sparse feedback gains via the alternating direction method of multipliers,” IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2426–2431, 2013.
- [6] G. Fazelnia, R. Madani, A. Kalbat, and J. Lavaei, “Convex relaxation for optimal distributed control problems,” IEEE Transactions on Automatic Control, vol. 62, no. 1, pp. 206–221, 2017.
- [7] L. Furieri, Y. Zheng, A. Papachristodoulou, and M. Kamgarpour, “On separable quadratic Lyapunov functions for convex design of distributed controllers,” in European Control Conference, (Naples, Italy), pp. 42–49, 2019.
- [8] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
- [9] L. Buşoniu, R. Babuška, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008.
- [10] B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.
- [11] D. Jia and B. Krogh, “Min-max feedback model predictive control for distributed control with communication,” in American Control Conference, (Anchorage, AK), pp. 4507–4512, 2002.
- [12] W. B. Dunbar, “Distributed receding horizon control of dynamically coupled nonlinear systems,” IEEE Transactions on Automatic Control, vol. 52, no. 7, pp. 1249–1263, 2007.
- [13] B. T. Stewart, A. N. Venkat, J. B. Rawlings, S. J. Wright, and G. Pannocchia, “Cooperative distributed model predictive control,” Systems & Control Letters, vol. 59, no. 8, pp. 460–469, 2010.
- [14] L. Ljung, System Identification: Theory for the User. Prentice Hall information and system sciences series, Prentice Hall, 1999.
- [15] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. M. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
- [16] U. Park and M. Ikeda, “Stability analysis and control design of LTI discrete-time systems by the direct use of time series data,” Automatica, vol. 45, no. 5, pp. 1265–1271, 2009.
- [17] T. M. Maupong and P. Rapisarda, “Data-driven control: A behavioral approach,” Systems & Control Letters, vol. 101, pp. 37–43, 2017.
- [18] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality and robustness,” arXiv preprint arXiv:1903.06842, 2019.
- [19] J. Coulson, J. Lygeros, and F. Dörfler, “Data-enabled predictive control: in the shallows of the DeePC,” in European Control Conference, (Naples, Italy), pp. 307–312, 2019.
- [20] J. Berberich, J. Köhler, M. A. Müller, and F. Allgöwer, “Data-driven model predictive control with stability and robustness guarantees,” arXiv preprint arXiv:1906.04679, 2019.
- [21] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
- [22] A. Cherukuri, B. Gharesifard, and J. Cortés, “Saddle-point dynamics: conditions for asymptotic stability of saddle points,” SIAM Journal on Control and Optimization, vol. 55, no. 1, pp. 486–511, 2017.
- [23] F. Borrelli, A. Bemporad, and M. Morari, Predictive Control for Linear and Hybrid Systems. Cambridge, UK: Cambridge University Press, 2017.
- [24] A. Jadbabaie and A. S. Morse, “On the ISS property for receding horizon control of constrained linear systems,” IFAC Proceedings Volumes, vol. 35, no. 1, pp. 37–40, 2002.
- [25] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, “Constrained model predictive control: Stability and optimality,” Automatica, vol. 36, pp. 789–814, 2000.
- [26] Z.-P. Jiang and Y. Wang, “Input-to-state stability for discrete-time nonlinear systems,” Automatica, vol. 37, no. 6, pp. 857–869, 2001.
- [27] M. E. J. Newman and D. J. Watts, “Renormalization group analysis of the small-world network model,” Physics Letters A, vol. 263, no. 4-6, pp. 341–346, 1999.